regressions.lists.linux.dev archive mirror
From: Thorsten Leemhuis <regressions@leemhuis.info>
To: "regressions@lists.linux.dev" <regressions@lists.linux.dev>
Subject: Re: [Bug 215315] New: [REGRESSION BISECTED] amdgpu crashes system suspend - NUC8i7HVKVA #forregzbot
Date: Mon, 10 Jan 2022 13:02:42 +0100
Message-ID: <f0103811-bca2-e439-ca73-2132fd6e9871@leemhuis.info>
In-Reply-To: <8e1abb43-664b-5882-7c02-ef517c14fc94@leemhuis.info>

For the record, the culprit was reverted:

https://git.kernel.org/torvalds/c/df5bc0aa7ff6e2e14cb75182b4eda20253c711d4

#regzbot fixed-by: df5bc0aa7ff6e2e14cb75182b4eda20253c711d4

TWIMC: this mail is primarily sent for documentation purposes and for
regzbot, my Linux kernel regression tracking bot. These mails usually
contain '#forregzbot' in the subject, to make them easy to spot and filter.

On 13.12.21 07:04, Thorsten Leemhuis wrote:
> [TLDR: adding this regression to regzbot; most of this mail is compiled
> from a few template paragraphs some of you might have seen already.]
> 
> Hi, this is your Linux kernel regression tracker speaking.
> 
> Top-posting for once, to make this easily accessible to everyone.
> 
> Thanks for the report.
> 
> Adding the regression mailing list to the list of recipients, as it
> should be in the loop for all regressions, as explained here:
> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
> 
> To be sure this issue doesn't fall through the cracks unnoticed, I'm
> adding it to regzbot, my Linux kernel regression tracking bot:
> 
> #regzbot ^introduced f7d6779df642720e22bffd449e683bb8690bd3bf
> #regzbot title drm: amdgpu: NUC8i7HVKVA crashes during system suspend
> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215315
> #regzbot ignore-activity
> 
> Reminder: when fixing the issue, please add a 'Link:' tag with the URL
> to the report (the parent of this mail), then regzbot will automatically
> mark the regression as resolved once the fix lands in the appropriate
> tree. For more details about regzbot see footer.
> 
> Sending this to everyone that got the initial report, to make all aware
> of the tracking. I also hope that messages like this motivate people to
> directly get at least the regression mailing list and ideally even
> regzbot involved when dealing with regressions, as messages like this
> wouldn't be needed then.
> 
> Don't worry, I'll send further messages wrt this regression just to
> the lists (with a tag in the subject so people can filter them away), as
> long as they are intended just for regzbot. With a bit of luck no such
> messages will be needed anyway.
> 
> Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat).
> 
> P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
> on my table. I can only look briefly into most of them. Unfortunately,
> I will therefore sometimes get things wrong or miss something important.
> I hope that's not the case here; if you think it is, don't hesitate to
> tell me about it in a public reply. That's in everyone's interest, as
> what I wrote above might be misleading to everyone reading this; any
> suggestion I gave might thus send someone reading this down the wrong
> rabbit hole, which none of us wants.
> 
> BTW, I have no personal interest in this issue, which is tracked using
> regzbot, my Linux kernel regression tracking bot
> (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
> this mail to get things rolling again and hence don't need to be CCed on
> all further activities wrt this regression.
> 
> 
> On 13.12.21 00:08, bugzilla-daemon@bugzilla.kernel.org wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=215315
>>
>>             Bug ID: 215315
>>            Summary: [REGRESSION BISECTED] amdgpu crashes system suspend -
>>                     NUC8i7HVKVA
>>            Product: Drivers
>>            Version: 2.5
>>     Kernel Version: 5.15-rc1, 5.15, 5.16-rc4
>>           Hardware: x86-64
>>                 OS: Linux
>>               Tree: Mainline
>>             Status: NEW
>>           Severity: normal
>>           Priority: P1
>>          Component: Video(DRI - non Intel)
>>           Assignee: drivers_video-dri@kernel-bugs.osdl.org
>>           Reporter: lenb@kernel.org
>>         Regression: No
>>
>> My Intel NUC8i7HVKVA has an AMD GPU.
>>
>> Until 5.15-rc1, this machine was rock solid in suspend stress testing -- never
>> crashing after hundreds of hours of back-to-back suspend cycles.
>>
>> Until this patch went upstream:
>>
>> commit f7d6779df642720e22bffd449e683bb8690bd3bf (refs/bisect/bad)
>> Author: Guchun Chen <guchun.chen@amd.com>
>> Date:   Fri Aug 27 18:31:41 2021 +0800
>>
>>     drm/amdgpu: stop scheduler when calling hw_fini (v2)
>>
>>     This gurantees no more work on the ring can be submitted
>>     to hardware in suspend/resume case, otherwise a potential
>>     race will occur and the ring will get no chance to stay
>>     empty before suspend.
>>
>>     v2: Call drm_sched_resubmit_job before drm_sched_start to
>>     restart jobs from the pending list.
>>
>>     Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>     Suggested-by: Christian König <christian.koenig@amd.com>
>>     Signed-off-by: Guchun Chen <guchun.chen@amd.com>
>>     Reviewed-by: Christian König <christian.koenig@amd.com>
>>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>     Cc: stable@vger.kernel.org
>>
>> My bisection shows that the kernel just before this patch was integrated can
>> handle over 1,000 back-to-back "freeze" system suspend cycles.  Yet, with this
>> patch present, the system may crash before completing even 100 cycles, and
>> lasts a few hundred cycles at most.
>>
>> This crash is present in all following upstream rc's, including 5.15-rc4.
>>
>> When I revert this patch from 5.15-rc4, stability returns.
>>
>> Usually, the crash manifests as a black screen and a system that does not
>> respond to ping; it responds only to a long press of the power button to
>> remove power, followed by a cold reboot.
>>
>> I have witnessed the crash occur, and the "ubuntu color themed" screen enters
>> some sort of reverse video mode.  In this weird color mode, I've seen a text
>> window oscillate between scrolling and un-scrolling for a line -- sort of like
>> it is going back in time, but then changes its mind.  There is no response to
>> keyboard, mouse, or network input.
>>
> 
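
The '#regzbot' lines in the two mails above share a simple shape: the
marker, a command word (optionally with a leading '^' or a trailing ':'),
and an argument. As a rough illustration only, the following Python sketch
pulls such lines out of a mail body. It is not regzbot's actual parser; the
pattern, the function names, and the decision to also look inside quoted
text are all assumptions made for this example, and nothing here claims to
match how the real bot treats quoted commands.

```python
import re

# Hypothetical pattern: '#regzbot', a command word (optionally prefixed with
# '^' as in '^introduced'), an optional ':', then the rest of the line.
REGZBOT_LINE = re.compile(r"^#regzbot\s+(\^?[A-Za-z][\w-]*):?\s*(.*)$")


def strip_quote_prefix(line):
    """Drop leading '>' quote markers so quoted commands are found as well."""
    return re.sub(r"^(?:>\s?)+", "", line)


def extract_regzbot_commands(mail_body):
    """Return (command, argument) pairs for every '#regzbot' line."""
    commands = []
    for raw in mail_body.splitlines():
        match = REGZBOT_LINE.match(strip_quote_prefix(raw).strip())
        if match:
            commands.append((match.group(1), match.group(2).strip()))
    return commands


# Sample text copied from the mails in this thread.
sample = """\
#regzbot fixed-by: df5bc0aa7ff6e2e14cb75182b4eda20253c711d4
> #regzbot ^introduced f7d6779df642720e22bffd449e683bb8690bd3bf
> #regzbot title drm: amdgpu: NUC8i7HVKVA crashes during system suspend
> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215315
> #regzbot ignore-activity
"""

for command, argument in extract_regzbot_commands(sample):
    print(command, "->", argument)
```

A filter like this could be combined with a check for '#forregzbot' in the
subject to separate bot-directed mails from regular discussion.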
