From: "Deucher, Alexander" <Alexander.Deucher@amd.com>
To: "Michel Dänzer" <michel@daenzer.net>,
	"Alex Deucher" <alexdeucher@gmail.com>
Cc: xgqt <xgqt@riseup.net>, amd-gfx list <amd-gfx@lists.freedesktop.org>
Subject: Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"
Date: Mon, 28 Jun 2021 17:16:11 +0000
Message-ID: <BL1PR12MB514478C04EC9E42F39F9C8BDF7039@BL1PR12MB5144.namprd12.prod.outlook.com>
In-Reply-To: <c2b9b42d-55e1-fa5d-8e10-ea474fcd9221@daenzer.net>


[-- Attachment #1.1: Type: text/plain, Size: 6625 bytes --]

[Public]

Thanks for narrowing this down.  There is new PCO SDMA firmware available (attached).  Can you try it?

Thanks,

Alex
________________________________
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of Michel Dänzer <michel@daenzer.net>
Sent: Thursday, June 24, 2021 6:51 AM
To: Alex Deucher <alexdeucher@gmail.com>
Cc: xgqt <xgqt@riseup.net>; amd-gfx list <amd-gfx@lists.freedesktop.org>
Subject: Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

On 2021-06-04 3:08 p.m., Michel Dänzer wrote:
> On 2021-06-04 2:33 p.m., Alex Deucher wrote:
>> On Fri, Jun 4, 2021 at 3:47 AM Michel Dänzer <michel@daenzer.net> wrote:
>>>
>>> On 2021-05-19 3:57 p.m., Alex Deucher wrote:
>>>> On Wed, May 19, 2021 at 4:48 AM Michel Dänzer <michel@daenzer.net> wrote:
>>>>>
>>>>> On 2021-05-19 12:05 a.m., Alex Deucher wrote:
>>>>>> On Tue, May 18, 2021 at 10:11 AM Michel Dänzer <michel@daenzer.net> wrote:
>>>>>>>
>>>>>>> On 2021-05-17 11:33 a.m., xgqt wrote:
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> I run an AMD laptop, "81NC Lenovo IdeaPad S340-15API": AMD Ryzen 5 3500U with Radeon Vega 8 Graphics.
>>>>>>>> Recently some breakages started happening for me. About 1h after boot-up, while using a KDE desktop, the GUI would freeze. Sometimes it was still possible to move the mouse, but everything else was frozen. The screen might start blinking or go black.
>>>>>>>>
>>>>>>>> I'm not sure if this is my kernel, firmware or the hardware.
>>>>>>>> I don't understand dmesg, which is why I'm guessing, but I think it is the firmware, since this behavior started around 2021-05-15.
>>>>>>>> From my Portage logs I see that I updated my firmware on 2021-05-14 at 18:16:06.
>>>>>>>> So the breakages started with kernel 5.10.27 and FW 20210511.
>>>>>>>> After the breakage I switched to an older kernel, 5.4.97, and compiled 5.12.4. I didn't notice a breakage on 5.4.97, but the system only ran for ~40 minutes.
>>>>>>>> So I booted the newly compiled 5.12.4, which ran for ~1h before it broke.
>>>>>>>> After that I booted to 5.4.97 again and downgraded my FW.
>>>>>>>> While I'm writing this I'm booted to kernel: 5.12.4 with FW: 20210315.
>>>>>>>>
>>>>>>>> I also described my situation on the Gentoo bugzilla: https://bugs.gentoo.org/790566
>>>>>>>>
>>>>>>>> "dmesg.log", attached here, is from a time when the machine was running fine (as it is at the moment); "errors_sat_may_15_072825_pm_cest_2021.log" is a dmesg log from the time the system broke.
>>>>>>>>
>>>>>>>> Can I get any help with this? What are the next steps I should take? Any other files I should provide?
>>>>>>>
>>>>>>> I've hit similar hangs with a Lenovo ThinkPad E595 (Ryzen 7 3700U / Picasso / RAVEN 0x1002:0x15D8 0x17AA:0x5124 0xC1). I also suspect them to be firmware related. The hangs occurred with firmware from the AMD 20.50 release. I'm currently running with firmware from the 20.40 release, with no hang in almost 2 weeks (the hangs happened within 1-2 days after boot).
>>>>>>
>>>>>> Can you narrow down which firmware(s) cause the problem?
>>>>>
>>>>> I'll try, but note I'm not really sure yet my hangs were related to firmware (only). Anyway, I'll try narrowing it down.
>>>>
>>>> Thanks.  Does this patch help?
>>>> https://patchwork.freedesktop.org/patch/433701/
>>>
>>> Unfortunately not. After no hangs for two weeks with older firmware, I just got a hang again within a day with newer firmware and a kernel with this fix.
>>>
>>>
>>> I'll try and narrow down which firmware triggers it now. Does Picasso use the picasso_*.bin ones only, or others as well?
>>
>> The picasso ones and raven_dmcu.bin.
>
> Thanks. raven_dmcu.bin hasn't changed, so I'm trying to bisect the 8 Picasso ones which have changed:
>
> picasso_asd.bin
> picasso_ce.bin
> picasso_me.bin
> picasso_mec2.bin
> picasso_mec.bin
> picasso_pfp.bin
> picasso_sdma.bin
> picasso_vcn.bin

Things are pointing to picasso_sdma.bin. I'm currently running with only that one reverted to linux-firmware 20210315, and haven't got any hangs for a week.
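
For anyone repeating this, the bisection over the eight changed files can be sketched roughly as below. This is only an illustration, assuming a single file is responsible; `bisect_firmware` and the `hangs` callback are hypothetical names, and in practice each "test" means running the machine long enough to be reasonably sure, since the hang is intermittent.

```python
# The eight Picasso firmware files that changed between the two releases.
CHANGED = [
    "picasso_asd.bin", "picasso_ce.bin", "picasso_me.bin", "picasso_mec2.bin",
    "picasso_mec.bin", "picasso_pfp.bin", "picasso_sdma.bin", "picasso_vcn.bin",
]

def bisect_firmware(changed, hangs):
    """Return the single file in `changed` whose new version triggers a hang.

    `hangs(new_files)` is a hypothetical callback: it reports whether the
    system hangs when exactly `new_files` use the new firmware and every
    other file is reverted to the known-good release.
    """
    candidates = list(changed)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # If the first half alone reproduces the hang, the culprit is in it;
        # otherwise (single-culprit assumption) it is in the second half.
        candidates = half if hangs(half) else candidates[len(half):]
    return candidates[0]

if __name__ == "__main__":
    # Simulated test in which only the new SDMA firmware causes hangs.
    print(bisect_firmware(CHANGED, lambda new: "picasso_sdma.bin" in new))
```

With eight files this needs three test rounds. A false "no hang" result (a lucky run) sends the search into the wrong half, so each round has to run well past the usual time to failure.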

Note that I've previously gone for a week without a hang even with firmware which had hung before. So there's still a small chance that I'm just on another lucky run.

That said, Pierre-Eric has also homed in on raven_sdma.bin for similar hangs, and reverting to older firmware seems to have helped multiple people on bug reports.

So, I think it makes sense for you guys to start looking for what could be going wrong with the Picasso/Raven SDMA firmware from 20.50. One thing I noticed is that the SDMA firmware from 20.50 advertises the same feature version, but a *lower* firmware version than the one from 18.50. So it might be worth double-checking that there wasn't an accidental downgrade to some older version.
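
The version comparison described above could be automated along these lines. This is a sketch under assumptions: the line format is assumed to match what amdgpu prints in its debugfs firmware info file ("<name> feature version: N, firmware version: 0x..."), the function names are made up for illustration, and the sample values are invented, not read from real firmware.

```python
import re

# Assumed line format, e.g. "SDMA0 feature version: 41, firmware version: 0x000000a9".
LINE_RE = re.compile(
    r"^(?P<name>.+?) feature version: (?P<feature>\d+), "
    r"firmware version: (?P<fw>0x[0-9a-fA-F]+)"
)

def parse_firmware_info(text):
    """Map firmware block name -> (feature_version, firmware_version)."""
    versions = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            versions[m.group("name")] = (int(m.group("feature")), int(m.group("fw"), 16))
    return versions

def find_downgrades(old_text, new_text):
    """Names whose feature version is unchanged but whose firmware version dropped."""
    old = parse_firmware_info(old_text)
    new = parse_firmware_info(new_text)
    return sorted(
        name for name in old.keys() & new.keys()
        if old[name][0] == new[name][0] and new[name][1] < old[name][1]
    )

# Invented sample values for illustration only.
OLD_INFO = """SDMA0 feature version: 41, firmware version: 0x000000a9
ME feature version: 50, firmware version: 0x000000a3"""
NEW_INFO = """SDMA0 feature version: 41, firmware version: 0x00000028
ME feature version: 50, firmware version: 0x000000a5"""

if __name__ == "__main__":
    print(find_downgrades(OLD_INFO, NEW_INFO))
```

Capturing the firmware info once per firmware release and feeding both captures to `find_downgrades` would flag any block whose firmware version went backwards while its feature version stayed the same.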


--
Earthling Michel Dänzer               |               https://redhat.com/
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[-- Attachment #2: picasso_sdma.bin --]
[-- Type: application/octet-stream, Size: 17408 bytes --]
