All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michel Dänzer" <michel.daenzer@mailbox.org>
To: "Christian König" <christian.koenig@amd.com>,
	"Alex Hung" <alex.hung@amd.com>,
	"Alex Deucher" <alexdeucher@gmail.com>
Cc: stylon.wang@amd.com, Sunpeng.Li@amd.com,
	Bhawanpreet.Lakha@amd.com, qingqing.zhuo@amd.com,
	Rodrigo.Siqueira@amd.com, roman.li@amd.com,
	amd-gfx@lists.freedesktop.org, solomon.chiu@amd.com,
	Aurabindo Pillai <aurabindo.pillai@amd.com>,
	wayne.lin@amd.com, Harry.Wentland@amd.com,
	agustin.gutierrez@amd.com, pavle.kotarac@amd.com
Subject: Re: [PATCH 05/18] drm/amd/display: Use mdelay to avoid crashes
Date: Thu, 15 Dec 2022 11:29:42 +0100	[thread overview]
Message-ID: <e8801420-3212-702a-93dd-1b3f323bfd87@mailbox.org> (raw)
In-Reply-To: <e2337d02-594e-9c69-d8a4-b046fc9c221f@amd.com>

On 12/15/22 09:09, Christian König wrote:
> Am 15.12.22 um 00:33 schrieb Alex Hung:
>> On 2022-12-14 16:06, Alex Deucher wrote:
>>> On Wed, Dec 14, 2022 at 5:56 PM Alex Hung <alex.hung@amd.com> wrote:
>>>> On 2022-12-14 15:35, Alex Deucher wrote:
>>>>> On Wed, Dec 14, 2022 at 5:25 PM Alex Hung <alex.hung@amd.com> wrote:
>>>>>> On 2022-12-14 14:54, Alex Deucher wrote:
>>>>>>> On Wed, Dec 14, 2022 at 4:50 PM Alex Hung <alex.hung@amd.com> wrote:
>>>>>>>> On 2022-12-14 13:48, Alex Deucher wrote:
>>>>>>>>> On Wed, Dec 14, 2022 at 3:22 PM Aurabindo Pillai
>>>>>>>>> <aurabindo.pillai@amd.com> wrote:
>>>>>>>>>>
>>>>>>>>>> From: Alex Hung <alex.hung@amd.com>
>>>>>>>>>>
>>>>>>>>>> [Why]
>>>>>>>>>> When running IGT kms_bw test with DP monitor, some systems crash from
>>>>>>>>>> msleep no matter how long or short the time is.
>>>>>>>>>>
>>>>>>>>>> [How]
>>>>>>>>>> To replace msleep with mdelay.
>>>>>>>>>
>>>>>>>>> Can you provide a bit more info about the crash?  A lot of platforms
>>>>>>>>> don't support delay larger than 2-4ms so this change will generate
>>>>>>>>> errors on ARM and possibly other platforms.
>>>>>>>>>
>>>>>>>>> Alex
>>>>>>>>
>>>>>>>> The msleep was introduced in eec3303de3378 for non-compliant display
>>>>>>>> port monitors but IGT's kms_bw test can cause a recent h/w to hang at
>>>>>>>> msleep(60) when calling "igt_remove_fb" in IGT
>>>>>>>> (https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Figt-gpu-tools%2F-%2Fblob%2Fmaster%2Ftests%2Fkms_bw.c%23L197&amp;data=05%7C01%7Calex.hung%40amd.com%7C81664450189542a7bbc408dade27d0e9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638066559844526853%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=M%2BF4H2qddXItPoUZRVyhlc5N8UF1%2FHIh8PpfT%2BCmdZ4%3D&amp;reserved=0)
>>>>>>>>
>>>>>>>> It is possible to workaround this by reversing order of
>>>>>>>> igt_remove_fb(&buffer[i]), as the following example:
>>>>>>>>
>>>>>>>>       igt_create_color_fb with the order buffer[0], buffer[1], buffer[2]
>>>>>>>>
>>>>>>>> Hangs:
>>>>>>>>       igt_remove_fb with the order buffer[0], buffer[1], buffer[2]
>>>>>>>>
>>>>>>>> No hangs:
>>>>>>>>       igt_remove_fb with the reversed order buffer[2], buffer[1], buffer[0]
>>>>>>>>
>>>>>>>> However, IGT simply exposes the problem and it makes more sense to stop
>>>>>>>> the hang from occurring.
>>>>>>>>
>>>>>>>> I also tried to remove the msleep completely and it also work, but I
>>>>>>>> didn't want to break the fix for the original problematic hardware
>>>>>>>> configuration.
>>>>>>>
>>>>>>> Why does sleep vs delay make a difference?  Is there some race that we
>>>>>>> are not locking against?
>>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>
>>>>>> That was my original thought but I did not find any previously. I will
>>>>>> investigate it again.
>>>>>>
>>>>>> If mdelay(>4) isn't usable on other platforms, is it an option to use
>>>>>> mdelay on x86_64 only and keep msleep on other platforms or just remove
>>>>>> the msleep for other platforms, something like
>>>>>>
>>>>>> -                       msleep(60);
>>>>>> +#ifdef CONFIG_X86_64
>>>>>> +                       mdelay(60);
>>>>>> +#endif
>>>>>
>>>>> That's pretty ugly.  I'd rather try and resolve the root cause.  How
>>>>> important is the IGT test?  What does it do?  Is the test itself
>>>>> correct?
>>>>>
>>>>
>>>> Agreed, and I didn't want to add conditions around the mdelay for the
>>>> same reason. I will assume this is not an option now.
>>>>
>>>> As in the previous comment, IGT can be modified to avoid the crash by
>>>> reversing the order fb is removed - though I suspect I will receive
>>>> questions why this is not fixed in kernel.
>>>>
>>>> I wanted to fix this in kernel because nothing stops other user-space
>>>> applications to use the same way to crash kernel, so fixing IGT is the
>>>> second option.
>>>>
>>>> Apparently causing problems on other platforms isn't an option at all so
>>>> I will try to figure out an non-mdelay solution, and then maybe an IGT
>>>> solution instead.
>>>
>>> What hangs?  The test or the kernel or the hardware?
>>
>> The system becomes completely unresponsive - no keyboard, mouse nor remote accesses.
> 
> I agree with Alex that changing this is extremely questionable and not justified at all.
> 
> My educated guess is that by using mdelay() instead of msleep() we keep the CPU core busy and preventing something from happening at the same time as something else.
> 
> This clearly points to missing locking or similar to protect concurrent execution of things.
Might another possibility be that this code gets called from an atomic context which can't sleep?


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer


  reply	other threads:[~2022-12-15 10:30 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-14 20:21 [PATCH 00/18] DC Patches for Dec 19, 2022 Aurabindo Pillai
2022-12-14 20:21 ` [PATCH 01/18] drm/amd/display: save restore hdcp state when display is unplugged from mst hub Aurabindo Pillai
2022-12-14 20:21 ` [PATCH 02/18] drm/amd/display: phase3 mst hdcp for multiple displays Aurabindo Pillai
2022-12-14 20:21 ` [PATCH 03/18] drm/amd/display: Uninitialized variables causing 4k60 UCLK to stay at DPM1 and not DPM0 Aurabindo Pillai
2022-12-14 20:21 ` [PATCH 04/18] drm/amd/display: Improvements in secure display Aurabindo Pillai
2022-12-14 20:21 ` [PATCH 05/18] drm/amd/display: Use mdelay to avoid crashes Aurabindo Pillai
2022-12-14 20:48   ` Alex Deucher
2022-12-14 21:50     ` Alex Hung
2022-12-14 21:54       ` Alex Deucher
2022-12-14 22:25         ` Alex Hung
2022-12-14 22:35           ` Alex Deucher
2022-12-14 22:55             ` Alex Hung
2022-12-14 23:06               ` Alex Deucher
2022-12-14 23:33                 ` Alex Hung
2022-12-15  8:09                   ` Christian König
2022-12-15 10:29                     ` Michel Dänzer [this message]
2022-12-15 15:17                       ` Harry Wentland
2022-12-15 19:38                         ` Aurabindo Pillai
2022-12-15 21:02                         ` Alex Hung
2022-12-16 19:47                           ` Harry Wentland
2022-12-14 20:21 ` [PATCH 06/18] drm/amd/display: Refactor ABM code flow Aurabindo Pillai
2022-12-14 20:21 ` [PATCH 07/18] drm/amd/display: Turn on phantom OTG before disabling phantom pipe Aurabindo Pillai
2022-12-14 20:21 ` [PATCH 08/18] drm/amd/display: patch cases with unknown plane state to prevent warning Aurabindo Pillai
2022-12-14 20:21 ` [PATCH 09/18] drm/amd/display: Defer DIG FIFO disable after VID stream enable Aurabindo Pillai
2022-12-14 20:21 ` [PATCH 10/18] drm/amd/display: fix dc_get_edp_link_panel_inst to only consider links with panels Aurabindo Pillai
2022-12-14 20:21 ` [PATCH 11/18] drm/amd/display: move dccg programming from link hwss hpo dp to hwss Aurabindo Pillai
2022-12-14 20:21 ` [PATCH 12/18] drm/amd/display: update pixel rate div in enable stream Aurabindo Pillai
2022-12-14 20:21 ` [PATCH 13/18] drm/amd/display: allow hpo and dio encoder switching during dp retrain test Aurabindo Pillai
2022-12-14 20:21 ` [PATCH 14/18] drm/amd/display: Fix crash when setting ABM pipe/backlight Aurabindo Pillai
2022-12-14 20:21 ` [PATCH 15/18] drm/amd/display: set ignore msa parameter only if freesync is enabled Aurabindo Pillai
2022-12-14 20:21 ` [PATCH 16/18] drm/amd/display: Adding braces to prepare for future changes to behavior of if block Aurabindo Pillai
2022-12-14 20:21 ` [PATCH 17/18] drm/amd/display: Reorder dc_state fields to optimize clearing the struct Aurabindo Pillai
2022-12-14 20:21 ` [PATCH 18/18] drm/amd/display: 3.2.217 Aurabindo Pillai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e8801420-3212-702a-93dd-1b3f323bfd87@mailbox.org \
    --to=michel.daenzer@mailbox.org \
    --cc=Bhawanpreet.Lakha@amd.com \
    --cc=Harry.Wentland@amd.com \
    --cc=Rodrigo.Siqueira@amd.com \
    --cc=Sunpeng.Li@amd.com \
    --cc=agustin.gutierrez@amd.com \
    --cc=alex.hung@amd.com \
    --cc=alexdeucher@gmail.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=aurabindo.pillai@amd.com \
    --cc=christian.koenig@amd.com \
    --cc=pavle.kotarac@amd.com \
    --cc=qingqing.zhuo@amd.com \
    --cc=roman.li@amd.com \
    --cc=solomon.chiu@amd.com \
    --cc=stylon.wang@amd.com \
    --cc=wayne.lin@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.