Hi Michel,

The problem with -ERESTARTSYS is that the same half-baked atomic state, carrying the modifications we made during the interrupted atomic check, is reused in the next retry and then fails the atomic check. What we expect is for the retry to start from the original atomic state. I am going to dig deeper and see whether, on the DRM side, we can go back to using the original atomic state in the retry.
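
To illustrate the failure mode, here is a minimal user-space sketch. It is not amdgpu_dm or DRM code: struct toy_state, toy_check() and both retry helpers are hypothetical names made up for this example, and -EINTR stands in for the kernel-internal -ERESTARTSYS. The check modifies the state before it gets interrupted, so a retry on the same state sees the leftovers and is rejected, while a retry from an untouched copy passes.

```c
#include <errno.h>

/* Hypothetical stand-in for an atomic state; not a DRM structure. */
struct toy_state {
    int planes_enabled;
    int check_passes;   /* how many times a check has modified this state */
};

/*
 * Toy check: modifies the state first, then may get "interrupted".
 * A state that an earlier check already touched is rejected.
 */
static int toy_check(struct toy_state *s, int *interrupt_pending)
{
    if (s->check_passes != 0)
        return -EINVAL;         /* leftovers from an interrupted attempt */

    s->check_passes++;
    s->planes_enabled = 1;

    if (*interrupt_pending) {
        *interrupt_pending = 0;
        return -EINTR;          /* stand-in for -ERESTARTSYS */
    }
    return 0;
}

/* Retry on the same, already modified state: the second attempt fails. */
static int retry_reusing_state(struct toy_state *s)
{
    int pending = 1;
    int ret = toy_check(s, &pending);

    if (ret == -EINTR)
        ret = toy_check(s, &pending);
    return ret;
}

/* Retry from a fresh copy of the original state: the second attempt passes. */
static int retry_from_original(const struct toy_state *orig)
{
    int pending = 1;
    struct toy_state work = *orig;
    int ret = toy_check(&work, &pending);

    if (ret == -EINTR) {
        work = *orig;           /* discard the half-baked modifications */
        ret = toy_check(&work, &pending);
    }
    return ret;
}
```

With a zero-initialized state, retry_reusing_state() ends in -EINVAL while retry_from_original() returns 0, which is the difference between what we see today and what we expect.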


Regards

Stylon Wang

MTS Software Development Eng.  |  AMD
Display Solution Team

From: Michel Dänzer <michel@daenzer.net>
Sent: October 26, 2021 11:51 PM
To: Wang, Chao-kai (Stylon) <Stylon.Wang@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Wentland, Harry <Harry.Wentland@amd.com>; Siqueira, Rodrigo <Rodrigo.Siqueira@amd.com>; contact@emersion.fr <contact@emersion.fr>; Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>
Subject: Re: [PATCH] drm/amd/display: Fix error handling on waiting for completion
 
On 2021-10-26 13:07, Stylon Wang wrote:
> [Why]
> In GNOME Settings->Display the switching from mirror mode to single display
> occasionally causes wait_for_completion_interruptible_timeout() to return
> -ERESTARTSYS and fails atomic check.
>
> [How]
> Replace the call with wait_for_completion_timeout() since the waiting for
> hw_done and flip_done completion doesn't need to worry about interruption
> from signal.
>
> Signed-off-by: Stylon Wang <stylon.wang@amd.com>
> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 4cd64529b180..b8f4ff323de1 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -9844,10 +9844,10 @@ static int do_aquire_global_lock(struct drm_device *dev,
>                 * Make sure all pending HW programming completed and
>                 * page flips done
>                 */
> -             ret = wait_for_completion_interruptible_timeout(&commit->hw_done, 10*HZ);
> +             ret = wait_for_completion_timeout(&commit->hw_done, 10*HZ);
>
>                if (ret > 0)
> -                     ret = wait_for_completion_interruptible_timeout(
> +                     ret = wait_for_completion_timeout(
>                                        &commit->flip_done, 10*HZ);
>
>                if (ret == 0)
>

The *_interruptible_* variant is needed so that the display manager process can be killed while it's waiting here, which could take up to 10 seconds (per the timeout).

What's the problem with -ERESTARTSYS? Either the ioctl should be restarted automatically, or, if it bounces back to user space, user space needs to retry the ioctl for as long as it returns -1 with errno == EINTR. drmIoctl handles this transparently.
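
For reference, the user-space retry described above can be sketched as follows. This is a simplified illustration, not libdrm source: retry_while_interrupted(), call_fn and fake_ioctl() are hypothetical names, and the callback stands in for a real ioctl(fd, request, arg) call so the example runs without a DRM device. As far as I know, drmIoctl loops in the same way and also retries on EAGAIN.

```c
#include <errno.h>

/* Hypothetical callback type standing in for ioctl(fd, request, arg). */
typedef int (*call_fn)(void *arg);

/* Repeat the call for as long as it fails with EINTR (or EAGAIN). */
static int retry_while_interrupted(call_fn fn, void *arg)
{
    int ret;

    do {
        ret = fn(arg);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    return ret;
}

/*
 * Fake call for demonstration: fails with EINTR until the counter
 * passed in arg reaches zero, then succeeds.
 */
static int fake_ioctl(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0) {
        (*remaining)--;
        errno = EINTR;
        return -1;
    }
    return 0;
}
```

Calling retry_while_interrupted(fake_ioctl, &n) with n set to 2 makes three attempts in total and returns 0, so the caller never sees the interruptions.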


--
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer