From: "Grodzovsky, Andrey" <Andrey.Grodzovsky@amd.com>
To: "Deng, Emily" <Emily.Deng@amd.com>
Cc: "Deucher, Alexander" <Alexander.Deucher@amd.com>,
	"steven.price@arm.com" <steven.price@arm.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"Koenig, Christian" <Christian.Koenig@amd.com>
Subject: Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
Date: Tue, 26 Nov 2019 00:09:30 +0000	[thread overview]
Message-ID: <MWHPR12MB1453C6FC45A83482232CA3EDEA450@MWHPR12MB1453.namprd12.prod.outlook.com> (raw)
In-Reply-To: <MN2PR12MB2975C10E36FF996BD423CEBA8F4A0@MN2PR12MB2975.namprd12.prod.outlook.com>

Christian asked me to submit it to drm-misc instead of our drm-next to avoid later conflicts with Steven's patch, which he mentioned in this thread and which is not in drm-next yet.
Christian, Alex, once this is merged into drm-misc I guess we need to pull the latest changes from there into drm-next so the issue Emily reported can be avoided.

Andrey

________________________________________
From: Deng, Emily <Emily.Deng@amd.com>
Sent: 25 November 2019 16:44:36
To: Grodzovsky, Andrey
Cc: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian; steven.price@arm.com; Grodzovsky, Andrey
Subject: RE: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.

[AMD Official Use Only - Internal Distribution Only]

Hi Andrey,
    Seems you didn't submit this patch?

Best wishes
Emily Deng



>-----Original Message-----
>From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>Sent: Monday, November 25, 2019 12:51 PM
>Cc: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig,
>Christian <Christian.Koenig@amd.com>; Deng, Emily
><Emily.Deng@amd.com>; steven.price@arm.com; Grodzovsky, Andrey
><Andrey.Grodzovsky@amd.com>
>Subject: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
>
>Problem:
>Due to a race between drm_sched_cleanup_jobs in the sched thread and
>drm_sched_job_timedout in the timeout work there is a possibility that the bad
>job was already freed while still being accessed from the timeout thread.
>
>Fix:
>Instead of just peeking at the bad job in the mirror list, remove it from the
>list under lock and put it back later, once we are guaranteed that no race with
>the main sched thread is possible, which is after the thread is parked.
>
>v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.
>
>v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
>drm_sched_get_cleanup_job already has a lock there.
>
>v4: Fix comments to reflect the latest code in drm-misc.
>
>Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>Reviewed-by: Christian König <christian.koenig@amd.com>
>Tested-by: Emily Deng <Emily.Deng@amd.com>
>---
> drivers/gpu/drm/scheduler/sched_main.c | 27 +++++++++++++++++++++++++++
> 1 file changed, 27 insertions(+)
>
>diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>index 6774955..1bf9c40 100644
>--- a/drivers/gpu/drm/scheduler/sched_main.c
>+++ b/drivers/gpu/drm/scheduler/sched_main.c
>@@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct work_struct *work)
>       unsigned long flags;
>
>       sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>+
>+      /* Protects against concurrent deletion in drm_sched_get_cleanup_job */
>+      spin_lock_irqsave(&sched->job_list_lock, flags);
>       job = list_first_entry_or_null(&sched->ring_mirror_list,
>                                      struct drm_sched_job, node);
>
>       if (job) {
>+              /*
>+               * Remove the bad job so it cannot be freed by concurrent
>+               * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
>+               * is parked at which point it's safe.
>+               */
>+              list_del_init(&job->node);
>+              spin_unlock_irqrestore(&sched->job_list_lock, flags);
>+
>               job->sched->ops->timedout_job(job);
>
>               /*
>@@ -298,6 +309,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
>                       job->sched->ops->free_job(job);
>                       sched->free_guilty = false;
>               }
>+      } else {
>+              spin_unlock_irqrestore(&sched->job_list_lock, flags);
>       }
>
>       spin_lock_irqsave(&sched->job_list_lock, flags);
>@@ -370,6 +383,20 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>       kthread_park(sched->thread);
>
>       /*
>+       * Reinsert back the bad job here - now it's safe as
>+       * drm_sched_get_cleanup_job cannot race against us and release the
>+       * bad job at this point - we parked (waited for) any in progress
>+       * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
>+       * now until the scheduler thread is unparked.
>+       */
>+      if (bad && bad->sched == sched)
>+              /*
>+               * Add at the head of the queue to reflect it was the earliest
>+               * job extracted.
>+               */
>+              list_add(&bad->node, &sched->ring_mirror_list);
>+
>+      /*
>        * Iterate the job list from later to  earlier one and either deactive
>        * their HW callbacks or remove them from mirror list if they already
>        * signaled.
>--
>2.7.4
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
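
In short, the pattern the patch introduces is: the timeout handler unlinks the bad job from ring_mirror_list under job_list_lock before calling the driver's timedout_job callback, and drm_sched_stop() puts it back only after kthread_park() has guaranteed the scheduler thread can no longer run drm_sched_get_cleanup_job(). The condensed sketch below illustrates only that pattern; the helper names are made up for illustration, the free_guilty handling is omitted, and this is not the exact upstream code.

static void timedout_sketch(struct drm_gpu_scheduler *sched)
{
        struct drm_sched_job *job;
        unsigned long flags;

        /* Hold job_list_lock so a concurrent drm_sched_get_cleanup_job
         * cannot free the job while we look at it. */
        spin_lock_irqsave(&sched->job_list_lock, flags);
        job = list_first_entry_or_null(&sched->ring_mirror_list,
                                       struct drm_sched_job, node);
        if (job)
                /* Unlink the bad job; the cleanup path can no longer see it. */
                list_del_init(&job->node);
        spin_unlock_irqrestore(&sched->job_list_lock, flags);

        if (job)
                job->sched->ops->timedout_job(job);
}

static void stop_sketch(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
{
        /* After parking, the scheduler thread cannot enter
         * drm_sched_get_cleanup_job, so reinsertion is race free. */
        kthread_park(sched->thread);

        if (bad && bad->sched == sched)
                /* Add at the head: it was the earliest job extracted. */
                list_add(&bad->node, &sched->ring_mirror_list);
}

Reinserting at the head of ring_mirror_list keeps the oldest job first, matching the comment in the patch about the bad job being the earliest one extracted.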

Thread overview: 125+ messages
2019-11-25 20:51 [PATCH v4] drm/scheduler: Avoid accessing freed bad job Andrey Grodzovsky
2019-11-25 21:44 ` Deng, Emily
2019-11-26  0:09   ` Grodzovsky, Andrey [this message]
     [not found]     ` <MWHPR12MB1453C6FC45A83482232CA3EDEA450-Gy0DoCVfaSWZBIDmKHdw+wdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-11-26 15:36       ` Deucher, Alexander
2019-11-26 15:37 ` Andrey Grodzovsky
     [not found]   ` <b8b716a7-e235-38b2-ea6d-0a21881fa64e-5C7GfCeVMHo@public.gmane.org>
2019-11-27  0:41     ` Deng, Emily
     [not found]       ` <MN2PR12MB2975CA8858F21FDF325C33FE8F440-rweVpJHSKToFlvJWC7EAqwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-12-02 19:24         ` Deng, Emily
2019-12-03 19:10           ` Andrey Grodzovsky
2019-12-03 19:44             ` Deucher, Alexander
2019-12-03 19:57               ` Andrey Grodzovsky
2019-12-03 19:59                 ` Deucher, Alexander
2019-12-03 20:32                   ` Andrey Grodzovsky
2019-12-03 20:58                     ` Deng, Emily
2019-12-03 19:53             ` Deng, Emily
2020-02-05 18:24 ` Lucas Stach
2020-02-06 11:10   ` Lucas Stach
2020-02-06 11:49     ` Christian König
2020-02-06 14:49       ` Alex Deucher
2020-02-06 14:51         ` Christian König
2020-02-06 15:49           ` Andrey Grodzovsky
2020-02-10 16:55             ` Andrey Grodzovsky
2020-02-10 21:50               ` Luben Tuikov
2020-02-11 15:55                 ` Andrey Grodzovsky
2020-02-11 21:27                   ` Andrey Grodzovsky
2020-02-12  0:53                     ` Luben Tuikov
2020-02-12 16:33                       ` Andrey Grodzovsky
2020-07-21 11:03                         ` Lucas Stach
2020-07-21 13:36                           ` Andrey Grodzovsky
2020-07-21 13:39                             ` Christian König
2020-07-21 13:42                               ` Andrey Grodzovsky
2020-07-21 18:29                                 ` Luben Tuikov
2020-11-25  3:17                                   ` [PATCH 0/6] Allow to extend the timeout without jobs disappearing Luben Tuikov
2020-11-25  3:17                                     ` [PATCH 1/6] drm/scheduler: "node" --> "list" Luben Tuikov
2020-11-25  9:44                                       ` Christian König
2020-11-25  3:17                                     ` [PATCH 2/6] gpu/drm: ring_mirror_list --> pending_list Luben Tuikov
2020-11-25  9:47                                       ` Christian König
2020-11-25 16:42                                         ` Luben Tuikov
2020-11-25  3:17                                     ` [PATCH 3/6] drm/scheduler: Job timeout handler returns status Luben Tuikov
2020-11-25  4:41                                       ` kernel test robot
2020-11-25  9:50                                       ` Christian König
2020-11-25 16:48                                         ` Luben Tuikov
2020-11-25 11:04                                       ` Steven Price
2020-11-25 11:15                                         ` Lucas Stach
2020-11-25 11:22                                           ` Steven Price
2020-11-25 11:47                                             ` Lucas Stach
2020-11-25 12:41                                         ` Christian König
2020-11-26 15:06                                       ` Andrey Grodzovsky
2020-11-25  3:17                                     ` [PATCH 4/6] drm/scheduler: Essentialize the job done callback Luben Tuikov
2020-11-25  9:51                                       ` Christian König
2020-11-25  3:17                                     ` [PATCH 5/6] drm/amdgpu: Don't hardcode thread name length Luben Tuikov
2020-11-25  9:55                                       ` Christian König
2020-11-25 17:01                                         ` Luben Tuikov
2020-11-26  8:11                                           ` Christian König
2020-11-25  3:17                                     ` [PATCH 6/6] drm/sched: Make use of a "done" thread Luben Tuikov
2020-11-25 10:10                                       ` Christian König
2020-11-26  0:24                                         ` Luben Tuikov
2020-11-25 11:09                                       ` Steven Price
2020-11-26  0:30                                         ` Luben Tuikov
2020-02-07 15:26           ` [PATCH v4] drm/scheduler: Avoid accessing freed bad job Daniel Vetter
