linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6] psi: fix race between psi_trigger_create/destroy
@ 2021-05-21  2:05 Huangzhaoyang
  2021-05-21  2:11 ` Suren Baghdasaryan
  2021-05-21 14:18 ` Johannes Weiner
  0 siblings, 2 replies; 6+ messages in thread
From: Huangzhaoyang @ 2021-05-21  2:05 UTC (permalink / raw)
  To: Johannes Weiner, Suren Baghdasaryan, Zhaoyang Huang, Ziwei Dai,
	Ke Wang, linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira

From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

Race detected between psi_trigger_destroy/create as shown below, which
cause panic by accessing invalid psi_system->poll_wait->wait_queue_entry
and psi_system->poll_timer->entry->next. Under this modification, the
race window is removed by initialising poll_wait and poll_timer in
group_init which are executed only once at beginning.

psi_trigger_destroy                      psi_trigger_create
mutex_lock(trigger_lock);
rcu_assign_pointer(poll_task, NULL);
mutex_unlock(trigger_lock);
					mutex_lock(trigger_lock);
					if (!rcu_access_pointer(group->poll_task)) {

						timer_setup(poll_timer, poll_timer_fn, 0);

						rcu_assign_pointer(poll_task, task);
					}
					mutex_unlock(trigger_lock);

synchronize_rcu();
del_timer_sync(poll_timer); <-- poll_timer has been reinitialized by
psi_trigger_create

So, trigger_lock/RCU correctly protects destruction of group->poll_task but
misses this race affecting poll_timer and poll_wait.

Fixes: 461daba06bdc ("psi: eliminate kthread_worker from psi trigger
scheduling mechanism")

Signed-off-by: ziwei.dai <ziwei.dai@unisoc.com>
Signed-off-by: ke.wang <ke.wang@unisoc.com>
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
v2: change del_timer_sync to del_timer in psi_trigger_destroy
v3: remove timer_setup within psi_tirgger_create
    protect del_timer by extending the critical section of mutex_lock
v4: amend fix information on comment
v5: delete the poll_timer while assigning the task to NULL
v6: update subject and comments as Suren's suggestion
---
---
 kernel/sched/psi.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index cc25a3c..58b36d1 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -182,6 +182,8 @@ struct psi_group psi_system = {
 
 static void psi_avgs_work(struct work_struct *work);
 
+static void poll_timer_fn(struct timer_list *t);
+
 static void group_init(struct psi_group *group)
 {
 	int cpu;
@@ -201,6 +203,8 @@ static void group_init(struct psi_group *group)
 	memset(group->polling_total, 0, sizeof(group->polling_total));
 	group->polling_next_update = ULLONG_MAX;
 	group->polling_until = 0;
+	init_waitqueue_head(&group->poll_wait);
+	timer_setup(&group->poll_timer, poll_timer_fn, 0);
 	rcu_assign_pointer(group->poll_task, NULL);
 }
 
@@ -1157,9 +1161,7 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
 			return ERR_CAST(task);
 		}
 		atomic_set(&group->poll_wakeup, 0);
-		init_waitqueue_head(&group->poll_wait);
 		wake_up_process(task);
-		timer_setup(&group->poll_timer, poll_timer_fn, 0);
 		rcu_assign_pointer(group->poll_task, task);
 	}
 
@@ -1211,6 +1213,7 @@ static void psi_trigger_destroy(struct kref *ref)
 					group->poll_task,
 					lockdep_is_held(&group->trigger_lock));
 			rcu_assign_pointer(group->poll_task, NULL);
+			del_timer(&group->poll_timer);
 		}
 	}
 
@@ -1223,17 +1226,14 @@ static void psi_trigger_destroy(struct kref *ref)
 	 */
 	synchronize_rcu();
 	/*
-	 * Destroy the kworker after releasing trigger_lock to prevent a
+	 * Stop kthread 'psimon' after releasing trigger_lock to prevent a
 	 * deadlock while waiting for psi_poll_work to acquire trigger_lock
 	 */
 	if (task_to_destroy) {
 		/*
 		 * After the RCU grace period has expired, the worker
 		 * can no longer be found through group->poll_task.
-		 * But it might have been already scheduled before
-		 * that - deschedule it cleanly before destroying it.
 		 */
-		del_timer_sync(&group->poll_timer);
 		kthread_stop(task_to_destroy);
 	}
 	kfree(t);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v6] psi: fix race between psi_trigger_create/destroy
  2021-05-21  2:05 [PATCH v6] psi: fix race between psi_trigger_create/destroy Huangzhaoyang
@ 2021-05-21  2:11 ` Suren Baghdasaryan
  2021-05-21 11:10   ` Peter Zijlstra
  2021-05-21 14:18 ` Johannes Weiner
  1 sibling, 1 reply; 6+ messages in thread
From: Suren Baghdasaryan @ 2021-05-21  2:11 UTC (permalink / raw)
  To: Huangzhaoyang
  Cc: Johannes Weiner, Zhaoyang Huang, Ziwei Dai, Ke Wang, LKML,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira

On Thu, May 20, 2021 at 7:07 PM Huangzhaoyang <huangzhaoyang@gmail.com> wrote:
>
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> Race detected between psi_trigger_destroy/create as shown below, which
> cause panic by accessing invalid psi_system->poll_wait->wait_queue_entry
> and psi_system->poll_timer->entry->next. Under this modification, the
> race window is removed by initialising poll_wait and poll_timer in
> group_init which are executed only once at beginning.
>
> psi_trigger_destroy                      psi_trigger_create
> mutex_lock(trigger_lock);
> rcu_assign_pointer(poll_task, NULL);
> mutex_unlock(trigger_lock);
>                                         mutex_lock(trigger_lock);
>                                         if (!rcu_access_pointer(group->poll_task)) {
>
>                                                 timer_setup(poll_timer, poll_timer_fn, 0);
>
>                                                 rcu_assign_pointer(poll_task, task);
>                                         }
>                                         mutex_unlock(trigger_lock);
>
> synchronize_rcu();
> del_timer_sync(poll_timer); <-- poll_timer has been reinitialized by
> psi_trigger_create
>
> So, trigger_lock/RCU correctly protects destruction of group->poll_task but
> misses this race affecting poll_timer and poll_wait.
>
> Fixes: 461daba06bdc ("psi: eliminate kthread_worker from psi trigger
> scheduling mechanism")
>
> Signed-off-by: ziwei.dai <ziwei.dai@unisoc.com>
> Signed-off-by: ke.wang <ke.wang@unisoc.com>
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

Looks good. Thanks!
Reviewed-by: Suren Baghdasaryan <surenb@google.com>

> ---
> v2: change del_timer_sync to del_timer in psi_trigger_destroy
> v3: remove timer_setup within psi_tirgger_create
>     protect del_timer by extending the critical section of mutex_lock
> v4: amend fix information on comment
> v5: delete the poll_timer while assigning the task to NULL
> v6: update subject and comments as Suren's suggestion
> ---
> ---
>  kernel/sched/psi.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> index cc25a3c..58b36d1 100644
> --- a/kernel/sched/psi.c
> +++ b/kernel/sched/psi.c
> @@ -182,6 +182,8 @@ struct psi_group psi_system = {
>
>  static void psi_avgs_work(struct work_struct *work);
>
> +static void poll_timer_fn(struct timer_list *t);
> +
>  static void group_init(struct psi_group *group)
>  {
>         int cpu;
> @@ -201,6 +203,8 @@ static void group_init(struct psi_group *group)
>         memset(group->polling_total, 0, sizeof(group->polling_total));
>         group->polling_next_update = ULLONG_MAX;
>         group->polling_until = 0;
> +       init_waitqueue_head(&group->poll_wait);
> +       timer_setup(&group->poll_timer, poll_timer_fn, 0);
>         rcu_assign_pointer(group->poll_task, NULL);
>  }
>
> @@ -1157,9 +1161,7 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
>                         return ERR_CAST(task);
>                 }
>                 atomic_set(&group->poll_wakeup, 0);
> -               init_waitqueue_head(&group->poll_wait);
>                 wake_up_process(task);
> -               timer_setup(&group->poll_timer, poll_timer_fn, 0);
>                 rcu_assign_pointer(group->poll_task, task);
>         }
>
> @@ -1211,6 +1213,7 @@ static void psi_trigger_destroy(struct kref *ref)
>                                         group->poll_task,
>                                         lockdep_is_held(&group->trigger_lock));
>                         rcu_assign_pointer(group->poll_task, NULL);
> +                       del_timer(&group->poll_timer);
>                 }
>         }
>
> @@ -1223,17 +1226,14 @@ static void psi_trigger_destroy(struct kref *ref)
>          */
>         synchronize_rcu();
>         /*
> -        * Destroy the kworker after releasing trigger_lock to prevent a
> +        * Stop kthread 'psimon' after releasing trigger_lock to prevent a
>          * deadlock while waiting for psi_poll_work to acquire trigger_lock
>          */
>         if (task_to_destroy) {
>                 /*
>                  * After the RCU grace period has expired, the worker
>                  * can no longer be found through group->poll_task.
> -                * But it might have been already scheduled before
> -                * that - deschedule it cleanly before destroying it.
>                  */
> -               del_timer_sync(&group->poll_timer);
>                 kthread_stop(task_to_destroy);
>         }
>         kfree(t);
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v6] psi: fix race between psi_trigger_create/destroy
  2021-05-21  2:11 ` Suren Baghdasaryan
@ 2021-05-21 11:10   ` Peter Zijlstra
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Zijlstra @ 2021-05-21 11:10 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Huangzhaoyang, Johannes Weiner, Zhaoyang Huang, Ziwei Dai,
	Ke Wang, LKML, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira

On Thu, May 20, 2021 at 07:11:08PM -0700, Suren Baghdasaryan wrote:
> On Thu, May 20, 2021 at 7:07 PM Huangzhaoyang <huangzhaoyang@gmail.com> wrote:
> >
> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >
> > Race detected between psi_trigger_destroy/create as shown below, which
> > cause panic by accessing invalid psi_system->poll_wait->wait_queue_entry
> > and psi_system->poll_timer->entry->next. Under this modification, the
> > race window is removed by initialising poll_wait and poll_timer in
> > group_init which are executed only once at beginning.
> >
> > psi_trigger_destroy                      psi_trigger_create
> > mutex_lock(trigger_lock);
> > rcu_assign_pointer(poll_task, NULL);
> > mutex_unlock(trigger_lock);
> >                                         mutex_lock(trigger_lock);
> >                                         if (!rcu_access_pointer(group->poll_task)) {
> >
> >                                                 timer_setup(poll_timer, poll_timer_fn, 0);
> >
> >                                                 rcu_assign_pointer(poll_task, task);
> >                                         }
> >                                         mutex_unlock(trigger_lock);
> >
> > synchronize_rcu();
> > del_timer_sync(poll_timer); <-- poll_timer has been reinitialized by
> > psi_trigger_create
> >
> > So, trigger_lock/RCU correctly protects destruction of group->poll_task but
> > misses this race affecting poll_timer and poll_wait.
> >
> > Fixes: 461daba06bdc ("psi: eliminate kthread_worker from psi trigger
> > scheduling mechanism")
> >
> > Signed-off-by: ziwei.dai <ziwei.dai@unisoc.com>
> > Signed-off-by: ke.wang <ke.wang@unisoc.com>
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

This is not a valid SoB chain though; please (re)read Documentation/process/submitting-patches.rst.

> Looks good. Thanks!
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v6] psi: fix race between psi_trigger_create/destroy
  2021-05-21  2:05 [PATCH v6] psi: fix race between psi_trigger_create/destroy Huangzhaoyang
  2021-05-21  2:11 ` Suren Baghdasaryan
@ 2021-05-21 14:18 ` Johannes Weiner
  2021-06-10  7:31   ` Peter Zijlstra
  1 sibling, 1 reply; 6+ messages in thread
From: Johannes Weiner @ 2021-05-21 14:18 UTC (permalink / raw)
  To: Huangzhaoyang
  Cc: Suren Baghdasaryan, Zhaoyang Huang, Ziwei Dai, Ke Wang,
	linux-kernel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira

On Fri, May 21, 2021 at 10:05:54AM +0800, Huangzhaoyang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> 
> Race detected between psi_trigger_destroy/create as shown below, which
> cause panic by accessing invalid psi_system->poll_wait->wait_queue_entry
> and psi_system->poll_timer->entry->next. Under this modification, the
> race window is removed by initialising poll_wait and poll_timer in
> group_init which are executed only once at beginning.
> 
> psi_trigger_destroy                      psi_trigger_create
> mutex_lock(trigger_lock);
> rcu_assign_pointer(poll_task, NULL);
> mutex_unlock(trigger_lock);
> 					mutex_lock(trigger_lock);
> 					if (!rcu_access_pointer(group->poll_task)) {
> 
> 						timer_setup(poll_timer, poll_timer_fn, 0);
> 
> 						rcu_assign_pointer(poll_task, task);
> 					}
> 					mutex_unlock(trigger_lock);
> 
> synchronize_rcu();
> del_timer_sync(poll_timer); <-- poll_timer has been reinitialized by
> psi_trigger_create
> 
> So, trigger_lock/RCU correctly protects destruction of group->poll_task but
> misses this race affecting poll_timer and poll_wait.
> 
> Fixes: 461daba06bdc ("psi: eliminate kthread_worker from psi trigger
> scheduling mechanism")
> 
> Signed-off-by: ziwei.dai <ziwei.dai@unisoc.com>
> Signed-off-by: ke.wang <ke.wang@unisoc.com>
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v6] psi: fix race between psi_trigger_create/destroy
  2021-05-21 14:18 ` Johannes Weiner
@ 2021-06-10  7:31   ` Peter Zijlstra
  2021-06-10  7:32     ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2021-06-10  7:31 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Huangzhaoyang, Suren Baghdasaryan, Zhaoyang Huang, Ziwei Dai,
	Ke Wang, linux-kernel, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira

On Fri, May 21, 2021 at 10:18:53AM -0400, Johannes Weiner wrote:
> On Fri, May 21, 2021 at 10:05:54AM +0800, Huangzhaoyang wrote:
> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > 
> > Race detected between psi_trigger_destroy/create as shown below, which
> > cause panic by accessing invalid psi_system->poll_wait->wait_queue_entry
> > and psi_system->poll_timer->entry->next. Under this modification, the
> > race window is removed by initialising poll_wait and poll_timer in
> > group_init which are executed only once at beginning.
> > 
> > psi_trigger_destroy                      psi_trigger_create
> > mutex_lock(trigger_lock);
> > rcu_assign_pointer(poll_task, NULL);
> > mutex_unlock(trigger_lock);
> > 					mutex_lock(trigger_lock);
> > 					if (!rcu_access_pointer(group->poll_task)) {
> > 
> > 						timer_setup(poll_timer, poll_timer_fn, 0);
> > 
> > 						rcu_assign_pointer(poll_task, task);
> > 					}
> > 					mutex_unlock(trigger_lock);
> > 
> > synchronize_rcu();
> > del_timer_sync(poll_timer); <-- poll_timer has been reinitialized by
> > psi_trigger_create
> > 
> > So, trigger_lock/RCU correctly protects destruction of group->poll_task but
> > misses this race affecting poll_timer and poll_wait.
> > 
> > Fixes: 461daba06bdc ("psi: eliminate kthread_worker from psi trigger
> > scheduling mechanism")
> > 
> > Signed-off-by: ziwei.dai <ziwei.dai@unisoc.com>
> > Signed-off-by: ke.wang <ke.wang@unisoc.com>
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> 
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Thanks!

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v6] psi: fix race between psi_trigger_create/destroy
  2021-06-10  7:31   ` Peter Zijlstra
@ 2021-06-10  7:32     ` Peter Zijlstra
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Zijlstra @ 2021-06-10  7:32 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Huangzhaoyang, Suren Baghdasaryan, Zhaoyang Huang, Ziwei Dai,
	Ke Wang, linux-kernel, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira

On Thu, Jun 10, 2021 at 09:31:33AM +0200, Peter Zijlstra wrote:
> On Fri, May 21, 2021 at 10:18:53AM -0400, Johannes Weiner wrote:
> > On Fri, May 21, 2021 at 10:05:54AM +0800, Huangzhaoyang wrote:
> > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > 
> > > Race detected between psi_trigger_destroy/create as shown below, which
> > > cause panic by accessing invalid psi_system->poll_wait->wait_queue_entry
> > > and psi_system->poll_timer->entry->next. Under this modification, the
> > > race window is removed by initialising poll_wait and poll_timer in
> > > group_init which are executed only once at beginning.
> > > 
> > > psi_trigger_destroy                      psi_trigger_create
> > > mutex_lock(trigger_lock);
> > > rcu_assign_pointer(poll_task, NULL);
> > > mutex_unlock(trigger_lock);
> > > 					mutex_lock(trigger_lock);
> > > 					if (!rcu_access_pointer(group->poll_task)) {
> > > 
> > > 						timer_setup(poll_timer, poll_timer_fn, 0);
> > > 
> > > 						rcu_assign_pointer(poll_task, task);
> > > 					}
> > > 					mutex_unlock(trigger_lock);
> > > 
> > > synchronize_rcu();
> > > del_timer_sync(poll_timer); <-- poll_timer has been reinitialized by
> > > psi_trigger_create
> > > 
> > > So, trigger_lock/RCU correctly protects destruction of group->poll_task but
> > > misses this race affecting poll_timer and poll_wait.
> > > 
> > > Fixes: 461daba06bdc ("psi: eliminate kthread_worker from psi trigger
> > > scheduling mechanism")
> > > 
> > > Signed-off-by: ziwei.dai <ziwei.dai@unisoc.com>
> > > Signed-off-by: ke.wang <ke.wang@unisoc.com>
> > > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

Argh noticed the SoB chain is invalid. Please fix.

> > 
> > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> Thanks!

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-06-10  7:33 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-21  2:05 [PATCH v6] psi: fix race between psi_trigger_create/destroy Huangzhaoyang
2021-05-21  2:11 ` Suren Baghdasaryan
2021-05-21 11:10   ` Peter Zijlstra
2021-05-21 14:18 ` Johannes Weiner
2021-06-10  7:31   ` Peter Zijlstra
2021-06-10  7:32     ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).