From: Christian Brauner <christian.brauner@ubuntu.com>
To: peterz@infradead.org
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Jiri Slaby <jirislaby@kernel.org>,
	christian@brauner.io, "Eric W. Biederman" <ebiederm@xmission.com>,
	Linux kernel mailing list <linux-kernel@vger.kernel.org>,
	Mel Gorman <mgorman@suse.de>,
	Dave Jones <davej@codemonkey.org.uk>
Subject: Re: [PATCH] sched: Fix race against ptrace_freeze_trace()
Date: Tue, 21 Jul 2020 16:29:22 +0200
Message-ID: <20200721142922.qyd4o44rqvy7kbxu@wittgenstein>
In-Reply-To: <20200721121308.GH43129@hirez.programming.kicks-ass.net>

On Tue, Jul 21, 2020 at 02:13:08PM +0200, peterz@infradead.org wrote:
> 
> There is apparently one site that violates the rule that only current
> and ttwu() will modify task->state, namely ptrace_{,un}freeze_traced()
> will change task->state for a remote task.
> 
> Oleg explains:
> 
>   "TASK_TRACED/TASK_STOPPED was always protected by siglock. In
> particular, ttwu(__TASK_TRACED) must be always called with siglock
> held. That is why ptrace_freeze_traced() assumes it can safely do
> s/TASK_TRACED/__TASK_TRACED/ under spin_lock(siglock)."
> 
> This breaks the ordering scheme introduced by commit:
> 
>   dbfb089d360b ("sched: Fix loadavg accounting race")
> 
> Specifically, the reload not matching no longer implies we don't have
> to block.
> 
> Simplify things by noting that what we need is a LOAD->STORE ordering
> and this can be provided by a control dependency.
> 
> So replace:
> 
> 	prev_state = prev->state;
> 	raw_spin_lock(&rq->lock);
> 	smp_mb__after_spinlock(); /* SMP-MB */
> 	if (... && prev_state && prev_state == prev->state)
> 		deactivate_task();
> 
> with:
> 
> 	prev_state = prev->state;
> 	if (... && prev_state) /* CTRL-DEP */
> 		deactivate_task();
> 
> Since that already implies the 'prev->state' load must be complete
> before allowing the 'prev->on_rq = 0' store to become visible.
> 
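
To spell out why the old re-check breaks, here is a small single-threaded
model in user-space C. It is only a sketch: the MODEL_* values, struct
model_task and all model_*() helpers are made up for illustration and are
not the kernel's definitions, rq->lock is elided, and it shows only the
branch logic, not the memory ordering itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for task states; not the kernel's values. */
enum {
	MODEL_RUNNING = 0,
	MODEL_TRACED  = 1,	/* models TASK_TRACED */
	MODEL_FROZEN  = 2,	/* models __TASK_TRACED after a freeze */
};

struct model_task {
	volatile long state;
	bool on_rq;
};

/* Models a remote tracer rewriting ->state, as ptrace_freeze_traced() does. */
static void model_remote_freeze(struct model_task *p)
{
	if (p->state == MODEL_TRACED)
		p->state = MODEL_FROZEN;
}

/* Old scheme: early snapshot, then compare it against a reload. */
static void model_schedule_old(struct model_task *p, bool preempt, bool race)
{
	long prev_state = p->state;

	if (race)
		model_remote_freeze(p);	/* lands between snapshot and reload */

	if (!preempt && prev_state && prev_state == p->state)
		p->on_rq = false;	/* stands in for deactivate_task() */
}

/* New scheme: one load feeds the branch that guards the store. */
static void model_schedule_new(struct model_task *p, bool preempt, bool race)
{
	long prev_state;

	if (race)
		model_remote_freeze(p);	/* may land before the single load */

	prev_state = p->state;
	if (!preempt && prev_state)
		p->on_rq = false;	/* stands in for deactivate_task() */
}

/* Run one scenario on a traced task and report whether it stayed queued. */
bool model_still_on_rq(bool use_new, bool race)
{
	struct model_task p = { .state = MODEL_TRACED, .on_rq = true };

	if (use_new)
		model_schedule_new(&p, false, race);
	else
		model_schedule_old(&p, false, race);
	return p.on_rq;
}

void model_selfcheck(void)
{
	assert(!model_still_on_rq(false, false));	/* no race: old scheme blocks */
	assert(model_still_on_rq(false, true));		/* race: old scheme fails to block */
	assert(!model_still_on_rq(true, true));		/* race: single load still blocks */
}
```

In this model the remote state change makes the old comparison fail, so
the task stays on the runqueue with a non-running state. With the single
load, any non-zero state leads to the deactivate step.
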
> Fixes: dbfb089d360b ("sched: Fix loadavg accounting race")
> Reported-by: Jiri Slaby <jirislaby@kernel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---

Thank you. I applied this on top of v5.8-rc6 and re-ran the strace-test
suite successfully. So at least

Tested-by: Christian Brauner <christian.brauner@ubuntu.com>

> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4193,9 +4193,6 @@ static void __sched notrace __schedule(b
>  	local_irq_disable();
>  	rcu_note_context_switch(preempt);
>  
> -	/* See deactivate_task() below. */
> -	prev_state = prev->state;
> -
>  	/*
>  	 * Make sure that signal_pending_state()->signal_pending() below
>  	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
> @@ -4219,11 +4216,16 @@ static void __sched notrace __schedule(b
>  	update_rq_clock(rq);
>  
>  	switch_count = &prev->nivcsw;
> +
>  	/*
> -	 * We must re-load prev->state in case ttwu_remote() changed it
> -	 * before we acquired rq->lock.
> +	 * We must load prev->state once (task_struct::state is volatile), such
> +	 * that:
> +	 *
> +	 *  - we form a control dependency vs deactivate_task() below.
> +	 *  - ptrace_{,un}freeze_traced() can change ->state underneath us.
>  	 */
> -	if (!preempt && prev_state && prev_state == prev->state) {
> +	prev_state = prev->state;
> +	if (!preempt && prev_state) {
>  		if (signal_pending_state(prev_state, prev)) {
>  			prev->state = TASK_RUNNING;
>  		} else {
> @@ -4237,10 +4239,12 @@ static void __sched notrace __schedule(b
>  
>  			/*
>  			 * __schedule()			ttwu()
> -			 *   prev_state = prev->state;	  if (READ_ONCE(p->on_rq) && ...)
> -			 *   LOCK rq->lock		    goto out;
> -			 *   smp_mb__after_spinlock();	  smp_acquire__after_ctrl_dep();
> -			 *   p->on_rq = 0;		  p->state = TASK_WAKING;
> +			 *   if (prev_state)		  if (p->on_rq && ...)
> +			 *     p->on_rq = 0;		    goto out;
> +			 *				  smp_acquire__after_ctrl_dep();
> +			 *				  p->state = TASK_WAKING
> +			 *
> +			 * Where __schedule() and ttwu() have matching control dependencies.
>  			 *
>  			 * After this, schedule() must not care about p->state any more.
>  			 */
