From: Will Deacon <will.deacon@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: mingo@kernel.org, oleg@redhat.com, linux-kernel@vger.kernel.org,
	paulmck@linux.vnet.ibm.com, boqun.feng@gmail.com, corbet@lwn.net,
	mhocko@kernel.org, dhowells@redhat.com,
	torvalds@linux-foundation.org
Subject: Re: [PATCH 4/4] locking: Introduce smp_cond_acquire()
Date: Mon, 2 Nov 2015 17:42:01 +0000
Message-ID: <20151102174200.GJ29657@arm.com>
In-Reply-To: <20151102134941.005198372@infradead.org>

Hi Peter,

On Mon, Nov 02, 2015 at 02:29:05PM +0100, Peter Zijlstra wrote:
> Introduce smp_cond_acquire() which combines a control dependency and a
> read barrier to form acquire semantics.
> 
> This primitive has two benefits:
>  - it documents control dependencies,
>  - it's typically cheaper than using smp_load_acquire() in a loop.

I'm not sure that's necessarily true on arm64, where we have a native
load-acquire instruction, but not a READ -> READ barrier (smp_rmb()
orders prior loads against subsequent loads and stores for us).

Perhaps we could allow architectures to provide their own definition of
smp_cond_acquire in case they can implement it more efficiently?
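
Something like the usual weak-default pattern would do. Totally
untested sketch, keeping your generic fallback exactly as posted:

  #ifndef smp_cond_acquire
  #define smp_cond_acquire(cond)	do {		\
  	while (!(cond))				\
  		cpu_relax();			\
  	smp_read_barrier_depends(); /* ctrl */	\
  	smp_rmb(); /* ctrl + rmb := acquire */	\
  } while (0)
  #endif

An architecture could then #define smp_cond_acquire in its
asm/barrier.h ahead of this and implement it however is cheapest
locally.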

> Note that while smp_cond_acquire() has an explicit
> smp_read_barrier_depends() for Alpha, neither site it gets used in
> was actually buggy on Alpha for lack of it. The first uses
> smp_rmb(), which on Alpha is a full barrier too and therefore serves
> its purpose. The second had an explicit full barrier.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/compiler.h |   18 ++++++++++++++++++
>  kernel/sched/core.c      |    8 +-------
>  kernel/task_work.c       |    4 ++--
>  3 files changed, 21 insertions(+), 9 deletions(-)
> 
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -275,6 +275,24 @@ static __always_inline void __write_once
>  	__val; \
>  })
>  
> +/**
> + * smp_cond_acquire() - Spin wait for cond with ACQUIRE ordering
> + * @cond: boolean expression to wait for
> + *
> + * Equivalent to using smp_load_acquire() on the condition variable but employs
> + * the control dependency of the wait to reduce the barrier on many platforms.
> + *
> + * The control dependency provides a LOAD->STORE order, the additional RMB
> + * provides LOAD->LOAD order, together they provide LOAD->{LOAD,STORE} order,
> + * aka. ACQUIRE.
> + */
> +#define smp_cond_acquire(cond)	do {		\

I think the previous version that you posted/discussed had the actual
address of the variable being loaded passed in here too? That would be
useful for arm64, where we can wait-until-memory-location-has-changed
to save us re-evaluating cond prematurely.
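
Strawman (smp_cond_load_acquire and VAL are names I've just made up,
and this is untested): have the macro take the pointer and hand the
loaded value back to the condition expression:

  #define smp_cond_load_acquire(ptr, cond_expr) ({	\
  	typeof(*(ptr)) VAL;				\
  	for (;;) {					\
  		VAL = READ_ONCE(*(ptr));		\
  		if (cond_expr)				\
  			break;				\
  		cpu_relax();				\
  	}						\
  	smp_read_barrier_depends(); /* ctrl */		\
  	smp_rmb(); /* ctrl + rmb := acquire */		\
  	VAL;						\
  })

Callers would then write e.g. smp_cond_load_acquire(&p->on_cpu, !VAL),
and arm64 could replace the cpu_relax() spin with a load-exclusive +
wfe on the cacheline.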

> +	while (!(cond))				\
> +		cpu_relax();			\
> +	smp_read_barrier_depends(); /* ctrl */	\
> +	smp_rmb(); /* ctrl + rmb := acquire */	\

It's actually stronger than acquire, I think, because accesses before the
smp_cond_acquire cannot be moved across it.
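
A minimal example (x, y, r0 and r1 invented for illustration):

  r0 = READ_ONCE(x);		/* load before the wait loop */
  smp_cond_acquire(!p->on_cpu);
  r1 = READ_ONCE(y);

With a plain smp_load_acquire() of ->on_cpu, the r0 load could sink
past the acquire and be reordered with r1; the smp_rmb() here keeps it
ordered before every load after the macro as well.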

> +} while (0)
> +
>  #endif /* __KERNEL__ */
>  
>  #endif /* __ASSEMBLY__ */
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2111,19 +2111,13 @@ try_to_wake_up(struct task_struct *p, un
>  	/*
>  	 * If the owning (remote) cpu is still in the middle of schedule() with
>  	 * this task as prev, wait until its done referencing the task.
> -	 */
> -	while (p->on_cpu)
> -		cpu_relax();
> -	/*
> -	 * Combined with the control dependency above, we have an effective
> -	 * smp_load_acquire() without the need for full barriers.
>  	 *
>  	 * Pairs with the smp_store_release() in finish_lock_switch().
>  	 *
>  	 * This ensures that tasks getting woken will be fully ordered against
>  	 * their previous state and preserve Program Order.
>  	 */
> -	smp_rmb();
> +	smp_cond_acquire(!p->on_cpu);
>  
>  	p->sched_contributes_to_load = !!task_contributes_to_load(p);
>  	p->state = TASK_WAKING;
> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -102,13 +102,13 @@ void task_work_run(void)
>  
>  		if (!work)
>  			break;
> +
>  		/*
>  		 * Synchronize with task_work_cancel(). It can't remove
>  		 * the first entry == work, cmpxchg(task_works) should
>  		 * fail, but it can play with *work and other entries.
>  		 */
> -		raw_spin_unlock_wait(&task->pi_lock);
> -		smp_mb();
> +		smp_cond_acquire(!raw_spin_is_locked(&task->pi_lock));

Hmm, there's some sort of release equivalent in kernel/exit.c, but I
couldn't easily figure out whether we could do anything there. If we
could, we could kill raw_spin_unlock_wait :)

Will
