* [PATCH tip/core/rcu 0/9] Remove spin_unlock_wait()
@ 2017-07-24 22:12 Paul E. McKenney
  2017-07-24 22:13 ` [PATCH tip/core/rcu 1/9] net/netfilter/nf_conntrack_core: Fix net_conntrack_lock() Paul E. McKenney
                   ` (9 more replies)
  0 siblings, 10 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-24 22:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg

Hello!

This series removes spin_unlock_wait():

1.	Remove spin_unlock_wait() from net/netfilter/nf_conntrack_core,
	courtesy of Manfred Spraul.

2.	Replace task_work use of spin_unlock_wait() with lock/unlock
	pair, courtesy of Oleg Nesterov.

3.	Replace scheduler use of spin_unlock_wait() with lock/unlock pair.

4.	Replace completion use of spin_unlock_wait() with lock/unlock
	pair.

5.	Replace exit() use of spin_unlock_wait() with lock/unlock pair.

6.	Replace IPC use of spin_unlock_wait() with lock/unlock pair.

7.	Replace ATA use of spin_unlock_wait() with lock/unlock pair.

8.	Remove generic definition of spin_unlock_wait().

9.	Remove architecture-specific definitions of spin_unlock_wait().
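
Every conversion in the series follows the same basic pattern.  The
sketch below is an editorial illustration only ("foo" and its ->lock
are placeholders, not code from any particular patch): the empty
critical section interposes between the caller and the lock's critical
sections, with full acquire/release ordering.

        static void wait_for_current_holder(struct foo *f)
        {
                /* Old: spin_unlock_wait(&f->lock); */

                /* New: acquire and immediately release the lock. */
                spin_lock(&f->lock);
                spin_unlock(&f->lock);
        }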

							Thanx, Paul

------------------------------------------------------------------------

 arch/alpha/include/asm/spinlock.h    |    5 -
 arch/arc/include/asm/spinlock.h      |    5 -
 arch/arm/include/asm/spinlock.h      |   16 ----
 arch/arm64/include/asm/spinlock.h    |   58 +----------------
 arch/blackfin/include/asm/spinlock.h |    5 -
 arch/hexagon/include/asm/spinlock.h  |    5 -
 arch/ia64/include/asm/spinlock.h     |   21 ------
 arch/m32r/include/asm/spinlock.h     |    5 -
 arch/metag/include/asm/spinlock.h    |    5 -
 arch/mn10300/include/asm/spinlock.h  |    5 -
 arch/parisc/include/asm/spinlock.h   |    7 --
 arch/powerpc/include/asm/spinlock.h  |   33 ---------
 arch/s390/include/asm/spinlock.h     |    7 --
 arch/sh/include/asm/spinlock-cas.h   |    5 -
 arch/sh/include/asm/spinlock-llsc.h  |    5 -
 arch/sparc/include/asm/spinlock_32.h |    5 -
 arch/tile/include/asm/spinlock_32.h  |    2 
 arch/tile/include/asm/spinlock_64.h  |    2 
 arch/tile/lib/spinlock_32.c          |   23 ------
 arch/tile/lib/spinlock_64.c          |   22 ------
 arch/xtensa/include/asm/spinlock.h   |    5 -
 drivers/ata/libata-eh.c              |    8 --
 include/asm-generic/qspinlock.h      |   14 ----
 include/linux/spinlock.h             |   31 ---------
 include/linux/spinlock_up.h          |    6 -
 ipc/sem.c                            |    3 
 kernel/exit.c                        |    3 
 kernel/locking/qspinlock.c           |  117 -----------------------------------
 kernel/sched/completion.c            |    9 --
 kernel/sched/core.c                  |    5 -
 kernel/task_work.c                   |    8 --
 net/netfilter/nf_conntrack_core.c    |   52 ++++++++-------
 32 files changed, 48 insertions(+), 454 deletions(-)


* [PATCH tip/core/rcu 1/9] net/netfilter/nf_conntrack_core: Fix net_conntrack_lock()
  2017-07-24 22:12 [PATCH tip/core/rcu 0/9] Remove spin_unlock_wait() Paul E. McKenney
@ 2017-07-24 22:13 ` Paul E. McKenney
  2017-07-24 22:13 ` [PATCH tip/core/rcu 2/9] task_work: Replace spin_unlock_wait() with lock/unlock pair Paul E. McKenney
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-24 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Manfred Spraul, stable, Alan Stern, Sasha Levin,
	Pablo Neira Ayuso, netfilter-devel, Paul E. McKenney

From: Manfred Spraul <manfred@colorfullife.com>

As we want to remove spin_unlock_wait() and replace it with explicit
spin_lock()/spin_unlock() calls, we can use this conversion to simplify
the locking.

In addition:
- Reading nf_conntrack_locks_all needs ACQUIRE memory ordering.
- The new code avoids the backwards loop.

Only lightly tested; I did not manage to trigger calls to
nf_conntrack_all_lock().

V2: With improved comments, to clearly show how the barriers
    pair.
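
    Condensed from the hunks below, the pairing is:

        /* nf_conntrack_lock() fast path (ACQUIRE side): */
        spin_lock(lock);
        if (likely(smp_load_acquire(&nf_conntrack_locks_all) == false))
                return; /* the global lock is not held, we are done */

        /* nf_conntrack_all_unlock() (RELEASE side it pairs with): */
        smp_store_release(&nf_conntrack_locks_all, false);
        spin_unlock(&nf_conntrack_locks_all_lock);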

Fixes: b16c29191dc8 ("netfilter: nf_conntrack: use safer way to lock all buckets")
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 net/netfilter/nf_conntrack_core.c | 52 ++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 9979f46c81dc..51390febd5e3 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -96,19 +96,26 @@ static struct conntrack_gc_work conntrack_gc_work;
 
 void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
 {
+	/* 1) Acquire the lock */
 	spin_lock(lock);
-	while (unlikely(nf_conntrack_locks_all)) {
-		spin_unlock(lock);
 
-		/*
-		 * Order the 'nf_conntrack_locks_all' load vs. the
-		 * spin_unlock_wait() loads below, to ensure
-		 * that 'nf_conntrack_locks_all_lock' is indeed held:
-		 */
-		smp_rmb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
-		spin_unlock_wait(&nf_conntrack_locks_all_lock);
-		spin_lock(lock);
-	}
+	/* 2) read nf_conntrack_locks_all, with ACQUIRE semantics
+	 * It pairs with the smp_store_release() in nf_conntrack_all_unlock()
+	 */
+	if (likely(smp_load_acquire(&nf_conntrack_locks_all) == false))
+		return;
+
+	/* fast path failed, unlock */
+	spin_unlock(lock);
+
+	/* Slow path 1) get global lock */
+	spin_lock(&nf_conntrack_locks_all_lock);
+
+	/* Slow path 2) get the lock we want */
+	spin_lock(lock);
+
+	/* Slow path 3) release the global lock */
+	spin_unlock(&nf_conntrack_locks_all_lock);
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_lock);
 
@@ -149,28 +156,27 @@ static void nf_conntrack_all_lock(void)
 	int i;
 
 	spin_lock(&nf_conntrack_locks_all_lock);
-	nf_conntrack_locks_all = true;
 
-	/*
-	 * Order the above store of 'nf_conntrack_locks_all' against
-	 * the spin_unlock_wait() loads below, such that if
-	 * nf_conntrack_lock() observes 'nf_conntrack_locks_all'
-	 * we must observe nf_conntrack_locks[] held:
-	 */
-	smp_mb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
+	nf_conntrack_locks_all = true;
 
 	for (i = 0; i < CONNTRACK_LOCKS; i++) {
-		spin_unlock_wait(&nf_conntrack_locks[i]);
+		spin_lock(&nf_conntrack_locks[i]);
+
+		/* This spin_unlock provides the "release" to ensure that
+		 * nf_conntrack_locks_all==true is visible to everyone that
+		 * acquired spin_lock(&nf_conntrack_locks[]).
+		 */
+		spin_unlock(&nf_conntrack_locks[i]);
 	}
 }
 
 static void nf_conntrack_all_unlock(void)
 {
-	/*
-	 * All prior stores must be complete before we clear
+	/* All prior stores must be complete before we clear
 	 * 'nf_conntrack_locks_all'. Otherwise nf_conntrack_lock()
 	 * might observe the false value but not the entire
-	 * critical section:
+	 * critical section.
+	 * It pairs with the smp_load_acquire() in nf_conntrack_lock()
 	 */
 	smp_store_release(&nf_conntrack_locks_all, false);
 	spin_unlock(&nf_conntrack_locks_all_lock);
-- 
2.5.2


* [PATCH tip/core/rcu 2/9] task_work: Replace spin_unlock_wait() with lock/unlock pair
  2017-07-24 22:12 [PATCH tip/core/rcu 0/9] Remove spin_unlock_wait() Paul E. McKenney
  2017-07-24 22:13 ` [PATCH tip/core/rcu 1/9] net/netfilter/nf_conntrack_core: Fix net_conntrack_lock() Paul E. McKenney
@ 2017-07-24 22:13 ` Paul E. McKenney
  2017-07-24 22:13 ` [PATCH tip/core/rcu 3/9] sched: " Paul E. McKenney
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-24 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney

From: Oleg Nesterov <oleg@redhat.com>

There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore replaces the spin_unlock_wait() call in
task_work_run() with a spin_lock_irq() and a spin_unlock_irq() around
the cmpxchg() dequeue loop.  This should be safe from a performance
perspective because ->pi_lock is local to the task and because calls to
the other side of the race, task_work_cancel(), should be rare.
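
As an editorial sketch of why the pair suffices: task_work_cancel()
does its removal entirely under ->pi_lock (shown here only in outline),
so the lock/unlock pair around the dequeue loop cannot overlap a
cancellation that is still playing with the entries being dequeued.

        /* task_work_cancel(), in outline: */
        raw_spin_lock_irqsave(&task->pi_lock, flags);
        /* ... unlink the matching callback from task->task_works ... */
        raw_spin_unlock_irqrestore(&task->pi_lock, flags);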

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/task_work.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/kernel/task_work.c b/kernel/task_work.c
index d513051fcca2..836a72a66fba 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -96,20 +96,16 @@ void task_work_run(void)
 		 * work->func() can do task_work_add(), do not set
 		 * work_exited unless the list is empty.
 		 */
+		raw_spin_lock_irq(&task->pi_lock);
 		do {
 			work = READ_ONCE(task->task_works);
 			head = !work && (task->flags & PF_EXITING) ?
 				&work_exited : NULL;
 		} while (cmpxchg(&task->task_works, work, head) != work);
+		raw_spin_unlock_irq(&task->pi_lock);
 
 		if (!work)
 			break;
-		/*
-		 * Synchronize with task_work_cancel(). It can't remove
-		 * the first entry == work, cmpxchg(task_works) should
-		 * fail, but it can play with *work and other entries.
-		 */
-		raw_spin_unlock_wait(&task->pi_lock);
 
 		do {
 			next = work->next;
-- 
2.5.2


* [PATCH tip/core/rcu 3/9] sched: Replace spin_unlock_wait() with lock/unlock pair
  2017-07-24 22:12 [PATCH tip/core/rcu 0/9] Remove spin_unlock_wait() Paul E. McKenney
  2017-07-24 22:13 ` [PATCH tip/core/rcu 1/9] net/netfilter/nf_conntrack_core: Fix net_conntrack_lock() Paul E. McKenney
  2017-07-24 22:13 ` [PATCH tip/core/rcu 2/9] task_work: Replace spin_unlock_wait() with lock/unlock pair Paul E. McKenney
@ 2017-07-24 22:13 ` Paul E. McKenney
  2017-07-24 22:13 ` [PATCH tip/core/rcu 4/9] completion: " Paul E. McKenney
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-24 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney, Ingo Molnar, Will Deacon, Alan Stern,
	Andrea Parri, Linus Torvalds

There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore replaces the spin_unlock_wait() call in
do_task_dead() with spin_lock() followed immediately by spin_unlock().
This should be safe from a performance perspective because the lock is
this task's ->pi_lock, and this is called only after the task exits.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
[ paulmck: Replace leading smp_mb() with smp_mb__before_spinlock(),
  courtesy of Arnd Bergmann's noting its odd location. ]
---
 kernel/sched/core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 17c667b427b4..1179111d82a1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3352,8 +3352,9 @@ void __noreturn do_task_dead(void)
 	 * To avoid it, we have to wait for releasing tsk->pi_lock which
 	 * is held by try_to_wake_up()
 	 */
-	smp_mb();
-	raw_spin_unlock_wait(&current->pi_lock);
+	smp_mb__before_spinlock();
+	raw_spin_lock_irq(&current->pi_lock);
+	raw_spin_unlock_irq(&current->pi_lock);
 
 	/* Causes final put_task_struct in finish_task_switch(): */
 	__set_current_state(TASK_DEAD);
-- 
2.5.2


* [PATCH tip/core/rcu 4/9] completion: Replace spin_unlock_wait() with lock/unlock pair
  2017-07-24 22:12 [PATCH tip/core/rcu 0/9] Remove spin_unlock_wait() Paul E. McKenney
                   ` (2 preceding siblings ...)
  2017-07-24 22:13 ` [PATCH tip/core/rcu 3/9] sched: " Paul E. McKenney
@ 2017-07-24 22:13 ` Paul E. McKenney
  2017-08-15 16:16   ` [PATCH v5 " Paul E. McKenney
  2017-07-24 22:13 ` [PATCH tip/core/rcu 5/9] exit: " Paul E. McKenney
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-24 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney, Ingo Molnar, Will Deacon, Alan Stern,
	Andrea Parri, Linus Torvalds

There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore replaces the spin_unlock_wait() call in
completion_done() with spin_lock() followed immediately by spin_unlock().
This should be safe from a performance perspective because the lock
will be held only briefly: the wakeup happens really quickly.
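
Condensed, the function after this change reads as follows; the
initial ->done check is unchanged context that does not appear in the
hunk below.

        bool completion_done(struct completion *x)
        {
                if (!READ_ONCE(x->done))
                        return false;

                /* Wait for a still-running complete() to release
                 * ->wait.lock, so that the caller cannot free the
                 * completion while complete() is still referencing it.
                 */
                spin_lock_irq(&x->wait.lock);
                spin_unlock_irq(&x->wait.lock);
                return true;
        }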

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 kernel/sched/completion.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index 13fc5ae9bf2f..3feb218171a9 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -307,14 +307,9 @@ bool completion_done(struct completion *x)
 	 * If ->done, we need to wait for complete() to release ->wait.lock
 	 * otherwise we can end up freeing the completion before complete()
 	 * is done referencing it.
-	 *
-	 * The RMB pairs with complete()'s RELEASE of ->wait.lock and orders
-	 * the loads of ->done and ->wait.lock such that we cannot observe
-	 * the lock before complete() acquires it while observing the ->done
-	 * after it's acquired the lock.
 	 */
-	smp_rmb();
-	spin_unlock_wait(&x->wait.lock);
+	spin_lock_irq(&x->wait.lock);
+	spin_unlock_irq(&x->wait.lock);
 	return true;
 }
 EXPORT_SYMBOL(completion_done);
-- 
2.5.2


* [PATCH tip/core/rcu 5/9] exit: Replace spin_unlock_wait() with lock/unlock pair
  2017-07-24 22:12 [PATCH tip/core/rcu 0/9] Remove spin_unlock_wait() Paul E. McKenney
                   ` (3 preceding siblings ...)
  2017-07-24 22:13 ` [PATCH tip/core/rcu 4/9] completion: " Paul E. McKenney
@ 2017-07-24 22:13 ` Paul E. McKenney
  2017-07-24 22:13 ` [PATCH tip/core/rcu 6/9] ipc: " Paul E. McKenney
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-24 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney, Will Deacon, Alan Stern, Andrea Parri,
	Linus Torvalds

There is no agreed-upon definition of spin_unlock_wait()'s semantics, and
it appears that all callers could do just as well with a lock/unlock pair.
This commit therefore replaces the spin_unlock_wait() call in do_exit()
with spin_lock() followed immediately by spin_unlock().  This should be
safe from a performance perspective because the lock is a per-task lock,
and this is happening only at task-exit time.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 kernel/exit.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index c5548faa9f37..abfbcf66e5c0 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -819,7 +819,8 @@ void __noreturn do_exit(long code)
 	 * Ensure that we must observe the pi_state in exit_mm() ->
 	 * mm_release() -> exit_pi_state_list().
 	 */
-	raw_spin_unlock_wait(&tsk->pi_lock);
+	raw_spin_lock_irq(&tsk->pi_lock);
+	raw_spin_unlock_irq(&tsk->pi_lock);
 
 	if (unlikely(in_atomic())) {
 		pr_info("note: %s[%d] exited with preempt_count %d\n",
-- 
2.5.2


* [PATCH tip/core/rcu 6/9] ipc: Replace spin_unlock_wait() with lock/unlock pair
  2017-07-24 22:12 [PATCH tip/core/rcu 0/9] Remove spin_unlock_wait() Paul E. McKenney
                   ` (4 preceding siblings ...)
  2017-07-24 22:13 ` [PATCH tip/core/rcu 5/9] exit: " Paul E. McKenney
@ 2017-07-24 22:13 ` Paul E. McKenney
  2017-07-24 22:13   ` Paul E. McKenney
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-24 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney, Davidlohr Bueso, Will Deacon, Alan Stern,
	Andrea Parri, Linus Torvalds

There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore replaces the spin_unlock_wait() call in
exit_sem() with spin_lock() followed immediately by spin_unlock().
This should be safe from a performance perspective because exit_sem()
is rarely invoked in production.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
---
 ipc/sem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index 9e70cd7a17da..2570830e29fc 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -2091,7 +2091,8 @@ void exit_sem(struct task_struct *tsk)
 			 * possibility where we exit while freeary() didn't
 			 * finish unlocking sem_undo_list.
 			 */
-			spin_unlock_wait(&ulp->lock);
+			spin_lock(&ulp->lock);
+			spin_unlock(&ulp->lock);
 			rcu_read_unlock();
 			break;
 		}
-- 
2.5.2


* [PATCH tip/core/rcu 7/9] drivers/ata: Replace spin_unlock_wait() with lock/unlock pair
  2017-07-24 22:12 [PATCH tip/core/rcu 0/9] Remove spin_unlock_wait() Paul E. McKenney
@ 2017-07-24 22:13   ` Paul E. McKenney
  2017-07-24 22:13 ` [PATCH tip/core/rcu 2/9] task_work: Replace spin_unlock_wait() with lock/unlock pair Paul E. McKenney
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-24 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney, linux-ide, Will Deacon, Alan Stern,
	Andrea Parri, Linus Torvalds

There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore eliminates the spin_unlock_wait() call and
associated else-clause and hoists the then-clause's lock and unlock out of
the "if" statement.  This should be safe from a performance perspective
because according to Tejun there should be few if any drivers that don't
set their own error handler.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <linux-ide@vger.kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/ata/libata-eh.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index b70bcf6d2914..b325db27eb8c 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -645,12 +645,11 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
 	 * completions are honored.  A scmd is determined to have
 	 * timed out iff its associated qc is active and not failed.
 	 */
+	spin_lock_irqsave(ap->lock, flags);
 	if (ap->ops->error_handler) {
 		struct scsi_cmnd *scmd, *tmp;
 		int nr_timedout = 0;
 
-		spin_lock_irqsave(ap->lock, flags);
-
 		/* This must occur under the ap->lock as we don't want
 		   a polled recovery to race the real interrupt handler
 
@@ -700,12 +699,11 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
 		if (nr_timedout)
 			__ata_port_freeze(ap);
 
-		spin_unlock_irqrestore(ap->lock, flags);
 
 		/* initialize eh_tries */
 		ap->eh_tries = ATA_EH_MAX_TRIES;
-	} else
-		spin_unlock_wait(ap->lock);
+	}
+	spin_unlock_irqrestore(ap->lock, flags);
 
 }
 EXPORT_SYMBOL(ata_scsi_cmd_error_handler);
-- 
2.5.2


* [PATCH tip/core/rcu 8/9] locking: Remove spin_unlock_wait() generic definitions
  2017-07-24 22:12 [PATCH tip/core/rcu 0/9] Remove spin_unlock_wait() Paul E. McKenney
                   ` (6 preceding siblings ...)
  2017-07-24 22:13   ` Paul E. McKenney
@ 2017-07-24 22:13 ` Paul E. McKenney
  2017-07-24 22:13   ` Paul E. McKenney
  2017-07-31 22:57 ` [PATCH v2 tip/core/rcu 0/10] Remove spin_unlock_wait() Paul E. McKenney
  9 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-24 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney, Arnd Bergmann, Ingo Molnar, Will Deacon,
	Alan Stern, Andrea Parri, Linus Torvalds

There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore removes spin_unlock_wait() and related
definitions from core code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/asm-generic/qspinlock.h |  14 -----
 include/linux/spinlock.h        |  31 -----------
 include/linux/spinlock_up.h     |   6 ---
 kernel/locking/qspinlock.c      | 117 ----------------------------------------
 4 files changed, 168 deletions(-)

diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
index 9f0681bf1e87..66260777d644 100644
--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -22,17 +22,6 @@
 #include <asm-generic/qspinlock_types.h>
 
 /**
- * queued_spin_unlock_wait - wait until the _current_ lock holder releases the lock
- * @lock : Pointer to queued spinlock structure
- *
- * There is a very slight possibility of live-lock if the lockers keep coming
- * and the waiter is just unfortunate enough to not see any unlock state.
- */
-#ifndef queued_spin_unlock_wait
-extern void queued_spin_unlock_wait(struct qspinlock *lock);
-#endif
-
-/**
  * queued_spin_is_locked - is the spinlock locked?
  * @lock: Pointer to queued spinlock structure
  * Return: 1 if it is locked, 0 otherwise
@@ -41,8 +30,6 @@ extern void queued_spin_unlock_wait(struct qspinlock *lock);
 static __always_inline int queued_spin_is_locked(struct qspinlock *lock)
 {
 	/*
-	 * See queued_spin_unlock_wait().
-	 *
 	 * Any !0 state indicates it is locked, even if _Q_LOCKED_VAL
 	 * isn't immediately observable.
 	 */
@@ -135,6 +122,5 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
 #define arch_spin_trylock(l)		queued_spin_trylock(l)
 #define arch_spin_unlock(l)		queued_spin_unlock(l)
 #define arch_spin_lock_flags(l, f)	queued_spin_lock(l)
-#define arch_spin_unlock_wait(l)	queued_spin_unlock_wait(l)
 
 #endif /* __ASM_GENERIC_QSPINLOCK_H */
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index d9510e8522d4..ef018a6e4985 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -130,12 +130,6 @@ do {								\
 #define smp_mb__before_spinlock()	smp_wmb()
 #endif
 
-/**
- * raw_spin_unlock_wait - wait until the spinlock gets unlocked
- * @lock: the spinlock in question.
- */
-#define raw_spin_unlock_wait(lock)	arch_spin_unlock_wait(&(lock)->raw_lock)
-
 #ifdef CONFIG_DEBUG_SPINLOCK
  extern void do_raw_spin_lock(raw_spinlock_t *lock) __acquires(lock);
 #define do_raw_spin_lock_flags(lock, flags) do_raw_spin_lock(lock)
@@ -369,31 +363,6 @@ static __always_inline int spin_trylock_irq(spinlock_t *lock)
 	raw_spin_trylock_irqsave(spinlock_check(lock), flags); \
 })
 
-/**
- * spin_unlock_wait - Interpose between successive critical sections
- * @lock: the spinlock whose critical sections are to be interposed.
- *
- * Semantically this is equivalent to a spin_lock() immediately
- * followed by a spin_unlock().  However, most architectures have
- * more efficient implementations in which the spin_unlock_wait()
- * cannot block concurrent lock acquisition, and in some cases
- * where spin_unlock_wait() does not write to the lock variable.
- * Nevertheless, spin_unlock_wait() can have high overhead, so if
- * you feel the need to use it, please check to see if there is
- * a better way to get your job done.
- *
- * The ordering guarantees provided by spin_unlock_wait() are:
- *
- * 1.  All accesses preceding the spin_unlock_wait() happen before
- *     any accesses in later critical sections for this same lock.
- * 2.  All accesses following the spin_unlock_wait() happen after
- *     any accesses in earlier critical sections for this same lock.
- */
-static __always_inline void spin_unlock_wait(spinlock_t *lock)
-{
-	raw_spin_unlock_wait(&lock->rlock);
-}
-
 static __always_inline int spin_is_locked(spinlock_t *lock)
 {
 	return raw_spin_is_locked(&lock->rlock);
diff --git a/include/linux/spinlock_up.h b/include/linux/spinlock_up.h
index 0d9848de677d..612fb530af41 100644
--- a/include/linux/spinlock_up.h
+++ b/include/linux/spinlock_up.h
@@ -26,11 +26,6 @@
 #ifdef CONFIG_DEBUG_SPINLOCK
 #define arch_spin_is_locked(x)		((x)->slock == 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->slock, VAL);
-}
-
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
 	lock->slock = 0;
@@ -73,7 +68,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 
 #else /* DEBUG_SPINLOCK */
 #define arch_spin_is_locked(lock)	((void)(lock), 0)
-#define arch_spin_unlock_wait(lock)	do { barrier(); (void)(lock); } while (0)
 /* for sched/core.c and kernel_lock.c: */
 # define arch_spin_lock(lock)		do { barrier(); (void)(lock); } while (0)
 # define arch_spin_lock_flags(lock, flags)	do { barrier(); (void)(lock); } while (0)
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index fd24153e8a48..294294c71ba4 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -268,123 +268,6 @@ static __always_inline u32  __pv_wait_head_or_lock(struct qspinlock *lock,
 #define queued_spin_lock_slowpath	native_queued_spin_lock_slowpath
 #endif
 
-/*
- * Various notes on spin_is_locked() and spin_unlock_wait(), which are
- * 'interesting' functions:
- *
- * PROBLEM: some architectures have an interesting issue with atomic ACQUIRE
- * operations in that the ACQUIRE applies to the LOAD _not_ the STORE (ARM64,
- * PPC). Also qspinlock has a similar issue per construction, the setting of
- * the locked byte can be unordered acquiring the lock proper.
- *
- * This gets to be 'interesting' in the following cases, where the /should/s
- * end up false because of this issue.
- *
- *
- * CASE 1:
- *
- * So the spin_is_locked() correctness issue comes from something like:
- *
- *   CPU0				CPU1
- *
- *   global_lock();			local_lock(i)
- *     spin_lock(&G)			  spin_lock(&L[i])
- *     for (i)				  if (!spin_is_locked(&G)) {
- *       spin_unlock_wait(&L[i]);	    smp_acquire__after_ctrl_dep();
- *					    return;
- *					  }
- *					  // deal with fail
- *
- * Where it is important CPU1 sees G locked or CPU0 sees L[i] locked such
- * that there is exclusion between the two critical sections.
- *
- * The load from spin_is_locked(&G) /should/ be constrained by the ACQUIRE from
- * spin_lock(&L[i]), and similarly the load(s) from spin_unlock_wait(&L[i])
- * /should/ be constrained by the ACQUIRE from spin_lock(&G).
- *
- * Similarly, later stuff is constrained by the ACQUIRE from CTRL+RMB.
- *
- *
- * CASE 2:
- *
- * For spin_unlock_wait() there is a second correctness issue, namely:
- *
- *   CPU0				CPU1
- *
- *   flag = set;
- *   smp_mb();				spin_lock(&l)
- *   spin_unlock_wait(&l);		if (!flag)
- *					  // add to lockless list
- *					spin_unlock(&l);
- *   // iterate lockless list
- *
- * Which wants to ensure that CPU1 will stop adding bits to the list and CPU0
- * will observe the last entry on the list (if spin_unlock_wait() had ACQUIRE
- * semantics etc..)
- *
- * Where flag /should/ be ordered against the locked store of l.
- */
-
-/*
- * queued_spin_lock_slowpath() can (load-)ACQUIRE the lock before
- * issuing an _unordered_ store to set _Q_LOCKED_VAL.
- *
- * This means that the store can be delayed, but no later than the
- * store-release from the unlock. This means that simply observing
- * _Q_LOCKED_VAL is not sufficient to determine if the lock is acquired.
- *
- * There are two paths that can issue the unordered store:
- *
- *  (1) clear_pending_set_locked():	*,1,0 -> *,0,1
- *
- *  (2) set_locked():			t,0,0 -> t,0,1 ; t != 0
- *      atomic_cmpxchg_relaxed():	t,0,0 -> 0,0,1
- *
- * However, in both cases we have other !0 state we've set before to queue
- * ourseves:
- *
- * For (1) we have the atomic_cmpxchg_acquire() that set _Q_PENDING_VAL, our
- * load is constrained by that ACQUIRE to not pass before that, and thus must
- * observe the store.
- *
- * For (2) we have a more intersting scenario. We enqueue ourselves using
- * xchg_tail(), which ends up being a RELEASE. This in itself is not
- * sufficient, however that is followed by an smp_cond_acquire() on the same
- * word, giving a RELEASE->ACQUIRE ordering. This again constrains our load and
- * guarantees we must observe that store.
- *
- * Therefore both cases have other !0 state that is observable before the
- * unordered locked byte store comes through. This means we can use that to
- * wait for the lock store, and then wait for an unlock.
- */
-#ifndef queued_spin_unlock_wait
-void queued_spin_unlock_wait(struct qspinlock *lock)
-{
-	u32 val;
-
-	for (;;) {
-		val = atomic_read(&lock->val);
-
-		if (!val) /* not locked, we're done */
-			goto done;
-
-		if (val & _Q_LOCKED_MASK) /* locked, go wait for unlock */
-			break;
-
-		/* not locked, but pending, wait until we observe the lock */
-		cpu_relax();
-	}
-
-	/* any unlock is good */
-	while (atomic_read(&lock->val) & _Q_LOCKED_MASK)
-		cpu_relax();
-
-done:
-	smp_acquire__after_ctrl_dep();
-}
-EXPORT_SYMBOL(queued_spin_unlock_wait);
-#endif
-
 #endif /* _GEN_PV_LOCK_SLOWPATH */
 
 /**
-- 
2.5.2


* [PATCH tip/core/rcu 9/9] arch: Remove spin_unlock_wait() arch-specific definitions
  2017-07-24 22:12 [PATCH tip/core/rcu 0/9] Remove spin_unlock_wait() Paul E. McKenney
@ 2017-07-24 22:13   ` Paul E. McKenney
  2017-07-24 22:13 ` [PATCH tip/core/rcu 2/9] task_work: Replace spin_unlock_wait() with lock/unlock pair Paul E. McKenney
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-24 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney, linux-arch, Alan Stern, Andrea Parri,
	Linus Torvalds

There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore removes the underlying arch-specific
arch_spin_unlock_wait() implementations from all architectures that
provide them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
---
 arch/alpha/include/asm/spinlock.h    |  5 ----
 arch/arc/include/asm/spinlock.h      |  5 ----
 arch/arm/include/asm/spinlock.h      | 16 ----------
 arch/arm64/include/asm/spinlock.h    | 58 ++++--------------------------------
 arch/blackfin/include/asm/spinlock.h |  5 ----
 arch/hexagon/include/asm/spinlock.h  |  5 ----
 arch/ia64/include/asm/spinlock.h     | 21 -------------
 arch/m32r/include/asm/spinlock.h     |  5 ----
 arch/metag/include/asm/spinlock.h    |  5 ----
 arch/mn10300/include/asm/spinlock.h  |  5 ----
 arch/parisc/include/asm/spinlock.h   |  7 -----
 arch/powerpc/include/asm/spinlock.h  | 33 --------------------
 arch/s390/include/asm/spinlock.h     |  7 -----
 arch/sh/include/asm/spinlock-cas.h   |  5 ----
 arch/sh/include/asm/spinlock-llsc.h  |  5 ----
 arch/sparc/include/asm/spinlock_32.h |  5 ----
 arch/tile/include/asm/spinlock_32.h  |  2 --
 arch/tile/include/asm/spinlock_64.h  |  2 --
 arch/tile/lib/spinlock_32.c          | 23 --------------
 arch/tile/lib/spinlock_64.c          | 22 --------------
 arch/xtensa/include/asm/spinlock.h   |  5 ----
 21 files changed, 5 insertions(+), 241 deletions(-)

diff --git a/arch/alpha/include/asm/spinlock.h b/arch/alpha/include/asm/spinlock.h
index a40b9fc0c6c3..718ac0b64adf 100644
--- a/arch/alpha/include/asm/spinlock.h
+++ b/arch/alpha/include/asm/spinlock.h
@@ -16,11 +16,6 @@
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 #define arch_spin_is_locked(x)	((x)->lock != 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 {
         return lock.lock == 0;
diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
index 233d5ffe6ec7..a325e6a36523 100644
--- a/arch/arc/include/asm/spinlock.h
+++ b/arch/arc/include/asm/spinlock.h
@@ -16,11 +16,6 @@
 #define arch_spin_is_locked(x)	((x)->slock != __ARCH_SPIN_LOCK_UNLOCKED__)
 #define arch_spin_lock_flags(lock, flags)	arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->slock, !VAL);
-}
-
 #ifdef CONFIG_ARC_HAS_LLSC
 
 static inline void arch_spin_lock(arch_spinlock_t *lock)
diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
index 4bec45442072..c030143c18c6 100644
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -52,22 +52,6 @@ static inline void dsb_sev(void)
  * memory.
  */
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	u16 owner = READ_ONCE(lock->tickets.owner);
-
-	for (;;) {
-		arch_spinlock_t tmp = READ_ONCE(*lock);
-
-		if (tmp.tickets.owner == tmp.tickets.next ||
-		    tmp.tickets.owner != owner)
-			break;
-
-		wfe();
-	}
-	smp_acquire__after_ctrl_dep();
-}
-
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
 static inline void arch_spin_lock(arch_spinlock_t *lock)
diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index cae331d553f8..f445bd7f2b9f 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -26,58 +26,6 @@
  * The memory barriers are implicit with the load-acquire and store-release
  * instructions.
  */
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	unsigned int tmp;
-	arch_spinlock_t lockval;
-	u32 owner;
-
-	/*
-	 * Ensure prior spin_lock operations to other locks have completed
-	 * on this CPU before we test whether "lock" is locked.
-	 */
-	smp_mb();
-	owner = READ_ONCE(lock->owner) << 16;
-
-	asm volatile(
-"	sevl\n"
-"1:	wfe\n"
-"2:	ldaxr	%w0, %2\n"
-	/* Is the lock free? */
-"	eor	%w1, %w0, %w0, ror #16\n"
-"	cbz	%w1, 3f\n"
-	/* Lock taken -- has there been a subsequent unlock->lock transition? */
-"	eor	%w1, %w3, %w0, lsl #16\n"
-"	cbz	%w1, 1b\n"
-	/*
-	 * The owner has been updated, so there was an unlock->lock
-	 * transition that we missed. That means we can rely on the
-	 * store-release of the unlock operation paired with the
-	 * load-acquire of the lock operation to publish any of our
-	 * previous stores to the new lock owner and therefore don't
-	 * need to bother with the writeback below.
-	 */
-"	b	4f\n"
-"3:\n"
-	/*
-	 * Serialise against any concurrent lockers by writing back the
-	 * unlocked lock value
-	 */
-	ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-"	stxr	%w1, %w0, %2\n"
-	__nops(2),
-	/* LSE atomics */
-"	mov	%w1, %w0\n"
-"	cas	%w0, %w0, %2\n"
-"	eor	%w1, %w1, %w0\n")
-	/* Somebody else wrote to the lock, GOTO 10 and reload the value */
-"	cbnz	%w1, 2b\n"
-"4:"
-	: "=&r" (lockval), "=&r" (tmp), "+Q" (*lock)
-	: "r" (owner)
-	: "memory");
-}
 
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
@@ -176,7 +124,11 @@ static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 
 static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
-	smp_mb(); /* See arch_spin_unlock_wait */
+	/*
+	 * Ensure prior spin_lock operations to other locks have completed
+	 * on this CPU before we test whether "lock" is locked.
+	 */
+	smp_mb(); /* ^^^ */
 	return !arch_spin_value_unlocked(READ_ONCE(*lock));
 }
 
diff --git a/arch/blackfin/include/asm/spinlock.h b/arch/blackfin/include/asm/spinlock.h
index c58f4a83ed6f..f6431439d15d 100644
--- a/arch/blackfin/include/asm/spinlock.h
+++ b/arch/blackfin/include/asm/spinlock.h
@@ -48,11 +48,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	__raw_spin_unlock_asm(&lock->lock);
 }
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 static inline int arch_read_can_lock(arch_rwlock_t *rw)
 {
 	return __raw_uncached_fetch_asm(&rw->lock) > 0;
diff --git a/arch/hexagon/include/asm/spinlock.h b/arch/hexagon/include/asm/spinlock.h
index a1c55788c5d6..53a8d5885887 100644
--- a/arch/hexagon/include/asm/spinlock.h
+++ b/arch/hexagon/include/asm/spinlock.h
@@ -179,11 +179,6 @@ static inline unsigned int arch_spin_trylock(arch_spinlock_t *lock)
  */
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 #define arch_spin_is_locked(x) ((x)->lock != 0)
 
 #define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
diff --git a/arch/ia64/include/asm/spinlock.h b/arch/ia64/include/asm/spinlock.h
index ca9e76149a4a..df2c121164b8 100644
--- a/arch/ia64/include/asm/spinlock.h
+++ b/arch/ia64/include/asm/spinlock.h
@@ -76,22 +76,6 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 	ACCESS_ONCE(*p) = (tmp + 2) & ~1;
 }
 
-static __always_inline void __ticket_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	int	*p = (int *)&lock->lock, ticket;
-
-	ia64_invala();
-
-	for (;;) {
-		asm volatile ("ld4.c.nc %0=[%1]" : "=r"(ticket) : "r"(p) : "memory");
-		if (!(((ticket >> TICKET_SHIFT) ^ ticket) & TICKET_MASK))
-			return;
-		cpu_relax();
-	}
-
-	smp_acquire__after_ctrl_dep();
-}
-
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
 	long tmp = ACCESS_ONCE(lock->lock);
@@ -143,11 +127,6 @@ static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
 	arch_spin_lock(lock);
 }
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	__ticket_spin_unlock_wait(lock);
-}
-
 #define arch_read_can_lock(rw)		(*(volatile int *)(rw) >= 0)
 #define arch_write_can_lock(rw)	(*(volatile int *)(rw) == 0)
 
diff --git a/arch/m32r/include/asm/spinlock.h b/arch/m32r/include/asm/spinlock.h
index 323c7fc953cd..a56825592b90 100644
--- a/arch/m32r/include/asm/spinlock.h
+++ b/arch/m32r/include/asm/spinlock.h
@@ -30,11 +30,6 @@
 #define arch_spin_is_locked(x)		(*(volatile int *)(&(x)->slock) <= 0)
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->slock, VAL > 0);
-}
-
 /**
  * arch_spin_trylock - Try spin lock and return a result
  * @lock: Pointer to the lock variable
diff --git a/arch/metag/include/asm/spinlock.h b/arch/metag/include/asm/spinlock.h
index c0c7a22be1ae..ddf7fe5708a6 100644
--- a/arch/metag/include/asm/spinlock.h
+++ b/arch/metag/include/asm/spinlock.h
@@ -15,11 +15,6 @@
  * locked.
  */
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
 #define	arch_read_lock_flags(lock, flags) arch_read_lock(lock)
diff --git a/arch/mn10300/include/asm/spinlock.h b/arch/mn10300/include/asm/spinlock.h
index 9c7b8f7942d8..fe413b41df6c 100644
--- a/arch/mn10300/include/asm/spinlock.h
+++ b/arch/mn10300/include/asm/spinlock.h
@@ -26,11 +26,6 @@
 
 #define arch_spin_is_locked(x)	(*(volatile signed char *)(&(x)->slock) != 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->slock, !VAL);
-}
-
 static inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
 	asm volatile(
diff --git a/arch/parisc/include/asm/spinlock.h b/arch/parisc/include/asm/spinlock.h
index e32936cd7f10..55bfe4affca3 100644
--- a/arch/parisc/include/asm/spinlock.h
+++ b/arch/parisc/include/asm/spinlock.h
@@ -14,13 +14,6 @@ static inline int arch_spin_is_locked(arch_spinlock_t *x)
 
 #define arch_spin_lock(lock) arch_spin_lock_flags(lock, 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *x)
-{
-	volatile unsigned int *a = __ldcw_align(x);
-
-	smp_cond_load_acquire(a, VAL);
-}
-
 static inline void arch_spin_lock_flags(arch_spinlock_t *x,
 					 unsigned long flags)
 {
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index 8c1b913de6d7..d256e448ea49 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -170,39 +170,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	lock->slock = 0;
 }
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	arch_spinlock_t lock_val;
-
-	smp_mb();
-
-	/*
-	 * Atomically load and store back the lock value (unchanged). This
-	 * ensures that our observation of the lock value is ordered with
-	 * respect to other lock operations.
-	 */
-	__asm__ __volatile__(
-"1:	" PPC_LWARX(%0, 0, %2, 0) "\n"
-"	stwcx. %0, 0, %2\n"
-"	bne- 1b\n"
-	: "=&r" (lock_val), "+m" (*lock)
-	: "r" (lock)
-	: "cr0", "xer");
-
-	if (arch_spin_value_unlocked(lock_val))
-		goto out;
-
-	while (lock->slock) {
-		HMT_low();
-		if (SHARED_PROCESSOR)
-			__spin_yield(lock);
-	}
-	HMT_medium();
-
-out:
-	smp_mb();
-}
-
 /*
  * Read-write spinlocks, allowing multiple readers
  * but only one writer.
diff --git a/arch/s390/include/asm/spinlock.h b/arch/s390/include/asm/spinlock.h
index f7838ecd83c6..217ee5210c32 100644
--- a/arch/s390/include/asm/spinlock.h
+++ b/arch/s390/include/asm/spinlock.h
@@ -98,13 +98,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lp)
 		: "cc", "memory");
 }
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	while (arch_spin_is_locked(lock))
-		arch_spin_relax(lock);
-	smp_acquire__after_ctrl_dep();
-}
-
 /*
  * Read-write spinlocks, allowing multiple readers
  * but only one writer.
diff --git a/arch/sh/include/asm/spinlock-cas.h b/arch/sh/include/asm/spinlock-cas.h
index c46e8cc7b515..5ed7dbbd94ff 100644
--- a/arch/sh/include/asm/spinlock-cas.h
+++ b/arch/sh/include/asm/spinlock-cas.h
@@ -29,11 +29,6 @@ static inline unsigned __sl_cas(volatile unsigned *p, unsigned old, unsigned new
 #define arch_spin_is_locked(x)		((x)->lock <= 0)
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, VAL > 0);
-}
-
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
 	while (!__sl_cas(&lock->lock, 1, 0));
diff --git a/arch/sh/include/asm/spinlock-llsc.h b/arch/sh/include/asm/spinlock-llsc.h
index cec78143fa83..f77263aae760 100644
--- a/arch/sh/include/asm/spinlock-llsc.h
+++ b/arch/sh/include/asm/spinlock-llsc.h
@@ -21,11 +21,6 @@
 #define arch_spin_is_locked(x)		((x)->lock <= 0)
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, VAL > 0);
-}
-
 /*
  * Simple spin lock operations.  There are two variants, one clears IRQ's
  * on the local processor, one does not.
diff --git a/arch/sparc/include/asm/spinlock_32.h b/arch/sparc/include/asm/spinlock_32.h
index 8011e79f59c9..67345b2dc408 100644
--- a/arch/sparc/include/asm/spinlock_32.h
+++ b/arch/sparc/include/asm/spinlock_32.h
@@ -14,11 +14,6 @@
 
 #define arch_spin_is_locked(lock) (*((volatile unsigned char *)(lock)) != 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
 	__asm__ __volatile__(
diff --git a/arch/tile/include/asm/spinlock_32.h b/arch/tile/include/asm/spinlock_32.h
index b14b1ba5bf9c..cba8ba9b8da6 100644
--- a/arch/tile/include/asm/spinlock_32.h
+++ b/arch/tile/include/asm/spinlock_32.h
@@ -64,8 +64,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	lock->current_ticket = old_ticket + TICKET_QUANTUM;
 }
 
-void arch_spin_unlock_wait(arch_spinlock_t *lock);
-
 /*
  * Read-write spinlocks, allowing multiple readers
  * but only one writer.
diff --git a/arch/tile/include/asm/spinlock_64.h b/arch/tile/include/asm/spinlock_64.h
index b9718fb4e74a..9a2c2d605752 100644
--- a/arch/tile/include/asm/spinlock_64.h
+++ b/arch/tile/include/asm/spinlock_64.h
@@ -58,8 +58,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	__insn_fetchadd4(&lock->lock, 1U << __ARCH_SPIN_CURRENT_SHIFT);
 }
 
-void arch_spin_unlock_wait(arch_spinlock_t *lock);
-
 void arch_spin_lock_slow(arch_spinlock_t *lock, u32 val);
 
 /* Grab the "next" ticket number and bump it atomically.
diff --git a/arch/tile/lib/spinlock_32.c b/arch/tile/lib/spinlock_32.c
index 076c6cc43113..db9333f2447c 100644
--- a/arch/tile/lib/spinlock_32.c
+++ b/arch/tile/lib/spinlock_32.c
@@ -62,29 +62,6 @@ int arch_spin_trylock(arch_spinlock_t *lock)
 }
 EXPORT_SYMBOL(arch_spin_trylock);
 
-void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	u32 iterations = 0;
-	int curr = READ_ONCE(lock->current_ticket);
-	int next = READ_ONCE(lock->next_ticket);
-
-	/* Return immediately if unlocked. */
-	if (next == curr)
-		return;
-
-	/* Wait until the current locker has released the lock. */
-	do {
-		delay_backoff(iterations++);
-	} while (READ_ONCE(lock->current_ticket) == curr);
-
-	/*
-	 * The TILE architecture doesn't do read speculation; therefore
-	 * a control dependency guarantees a LOAD->{LOAD,STORE} order.
-	 */
-	barrier();
-}
-EXPORT_SYMBOL(arch_spin_unlock_wait);
-
 /*
  * The low byte is always reserved to be the marker for a "tns" operation
  * since the low bit is set to "1" by a tns.  The next seven bits are
diff --git a/arch/tile/lib/spinlock_64.c b/arch/tile/lib/spinlock_64.c
index a4b5b2cbce93..de414c22892f 100644
--- a/arch/tile/lib/spinlock_64.c
+++ b/arch/tile/lib/spinlock_64.c
@@ -62,28 +62,6 @@ int arch_spin_trylock(arch_spinlock_t *lock)
 }
 EXPORT_SYMBOL(arch_spin_trylock);
 
-void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	u32 iterations = 0;
-	u32 val = READ_ONCE(lock->lock);
-	u32 curr = arch_spin_current(val);
-
-	/* Return immediately if unlocked. */
-	if (arch_spin_next(val) == curr)
-		return;
-
-	/* Wait until the current locker has released the lock. */
-	do {
-		delay_backoff(iterations++);
-	} while (arch_spin_current(READ_ONCE(lock->lock)) == curr);
-
-	/*
-	 * The TILE architecture doesn't do read speculation; therefore
-	 * a control dependency guarantees a LOAD->{LOAD,STORE} order.
-	 */
-	barrier();
-}
-EXPORT_SYMBOL(arch_spin_unlock_wait);
 
 /*
  * If the read lock fails due to a writer, we retry periodically
diff --git a/arch/xtensa/include/asm/spinlock.h b/arch/xtensa/include/asm/spinlock.h
index a36221cf6363..3bb49681ee24 100644
--- a/arch/xtensa/include/asm/spinlock.h
+++ b/arch/xtensa/include/asm/spinlock.h
@@ -33,11 +33,6 @@
 
 #define arch_spin_is_locked(x) ((x)->slock != 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->slock, !VAL);
-}
-
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
 static inline void arch_spin_lock(arch_spinlock_t *lock)
-- 
2.5.2

--- a/arch/ia64/include/asm/spinlock.h
+++ b/arch/ia64/include/asm/spinlock.h
@@ -76,22 +76,6 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 	ACCESS_ONCE(*p) = (tmp + 2) & ~1;
 }
 
-static __always_inline void __ticket_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	int	*p = (int *)&lock->lock, ticket;
-
-	ia64_invala();
-
-	for (;;) {
-		asm volatile ("ld4.c.nc %0=[%1]" : "=r"(ticket) : "r"(p) : "memory");
-		if (!(((ticket >> TICKET_SHIFT) ^ ticket) & TICKET_MASK))
-			return;
-		cpu_relax();
-	}
-
-	smp_acquire__after_ctrl_dep();
-}
-
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
 	long tmp = ACCESS_ONCE(lock->lock);
@@ -143,11 +127,6 @@ static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
 	arch_spin_lock(lock);
 }
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	__ticket_spin_unlock_wait(lock);
-}
-
 #define arch_read_can_lock(rw)		(*(volatile int *)(rw) >= 0)
 #define arch_write_can_lock(rw)	(*(volatile int *)(rw) == 0)
 
diff --git a/arch/m32r/include/asm/spinlock.h b/arch/m32r/include/asm/spinlock.h
index 323c7fc953cd..a56825592b90 100644
--- a/arch/m32r/include/asm/spinlock.h
+++ b/arch/m32r/include/asm/spinlock.h
@@ -30,11 +30,6 @@
 #define arch_spin_is_locked(x)		(*(volatile int *)(&(x)->slock) <= 0)
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->slock, VAL > 0);
-}
-
 /**
  * arch_spin_trylock - Try spin lock and return a result
  * @lock: Pointer to the lock variable
diff --git a/arch/metag/include/asm/spinlock.h b/arch/metag/include/asm/spinlock.h
index c0c7a22be1ae..ddf7fe5708a6 100644
--- a/arch/metag/include/asm/spinlock.h
+++ b/arch/metag/include/asm/spinlock.h
@@ -15,11 +15,6 @@
  * locked.
  */
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
 #define	arch_read_lock_flags(lock, flags) arch_read_lock(lock)
diff --git a/arch/mn10300/include/asm/spinlock.h b/arch/mn10300/include/asm/spinlock.h
index 9c7b8f7942d8..fe413b41df6c 100644
--- a/arch/mn10300/include/asm/spinlock.h
+++ b/arch/mn10300/include/asm/spinlock.h
@@ -26,11 +26,6 @@
 
 #define arch_spin_is_locked(x)	(*(volatile signed char *)(&(x)->slock) != 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->slock, !VAL);
-}
-
 static inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
 	asm volatile(
diff --git a/arch/parisc/include/asm/spinlock.h b/arch/parisc/include/asm/spinlock.h
index e32936cd7f10..55bfe4affca3 100644
--- a/arch/parisc/include/asm/spinlock.h
+++ b/arch/parisc/include/asm/spinlock.h
@@ -14,13 +14,6 @@ static inline int arch_spin_is_locked(arch_spinlock_t *x)
 
 #define arch_spin_lock(lock) arch_spin_lock_flags(lock, 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *x)
-{
-	volatile unsigned int *a = __ldcw_align(x);
-
-	smp_cond_load_acquire(a, VAL);
-}
-
 static inline void arch_spin_lock_flags(arch_spinlock_t *x,
 					 unsigned long flags)
 {
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index 8c1b913de6d7..d256e448ea49 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -170,39 +170,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	lock->slock = 0;
 }
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	arch_spinlock_t lock_val;
-
-	smp_mb();
-
-	/*
-	 * Atomically load and store back the lock value (unchanged). This
-	 * ensures that our observation of the lock value is ordered with
-	 * respect to other lock operations.
-	 */
-	__asm__ __volatile__(
-"1:	" PPC_LWARX(%0, 0, %2, 0) "\n"
-"	stwcx. %0, 0, %2\n"
-"	bne- 1b\n"
-	: "=&r" (lock_val), "+m" (*lock)
-	: "r" (lock)
-	: "cr0", "xer");
-
-	if (arch_spin_value_unlocked(lock_val))
-		goto out;
-
-	while (lock->slock) {
-		HMT_low();
-		if (SHARED_PROCESSOR)
-			__spin_yield(lock);
-	}
-	HMT_medium();
-
-out:
-	smp_mb();
-}
-
 /*
  * Read-write spinlocks, allowing multiple readers
  * but only one writer.
diff --git a/arch/s390/include/asm/spinlock.h b/arch/s390/include/asm/spinlock.h
index f7838ecd83c6..217ee5210c32 100644
--- a/arch/s390/include/asm/spinlock.h
+++ b/arch/s390/include/asm/spinlock.h
@@ -98,13 +98,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lp)
 		: "cc", "memory");
 }
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	while (arch_spin_is_locked(lock))
-		arch_spin_relax(lock);
-	smp_acquire__after_ctrl_dep();
-}
-
 /*
  * Read-write spinlocks, allowing multiple readers
  * but only one writer.
diff --git a/arch/sh/include/asm/spinlock-cas.h b/arch/sh/include/asm/spinlock-cas.h
index c46e8cc7b515..5ed7dbbd94ff 100644
--- a/arch/sh/include/asm/spinlock-cas.h
+++ b/arch/sh/include/asm/spinlock-cas.h
@@ -29,11 +29,6 @@ static inline unsigned __sl_cas(volatile unsigned *p, unsigned old, unsigned new
 #define arch_spin_is_locked(x)		((x)->lock <= 0)
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, VAL > 0);
-}
-
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
 	while (!__sl_cas(&lock->lock, 1, 0));
diff --git a/arch/sh/include/asm/spinlock-llsc.h b/arch/sh/include/asm/spinlock-llsc.h
index cec78143fa83..f77263aae760 100644
--- a/arch/sh/include/asm/spinlock-llsc.h
+++ b/arch/sh/include/asm/spinlock-llsc.h
@@ -21,11 +21,6 @@
 #define arch_spin_is_locked(x)		((x)->lock <= 0)
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, VAL > 0);
-}
-
 /*
  * Simple spin lock operations.  There are two variants, one clears IRQ's
  * on the local processor, one does not.
diff --git a/arch/sparc/include/asm/spinlock_32.h b/arch/sparc/include/asm/spinlock_32.h
index 8011e79f59c9..67345b2dc408 100644
--- a/arch/sparc/include/asm/spinlock_32.h
+++ b/arch/sparc/include/asm/spinlock_32.h
@@ -14,11 +14,6 @@
 
 #define arch_spin_is_locked(lock) (*((volatile unsigned char *)(lock)) != 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
 	__asm__ __volatile__(
diff --git a/arch/tile/include/asm/spinlock_32.h b/arch/tile/include/asm/spinlock_32.h
index b14b1ba5bf9c..cba8ba9b8da6 100644
--- a/arch/tile/include/asm/spinlock_32.h
+++ b/arch/tile/include/asm/spinlock_32.h
@@ -64,8 +64,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	lock->current_ticket = old_ticket + TICKET_QUANTUM;
 }
 
-void arch_spin_unlock_wait(arch_spinlock_t *lock);
-
 /*
  * Read-write spinlocks, allowing multiple readers
  * but only one writer.
diff --git a/arch/tile/include/asm/spinlock_64.h b/arch/tile/include/asm/spinlock_64.h
index b9718fb4e74a..9a2c2d605752 100644
--- a/arch/tile/include/asm/spinlock_64.h
+++ b/arch/tile/include/asm/spinlock_64.h
@@ -58,8 +58,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	__insn_fetchadd4(&lock->lock, 1U << __ARCH_SPIN_CURRENT_SHIFT);
 }
 
-void arch_spin_unlock_wait(arch_spinlock_t *lock);
-
 void arch_spin_lock_slow(arch_spinlock_t *lock, u32 val);
 
 /* Grab the "next" ticket number and bump it atomically.
diff --git a/arch/tile/lib/spinlock_32.c b/arch/tile/lib/spinlock_32.c
index 076c6cc43113..db9333f2447c 100644
--- a/arch/tile/lib/spinlock_32.c
+++ b/arch/tile/lib/spinlock_32.c
@@ -62,29 +62,6 @@ int arch_spin_trylock(arch_spinlock_t *lock)
 }
 EXPORT_SYMBOL(arch_spin_trylock);
 
-void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	u32 iterations = 0;
-	int curr = READ_ONCE(lock->current_ticket);
-	int next = READ_ONCE(lock->next_ticket);
-
-	/* Return immediately if unlocked. */
-	if (next == curr)
-		return;
-
-	/* Wait until the current locker has released the lock. */
-	do {
-		delay_backoff(iterations++);
-	} while (READ_ONCE(lock->current_ticket) == curr);
-
-	/*
-	 * The TILE architecture doesn't do read speculation; therefore
-	 * a control dependency guarantees a LOAD->{LOAD,STORE} order.
-	 */
-	barrier();
-}
-EXPORT_SYMBOL(arch_spin_unlock_wait);
-
 /*
  * The low byte is always reserved to be the marker for a "tns" operation
  * since the low bit is set to "1" by a tns.  The next seven bits are
diff --git a/arch/tile/lib/spinlock_64.c b/arch/tile/lib/spinlock_64.c
index a4b5b2cbce93..de414c22892f 100644
--- a/arch/tile/lib/spinlock_64.c
+++ b/arch/tile/lib/spinlock_64.c
@@ -62,28 +62,6 @@ int arch_spin_trylock(arch_spinlock_t *lock)
 }
 EXPORT_SYMBOL(arch_spin_trylock);
 
-void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	u32 iterations = 0;
-	u32 val = READ_ONCE(lock->lock);
-	u32 curr = arch_spin_current(val);
-
-	/* Return immediately if unlocked. */
-	if (arch_spin_next(val) == curr)
-		return;
-
-	/* Wait until the current locker has released the lock. */
-	do {
-		delay_backoff(iterations++);
-	} while (arch_spin_current(READ_ONCE(lock->lock)) == curr);
-
-	/*
-	 * The TILE architecture doesn't do read speculation; therefore
-	 * a control dependency guarantees a LOAD->{LOAD,STORE} order.
-	 */
-	barrier();
-}
-EXPORT_SYMBOL(arch_spin_unlock_wait);
 
 /*
  * If the read lock fails due to a writer, we retry periodically
diff --git a/arch/xtensa/include/asm/spinlock.h b/arch/xtensa/include/asm/spinlock.h
index a36221cf6363..3bb49681ee24 100644
--- a/arch/xtensa/include/asm/spinlock.h
+++ b/arch/xtensa/include/asm/spinlock.h
@@ -33,11 +33,6 @@
 
 #define arch_spin_is_locked(x) ((x)->slock != 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->slock, !VAL);
-}
-
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
 static inline void arch_spin_lock(arch_spinlock_t *lock)
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 tip/core/rcu 0/10] Remove spin_unlock_wait()
  2017-07-24 22:12 [PATCH tip/core/rcu 0/9] Remove spin_unlock_wait() Paul E. McKenney
                   ` (8 preceding siblings ...)
  2017-07-24 22:13   ` Paul E. McKenney
@ 2017-07-31 22:57 ` Paul E. McKenney
  2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 01/10] atomics: Revert addition of comment header to spin_unlock_wait() Paul E. McKenney
                     ` (9 more replies)
  9 siblings, 10 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-31 22:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg

Hello!

This series removes spin_unlock_wait():

1.	Revert over-enthusiastic spin_unlock_wait() header comment.

2.	Remove spin_unlock_wait() from net/netfilter/nf_conntrack_core,
	courtesy of Manfred Spraul.

3.	Replace task_work use of spin_unlock_wait() with lock/unlock
	pair, courtesy of Oleg Nesterov.

4.	Replace scheduler use of spin_unlock_wait() with lock/unlock pair.

5.	Replace completion use of spin_unlock_wait() with lock/unlock
	pair.

6.	Replace exit() use of spin_unlock_wait() with lock/unlock pair.

7.	Replace IPC use of spin_unlock_wait() with lock/unlock pair.

8.	Replace ATA use of spin_unlock_wait() with lock/unlock pair.

9.	Remove generic definition of spin_unlock_wait().

10.	Remove architecture-specific definitions of spin_unlock_wait().

							Thanx, Paul

------------------------------------------------------------------------

 arch/alpha/include/asm/spinlock.h    |    5 -
 arch/arc/include/asm/spinlock.h      |    5 -
 arch/arm/include/asm/spinlock.h      |   16 ----
 arch/arm64/include/asm/spinlock.h    |   58 +----------------
 arch/blackfin/include/asm/spinlock.h |    5 -
 arch/hexagon/include/asm/spinlock.h  |    5 -
 arch/ia64/include/asm/spinlock.h     |   21 ------
 arch/m32r/include/asm/spinlock.h     |    5 -
 arch/metag/include/asm/spinlock.h    |    5 -
 arch/mn10300/include/asm/spinlock.h  |    5 -
 arch/parisc/include/asm/spinlock.h   |    7 --
 arch/powerpc/include/asm/spinlock.h  |   33 ---------
 arch/s390/include/asm/spinlock.h     |    7 --
 arch/sh/include/asm/spinlock-cas.h   |    5 -
 arch/sh/include/asm/spinlock-llsc.h  |    5 -
 arch/sparc/include/asm/spinlock_32.h |    5 -
 arch/tile/include/asm/spinlock_32.h  |    2 
 arch/tile/include/asm/spinlock_64.h  |    2 
 arch/tile/lib/spinlock_32.c          |   23 ------
 arch/tile/lib/spinlock_64.c          |   22 ------
 arch/xtensa/include/asm/spinlock.h   |    5 -
 drivers/ata/libata-eh.c              |    8 --
 include/asm-generic/qspinlock.h      |   14 ----
 include/linux/spinlock.h             |   31 ---------
 include/linux/spinlock_up.h          |    6 -
 ipc/sem.c                            |    3 
 kernel/exit.c                        |    3 
 kernel/locking/qspinlock.c           |  117 -----------------------------------
 kernel/sched/completion.c            |    9 --
 kernel/sched/core.c                  |    5 -
 kernel/task_work.c                   |    8 --
 net/netfilter/nf_conntrack_core.c    |   52 ++++++++-------
 32 files changed, 48 insertions(+), 454 deletions(-)

^ permalink raw reply	[flat|nested] 33+ messages in thread
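
The common transformation in this series replaces a wait on a lock with
an empty critical section.  A minimal sketch of the idiom, using
placeholder names that do not come from any of the patches below:

	#include <linux/spinlock.h>

	/*
	 * Before: spin_unlock_wait(&some_lock) waited for any current
	 * holder of some_lock to release it, with disputed ordering
	 * semantics.  After: an empty critical section waits for the
	 * same thing, with unambiguous acquire/release ordering, at the
	 * cost of briefly taking the lock.
	 */
	static void wait_for_current_holder(spinlock_t *some_lock)
	{
		spin_lock(some_lock);   /* blocks until any current holder unlocks */
		spin_unlock(some_lock); /* nothing is done while holding the lock */
	}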

* [PATCH v2 tip/core/rcu 01/10] atomics: Revert addition of comment header to spin_unlock_wait()
  2017-07-31 22:57 ` [PATCH v2 tip/core/rcu 0/10] Remove spin_unlock_wait() Paul E. McKenney
@ 2017-07-31 22:58   ` Paul E. McKenney
  2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 02/10] net/netfilter/nf_conntrack_core: Fix net_conntrack_lock() Paul E. McKenney
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-31 22:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney

There is still considerable confusion as to the semantics of
spin_unlock_wait(), but there seems to be universal agreement that
it is not that of a lock/unlock pair.  This commit therefore removes
the comment added by 6016ffc3874d ("atomics: Add header comment so
spin_unlock_wait()") in order to prevent at least that flavor of
confusion.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/spinlock.h | 20 --------------------
 1 file changed, 20 deletions(-)

diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index d9510e8522d4..59248dcc6ef3 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -369,26 +369,6 @@ static __always_inline int spin_trylock_irq(spinlock_t *lock)
 	raw_spin_trylock_irqsave(spinlock_check(lock), flags); \
 })
 
-/**
- * spin_unlock_wait - Interpose between successive critical sections
- * @lock: the spinlock whose critical sections are to be interposed.
- *
- * Semantically this is equivalent to a spin_lock() immediately
- * followed by a spin_unlock().  However, most architectures have
- * more efficient implementations in which the spin_unlock_wait()
- * cannot block concurrent lock acquisition, and in some cases
- * where spin_unlock_wait() does not write to the lock variable.
- * Nevertheless, spin_unlock_wait() can have high overhead, so if
- * you feel the need to use it, please check to see if there is
- * a better way to get your job done.
- *
- * The ordering guarantees provided by spin_unlock_wait() are:
- *
- * 1.  All accesses preceding the spin_unlock_wait() happen before
- *     any accesses in later critical sections for this same lock.
- * 2.  All accesses following the spin_unlock_wait() happen after
- *     any accesses in earlier critical sections for this same lock.
- */
 static __always_inline void spin_unlock_wait(spinlock_t *lock)
 {
 	raw_spin_unlock_wait(&lock->rlock);
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 tip/core/rcu 02/10] net/netfilter/nf_conntrack_core: Fix net_conntrack_lock()
  2017-07-31 22:57 ` [PATCH v2 tip/core/rcu 0/10] Remove spin_unlock_wait() Paul E. McKenney
  2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 01/10] atomics: Revert addition of comment header to spin_unlock_wait() Paul E. McKenney
@ 2017-07-31 22:58   ` Paul E. McKenney
  2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 03/10] task_work: Replace spin_unlock_wait() with lock/unlock pair Paul E. McKenney
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-31 22:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Manfred Spraul, stable, Alan Stern, Sasha Levin,
	Pablo Neira Ayuso, netfilter-devel, Paul E. McKenney

From: Manfred Spraul <manfred@colorfullife.com>

As we want to remove spin_unlock_wait() and replace it with explicit
spin_lock()/spin_unlock() calls, we can use this to simplify the
locking.

In addition:
- Reading nf_conntrack_locks_all needs ACQUIRE memory ordering.
- The new code avoids the backwards loop.

Only slightly tested; I did not manage to trigger calls to
nf_conntrack_all_lock().

V2: With improved comments, to clearly show how the barriers
    pair.

Fixes: b16c29191dc8 ("netfilter: nf_conntrack: use safer way to lock all buckets")
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 net/netfilter/nf_conntrack_core.c | 52 ++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 9979f46c81dc..51390febd5e3 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -96,19 +96,26 @@ static struct conntrack_gc_work conntrack_gc_work;
 
 void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
 {
+	/* 1) Acquire the lock */
 	spin_lock(lock);
-	while (unlikely(nf_conntrack_locks_all)) {
-		spin_unlock(lock);
 
-		/*
-		 * Order the 'nf_conntrack_locks_all' load vs. the
-		 * spin_unlock_wait() loads below, to ensure
-		 * that 'nf_conntrack_locks_all_lock' is indeed held:
-		 */
-		smp_rmb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
-		spin_unlock_wait(&nf_conntrack_locks_all_lock);
-		spin_lock(lock);
-	}
+	/* 2) read nf_conntrack_locks_all, with ACQUIRE semantics
+	 * It pairs with the smp_store_release() in nf_conntrack_all_unlock()
+	 */
+	if (likely(smp_load_acquire(&nf_conntrack_locks_all) == false))
+		return;
+
+	/* fast path failed, unlock */
+	spin_unlock(lock);
+
+	/* Slow path 1) get global lock */
+	spin_lock(&nf_conntrack_locks_all_lock);
+
+	/* Slow path 2) get the lock we want */
+	spin_lock(lock);
+
+	/* Slow path 3) release the global lock */
+	spin_unlock(&nf_conntrack_locks_all_lock);
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_lock);
 
@@ -149,28 +156,27 @@ static void nf_conntrack_all_lock(void)
 	int i;
 
 	spin_lock(&nf_conntrack_locks_all_lock);
-	nf_conntrack_locks_all = true;
 
-	/*
-	 * Order the above store of 'nf_conntrack_locks_all' against
-	 * the spin_unlock_wait() loads below, such that if
-	 * nf_conntrack_lock() observes 'nf_conntrack_locks_all'
-	 * we must observe nf_conntrack_locks[] held:
-	 */
-	smp_mb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
+	nf_conntrack_locks_all = true;
 
 	for (i = 0; i < CONNTRACK_LOCKS; i++) {
-		spin_unlock_wait(&nf_conntrack_locks[i]);
+		spin_lock(&nf_conntrack_locks[i]);
+
+		/* This spin_unlock provides the "release" to ensure that
+		 * nf_conntrack_locks_all==true is visible to everyone that
+		 * acquired spin_lock(&nf_conntrack_locks[]).
+		 */
+		spin_unlock(&nf_conntrack_locks[i]);
 	}
 }
 
 static void nf_conntrack_all_unlock(void)
 {
-	/*
-	 * All prior stores must be complete before we clear
+	/* All prior stores must be complete before we clear
 	 * 'nf_conntrack_locks_all'. Otherwise nf_conntrack_lock()
 	 * might observe the false value but not the entire
-	 * critical section:
+	 * critical section.
+	 * It pairs with the smp_load_acquire() in nf_conntrack_lock()
 	 */
 	smp_store_release(&nf_conntrack_locks_all, false);
 	spin_unlock(&nf_conntrack_locks_all_lock);
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread
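
The acquire/release pairing described above can also be seen in a
stand-alone form.  The following is a simplified model rather than the
actual nf_conntrack code; the array size and all names are illustrative
only:

	#include <linux/spinlock.h>
	#include <linux/atomic.h>
	#include <linux/compiler.h>

	#define NR_LOCAL_LOCKS 16		/* illustrative size */

	/* Assume spin_lock_init() has been run on each local_locks[i]. */
	static spinlock_t local_locks[NR_LOCAL_LOCKS];
	static DEFINE_SPINLOCK(global_lock);
	static bool all_locked;			/* models nf_conntrack_locks_all */

	static void lock_one(int i)		/* models nf_conntrack_lock() */
	{
		spin_lock(&local_locks[i]);
		/* Pairs with the smp_store_release() in unlock_all(). */
		if (likely(!smp_load_acquire(&all_locked)))
			return;
		/* Slow path: queue up behind the global locker. */
		spin_unlock(&local_locks[i]);
		spin_lock(&global_lock);
		spin_lock(&local_locks[i]);
		spin_unlock(&global_lock);
	}

	static void lock_all(void)		/* models nf_conntrack_all_lock() */
	{
		int i;

		spin_lock(&global_lock);
		all_locked = true;
		for (i = 0; i < NR_LOCAL_LOCKS; i++) {
			/*
			 * Empty critical section: this unlock is the release
			 * that makes all_locked visible to anyone already
			 * holding local_locks[i].
			 */
			spin_lock(&local_locks[i]);
			spin_unlock(&local_locks[i]);
		}
	}

	static void unlock_all(void)		/* models nf_conntrack_all_unlock() */
	{
		/* Pairs with the smp_load_acquire() in lock_one(). */
		smp_store_release(&all_locked, false);
		spin_unlock(&global_lock);
	}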

* [PATCH v2 tip/core/rcu 03/10] task_work: Replace spin_unlock_wait() with lock/unlock pair
  2017-07-31 22:57 ` [PATCH v2 tip/core/rcu 0/10] Remove spin_unlock_wait() Paul E. McKenney
  2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 01/10] atomics: Revert addition of comment header to spin_unlock_wait() Paul E. McKenney
  2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 02/10] net/netfilter/nf_conntrack_core: Fix net_conntrack_lock() Paul E. McKenney
@ 2017-07-31 22:58   ` Paul E. McKenney
  2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 04/10] sched: " Paul E. McKenney
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-31 22:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney

From: Oleg Nesterov <oleg@redhat.com>

There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore replaces the spin_unlock_wait() call in
task_work_run() with a spin_lock_irq() and a spin_unlock_irq() around
the cmpxchg() dequeue loop.  This should be safe from a performance
perspective because ->pi_lock is local to the task and because calls to
the other side of the race, task_work_cancel(), should be rare.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/task_work.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/kernel/task_work.c b/kernel/task_work.c
index d513051fcca2..836a72a66fba 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -96,20 +96,16 @@ void task_work_run(void)
 		 * work->func() can do task_work_add(), do not set
 		 * work_exited unless the list is empty.
 		 */
+		raw_spin_lock_irq(&task->pi_lock);
 		do {
 			work = READ_ONCE(task->task_works);
 			head = !work && (task->flags & PF_EXITING) ?
 				&work_exited : NULL;
 		} while (cmpxchg(&task->task_works, work, head) != work);
+		raw_spin_unlock_irq(&task->pi_lock);
 
 		if (!work)
 			break;
-		/*
-		 * Synchronize with task_work_cancel(). It can't remove
-		 * the first entry == work, cmpxchg(task_works) should
-		 * fail, but it can play with *work and other entries.
-		 */
-		raw_spin_unlock_wait(&task->pi_lock);
 
 		do {
 			next = work->next;
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread
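
Why a lock/unlock pair is enough here can be seen from a stripped-down
model of the list handling.  Everything below is made up for
illustration and is not the kernel's task_work implementation:
producers push locklessly, the canceler unlinks entries under the lock,
and the consumer splices the whole list while holding that same lock.

	#include <linux/spinlock.h>
	#include <linux/atomic.h>

	struct cb_node {
		struct cb_node *next;
		void (*func)(struct cb_node *);
	};

	static struct cb_node *pending;			/* lockless list head */
	static DEFINE_RAW_SPINLOCK(model_lock);		/* stands in for ->pi_lock */

	static void model_add(struct cb_node *n)	/* lockless, like task_work_add() */
	{
		struct cb_node *old;

		do {
			old = READ_ONCE(pending);
			n->next = old;
		} while (cmpxchg(&pending, old, n) != old);
	}

	static struct cb_node *model_cancel(struct cb_node *victim)
	{
		struct cb_node **pprev = &pending;
		struct cb_node *n;

		/* Entries are unlinked only while holding the lock. */
		raw_spin_lock_irq(&model_lock);
		while ((n = READ_ONCE(*pprev))) {
			if (n != victim)
				pprev = &n->next;
			else if (cmpxchg(pprev, n, n->next) == n)
				break;
			/* else a producer pushed a new head; re-read *pprev */
		}
		raw_spin_unlock_irq(&model_lock);
		return n;
	}

	static struct cb_node *model_run_dequeue(void)	/* like task_work_run()'s dequeue */
	{
		struct cb_node *list;

		/*
		 * Holding the lock across the splice guarantees that no
		 * model_cancel() is still dereferencing entries that are
		 * about to be executed -- the job the removed
		 * spin_unlock_wait() used to do.
		 */
		raw_spin_lock_irq(&model_lock);
		list = xchg(&pending, NULL);
		raw_spin_unlock_irq(&model_lock);
		return list;
	}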

* [PATCH v2 tip/core/rcu 04/10] sched: Replace spin_unlock_wait() with lock/unlock pair
  2017-07-31 22:57 ` [PATCH v2 tip/core/rcu 0/10] Remove spin_unlock_wait() Paul E. McKenney
                     ` (2 preceding siblings ...)
  2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 03/10] task_work: Replace spin_unlock_wait() with lock/unlock pair Paul E. McKenney
@ 2017-07-31 22:58   ` Paul E. McKenney
  2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 05/10] completion: " Paul E. McKenney
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-31 22:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney, Ingo Molnar, Will Deacon, Alan Stern,
	Andrea Parri, Linus Torvalds

There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore replaces the spin_unlock_wait() call in
do_task_dead() with spin_lock() followed immediately by spin_unlock().
This should be safe from a performance perspective because the lock is
this task's ->pi_lock, and this is called only after the task exits.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
[ paulmck: Replace leading smp_mb() with smp_mb__before_spinlock(),
  courtesy of Arnd Bergmann's noting its odd location. ]
---
 kernel/sched/core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 17c667b427b4..1179111d82a1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3352,8 +3352,9 @@ void __noreturn do_task_dead(void)
 	 * To avoid it, we have to wait for releasing tsk->pi_lock which
 	 * is held by try_to_wake_up()
 	 */
-	smp_mb();
-	raw_spin_unlock_wait(&current->pi_lock);
+	smp_mb__before_spinlock();
+	raw_spin_lock_irq(&current->pi_lock);
+	raw_spin_unlock_irq(&current->pi_lock);
 
 	/* Causes final put_task_struct in finish_task_switch(): */
 	__set_current_state(TASK_DEAD);
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 tip/core/rcu 05/10] completion: Replace spin_unlock_wait() with lock/unlock pair
  2017-07-31 22:57 ` [PATCH v2 tip/core/rcu 0/10] Remove spin_unlock_wait() Paul E. McKenney
                     ` (3 preceding siblings ...)
  2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 04/10] sched: " Paul E. McKenney
@ 2017-07-31 22:58   ` Paul E. McKenney
  2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 06/10] exit: " Paul E. McKenney
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-31 22:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney, Ingo Molnar, Will Deacon, Alan Stern,
	Andrea Parri, Linus Torvalds

There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore replaces the spin_unlock_wait() call in
completion_done() with spin_lock() followed immediately by spin_unlock().
This should be safe from a performance perspective because the lock
will be held only briefly, given that the wakeup happens really quickly.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 kernel/sched/completion.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index 13fc5ae9bf2f..3feb218171a9 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -307,14 +307,9 @@ bool completion_done(struct completion *x)
 	 * If ->done, we need to wait for complete() to release ->wait.lock
 	 * otherwise we can end up freeing the completion before complete()
 	 * is done referencing it.
-	 *
-	 * The RMB pairs with complete()'s RELEASE of ->wait.lock and orders
-	 * the loads of ->done and ->wait.lock such that we cannot observe
-	 * the lock before complete() acquires it while observing the ->done
-	 * after it's acquired the lock.
 	 */
-	smp_rmb();
-	spin_unlock_wait(&x->wait.lock);
+	spin_lock_irq(&x->wait.lock);
+	spin_unlock_irq(&x->wait.lock);
 	return true;
 }
 EXPORT_SYMBOL(completion_done);
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread
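
The scenario being avoided is the classic "free an object whose lock a
last user may still hold" problem.  A generic sketch of the idiom, with
made-up names (struct widget, widget_free_when_idle()) that are not
taken from the completion code:

	#include <linux/slab.h>
	#include <linux/spinlock.h>

	struct widget {
		spinlock_t lock;
		/* ... payload ... */
	};

	/*
	 * Caller guarantees no new users can arrive; the only remaining
	 * concern is a last user that has published its result but is
	 * still inside w->lock, much as complete() may still hold
	 * ->wait.lock when completion_done() observes ->done != 0.
	 */
	static void widget_free_when_idle(struct widget *w)
	{
		/*
		 * Empty critical section: cannot be entered until any
		 * straggling holder of w->lock has released it, which is
		 * the guarantee the removed spin_unlock_wait() provided.
		 */
		spin_lock_irq(&w->lock);
		spin_unlock_irq(&w->lock);
		kfree(w);
	}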

* [PATCH v2 tip/core/rcu 06/10] exit: Replace spin_unlock_wait() with lock/unlock pair
  2017-07-31 22:57 ` [PATCH v2 tip/core/rcu 0/10] Remove spin_unlock_wait() Paul E. McKenney
                     ` (4 preceding siblings ...)
  2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 05/10] completion: " Paul E. McKenney
@ 2017-07-31 22:58   ` Paul E. McKenney
  2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 07/10] ipc: " Paul E. McKenney
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-31 22:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney, Will Deacon, Alan Stern, Andrea Parri,
	Linus Torvalds

There is no agreed-upon definition of spin_unlock_wait()'s semantics, and
it appears that all callers could do just as well with a lock/unlock pair.
This commit therefore replaces the spin_unlock_wait() call in do_exit()
with spin_lock() followed immediately by spin_unlock().  This should be
safe from a performance perspective because the lock is a per-task lock,
and this is happening only at task-exit time.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 kernel/exit.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index c5548faa9f37..abfbcf66e5c0 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -819,7 +819,8 @@ void __noreturn do_exit(long code)
 	 * Ensure that we must observe the pi_state in exit_mm() ->
 	 * mm_release() -> exit_pi_state_list().
 	 */
-	raw_spin_unlock_wait(&tsk->pi_lock);
+	raw_spin_lock_irq(&tsk->pi_lock);
+	raw_spin_unlock_irq(&tsk->pi_lock);
 
 	if (unlikely(in_atomic())) {
 		pr_info("note: %s[%d] exited with preempt_count %d\n",
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 tip/core/rcu 07/10] ipc: Replace spin_unlock_wait() with lock/unlock pair
  2017-07-31 22:57 ` [PATCH v2 tip/core/rcu 0/10] Remove spin_unlock_wait() Paul E. McKenney
                     ` (5 preceding siblings ...)
  2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 06/10] exit: " Paul E. McKenney
@ 2017-07-31 22:58   ` Paul E. McKenney
  2017-07-31 22:58     ` Paul E. McKenney
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-31 22:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney, Davidlohr Bueso, Will Deacon, Alan Stern,
	Andrea Parri, Linus Torvalds

There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore replaces the spin_unlock_wait() call in
exit_sem() with spin_lock() followed immediately by spin_unlock().
This should be safe from a performance perspective because exit_sem()
is rarely invoked in production.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
---
 ipc/sem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index 9e70cd7a17da..2570830e29fc 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -2091,7 +2091,8 @@ void exit_sem(struct task_struct *tsk)
 			 * possibility where we exit while freeary() didn't
 			 * finish unlocking sem_undo_list.
 			 */
-			spin_unlock_wait(&ulp->lock);
+			spin_lock(&ulp->lock);
+			spin_unlock(&ulp->lock);
 			rcu_read_unlock();
 			break;
 		}
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 tip/core/rcu 08/10] drivers/ata: Replace spin_unlock_wait() with lock/unlock pair
  2017-07-31 22:57 ` [PATCH v2 tip/core/rcu 0/10] Remove spin_unlock_wait() Paul E. McKenney
@ 2017-07-31 22:58     ` Paul E. McKenney
  2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 02/10] net/netfilter/nf_conntrack_core: Fix net_conntrack_lock() Paul E. McKenney
                       ` (8 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-31 22:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney, linux-ide, Will Deacon, Alan Stern,
	Andrea Parri, Linus Torvalds

There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore eliminates the spin_unlock_wait() call and
associated else-clause and hoists the then-clause's lock and unlock out of
the "if" statement.  This should be safe from a performance perspective
because according to Tejun there should be few if any drivers that don't
set their own error handler.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <linux-ide@vger.kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/ata/libata-eh.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index b70bcf6d2914..b325db27eb8c 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -645,12 +645,11 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
 	 * completions are honored.  A scmd is determined to have
 	 * timed out iff its associated qc is active and not failed.
 	 */
+	spin_lock_irqsave(ap->lock, flags);
 	if (ap->ops->error_handler) {
 		struct scsi_cmnd *scmd, *tmp;
 		int nr_timedout = 0;
 
-		spin_lock_irqsave(ap->lock, flags);
-
 		/* This must occur under the ap->lock as we don't want
 		   a polled recovery to race the real interrupt handler
 
@@ -700,12 +699,11 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
 		if (nr_timedout)
 			__ata_port_freeze(ap);
 
-		spin_unlock_irqrestore(ap->lock, flags);
 
 		/* initialize eh_tries */
 		ap->eh_tries = ATA_EH_MAX_TRIES;
-	} else
-		spin_unlock_wait(ap->lock);
+	}
+	spin_unlock_irqrestore(ap->lock, flags);
 
 }
 EXPORT_SYMBOL(ata_scsi_cmd_error_handler);
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 tip/core/rcu 09/10] locking: Remove spin_unlock_wait() generic definitions
  2017-07-31 22:57 ` [PATCH v2 tip/core/rcu 0/10] Remove spin_unlock_wait() Paul E. McKenney
                     ` (7 preceding siblings ...)
  2017-07-31 22:58     ` Paul E. McKenney
@ 2017-07-31 22:58   ` Paul E. McKenney
  2017-07-31 22:58     ` Paul E. McKenney
  9 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-31 22:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney, Arnd Bergmann, Ingo Molnar, Will Deacon,
	Alan Stern, Andrea Parri, Linus Torvalds

There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore removes spin_unlock_wait() and related
definitions from core code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/asm-generic/qspinlock.h |  14 -----
 include/linux/spinlock.h        |  11 ----
 include/linux/spinlock_up.h     |   6 ---
 kernel/locking/qspinlock.c      | 117 ----------------------------------------
 4 files changed, 148 deletions(-)

diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
index 9f0681bf1e87..66260777d644 100644
--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -22,17 +22,6 @@
 #include <asm-generic/qspinlock_types.h>
 
 /**
- * queued_spin_unlock_wait - wait until the _current_ lock holder releases the lock
- * @lock : Pointer to queued spinlock structure
- *
- * There is a very slight possibility of live-lock if the lockers keep coming
- * and the waiter is just unfortunate enough to not see any unlock state.
- */
-#ifndef queued_spin_unlock_wait
-extern void queued_spin_unlock_wait(struct qspinlock *lock);
-#endif
-
-/**
  * queued_spin_is_locked - is the spinlock locked?
  * @lock: Pointer to queued spinlock structure
  * Return: 1 if it is locked, 0 otherwise
@@ -41,8 +30,6 @@ extern void queued_spin_unlock_wait(struct qspinlock *lock);
 static __always_inline int queued_spin_is_locked(struct qspinlock *lock)
 {
 	/*
-	 * See queued_spin_unlock_wait().
-	 *
 	 * Any !0 state indicates it is locked, even if _Q_LOCKED_VAL
 	 * isn't immediately observable.
 	 */
@@ -135,6 +122,5 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
 #define arch_spin_trylock(l)		queued_spin_trylock(l)
 #define arch_spin_unlock(l)		queued_spin_unlock(l)
 #define arch_spin_lock_flags(l, f)	queued_spin_lock(l)
-#define arch_spin_unlock_wait(l)	queued_spin_unlock_wait(l)
 
 #endif /* __ASM_GENERIC_QSPINLOCK_H */
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 59248dcc6ef3..ef018a6e4985 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -130,12 +130,6 @@ do {								\
 #define smp_mb__before_spinlock()	smp_wmb()
 #endif
 
-/**
- * raw_spin_unlock_wait - wait until the spinlock gets unlocked
- * @lock: the spinlock in question.
- */
-#define raw_spin_unlock_wait(lock)	arch_spin_unlock_wait(&(lock)->raw_lock)
-
 #ifdef CONFIG_DEBUG_SPINLOCK
  extern void do_raw_spin_lock(raw_spinlock_t *lock) __acquires(lock);
 #define do_raw_spin_lock_flags(lock, flags) do_raw_spin_lock(lock)
@@ -369,11 +363,6 @@ static __always_inline int spin_trylock_irq(spinlock_t *lock)
 	raw_spin_trylock_irqsave(spinlock_check(lock), flags); \
 })
 
-static __always_inline void spin_unlock_wait(spinlock_t *lock)
-{
-	raw_spin_unlock_wait(&lock->rlock);
-}
-
 static __always_inline int spin_is_locked(spinlock_t *lock)
 {
 	return raw_spin_is_locked(&lock->rlock);
diff --git a/include/linux/spinlock_up.h b/include/linux/spinlock_up.h
index 0d9848de677d..612fb530af41 100644
--- a/include/linux/spinlock_up.h
+++ b/include/linux/spinlock_up.h
@@ -26,11 +26,6 @@
 #ifdef CONFIG_DEBUG_SPINLOCK
 #define arch_spin_is_locked(x)		((x)->slock == 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->slock, VAL);
-}
-
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
 	lock->slock = 0;
@@ -73,7 +68,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 
 #else /* DEBUG_SPINLOCK */
 #define arch_spin_is_locked(lock)	((void)(lock), 0)
-#define arch_spin_unlock_wait(lock)	do { barrier(); (void)(lock); } while (0)
 /* for sched/core.c and kernel_lock.c: */
 # define arch_spin_lock(lock)		do { barrier(); (void)(lock); } while (0)
 # define arch_spin_lock_flags(lock, flags)	do { barrier(); (void)(lock); } while (0)
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index fd24153e8a48..294294c71ba4 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -268,123 +268,6 @@ static __always_inline u32  __pv_wait_head_or_lock(struct qspinlock *lock,
 #define queued_spin_lock_slowpath	native_queued_spin_lock_slowpath
 #endif
 
-/*
- * Various notes on spin_is_locked() and spin_unlock_wait(), which are
- * 'interesting' functions:
- *
- * PROBLEM: some architectures have an interesting issue with atomic ACQUIRE
- * operations in that the ACQUIRE applies to the LOAD _not_ the STORE (ARM64,
- * PPC). Also qspinlock has a similar issue per construction, the setting of
- * the locked byte can be unordered acquiring the lock proper.
- *
- * This gets to be 'interesting' in the following cases, where the /should/s
- * end up false because of this issue.
- *
- *
- * CASE 1:
- *
- * So the spin_is_locked() correctness issue comes from something like:
- *
- *   CPU0				CPU1
- *
- *   global_lock();			local_lock(i)
- *     spin_lock(&G)			  spin_lock(&L[i])
- *     for (i)				  if (!spin_is_locked(&G)) {
- *       spin_unlock_wait(&L[i]);	    smp_acquire__after_ctrl_dep();
- *					    return;
- *					  }
- *					  // deal with fail
- *
- * Where it is important CPU1 sees G locked or CPU0 sees L[i] locked such
- * that there is exclusion between the two critical sections.
- *
- * The load from spin_is_locked(&G) /should/ be constrained by the ACQUIRE from
- * spin_lock(&L[i]), and similarly the load(s) from spin_unlock_wait(&L[i])
- * /should/ be constrained by the ACQUIRE from spin_lock(&G).
- *
- * Similarly, later stuff is constrained by the ACQUIRE from CTRL+RMB.
- *
- *
- * CASE 2:
- *
- * For spin_unlock_wait() there is a second correctness issue, namely:
- *
- *   CPU0				CPU1
- *
- *   flag = set;
- *   smp_mb();				spin_lock(&l)
- *   spin_unlock_wait(&l);		if (!flag)
- *					  // add to lockless list
- *					spin_unlock(&l);
- *   // iterate lockless list
- *
- * Which wants to ensure that CPU1 will stop adding bits to the list and CPU0
- * will observe the last entry on the list (if spin_unlock_wait() had ACQUIRE
- * semantics etc..)
- *
- * Where flag /should/ be ordered against the locked store of l.
- */
-
-/*
- * queued_spin_lock_slowpath() can (load-)ACQUIRE the lock before
- * issuing an _unordered_ store to set _Q_LOCKED_VAL.
- *
- * This means that the store can be delayed, but no later than the
- * store-release from the unlock. This means that simply observing
- * _Q_LOCKED_VAL is not sufficient to determine if the lock is acquired.
- *
- * There are two paths that can issue the unordered store:
- *
- *  (1) clear_pending_set_locked():	*,1,0 -> *,0,1
- *
- *  (2) set_locked():			t,0,0 -> t,0,1 ; t != 0
- *      atomic_cmpxchg_relaxed():	t,0,0 -> 0,0,1
- *
- * However, in both cases we have other !0 state we've set before to queue
- * ourseves:
- *
- * For (1) we have the atomic_cmpxchg_acquire() that set _Q_PENDING_VAL, our
- * load is constrained by that ACQUIRE to not pass before that, and thus must
- * observe the store.
- *
- * For (2) we have a more intersting scenario. We enqueue ourselves using
- * xchg_tail(), which ends up being a RELEASE. This in itself is not
- * sufficient, however that is followed by an smp_cond_acquire() on the same
- * word, giving a RELEASE->ACQUIRE ordering. This again constrains our load and
- * guarantees we must observe that store.
- *
- * Therefore both cases have other !0 state that is observable before the
- * unordered locked byte store comes through. This means we can use that to
- * wait for the lock store, and then wait for an unlock.
- */
-#ifndef queued_spin_unlock_wait
-void queued_spin_unlock_wait(struct qspinlock *lock)
-{
-	u32 val;
-
-	for (;;) {
-		val = atomic_read(&lock->val);
-
-		if (!val) /* not locked, we're done */
-			goto done;
-
-		if (val & _Q_LOCKED_MASK) /* locked, go wait for unlock */
-			break;
-
-		/* not locked, but pending, wait until we observe the lock */
-		cpu_relax();
-	}
-
-	/* any unlock is good */
-	while (atomic_read(&lock->val) & _Q_LOCKED_MASK)
-		cpu_relax();
-
-done:
-	smp_acquire__after_ctrl_dep();
-}
-EXPORT_SYMBOL(queued_spin_unlock_wait);
-#endif
-
 #endif /* _GEN_PV_LOCK_SLOWPATH */
 
 /**
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 tip/core/rcu 10/10] arch: Remove spin_unlock_wait() arch-specific definitions
  2017-07-31 22:57 ` [PATCH v2 tip/core/rcu 0/10] Remove spin_unlock_wait() Paul E. McKenney
@ 2017-07-31 22:58     ` Paul E. McKenney
  2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 02/10] net/netfilter/nf_conntrack_core: Fix net_conntrack_lock() Paul E. McKenney
                       ` (8 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-31 22:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney, linux-arch, Alan Stern, Andrea Parri,
	Linus Torvalds

There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore removes the underlying arch-specific
arch_spin_unlock_wait() implementations from all architectures providing them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
---
 arch/alpha/include/asm/spinlock.h    |  5 ----
 arch/arc/include/asm/spinlock.h      |  5 ----
 arch/arm/include/asm/spinlock.h      | 16 ----------
 arch/arm64/include/asm/spinlock.h    | 58 ++++--------------------------------
 arch/blackfin/include/asm/spinlock.h |  5 ----
 arch/hexagon/include/asm/spinlock.h  |  5 ----
 arch/ia64/include/asm/spinlock.h     | 21 -------------
 arch/m32r/include/asm/spinlock.h     |  5 ----
 arch/metag/include/asm/spinlock.h    |  5 ----
 arch/mn10300/include/asm/spinlock.h  |  5 ----
 arch/parisc/include/asm/spinlock.h   |  7 -----
 arch/powerpc/include/asm/spinlock.h  | 33 --------------------
 arch/s390/include/asm/spinlock.h     |  7 -----
 arch/sh/include/asm/spinlock-cas.h   |  5 ----
 arch/sh/include/asm/spinlock-llsc.h  |  5 ----
 arch/sparc/include/asm/spinlock_32.h |  5 ----
 arch/tile/include/asm/spinlock_32.h  |  2 --
 arch/tile/include/asm/spinlock_64.h  |  2 --
 arch/tile/lib/spinlock_32.c          | 23 --------------
 arch/tile/lib/spinlock_64.c          | 22 --------------
 arch/xtensa/include/asm/spinlock.h   |  5 ----
 21 files changed, 5 insertions(+), 241 deletions(-)

diff --git a/arch/alpha/include/asm/spinlock.h b/arch/alpha/include/asm/spinlock.h
index a40b9fc0c6c3..718ac0b64adf 100644
--- a/arch/alpha/include/asm/spinlock.h
+++ b/arch/alpha/include/asm/spinlock.h
@@ -16,11 +16,6 @@
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 #define arch_spin_is_locked(x)	((x)->lock != 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 {
         return lock.lock == 0;
diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
index 233d5ffe6ec7..a325e6a36523 100644
--- a/arch/arc/include/asm/spinlock.h
+++ b/arch/arc/include/asm/spinlock.h
@@ -16,11 +16,6 @@
 #define arch_spin_is_locked(x)	((x)->slock != __ARCH_SPIN_LOCK_UNLOCKED__)
 #define arch_spin_lock_flags(lock, flags)	arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->slock, !VAL);
-}
-
 #ifdef CONFIG_ARC_HAS_LLSC
 
 static inline void arch_spin_lock(arch_spinlock_t *lock)
diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
index 4bec45442072..c030143c18c6 100644
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -52,22 +52,6 @@ static inline void dsb_sev(void)
  * memory.
  */
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	u16 owner = READ_ONCE(lock->tickets.owner);
-
-	for (;;) {
-		arch_spinlock_t tmp = READ_ONCE(*lock);
-
-		if (tmp.tickets.owner == tmp.tickets.next ||
-		    tmp.tickets.owner != owner)
-			break;
-
-		wfe();
-	}
-	smp_acquire__after_ctrl_dep();
-}
-
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
 static inline void arch_spin_lock(arch_spinlock_t *lock)
diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index cae331d553f8..f445bd7f2b9f 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -26,58 +26,6 @@
  * The memory barriers are implicit with the load-acquire and store-release
  * instructions.
  */
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	unsigned int tmp;
-	arch_spinlock_t lockval;
-	u32 owner;
-
-	/*
-	 * Ensure prior spin_lock operations to other locks have completed
-	 * on this CPU before we test whether "lock" is locked.
-	 */
-	smp_mb();
-	owner = READ_ONCE(lock->owner) << 16;
-
-	asm volatile(
-"	sevl\n"
-"1:	wfe\n"
-"2:	ldaxr	%w0, %2\n"
-	/* Is the lock free? */
-"	eor	%w1, %w0, %w0, ror #16\n"
-"	cbz	%w1, 3f\n"
-	/* Lock taken -- has there been a subsequent unlock->lock transition? */
-"	eor	%w1, %w3, %w0, lsl #16\n"
-"	cbz	%w1, 1b\n"
-	/*
-	 * The owner has been updated, so there was an unlock->lock
-	 * transition that we missed. That means we can rely on the
-	 * store-release of the unlock operation paired with the
-	 * load-acquire of the lock operation to publish any of our
-	 * previous stores to the new lock owner and therefore don't
-	 * need to bother with the writeback below.
-	 */
-"	b	4f\n"
-"3:\n"
-	/*
-	 * Serialise against any concurrent lockers by writing back the
-	 * unlocked lock value
-	 */
-	ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-"	stxr	%w1, %w0, %2\n"
-	__nops(2),
-	/* LSE atomics */
-"	mov	%w1, %w0\n"
-"	cas	%w0, %w0, %2\n"
-"	eor	%w1, %w1, %w0\n")
-	/* Somebody else wrote to the lock, GOTO 10 and reload the value */
-"	cbnz	%w1, 2b\n"
-"4:"
-	: "=&r" (lockval), "=&r" (tmp), "+Q" (*lock)
-	: "r" (owner)
-	: "memory");
-}
 
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
@@ -176,7 +124,11 @@ static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 
 static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
-	smp_mb(); /* See arch_spin_unlock_wait */
+	/*
+	 * Ensure prior spin_lock operations to other locks have completed
+	 * on this CPU before we test whether "lock" is locked.
+	 */
+	smp_mb(); /* ^^^ */
 	return !arch_spin_value_unlocked(READ_ONCE(*lock));
 }
 
diff --git a/arch/blackfin/include/asm/spinlock.h b/arch/blackfin/include/asm/spinlock.h
index c58f4a83ed6f..f6431439d15d 100644
--- a/arch/blackfin/include/asm/spinlock.h
+++ b/arch/blackfin/include/asm/spinlock.h
@@ -48,11 +48,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	__raw_spin_unlock_asm(&lock->lock);
 }
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 static inline int arch_read_can_lock(arch_rwlock_t *rw)
 {
 	return __raw_uncached_fetch_asm(&rw->lock) > 0;
diff --git a/arch/hexagon/include/asm/spinlock.h b/arch/hexagon/include/asm/spinlock.h
index a1c55788c5d6..53a8d5885887 100644
--- a/arch/hexagon/include/asm/spinlock.h
+++ b/arch/hexagon/include/asm/spinlock.h
@@ -179,11 +179,6 @@ static inline unsigned int arch_spin_trylock(arch_spinlock_t *lock)
  */
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 #define arch_spin_is_locked(x) ((x)->lock != 0)
 
 #define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
diff --git a/arch/ia64/include/asm/spinlock.h b/arch/ia64/include/asm/spinlock.h
index ca9e76149a4a..df2c121164b8 100644
--- a/arch/ia64/include/asm/spinlock.h
+++ b/arch/ia64/include/asm/spinlock.h
@@ -76,22 +76,6 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 	ACCESS_ONCE(*p) = (tmp + 2) & ~1;
 }
 
-static __always_inline void __ticket_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	int	*p = (int *)&lock->lock, ticket;
-
-	ia64_invala();
-
-	for (;;) {
-		asm volatile ("ld4.c.nc %0=[%1]" : "=r"(ticket) : "r"(p) : "memory");
-		if (!(((ticket >> TICKET_SHIFT) ^ ticket) & TICKET_MASK))
-			return;
-		cpu_relax();
-	}
-
-	smp_acquire__after_ctrl_dep();
-}
-
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
 	long tmp = ACCESS_ONCE(lock->lock);
@@ -143,11 +127,6 @@ static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
 	arch_spin_lock(lock);
 }
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	__ticket_spin_unlock_wait(lock);
-}
-
 #define arch_read_can_lock(rw)		(*(volatile int *)(rw) >= 0)
 #define arch_write_can_lock(rw)	(*(volatile int *)(rw) == 0)
 
diff --git a/arch/m32r/include/asm/spinlock.h b/arch/m32r/include/asm/spinlock.h
index 323c7fc953cd..a56825592b90 100644
--- a/arch/m32r/include/asm/spinlock.h
+++ b/arch/m32r/include/asm/spinlock.h
@@ -30,11 +30,6 @@
 #define arch_spin_is_locked(x)		(*(volatile int *)(&(x)->slock) <= 0)
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->slock, VAL > 0);
-}
-
 /**
  * arch_spin_trylock - Try spin lock and return a result
  * @lock: Pointer to the lock variable
diff --git a/arch/metag/include/asm/spinlock.h b/arch/metag/include/asm/spinlock.h
index c0c7a22be1ae..ddf7fe5708a6 100644
--- a/arch/metag/include/asm/spinlock.h
+++ b/arch/metag/include/asm/spinlock.h
@@ -15,11 +15,6 @@
  * locked.
  */
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
 #define	arch_read_lock_flags(lock, flags) arch_read_lock(lock)
diff --git a/arch/mn10300/include/asm/spinlock.h b/arch/mn10300/include/asm/spinlock.h
index 9c7b8f7942d8..fe413b41df6c 100644
--- a/arch/mn10300/include/asm/spinlock.h
+++ b/arch/mn10300/include/asm/spinlock.h
@@ -26,11 +26,6 @@
 
 #define arch_spin_is_locked(x)	(*(volatile signed char *)(&(x)->slock) != 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->slock, !VAL);
-}
-
 static inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
 	asm volatile(
diff --git a/arch/parisc/include/asm/spinlock.h b/arch/parisc/include/asm/spinlock.h
index e32936cd7f10..55bfe4affca3 100644
--- a/arch/parisc/include/asm/spinlock.h
+++ b/arch/parisc/include/asm/spinlock.h
@@ -14,13 +14,6 @@ static inline int arch_spin_is_locked(arch_spinlock_t *x)
 
 #define arch_spin_lock(lock) arch_spin_lock_flags(lock, 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *x)
-{
-	volatile unsigned int *a = __ldcw_align(x);
-
-	smp_cond_load_acquire(a, VAL);
-}
-
 static inline void arch_spin_lock_flags(arch_spinlock_t *x,
 					 unsigned long flags)
 {
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index 8c1b913de6d7..d256e448ea49 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -170,39 +170,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	lock->slock = 0;
 }
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	arch_spinlock_t lock_val;
-
-	smp_mb();
-
-	/*
-	 * Atomically load and store back the lock value (unchanged). This
-	 * ensures that our observation of the lock value is ordered with
-	 * respect to other lock operations.
-	 */
-	__asm__ __volatile__(
-"1:	" PPC_LWARX(%0, 0, %2, 0) "\n"
-"	stwcx. %0, 0, %2\n"
-"	bne- 1b\n"
-	: "=&r" (lock_val), "+m" (*lock)
-	: "r" (lock)
-	: "cr0", "xer");
-
-	if (arch_spin_value_unlocked(lock_val))
-		goto out;
-
-	while (lock->slock) {
-		HMT_low();
-		if (SHARED_PROCESSOR)
-			__spin_yield(lock);
-	}
-	HMT_medium();
-
-out:
-	smp_mb();
-}
-
 /*
  * Read-write spinlocks, allowing multiple readers
  * but only one writer.
diff --git a/arch/s390/include/asm/spinlock.h b/arch/s390/include/asm/spinlock.h
index f7838ecd83c6..217ee5210c32 100644
--- a/arch/s390/include/asm/spinlock.h
+++ b/arch/s390/include/asm/spinlock.h
@@ -98,13 +98,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lp)
 		: "cc", "memory");
 }
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	while (arch_spin_is_locked(lock))
-		arch_spin_relax(lock);
-	smp_acquire__after_ctrl_dep();
-}
-
 /*
  * Read-write spinlocks, allowing multiple readers
  * but only one writer.
diff --git a/arch/sh/include/asm/spinlock-cas.h b/arch/sh/include/asm/spinlock-cas.h
index c46e8cc7b515..5ed7dbbd94ff 100644
--- a/arch/sh/include/asm/spinlock-cas.h
+++ b/arch/sh/include/asm/spinlock-cas.h
@@ -29,11 +29,6 @@ static inline unsigned __sl_cas(volatile unsigned *p, unsigned old, unsigned new
 #define arch_spin_is_locked(x)		((x)->lock <= 0)
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, VAL > 0);
-}
-
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
 	while (!__sl_cas(&lock->lock, 1, 0));
diff --git a/arch/sh/include/asm/spinlock-llsc.h b/arch/sh/include/asm/spinlock-llsc.h
index cec78143fa83..f77263aae760 100644
--- a/arch/sh/include/asm/spinlock-llsc.h
+++ b/arch/sh/include/asm/spinlock-llsc.h
@@ -21,11 +21,6 @@
 #define arch_spin_is_locked(x)		((x)->lock <= 0)
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, VAL > 0);
-}
-
 /*
  * Simple spin lock operations.  There are two variants, one clears IRQ's
  * on the local processor, one does not.
diff --git a/arch/sparc/include/asm/spinlock_32.h b/arch/sparc/include/asm/spinlock_32.h
index 8011e79f59c9..67345b2dc408 100644
--- a/arch/sparc/include/asm/spinlock_32.h
+++ b/arch/sparc/include/asm/spinlock_32.h
@@ -14,11 +14,6 @@
 
 #define arch_spin_is_locked(lock) (*((volatile unsigned char *)(lock)) != 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
 	__asm__ __volatile__(
diff --git a/arch/tile/include/asm/spinlock_32.h b/arch/tile/include/asm/spinlock_32.h
index b14b1ba5bf9c..cba8ba9b8da6 100644
--- a/arch/tile/include/asm/spinlock_32.h
+++ b/arch/tile/include/asm/spinlock_32.h
@@ -64,8 +64,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	lock->current_ticket = old_ticket + TICKET_QUANTUM;
 }
 
-void arch_spin_unlock_wait(arch_spinlock_t *lock);
-
 /*
  * Read-write spinlocks, allowing multiple readers
  * but only one writer.
diff --git a/arch/tile/include/asm/spinlock_64.h b/arch/tile/include/asm/spinlock_64.h
index b9718fb4e74a..9a2c2d605752 100644
--- a/arch/tile/include/asm/spinlock_64.h
+++ b/arch/tile/include/asm/spinlock_64.h
@@ -58,8 +58,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	__insn_fetchadd4(&lock->lock, 1U << __ARCH_SPIN_CURRENT_SHIFT);
 }
 
-void arch_spin_unlock_wait(arch_spinlock_t *lock);
-
 void arch_spin_lock_slow(arch_spinlock_t *lock, u32 val);
 
 /* Grab the "next" ticket number and bump it atomically.
diff --git a/arch/tile/lib/spinlock_32.c b/arch/tile/lib/spinlock_32.c
index 076c6cc43113..db9333f2447c 100644
--- a/arch/tile/lib/spinlock_32.c
+++ b/arch/tile/lib/spinlock_32.c
@@ -62,29 +62,6 @@ int arch_spin_trylock(arch_spinlock_t *lock)
 }
 EXPORT_SYMBOL(arch_spin_trylock);
 
-void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	u32 iterations = 0;
-	int curr = READ_ONCE(lock->current_ticket);
-	int next = READ_ONCE(lock->next_ticket);
-
-	/* Return immediately if unlocked. */
-	if (next == curr)
-		return;
-
-	/* Wait until the current locker has released the lock. */
-	do {
-		delay_backoff(iterations++);
-	} while (READ_ONCE(lock->current_ticket) == curr);
-
-	/*
-	 * The TILE architecture doesn't do read speculation; therefore
-	 * a control dependency guarantees a LOAD->{LOAD,STORE} order.
-	 */
-	barrier();
-}
-EXPORT_SYMBOL(arch_spin_unlock_wait);
-
 /*
  * The low byte is always reserved to be the marker for a "tns" operation
  * since the low bit is set to "1" by a tns.  The next seven bits are
diff --git a/arch/tile/lib/spinlock_64.c b/arch/tile/lib/spinlock_64.c
index a4b5b2cbce93..de414c22892f 100644
--- a/arch/tile/lib/spinlock_64.c
+++ b/arch/tile/lib/spinlock_64.c
@@ -62,28 +62,6 @@ int arch_spin_trylock(arch_spinlock_t *lock)
 }
 EXPORT_SYMBOL(arch_spin_trylock);
 
-void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	u32 iterations = 0;
-	u32 val = READ_ONCE(lock->lock);
-	u32 curr = arch_spin_current(val);
-
-	/* Return immediately if unlocked. */
-	if (arch_spin_next(val) == curr)
-		return;
-
-	/* Wait until the current locker has released the lock. */
-	do {
-		delay_backoff(iterations++);
-	} while (arch_spin_current(READ_ONCE(lock->lock)) == curr);
-
-	/*
-	 * The TILE architecture doesn't do read speculation; therefore
-	 * a control dependency guarantees a LOAD->{LOAD,STORE} order.
-	 */
-	barrier();
-}
-EXPORT_SYMBOL(arch_spin_unlock_wait);
 
 /*
  * If the read lock fails due to a writer, we retry periodically
diff --git a/arch/xtensa/include/asm/spinlock.h b/arch/xtensa/include/asm/spinlock.h
index a36221cf6363..3bb49681ee24 100644
--- a/arch/xtensa/include/asm/spinlock.h
+++ b/arch/xtensa/include/asm/spinlock.h
@@ -33,11 +33,6 @@
 
 #define arch_spin_is_locked(x) ((x)->slock != 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->slock, !VAL);
-}
-
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
 static inline void arch_spin_lock(arch_spinlock_t *lock)
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 tip/core/rcu 10/10] arch: Remove spin_unlock_wait() arch-specific definitions
@ 2017-07-31 22:58     ` Paul E. McKenney
  0 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-07-31 22:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney, linux-arch, Alan Stern, Andrea Parri,
	Linus Torvalds

There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore removes the underlying arch-specific
arch_spin_unlock_wait() for all architectures providing them.
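
For reference, the idiom used by the earlier patches in this series to
replace spin_unlock_wait() is roughly the following sketch ("mylock" is
just an illustrative name; the real call sites differ in lock type and
irq handling):

	/* Old: wait for any current holder of mylock to release it. */
	spin_unlock_wait(&mylock);

	/*
	 * New: a full acquire/release pair provides at least as much
	 * ordering as spin_unlock_wait() was assumed to provide.
	 */
	spin_lock(&mylock);
	spin_unlock(&mylock);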

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
---
 arch/alpha/include/asm/spinlock.h    |  5 ----
 arch/arc/include/asm/spinlock.h      |  5 ----
 arch/arm/include/asm/spinlock.h      | 16 ----------
 arch/arm64/include/asm/spinlock.h    | 58 ++++--------------------------------
 arch/blackfin/include/asm/spinlock.h |  5 ----
 arch/hexagon/include/asm/spinlock.h  |  5 ----
 arch/ia64/include/asm/spinlock.h     | 21 -------------
 arch/m32r/include/asm/spinlock.h     |  5 ----
 arch/metag/include/asm/spinlock.h    |  5 ----
 arch/mn10300/include/asm/spinlock.h  |  5 ----
 arch/parisc/include/asm/spinlock.h   |  7 -----
 arch/powerpc/include/asm/spinlock.h  | 33 --------------------
 arch/s390/include/asm/spinlock.h     |  7 -----
 arch/sh/include/asm/spinlock-cas.h   |  5 ----
 arch/sh/include/asm/spinlock-llsc.h  |  5 ----
 arch/sparc/include/asm/spinlock_32.h |  5 ----
 arch/tile/include/asm/spinlock_32.h  |  2 --
 arch/tile/include/asm/spinlock_64.h  |  2 --
 arch/tile/lib/spinlock_32.c          | 23 --------------
 arch/tile/lib/spinlock_64.c          | 22 --------------
 arch/xtensa/include/asm/spinlock.h   |  5 ----
 21 files changed, 5 insertions(+), 241 deletions(-)

diff --git a/arch/alpha/include/asm/spinlock.h b/arch/alpha/include/asm/spinlock.h
index a40b9fc0c6c3..718ac0b64adf 100644
--- a/arch/alpha/include/asm/spinlock.h
+++ b/arch/alpha/include/asm/spinlock.h
@@ -16,11 +16,6 @@
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 #define arch_spin_is_locked(x)	((x)->lock != 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 {
         return lock.lock == 0;
diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
index 233d5ffe6ec7..a325e6a36523 100644
--- a/arch/arc/include/asm/spinlock.h
+++ b/arch/arc/include/asm/spinlock.h
@@ -16,11 +16,6 @@
 #define arch_spin_is_locked(x)	((x)->slock != __ARCH_SPIN_LOCK_UNLOCKED__)
 #define arch_spin_lock_flags(lock, flags)	arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->slock, !VAL);
-}
-
 #ifdef CONFIG_ARC_HAS_LLSC
 
 static inline void arch_spin_lock(arch_spinlock_t *lock)
diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
index 4bec45442072..c030143c18c6 100644
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -52,22 +52,6 @@ static inline void dsb_sev(void)
  * memory.
  */
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	u16 owner = READ_ONCE(lock->tickets.owner);
-
-	for (;;) {
-		arch_spinlock_t tmp = READ_ONCE(*lock);
-
-		if (tmp.tickets.owner == tmp.tickets.next ||
-		    tmp.tickets.owner != owner)
-			break;
-
-		wfe();
-	}
-	smp_acquire__after_ctrl_dep();
-}
-
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
 static inline void arch_spin_lock(arch_spinlock_t *lock)
diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index cae331d553f8..f445bd7f2b9f 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -26,58 +26,6 @@
  * The memory barriers are implicit with the load-acquire and store-release
  * instructions.
  */
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	unsigned int tmp;
-	arch_spinlock_t lockval;
-	u32 owner;
-
-	/*
-	 * Ensure prior spin_lock operations to other locks have completed
-	 * on this CPU before we test whether "lock" is locked.
-	 */
-	smp_mb();
-	owner = READ_ONCE(lock->owner) << 16;
-
-	asm volatile(
-"	sevl\n"
-"1:	wfe\n"
-"2:	ldaxr	%w0, %2\n"
-	/* Is the lock free? */
-"	eor	%w1, %w0, %w0, ror #16\n"
-"	cbz	%w1, 3f\n"
-	/* Lock taken -- has there been a subsequent unlock->lock transition? */
-"	eor	%w1, %w3, %w0, lsl #16\n"
-"	cbz	%w1, 1b\n"
-	/*
-	 * The owner has been updated, so there was an unlock->lock
-	 * transition that we missed. That means we can rely on the
-	 * store-release of the unlock operation paired with the
-	 * load-acquire of the lock operation to publish any of our
-	 * previous stores to the new lock owner and therefore don't
-	 * need to bother with the writeback below.
-	 */
-"	b	4f\n"
-"3:\n"
-	/*
-	 * Serialise against any concurrent lockers by writing back the
-	 * unlocked lock value
-	 */
-	ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-"	stxr	%w1, %w0, %2\n"
-	__nops(2),
-	/* LSE atomics */
-"	mov	%w1, %w0\n"
-"	cas	%w0, %w0, %2\n"
-"	eor	%w1, %w1, %w0\n")
-	/* Somebody else wrote to the lock, GOTO 10 and reload the value */
-"	cbnz	%w1, 2b\n"
-"4:"
-	: "=&r" (lockval), "=&r" (tmp), "+Q" (*lock)
-	: "r" (owner)
-	: "memory");
-}
 
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
@@ -176,7 +124,11 @@ static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 
 static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
-	smp_mb(); /* See arch_spin_unlock_wait */
+	/*
+	 * Ensure prior spin_lock operations to other locks have completed
+	 * on this CPU before we test whether "lock" is locked.
+	 */
+	smp_mb(); /* ^^^ */
 	return !arch_spin_value_unlocked(READ_ONCE(*lock));
 }
 
diff --git a/arch/blackfin/include/asm/spinlock.h b/arch/blackfin/include/asm/spinlock.h
index c58f4a83ed6f..f6431439d15d 100644
--- a/arch/blackfin/include/asm/spinlock.h
+++ b/arch/blackfin/include/asm/spinlock.h
@@ -48,11 +48,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	__raw_spin_unlock_asm(&lock->lock);
 }
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 static inline int arch_read_can_lock(arch_rwlock_t *rw)
 {
 	return __raw_uncached_fetch_asm(&rw->lock) > 0;
diff --git a/arch/hexagon/include/asm/spinlock.h b/arch/hexagon/include/asm/spinlock.h
index a1c55788c5d6..53a8d5885887 100644
--- a/arch/hexagon/include/asm/spinlock.h
+++ b/arch/hexagon/include/asm/spinlock.h
@@ -179,11 +179,6 @@ static inline unsigned int arch_spin_trylock(arch_spinlock_t *lock)
  */
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 #define arch_spin_is_locked(x) ((x)->lock != 0)
 
 #define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
diff --git a/arch/ia64/include/asm/spinlock.h b/arch/ia64/include/asm/spinlock.h
index ca9e76149a4a..df2c121164b8 100644
--- a/arch/ia64/include/asm/spinlock.h
+++ b/arch/ia64/include/asm/spinlock.h
@@ -76,22 +76,6 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 	ACCESS_ONCE(*p) = (tmp + 2) & ~1;
 }
 
-static __always_inline void __ticket_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	int	*p = (int *)&lock->lock, ticket;
-
-	ia64_invala();
-
-	for (;;) {
-		asm volatile ("ld4.c.nc %0=[%1]" : "=r"(ticket) : "r"(p) : "memory");
-		if (!(((ticket >> TICKET_SHIFT) ^ ticket) & TICKET_MASK))
-			return;
-		cpu_relax();
-	}
-
-	smp_acquire__after_ctrl_dep();
-}
-
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
 	long tmp = ACCESS_ONCE(lock->lock);
@@ -143,11 +127,6 @@ static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
 	arch_spin_lock(lock);
 }
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	__ticket_spin_unlock_wait(lock);
-}
-
 #define arch_read_can_lock(rw)		(*(volatile int *)(rw) >= 0)
 #define arch_write_can_lock(rw)	(*(volatile int *)(rw) == 0)
 
diff --git a/arch/m32r/include/asm/spinlock.h b/arch/m32r/include/asm/spinlock.h
index 323c7fc953cd..a56825592b90 100644
--- a/arch/m32r/include/asm/spinlock.h
+++ b/arch/m32r/include/asm/spinlock.h
@@ -30,11 +30,6 @@
 #define arch_spin_is_locked(x)		(*(volatile int *)(&(x)->slock) <= 0)
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->slock, VAL > 0);
-}
-
 /**
  * arch_spin_trylock - Try spin lock and return a result
  * @lock: Pointer to the lock variable
diff --git a/arch/metag/include/asm/spinlock.h b/arch/metag/include/asm/spinlock.h
index c0c7a22be1ae..ddf7fe5708a6 100644
--- a/arch/metag/include/asm/spinlock.h
+++ b/arch/metag/include/asm/spinlock.h
@@ -15,11 +15,6 @@
  * locked.
  */
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
 #define	arch_read_lock_flags(lock, flags) arch_read_lock(lock)
diff --git a/arch/mn10300/include/asm/spinlock.h b/arch/mn10300/include/asm/spinlock.h
index 9c7b8f7942d8..fe413b41df6c 100644
--- a/arch/mn10300/include/asm/spinlock.h
+++ b/arch/mn10300/include/asm/spinlock.h
@@ -26,11 +26,6 @@
 
 #define arch_spin_is_locked(x)	(*(volatile signed char *)(&(x)->slock) != 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->slock, !VAL);
-}
-
 static inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
 	asm volatile(
diff --git a/arch/parisc/include/asm/spinlock.h b/arch/parisc/include/asm/spinlock.h
index e32936cd7f10..55bfe4affca3 100644
--- a/arch/parisc/include/asm/spinlock.h
+++ b/arch/parisc/include/asm/spinlock.h
@@ -14,13 +14,6 @@ static inline int arch_spin_is_locked(arch_spinlock_t *x)
 
 #define arch_spin_lock(lock) arch_spin_lock_flags(lock, 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *x)
-{
-	volatile unsigned int *a = __ldcw_align(x);
-
-	smp_cond_load_acquire(a, VAL);
-}
-
 static inline void arch_spin_lock_flags(arch_spinlock_t *x,
 					 unsigned long flags)
 {
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index 8c1b913de6d7..d256e448ea49 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -170,39 +170,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	lock->slock = 0;
 }
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	arch_spinlock_t lock_val;
-
-	smp_mb();
-
-	/*
-	 * Atomically load and store back the lock value (unchanged). This
-	 * ensures that our observation of the lock value is ordered with
-	 * respect to other lock operations.
-	 */
-	__asm__ __volatile__(
-"1:	" PPC_LWARX(%0, 0, %2, 0) "\n"
-"	stwcx. %0, 0, %2\n"
-"	bne- 1b\n"
-	: "=&r" (lock_val), "+m" (*lock)
-	: "r" (lock)
-	: "cr0", "xer");
-
-	if (arch_spin_value_unlocked(lock_val))
-		goto out;
-
-	while (lock->slock) {
-		HMT_low();
-		if (SHARED_PROCESSOR)
-			__spin_yield(lock);
-	}
-	HMT_medium();
-
-out:
-	smp_mb();
-}
-
 /*
  * Read-write spinlocks, allowing multiple readers
  * but only one writer.
diff --git a/arch/s390/include/asm/spinlock.h b/arch/s390/include/asm/spinlock.h
index f7838ecd83c6..217ee5210c32 100644
--- a/arch/s390/include/asm/spinlock.h
+++ b/arch/s390/include/asm/spinlock.h
@@ -98,13 +98,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lp)
 		: "cc", "memory");
 }
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	while (arch_spin_is_locked(lock))
-		arch_spin_relax(lock);
-	smp_acquire__after_ctrl_dep();
-}
-
 /*
  * Read-write spinlocks, allowing multiple readers
  * but only one writer.
diff --git a/arch/sh/include/asm/spinlock-cas.h b/arch/sh/include/asm/spinlock-cas.h
index c46e8cc7b515..5ed7dbbd94ff 100644
--- a/arch/sh/include/asm/spinlock-cas.h
+++ b/arch/sh/include/asm/spinlock-cas.h
@@ -29,11 +29,6 @@ static inline unsigned __sl_cas(volatile unsigned *p, unsigned old, unsigned new
 #define arch_spin_is_locked(x)		((x)->lock <= 0)
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, VAL > 0);
-}
-
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
 	while (!__sl_cas(&lock->lock, 1, 0));
diff --git a/arch/sh/include/asm/spinlock-llsc.h b/arch/sh/include/asm/spinlock-llsc.h
index cec78143fa83..f77263aae760 100644
--- a/arch/sh/include/asm/spinlock-llsc.h
+++ b/arch/sh/include/asm/spinlock-llsc.h
@@ -21,11 +21,6 @@
 #define arch_spin_is_locked(x)		((x)->lock <= 0)
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, VAL > 0);
-}
-
 /*
  * Simple spin lock operations.  There are two variants, one clears IRQ's
  * on the local processor, one does not.
diff --git a/arch/sparc/include/asm/spinlock_32.h b/arch/sparc/include/asm/spinlock_32.h
index 8011e79f59c9..67345b2dc408 100644
--- a/arch/sparc/include/asm/spinlock_32.h
+++ b/arch/sparc/include/asm/spinlock_32.h
@@ -14,11 +14,6 @@
 
 #define arch_spin_is_locked(lock) (*((volatile unsigned char *)(lock)) != 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->lock, !VAL);
-}
-
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
 	__asm__ __volatile__(
diff --git a/arch/tile/include/asm/spinlock_32.h b/arch/tile/include/asm/spinlock_32.h
index b14b1ba5bf9c..cba8ba9b8da6 100644
--- a/arch/tile/include/asm/spinlock_32.h
+++ b/arch/tile/include/asm/spinlock_32.h
@@ -64,8 +64,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	lock->current_ticket = old_ticket + TICKET_QUANTUM;
 }
 
-void arch_spin_unlock_wait(arch_spinlock_t *lock);
-
 /*
  * Read-write spinlocks, allowing multiple readers
  * but only one writer.
diff --git a/arch/tile/include/asm/spinlock_64.h b/arch/tile/include/asm/spinlock_64.h
index b9718fb4e74a..9a2c2d605752 100644
--- a/arch/tile/include/asm/spinlock_64.h
+++ b/arch/tile/include/asm/spinlock_64.h
@@ -58,8 +58,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	__insn_fetchadd4(&lock->lock, 1U << __ARCH_SPIN_CURRENT_SHIFT);
 }
 
-void arch_spin_unlock_wait(arch_spinlock_t *lock);
-
 void arch_spin_lock_slow(arch_spinlock_t *lock, u32 val);
 
 /* Grab the "next" ticket number and bump it atomically.
diff --git a/arch/tile/lib/spinlock_32.c b/arch/tile/lib/spinlock_32.c
index 076c6cc43113..db9333f2447c 100644
--- a/arch/tile/lib/spinlock_32.c
+++ b/arch/tile/lib/spinlock_32.c
@@ -62,29 +62,6 @@ int arch_spin_trylock(arch_spinlock_t *lock)
 }
 EXPORT_SYMBOL(arch_spin_trylock);
 
-void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	u32 iterations = 0;
-	int curr = READ_ONCE(lock->current_ticket);
-	int next = READ_ONCE(lock->next_ticket);
-
-	/* Return immediately if unlocked. */
-	if (next == curr)
-		return;
-
-	/* Wait until the current locker has released the lock. */
-	do {
-		delay_backoff(iterations++);
-	} while (READ_ONCE(lock->current_ticket) == curr);
-
-	/*
-	 * The TILE architecture doesn't do read speculation; therefore
-	 * a control dependency guarantees a LOAD->{LOAD,STORE} order.
-	 */
-	barrier();
-}
-EXPORT_SYMBOL(arch_spin_unlock_wait);
-
 /*
  * The low byte is always reserved to be the marker for a "tns" operation
  * since the low bit is set to "1" by a tns.  The next seven bits are
diff --git a/arch/tile/lib/spinlock_64.c b/arch/tile/lib/spinlock_64.c
index a4b5b2cbce93..de414c22892f 100644
--- a/arch/tile/lib/spinlock_64.c
+++ b/arch/tile/lib/spinlock_64.c
@@ -62,28 +62,6 @@ int arch_spin_trylock(arch_spinlock_t *lock)
 }
 EXPORT_SYMBOL(arch_spin_trylock);
 
-void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	u32 iterations = 0;
-	u32 val = READ_ONCE(lock->lock);
-	u32 curr = arch_spin_current(val);
-
-	/* Return immediately if unlocked. */
-	if (arch_spin_next(val) == curr)
-		return;
-
-	/* Wait until the current locker has released the lock. */
-	do {
-		delay_backoff(iterations++);
-	} while (arch_spin_current(READ_ONCE(lock->lock)) == curr);
-
-	/*
-	 * The TILE architecture doesn't do read speculation; therefore
-	 * a control dependency guarantees a LOAD->{LOAD,STORE} order.
-	 */
-	barrier();
-}
-EXPORT_SYMBOL(arch_spin_unlock_wait);
 
 /*
  * If the read lock fails due to a writer, we retry periodically
diff --git a/arch/xtensa/include/asm/spinlock.h b/arch/xtensa/include/asm/spinlock.h
index a36221cf6363..3bb49681ee24 100644
--- a/arch/xtensa/include/asm/spinlock.h
+++ b/arch/xtensa/include/asm/spinlock.h
@@ -33,11 +33,6 @@
 
 #define arch_spin_is_locked(x) ((x)->slock != 0)
 
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	smp_cond_load_acquire(&lock->slock, !VAL);
-}
-
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
 static inline void arch_spin_lock(arch_spinlock_t *lock)
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 tip/core/rcu 4/9] completion: Replace spin_unlock_wait() with lock/unlock pair
  2017-07-24 22:13 ` [PATCH tip/core/rcu 4/9] completion: " Paul E. McKenney
@ 2017-08-15 16:16   ` Paul E. McKenney
  2017-08-16 15:22     ` Steven Rostedt
  2017-08-17  8:26     ` Ingo Molnar
  0 siblings, 2 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-08-15 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Ingo Molnar, Will Deacon, Alan Stern, Andrea Parri,
	Linus Torvalds

There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore replaces the spin_unlock_wait() call in
completion_done() with spin_lock() followed immediately by spin_unlock().
This should be safe from a performance perspective because the lock
will be held only for the duration of the wakeup, which is really quick.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
[ paulmck: Updated to use irqsave based on 0day Test Robot feedback. ]

diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index 13fc5ae9bf2f..c9524d2d9316 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -300,6 +300,8 @@ EXPORT_SYMBOL(try_wait_for_completion);
  */
 bool completion_done(struct completion *x)
 {
+	unsigned long flags;
+
 	if (!READ_ONCE(x->done))
 		return false;
 
@@ -307,14 +309,9 @@ bool completion_done(struct completion *x)
 	 * If ->done, we need to wait for complete() to release ->wait.lock
 	 * otherwise we can end up freeing the completion before complete()
 	 * is done referencing it.
-	 *
-	 * The RMB pairs with complete()'s RELEASE of ->wait.lock and orders
-	 * the loads of ->done and ->wait.lock such that we cannot observe
-	 * the lock before complete() acquires it while observing the ->done
-	 * after it's acquired the lock.
 	 */
-	smp_rmb();
-	spin_unlock_wait(&x->wait.lock);
+	spin_lock_irqsave(&x->wait.lock, flags);
+	spin_unlock_irqrestore(&x->wait.lock, flags);
 	return true;
 }
 EXPORT_SYMBOL(completion_done);
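
To illustrate the pattern that the acquire/release pair above protects,
here is a rough sketch of a caller that frees the structure embedding
the completion (the struct and function names are hypothetical, purely
for illustration; assumes <linux/completion.h> and <linux/slab.h>):

	struct foo {
		struct completion done;
		/* ... */
	};

	static void foo_free_if_done(struct foo *f)
	{
		/*
		 * A true return from completion_done() now guarantees that
		 * the complete() which set ->done has also released
		 * ->wait.lock, because the lock/unlock pair cannot complete
		 * until that lock is free.  Only then is it safe to free
		 * the memory containing the completion.
		 */
		if (completion_done(&f->done))
			kfree(f);
	}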

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 tip/core/rcu 4/9] completion: Replace spin_unlock_wait() with lock/unlock pair
  2017-08-15 16:16   ` [PATCH v5 " Paul E. McKenney
@ 2017-08-16 15:22     ` Steven Rostedt
  2017-08-17 15:07       ` Paul E. McKenney
  2017-08-17  8:26     ` Ingo Molnar
  1 sibling, 1 reply; 33+ messages in thread
From: Steven Rostedt @ 2017-08-16 15:22 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg, Ingo Molnar, Will Deacon, Alan Stern,
	Andrea Parri, Linus Torvalds

On Tue, 15 Aug 2017 09:16:29 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> There is no agreed-upon definition of spin_unlock_wait()'s semantics,
> and it appears that all callers could do just as well with a lock/unlock
> pair.  This commit therefore replaces the spin_unlock_wait() call in
> completion_done() with spin_lock() followed immediately by spin_unlock().
> This should be safe from a performance perspective because the lock
> will be held only for the duration of the wakeup, which is really quick.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Andrea Parri <parri.andrea@gmail.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> [ paulmck: Updated to use irqsave based on 0day Test Robot feedback. ]
> 
> diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
> index 13fc5ae9bf2f..c9524d2d9316 100644
> --- a/kernel/sched/completion.c
> +++ b/kernel/sched/completion.c
> @@ -300,6 +300,8 @@ EXPORT_SYMBOL(try_wait_for_completion);
>   */
>  bool completion_done(struct completion *x)
>  {
> +	unsigned long flags;
> +
>  	if (!READ_ONCE(x->done))
>  		return false;
>  
> @@ -307,14 +309,9 @@ bool completion_done(struct completion *x)
>  	 * If ->done, we need to wait for complete() to release ->wait.lock
>  	 * otherwise we can end up freeing the completion before complete()
>  	 * is done referencing it.
> -	 *
> -	 * The RMB pairs with complete()'s RELEASE of ->wait.lock and orders
> -	 * the loads of ->done and ->wait.lock such that we cannot observe
> -	 * the lock before complete() acquires it while observing the ->done
> -	 * after it's acquired the lock.
>  	 */
> -	smp_rmb();
> -	spin_unlock_wait(&x->wait.lock);
> +	spin_lock_irqsave(&x->wait.lock, flags);
> +	spin_unlock_irqrestore(&x->wait.lock, flags);
>  	return true;
>  }
>  EXPORT_SYMBOL(completion_done);

For this patch:

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

But I was looking at this function, and it is a little worrisome, as its
documentation says it should return false if there are waiters and true
otherwise.  But it can also return false when there are no waiters, if
the completion has already happened and been consumed by
wait_for_completion().

Basically we have:

	wait_for_completion() {
		while (!done)
			wait();
		done--;
	}

	complete() {
		done++;
		wake_up_waiters();
	}

Thus, completion_done() only returns true if a complete happened and a
wait_for_completion has not. It does not return true if the complete
has not yet occurred, but there are still waiters.
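
For instance, a minimal sketch of a sequence that produces the mismatch
(using the completion API as it stands today):

	DECLARE_COMPLETION_ONSTACK(c);

	complete(&c);			/* done: 0 -> 1 */
	wait_for_completion(&c);	/* consumes it, done: 1 -> 0 */

	/*
	 * No waiters exist at this point, yet completion_done(&c)
	 * returns false because ->done is back to zero.
	 */
	WARN_ON(!completion_done(&c));	/* fires */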

I looked at a couple of use cases, and this does not appear to be an
issue, but the documentation for completion_done() does not
exactly fit the implementation. Should that be addressed?

Also, if complete_all() is called, then reinit_completion() must be
called before that completion is used again. reinit_completion() has a
comment stating this, but there's no comment by complete_all() stating
it, which is where it really should be. I'll send a patch to fix this
one.
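
The required pattern is roughly:

	DECLARE_COMPLETION_ONSTACK(done);

	complete_all(&done);		/* release all current and future waiters */
	wait_for_completion(&done);	/* returns immediately */

	reinit_completion(&done);	/* mandatory before reusing the completion */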

-- Steve

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 tip/core/rcu 4/9] completion: Replace spin_unlock_wait() with lock/unlock pair
  2017-08-15 16:16   ` [PATCH v5 " Paul E. McKenney
  2017-08-16 15:22     ` Steven Rostedt
@ 2017-08-17  8:26     ` Ingo Molnar
  2017-08-17 12:30       ` Paul E. McKenney
  1 sibling, 1 reply; 33+ messages in thread
From: Ingo Molnar @ 2017-08-17  8:26 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, jiangshanlai, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Ingo Molnar, Will Deacon, Alan Stern, Andrea Parri,
	Linus Torvalds


* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> There is no agreed-upon definition of spin_unlock_wait()'s semantics,
> and it appears that all callers could do just as well with a lock/unlock
> pair.  This commit therefore replaces the spin_unlock_wait() call in
> completion_done() with spin_lock() followed immediately by spin_unlock().
> This should be safe from a performance perspective because the lock
> will be held only for the duration of the wakeup, which is really quick.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Andrea Parri <parri.andrea@gmail.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> [ paulmck: Updated to use irqsave based on 0day Test Robot feedback. ]
> 
> diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
> index 13fc5ae9bf2f..c9524d2d9316 100644
> --- a/kernel/sched/completion.c
> +++ b/kernel/sched/completion.c
> @@ -300,6 +300,8 @@ EXPORT_SYMBOL(try_wait_for_completion);
>   */
>  bool completion_done(struct completion *x)
>  {
> +	unsigned long flags;
> +
>  	if (!READ_ONCE(x->done))
>  		return false;
>  
> @@ -307,14 +309,9 @@ bool completion_done(struct completion *x)
>  	 * If ->done, we need to wait for complete() to release ->wait.lock
>  	 * otherwise we can end up freeing the completion before complete()
>  	 * is done referencing it.
> -	 *
> -	 * The RMB pairs with complete()'s RELEASE of ->wait.lock and orders
> -	 * the loads of ->done and ->wait.lock such that we cannot observe
> -	 * the lock before complete() acquires it while observing the ->done
> -	 * after it's acquired the lock.
>  	 */
> -	smp_rmb();
> -	spin_unlock_wait(&x->wait.lock);
> +	spin_lock_irqsave(&x->wait.lock, flags);
> +	spin_unlock_irqrestore(&x->wait.lock, flags);
>  	return true;
>  }
>  EXPORT_SYMBOL(completion_done);

I'm fine with this patch - as long as there are no performance regression reports. 
(which I suspect there won't be.)

Would you like to carry this in the RCU tree, due to other changes depending on 
this change - or can I pick this up into the scheduler tree?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 tip/core/rcu 4/9] completion: Replace spin_unlock_wait() with lock/unlock pair
  2017-08-17  8:26     ` Ingo Molnar
@ 2017-08-17 12:30       ` Paul E. McKenney
  2017-08-17 12:49         ` Ingo Molnar
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2017-08-17 12:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, jiangshanlai, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Ingo Molnar, Will Deacon, Alan Stern, Andrea Parri,
	Linus Torvalds

On Thu, Aug 17, 2017 at 10:26:16AM +0200, Ingo Molnar wrote:
> 
> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> 
> > There is no agreed-upon definition of spin_unlock_wait()'s semantics,
> > and it appears that all callers could do just as well with a lock/unlock
> > pair.  This commit therefore replaces the spin_unlock_wait() call in
> > completion_done() with spin_lock() followed immediately by spin_unlock().
> > This should be safe from a performance perspective because the lock
> > will be held only for the duration of the wakeup, which is really quick.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Will Deacon <will.deacon@arm.com>
> > Cc: Alan Stern <stern@rowland.harvard.edu>
> > Cc: Andrea Parri <parri.andrea@gmail.com>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > [ paulmck: Updated to use irqsave based on 0day Test Robot feedback. ]
> > 
> > diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
> > index 13fc5ae9bf2f..c9524d2d9316 100644
> > --- a/kernel/sched/completion.c
> > +++ b/kernel/sched/completion.c
> > @@ -300,6 +300,8 @@ EXPORT_SYMBOL(try_wait_for_completion);
> >   */
> >  bool completion_done(struct completion *x)
> >  {
> > +	unsigned long flags;
> > +
> >  	if (!READ_ONCE(x->done))
> >  		return false;
> >  
> > @@ -307,14 +309,9 @@ bool completion_done(struct completion *x)
> >  	 * If ->done, we need to wait for complete() to release ->wait.lock
> >  	 * otherwise we can end up freeing the completion before complete()
> >  	 * is done referencing it.
> > -	 *
> > -	 * The RMB pairs with complete()'s RELEASE of ->wait.lock and orders
> > -	 * the loads of ->done and ->wait.lock such that we cannot observe
> > -	 * the lock before complete() acquires it while observing the ->done
> > -	 * after it's acquired the lock.
> >  	 */
> > -	smp_rmb();
> > -	spin_unlock_wait(&x->wait.lock);
> > +	spin_lock_irqsave(&x->wait.lock, flags);
> > +	spin_unlock_irqrestore(&x->wait.lock, flags);
> >  	return true;
> >  }
> >  EXPORT_SYMBOL(completion_done);
> 
> I'm fine with this patch - as long as there are no performance regression reports. 
> (which I suspect there won't be.)

If there is, 0day Test Robot hasn't spotted it, nor has anyone else
reported it.

> Would you like to carry this in the RCU tree, due to other changes depending on 
> this change - or can I pick this up into the scheduler tree?

Timely question!  ;-)

My current plan is to send you a pull request like the following later
today, Pacific Time (but rebased adding Steve Rostedt's Reviewed-by).
This patch is on one of the branches, currently v4.13-rc2..93d8d7a12090
("arch: Remove spin_unlock_wait() arch-specific definitions") in my
-rcu tree.

Ah, and v4.13-rc2..7391304c4959 ("membarrier: Expedited private command")
is mostly outside of RCU as well.

Since I will be rebasing and remerging anyway, if you would prefer that I
split the spin_unlock_wait() and/or misc branches out, I am happy to do so.
If I don't hear otherwise, though, I will send all seven branches using
my usual approach.

So, if you want something different than my usual approach, please just
let me know!

							Thanx, Paul

PS.	At the moment, there are no code changes to be applied, just
	Steve's Reviewed-by.

------------------------------------------------------------------------
Prototype pull request
------------------------------------------------------------------------

The following changes since commit 520eccdfe187591a51ea9ab4c1a024ae4d0f68d9:

  Linux 4.13-rc2 (2017-07-23 16:15:17 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git 20f2d43c114bf57d16580a758120cc65aa991bea

for you to fetch changes up to 20f2d43c114bf57d16580a758120cc65aa991bea:

  Merge branches 'doc.2017.07.24c', 'fixes.2017.08.09a', 'hotplug.2017.07.25b', 'misc.2017.08.10a', 'spin_unlock_wait_no.2017.08.11a', 'srcu.2017.07.27c' and 'torture.2017.07.24c' into HEAD (2017-08-11 13:15:44 -0700)

----------------------------------------------------------------
Joe Perches (1):
      module: Fix pr_fmt() bug for header use of printk

Luis R. Rodriguez (2):
      swait: add idle variants which don't contribute to load average
      rcu: use idle versions of swait to make idle-hack clear

Manfred Spraul (1):
      net/netfilter/nf_conntrack_core: Fix net_conntrack_lock()

Masami Hiramatsu (1):
      rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit()

Mathieu Desnoyers (1):
      membarrier: Expedited private command

Oleg Nesterov (1):
      task_work: Replace spin_unlock_wait() with lock/unlock pair

Paul E. McKenney (56):
      documentation: Fix relation between nohz_full and rcu_nocbs
      doc: RCU documentation update
      doc: Update memory-barriers.txt for read-to-write dependencies
      doc: Add RCU files to docbook-generation files
      doc: No longer allowed to use rcu_dereference on non-pointers
      doc: Set down RCU's scheduling-clock-interrupt needs
      init_task: Remove redundant INIT_TASK_RCU_TREE_PREEMPT() macro
      srcu: Move rcu_scheduler_starting() from Tiny RCU to Tiny SRCU
      rcutorture: Remove obsolete SRCU-C.boot
      srcu: Make process_srcu() be static
      rcutorture: Move SRCU status printing to SRCU implementations
      rcutorture: Print SRCU lock/unlock totals
      rcu: Remove CONFIG_TASKS_RCU ifdef from rcuperf.c
      rcutorture: Select CONFIG_PROVE_LOCKING for Tiny SRCU scenario
      torture: Add --kconfig argument to kvm.sh
      rcutorture: Don't wait for kernel when all builds fail
      rcutorture: Enable SRCU readers from timer handler
      rcutorture: Place event-traced strings into trace buffer
      rcutorture: Use nr_cpus rather than maxcpus to limit test size
      rcutorture: Add task's CPU for rcutorture writer stalls
      rcutorture: Eliminate unused ts_rem local from rcu_trace_clock_local()
      rcu: Add last-CPU to GP-kthread starvation messages
      rcutorture: Invoke call_rcu() from timer handler
      rcu: Use timer as backstop for NOCB deferred wakeups
      atomics: Revert addition of comment header to spin_unlock_wait()
      rcu: Migrate callbacks earlier in the CPU-offline timeline
      rcu: Make expedited GPs correctly handle hardware CPU insertion
      torture: Fix typo suppressing CPU-hotplug statistics
      rcu: Remove orphan/adopt event-tracing fields
      rcu: Check for NOCB CPUs and empty lists earlier in CB migration
      rcu: Make NOCB CPUs migrate CBs directly from outgoing CPU
      rcu: Advance outgoing CPU's callbacks before migrating them
      rcu: Eliminate rcu_state ->orphan_lock
      rcu: Advance callbacks after migration
      rcu: Localize rcu_state ->orphan_pend and ->orphan_done
      rcu: Remove unused RCU list functions
      rcu: Move callback-list warning to irq-disable region
      srcu: Provide ordering for CPU not involved in grace period
      sched,rcu: Make cond_resched() provide RCU quiescent state
      rcu: Drive TASKS_RCU directly off of PREEMPT
      rcu: Create reasonable API for do_exit() TASKS_RCU processing
      rcu: Add TPS() to event-traced strings
      rcu: Move rcu.h to new trivial-function style
      rcu: Add event tracing to ->gp_tasks update at GP start
      rcu: Add TPS() protection for _rcu_barrier_trace strings
      rcu: Add assertions verifying blocked-tasks list
      rcu: Add warning to rcu_idle_enter() for irqs enabled
      rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter()
      sched: Replace spin_unlock_wait() with lock/unlock pair
      completion: Replace spin_unlock_wait() with lock/unlock pair
      exit: Replace spin_unlock_wait() with lock/unlock pair
      ipc: Replace spin_unlock_wait() with lock/unlock pair
      drivers/ata: Replace spin_unlock_wait() with lock/unlock pair
      locking: Remove spin_unlock_wait() generic definitions
      arch: Remove spin_unlock_wait() arch-specific definitions
      Merge branches 'doc.2017.07.24c', 'fixes.2017.08.09a', 'hotplug.2017.07.25b', 'misc.2017.08.10a', 'spin_unlock_wait_no.2017.08.11a', 'srcu.2017.07.27c' and 'torture.2017.07.24c' into HEAD

Peter Zijlstra (Intel) (1):
      rcu: Make rcu_idle_enter() rely on callers disabling irqs

Tejun Heo (1):
      sched: Allow migrating kthreads into online but inactive CPUs

 .../RCU/Design/Requirements/Requirements.html      | 130 +++++++++++
 Documentation/RCU/checklist.txt                    | 121 +++++++----
 Documentation/RCU/rcu.txt                          |   9 +-
 Documentation/RCU/rcu_dereference.txt              |  61 ++----
 Documentation/RCU/rcubarrier.txt                   |   5 +
 Documentation/RCU/torture.txt                      |  20 +-
 Documentation/RCU/whatisRCU.txt                    |   5 +-
 Documentation/admin-guide/kernel-parameters.txt    |   7 +-
 Documentation/core-api/kernel-api.rst              |  49 +++++
 Documentation/memory-barriers.txt                  |  41 ++--
 MAINTAINERS                                        |   2 +-
 arch/alpha/include/asm/spinlock.h                  |   5 -
 arch/arc/include/asm/spinlock.h                    |   5 -
 arch/arm/include/asm/spinlock.h                    |  16 --
 arch/arm64/include/asm/spinlock.h                  |  58 +----
 arch/arm64/kernel/process.c                        |   2 +
 arch/blackfin/include/asm/spinlock.h               |   5 -
 arch/blackfin/kernel/module.c                      |  39 ++--
 arch/hexagon/include/asm/spinlock.h                |   5 -
 arch/ia64/include/asm/spinlock.h                   |  21 --
 arch/m32r/include/asm/spinlock.h                   |   5 -
 arch/metag/include/asm/spinlock.h                  |   5 -
 arch/mn10300/include/asm/spinlock.h                |   5 -
 arch/parisc/include/asm/spinlock.h                 |   7 -
 arch/powerpc/include/asm/spinlock.h                |  33 ---
 arch/s390/include/asm/spinlock.h                   |   7 -
 arch/sh/include/asm/spinlock-cas.h                 |   5 -
 arch/sh/include/asm/spinlock-llsc.h                |   5 -
 arch/sparc/include/asm/spinlock_32.h               |   5 -
 arch/tile/include/asm/spinlock_32.h                |   2 -
 arch/tile/include/asm/spinlock_64.h                |   2 -
 arch/tile/lib/spinlock_32.c                        |  23 --
 arch/tile/lib/spinlock_64.c                        |  22 --
 arch/xtensa/include/asm/spinlock.h                 |   5 -
 drivers/ata/libata-eh.c                            |   8 +-
 include/asm-generic/qspinlock.h                    |  14 --
 include/linux/init_task.h                          |   8 +-
 include/linux/rcupdate.h                           |  15 +-
 include/linux/rcutiny.h                            |   8 +-
 include/linux/sched.h                              |   8 +-
 include/linux/spinlock.h                           |  31 ---
 include/linux/spinlock_up.h                        |   6 -
 include/linux/srcutiny.h                           |  13 ++
 include/linux/srcutree.h                           |   3 +-
 include/linux/swait.h                              |  55 +++++
 include/trace/events/rcu.h                         |   7 +-
 include/uapi/linux/membarrier.h                    |  23 +-
 ipc/sem.c                                          |   3 +-
 kernel/Makefile                                    |   1 -
 kernel/cpu.c                                       |   1 +
 kernel/exit.c                                      |  10 +-
 kernel/locking/qspinlock.c                         | 117 ----------
 kernel/membarrier.c                                |  70 ------
 kernel/rcu/Kconfig                                 |   3 +-
 kernel/rcu/rcu.h                                   | 128 ++---------
 kernel/rcu/rcu_segcblist.c                         | 108 +++-------
 kernel/rcu/rcu_segcblist.h                         |  28 +--
 kernel/rcu/rcuperf.c                               |  17 +-
 kernel/rcu/rcutorture.c                            |  83 +++----
 kernel/rcu/srcutiny.c                              |   8 +
 kernel/rcu/srcutree.c                              |  50 ++++-
 kernel/rcu/tiny.c                                  |   2 -
 kernel/rcu/tiny_plugin.h                           |  47 ----
 kernel/rcu/tree.c                                  | 238 ++++++++-------------
 kernel/rcu/tree.h                                  |  15 +-
 kernel/rcu/tree_exp.h                              |   2 +-
 kernel/rcu/tree_plugin.h                           | 238 ++++++++++++---------
 kernel/rcu/update.c                                |  18 +-
 kernel/sched/Makefile                              |   1 +
 kernel/sched/completion.c                          |  11 +-
 kernel/sched/core.c                                |  39 +++-
 kernel/sched/membarrier.c                          | 152 +++++++++++++
 kernel/task_work.c                                 |   8 +-
 kernel/torture.c                                   |   2 +-
 net/netfilter/nf_conntrack_core.c                  |  52 +++--
 .../selftests/rcutorture/bin/config_override.sh    |  61 ++++++
 .../testing/selftests/rcutorture/bin/functions.sh  |  27 ++-
 .../testing/selftests/rcutorture/bin/kvm-build.sh  |  11 +-
 .../selftests/rcutorture/bin/kvm-test-1-run.sh     |  58 ++---
 tools/testing/selftests/rcutorture/bin/kvm.sh      |  34 ++-
 .../selftests/rcutorture/configs/rcu/BUSTED.boot   |   2 +-
 .../selftests/rcutorture/configs/rcu/SRCU-C.boot   |   1 -
 .../selftests/rcutorture/configs/rcu/SRCU-u        |   3 +-
 .../selftests/rcutorture/configs/rcu/TREE01.boot   |   2 +-
 .../selftests/rcutorture/doc/TREE_RCU-kconfig.txt  |   2 +-
 85 files changed, 1245 insertions(+), 1344 deletions(-)
 delete mode 100644 kernel/membarrier.c
 delete mode 100644 kernel/rcu/tiny_plugin.h
 create mode 100644 kernel/sched/membarrier.c
 create mode 100755 tools/testing/selftests/rcutorture/bin/config_override.sh
 delete mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-C.boot

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 tip/core/rcu 4/9] completion: Replace spin_unlock_wait() with lock/unlock pair
  2017-08-17 12:30       ` Paul E. McKenney
@ 2017-08-17 12:49         ` Ingo Molnar
  2017-08-17 14:13           ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Ingo Molnar @ 2017-08-17 12:49 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, jiangshanlai, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Ingo Molnar, Will Deacon, Alan Stern, Andrea Parri,
	Linus Torvalds


* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> > this change - or can I pick this up into the scheduler tree?
> 
> Timely question!  ;-)
> 
> My current plan is to send you a pull request like the following later
> today, Pacific Time (but rebased adding Steve Rostedt's Reviewed-by).
> This patch is on one of the branches, currently v4.13-rc2..93d8d7a12090
> ("arch: Remove spin_unlock_wait() arch-specific definitions") in my
> -rcu tree.
> 
> Ah, and v4.13-rc2..7391304c4959 ("membarrier: Expedited private command")
> is mostly outside of RCU as well.
> 
> Since I will be rebasing and remerging anyway, if you would prefer that I
> split the spin_unlock_wait() and/or misc branches out, I am happy to do so.
> If I don't hear otherwise, though, I will send all seven branches using
> my usual approach.
> 
> So, if you want something different than my usual approach, please just
> let me know!

No, all branches together sounds good to me!

If you are rebasing anyway, here are some (very minor) commit title nits I noticed:

>       swait: add idle variants which don't contribute to load average
>       rcu: use idle versions of swait to make idle-hack clear

Capitalization.

>       membarrier: Expedited private command

Should start with a verb.

>       doc: RCU documentation update

  doc: Update RCU documentation

?

>       doc: No longer allowed to use rcu_dereference on non-pointers

  doc: Describe that it is no longer allowed to use rcu_dereference() on non-pointers

?

>       torture: Add --kconfig argument to kvm.sh
>       rcutorture: Don't wait for kernel when all builds fail

Is there a difference between 'torture: ' and 'rcutorture: ' prefixes?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 tip/core/rcu 4/9] completion: Replace spin_unlock_wait() with lock/unlock pair
  2017-08-17 12:49         ` Ingo Molnar
@ 2017-08-17 14:13           ` Paul E. McKenney
  2017-08-17 15:32             ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2017-08-17 14:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, jiangshanlai, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Ingo Molnar, Will Deacon, Alan Stern, Andrea Parri,
	Linus Torvalds

On Thu, Aug 17, 2017 at 02:49:09PM +0200, Ingo Molnar wrote:
> 
> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> 
> > > this change - or can I pick this up into the scheduler tree?
> > 
> > Timely question!  ;-)
> > 
> > My current plan is to send you a pull request like the following later
> > today, Pacific Time (but rebased adding Steve Rostedt's Reviewed-by).
> > This patch is on one of the branches, currently v4.13-rc2..93d8d7a12090
> > ("arch: Remove spin_unlock_wait() arch-specific definitions") in my
> > -rcu tree.
> > 
> > Ah, and v4.13-rc2..7391304c4959 ("membarrier: Expedited private command")
> > is mostly outside of RCU as well.
> > 
> > Since I will be rebasing and remerging anyway, if you would prefer that I
> > split the spin_unlock_wait() and/or misc branches out, I am happy to do so.
> > If I don't hear otherwise, though, I will send all seven branches using
> > my usual approach.
> > 
> > So, if you want something different than my usual approach, please just
> > let me know!
> 
> No, all branches together sounds good to me!

Very good, will do!

> If you are rebasing anyway, here are some (very minor) commit title nits I noticed:
> 
> >       swait: add idle variants which don't contribute to load average
> >       rcu: use idle versions of swait to make idle-hack clear
> 
> Capitalization.

Will fix!  Believe it or not, I looked for these...  :-/

> >       membarrier: Expedited private command
> 
> Should start with a verb.

OK, something like "Provide expedited private command".

> >       doc: RCU documentation update
> 
>   doc: Update RCU documentation
> 
> ?

Works for me!

> >       doc: No longer allowed to use rcu_dereference on non-pointers
> 
>   doc: Describe that it is no longer allowed to use rcu_dereference() on non-pointers
> 
> ?

Will add a real commit log.

> >       torture: Add --kconfig argument to kvm.sh
> >       rcutorture: Don't wait for kernel when all builds fail
> 
> Is there a difference between 'torture: ' and 'rcutorture: ' prefixes?

Yes, rcutorture is specific to RCU, while torture would also affect
locktorture.

Ah, and if I am delaying the cond_resched() patch, I need to retest,
which means I will send you the pull request tomorrow or Monday, depending
on how the testing goes.

							Thanx, Paul

> Thanks,
> 
> 	Ingo
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 tip/core/rcu 4/9] completion: Replace spin_unlock_wait() with lock/unlock pair
  2017-08-16 15:22     ` Steven Rostedt
@ 2017-08-17 15:07       ` Paul E. McKenney
  0 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-08-17 15:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg, Ingo Molnar, Will Deacon, Alan Stern,
	Andrea Parri, Linus Torvalds

On Wed, Aug 16, 2017 at 11:22:35AM -0400, Steven Rostedt wrote:
> On Tue, 15 Aug 2017 09:16:29 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > There is no agreed-upon definition of spin_unlock_wait()'s semantics,
> > and it appears that all callers could do just as well with a lock/unlock
> > pair.  This commit therefore replaces the spin_unlock_wait() call in
> > completion_done() with spin_lock() followed immediately by spin_unlock().
> > This should be safe from a performance perspective because the lock
> > will be held only while the wakeup happens, which is really quick.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Will Deacon <will.deacon@arm.com>
> > Cc: Alan Stern <stern@rowland.harvard.edu>
> > Cc: Andrea Parri <parri.andrea@gmail.com>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > [ paulmck: Updated to use irqsave based on 0day Test Robot feedback. ]
> > 
> > diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
> > index 13fc5ae9bf2f..c9524d2d9316 100644
> > --- a/kernel/sched/completion.c
> > +++ b/kernel/sched/completion.c
> > @@ -300,6 +300,8 @@ EXPORT_SYMBOL(try_wait_for_completion);
> >   */
> >  bool completion_done(struct completion *x)
> >  {
> > +	unsigned long flags;
> > +
> >  	if (!READ_ONCE(x->done))
> >  		return false;
> >  
> > @@ -307,14 +309,9 @@ bool completion_done(struct completion *x)
> >  	 * If ->done, we need to wait for complete() to release ->wait.lock
> >  	 * otherwise we can end up freeing the completion before complete()
> >  	 * is done referencing it.
> > -	 *
> > -	 * The RMB pairs with complete()'s RELEASE of ->wait.lock and orders
> > -	 * the loads of ->done and ->wait.lock such that we cannot observe
> > -	 * the lock before complete() acquires it while observing the ->done
> > -	 * after it's acquired the lock.
> >  	 */
> > -	smp_rmb();
> > -	spin_unlock_wait(&x->wait.lock);
> > +	spin_lock_irqsave(&x->wait.lock, flags);
> > +	spin_unlock_irqrestore(&x->wait.lock, flags);
> >  	return true;
> >  }
> >  EXPORT_SYMBOL(completion_done);
> 
> For this patch:
> 
> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

Applied, thank you!
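
Purely for illustration, here is a minimal userspace sketch of the idiom
this patch relies on: acquiring and then immediately releasing the lock
is enough to wait out any critical section still in flight.  A pthread
mutex stands in for ->wait.lock, the names are made up, and it is only a
sketch, not the kernel code:

	#include <pthread.h>
	#include <stdbool.h>

	struct fake_completion {
		pthread_mutex_t lock;	/* stands in for ->wait.lock */
		unsigned int done;	/* stands in for ->done */
	};

	/* complete() side: update ->done while holding the lock. */
	static void fake_complete(struct fake_completion *x)
	{
		pthread_mutex_lock(&x->lock);
		x->done++;
		/* waking up waiters would go here */
		pthread_mutex_unlock(&x->lock);
	}

	/*
	 * completion_done() side: once ->done is observed, a lock/unlock
	 * pair guarantees that any fake_complete() still holding the lock
	 * has finished referencing the structure, which is what the old
	 * smp_rmb() + spin_unlock_wait() combination was trying to ensure.
	 */
	static bool fake_completion_done(struct fake_completion *x)
	{
		if (!__atomic_load_n(&x->done, __ATOMIC_RELAXED))
			return false;
		pthread_mutex_lock(&x->lock);	/* wait for in-flight complete() */
		pthread_mutex_unlock(&x->lock);
		return true;
	}

The acquisition simply blocks until the in-flight critical section drops
the lock, so the lock/unlock pair gives at least the guarantee that
spin_unlock_wait() was being asked to provide here.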

> But I was looking at this function, and it is a little worrisome, as it
> says it should return false if there are waiters and true otherwise.
> But it can also return false if there are no waiters and the completion
> is already done.
> 
> Basically we have:
> 
> 	wait_for_completion() {
> 		while (!done)
> 			wait();
> 		done--;
> 	}
> 
> 	complete() {
> 		done++;
> 		wake_up_waiters();
> 	}
> 
> Thus, completion_done() only returns true if a complete happened and a
> wait_for_completion has not. It does not return true if the complete
> has not yet occurred, but there are still waiters.
> 
> I looked at a couple of use cases, and this does not appear to be an
> issue, but the documentation about the completion_done() does not
> exactly fit the implementation. Should that be addressed?
> 
> Also, if complete_all() is called, then reinit_completion() must be
> called before that completion is used. The reinit_completion() has a
> comment stating this, but there's no comment by complete_all() stating
> this, which is where it really should be. I'll send a patch to fix this
> one.

But I am too late to return the favor -- good patch, though!
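
To make the semantics you describe concrete, here is a rough userspace
model of that done counter -- only a sketch with made-up names, not the
kernel implementation:

	#include <pthread.h>
	#include <stdbool.h>

	struct model_completion {
		pthread_mutex_t lock;
		pthread_cond_t cond;
		unsigned int done;
	};

	static void model_wait_for_completion(struct model_completion *x)
	{
		pthread_mutex_lock(&x->lock);
		while (!x->done)			/* while (!done) wait(); */
			pthread_cond_wait(&x->cond, &x->lock);
		x->done--;				/* done--; */
		pthread_mutex_unlock(&x->lock);
	}

	static void model_complete(struct model_completion *x)
	{
		pthread_mutex_lock(&x->lock);
		x->done++;				/* done++; */
		pthread_cond_signal(&x->cond);		/* wake_up_waiters(); */
		pthread_mutex_unlock(&x->lock);
	}

	/*
	 * True only when a complete() has happened and has not yet been
	 * consumed by a wait_for_completion() -- it says nothing about
	 * whether waiters exist, which is exactly your documentation point.
	 */
	static bool model_completion_done(struct model_completion *x)
	{
		bool ret;

		pthread_mutex_lock(&x->lock);
		ret = x->done != 0;
		pthread_mutex_unlock(&x->lock);
		return ret;
	}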

							Thanx, Paul

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 tip/core/rcu 4/9] completion: Replace spin_unlock_wait() with lock/unlock pair
  2017-08-17 14:13           ` Paul E. McKenney
@ 2017-08-17 15:32             ` Paul E. McKenney
  0 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2017-08-17 15:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, jiangshanlai, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Ingo Molnar, Will Deacon, Alan Stern, Andrea Parri,
	Linus Torvalds

On Thu, Aug 17, 2017 at 07:13:04AM -0700, Paul E. McKenney wrote:
> On Thu, Aug 17, 2017 at 02:49:09PM +0200, Ingo Molnar wrote:
> > 
> > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > > > this change - or can I pick this up into the scheduler tree?
> > > 
> > > Timely question!  ;-)
> > > 
> > > My current plan is to send you a pull request like the following later
> > > today, Pacific Time (but rebased adding Steve Rostedt's Reviewed-by).
> > > This patch is on one of the branches, currently v4.13-rc2..93d8d7a12090
> > > ("arch: Remove spin_unlock_wait() arch-specific definitions") in my
> > > -rcu tree.
> > > 
> > > Ah, and v4.13-rc2..7391304c4959 ("membarrier: Expedited private command")
> > > is mostly outside of RCU as well.
> > > 
> > > Since I will be rebasing and remerging anyway, if you would prefer that I
> > > split the spin_unlock_wait() and/or misc branches out, I am happy to do so.
> > > If I don't hear otherwise, though, I will send all seven branches using
> > > my usual approach.
> > > 
> > > So, if you want something different than my usual approach, please just
> > > let me know!
> > 
> > No, all branches together sounds good to me!
> 
> Very good, will do!
> 
> > If you are rebasing anyway, here are some (very minor) commit title nits I noticed:
> > 
> > >       swait: add idle variants which don't contribute to load average
> > >       rcu: use idle versions of swait to make idle-hack clear
> > 
> > Capitalization.
> 
> Will fix!  Believe it or not, I looked for these...  :-/
> 
> > >       membarrier: Expedited private command
> > 
> > Should start with a verb.
> 
> OK, something like "Provide expedited private command".
> 
> > >       doc: RCU documentation update
> > 
> >   doc: Update RCU documentation
> > 
> > ?
> 
> Works for me!
> 
> > >       doc: No longer allowed to use rcu_dereference on non-pointers
> > 
> >   doc: Describe that it is no longer allowed to use rcu_dereference() on non-pointers
> > 
> > ?
> 
> Will add a real commit log.
> 
> > >       torture: Add --kconfig argument to kvm.sh
> > >       rcutorture: Don't wait for kernel when all builds fail
> > 
> > Is there a difference between 'torture: ' and 'rcutorture: ' prefixes?
> 
> Yes, rcutorture is specific to RCU, while torture would also affect
> locktorture.
> 
> Ah, and if I am delaying the cond_resched() patch, I need to retest,
> which means I will send you the pull request tomorrow or Monday, depending
> on how the testing goes.

And here is the updated prototype pull request, FYI.  Again, moving the
cond_resched() patch requires retesting, and if all goes well, I will
send you the official pull request tomorrow morning, Pacific Time.

							Thanx, Paul

------------------------------------------------------------------------

The following changes since commit 520eccdfe187591a51ea9ab4c1a024ae4d0f68d9:

  Linux 4.13-rc2 (2017-07-23 16:15:17 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git 656e7c0c0a2e8d899f87fd7f081ea7a711146604

for you to fetch changes up to 656e7c0c0a2e8d899f87fd7f081ea7a711146604:

  Merge branches 'doc.2017.08.17a', 'fixes.2017.08.17a', 'hotplug.2017.07.25b', 'misc.2017.08.17a', 'spin_unlock_wait_no.2017.08.17a', 'srcu.2017.07.27c' and 'torture.2017.07.24c' into HEAD (2017-08-17 08:10:04 -0700)

----------------------------------------------------------------
Joe Perches (1):
      module: Fix pr_fmt() bug for header use of printk

Luis R. Rodriguez (2):
      swait: Add idle variants which don't contribute to load average
      rcu: Use idle versions of swait to make idle-hack clear

Manfred Spraul (1):
      net/netfilter/nf_conntrack_core: Fix net_conntrack_lock()

Masami Hiramatsu (1):
      rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit()

Mathieu Desnoyers (1):
      membarrier: Provide expedited private command

Oleg Nesterov (1):
      task_work: Replace spin_unlock_wait() with lock/unlock pair

Paul E. McKenney (55):
      documentation: Fix relation between nohz_full and rcu_nocbs
      init_task: Remove redundant INIT_TASK_RCU_TREE_PREEMPT() macro
      srcu: Move rcu_scheduler_starting() from Tiny RCU to Tiny SRCU
      rcutorture: Remove obsolete SRCU-C.boot
      srcu: Make process_srcu() be static
      rcutorture: Move SRCU status printing to SRCU implementations
      rcutorture: Print SRCU lock/unlock totals
      rcu: Remove CONFIG_TASKS_RCU ifdef from rcuperf.c
      rcutorture: Select CONFIG_PROVE_LOCKING for Tiny SRCU scenario
      torture: Add --kconfig argument to kvm.sh
      rcutorture: Don't wait for kernel when all builds fail
      rcutorture: Enable SRCU readers from timer handler
      rcutorture: Place event-traced strings into trace buffer
      rcutorture: Use nr_cpus rather than maxcpus to limit test size
      rcutorture: Add task's CPU for rcutorture writer stalls
      rcutorture: Eliminate unused ts_rem local from rcu_trace_clock_local()
      rcu: Add last-CPU to GP-kthread starvation messages
      rcutorture: Invoke call_rcu() from timer handler
      rcu: Use timer as backstop for NOCB deferred wakeups
      atomics: Revert addition of comment header to spin_unlock_wait()
      rcu: Migrate callbacks earlier in the CPU-offline timeline
      rcu: Make expedited GPs correctly handle hardware CPU insertion
      torture: Fix typo suppressing CPU-hotplug statistics
      rcu: Remove orphan/adopt event-tracing fields
      rcu: Check for NOCB CPUs and empty lists earlier in CB migration
      rcu: Make NOCB CPUs migrate CBs directly from outgoing CPU
      rcu: Advance outgoing CPU's callbacks before migrating them
      rcu: Eliminate rcu_state ->orphan_lock
      rcu: Advance callbacks after migration
      rcu: Localize rcu_state ->orphan_pend and ->orphan_done
      rcu: Remove unused RCU list functions
      rcu: Move callback-list warning to irq-disable region
      srcu: Provide ordering for CPU not involved in grace period
      sched: Replace spin_unlock_wait() with lock/unlock pair
      rcu: Drive TASKS_RCU directly off of PREEMPT
      rcu: Create reasonable API for do_exit() TASKS_RCU processing
      rcu: Add TPS() to event-traced strings
      rcu: Move rcu.h to new trivial-function style
      rcu: Add event tracing to ->gp_tasks update at GP start
      rcu: Add TPS() protection for _rcu_barrier_trace strings
      rcu: Add assertions verifying blocked-tasks list
      rcu: Add warning to rcu_idle_enter() for irqs enabled
      rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter()
      doc: Update RCU documentation
      doc: Update memory-barriers.txt for read-to-write dependencies
      doc: Add RCU files to docbook-generation files
      doc: No longer allowed to use rcu_dereference on non-pointers
      doc: Set down RCU's scheduling-clock-interrupt needs
      completion: Replace spin_unlock_wait() with lock/unlock pair
      exit: Replace spin_unlock_wait() with lock/unlock pair
      ipc: Replace spin_unlock_wait() with lock/unlock pair
      drivers/ata: Replace spin_unlock_wait() with lock/unlock pair
      locking: Remove spin_unlock_wait() generic definitions
      arch: Remove spin_unlock_wait() arch-specific definitions
      Merge branches 'doc.2017.08.17a', 'fixes.2017.08.17a', 'hotplug.2017.07.25b', 'misc.2017.08.17a', 'spin_unlock_wait_no.2017.08.17a', 'srcu.2017.07.27c' and 'torture.2017.07.24c' into HEAD

Peter Zijlstra (Intel) (1):
      rcu: Make rcu_idle_enter() rely on callers disabling irqs

Tejun Heo (1):
      sched: Allow migrating kthreads into online but inactive CPUs

 .../RCU/Design/Requirements/Requirements.html      | 130 +++++++++++
 Documentation/RCU/checklist.txt                    | 121 +++++++----
 Documentation/RCU/rcu.txt                          |   9 +-
 Documentation/RCU/rcu_dereference.txt              |  61 ++----
 Documentation/RCU/rcubarrier.txt                   |   5 +
 Documentation/RCU/torture.txt                      |  20 +-
 Documentation/RCU/whatisRCU.txt                    |   5 +-
 Documentation/admin-guide/kernel-parameters.txt    |   7 +-
 Documentation/core-api/kernel-api.rst              |  49 +++++
 Documentation/memory-barriers.txt                  |  41 ++--
 MAINTAINERS                                        |   2 +-
 arch/alpha/include/asm/spinlock.h                  |   5 -
 arch/arc/include/asm/spinlock.h                    |   5 -
 arch/arm/include/asm/spinlock.h                    |  16 --
 arch/arm64/include/asm/spinlock.h                  |  58 +----
 arch/arm64/kernel/process.c                        |   2 +
 arch/blackfin/include/asm/spinlock.h               |   5 -
 arch/blackfin/kernel/module.c                      |  39 ++--
 arch/hexagon/include/asm/spinlock.h                |   5 -
 arch/ia64/include/asm/spinlock.h                   |  21 --
 arch/m32r/include/asm/spinlock.h                   |   5 -
 arch/metag/include/asm/spinlock.h                  |   5 -
 arch/mn10300/include/asm/spinlock.h                |   5 -
 arch/parisc/include/asm/spinlock.h                 |   7 -
 arch/powerpc/include/asm/spinlock.h                |  33 ---
 arch/s390/include/asm/spinlock.h                   |   7 -
 arch/sh/include/asm/spinlock-cas.h                 |   5 -
 arch/sh/include/asm/spinlock-llsc.h                |   5 -
 arch/sparc/include/asm/spinlock_32.h               |   5 -
 arch/tile/include/asm/spinlock_32.h                |   2 -
 arch/tile/include/asm/spinlock_64.h                |   2 -
 arch/tile/lib/spinlock_32.c                        |  23 --
 arch/tile/lib/spinlock_64.c                        |  22 --
 arch/xtensa/include/asm/spinlock.h                 |   5 -
 drivers/ata/libata-eh.c                            |   8 +-
 include/asm-generic/qspinlock.h                    |  14 --
 include/linux/init_task.h                          |   8 +-
 include/linux/rcupdate.h                           |  15 +-
 include/linux/rcutiny.h                            |   8 +-
 include/linux/sched.h                              |   5 +-
 include/linux/spinlock.h                           |  31 ---
 include/linux/spinlock_up.h                        |   6 -
 include/linux/srcutiny.h                           |  13 ++
 include/linux/srcutree.h                           |   3 +-
 include/linux/swait.h                              |  55 +++++
 include/trace/events/rcu.h                         |   7 +-
 include/uapi/linux/membarrier.h                    |  23 +-
 ipc/sem.c                                          |   3 +-
 kernel/Makefile                                    |   1 -
 kernel/cpu.c                                       |   1 +
 kernel/exit.c                                      |  10 +-
 kernel/locking/qspinlock.c                         | 117 ----------
 kernel/membarrier.c                                |  70 ------
 kernel/rcu/Kconfig                                 |   3 +-
 kernel/rcu/rcu.h                                   | 128 ++---------
 kernel/rcu/rcu_segcblist.c                         | 108 +++-------
 kernel/rcu/rcu_segcblist.h                         |  28 +--
 kernel/rcu/rcuperf.c                               |  17 +-
 kernel/rcu/rcutorture.c                            |  83 +++----
 kernel/rcu/srcutiny.c                              |   8 +
 kernel/rcu/srcutree.c                              |  50 ++++-
 kernel/rcu/tiny.c                                  |   2 -
 kernel/rcu/tiny_plugin.h                           |  47 ----
 kernel/rcu/tree.c                                  | 213 ++++++++----------
 kernel/rcu/tree.h                                  |  15 +-
 kernel/rcu/tree_exp.h                              |   2 +-
 kernel/rcu/tree_plugin.h                           | 238 ++++++++++++---------
 kernel/rcu/update.c                                |  18 +-
 kernel/sched/Makefile                              |   1 +
 kernel/sched/completion.c                          |  11 +-
 kernel/sched/core.c                                |  38 +++-
 kernel/sched/membarrier.c                          | 152 +++++++++++++
 kernel/task_work.c                                 |   8 +-
 kernel/torture.c                                   |   2 +-
 net/netfilter/nf_conntrack_core.c                  |  52 +++--
 .../selftests/rcutorture/bin/config_override.sh    |  61 ++++++
 .../testing/selftests/rcutorture/bin/functions.sh  |  27 ++-
 .../testing/selftests/rcutorture/bin/kvm-build.sh  |  11 +-
 .../selftests/rcutorture/bin/kvm-test-1-run.sh     |  58 ++---
 tools/testing/selftests/rcutorture/bin/kvm.sh      |  34 ++-
 .../selftests/rcutorture/configs/rcu/BUSTED.boot   |   2 +-
 .../selftests/rcutorture/configs/rcu/SRCU-C.boot   |   1 -
 .../selftests/rcutorture/configs/rcu/SRCU-u        |   3 +-
 .../selftests/rcutorture/configs/rcu/TREE01.boot   |   2 +-
 .../selftests/rcutorture/doc/TREE_RCU-kconfig.txt  |   2 +-
 85 files changed, 1237 insertions(+), 1323 deletions(-)
 delete mode 100644 kernel/membarrier.c
 delete mode 100644 kernel/rcu/tiny_plugin.h
 create mode 100644 kernel/sched/membarrier.c
 create mode 100755 tools/testing/selftests/rcutorture/bin/config_override.sh
 delete mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-C.boot

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2017-08-17 15:32 UTC | newest]

Thread overview: 33+ messages
2017-07-24 22:12 [PATCH tip/core/rcu 0/9] Remove spin_unlock_wait() Paul E. McKenney
2017-07-24 22:13 ` [PATCH tip/core/rcu 1/9] net/netfilter/nf_conntrack_core: Fix net_conntrack_lock() Paul E. McKenney
2017-07-24 22:13 ` [PATCH tip/core/rcu 2/9] task_work: Replace spin_unlock_wait() with lock/unlock pair Paul E. McKenney
2017-07-24 22:13 ` [PATCH tip/core/rcu 3/9] sched: " Paul E. McKenney
2017-07-24 22:13 ` [PATCH tip/core/rcu 4/9] completion: " Paul E. McKenney
2017-08-15 16:16   ` [PATCH v5 " Paul E. McKenney
2017-08-16 15:22     ` Steven Rostedt
2017-08-17 15:07       ` Paul E. McKenney
2017-08-17  8:26     ` Ingo Molnar
2017-08-17 12:30       ` Paul E. McKenney
2017-08-17 12:49         ` Ingo Molnar
2017-08-17 14:13           ` Paul E. McKenney
2017-08-17 15:32             ` Paul E. McKenney
2017-07-24 22:13 ` [PATCH tip/core/rcu 5/9] exit: " Paul E. McKenney
2017-07-24 22:13 ` [PATCH tip/core/rcu 6/9] ipc: " Paul E. McKenney
2017-07-24 22:13 ` [PATCH tip/core/rcu 7/9] drivers/ata: " Paul E. McKenney
2017-07-24 22:13   ` Paul E. McKenney
2017-07-24 22:13 ` [PATCH tip/core/rcu 8/9] locking: Remove spin_unlock_wait() generic definitions Paul E. McKenney
2017-07-24 22:13 ` [PATCH tip/core/rcu 9/9] arch: Remove spin_unlock_wait() arch-specific definitions Paul E. McKenney
2017-07-24 22:13   ` Paul E. McKenney
2017-07-31 22:57 ` [PATCH v2 tip/core/rcu 0/10] Remove spin_unlock_wait() Paul E. McKenney
2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 01/10] atomics: Revert addition of comment header to spin_unlock_wait() Paul E. McKenney
2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 02/10] net/netfilter/nf_conntrack_core: Fix net_conntrack_lock() Paul E. McKenney
2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 03/10] task_work: Replace spin_unlock_wait() with lock/unlock pair Paul E. McKenney
2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 04/10] sched: " Paul E. McKenney
2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 05/10] completion: " Paul E. McKenney
2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 06/10] exit: " Paul E. McKenney
2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 07/10] ipc: " Paul E. McKenney
2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 08/10] drivers/ata: " Paul E. McKenney
2017-07-31 22:58     ` Paul E. McKenney
2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 09/10] locking: Remove spin_unlock_wait() generic definitions Paul E. McKenney
2017-07-31 22:58   ` [PATCH v2 tip/core/rcu 10/10] arch: Remove spin_unlock_wait() arch-specific definitions Paul E. McKenney
2017-07-31 22:58     ` Paul E. McKenney
