* [PATCH 0/6] ARC: spinlocks/atomics rework
@ 2015-08-03 10:03 Vineet Gupta
  2015-08-03 10:03 ` [PATCH 1/6] Revert "ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock" Vineet Gupta
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Vineet Gupta @ 2015-08-03 10:03 UTC (permalink / raw)
  To: Peter Zijlstra (Intel); +Cc: lkml, arc-linux-dev, Vineet Gupta

Hi Peter,

Just running this by you for a quick eyeballing and also as FYI.
The PREFETCHW workaround for the llock/scond livelock was not sufficient after
all and we had to do some more work there. Extended testing of quad core FPGA
builds shows things pretty stable, whereas w/o these patches some of the LTP
tests (shm_open/23-1) would cause the system to go bonkers.

I need to send this to Linus for 4.2-rcX, so I would appreciate it if you could
take a quick peek.

Thx,
-Vineet

Vineet Gupta (6):
  Revert "ARCv2: STAR 9000837815 workaround hardware exclusive
    transactions livelock"
  ARC: refactor atomic inline asm operands with symbolic names
  ARC: LLOCK/SCOND based spin_lock
  ARC: LLOCK/SCOND based rwlock
  ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with
    exponential backoff
  ARCv2: spinlock/rwlock: Reset retry delay when starting a new
    spin-wait cycle

 arch/arc/Kconfig                      |   5 +
 arch/arc/include/asm/atomic.h         | 113 +++++--
 arch/arc/include/asm/spinlock.h       | 551 +++++++++++++++++++++++++++++++++-
 arch/arc/include/asm/spinlock_types.h |   2 +
 arch/arc/kernel/setup.c               |   4 +
 5 files changed, 636 insertions(+), 39 deletions(-)

-- 
1.9.1



* [PATCH 1/6] Revert "ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock"
  2015-08-03 10:03 [PATCH 0/6] ARC: spinlocks/atomics rework Vineet Gupta
@ 2015-08-03 10:03 ` Vineet Gupta
  2015-08-03 10:03 ` [PATCH 2/6] ARC: refactor atomic inline asm operands with symbolic names Vineet Gupta
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Vineet Gupta @ 2015-08-03 10:03 UTC (permalink / raw)
  To: Peter Zijlstra (Intel); +Cc: lkml, arc-linux-dev, Vineet Gupta

Extended testing of the quad core configuration revealed that this fix was
insufficient. Specifically, LTP open posix shm_op/23-1 would cause a hardware
livelock in the llock/scond loop in update_cpu_load_active().

So remove this and make way for a proper workaround.

This reverts commit a5c8b52abe677977883655166796f167ef1e0084.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/atomic.h | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 03484cb4d16d..20b7dc17979e 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -23,21 +23,13 @@
 
 #define atomic_set(v, i) (((v)->counter) = (i))
 
-#ifdef CONFIG_ISA_ARCV2
-#define PREFETCHW	"	prefetchw   [%1]	\n"
-#else
-#define PREFETCHW
-#endif
-
 #define ATOMIC_OP(op, c_op, asm_op)					\
 static inline void atomic_##op(int i, atomic_t *v)			\
 {									\
 	unsigned int temp;						\
 									\
 	__asm__ __volatile__(						\
-	"1:				\n"				\
-	PREFETCHW							\
-	"	llock   %0, [%1]	\n"				\
+	"1:	llock   %0, [%1]	\n"				\
 	"	" #asm_op " %0, %0, %2	\n"				\
 	"	scond   %0, [%1]	\n"				\
 	"	bnz     1b		\n"				\
@@ -58,9 +50,7 @@ static inline int atomic_##op##_return(int i, atomic_t *v)		\
 	smp_mb();							\
 									\
 	__asm__ __volatile__(						\
-	"1:				\n"				\
-	PREFETCHW							\
-	"	llock   %0, [%1]	\n"				\
+	"1:	llock   %0, [%1]	\n"				\
 	"	" #asm_op " %0, %0, %2	\n"				\
 	"	scond   %0, [%1]	\n"				\
 	"	bnz     1b		\n"				\
-- 
1.9.1



* [PATCH 2/6] ARC: refactor atomic inline asm operands with symbolic names
  2015-08-03 10:03 [PATCH 0/6] ARC: spinlocks/atomics rework Vineet Gupta
  2015-08-03 10:03 ` [PATCH 1/6] Revert "ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock" Vineet Gupta
@ 2015-08-03 10:03 ` Vineet Gupta
  2015-08-03 10:03 ` [PATCH 3/6] ARC: LLOCK/SCOND based spin_lock Vineet Gupta
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Vineet Gupta @ 2015-08-03 10:03 UTC (permalink / raw)
  To: Peter Zijlstra (Intel); +Cc: lkml, arc-linux-dev, Vineet Gupta

This reduces the diff in forthcoming patches and also makes the incremental
changes to the inline asm easier to follow.

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/atomic.h | 32 +++++++++++++++++---------------
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 20b7dc17979e..3dd36c1efee1 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -26,22 +26,23 @@
 #define ATOMIC_OP(op, c_op, asm_op)					\
 static inline void atomic_##op(int i, atomic_t *v)			\
 {									\
-	unsigned int temp;						\
+	unsigned int val;						\
 									\
 	__asm__ __volatile__(						\
-	"1:	llock   %0, [%1]	\n"				\
-	"	" #asm_op " %0, %0, %2	\n"				\
-	"	scond   %0, [%1]	\n"				\
-	"	bnz     1b		\n"				\
-	: "=&r"(temp)	/* Early clobber, to prevent reg reuse */	\
-	: "r"(&v->counter), "ir"(i)					\
+	"1:	llock   %[val], [%[ctr]]		\n"		\
+	"	" #asm_op " %[val], %[val], %[i]	\n"		\
+	"	scond   %[val], [%[ctr]]		\n"		\
+	"	bnz     1b				\n"		\
+	: [val]	"=&r"	(val) /* Early clobber to prevent reg reuse */	\
+	: [ctr]	"r"	(&v->counter), /* Not "m": llock only supports reg direct addr mode */	\
+	  [i]	"ir"	(i)						\
 	: "cc");							\
 }									\
 
 #define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
 static inline int atomic_##op##_return(int i, atomic_t *v)		\
 {									\
-	unsigned int temp;						\
+	unsigned int val;						\
 									\
 	/*								\
 	 * Explicit full memory barrier needed before/after as		\
@@ -50,17 +51,18 @@ static inline int atomic_##op##_return(int i, atomic_t *v)		\
 	smp_mb();							\
 									\
 	__asm__ __volatile__(						\
-	"1:	llock   %0, [%1]	\n"				\
-	"	" #asm_op " %0, %0, %2	\n"				\
-	"	scond   %0, [%1]	\n"				\
-	"	bnz     1b		\n"				\
-	: "=&r"(temp)							\
-	: "r"(&v->counter), "ir"(i)					\
+	"1:	llock   %[val], [%[ctr]]		\n"		\
+	"	" #asm_op " %[val], %[val], %[i]	\n"		\
+	"	scond   %[val], [%[ctr]]		\n"		\
+	"	bnz     1b				\n"		\
+	: [val]	"=&r"	(val)						\
+	: [ctr]	"r"	(&v->counter),					\
+	  [i]	"ir"	(i)						\
 	: "cc");							\
 									\
 	smp_mb();							\
 									\
-	return temp;							\
+	return val;							\
 }
 
 #else	/* !CONFIG_ARC_HAS_LLSC */
-- 
1.9.1



* [PATCH 3/6] ARC: LLOCK/SCOND based spin_lock
  2015-08-03 10:03 [PATCH 0/6] ARC: spinlocks/atomics rework Vineet Gupta
  2015-08-03 10:03 ` [PATCH 1/6] Revert "ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock" Vineet Gupta
  2015-08-03 10:03 ` [PATCH 2/6] ARC: refactor atomic inline asm operands with symbolic names Vineet Gupta
@ 2015-08-03 10:03 ` Vineet Gupta
  2015-08-03 11:29   ` Peter Zijlstra
  2015-08-03 10:03 ` [PATCH 4/6] ARC: LLOCK/SCOND based rwlock Vineet Gupta
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Vineet Gupta @ 2015-08-03 10:03 UTC (permalink / raw)
  To: Peter Zijlstra (Intel); +Cc: lkml, arc-linux-dev, Vineet Gupta

EX causes the cache line to be in Exclusive state and if done
concurrently by multiple cores, it keeps bouncing around.
In LLOCK/SCOND regime, spinning only involves LLOCK which doesn't
change the line state hence better solution.
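
Roughly, in C-like pseudocode (an illustrative sketch only; llock()/scond()
below are stand-ins for the LLOCK/SCOND instructions, not real helpers):

  for (;;) {
          val = llock(&lock->slock);              /* LLOCK: line can stay Shared */
          if (val == __ARCH_SPIN_LOCK_LOCKED__)
                  continue;                       /* spin on the read, no write traffic */
          if (scond(&lock->slock, __ARCH_SPIN_LOCK_LOCKED__))
                  break;                          /* SCOND: the only write, done once */
  }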

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/spinlock.h | 76 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 69 insertions(+), 7 deletions(-)

diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
index e1651df6a93d..4f6c90a0a68a 100644
--- a/arch/arc/include/asm/spinlock.h
+++ b/arch/arc/include/asm/spinlock.h
@@ -18,9 +18,68 @@
 #define arch_spin_unlock_wait(x) \
 	do { while (arch_spin_is_locked(x)) cpu_relax(); } while (0)
 
+#ifdef CONFIG_ARC_HAS_LLSC
+
+static inline void arch_spin_lock(arch_spinlock_t *lock)
+{
+	unsigned int val;
+
+	smp_mb();
+
+	__asm__ __volatile__(
+	"1:	llock	%[val], [%[slock]]	\n"
+	"	breq	%[val], %[LOCKED], 1b	\n"	/* spin while LOCKED */
+	"	scond	%[LOCKED], [%[slock]]	\n"	/* acquire */
+	"	bnz	1b			\n"
+	"					\n"
+	: [val]		"=&r"	(val)
+	: [slock]	"r"	(&(lock->slock)),
+	  [LOCKED]	"r"	(__ARCH_SPIN_LOCK_LOCKED__)
+	: "memory", "cc");
+
+	smp_mb();
+}
+
+/* 1 - lock taken successfully */
+static inline int arch_spin_trylock(arch_spinlock_t *lock)
+{
+	unsigned int val, got_it = 0;
+
+	smp_mb();
+
+	__asm__ __volatile__(
+	"1:	llock	%[val], [%[slock]]	\n"
+	"	breq	%[val], %[LOCKED], 4f	\n"	/* already LOCKED, just bail */
+	"	scond	%[LOCKED], [%[slock]]	\n"	/* acquire */
+	"	bnz	1b			\n"
+	"	mov	%[got_it], 1		\n"
+	"4:					\n"
+	"					\n"
+	: [val]		"=&r"	(val),
+	  [got_it]	"+&r"	(got_it)
+	: [slock]	"r"	(&(lock->slock)),
+	  [LOCKED]	"r"	(__ARCH_SPIN_LOCK_LOCKED__)
+	: "memory", "cc");
+
+	smp_mb();
+
+	return got_it;
+}
+
+static inline void arch_spin_unlock(arch_spinlock_t *lock)
+{
+	smp_mb();
+
+	lock->slock = __ARCH_SPIN_LOCK_UNLOCKED__;
+
+	smp_mb();
+}
+
+#else	/* !CONFIG_ARC_HAS_LLSC */
+
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
-	unsigned int tmp = __ARCH_SPIN_LOCK_LOCKED__;
+	unsigned int val = __ARCH_SPIN_LOCK_LOCKED__;
 
 	/*
 	 * This smp_mb() is technically superfluous, we only need the one
@@ -33,7 +92,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
 	__asm__ __volatile__(
 	"1:	ex  %0, [%1]		\n"
 	"	breq  %0, %2, 1b	\n"
-	: "+&r" (tmp)
+	: "+&r" (val)
 	: "r"(&(lock->slock)), "ir"(__ARCH_SPIN_LOCK_LOCKED__)
 	: "memory");
 
@@ -48,26 +107,27 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
 	smp_mb();
 }
 
+/* 1 - lock taken successfully */
 static inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
-	unsigned int tmp = __ARCH_SPIN_LOCK_LOCKED__;
+	unsigned int val = __ARCH_SPIN_LOCK_LOCKED__;
 
 	smp_mb();
 
 	__asm__ __volatile__(
 	"1:	ex  %0, [%1]		\n"
-	: "+r" (tmp)
+	: "+r" (val)
 	: "r"(&(lock->slock))
 	: "memory");
 
 	smp_mb();
 
-	return (tmp == __ARCH_SPIN_LOCK_UNLOCKED__);
+	return (val == __ARCH_SPIN_LOCK_UNLOCKED__);
 }
 
 static inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
-	unsigned int tmp = __ARCH_SPIN_LOCK_UNLOCKED__;
+	unsigned int val = __ARCH_SPIN_LOCK_UNLOCKED__;
 
 	/*
 	 * RELEASE barrier: given the instructions avail on ARCv2, full barrier
@@ -77,7 +137,7 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 
 	__asm__ __volatile__(
 	"	ex  %0, [%1]		\n"
-	: "+r" (tmp)
+	: "+r" (val)
 	: "r"(&(lock->slock))
 	: "memory");
 
@@ -88,6 +148,8 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	smp_mb();
 }
 
+#endif
+
 /*
  * Read-write spinlocks, allowing multiple readers but only one writer.
  *
-- 
1.9.1



* [PATCH 4/6] ARC: LLOCK/SCOND based rwlock
  2015-08-03 10:03 [PATCH 0/6] ARC: spinlocks/atomics rework Vineet Gupta
                   ` (2 preceding siblings ...)
  2015-08-03 10:03 ` [PATCH 3/6] ARC: LLOCK/SCOND based spin_lock Vineet Gupta
@ 2015-08-03 10:03 ` Vineet Gupta
  2015-08-03 11:33   ` Peter Zijlstra
  2015-08-03 10:03 ` [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff Vineet Gupta
  2015-08-03 10:03 ` [PATCH 6/6] ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle Vineet Gupta
  5 siblings, 1 reply; 21+ messages in thread
From: Vineet Gupta @ 2015-08-03 10:03 UTC (permalink / raw)
  To: Peter Zijlstra (Intel); +Cc: lkml, arc-linux-dev, Vineet Gupta

With LLOCK/SCOND, the rwlock counter can be atomically updated w/o need
for a guarding spin lock.
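
For reference, the counter scheme itself is unchanged from the !LLSC variant;
in rough pseudocode (sketch only):

  read_lock:     if (counter > 0) counter--;                   /* else spin */
  read_unlock:   counter++;
  write_lock:    if (counter == __ARCH_RW_LOCK_UNLOCKED__)     /* no readers, no writer */
                         counter = 0;                          /* else spin */
  write_unlock:  counter = __ARCH_RW_LOCK_UNLOCKED__;

Each of these now becomes a single LLOCK/SCOND loop instead of a critical
section under lock_mutex.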

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/spinlock.h       | 187 ++++++++++++++++++++++++++++++++--
 arch/arc/include/asm/spinlock_types.h |   2 +
 2 files changed, 179 insertions(+), 10 deletions(-)

diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
index 4f6c90a0a68a..9a7a7293f127 100644
--- a/arch/arc/include/asm/spinlock.h
+++ b/arch/arc/include/asm/spinlock.h
@@ -75,6 +75,177 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	smp_mb();
 }
 
+/*
+ * Read-write spinlocks, allowing multiple readers but only one writer.
+ * Unfair locking as Writers could be starved indefinitely by Reader(s)
+ */
+
+static inline void arch_read_lock(arch_rwlock_t *rw)
+{
+	unsigned int val;
+
+	smp_mb();
+
+	/*
+	 * zero means writer holds the lock exclusively, deny Reader.
+	 * Otherwise grant lock to first/subseq reader
+	 *
+	 * 	if (rw->counter > 0) {
+	 *		rw->counter--;
+	 *		ret = 1;
+	 *	}
+	 */
+
+	__asm__ __volatile__(
+	"1:	llock	%[val], [%[rwlock]]	\n"
+	"	brls	%[val], %[WR_LOCKED], 1b\n"	/* <= 0: spin while write locked */
+	"	sub	%[val], %[val], 1	\n"	/* reader lock */
+	"	scond	%[val], [%[rwlock]]	\n"
+	"	bnz	1b			\n"
+	"					\n"
+	: [val]		"=&r"	(val)
+	: [rwlock]	"r"	(&(rw->counter)),
+	  [WR_LOCKED]	"ir"	(0)
+	: "memory", "cc");
+
+	smp_mb();
+}
+
+/* 1 - lock taken successfully */
+static inline int arch_read_trylock(arch_rwlock_t *rw)
+{
+	unsigned int val, got_it = 0;
+
+	smp_mb();
+
+	__asm__ __volatile__(
+	"1:	llock	%[val], [%[rwlock]]	\n"
+	"	brls	%[val], %[WR_LOCKED], 4f\n"	/* <= 0: already write locked, bail */
+	"	sub	%[val], %[val], 1	\n"	/* counter-- */
+	"	scond	%[val], [%[rwlock]]	\n"
+	"	bnz	1b			\n"	/* retry if collided with someone */
+	"	mov	%[got_it], 1		\n"
+	"					\n"
+	"4: ; --- done ---			\n"
+
+	: [val]		"=&r"	(val),
+	  [got_it]	"+&r"	(got_it)
+	: [rwlock]	"r"	(&(rw->counter)),
+	  [WR_LOCKED]	"ir"	(0)
+	: "memory", "cc");
+
+	smp_mb();
+
+	return got_it;
+}
+
+static inline void arch_write_lock(arch_rwlock_t *rw)
+{
+	unsigned int val;
+
+	smp_mb();
+
+	/*
+	 * If reader(s) hold lock (lock < __ARCH_RW_LOCK_UNLOCKED__),
+	 * deny writer. Otherwise if unlocked grant to writer
+	 * Hence the claim that Linux rwlocks are unfair to writers.
+	 * (can be starved for an indefinite time by readers).
+	 *
+	 *	if (rw->counter == __ARCH_RW_LOCK_UNLOCKED__) {
+	 *		rw->counter = 0;
+	 *		ret = 1;
+	 *	}
+	 */
+
+	__asm__ __volatile__(
+	"1:	llock	%[val], [%[rwlock]]	\n"
+	"	brne	%[val], %[UNLOCKED], 1b	\n"	/* while !UNLOCKED spin */
+	"	mov	%[val], %[WR_LOCKED]	\n"
+	"	scond	%[val], [%[rwlock]]	\n"
+	"	bnz	1b			\n"
+	"					\n"
+	: [val]		"=&r"	(val)
+	: [rwlock]	"r"	(&(rw->counter)),
+	  [UNLOCKED]	"ir"	(__ARCH_RW_LOCK_UNLOCKED__),
+	  [WR_LOCKED]	"ir"	(0)
+	: "memory", "cc");
+
+	smp_mb();
+}
+
+/* 1 - lock taken successfully */
+static inline int arch_write_trylock(arch_rwlock_t *rw)
+{
+	unsigned int val, got_it = 0;
+
+	smp_mb();
+
+	__asm__ __volatile__(
+	"1:	llock	%[val], [%[rwlock]]	\n"
+	"	brne	%[val], %[UNLOCKED], 4f	\n"	/* !UNLOCKED, bail */
+	"	mov	%[val], %[WR_LOCKED]	\n"
+	"	scond	%[val], [%[rwlock]]	\n"
+	"	bnz	1b			\n"	/* retry if collided with someone */
+	"	mov	%[got_it], 1		\n"
+	"					\n"
+	"4: ; --- done ---			\n"
+
+	: [val]		"=&r"	(val),
+	  [got_it]	"+&r"	(got_it)
+	: [rwlock]	"r"	(&(rw->counter)),
+	  [UNLOCKED]	"ir"	(__ARCH_RW_LOCK_UNLOCKED__),
+	  [WR_LOCKED]	"ir"	(0)
+	: "memory", "cc");
+
+	smp_mb();
+
+	return got_it;
+}
+
+static inline void arch_read_unlock(arch_rwlock_t *rw)
+{
+	unsigned int val;
+
+	smp_mb();
+
+	/*
+	 * rw->counter++;
+	 */
+	__asm__ __volatile__(
+	"1:	llock	%[val], [%[rwlock]]	\n"
+	"	add	%[val], %[val], 1	\n"
+	"	scond	%[val], [%[rwlock]]	\n"
+	"	bnz	1b			\n"
+	"					\n"
+	: [val]		"=&r"	(val)
+	: [rwlock]	"r"	(&(rw->counter))
+	: "memory", "cc");
+
+	smp_mb();
+}
+
+static inline void arch_write_unlock(arch_rwlock_t *rw)
+{
+	unsigned int val;
+
+	smp_mb();
+
+	/*
+	 * rw->counter = __ARCH_RW_LOCK_UNLOCKED__;
+	 */
+	__asm__ __volatile__(
+	"1:	llock	%[val], [%[rwlock]]	\n"
+	"	scond	%[UNLOCKED], [%[rwlock]]\n"
+	"	bnz	1b			\n"
+	"					\n"
+	: [val]		"=&r"	(val)
+	: [rwlock]	"r"	(&(rw->counter)),
+	  [UNLOCKED]	"r"	(__ARCH_RW_LOCK_UNLOCKED__)
+	: "memory", "cc");
+
+	smp_mb();
+}
+
 #else	/* !CONFIG_ARC_HAS_LLSC */
 
 static inline void arch_spin_lock(arch_spinlock_t *lock)
@@ -148,23 +319,14 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	smp_mb();
 }
 
-#endif
-
 /*
  * Read-write spinlocks, allowing multiple readers but only one writer.
+ * Unfair locking as Writers could be starved indefinitely by Reader(s)
  *
  * The spinlock itself is contained in @counter and access to it is
  * serialized with @lock_mutex.
- *
- * Unfair locking as Writers could be starved indefinitely by Reader(s)
  */
 
-/* Would read_trylock() succeed? */
-#define arch_read_can_lock(x)	((x)->counter > 0)
-
-/* Would write_trylock() succeed? */
-#define arch_write_can_lock(x)	((x)->counter == __ARCH_RW_LOCK_UNLOCKED__)
-
 /* 1 - lock taken successfully */
 static inline int arch_read_trylock(arch_rwlock_t *rw)
 {
@@ -235,6 +397,11 @@ static inline void arch_write_unlock(arch_rwlock_t *rw)
 	arch_spin_unlock(&(rw->lock_mutex));
 }
 
+#endif
+
+#define arch_read_can_lock(x)	((x)->counter > 0)
+#define arch_write_can_lock(x)	((x)->counter == __ARCH_RW_LOCK_UNLOCKED__)
+
 #define arch_read_lock_flags(lock, flags)	arch_read_lock(lock)
 #define arch_write_lock_flags(lock, flags)	arch_write_lock(lock)
 
diff --git a/arch/arc/include/asm/spinlock_types.h b/arch/arc/include/asm/spinlock_types.h
index 662627ced4f2..4e1ef5f650c6 100644
--- a/arch/arc/include/asm/spinlock_types.h
+++ b/arch/arc/include/asm/spinlock_types.h
@@ -26,7 +26,9 @@ typedef struct {
  */
 typedef struct {
 	volatile unsigned int	counter;
+#ifndef CONFIG_ARC_HAS_LLSC
 	arch_spinlock_t		lock_mutex;
+#endif
 } arch_rwlock_t;
 
 #define __ARCH_RW_LOCK_UNLOCKED__	0x01000000
-- 
1.9.1



* [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff
  2015-08-03 10:03 [PATCH 0/6] ARC: spinlocks/atomics rework Vineet Gupta
                   ` (3 preceding siblings ...)
  2015-08-03 10:03 ` [PATCH 4/6] ARC: LLOCK/SCOND based rwlock Vineet Gupta
@ 2015-08-03 10:03 ` Vineet Gupta
  2015-08-03 11:41   ` Peter Zijlstra
  2015-08-03 11:50   ` Peter Zijlstra
  2015-08-03 10:03 ` [PATCH 6/6] ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle Vineet Gupta
  5 siblings, 2 replies; 21+ messages in thread
From: Vineet Gupta @ 2015-08-03 10:03 UTC (permalink / raw)
  To: Peter Zijlstra (Intel); +Cc: lkml, arc-linux-dev, Vineet Gupta

This is to work around the hardware livelock seen when multiple cores are
trying to SCOND to the same location (see code comments for details).
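
The retry scheme, in rough C-like pseudocode (sketch only; llock()/scond()
stand in for the actual instructions):

  delay = 1;
  for (;;) {
          val = llock(&v->counter);
          val = op(val, i);
          if (scond(&v->counter, val))            /* exclusive store went through */
                  break;
          for (tmp = delay; tmp; tmp--)           /* spin-wait, let coherency traffic drain */
                  ;
          delay *= 2;                             /* exponential backoff */
  }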

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/Kconfig                |   5 +
 arch/arc/include/asm/atomic.h   |  73 ++++++++++
 arch/arc/include/asm/spinlock.h | 292 ++++++++++++++++++++++++++++++++++++++++
 arch/arc/kernel/setup.c         |   4 +
 4 files changed, 374 insertions(+)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index a5fccdfbfc8f..bd4670d1b89b 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -365,6 +365,11 @@ config ARC_HAS_LLSC
 	default y
 	depends on !ARC_CANT_LLSC
 
+config ARC_STAR_9000923308
+	bool "Workaround for llock/scond livelock"
+	default y
+	depends on ISA_ARCV2 && SMP && ARC_HAS_LLSC
+
 config ARC_HAS_SWAPE
 	bool "Insn: SWAPE (endian-swap)"
 	default y
diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 3dd36c1efee1..f59d2cfc00fc 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -23,6 +23,8 @@
 
 #define atomic_set(v, i) (((v)->counter) = (i))
 
+#ifndef CONFIG_ARC_STAR_9000923308
+
 #define ATOMIC_OP(op, c_op, asm_op)					\
 static inline void atomic_##op(int i, atomic_t *v)			\
 {									\
@@ -65,6 +67,74 @@ static inline int atomic_##op##_return(int i, atomic_t *v)		\
 	return val;							\
 }
 
+#else	/* CONFIG_ARC_STAR_9000923308 */
+
+#define SCOND_FAIL_RETRY_VAR_DEF						\
+	unsigned int delay = 1, tmp;						\
+
+#define SCOND_FAIL_RETRY_ASM							\
+	"	bz	4f			\n"				\
+	"   ; --- scond fail delay ---		\n"				\
+	"	mov	%[tmp], %[delay]	\n"	/* tmp = delay */	\
+	"2: 	brne.d	%[tmp], 0, 2b		\n"	/* while (tmp != 0) */	\
+	"	sub	%[tmp], %[tmp], 1	\n"	/* tmp-- */		\
+	"	asl	%[delay], %[delay], 1	\n"	/* delay *= 2 */	\
+	"	b	1b			\n"	/* start over */	\
+	"4: ; --- success ---			\n"				\
+
+#define SCOND_FAIL_RETRY_VARS							\
+	  ,[delay] "+&r" (delay),[tmp] "=&r"	(tmp)				\
+
+#define ATOMIC_OP(op, c_op, asm_op)					\
+static inline void atomic_##op(int i, atomic_t *v)			\
+{									\
+	unsigned int val, delay = 1, tmp;				\
+									\
+	__asm__ __volatile__(						\
+	"1:	llock   %[val], [%[ctr]]		\n"		\
+	"	" #asm_op " %[val], %[val], %[i]	\n"		\
+	"	scond   %[val], [%[ctr]]		\n"		\
+	"						\n"		\
+	SCOND_FAIL_RETRY_ASM						\
+									\
+	: [val]	"=&r"	(val) /* Early clobber to prevent reg reuse */	\
+	  SCOND_FAIL_RETRY_VARS						\
+	: [ctr]	"r"	(&v->counter), /* Not "m": llock only supports reg direct addr mode */	\
+	  [i]	"ir"	(i)						\
+	: "cc");							\
+}									\
+
+#define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
+static inline int atomic_##op##_return(int i, atomic_t *v)		\
+{									\
+	unsigned int val, delay = 1, tmp;				\
+									\
+	/*								\
+	 * Explicit full memory barrier needed before/after as		\
+	 * LLOCK/SCOND thmeselves don't provide any such semantics	\
+	 */								\
+	smp_mb();							\
+									\
+	__asm__ __volatile__(						\
+	"1:	llock   %[val], [%[ctr]]		\n"		\
+	"	" #asm_op " %[val], %[val], %[i]	\n"		\
+	"	scond   %[val], [%[ctr]]		\n"		\
+	"						\n"		\
+	SCOND_FAIL_RETRY_ASM						\
+									\
+	: [val]	"=&r"	(val)						\
+	  SCOND_FAIL_RETRY_VARS						\
+	: [ctr]	"r"	(&v->counter),					\
+	  [i]	"ir"	(i)						\
+	: "cc");							\
+									\
+	smp_mb();							\
+									\
+	return val;							\
+}
+
+#endif	/* CONFIG_ARC_STAR_9000923308 */
+
 #else	/* !CONFIG_ARC_HAS_LLSC */
 
 #ifndef CONFIG_SMP
@@ -142,6 +212,9 @@ ATOMIC_OP(and, &=, and)
 #undef ATOMIC_OPS
 #undef ATOMIC_OP_RETURN
 #undef ATOMIC_OP
+#undef SCOND_FAIL_RETRY_VAR_DEF
+#undef SCOND_FAIL_RETRY_ASM
+#undef SCOND_FAIL_RETRY_VARS
 
 /**
  * __atomic_add_unless - add unless the number is a given value
diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
index 9a7a7293f127..297c54484013 100644
--- a/arch/arc/include/asm/spinlock.h
+++ b/arch/arc/include/asm/spinlock.h
@@ -20,6 +20,11 @@
 
 #ifdef CONFIG_ARC_HAS_LLSC
 
+/*
+ * A normal LLOCK/SCOND based system, w/o need for livelock workaround
+ */
+#ifndef CONFIG_ARC_STAR_9000923308
+
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
 	unsigned int val;
@@ -246,6 +251,293 @@ static inline void arch_write_unlock(arch_rwlock_t *rw)
 	smp_mb();
 }
 
+#else	/* CONFIG_ARC_STAR_9000923308 */
+
+/*
+ * HS38x4 could get into a LLOCK/SCOND livelock in case of multiple overlapping
+ * coherency transactions in the SCU. The exclusive line state keeps rotating
+ * among contenting cores leading to a never ending cycle. So break the cycle
+ * by deferring the retry of failed exclusive access (SCOND). The actual delay
+ * needed is function of number of contending cores as well as the unrelated
+ * coherency traffic from other cores. To keep the code simple, start off with
+ * small delay of 1 which would suffice most cases and in case of contention
+ * double the delay. Eventually the delay is sufficient such that the coherency
+ * pipeline is drained, thus a subsequent exclusive access would succeed.
+ */
+
+#define SCOND_FAIL_RETRY_VAR_DEF						\
+	unsigned int delay, tmp;						\
+
+#define SCOND_FAIL_RETRY_ASM							\
+	"   ; --- scond fail delay ---		\n"				\
+	"	mov	%[tmp], %[delay]	\n"	/* tmp = delay */	\
+	"2: 	brne.d	%[tmp], 0, 2b		\n"	/* while (tmp != 0) */	\
+	"	sub	%[tmp], %[tmp], 1	\n"	/* tmp-- */		\
+	"	asl	%[delay], %[delay], 1	\n"	/* delay *= 2 */	\
+	"	b	1b			\n"	/* start over */	\
+	"					\n"				\
+	"4: ; --- done ---			\n"				\
+
+#define SCOND_FAIL_RETRY_VARS							\
+	  ,[delay] "=&r" (delay), [tmp] "=&r"	(tmp)				\
+
+static inline void arch_spin_lock(arch_spinlock_t *lock)
+{
+	unsigned int val;
+	SCOND_FAIL_RETRY_VAR_DEF;
+
+	smp_mb();
+
+	__asm__ __volatile__(
+	"0:	mov	%[delay], 1		\n"
+	"1:	llock	%[val], [%[slock]]	\n"
+	"	breq	%[val], %[LOCKED], 1b	\n"	/* spin while LOCKED */
+	"	scond	%[LOCKED], [%[slock]]	\n"	/* acquire */
+	"	bz	4f			\n"	/* done */
+	"					\n"
+	SCOND_FAIL_RETRY_ASM
+
+	: [val]		"=&r"	(val)
+	  SCOND_FAIL_RETRY_VARS
+	: [slock]	"r"	(&(lock->slock)),
+	  [LOCKED]	"r"	(__ARCH_SPIN_LOCK_LOCKED__)
+	: "memory", "cc");
+
+	smp_mb();
+}
+
+/* 1 - lock taken successfully */
+static inline int arch_spin_trylock(arch_spinlock_t *lock)
+{
+	unsigned int val, got_it = 0;
+	SCOND_FAIL_RETRY_VAR_DEF;
+
+	smp_mb();
+
+	__asm__ __volatile__(
+	"0:	mov	%[delay], 1		\n"
+	"1:	llock	%[val], [%[slock]]	\n"
+	"	breq	%[val], %[LOCKED], 4f	\n"	/* already LOCKED, just bail */
+	"	scond	%[LOCKED], [%[slock]]	\n"	/* acquire */
+	"	bz.d	4f			\n"
+	"	mov.z	%[got_it], 1		\n"	/* got it */
+	"					\n"
+	SCOND_FAIL_RETRY_ASM
+
+	: [val]		"=&r"	(val),
+	  [got_it]	"+&r"	(got_it)
+	  SCOND_FAIL_RETRY_VARS
+	: [slock]	"r"	(&(lock->slock)),
+	  [LOCKED]	"r"	(__ARCH_SPIN_LOCK_LOCKED__)
+	: "memory", "cc");
+
+	smp_mb();
+
+	return got_it;
+}
+
+static inline void arch_spin_unlock(arch_spinlock_t *lock)
+{
+	smp_mb();
+
+	lock->slock = __ARCH_SPIN_LOCK_UNLOCKED__;
+
+	smp_mb();
+}
+
+/*
+ * Read-write spinlocks, allowing multiple readers but only one writer.
+ * Unfair locking as Writers could be starved indefinitely by Reader(s)
+ */
+
+static inline void arch_read_lock(arch_rwlock_t *rw)
+{
+	unsigned int val;
+	SCOND_FAIL_RETRY_VAR_DEF;
+
+	smp_mb();
+
+	/*
+	 * zero means writer holds the lock exclusively, deny Reader.
+	 * Otherwise grant lock to first/subseq reader
+	 *
+	 * 	if (rw->counter > 0) {
+	 *		rw->counter--;
+	 *		ret = 1;
+	 *	}
+	 */
+
+	__asm__ __volatile__(
+	"0:	mov	%[delay], 1		\n"
+	"1:	llock	%[val], [%[rwlock]]	\n"
+	"	brls	%[val], %[WR_LOCKED], 1b\n"	/* <= 0: spin while write locked */
+	"	sub	%[val], %[val], 1	\n"	/* reader lock */
+	"	scond	%[val], [%[rwlock]]	\n"
+	"	bz	4f			\n"	/* done */
+	"					\n"
+	SCOND_FAIL_RETRY_ASM
+
+	: [val]		"=&r"	(val)
+	  SCOND_FAIL_RETRY_VARS
+	: [rwlock]	"r"	(&(rw->counter)),
+	  [WR_LOCKED]	"ir"	(0)
+	: "memory", "cc");
+
+	smp_mb();
+}
+
+/* 1 - lock taken successfully */
+static inline int arch_read_trylock(arch_rwlock_t *rw)
+{
+	unsigned int val, got_it = 0;
+	SCOND_FAIL_RETRY_VAR_DEF;
+
+	smp_mb();
+
+	__asm__ __volatile__(
+	"0:	mov	%[delay], 1		\n"
+	"1:	llock	%[val], [%[rwlock]]	\n"
+	"	brls	%[val], %[WR_LOCKED], 4f\n"	/* <= 0: already write locked, bail */
+	"	sub	%[val], %[val], 1	\n"	/* counter-- */
+	"	scond	%[val], [%[rwlock]]	\n"
+	"	bz.d	4f			\n"
+	"	mov.z	%[got_it], 1		\n"	/* got it */
+	"					\n"
+	SCOND_FAIL_RETRY_ASM
+
+	: [val]		"=&r"	(val),
+	  [got_it]	"+&r"	(got_it)
+	  SCOND_FAIL_RETRY_VARS
+	: [rwlock]	"r"	(&(rw->counter)),
+	  [WR_LOCKED]	"ir"	(0)
+	: "memory", "cc");
+
+	smp_mb();
+
+	return got_it;
+}
+
+static inline void arch_write_lock(arch_rwlock_t *rw)
+{
+	unsigned int val;
+	SCOND_FAIL_RETRY_VAR_DEF;
+
+	smp_mb();
+
+	/*
+	 * If reader(s) hold lock (lock < __ARCH_RW_LOCK_UNLOCKED__),
+	 * deny writer. Otherwise if unlocked grant to writer
+	 * Hence the claim that Linux rwlocks are unfair to writers.
+	 * (can be starved for an indefinite time by readers).
+	 *
+	 *	if (rw->counter == __ARCH_RW_LOCK_UNLOCKED__) {
+	 *		rw->counter = 0;
+	 *		ret = 1;
+	 *	}
+	 */
+
+	__asm__ __volatile__(
+	"0:	mov	%[delay], 1		\n"
+	"1:	llock	%[val], [%[rwlock]]	\n"
+	"	brne	%[val], %[UNLOCKED], 1b	\n"	/* while !UNLOCKED spin */
+	"	mov	%[val], %[WR_LOCKED]	\n"
+	"	scond	%[val], [%[rwlock]]	\n"
+	"	bz	4f			\n"
+	"					\n"
+	SCOND_FAIL_RETRY_ASM
+
+	: [val]		"=&r"	(val)
+	  SCOND_FAIL_RETRY_VARS
+	: [rwlock]	"r"	(&(rw->counter)),
+	  [UNLOCKED]	"ir"	(__ARCH_RW_LOCK_UNLOCKED__),
+	  [WR_LOCKED]	"ir"	(0)
+	: "memory", "cc");
+
+	smp_mb();
+}
+
+/* 1 - lock taken successfully */
+static inline int arch_write_trylock(arch_rwlock_t *rw)
+{
+	unsigned int val, got_it = 0;
+	SCOND_FAIL_RETRY_VAR_DEF;
+
+	smp_mb();
+
+	__asm__ __volatile__(
+	"0:	mov	%[delay], 1		\n"
+	"1:	llock	%[val], [%[rwlock]]	\n"
+	"	brne	%[val], %[UNLOCKED], 4f	\n"	/* !UNLOCKED, bail */
+	"	mov	%[val], %[WR_LOCKED]	\n"
+	"	scond	%[val], [%[rwlock]]	\n"
+	"	bz.d	4f			\n"
+	"	mov.z	%[got_it], 1		\n"	/* got it */
+	"					\n"
+	SCOND_FAIL_RETRY_ASM
+
+	: [val]		"=&r"	(val),
+	  [got_it]	"+&r"	(got_it)
+	  SCOND_FAIL_RETRY_VARS
+	: [rwlock]	"r"	(&(rw->counter)),
+	  [UNLOCKED]	"ir"	(__ARCH_RW_LOCK_UNLOCKED__),
+	  [WR_LOCKED]	"ir"	(0)
+	: "memory", "cc");
+
+	smp_mb();
+
+	return got_it;
+}
+
+static inline void arch_read_unlock(arch_rwlock_t *rw)
+{
+	unsigned int val;
+
+	smp_mb();
+
+	/*
+	 * rw->counter++;
+	 */
+	__asm__ __volatile__(
+	"1:	llock	%[val], [%[rwlock]]	\n"
+	"	add	%[val], %[val], 1	\n"
+	"	scond	%[val], [%[rwlock]]	\n"
+	"	bnz	1b			\n"
+	"					\n"
+	: [val]		"=&r"	(val)
+	: [rwlock]	"r"	(&(rw->counter))
+	: "memory", "cc");
+
+	smp_mb();
+}
+
+static inline void arch_write_unlock(arch_rwlock_t *rw)
+{
+	unsigned int val;
+
+	smp_mb();
+
+	/*
+	 * rw->counter = __ARCH_RW_LOCK_UNLOCKED__;
+	 */
+	__asm__ __volatile__(
+	"1:	llock	%[val], [%[rwlock]]	\n"
+	"	scond	%[UNLOCKED], [%[rwlock]]\n"
+	"	bnz	1b			\n"
+	"					\n"
+	: [val]		"=&r"	(val)
+	: [rwlock]	"r"	(&(rw->counter)),
+	  [UNLOCKED]	"r"	(__ARCH_RW_LOCK_UNLOCKED__)
+	: "memory", "cc");
+
+	smp_mb();
+}
+
+#undef SCOND_FAIL_RETRY_VAR_DEF
+#undef SCOND_FAIL_RETRY_ASM
+#undef SCOND_FAIL_RETRY_VARS
+
+#endif	/* CONFIG_ARC_STAR_9000923308 */
+
 #else	/* !CONFIG_ARC_HAS_LLSC */
 
 static inline void arch_spin_lock(arch_spinlock_t *lock)
diff --git a/arch/arc/kernel/setup.c b/arch/arc/kernel/setup.c
index 18cc01591c96..1f3d9b3160fa 100644
--- a/arch/arc/kernel/setup.c
+++ b/arch/arc/kernel/setup.c
@@ -330,6 +330,10 @@ static void arc_chk_core_config(void)
 		pr_warn("CONFIG_ARC_FPU_SAVE_RESTORE needed for working apps\n");
 	else if (!cpu->extn.fpu_dp && fpu_enabled)
 		panic("FPU non-existent, disable CONFIG_ARC_FPU_SAVE_RESTORE\n");
+
+	if (is_isa_arcv2() && IS_ENABLED(CONFIG_SMP) && cpu->isa.atomic &&
+	    !IS_ENABLED(CONFIG_ARC_STAR_9000923308))
+		panic("llock/scond livelock workaround missing\n");
 }
 
 /*
-- 
1.9.1



* [PATCH 6/6] ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle
  2015-08-03 10:03 [PATCH 0/6] ARC: spinlocks/atomics rework Vineet Gupta
                   ` (4 preceding siblings ...)
  2015-08-03 10:03 ` [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff Vineet Gupta
@ 2015-08-03 10:03 ` Vineet Gupta
  2015-08-03 11:43   ` Peter Zijlstra
  5 siblings, 1 reply; 21+ messages in thread
From: Vineet Gupta @ 2015-08-03 10:03 UTC (permalink / raw)
  To: Peter Zijlstra (Intel); +Cc: lkml, arc-linux-dev, Vineet Gupta

A spin lock could be available momentarily, but the SCOND to actually
acquire it might still fail due to concurrent update from other core(s).
To avoid the hardware livelock, the sequence is retried after a "delay" which
is increased exponentially to get a nice backoff behaviour.

However, this could cause the delay counter to get to a high value. Thus
when the next spin cycle restarts, reset the counter back to starting
value of 1.
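
In terms of the pseudocode sketch from the previous patch, the reset moves to
the point where a new spin-wait cycle begins (illustrative only; backoff()
stands for the delay-and-double loop introduced there):

  delay = 1;
  for (;;) {
          val = llock(&lock->slock);
          if (val == __ARCH_SPIN_LOCK_LOCKED__) {
                  delay = 1;                      /* new spin-wait cycle: restart backoff */
                  continue;
          }
          if (scond(&lock->slock, __ARCH_SPIN_LOCK_LOCKED__))
                  break;
          backoff(&delay);                        /* SCOND failed: wait, then delay *= 2 */
  }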

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/spinlock.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
index 297c54484013..dce6e3d8583c 100644
--- a/arch/arc/include/asm/spinlock.h
+++ b/arch/arc/include/asm/spinlock.h
@@ -291,7 +291,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
 	__asm__ __volatile__(
 	"0:	mov	%[delay], 1		\n"
 	"1:	llock	%[val], [%[slock]]	\n"
-	"	breq	%[val], %[LOCKED], 1b	\n"	/* spin while LOCKED */
+	"	breq	%[val], %[LOCKED], 0b	\n"	/* spin while LOCKED */
 	"	scond	%[LOCKED], [%[slock]]	\n"	/* acquire */
 	"	bz	4f			\n"	/* done */
 	"					\n"
@@ -370,7 +370,7 @@ static inline void arch_read_lock(arch_rwlock_t *rw)
 	__asm__ __volatile__(
 	"0:	mov	%[delay], 1		\n"
 	"1:	llock	%[val], [%[rwlock]]	\n"
-	"	brls	%[val], %[WR_LOCKED], 1b\n"	/* <= 0: spin while write locked */
+	"	brls	%[val], %[WR_LOCKED], 0b\n"	/* <= 0: spin while write locked */
 	"	sub	%[val], %[val], 1	\n"	/* reader lock */
 	"	scond	%[val], [%[rwlock]]	\n"
 	"	bz	4f			\n"	/* done */
@@ -439,7 +439,7 @@ static inline void arch_write_lock(arch_rwlock_t *rw)
 	__asm__ __volatile__(
 	"0:	mov	%[delay], 1		\n"
 	"1:	llock	%[val], [%[rwlock]]	\n"
-	"	brne	%[val], %[UNLOCKED], 1b	\n"	/* while !UNLOCKED spin */
+	"	brne	%[val], %[UNLOCKED], 0b	\n"	/* while !UNLOCKED spin */
 	"	mov	%[val], %[WR_LOCKED]	\n"
 	"	scond	%[val], [%[rwlock]]	\n"
 	"	bz	4f			\n"
-- 
1.9.1



* Re: [PATCH 3/6] ARC: LLOCK/SCOND based spin_lock
  2015-08-03 10:03 ` [PATCH 3/6] ARC: LLOCK/SCOND based spin_lock Vineet Gupta
@ 2015-08-03 11:29   ` Peter Zijlstra
  2015-08-03 11:44     ` Vineet Gupta
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2015-08-03 11:29 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: lkml, arc-linux-dev

On Mon, Aug 03, 2015 at 03:33:05PM +0530, Vineet Gupta wrote:
> EX causes the cache line to be in Exclusive state and if done
> concurrently by multiple cores, it keeps bouncing around.
> In LLOCK/SCOND regime, spinning only involves LLOCK which doesn't
> change the line state hence better solution.

Maybe write like:

"The EXchange instruction forces the cacheline into exclusive state
(because of the modify) and concurrent loops with it will bounce the
cacheline between the cores

Instead use LLOCK/SCOND to form the test-and-set lock implementation
since LLOCK can keep the line in shared state."

Because it wasn't clear to me what EX meant, and surely a LOAD must
change the cacheline state if it previously was in exclusive on another
core. It's just that shared is a whole lot better to spin on than
exclusive.

Also, since you're using LL/SC now, a slightly more complex lock is
trivial to implement, might I suggest you look at implementing ticket
locks next?
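
For reference, a bare-bones ticket lock in arch-neutral pseudo-C -- just a
sketch, the helper names (atomic_fetch_inc() etc.) are illustrative:

  typedef struct { unsigned int next, owner; } tickets_t;

  static void ticket_lock(tickets_t *t)
  {
          unsigned int me = atomic_fetch_inc(&t->next);   /* grab a ticket */

          while (READ_ONCE(t->owner) != me)               /* wait for our turn */
                  cpu_relax();
          smp_mb();                                       /* acquire */
  }

  static void ticket_unlock(tickets_t *t)
  {
          smp_mb();                                       /* release */
          WRITE_ONCE(t->owner, t->owner + 1);             /* serve the next ticket */
  }

The fetch-and-increment maps straight onto an LL/SC loop and you get FIFO
fairness, which a test-and-set lock lacks.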

> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
> ---
>  arch/arc/include/asm/spinlock.h | 76 +++++++++++++++++++++++++++++++++++++----
>  1 file changed, 69 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
> index e1651df6a93d..4f6c90a0a68a 100644
> --- a/arch/arc/include/asm/spinlock.h
> +++ b/arch/arc/include/asm/spinlock.h
> @@ -18,9 +18,68 @@
>  #define arch_spin_unlock_wait(x) \
>  	do { while (arch_spin_is_locked(x)) cpu_relax(); } while (0)
>  
> +#ifdef CONFIG_ARC_HAS_LLSC
> +
> +static inline void arch_spin_lock(arch_spinlock_t *lock)
> +{
> +	unsigned int val;
> +
> +	smp_mb();

I'm still puzzled by your need of this one ...

> +
> +	__asm__ __volatile__(
> +	"1:	llock	%[val], [%[slock]]	\n"
> +	"	breq	%[val], %[LOCKED], 1b	\n"	/* spin while LOCKED */
> +	"	scond	%[LOCKED], [%[slock]]	\n"	/* acquire */
> +	"	bnz	1b			\n"
> +	"					\n"
> +	: [val]		"=&r"	(val)
> +	: [slock]	"r"	(&(lock->slock)),
> +	  [LOCKED]	"r"	(__ARCH_SPIN_LOCK_LOCKED__)
> +	: "memory", "cc");
> +
> +	smp_mb();
> +}
> +
> +/* 1 - lock taken successfully */
> +static inline int arch_spin_trylock(arch_spinlock_t *lock)
> +{
> +	unsigned int val, got_it = 0;
> +
> +	smp_mb();

Idem.

> +
> +	__asm__ __volatile__(
> +	"1:	llock	%[val], [%[slock]]	\n"
> +	"	breq	%[val], %[LOCKED], 4f	\n"	/* already LOCKED, just bail */
> +	"	scond	%[LOCKED], [%[slock]]	\n"	/* acquire */
> +	"	bnz	1b			\n"
> +	"	mov	%[got_it], 1		\n"
> +	"4:					\n"
> +	"					\n"
> +	: [val]		"=&r"	(val),
> +	  [got_it]	"+&r"	(got_it)
> +	: [slock]	"r"	(&(lock->slock)),
> +	  [LOCKED]	"r"	(__ARCH_SPIN_LOCK_LOCKED__)
> +	: "memory", "cc");
> +
> +	smp_mb();
> +
> +	return got_it;
> +}
> +
> +static inline void arch_spin_unlock(arch_spinlock_t *lock)
> +{
> +	smp_mb();
> +
> +	lock->slock = __ARCH_SPIN_LOCK_UNLOCKED__;
> +
> +	smp_mb();

Idem.

> +}
> +
> +#else	/* !CONFIG_ARC_HAS_LLSC */


* Re: [PATCH 4/6] ARC: LLOCK/SCOND based rwlock
  2015-08-03 10:03 ` [PATCH 4/6] ARC: LLOCK/SCOND based rwlock Vineet Gupta
@ 2015-08-03 11:33   ` Peter Zijlstra
  2015-08-03 11:51     ` Vineet Gupta
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2015-08-03 11:33 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: lkml, arc-linux-dev

On Mon, Aug 03, 2015 at 03:33:06PM +0530, Vineet Gupta wrote:
> With LLOCK/SCOND, the rwlock counter can be atomically updated w/o need
> for a guarding spin lock.

Maybe re-iterate the exclusive vs shared spin story again.

And aside from the far too many full barriers (again), I was just
wondering about:

> +static inline void arch_write_unlock(arch_rwlock_t *rw)
> +{
> +	unsigned int val;
> +
> +	smp_mb();
> +
> +	/*
> +	 * rw->counter = __ARCH_RW_LOCK_UNLOCKED__;
> +	 */
> +	__asm__ __volatile__(
> +	"1:	llock	%[val], [%[rwlock]]	\n"
> +	"	scond	%[UNLOCKED], [%[rwlock]]\n"
> +	"	bnz	1b			\n"
> +	"					\n"
> +	: [val]		"=&r"	(val)
> +	: [rwlock]	"r"	(&(rw->counter)),
> +	  [UNLOCKED]	"r"	(__ARCH_RW_LOCK_UNLOCKED__)
> +	: "memory", "cc");
> +
> +	smp_mb();
> +}

Why can't that be a straight store?


* Re: [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff
  2015-08-03 10:03 ` [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff Vineet Gupta
@ 2015-08-03 11:41   ` Peter Zijlstra
  2015-08-03 13:01     ` Vineet Gupta
  2015-08-03 11:50   ` Peter Zijlstra
  1 sibling, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2015-08-03 11:41 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: lkml, arc-linux-dev

On Mon, Aug 03, 2015 at 03:33:07PM +0530, Vineet Gupta wrote:
> +#define SCOND_FAIL_RETRY_VAR_DEF						\
> +	unsigned int delay = 1, tmp;						\
> +
> +#define SCOND_FAIL_RETRY_ASM							\
> +	"	bz	4f			\n"				\
> +	"   ; --- scond fail delay ---		\n"				\
> +	"	mov	%[tmp], %[delay]	\n"	/* tmp = delay */	\
> +	"2: 	brne.d	%[tmp], 0, 2b		\n"	/* while (tmp != 0) */	\
> +	"	sub	%[tmp], %[tmp], 1	\n"	/* tmp-- */		\
> +	"	asl	%[delay], %[delay], 1	\n"	/* delay *= 2 */	\
> +	"	b	1b			\n"	/* start over */	\
> +	"4: ; --- success ---			\n"				\
> +
> +#define SCOND_FAIL_RETRY_VARS							\
> +	  ,[delay] "+&r" (delay),[tmp] "=&r"	(tmp)				\
> +
> +#define ATOMIC_OP(op, c_op, asm_op)					\
> +static inline void atomic_##op(int i, atomic_t *v)			\
> +{									\
> +	unsigned int val, delay = 1, tmp;				\

Maybe use your SCOND_FAIL_RETRY_VAR_DEF ?

> +									\
> +	__asm__ __volatile__(						\
> +	"1:	llock   %[val], [%[ctr]]		\n"		\
> +	"	" #asm_op " %[val], %[val], %[i]	\n"		\
> +	"	scond   %[val], [%[ctr]]		\n"		\
> +	"						\n"		\
> +	SCOND_FAIL_RETRY_ASM						\
> +									\
> +	: [val]	"=&r"	(val) /* Early clobber to prevent reg reuse */	\
> +	  SCOND_FAIL_RETRY_VARS						\
> +	: [ctr]	"r"	(&v->counter), /* Not "m": llock only supports reg direct addr mode */	\
> +	  [i]	"ir"	(i)						\
> +	: "cc");							\
> +}									\
> +
> +#define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
> +static inline int atomic_##op##_return(int i, atomic_t *v)		\
> +{									\
> +	unsigned int val, delay = 1, tmp;				\

Idem.

> +									\
> +	/*								\
> +	 * Explicit full memory barrier needed before/after as		\
> +	 * LLOCK/SCOND thmeselves don't provide any such semantics	\
> +	 */								\
> +	smp_mb();							\
> +									\
> +	__asm__ __volatile__(						\
> +	"1:	llock   %[val], [%[ctr]]		\n"		\
> +	"	" #asm_op " %[val], %[val], %[i]	\n"		\
> +	"	scond   %[val], [%[ctr]]		\n"		\
> +	"						\n"		\
> +	SCOND_FAIL_RETRY_ASM						\
> +									\
> +	: [val]	"=&r"	(val)						\
> +	  SCOND_FAIL_RETRY_VARS						\
> +	: [ctr]	"r"	(&v->counter),					\
> +	  [i]	"ir"	(i)						\
> +	: "cc");							\
> +									\
> +	smp_mb();							\
> +									\
> +	return val;							\
> +}

> +#define SCOND_FAIL_RETRY_VAR_DEF						\
> +	unsigned int delay, tmp;						\
> +
> +#define SCOND_FAIL_RETRY_ASM							\
> +	"   ; --- scond fail delay ---		\n"				\
> +	"	mov	%[tmp], %[delay]	\n"	/* tmp = delay */	\
> +	"2: 	brne.d	%[tmp], 0, 2b		\n"	/* while (tmp != 0) */	\
> +	"	sub	%[tmp], %[tmp], 1	\n"	/* tmp-- */		\
> +	"	asl	%[delay], %[delay], 1	\n"	/* delay *= 2 */	\
> +	"	b	1b			\n"	/* start over */	\
> +	"					\n"				\
> +	"4: ; --- done ---			\n"				\
> +
> +#define SCOND_FAIL_RETRY_VARS							\
> +	  ,[delay] "=&r" (delay), [tmp] "=&r"	(tmp)				\

This is looking remarkably similar to the previous ones, why not a
shared header?

> +static inline void arch_spin_lock(arch_spinlock_t *lock)
> +{
> +	unsigned int val;
> +	SCOND_FAIL_RETRY_VAR_DEF;
> +
> +	smp_mb();
> +
> +	__asm__ __volatile__(
> +	"0:	mov	%[delay], 1		\n"
> +	"1:	llock	%[val], [%[slock]]	\n"
> +	"	breq	%[val], %[LOCKED], 1b	\n"	/* spin while LOCKED */
> +	"	scond	%[LOCKED], [%[slock]]	\n"	/* acquire */
> +	"	bz	4f			\n"	/* done */
> +	"					\n"
> +	SCOND_FAIL_RETRY_ASM

But,... in the case that macro is empty, the label 4 does not actually
exist. I see no real reason for this to be different from the previous
incarnation either.



* Re: [PATCH 6/6] ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle
  2015-08-03 10:03 ` [PATCH 6/6] ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle Vineet Gupta
@ 2015-08-03 11:43   ` Peter Zijlstra
  2015-08-03 14:40     ` Vineet Gupta
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2015-08-03 11:43 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: lkml, arc-linux-dev

On Mon, Aug 03, 2015 at 03:33:08PM +0530, Vineet Gupta wrote:
> A spin lock could be available momentarily, but the SCOND to actually
> acquire it might still fail due to concurrent update from other core(s).
> To avoid the hardware livelock, the sequence is retried after a "delay" which
> is increased exponentially to get a nice backoff behaviour.
> 
> However, this could cause the delay counter to get to a high value. Thus
> when the next spin cycle restarts, reset the counter back to starting
> value of 1.


Cute.. fwiw, did you look at what Sparc64 does?


* Re: [PATCH 3/6] ARC: LLOCK/SCOND based spin_lock
  2015-08-03 11:29   ` Peter Zijlstra
@ 2015-08-03 11:44     ` Vineet Gupta
  0 siblings, 0 replies; 21+ messages in thread
From: Vineet Gupta @ 2015-08-03 11:44 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: lkml, arc-linux-dev

On Monday 03 August 2015 04:59 PM, Peter Zijlstra wrote:
> On Mon, Aug 03, 2015 at 03:33:05PM +0530, Vineet Gupta wrote:
>> EX causes the cache line to be in Exclusive state and if done
>> concurrently by multiple cores, it keeps bouncing around.
>> In LLOCK/SCOND regime, spinning only involves LLOCK which doesn't
>> change the line state hence better solution.
> Maybe write like:
>
> "The EXchange instruction forces the cacheline into exclusive state
> (because of the modify) and concurrent loops with it will bounce the
> cacheline between the cores
>
> Instead use LLOCK/SCOND to form the test-and-set lock implementation
> since LLOCK can keep the line in shared state."

OK ! Will change this for v2.

> Because it wasn't clear to me what EX meant, and surely a LOAD must
> change the cacheline state if it previously was in exclusive on another
> core. It's just that shared is a whole lot better to spin on than
> exclusive.
>
> Also, since you're using LL/SC now, a slightly more complex lock is
> trivial to implement, might I suggest you look at implementing ticket
> locks next?

Sure !

>> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
>> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
>> ---
>>  arch/arc/include/asm/spinlock.h | 76 +++++++++++++++++++++++++++++++++++++----
>>  1 file changed, 69 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
>> index e1651df6a93d..4f6c90a0a68a 100644
>> --- a/arch/arc/include/asm/spinlock.h
>> +++ b/arch/arc/include/asm/spinlock.h
>> @@ -18,9 +18,68 @@
>>  #define arch_spin_unlock_wait(x) \
>>  	do { while (arch_spin_is_locked(x)) cpu_relax(); } while (0)
>>  
>> +#ifdef CONFIG_ARC_HAS_LLSC
>> +
>> +static inline void arch_spin_lock(arch_spinlock_t *lock)
>> +{
>> +	unsigned int val;
>> +
>> +	smp_mb();
> I'm still puzzled by your need of this one ...

My initial experiment with removing this caused weird jumps in hackbench numbers
and I've not had the chance to investigate it at all - but hopefully after this
series I can go back to it. So for now I'd like to keep it as is...

>> +
>> +	__asm__ __volatile__(
>> +	"1:	llock	%[val], [%[slock]]	\n"
>> +	"	breq	%[val], %[LOCKED], 1b	\n"	/* spin while LOCKED */
>> +	"	scond	%[LOCKED], [%[slock]]	\n"	/* acquire */
>> +	"	bnz	1b			\n"
>> +	"					\n"
>> +	: [val]		"=&r"	(val)
>> +	: [slock]	"r"	(&(lock->slock)),
>> +	  [LOCKED]	"r"	(__ARCH_SPIN_LOCK_LOCKED__)
>> +	: "memory", "cc");
>> +
>> +	smp_mb();
>> +}
>> +
>> +/* 1 - lock taken successfully */
>> +static inline int arch_spin_trylock(arch_spinlock_t *lock)
>> +{
>> +	unsigned int val, got_it = 0;
>> +
>> +	smp_mb();
> Idem.
>
>> +
>> +	__asm__ __volatile__(
>> +	"1:	llock	%[val], [%[slock]]	\n"
>> +	"	breq	%[val], %[LOCKED], 4f	\n"	/* already LOCKED, just bail */
>> +	"	scond	%[LOCKED], [%[slock]]	\n"	/* acquire */
>> +	"	bnz	1b			\n"
>> +	"	mov	%[got_it], 1		\n"
>> +	"4:					\n"
>> +	"					\n"
>> +	: [val]		"=&r"	(val),
>> +	  [got_it]	"+&r"	(got_it)
>> +	: [slock]	"r"	(&(lock->slock)),
>> +	  [LOCKED]	"r"	(__ARCH_SPIN_LOCK_LOCKED__)
>> +	: "memory", "cc");
>> +
>> +	smp_mb();
>> +
>> +	return got_it;
>> +}
>> +
>> +static inline void arch_spin_unlock(arch_spinlock_t *lock)
>> +{
>> +	smp_mb();
>> +
>> +	lock->slock = __ARCH_SPIN_LOCK_UNLOCKED__;
>> +
>> +	smp_mb();
> Idem.
>
>> +}
>> +
>> +#else	/* !CONFIG_ARC_HAS_LLSC */



* Re: [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff
  2015-08-03 10:03 ` [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff Vineet Gupta
  2015-08-03 11:41   ` Peter Zijlstra
@ 2015-08-03 11:50   ` Peter Zijlstra
  2015-08-03 13:02     ` Vineet Gupta
  1 sibling, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2015-08-03 11:50 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: lkml, arc-linux-dev

On Mon, Aug 03, 2015 at 03:33:07PM +0530, Vineet Gupta wrote:
> +#define SCOND_FAIL_RETRY_ASM							\
> +	"	bz	4f			\n"				\
> +	"   ; --- scond fail delay ---		\n"				\
> +	"	mov	%[tmp], %[delay]	\n"	/* tmp = delay */	\
> +	"2: 	brne.d	%[tmp], 0, 2b		\n"	/* while (tmp != 0) */	\
> +	"	sub	%[tmp], %[tmp], 1	\n"	/* tmp-- */		\
> +	"	asl	%[delay], %[delay], 1	\n"	/* delay *= 2 */	\
> +	"	b	1b			\n"	/* start over */	\
> +	"4: ; --- success ---			\n"				\

One more note, you might want a test to handle the case where delay *= 2
overflows and results in a 0.


* Re: [PATCH 4/6] ARC: LLOCK/SCOND based rwlock
  2015-08-03 11:33   ` Peter Zijlstra
@ 2015-08-03 11:51     ` Vineet Gupta
  0 siblings, 0 replies; 21+ messages in thread
From: Vineet Gupta @ 2015-08-03 11:51 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: lkml, arc-linux-dev

On Monday 03 August 2015 05:03 PM, Peter Zijlstra wrote:
> On Mon, Aug 03, 2015 at 03:33:06PM +0530, Vineet Gupta wrote:
>> With LLOCK/SCOND, the rwlock counter can be atomically updated w/o need
>> for a guarding spin lock.
> Maybe re-iterate the exclusive vs shared spin story again.
>
> And aside from the far too many full barriers (again), I was just
> wondering about:
>
>> +static inline void arch_write_unlock(arch_rwlock_t *rw)
>> +{
>> +	unsigned int val;
>> +
>> +	smp_mb();
>> +
>> +	/*
>> +	 * rw->counter = __ARCH_RW_LOCK_UNLOCKED__;
>> +	 */
>> +	__asm__ __volatile__(
>> +	"1:	llock	%[val], [%[rwlock]]	\n"
>> +	"	scond	%[UNLOCKED], [%[rwlock]]\n"
>> +	"	bnz	1b			\n"
>> +	"					\n"
>> +	: [val]		"=&r"	(val)
>> +	: [rwlock]	"r"	(&(rw->counter)),
>> +	  [UNLOCKED]	"r"	(__ARCH_RW_LOCK_UNLOCKED__)
>> +	: "memory", "cc");
>> +
>> +	smp_mb();
>> +}
> Why can't that be a straight store?

Right - that was my overly cautious initial implementation. I did switch the spin
unlock to a regular ST after initial experimenting, but missed this one. I'll take
it out in the next version.

-Vineet


* Re: [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff
  2015-08-03 11:41   ` Peter Zijlstra
@ 2015-08-03 13:01     ` Vineet Gupta
  2015-08-03 13:50       ` Vineet Gupta
  0 siblings, 1 reply; 21+ messages in thread
From: Vineet Gupta @ 2015-08-03 13:01 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: lkml, arc-linux-dev

On Monday 03 August 2015 05:11 PM, Peter Zijlstra wrote:
> On Mon, Aug 03, 2015 at 03:33:07PM +0530, Vineet Gupta wrote:
>> +#define SCOND_FAIL_RETRY_VAR_DEF						\
>> +	unsigned int delay = 1, tmp;						\
>> +
>> +#define SCOND_FAIL_RETRY_ASM							\
>> +	"	bz	4f			\n"				\
>> +	"   ; --- scond fail delay ---		\n"				\
>> +	"	mov	%[tmp], %[delay]	\n"	/* tmp = delay */	\
>> +	"2: 	brne.d	%[tmp], 0, 2b		\n"	/* while (tmp != 0) */	\
>> +	"	sub	%[tmp], %[tmp], 1	\n"	/* tmp-- */		\
>> +	"	asl	%[delay], %[delay], 1	\n"	/* delay *= 2 */	\
>> +	"	b	1b			\n"	/* start over */	\
>> +	"4: ; --- success ---			\n"				\
>> +
>> +#define SCOND_FAIL_RETRY_VARS							\
>> +	  ,[delay] "+&r" (delay),[tmp] "=&r"	(tmp)				\
>> +
>> +#define ATOMIC_OP(op, c_op, asm_op)					\
>> +static inline void atomic_##op(int i, atomic_t *v)			\
>> +{									\
>> +	unsigned int val, delay = 1, tmp;				\
> Maybe use your SCOND_FAIL_RETRY_VAR_DEF ?

Right - not sure how I missed that !

>
>> +									\
>> +	__asm__ __volatile__(						\
>> +	"1:	llock   %[val], [%[ctr]]		\n"		\
>> +	"	" #asm_op " %[val], %[val], %[i]	\n"		\
>> +	"	scond   %[val], [%[ctr]]		\n"		\
>> +	"						\n"		\
>> +	SCOND_FAIL_RETRY_ASM						\
>> +									\
>> +	: [val]	"=&r"	(val) /* Early clobber to prevent reg reuse */	\
>> +	  SCOND_FAIL_RETRY_VARS						\
>> +	: [ctr]	"r"	(&v->counter), /* Not "m": llock only supports reg direct addr mode */	\
>> +	  [i]	"ir"	(i)						\
>> +	: "cc");							\
>> +}									\
>> +
>> +#define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
>> +static inline int atomic_##op##_return(int i, atomic_t *v)		\
>> +{									\
>> +	unsigned int val, delay = 1, tmp;				\
> Idem.

OK !

>> +									\
>> +	/*								\
>> +	 * Explicit full memory barrier needed before/after as		\
>> +	 * LLOCK/SCOND thmeselves don't provide any such semantics	\
>> +	 */								\
>> +	smp_mb();							\
>> +									\
>> +	__asm__ __volatile__(						\
>> +	"1:	llock   %[val], [%[ctr]]		\n"		\
>> +	"	" #asm_op " %[val], %[val], %[i]	\n"		\
>> +	"	scond   %[val], [%[ctr]]		\n"		\
>> +	"						\n"		\
>> +	SCOND_FAIL_RETRY_ASM						\
>> +									\
>> +	: [val]	"=&r"	(val)						\
>> +	  SCOND_FAIL_RETRY_VARS						\
>> +	: [ctr]	"r"	(&v->counter),					\
>> +	  [i]	"ir"	(i)						\
>> +	: "cc");							\
>> +									\
>> +	smp_mb();							\
>> +									\
>> +	return val;							\
>> +}
>> +#define SCOND_FAIL_RETRY_VAR_DEF						\
>> +	unsigned int delay, tmp;						\
>> +
>> +#define SCOND_FAIL_RETRY_ASM							\
>> +	"   ; --- scond fail delay ---		\n"				\
>> +	"	mov	%[tmp], %[delay]	\n"	/* tmp = delay */	\
>> +	"2: 	brne.d	%[tmp], 0, 2b		\n"	/* while (tmp != 0) */	\
>> +	"	sub	%[tmp], %[tmp], 1	\n"	/* tmp-- */		\
>> +	"	asl	%[delay], %[delay], 1	\n"	/* delay *= 2 */	\
>> +	"	b	1b			\n"	/* start over */	\
>> +	"					\n"				\
>> +	"4: ; --- done ---			\n"				\
>> +
>> +#define SCOND_FAIL_RETRY_VARS							\
>> +	  ,[delay] "=&r" (delay), [tmp] "=&r"	(tmp)				\
> This is looking remarkably similar to the previous ones, why not a
> shared header?

I thought about it when duplicating the code - however it seemed that readability
was better if the code was present in the same file, rather than having to look it
up in a different header with no context at all.

Plus there are some subtle differences between the two when looked at closely.
Basically spinlocks need the reset-to-1 quirk which atomics don't, which means we
need the delay reset to 1 in the spinlock inline asm (and a different inline asm
constraint). Plus for atomics, the success branch (bz 4f) is folded away into the
macro, while we can't do that for the trylock routines, as that branch uses a
delay slot. Agreed that all of this is in the micro-optim realm, but I suppose
it's worth it when you have a 10 stage pipeline.


>> +static inline void arch_spin_lock(arch_spinlock_t *lock)
>> +{
>> +	unsigned int val;
>> +	SCOND_FAIL_RETRY_VAR_DEF;
>> +
>> +	smp_mb();
>> +
>> +	__asm__ __volatile__(
>> +	"0:	mov	%[delay], 1		\n"
>> +	"1:	llock	%[val], [%[slock]]	\n"
>> +	"	breq	%[val], %[LOCKED], 1b	\n"	/* spin while LOCKED */
>> +	"	scond	%[LOCKED], [%[slock]]	\n"	/* acquire */
>> +	"	bz	4f			\n"	/* done */
>> +	"					\n"
>> +	SCOND_FAIL_RETRY_ASM
> But,... in the case that macro is empty, the label 4 does not actually
> exist. I see no real reason for this to be different from the previous
> incarnation either.

Per the current code, the macro is never empty. I initially wrote it to have one
version of the routines with different macro definitions, but then it was getting
terribly difficult to follow, so I resorted to duplicating all the routines, with
macros to kind of compensate for the duplication by factoring out the common code
in the duplicated code :-)

For locks, I can again fold the bz into the macro, but then we can't use the
delay slot in the try versions !


* Re: [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff
  2015-08-03 11:50   ` Peter Zijlstra
@ 2015-08-03 13:02     ` Vineet Gupta
  2015-08-03 13:06       ` Peter Zijlstra
  0 siblings, 1 reply; 21+ messages in thread
From: Vineet Gupta @ 2015-08-03 13:02 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: lkml, arc-linux-dev

On Monday 03 August 2015 05:21 PM, Peter Zijlstra wrote:

On Mon, Aug 03, 2015 at 03:33:07PM +0530, Vineet Gupta wrote:


> +#define SCOND_FAIL_RETRY_ASM                                                 \
> +     "       bz      4f                      \n"                             \
> +     "   ; --- scond fail delay ---          \n"                             \
> +     "       mov     %[tmp], %[delay]        \n"     /* tmp = delay */       \
> +     "2:     brne.d  %[tmp], 0, 2b           \n"     /* while (tmp != 0) */  \
> +     "       sub     %[tmp], %[tmp], 1       \n"     /* tmp-- */             \
> +     "       asl     %[delay], %[delay], 1   \n"     /* delay *= 2 */        \
> +     "       b       1b                      \n"     /* start over */        \
> +     "4: ; --- success ---                   \n"                             \


One more note, you might want a test to handle the case where delay *= 2
overflows and results in a 0.


yeah !

- asl %[delay], %[delay], 1
+ asl.f  %[delay], %[delay], 1
+ mov.z %[delay], 1




* Re: [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff
  2015-08-03 13:02     ` Vineet Gupta
@ 2015-08-03 13:06       ` Peter Zijlstra
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Zijlstra @ 2015-08-03 13:06 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: lkml, arc-linux-dev

On Mon, Aug 03, 2015 at 01:02:09PM +0000, Vineet Gupta wrote:
> On Monday 03 August 2015 05:21 PM, Peter Zijlstra wrote:
> 
> On Mon, Aug 03, 2015 at 03:33:07PM +0530, Vineet Gupta wrote:
> 
> 
> > +#define SCOND_FAIL_RETRY_ASM                                                 \
> > +     "       bz      4f                      \n"                             \
> > +     "   ; --- scond fail delay ---          \n"                             \
> > +     "       mov     %[tmp], %[delay]        \n"     /* tmp = delay */       \
> > +     "2:     brne.d  %[tmp], 0, 2b           \n"     /* while (tmp != 0) */  \
> > +     "       sub     %[tmp], %[tmp], 1       \n"     /* tmp-- */             \
> > +     "       asl     %[delay], %[delay], 1   \n"     /* delay *= 2 */        \
> > +     "       b       1b                      \n"     /* start over */        \
> > +     "4: ; --- success ---                   \n"                             \
> 
> 
> One more note, you might want a test to handle the case where delay *= 2
> overflows and results in a 0.
> 
> 
> yeah !
> 
> - asl %[delay], %[delay], 1
> + asl.f  %[delay], %[delay], 1
> + mov.z %[delay], 1

The other obvious option is: 0x80000000 :-)
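
That is, on overflow saturate the delay at the top power of two instead of
restarting at 1; in C terms, a sketch:

	delay <<= 1;
	if (delay == 0)
		delay = 0x80000000;	/* keep backing off at the max power of two */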

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff
  2015-08-03 13:01     ` Vineet Gupta
@ 2015-08-03 13:50       ` Vineet Gupta
  2015-08-03 14:08         ` Peter Zijlstra
  0 siblings, 1 reply; 21+ messages in thread
From: Vineet Gupta @ 2015-08-03 13:50 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: lkml, arc-linux-dev

On Monday 03 August 2015 06:31 PM, Vineet Gupta wrote:
> On Monday 03 August 2015 05:11 PM, Peter Zijlstra wrote:
>> > On Mon, Aug 03, 2015 at 03:33:07PM +0530, Vineet Gupta wrote:
>>> >> +#define SCOND_FAIL_RETRY_VAR_DEF						\
>>> >> +	unsigned int delay = 1, tmp;						\
>>> >> +
>>> >> +#define SCOND_FAIL_RETRY_ASM							\
>>> >> +	"	bz	4f			\n"				\
>>> >> +	"   ; --- scond fail delay ---		\n"				\
>>> >> +	"	mov	%[tmp], %[delay]	\n"	/* tmp = delay */	\
>>> >> +	"2: 	brne.d	%[tmp], 0, 2b		\n"	/* while (tmp != 0) */	\
>>> >> +	"	sub	%[tmp], %[tmp], 1	\n"	/* tmp-- */		\
>>> >> +	"	asl	%[delay], %[delay], 1	\n"	/* delay *= 2 */	\
>>> >> +	"	b	1b			\n"	/* start over */	\
>>> >> +	"4: ; --- success ---			\n"				\
>>> >> +
>>> >> +#define SCOND_FAIL_RETRY_VARS							\
>>> >> +	  ,[delay] "+&r" (delay),[tmp] "=&r"	(tmp)				\
>>> >> +
>>> >> +#define ATOMIC_OP(op, c_op, asm_op)					\
>>> >> +static inline void atomic_##op(int i, atomic_t *v)			\
>>> >> +{									\
>>> >> +	unsigned int val, delay = 1, tmp;				\
>> > Maybe use your SCOND_FAIL_RETRY_VAR_DEF ?
> Right - not sure how I missed that !
>
>> >
>>> >> +									\
>>> >> +	__asm__ __volatile__(						\
>>> >> +	"1:	llock   %[val], [%[ctr]]		\n"		\
>>> >> +	"	" #asm_op " %[val], %[val], %[i]	\n"		\
>>> >> +	"	scond   %[val], [%[ctr]]		\n"		\
>>> >> +	"						\n"		\
>>> >> +	SCOND_FAIL_RETRY_ASM						\
>>> >> +									\
>>> >> +	: [val]	"=&r"	(val) /* Early clobber to prevent reg reuse */	\
>>> >> +	  SCOND_FAIL_RETRY_VARS						\
>>> >> +	: [ctr]	"r"	(&v->counter), /* Not "m": llock only supports reg direct addr mode */	\
>>> >> +	  [i]	"ir"	(i)						\
>>> >> +	: "cc");							\
>>> >> +}									\
>>> >> +
>>> >> +#define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
>>> >> +static inline int atomic_##op##_return(int i, atomic_t *v)		\
>>> >> +{									\
>>> >> +	unsigned int val, delay = 1, tmp;				\
>> > Idem.
> OK !
>
>>> >> +									\
>>> >> +	/*								\
>>> >> +	 * Explicit full memory barrier needed before/after as		\
>>> >> +	 * LLOCK/SCOND thmeselves don't provide any such semantics	\
>>> >> +	 */								\
>>> >> +	smp_mb();							\
>>> >> +									\
>>> >> +	__asm__ __volatile__(						\
>>> >> +	"1:	llock   %[val], [%[ctr]]		\n"		\
>>> >> +	"	" #asm_op " %[val], %[val], %[i]	\n"		\
>>> >> +	"	scond   %[val], [%[ctr]]		\n"		\
>>> >> +	"						\n"		\
>>> >> +	SCOND_FAIL_RETRY_ASM						\
>>> >> +									\
>>> >> +	: [val]	"=&r"	(val)						\
>>> >> +	  SCOND_FAIL_RETRY_VARS						\
>>> >> +	: [ctr]	"r"	(&v->counter),					\
>>> >> +	  [i]	"ir"	(i)						\
>>> >> +	: "cc");							\
>>> >> +									\
>>> >> +	smp_mb();							\
>>> >> +									\
>>> >> +	return val;							\
>>> >> +}
>>> >> +#define SCOND_FAIL_RETRY_VAR_DEF						\
>>> >> +	unsigned int delay, tmp;						\
>>> >> +
>>> >> +#define SCOND_FAIL_RETRY_ASM							\
>>> >> +	"   ; --- scond fail delay ---		\n"				\
>>> >> +	"	mov	%[tmp], %[delay]	\n"	/* tmp = delay */	\
>>> >> +	"2: 	brne.d	%[tmp], 0, 2b		\n"	/* while (tmp != 0) */	\
>>> >> +	"	sub	%[tmp], %[tmp], 1	\n"	/* tmp-- */		\
>>> >> +	"	asl	%[delay], %[delay], 1	\n"	/* delay *= 2 */	\
>>> >> +	"	b	1b			\n"	/* start over */	\
>>> >> +	"					\n"				\
>>> >> +	"4: ; --- done ---			\n"				\
>>> >> +
>>> >> +#define SCOND_FAIL_RETRY_VARS							\
>>> >> +	  ,[delay] "=&r" (delay), [tmp] "=&r"	(tmp)				\
>> > This is looking remarkably similar to the previous ones, why not a
>> > shared header?

On second thoughts, the duplication of the atomic generator macros seems to be
superfluous ...

The version below is much more readable and shorter.

------------>
diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 3dd36c1efee1..c2e012ca4560 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -23,17 +23,50 @@
 
 #define atomic_set(v, i) (((v)->counter) = (i))
 
+#ifdef CONFIG_ARC_STAR_9000923308
+
+#define SCOND_FAIL_RETRY_VAR_DEF						\
+	unsigned int delay = 1, tmp;						\
+
+#define SCOND_FAIL_RETRY_ASM							\
+	"	bz	4f			\n"				\
+	"   ; --- scond fail delay ---		\n"				\
+	"	mov	%[tmp], %[delay]	\n"	/* tmp = delay */	\
+	"2: 	brne.d	%[tmp], 0, 2b		\n"	/* while (tmp != 0) */	\
+	"	sub	%[tmp], %[tmp], 1	\n"	/* tmp-- */		\
+	"	asl	%[delay], %[delay], 1	\n"	/* delay *= 2 */	\
+	"	b	1b			\n"	/* start over */	\
+	"4: ; --- success ---			\n"				\
+
+#define SCOND_FAIL_RETRY_VARS							\
+	  ,[delay] "+&r" (delay),[tmp] "=&r"	(tmp)				\
+
+#else	/* !CONFIG_ARC_STAR_9000923308 */
+
+#define SCOND_FAIL_RETRY_VAR_DEF
+
+#define SCOND_FAIL_RETRY_ASM							\
+	"	bnz     1b			\n"				\
+
+#define SCOND_FAIL_RETRY_VARS
+
+#endif
+
 #define ATOMIC_OP(op, c_op, asm_op)					\
 static inline void atomic_##op(int i, atomic_t *v)			\
 {									\
-	unsigned int val;						\
+	unsigned int val;						\
+	SCOND_FAIL_RETRY_VAR_DEF					\
 									\
 	__asm__ __volatile__(						\
 	"1:	llock   %[val], [%[ctr]]		\n"		\
 	"	" #asm_op " %[val], %[val], %[i]	\n"		\
 	"	scond   %[val], [%[ctr]]		\n"		\
-	"	bnz     1b				\n"		\
+	"						\n"		\
+	SCOND_FAIL_RETRY_ASM						\
+									\
 	: [val]	"=&r"	(val) /* Early clobber to prevent reg reuse */	\
+	  SCOND_FAIL_RETRY_VARS						\
 	: [ctr]	"r"	(&v->counter), /* Not "m": llock only supports reg direct addr mode */	\
 	  [i]	"ir"	(i)						\
 	: "cc");							\
@@ -42,7 +75,8 @@ static inline void atomic_##op(int i, atomic_t *v)		\
 #define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
 static inline int atomic_##op##_return(int i, atomic_t *v)		\
 {									\
-	unsigned int val;						\
+	unsigned int val;						\
+	SCOND_FAIL_RETRY_VAR_DEF					\
 									\
 	/*								\
 	 * Explicit full memory barrier needed before/after as		\
@@ -54,8 +88,11 @@ static inline int atomic_##op##_return(int i, atomic_t *v)	\
 	"1:	llock   %[val], [%[ctr]]		\n"		\
 	"	" #asm_op " %[val], %[val], %[i]	\n"		\
 	"	scond   %[val], [%[ctr]]		\n"		\
-	"	bnz     1b				\n"		\
+	"						\n"		\
+	SCOND_FAIL_RETRY_ASM						\
+									\
 	: [val]	"=&r"	(val)						\
+	  SCOND_FAIL_RETRY_VARS						\
 	: [ctr]	"r"	(&v->counter),					\
 	  [i]	"ir"	(i)						\
 	: "cc");							\
@@ -142,6 +179,9 @@ ATOMIC_OP(and, &=, and)
 #undef ATOMIC_OPS
 #undef ATOMIC_OP_RETURN
 #undef ATOMIC_OP
+#undef SCOND_FAIL_RETRY_VAR_DEF
+#undef SCOND_FAIL_RETRY_ASM
+#undef SCOND_FAIL_RETRY_VARS
 
 /**
  * __atomic_add_unless - add unless the number is a given value
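
With CONFIG_ARC_STAR_9000923308 enabled, the generated atomic op then behaves
roughly like this C-only sketch (add shown as the example op;
__sync_bool_compare_and_swap() merely stands in for the LLOCK/SCOND pair and is
not what the kernel uses):

	static inline void atomic_add_backoff_sketch(int i, volatile int *counter)
	{
		unsigned int delay = 1, tmp;	/* SCOND_FAIL_RETRY_VAR_DEF */
		int old;

		for (;;) {
			old = *counter;				/* llock */
			if (__sync_bool_compare_and_swap(counter, old, old + i))
				return;				/* scond ok: "bz 4f" */
			for (tmp = delay; tmp; tmp--)		/* brne.d/sub delay loop */
				;
			delay *= 2;				/* asl: delay *= 2 */
		}
	}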






^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff
  2015-08-03 13:50       ` Vineet Gupta
@ 2015-08-03 14:08         ` Peter Zijlstra
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Zijlstra @ 2015-08-03 14:08 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: lkml, arc-linux-dev

On Mon, Aug 03, 2015 at 01:50:01PM +0000, Vineet Gupta wrote:
> diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
> index 3dd36c1efee1..c2e012ca4560 100644
> --- a/arch/arc/include/asm/atomic.h
> +++ b/arch/arc/include/asm/atomic.h
> @@ -23,17 +23,50 @@
>  
>  #define atomic_set(v, i) (((v)->counter) = (i))
>  
> +#ifdef CONFIG_ARC_STAR_9000923308
> +
> +#define SCOND_FAIL_RETRY_VAR_DEF						\
> +	unsigned int delay = 1, tmp;						\
> +
> +#define SCOND_FAIL_RETRY_ASM							\
> +	"	bz	4f			\n"				\
> +	"   ; --- scond fail delay ---		\n"				\
> +	"	mov	%[tmp], %[delay]	\n"	/* tmp = delay */	\
> +	"2: 	brne.d	%[tmp], 0, 2b		\n"	/* while (tmp != 0) */	\
> +	"	sub	%[tmp], %[tmp], 1	\n"	/* tmp-- */		\
> +	"	asl	%[delay], %[delay], 1	\n"	/* delay *= 2 */	\

You forgot the overflow test .. :-)

> +	"	b	1b			\n"	/* start over */	\
> +	"4: ; --- success ---			\n"				\
> +
> +#define SCOND_FAIL_RETRY_VARS							\
> +	  ,[delay] "+&r" (delay),[tmp] "=&r"	(tmp)				\
> +
> +#else	/* !CONFIG_ARC_STAR_9000923308 */
> +
> +#define SCOND_FAIL_RETRY_VAR_DEF
> +
> +#define SCOND_FAIL_RETRY_ASM							\
> +	"	bnz     1b			\n"				\
> +
> +#define SCOND_FAIL_RETRY_VARS
> +
> +#endif

But yes, much better.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/6] ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle
  2015-08-03 11:43   ` Peter Zijlstra
@ 2015-08-03 14:40     ` Vineet Gupta
  2015-08-03 14:42       ` Peter Zijlstra
  0 siblings, 1 reply; 21+ messages in thread
From: Vineet Gupta @ 2015-08-03 14:40 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: lkml, arc-linux-dev

On Monday 03 August 2015 05:14 PM, Peter Zijlstra wrote:
> On Mon, Aug 03, 2015 at 03:33:08PM +0530, Vineet Gupta wrote:
>> > A spin lock could be available momentarily, but the SCOND to actually
>> > acquire it might still fail due to concurrent update from other core(s).
>> > To elide hardware lock, the sequence is retried after a "delay" which is
>> > increased exponentially to get a nice backoff behaviour.
>> > 
>> > However, this could cause the delay counter to get to a high value. Thus
>> > when the next spin cycle restarts, reset the counter back to starting
>> > value of 1.
> Cute.. fwiw, did you look at what Sparc64 does?
>

Can't really comprehend what's special there - are you referring to the special
branching, or to the out-of-line slow path code for the 32-bit version?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/6] ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle
  2015-08-03 14:40     ` Vineet Gupta
@ 2015-08-03 14:42       ` Peter Zijlstra
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Zijlstra @ 2015-08-03 14:42 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: lkml, arc-linux-dev

On Mon, Aug 03, 2015 at 02:40:13PM +0000, Vineet Gupta wrote:
> On Monday 03 August 2015 05:14 PM, Peter Zijlstra wrote:
> > On Mon, Aug 03, 2015 at 03:33:08PM +0530, Vineet Gupta wrote:
> >> > A spin lock could be available momentarily, but the SCOND to actually
> >> > acquire it might still fail due to concurrent update from other core(s).
> >> > To elide hardware lock, the sequence is retried after a "delay" which is
> >> > increased exponentially to get a nice backoff behaviour.
> >> > 
> >> > However, this could cause the delay counter to get to a high value. Thus
> >> > when the next spin cycle restarts, reset the counter back to starting
> >> > value of 1.
> > Cute.. fwiw, did you look at what Sparc64 does?
> >
> 
> Can't really comprehend what's special there - are you referring to the special
> branching, or to the out-of-line slow path code for the 32-bit version?

It's the only arch that I can remember seeing decay logic in.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2015-08-03 14:42 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-03 10:03 [PATCH 0/6] ARC: spinlocks/atomics rework Vineet Gupta
2015-08-03 10:03 ` [PATCH 1/6] Revert "ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock" Vineet Gupta
2015-08-03 10:03 ` [PATCH 2/6] ARC: refactor atomic inline asm operands with symbolic names Vineet Gupta
2015-08-03 10:03 ` [PATCH 3/6] ARC: LLOCK/SCOND based spin_lock Vineet Gupta
2015-08-03 11:29   ` Peter Zijlstra
2015-08-03 11:44     ` Vineet Gupta
2015-08-03 10:03 ` [PATCH 4/6] ARC: LLOCK/SCOND based rwlock Vineet Gupta
2015-08-03 11:33   ` Peter Zijlstra
2015-08-03 11:51     ` Vineet Gupta
2015-08-03 10:03 ` [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff Vineet Gupta
2015-08-03 11:41   ` Peter Zijlstra
2015-08-03 13:01     ` Vineet Gupta
2015-08-03 13:50       ` Vineet Gupta
2015-08-03 14:08         ` Peter Zijlstra
2015-08-03 11:50   ` Peter Zijlstra
2015-08-03 13:02     ` Vineet Gupta
2015-08-03 13:06       ` Peter Zijlstra
2015-08-03 10:03 ` [PATCH 6/6] ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle Vineet Gupta
2015-08-03 11:43   ` Peter Zijlstra
2015-08-03 14:40     ` Vineet Gupta
2015-08-03 14:42       ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).