* Question regarding atomic ops
@ 2017-10-10  2:24 Joshua Kinard
  2017-10-10  2:34 ` Joshua Kinard
  0 siblings, 1 reply; 8+ messages in thread
From: Joshua Kinard @ 2017-10-10  2:24 UTC (permalink / raw)
  To: Linux/MIPS

Regarding the issues with IP27, one thing I notice a lot when I
disassemble an address found in epc or ErrEPC of a "stuck" kernel is that
I keep getting directed back to arch/mips/include/asm/atomic.h:181:

#define ATOMIC_OPS(op, c_op, asm_op)					      \
	ATOMIC_OP(op, c_op, asm_op)					      \
	ATOMIC_OP_RETURN(op, c_op, asm_op)				      \
	ATOMIC_FETCH_OP(op, c_op, asm_op)

ATOMIC_OPS(add, +=, addu)   <-- HERE
ATOMIC_OPS(sub, -=, subu)


So looking at the code, what stands out to me is that the "(kernel_uses_llsc &&
R10000_LLSC_WAR)" inline asm code:

	if (kernel_uses_llsc && R10000_LLSC_WAR) {			      \
		int temp;						      \
									      \
		__asm__ __volatile__(					      \
		"	.set	arch=r4000				\n"   \
		"1:	ll	%0, %1		# atomic_" #op "	\n"   \
		"	" #asm_op " %0, %2				\n"   \
		"	sc	%0, %1					\n"   \
		"	beqzl	%0, 1b					\n"   \
		"	.set	mips0					\n"   \
		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)	      \
		: "Ir" (i));						      \


is substantially different from the standard "kernel_uses_llsc" inline asm:

	} else if (kernel_uses_llsc) {					      \
		int temp;						      \
									      \
		do {							      \
			__asm__ __volatile__(				      \
			"	.set	"MIPS_ISA_LEVEL"		\n"   \
			"	ll	%0, %1		# atomic_" #op "\n"   \
			"	" #asm_op " %0, %2			\n"   \
			"	sc	%0, %1				\n"   \
			"	.set	mips0				\n"   \
			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)  \
			: "Ir" (i));					      \
		} while (unlikely(!temp));				      \


(Above is from "atomic_##op" -> #define ATOMIC_OP, starting on line 44 in
current git)


My understanding of what R10000_LLSC_WAR handles, in most cases, is the use of
a "beqzl" instruction over "beqz", due to a silicon bug in earlier R10000 CPUs.
 R10K CPUs with silicon rev >~3.0, R12K, R14K, and R16K are all unaffected and
should be able to safely use the non-R10000_LLSC_WAR branch.  Current upstream
however, does not distinguish between different members of the R10K family,
thus it forces ALL R10K CPUs to take the R10000_LLSC_WAR path.

I've got a patch that splits R10000 support up into plain "R10000" and then
"R12K/R14K/R16K", with the latter case //disabling// the R10000_LLSC_WAR flag.
Thus, because of this change, on my systems, I am executing the standard
"kernel_uses_llsc" inline asm code and this newer code probably does not play
very nicely on these older CPUs.

Checking through a couple of git logs, it looks like development for the later
MIPS ISAs (R2+) on the newer CPUs has been tweaking the atomic ops in the
standard "kernel_uses_llsc" case, while ignoring the R10000_LLSC_WAR block
entirely.  I suspect this is an attempt to avoid breaking what is probably
assumed to be working code for systems few people have access to.

Does this sound accurate?

I found that the first of these changes occurred almost 7 years ago this month
between 2.6.36 and 2.6.37 w/ commit 7837314d141c:

https://git.linux-mips.org/cgit/ralf/linux.git/commit/arch/mips/include/asm/atomic.h?id=7837314d141c661c70bc13c5050694413ecfe14a

This raises the question of why was the standard "kernel_uses_llsc" case
changed but not the R10000_LLSC_WAR case?  The changes seem like they would be
applicable to the older R10K CPUs regardless, since this is before a lot of the
code for the newer ISAs (R2+) was added.  I am getting a funny feeling that a
lot of these templates need to be re-written (maybe even in plain C, given
newer gcc's better intelligence) and other useful cleanups done.  I am not
fluent in MIPS asm enough, though, to know what to change.

I'm going to experiment with backing out some of the more recent changes
specific to the newer ISAs/CPUs and set it up so that the main difference
between the R10000_LLSC_WAR case and the standard case is just plain "beqzl"
versus "beqz" and see if this makes my issues on IP27 go away.

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Question regarding atomic ops
  2017-10-10  2:24 Question regarding atomic ops Joshua Kinard
@ 2017-10-10  2:34 ` Joshua Kinard
  2017-10-10 14:23   ` Ralf Baechle
  0 siblings, 1 reply; 8+ messages in thread
From: Joshua Kinard @ 2017-10-10  2:34 UTC (permalink / raw)
  To: linux-mips; +Cc: Ralf Baechle

On 10/09/2017 22:24, Joshua Kinard wrote:

[snip]

> This raises the question of why was the standard "kernel_uses_llsc" case
> changed but not the R10000_LLSC_WAR case?  The changes seem like they would be
> applicable to the older R10K CPUs regardless, since this is before a lot of the
> code for the newer ISAs (R2+) was added.  I am getting a funny feeling that a
> lot of these templates need to be re-written (maybe even in plain C, given
> newer gcc's better intelligence) and other useful cleanups done.  I am not
> fluent in MIPS asm enough, though, to know what to change.

Answered one of my own questions via this buried commit from ~2006/2007 that
has a commit message, but no changed files:

https://git.linux-mips.org/cgit/ralf/linux.git/commit/arch/mips/include/asm/atomic.h?id=5999eca25c1fd4b9b9aca7833b04d10fe4bc877d

> [MIPS] Improve branch prediction in ll/sc atomic operations.
> Now that finally all supported versions of binutils have functioning
> support for .subsection use .subsection to tweak the branch prediction
> 
> I did not modify the R10000 errata variants because it seems unclear if
> this will invalidate the workaround which actually relies on the cheesy
> prediction of branch likely to cause a misspredict if the sc was
> successful.
> 
> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

Seems like that second paragraph is a ripe candidate for a comment block so
this is better documented :)

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic


* Re: Question regarding atomic ops
  2017-10-10  2:34 ` Joshua Kinard
@ 2017-10-10 14:23   ` Ralf Baechle
  2017-10-11 13:15       ` Joshua Kinard
  2017-10-16  5:51     ` Joshua Kinard
  0 siblings, 2 replies; 8+ messages in thread
From: Ralf Baechle @ 2017-10-10 14:23 UTC (permalink / raw)
  To: Joshua Kinard; +Cc: linux-mips

On Mon, Oct 09, 2017 at 10:34:43PM -0400, Joshua Kinard wrote:

> On 10/09/2017 22:24, Joshua Kinard wrote:
> 
> [snip]
> 
> > This raises the question of why was the standard "kernel_uses_llsc" case
> > changed but not the R10000_LLSC_WAR case?  The changes seem like they would be
> > applicable to the older R10K CPUs regardless, since this is before a lot of the
> > code for the newer ISAs (R2+) was added.  I am getting a funny feeling that a
> > lot of these templates need to be re-written (maybe even in plain C, given
> > newer gcc's better intelligence) and other useful cleanups done.  I am not
> > fluent in MIPS asm enough, though, to know what to change.
> 
> Answered one of my own questions via this buried commit from ~2006/2007 that
> has a commit message, but no changed files:
> 
> https://git.linux-mips.org/cgit/ralf/linux.git/commit/arch/mips/include/asm/atomic.h?id=5999eca25c1fd4b9b9aca7833b04d10fe4bc877d
> 
> > [MIPS] Improve branch prediction in ll/sc atomic operations.
> > Now that finally all supported versions of binutils have functioning
> > support for .subsection use .subsection to tweak the branch prediction
> > 
> > I did not modify the R10000 errata variants because it seems unclear if
> > this will invalidate the workaround which actually relies on the cheesy
> > prediction of branch likely to cause a misspredict if the sc was
> > successful.
> > 
> > Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
> 
> Seems like that second paragraph is a ripe candidate for a comment block so
> this is better documented :)

Btw, I'm reasonably certain applying the change to the R10000 LL/SC workaround
versions as well would work.  But testing is difficult, even with hardware
at hand - and the other option, asking an R10000 RTL designer, is tricky about
20 years later!

  Ralf


* Re: Question regarding atomic ops
@ 2017-10-11 13:15       ` Joshua Kinard
  0 siblings, 0 replies; 8+ messages in thread
From: Joshua Kinard @ 2017-10-11 13:15 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips

On 10/10/2017 10:23, Ralf Baechle wrote:
> On Mon, Oct 09, 2017 at 10:34:43PM -0400, Joshua Kinard wrote:
> 
>> On 10/09/2017 22:24, Joshua Kinard wrote:
>>
>> [snip]
>>
>>> This raises the question of why was the standard "kernel_uses_llsc" case
>>> changed but not the R10000_LLSC_WAR case?  The changes seem like they would be
>>> applicable to the older R10K CPUs regardless, since this is before a lot of the
>>> code for the newer ISAs (R2+) was added.  I am getting a funny feeling that a
>>> lot of these templates need to be re-written (maybe even in plain C, given
>>> newer gcc's better intelligence) and other useful cleanups done.  I am not
>>> fluent in MIPS asm enough, though, to know what to change.
>>
>> Answered one of my own questions via this buried commit from ~2006/2007 that
>> has a commit message, but no changed files:
>>
>> https://git.linux-mips.org/cgit/ralf/linux.git/commit/arch/mips/include/asm/atomic.h?id=5999eca25c1fd4b9b9aca7833b04d10fe4bc877d
>>
>>> [MIPS] Improve branch prediction in ll/sc atomic operations.
>>> Now that finally all supported versions of binutils have functioning
>>> support for .subsection use .subsection to tweak the branch prediction
>>>
>>> I did not modify the R10000 errata variants because it seems unclear if
>>> this will invalidate the workaround which actually relies on the cheesy
>>> prediction of branch likely to cause a misspredict if the sc was
>>> successful.
>>>
>>> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
>>
>> Seems like that second paragraph is a ripe candidate for a comment block so
>> this is better documented :)
> 
> Btw, I'm reasonably certain applying the change to the R10000 LL/SC workaround
> versions as well would work.  But testing is difficult, even with hardware
> at hand - and the other option, asking an R10000 RTL designer, is tricky about
> 20 years later!

Okay, I'll see about playing with that file and collapsing it down in size
somewhat by merging the standard and R10000_LLSC_WAR cases, similar to the way
arch/mips/include/asm/cmpxchg.h was done using the '__scbeqz' macro.  I
don't have any older R10000 CPUs affected by the errata to test with, but I can
at least test the code path under R12K+ to make sure things still work.

---

For the RCU stalls, I found this tidbit of information:
o	Booting Linux using a console connection that is too slow to
	keep up with the boot-time console-message rate.  For example,
	a 115Kbaud serial console can be -way- too slow to keep up
	with boot-time message rates, and will frequently result in
	RCU CPU stall warning messages.  Especially if you have added
	debug printk()s.

From here:
https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt

By default, IP27 runs at 9600bps on the serial console.  The early PROM
messages are hardcoded for that linespeed, which is why I leave it at 9600.
Under the IOC3 metadriver (and I believe even the stock serial code in
ioc3-eth.c), we're relying on a really simple (and crappy) non-IRQ polling
routine buried deep in the 8250 serial core to read the serial console state,
and so that probably adds to the problems.

Under the metadriver, the DMA-capable ioc3_serial.c driver built originally for
Altix will get detected if you build it in, but I've never been able to get any
output out of it on boot.  I think nyef tried as well in his early IP35 port
and didn't have much luck either.  Or maybe he did....I forget.  I guess that's
something to look into so we can get away from the polling mechanism once and
for all.

If that is the source of my RCU stalls, I'll have to try booting w/o serial at
all and log into the machine by ssh and see the timestamps in dmesg.  If they
stay under ~45 seconds, that should confirm things.  It'd also rule out any
other bugs elsewhere in the newer code I've written.  Which means Origin 200
should run right again.  Onyx2 + NUMA, OTOH, still has something goofy with it
as far as I know.

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic



* Re: Question regarding atomic ops
  2017-10-10 14:23   ` Ralf Baechle
  2017-10-11 13:15       ` Joshua Kinard
@ 2017-10-16  5:51     ` Joshua Kinard
  2017-10-25  7:59         ` James Hogan
  1 sibling, 1 reply; 8+ messages in thread
From: Joshua Kinard @ 2017-10-16  5:51 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips

On 10/10/2017 10:23, Ralf Baechle wrote:
> On Mon, Oct 09, 2017 at 10:34:43PM -0400, Joshua Kinard wrote:
> 
>> On 10/09/2017 22:24, Joshua Kinard wrote:
>>
>> [snip]
>>
>>> This raises the question of why was the standard "kernel_uses_llsc" case
>>> changed but not the R10000_LLSC_WAR case?  The changes seem like they would be
>>> applicable to the older R10K CPUs regardless, since this is before a lot of the
>>> code for the newer ISAs (R2+) was added.  I am getting a funny feeling that a
>>> lot of these templates need to be re-written (maybe even in plain C, given
>>> newer gcc's better intelligence) and other useful cleanups done.  I am not
>>> fluent in MIPS asm enough, though, to know what to change.
>>
>> Answered one of my own questions via this buried commit from ~2006/2007 that
>> has a commit message, but no changed files:
>>
>> https://git.linux-mips.org/cgit/ralf/linux.git/commit/arch/mips/include/asm/atomic.h?id=5999eca25c1fd4b9b9aca7833b04d10fe4bc877d
>>
>>> [MIPS] Improve branch prediction in ll/sc atomic operations.
>>> Now that finally all supported versions of binutils have functioning
>>> support for .subsection use .subsection to tweak the branch prediction
>>>
>>> I did not modify the R10000 errata variants because it seems unclear if
>>> this will invalidate the workaround which actually relies on the cheesy
>>> prediction of branch likely to cause a misspredict if the sc was
>>> successful.
>>>
>>> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
>>
>> Seems like that second paragraph is a ripe candidate for a comment block so
>> this is better documented :)
> 
> Btw, I'm reasonably certain applying the change to the R10000 LL/SC workaround
> versions as well would work.  But testing is difficult, even with hardware
> at hand - and the other option, asking an R10000 RTL designer, is tricky about
> 20 years later!

Inlined below is a first draft to clean up atomic.h, following Paul Burton's
patches for the cmpxchg changes he sent in a while back.  If this and his
patches both get accepted, some of the changes can probably be deduped
(especially moving the definition of __scbeqz to a common header and giving it
a more descriptive name).

The only uncertainty I have is the bottom of atomic_sub_if_positive and
atomic64_sub_if_positive.  In the R10000_LLSC_WAR case, the end of the assembler is:

	  "+" GCC_OFF_SMALL_ASM() (v->counter)
	: "Ir" (i), GCC_OFF_SMALL_ASM() (v->counter)
	: "memory");

While the standard case is:

	: "=&r" (result), "=&r" (temp),
	  "+" GCC_OFF_SMALL_ASM() (v->counter)
	: "Ir" (i));

Other than that, both asm functions are virtually the same.  I am not fluent
enough in asm or gcc's inline asm magic to decipher this, so I am not certain
which is "better", or if one even is better than the other.  In the draft
patch, I kept the variant from the R10000_LLSC_WAR case.  If someone can
explain what differs between these two sections, that'd be helpful for
determining whether I guessed correctly.

I'll send a proper/final patch in a few days, based on any feedback.

Thanks!


Draft:

 arch/mips/include/asm/atomic.h |  187 +++++--------------------------
 1 file changed, 35 insertions(+), 152 deletions(-)

---
diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index 0ab176bdb8e8..404e64e57a72 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -22,6 +22,17 @@
 #include <asm/cmpxchg.h>
 #include <asm/war.h>
 
+/*
+ * Using a branch-likely instruction to check the result of an sc instruction
+ * works around a bug present in R10000 CPUs prior to revision 3.0 that could
+ * cause ll-sc sequences to execute non-atomically.
+ */
+#if R10000_LLSC_WAR
+# define __scbeqz "beqzl"
+#else
+# define __scbeqz "beqz"
+#endif
+
 #define ATOMIC_INIT(i)	  { (i) }
 
 /*
@@ -44,31 +55,18 @@
 #define ATOMIC_OP(op, c_op, asm_op)					      \
 static __inline__ void atomic_##op(int i, atomic_t * v)			      \
 {									      \
-	if (kernel_uses_llsc && R10000_LLSC_WAR) {			      \
+	if (kernel_uses_llsc) {						      \
 		int temp;						      \
 									      \
 		__asm__ __volatile__(					      \
-		"	.set	arch=r4000				\n"   \
+		"	.set	"MIPS_ISA_LEVEL"			\n"   \
 		"1:	ll	%0, %1		# atomic_" #op "	\n"   \
 		"	" #asm_op " %0, %2				\n"   \
 		"	sc	%0, %1					\n"   \
-		"	beqzl	%0, 1b					\n"   \
+		"\t" __scbeqz "	%0, 1b					\n"   \
 		"	.set	mips0					\n"   \
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)	      \
 		: "Ir" (i));						      \
-	} else if (kernel_uses_llsc) {					      \
-		int temp;						      \
-									      \
-		do {							      \
-			__asm__ __volatile__(				      \
-			"	.set	"MIPS_ISA_LEVEL"		\n"   \
-			"	ll	%0, %1		# atomic_" #op "\n"   \
-			"	" #asm_op " %0, %2			\n"   \
-			"	sc	%0, %1				\n"   \
-			"	.set	mips0				\n"   \
-			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)  \
-			: "Ir" (i));					      \
-		} while (unlikely(!temp));				      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -83,36 +81,20 @@ static __inline__ int atomic_##op##_return_relaxed(int i, atomic_t * v)	      \
 {									      \
 	int result;							      \
 									      \
-	if (kernel_uses_llsc && R10000_LLSC_WAR) {			      \
+	if (kernel_uses_llsc) {						      \
 		int temp;						      \
 									      \
 		__asm__ __volatile__(					      \
-		"	.set	arch=r4000				\n"   \
+		"	.set	"MIPS_ISA_LEVEL"			\n"   \
 		"1:	ll	%1, %2		# atomic_" #op "_return	\n"   \
 		"	" #asm_op " %0, %1, %3				\n"   \
 		"	sc	%0, %2					\n"   \
-		"	beqzl	%0, 1b					\n"   \
+		"\t" __scbeqz "	%0, 1b					\n"   \
 		"	" #asm_op " %0, %1, %3				\n"   \
 		"	.set	mips0					\n"   \
 		: "=&r" (result), "=&r" (temp),				      \
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)			      \
 		: "Ir" (i));						      \
-	} else if (kernel_uses_llsc) {					      \
-		int temp;						      \
-									      \
-		do {							      \
-			__asm__ __volatile__(				      \
-			"	.set	"MIPS_ISA_LEVEL"		\n"   \
-			"	ll	%1, %2	# atomic_" #op "_return	\n"   \
-			"	" #asm_op " %0, %1, %3			\n"   \
-			"	sc	%0, %2				\n"   \
-			"	.set	mips0				\n"   \
-			: "=&r" (result), "=&r" (temp),			      \
-			  "+" GCC_OFF_SMALL_ASM() (v->counter)		      \
-			: "Ir" (i));					      \
-		} while (unlikely(!result));				      \
-									      \
-		result = temp; result c_op i;				      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -131,36 +113,20 @@ static __inline__ int atomic_fetch_##op##_relaxed(int i, atomic_t * v)	      \
 {									      \
 	int result;							      \
 									      \
-	if (kernel_uses_llsc && R10000_LLSC_WAR) {			      \
+	if (kernel_uses_llsc) {						      \
 		int temp;						      \
 									      \
 		__asm__ __volatile__(					      \
-		"	.set	arch=r4000				\n"   \
+		"	.set	"MIPS_ISA_LEVEL"			\n"   \
 		"1:	ll	%1, %2		# atomic_fetch_" #op "	\n"   \
 		"	" #asm_op " %0, %1, %3				\n"   \
 		"	sc	%0, %2					\n"   \
-		"	beqzl	%0, 1b					\n"   \
+		"\t" __scbeqz "	%0, 1b					\n"   \
 		"	move	%0, %1					\n"   \
 		"	.set	mips0					\n"   \
 		: "=&r" (result), "=&r" (temp),				      \
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)			      \
 		: "Ir" (i));						      \
-	} else if (kernel_uses_llsc) {					      \
-		int temp;						      \
-									      \
-		do {							      \
-			__asm__ __volatile__(				      \
-			"	.set	"MIPS_ISA_LEVEL"		\n"   \
-			"	ll	%1, %2	# atomic_fetch_" #op "	\n"   \
-			"	" #asm_op " %0, %1, %3			\n"   \
-			"	sc	%0, %2				\n"   \
-			"	.set	mips0				\n"   \
-			: "=&r" (result), "=&r" (temp),			      \
-			  "+" GCC_OFF_SMALL_ASM() (v->counter)		      \
-			: "Ir" (i));					      \
-		} while (unlikely(!result));				      \
-									      \
-		result = temp;						      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -218,17 +184,17 @@ static __inline__ int atomic_sub_if_positive(int i, atomic_t * v)
 
 	smp_mb__before_llsc();
 
-	if (kernel_uses_llsc && R10000_LLSC_WAR) {
+	if (kernel_uses_llsc) {
 		int temp;
 
 		__asm__ __volatile__(
-		"	.set	arch=r4000				\n"
+		"	.set	"MIPS_ISA_LEVEL"			\n"
 		"1:	ll	%1, %2		# atomic_sub_if_positive\n"
 		"	subu	%0, %1, %3				\n"
 		"	bltz	%0, 1f					\n"
 		"	sc	%0, %2					\n"
 		"	.set	noreorder				\n"
-		"	beqzl	%0, 1b					\n"
+		"\t" __scbeqz "	%0, 1b					\n"
 		"	 subu	%0, %1, %3				\n"
 		"	.set	reorder					\n"
 		"1:							\n"
@@ -237,24 +203,6 @@ static __inline__ int atomic_sub_if_positive(int i, atomic_t * v)
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)
 		: "Ir" (i), GCC_OFF_SMALL_ASM() (v->counter)
 		: "memory");
-	} else if (kernel_uses_llsc) {
-		int temp;
-
-		__asm__ __volatile__(
-		"	.set	"MIPS_ISA_LEVEL"			\n"
-		"1:	ll	%1, %2		# atomic_sub_if_positive\n"
-		"	subu	%0, %1, %3				\n"
-		"	bltz	%0, 1f					\n"
-		"	sc	%0, %2					\n"
-		"	.set	noreorder				\n"
-		"	beqz	%0, 1b					\n"
-		"	 subu	%0, %1, %3				\n"
-		"	.set	reorder					\n"
-		"1:							\n"
-		"	.set	mips0					\n"
-		: "=&r" (result), "=&r" (temp),
-		  "+" GCC_OFF_SMALL_ASM() (v->counter)
-		: "Ir" (i));
 	} else {
 		unsigned long flags;
 
@@ -386,31 +334,18 @@ static __inline__ int __atomic_add_unless(atomic_t *v, int a, int u)
 #define ATOMIC64_OP(op, c_op, asm_op)					      \
 static __inline__ void atomic64_##op(long i, atomic64_t * v)		      \
 {									      \
-	if (kernel_uses_llsc && R10000_LLSC_WAR) {			      \
+	if (kernel_uses_llsc) {						      \
 		long temp;						      \
 									      \
 		__asm__ __volatile__(					      \
-		"	.set	arch=r4000				\n"   \
+		"	.set	"MIPS_ISA_LEVEL"			\n"   \
 		"1:	lld	%0, %1		# atomic64_" #op "	\n"   \
 		"	" #asm_op " %0, %2				\n"   \
 		"	scd	%0, %1					\n"   \
-		"	beqzl	%0, 1b					\n"   \
+		"\t" __scbeqz "	%0, 1b					\n"   \
 		"	.set	mips0					\n"   \
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)	      \
 		: "Ir" (i));						      \
-	} else if (kernel_uses_llsc) {					      \
-		long temp;						      \
-									      \
-		do {							      \
-			__asm__ __volatile__(				      \
-			"	.set	"MIPS_ISA_LEVEL"		\n"   \
-			"	lld	%0, %1		# atomic64_" #op "\n" \
-			"	" #asm_op " %0, %2			\n"   \
-			"	scd	%0, %1				\n"   \
-			"	.set	mips0				\n"   \
-			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)      \
-			: "Ir" (i));					      \
-		} while (unlikely(!temp));				      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -425,37 +360,20 @@ static __inline__ long atomic64_##op##_return_relaxed(long i, atomic64_t * v) \
 {									      \
 	long result;							      \
 									      \
-	if (kernel_uses_llsc && R10000_LLSC_WAR) {			      \
+	if (kernel_uses_llsc) {						      \
 		long temp;						      \
 									      \
 		__asm__ __volatile__(					      \
-		"	.set	arch=r4000				\n"   \
+		"	.set	"MIPS_ISA_LEVEL"			\n"   \
 		"1:	lld	%1, %2		# atomic64_" #op "_return\n"  \
 		"	" #asm_op " %0, %1, %3				\n"   \
 		"	scd	%0, %2					\n"   \
-		"	beqzl	%0, 1b					\n"   \
+		"\t" __scbeqz "	%0, 1b					\n"   \
 		"	" #asm_op " %0, %1, %3				\n"   \
 		"	.set	mips0					\n"   \
 		: "=&r" (result), "=&r" (temp),				      \
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)			      \
 		: "Ir" (i));						      \
-	} else if (kernel_uses_llsc) {					      \
-		long temp;						      \
-									      \
-		do {							      \
-			__asm__ __volatile__(				      \
-			"	.set	"MIPS_ISA_LEVEL"		\n"   \
-			"	lld	%1, %2	# atomic64_" #op "_return\n"  \
-			"	" #asm_op " %0, %1, %3			\n"   \
-			"	scd	%0, %2				\n"   \
-			"	.set	mips0				\n"   \
-			: "=&r" (result), "=&r" (temp),			      \
-			  "=" GCC_OFF_SMALL_ASM() (v->counter)		      \
-			: "Ir" (i), GCC_OFF_SMALL_ASM() (v->counter)	      \
-			: "memory");					      \
-		} while (unlikely(!result));				      \
-									      \
-		result = temp; result c_op i;				      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -474,37 +392,20 @@ static __inline__ long atomic64_fetch_##op##_relaxed(long i, atomic64_t * v)  \
 {									      \
 	long result;							      \
 									      \
-	if (kernel_uses_llsc && R10000_LLSC_WAR) {			      \
+	if (kernel_uses_llsc) {						      \
 		long temp;						      \
 									      \
 		__asm__ __volatile__(					      \
-		"	.set	arch=r4000				\n"   \
+		"	.set	"MIPS_ISA_LEVEL"			\n"   \
 		"1:	lld	%1, %2		# atomic64_fetch_" #op "\n"   \
 		"	" #asm_op " %0, %1, %3				\n"   \
 		"	scd	%0, %2					\n"   \
-		"	beqzl	%0, 1b					\n"   \
+		"\t" __scbeqz "	%0, 1b					\n"   \
 		"	move	%0, %1					\n"   \
 		"	.set	mips0					\n"   \
 		: "=&r" (result), "=&r" (temp),				      \
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)			      \
 		: "Ir" (i));						      \
-	} else if (kernel_uses_llsc) {					      \
-		long temp;						      \
-									      \
-		do {							      \
-			__asm__ __volatile__(				      \
-			"	.set	"MIPS_ISA_LEVEL"		\n"   \
-			"	lld	%1, %2	# atomic64_fetch_" #op "\n"   \
-			"	" #asm_op " %0, %1, %3			\n"   \
-			"	scd	%0, %2				\n"   \
-			"	.set	mips0				\n"   \
-			: "=&r" (result), "=&r" (temp),			      \
-			  "=" GCC_OFF_SMALL_ASM() (v->counter)		      \
-			: "Ir" (i), GCC_OFF_SMALL_ASM() (v->counter)	      \
-			: "memory");					      \
-		} while (unlikely(!result));				      \
-									      \
-		result = temp;						      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -563,17 +464,17 @@ static __inline__ long atomic64_sub_if_positive(long i, atomic64_t * v)
 
 	smp_mb__before_llsc();
 
-	if (kernel_uses_llsc && R10000_LLSC_WAR) {
+	if (kernel_uses_llsc) {
 		long temp;
 
 		__asm__ __volatile__(
-		"	.set	arch=r4000				\n"
+		"	.set	"MIPS_ISA_LEVEL"			\n"
 		"1:	lld	%1, %2		# atomic64_sub_if_positive\n"
 		"	dsubu	%0, %1, %3				\n"
 		"	bltz	%0, 1f					\n"
 		"	scd	%0, %2					\n"
 		"	.set	noreorder				\n"
-		"	beqzl	%0, 1b					\n"
+		"\t" __scbeqz "	%0, 1b					\n"
 		"	 dsubu	%0, %1, %3				\n"
 		"	.set	reorder					\n"
 		"1:							\n"
@@ -582,24 +483,6 @@ static __inline__ long atomic64_sub_if_positive(long i, atomic64_t * v)
 		  "=" GCC_OFF_SMALL_ASM() (v->counter)
 		: "Ir" (i), GCC_OFF_SMALL_ASM() (v->counter)
 		: "memory");
-	} else if (kernel_uses_llsc) {
-		long temp;
-
-		__asm__ __volatile__(
-		"	.set	"MIPS_ISA_LEVEL"			\n"
-		"1:	lld	%1, %2		# atomic64_sub_if_positive\n"
-		"	dsubu	%0, %1, %3				\n"
-		"	bltz	%0, 1f					\n"
-		"	scd	%0, %2					\n"
-		"	.set	noreorder				\n"
-		"	beqz	%0, 1b					\n"
-		"	 dsubu	%0, %1, %3				\n"
-		"	.set	reorder					\n"
-		"1:							\n"
-		"	.set	mips0					\n"
-		: "=&r" (result), "=&r" (temp),
-		  "+" GCC_OFF_SMALL_ASM() (v->counter)
-		: "Ir" (i));
 	} else {
 		unsigned long flags;
 

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic


* Re: Question regarding atomic ops
@ 2017-10-25  7:59         ` James Hogan
  0 siblings, 0 replies; 8+ messages in thread
From: James Hogan @ 2017-10-25  7:59 UTC (permalink / raw)
  To: Joshua Kinard; +Cc: Ralf Baechle, linux-mips


On Mon, Oct 16, 2017 at 01:51:01AM -0400, Joshua Kinard wrote:
> The only uncertainty I have is the bottom of atomic_sub_if_positive and
> atomic64_sub_if_positive.  In the R10000_LLSC_WAR case, the end of the assembler is:
> 
> 	  "+" GCC_OFF_SMALL_ASM() (v->counter)
> 	: "Ir" (i), GCC_OFF_SMALL_ASM() (v->counter)

The GCC_OFF_SMALL_ASM() (v->counter) input here looks redundant because
1) the output one has +, which means input and output
2) %4 is never referenced.

> 	: "memory");

To me this appears to be redundant since the only side effects are
writing to v->counter which is already an output, and in any case the
smp_mb__before_llsc() and smp_llsc_mb() imply memory clobber anyway.

> 
> While the standard case is:
> 
> 	: "=&r" (result), "=&r" (temp),
> 	  "+" GCC_OFF_SMALL_ASM() (v->counter)
> 	: "Ir" (i));

So yeh, I'd go for this unless anybody can think of a reason it wouldn't
work in the R10000_LLSC_WAR case.

Hope that helps.

Cheers
James




end of thread, other threads:[~2017-10-25  8:01 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-10  2:24 Question regarding atomic ops Joshua Kinard
2017-10-10  2:34 ` Joshua Kinard
2017-10-10 14:23   ` Ralf Baechle
2017-10-11 13:15     ` Joshua Kinard
2017-10-11 13:15       ` Joshua Kinard
2017-10-16  5:51     ` Joshua Kinard
2017-10-25  7:59       ` James Hogan
2017-10-25  7:59         ` James Hogan
