All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH][RFC] Implement arch primitives for busywait loops
@ 2016-09-16  8:57 Nicholas Piggin
  2016-09-16 11:30   ` David Laight
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Nicholas Piggin @ 2016-09-16  8:57 UTC (permalink / raw)
  To: linux-arch; +Cc: Nicholas Piggin, linuxppc-dev

Implementing busy wait loops with cpu_relax() in callers poses
some difficulties for powerpc.

First, we want to put our SMT thread into a low priority mode for the
duration of the loop, but then return to normal priority after exiting
the loop.  Dependong on the CPU design, 'HMT_low() ; HMT_medium();' as
cpu_relax() does may have HMT_medium take effect before HMT_low made
any (or much) difference.

Second, it can be beneficial for some implementations to spin on the
exit condition with a statically predicted-not-taken branch (i.e.,
always predict the loop will exit).

This is a quick RFC with a couple of users converted to see what
people think. I don't use a C branch with hints, because we don't want
the compiler moving the loop body out of line, which makes it a bit
messy unfortunately. If there's a better way to do it, I'm all ears.

I would not propose to switch all callers immediately, just some
core synchronisation primitives.

---
 arch/powerpc/include/asm/processor.h | 22 ++++++++++++++++++++++
 include/asm-generic/barrier.h        |  7 ++-----
 include/linux/bit_spinlock.h         |  5 ++---
 include/linux/cgroup.h               |  7 ++-----
 include/linux/seqlock.h              | 10 ++++------
 5 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 68e3bf5..e10aee2 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -402,6 +402,28 @@ static inline unsigned long __pack_fe01(unsigned int fpmode)
 
 #ifdef CONFIG_PPC64
 #define cpu_relax()	do { HMT_low(); HMT_medium(); barrier(); } while (0)
+
+#define spin_do						\
+do {							\
+	HMT_low();					\
+	__asm__ __volatile__ (	"1010:");
+
+#define spin_while(cond)				\
+	barrier();					\
+	__asm__ __volatile__ (	"cmpdi	%0,0	\n\t"	\
+				"beq-	1010b	\n\t"	\
+				: : "r" (cond));	\
+	HMT_medium();					\
+} while (0)
+
+#define spin_until(cond)				\
+	barrier();					\
+	__asm__ __volatile__ (	"cmpdi	%0,0	\n\t"	\
+				"bne-	1010b	\n\t"	\
+				: : "r" (cond));	\
+	HMT_medium();					\
+} while (0)
+
 #else
 #define cpu_relax()	barrier()
 #endif
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index fe297b5..4c53b3a 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -235,12 +235,9 @@ do {									\
 #define smp_cond_load_acquire(ptr, cond_expr) ({		\
 	typeof(ptr) __PTR = (ptr);				\
 	typeof(*ptr) VAL;					\
-	for (;;) {						\
+	spin_do {						\
 		VAL = READ_ONCE(*__PTR);			\
-		if (cond_expr)					\
-			break;					\
-		cpu_relax();					\
-	}							\
+	} spin_until (cond_expr);				\
 	smp_acquire__after_ctrl_dep();				\
 	VAL;							\
 })
diff --git a/include/linux/bit_spinlock.h b/include/linux/bit_spinlock.h
index 3b5bafc..695743c 100644
--- a/include/linux/bit_spinlock.h
+++ b/include/linux/bit_spinlock.h
@@ -25,9 +25,8 @@ static inline void bit_spin_lock(int bitnum, unsigned long *addr)
 #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
 	while (unlikely(test_and_set_bit_lock(bitnum, addr))) {
 		preempt_enable();
-		do {
-			cpu_relax();
-		} while (test_bit(bitnum, addr));
+		spin_do {
+		} spin_while (test_bit(bitnum, addr));
 		preempt_disable();
 	}
 #endif
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 984f73b..e7d395f 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -450,12 +450,9 @@ task_get_css(struct task_struct *task, int subsys_id)
 	struct cgroup_subsys_state *css;
 
 	rcu_read_lock();
-	while (true) {
+	spin_do {
 		css = task_css(task, subsys_id);
-		if (likely(css_tryget_online(css)))
-			break;
-		cpu_relax();
-	}
+	} spin_until (likely(css_tryget_online(css)));
 	rcu_read_unlock();
 	return css;
 }
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index ead9765..93ed609 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -108,12 +108,10 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
 {
 	unsigned ret;
 
-repeat:
-	ret = READ_ONCE(s->sequence);
-	if (unlikely(ret & 1)) {
-		cpu_relax();
-		goto repeat;
-	}
+	spin_do {
+		ret = READ_ONCE(s->sequence);
+	} spin_while (unlikely(ret & 1));
+
 	return ret;
 }
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* RE: [PATCH][RFC] Implement arch primitives for busywait loops
  2016-09-16  8:57 [PATCH][RFC] Implement arch primitives for busywait loops Nicholas Piggin
@ 2016-09-16 11:30   ` David Laight
  2016-09-19  5:05   ` Aneesh Kumar K.V
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: David Laight @ 2016-09-16 11:30 UTC (permalink / raw)
  To: 'Nicholas Piggin', linux-arch; +Cc: linuxppc-dev

From: Nicholas Piggin
> Sent: 16 September 2016 09:58
> Implementing busy wait loops with cpu_relax() in callers poses
> some difficulties for powerpc.
> 
> First, we want to put our SMT thread into a low priority mode for the
> duration of the loop, but then return to normal priority after exiting
> the loop.  Dependong on the CPU design, 'HMT_low() ; HMT_medium();' as
> cpu_relax() does may have HMT_medium take effect before HMT_low made
> any (or much) difference.
> 
> Second, it can be beneficial for some implementations to spin on the
> exit condition with a statically predicted-not-taken branch (i.e.,
> always predict the loop will exit).
> 
> This is a quick RFC with a couple of users converted to see what
> people think. I don't use a C branch with hints, because we don't want
> the compiler moving the loop body out of line, which makes it a bit
> messy unfortunately. If there's a better way to do it, I'm all ears.

I think it will still all go wrong if the conditional isn't trivial.
In particular if the condition contains || or && it is likely to
have a branch - which could invert the loop.

	David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH][RFC] Implement arch primitives for busywait loops
@ 2016-09-16 11:30   ` David Laight
  0 siblings, 0 replies; 17+ messages in thread
From: David Laight @ 2016-09-16 11:30 UTC (permalink / raw)
  To: 'Nicholas Piggin', linux-arch; +Cc: linuxppc-dev

From: Nicholas Piggin
> Sent: 16 September 2016 09:58
> Implementing busy wait loops with cpu_relax() in callers poses
> some difficulties for powerpc.
>=20
> First, we want to put our SMT thread into a low priority mode for the
> duration of the loop, but then return to normal priority after exiting
> the loop.  Dependong on the CPU design, 'HMT_low() ; HMT_medium();' as
> cpu_relax() does may have HMT_medium take effect before HMT_low made
> any (or much) difference.
>=20
> Second, it can be beneficial for some implementations to spin on the
> exit condition with a statically predicted-not-taken branch (i.e.,
> always predict the loop will exit).
>=20
> This is a quick RFC with a couple of users converted to see what
> people think. I don't use a C branch with hints, because we don't want
> the compiler moving the loop body out of line, which makes it a bit
> messy unfortunately. If there's a better way to do it, I'm all ears.

I think it will still all go wrong if the conditional isn't trivial.
In particular if the condition contains || or && it is likely to
have a branch - which could invert the loop.

	David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC] Implement arch primitives for busywait loops
  2016-09-16 11:30   ` David Laight
  (?)
@ 2016-09-16 11:52   ` Nicholas Piggin
  2016-09-16 11:57       ` David Laight
  -1 siblings, 1 reply; 17+ messages in thread
From: Nicholas Piggin @ 2016-09-16 11:52 UTC (permalink / raw)
  To: David Laight; +Cc: linux-arch, linuxppc-dev

On Fri, 16 Sep 2016 11:30:58 +0000
David Laight <David.Laight@ACULAB.COM> wrote:

> From: Nicholas Piggin
> > Sent: 16 September 2016 09:58
> > Implementing busy wait loops with cpu_relax() in callers poses
> > some difficulties for powerpc.
> > 
> > First, we want to put our SMT thread into a low priority mode for the
> > duration of the loop, but then return to normal priority after exiting
> > the loop.  Dependong on the CPU design, 'HMT_low() ; HMT_medium();' as
> > cpu_relax() does may have HMT_medium take effect before HMT_low made
> > any (or much) difference.
> > 
> > Second, it can be beneficial for some implementations to spin on the
> > exit condition with a statically predicted-not-taken branch (i.e.,
> > always predict the loop will exit).
> > 
> > This is a quick RFC with a couple of users converted to see what
> > people think. I don't use a C branch with hints, because we don't want
> > the compiler moving the loop body out of line, which makes it a bit
> > messy unfortunately. If there's a better way to do it, I'm all ears.  
> 
> I think it will still all go wrong if the conditional isn't trivial.
> In particular if the condition contains || or && it is likely to
> have a branch - which could invert the loop.

I don't know that it will.

Yes, if we have exit condition that requires more branches in order to
be computed then we lose our nice property of never taking a branch
miss on loop exit. But we still avoid *this* branch miss, and still
prevent multiple iterations of the wait loop being speculatively
executed concurrently when there's no work to be done.

And C doesn't know about the loop, so it can't do any transformation
except to compute the final condition.

Or have I missed something?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH][RFC] Implement arch primitives for busywait loops
  2016-09-16 11:52   ` Nicholas Piggin
@ 2016-09-16 11:57       ` David Laight
  0 siblings, 0 replies; 17+ messages in thread
From: David Laight @ 2016-09-16 11:57 UTC (permalink / raw)
  To: 'Nicholas Piggin'; +Cc: linux-arch, linuxppc-dev

From: Nicholas Piggin
> Sent: 16 September 2016 12:52
> On Fri, 16 Sep 2016 11:30:58 +0000
> David Laight <David.Laight@ACULAB.COM> wrote:
> 
> > From: Nicholas Piggin
> > > Sent: 16 September 2016 09:58
> > > Implementing busy wait loops with cpu_relax() in callers poses
> > > some difficulties for powerpc.
> > >
> > > First, we want to put our SMT thread into a low priority mode for the
> > > duration of the loop, but then return to normal priority after exiting
> > > the loop.  Dependong on the CPU design, 'HMT_low() ; HMT_medium();' as
> > > cpu_relax() does may have HMT_medium take effect before HMT_low made
> > > any (or much) difference.
> > >
> > > Second, it can be beneficial for some implementations to spin on the
> > > exit condition with a statically predicted-not-taken branch (i.e.,
> > > always predict the loop will exit).
> > >
> > > This is a quick RFC with a couple of users converted to see what
> > > people think. I don't use a C branch with hints, because we don't want
> > > the compiler moving the loop body out of line, which makes it a bit
> > > messy unfortunately. If there's a better way to do it, I'm all ears.
> >
> > I think it will still all go wrong if the conditional isn't trivial.
> > In particular if the condition contains || or && it is likely to
> > have a branch - which could invert the loop.
> 
> I don't know that it will.
> 
> Yes, if we have exit condition that requires more branches in order to
> be computed then we lose our nice property of never taking a branch
> miss on loop exit. But we still avoid *this* branch miss, and still
> prevent multiple iterations of the wait loop being speculatively
> executed concurrently when there's no work to be done.
> 
> And C doesn't know about the loop, so it can't do any transformation
> except to compute the final condition.
> 
> Or have I missed something?

Try putting the code inside a conditional or at the bottom of a loop.
gcc can replicate code to remove a branch.

So:
	for (;;) {
		a;
		if (b)
			c;
		d;
	}

can become:
   x1:
	a;
	if (b) to x2;
	d;
	goto x1;

    x2:
	c;
	d;
	goto x1;

Which won't work.

	David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH][RFC] Implement arch primitives for busywait loops
@ 2016-09-16 11:57       ` David Laight
  0 siblings, 0 replies; 17+ messages in thread
From: David Laight @ 2016-09-16 11:57 UTC (permalink / raw)
  To: 'Nicholas Piggin'; +Cc: linux-arch, linuxppc-dev

From: Nicholas Piggin
> Sent: 16 September 2016 12:52
> On Fri, 16 Sep 2016 11:30:58 +0000
> David Laight <David.Laight@ACULAB.COM> wrote:
>=20
> > From: Nicholas Piggin
> > > Sent: 16 September 2016 09:58
> > > Implementing busy wait loops with cpu_relax() in callers poses
> > > some difficulties for powerpc.
> > >
> > > First, we want to put our SMT thread into a low priority mode for the
> > > duration of the loop, but then return to normal priority after exitin=
g
> > > the loop.  Dependong on the CPU design, 'HMT_low() ; HMT_medium();' a=
s
> > > cpu_relax() does may have HMT_medium take effect before HMT_low made
> > > any (or much) difference.
> > >
> > > Second, it can be beneficial for some implementations to spin on the
> > > exit condition with a statically predicted-not-taken branch (i.e.,
> > > always predict the loop will exit).
> > >
> > > This is a quick RFC with a couple of users converted to see what
> > > people think. I don't use a C branch with hints, because we don't wan=
t
> > > the compiler moving the loop body out of line, which makes it a bit
> > > messy unfortunately. If there's a better way to do it, I'm all ears.
> >
> > I think it will still all go wrong if the conditional isn't trivial.
> > In particular if the condition contains || or && it is likely to
> > have a branch - which could invert the loop.
>=20
> I don't know that it will.
>=20
> Yes, if we have exit condition that requires more branches in order to
> be computed then we lose our nice property of never taking a branch
> miss on loop exit. But we still avoid *this* branch miss, and still
> prevent multiple iterations of the wait loop being speculatively
> executed concurrently when there's no work to be done.
>=20
> And C doesn't know about the loop, so it can't do any transformation
> except to compute the final condition.
>=20
> Or have I missed something?

Try putting the code inside a conditional or at the bottom of a loop.
gcc can replicate code to remove a branch.

So:
	for (;;) {
		a;
		if (b)
			c;
		d;
	}

can become:
   x1:
	a;
	if (b) to x2;
	d;
	goto x1;

    x2:
	c;
	d;
	goto x1;

Which won't work.

	David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC] Implement arch primitives for busywait loops
  2016-09-16 11:57       ` David Laight
  (?)
@ 2016-09-16 12:06       ` Nicholas Piggin
  2016-09-16 12:59         ` Nicholas Piggin
  -1 siblings, 1 reply; 17+ messages in thread
From: Nicholas Piggin @ 2016-09-16 12:06 UTC (permalink / raw)
  To: David Laight; +Cc: linux-arch, linuxppc-dev

On Fri, 16 Sep 2016 11:57:37 +0000
David Laight <David.Laight@ACULAB.COM> wrote:

> From: Nicholas Piggin
> > Sent: 16 September 2016 12:52
> > On Fri, 16 Sep 2016 11:30:58 +0000
> > David Laight <David.Laight@ACULAB.COM> wrote:
> >   
> > > From: Nicholas Piggin  
> > > > Sent: 16 September 2016 09:58
> > > > Implementing busy wait loops with cpu_relax() in callers poses
> > > > some difficulties for powerpc.
> > > >
> > > > First, we want to put our SMT thread into a low priority mode for the
> > > > duration of the loop, but then return to normal priority after exiting
> > > > the loop.  Dependong on the CPU design, 'HMT_low() ; HMT_medium();' as
> > > > cpu_relax() does may have HMT_medium take effect before HMT_low made
> > > > any (or much) difference.
> > > >
> > > > Second, it can be beneficial for some implementations to spin on the
> > > > exit condition with a statically predicted-not-taken branch (i.e.,
> > > > always predict the loop will exit).
> > > >
> > > > This is a quick RFC with a couple of users converted to see what
> > > > people think. I don't use a C branch with hints, because we don't want
> > > > the compiler moving the loop body out of line, which makes it a bit
> > > > messy unfortunately. If there's a better way to do it, I'm all ears.  
> > >
> > > I think it will still all go wrong if the conditional isn't trivial.
> > > In particular if the condition contains || or && it is likely to
> > > have a branch - which could invert the loop.  
> > 
> > I don't know that it will.
> > 
> > Yes, if we have exit condition that requires more branches in order to
> > be computed then we lose our nice property of never taking a branch
> > miss on loop exit. But we still avoid *this* branch miss, and still
> > prevent multiple iterations of the wait loop being speculatively
> > executed concurrently when there's no work to be done.
> > 
> > And C doesn't know about the loop, so it can't do any transformation
> > except to compute the final condition.
> > 
> > Or have I missed something?  
> 
> Try putting the code inside a conditional or at the bottom of a loop.
> gcc can replicate code to remove a branch.
> 
> So:
> 	for (;;) {
> 		a;
> 		if (b)
> 			c;
> 		d;
> 	}

That's not what this patch does though. The loop is purely asm. gcc has
no idea about it. Only thing gcc knows is to evaluate the condition and
put it in a register.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC] Implement arch primitives for busywait loops
  2016-09-16 12:06       ` Nicholas Piggin
@ 2016-09-16 12:59         ` Nicholas Piggin
  0 siblings, 0 replies; 17+ messages in thread
From: Nicholas Piggin @ 2016-09-16 12:59 UTC (permalink / raw)
  To: David Laight; +Cc: linux-arch, linuxppc-dev

On Fri, 16 Sep 2016 22:06:35 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> On Fri, 16 Sep 2016 11:57:37 +0000
> David Laight <David.Laight@ACULAB.COM> wrote:
> 
> > From: Nicholas Piggin  
> > > Sent: 16 September 2016 12:52
> > > On Fri, 16 Sep 2016 11:30:58 +0000
> > > David Laight <David.Laight@ACULAB.COM> wrote:
> > >     
> > > > From: Nicholas Piggin    
> > > > > Sent: 16 September 2016 09:58
> > > > > Implementing busy wait loops with cpu_relax() in callers poses
> > > > > some difficulties for powerpc.
> > > > >
> > > > > First, we want to put our SMT thread into a low priority mode for the
> > > > > duration of the loop, but then return to normal priority after exiting
> > > > > the loop.  Dependong on the CPU design, 'HMT_low() ; HMT_medium();' as
> > > > > cpu_relax() does may have HMT_medium take effect before HMT_low made
> > > > > any (or much) difference.
> > > > >
> > > > > Second, it can be beneficial for some implementations to spin on the
> > > > > exit condition with a statically predicted-not-taken branch (i.e.,
> > > > > always predict the loop will exit).
> > > > >
> > > > > This is a quick RFC with a couple of users converted to see what
> > > > > people think. I don't use a C branch with hints, because we don't want
> > > > > the compiler moving the loop body out of line, which makes it a bit
> > > > > messy unfortunately. If there's a better way to do it, I'm all ears.    
> > > >
> > > > I think it will still all go wrong if the conditional isn't trivial.
> > > > In particular if the condition contains || or && it is likely to
> > > > have a branch - which could invert the loop.    
> > > 
> > > I don't know that it will.
> > > 
> > > Yes, if we have exit condition that requires more branches in order to
> > > be computed then we lose our nice property of never taking a branch
> > > miss on loop exit. But we still avoid *this* branch miss, and still
> > > prevent multiple iterations of the wait loop being speculatively
> > > executed concurrently when there's no work to be done.
> > > 
> > > And C doesn't know about the loop, so it can't do any transformation
> > > except to compute the final condition.
> > > 
> > > Or have I missed something?    
> > 
> > Try putting the code inside a conditional or at the bottom of a loop.
> > gcc can replicate code to remove a branch.
> > 
> > So:
> > 	for (;;) {
> > 		a;
> > 		if (b)
> > 			c;
> > 		d;
> > 	}  
> 
> That's not what this patch does though. The loop is purely asm. gcc has
> no idea about it. Only thing gcc knows is to evaluate the condition and
> put it in a register.


Oh you're right course -- can't branch to random location. Sorry, I didn't
know what you meant at first. It does need to use asm goto I guess.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC] Implement arch primitives for busywait loops
@ 2016-09-19  5:05   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2016-09-19  5:05 UTC (permalink / raw)
  To: linux-arch; +Cc: linuxppc-dev, Nicholas Piggin


> diff --git a/include/linux/bit_spinlock.h b/include/linux/bit_spinlock.h
> index 3b5bafc..695743c 100644
> --- a/include/linux/bit_spinlock.h
> +++ b/include/linux/bit_spinlock.h
> @@ -25,9 +25,8 @@ static inline void bit_spin_lock(int bitnum, unsigned long *addr)
>  #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
>  	while (unlikely(test_and_set_bit_lock(bitnum, addr))) {
>  		preempt_enable();
> -		do {
> -			cpu_relax();
> -		} while (test_bit(bitnum, addr));
> +		spin_do {
> +		} spin_while (test_bit(bitnum, addr));
>  		preempt_disable();
>  	}
>  #endif

That usage is strange, with nothing in the do{ }while loop. May be add a
macro for usages like this ?

-aneesh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC] Implement arch primitives for busywait loops
@ 2016-09-19  5:05   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2016-09-19  5:05 UTC (permalink / raw)
  To: Nicholas Piggin, linux-arch; +Cc: linuxppc-dev


> diff --git a/include/linux/bit_spinlock.h b/include/linux/bit_spinlock.h
> index 3b5bafc..695743c 100644
> --- a/include/linux/bit_spinlock.h
> +++ b/include/linux/bit_spinlock.h
> @@ -25,9 +25,8 @@ static inline void bit_spin_lock(int bitnum, unsigned long *addr)
>  #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
>  	while (unlikely(test_and_set_bit_lock(bitnum, addr))) {
>  		preempt_enable();
> -		do {
> -			cpu_relax();
> -		} while (test_bit(bitnum, addr));
> +		spin_do {
> +		} spin_while (test_bit(bitnum, addr));
>  		preempt_disable();
>  	}
>  #endif

That usage is strange, with nothing in the do{ }while loop. May be add a
macro for usages like this ?

-aneesh


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC] Implement arch primitives for busywait loops
@ 2016-09-19  5:05   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2016-09-19  5:05 UTC (permalink / raw)
  To: Nicholas Piggin, linux-arch; +Cc: linuxppc-dev, Nicholas Piggin


> diff --git a/include/linux/bit_spinlock.h b/include/linux/bit_spinlock.h
> index 3b5bafc..695743c 100644
> --- a/include/linux/bit_spinlock.h
> +++ b/include/linux/bit_spinlock.h
> @@ -25,9 +25,8 @@ static inline void bit_spin_lock(int bitnum, unsigned long *addr)
>  #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
>  	while (unlikely(test_and_set_bit_lock(bitnum, addr))) {
>  		preempt_enable();
> -		do {
> -			cpu_relax();
> -		} while (test_bit(bitnum, addr));
> +		spin_do {
> +		} spin_while (test_bit(bitnum, addr));
>  		preempt_disable();
>  	}
>  #endif

That usage is strange, with nothing in the do{ }while loop. May be add a
macro for usages like this ?

-aneesh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC] Implement arch primitives for busywait loops
  2016-09-16  8:57 [PATCH][RFC] Implement arch primitives for busywait loops Nicholas Piggin
  2016-09-16 11:30   ` David Laight
  2016-09-19  5:05   ` Aneesh Kumar K.V
@ 2016-09-19  7:45 ` Balbir Singh
  2016-09-19  8:48   ` Nicholas Piggin
  2016-09-20 11:19 ` Christian Borntraeger
  3 siblings, 1 reply; 17+ messages in thread
From: Balbir Singh @ 2016-09-19  7:45 UTC (permalink / raw)
  To: Nicholas Piggin, linux-arch; +Cc: linuxppc-dev



On 16/09/16 18:57, Nicholas Piggin wrote:
> Implementing busy wait loops with cpu_relax() in callers poses
> some difficulties for powerpc.
> 
> First, we want to put our SMT thread into a low priority mode for the
> duration of the loop, but then return to normal priority after exiting
> the loop.  Dependong on the CPU design, 'HMT_low() ; HMT_medium();' as
> cpu_relax() does may have HMT_medium take effect before HMT_low made
> any (or much) difference.
> 
> Second, it can be beneficial for some implementations to spin on the
> exit condition with a statically predicted-not-taken branch (i.e.,
> always predict the loop will exit).
> 

IIUC, what you are proposing is that cpu_relax() be split such 
that on entry we do HMT_low() and on exit do HMT_medium(). I think
that makes a lot of sense, in that it allows the required transition
time from low to medium

> This is a quick RFC with a couple of users converted to see what
> people think. I don't use a C branch with hints, because we don't want
> the compiler moving the loop body out of line, which makes it a bit
> messy unfortunately. If there's a better way to do it, I'm all ears.
> 
> I would not propose to switch all callers immediately, just some
> core synchronisation primitives.
> 
> ---
>  arch/powerpc/include/asm/processor.h | 22 ++++++++++++++++++++++
>  include/asm-generic/barrier.h        |  7 ++-----
>  include/linux/bit_spinlock.h         |  5 ++---
>  include/linux/cgroup.h               |  7 ++-----
>  include/linux/seqlock.h              | 10 ++++------
>  5 files changed, 32 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
> index 68e3bf5..e10aee2 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -402,6 +402,28 @@ static inline unsigned long __pack_fe01(unsigned int fpmode)
>  
>  #ifdef CONFIG_PPC64
>  #define cpu_relax()	do { HMT_low(); HMT_medium(); barrier(); } while (0)
> +
> +#define spin_do						\

How about cpu_relax_begin()?

> +do {							\
> +	HMT_low();					\
> +	__asm__ __volatile__ (	"1010:");
> +
> +#define spin_while(cond)				\

cpu_relax_while()
> +	barrier();					\
> +	__asm__ __volatile__ (	"cmpdi	%0,0	\n\t"	\
> +				"beq-	1010b	\n\t"	\
> +				: : "r" (cond));	\
> +	HMT_medium();					\
> +} while (0)
> +


> +#define spin_until(cond)				\

This is just spin_while(!cond) from an implementation perspective right?

cpu_relax_until()

> +	barrier();					\
> +	__asm__ __volatile__ (	"cmpdi	%0,0	\n\t"	\
> +				"bne-	1010b	\n\t"	\
> +				: : "r" (cond));	\
> +	HMT_medium();					\
> +} while (0)
> +

Then add cpu_relax_end() that does HMT_medium()

Balbir Singh.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC] Implement arch primitives for busywait loops
  2016-09-19  7:45 ` Balbir Singh
@ 2016-09-19  8:48   ` Nicholas Piggin
  0 siblings, 0 replies; 17+ messages in thread
From: Nicholas Piggin @ 2016-09-19  8:48 UTC (permalink / raw)
  To: Balbir Singh; +Cc: linux-arch, linuxppc-dev

On Mon, 19 Sep 2016 17:45:52 +1000
Balbir Singh <bsingharora@gmail.com> wrote:

> On 16/09/16 18:57, Nicholas Piggin wrote:
> > Implementing busy wait loops with cpu_relax() in callers poses
> > some difficulties for powerpc.
> > 
> > First, we want to put our SMT thread into a low priority mode for the
> > duration of the loop, but then return to normal priority after exiting
> > the loop.  Dependong on the CPU design, 'HMT_low() ; HMT_medium();' as
> > cpu_relax() does may have HMT_medium take effect before HMT_low made
> > any (or much) difference.
> > 
> > Second, it can be beneficial for some implementations to spin on the
> > exit condition with a statically predicted-not-taken branch (i.e.,
> > always predict the loop will exit).
> >   
> 
> IIUC, what you are proposing is that cpu_relax() be split such 
> that on entry we do HMT_low() and on exit do HMT_medium(). I think
> that makes a lot of sense, in that it allows the required transition
> time from low to medium

Basically yes, although also allowing the loop exit branch to be
overridden by the arch code too. That can possibly benefit some
microarchitectures -- e.g., you want loop exit to not take a branch
miss if possible but it may be acceptable to branch miss for every
other iteration. I'm doing some testing of it now (previous patch
was garbage btw, don't try to use it!)


> > diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
> > index 68e3bf5..e10aee2 100644
> > --- a/arch/powerpc/include/asm/processor.h
> > +++ b/arch/powerpc/include/asm/processor.h
> > @@ -402,6 +402,28 @@ static inline unsigned long __pack_fe01(unsigned int fpmode)
> >  
> >  #ifdef CONFIG_PPC64
> >  #define cpu_relax()	do { HMT_low(); HMT_medium(); barrier(); } while (0)
> > +
> > +#define spin_do						\  
> 
> How about cpu_relax_begin()?
> 
> > +do {							\
> > +	HMT_low();					\
> > +	__asm__ __volatile__ (	"1010:");
> > +
> > +#define spin_while(cond)				\  
> 
> cpu_relax_while()
> > +	barrier();					\
> > +	__asm__ __volatile__ (	"cmpdi	%0,0	\n\t"	\
> > +				"beq-	1010b	\n\t"	\
> > +				: : "r" (cond));	\
> > +	HMT_medium();					\
> > +} while (0)
> > +  
> 
> 
> > +#define spin_until(cond)				\  
> 
> This is just spin_while(!cond) from an implementation perspective right?

Yes, the only reason I put it in was because such spin loops often
read a bit better the other way from normal loops (you are interested
in the exit condition rather than the loop-again condition).

> 
> cpu_relax_until()
> 
> > +	barrier();					\
> > +	__asm__ __volatile__ (	"cmpdi	%0,0	\n\t"	\
> > +				"bne-	1010b	\n\t"	\
> > +				: : "r" (cond));	\
> > +	HMT_medium();					\
> > +} while (0)
> > +  
> 
> Then add cpu_relax_end() that does HMT_medium()

Hmm. I see what you mean, but I don't know if we should trust open
coded callers to get this right. It could cause weird problems and
isn't easily caught. If we can get something that breaks build,
perhaps. OTOH, I prefer all the logic including SMT priority to be
in the spin loop primitive directly because it's pretty subtle and
might need to be runtime patched.

We could *also* have a cpu_relax_begin/cpu_relax_end pair, although
I'd like to first see callers that don't suit spin_ primitives and
see what should be done.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC] Implement arch primitives for busywait loops
  2016-09-16  8:57 [PATCH][RFC] Implement arch primitives for busywait loops Nicholas Piggin
                   ` (2 preceding siblings ...)
  2016-09-19  7:45 ` Balbir Singh
@ 2016-09-20 11:19 ` Christian Borntraeger
  2016-09-20 12:27   ` Nicholas Piggin
  3 siblings, 1 reply; 17+ messages in thread
From: Christian Borntraeger @ 2016-09-20 11:19 UTC (permalink / raw)
  To: Nicholas Piggin, linux-arch; +Cc: linuxppc-dev, linux-s390

On 09/16/2016 10:57 AM, Nicholas Piggin wrote:
> Implementing busy wait loops with cpu_relax() in callers poses
> some difficulties for powerpc.
> 
> First, we want to put our SMT thread into a low priority mode for the
> duration of the loop, but then return to normal priority after exiting
> the loop.  Dependong on the CPU design, 'HMT_low() ; HMT_medium();' as
> cpu_relax() does may have HMT_medium take effect before HMT_low made
> any (or much) difference.
> 
> Second, it can be beneficial for some implementations to spin on the
> exit condition with a statically predicted-not-taken branch (i.e.,
> always predict the loop will exit).
> 
> This is a quick RFC with a couple of users converted to see what
> people think. I don't use a C branch with hints, because we don't want
> the compiler moving the loop body out of line, which makes it a bit
> messy unfortunately. If there's a better way to do it, I'm all ears.
> 
> I would not propose to switch all callers immediately, just some
> core synchronisation primitives.
Just a FYA,

On s390 we have a private version of cpu_relax that yields the cpu
time slice back to the hypervisor via a hypercall. As this turned out
to be problematic in some cases there is also now a cpu_relax_lowlatency.

Now, this seems still problematic as there are too many places still 
using cpu_relax instead of cpu_relax_lowlatency. So my plan is to do 
a change of that, make cpu_relax just be a barrier and add a new 
cpu_relax_yield that gives up the time slice. (so that s390 cpu_relax
is just like any other cpu_relax)

As far as I can tell the only place where I want to change cpu_relax
to cpu_relax_lowlatency after that change is the stop machine run 
code, so I hope to have no conflicts with your changes.
> 
> ---
>  arch/powerpc/include/asm/processor.h | 22 ++++++++++++++++++++++
>  include/asm-generic/barrier.h        |  7 ++-----
>  include/linux/bit_spinlock.h         |  5 ++---
>  include/linux/cgroup.h               |  7 ++-----
>  include/linux/seqlock.h              | 10 ++++------
>  5 files changed, 32 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
> index 68e3bf5..e10aee2 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -402,6 +402,28 @@ static inline unsigned long __pack_fe01(unsigned int fpmode)
> 
>  #ifdef CONFIG_PPC64
>  #define cpu_relax()	do { HMT_low(); HMT_medium(); barrier(); } while (0)
> +
> +#define spin_do						\
> +do {							\
> +	HMT_low();					\
> +	__asm__ __volatile__ (	"1010:");
> +
> +#define spin_while(cond)				\
> +	barrier();					\
> +	__asm__ __volatile__ (	"cmpdi	%0,0	\n\t"	\
> +				"beq-	1010b	\n\t"	\
> +				: : "r" (cond));	\
> +	HMT_medium();					\
> +} while (0)
> +
> +#define spin_until(cond)				\
> +	barrier();					\
> +	__asm__ __volatile__ (	"cmpdi	%0,0	\n\t"	\
> +				"bne-	1010b	\n\t"	\
> +				: : "r" (cond));	\
> +	HMT_medium();					\
> +} while (0)
> +
>  #else
>  #define cpu_relax()	barrier()
>  #endif
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index fe297b5..4c53b3a 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -235,12 +235,9 @@ do {									\
>  #define smp_cond_load_acquire(ptr, cond_expr) ({		\
>  	typeof(ptr) __PTR = (ptr);				\
>  	typeof(*ptr) VAL;					\
> -	for (;;) {						\
> +	spin_do {						\
>  		VAL = READ_ONCE(*__PTR);			\
> -		if (cond_expr)					\
> -			break;					\
> -		cpu_relax();					\
> -	}							\
> +	} spin_until (cond_expr);				\
>  	smp_acquire__after_ctrl_dep();				\
>  	VAL;							\
>  })
> diff --git a/include/linux/bit_spinlock.h b/include/linux/bit_spinlock.h
> index 3b5bafc..695743c 100644
> --- a/include/linux/bit_spinlock.h
> +++ b/include/linux/bit_spinlock.h
> @@ -25,9 +25,8 @@ static inline void bit_spin_lock(int bitnum, unsigned long *addr)
>  #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
>  	while (unlikely(test_and_set_bit_lock(bitnum, addr))) {
>  		preempt_enable();
> -		do {
> -			cpu_relax();
> -		} while (test_bit(bitnum, addr));
> +		spin_do {
> +		} spin_while (test_bit(bitnum, addr));
>  		preempt_disable();
>  	}
>  #endif
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 984f73b..e7d395f 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -450,12 +450,9 @@ task_get_css(struct task_struct *task, int subsys_id)
>  	struct cgroup_subsys_state *css;
> 
>  	rcu_read_lock();
> -	while (true) {
> +	spin_do {
>  		css = task_css(task, subsys_id);
> -		if (likely(css_tryget_online(css)))
> -			break;
> -		cpu_relax();
> -	}
> +	} spin_until (likely(css_tryget_online(css)));
>  	rcu_read_unlock();
>  	return css;
>  }
> diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
> index ead9765..93ed609 100644
> --- a/include/linux/seqlock.h
> +++ b/include/linux/seqlock.h
> @@ -108,12 +108,10 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
>  {
>  	unsigned ret;
> 
> -repeat:
> -	ret = READ_ONCE(s->sequence);
> -	if (unlikely(ret & 1)) {
> -		cpu_relax();
> -		goto repeat;
> -	}
> +	spin_do {
> +		ret = READ_ONCE(s->sequence);
> +	} spin_while (unlikely(ret & 1));
> +
>  	return ret;
>  }
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC] Implement arch primitives for busywait loops
  2016-09-20 11:19 ` Christian Borntraeger
@ 2016-09-20 12:27   ` Nicholas Piggin
  2016-09-20 12:35     ` Christian Borntraeger
  0 siblings, 1 reply; 17+ messages in thread
From: Nicholas Piggin @ 2016-09-20 12:27 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: linux-arch, linuxppc-dev, linux-s390

On Tue, 20 Sep 2016 13:19:30 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 09/16/2016 10:57 AM, Nicholas Piggin wrote:
> > Implementing busy wait loops with cpu_relax() in callers poses
> > some difficulties for powerpc.
> > 
> > First, we want to put our SMT thread into a low priority mode for the
> > duration of the loop, but then return to normal priority after exiting
> > the loop.  Dependong on the CPU design, 'HMT_low() ; HMT_medium();' as
> > cpu_relax() does may have HMT_medium take effect before HMT_low made
> > any (or much) difference.
> > 
> > Second, it can be beneficial for some implementations to spin on the
> > exit condition with a statically predicted-not-taken branch (i.e.,
> > always predict the loop will exit).
> > 
> > This is a quick RFC with a couple of users converted to see what
> > people think. I don't use a C branch with hints, because we don't want
> > the compiler moving the loop body out of line, which makes it a bit
> > messy unfortunately. If there's a better way to do it, I'm all ears.
> > 
> > I would not propose to switch all callers immediately, just some
> > core synchronisation primitives.  
> Just a FYA,
> 
> On s390 we have a private version of cpu_relax that yields the cpu
> time slice back to the hypervisor via a hypercall.

The powerpc guest also wants to yield to hypervisor in some busywait
situations.

> As this turned out
> to be problematic in some cases there is also now a cpu_relax_lowlatency.
> 
> Now, this seems still problematic as there are too many places still 
> using cpu_relax instead of cpu_relax_lowlatency. So my plan is to do 
> a change of that, make cpu_relax just be a barrier and add a new 
> cpu_relax_yield that gives up the time slice. (so that s390 cpu_relax
> is just like any other cpu_relax)
> 
> As far as I can tell the only place where I want to change cpu_relax
> to cpu_relax_lowlatency after that change is the stop machine run 
> code, so I hope to have no conflicts with your changes.

I don't think there should be any conflicts, but it would be good to
make sure busy wait primitives can be usable by s390. So I can add
_yield variants that can do the right thing for s390.

I need to think more about virtualization, so I'm glad you commented.
Powerpc would like to be told when a busywait loop knows the CPU it is
waiting for. So perhaps also a _yield_to_cpu variant as well.

Something that will work with mutex_spin_on_owner and similar would be
nice too. As far as I can tell, powerpc may want to yield to hypervisor
when the owner's vcpu is scheduled off in that case too.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC] Implement arch primitives for busywait loops
  2016-09-20 12:27   ` Nicholas Piggin
@ 2016-09-20 12:35     ` Christian Borntraeger
  2016-09-20 12:46       ` Nicholas Piggin
  0 siblings, 1 reply; 17+ messages in thread
From: Christian Borntraeger @ 2016-09-20 12:35 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linux-arch, linuxppc-dev, linux-s390

On 09/20/2016 02:27 PM, Nicholas Piggin wrote:
> On Tue, 20 Sep 2016 13:19:30 +0200
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> On 09/16/2016 10:57 AM, Nicholas Piggin wrote:
>>> Implementing busy wait loops with cpu_relax() in callers poses
>>> some difficulties for powerpc.
>>>
>>> First, we want to put our SMT thread into a low priority mode for the
>>> duration of the loop, but then return to normal priority after exiting
>>> the loop.  Dependong on the CPU design, 'HMT_low() ; HMT_medium();' as
>>> cpu_relax() does may have HMT_medium take effect before HMT_low made
>>> any (or much) difference.
>>>
>>> Second, it can be beneficial for some implementations to spin on the
>>> exit condition with a statically predicted-not-taken branch (i.e.,
>>> always predict the loop will exit).
>>>
>>> This is a quick RFC with a couple of users converted to see what
>>> people think. I don't use a C branch with hints, because we don't want
>>> the compiler moving the loop body out of line, which makes it a bit
>>> messy unfortunately. If there's a better way to do it, I'm all ears.
>>>
>>> I would not propose to switch all callers immediately, just some
>>> core synchronisation primitives.  
>> Just a FYA,
>>
>> On s390 we have a private version of cpu_relax that yields the cpu
>> time slice back to the hypervisor via a hypercall.
> 
> The powerpc guest also wants to yield to hypervisor in some busywait
> situations.
> 
>> As this turned out
>> to be problematic in some cases there is also now a cpu_relax_lowlatency.
>>
>> Now, this seems still problematic as there are too many places still 
>> using cpu_relax instead of cpu_relax_lowlatency. So my plan is to do 
>> a change of that, make cpu_relax just be a barrier and add a new 
>> cpu_relax_yield that gives up the time slice. (so that s390 cpu_relax
>> is just like any other cpu_relax)
>>
>> As far as I can tell the only place where I want to change cpu_relax
>> to cpu_relax_lowlatency after that change is the stop machine run 
>> code, so I hope to have no conflicts with your changes.
> 
> I don't think there should be any conflicts, but it would be good to
> make sure busy wait primitives can be usable by s390. So I can add
> _yield variants that can do the right thing for s390.

I was distracted by "more important work" (TM) but I will put you on
CC when ready.
> 
> I need to think more about virtualization, so I'm glad you commented.
> Powerpc would like to be told when a busywait loop knows the CPU it is
> waiting for. So perhaps also a _yield_to_cpu variant as well.

Yes, we also have 2 hypercalls: one that yields somehow and one that yields
to a specific CPU. The latter is strongly preferred.
> 
> Something that will work with mutex_spin_on_owner and similar would be
> nice too. As far as I can tell, powerpc may want to yield to hypervisor
> when the owner's vcpu is scheduled off in that case too.
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC] Implement arch primitives for busywait loops
  2016-09-20 12:35     ` Christian Borntraeger
@ 2016-09-20 12:46       ` Nicholas Piggin
  0 siblings, 0 replies; 17+ messages in thread
From: Nicholas Piggin @ 2016-09-20 12:46 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: linux-arch, linuxppc-dev, linux-s390

On Tue, 20 Sep 2016 14:35:45 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 09/20/2016 02:27 PM, Nicholas Piggin wrote:
> > On Tue, 20 Sep 2016 13:19:30 +0200
> > Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> >   
> >> On 09/16/2016 10:57 AM, Nicholas Piggin wrote:  
> >>> Implementing busy wait loops with cpu_relax() in callers poses
> >>> some difficulties for powerpc.
> >>>
> >>> First, we want to put our SMT thread into a low priority mode for the
> >>> duration of the loop, but then return to normal priority after exiting
> >>> the loop.  Dependong on the CPU design, 'HMT_low() ; HMT_medium();' as
> >>> cpu_relax() does may have HMT_medium take effect before HMT_low made
> >>> any (or much) difference.
> >>>
> >>> Second, it can be beneficial for some implementations to spin on the
> >>> exit condition with a statically predicted-not-taken branch (i.e.,
> >>> always predict the loop will exit).
> >>>
> >>> This is a quick RFC with a couple of users converted to see what
> >>> people think. I don't use a C branch with hints, because we don't want
> >>> the compiler moving the loop body out of line, which makes it a bit
> >>> messy unfortunately. If there's a better way to do it, I'm all ears.
> >>>
> >>> I would not propose to switch all callers immediately, just some
> >>> core synchronisation primitives.    
> >> Just a FYA,
> >>
> >> On s390 we have a private version of cpu_relax that yields the cpu
> >> time slice back to the hypervisor via a hypercall.  
> > 
> > The powerpc guest also wants to yield to hypervisor in some busywait
> > situations.
> >   
> >> As this turned out
> >> to be problematic in some cases there is also now a cpu_relax_lowlatency.
> >>
> >> Now, this seems still problematic as there are too many places still 
> >> using cpu_relax instead of cpu_relax_lowlatency. So my plan is to do 
> >> a change of that, make cpu_relax just be a barrier and add a new 
> >> cpu_relax_yield that gives up the time slice. (so that s390 cpu_relax
> >> is just like any other cpu_relax)
> >>
> >> As far as I can tell the only place where I want to change cpu_relax
> >> to cpu_relax_lowlatency after that change is the stop machine run 
> >> code, so I hope to have no conflicts with your changes.  
> > 
> > I don't think there should be any conflicts, but it would be good to
> > make sure busy wait primitives can be usable by s390. So I can add
> > _yield variants that can do the right thing for s390.  
> 
> I was distracted by "more important work" (TM) but I will put you on
> CC when ready.
> > 
> > I need to think more about virtualization, so I'm glad you commented.
> > Powerpc would like to be told when a busywait loop knows the CPU it is
> > waiting for. So perhaps also a _yield_to_cpu variant as well.  
> 
> Yes, we also have 2 hypercalls: one that yields somehow and one that yields
> to a specific CPU. The latter is strongly preferred.

Okay, sounds good. I'll send out some updated patches soon too, so I'll
cc you on those. It would be good to come up with some basic guidelines
for when to use each variant too.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-09-20 12:46 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-16  8:57 [PATCH][RFC] Implement arch primitives for busywait loops Nicholas Piggin
2016-09-16 11:30 ` David Laight
2016-09-16 11:30   ` David Laight
2016-09-16 11:52   ` Nicholas Piggin
2016-09-16 11:57     ` David Laight
2016-09-16 11:57       ` David Laight
2016-09-16 12:06       ` Nicholas Piggin
2016-09-16 12:59         ` Nicholas Piggin
2016-09-19  5:05 ` Aneesh Kumar K.V
2016-09-19  5:05   ` Aneesh Kumar K.V
2016-09-19  5:05   ` Aneesh Kumar K.V
2016-09-19  7:45 ` Balbir Singh
2016-09-19  8:48   ` Nicholas Piggin
2016-09-20 11:19 ` Christian Borntraeger
2016-09-20 12:27   ` Nicholas Piggin
2016-09-20 12:35     ` Christian Borntraeger
2016-09-20 12:46       ` Nicholas Piggin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.