linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] MIPS: implement smp_cond_load_acquire() for Loongson-3
       [not found] <1529042858-9483-1-git-send-email-chenhc@lemote.com>
@ 2018-06-18 18:51 ` Paul Burton
  2018-06-19  6:40   ` 陈华才
  2018-06-19  7:17   ` Peter Zijlstra
  0 siblings, 2 replies; 8+ messages in thread
From: Paul Burton @ 2018-06-18 18:51 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Ralf Baechle, James Hogan, linux-mips, Fuxin Zhang, Zhangjin Wu,
	Huacai Chen, stable, Alan Stern, Andrea Parri, Will Deacon,
	Peter Zijlstra, Boqun Feng, Nicholas Piggin, David Howells,
	Jade Alglave, Luc Maranget, Paul E. McKenney, Akira Yokosawa,
	linux-kernel

Hi Huacai,

On Fri, Jun 15, 2018 at 02:07:38PM +0800, Huacai Chen wrote:
> After commit 7f56b58a92aaf2c ("locking/mcs: Use smp_cond_load_acquire()
> in MCS spin loop") Loongson-3 fails to boot. This is because Loongson-3
> has SFB (Store Fill Buffer) and READ_ONCE() may get an old value in a
> tight loop. So in smp_cond_load_acquire() we need a __smp_mb() after
> every READ_ONCE().

Thanks - modifying smp_cond_load_acquire() is a step better than
modifying arch_mcs_spin_lock_contended() to avoid it, but I'm still not
sure we've reached the root of the problem. If tight loops using
READ_ONCE() are at fault then what's special about
smp_cond_load_acquire()? Could other such loops not hit the same
problem?

Is the scenario you encounter the same as that outlined in the "DATA
DEPENDENCY BARRIERS (HISTORICAL)" section of
Documentation/memory-barriers.txt by any chance? If so then perhaps it
would be better to implement smp_read_barrier_depends(), or just raw
read_barrier_depends() depending upon how your SFB functions.

Is there any public documentation describing the behaviour of the store
fill buffer in Loongson-3?

Part of the problem is that I'm still not sure what's actually happening
in your system - it would be helpful to have further information in the
commit message about what actually happens. For example if you could
walk us through an example of the problem step by step in the style of
the diagrams you'll see in Documentation/memory-barriers.txt then I
think that would help us to see what the best solution here is.

I've copied the LKMM maintainers in case they have further input.

Thanks,
    Paul

> This patch introduce a Loongson-specific smp_cond_load_acquire(). And
> it should be backported to as early as linux-4.5, in which release the
> smp_cond_acquire() is introduced.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Huacai Chen <chenhc@lemote.com>
> ---
>  arch/mips/include/asm/barrier.h | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
> index a5eb1bb..4ea384d 100644
> --- a/arch/mips/include/asm/barrier.h
> +++ b/arch/mips/include/asm/barrier.h
> @@ -222,6 +222,23 @@
>  #define __smp_mb__before_atomic()	__smp_mb__before_llsc()
>  #define __smp_mb__after_atomic()	smp_llsc_mb()
>  
> +#ifdef CONFIG_CPU_LOONGSON3
> +/* Loongson-3 need a __smp_mb() after READ_ONCE() here */
> +#define smp_cond_load_acquire(ptr, cond_expr)			\
> +({								\
> +	typeof(ptr) __PTR = (ptr);				\
> +	typeof(*ptr) VAL;					\
> +	for (;;) {						\
> +		VAL = READ_ONCE(*__PTR);			\
> +		__smp_mb();					\
> +		if (cond_expr)					\
> +			break;					\
> +		cpu_relax();					\
> +	}							\
> +	VAL;							\
> +})
> +#endif	/* CONFIG_CPU_LOONGSON3 */
> +
>  #include <asm-generic/barrier.h>
>  
>  #endif /* __ASM_BARRIER_H */
> -- 
> 2.7.0
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] MIPS: implement smp_cond_load_acquire() for Loongson-3
  2018-06-18 18:51 ` [PATCH] MIPS: implement smp_cond_load_acquire() for Loongson-3 Paul Burton
@ 2018-06-19  6:40   ` 陈华才
  2018-06-19  7:22     ` Peter Zijlstra
  2018-06-19  7:17   ` Peter Zijlstra
  1 sibling, 1 reply; 8+ messages in thread
From: 陈华才 @ 2018-06-19  6:40 UTC (permalink / raw)
  To: Paul Burton
  Cc: Ralf Baechle, James Hogan, linux-mips, Fuxin Zhang, wuzhangjin,
	Huacai Chen, stable, Alan Stern, AndreaParri, Will Deacon,
	Peter Zijlstra, Boqun Feng, Nicholas Piggin, David Howells,
	Jade Alglave, Luc Maranget, Paul E. McKenney, Akira Yokosawa,
	linux-kernel

Hi, Paul,

First of all, could you please check why linux-mips reject e-mails from lemote.com? Of course I can send e-mails by gmail, but my gmail can't receive e-mails from linux-mips since March, 2018.

I have already read Documentation/memory-barriers.txt, but I don't think we should define a smp_read_barrier_depends() for Loongson-3. Because Loongson-3's behavior isn't like Alpha, and in syntax, this is not a data-dependent issue.

There is no document about Loongson-3's SFB. In my opinion, SFB looks like the L0 cache but sometimes it is out of cache-coherent machanism (L1 cache's cross-core coherency is maintained by hardware, but not always true for SFB). smp_mb() is needed for smp_cond_load_acquire(), but not every READ_ONCE().

Huacai
 
------------------ Original ------------------
From:  "Paul Burton"<paul.burton@mips.com>;
Date:  Tue, Jun 19, 2018 02:51 AM
To:  "Huacai Chen"<chenhc@lemote.com>;
Cc:  "Ralf Baechle"<ralf@linux-mips.org>; "James Hogan"<james.hogan@mips.com>; "linux-mips"<linux-mips@linux-mips.org>; "Fuxin Zhang"<zhangfx@lemote.com>; "wuzhangjin"<wuzhangjin@gmail.com>; "Huacai Chen"<chenhuacai@gmail.com>; "stable"<stable@vger.kernel.org>; "Alan Stern"<stern@rowland.harvard.edu>; "AndreaParri"<andrea.parri@amarulasolutions.com>; "Will Deacon"<will.deacon@arm.com>; "Peter Zijlstra"<peterz@infradead.org>; "Boqun Feng"<boqun.feng@gmail.com>; "Nicholas Piggin"<npiggin@gmail.com>; "David Howells"<dhowells@redhat.com>; "Jade Alglave"<j.alglave@ucl.ac.uk>; "Luc Maranget"<luc.maranget@inria.fr>; "Paul E. McKenney"<paulmck@linux.vnet.ibm.com>; "Akira Yokosawa"<akiyks@gmail.com>; "linux-kernel"<linux-kernel@vger.kernel.org>;
Subject:  Re: [PATCH] MIPS: implement smp_cond_load_acquire() for Loongson-3
 
Hi Huacai,

On Fri, Jun 15, 2018 at 02:07:38PM +0800, Huacai Chen wrote:
> After commit 7f56b58a92aaf2c ("locking/mcs: Use smp_cond_load_acquire()
> in MCS spin loop") Loongson-3 fails to boot. This is because Loongson-3
> has SFB (Store Fill Buffer) and READ_ONCE() may get an old value in a
> tight loop. So in smp_cond_load_acquire() we need a __smp_mb() after
> every READ_ONCE().

Thanks - modifying smp_cond_load_acquire() is a step better than
modifying arch_mcs_spin_lock_contended() to avoid it, but I'm still not
sure we've reached the root of the problem. If tight loops using
READ_ONCE() are at fault then what's special about
smp_cond_load_acquire()? Could other such loops not hit the same
problem?

Is the scenario you encounter the same as that outlined in the "DATA
DEPENDENCY BARRIERS (HISTORICAL)" section of
Documentation/memory-barriers.txt by any chance? If so then perhaps it
would be better to implement smp_read_barrier_depends(), or just raw
read_barrier_depends() depending upon how your SFB functions.

Is there any public documentation describing the behaviour of the store
fill buffer in Loongson-3?

Part of the problem is that I'm still not sure what's actually happening
in your system - it would be helpful to have further information in the
commit message about what actually happens. For example if you could
walk us through an example of the problem step by step in the style of
the diagrams you'll see in Documentation/memory-barriers.txt then I
think that would help us to see what the best solution here is.

I've copied the LKMM maintainers in case they have further input.

Thanks,
    Paul

> This patch introduce a Loongson-specific smp_cond_load_acquire(). And
> it should be backported to as early as linux-4.5, in which release the
> smp_cond_acquire() is introduced.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Huacai Chen <chenhc@lemote.com>
> ---
>  arch/mips/include/asm/barrier.h | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
> index a5eb1bb..4ea384d 100644
> --- a/arch/mips/include/asm/barrier.h
> +++ b/arch/mips/include/asm/barrier.h
> @@ -222,6 +222,23 @@
>  #define __smp_mb__before_atomic()	__smp_mb__before_llsc()
>  #define __smp_mb__after_atomic()	smp_llsc_mb()
>  
> +#ifdef CONFIG_CPU_LOONGSON3
> +/* Loongson-3 need a __smp_mb() after READ_ONCE() here */
> +#define smp_cond_load_acquire(ptr, cond_expr)	\
> +({	\
> +	typeof(ptr) __PTR = (ptr);	\
> +	typeof(*ptr) VAL;	\
> +	for (;;) {	\
> +	VAL = READ_ONCE(*__PTR);	\
> +	__smp_mb();	\
> +	if (cond_expr)	\
> +	break;	\
> +	cpu_relax();	\
> +	}	\
> +	VAL;	\
> +})
> +#endif	/* CONFIG_CPU_LOONGSON3 */
> +
>  #include <asm-generic/barrier.h>
>  
>  #endif /* __ASM_BARRIER_H */
> -- 
> 2.7.0
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] MIPS: implement smp_cond_load_acquire() for Loongson-3
  2018-06-18 18:51 ` [PATCH] MIPS: implement smp_cond_load_acquire() for Loongson-3 Paul Burton
  2018-06-19  6:40   ` 陈华才
@ 2018-06-19  7:17   ` Peter Zijlstra
  2018-06-19  8:52     ` Will Deacon
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2018-06-19  7:17 UTC (permalink / raw)
  To: Paul Burton
  Cc: Huacai Chen, Ralf Baechle, James Hogan, linux-mips, Fuxin Zhang,
	Zhangjin Wu, Huacai Chen, stable, Alan Stern, Andrea Parri,
	Will Deacon, Boqun Feng, Nicholas Piggin, David Howells,
	Jade Alglave, Luc Maranget, Paul E. McKenney, Akira Yokosawa,
	linux-kernel

On Mon, Jun 18, 2018 at 11:51:41AM -0700, Paul Burton wrote:
> Hi Huacai,
> 
> On Fri, Jun 15, 2018 at 02:07:38PM +0800, Huacai Chen wrote:
> > After commit 7f56b58a92aaf2c ("locking/mcs: Use smp_cond_load_acquire()
> > in MCS spin loop") Loongson-3 fails to boot. This is because Loongson-3
> > has SFB (Store Fill Buffer) and READ_ONCE() may get an old value in a
> > tight loop. So in smp_cond_load_acquire() we need a __smp_mb() after
> > every READ_ONCE().
> 
> Thanks - modifying smp_cond_load_acquire() is a step better than
> modifying arch_mcs_spin_lock_contended() to avoid it, but I'm still not
> sure we've reached the root of the problem. 

Agreed, this looks entirely dodgy.

> If tight loops using
> READ_ONCE() are at fault then what's special about
> smp_cond_load_acquire()? Could other such loops not hit the same
> problem?

Right again, Linux has a number of places where it relies on loops like
this.

	for (;;) {
		if (READ_ONCE(*ptr))
			break;

		cpu_relax();
	}

That is assumed to terminate -- provided the store to make *ptr != 0
happens of course.

And this has nothing to do with store buffers per se, sure store-buffers
might delay the store from being visible for a (little) while, but we
very much assume store buffers will not indefinitely hold on to data.

The proposed __smp_mb() doesn't make any kind of sense here. I presume
it's effect is to flush remote store buffers, but that is not something
LKMM allows for.

> Is the scenario you encounter the same as that outlined in the "DATA
> DEPENDENCY BARRIERS (HISTORICAL)" section of
> Documentation/memory-barriers.txt by any chance? If so then perhaps it
> would be better to implement smp_read_barrier_depends(), or just raw
> read_barrier_depends() depending upon how your SFB functions.

That doesn't make any sense, there is no actual data dependency here. We
load a single variable. Data dependencies are where you have (at least)
2 loads, where the second depends on the first, like:

	struct obj *obj = rcu_dereference(objp);
	int val = obj->val;

Here the load of val depends on the load of obj.

> Is there any public documentation describing the behaviour of the store
> fill buffer in Loongson-3?

I know of Store Buffers, but what exactly is a Store Fill Buffer? Better
would be to get a coherent memory model document on Loongson, because
this all smells horribly.

> Part of the problem is that I'm still not sure what's actually happening
> in your system - it would be helpful to have further information in the
> commit message about what actually happens. For example if you could
> walk us through an example of the problem step by step in the style of
> the diagrams you'll see in Documentation/memory-barriers.txt then I
> think that would help us to see what the best solution here is.
> 
> I've copied the LKMM maintainers in case they have further input.

Thanks, patches like proposed certainly require closer scrutiny and I
agree with you that this needs far better explanation.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] MIPS: implement smp_cond_load_acquire() for Loongson-3
  2018-06-19  6:40   ` 陈华才
@ 2018-06-19  7:22     ` Peter Zijlstra
  2018-06-20  3:31       ` 陈华才
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2018-06-19  7:22 UTC (permalink / raw)
  To: 陈华才
  Cc: Paul Burton, Ralf Baechle, James Hogan, linux-mips, Fuxin Zhang,
	wuzhangjin, Huacai Chen, stable, Alan Stern, AndreaParri,
	Will Deacon, Boqun Feng, Nicholas Piggin, David Howells,
	Jade Alglave, Luc Maranget, Paul E. McKenney, Akira Yokosawa,
	linux-kernel

On Tue, Jun 19, 2018 at 02:40:14PM +0800, 陈华才 wrote:
> Hi, Paul,
> 
> First of all, could you please check why linux-mips reject e-mails
> from lemote.com? Of course I can send e-mails by gmail, but my gmail
> can't receive e-mails from linux-mips since March, 2018.

Could you please learn to use email? No top posting and wrap lines at 78
chars.

> I have already read Documentation/memory-barriers.txt, but I don't
> think we should define a smp_read_barrier_depends() for Loongson-3.
> Because Loongson-3's behavior isn't like Alpha, and in syntax, this is
> not a data-dependent issue.

Agreed, this is not a data-dependency issue.

> There is no document about Loongson-3's SFB. In my opinion, SFB looks
> like the L0 cache but sometimes it is out of cache-coherent machanism
> (L1 cache's cross-core coherency is maintained by hardware, but not
> always true for SFB). smp_mb() is needed for smp_cond_load_acquire(),
> but not every READ_ONCE().

Linux does _NOT_ support non-coherent SMP. If your system is not fully
coherent, you're out of luck.

But please, explain in excruciating detail what exactly you need that
smp_mb for. If, like I posited in my previous email, it is to ensure
remote store buffer flushes, then your machine is terminally broken.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] MIPS: implement smp_cond_load_acquire() for Loongson-3
  2018-06-19  7:17   ` Peter Zijlstra
@ 2018-06-19  8:52     ` Will Deacon
  0 siblings, 0 replies; 8+ messages in thread
From: Will Deacon @ 2018-06-19  8:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul Burton, Huacai Chen, Ralf Baechle, James Hogan, linux-mips,
	Fuxin Zhang, Zhangjin Wu, Huacai Chen, stable, Alan Stern,
	Andrea Parri, Boqun Feng, Nicholas Piggin, David Howells,
	Jade Alglave, Luc Maranget, Paul E. McKenney, Akira Yokosawa,
	linux-kernel

Hi all,

On Tue, Jun 19, 2018 at 09:17:10AM +0200, Peter Zijlstra wrote:
> On Mon, Jun 18, 2018 at 11:51:41AM -0700, Paul Burton wrote:
> > On Fri, Jun 15, 2018 at 02:07:38PM +0800, Huacai Chen wrote:
> > > After commit 7f56b58a92aaf2c ("locking/mcs: Use smp_cond_load_acquire()
> > > in MCS spin loop") Loongson-3 fails to boot. This is because Loongson-3
> > > has SFB (Store Fill Buffer) and READ_ONCE() may get an old value in a
> > > tight loop. So in smp_cond_load_acquire() we need a __smp_mb() after
> > > every READ_ONCE().
> > 
> > Thanks - modifying smp_cond_load_acquire() is a step better than
> > modifying arch_mcs_spin_lock_contended() to avoid it, but I'm still not
> > sure we've reached the root of the problem. 
> 
> Agreed, this looks entirely dodgy.
> 
> > If tight loops using
> > READ_ONCE() are at fault then what's special about
> > smp_cond_load_acquire()? Could other such loops not hit the same
> > problem?
> 
> Right again, Linux has a number of places where it relies on loops like
> this.
> 
> 	for (;;) {
> 		if (READ_ONCE(*ptr))
> 			break;
> 
> 		cpu_relax();
> 	}
> 
> That is assumed to terminate -- provided the store to make *ptr != 0
> happens of course.
> 
> And this has nothing to do with store buffers per se, sure store-buffers
> might delay the store from being visible for a (little) while, but we
> very much assume store buffers will not indefinitely hold on to data.

We had an issue 8 years ago with the 11MPCore CPU where reads were
prioritised over writes, so code doing something like:

  WRITE_ONCE(*foo, 1);
  while (!READ_ONCE(*bar));

might never make the store to foo visible to other CPUs. This caused a
livelock in KGDB, where two CPUs were doing this on opposite variables
(i.e. the "SB" litmus test, but with the reads looping until they read
1).

See 534be1d5a2da ("ARM: 6194/1: change definition of cpu_relax() for
ARM11MPCore") for the ugly fix, assuming that the "Store Fill Buffer"
suffers from the same disease.

Will

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] MIPS: implement smp_cond_load_acquire() for Loongson-3
  2018-06-19  7:22     ` Peter Zijlstra
@ 2018-06-20  3:31       ` 陈华才
  2018-06-20  8:17         ` Will Deacon
  0 siblings, 1 reply; 8+ messages in thread
From: 陈华才 @ 2018-06-20  3:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul Burton, Ralf Baechle, James Hogan, linux-mips, Fuxin Zhang,
	wuzhangjin, Huacai Chen, stable, Alan Stern, AndreaParri,
	Will Deacon, Boqun Feng, Nicholas Piggin, David Howells,
	Jade Alglave, Luc Maranget, Paul E. McKenney, Akira Yokosawa,
	linux-kernel

Hi, Peter and Paul,

Loongson-3's Store Fill Buffer is nearly the same as your "Store Buffer", and it increases the memory ordering weakness. So, smp_cond_load_acquire() only need a __smp_mb() before the loop, not after every READ_ONCE(). In other word, the following code is just OK:

#define smp_cond_load_acquire(ptr, cond_expr)                   \
({                                                              \
        typeof(ptr) __PTR = (ptr);                              \
        typeof(*ptr) VAL;                                       \
        __smp_mb();                                     \
        for (;;) {                                              \
                VAL = READ_ONCE(*__PTR);                        \
                if (cond_expr)                                  \
                        break;                                  \
                cpu_relax();                                    \
        }                                                       \
        __smp_mb();                                     \
        VAL;                                                    \
})

the __smp_mb() before loop is used to avoid "reads prioritised over writes", which is caused by SFB's weak ordering and similar to ARM11MPCore (mentioned by Will Deacon).

Huacai

------------------ Original ------------------
From:  "Peter Zijlstra"<peterz@infradead.org>;
Date:  Tue, Jun 19, 2018 03:22 PM
To:  "陈华才"<chenhc@lemote.com>;
Cc:  "Paul Burton"<paul.burton@mips.com>; "Ralf Baechle"<ralf@linux-mips.org>; "James Hogan"<james.hogan@mips.com>; "linux-mips"<linux-mips@linux-mips.org>; "Fuxin Zhang"<zhangfx@lemote.com>; "wuzhangjin"<wuzhangjin@gmail.com>; "Huacai Chen"<chenhuacai@gmail.com>; "stable"<stable@vger.kernel.org>; "Alan Stern"<stern@rowland.harvard.edu>; "AndreaParri"<andrea.parri@amarulasolutions.com>; "Will Deacon"<will.deacon@arm.com>; "Boqun Feng"<boqun.feng@gmail.com>; "Nicholas Piggin"<npiggin@gmail.com>; "David Howells"<dhowells@redhat.com>; "Jade Alglave"<j.alglave@ucl.ac.uk>; "Luc Maranget"<luc.maranget@inria.fr>; "Paul E. McKenney"<paulmck@linux.vnet.ibm.com>; "Akira Yokosawa"<akiyks@gmail.com>; "linux-kernel"<linux-kernel@vger.kernel.org>;
Subject:  Re: [PATCH] MIPS: implement smp_cond_load_acquire() for Loongson-3
 
On Tue, Jun 19, 2018 at 02:40:14PM +0800, 陈华才 wrote:
> Hi, Paul,
> 
> First of all, could you please check why linux-mips reject e-mails
> from lemote.com? Of course I can send e-mails by gmail, but my gmail
> can't receive e-mails from linux-mips since March, 2018.

Could you please learn to use email? No top posting and wrap lines at 78
chars.

> I have already read Documentation/memory-barriers.txt, but I don't
> think we should define a smp_read_barrier_depends() for Loongson-3.
> Because Loongson-3's behavior isn't like Alpha, and in syntax, this is
> not a data-dependent issue.

Agreed, this is not a data-dependency issue.

> There is no document about Loongson-3's SFB. In my opinion, SFB looks
> like the L0 cache but sometimes it is out of cache-coherent machanism
> (L1 cache's cross-core coherency is maintained by hardware, but not
> always true for SFB). smp_mb() is needed for smp_cond_load_acquire(),
> but not every READ_ONCE().

Linux does _NOT_ support non-coherent SMP. If your system is not fully
coherent, you're out of luck.

But please, explain in excruciating detail what exactly you need that
smp_mb for. If, like I posited in my previous email, it is to ensure
remote store buffer flushes, then your machine is terminally broken.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] MIPS: implement smp_cond_load_acquire() for Loongson-3
  2018-06-20  3:31       ` 陈华才
@ 2018-06-20  8:17         ` Will Deacon
  2018-06-20  9:22           ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Will Deacon @ 2018-06-20  8:17 UTC (permalink / raw)
  To: 陈华才
  Cc: Peter Zijlstra, Paul Burton, Ralf Baechle, James Hogan,
	linux-mips, Fuxin Zhang, wuzhangjin, Huacai Chen, stable,
	Alan Stern, AndreaParri, Boqun Feng, Nicholas Piggin,
	David Howells, Jade Alglave, Luc Maranget, Paul E. McKenney,
	Akira Yokosawa, linux-kernel

On Wed, Jun 20, 2018 at 11:31:55AM +0800, 陈华才 wrote:
> Loongson-3's Store Fill Buffer is nearly the same as your "Store Buffer",
> and it increases the memory ordering weakness. So, smp_cond_load_acquire()
> only need a __smp_mb() before the loop, not after every READ_ONCE(). In
> other word, the following code is just OK:
> 
> #define smp_cond_load_acquire(ptr, cond_expr)                   \
> ({                                                              \
>         typeof(ptr) __PTR = (ptr);                              \
>         typeof(*ptr) VAL;                                       \
>         __smp_mb();                                     \
>         for (;;) {                                              \
>                 VAL = READ_ONCE(*__PTR);                        \
>                 if (cond_expr)                                  \
>                         break;                                  \
>                 cpu_relax();                                    \
>         }                                                       \
>         __smp_mb();                                     \
>         VAL;                                                    \
> })
> 
> the __smp_mb() before loop is used to avoid "reads prioritised over
> writes", which is caused by SFB's weak ordering and similar to ARM11MPCore
> (mentioned by Will Deacon).

Sure, but smp_cond_load_acquire() isn't the only place you'll see this sort
of pattern in the kernel. In other places, the only existing arch hook is
cpu_relax(), so unless you want to audit all loops and add a special
MIPs-specific smp_mb() to those that are affected, I think your only option
is to stick it in cpu_relax().

I assume you don't have a control register that can disable this
prioritisation in the SFB?

Will

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] MIPS: implement smp_cond_load_acquire() for Loongson-3
  2018-06-20  8:17         ` Will Deacon
@ 2018-06-20  9:22           ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2018-06-20  9:22 UTC (permalink / raw)
  To: Will Deacon
  Cc: 陈华才,
	Paul Burton, Ralf Baechle, James Hogan, linux-mips, Fuxin Zhang,
	wuzhangjin, Huacai Chen, stable, Alan Stern, AndreaParri,
	Boqun Feng, Nicholas Piggin, David Howells, Jade Alglave,
	Luc Maranget, Paul E. McKenney, Akira Yokosawa, linux-kernel

On Wed, Jun 20, 2018 at 09:17:16AM +0100, Will Deacon wrote:
> On Wed, Jun 20, 2018 at 11:31:55AM +0800, 陈华才 wrote:
> > Loongson-3's Store Fill Buffer is nearly the same as your "Store Buffer",
> > and it increases the memory ordering weakness. So, smp_cond_load_acquire()
> > only need a __smp_mb() before the loop, not after every READ_ONCE(). In
> > other word, the following code is just OK:
> > 
> > #define smp_cond_load_acquire(ptr, cond_expr)                   \
> > ({                                                              \
> >         typeof(ptr) __PTR = (ptr);                              \
> >         typeof(*ptr) VAL;                                       \
> >         __smp_mb();                                     \
> >         for (;;) {                                              \
> >                 VAL = READ_ONCE(*__PTR);                        \
> >                 if (cond_expr)                                  \
> >                         break;                                  \
> >                 cpu_relax();                                    \
> >         }                                                       \
> >         __smp_mb();                                     \
> >         VAL;                                                    \
> > })
> > 
> > the __smp_mb() before loop is used to avoid "reads prioritised over
> > writes", which is caused by SFB's weak ordering and similar to ARM11MPCore
> > (mentioned by Will Deacon).
> 
> Sure, but smp_cond_load_acquire() isn't the only place you'll see this sort
> of pattern in the kernel. In other places, the only existing arch hook is
> cpu_relax(), so unless you want to audit all loops and add a special
> MIPs-specific smp_mb() to those that are affected, I think your only option
> is to stick it in cpu_relax().
> 
> I assume you don't have a control register that can disable this
> prioritisation in the SFB?

Right, I think we also want to clarify that this 'feature' is not
supported by the Linux kernel in general and LKMM in specific.

It really is a CPU bug. And the cpu_relax() change is a best effort
work-around.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-06-20  9:46 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1529042858-9483-1-git-send-email-chenhc@lemote.com>
2018-06-18 18:51 ` [PATCH] MIPS: implement smp_cond_load_acquire() for Loongson-3 Paul Burton
2018-06-19  6:40   ` 陈华才
2018-06-19  7:22     ` Peter Zijlstra
2018-06-20  3:31       ` 陈华才
2018-06-20  8:17         ` Will Deacon
2018-06-20  9:22           ` Peter Zijlstra
2018-06-19  7:17   ` Peter Zijlstra
2018-06-19  8:52     ` Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).