linux-kernel.vger.kernel.org archive mirror
* [PATCH] timer: Fix possible issues with non serialized timer_pending( )
@ 2013-03-29 10:33 Vineet Gupta
  2013-04-03  7:20 ` Vineet Gupta
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Vineet Gupta @ 2013-03-29 10:33 UTC (permalink / raw)
  To: tglx; +Cc: Vineet Gupta, Christian Ruppert, Pierrick Hascoet, linux-kernel

When stress testing ARC Linux (3.9-rc3), we hit a serialization
issue where mod_timer() races with itself. This is on an FPGA board,
and the kernel .config, among other things, has !SMP and !PREEMPT_COUNT.

The issue happens in mod_timer() because the timer_pending() based early
exit check is NOT done inside the timer base spinlock - as a networking
optimization.

The value used there, timer->entry.next, is also used further down the
call chain (all inlined, though) for the actual list manipulation.
However, if the register containing this pointer remains live across
the spinlock (in a UP setup with !PREEMPT_COUNT there's nothing forcing
gcc to reload it), then a stale value of the next pointer causes
incorrect list manipulation, observed with the following sequence in
our tests.

(0). tv1[x] <----> t1 <---> t2
(1). mod_timer(t1) interrupted after it calls timer_pending()
(2). mod_timer(t2) completes
(3). mod_timer(t1) resumes but messes up the list.
(4). __run_timers() uses bogus timer_list entry / crashes in
     timer->function

The simplest fix is to NOT rely on the spinlock based compiler barrier,
but to add an explicit one in timer_pending().

FWIW, the relevant ARCompact disassembly of mod_timer which clearly
shows the issue due to register reuse is:

mod_timer:
    push_s blink
    mov_s r13,r0	# timer, timer

...
    ###### timer_pending( )
    ld_s r3,[r13]       # <------ <variable>.entry.next LOADED
    brne r3, 0, @.L163

.L163:
....
    ###### spin_lock_irq( )
    lr  r5, [status32]  # flags
    bic r4, r5, 6       # temp, flags,
    and.f 0, r5, 6      # flags,
    flag.nz r4

    ###### detach_if_pending( ) begins

    tst_s r3,r3  <--------------
			# timer_pending( ) checks timer->entry.next
                        # r3 is NOT reloaded by gcc, using stale value
    beq.d @.L169
    mov.eq r0,0

    #  detach_timer( ): __list_del( )

    ld r4,[r13,4]    	# <variable>.entry.prev, D.31439
    st r4,[r3,4]     	# <variable>.prev, D.31439
    st r3,[r4]       	# <variable>.next, D.30246

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Reported-by: Christian Ruppert <christian.ruppert@abilis.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Christian Ruppert <christian.ruppert@abilis.com>
Cc: Pierrick Hascoet <pierrick.hascoet@abilis.com>
Cc: linux-kernel@vger.kernel.org
---
 include/linux/timer.h |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197..1537104 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -168,7 +168,16 @@ static inline void init_timer_on_stack_key(struct timer_list *timer,
  */
 static inline int timer_pending(const struct timer_list * timer)
 {
-	return timer->entry.next != NULL;
+	int pending = timer->entry.next != NULL;
+
+	/*
+	 * The check above enables timer fast path - early exit.
+	 * However most of the call sites are not protected by timer->base
+	 * spinlock. If the caller (say mod_timer) races with itself, it
+	 * can use the stale "next" pointer. See commit log for details.
+	 */
+	barrier();
+	return pending;
 }
 
 extern void add_timer_on(struct timer_list *timer, int cpu);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] timer: Fix possible issues with non serialized timer_pending( )
  2013-03-29 10:33 [PATCH] timer: Fix possible issues with non serialized timer_pending( ) Vineet Gupta
@ 2013-04-03  7:20 ` Vineet Gupta
  2013-04-03  8:53 ` Christian Ruppert
  2013-04-03 12:36 ` Thomas Gleixner
  2 siblings, 0 replies; 29+ messages in thread
From: Vineet Gupta @ 2013-04-03  7:20 UTC (permalink / raw)
  To: tglx; +Cc: Christian Ruppert, Pierrick Hascoet, linux-kernel

Hi Thomas,

Did you get a chance to look at this one?
It fixes a real problem for the ARC platform - without it, my stress test
setup buckles in ~20 minutes.

Thx,
-Vineet

On 03/29/2013 04:03 PM, Vineet Gupta wrote:
> When stress testing ARC Linux (3.9-rc3), we hit a serialization
> issue where mod_timer() races with itself. This is on an FPGA board,
> and the kernel .config, among other things, has !SMP and !PREEMPT_COUNT.
>
> The issue happens in mod_timer() because the timer_pending() based early
> exit check is NOT done inside the timer base spinlock - as a networking
> optimization.
>
> The value used there, timer->entry.next, is also used further down the
> call chain (all inlined, though) for the actual list manipulation.
> However, if the register containing this pointer remains live across
> the spinlock (in a UP setup with !PREEMPT_COUNT there's nothing forcing
> gcc to reload it), then a stale value of the next pointer causes
> incorrect list manipulation, observed with the following sequence in
> our tests.
>
> (0). tv1[x] <----> t1 <---> t2
> (1). mod_timer(t1) interrupted after it calls timer_pending()
> (2). mod_timer(t2) completes
> (3). mod_timer(t1) resumes but messes up the list.
> (4). __run_timers() uses bogus timer_list entry / crashes in
>      timer->function
>
> The simplest fix is to NOT rely on the spinlock based compiler barrier,
> but to add an explicit one in timer_pending().
>
> FWIW, the relevant ARCompact disassembly of mod_timer which clearly
> shows the issue due to register reuse is:
>
> mod_timer:
>     push_s blink
>     mov_s r13,r0	# timer, timer
>
> ...
>     ###### timer_pending( )
>     ld_s r3,[r13]       # <------ <variable>.entry.next LOADED
>     brne r3, 0, @.L163
>
> .L163:
> ....
>     ###### spin_lock_irq( )
>     lr  r5, [status32]  # flags
>     bic r4, r5, 6       # temp, flags,
>     and.f 0, r5, 6      # flags,
>     flag.nz r4
>
>     ###### detach_if_pending( ) begins
>
>     tst_s r3,r3  <--------------
> 			# timer_pending( ) checks timer->entry.next
>                         # r3 is NOT reloaded by gcc, using stale value
>     beq.d @.L169
>     mov.eq r0,0
>
>     #  detach_timer( ): __list_del( )
>
>     ld r4,[r13,4]    	# <variable>.entry.prev, D.31439
>     st r4,[r3,4]     	# <variable>.prev, D.31439
>     st r3,[r4]       	# <variable>.next, D.30246
>
> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
> Reported-by: Christian Ruppert <christian.ruppert@abilis.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Christian Ruppert <christian.ruppert@abilis.com>
> Cc: Pierrick Hascoet <pierrick.hascoet@abilis.com>
> Cc: linux-kernel@vger.kernel.org
> ---
>  include/linux/timer.h |   11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/timer.h b/include/linux/timer.h
> index 8c5a197..1537104 100644
> --- a/include/linux/timer.h
> +++ b/include/linux/timer.h
> @@ -168,7 +168,16 @@ static inline void init_timer_on_stack_key(struct timer_list *timer,
>   */
>  static inline int timer_pending(const struct timer_list * timer)
>  {
> -	return timer->entry.next != NULL;
> +	int pending = timer->entry.next != NULL;
> +
> +	/*
> +	 * The check above enables timer fast path - early exit.
> +	 * However most of the call sites are not protected by timer->base
> +	 * spinlock. If the caller (say mod_timer) races with itself, it
> +	 * can use the stale "next" pointer. See commit log for details.
> +	 */
> +	barrier();
> +	return pending;
>  }
>  
>  extern void add_timer_on(struct timer_list *timer, int cpu);



* Re: [PATCH] timer: Fix possible issues with non serialized timer_pending( )
  2013-03-29 10:33 [PATCH] timer: Fix possible issues with non serialized timer_pending( ) Vineet Gupta
  2013-04-03  7:20 ` Vineet Gupta
@ 2013-04-03  8:53 ` Christian Ruppert
  2013-04-03 12:36 ` Thomas Gleixner
  2 siblings, 0 replies; 29+ messages in thread
From: Christian Ruppert @ 2013-04-03  8:53 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: tglx, Pierrick Hascoet, linux-kernel

We have tested the patch in several configurations and boards using
tests which previously crashed the kernel in less than an hour. The
crashes in question could not be reproduced in long term tests (3 days)
and nightly tests with the patch. I can thus confirm that this fixes a
real issue.

On Fri, Mar 29, 2013 at 04:03:38PM +0530, Vineet Gupta wrote:
> When stress testing ARC Linux (3.9-rc3), we hit a serialization
> issue where mod_timer() races with itself. This is on an FPGA board,
> and the kernel .config, among other things, has !SMP and !PREEMPT_COUNT.
> 
> The issue happens in mod_timer() because the timer_pending() based early
> exit check is NOT done inside the timer base spinlock - as a networking
> optimization.
> 
> The value used there, timer->entry.next, is also used further down the
> call chain (all inlined, though) for the actual list manipulation.
> However, if the register containing this pointer remains live across
> the spinlock (in a UP setup with !PREEMPT_COUNT there's nothing forcing
> gcc to reload it), then a stale value of the next pointer causes
> incorrect list manipulation, observed with the following sequence in
> our tests.
> 
> [...]
> 
> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
> Reported-by: Christian Ruppert <christian.ruppert@abilis.com>
Tested-by: Christian Ruppert <christian.ruppert@abilis.com>

> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Christian Ruppert <christian.ruppert@abilis.com>
> Cc: Pierrick Hascoet <pierrick.hascoet@abilis.com>
> Cc: linux-kernel@vger.kernel.org
> ---
>  include/linux/timer.h |   11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/timer.h b/include/linux/timer.h
> index 8c5a197..1537104 100644
> --- a/include/linux/timer.h
> +++ b/include/linux/timer.h
> @@ -168,7 +168,16 @@ static inline void init_timer_on_stack_key(struct timer_list *timer,
>   */
>  static inline int timer_pending(const struct timer_list * timer)
>  {
> -	return timer->entry.next != NULL;
> +	int pending = timer->entry.next != NULL;
> +
> +	/*
> +	 * The check above enables timer fast path - early exit.
> +	 * However most of the call sites are not protected by timer->base
> +	 * spinlock. If the caller (say mod_timer) races with itself, it
> +	 * can use the stale "next" pointer. See commit log for details.
> +	 */
> +	barrier();
> +	return pending;
>  }
>  
>  extern void add_timer_on(struct timer_list *timer, int cpu);
> -- 
> 1.7.10.4
> 

-- 
  Christian Ruppert              ,          <christian.ruppert@abilis.com>
                                /|
  Tel: +41/(0)22 816 19-42     //|                 3, Chemin du Pré-Fleuri
                             _// | bilis Systems   CH-1228 Plan-les-Ouates


* Re: [PATCH] timer: Fix possible issues with non serialized timer_pending( )
  2013-03-29 10:33 [PATCH] timer: Fix possible issues with non serialized timer_pending( ) Vineet Gupta
  2013-04-03  7:20 ` Vineet Gupta
  2013-04-03  8:53 ` Christian Ruppert
@ 2013-04-03 12:36 ` Thomas Gleixner
  2013-04-03 13:03   ` Christian Ruppert
  2013-04-03 14:11   ` [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT Vineet Gupta
  2 siblings, 2 replies; 29+ messages in thread
From: Thomas Gleixner @ 2013-04-03 12:36 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Christian Ruppert, Pierrick Hascoet, LKML, Peter Zijlstra, Ingo Molnar

Vineet,

On Fri, 29 Mar 2013, Vineet Gupta wrote:

> When stress testing ARC Linux (3.9-rc3), we hit a serialization
> issue where mod_timer() races with itself. This is on an FPGA board,
> and the kernel .config, among other things, has !SMP and !PREEMPT_COUNT.
> 
> The issue happens in mod_timer() because the timer_pending() based early
> exit check is NOT done inside the timer base spinlock - as a networking
> optimization.
> 
> The value used there, timer->entry.next, is also used further down the
> call chain (all inlined, though) for the actual list manipulation.
> However, if the register containing this pointer remains live across
> the spinlock (in a UP setup with !PREEMPT_COUNT there's nothing forcing
> gcc to reload it), then a stale value of the next pointer causes
> incorrect list manipulation, observed with the following sequence in
> our tests.
> 
> (0). tv1[x] <----> t1 <---> t2
> (1). mod_timer(t1) interrupted after it calls timer_pending()
> (2). mod_timer(t2) completes
> (3). mod_timer(t1) resumes but messes up the list.
> (4). __run_timers() uses bogus timer_list entry / crashes in
>      timer->function
> 
> The simplest fix is to NOT rely on the spinlock based compiler barrier,
> but to add an explicit one in timer_pending().

That's simple, but dangerous. There is other code which relies on the
implicit barriers of spinlocks, so I think we need to add the barrier
to the !PREEMPT_COUNT implementation of preempt_*() macros.

Thanks,

	tglx

> FWIW, the relevant ARCompact disassembly of mod_timer which clearly
> shows the issue due to register reuse is:
> 
> mod_timer:
>     push_s blink
>     mov_s r13,r0	# timer, timer
> 
> ...
>     ###### timer_pending( )
>     ld_s r3,[r13]       # <------ <variable>.entry.next LOADED
>     brne r3, 0, @.L163
> 
> .L163:
> ....
>     ###### spin_lock_irq( )
>     lr  r5, [status32]  # flags
>     bic r4, r5, 6       # temp, flags,
>     and.f 0, r5, 6      # flags,
>     flag.nz r4
> 
>     ###### detach_if_pending( ) begins
> 
>     tst_s r3,r3  <--------------
> 			# timer_pending( ) checks timer->entry.next
>                         # r3 is NOT reloaded by gcc, using stale value
>     beq.d @.L169
>     mov.eq r0,0
> 
>     #  detach_timer( ): __list_del( )
> 
>     ld r4,[r13,4]    	# <variable>.entry.prev, D.31439
>     st r4,[r3,4]     	# <variable>.prev, D.31439
>     st r3,[r4]       	# <variable>.next, D.30246
> 
> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
> Reported-by: Christian Ruppert <christian.ruppert@abilis.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Christian Ruppert <christian.ruppert@abilis.com>
> Cc: Pierrick Hascoet <pierrick.hascoet@abilis.com>
> Cc: linux-kernel@vger.kernel.org
> ---
>  include/linux/timer.h |   11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/timer.h b/include/linux/timer.h
> index 8c5a197..1537104 100644
> --- a/include/linux/timer.h
> +++ b/include/linux/timer.h
> @@ -168,7 +168,16 @@ static inline void init_timer_on_stack_key(struct timer_list *timer,
>   */
>  static inline int timer_pending(const struct timer_list * timer)
>  {
> -	return timer->entry.next != NULL;
> +	int pending = timer->entry.next != NULL;
> +
> +	/*
> +	 * The check above enables timer fast path - early exit.
> +	 * However most of the call sites are not protected by timer->base
> +	 * spinlock. If the caller (say mod_timer) races with itself, it
> +	 * can use the stale "next" pointer. See commit log for details.
> +	 */
> +	barrier();
> +	return pending;
>  }
>  
>  extern void add_timer_on(struct timer_list *timer, int cpu);
> -- 
> 1.7.10.4
> 
> 


* Re: [PATCH] timer: Fix possible issues with non serialized timer_pending( )
  2013-04-03 12:36 ` Thomas Gleixner
@ 2013-04-03 13:03   ` Christian Ruppert
  2013-04-03 13:10     ` [RFC] Add implicit barriers to irqsave/restore class of functions Christian Ruppert
  2013-04-03 14:11   ` [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT Vineet Gupta
  1 sibling, 1 reply; 29+ messages in thread
From: Christian Ruppert @ 2013-04-03 13:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vineet Gupta, Pierrick Hascoet, LKML, Peter Zijlstra, Ingo Molnar

Hello Vineet,

Below is a patch for a seemingly unrelated problem (RB-trees in the
hrtimer implementation) which seems to fix the original problem at the
same time. For the moment, this patch is insufficiently tested, but I am
posting it here (preliminarily) since it seems to be in line with
Thomas' comment: the implementations of irqsave/restore in other
architectures (MIPS, ARM) imply memory barriers, and adding this
implicit barrier to the ARC code as well seems to stabilise some of our
remaining crashes.

Greetings,
  Christian

On Wed, Apr 03, 2013 at 02:36:59PM +0200, Thomas Gleixner wrote:
> Vineet,
> 
> On Fri, 29 Mar 2013, Vineet Gupta wrote:
> 
> > When stress testing ARC Linux (3.9-rc3), we hit a serialization
> > issue where mod_timer() races with itself. This is on an FPGA board,
> > and the kernel .config, among other things, has !SMP and !PREEMPT_COUNT.
> > 
> > The issue happens in mod_timer() because the timer_pending() based early
> > exit check is NOT done inside the timer base spinlock - as a networking
> > optimization.
> > 
> > The value used there, timer->entry.next, is also used further down the
> > call chain (all inlined, though) for the actual list manipulation.
> > However, if the register containing this pointer remains live across
> > the spinlock (in a UP setup with !PREEMPT_COUNT there's nothing forcing
> > gcc to reload it), then a stale value of the next pointer causes
> > incorrect list manipulation, observed with the following sequence in
> > our tests.
> > 
> > (0). tv1[x] <----> t1 <---> t2
> > (1). mod_timer(t1) interrupted after it calls timer_pending()
> > (2). mod_timer(t2) completes
> > (3). mod_timer(t1) resumes but messes up the list.
> > (4). __run_timers() uses bogus timer_list entry / crashes in
> >      timer->function
> > 
> > The simplest fix is to NOT rely on the spinlock based compiler barrier,
> > but to add an explicit one in timer_pending().
> 
> That's simple, but dangerous. There is other code which relies on the
> implicit barriers of spinlocks, so I think we need to add the barrier
> to the !PREEMPT_COUNT implementation of preempt_*() macros.
> 
> Thanks,
> 
> 	tglx
> 
> > FWIW, the relevant ARCompact disassembly of mod_timer which clearly
> > shows the issue due to register reuse is:
> > 
> > mod_timer:
> >     push_s blink
> >     mov_s r13,r0	# timer, timer
> > 
> > ...
> >     ###### timer_pending( )
> >     ld_s r3,[r13]       # <------ <variable>.entry.next LOADED
> >     brne r3, 0, @.L163
> > 
> > .L163:
> > ....
> >     ###### spin_lock_irq( )
> >     lr  r5, [status32]  # flags
> >     bic r4, r5, 6       # temp, flags,
> >     and.f 0, r5, 6      # flags,
> >     flag.nz r4
> > 
> >     ###### detach_if_pending( ) begins
> > 
> >     tst_s r3,r3  <--------------
> > 			# timer_pending( ) checks timer->entry.next
> >                         # r3 is NOT reloaded by gcc, using stale value
> >     beq.d @.L169
> >     mov.eq r0,0
> > 
> >     #  detach_timer( ): __list_del( )
> > 
> >     ld r4,[r13,4]    	# <variable>.entry.prev, D.31439
> >     st r4,[r3,4]     	# <variable>.prev, D.31439
> >     st r3,[r4]       	# <variable>.next, D.30246
> > 
> > Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
> > Reported-by: Christian Ruppert <christian.ruppert@abilis.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Christian Ruppert <christian.ruppert@abilis.com>
> > Cc: Pierrick Hascoet <pierrick.hascoet@abilis.com>
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >  include/linux/timer.h |   11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/timer.h b/include/linux/timer.h
> > index 8c5a197..1537104 100644
> > --- a/include/linux/timer.h
> > +++ b/include/linux/timer.h
> > @@ -168,7 +168,16 @@ static inline void init_timer_on_stack_key(struct timer_list *timer,
> >   */
> >  static inline int timer_pending(const struct timer_list * timer)
> >  {
> > -	return timer->entry.next != NULL;
> > +	int pending = timer->entry.next != NULL;
> > +
> > +	/*
> > +	 * The check above enables timer fast path - early exit.
> > +	 * However most of the call sites are not protected by timer->base
> > +	 * spinlock. If the caller (say mod_timer) races with itself, it
> > +	 * can use the stale "next" pointer. See commit log for details.
> > +	 */
> > +	barrier();
> > +	return pending;
> >  }
> >  
> >  extern void add_timer_on(struct timer_list *timer, int cpu);
> > -- 
> > 1.7.10.4
> > 
> > 

-- 
  Christian Ruppert              ,          <christian.ruppert@abilis.com>
                                /|
  Tel: +41/(0)22 816 19-42     //|                 3, Chemin du Pré-Fleuri
                             _// | bilis Systems   CH-1228 Plan-les-Ouates


* [RFC] Add implicit barriers to irqsave/restore class of functions
  2013-04-03 13:03   ` Christian Ruppert
@ 2013-04-03 13:10     ` Christian Ruppert
  2013-04-03 13:29       ` Vineet Gupta
  2013-04-04 16:13       ` Peter Zijlstra
  0 siblings, 2 replies; 29+ messages in thread
From: Christian Ruppert @ 2013-04-03 13:10 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Thomas Gleixner, Pierrick Hascoet, LKML, Peter Zijlstra,
	Ingo Molnar, Christian Ruppert

This patch adds implicit memory barriers to the irqsave/restore functions
of the ARC architecture port, in line with what is done in other
architectures.

It seems to fix several seemingly unrelated issues on our platform, but
for the moment it is insufficiently tested (and might even be incomplete).
Please comment.

Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
---
 arch/arc/include/asm/irqflags.h |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arc/include/asm/irqflags.h b/arch/arc/include/asm/irqflags.h
index ccd8480..c8147d1 100644
--- a/arch/arc/include/asm/irqflags.h
+++ b/arch/arc/include/asm/irqflags.h
@@ -39,7 +39,7 @@ static inline long arch_local_irq_save(void)
 	"	flag.nz %0		\n"
 	: "=r"(temp), "=r"(flags)
 	: "n"((STATUS_E1_MASK | STATUS_E2_MASK))
-	: "cc");
+	: "memory", "cc");
 
 	return flags;
 }
@@ -53,7 +53,7 @@ static inline void arch_local_irq_restore(unsigned long flags)
 	__asm__ __volatile__(
 	"	flag %0			\n"
 	:
-	: "r"(flags));
+	: "r"(flags) : "memory");
 }
 
 /*
@@ -73,7 +73,7 @@ static inline void arch_local_irq_disable(void)
 	"	and %0, %0, %1		\n"
 	"	flag %0			\n"
 	: "=&r"(temp)
-	: "n"(~(STATUS_E1_MASK | STATUS_E2_MASK)));
+	: "n"(~(STATUS_E1_MASK | STATUS_E2_MASK)) : "memory");
 }
 
 /*
@@ -85,7 +85,7 @@ static inline long arch_local_save_flags(void)
 
 	__asm__ __volatile__(
 	"	lr  %0, [status32]	\n"
-	: "=&r"(temp));
+	: "=&r"(temp) : : "memory");
 
 	return temp;
 }
-- 
1.7.1



* Re: [RFC] Add implicit barriers to irqsave/restore class of functions
  2013-04-03 13:10     ` [RFC] Add implicit barriers to irqsave/restore class of functions Christian Ruppert
@ 2013-04-03 13:29       ` Vineet Gupta
  2013-04-04  8:26         ` Christian Ruppert
  2013-04-04 16:13       ` Peter Zijlstra
  1 sibling, 1 reply; 29+ messages in thread
From: Vineet Gupta @ 2013-04-03 13:29 UTC (permalink / raw)
  To: Christian Ruppert
  Cc: Thomas Gleixner, Pierrick Hascoet, LKML, Peter Zijlstra, Ingo Molnar

Hi Christian,

On 04/03/2013 06:40 PM, Christian Ruppert wrote:
> This patch adds implicit memory barriers to the irqsave/restore functions
> of the ARC architecture port, in line with what is done in other
> architectures.
>
> It seems to fix several seemingly unrelated issues on our platform, but
> for the moment it is insufficiently tested (and might even be incomplete).
> Please comment.

I think this is not needed. Semantically, we need the barrier for
spinlocks, not irq save/restore - although spinlocks are one of the
primary users of the irq save/restore API.

So, repeating what Thomas already said: the barrier already exists for
PREEMPT_COUNT; we need to make sure it is present for !PREEMPT_COUNT.

-Vineet

> Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
> ---
>  arch/arc/include/asm/irqflags.h |    8 ++++----
>  1 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arc/include/asm/irqflags.h b/arch/arc/include/asm/irqflags.h
> index ccd8480..c8147d1 100644
> --- a/arch/arc/include/asm/irqflags.h
> +++ b/arch/arc/include/asm/irqflags.h
> @@ -39,7 +39,7 @@ static inline long arch_local_irq_save(void)
>  	"	flag.nz %0		\n"
>  	: "=r"(temp), "=r"(flags)
>  	: "n"((STATUS_E1_MASK | STATUS_E2_MASK))
> -	: "cc");
> +	: "memory", "cc");
>  
>  	return flags;
>  }
> @@ -53,7 +53,7 @@ static inline void arch_local_irq_restore(unsigned long flags)
>  	__asm__ __volatile__(
>  	"	flag %0			\n"
>  	:
> -	: "r"(flags));
> +	: "r"(flags) : "memory");
>  }
>  
>  /*
> @@ -73,7 +73,7 @@ static inline void arch_local_irq_disable(void)
>  	"	and %0, %0, %1		\n"
>  	"	flag %0			\n"
>  	: "=&r"(temp)
> -	: "n"(~(STATUS_E1_MASK | STATUS_E2_MASK)));
> +	: "n"(~(STATUS_E1_MASK | STATUS_E2_MASK)) : "memory");
>  }
>  
>  /*
> @@ -85,7 +85,7 @@ static inline long arch_local_save_flags(void)
>  
>  	__asm__ __volatile__(
>  	"	lr  %0, [status32]	\n"
> -	: "=&r"(temp));
> +	: "=&r"(temp) : : "memory");
>  
>  	return temp;
>  }



* [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT
  2013-04-03 12:36 ` Thomas Gleixner
  2013-04-03 13:03   ` Christian Ruppert
@ 2013-04-03 14:11   ` Vineet Gupta
  2013-04-04 15:28     ` Christian Ruppert
  1 sibling, 1 reply; 29+ messages in thread
From: Vineet Gupta @ 2013-04-03 14:11 UTC (permalink / raw)
  To: tglx
  Cc: Vineet Gupta, Christian Ruppert, Pierrick Hascoet, Robert Love,
	kpreempt-tech, Frederic Weisbecker, Steven Rostedt,
	Peter Zijlstra, Ingo Molnar, linux-kernel

Spinlocks built in a !PREEMPT_COUNT config don't have the compiler
barrier provided by the preempt_* routines. This can break a lot of code
which relies on barrier semantics.

This manifested as random crashes in the timer code when stress testing
ARC Linux (3.9-rc3): !SMP && !PREEMPT_COUNT

Here's the exact sequence which caused this:
(0). tv1[x] <----> t1 <---> t2
(1). mod_timer(t1) interrupted after it calls timer_pending()
(2). mod_timer(t2) completes
(3). mod_timer(t1) resumes but messes up the list.
(4). __run_timers() uses bogus timer_list entry / crashes in
     timer->function

When mod_timer() races against itself, the spinlock rightly serializes
the tv1[] timer linked list; however, timer_pending(), called outside
the spinlock, accesses the timer's list element, cached in a register.
With low register pressure (and a deep register file), there's nothing
forcing gcc to reload the element across the spinlock, leaving a stale
value in the register in case of a race - and ensuing list corruption.

And the ARCompact disassembly which shows the culprit generated code:

mod_timer:
    push_s blink
    mov_s r13,r0	# timer, timer

..
    ###### timer_pending( )
    ld_s r3,[r13]       # <------ <variable>.entry.next LOADED
    brne r3, 0, @.L163

.L163:
..
    ###### spin_lock_irq( )
    lr  r5, [status32]  # flags
    bic r4, r5, 6       # temp, flags,
    and.f 0, r5, 6      # flags,
    flag.nz r4

    ###### detach_if_pending( ) begins

    tst_s r3,r3  <--------------
			# timer_pending( ) checks timer->entry.next
                        # r3 is NOT reloaded by gcc, using stale value
    beq.d @.L169
    mov.eq r0,0

    #####  detach_timer( ): __list_del( )

    ld r4,[r13,4]    	# <variable>.entry.prev, D.31439
    st r4,[r3,4]     	# <variable>.prev, D.31439

    st r3,[r4]       	# <variable>.next, D.30246

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Reported-by: Christian Ruppert <christian.ruppert@abilis.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Christian Ruppert <christian.ruppert@abilis.com>
Cc: Pierrick Hascoet <pierrick.hascoet@abilis.com>
Cc: Robert Love <rml@tech9.net>
Cc: kpreempt-tech@lists.sourceforge.net
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
---
 include/linux/preempt.h |   21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 5a710b9..354d6e3 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -93,14 +93,19 @@ do { \
 
 #else /* !CONFIG_PREEMPT_COUNT */
 
-#define preempt_disable()		do { } while (0)
-#define sched_preempt_enable_no_resched()	do { } while (0)
-#define preempt_enable_no_resched()	do { } while (0)
-#define preempt_enable()		do { } while (0)
-
-#define preempt_disable_notrace()		do { } while (0)
-#define preempt_enable_no_resched_notrace()	do { } while (0)
-#define preempt_enable_notrace()		do { } while (0)
+/*
+ * compiler barrier needed to ensure that spinlocks provide the barrier
+ * semantics despite !CONFIG_PREEMPT_COUNT.
+ * See commit log for actual bug which forced this change
+ */
+#define preempt_disable()			do { barrier(); } while (0)
+#define sched_preempt_enable_no_resched()	do { barrier(); } while (0)
+#define preempt_enable_no_resched()		do { barrier(); } while (0)
+#define preempt_enable()			do { barrier(); } while (0)
+
+#define preempt_disable_notrace()		do { barrier(); } while (0)
+#define preempt_enable_no_resched_notrace()	do { barrier(); } while (0)
+#define preempt_enable_notrace()		do { barrier(); } while (0)
 
 #endif /* CONFIG_PREEMPT_COUNT */
 
-- 
1.7.10.4



* Re: [RFC] Add implicit barriers to irqsave/restore class of functions
  2013-04-03 13:29       ` Vineet Gupta
@ 2013-04-04  8:26         ` Christian Ruppert
  0 siblings, 0 replies; 29+ messages in thread
From: Christian Ruppert @ 2013-04-04  8:26 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Thomas Gleixner, Pierrick Hascoet, LKML, Peter Zijlstra, Ingo Molnar

Hi Vineet,

Just a short message to inform you that the test campaign on our RFC
patch has run through successfully and we consider the patch stable
enough for release to our customers. The main reason the barriers were
added in this particular place is that other architectures do the
same.

I agree that the patch adding barriers in preempt_* is more in line with
Thomas' comment than our RFC (which predates that comment) and we have
also started a test campaign for that patch yesterday. I'll give an update
directly in reply to the patch when that campaign has run through.

Greetings,
  Christian

On Wed, Apr 03, 2013 at 06:59:36PM +0530, Vineet Gupta wrote:
> Hi Christian,
> 
> On 04/03/2013 06:40 PM, Christian Ruppert wrote:
> > This patch adds implicit memory barriers to the irqsave/restore functions
> > of the ARC architecture port, in line with what is done in other
> > architectures.
> >
> > It seems to fix several seemingly unrelated issues on our platform, but
> > for the moment it is insufficiently tested (and might even be incomplete).
> > Please comment.
> 
> I think this is not needed. Semantically, we need the barrier for
> spinlocks, not irq save/restore - although spinlocks are one of the
> primary users of the irq save/restore API.
> 
> So, repeating what Thomas already said: the barrier already exists for
> PREEMPT_COUNT; we need to make sure it is present for !PREEMPT_COUNT.
> 
> -Vineet
> 
> > Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
> > ---
> >  arch/arc/include/asm/irqflags.h |    8 ++++----
> >  1 files changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arc/include/asm/irqflags.h b/arch/arc/include/asm/irqflags.h
> > index ccd8480..c8147d1 100644
> > --- a/arch/arc/include/asm/irqflags.h
> > +++ b/arch/arc/include/asm/irqflags.h
> > @@ -39,7 +39,7 @@ static inline long arch_local_irq_save(void)
> >  	"	flag.nz %0		\n"
> >  	: "=r"(temp), "=r"(flags)
> >  	: "n"((STATUS_E1_MASK | STATUS_E2_MASK))
> > -	: "cc");
> > +	: "memory", "cc");
> >  
> >  	return flags;
> >  }
> > @@ -53,7 +53,7 @@ static inline void arch_local_irq_restore(unsigned long flags)
> >  	__asm__ __volatile__(
> >  	"	flag %0			\n"
> >  	:
> > -	: "r"(flags));
> > +	: "r"(flags) : "memory");
> >  }
> >  
> >  /*
> > @@ -73,7 +73,7 @@ static inline void arch_local_irq_disable(void)
> >  	"	and %0, %0, %1		\n"
> >  	"	flag %0			\n"
> >  	: "=&r"(temp)
> > -	: "n"(~(STATUS_E1_MASK | STATUS_E2_MASK)));
> > +	: "n"(~(STATUS_E1_MASK | STATUS_E2_MASK)) : "memory");
> >  }
> >  
> >  /*
> > @@ -85,7 +85,7 @@ static inline long arch_local_save_flags(void)
> >  
> >  	__asm__ __volatile__(
> >  	"	lr  %0, [status32]	\n"
> > -	: "=&r"(temp));
> > +	: "=&r"(temp) : : "memory");
> >  
> >  	return temp;
> >  }
> 

-- 
  Christian Ruppert              ,          <christian.ruppert@abilis.com>
                                /|
  Tel: +41/(0)22 816 19-42     //|                 3, Chemin du Pré-Fleuri
                             _// | bilis Systems   CH-1228 Plan-les-Ouates

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT
  2013-04-03 14:11   ` [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT Vineet Gupta
@ 2013-04-04 15:28     ` Christian Ruppert
  2013-04-05  4:36       ` Vineet Gupta
  0 siblings, 1 reply; 29+ messages in thread
From: Christian Ruppert @ 2013-04-04 15:28 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: tglx, Pierrick Hascoet, Robert Love, kpreempt-tech,
	Frederic Weisbecker, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	linux-kernel

Hi Vineet,

Our stress testing campaign has just successfully completed on this
patch. It seems to solve several issues we have seen in unpatched
versions, amongst others the original timer issue, a crash in hrtimer
rb-tree manipulation etc.

Greetings,
  Christian

On Wed, Apr 03, 2013 at 07:41:22PM +0530, Vineet Gupta wrote:
> spinlocks built in a !PREEMPT_COUNT config don't have the compiler
> barrier provided by preempt_* routines. This can break lot of code which
> relies on barrier semantics.
> 
> This manifested as random crashes in timer code when stress testing
> ARC Linux (3.9-rc3): !SMP && !PREEMPT_COUNT
> 
> [...]
> 
> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
> Reported-by: Christian Ruppert <christian.ruppert@abilis.com>
Tested-by: Christian Ruppert <christian.ruppert@abilis.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Christian Ruppert <christian.ruppert@abilis.com>
> Cc: Pierrick Hascoet <pierrick.hascoet@abilis.com>
> Cc: Robert Love <rml@tech9.net>
> Cc: kpreempt-tech@lists.sourceforge.net
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Steven Rostedt <srostedt@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: linux-kernel@vger.kernel.org
> [...]
-- 
  Christian Ruppert              ,          <christian.ruppert@abilis.com>
                                /|
  Tel: +41/(0)22 816 19-42     //|                 3, Chemin du Pré-Fleuri
                             _// | bilis Systems   CH-1228 Plan-les-Ouates

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Add implicit barriers to irqsave/restore class of functions
  2013-04-03 13:10     ` [RFC] Add implicit barriers to irqsave/restore class of functions Christian Ruppert
  2013-04-03 13:29       ` Vineet Gupta
@ 2013-04-04 16:13       ` Peter Zijlstra
  2013-04-05  4:27         ` Vineet Gupta
  1 sibling, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2013-04-04 16:13 UTC (permalink / raw)
  To: Christian Ruppert
  Cc: Vineet Gupta, Thomas Gleixner, Pierrick Hascoet, LKML, Ingo Molnar

On Wed, 2013-04-03 at 15:10 +0200, Christian Ruppert wrote:
> This patch adds implicit memory barriers to irqsave/restore functions
> of
> the ARC architecture port in line with what is done in other
> architectures.

> diff --git a/arch/arc/include/asm/irqflags.h
> b/arch/arc/include/asm/irqflags.h
> index ccd8480..c8147d1 100644
> --- a/arch/arc/include/asm/irqflags.h
> +++ b/arch/arc/include/asm/irqflags.h
> @@ -39,7 +39,7 @@ static inline long arch_local_irq_save(void)
>         "       flag.nz %0              \n"
>         : "=r"(temp), "=r"(flags)
>         : "n"((STATUS_E1_MASK | STATUS_E2_MASK))
> -       : "cc");
> +       : "memory", "cc");

That's not a memory barrier, that's a memory clobber, aka a compiler
barrier.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Add implicit barriers to irqsave/restore class of functions
  2013-04-04 16:13       ` Peter Zijlstra
@ 2013-04-05  4:27         ` Vineet Gupta
  0 siblings, 0 replies; 29+ messages in thread
From: Vineet Gupta @ 2013-04-05  4:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christian Ruppert, Thomas Gleixner, Pierrick Hascoet, LKML, Ingo Molnar

Hi Peter,

On 04/04/2013 09:43 PM, Peter Zijlstra wrote:
> -       : "cc");
> +       : "memory", "cc");
> That's not a memory barrier, that's a memory clobber, aka a compiler
> barrier.

For the problem under consideration we indeed want a compiler barrier, because the
error shows up due to a stale register which stays live across a spinlock in a
!PREEMPT_COUNT config.

However, IMO doing this in the irq save/restore macros is semantically incorrect,
since those macros might be used in places which don't need the compiler
register-reload semantics. Further, per tglx's suggestion, fixing the preempt_*
macros for the !PREEMPT_COUNT case would fix this independent of what the arch does.

A patch to that effect was already posted to lists:
http://www.spinics.net/lists/kernel/msg1510885.html
Please let us know what you think.

Thx,
-Vineet



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT
  2013-04-04 15:28     ` Christian Ruppert
@ 2013-04-05  4:36       ` Vineet Gupta
  2013-04-06 13:34         ` Vineet Gupta
  0 siblings, 1 reply; 29+ messages in thread
From: Vineet Gupta @ 2013-04-05  4:36 UTC (permalink / raw)
  To: tglx
  Cc: Christian Ruppert, Pierrick Hascoet, Frederic Weisbecker,
	Steven Rostedt, Peter Zijlstra, Ingo Molnar, linux-kernel

Hi Thomas,

On 04/04/2013 08:58 PM, Christian Ruppert wrote:
> Hi Vineet,
>
> Our stress testing campaign has just successfully completed on this
> patch. It seems to solve several issues we have seen in unpatched
> versions, amongst others the original timer issue, a crash in hrtimer
> rb-tree manipulation etc.
>
> Greetings,
>   Christian

Given that we are closing on 3.9 release, and that one/more of these patches fix a
real issue for us - can you please consider my earlier patch to fix
timer_pending() only for 3.9 [http://www.spinics.net/lists/kernel/msg1508224.html]
This will be a localized / low risk change for this late in cycle.

For 3.10 - assuming preempt_* change is blessed, we can revert this one and add
that fuller/better fix.

What do you think ?

Thx,
-Vineet

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT
  2013-04-05  4:36       ` Vineet Gupta
@ 2013-04-06 13:34         ` Vineet Gupta
  2013-04-06 16:13           ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Vineet Gupta @ 2013-04-06 13:34 UTC (permalink / raw)
  To: tglx
  Cc: Linus Torvalds, Christian Ruppert, Pierrick Hascoet,
	Frederic Weisbecker, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	linux-kernel

> On 04/05/2013 10:06 AM, Vineet Gupta wrote:
> Hi Thomas,
> 
> Given that we are closing on 3.9 release, and that one/more of these patches fix a
> real issue for us - can you please consider my earlier patch to fix
> timer_pending() only for 3.9 [http://www.spinics.net/lists/kernel/msg1508224.html]
> This will be a localized / low risk change for this late in cycle.
> 
> For 3.10 - assuming preempt_* change is blessed, we can revert this one and add
> that fuller/better fix.
> 
> What do you think ?
> 
> Thx,
> -Vineet
> 

Ping ! Sorry for pestering, but one of the fixes is needed before 3.9 goes out.

Simple localized fix: http://www.spinics.net/lists/kernel/msg1508224.html
Better but risky: http://www.spinics.net/lists/kernel/msg1510885.html

Thx,
-Vineet

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT
  2013-04-06 13:34         ` Vineet Gupta
@ 2013-04-06 16:13           ` Linus Torvalds
  2013-04-06 18:01             ` Linus Torvalds
  2013-04-08  4:20             ` [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT Vineet Gupta
  0 siblings, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2013-04-06 16:13 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Thomas Gleixner, Christian Ruppert, Pierrick Hascoet,
	Frederic Weisbecker, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List

This is all *COMPLETELY* wrong.

Neither the normal preempt macros, nor the plain spinlocks, should
protect anything at all against interrupts.

The real protection should come from the  spin_lock_irqsave() in
lock_timer_base(), not from spinlocks, and not from preemption.

It sounds like ARC is completely buggered, and hasn't made the irq
disable be a compiler barrier. That's an ARC bug, and it's a big one,
and can affect a lot more than just the timers.

So the real fix is to add a "memory" clobber to
arch_local_irq_save/restore() and friends, so that the compiler
doesn't get to cache memory state from the irq-enabled region into the
irq-disabled one.

Fix ARC, don't try to blame generic code. You should have asked
yourself why only ARC saw this bug, when the code apparently works
fine for everybody else!

                   Linus

On Sat, Apr 6, 2013 at 6:34 AM, Vineet Gupta <Vineet.Gupta1@synopsys.com> wrote:
>> On 04/05/2013 10:06 AM, Vineet Gupta wrote:
>> Hi Thomas,
>>
>> Given that we are closing on 3.9 release, and that one/more of these patches fix a
>> real issue for us - can you please consider my earlier patch to fix
>> timer_pending() only for 3.9 [http://www.spinics.net/lists/kernel/msg1508224.html]
>> This will be a localized / low risk change for this late in cycle.
>>
>> For 3.10 - assuming preempt_* change is blessed, we can revert this one and add
>> that fuller/better fix.
>>
>> What do you think ?
>>
>> Thx,
>> -Vineet
>>
>
> Ping ! Sorry for pestering, but one of the fixes is needed before 3.9 goes out.
>
> Simple localized fix: http://www.spinics.net/lists/kernel/msg1508224.html
> Better but risky: http://www.spinics.net/lists/kernel/msg1510885.html
>
> Thx,
> -Vineet

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT
  2013-04-06 16:13           ` Linus Torvalds
@ 2013-04-06 18:01             ` Linus Torvalds
  2013-04-06 19:54               ` Jacquiot, Aurelien
  2013-04-09 16:33               ` [PATCH] tile: comment assumption about __insn_mtspr for <asm/irqflags.h> Chris Metcalf
  2013-04-08  4:20             ` [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT Vineet Gupta
  1 sibling, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2013-04-06 18:01 UTC (permalink / raw)
  To: Vineet Gupta, Mark Salter, Aurelien Jacquiot
  Cc: Thomas Gleixner, Christian Ruppert, Pierrick Hascoet,
	Frederic Weisbecker, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-arch, linux-c6x-dev

Looking around, it looks like c6x has the same bug.

Some other architectures (tile) have such subtle implementations
(where is __insn_mtspr() defined?) that I have a hard time judging.
And maybe I missed something, but the rest seem ok.

                Linus

On Sat, Apr 6, 2013 at 9:13 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> This is all *COMPLETELY* wrong.
>
> Neither the normal preempt macros, nor the plain spinlocks, should
> protect anything at all against interrupts.
>
> The real protection should come from the  spin_lock_irqsave() in
> lock_timer_base(), not from spinlocks, and not from preemption.
>
> It sounds like ARC is completely buggered, and hasn't made the irq
> disable be a compiler barrier. That's an ARC bug, and it's a big one,
> and can affect a lot more than just the timers.
>
> So the real fix is to add a "memory" clobber to
> arch_local_irq_save/restore() and friends, so that the compiler
> doesn't get to cache memory state from the irq-enabled region into the
> irq-disabled one.
>
> Fix ARC, don't try to blame generic code. You should have asked
> yourself why only ARC saw this bug, when the code apparently works
> fine for everybody else!
>
>                    Linus
>
> On Sat, Apr 6, 2013 at 6:34 AM, Vineet Gupta <Vineet.Gupta1@synopsys.com> wrote:
>>> On 04/05/2013 10:06 AM, Vineet Gupta wrote:
>>> Hi Thomas,
>>>
>>> Given that we are closing on 3.9 release, and that one/more of these patches fix a
>>> real issue for us - can you please consider my earlier patch to fix
>>> timer_pending() only for 3.9 [http://www.spinics.net/lists/kernel/msg1508224.html]
>>> This will be a localized / low risk change for this late in cycle.
>>>
>>> For 3.10 - assuming preempt_* change is blessed, we can revert this one and add
>>> that fuller/better fix.
>>>
>>> What do you think ?
>>>
>>> Thx,
>>> -Vineet
>>>
>>
>> Ping ! Sorry for pestering, but one of the fixes is needed before 3.9 goes out.
>>
>> Simple localized fix: http://www.spinics.net/lists/kernel/msg1508224.html
>> Better but risky: http://www.spinics.net/lists/kernel/msg1510885.html
>>
>> Thx,
>> -Vineet

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT
  2013-04-06 18:01             ` Linus Torvalds
@ 2013-04-06 19:54               ` Jacquiot, Aurelien
  2013-04-09 16:33               ` [PATCH] tile: comment assumption about __insn_mtspr for <asm/irqflags.h> Chris Metcalf
  1 sibling, 0 replies; 29+ messages in thread
From: Jacquiot, Aurelien @ 2013-04-06 19:54 UTC (permalink / raw)
  To: Linus Torvalds, Vineet Gupta, Mark Salter
  Cc: Thomas Gleixner, Christian Ruppert, Pierrick Hascoet,
	Frederic Weisbecker, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-arch, linux-c6x-dev

Agreed. We will fix it.

Aurelien


Texas Instruments France SA, 821 Avenue Jack Kilby, 06270 Villeneuve Loubet. 036 420 040 R.C.S Antibes. Capital de EUR 12.654.784

-----Original Message-----
From: linus971@gmail.com [mailto:linus971@gmail.com] On Behalf Of Linus Torvalds
Sent: Saturday, April 06, 2013 8:01 PM
To: Vineet Gupta; Mark Salter; Jacquiot, Aurelien
Cc: Thomas Gleixner; Christian Ruppert; Pierrick Hascoet; Frederic Weisbecker; Steven Rostedt; Peter Zijlstra; Ingo Molnar; Linux Kernel Mailing List; linux-arch@vger.kernel.org; linux-c6x-dev@linux-c6x.org
Subject: Re: [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT

Looking around, it looks like c6x has the same bug.

Some other architectures (tile) have such subtle implementations (where is __insn_mtspr() defined?) that I have a hard time judging.
And maybe I missed something, but the rest seem ok.

                Linus

On Sat, Apr 6, 2013 at 9:13 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> This is all *COMPLETELY* wrong.
>
> Neither the normal preempt macros, nor the plain spinlocks, should
> protect anything at all against interrupts.
>
> The real protection should come from the  spin_lock_irqsave() in
> lock_timer_base(), not from spinlocks, and not from preemption.
>
> It sounds like ARC is completely buggered, and hasn't made the irq
> disable be a compiler barrier. That's an ARC bug, and it's a big one,
> and can affect a lot more than just the timers.
>
> So the real fix is to add a "memory" clobber to
> arch_local_irq_save/restore() and friends, so that the compiler
> doesn't get to cache memory state from the irq-enabled region into the
> irq-disabled one.
>
> Fix ARC, don't try to blame generic code. You should have asked
> yourself why only ARC saw this bug, when the code apparently works
> fine for everybody else!
>
>                    Linus
>
> On Sat, Apr 6, 2013 at 6:34 AM, Vineet Gupta <Vineet.Gupta1@synopsys.com> wrote:
>>> On 04/05/2013 10:06 AM, Vineet Gupta wrote:
>>> Hi Thomas,
>>>
>>> Given that we are closing on 3.9 release, and that one/more of these
>>> patches fix a real issue for us - can you please consider my earlier
>>> patch to fix
>>> timer_pending() only for 3.9
>>> [http://www.spinics.net/lists/kernel/msg1508224.html]
>>> This will be a localized / low risk change for this late in cycle.
>>>
>>> For 3.10 - assuming preempt_* change is blessed, we can revert this
>>> one and add that fuller/better fix.
>>>
>>> What do you think ?
>>>
>>> Thx,
>>> -Vineet
>>>
>>
>> Ping ! Sorry for pestering, but one of the fixes is needed before 3.9 goes out.
>>
>> Simple localized fix:
>> http://www.spinics.net/lists/kernel/msg1508224.html
>> Better but risky: http://www.spinics.net/lists/kernel/msg1510885.html
>>
>> Thx,
>> -Vineet

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT
  2013-04-06 16:13           ` Linus Torvalds
  2013-04-06 18:01             ` Linus Torvalds
@ 2013-04-08  4:20             ` Vineet Gupta
  2013-04-08  4:48               ` Linus Torvalds
  2013-04-08  4:49               ` Linus Torvalds
  1 sibling, 2 replies; 29+ messages in thread
From: Vineet Gupta @ 2013-04-08  4:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Christian Ruppert, Pierrick Hascoet,
	Frederic Weisbecker, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List

Hi Linus,

On 04/06/2013 09:43 PM, Linus Torvalds wrote:
> This is all *COMPLETELY* wrong.
>
> Neither the normal preempt macros, nor the plain spinlocks, should
> protect anything at all against interrupts.
>
> The real protection should come from the  spin_lock_irqsave() in
> lock_timer_base(), not from spinlocks, and not from preemption.
>
> It sounds like ARC is completely buggered, and hasn't made the irq
> disable be a compiler barrier. That's an ARC bug, and it's a big one,
> and can affect a lot more than just the timers.
>
> So the real fix is to add a "memory" clobber to
> arch_local_irq_save/restore() and friends, so that the compiler
> doesn't get to cache memory state from the irq-enabled region into the
> irq-disabled one.
>
> Fix ARC, don't try to blame generic code. You should have asked
> yourself why only ARC saw this bug, when the code apparently works
> fine for everybody else!
>
>                    Linus

Christian had already proposed that change - only I was reluctant to take it - as
local_irq_* is used heavily in a configuration of ARC700 Linux where the (!SMP) cpu
doesn't support load-locked/store-conditional instructions, hence atomic R-M-W
sequences need those routines. Adding a mem clobber there leads to pathetic
generated code, which is likely why I removed it at some point.

Also, I'd thought that given that barriers are already present in the PREEMPT_COUNT
variant of the preempt_* macros, we might as well add them to the !PREEMPT_COUNT ones.

But thinking about it more - you are right (as you always are) - the situation I
saw with the timer code could very well show up with vanilla local_irq_save/restore
when a piece of code races with itself (especially for our mostly !SMP configs).
Would you be OK if I send the single patch to ARC by email (for 3.9-rc7), or would
you rather have the pull request?

Thx,
-Vineet

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT
  2013-04-08  4:20             ` [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT Vineet Gupta
@ 2013-04-08  4:48               ` Linus Torvalds
  2013-04-08 13:37                 ` Peter Zijlstra
  2013-04-08 14:05                 ` Steven Rostedt
  2013-04-08  4:49               ` Linus Torvalds
  1 sibling, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2013-04-08  4:48 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Thomas Gleixner, Christian Ruppert, Pierrick Hascoet,
	Frederic Weisbecker, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List

On Sun, Apr 7, 2013 at 9:20 PM, Vineet Gupta <Vineet.Gupta1@synopsys.com> wrote:
>
> Christian had already proposed that change - only I was reluctant to take it - as
> local_irq_* is used heavily in a configuration of ARC700 linux where (!SMP) cpu
> doesn't support load-locked/store-conditional instructions - hence atomic R-M-W
> sequences need those routines. Adding a mem clobber would lead to pathetic
> generated code hence the reason it was likely removed by me at some point in time.

Umm. The irq instructions absolutely *HAVE* to have the barriers. And
yes, the R-M-W instructions (if they are protected by them) obviously
need to have the load and the store inside the spinlocked region.

> Also I'd thought that given that barriers are already present in PREEMPT_COUNT
> variant of preempt_* macros we might as well add them to !PREEMPT_COUNT ones.

That's absolutely insane. It's insane for two reasons:

 - it's not SUFFICIENT. Any user of irq-safety needs the loads and
stores to stay inside the irq-safe region, and the preempt macros have
absolutely nothing to do with that.

 - it's POINTLESS and WRONG. If you run on UP, and have preemption
disabled, then there is no reason for a barrier in the preempt code,
since moving code around them doesn't matter - nothing is ever going
to preempt that.

The thing is, the irq disable/enable sequence absolutely has to have a
memory barrier in it, because if the compiler moves a load outside of
the irq-protected region, then that is a bug. Now, if you want to play
games with your atomics and do them with handcoded asm, go right
ahead, but that really isn't an argument for getting everything else
wrong.

That said, thinking about barriers and preemption made me realize that
we do have a potential issue between: (a) non-preemption UP kernel
(with no barriers in the preempt_enable/disable()) and (b)
architectures that use inline asm without a memory clobber for
get_user/put_user(). That includes x86.

The reason is that I could imagine code like

    if (get_user(val, addr))
        return -EFAULT;
    preempt_disable();
    ... do something percpu ..
    preempt_enable();

and it turns out that for non-preempt UP, we don't tell the compiler
anywhere that it can't move the get_user past the preempt_disable. But
the get_user() can cause a preemption point because of a page fault,
obviously.

I suspect that the best way to fix this ends up relying on the gcc
"asm volatile" rules, and make the rule be that:
 - preempt_disable/enable have to generate an asm volatile() string
(preferably just a ASM comment saying "preempt disable/enable")
 - get_user/put_user doesn't need to add a memory clobber, but needs
to be done with an asm volatile too.

Then the gcc "no reordering of volatile asms" should make the above be
ok, without having to add an actual compiler memory barrier.

Ingo? Peter? I'm not sure anybody really uses UP+no-preempt on x86,
but it does seem to be a bug.. Comments?

But the irq disable/enable sequences definitely need the memory
clobber. Because caching memory that could be changed by an interrupt
in registers is not good.

                    Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT
  2013-04-08  4:20             ` [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT Vineet Gupta
  2013-04-08  4:48               ` Linus Torvalds
@ 2013-04-08  4:49               ` Linus Torvalds
  1 sibling, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2013-04-08  4:49 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Thomas Gleixner, Christian Ruppert, Pierrick Hascoet,
	Frederic Weisbecker, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List

On Sun, Apr 7, 2013 at 9:20 PM, Vineet Gupta <Vineet.Gupta1@synopsys.com> wrote:
>
> Would you be OK if I send the single patch to ARC by email (for 3.9-rc7) or you'd
> rather have the pull request.

I got distracted by thinking about user-accesses vs preemption, but
yes, sending the ARC patch to fix things by email as a plain patch is
fine.

             Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT
  2013-04-08  4:48               ` Linus Torvalds
@ 2013-04-08 13:37                 ` Peter Zijlstra
  2013-04-08 14:31                   ` Steven Rostedt
  2013-04-08 14:05                 ` Steven Rostedt
  1 sibling, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2013-04-08 13:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Vineet Gupta, Thomas Gleixner, Christian Ruppert,
	Pierrick Hascoet, Frederic Weisbecker, Steven Rostedt,
	Ingo Molnar, Linux Kernel Mailing List

On Sun, 2013-04-07 at 21:48 -0700, Linus Torvalds wrote:
> That said, thinking about barriers and preemption made me realize that
> we do have a potential issue between: (a) non-preemption UP kernel
> (with no barriers in the preempt_enable/disable()) and (b)
> architectures that use inline asm without a memory clobber for
> get_user/put_user(). That includes x86.
> 
> The reason is that I could imagine code like
> 
>     if (get_user(val, addr))
>         return -EFAULT;
>     preempt_disable();
>     ... do something percpu ..
>     preempt_enable();
> 
> and it turns out that for non-preempt UP, we don't tell the compiler
> anywhere that it can't move the get_user past the preempt_disable. But
> the get_user() can cause a preemption point because of a page fault,
> obviously.
> 
> I suspect that the best way to fix this ends up relying on the gcc
> "asm volatile" rules, and make the rule be that:
>  - preempt_disable/enable have to generate an asm volatile() string
> (preferably just a ASM comment saying "preempt disable/enable")
>  - get_user/put_user doesn't need to add a memory clobber, but needs
> to be done with an asm volatile too.
> 
> Then the gcc "no reordering of volatile asms" should make the above be
> ok, without having to add an actual compiler memory barrier.
> 
> Ingo? Peter? I'm not sure anybody really uses UP+no-preempt on x86,
> but it does seem to be a bug.. Comments?

Right, stuff between preempt_disable() and preempt_enable() is supposed
to appear atomic wrt scheduling contexts, allowing any schedule to
happen in between would violate this.

I'm not seeing how this would be UP-only though; I can see the same
thing happening on SMP+no-preempt.

Also, is {get,put}_user() the only construct that can do this? If so,
using the "asm volatile" rules as described might be the best way,
otherwise making the PREEMPT_COUNT=n operations be compiler barriers
seems like the safer option.

That said, I can't remember ever having seen a BUG like this, even
though !PREEMPT is (or at least was) the most popular distro setting.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT
  2013-04-08  4:48               ` Linus Torvalds
  2013-04-08 13:37                 ` Peter Zijlstra
@ 2013-04-08 14:05                 ` Steven Rostedt
  1 sibling, 0 replies; 29+ messages in thread
From: Steven Rostedt @ 2013-04-08 14:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Vineet Gupta, Thomas Gleixner, Christian Ruppert,
	Pierrick Hascoet, Frederic Weisbecker, Peter Zijlstra,
	Ingo Molnar, Linux Kernel Mailing List

On Sun, 2013-04-07 at 21:48 -0700, Linus Torvalds wrote:

> Ingo? Peter? I'm not sure anybody really uses UP+no-preempt on x86,
> but it does seem to be a bug.. Comments?

I believe a lot of people still use no-preempt. Well, at least voluntary
preemption, which would have the same bug.

I'm thinking that we may have just been lucky that gcc didn't move the
get_user() into a place that would cause issues.

Sounds like a bug to me.

-- Steve



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT
  2013-04-08 13:37                 ` Peter Zijlstra
@ 2013-04-08 14:31                   ` Steven Rostedt
  2013-04-08 14:50                     ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Steven Rostedt @ 2013-04-08 14:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Vineet Gupta, Thomas Gleixner, Christian Ruppert,
	Pierrick Hascoet, Frederic Weisbecker, Ingo Molnar,
	Linux Kernel Mailing List

On Mon, 2013-04-08 at 15:37 +0200, Peter Zijlstra wrote:

> That said, I can't remember ever having seen a BUG like this, even
> though !PREEMPT is (or at least was) the most popular distro setting.

It requires gcc reordering the code to where a preempt can happen inside
preempt_disable. And also put in a position where the preempt_disable
code it gets added matters.

Then if gcc does this, we need a page fault to occur with a get_user()
operation, which in practice seldom happens as most get user operations
are done on freshly modified memory.

And then, it would require the page fault to cause a schedule. This is
the most likely of the things needed to occur, but itself is not a
problem.

Then, the schedule would have to cause the data that is being protected by
the preempt_disable() to be corrupted. Either by scheduling in another
process that monkeys with the data. Or if it protects per-cpu data,
scheduling to another CPU (for the SMP case only).

If any of the above does not occur, then you wont see a bug. This is
highly unlikely to happen, but that's no excuse to not fix it. But it
probably explains why we never saw a bug report. Heck, it may have
happened, but it would be hard to reproduce, and just forgotten about.

-- Steve




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT
  2013-04-08 14:31                   ` Steven Rostedt
@ 2013-04-08 14:50                     ` Linus Torvalds
  2013-04-08 14:59                       ` Steven Rostedt
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2013-04-08 14:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Vineet Gupta, Thomas Gleixner, Christian Ruppert,
	Pierrick Hascoet, Frederic Weisbecker, Ingo Molnar,
	Linux Kernel Mailing List

On Mon, Apr 8, 2013 at 7:31 AM, Steven Rostedt <srostedt@redhat.com> wrote:
>
> It requires gcc reordering the code to where a preempt can happen inside
> preempt_disable. And also put in a position where the preempt_disable
> code it gets added matters.
>
> Then if gcc does this, we need a page fault to occur with a get_user()
> operation, which in practice seldom happens as most get user operations
> are done on freshly modified memory.
>
> And then, it would require the page fault to cause a schedule. This is
> the most likely of the things needed to occur, but itself is not a
> problem.
>
> Then, the schedule would have to cause the data that is being protected by
> the preempt_disable() to be corrupted. Either by scheduling in another
> process that monkeys with the data. Or if it protects per-cpu data,
> scheduling to another CPU (for the SMP case only).
>
> If any of the above does not occur, then you wont see a bug.

Right. The gcc reordering is also hard to actually notice if it does
happen, so even testing the fix for this looks nontrivial.

Something like the appended (whitespace-damaged and TOTALLY UNTESTED)
might be sufficient, but..

Comments?

                  Linus

---
    diff --git a/include/linux/preempt.h b/include/linux/preempt.h
    index 5a710b9c578e..465df1c13386 100644
    --- a/include/linux/preempt.h
    +++ b/include/linux/preempt.h
    @@ -93,14 +93,17 @@ do { \

     #else /* !CONFIG_PREEMPT_COUNT */

    -#define preempt_disable() do { } while (0)
    -#define sched_preempt_enable_no_resched() do { } while (0)
    -#define preempt_enable_no_resched() do { } while (0)
    -#define preempt_enable() do { } while (0)
    -
    -#define preempt_disable_notrace() do { } while (0)
    -#define preempt_enable_no_resched_notrace() do { } while (0)
    -#define preempt_enable_notrace() do { } while (0)
    +/* This is only a barrier to other asms. Notably get_user/put_user */
    +#define asm_barrier() asm volatile("")
    +
    +#define preempt_disable() asm_barrier()
    +#define sched_preempt_enable_no_resched() asm_barrier()
    +#define preempt_enable_no_resched() asm_barrier()
    +#define preempt_enable() asm_barrier()
    +
    +#define preempt_disable_notrace() asm_barrier()
    +#define preempt_enable_no_resched_notrace() asm_barrier()
    +#define preempt_enable_notrace() asm_barrier()

     #endif /* CONFIG_PREEMPT_COUNT */

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT
  2013-04-08 14:50                     ` Linus Torvalds
@ 2013-04-08 14:59                       ` Steven Rostedt
  2013-04-08 15:07                         ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Steven Rostedt @ 2013-04-08 14:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Vineet Gupta, Thomas Gleixner, Christian Ruppert,
	Pierrick Hascoet, Frederic Weisbecker, Ingo Molnar,
	Linux Kernel Mailing List

On Mon, 2013-04-08 at 07:50 -0700, Linus Torvalds wrote:

> ---
>     diff --git a/include/linux/preempt.h b/include/linux/preempt.h
>     index 5a710b9c578e..465df1c13386 100644
>     --- a/include/linux/preempt.h
>     +++ b/include/linux/preempt.h
>     @@ -93,14 +93,17 @@ do { \
> 
>      #else /* !CONFIG_PREEMPT_COUNT */
> 
>     -#define preempt_disable() do { } while (0)
>     -#define sched_preempt_enable_no_resched() do { } while (0)
>     -#define preempt_enable_no_resched() do { } while (0)
>     -#define preempt_enable() do { } while (0)
>     -
>     -#define preempt_disable_notrace() do { } while (0)
>     -#define preempt_enable_no_resched_notrace() do { } while (0)
>     -#define preempt_enable_notrace() do { } while (0)
>     +/* This is only a barrier to other asms. Notably get_user/put_user */

Probably should add in the comment:

	" or anything else that can cause a hidden schedule. "

-- Steve

>     +#define asm_barrier() asm volatile("")
>     +
>     +#define preempt_disable() asm_barrier()
>     +#define sched_preempt_enable_no_resched() asm_barrier()
>     +#define preempt_enable_no_resched() asm_barrier()
>     +#define preempt_enable() asm_barrier()
>     +
>     +#define preempt_disable_notrace() asm_barrier()
>     +#define preempt_enable_no_resched_notrace() asm_barrier()
>     +#define preempt_enable_notrace() asm_barrier()
> 
>      #endif /* CONFIG_PREEMPT_COUNT */



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT
  2013-04-08 14:59                       ` Steven Rostedt
@ 2013-04-08 15:07                         ` Linus Torvalds
  2013-04-09 14:32                           ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2013-04-08 15:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Vineet Gupta, Thomas Gleixner, Christian Ruppert,
	Pierrick Hascoet, Frederic Weisbecker, Ingo Molnar,
	Linux Kernel Mailing List

On Mon, Apr 8, 2013 at 7:59 AM, Steven Rostedt <srostedt@redhat.com> wrote:
>>     +/* This is only a barrier to other asms. Notably get_user/put_user */
>
> Probably should add in the comment:
>
>         " or anything else that can cause a hidden schedule. "
>

Fair enough. And I just remembered why I thought UP was special - we
need to do the same thing about spinlocks, for the same reasons.

So that "asm_barrier()" should probably be in <linux/compiler.h> along
with the "normal" barrier() definition.

*AND* somebody should re-check the gcc documentation on "volatile
asm". I'm pretty sure it used to say "not moved significantly,
including against each other" or something like that, but the gcc asm
docs have changed over time.

I'd hate to have to add a memory clobber to the get_user/put_user
asms, because that would *really* hurt. But maybe we could add some
other clobber ("cc" or similar) to make sure they can't be re-ordered
if the "volatile" isn't sufficient to make sure asms don't get
re-ordered wrt each other.

                Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT
  2013-04-08 15:07                         ` Linus Torvalds
@ 2013-04-09 14:32                           ` Linus Torvalds
  2013-04-10  7:12                             ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2013-04-09 14:32 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Vineet Gupta, Thomas Gleixner, Christian Ruppert,
	Pierrick Hascoet, Frederic Weisbecker, Ingo Molnar,
	Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1077 bytes --]

On Mon, Apr 8, 2013 at 8:07 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Apr 8, 2013 at 7:59 AM, Steven Rostedt <srostedt@redhat.com> wrote:
>>>     +/* This is only a barrier to other asms. Notably get_user/put_user */
>>
>> Probably should add in the comment:
>>
>>         " or anything else that can cause a hidden schedule. "
>>
>
> Fair enough. And I just remembered why I thought UP was special - we
> need to do the same thing about spinlocks, for the same reasons.
>
> So that "asm_barrier()" should probably be in <linux/compiler.h> along
> with the "normal" barrier() definition.
>

I'm a moron.

Yes, "asm_barrier()" is a valid barrier for asms. But without the
"memory" clobber, it doesn't actually end up being a barrier to any
normal C loads and stores from memory, so it doesn't actually help.

So I suspect we need to just make UP spinlocks and preemption
enable/disable be full compiler barriers after all.

Something like the attached (still untested, although it seems to at
least compile) patch. Comments?

                    Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 4275 bytes --]

 include/linux/preempt.h     | 22 ++++++++++++++--------
 include/linux/spinlock_up.h | 29 ++++++++++++++++++-----------
 2 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 5a710b9c578e..87a03c746f17 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -93,14 +93,20 @@ do { \
 
 #else /* !CONFIG_PREEMPT_COUNT */
 
-#define preempt_disable()		do { } while (0)
-#define sched_preempt_enable_no_resched()	do { } while (0)
-#define preempt_enable_no_resched()	do { } while (0)
-#define preempt_enable()		do { } while (0)
-
-#define preempt_disable_notrace()		do { } while (0)
-#define preempt_enable_no_resched_notrace()	do { } while (0)
-#define preempt_enable_notrace()		do { } while (0)
+/*
+ * Even if we don't have any preemption, we need preempt disable/enable
+ * to be barriers, so that we don't have things like get_user/put_user
+ * that can cause faults and scheduling migrate into our preempt-protected
+ * region.
+ */
+#define preempt_disable()		barrier()
+#define sched_preempt_enable_no_resched()	barrier()
+#define preempt_enable_no_resched()	barrier()
+#define preempt_enable()		barrier()
+
+#define preempt_disable_notrace()		barrier()
+#define preempt_enable_no_resched_notrace()	barrier()
+#define preempt_enable_notrace()		barrier()
 
 #endif /* CONFIG_PREEMPT_COUNT */
 
diff --git a/include/linux/spinlock_up.h b/include/linux/spinlock_up.h
index a26e2fb604e6..e2369c167dbd 100644
--- a/include/linux/spinlock_up.h
+++ b/include/linux/spinlock_up.h
@@ -16,7 +16,10 @@
  * In the debug case, 1 means unlocked, 0 means locked. (the values
  * are inverted, to catch initialization bugs)
  *
- * No atomicity anywhere, we are on UP.
+ * No atomicity anywhere, we are on UP. However, we still need
+ * the compiler barriers, because we do not want the compiler to
+ * move potentially faulting instructions (notably user accesses)
+ * into the locked sequence, resulting in non-atomic execution.
  */
 
 #ifdef CONFIG_DEBUG_SPINLOCK
@@ -25,6 +28,7 @@
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
 	lock->slock = 0;
+	barrier();
 }
 
 static inline void
@@ -32,6 +36,7 @@ arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags)
 {
 	local_irq_save(flags);
 	lock->slock = 0;
+	barrier();
 }
 
 static inline int arch_spin_trylock(arch_spinlock_t *lock)
@@ -39,32 +44,34 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
 	char oldval = lock->slock;
 
 	lock->slock = 0;
+	barrier();
 
 	return oldval > 0;
 }
 
 static inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
+	barrier();
 	lock->slock = 1;
 }
 
 /*
  * Read-write spinlocks. No debug version.
  */
-#define arch_read_lock(lock)		do { (void)(lock); } while (0)
-#define arch_write_lock(lock)		do { (void)(lock); } while (0)
-#define arch_read_trylock(lock)	({ (void)(lock); 1; })
-#define arch_write_trylock(lock)	({ (void)(lock); 1; })
-#define arch_read_unlock(lock)		do { (void)(lock); } while (0)
-#define arch_write_unlock(lock)	do { (void)(lock); } while (0)
+#define arch_read_lock(lock)		do { barrier(); (void)(lock); } while (0)
+#define arch_write_lock(lock)		do { barrier(); (void)(lock); } while (0)
+#define arch_read_trylock(lock)	({ barrier(); (void)(lock); 1; })
+#define arch_write_trylock(lock)	({ barrier(); (void)(lock); 1; })
+#define arch_read_unlock(lock)		do { barrier(); (void)(lock); } while (0)
+#define arch_write_unlock(lock)	do { barrier(); (void)(lock); } while (0)
 
 #else /* DEBUG_SPINLOCK */
 #define arch_spin_is_locked(lock)	((void)(lock), 0)
 /* for sched.c and kernel_lock.c: */
-# define arch_spin_lock(lock)		do { (void)(lock); } while (0)
-# define arch_spin_lock_flags(lock, flags)	do { (void)(lock); } while (0)
-# define arch_spin_unlock(lock)	do { (void)(lock); } while (0)
-# define arch_spin_trylock(lock)	({ (void)(lock); 1; })
+# define arch_spin_lock(lock)		do { barrier(); (void)(lock); } while (0)
+# define arch_spin_lock_flags(lock, flags)	do { barrier(); (void)(lock); } while (0)
+# define arch_spin_unlock(lock)	do { barrier(); (void)(lock); } while (0)
+# define arch_spin_trylock(lock)	({ barrier(); (void)(lock); 1; })
 #endif /* DEBUG_SPINLOCK */
 
 #define arch_spin_is_contended(lock)	(((void)(lock), 0))

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH] tile: comment assumption about __insn_mtspr for <asm/irqflags.h>
  2013-04-06 18:01             ` Linus Torvalds
  2013-04-06 19:54               ` Jacquiot, Aurelien
@ 2013-04-09 16:33               ` Chris Metcalf
  1 sibling, 0 replies; 29+ messages in thread
From: Chris Metcalf @ 2013-04-09 16:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Vineet Gupta, Mark Salter, Aurelien Jacquiot, Thomas Gleixner,
	Christian Ruppert, Pierrick Hascoet, Frederic Weisbecker,
	Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-arch, linux-c6x-dev

The arch_local_irq_save(), etc., routines are required to function
as compiler barriers.  They do, but it's subtle and requires knowing
that the gcc builtin __insn_mtspr() is marked as a memory clobber.
Provide a comment explaining the assumption.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 arch/tile/include/asm/irqflags.h |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Linus wrote:
> Some other architectures (tile) have such subtle implementations
> (where is __insn_mtspr() defined?) that I have a hard time judging.

diff --git a/arch/tile/include/asm/irqflags.h b/arch/tile/include/asm/irqflags.h
index 241c0bb..c96f9bb 100644
--- a/arch/tile/include/asm/irqflags.h
+++ b/arch/tile/include/asm/irqflags.h
@@ -40,7 +40,15 @@
 #include <asm/percpu.h>
 #include <arch/spr_def.h>
 
-/* Set and clear kernel interrupt masks. */
+/*
+ * Set and clear kernel interrupt masks.
+ *
+ * NOTE: __insn_mtspr() is a compiler builtin marked as a memory
+ * clobber.  We rely on it being equivalent to a compiler barrier in
+ * this code since arch_local_irq_save() and friends must act as
+ * compiler barriers.  This compiler semantic is baked into enough
+ * places that the compiler will maintain it going forward.
+ */
 #if CHIP_HAS_SPLIT_INTR_MASK()
 #if INT_PERF_COUNT < 32 || INT_AUX_PERF_COUNT < 32 || INT_MEM_ERROR >= 32
 # error Fix assumptions about which word various interrupts are in
-- 
1.7.10.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT
  2013-04-09 14:32                           ` Linus Torvalds
@ 2013-04-10  7:12                             ` Peter Zijlstra
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Zijlstra @ 2013-04-10  7:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Vineet Gupta, Thomas Gleixner, Christian Ruppert,
	Pierrick Hascoet, Frederic Weisbecker, Ingo Molnar,
	Linux Kernel Mailing List

On Tue, 2013-04-09 at 07:32 -0700, Linus Torvalds wrote:
> Something like the attached (still untested, although it seems to at
> least compile) patch. Comments?

Yes, makes me feel far more comfortable than this "asm volatile"
business (which as you noted has holes in it).


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2013-04-10  7:12 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-29 10:33 [PATCH] timer: Fix possible issues with non serialized timer_pending( ) Vineet Gupta
2013-04-03  7:20 ` Vineet Gupta
2013-04-03  8:53 ` Christian Ruppert
2013-04-03 12:36 ` Thomas Gleixner
2013-04-03 13:03   ` Christian Ruppert
2013-04-03 13:10     ` [RFC] Add implicit barriers to irqsave/restore class of functions Christian Ruppert
2013-04-03 13:29       ` Vineet Gupta
2013-04-04  8:26         ` Christian Ruppert
2013-04-04 16:13       ` Peter Zijlstra
2013-04-05  4:27         ` Vineet Gupta
2013-04-03 14:11   ` [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT Vineet Gupta
2013-04-04 15:28     ` Christian Ruppert
2013-04-05  4:36       ` Vineet Gupta
2013-04-06 13:34         ` Vineet Gupta
2013-04-06 16:13           ` Linus Torvalds
2013-04-06 18:01             ` Linus Torvalds
2013-04-06 19:54               ` Jacquiot, Aurelien
2013-04-09 16:33               ` [PATCH] tile: comment assumption about __insn_mtspr for <asm/irqflags.h> Chris Metcalf
2013-04-08  4:20             ` [PATCH] [PATCH] Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT Vineet Gupta
2013-04-08  4:48               ` Linus Torvalds
2013-04-08 13:37                 ` Peter Zijlstra
2013-04-08 14:31                   ` Steven Rostedt
2013-04-08 14:50                     ` Linus Torvalds
2013-04-08 14:59                       ` Steven Rostedt
2013-04-08 15:07                         ` Linus Torvalds
2013-04-09 14:32                           ` Linus Torvalds
2013-04-10  7:12                             ` Peter Zijlstra
2013-04-08 14:05                 ` Steven Rostedt
2013-04-08  4:49               ` Linus Torvalds
