linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] nohz: Make tick_nohz_irq_exit() irq safe
       [not found] <1361373336-11337-1-git-send-email-fweisbec@gmail.com>
@ 2013-02-20 21:00 ` Thomas Gleixner
  2013-02-20 22:01   ` Frederic Weisbecker
                     ` (5 more replies)
  2013-02-21 17:43 ` [tip:irq/urgent] nohz: Make tick_nohz_irq_exit() irq safe tip-bot for Frederic Weisbecker
  2013-02-21 20:04 ` tip-bot for Frederic Weisbecker
  2 siblings, 6 replies; 18+ messages in thread
From: Thomas Gleixner @ 2013-02-20 21:00 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Ingo Molnar, LKML, Peter Zijlstra, stable

On Wed, 20 Feb 2013, Frederic Weisbecker wrote:

> As it stands, irq_exit() may or may not be called with
> irqs disabled, depending on __ARCH_IRQ_EXIT_IRQS_DISABLED
> that the arch can define.
> 
> It makes tick_nohz_irq_exit() unsafe. For example two
> interrupts can race in tick_nohz_stop_sched_tick(): the inner
> most one computes the expiring time on top of the timer list,
> then it's interrupted right before reprogramming the
> clock. The new interrupt enqueues a new timer list timer,
> it reprogram the clock to take it into account and it exits.
> The CPUs resumes the inner most interrupt and performs the clock
> reprogramming without considering the new timer list timer.
> 
> This regression has been introduced by:
>      280f06774afedf849f0b34248ed6aff57d0f6908
>      ("nohz: Separate out irq exit and idle loop dyntick logic")
> 
> Let's fix it right now with the appropriate protections.

That's not a fix. That's an hack.
 
> A saner long term solution will be to remove
> __ARCH_IRQ_EXIT_IRQS_DISABLED.

We really want to enforce that interrupt disabled condition for
calling irq_exit(). So why make this exclusive to tick_nohz_irq_exit()?

Thanks,

	tglx

Index: linux-2.6/kernel/softirq.c
===================================================================
--- linux-2.6.orig/kernel/softirq.c
+++ linux-2.6/kernel/softirq.c
@@ -322,18 +322,10 @@ void irq_enter(void)
 
 static inline void invoke_softirq(void)
 {
-	if (!force_irqthreads) {
-#ifdef __ARCH_IRQ_EXIT_IRQS_DISABLED
+	if (!force_irqthreads)
 		__do_softirq();
-#else
-		do_softirq();
-#endif
-	} else {
-		__local_bh_disable((unsigned long)__builtin_return_address(0),
-				SOFTIRQ_OFFSET);
+	else
 		wakeup_softirqd();
-		__local_bh_enable(SOFTIRQ_OFFSET);
-	}
 }
 
 /*
@@ -341,6 +333,14 @@ static inline void invoke_softirq(void)
  */
 void irq_exit(void)
 {
+#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED
+	unsigned long flags;
+
+	local_irq_save(flags);
+#else
+	BUG_ON(!irqs_disabled();
+#endif
+
 	account_irq_exit_time(current);
 	trace_hardirq_exit();
 	sub_preempt_count(IRQ_EXIT_OFFSET);
@@ -354,6 +354,9 @@ void irq_exit(void)
 #endif
 	rcu_irq_exit();
 	sched_preempt_enable_no_resched();
+#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED
+	local_irq_restore(flags);
+#endif
 }
 
 /*

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] nohz: Make tick_nohz_irq_exit() irq safe
  2013-02-20 21:00 ` [PATCH] nohz: Make tick_nohz_irq_exit() irq safe Thomas Gleixner
@ 2013-02-20 22:01   ` Frederic Weisbecker
  2013-02-20 23:15     ` Thomas Gleixner
  2013-02-21 17:45   ` [tip:irq/urgent] irq: Ensure irq_exit() code runs with interrupts disabled tip-bot for Thomas Gleixner
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Frederic Weisbecker @ 2013-02-20 22:01 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Ingo Molnar, LKML, Peter Zijlstra, stable

2013/2/20 Thomas Gleixner <tglx@linutronix.de>:
> On Wed, 20 Feb 2013, Frederic Weisbecker wrote:
>
>> As it stands, irq_exit() may or may not be called with
>> irqs disabled, depending on __ARCH_IRQ_EXIT_IRQS_DISABLED
>> that the arch can define.
>>
>> It makes tick_nohz_irq_exit() unsafe. For example two
>> interrupts can race in tick_nohz_stop_sched_tick(): the inner
>> most one computes the expiring time on top of the timer list,
>> then it's interrupted right before reprogramming the
>> clock. The new interrupt enqueues a new timer list timer,
>> it reprogram the clock to take it into account and it exits.
>> The CPUs resumes the inner most interrupt and performs the clock
>> reprogramming without considering the new timer list timer.
>>
>> This regression has been introduced by:
>>      280f06774afedf849f0b34248ed6aff57d0f6908
>>      ("nohz: Separate out irq exit and idle loop dyntick logic")
>>
>> Let's fix it right now with the appropriate protections.
>
> That's not a fix. That's an hack.

I know it looks that way. That's because it's a pure regression fix,
minimal for backportability.

I'm distinguishing two different things here: the fact that some archs
can call irq_exit() with interrupts enabled which is a global design
problem, and the fact that tick_nohz_irq_exit() was safe against that
until 3.2 when I broke it with a commit of mine.

My goal was basically to restore that protection in a minimal commit
such that we can backport the regression fix, then deal with
__ARCH_IRQ_EXIT_IRQS_DISABLED afterward, since it requires some more
invasive changes.

>> A saner long term solution will be to remove
>> __ARCH_IRQ_EXIT_IRQS_DISABLED.
>
> We really want to enforce that interrupt disabled condition for
> calling irq_exit(). So why make this exclusive to tick_nohz_irq_exit()?

I need a fix that I can backport. Is the below fine with a stable tag?
It looks a bit too invasive for the single regression involved.

Thanks.

>
> Thanks,
>
>         tglx
>
> Index: linux-2.6/kernel/softirq.c
> ===================================================================
> --- linux-2.6.orig/kernel/softirq.c
> +++ linux-2.6/kernel/softirq.c
> @@ -322,18 +322,10 @@ void irq_enter(void)
>
>  static inline void invoke_softirq(void)
>  {
> -       if (!force_irqthreads) {
> -#ifdef __ARCH_IRQ_EXIT_IRQS_DISABLED
> +       if (!force_irqthreads)
>                 __do_softirq();
> -#else
> -               do_softirq();
> -#endif
> -       } else {
> -               __local_bh_disable((unsigned long)__builtin_return_address(0),
> -                               SOFTIRQ_OFFSET);
> +       else
>                 wakeup_softirqd();
> -               __local_bh_enable(SOFTIRQ_OFFSET);
> -       }
>  }
>
>  /*
> @@ -341,6 +333,14 @@ static inline void invoke_softirq(void)
>   */
>  void irq_exit(void)
>  {
> +#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED
> +       unsigned long flags;
> +
> +       local_irq_save(flags);
> +#else
> +       BUG_ON(!irqs_disabled();
> +#endif
> +
>         account_irq_exit_time(current);
>         trace_hardirq_exit();
>         sub_preempt_count(IRQ_EXIT_OFFSET);
> @@ -354,6 +354,9 @@ void irq_exit(void)
>  #endif
>         rcu_irq_exit();
>         sched_preempt_enable_no_resched();
> +#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED
> +       local_irq_restore(flags);
> +#endif
>  }
>
>  /*

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] nohz: Make tick_nohz_irq_exit() irq safe
  2013-02-20 22:01   ` Frederic Weisbecker
@ 2013-02-20 23:15     ` Thomas Gleixner
  2013-02-21 16:13       ` Frederic Weisbecker
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2013-02-20 23:15 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Ingo Molnar, LKML, Peter Zijlstra, stable

On Wed, 20 Feb 2013, Frederic Weisbecker wrote:
> 2013/2/20 Thomas Gleixner <tglx@linutronix.de>:
> > That's not a fix. That's an hack.
> 
> I know it looks that way. That's because it's a pure regression fix,
> minimal for backportability.
> 
> I'm distinguishing two different things here: the fact that some archs
> can call irq_exit() with interrupts enabled which is a global design
> problem, and the fact that tick_nohz_irq_exit() was safe against that
> until 3.2 when I broke it with a commit of mine.
> 
> My goal was basically to restore that protection in a minimal commit
> such that we can backport the regression fix, then deal with
> __ARCH_IRQ_EXIT_IRQS_DISABLED afterward, since it requires some more
> invasive changes.
> 
> >> A saner long term solution will be to remove
> >> __ARCH_IRQ_EXIT_IRQS_DISABLED.
> >
> > We really want to enforce that interrupt disabled condition for
> > calling irq_exit(). So why make this exclusive to tick_nohz_irq_exit()?
> 
> I need a fix that I can backport. Is the below fine with a stable tag?
> It looks a bit too invasive for the single regression involved.

I think that's fine as it's obviously correct and not diluting the
real underlying issue of the __ARCH_IRQ_EXIT_IRQS_DISABLED insanity.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] nohz: Make tick_nohz_irq_exit() irq safe
  2013-02-20 23:15     ` Thomas Gleixner
@ 2013-02-21 16:13       ` Frederic Weisbecker
  2013-02-21 16:46         ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Frederic Weisbecker @ 2013-02-21 16:13 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Ingo Molnar, LKML, Peter Zijlstra, stable

2013/2/21 Thomas Gleixner <tglx@linutronix.de>:
> On Wed, 20 Feb 2013, Frederic Weisbecker wrote:
>> 2013/2/20 Thomas Gleixner <tglx@linutronix.de>:
>> > That's not a fix. That's an hack.
>>
>> I know it looks that way. That's because it's a pure regression fix,
>> minimal for backportability.
>>
>> I'm distinguishing two different things here: the fact that some archs
>> can call irq_exit() with interrupts enabled which is a global design
>> problem, and the fact that tick_nohz_irq_exit() was safe against that
>> until 3.2 when I broke it with a commit of mine.
>>
>> My goal was basically to restore that protection in a minimal commit
>> such that we can backport the regression fix, then deal with
>> __ARCH_IRQ_EXIT_IRQS_DISABLED afterward, since it requires some more
>> invasive changes.
>>
>> >> A saner long term solution will be to remove
>> >> __ARCH_IRQ_EXIT_IRQS_DISABLED.
>> >
>> > We really want to enforce that interrupt disabled condition for
>> > calling irq_exit(). So why make this exclusive to tick_nohz_irq_exit()?
>>
>> I need a fix that I can backport. Is the below fine with a stable tag?
>> It looks a bit too invasive for the single regression involved.
>
> I think that's fine as it's obviously correct and not diluting the
> real underlying issue of the __ARCH_IRQ_EXIT_IRQS_DISABLED insanity.

Ok fine. Do you plan to commit your proposed change then?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] nohz: Make tick_nohz_irq_exit() irq safe
  2013-02-21 16:13       ` Frederic Weisbecker
@ 2013-02-21 16:46         ` Thomas Gleixner
  2013-02-21 16:49           ` Frederic Weisbecker
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2013-02-21 16:46 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Ingo Molnar, LKML, Peter Zijlstra, stable

On Thu, 21 Feb 2013, Frederic Weisbecker wrote:
> 2013/2/21 Thomas Gleixner <tglx@linutronix.de>:
> > On Wed, 20 Feb 2013, Frederic Weisbecker wrote:
> >> 2013/2/20 Thomas Gleixner <tglx@linutronix.de>:
> >> > That's not a fix. That's an hack.
> >>
> >> I know it looks that way. That's because it's a pure regression fix,
> >> minimal for backportability.
> >>
> >> I'm distinguishing two different things here: the fact that some archs
> >> can call irq_exit() with interrupts enabled which is a global design
> >> problem, and the fact that tick_nohz_irq_exit() was safe against that
> >> until 3.2 when I broke it with a commit of mine.
> >>
> >> My goal was basically to restore that protection in a minimal commit
> >> such that we can backport the regression fix, then deal with
> >> __ARCH_IRQ_EXIT_IRQS_DISABLED afterward, since it requires some more
> >> invasive changes.
> >>
> >> >> A saner long term solution will be to remove
> >> >> __ARCH_IRQ_EXIT_IRQS_DISABLED.
> >> >
> >> > We really want to enforce that interrupt disabled condition for
> >> > calling irq_exit(). So why make this exclusive to tick_nohz_irq_exit()?
> >>
> >> I need a fix that I can backport. Is the below fine with a stable tag?
> >> It looks a bit too invasive for the single regression involved.
> >
> > I think that's fine as it's obviously correct and not diluting the
> > real underlying issue of the __ARCH_IRQ_EXIT_IRQS_DISABLED insanity.
> 
> Ok fine. Do you plan to commit your proposed change then?

Second thoughts. I probably go for your minimal fix for stable and
then push my version on top of it to Linus only.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] nohz: Make tick_nohz_irq_exit() irq safe
  2013-02-21 16:46         ` Thomas Gleixner
@ 2013-02-21 16:49           ` Frederic Weisbecker
  0 siblings, 0 replies; 18+ messages in thread
From: Frederic Weisbecker @ 2013-02-21 16:49 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Ingo Molnar, LKML, Peter Zijlstra, stable

2013/2/21 Thomas Gleixner <tglx@linutronix.de>:
> On Thu, 21 Feb 2013, Frederic Weisbecker wrote:
> Second thoughts. I probably go for your minimal fix for stable and
> then push my version on top of it to Linus only.

Thanks, I feel more comfortable with that. Then I'll try to iterate
over archs to finally remove that damn macro.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [tip:irq/urgent] nohz: Make tick_nohz_irq_exit() irq safe
       [not found] <1361373336-11337-1-git-send-email-fweisbec@gmail.com>
  2013-02-20 21:00 ` [PATCH] nohz: Make tick_nohz_irq_exit() irq safe Thomas Gleixner
@ 2013-02-21 17:43 ` tip-bot for Frederic Weisbecker
  2013-02-21 20:04 ` tip-bot for Frederic Weisbecker
  2 siblings, 0 replies; 18+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2013-02-21 17:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, fweisbec, torvalds, peterz, tglx

Commit-ID:  493c60613d2d0b2fa6b3f7e2d3299ee4fef52b76
Gitweb:     http://git.kernel.org/tip/493c60613d2d0b2fa6b3f7e2d3299ee4fef52b76
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Wed, 20 Feb 2013 16:15:36 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 21 Feb 2013 18:32:44 +0100

nohz: Make tick_nohz_irq_exit() irq safe

As it stands, irq_exit() may or may not be called with
irqs disabled, depending on __ARCH_IRQ_EXIT_IRQS_DISABLED
that the arch can define.

It makes tick_nohz_irq_exit() unsafe. For example two
interrupts can race in tick_nohz_stop_sched_tick(): the inner
most one computes the expiring time on top of the timer list,
then it's interrupted right before reprogramming the
clock. The new interrupt enqueues a new timer list timer,
it reprogram the clock to take it into account and it exits.
The CPUs resumes the inner most interrupt and performs the clock
reprogramming without considering the new timer list timer.

This regression has been introduced by:
     280f06774afedf849f0b34248ed6aff57d0f6908
     ("nohz: Separate out irq exit and idle loop dyntick logic")

Let's fix it right now with the appropriate protections.

A saner long term solution will be to remove
__ARCH_IRQ_EXIT_IRQS_DISABLED and mandate that irq_exit() is called
with interrupts disabled.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: <stable@vger.kernel.org> #v3.2+
Link: http://lkml.kernel.org/r/1361373336-11337-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/tick-sched.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 314b9ee..520592a 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -565,14 +565,19 @@ void tick_nohz_idle_enter(void)
  */
 void tick_nohz_irq_exit(void)
 {
+	unsigned long flags;
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
 
 	if (!ts->inidle)
 		return;
 
-	/* Cancel the timer because CPU already waken up from the C-states*/
+	local_irq_save(flags);
+
+	/* Cancel the timer because CPU already waken up from the C-states */
 	menu_hrtimer_cancel();
 	__tick_nohz_idle_enter(ts);
+
+	local_irq_restore(flags);
 }
 
 /**

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [tip:irq/urgent] irq: Ensure irq_exit() code runs with interrupts disabled
  2013-02-20 21:00 ` [PATCH] nohz: Make tick_nohz_irq_exit() irq safe Thomas Gleixner
  2013-02-20 22:01   ` Frederic Weisbecker
@ 2013-02-21 17:45   ` tip-bot for Thomas Gleixner
  2013-02-21 19:48     ` Frederic Weisbecker
  2013-02-21 17:46   ` [tip:irq/urgent] irq: Sanitize invoke_softirq tip-bot for Thomas Gleixner
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: tip-bot for Thomas Gleixner @ 2013-02-21 17:45 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, peterz, paulmck, fweisbec, tglx

Commit-ID:  630e7580f681475d92d87e78373e4136096e12f2
Gitweb:     http://git.kernel.org/tip/630e7580f681475d92d87e78373e4136096e12f2
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 20 Feb 2013 22:00:48 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 21 Feb 2013 18:32:44 +0100

irq: Ensure irq_exit() code runs with interrupts disabled

We had already a few problems with code called from irq_exit() when
interrupted from a nesting interrupt. This can happen on architectures
which do not define __ARCH_IRQ_EXIT_IRQS_DISABLED.

__ARCH_IRQ_EXIT_IRQS_DISABLED should go away and we want to make it
mandatory to call irq_exit() with interrupts disabled.

As a temporary protection disable interrupts for those architectures
which do not define __ARCH_IRQ_EXIT_IRQS_DISABLED and add a WARN_ONCE
when an architecture which defines __ARCH_IRQ_EXIT_IRQS_DISABLED calls
irq_exit() with interrupts enabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302202155320.22263@ionos

---
 kernel/softirq.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index f5cc25f..d1f7943 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -341,6 +341,14 @@ static inline void invoke_softirq(void)
  */
 void irq_exit(void)
 {
+#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED
+	unsigned long flags;
+
+	local_irq_save(flags);
+#else
+	WARN_ON_ONCE(!irqs_disabled();
+#endif
+
 	account_irq_exit_time(current);
 	trace_hardirq_exit();
 	sub_preempt_count(IRQ_EXIT_OFFSET);
@@ -354,6 +362,9 @@ void irq_exit(void)
 #endif
 	rcu_irq_exit();
 	sched_preempt_enable_no_resched();
+#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED
+	local_irq_restore(flags);
+#endif
 }
 
 /*

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [tip:irq/urgent] irq: Sanitize invoke_softirq
  2013-02-20 21:00 ` [PATCH] nohz: Make tick_nohz_irq_exit() irq safe Thomas Gleixner
  2013-02-20 22:01   ` Frederic Weisbecker
  2013-02-21 17:45   ` [tip:irq/urgent] irq: Ensure irq_exit() code runs with interrupts disabled tip-bot for Thomas Gleixner
@ 2013-02-21 17:46   ` tip-bot for Thomas Gleixner
  2013-02-21 17:53   ` [PATCH] nohz: Make tick_nohz_irq_exit() irq safe Linus Torvalds
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: tip-bot for Thomas Gleixner @ 2013-02-21 17:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, peterz, paulmck, fweisbec, tglx

Commit-ID:  1ba476a0c9bd0907fa31a74d47e9f4f84b404690
Gitweb:     http://git.kernel.org/tip/1ba476a0c9bd0907fa31a74d47e9f4f84b404690
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 21 Feb 2013 18:17:42 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 21 Feb 2013 18:32:45 +0100

irq: Sanitize invoke_softirq

With the irq protection in irq_exit, we can remove the #ifdeffery and
the bh_disable/enable dance in invoke_softirq()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302202155320.22263@ionos

---
 kernel/softirq.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index d1f7943..aebc590 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -322,18 +322,10 @@ void irq_enter(void)
 
 static inline void invoke_softirq(void)
 {
-	if (!force_irqthreads) {
-#ifdef __ARCH_IRQ_EXIT_IRQS_DISABLED
+	if (!force_irqthreads)
 		__do_softirq();
-#else
-		do_softirq();
-#endif
-	} else {
-		__local_bh_disable((unsigned long)__builtin_return_address(0),
-				SOFTIRQ_OFFSET);
+	else
 		wakeup_softirqd();
-		__local_bh_enable(SOFTIRQ_OFFSET);
-	}
 }
 
 /*

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] nohz: Make tick_nohz_irq_exit() irq safe
  2013-02-20 21:00 ` [PATCH] nohz: Make tick_nohz_irq_exit() irq safe Thomas Gleixner
                     ` (2 preceding siblings ...)
  2013-02-21 17:46   ` [tip:irq/urgent] irq: Sanitize invoke_softirq tip-bot for Thomas Gleixner
@ 2013-02-21 17:53   ` Linus Torvalds
  2013-02-21 18:21     ` Thomas Gleixner
  2013-02-21 20:05   ` [tip:irq/urgent] irq: Ensure irq_exit() code runs with interrupts disabled tip-bot for Thomas Gleixner
  2013-02-21 20:07   ` [tip:irq/urgent] irq: Sanitize invoke_softirq tip-bot for Thomas Gleixner
  5 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2013-02-21 17:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Frederic Weisbecker, Ingo Molnar, LKML, Peter Zijlstra, stable

On Wed, Feb 20, 2013 at 1:00 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>   */
>  void irq_exit(void)
>  {
> +#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED
> +       unsigned long flags;
> +
> +       local_irq_save(flags);
> +#else
> +       BUG_ON(!irqs_disabled();
> +#endif

Guys, STOP DOING THIS!

Adding BUG_ON()'s just makes things much much much worse. There is
*never* a reason to add a BUG_ON(). And doing it in an interrupt path
is totally unacceptable. BUG_ON() makes it almost impossible to debug
something, because you just killed the machine. So using BUG_ON() for
"please notice this" is stupid as hell, because the most common end
result is: "Oh, the machine just hung with no messages".

Make it WARN_ON_ONCE() if you absolutely have to let people know, but
for something like this, why would you do even that?

                          Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] nohz: Make tick_nohz_irq_exit() irq safe
  2013-02-21 17:53   ` [PATCH] nohz: Make tick_nohz_irq_exit() irq safe Linus Torvalds
@ 2013-02-21 18:21     ` Thomas Gleixner
  2013-02-21 18:28       ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2013-02-21 18:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Frederic Weisbecker, Ingo Molnar, LKML, Peter Zijlstra, stable

On Thu, 21 Feb 2013, Linus Torvalds wrote:

> On Wed, Feb 20, 2013 at 1:00 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >   */
> >  void irq_exit(void)
> >  {
> > +#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED
> > +       unsigned long flags;
> > +
> > +       local_irq_save(flags);
> > +#else
> > +       BUG_ON(!irqs_disabled();
> > +#endif
> 
> Guys, STOP DOING THIS!
> 
> Adding BUG_ON()'s just makes things much much much worse. There is
> *never* a reason to add a BUG_ON(). And doing it in an interrupt path
> is totally unacceptable. BUG_ON() makes it almost impossible to debug
> something, because you just killed the machine. So using BUG_ON() for
> "please notice this" is stupid as hell, because the most common end
> result is: "Oh, the machine just hung with no messages".
> 
> Make it WARN_ON_ONCE() if you absolutely have to let people know, but
> for something like this, why would you do even that?

This was a draft patch. I made it a WARN_ON_ONCE() already. 

We really want to enforce that irq_exit() is called with interrupts
disabled.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] nohz: Make tick_nohz_irq_exit() irq safe
  2013-02-21 18:21     ` Thomas Gleixner
@ 2013-02-21 18:28       ` Linus Torvalds
  2013-02-22  8:54         ` Ingo Molnar
  0 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2013-02-21 18:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Frederic Weisbecker, Ingo Molnar, LKML, Peter Zijlstra, stable

On Thu, Feb 21, 2013 at 10:21 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> This was a draft patch. I made it a WARN_ON_ONCE() already.

Ok, good.

I really wish we could just get rid of BUG_ON(). It was a bad idea,
and it makes it easy for people to do the wrong thing. Sadly, we have
tons of them.

                        Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [tip:irq/urgent] irq: Ensure irq_exit() code runs with interrupts disabled
  2013-02-21 17:45   ` [tip:irq/urgent] irq: Ensure irq_exit() code runs with interrupts disabled tip-bot for Thomas Gleixner
@ 2013-02-21 19:48     ` Frederic Weisbecker
  2013-02-21 19:51       ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Frederic Weisbecker @ 2013-02-21 19:48 UTC (permalink / raw)
  To: paulmck, mingo, hpa, linux-kernel, fweisbec, torvalds, peterz, tglx
  Cc: linux-tip-commits

2013/2/21 tip-bot for Thomas Gleixner <tipbot@zytor.com>:
> Commit-ID:  630e7580f681475d92d87e78373e4136096e12f2
> Gitweb:     http://git.kernel.org/tip/630e7580f681475d92d87e78373e4136096e12f2
> Author:     Thomas Gleixner <tglx@linutronix.de>
> AuthorDate: Wed, 20 Feb 2013 22:00:48 +0100
> Committer:  Thomas Gleixner <tglx@linutronix.de>
> CommitDate: Thu, 21 Feb 2013 18:32:44 +0100
>
> irq: Ensure irq_exit() code runs with interrupts disabled
>
> We had already a few problems with code called from irq_exit() when
> interrupted from a nesting interrupt. This can happen on architectures
> which do not define __ARCH_IRQ_EXIT_IRQS_DISABLED.
>
> __ARCH_IRQ_EXIT_IRQS_DISABLED should go away and we want to make it
> mandatory to call irq_exit() with interrupts disabled.
>
> As a temporary protection disable interrupts for those architectures
> which do not define __ARCH_IRQ_EXIT_IRQS_DISABLED and add a WARN_ONCE
> when an architecture which defines __ARCH_IRQ_EXIT_IRQS_DISABLED calls
> irq_exit() with interrupts enabled.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Linus Torvalds <torvalds@linuxfoundation.org>
> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302202155320.22263@ionos
>
> ---
>  kernel/softirq.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index f5cc25f..d1f7943 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -341,6 +341,14 @@ static inline void invoke_softirq(void)
>   */
>  void irq_exit(void)
>  {
> +#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED
> +       unsigned long flags;
> +
> +       local_irq_save(flags);
> +#else
> +       WARN_ON_ONCE(!irqs_disabled();

Missing closing parenthesis.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [tip:irq/urgent] irq: Ensure irq_exit() code runs with interrupts disabled
  2013-02-21 19:48     ` Frederic Weisbecker
@ 2013-02-21 19:51       ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2013-02-21 19:51 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: paulmck, mingo, hpa, linux-kernel, torvalds, peterz, linux-tip-commits

On Thu, 21 Feb 2013, Frederic Weisbecker wrote:
> >  void irq_exit(void)
> >  {
> > +#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED
> > +       unsigned long flags;
> > +
> > +       local_irq_save(flags);
> > +#else
> > +       WARN_ON_ONCE(!irqs_disabled();
> 
> Missing closing parenthesis.

Dammit.
 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [tip:irq/urgent] nohz: Make tick_nohz_irq_exit() irq safe
       [not found] <1361373336-11337-1-git-send-email-fweisbec@gmail.com>
  2013-02-20 21:00 ` [PATCH] nohz: Make tick_nohz_irq_exit() irq safe Thomas Gleixner
  2013-02-21 17:43 ` [tip:irq/urgent] nohz: Make tick_nohz_irq_exit() irq safe tip-bot for Frederic Weisbecker
@ 2013-02-21 20:04 ` tip-bot for Frederic Weisbecker
  2 siblings, 0 replies; 18+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2013-02-21 20:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, fweisbec, torvalds, peterz, tglx

Commit-ID:  e5ab012c3271990e8457055c25cafddc1ae8aa6b
Gitweb:     http://git.kernel.org/tip/e5ab012c3271990e8457055c25cafddc1ae8aa6b
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Wed, 20 Feb 2013 16:15:36 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 21 Feb 2013 20:52:24 +0100

nohz: Make tick_nohz_irq_exit() irq safe

As it stands, irq_exit() may or may not be called with
irqs disabled, depending on __ARCH_IRQ_EXIT_IRQS_DISABLED
that the arch can define.

It makes tick_nohz_irq_exit() unsafe. For example two
interrupts can race in tick_nohz_stop_sched_tick(): the inner
most one computes the expiring time on top of the timer list,
then it's interrupted right before reprogramming the
clock. The new interrupt enqueues a new timer list timer,
it reprogram the clock to take it into account and it exits.
The CPUs resumes the inner most interrupt and performs the clock
reprogramming without considering the new timer list timer.

This regression has been introduced by:
     280f06774afedf849f0b34248ed6aff57d0f6908
     ("nohz: Separate out irq exit and idle loop dyntick logic")

Let's fix it right now with the appropriate protections.

A saner long term solution will be to remove
__ARCH_IRQ_EXIT_IRQS_DISABLED and mandate that irq_exit() is called
with interrupts disabled.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: <stable@vger.kernel.org> #v3.2+
Link: http://lkml.kernel.org/r/1361373336-11337-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/tick-sched.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 314b9ee..520592a 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -565,14 +565,19 @@ void tick_nohz_idle_enter(void)
  */
 void tick_nohz_irq_exit(void)
 {
+	unsigned long flags;
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
 
 	if (!ts->inidle)
 		return;
 
-	/* Cancel the timer because CPU already waken up from the C-states*/
+	local_irq_save(flags);
+
+	/* Cancel the timer because CPU already waken up from the C-states */
 	menu_hrtimer_cancel();
 	__tick_nohz_idle_enter(ts);
+
+	local_irq_restore(flags);
 }
 
 /**

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [tip:irq/urgent] irq: Ensure irq_exit() code runs with interrupts disabled
  2013-02-20 21:00 ` [PATCH] nohz: Make tick_nohz_irq_exit() irq safe Thomas Gleixner
                     ` (3 preceding siblings ...)
  2013-02-21 17:53   ` [PATCH] nohz: Make tick_nohz_irq_exit() irq safe Linus Torvalds
@ 2013-02-21 20:05   ` tip-bot for Thomas Gleixner
  2013-02-21 20:07   ` [tip:irq/urgent] irq: Sanitize invoke_softirq tip-bot for Thomas Gleixner
  5 siblings, 0 replies; 18+ messages in thread
From: tip-bot for Thomas Gleixner @ 2013-02-21 20:05 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, peterz, paulmck, fweisbec, tglx

Commit-ID:  74eed0163d0def3fce27228d9ccf3d36e207b286
Gitweb:     http://git.kernel.org/tip/74eed0163d0def3fce27228d9ccf3d36e207b286
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 20 Feb 2013 22:00:48 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 21 Feb 2013 20:52:24 +0100

irq: Ensure irq_exit() code runs with interrupts disabled

We had already a few problems with code called from irq_exit() when
interrupted from a nesting interrupt. This can happen on architectures
which do not define __ARCH_IRQ_EXIT_IRQS_DISABLED.

__ARCH_IRQ_EXIT_IRQS_DISABLED should go away and we want to make it
mandatory to call irq_exit() with interrupts disabled.

As a temporary protection disable interrupts for those architectures
which do not define __ARCH_IRQ_EXIT_IRQS_DISABLED and add a WARN_ONCE
when an architecture which defines __ARCH_IRQ_EXIT_IRQS_DISABLED calls
irq_exit() with interrupts enabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302202155320.22263@ionos

---
 kernel/softirq.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index f5cc25f..f2a9346 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -341,6 +341,14 @@ static inline void invoke_softirq(void)
  */
 void irq_exit(void)
 {
+#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED
+	unsigned long flags;
+
+	local_irq_save(flags);
+#else
+	WARN_ON_ONCE(!irqs_disabled());
+#endif
+
 	account_irq_exit_time(current);
 	trace_hardirq_exit();
 	sub_preempt_count(IRQ_EXIT_OFFSET);
@@ -354,6 +362,9 @@ void irq_exit(void)
 #endif
 	rcu_irq_exit();
 	sched_preempt_enable_no_resched();
+#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED
+	local_irq_restore(flags);
+#endif
 }
 
 /*

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [tip:irq/urgent] irq: Sanitize invoke_softirq
  2013-02-20 21:00 ` [PATCH] nohz: Make tick_nohz_irq_exit() irq safe Thomas Gleixner
                     ` (4 preceding siblings ...)
  2013-02-21 20:05   ` [tip:irq/urgent] irq: Ensure irq_exit() code runs with interrupts disabled tip-bot for Thomas Gleixner
@ 2013-02-21 20:07   ` tip-bot for Thomas Gleixner
  5 siblings, 0 replies; 18+ messages in thread
From: tip-bot for Thomas Gleixner @ 2013-02-21 20:07 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, peterz, paulmck, fweisbec, tglx

Commit-ID:  facd8b80c67a3cf64a467c4a2ac5fb31f2e6745b
Gitweb:     http://git.kernel.org/tip/facd8b80c67a3cf64a467c4a2ac5fb31f2e6745b
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 21 Feb 2013 18:17:42 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 21 Feb 2013 20:52:24 +0100

irq: Sanitize invoke_softirq

With the irq protection in irq_exit, we can remove the #ifdeffery and
the bh_disable/enable dance in invoke_softirq()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302202155320.22263@ionos

---
 kernel/softirq.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index f2a9346..24a921b 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -322,18 +322,10 @@ void irq_enter(void)
 
 static inline void invoke_softirq(void)
 {
-	if (!force_irqthreads) {
-#ifdef __ARCH_IRQ_EXIT_IRQS_DISABLED
+	if (!force_irqthreads)
 		__do_softirq();
-#else
-		do_softirq();
-#endif
-	} else {
-		__local_bh_disable((unsigned long)__builtin_return_address(0),
-				SOFTIRQ_OFFSET);
+	else
 		wakeup_softirqd();
-		__local_bh_enable(SOFTIRQ_OFFSET);
-	}
 }
 
 /*

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] nohz: Make tick_nohz_irq_exit() irq safe
  2013-02-21 18:28       ` Linus Torvalds
@ 2013-02-22  8:54         ` Ingo Molnar
  0 siblings, 0 replies; 18+ messages in thread
From: Ingo Molnar @ 2013-02-22  8:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Frederic Weisbecker, LKML, Peter Zijlstra, stable


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, Feb 21, 2013 at 10:21 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > This was a draft patch. I made it a WARN_ON_ONCE() already.
> 
> Ok, good.
> 
> I really wish we could just get rid of BUG_ON(). It was a bad 
> idea, and it makes it easy for people to do the wrong thing. 
> Sadly, we have tons of them.

So my old plan was to use a little bit of psychology.

Firstly, we could just turn BUG_ON() into a WARN() variant that 
emits:

  BUG: ...

while a WARN()ings emit:

  WARNING: ...

and then we could introduce a new primitive:

  CRASH_ON();

which would be used in the (few) places that really, really 
cannot continue sanely and need to crash the box.

This naming alone would inhibit its use through two channels:

 - Putting the word 'CRASH' into your code feels risky, 
   dissonant and wrong (perfect code does not crash) and thus 
   needs conscious frontal lobe effort to justify it - while 
   BUG_ON() really feels more like a harmless assert to most 
   kernel developers, which is in our muscle memory through 
   years training.

 - CRASH_ON() takes one character more typing than WARN_ON(), 
   and we know good kernel developers are fundamentally lazy.

[ This is an arguably lazy plan that does not involve changing 
  the 10,000+ BUG_ON() call sites and does not involve the
  re-training of thousands of mis-trained kernel developers who 
  introduced over 900 new BUG_ON()s in the v3.7->v3.8 cycle 
  alone (!). ]

So while I don't think we can win the war against BUG_ON(), I 
think we can fight the lazy general's war: turn the enemy over 
to our side and declare victory?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2013-02-22  8:54 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1361373336-11337-1-git-send-email-fweisbec@gmail.com>
2013-02-20 21:00 ` [PATCH] nohz: Make tick_nohz_irq_exit() irq safe Thomas Gleixner
2013-02-20 22:01   ` Frederic Weisbecker
2013-02-20 23:15     ` Thomas Gleixner
2013-02-21 16:13       ` Frederic Weisbecker
2013-02-21 16:46         ` Thomas Gleixner
2013-02-21 16:49           ` Frederic Weisbecker
2013-02-21 17:45   ` [tip:irq/urgent] irq: Ensure irq_exit() code runs with interrupts disabled tip-bot for Thomas Gleixner
2013-02-21 19:48     ` Frederic Weisbecker
2013-02-21 19:51       ` Thomas Gleixner
2013-02-21 17:46   ` [tip:irq/urgent] irq: Sanitize invoke_softirq tip-bot for Thomas Gleixner
2013-02-21 17:53   ` [PATCH] nohz: Make tick_nohz_irq_exit() irq safe Linus Torvalds
2013-02-21 18:21     ` Thomas Gleixner
2013-02-21 18:28       ` Linus Torvalds
2013-02-22  8:54         ` Ingo Molnar
2013-02-21 20:05   ` [tip:irq/urgent] irq: Ensure irq_exit() code runs with interrupts disabled tip-bot for Thomas Gleixner
2013-02-21 20:07   ` [tip:irq/urgent] irq: Sanitize invoke_softirq tip-bot for Thomas Gleixner
2013-02-21 17:43 ` [tip:irq/urgent] nohz: Make tick_nohz_irq_exit() irq safe tip-bot for Frederic Weisbecker
2013-02-21 20:04 ` tip-bot for Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).