All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: Bug in on_each_cpu?
       [not found] ` <20070301024524.c7c8ea1a.akpm@linux-foundation.org>
@ 2007-03-01 11:34   ` Zachary Amsden
  2007-03-01 11:41     ` Rusty Russell
  2007-03-01 15:28     ` Andrew Morton
  0 siblings, 2 replies; 8+ messages in thread
From: Zachary Amsden @ 2007-03-01 11:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daniel Hecht, Rusty Russell, Linus Torvalds, Linux Kernel Mailing List

Andrew Morton wrote:
> On Thu, 01 Mar 2007 02:34:02 -0800 Zachary Amsden <zach@vmware.com> wrote:
>
>   
>> I think on_each_cpu has a serious bug now that hrtimers has been 
>> integrated.  Basically, there are many callsites of clock_was_set, none 
>> of which actually know the current interrupt state - enabled or 
>> disabled.  So they save and restore irqs, as they should.  But 
>> on_each_cpu does not do so, creating bugs in any function which calls 
>> these functions with interrupts disabled.
>>     
>
> on_each_cpu() calls smp_call_function().  It is a bug to call
> smp_call_function() with local interrupts disabled, hence it is a bug to
> call on_each_cpu() with local interrupts disabled.
>
> There is a WARN_ON() in smp_call_function() to catch this.
>   

Ok, I thought that might be the case.  We probably infinite looped in 
the interrupt handler because of bad state and thus missed the WARN_ON 
output.

> Why is it a bug?  Because there's a deadlock where this CPU is waiting for
> CPU A to take the IPI, but CPU A is waiting (with interrupts disabled) for
> this CPU to take an IPI.
>   

Then the bug is not in on_each_cpu().  It is in the usage of 
clock_was_set().  For example, look at do_settimeofday in kernel/timer.c:

        write_sequnlock_irqrestore(&xtime_lock, flags);

        /* signal hrtimers about time change */
        clock_was_set();

        return 0;

And timekeeping_resume has similar code (and called from a sysdev 
callback, so I don't know what the interrupt state should be ).  Either 
the write_sequnlock_irqrestore is redundant, and should be merely an 
write_sequnlock_irq, or the callsite is not prepared to handle enabling 
interrupts temporarily as must be done for on_each_cpu(), which is a 
pretty scary scenario.

What would be really, really nice would be to statically check all 
callsites that issue irq disables actually keep irqs disabled.  
Presumably, there was a reason they disabled irqs, and re-enabling them 
underneath their noses, even if it is to avoid a race, breaks the logic 
behind that reason.

Basically, it is just always a bug to re-enable interrupts from a caller 
which has disabled them.  Either there is no reason they needed to 
disable them in the first place, and it is just a performance issue, or 
they are updating some shared state used by the interrupt handler, which 
can then operate incorrectly, or they were taking some lock to prevent 
this, and the interrupt handler can then attempt to take a nested lock 
and hang forever.

In our case, the bug is easily worked around - we don't actually need to 
set the wallclock time here.  But there is a deeper fundamental issue 
which I think needs to be addressed.

Zach

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in on_each_cpu?
  2007-03-01 11:34   ` Bug in on_each_cpu? Zachary Amsden
@ 2007-03-01 11:41     ` Rusty Russell
  2007-03-01 11:47       ` Zachary Amsden
  2007-03-01 15:28     ` Andrew Morton
  1 sibling, 1 reply; 8+ messages in thread
From: Rusty Russell @ 2007-03-01 11:41 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Andrew Morton, Daniel Hecht, Linus Torvalds, Linux Kernel Mailing List

On Thu, 2007-03-01 at 03:34 -0800, Zachary Amsden wrote:
> What would be really, really nice would be to statically check all 
> callsites that issue irq disables actually keep irqs disabled.  
> Presumably, there was a reason they disabled irqs, and re-enabling them 
> underneath their noses, even if it is to avoid a race, breaks the logic 
> behind that reason.

For the moment, how about a BUG_ON() in on_each_cpu()?

Rusty.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in on_each_cpu?
  2007-03-01 11:41     ` Rusty Russell
@ 2007-03-01 11:47       ` Zachary Amsden
  2007-03-01 15:22         ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Zachary Amsden @ 2007-03-01 11:47 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Andrew Morton, Daniel Hecht, Linus Torvalds, Linux Kernel Mailing List

Rusty Russell wrote:
> On Thu, 2007-03-01 at 03:34 -0800, Zachary Amsden wrote:
>   
>> What would be really, really nice would be to statically check all 
>> callsites that issue irq disables actually keep irqs disabled.  
>> Presumably, there was a reason they disabled irqs, and re-enabling them 
>> underneath their noses, even if it is to avoid a race, breaks the logic 
>> behind that reason.
>>     
>
> For the moment, how about a BUG_ON() in on_each_cpu()?
>   

Sounds quite decent.  But why does on_each_cpu need to disable 
interrupts at all?  It just calls func(), then re-enables interrupts.  
So whatever was going to happen during func() that might not be 
interrupt safe could just be done in the callee, avoiding the rather 
expensive mess of disabling and re-enabling interrupts for those cases 
where it doesn't matter.

I haven't looked at the call sites, but perhaps on_each_cpu() and 
on_each_cpu_noirq() might be an appropriate distinction.

Zach

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in on_each_cpu?
  2007-03-01 11:47       ` Zachary Amsden
@ 2007-03-01 15:22         ` Andrew Morton
  2007-03-01 20:31           ` Zachary Amsden
  2007-03-02  4:46           ` Ernie Petrides
  0 siblings, 2 replies; 8+ messages in thread
From: Andrew Morton @ 2007-03-01 15:22 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Rusty Russell, Daniel Hecht, Linus Torvalds, Linux Kernel Mailing List

On Thu, 01 Mar 2007 03:47:39 -0800 Zachary Amsden <zach@vmware.com> wrote:

> Rusty Russell wrote:
> > On Thu, 2007-03-01 at 03:34 -0800, Zachary Amsden wrote:
> >   
> >> What would be really, really nice would be to statically check all 
> >> callsites that issue irq disables actually keep irqs disabled.  
> >> Presumably, there was a reason they disabled irqs, and re-enabling them 
> >> underneath their noses, even if it is to avoid a race, breaks the logic 
> >> behind that reason.
> >>     
> >
> > For the moment, how about a BUG_ON() in on_each_cpu()?
> >   
> 
> Sounds quite decent.  But why does on_each_cpu need to disable 
> interrupts at all?  It just calls func(), then re-enables interrupts.  
> So whatever was going to happen during func() that might not be 
> interrupt safe could just be done in the callee, avoiding the rather 
> expensive mess of disabling and re-enabling interrupts for those cases 
> where it doesn't matter.

The handler for smp_call_function() is called with local interrupts
disabled (from the IPI handler).

So to provide a consistent call environment for that handler, on_each_cpu()
will also disable local interrupts when making the direct call on this CPU.

Similarly the !CONFIG_SMP version of on_each_cpu() disables local
interrupts when running the caller's function.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in on_each_cpu?
  2007-03-01 11:34   ` Bug in on_each_cpu? Zachary Amsden
  2007-03-01 11:41     ` Rusty Russell
@ 2007-03-01 15:28     ` Andrew Morton
  2007-03-01 20:03       ` Zachary Amsden
  1 sibling, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2007-03-01 15:28 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Daniel Hecht, Rusty Russell, Linus Torvalds, Linux Kernel Mailing List

On Thu, 01 Mar 2007 03:34:18 -0800 Zachary Amsden <zach@vmware.com> wrote:

> > Why is it a bug?  Because there's a deadlock where this CPU is waiting for
> > CPU A to take the IPI, but CPU A is waiting (with interrupts disabled) for
> > this CPU to take an IPI.
> >   
> 
> Then the bug is not in on_each_cpu().  It is in the usage of 
> clock_was_set().  For example, look at do_settimeofday in kernel/timer.c:
> 
>         write_sequnlock_irqrestore(&xtime_lock, flags);
> 
>         /* signal hrtimers about time change */
>         clock_was_set();
> 
>         return 0;

Perhaps a WARN_ON(irqs_disabled()) in clock_was_set() would help.  But probably
the one in smp_call_function() will suffice.

> And timekeeping_resume has similar code (and called from a sysdev 
> callback, so I don't know what the interrupt state should be ).  Either 
> the write_sequnlock_irqrestore is redundant, and should be merely an 
> write_sequnlock_irq, or the callsite is not prepared to handle enabling 
> interrupts temporarily as must be done for on_each_cpu(), which is a 
> pretty scary scenario.
> 
> What would be really, really nice would be to statically check all 
> callsites that issue irq disables actually keep irqs disabled.  
> Presumably, there was a reason they disabled irqs, and re-enabling them 
> underneath their noses, even if it is to avoid a race, breaks the logic 
> behind that reason.

yup.  I once played with adding warnings in places like spin_lock_irq(), 
but there were false positives from places which were odd-but-correct.

It would be worth revisiting however.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in on_each_cpu?
  2007-03-01 15:28     ` Andrew Morton
@ 2007-03-01 20:03       ` Zachary Amsden
  0 siblings, 0 replies; 8+ messages in thread
From: Zachary Amsden @ 2007-03-01 20:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daniel Hecht, Rusty Russell, Linus Torvalds,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 2018 bytes --]

Andrew Morton wrote:
> On Thu, 01 Mar 2007 03:34:18 -0800 Zachary Amsden <zach@vmware.com> wrote:
>
>   
>>> Why is it a bug?  Because there's a deadlock where this CPU is waiting for
>>> CPU A to take the IPI, but CPU A is waiting (with interrupts disabled) for
>>> this CPU to take an IPI.
>>>   
>>>       
>> Then the bug is not in on_each_cpu().  It is in the usage of 
>> clock_was_set().  For example, look at do_settimeofday in kernel/timer.c:
>>
>>         write_sequnlock_irqrestore(&xtime_lock, flags);
>>
>>         /* signal hrtimers about time change */
>>         clock_was_set();
>>
>>         return 0;
>>     
>
> Perhaps a WARN_ON(irqs_disabled()) in clock_was_set() would help.  But probably
> the one in smp_call_function() will suffice.
>   

I'm a little unsure of how the timers play with sysfs suspend/resume, 
but calling do_settimeofday with irqs disabled is now clearly a bug.  
I'm thinking if these two cases don't fire under use, then we can safely 
down-convert the call sites.  Ingo, Thomas, what do you think of this 
(untested, unverified, patch attached):  is it true?

> yup.  I once played with adding warnings in places like spin_lock_irq(), 
> but there were false positives from places which were odd-but-correct.
>
> It would be worth revisiting however.
>   

Yes, I recall non-linear control flow in the floppy and ide drivers that 
did this, and were odd, but correct under certain call conventions.  
Whether they remain correct or still exist should be worth revisiting.

I think at least in the timer code, we should preserve the same 
invariants across the two implementations of do_getttimeofday.  One uses 
write_seqlock_irq, the other write_seqlock_irqsave.  This confused me 
into thinking it was safe to call do_gettimeofday with irqs disabled (it 
used to be in 2.6.20).  But they both call clock_was_set, and thus 
on_each_cpu, so either both of them have a bug or they both can be 
reduced to irq instead of irqsave.  I'm guessing the latter is the case.

Zach

[-- Attachment #2: gettimeofday-irq-reduce.patch --]
[-- Type: text/plain, Size: 1168 bytes --]


Index: linux-2.6.21/kernel/timer.c
===================================================================
--- linux-2.6.21.orig/kernel/timer.c	2007-02-16 16:45:44.000000000 -0800
+++ linux-2.6.21/kernel/timer.c	2007-03-01 11:56:23.000000000 -0800
@@ -841,14 +841,14 @@ EXPORT_SYMBOL(do_gettimeofday);
  */
 int do_settimeofday(struct timespec *tv)
 {
-	unsigned long flags;
 	time_t wtm_sec, sec = tv->tv_sec;
 	long wtm_nsec, nsec = tv->tv_nsec;
 
 	if ((unsigned long)tv->tv_nsec >= NSEC_PER_SEC)
 		return -EINVAL;
 
-	write_seqlock_irqsave(&xtime_lock, flags);
+	BUG_ON(irqs_disabled());
+	write_seqlock_irq(&xtime_lock);
 
 	nsec -= __get_nsec_offset();
 
@@ -861,7 +861,7 @@ int do_settimeofday(struct timespec *tv)
 	clock->error = 0;
 	ntp_clear();
 
-	write_sequnlock_irqrestore(&xtime_lock, flags);
+	write_sequnlock_irq(&xtime_lock);
 
 	/* signal hrtimers about time change */
 	clock_was_set();
@@ -981,6 +981,7 @@ static int timekeeping_resume(struct sys
 	unsigned long flags;
 	unsigned long now = read_persistent_clock();
 
+	WARN_ON_ONCE(irqs_disabled());
 	write_seqlock_irqsave(&xtime_lock, flags);
 
 	if (now && (now > timekeeping_suspend_time)) {

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in on_each_cpu?
  2007-03-01 15:22         ` Andrew Morton
@ 2007-03-01 20:31           ` Zachary Amsden
  2007-03-02  4:46           ` Ernie Petrides
  1 sibling, 0 replies; 8+ messages in thread
From: Zachary Amsden @ 2007-03-01 20:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rusty Russell, Daniel Hecht, Linus Torvalds, Linux Kernel Mailing List

Andrew Morton wrote:
> The handler for smp_call_function() is called with local interrupts
> disabled (from the IPI handler).
>
> So to provide a consistent call environment for that handler, on_each_cpu()
> will also disable local interrupts when making the direct call on this CPU.
>
> Similarly the !CONFIG_SMP version of on_each_cpu() disables local
> interrupts when running the caller's function.
>   

Yes, that is sensible.  Something akin to on_each_cpu(synchronize_tscs) 
would certainly not like interrupts coming in.  Similarly, acpi_nmi 
disable and probably rcu barriers as well.  The irq disable here can 
very validly be used as a barrier, but trying to ensure preserved shared 
state with irq handlers over the call is a bug.

If one had all the spare time in the world, a new "sense" of irq disable 
that communicated this fact would be nice from a static or dynamic 
checking perspective.

Zach

------
sutra I.1 - atha linushasaanam

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in on_each_cpu?
  2007-03-01 15:22         ` Andrew Morton
  2007-03-01 20:31           ` Zachary Amsden
@ 2007-03-02  4:46           ` Ernie Petrides
  1 sibling, 0 replies; 8+ messages in thread
From: Ernie Petrides @ 2007-03-02  4:46 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Andrew Morton, Rusty Russell, Daniel Hecht, Linus Torvalds,
	Linux Kernel Mailing List

On Thursday, 1-Mar-2007 at 7:22 PST, Andrew Morton wrote:

> On Thu, 01 Mar 2007 03:47:39 -0800 Zachary Amsden <zach@vmware.com> wrote:
> 
> > Rusty Russell wrote:
> > > On Thu, 2007-03-01 at 03:34 -0800, Zachary Amsden wrote:
> > >   
> > >> What would be really, really nice would be to statically check all 
> > >> callsites that issue irq disables actually keep irqs disabled.  
> > >> Presumably, there was a reason they disabled irqs, and re-enabling them 
> > >> underneath their noses, even if it is to avoid a race, breaks the logic 
> > >> behind that reason.
> > >>     
> > >
> > > For the moment, how about a BUG_ON() in on_each_cpu()?
> > >   
> > 
> > Sounds quite decent.  But why does on_each_cpu need to disable 
> > interrupts at all?  It just calls func(), then re-enables interrupts.  
> > So whatever was going to happen during func() that might not be 
> > interrupt safe could just be done in the callee, avoiding the rather 
> > expensive mess of disabling and re-enabling interrupts for those cases 
> > where it doesn't matter.
> 
> The handler for smp_call_function() is called with local interrupts
> disabled (from the IPI handler).
> 
> So to provide a consistent call environment for that handler, on_each_cpu()
> will also disable local interrupts when making the direct call on this CPU.

And further, this "consistent call environment" is *required* for correct
operation of certain callers, e.g. invalidate_bh_lrus(), whose callback
function is invalidate_bh_lru().  If invalidate_bh_lru() is called without
IRQs blocked, it might be interrupted by an IPI that causes nested execution
of that same function on behalf of another cpu's call to on_each_cpu(), and
this can lead to duplicate brelse() calls on a buf head (and ultimately to
ext3 journaling crashes due to invalid concurrent use of that buf head).

Cheers.  -ernie

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2007-03-02  4:47 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <45E6AC1A.8050608@vmware.com>
     [not found] ` <20070301024524.c7c8ea1a.akpm@linux-foundation.org>
2007-03-01 11:34   ` Bug in on_each_cpu? Zachary Amsden
2007-03-01 11:41     ` Rusty Russell
2007-03-01 11:47       ` Zachary Amsden
2007-03-01 15:22         ` Andrew Morton
2007-03-01 20:31           ` Zachary Amsden
2007-03-02  4:46           ` Ernie Petrides
2007-03-01 15:28     ` Andrew Morton
2007-03-01 20:03       ` Zachary Amsden

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.