linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86, kdump: No need to disable ioapic in crash path
@ 2012-02-02 18:12 Don Zickus
  2012-02-02 23:24 ` Eric W. Biederman
  0 siblings, 1 reply; 19+ messages in thread
From: Don Zickus @ 2012-02-02 18:12 UTC
  To: x86; +Cc: LKML, kexec-list, Don Zickus, Eric W. Biederman, Vivek Goyal

A customer of ours noticed when their machine crashed, kdump did not
work but hung instead.  Using their firmware dumping solution they
grabbed a vmcore and decoded the stacks on the cpus.  What they
noticed seemed to be a rare deadlock with the ioapic_lock.

 CPU4:
 machine_crash_shutdown
 -> machine_ops.crash_shutdown
    -> native_machine_crash_shutdown
       -> kdump_nmi_shootdown_cpus ------> Send NMI to other CPUs
       -> disable_IO_APIC
          -> clear_IO_APIC
             -> clear_IO_APIC_pin
                -> ioapic_read_entry
                   -> spin_lock_irqsave(&ioapic_lock, flags)
                   ---Infinite loop here---

 CPU0:
 do_IRQ
 -> handle_irq
    -> handle_edge_irq
        -> ack_apic_edge
           -> move_native_irq
               -> mask_IO_APIC_irq
                  -> mask_IO_APIC_irq_desc
                     -> spin_lock_irqsave(&ioapic_lock, flags)
                     ---Receive NMI here after getting spinlock---
                        -> nmi
                           -> do_nmi
                              -> crash_nmi_callback
                              ---Infinite loop here---

The problem is that although kdump tries to shut down minimal hardware,
it still needs to disable the IO APIC.  This requires spinlocks which
may be held by another cpu.  This other cpu is being held indefinitely in
an NMI context by kdump in order to serialize the crashing path.  Instant
deadlock.
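
As a rough user-space illustration of the interleaving in the two stack
traces above (not kernel code: model_lock, model_lock_bounded and
crash_path_gets_lock are names made up for this sketch, and the bounded
spin exists only so the model terminates, whereas a real
spin_lock_irqsave() would spin forever):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for ioapic_lock; this is a model, not kernel code. */
struct model_lock {
	atomic_bool held;
};

/* Try to take the lock, giving up after max_spins attempts. */
static bool model_lock_bounded(struct model_lock *l, unsigned long max_spins)
{
	for (unsigned long i = 0; i < max_spins; i++) {
		/* exchange returns the old value: false means it was free */
		if (!atomic_exchange_explicit(&l->held, true,
					      memory_order_acquire))
			return true;
	}
	return false;	/* the owner never released the lock */
}

static void model_unlock(struct model_lock *l)
{
	atomic_store_explicit(&l->held, false, memory_order_release);
}

/*
 * Replays the reported sequence in a single thread:
 *   1. "CPU0" takes the lock (as in mask_IO_APIC_irq_desc).
 *   2. If parked by the crash NMI, "CPU0" never reaches its unlock.
 *   3. "CPU4" then wants the same lock (as in ioapic_read_entry).
 * Returns true if "CPU4" manages to take the lock.
 */
bool crash_path_gets_lock(bool cpu0_parked_holding_lock)
{
	struct model_lock lock = { false };
	bool got = model_lock_bounded(&lock, 1);

	assert(got);	/* the lock starts out free, so CPU0 always gets it */
	(void)got;

	if (!cpu0_parked_holding_lock)
		model_unlock(&lock);	/* normal case: CPU0 finishes its work */

	return model_lock_bounded(&lock, 1000);
}
```

In this model crash_path_gets_lock(false) succeeds and
crash_path_gets_lock(true) fails, which is the hang described above.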

Eric brought up a point that because the boot code was restructured we may
not need to disable the io apic any more in the crash path.  The original
concern that led to the development of disable_IO_APIC was that the TSC
calibration on boot up relied on the PIT timer for reference.  Access
to the PIT required 8259 interrupts to be working.  This wouldn't work
if the ioapic needed to be configured.  So on the panic path, the ioapic was
reconfigured to use virtual wire mode to allow the 8259 to pass through.

Those concerns don't hold true now, thanks to the fast TSC calibration code
not needing the PIT.  As a result, we can remove this call and simplify the
locking needed in the panic path.

I tested kdump on an Ivy Bridge platform, a Pentium 4 and an old Athlon that
did not have an ioapic.  All three were successful.

Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>

---
I will probably need some help with my explanation as to why this line is not
needed.  Any input is appreciated!
---
 arch/x86/kernel/crash.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 13ad899..b053cf9 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -96,9 +96,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 	cpu_emergency_svm_disable();
 
 	lapic_shutdown();
-#if defined(CONFIG_X86_IO_APIC)
-	disable_IO_APIC();
-#endif
 #ifdef CONFIG_HPET_TIMER
 	hpet_disable();
 #endif
-- 
1.7.7.6



* Re: [PATCH] x86, kdump: No need to disable ioapic in crash path
  2012-02-02 18:12 [PATCH] x86, kdump: No need to disable ioapic in crash path Don Zickus
@ 2012-02-02 23:24 ` Eric W. Biederman
  2012-02-07 21:57   ` Don Zickus
  0 siblings, 1 reply; 19+ messages in thread
From: Eric W. Biederman @ 2012-02-02 23:24 UTC
  To: Don Zickus; +Cc: x86, LKML, kexec-list, Vivek Goyal

Don Zickus <dzickus@redhat.com> writes:

> A customer of ours noticed when their machine crashed, kdump did not
> work but hung instead.  Using their firmware dumping solution they
> grabbed a vmcore and decoded the stacks on the cpus.  What they
> noticed seemed to be a rare deadlock with the ioapic_lock.
>
>  CPU4:
>  machine_crash_shutdown
>  -> machine_ops.crash_shutdown
>     -> native_machine_crash_shutdown
>        -> kdump_nmi_shootdown_cpus ------> Send NMI to other CPUs
>        -> disable_IO_APIC
>           -> clear_IO_APIC
>              -> clear_IO_APIC_pin
>                 -> ioapic_read_entry
>                    -> spin_lock_irqsave(&ioapic_lock, flags)
>                    ---Infinite loop here---
>
>  CPU0:
>  do_IRQ
>  -> handle_irq
>     -> handle_edge_irq
>         -> ack_apic_edge
>            -> move_native_irq
>                -> mask_IO_APIC_irq
>                   -> mask_IO_APIC_irq_desc
>                      -> spin_lock_irqsave(&ioapic_lock, flags)
>                      ---Receive NMI here after getting spinlock---
>                         -> nmi
>                            -> do_nmi
>                               -> crash_nmi_callback
>                               ---Infinite loop here---
>
> The problem is that although kdump tries to shutdown minimal hardware,
> it still needs to disable the IO APIC.  This requires spinlocks which
> may be held by another cpu.  This other cpu is being held infinitely in
> an NMI context by kdump in order to serialize the crashing path.  Instant
> deadlock.
>
> Eric, brought up a point that because the boot code was restructured we may
> not need to disable the io apic any more in the crash path.  The original
> concern that led to the development of disable_IO_APIC, was that the TSC
> calibration on boot up relied on the PIT timer for reference.  Access
> to the PIT required 8259 interrupts to be working.  This wouldn't work
> if the ioapic needed to be configured.  So on panic path, the ioapic was
> reconfigured to use virtual wire mode to allow the 8259 to passthrough.

A small clarification: originally it was the jiffies calibration that
would fail if we could not cause the PIT to generate interrupts through
the 8259.  The boot would then hang at calibrating jiffies.
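
To make the hang concrete, here is a small user-space model of a
"wait for the next tick" loop.  It is only a sketch under assumptions:
model_timer, model_read_jiffies and model_wait_for_tick are invented
names, the tick period of 100 polls is arbitrary, and the max_polls bound
is there so the model returns instead of hanging the way an unbounded
wait on jiffies would.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tick source for the model; none of this is kernel code. */
struct model_timer {
	bool irq_delivered;		/* do PIT interrupts reach the CPU? */
	unsigned long polls_per_tick;	/* arbitrary tick period for the model */
	unsigned long polls;
	unsigned long jiffies;
};

/* Each read advances model time; a tick only lands if the IRQ is delivered. */
static unsigned long model_read_jiffies(struct model_timer *t)
{
	assert(t);
	t->polls++;
	if (t->irq_delivered && t->polls % t->polls_per_tick == 0)
		t->jiffies++;	/* what the timer interrupt handler would do */
	return t->jiffies;
}

/* Wait for jiffies to change; give up after max_polls reads. */
bool model_wait_for_tick(bool irq_delivered, unsigned long max_polls)
{
	struct model_timer t = {
		.irq_delivered = irq_delivered,
		.polls_per_tick = 100,
	};
	unsigned long start = model_read_jiffies(&t);

	for (unsigned long i = 0; i < max_polls; i++) {
		if (model_read_jiffies(&t) != start)
			return true;	/* saw a tick, calibration can go on */
	}
	return false;			/* no tick ever arrived */
}
```

With interrupts delivered the wait completes; with the PIT interrupt
never reaching the CPU, the counter does not move and the wait cannot
finish.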

> Those concerns don't hold true now, thanks to the fast TSC calibration code
> not needing the PIT.  As a result, we can remove this call and simplify the
> locking needed in the panic path.
>
> I tested kdump on an Ivy Bridge platform, a Pentium4 and an old athlon that
> did not have an ioapic.  All three were successful.
>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Signed-off-by: Don Zickus <dzickus@redhat.com>
>
> ---
> I will probably need some help with my explaination as to why this line is not
> needed.  Any input is appreciated!

Can you test and verify that we also do not need the lapic_shutdown()
call and the disable_local_APIC() call on the other processors?  The same
reasoning that supports us not needing to disable the IO_APIC also
supports us not needing to disable the local apic.

Removing disable_IO_APIC in and of itself and then booting isn't quite
sufficient as a practical test to prove this code always works.
Sometimes the IOAPIC was not hooked up to interesting interrupt sources
like the 8259.

Eric

> ---
>  arch/x86/kernel/crash.c |    3 ---
>  1 files changed, 0 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 13ad899..b053cf9 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -96,9 +96,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
>  	cpu_emergency_svm_disable();
>  
>  	lapic_shutdown();
> -#if defined(CONFIG_X86_IO_APIC)
> -	disable_IO_APIC();
> -#endif
>  #ifdef CONFIG_HPET_TIMER
>  	hpet_disable();
>  #endif


* Re: [PATCH] x86, kdump: No need to disable ioapic in crash path
  2012-02-02 23:24 ` Eric W. Biederman
@ 2012-02-07 21:57   ` Don Zickus
  2012-02-07 22:19     ` Vivek Goyal
  0 siblings, 1 reply; 19+ messages in thread
From: Don Zickus @ 2012-02-07 21:57 UTC
  To: Eric W. Biederman; +Cc: x86, LKML, kexec-list, Vivek Goyal

On Thu, Feb 02, 2012 at 03:24:46PM -0800, Eric W. Biederman wrote:
> > Eric, brought up a point that because the boot code was restructured we may
> > not need to disable the io apic any more in the crash path.  The original
> > concern that led to the development of disable_IO_APIC, was that the TSC
> > calibration on boot up relied on the PIT timer for reference.  Access
> > to the PIT required 8259 interrupts to be working.  This wouldn't work
> > if the ioapic needed to be configured.  So on panic path, the ioapic was
> > reconfigured to use virtual wire mode to allow the 8259 to passthrough.
> 
> A small clarification originally it was the jiffies calibration that
> would fail if we could cause the PIT to generate interrupts through the
> 8259.  The boot would then hang at calibrating jiffies.

Ok.  Thanks!

> 
> > Those concerns don't hold true now, thanks to the fast TSC calibration code
> > not needing the PIT.  As a result, we can remove this call and simplify the
> > locking needed in the panic path.
> >
> > I tested kdump on an Ivy Bridge platform, a Pentium4 and an old athlon that
> > did not have an ioapic.  All three were successful.
> >
> > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > Cc: Vivek Goyal <vgoyal@redhat.com>
> > Signed-off-by: Don Zickus <dzickus@redhat.com>
> >
> > ---
> > I will probably need some help with my explaination as to why this line is not
> > needed.  Any input is appreciated!
> 
> Can you test and verify that we also do not need the lapic_shutdown()
> call and the disable_local_APIC call on the other processors.  The same
> reasoning that supports us not needing to disable the IO_APIC also
> supports us not needing to disable local apic.

I did that and it seemed to work on my Ivy Bridge and core2 quad systems.

> 
> Removing disable_IO_APIC in and of itself and then booting isn't quite
> sufficient as a practical test to prove this code always works.
> Sometimes the IOAPIC was not hooked up to interesting interrupt sources
> like the 8259.

So what systems should I look for to test?

Cheers,
Don


* Re: [PATCH] x86, kdump: No need to disable ioapic in crash path
  2012-02-07 21:57   ` Don Zickus
@ 2012-02-07 22:19     ` Vivek Goyal
  2012-02-07 23:35       ` Eric W. Biederman
  0 siblings, 1 reply; 19+ messages in thread
From: Vivek Goyal @ 2012-02-07 22:19 UTC
  To: Don Zickus; +Cc: Eric W. Biederman, x86, LKML, kexec-list

On Tue, Feb 07, 2012 at 04:57:41PM -0500, Don Zickus wrote:
> On Thu, Feb 02, 2012 at 03:24:46PM -0800, Eric W. Biederman wrote:
> > > Eric, brought up a point that because the boot code was restructured we may
> > > not need to disable the io apic any more in the crash path.  The original
> > > concern that led to the development of disable_IO_APIC, was that the TSC
> > > calibration on boot up relied on the PIT timer for reference.  Access
> > > to the PIT required 8259 interrupts to be working.  This wouldn't work
> > > if the ioapic needed to be configured.  So on panic path, the ioapic was
> > > reconfigured to use virtual wire mode to allow the 8259 to passthrough.
> > 
> > A small clarification originally it was the jiffies calibration that
> > would fail if we could cause the PIT to generate interrupts through the
> > 8259.  The boot would then hang at calibrating jiffies.
> 
> Ok.  Thanks!

So now what has changed? Do we setup LAPIC and IOAPIC early enough to
receive PIT interrupts in regular mode (non-virtual wire mode) or
something else?

Thanks
Vivek


* Re: [PATCH] x86, kdump: No need to disable ioapic in crash path
  2012-02-07 22:19     ` Vivek Goyal
@ 2012-02-07 23:35       ` Eric W. Biederman
  2012-02-08 20:11         ` Don Zickus
  0 siblings, 1 reply; 19+ messages in thread
From: Eric W. Biederman @ 2012-02-07 23:35 UTC
  To: Vivek Goyal; +Cc: Don Zickus, x86, LKML, kexec-list

Vivek Goyal <vgoyal@redhat.com> writes:

> On Tue, Feb 07, 2012 at 04:57:41PM -0500, Don Zickus wrote:
>> On Thu, Feb 02, 2012 at 03:24:46PM -0800, Eric W. Biederman wrote:
>> > > Eric, brought up a point that because the boot code was restructured we may
>> > > not need to disable the io apic any more in the crash path.  The original
>> > > concern that led to the development of disable_IO_APIC, was that the TSC
>> > > calibration on boot up relied on the PIT timer for reference.  Access
>> > > to the PIT required 8259 interrupts to be working.  This wouldn't work
>> > > if the ioapic needed to be configured.  So on panic path, the ioapic was
>> > > reconfigured to use virtual wire mode to allow the 8259 to passthrough.
>> > 
>> > A small clarification originally it was the jiffies calibration that
>> > would fail if we could cause the PIT to generate interrupts through the
>> > 8259.  The boot would then hang at calibrating jiffies.
>> 
>> Ok.  Thanks!
>
> So now what has changed? Do we setup LAPIC and IOAPIC early enough to
> receive PIT interrupts in regular mode (non-virtual wire mode) or
> something else?

Yes.  Part of the Moorestown work required that this be done because
Moorestown did not support legacy mode.  Last I looked the code hadn't
been generalized beyond Moorestown but empirically it works now.

Don, as to what to test, the only case I can think of that might be spooky
is a screaming interrupt during the handover.  You might want to try
playing with lkdtm to try some of the more exotic crash scenarios.  But
all I expect further testing might reveal are places where we are not
as robust in initializing the hardware as we might be.  Things that
might have been papered over by the ioapic shutdown.

I think we are good to remove my ancient hack of disabling the ioapics,
and putting the system into PIT during the crash kernel switchover.

Eric


* Re: [PATCH] x86, kdump: No need to disable ioapic in crash path
  2012-02-07 23:35       ` Eric W. Biederman
@ 2012-02-08 20:11         ` Don Zickus
  2012-02-08 22:55           ` Eric W. Biederman
  0 siblings, 1 reply; 19+ messages in thread
From: Don Zickus @ 2012-02-08 20:11 UTC
  To: Eric W. Biederman; +Cc: Vivek Goyal, x86, LKML, kexec-list

On Tue, Feb 07, 2012 at 03:35:59PM -0800, Eric W. Biederman wrote:
> Vivek Goyal <vgoyal@redhat.com> writes:
> 
> > On Tue, Feb 07, 2012 at 04:57:41PM -0500, Don Zickus wrote:
> >> On Thu, Feb 02, 2012 at 03:24:46PM -0800, Eric W. Biederman wrote:
> >> > > Eric, brought up a point that because the boot code was restructured we may
> >> > > not need to disable the io apic any more in the crash path.  The original
> >> > > concern that led to the development of disable_IO_APIC, was that the TSC
> >> > > calibration on boot up relied on the PIT timer for reference.  Access
> >> > > to the PIT required 8259 interrupts to be working.  This wouldn't work
> >> > > if the ioapic needed to be configured.  So on panic path, the ioapic was
> >> > > reconfigured to use virtual wire mode to allow the 8259 to passthrough.
> >> > 
> >> > A small clarification originally it was the jiffies calibration that
> >> > would fail if we could cause the PIT to generate interrupts through the
> >> > 8259.  The boot would then hang at calibrating jiffies.
> >> 
> >> Ok.  Thanks!
> >
> > So now what has changed? Do we setup LAPIC and IOAPIC early enough to
> > receive PIT interrupts in regular mode (non-virtual wire mode) or
> > something else?
> 
> Yes.  Part of the Moorstown work required that this be done because
> moorsetown did not support legacy mode.  Last I looked the code hadn't
> been generalized beyond Moorsetown but empirically it works now.
> 
> Don as to what to test the only case I can think of that might be spooky
> is a screaming interrupt during the handover.  You might want to try
> playing with lkcdtm to try some of the more exotic crash scenarios.  But
> all I expect further testing might reveal are places where we are not
> as robust in initializing the hardware as we might be.  Things that
> might have been papered over by the ioapic shutdown.

I ran lkdtm by panic'ing in the interrupt handler, thus leaving the device
interrupt un-ack'd, and the apic might have been un-ack'd too (jprobes
hooked in at do_IRQ).  3 out of 3 times the second kernel came up on my
core2 quad.

Cheers,
Don


* Re: [PATCH] x86, kdump: No need to disable ioapic in crash path
  2012-02-08 20:11         ` Don Zickus
@ 2012-02-08 22:55           ` Eric W. Biederman
  2012-02-09 14:48             ` Don Zickus
  0 siblings, 1 reply; 19+ messages in thread
From: Eric W. Biederman @ 2012-02-08 22:55 UTC
  To: Don Zickus; +Cc: Vivek Goyal, x86, LKML, kexec-list

Don Zickus <dzickus@redhat.com> writes:

> On Tue, Feb 07, 2012 at 03:35:59PM -0800, Eric W. Biederman wrote:
>> Vivek Goyal <vgoyal@redhat.com> writes:
>> 
>> > On Tue, Feb 07, 2012 at 04:57:41PM -0500, Don Zickus wrote:
>> >> On Thu, Feb 02, 2012 at 03:24:46PM -0800, Eric W. Biederman wrote:
>> >> > > Eric, brought up a point that because the boot code was restructured we may
>> >> > > not need to disable the io apic any more in the crash path.  The original
>> >> > > concern that led to the development of disable_IO_APIC, was that the TSC
>> >> > > calibration on boot up relied on the PIT timer for reference.  Access
>> >> > > to the PIT required 8259 interrupts to be working.  This wouldn't work
>> >> > > if the ioapic needed to be configured.  So on panic path, the ioapic was
>> >> > > reconfigured to use virtual wire mode to allow the 8259 to passthrough.
>> >> > 
>> >> > A small clarification originally it was the jiffies calibration that
>> >> > would fail if we could cause the PIT to generate interrupts through the
>> >> > 8259.  The boot would then hang at calibrating jiffies.
>> >> 
>> >> Ok.  Thanks!
>> >
>> > So now what has changed? Do we setup LAPIC and IOAPIC early enough to
>> > receive PIT interrupts in regular mode (non-virtual wire mode) or
>> > something else?
>> 
>> Yes.  Part of the Moorstown work required that this be done because
>> moorsetown did not support legacy mode.  Last I looked the code hadn't
>> been generalized beyond Moorsetown but empirically it works now.
>> 
>> Don as to what to test the only case I can think of that might be spooky
>> is a screaming interrupt during the handover.  You might want to try
>> playing with lkcdtm to try some of the more exotic crash scenarios.  But
>> all I expect further testing might reveal are places where we are not
>> as robust in initializing the hardware as we might be.  Things that
>> might have been papered over by the ioapic shutdown.
>
> I ran lkdtm by panic'ing in the interrupt handle thus leaving device
> interrupt un-ack'd and the apic might have been un-ack'd too (jprobes
> hooked in at do_IRQ).  3 out 3 times the second kernel came up on my core2
> quad.

That sounds like more than enough basic testing for me.  Document your
testing in a patch description and let's get the unnecessary local apic
and ioapic stomping removed from the kexec on panic path.

"There were bugs.  We deleted the code that had them.  The bugs are gone
and there are no new problems" goes over very well in my book.

Eric



* Re: [PATCH] x86, kdump: No need to disable ioapic in crash path
  2012-02-08 22:55           ` Eric W. Biederman
@ 2012-02-09 14:48             ` Don Zickus
  0 siblings, 0 replies; 19+ messages in thread
From: Don Zickus @ 2012-02-09 14:48 UTC
  To: Eric W. Biederman; +Cc: Vivek Goyal, x86, LKML, kexec-list

On Wed, Feb 08, 2012 at 02:55:14PM -0800, Eric W. Biederman wrote:
> > I ran lkdtm by panic'ing in the interrupt handle thus leaving device
> > interrupt un-ack'd and the apic might have been un-ack'd too (jprobes
> > hooked in at do_IRQ).  3 out 3 times the second kernel came up on my core2
> > quad.
> 
> That sounds like more than enough basic testing for me.  Document your
> testing in a patch description and let's get the unnecessary local apic
> and ioapic stomping removed from the kexec on panic path.
> 
> There were bugs.  We deleted the code that had them.  The bugs are gone
> and there are no new problems goes over very well in my book.

Great.  Thanks.  I'll put together the patch.

Cheers,
Don


* Re: [PATCH] x86, kdump: No need to disable ioapic in crash path
  2012-05-02 19:59           ` Don Zickus
@ 2012-05-02 20:24             ` Eric W. Biederman
  0 siblings, 0 replies; 19+ messages in thread
From: Eric W. Biederman @ 2012-05-02 20:24 UTC
  To: Don Zickus; +Cc: Seiji Aguchi, x86, LKML, kexec-list, Vivek Goyal

Don Zickus <dzickus@redhat.com> writes:

> On Wed, May 02, 2012 at 12:39:06PM -0700, Eric W. Biederman wrote:
>> Seiji Aguchi <seiji.aguchi@hds.com> writes:
>> 
>> >> Perhaps calling setup_IO_APIC before setup_local_APIC would be a better fix?
>> >
>> > I checked Intel develper's manual and there is no restriction about the order of enabling IO_APIC/local_APIC.
>> > So, it may work.
>> >
>> > But, I don't understand why we have to change the stable boot-up code.
>> 
>> Because the boot-up code is buggy.  We need to get a better handle on
>> how it is buggy but apparently an interrupt coming in at the wrong
>> moment while booting with interrupts on the interrupt flag on the cpus
>> disalbed puts us in a state where we fail to boot.
>> 
>> We should be able to boot with apics enabled, and we almost can
>> emperically there are a few bugs.
>> 
>> The kdump path is particularly good at finding bugs.
>> 
>> > If kdump disables both local_apic and IO_APIC in proper way in 1st kernel,  2nd kernel works without any change.
>> 
>> We can not guarnatee disabling the local apics in the first kernel.
>> 
>> Ultimately the less we do in the first kernel the more reliable kdump is
>> going to be.  Disabling the apics has been a long standing bug work
>> around.
>> 
>> At worst we may have been a smidge premature in using assuming the
>> kernel can boot with the apics enabled but it I would hope we can
>> track down and fix the boot up code.
>> 
>> Probably what we want to do is not to disable the I/O apics but
>> to program the I/O apics before we enable the local apic so that
>> we have control of the in-comming interrupts.  But I haven't
>> looked at this in nearly enough detail to even guess what needs
>> to happen.
>
> Hi Eric,
>
> Thanks for the info.  I have don't have a problem with what you say above,
> I think that is a noble effort worth pursuing.  From a high level
> perspective, I am trying to understand how that is supposed to be
> acheived.  Getting the code to match the theory is probably easier to do
> than throw random patches/hacks at various kdump problems as they
> arise.

The very basic theory is:

--- Prepare to handle a crash (load kdump kernel etc)

panic.
locally disable interrupts
do things that can only be done in the panicking kernel
jump to purgatory

It is pretty clear from Peter Anvin's comments that we can perform a
generic NMI disable in the panicking kernel just by disabling NMI
handling in the local apic.  We need to confirm that but it sounds
like a single write.

We shoot down other cpus in a best effort in the crashing kernel because
that is the only way we can possibly get their cpu registers.

> (this leaves apics and virt stuff untouched?)

We have to disable virt stuff because you can't change cpu modes with
virt stuff enabled (trying causes faults).  But disabling the virt stuff
is just a register write.

> 2nd kernel:
>
> normal early boot stuff
> setup memory
> setup scheduler
> ...
> program ioapic/lapic??
>    #currently this is down _after_ boot cpu interrupts are enabled
>    #which seem problematic if you have leftover screaming interrupts
>    #probably a reason for this like timers or something

Yes, we need to figure out how to deal with screaming interrupts in
this stage.  Not long ago I disabled msi interrupts at pci bus
scan time for a similar reason.  The msi interrupts I encountered were
not technically screaming but I did encounter one that was firing every
couple of microseconds which is effectively the same as screaming.

Basically I don't particularly care how we do this so long as the
screaming or rapid fire interrupts don't stop the boot.

> enable boot cpu interrupts
> setup boot cpu
> setup other cpus
> ....

Eric



* Re: [PATCH] x86, kdump: No need to disable ioapic in crash path
  2012-05-02 19:39         ` Eric W. Biederman
@ 2012-05-02 19:59           ` Don Zickus
  2012-05-02 20:24             ` Eric W. Biederman
  0 siblings, 1 reply; 19+ messages in thread
From: Don Zickus @ 2012-05-02 19:59 UTC
  To: Eric W. Biederman; +Cc: Seiji Aguchi, x86, LKML, kexec-list, Vivek Goyal

On Wed, May 02, 2012 at 12:39:06PM -0700, Eric W. Biederman wrote:
> Seiji Aguchi <seiji.aguchi@hds.com> writes:
> 
> >> Perhaps calling setup_IO_APIC before setup_local_APIC would be a better fix?
> >
> > I checked Intel develper's manual and there is no restriction about the order of enabling IO_APIC/local_APIC.
> > So, it may work.
> >
> > But, I don't understand why we have to change the stable boot-up code.
> 
> Because the boot-up code is buggy.  We need to get a better handle on
> how it is buggy but apparently an interrupt coming in at the wrong
> moment while booting with interrupts on the interrupt flag on the cpus
> disalbed puts us in a state where we fail to boot.
> 
> We should be able to boot with apics enabled, and we almost can
> emperically there are a few bugs.
> 
> The kdump path is particularly good at finding bugs.
> 
> > If kdump disables both local_apic and IO_APIC in proper way in 1st kernel,  2nd kernel works without any change.
> 
> We can not guarnatee disabling the local apics in the first kernel.
> 
> Ultimately the less we do in the first kernel the more reliable kdump is
> going to be.  Disabling the apics has been a long standing bug work
> around.
> 
> At worst we may have been a smidge premature in using assuming the
> kernel can boot with the apics enabled but it I would hope we can
> track down and fix the boot up code.
> 
> Probably what we want to do is not to disable the I/O apics but
> to program the I/O apics before we enable the local apic so that
> we have control of the in-comming interrupts.  But I haven't
> looked at this in nearly enough detail to even guess what needs
> to happen.

Hi Eric,

Thanks for the info.  I don't have a problem with what you say above,
I think that is a noble effort worth pursuing.  From a high level
perspective, I am trying to understand how that is supposed to be
achieved.  Getting the code to match the theory is probably easier to do
than throwing random patches/hacks at various kdump problems as they arise.

So can I understand what your thoughts are? Are you expecting the
following in the first kernel:

panic
disable other cpus
setup 2nd kernel jumptables
disable panic cpu interrupts
idt/gdt settings??
jump to purgatory

(this leaves apics and virt stuff untouched?)
(i am ignoring nmi/mce/faults and other exceptions for now)

purgatory stuff...

2nd kernel:

normal early boot stuff
setup memory
setup scheduler
...
program ioapic/lapic??
   #currently this is done _after_ boot cpu interrupts are enabled
   #which seems problematic if you have leftover screaming interrupts
   #probably a reason for this like timers or something
enable boot cpu interrupts
setup boot cpu
setup other cpus
....

Cheers,
Don


* Re: [PATCH] x86, kdump: No need to disable ioapic in crash path
  2012-05-02 19:10       ` Seiji Aguchi
@ 2012-05-02 19:39         ` Eric W. Biederman
  2012-05-02 19:59           ` Don Zickus
  0 siblings, 1 reply; 19+ messages in thread
From: Eric W. Biederman @ 2012-05-02 19:39 UTC
  To: Seiji Aguchi; +Cc: Don Zickus, x86, LKML, kexec-list, Vivek Goyal

Seiji Aguchi <seiji.aguchi@hds.com> writes:

>> Perhaps calling setup_IO_APIC before setup_local_APIC would be a better fix?
>
> I checked Intel develper's manual and there is no restriction about the order of enabling IO_APIC/local_APIC.
> So, it may work.
>
> But, I don't understand why we have to change the stable boot-up code.

Because the boot-up code is buggy.  We need to get a better handle on
how it is buggy, but apparently an interrupt coming in at the wrong
moment while booting with the interrupt flag on the cpus disabled
puts us in a state where we fail to boot.

We should be able to boot with apics enabled, and we almost can;
empirically there are a few bugs.

The kdump path is particularly good at finding bugs.

> If kdump disables both local_apic and IO_APIC in proper way in 1st kernel,  2nd kernel works without any change.

We can not guarantee disabling the local apics in the first kernel.

Ultimately the less we do in the first kernel the more reliable kdump is
going to be.  Disabling the apics has been a long standing bug work
around.

At worst we may have been a smidge premature in assuming the
kernel can boot with the apics enabled, but I would hope we can
track down and fix the boot up code.

Probably what we want to do is not to disable the I/O apics but
to program the I/O apics before we enable the local apic so that
we have control of the incoming interrupts.  But I haven't
looked at this in nearly enough detail to even guess what needs
to happen.
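
The ordering idea can be pictured with a toy state machine.  This is a
sketch under stated assumptions, not the kernel's boot code: the
model_* names are invented, a single pin stands in for all I/O APIC
entries, and "unexpected" simply counts an interrupt that reaches the CPU
before the new kernel has programmed the pin.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy boot state; every name here is made up for the illustration. */
struct model_boot {
	bool pin_asserted;	/* device left its line asserted across the crash */
	bool pin_masked;	/* mask bit of the I/O APIC entry */
	bool ioapic_programmed;	/* new kernel has routed the pin to a handler */
	bool lapic_enabled;	/* local APIC accepts interrupts */
	unsigned unexpected;	/* interrupts seen before the kernel was ready */
};

/* Count an interrupt that arrives while nothing is set up to handle it. */
static void model_poll(struct model_boot *b)
{
	assert(b);
	if (b->pin_asserted && !b->pin_masked && b->lapic_enabled &&
	    !b->ioapic_programmed)
		b->unexpected++;
}

static void model_setup_local_apic(struct model_boot *b)
{
	b->lapic_enabled = true;
	model_poll(b);
}

static void model_setup_io_apic(struct model_boot *b)
{
	b->pin_masked = true;		/* take control of the pin first */
	b->ioapic_programmed = true;	/* route it to a known handler */
	b->pin_masked = false;
	model_poll(b);
}

/* Returns how many interrupts arrived before the kernel was ready. */
unsigned model_unexpected_irqs(bool ioapic_first)
{
	struct model_boot b = {
		.pin_asserted = true,	/* leftover from the crashed kernel */
		.pin_masked = false,	/* nobody ran disable_IO_APIC() */
	};

	if (ioapic_first) {
		model_setup_io_apic(&b);
		model_setup_local_apic(&b);
	} else {
		model_setup_local_apic(&b);
		model_setup_io_apic(&b);
	}
	return b.unexpected;
}
```

In the model, enabling the local apic first lets one stray interrupt
through, while programming the I/O apic first lets none through.  Whether
the real boot code can be reordered this way is exactly the open question
above.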

Eric




* RE: [PATCH] x86, kdump: No need to disable ioapic in crash path
  2012-04-30 20:53     ` Don Zickus
@ 2012-05-02 19:10       ` Seiji Aguchi
  2012-05-02 19:39         ` Eric W. Biederman
  0 siblings, 1 reply; 19+ messages in thread
From: Seiji Aguchi @ 2012-05-02 19:10 UTC
  To: Don Zickus, ebiederm; +Cc: x86, LKML, kexec-list, Vivek Goyal


> Perhaps calling setup_IO_APIC before setup_local_APIC would be a better fix?

I checked the Intel developer's manual and there is no restriction on the order of enabling the IO_APIC/local_APIC.
So, it may work.

But I don't understand why we have to change the stable boot-up code.
If kdump disables both the local_apic and the IO_APIC in the proper way in the 1st kernel, the 2nd kernel works without any change.

I think busting spinlocks, like ioapic_lock, in the 1st kernel is reasonable.

Seiji

> -----Original Message-----
> From: Don Zickus [mailto:dzickus@redhat.com]
> Sent: Monday, April 30, 2012 4:54 PM
> To: Seiji Aguchi; ebiederm@xmission.com
> Cc: x86@kernel.org; LKML; kexec-list; Vivek Goyal
> Subject: Re: [PATCH] x86, kdump: No need to disable ioapic in crash path
> 
> On Thu, Mar 15, 2012 at 05:16:50PM -0400, Seiji Aguchi wrote:
> > Don,
> >
> > What do you think about following scenario?
> > Disabling I/O APIC seems to be needed before booting kdump kernel.
> 
> For some reason I actually believed this was cleared before interrupts were enabled on bootup.  Apparently not.  On a virt guest I can
> easily create a scenario in which scp'ing a file then kdumping, leaves the ethernet interrupt in a triggered state.
> 
> Before this patch, it would be masked by disable_IO_APIC.  With my patch the irq never gets masked and during setup_local_APIC
> the kernel falls over once the local APIC is enabled (as setup_IO_APIC is called later).
> Perhaps calling setup_IO_APIC before setup_local_APIC would be a better fix?
> 
> Just like NMIs prohibit the ability to remove the disable local apic code, an actively triggered interrupt seems to prevent us from
> removing the disable io apic.
> 
> This leaves me with my original problem of deadlocking in the disable_IO_APIC path.
> 
> Thoughts?
> 
> Cheers,
> Don
> 
> >
> > Seiji
> >
> >
> > commit 1e75b31d638d5242ca8e9771dfdcbd28a5f041df
> > Author: Suresh Siddha <suresh.b.siddha@intel.com>
> > Date:   Thu Aug 25 12:01:11 2011 -0700
> >
> >     x86, kdump, ioapic: Reset remote-IRR in clear_IO_APIC
> >
> >     In the kdump scenario mentioned below, we can have a case where
> >     the device using level triggered interrupt will not generate any
> >     interrupts in the kdump kernel.
> >
> >     1. IO-APIC sends a level triggered interrupt to the CPU's local APIC.
> >
> >     2. Kernel crashed before the CPU services this interrupt, leaving
> >        the remote-IRR in the IO-APIC set.
> >
> >     3. kdump kernel boot sequence does clear_IO_APIC() as part of IO-APIC
> >        initialization. But this fails to reset remote-IRR bit of the
> >        IO-APIC RTE as the remote-IRR bit is read-only.
> >
> >     4. Device using that level triggered entry can't generate any
> >        more interrupts because of the remote-IRR bit.
> >
> >     In clear_IO_APIC_pin(), check if the remote-IRR bit is set and if
> >     so do an explicit attempt to clear it (by doing EOI write on
> >     modern io-apic's and changing trigger mode to edge/level on
> >     older io-apic's). Also before doing the explicit EOI to the
> >     io-apic, ensure that the trigger mode is indeed set to level.
> >     This will enable the explicit EOI to the io-apic to reset the
> >     remote-IRR bit.
> >
> >     Tested-by: Leonardo Chiquitto <lchiquitto@novell.com>
> >     Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> >     Fixes: https://bugzilla.novell.com/show_bug.cgi?id=701686
> >     Cc: Rafael Wysocki <rjw@novell.com>
> >     Cc: Maciej W. Rozycki <macro@linux-mips.org>
> >     Cc: Thomas Renninger <trenn@suse.de>
> >     Cc: jbeulich@novell.com
> >     Cc: yinghai@kernel.org
> >     Link: http://lkml.kernel.org/r/20110825190657.157502602@sbsiddha-desk.sc.intel.com
> >     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> >
> > > -----Original Message-----
> > > From: linux-kernel-owner@vger.kernel.org
> > > [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Don Zickus
> > > Sent: Thursday, March 15, 2012 4:27 PM
> > > To: x86@kernel.org
> > > Cc: LKML; kexec-list; Eric W. Biederman; Vivek Goyal
> > > Subject: Re: [PATCH] x86, kdump: No need to disable ioapic in crash
> > > path
> > >
> > > On Wed, Feb 29, 2012 at 03:08:49PM -0500, Don Zickus wrote:
> > > > A customer of ours noticed when their machine crashed, kdump did
> > > > not work but hung instead.  Using their firmware dumping solution
> > > > they grabbed a vmcore and decoded the stacks on the cpus.  What
> > > > they noticed seemed to be a rare deadlock with the ioapic_lock.
> > >
> > > While we are discussing the NMI stuff in another thread, does anyone
> > > > have any objection to committing this patch?  It fixes a real problem today.
> > >
> > > Cheers,
> > > Don
> > >
> > > >
> > > >  CPU4:
> > > >  machine_crash_shutdown
> > > >  -> machine_ops.crash_shutdown
> > > >     -> native_machine_crash_shutdown
> > > >        -> kdump_nmi_shootdown_cpus ------> Send NMI to other CPUs
> > > >        -> disable_IO_APIC
> > > >           -> clear_IO_APIC
> > > >              -> clear_IO_APIC_pin
> > > >                 -> ioapic_read_entry
> > > >                    -> spin_lock_irqsave(&ioapic_lock, flags)
> > > >                    ---Infinite loop here---
> > > >
> > > >  CPU0:
> > > >  do_IRQ
> > > >  -> handle_irq
> > > >     -> handle_edge_irq
> > > >         -> ack_apic_edge
> > > >            -> move_native_irq
> > > >                -> mask_IO_APIC_irq
> > > >                   -> mask_IO_APIC_irq_desc
> > > >                      -> spin_lock_irqsave(&ioapic_lock, flags)
> > > >                      ---Receive NMI here after getting spinlock---
> > > >                         -> nmi
> > > >                            -> do_nmi
> > > >                               -> crash_nmi_callback
> > > >                               ---Infinite loop here---
> > > >
> > > > The problem is that although kdump tries to shutdown minimal
> > > > hardware, it still needs to disable the IO APIC.  This requires
> > > > spinlocks which may be held by another cpu.  This other cpu is
> > > > being held infinitely in an NMI context by kdump in order to serialize the crashing path.
> > > > Instant deadlock.
> > > >
> > > > > Eric brought up a point that because the boot code was
> > > > restructured we may not need to disable the io apic any more in the crash path.
> > > > The original concern that led to the development of
> > > > disable_IO_APIC, was that the jiffies calibration on boot up
> > > > relied on the PIT timer for reference.  Access to the PIT required
> > > > 8259 interrupts to be working.  This wouldn't work if the ioapic needed to be configured.
> > > > So on panic path, the ioapic was reconfigured to use virtual wire mode to allow the 8259 to passthrough.
> > > >
> > > > Those concerns don't hold true now, thanks to the jiffies
> > > > calibration code not needing the PIT.  As a result, we can remove
> > > > this call and simplify the locking needed in the panic path.
> > > >
> > > > I tested kdump on an Ivy Bridge platform, a Pentium4 and an old
> > > > athlon that did not have an ioapic.  All three were successful.
> > > >
> > > > I also tested using lkdtm that would use jprobes to panic the
> > > > system when entering do_IRQ.  The idea was to see how the system
> > > > reacted with an interrupt pending in the second kernel.  My core2
> > > > > quad successfully kdump'd 3 times in a row with no issues.
> > > >
> > > > v2: removed the disable lapic code too
> > > > v3: re-add disabling of lapic code
> > > >
> > > > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > > > Cc: Vivek Goyal <vgoyal@redhat.com>
> > > > Signed-off-by: Don Zickus <dzickus@redhat.com>
> > > > ---
> > > >
> > > > There are really two problems here.  One is the deadlock of the
> > > > ioapic_lock that I describe above.  Removing the code to disable
> > > > the ioapic seems to resolve that.
> > > >
> > > > The second issue is handling non-IRQ exceptions like NMIs.  Eric
> > > > asked me to include removing the disable lapic code too.  However,
> > > > > because the nmi watchdog is still active and kexec zeros out the
> > > > idt before it jumps to purgatory, an NMI that comes in during the
> > > > transition between the first kernel and second kernel will see an empty idt and reset the cpu.
> > > >
> > > > Leaving the code to disable the lapic in, turns off perf and
> > > > blocks those NMIs from happening (though an external NMI would
> > > > still be an issue but that is no different than right now).
> > > >
> > > > I tried playing with a stub idt and leaving it in place through
> > > > the transition to the second kernel, but I can't quite get it to
> > > > work correctly.  Spinning in the first kernel before the purgatory
> > > > jump catches the idt properly.  Spinning in purgatory before the
> > > > second kernel jump doesn't.  I even disabled the zero'ing out of the idt in the purgatory code.
> > > >
> > > > I would like to get resolution on the ioapic deadlock to fix a
> > > > customer issue while working the idt and NMI thing on the side,
> > > > hence the split of this patchset.
> > > >
> > > > Hopefully, people recognize there are two issues here and that
> > > > this patch resolves the first one and the second one needs more debugging and time.
> > > > ---
> > > >  arch/x86/kernel/crash.c |    3 ---
> > > >  1 files changed, 0 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> > > > > index 13ad899..b053cf9 100644
> > > > --- a/arch/x86/kernel/crash.c
> > > > +++ b/arch/x86/kernel/crash.c
> > > > @@ -96,9 +96,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> > > >  	cpu_emergency_svm_disable();
> > > >
> > > >  	lapic_shutdown();
> > > > -#if defined(CONFIG_X86_IO_APIC)
> > > > -	disable_IO_APIC();
> > > > -#endif
> > > >  #ifdef CONFIG_HPET_TIMER
> > > >  	hpet_disable();
> > > >  #endif
> > > > --
> > > > 1.7.7.6
> > > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe
> > > linux-kernel" in the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > Please read the FAQ at  http://www.tux.org/lkml/

* Re: [PATCH] x86, kdump: No need to disable ioapic in crash path
  2012-03-15 21:16   ` Seiji Aguchi
  2012-03-15 21:33     ` Don Zickus
@ 2012-04-30 20:53     ` Don Zickus
  2012-05-02 19:10       ` Seiji Aguchi
  1 sibling, 1 reply; 19+ messages in thread
From: Don Zickus @ 2012-04-30 20:53 UTC (permalink / raw)
  To: Seiji Aguchi, ebiederm; +Cc: x86, LKML, kexec-list, Vivek Goyal

On Thu, Mar 15, 2012 at 05:16:50PM -0400, Seiji Aguchi wrote:
> Don,
> 
> What do you think about the following scenario?
> Disabling I/O APIC seems to be needed before booting kdump kernel.

For some reason I actually believed this was cleared before interrupts
were enabled on bootup.  Apparently not.  On a virt guest I can easily
create a scenario in which scp'ing a file then kdumping, leaves the ethernet
interrupt in a triggered state.

Before this patch, it would be masked by disable_IO_APIC.  With my patch
the irq never gets masked and during setup_local_APIC the kernel falls
over once the local APIC is enabled (as setup_IO_APIC is called later).
Perhaps calling setup_IO_APIC before setup_local_APIC would be a better
fix?

Just like NMIs prohibit the ability to remove the disable local apic code,
an actively triggered interrupt seems to prevent us from removing the
disable io apic.
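
To make "masked" concrete: each IO-APIC pin has a 64-bit redirection table entry (RTE), and masking a pin means setting one bit in it so the pin cannot deliver to any local APIC. The sketch below is only a model of those bits on a plain integer, assuming the classic 82093AA RTE layout (remote-IRR at bit 14, trigger mode at bit 15, mask at bit 16). The RTE_* macros and rte_* helpers are hypothetical names for illustration, not the kernel's, and nothing here touches hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in a 64-bit IO-APIC redirection table entry (assumed layout). */
#define RTE_REMOTE_IRR	(UINT64_C(1) << 14)	/* read-only: level irq awaiting EOI */
#define RTE_TRIGGER_LVL	(UINT64_C(1) << 15)	/* 1 = level triggered, 0 = edge */
#define RTE_MASKED	(UINT64_C(1) << 16)	/* 1 = pin cannot deliver interrupts */

/* Return a copy of the entry with the mask bit set. */
static uint64_t rte_mask(uint64_t rte)
{
	return rte | RTE_MASKED;
}

/* True if the pin described by this entry can still raise an interrupt. */
static bool rte_can_deliver(uint64_t rte)
{
	return !(rte & RTE_MASKED);
}

/*
 * True if a level-triggered entry is still waiting for an EOI, which is the
 * stuck state the quoted remote-IRR commit below is about.
 */
static bool rte_needs_eoi(uint64_t rte)
{
	return (rte & RTE_TRIGGER_LVL) && (rte & RTE_REMOTE_IRR);
}
```

In these terms, an entry left unmasked across the kexec jump still satisfies rte_can_deliver() when the second kernel enables its local APIC, which matches the failure described above.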

This leaves me with my original problem of deadlocking in the
disable_IO_APIC path.

Thoughts?

Cheers,
Don

> 
> Seiji
> 
> 
> commit 1e75b31d638d5242ca8e9771dfdcbd28a5f041df
> Author: Suresh Siddha <suresh.b.siddha@intel.com>
> Date:   Thu Aug 25 12:01:11 2011 -0700
> 
>     x86, kdump, ioapic: Reset remote-IRR in clear_IO_APIC
>     
>     In the kdump scenario mentioned below, we can have a case where
>     the device using level triggered interrupt will not generate any
>     interrupts in the kdump kernel.
>     
>     1. IO-APIC sends a level triggered interrupt to the CPU's local APIC.
>     
>     2. Kernel crashed before the CPU services this interrupt, leaving
>        the remote-IRR in the IO-APIC set.
>     
>     3. kdump kernel boot sequence does clear_IO_APIC() as part of IO-APIC
>        initialization. But this fails to reset remote-IRR bit of the
>        IO-APIC RTE as the remote-IRR bit is read-only.
>     
>     4. Device using that level triggered entry can't generate any
>        more interrupts because of the remote-IRR bit.
>     
>     In clear_IO_APIC_pin(), check if the remote-IRR bit is set and if
>     so do an explicit attempt to clear it (by doing EOI write on
>     modern io-apic's and changing trigger mode to edge/level on
>     older io-apic's). Also before doing the explicit EOI to the
>     io-apic, ensure that the trigger mode is indeed set to level.
>     This will enable the explicit EOI to the io-apic to reset the
>     remote-IRR bit.
>     
>     Tested-by: Leonardo Chiquitto <lchiquitto@novell.com>
>     Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
>     Fixes: https://bugzilla.novell.com/show_bug.cgi?id=701686
>     Cc: Rafael Wysocki <rjw@novell.com>
>     Cc: Maciej W. Rozycki <macro@linux-mips.org>
>     Cc: Thomas Renninger <trenn@suse.de>
>     Cc: jbeulich@novell.com
>     Cc: yinghai@kernel.org
>     Link: http://lkml.kernel.org/r/20110825190657.157502602@sbsiddha-desk.sc.intel.com
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> 
> > -----Original Message-----
> > From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Don Zickus
> > Sent: Thursday, March 15, 2012 4:27 PM
> > To: x86@kernel.org
> > Cc: LKML; kexec-list; Eric W. Biederman; Vivek Goyal
> > Subject: Re: [PATCH] x86, kdump: No need to disable ioapic in crash path
> > 
> > On Wed, Feb 29, 2012 at 03:08:49PM -0500, Don Zickus wrote:
> > > A customer of ours noticed when their machine crashed, kdump did not
> > > work but hung instead.  Using their firmware dumping solution they
> > > grabbed a vmcore and decoded the stacks on the cpus.  What they
> > > noticed seemed to be a rare deadlock with the ioapic_lock.
> > 
> > While we are discussing the NMI stuff in another thread, does anyone have any objection to committing this patch?  It fixes a real
> > problem today.
> > 
> > Cheers,
> > Don
> > 
> > >
> > >  CPU4:
> > >  machine_crash_shutdown
> > >  -> machine_ops.crash_shutdown
> > >     -> native_machine_crash_shutdown
> > >        -> kdump_nmi_shootdown_cpus ------> Send NMI to other CPUs
> > >        -> disable_IO_APIC
> > >           -> clear_IO_APIC
> > >              -> clear_IO_APIC_pin
> > >                 -> ioapic_read_entry
> > >                    -> spin_lock_irqsave(&ioapic_lock, flags)
> > >                    ---Infinite loop here---
> > >
> > >  CPU0:
> > >  do_IRQ
> > >  -> handle_irq
> > >     -> handle_edge_irq
> > >         -> ack_apic_edge
> > >            -> move_native_irq
> > >                -> mask_IO_APIC_irq
> > >                   -> mask_IO_APIC_irq_desc
> > >                      -> spin_lock_irqsave(&ioapic_lock, flags)
> > >                      ---Receive NMI here after getting spinlock---
> > >                         -> nmi
> > >                            -> do_nmi
> > >                               -> crash_nmi_callback
> > >                               ---Infinite loop here---
> > >
> > > The problem is that although kdump tries to shutdown minimal hardware,
> > > it still needs to disable the IO APIC.  This requires spinlocks which
> > > may be held by another cpu.  This other cpu is being held infinitely
> > > in an NMI context by kdump in order to serialize the crashing path.
> > > Instant deadlock.
> > >
> > > Eric brought up a point that because the boot code was restructured
> > > we may not need to disable the io apic any more in the crash path.
> > > The original concern that led to the development of disable_IO_APIC,
> > > was that the jiffies calibration on boot up relied on the PIT timer
> > > for reference.  Access to the PIT required 8259 interrupts to be
> > > working.  This wouldn't work if the ioapic needed to be configured.
> > > So on panic path, the ioapic was reconfigured to use virtual wire mode to allow the 8259 to passthrough.
> > >
> > > Those concerns don't hold true now, thanks to the jiffies calibration
> > > code not needing the PIT.  As a result, we can remove this call and
> > > simplify the locking needed in the panic path.
> > >
> > > I tested kdump on an Ivy Bridge platform, a Pentium4 and an old athlon
> > > that did not have an ioapic.  All three were successful.
> > >
> > > I also tested using lkdtm that would use jprobes to panic the system
> > > when entering do_IRQ.  The idea was to see how the system reacted with
> > > an interrupt pending in the second kernel.  My core2 quad successfully
> > > kdump'd 3 times in a row with no issues.
> > >
> > > v2: removed the disable lapic code too
> > > v3: re-add disabling of lapic code
> > >
> > > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > > Cc: Vivek Goyal <vgoyal@redhat.com>
> > > Signed-off-by: Don Zickus <dzickus@redhat.com>
> > > ---
> > >
> > > There are really two problems here.  One is the deadlock of the
> > > ioapic_lock that I describe above.  Removing the code to disable the
> > > ioapic seems to resolve that.
> > >
> > > The second issue is handling non-IRQ exceptions like NMIs.  Eric asked
> > > me to include removing the disable lapic code too.  However, because
> > > the nmi watchdog is still active and kexec zeros out the idt before it
> > > jumps to purgatory, an NMI that comes in during the transition between
> > > the first kernel and second kernel will see an empty idt and reset the cpu.
> > >
> > > Leaving the code to disable the lapic in, turns off perf and blocks
> > > those NMIs from happening (though an external NMI would still be an
> > > issue but that is no different than right now).
> > >
> > > I tried playing with a stub idt and leaving it in place through the
> > > transition to the second kernel, but I can't quite get it to work
> > > correctly.  Spinning in the first kernel before the purgatory jump
> > > catches the idt properly.  Spinning in purgatory before the second
> > > kernel jump doesn't.  I even disabled the zero'ing out of the idt in the purgatory code.
> > >
> > > I would like to get resolution on the ioapic deadlock to fix a
> > > customer issue while working the idt and NMI thing on the side, hence
> > > the split of this patchset.
> > >
> > > Hopefully, people recognize there are two issues here and that this
> > > patch resolves the first one and the second one needs more debugging and time.
> > > ---
> > >  arch/x86/kernel/crash.c |    3 ---
> > >  1 files changed, 0 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> > > index 13ad899..b053cf9 100644
> > > --- a/arch/x86/kernel/crash.c
> > > +++ b/arch/x86/kernel/crash.c
> > > @@ -96,9 +96,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> > >  	cpu_emergency_svm_disable();
> > >
> > >  	lapic_shutdown();
> > > -#if defined(CONFIG_X86_IO_APIC)
> > > -	disable_IO_APIC();
> > > -#endif
> > >  #ifdef CONFIG_HPET_TIMER
> > >  	hpet_disable();
> > >  #endif
> > > --
> > > 1.7.7.6
> > >

* Re: [PATCH] x86, kdump: No need to disable ioapic in crash path
  2012-02-29 20:08 Don Zickus
  2012-03-15 20:26 ` Don Zickus
@ 2012-03-29 16:02 ` Don Zickus
  1 sibling, 0 replies; 19+ messages in thread
From: Don Zickus @ 2012-03-29 16:02 UTC (permalink / raw)
  To: x86; +Cc: LKML, kexec-list, Eric W. Biederman, Vivek Goyal

On Wed, Feb 29, 2012 at 03:08:49PM -0500, Don Zickus wrote:
> A customer of ours noticed when their machine crashed, kdump did not
> work but hung instead.  Using their firmware dumping solution they
> grabbed a vmcore and decoded the stacks on the cpus.  What they
> noticed seemed to be a rare deadlock with the ioapic_lock.

Any feedback?

Cheers,
Don

> 
>  CPU4:
>  machine_crash_shutdown
>  -> machine_ops.crash_shutdown
>     -> native_machine_crash_shutdown
>        -> kdump_nmi_shootdown_cpus ------> Send NMI to other CPUs
>        -> disable_IO_APIC
>           -> clear_IO_APIC
>              -> clear_IO_APIC_pin
>                 -> ioapic_read_entry
>                    -> spin_lock_irqsave(&ioapic_lock, flags)
>                    ---Infinite loop here---
> 
>  CPU0:
>  do_IRQ
>  -> handle_irq
>     -> handle_edge_irq
>         -> ack_apic_edge
>            -> move_native_irq
>                -> mask_IO_APIC_irq
>                   -> mask_IO_APIC_irq_desc
>                      -> spin_lock_irqsave(&ioapic_lock, flags)
>                      ---Receive NMI here after getting spinlock---
>                         -> nmi
>                            -> do_nmi
>                               -> crash_nmi_callback
>                               ---Infinite loop here---
> 
> The problem is that although kdump tries to shutdown minimal hardware,
> it still needs to disable the IO APIC.  This requires spinlocks which
> may be held by another cpu.  This other cpu is being held infinitely in
> an NMI context by kdump in order to serialize the crashing path.  Instant
> deadlock.
> 
> Eric brought up a point that because the boot code was restructured we may
> not need to disable the io apic any more in the crash path.  The original
> concern that led to the development of disable_IO_APIC, was that the jiffies
> calibration on boot up relied on the PIT timer for reference.  Access
> to the PIT required 8259 interrupts to be working.  This wouldn't work
> if the ioapic needed to be configured.  So on panic path, the ioapic was
> reconfigured to use virtual wire mode to allow the 8259 to passthrough.
> 
> Those concerns don't hold true now, thanks to the jiffies calibration code
> not needing the PIT.  As a result, we can remove this call and simplify the
> locking needed in the panic path.
> 
> I tested kdump on an Ivy Bridge platform, a Pentium4 and an old athlon that
> did not have an ioapic.  All three were successful.
> 
> I also tested using lkdtm that would use jprobes to panic the system when
> entering do_IRQ.  The idea was to see how the system reacted with an
> interrupt pending in the second kernel.  My core2 quad successfully kdump'd
> 3 times in a row with no issues.
> 
> v2: removed the disable lapic code too
> v3: re-add disabling of lapic code
> 
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Signed-off-by: Don Zickus <dzickus@redhat.com>
> ---
> 
> There are really two problems here.  One is the deadlock of the ioapic_lock
> that I describe above.  Removing the code to disable the ioapic seems to
> resolve that.
> 
> The second issue is handling non-IRQ exceptions like NMIs.  Eric asked me
> to include removing the disable lapic code too.  However, because the nmi
> watchdog is still active and kexec zeros out the idt before it jumps to
> purgatory, an NMI that comes in during the transition between the first
> kernel and second kernel will see an empty idt and reset the cpu.
> 
> Leaving the code to disable the lapic in, turns off perf and blocks those NMIs
> from happening (though an external NMI would still be an issue but that is no
> different than right now).
> 
> I tried playing with a stub idt and leaving it in place through the transition
> to the second kernel, but I can't quite get it to work correctly.  Spinning in the
> first kernel before the purgatory jump catches the idt properly.  Spinning in
> purgatory before the second kernel jump doesn't.  I even disabled the zero'ing
> out of the idt in the purgatory code.
> 
> I would like to get resolution on the ioapic deadlock to fix a customer issue
> while working the idt and NMI thing on the side, hence the split of this
> patchset.
> 
> Hopefully, people recognize there are two issues here and that this patch
> resolves the first one and the second one needs more debugging and time.
> ---
>  arch/x86/kernel/crash.c |    3 ---
>  1 files changed, 0 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 13ad899..b053cf9 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -96,9 +96,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
>  	cpu_emergency_svm_disable();
>  
>  	lapic_shutdown();
> -#if defined(CONFIG_X86_IO_APIC)
> -	disable_IO_APIC();
> -#endif
>  #ifdef CONFIG_HPET_TIMER
>  	hpet_disable();
>  #endif
> -- 
> 1.7.7.6
> 

* RE: [PATCH] x86, kdump: No need to disable ioapic in crash path
  2012-03-15 21:33     ` Don Zickus
@ 2012-03-15 21:37       ` Seiji Aguchi
  0 siblings, 0 replies; 19+ messages in thread
From: Seiji Aguchi @ 2012-03-15 21:37 UTC (permalink / raw)
  To: Don Zickus; +Cc: x86, LKML, kexec-list, Eric W. Biederman, Vivek Goyal

Don,

Thank you.
I understand.

Seiji

> -----Original Message-----
> From: Don Zickus [mailto:dzickus@redhat.com]
> Sent: Thursday, March 15, 2012 5:34 PM
> To: Seiji Aguchi
> Cc: x86@kernel.org; LKML; kexec-list; Eric W. Biederman; Vivek Goyal
> Subject: Re: [PATCH] x86, kdump: No need to disable ioapic in crash path
> 
> On Thu, Mar 15, 2012 at 05:16:50PM -0400, Seiji Aguchi wrote:
> > Don,
> >
> > What do you think about the following scenario?
> > Disabling I/O APIC seems to be needed before booting kdump kernel.
> 
> This patch was for *booting* the second kernel.  Kexec/kdump disables interrupts before jumping into the second kernel, so it
> doesn't matter what is done with the i/o apic as interrupts are blocked until the second kernel boots up and deals with them.
> 
> It is Suresh's patch that helps enable a patch like the one I proposed.
> 
> Cheers,
> Don
> 
> >
> > Seiji
> >
> >
> > commit 1e75b31d638d5242ca8e9771dfdcbd28a5f041df
> > Author: Suresh Siddha <suresh.b.siddha@intel.com>
> > Date:   Thu Aug 25 12:01:11 2011 -0700
> >
> >     x86, kdump, ioapic: Reset remote-IRR in clear_IO_APIC
> >
> >     In the kdump scenario mentioned below, we can have a case where
> >     the device using level triggered interrupt will not generate any
> >     interrupts in the kdump kernel.
> >
> >     1. IO-APIC sends a level triggered interrupt to the CPU's local APIC.
> >
> >     2. Kernel crashed before the CPU services this interrupt, leaving
> >        the remote-IRR in the IO-APIC set.
> >
> >     3. kdump kernel boot sequence does clear_IO_APIC() as part of IO-APIC
> >        initialization. But this fails to reset remote-IRR bit of the
> >        IO-APIC RTE as the remote-IRR bit is read-only.
> >
> >     4. Device using that level triggered entry can't generate any
> >        more interrupts because of the remote-IRR bit.
> >
> >     In clear_IO_APIC_pin(), check if the remote-IRR bit is set and if
> >     so do an explicit attempt to clear it (by doing EOI write on
> >     modern io-apic's and changing trigger mode to edge/level on
> >     older io-apic's). Also before doing the explicit EOI to the
> >     io-apic, ensure that the trigger mode is indeed set to level.
> >     This will enable the explicit EOI to the io-apic to reset the
> >     remote-IRR bit.
> >
> >     Tested-by: Leonardo Chiquitto <lchiquitto@novell.com>
> >     Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> >     Fixes: https://bugzilla.novell.com/show_bug.cgi?id=701686
> >     Cc: Rafael Wysocki <rjw@novell.com>
> >     Cc: Maciej W. Rozycki <macro@linux-mips.org>
> >     Cc: Thomas Renninger <trenn@suse.de>
> >     Cc: jbeulich@novell.com
> >     Cc: yinghai@kernel.org
> >     Link: http://lkml.kernel.org/r/20110825190657.157502602@sbsiddha-desk.sc.intel.com
> >     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> >
> > > -----Original Message-----
> > > From: linux-kernel-owner@vger.kernel.org
> > > [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Don Zickus
> > > Sent: Thursday, March 15, 2012 4:27 PM
> > > To: x86@kernel.org
> > > Cc: LKML; kexec-list; Eric W. Biederman; Vivek Goyal
> > > Subject: Re: [PATCH] x86, kdump: No need to disable ioapic in crash
> > > path
> > >
> > > On Wed, Feb 29, 2012 at 03:08:49PM -0500, Don Zickus wrote:
> > > > A customer of ours noticed when their machine crashed, kdump did
> > > > not work but hung instead.  Using their firmware dumping solution
> > > > they grabbed a vmcore and decoded the stacks on the cpus.  What
> > > > they noticed seemed to be a rare deadlock with the ioapic_lock.
> > >
> > > While we are discussing the NMI stuff in another thread, does anyone
> > > > have any objection to committing this patch?  It fixes a real problem today.
> > >
> > > Cheers,
> > > Don
> > >
> > > >
> > > >  CPU4:
> > > >  machine_crash_shutdown
> > > >  -> machine_ops.crash_shutdown
> > > >     -> native_machine_crash_shutdown
> > > >        -> kdump_nmi_shootdown_cpus ------> Send NMI to other CPUs
> > > >        -> disable_IO_APIC
> > > >           -> clear_IO_APIC
> > > >              -> clear_IO_APIC_pin
> > > >                 -> ioapic_read_entry
> > > >                    -> spin_lock_irqsave(&ioapic_lock, flags)
> > > >                    ---Infinite loop here---
> > > >
> > > >  CPU0:
> > > >  do_IRQ
> > > >  -> handle_irq
> > > >     -> handle_edge_irq
> > > >         -> ack_apic_edge
> > > >            -> move_native_irq
> > > >                -> mask_IO_APIC_irq
> > > >                   -> mask_IO_APIC_irq_desc
> > > >                      -> spin_lock_irqsave(&ioapic_lock, flags)
> > > >                      ---Receive NMI here after getting spinlock---
> > > >                         -> nmi
> > > >                            -> do_nmi
> > > >                               -> crash_nmi_callback
> > > >                               ---Infinite loop here---
> > > >
> > > > The problem is that although kdump tries to shutdown minimal
> > > > hardware, it still needs to disable the IO APIC.  This requires
> > > > spinlocks which may be held by another cpu.  This other cpu is
> > > > being held infinitely in an NMI context by kdump in order to serialize the crashing path.
> > > > Instant deadlock.
> > > >
> > > > > Eric brought up a point that because the boot code was
> > > > restructured we may not need to disable the io apic any more in the crash path.
> > > > The original concern that led to the development of
> > > > disable_IO_APIC, was that the jiffies calibration on boot up
> > > > relied on the PIT timer for reference.  Access to the PIT required
> > > > 8259 interrupts to be working.  This wouldn't work if the ioapic needed to be configured.
> > > > So on panic path, the ioapic was reconfigured to use virtual wire mode to allow the 8259 to passthrough.
> > > >
> > > > Those concerns don't hold true now, thanks to the jiffies
> > > > calibration code not needing the PIT.  As a result, we can remove
> > > > this call and simplify the locking needed in the panic path.
> > > >
> > > > I tested kdump on an Ivy Bridge platform, a Pentium4 and an old
> > > > athlon that did not have an ioapic.  All three were successful.
> > > >
> > > > I also tested using lkdtm that would use jprobes to panic the
> > > > system when entering do_IRQ.  The idea was to see how the system
> > > > reacted with an interrupt pending in the second kernel.  My core2
> > > > > quad successfully kdump'd 3 times in a row with no issues.
> > > >
> > > > v2: removed the disable lapic code too
> > > > v3: re-add disabling of lapic code
> > > >
> > > > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > > > Cc: Vivek Goyal <vgoyal@redhat.com>
> > > > Signed-off-by: Don Zickus <dzickus@redhat.com>
> > > > ---
> > > >
> > > > There are really two problems here.  One is the deadlock of the
> > > > ioapic_lock that I describe above.  Removing the code to disable
> > > > the ioapic seems to resolve that.
> > > >
> > > > The second issue is handling non-IRQ exceptions like NMIs.  Eric
> > > > asked me to include removing the disable lapic code too.  However,
> > > > > because the nmi watchdog is still active and kexec zeros out the
> > > > idt before it jumps to purgatory, an NMI that comes in during the
> > > > transition between the first kernel and second kernel will see an empty idt and reset the cpu.
> > > >
> > > > Leaving the code to disable the lapic in, turns off perf and
> > > > blocks those NMIs from happening (though an external NMI would
> > > > still be an issue but that is no different than right now).
> > > >
> > > > I tried playing with a stub idt and leaving it in place through
> > > > the transition to the second kernel, but I can't quite get it to
> > > > work correctly.  Spinning in the first kernel before the purgatory
> > > > jump catches the idt properly.  Spinning in purgatory before the
> > > > second kernel jump doesn't.  I even disabled the zero'ing out of the idt in the purgatory code.
> > > >
> > > > I would like to get resolution on the ioapic deadlock to fix a
> > > > customer issue while working the idt and NMI thing on the side,
> > > > hence the split of this patchset.
> > > >
> > > > Hopefully, people recognize there are two issues here and that
> > > > this patch resolves the first one and the second one needs more debugging and time.
> > > > ---
> > > >  arch/x86/kernel/crash.c |    3 ---
> > > >  1 files changed, 0 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> > > > index 13ad899..b053cf9 100644
> > > > --- a/arch/x86/kernel/crash.c
> > > > +++ b/arch/x86/kernel/crash.c
> > > > @@ -96,9 +96,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> > > >  	cpu_emergency_svm_disable();
> > > >
> > > >  	lapic_shutdown();
> > > > -#if defined(CONFIG_X86_IO_APIC)
> > > > -	disable_IO_APIC();
> > > > -#endif
> > > >  #ifdef CONFIG_HPET_TIMER
> > > >  	hpet_disable();
> > > >  #endif
> > > > --
> > > > 1.7.7.6
> > > >


* Re: [PATCH] x86, kdump: No need to disable ioapic in crash path
  2012-03-15 21:16   ` Seiji Aguchi
@ 2012-03-15 21:33     ` Don Zickus
  2012-03-15 21:37       ` Seiji Aguchi
  2012-04-30 20:53     ` Don Zickus
  1 sibling, 1 reply; 19+ messages in thread
From: Don Zickus @ 2012-03-15 21:33 UTC (permalink / raw)
  To: Seiji Aguchi; +Cc: x86, LKML, kexec-list, Eric W. Biederman, Vivek Goyal

On Thu, Mar 15, 2012 at 05:16:50PM -0400, Seiji Aguchi wrote:
> Don,
> 
> What do you think about the following scenario?
> Disabling the I/O APIC seems to be needed before booting the kdump kernel.

This patch was for *booting* the second kernel.  Kexec/kdump disables
interrupts before jumping into the second kernel, so it doesn't matter
what is done with the i/o apic as interrupts are blocked until the second
kernel boots up and deals with them.

It is Suresh's patch that helps enable a patch like the one I proposed.

Cheers,
Don

> 
> Seiji
> 
> 
> commit 1e75b31d638d5242ca8e9771dfdcbd28a5f041df
> Author: Suresh Siddha <suresh.b.siddha@intel.com>
> Date:   Thu Aug 25 12:01:11 2011 -0700
> 
>     x86, kdump, ioapic: Reset remote-IRR in clear_IO_APIC
>     
>     In the kdump scenario mentioned below, we can have a case where
>     the device using level triggered interrupt will not generate any
>     interrupts in the kdump kernel.
>     
>     1. IO-APIC sends a level triggered interrupt to the CPU's local APIC.
>     
>     2. Kernel crashed before the CPU services this interrupt, leaving
>        the remote-IRR in the IO-APIC set.
>     
>     3. kdump kernel boot sequence does clear_IO_APIC() as part of IO-APIC
>        initialization. But this fails to reset remote-IRR bit of the
>        IO-APIC RTE as the remote-IRR bit is read-only.
>     
>     4. Device using that level triggered entry can't generate any
>        more interrupts because of the remote-IRR bit.
>     
>     In clear_IO_APIC_pin(), check if the remote-IRR bit is set and if
>     so do an explicit attempt to clear it (by doing EOI write on
>     modern io-apic's and changing trigger mode to edge/level on
>     older io-apic's). Also before doing the explicit EOI to the
>     io-apic, ensure that the trigger mode is indeed set to level.
>     This will enable the explicit EOI to the io-apic to reset the
>     remote-IRR bit.
>     
>     Tested-by: Leonardo Chiquitto <lchiquitto@novell.com>
>     Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
>     Fixes: https://bugzilla.novell.com/show_bug.cgi?id=701686
>     Cc: Rafael Wysocki <rjw@novell.com>
>     Cc: Maciej W. Rozycki <macro@linux-mips.org>
>     Cc: Thomas Renninger <trenn@suse.de>
>     Cc: jbeulich@novell.com
>     Cc: yinghai@kernel.org
>     Link: http://lkml.kernel.org/r/20110825190657.157502602@sbsiddha-desk.sc.intel.com
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> 
> > -----Original Message-----
> > From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Don Zickus
> > Sent: Thursday, March 15, 2012 4:27 PM
> > To: x86@kernel.org
> > Cc: LKML; kexec-list; Eric W. Biederman; Vivek Goyal
> > Subject: Re: [PATCH] x86, kdump: No need to disable ioapic in crash path
> > 
> > On Wed, Feb 29, 2012 at 03:08:49PM -0500, Don Zickus wrote:
> > > A customer of ours noticed when their machine crashed, kdump did not
> > > work but hung instead.  Using their firmware dumping solution they
> > > grabbed a vmcore and decoded the stacks on the cpus.  What they
> > > noticed seemed to be a rare deadlock with the ioapic_lock.
> > 
> > While we are discussing the NMI stuff in another thread, does anyone
> > have any objection to committing this patch?  It fixes a real problem
> > today.
> > 
> > Cheers,
> > Don
> > 
> > >
> > >  CPU4:
> > >  machine_crash_shutdown
> > >  -> machine_ops.crash_shutdown
> > >     -> native_machine_crash_shutdown
> > >        -> kdump_nmi_shootdown_cpus ------> Send NMI to other CPUs
> > >        -> disable_IO_APIC
> > >           -> clear_IO_APIC
> > >              -> clear_IO_APIC_pin
> > >                 -> ioapic_read_entry
> > >                    -> spin_lock_irqsave(&ioapic_lock, flags)
> > >                    ---Infinite loop here---
> > >
> > >  CPU0:
> > >  do_IRQ
> > >  -> handle_irq
> > >     -> handle_edge_irq
> > >         -> ack_apic_edge
> > >            -> move_native_irq
> > >                -> mask_IO_APIC_irq
> > >                   -> mask_IO_APIC_irq_desc
> > >                      -> spin_lock_irqsave(&ioapic_lock, flags)
> > >                      ---Receive NMI here after getting spinlock---
> > >                         -> nmi
> > >                            -> do_nmi
> > >                               -> crash_nmi_callback
> > >                               ---Infinite loop here---
> > >
> > > The problem is that although kdump tries to shutdown minimal hardware,
> > > it still needs to disable the IO APIC.  This requires spinlocks which
> > > may be held by another cpu.  This other cpu is being held infinitely
> > > in an NMI context by kdump in order to serialize the crashing path.
> > > Instant deadlock.
> > >
> > > Eric, brought up a point that because the boot code was restructured
> > > we may not need to disable the io apic any more in the crash path.
> > > The original concern that led to the development of disable_IO_APIC,
> > > was that the jiffies calibration on boot up relied on the PIT timer
> > > for reference.  Access to the PIT required 8259 interrupts to be
> > > working.  This wouldn't work if the ioapic needed to be configured.
> > > So on panic path, the ioapic was reconfigured to use virtual wire mode to allow the 8259 to passthrough.
> > >
> > > Those concerns don't hold true now, thanks to the jiffies calibration
> > > code not needing the PIT.  As a result, we can remove this call and
> > > simplify the locking needed in the panic path.
> > >
> > > I tested kdump on an Ivy Bridge platform, a Pentium4 and an old athlon
> > > that did not have an ioapic.  All three were successful.
> > >
> > > I also tested using lkdtm that would use jprobes to panic the system
> > > when entering do_IRQ.  The idea was to see how the system reacted with
> > > an interrupt pending in the second kernel.  My core2 quad successfully
> > > kdump'd 3 times in a row with no issues.
> > >
> > > v2: removed the disable lapic code too
> > > v3: re-add disabling of lapic code
> > >
> > > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > > Cc: Vivek Goyal <vgoyal@redhat.com>
> > > Signed-off-by: Don Zickus <dzickus@redhat.com>
> > > ---
> > >
> > > There are really two problems here.  One is the deadlock of the
> > > ioapic_lock that I describe above.  Removing the code to disable the
> > > ioapic seems to resolve that.
> > >
> > > The second issue is handling non-IRQ exceptions like NMIs.  Eric asked
> > > me to include removing the disable lapic code too.  However, because
> > > the nmi watchdog is still active and kexec zeros out the idt before it
> > > jumps to purgatory, an NMI that comes in during the transition between
> > > the first kernel and second kernel will see an empty idt and reset the cpu.
> > >
> > > Leaving the code to disable the lapic in, turns off perf and blocks
> > > those NMIs from happening (though an external NMI would still be an
> > > issue but that is no different than right now).
> > >
> > > I tried playing with a stub idt and leaving it in place through the
> > > transition to the second kernel, but I can't quite get it to work
> > > correctly.  Spinning in the first kernel before the purgatory jump
> > > catches the idt properly.  Spinning in purgatory before the second
> > > kernel jump doesn't.  I even disabled the zero'ing out of the idt in the purgatory code.
> > >
> > > I would like to get resolution on the ioapic deadlock to fix a
> > > customer issue while working the idt and NMI thing on the side, hence
> > > the split of this patchset.
> > >
> > > Hopefully, people recognize there are two issues here and that this
> > > patch resolves the first one and the second one needs more debugging and time.
> > > ---
> > >  arch/x86/kernel/crash.c |    3 ---
> > >  1 files changed, 0 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> > > index 13ad899..b053cf9 100644
> > > --- a/arch/x86/kernel/crash.c
> > > +++ b/arch/x86/kernel/crash.c
> > > @@ -96,9 +96,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> > >  	cpu_emergency_svm_disable();
> > >
> > >  	lapic_shutdown();
> > > -#if defined(CONFIG_X86_IO_APIC)
> > > -	disable_IO_APIC();
> > > -#endif
> > >  #ifdef CONFIG_HPET_TIMER
> > >  	hpet_disable();
> > >  #endif
> > > --
> > > 1.7.7.6
> > >


* RE: [PATCH] x86, kdump: No need to disable ioapic in crash path
  2012-03-15 20:26 ` Don Zickus
@ 2012-03-15 21:16   ` Seiji Aguchi
  2012-03-15 21:33     ` Don Zickus
  2012-04-30 20:53     ` Don Zickus
  0 siblings, 2 replies; 19+ messages in thread
From: Seiji Aguchi @ 2012-03-15 21:16 UTC (permalink / raw)
  To: Don Zickus, x86; +Cc: LKML, kexec-list, Eric W. Biederman, Vivek Goyal

Don,

What do you think about the following scenario?
Disabling the I/O APIC seems to be needed before booting the kdump kernel.

Seiji


commit 1e75b31d638d5242ca8e9771dfdcbd28a5f041df
Author: Suresh Siddha <suresh.b.siddha@intel.com>
Date:   Thu Aug 25 12:01:11 2011 -0700

    x86, kdump, ioapic: Reset remote-IRR in clear_IO_APIC
    
    In the kdump scenario mentioned below, we can have a case where
    the device using level triggered interrupt will not generate any
    interrupts in the kdump kernel.
    
    1. IO-APIC sends a level triggered interrupt to the CPU's local APIC.
    
    2. Kernel crashed before the CPU services this interrupt, leaving
       the remote-IRR in the IO-APIC set.
    
    3. kdump kernel boot sequence does clear_IO_APIC() as part of IO-APIC
       initialization. But this fails to reset remote-IRR bit of the
       IO-APIC RTE as the remote-IRR bit is read-only.
    
    4. Device using that level triggered entry can't generate any
       more interrupts because of the remote-IRR bit.
    
    In clear_IO_APIC_pin(), check if the remote-IRR bit is set and if
    so do an explicit attempt to clear it (by doing EOI write on
    modern io-apic's and changing trigger mode to edge/level on
    older io-apic's). Also before doing the explicit EOI to the
    io-apic, ensure that the trigger mode is indeed set to level.
    This will enable the explicit EOI to the io-apic to reset the
    remote-IRR bit.
    
    Tested-by: Leonardo Chiquitto <lchiquitto@novell.com>
    Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
    Fixes: https://bugzilla.novell.com/show_bug.cgi?id=701686
    Cc: Rafael Wysocki <rjw@novell.com>
    Cc: Maciej W. Rozycki <macro@linux-mips.org>
    Cc: Thomas Renninger <trenn@suse.de>
    Cc: jbeulich@novell.com
    Cc: yinghai@kernel.org
    Link: http://lkml.kernel.org/r/20110825190657.157502602@sbsiddha-desk.sc.intel.com
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
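
The remote-IRR behaviour described in the commit message above can be
illustrated with a small user-space model.  This is only a sketch under
stated assumptions, not the kernel's clear_IO_APIC_pin(): struct model_pin
and every model_* function are hypothetical names, and the model assumes
(as the commit message says) that a plain RTE write cannot change
remote-IRR and that an explicit EOI only clears it on an entry programmed
as level triggered.  The bit positions are those of the low word of a
redirection table entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits in the low 32 bits of an IO-APIC redirection table entry (RTE). */
#define RTE_REMOTE_IRR	(1u << 14)	/* set while a level IRQ awaits EOI */
#define RTE_TRIGGER_LVL	(1u << 15)	/* 0 = edge, 1 = level */
#define RTE_MASKED	(1u << 16)

/* Hypothetical model of a single IO-APIC pin. */
struct model_pin {
	uint32_t rte_low;
};

/* Model of a register write: remote-IRR is read-only and is preserved. */
static void model_rte_write(struct model_pin *pin, uint32_t val)
{
	pin->rte_low = (val & ~RTE_REMOTE_IRR) |
		       (pin->rte_low & RTE_REMOTE_IRR);
}

/* Model of an explicit EOI: only takes effect on level-triggered entries. */
static void model_ioapic_eoi(struct model_pin *pin)
{
	if (pin->rte_low & RTE_TRIGGER_LVL)
		pin->rte_low &= ~RTE_REMOTE_IRR;
}

/* Old behaviour: just write a masked entry; remote-IRR survives. */
static void model_clear_pin_plain(struct model_pin *pin)
{
	model_rte_write(pin, RTE_MASKED);
}

/*
 * Behaviour described in the commit: if remote-IRR is set, force the
 * entry to masked + level, issue the EOI, then write the masked entry.
 * Returns true if an explicit EOI was needed.
 */
static bool model_clear_pin_with_eoi(struct model_pin *pin)
{
	bool needed = (pin->rte_low & RTE_REMOTE_IRR) != 0;

	if (needed) {
		model_rte_write(pin, RTE_MASKED | RTE_TRIGGER_LVL);
		model_ioapic_eoi(pin);
	}
	model_rte_write(pin, RTE_MASKED);
	return needed;
}
```

In this model a pin left with remote-IRR set keeps the bit after
model_clear_pin_plain(), while model_clear_pin_with_eoi() leaves the entry
masked with remote-IRR clear, even if the entry was programmed as edge.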

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Don Zickus
> Sent: Thursday, March 15, 2012 4:27 PM
> To: x86@kernel.org
> Cc: LKML; kexec-list; Eric W. Biederman; Vivek Goyal
> Subject: Re: [PATCH] x86, kdump: No need to disable ioapic in crash path
> 
> On Wed, Feb 29, 2012 at 03:08:49PM -0500, Don Zickus wrote:
> > A customer of ours noticed when their machine crashed, kdump did not
> > work but hung instead.  Using their firmware dumping solution they
> > grabbed a vmcore and decoded the stacks on the cpus.  What they
> > noticed seemed to be a rare deadlock with the ioapic_lock.
> 
> While we are discussing the NMI stuff in another thread, does anyone have
> any objection to committing this patch?  It fixes a real problem today.
> 
> Cheers,
> Don
> 
> >
> >  CPU4:
> >  machine_crash_shutdown
> >  -> machine_ops.crash_shutdown
> >     -> native_machine_crash_shutdown
> >        -> kdump_nmi_shootdown_cpus ------> Send NMI to other CPUs
> >        -> disable_IO_APIC
> >           -> clear_IO_APIC
> >              -> clear_IO_APIC_pin
> >                 -> ioapic_read_entry
> >                    -> spin_lock_irqsave(&ioapic_lock, flags)
> >                    ---Infinite loop here---
> >
> >  CPU0:
> >  do_IRQ
> >  -> handle_irq
> >     -> handle_edge_irq
> >         -> ack_apic_edge
> >            -> move_native_irq
> >                -> mask_IO_APIC_irq
> >                   -> mask_IO_APIC_irq_desc
> >                      -> spin_lock_irqsave(&ioapic_lock, flags)
> >                      ---Receive NMI here after getting spinlock---
> >                         -> nmi
> >                            -> do_nmi
> >                               -> crash_nmi_callback
> >                               ---Infinite loop here---
> >
> > The problem is that although kdump tries to shutdown minimal hardware,
> > it still needs to disable the IO APIC.  This requires spinlocks which
> > may be held by another cpu.  This other cpu is being held infinitely
> > in an NMI context by kdump in order to serialize the crashing path.
> > Instant deadlock.
> >
> > Eric, brought up a point that because the boot code was restructured
> > we may not need to disable the io apic any more in the crash path.
> > The original concern that led to the development of disable_IO_APIC,
> > was that the jiffies calibration on boot up relied on the PIT timer
> > for reference.  Access to the PIT required 8259 interrupts to be
> > working.  This wouldn't work if the ioapic needed to be configured.
> > So on panic path, the ioapic was reconfigured to use virtual wire mode to allow the 8259 to passthrough.
> >
> > Those concerns don't hold true now, thanks to the jiffies calibration
> > code not needing the PIT.  As a result, we can remove this call and
> > simplify the locking needed in the panic path.
> >
> > I tested kdump on an Ivy Bridge platform, a Pentium4 and an old athlon
> > that did not have an ioapic.  All three were successful.
> >
> > I also tested using lkdtm that would use jprobes to panic the system
> > when entering do_IRQ.  The idea was to see how the system reacted with
> > an interrupt pending in the second kernel.  My core2 quad successfully
> > kdump'd 3 times in a row with no issues.
> >
> > v2: removed the disable lapic code too
> > v3: re-add disabling of lapic code
> >
> > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > Cc: Vivek Goyal <vgoyal@redhat.com>
> > Signed-off-by: Don Zickus <dzickus@redhat.com>
> > ---
> >
> > There are really two problems here.  One is the deadlock of the
> > ioapic_lock that I describe above.  Removing the code to disable the
> > ioapic seems to resolve that.
> >
> > The second issue is handling non-IRQ exceptions like NMIs.  Eric asked
> > me to include removing the disable lapic code too.  However, because
> > the nmi watchdog is still active and kexec zeros out the idt before it
> > jumps to purgatory, an NMI that comes in during the transition between
> > the first kernel and second kernel will see an empty idt and reset the cpu.
> >
> > Leaving the code to disable the lapic in, turns off perf and blocks
> > those NMIs from happening (though an external NMI would still be an
> > issue but that is no different than right now).
> >
> > I tried playing with a stub idt and leaving it in place through the
> > transition to the second kernel, but I can't quite get it to work
> > correctly.  Spinning in the first kernel before the purgatory jump
> > catches the idt properly.  Spinning in purgatory before the second
> > kernel jump doesn't.  I even disabled the zero'ing out of the idt in the purgatory code.
> >
> > I would like to get resolution on the ioapic deadlock to fix a
> > customer issue while working the idt and NMI thing on the side, hence
> > the split of this patchset.
> >
> > Hopefully, people recognize there are two issues here and that this
> > patch resolves the first one and the second one needs more debugging and time.
> > ---
> >  arch/x86/kernel/crash.c |    3 ---
> >  1 files changed, 0 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> > index 13ad899..b053cf9 100644
> > --- a/arch/x86/kernel/crash.c
> > +++ b/arch/x86/kernel/crash.c
> > @@ -96,9 +96,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> >  	cpu_emergency_svm_disable();
> >
> >  	lapic_shutdown();
> > -#if defined(CONFIG_X86_IO_APIC)
> > -	disable_IO_APIC();
> > -#endif
> >  #ifdef CONFIG_HPET_TIMER
> >  	hpet_disable();
> >  #endif
> > --
> > 1.7.7.6
> >


* Re: [PATCH] x86, kdump: No need to disable ioapic in crash path
  2012-02-29 20:08 Don Zickus
@ 2012-03-15 20:26 ` Don Zickus
  2012-03-15 21:16   ` Seiji Aguchi
  2012-03-29 16:02 ` Don Zickus
  1 sibling, 1 reply; 19+ messages in thread
From: Don Zickus @ 2012-03-15 20:26 UTC (permalink / raw)
  To: x86; +Cc: LKML, kexec-list, Eric W. Biederman, Vivek Goyal

On Wed, Feb 29, 2012 at 03:08:49PM -0500, Don Zickus wrote:
> A customer of ours noticed when their machine crashed, kdump did not
> work but hung instead.  Using their firmware dumping solution they
> grabbed a vmcore and decoded the stacks on the cpus.  What they
> noticed seemed to be a rare deadlock with the ioapic_lock.

While we are discussing the NMI stuff in another thread, does anyone have
any objection to committing this patch?  It fixes a real problem today.

Cheers,
Don

> 
>  CPU4:
>  machine_crash_shutdown
>  -> machine_ops.crash_shutdown
>     -> native_machine_crash_shutdown
>        -> kdump_nmi_shootdown_cpus ------> Send NMI to other CPUs
>        -> disable_IO_APIC
>           -> clear_IO_APIC
>              -> clear_IO_APIC_pin
>                 -> ioapic_read_entry
>                    -> spin_lock_irqsave(&ioapic_lock, flags)
>                    ---Infinite loop here---
> 
>  CPU0:
>  do_IRQ
>  -> handle_irq
>     -> handle_edge_irq
>         -> ack_apic_edge
>            -> move_native_irq
>                -> mask_IO_APIC_irq
>                   -> mask_IO_APIC_irq_desc
>                      -> spin_lock_irqsave(&ioapic_lock, flags)
>                      ---Receive NMI here after getting spinlock---
>                         -> nmi
>                            -> do_nmi
>                               -> crash_nmi_callback
>                               ---Infinite loop here---
> 
> The problem is that although kdump tries to shutdown minimal hardware,
> it still needs to disable the IO APIC.  This requires spinlocks which
> may be held by another cpu.  This other cpu is being held infinitely in
> an NMI context by kdump in order to serialize the crashing path.  Instant
> deadlock.
> 
> Eric, brought up a point that because the boot code was restructured we may
> not need to disable the io apic any more in the crash path.  The original
> concern that led to the development of disable_IO_APIC, was that the jiffies
> calibration on boot up relied on the PIT timer for reference.  Access
> to the PIT required 8259 interrupts to be working.  This wouldn't work
> if the ioapic needed to be configured.  So on panic path, the ioapic was
> reconfigured to use virtual wire mode to allow the 8259 to passthrough.
> 
> Those concerns don't hold true now, thanks to the jiffies calibration code
> not needing the PIT.  As a result, we can remove this call and simplify the
> locking needed in the panic path.
> 
> I tested kdump on an Ivy Bridge platform, a Pentium4 and an old athlon that
> did not have an ioapic.  All three were successful.
> 
> I also tested using lkdtm that would use jprobes to panic the system when
> entering do_IRQ.  The idea was to see how the system reacted with an
> interrupt pending in the second kernel.  My core2 quad successfully kdump'd
> 3 times in a row with no issues.
> 
> v2: removed the disable lapic code too
> v3: re-add disabling of lapic code
> 
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Signed-off-by: Don Zickus <dzickus@redhat.com>
> ---
> 
> There are really two problems here.  One is the deadlock of the ioapic_lock
> that I describe above.  Removing the code to disable the ioapic seems to
> resolve that.
> 
> The second issue is handling non-IRQ exceptions like NMIs.  Eric asked me
> to include removing the disable lapic code too.  However, because the nmi
> watchdog is still active and kexec zeros out the idt before it jumps to
> purgatory, an NMI that comes in during the transition between the first
> kernel and second kernel will see an empty idt and reset the cpu.
> 
> Leaving the code to disable the lapic in, turns off perf and blocks those NMIs
> from happening (though an external NMI would still be an issue but that is no
> different than right now).
> 
> I tried playing with a stub idt and leaving it in place through the transition
> to the second kernel, but I can't quite get it to work correctly.  Spinning in the
> first kernel before the purgatory jump catches the idt properly.  Spinning in
> purgatory before the second kernel jump doesn't.  I even disabled the zero'ing
> out of the idt in the purgatory code.
> 
> I would like to get resolution on the ioapic deadlock to fix a customer issue
> while working the idt and NMI thing on the side, hence the split of this
> patchset.
> 
> Hopefully, people recognize there are two issues here and that this patch
> resolves the first one and the second one needs more debugging and time.
> ---
>  arch/x86/kernel/crash.c |    3 ---
>  1 files changed, 0 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 13ad899..b053cf9 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -96,9 +96,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
>  	cpu_emergency_svm_disable();
>  
>  	lapic_shutdown();
> -#if defined(CONFIG_X86_IO_APIC)
> -	disable_IO_APIC();
> -#endif
>  #ifdef CONFIG_HPET_TIMER
>  	hpet_disable();
>  #endif
> -- 
> 1.7.7.6
> 


* [PATCH] x86, kdump: No need to disable ioapic in crash path
@ 2012-02-29 20:08 Don Zickus
  2012-03-15 20:26 ` Don Zickus
  2012-03-29 16:02 ` Don Zickus
  0 siblings, 2 replies; 19+ messages in thread
From: Don Zickus @ 2012-02-29 20:08 UTC (permalink / raw)
  To: x86; +Cc: LKML, kexec-list, Don Zickus, Eric W. Biederman, Vivek Goyal

A customer of ours noticed when their machine crashed, kdump did not
work but hung instead.  Using their firmware dumping solution they
grabbed a vmcore and decoded the stacks on the cpus.  What they
noticed seemed to be a rare deadlock with the ioapic_lock.

 CPU4:
 machine_crash_shutdown
 -> machine_ops.crash_shutdown
    -> native_machine_crash_shutdown
       -> kdump_nmi_shootdown_cpus ------> Send NMI to other CPUs
       -> disable_IO_APIC
          -> clear_IO_APIC
             -> clear_IO_APIC_pin
                -> ioapic_read_entry
                   -> spin_lock_irqsave(&ioapic_lock, flags)
                   ---Infinite loop here---

 CPU0:
 do_IRQ
 -> handle_irq
    -> handle_edge_irq
        -> ack_apic_edge
           -> move_native_irq
               -> mask_IO_APIC_irq
                  -> mask_IO_APIC_irq_desc
                     -> spin_lock_irqsave(&ioapic_lock, flags)
                     ---Receive NMI here after getting spinlock---
                        -> nmi
                           -> do_nmi
                              -> crash_nmi_callback
                              ---Infinite loop here---

The problem is that although kdump tries to shut down minimal hardware,
it still needs to disable the IO APIC.  This requires spinlocks which
may be held by another cpu.  This other cpu is being held infinitely in
an NMI context by kdump in order to serialize the crashing path.  Instant
deadlock.
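
The two stacks above can be mimicked with a small user-space model.  This
is only a sketch, not kernel code: model_lock, cpu0_parked,
cpu0_take_lock_then_park() and cpu4_try_disable_ioapic() are hypothetical
names, a C11 atomic_flag stands in for ioapic_lock, and the spin is bounded
so the model can report the hang where the real CPU4 would spin forever.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for ioapic_lock; not the kernel's spinlock. */
static atomic_flag model_lock = ATOMIC_FLAG_INIT;

/* True once the modelled CPU0 has been parked by the crash NMI. */
static bool cpu0_parked;

/*
 * Model of CPU0: take the lock in the IRQ path, then receive the crash
 * NMI before unlocking.  The NMI callback never returns, so the unlock
 * that would normally follow is never reached.
 */
static void cpu0_take_lock_then_park(void)
{
	while (atomic_flag_test_and_set(&model_lock))
		;			/* spin until the lock is free */
	cpu0_parked = true;		/* crash_nmi_callback: never returns */
}

/*
 * Model of CPU4: try to take the same lock from the crash path.  The
 * real code spins without limit; the bound here exists only so the
 * model terminates.  Returns true if the lock could be acquired.
 */
static bool cpu4_try_disable_ioapic(unsigned long max_spins)
{
	for (unsigned long i = 0; i < max_spins; i++) {
		if (!atomic_flag_test_and_set(&model_lock)) {
			atomic_flag_clear(&model_lock);
			return true;
		}
	}
	return false;			/* owner is parked; lock never freed */
}
```

Calling cpu4_try_disable_ioapic() before cpu0_take_lock_then_park()
succeeds; calling it afterwards fails for any spin bound, which is the
hang seen in the vmcore.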

Eric brought up a point that because the boot code was restructured we may
not need to disable the io apic any more in the crash path.  The original
concern that led to the development of disable_IO_APIC was that the jiffies
calibration on boot up relied on the PIT timer for reference.  Access
to the PIT required 8259 interrupts to be working.  This wouldn't work
if the ioapic needed to be configured.  So on panic path, the ioapic was
reconfigured to use virtual wire mode to allow the 8259 to passthrough.

Those concerns don't hold true now, thanks to the jiffies calibration code
not needing the PIT.  As a result, we can remove this call and simplify the
locking needed in the panic path.

I tested kdump on an Ivy Bridge platform, a Pentium4 and an old athlon that
did not have an ioapic.  All three were successful.

I also tested using lkdtm that would use jprobes to panic the system when
entering do_IRQ.  The idea was to see how the system reacted with an
interrupt pending in the second kernel.  My core2 quad successfully kdump'd
3 times in a row with no issues.

v2: removed the disable lapic code too
v3: re-add disabling of lapic code

Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
---

There are really two problems here.  One is the deadlock of the ioapic_lock
that I describe above.  Removing the code to disable the ioapic seems to
resolve that.

The second issue is handling non-IRQ exceptions like NMIs.  Eric asked me
to include removing the disable lapic code too.  However, because the nmi
watchdog is still active and kexec zeros out the idt before it jumps to
purgatory, an NMI that comes in during the transition between the first
kernel and second kernel will see an empty idt and reset the cpu.
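
Why an empty idt ends in a reset can be sketched with a simplified model
of exception delivery.  This is an illustration under assumptions, not
kernel or purgatory code: model_deliver() and gate_in_limit() are
hypothetical names, gates are taken to be 16 bytes (64-bit mode), and only
the idt limit check is modelled (gate contents such as the present bit are
ignored).

```c
#include <stdbool.h>
#include <stdint.h>

#define VEC_NMI		2
#define VEC_DF		8	/* double fault */
#define VEC_GP		13	/* general protection fault */
#define GATE_SIZE	16	/* bytes per idt gate in 64-bit mode */

enum model_outcome {
	SOME_HANDLER_RAN,	/* a gate was found for the event or a fault */
	TRIPLE_FAULT_RESET,	/* no gate reachable: cpu shuts down/resets */
};

/* A gate is usable only if it lies entirely within the idt limit. */
static bool gate_in_limit(uint16_t idt_limit, unsigned int vector)
{
	return vector * GATE_SIZE + (GATE_SIZE - 1) <= idt_limit;
}

/*
 * Deliver one event.  A vector beyond the limit raises #GP; failing to
 * deliver #GP raises #DF; failing to deliver #DF is a triple fault.
 */
static enum model_outcome model_deliver(uint16_t idt_limit,
					unsigned int vector)
{
	if (gate_in_limit(idt_limit, vector))
		return SOME_HANDLER_RAN;
	if (gate_in_limit(idt_limit, VEC_GP))
		return SOME_HANDLER_RAN;
	if (gate_in_limit(idt_limit, VEC_DF))
		return SOME_HANDLER_RAN;
	return TRIPLE_FAULT_RESET;
}
```

With a limit of 0, as after kexec zeros the idt, model_deliver(0, VEC_NMI)
yields TRIPLE_FAULT_RESET; with a full 256-entry table (limit 4095) the
NMI gate is reachable.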

Leaving the lapic-disabling code in place turns off perf and blocks those NMIs
from happening (though an external NMI would still be an issue but that is no
different than right now).

I tried playing with a stub idt and leaving it in place through the transition
to the second kernel, but I can't quite get it to work correctly.  Spinning in the
first kernel before the purgatory jump catches the idt properly.  Spinning in
purgatory before the second kernel jump doesn't.  I even disabled the zero'ing
out of the idt in the purgatory code.

I would like to get resolution on the ioapic deadlock to fix a customer issue
while working the idt and NMI thing on the side, hence the split of this
patchset.

Hopefully, people recognize there are two issues here and that this patch
resolves the first one and the second one needs more debugging and time.
---
 arch/x86/kernel/crash.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 13ad899..b053cf9 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -96,9 +96,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 	cpu_emergency_svm_disable();
 
 	lapic_shutdown();
-#if defined(CONFIG_X86_IO_APIC)
-	disable_IO_APIC();
-#endif
 #ifdef CONFIG_HPET_TIMER
 	hpet_disable();
 #endif
-- 
1.7.7.6



end of thread, other threads:[~2012-05-02 20:24 UTC | newest]

Thread overview: 19+ messages
2012-02-02 18:12 [PATCH] x86, kdump: No need to disable ioapic in crash path Don Zickus
2012-02-02 23:24 ` Eric W. Biederman
2012-02-07 21:57   ` Don Zickus
2012-02-07 22:19     ` Vivek Goyal
2012-02-07 23:35       ` Eric W. Biederman
2012-02-08 20:11         ` Don Zickus
2012-02-08 22:55           ` Eric W. Biederman
2012-02-09 14:48             ` Don Zickus
2012-02-29 20:08 Don Zickus
2012-03-15 20:26 ` Don Zickus
2012-03-15 21:16   ` Seiji Aguchi
2012-03-15 21:33     ` Don Zickus
2012-03-15 21:37       ` Seiji Aguchi
2012-04-30 20:53     ` Don Zickus
2012-05-02 19:10       ` Seiji Aguchi
2012-05-02 19:39         ` Eric W. Biederman
2012-05-02 19:59           ` Don Zickus
2012-05-02 20:24             ` Eric W. Biederman
2012-03-29 16:02 ` Don Zickus
