linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Don Zickus <dzickus@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>,
	linux-kernel@vger.kernel.org, mingo@redhat.com, hpa@zytor.com,
	torvalds@linux-foundation.org, kexec@lists.infradead.org,
	vgoyal@redhat.com, akpm@linux-foundation.org, tglx@linutronix.de,
	mingo@elte.hu, linux-tip-commits@vger.kernel.org
Subject: Re: [tip:x86/debug] x86/kdump: No need to disable ioapic/ lapic in crash path
Date: Fri, 17 Feb 2012 04:41:01 -0800	[thread overview]
Message-ID: <m1fwe9bpk2.fsf@fess.ebiederm.org> (raw)
In-Reply-To: <m1y5s2ku36.fsf@fess.ebiederm.org> (Eric W. Biederman's message of "Thu, 16 Feb 2012 19:38:21 -0800")

ebiederm@xmission.com (Eric W. Biederman) writes:

> Don Zickus <dzickus@redhat.com> writes:
>
>> On Thu, Feb 16, 2012 at 01:53:29PM -0800, Yinghai Lu wrote:
>>> On Thu, Feb 16, 2012 at 9:27 AM, Don Zickus <dzickus@redhat.com> wrote:
>>> 
>>> > So I think I figured it out.  I went through and commented out code in
>>> > disable_local_APIC until I narrowed it down to the piece of code that
>>> > needs to be disabled for it to work.
>>> >
>>> > Surprise, surprise... its LVTPC or perf! :-)  Actually it is the
>>> > nmi_watchdog which uses perf.  My theory is NMIs are not disabled and one
>>> > is generated by the local apic during decompression (just bad timing) and
>>> > *splat*.
>>> >
>>> > Yinghai, you can probably prove this by
>>> >
>>> > echo 0 > /proc/sys/kernel/nmi_watchdog
>>> >
>>> > then do your kdump crash test.
>>> 
>>> yes.  that will make kdump crash working.
>>
>> Cool.  Thanks.
>>
>> Eric,
>>
>> Just let me know how you want to handle disabling NMIs in the kexec in
>> panic shutdown case.
>
> Interesting.  Apparently we have been avoiding this problem by accident.
>
> Thanks for hunting this down.
>
> The options I can see are:
> - Ensure we can handle and ignore exceptions like this.
> - Always shutoff the lapic and ioapic entries that can generate this.
>
> The good news is that both solutions should be lock free.
>
> The current kernel boot code relies on the assumption that all
> interrupts can be disabled.  In this case with nmi's that is clearly not
> the case.
>
> The most robust solution and what we want to do long term is to
> install an idt that will simply ignore all interrupts until the
> idt is replaced.  Since really all we need to deal with is the NMI
> vector, which is vector #2, we can have a very small interrupt
> descriptor table.
>
> Unfortunately we go through some cpu mode switches in /sbin/kexec,
> allowing us to enter the kernels 32bit entry point before we
> run the decompresser, so at first glance both /sbin/kexec and the
> kernel need to be fixed in a coordinated fashion.
>
> There are two was I can see of removing the need for an exactly
> coordinated release.
> - Document that an old /sbin/kexec userspace requires you not to
>   use the nmi watchdog with modern kernels.
> - For a short while simply retain code that stomps the nmi watchdog.
>   (But still leaves us open to other kinds of nmi's).
>
> Grr.  Looking a little more closely, all throughout the linux kernel's
> boot there is the assumption that any interrupt during boot is a failure
> of some kind, and except for an errant nmi watchdog that is a true
> assumption.
>
> Don I guess I really have to recommend disabling the nmi watchdog in the
> kexec on panic path if we can do so at all reasonably. 
>
> I like the idea of ignoring nmis during boot but that seems to be a
> slightly larger project and with little practical improvement in kexec
> on panic quality.  Other than getting what should be one or two
> i/o writes out of the kexec on panic path.

Hmm.

Thinking about it a little more.  The kernel's boot path is inconsistent
with the rest of the kernel's nmi handling.  For anything exception
except an nmi stopping and giving up is fine.  For an nmi it is very
rare for an NMI to signal a truly nasty failure (usually it just means
someone saw a bitflip somewhere), and we can almost always continue
without problem.

I think in practice we really should make our boot path consistent with
the rest of the kernel and ignore/log/report nmis but not fail on them.
Triple faulting (trigger a cpu reset) as we do today just seems like a
recipe for deep and confusing mystery, and not being helpful to the
user.

My preferred fix would be to fix the boot path and /sbin/kexec to ignore
and report nmis as we boot, as that is really what we want long term and
it gives us the most robust solution.

The fix with a guarantee of no more scope creep is to just disable the
nmi watchdog on the kexec on panic path.

Don if you have time please figure out is needed to ignore nmi's and
possible record and/or report them while we boot, otherwise please cook
up a patch that just disables the nmi watchdog wherever we are sending
it from (the local apic or the ioapic).

Eric

  reply	other threads:[~2012-02-17 12:38 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <tip-d9bc9be89629445758670220787683e37c93f6c1@git.kernel.org>
2012-02-12  1:04 ` [tip:x86/debug] x86/kdump: No need to disable ioapic/ lapic in crash path Yinghai Lu
2012-02-12  3:13   ` Eric W. Biederman
2012-02-12  4:17     ` Yinghai Lu
2012-02-13 12:52       ` Eric W. Biederman
2012-02-13 16:51         ` Yinghai Lu
2012-02-13 18:16           ` Yinghai Lu
2012-02-16 17:27             ` Don Zickus
2012-02-16 21:53               ` Yinghai Lu
2012-02-16 21:56                 ` Don Zickus
2012-02-17  3:38                   ` Eric W. Biederman
2012-02-17 12:41                     ` Eric W. Biederman [this message]
2012-02-17 15:49                       ` HATAYAMA Daisuke
2012-02-17 20:18                         ` Don Zickus
2012-02-20  5:17                           ` HATAYAMA Daisuke
2012-02-20 15:24                             ` Don Zickus
2012-02-17 19:54                       ` Don Zickus
2012-02-18  3:21                         ` Eric W. Biederman
2012-02-20 15:14                           ` Don Zickus
2012-02-21  8:01                             ` Eric W. Biederman
2012-02-21 13:59                               ` Don Zickus
2012-02-29 23:19                                 ` Eric W. Biederman
2012-03-07 10:53                                   ` Fernando Luis Vázquez Cao
2012-03-07 10:54                                     ` [PATCH 1/2] boot: ignore early NMIs Fernando Luis Vázquez Cao
2012-03-07 10:56                                       ` [PATCH 2/2] boot: add early NMI counter Fernando Luis Vázquez Cao
2012-03-08  4:50                                         ` Eric W. Biederman
2012-03-08  6:00                                           ` Fernando Luis Vázquez Cao
2012-03-08  4:41                                       ` [PATCH 1/2] boot: ignore early NMIs Eric W. Biederman
2012-03-08  5:53                                         ` Fernando Luis Vázquez Cao
2012-03-08 16:35                                           ` Eric W. Biederman
2012-03-09  9:31                                             ` Fernando Luis Vázquez Cao
2012-03-09  9:51                                               ` [PATCH 1/3] boot: fortify early_idt_handlers definition Fernando Luis Vázquez Cao
2012-03-09  9:55                                                 ` [PATCH 2/3] boot: ignore early NMIs Fernando Luis Vázquez Cao
2012-03-09 10:01                                                   ` [PATCH 3/3] boot: add early NMI counter Fernando Luis Vázquez Cao
2012-03-09 20:52                                             ` [PATCH 1/2] boot: ignore early NMIs H. Peter Anvin
2012-03-12  5:43                                               ` Fernando Luis Vázquez Cao
2012-03-12  5:49                                                 ` H. Peter Anvin
2012-03-12  6:14                                                   ` Fernando Luis Vázquez Cao
2012-03-12 13:36                                                     ` Vivek Goyal
2012-03-12 19:02                                                       ` Eric W. Biederman
2012-03-12 19:58                                                         ` Vivek Goyal
2012-03-12 20:02                                                         ` H. Peter Anvin
2012-03-12 18:40                                                     ` H. Peter Anvin
2012-03-12 20:01                                                       ` Eric W. Biederman
2012-03-12 20:04                                                         ` H. Peter Anvin
2012-03-12 20:16                                                           ` H. Peter Anvin
2012-03-13  2:11                                                             ` Fernando Luis Vázquez Cao
2012-03-13 13:33                                                               ` Don Zickus
2012-03-15  0:43                                                                 ` Simon Horman
2012-03-13  1:43                                                       ` Fernando Luis Vázquez Cao
2012-03-12 14:41                                                   ` Don Zickus
2012-03-07 15:50                                     ` [tip:x86/debug] x86/kdump: No need to disable ioapic/ lapic in crash path Vivek Goyal
2012-03-07 18:27                                       ` Yinghai Lu
2012-03-08  1:29                                         ` Fernando Luis Vázquez Cao
2012-03-09  0:59                                     ` HATAYAMA Daisuke
2012-03-09  2:48                                       ` Eric W. Biederman
2012-02-12 11:12   ` Ingo Molnar
2012-02-13 15:28   ` Don Zickus
2012-02-13 16:52     ` Yinghai Lu
2012-02-13 22:12       ` Don Zickus
2012-02-13 22:51         ` Don Zickus
2012-02-16  2:53       ` Don Zickus
2012-02-16 18:43         ` Yinghai Lu
2012-02-16 21:41           ` Don Zickus

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m1fwe9bpk2.fsf@fess.ebiederm.org \
    --to=ebiederm@xmission.com \
    --cc=akpm@linux-foundation.org \
    --cc=dzickus@redhat.com \
    --cc=hpa@zytor.com \
    --cc=kexec@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tip-commits@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=vgoyal@redhat.com \
    --cc=yinghai@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).