From: Andi Kleen <andi@firstfloor.org>
To: Robert Richter <robert.richter@amd.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>,
	Don Zickus <dzickus@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Lin Ming <ming.m.lin@intel.com>, Ingo Molnar <mingo@elte.hu>,
	"fweisbec@gmail.com" <fweisbec@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Huang, Ying" <ying.huang@intel.com>,
	Yinghai Lu <yinghai@kernel.org>, Andi Kleen <andi@firstfloor.org>
Subject: Re: A question of perf NMI handler
Date: Wed, 4 Aug 2010 21:22:04 +0200
Message-ID: <20100804192204.GG13161@basil.fritz.box>
In-Reply-To: <20100804184806.GL26154@erda.amd.com>

> Only the upper 2 bits in io_61h indicate the nmi reason, so in case of
> (!(reason & 0xc0)) the source simply can not be determined and all nmi
> handlers in the chain must be called (DIE_NMI/DIE_NMI_IPI). The
> perfctr handler then stops it.
> 
> So you can decide to either get an unrecovered nmi panic triggered by
> a perfctr or losing unknown nmis from other sources. Maybe this can be
> fixed by implementing handlers for those sources.

This is a tricky area. Ying and I have been looking at this recently.

Hardware traditionally signals an NMI when it has an uncontained error and really
expects the OS to shut down to prevent data corruption from spreading.

Unfortunately, especially on some older hardware,
there can be cases where this is not expressed in port 61.
But the default behaviour of Linux for this today is quite wrong.

Some cases can also be determined with the help of APEI, which
can give you more information about the error (and tell you
if shutdown is needed).

But of course we can still have performance counter and other NMI
users.

So the right flow might be something like

- check software events (like crash dump or reboot)
- check perfctrs
- check APEI
- check port 61 for known events (it's probably a good idea
to check perfctrs first because accessing io ports is quite slow.
But the perfctr handler has to make sure it doesn't eat unknown
events, otherwise error handling would be impacted)
- check other event sources
- shutdown (likely depending on the chipset)

This means the NMI users who cannot determine for themselves whether an event
happened, and so eat everything (like oprofile today), would need to be fixed.
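
The flow listed above could be sketched roughly as below. This is only a
userspace illustration, not kernel code: struct nmi_state, the check_*
functions and dispatch_nmi() are all hypothetical names, and the struct
stands in for the real hardware and firmware reads. The 0xc0 mask is the
"upper 2 bits" of port 61 mentioned in the quoted text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot standing in for real hardware/firmware state. */
struct nmi_state {
	bool sw_event_pending;    /* crash dump or reboot was requested */
	bool perfctr_overflowed;  /* a performance counter really overflowed */
	bool apei_error;          /* APEI reports an error record */
	unsigned char port61;     /* value that a read of I/O port 0x61 returned */
	bool other_source;        /* any remaining known event source */
};

/* Who claimed the NMI; UNKNOWN is the case that may need a shutdown. */
enum nmi_source {
	NMI_SRC_SW,
	NMI_SRC_PERFCTR,
	NMI_SRC_APEI,
	NMI_SRC_PORT61,
	NMI_SRC_OTHER,
	NMI_SRC_UNKNOWN,
};

struct nmi_check {
	enum nmi_source source;
	bool (*claims)(const struct nmi_state *s);
};

static bool check_sw(const struct nmi_state *s)
{
	return s->sw_event_pending;
}

/* Claims the NMI only on a real overflow, so unknown NMIs are not eaten. */
static bool check_perfctr(const struct nmi_state *s)
{
	return s->perfctr_overflowed;
}

static bool check_apei(const struct nmi_state *s)
{
	return s->apei_error;
}

/* Only the upper two bits of port 61 carry an NMI reason. */
static bool check_port61(const struct nmi_state *s)
{
	return (s->port61 & 0xc0) != 0;
}

static bool check_other(const struct nmi_state *s)
{
	return s->other_source;
}

/* Order follows the list above: cheap checks before the slow port read. */
static const struct nmi_check nmi_chain[] = {
	{ NMI_SRC_SW,      check_sw },
	{ NMI_SRC_PERFCTR, check_perfctr },
	{ NMI_SRC_APEI,    check_apei },
	{ NMI_SRC_PORT61,  check_port61 },
	{ NMI_SRC_OTHER,   check_other },
};

/* Return the first source that claims the NMI, or NMI_SRC_UNKNOWN. */
enum nmi_source dispatch_nmi(const struct nmi_state *s)
{
	size_t i;

	for (i = 0; i < sizeof(nmi_chain) / sizeof(nmi_chain[0]); i++) {
		if (nmi_chain[i].claims(s))
			return nmi_chain[i].source;
	}
	return NMI_SRC_UNKNOWN;
}
```

A handler that returned true unconditionally in the perfctr slot would hide
every later entry in the chain, which is the "eat everything" problem
described above.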

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.
