linux-kernel.vger.kernel.org archive mirror
From: Borislav Petkov <bp@alien8.de>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>, X86 ML <x86@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] x86/MCE: Remove MCP_TIMESTAMP
Date: Mon, 7 Nov 2016 19:08:54 +0100
Message-ID: <20161107180853.4uxlvtoychzhwr2q@pd.tnic>
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F3A22720B@ORSMSX114.amr.corp.intel.com>

On Mon, Nov 07, 2016 at 05:48:46PM +0000, Luck, Tony wrote:
> > So, get rid of all that and simply log an MCE with a TSC value always.
> > Simplifies the code a bit too.
> 
> I'm not necessarily opposed to this ... but there was once some logic behind when
> we logged TSC, and when we didn't.  Essentially we wanted the TSC when we were
> logging from #CMCI or #MC .... because the detection of the error was fresh, and
> we wanted as much precision on the logged time as possible to compare with logged
> errors from other banks/cpus. This might allow us to distinguish multiple errors logged
> in the same #CMCI, from errors logged in separate #CMCI a tenth of a second apart.
> 
> If we found the error while polling, we didn’t want to provide a false sense of precision.
> The error could have been logged up to five minutes previously (or when logging
> errors during the initial poll of the banks an arbitrary time in the past).

Right, looks like we've lost that logic:

Functions calling this function: machine_check_poll

  File         Function                  Line
0 mce-inject.c raise_poll                  57 machine_check_poll(0, &b);
1 mce.c        mce_timer_fn              1358 machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_poll_banks));
2 mce.c        __mcheck_cpu_init_generic 1508 machine_check_poll(MCP_UC | m_fl, &all_banks);
3 mce_intel.c  mce_intel_cmci_poll        133 if (machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned)))
4 mce_intel.c  intel_threshold_interrupt  253 machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned));
5 mce_intel.c  cmci_recheck               345 machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned));

So the TSC timestamp will be possibly inexact now in mce_timer_fn(),
__mcheck_cpu_init_generic(), mce_intel_cmci_poll() and cmci_recheck().

Should we bother and add a flag to struct mce - maybe somewhere in the
padding __u8 pad; - to denote that the logged TSC may not be exact?
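
A minimal sketch of what such a flag could look like, assuming a simplified
stand-in for struct mce. The struct mce_sketch type, the
MCE_SKETCH_TSC_INEXACT bit and both helpers are hypothetical names made up
for illustration; none of them exist in the kernel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit, not an existing kernel definition. */
#define MCE_SKETCH_TSC_INEXACT 0x01u

/* Simplified stand-in for struct mce: only the fields discussed here. */
struct mce_sketch {
	uint64_t tsc;	/* cycle counter sampled when the record was logged */
	uint64_t time;	/* wall-clock seconds when the record was logged */
	uint8_t  flags;	/* stands in for the spare pad byte */
};

/* The flag has to fit into the single spare byte. */
static_assert(sizeof(((struct mce_sketch *)0)->flags) == 1,
	      "flag must fit in one byte");

/* Fill in both timestamps and mark the TSC as inexact when polling. */
static void mce_sketch_log(struct mce_sketch *m, uint64_t tsc,
			   uint64_t now, bool from_poll)
{
	m->tsc = tsc;
	m->time = now;
	m->flags = from_poll ? MCE_SKETCH_TSC_INEXACT : 0;
}

/* A consumer checks the flag before trusting ->tsc for ordering. */
static bool mce_sketch_tsc_exact(const struct mce_sketch *m)
{
	return !(m->flags & MCE_SKETCH_TSC_INEXACT);
}
```

With this variant ->tsc is always written, and a consumer has to look at
the flag before comparing TSC values across banks or CPUs.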

Mind you, there's also

	m->time = get_seconds();

which also collects time and which could also be possibly inexact.

One other possibility would be to use ->time and write ->tsc *only*
when exact - i.e., in the handler - and this is then enough info about
timing.

->time will give you somewhere around where it happened and ->tsc - only
if set - will give you exact, well, *timestamp* :)
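
Roughly, and only as a sketch under assumptions: struct mce_ts_sketch and
the two helpers below are hypothetical, the in_handler argument stands for
"called from the #MC/#CMCI handler", and treating a zero ->tsc as "not set"
is an assumption of this example, not something the patch defines:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the two timing fields of struct mce. */
struct mce_ts_sketch {
	uint64_t tsc;	/* exact timestamp, 0 means "not set" (assumption) */
	uint64_t time;	/* coarse wall-clock seconds, always written */
};

/*
 * Always record the coarse time; record the TSC only when the error
 * was detected freshly, i.e. in the handler rather than while polling.
 */
static void mce_ts_sketch_log(struct mce_ts_sketch *m, uint64_t tsc_now,
			      uint64_t seconds_now, bool in_handler)
{
	m->time = seconds_now;
	m->tsc = in_handler ? tsc_now : 0;
}

/* Consumers use ->tsc only if it was actually set. */
static bool mce_ts_sketch_has_tsc(const struct mce_ts_sketch *m)
{
	return m->tsc != 0;
}
```

No extra flag is needed in this variant: the absence of ->tsc is itself the
signal that only the coarse ->time is available.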

This sounds like pretty straightforward logic to me...

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Thread overview: 21+ messages
2016-11-01 12:09 [RFC PATCH 0/3] x86/RAS: Dump error record to dmesg if no consumers Borislav Petkov
2016-11-01 12:09 ` [RFC PATCH 1/3] notifiers: Document notifier priority Borislav Petkov
2016-11-01 12:09 ` [RFC PATCH 2/3] x86/RAS: Add TSC to the injected MCE Borislav Petkov
2016-11-08 16:19   ` [tip:ras/core] x86/RAS: Add TSC timestamp " tip-bot for Borislav Petkov
2016-11-01 12:09 ` [RFC PATCH 3/3] x86/MCE: Dump MCE to dmesg if no consumers Borislav Petkov
2016-11-08 16:19   ` [tip:ras/core] " tip-bot for Borislav Petkov
2016-11-05 13:11 ` [PATCH] x86/MCE: Remove MCP_TIMESTAMP Borislav Petkov
2016-11-07 17:48   ` Luck, Tony
2016-11-07 18:08     ` Borislav Petkov [this message]
2016-11-07 18:37       ` Luck, Tony
2016-11-08 18:09         ` Borislav Petkov
2016-11-08 18:22           ` Luck, Tony
2016-11-08 20:39           ` Thomas Gleixner
2016-11-08 21:08             ` Borislav Petkov
2016-11-08 21:14               ` Thomas Gleixner
2016-11-08 21:24                 ` Borislav Petkov
2016-11-08 21:54                   ` Thomas Gleixner
2016-11-09 18:06                     ` Borislav Petkov
2017-01-18 20:34                       ` Borislav Petkov
2017-01-19 13:34                         ` Borislav Petkov
2016-11-07  7:37 ` [RFC PATCH 0/3] x86/RAS: Dump error record to dmesg if no consumers Ingo Molnar
