All of lore.kernel.org
 help / color / mirror / Atom feed
From: Havard Skinnemoen <hskinnemoen@google.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Ewout van Bekkum <ewout@google.com>
Subject: Re: [PATCH 1/6] x86-mce: Modify CMCI poll interval to adjust for small check_interval values.
Date: Wed, 9 Jul 2014 14:24:31 -0700	[thread overview]
Message-ID: <CAFQmdRa5Spr0nX6qwzhDGEU9+H1_0vaCtF_NRV=p=OBDwin78A@mail.gmail.com> (raw)
In-Reply-To: <20140709191747.GB5249@pd.tnic>

On Wed, Jul 9, 2014 at 12:17 PM, Borislav Petkov <bp@alien8.de> wrote:
>
> On Wed, Jul 09, 2014 at 10:09:21AM -0700, Havard Skinnemoen wrote:
> > From: Ewout van Bekkum <ewout@google.com>
> >
> > The CMCI poll interval was updated to pick the minimum interval between
> > the original 30 seconds and the check_interval divided by 8 (minimum of
> > 3 polls).
>
> Why min 3 polls? How do you come up with exactly that frequency?

The idea is that if we make it equal to check_interval, it might
bounce back and forth a lot. So we need to divide by something, and 8
seems like a nice, safe value, and it seems to work well. We're not
opposed to considering other values, of course (e.g. 2 and 4 might
work well too, but with somewhat higher risk of ping-ponging).

> > This resolves a bug where the CMCI storm handler is unable to return to
> > interrupt mode from polling mode, if the check_interval shorter than the
> > CMCI poll interval. This problem is caused by the mce_timer_fn function
> > which only allows the poll interval to be incremented up to the
> > check_interval, while the mce_intel_adjust_timer function requires the
> > poll interval to be greater than the CMCI poll interval before leaving
> > the CMCI_STORM_ACTIVE state.
>
> Interesting. So it seems you guys want to set the check_interval to
> something < 30 secs.
>
> Out of curiosity, what is your use case which requires such small
> check_interval setting?

I'm not entirely sure. At some point, it ended up that way, and it
broke in non-obvious ways, so we wanted to fix it.

> Maybe we need to redesign and simplify this intervals thing to make it
> more user-friendly...
>
> Btw, on a related note, we're working on a small mechanism which
> collects correctable errors in the kernel and when a certain count for a
> physical error address has been reached, we soft-offline that page. We'd
> appreciate it if you guys took a look and told us whether it makes sense
> to you:
>
> http://lkml.kernel.org/r/1404242623-10094-1-git-send-email-bp@alien8.de

We will definitely take a look, thanks. Looks interesting, though it's
not always obvious what works for us until we actually go and try it.

Havard

  reply	other threads:[~2014-07-09 21:24 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-09 17:09 [PATCH 0/6] x86 mce fixes Havard Skinnemoen
2014-07-09 17:09 ` [PATCH 1/6] x86-mce: Modify CMCI poll interval to adjust for small check_interval values Havard Skinnemoen
2014-07-09 19:17   ` Borislav Petkov
2014-07-09 21:24     ` Havard Skinnemoen [this message]
2014-07-10  9:01       ` Chen, Gong
2014-07-10 17:16         ` Havard Skinnemoen
2014-07-11  2:12           ` Chen, Gong
2014-07-10 11:42       ` Borislav Petkov
2014-07-10 17:51         ` Havard Skinnemoen
2014-07-10 18:55           ` Tony Luck
2014-07-10 22:45             ` Havard Skinnemoen
2014-07-11 15:35               ` Borislav Petkov
2014-07-11 18:56                 ` Havard Skinnemoen
2014-07-11 20:10                   ` Borislav Petkov
2014-07-11 20:39                     ` Havard Skinnemoen
2014-07-14 14:57                       ` Borislav Petkov
2014-07-11 20:22                   ` Borislav Petkov
2014-07-12  0:10                     ` Havard Skinnemoen
2014-07-14 15:14                       ` Borislav Petkov
2014-07-11 20:36                   ` Borislav Petkov
2014-07-11 21:05                     ` Havard Skinnemoen
2014-07-09 17:09 ` [PATCH 2/6] x86-mce: Modify CMCI storm exit to reenable instead of rediscover banks Havard Skinnemoen
2014-07-09 20:20   ` Luck, Tony
2014-07-09 21:34     ` Havard Skinnemoen
2014-07-10 15:51       ` Borislav Petkov
2014-07-10 18:32         ` Havard Skinnemoen
2014-07-09 17:09 ` [PATCH 3/6] x86-mce: Clear CMCI enable on all claimed CMCI banks before reboot Havard Skinnemoen
2014-07-09 20:36   ` Luck, Tony
2014-07-09 21:40     ` Havard Skinnemoen
2014-07-10 16:24       ` Borislav Petkov
2014-07-10 16:33         ` Tony Luck
2014-07-10 17:56         ` Havard Skinnemoen
2014-07-10 18:27           ` Tony Luck
2014-07-10 18:30           ` Borislav Petkov
2014-07-09 17:09 ` [PATCH 4/6] x86-mce: Add spinlocks to prevent duplicated MCP and CMCI reports Havard Skinnemoen
2014-07-09 20:35   ` Andi Kleen
2014-07-09 21:51     ` Havard Skinnemoen
2014-07-09 23:32       ` Luck, Tony
2014-07-10  8:16         ` Borislav Petkov
2014-07-09 20:47   ` Luck, Tony
2014-07-09 21:56     ` Havard Skinnemoen
2014-07-10 16:41   ` Borislav Petkov
2014-07-10 18:03     ` Havard Skinnemoen
2014-07-10 18:44       ` Borislav Petkov
2014-07-10 18:57         ` Tony Luck
2014-07-10 19:12           ` Borislav Petkov
2014-07-11  9:24             ` Borislav Petkov
2014-07-11 19:06               ` Tony Luck
2014-07-11 19:52                 ` Borislav Petkov
2014-07-11 21:15                   ` Havard Skinnemoen
2014-07-17 10:50                     ` Borislav Petkov
2014-07-18 21:23                       ` Tony Luck
2014-07-18 21:31                         ` Borislav Petkov
2014-07-09 17:09 ` [PATCH 5/6] x86-mce: check if no_way_out applies before deciding not to clear MCE banks Havard Skinnemoen
2014-07-09 21:00   ` Luck, Tony
2014-07-09 23:00     ` Havard Skinnemoen
2014-07-09 23:27       ` Luck, Tony
2014-07-10 16:49         ` Borislav Petkov
2014-07-09 17:09 ` [PATCH 6/6] x86-mce: ensure the MCP timer is not already set in the mce_timer_fn Havard Skinnemoen
2014-07-09 21:04   ` Luck, Tony
2014-07-09 23:01     ` Havard Skinnemoen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFQmdRa5Spr0nX6qwzhDGEU9+H1_0vaCtF_NRV=p=OBDwin78A@mail.gmail.com' \
    --to=hskinnemoen@google.com \
    --cc=bp@alien8.de \
    --cc=ewout@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.