From: Havard Skinnemoen <hskinnemoen@google.com>
To: Tony Luck <tony.luck@intel.com>, Borislav Petkov <bp@alien8.de>
Cc: linux-kernel@vger.kernel.org,
Havard Skinnemoen <hskinnemoen@google.com>,
Ewout van Bekkum <ewout@google.com>
Subject: [PATCH 0/6] x86 mce fixes
Date: Wed, 9 Jul 2014 10:09:20 -0700
Message-ID: <1404925766-32253-1-git-send-email-hskinnemoen@google.com>

The following series contains a few fixes we came up with while testing the MCE
handling on our servers in the lab. These should fix the following symptoms:

 - Once entering CMCI storm mode, we would never exit. This was because we set
   the check_interval to be shorter than 30 seconds, so the condition to exit
   storm mode could never become true.
 - After a storm, the MCE banks previously handled by a CPU could not be
   reclaimed.
 - After a kexec reboot, none of the MCE banks could be claimed by any CPU.
 - Duplicate MCEs were being reported in some circumstances (e.g. with
   mce=no_cmci and/or mce=3).
 - Crashes because the polling timer was added multiple times.

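The first symptom in the list above can be illustrated with a small user-space
model. This is only a sketch, not the kernel code: it assumes that the poll
interval doubles after each quiet poll, that it is capped at check_interval,
and that storm mode is left only once the interval reaches the 30-second
threshold. The names STORM_EXIT_THRESHOLD_SECS, next_poll_interval() and
storm_can_exit() are hypothetical and chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed storm-exit threshold, taken from the "30 seconds" mentioned above. */
#define STORM_EXIT_THRESHOLD_SECS 30UL

/*
 * Hypothetical back-off model: after a poll that finds no events, the
 * interval doubles, but it is never allowed to exceed check_interval.
 */
static unsigned long next_poll_interval(unsigned long interval_secs,
                                        unsigned long check_interval_secs)
{
    unsigned long doubled = interval_secs * 2;

    return doubled < check_interval_secs ? doubled : check_interval_secs;
}

/*
 * Returns true if the assumed exit condition (interval >= threshold) is
 * reached within max_polls consecutive quiet polls, starting from 1 second.
 */
static bool storm_can_exit(unsigned long check_interval_secs, int max_polls)
{
    unsigned long interval_secs = 1;

    assert(check_interval_secs > 0);

    for (int i = 0; i < max_polls; i++) {
        if (interval_secs >= STORM_EXIT_THRESHOLD_SECS)
            return true;
        interval_secs = next_poll_interval(interval_secs, check_interval_secs);
    }
    return false;
}
```

Under these assumptions, storm_can_exit(300, 64) is true, while
storm_can_exit(10, 64) is false: with a 10-second cap the interval saturates
at 10 and can never reach the threshold, no matter how many quiet polls pass.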
We're not sure if these patches are the best way to fix these issues, and they
may introduce new, subtle bugs, but they're the best we managed to come up with.
Please take a good look and tell us what we got wrong.

Ewout did all the legwork in getting this implemented and tested, while I've
been providing advice and reviews.

Signed-off-by: Ewout van Bekkum <ewout@google.com>
Signed-off-by: Havard Skinnemoen <hskinnemoen@google.com>

Ewout van Bekkum (6):
  x86-mce: Modify CMCI poll interval to adjust for small check_interval
    values.
  x86-mce: Modify CMCI storm exit to reenable instead of rediscover
    banks.
  x86-mce: Clear CMCI enable on all claimed CMCI banks before reboot.
  x86-mce: Add spinlocks to prevent duplicated MCP and CMCI reports.
  x86-mce: check if no_way_out applies before deciding not to clear MCE
    banks.
  x86-mce: ensure the MCP timer is not already set in the mce_timer_fn.

 arch/x86/kernel/cpu/mcheck/mce-internal.h |  2 +
 arch/x86/kernel/cpu/mcheck/mce.c          | 39 +++++++++++--
 arch/x86/kernel/cpu/mcheck/mce_intel.c    | 95 ++++++++++++++++++++++++-------
 3 files changed, 111 insertions(+), 25 deletions(-)

--
2.0.0.526.g5318336

Thread overview: 61+ messages
2014-07-09 17:09 Havard Skinnemoen [this message]
2014-07-09 17:09 ` [PATCH 1/6] x86-mce: Modify CMCI poll interval to adjust for small check_interval values Havard Skinnemoen
2014-07-09 19:17 ` Borislav Petkov
2014-07-09 21:24 ` Havard Skinnemoen
2014-07-10 9:01 ` Chen, Gong
2014-07-10 17:16 ` Havard Skinnemoen
2014-07-11 2:12 ` Chen, Gong
2014-07-10 11:42 ` Borislav Petkov
2014-07-10 17:51 ` Havard Skinnemoen
2014-07-10 18:55 ` Tony Luck
2014-07-10 22:45 ` Havard Skinnemoen
2014-07-11 15:35 ` Borislav Petkov
2014-07-11 18:56 ` Havard Skinnemoen
2014-07-11 20:10 ` Borislav Petkov
2014-07-11 20:39 ` Havard Skinnemoen
2014-07-14 14:57 ` Borislav Petkov
2014-07-11 20:22 ` Borislav Petkov
2014-07-12 0:10 ` Havard Skinnemoen
2014-07-14 15:14 ` Borislav Petkov
2014-07-11 20:36 ` Borislav Petkov
2014-07-11 21:05 ` Havard Skinnemoen
2014-07-09 17:09 ` [PATCH 2/6] x86-mce: Modify CMCI storm exit to reenable instead of rediscover banks Havard Skinnemoen
2014-07-09 20:20 ` Luck, Tony
2014-07-09 21:34 ` Havard Skinnemoen
2014-07-10 15:51 ` Borislav Petkov
2014-07-10 18:32 ` Havard Skinnemoen
2014-07-09 17:09 ` [PATCH 3/6] x86-mce: Clear CMCI enable on all claimed CMCI banks before reboot Havard Skinnemoen
2014-07-09 20:36 ` Luck, Tony
2014-07-09 21:40 ` Havard Skinnemoen
2014-07-10 16:24 ` Borislav Petkov
2014-07-10 16:33 ` Tony Luck
2014-07-10 17:56 ` Havard Skinnemoen
2014-07-10 18:27 ` Tony Luck
2014-07-10 18:30 ` Borislav Petkov
2014-07-09 17:09 ` [PATCH 4/6] x86-mce: Add spinlocks to prevent duplicated MCP and CMCI reports Havard Skinnemoen
2014-07-09 20:35 ` Andi Kleen
2014-07-09 21:51 ` Havard Skinnemoen
2014-07-09 23:32 ` Luck, Tony
2014-07-10 8:16 ` Borislav Petkov
2014-07-09 20:47 ` Luck, Tony
2014-07-09 21:56 ` Havard Skinnemoen
2014-07-10 16:41 ` Borislav Petkov
2014-07-10 18:03 ` Havard Skinnemoen
2014-07-10 18:44 ` Borislav Petkov
2014-07-10 18:57 ` Tony Luck
2014-07-10 19:12 ` Borislav Petkov
2014-07-11 9:24 ` Borislav Petkov
2014-07-11 19:06 ` Tony Luck
2014-07-11 19:52 ` Borislav Petkov
2014-07-11 21:15 ` Havard Skinnemoen
2014-07-17 10:50 ` Borislav Petkov
2014-07-18 21:23 ` Tony Luck
2014-07-18 21:31 ` Borislav Petkov
2014-07-09 17:09 ` [PATCH 5/6] x86-mce: check if no_way_out applies before deciding not to clear MCE banks Havard Skinnemoen
2014-07-09 21:00 ` Luck, Tony
2014-07-09 23:00 ` Havard Skinnemoen
2014-07-09 23:27 ` Luck, Tony
2014-07-10 16:49 ` Borislav Petkov
2014-07-09 17:09 ` [PATCH 6/6] x86-mce: ensure the MCP timer is not already set in the mce_timer_fn Havard Skinnemoen
2014-07-09 21:04 ` Luck, Tony
2014-07-09 23:01 ` Havard Skinnemoen