* [PATCH 0/4] x86/mce: protect nr_cpus from rebooting by broadcast mce
@ 2019-08-05  8:58 Pingfan Liu
  2019-08-05  8:58 ` [PATCH 1/4] x86/apic: correct the ENO in generic_processor_info() Pingfan Liu
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Pingfan Liu @ 2019-08-05  8:58 UTC
  To: Thomas Gleixner, Andy Lutomirski, x86
  Cc: Pingfan Liu, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dave Hansen, Peter Zijlstra, Masami Hiramatsu, Qian Cai,
	Vlastimil Babka, Daniel Drake, Jacob Pan, Michal Hocko,
	Eric Biederman, linux-kernel, Dave Young, Baoquan He, kexec

This series includes two related groups:
[1-3/4]: protect nr_cpus from rebooting by broadcast mce
[4/4]: improve "kexec -l" robustness against broadcast mce

When I tried to fix [1], Thomas raised a concern about nr_cpus' vulnerability
to unexpected reboots caused by broadcast MCE. After analysis, I think only the
first of the cases listed below suffers from rebooting by broadcast MCE.
[1-3/4] aim to fix that issue.

*** Background ***

On x86, all logical CPUs are required to have CR4.MCE=1 set. Otherwise, a
broadcast MCE that observes CR4.MCE=0 on any core will shut down the machine.
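
To make the rule concrete, here is a minimal user-space sketch (not kernel
code). It assumes a hypothetical array holding one CR4 snapshot per logical
CPU; the helper name broadcast_mce_survivable() is made up for illustration.
The only hardware fact it relies on is that CR4.MCE is bit 6 of CR4.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CR4.MCE is bit 6 of the CR4 control register. */
#define CR4_MCE (UINT64_C(1) << 6)

/*
 * Hypothetical model: cr4[] holds one CR4 snapshot per logical CPU.
 * A broadcast MCE is delivered to every logical CPU, so the machine
 * survives only if no CPU has CR4.MCE clear.
 */
static bool broadcast_mce_survivable(const uint64_t *cr4, size_t ncpus)
{
	for (size_t i = 0; i < ncpus; i++) {
		if (!(cr4[i] & CR4_MCE))
			return false;	/* this CPU would shut the machine down */
	}
	return true;
}
```

In this model, a CPU that was capped by nr_cpus and never had CR4.MCE set is
enough to make the function return false for the whole machine.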

The 'nosmt' option already complies with the above rule, thanks to Thomas's
patch. For details, refer to 506a66f3748 (Revert "x86/apic: Ignore secondary
threads if nosmt=force")

But for the nr_cpus option, the exposure to broadcast MCE is a little more
complicated, and can be categorized into three cases.

-1. Boot up by BIOS. Since no one sets CR4.MCE=1 on the capped cpus, nr_cpus
risks rebooting by broadcast MCE.

-2. Boot up by "kexec -p nr_cpus=". Since the 1st kernel has set CR4.MCE=1 on
all cpus before kexec -p, nr_cpus is free of rebooting by broadcast MCE.
Furthermore, the crashed kernel's wreckage, including page tables and text, is
not touched by the capture kernel. Hence if an MCE event happens on a capped
cpu, do_machine_check->__mc_check_crashing_cpu() runs smoothly and returns
immediately, and the capped cpu stays pinned on "halt".

-3. Boot up by "kexec -l nr_cpus=". As with "kexec -p", it is free of rebooting
by broadcast MCE. But the 1st kernel's wreckage is discarded and overwritten.
When capped cpus execute do_machine_check(), they may corrupt the new kernel.
This is not related to broadcast MCE and needs an extra fix, which is [4/4].
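
The three cases can be summarized as a small lookup. This is only an
illustrative user-space sketch of the analysis above, not code from the
series; the enum, the struct and nr_cpus_risk() are hypothetical names.

```c
#include <stdbool.h>

/* How the kernel running with nr_cpus= was started. */
enum boot_path { BOOT_BIOS, BOOT_KEXEC_P, BOOT_KEXEC_L };

struct capped_cpu_risk {
	/* Capped cpus may still have CR4.MCE=0, so a broadcast MCE reboots. */
	bool broadcast_mce_reboot;
	/* Capped cpus may run a do_machine_check() that has been overwritten. */
	bool stale_handler_corrupts;
};

static struct capped_cpu_risk nr_cpus_risk(enum boot_path path)
{
	switch (path) {
	case BOOT_BIOS:		/* case 1: nobody set CR4.MCE=1 */
		return (struct capped_cpu_risk){ true, false };
	case BOOT_KEXEC_P:	/* case 2: old kernel image left intact */
		return (struct capped_cpu_risk){ false, false };
	case BOOT_KEXEC_L:	/* case 3: old kernel image overwritten */
		return (struct capped_cpu_risk){ false, true };
	}
	return (struct capped_cpu_risk){ false, false };
}
```

Only BOOT_BIOS sets the first flag, which is the issue addressed by [1-3/4];
only BOOT_KEXEC_L sets the second, which is the issue addressed by [4/4].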

*** Solution ***
"nr_cpus" cannot follow the same way as "nosmt", because nr_cpus limits the
allocation of the percpu area and some other kthread memory, which are
critical to the cpu hotplug framework. Instead, this series develops a
dedicated SIPI callback, make_capped_cpu_stable(), for capped cpus, which does
not rely on the percpu area to work.

[1]: https://lkml.org/lkml/2019/7/5/3

To: Thomas Gleixner <tglx@linutronix.de>
To: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
To: x86@kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Daniel Drake <drake@endlessm.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: linux-kernel@vger.kernel.org
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: kexec@lists.infradead.org

---
Pingfan Liu (4):
  x86/apic: correct the ENO in generic_processor_info()
  x86/apic: record capped cpu in generic_processor_info()
  x86/smp: send capped cpus to a stable state when smp_init()
  x86/smp: disallow MCE handler on rebooting AP

 arch/x86/include/asm/apic.h  |  1 +
 arch/x86/include/asm/smp.h   |  3 ++
 arch/x86/kernel/apic/apic.c  | 23 ++++++++----
 arch/x86/kernel/cpu/common.c |  7 ++++
 arch/x86/kernel/smp.c        |  8 +++++
 arch/x86/kernel/smpboot.c    | 83 ++++++++++++++++++++++++++++++++++++++++++++
 kernel/smp.c                 |  6 ++++
 7 files changed, 124 insertions(+), 7 deletions(-)

-- 
2.7.5


Thread overview: 16+ messages
2019-08-05  8:58 [PATCH 0/4] x86/mce: protect nr_cpus from rebooting by broadcast mce Pingfan Liu
2019-08-05  8:58 ` [PATCH 1/4] x86/apic: correct the ENO in generic_processor_info() Pingfan Liu
2019-08-05  8:58 ` [PATCH 2/4] x86/apic: record capped cpu " Pingfan Liu
2019-08-08  0:17   ` kbuild test robot
2019-08-08  0:17   ` [RFC PATCH] x86/apic: __cpu_capped_mask can be static kbuild test robot
2019-08-05  8:58 ` [PATCH 3/4] x86/smp: send capped cpus to a stable state when smp_init() Pingfan Liu
2019-08-08  1:20   ` kbuild test robot
2019-08-08  1:20   ` [RFC PATCH] x86/smp: __cpu_capped_done_mask can be static kbuild test robot
2019-08-08  2:36   ` [PATCH 3/4] x86/smp: send capped cpus to a stable state when smp_init() kbuild test robot
2019-08-08  5:18   ` kbuild test robot
2019-08-05  8:58 ` [PATCH 4/4] x86/smp: disallow MCE handler on rebooting AP Pingfan Liu
2019-08-07  3:00 ` [PATCH 0/4] x86/mce: protect nr_cpus from rebooting by broadcast mce Dave Young
2019-08-07  7:52   ` Pingfan Liu
2019-08-07 13:07     ` Thomas Gleixner
2019-08-08  5:41       ` Pingfan Liu
2019-08-08  6:51         ` Thomas Gleixner
