Date: Mon, 13 Jan 2020 21:55:21 -0800
From: Sean Christopherson
To: "Luck, Tony"
Cc: Peter Zijlstra, Ingo Molnar, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, H Peter Anvin, Ashok Raj, Ravi V Shankar,
    linux-kernel, x86
Subject: Re: [PATCH v11] x86/split_lock: Enable split lock detection by kernel
Message-ID: <20200114055521.GI14928@linux.intel.com>
References: <20191121060444.GA55272@gmail.com>
 <20191121130153.GS4097@hirez.programming.kicks-ass.net>
 <20191121171214.GD12042@gmail.com>
 <20191121173444.GA5581@agluck-desk2.amr.corp.intel.com>
 <20191122105141.GY4114@hirez.programming.kicks-ass.net>
 <20191122152715.GA1909@hirez.programming.kicks-ass.net>
 <20191123003056.GA28761@agluck-desk2.amr.corp.intel.com>
 <20191125161348.GA12178@linux.intel.com>
 <20191212085948.GS2827@hirez.programming.kicks-ass.net>
 <20200110192409.GA23315@agluck-desk2.amr.corp.intel.com>
In-Reply-To: <20200110192409.GA23315@agluck-desk2.amr.corp.intel.com>
User-Agent: Mutt/1.5.24 (2015-08-30)

On Fri, Jan 10, 2020 at 11:24:09AM -0800, Luck, Tony wrote:
> From: Peter Zijlstra
>
> A split-lock occurs when an atomic instruction operates on data
> that spans two cache lines. In order to maintain atomicity the
> core takes a global bus lock.
>
> This is typically >1000 cycles slower than an atomic operation
> within a cache line. It also disrupts performance on other cores
> (which must wait for the bus lock to be released before their
> memory operations can complete). For real-time systems this may
> mean missing deadlines. For other systems it may just be very
> annoying.
>
> Some CPUs have the capability to raise an #AC trap when a
> split lock is attempted.
>
> Provide a command line option to give the user choices on how
> to handle this.
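
(Aside: for anyone who wants to poke at the modes listed below, a LOCK'd
access that straddles a cache line is easy to generate from userspace.
Completely untested sketch, and it assumes 64-byte cache lines:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	/* 128-byte buffer aligned to a (presumed) 64-byte cache line. */
	char *buf = aligned_alloc(64, 128);
	uint32_t *val;

	if (!buf)
		return 1;

	/* A 4-byte value at offset 62 straddles the cache line boundary. */
	val = (uint32_t *)(buf + 62);
	*val = 0;

	/* LOCK-prefixed RMW across the boundary: bus lock, or #AC with SLD on. */
	__atomic_fetch_add(val, 1, __ATOMIC_SEQ_CST);

	printf("value after misaligned atomic add: %u\n", *val);
	free(buf);
	return 0;
}

With split_lock_detect=warn that should trip the #AC path once and get
the task flagged; with =fatal it should die with SIGBUS.)
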
> 	split_lock_detect=
> 		off	- not enabled (no traps for split locks)
> 		warn	- warn once when an application does a
> 			  split lock, but allow it to continue
> 			  running.
> 		fatal	- Send SIGBUS to applications that cause split lock
>
> Default is "warn". Note that if the kernel hits a split lock
> in any mode other than "off" it will oops.
>
> One implementation wrinkle is that the MSR to control the
> split lock detection is per-core, not per thread. This might
> result in some short lived races on HT systems in "warn" mode
> if Linux tries to enable on one thread while disabling on
> the other. Race analysis by Sean Christopherson:
>
>  - Toggling of split-lock is only done in "warn" mode. Worst case
>    scenario of a race is that a misbehaving task will generate multiple
>    #AC exceptions on the same instruction. And this race will only occur
>    if both siblings are running tasks that generate split-lock #ACs, e.g.
>    a race where sibling threads are writing different values will only
>    occur if CPUx is disabling split-lock after an #AC and CPUy is
>    re-enabling split-lock after *its* previous task generated an #AC.
>  - Transitioning between modes at runtime isn't supported and disabling
>    is tracked per task, so hardware will always reach a steady state that
>    matches the configured mode.  I.e. split-lock is guaranteed to be
>    enabled in hardware once all _TIF_SLD threads have been scheduled out.
>
> Co-developed-by: Fenghua Yu

Need Fenghua's SoB.

> Co-developed-by: Peter Zijlstra

Co-developed-by for Peter not needed since he's the author (attributed
via From).

> Signed-off-by: Peter Zijlstra (Intel)
> Signed-off-by: Tony Luck
> ---
>
> I think all the known places where split locks occur in the kernel
> have already been patched, or the patches are queued for the upcoming
> merge window.  If we missed some, well this patch will help find them
> (for people with Icelake or Icelake Xeon systems).  PeterZ didn't see
> any application level use of split locks in a few hours of runtime
> on his desktop.  So likely little fallout there (default is just to
> warn for applications, so just console noise rather than failure).
>
>  .../admin-guide/kernel-parameters.txt |  18 ++
>  arch/x86/include/asm/cpu.h            |  17 ++
>  arch/x86/include/asm/cpufeatures.h    |   2 +
>  arch/x86/include/asm/msr-index.h      |   8 +
>  arch/x86/include/asm/thread_info.h    |   6 +-
>  arch/x86/include/asm/traps.h          |   1 +
>  arch/x86/kernel/cpu/common.c          |   2 +
>  arch/x86/kernel/cpu/intel.c           | 170 ++++++++++++++++++
>  arch/x86/kernel/process.c             |   3 +
>  arch/x86/kernel/traps.c               |  29 ++-
>  10 files changed, 252 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index ade4e6ec23e0..173c1acff5f0 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3181,6 +3181,24 @@
>
>  	nosoftlockup	[KNL] Disable the soft-lockup detector.
>
> +	split_lock_detect=

Would it make sense to name this split_lock_ac?  To help clarify what
the param does, and to future-proof a bit in the event split lock
detection is able to signal some other form of fault/trap.

> +			[X86] Enable split lock detection
> +
> +			When enabled (and if hardware support is present), atomic
> +			instructions that access data across cache line
> +			boundaries will result in an alignment check exception.
> +
> +			off	- not enabled
> +
> +			warn	- the kernel will pr_alert about applications
> +				  triggering the #AC exception
> +
> +			fatal	- the kernel will SIGBUS applications that
> +				  trigger the #AC exception.
> +
> +			For any mode other than 'off' the kernel will die if
> +			it (or firmware) triggers #AC.
> +
>  	nosync		[HW,M68K] Disables sync negotiation for all devices.
>
>  	nowatchdog	[KNL] Disable both lockup detectors, i.e.

...

> diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> index d779366ce3f8..d23638a0525e 100644
> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -92,6 +92,7 @@ struct thread_info {
>  #define TIF_NOCPUID		15	/* CPUID is not accessible in userland */
>  #define TIF_NOTSC		16	/* TSC is not accessible in userland */
>  #define TIF_IA32		17	/* IA32 compatibility process */
> +#define TIF_SLD			18	/* split_lock_detect */

A more informative name and comment would be helpful since the flag is
set when SLD is disabled by the previous task.  Something like?

#define TIF_NEED_SLD_RESTORE	18	/* Restore split lock detection on context switch */

>  #define TIF_NOHZ		19	/* in adaptive nohz mode */
>  #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
>  #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
> @@ -122,6 +123,7 @@ struct thread_info {
>  #define _TIF_NOCPUID		(1 << TIF_NOCPUID)
>  #define _TIF_NOTSC		(1 << TIF_NOTSC)
>  #define _TIF_IA32		(1 << TIF_IA32)
> +#define _TIF_SLD		(1 << TIF_SLD)
>  #define _TIF_NOHZ		(1 << TIF_NOHZ)
>  #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
>  #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
> @@ -158,9 +160,9 @@ struct thread_info {
>
>  #ifdef CONFIG_X86_IOPL_IOPERM
>  # define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
> -				 _TIF_IO_BITMAP)
> +				 _TIF_IO_BITMAP | _TIF_SLD)
>  #else
> -# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
> +# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
>  #endif
>
>  #define _TIF_WORK_CTXSW_NEXT	(_TIF_WORK_CTXSW)
> diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
> index ffa0dc8a535e..6ceab60370f0 100644
> --- a/arch/x86/include/asm/traps.h
> +++ b/arch/x86/include/asm/traps.h
> @@ -175,4 +175,5 @@ enum x86_pf_error_code {
>  	X86_PF_INSTR	=	1 << 4,
>  	X86_PF_PK	=	1 << 5,
>  };
> +

Spurious whitespace.

>  #endif /* _ASM_X86_TRAPS_H */
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 2e4d90294fe6..39245f61fad0 100644

...

> +bool handle_split_lock(void)

This is a confusing name IMO, e.g. split_lock_detect_enabled() or
similar would be more intuitive.  It'd also avoid the weirdness of
having different semantics for the return values of handle_split_lock()
and handle_user_split_lock().

> +{
> +	return sld_state != sld_off;
> +}
> +
> +bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> +{
> +	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> +		return false;

Maybe add "|| WARN_ON_ONCE(sld_state != sld_off)" to try to prevent the
kernel from going fully into the weeds if a spurious #AC occurs.

> +
> +	pr_alert("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",

pr_warn_ratelimited since it's user controlled?
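
I.e., purely as an untested sketch, same format string and args:

	pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
			    current->comm, current->pid, regs->ip);
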
> +		 current->comm, current->pid, regs->ip);
> +
> +	__sld_msr_set(false);
> +	set_tsk_thread_flag(current, TIF_SLD);
> +	return true;
> +}
> +
> +void switch_sld(struct task_struct *prev)
> +{
> +	__sld_msr_set(true);
> +	clear_tsk_thread_flag(prev, TIF_SLD);
> +}
> +
> +#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
> +
> +/*
> + * The following processors have split lock detection feature. But since they
> + * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
> + * the MSR. So enumerate the feature by family and model on these processors.
> + */
> +static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
> +	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
> +	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
> +	{}
> +};
> +
> +void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
> +{
> +	u64 ia32_core_caps = 0;
> +
> +	if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
> +		/* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
> +		rdmsrl(MSR_IA32_CORE_CAPABILITIES, ia32_core_caps);
> +	} else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
> +		/* Enumerate split lock detection by family and model. */
> +		if (x86_match_cpu(split_lock_cpu_ids))
> +			ia32_core_caps |= MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT;
> +	}
> +
> +	if (ia32_core_caps & MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT)
> +		split_lock_setup();
> +}
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 61e93a318983..55d205820f35 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
>  		/* Enforce MSR update to ensure consistent state */
>  		__speculation_ctrl_update(~tifn, tifn);
>  	}
> +
> +	if (tifp & _TIF_SLD)
> +		switch_sld(prev_p);
>  }
>
>  /*
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 05da6b5b167b..a933a01f6e40 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -46,6 +46,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -242,7 +243,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
>  {
>  	struct task_struct *tsk = current;
>
> -

Whitespace.

>  	if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
>  		return;
>
> @@ -288,9 +288,34 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overru
>  DO_ERROR(X86_TRAP_TS,    SIGSEGV,  0, NULL, "invalid TSS",         invalid_TSS)
>  DO_ERROR(X86_TRAP_NP,    SIGBUS,   0, NULL, "segment not present", segment_not_present)
>  DO_ERROR(X86_TRAP_SS,    SIGBUS,   0, NULL, "stack segment",       stack_segment)
> -DO_ERROR(X86_TRAP_AC,    SIGBUS,   BUS_ADRALN, NULL, "alignment check", alignment_check)
>  #undef IP
>
> +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> +{
> +	unsigned int trapnr = X86_TRAP_AC;
> +	char str[] = "alignment check";

const if you want to keep it.

> +	int signr = SIGBUS;

Don't see any reason for these, e.g. they're not used for do_trap().
trapnr and signr in particular do more harm than good.
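
I.e., untested, just drop the locals and pass the constants straight to
notify_die():

	dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
	{
		RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");

		/* Same call, with X86_TRAP_AC/SIGBUS/"alignment check" inlined. */
		if (notify_die(DIE_TRAP, "alignment check", regs, error_code,
			       X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
			return;

		...
	}
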
> +	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> +	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
> +		return;
> +
> +	if (!handle_split_lock())
> +		return;
> +
> +	if (!user_mode(regs))
> +		die("Split lock detected\n", regs, error_code);
> +
> +	cond_local_irq_enable(regs);
> +
> +	if (handle_user_split_lock(regs, error_code))
> +		return;
> +
> +	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
> +		error_code, BUS_ADRALN, NULL);
> +}
> +
>  #ifdef CONFIG_VMAP_STACK
>  __visible void __noreturn handle_stack_overflow(const char *message,
>  						struct pt_regs *regs,
> --
> 2.21.0
>