netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Fenghua Yu <fenghua.yu@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	H Peter Anvin <hpa@zytor.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Ashok Raj <ashok.raj@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ravi V Shankar <ravi.v.shankar@intel.com>,
	Xiaoyao Li <xiaoyao.li@intel.com>,
	Christopherson Sean J <sean.j.christopherson@intel.com>,
	Kalle Valo <kvalo@codeaurora.org>,
	Michael Chan <michael.chan@broadcom.com>,
	linux-kernel <linux-kernel@vger.kernel.org>, x86 <x86@kernel.org>,
	kvm@vger.kernel.org, netdev@vger.kernel.org,
	linux-wireless@vger.kernel.org
Subject: Re: [PATCH v8 10/15] x86/split_lock: Handle #AC exception for split lock
Date: Thu, 25 Apr 2019 08:07:15 +0200	[thread overview]
Message-ID: <20190425060715.GB40105@gmail.com> (raw)
In-Reply-To: <1556134382-58814-11-git-send-email-fenghua.yu@intel.com>


* Fenghua Yu <fenghua.yu@intel.com> wrote:

> There may be different considerations on how to handle #AC for split lock,
> e.g. how to handle system hang caused by split lock issue in firmware,
> how to emulate faulting instruction, etc. We use a simple method to
> handle user and kernel split lock and may extend the method in the future.
> 
> When #AC exception for split lock is triggered from user process, the
> process is killed by SIGBUS. To execute the process properly, a user
> application developer needs to fix the split lock issue.
> 
> When #AC exception for split lock is triggered from a kernel instruction,
> disable split lock detection on local CPU and warn the split lock issue.
> After the exception, the faulting instruction will be executed and kernel
> execution continues. Split lock detection is only disabled on the local
> CPU, not globally. It will be re-enabled if the CPU is offline and then
> online or through sysfs interface.
> 
> A kernel/driver developer should check the warning, which contains helpful
> faulting address, context, and callstack info, and fix the split lock
> issues. Then further split lock issues may be captured and fixed.
> 
> After bit 29 in MSR_TEST_CTL is set to 1 in kernel, firmware inherits
> the setting when firmware is executed in S4, S5, run time services, SMI,
> etc. If there is a split lock operation in firmware, it will triggers
> #AC and may hang the system depending on how firmware handles the #AC.
> It's up to a firmware developer to fix split lock issues in firmware.
> 
> MSR TEST_CTL value is cached in per CPU msr_test_ctl_cache which will be
> used in virtualization to avoid costly MSR read.
> 
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> ---
>  arch/x86/include/asm/cpu.h  |  3 +++
>  arch/x86/kernel/cpu/intel.c | 24 ++++++++++++++++++++++++
>  arch/x86/kernel/traps.c     | 31 ++++++++++++++++++++++++++++++-
>  3 files changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
> index 4e03f53fc079..5706461eb60f 100644
> --- a/arch/x86/include/asm/cpu.h
> +++ b/arch/x86/include/asm/cpu.h
> @@ -42,7 +42,10 @@ unsigned int x86_model(unsigned int sig);
>  unsigned int x86_stepping(unsigned int sig);
>  #ifdef CONFIG_CPU_SUP_INTEL
>  void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
> +DECLARE_PER_CPU(u64, msr_test_ctl_cache);
> +void handle_split_lock_kernel_mode(void);
>  #else
>  static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
> +static inline void handle_split_lock_kernel_mode(void) {}
>  #endif
>  #endif /* _ASM_X86_CPU_H */
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index d7e676c2aebf..2cc69217ca7c 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -31,6 +31,9 @@
>  #include <asm/apic.h>
>  #endif
>  
> +DEFINE_PER_CPU(u64, msr_test_ctl_cache);
> +EXPORT_PER_CPU_SYMBOL_GPL(msr_test_ctl_cache);
> +
>  /*
>   * Just in case our CPU detection goes bad, or you have a weird system,
>   * allow a way to override the automatic disabling of MPX.
> @@ -654,6 +657,17 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
>  	wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
>  }
>  
> +static void init_split_lock_detect(struct cpuinfo_x86 *c)
> +{
> +	if (cpu_has(c, X86_FEATURE_SPLIT_LOCK_DETECT)) {
> +		u64 test_ctl_val;
> +
> +		/* Cache MSR TEST_CTL */
> +		rdmsrl(MSR_TEST_CTL, test_ctl_val);
> +		this_cpu_write(msr_test_ctl_cache, test_ctl_val);
> +	}
> +}
> +
>  static void init_intel(struct cpuinfo_x86 *c)
>  {
>  	early_init_intel(c);
> @@ -766,6 +780,8 @@ static void init_intel(struct cpuinfo_x86 *c)
>  	init_intel_energy_perf(c);
>  
>  	init_intel_misc_features(c);
> +
> +	init_split_lock_detect(c);
>  }
>  
>  #ifdef CONFIG_X86_32
> @@ -1060,3 +1076,11 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
>  	if (ia32_core_cap & CORE_CAP_SPLIT_LOCK_DETECT)
>  		set_split_lock_detect();
>  }
> +
> +void handle_split_lock_kernel_mode(void)
> +{
> +	/* Warn and disable split lock detection on this CPU */
> +	msr_clear_bit(MSR_TEST_CTL, TEST_CTL_SPLIT_LOCK_DETECT_SHIFT);
> +	this_cpu_and(msr_test_ctl_cache, ~TEST_CTL_SPLIT_LOCK_DETECT);
> +	WARN_ONCE(1, "split lock operation detected\n");

Please name this more descriptively, such as x86_split_lock_disable() or 
so.

Also, please reorganize the split lock detection namespace to be less 
idiosynchratic, use a common x86_split_lock_ prefix and organize the new 
namespace around that:

    x86_split_lock_init()          // was: init_split_lock_detect
    x86_split_lock_enable()        // was: set_split_lock_detect
    x86_split_lock_disable()       // was: handle_split_lock_kernel_mode
    etc.

> +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> +{
> +	unsigned int trapnr = X86_TRAP_AC;
> +	char str[] = "alignment check";
> +	int signr = SIGBUS;
> +
> +	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> +	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) ==
> +		       NOTIFY_STOP)

Please do not break lines mid-line when it does absolutely nothing to 
improve readablity. Just ignore checkpatch.

> +	cond_local_irq_enable(regs);
> +	if (!user_mode(regs) &&
> +	    static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {

Ditto - in fact this doesn't even violate the col80 rule ...

> +		/*
> +		 * Only split lock can generate #AC from kernel at this point.
> +		 * Warn and disable split lock detection on this CPU. The
> +		 * faulting instruction will be executed without generating
> +		 * another #AC fault.
> +		 */

How about explaining all this more clearly:

> +		/*
> +		 * Only split locks can generate #AC from kernel mode.
> +              *
> +		 * The split-lock detection feature is a one-shot
> +              * debugging facility, so we disable it immediately and 
> +              * print a warning.
> +              *
> +              * This also solves the instruction restart problem: we 
> +              * return the faulting instruction right after this it 
> +              * will be executed without generating another #AC fault
> +              * and getting into an infinite loop, instead it will
> +              * continue without side effects to the interrupted
> +              * execution conext.
> +              *
> +              * Split-lock detection will remain disabled permanently
> +              * after this, until the next reboot.
> +		 */

?

Also, AFAICS this code will disable split-lock detection only on the 
current CPU - all the other 4,096 CPUs hitting this same lock at the 
exact same time will happily continue spamming the kernel log as they 
encounter the same split lock, correct?

While the warning itself uses WARN_ONCE(), that and the underlying 
BUGFLAG_ONCE mechanism is not an atomic facility.

Instead, please add an explicit, global split_lock_debug bit that the 
first CPU hitting it disables, and only that CPU is allowed to print a 
single warning. All other CPUs just disable split-lock debugging silently 
and continue.

This also solves the race if the split-lock #AC fault is re-triggered by 
NMI of perf context interrupting one split-lock warning execution while 
the original WARN_ON() is executing.

Thanks,

	Ingo

  reply	other threads:[~2019-04-25  6:07 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-24 19:32 [PATCH v8 00/15] x86/split_lock: Enable split lock detection Fenghua Yu
2019-04-24 19:32 ` [PATCH v8 01/15] x86/common: Align cpu_caps_cleared and cpu_caps_set to unsigned long Fenghua Yu
2019-04-24 19:32 ` [PATCH v8 02/15] drivers/net/b44: Align pwol_mask to unsigned long for better performance Fenghua Yu
2019-04-24 19:32 ` [PATCH v8 03/15] wlcore: simplify/fix/optimize reg_ch_conf_pending operations Fenghua Yu
2019-04-25 17:12   ` Kalle Valo
2019-04-24 19:32 ` [PATCH v8 04/15] x86/split_lock: Align x86_capability to unsigned long to avoid split locked access Fenghua Yu
2019-04-24 19:32 ` [PATCH v8 05/15] x86/msr-index: Define MSR_IA32_CORE_CAPABILITY and split lock detection bit Fenghua Yu
2019-04-25  5:45   ` Ingo Molnar
2019-04-25 19:01     ` Fenghua Yu
2019-04-25 19:47       ` Ingo Molnar
2019-04-25 19:51         ` Fenghua Yu
2019-04-25 20:08           ` Ingo Molnar
2019-04-25 20:22             ` Fenghua Yu
2019-04-26  6:00               ` Ingo Molnar
2019-05-06  0:12                 ` Fenghua Yu
2019-04-24 19:32 ` [PATCH v8 06/15] x86/cpufeatures: Enumerate MSR_IA32_CORE_CAPABILITY Fenghua Yu
2019-04-24 19:32 ` [PATCH v8 07/15] x86/split_lock: Enumerate split lock detection by MSR_IA32_CORE_CAPABILITY Fenghua Yu
2019-04-24 19:32 ` [PATCH v8 08/15] x86/split_lock: Enumerate split lock detection on Icelake mobile processor Fenghua Yu
2019-04-24 19:32 ` [PATCH v8 09/15] x86/split_lock: Define MSR TEST_CTL register Fenghua Yu
2019-04-25  6:21   ` Ingo Molnar
2019-04-25 19:48     ` Fenghua Yu
2019-04-24 19:32 ` [PATCH v8 10/15] x86/split_lock: Handle #AC exception for split lock Fenghua Yu
2019-04-25  6:07   ` Ingo Molnar [this message]
2019-04-25  7:29   ` Thomas Gleixner
2019-04-24 19:32 ` [PATCH v8 11/15] kvm/x86: Emulate MSR IA32_CORE_CAPABILITY Fenghua Yu
2019-04-24 19:32 ` [PATCH v8 12/15] kvm/vmx: Emulate MSR TEST_CTL Fenghua Yu
2019-04-25  7:42   ` Thomas Gleixner
2019-04-27 12:20     ` Xiaoyao Li
2019-04-28  7:09       ` Thomas Gleixner
2019-04-28  7:34         ` Xiaoyao Li
2019-04-29  7:31           ` Thomas Gleixner
2019-04-29  5:21         ` Xiaoyao Li
2019-04-24 19:33 ` [PATCH v8 13/15] x86/split_lock: Enable split lock detection by default Fenghua Yu
2019-04-25  7:50   ` Thomas Gleixner
2019-05-06 21:39     ` Fenghua Yu
2019-04-25 10:52   ` David Laight
2019-04-25 10:58     ` Thomas Gleixner
2019-04-25 11:13       ` David Laight
2019-04-25 11:41         ` Peter Zijlstra
2019-04-25 13:04         ` Thomas Gleixner
2019-05-07 20:48       ` Fenghua Yu
2019-04-24 19:33 ` [PATCH v8 14/15] x86/split_lock: Disable split lock detection by kernel parameter "nosplit_lock_detect" Fenghua Yu
2019-04-24 19:33 ` [PATCH v8 15/15] x86/split_lock: Add a sysfs interface to enable/disable split lock detection during run time Fenghua Yu
2019-04-25  6:31   ` Ingo Molnar
2019-05-06  0:21     ` Fenghua Yu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190425060715.GB40105@gmail.com \
    --to=mingo@kernel.org \
    --cc=ashok.raj@intel.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@intel.com \
    --cc=fenghua.yu@intel.com \
    --cc=hpa@zytor.com \
    --cc=kvalo@codeaurora.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=michael.chan@broadcom.com \
    --cc=mingo@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=ravi.v.shankar@intel.com \
    --cc=sean.j.christopherson@intel.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    --cc=xiaoyao.li@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).