linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Yu, Fenghua" <fenghua.yu@intel.com>
To: "Christopherson, Sean J" <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Borislav Petkov <bp@alien8.de>, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Shanbhogue, Vedvyas" <vedvyas.shanbhogue@intel.com>,
	"Luck, Tony" <tony.luck@intel.com>, H Peter Anvin <hpa@zytor.com>,
	Andy Lutomirski <luto@kernel.org>,
	"Shankar, Ravi V" <ravi.v.shankar@intel.com>,
	"Li, Xiaoyao" <xiaoyao.li@intel.com>, x86 <x86@kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH RFC] x86/bus_lock: Enable bus lock detection
Date: Wed, 29 Jul 2020 18:09:16 +0000	[thread overview]
Message-ID: <e23b04a2adc54a5dbca48271987de822@intel.com> (raw)
In-Reply-To: <20200729030232.GE5583@linux.intel.com>

Hi, Sean,

> > A bus lock [1] is acquired either through split locked access to
> > writeback (WB) memory or by using locks to uncacheable (UC) memory
> > (e.g. direct device
> 
> Does SLD not detect the lock to UC memory?

The statement might not be accurate. Split Lock Detection doesn't detect bus
lock to non-WB memory (including UC memory).

> 
> > assignment). This is typically >1000 cycles slower than an atomic
> > operation within a cache line. It also disrupts performance on other cores.
> >
> > Although split lock can be detected by #AC trap, the trap is triggered
> > before the instruction acquires bus lock. This makes it difficult to
> > mitigate bus lock (e.g. throttle the user application).
> 
> Mitigate _in a non-fatal way_.  The #AC makes it very easy to mitigate split
> locks, it just has the side effect of SIGBUGS or killing the KVM guest.

Adding "in a non-fatal way" is more better. Will fix it.

> 
> > Some CPUs have ability to notify the kernel by an #DB trap after the
> > instruction acquires a bus lock and is executed. This allows the
> > kernel to enforce user application throttling or mitigations and also
> > provides a better environment to debug kernel split lock issues since
> > the kernel can continue instead of crashing.
> >
> > #DB for bus lock detect fixes all issues in #AC for split lock detect:
> 
> Fixes "all" issues... and creates some new ones, e.g. there are use cases
> where preventing the split lock from happening in the first place is strongly
> desired.  It's why that train wreck exists.

Bus Lock Detection doesn't replace Split Lock Detection. If both features
are enabled, default behavior is warning from bus lock, fatal behavior is
still from split lock. See the behavior table as follows.

> 
> > 1) It's architectural ... just need to look at one CPUID bit to know it
> >    exists
> > 2) The IA32_DEBUGCTL MSR, which reports bus lock in #DB, is per-thread.
> >    So each process or guest can have different behavior.
> > 3) It has support for VMM/guests (new VMEXIT codes, etc).
> >
> > Use the existing kernel command line option "split_lock_detect=" to
> > handle #DB for bus lock:
> 
> Are SLD and BLD mutually exclusive?  Can we even guarantee that given the
> track record of SLD?  If not, we'll likely want to allow the user to choose
> between SDL and BLD via split_lock_detect.

The two hardware features can be enabled on the same platform.
But they are mutually exclusive in the kernel because #AC from SLD happens
before the instruction is executed and #DB happens after the instruction is
executed.

Right now, if both of them are enabled, "warn" behavior goes to
bus lock and "fatal" behavior goes to split lock.

Do you want the user to override the behaviors by something like this?

split_lock_detect=warn[,sld]: if given "sld" while both features are enabled,
warn behavior is from split lock instead of bus lock detection.

split_lock_detect=fatal[,bld]: if given "bld" while both features are enabled,
fatal behavior is from bus lock detection.

> 
> > split_lock_detect=
> > 		#AC for split lock		#DB for bus lock
> >
> > off		Do nothing			Do nothing
> >
> > warn		Kernel OOPs			Kernel warns rate limited
> > 		Warn once per task and		and continues to run.
> > 		disable future checking 	Warn once per task and
> > 						and continue to run.
> > 						When both features are
> > 						supported, warn in #DB
> >
> > fatal		Kernel OOPs			Kernel warn rate limited
> 
> Unless the lock to UC #DB is new behavior, why would we revert to allowing
> split locks in the kernel?

SLD cannot detect lock to non-WB memory (including UC). BLD can detect
both bus locks from both split lock and lock to non-WB.

> 
> > 		Send SIGBUS to user		Send SIGBUS to user
> > 		When both features are
> > 		supported, fatal in #AC.
> >
> > ratelimit:N	Do nothing			Kernel warns rate limited
> 
> This should be more than "Do nothing" for #AC, e.g. fall back to warn or at
> least print a loud error.

Ok. Will fall back to warn.

> 
> > 						and continue to run.
> > 						Limit bus lock rate to
> > 						N per second in the
> > 						current non root user.
> >
> > On systems that support #DB for bus lock detection the default is "warn".
> >
> > [1] Chapter 8
> > https://software.intel.com/sites/default/files/managed/c5/15/architect
> > ure-instruction-set-extensions-programming-reference.pdf
> >
> > Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> > Reviewed-by: Tony Luck <tony.luck@intel.com>
> > ---
> >  .../admin-guide/kernel-parameters.txt         |  48 +++++-
> >  arch/x86/include/asm/cpu.h                    |  16 +-
> >  arch/x86/include/asm/cpufeatures.h            |   1 +
> >  arch/x86/include/asm/msr-index.h              |   1 +
> >  arch/x86/include/uapi/asm/debugreg.h          |   3 +-
> >  arch/x86/kernel/cpu/common.c                  |   2 +-
> >  arch/x86/kernel/cpu/intel.c                   | 156 +++++++++++++++---
> >  arch/x86/kernel/traps.c                       |  10 ++
> >  include/linux/sched/user.h                    |   4 +-
> >  kernel/user.c                                 |   7 +
> >  10 files changed, 214 insertions(+), 34 deletions(-)
> 
> Maybe it's just me, but it'd be nice to break this into multiple patches so that
> the SLD refactoring is separate from the introduction of BLD.  As is, I find it
> hard to review as I can't easily distinguish refactoring from new functionality.

OK. Will split the patch into multiple patches.

> 
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt
> > b/Documentation/admin-guide/kernel-parameters.txt
> > index fb95fad81c79..7a1cb6fe8b8e 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -4816,27 +4816,59 @@
> >  	spia_peddr=
> >
> >  	split_lock_detect=
> > -			[X86] Enable split lock detection
> > +			[X86] Enable split lock detection or bus lock detection
> >
> >  			When enabled (and if hardware support is present),
> atomic
> >  			instructions that access data across cache line
> > -			boundaries will result in an alignment check
> exception.
> > +			boundaries will result in an alignment check
> exception
> > +			for split lock detection or an debug exception for
> > +			bus lock detection.
> >
> >  			off	- not enabled
> >
> > -			warn	- the kernel will emit rate limited warnings
> > -				  about applications triggering the #AC
> > -				  exception. This mode is the default on CPUs
> > -				  that supports split lock detection.
> > +			warn	- Default mode.
> >
> > -			fatal	- the kernel will send SIGBUS to applications
> > -				  that trigger the #AC exception.
> > +				  If split lock detection is enabled in
> > +				  hardware, the kernel will emit rate limited
> > +				  warnings about applications triggering the
> #AC
> > +				  exception.
> > +
> > +				  If bus lock detection is enabled in hardware,
> > +				  the kernel will emit rate limited warnings
> > +				  about applications triggering the #D
> 
> s/#D/#DB
> 
> > +				  exception.
> > +
> > +				  Default behavior is from bus lock detection
> > +				  if both features are enabled in hardware.
> > +
> > +			fatal	- If split lock detection is enabled in
> > +				  hardware, the kernel will send SIGBUS to
> > +				  applications that trigger the #AC exception.
> > +
> > +				  If bus lock detection is enabled in hardware,
> > +				  the kernel will send SIGBUS to application
> > +				  that trigger the #DB exception.
> > +
> > +				  Default behavior is from split lock detection
> > +				  if both are enabled in hardware.
> > +
> > +			ratelimit:N
> > +				  Set rate limit to N bus locks per second
> > +				  for bus lock detection. 0 < N <= HZ/2 and
> > +				  N is approximate. Not applied to root user
> > +				  and the kernel. Only applied to non root
> user.
> > +
> > +				  N/A for split lock detection.
> >
> >  			If an #AC exception is hit in the kernel or in
> >  			firmware (i.e. not while executing in user mode)
> >  			the kernel will oops in either "warn" or "fatal"
> >  			mode.
> >
> > +			If an #DB exception is hit in the kernel or in
> > +			firmware, the kernel will warn in either "warn" or
> > +			"fatal" mode.
> > +
> >  	srbds=		[X86,INTEL]
> >  			Control the Special Register Buffer Data Sampling
> >  			(SRBDS) mitigation.
> 
> ...
> 
> > diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> > index 0ab48f1cdf84..f498472990af 100644
> > --- a/arch/x86/kernel/cpu/intel.c
> > +++ b/arch/x86/kernel/cpu/intel.c
> > @@ -10,6 +10,9 @@
> >  #include <linux/thread_info.h>
> >  #include <linux/init.h>
> >  #include <linux/uaccess.h>
> > +#include <linux/cred.h>
> > +#include <linux/delay.h>
> > +#include <linux/sched/user.h>
> >
> >  #include <asm/cpufeature.h>
> >  #include <asm/msr.h>
> > @@ -39,15 +42,20 @@ enum split_lock_detect_state {
> >  	sld_off = 0,
> >  	sld_warn,
> >  	sld_fatal,
> > +	sld_ratelimit,
> >  };
> >
> >  /*
> >   * Default to sld_off because most systems do not support split lock
> > detection
> > - * split_lock_setup() will switch this to sld_warn on systems that
> > support
> > - * split lock detect, unless there is a command line override.
> > + * sld_state_setup() will switch this to sld_warn on systems that
> > + support
> > + * split lock/bus lock detect, unless there is a command line override.
> >   */
> >  static enum split_lock_detect_state sld_state __ro_after_init =
> > sld_off;  static u64 msr_test_ctrl_cache __ro_after_init;
> > +/* Split lock detection is enabled if it's true. */ static bool sld;
> > +/* Bus lock detection is enabled if it's true. */ static bool bld;
> 
> Why can't these be tracked/reflected in X86_FEATURE_*?

sld and bld are enabled depending on kernel parameter "split_lock_detect=".
They are not static and cannot be tracked by static X86_FEATURE_*.

> 
> >  /*
> >   * With a name like MSR_TEST_CTL it should go without saying, but
> > don't touch @@ -601,6 +609,7 @@ static void
> > init_intel_misc_features(struct cpuinfo_x86 *c)  }
> >
> >  static void split_lock_init(void);
> > +static void bus_lock_init(void);
> >
> >  static void init_intel(struct cpuinfo_x86 *c)  { @@ -717,6 +726,7 @@
> > static void init_intel(struct cpuinfo_x86 *c)
> >  	if (tsx_ctrl_state == TSX_CTRL_DISABLE)
> >  		tsx_disable();
> >
> > +	bus_lock_init();
> >  	split_lock_init();
> >  }
> >
> > @@ -991,13 +1001,25 @@ static const struct {
> >  	{ "off",	sld_off   },
> >  	{ "warn",	sld_warn  },
> >  	{ "fatal",	sld_fatal },
> > +	{ "ratelimit:", sld_ratelimit },
> >  };
> >
> >  static inline bool match_option(const char *arg, int arglen, const
> > char *opt)  {
> > -	int len = strlen(opt);
> >
> > -	return len == arglen && !strncmp(arg, opt, len);
> > +	int len = strlen(opt), ratelimit;
> > +
> > +	if (strncmp(arg, opt, len))
> > +		return false;
> > +
> > +	if (sscanf(arg, "ratelimit:%d", &ratelimit) == 1 && ratelimit > 0 &&
> > +	    ratelimit_bl <= HZ / 2) {
> > +		ratelimit_bl = ratelimit;
> > +
> > +		return true;
> > +	}
> > +
> > +	return len == arglen;
> >  }
> >
> >  static bool split_lock_verify_msr(bool on) @@ -1016,16 +1038,15 @@
> > static bool split_lock_verify_msr(bool on)
> >  	return ctrl == tmp;
> >  }
> >
> > -static void __init split_lock_setup(void)
> > +static void __init sld_state_setup(void)
> >  {
> >  	enum split_lock_detect_state state = sld_warn;
> >  	char arg[20];
> >  	int i, ret;
> >
> > -	if (!split_lock_verify_msr(false)) {
> > -		pr_info("MSR access failed: Disabled\n");
> > +	if (!static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
> > +	    !static_cpu_has(X86_FEATURE_BUS_LOCK_DETECT))
> 
> Pretty sure static_cpu_has() in an __init function is a waste.

Sure. Will change it.

> 
> >  		return;
> > -	}
> >
> >  	ret = cmdline_find_option(boot_command_line, "split_lock_detect",
> >  				  arg, sizeof(arg));
> > @@ -1037,17 +1058,14 @@ static void __init split_lock_setup(void)
> >  			}
> >  		}
> >  	}
> > +	sld_state = state;
> > +}
> >
> > -	switch (state) {
> > -	case sld_off:
> > -		pr_info("disabled\n");
> > +static void __init _split_lock_setup(void) {
> > +	if (!split_lock_verify_msr(false)) {
> > +		pr_info("MSR access failed: Disabled\n");
> >  		return;
> > -	case sld_warn:
> > -		pr_info("warning about user-space split_locks\n");
> > -		break;
> > -	case sld_fatal:
> > -		pr_info("sending SIGBUS on user-space split_locks\n");
> > -		break;
> >  	}
> >
> >  	rdmsrl(MSR_TEST_CTRL, msr_test_ctrl_cache); @@ -1057,8 +1075,11
> @@
> > static void __init split_lock_setup(void)
> >  		return;
> >  	}
> >
> > -	sld_state = state;
> > +	/* Restore the MSR to its cached value. */
> > +	wrmsrl(MSR_TEST_CTRL, msr_test_ctrl_cache);
> > +
> >  	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
> > +	sld = true;
> >  }
> >
> >  /*
> > @@ -1078,6 +1099,10 @@ static void sld_update_msr(bool on)
> >
> >  static void split_lock_init(void)
> >  {
> > +	/* If supported, #DB for bus lock will handle warn and ratelimit. */
> > +	if (bld && (sld_state == sld_warn || sld_state == sld_ratelimit))
> > +		return;
> > +
> >  	if (cpu_model_supports_sld)
> >  		split_lock_verify_msr(sld_state != sld_off);  } @@ -1114,14
> > +1139,58 @@ bool handle_guest_split_lock(unsigned long ip)  }
> > EXPORT_SYMBOL_GPL(handle_guest_split_lock);
> >
> > +static void bus_lock_init(void)
> > +{
> > +	u64 val;
> > +
> > +	if (!bld)
> > +		return;
> > +
> > +	/* If supported, #AC for split lock will handle fatal. */
> > +	if (sld && sld_state == sld_fatal)
> > +		return;
> > +
> > +	rdmsrl(MSR_IA32_DEBUGCTLMSR, val);
> > +	val |= DEBUGCTLMSR_BUS_LOCK_DETECT;
> 
> Uh, doesn't this enable BLD even if sld_state == sld_off?

If sld_state == sld_off, bld is 0 as well.
Then "if (!bld) return" will disable BLD.

> 
> > +	wrmsrl(MSR_IA32_DEBUGCTLMSR, val);
> > +}
> > +
> >  bool handle_user_split_lock(struct pt_regs *regs, long error_code)  {
> > -	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> > +	if ((regs->flags & X86_EFLAGS_AC) || !sld || sld_state == sld_fatal ||
> > +	    sld_state == sld_ratelimit)
> >  		return false;
> >  	split_lock_warn(regs->ip);
> >  	return true;
> >  }
> >
> > +bool handle_user_bus_lock(struct pt_regs *regs) {
> > +	if (!bld)
> > +		return false;
> > +
> > +	pr_warn_ratelimited("#DB: %s/%d took a bus_lock trap at address:
> 0x%lx\n",
> > +			    current->comm, current->pid, regs->ip);
> > +
> > +	if (sld_state == sld_ratelimit) {
> > +		while (!__ratelimit(&get_current_user()->ratelimit_bl))
> > +			msleep(1000 / ratelimit_bl);
> > +	}
> > +
> > +	return true;
> > +}
> > +
> > +bool handle_kernel_bus_lock(struct pt_regs *regs) {
> > +	if (!bld)
> > +		return false;
> > +
> > +	pr_warn_ratelimited("#DB: %s/%d took a bus_lock trap at address:
> 0x%lx\n",
> > +			    current->comm, current->pid, regs->ip);
> > +
> > +	return true;
> > +}
> > +
> >  /*
> >   * This function is called only when switching between tasks with
> >   * different split-lock detection modes. It sets the MSR for the @@
> > -1159,7 +1228,7 @@ static const struct x86_cpu_id split_lock_cpu_ids[]
> __initconst = {
> >  	{}
> >  };
> >
> > -void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
> > +static void __init split_lock_setup(struct cpuinfo_x86 *c)
> >  {
> >  	const struct x86_cpu_id *m;
> >  	u64 ia32_core_caps;
> > @@ -1186,5 +1255,50 @@ void __init cpu_set_core_cap_bits(struct
> cpuinfo_x86 *c)
> >  	}
> >
> >  	cpu_model_supports_sld = true;
> > -	split_lock_setup();
> > +	_split_lock_setup();
> > +}
> > +
> > +static void sld_state_show(void)
> > +{
> > +	if (!bld && !sld)
> > +		return;
> > +
> > +	switch (sld_state) {
> > +	case sld_off:
> > +		pr_info("disabled\n");
> > +		break;
> > +
> > +	case sld_warn:
> > +		if (bld)
> > +			pr_info("#DB: warning about kernel and user-space
> bus_locks\n");
> > +		else
> > +			pr_info("#AC: crashing the kernel about kernel
> split_locks and warning about user-space split_locks\n");
> > +		break;
> > +
> > +	case sld_fatal:
> > +		if (sld)
> > +			pr_info("#AC: crashing the kernel on kernel
> split_locks and sending SIGBUS on user-space split_locks\n");
> > +		else
> > +			pr_info("#DB: warning about kernel bus_locks and
> sending SIGBUS on user-space bus_locks\n");
> > +		break;
> > +
> > +	case sld_ratelimit:
> > +		if (bld)
> > +			pr_info("#DB: warning about kernel bus_locks and
> setting silent rate limit to %d/sec per user on non root user-space
> bus_locks\n", ratelimit_bl);
> > +		break;
> > +	}
> > +}
> > +
> > +static void __init bus_lock_setup(void) {
> > +	if (static_cpu_has(X86_FEATURE_BUS_LOCK_DETECT) && sld_state !=
> > +sld_off)
> 
> More overkill.
> 
> > +		bld = true;
> > +}
> > +
> > +void __init sld_setup(struct cpuinfo_x86 *c)
> 
> This wrapper probably should call out that it configures both sld and bld.

Will fix it.

> 
> > +{
> > +	sld_state_setup();
> > +	split_lock_setup(c);
> > +	bus_lock_setup();
> > +	sld_state_show();
> >  }
> > diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index
> > b038695f36c5..58725567da39 100644
> > --- a/arch/x86/kernel/traps.c
> > +++ b/arch/x86/kernel/traps.c
> > @@ -812,6 +812,16 @@ static void handle_debug(struct pt_regs *regs,
> unsigned long dr6, bool user)
> >  	if (!user && !dr6)
> >  		return;
> >
> > +	/* Handle bus lock. */
> > +	if (!(dr6 & DR_BUS_LOCK)) {
> > +		cond_local_irq_enable(regs);
> > +		if (user)
> > +			handle_user_bus_lock(regs);
> > +		else
> > +			handle_kernel_bus_lock(regs);
> > +		goto out;
> > +	}
> > +
> >  	/*
> >  	 * If dr6 has no reason to give us about the origin of this trap,
> >  	 * then it's very likely the result of an icebp/int01 trap.
> > diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h
> > index 917d88edb7b9..fc757ec6c19f 100644
> > --- a/include/linux/sched/user.h
> > +++ b/include/linux/sched/user.h
> > @@ -37,8 +37,9 @@ struct user_struct {
> >  	atomic_long_t locked_vm;
> >  #endif
> >
> > -	/* Miscellaneous per-user rate limit */
> > +	/* Miscellaneous per-user rate limits */
> >  	struct ratelimit_state ratelimit;
> > +	struct ratelimit_state ratelimit_bl;
> 
> Why not spell out ratelimit_bus_lock?  There's no way someone looking at
> this code in isolation is going to have any clue what "bl" means.

Sure. Will fix it.

Thank you very much for your review!

-Fenghua

  parent reply	other threads:[~2020-07-29 18:10 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-17 21:35 [PATCH RFC] x86/bus_lock: Enable bus lock detection Fenghua Yu
2020-07-29  3:02 ` Sean Christopherson
2020-07-29  8:50   ` peterz
2020-07-29 18:09   ` Yu, Fenghua [this message]
2020-07-29 18:46     ` Sean Christopherson
2020-07-29 19:42       ` Fenghua Yu
2020-07-29 20:00         ` Sean Christopherson
2020-07-29 20:09           ` peterz
2020-07-29 20:35           ` Fenghua Yu
2020-07-29 20:39             ` Sean Christopherson
2020-07-29 22:07               ` Fenghua Yu
2020-07-29 23:28                 ` Sean Christopherson
2020-07-29  8:49 ` peterz
2020-07-29 20:40   ` Fenghua Yu
2020-07-29 21:09     ` peterz
2020-07-30 10:08       ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e23b04a2adc54a5dbca48271987de822@intel.com \
    --to=fenghua.yu@intel.com \
    --cc=bp@alien8.de \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=ravi.v.shankar@intel.com \
    --cc=sean.j.christopherson@intel.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=vedvyas.shanbhogue@intel.com \
    --cc=x86@kernel.org \
    --cc=xiaoyao.li@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).