All of lore.kernel.org
 help / color / mirror / Atom feed
* RFC: default to spec_store_bypass_disable=prctl spectre_v2_user=prctl
@ 2020-11-04 21:57 ` Andrea Arcangeli
  0 siblings, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2020-11-04 21:57 UTC (permalink / raw)
  To: Kees Cook
  Cc: Tobin Feldman-Fitzthum, Hubertus Franke, Jack Chen,
	Giuseppe Scrivano, YiFei Zhu, Waiman Long, Tianyin Xu, Jann Horn,
	Jiri Kosina, Valentin Rothberg, Josep Torrellas, Thomas Gleixner,
	Will Drewry, Linux Containers, kernel list, Andy Lutomirski,
	Dimitrios Skarlatos, David Laight, bpf

Hello,

[ Given the CC list and your mention of spectre_v2_user=prctl is spot
  on to show the badness... I spawned a new thread to suggest another
  thing related to seccomp that I've been intending to suggest for
  a while ]

On Tue, Nov 03, 2020 at 04:29:38PM -0800, Kees Cook wrote:
> I assume this is from Indirect Branch Prediction Barrier (IBPB) and
> Single Threaded Indirect Branch Prediction (STIBP) (which get enabled
> for threads under seccomp by default).
> 
> Try booting with "spectre_v2_user=prctl"

We need to change the kernel default to
"spec_store_bypass_disable=prctl spectre_v2_user=prctl".

I've been recommending to everyone to use
"spec_store_bypass_disable=prctl spectre_v2_user=prctl" for a while
now. I already recommend to Yifei too a few months ago when he first
found out of the huge seccomp regression when he upgraded his codebase
to the upstream kernel with both STIBP/SSBD enabled in seccomp jails.

Here's below a tentative RFC, the code is actually trivial, if you
could help reviewing/improving the commit header it would be great.

Thanks,
Andrea

From 3f7adb783262dc7f4e71cdbf07b4ef9f6b8d3ed9 Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli <aarcange@redhat.com>
Date: Wed, 4 Nov 2020 15:20:33 -0500
Subject: [PATCH 1/1] x86: change default to spec_store_bypass_disable=prctl
 spectre_v2_user=prctl

Switch the kernel default of SSBD and STIBP to the ones with
CONFIG_SECCOMP=n (i.e. spec_store_bypass_disable=prctl
spectre_v2_user=prctl) even if CONFIG_SECCOMP=y.

Several motivations listed below:

- If SMT is enabled the seccomp jail can still attack the rest of the
  system even with spectre_v2_user=seccomp by using MDS-HT (except on
  XEON PHI where MDS can be tamed with SMT left enabled, but that's a
  special case). Setting STIBP become a very expensive window dressing
  after MDS-HT was discovered.

- The seccomp jail cannot attack the kernel with spectre-v2-HT
  regardless (even if STIBP is not set), but with MDS-HT the seccomp
  jail can attack the kernel too.

- With spec_store_bypass_disable=prctl the seccomp jail can attack the
  other userland (guest or host mode) using spectre-v2-HT, but the
  userland attack is already mitigated by both ASLR and pid namespaces
  for host userland and through virt isolation with libkrun or
  kata. (if something if somebody is worried about spectre-v2-HT it's
  best to mount proc with hidepid=2,gid=proc on workstations where not
  all apps may run under container runtimes, rather than slowing down
  all seccomp jails, but the best is to add pid namespaces to the
  seccomp jail). As opposed MDS-HT is not mitigated and the seccomp
  jail can still attack all other host and guest userland if SMT is
  enabled even with spec_store_bypass_disable=seccomp.

- If full security is required then MDS-HT must also be mitigated with
  nosmt and then spectre_v2_user=prctl and spectre_v2_user=seccomp
  would become identical.

- Setting spectre_v2_user=seccomp is overall lower priority than to
  setting javascript.options.wasm false in about:config to protect
  against remote wasm MDS-HT, instead of worrying about Spectre-v2-HT
  and STIBP which again is already statistically well mitigated by
  other means in userland and it's fully mitigated in kernel with
  retpolines (unlike the wasm assist call with MDS-HT).

- SSBD is needed to prevent reading the JIT memory and the primary
  user being the OpenJDK. However the primary user of SSBD wouldn't be
  covered by spec_store_bypass_disable=seccomp because it doesn't use
  seccomp and the primary user also explicitly declined to set
  PR_SET_SPECULATION_CTRL+PR_SPEC_STORE_BYPASS despite it easily
  could. In fact it would need to set it only when the sandboxing
  mechanism is enabled for javaws applets, but it still declined it by
  declaring security within the same user address space as an
  untenable objective for their JIT, even in the sandboxing case where
  performance would be a lesser concern (for the record: I kind of
  disagree in not setting PR_SPEC_STORE_BYPASS in the sandbox case and
  I prefer to run javaws through a wrapper that sets
  PR_SPEC_STORE_BYPASS if I need). In turn it can be inferred that
  even if the primary user of SSBD would use seccomp, they would
  invoke it with SECCOMP_FILTER_FLAG_SPEC_ALLOW by now.

- runc/crun already set SECCOMP_FILTER_FLAG_SPEC_ALLOW by default, k8s
  and podman have a default json seccomp allowlist that cannot be
  slowed down, so for the #1 seccomp user this change is already a
  noop.

- systemd/sshd or other apps that use seccomp, if they really need
  STIBP or SSBD, they need to explicitly set the
  PR_SET_SPECULATION_CTRL by now. The stibp/ssbd seccomp blind
  catch-all approach was done probably initially with a wishful
  thinking objective to pretend to have a peace of mind that it could
  magically fix it all. That was wishful thinking before MDS-HT was
  discovered, but after MDS-HT has been discovered it become just
  window dressing.

- For qemu "-sandbox" seccomp jail it wouldn't make sense to set STIBP
  or SSBD. SSBD doesn't help with KVM because there's no JIT (if it's
  needed with TCG it should be an opt-in with
  PR_SET_SPECULATION_CTRL+PR_SPEC_STORE_BYPASS and it shouldn't
  slowdown KVM for nothing). For qemu+KVM STIBP would be even more
  window dressing than it is for all other apps, because in the
  qemu+KVM case there's not only the MDS attack to worry about with
  SMT enabled. Even after disabling SMT, there's still a theoretical
  spectre-v2 attack possible within the same thread context from guest
  mode to host ring3 that the host kernel retpoline mitigation has no
  theoretical chance to mitigate. On some kernels a
  ibrs-always/ibrs-retpoline opt-in model is provided that will
  enabled IBRS in the qemu host ring3 userland which fixes this
  theoretical concern. Only after enabling IBRS in the host userland
  it would then make sense to proceed and worry about STIBP and an
  attack on the other host userland, but then again SMT would need to
  be disabled for full security anyway, so that would render STIBP
  again a noop.

- last but not the least: the lack of "spec_store_bypass_disable=prctl
  spectre_v2_user=prctl" means the moment a guest boots and
  sshd/systemd runs, the guest kernel will write to SPEC_CTRL MSR
  which will make the guest vmexit forever slower, forcing KVM to
  issue a very slow rdmsr instruction at every vmexit. So the end
  result is that SPEC_CTRL MSR is only available in GCE. Most other
  public cloud providers don't expose SPEC_CTRL, which means that not
  only STIBP/SSBD isn't available, but IBPB isn't available either
  (which would cause no overhead to the guest or the hypervisor
  because it's write only and requires no reading during vmexit). So
  the current default already net loss in security (missing IBPB)
  which means most public cloud providers cannot achieve a fully
  secure guest with nosmt (and nosmt is enough to fully mitigate
  MDS-HT). It also means GCE and is unfairly penalized in performance
  because it provides the option to enable full security in the guest
  as an opt-in (i.e. nosmt and IBPB). So this change will allow all
  cloud providers to expose SPEC_CTRL without incurring into any
  hypervisor slowdown and at the same time it will remove the unfair
  penalization of GCE performance for doing the right thing and it'll
  allow to get full security with nosmt with IBPB being available (and
  STIBP becoming meaningless).

Example to put things in prospective: the STIBP enabled in seccomp has
never been about protecting apps using seccomp like sshd from an
attack from a malicious userland, but to the contrary it has always
been about protecting the system from an attack from sshd, after a
successful remote network exploit against sshd. In fact initially it
wasn't obvious STIBP would work both ways (STIBP was about preventing
the task that runs with STIBP to be attacked with spectre-v2-HT, but
accidentally in the STIBP case it also prevents the attack in the
other direction). In the hypothetical case that sshd has been remotely
exploited the last concern should be STIBP being set, because it'll be
still possible to obtain info even from the kernel by using MDS if
nosmt wasn't set (and if it was set, STIBP is a noop in the first
place). As opposed kernel cannot leak anything with spectre-v2 HT
because of retpolines and the userland is mitigated by ASLR already
and ideally PID namespaces too. If something it'd be worth checking if
sshd run the seccomp thread under pid namespaces too if available in
the running kernel. SSBD also would be a noop for sshd, since sshd
uses no JIT. If sshd prefers to keep doing the STIBP window dressing
exercise, it still can even after this change of defaults by opting-in
with PR_SPEC_STORE_BYPASS.

Ultimately setting SSBD and STIBP by default for all seccomp jails is
a bad sweet spot and bad default with more cons than pros that end up
reducing security in the public cloud (by giving an huge incentive to
not expose SPEC_CTRL which would be needed to get full security with
IBPB after setting nosmt in the guest) and by excessively hurting
performance to more secure apps using seccomp that end up having to
opt out with SECCOMP_FILTER_FLAG_SPEC_ALLOW.

The following is the verified result of the new default with SMT
enabled:

(gdb) print spectre_v2_user_stibp
$1 = SPECTRE_V2_USER_PRCTL
(gdb) print spectre_v2_user_ibpb
$2 = SPECTRE_V2_USER_PRCTL
(gdb) print ssb_mode
$3 = SPEC_STORE_BYPASS_PRCTL

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 5 ++---
 arch/x86/kernel/cpu/bugs.c                      | 4 ++--
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 526d65d8573a..105401a3582f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4980,8 +4980,7 @@
 			auto    - Kernel selects the mitigation depending on
 				  the available CPU features and vulnerability.
 
-			Default mitigation:
-			If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"
+			Default mitigation: "prctl"
 
 			Not specifying this option is equivalent to
 			spectre_v2_user=auto.
@@ -5025,7 +5024,7 @@
 				  will disable SSB unless they explicitly opt out.
 
 			Default mitigations:
-			X86:	If CONFIG_SECCOMP=y "seccomp", otherwise "prctl"
+			X86:	"prctl"
 
 			On powerpc the options are:
 
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index d3f0db463f96..5ec39397fe9c 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -721,11 +721,11 @@ spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd)
 	case SPECTRE_V2_USER_CMD_FORCE:
 		mode = SPECTRE_V2_USER_STRICT;
 		break;
+	case SPECTRE_V2_USER_CMD_AUTO:
 	case SPECTRE_V2_USER_CMD_PRCTL:
 	case SPECTRE_V2_USER_CMD_PRCTL_IBPB:
 		mode = SPECTRE_V2_USER_PRCTL;
 		break;
-	case SPECTRE_V2_USER_CMD_AUTO:
 	case SPECTRE_V2_USER_CMD_SECCOMP:
 	case SPECTRE_V2_USER_CMD_SECCOMP_IBPB:
 		if (IS_ENABLED(CONFIG_SECCOMP))
@@ -1132,7 +1132,6 @@ static enum ssb_mitigation __init __ssb_select_mitigation(void)
 		return mode;
 
 	switch (cmd) {
-	case SPEC_STORE_BYPASS_CMD_AUTO:
 	case SPEC_STORE_BYPASS_CMD_SECCOMP:
 		/*
 		 * Choose prctl+seccomp as the default mode if seccomp is
@@ -1146,6 +1145,7 @@ static enum ssb_mitigation __init __ssb_select_mitigation(void)
 	case SPEC_STORE_BYPASS_CMD_ON:
 		mode = SPEC_STORE_BYPASS_DISABLE;
 		break;
+	case SPEC_STORE_BYPASS_CMD_AUTO:
 	case SPEC_STORE_BYPASS_CMD_PRCTL:
 		mode = SPEC_STORE_BYPASS_PRCTL;
 		break;

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* RFC: default to spec_store_bypass_disable=prctl spectre_v2_user=prctl
@ 2020-11-04 21:57 ` Andrea Arcangeli
  0 siblings, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2020-11-04 21:57 UTC (permalink / raw)
  To: Kees Cook
  Cc: YiFei Zhu, Linux Containers, YiFei Zhu, bpf, kernel list,
	Aleksa Sarai, Andy Lutomirski, David Laight, Dimitrios Skarlatos,
	Giuseppe Scrivano, Hubertus Franke, Jack Chen, Jann Horn,
	Josep Torrellas, Tianyin Xu, Tobin Feldman-Fitzthum,
	Tycho Andersen, Valentin Rothberg, Will Drewry, Jiri Kosina,
	Thomas Gleixner, Waiman Long

Hello,

[ Given the CC list and your mention of spectre_v2_user=prctl is spot
  on to show the badness... I spawned a new thread to suggest another
  thing related to seccomp that I've been intending to suggest for
  a while ]

On Tue, Nov 03, 2020 at 04:29:38PM -0800, Kees Cook wrote:
> I assume this is from Indirect Branch Prediction Barrier (IBPB) and
> Single Threaded Indirect Branch Prediction (STIBP) (which get enabled
> for threads under seccomp by default).
> 
> Try booting with "spectre_v2_user=prctl"

We need to change the kernel default to
"spec_store_bypass_disable=prctl spectre_v2_user=prctl".

I've been recommending to everyone to use
"spec_store_bypass_disable=prctl spectre_v2_user=prctl" for a while
now. I already recommend to Yifei too a few months ago when he first
found out of the huge seccomp regression when he upgraded his codebase
to the upstream kernel with both STIBP/SSBD enabled in seccomp jails.

Here's below a tentative RFC, the code is actually trivial, if you
could help reviewing/improving the commit header it would be great.

Thanks,
Andrea

From 3f7adb783262dc7f4e71cdbf07b4ef9f6b8d3ed9 Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli <aarcange@redhat.com>
Date: Wed, 4 Nov 2020 15:20:33 -0500
Subject: [PATCH 1/1] x86: change default to spec_store_bypass_disable=prctl
 spectre_v2_user=prctl

Switch the kernel default of SSBD and STIBP to the ones with
CONFIG_SECCOMP=n (i.e. spec_store_bypass_disable=prctl
spectre_v2_user=prctl) even if CONFIG_SECCOMP=y.

Several motivations listed below:

- If SMT is enabled the seccomp jail can still attack the rest of the
  system even with spectre_v2_user=seccomp by using MDS-HT (except on
  XEON PHI where MDS can be tamed with SMT left enabled, but that's a
  special case). Setting STIBP become a very expensive window dressing
  after MDS-HT was discovered.

- The seccomp jail cannot attack the kernel with spectre-v2-HT
  regardless (even if STIBP is not set), but with MDS-HT the seccomp
  jail can attack the kernel too.

- With spec_store_bypass_disable=prctl the seccomp jail can attack the
  other userland (guest or host mode) using spectre-v2-HT, but the
  userland attack is already mitigated by both ASLR and pid namespaces
  for host userland and through virt isolation with libkrun or
  kata. (if something if somebody is worried about spectre-v2-HT it's
  best to mount proc with hidepid=2,gid=proc on workstations where not
  all apps may run under container runtimes, rather than slowing down
  all seccomp jails, but the best is to add pid namespaces to the
  seccomp jail). As opposed MDS-HT is not mitigated and the seccomp
  jail can still attack all other host and guest userland if SMT is
  enabled even with spec_store_bypass_disable=seccomp.

- If full security is required then MDS-HT must also be mitigated with
  nosmt and then spectre_v2_user=prctl and spectre_v2_user=seccomp
  would become identical.

- Setting spectre_v2_user=seccomp is overall lower priority than to
  setting javascript.options.wasm false in about:config to protect
  against remote wasm MDS-HT, instead of worrying about Spectre-v2-HT
  and STIBP which again is already statistically well mitigated by
  other means in userland and it's fully mitigated in kernel with
  retpolines (unlike the wasm assist call with MDS-HT).

- SSBD is needed to prevent reading the JIT memory and the primary
  user being the OpenJDK. However the primary user of SSBD wouldn't be
  covered by spec_store_bypass_disable=seccomp because it doesn't use
  seccomp and the primary user also explicitly declined to set
  PR_SET_SPECULATION_CTRL+PR_SPEC_STORE_BYPASS despite it easily
  could. In fact it would need to set it only when the sandboxing
  mechanism is enabled for javaws applets, but it still declined it by
  declaring security within the same user address space as an
  untenable objective for their JIT, even in the sandboxing case where
  performance would be a lesser concern (for the record: I kind of
  disagree in not setting PR_SPEC_STORE_BYPASS in the sandbox case and
  I prefer to run javaws through a wrapper that sets
  PR_SPEC_STORE_BYPASS if I need). In turn it can be inferred that
  even if the primary user of SSBD would use seccomp, they would
  invoke it with SECCOMP_FILTER_FLAG_SPEC_ALLOW by now.

- runc/crun already set SECCOMP_FILTER_FLAG_SPEC_ALLOW by default, k8s
  and podman have a default json seccomp allowlist that cannot be
  slowed down, so for the #1 seccomp user this change is already a
  noop.

- systemd/sshd or other apps that use seccomp, if they really need
  STIBP or SSBD, they need to explicitly set the
  PR_SET_SPECULATION_CTRL by now. The stibp/ssbd seccomp blind
  catch-all approach was done probably initially with a wishful
  thinking objective to pretend to have a peace of mind that it could
  magically fix it all. That was wishful thinking before MDS-HT was
  discovered, but after MDS-HT has been discovered it become just
  window dressing.

- For qemu "-sandbox" seccomp jail it wouldn't make sense to set STIBP
  or SSBD. SSBD doesn't help with KVM because there's no JIT (if it's
  needed with TCG it should be an opt-in with
  PR_SET_SPECULATION_CTRL+PR_SPEC_STORE_BYPASS and it shouldn't
  slowdown KVM for nothing). For qemu+KVM STIBP would be even more
  window dressing than it is for all other apps, because in the
  qemu+KVM case there's not only the MDS attack to worry about with
  SMT enabled. Even after disabling SMT, there's still a theoretical
  spectre-v2 attack possible within the same thread context from guest
  mode to host ring3 that the host kernel retpoline mitigation has no
  theoretical chance to mitigate. On some kernels a
  ibrs-always/ibrs-retpoline opt-in model is provided that will
  enabled IBRS in the qemu host ring3 userland which fixes this
  theoretical concern. Only after enabling IBRS in the host userland
  it would then make sense to proceed and worry about STIBP and an
  attack on the other host userland, but then again SMT would need to
  be disabled for full security anyway, so that would render STIBP
  again a noop.

- last but not the least: the lack of "spec_store_bypass_disable=prctl
  spectre_v2_user=prctl" means the moment a guest boots and
  sshd/systemd runs, the guest kernel will write to SPEC_CTRL MSR
  which will make the guest vmexit forever slower, forcing KVM to
  issue a very slow rdmsr instruction at every vmexit. So the end
  result is that SPEC_CTRL MSR is only available in GCE. Most other
  public cloud providers don't expose SPEC_CTRL, which means that not
  only STIBP/SSBD isn't available, but IBPB isn't available either
  (which would cause no overhead to the guest or the hypervisor
  because it's write only and requires no reading during vmexit). So
  the current default already net loss in security (missing IBPB)
  which means most public cloud providers cannot achieve a fully
  secure guest with nosmt (and nosmt is enough to fully mitigate
  MDS-HT). It also means GCE and is unfairly penalized in performance
  because it provides the option to enable full security in the guest
  as an opt-in (i.e. nosmt and IBPB). So this change will allow all
  cloud providers to expose SPEC_CTRL without incurring into any
  hypervisor slowdown and at the same time it will remove the unfair
  penalization of GCE performance for doing the right thing and it'll
  allow to get full security with nosmt with IBPB being available (and
  STIBP becoming meaningless).

Example to put things in prospective: the STIBP enabled in seccomp has
never been about protecting apps using seccomp like sshd from an
attack from a malicious userland, but to the contrary it has always
been about protecting the system from an attack from sshd, after a
successful remote network exploit against sshd. In fact initially it
wasn't obvious STIBP would work both ways (STIBP was about preventing
the task that runs with STIBP to be attacked with spectre-v2-HT, but
accidentally in the STIBP case it also prevents the attack in the
other direction). In the hypothetical case that sshd has been remotely
exploited the last concern should be STIBP being set, because it'll be
still possible to obtain info even from the kernel by using MDS if
nosmt wasn't set (and if it was set, STIBP is a noop in the first
place). As opposed kernel cannot leak anything with spectre-v2 HT
because of retpolines and the userland is mitigated by ASLR already
and ideally PID namespaces too. If something it'd be worth checking if
sshd run the seccomp thread under pid namespaces too if available in
the running kernel. SSBD also would be a noop for sshd, since sshd
uses no JIT. If sshd prefers to keep doing the STIBP window dressing
exercise, it still can even after this change of defaults by opting-in
with PR_SPEC_STORE_BYPASS.

Ultimately setting SSBD and STIBP by default for all seccomp jails is
a bad sweet spot and bad default with more cons than pros that end up
reducing security in the public cloud (by giving an huge incentive to
not expose SPEC_CTRL which would be needed to get full security with
IBPB after setting nosmt in the guest) and by excessively hurting
performance to more secure apps using seccomp that end up having to
opt out with SECCOMP_FILTER_FLAG_SPEC_ALLOW.

The following is the verified result of the new default with SMT
enabled:

(gdb) print spectre_v2_user_stibp
$1 = SPECTRE_V2_USER_PRCTL
(gdb) print spectre_v2_user_ibpb
$2 = SPECTRE_V2_USER_PRCTL
(gdb) print ssb_mode
$3 = SPEC_STORE_BYPASS_PRCTL

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 5 ++---
 arch/x86/kernel/cpu/bugs.c                      | 4 ++--
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 526d65d8573a..105401a3582f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4980,8 +4980,7 @@
 			auto    - Kernel selects the mitigation depending on
 				  the available CPU features and vulnerability.
 
-			Default mitigation:
-			If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"
+			Default mitigation: "prctl"
 
 			Not specifying this option is equivalent to
 			spectre_v2_user=auto.
@@ -5025,7 +5024,7 @@
 				  will disable SSB unless they explicitly opt out.
 
 			Default mitigations:
-			X86:	If CONFIG_SECCOMP=y "seccomp", otherwise "prctl"
+			X86:	"prctl"
 
 			On powerpc the options are:
 
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index d3f0db463f96..5ec39397fe9c 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -721,11 +721,11 @@ spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd)
 	case SPECTRE_V2_USER_CMD_FORCE:
 		mode = SPECTRE_V2_USER_STRICT;
 		break;
+	case SPECTRE_V2_USER_CMD_AUTO:
 	case SPECTRE_V2_USER_CMD_PRCTL:
 	case SPECTRE_V2_USER_CMD_PRCTL_IBPB:
 		mode = SPECTRE_V2_USER_PRCTL;
 		break;
-	case SPECTRE_V2_USER_CMD_AUTO:
 	case SPECTRE_V2_USER_CMD_SECCOMP:
 	case SPECTRE_V2_USER_CMD_SECCOMP_IBPB:
 		if (IS_ENABLED(CONFIG_SECCOMP))
@@ -1132,7 +1132,6 @@ static enum ssb_mitigation __init __ssb_select_mitigation(void)
 		return mode;
 
 	switch (cmd) {
-	case SPEC_STORE_BYPASS_CMD_AUTO:
 	case SPEC_STORE_BYPASS_CMD_SECCOMP:
 		/*
 		 * Choose prctl+seccomp as the default mode if seccomp is
@@ -1146,6 +1145,7 @@ static enum ssb_mitigation __init __ssb_select_mitigation(void)
 	case SPEC_STORE_BYPASS_CMD_ON:
 		mode = SPEC_STORE_BYPASS_DISABLE;
 		break;
+	case SPEC_STORE_BYPASS_CMD_AUTO:
 	case SPEC_STORE_BYPASS_CMD_PRCTL:
 		mode = SPEC_STORE_BYPASS_PRCTL;
 		break;


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: RFC: default to spec_store_bypass_disable=prctl spectre_v2_user=prctl
  2020-11-04 21:57 ` Andrea Arcangeli
@ 2020-11-04 22:14   ` Kees Cook
  -1 siblings, 0 replies; 21+ messages in thread
From: Kees Cook @ 2020-11-04 22:14 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Tobin Feldman-Fitzthum, Hubertus Franke, Jack Chen,
	Giuseppe Scrivano, YiFei Zhu, Waiman Long, Tianyin Xu, Jann Horn,
	Jiri Kosina, Valentin Rothberg, Josep Torrellas, Thomas Gleixner,
	Will Drewry, Linux Containers, kernel list, Andy Lutomirski,
	Dimitrios Skarlatos, David Laight, bpf

On Wed, Nov 04, 2020 at 04:57:02PM -0500, Andrea Arcangeli wrote:
> Switch the kernel default of SSBD and STIBP to the ones with
> CONFIG_SECCOMP=n (i.e. spec_store_bypass_disable=prctl
> spectre_v2_user=prctl) even if CONFIG_SECCOMP=y.

Agreed. I think this is the right time to flip this switch. I agree with
the (very well described) rationales. :)

Fundamentally, likely everyone who is interested in manipulating the
mitigations are doing so now, and it doesn't make sense (on many fronts)
to tie some to seccomp mode any more (which was intended as a temporary
defense to gain coverage while sysadmins absorbed what the best
practices should be).

Thanks for sending this!

Acked-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: RFC: default to spec_store_bypass_disable=prctl spectre_v2_user=prctl
@ 2020-11-04 22:14   ` Kees Cook
  0 siblings, 0 replies; 21+ messages in thread
From: Kees Cook @ 2020-11-04 22:14 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: YiFei Zhu, Linux Containers, YiFei Zhu, bpf, kernel list,
	Aleksa Sarai, Andy Lutomirski, David Laight, Dimitrios Skarlatos,
	Giuseppe Scrivano, Hubertus Franke, Jack Chen, Jann Horn,
	Josep Torrellas, Tianyin Xu, Tobin Feldman-Fitzthum,
	Tycho Andersen, Valentin Rothberg, Will Drewry, Jiri Kosina,
	Thomas Gleixner, Waiman Long

On Wed, Nov 04, 2020 at 04:57:02PM -0500, Andrea Arcangeli wrote:
> Switch the kernel default of SSBD and STIBP to the ones with
> CONFIG_SECCOMP=n (i.e. spec_store_bypass_disable=prctl
> spectre_v2_user=prctl) even if CONFIG_SECCOMP=y.

Agreed. I think this is the right time to flip this switch. I agree with
the (very well described) rationales. :)

Fundamentally, likely everyone who is interested in manipulating the
mitigations are doing so now, and it doesn't make sense (on many fronts)
to tie some to seccomp mode any more (which was intended as a temporary
defense to gain coverage while sysadmins absorbed what the best
practices should be).

Thanks for sending this!

Acked-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: RFC: default to spec_store_bypass_disable=prctl spectre_v2_user=prctl
  2020-11-04 21:57 ` Andrea Arcangeli
@ 2020-11-04 23:22   ` Thomas Gleixner
  -1 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2020-11-04 23:22 UTC (permalink / raw)
  To: Andrea Arcangeli, Kees Cook
  Cc: Waiman Long, Giuseppe Scrivano, Valentin Rothberg, Jann Horn,
	YiFei Zhu, Linux Containers, Jiri Kosina, Tobin Feldman-Fitzthum,
	kernel list, Andy Lutomirski, Hubertus Franke, David Laight,
	Jack Chen, Dimitrios Skarlatos, Josep Torrellas, Will Drewry,
	bpf, Tianyin Xu

On Wed, Nov 04 2020 at 16:57, Andrea Arcangeli wrote:
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 5 ++---

Is Documentation/admin-guide/hw-vuln/* still correct? If not, please
fix that as well.

Aside of that please send patches in the proper format so they do not
need manual interaction when picking them up.

Thanks,

        tglx
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: RFC: default to spec_store_bypass_disable=prctl spectre_v2_user=prctl
@ 2020-11-04 23:22   ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2020-11-04 23:22 UTC (permalink / raw)
  To: Andrea Arcangeli, Kees Cook
  Cc: YiFei Zhu, Linux Containers, YiFei Zhu, bpf, kernel list,
	Aleksa Sarai, Andy Lutomirski, David Laight, Dimitrios Skarlatos,
	Giuseppe Scrivano, Hubertus Franke, Jack Chen, Jann Horn,
	Josep Torrellas, Tianyin Xu, Tobin Feldman-Fitzthum,
	Tycho Andersen, Valentin Rothberg, Will Drewry, Jiri Kosina,
	Waiman Long

On Wed, Nov 04 2020 at 16:57, Andrea Arcangeli wrote:
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 5 ++---

Is Documentation/admin-guide/hw-vuln/* still correct? If not, please
fix that as well.

Aside of that please send patches in the proper format so they do not
need manual interaction when picking them up.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: RFC: default to spec_store_bypass_disable=prctl spectre_v2_user=prctl
  2020-11-04 23:22   ` Thomas Gleixner
@ 2020-11-04 23:40     ` Andrea Arcangeli
  -1 siblings, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2020-11-04 23:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Tobin Feldman-Fitzthum, Hubertus Franke, Jack Chen,
	Giuseppe Scrivano, YiFei Zhu, Waiman Long, Tianyin Xu, Kees Cook,
	Jann Horn, Jiri Kosina, Valentin Rothberg, Josep Torrellas,
	Will Drewry, Linux Containers, kernel list, Andy Lutomirski,
	Dimitrios Skarlatos, David Laight, bpf

On Thu, Nov 05, 2020 at 12:22:29AM +0100, Thomas Gleixner wrote:
> On Wed, Nov 04 2020 at 16:57, Andrea Arcangeli wrote:
> > ---
> >  Documentation/admin-guide/kernel-parameters.txt | 5 ++---
> 
> Is Documentation/admin-guide/hw-vuln/* still correct? If not, please
> fix that as well.

Right, I missed two seccomp mention that needed removing there too.

Also I noticed below I intended PR_SPEC_INDIRECT_BRANCH
(PR_SPEC_STORE_BYPASS there is no point to even mention it as a
possibility to be considered), so I corrected it.

==
uses no JIT. If sshd prefers to keep doing the STIBP window dressing
exercise, it still can even after this change of defaults by opting-in
with PR_SPEC_STORE_BYPASS.
==

> > >with PR_SPEC_INDIRECT_BRANCH.

> Aside of that please send patches in the proper format so they do not
> need manual interaction when picking them up.

This was a RFC per subject since I expected it wouldn't be final, but
I added Kees' Acked-by and I'll submit it now.

Thanks,
Andrea

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: RFC: default to spec_store_bypass_disable=prctl spectre_v2_user=prctl
@ 2020-11-04 23:40     ` Andrea Arcangeli
  0 siblings, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2020-11-04 23:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kees Cook, YiFei Zhu, Linux Containers, YiFei Zhu, bpf,
	kernel list, Aleksa Sarai, Andy Lutomirski, David Laight,
	Dimitrios Skarlatos, Giuseppe Scrivano, Hubertus Franke,
	Jack Chen, Jann Horn, Josep Torrellas, Tianyin Xu,
	Tobin Feldman-Fitzthum, Tycho Andersen, Valentin Rothberg,
	Will Drewry, Jiri Kosina, Waiman Long

On Thu, Nov 05, 2020 at 12:22:29AM +0100, Thomas Gleixner wrote:
> On Wed, Nov 04 2020 at 16:57, Andrea Arcangeli wrote:
> > ---
> >  Documentation/admin-guide/kernel-parameters.txt | 5 ++---
> 
> Is Documentation/admin-guide/hw-vuln/* still correct? If not, please
> fix that as well.

Right, I missed two seccomp mention that needed removing there too.

Also I noticed below I intended PR_SPEC_INDIRECT_BRANCH
(PR_SPEC_STORE_BYPASS there is no point to even mention it as a
possibility to be considered), so I corrected it.

==
uses no JIT. If sshd prefers to keep doing the STIBP window dressing
exercise, it still can even after this change of defaults by opting-in
with PR_SPEC_STORE_BYPASS.
==

> > >with PR_SPEC_INDIRECT_BRANCH.

> Aside of that please send patches in the proper format so they do not
> need manual interaction when picking them up.

This was a RFC per subject since I expected it wouldn't be final, but
I added Kees' Acked-by and I'll submit it now.

Thanks,
Andrea


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/1] x86: change default to spec_store_bypass_disable=prctl spectre_v2_user=prctl
  2020-11-04 23:22   ` Thomas Gleixner
@ 2020-11-04 23:50     ` Andrea Arcangeli
  -1 siblings, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2020-11-04 23:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Josep Torrellas, Tobin Feldman-Fitzthum, Hubertus Franke,
	Jack Chen, Giuseppe Scrivano, Andi Kleen, YiFei Zhu, Waiman Long,
	Tianyin Xu, Jann Horn, Jiri Kosina, Valentin Rothberg,
	Josh Poimboeuf, Will Drewry, Linux Containers, kernel list,
	Andy Lutomirski, Dimitrios Skarlatos, David Laight, bpf

Switch the kernel default of SSBD and STIBP to the ones with
CONFIG_SECCOMP=n (i.e. spec_store_bypass_disable=prctl
spectre_v2_user=prctl) even if CONFIG_SECCOMP=y.

Several motivations listed below:

- If SMT is enabled the seccomp jail can still attack the rest of the
  system even with spectre_v2_user=seccomp by using MDS-HT (except on
  XEON PHI where MDS can be tamed with SMT left enabled, but that's a
  special case). Setting STIBP become a very expensive window dressing
  after MDS-HT was discovered.

- The seccomp jail cannot attack the kernel with spectre-v2-HT
  regardless (even if STIBP is not set), but with MDS-HT the seccomp
  jail can attack the kernel too.

- With spec_store_bypass_disable=prctl the seccomp jail can attack the
  other userland (guest or host mode) using spectre-v2-HT, but the
  userland attack is already mitigated by both ASLR and pid namespaces
  for host userland and through virt isolation with libkrun or
  kata. (if something if somebody is worried about spectre-v2-HT it's
  best to mount proc with hidepid=2,gid=proc on workstations where not
  all apps may run under container runtimes, rather than slowing down
  all seccomp jails, but the best is to add pid namespaces to the
  seccomp jail). As opposed MDS-HT is not mitigated and the seccomp
  jail can still attack all other host and guest userland if SMT is
  enabled even with spec_store_bypass_disable=seccomp.

- If full security is required then MDS-HT must also be mitigated with
  nosmt and then spectre_v2_user=prctl and spectre_v2_user=seccomp
  would become identical.

- Setting spectre_v2_user=seccomp is overall lower priority than to
  setting javascript.options.wasm false in about:config to protect
  against remote wasm MDS-HT, instead of worrying about Spectre-v2-HT
  and STIBP which again is already statistically well mitigated by
  other means in userland and it's fully mitigated in kernel with
  retpolines (unlike the wasm assist call with MDS-HT).

- SSBD is needed to prevent reading the JIT memory and the primary
  user being the OpenJDK. However the primary user of SSBD wouldn't be
  covered by spec_store_bypass_disable=seccomp because it doesn't use
  seccomp and the primary user also explicitly declined to set
  PR_SET_SPECULATION_CTRL+PR_SPEC_STORE_BYPASS despite it easily
  could. In fact it would need to set it only when the sandboxing
  mechanism is enabled for javaws applets, but it still declined it by
  declaring security within the same user address space as an
  untenable objective for their JIT, even in the sandboxing case where
  performance would be a lesser concern (for the record: I kind of
  disagree in not setting PR_SPEC_STORE_BYPASS in the sandbox case and
  I prefer to run javaws through a wrapper that sets
  PR_SPEC_STORE_BYPASS if I need). In turn it can be inferred that
  even if the primary user of SSBD would use seccomp, they would
  invoke it with SECCOMP_FILTER_FLAG_SPEC_ALLOW by now.

- runc/crun already set SECCOMP_FILTER_FLAG_SPEC_ALLOW by default, k8s
  and podman have a default json seccomp allowlist that cannot be
  slowed down, so for the #1 seccomp user this change is already a
  noop.

- systemd/sshd or other apps that use seccomp, if they really need
  STIBP or SSBD, they need to explicitly set the
  PR_SET_SPECULATION_CTRL by now. The stibp/ssbd seccomp blind
  catch-all approach was done probably initially with a wishful
  thinking objective to pretend to have a peace of mind that it could
  magically fix it all. That was wishful thinking before MDS-HT was
  discovered, but after MDS-HT has been discovered it become just
  window dressing.

- For qemu "-sandbox" seccomp jail it wouldn't make sense to set STIBP
  or SSBD. SSBD doesn't help with KVM because there's no JIT (if it's
  needed with TCG it should be an opt-in with
  PR_SET_SPECULATION_CTRL+PR_SPEC_STORE_BYPASS and it shouldn't
  slowdown KVM for nothing). For qemu+KVM STIBP would be even more
  window dressing than it is for all other apps, because in the
  qemu+KVM case there's not only the MDS attack to worry about with
  SMT enabled. Even after disabling SMT, there's still a theoretical
  spectre-v2 attack possible within the same thread context from guest
  mode to host ring3 that the host kernel retpoline mitigation has no
  theoretical chance to mitigate. On some kernels a
  ibrs-always/ibrs-retpoline opt-in model is provided that will
  enabled IBRS in the qemu host ring3 userland which fixes this
  theoretical concern. Only after enabling IBRS in the host userland
  it would then make sense to proceed and worry about STIBP and an
  attack on the other host userland, but then again SMT would need to
  be disabled for full security anyway, so that would render STIBP
  again a noop.

- last but not the least: the lack of "spec_store_bypass_disable=prctl
  spectre_v2_user=prctl" means the moment a guest boots and
  sshd/systemd runs, the guest kernel will write to SPEC_CTRL MSR
  which will make the guest vmexit forever slower, forcing KVM to
  issue a very slow rdmsr instruction at every vmexit. So the end
  result is that SPEC_CTRL MSR is only available in GCE. Most other
  public cloud providers don't expose SPEC_CTRL, which means that not
  only STIBP/SSBD isn't available, but IBPB isn't available either
  (which would cause no overhead to the guest or the hypervisor
  because it's write only and requires no reading during vmexit). So
  the current default already net loss in security (missing IBPB)
  which means most public cloud providers cannot achieve a fully
  secure guest with nosmt (and nosmt is enough to fully mitigate
  MDS-HT). It also means GCE and is unfairly penalized in performance
  because it provides the option to enable full security in the guest
  as an opt-in (i.e. nosmt and IBPB). So this change will allow all
  cloud providers to expose SPEC_CTRL without incurring into any
  hypervisor slowdown and at the same time it will remove the unfair
  penalization of GCE performance for doing the right thing and it'll
  allow to get full security with nosmt with IBPB being available (and
  STIBP becoming meaningless).

Example to put things in prospective: the STIBP enabled in seccomp has
never been about protecting apps using seccomp like sshd from an
attack from a malicious userland, but to the contrary it has always
been about protecting the system from an attack from sshd, after a
successful remote network exploit against sshd. In fact initially it
wasn't obvious STIBP would work both ways (STIBP was about preventing
the task that runs with STIBP to be attacked with spectre-v2-HT, but
accidentally in the STIBP case it also prevents the attack in the
other direction). In the hypothetical case that sshd has been remotely
exploited the last concern should be STIBP being set, because it'll be
still possible to obtain info even from the kernel by using MDS if
nosmt wasn't set (and if it was set, STIBP is a noop in the first
place). As opposed kernel cannot leak anything with spectre-v2 HT
because of retpolines and the userland is mitigated by ASLR already
and ideally PID namespaces too. If something it'd be worth checking if
sshd run the seccomp thread under pid namespaces too if available in
the running kernel. SSBD also would be a noop for sshd, since sshd
uses no JIT. If sshd prefers to keep doing the STIBP window dressing
exercise, it still can even after this change of defaults by opting-in
with PR_SPEC_INDIRECT_BRANCH.

Ultimately setting SSBD and STIBP by default for all seccomp jails is
a bad sweet spot and bad default with more cons than pros that end up
reducing security in the public cloud (by giving an huge incentive to
not expose SPEC_CTRL which would be needed to get full security with
IBPB after setting nosmt in the guest) and by excessively hurting
performance to more secure apps using seccomp that end up having to
opt out with SECCOMP_FILTER_FLAG_SPEC_ALLOW.

The following is the verified result of the new default with SMT
enabled:

(gdb) print spectre_v2_user_stibp
$1 = SPECTRE_V2_USER_PRCTL
(gdb) print spectre_v2_user_ibpb
$2 = SPECTRE_V2_USER_PRCTL
(gdb) print ssb_mode
$3 = SPEC_STORE_BYPASS_PRCTL

Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 Documentation/admin-guide/hw-vuln/spectre.rst   | 10 ++++------
 Documentation/admin-guide/kernel-parameters.txt |  5 ++---
 arch/x86/kernel/cpu/bugs.c                      |  4 ++--
 3 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index e05e581af5cf..19b897cb1d45 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -490,9 +490,8 @@ Spectre variant 2
 
    Restricting indirect branch speculation on a user program will
    also prevent the program from launching a variant 2 attack
-   on x86.  All sand-boxed SECCOMP programs have indirect branch
-   speculation restricted by default.  Administrators can change
-   that behavior via the kernel command line and sysfs control files.
+   on x86.  Administrators can change that behavior via the kernel
+   command line and sysfs control files.
    See :ref:`spectre_mitigation_control_command_line`.
 
    Programs that disable their indirect branch speculation will have
@@ -674,9 +673,8 @@ Mitigation selection guide
    off by disabling their indirect branch speculation when they are run
    (See :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).
    This prevents untrusted programs from polluting the branch target
-   buffer.  All programs running in SECCOMP sandboxes have indirect
-   branch speculation restricted by default. This behavior can be
-   changed via the kernel command line and sysfs control files. See
+   buffer.  This behavior can be changed via the kernel command line
+   and sysfs control files. See
    :ref:`spectre_mitigation_control_command_line`.
 
 3. High security mode
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 526d65d8573a..105401a3582f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4980,8 +4980,7 @@
 			auto    - Kernel selects the mitigation depending on
 				  the available CPU features and vulnerability.
 
-			Default mitigation:
-			If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"
+			Default mitigation: "prctl"
 
 			Not specifying this option is equivalent to
 			spectre_v2_user=auto.
@@ -5025,7 +5024,7 @@
 				  will disable SSB unless they explicitly opt out.
 
 			Default mitigations:
-			X86:	If CONFIG_SECCOMP=y "seccomp", otherwise "prctl"
+			X86:	"prctl"
 
 			On powerpc the options are:
 
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index d3f0db463f96..5ec39397fe9c 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -721,11 +721,11 @@ spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd)
 	case SPECTRE_V2_USER_CMD_FORCE:
 		mode = SPECTRE_V2_USER_STRICT;
 		break;
+	case SPECTRE_V2_USER_CMD_AUTO:
 	case SPECTRE_V2_USER_CMD_PRCTL:
 	case SPECTRE_V2_USER_CMD_PRCTL_IBPB:
 		mode = SPECTRE_V2_USER_PRCTL;
 		break;
-	case SPECTRE_V2_USER_CMD_AUTO:
 	case SPECTRE_V2_USER_CMD_SECCOMP:
 	case SPECTRE_V2_USER_CMD_SECCOMP_IBPB:
 		if (IS_ENABLED(CONFIG_SECCOMP))
@@ -1132,7 +1132,6 @@ static enum ssb_mitigation __init __ssb_select_mitigation(void)
 		return mode;
 
 	switch (cmd) {
-	case SPEC_STORE_BYPASS_CMD_AUTO:
 	case SPEC_STORE_BYPASS_CMD_SECCOMP:
 		/*
 		 * Choose prctl+seccomp as the default mode if seccomp is
@@ -1146,6 +1145,7 @@ static enum ssb_mitigation __init __ssb_select_mitigation(void)
 	case SPEC_STORE_BYPASS_CMD_ON:
 		mode = SPEC_STORE_BYPASS_DISABLE;
 		break;
+	case SPEC_STORE_BYPASS_CMD_AUTO:
 	case SPEC_STORE_BYPASS_CMD_PRCTL:
 		mode = SPEC_STORE_BYPASS_PRCTL;
 		break;

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 1/1] x86: change default to spec_store_bypass_disable=prctl spectre_v2_user=prctl
@ 2020-11-04 23:50     ` Andrea Arcangeli
  0 siblings, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2020-11-04 23:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: YiFei Zhu, Linux Containers, YiFei Zhu, bpf, kernel list,
	Aleksa Sarai, Andy Lutomirski, David Laight, Dimitrios Skarlatos,
	Giuseppe Scrivano, Hubertus Franke, Jack Chen, Jann Horn,
	Josep Torrellas, Tianyin Xu, Tobin Feldman-Fitzthum,
	Tycho Andersen, Valentin Rothberg, Will Drewry, Jiri Kosina,
	Waiman Long, Josh Poimboeuf, Andi Kleen

Switch the kernel default of SSBD and STIBP to the ones with
CONFIG_SECCOMP=n (i.e. spec_store_bypass_disable=prctl
spectre_v2_user=prctl) even if CONFIG_SECCOMP=y.

Several motivations listed below:

- If SMT is enabled the seccomp jail can still attack the rest of the
  system even with spectre_v2_user=seccomp by using MDS-HT (except on
  XEON PHI where MDS can be tamed with SMT left enabled, but that's a
  special case). Setting STIBP become a very expensive window dressing
  after MDS-HT was discovered.

- The seccomp jail cannot attack the kernel with spectre-v2-HT
  regardless (even if STIBP is not set), but with MDS-HT the seccomp
  jail can attack the kernel too.

- With spec_store_bypass_disable=prctl the seccomp jail can attack the
  other userland (guest or host mode) using spectre-v2-HT, but the
  userland attack is already mitigated by both ASLR and pid namespaces
  for host userland and through virt isolation with libkrun or
  kata. (if something if somebody is worried about spectre-v2-HT it's
  best to mount proc with hidepid=2,gid=proc on workstations where not
  all apps may run under container runtimes, rather than slowing down
  all seccomp jails, but the best is to add pid namespaces to the
  seccomp jail). As opposed MDS-HT is not mitigated and the seccomp
  jail can still attack all other host and guest userland if SMT is
  enabled even with spec_store_bypass_disable=seccomp.

- If full security is required then MDS-HT must also be mitigated with
  nosmt and then spectre_v2_user=prctl and spectre_v2_user=seccomp
  would become identical.

- Setting spectre_v2_user=seccomp is overall lower priority than to
  setting javascript.options.wasm false in about:config to protect
  against remote wasm MDS-HT, instead of worrying about Spectre-v2-HT
  and STIBP which again is already statistically well mitigated by
  other means in userland and it's fully mitigated in kernel with
  retpolines (unlike the wasm assist call with MDS-HT).

- SSBD is needed to prevent reading the JIT memory and the primary
  user being the OpenJDK. However the primary user of SSBD wouldn't be
  covered by spec_store_bypass_disable=seccomp because it doesn't use
  seccomp and the primary user also explicitly declined to set
  PR_SET_SPECULATION_CTRL+PR_SPEC_STORE_BYPASS despite it easily
  could. In fact it would need to set it only when the sandboxing
  mechanism is enabled for javaws applets, but it still declined it by
  declaring security within the same user address space as an
  untenable objective for their JIT, even in the sandboxing case where
  performance would be a lesser concern (for the record: I kind of
  disagree in not setting PR_SPEC_STORE_BYPASS in the sandbox case and
  I prefer to run javaws through a wrapper that sets
  PR_SPEC_STORE_BYPASS if I need). In turn it can be inferred that
  even if the primary user of SSBD would use seccomp, they would
  invoke it with SECCOMP_FILTER_FLAG_SPEC_ALLOW by now.

- runc/crun already set SECCOMP_FILTER_FLAG_SPEC_ALLOW by default, k8s
  and podman have a default json seccomp allowlist that cannot be
  slowed down, so for the #1 seccomp user this change is already a
  noop.

- systemd/sshd or other apps that use seccomp, if they really need
  STIBP or SSBD, they need to explicitly set the
  PR_SET_SPECULATION_CTRL by now. The stibp/ssbd seccomp blind
  catch-all approach was done probably initially with a wishful
  thinking objective to pretend to have a peace of mind that it could
  magically fix it all. That was wishful thinking before MDS-HT was
  discovered, but after MDS-HT has been discovered it become just
  window dressing.

- For qemu "-sandbox" seccomp jail it wouldn't make sense to set STIBP
  or SSBD. SSBD doesn't help with KVM because there's no JIT (if it's
  needed with TCG it should be an opt-in with
  PR_SET_SPECULATION_CTRL+PR_SPEC_STORE_BYPASS and it shouldn't
  slowdown KVM for nothing). For qemu+KVM STIBP would be even more
  window dressing than it is for all other apps, because in the
  qemu+KVM case there's not only the MDS attack to worry about with
  SMT enabled. Even after disabling SMT, there's still a theoretical
  spectre-v2 attack possible within the same thread context from guest
  mode to host ring3 that the host kernel retpoline mitigation has no
  theoretical chance to mitigate. On some kernels a
  ibrs-always/ibrs-retpoline opt-in model is provided that will
  enabled IBRS in the qemu host ring3 userland which fixes this
  theoretical concern. Only after enabling IBRS in the host userland
  it would then make sense to proceed and worry about STIBP and an
  attack on the other host userland, but then again SMT would need to
  be disabled for full security anyway, so that would render STIBP
  again a noop.

- last but not the least: the lack of "spec_store_bypass_disable=prctl
  spectre_v2_user=prctl" means the moment a guest boots and
  sshd/systemd runs, the guest kernel will write to SPEC_CTRL MSR
  which will make the guest vmexit forever slower, forcing KVM to
  issue a very slow rdmsr instruction at every vmexit. So the end
  result is that SPEC_CTRL MSR is only available in GCE. Most other
  public cloud providers don't expose SPEC_CTRL, which means that not
  only STIBP/SSBD isn't available, but IBPB isn't available either
  (which would cause no overhead to the guest or the hypervisor
  because it's write only and requires no reading during vmexit). So
  the current default already net loss in security (missing IBPB)
  which means most public cloud providers cannot achieve a fully
  secure guest with nosmt (and nosmt is enough to fully mitigate
  MDS-HT). It also means GCE and is unfairly penalized in performance
  because it provides the option to enable full security in the guest
  as an opt-in (i.e. nosmt and IBPB). So this change will allow all
  cloud providers to expose SPEC_CTRL without incurring into any
  hypervisor slowdown and at the same time it will remove the unfair
  penalization of GCE performance for doing the right thing and it'll
  allow to get full security with nosmt with IBPB being available (and
  STIBP becoming meaningless).

Example to put things in prospective: the STIBP enabled in seccomp has
never been about protecting apps using seccomp like sshd from an
attack from a malicious userland, but to the contrary it has always
been about protecting the system from an attack from sshd, after a
successful remote network exploit against sshd. In fact initially it
wasn't obvious STIBP would work both ways (STIBP was about preventing
the task that runs with STIBP to be attacked with spectre-v2-HT, but
accidentally in the STIBP case it also prevents the attack in the
other direction). In the hypothetical case that sshd has been remotely
exploited the last concern should be STIBP being set, because it'll be
still possible to obtain info even from the kernel by using MDS if
nosmt wasn't set (and if it was set, STIBP is a noop in the first
place). As opposed kernel cannot leak anything with spectre-v2 HT
because of retpolines and the userland is mitigated by ASLR already
and ideally PID namespaces too. If something it'd be worth checking if
sshd run the seccomp thread under pid namespaces too if available in
the running kernel. SSBD also would be a noop for sshd, since sshd
uses no JIT. If sshd prefers to keep doing the STIBP window dressing
exercise, it still can even after this change of defaults by opting-in
with PR_SPEC_INDIRECT_BRANCH.

Ultimately setting SSBD and STIBP by default for all seccomp jails is
a bad sweet spot and bad default with more cons than pros that end up
reducing security in the public cloud (by giving an huge incentive to
not expose SPEC_CTRL which would be needed to get full security with
IBPB after setting nosmt in the guest) and by excessively hurting
performance to more secure apps using seccomp that end up having to
opt out with SECCOMP_FILTER_FLAG_SPEC_ALLOW.

The following is the verified result of the new default with SMT
enabled:

(gdb) print spectre_v2_user_stibp
$1 = SPECTRE_V2_USER_PRCTL
(gdb) print spectre_v2_user_ibpb
$2 = SPECTRE_V2_USER_PRCTL
(gdb) print ssb_mode
$3 = SPEC_STORE_BYPASS_PRCTL

Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 Documentation/admin-guide/hw-vuln/spectre.rst   | 10 ++++------
 Documentation/admin-guide/kernel-parameters.txt |  5 ++---
 arch/x86/kernel/cpu/bugs.c                      |  4 ++--
 3 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index e05e581af5cf..19b897cb1d45 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -490,9 +490,8 @@ Spectre variant 2
 
    Restricting indirect branch speculation on a user program will
    also prevent the program from launching a variant 2 attack
-   on x86.  All sand-boxed SECCOMP programs have indirect branch
-   speculation restricted by default.  Administrators can change
-   that behavior via the kernel command line and sysfs control files.
+   on x86.  Administrators can change that behavior via the kernel
+   command line and sysfs control files.
    See :ref:`spectre_mitigation_control_command_line`.
 
    Programs that disable their indirect branch speculation will have
@@ -674,9 +673,8 @@ Mitigation selection guide
    off by disabling their indirect branch speculation when they are run
    (See :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).
    This prevents untrusted programs from polluting the branch target
-   buffer.  All programs running in SECCOMP sandboxes have indirect
-   branch speculation restricted by default. This behavior can be
-   changed via the kernel command line and sysfs control files. See
+   buffer.  This behavior can be changed via the kernel command line
+   and sysfs control files. See
    :ref:`spectre_mitigation_control_command_line`.
 
 3. High security mode
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 526d65d8573a..105401a3582f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4980,8 +4980,7 @@
 			auto    - Kernel selects the mitigation depending on
 				  the available CPU features and vulnerability.
 
-			Default mitigation:
-			If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"
+			Default mitigation: "prctl"
 
 			Not specifying this option is equivalent to
 			spectre_v2_user=auto.
@@ -5025,7 +5024,7 @@
 				  will disable SSB unless they explicitly opt out.
 
 			Default mitigations:
-			X86:	If CONFIG_SECCOMP=y "seccomp", otherwise "prctl"
+			X86:	"prctl"
 
 			On powerpc the options are:
 
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index d3f0db463f96..5ec39397fe9c 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -721,11 +721,11 @@ spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd)
 	case SPECTRE_V2_USER_CMD_FORCE:
 		mode = SPECTRE_V2_USER_STRICT;
 		break;
+	case SPECTRE_V2_USER_CMD_AUTO:
 	case SPECTRE_V2_USER_CMD_PRCTL:
 	case SPECTRE_V2_USER_CMD_PRCTL_IBPB:
 		mode = SPECTRE_V2_USER_PRCTL;
 		break;
-	case SPECTRE_V2_USER_CMD_AUTO:
 	case SPECTRE_V2_USER_CMD_SECCOMP:
 	case SPECTRE_V2_USER_CMD_SECCOMP_IBPB:
 		if (IS_ENABLED(CONFIG_SECCOMP))
@@ -1132,7 +1132,6 @@ static enum ssb_mitigation __init __ssb_select_mitigation(void)
 		return mode;
 
 	switch (cmd) {
-	case SPEC_STORE_BYPASS_CMD_AUTO:
 	case SPEC_STORE_BYPASS_CMD_SECCOMP:
 		/*
 		 * Choose prctl+seccomp as the default mode if seccomp is
@@ -1146,6 +1145,7 @@ static enum ssb_mitigation __init __ssb_select_mitigation(void)
 	case SPEC_STORE_BYPASS_CMD_ON:
 		mode = SPEC_STORE_BYPASS_DISABLE;
 		break;
+	case SPEC_STORE_BYPASS_CMD_AUTO:
 	case SPEC_STORE_BYPASS_CMD_PRCTL:
 		mode = SPEC_STORE_BYPASS_PRCTL;
 		break;


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 0/1] x86: deduplicate the spectre_v2_user documentation
  2020-11-04 23:40     ` Andrea Arcangeli
@ 2020-11-05  0:14       ` Andrea Arcangeli
  -1 siblings, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2020-11-05  0:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Josep Torrellas, Tobin Feldman-Fitzthum, Hubertus Franke,
	Jack Chen, Giuseppe Scrivano, Andi Kleen, YiFei Zhu, Waiman Long,
	Tianyin Xu, Jann Horn, Jiri Kosina, Valentin Rothberg,
	Josh Poimboeuf, Will Drewry, Linux Containers, kernel list,
	Andy Lutomirski, Dimitrios Skarlatos, David Laight, bpf

Hello,

Could you help checking if this incremental doc cleanup is possible?

After the previous patch is applied, there's still a leftover mention
of seccomp that should be removed in a duped bit of documentation, so
I tentatively referred the original documentation already updated in
sync, instead of keeping the dup around and applying the same update
to the dup.

Note: as far as I can tell spec_store_bypass_disable= documentation is
not duplicated in spectre.rst, that's better in my view. The more dups
we have the more one goes out of sync..

Andrea Arcangeli (1):
  x86: deduplicate the spectre_v2_user documentation

 Documentation/admin-guide/hw-vuln/spectre.rst | 51 +------------------
 1 file changed, 2 insertions(+), 49 deletions(-)

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 0/1] x86: deduplicate the spectre_v2_user documentation
@ 2020-11-05  0:14       ` Andrea Arcangeli
  0 siblings, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2020-11-05  0:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: YiFei Zhu, Linux Containers, YiFei Zhu, bpf, kernel list,
	Aleksa Sarai, Andy Lutomirski, David Laight, Dimitrios Skarlatos,
	Giuseppe Scrivano, Hubertus Franke, Jack Chen, Jann Horn,
	Josep Torrellas, Tianyin Xu, Tobin Feldman-Fitzthum,
	Tycho Andersen, Valentin Rothberg, Will Drewry, Jiri Kosina,
	Waiman Long, Josh Poimboeuf, Andi Kleen

Hello,

Could you help checking if this incremental doc cleanup is possible?

After the previous patch is applied, there's still a leftover mention
of seccomp that should be removed in a duped bit of documentation, so
I tentatively referred the original documentation already updated in
sync, instead of keeping the dup around and applying the same update
to the dup.

Note: as far as I can tell spec_store_bypass_disable= documentation is
not duplicated in spectre.rst, that's better in my view. The more dups
we have the more one goes out of sync..

Andrea Arcangeli (1):
  x86: deduplicate the spectre_v2_user documentation

 Documentation/admin-guide/hw-vuln/spectre.rst | 51 +------------------
 1 file changed, 2 insertions(+), 49 deletions(-)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/1] x86: deduplicate the spectre_v2_user documentation
  2020-11-05  0:14       ` Andrea Arcangeli
@ 2020-11-05  0:14         ` Andrea Arcangeli
  -1 siblings, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2020-11-05  0:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Josep Torrellas, Tobin Feldman-Fitzthum, Hubertus Franke,
	Jack Chen, Giuseppe Scrivano, Andi Kleen, YiFei Zhu, Waiman Long,
	Tianyin Xu, Jann Horn, Jiri Kosina, Valentin Rothberg,
	Josh Poimboeuf, Will Drewry, Linux Containers, kernel list,
	Andy Lutomirski, Dimitrios Skarlatos, David Laight, bpf

This would need updating to make prctl be the new default, but it's
simpler to delete it and refer to the dup.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 Documentation/admin-guide/hw-vuln/spectre.rst | 51 +------------------
 1 file changed, 2 insertions(+), 49 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index 19b897cb1d45..ab7d402c1677 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -593,61 +593,14 @@ kernel command line.
 		Not specifying this option is equivalent to
 		spectre_v2=auto.
 
-For user space mitigation:
-
-        spectre_v2_user=
-
-		[X86] Control mitigation of Spectre variant 2
-		(indirect branch speculation) vulnerability between
-		user space tasks
-
-		on
-			Unconditionally enable mitigations. Is
-			enforced by spectre_v2=on
-
-		off
-			Unconditionally disable mitigations. Is
-			enforced by spectre_v2=off
-
-		prctl
-			Indirect branch speculation is enabled,
-			but mitigation can be enabled via prctl
-			per thread. The mitigation control state
-			is inherited on fork.
-
-		prctl,ibpb
-			Like "prctl" above, but only STIBP is
-			controlled per thread. IBPB is issued
-			always when switching between different user
-			space processes.
-
-		seccomp
-			Same as "prctl" above, but all seccomp
-			threads will enable the mitigation unless
-			they explicitly opt out.
-
-		seccomp,ibpb
-			Like "seccomp" above, but only STIBP is
-			controlled per thread. IBPB is issued
-			always when switching between different
-			user space processes.
-
-		auto
-			Kernel selects the mitigation depending on
-			the available CPU features and vulnerability.
-
-		Default mitigation:
-		If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"
-
-		Not specifying this option is equivalent to
-		spectre_v2_user=auto.
-
 		In general the kernel by default selects
 		reasonable mitigations for the current CPU. To
 		disable Spectre variant 2 mitigations, boot with
 		spectre_v2=off. Spectre variant 1 mitigations
 		cannot be disabled.
 
+For spectre_v2_user see :doc:`/admin-guide/kernel-parameters`.
+
 Mitigation selection guide
 --------------------------
 

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 1/1] x86: deduplicate the spectre_v2_user documentation
@ 2020-11-05  0:14         ` Andrea Arcangeli
  0 siblings, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2020-11-05  0:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: YiFei Zhu, Linux Containers, YiFei Zhu, bpf, kernel list,
	Aleksa Sarai, Andy Lutomirski, David Laight, Dimitrios Skarlatos,
	Giuseppe Scrivano, Hubertus Franke, Jack Chen, Jann Horn,
	Josep Torrellas, Tianyin Xu, Tobin Feldman-Fitzthum,
	Tycho Andersen, Valentin Rothberg, Will Drewry, Jiri Kosina,
	Waiman Long, Josh Poimboeuf, Andi Kleen

This would need updating to make prctl be the new default, but it's
simpler to delete it and refer to the dup.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 Documentation/admin-guide/hw-vuln/spectre.rst | 51 +------------------
 1 file changed, 2 insertions(+), 49 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index 19b897cb1d45..ab7d402c1677 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -593,61 +593,14 @@ kernel command line.
 		Not specifying this option is equivalent to
 		spectre_v2=auto.
 
-For user space mitigation:
-
-        spectre_v2_user=
-
-		[X86] Control mitigation of Spectre variant 2
-		(indirect branch speculation) vulnerability between
-		user space tasks
-
-		on
-			Unconditionally enable mitigations. Is
-			enforced by spectre_v2=on
-
-		off
-			Unconditionally disable mitigations. Is
-			enforced by spectre_v2=off
-
-		prctl
-			Indirect branch speculation is enabled,
-			but mitigation can be enabled via prctl
-			per thread. The mitigation control state
-			is inherited on fork.
-
-		prctl,ibpb
-			Like "prctl" above, but only STIBP is
-			controlled per thread. IBPB is issued
-			always when switching between different user
-			space processes.
-
-		seccomp
-			Same as "prctl" above, but all seccomp
-			threads will enable the mitigation unless
-			they explicitly opt out.
-
-		seccomp,ibpb
-			Like "seccomp" above, but only STIBP is
-			controlled per thread. IBPB is issued
-			always when switching between different
-			user space processes.
-
-		auto
-			Kernel selects the mitigation depending on
-			the available CPU features and vulnerability.
-
-		Default mitigation:
-		If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"
-
-		Not specifying this option is equivalent to
-		spectre_v2_user=auto.
-
 		In general the kernel by default selects
 		reasonable mitigations for the current CPU. To
 		disable Spectre variant 2 mitigations, boot with
 		spectre_v2=off. Spectre variant 1 mitigations
 		cannot be disabled.
 
+For spectre_v2_user see :doc:`/admin-guide/kernel-parameters`.
+
 Mitigation selection guide
 --------------------------
 


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: RFC: default to spec_store_bypass_disable=prctl spectre_v2_user=prctl
  2020-11-04 21:57 ` Andrea Arcangeli
                   ` (2 preceding siblings ...)
  (?)
@ 2021-07-10 18:05 ` Jim Newsome
  -1 siblings, 0 replies; 21+ messages in thread
From: Jim Newsome @ 2021-07-10 18:05 UTC (permalink / raw)
  To: aarcange; +Cc: YiFei Zhu, Linux Containers, YiFei Zhu, bpf, kernel list

Is anything happening with this proposal? Is there anything I could do 
to help it along?

My personal motivation is that I'm involved in developing and using the 
[Shadow] simulator, which we use to run hours and days long simulations. 
We're currently looking into running some simulations in gitlab CI 
Docker runner to take advantage of shared hardware, but Docker currently 
doesn't expose a way to opt out of these mitigations without turning off 
seccomp altogether [Docker FR].

I've measured these mitigations to cause simulations to take 50% longer 
[Overhead], so I'm pretty motivated to find a way to disable them :).

[Shadow]: https://shadow.github.io/
[Docker FR]: https://github.com/moby/moby/issues/42619
[Overhead]: 
https://github.com/shadow/shadow/issues/1489#issuecomment-871445482

P.S. Attempting to respond to a thread without actually being subscribed 
to the list; sorry if this ends up not threading correctly. The CC 
header was truncated so also some original recipients have been dropped. 
Original thread: https://lkml.org/lkml/2020/11/4/1135

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/1] x86: change default to spec_store_bypass_disable=prctl spectre_v2_user=prctl
  2020-11-04 23:50     ` Andrea Arcangeli
  (?)
@ 2021-09-11 21:13     ` Kees Cook
  2021-09-12  2:01       ` Josh Poimboeuf
  2021-09-12 23:14       ` Waiman Long
  -1 siblings, 2 replies; 21+ messages in thread
From: Kees Cook @ 2021-09-11 21:13 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Thomas Gleixner, YiFei Zhu, Linux Containers, YiFei Zhu, bpf,
	kernel list, Aleksa Sarai, Andy Lutomirski, David Laight,
	Dimitrios Skarlatos, Giuseppe Scrivano, Hubertus Franke,
	Jack Chen, Jann Horn, Josep Torrellas, Tianyin Xu,
	Tobin Feldman-Fitzthum, Tycho Andersen, Valentin Rothberg,
	Will Drewry, Jiri Kosina, Waiman Long, Josh Poimboeuf,
	Andi Kleen

On Wed, Nov 04, 2020 at 06:50:54PM -0500, Andrea Arcangeli wrote:
> Switch the kernel default of SSBD and STIBP to the ones with
> CONFIG_SECCOMP=n (i.e. spec_store_bypass_disable=prctl
> spectre_v2_user=prctl) even if CONFIG_SECCOMP=y.

Hello x86 maintainers!

I'd really like to get this landed, so I'll take this via the
seccomp-tree unless someone else speaks up. This keeps falling off
the edge of my TODO list. :)

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/1] x86: deduplicate the spectre_v2_user documentation
  2020-11-05  0:14         ` Andrea Arcangeli
  (?)
@ 2021-09-11 21:13         ` Kees Cook
  -1 siblings, 0 replies; 21+ messages in thread
From: Kees Cook @ 2021-09-11 21:13 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Thomas Gleixner, YiFei Zhu, Linux Containers, YiFei Zhu, bpf,
	kernel list, Aleksa Sarai, Andy Lutomirski, David Laight,
	Dimitrios Skarlatos, Giuseppe Scrivano, Hubertus Franke,
	Jack Chen, Jann Horn, Josep Torrellas, Tianyin Xu,
	Tobin Feldman-Fitzthum, Tycho Andersen, Valentin Rothberg,
	Will Drewry, Jiri Kosina, Waiman Long, Josh Poimboeuf,
	Andi Kleen

On Wed, Nov 04, 2020 at 07:14:06PM -0500, Andrea Arcangeli wrote:
> This would need updating to make prctl be the new default, but it's
> simpler to delete it and refer to the dup.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

I'll take this too.

-Kees

> ---
>  Documentation/admin-guide/hw-vuln/spectre.rst | 51 +------------------
>  1 file changed, 2 insertions(+), 49 deletions(-)
> 
> diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
> index 19b897cb1d45..ab7d402c1677 100644
> --- a/Documentation/admin-guide/hw-vuln/spectre.rst
> +++ b/Documentation/admin-guide/hw-vuln/spectre.rst
> @@ -593,61 +593,14 @@ kernel command line.
>  		Not specifying this option is equivalent to
>  		spectre_v2=auto.
>  
> -For user space mitigation:
> -
> -        spectre_v2_user=
> -
> -		[X86] Control mitigation of Spectre variant 2
> -		(indirect branch speculation) vulnerability between
> -		user space tasks
> -
> -		on
> -			Unconditionally enable mitigations. Is
> -			enforced by spectre_v2=on
> -
> -		off
> -			Unconditionally disable mitigations. Is
> -			enforced by spectre_v2=off
> -
> -		prctl
> -			Indirect branch speculation is enabled,
> -			but mitigation can be enabled via prctl
> -			per thread. The mitigation control state
> -			is inherited on fork.
> -
> -		prctl,ibpb
> -			Like "prctl" above, but only STIBP is
> -			controlled per thread. IBPB is issued
> -			always when switching between different user
> -			space processes.
> -
> -		seccomp
> -			Same as "prctl" above, but all seccomp
> -			threads will enable the mitigation unless
> -			they explicitly opt out.
> -
> -		seccomp,ibpb
> -			Like "seccomp" above, but only STIBP is
> -			controlled per thread. IBPB is issued
> -			always when switching between different
> -			user space processes.
> -
> -		auto
> -			Kernel selects the mitigation depending on
> -			the available CPU features and vulnerability.
> -
> -		Default mitigation:
> -		If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"
> -
> -		Not specifying this option is equivalent to
> -		spectre_v2_user=auto.
> -
>  		In general the kernel by default selects
>  		reasonable mitigations for the current CPU. To
>  		disable Spectre variant 2 mitigations, boot with
>  		spectre_v2=off. Spectre variant 1 mitigations
>  		cannot be disabled.
>  
> +For spectre_v2_user see :doc:`/admin-guide/kernel-parameters`.
> +
>  Mitigation selection guide
>  --------------------------
>  
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/1] x86: change default to spec_store_bypass_disable=prctl spectre_v2_user=prctl
  2021-09-11 21:13     ` Kees Cook
@ 2021-09-12  2:01       ` Josh Poimboeuf
  2021-10-04 17:54         ` Josh Poimboeuf
  2021-09-12 23:14       ` Waiman Long
  1 sibling, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2021-09-12  2:01 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrea Arcangeli, Thomas Gleixner, YiFei Zhu, Linux Containers,
	YiFei Zhu, bpf, kernel list, Aleksa Sarai, Andy Lutomirski,
	David Laight, Dimitrios Skarlatos, Giuseppe Scrivano,
	Hubertus Franke, Jack Chen, Jann Horn, Josep Torrellas,
	Tianyin Xu, Tobin Feldman-Fitzthum, Tycho Andersen,
	Valentin Rothberg, Will Drewry, Jiri Kosina, Waiman Long,
	Andi Kleen



> On Sep 11, 2021, at 2:13 PM, Kees Cook <keescook@chromium.org> wrote:
> 
> On Wed, Nov 04, 2020 at 06:50:54PM -0500, Andrea Arcangeli wrote:
>> Switch the kernel default of SSBD and STIBP to the ones with
>> CONFIG_SECCOMP=n (i.e. spec_store_bypass_disable=prctl
>> spectre_v2_user=prctl) even if CONFIG_SECCOMP=y.
> 
> Hello x86 maintainers!
> 
> I'd really like to get this landed, so I'll take this via the
> seccomp-tree unless someone else speaks up. This keeps falling off
> the edge of my TODO list. :)

Thanks!  You can add my

Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/1] x86: change default to spec_store_bypass_disable=prctl spectre_v2_user=prctl
  2021-09-11 21:13     ` Kees Cook
  2021-09-12  2:01       ` Josh Poimboeuf
@ 2021-09-12 23:14       ` Waiman Long
  1 sibling, 0 replies; 21+ messages in thread
From: Waiman Long @ 2021-09-12 23:14 UTC (permalink / raw)
  To: Kees Cook, Andrea Arcangeli
  Cc: Thomas Gleixner, YiFei Zhu, Linux Containers, YiFei Zhu, bpf,
	kernel list, Aleksa Sarai, Andy Lutomirski, David Laight,
	Dimitrios Skarlatos, Giuseppe Scrivano, Hubertus Franke,
	Jack Chen, Jann Horn, Josep Torrellas, Tianyin Xu,
	Tobin Feldman-Fitzthum, Tycho Andersen, Valentin Rothberg,
	Will Drewry, Jiri Kosina, Josh Poimboeuf, Andi Kleen

On 9/11/21 5:13 PM, Kees Cook wrote:
> On Wed, Nov 04, 2020 at 06:50:54PM -0500, Andrea Arcangeli wrote:
>> Switch the kernel default of SSBD and STIBP to the ones with
>> CONFIG_SECCOMP=n (i.e. spec_store_bypass_disable=prctl
>> spectre_v2_user=prctl) even if CONFIG_SECCOMP=y.
> Hello x86 maintainers!
>
> I'd really like to get this landed, so I'll take this via the
> seccomp-tree unless someone else speaks up. This keeps falling off
> the edge of my TODO list. :)
>
> -Kees
>
You can add my ack too. Thanks!

Acked-by: Waiman Long <longman@redhat.com>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/1] x86: change default to spec_store_bypass_disable=prctl spectre_v2_user=prctl
  2021-09-12  2:01       ` Josh Poimboeuf
@ 2021-10-04 17:54         ` Josh Poimboeuf
  2021-10-04 19:14           ` Kees Cook
  0 siblings, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2021-10-04 17:54 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrea Arcangeli, Thomas Gleixner, YiFei Zhu, Linux Containers,
	YiFei Zhu, bpf, kernel list, Aleksa Sarai, Andy Lutomirski,
	David Laight, Dimitrios Skarlatos, Giuseppe Scrivano,
	Hubertus Franke, Jack Chen, Jann Horn, Josep Torrellas,
	Tianyin Xu, Tobin Feldman-Fitzthum, Tycho Andersen,
	Valentin Rothberg, Will Drewry, Jiri Kosina, Waiman Long,
	Andi Kleen

On Sat, Sep 11, 2021 at 07:01:40PM -0700, Josh Poimboeuf wrote:
> 
> 
> > On Sep 11, 2021, at 2:13 PM, Kees Cook <keescook@chromium.org> wrote:
> > 
> > On Wed, Nov 04, 2020 at 06:50:54PM -0500, Andrea Arcangeli wrote:
> >> Switch the kernel default of SSBD and STIBP to the ones with
> >> CONFIG_SECCOMP=n (i.e. spec_store_bypass_disable=prctl
> >> spectre_v2_user=prctl) even if CONFIG_SECCOMP=y.
> > 
> > Hello x86 maintainers!
> > 
> > I'd really like to get this landed, so I'll take this via the
> > seccomp-tree unless someone else speaks up. This keeps falling off
> > the edge of my TODO list. :)
> 
> Thanks!  You can add my
> 
> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>

Hi Kees,

Ping - I don't see this patch in linux-next.  Are you planning on grabbing this
for the next merge window?

-- 
Josh


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/1] x86: change default to spec_store_bypass_disable=prctl spectre_v2_user=prctl
  2021-10-04 17:54         ` Josh Poimboeuf
@ 2021-10-04 19:14           ` Kees Cook
  0 siblings, 0 replies; 21+ messages in thread
From: Kees Cook @ 2021-10-04 19:14 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Andrea Arcangeli, Thomas Gleixner, YiFei Zhu, Linux Containers,
	YiFei Zhu, bpf, kernel list, Aleksa Sarai, Andy Lutomirski,
	David Laight, Dimitrios Skarlatos, Giuseppe Scrivano,
	Hubertus Franke, Jack Chen, Jann Horn, Josep Torrellas,
	Tianyin Xu, Tobin Feldman-Fitzthum, Tycho Andersen,
	Valentin Rothberg, Will Drewry, Jiri Kosina, Waiman Long,
	Andi Kleen

On Mon, Oct 04, 2021 at 10:54:31AM -0700, Josh Poimboeuf wrote:
> On Sat, Sep 11, 2021 at 07:01:40PM -0700, Josh Poimboeuf wrote:
> > 
> > 
> > > On Sep 11, 2021, at 2:13 PM, Kees Cook <keescook@chromium.org> wrote:
> > > 
> > > On Wed, Nov 04, 2020 at 06:50:54PM -0500, Andrea Arcangeli wrote:
> > >> Switch the kernel default of SSBD and STIBP to the ones with
> > >> CONFIG_SECCOMP=n (i.e. spec_store_bypass_disable=prctl
> > >> spectre_v2_user=prctl) even if CONFIG_SECCOMP=y.
> > > 
> > > Hello x86 maintainers!
> > > 
> > > I'd really like to get this landed, so I'll take this via the
> > > seccomp-tree unless someone else speaks up. This keeps falling off
> > > the edge of my TODO list. :)
> > 
> > Thanks!  You can add my
> > 
> > Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
> 
> Hi Kees,
> 
> Ping - I don't see this patch in linux-next.  Are you planning on grabbing this
> for the next merge window?

Thanks for the reminder! I've pushed this to the seccomp next tree.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2021-10-04 19:14 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-04 21:57 RFC: default to spec_store_bypass_disable=prctl spectre_v2_user=prctl Andrea Arcangeli
2020-11-04 21:57 ` Andrea Arcangeli
2020-11-04 22:14 ` Kees Cook
2020-11-04 22:14   ` Kees Cook
2020-11-04 23:22 ` Thomas Gleixner
2020-11-04 23:22   ` Thomas Gleixner
2020-11-04 23:40   ` Andrea Arcangeli
2020-11-04 23:40     ` Andrea Arcangeli
2020-11-05  0:14     ` [PATCH 0/1] x86: deduplicate the spectre_v2_user documentation Andrea Arcangeli
2020-11-05  0:14       ` Andrea Arcangeli
2020-11-05  0:14       ` [PATCH 1/1] " Andrea Arcangeli
2020-11-05  0:14         ` Andrea Arcangeli
2021-09-11 21:13         ` Kees Cook
2020-11-04 23:50   ` [PATCH 1/1] x86: change default to spec_store_bypass_disable=prctl spectre_v2_user=prctl Andrea Arcangeli
2020-11-04 23:50     ` Andrea Arcangeli
2021-09-11 21:13     ` Kees Cook
2021-09-12  2:01       ` Josh Poimboeuf
2021-10-04 17:54         ` Josh Poimboeuf
2021-10-04 19:14           ` Kees Cook
2021-09-12 23:14       ` Waiman Long
2021-07-10 18:05 ` RFC: " Jim Newsome

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.