* [MODERATED] [PATCH 1/1] Linux patch #1
@ 2018-06-28 22:53 Jiri Kosina
  2018-06-28 23:15 ` [MODERATED] " Jiri Kosina
                   ` (3 more replies)
  0 siblings, 4 replies; 55+ messages in thread
From: Jiri Kosina @ 2018-06-28 22:53 UTC (permalink / raw)
  To: speck

Introduce 'l1tf=nosmt' boot option to allow for turning off SMT during boot
automatically on CPUs affected by L1TF.

This parameter could be further extended to cover other possible 
mitigations or to be consumed by hypervisors when deciding on which 
mitigations to apply.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---

If this is accepted, I think the KVM knobs toggling which kind of L1D 
flushes (if any) to do on vmenter should be added as additional 'l1tf=' 
parameter (once the KVM pile gets applied).

Thanks.

 Documentation/admin-guide/kernel-parameters.txt |  4 ++++
 arch/x86/kernel/cpu/bugs.c                      | 18 +++++++++++++++++-
 include/linux/cpu.h                             |  2 ++
 kernel/cpu.c                                    | 14 ++++++++++++--
 4 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 8e29c4b6756f..8e9594aeb841 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1971,6 +1971,10 @@
 			feature (tagged TLBs) on capable Intel chips.
 			Default is 1 (enabled)
 
+	l1tf=           [X86] Control mitigation of L1TF vulnerability.
+			nosmt	disable hyper-threading on CPUs vulnerable to
+				L1TF
+
 	l2cr=		[PPC]
 
 	l3cr=		[PPC]
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 50500cea6eba..65e8b32e51a6 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -682,6 +682,21 @@ static void __init l1tf_select_mitigation(void)
 
 	setup_force_cpu_cap(X86_FEATURE_L1TF_PTEINV);
 }
+
+static bool l1tf_nosmt;
+
+static int __init l1tf_cmdline(char *str)
+{
+	if (str && !strcmp(str, "nosmt")) {
+		l1tf_nosmt = true;
+		cpu_smt_disable(true);
+	}
+
+	return 0;
+}
+
+early_param("l1tf", l1tf_cmdline);
+
 #undef pr_fmt
 
 #ifdef CONFIG_SYSFS
@@ -713,7 +728,8 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
 
 	case X86_BUG_L1TF:
 		if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV))
-			return sprintf(buf, "Mitigation: Page Table Inversion\n");
+			return sprintf(buf, "Mitigation: Page Table Inversion %s\n",
+					l1tf_nosmt ? "+ HT off" : "");
 		break;
 
 	default:
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 7532cbf27b1d..26acd9f51df1 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -177,8 +177,10 @@ enum cpuhp_smt_control {
 
 #if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
 extern enum cpuhp_smt_control cpu_smt_control;
+void cpu_smt_disable(bool force);
 #else
 # define cpu_smt_control		(CPU_SMT_ENABLED)
+static inline void cpu_smt_disable(bool) { return; }
 #endif
 
 #endif /* _LINUX_CPU_H_ */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index d29fdd7e57bb..d2e790debc6d 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -936,13 +936,23 @@ EXPORT_SYMBOL(cpu_down);
 #ifdef CONFIG_HOTPLUG_SMT
 enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
 
-static int __init smt_cmdline_disable(char *str)
+void __init cpu_smt_disable(bool force)
 {
 	cpu_smt_control = CPU_SMT_DISABLED;
-	if (str && !strcmp(str, "force")) {
+	if (force) {
 		pr_info("SMT: Force disabled\n");
 		cpu_smt_control = CPU_SMT_FORCE_DISABLED;
 	}
+	return;
+}
+
+static int __init smt_cmdline_disable(char *str)
+{
+	if (str && !strcmp(str, "force"))
+		cpu_smt_disable(true);
+	else
+		cpu_smt_disable(false);
+
 	return 0;
 }
 early_param("nosmt", smt_cmdline_disable);

-- 
Jiri Kosina
SUSE Labs

* [MODERATED] Re: [PATCH 1/1] Linux patch #1
  2018-06-28 22:53 [MODERATED] [PATCH 1/1] Linux patch #1 Jiri Kosina
@ 2018-06-28 23:15 ` Jiri Kosina
  2018-06-28 23:36 ` [MODERATED] [PATCH 1/1 v2] " Jiri Kosina
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 55+ messages in thread
From: Jiri Kosina @ 2018-06-28 23:15 UTC (permalink / raw)
  To: speck

On Fri, 29 Jun 2018, speck for Jiri Kosina wrote:

> From: Jiri Kosina <jkosina@suse.cz>
> Subject: [PATCH] x86/bugs: introduce l1tf=nosmt boot-time parameter
> 
> Introduce 'l1tf=nosmt' boot option to allow for turning off SMT during boot
> automatically on CPUs affected by L1TF.
> 
> This parameter could be further extended to cover other possible 
> mitigations or to be consumed by hypervisors when deciding on which 
> mitigations to apply.
> 
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>

Bah, I forgot to refresh and sent a stale version of the patch that 
doesn't actually contain the X86_BUG_L1TF dependency, please disregard 
this one, will resend v2. Sorry for the noise.

-- 
Jiri Kosina
SUSE Labs

* [MODERATED] [PATCH 1/1 v2] Linux patch #1
  2018-06-28 22:53 [MODERATED] [PATCH 1/1] Linux patch #1 Jiri Kosina
  2018-06-28 23:15 ` [MODERATED] " Jiri Kosina
@ 2018-06-28 23:36 ` Jiri Kosina
  2018-06-29  8:38   ` [MODERATED] " Borislav Petkov
                     ` (2 more replies)
  2018-06-30 19:48 ` [MODERATED] [PATCH 1/1 v3] " Jiri Kosina
  2018-06-30 22:22 ` [MODERATED] [PATCH 1/1 v4] " Jiri Kosina
  3 siblings, 3 replies; 55+ messages in thread
From: Jiri Kosina @ 2018-06-28 23:36 UTC (permalink / raw)
  To: speck

Introduce 'l1tf=nosmt' boot option to allow for turning off SMT during boot
automatically on CPUs affected by L1TF.

This could be further extended to cover other possible mitigations or to be
consumed by hypervisors when deciding on which mitigations to apply.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---

v1->v2: add forgotten dependency on X86_BUG_L1TF

If this is accepted, I think the KVM knobs toggling which kind of L1D 
flushes (if any) to do on vmenter should be added as additional 'l1tf=' 
parameter (once the KVM pile gets applied).

Thanks.

 Documentation/admin-guide/kernel-parameters.txt |  4 ++++
 arch/x86/kernel/cpu/bugs.c                      | 19 ++++++++++++++++++-
 include/linux/cpu.h                             |  2 ++
 kernel/cpu.c                                    | 14 ++++++++++++--
 4 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 8e29c4b6756f..8e9594aeb841 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1971,6 +1971,10 @@
 			feature (tagged TLBs) on capable Intel chips.
 			Default is 1 (enabled)
 
+	l1tf=           [X86] Control mitigation of L1TF vulnerability.
+			nosmt	disable hyper-threading on CPUs vulnerable to
+				L1TF
+
 	l2cr=		[PPC]
 
 	l3cr=		[PPC]
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 50500cea6eba..e17331b8589e 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -682,6 +682,22 @@ static void __init l1tf_select_mitigation(void)
 
 	setup_force_cpu_cap(X86_FEATURE_L1TF_PTEINV);
 }
+
+static bool l1tf_nosmt;
+
+static int __init l1tf_cmdline(char *str)
+{
+	if (boot_cpu_has_bug(X86_BUG_L1TF) &&
+			str && !strcmp(str, "nosmt")) {
+		l1tf_nosmt = true;
+		cpu_smt_disable(true);
+	}
+
+	return 0;
+}
+
+early_param("l1tf", l1tf_cmdline);
+
 #undef pr_fmt
 
 #ifdef CONFIG_SYSFS
@@ -713,7 +729,8 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
 
 	case X86_BUG_L1TF:
 		if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV))
-			return sprintf(buf, "Mitigation: Page Table Inversion\n");
+			return sprintf(buf, "Mitigation: Page Table Inversion %s\n",
+					l1tf_nosmt ? "+ HT off" : "");
 		break;
 
 	default:
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 7532cbf27b1d..26acd9f51df1 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -177,8 +177,10 @@ enum cpuhp_smt_control {
 
 #if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
 extern enum cpuhp_smt_control cpu_smt_control;
+void cpu_smt_disable(bool force);
 #else
 # define cpu_smt_control		(CPU_SMT_ENABLED)
+static inline void cpu_smt_disable(bool) { return; }
 #endif
 
 #endif /* _LINUX_CPU_H_ */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index d29fdd7e57bb..d2e790debc6d 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -936,13 +936,23 @@ EXPORT_SYMBOL(cpu_down);
 #ifdef CONFIG_HOTPLUG_SMT
 enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
 
-static int __init smt_cmdline_disable(char *str)
+void __init cpu_smt_disable(bool force)
 {
 	cpu_smt_control = CPU_SMT_DISABLED;
-	if (str && !strcmp(str, "force")) {
+	if (force) {
 		pr_info("SMT: Force disabled\n");
 		cpu_smt_control = CPU_SMT_FORCE_DISABLED;
 	}
+	return;
+}
+
+static int __init smt_cmdline_disable(char *str)
+{
+	if (str && !strcmp(str, "force"))
+		cpu_smt_disable(true);
+	else
+		cpu_smt_disable(false);
+
 	return 0;
 }
 early_param("nosmt", smt_cmdline_disable);

-- 
Jiri Kosina
SUSE Labs

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-28 23:36 ` [MODERATED] [PATCH 1/1 v2] " Jiri Kosina
@ 2018-06-29  8:38   ` Borislav Petkov
  2018-06-29 15:43   ` Thomas Gleixner
  2018-06-29 16:48   ` [MODERATED] " Josh Poimboeuf
  2 siblings, 0 replies; 55+ messages in thread
From: Borislav Petkov @ 2018-06-29  8:38 UTC (permalink / raw)
  To: speck

On Fri, Jun 29, 2018 at 01:36:49AM +0200, speck for Jiri Kosina wrote:
> From: Jiri Kosina <jkosina@suse.cz>
> Subject: [PATCH] x86/bugs: introduce l1tf=nosmt boot-time parameter

...

> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index d29fdd7e57bb..d2e790debc6d 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -936,13 +936,23 @@ EXPORT_SYMBOL(cpu_down);
>  #ifdef CONFIG_HOTPLUG_SMT
>  enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
>  
> -static int __init smt_cmdline_disable(char *str)
> +void __init cpu_smt_disable(bool force)
>  {
>  	cpu_smt_control = CPU_SMT_DISABLED;
> -	if (str && !strcmp(str, "force")) {
> +	if (force) {
>  		pr_info("SMT: Force disabled\n");
>  		cpu_smt_control = CPU_SMT_FORCE_DISABLED;
>  	}
> +	return;

You don't need that "return" here.

Otherwise looks ok.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

* Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-28 23:36 ` [MODERATED] [PATCH 1/1 v2] " Jiri Kosina
  2018-06-29  8:38   ` [MODERATED] " Borislav Petkov
@ 2018-06-29 15:43   ` Thomas Gleixner
  2018-06-29 15:46     ` Thomas Gleixner
  2018-06-29 16:48   ` [MODERATED] " Josh Poimboeuf
  2 siblings, 1 reply; 55+ messages in thread
From: Thomas Gleixner @ 2018-06-29 15:43 UTC (permalink / raw)
  To: speck

On Fri, 29 Jun 2018, speck for Jiri Kosina wrote:
>  #ifdef CONFIG_SYSFS
> @@ -713,7 +729,8 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
>  
>  	case X86_BUG_L1TF:
>  		if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV))
> -			return sprintf(buf, "Mitigation: Page Table Inversion\n");
> +			return sprintf(buf, "Mitigation: Page Table Inversion %s\n",
> +					l1tf_nosmt ? "+ HT off" : "");

I'd rather let that print the SMT control state. Updated patch below.

Thanks,

	tglx

8<----------------
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1971,6 +1971,10 @@
 			feature (tagged TLBs) on capable Intel chips.
 			Default is 1 (enabled)
 
+	l1tf=           [X86] Control mitigation of L1TF vulnerability.
+			nosmt	disable hyper-threading on CPUs vulnerable to
+				L1TF
+
 	l2cr=		[PPC]
 
 	l3cr=		[PPC]
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -682,10 +682,30 @@ static void __init l1tf_select_mitigatio
 
 	setup_force_cpu_cap(X86_FEATURE_L1TF_PTEINV);
 }
+
+static int __init l1tf_cmdline(char *str)
+{
+	if (!boot_cpu_has_bug(X86_BUG_L1TF))
+		return 0;
+
+	if (str && !strcmp(str, "nosmt"))
+		cpu_smt_disable(true);
+
+	return 0;
+}
+early_param("l1tf", l1tf_cmdline);
+
 #undef pr_fmt
 
 #ifdef CONFIG_SYSFS
 
+static const char *smt_states[] = {
+	[CPU_SMT_ENABLED]		= "HT enabled",
+	[CPU_SMT_DISABLED]		= "HT disabled",
+	[CPU_SMT_FORCE_DISABLED]	= "HT force disabled"
+	[CPU_SMT_NOT_SUPPORTED]		= "HT not supported",
+};
+
 static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr,
 			       char *buf, unsigned int bug)
 {
@@ -712,8 +732,10 @@ static ssize_t cpu_show_common(struct de
 		return sprintf(buf, "%s\n", ssb_strings[ssb_mode]);
 
 	case X86_BUG_L1TF:
-		if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV))
-			return sprintf(buf, "Mitigation: Page Table Inversion\n");
+		if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV)) {
+			return sprintf(buf, "Mitigation: Page Table Inversion, %s\n",
+				       smt_states[cpu_smt_control]);
+		}
 		break;
 
 	default:
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -177,8 +177,14 @@ enum cpuhp_smt_control {
 
 #if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
 extern enum cpuhp_smt_control cpu_smt_control;
+void cpu_smt_disable(bool force);
 #else
-# define cpu_smt_control		(CPU_SMT_ENABLED)
+# ifdef CONFIG_SMP
+#  define cpu_smt_control		(CPU_SMT_NOT_SUPPORTED)
+# else
+#  define cpu_smt_control		(CPU_SMT_ENABLED)
+# endif
+static inline void cpu_smt_disable(bool force) { }
 #endif
 
 #endif /* _LINUX_CPU_H_ */
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -936,13 +936,18 @@ EXPORT_SYMBOL(cpu_down);
 #ifdef CONFIG_HOTPLUG_SMT
 enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
 
-static int __init smt_cmdline_disable(char *str)
+void __init cpu_smt_disable(bool force)
 {
 	cpu_smt_control = CPU_SMT_DISABLED;
-	if (str && !strcmp(str, "force")) {
+	if (force) {
 		pr_info("SMT: Force disabled\n");
 		cpu_smt_control = CPU_SMT_FORCE_DISABLED;
 	}
+}
+
+static int __init smt_cmdline_disable(char *str)
+{
+	cpu_smt_disable(str && !strcmp(str, "force"));
 	return 0;
 }
 early_param("nosmt", smt_cmdline_disable);

* Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-29 15:43   ` Thomas Gleixner
@ 2018-06-29 15:46     ` Thomas Gleixner
  0 siblings, 0 replies; 55+ messages in thread
From: Thomas Gleixner @ 2018-06-29 15:46 UTC (permalink / raw)
  To: speck

On Fri, 29 Jun 2018, speck for Thomas Gleixner wrote:

> On Fri, 29 Jun 2018, speck for Jiri Kosina wrote:
> >  #ifdef CONFIG_SYSFS
> > @@ -713,7 +729,8 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
> >  
> >  	case X86_BUG_L1TF:
> >  		if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV))
> > -			return sprintf(buf, "Mitigation: Page Table Inversion\n");
> > +			return sprintf(buf, "Mitigation: Page Table Inversion %s\n",
> > +					l1tf_nosmt ? "+ HT off" : "");
> 
> I rather let that print the SMT control state. Updated patch below.
> 
> Thanks,
> 
> 	tglx
> 
> 8<----------------
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1971,6 +1971,10 @@
>  			feature (tagged TLBs) on capable Intel chips.
>  			Default is 1 (enabled)
>  
> +	l1tf=           [X86] Control mitigation of L1TF vulnerability.
> +			nosmt	disable hyper-threading on CPUs vulnerable to
> +				L1TF
> +
>  	l2cr=		[PPC]
>  
>  	l3cr=		[PPC]
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -682,10 +682,30 @@ static void __init l1tf_select_mitigatio
>  
>  	setup_force_cpu_cap(X86_FEATURE_L1TF_PTEINV);
>  }
> +
> +static int __init l1tf_cmdline(char *str)
> +{
> +	if (!boot_cpu_has_bug(X86_BUG_L1TF))
> +		return 0;
> +
> +	if (str && !strcmp(str, "nosmt"))
> +		cpu_smt_disable(true);
> +
> +	return 0;
> +}
> +early_param("l1tf", l1tf_cmdline);
> +
>  #undef pr_fmt
>  
>  #ifdef CONFIG_SYSFS
>  
> +static const char *smt_states[] = {
> +	[CPU_SMT_ENABLED]		= "HT enabled",
> +	[CPU_SMT_DISABLED]		= "HT disabled",
> +	[CPU_SMT_FORCE_DISABLED]	= "HT force disabled"

Bah. lacks a comma. I'll never learn to do quilt refresh proper.
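
For reference, a minimal userspace sketch of the lookup table with the
comma restored. The enum values mirror the names used in the patch; their
order here and the helper l1tf_show() are assumptions made for
illustration, not code from the thread.

```c
#include <stdio.h>

/* Names taken from the patch; the ordering is an assumption. */
enum cpuhp_smt_control {
	CPU_SMT_ENABLED,
	CPU_SMT_DISABLED,
	CPU_SMT_FORCE_DISABLED,
	CPU_SMT_NOT_SUPPORTED,
};

/* Every entry ends with a comma, so adjacent initializers stay separate. */
static const char *smt_states[] = {
	[CPU_SMT_ENABLED]		= "HT enabled",
	[CPU_SMT_DISABLED]		= "HT disabled",
	[CPU_SMT_FORCE_DISABLED]	= "HT force disabled",
	[CPU_SMT_NOT_SUPPORTED]		= "HT not supported",
};

/* Hypothetical helper: format the L1TF status line for a given SMT state. */
static int l1tf_show(char *buf, size_t len, enum cpuhp_smt_control state)
{
	return snprintf(buf, len, "Mitigation: Page Table Inversion, %s\n",
			smt_states[state]);
}
```

Indexing the table with the control state keeps the sysfs text in sync
with whatever disabled SMT, whether that was 'nosmt' or 'l1tf=nosmt'.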

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-28 23:36 ` [MODERATED] [PATCH 1/1 v2] " Jiri Kosina
  2018-06-29  8:38   ` [MODERATED] " Borislav Petkov
  2018-06-29 15:43   ` Thomas Gleixner
@ 2018-06-29 16:48   ` Josh Poimboeuf
  2018-06-29 16:49     ` Josh Poimboeuf
  2018-06-29 19:47     ` Thomas Gleixner
  2 siblings, 2 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2018-06-29 16:48 UTC (permalink / raw)
  To: speck

On Fri, Jun 29, 2018 at 01:36:49AM +0200, speck for Jiri Kosina wrote:
> From: Jiri Kosina <jkosina@suse.cz>
> Subject: [PATCH] x86/bugs: introduce l1tf=nosmt boot-time parameter
> 
> Introduce 'l1tf=nosmt' boot option to allow for turning off SMT during boot
> automatically on CPUs affected by L1TF.
> 
> This could be further extended to cover other possible mitigations or to be
> consumed by hypervisors when deciding on which mitigations to apply.
> 
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> ---
> 
> v1->v2: add forgotten dependency on X86_BUG_L1TF
> 
> If this is accepted, I think the KVM knobs toggling which kind of L1D 
> flushes (if any) to do on vmenter should be added as additional 'l1tf=' 
> parameter (once the KVM pile gets applied).

For ease of use, how about just 'l1tf=on' and 'l1tf=off'?

For now, 'l1tf=on' would just be equivalent to 'nosmt=force', but it
could eventually also include vmentry_l1d_flush=1.

And then we could add another option, 'l1tf=flush-always' to mean
'nosmt=force' + 'vmentry_l1d_flush=2'.

>  Documentation/admin-guide/kernel-parameters.txt |  4 ++++
>  arch/x86/kernel/cpu/bugs.c                      | 19 ++++++++++++++++++-
>  include/linux/cpu.h                             |  2 ++
>  kernel/cpu.c                                    | 14 ++++++++++++--
>  4 files changed, 36 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 8e29c4b6756f..8e9594aeb841 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1971,6 +1971,10 @@
>  			feature (tagged TLBs) on capable Intel chips.
>  			Default is 1 (enabled)
>  
> +	l1tf=           [X86] Control mitigation of L1TF vulnerability.
> +			nosmt	disable hyper-threading on CPUs vulnerable to
> +				L1TF

Should clarify here that it's a force disable.

-- 
Josh

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-29 16:48   ` [MODERATED] " Josh Poimboeuf
@ 2018-06-29 16:49     ` Josh Poimboeuf
  2018-06-29 19:47     ` Thomas Gleixner
  1 sibling, 0 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2018-06-29 16:49 UTC (permalink / raw)
  To: speck

On Fri, Jun 29, 2018 at 11:48:07AM -0500, Josh Poimboeuf wrote:
> >  Documentation/admin-guide/kernel-parameters.txt |  4 ++++
> >  arch/x86/kernel/cpu/bugs.c                      | 19 ++++++++++++++++++-
> >  include/linux/cpu.h                             |  2 ++
> >  kernel/cpu.c                                    | 14 ++++++++++++--
> >  4 files changed, 36 insertions(+), 3 deletions(-)
> > 
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index 8e29c4b6756f..8e9594aeb841 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -1971,6 +1971,10 @@
> >  			feature (tagged TLBs) on capable Intel chips.
> >  			Default is 1 (enabled)
> >  
> > +	l1tf=           [X86] Control mitigation of L1TF vulnerability.
> > +			nosmt	disable hyper-threading on CPUs vulnerable to
> > +				L1TF
> 
> Should clarify here that it's a force disable.

"permanent" would be more descriptive.

-- 
Josh

* Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-29 16:48   ` [MODERATED] " Josh Poimboeuf
  2018-06-29 16:49     ` Josh Poimboeuf
@ 2018-06-29 19:47     ` Thomas Gleixner
  2018-06-29 19:54       ` [MODERATED] " Josh Poimboeuf
  1 sibling, 1 reply; 55+ messages in thread
From: Thomas Gleixner @ 2018-06-29 19:47 UTC (permalink / raw)
  To: speck

On Fri, 29 Jun 2018, speck for Josh Poimboeuf wrote:
> On Fri, Jun 29, 2018 at 01:36:49AM +0200, speck for Jiri Kosina wrote:
> > From: Jiri Kosina <jkosina@suse.cz>
> > Subject: [PATCH] x86/bugs: introduce l1tf=nosmt boot-time parameter
> > 
> > Introduce 'l1tf=nosmt' boot option to allow for turning off SMT during boot
> > automatically on CPUs affected by L1TF.
> > 
> > This could be further extended to cover other possible mitigations or to be
> > consumed by hypervisors when deciding on which mitigations to apply.
> > 
> > Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> > ---
> > 
> > v1->v2: add forgotten dependency on X86_BUG_L1TF
> > 
> > If this is accepted, I think the KVM knobs toggling which kind of L1D 
> > flushes (if any) to do on vmenter should be added as additional 'l1tf=' 
> > parameter (once the KVM pile gets applied).
> 
> For ease of use, how about just 'l1tf=on' and 'l1tf=off'?
> 
> For now, 'l1tf=on' would just be equivalent to 'nosmt=force', but it
> could eventually also include vmentry_l1d_flush=1.
> 
> And then we could add another option, 'l1tf=flush-always' to mean
> 'nosmt=force' + 'vmentry_l1d_flush=2'.

How about having

    off

and then have 

    nosmt

and for the l1d control:

    vmx-noflush, vmx-flush, vmx-flush-always

and allow combining nosmt with one of the flush options, separated by a comma

e.g. l1tf=nosmt,vmx-flush

Thanks,

	tglx
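
A minimal userspace sketch of how a comma-separated 'l1tf=' value such as
"nosmt,vmx-flush" could be parsed. The struct, enum and function names are
hypothetical and not part of any posted patch; rejecting a second flush
option and accepting 'off' alongside other tokens are assumptions, since
the proposal above does not spell those cases out.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical result types; not taken from the thread. */
enum l1d_flush_mode {
	L1D_FLUSH_UNSET,
	L1D_NOFLUSH,
	L1D_FLUSH,
	L1D_FLUSH_ALWAYS,
};

struct l1tf_opts {
	bool off;
	bool nosmt;
	enum l1d_flush_mode flush;
};

/* Compare a token of length len against a NUL-terminated keyword. */
static bool tok_eq(const char *tok, size_t len, const char *kw)
{
	return strlen(kw) == len && !strncmp(tok, kw, len);
}

/*
 * Parse "nosmt,vmx-flush"-style values. Returns 0 on success, -1 on an
 * unknown token or when more than one flush option is given.
 */
static int l1tf_parse(const char *str, struct l1tf_opts *o)
{
	memset(o, 0, sizeof(*o));
	if (!str)
		return -1;

	while (*str) {
		size_t len = strcspn(str, ",");
		enum l1d_flush_mode m = L1D_FLUSH_UNSET;

		if (tok_eq(str, len, "off"))
			o->off = true;
		else if (tok_eq(str, len, "nosmt"))
			o->nosmt = true;
		else if (tok_eq(str, len, "vmx-noflush"))
			m = L1D_NOFLUSH;
		else if (tok_eq(str, len, "vmx-flush"))
			m = L1D_FLUSH;
		else if (tok_eq(str, len, "vmx-flush-always"))
			m = L1D_FLUSH_ALWAYS;
		else
			return -1;

		if (m != L1D_FLUSH_UNSET) {
			if (o->flush != L1D_FLUSH_UNSET)
				return -1;	/* only one flush option allowed */
			o->flush = m;
		}

		str += len;
		if (*str == ',')
			str++;
	}
	return 0;
}
```

Comparing on token length as well as content keeps "vmx-flush" from
matching as a prefix of "vmx-flush-always".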

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-29 19:47     ` Thomas Gleixner
@ 2018-06-29 19:54       ` Josh Poimboeuf
  2018-06-29 21:26         ` Jiri Kosina
  0 siblings, 1 reply; 55+ messages in thread
From: Josh Poimboeuf @ 2018-06-29 19:54 UTC (permalink / raw)
  To: speck

On Fri, Jun 29, 2018 at 09:47:30PM +0200, speck for Thomas Gleixner wrote:
> On Fri, 29 Jun 2018, speck for Josh Poimboeuf wrote:
> > On Fri, Jun 29, 2018 at 01:36:49AM +0200, speck for Jiri Kosina wrote:
> > > From: Jiri Kosina <jkosina@suse.cz>
> > > Subject: [PATCH] x86/bugs: introduce l1tf=nosmt boot-time parameter
> > > 
> > > Introduce 'l1tf=nosmt' boot option to allow for turning off SMT during boot
> > > automatically on CPUs affected by L1TF.
> > > 
> > > This could be further extended to cover other possible mitigations or to be
> > > consumed by hypervisors when deciding on which mitigations to apply.
> > > 
> > > Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> > > ---
> > > 
> > > v1->v2: add forgotten dependency on X86_BUG_L1TF
> > > 
> > > If this is accepted, I think the KVM knobs toggling which kind of L1D 
> > > flushes (if any) to do on vmenter should be added as additional 'l1tf=' 
> > > parameter (once the KVM pile gets applied).
> > 
> > For ease of use, how about just 'l1tf=on' and 'l1tf=off'?
> > 
> > For now, 'l1tf=on' would just be equivalent to 'nosmt=force', but it
> > could eventually also include vmentry_l1d_flush=1.
> > 
> > And then we could add another option, 'l1tf=flush-always' to mean
> > 'nosmt=force' + 'vmentry_l1d_flush=2'.
> 
> How about having
> 
>     off
> 
> and then have 
> 
>     nosmt
> 
> and for the l1d control:
> 
>     vmx-noflush, vmx-flush, vmx-flush-always
> 
> and allow to combine nosmt and one of the flush options separate by a comma
> 
> e.g. l1tf=nosmt,vmx-flush

Do we really need all those combinations?  If I'm an admin, I either
want this mitigation on or off.  Some of the combinations don't really
make sense.

-- 
Josh

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-29 19:54       ` [MODERATED] " Josh Poimboeuf
@ 2018-06-29 21:26         ` Jiri Kosina
  2018-06-29 21:28           ` Jiri Kosina
                             ` (2 more replies)
  0 siblings, 3 replies; 55+ messages in thread
From: Jiri Kosina @ 2018-06-29 21:26 UTC (permalink / raw)
  To: speck

On Fri, 29 Jun 2018, speck for Josh Poimboeuf wrote:

> > How about having
> > 
> >     off
> > 
> > and then have 
> > 
> >     nosmt
> > 
> > and for the l1d control:
> > 
> >     vmx-noflush, vmx-flush, vmx-flush-always
> > 
> > and allow to combine nosmt and one of the flush options separate by a comma
> > 
> > e.g. l1tf=nosmt,vmx-flush
> 
> Do we really need all those combinations?  If I'm an admin, I either
> want this mitigation on or off.  Some of the combinations don't really
> make sense.

Agreed; thinking about this a little bit more from admin POV, I think we 
should just go with

(1) l1tf=full
    That would currently mean PTEINV + nosmt=force, and later once the 
    KVM/Xen bits land in, they will make use of it as well (KVM will start 
    doing the flushes (*), Xen doing the PTE sanitization or their variant 
    of gang scheduling or whatnot)

(2) l1tf=virt 
    That would imply nosmt + whatever the hypervisor decides (KVM 
    flushes, Xen ... ?). There is some talk going on about disabling EPT 
    instead, but odds are that this is going to have much more drastic 
    performance impact, so let's just ignore that for now

(3) l1tf=off
    All the mitigations turned off (perhaps also clear X86_BUG_L1TF?)

If this is something acceptable to everybody (especially virt folks ... 
Paolo, Konrad, Andrew?), I'll send an updated patch.

(*) I think we *really* should not have 2 modes of L1D flushing on 
    vmenter. No admin on earth would be able to reasonably decide which 
    mode to use. Either we trust that the confined set of vmenters is 
    correct, or we don't.

Thanks,

-- 
Jiri Kosina
SUSE Labs

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-29 21:26         ` Jiri Kosina
@ 2018-06-29 21:28           ` Jiri Kosina
  2018-06-29 22:05             ` Andi Kleen
  2018-06-29 21:46           ` Josh Poimboeuf
  2018-06-29 21:49           ` Andi Kleen
  2 siblings, 1 reply; 55+ messages in thread
From: Jiri Kosina @ 2018-06-29 21:28 UTC (permalink / raw)
  To: speck

On Fri, 29 Jun 2018, speck for Jiri Kosina wrote:

> Agreed; thinking about this a little bit more from admin POV, I think we 
> should just go with
> 
> (1) l1tf=full
>     That would currently mean PTEINV + nosmt=force, and later once the 
>     KVM/Xen bits land in, they will make use of it as well (KVM will start 
>     doing the flushes (*), Xen doing the PTE sanitization or their variant 
>     of gang scheduling or whatnot)
> 
> (2) l1tf=virt 
>     That would imply nosmt + whatever the hypervisor decides (KVM 
>     flushes, Xen ... ?)). There is some talk going on about disabling EPT 
>     instead, but odds are that this is going to have much more drastic 
>     performance impact, so let's just ignore that for now
> 
> (3) l1tf=off
>     All the mitigations turned off (perhaps also clear X86_BUG_L1TF?)
> 
> If this is something acceptable by everybody (especially virt folks ... 
> Paolo, Konrad, Andrew?), I'll send an updated patch.

The remaining question of course is what should be the default for 
X86_BUG_L1TF CPUs.

-- 
Jiri Kosina
SUSE Labs

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-29 21:26         ` Jiri Kosina
  2018-06-29 21:28           ` Jiri Kosina
@ 2018-06-29 21:46           ` Josh Poimboeuf
  2018-06-29 21:49           ` Andi Kleen
  2 siblings, 0 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2018-06-29 21:46 UTC (permalink / raw)
  To: speck

On Fri, Jun 29, 2018 at 11:26:17PM +0200, speck for Jiri Kosina wrote:
> On Fri, 29 Jun 2018, speck for Josh Poimboeuf wrote:
> 
> > > How about having
> > > 
> > >     off
> > > 
> > > and then have 
> > > 
> > >     nosmt
> > > 
> > > and for the l1d control:
> > > 
> > >     vmx-noflush, vmx-flush, vmx-flush-always
> > > 
> > > and allow to combine nosmt and one of the flush options separate by a comma
> > > 
> > > e.g. l1tf=nosmt,vmx-flush
> > 
> > Do we really need all those combinations?  If I'm an admin, I either
> > want this mitigation on or off.  Some of the combinations don't really
> > make sense.
> 
> Agreed; thinking about this a little bit more from admin POV, I think we 
> should just go with
> 
> (1) l1tf=full
>     That would currently mean PTEINV + nosmt=force, and later once the 
>     KVM/Xen bits land in, they will make use of it as well (KVM will start 
>     doing the flushes (*), Xen doing the PTE sanitization or their variant 
>     of gang scheduling or whatnot)
> 
> (2) l1tf=virt 
>     That would imply nosmt + whatever the hypervisor decides (KVM 
>     flushes, Xen ... ?)). There is some talk going on about disabling EPT 
>     instead, but odds are that this is going to have much more drastic 
>     performance impact, so let's just ignore that for now

IIUC, the only difference between these two is that l1tf=virt doesn't
have the PTEINV mitigation.  How is that useful?  Why not just *always*
leave PTEINV enabled?

-- 
Josh

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-29 21:26         ` Jiri Kosina
  2018-06-29 21:28           ` Jiri Kosina
  2018-06-29 21:46           ` Josh Poimboeuf
@ 2018-06-29 21:49           ` Andi Kleen
  2018-06-29 21:56             ` Jiri Kosina
  2018-06-30  9:05             ` Thomas Gleixner
  2 siblings, 2 replies; 55+ messages in thread
From: Andi Kleen @ 2018-06-29 21:49 UTC (permalink / raw)
  To: speck

> Agreed; thinking about this a little bit more from admin POV, I think we 
> should just go with
> 
> (1) l1tf=full
>     That would currently mean PTEINV + nosmt=force, and later once the 
>     KVM/Xen bits land in, they will make use of it as well (KVM will start 
>     doing the flushes (*), Xen doing the PTE sanitization or their variant 
>     of gang scheduling or whatnot)

That means everyone who doesn't use KVM or uses KVM in a way that is
safe wastes 30%+ of performance completely unnecessarily.

I don't think this is a good idea. Everyone who disables SMT should
make a conscious decision on this. 

L1TF is complex and "turn off your brain, just use big hammer" is not
a suitable strategy for it.

KVM already prints warnings when SMT is on; this could perhaps be
made clearer.

The decision tree for starting KVM with SMT on is roughly:

- If you trust your guests, do nothing.
- If your KVM is in a cpuset or affinity that has its own set of
  cores, do nothing (we usually assume interrupts leaking are not
  a problem).
- Otherwise, first try to bind the KVM to a subset of cores and only
  turn off SMT for those.
- If all of that fails, turn off SMT globally. This should be an
  explicit option, not lumped in with something else.
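
The decision tree above can be sketched as a small function. This is only
an illustration: the struct fields and enum names are hypothetical, and
the inputs are things an admin would have to judge, not values the kernel
can detect.

```c
#include <stdbool.h>

/* Hypothetical outcome of the decision tree. */
enum smt_action {
	SMT_KEEP_ON,		/* do nothing */
	SMT_OFF_FOR_KVM_CORES,	/* bind KVM to a core subset, SMT off there */
	SMT_OFF_GLOBALLY,	/* last resort, an explicit choice */
};

/* Hypothetical description of the host setup, filled in by the admin. */
struct kvm_host {
	bool guests_trusted;
	bool kvm_has_dedicated_cores;	/* cpuset or affinity with own cores */
	bool can_bind_to_core_subset;
};

/* Walk the checks in the order given in the list. */
static enum smt_action smt_decide(const struct kvm_host *h)
{
	if (h->guests_trusted)
		return SMT_KEEP_ON;
	if (h->kvm_has_dedicated_cores)
		return SMT_KEEP_ON;
	if (h->can_bind_to_core_subset)
		return SMT_OFF_FOR_KVM_CORES;
	return SMT_OFF_GLOBALLY;
}
```

Disabling SMT globally only falls out when every cheaper option has been
ruled out, which is the argument for keeping it a separate, explicit knob.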

-Andi

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-29 21:49           ` Andi Kleen
@ 2018-06-29 21:56             ` Jiri Kosina
  2018-06-29 22:05               ` Thomas Gleixner
  2018-06-29 22:43               ` [MODERATED] " Luck, Tony
  2018-06-30  9:05             ` Thomas Gleixner
  1 sibling, 2 replies; 55+ messages in thread
From: Jiri Kosina @ 2018-06-29 21:56 UTC (permalink / raw)
  To: speck

On Fri, 29 Jun 2018, speck for Andi Kleen wrote:

> That means everyone who doesn't use KVM or uses KVM in a way that is
> safe wastes 30%+ of performance completely unnecessarily.

So how about l1tf=[full,novirt,off] then? Where 'novirt' would imply just 
the PTEINV and nothing else (and semantics of 'full' and 'off' are then 
obvious).

> The decision tree for starting KVM with SMT on is roughly:
> 
> - If you trust your guests do nothing

That's too simplified. "Do nothing" holds only if you trust your guest's 
ring3 equally to your host's ring0 (as they can snoop on each other).

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-29 21:56             ` Jiri Kosina
@ 2018-06-29 22:05               ` Thomas Gleixner
  2018-06-29 22:43               ` [MODERATED] " Luck, Tony
  1 sibling, 0 replies; 55+ messages in thread
From: Thomas Gleixner @ 2018-06-29 22:05 UTC (permalink / raw)
  To: speck

On Fri, 29 Jun 2018, speck for Jiri Kosina wrote:
> On Fri, 29 Jun 2018, speck for Andi Kleen wrote:
> 
> > That means everyone who doesn't use KVM or uses KVM in a way that is
> > safe wastes 30%+ of performance completely unnecessarily.
> 
> So how about l1tf=[full,novirt,off] then? Where 'novirt' would imply just 
> the PTEINV and nothing else (and semantics of 'full' and 'off' are then 
> obvious).

PTEINV is unconditionally ON except for 2 level pagetables where it is
unconditionally OFF, but then you don't have to worry about virt either.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-29 21:28           ` Jiri Kosina
@ 2018-06-29 22:05             ` Andi Kleen
  2018-06-29 22:17               ` Jiri Kosina
  0 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2018-06-29 22:05 UTC (permalink / raw)
  To: speck

> The remaining question of course is what should be the default for 
> X86_BUG_L1TF CPUs.

Linus already covered this earlier. The default should be SMT on,
with a warning on KVM start and a pointer to suitable documentation
on what to do.

Later on we may even be able to remove the warning if gang
scheduling turns out to be working.

-Andi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-29 22:05             ` Andi Kleen
@ 2018-06-29 22:17               ` Jiri Kosina
  2018-06-29 23:21                 ` Andi Kleen
  0 siblings, 1 reply; 55+ messages in thread
From: Jiri Kosina @ 2018-06-29 22:17 UTC (permalink / raw)
  To: speck

On Fri, 29 Jun 2018, speck for Andi Kleen wrote:

> Linus already covered this earlier. The default should be SMT on,
> with a warning on KVM start and a pointer to suitable documentation
> on what to do.

If I put myself in the shoes of a hypothetical admin, that basically means 
that we'd have to have:

- l1tf=full (which currently means nosmt+pteinv, and once the 
  virtualization bits are merged it'll extend its meaning to whatever the 
  particular hypervisor considers full mitigation)

- l1tf=off (mitigations (maybe including PTEINV, but it doesn't make a 
  big difference really) are turned off, and hypervisors *don't* warn ever)

- l1tf=novirt (only PTEINV is turned on and hypervisors *do* warn when 
  they start VM and HT is on)
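
To make the three proposed values concrete, here is a rough sketch of them as
a lookup table. The struct, its field names and l1tf_policy_lookup() are
hypothetical, and two cells are assumptions rather than anything settled in
this thread: that 'off' also drops PTEINV, and that hypervisors stay quiet
under 'full'.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative only; the struct and its field names are hypothetical. */
struct l1tf_policy {
	const char *name; /* value given to l1tf= */
	bool pteinv;      /* PTE inversion applied */
	bool smt_off;     /* SMT disabled at boot */
	bool hv_warns;    /* hypervisor warns when a VM starts with HT on */
};

static const struct l1tf_policy l1tf_policies[] = {
	/* hv_warns = false for "full" is an assumption, not stated in the mail */
	{ "full",   true,  true,  false },
	{ "novirt", true,  false, true  },
	/* pteinv = false for "off" is the "maybe" case from the mail */
	{ "off",    false, false, false },
};

/* Return the policy matching arg, or NULL for an unknown or missing value. */
const struct l1tf_policy *l1tf_policy_lookup(const char *arg)
{
	size_t i;

	if (!arg)
		return NULL;
	for (i = 0; i < sizeof(l1tf_policies) / sizeof(l1tf_policies[0]); i++)
		if (!strcmp(arg, l1tf_policies[i].name))
			return &l1tf_policies[i];
	return NULL;
}
```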

Sounds acceptable?

> Later on we may even be able to remove the warning if gang scheduling 
> turns out to be working.

Oh yeah.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-29 21:56             ` Jiri Kosina
  2018-06-29 22:05               ` Thomas Gleixner
@ 2018-06-29 22:43               ` Luck, Tony
  1 sibling, 0 replies; 55+ messages in thread
From: Luck, Tony @ 2018-06-29 22:43 UTC (permalink / raw)
  To: speck

On Fri, Jun 29, 2018 at 11:56:33PM +0200, speck for Jiri Kosina wrote:
> On Fri, 29 Jun 2018, speck for Andi Kleen wrote:
> > - If you trust your guests do nothing
> 
> That's too simplified. "Do nothing" holds only if you trust your guest's 
> ring3 equally to your host's ring0 (as they can snoop on each other).

If the trusted guests are using PTEINV, then the applications in those
guests don't have not-present PTEs that point at anything interesting.

-Tony

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-29 22:17               ` Jiri Kosina
@ 2018-06-29 23:21                 ` Andi Kleen
  2018-06-29 23:33                   ` Jiri Kosina
  0 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2018-06-29 23:21 UTC (permalink / raw)
  To: speck

> If I put myself in the shoes of a hypothetical admin, that basically means 
> that we'd have to have:
> 
> - l1tf=full (which currently means nosmt+pteinv, and once the 
>   virtualization bits are merged it'll extend its meaning to whatever the 
>   particular hypervisor considers full mitigation)

I don't see any point in this.  


We should just have a nosmt option,
but only recommend it after presenting the proper decision tree
and when KVM is actually started.

> 
> - l1tf=off (mitigations (maybe including PTEINV, but it doesn't make a 
>   big difference really) are turned off, and hypervisors *don't* warn ever)

This is pointless. The workaround is basically free and there is
no reason not to use it.

> 
> - l1tf=novirt (only PTEINV is turned on and hypervisors *do* warn when 
>   they start VM and HT is on)
> 
> Sounds acceptable?

I think we shouldn't have l1tf= options at all. They're the wrong
use model for this vulnerability. Just nosmt and proper warnings in KVM
with documentation.

Just because something was useful for spectre doesn't mean we need
it here too.


-Andi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-29 23:21                 ` Andi Kleen
@ 2018-06-29 23:33                   ` Jiri Kosina
  2018-06-29 23:37                     ` Jiri Kosina
  2018-06-29 23:44                     ` Andi Kleen
  0 siblings, 2 replies; 55+ messages in thread
From: Jiri Kosina @ 2018-06-29 23:33 UTC (permalink / raw)
  To: speck

On Fri, 29 Jun 2018, speck for Andi Kleen wrote:

> > - l1tf=full (which currently means nosmt+pteinv, and once the 
> >   virtualization bits are merged it'll extend its meaning to whatever the 
> >   particular hypervisor considers full mitigation)
> 
> I don't see any point in this.  
> 
> We should just have a nosmt option

Sorry, but there are other x86 CPU vendors (or even legacy Intel x86 CPUs) 
that are not susceptible to L1TF.

What's wrong with having a global switch, that could be deployed across 
the whole (for example) datacenter, that says "if the cpu has L1TF issue, 
do whatever it takes to mitigate it, otherwise operate normally"?

"nosmt" doesn't let you do that, as you'd have to know what x86 system 
exactly you're booting on. The conditional one seems like the natural way 
to go.
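
As a sketch of the difference (a hypothetical helper, not kernel code; the
three inputs just model the CPU bug flag and the two boot options):

```c
#include <stdbool.h>

/*
 * Hypothetical model of the two switches being compared:
 *   nosmt      turns SMT off on every machine it is booted on
 *   l1tf_full  turns SMT off only where the CPU actually has the L1TF bug
 * Returns true if SMT is left enabled.
 */
bool smt_left_enabled(bool cpu_has_l1tf, bool nosmt, bool l1tf_full)
{
	if (nosmt)
		return false; /* unconditional, regardless of the CPU */
	if (l1tf_full && cpu_has_l1tf)
		return false; /* conditional on the CPU being affected */
	return true;
}
```

With the same command line rolled out everywhere, only the conditional
variant leaves SMT alone on the unaffected machines.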

> > - l1tf=off (mitigations (maybe including PTEINV, but it doesn't make a 
> >   big difference really) are turned off, and hypervisors *don't* warn ever)
> 
> This is pointless. The workaround is basically free and there is
> no reason not to use it.

Yeah, that's why I put that in brackets. I agree it doesn't really 
matter.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-29 23:33                   ` Jiri Kosina
@ 2018-06-29 23:37                     ` Jiri Kosina
  2018-06-29 23:44                     ` Andi Kleen
  1 sibling, 0 replies; 55+ messages in thread
From: Jiri Kosina @ 2018-06-29 23:37 UTC (permalink / raw)
  To: speck

On Sat, 30 Jun 2018, speck for Jiri Kosina wrote:

> Sorry, but there are other x86 CPU vendors (or even legacy Intel x86 CPUs) 
> that are not susceptible to L1TF.
> 
> What's wrong with having a global switch, that could be deployed across 
> the whole (for example) datacenter, that says "if the cpu has L1TF issue, 
> do whatever it takes to mitigate it, otherwise operate normally"?
> 
> "nosmt" doesn't let you do that, as you'd have to know what x86 system 
> exactly you're booting on. The conditional one seems like the natural way 
> to go.

IOW, I do absolutely understand why you'd be opposed to defaulting to SMT 
off, globally.

But I don't understand why you'd be opposed to making the mitigation 
explicitly/automatically dependent on whether the given CPU behaves in 
this particular L1TF-way.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-29 23:33                   ` Jiri Kosina
  2018-06-29 23:37                     ` Jiri Kosina
@ 2018-06-29 23:44                     ` Andi Kleen
  2018-06-30  0:02                       ` Jiri Kosina
  1 sibling, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2018-06-29 23:44 UTC (permalink / raw)
  To: speck

> What's wrong with having a global switch, that could be deployed across 
> the whole (for example) datacenter, that says "if the cpu has L1TF issue, 
> do whatever it takes to mitigate it, otherwise operate normally"?

Most people who use it would lose a lot of performance for no gain at all.

> "nosmt" doesn't let you do that, as you'd have to know what x86 system 
> exactly you're booting on. The conditional one seems like the natural way 
> to go.

Right that's the point. L1TF virtualization mitigation should be only done 
after properly analyzing the situation, never blindly.

It's like a lot of other security threats, like let's say if your
data is properly encrypted on the wire or how you configure
a firewall. You cannot just get security blindly, but need to make some
decisions to have useful trade offs.

-Andi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-29 23:44                     ` Andi Kleen
@ 2018-06-30  0:02                       ` Jiri Kosina
  2018-06-30  0:41                         ` Andi Kleen
  0 siblings, 1 reply; 55+ messages in thread
From: Jiri Kosina @ 2018-06-30  0:02 UTC (permalink / raw)
  To: speck

On Fri, 29 Jun 2018, speck for Andi Kleen wrote:

> > "nosmt" doesn't let you do that, as you'd have to know what x86 system 
> > exactly you're booting on. The conditional one seems like the natural way 
> > to go.
> 
> Right that's the point. L1TF virtualization mitigation should be only done 
> after properly analyzing the situation, never blindly.
> 
> It's like a lot of other security threats, like let's say if your
> data is properly encrypted on the wire or how you configure
> a firewall. You cannot just get security blindly, but need to make some
> decisions to have useful trade offs.

I do understand your argument, but given all the pushback you're giving to 
the "simple" solution, I guess it's now your turn to propose how _exactly_ 
this should be done.

Namely, we need to make sure that

- whoever provides public VMs (cloud providers) gets the right (all) 
  mitigations; at the same time, any performance loss that would not
  be security-justified is a no-go

- whoever doesn't need the mitigations (usually desktop users, but who 
  knows) gets what they need (how do you/they figure that out easily?)

- Joe The Random User, who absolutely doesn't have a clue what any of 
  the terms  "CPU", "cache", "speculation", "virtualization" really means
  in the low-level technical terms, but is able to click around and use 
  his computer (including firing up virtual machines for any purpose, as
  that's super-trivial these days) doesn't get fully compromised 
  immediately

- you don't have to fine-tune this (build-time-)configuration per each and 
  every individual system

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-30  0:02                       ` Jiri Kosina
@ 2018-06-30  0:41                         ` Andi Kleen
  2018-06-30  0:50                           ` Jiri Kosina
                                             ` (3 more replies)
  0 siblings, 4 replies; 55+ messages in thread
From: Andi Kleen @ 2018-06-30  0:41 UTC (permalink / raw)
  To: speck

> I do understand your argument, but given all the pushback you're giving to 
> the "simple" solution, I guess it's now your turn to propose how _exactly_ 
> this should be done.

Right.

I think the key missing piece is a Documentation/ file that describes
what to do, and to which KVM can point. I'll look into this next week.

> 
> Namely, we need to make sure that
> 
> - whoever provides public VMs (cloud providers) gets the right (all) 
>   mitigations; at the same time, any performance loss that would not
>   be security-justified is a no-go

These will be using a wide variety of mitigations depending on their
circumstances (e.g. likely per core binding or some others like falling 
back to shadow page tables for UP), likely few will be 
using SMT off

They should know what they are doing.
> 
> - whoever doesn't need the mitigations (usually desktop users, but who 
>   knows) gets what they need (how do you/they figure that out easily?)

Two steps:
- do you control the guest OS?
- is the guest OS mitigated too.

Then they are safe.

It's really a simple rule: just patch and update everything you control.

If you don't control the guest OS then consider first cpuset, and then
SMT off.

> 
> - Joe The Random User, who absolutely doesn't have a clue what any of 
>   the terms  "CPU", "cache", "speculation", "virtualization" really means
>   in the low-level technical terms, but is able to click around and use 
>   his computer (including firing up virtual machines for any purpose, as
>   that's super-trivial these days) doesn't get fully compromised 
>   immediately

It should be safe to assume they don't run untrusted guests.

They also need to make sure to update the guest OS.

> 
> - you don't have to fine-tune this (build-time-)configuration per each and 
>   every individual system

?

-Andi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-30  0:41                         ` Andi Kleen
@ 2018-06-30  0:50                           ` Jiri Kosina
  2018-06-30  8:59                           ` Thomas Gleixner
                                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 55+ messages in thread
From: Jiri Kosina @ 2018-06-30  0:50 UTC (permalink / raw)
  To: speck

On Fri, 29 Jun 2018, speck for Andi Kleen wrote:

> > - whoever provides public VMs (cloud providers) gets the right (all) 
> >   mitigations; at the same time, any performance loss that would not
> >   be security-justified is a no-go
> 
> These will be using a wide variety of mitigations depending on their
> circumstances (e.g. likely per core binding 

Per-core binding doesn't work for untrusted guests if you want to protect 
the host.

> or some others like falling back to shadow page tables for UP), 

UP doesn't realistically exist any more, and SPT we're currently trying to 
measure internally; I don't have any final numbers yet, but even the 
measurements that VMWare did [1] some time ago indicate that it's not the 
way to go.

[1] https://www.vmware.com/pdf/Perf_ESX_Intel-EPT-eval.pdf

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-30  0:41                         ` Andi Kleen
  2018-06-30  0:50                           ` Jiri Kosina
@ 2018-06-30  8:59                           ` Thomas Gleixner
  2018-06-30 17:42                             ` [MODERATED] " Linus Torvalds
  2018-07-05 20:03                             ` [MODERATED] " Jon Masters
  2018-06-30 14:59                           ` Josh Poimboeuf
  2018-06-30 23:34                           ` Dave Hansen
  3 siblings, 2 replies; 55+ messages in thread
From: Thomas Gleixner @ 2018-06-30  8:59 UTC (permalink / raw)
  To: speck

On Fri, 29 Jun 2018, speck for Andi Kleen wrote:
> > - Joe The Random User, who absolutely doesn't have a clue what any of 
> >   the terms  "CPU", "cache", "speculation", "virtualization" really means
> >   in the low-level technical terms, but is able to click around and use 
> >   his computer (including firing up virtual machines for any purpose, as
> >   that's super-trivial these days) doesn't get fully compromised 
> >   immediately
> 
> It should be safe to assume they don't run untrusted guests.
> 
> They also need to make sure to update the guest OS.

You are making assumptions all over the place which are biased by Intel
interests. Is Intel going to update all their marketing drivel how secure
virtualization is and how it solves problems with untrusted code magically?

The sane assumption is that Joe Random User will run untrusted guests,
because that's what he can read everywhere that it's safe and zero risk
because of virtualization. The web is full of VM images providing demos and
whatever cool stuff Joe User wants and the virtualization gives him that
warm and fuzzy feeling of being safe.

So it's perfectly justified to switch off SMT by default when Joe Random
User starts a VM. It's a distro policy problem though and the kernel needs
to provide the proper knobs to select a policy based on their analysis and
in the interest of users.

These interests might not be the same as Intel's, but that's not our
problem.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-29 21:49           ` Andi Kleen
  2018-06-29 21:56             ` Jiri Kosina
@ 2018-06-30  9:05             ` Thomas Gleixner
  1 sibling, 0 replies; 55+ messages in thread
From: Thomas Gleixner @ 2018-06-30  9:05 UTC (permalink / raw)
  To: speck

On Fri, 29 Jun 2018, speck for Andi Kleen wrote:
> The decision tree for starting KVM with SMT on is roughly:
> 
> - If you trust your guests do nothing
> - If your KVM is in a cpuset or affinity that has its own set of
> cores do nothing (we usually assume interrupts leaking are not
> a problem)

Who is we? We == Intel I assume.

I further assume that Intel did a proper analysis of all interrupt handlers
and all softirq functions that there is nothing interesting to leak, right?

Can we see that analysis please?
 
Thanks,

	tglx

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-30  0:41                         ` Andi Kleen
  2018-06-30  0:50                           ` Jiri Kosina
  2018-06-30  8:59                           ` Thomas Gleixner
@ 2018-06-30 14:59                           ` Josh Poimboeuf
  2018-06-30 23:34                           ` Dave Hansen
  3 siblings, 0 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2018-06-30 14:59 UTC (permalink / raw)
  To: speck

On Fri, Jun 29, 2018 at 05:41:30PM -0700, speck for Andi Kleen wrote:
> > I do understand your argument, but given all the pushback you're giving to 
> > the "simple" solution, I guess it's now your turn to propose how _exactly_ 
> > this should be done.
> 
> Right.
> 
> I think the key missing piece is a Documentation/ file that describes
> > what to do, and to which KVM can point. I'll look into this next week.
> 
> > 
> > Namely, we need to make sure that
> > 
> > - whoever provides public VMs (cloud providers) gets the right (all) 
> >   mitigations; at the same time, any performance loss that would not
> >   be security-justified is a no-go
> 
> These will be using a wide variety of mitigations depending on their
> circumstances (e.g. likely per core binding or some others like falling 
> back to shadow page tables for UP), likely few will be 
> using SMT off
> 
> They should know what they are doing.

Why not have the best of both worlds:

- have sane defaults for the average user, so you can use l1tf=on and
  know that you're safe without needing a PhD;

- give power users the ability to tweak the default so they can decide
  where they want to be on the security/performance continuum.

So I would propose a simple l1tf=on/off cmdline option:

- on:  non-forced nosmt + confined flush + PTEINV
- off: PTEINV only (is there ever a reason to turn it off?)

And a similar runtime knob at

  /sys/devices/system/cpu/vulnerabilities/l1tf
  
so the user (or virtualization manager) can make that decision at game
time (and even undo it after shutting down the VM).
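
A minimal userspace sketch of the read side, assuming only that the file
holds a single status line. read_sysfs_line() is a made-up helper, and the
sketch deliberately does not assume what strings the file reports or that it
is writable (the write side is the proposal here, not an existing interface).

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a sysfs-style file into buf, without the
 * trailing newline. Returns 0 on success and -1 on any failure.
 * Hypothetical helper for illustration.
 */
int read_sysfs_line(const char *path, char *buf, size_t len)
{
	FILE *f;

	if (!path || !buf || len == 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* Cut the string at the first newline, if there is one. */
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

A virtualization manager could call this with
"/sys/devices/system/cpu/vulnerabilities/l1tf" before starting a VM and
decide from the reported state whether to warn.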

For power users who want to tweak, after setting l1tf=on, they could
selectively re-enable SMT threads as they wish.  In that case, when
running a VM on a CPU which has an online sibling, kvm can print a
one-time warning message, basically: "I hope you know what you're
doing".

There could also be another warning message for the l1tf=off case, when
starting a VM on a CPU which has a sibling: "If your VM contains
untrusted code, you should set l1tf=on".

And there could be flags provided by the user to skip the warnings, like
"I know what I'm doing" or "I trust the VM".

-- 
Josh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-30  8:59                           ` Thomas Gleixner
@ 2018-06-30 17:42                             ` Linus Torvalds
  2018-06-30 19:30                               ` Jiri Kosina
  2018-07-02  8:06                               ` Thomas Gleixner
  2018-07-05 20:03                             ` [MODERATED] " Jon Masters
  1 sibling, 2 replies; 55+ messages in thread
From: Linus Torvalds @ 2018-06-30 17:42 UTC (permalink / raw)
  To: speck



On Sat, 30 Jun 2018, speck for Thomas Gleixner wrote:
> 
> So it's perfectly justified to switch off SMT by default when Joe Random
> User starts a VM.

No.

A distro can do whatever the hell it wants, but I want the default 
behavior to be to either just warn, or to have some _smart_ dynamic 
behavior.

I don't think normal people run VM's with random images - and certainly 
not primarily for security. Normal people run VM's because they *need* to, 
not because of any warm fuzzies. Maybe there is some application that only 
runs under Windows (and Wine isn't good enough).

Or maybe you're a developer and you want to do cross-platform building 
and/or testing (whether it's your web thing and you want to check it in IE 
and Safari, or whether it's an actual application that you just want to 
build or run).

Are there people who run VM's principally for security? Yes. If you're a 
security person, and you want to test a virus, sure. Or maybe you're 
really just a normal person and really security conscious but want to run 
random apps, and you set up a VM for that. But the latter is certainly not 
even remotely "normal". Maybe it _should_ be, but it isn't.

And just turning off SMT because they _might_ be that very unusual 
security case that actually runs random garbage off the net (more random 
than the usual stuff)? No.

So either warn, or be smart. The "be smart" might be something like "keep 
SMT on by default, but if somebody actually uses kvm, turn it off and 
start a watchdog timer, and turn it back on again when vmx hasn't been 
used for a minute or whatever".
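
Roughly, as a state machine (a sketch only; the struct, the function names
and the 60 second timeout are all made up for illustration, and the actual
CPU offlining/onlining is left out entirely):

```c
#include <stdbool.h>
#include <stdint.h>

/* "a minute or whatever": hypothetical timeout, in seconds */
#define VMX_IDLE_TIMEOUT_SEC 60

/* Hypothetical bookkeeping; real code would offline/online sibling threads. */
struct smt_watchdog {
	bool smt_on;           /* current SMT state */
	uint64_t last_vmx_use; /* time of the last vmx use, in seconds */
};

/* SMT is on by default. */
void smt_watchdog_init(struct smt_watchdog *w)
{
	w->smt_on = true;
	w->last_vmx_use = 0;
}

/* Somebody uses kvm: turn SMT off and (re)arm the timer. */
void smt_watchdog_vmx_used(struct smt_watchdog *w, uint64_t now)
{
	w->smt_on = false;
	w->last_vmx_use = now;
}

/*
 * Periodic check: turn SMT back on once vmx has been idle long enough.
 * Assumes 'now' comes from a monotonic clock and never goes backwards.
 */
void smt_watchdog_tick(struct smt_watchdog *w, uint64_t now)
{
	if (!w->smt_on && now - w->last_vmx_use >= VMX_IDLE_TIMEOUT_SEC)
		w->smt_on = true;
}
```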

Could we do that kind of clever thing? Yes. But somebody would have to 
write the code and test it, and I suspect nobody is really motivated 
enough, because most people are probably ok with just warning or using the 
boot-time parameter..

             Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-30 17:42                             ` [MODERATED] " Linus Torvalds
@ 2018-06-30 19:30                               ` Jiri Kosina
  2018-06-30 19:52                                 ` Linus Torvalds
  2018-07-02  8:06                               ` Thomas Gleixner
  1 sibling, 1 reply; 55+ messages in thread
From: Jiri Kosina @ 2018-06-30 19:30 UTC (permalink / raw)
  To: speck

On Sat, 30 Jun 2018, speck for Linus Torvalds wrote:

> So either warn, or be smart. The "be smart" might be something like 
> "keep SMT on by default, but if somebody actually uses kvm, turn it off 
> and start a watchdog timer, and turn it back on again when vmx hasn't 
> been used for a minute or whatever".

I'd personally rather not go down this road; it feels a bit too 
"unexpected".

There are many use cases out there, in which the particular application 
(not to name one major DB software as a random example :) ) evaluates the 
system during its startup, and carefully sets up CPU affinities for all of 
its gazillion threads, so that it runs optimized for a particular 
workload.

Once we start offlining and onlining, all this breaks for those scenarios 
completely.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] [PATCH 1/1 v3] Linux patch #1
  2018-06-28 22:53 [MODERATED] [PATCH 1/1] Linux patch #1 Jiri Kosina
  2018-06-28 23:15 ` [MODERATED] " Jiri Kosina
  2018-06-28 23:36 ` [MODERATED] [PATCH 1/1 v2] " Jiri Kosina
@ 2018-06-30 19:48 ` Jiri Kosina
  2018-06-30 21:31   ` [MODERATED] " Josh Poimboeuf
  2018-06-30 22:22 ` [MODERATED] [PATCH 1/1 v4] " Jiri Kosina
  3 siblings, 1 reply; 55+ messages in thread
From: Jiri Kosina @ 2018-06-30 19:48 UTC (permalink / raw)
  To: speck

Introduce 'l1tf=' boot option to allow for boot-time switching of mitigation
that is used on CPUs affected by L1TF.

The values are

       full    Provide all available mitigations for L1TF
	       vulnerability (disable HT, perform PTE bit
	       inversion, signal hypervisors to provide full
	       mitigations)

       novirt  Provide all available mitigations needed
	       for running on bare metal (PTE bit inversion),
	       while not applying mitigations needed for
	       VM isolation. This is intended for environments
	       in which not having VM isolated from
	       hypervisor's ring0 is acceptable.

       off     Disable all L1TF mitigations.

Also let KVM issue a warning in case of 'novirt' being used, as that means 
that VMX mode is not fully protected/protecting (SMT has to be turned off 
and KVM needs to do L1D flushes on vmentry).

Once KVM mitigations are applied on top, the L1TF_MITIGATION_FULL case in 
vmx_vm_init() should be modified so that it makes L1D flushes happen on 
particular vmentries, and the L1TF_MSG_FULL warning should be removed.

Also, introduce a CONFIG_ option to allow choosing the default mitigation to 
be used on affected CPUs in case no l1tf= cmdline option is present.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---

v2->v3:
	- provide l1tf=[full,novirt,off]
	- provide config option to choose the default
	- let KVM warn in novirt case

v1->v2: add forgotten dependency on X86_BUG_L1TF

What should be done next on top of this:
	- once Paolo's/Konrad's KVM bits land in the tree, they should 
          look at the currently active mitigation setting and decide about 
	  doing L1D flushes based on that
	- sysfs toggling can also be added later on top

 Documentation/admin-guide/kernel-parameters.txt | 14 ++++++
 arch/x86/Kconfig                                | 18 ++++++++
 arch/x86/include/asm/processor.h                |  7 +++
 arch/x86/kernel/cpu/bugs.c                      | 57 +++++++++++++++++++++++--
 arch/x86/kvm/vmx.c                              | 19 +++++++++
 include/linux/cpu.h                             |  2 +
 kernel/cpu.c                                    |  9 +++-
 7 files changed, 121 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 8e29c4b6756f..e594e3596e44 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1971,6 +1971,20 @@
 			feature (tagged TLBs) on capable Intel chips.
 			Default is 1 (enabled)
 
+	l1tf=           [X86] Control mitigation of L1TF vulnerability on the
+			      affected CPUs
+			full	Provide all available mitigations for L1TF
+				vulnerability (disable HT, perform PTE bit
+				inversion, signal hypervisors to provide full
+				mitigations)
+			novirt	Provide all available mitigations needed
+				for running on bare metal (PTE bit inversion),
+				while not applying mitigations needed for
+				VM isolation. This is intended for environments
+				in which not having VM isolated from
+				hypervisor's ring0 is acceptable
+			off	Disable all L1TF mitigations
+
 	l2cr=		[PPC]
 
 	l3cr=		[PPC]
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7a34fdf8daf0..a5231a0812e3 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2390,6 +2390,24 @@ config MODIFY_LDT_SYSCALL
 	  surface.  Disabling it removes the modify_ldt(2) system call.
 
 	  Saying 'N' here may make sense for embedded or server kernels.
+choice
+	prompt "Default L1TF mitigation"
+	default L1TF_MITIGATION_NOVIRT
+	help
+		Define what the default behavior for selecting mitigation on
+		CPUs affected by L1TF should be. The default can be overridden
+		on the kernel command-line. Refer to
+		<file:Documentation/admin-guide/kernel-parameters.txt>
+
+config L1TF_MITIGATION_FULL
+	bool "Full available L1TF mitigation"
+config L1TF_MITIGATION_NOVIRT
+	bool "Use L1TF bare metal mitigations only"
+config L1TF_MITIGATION_OFF
+	bool "Turn all L1TF mitigations off"
+
+endchoice
+
 
 source "kernel/livepatch/Kconfig"
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 7e3ac5eedcd6..05471c590964 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -982,4 +982,11 @@ bool xen_set_default_idle(void);
 void stop_this_cpu(void *dummy);
 void df_debug(struct pt_regs *regs, long error_code);
 void microcode_check(void);
+
+enum l1tf_mitigations {
+	L1TF_MITIGATION_OFF,
+	L1TF_MITIGATION_NOVIRT,
+	L1TF_MITIGATION_FULL
+};
+enum l1tf_mitigations get_l1tf_mitigation(void);
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 50500cea6eba..72cf0747011a 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -657,6 +657,23 @@ void x86_spec_ctrl_setup_ap(void)
 
 #undef pr_fmt
 #define pr_fmt(fmt)	"L1TF: " fmt
+/* Default mitigation for L1TF-affected CPUs */
+static int l1tf_mitigation =
+#ifdef CONFIG_L1TF_MITIGATION_FULL
+	L1TF_MITIGATION_FULL;
+#endif
+#ifdef CONFIG_L1TF_MITIGATION_NOVIRT
+	L1TF_MITIGATION_NOVIRT;
+#endif
+#ifdef CONFIG_L1TF_MITIGATION_OFF
+	L1TF_MITIGATION_OFF;
+#endif
+enum l1tf_mitigations get_l1tf_mitigation(void)
+{
+	return l1tf_mitigation;
+}
+EXPORT_SYMBOL(get_l1tf_mitigation);
+
 static void __init l1tf_select_mitigation(void)
 {
 	u64 half_pa;
@@ -664,6 +681,16 @@ static void __init l1tf_select_mitigation(void)
 	if (!boot_cpu_has_bug(X86_BUG_L1TF))
 		return;
 
+	switch (get_l1tf_mitigation()) {
+	case L1TF_MITIGATION_OFF:
+		return;
+	case L1TF_MITIGATION_FULL:
+		cpu_smt_disable(true);
+		break;
+	case L1TF_MITIGATION_NOVIRT:
+		break;
+	}
+
 #if CONFIG_PGTABLE_LEVELS == 2
 	pr_warn("Kernel not compiled for PAE. No mitigation for L1TF\n");
 	return;
@@ -682,10 +709,36 @@ static void __init l1tf_select_mitigation(void)
 
 	setup_force_cpu_cap(X86_FEATURE_L1TF_PTEINV);
 }
+
+static int __init l1tf_cmdline(char *str)
+{
+	if (!boot_cpu_has_bug(X86_BUG_L1TF))
+		return 0;
+
+	if (!str)
+		return 0;
+
+	if (!strcmp(str, "full"))
+		l1tf_mitigation = L1TF_MITIGATION_FULL;
+	else if (!strcmp(str, "novirt"))
+		l1tf_mitigation = L1TF_MITIGATION_NOVIRT;
+	else if (!strcmp(str, "off"))
+		l1tf_mitigation = L1TF_MITIGATION_OFF;
+
+	return 0;
+}
+early_param("l1tf", l1tf_cmdline);
+
 #undef pr_fmt
 
 #ifdef CONFIG_SYSFS
 
+static const char *l1tf_states[] = {
+	[L1TF_MITIGATION_FULL]		= "Mitigation: Full",
+	[L1TF_MITIGATION_NOVIRT]	= "Mitigation: Page Table Inversion",
+	[L1TF_MITIGATION_OFF]		= "Vulnerable"
+};
+
 static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr,
 			       char *buf, unsigned int bug)
 {
@@ -712,9 +765,7 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
 		return sprintf(buf, "%s\n", ssb_strings[ssb_mode]);
 
 	case X86_BUG_L1TF:
-		if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV))
-			return sprintf(buf, "Mitigation: Page Table Inversion\n");
-		break;
+		return sprintf(buf, "%s\n", l1tf_states[get_l1tf_mitigation()]);
 
 	default:
 		break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 559a12b6184d..8a5921ad38e2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10370,10 +10370,29 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	return ERR_PTR(err);
 }
 
+#define L1TF_MSG_NOVIRT "kvm: L1TF CPU bug present and virtualization mitigation disabled. Refer to CVE-2018-3620 for details.\n"
+#define L1TF_MSG_FULL "kvm: L1TF CPU bug present and KVM lacks support for L1D flushes. Refer to CVE-2018-3620 for details.\n"
 static int vmx_vm_init(struct kvm *kvm)
 {
 	if (!ple_gap)
 		kvm->arch.pause_in_guest = true;
+	if (boot_cpu_has(X86_BUG_L1TF)) {
+			switch (get_l1tf_mitigation()) {
+			case L1TF_MITIGATION_OFF:
+				break;
+			case L1TF_MITIGATION_NOVIRT:
+				printk_once (KERN_ERR L1TF_MSG_NOVIRT);
+				break;
+			case L1TF_MITIGATION_FULL:
+				/*
+				 * FIXME: once L1D flushes are implemented for
+				 * VMX, this will go away and L1TF_MITIGATION_FULL
+				 * would imply L1D flushing being turned on
+				 */
+				printk_once (KERN_ERR L1TF_MSG_FULL);
+				break;
+			}
+	}
 	return 0;
 }
 
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 7532cbf27b1d..3a3b5c4b1d4a 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -177,8 +177,10 @@ enum cpuhp_smt_control {
 
 #if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
 extern enum cpuhp_smt_control cpu_smt_control;
+void cpu_smt_disable(bool force);
 #else
 # define cpu_smt_control		(CPU_SMT_ENABLED)
+static inline void cpu_smt_disable(bool force) { }
 #endif
 
 #endif /* _LINUX_CPU_H_ */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index d29fdd7e57bb..cba5afcab8a4 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -936,13 +936,18 @@ EXPORT_SYMBOL(cpu_down);
 #ifdef CONFIG_HOTPLUG_SMT
 enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
 
-static int __init smt_cmdline_disable(char *str)
+void __init cpu_smt_disable(bool force)
 {
 	cpu_smt_control = CPU_SMT_DISABLED;
-	if (str && !strcmp(str, "force")) {
+	if (force) {
 		pr_info("SMT: Force disabled\n");
 		cpu_smt_control = CPU_SMT_FORCE_DISABLED;
 	}
+}
+
+static int __init smt_cmdline_disable(char *str)
+{
+	cpu_smt_disable(str && !strcmp(str, "force"));
 	return 0;
 }
 early_param("nosmt", smt_cmdline_disable);

-- 
Jiri Kosina
SUSE Labs

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-30 19:30                               ` Jiri Kosina
@ 2018-06-30 19:52                                 ` Linus Torvalds
  2018-06-30 19:58                                   ` Jiri Kosina
  0 siblings, 1 reply; 55+ messages in thread
From: Linus Torvalds @ 2018-06-30 19:52 UTC (permalink / raw)
  To: speck



On Sat, 30 Jun 2018, speck for Jiri Kosina wrote:

> > So either warn, or be smart. The "be smart" might be something like 
> > "keep SMT on by default, but if somebody actually uses kvm, turn it off 
> > and start a watchdog timer, and turn it back on again when vmx hasn't 
> > been used for a minute or whatever".
> 
> I'd personally rather not go down this road; it feels a bit too 
> "unexpected".

The automatic thing would have nice semantics, but I agree that it's 
definitely subtle too.

But I think it's entirely unacceptable to disable SMT by default just 
because a user *might* run in virtualization.

So basically, I think the default *has* to be to keep SMT enabled (and 
just warn if you run virtualized loads), or to turn off SMT only when 
somebody actively starts to use kvm.

Because the "disable SMT at boot" thing absolutely _has_ to be an explicit 
choice by the admin.

No way in hell will I accept "disable at boot" as a default value. Most 
people never run VMs at all.

That said, the "automatic disable when actually running kvm" might be a 
valid default _despite_ the subtleties - and your careful Oracle tuning 
would then be the thing that requires an explicit boot option (either 
"disable smt at boot" or "don't disable smt automatically when I use 
kvm").

So I still think the biggest reason to not do the automatic disable is 
likely that nobody wants to write the code for it.

                   Linus
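
The "be smart" behavior discussed above (SMT on by default, off while kvm
is in use, back on after an idle period) could be sketched roughly as
follows. This is a user-space model only, not kernel code: struct smt_auto,
the smt_auto_*() helpers and the 60 second delay are all hypothetical names
and values picked for illustration, and time is passed in explicitly
instead of using a real watchdog timer.

```c
#include <stdbool.h>

/* Hypothetical idle period after which SMT is turned back on. */
#define SMT_REENABLE_DELAY_SECS 60

/* Hypothetical bookkeeping for the automatic policy. */
struct smt_auto {
	bool smt_enabled;	/* current SMT state */
	unsigned int vms;	/* number of VMs currently running */
	long idle_since;	/* time the last VM exited; valid when vms == 0 */
};

static void smt_auto_init(struct smt_auto *s)
{
	s->smt_enabled = true;	/* default: SMT stays on */
	s->vms = 0;
	s->idle_since = 0;
}

/* Starting a VM switches SMT off. */
static void smt_auto_vm_start(struct smt_auto *s)
{
	s->vms++;
	s->smt_enabled = false;
}

/* The last VM exiting starts the idle clock; SMT stays off for now. */
static void smt_auto_vm_exit(struct smt_auto *s, long now)
{
	if (s->vms > 0 && --s->vms == 0)
		s->idle_since = now;
}

/* Periodic "watchdog" tick: re-enable SMT once the idle period has passed. */
static void smt_auto_tick(struct smt_auto *s, long now)
{
	if (!s->smt_enabled && s->vms == 0 &&
	    now - s->idle_since >= SMT_REENABLE_DELAY_SECS)
		s->smt_enabled = true;
}
```

The subtlety mentioned above shows up even in this toy model: SMT state
changes behind the admin's back, which is why an explicit boot option would
still be needed for setups that depend on a fixed CPU topology.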

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-30 19:52                                 ` Linus Torvalds
@ 2018-06-30 19:58                                   ` Jiri Kosina
  2018-07-02 14:52                                     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 55+ messages in thread
From: Jiri Kosina @ 2018-06-30 19:58 UTC (permalink / raw)
  To: speck

On Sat, 30 Jun 2018, speck for Linus Torvalds wrote:

> But I think it's entirely unacceptable to disable SMT by default just 
> because a user *might* run in virtualization.
> 
> So basically, I think the default *has* to be to keep SMT enabled (and 
> just warn if you run virtualized loads), or to turn off SMT only when 
> somebody actively starts to use kvm.
> 
> Because the "disable SMT at boot" thing absolutely _has_ to be an explicit 
> choice by the admin.
> 
> No way in hell will I accept "disable at boot" as a default value. Most 
> people never run VMs at all.

Yeah, the v3 patch I just sent defaults to bare-metal mitigation only 
("what is default behavior" can be easily overridden by embedded 
kernel/distro vendors if they wish to, so that they don't have to carry 
the boot cmdline around forever) and KVM warns in case it's about to start 
the first VM in such case.

-- 
Jiri Kosina
SUSE Labs

* [MODERATED] Re: [PATCH 1/1 v3] Linux patch #1
  2018-06-30 19:48 ` [MODERATED] [PATCH 1/1 v3] " Jiri Kosina
@ 2018-06-30 21:31   ` Josh Poimboeuf
  2018-06-30 21:35     ` Linus Torvalds
  2018-06-30 21:43     ` Jiri Kosina
  0 siblings, 2 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2018-06-30 21:31 UTC (permalink / raw)
  To: speck

On Sat, Jun 30, 2018 at 09:48:25PM +0200, speck for Jiri Kosina wrote:
> From: Jiri Kosina <jkosina@suse.cz>
> Subject: [PATCH] x86/bugs: introduce boot-time control of L1TF mitigations
> 
> Introduce 'l1tf=' boot option to allow for boot-time switching of mitigation
> that is used on CPUs affected by L1TF.
> 
> The values are
> 
>        full    Provide all available mitigations for L1TF
> 	       vulnerability (disable HT, perform PTE bit
> 	       inversion, signal hypervisors to provide full
> 	       mitigations)
> 
>        novirt  Provide all available mitigations needed
> 	       for running on bare metal (PTE bit inversion),
> 	       while not applying mitigations needed for
> 	       VM isolation. This is intended for environments
> 	       in which not having VM isolated from
> 	       hypervisor's ring0 is acceptable.
> 
>        off     Disable all L1TF mitigations.

I still don't get what's the point of making PTEINV configurable?

-- 
Josh

* [MODERATED] Re: [PATCH 1/1 v3] Linux patch #1
  2018-06-30 21:31   ` [MODERATED] " Josh Poimboeuf
@ 2018-06-30 21:35     ` Linus Torvalds
  2018-06-30 21:43     ` Jiri Kosina
  1 sibling, 0 replies; 55+ messages in thread
From: Linus Torvalds @ 2018-06-30 21:35 UTC (permalink / raw)
  To: speck



On Sat, 30 Jun 2018, speck for Josh Poimboeuf wrote:
> 
> I still don't get what's the point of making PTEINV configurable?

It's not - that part is fixed and static at compile-time.

I think Jiri means just the cache invalidation at VM entry.

            Linus

* [MODERATED] Re: [PATCH 1/1 v3] Linux patch #1
  2018-06-30 21:31   ` [MODERATED] " Josh Poimboeuf
  2018-06-30 21:35     ` Linus Torvalds
@ 2018-06-30 21:43     ` Jiri Kosina
  1 sibling, 0 replies; 55+ messages in thread
From: Jiri Kosina @ 2018-06-30 21:43 UTC (permalink / raw)
  To: speck

On Sat, 30 Jun 2018, speck for Josh Poimboeuf wrote:

> >        novirt  Provide all available mitigations needed
> > 	       for running on bare metal (PTE bit inversion),
> > 	       while not applying mitigations needed for
> > 	       VM isolation. This is intended for environments
> > 	       in which not having VM isolated from
> > 	       hypervisor's ring0 is acceptable.
> > 
> >        off     Disable all L1TF mitigations.
> 
> I still don't get what's the point of making PTEINV configurable?

The novirt vs. off distinction is basically there to decide whether the 
warning should be issued when starting KVM (explicitly choosing "off" 
means "I don't care, don't even warn").

But you are right that there is a small bug in the code that clears the 
PTEINV feature bit when "off" is set, which is confusing and doesn't make 
sense (as the inverting is there all the time anyway, hardcoded). I'll fix 
that up in v4.

Thanks,

-- 
Jiri Kosina
SUSE Labs

* [MODERATED] [PATCH 1/1 v4] Linux patch #1
  2018-06-28 22:53 [MODERATED] [PATCH 1/1] Linux patch #1 Jiri Kosina
                   ` (2 preceding siblings ...)
  2018-06-30 19:48 ` [MODERATED] [PATCH 1/1 v3] " Jiri Kosina
@ 2018-06-30 22:22 ` Jiri Kosina
  2018-07-02 14:51   ` [MODERATED] " Konrad Rzeszutek Wilk
  3 siblings, 1 reply; 55+ messages in thread
From: Jiri Kosina @ 2018-06-30 22:22 UTC (permalink / raw)
  To: speck

Introduce 'l1tf=' boot option to allow for boot-time switching of 
mitigation that is used on CPUs affected by L1TF.

The possible values are

	full    Provide all available mitigations for L1TF
		vulnerability (disable HT, perform PTE bit
		inversion, allow hypervisors to know that
		they should provide all mitigations)
	novirt  Provide all available mitigations needed
		for running on bare metal (PTE bit inversion),
		while not applying mitigations needed for
		VM isolation. Hypervisors will issue a
		warning when the first VM is started in a
		potentially insecure configuration
	off     Claim "I don't care at all about this issue".
		The PTE bit inversion (bare metal mitigation) will
		still be performed, but hypervisors will not
		issue a warning when a VM is started in a
		potentially insecure configuration
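
The three values above can be modelled in a small user-space sketch. The
enum mirrors the one this patch adds to asm/processor.h; struct
l1tf_policy, l1tf_parse() and l1tf_policy_for() are hypothetical helpers
that exist only in this illustration and are not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the enum added to asm/processor.h by this patch. */
enum l1tf_mitigations {
	L1TF_MITIGATION_OFF,
	L1TF_MITIGATION_NOVIRT,
	L1TF_MITIGATION_FULL
};

/* Hypothetical summary of what each mode implies. */
struct l1tf_policy {
	bool pte_inversion;	/* always applied, it is not configurable */
	bool disable_smt;	/* only "full" turns hyper-threading off */
	bool warn_on_vm_start;	/* "off" is the only mode that stays silent */
};

/*
 * Hypothetical stand-in for l1tf_cmdline(): a missing or unknown value
 * leaves the compiled-in default untouched.
 */
static enum l1tf_mitigations l1tf_parse(const char *str,
					enum l1tf_mitigations def)
{
	if (!str)
		return def;
	if (!strcmp(str, "full"))
		return L1TF_MITIGATION_FULL;
	if (!strcmp(str, "novirt"))
		return L1TF_MITIGATION_NOVIRT;
	if (!strcmp(str, "off"))
		return L1TF_MITIGATION_OFF;
	return def;
}

static struct l1tf_policy l1tf_policy_for(enum l1tf_mitigations m)
{
	struct l1tf_policy p = {
		.pte_inversion    = true,
		.disable_smt      = (m == L1TF_MITIGATION_FULL),
		/*
		 * "novirt" warns; "full" also warns in this version of the
		 * patch, because KVM cannot flush L1D yet.
		 */
		.warn_on_vm_start = (m != L1TF_MITIGATION_OFF),
	};
	return p;
}
```

For example, l1tf_policy_for(l1tf_parse("off", L1TF_MITIGATION_NOVIRT))
still has pte_inversion set, which is the point of the v3->v4 change.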

Once the KVM mitigations are applied on top, the L1TF_MITIGATION_FULL case 
will be modified so that L1D flushes happen on (particular) vmentries, and 
the L1TF_MSG_FULL warning will be removed.

Also, introduce a CONFIG_ option to allow vendors to choose the default 
mitigation to be used on affected CPUs in case no l1tf= cmdline option is 
present.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---

v3->v4:
	- unconfuse the meaning of 'off', both in the documentation and in 
	  the code (spotted by Josh)

v2->v3:
        - provide l1tf=[full,novirt,off]
        - provide config option to choose the default
        - let KVM warn in novirt case

v1->v2: add forgotten dependency on X86_BUG_L1TF

What should be done next on top of this:
        - once Paolo's/Konrad's KVM bits land in the tree, they should 
          look at the currently active mitigation setting and decide about 
          doing L1D flushes based on that
        - sysfs toggling can also be added later on top


 Documentation/admin-guide/kernel-parameters.txt | 18 ++++++++
 arch/x86/Kconfig                                | 18 ++++++++
 arch/x86/include/asm/processor.h                |  7 ++++
 arch/x86/kernel/cpu/bugs.c                      | 56 +++++++++++++++++++++++--
 arch/x86/kvm/vmx.c                              | 19 +++++++++
 include/linux/cpu.h                             |  2 +
 kernel/cpu.c                                    |  9 +++-
 7 files changed, 124 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 8e29c4b6756f..5dc277555ea6 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1971,6 +1971,24 @@
 			feature (tagged TLBs) on capable Intel chips.
 			Default is 1 (enabled)
 
+	l1tf=           [X86] Control mitigation of L1TF vulnerability on the
+			      affected CPUs
+			full	Provide all available mitigations for L1TF
+				vulnerability (disable HT, perform PTE bit
+				inversion, allow hypervisors to know that
+				they should provide all mitigations)
+			novirt	Provide all available mitigations needed
+				for running on bare metal (PTE bit inversion),
+				while not applying mitigations needed for
+				VM isolation. Hypervisors will issue a
+				warning when the first VM is started in a
+				potentially insecure configuration
+			off	Claim "I don't care at all about this issue".
+				The PTE bit inversion (bare metal mitigation) will
+				still be performed, but hypervisors will not
+				issue a warning when a VM is started in a
+				potentially insecure configuration
+
 	l2cr=		[PPC]
 
 	l3cr=		[PPC]
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7a34fdf8daf0..a5231a0812e3 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2390,6 +2390,24 @@ config MODIFY_LDT_SYSCALL
 	  surface.  Disabling it removes the modify_ldt(2) system call.
 
 	  Saying 'N' here may make sense for embedded or server kernels.
+choice
+	prompt "Default L1TF mitigation"
+	default L1TF_MITIGATION_NOVIRT
+	help
+		Define what the default behavior for selecting mitigation on
+		CPUs affected by L1TF should be. The default can be overridden
+		on the kernel command-line. Refer to
+		<file:Documentation/admin-guide/kernel-parameters.txt>
+
+config L1TF_MITIGATION_FULL
+	bool "Full available L1TF mitigation"
+config L1TF_MITIGATION_NOVIRT
+	bool "Use L1TF bare metal mitigations only"
+config L1TF_MITIGATION_OFF
+	bool "Ignore L1TF issue"
+
+endchoice
+
 
 source "kernel/livepatch/Kconfig"
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 7e3ac5eedcd6..05471c590964 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -982,4 +982,11 @@ bool xen_set_default_idle(void);
 void stop_this_cpu(void *dummy);
 void df_debug(struct pt_regs *regs, long error_code);
 void microcode_check(void);
+
+enum l1tf_mitigations {
+	L1TF_MITIGATION_OFF,
+	L1TF_MITIGATION_NOVIRT,
+	L1TF_MITIGATION_FULL
+};
+enum l1tf_mitigations get_l1tf_mitigation(void);
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 50500cea6eba..9aa8b94334d5 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -657,6 +657,23 @@ void x86_spec_ctrl_setup_ap(void)
 
 #undef pr_fmt
 #define pr_fmt(fmt)	"L1TF: " fmt
+/* Default mitigation for L1TF-affected CPUs */
+static int l1tf_mitigation =
+#ifdef CONFIG_L1TF_MITIGATION_FULL
+	L1TF_MITIGATION_FULL;
+#endif
+#ifdef CONFIG_L1TF_MITIGATION_NOVIRT
+	L1TF_MITIGATION_NOVIRT;
+#endif
+#ifdef CONFIG_L1TF_MITIGATION_OFF
+	L1TF_MITIGATION_OFF;
+#endif
+enum l1tf_mitigations get_l1tf_mitigation(void)
+{
+	return l1tf_mitigation;
+}
+EXPORT_SYMBOL(get_l1tf_mitigation);
+
 static void __init l1tf_select_mitigation(void)
 {
 	u64 half_pa;
@@ -664,6 +681,15 @@ static void __init l1tf_select_mitigation(void)
 	if (!boot_cpu_has_bug(X86_BUG_L1TF))
 		return;
 
+	switch (get_l1tf_mitigation()) {
+	case L1TF_MITIGATION_FULL:
+		cpu_smt_disable(true);
+		break;
+	case L1TF_MITIGATION_OFF:
+	case L1TF_MITIGATION_NOVIRT:
+		break;
+	}
+
 #if CONFIG_PGTABLE_LEVELS == 2
 	pr_warn("Kernel not compiled for PAE. No mitigation for L1TF\n");
 	return;
@@ -682,10 +708,36 @@ static void __init l1tf_select_mitigation(void)
 
 	setup_force_cpu_cap(X86_FEATURE_L1TF_PTEINV);
 }
+
+static int __init l1tf_cmdline(char *str)
+{
+	if (!boot_cpu_has_bug(X86_BUG_L1TF))
+		return 0;
+
+	if (!str)
+		return 0;
+
+	if (!strcmp(str, "full"))
+		l1tf_mitigation = L1TF_MITIGATION_FULL;
+	else if (!strcmp(str, "novirt"))
+		l1tf_mitigation = L1TF_MITIGATION_NOVIRT;
+	else if (!strcmp(str, "off"))
+		l1tf_mitigation = L1TF_MITIGATION_OFF;
+
+	return 0;
+}
+early_param("l1tf", l1tf_cmdline);
+
 #undef pr_fmt
 
 #ifdef CONFIG_SYSFS
 
+static const char *l1tf_states[] = {
+	[L1TF_MITIGATION_FULL]		= "Mitigation: Full",
+	[L1TF_MITIGATION_NOVIRT]	= "Mitigation: Page Table Inversion",
+	[L1TF_MITIGATION_OFF]		= "Mitigation: Page Table Inversion"
+};
+
 static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr,
 			       char *buf, unsigned int bug)
 {
@@ -712,9 +764,7 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
 		return sprintf(buf, "%s\n", ssb_strings[ssb_mode]);
 
 	case X86_BUG_L1TF:
-		if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV))
-			return sprintf(buf, "Mitigation: Page Table Inversion\n");
-		break;
+		return sprintf(buf, "%s\n", l1tf_states[get_l1tf_mitigation()]);
 
 	default:
 		break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 559a12b6184d..8a5921ad38e2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10370,10 +10370,29 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	return ERR_PTR(err);
 }
 
+#define L1TF_MSG_NOVIRT "kvm: L1TF CPU bug present and virtualization mitigation disabled. Refer to CVE-2018-3620 for details.\n"
+#define L1TF_MSG_FULL "kvm: L1TF CPU bug present and KVM lacks support for L1D flushes. Refer to CVE-2018-3620 for details.\n"
 static int vmx_vm_init(struct kvm *kvm)
 {
 	if (!ple_gap)
 		kvm->arch.pause_in_guest = true;
+	if (boot_cpu_has(X86_BUG_L1TF)) {
+			switch (get_l1tf_mitigation()) {
+			case L1TF_MITIGATION_OFF:
+				break;
+			case L1TF_MITIGATION_NOVIRT:
+				printk_once (KERN_ERR L1TF_MSG_NOVIRT);
+				break;
+			case L1TF_MITIGATION_FULL:
+				/*
+				 * FIXME: once L1D flushes are implemented for
+				 * VMX, this will go away and L1TF_MITIGATION_FULL
+				 * would imply L1D flushing being turned on
+				 */
+				printk_once (KERN_ERR L1TF_MSG_FULL);
+				break;
+			}
+	}
 	return 0;
 }
 
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 7532cbf27b1d..3a3b5c4b1d4a 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -177,8 +177,10 @@ enum cpuhp_smt_control {
 
 #if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
 extern enum cpuhp_smt_control cpu_smt_control;
+void cpu_smt_disable(bool force);
 #else
 # define cpu_smt_control		(CPU_SMT_ENABLED)
+static inline void cpu_smt_disable(bool force) { }
 #endif
 
 #endif /* _LINUX_CPU_H_ */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index d29fdd7e57bb..cba5afcab8a4 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -936,13 +936,18 @@ EXPORT_SYMBOL(cpu_down);
 #ifdef CONFIG_HOTPLUG_SMT
 enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
 
-static int __init smt_cmdline_disable(char *str)
+void __init cpu_smt_disable(bool force)
 {
 	cpu_smt_control = CPU_SMT_DISABLED;
-	if (str && !strcmp(str, "force")) {
+	if (force) {
 		pr_info("SMT: Force disabled\n");
 		cpu_smt_control = CPU_SMT_FORCE_DISABLED;
 	}
+}
+
+static int __init smt_cmdline_disable(char *str)
+{
+	cpu_smt_disable(str && !strcmp(str, "force"));
 	return 0;
 }
 early_param("nosmt", smt_cmdline_disable);

-- 
Jiri Kosina
SUSE Labs

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-30  0:41                         ` Andi Kleen
                                             ` (2 preceding siblings ...)
  2018-06-30 14:59                           ` Josh Poimboeuf
@ 2018-06-30 23:34                           ` Dave Hansen
  2018-07-01  0:06                             ` Linus Torvalds
  3 siblings, 1 reply; 55+ messages in thread
From: Dave Hansen @ 2018-06-30 23:34 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 821 bytes --]

On 06/29/2018 05:41 PM, speck for Andi Kleen wrote:
> Two steps:
> - do you control the guest OS?
> - is the guest OS mitigated too.
> 
> Then they are safe.

I'd say it another way: Do you depend on the security boundaries that
are established by hardware virtualization, but weakened by L1TF?

Let's be honest, it's a rather rare case that code that's run, even in
guests, is fully under your control.  You might "trust" it, but if you
run a web browser in there, you shouldn't trust it farther than you can
throw it.

I can't tell you how many times I've heard folks say through this entire
Spectre/Meltdown/whatever fun that they are a closed system that does
not run untrusted code and they don't think they need mitigation.  They
totally trust their web browser, app store and network daemons.


* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-30 23:34                           ` Dave Hansen
@ 2018-07-01  0:06                             ` Linus Torvalds
  0 siblings, 0 replies; 55+ messages in thread
From: Linus Torvalds @ 2018-07-01  0:06 UTC (permalink / raw)
  To: speck



On Sat, 30 Jun 2018, speck for Dave Hansen wrote:
> 
> I can't tell you how many times I've heard folks say through this entire
> Spectre/Meltdown/whatever fun that they are a closed system that does
> not run untrusted code and they don't think they need mitigation.  They
> totally trust their web browser, app store and network daemons.

Absolute security doesn't exist.

And honestly, the people who argue _for_ some security measures "just in 
case" may well be more stupid than the people who think they are safe.

Saying "I'm not affected" might well be the right thing. Because in the 
end, you have to make a judgement call.

Honestly, if you have a bad app or a network daemon, you probably have 
bigger issues than most of the Spectre class of attacks.  

                Linus

* Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-30 17:42                             ` [MODERATED] " Linus Torvalds
  2018-06-30 19:30                               ` Jiri Kosina
@ 2018-07-02  8:06                               ` Thomas Gleixner
  1 sibling, 0 replies; 55+ messages in thread
From: Thomas Gleixner @ 2018-07-02  8:06 UTC (permalink / raw)
  To: speck

On Sat, 30 Jun 2018, speck for Linus Torvalds wrote:
> On Sat, 30 Jun 2018, speck for Thomas Gleixner wrote:
> > 
> > So it's perfectly justified to switch off SMT by default when Joe Random
> > User starts a VM.
> 
> No.
> 
> A distro can do whatever the hell it wants, but I want the default 
> behavior to be to either just warn, or to have some _smart_ dynamic 
> behavior.

That's exactly what I said in the sentence which you snipped from my reply:

  It's a distro policy problem though and the kernel needs to provide the
  proper knobs to select a policy based on their analysis and in the
  interest of users.

So the kernel default surely is ON, and all the kernel has to provide are
the proper knobs. What the distro decides to do is not the kernel's problem.

Thanks,

	tglx

 

* [MODERATED] Re: [PATCH 1/1 v4] Linux patch #1
  2018-06-30 22:22 ` [MODERATED] [PATCH 1/1 v4] " Jiri Kosina
@ 2018-07-02 14:51   ` Konrad Rzeszutek Wilk
  2018-07-02 15:00     ` Jiri Kosina
  0 siblings, 1 reply; 55+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-07-02 14:51 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 8130 bytes --]

..snip..
> What should be done next on top of this:
>         - once Paolo's/Konrad's KVM bits land in the tree, they should 
>           look at the currently active mitigation setting and decide about 
>           doing L1D flushes based on that

I would say the inverse. That is, this patch should be on top of the kvm pile
as it simplifies it a bit, but <shrugs>

Anyhow, I have a couple of comments that were raised when I posted the KVM
patch for the warning.

>         - sysfs toggling can also be added later on top
> 
> 
>  Documentation/admin-guide/kernel-parameters.txt | 18 ++++++++
>  arch/x86/Kconfig                                | 18 ++++++++
>  arch/x86/include/asm/processor.h                |  7 ++++
>  arch/x86/kernel/cpu/bugs.c                      | 56 +++++++++++++++++++++++--
>  arch/x86/kvm/vmx.c                              | 19 +++++++++
>  include/linux/cpu.h                             |  2 +
>  kernel/cpu.c                                    |  9 +++-
>  7 files changed, 124 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 8e29c4b6756f..5dc277555ea6 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1971,6 +1971,24 @@
>  			feature (tagged TLBs) on capable Intel chips.
>  			Default is 1 (enabled)
>  
> +	l1tf=           [X86] Control mitigation of L1TF vulnerability on the
> +			      affected CPUs
> +			full	Provide all available mitigations for L1TF
> +				vulnerability (disable HT, perform PTE bit
> +				inversion, allow hypervisors to know that
> +				they should provide all mitigations)
> +			novirt	Provide all available mitigations needed
> +				for running on bare metal (PTE bit inversion),
> +				while not applying mitigations needed for
> +				VM isolation. Hypervisors will issue a
> +				warning when the first VM is started in a
> +				potentially insecure configuration
> +			off	Claim "I don't care at all about this issue".
> +				The PTE bit inversion (bare metal mitigation) will
> +				still be performed, but hypervisors will not
> +				issue a warning when a VM is started in a
> +				potentially insecure configuration
> +
>  	l2cr=		[PPC]
>  
>  	l3cr=		[PPC]
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 7a34fdf8daf0..a5231a0812e3 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2390,6 +2390,24 @@ config MODIFY_LDT_SYSCALL
>  	  surface.  Disabling it removes the modify_ldt(2) system call.
>  
>  	  Saying 'N' here may make sense for embedded or server kernels.
> +choice
> +	prompt "Default L1TF mitigation"
> +	default L1TF_MITIGATION_NOVIRT
> +	help
> +		Define what the default behavior for selecting mitigation on
> +		CPUs affected by L1TF should be. The default can be overridden
> +		on the kernel command-line. Refer to
> +		<file:Documentation/admin-guide/kernel-parameters.txt>
> +
> +config L1TF_MITIGATION_FULL
> +	bool "Full available L1TF mitigation"
> +config L1TF_MITIGATION_NOVIRT
> +	bool "Use L1TF bare metal mitigations only"
> +config L1TF_MITIGATION_OFF
> +	bool "Ignore L1TF issue"
> +
> +endchoice
> +
>  
>  source "kernel/livepatch/Kconfig"
>  
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 7e3ac5eedcd6..05471c590964 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -982,4 +982,11 @@ bool xen_set_default_idle(void);
>  void stop_this_cpu(void *dummy);
>  void df_debug(struct pt_regs *regs, long error_code);
>  void microcode_check(void);
> +
> +enum l1tf_mitigations {
> +	L1TF_MITIGATION_OFF,
> +	L1TF_MITIGATION_NOVIRT,
> +	L1TF_MITIGATION_FULL
> +};
> +enum l1tf_mitigations get_l1tf_mitigation(void);
>  #endif /* _ASM_X86_PROCESSOR_H */
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 50500cea6eba..9aa8b94334d5 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -657,6 +657,23 @@ void x86_spec_ctrl_setup_ap(void)
>  
>  #undef pr_fmt
>  #define pr_fmt(fmt)	"L1TF: " fmt
> +/* Default mitigation for L1TF-affected CPUs */
> +static int l1tf_mitigation =
> +#ifdef CONFIG_L1TF_MITIGATION_FULL
> +	L1TF_MITIGATION_FULL;
> +#endif
> +#ifdef CONFIG_L1TF_MITIGATION_NOVIRT
> +	L1TF_MITIGATION_NOVIRT;
> +#endif
> +#ifdef CONFIG_L1TF_MITIGATION_OFF
> +	L1TF_MITIGATION_OFF;
> +#endif
> +enum l1tf_mitigations get_l1tf_mitigation(void)
> +{
> +	return l1tf_mitigation;
> +}
> +EXPORT_SYMBOL(get_l1tf_mitigation);
> +
>  static void __init l1tf_select_mitigation(void)
>  {
>  	u64 half_pa;
> @@ -664,6 +681,15 @@ static void __init l1tf_select_mitigation(void)
>  	if (!boot_cpu_has_bug(X86_BUG_L1TF))
>  		return;
>  
> +	switch (get_l1tf_mitigation()) {
> +	case L1TF_MITIGATION_FULL:
> +		cpu_smt_disable(true);
> +		break;
> +	case L1TF_MITIGATION_OFF:
> +	case L1TF_MITIGATION_NOVIRT:
> +		break;
> +	}
> +
>  #if CONFIG_PGTABLE_LEVELS == 2
>  	pr_warn("Kernel not compiled for PAE. No mitigation for L1TF\n");
>  	return;
> @@ -682,10 +708,36 @@ static void __init l1tf_select_mitigation(void)
>  
>  	setup_force_cpu_cap(X86_FEATURE_L1TF_PTEINV);
>  }
> +
> +static int __init l1tf_cmdline(char *str)
> +{
> +	if (!boot_cpu_has_bug(X86_BUG_L1TF))
> +		return 0;
> +
> +	if (!str)
> +		return 0;
> +
> +	if (!strcmp(str, "full"))
> +		l1tf_mitigation = L1TF_MITIGATION_FULL;
> +	else if (!strcmp(str, "novirt"))
> +		l1tf_mitigation = L1TF_MITIGATION_NOVIRT;
> +	else if (!strcmp(str, "off"))
> +		l1tf_mitigation = L1TF_MITIGATION_OFF;
> +
> +	return 0;
> +}
> +early_param("l1tf", l1tf_cmdline);
> +
>  #undef pr_fmt
>  
>  #ifdef CONFIG_SYSFS
>  
> +static const char *l1tf_states[] = {
> +	[L1TF_MITIGATION_FULL]		= "Mitigation: Full",
> +	[L1TF_MITIGATION_NOVIRT]	= "Mitigation: Page Table Inversion",
> +	[L1TF_MITIGATION_OFF]		= "Mitigation: Page Table Inversion"
> +};
> +
>  static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr,
>  			       char *buf, unsigned int bug)
>  {
> @@ -712,9 +764,7 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
>  		return sprintf(buf, "%s\n", ssb_strings[ssb_mode]);
>  
>  	case X86_BUG_L1TF:
> -		if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV))
> -			return sprintf(buf, "Mitigation: Page Table Inversion\n");
> -		break;
> +		return sprintf(buf, "%s\n", l1tf_states[get_l1tf_mitigation()]);
>  
>  	default:
>  		break;
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 559a12b6184d..8a5921ad38e2 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -10370,10 +10370,29 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
>  	return ERR_PTR(err);
>  }
>  
> +#define L1TF_MSG_NOVIRT "kvm: L1TF CPU bug present and virtualization mitigation disabled. Refer to CVE-2018-3620 for details.\n"
> +#define L1TF_MSG_FULL "kvm: L1TF CPU bug present and KVM lacks support for L1D flushes. Refer to CVE-2018-3620 for details.\n"
>  static int vmx_vm_init(struct kvm *kvm)

This should be in a different function - when the guest is created/started.

>  {
>  	if (!ple_gap)
>  		kvm->arch.pause_in_guest = true;
> +	if (boot_cpu_has(X86_BUG_L1TF)) {
> +			switch (get_l1tf_mitigation()) {
> +			case L1TF_MITIGATION_OFF:
> +				break;
> +			case L1TF_MITIGATION_NOVIRT:
> +				printk_once (KERN_ERR L1TF_MSG_NOVIRT);

Linus/Paolo/etc mentioned that it should be WARN, not ERR, unless you really want to enforce
it, in which case it should be an error and fail the creation of the guest. Not sure if this patch does that?
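
A rough user-space sketch of the two behaviors contrasted above (warn and
continue vs. treat it as an error and fail guest creation). Everything here
is an assumption made for illustration: enum l1tf_vm_policy,
l1tf_check_vm_create() and the -EPERM return value are not taken from the
patch or from KVM.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical knob: warn and continue, or refuse to create the guest. */
enum l1tf_vm_policy {
	L1TF_VM_WARN,
	L1TF_VM_ENFORCE
};

/*
 * Hypothetical stand-in for the check done when a guest is created.
 * Returns 0 if the guest may be created, -EPERM if creation is refused.
 */
static int l1tf_check_vm_create(bool cpu_has_l1tf, bool mitigated,
				enum l1tf_vm_policy policy)
{
	static bool warned;	/* mimics printk_once(): report only once */

	if (!cpu_has_l1tf || mitigated)
		return 0;

	if (policy == L1TF_VM_ENFORCE) {
		/* Error severity only makes sense if creation really fails. */
		fprintf(stderr, "error: L1TF not mitigated, refusing to create guest\n");
		return -EPERM;
	}

	if (!warned) {
		warned = true;
		fprintf(stderr, "warning: L1TF CPU bug present, guest is not isolated\n");
	}
	return 0;
}
```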

> +				break;
> +			case L1TF_MITIGATION_FULL:
> +				/*
> +				 * FIXME: once L1D flushes are implemented for
> +				 * VMX, this will go away and L1TF_MITIGATION_FULL
> +				 * would imply L1D flushing being turned on

Missing stop.
> +				 */
> +				printk_once (KERN_ERR L1TF_MSG_FULL);

But more importantly, I think you are missing the check to see .. why not just
rebase this on top of the kvm/pile. Then you already have the right CPU bits.

Attaching the bundle I had sent to Thomas.

[-- Attachment #2: kvm.l1tf.v5.rc2.bundle --]
[-- Type: application/octet-stream, Size: 12106 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-30 19:58                                   ` Jiri Kosina
@ 2018-07-02 14:52                                     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 55+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-07-02 14:52 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 1563 bytes --]

On Sat, Jun 30, 2018 at 09:58:49PM +0200, speck for Jiri Kosina wrote:
> On Sat, 30 Jun 2018, speck for Linus Torvalds wrote:
> 
> > But I think it's entirely unacceptable to disable SMT by default just 
> > because a user *might* run in virtualization.
> > 
> > So basically, I think the default *has* to be to keep SMT enabled (and 
> > just warn if you run virtualized loads), or to turn off SMT only for when 
> > somebody actively starts to use kvm.
> > 
> > Because the "disable SMT at boot" thing absolutely _has_ to be an explicit 
> > choice by the admin.
> > 
> > No way in hell will I accept "disable at boot" as a default value. Most 
> > people never run VM's at all.
> 
> Yeah, the v3 patch I just sent defaults to bare-metal mitigation only 
> ("what is default behavior" can be easily overridden by embedded 
> kernel/distro vendors if they wish to, so that they don't have to carry 
> the boot cmdline around forever) and KVM warns in case it's about to start 
> the first VM in such case.

I picked the wrong weekend to go offline and now this week is July 4th and I am
offline too.

Thomas has graciously picked up the kvm pile bundle (also attached the latest I sent)
to steam blast and I think it nicely fits in what has been decided.

That is, by default we just print a warning and continue on. And vmentry_l1d_flush
now accepts strings (always, never, cond), with 'cond' as the default.

I think that should lay the foundation and then we can continue with the smarter
choices like disabling SMT if KVM starts creating guests or such.
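
A minimal user-space sketch of how such a string-valued parameter could be mapped onto a mode. Only the three strings and the 'cond' default come from the text above; the enum and the function name are hypothetical and are not taken from the KVM pile.

```c
#include <string.h>

/* Hypothetical names; only "always", "never" and "cond" come from the thread. */
enum l1d_flush_mode {
	L1D_FLUSH_NEVER,
	L1D_FLUSH_COND,
	L1D_FLUSH_ALWAYS,
};

/*
 * Map a parameter string to a mode. A NULL or empty string selects the
 * default ('cond'). Returns 0 on success, or -1 for an unrecognised
 * value, in which case *mode is left untouched.
 */
static int parse_l1d_flush(const char *s, enum l1d_flush_mode *mode)
{
	if (!s || !*s || !strcmp(s, "cond"))
		*mode = L1D_FLUSH_COND;
	else if (!strcmp(s, "always"))
		*mode = L1D_FLUSH_ALWAYS;
	else if (!strcmp(s, "never"))
		*mode = L1D_FLUSH_NEVER;
	else
		return -1;
	return 0;
}
```

Rejecting unknown strings instead of silently falling back keeps a typo on the command line from being mistaken for a deliberate choice.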

[-- Attachment #2: kvm.l1tf.v5.rc2.bundle --]
[-- Type: application/octet-stream, Size: 12106 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v4] Linux patch #1
  2018-07-02 14:51   ` [MODERATED] " Konrad Rzeszutek Wilk
@ 2018-07-02 15:00     ` Jiri Kosina
  2018-07-02 15:14       ` Thomas Gleixner
  0 siblings, 1 reply; 55+ messages in thread
From: Jiri Kosina @ 2018-07-02 15:00 UTC (permalink / raw)
  To: speck

On Mon, 2 Jul 2018, speck for Konrad Rzeszutek Wilk wrote:

> > What should be done next on top of this:
> >         - once Paolo's/Konrad's KVM bits land in the tree, they should 
> >           look at the currently active mitigation setting and decide about
> >           doing L1D flushes based on that
> 
> I would say the inverse. That is this patch should be on top of the kvm pile
> as it simplifies it a bit, but <shrugs>

I agree, but it's just not in the tree yet. I can either rebase on top of 
your git bundle, or let Thomas eventually apply it, in which case the KVM 
bits would have to be slightly modified on top of it.

Depends on what happens first really.

Thomas, what would you prefer?
  
> > +#define L1TF_MSG_NOVIRT "kvm: L1TF CPU bug present and virtualization
> > mitigation disabled. Refer to CVE-2018-3620 for details.\n"
> > +#define L1TF_MSG_FULL "kvm: L1TF CPU bug present and KVM lacks support for
> > L1D flushes. Refer to CVE-2018-3620 for details.\n"
> >  static int vmx_vm_init(struct kvm *kvm)
> 
> This should be in a different function - when the guest is created/started.

I am putting it into vmx_vm_init(), which gives the warning the first time a
VM is (attempted to be) created, which IIUC is the behavior everybody would
like to see.

> >  {
> >   if (!ple_gap)
> > 		kvm->arch.pause_in_guest = true;
> > +	if (boot_cpu_has(X86_BUG_L1TF)) {
> > +			switch (get_l1tf_mitigation()) {
> > +			case L1TF_MITIGATION_OFF:
> > +				break;
> > +			case L1TF_MITIGATION_NOVIRT:
> > +				printk_once (KERN_ERR L1TF_MSG_NOVIRT);
> 
> Linus/Paolo/etc mentioned that it should be WARN not ERR unless you 
> really want to enforce it in which case it should an error and fail the 
> creation of the guest. Not sure if this patch does that?

It does not; it really intends to warn very loudly.
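
To illustrate the "warn once, never refuse" behaviour, here is a user-space stand-in for the check in vmx_vm_init(). The L1TF_MITIGATION_* names are from the quoted patch; their numeric order, the function name, the shortened message texts and the use of fprintf() in place of printk_once() are assumptions made for this sketch.

```c
#include <stdbool.h>
#include <stdio.h>

/* State names from the quoted patch; the ordering is assumed. */
enum l1tf_mitigation {
	L1TF_MITIGATION_OFF,
	L1TF_MITIGATION_NOVIRT,
	L1TF_MITIGATION_FULL,
};

/*
 * Hypothetical stand-in for the vmx_vm_init() check. Each message is
 * printed at most once per process, mimicking printk_once(). The return
 * value only reports whether this call printed something (1) or not (0);
 * VM creation is meant to proceed in either case.
 */
static int l1tf_warn_on_vm_init(bool cpu_has_l1tf, enum l1tf_mitigation m)
{
	/* One flag per message, since each printk_once() site has its own. */
	static bool warned_novirt, warned_full;

	if (!cpu_has_l1tf)
		return 0;

	switch (m) {
	case L1TF_MITIGATION_OFF:
		break;
	case L1TF_MITIGATION_NOVIRT:
		if (!warned_novirt) {
			warned_novirt = true;
			fprintf(stderr, "kvm: L1TF CPU bug present and virtualization mitigation disabled.\n");
			return 1;
		}
		break;
	case L1TF_MITIGATION_FULL:
		if (!warned_full) {
			warned_full = true;
			fprintf(stderr, "kvm: L1TF CPU bug present and KVM lacks support for L1D flushes.\n");
			return 1;
		}
		break;
	}
	return 0;
}
```

Enforcing the mitigation instead would mean returning an error from the VM init path, which is what the review comment distinguishes from a warning.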

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/1 v4] Linux patch #1
  2018-07-02 15:00     ` Jiri Kosina
@ 2018-07-02 15:14       ` Thomas Gleixner
  0 siblings, 0 replies; 55+ messages in thread
From: Thomas Gleixner @ 2018-07-02 15:14 UTC (permalink / raw)
  To: speck

On Mon, 2 Jul 2018, speck for Jiri Kosina wrote:
> On Mon, 2 Jul 2018, speck for Konrad Rzeszutek Wilk wrote:
> 
> > > What should be done next on top of this:
> > >         - once Paolo's/Konrad's KVM bits land in the tree, they should 
> > >           look at the currently active mitigation setting and decide about
> > >           doing L1D flushes based on that
> > 
> > I would say the inverse. That is this patch should be on top of the kvm pile
> > as it simplifies it a bit, but <shrugs>
> 
> I agree, but it's just not in the tree yet. I can either rebase on top of 
> your git bundle, or let Thomas eventually apply it, in which case the KVM 
> bits would have to be slightly modified on top of it.
> 
> Depends on what happens first really.
> 
> Thomas, what would you prefer?

Putting the KVM bits first is probably the right thing. I'm working on the
kvm bits at the moment and will post another version soonish.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-06-30  8:59                           ` Thomas Gleixner
  2018-06-30 17:42                             ` [MODERATED] " Linus Torvalds
@ 2018-07-05 20:03                             ` Jon Masters
  2018-07-05 20:16                               ` Jiri Kosina
  2018-07-05 20:25                               ` Linus Torvalds
  1 sibling, 2 replies; 55+ messages in thread
From: Jon Masters @ 2018-07-05 20:03 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 2253 bytes --]

On 06/30/2018 04:59 AM, speck for Thomas Gleixner wrote:
> On Fri, 29 Jun 2018, speck for Andi Kleen wrote:
>>> - Joe The Random User, who absolutely doesn't have a clue what either of 
>>>   the terms  "CPU", "cache", "speculation", "virtualization" really means
>>>   in the low-level technical terms, but is able to click around and use 
>>>   his computer (including firing up virtual machines for any purpose, as
>>>   that's super-trivial these days) doesn't get fully compromised 
>>>   immediately
>>
>> It should be safe to assume they don't run untrusted guests.
>>
>> They also need to make sure to update the guest OS.
> 
> You are making assumptions all over the place which are biased by Intel
> interests. Is Intel going to update all their marketing drivel how secure
> virtualization is and how it solves problems with untrusted code magically?

Just to capture our thoughts on this whole sub-thread:

* We don't want to have to turn off HT for everyone. Mostly because
it'll kill performance and hurt all kinds of tunings people have, and
maybe in small part for some political reasons related to the above.

* We have to assume a deployment running OpenStack or similar with any
guest where the user has control over the image and/or kernel is
"untrusted". We've had some (limited) discussions about whether secure
boot could be used to say "this is known to be a trusted kernel in a
trusted guest so allow it" but that treads on things people find evil.

* We need to agree among the distros on a sane default, hopefully
identical to whatever Linus does upstream. Like we did with SSBD. By all
agreeing together, nobody fights or claims they are "the safest airline"
and everyone can keep flying without falling into that marketing BS.
This was important last time to not let things get too out of hand.

So I suggest that whatever Linus wants to do we try to all get behind
from a distro point of view as the default, and quickly. Then we can
have whitepapers covering the rest.

Linus, are you suggesting this:
	* HT on by default
	* PTE inversion by default
	* Warn on guest creation by default (because HT on)

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-07-05 20:03                             ` [MODERATED] " Jon Masters
@ 2018-07-05 20:16                               ` Jiri Kosina
  2018-07-05 21:29                                 ` Jon Masters
  2018-07-05 20:25                               ` Linus Torvalds
  1 sibling, 1 reply; 55+ messages in thread
From: Jiri Kosina @ 2018-07-05 20:16 UTC (permalink / raw)
  To: speck

On Thu, 5 Jul 2018, speck for Jon Masters wrote:

> Linus, are you suggesting this:
> 	* HT on by default
> 	* PTE inversion by default
> 	* Warn on guest creation by default (because HT on)

FWIW, that's exactly the behavior my boot-time control patch enforces (by 
default).

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-07-05 20:03                             ` [MODERATED] " Jon Masters
  2018-07-05 20:16                               ` Jiri Kosina
@ 2018-07-05 20:25                               ` Linus Torvalds
  2018-07-05 20:50                                 ` Thomas Gleixner
  1 sibling, 1 reply; 55+ messages in thread
From: Linus Torvalds @ 2018-07-05 20:25 UTC (permalink / raw)
  To: speck



On Thu, 5 Jul 2018, speck for Jon Masters wrote:
> 
> Linus, are you suggesting this:
> 	* HT on by default
> 	* PTE inversion by default
> 	* Warn on guest creation by default (because HT on)

Yes.

I still think that for a "naive user" default, it might be better to just 
dynamically turn off HT on guest creation (and dynamically turn it back on 
when no VMX activity has happened for a while), simply because that works 
best for the "user is not aware" issues.

But that could interact badly with fancy users, and it's more complex than 
the simple warning, so it's probably not realistic.

             Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/1 v2] Linux patch #1
  2018-07-05 20:25                               ` Linus Torvalds
@ 2018-07-05 20:50                                 ` Thomas Gleixner
  2018-07-05 21:21                                   ` [MODERATED] " Jon Masters
  0 siblings, 1 reply; 55+ messages in thread
From: Thomas Gleixner @ 2018-07-05 20:50 UTC (permalink / raw)
  To: speck

On Thu, 5 Jul 2018, speck for Linus Torvalds wrote:
> On Thu, 5 Jul 2018, speck for Jon Masters wrote:
> > 
> > Linus, are you suggesting this:
> > 	* HT on by default
> > 	* PTE inversion by default

PTE inversion is not optional. It's simply there. All we talk about is the
hypervisor side.

> > 	* Warn on guest creation by default (because HT on)
> 
> Yes.
> 
> I still think that for a "naive user" default, it might be better to just 
> dynamically turn off HT on guest creation (and dynamically turn it back on 
> when no VMX activity has happened for a while), simply because that works 
> best for the "user is not aware" issues.
> 
> But that could interact badly with fancy users, and it's more complex than 
> the simple warning, so it's probably not realistic.

That can be done completely from user space before starting a guest and
after it went down. So for Joe user this might be one of those one-time pop
ups after updating where he can choose the "paranoid" variant.
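
A rough sketch of what such a user-space helper could look like. It assumes the kernel exposes an SMT control file at /sys/devices/system/cpu/smt/control accepting "on" and "off", as mainline kernels do; the helper name is hypothetical, and the path is passed in as a parameter so the function can be tried against an ordinary file.

```c
#include <stdio.h>

/* SMT control file as found in mainline kernels; an assumption here. */
#define SMT_CONTROL_PATH "/sys/devices/system/cpu/smt/control"

/*
 * Write "on" or "off" to the given control file. Returns 0 on success
 * and -1 on failure (for example when the file does not exist or the
 * caller lacks the privileges to write it).
 */
static int smt_set(const char *path, int enable)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (!f)
		return -1;
	if (fputs(enable ? "on" : "off", f) == EOF)
		ret = -1;
	/* The sysfs write may only take effect on flush, so check fclose(). */
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}
```

A guest launcher would then call smt_set(SMT_CONTROL_PATH, 0) before starting the guest and smt_set(SMT_CONTROL_PATH, 1) once it has gone down.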

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-07-05 20:50                                 ` Thomas Gleixner
@ 2018-07-05 21:21                                   ` Jon Masters
  2018-07-05 21:24                                     ` Jon Masters
  0 siblings, 1 reply; 55+ messages in thread
From: Jon Masters @ 2018-07-05 21:21 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 2725 bytes --]

On 07/05/2018 04:50 PM, speck for Thomas Gleixner wrote:
> On Thu, 5 Jul 2018, speck for Linus Torvalds wrote:
>> On Thu, 5 Jul 2018, speck for Jon Masters wrote:
>>>
>>> Linus, are you suggesting this:
>>> 	* HT on by default
>>> 	* PTE inversion by default
> 
> PTE inversion is not optional. It's simply there. All we talk about is the
> hypervisor side.

Sure, sorry I didn't say "built in at compile time".

>>> 	* Warn on guest creation by default (because HT on)
>>
>> Yes.
>>
>> I still think that for a "naive user" default, it might be better to just 
>> dynamically turn off HT on guest creation (and dynamically turn it back on 
>> when no VMX activity has happened for a while), simply because that works 
>> best for the "user is not aware" issues.

We could do this for desktop users, e.g. in Fedora where they probably
aren't running lots of VMs but might expect the marketing about security
of VMs to be true.

>> But that could interact badly with fancy users, and it's more complex than 
>> the simple warning, so it's probably not realistic.

I floated this idea (dynamic switching) early in our internal planning
process but it was quickly shot down because of customer tunings and
tuning profiles that might be loaded. It turns out (we're still
checking) that as many as 90% of our users of OpenStack are using HT-on
by default (just an example) and its default behavior is to (wrongly)
treat threads as cheap extra cores. So those fancy users are in for a
nightmare already. I've already asked that our OpenStack folks get a
project ready to propose upstream to fix this broken assumption.

Do you plan on revisiting adding an "l1tf=" top level knob then, or just
keep the vmx option and nosmt setting? I don't see "l1tf=" right now?

There was a lot of discussion about not confusing users who might have
hardware that isn't vulnerable to the L1TF exploit. For example, AMD EPYC is
not vulnerable to this exploit and it's going to be confusing for users
of that hardware, or of future Intel hardware, if they need to explicitly set
nosmt on the command line vs. the "l1tf=" approach. The latter will just
do nothing on future platforms, or on AMD, but "nosmt" is likely to
persist around or get used more broadly than it needs to be.

Btw, we have confirmed that, in theory, there ARE Intel systems with
over 32TB of RAM possible with Skylake. Currently, we have only tested
24TB systems but 32TB is expected, and the "theoretical" support matrix
for RHEL claims 64TB. I have requested that after disclosure we refuse
to certify systems for RHEL that have over 32TB and are vulnerable.

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-07-05 21:21                                   ` [MODERATED] " Jon Masters
@ 2018-07-05 21:24                                     ` Jon Masters
  0 siblings, 0 replies; 55+ messages in thread
From: Jon Masters @ 2018-07-05 21:24 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 626 bytes --]

On 07/05/2018 05:21 PM, speck for Jon Masters wrote:

> Btw, we have confirmed that, in theory, there ARE Intel systems with
> over 32TB of RAM possible with Skylake. Currently, we have only tested
> 24TB systems but 32TB is expected, and the "theoretical" support matrix
> for RHEL claims 64TB. I have requested that after disclosure we refuse
> to certify systems for RHEL that have over 32TB and are vulnerable.

(we'd need to have 5-level paging and the resultant higher MAX_PA, but
I'm hoping those future platforms also aren't vulnerable to L1TF)
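
For reference, the 32TB figure can be reproduced with a little arithmetic, under two assumptions that are not spelled out in this thread: that PTE inversion only protects physical memory below half of the addressable physical space (MAX_PA/2), and that the Skylake systems in question report 46 physical address bits. The function names are hypothetical.

```c
#include <stdint.h>

/*
 * Half of the physical address space, in bytes, for a CPU reporting
 * phys_bits physical address bits (assumed to be between 1 and 64).
 */
static uint64_t l1tf_safe_limit_bytes(unsigned int phys_bits)
{
	return UINT64_C(1) << (phys_bits - 1);
}

/* Convert bytes to whole TiB (1 TiB = 2^40 bytes). */
static uint64_t bytes_to_tib(uint64_t bytes)
{
	return bytes >> 40;
}
```

With 46 bits, MAX_PA is 2^46 bytes (64 TiB) and the limit is 2^45 bytes (32 TiB), which matches the 32TB and 64TB numbers quoted above.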

-- 
Computer Architect | Sent from my Fedora powered laptop


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-07-05 20:16                               ` Jiri Kosina
@ 2018-07-05 21:29                                 ` Jon Masters
  2018-07-05 21:39                                   ` Jiri Kosina
  0 siblings, 1 reply; 55+ messages in thread
From: Jon Masters @ 2018-07-05 21:29 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 647 bytes --]

On 07/05/2018 04:16 PM, speck for Jiri Kosina wrote:
> On Thu, 5 Jul 2018, speck for Jon Masters wrote:
> 
>> Linus, are you suggesting this:
>> 	* HT on by default
>> 	* PTE inversion by default
>> 	* Warn on guest creation by default (because HT on)
> 
> FWIW, that's exactly the behavior my boot-time control patch enforces (by 
> default).

I see now that your patch is still around, it just hadn't been merged
yet. Can you follow up to my other reply with what you intend to do with
it? I think we'd prefer one upstream option like "l1tf=" to get behind.

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-07-05 21:29                                 ` Jon Masters
@ 2018-07-05 21:39                                   ` Jiri Kosina
  2018-07-05 22:19                                     ` Thomas Gleixner
  0 siblings, 1 reply; 55+ messages in thread
From: Jiri Kosina @ 2018-07-05 21:39 UTC (permalink / raw)
  To: speck

On Thu, 5 Jul 2018, speck for Jon Masters wrote:

> I see now that your patch is still around, it just hadn't been merged 
> yet. 

Yup, I've updated it on top of the KVM pile that is now in speck.git; it's 
in that thread as patch 11/10.

> Can you followup to my other reply with what you intend to do with it? 

Well, I certainly hope we can agree, one way or another, on the semantics 
l1tf= should have, and have that upstream (once that happens, my plan is 
to do sysfs runtime toggling on top as well). The latest version I've sent 
implements the semantics you and Linus seem to be in agreement about.

> I think we'd prefer one upstream option like "l1tf=" to get behind.

Absolutely. I am reasonably sure we'll be shipping it in some form in SUSE 
kernels so that we offer a "make it easy for users" option even if it 
doesn't end up upstream for one reason or another.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/1 v2] Linux patch #1
  2018-07-05 21:39                                   ` Jiri Kosina
@ 2018-07-05 22:19                                     ` Thomas Gleixner
  2018-07-05 23:49                                       ` [MODERATED] " Josh Poimboeuf
  0 siblings, 1 reply; 55+ messages in thread
From: Thomas Gleixner @ 2018-07-05 22:19 UTC (permalink / raw)
  To: speck

On Thu, 5 Jul 2018, speck for Jiri Kosina wrote:
> On Thu, 5 Jul 2018, speck for Jon Masters wrote:
> 
> > I see now that your patch is still around, it just hadn't been merged 
> > yet. 
> 
> Yup, I've updated it on top of the KVM pile that is now in speck.git; it's 
> in that thread as patch 11/10.
> 
> > Can you followup to my other reply with what you intend to do with it? 
> 
> Well, I certainly hope we can agree on one way or another on the semantics 
> l1tf= should have, and have that upstream (once that happens, my plan is 
> to do sysfs runtime toggling on top as well). The latest version I've sent 
> implements the semantics you and Linus seem to be in agreement about.
> 
> > I think we'd prefer one upstream option like "l1tf=" to get behind.
> 
> Absolutely. I am reasonably sure we'll be shipping it in some form in SUSE 
> kernels so that we offer "make it easy for users" option even if it 
> wouldn't happen to be upstream for one reason or another.

As everyone seems to be happy with the latest version, I'm going to pick it
up.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [MODERATED] Re: [PATCH 1/1 v2] Linux patch #1
  2018-07-05 22:19                                     ` Thomas Gleixner
@ 2018-07-05 23:49                                       ` Josh Poimboeuf
  0 siblings, 0 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2018-07-05 23:49 UTC (permalink / raw)
  To: speck

On Fri, Jul 06, 2018 at 12:19:33AM +0200, speck for Thomas Gleixner wrote:
> On Thu, 5 Jul 2018, speck for Jiri Kosina wrote:
> > On Thu, 5 Jul 2018, speck for Jon Masters wrote:
> > 
> > > I see now that your patch is still around, it just hadn't been merged 
> > > yet. 
> > 
> > Yup, I've updated it on top of the KVM pile that is now in speck.git; it's 
> > in that thread as patch 11/10.
> > 
> > > Can you followup to my other reply with what you intend to do with it? 
> > 
> > Well, I certainly hope we can agree on one way or another on the semantics 
> > l1tf= should have, and have that upstream (once that happens, my plan is 
> > to do sysfs runtime toggling on top as well). The latest version I've sent 
> > implements the semantics you and Linus seem to be in agreement about.
> > 
> > > I think we'd prefer one upstream option like "l1tf=" to get behind.
> > 
> > Absolutely. I am reasonably sure we'll be shipping it in some form in SUSE 
> > kernels so that we offer "make it easy for users" option even if it 
> > wouldn't happen to be upstream for one reason or another.
> 
> As everyone seems to be happy with the latest version, I'm going to pick it
> up.

Wait!  I have some comments brewing...

-- 
Josh

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2018-07-05 23:50 UTC | newest]

Thread overview: 55+ messages
2018-06-28 22:53 [MODERATED] [PATCH 1/1] Linux patch #1 Jiri Kosina
2018-06-28 23:15 ` [MODERATED] " Jiri Kosina
2018-06-28 23:36 ` [MODERATED] [PATCH 1/1 v2] " Jiri Kosina
2018-06-29  8:38   ` [MODERATED] " Borislav Petkov
2018-06-29 15:43   ` Thomas Gleixner
2018-06-29 15:46     ` Thomas Gleixner
2018-06-29 16:48   ` [MODERATED] " Josh Poimboeuf
2018-06-29 16:49     ` Josh Poimboeuf
2018-06-29 19:47     ` Thomas Gleixner
2018-06-29 19:54       ` [MODERATED] " Josh Poimboeuf
2018-06-29 21:26         ` Jiri Kosina
2018-06-29 21:28           ` Jiri Kosina
2018-06-29 22:05             ` Andi Kleen
2018-06-29 22:17               ` Jiri Kosina
2018-06-29 23:21                 ` Andi Kleen
2018-06-29 23:33                   ` Jiri Kosina
2018-06-29 23:37                     ` Jiri Kosina
2018-06-29 23:44                     ` Andi Kleen
2018-06-30  0:02                       ` Jiri Kosina
2018-06-30  0:41                         ` Andi Kleen
2018-06-30  0:50                           ` Jiri Kosina
2018-06-30  8:59                           ` Thomas Gleixner
2018-06-30 17:42                             ` [MODERATED] " Linus Torvalds
2018-06-30 19:30                               ` Jiri Kosina
2018-06-30 19:52                                 ` Linus Torvalds
2018-06-30 19:58                                   ` Jiri Kosina
2018-07-02 14:52                                     ` Konrad Rzeszutek Wilk
2018-07-02  8:06                               ` Thomas Gleixner
2018-07-05 20:03                             ` [MODERATED] " Jon Masters
2018-07-05 20:16                               ` Jiri Kosina
2018-07-05 21:29                                 ` Jon Masters
2018-07-05 21:39                                   ` Jiri Kosina
2018-07-05 22:19                                     ` Thomas Gleixner
2018-07-05 23:49                                       ` [MODERATED] " Josh Poimboeuf
2018-07-05 20:25                               ` Linus Torvalds
2018-07-05 20:50                                 ` Thomas Gleixner
2018-07-05 21:21                                   ` [MODERATED] " Jon Masters
2018-07-05 21:24                                     ` Jon Masters
2018-06-30 14:59                           ` Josh Poimboeuf
2018-06-30 23:34                           ` Dave Hansen
2018-07-01  0:06                             ` Linus Torvalds
2018-06-29 21:46           ` Josh Poimboeuf
2018-06-29 21:49           ` Andi Kleen
2018-06-29 21:56             ` Jiri Kosina
2018-06-29 22:05               ` Thomas Gleixner
2018-06-29 22:43               ` [MODERATED] " Luck, Tony
2018-06-30  9:05             ` Thomas Gleixner
2018-06-30 19:48 ` [MODERATED] [PATCH 1/1 v3] " Jiri Kosina
2018-06-30 21:31   ` [MODERATED] " Josh Poimboeuf
2018-06-30 21:35     ` Linus Torvalds
2018-06-30 21:43     ` Jiri Kosina
2018-06-30 22:22 ` [MODERATED] [PATCH 1/1 v4] " Jiri Kosina
2018-07-02 14:51   ` [MODERATED] " Konrad Rzeszutek Wilk
2018-07-02 15:00     ` Jiri Kosina
2018-07-02 15:14       ` Thomas Gleixner
