Re: [GIT PULL] x86/mm changes for v5.8

From: "Singh, Balbir" <sblbir@amazon.com>
To: "torvalds@linux-foundation.org" <torvalds@linux-foundation.org>
Cc: "mingo@kernel.org" <mingo@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"keescook@chromium.org" <keescook@chromium.org>,
	"a.p.zijlstra@chello.nl" <a.p.zijlstra@chello.nl>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"luto@kernel.org" <luto@kernel.org>,
	"bp@alien8.de" <bp@alien8.de>,
	"benh@kernel.crashing.org" <benh@kernel.crashing.org>
Subject: Re: [GIT PULL] x86/mm changes for v5.8
Date: Wed, 3 Jun 2020 01:31:05 +0000
Message-ID: <9710330d4fa99d720f6ec8f6af2eae26115d27a5.camel@amazon.com>
In-Reply-To: <CAHk-=whC4PUhErcoDhCbTOdmPPy-Pj8j9ytsdcyz9TorOb4KUw@mail.gmail.com>

On Tue, 2020-06-02 at 16:28 -0700, Linus Torvalds wrote:
> On Tue, Jun 2, 2020 at 4:01 PM Singh, Balbir <sblbir@amazon.com> wrote:
> > 
> > >      (c) and if I read the code correctly, trying to flush the L1D$ on
> > > non-intel without the HW support, it causes a WARN_ON_ONCE()! WTF?
> > 
> > That is not correct, the function only complains if we do a software fallback
> > flush without allocating the flush pages.
> 
> Right.
> 
> And if you're not on Intel, then that allocation would never have been
> done, since the allocation function returns an error for non-intel
> systems.
> 
> > That function is not exposed without
> > the user using the prctl() API, which allocates those flush pages.
> 
> See above: it doesn't actually allocate those pages on anything but intel CPU's.
> 
> That said, looking deeper, it then does look like a
> l1d_flush_init_once() failure will also cause the code to avoid
> setting the TIF_SPEC_L1D_FLUSH bit, so non-intel CPU's will never call
> the actual flushing routines, and thus never hit the WARN_ON. Ok.
> 
> > >  (2) the HW case is done for any vendor, if it reports the "I have the MSR"
> > 
> > No, l1d_flush_init_once() fails for users opting in via the prctl(), it
> > succeeds for users of L1TF.
> 
> Yeah, again it looks like this all is basically just a hack for Intel CPU's.
> 
> It should never have been conditional on "do this on Intel".
> 
> It should have been conditional on the L1TF bug.
> 
> Yes, there's certainly overlap there, but it's not complete.
> 
> > >  (3) the VMX support certainly has various sanity checks like "oh, CPU
> > > doesn't have X86_BUG_L1TF, then I won't do this even if there was some
> > > kernel command line to say I should". But the new prctl doesn't have
> > > anything like that. It just enables that L1D$ thing mindlessly,
> > > thinking that user-land software somehow knows what it's doing. BS.
> > 
> > So you'd like to see a double opt-in?
> 
> I'd like it to be gated on being sane by default, together with some
> system option like we have for pretty much all the mitigations.
> 
> >     Unfortunately there is no gating
> > of the bug and I tried to make it generic - clearly calling it opt-in
> > flushing for the paranoid, for those who really care about CVE-2020-0550.
> 
> No, you didn't make it generic at all - you made it depend on
> X86_VENDOR_INTEL instead.
> 
> So now the logic is "on Intel, do this thing whether it makes sense or
> not, on other vendors, never do it whether it _would_ make sense or
> not".
> 
> That to me is not sensible. I just don't see the logic.
> 
> This feature should never be enabled unless X86_BUG_L1TF is on, as far
> as I can tell.
> 
> And it should never be enabled if SMT is on.
> 
> At that point, it at least starts making sense. Maybe we don't need
> any further admin options at that point.
> 
> > Would this make you happier?
> > 
> > 1. Remove SW fallback flush
> > 2. Implement a double opt-in (CAP_SYS_ADMIN for the prctl or a
> >    system wide disable)?
> > 3. Ensure the flush happens only when the current core has
> >    SMT disabled
> 
> I think that (3) case should basically be "X86_BUG_L1TF && !SMT". That
> should basically be the default setting for this.
> 
> The (2) thing I would prefer to just be the same kind of thing we do
> for all the other mitigations: have a kernel command line to override
> the defaults.
> 
> The SW fallback right now feels wrong to me. It does seem to be very
> microarchitecture-specific and I'd really like to understand the
> reason for the magic TLB filling. At the same time, if the feature is
> at least enabled under sane and understandable circumstances, and
> people have a way to turn it off, maybe I don't care too much.
>

I cooked up a quick patch for comments (as yet untested; it leaves the current
refactoring as is). It should hopefully address your concerns. This is not the
final patch, just a sketch of the approach based on the discussion so far.

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index a58360c8e6e8..988a9d0c31ec 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -293,6 +293,13 @@ enum taa_mitigations {
 	TAA_MITIGATION_TSX_DISABLED,
 };
 
+enum l1d_flush_out_mitigations {
+	L1D_FLUSH_OUT_OFF,
+	L1D_FLUSH_OUT_ON,
+};
+
+static enum l1d_flush_out_mitigations l1d_flush_out_mitigation __ro_after_init = L1D_FLUSH_OUT_ON;
+
 /* Default mitigation for TAA-affected CPUs */
 static enum taa_mitigations taa_mitigation __ro_after_init = TAA_MITIGATION_VERW;
 static bool taa_nosmt __ro_after_init;
@@ -376,6 +383,18 @@ static void __init taa_select_mitigation(void)
 	pr_info("%s\n", taa_strings[taa_mitigation]);
 }
 
+static int __init l1d_flush_out_parse_cmdline(char *str)
+{
+	if (!boot_cpu_has_bug(X86_BUG_L1TF))
+		return 0;
+
+	if (!strcmp(str, "off"))
+		l1d_flush_out_mitigation = L1D_FLUSH_OUT_OFF;
+
+	return 0;
+}
+early_param("l1d_flush_out", l1d_flush_out_parse_cmdline);
+
 static int __init tsx_async_abort_parse_cmdline(char *str)
 {
 	if (!boot_cpu_has_bug(X86_BUG_TAA))
@@ -1123,6 +1142,10 @@ static void task_update_spec_tif(struct task_struct *tsk)
 
 static int l1d_flush_out_prctl_set(struct task_struct *task, unsigned long ctrl)
 {
+
+	if (l1d_flush_out_mitigation == L1D_FLUSH_OUT_OFF)
+		return -EPERM;
+
 	switch (ctrl) {
 	case PR_SPEC_ENABLE:
 		return enable_l1d_flush_for_task(task);
@@ -1240,6 +1263,9 @@ static int l1d_flush_out_prctl_get(struct task_struct *task)
 {
 	int ret;
 
+	if (l1d_flush_out_mitigation == L1D_FLUSH_OUT_OFF)
+		return PR_SPEC_FORCE_DISABLE;
+
 	ret = test_ti_thread_flag(&task->thread_info, TIF_SPEC_L1D_FLUSH);
 	if (ret)
 		return PR_SPEC_PRCTL | PR_SPEC_ENABLE;
diff --git a/arch/x86/kernel/l1d_flush.c b/arch/x86/kernel/l1d_flush.c
index 4662f90fa321..0b898c1b76cd 100644
--- a/arch/x86/kernel/l1d_flush.c
+++ b/arch/x86/kernel/l1d_flush.c
@@ -89,9 +89,6 @@ int l1d_flush_init_once(void)
 {
 	int ret = 0;
 
-	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
-		return -ENOTSUPP;
-
 	if (static_cpu_has(X86_FEATURE_FLUSH_L1D) || l1d_flush_pages)
 		return ret;
 
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index ff1ff8c83452..ddab8166a474 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -320,12 +320,29 @@ EXPORT_SYMBOL_GPL(leave_mm);
 
 int enable_l1d_flush_for_task(struct task_struct *tsk)
 {
-	int ret = l1d_flush_init_once();
+	int cpu = get_cpu();
+	int ret;
+
+	/*
+	 * Do not enable L1D_FLUSH_OUT if
+	 * a. The current core has SMT enabled
+	 * b. The CPU is not affected by the L1TF bug
+	 * c. The CPU does not have L1D FLUSH feature support
+	 */
+	if ((cpumask_weight(topology_sibling_cpumask(cpu)) != 1) ||
+			!boot_cpu_has_bug(X86_BUG_L1TF) ||
+			!static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
+		ret = -EINVAL;
+		goto done;
+	}
 
+	ret = l1d_flush_init_once();
 	if (ret < 0)
-		return ret;
+		goto done;
 
 	set_ti_thread_flag(&tsk->thread_info, TIF_SPEC_L1D_FLUSH);
+done:
+	put_cpu();
 	return ret;
 }
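
To make the intended gating easier to review, here is a minimal userspace
sketch of the decision the patch above is aiming for: the command-line
override is checked first, then "X86_BUG_L1TF && !SMT" plus the HW flush
feature. This is only a model under stated assumptions. The struct
cpu_model type and the parse_flush_mode() and model_enable_l1d_flush()
helpers are made up for illustration and are not kernel APIs; they just
stand in for the boot_cpu_has_bug(), static_cpu_has() and
topology_sibling_cpumask() checks in the diff.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the CPU state the patch consults. */
struct cpu_model {
	bool has_bug_l1tf;     /* models boot_cpu_has_bug(X86_BUG_L1TF) */
	bool has_flush_l1d;    /* models static_cpu_has(X86_FEATURE_FLUSH_L1D) */
	unsigned int siblings; /* models cpumask_weight(topology_sibling_cpumask(cpu)) */
};

enum flush_mode { FLUSH_OUT_OFF, FLUSH_OUT_ON };

/*
 * Models l1d_flush_out_parse_cmdline(): the default is ON, and only an
 * explicit "off" on an L1TF-affected CPU changes it.
 */
static enum flush_mode parse_flush_mode(const struct cpu_model *c, const char *arg)
{
	if (!c->has_bug_l1tf)
		return FLUSH_OUT_ON;
	if (arg && !strcmp(arg, "off"))
		return FLUSH_OUT_OFF;
	return FLUSH_OUT_ON;
}

/*
 * Models the prctl enable path: -EPERM when the admin turned the feature
 * off, -EINVAL when SMT is on, the CPU is not affected by L1TF, or there
 * is no HW flush support; 0 when the opt-in would be accepted.
 */
static int model_enable_l1d_flush(const struct cpu_model *c, enum flush_mode mode)
{
	if (mode == FLUSH_OUT_OFF)
		return -EPERM;
	if (c->siblings != 1 || !c->has_bug_l1tf || !c->has_flush_l1d)
		return -EINVAL;
	return 0;
}
```

Note that in this model, as in the diff, a CPU without X86_BUG_L1TF ignores
the command-line option but is still rejected by the enable path, so the
opt-in can only succeed on an affected CPU with SMT off and the flush
feature present.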