* [MODERATED] [PATCH v4 3/8] [PATCH v4 3/8] Linux Patch #3
@ 2018-06-23 13:54 konrad.wilk
  2018-06-25 14:26 ` [MODERATED] " Paolo Bonzini
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: konrad.wilk @ 2018-06-23 13:54 UTC (permalink / raw)
  To: speck

336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines
a new MSR (IA32_FLUSH_CMD aka 0x10B) which has similar write-only
semantics to other MSRs defined in the document.

The semantics of this MSR are to allow "finer granularity invalidation
of caching structures than existing mechanisms like WBINVD. It will
writeback and invalidate the L1 data cache, including all cachelines
brought in by preceding instructions, without invalidating all caches
(e.g. L2 or LLC). Some processors may also invalidate the first level
instruction cache on an L1D_FLUSH command. The L1 data and
instruction caches may be shared across the logical processors of a core."

Hence, right before we do a VMENTER, we need to flush the L1 data cache
to keep untrusted guests from reading host memory that is cached in the
L1 data cache.

A copy of this document is available at
   https://bugzilla.kernel.org/show_bug.cgi?id=199511

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v3: Redo the commit
---
 arch/x86/include/asm/msr-index.h |  6 ++++++
 arch/x86/kvm/x86.c               | 10 ++++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 68b2c3150de1..0e7517089b80 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -76,6 +76,12 @@
 						    * control required.
 						    */
 
+#define MSR_IA32_FLUSH_CMD		0x0000010b
+#define L1D_FLUSH			(1 << 0)   /*
+						    * Writeback and invalidate the
+						    * L1 data cache.
+						    */
+
 #define MSR_IA32_BBL_CR_CTL		0x00000119
 #define MSR_IA32_BBL_CR_CTL3		0x0000011e
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4d2e4975f91d..f0f25d31e5e2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6551,11 +6551,17 @@ static void *__read_mostly empty_zero_pages;
 
 void kvm_l1d_flush(void)
 {
-	/* FIXME: could this be boot_cpu_data.x86_cache_size * 2?  */
-	int size = PAGE_SIZE << L1D_CACHE_ORDER;
+	int size;
 
 	ASSERT(boot_cpu_has(X86_BUG_L1TF));
 
+	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
+		wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
+		return;
+	}
+
+	/* FIXME: could this be boot_cpu_data.x86_cache_size * 2?  */
+	size = PAGE_SIZE << L1D_CACHE_ORDER;
 	asm volatile(
 		/* First ensure the pages are in the TLB */
 		"xorl %%eax, %%eax\n\t"
-- 
2.14.3


* [MODERATED] Re: [PATCH v4 3/8] [PATCH v4 3/8] Linux Patch #3
  2018-06-23 13:54 [MODERATED] [PATCH v4 3/8] [PATCH v4 3/8] Linux Patch #3 konrad.wilk
@ 2018-06-25 14:26 ` Paolo Bonzini
  2018-06-25 16:32   ` Dave Hansen
  2018-06-27 10:21 ` Thomas Gleixner
  2018-06-27 16:08 ` [MODERATED] " Borislav Petkov
  2 siblings, 1 reply; 9+ messages in thread
From: Paolo Bonzini @ 2018-06-25 14:26 UTC (permalink / raw)
  To: speck

On 23/06/2018 15:54, speck for konrad.wilk_at_oracle.com wrote:
> +	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
> +		wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
> +		return;
> +	}
> +
> +	/* FIXME: could this be boot_cpu_data.x86_cache_size * 2?  */
> +	size = PAGE_SIZE << L1D_CACHE_ORDER;

Regarding this "FIXME": Peter or Tim, do you know if loading 32KiB is
enough on pre-Skylake parts to clear the L1D cache?

Thanks,

Paolo



* [MODERATED] Re: [PATCH v4 3/8] [PATCH v4 3/8] Linux Patch #3
  2018-06-25 14:26 ` [MODERATED] " Paolo Bonzini
@ 2018-06-25 16:32   ` Dave Hansen
  2018-06-25 16:46     ` Paolo Bonzini
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Hansen @ 2018-06-25 16:32 UTC (permalink / raw)
  To: speck

On 06/25/2018 07:26 AM, speck for Paolo Bonzini wrote:
> On 23/06/2018 15:54, speck for konrad.wilk_at_oracle.com wrote:
>> +	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
>> +		wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
>> +		return;
>> +	}
>> +
>> +	/* FIXME: could this be boot_cpu_data.x86_cache_size * 2?  */
>> +	size = PAGE_SIZE << L1D_CACHE_ORDER;
> Regarding this "FIXME": Peter or Tim, do you know if loading 32KiB is
> enough on pre-Skylake parts to clear the L1D cache?

As usual, it's complicated.

32k is theoretically enough, but _only_ if none of the lines being
touched were in the cache previously.  That's why it was a 64k buffer in
some examples.

You also need guard pages at either end to ensure the prefetchers don't
run into the next page.
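
A sketch of what such an allocation could look like (untested; the
function and variable names and the order-4 flush size are illustrative
only):

	/*
	 * Pad the flush area with one guard page on each side so that
	 * the prefetchers, if they run ahead of the read loop, stay
	 * inside memory we own instead of pulling the neighbouring
	 * allocation back into the L1D.
	 */
	#define L1D_FLUSH_ORDER		4	/* 64k flush area */
	#define L1D_GUARD_PAGES		1	/* one guard page per side */

	static void *l1d_flush_area;

	static int alloc_l1d_flush_area(void)
	{
		unsigned int pages = (1 << L1D_FLUSH_ORDER) + 2 * L1D_GUARD_PAGES;
		struct page *page;

		page = alloc_pages(GFP_KERNEL, get_order(pages * PAGE_SIZE));
		if (!page)
			return -ENOMEM;

		/* The flush loop starts one page in and stops one page early. */
		l1d_flush_area = page_address(page) + L1D_GUARD_PAGES * PAGE_SIZE;
		return 0;
	}

Note that 16 + 2 pages rounds up to an order-5 allocation.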



* [MODERATED] Re: [PATCH v4 3/8] [PATCH v4 3/8] Linux Patch #3
  2018-06-25 16:32   ` Dave Hansen
@ 2018-06-25 16:46     ` Paolo Bonzini
  2018-06-25 17:26       ` Dave Hansen
  2018-06-25 23:33       ` Andi Kleen
  0 siblings, 2 replies; 9+ messages in thread
From: Paolo Bonzini @ 2018-06-25 16:46 UTC (permalink / raw)
  To: speck

On 25/06/2018 18:32, speck for Dave Hansen wrote:
> On 06/25/2018 07:26 AM, speck for Paolo Bonzini wrote:
>> On 23/06/2018 15:54, speck for konrad.wilk_at_oracle.com wrote:
>>> +	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
>>> +		wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
>>> +		return;
>>> +	}
>>> +
>>> +	/* FIXME: could this be boot_cpu_data.x86_cache_size * 2?  */
>>> +	size = PAGE_SIZE << L1D_CACHE_ORDER;
>> Regarding this "FIXME": Peter or Tim, do you know if loading 32KiB is
>> enough on pre-Skylake parts to clear the L1D cache?
> 
> As usual, it's complicated.
> 
> 32k is theoretically enough, but _only_ if none of the lines being
> touched were in the cache previously.  That's why it was a 64k buffer in
> some examples.

But pre-Skylake has 16k cache only, doesn't it?  Does it need to read in
4 times the cache size?

> You also need guard pages at either end to ensure the prefetchers don't
> run into the next page.

Hmm, it would be a pity to require even order 5.  Earlier in the thread
someone said that 52 KiB were enough; if that's confirmed, we could keep
order 4 and have guard pages.

Paolo



* [MODERATED] Re: [PATCH v4 3/8] [PATCH v4 3/8] Linux Patch #3
  2018-06-25 16:46     ` Paolo Bonzini
@ 2018-06-25 17:26       ` Dave Hansen
  2018-06-25 23:33       ` Andi Kleen
  1 sibling, 0 replies; 9+ messages in thread
From: Dave Hansen @ 2018-06-25 17:26 UTC (permalink / raw)
  To: speck

On 06/25/2018 09:46 AM, speck for Paolo Bonzini wrote:
>> 32k is theoretically enough, but _only_ if none of the lines being
>> touched were in the cache previously.  That's why it was a 64k buffer in
>> some examples.
> 
> But pre-Skylake has 16k cache only, doesn't it?  Does it need to read in
> 4 times the cache size?

I thought it's been 32k for a while.  But, either way, I guess we should
be doing the *enumerated* L1D size rather than a fixed 32k.

Here's a Haswell system, btw:

dave@o2:~$ cat /sys/devices/system/cpu/cpu0/cache/index0/size
32K
dave@o2:~$ cat /proc/cpuinfo  | grep model
model name	: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

Or a Westmere Xeon:

dave@bigbox:~$ cat /sys/devices/system/cpu/cpu0/cache/index0/size
32K
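
Enumerating it could look something like this (untested sketch; it
assumes CPUID leaf 4, index 0 describes the L1D, which holds on the
Intel parts in question -- a robust version would also check the cache
type/level fields in EAX):

	static unsigned int l1d_size_bytes(void)
	{
		unsigned int eax, ebx, ecx, edx;
		unsigned int ways, partitions, line_size, sets;

		cpuid_count(4, 0, &eax, &ebx, &ecx, &edx);

		ways		= ((ebx >> 22) & 0x3ff) + 1;
		partitions	= ((ebx >> 12) & 0x3ff) + 1;
		line_size	=  (ebx        & 0xfff) + 1;
		sets		=   ecx + 1;

		return ways * partitions * line_size * sets;
	}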

>> You also need guard pages at either end to ensure the prefetchers don't
>> run into the next page.
> 
> Hmm, it would be a pity to require order 5 even.  Earlier in the thread
> someone said that 52 KiB were enough, if that's confirmed we could keep
> order 4 and have guard pages.

52k was the theoretical floor, i.e. the smallest size that would
guarantee 32k got _evicted_.  But the recommendation from the hardware
folks was to do more than 52k so there was some buffer in case the
analysis was imprecise.

BTW, this buffer does not necessarily need to be per-logical-CPU.  Tony
Luck pointed out that we could just have one buffer per hyperthread
*index*: Core-0/Thread-0 could share its buffer with Core-1/Thread-0,
for instance.  Having them be NUMA-node-local would be nice too, but is
not required for correctness.

At the point that we've got two per NUMA node, I'm not sure we really
care much whether it's 128k or 64k consumed.  I'd much rather do what
the hardware folks are comfortable with than save 64k.
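
A sketch of that sharing scheme (untested; assumes 2-way SMT, and the
names are illustrative):

	/* One buffer per (node, SMT thread index), not per logical CPU. */
	static void *l1d_flush_buf[MAX_NUMNODES][2];

	static void *l1d_flush_buffer(int cpu)
	{
		int node = cpu_to_node(cpu);
		/* 0 for the first sibling of a core, 1 for the other one */
		int thread = cpu != cpumask_first(topology_sibling_cpumask(cpu));

		return l1d_flush_buf[node][thread];
	}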



* [MODERATED] Re: [PATCH v4 3/8] [PATCH v4 3/8] Linux Patch #3
  2018-06-25 16:46     ` Paolo Bonzini
  2018-06-25 17:26       ` Dave Hansen
@ 2018-06-25 23:33       ` Andi Kleen
  2018-06-26  6:41         ` Thomas Gleixner
  1 sibling, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2018-06-25 23:33 UTC (permalink / raw)
  To: speck

> > As usual, it's complicated.
> > 
> > 32k is theoretically enough, but _only_ if none of the lines being
> > touched were in the cache previously.  That's why it was a 64k buffer in
> > some examples.
> 
> But pre-Skylake has 16k cache only, doesn't it?  Does it need to read in

All relevant recent Intel Core CPUs have a 32k L1D.

> Hmm, it would be a pity to require order 5 even.  Earlier in the thread
> someone said that 52 KiB were enough, if that's confirmed we could keep
> order 4 and have guard pages.

The real solution is to not use the software sequence, but to use the
microcode instead, which you need anyway.

-Andi


* Re: [PATCH v4 3/8] [PATCH v4 3/8] Linux Patch #3
  2018-06-25 23:33       ` Andi Kleen
@ 2018-06-26  6:41         ` Thomas Gleixner
  0 siblings, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2018-06-26  6:41 UTC (permalink / raw)
  To: speck

On Mon, 25 Jun 2018, speck for Andi Kleen wrote:
> > Hmm, it would be a pity to require order 5 even.  Earlier in the thread
> > someone said that 52 KiB were enough, if that's confirmed we could keep
> > order 4 and have guard pages.
> 
> The real solution is to not use the software sequence, but 
> use the microcode instead, which you need anyways.

We all know that by now, and repeating it over and over does not make the
microcode magically appear. It has been a month since V4 went public, and
while the kernel has had everything in place since day 1, the microcode
still has not materialized.

The code will use the MSR if available, but as things stand today we need
to have a fallback in place.

Thanks,

	tglx


* Re: [PATCH v4 3/8] [PATCH v4 3/8] Linux Patch #3
  2018-06-23 13:54 [MODERATED] [PATCH v4 3/8] [PATCH v4 3/8] Linux Patch #3 konrad.wilk
  2018-06-25 14:26 ` [MODERATED] " Paolo Bonzini
@ 2018-06-27 10:21 ` Thomas Gleixner
  2018-06-27 16:08 ` [MODERATED] " Borislav Petkov
  2 siblings, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2018-06-27 10:21 UTC (permalink / raw)
  To: speck

On Sat, 23 Jun 2018, speck for konrad.wilk_at_oracle.com wrote:
>  
> +	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
> +		wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
> +		return;
> +	}

This is only half of the story. The page allocation for empty_zero_pages is
pointless when the MSR is available. Fixed it already.
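
I.e. the allocation path can bail out early when the MSR is there.
Roughly (sketch only, with an illustrative function name; the committed
fix may differ):

	static int kvm_alloc_l1d_flush_pages(void)
	{
		struct page *page;

		/* No fallback buffer needed when the MSR does the flushing. */
		if (static_cpu_has(X86_FEATURE_FLUSH_L1D))
			return 0;

		page = alloc_pages(GFP_KERNEL | __GFP_ZERO, L1D_CACHE_ORDER);
		if (!page)
			return -ENOMEM;

		empty_zero_pages = page_address(page);
		return 0;
	}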

Thanks,

	tglx


* [MODERATED] Re: [PATCH v4 3/8] [PATCH v4 3/8] Linux Patch #3
  2018-06-23 13:54 [MODERATED] [PATCH v4 3/8] [PATCH v4 3/8] Linux Patch #3 konrad.wilk
  2018-06-25 14:26 ` [MODERATED] " Paolo Bonzini
  2018-06-27 10:21 ` Thomas Gleixner
@ 2018-06-27 16:08 ` Borislav Petkov
  2 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2018-06-27 16:08 UTC (permalink / raw)
  To: speck

On Sat, Jun 23, 2018 at 09:54:17AM -0400, speck for konrad.wilk_at_oracle.com wrote:
> x86/KVM: Use L1 cache flush before VMENTER if available.
> 
> 336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines
> a new MSR (IA32_FLUSH_CMD aka 0x10B) which has similar write-only
> semantics to other MSRs defined in the document.
> 
> The semantics of this MSR are to allow "finer granularity invalidation
> of caching structures than existing mechanisms like WBINVD. It will
> writeback and invalidate the L1 data cache, including all cachelines
> brought in by preceding instructions, without invalidating all caches
> (e.g. L2 or LLC). Some processors may also invalidate the first level
> instruction cache on an L1D_FLUSH command. The L1 data and
> instruction caches may be shared across the logical processors of a core."
> 
> Hence, right before we do a VMENTER, we need to flush the L1 data cache
> to keep untrusted guests from reading host memory that is cached in the
> L1 data cache.
> 
> A copy of this document is available at
>    https://bugzilla.kernel.org/show_bug.cgi?id=199511
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

As before, fix From: or fixup when committing.

...

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4d2e4975f91d..f0f25d31e5e2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6551,11 +6551,17 @@ static void *__read_mostly empty_zero_pages;
>  
>  void kvm_l1d_flush(void)
>  {
> -	/* FIXME: could this be boot_cpu_data.x86_cache_size * 2?  */
> -	int size = PAGE_SIZE << L1D_CACHE_ORDER;
> +	int size;
>  
>  	ASSERT(boot_cpu_has(X86_BUG_L1TF));
>  
> +	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
> +		wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);

alternative_msr_write(...
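
I.e. presumably something like the below (untested), so the WRMSR gets
patched in via the alternatives machinery instead of going through
wrmsrl():

	alternative_msr_write(MSR_IA32_FLUSH_CMD, L1D_FLUSH,
			      X86_FEATURE_FLUSH_L1D);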

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


end of thread

Thread overview: 9+ messages
2018-06-23 13:54 [MODERATED] [PATCH v4 3/8] [PATCH v4 3/8] Linux Patch #3 konrad.wilk
2018-06-25 14:26 ` [MODERATED] " Paolo Bonzini
2018-06-25 16:32   ` Dave Hansen
2018-06-25 16:46     ` Paolo Bonzini
2018-06-25 17:26       ` Dave Hansen
2018-06-25 23:33       ` Andi Kleen
2018-06-26  6:41         ` Thomas Gleixner
2018-06-27 10:21 ` Thomas Gleixner
2018-06-27 16:08 ` [MODERATED] " Borislav Petkov
