* [MODERATED] terminal fault
@ 2018-04-18  8:35 Michal Hocko
  2018-04-18 13:36 ` [MODERATED] " Jon Masters
  2018-04-18 14:46 ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 8+ messages in thread
From: Michal Hocko @ 2018-04-18  8:35 UTC (permalink / raw)
  To: speck

We have discussed the following patch as a mitigation for the native OS
L1 Terminal Fault issue.

Intel suggested setting a bit outside of the uarch addressable range,
but flipping all the bits seems both easier and more future-proof. So
unless something else would prevent such a fix, I would vote to go with
this patch.

Thoughts?
---
From 7b03455455e1152988b2a295a917c0641f531fb0 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Tue, 10 Apr 2018 14:10:42 +0200
Subject: [PATCH] mm, swap, x86: make sure high bits of the swap offset are set

Intel platforms have a bug where L1 cache contents can speculatively
be used to load the content of !present entries.  This allows certain
side channel attacks. There are several different classes of !present
pages. Unmapped memory clears the whole pte, so those are a non-issue.
mprotect and NUMA hinting entries refer to an existing pfn, which an
attacker cannot redirect to a different privilege domain.  So we are
left with swap entries, which encode the swap offset, and that offset
might collide with an existing pfn. Obfuscate those entries by inverting
the bits of the swap offset. For all but very large offsets this sets
all the high bits, and that _should_ stop the speculation because the
entry then refers to the maximum addressable memory on all Intel
platforms.

This doesn't solve the problem for very large offsets (1<<30 and above
on uarchs with 44b addressing), but those should be out of any practical
attack space.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/x86/include/asm/pgtable_64.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 1149d2112b2e..213c15b2e168 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -299,10 +299,10 @@ static inline int pgd_large(pgd_t pgd) { return 0; }
 
 #define __swp_type(x)			(((x).val >> (SWP_TYPE_FIRST_BIT)) \
 					 & ((1U << SWP_TYPE_BITS) - 1))
-#define __swp_offset(x)			((x).val >> SWP_OFFSET_FIRST_BIT)
+#define __swp_offset(x)			(~(x).val >> SWP_OFFSET_FIRST_BIT)
 #define __swp_entry(type, offset)	((swp_entry_t) { \
 					 ((type) << (SWP_TYPE_FIRST_BIT)) \
-					 | ((offset) << SWP_OFFSET_FIRST_BIT) })
+					 | (~(offset) << SWP_OFFSET_FIRST_BIT) })
 #define __pte_to_swp_entry(pte)		((swp_entry_t) { pte_val((pte)) })
 #define __pmd_to_swp_entry(pmd)		((swp_entry_t) { pmd_val((pmd)) })
 #define __swp_entry_to_pte(x)		((pte_t) { .pte = (x).val })
-- 
2.16.3

-- 
Michal Hocko
SUSE Labs
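
[Illustration, not part of the posted patch] The effect of the inversion can be modelled in user space. The sketch below assumes the swap entry layout of that kernel era (type in bits 9..13, offset from bit 14 upward); the `*_model` names and `points_above_pa_limit()` are hypothetical helpers for this example, and the "any bit at or above the physical address width is set" test is a simplified reading of the changelog rather than a statement about the hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: present bit 0, type in bits 9..13, offset from bit 14. */
#define SWP_TYPE_FIRST_BIT   9
#define SWP_TYPE_BITS        5
#define SWP_OFFSET_FIRST_BIT (SWP_TYPE_FIRST_BIT + SWP_TYPE_BITS)

typedef struct { uint64_t val; } swp_entry_model_t;

/* Encode: the offset is stored inverted, as in the patched __swp_entry(). */
static swp_entry_model_t swp_entry_model(uint64_t type, uint64_t offset)
{
	swp_entry_model_t e = {
		(type << SWP_TYPE_FIRST_BIT) | (~offset << SWP_OFFSET_FIRST_BIT)
	};
	return e;
}

static uint64_t swp_type_model(swp_entry_model_t e)
{
	return (e.val >> SWP_TYPE_FIRST_BIT) & ((1U << SWP_TYPE_BITS) - 1);
}

/* Decode: invert again, as in the patched __swp_offset(). */
static uint64_t swp_offset_model(swp_entry_model_t e)
{
	return ~e.val >> SWP_OFFSET_FIRST_BIT;
}

/*
 * Nonzero if any pte address bit in [pa_bits, 52) is set, i.e. the entry,
 * read as a physical address, lies above what the CPU can address.
 * pa_bits is assumed to be below 52.
 */
static int points_above_pa_limit(swp_entry_model_t e, unsigned pa_bits)
{
	uint64_t mask = ((UINT64_C(1) << 52) - 1) &
			~((UINT64_C(1) << pa_bits) - 1);
	return (e.val & mask) != 0;
}

/* Encoding then decoding must return the original fields. */
static void swp_roundtrip_check(uint64_t type, uint64_t offset)
{
	swp_entry_model_t e = swp_entry_model(type, offset);

	assert(swp_type_model(e) == type);
	assert(swp_offset_model(e) == offset);
	assert((e.val & 1) == 0);	/* present bit stays clear */
}
```

With 44 physical address bits, a small offset such as 0x1234 yields an entry with bits 44..51 all set, whereas an offset whose bits 30..37 are all ones leaves those pte bits clear, which is the large-offset case the changelog calls out.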

* [MODERATED] Re: terminal fault
  2018-04-18  8:35 [MODERATED] terminal fault Michal Hocko
@ 2018-04-18 13:36 ` Jon Masters
  2018-04-18 14:46 ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 8+ messages in thread
From: Jon Masters @ 2018-04-18 13:36 UTC (permalink / raw)
  To: speck

On 04/18/2018 04:35 AM, speck for Michal Hocko wrote:

> We have discussed the following patch as a mitigation for the native OS
> L1 Terminal fault issue.
> 
> Intel was suggesting to set a bit outside of the uarch addressable range
> but flipping all the bits seems both easier and more future proof. So
> unless there is something else that would prevent such a fix I would
> vote to go with this patch.
> 
> Thoughts?

We've discussed this with Intel at length and are ok with this approach.

Since we're discussing L1TF here, I'll share that I've had a working
reproducer for the VMM attack for the past couple of months. I'm using it
to test some of the proposed fixes. It's a modified Meltdown attack
running inside the guest, and I've committed to some other folks to
follow up with bandwidth estimates of the read rate, and so on.

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop


* [MODERATED] Re: terminal fault
  2018-04-18  8:35 [MODERATED] terminal fault Michal Hocko
  2018-04-18 13:36 ` [MODERATED] " Jon Masters
@ 2018-04-18 14:46 ` Konrad Rzeszutek Wilk
  2018-04-19  7:28   ` Michal Hocko
  1 sibling, 1 reply; 8+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-04-18 14:46 UTC (permalink / raw)
  To: speck

On Wed, Apr 18, 2018 at 10:35:29AM +0200, speck for Michal Hocko wrote:
> We have discussed the following patch as a mitigation for the native OS
> L1 Terminal fault issue.
> 
> Intel was suggesting to set a bit outside of the uarch addressable range
> but flipping all the bits seems both easier and more future proof. So
> unless there is something else that would prevent such a fix I would
> vote to go with this patch.
> 
> Thoughts?
> ---
> From 7b03455455e1152988b2a295a917c0641f531fb0 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Tue, 10 Apr 2018 14:10:42 +0200
> Subject: [PATCH] mm, swap, x86: make sure high bits of the swap offset are set
> 
> Intel platforms have a bug where L1 cache contents can speculatively
> be used to load content of !present entries.  This allows certain side
> channel attacks. We do have several different classes of !present
> pages. Unmapped memory clears the whole ptes so they are non-issue.

[Could you point to where this is done? I am just curious about this.]

> mprotect, numa hints are referring to an existing pfns which cannot be
> tweaked by an attacker to a different privilege domains.  So we are left
> with swap entries which encode the swap offset and that might conflict
> with an existing pfn. Obfuscate those entries by inverting bits in the
> swap offset which will set all the high bits and that _should_ stop the
> speculation as it should refer to the maximum addressable memory on all
> Intel platforms.
> 
> Well this doesn't solve the problem on very large offsets (1<<30 on
> uarchs with 44b addressing) but this should be out of any practical
> attack space.

How does this help if your max memory sits at the top of the maximum
supported physical address range?

That would mean we allow the top of memory to leak via the L1 cache,
wouldn't it?

* [MODERATED] Re: terminal fault
  2018-04-18 14:46 ` Konrad Rzeszutek Wilk
@ 2018-04-19  7:28   ` Michal Hocko
  2018-04-19  7:33     ` Jiri Kosina
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2018-04-19  7:28 UTC (permalink / raw)
  To: speck

On Wed 18-04-18 10:46:05, speck for Konrad Rzeszutek Wilk wrote:
> On Wed, Apr 18, 2018 at 10:35:29AM +0200, speck for Michal Hocko wrote:
> > We have discussed the following patch as a mitigation for the native OS
> > L1 Terminal fault issue.
> > 
> > Intel was suggesting to set a bit outside of the uarch addressable range
> > but flipping all the bits seems both easier and more future proof. So
> > unless there is something else that would prevent such a fix I would
> > vote to go with this patch.
> > 
> > Thoughts?
> > ---
> > From 7b03455455e1152988b2a295a917c0641f531fb0 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Tue, 10 Apr 2018 14:10:42 +0200
> > Subject: [PATCH] mm, swap, x86: make sure high bits of the swap offset are set
> > 
> > Intel platforms have a bug where L1 cache contents can speculatively
> > be used to load content of !present entries.  This allows certain side
> > channel attacks. We do have several different classes of !present
> > pages. Unmapped memory clears the whole ptes so they are non-issue.
> 
> [Could you point to where this is done? I am just curious on this]

E.g. zap_pte_range() -> ptep_get_and_clear_full(), and similarly for the
other page table levels.

> > mprotect, numa hints are referring to an existing pfns which cannot be
> > tweaked by an attacker to a different privilege domains.  So we are left
> > with swap entries which encode the swap offset and that might conflict
> > with an existing pfn. Obfuscate those entries by inverting bits in the
> > swap offset which will set all the high bits and that _should_ stop the
> > speculation as it should refer to the maximum addressable memory on all
> > Intel platforms.
> > 
> > Well this doesn't solve the problem on very large offsets (1<<30 on
> > uarchs with 44b addressing) but this should be out of any practical
> > attack space.
> 
> How does this help if your max memory is at the tip of the max physical support?
> 
> That would mean we will allow the top of the memory to leak in the L1
> cache won't it?

If your swap offset is that large, then yes. But is this a realistic
scenario to exploit? The changelog is explicit about this. We could be
stricter and refuse to swap on such large partitions/files, but I do
not see much point in making this more complicated than necessary.

Or have I misunderstood your concern?
-- 
Michal Hocko
SUSE Labs

* [MODERATED] Re: terminal fault
  2018-04-19  7:28   ` Michal Hocko
@ 2018-04-19  7:33     ` Jiri Kosina
  2018-04-19  7:41       ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Jiri Kosina @ 2018-04-19  7:33 UTC (permalink / raw)
  To: speck

On Thu, 19 Apr 2018, speck for Michal Hocko wrote:

> we might be more anal and fail to swap on such a large partitions/files 
> but I do not see much point making this more complicated than necessary.

I think issuing a WARN_ON_ONCE() somewhere in swapon() could be
reasonable in cases where the swap is so big that the inversion no longer
covers the maximum pfn on the given microarchitecture (and only when
we're running on an affected CPU, which for starters will mean an empty
whitelist).

-- 
Jiri Kosina
SUSE Labs
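
[Illustration, not from the thread] The size check behind such a warning could look roughly like the sketch below. It assumes the offset starts at pte bit 14 and uses the conservative criterion from the changelog (every pte bit at or above the physical address width stays set), which gives the 1<<30 figure for 44b addressing. `l1tf_safe_offset_limit()` and `swap_too_big_for_l1tf()` are hypothetical names, and where the per-uarch address width would come from is left open, as in the discussion.

```c
#include <assert.h>
#include <stdint.h>

#define SWP_OFFSET_FIRST_BIT 14	/* assumed layout: offset starts at pte bit 14 */

/*
 * Offsets strictly below the returned limit have all offset bits that map
 * to pte bits >= pa_bits clear, so inversion sets every one of those bits.
 */
static uint64_t l1tf_safe_offset_limit(unsigned pa_bits)
{
	assert(pa_bits > SWP_OFFSET_FIRST_BIT && pa_bits <= 52);
	return UINT64_C(1) << (pa_bits - SWP_OFFSET_FIRST_BIT);
}

/*
 * A swap area of nr_pages pages uses offsets 0..nr_pages-1, so it is
 * "too big" once its highest offset reaches the limit.
 */
static int swap_too_big_for_l1tf(uint64_t nr_pages, unsigned pa_bits)
{
	return nr_pages > l1tf_safe_offset_limit(pa_bits);
}
```

For 44 physical address bits the limit is 1<<30 pages, i.e. 4 TiB of swap with 4 KiB pages; a swapon()-time warning would fire only for areas larger than that.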

* [MODERATED] Re: terminal fault
  2018-04-19  7:33     ` Jiri Kosina
@ 2018-04-19  7:41       ` Michal Hocko
  2018-04-19  7:52         ` Jiri Kosina
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2018-04-19  7:41 UTC (permalink / raw)
  To: speck

On Thu 19-04-18 09:33:31, speck for Jiri Kosina wrote:
> On Thu, 19 Apr 2018, speck for Michal Hocko wrote:
> 
> > we might be more anal and fail to swap on such a large partitions/files 
> > but I do not see much point making this more complicated than necessary.
> 
> I think that issuing WARN_ON_ONCE() somewhere in swapon() in cases where 
> the swap is too big so that it doesn't cover it for maximum pfn on given 
> microarchitecture (&& in case we're running on affected CPU, which for 
> starters will mean empty whitelist) could be reasonable.

Do we have that max pfn per uarch?
-- 
Michal Hocko
SUSE Labs

* [MODERATED] Re: terminal fault
  2018-04-19  7:41       ` Michal Hocko
@ 2018-04-19  7:52         ` Jiri Kosina
  2018-04-19 17:02           ` Jon Masters
  0 siblings, 1 reply; 8+ messages in thread
From: Jiri Kosina @ 2018-04-19  7:52 UTC (permalink / raw)
  To: speck

On Thu, 19 Apr 2018, speck for Michal Hocko wrote:

> Do we have that max pfn per uarch?

Hm, one would hope that if/once this gets extended by 48 addressing pins, 
at the same time the CPUs would be taking the present bit into account 
when speculating ... but yeah, I get your point.

Thanks,

-- 
Jiri Kosina
SUSE Labs

* [MODERATED] Re: terminal fault
  2018-04-19  7:52         ` Jiri Kosina
@ 2018-04-19 17:02           ` Jon Masters
  0 siblings, 0 replies; 8+ messages in thread
From: Jon Masters @ 2018-04-19 17:02 UTC (permalink / raw)
  To: speck

On 04/19/2018 03:52 AM, speck for Jiri Kosina wrote:
> On Thu, 19 Apr 2018, speck for Michal Hocko wrote:
> 
>> Do we have that max pfn per uarch?
> 
> Hm, one would hope that if/once this gets extended by 48 addressing pins, 
> at the same time the CPUs would be taking the present bit into account 
> when speculating ... but yeah, I get your point.

There's discussion about having a new MSR convey which bits to mask off,
which is uarch/SoC platform specific anyway. So there's no reason they
couldn't also convey this. Are Intel folks here able to take that?

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop

