x86: use WRITE_ONCE() when setting PTEs
diff mbox series

Message ID 20180902181451.80520-1-namit@vmware.com
State New, archived
Headers show
Series
  • x86: use WRITE_ONCE() when setting PTEs
Related show

Commit Message

Nadav Amit Sept. 2, 2018, 6:14 p.m. UTC
When page-table entries are set, the compiler might optimize their
assignment by using multiple instructions to set the PTE. This might
turn into a security hazard if the user somehow manages to use the
interim PTE. L1TF does not make our lives easier, making even an interim
non-present PTE a security hazard.

Using WRITE_ONCE() to set PTEs and friends should prevent this potential
security hazard.

I skimmed the differences in the binary with and without this patch. The
differences are (obviously) greater when CONFIG_PARAVIRT=n as more
code optimizations are possible. For better and worse, the impact on the
binary with this patch is pretty small. Skimming the code did not cause
anything to jump out as a security hazard, but it seems that at least
move_soft_dirty_pte() caused set_pte_at() to use multiple writes.

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>

---

rfc->v1:
* Fixed more instances of PTE settings
---
 arch/x86/include/asm/pgtable.h    |  2 +-
 arch/x86/include/asm/pgtable_64.h | 20 ++++++++++----------
 arch/x86/mm/pgtable.c             |  8 ++++----
 3 files changed, 15 insertions(+), 15 deletions(-)

Comments

Dave Hansen Sept. 6, 2018, 5:21 p.m. UTC | #1
On 09/02/2018 11:14 AM, Nadav Amit wrote:
> When page-table entries are set, the compiler might optimize their
> assignment by using multiple instructions to set the PTE. This might
> turn into a security hazard if the user somehow manages to use the
> interim PTE. L1TF does not make our lives easier, making even an interim
> non-present PTE a security hazard.
> 
> Using WRITE_ONCE() to set PTEs and friends should prevent this potential
> security hazard.

But, our types are already 64-bit, and we're doing a 64-bit pointer
write.  Our WRITE_ONCE() implementation boils down to:

static __always_inline void __write_once_size(...
{
        switch (size) {
	...
        case 8: *(volatile __u64 *)p = *(__u64 *)res; break;


For 64-bit types, which is precisely the same thing.  Right?

So, while I agree that it's nice to document the need for a full 64-bit
"atomicity" of the PTE set, I don't see a practical problem here with
our implementation.

There's probably a massive number of things that would break if we
assumed sane 64-bit writes can be observed piecemeal.
Peter Zijlstra Sept. 6, 2018, 7:28 p.m. UTC | #2
On Thu, Sep 06, 2018 at 10:21:38AM -0700, Dave Hansen wrote:
> There's probably a massive number of things that would break if we
> assumed sane 64-bit writes can be observed piecemeal.

Do assume, it has been observed. WRITE_ONCE()/READ_ONCE() are _required_
if you cannot have load/store tearing.
Peter Zijlstra Sept. 6, 2018, 7:57 p.m. UTC | #3
On Sun, Sep 02, 2018 at 11:14:50AM -0700, Nadav Amit wrote:
> When page-table entries are set, the compiler might optimize their
> assignment by using multiple instructions to set the PTE. This might
> turn into a security hazard if the user somehow manages to use the
> interim PTE. L1TF does not make our lives easier, making even an interim
> non-present PTE a security hazard.
> 
> Using WRITE_ONCE() to set PTEs and friends should prevent this potential
> security hazard.
> 
> I skimmed the differences in the binary with and without this patch. The
> differences are (obviously) greater when CONFIG_PARAVIRT=n as more
> code optimizations are possible. For better and worse, the impact on the
> binary with this patch is pretty small. Skimming the code did not cause
> anything to jump out as a security hazard, but it seems that at least
> move_soft_dirty_pte() caused set_pte_at() to use multiple writes.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Also, its corollary would also make sense/be required, use READ_ONCE()
when reading these.
Nadav Amit Sept. 6, 2018, 8:12 p.m. UTC | #4
at 12:57 PM, Peter Zijlstra <peterz@infradead.org> wrote:

> On Sun, Sep 02, 2018 at 11:14:50AM -0700, Nadav Amit wrote:
>> When page-table entries are set, the compiler might optimize their
>> assignment by using multiple instructions to set the PTE. This might
>> turn into a security hazard if the user somehow manages to use the
>> interim PTE. L1TF does not make our lives easier, making even an interim
>> non-present PTE a security hazard.
>> 
>> Using WRITE_ONCE() to set PTEs and friends should prevent this potential
>> security hazard.
>> 
>> I skimmed the differences in the binary with and without this patch. The
>> differences are (obviously) greater when CONFIG_PARAVIRT=n as more
>> code optimizations are possible. For better and worse, the impact on the
>> binary with this patch is pretty small. Skimming the code did not cause
>> anything to jump out as a security hazard, but it seems that at least
>> move_soft_dirty_pte() caused set_pte_at() to use multiple writes.
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> Also, its corollary would also make sense/be required, use READ_ONCE()
> when reading these.

I don’t know. This would obviously be much more intrusive. I can add a
get_pte() and write a Coccinelle script to use it instead of reading the
PTE, but in most cases, I presume, it would be an overkill.

The reason for that is that the PTEs are supposed to be accessed while
holding the page-table lock, and the hardware can only change dirty & access
bits. I think that any code that assumes that these bits do not change while
holding the lock is already broken in more ways.
Peter Zijlstra Sept. 6, 2018, 8:27 p.m. UTC | #5
On Thu, Sep 06, 2018 at 01:12:14PM -0700, Nadav Amit wrote:
> at 12:57 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Sun, Sep 02, 2018 at 11:14:50AM -0700, Nadav Amit wrote:
> >> When page-table entries are set, the compiler might optimize their
> >> assignment by using multiple instructions to set the PTE. This might
> >> turn into a security hazard if the user somehow manages to use the
> >> interim PTE. L1TF does not make our lives easier, making even an interim
> >> non-present PTE a security hazard.
> >> 
> >> Using WRITE_ONCE() to set PTEs and friends should prevent this potential
> >> security hazard.
> >> 
> >> I skimmed the differences in the binary with and without this patch. The
> >> differences are (obviously) greater when CONFIG_PARAVIRT=n as more
> >> code optimizations are possible. For better and worse, the impact on the
> >> binary with this patch is pretty small. Skimming the code did not cause
> >> anything to jump out as a security hazard, but it seems that at least
> >> move_soft_dirty_pte() caused set_pte_at() to use multiple writes.
> > 
> > Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > 
> > Also, its corollary would also make sense/be required, use READ_ONCE()
> > when reading these.
> 
> I don’t know. This would obviously be much more intrusive. I can add a
> get_pte() and write a Coccinelle script to use it instead of reading the
> PTE, but in most cases, I presume, it would be an overkill.
> 
> The reason for that is that the PTEs are supposed to be accessed while
> holding the page-table lock, and the hardware can only change dirty & access
> bits. I think that any code that assumes that these bits do not change while
> holding the lock is already broken in more ways.

There are lockless readers, but I just checked, mm/gup.c already uses
READ_ONCE(), so that should be fine.
Vlastimil Babka Sept. 7, 2018, 2:10 p.m. UTC | #6
On 09/06/2018 07:21 PM, Dave Hansen wrote:
> On 09/02/2018 11:14 AM, Nadav Amit wrote:
>> When page-table entries are set, the compiler might optimize their
>> assignment by using multiple instructions to set the PTE. This might
>> turn into a security hazard if the user somehow manages to use the
>> interim PTE. L1TF does not make our lives easier, making even an interim
>> non-present PTE a security hazard.
>>
>> Using WRITE_ONCE() to set PTEs and friends should prevent this potential
>> security hazard.
> 
> But, our types are already 64-bit, and we're doing a 64-bit pointer
> write.  Our WRITE_ONCE() implementation boils down to:
> 
> static __always_inline void __write_once_size(...
> {
>         switch (size) {
> 	...
>         case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
> 
> 
> For 64-bit types, which is precisely the same thing.  Right?

Notice the volatile cast. While the CPU write itself is fine, the
*compiler* can decide to do partial updates; volatile forbids it.

Patch
diff mbox series

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 3de9a1fb7a9a..0e3b5917fcef 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1198,7 +1198,7 @@  static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 		return xchg(pmdp, pmd);
 	} else {
 		pmd_t old = *pmdp;
-		*pmdp = pmd;
+		WRITE_ONCE(*pmdp, pmd);
 		return old;
 	}
 }
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index f773d5e6c8cc..ce2b59047cb8 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -55,15 +55,15 @@  struct mm_struct;
 void set_pte_vaddr_p4d(p4d_t *p4d_page, unsigned long vaddr, pte_t new_pte);
 void set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte);
 
-static inline void native_pte_clear(struct mm_struct *mm, unsigned long addr,
-				    pte_t *ptep)
+static inline void native_set_pte(pte_t *ptep, pte_t pte)
 {
-	*ptep = native_make_pte(0);
+	WRITE_ONCE(*ptep, pte);
 }
 
-static inline void native_set_pte(pte_t *ptep, pte_t pte)
+static inline void native_pte_clear(struct mm_struct *mm, unsigned long addr,
+				    pte_t *ptep)
 {
-	*ptep = pte;
+	native_set_pte(ptep, native_make_pte(0));
 }
 
 static inline void native_set_pte_atomic(pte_t *ptep, pte_t pte)
@@ -73,7 +73,7 @@  static inline void native_set_pte_atomic(pte_t *ptep, pte_t pte)
 
 static inline void native_set_pmd(pmd_t *pmdp, pmd_t pmd)
 {
-	*pmdp = pmd;
+	WRITE_ONCE(*pmdp, pmd);
 }
 
 static inline void native_pmd_clear(pmd_t *pmd)
@@ -109,7 +109,7 @@  static inline pmd_t native_pmdp_get_and_clear(pmd_t *xp)
 
 static inline void native_set_pud(pud_t *pudp, pud_t pud)
 {
-	*pudp = pud;
+	WRITE_ONCE(*pudp, pud);
 }
 
 static inline void native_pud_clear(pud_t *pud)
@@ -137,13 +137,13 @@  static inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d)
 	pgd_t pgd;
 
 	if (pgtable_l5_enabled() || !IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION)) {
-		*p4dp = p4d;
+		WRITE_ONCE(*p4dp, p4d);
 		return;
 	}
 
 	pgd = native_make_pgd(native_p4d_val(p4d));
 	pgd = pti_set_user_pgtbl((pgd_t *)p4dp, pgd);
-	*p4dp = native_make_p4d(native_pgd_val(pgd));
+	WRITE_ONCE(*p4dp, native_make_p4d(native_pgd_val(pgd)));
 }
 
 static inline void native_p4d_clear(p4d_t *p4d)
@@ -153,7 +153,7 @@  static inline void native_p4d_clear(p4d_t *p4d)
 
 static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
 {
-	*pgdp = pti_set_user_pgtbl(pgdp, pgd);
+	WRITE_ONCE(*pgdp, pti_set_user_pgtbl(pgdp, pgd));
 }
 
 static inline void native_pgd_clear(pgd_t *pgd)
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index e848a4811785..ae394552fb94 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -269,7 +269,7 @@  static void mop_up_one_pmd(struct mm_struct *mm, pgd_t *pgdp)
 	if (pgd_val(pgd) != 0) {
 		pmd_t *pmd = (pmd_t *)pgd_page_vaddr(pgd);
 
-		*pgdp = native_make_pgd(0);
+		pgd_clear(pgdp);
 
 		paravirt_release_pmd(pgd_val(pgd) >> PAGE_SHIFT);
 		pmd_free(mm, pmd);
@@ -494,7 +494,7 @@  int ptep_set_access_flags(struct vm_area_struct *vma,
 	int changed = !pte_same(*ptep, entry);
 
 	if (changed && dirty)
-		*ptep = entry;
+		set_pte(ptep, entry);
 
 	return changed;
 }
@@ -509,7 +509,7 @@  int pmdp_set_access_flags(struct vm_area_struct *vma,
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
 	if (changed && dirty) {
-		*pmdp = entry;
+		set_pmd(pmdp, entry);
 		/*
 		 * We had a write-protection fault here and changed the pmd
 		 * to to more permissive. No need to flush the TLB for that,
@@ -529,7 +529,7 @@  int pudp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
 	VM_BUG_ON(address & ~HPAGE_PUD_MASK);
 
 	if (changed && dirty) {
-		*pudp = entry;
+		set_pud(pudp, entry);
 		/*
 		 * We had a write-protection fault here and changed the pud
 		 * to to more permissive. No need to flush the TLB for that,