* [PATCH 0/2] x86/xen: avoid 32-bit writes to PTEs in PV PAE guests @ 2018-08-20 5:14 Juergen Gross 2018-08-20 5:14 ` [PATCH 1/2] x86/xen: don't write ptes directly in 32-bit PV guests Juergen Gross 2018-08-20 5:14 ` [PATCH 2/2] x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear Juergen Gross 0 siblings, 2 replies; 8+ messages in thread From: Juergen Gross @ 2018-08-20 5:14 UTC (permalink / raw) To: linux-kernel, xen-devel, x86 Cc: boris.ostrovsky, hpa, tglx, mingo, Juergen Gross While the hypervisor emulates plain writes to PTEs happily, this is much slower than issuing a hypercall for PTE modifcations. And writing a PTE via two 32-bit write instructions (especially when clearing the PTE) will result in an intermediate L1TF vulnerable PTE. Writes to PAE PTEs should always be done with 64-bit writes or via hypercalls. Juergen Gross (2): x86/xen: don't write ptes directly in 32-bit PV guests x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear arch/x86/include/asm/pgtable-3level.h | 14 ++++++++------ arch/x86/xen/mmu_pv.c | 7 +++---- 2 files changed, 11 insertions(+), 10 deletions(-) -- 2.13.7 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] x86/xen: don't write ptes directly in 32-bit PV guests 2018-08-20 5:14 [PATCH 0/2] x86/xen: avoid 32-bit writes to PTEs in PV PAE guests Juergen Gross @ 2018-08-20 5:14 ` Juergen Gross 2018-08-20 8:38 ` [Xen-devel] " Jan Beulich 2018-08-20 5:14 ` [PATCH 2/2] x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear Juergen Gross 1 sibling, 1 reply; 8+ messages in thread From: Juergen Gross @ 2018-08-20 5:14 UTC (permalink / raw) To: linux-kernel, xen-devel, x86 Cc: boris.ostrovsky, hpa, tglx, mingo, Juergen Gross In some cases 32-bit PAE PV guests still write PTEs directly instead of using hypercalls. This is especially bad when clearing a PTE as this is done via 32-bit writes which will produce intermediate L1TF attackable PTEs. Change the code to use hypercalls instead. Signed-off-by: Juergen Gross <jgross@suse.com> --- arch/x86/xen/mmu_pv.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c index 9e7012858420..9396b4d17064 100644 --- a/arch/x86/xen/mmu_pv.c +++ b/arch/x86/xen/mmu_pv.c @@ -434,14 +434,13 @@ static void xen_set_pud(pud_t *ptr, pud_t val) static void xen_set_pte_atomic(pte_t *ptep, pte_t pte) { trace_xen_mmu_set_pte_atomic(ptep, pte); - set_64bit((u64 *)ptep, native_pte_val(pte)); + __xen_set_pte(ptep, pte); } static void xen_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { trace_xen_mmu_pte_clear(mm, addr, ptep); - if (!xen_batched_set_pte(ptep, native_make_pte(0))) - native_pte_clear(mm, addr, ptep); + __xen_set_pte(ptep, native_make_pte(0)); } static void xen_pmd_clear(pmd_t *pmdp) @@ -1569,7 +1568,7 @@ static void __init xen_set_pte_init(pte_t *ptep, pte_t pte) pte = __pte_ma(((pte_val_ma(*ptep) & _PAGE_RW) | ~_PAGE_RW) & pte_val_ma(pte)); #endif - native_set_pte(ptep, pte); + __xen_set_pte(ptep, pte); } /* Early in boot, while setting up the initial pagetable, assume -- 2.13.7 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [Xen-devel] [PATCH 1/2] x86/xen: don't write ptes directly in 32-bit PV guests 2018-08-20 5:14 ` [PATCH 1/2] x86/xen: don't write ptes directly in 32-bit PV guests Juergen Gross @ 2018-08-20 8:38 ` Jan Beulich 0 siblings, 0 replies; 8+ messages in thread From: Jan Beulich @ 2018-08-20 8:38 UTC (permalink / raw) To: Juergen Gross Cc: the arch/x86 maintainers, tglx, xen-devel, Boris Ostrovsky, mingo, linux-kernel, hpa >>> On 20.08.18 at 07:14, <jgross@suse.com> wrote: > In some cases 32-bit PAE PV guests still write PTEs directly instead of > using hypercalls. This is especially bad when clearing a PTE as this is > done via 32-bit writes which will produce intermediate L1TF attackable > PTEs. > > Change the code to use hypercalls instead. > > Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/2] x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear 2018-08-20 5:14 [PATCH 0/2] x86/xen: avoid 32-bit writes to PTEs in PV PAE guests Juergen Gross 2018-08-20 5:14 ` [PATCH 1/2] x86/xen: don't write ptes directly in 32-bit PV guests Juergen Gross @ 2018-08-20 5:14 ` Juergen Gross 2018-08-20 8:40 ` [Xen-devel] " Jan Beulich 2018-08-20 13:26 ` Thomas Gleixner 1 sibling, 2 replies; 8+ messages in thread From: Juergen Gross @ 2018-08-20 5:14 UTC (permalink / raw) To: linux-kernel, xen-devel, x86 Cc: boris.ostrovsky, hpa, tglx, mingo, Juergen Gross Using only 32-bit writes for the pte will result in an intermediate L1TF vulnerable PTE. When running as a Xen PV guest this will at once switch the guest to shadow mode resulting in a loss of performance. Use arch_atomic64_xchg() instead which will perform the requested operation atomically with all 64 bits. Some performance considerations according to: https://software.intel.com/sites/default/files/managed/ad/dc/Intel-Xeon-Scalable-Processor-throughput-latency.pdf The main number should be the latency, as there is no tight loop around native_ptep_get_and_clear(). "lock cmpxchg8b" has a latency of 20 cycles, while "lock xchg" (with a memory operand) isn't mentioned in that document. "lock xadd" (with xadd having 3 cycles less latency than xchg) has a latency of 11, so we can assume a latency of 14 for "lock xchg". Signed-off-by: Juergen Gross <jgross@suse.com> --- In case adding about 6 cycles for native_ptep_get_and_clear() is believed to be too bad I can modify the patch to add a paravirt function for that purpose in order to add the overhead for Xen guests only (in fact the overhead for Xen guests will be less, as only one instruction writing to the PTE has to be emulated by the hypervisor). --- arch/x86/include/asm/pgtable-3level.h | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h index a564084c6141..7919ae4e27d8 100644 --- a/arch/x86/include/asm/pgtable-3level.h +++ b/arch/x86/include/asm/pgtable-3level.h @@ -2,6 +2,8 @@ #ifndef _ASM_X86_PGTABLE_3LEVEL_H #define _ASM_X86_PGTABLE_3LEVEL_H +#include <asm/atomic64_32.h> + /* * Intel Physical Address Extension (PAE) Mode - three-level page * tables on PPro+ CPUs. @@ -148,14 +150,14 @@ static inline void pud_clear(pud_t *pudp) #ifdef CONFIG_SMP static inline pte_t native_ptep_get_and_clear(pte_t *ptep) { - pte_t res; + union { + pte_t pte; + long long val; + } res; - /* xchg acts as a barrier before the setting of the high bits */ - res.pte_low = xchg(&ptep->pte_low, 0); - res.pte_high = ptep->pte_high; - ptep->pte_high = 0; + res.val = arch_atomic64_xchg((atomic64_t *)ptep, 0); - return res; + return res.pte; } #else #define native_ptep_get_and_clear(xp) native_local_ptep_get_and_clear(xp) -- 2.13.7 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [Xen-devel] [PATCH 2/2] x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear 2018-08-20 5:14 ` [PATCH 2/2] x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear Juergen Gross @ 2018-08-20 8:40 ` Jan Beulich 2018-08-20 8:56 ` Juergen Gross 2018-08-20 13:26 ` Thomas Gleixner 1 sibling, 1 reply; 8+ messages in thread From: Jan Beulich @ 2018-08-20 8:40 UTC (permalink / raw) To: Juergen Gross Cc: the arch/x86 maintainers, tglx, xen-devel, Boris Ostrovsky, mingo, linux-kernel, hpa >>> On 20.08.18 at 07:14, <jgross@suse.com> wrote: > @@ -148,14 +150,14 @@ static inline void pud_clear(pud_t *pudp) > #ifdef CONFIG_SMP > static inline pte_t native_ptep_get_and_clear(pte_t *ptep) > { > - pte_t res; > + union { > + pte_t pte; > + long long val; > + } res; Why the union? pte_t already is one, with the pte field being what you're after ... > - /* xchg acts as a barrier before the setting of the high bits */ > - res.pte_low = xchg(&ptep->pte_low, 0); > - res.pte_high = ptep->pte_high; > - ptep->pte_high = 0; > + res.val = arch_atomic64_xchg((atomic64_t *)ptep, 0); ... here. Jan ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Xen-devel] [PATCH 2/2] x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear 2018-08-20 8:40 ` [Xen-devel] " Jan Beulich @ 2018-08-20 8:56 ` Juergen Gross 0 siblings, 0 replies; 8+ messages in thread From: Juergen Gross @ 2018-08-20 8:56 UTC (permalink / raw) To: Jan Beulich Cc: the arch/x86 maintainers, tglx, xen-devel, Boris Ostrovsky, mingo, linux-kernel, hpa On 20/08/18 10:40, Jan Beulich wrote: >>>> On 20.08.18 at 07:14, <jgross@suse.com> wrote: >> @@ -148,14 +150,14 @@ static inline void pud_clear(pud_t *pudp) >> #ifdef CONFIG_SMP >> static inline pte_t native_ptep_get_and_clear(pte_t *ptep) >> { >> - pte_t res; >> + union { >> + pte_t pte; >> + long long val; >> + } res; > > Why the union? pte_t already is one, with the pte field being what > you're after ... > >> - /* xchg acts as a barrier before the setting of the high bits */ >> - res.pte_low = xchg(&ptep->pte_low, 0); >> - res.pte_high = ptep->pte_high; >> - ptep->pte_high = 0; >> + res.val = arch_atomic64_xchg((atomic64_t *)ptep, 0); > > ... here. Uuh, yes. I'm waiting for more comments, especially regarding the potential need for a paravirt function. Juergen ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear 2018-08-20 5:14 ` [PATCH 2/2] x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear Juergen Gross 2018-08-20 8:40 ` [Xen-devel] " Jan Beulich @ 2018-08-20 13:26 ` Thomas Gleixner 2018-08-20 14:55 ` Juergen Gross 1 sibling, 1 reply; 8+ messages in thread From: Thomas Gleixner @ 2018-08-20 13:26 UTC (permalink / raw) To: Juergen Gross; +Cc: linux-kernel, xen-devel, x86, boris.ostrovsky, hpa, mingo On Mon, 20 Aug 2018, Juergen Gross wrote: > In case adding about 6 cycles for native_ptep_get_and_clear() is believed > to be too bad I can modify the patch to add a paravirt function for that > purpose in order to add the overhead for Xen guests only (in fact the > overhead for Xen guests will be less, as only one instruction writing to > the PTE has to be emulated by the hypervisor). I doubt that its worth the trouble of yet another paravirt thingy. > --- > arch/x86/include/asm/pgtable-3level.h | 14 ++++++++------ > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h > index a564084c6141..7919ae4e27d8 100644 > --- a/arch/x86/include/asm/pgtable-3level.h > +++ b/arch/x86/include/asm/pgtable-3level.h > @@ -2,6 +2,8 @@ > #ifndef _ASM_X86_PGTABLE_3LEVEL_H > #define _ASM_X86_PGTABLE_3LEVEL_H > > +#include <asm/atomic64_32.h> > + > /* > * Intel Physical Address Extension (PAE) Mode - three-level page > * tables on PPro+ CPUs. > @@ -148,14 +150,14 @@ static inline void pud_clear(pud_t *pudp) > #ifdef CONFIG_SMP > static inline pte_t native_ptep_get_and_clear(pte_t *ptep) > { > - pte_t res; > + union { > + pte_t pte; > + long long val; > + } res; > > - /* xchg acts as a barrier before the setting of the high bits */ > - res.pte_low = xchg(&ptep->pte_low, 0); > - res.pte_high = ptep->pte_high; > - ptep->pte_high = 0; > + res.val = arch_atomic64_xchg((atomic64_t *)ptep, 0); Couldn't you just keep pte_t res; and do: res.pte = (pteval_t) arch_atomic64_xchg((atomic64_t *)ptep, 0); Hmm? Thanks, tglx ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear 2018-08-20 13:26 ` Thomas Gleixner @ 2018-08-20 14:55 ` Juergen Gross 0 siblings, 0 replies; 8+ messages in thread From: Juergen Gross @ 2018-08-20 14:55 UTC (permalink / raw) To: Thomas Gleixner; +Cc: linux-kernel, xen-devel, x86, boris.ostrovsky, hpa, mingo On 20/08/18 15:26, Thomas Gleixner wrote: > On Mon, 20 Aug 2018, Juergen Gross wrote: >> In case adding about 6 cycles for native_ptep_get_and_clear() is believed >> to be too bad I can modify the patch to add a paravirt function for that >> purpose in order to add the overhead for Xen guests only (in fact the >> overhead for Xen guests will be less, as only one instruction writing to >> the PTE has to be emulated by the hypervisor). > > I doubt that its worth the trouble of yet another paravirt thingy. > >> --- >> arch/x86/include/asm/pgtable-3level.h | 14 ++++++++------ >> 1 file changed, 8 insertions(+), 6 deletions(-) >> >> diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h >> index a564084c6141..7919ae4e27d8 100644 >> --- a/arch/x86/include/asm/pgtable-3level.h >> +++ b/arch/x86/include/asm/pgtable-3level.h >> @@ -2,6 +2,8 @@ >> #ifndef _ASM_X86_PGTABLE_3LEVEL_H >> #define _ASM_X86_PGTABLE_3LEVEL_H >> >> +#include <asm/atomic64_32.h> >> + >> /* >> * Intel Physical Address Extension (PAE) Mode - three-level page >> * tables on PPro+ CPUs. >> @@ -148,14 +150,14 @@ static inline void pud_clear(pud_t *pudp) >> #ifdef CONFIG_SMP >> static inline pte_t native_ptep_get_and_clear(pte_t *ptep) >> { >> - pte_t res; >> + union { >> + pte_t pte; >> + long long val; >> + } res; >> >> - /* xchg acts as a barrier before the setting of the high bits */ >> - res.pte_low = xchg(&ptep->pte_low, 0); >> - res.pte_high = ptep->pte_high; >> - ptep->pte_high = 0; >> + res.val = arch_atomic64_xchg((atomic64_t *)ptep, 0); > > Couldn't you just keep > > pte_t res; > > and do: > > res.pte = (pteval_t) arch_atomic64_xchg((atomic64_t *)ptep, 0); > > Hmm? Yes, got this suggestion already by Jan. I'm waiting with V2 until tomorrow to see whether someone has other complaints. Juergen ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2018-08-20 14:56 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-08-20 5:14 [PATCH 0/2] x86/xen: avoid 32-bit writes to PTEs in PV PAE guests Juergen Gross 2018-08-20 5:14 ` [PATCH 1/2] x86/xen: don't write ptes directly in 32-bit PV guests Juergen Gross 2018-08-20 8:38 ` [Xen-devel] " Jan Beulich 2018-08-20 5:14 ` [PATCH 2/2] x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear Juergen Gross 2018-08-20 8:40 ` [Xen-devel] " Jan Beulich 2018-08-20 8:56 ` Juergen Gross 2018-08-20 13:26 ` Thomas Gleixner 2018-08-20 14:55 ` Juergen Gross
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).