* [PATCH v2 0/9] Fixes to pagetable handling
@ 2017-03-16 16:31 Andrew Cooper
  2017-03-16 16:31 ` [PATCH v2 1/9] x86/cpuid: Sort cpu_has_* predicates by feature number Andrew Cooper
                   ` (8 more replies)
  0 siblings, 9 replies; 41+ messages in thread
From: Andrew Cooper @ 2017-03-16 16:31 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

This series has been a long time in preparation (i.e. most of the 4.8 and 4.9
dev cycles).  It started when I tried to make an XTF PoC for XSA-176, and
stumbled upon the _PAGE_PAGED aliasing issue (see patch 7), which caused my PoC
to be descheduled waiting for a (non-existent) paging agent to respond.

This series is built upon:
  1) The switch to using _Bool. I spent rather too long trying to debug why
     CR0.WP wasn't behaving properly, and it was down to static inline bool_t
     guest_wp_enabled().
  2) The series to switch emulation to using system-segment relative memory
     accesses (directly relevant to patches 1 and 2), which in turn resulted
     in the discovery of XSA-191.
  3) The CPUID improvement work to get maxphysaddr into a sensibly audited
     state, and a sensibly accessible location.

Patches 1-4 and 8-9 are new in v2, mostly items which were requested to be
split out during review of v1.

Andrew Cooper (9):
  x86/cpuid: Sort cpu_has_* predicates by feature number
  x86/pagewalk: Use pointer syntax for pfec parameter
  x86/shadow: Drop VALID_GFN()
  x86/pagewalk: Clean up guest_supports_* predicates
  x86/pagewalk: Helpers for reserved bit handling
  x86/pagewalk: Re-implement the pagetable walker
  x86/shadow: Use the pagewalk reserved bits helpers
  x86/pagewalk: Improve the logic behind setting access and dirty bits
  x86/pagewalk: non-functional cleanup

 xen/arch/x86/mm/guest_walk.c      | 574 +++++++++++++++++++++-----------------
 xen/arch/x86/mm/hap/guest_walk.c  |  33 +--
 xen/arch/x86/mm/hap/nested_ept.c  |   2 +-
 xen/arch/x86/mm/p2m.c             |  17 +-
 xen/arch/x86/mm/shadow/multi.c    |  99 ++++---
 xen/include/asm-x86/cpufeature.h  | 118 ++++----
 xen/include/asm-x86/guest_pt.h    | 215 +++++++++++---
 xen/include/asm-x86/hvm/hvm.h     |   4 -
 xen/include/asm-x86/p2m.h         |   2 +-
 xen/include/asm-x86/page.h        |   3 -
 xen/include/asm-x86/paging.h      |   7 +-
 xen/include/asm-x86/processor.h   |   2 +
 xen/include/asm-x86/x86_64/page.h |   6 -
 13 files changed, 642 insertions(+), 440 deletions(-)

-- 
2.1.4



* [PATCH v2 1/9] x86/cpuid: Sort cpu_has_* predicates by feature number
  2017-03-16 16:31 [PATCH v2 0/9] Fixes to pagetable handling Andrew Cooper
@ 2017-03-16 16:31 ` Andrew Cooper
  2017-03-17 16:08   ` Jan Beulich
  2017-03-16 16:31 ` [PATCH v2 2/9] x86/pagewalk: Use pointer syntax for pfec parameter Andrew Cooper
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 41+ messages in thread
From: Andrew Cooper @ 2017-03-16 16:31 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v2:
 * New
---
 xen/include/asm-x86/cpufeature.h | 117 ++++++++++++++++++++++-----------------
 1 file changed, 66 insertions(+), 51 deletions(-)

diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index b3d613f..5978783 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -29,65 +29,80 @@
 #define CPUID_PM_LEAF                    6
 #define CPUID6_ECX_APERFMPERF_CAPABILITY 0x1
 
-#define cpu_has_fpu		1
-#define cpu_has_de		1
-#define cpu_has_pse		1
-#define cpu_has_pge		1
-#define cpu_has_pat		1
-#define cpu_has_apic		boot_cpu_has(X86_FEATURE_APIC)
-#define cpu_has_sep		boot_cpu_has(X86_FEATURE_SEP)
-#define cpu_has_mtrr		1
-#define cpu_has_mmx		1
-#define cpu_has_sse3		boot_cpu_has(X86_FEATURE_SSE3)
-#define cpu_has_ssse3		boot_cpu_has(X86_FEATURE_SSSE3)
-#define cpu_has_sse4_1		boot_cpu_has(X86_FEATURE_SSE4_1)
-#define cpu_has_sse4_2		boot_cpu_has(X86_FEATURE_SSE4_2)
-#define cpu_has_pclmulqdq	boot_cpu_has(X86_FEATURE_PCLMULQDQ)
-#define cpu_has_popcnt		boot_cpu_has(X86_FEATURE_POPCNT)
-#define cpu_has_aesni		boot_cpu_has(X86_FEATURE_AESNI)
-#define cpu_has_htt		boot_cpu_has(X86_FEATURE_HTT)
-#define cpu_has_nx		boot_cpu_has(X86_FEATURE_NX)
-#define cpu_has_clflush		boot_cpu_has(X86_FEATURE_CLFLUSH)
-#define cpu_has_page1gb		boot_cpu_has(X86_FEATURE_PAGE1GB)
-#define cpu_has_fsgsbase	boot_cpu_has(X86_FEATURE_FSGSBASE)
-#define cpu_has_aperfmperf	boot_cpu_has(X86_FEATURE_APERFMPERF)
-#define cpu_has_smep            boot_cpu_has(X86_FEATURE_SMEP)
-#define cpu_has_smap            boot_cpu_has(X86_FEATURE_SMAP)
-#define cpu_has_fpu_sel         (!boot_cpu_has(X86_FEATURE_NO_FPU_SEL))
-#define cpu_has_ffxsr           ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) \
-                                 && boot_cpu_has(X86_FEATURE_FFXSR))
-#define cpu_has_x2apic          boot_cpu_has(X86_FEATURE_X2APIC)
+/* CPUID level 0x00000001.edx */
+#define cpu_has_fpu             1
+#define cpu_has_de              1
+#define cpu_has_pse             1
+#define cpu_has_apic            boot_cpu_has(X86_FEATURE_APIC)
+#define cpu_has_sep             boot_cpu_has(X86_FEATURE_SEP)
+#define cpu_has_mtrr            1
+#define cpu_has_pge             1
+#define cpu_has_pat             1
+#define cpu_has_clflush         boot_cpu_has(X86_FEATURE_CLFLUSH)
+#define cpu_has_mmx             1
+#define cpu_has_htt             boot_cpu_has(X86_FEATURE_HTT)
+
+/* CPUID level 0x00000001.ecx */
+#define cpu_has_sse3            boot_cpu_has(X86_FEATURE_SSE3)
+#define cpu_has_pclmulqdq       boot_cpu_has(X86_FEATURE_PCLMULQDQ)
+#define cpu_has_monitor         boot_cpu_has(X86_FEATURE_MONITOR)
+#define cpu_has_vmx             boot_cpu_has(X86_FEATURE_VMX)
+#define cpu_has_eist            boot_cpu_has(X86_FEATURE_EIST)
+#define cpu_has_ssse3           boot_cpu_has(X86_FEATURE_SSSE3)
+#define cpu_has_cx16            boot_cpu_has(X86_FEATURE_CX16)
+#define cpu_has_pdcm            boot_cpu_has(X86_FEATURE_PDCM)
 #define cpu_has_pcid            boot_cpu_has(X86_FEATURE_PCID)
+#define cpu_has_sse4_1          boot_cpu_has(X86_FEATURE_SSE4_1)
+#define cpu_has_sse4_2          boot_cpu_has(X86_FEATURE_SSE4_2)
+#define cpu_has_x2apic          boot_cpu_has(X86_FEATURE_X2APIC)
+#define cpu_has_popcnt          boot_cpu_has(X86_FEATURE_POPCNT)
+#define cpu_has_aesni           boot_cpu_has(X86_FEATURE_AESNI)
 #define cpu_has_xsave           boot_cpu_has(X86_FEATURE_XSAVE)
 #define cpu_has_avx             boot_cpu_has(X86_FEATURE_AVX)
+#define cpu_has_rdrand          boot_cpu_has(X86_FEATURE_RDRAND)
+#define cpu_has_hypervisor      boot_cpu_has(X86_FEATURE_HYPERVISOR)
+
+/* CPUID level 0x80000001.edx */
+#define cpu_has_nx              boot_cpu_has(X86_FEATURE_NX)
+#define cpu_has_ffxsr           ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) \
+                                 && boot_cpu_has(X86_FEATURE_FFXSR))
+#define cpu_has_page1gb         boot_cpu_has(X86_FEATURE_PAGE1GB)
+#define cpu_has_rdtscp          boot_cpu_has(X86_FEATURE_RDTSCP)
+
+/* CPUID level 0x80000001.ecx */
+#define cpu_has_cmp_legacy      boot_cpu_has(X86_FEATURE_CMP_LEGACY)
+#define cpu_has_svm             boot_cpu_has(X86_FEATURE_SVM)
+#define cpu_has_sse4a           boot_cpu_has(X86_FEATURE_SSE4A)
 #define cpu_has_lwp             boot_cpu_has(X86_FEATURE_LWP)
+#define cpu_has_tbm             boot_cpu_has(X86_FEATURE_TBM)
+
+/* CPUID level 0x0000000D:1.eax */
+#define cpu_has_xsaveopt        boot_cpu_has(X86_FEATURE_XSAVEOPT)
+#define cpu_has_xsavec          boot_cpu_has(X86_FEATURE_XSAVEC)
+#define cpu_has_xgetbv1         boot_cpu_has(X86_FEATURE_XGETBV1)
+#define cpu_has_xsaves          boot_cpu_has(X86_FEATURE_XSAVES)
+
+/* CPUID level 0x00000007:0.ebx */
+#define cpu_has_fsgsbase        boot_cpu_has(X86_FEATURE_FSGSBASE)
 #define cpu_has_bmi1            boot_cpu_has(X86_FEATURE_BMI1)
+#define cpu_has_hle             boot_cpu_has(X86_FEATURE_HLE)
+#define cpu_has_avx2            boot_cpu_has(X86_FEATURE_AVX2)
+#define cpu_has_smep            boot_cpu_has(X86_FEATURE_SMEP)
 #define cpu_has_bmi2            boot_cpu_has(X86_FEATURE_BMI2)
+#define cpu_has_rtm             boot_cpu_has(X86_FEATURE_RTM)
+#define cpu_has_fpu_sel         (!boot_cpu_has(X86_FEATURE_NO_FPU_SEL))
 #define cpu_has_mpx             boot_cpu_has(X86_FEATURE_MPX)
+#define cpu_has_rdseed          boot_cpu_has(X86_FEATURE_RDSEED)
+#define cpu_has_smap            boot_cpu_has(X86_FEATURE_SMAP)
+#define cpu_has_sha             boot_cpu_has(X86_FEATURE_SHA)
+
+/* CPUID level 0x80000007.edx */
+#define cpu_has_itsc            boot_cpu_has(X86_FEATURE_ITSC)
+
+/* Synthesized. */
 #define cpu_has_arch_perfmon    boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
-#define cpu_has_rdtscp          boot_cpu_has(X86_FEATURE_RDTSCP)
-#define cpu_has_svm		boot_cpu_has(X86_FEATURE_SVM)
-#define cpu_has_vmx		boot_cpu_has(X86_FEATURE_VMX)
-#define cpu_has_cpuid_faulting	boot_cpu_has(X86_FEATURE_CPUID_FAULTING)
-#define cpu_has_cx16            boot_cpu_has(X86_FEATURE_CX16)
-#define cpu_has_xsaveopt	boot_cpu_has(X86_FEATURE_XSAVEOPT)
-#define cpu_has_xsavec		boot_cpu_has(X86_FEATURE_XSAVEC)
-#define cpu_has_xgetbv1		boot_cpu_has(X86_FEATURE_XGETBV1)
-#define cpu_has_xsaves		boot_cpu_has(X86_FEATURE_XSAVES)
-#define cpu_has_avx2		boot_cpu_has(X86_FEATURE_AVX2)
-#define cpu_has_monitor		boot_cpu_has(X86_FEATURE_MONITOR)
-#define cpu_has_eist		boot_cpu_has(X86_FEATURE_EIST)
-#define cpu_has_hypervisor	boot_cpu_has(X86_FEATURE_HYPERVISOR)
-#define cpu_has_rdrand		boot_cpu_has(X86_FEATURE_RDRAND)
-#define cpu_has_rdseed		boot_cpu_has(X86_FEATURE_RDSEED)
-#define cpu_has_cmp_legacy	boot_cpu_has(X86_FEATURE_CMP_LEGACY)
-#define cpu_has_sse4a		boot_cpu_has(X86_FEATURE_SSE4A)
-#define cpu_has_tbm		boot_cpu_has(X86_FEATURE_TBM)
-#define cpu_has_itsc		boot_cpu_has(X86_FEATURE_ITSC)
-#define cpu_has_hle		boot_cpu_has(X86_FEATURE_HLE)
-#define cpu_has_rtm		boot_cpu_has(X86_FEATURE_RTM)
-#define cpu_has_pdcm		boot_cpu_has(X86_FEATURE_PDCM)
-#define cpu_has_sha		boot_cpu_has(X86_FEATURE_SHA)
+#define cpu_has_cpuid_faulting  boot_cpu_has(X86_FEATURE_CPUID_FAULTING)
+#define cpu_has_aperfmperf      boot_cpu_has(X86_FEATURE_APERFMPERF)
 
 enum _cache_type {
     CACHE_TYPE_NULL = 0,
-- 
2.1.4



* [PATCH v2 2/9] x86/pagewalk: Use pointer syntax for pfec parameter
  2017-03-16 16:31 [PATCH v2 0/9] Fixes to pagetable handling Andrew Cooper
  2017-03-16 16:31 ` [PATCH v2 1/9] x86/cpuid: Sort cpu_has_* predicates by feature number Andrew Cooper
@ 2017-03-16 16:31 ` Andrew Cooper
  2017-03-17 16:09   ` Jan Beulich
                     ` (2 more replies)
  2017-03-16 16:31 ` [PATCH v2 3/9] x86/shadow: Drop VALID_GFN() Andrew Cooper
                   ` (6 subsequent siblings)
  8 siblings, 3 replies; 41+ messages in thread
From: Andrew Cooper @ 2017-03-16 16:31 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Jan Beulich

It is a pointer, not an array.

No functional change.
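
For clarity, a minimal sketch of the equivalence (PFEC_page_paged's value here
is illustrative rather than Xen's definition):

    #include <stdint.h>

    #define PFEC_page_paged (1u << 5)   /* illustrative value only */

    /* For a parameter declared uint32_t *pfec, both forms name the same object. */
    static void set_paged(uint32_t *pfec)
    {
        pfec[0] = PFEC_page_paged;  /* array syntax applied to a pointer */
        *pfec   = PFEC_page_paged;  /* equivalent pointer syntax, used hereafter */
    }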

Requested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: George Dunlap <george.dunlap@eu.citrix.com>

v2:
 * New
---
 xen/arch/x86/mm/hap/guest_walk.c | 24 ++++++++++++------------
 xen/arch/x86/mm/shadow/multi.c   | 12 ++++++------
 xen/include/asm-x86/paging.h     |  7 ++++---
 3 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/mm/hap/guest_walk.c b/xen/arch/x86/mm/hap/guest_walk.c
index 313f82f..e202c9a 100644
--- a/xen/arch/x86/mm/hap/guest_walk.c
+++ b/xen/arch/x86/mm/hap/guest_walk.c
@@ -65,7 +65,7 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
     if ( p2m_is_paging(p2mt) )
     {
         ASSERT(p2m_is_hostp2m(p2m));
-        pfec[0] = PFEC_page_paged;
+        *pfec = PFEC_page_paged;
         if ( top_page )
             put_page(top_page);
         p2m_mem_paging_populate(p2m->domain, cr3 >> PAGE_SHIFT);
@@ -73,14 +73,14 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
     }
     if ( p2m_is_shared(p2mt) )
     {
-        pfec[0] = PFEC_page_shared;
+        *pfec = PFEC_page_shared;
         if ( top_page )
             put_page(top_page);
         return gfn_x(INVALID_GFN);
     }
     if ( !top_page )
     {
-        pfec[0] &= ~PFEC_page_present;
+        *pfec &= ~PFEC_page_present;
         goto out_tweak_pfec;
     }
     top_mfn = _mfn(page_to_mfn(top_page));
@@ -91,7 +91,7 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
 #if GUEST_PAGING_LEVELS == 3
     top_map += (cr3 & ~(PAGE_MASK | 31));
 #endif
-    missing = guest_walk_tables(v, p2m, ga, &gw, pfec[0], top_mfn, top_map);
+    missing = guest_walk_tables(v, p2m, ga, &gw, *pfec, top_mfn, top_map);
     unmap_domain_page(top_map);
     put_page(top_page);
 
@@ -107,13 +107,13 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
         if ( p2m_is_paging(p2mt) )
         {
             ASSERT(p2m_is_hostp2m(p2m));
-            pfec[0] = PFEC_page_paged;
+            *pfec = PFEC_page_paged;
             p2m_mem_paging_populate(p2m->domain, gfn_x(gfn));
             return gfn_x(INVALID_GFN);
         }
         if ( p2m_is_shared(p2mt) )
         {
-            pfec[0] = PFEC_page_shared;
+            *pfec = PFEC_page_shared;
             return gfn_x(INVALID_GFN);
         }
 
@@ -124,19 +124,19 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
     }
 
     if ( missing & _PAGE_PRESENT )
-        pfec[0] &= ~PFEC_page_present;
+        *pfec &= ~PFEC_page_present;
 
     if ( missing & _PAGE_INVALID_BITS ) 
-        pfec[0] |= PFEC_reserved_bit;
+        *pfec |= PFEC_reserved_bit;
 
     if ( missing & _PAGE_PKEY_BITS )
-        pfec[0] |= PFEC_prot_key;
+        *pfec |= PFEC_prot_key;
 
     if ( missing & _PAGE_PAGED )
-        pfec[0] = PFEC_page_paged;
+        *pfec = PFEC_page_paged;
 
     if ( missing & _PAGE_SHARED )
-        pfec[0] = PFEC_page_shared;
+        *pfec = PFEC_page_shared;
 
  out_tweak_pfec:
     /*
@@ -144,7 +144,7 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
      * The PFEC_insn_fetch flag is set only when NX or SMEP are enabled.
      */
     if ( !hvm_nx_enabled(v) && !hvm_smep_enabled(v) )
-        pfec[0] &= ~PFEC_insn_fetch;
+        *pfec &= ~PFEC_insn_fetch;
 
     return gfn_x(INVALID_GFN);
 }
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 7ea9d81..d9bf212 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3727,30 +3727,30 @@ sh_gva_to_gfn(struct vcpu *v, struct p2m_domain *p2m,
 
 #if (SHADOW_OPTIMIZATIONS & SHOPT_VIRTUAL_TLB)
     /* Check the vTLB cache first */
-    unsigned long vtlb_gfn = vtlb_lookup(v, va, pfec[0]);
+    unsigned long vtlb_gfn = vtlb_lookup(v, va, *pfec);
     if ( VALID_GFN(vtlb_gfn) )
         return vtlb_gfn;
 #endif /* (SHADOW_OPTIMIZATIONS & SHOPT_VIRTUAL_TLB) */
 
-    if ( (missing = sh_walk_guest_tables(v, va, &gw, pfec[0])) != 0 )
+    if ( (missing = sh_walk_guest_tables(v, va, &gw, *pfec)) != 0 )
     {
         if ( (missing & _PAGE_PRESENT) )
-            pfec[0] &= ~PFEC_page_present;
+            *pfec &= ~PFEC_page_present;
         if ( missing & _PAGE_INVALID_BITS )
-            pfec[0] |= PFEC_reserved_bit;
+            *pfec |= PFEC_reserved_bit;
         /*
          * SDM Intel 64 Volume 3, Chapter Paging, PAGE-FAULT EXCEPTIONS:
          * The PFEC_insn_fetch flag is set only when NX or SMEP are enabled.
          */
         if ( is_hvm_vcpu(v) && !hvm_nx_enabled(v) && !hvm_smep_enabled(v) )
-            pfec[0] &= ~PFEC_insn_fetch;
+            *pfec &= ~PFEC_insn_fetch;
         return gfn_x(INVALID_GFN);
     }
     gfn = guest_walk_to_gfn(&gw);
 
 #if (SHADOW_OPTIMIZATIONS & SHOPT_VIRTUAL_TLB)
     /* Remember this successful VA->GFN translation for later. */
-    vtlb_insert(v, va >> PAGE_SHIFT, gfn_x(gfn), pfec[0]);
+    vtlb_insert(v, va >> PAGE_SHIFT, gfn_x(gfn), *pfec);
 #endif /* (SHADOW_OPTIMIZATIONS & SHOPT_VIRTUAL_TLB) */
 
     return gfn_x(gfn);
diff --git a/xen/include/asm-x86/paging.h b/xen/include/asm-x86/paging.h
index 2f5befc..f262c9e 100644
--- a/xen/include/asm-x86/paging.h
+++ b/xen/include/asm-x86/paging.h
@@ -242,12 +242,13 @@ paging_fault(unsigned long va, struct cpu_user_regs *regs)
 /* Handle invlpg requests on vcpus. */
 void paging_invlpg(struct vcpu *v, unsigned long va);
 
-/* Translate a guest virtual address to the frame number that the
+/*
+ * Translate a guest virtual address to the frame number that the
  * *guest* pagetables would map it to.  Returns INVALID_GFN if the guest
  * tables don't map this address for this kind of access.
- * pfec[0] is used to determine which kind of access this is when
+ * *pfec is used to determine which kind of access this is when
  * walking the tables.  The caller should set the PFEC_page_present bit
- * in pfec[0]; in the failure case, that bit will be cleared if appropriate.
+ * in *pfec; in the failure case, that bit will be cleared if appropriate.
  *
  * SDM Intel 64 Volume 3, Chapter Paging, PAGE-FAULT EXCEPTIONS:
  * The PFEC_insn_fetch flag is set only when NX or SMEP are enabled.
-- 
2.1.4



* [PATCH v2 3/9] x86/shadow: Drop VALID_GFN()
  2017-03-16 16:31 [PATCH v2 0/9] Fixes to pagetable handling Andrew Cooper
  2017-03-16 16:31 ` [PATCH v2 1/9] x86/cpuid: Sort cpu_has_* predicates by feature number Andrew Cooper
  2017-03-16 16:31 ` [PATCH v2 2/9] x86/pagewalk: Use pointer syntax for pfec parameter Andrew Cooper
@ 2017-03-16 16:31 ` Andrew Cooper
  2017-03-23 16:30   ` Tim Deegan
  2017-03-16 16:31 ` [PATCH v2 4/9] x86/pagewalk: Clean up guest_supports_* predicates Andrew Cooper
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 41+ messages in thread
From: Andrew Cooper @ 2017-03-16 16:31 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Tim Deegan

There is only a single user of VALID_GFN().  Inline the macro to remove the
added layer of indirection in sh_gva_to_gfn().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Tim Deegan <tim@xen.org>

v2:
 * New
---
 xen/arch/x86/mm/shadow/multi.c | 2 +-
 xen/include/asm-x86/guest_pt.h | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index d9bf212..63c7ab5 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3728,7 +3728,7 @@ sh_gva_to_gfn(struct vcpu *v, struct p2m_domain *p2m,
 #if (SHADOW_OPTIMIZATIONS & SHOPT_VIRTUAL_TLB)
     /* Check the vTLB cache first */
     unsigned long vtlb_gfn = vtlb_lookup(v, va, *pfec);
-    if ( VALID_GFN(vtlb_gfn) )
+    if ( vtlb_gfn != gfn_x(INVALID_GFN) )
         return vtlb_gfn;
 #endif /* (SHADOW_OPTIMIZATIONS & SHOPT_VIRTUAL_TLB) */
 
diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h
index 6a06ba0..bedc771 100644
--- a/xen/include/asm-x86/guest_pt.h
+++ b/xen/include/asm-x86/guest_pt.h
@@ -32,8 +32,6 @@
 #error GUEST_PAGING_LEVELS not defined
 #endif
 
-#define VALID_GFN(m) (m != gfn_x(INVALID_GFN))
-
 static inline paddr_t
 gfn_to_paddr(gfn_t gfn)
 {
-- 
2.1.4



* [PATCH v2 4/9] x86/pagewalk: Clean up guest_supports_* predicates
  2017-03-16 16:31 [PATCH v2 0/9] Fixes to pagetable handling Andrew Cooper
                   ` (2 preceding siblings ...)
  2017-03-16 16:31 ` [PATCH v2 3/9] x86/shadow: Drop VALID_GFN() Andrew Cooper
@ 2017-03-16 16:31 ` Andrew Cooper
  2017-03-20  8:45   ` Jan Beulich
  2017-03-23 16:34   ` Tim Deegan
  2017-03-16 16:31 ` [PATCH v2 5/9] x86/pagewalk: Helpers for reserved bit handling Andrew Cooper
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 41+ messages in thread
From: Andrew Cooper @ 2017-03-16 16:31 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Jan Beulich

Switch them to returning bool, and taking const parameters.

Rename guest_supports_superpages() to guest_supports_l2_superpages() to
indicate which level of pagetables it is actually referring to, and rename
guest_supports_1G_superpages() to guest_supports_l3_superpages() for
consistency.
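
For reference, the superpage sizes the new names refer to (a standalone sketch
derived from the GUEST_Lx_PAGETABLE_SHIFT values, not code from this series):

    #include <stdio.h>

    int main(void)
    {
        /* An L2 leaf covers 4M for 2-level guests and 2M for PAE/long mode. */
        printf("L2 superpage (2-level): %lu KiB\n", (1UL << 22) >> 10);
        printf("L2 superpage (PAE/long mode): %lu KiB\n", (1UL << 21) >> 10);

        /* An L3 leaf (a 1G superpage) only exists for 4-level guests. */
        printf("L3 superpage (long mode): %lu KiB\n", (1UL << 30) >> 10);
        return 0;
    }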

guest_supports_l3_superpages() reflects a static property of the domain, rather
than any control register settings, so it is switched to take a domain pointer.
hvm_pse1gb_supported() is inlined into its sole user because it isn't strictly
hvm-specific (it is hap-specific) and really should be beside a comment
explaining why the cpuid policy is ignored.

While cleaning up part of the file, clean up all trailing whitespace, and fix
one comment which accidentally referred to PG living in CR4 rather than CR0.

Requested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: George Dunlap <george.dunlap@eu.citrix.com>

v2:
 * New
---
 xen/arch/x86/mm/guest_walk.c   |  4 +--
 xen/arch/x86/mm/shadow/multi.c | 10 +++----
 xen/include/asm-x86/guest_pt.h | 61 ++++++++++++++++++++++++------------------
 xen/include/asm-x86/hvm/hvm.h  |  4 ---
 4 files changed, 42 insertions(+), 37 deletions(-)

diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 8187226..11f4c5d 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -272,7 +272,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
             /* _PAGE_PSE_PAT not set: remove _PAGE_PAT from flags. */
             flags &= ~_PAGE_PAT;
 
-        if ( !guest_supports_1G_superpages(v) )
+        if ( !guest_supports_l3_superpages(d) )
             rc |= _PAGE_PSE | _PAGE_INVALID_BIT;
         if ( gfn_x(start) & GUEST_L3_GFN_MASK & ~0x1 )
             rc |= _PAGE_INVALID_BITS;
@@ -326,7 +326,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
     }
     rc |= ((gflags & mflags) ^ mflags);
 
-    pse2M = (gflags & _PAGE_PSE) && guest_supports_superpages(v); 
+    pse2M = (gflags & _PAGE_PSE) && guest_supports_l2_superpages(v);
 
     if ( pse2M )
     {
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 63c7ab5..52cef46 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -238,7 +238,7 @@ shadow_check_gwalk(struct vcpu *v, unsigned long va, walk_t *gw, int version)
     l2p = (guest_l2e_t *)v->arch.paging.shadow.guest_vtable;
     mismatch |= (gw->l2e.l2 != l2p[guest_l2_table_offset(va)].l2);
 #endif
-    if ( !(guest_supports_superpages(v) &&
+    if ( !(guest_supports_l2_superpages(v) &&
            (guest_l2e_get_flags(gw->l2e) & _PAGE_PSE)) )
     {
         l1p = map_domain_page(gw->l1mfn);
@@ -310,7 +310,7 @@ gw_remove_write_accesses(struct vcpu *v, unsigned long va, walk_t *gw)
         rc |= GW_RMWR_FLUSHTLB;
 #endif /* GUEST_PAGING_LEVELS >= 3 */
 
-    if ( !(guest_supports_superpages(v) &&
+    if ( !(guest_supports_l2_superpages(v) &&
            (guest_l2e_get_flags(gw->l2e) & _PAGE_PSE))
 #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
          && !mfn_is_out_of_sync(gw->l1mfn)
@@ -660,7 +660,7 @@ _sh_propagate(struct vcpu *v,
     if ( unlikely(((level == 1) ||
                    ((level == 2) &&
                     (gflags & _PAGE_PSE) &&
-                    guest_supports_superpages(v)))
+                    guest_supports_l2_superpages(v)))
                   && !(gflags & _PAGE_DIRTY)) )
         sflags &= ~_PAGE_RW;
 
@@ -1846,7 +1846,7 @@ static shadow_l1e_t * shadow_get_and_create_l1e(struct vcpu *v,
         /* No l1 shadow installed: find and install it. */
         if ( !(flags & _PAGE_PRESENT) )
             return NULL; /* No guest page. */
-        if ( guest_supports_superpages(v) && (flags & _PAGE_PSE) )
+        if ( guest_supports_l2_superpages(v) && (flags & _PAGE_PSE) )
         {
             /* Splintering a superpage */
             gfn_t l2gfn = guest_l2e_get_gfn(gw->l2e);
@@ -2251,7 +2251,7 @@ static int validate_gl2e(struct vcpu *v, void *new_ge, mfn_t sl2mfn, void *se)
     if ( guest_l2e_get_flags(new_gl2e) & _PAGE_PRESENT )
     {
         gfn_t gl1gfn = guest_l2e_get_gfn(new_gl2e);
-        if ( guest_supports_superpages(v) &&
+        if ( guest_supports_l2_superpages(v) &&
              (guest_l2e_get_flags(new_gl2e) & _PAGE_PSE) )
         {
             // superpage -- need to look up the shadow L1 which holds the
diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h
index bedc771..43061b7 100644
--- a/xen/include/asm-x86/guest_pt.h
+++ b/xen/include/asm-x86/guest_pt.h
@@ -2,7 +2,7 @@
  * xen/asm-x86/guest_pt.h
  *
  * Types and accessors for guest pagetable entries, as distinct from
- * Xen's pagetable types. 
+ * Xen's pagetable types.
  *
  * Users must #define GUEST_PAGING_LEVELS to 2, 3 or 4 before including
  * this file.
@@ -10,17 +10,17 @@
  * Parts of this code are Copyright (c) 2006 by XenSource Inc.
  * Parts of this code are Copyright (c) 2006 by Michael A Fetterman
  * Parts based on earlier work by Michael A Fetterman, Ian Pratt et al.
- * 
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
  * (at your option) any later version.
- * 
+ *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
- * 
+ *
  * You should have received a copy of the GNU General Public License
  * along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
@@ -168,27 +168,34 @@ static inline guest_l4e_t guest_l4e_from_gfn(gfn_t gfn, u32 flags)
 
 /* Which pagetable features are supported on this vcpu? */
 
-static inline int
-guest_supports_superpages(struct vcpu *v)
+static inline bool guest_supports_l2_superpages(const struct vcpu *v)
 {
-    /* The _PAGE_PSE bit must be honoured in HVM guests, whenever
-     * CR4.PSE is set or the guest is in PAE or long mode. 
-     * It's also used in the dummy PT for vcpus with CR4.PG cleared. */
+    /*
+     * The _PAGE_PSE bit must be honoured in HVM guests, whenever
+     * CR4.PSE is set or the guest is in PAE or long mode.
+     * It's also used in the dummy PT for vcpus with CR0.PG cleared.
+     */
     return (is_pv_vcpu(v)
             ? opt_allow_superpage
-            : (GUEST_PAGING_LEVELS != 2 
+            : (GUEST_PAGING_LEVELS != 2
                || !hvm_paging_enabled(v)
                || (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PSE)));
 }
 
-static inline int
-guest_supports_1G_superpages(struct vcpu *v)
+static inline bool guest_supports_l3_superpages(const struct domain *d)
 {
-    return (GUEST_PAGING_LEVELS >= 4 && hvm_pse1gb_supported(v->domain));
+    /*
+     * There are no control register settings for the hardware pagewalk on the
+     * subject of 1G superpages.
+     *
+     * If the guest constructs a 1GB superpage on capable hardware, it will
+     * function irrespective of whether the feature is advertised.  Xen's
+     * model of performing a pagewalk should match.
+     */
+    return GUEST_PAGING_LEVELS >= 4 && paging_mode_hap(d) && cpu_has_page1gb;
 }
 
-static inline int
-guest_supports_nx(struct vcpu *v)
+static inline bool guest_supports_nx(const struct vcpu *v)
 {
     if ( GUEST_PAGING_LEVELS == 2 || !cpu_has_nx )
         return 0;
@@ -258,11 +265,11 @@ static inline paddr_t guest_walk_to_gpa(const walk_t *gw)
     return (gfn_x(gfn) << PAGE_SHIFT) | (gw->va & ~PAGE_MASK);
 }
 
-/* Given a walk_t from a successful walk, return the page-order of the 
+/* Given a walk_t from a successful walk, return the page-order of the
  * page or superpage that the virtual address is in. */
 static inline unsigned int guest_walk_to_page_order(const walk_t *gw)
 {
-    /* This is only valid for successful walks - otherwise the 
+    /* This is only valid for successful walks - otherwise the
      * PSE bits might be invalid. */
     ASSERT(guest_l1e_get_flags(gw->l1e) & _PAGE_PRESENT);
 #if GUEST_PAGING_LEVELS >= 3
@@ -275,28 +282,30 @@ static inline unsigned int guest_walk_to_page_order(const walk_t *gw)
 }
 
 
-/* Walk the guest pagetables, after the manner of a hardware walker. 
+/*
+ * Walk the guest pagetables, after the manner of a hardware walker.
  *
- * Inputs: a vcpu, a virtual address, a walk_t to fill, a 
- *         pointer to a pagefault code, the MFN of the guest's 
- *         top-level pagetable, and a mapping of the 
+ * Inputs: a vcpu, a virtual address, a walk_t to fill, a
+ *         pointer to a pagefault code, the MFN of the guest's
+ *         top-level pagetable, and a mapping of the
  *         guest's top-level pagetable.
- * 
+ *
  * We walk the vcpu's guest pagetables, filling the walk_t with what we
  * see and adding any Accessed and Dirty bits that are needed in the
  * guest entries.  Using the pagefault code, we check the permissions as
  * we go.  For the purposes of reading pagetables we treat all non-RAM
  * memory as contining zeroes.
- * 
- * Returns 0 for success, or the set of permission bits that we failed on 
- * if the walk did not complete. */
+ *
+ * Returns 0 for success, or the set of permission bits that we failed on
+ * if the walk did not complete.
+ */
 
 /* Macro-fu so you can call guest_walk_tables() and get the right one. */
 #define GPT_RENAME2(_n, _l) _n ## _ ## _l ## _levels
 #define GPT_RENAME(_n, _l) GPT_RENAME2(_n, _l)
 #define guest_walk_tables GPT_RENAME(guest_walk_tables, GUEST_PAGING_LEVELS)
 
-extern uint32_t 
+extern uint32_t
 guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m, unsigned long va,
                   walk_t *gw, uint32_t pfec, mfn_t top_mfn, void *top_map);
 
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 2b4e328..3f81a5a 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -300,10 +300,6 @@ int hvm_girq_dest_2_vcpu_id(struct domain *d, uint8_t dest, uint8_t dest_mode);
 #define hap_has_1gb (!!(hvm_funcs.hap_capabilities & HVM_HAP_SUPERPAGE_1GB))
 #define hap_has_2mb (!!(hvm_funcs.hap_capabilities & HVM_HAP_SUPERPAGE_2MB))
 
-/* Can the guest use 1GB superpages in its own pagetables? */
-#define hvm_pse1gb_supported(d) \
-    (cpu_has_page1gb && paging_mode_hap(d))
-
 #define hvm_long_mode_enabled(v) \
     ((v)->arch.hvm_vcpu.guest_efer & EFER_LMA)
 
-- 
2.1.4



* [PATCH v2 5/9] x86/pagewalk: Helpers for reserved bit handling
  2017-03-16 16:31 [PATCH v2 0/9] Fixes to pagetable handling Andrew Cooper
                   ` (3 preceding siblings ...)
  2017-03-16 16:31 ` [PATCH v2 4/9] x86/pagewalk: Clean up guest_supports_* predicates Andrew Cooper
@ 2017-03-16 16:31 ` Andrew Cooper
  2017-03-20  8:48   ` Jan Beulich
  2017-03-23 16:55   ` Tim Deegan
  2017-03-16 16:31 ` [PATCH v2 6/9] x86/pagewalk: Re-implement the pagetable walker Andrew Cooper
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 41+ messages in thread
From: Andrew Cooper @ 2017-03-16 16:31 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Jan Beulich

Some bits are unconditionally reserved in pagetable entries, or reserved
because of alignment restrictions.  Other bits are reserved because of control
register configuration.

Introduce helpers which take an individual vcpu and guest pagetable entry, and
calculate whether any reserved bits are set.
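
As an illustration of the address-width part of that calculation, a standalone
sketch of the arithmetic guest_rsvd_bits() performs (PADDR_BITS is 52 as in
Xen; a maxphysaddr of 36 is just an example value):

    #include <stdint.h>
    #include <stdio.h>

    #define PADDR_BITS 52
    #define PADDR_MASK ((1ULL << PADDR_BITS) - 1)

    int main(void)
    {
        unsigned int maxphysaddr = 36;  /* example CPUID-reported width */

        /* Address bits from maxphysaddr up to bit 51 are reserved in every PTE. */
        uint64_t rsvd = PADDR_MASK & ~((1ULL << maxphysaddr) - 1);

        printf("reserved address mask: %#llx\n", (unsigned long long)rsvd);
        /* Prints 0xffff000000000, i.e. bits 36-51 set. */
        return 0;
    }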

While here, add a couple of newlines to aid readability.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
CC: Tim Deegan <tim@xen.org>

v2:
 * Cleanup split out to earlier patches.
 * Switch guest_supports_pse36() to take a domain, and inherit hardware's pse36
   setting (see the sketch below).
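
A standalone sketch of what the new fold_pse36()/unfold_pse36() helpers do with
a PSE36 superpage mapping (the example L2E value is made up):

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Copies of the helpers added to guest_pt.h below. */
    static inline uint32_t fold_pse36(uint64_t val)
    {
        return (val & ~(0x1ffULL << 13)) | ((val & (0x1ffULL << 32)) >> (32 - 13));
    }
    static inline uint64_t unfold_pse36(uint32_t val)
    {
        return (val & ~(0x1ffULL << 13)) | ((val & (0x1ffULL << 13)) << (32 - 13));
    }

    int main(void)
    {
        /*
         * Example PSE36 L2E: bits 22-31 hold address bits 22-31 of the 4M
         * frame, and bits 13 upwards hold the address bits above bit 31.
         * Flags: Present, RW, Accessed, Dirty, PSE.
         */
        uint32_t l2e = (1u << 31) | (1u << 14) | 0xe3;

        /* Superpage base is 0x280000000: bit 31, plus bit 14 unfolded to bit 33. */
        printf("base = %#llx\n",
               (unsigned long long)(unfold_pse36(l2e) & ~((1ULL << 22) - 1)));

        /* Folding the unfolded value reproduces the original entry. */
        assert(fold_pse36(unfold_pse36(l2e)) == l2e);
        return 0;
    }
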
---
 xen/include/asm-x86/cpufeature.h |  1 +
 xen/include/asm-x86/guest_pt.h   | 96 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 97 insertions(+)

diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 5978783..84cc51d 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -38,6 +38,7 @@
 #define cpu_has_mtrr            1
 #define cpu_has_pge             1
 #define cpu_has_pat             1
+#define cpu_has_pse36           boot_cpu_has(X86_FEATURE_PSE36)
 #define cpu_has_clflush         boot_cpu_has(X86_FEATURE_CLFLUSH)
 #define cpu_has_mmx             1
 #define cpu_has_htt             boot_cpu_has(X86_FEATURE_HTT)
diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h
index 43061b7..4e5018d 100644
--- a/xen/include/asm-x86/guest_pt.h
+++ b/xen/include/asm-x86/guest_pt.h
@@ -42,6 +42,18 @@ gfn_to_paddr(gfn_t gfn)
 #undef get_gfn
 #define get_gfn(d, g, t) get_gfn_type((d), gfn_x(g), (t), P2M_ALLOC)
 
+/* Mask covering the reserved bits from superpage alignment. */
+#define SUPERPAGE_RSVD(bit)                                             \
+    (((1ULL << (bit)) - 1) & ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)))
+
+static inline uint32_t fold_pse36(uint64_t val)
+{
+    return (val & ~(0x1ffULL << 13)) | ((val & (0x1ffULL << 32)) >> (32 - 13));
+}
+static inline uint64_t unfold_pse36(uint32_t val)
+{
+    return (val & ~(0x1ffULL << 13)) | ((val & (0x1ffULL << 13)) << (32 - 13));
+}
 
 /* Types of the guest's page tables and access functions for them */
 
@@ -49,9 +61,13 @@ gfn_to_paddr(gfn_t gfn)
 
 #define GUEST_L1_PAGETABLE_ENTRIES     1024
 #define GUEST_L2_PAGETABLE_ENTRIES     1024
+
 #define GUEST_L1_PAGETABLE_SHIFT         12
 #define GUEST_L2_PAGETABLE_SHIFT         22
 
+#define GUEST_L1_PAGETABLE_RSVD           0
+#define GUEST_L2_PAGETABLE_RSVD           0
+
 typedef uint32_t guest_intpte_t;
 typedef struct { guest_intpte_t l1; } guest_l1e_t;
 typedef struct { guest_intpte_t l2; } guest_l2e_t;
@@ -86,21 +102,39 @@ static inline guest_l2e_t guest_l2e_from_gfn(gfn_t gfn, u32 flags)
 #else /* GUEST_PAGING_LEVELS != 2 */
 
 #if GUEST_PAGING_LEVELS == 3
+
 #define GUEST_L1_PAGETABLE_ENTRIES      512
 #define GUEST_L2_PAGETABLE_ENTRIES      512
 #define GUEST_L3_PAGETABLE_ENTRIES        4
+
 #define GUEST_L1_PAGETABLE_SHIFT         12
 #define GUEST_L2_PAGETABLE_SHIFT         21
 #define GUEST_L3_PAGETABLE_SHIFT         30
+
+#define GUEST_L1_PAGETABLE_RSVD            0x7ff0000000000000UL
+#define GUEST_L2_PAGETABLE_RSVD            0x7ff0000000000000UL
+#define GUEST_L3_PAGETABLE_RSVD                                      \
+    (0xfff0000000000000UL | _PAGE_GLOBAL | _PAGE_PSE | _PAGE_DIRTY | \
+     _PAGE_ACCESSED | _PAGE_USER | _PAGE_RW)
+
 #else /* GUEST_PAGING_LEVELS == 4 */
+
 #define GUEST_L1_PAGETABLE_ENTRIES      512
 #define GUEST_L2_PAGETABLE_ENTRIES      512
 #define GUEST_L3_PAGETABLE_ENTRIES      512
 #define GUEST_L4_PAGETABLE_ENTRIES      512
+
 #define GUEST_L1_PAGETABLE_SHIFT         12
 #define GUEST_L2_PAGETABLE_SHIFT         21
 #define GUEST_L3_PAGETABLE_SHIFT         30
 #define GUEST_L4_PAGETABLE_SHIFT         39
+
+#define GUEST_L1_PAGETABLE_RSVD            0
+#define GUEST_L2_PAGETABLE_RSVD            0
+#define GUEST_L3_PAGETABLE_RSVD            0
+/* NB L4e._PAGE_GLOBAL is reserved for AMD, but ignored for Intel. */
+#define GUEST_L4_PAGETABLE_RSVD            _PAGE_PSE
+
 #endif
 
 typedef l1_pgentry_t guest_l1e_t;
@@ -182,6 +216,21 @@ static inline bool guest_supports_l2_superpages(const struct vcpu *v)
                || (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PSE)));
 }
 
+static inline bool guest_supports_pse36(const struct domain *d)
+{
+    /*
+     * Only called in the context of 2-level guests, after
+     * guest_supports_l2_superpages() has indicated true.
+     *
+     * Once L2 superpages are active, there are no control register settings
+     * for the hardware pagewalk on the subject of PSE36.  If the guest
+     * constructs a PSE36 superpage on capable hardware, it will function
+     * irrespective of whether the feature is advertised.  Xen's model of
+     * performing a pagewalk should match.
+     */
+    return paging_mode_hap(d) && cpu_has_pse36;
+}
+
 static inline bool guest_supports_l3_superpages(const struct domain *d)
 {
     /*
@@ -218,6 +267,53 @@ static inline bool guest_supports_nx(const struct vcpu *v)
 #define _PAGE_INVALID_BITS _PAGE_INVALID_BIT
 #endif
 
+/* Helpers for identifying whether guest entries have reserved bits set. */
+
+/* Bits reserved because of maxphysaddr, and (lack of) EFER.NX */
+static inline uint64_t guest_rsvd_bits(const struct vcpu *v)
+{
+    return ((PADDR_MASK &
+             ~((1UL << v->domain->arch.cpuid->extd.maxphysaddr) - 1)) |
+            (guest_supports_nx(v) ? 0 : put_pte_flags(_PAGE_NX_BIT)));
+}
+
+static inline bool guest_l1e_rsvd_bits(const struct vcpu *v, guest_l1e_t l1e)
+{
+    return l1e.l1 & (guest_rsvd_bits(v) | GUEST_L1_PAGETABLE_RSVD);
+}
+
+static inline bool guest_l2e_rsvd_bits(const struct vcpu *v, guest_l2e_t l2e)
+{
+    uint64_t rsvd_bits = guest_rsvd_bits(v);
+
+    return ((l2e.l2 & (rsvd_bits | GUEST_L2_PAGETABLE_RSVD |
+                       (guest_supports_l2_superpages(v) ? 0 : _PAGE_PSE))) ||
+            ((l2e.l2 & _PAGE_PSE) &&
+             (l2e.l2 & ((GUEST_PAGING_LEVELS == 2 &&
+                         guest_supports_pse36(v->domain))
+                          /* PSE36 tops out at 40 bits of address width. */
+                        ? (fold_pse36(rsvd_bits | (1ULL << 40)))
+                        : SUPERPAGE_RSVD(GUEST_L2_PAGETABLE_SHIFT)))));
+}
+
+#if GUEST_PAGING_LEVELS >= 3
+static inline bool guest_l3e_rsvd_bits(const struct vcpu *v, guest_l3e_t l3e)
+{
+    return ((l3e.l3 & (guest_rsvd_bits(v) | GUEST_L3_PAGETABLE_RSVD |
+                       (guest_supports_l3_superpages(v->domain) ? 0 : _PAGE_PSE))) ||
+            ((l3e.l3 & _PAGE_PSE) &&
+             (l3e.l3 & SUPERPAGE_RSVD(GUEST_L3_PAGETABLE_SHIFT))));
+}
+
+#if GUEST_PAGING_LEVELS >= 4
+static inline bool guest_l4e_rsvd_bits(const struct vcpu *v, guest_l4e_t l4e)
+{
+    return l4e.l4 & (guest_rsvd_bits(v) | GUEST_L4_PAGETABLE_RSVD |
+                     ((v->domain->arch.cpuid->x86_vendor == X86_VENDOR_AMD)
+                      ? _PAGE_GLOBAL : 0));
+}
+#endif /* GUEST_PAGING_LEVELS >= 4 */
+#endif /* GUEST_PAGING_LEVELS >= 3 */
 
 /* Type used for recording a walk through guest pagetables.  It is
  * filled in by the pagetable walk function, and also used as a cache
-- 
2.1.4



* [PATCH v2 6/9] x86/pagewalk: Re-implement the pagetable walker
  2017-03-16 16:31 [PATCH v2 0/9] Fixes to pagetable handling Andrew Cooper
                   ` (4 preceding siblings ...)
  2017-03-16 16:31 ` [PATCH v2 5/9] x86/pagewalk: Helpers for reserved bit handling Andrew Cooper
@ 2017-03-16 16:31 ` Andrew Cooper
  2017-03-16 16:31 ` [PATCH v2 7/9] x86/shadow: Use the pagewalk reserved bits helpers Andrew Cooper
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 41+ messages in thread
From: Andrew Cooper @ 2017-03-16 16:31 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

The existing pagetable walker has complicated return semantics, which squeeze
multiple pieces of information into a single integer.  This would be fine if the
information didn't overlap, but it does.

Specifically, _PAGE_INVALID_BITS for 3-level guests alias _PAGE_PAGED and
_PAGE_SHARED.  A guest which constructs a PTE with bits 52 or 53 set (the
start of the upper software-available range) will create a virtual address
which, when walked by Xen, tricks Xen into believing the frame is paged or
shared.  This behaviour was introduced by XSA-173 (c/s 8b17648).

It is also complicated to turn rc back into a normal pagefault error code.
Instead, change the calling semantics to return a boolean indicating success,
and have the function accumulate a real pagefault error code as it goes
(including synthetic error codes, which do not alias hardware ones).  This
requires an equivalent adjustment to map_domain_gfn().
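
A minimal sketch of the new calling convention from a caller's point of view (a
fragment assuming the surrounding Xen context; the fault handler name is
hypothetical, and the real callers are adjusted in the diff below):

    walk_t gw;

    if ( !guest_walk_tables(v, p2m, va, &gw, walk, top_mfn, top_map) )
    {
        /*
         * The walk failed.  gw.pfec already holds a real pagefault error code
         * (plus any synthetic PFEC_* bits), so no reverse translation from
         * "missing flags" is needed before reporting or injecting the fault.
         */
        handle_fault(va, gw.pfec);  /* hypothetical fault handler */
        return;
    }

    /* Success: gw describes the translation. */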

Issues fixed:
 * 2-level PSE36 superpages now return the correct translation.
 * 2-level L2 superpages without CR4.PSE now return the correct translation.
 * SMEP now inhibits a user instruction fetch even if NX isn't active.
 * Supervisor writes without CR0.WP now set the leaf dirty bit.
 * L4e._PAGE_GLOBAL is strictly reserved on AMD.
 * 3-level l3 entries have all reserved bits checked.
 * 3-level entries can no longer alias Xen's idea of paged or shared.
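
To illustrate how the rewritten walker combines access rights across levels
(User and RW must be present at every level, while NX at any level suffices),
a standalone sketch using simplified flag values:

    #include <stdint.h>
    #include <stdio.h>

    /* Flag values for illustration; _PAGE_NX_BIT in particular is not Xen's
     * real encoding. */
    #define _PAGE_RW      (1u << 1)
    #define _PAGE_USER    (1u << 2)
    #define _PAGE_NX_BIT  (1u << 15)

    #define AR_ACCUM_AND  (_PAGE_USER | _PAGE_RW)
    #define AR_ACCUM_OR   (_PAGE_NX_BIT)

    int main(void)
    {
        /* Flags found at each level of an example 4-level walk. */
        uint32_t levels[] = {
            _PAGE_USER | _PAGE_RW,               /* L4 */
            _PAGE_USER | _PAGE_RW,               /* L3 */
            _PAGE_USER,                          /* L2: not writable at this level */
            _PAGE_USER | _PAGE_RW | _PAGE_NX_BIT /* L1: NX set */
        };
        uint32_t ar_and = ~0u, ar_or = 0, ar;
        unsigned int i;

        for ( i = 0; i < sizeof(levels) / sizeof(levels[0]); i++ )
        {
            ar_and &= levels[i];
            ar_or  |= levels[i];
        }

        /* User/RW taken from the AND accumulation, NX from the OR accumulation. */
        ar = (ar_and & AR_ACCUM_AND) | (ar_or & AR_ACCUM_OR);

        printf("user ok: %d, write ok: %d, fetch blocked by NX: %d\n",
               !!(ar & _PAGE_USER), !!(ar & _PAGE_RW), !!(ar & _PAGE_NX_BIT));
        return 0;
    }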

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/mm/guest_walk.c      | 449 ++++++++++++++++++++------------------
 xen/arch/x86/mm/hap/guest_walk.c  |  21 +-
 xen/arch/x86/mm/hap/nested_ept.c  |   2 +-
 xen/arch/x86/mm/p2m.c             |  17 +-
 xen/arch/x86/mm/shadow/multi.c    |  27 +--
 xen/include/asm-x86/guest_pt.h    |  62 ++++--
 xen/include/asm-x86/p2m.h         |   2 +-
 xen/include/asm-x86/page.h        |   3 -
 xen/include/asm-x86/processor.h   |   2 +
 xen/include/asm-x86/x86_64/page.h |   6 -
 10 files changed, 303 insertions(+), 288 deletions(-)

diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 11f4c5d..bdb52d9 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -32,44 +32,6 @@ asm(".file \"" __OBJECT_FILE__ "\"");
 #include <asm/page.h>
 #include <asm/guest_pt.h>
 
-extern const uint32_t gw_page_flags[];
-#if GUEST_PAGING_LEVELS == CONFIG_PAGING_LEVELS
-const uint32_t gw_page_flags[] = {
-    /* I/F -  Usr Wr */
-    /* 0   0   0   0 */ _PAGE_PRESENT,
-    /* 0   0   0   1 */ _PAGE_PRESENT|_PAGE_RW,
-    /* 0   0   1   0 */ _PAGE_PRESENT|_PAGE_USER,
-    /* 0   0   1   1 */ _PAGE_PRESENT|_PAGE_RW|_PAGE_USER,
-    /* 0   1   0   0 */ _PAGE_PRESENT,
-    /* 0   1   0   1 */ _PAGE_PRESENT|_PAGE_RW,
-    /* 0   1   1   0 */ _PAGE_PRESENT|_PAGE_USER,
-    /* 0   1   1   1 */ _PAGE_PRESENT|_PAGE_RW|_PAGE_USER,
-    /* 1   0   0   0 */ _PAGE_PRESENT|_PAGE_NX_BIT,
-    /* 1   0   0   1 */ _PAGE_PRESENT|_PAGE_RW|_PAGE_NX_BIT,
-    /* 1   0   1   0 */ _PAGE_PRESENT|_PAGE_USER|_PAGE_NX_BIT,
-    /* 1   0   1   1 */ _PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_NX_BIT,
-    /* 1   1   0   0 */ _PAGE_PRESENT|_PAGE_NX_BIT,
-    /* 1   1   0   1 */ _PAGE_PRESENT|_PAGE_RW|_PAGE_NX_BIT,
-    /* 1   1   1   0 */ _PAGE_PRESENT|_PAGE_USER|_PAGE_NX_BIT,
-    /* 1   1   1   1 */ _PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_NX_BIT,
-};
-#endif
-
-/* Flags that are needed in a pagetable entry, with the sense of NX inverted */
-static uint32_t mandatory_flags(struct vcpu *v, uint32_t pfec) 
-{
-    /* Don't demand not-NX if the CPU wouldn't enforce it. */
-    if ( !guest_supports_nx(v) )
-        pfec &= ~PFEC_insn_fetch;
-
-    /* Don't demand R/W if the CPU wouldn't enforce it. */
-    if ( is_hvm_vcpu(v) && unlikely(!hvm_wp_enabled(v)) 
-         && !(pfec & PFEC_user_mode) )
-        pfec &= ~PFEC_write_access;
-
-    return gw_page_flags[(pfec & 0x1f) >> 1] | _PAGE_INVALID_BITS;
-}
-
 /* Modify a guest pagetable entry to set the Accessed and Dirty bits.
  * Returns non-zero if it actually writes to guest memory. */
 static uint32_t set_ad_bits(void *guest_p, void *walk_p, int set_dirty)
@@ -90,62 +52,33 @@ static uint32_t set_ad_bits(void *guest_p, void *walk_p, int set_dirty)
     return 0;
 }
 
-#if GUEST_PAGING_LEVELS >= 4
-static bool_t pkey_fault(struct vcpu *vcpu, uint32_t pfec,
-        uint32_t pte_flags, uint32_t pte_pkey)
-{
-    uint32_t pkru;
-
-    /* When page isn't present,  PKEY isn't checked. */
-    if ( !(pfec & PFEC_page_present) || is_pv_vcpu(vcpu) )
-        return 0;
-
-    /*
-     * PKU:  additional mechanism by which the paging controls
-     * access to user-mode addresses based on the value in the
-     * PKRU register. A fault is considered as a PKU violation if all
-     * of the following conditions are true:
-     * 1.CR4_PKE=1.
-     * 2.EFER_LMA=1.
-     * 3.Page is present with no reserved bit violations.
-     * 4.The access is not an instruction fetch.
-     * 5.The access is to a user page.
-     * 6.PKRU.AD=1 or
-     *      the access is a data write and PKRU.WD=1 and
-     *          either CR0.WP=1 or it is a user access.
-     */
-    if ( !hvm_pku_enabled(vcpu) ||
-         !hvm_long_mode_enabled(vcpu) ||
-         (pfec & PFEC_reserved_bit) ||
-         (pfec & PFEC_insn_fetch) ||
-         !(pte_flags & _PAGE_USER) )
-        return 0;
-
-    pkru = read_pkru();
-    if ( unlikely(pkru) )
-    {
-        bool_t pkru_ad = read_pkru_ad(pkru, pte_pkey);
-        bool_t pkru_wd = read_pkru_wd(pkru, pte_pkey);
-
-        /* Condition 6 */
-        if ( pkru_ad ||
-             (pkru_wd && (pfec & PFEC_write_access) &&
-              (hvm_wp_enabled(vcpu) || (pfec & PFEC_user_mode))) )
-            return 1;
-    }
-
-    return 0;
-}
-#endif
-
-/* Walk the guest pagetables, after the manner of a hardware walker. */
-/* Because the walk is essentially random, it can cause a deadlock 
- * warning in the p2m locking code. Highly unlikely this is an actual
- * deadlock, because who would walk page table in the opposite order? */
-uint32_t
+/*
+ * Walk the guest pagetables, after the manner of a hardware walker.
+ *
+ * This is a condensing of the 'Paging' chapters from Intel and AMD software
+ * manuals.  Please refer closely to them.
+ *
+ * A pagetable walk consists of two parts:
+ *   1) to find whether a translation exists, and
+ *   2) if a translation does exist, to check whether the translation's access
+ *      rights permit the access.
+ *
+ * A translation is found by following the pagetable structure (starting at
+ * %cr3) to a leaf entry (an L1 PTE, or a higher level entry with PSE set)
+ * which identifies the physical destination of the access.
+ *
+ * A translation from one level to the next exists if the PTE is both present
+ * and has no reserved bits set.  If the pagewalk encounters a situation where a
+ * translation does not exist, the walk stops at that point.
+ *
+ * The access rights (NX, User, RW bits) are collected as the walk progresses.
+ * If a translation exists, the accumulated access rights are compared to the
+ * requested walk, to see whether the access is permitted.
+ */
+bool
 guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
                   unsigned long va, walk_t *gw, 
-                  uint32_t pfec, mfn_t top_mfn, void *top_map)
+                  uint32_t walk, mfn_t top_mfn, void *top_map)
 {
     struct domain *d = v->domain;
     p2m_type_t p2mt;
@@ -155,64 +88,44 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
     guest_l3e_t *l3p = NULL;
     guest_l4e_t *l4p;
 #endif
-    unsigned int pkey;
-    uint32_t gflags, mflags, iflags, rc = 0;
-    bool_t smep = 0, smap = 0;
+    uint32_t gflags, rc;
     bool_t pse1G = 0, pse2M = 0;
     p2m_query_t qt = P2M_ALLOC | P2M_UNSHARE;
 
-    /* Only implicit supervisor data accesses exist. */
-    ASSERT(!(pfec & PFEC_implicit) ||
-           !(pfec & (PFEC_insn_fetch|PFEC_user_mode)));
+#define AR_ACCUM_AND (_PAGE_USER | _PAGE_RW)
+#define AR_ACCUM_OR  (_PAGE_NX_BIT)
+    /* Start with all AND bits set, all OR bits clear. */
+    uint32_t ar, ar_and = ~0u, ar_or = 0;
 
-    perfc_incr(guest_walk);
-    memset(gw, 0, sizeof(*gw));
-    gw->va = va;
-
-    /* Mandatory bits that must be set in every entry.  We invert NX and
-     * the invalid bits, to calculate as if there were an "X" bit that
-     * allowed access.  We will accumulate, in rc, the set of flags that
-     * are missing/unwanted. */
-    mflags = mandatory_flags(v, pfec);
-    iflags = (_PAGE_NX_BIT | _PAGE_INVALID_BITS);
+    bool walk_ok = false;
 
-    if ( is_hvm_domain(d) && !(pfec & PFEC_user_mode) )
-    {
-        const struct cpu_user_regs *regs = guest_cpu_user_regs();
+    /*
+     * TODO - We should ASSERT() that only the following bits are set as
+     * inputs to a guest walk, but a whole load of code currently passes in
+     * other PFEC_ constants.
+     */
+    walk &= (PFEC_implicit | PFEC_insn_fetch | PFEC_user_mode | PFEC_write_access);
 
-        /* SMEP: kernel-mode instruction fetches from user-mode mappings
-         * should fault.  Unlike NX or invalid bits, we're looking for _all_
-         * entries in the walk to have _PAGE_USER set, so we need to do the
-         * whole walk as if it were a user-mode one and then invert the answer. */
-        smep =  hvm_smep_enabled(v) && (pfec & PFEC_insn_fetch);
+    /* Only implicit supervisor data accesses exist. */
+    ASSERT(!(walk & PFEC_implicit) ||
+           !(walk & (PFEC_insn_fetch|PFEC_user_mode)));
 
-        switch ( v->arch.smap_check_policy )
-        {
-        case SMAP_CHECK_HONOR_CPL_AC:
-            /*
-             * SMAP: kernel-mode data accesses from user-mode mappings
-             * should fault.
-             * A fault is considered as a SMAP violation if the following
-             * conditions come true:
-             *   - X86_CR4_SMAP is set in CR4
-             *   - A user page is accessed
-             *   - CPL = 3 or X86_EFLAGS_AC is clear
-             *   - Page fault in kernel mode
-             */
-            smap = hvm_smap_enabled(v) &&
-                   ((hvm_get_cpl(v) == 3) || !(regs->eflags & X86_EFLAGS_AC));
-            break;
-        case SMAP_CHECK_ENABLED:
-            smap = hvm_smap_enabled(v);
-            break;
-        default:
-            ASSERT(v->arch.smap_check_policy == SMAP_CHECK_DISABLED);
-            break;
-        }
-    }
+    /*
+     * PFEC_insn_fetch is only used as an input to pagetable walking if NX or
+     * SMEP are enabled.  Otherwise, instruction fetches are indistinguishable
+     * from data reads.
+     *
+     * This property can be demonstrated on real hardware by having NX and
+     * SMEP inactive, but SMAP active, and observing that EFLAGS.AC determines
+     * whether a pagefault occurs for supervisor execution on user mappings.
+     */
+    if ( !(guest_supports_nx(v) || guest_smep_enabled(v)) )
+        walk &= ~PFEC_insn_fetch;
 
-    if ( smep || smap )
-        mflags |= _PAGE_USER;
+    perfc_incr(guest_walk);
+    memset(gw, 0, sizeof(*gw));
+    gw->va = va;
+    gw->pfec = walk & (PFEC_insn_fetch | PFEC_user_mode | PFEC_write_access);
 
 #if GUEST_PAGING_LEVELS >= 3 /* PAE or 64... */
 #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
@@ -221,17 +134,20 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
     gw->l4mfn = top_mfn;
     l4p = (guest_l4e_t *) top_map;
     gw->l4e = l4p[guest_l4_table_offset(va)];
-    gflags = guest_l4e_get_flags(gw->l4e) ^ iflags;
-    if ( !(gflags & _PAGE_PRESENT) ) {
-        rc |= _PAGE_PRESENT;
+    gflags = guest_l4e_get_flags(gw->l4e);
+    if ( !(gflags & _PAGE_PRESENT) )
         goto out;
-    }
-    if ( gflags & _PAGE_PSE )
+
+    /* Check for reserved bits. */
+    if ( guest_l4e_rsvd_bits(v, gw->l4e) )
     {
-        rc |= _PAGE_PSE | _PAGE_INVALID_BIT;
+        gw->pfec |= PFEC_reserved_bit | PFEC_page_present;
         goto out;
     }
-    rc |= ((gflags & mflags) ^ mflags);
+
+    /* Accumulate l4e access rights. */
+    ar_and &= gflags;
+    ar_or  |= gflags;
 
     /* Map the l3 table */
     l3p = map_domain_gfn(p2m, 
@@ -241,17 +157,28 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
                          qt,
                          &rc); 
     if(l3p == NULL)
+    {
+        gw->pfec |= rc & PFEC_synth_mask;
         goto out;
+    }
+
     /* Get the l3e and check its flags*/
     gw->l3e = l3p[guest_l3_table_offset(va)];
-    pkey = guest_l3e_get_pkey(gw->l3e);
-    gflags = guest_l3e_get_flags(gw->l3e) ^ iflags;
-    if ( !(gflags & _PAGE_PRESENT) ) {
-        rc |= _PAGE_PRESENT;
+    gflags = guest_l3e_get_flags(gw->l3e);
+    if ( !(gflags & _PAGE_PRESENT) )
+        goto out;
+
+    /* Check for reserved bits, including possibly _PAGE_PSE. */
+    if ( guest_l3e_rsvd_bits(v, gw->l3e) )
+    {
+        gw->pfec |= PFEC_reserved_bit | PFEC_page_present;
         goto out;
     }
-    rc |= ((gflags & mflags) ^ mflags);
     
+    /* Accumulate l3e access rights. */
+    ar_and &= gflags;
+    ar_or  |= gflags;
+
     pse1G = !!(gflags & _PAGE_PSE);
 
     if ( pse1G )
@@ -272,26 +199,25 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
             /* _PAGE_PSE_PAT not set: remove _PAGE_PAT from flags. */
             flags &= ~_PAGE_PAT;
 
-        if ( !guest_supports_l3_superpages(d) )
-            rc |= _PAGE_PSE | _PAGE_INVALID_BIT;
-        if ( gfn_x(start) & GUEST_L3_GFN_MASK & ~0x1 )
-            rc |= _PAGE_INVALID_BITS;
-
         /* Increment the pfn by the right number of 4k pages. */
         start = _gfn((gfn_x(start) & ~GUEST_L3_GFN_MASK) +
                      ((va >> PAGE_SHIFT) & GUEST_L3_GFN_MASK));
         gw->l1e = guest_l1e_from_gfn(start, flags);
         gw->l2mfn = gw->l1mfn = INVALID_MFN;
-        goto set_ad;
+        goto leaf;
     }
 
 #else /* PAE only... */
 
     /* Get the l3e and check its flag */
     gw->l3e = ((guest_l3e_t *) top_map)[guest_l3_table_offset(va)];
-    if ( !(guest_l3e_get_flags(gw->l3e) & _PAGE_PRESENT) ) 
+    gflags = guest_l3e_get_flags(gw->l3e);
+    if ( !(gflags & _PAGE_PRESENT) )
+        goto out;
+
+    if ( guest_l3e_rsvd_bits(v, gw->l3e) )
     {
-        rc |= _PAGE_PRESENT;
+        gw->pfec |= PFEC_reserved_bit | PFEC_page_present;
         goto out;
     }
 
@@ -305,7 +231,11 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
                          qt,
                          &rc); 
     if(l2p == NULL)
+    {
+        gw->pfec |= rc & PFEC_synth_mask;
         goto out;
+    }
+
     /* Get the l2e */
     gw->l2e = l2p[guest_l2_table_offset(va)];
 
@@ -318,15 +248,32 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
 #endif /* All levels... */
 
-    pkey = guest_l2e_get_pkey(gw->l2e);
-    gflags = guest_l2e_get_flags(gw->l2e) ^ iflags;
-    if ( !(gflags & _PAGE_PRESENT) ) {
-        rc |= _PAGE_PRESENT;
+    /* Check the l2e flags. */
+    gflags = guest_l2e_get_flags(gw->l2e);
+    if ( !(gflags & _PAGE_PRESENT) )
+        goto out;
+
+    /*
+     * In 2-level paging without CR4.PSE, there are no reserved bits, and the
+     * PAT/PSE bit is ignored.
+     */
+    if ( GUEST_PAGING_LEVELS == 2 && !guest_supports_l2_superpages(v) )
+    {
+        gw->l2e.l2 &= ~_PAGE_PSE;
+        gflags &= ~_PAGE_PSE;
+    }
+    /* else check for reserved bits, including possibly _PAGE_PSE. */
+    else if ( guest_l2e_rsvd_bits(v, gw->l2e) )
+    {
+        gw->pfec |= PFEC_reserved_bit | PFEC_page_present;
         goto out;
     }
-    rc |= ((gflags & mflags) ^ mflags);
 
-    pse2M = (gflags & _PAGE_PSE) && guest_supports_l2_superpages(v);
+    /* Accumulate l2e access rights. */
+    ar_and &= gflags;
+    ar_or  |= gflags;
+
+    pse2M = !!(gflags & _PAGE_PSE);
 
     if ( pse2M )
     {
@@ -334,7 +281,11 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
          * no guest l1e.  We make one up so that the propagation code
          * can generate a shadow l1 table.  Start with the gfn of the 
          * first 4k-page of the superpage. */
+#if GUEST_PAGING_LEVELS == 2
+        gfn_t start = _gfn(unfold_pse36(gw->l2e.l2) >> PAGE_SHIFT);
+#else
         gfn_t start = guest_l2e_get_gfn(gw->l2e);
+#endif
         /* Grant full access in the l1e, since all the guest entry's 
          * access controls are enforced in the shadow l2e. */
         int flags = (_PAGE_PRESENT|_PAGE_USER|_PAGE_RW|
@@ -348,70 +299,137 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
             /* _PAGE_PSE_PAT not set: remove _PAGE_PAT from flags. */
             flags &= ~_PAGE_PAT;
 
-        if ( gfn_x(start) & GUEST_L2_GFN_MASK & ~0x1 )
-            rc |= _PAGE_INVALID_BITS;
-
-        /* Increment the pfn by the right number of 4k pages.  
-         * Mask out PAT and invalid bits. */
+        /* Increment the pfn by the right number of 4k pages. */
         start = _gfn((gfn_x(start) & ~GUEST_L2_GFN_MASK) +
                      guest_l1_table_offset(va));
+#if GUEST_PAGING_LEVELS == 2
+         /* Wider than 32 bits if PSE36 superpage. */
+        gw->el1e = (gfn_x(start) << PAGE_SHIFT) | flags;
+#else
         gw->l1e = guest_l1e_from_gfn(start, flags);
+#endif
         gw->l1mfn = INVALID_MFN;
-    } 
-    else 
+        goto leaf;
+    }
+
+    /* Map the l1 table */
+    l1p = map_domain_gfn(p2m,
+                         guest_l2e_get_gfn(gw->l2e),
+                         &gw->l1mfn,
+                         &p2mt,
+                         qt,
+                         &rc);
+    if ( l1p == NULL )
     {
-        /* Not a superpage: carry on and find the l1e. */
-        l1p = map_domain_gfn(p2m, 
-                             guest_l2e_get_gfn(gw->l2e), 
-                             &gw->l1mfn,
-                             &p2mt,
-                             qt,
-                             &rc);
-        if(l1p == NULL)
+        gw->pfec |= rc & PFEC_synth_mask;
+        goto out;
+    }
+    gw->l1e = l1p[guest_l1_table_offset(va)];
+    gflags = guest_l1e_get_flags(gw->l1e);
+    if ( !(gflags & _PAGE_PRESENT) )
+        goto out;
+
+    /* Check for reserved bits. */
+    if ( guest_l1e_rsvd_bits(v, gw->l1e) )
+    {
+        gw->pfec |= PFEC_reserved_bit | PFEC_page_present;
+        goto out;
+    }
+
+    /* Accumulate l1e access rights. */
+    ar_and &= gflags;
+    ar_or  |= gflags;
+
+ leaf:
+    gw->pfec |= PFEC_page_present;
+
+    /*
+     * The pagetable walk has returned a successful translation.  Now check
+     * access rights to see whether the access should succeed.
+     */
+    ar = (ar_and & AR_ACCUM_AND) | (ar_or & AR_ACCUM_OR);
+
+    if ( (walk & PFEC_insn_fetch) && (ar & _PAGE_NX_BIT) )
+        /* Requested an instruction fetch and found NX? Fail. */
+        goto out;
+
+    if ( walk & PFEC_user_mode ) /* Requested a user access. */
+    {
+        if ( !(ar & _PAGE_USER) )
+            /* Got a supervisor walk?  Unconditional fail. */
             goto out;
-        gw->l1e = l1p[guest_l1_table_offset(va)];
-        pkey = guest_l1e_get_pkey(gw->l1e);
-        gflags = guest_l1e_get_flags(gw->l1e) ^ iflags;
-        if ( !(gflags & _PAGE_PRESENT) ) {
-            rc |= _PAGE_PRESENT;
+
+        if ( (walk & PFEC_write_access) && !(ar & _PAGE_RW) )
+            /* Requested a write and only got a read? Fail. */
             goto out;
+    }
+    else /* Requested a supervisor access. */
+    {
+        if ( ar & _PAGE_USER ) /* Got a user walk. */
+        {
+            if ( (walk & PFEC_insn_fetch) && guest_smep_enabled(v) )
+                /* User insn fetch and smep? Fail. */
+                goto out;
+
+            if ( !(walk & PFEC_insn_fetch) && guest_smap_enabled(v) &&
+                 ((walk & PFEC_implicit) ||
+                  !(guest_cpu_user_regs()->eflags & X86_EFLAGS_AC)) )
+                /* User data access and smap? Fail. */
+                goto out;
         }
-        rc |= ((gflags & mflags) ^ mflags);
+
+        if ( (walk & PFEC_write_access) && !(ar & _PAGE_RW) &&
+             guest_wp_enabled(v) )
+            /* Requested a write, got a read, and CR0.WP is set? Fail. */
+            goto out;
     }
 
 #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
-set_ad:
-    if ( pkey_fault(v, pfec, gflags, pkey) )
-        rc |= _PAGE_PKEY_BITS;
+    /*
+     * If all access checks are thus far ok, check Protection Key for 64-bit
+     * user data accesses.
+     *
+     * N.B. In the case that the walk ended with a superpage, the fabricated
+     * gw->l1e contains the appropriate leaf pkey.
+     */
+    if ( (walk & PFEC_user_mode) && !(walk & PFEC_insn_fetch) &&
+         guest_pku_enabled(v) )
+    {
+        unsigned int pkey = guest_l1e_get_pkey(gw->l1e);
+        unsigned int pkru = read_pkru();
+
+        if ( read_pkru_ad(pkru, pkey) ||
+             ((ar & PFEC_write_access) && read_pkru_wd(pkru, pkey)) )
+        {
+            gw->pfec |= PFEC_prot_key;
+            goto out;
+        }
+    }
 #endif
-    /* Now re-invert the user-mode requirement for SMEP and SMAP */
-    if ( smep || smap )
-        rc ^= _PAGE_USER;
+
+    walk_ok = true;
 
     /* Go back and set accessed and dirty bits only if the walk was a
      * success.  Although the PRMs say higher-level _PAGE_ACCESSED bits
      * get set whenever a lower-level PT is used, at least some hardware
      * walkers behave this way. */
-    if ( rc == 0 ) 
-    {
 #if GUEST_PAGING_LEVELS == 4 /* 64-bit only... */
-        if ( set_ad_bits(l4p + guest_l4_table_offset(va), &gw->l4e, 0) )
-            paging_mark_dirty(d, gw->l4mfn);
-        if ( set_ad_bits(l3p + guest_l3_table_offset(va), &gw->l3e,
-                         (pse1G && (pfec & PFEC_write_access))) )
-            paging_mark_dirty(d, gw->l3mfn);
+    if ( set_ad_bits(l4p + guest_l4_table_offset(va), &gw->l4e, 0) )
+        paging_mark_dirty(d, gw->l4mfn);
+    if ( set_ad_bits(l3p + guest_l3_table_offset(va), &gw->l3e,
+                     (pse1G && (walk & PFEC_write_access))) )
+        paging_mark_dirty(d, gw->l3mfn);
 #endif
-        if ( !pse1G ) 
+    if ( !pse1G )
+    {
+        if ( set_ad_bits(l2p + guest_l2_table_offset(va), &gw->l2e,
+                         (pse2M && (walk & PFEC_write_access))) )
+            paging_mark_dirty(d, gw->l2mfn);
+        if ( !pse2M )
         {
-            if ( set_ad_bits(l2p + guest_l2_table_offset(va), &gw->l2e,
-                             (pse2M && (pfec & PFEC_write_access))) )
-                paging_mark_dirty(d, gw->l2mfn);
-            if ( !pse2M ) 
-            {
-                if ( set_ad_bits(l1p + guest_l1_table_offset(va), &gw->l1e, 
-                                 (pfec & PFEC_write_access)) )
-                    paging_mark_dirty(d, gw->l1mfn);
-            }
+            if ( set_ad_bits(l1p + guest_l1_table_offset(va), &gw->l1e,
+                             (walk & PFEC_write_access)) )
+                paging_mark_dirty(d, gw->l1mfn);
         }
     }
 
@@ -436,10 +454,5 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
         put_page(mfn_to_page(mfn_x(gw->l1mfn)));
     }
 
-    /* If this guest has a restricted physical address space then the
-     * target GFN must fit within it. */
-    if ( !(rc & _PAGE_PRESENT) && !gfn_valid(d, guest_walk_to_gfn(gw)) )
-        rc |= _PAGE_INVALID_BITS;
-
-    return rc;
+    return walk_ok;
 }
diff --git a/xen/arch/x86/mm/hap/guest_walk.c b/xen/arch/x86/mm/hap/guest_walk.c
index e202c9a..5a4896b 100644
--- a/xen/arch/x86/mm/hap/guest_walk.c
+++ b/xen/arch/x86/mm/hap/guest_walk.c
@@ -50,7 +50,7 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
     struct vcpu *v, struct p2m_domain *p2m, unsigned long cr3,
     paddr_t ga, uint32_t *pfec, unsigned int *page_order)
 {
-    uint32_t missing;
+    bool walk_ok;
     mfn_t top_mfn;
     void *top_map;
     p2m_type_t p2mt;
@@ -91,12 +91,12 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
 #if GUEST_PAGING_LEVELS == 3
     top_map += (cr3 & ~(PAGE_MASK | 31));
 #endif
-    missing = guest_walk_tables(v, p2m, ga, &gw, *pfec, top_mfn, top_map);
+    walk_ok = guest_walk_tables(v, p2m, ga, &gw, *pfec, top_mfn, top_map);
     unmap_domain_page(top_map);
     put_page(top_page);
 
     /* Interpret the answer */
-    if ( missing == 0 )
+    if ( walk_ok )
     {
         gfn_t gfn = guest_walk_to_gfn(&gw);
         struct page_info *page;
@@ -123,20 +123,7 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
         return gfn_x(gfn);
     }
 
-    if ( missing & _PAGE_PRESENT )
-        *pfec &= ~PFEC_page_present;
-
-    if ( missing & _PAGE_INVALID_BITS ) 
-        *pfec |= PFEC_reserved_bit;
-
-    if ( missing & _PAGE_PKEY_BITS )
-        *pfec |= PFEC_prot_key;
-
-    if ( missing & _PAGE_PAGED )
-        *pfec = PFEC_page_paged;
-
-    if ( missing & _PAGE_SHARED )
-        *pfec = PFEC_page_shared;
+    *pfec = gw.pfec;
 
  out_tweak_pfec:
     /*
diff --git a/xen/arch/x86/mm/hap/nested_ept.c b/xen/arch/x86/mm/hap/nested_ept.c
index 02b27b1..14b1bb0 100644
--- a/xen/arch/x86/mm/hap/nested_ept.c
+++ b/xen/arch/x86/mm/hap/nested_ept.c
@@ -208,7 +208,7 @@ nept_walk_tables(struct vcpu *v, unsigned long l2ga, ept_walk_t *gw)
     goto out;
 
 map_err:
-    if ( rc == _PAGE_PAGED )
+    if ( rc == PFEC_page_paged )
     {
         ret = EPT_TRANSLATE_RETRY;
         goto out;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index a5651a3..d38004c 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1774,17 +1774,18 @@ unsigned long paging_gva_to_gfn(struct vcpu *v,
 }
 
 /*
- * If the map is non-NULL, we leave this function having
- * acquired an extra ref on mfn_to_page(*mfn).
+ * If the map is non-NULL, we leave this function having acquired an extra ref
+ * on mfn_to_page(*mfn).  In all cases, *pfec contains appropriate
+ * synthetic/structure PFEC_* bits.
  */
 void *map_domain_gfn(struct p2m_domain *p2m, gfn_t gfn, mfn_t *mfn,
-                     p2m_type_t *p2mt, p2m_query_t q, uint32_t *rc)
+                     p2m_type_t *p2mt, p2m_query_t q, uint32_t *pfec)
 {
     struct page_info *page;
 
     if ( !gfn_valid(p2m->domain, gfn) )
     {
-        *rc = _PAGE_INVALID_BIT;
+        *pfec = PFEC_reserved_bit | PFEC_page_present;
         return NULL;
     }
 
@@ -1796,21 +1797,23 @@ void *map_domain_gfn(struct p2m_domain *p2m, gfn_t gfn, mfn_t *mfn,
         if ( page )
             put_page(page);
         p2m_mem_paging_populate(p2m->domain, gfn_x(gfn));
-        *rc = _PAGE_PAGED;
+        *pfec = PFEC_page_paged;
         return NULL;
     }
     if ( p2m_is_shared(*p2mt) )
     {
         if ( page )
             put_page(page);
-        *rc = _PAGE_SHARED;
+        *pfec = PFEC_page_shared;
         return NULL;
     }
     if ( !page )
     {
-        *rc |= _PAGE_PRESENT;
+        *pfec = 0;
         return NULL;
     }
+
+    *pfec = PFEC_page_present;
     *mfn = page_to_mfn(page);
     ASSERT(mfn_valid(*mfn));
 
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 52cef46..4762e15 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -171,7 +171,7 @@ delete_shadow_status(struct domain *d, mfn_t gmfn, u32 shadow_type, mfn_t smfn)
 /**************************************************************************/
 /* Functions for walking the guest page tables */
 
-static inline uint32_t
+static inline bool
 sh_walk_guest_tables(struct vcpu *v, unsigned long va, walk_t *gw,
                      uint32_t pfec)
 {
@@ -2858,6 +2858,7 @@ static int sh_page_fault(struct vcpu *v,
     int r;
     p2m_type_t p2mt;
     uint32_t rc, error_code;
+    bool walk_ok;
     int version;
     const struct npfec access = {
          .read_access = 1,
@@ -3075,21 +3076,20 @@ static int sh_page_fault(struct vcpu *v,
      * shadow page table. */
     version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
     rmb();
-    rc = sh_walk_guest_tables(v, va, &gw, error_code);
+    walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);
 
 #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
     regs->error_code &= ~PFEC_page_present;
-    if ( !(rc & _PAGE_PRESENT) )
+    if ( gw.pfec & PFEC_page_present )
         regs->error_code |= PFEC_page_present;
 #endif
 
-    if ( rc != 0 )
+    if ( !walk_ok )
     {
         perfc_incr(shadow_fault_bail_real_fault);
         SHADOW_PRINTK("not a shadow fault\n");
         reset_early_unshadow(v);
-        if ( (rc & _PAGE_INVALID_BITS) )
-            regs->error_code |= PFEC_reserved_bit;
+        regs->error_code = gw.pfec & PFEC_arch_mask;
         goto propagate;
     }
 
@@ -3723,7 +3723,7 @@ sh_gva_to_gfn(struct vcpu *v, struct p2m_domain *p2m,
 {
     walk_t gw;
     gfn_t gfn;
-    uint32_t missing;
+    bool walk_ok;
 
 #if (SHADOW_OPTIMIZATIONS & SHOPT_VIRTUAL_TLB)
     /* Check the vTLB cache first */
@@ -3732,18 +3732,9 @@ sh_gva_to_gfn(struct vcpu *v, struct p2m_domain *p2m,
         return vtlb_gfn;
 #endif /* (SHADOW_OPTIMIZATIONS & SHOPT_VIRTUAL_TLB) */
 
-    if ( (missing = sh_walk_guest_tables(v, va, &gw, *pfec)) != 0 )
+    if ( !(walk_ok = sh_walk_guest_tables(v, va, &gw, *pfec)) )
     {
-        if ( (missing & _PAGE_PRESENT) )
-            *pfec &= ~PFEC_page_present;
-        if ( missing & _PAGE_INVALID_BITS )
-            *pfec |= PFEC_reserved_bit;
-        /*
-         * SDM Intel 64 Volume 3, Chapter Paging, PAGE-FAULT EXCEPTIONS:
-         * The PFEC_insn_fetch flag is set only when NX or SMEP are enabled.
-         */
-        if ( is_hvm_vcpu(v) && !hvm_nx_enabled(v) && !hvm_smep_enabled(v) )
-            *pfec &= ~PFEC_insn_fetch;
+        *pfec = gw.pfec;
         return gfn_x(INVALID_GFN);
     }
     gfn = guest_walk_to_gfn(&gw);
diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h
index 4e5018d..5fb77c6 100644
--- a/xen/include/asm-x86/guest_pt.h
+++ b/xen/include/asm-x86/guest_pt.h
@@ -253,19 +253,26 @@ static inline bool guest_supports_nx(const struct vcpu *v)
     return hvm_nx_enabled(v);
 }
 
+static inline bool guest_wp_enabled(const struct vcpu *v)
+{
+    /* PV guests can't control CR0.WP, and it is unconditionally set by Xen. */
+    return is_pv_vcpu(v) || hvm_wp_enabled(v);
+}
 
-/*
- * Some bits are invalid in any pagetable entry.
- * Normal flags values get represented in 24-bit values (see
- * get_pte_flags() and put_pte_flags()), so set bit 24 in
- * addition to be able to flag out of range frame numbers.
- */
-#if GUEST_PAGING_LEVELS == 3
-#define _PAGE_INVALID_BITS \
-    (_PAGE_INVALID_BIT | get_pte_flags(((1ull << 63) - 1) & ~(PAGE_SIZE - 1)))
-#else /* 2-level and 4-level */
-#define _PAGE_INVALID_BITS _PAGE_INVALID_BIT
-#endif
+static inline bool guest_smep_enabled(const struct vcpu *v)
+{
+    return !is_pv_vcpu(v) && hvm_smep_enabled(v);
+}
+
+static inline bool guest_smap_enabled(const struct vcpu *v)
+{
+    return !is_pv_vcpu(v) && hvm_smap_enabled(v);
+}
+
+static inline bool guest_pku_enabled(const struct vcpu *v)
+{
+    return !is_pv_vcpu(v) && hvm_pku_enabled(v);
+}
 
 /* Helpers for identifying whether guest entries have reserved bits set. */
 
@@ -331,13 +338,19 @@ struct guest_pagetable_walk
     guest_l3e_t l3e;            /* Guest's level 3 entry */
 #endif
     guest_l2e_t l2e;            /* Guest's level 2 entry */
-    guest_l1e_t l1e;            /* Guest's level 1 entry (or fabrication) */
+    union
+    {
+        guest_l1e_t l1e;        /* Guest's level 1 entry (or fabrication). */
+        uint64_t   el1e;        /* L2 PSE36 superpages wider than 32 bits. */
+    };
 #if GUEST_PAGING_LEVELS >= 4
     mfn_t l4mfn;                /* MFN that the level 4 entry was in */
     mfn_t l3mfn;                /* MFN that the level 3 entry was in */
 #endif
     mfn_t l2mfn;                /* MFN that the level 2 entry was in */
     mfn_t l1mfn;                /* MFN that the level 1 entry was in */
+
+    uint32_t pfec;              /* Accumulated PFEC_* error code from walk. */
 };
 
 /* Given a walk_t, translate the gw->va into the guest's notion of the
@@ -346,7 +359,9 @@ static inline gfn_t guest_walk_to_gfn(const walk_t *gw)
 {
     if ( !(guest_l1e_get_flags(gw->l1e) & _PAGE_PRESENT) )
         return INVALID_GFN;
-    return guest_l1e_get_gfn(gw->l1e);
+    return (GUEST_PAGING_LEVELS == 2
+            ? _gfn(gw->el1e >> PAGE_SHIFT)
+            : guest_l1e_get_gfn(gw->l1e));
 }
 
 /* Given a walk_t, translate the gw->va into the guest's notion of the
@@ -392,8 +407,8 @@ static inline unsigned int guest_walk_to_page_order(const walk_t *gw)
  * we go.  For the purposes of reading pagetables we treat all non-RAM
  * memory as contining zeroes.
  *
- * Returns 0 for success, or the set of permission bits that we failed on
- * if the walk did not complete.
+ * Returns a boolean indicating success or failure.  walk_t.pfec contains
+ * the accumulated error code on failure.
  */
 
 /* Macro-fu so you can call guest_walk_tables() and get the right one. */
@@ -401,7 +416,7 @@ static inline unsigned int guest_walk_to_page_order(const walk_t *gw)
 #define GPT_RENAME(_n, _l) GPT_RENAME2(_n, _l)
 #define guest_walk_tables GPT_RENAME(guest_walk_tables, GUEST_PAGING_LEVELS)
 
-extern uint32_t
+bool
 guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m, unsigned long va,
                   walk_t *gw, uint32_t pfec, mfn_t top_mfn, void *top_map);
 
@@ -421,8 +436,21 @@ static inline void print_gw(const walk_t *gw)
 #endif /* All levels... */
     gprintk(XENLOG_INFO, "   l2e=%" PRI_gpte " l2mfn=%" PRI_mfn "\n",
             gw->l2e.l2, mfn_x(gw->l2mfn));
+#if GUEST_PAGING_LEVELS == 2
+    gprintk(XENLOG_INFO, "  el1e=%08" PRIx64 " l1mfn=%" PRI_mfn "\n",
+            gw->el1e, mfn_x(gw->l1mfn));
+#else
     gprintk(XENLOG_INFO, "   l1e=%" PRI_gpte " l1mfn=%" PRI_mfn "\n",
             gw->l1e.l1, mfn_x(gw->l1mfn));
+#endif
+    gprintk(XENLOG_INFO, "   pfec=%02x[%c%c%c%c%c%c]\n", gw->pfec,
+            gw->pfec & PFEC_prot_key     ? 'K' : '-',
+            gw->pfec & PFEC_insn_fetch   ? 'I' : 'd',
+            gw->pfec & PFEC_reserved_bit ? 'R' : '-',
+            gw->pfec & PFEC_user_mode    ? 'U' : 's',
+            gw->pfec & PFEC_write_access ? 'W' : 'r',
+            gw->pfec & PFEC_page_present ? 'P' : '-'
+        );
 }
 
 #endif /* _XEN_ASM_GUEST_PT_H */
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 470d29d..bc189d1 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -679,7 +679,7 @@ int p2m_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
 extern void p2m_pt_init(struct p2m_domain *p2m);
 
 void *map_domain_gfn(struct p2m_domain *p2m, gfn_t gfn, mfn_t *mfn,
-                     p2m_type_t *p2mt, p2m_query_t q, uint32_t *rc);
+                     p2m_type_t *p2mt, p2m_query_t q, uint32_t *pfec);
 
 /* Debugging and auditing of the P2M code? */
 #ifndef NDEBUG
diff --git a/xen/include/asm-x86/page.h b/xen/include/asm-x86/page.h
index 46faffc..bc5946b 100644
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -314,9 +314,6 @@ void efi_update_l4_pgtable(unsigned int l4idx, l4_pgentry_t);
 #define _PAGE_PSE_PAT  _AC(0x1000,U)
 #define _PAGE_AVAIL_HIGH (_AC(0x7ff, U) << 12)
 #define _PAGE_NX       (cpu_has_nx ? _PAGE_NX_BIT : 0)
-/* non-architectural flags */
-#define _PAGE_PAGED   0x2000U
-#define _PAGE_SHARED  0x4000U
 
 /*
  * Debug option: Ensure that granted mappings are not implicitly unmapped.
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index d3ba8ea..75632d9 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -73,10 +73,12 @@
 #define PFEC_reserved_bit   (_AC(1,U) << 3)
 #define PFEC_insn_fetch     (_AC(1,U) << 4)
 #define PFEC_prot_key       (_AC(1,U) << 5)
+#define PFEC_arch_mask      (_AC(0xffff,U)) /* Architectural PFEC values. */
 /* Internally used only flags. */
 #define PFEC_page_paged     (1U<<16)
 #define PFEC_page_shared    (1U<<17)
 #define PFEC_implicit       (1U<<18) /* Pagewalk input for ldt/gdt/idt/tr accesses. */
+#define PFEC_synth_mask     (~PFEC_arch_mask) /* Synthetic PFEC values. */
 
 /* Other exception error code values. */
 #define X86_XEC_EXT         (_AC(1,U) << 0)
diff --git a/xen/include/asm-x86/x86_64/page.h b/xen/include/asm-x86/x86_64/page.h
index 5a613bc..1a6cae6 100644
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -152,12 +152,6 @@ typedef l4_pgentry_t root_pgentry_t;
 #define _PAGE_GNTTAB (1U<<22)
 
 /*
- * Bit 24 of a 24-bit flag mask!  This is not any bit of a real pte,
- * and is only used for signalling in variables that contain flags.
- */
-#define _PAGE_INVALID_BIT (1U<<24)
-
-/*
  * Bit 12 of a 24-bit flag mask. This corresponds to bit 52 of a pte.
  * This is needed to distinguish between user and kernel PTEs since _PAGE_USER
  * is asserted for both.
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 7/9] x86/shadow: Use the pagewalk reserved bits helpers
  2017-03-16 16:31 [PATCH v2 0/9] Fixes to pagetable handling Andrew Cooper
                   ` (5 preceding siblings ...)
  2017-03-16 16:31 ` [PATCH v2 6/9] x86/pagewalk: Re-implement the pagetable walker Andrew Cooper
@ 2017-03-16 16:31 ` Andrew Cooper
  2017-03-16 17:25   ` Andrew Cooper
                     ` (2 more replies)
  2017-03-16 16:31 ` [PATCH v2 8/9] x86/pagewalk: Improve the logic behind setting access and dirty bits Andrew Cooper
  2017-03-16 16:31 ` [PATCH v2 9/9] x86/pagewalk: non-functional cleanup Andrew Cooper
  8 siblings, 3 replies; 41+ messages in thread
From: Andrew Cooper @ 2017-03-16 16:31 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Jan Beulich

The shadow logic should not create a valid/present shadow of a guest PTE which
contains reserved bits from the guest's point of view.  It is not guaranteed
that the hardware pagewalk will come to the same conclusion, and raise a
pagefault.

Shadows created on demand from the pagefault handler are fine because the
pagewalk over the guest tables will have injected the fault into the guest
rather than creating a shadow.

However, shadows created by sh_resync_l1() and sh_prefetch() haven't undergone
a pagewalk and need to account for reserved bits before creating the shadow.

In practice, this means a 3-level guest could previously cause PTEs with bits
63:52 set to be shadowed (and discarded).  This PTE should cause #PF[RSVD]
when encountered by hardware, but the installed shadow is valid and hardware
doesn't fault.
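
For illustration only (not part of this patch): for a present PAE PTE, the
fixed part of the reserved-bit mask that the helpers check against boils down
to something like the hypothetical sketch below.  Bits above maxphysaddr are
also reserved, but that handling is omitted here for brevity:

    /* Hypothetical helper, for illustration only.  Bits 62:52 are always
     * reserved in PAE entries; bit 63 is reserved too unless NX is enabled. */
    static inline uint64_t pae_pte_fixed_rsvd_mask(bool nx_enabled)
    {
        return (nx_enabled ? 0x7ffULL : 0xfffULL) << 52;
    }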

Reuse the pagewalk reserved bits helpers, and assert in
l?e_propagate_from_guest() that shadows are not attempted to be created with
reserved bits set.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: George Dunlap <george.dunlap@eu.citrix.com>

v2:
 * Reword commit message, and include sh_resync_l1/sh_prefetch
---
 xen/arch/x86/mm/shadow/multi.c | 56 +++++++++++++++++++++++++++++++++---------
 1 file changed, 45 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 4762e15..56114c7 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -745,6 +745,10 @@ l4e_propagate_from_guest(struct vcpu *v,
                          shadow_l4e_t *sl4e,
                          fetch_type_t ft)
 {
+    if ( !mfn_eq(sl3mfn, INVALID_MFN) &&
+         (guest_l4e_get_flags(gl4e) & _PAGE_PRESENT) )
+        ASSERT(!guest_l4e_rsvd_bits(v, gl4e));
+
     _sh_propagate(v, gl4e.l4, sl3mfn, sl4e, 4, ft, p2m_ram_rw);
 }
 
@@ -755,6 +759,10 @@ l3e_propagate_from_guest(struct vcpu *v,
                          shadow_l3e_t *sl3e,
                          fetch_type_t ft)
 {
+    if ( !mfn_eq(sl2mfn, INVALID_MFN) &&
+         (guest_l3e_get_flags(gl3e) & _PAGE_PRESENT) )
+        ASSERT(!guest_l3e_rsvd_bits(v, gl3e));
+
     _sh_propagate(v, gl3e.l3, sl2mfn, sl3e, 3, ft, p2m_ram_rw);
 }
 #endif // GUEST_PAGING_LEVELS >= 4
@@ -766,6 +774,10 @@ l2e_propagate_from_guest(struct vcpu *v,
                          shadow_l2e_t *sl2e,
                          fetch_type_t ft)
 {
+    if ( !mfn_eq(sl1mfn, INVALID_MFN) &&
+         (guest_l2e_get_flags(gl2e) & _PAGE_PRESENT) )
+        ASSERT(!guest_l2e_rsvd_bits(v, gl2e));
+
     _sh_propagate(v, gl2e.l2, sl1mfn, sl2e, 2, ft, p2m_ram_rw);
 }
 
@@ -777,6 +789,10 @@ l1e_propagate_from_guest(struct vcpu *v,
                          fetch_type_t ft,
                          p2m_type_t p2mt)
 {
+    if ( !mfn_eq(gmfn, INVALID_MFN) &&
+         (guest_l1e_get_flags(gl1e) & _PAGE_PRESENT) )
+        ASSERT(!guest_l1e_rsvd_bits(v, gl1e));
+
     _sh_propagate(v, gl1e.l1, gmfn, sl1e, 1, ft, p2mt);
 }
 
@@ -2157,7 +2173,8 @@ static int validate_gl4e(struct vcpu *v, void *new_ge, mfn_t sl4mfn, void *se)
 
     perfc_incr(shadow_validate_gl4e_calls);
 
-    if ( guest_l4e_get_flags(new_gl4e) & _PAGE_PRESENT )
+    if ( (guest_l4e_get_flags(new_gl4e) & _PAGE_PRESENT) &&
+         !guest_l4e_rsvd_bits(v, new_gl4e) )
     {
         gfn_t gl3gfn = guest_l4e_get_gfn(new_gl4e);
         mfn_t gl3mfn = get_gfn_query_unlocked(d, gfn_x(gl3gfn), &p2mt);
@@ -2215,7 +2232,8 @@ static int validate_gl3e(struct vcpu *v, void *new_ge, mfn_t sl3mfn, void *se)
 
     perfc_incr(shadow_validate_gl3e_calls);
 
-    if ( guest_l3e_get_flags(new_gl3e) & _PAGE_PRESENT )
+    if ( (guest_l3e_get_flags(new_gl3e) & _PAGE_PRESENT) &&
+         !guest_l3e_rsvd_bits(v, new_gl3e) )
     {
         gfn_t gl2gfn = guest_l3e_get_gfn(new_gl3e);
         mfn_t gl2mfn = get_gfn_query_unlocked(d, gfn_x(gl2gfn), &p2mt);
@@ -2248,7 +2266,8 @@ static int validate_gl2e(struct vcpu *v, void *new_ge, mfn_t sl2mfn, void *se)
 
     perfc_incr(shadow_validate_gl2e_calls);
 
-    if ( guest_l2e_get_flags(new_gl2e) & _PAGE_PRESENT )
+    if ( (guest_l2e_get_flags(new_gl2e) & _PAGE_PRESENT) &&
+         !guest_l2e_rsvd_bits(v, new_gl2e) )
     {
         gfn_t gl1gfn = guest_l2e_get_gfn(new_gl2e);
         if ( guest_supports_l2_superpages(v) &&
@@ -2289,7 +2308,7 @@ static int validate_gl1e(struct vcpu *v, void *new_ge, mfn_t sl1mfn, void *se)
     guest_l1e_t new_gl1e = *(guest_l1e_t *)new_ge;
     shadow_l1e_t *sl1p = se;
     gfn_t gfn;
-    mfn_t gmfn;
+    mfn_t gmfn = INVALID_MFN;
     p2m_type_t p2mt;
     int result = 0;
 #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
@@ -2298,8 +2317,12 @@ static int validate_gl1e(struct vcpu *v, void *new_ge, mfn_t sl1mfn, void *se)
 
     perfc_incr(shadow_validate_gl1e_calls);
 
-    gfn = guest_l1e_get_gfn(new_gl1e);
-    gmfn = get_gfn_query_unlocked(d, gfn_x(gfn), &p2mt);
+    if ( (guest_l1e_get_flags(new_gl1e) & _PAGE_PRESENT) &&
+         !guest_l1e_rsvd_bits(v, new_gl1e) )
+    {
+        gfn = guest_l1e_get_gfn(new_gl1e);
+        gmfn = get_gfn_query_unlocked(d, gfn_x(gfn), &p2mt);
+    }
 
     l1e_propagate_from_guest(v, new_gl1e, gmfn, &new_sl1e, ft_prefetch, p2mt);
     result |= shadow_set_l1e(d, sl1p, new_sl1e, p2mt, sl1mfn);
@@ -2355,12 +2378,17 @@ void sh_resync_l1(struct vcpu *v, mfn_t gl1mfn, mfn_t snpmfn)
         if ( memcmp(snpl1p, &gl1e, sizeof(gl1e)) )
         {
             gfn_t gfn;
-            mfn_t gmfn;
+            mfn_t gmfn = INVALID_MFN;
             p2m_type_t p2mt;
             shadow_l1e_t nsl1e;
 
-            gfn = guest_l1e_get_gfn(gl1e);
-            gmfn = get_gfn_query_unlocked(d, gfn_x(gfn), &p2mt);
+            if ( (guest_l1e_get_flags(gl1e) & _PAGE_PRESENT) &&
+                 !guest_l1e_rsvd_bits(v, gl1e) )
+            {
+                gfn = guest_l1e_get_gfn(gl1e);
+                gmfn = get_gfn_query_unlocked(d, gfn_x(gfn), &p2mt);
+            }
+
             l1e_propagate_from_guest(v, gl1e, gmfn, &nsl1e, ft_prefetch, p2mt);
             rc |= shadow_set_l1e(d, sl1p, nsl1e, p2mt, sl1mfn);
             *snpl1p = gl1e;
@@ -2686,8 +2714,14 @@ static void sh_prefetch(struct vcpu *v, walk_t *gw,
         }
 
         /* Look at the gfn that the l1e is pointing at */
-        gfn = guest_l1e_get_gfn(gl1e);
-        gmfn = get_gfn_query_unlocked(d, gfn_x(gfn), &p2mt);
+        if ( (guest_l1e_get_flags(gl1e) & _PAGE_PRESENT) &&
+             !guest_l1e_rsvd_bits(v, gl1e) )
+        {
+            gfn = guest_l1e_get_gfn(gl1e);
+            gmfn = get_gfn_query_unlocked(d, gfn_x(gfn), &p2mt);
+        }
+        else
+            gmfn = INVALID_MFN;
 
         /* Propagate the entry.  */
         l1e_propagate_from_guest(v, gl1e, gmfn, &sl1e, ft_prefetch, p2mt);
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 8/9] x86/pagewalk: Improve the logic behind setting access and dirty bits
  2017-03-16 16:31 [PATCH v2 0/9] Fixes to pagetable handling Andrew Cooper
                   ` (6 preceding siblings ...)
  2017-03-16 16:31 ` [PATCH v2 7/9] x86/shadow: Use the pagewalk reserved bits helpers Andrew Cooper
@ 2017-03-16 16:31 ` Andrew Cooper
  2017-03-20  9:03   ` Jan Beulich
  2017-03-23 17:09   ` Tim Deegan
  2017-03-16 16:31 ` [PATCH v2 9/9] x86/pagewalk: non-functional cleanup Andrew Cooper
  8 siblings, 2 replies; 41+ messages in thread
From: Andrew Cooper @ 2017-03-16 16:31 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Jan Beulich

The boolean pse2M is misnamed, because it might refer to a 4M superpage.

Switch the logic to be in terms of the level of the leaf entry, and rearrange
the calls to set_ad_bits() to be a fallthrough switch statement, to make it
easier to follow.

Alter set_ad_bits() to properly use bool rather than integers.  Introduce
properly typed pointers rather than repeatedly casting away from void.  Use
ACCESS_ONCE() to ensure that *walk_p is only read once.
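
As a rough sketch (the shape of the pre-change code, not the patched version),
the hazard ACCESS_ONCE() guards against is the compiler re-materialising 'old'
by reloading *walk_p, so the value handed to cmpxchg() need not be the value
used to compute the new entry:

    guest_intpte_t old = *walk_p;               /* read #1 */
    guest_intpte_t new = old | _PAGE_ACCESSED;
    cmpxchg(guest_p, old, new);                 /* 'old' may be re-read here */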

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: George Dunlap <george.dunlap@eu.citrix.com>

v2:
 * New
---
 xen/arch/x86/mm/guest_walk.c | 81 +++++++++++++++++++++++++-------------------
 1 file changed, 47 insertions(+), 34 deletions(-)

diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index bdb52d9..4ba7602 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -32,24 +32,28 @@ asm(".file \"" __OBJECT_FILE__ "\"");
 #include <asm/page.h>
 #include <asm/guest_pt.h>
 
-/* Modify a guest pagetable entry to set the Accessed and Dirty bits.
- * Returns non-zero if it actually writes to guest memory. */
-static uint32_t set_ad_bits(void *guest_p, void *walk_p, int set_dirty)
+/*
+ * Modify a guest pagetable entry to set the Accessed and Dirty bits.
+ * Returns non-zero if it actually writes to guest memory.
+ */
+static bool set_ad_bits(void *_guest_p, void *_walk_p, bool set_dirty)
 {
-    guest_intpte_t old, new;
+    guest_intpte_t *guest_p = _guest_p, *walk_p = _walk_p;
+    guest_intpte_t new, old = ACCESS_ONCE(*walk_p);
 
-    old = *(guest_intpte_t *)walk_p;
     new = old | _PAGE_ACCESSED | (set_dirty ? _PAGE_DIRTY : 0);
-    if ( old != new ) 
+    if ( old != new )
     {
-        /* Write the new entry into the walk, and try to write it back
+        /*
+         * Write the new entry into the walk, and try to write it back
          * into the guest table as well.  If the guest table has changed
-         * under out feet then leave it alone. */
-        *(guest_intpte_t *)walk_p = new;
-        if ( cmpxchg(((guest_intpte_t *)guest_p), old, new) == old ) 
-            return 1;
+         * under our feet then leave it alone.
+         */
+        *walk_p = new;
+        if ( cmpxchg(guest_p, old, new) == old )
+            return true;
     }
-    return 0;
+    return false;
 }
 
 /*
@@ -89,7 +93,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
     guest_l4e_t *l4p;
 #endif
     uint32_t gflags, rc;
-    bool_t pse1G = 0, pse2M = 0;
+    unsigned int leaf_level;
     p2m_query_t qt = P2M_ALLOC | P2M_UNSHARE;
 
 #define AR_ACCUM_AND (_PAGE_USER | _PAGE_RW)
@@ -179,9 +183,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
     ar_and &= gflags;
     ar_or  |= gflags;
 
-    pse1G = !!(gflags & _PAGE_PSE);
-
-    if ( pse1G )
+    if ( gflags & _PAGE_PSE )
     {
         /* Generate a fake l1 table entry so callers don't all 
          * have to understand superpages. */
@@ -204,6 +206,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
                      ((va >> PAGE_SHIFT) & GUEST_L3_GFN_MASK));
         gw->l1e = guest_l1e_from_gfn(start, flags);
         gw->l2mfn = gw->l1mfn = INVALID_MFN;
+        leaf_level = 3;
         goto leaf;
     }
 
@@ -273,9 +276,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
     ar_and &= gflags;
     ar_or  |= gflags;
 
-    pse2M = !!(gflags & _PAGE_PSE);
-
-    if ( pse2M )
+    if ( gflags & _PAGE_PSE )
     {
         /* Special case: this guest VA is in a PSE superpage, so there's
          * no guest l1e.  We make one up so that the propagation code
@@ -309,6 +310,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
         gw->l1e = guest_l1e_from_gfn(start, flags);
 #endif
         gw->l1mfn = INVALID_MFN;
+        leaf_level = 2;
         goto leaf;
     }
 
@@ -340,6 +342,8 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
     ar_and &= gflags;
     ar_or  |= gflags;
 
+    leaf_level = 1;
+
  leaf:
     gw->pfec |= PFEC_page_present;
 
@@ -413,24 +417,33 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
      * success.  Although the PRMs say higher-level _PAGE_ACCESSED bits
      * get set whenever a lower-level PT is used, at least some hardware
      * walkers behave this way. */
-#if GUEST_PAGING_LEVELS == 4 /* 64-bit only... */
-    if ( set_ad_bits(l4p + guest_l4_table_offset(va), &gw->l4e, 0) )
-        paging_mark_dirty(d, gw->l4mfn);
-    if ( set_ad_bits(l3p + guest_l3_table_offset(va), &gw->l3e,
-                     (pse1G && (walk & PFEC_write_access))) )
-        paging_mark_dirty(d, gw->l3mfn);
-#endif
-    if ( !pse1G )
+    switch ( leaf_level )
     {
+    default:
+        ASSERT_UNREACHABLE();
+        break;
+
+    case 1:
+        if ( set_ad_bits(l1p + guest_l1_table_offset(va), &gw->l1e,
+                         (walk & PFEC_write_access) && leaf_level == 1) )
+            paging_mark_dirty(d, gw->l1mfn);
+        /* Fallthrough */
+
+    case 2:
         if ( set_ad_bits(l2p + guest_l2_table_offset(va), &gw->l2e,
-                         (pse2M && (walk & PFEC_write_access))) )
+                         (walk & PFEC_write_access) && leaf_level == 2) )
             paging_mark_dirty(d, gw->l2mfn);
-        if ( !pse2M )
-        {
-            if ( set_ad_bits(l1p + guest_l1_table_offset(va), &gw->l1e,
-                             (walk & PFEC_write_access)) )
-                paging_mark_dirty(d, gw->l1mfn);
-        }
+        /* Fallthrough */
+
+#if GUEST_PAGING_LEVELS == 4 /* 64-bit only... */
+    case 3:
+        if ( set_ad_bits(l3p + guest_l3_table_offset(va), &gw->l3e,
+                         (walk & PFEC_write_access) && leaf_level == 3) )
+            paging_mark_dirty(d, gw->l3mfn);
+
+        if ( set_ad_bits(l4p + guest_l4_table_offset(va), &gw->l4e, 0) )
+            paging_mark_dirty(d, gw->l4mfn);
+#endif
     }
 
  out:
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 9/9] x86/pagewalk: non-functional cleanup
  2017-03-16 16:31 [PATCH v2 0/9] Fixes to pagetable handling Andrew Cooper
                   ` (7 preceding siblings ...)
  2017-03-16 16:31 ` [PATCH v2 8/9] x86/pagewalk: Improve the logic behind setting access and dirty bits Andrew Cooper
@ 2017-03-16 16:31 ` Andrew Cooper
  2017-03-20  9:04   ` Jan Beulich
  2017-03-23 17:10   ` Tim Deegan
  8 siblings, 2 replies; 41+ messages in thread
From: Andrew Cooper @ 2017-03-16 16:31 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Jan Beulich

 * Drop trailing whitespace
 * Consistently apply Xen style
 * Introduce a local variable block

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: George Dunlap <george.dunlap@eu.citrix.com>

v2:
 * New
---
 xen/arch/x86/mm/guest_walk.c | 82 ++++++++++++++++++++++++++++----------------
 1 file changed, 53 insertions(+), 29 deletions(-)

diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 4ba7602..a8b72b2 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -81,7 +81,7 @@ static bool set_ad_bits(void *_guest_p, void *_walk_p, bool set_dirty)
  */
 bool
 guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
-                  unsigned long va, walk_t *gw, 
+                  unsigned long va, walk_t *gw,
                   uint32_t walk, mfn_t top_mfn, void *top_map)
 {
     struct domain *d = v->domain;
@@ -154,13 +154,13 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
     ar_or  |= gflags;
 
     /* Map the l3 table */
-    l3p = map_domain_gfn(p2m, 
-                         guest_l4e_get_gfn(gw->l4e), 
+    l3p = map_domain_gfn(p2m,
+                         guest_l4e_get_gfn(gw->l4e),
                          &gw->l3mfn,
                          &p2mt,
                          qt,
-                         &rc); 
-    if(l3p == NULL)
+                         &rc);
+    if ( l3p == NULL )
     {
         gw->pfec |= rc & PFEC_synth_mask;
         goto out;
@@ -178,23 +178,29 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
         gw->pfec |= PFEC_reserved_bit | PFEC_page_present;
         goto out;
     }
-    
+
     /* Accumulate l3e access rights. */
     ar_and &= gflags;
     ar_or  |= gflags;
 
     if ( gflags & _PAGE_PSE )
     {
-        /* Generate a fake l1 table entry so callers don't all 
-         * have to understand superpages. */
+        /*
+         * Generate a fake l1 table entry so callers don't all
+         * have to understand superpages.
+         */
         gfn_t start = guest_l3e_get_gfn(gw->l3e);
-        /* Grant full access in the l1e, since all the guest entry's
-         * access controls are enforced in the l3e. */
+        /*
+         * Grant full access in the l1e, since all the guest entry's
+         * access controls are enforced in the l3e.
+         */
         int flags = (_PAGE_PRESENT|_PAGE_USER|_PAGE_RW|
                      _PAGE_ACCESSED|_PAGE_DIRTY);
-        /* Import cache-control bits. Note that _PAGE_PAT is actually
+        /*
+         * Import cache-control bits. Note that _PAGE_PAT is actually
          * _PAGE_PSE, and it is always set. We will clear it in case
-         * _PAGE_PSE_PAT (bit 12, i.e. first bit of gfn) is clear. */
+         * _PAGE_PSE_PAT (bit 12, i.e. first bit of gfn) is clear.
+         */
         flags |= (guest_l3e_get_flags(gw->l3e)
                   & (_PAGE_PAT|_PAGE_PWT|_PAGE_PCD));
         if ( !(gfn_x(start) & 1) )
@@ -227,13 +233,13 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 #endif /* PAE or 64... */
 
     /* Map the l2 table */
-    l2p = map_domain_gfn(p2m, 
-                         guest_l3e_get_gfn(gw->l3e), 
+    l2p = map_domain_gfn(p2m,
+                         guest_l3e_get_gfn(gw->l3e),
                          &gw->l2mfn,
-                         &p2mt, 
+                         &p2mt,
                          qt,
-                         &rc); 
-    if(l2p == NULL)
+                         &rc);
+    if ( l2p == NULL )
     {
         gw->pfec |= rc & PFEC_synth_mask;
         goto out;
@@ -278,22 +284,28 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
     if ( gflags & _PAGE_PSE )
     {
-        /* Special case: this guest VA is in a PSE superpage, so there's
+        /*
+         * Special case: this guest VA is in a PSE superpage, so there's
          * no guest l1e.  We make one up so that the propagation code
-         * can generate a shadow l1 table.  Start with the gfn of the 
-         * first 4k-page of the superpage. */
+         * can generate a shadow l1 table.  Start with the gfn of the
+         * first 4k-page of the superpage.
+         */
 #if GUEST_PAGING_LEVELS == 2
         gfn_t start = _gfn(unfold_pse36(gw->l2e.l2) >> PAGE_SHIFT);
 #else
         gfn_t start = guest_l2e_get_gfn(gw->l2e);
 #endif
-        /* Grant full access in the l1e, since all the guest entry's 
-         * access controls are enforced in the shadow l2e. */
+        /*
+         * Grant full access in the l1e, since all the guest entry's
+         * access controls are enforced in the shadow l2e.
+         */
         int flags = (_PAGE_PRESENT|_PAGE_USER|_PAGE_RW|
                      _PAGE_ACCESSED|_PAGE_DIRTY);
-        /* Import cache-control bits. Note that _PAGE_PAT is actually
+        /*
+         * Import cache-control bits. Note that _PAGE_PAT is actually
          * _PAGE_PSE, and it is always set. We will clear it in case
-         * _PAGE_PSE_PAT (bit 12, i.e. first bit of gfn) is clear. */
+         * _PAGE_PSE_PAT (bit 12, i.e. first bit of gfn) is clear.
+         */
         flags |= (guest_l2e_get_flags(gw->l2e)
                   & (_PAGE_PAT|_PAGE_PWT|_PAGE_PCD));
         if ( !(gfn_x(start) & 1) )
@@ -413,10 +425,12 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
     walk_ok = true;
 
-    /* Go back and set accessed and dirty bits only if the walk was a
+    /*
+     * Go back and set accessed and dirty bits only if the walk was a
      * success.  Although the PRMs say higher-level _PAGE_ACCESSED bits
      * get set whenever a lower-level PT is used, at least some hardware
-     * walkers behave this way. */
+     * walkers behave this way.
+     */
     switch ( leaf_level )
     {
     default:
@@ -448,20 +462,20 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
  out:
 #if GUEST_PAGING_LEVELS == 4
-    if ( l3p ) 
+    if ( l3p )
     {
         unmap_domain_page(l3p);
         put_page(mfn_to_page(mfn_x(gw->l3mfn)));
     }
 #endif
 #if GUEST_PAGING_LEVELS >= 3
-    if ( l2p ) 
+    if ( l2p )
     {
         unmap_domain_page(l2p);
         put_page(mfn_to_page(mfn_x(gw->l2mfn)));
     }
 #endif
-    if ( l1p ) 
+    if ( l1p )
     {
         unmap_domain_page(l1p);
         put_page(mfn_to_page(mfn_x(gw->l1mfn)));
@@ -469,3 +483,13 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
     return walk_ok;
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 7/9] x86/shadow: Use the pagewalk reserved bits helpers
  2017-03-16 16:31 ` [PATCH v2 7/9] x86/shadow: Use the pagewalk reserved bits helpers Andrew Cooper
@ 2017-03-16 17:25   ` Andrew Cooper
  2017-03-20  8:53   ` Jan Beulich
  2017-03-23 16:57   ` Tim Deegan
  2 siblings, 0 replies; 41+ messages in thread
From: Andrew Cooper @ 2017-03-16 17:25 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Tim Deegan, Jan Beulich

On 16/03/17 16:31, Andrew Cooper wrote:
> The shadow logic should not create a valid/present shadow of a guest PTE which
> contains reserved bits from the guest's point of view.  It is not guaranteed
> that the hardware pagewalk will come to the same conclusion, and raise a
> pagefault.
>
> Shadows created on demand from the pagefault handler are fine because the
> pagewalk over the guest tables will have injected the fault into the guest
> rather than creating a shadow.
>
> However, shadows created by sh_resync_l1() and sh_prefetch() haven't undergone
> a pagewalk and need to account for reserved bits before creating the shadow.
>
> In practice, this means a 3-level guest could previously cause PTEs with bits
> 63:52 set to be shadowed (and discarded).  This PTE should cause #PF[RSVD]
> when encountered by hardware, but the installed shadow is valid and hardware
> doesn't fault.
>
> Reuse the pagewalk reserved bits helpers, and assert in
> l?e_propagate_from_guest() that shadows are not attempted to be created with
> reserved bits set.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

The travis build running clang points out that p2mt might now be
uninitialised in the l1 cases.  In practice it doesn't matter because
INVALID_MFN will cause _sh_propagate() to head in an empty() direction,
but I have folded the following delta in.

diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 56114c7..627284d 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -2309,7 +2309,7 @@ static int validate_gl1e(struct vcpu *v, void *new_ge, mfn_t sl1mfn, void *se)
     shadow_l1e_t *sl1p = se;
     gfn_t gfn;
     mfn_t gmfn = INVALID_MFN;
-    p2m_type_t p2mt;
+    p2m_type_t p2mt = p2m_invalid;
     int result = 0;
 #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
     mfn_t gl1mfn;
@@ -2379,7 +2379,7 @@ void sh_resync_l1(struct vcpu *v, mfn_t gl1mfn, mfn_t snpmfn)
         {
             gfn_t gfn;
             mfn_t gmfn = INVALID_MFN;
-            p2m_type_t p2mt;
+            p2m_type_t p2mt = p2m_invalid;
             shadow_l1e_t nsl1e;
 
             if ( (guest_l1e_get_flags(gl1e) & _PAGE_PRESENT) &&
@@ -2721,7 +2721,10 @@ static void sh_prefetch(struct vcpu *v, walk_t *gw,
             gmfn = get_gfn_query_unlocked(d, gfn_x(gfn), &p2mt);
         }
         else
+        {
             gmfn = INVALID_MFN;
+            p2mt = p2m_invalid;
+        }
 
         /* Propagate the entry.  */
         l1e_propagate_from_guest(v, gl1e, gmfn, &sl1e, ft_prefetch, p2mt);

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/9] x86/cpuid: Sort cpu_has_* predicates by feature number
  2017-03-16 16:31 ` [PATCH v2 1/9] x86/cpuid: Sort cpu_has_* predicates by feature number Andrew Cooper
@ 2017-03-17 16:08   ` Jan Beulich
  0 siblings, 0 replies; 41+ messages in thread
From: Jan Beulich @ 2017-03-17 16:08 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 16.03.17 at 17:31, <andrew.cooper3@citrix.com> wrote:
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/9] x86/pagewalk: Use pointer syntax for pfec parameter
  2017-03-16 16:31 ` [PATCH v2 2/9] x86/pagewalk: Use pointer syntax for pfec parameter Andrew Cooper
@ 2017-03-17 16:09   ` Jan Beulich
  2017-03-20 11:29   ` George Dunlap
  2017-03-23 16:28   ` Tim Deegan
  2 siblings, 0 replies; 41+ messages in thread
From: Jan Beulich @ 2017-03-17 16:09 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Tim Deegan, Xen-devel

>>> On 16.03.17 at 17:31, <andrew.cooper3@citrix.com> wrote:
> It is a pointer, not an array.
> 
> No functional change.
> 
> Requested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 4/9] x86/pagewalk: Clean up guest_supports_* predicates
  2017-03-16 16:31 ` [PATCH v2 4/9] x86/pagewalk: Clean up guest_supports_* predicates Andrew Cooper
@ 2017-03-20  8:45   ` Jan Beulich
  2017-03-20 13:36     ` Andrew Cooper
  2017-03-23 16:34   ` Tim Deegan
  1 sibling, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2017-03-20  8:45 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Tim Deegan, Xen-devel

>>> On 16.03.17 at 17:31, <andrew.cooper3@citrix.com> wrote:
> Switch them to returning bool, and taking const parameters.
> 
> Rename guest_supports_superpages() to guest_supports_l2_superpages() to
> indicate which level of pagetables it is actually referring to, and rename
> guest_supports_1G_superpages() to guest_supports_l3_superpages() for
> consistency.
> 
> guest_supports_l3_superpages() is a static property of the domain, rather than
> control register settings, so is switched to take a domain pointer.
> hvm_pse1gb_supported() is inlined into its sole user because it isn't strictly
> hvm-specific (it is hap-specific) and really should be beside a comment
> explaining why the cpuid policy is ignored.
> 
> While cleaning up part of the file, clean up all trailing whitespace, and fix
> one comment which accidentally referred to PG living in CR4 rather than CR0.
> 
> Requested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with two remarks:

> -static inline int
> -guest_supports_1G_superpages(struct vcpu *v)
> +static inline bool guest_supports_l3_superpages(const struct domain *d)
>  {
> -    return (GUEST_PAGING_LEVELS >= 4 && hvm_pse1gb_supported(v->domain));
> +    /*
> +     * There are no control register settings for the hardware pagewalk on the
> +     * subject of 1G superpages.
> +     *
> +     * If the guest constructs a 1GB superpage on capable hardware, it will
> +     * function irrespective of whether the feature is advertised.  Xen's
> +     * model of performing a pagewalk should match.
> +     */
> +    return GUEST_PAGING_LEVELS >= 4 && paging_mode_hap(d) && cpu_has_page1gb;

Is it perhaps worth adding half a sentence stating that shadow
doesn't support 1Gb pages at all?

Also I'm still not really happy with the guest_supports_ prefixes
for this and its L2 counterpart: The question here isn't whether the
guest supports it (we can't know whether it does), but whether it
enabled PSE/PAE/LM. Arguably the L3 case is less clear because
of the mentioned lack of an explicit enabled bit, so I can live with
the patch going in unchanged (the L2 side then simply for things
to remain consistent, albeit there's then already the difference of
parameter types).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 5/9] x86/pagewalk: Helpers for reserved bit handling
  2017-03-16 16:31 ` [PATCH v2 5/9] x86/pagewalk: Helpers for reserved bit handling Andrew Cooper
@ 2017-03-20  8:48   ` Jan Beulich
  2017-03-23 16:55   ` Tim Deegan
  1 sibling, 0 replies; 41+ messages in thread
From: Jan Beulich @ 2017-03-20  8:48 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Tim Deegan, Xen-devel

>>> On 16.03.17 at 17:31, <andrew.cooper3@citrix.com> wrote:
> Some bits are unconditionally reserved in pagetable entries, or reserved
> because of alignment restrictions.  Other bits are reserved because of control
> register configuration.
> 
> Introduce helpers which take an individual vcpu and guest pagetable entry, and
> calculates whether any reserved bits are set.
> 
> While here, add a couple of newlines to aid readability.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with the same remarks as on the prior patch for ...

> @@ -182,6 +216,21 @@ static inline bool guest_supports_l2_superpages(const struct vcpu *v)
>                 || (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PSE)));
>  }
>  
> +static inline bool guest_supports_pse36(const struct domain *d)
> +{
> +    /*
> +     * Only called in the context of 2-level guests, after
> +     * guest_supports_l2_superpages() has indicated true.
> +     *
> +     * Once L2 superpages are active, there are no control register settings
> +     * for the hardware pagewalk on the subject of PSE36.  If the guest
> +     * constructs a PSE36 superpage on capable hardware, it will function
> +     * irrespective of whether the feature is advertised.  Xen's model of
> +     * performing a pagewalk should match.
> +     */
> +    return paging_mode_hap(d) && cpu_has_pse36;
> +}

... this.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 7/9] x86/shadow: Use the pagewalk reserved bits helpers
  2017-03-16 16:31 ` [PATCH v2 7/9] x86/shadow: Use the pagewalk reserved bits helpers Andrew Cooper
  2017-03-16 17:25   ` Andrew Cooper
@ 2017-03-20  8:53   ` Jan Beulich
  2017-03-23 16:57   ` Tim Deegan
  2 siblings, 0 replies; 41+ messages in thread
From: Jan Beulich @ 2017-03-20  8:53 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Tim Deegan, Xen-devel

>>> On 16.03.17 at 17:31, <andrew.cooper3@citrix.com> wrote:
> The shadow logic should not create a valid/present shadow of a guest PTE which
> contains reserved bits from the guest's point of view.  It is not guaranteed
> that the hardware pagewalk will come to the same conclusion, and raise a
> pagefault.
> 
> Shadows created on demand from the pagefault handler are fine because the
> pagewalk over the guest tables will have injected the fault into the guest
> rather than creating a shadow.
> 
> However, shadows created by sh_resync_l1() and sh_prefetch() haven't undergone
> a pagewalk and need to account for reserved bits before creating the shadow.
> 
> In practice, this means a 3-level guest could previously cause PTEs with bits
> 63:52 set to be shadowed (and discarded).  This PTE should cause #PF[RSVD]
> when encountered by hardware, but the installed shadow is valid and hardware
> doesn't fault.
> 
> Reuse the pagewalk reserved bits helpers, and assert in
> l?e_propagate_from_guest() that shadows are not attempted to be created with
> reserved bits set.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

This looks fine to me, but I can't claim to know the shadow code well
enough to be certain all places where changes are needed are
actually covered. So I'll defer to Tim and George.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 8/9] x86/pagewalk: Improve the logic behind setting access and dirty bits
  2017-03-16 16:31 ` [PATCH v2 8/9] x86/pagewalk: Improve the logic behind setting access and dirty bits Andrew Cooper
@ 2017-03-20  9:03   ` Jan Beulich
  2017-03-23 17:09   ` Tim Deegan
  1 sibling, 0 replies; 41+ messages in thread
From: Jan Beulich @ 2017-03-20  9:03 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Tim Deegan, Xen-devel

>>> On 16.03.17 at 17:31, <andrew.cooper3@citrix.com> wrote:
> @@ -413,24 +417,33 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
>       * success.  Although the PRMs say higher-level _PAGE_ACCESSED bits
>       * get set whenever a lower-level PT is used, at least some hardware
>       * walkers behave this way. */
> -#if GUEST_PAGING_LEVELS == 4 /* 64-bit only... */
> -    if ( set_ad_bits(l4p + guest_l4_table_offset(va), &gw->l4e, 0) )
> -        paging_mark_dirty(d, gw->l4mfn);
> -    if ( set_ad_bits(l3p + guest_l3_table_offset(va), &gw->l3e,
> -                     (pse1G && (walk & PFEC_write_access))) )
> -        paging_mark_dirty(d, gw->l3mfn);
> -#endif
> -    if ( !pse1G )
> +    switch ( leaf_level )
>      {
> +    default:
> +        ASSERT_UNREACHABLE();
> +        break;
> +
> +    case 1:
> +        if ( set_ad_bits(l1p + guest_l1_table_offset(va), &gw->l1e,
> +                         (walk & PFEC_write_access) && leaf_level == 1) )

The comparison on leaf_level is pointless here (other than further
down).

> +            paging_mark_dirty(d, gw->l1mfn);
> +        /* Fallthrough */
> +
> +    case 2:
>          if ( set_ad_bits(l2p + guest_l2_table_offset(va), &gw->l2e,
> -                         (pse2M && (walk & PFEC_write_access))) )
> +                         (walk & PFEC_write_access) && leaf_level == 2) )
>              paging_mark_dirty(d, gw->l2mfn);
> -        if ( !pse2M )
> -        {
> -            if ( set_ad_bits(l1p + guest_l1_table_offset(va), &gw->l1e,
> -                             (walk & PFEC_write_access)) )
> -                paging_mark_dirty(d, gw->l1mfn);
> -        }
> +        /* Fallthrough */
> +
> +#if GUEST_PAGING_LEVELS == 4 /* 64-bit only... */
> +    case 3:
> +        if ( set_ad_bits(l3p + guest_l3_table_offset(va), &gw->l3e,
> +                         (walk & PFEC_write_access) && leaf_level == 3) )
> +            paging_mark_dirty(d, gw->l3mfn);
> +
> +        if ( set_ad_bits(l4p + guest_l4_table_offset(va), &gw->l4e, 0) )

false

Other than that,
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Personally I also think the fall through behavior would be more
immediately visible if you omitted the blank lines between the case
blocks.

Jan
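
Folding these remarks into the quoted hunk, the A/D-bit switch would end up
roughly as below (a sketch against the quoted code only, not the committed
version):

switch ( leaf_level )
{
default:
    ASSERT_UNREACHABLE();
    break;
case 1:
    if ( set_ad_bits(l1p + guest_l1_table_offset(va), &gw->l1e,
                     walk & PFEC_write_access) )
        paging_mark_dirty(d, gw->l1mfn);
    /* Fallthrough */
case 2:
    if ( set_ad_bits(l2p + guest_l2_table_offset(va), &gw->l2e,
                     (walk & PFEC_write_access) && leaf_level == 2) )
        paging_mark_dirty(d, gw->l2mfn);
    /* Fallthrough */
#if GUEST_PAGING_LEVELS == 4 /* 64-bit only... */
case 3:
    if ( set_ad_bits(l3p + guest_l3_table_offset(va), &gw->l3e,
                     (walk & PFEC_write_access) && leaf_level == 3) )
        paging_mark_dirty(d, gw->l3mfn);
    if ( set_ad_bits(l4p + guest_l4_table_offset(va), &gw->l4e, false) )
        paging_mark_dirty(d, gw->l4mfn);
#endif
}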



* Re: [PATCH v2 9/9] x86/pagewalk: non-functional cleanup
  2017-03-16 16:31 ` [PATCH v2 9/9] x86/pagewalk: non-functional cleanup Andrew Cooper
@ 2017-03-20  9:04   ` Jan Beulich
  2017-03-23 17:10   ` Tim Deegan
  1 sibling, 0 replies; 41+ messages in thread
From: Jan Beulich @ 2017-03-20  9:04 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Tim Deegan, Xen-devel

 >>> On 16.03.17 at 17:31, <andrew.cooper3@citrix.com> wrote:
> * Drop trailing whitespace
>  * Consistently apply Xen style
>  * Introduce a local variable block
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>




* Re: [PATCH v2 2/9] x86/pagewalk: Use pointer syntax for pfec parameter
  2017-03-16 16:31 ` [PATCH v2 2/9] x86/pagewalk: Use pointer syntax for pfec parameter Andrew Cooper
  2017-03-17 16:09   ` Jan Beulich
@ 2017-03-20 11:29   ` George Dunlap
  2017-03-23 16:28   ` Tim Deegan
  2 siblings, 0 replies; 41+ messages in thread
From: George Dunlap @ 2017-03-20 11:29 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel; +Cc: George Dunlap, Tim Deegan, Jan Beulich

On 16/03/17 16:31, Andrew Cooper wrote:
> It is a pointer, not an array.
> 
> No functional change.
> 
> Requested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Thank you, this has bothered me for a long time:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>



* Re: [PATCH v2 4/9] x86/pagewalk: Clean up guest_supports_* predicates
  2017-03-20  8:45   ` Jan Beulich
@ 2017-03-20 13:36     ` Andrew Cooper
  2017-03-20 13:59       ` Jan Beulich
  0 siblings, 1 reply; 41+ messages in thread
From: Andrew Cooper @ 2017-03-20 13:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Tim Deegan, Xen-devel

On 20/03/17 08:45, Jan Beulich wrote:
>>>> On 16.03.17 at 17:31, <andrew.cooper3@citrix.com> wrote:
>> Switch them to returning bool, and taking const parameters.
>>
>> Rename guest_supports_superpages() to guest_supports_l2_superpages() to
>> indicate which level of pagetables it is actually referring to, and rename
>> guest_supports_1G_superpages() to guest_supports_l3_superpages() for
>> consistency.
>>
>> guest_supports_l3_superpages() is a static property of the domain, rather than
>> of control register settings, so it is switched to take a domain pointer.
>> hvm_pse1gb_supported() is inlined into its sole user because it isn't strictly
>> hvm-specific (it is hap-specific) and really should be beside a comment
>> explaining why the cpuid policy is ignored.
>>
>> While cleaning up part of the file, clean up all trailing whitespace, and fix
>> one comment which accidentally referred to PG living in CR4 rather than CR0.
>>
>> Requested-by: Jan Beulich <jbeulich@suse.com>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> with two remarks:
>
>> -static inline int
>> -guest_supports_1G_superpages(struct vcpu *v)
>> +static inline bool guest_supports_l3_superpages(const struct domain *d)
>>  {
>> -    return (GUEST_PAGING_LEVELS >= 4 && hvm_pse1gb_supported(v->domain));
>> +    /*
>> +     * There are no control register settings for the hardware pagewalk on the
>> +     * subject of 1G superpages.
>> +     *
>> +     * If the guest constructs a 1GB superpage on capable hardware, it will
>> +     * function irrespective of whether the feature is advertised.  Xen's
>> +     * model of performing a pagewalk should match.
>> +     */
>> +    return GUEST_PAGING_LEVELS >= 4 && paging_mode_hap(d) && cpu_has_page1gb;
> Is it perhaps worth adding half a sentence stating that shadow
> doesn't support 1Gb pages at all?

Good point.

>
> Also I'm still not really happy with the guest_supports_ prefixes
> for this and its L2 counterpart: The question here isn't whether the
> guest supports it (we can't know whether it does), but whether it
> enabled PSE/PAE/LM. Arguably the L3 case is less clear because
> of the mentioned lack of an explicit enabled bit, so I can live with
> the patch going in unchanged (the L2 side then simply for things
> to remain consistent, albeit there's then already the difference of
> parameter types).

How would you prefer them to be named?

~Andrew


* Re: [PATCH v2 4/9] x86/pagewalk: Clean up guest_supports_* predicates
  2017-03-20 13:36     ` Andrew Cooper
@ 2017-03-20 13:59       ` Jan Beulich
  2017-03-23 17:32         ` Andrew Cooper
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2017-03-20 13:59 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Tim Deegan, Xen-devel

>>> On 20.03.17 at 14:36, <andrew.cooper3@citrix.com> wrote:
> On 20/03/17 08:45, Jan Beulich wrote:
>> Also I'm still not really happy with the guest_supports_ prefixes
>> for this and its L2 counterpart: The question here isn't whether the
>> guest supports it (we can't know whether it does), but whether it
>> enabled PSE/PAE/LM. Arguably the L3 case is less clear because
>> of the mentioned lack of an explicit enabled bit, so I can live with
>> the patch going in unchanged (the L2 side then simply for things
>> to remain consistent, albeit there's then already the difference of
>> parameter types).
> 
> How would you prefer them to be named?

I think I did (or at least had meant to) suggest guest_uses_...() or
something similar.

Jan



* Re: [PATCH v2 2/9] x86/pagewalk: Use pointer syntax for pfec parameter
  2017-03-16 16:31 ` [PATCH v2 2/9] x86/pagewalk: Use pointer syntax for pfec parameter Andrew Cooper
  2017-03-17 16:09   ` Jan Beulich
  2017-03-20 11:29   ` George Dunlap
@ 2017-03-23 16:28   ` Tim Deegan
  2 siblings, 0 replies; 41+ messages in thread
From: Tim Deegan @ 2017-03-23 16:28 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Jan Beulich, Xen-devel

At 16:31 +0000 on 16 Mar (1489681896), Andrew Cooper wrote:
> It is a pointer, not an array.
> 
> No functional change.
> 
> Requested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Tim Deegan <tim@xen.org>


* Re: [PATCH v2 3/9] x86/shadow: Drop VALID_GFN()
  2017-03-16 16:31 ` [PATCH v2 3/9] x86/shadow: Drop VALID_GFN() Andrew Cooper
@ 2017-03-23 16:30   ` Tim Deegan
  0 siblings, 0 replies; 41+ messages in thread
From: Tim Deegan @ 2017-03-23 16:30 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

At 16:31 +0000 on 16 Mar (1489681897), Andrew Cooper wrote:
> There is only a single user of VALID_GFN().  Inline the macro to remove the
> added layer of indirection in sh_gva_to_gfn().
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Tim Deegan <tim@xen.org>


* Re: [PATCH v2 4/9] x86/pagewalk: Clean up guest_supports_* predicates
  2017-03-16 16:31 ` [PATCH v2 4/9] x86/pagewalk: Clean up guest_supports_* predicates Andrew Cooper
  2017-03-20  8:45   ` Jan Beulich
@ 2017-03-23 16:34   ` Tim Deegan
  1 sibling, 0 replies; 41+ messages in thread
From: Tim Deegan @ 2017-03-23 16:34 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Jan Beulich, Xen-devel

At 16:31 +0000 on 16 Mar (1489681898), Andrew Cooper wrote:
> Switch them to returning bool, and taking const parameters.
> 
> Rename guest_supports_superpages() to guest_supports_l2_superpages() to
> indicate which level of pagetables it is actually referring to, and rename
> guest_supports_1G_superpages() to guest_supports_l3_superpages() for
> consistency.
> 
> guest_supports_l3_superpages() is a static property of the domain, rather than
> of control register settings, so it is switched to take a domain pointer.
> hvm_pse1gb_supported() is inlined into its sole user because it isn't strictly
> hvm-specific (it is hap-specific) and really should be beside a comment
> explaining why the cpuid policy is ignored.
> 
> While cleaning up part of the file, clean up all trailing whitespace, and fix
> one comment which accidentally referred to PG living in CR4 rather than CR0.
> 
> Requested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Tim Deegan <tim@xen.org>
(with or without the renaming Jan asked for).


* Re: [PATCH v2 5/9] x86/pagewalk: Helpers for reserved bit handling
  2017-03-16 16:31 ` [PATCH v2 5/9] x86/pagewalk: Helpers for reserved bit handling Andrew Cooper
  2017-03-20  8:48   ` Jan Beulich
@ 2017-03-23 16:55   ` Tim Deegan
  2017-03-23 17:02     ` Andrew Cooper
  1 sibling, 1 reply; 41+ messages in thread
From: Tim Deegan @ 2017-03-23 16:55 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Jan Beulich, Xen-devel

At 16:31 +0000 on 16 Mar (1489681899), Andrew Cooper wrote:
> Some bits are unconditionally reserved in pagetable entries, or reserved
> because of alignment restrictions.  Other bits are reserved because of control
> register configuration.
> 
> Introduce helpers which take an individual vcpu and guest pagetable entry, and
> calculate whether any reserved bits are set.
> 
> While here, add a couple of newlines to aid readability.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Tim Deegan <tim@xen.org>, although:

> +/* Mask covering the reserved bits from superpage alignment. */
> +#define SUPERPAGE_RSVD(bit)                                             \
> +    (((1ULL << (bit)) - 1) & ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)))

I think this will be wrong if we ever get l4 superpages, as the mask
is only 32 bits wide.

Cheers,

Tim.


* Re: [PATCH v2 7/9] x86/shadow: Use the pagewalk reserved bits helpers
  2017-03-16 16:31 ` [PATCH v2 7/9] x86/shadow: Use the pagewalk reserved bits helpers Andrew Cooper
  2017-03-16 17:25   ` Andrew Cooper
  2017-03-20  8:53   ` Jan Beulich
@ 2017-03-23 16:57   ` Tim Deegan
  2 siblings, 0 replies; 41+ messages in thread
From: Tim Deegan @ 2017-03-23 16:57 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Jan Beulich, Xen-devel

At 16:31 +0000 on 16 Mar (1489681901), Andrew Cooper wrote:
> The shadow logic should not create a valid/present shadow of a guest PTE which
> contains reserved bits from the guest's point of view.  It is not guaranteed
> that the hardware pagewalk will come to the same conclusion, and raise a
> pagefault.
> 
> Shadows created on demand from the pagefault handler are fine because the
> pagewalk over the guest tables will have injected the fault into the guest
> rather than creating a shadow.
> 
> However, shadows created by sh_resync_l1() and sh_prefetch() haven't undergone
> a pagewalk and need to account for reserved bits before creating the shadow.
> 
> In practice, this means a 3-level guest could previously cause PTEs with bits
> 63:52 set to be shadowed (and discarded).  This PTE should cause #PF[RSVD]
> when encountered by hardware, but the installed shadow is valid and hardware
> doesn't fault.
> 
> Reuse the pagewalk reserved bits helpers, and assert in
> l?e_propagate_from_guest() that no attempt is made to create shadows with
> reserved bits set.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Tim Deegan <tim@xen.org>
(including your later adjustment to p2mt)


* Re: [PATCH v2 5/9] x86/pagewalk: Helpers for reserved bit handling
  2017-03-23 16:55   ` Tim Deegan
@ 2017-03-23 17:02     ` Andrew Cooper
  2017-03-23 17:12       ` Tim Deegan
  0 siblings, 1 reply; 41+ messages in thread
From: Andrew Cooper @ 2017-03-23 17:02 UTC (permalink / raw)
  To: Tim Deegan; +Cc: George Dunlap, Jan Beulich, Xen-devel

On 23/03/17 16:55, Tim Deegan wrote:
> At 16:31 +0000 on 16 Mar (1489681899), Andrew Cooper wrote:
>> Some bits are unconditionally reserved in pagetable entries, or reserved
>> because of alignment restrictions.  Other bits are reserved because of control
>> register configuration.
>>
>> Introduce helpers which take an individual vcpu and guest pagetable entry, and
>> calculate whether any reserved bits are set.
>>
>> While here, add a couple of newlines to aid readability.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Tim Deegan <tim@xen.org>, although:
>
>> +/* Mask covering the reserved bits from superpage alignment. */
>> +#define SUPERPAGE_RSVD(bit)                                             \
>> +    (((1ULL << (bit)) - 1) & ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)))
> I think this will be wrong if we ever get l4 superpages, as the mask
> is only 32 bits wide.

What is 32 bits wide?  1ULL should cause everything else to be suitably
promoted, no?

~Andrew


* Re: [PATCH v2 8/9] x86/pagewalk: Improve the logic behind setting access and dirty bits
  2017-03-16 16:31 ` [PATCH v2 8/9] x86/pagewalk: Improve the logic behind setting access and dirty bits Andrew Cooper
  2017-03-20  9:03   ` Jan Beulich
@ 2017-03-23 17:09   ` Tim Deegan
  2017-03-23 17:40     ` Andrew Cooper
  1 sibling, 1 reply; 41+ messages in thread
From: Tim Deegan @ 2017-03-23 17:09 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Jan Beulich, Xen-devel

At 16:31 +0000 on 16 Mar (1489681902), Andrew Cooper wrote:
> --- a/xen/arch/x86/mm/guest_walk.c
> +++ b/xen/arch/x86/mm/guest_walk.c
> @@ -32,24 +32,28 @@ asm(".file \"" __OBJECT_FILE__ "\"");
>  #include <asm/page.h>
>  #include <asm/guest_pt.h>
>  
> -/* Modify a guest pagetable entry to set the Accessed and Dirty bits.
> - * Returns non-zero if it actually writes to guest memory. */
> -static uint32_t set_ad_bits(void *guest_p, void *walk_p, int set_dirty)
> +/*
> + * Modify a guest pagetable entry to set the Accessed and Dirty bits.
> + * Returns non-zero if it actually writes to guest memory.

s/non-zero/true/.

> + */
> +static bool set_ad_bits(void *_guest_p, void *_walk_p, bool set_dirty)

Would it work to have this take guest_intpte_t * and have the caller
pass e.g. &l4p[guest_l4_table_offset(va)].l4 ?  Not very much prettier, I
suppose. :)

>  {
> -    guest_intpte_t old, new;
> +    guest_intpte_t *guest_p = _guest_p, *walk_p = _walk_p;
> +    guest_intpte_t new, old = ACCESS_ONCE(*walk_p);

Why ACCESS_ONCE?  The walk is unlikely to change underfoot.

>  
> -    old = *(guest_intpte_t *)walk_p;
>      new = old | _PAGE_ACCESSED | (set_dirty ? _PAGE_DIRTY : 0);
> -    if ( old != new ) 
> +    if ( old != new )
>      {
> -        /* Write the new entry into the walk, and try to write it back
> +        /*
> +         * Write the new entry into the walk, and try to write it back
>           * into the guest table as well.  If the guest table has changed
> -         * under out feet then leave it alone. */
> -        *(guest_intpte_t *)walk_p = new;
> -        if ( cmpxchg(((guest_intpte_t *)guest_p), old, new) == old ) 
> -            return 1;
> +         * under out feet then leave it alone.

s/out/our/ while we're here. 

The rest of this LGTM, thanks.

Cheers,

Tim.


* Re: [PATCH v2 9/9] x86/pagewalk: non-functional cleanup
  2017-03-16 16:31 ` [PATCH v2 9/9] x86/pagewalk: non-functional cleanup Andrew Cooper
  2017-03-20  9:04   ` Jan Beulich
@ 2017-03-23 17:10   ` Tim Deegan
  1 sibling, 0 replies; 41+ messages in thread
From: Tim Deegan @ 2017-03-23 17:10 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Jan Beulich, Xen-devel

At 16:31 +0000 on 16 Mar (1489681903), Andrew Cooper wrote:
>  * Drop trailing whitespace
>  * Consistently apply Xen style
>  * Introduce a local variable block
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Tim Deegan <tim@xen.org>


* Re: [PATCH v2 5/9] x86/pagewalk: Helpers for reserved bit handling
  2017-03-23 17:02     ` Andrew Cooper
@ 2017-03-23 17:12       ` Tim Deegan
  2017-03-23 17:35         ` Andrew Cooper
  0 siblings, 1 reply; 41+ messages in thread
From: Tim Deegan @ 2017-03-23 17:12 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Jan Beulich, Xen-devel

At 17:02 +0000 on 23 Mar (1490288548), Andrew Cooper wrote:
> On 23/03/17 16:55, Tim Deegan wrote:
> > At 16:31 +0000 on 16 Mar (1489681899), Andrew Cooper wrote:
> >> Some bits are unconditionally reserved in pagetable entries, or reserved
> >> because of alignment restrictions.  Other bits are reserved because of control
> >> register configuration.
> >>
> >> Introduce helpers which take an individual vcpu and guest pagetable entry, and
> >> calculate whether any reserved bits are set.
> >>
> >> While here, add a couple of newlines to aid readability.
> >>
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > Reviewed-by: Tim Deegan <tim@xen.org>, although:
> >
> >> +/* Mask covering the reserved bits from superpage alignment. */
> >> +#define SUPERPAGE_RSVD(bit)                                             \
> >> +    (((1ULL << (bit)) - 1) & ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)))
> > I think this will be wrong if we ever get l4 superpages, as the mask
> > is only 32 bits wide.
> 
> What is 32 bits wide?  1ULL should cause everything else to be suitably
> promoted, no?


~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)) == 0xffffe000.  Promotion comes
too late.

Tim.
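
For concreteness, a minimal standalone program showing the truncation
described above (it assumes _PAGE_PSE_PAT is 0x1000U, i.e. bit 12; treat
the exact value as illustrative):

#include <stdio.h>

#define _PAGE_PSE_PAT 0x1000U   /* assumed value, for illustration only */

/* Macro as proposed in the patch. */
#define SUPERPAGE_RSVD(bit)                                             \
    (((1ULL << (bit)) - 1) & ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)))

int main(void)
{
    /*
     * The ~ is evaluated at unsigned int width and then zero-extended, so
     * bits 32 and up of the mask are already gone by the time the 64-bit
     * & happens.
     */
    printf("l3 (bit 30): %#llx\n", SUPERPAGE_RSVD(30)); /* 0x3fffe000 - OK */
    printf("l4 (bit 39): %#llx\n", SUPERPAGE_RSVD(39)); /* 0xffffe000, not 0x7fffffe000 */
    return 0;
}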


* Re: [PATCH v2 4/9] x86/pagewalk: Clean up guest_supports_* predicates
  2017-03-20 13:59       ` Jan Beulich
@ 2017-03-23 17:32         ` Andrew Cooper
  2017-03-24  7:19           ` Jan Beulich
  0 siblings, 1 reply; 41+ messages in thread
From: Andrew Cooper @ 2017-03-23 17:32 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Tim Deegan, Xen-devel

On 20/03/17 13:59, Jan Beulich wrote:
>>>> On 20.03.17 at 14:36, <andrew.cooper3@citrix.com> wrote:
>> On 20/03/17 08:45, Jan Beulich wrote:
>>> Also I'm still not really happy with the guest_supports_ prefixes
>>> for this and its L2 counterpart: The question here isn't whether the
>>> guest supports it (we can't know whether it does), but whether it
>>> enabled PSE/PAE/LM. Arguably the L3 case is less clear because
>>> of the mentioned lack of an explicit enabled bit, so I can live with
>>> the patch going in unchanged (the L2 side then simply for things
>>> to remain consistent, albeit there's then already the difference of
>>> parameter types).
>> How would you prefer them to be named?
> I think I did (or at least had meant to) suggest guest_uses_...() or
> something similar.

Grammatically, that is still somewhat awkward.

How about guest_can_use_...()?  That logically covers both the feature
being absent and the control register not being suitably configured.

~Andrew


* Re: [PATCH v2 5/9] x86/pagewalk: Helpers for reserved bit handling
  2017-03-23 17:12       ` Tim Deegan
@ 2017-03-23 17:35         ` Andrew Cooper
  2017-03-24  5:45           ` Juergen Gross
  2017-03-24  7:47           ` Jan Beulich
  0 siblings, 2 replies; 41+ messages in thread
From: Andrew Cooper @ 2017-03-23 17:35 UTC (permalink / raw)
  To: Tim Deegan; +Cc: George Dunlap, Jan Beulich, Xen-devel

On 23/03/17 17:12, Tim Deegan wrote:
> At 17:02 +0000 on 23 Mar (1490288548), Andrew Cooper wrote:
>> On 23/03/17 16:55, Tim Deegan wrote:
>>> At 16:31 +0000 on 16 Mar (1489681899), Andrew Cooper wrote:
>>>> Some bits are unconditionally reserved in pagetable entries, or reserved
>>>> because of alignment restrictions.  Other bits are reserved because of control
>>>> register configuration.
>>>>
>>>> Introduce helpers which take an individual vcpu and guest pagetable entry, and
>>>> calculate whether any reserved bits are set.
>>>>
>>>> While here, add a couple of newlines to aid readability.
>>>>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Reviewed-by: Tim Deegan <tim@xen.org>, although:
>>>
>>>> +/* Mask covering the reserved bits from superpage alignment. */
>>>> +#define SUPERPAGE_RSVD(bit)                                             \
>>>> +    (((1ULL << (bit)) - 1) & ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)))
>>> I think this will be wrong if we ever get l4 superpages, as the mask
>>> is only 32 bits wide.
>> What is 32 bits wide?  1ULL should cause everything else to be suitably
>> promoted, no?
>
> ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)) == 0xffffe000.  Promotion comes
> too late.

Oh - good point.  Best fix this right now.

Would you prefer ~((uint64_t)_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)) or
~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1) | 0ULL)

~Andrew


* Re: [PATCH v2 8/9] x86/pagewalk: Improve the logic behind setting access and dirty bits
  2017-03-23 17:09   ` Tim Deegan
@ 2017-03-23 17:40     ` Andrew Cooper
  0 siblings, 0 replies; 41+ messages in thread
From: Andrew Cooper @ 2017-03-23 17:40 UTC (permalink / raw)
  To: Tim Deegan; +Cc: George Dunlap, Jan Beulich, Xen-devel

On 23/03/17 17:09, Tim Deegan wrote:
> At 16:31 +0000 on 16 Mar (1489681902), Andrew Cooper wrote:
>> --- a/xen/arch/x86/mm/guest_walk.c
>> +++ b/xen/arch/x86/mm/guest_walk.c
>> @@ -32,24 +32,28 @@ asm(".file \"" __OBJECT_FILE__ "\"");
>>  #include <asm/page.h>
>>  #include <asm/guest_pt.h>
>>  
>> -/* Modify a guest pagetable entry to set the Accessed and Dirty bits.
>> - * Returns non-zero if it actually writes to guest memory. */
>> -static uint32_t set_ad_bits(void *guest_p, void *walk_p, int set_dirty)
>> +/*
>> + * Modify a guest pagetable entry to set the Accessed and Dirty bits.
>> + * Returns non-zero if it actually writes to guest memory.
> s/non-zero/true/.
>
>> + */
>> +static bool set_ad_bits(void *_guest_p, void *_walk_p, bool set_dirty)
> Would it work to have this take guest_intpte_t * and have the caller
> pass e.g. &l4p[guest_l4_table_offset(va)].l4 ?  Not very much prettier, I
> suppose. :)

I tried that and came to the same conclusion.  This is a static local
helper, so I figured it was fine to stay untyped like this.

>
>>  {
>> -    guest_intpte_t old, new;
>> +    guest_intpte_t *guest_p = _guest_p, *walk_p = _walk_p;
>> +    guest_intpte_t new, old = ACCESS_ONCE(*walk_p);
> Why ACCESS_ONCE?  The walk is unlikely to change underfoot.

Oh yes.  (I was concerned about a double read, but that is via guest_p
not walk_p.)  I will drop.

>
>>  
>> -    old = *(guest_intpte_t *)walk_p;
>>      new = old | _PAGE_ACCESSED | (set_dirty ? _PAGE_DIRTY : 0);
>> -    if ( old != new ) 
>> +    if ( old != new )
>>      {
>> -        /* Write the new entry into the walk, and try to write it back
>> +        /*
>> +         * Write the new entry into the walk, and try to write it back
>>           * into the guest table as well.  If the guest table has changed
>> -         * under out feet then leave it alone. */
>> -        *(guest_intpte_t *)walk_p = new;
>> -        if ( cmpxchg(((guest_intpte_t *)guest_p), old, new) == old ) 
>> -            return 1;
>> +         * under out feet then leave it alone.
> s/out/our/ while we're here.

Will do.

~Andrew

>
> The rest of this LGTM, thanks.
>
> Cheers,
>
> Tim.
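
Taking the points above together, set_ad_bits() would end up roughly as
follows.  This is a sketch reconstructed from the quoted hunk with the
review comments applied; the tail of the function is inferred from the
comment rather than taken from the quoted context.

/*
 * Modify a guest pagetable entry to set the Accessed and Dirty bits.
 * Returns true if it actually writes to guest memory.
 */
static bool set_ad_bits(void *_guest_p, void *_walk_p, bool set_dirty)
{
    guest_intpte_t *guest_p = _guest_p, *walk_p = _walk_p;
    guest_intpte_t new, old = *walk_p;

    new = old | _PAGE_ACCESSED | (set_dirty ? _PAGE_DIRTY : 0);
    if ( old != new )
    {
        /*
         * Write the new entry into the walk, and try to write it back
         * into the guest table as well.  If the guest table has changed
         * under our feet then leave it alone.
         */
        *walk_p = new;
        if ( cmpxchg(guest_p, old, new) == old )
            return true;
    }

    return false;
}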



* Re: [PATCH v2 5/9] x86/pagewalk: Helpers for reserved bit handling
  2017-03-23 17:35         ` Andrew Cooper
@ 2017-03-24  5:45           ` Juergen Gross
  2017-03-24  7:51             ` Jan Beulich
                               ` (2 more replies)
  2017-03-24  7:47           ` Jan Beulich
  1 sibling, 3 replies; 41+ messages in thread
From: Juergen Gross @ 2017-03-24  5:45 UTC (permalink / raw)
  To: Andrew Cooper, Tim Deegan; +Cc: George Dunlap, Jan Beulich, Xen-devel

On 23/03/17 18:35, Andrew Cooper wrote:
> On 23/03/17 17:12, Tim Deegan wrote:
>> At 17:02 +0000 on 23 Mar (1490288548), Andrew Cooper wrote:
>>> On 23/03/17 16:55, Tim Deegan wrote:
>>>> At 16:31 +0000 on 16 Mar (1489681899), Andrew Cooper wrote:
>>>>> Some bits are unconditionally reserved in pagetable entries, or reserved
>>>>> because of alignment restrictions.  Other bits are reserved because of control
>>>>> register configuration.
>>>>>
>>>>> Introduce helpers which take an individual vcpu and guest pagetable entry, and
>>>>> calculate whether any reserved bits are set.
>>>>>
>>>>> While here, add a couple of newlines to aid readability.
>>>>>
>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> Reviewed-by: Tim Deegan <tim@xen.org>, although:
>>>>
>>>>> +/* Mask covering the reserved bits from superpage alignment. */
>>>>> +#define SUPERPAGE_RSVD(bit)                                             \
>>>>> +    (((1ULL << (bit)) - 1) & ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)))
>>>> I think this will be wrong if we ever get l4 superpages, as the mask
>>>> is only 32 bits wide.
>>> What is 32 bits wide?  1ULL should cause everything else to be suitably
>>> promoted, no?
>>
>> ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)) == 0xffffe000.  Promotion comes
>> too late.
> 
> Oh - good point.  Best fix this right now.
> 
> Would you prefer ~((uint64_t)_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)) or
> ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1) | 0ULL)

Wouldn't it be better to just define the _PAGE_PSE bits accordingly?

Letting them be 32-bit only seems very error-prone.


Juergen


* Re: [PATCH v2 4/9] x86/pagewalk: Clean up guest_supports_* predicates
  2017-03-23 17:32         ` Andrew Cooper
@ 2017-03-24  7:19           ` Jan Beulich
  0 siblings, 0 replies; 41+ messages in thread
From: Jan Beulich @ 2017-03-24  7:19 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Tim Deegan, Xen-devel

>>> On 23.03.17 at 18:32, <andrew.cooper3@citrix.com> wrote:
> On 20/03/17 13:59, Jan Beulich wrote:
>>>>> On 20.03.17 at 14:36, <andrew.cooper3@citrix.com> wrote:
>>> On 20/03/17 08:45, Jan Beulich wrote:
>>>> Also I'm still not really happy with the guest_supports_ prefixes
>>>> for this and its L2 counterpart: The question here isn't whether the
>>>> guest supports it (we can't know whether it does), but whether it
>>>> enabled PSE/PAE/LM. Arguably the L3 case is less clear because
>>>> of the mentioned lack of an explicit enabled bit, so I can live with
>>>> the patch going in unchanged (the L2 side then simply for things
>>>> to remain consistent, albeit there's then already the difference of
>>>> parameter types).
>>> How would you prefer them to be named?
>> I think I did (or at least had meant to) suggest guest_uses_...() or
>> something similar.
> 
> Grammatically, that is still somewhat awkward.
> 
> How about guest_can_use_...()?  That logically covers both the feature
> being absent and the control register not being suitably configured.

Fine with me.

Jan



* Re: [PATCH v2 5/9] x86/pagewalk: Helpers for reserved bit handling
  2017-03-23 17:35         ` Andrew Cooper
  2017-03-24  5:45           ` Juergen Gross
@ 2017-03-24  7:47           ` Jan Beulich
  2017-03-24  8:36             ` Andrew Cooper
  1 sibling, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2017-03-24  7:47 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Tim Deegan, Xen-devel

>>> On 23.03.17 at 18:35, <andrew.cooper3@citrix.com> wrote:
> On 23/03/17 17:12, Tim Deegan wrote:
>> At 17:02 +0000 on 23 Mar (1490288548), Andrew Cooper wrote:
>>> On 23/03/17 16:55, Tim Deegan wrote:
>>>> At 16:31 +0000 on 16 Mar (1489681899), Andrew Cooper wrote:
>>>>> Some bits are unconditionally reserved in pagetable entries, or reserved
>>>>> because of alignment restrictions.  Other bits are reserved because of
>>>>> control register configuration.
>>>>>
>>>>> Introduce helpers which take an individual vcpu and guest pagetable entry,
>>>>> and calculate whether any reserved bits are set.
>>>>>
>>>>> While here, add a couple of newlines to aid readability.
>>>>>
>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> Reviewed-by: Tim Deegan <tim@xen.org>, although:
>>>>
>>>>> +/* Mask covering the reserved bits from superpage alignment. */
>>>>> +#define SUPERPAGE_RSVD(bit)                                             \
>>>>> +    (((1ULL << (bit)) - 1) & ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)))
>>>> I think this will be wrong if we ever get l4 superpages, as the mask
>>>> is only 32 bits wide.
>>> What is 32 bits wide?  1ULL should cause everything else to be suitably
>>> promoted, no?
>>
>> ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)) == 0xffffe000.  Promotion comes
>> too late.
> 
> Oh - good point.  Best fix this right now.
> 
> Would you prefer ~((uint64_t)_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)) or
> ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1) | 0ULL)

Why not simply tag the suffix on a constant already there, i.e.

~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1ULL))

(And btw, we commonly use only UL these days.)

Jan
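
With the suffix on the subtraction, the inner mask is computed at 64-bit
width, so (again assuming _PAGE_PSE_PAT == 0x1000U for illustration) a
hypothetical l4 superpage mask comes out as expected:

/* Sketch of the adjusted macro; the commented values assume bit 12. */
#define SUPERPAGE_RSVD(bit)                                             \
    (((1ULL << (bit)) - 1) & ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1ULL)))

/* SUPERPAGE_RSVD(21) == 0x1fe000       - 2M leaf, unchanged            */
/* SUPERPAGE_RSVD(30) == 0x3fffe000     - 1G leaf, unchanged            */
/* SUPERPAGE_RSVD(39) == 0x7fffffe000   - hypothetical 512G leaf, fixed */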



* Re: [PATCH v2 5/9] x86/pagewalk: Helpers for reserved bit handling
  2017-03-24  5:45           ` Juergen Gross
@ 2017-03-24  7:51             ` Jan Beulich
       [not found]             ` <58D4DDFF0200007800147138@suse.com>
  2017-03-24  9:06             ` Andrew Cooper
  2 siblings, 0 replies; 41+ messages in thread
From: Jan Beulich @ 2017-03-24  7:51 UTC (permalink / raw)
  To: Juergen Gross; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Xen-devel

>>> On 24.03.17 at 06:45, <jgross@suse.com> wrote:
> On 23/03/17 18:35, Andrew Cooper wrote:
>> Would you prefer ~((uint64_t)_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)) or
>> ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1) | 0ULL)
> 
> Wouldn't it be better to just define the _PAGE_PSE bits accordingly?

I don't think that's a good idea, since the flags accessors deal with
unsigned int quantities (see {get,put}_pte_flags()), and there's no
need to promote these to 64 bit operations. Otherwise you'd also
have to e.g. ask for _PAGE_NX_BIT to be made 1ULL << 63 instead
of its current 1ULL << 23.

Jan
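
For context, the packing referred to above has roughly the following shape
(an illustrative sketch, not a verbatim copy of the accessors in
x86_64/page.h): flag bits 0-11 map to PTE bits 0-11 and flag bits 12-23 to
PTE bits 52-63, which is why NX ends up at bit 23 of the packed flags.

#include <stdint.h>

/* Illustrative only; see {get,put}_pte_flags() for the real definitions. */
static inline unsigned int example_get_pte_flags(uint64_t pte)
{
    /* PTE bits 52-63 become flag bits 12-23; PTE bits 0-11 stay put. */
    return ((pte >> 40) & ~0xFFFu) | (pte & 0xFFFu);
}

static inline uint64_t example_put_pte_flags(unsigned int flags)
{
    /* Inverse: flag bits 12-23 go back up to PTE bits 52-63. */
    return ((uint64_t)(flags & ~0xFFFu) << 40) | (flags & 0xFFFu);
}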



* Re: [PATCH v2 5/9] x86/pagewalk: Helpers for reserved bit handling
       [not found]             ` <58D4DDFF0200007800147138@suse.com>
@ 2017-03-24  7:58               ` Juergen Gross
  2017-03-24  8:25                 ` Jan Beulich
  0 siblings, 1 reply; 41+ messages in thread
From: Juergen Gross @ 2017-03-24  7:58 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Xen-devel

On 24/03/17 08:51, Jan Beulich wrote:
>>>> On 24.03.17 at 06:45, <jgross@suse.com> wrote:
>> On 23/03/17 18:35, Andrew Cooper wrote:
>>> Would you prefer ~((uint64_t)_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)) or
>>> ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1) | 0ULL)
>>
>> Wouldn't it be better to just define the _PAGE_PSE bits accordingly?
> 
> I don't think that's a good idea, since the flags accessors deal with
> unsigned int quantities (see {get,put}_pte_flags()), and there's no
> need to promote these to 64 bit operations. Otherwise you'd also
> have to e.g. ask for _PAGE_NX_BIT to be made 1ULL << 63 instead
> of its current 1ULL << 23.

Well, I came to my conclusion looking at the usage of _PAGE_PSE_PAT
in do_recalc() (source file arch/x86/mm/p2m-pt.c).  While it is fine
right now, it will become a problem as soon as we support >16 TB hosts,
and finding this kind of problem might be hard.

Juergen



* Re: [PATCH v2 5/9] x86/pagewalk: Helpers for reserved bit handling
  2017-03-24  7:58               ` Juergen Gross
@ 2017-03-24  8:25                 ` Jan Beulich
  0 siblings, 0 replies; 41+ messages in thread
From: Jan Beulich @ 2017-03-24  8:25 UTC (permalink / raw)
  To: Juergen Gross; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Xen-devel

>>> On 24.03.17 at 08:58, <jgross@suse.com> wrote:
> On 24/03/17 08:51, Jan Beulich wrote:
>>>>> On 24.03.17 at 06:45, <jgross@suse.com> wrote:
>>> On 23/03/17 18:35, Andrew Cooper wrote:
>>>> Would you prefer ~((uint64_t)_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)) or
>>>> ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1) | 0ULL)
>>>
>>> Wouldn't it be better to just define the _PAGE_PSE bits accordingly?
>> 
>> I don't think that's a good idea, since the flags accessors deal with
>> unsigned int quantities (see {get,put}_pte_flags()), and there's no
>> need to promote these to 64 bit operations. Otherwise you'd also
>> have to e.g. ask for _PAGE_NX_BIT to be made 1ULL << 63 instead
>> of its current 1ULL << 23.
> 
> Well, I came to my conclusion looking at the usage of _PAGE_PSE_PAT
> in do_recalc() (source file arch/x86/mm/p2m-pt.c). While being fine
> right now it will be a problem as soon as we support >16 TB hosts.
> And finding these kind of problems might be hard.

We do support such hosts (CONFIG_BIGMEM), so we should fix such
issues. Luckily a quick grep shows this to be the only problem of that
kind. The problem really only occurs when the flags accessors are
being bypassed (which sadly in a few cases is almost unavoidable).

I guess you'll submit a patch, else let me know if I should do so.

Jan



* Re: [PATCH v2 5/9] x86/pagewalk: Helpers for reserved bit handling
  2017-03-24  7:47           ` Jan Beulich
@ 2017-03-24  8:36             ` Andrew Cooper
  0 siblings, 0 replies; 41+ messages in thread
From: Andrew Cooper @ 2017-03-24  8:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Tim Deegan, Xen-devel

On 24/03/2017 07:47, Jan Beulich wrote:
>>>> On 23.03.17 at 18:35, <andrew.cooper3@citrix.com> wrote:
>> On 23/03/17 17:12, Tim Deegan wrote:
>>> At 17:02 +0000 on 23 Mar (1490288548), Andrew Cooper wrote:
>>>> On 23/03/17 16:55, Tim Deegan wrote:
>>>>> At 16:31 +0000 on 16 Mar (1489681899), Andrew Cooper wrote:
>>>>>> Some bits are unconditionally reserved in pagetable entries, or reserved
>>>>>> because of alignment restrictions.  Other bits are reserved because of
>>>>>> control register configuration.
>>>>>>
>>>>>> Introduce helpers which take an individual vcpu and guest pagetable entry,
>>>>>> and calculate whether any reserved bits are set.
>>>>>>
>>>>>> While here, add a couple of newlines to aid readability.
>>>>>>
>>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>> Reviewed-by: Tim Deegan <tim@xen.org>, although:
>>>>>
>>>>>> +/* Mask covering the reserved bits from superpage alignment. */
>>>>>> +#define SUPERPAGE_RSVD(bit)                                             \
>>>>>> +    (((1ULL << (bit)) - 1) & ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)))
>>>>> I think this will be wrong if we ever get l4 superpages, as the mask
>>>>> is only 32 bits wide.
>>>> What is 32 bits wide?  1ULL should cause everything else to be suitably
>>>> promoted, no?
>>> ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)) == 0xffffe000.  Promotion comes
>>> too late.
>> Oh - good point.  Best fix this right now.
>>
>> Would you prefer ~((uint64_t)_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)) or
>> ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1) | 0ULL)
> Why not simply tag the suffix on a constant already there, i.e.
>
> ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1ULL))

Oh - good point.  I will do that.

> (And btw, we commonly use only UL these days.)

Sorry - force of habit.  (XTF, being 32-bit as well, really does need ULL
over UL.)

~Andrew


* Re: [PATCH v2 5/9] x86/pagewalk: Helpers for reserved bit handling
  2017-03-24  5:45           ` Juergen Gross
  2017-03-24  7:51             ` Jan Beulich
       [not found]             ` <58D4DDFF0200007800147138@suse.com>
@ 2017-03-24  9:06             ` Andrew Cooper
  2 siblings, 0 replies; 41+ messages in thread
From: Andrew Cooper @ 2017-03-24  9:06 UTC (permalink / raw)
  To: Juergen Gross, Tim Deegan; +Cc: George Dunlap, Jan Beulich, Xen-devel

On 24/03/2017 05:45, Juergen Gross wrote:
> On 23/03/17 18:35, Andrew Cooper wrote:
>> On 23/03/17 17:12, Tim Deegan wrote:
>>> At 17:02 +0000 on 23 Mar (1490288548), Andrew Cooper wrote:
>>>> On 23/03/17 16:55, Tim Deegan wrote:
>>>>> At 16:31 +0000 on 16 Mar (1489681899), Andrew Cooper wrote:
>>>>>> Some bits are unconditionally reserved in pagetable entries, or reserved
>>>>>> because of alignment restrictions.  Other bits are reserved because of control
>>>>>> register configuration.
>>>>>>
>>>>>> Introduce helpers which take an individual vcpu and guest pagetable entry, and
>>>>>> calculate whether any reserved bits are set.
>>>>>>
>>>>>> While here, add a couple of newlines to aid readability.
>>>>>>
>>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>> Reviewed-by: Tim Deegan <tim@xen.org>, although:
>>>>>
>>>>>> +/* Mask covering the reserved bits from superpage alignment. */
>>>>>> +#define SUPERPAGE_RSVD(bit)                                             \
>>>>>> +    (((1ULL << (bit)) - 1) & ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)))
>>>>> I think this will be wrong if we ever get l4 superpages, as the mask
>>>>> is only 32 bits wide.
>>>> What is 32 bits wide?  1ULL should cause everything else to be suitably
>>>> promoted, no?
>>> ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)) == 0xffffe000.  Promotion comes
>>> too late.
>> Oh - good point.  Best fix this right now.
>>
>> Would you prefer ~((uint64_t)_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1)) or
>> ~(_PAGE_PSE_PAT | (_PAGE_PSE_PAT - 1) | 0ULL)
> Wouldn't it be better to just define the _PAGE_PSE bits accordingly?
>
> Letting them be 32 bit only seems to be very error prone.

As you see from the other replies, that's not the currently intended use
of these flags.

However, the development of this pagewalk series has caused me to
question the sense in keeping pte flags as a 32-bit quantity.  There is a
non-trivial amount of packing and unpacking of them, which is surely
more expensive than just using a 64-bit quantity, now that we are 64-bit
only.

~Andrew



Thread overview: 41+ messages
2017-03-16 16:31 [PATCH v2 0/9] Fixes to pagetable handling Andrew Cooper
2017-03-16 16:31 ` [PATCH v2 1/9] x86/cpuid: Sort cpu_has_* predicates by feature number Andrew Cooper
2017-03-17 16:08   ` Jan Beulich
2017-03-16 16:31 ` [PATCH v2 2/9] x86/pagewalk: Use pointer syntax for pfec parameter Andrew Cooper
2017-03-17 16:09   ` Jan Beulich
2017-03-20 11:29   ` George Dunlap
2017-03-23 16:28   ` Tim Deegan
2017-03-16 16:31 ` [PATCH v2 3/9] x86/shadow: Drop VALID_GFN() Andrew Cooper
2017-03-23 16:30   ` Tim Deegan
2017-03-16 16:31 ` [PATCH v2 4/9] x86/pagewalk: Clean up guest_supports_* predicates Andrew Cooper
2017-03-20  8:45   ` Jan Beulich
2017-03-20 13:36     ` Andrew Cooper
2017-03-20 13:59       ` Jan Beulich
2017-03-23 17:32         ` Andrew Cooper
2017-03-24  7:19           ` Jan Beulich
2017-03-23 16:34   ` Tim Deegan
2017-03-16 16:31 ` [PATCH v2 5/9] x86/pagewalk: Helpers for reserved bit handling Andrew Cooper
2017-03-20  8:48   ` Jan Beulich
2017-03-23 16:55   ` Tim Deegan
2017-03-23 17:02     ` Andrew Cooper
2017-03-23 17:12       ` Tim Deegan
2017-03-23 17:35         ` Andrew Cooper
2017-03-24  5:45           ` Juergen Gross
2017-03-24  7:51             ` Jan Beulich
     [not found]             ` <58D4DDFF0200007800147138@suse.com>
2017-03-24  7:58               ` Juergen Gross
2017-03-24  8:25                 ` Jan Beulich
2017-03-24  9:06             ` Andrew Cooper
2017-03-24  7:47           ` Jan Beulich
2017-03-24  8:36             ` Andrew Cooper
2017-03-16 16:31 ` [PATCH v2 6/9] x86/pagewalk: Re-implement the pagetable walker Andrew Cooper
2017-03-16 16:31 ` [PATCH v2 7/9] x86/shadow: Use the pagewalk reserved bits helpers Andrew Cooper
2017-03-16 17:25   ` Andrew Cooper
2017-03-20  8:53   ` Jan Beulich
2017-03-23 16:57   ` Tim Deegan
2017-03-16 16:31 ` [PATCH v2 8/9] x86/pagewalk: Improve the logic behind setting access and dirty bits Andrew Cooper
2017-03-20  9:03   ` Jan Beulich
2017-03-23 17:09   ` Tim Deegan
2017-03-23 17:40     ` Andrew Cooper
2017-03-16 16:31 ` [PATCH v2 9/9] x86/pagewalk: non-functional cleanup Andrew Cooper
2017-03-20  9:04   ` Jan Beulich
2017-03-23 17:10   ` Tim Deegan
