* [PATCHv1 0/3] x86,xen: remove _PAGE_IOMAP PTE flag
@ 2014-09-02 16:37 David Vrabel
2014-09-02 16:37 ` [PATCH 1/3] x86: skip check for spurious faults for non-present faults David Vrabel
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: David Vrabel @ 2014-09-02 16:37 UTC (permalink / raw)
To: linux-kernel
Cc: David Vrabel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky
The series removes the Xen-specific _PAGE_IOMAP PTE flag, freeing it
up for other uses.
I plan to queue this for 3.18 via the Xen tree.
This was previously posted as part of a larger series ("x86/xen: fixes
for mapping high MMIO regions (and remove _PAGE_IOMAP)"), the bulk of
which has already been merged.
Changes in v1 (from v6 of the previous series):
- Improve comments/description of the spurious fault fix.
David
* [PATCH 1/3] x86: skip check for spurious faults for non-present faults
2014-09-02 16:37 [PATCHv1 0/3] x86,xen: remove _PAGE_IOMAP PTE flag David Vrabel
@ 2014-09-02 16:37 ` David Vrabel
2014-09-02 16:37 ` [PATCH 2/3] x86/xen: do not use _PAGE_IOMAP PTE flag for I/O mappings David Vrabel
2014-09-02 16:37 ` [PATCH 3/3] x86: remove the Xen-specific _PAGE_IOMAP PTE flag David Vrabel
2 siblings, 0 replies; 4+ messages in thread
From: David Vrabel @ 2014-09-02 16:37 UTC (permalink / raw)
To: linux-kernel
Cc: David Vrabel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky
If a fault on a kernel address is due to a non-present page, then it
cannot be the result of a stale TLB entry from a protection change (RO
to RW or NX to X). Thus the pagetable walk in spurious_fault() can be
skipped.
See the initial if in spurious_fault() and the tests in
spurious_fault_check() for the set of possible error codes checked
for spurious faults. These are:
         IRUWP
Before   x00xx && ( 1xxxx || xxx1x )
After    ( 10001 || 00011 ) && ( 1xxxx || xxx1x )
Thus the new condition is a subset of the previous one, excluding only
non-present faults (I == 1 and W == 1 are mutually exclusive).
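To make the bit patterns concrete, here is a minimal standalone sketch
of the two conditions. The PF_* values match the error code enum in
arch/x86/mm/fault.c; the two helper functions are illustrative, not
kernel code:

/*
 * Page fault error code bits (P = protection violation, W = write,
 * U = user mode, R = reserved bit set, I = instruction fetch).
 */
#define PF_PROT		(1UL << 0)
#define PF_WRITE	(1UL << 1)
#define PF_USER		(1UL << 2)
#define PF_RSVD		(1UL << 3)
#define PF_INSTR	(1UL << 4)

/* Before: any kernel-mode fault without a reserved bit violation
 * reached the pagetable walk. */
static int reached_walk_before(unsigned long error_code)
{
	return !(error_code & (PF_USER | PF_RSVD));
}

/* After: only the two codes a stale TLB entry can produce -- a write
 * to a read-only page (00011) or a fetch from a no-execute page
 * (10001). */
static int reached_walk_after(unsigned long error_code)
{
	return error_code == (PF_WRITE | PF_PROT) ||
	       error_code == (PF_INSTR | PF_PROT);
}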
This avoids spurious_fault() oopsing in some cases if the pagetables
it attempts to walk are not accessible. Such an oops would obscure
the location of the original fault.
This also fixes a crash with Xen PV guests when they access entries in
the M2P corresponding to device MMIO regions. The M2P is mapped
(read-only) by Xen into the kernel address space of the guest, and this
mapping may contain holes for non-RAM regions. Read faults will
result in calls to spurious_fault(), but because the page tables for
the M2P mappings are not accessible by the guest, the pagetable walk
would fault.
This was not previously a problem because MMIO mappings did not
result in an M2P lookup, thanks to the _PAGE_IOMAP bit in the PTE.
However, removing the _PAGE_IOMAP bit requires M2P lookups for MMIO
mappings as well.
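For reference, a simplified sketch of why the M2P read faults,
modelled on mfn_to_pfn() in arch/x86/include/asm/xen/page.h (the
helper name is illustrative; machine_to_phys_mapping and __get_user()
are the real interfaces):

/* The M2P is a read-only array provided by Xen, with holes for
 * non-RAM (e.g. MMIO) MFNs, so this array read can fault. */
static unsigned long m2p_lookup(unsigned long mfn)
{
	unsigned long pfn;

	/*
	 * __get_user() has an exception fixup, but the fault is
	 * still delivered to do_page_fault() -- and hence to
	 * spurious_fault() -- before the fixup is applied.
	 */
	if (__get_user(pfn, &machine_to_phys_mapping[mfn]) < 0)
		pfn = ~0UL;	/* no M2P entry for this MFN */

	return pfn;
}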
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
---
arch/x86/mm/fault.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a241946..83bb03b 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -933,8 +933,17 @@ static int spurious_fault_check(unsigned long error_code, pte_t *pte)
* cross-processor TLB flush, even if no stale TLB entries exist
* on other processors.
*
+ * Spurious faults may only occur if the TLB contains an entry with
+ * fewer permissions than the page table entry. Non-present (P = 0)
+ * and reserved bit (R = 1) faults are never spurious.
+ *
* There are no security implications to leaving a stale TLB when
* increasing the permissions on a page.
+ *
+ * Returns non-zero if a spurious fault was handled, zero otherwise.
+ *
+ * See Intel Developer's Manual Vol 3 Section 4.10.4.3, bullet 3
+ * (Optional Invalidation).
*/
static noinline int
spurious_fault(unsigned long error_code, unsigned long address)
@@ -945,8 +954,17 @@ spurious_fault(unsigned long error_code, unsigned long address)
pte_t *pte;
int ret;
- /* Reserved-bit violation or user access to kernel space? */
- if (error_code & (PF_USER | PF_RSVD))
+ /*
+ * Only writes to RO or instruction fetches from NX may cause
+ * spurious faults.
+ *
+ * These could be from user or supervisor accesses but the TLB
+ * is only lazily flushed after a kernel mapping protection
+ * change, so user accesses are not expected to cause spurious
+ * faults.
+ */
+ if (error_code != (PF_WRITE | PF_PROT)
+ && error_code != (PF_INSTR | PF_PROT))
return 0;
pgd = init_mm.pgd + pgd_index(address);
--
1.7.10.4
* [PATCH 2/3] x86/xen: do not use _PAGE_IOMAP PTE flag for I/O mappings
2014-09-02 16:37 [PATCHv1 0/3] x86,xen: remove _PAGE_IOMAP PTE flag David Vrabel
2014-09-02 16:37 ` [PATCH 1/3] x86: skip check for spurious faults for non-present faults David Vrabel
@ 2014-09-02 16:37 ` David Vrabel
2014-09-02 16:37 ` [PATCH 3/3] x86: remove the Xen-specific _PAGE_IOMAP PTE flag David Vrabel
2 siblings, 0 replies; 4+ messages in thread
From: David Vrabel @ 2014-09-02 16:37 UTC (permalink / raw)
To: linux-kernel
Cc: David Vrabel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky
Since mfn_to_pfn() returns the correct PFN for identity mappings (as
used for MMIO regions), the use of _PAGE_IOMAP is not required in
pte_mfn_to_pfn().
Do not set the _PAGE_IOMAP flag in pte_pfn_to_mfn() and do not use it
in pte_mfn_to_pfn().
This will allow _PAGE_IOMAP to be removed, making it available for
future use.
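As background, identity (1:1) entries in the p2m already carry enough
information to recover the PFN without a per-PTE flag. A sketch of
the convention (IDENTITY_FRAME_BIT and IDENTITY_FRAME() are the real
definitions from arch/x86/include/asm/xen/page.h; the helper is
illustrative):

/* Identity p2m entries, as used for MMIO regions, are stored with
 * IDENTITY_FRAME_BIT set. */
#define IDENTITY_FRAME_BIT	(1UL << (BITS_PER_LONG - 2))
#define IDENTITY_FRAME(m)	((m) | IDENTITY_FRAME_BIT)

static unsigned long identity_entry_to_mfn(unsigned long p2m_entry)
{
	/* For an identity entry the MFN equals the PFN. */
	return p2m_entry & ~IDENTITY_FRAME_BIT;
}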
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
arch/x86/xen/mmu.c | 48 ++++--------------------------------------------
1 file changed, 4 insertions(+), 44 deletions(-)
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index e8a1201..90e1d44 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -399,38 +399,14 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
if (unlikely(mfn == INVALID_P2M_ENTRY)) {
mfn = 0;
flags = 0;
- } else {
- /*
- * Paramount to do this test _after_ the
- * INVALID_P2M_ENTRY as INVALID_P2M_ENTRY &
- * IDENTITY_FRAME_BIT resolves to true.
- */
- mfn &= ~FOREIGN_FRAME_BIT;
- if (mfn & IDENTITY_FRAME_BIT) {
- mfn &= ~IDENTITY_FRAME_BIT;
- flags |= _PAGE_IOMAP;
- }
- }
+ } else
+ mfn &= ~(FOREIGN_FRAME_BIT | IDENTITY_FRAME_BIT);
val = ((pteval_t)mfn << PAGE_SHIFT) | flags;
}
return val;
}
-static pteval_t iomap_pte(pteval_t val)
-{
- if (val & _PAGE_PRESENT) {
- unsigned long pfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
- pteval_t flags = val & PTE_FLAGS_MASK;
-
- /* We assume the pte frame number is a MFN, so
- just use it as-is. */
- val = ((pteval_t)pfn << PAGE_SHIFT) | flags;
- }
-
- return val;
-}
-
__visible pteval_t xen_pte_val(pte_t pte)
{
pteval_t pteval = pte.pte;
@@ -441,9 +417,6 @@ __visible pteval_t xen_pte_val(pte_t pte)
pteval = (pteval & ~_PAGE_PAT) | _PAGE_PWT;
}
#endif
- if (xen_initial_domain() && (pteval & _PAGE_IOMAP))
- return pteval;
-
return pte_mfn_to_pfn(pteval);
}
PV_CALLEE_SAVE_REGS_THUNK(xen_pte_val);
@@ -481,7 +454,6 @@ void xen_set_pat(u64 pat)
__visible pte_t xen_make_pte(pteval_t pte)
{
- phys_addr_t addr = (pte & PTE_PFN_MASK);
#if 0
/* If Linux is trying to set a WC pte, then map to the Xen WC.
* If _PAGE_PAT is set, then it probably means it is really
@@ -496,19 +468,7 @@ __visible pte_t xen_make_pte(pteval_t pte)
pte = (pte & ~(_PAGE_PCD | _PAGE_PWT)) | _PAGE_PAT;
}
#endif
- /*
- * Unprivileged domains are allowed to do IOMAPpings for
- * PCI passthrough, but not map ISA space. The ISA
- * mappings are just dummy local mappings to keep other
- * parts of the kernel happy.
- */
- if (unlikely(pte & _PAGE_IOMAP) &&
- (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
- pte = iomap_pte(pte);
- } else {
- pte &= ~_PAGE_IOMAP;
- pte = pte_pfn_to_mfn(pte);
- }
+ pte = pte_pfn_to_mfn(pte);
return native_make_pte(pte);
}
@@ -2094,7 +2054,7 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
default:
/* By default, set_fixmap is used for hardware mappings */
- pte = mfn_pte(phys, __pgprot(pgprot_val(prot) | _PAGE_IOMAP));
+ pte = mfn_pte(phys, prot);
break;
}
--
1.7.10.4
* [PATCH 3/3] x86: remove the Xen-specific _PAGE_IOMAP PTE flag
2014-09-02 16:37 [PATCHv1 0/3] x86,xen: remove _PAGE_IOMAP PTE flag David Vrabel
2014-09-02 16:37 ` [PATCH 1/3] x86: skip check for spurious faults for non-present faults David Vrabel
2014-09-02 16:37 ` [PATCH 2/3] x86/xen: do not use _PAGE_IOMAP PTE flag for I/O mappings David Vrabel
@ 2014-09-02 16:37 ` David Vrabel
2 siblings, 0 replies; 4+ messages in thread
From: David Vrabel @ 2014-09-02 16:37 UTC (permalink / raw)
To: linux-kernel
Cc: David Vrabel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky
The _PAGE_IOMAP PTE flag was only used by Xen PV guests to mark PTEs
that map I/O regions that are 1:1 in the p2m. This allowed the
correct PFN to be obtained when converting the MFNs read from such a
PTE back to PFNs.
Xen guests no longer use _PAGE_IOMAP for this. Instead, mfn_to_pfn()
returns the correct PFN by using a combination of the M2P and p2m to
determine whether an MFN corresponds to a 1:1 mapping in the p2m.
Remove _PAGE_IOMAP, replacing it with _PAGE_SOFTW2 to allow for
future uses of the PTE flag.
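The m2p/p2m cross-check described above looks roughly like this
(simplified from mfn_to_pfn() in arch/x86/include/asm/xen/page.h;
m2p_lookup() is the illustrative helper sketched in patch 1, while
get_phys_to_machine() and IDENTITY_FRAME() are the real interfaces):

static unsigned long mfn_to_pfn_sketch(unsigned long mfn)
{
	unsigned long pfn = m2p_lookup(mfn);	/* ~0 if no M2P entry */

	/*
	 * No M2P entry, but the p2m records an identity (1:1)
	 * mapping for this frame: the PFN equals the MFN.
	 */
	if (pfn == ~0UL && get_phys_to_machine(mfn) == IDENTITY_FRAME(mfn))
		pfn = mfn;

	return pfn;
}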
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
---
arch/x86/include/asm/pgtable_types.h | 11 +++++------
arch/x86/mm/init_32.c | 2 +-
arch/x86/mm/init_64.c | 2 +-
arch/x86/pci/i386.c | 2 --
arch/x86/xen/enlighten.c | 2 --
5 files changed, 7 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index f216963..cee83b2 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -23,7 +23,6 @@
#define _PAGE_BIT_SPECIAL _PAGE_BIT_SOFTW1
#define _PAGE_BIT_CPA_TEST _PAGE_BIT_SOFTW1
#define _PAGE_BIT_SPLITTING _PAGE_BIT_SOFTW2 /* only valid on a PSE pmd */
-#define _PAGE_BIT_IOMAP _PAGE_BIT_SOFTW2 /* flag used to indicate IO mapping */
#define _PAGE_BIT_HIDDEN _PAGE_BIT_SOFTW3 /* hidden by kmemcheck */
#define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */
#define _PAGE_BIT_NX 63 /* No execute: only valid after cpuid check */
@@ -52,7 +51,7 @@
#define _PAGE_PSE (_AT(pteval_t, 1) << _PAGE_BIT_PSE)
#define _PAGE_GLOBAL (_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
#define _PAGE_SOFTW1 (_AT(pteval_t, 1) << _PAGE_BIT_SOFTW1)
-#define _PAGE_IOMAP (_AT(pteval_t, 1) << _PAGE_BIT_IOMAP)
+#define _PAGE_SOFTW2 (_AT(pteval_t, 1) << _PAGE_BIT_SOFTW2)
#define _PAGE_PAT (_AT(pteval_t, 1) << _PAGE_BIT_PAT)
#define _PAGE_PAT_LARGE (_AT(pteval_t, 1) << _PAGE_BIT_PAT_LARGE)
#define _PAGE_SPECIAL (_AT(pteval_t, 1) << _PAGE_BIT_SPECIAL)
@@ -168,10 +167,10 @@
#define __PAGE_KERNEL_LARGE_NOCACHE (__PAGE_KERNEL | _PAGE_CACHE_UC | _PAGE_PSE)
#define __PAGE_KERNEL_LARGE_EXEC (__PAGE_KERNEL_EXEC | _PAGE_PSE)
-#define __PAGE_KERNEL_IO (__PAGE_KERNEL | _PAGE_IOMAP)
-#define __PAGE_KERNEL_IO_NOCACHE (__PAGE_KERNEL_NOCACHE | _PAGE_IOMAP)
-#define __PAGE_KERNEL_IO_UC_MINUS (__PAGE_KERNEL_UC_MINUS | _PAGE_IOMAP)
-#define __PAGE_KERNEL_IO_WC (__PAGE_KERNEL_WC | _PAGE_IOMAP)
+#define __PAGE_KERNEL_IO (__PAGE_KERNEL)
+#define __PAGE_KERNEL_IO_NOCACHE (__PAGE_KERNEL_NOCACHE)
+#define __PAGE_KERNEL_IO_UC_MINUS (__PAGE_KERNEL_UC_MINUS)
+#define __PAGE_KERNEL_IO_WC (__PAGE_KERNEL_WC)
#define PAGE_KERNEL __pgprot(__PAGE_KERNEL)
#define PAGE_KERNEL_RO __pgprot(__PAGE_KERNEL_RO)
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 7d05565..c8140e1 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -537,7 +537,7 @@ static void __init pagetable_init(void)
permanent_kmaps_init(pgd_base);
}
-pteval_t __supported_pte_mask __read_mostly = ~(_PAGE_NX | _PAGE_GLOBAL | _PAGE_IOMAP);
+pteval_t __supported_pte_mask __read_mostly = ~(_PAGE_NX | _PAGE_GLOBAL);
EXPORT_SYMBOL_GPL(__supported_pte_mask);
/* user-defined highmem size */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 5621c47..5d98476 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -151,7 +151,7 @@ early_param("gbpages", parse_direct_gbpages_on);
* around without checking the pgd every time.
*/
-pteval_t __supported_pte_mask __read_mostly = ~_PAGE_IOMAP;
+pteval_t __supported_pte_mask __read_mostly = ~0;
EXPORT_SYMBOL_GPL(__supported_pte_mask);
int force_personality32;
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 2ae525e..37c1435 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -442,8 +442,6 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
*/
prot |= _PAGE_CACHE_UC_MINUS;
- prot |= _PAGE_IOMAP; /* creating a mapping for IO */
-
vma->vm_page_prot = __pgprot(prot);
if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index c0cb11f..be1cff5 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1559,8 +1559,6 @@ asmlinkage __visible void __init xen_start_kernel(void)
#endif
__supported_pte_mask &= ~(_PAGE_PWT | _PAGE_PCD);
- __supported_pte_mask |= _PAGE_IOMAP;
-
/*
* Prevent page tables from being allocated in highmem, even
* if CONFIG_HIGHPTE is enabled.
--
1.7.10.4