* [PATCHv6 0/9] x86/xen: fixes for mapping high MMIO regions (and remove _PAGE_IOMAP)
@ 2014-04-15 14:15 David Vrabel
  2014-04-15 14:15   ` David Vrabel
                   ` (12 more replies)
  0 siblings, 13 replies; 29+ messages in thread
From: David Vrabel @ 2014-04-15 14:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Mel Gorman

This is a fix for the problems with mapping high MMIO regions in
certain cases (e.g., the RDMA drivers).  Not all mappers were
specifying _PAGE_IOMAP, which meant no valid MFN could be found and
the resulting PTEs would be set as not present, causing subsequent
faults.

It assumes that anything that isn't RAM (whether ballooned out or not)
is an I/O region and thus should be 1:1 in the p2m.  Specifically, the
region after the end of the E820 map and the region beyond the end of
the p2m.  Ballooned frames are still marked as missing in the p2m as
before.
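
In other words, for any non-RAM PFN the p2m now returns the identity
frame, so PFN-to-MFN translation is a no-op for I/O regions.  A
minimal sketch of the intended behaviour (is_ram_pfn() is a made-up
placeholder for the real E820/p2m checks, not a function in this
series):

	/* Sketch only: expected translation behaviour after this series. */
	static unsigned long sketch_pfn_to_mfn(unsigned long pfn)
	{
		if (pfn >= MAX_P2M_PFN || !is_ram_pfn(pfn))	/* placeholder */
			return pfn;		/* identity: MFN == PFN */
		return get_phys_to_machine(pfn);	/* normal p2m lookup for RAM */
	}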

As a follow-on, pte_mfn_to_pfn() and pte_pfn_to_mfn() are modified to
not use the _PAGE_IOMAP PTE flag, and MFN-to-PFN and PFN-to-MFN
translations now do the right thing for all I/O regions.  This means
the Xen-specific _PAGE_IOMAP flag can be removed.

This series has been tested (in dom0) on all unique machines we have
in our test lab (~100 machines), some of which have PCI devices with
BARs above the end of RAM.

Note this does not fix a 32-bit dom0 trying to access BARs above
16 TB, as this is caused by MFNs/PFNs being limited to 32 bits
(unsigned long).

You may find it useful to apply patch #3 to more easily review the
updated p2m diagram.

Changes in v6:
- don't oops in spurious_fault() after faulting on an M2P access.
  This fixes an oops when userspace unmaps an I/O region in certain
  cases.

Changes in v5:
- improve performance of set_phys_range_identity() by not iterating
  over all pages if the p2m_mid_identity middle pages or p2m_identity
  leaf pages are already present.  (Thanks to Andrew Cooper for
  reporting this and Frediano Ziglio for providing a fix.)

Changes in v4:
- fix p2m_mid_identity initialization.

Changes in v3 (not posted):
- use correct end of e820
- fix xen_remap_domain_mfn_range()

Changes in v2:
- fix to actually set end-of-RAM to 512 GiB region as 1:1.
- introduce p2m_mid_identity to efficiently store large 1:1 regions.
- Split the _PAGE_IOMAP patch into Xen and generic x86 halves.

David


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/9] x86/xen: rename early_p2m_alloc() and early_p2m_alloc_middle()
  2014-04-15 14:15 [PATCHv6 0/9] x86/xen: fixes for mapping high MMIO regions (and remove _PAGE_IOMAP) David Vrabel
@ 2014-04-15 14:15   ` David Vrabel
  2014-04-15 14:15   ` David Vrabel
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: David Vrabel @ 2014-04-15 14:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Mel Gorman

early_p2m_alloc_middle() allocates a new leaf page and
early_p2m_alloc() allocates a new middle page.  This is confusing.

Swap the names so they match what the functions actually do.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/p2m.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 85e5d78..4fc71cc 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -596,7 +596,7 @@ static bool alloc_p2m(unsigned long pfn)
 	return true;
 }
 
-static bool __init early_alloc_p2m_middle(unsigned long pfn, bool check_boundary)
+static bool __init early_alloc_p2m(unsigned long pfn, bool check_boundary)
 {
 	unsigned topidx, mididx, idx;
 	unsigned long *p2m;
@@ -638,7 +638,7 @@ static bool __init early_alloc_p2m_middle(unsigned long pfn, bool check_boundary
 	return true;
 }
 
-static bool __init early_alloc_p2m(unsigned long pfn)
+static bool __init early_alloc_p2m_middle(unsigned long pfn)
 {
 	unsigned topidx = p2m_top_index(pfn);
 	unsigned long *mid_mfn_p;
@@ -663,7 +663,7 @@ static bool __init early_alloc_p2m(unsigned long pfn)
 		p2m_top_mfn_p[topidx] = mid_mfn_p;
 		p2m_top_mfn[topidx] = virt_to_mfn(mid_mfn_p);
 		/* Note: we don't set mid_mfn_p[midix] here,
-		 * look in early_alloc_p2m_middle */
+		 * look in early_alloc_p2m() */
 	}
 	return true;
 }
@@ -739,7 +739,7 @@ found:
 
 	/* This shouldn't happen */
 	if (WARN_ON(p2m_top[topidx] == p2m_mid_missing))
-		early_alloc_p2m(set_pfn);
+		early_alloc_p2m_middle(set_pfn);
 
 	if (WARN_ON(p2m_top[topidx][mididx] != p2m_missing))
 		return false;
@@ -754,13 +754,13 @@ found:
 bool __init early_set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 {
 	if (unlikely(!__set_phys_to_machine(pfn, mfn)))  {
-		if (!early_alloc_p2m(pfn))
+		if (!early_alloc_p2m_middle(pfn))
 			return false;
 
 		if (early_can_reuse_p2m_middle(pfn, mfn))
 			return __set_phys_to_machine(pfn, mfn);
 
-		if (!early_alloc_p2m_middle(pfn, false /* boundary crossover OK!*/))
+		if (!early_alloc_p2m(pfn, false /* boundary crossover OK!*/))
 			return false;
 
 		if (!__set_phys_to_machine(pfn, mfn))
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/9] x86/xen: fix set_phys_range_identity() if pfn_e > MAX_P2M_PFN
  2014-04-15 14:15 [PATCHv6 0/9] x86/xen: fixes for mapping high MMIO regions (and remove _PAGE_IOMAP) David Vrabel
@ 2014-04-15 14:15   ` David Vrabel
  2014-04-15 14:15   ` David Vrabel
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: David Vrabel @ 2014-04-15 14:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Mel Gorman

Allow set_phys_range_identity() to work with a range that overlaps
MAX_P2M_PFN by clamping pfn_e to MAX_P2M_PFN.
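
This lets callers pass an open-ended range without worrying about the
p2m limit; for example (illustrative only, mirroring the later use in
xen_memory_setup(); last_ram_pfn is a placeholder name):

	/* Anything beyond MAX_P2M_PFN is simply clamped away. */
	set_phys_range_identity(last_ram_pfn, ~0UL);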

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/p2m.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 4fc71cc..82c8c93 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -774,7 +774,7 @@ unsigned long __init set_phys_range_identity(unsigned long pfn_s,
 {
 	unsigned long pfn;
 
-	if (unlikely(pfn_s >= MAX_P2M_PFN || pfn_e >= MAX_P2M_PFN))
+	if (unlikely(pfn_s >= MAX_P2M_PFN))
 		return 0;
 
 	if (unlikely(xen_feature(XENFEAT_auto_translated_physmap)))
@@ -783,6 +783,9 @@ unsigned long __init set_phys_range_identity(unsigned long pfn_s,
 	if (pfn_s > pfn_e)
 		return 0;
 
+	if (pfn_e > MAX_P2M_PFN)
+		pfn_e = MAX_P2M_PFN;
+
 	for (pfn = (pfn_s & ~(P2M_MID_PER_PAGE * P2M_PER_PAGE - 1));
 		pfn < ALIGN(pfn_e, (P2M_MID_PER_PAGE * P2M_PER_PAGE));
 		pfn += P2M_MID_PER_PAGE * P2M_PER_PAGE)
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/9] x86/xen: compactly store large identity ranges in the p2m
  2014-04-15 14:15 [PATCHv6 0/9] x86/xen: fixes for mapping high MMIO regions (and remove _PAGE_IOMAP) David Vrabel
@ 2014-04-15 14:15   ` David Vrabel
  2014-04-15 14:15   ` David Vrabel
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: David Vrabel @ 2014-04-15 14:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Mel Gorman

Large (multi-GB) identity ranges currently require a unique middle page
(filled with p2m_identity entries) per 1 GB region.

Similar to the common p2m_mid_missing middle page for large missing
regions, introduce a p2m_mid_identity page (filled with p2m_identity
entries) which can be used instead.

set_phys_range_identity() thus only needs to allocate new middle pages
at the beginning and end of the range.
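
In outline, the new loop in set_phys_range_identity() does the
following (a simplified sketch of the full loop in the patch below):

	/* A fully-missing, 1 GB aligned chunk is covered by swapping in
	 * the single shared p2m_mid_identity page; the loop then skips
	 * the rest of that chunk. */
	for (pfn = pfn_s; pfn < pfn_e; ) {
		unsigned topidx = p2m_top_index(pfn);

		__set_phys_to_machine(pfn, IDENTITY_FRAME(pfn));
		pfn++;

		if (p2m_top[topidx] == p2m_mid_identity)
			pfn = ALIGN(pfn, P2M_MID_PER_PAGE * P2M_PER_PAGE);
	}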

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/p2m.c |  155 +++++++++++++++++++++++++++++++++++-----------------
 1 files changed, 105 insertions(+), 50 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 82c8c93..5d716f7 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -36,7 +36,7 @@
  *  pfn_to_mfn(0xc0000)=0xc0000
  *
  * The benefit of this is, that we can assume for non-RAM regions (think
- * PCI BARs, or ACPI spaces), we can create mappings easily b/c we
+ * PCI BARs, or ACPI spaces), we can create mappings easily because we
  * get the PFN value to match the MFN.
  *
  * For this to work efficiently we have one new page p2m_identity and
@@ -60,7 +60,7 @@
  * There is also a digram of the P2M at the end that can help.
  * Imagine your E820 looking as so:
  *
- *                    1GB                                           2GB
+ *                    1GB                                           2GB    4GB
  * /-------------------+---------\/----\         /----------\    /---+-----\
  * | System RAM        | Sys RAM ||ACPI|         | reserved |    | Sys RAM |
  * \-------------------+---------/\----/         \----------/    \---+-----/
@@ -77,9 +77,8 @@
  * of the PFN and the end PFN (263424 and 512256 respectively). The first step
  * is to reserve_brk a top leaf page if the p2m[1] is missing. The top leaf page
  * covers 512^2 of page estate (1GB) and in case the start or end PFN is not
- * aligned on 512^2*PAGE_SIZE (1GB) we loop on aligned 1GB PFNs from start pfn
- * to end pfn.  We reserve_brk top leaf pages if they are missing (means they
- * point to p2m_mid_missing).
+ * aligned on 512^2*PAGE_SIZE (1GB) we reserve_brk new middle and leaf pages as
+ * required to split any existing p2m_mid_missing middle pages.
  *
  * With the E820 example above, 263424 is not 1GB aligned so we allocate a
  * reserve_brk page which will cover the PFNs estate from 0x40000 to 0x80000.
@@ -88,7 +87,7 @@
  * Next stage is to determine if we need to do a more granular boundary check
  * on the 4MB (or 2MB depending on architecture) off the start and end pfn's.
  * We check if the start pfn and end pfn violate that boundary check, and if
- * so reserve_brk a middle (p2m[x][y]) leaf page. This way we have a much finer
+ * so reserve_brk a (p2m[x][y]) leaf page. This way we have a much finer
  * granularity of setting which PFNs are missing and which ones are identity.
  * In our example 263424 and 512256 both fail the check so we reserve_brk two
  * pages. Populate them with INVALID_P2M_ENTRY (so they both have "missing"
@@ -102,9 +101,10 @@
  *
  * The next step is to walk from the start pfn to the end pfn setting
  * the IDENTITY_FRAME_BIT on each PFN. This is done in set_phys_range_identity.
- * If we find that the middle leaf is pointing to p2m_missing we can swap it
- * over to p2m_identity - this way covering 4MB (or 2MB) PFN space.  At this
- * point we do not need to worry about boundary aligment (so no need to
+ * If we find that the middle entry is pointing to p2m_missing we can swap it
+ * over to p2m_identity - this way covering 4MB (or 2MB) PFN space (and
+ * similarly swapping p2m_mid_missing for p2m_mid_identity for larger regions).
+ * At this point we do not need to worry about boundary aligment (so no need to
  * reserve_brk a middle page, figure out which PFNs are "missing" and which
  * ones are identity), as that has been done earlier.  If we find that the
  * middle leaf is not occupied by p2m_identity or p2m_missing, we dereference
@@ -118,6 +118,9 @@
  * considered missing). In our case, p2m[1][2][0->255] and p2m[1][488][257->511]
  * contain the INVALID_P2M_ENTRY value and are considered "missing."
  *
+ * Finally, the region beyond the end of of the E820 (4 GB in this example)
+ * is set to be identity (in case there are MMIO regions placed here).
+ *
  * This is what the p2m ends up looking (for the E820 above) with this
  * fabulous drawing:
  *
@@ -129,21 +132,27 @@
  *  |-----|    \                      | [p2m_identity]+\\    | ....            |
  *  |  2  |--\  \-------------------->|  ...          | \\   \----------------/
  *  |-----|   \                       \---------------/  \\
- *  |  3  |\   \                                          \\  p2m_identity
- *  |-----| \   \-------------------->/---------------\   /-----------------\
- *  | ..  +->+                        | [p2m_identity]+-->| ~0, ~0, ~0, ... |
- *  \-----/ /                         | [p2m_identity]+-->| ..., ~0         |
- *         / /---------------\        | ....          |   \-----------------/
- *        /  | IDENTITY[@0]  |      /-+-[x], ~0, ~0.. |
- *       /   | IDENTITY[@256]|<----/  \---------------/
- *      /    | ~0, ~0, ....  |
- *     |     \---------------/
- *     |
- *   p2m_mid_missing           p2m_missing
- * /-----------------\     /------------\
- * | [p2m_missing]   +---->| ~0, ~0, ~0 |
- * | [p2m_missing]   +---->| ..., ~0    |
- * \-----------------/     \------------/
+ *  |  3  |-\  \                                          \\  p2m_identity [1]
+ *  |-----|  \  \-------------------->/---------------\   /-----------------\
+ *  | ..  |\  |                       | [p2m_identity]+-->| ~0, ~0, ~0, ... |
+ *  \-----/ | |                       | [p2m_identity]+-->| ..., ~0         |
+ *          | |                       | ....          |   \-----------------/
+ *          | |                       +-[x], ~0, ~0.. +\
+ *          | |                       \---------------/ \
+ *          | |                                          \-> /---------------\
+ *          | V  p2m_mid_missing       p2m_missing           | IDENTITY[@0]  |
+ *          | /-----------------\     /------------\         | IDENTITY[@256]|
+ *          | | [p2m_missing]   +---->| ~0, ~0, ...|         | ~0, ~0, ....  |
+ *          | | [p2m_missing]   +---->| ..., ~0    |         \---------------/
+ *          | | ...             |     \------------/
+ *          | \-----------------/
+ *          |
+ *          |     p2m_mid_identity 
+ *          |   /-----------------\     
+ *          \-->| [p2m_identity]  +---->[1]
+ *              | [p2m_identity]  +---->[1]
+ *              | ...             |
+ *              \-----------------/
  *
  * where ~0 is INVALID_P2M_ENTRY. IDENTITY is (PFN | IDENTITY_BIT)
  */
@@ -187,13 +196,15 @@ static RESERVE_BRK_ARRAY(unsigned long, p2m_top_mfn, P2M_TOP_PER_PAGE);
 static RESERVE_BRK_ARRAY(unsigned long *, p2m_top_mfn_p, P2M_TOP_PER_PAGE);
 
 static RESERVE_BRK_ARRAY(unsigned long, p2m_identity, P2M_PER_PAGE);
+static RESERVE_BRK_ARRAY(unsigned long *, p2m_mid_identity, P2M_MID_PER_PAGE);
+static RESERVE_BRK_ARRAY(unsigned long, p2m_mid_identity_mfn, P2M_MID_PER_PAGE);
 
 RESERVE_BRK(p2m_mid, PAGE_SIZE * (MAX_DOMAIN_PAGES / (P2M_PER_PAGE * P2M_MID_PER_PAGE)));
 RESERVE_BRK(p2m_mid_mfn, PAGE_SIZE * (MAX_DOMAIN_PAGES / (P2M_PER_PAGE * P2M_MID_PER_PAGE)));
 
 /* We might hit two boundary violations at the start and end, at max each
  * boundary violation will require three middle nodes. */
-RESERVE_BRK(p2m_mid_identity, PAGE_SIZE * 2 * 3);
+RESERVE_BRK(p2m_mid_extra, PAGE_SIZE * 2 * 3);
 
 /* When we populate back during bootup, the amount of pages can vary. The
  * max we have is seen is 395979, but that does not mean it can't be more.
@@ -242,20 +253,20 @@ static void p2m_top_mfn_p_init(unsigned long **top)
 		top[i] = p2m_mid_missing_mfn;
 }
 
-static void p2m_mid_init(unsigned long **mid)
+static void p2m_mid_init(unsigned long **mid, unsigned long *leaf)
 {
 	unsigned i;
 
 	for (i = 0; i < P2M_MID_PER_PAGE; i++)
-		mid[i] = p2m_missing;
+		mid[i] = leaf;
 }
 
-static void p2m_mid_mfn_init(unsigned long *mid)
+static void p2m_mid_mfn_init(unsigned long *mid, unsigned long *leaf)
 {
 	unsigned i;
 
 	for (i = 0; i < P2M_MID_PER_PAGE; i++)
-		mid[i] = virt_to_mfn(p2m_missing);
+		mid[i] = virt_to_mfn(leaf);
 }
 
 static void p2m_init(unsigned long *p2m)
@@ -286,7 +297,9 @@ void __ref xen_build_mfn_list_list(void)
 	/* Pre-initialize p2m_top_mfn to be completely missing */
 	if (p2m_top_mfn == NULL) {
 		p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
-		p2m_mid_mfn_init(p2m_mid_missing_mfn);
+		p2m_mid_mfn_init(p2m_mid_missing_mfn, p2m_missing);
+		p2m_mid_identity_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
+		p2m_mid_mfn_init(p2m_mid_identity_mfn, p2m_identity);
 
 		p2m_top_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
 		p2m_top_mfn_p_init(p2m_top_mfn_p);
@@ -295,7 +308,8 @@ void __ref xen_build_mfn_list_list(void)
 		p2m_top_mfn_init(p2m_top_mfn);
 	} else {
 		/* Reinitialise, mfn's all change after migration */
-		p2m_mid_mfn_init(p2m_mid_missing_mfn);
+		p2m_mid_mfn_init(p2m_mid_missing_mfn, p2m_missing);
+		p2m_mid_mfn_init(p2m_mid_identity_mfn, p2m_identity);
 	}
 
 	for (pfn = 0; pfn < xen_max_p2m_pfn; pfn += P2M_PER_PAGE) {
@@ -327,7 +341,7 @@ void __ref xen_build_mfn_list_list(void)
 			 * it too late.
 			 */
 			mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
-			p2m_mid_mfn_init(mid_mfn_p);
+			p2m_mid_mfn_init(mid_mfn_p, p2m_missing);
 
 			p2m_top_mfn_p[topidx] = mid_mfn_p;
 		}
@@ -365,16 +379,17 @@ void __init xen_build_dynamic_phys_to_machine(void)
 
 	p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
 	p2m_init(p2m_missing);
+	p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE);
+	p2m_init(p2m_identity);
 
 	p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
-	p2m_mid_init(p2m_mid_missing);
+	p2m_mid_init(p2m_mid_missing, p2m_missing);
+	p2m_mid_identity = extend_brk(PAGE_SIZE, PAGE_SIZE);
+	p2m_mid_init(p2m_mid_identity, p2m_identity);
 
 	p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE);
 	p2m_top_init(p2m_top);
 
-	p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE);
-	p2m_init(p2m_identity);
-
 	/*
 	 * The domain builder gives us a pre-constructed p2m array in
 	 * mfn_list for all the pages initially given to us, so we just
@@ -386,7 +401,7 @@ void __init xen_build_dynamic_phys_to_machine(void)
 
 		if (p2m_top[topidx] == p2m_mid_missing) {
 			unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
-			p2m_mid_init(mid);
+			p2m_mid_init(mid, p2m_missing);
 
 			p2m_top[topidx] = mid;
 		}
@@ -545,7 +560,7 @@ static bool alloc_p2m(unsigned long pfn)
 		if (!mid)
 			return false;
 
-		p2m_mid_init(mid);
+		p2m_mid_init(mid, p2m_missing);
 
 		if (cmpxchg(top_p, p2m_mid_missing, mid) != p2m_mid_missing)
 			free_p2m_page(mid);
@@ -565,7 +580,7 @@ static bool alloc_p2m(unsigned long pfn)
 		if (!mid_mfn)
 			return false;
 
-		p2m_mid_mfn_init(mid_mfn);
+		p2m_mid_mfn_init(mid_mfn, p2m_missing);
 
 		missing_mfn = virt_to_mfn(p2m_mid_missing_mfn);
 		mid_mfn_mfn = virt_to_mfn(mid_mfn);
@@ -649,7 +664,7 @@ static bool __init early_alloc_p2m_middle(unsigned long pfn)
 	if (mid == p2m_mid_missing) {
 		mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
 
-		p2m_mid_init(mid);
+		p2m_mid_init(mid, p2m_missing);
 
 		p2m_top[topidx] = mid;
 
@@ -658,7 +673,7 @@ static bool __init early_alloc_p2m_middle(unsigned long pfn)
 	/* And the save/restore P2M tables.. */
 	if (mid_mfn_p == p2m_mid_missing_mfn) {
 		mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
-		p2m_mid_mfn_init(mid_mfn_p);
+		p2m_mid_mfn_init(mid_mfn_p, p2m_missing);
 
 		p2m_top_mfn_p[topidx] = mid_mfn_p;
 		p2m_top_mfn[topidx] = virt_to_mfn(mid_mfn_p);
@@ -769,6 +784,24 @@ bool __init early_set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 
 	return true;
 }
+
+static void __init early_split_p2m(unsigned long pfn)
+{
+	unsigned long mididx, idx;
+
+	mididx = p2m_mid_index(pfn);
+	idx = p2m_index(pfn);
+
+	/*
+	 * Allocate new middle and leaf pages if this pfn lies in the
+	 * middle of one.
+	 */
+	if (mididx || idx)
+		early_alloc_p2m_middle(pfn);
+	if (idx)
+		early_alloc_p2m(pfn, false);
+}
+
 unsigned long __init set_phys_range_identity(unsigned long pfn_s,
 				      unsigned long pfn_e)
 {
@@ -786,19 +819,27 @@ unsigned long __init set_phys_range_identity(unsigned long pfn_s,
 	if (pfn_e > MAX_P2M_PFN)
 		pfn_e = MAX_P2M_PFN;
 
-	for (pfn = (pfn_s & ~(P2M_MID_PER_PAGE * P2M_PER_PAGE - 1));
-		pfn < ALIGN(pfn_e, (P2M_MID_PER_PAGE * P2M_PER_PAGE));
-		pfn += P2M_MID_PER_PAGE * P2M_PER_PAGE)
-	{
-		WARN_ON(!early_alloc_p2m(pfn));
-	}
+	early_split_p2m(pfn_s);
+	early_split_p2m(pfn_e);
 
-	early_alloc_p2m_middle(pfn_s, true);
-	early_alloc_p2m_middle(pfn_e, true);
+	for (pfn = pfn_s; pfn < pfn_e;) {
+		unsigned topidx = p2m_top_index(pfn);
+		unsigned mididx = p2m_mid_index(pfn);
 
-	for (pfn = pfn_s; pfn < pfn_e; pfn++)
 		if (!__set_phys_to_machine(pfn, IDENTITY_FRAME(pfn)))
 			break;
+		pfn++;
+
+		/*
+		 * If the PFN was set to a middle or leaf identity
+		 * page the remainder must also be identity, so skip
+		 * ahead to the next middle or leaf entry.
+		 */
+		if (p2m_top[topidx] == p2m_mid_identity)
+			pfn = ALIGN(pfn, P2M_MID_PER_PAGE * P2M_PER_PAGE);
+		else if (p2m_top[topidx][mididx] == p2m_identity)
+			pfn = ALIGN(pfn, P2M_PER_PAGE);
+	}
 
 	if (!WARN((pfn - pfn_s) != (pfn_e - pfn_s),
 		"Identity mapping failed. We are %ld short of 1-1 mappings!\n",
@@ -828,8 +869,22 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 
 	/* For sparse holes were the p2m leaf has real PFN along with
 	 * PCI holes, stick in the PFN as the MFN value.
+	 *
+	 * set_phys_range_identity() will have allocated new middle
+	 * and leaf pages as required so an existing p2m_mid_missing
+	 * or p2m_missing mean that whole range will be identity so
+	 * these can be switched to p2m_mid_identity or p2m_identity.
 	 */
 	if (mfn != INVALID_P2M_ENTRY && (mfn & IDENTITY_FRAME_BIT)) {
+		if (p2m_top[topidx] == p2m_mid_identity)
+			return true;
+
+		if (p2m_top[topidx] == p2m_mid_missing) {
+			WARN_ON(cmpxchg(&p2m_top[topidx], p2m_mid_missing,
+					p2m_mid_identity) != p2m_mid_missing);
+			return true;
+		}
+
 		if (p2m_top[topidx][mididx] == p2m_identity)
 			return true;
 
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 4/9] x86/xen: only warn once if bad MFNs are found during setup
  2014-04-15 14:15 [PATCHv6 0/9] x86/xen: fixes for mapping high MMIO regions (and remove _PAGE_IOMAP) David Vrabel
                   ` (2 preceding siblings ...)
  2014-04-15 14:15   ` David Vrabel
@ 2014-04-15 14:15 ` David Vrabel
  2014-04-15 14:15 ` David Vrabel
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: David Vrabel @ 2014-04-15 14:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Mel Gorman

In xen_add_extra_mem(), if the WARN() checks for bad MFNs trigger, it
is likely that they will trigger a lot, spamming the log.

Use WARN_ONCE() instead.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/setup.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 0982233..2afe55e 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -89,10 +89,10 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
 	for (pfn = PFN_DOWN(start); pfn < xen_max_p2m_pfn; pfn++) {
 		unsigned long mfn = pfn_to_mfn(pfn);
 
-		if (WARN(mfn == pfn, "Trying to over-write 1-1 mapping (pfn: %lx)\n", pfn))
+		if (WARN_ONCE(mfn == pfn, "Trying to over-write 1-1 mapping (pfn: %lx)\n", pfn))
 			continue;
-		WARN(mfn != INVALID_P2M_ENTRY, "Trying to remove %lx which has %lx mfn!\n",
-			pfn, mfn);
+		WARN_ONCE(mfn != INVALID_P2M_ENTRY, "Trying to remove %lx which has %lx mfn!\n",
+			  pfn, mfn);
 
 		__set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
 	}
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 5/9] x86/xen: set regions above the end of RAM as 1:1
  2014-04-15 14:15 [PATCHv6 0/9] x86/xen: fixes for mapping high MMIO regions (and remove _PAGE_IOMAP) David Vrabel
                   ` (4 preceding siblings ...)
  2014-04-15 14:15 ` David Vrabel
@ 2014-04-15 14:15 ` David Vrabel
  2014-04-15 14:15 ` David Vrabel
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: David Vrabel @ 2014-04-15 14:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Mel Gorman

PCI devices may have BARs located above the end of RAM, so mark such
frames as identity frames in the p2m (instead of the default of
missing).

PFNs outside the p2m (above MAX_P2M_PFN) are also considered to be
identity frames for the same reason.
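
For example, with a BAR frame assumed to sit at 4 GB (illustrative
address only, not taken from the patch), translation is now a no-op:

	/* Sketch: a PFN above the last E820 entry (or even above
	 * MAX_P2M_PFN) now translates to itself. */
	unsigned long bar_pfn = 0x100000;	/* 4 GB >> PAGE_SHIFT, example */
	BUG_ON(pfn_to_mfn(bar_pfn) != bar_pfn);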

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/p2m.c   |    2 +-
 arch/x86/xen/setup.c |    9 +++++++++
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 5d716f7..416c356 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -507,7 +507,7 @@ unsigned long get_phys_to_machine(unsigned long pfn)
 	unsigned topidx, mididx, idx;
 
 	if (unlikely(pfn >= MAX_P2M_PFN))
-		return INVALID_P2M_ENTRY;
+		return IDENTITY_FRAME(pfn);
 
 	topidx = p2m_top_index(pfn);
 	mididx = p2m_mid_index(pfn);
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 2afe55e..210426a 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -469,6 +469,15 @@ char * __init xen_memory_setup(void)
 	}
 
 	/*
+	 * Set the rest as identity mapped, in case PCI BARs are
+	 * located here.
+	 *
+	 * PFNs above MAX_P2M_PFN are considered identity mapped as
+	 * well.
+	 */
+	set_phys_range_identity(map[i-1].addr / PAGE_SIZE, ~0ul);
+
+	/*
 	 * In domU, the ISA region is normal, usable memory, but we
 	 * reserve ISA memory anyway because too many things poke
 	 * about in there.
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 6/9] x86/xen: do not use _PAGE_IOMAP in xen_remap_domain_mfn_range()
  2014-04-15 14:15 [PATCHv6 0/9] x86/xen: fixes for mapping high MMIO regions (and remove _PAGE_IOMAP) David Vrabel
@ 2014-04-15 14:15   ` David Vrabel
  2014-04-15 14:15   ` David Vrabel
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: David Vrabel @ 2014-04-15 14:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Mel Gorman

_PAGE_IOMAP is used in xen_remap_domain_mfn_range() to prevent the
pfn_pte() call in remap_area_mfn_pte_fn() from using the p2m to
translate the MFN.  If mfn_pte() is used instead, the p2m lookup is
avoided and the use of _PAGE_IOMAP is no longer needed.
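
The difference between the two helpers, roughly (a sketch of their
intent, not their exact definitions; make_mmio_pte() is an
illustrative name):

	/* Sketch: building a PTE for a foreign/MMIO machine frame. */
	static pte_t make_mmio_pte(unsigned long mfn, pgprot_t prot)
	{
		/* pfn_pte() would translate the frame via the p2m
		 * (pfn_to_mfn()); mfn_pte() uses the machine frame
		 * directly, so no p2m lookup (and no _PAGE_IOMAP) is
		 * needed. */
		return pte_mkspecial(mfn_pte(mfn, prot));
	}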

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/mmu.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 86e02ea..d916024 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2522,7 +2522,7 @@ static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
 				 unsigned long addr, void *data)
 {
 	struct remap_data *rmd = data;
-	pte_t pte = pte_mkspecial(pfn_pte(rmd->mfn++, rmd->prot));
+	pte_t pte = pte_mkspecial(mfn_pte(rmd->mfn++, rmd->prot));
 
 	rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
 	rmd->mmu_update->val = pte_val_ma(pte);
@@ -2547,8 +2547,6 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
 	if (xen_feature(XENFEAT_auto_translated_physmap))
 		return -EINVAL;
 
-	prot = __pgprot(pgprot_val(prot) | _PAGE_IOMAP);
-
 	BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
 
 	rmd.mfn = mfn;
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 7/9] x86: skip check for spurious faults for non-present faults
  2014-04-15 14:15 [PATCHv6 0/9] x86/xen: fixes for mapping high MMIO regions (and remove _PAGE_IOMAP) David Vrabel
@ 2014-04-15 14:15   ` David Vrabel
  2014-04-15 14:15   ` David Vrabel
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: David Vrabel @ 2014-04-15 14:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Mel Gorman

If a fault on a kernel address is due to a non-present page, then it
cannot be the result of a stale TLB entry from a protection change (RO
to RW or NX to X).  Thus the pagetable walk in spurious_fault() can be
skipped.

This avoids spurious_fault() oopsing in some cases if the pagetables
it attempts to walk are not accessible; such an oops would obscure the
location of the original fault.

This also fixes a crash with Xen PV guests when they access entries in
the M2P corresponding to device MMIO regions.  The M2P is mapped
(read-only) by Xen into the kernel address space of the guest and this
mapping may contain holes for non-RAM regions.  Read faults will
result in calls to spurious_fault(), but because the page tables for
the M2P mappings are not accessible by the guest, the pagetable walk
would fault.

This was not normally a problem as MMIO mappings would not normally
result in an M2P lookup because of the use of the _PAGE_IOMAP bit in
the PTE.  However, removing the _PAGE_IOMAP bit requires M2P lookups
for MMIO mappings as well.
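
The new filter amounts to the following (a sketch of the check added
by the patch below; the helper name is illustrative):

	/* Only a present-but-protection fault (supervisor write or
	 * instruction fetch) can be a stale-TLB "spurious" fault worth
	 * walking the pagetables for. */
	static int want_spurious_check(unsigned long error_code)
	{
		return error_code == (PF_WRITE | PF_PROT) ||
		       error_code == (PF_INSTR | PF_PROT);
	}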

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
x86 maintainers, this is a prerequisite for removing Xen's usage of
_PAGE_IOMAP so I think this is best merged via the Xen tree.
---
 arch/x86/mm/fault.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 8e57229..c39e249 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -936,8 +936,10 @@ spurious_fault(unsigned long error_code, unsigned long address)
 	pte_t *pte;
 	int ret;
 
-	/* Reserved-bit violation or user access to kernel space? */
-	if (error_code & (PF_USER | PF_RSVD))
+	/* Only check for spurious faults on supervisor write or
+	   instruction faults. */
+	if (error_code != (PF_WRITE | PF_PROT)
+	    && error_code != (PF_INSTR | PF_PROT))
 		return 0;
 
 	pgd = init_mm.pgd + pgd_index(address);
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 8/9] x86/xen: do not use _PAGE_IOMAP PTE flag for I/O mappings
  2014-04-15 14:15 [PATCHv6 0/9] x86/xen: fixes for mapping high MMIO regions (and remove _PAGE_IOMAP) David Vrabel
@ 2014-04-15 14:15   ` David Vrabel
  2014-04-15 14:15   ` David Vrabel
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: David Vrabel @ 2014-04-15 14:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Mel Gorman

Since mfn_to_pfn() returns the correct PFN for identity mappings (as
used for MMIO regions), the use of _PAGE_IOMAP is not required in
pte_mfn_to_pfn().

Do not set the _PAGE_IOMAP flag in pte_pfn_to_mfn() and do not use it
in pte_mfn_to_pfn().

This will allow _PAGE_IOMAP to be removed, making it available for
future use.
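
A minimal stand-alone sketch of the PFN/MFN round trip this relies on
(IDENTITY_FRAME_BIT is assumed to mirror arch/x86/include/asm/xen/page.h;
the rest is a toy, not the kernel's p2m code):

    #include <assert.h>
    #include <stdio.h>

    #define IDENTITY_FRAME_BIT (1UL << (sizeof(unsigned long) * 8 - 2))
    #define IDENTITY_FRAME(p)  ((p) | IDENTITY_FRAME_BIT)

    int main(void)
    {
            unsigned long pfn = 0x110000UL;          /* MMIO page above end of RAM */
            unsigned long p2m = IDENTITY_FRAME(pfn); /* how the p2m marks 1:1 regions */

            /* pte_pfn_to_mfn(): strip the marker and use the MFN directly. */
            unsigned long mfn = p2m & ~IDENTITY_FRAME_BIT;
            assert(mfn == pfn);                      /* identity region: MFN == PFN */

            /* mfn_to_pfn() recognises the identity region and returns the
               same number, so pte_mfn_to_pfn() needs no _PAGE_IOMAP hint. */
            printf("pfn %#lx <-> mfn %#lx\n", pfn, mfn);
            return 0;
    }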

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/mmu.c |   48 ++++--------------------------------------------
 1 files changed, 4 insertions(+), 44 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index d916024..b86ebff 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -399,38 +399,14 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
 		if (unlikely(mfn == INVALID_P2M_ENTRY)) {
 			mfn = 0;
 			flags = 0;
-		} else {
-			/*
-			 * Paramount to do this test _after_ the
-			 * INVALID_P2M_ENTRY as INVALID_P2M_ENTRY &
-			 * IDENTITY_FRAME_BIT resolves to true.
-			 */
-			mfn &= ~FOREIGN_FRAME_BIT;
-			if (mfn & IDENTITY_FRAME_BIT) {
-				mfn &= ~IDENTITY_FRAME_BIT;
-				flags |= _PAGE_IOMAP;
-			}
-		}
+		} else
+			mfn &= ~(FOREIGN_FRAME_BIT | IDENTITY_FRAME_BIT);
 		val = ((pteval_t)mfn << PAGE_SHIFT) | flags;
 	}
 
 	return val;
 }
 
-static pteval_t iomap_pte(pteval_t val)
-{
-	if (val & _PAGE_PRESENT) {
-		unsigned long pfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
-		pteval_t flags = val & PTE_FLAGS_MASK;
-
-		/* We assume the pte frame number is a MFN, so
-		   just use it as-is. */
-		val = ((pteval_t)pfn << PAGE_SHIFT) | flags;
-	}
-
-	return val;
-}
-
 __visible pteval_t xen_pte_val(pte_t pte)
 {
 	pteval_t pteval = pte.pte;
@@ -441,9 +417,6 @@ __visible pteval_t xen_pte_val(pte_t pte)
 		pteval = (pteval & ~_PAGE_PAT) | _PAGE_PWT;
 	}
 #endif
-	if (xen_initial_domain() && (pteval & _PAGE_IOMAP))
-		return pteval;
-
 	return pte_mfn_to_pfn(pteval);
 }
 PV_CALLEE_SAVE_REGS_THUNK(xen_pte_val);
@@ -481,7 +454,6 @@ void xen_set_pat(u64 pat)
 
 __visible pte_t xen_make_pte(pteval_t pte)
 {
-	phys_addr_t addr = (pte & PTE_PFN_MASK);
 #if 0
 	/* If Linux is trying to set a WC pte, then map to the Xen WC.
 	 * If _PAGE_PAT is set, then it probably means it is really
@@ -496,19 +468,7 @@ __visible pte_t xen_make_pte(pteval_t pte)
 			pte = (pte & ~(_PAGE_PCD | _PAGE_PWT)) | _PAGE_PAT;
 	}
 #endif
-	/*
-	 * Unprivileged domains are allowed to do IOMAPpings for
-	 * PCI passthrough, but not map ISA space.  The ISA
-	 * mappings are just dummy local mappings to keep other
-	 * parts of the kernel happy.
-	 */
-	if (unlikely(pte & _PAGE_IOMAP) &&
-	    (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
-		pte = iomap_pte(pte);
-	} else {
-		pte &= ~_PAGE_IOMAP;
-		pte = pte_pfn_to_mfn(pte);
-	}
+	pte = pte_pfn_to_mfn(pte);
 
 	return native_make_pte(pte);
 }
@@ -2095,7 +2055,7 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
 
 	default:
 		/* By default, set_fixmap is used for hardware mappings */
-		pte = mfn_pte(phys, __pgprot(pgprot_val(prot) | _PAGE_IOMAP));
+		pte = mfn_pte(phys, prot);
 		break;
 	}
 
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 9/9] x86: remove the Xen-specific _PAGE_IOMAP PTE flag
  2014-04-15 14:15 [PATCHv6 0/9] x86/xen: fixes for mapping high MMIO regions (and remove _PAGE_IOMAP) David Vrabel
@ 2014-04-15 14:15   ` David Vrabel
  2014-04-15 14:15   ` David Vrabel
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: David Vrabel @ 2014-04-15 14:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Mel Gorman

The _PAGE_IOMAP PTE flag was only used by Xen PV guests to mark PTEs
that were used to map I/O regions that are 1:1 in the p2m.  This
allowed Xen to obtain the correct PFN when converting the MFNs read
from a PTE back to their PFN.

Xen guests no longer use _PAGE_IOMAP for this.  Instead, mfn_to_pfn()
returns the correct PFN by using a combination of the m2p and p2m to
determine if an MFN corresponds to a 1:1 mapping in the p2m.

Remove _PAGE_IOMAP, replacing it with _PAGE_UNUSED2 to allow for
future uses of the PTE flag.
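
A rough stand-alone model of that m2p/p2m cross-check (the array-backed
tables and the function name are assumptions for illustration only; the
real lookups live in arch/x86/include/asm/xen/page.h and
arch/x86/xen/p2m.c, and a caller would populate the toy tables first):

    #define MODEL_SIZE          32UL
    #define INVALID_PFN         (~0UL)
    #define IDENTITY_FRAME_BIT  (1UL << (sizeof(unsigned long) * 8 - 2))
    #define IDENTITY_FRAME(p)   ((p) | IDENTITY_FRAME_BIT)

    static unsigned long p2m[MODEL_SIZE]; /* guest pfn -> mfn, or identity marker */
    static unsigned long m2p[MODEL_SIZE]; /* host mfn -> pfn, valid for RAM only */

    unsigned long model_mfn_to_pfn(unsigned long mfn)
    {
            unsigned long pfn = (mfn < MODEL_SIZE) ? m2p[mfn] : INVALID_PFN;

            /* RAM: accept the m2p answer only if the p2m maps it back here. */
            if (pfn < MODEL_SIZE && p2m[pfn] == mfn)
                    return pfn;
            /* 1:1 I/O region: the p2m entry for pfn == mfn carries the marker. */
            if (mfn < MODEL_SIZE && p2m[mfn] == IDENTITY_FRAME(mfn))
                    return mfn;
            return INVALID_PFN;
    }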

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86@kernel.org
---
This depends on the preceding Xen changes, so this will be merged via
the Xen tree.
---
 arch/x86/include/asm/pgtable_types.h |   12 ++++++------
 arch/x86/mm/init_32.c                |    2 +-
 arch/x86/mm/init_64.c                |    2 +-
 arch/x86/pci/i386.c                  |    2 --
 arch/x86/xen/enlighten.c             |    2 --
 5 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index eb3d449..ead3cb7 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -17,7 +17,7 @@
 #define _PAGE_BIT_PAT		7	/* on 4KB pages */
 #define _PAGE_BIT_GLOBAL	8	/* Global TLB entry PPro+ */
 #define _PAGE_BIT_UNUSED1	9	/* available for programmer */
-#define _PAGE_BIT_IOMAP		10	/* flag used to indicate IO mapping */
+#define _PAGE_BIT_UNUSED2	10	/* available for programmer */
 #define _PAGE_BIT_HIDDEN	11	/* hidden by kmemcheck */
 #define _PAGE_BIT_PAT_LARGE	12	/* On 2MB or 1GB pages */
 #define _PAGE_BIT_SPECIAL	_PAGE_BIT_UNUSED1
@@ -41,7 +41,7 @@
 #define _PAGE_PSE	(_AT(pteval_t, 1) << _PAGE_BIT_PSE)
 #define _PAGE_GLOBAL	(_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
 #define _PAGE_UNUSED1	(_AT(pteval_t, 1) << _PAGE_BIT_UNUSED1)
-#define _PAGE_IOMAP	(_AT(pteval_t, 1) << _PAGE_BIT_IOMAP)
+#define _PAGE_UNUSED2	(_AT(pteval_t, 1) << _PAGE_BIT_UNUSED2)
 #define _PAGE_PAT	(_AT(pteval_t, 1) << _PAGE_BIT_PAT)
 #define _PAGE_PAT_LARGE (_AT(pteval_t, 1) << _PAGE_BIT_PAT_LARGE)
 #define _PAGE_SPECIAL	(_AT(pteval_t, 1) << _PAGE_BIT_SPECIAL)
@@ -164,10 +164,10 @@
 #define __PAGE_KERNEL_LARGE_NOCACHE	(__PAGE_KERNEL | _PAGE_CACHE_UC | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_EXEC	(__PAGE_KERNEL_EXEC | _PAGE_PSE)
 
-#define __PAGE_KERNEL_IO		(__PAGE_KERNEL | _PAGE_IOMAP)
-#define __PAGE_KERNEL_IO_NOCACHE	(__PAGE_KERNEL_NOCACHE | _PAGE_IOMAP)
-#define __PAGE_KERNEL_IO_UC_MINUS	(__PAGE_KERNEL_UC_MINUS | _PAGE_IOMAP)
-#define __PAGE_KERNEL_IO_WC		(__PAGE_KERNEL_WC | _PAGE_IOMAP)
+#define __PAGE_KERNEL_IO		(__PAGE_KERNEL)
+#define __PAGE_KERNEL_IO_NOCACHE	(__PAGE_KERNEL_NOCACHE)
+#define __PAGE_KERNEL_IO_UC_MINUS	(__PAGE_KERNEL_UC_MINUS)
+#define __PAGE_KERNEL_IO_WC		(__PAGE_KERNEL_WC)
 
 #define PAGE_KERNEL			__pgprot(__PAGE_KERNEL)
 #define PAGE_KERNEL_RO			__pgprot(__PAGE_KERNEL_RO)
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index e395048..af7259c 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -537,7 +537,7 @@ static void __init pagetable_init(void)
 	permanent_kmaps_init(pgd_base);
 }
 
-pteval_t __supported_pte_mask __read_mostly = ~(_PAGE_NX | _PAGE_GLOBAL | _PAGE_IOMAP);
+pteval_t __supported_pte_mask __read_mostly = ~(_PAGE_NX | _PAGE_GLOBAL);
 EXPORT_SYMBOL_GPL(__supported_pte_mask);
 
 /* user-defined highmem size */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index f35c66c..9e6fa6d 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -151,7 +151,7 @@ early_param("gbpages", parse_direct_gbpages_on);
  * around without checking the pgd every time.
  */
 
-pteval_t __supported_pte_mask __read_mostly = ~_PAGE_IOMAP;
+pteval_t __supported_pte_mask __read_mostly = ~0;
 EXPORT_SYMBOL_GPL(__supported_pte_mask);
 
 int force_personality32;
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index db6b1ab..1f642d6 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -433,8 +433,6 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
 		 */
 		prot |= _PAGE_CACHE_UC_MINUS;
 
-	prot |= _PAGE_IOMAP;	/* creating a mapping for IO */
-
 	vma->vm_page_prot = __pgprot(prot);
 
 	if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 201d09a..c5e21e3 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1555,8 +1555,6 @@ asmlinkage void __init xen_start_kernel(void)
 #endif
 		__supported_pte_mask &= ~(_PAGE_PWT | _PAGE_PCD);
 
-	__supported_pte_mask |= _PAGE_IOMAP;
-
 	/*
 	 * Prevent page tables from being allocated in highmem, even
 	 * if CONFIG_HIGHPTE is enabled.
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/9] x86: skip check for spurious faults for non-present faults
  2014-04-15 14:15   ` David Vrabel
  (?)
  (?)
@ 2014-04-30 12:41   ` David Vrabel
  -1 siblings, 0 replies; 29+ messages in thread
From: David Vrabel @ 2014-04-30 12:41 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky, linux-kernel,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Mel Gorman

On 15/04/14 15:15, David Vrabel wrote:
> If a fault on a kernel address is due to a non-present page, then it
> cannot be the result of stale TLB entry from a protection change (RO
> to RW or NX to X).  Thus the pagetable walk in spurious_fault() can be
> skipped.
> 
> This avoids spurious_fault() oopsing in some cases if the pagetables
> it attempts to walk are not accessible.  This obscures the location of
> the original fault.
> 
> This also fixes a crash with Xen PV guests when they access entries in
> the M2P corresponding to device MMIO regions.  The M2P is mapped
> (read-only) by Xen into the kernel address space of the guest and this
> mapping may contains holes for non-RAM regions.  Read faults will
> result in calls to spurious_fault(), but because the page tables for
> the M2P mappings are not accessible by the guest the pagetable walk
> would fault.
> 
> This was not normally a problem as MMIO mappings would not normally
> result in a M2P lookup because of the use of the _PAGE_IOMAP bit the
> PTE.  However, removing the _PAGE_IOMAP bit requires M2P lookups for
> MMIO mappings as well.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
> x86 maintainers, this is a prerequisite for removing Xen's usage of
> _PAGE_IOMAP so I think this is best merged via the Xen tree.

Peter, any opinion on this patch?

David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv6 0/9] x86/xen: fixes for mapping high MMIO regions (and remove _PAGE_IOMAP)
  2014-04-15 14:15 [PATCHv6 0/9] x86/xen: fixes for mapping high MMIO regions (and remove _PAGE_IOMAP) David Vrabel
                   ` (11 preceding siblings ...)
  2014-05-15 15:30 ` [PATCHv6 0/9] x86/xen: fixes for mapping high MMIO regions (and remove _PAGE_IOMAP) David Vrabel
@ 2014-05-15 15:30 ` David Vrabel
  12 siblings, 0 replies; 29+ messages in thread
From: David Vrabel @ 2014-05-15 15:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, linux-kernel,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Mel Gorman

On 15/04/14 15:15, David Vrabel wrote:
> This a fix for the problems with mapping high MMIO regions in certain
> cases (e.g., the RDMA drivers) as not all mappers were specifing
> _PAGE_IOMAP which meant no valid MFN could be found and the resulting
> PTEs would be set as not present, causing subsequent faults.

I've applied patches #1 to #6 to devel/for-linus-3.16.  These fix the
bug.  Patches #7 to #9 remove _PAGE_IOMAP but depend on an x86 change
which is not yet acked, and I'm not going to delay (again) the bug fix
for this.

David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/9] x86: skip check for spurious faults for non-present faults
  2014-04-15 14:15   ` David Vrabel
                     ` (2 preceding siblings ...)
  (?)
@ 2014-05-15 18:42   ` H. Peter Anvin
  2014-05-15 19:22     ` Keir Fraser
  2014-05-15 19:22     ` [Xen-devel] " Keir Fraser
  -1 siblings, 2 replies; 29+ messages in thread
From: H. Peter Anvin @ 2014-05-15 18:42 UTC (permalink / raw)
  To: David Vrabel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, linux-kernel,
	Thomas Gleixner, Ingo Molnar, x86, Mel Gorman, Dave Hansen

On 04/15/2014 07:15 AM, David Vrabel wrote:
> If a fault on a kernel address is due to a non-present page, then it
> cannot be the result of stale TLB entry from a protection change (RO
> to RW or NX to X).  Thus the pagetable walk in spurious_fault() can be
> skipped.

Erk... this code is screaming WTF to me.  The x86 architecture is such
that the CPU is responsible for avoiding these faults.

<dig> <dig> <dig>

5b727a3b0158a129827c21ce3bfb0ba997e8ddd0

    x86: ignore spurious faults

    When changing a kernel page from RO->RW, it's OK to leave stale TLB
    entries around, since doing a global flush is expensive and they
    pose no security problem.  They can, however, generate a spurious
    fault, which we should catch and simply return from (which will
    have the side-effect of reloading the TLB to the current PTE).

    This can occur when running under Xen, because it frequently changes
    kernel pages from RW->RO->RW to implement Xen's pagetable semantics.
    It could also occur when using CONFIG_DEBUG_PAGEALLOC, since it
    avoids doing a global TLB flush after changing page permissions.

    Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
    Cc: Harvey Harrison <harvey.harrison@gmail.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Again WTF?

Are we chasing hardware errata here?  Or did someone go off and *assume*
that the x86 hardware architecture works a certain way?  Or is there
something way more subtle going on?

I guess next step is mailing list archaeology...

Does anyone still have contacts with Jeremy, and if so, could they poke
him perhaps?

	-hpa


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Xen-devel] [PATCH 7/9] x86: skip check for spurious faults for non-present faults
  2014-05-15 18:42   ` H. Peter Anvin
  2014-05-15 19:22     ` Keir Fraser
@ 2014-05-15 19:22     ` Keir Fraser
  2014-05-15 19:51       ` H. Peter Anvin
  2014-05-15 19:51       ` [Xen-devel] " H. Peter Anvin
  1 sibling, 2 replies; 29+ messages in thread
From: Keir Fraser @ 2014-05-15 19:22 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: David Vrabel, xen-devel, x86, linux-kernel, Dave Hansen,
	Ingo Molnar, Mel Gorman, Boris Ostrovsky, Thomas Gleixner

H. Peter Anvin wrote:
> On 04/15/2014 07:15 AM, David Vrabel wrote:
>> If a fault on a kernel address is due to a non-present page, then it
>> cannot be the result of stale TLB entry from a protection change (RO
>> to RW or NX to X).  Thus the pagetable walk in spurious_fault() can be
>> skipped.
>
> Erk... this code is screaming WTF to me.  The x86 architecture is such
> that the CPU is responsible for avoiding these faults.

Not in this case...

> <dig>  <dig>  <dig>
>
> 5b727a3b0158a129827c21ce3bfb0ba997e8ddd0
>
>      x86: ignore spurious faults
>
>      When changing a kernel page from RO->RW, it's OK to leave stale TLB
>      entries around, since doing a global flush is expensive and they
>      pose no security problem.  They can, however, generate a spurious
>      fault, which we should catch and simply return from (which will
>      have the side-effect of reloading the TLB to the current PTE).
>
>      This can occur when running under Xen, because it frequently changes
>      kernel pages from RW->RO->RW to implement Xen's pagetable semantics.
>      It could also occur when using CONFIG_DEBUG_PAGEALLOC, since it
>      avoids doing a global TLB flush after changing page permissions.
>
>      Signed-off-by: Jeremy Fitzhardinge<jeremy@xensource.com>
>      Cc: Harvey Harrison<harvey.harrison@gmail.com>
>      Signed-off-by: Ingo Molnar<mingo@elte.hu>
>      Signed-off-by: Thomas Gleixner<tglx@linutronix.de>
>
> Again WTF?
>
> Are we chasing hardware errata here?  Or did someone go off and *assume*
> that the x86 hardware architecture work a certain way?  Or is there
> something way more subtle going on?

See Intel Developer's Manual Vol 3 Section 4.10.4.3, 3rd bullet... This 
is expected behaviour, probably to make copy-on-write faults faster.

  -- Keir

> I guess next step is mailing list archaeology...
>
> Does anyone still have contacts with Jeremy, and if so, could they poke
> him perhaps?
>
> 	-hpa

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Xen-devel] [PATCH 7/9] x86: skip check for spurious faults for non-present faults
  2014-05-15 19:22     ` [Xen-devel] " Keir Fraser
  2014-05-15 19:51       ` H. Peter Anvin
@ 2014-05-15 19:51       ` H. Peter Anvin
  1 sibling, 0 replies; 29+ messages in thread
From: H. Peter Anvin @ 2014-05-15 19:51 UTC (permalink / raw)
  To: Keir Fraser
  Cc: David Vrabel, xen-devel, x86, linux-kernel, Dave Hansen,
	Ingo Molnar, Mel Gorman, Boris Ostrovsky, Thomas Gleixner

On 05/15/2014 12:22 PM, Keir Fraser wrote:
>>
>> Are we chasing hardware errata here?  Or did someone go off and *assume*
>> that the x86 hardware architecture work a certain way?  Or is there
>> something way more subtle going on?
> 
> See Intel Developer's Manual Vol 3 Section 4.10.4.3, 3rd bullet... This
> is expected behaviour, probably to make copy-on-write faults faster.
> 

Hm, yes.  My memory of this comes from before these formal rules were
written down... I guess there is some wiggle room in there, presumably
as you say, for performance reasons (or implementation leeway, which is
another way to say performance.)

This does make a P bit switch architecturally different from W or NX, so
I'm okay with that, but I would like the patch adjusted in the following
ways:

1. Put in an explicit comment about the architectural difference
   between the P bit on one hand and W and NX on the other; an SDM
   reference is good, and *why* this makes the specific filtering
   correct.

2. Please use the standard format for multiline comments:

	/*
         * blah
         * blah
         */

With that this should be okay.
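
Concretely, the adjusted hunk might read something like the following;
the wording is only illustrative, not necessarily what was eventually
committed:

    /*
     * Spurious faults can only come from a stale TLB entry after a
     * permission-widening change (RO -> RW or NX -> X) on a present
     * page; no TLB entry is ever created from a paging-structure
     * entry with the P flag clear (SDM Vol 3, 4.10.4.3), so a
     * not-present fault is never spurious and the walk can be skipped.
     */
    if (error_code != (PF_WRITE | PF_PROT) &&
        error_code != (PF_INSTR | PF_PROT))
            return 0;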

	-hpa


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2014-05-15 19:52 UTC | newest]

Thread overview: 29+ messages
2014-04-15 14:15 [PATCHv6 0/9] x86/xen: fixes for mapping high MMIO regions (and remove _PAGE_IOMAP) David Vrabel
2014-04-15 14:15 ` [PATCH 1/9] x86/xen: rename early_p2m_alloc() and early_p2m_alloc_middle() David Vrabel
2014-04-15 14:15   ` David Vrabel
2014-04-15 14:15 ` [PATCH 2/9] x86/xen: fix set_phys_range_identity() if pfn_e > MAX_P2M_PFN David Vrabel
2014-04-15 14:15   ` David Vrabel
2014-04-15 14:15 ` [PATCH 3/9] x86/xen: compactly store large identity ranges in the p2m David Vrabel
2014-04-15 14:15   ` David Vrabel
2014-04-15 14:15 ` [PATCH 4/9] x86/xen: only warn once if bad MFNs are found during setup David Vrabel
2014-04-15 14:15 ` David Vrabel
2014-04-15 14:15 ` [PATCH 5/9] x86/xen: set regions above the end of RAM as 1:1 David Vrabel
2014-04-15 14:15 ` David Vrabel
2014-04-15 14:15 ` [PATCH 6/9] x86/xen: do not use _PAGE_IOMAP in xen_remap_domain_mfn_range() David Vrabel
2014-04-15 14:15   ` David Vrabel
2014-04-15 14:15 ` [PATCH 7/9] x86: skip check for spurious faults for non-present faults David Vrabel
2014-04-15 14:15   ` David Vrabel
2014-04-30 12:41   ` David Vrabel
2014-04-30 12:41   ` David Vrabel
2014-05-15 18:42   ` H. Peter Anvin
2014-05-15 19:22     ` Keir Fraser
2014-05-15 19:22     ` [Xen-devel] " Keir Fraser
2014-05-15 19:51       ` H. Peter Anvin
2014-05-15 19:51       ` [Xen-devel] " H. Peter Anvin
2014-05-15 18:42   ` H. Peter Anvin
2014-04-15 14:15 ` [PATCH 8/9] x86/xen: do not use _PAGE_IOMAP PTE flag for I/O mappings David Vrabel
2014-04-15 14:15   ` David Vrabel
2014-04-15 14:15 ` [PATCH 9/9] x86: remove the Xen-specific _PAGE_IOMAP PTE flag David Vrabel
2014-04-15 14:15   ` David Vrabel
2014-05-15 15:30 ` [PATCHv6 0/9] x86/xen: fixes for mapping high MMIO regions (and remove _PAGE_IOMAP) David Vrabel
2014-05-15 15:30 ` David Vrabel
