* [PATCH RFC v2] Consider E820 non-RAM and E820 gaps as 1-1 mappings.
From: Konrad Rzeszutek Wilk @ 2010-12-30 19:48 UTC (permalink / raw)
  To: linux-kernel, Jeremy Fitzhardinge, hpa, Ian Campbell
  Cc: Jan Beulich, xen-devel, Konrad Rzeszutek Wilk, Stefano Stabellini

Please see attached an RFC of the second set of patches that augments
how the Xen MMU deals with PFNs that point to physical devices. This
patchset is still in development, so sharp edges are present.

Changelog: [since v1 https://lkml.org/lkml/2010/12/21/255]:
 - Diagrams of P2M included.
 - More conservative approach used (memory that is neither populated nor
   identity is considered "missing", instead of identity).
 - Added IDENTITY_PAGE_FRAME bit to uniquely identify 1-1 mappings.
 - Optional debugfs file (xen/mmu/p2m) to print out the level and types in
   the P2M tree.
 - Lots of comments - if I missed any please prompt me.

Short summary: No need to trawl through code to add VM_IO on mmap paths
anymore.

Long summary:
Under the Xen MMU we distinguish two types of entries in the P2M tree:
a real MFN, or INVALID_P2M_ENTRY (a missing PFN, used for ballooning).
If there was a device whose PCI BAR fell within the P2M, we would look
at the flags, and if _PAGE_IOMAP was passed we would just return the PFN
without consulting the P2M. We have a patch (and some auxiliary ones for
other subsystems) that sets this:
 x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
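
For background, the relevant dispatch (heavily simplified here; see
xen_make_pte() in arch/x86/xen/mmu.c for the real code, which also
checks the ISA range) looks roughly like this:

	static pte_t xen_make_pte(pteval_t pte)	/* simplified sketch */
	{
		if (pte & _PAGE_IOMAP)
			/* Trust the frame number as a raw MFN: skip the P2M. */
			pte = iomap_pte(pte);
		else
			pte = pte_pfn_to_mfn(pte);	/* normal P2M translation */
		return native_make_pte(pte);
	}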

This patchset proposes a different way of doing this, with which the
patch above and the other auxiliary ones become unnecessary.

This approach is the one that H. Peter Anvin, Jeremy Fitzhardinge, and
Ian Campbell suggested. The mechanism is to treat the E820 non-RAM
entries and the E820 gaps in the P2M tree structure as identity (1-1)
mappings.

In the past we used to think of those regions as "missing" and under
the ownership of the balloon code. But the balloon code only operates on
a specific region: the area past the last E820 RAM page (basically any
PFN past nr_pages is considered a balloon-type page).

Gaps in the E820 (which are usually reserved for PCI BAR spaces) would
end up as void entries pointing to the "missing" pages.

This patchset iterates over the E820, finds the gaps and the non-RAM
E820 regions, and marks them as "identity".

Since the E820 gaps can cross boundaries in the P2M regions (keep in
mind that the P2M structure is a 3-level tree), we go through the E820
gaps and the reserved E820 regions and set those to be identity. For
large regions we just hook the top (or middle) pointer up to shared
"identity" pages. For smaller regions we set the leaf MFN such that
pfn_to_mfn(pfn) == pfn.
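
Roughly, the fallback order implemented by the patches below is (a
sketch only, with topidx/mididx/idx obtained from the p2m_*_index(pfn)
helpers; the real code in patch 2 also handles races and allocation):

	if (p2m_top[topidx] == p2m_mid_missing)
		p2m_top[topidx] = p2m_mid_identity;	/* whole top slot */
	else if (p2m_top[topidx][mididx] == p2m_missing)
		p2m_top[topidx][mididx] = p2m_identity;	/* whole mid slot */
	else
		p2m_top[topidx][mididx][idx] = pfn;	/* single leaf entry */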

The two attached diagrams crudely explain how we are doing this. "P2M story"
(https://docs.google.com/drawings/edit?id=1LQy0np2GCkFtspoucgs5Y_TILI8eceY6rikXwtPqlTI&hl=en&authkey=CO_yv_cC)
shows how the P2M is constructed and set up with balloon pages. "P2M with 1-1.."
(https://docs.google.com/drawings/edit?id=1smqIRPYq2mSxmvpabuk_Ap6bQAW_aaZnOnjzqlcmxKc&hl=en&authkey=CI2iwKcE)
shows how we insert the identity mappings in the P2M tree.

For the balloon pages, the setting of the "missing" pages is mostly
already done. The initial case of carving out the last E820 region for
balloon ownership is augmented to set those PFNs to missing, and we also
change the balloon code to be more aggressive.


This patchset is also available under git:
 git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git devel/p2m-identity.v4.2

Further work:
Also filter out _PAGE_IOMAP on entries that are System RAM (where it is
wrong). With this P2M lookup we can make that determination easily.

P.S.
Along with the devel/ttm.pci-api-v2 branch, I've been able to boot Dom0
on a variety of PCIe graphics hardware with X working (G31M, ATI ES1000,
GeForce 6150SE, HD 4350 Radeon, HD 3200 Radeon, GeForce 8600 GT). That
test branch is located at devel/fix-amd-bootup if you are curious.


 arch/x86/include/asm/xen/page.h |    7 +-
 arch/x86/xen/mmu.c              |  222 +++++++++++++++++++++++++++++++++++----
 arch/x86/xen/setup.c            |   46 ++++++++-
 drivers/xen/balloon.c           |    2 +-
 4 files changed, 255 insertions(+), 22 deletions(-)


* [PATCH 1/8] xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY.
From: Konrad Rzeszutek Wilk @ 2010-12-30 19:48 UTC (permalink / raw)
  To: linux-kernel, Jeremy Fitzhardinge, hpa, Ian Campbell
  Cc: Jan Beulich, xen-devel, Konrad Rzeszutek Wilk,
	Stefano Stabellini, Konrad Rzeszutek Wilk

With this patch, we diligently set regions that will be used by the
balloon driver to INVALID_P2M_ENTRY and place them under the ownership
of the balloon driver. We are OK using __set_phys_to_machine here, as we
do not expect to be allocating any P2M middle or entry pages.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/xen/page.h |    1 +
 arch/x86/xen/mmu.c              |   12 ++++++------
 arch/x86/xen/setup.c            |    7 ++++++-
 drivers/xen/balloon.c           |    2 +-
 4 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 8760cc6..be671f6 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -41,6 +41,7 @@ extern unsigned int   machine_to_phys_order;
 
 extern unsigned long get_phys_to_machine(unsigned long pfn);
 extern bool set_phys_to_machine(unsigned long pfn, unsigned long mfn);
+extern bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn);
 
 static inline unsigned long pfn_to_mfn(unsigned long pfn)
 {
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 44924e5..8e9c669 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -503,6 +503,11 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 {
 	unsigned topidx, mididx, idx;
 
+	if (unlikely(xen_feature(XENFEAT_auto_translated_physmap))) {
+		BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
+		return true;
+	}
+
 	if (unlikely(pfn >= MAX_P2M_PFN)) {
 		BUG_ON(mfn != INVALID_P2M_ENTRY);
 		return true;
@@ -522,11 +527,6 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 
 bool set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 {
-	if (unlikely(xen_feature(XENFEAT_auto_translated_physmap))) {
-		BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
-		return true;
-	}
-
 	if (unlikely(!__set_phys_to_machine(pfn, mfn)))  {
 		if (!alloc_p2m(pfn))
 			return false;
@@ -2438,7 +2438,7 @@ static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order,
 			in_frames[i] = virt_to_mfn(vaddr);
 
 		MULTI_update_va_mapping(mcs.mc, vaddr, VOID_PTE, 0);
-		set_phys_to_machine(virt_to_pfn(vaddr), INVALID_P2M_ENTRY);
+		__set_phys_to_machine(virt_to_pfn(vaddr), INVALID_P2M_ENTRY);
 
 		if (out_frames)
 			out_frames[i] = virt_to_pfn(vaddr);
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index b5a7f92..7201800 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -52,6 +52,8 @@ phys_addr_t xen_extra_mem_start, xen_extra_mem_size;
 
 static __init void xen_add_extra_mem(unsigned long pages)
 {
+	unsigned long pfn;
+
 	u64 size = (u64)pages * PAGE_SIZE;
 	u64 extra_start = xen_extra_mem_start + xen_extra_mem_size;
 
@@ -66,6 +68,9 @@ static __init void xen_add_extra_mem(unsigned long pages)
 	xen_extra_mem_size += size;
 
 	xen_max_p2m_pfn = PFN_DOWN(extra_start + size);
+
+	for (pfn = PFN_DOWN(extra_start); pfn <= xen_max_p2m_pfn; pfn++)
+		__set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
 }
 
 static unsigned long __init xen_release_chunk(phys_addr_t start_addr,
@@ -104,7 +109,7 @@ static unsigned long __init xen_release_chunk(phys_addr_t start_addr,
 		WARN(ret != 1, "Failed to release memory %lx-%lx err=%d\n",
 		     start, end, ret);
 		if (ret == 1) {
-			set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
+			__set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
 			len++;
 		}
 	}
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 43f9f02..b1661cd 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -296,7 +296,7 @@ static int decrease_reservation(unsigned long nr_pages)
 	/* No more mappings: invalidate P2M and add to balloon. */
 	for (i = 0; i < nr_pages; i++) {
 		pfn = mfn_to_pfn(frame_list[i]);
-		set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
+		__set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
 		balloon_append(pfn_to_page(pfn));
 	}
 
-- 
1.7.1



* [PATCH 2/8] xen/mmu: Add the notion of identity (1-1) mapping.
From: Konrad Rzeszutek Wilk @ 2010-12-30 19:48 UTC (permalink / raw)
  To: linux-kernel, Jeremy Fitzhardinge, hpa, Ian Campbell
  Cc: Jan Beulich, xen-devel, Konrad Rzeszutek Wilk,
	Stefano Stabellini, Konrad Rzeszutek Wilk

Our P2M structure is a three-level tree. In the leaf nodes we set the
Machine Frame Number (MFN) of the PFN. What this means is that when one
does pfn_to_mfn(pfn), which is used when creating PTE entries, one gets
the real MFN of the hardware. When Xen sets up a guest it initially
populates an array with descending MFN values, like so:

 idx: 0,  1,       2
 [0x290F, 0x290E, 0x290D, ..]

so pfn_to_mfn(2) == 0x290D. If you start and restart many guests, that
list starts looking quite random.

We graft this array onto our P2M tree structure and stick those MFNs in
the leaves. But for all other leaf entries, or for the top root or the
middle ones for which there is a void entry, we assume they are
"missing". So:
 pfn_to_mfn(0xc0000) == INVALID_P2M_ENTRY

We add the possibility of setting 1-1 mappings on certain regions, so
that:
 pfn_to_mfn(0xc0000) == 0xc0000

The benefit of this is that for non-RAM regions (think PCI BARs, or ACPI
spaces) we can create mappings easily, because we get the PFN value to
match the MFN.

For this to work efficiently we introduce two new pages: p2m_identity
and p2m_mid_identity. All entries in p2m_identity are set to
INVALID_P2M_ENTRY, and all entries in p2m_mid_identity point to
p2m_identity. Whenever we are told to set an MFN equal to the PFN for a
void region, we swap over from p2m_missing to p2m_identity, or from
p2m_mid_missing to p2m_mid_identity.
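
For reference, the index helpers used throughout (already present in
arch/x86/xen/mmu.c; paraphrased here, so treat this as a sketch)
decompose a PFN into the three tree levels as:

	static unsigned p2m_top_index(unsigned long pfn)
	{
		return pfn / (P2M_MID_PER_PAGE * P2M_PER_PAGE);
	}

	static unsigned p2m_mid_index(unsigned long pfn)
	{
		return (pfn / P2M_PER_PAGE) % P2M_MID_PER_PAGE;
	}

	static unsigned p2m_index(unsigned long pfn)
	{
		return pfn % P2M_PER_PAGE;
	}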

See this diagram for details:
https://docs.google.com/drawings/edit?id=1smqIRPYq2mSxmvpabuk_Ap6bQAW_aaZnOnjzqlcmxKc&hl=en&authkey=CI2iwKcE

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c |   67 +++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 8e9c669..667901f 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -217,6 +217,9 @@ static RESERVE_BRK_ARRAY(unsigned long **, p2m_top, P2M_TOP_PER_PAGE);
 static RESERVE_BRK_ARRAY(unsigned long, p2m_top_mfn, P2M_TOP_PER_PAGE);
 static RESERVE_BRK_ARRAY(unsigned long *, p2m_top_mfn_p, P2M_TOP_PER_PAGE);
 
+static RESERVE_BRK_ARRAY(unsigned long, p2m_identity, P2M_PER_PAGE);
+static RESERVE_BRK_ARRAY(unsigned long *, p2m_mid_identity, P2M_MID_PER_PAGE);
+
 RESERVE_BRK(p2m_mid, PAGE_SIZE * (MAX_DOMAIN_PAGES / (P2M_PER_PAGE * P2M_MID_PER_PAGE)));
 RESERVE_BRK(p2m_mid_mfn, PAGE_SIZE * (MAX_DOMAIN_PAGES / (P2M_PER_PAGE * P2M_MID_PER_PAGE)));
 
@@ -260,12 +263,12 @@ static void p2m_top_mfn_p_init(unsigned long **top)
 		top[i] = p2m_mid_missing_mfn;
 }
 
-static void p2m_mid_init(unsigned long **mid)
+static void p2m_mid_init(unsigned long **mid, unsigned long *ptr)
 {
 	unsigned i;
 
 	for (i = 0; i < P2M_MID_PER_PAGE; i++)
-		mid[i] = p2m_missing;
+		mid[i] = ptr;
 }
 
 static void p2m_mid_mfn_init(unsigned long *mid)
@@ -326,7 +329,7 @@ void xen_build_mfn_list_list(void)
 		 * they're just missing, just update the stored mfn,
 		 * since all could have changed over a migrate.
 		 */
-		if (mid == p2m_mid_missing) {
+		if (mid == p2m_mid_identity || mid == p2m_mid_missing) {
 			BUG_ON(mididx);
 			BUG_ON(mid_mfn_p != p2m_mid_missing_mfn);
 			p2m_top_mfn[topidx] = virt_to_mfn(p2m_mid_missing_mfn);
@@ -374,11 +377,16 @@ void __init xen_build_dynamic_phys_to_machine(void)
 	p2m_init(p2m_missing);
 
 	p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
-	p2m_mid_init(p2m_mid_missing);
+	p2m_mid_init(p2m_mid_missing, p2m_missing);
 
 	p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE);
 	p2m_top_init(p2m_top);
 
+	p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE);
+	p2m_init(p2m_identity);
+
+	p2m_mid_identity = extend_brk(PAGE_SIZE, PAGE_SIZE);
+	p2m_mid_init(p2m_mid_identity, p2m_identity);
 	/*
 	 * The domain builder gives us a pre-constructed p2m array in
 	 * mfn_list for all the pages initially given to us, so we just
@@ -390,7 +398,7 @@ void __init xen_build_dynamic_phys_to_machine(void)
 
 		if (p2m_top[topidx] == p2m_mid_missing) {
 			unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
-			p2m_mid_init(mid);
+			p2m_mid_init(mid, p2m_missing);
 
 			p2m_top[topidx] = mid;
 		}
@@ -410,6 +418,17 @@ unsigned long get_phys_to_machine(unsigned long pfn)
 	mididx = p2m_mid_index(pfn);
 	idx = p2m_index(pfn);
 
+	/*
+	 * The INVALID_P2M_ENTRY is filled in both p2m_*identity
+	 * and in p2m_*missing, so returning the INVALID_P2M_ENTRY
+	 * would be wrong.
+	 */
+	if (p2m_top[topidx] == p2m_mid_identity)
+		return pfn;
+
+	if (p2m_top[topidx][mididx] == p2m_identity)
+		return pfn;
+
 	return p2m_top[topidx][mididx][idx];
 }
 EXPORT_SYMBOL_GPL(get_phys_to_machine);
@@ -434,7 +453,7 @@ static void free_p2m_page(void *p)
 static bool alloc_p2m(unsigned long pfn)
 {
 	unsigned topidx, mididx;
-	unsigned long ***top_p, **mid;
+	unsigned long ***top_p, **mid, **mid_orig;
 	unsigned long *top_mfn_p, *mid_mfn;
 
 	topidx = p2m_top_index(pfn);
@@ -443,15 +462,19 @@ static bool alloc_p2m(unsigned long pfn)
 	top_p = &p2m_top[topidx];
 	mid = *top_p;
 
-	if (mid == p2m_mid_missing) {
+	if (mid == p2m_mid_identity || mid == p2m_mid_missing) {
 		/* Mid level is missing, allocate a new one */
+		mid_orig = mid;
 		mid = alloc_p2m_page();
 		if (!mid)
 			return false;
 
-		p2m_mid_init(mid);
+		if (mid_orig == p2m_mid_identity)
+			p2m_mid_init(mid, p2m_identity);
+		else
+			p2m_mid_init(mid, p2m_missing);
 
-		if (cmpxchg(top_p, p2m_mid_missing, mid) != p2m_mid_missing)
+		if (cmpxchg(top_p, mid_orig, mid) != mid_orig)
 			free_p2m_page(mid);
 	}
 
@@ -479,9 +502,11 @@ static bool alloc_p2m(unsigned long pfn)
 			p2m_top_mfn_p[topidx] = mid_mfn;
 	}
 
-	if (p2m_top[topidx][mididx] == p2m_missing) {
+	if (p2m_top[topidx][mididx] == p2m_identity ||
+	    p2m_top[topidx][mididx] == p2m_missing) {
 		/* p2m leaf page is missing */
 		unsigned long *p2m;
+		unsigned long *p2m_orig = p2m_top[topidx][mididx];
 
 		p2m = alloc_p2m_page();
 		if (!p2m)
@@ -489,7 +514,7 @@ static bool alloc_p2m(unsigned long pfn)
 
 		p2m_init(p2m);
 
-		if (cmpxchg(&mid[mididx], p2m_missing, p2m) != p2m_missing)
+		if (cmpxchg(&mid[mididx], p2m_orig, p2m) != p2m_orig)
 			free_p2m_page(p2m);
 		else
 			mid_mfn[mididx] = virt_to_mfn(p2m);
@@ -517,6 +542,26 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 	mididx = p2m_mid_index(pfn);
 	idx = p2m_index(pfn);
 
+	/* For sparse holes where the p2m leaf has a real PFN along with
+	 * PCI holes, stick in the PFN as the MFN value.
+	 */
+	if (pfn == mfn) {
+		if (p2m_top[topidx] == p2m_mid_identity)
+			return 1;
+		if (p2m_top[topidx][mididx] == p2m_identity)
+			return 1;
+
+		/* Swap over from MISSING to IDENTITY if needed. */
+		if (p2m_top[topidx] == p2m_mid_missing) {
+			p2m_top[topidx] = p2m_mid_identity;
+			return 1;
+		}
+		if (p2m_top[topidx][mididx] == p2m_missing) {
+			p2m_top[topidx][mididx] = p2m_identity;
+			return 1;
+		}
+	}
+
 	if (p2m_top[topidx][mididx] == p2m_missing)
 		return mfn == INVALID_P2M_ENTRY;
 
-- 
1.7.1



* [PATCH 3/8] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.
From: Konrad Rzeszutek Wilk @ 2010-12-30 19:48 UTC (permalink / raw)
  To: linux-kernel, Jeremy Fitzhardinge, hpa, Ian Campbell
  Cc: Jan Beulich, xen-devel, Konrad Rzeszutek Wilk,
	Stefano Stabellini, Konrad Rzeszutek Wilk

We walk the E820 map, starting at 0 (for PV guests we start at
ISA_END_ADDRESS), and skip any E820 RAM regions. All other regions, as
well as the gaps between them, we set to be identity mappings.

The reason we do not want to set the identity mapping from 0 to
ISA_END_ADDRESS when running as a PV guest is that the kernel would try
to read the DMI information there and fail (it has no permission to
read that region).
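
The rounding below uses the standard helpers from <linux/pfn.h>, which
shrink a non-RAM region inward so that a page straddling a RAM boundary
is never marked identity:

	#define PFN_UP(x)	(((x) + PAGE_SIZE-1) >> PAGE_SHIFT)
	#define PFN_DOWN(x)	((x) >> PAGE_SHIFT)

For example, with 4K pages a gap covering [0x9e800, 0xa0000) gives
PFN_UP = 0x9f and PFN_DOWN = 0xa0, so only page 0x9f is marked.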

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/setup.c |   39 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 7201800..643b3fc 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -143,6 +143,38 @@ static unsigned long __init xen_return_unused_memory(unsigned long max_pfn,
 	return released;
 }
 
+static unsigned long __init xen_set_identity(const struct e820map *e820)
+{
+	phys_addr_t last = xen_initial_domain() ? 0 : ISA_END_ADDRESS;
+	int i;
+	unsigned long identity = 0;
+	unsigned long pfn;
+
+	for (i = 0; i < e820->nr_map; i++) {
+		phys_addr_t start = e820->map[i].addr;
+		phys_addr_t end = start + e820->map[i].size;
+
+		if (end < start)
+			continue;
+
+		/* Skip over the 1MB region. */
+		if (last > end)
+			continue;
+
+		if (e820->map[i].type == E820_RAM) {
+			/* Without saving 'last' we would gobble RAM too. */
+			last = end;
+			continue;
+		}
+
+		for (pfn = PFN_UP(last); pfn < PFN_DOWN(end); pfn++)
+			__set_phys_to_machine(pfn, pfn);
+		identity += pfn - PFN_UP(last);
+
+		last = end;
+	}
+	return identity;
+}
 /**
  * machine_specific_memory_setup - Hook for machine specific memory setup.
  **/
@@ -156,6 +188,7 @@ char * __init xen_memory_setup(void)
 	struct xen_memory_map memmap;
 	unsigned long extra_pages = 0;
 	unsigned long extra_limit;
+	unsigned long identity_pages = 0;
 	int i;
 	int op;
 
@@ -251,6 +284,12 @@ char * __init xen_memory_setup(void)
 
 	xen_add_extra_mem(extra_pages);
 
+	/*
+	 * Set P2M for all non-RAM pages and E820 gaps to be identity
+	 * type PFNs.
+	 */
+	identity_pages = xen_set_identity(&e820);
+	printk(KERN_INFO "Set %ld page(s) to 1-1 mapping.\n", identity_pages);
 	return "Xen";
 }
 
-- 
1.7.1



* [PATCH 4/8] xen/mmu: Warn against races.
From: Konrad Rzeszutek Wilk @ 2010-12-30 19:48 UTC (permalink / raw)
  To: linux-kernel, Jeremy Fitzhardinge, hpa, Ian Campbell
  Cc: Jan Beulich, xen-devel, Konrad Rzeszutek Wilk,
	Stefano Stabellini, Konrad Rzeszutek Wilk

The initial bootup code uses set_phys_to_machine quite a lot, and after
bootup it would be used by the balloon driver. The balloon driver does
hold a mutex, so this should not be necessary - but just in case, add a
warning if we do hit this scenario.
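
The pattern below is the usual lock-free publish: cmpxchg() returns the
value it found, so anything other than the expected old pointer means
another CPU raced with us. In miniature (slot/old/new are placeholders,
not names from the patch):

	/* Swap MISSING -> IDENTITY only if nobody beat us to it;
	 * warn, rather than crash, if the slot changed under us. */
	WARN_ON(cmpxchg(&slot, old, new) != old);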

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 667901f..36bdd2d 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -553,11 +553,13 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 
 		/* Swap over from MISSING to IDENTITY if needed. */
 		if (p2m_top[topidx] == p2m_mid_missing) {
-			p2m_top[topidx] = p2m_mid_identity;
+			WARN_ON(cmpxchg(&p2m_top[topidx], p2m_mid_missing,
+				p2m_mid_identity) != p2m_mid_missing);
 			return 1;
 		}
 		if (p2m_top[topidx][mididx] == p2m_missing) {
-			p2m_top[topidx][mididx] = p2m_identity;
+			WARN_ON(cmpxchg(&p2m_top[topidx][mididx], p2m_missing,
+				p2m_identity) != p2m_missing);
 			return 1;
 		}
 	}
-- 
1.7.1



* [PATCH 5/8] xen/debug: Print out all pages in the P2M.
From: Konrad Rzeszutek Wilk @ 2010-12-30 19:48 UTC (permalink / raw)
  To: linux-kernel, Jeremy Fitzhardinge, hpa, Ian Campbell
  Cc: Jan Beulich, xen-devel, Konrad Rzeszutek Wilk,
	Stefano Stabellini, Konrad Rzeszutek Wilk

We walk over the whole P2M and construct a simplified view of which
regions belong to what level and what type they are.
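
(Once booted, and assuming debugfs is mounted at the usual
/sys/kernel/debug, the dump can then be read from
/sys/kernel/debug/xen/mmu/p2m.)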

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c |   84 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 84 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 36bdd2d..e9dfdd6 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -46,6 +46,7 @@
 #include <linux/module.h>
 #include <linux/gfp.h>
 #include <linux/memblock.h>
+#include <linux/seq_file.h>
 
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
@@ -2764,6 +2765,88 @@ EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_range);
 
 #ifdef CONFIG_XEN_DEBUG_FS
 
+static int p2m_dump_show(struct seq_file *m, void *v)
+{
+	static const char * const level_name[] = { "top", "middle",
+						"entry", "abnormal" };
+	static const char * const type_name[] = { "identity", "missing",
+						"pfn", "abnormal"};
+#define TYPE_IDENTITY 0
+#define TYPE_MISSING 1
+#define TYPE_PFN 2
+#define TYPE_UNKN 3
+	unsigned long pfn, prev_pfn_type = 0, prev_pfn_level = 0;
+	unsigned int prev_level = 0;
+	unsigned int prev_type = TYPE_UNKN;
+
+	if (!p2m_top)
+		return 0;
+
+	for (pfn = 0; pfn < MAX_DOMAIN_PAGES; pfn++) {
+		unsigned topidx = p2m_top_index(pfn);
+		unsigned mididx = p2m_mid_index(pfn);
+		unsigned idx = p2m_index(pfn);
+		unsigned lvl, type;
+
+		lvl = 4;
+		type = TYPE_UNKN;
+		if (p2m_top[topidx] == p2m_mid_missing) {
+			lvl = 0; type = TYPE_MISSING;
+		} else if (p2m_top[topidx] == p2m_mid_identity) {
+			lvl = 0; type = TYPE_IDENTITY;
+		} else if (p2m_top[topidx][mididx] == p2m_identity) {
+			lvl = 1; type = TYPE_IDENTITY;
+		} else if (p2m_top[topidx][mididx][idx] == pfn) {
+			lvl = 2; type = TYPE_IDENTITY;
+		} else if (p2m_top[topidx] == NULL) {
+			lvl = 0; type = TYPE_UNKN;
+		} else if (p2m_top[topidx][mididx] == NULL) {
+			lvl = 1; type = TYPE_UNKN;
+		} else if (p2m_top[topidx][mididx][idx] == 0) {
+			lvl = 2; type = TYPE_UNKN;
+		} else if (p2m_top[topidx][mididx] == p2m_missing) {
+			lvl = 1; type = TYPE_MISSING;
+		} else if (p2m_top[topidx][mididx][idx] == INVALID_P2M_ENTRY) {
+			lvl = 2; type = TYPE_MISSING;
+		} else if (p2m_top[topidx][mididx][idx] != pfn) {
+			lvl = 2; type = TYPE_PFN;
+		}
+		if (pfn == 0) {
+			prev_level = lvl;
+			prev_type = type;
+		}
+		if (pfn == MAX_DOMAIN_PAGES-1) {
+			lvl = 3;
+			type = TYPE_UNKN;
+		}
+		if (prev_type != type) {
+			seq_printf(m, " [0x%lx->0x%lx] %s\n",
+				prev_pfn_type, pfn, type_name[prev_type]);
+			prev_pfn_type = pfn;
+			prev_type = type;
+		}
+		if (prev_level != lvl) {
+			seq_printf(m, " [0x%lx->0x%lx] level %s\n",
+				prev_pfn_level, pfn, level_name[prev_level]);
+			prev_pfn_level = pfn;
+			prev_level = lvl;
+		}
+	}
+	return 0;
+}
+
+static int p2m_dump_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, p2m_dump_show, NULL);
+}
+
+static const struct file_operations p2m_dump_fops = {
+	.open		= p2m_dump_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
 static struct dentry *d_mmu_debug;
 
 static int __init xen_mmu_debugfs(void)
@@ -2819,6 +2902,7 @@ static int __init xen_mmu_debugfs(void)
 	debugfs_create_u32("prot_commit_batched", 0444, d_mmu_debug,
 			   &mmu_stats.prot_commit_batched);
 
+	debugfs_create_file("p2m", 0600, d_mmu_debug, NULL, &p2m_dump_fops);
 	return 0;
 }
 fs_initcall(xen_mmu_debugfs);
-- 
1.7.1



* [PATCH 6/8] xen/debug: WARN_ON when 1-1 but no _PAGE_IOMAP flag set.
From: Konrad Rzeszutek Wilk @ 2010-12-30 19:48 UTC (permalink / raw)
  To: linux-kernel, Jeremy Fitzhardinge, hpa, Ian Campbell
  Cc: Jan Beulich, xen-devel, Konrad Rzeszutek Wilk,
	Stefano Stabellini, Konrad Rzeszutek Wilk

Only enabled if XEN_DEBUG_FS is enabled. We print a warning when:

 pfn_to_mfn(pfn) == pfn, but the VM_IO (_PAGE_IOMAP) flag is not set;
 pfn_to_mfn(pfn) != pfn, and the VM_IO flag is set.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c |   37 +++++++++++++++++++++++++++++++++++++
 1 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index e9dfdd6..d98bd43 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -944,6 +944,40 @@ pte_t xen_make_pte(pteval_t pte)
 }
 PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte);
 
+#ifdef CONFIG_XEN_DEBUG_FS
+pte_t xen_make_pte_debug(pteval_t pte)
+{
+	phys_addr_t addr = (pte & PTE_PFN_MASK);
+	phys_addr_t other_addr;
+	bool io_page = false;
+	pte_t _pte;
+
+	if (pte & _PAGE_IOMAP)
+		io_page = true;
+
+	_pte = xen_make_pte(pte);
+
+	if (!addr)
+		return _pte;
+
+	if (io_page &&
+	    (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
+		other_addr = pfn_to_mfn(addr >> PAGE_SHIFT) << PAGE_SHIFT;
+		WARN(addr != other_addr,
+			"0x%lx is using VM_IO, but it is 0x%lx!\n",
+			(unsigned long)addr, (unsigned long)other_addr);
+	} else {
+		other_addr = (_pte.pte & PTE_PFN_MASK);
+		WARN((addr == other_addr) && (!io_page),
+			"0x%lx is missing VM_IO!\n",
+			(unsigned long)addr);
+	}
+
+	return _pte;
+}
+PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte_debug);
+#endif
+
 pgd_t xen_make_pgd(pgdval_t pgd)
 {
 	pgd = pte_pfn_to_mfn(pgd);
@@ -2354,6 +2388,9 @@ __init void xen_ident_map_ISA(void)
 
 static __init void xen_post_allocator_init(void)
 {
+#ifdef CONFIG_XEN_DEBUG_FS
+	pv_mmu_ops.make_pte = PV_CALLEE_SAVE(xen_make_pte_debug);
+#endif
 	pv_mmu_ops.set_pte = xen_set_pte;
 	pv_mmu_ops.set_pmd = xen_set_pmd;
 	pv_mmu_ops.set_pud = xen_set_pud;
-- 
1.7.1



* [PATCH 7/8] xen/mmu: Introduce IDENTITY_FRAME_BIT
From: Konrad Rzeszutek Wilk @ 2010-12-30 19:48 UTC (permalink / raw)
  To: linux-kernel, Jeremy Fitzhardinge, hpa, Ian Campbell
  Cc: Jan Beulich, xen-devel, Konrad Rzeszutek Wilk,
	Stefano Stabellini, Konrad Rzeszutek Wilk

We tack the IDENTITY_FRAME_BIT onto all PFNs which have been elected
during the E820 parsing to be identity mappings.

This way, should a freak situation occur in which pfn_to_mfn(pfn) == pfn
for a non-identity PFN, we can guard against it and not consider it a
1-1 mapping.

This will also allow us to tack _PAGE_IOMAP onto any page that has the
IDENTITY_FRAME_BIT set (see: "xen/mmu: Set _PAGE_IOMAP if PFN is in
identity P2M.").

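As a worked illustration (the addresses are made up): with
IDENTITY_FRAME_BIT = (1UL << 30), marking PFN 0x1000 as identity stores
0x40001000 in the P2M, and pfn_to_mfn() masks the bit back off:

	__set_phys_to_machine(0x1000, 0x1000 | IDENTITY_FRAME_BIT);
	/* get_phys_to_machine(0x1000) now reports 0x40001000, but the
	 * IDENTITY_FRAME_BIT is stripped again on the way out: */
	BUG_ON(pfn_to_mfn(0x1000) != 0x1000);
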
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/xen/page.h |    6 +++++-
 arch/x86/xen/mmu.c              |   11 ++++++++---
 arch/x86/xen/setup.c            |    2 +-
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index be671f6..a1e4ce8 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -30,6 +30,7 @@ typedef struct xpaddr {
 /**** MACHINE <-> PHYSICAL CONVERSION MACROS ****/
 #define INVALID_P2M_ENTRY	(~0UL)
 #define FOREIGN_FRAME_BIT	(1UL<<31)
+#define IDENTITY_FRAME_BIT	(1UL<<30)
 #define FOREIGN_FRAME(m)	((m) | FOREIGN_FRAME_BIT)
 
 /* Maximum amount of memory we can handle in a domain in pages */
@@ -52,9 +53,12 @@ static inline unsigned long pfn_to_mfn(unsigned long pfn)
 
 	mfn = get_phys_to_machine(pfn);
 
-	if (mfn != INVALID_P2M_ENTRY)
+	if (mfn != INVALID_P2M_ENTRY) {
 		mfn &= ~FOREIGN_FRAME_BIT;
 
+		if (mfn & IDENTITY_FRAME_BIT)
+			mfn &= ~IDENTITY_FRAME_BIT;
+	}
 	return mfn;
 }
 
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index d98bd43..d470435 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -425,10 +425,10 @@ unsigned long get_phys_to_machine(unsigned long pfn)
 	 * would be wrong.
 	 */
 	if (p2m_top[topidx] == p2m_mid_identity)
-		return pfn;
+		return pfn | IDENTITY_FRAME_BIT;
 
 	if (p2m_top[topidx][mididx] == p2m_identity)
-		return pfn;
+		return pfn | IDENTITY_FRAME_BIT;
 
 	return p2m_top[topidx][mididx][idx];
 }
@@ -546,7 +546,7 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 	/* For sparse holes where the p2m leaf has a real PFN along with
 	 * PCI holes, stick in the PFN as the MFN value.
 	 */
-	if (pfn == mfn) {
+	if (mfn != INVALID_P2M_ENTRY && (mfn & IDENTITY_FRAME_BIT)) {
 		if (p2m_top[topidx] == p2m_mid_identity)
 			return 1;
 		if (p2m_top[topidx][mididx] == p2m_identity)
@@ -2834,7 +2834,12 @@ static int p2m_dump_show(struct seq_file *m, void *v)
 		} else if (p2m_top[topidx][mididx] == p2m_identity) {
 			lvl = 1; type = TYPE_IDENTITY;
 		} else if (p2m_top[topidx][mididx][idx] == pfn) {
+			lvl = 2; type = TYPE_PFN;
+		} else if (p2m_top[topidx][mididx][idx] ==
+					(pfn | IDENTITY_FRAME_BIT)) {
 			lvl = 2; type = TYPE_IDENTITY;
+		} else if (p2m_top[topidx][mididx][idx] != pfn) {
+			lvl = 2; type = TYPE_PFN;
 		} else if (p2m_top[topidx] == NULL) {
 			lvl = 0; type = TYPE_UNKN;
 		} else if (p2m_top[topidx][mididx] == NULL) {
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 643b3fc..35b3b4d 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -168,7 +168,7 @@ static unsigned long __init xen_set_identity(const struct e820map *e820)
 		}
 
 		for (pfn = PFN_UP(last); pfn < PFN_DOWN(end); pfn++)
-			__set_phys_to_machine(pfn, pfn);
+			__set_phys_to_machine(pfn, pfn | IDENTITY_FRAME_BIT);
 		identity += pfn - PFN_UP(last);
 
 		last = end;
-- 
1.7.1


* [PATCH 8/8] xen/mmu: Set _PAGE_IOMAP if PFN is in identity P2M.
  2010-12-30 19:48 ` Konrad Rzeszutek Wilk
@ 2010-12-30 19:48   ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 41+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-12-30 19:48 UTC (permalink / raw)
  To: linux-kernel, Jeremy Fitzhardinge, hpa, Ian Campbell
  Cc: Jan Beulich, xen-devel, Konrad Rzeszutek Wilk,
	Stefano Stabellini, Konrad Rzeszutek Wilk

If we find that the PFN is within the P2M as an identity
mapping, make sure to tack on the _PAGE_IOMAP flag.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c |   19 +++++++++++++++----
 1 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index d470435..158d07a 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -828,7 +828,7 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
 	if (val & _PAGE_PRESENT) {
 		unsigned long pfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
 		pteval_t flags = val & PTE_FLAGS_MASK;
-		unsigned long mfn = pfn_to_mfn(pfn);
+		unsigned long mfn = get_phys_to_machine(pfn);
 
 		/*
 		 * If there's no mfn for the pfn, then just create an
@@ -839,8 +839,18 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
 		if (unlikely(mfn == INVALID_P2M_ENTRY)) {
 			mfn = 0;
 			flags = 0;
+		} else {
+			/*
+			 * It is paramount to do this test _after_ the
+			 * INVALID_P2M_ENTRY check, as INVALID_P2M_ENTRY &
+			 * IDENTITY_FRAME_BIT also resolves to true.
+			 */
+			mfn &= ~FOREIGN_FRAME_BIT;
+			if (mfn & IDENTITY_FRAME_BIT) {
+				mfn &= ~IDENTITY_FRAME_BIT;
+				flags |= _PAGE_IOMAP;
+			}
 		}
-
 		val = ((pteval_t)mfn << PAGE_SHIFT) | flags;
 	}
 
@@ -968,8 +978,9 @@ pte_t xen_make_pte_debug(pteval_t pte)
 			(unsigned long)addr, (unsigned long)other_addr);
 	} else {
 		other_addr = (_pte.pte & PTE_PFN_MASK);
-		WARN((addr == other_addr) && (!io_page),
-			"0x%lx is missing VM_IO!\n",
+		pteval_t iomap_set = (_pte.pte & PTE_FLAGS_MASK) & _PAGE_IOMAP;
+		WARN((addr == other_addr) && (!io_page) && (!iomap_set),
+			"0x%lx is missing VM_IO (and wasn't fixed)!\n",
 			(unsigned long)addr);
 	}
 
-- 
1.7.1



* Re: [PATCH 7/8] xen/mmu: Introduce IDENTITY_FRAME_BIT
  2010-12-30 19:48   ` Konrad Rzeszutek Wilk
  (?)
@ 2011-01-04 16:26   ` Ian Campbell
  2011-01-04 16:45     ` Konrad Rzeszutek Wilk
  -1 siblings, 1 reply; 41+ messages in thread
From: Ian Campbell @ 2011-01-04 16:26 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, Jeremy Fitzhardinge, hpa, Jan Beulich, xen-devel,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> @@ -52,9 +53,12 @@ static inline unsigned long pfn_to_mfn(unsigned long pfn)
>  
>  	mfn = get_phys_to_machine(pfn);
>  
> -	if (mfn != INVALID_P2M_ENTRY)
> +	if (mfn != INVALID_P2M_ENTRY) {
>  		mfn &= ~FOREIGN_FRAME_BIT;
>  
> +		if (mfn & IDENTITY_FRAME_BIT)
> +			mfn &= ~IDENTITY_FRAME_BIT;
> +	}

I don't think the inner-if buys us anything here and the whole thing is
equivalent to:
	if (mfn != INVALID_P2M_ENTRY)
		mfn &= ~(FOREIGN_FRAME_BIT|IDENTITY_FRAME_BIT);

Not sure if the FOREIGN_FRAME_BIT|IDENTITY_FRAME_BIT construct gets
enough use to be worthy of a #define FRAME_TYPE_MASK etc.
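
A minimal sketch of that simplification, with the suggested (not
existing) FRAME_TYPE_MASK name, ignoring the auto-translated case:

	#define FRAME_TYPE_MASK	(FOREIGN_FRAME_BIT | IDENTITY_FRAME_BIT)

	static inline unsigned long pfn_to_mfn(unsigned long pfn)
	{
		unsigned long mfn = get_phys_to_machine(pfn);

		if (mfn != INVALID_P2M_ENTRY)
			mfn &= ~FRAME_TYPE_MASK;	/* strip both type bits */

		return mfn;
	}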

>  	return mfn;
>  }
>  
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index d98bd43..d470435 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -425,10 +425,10 @@ unsigned long get_phys_to_machine(unsigned long pfn)
>  	 * would be wrong.
>  	 */
>  	if (p2m_top[topidx] == p2m_mid_identity)
> -		return pfn;
> +		return pfn | IDENTITY_FRAME_BIT;

It's probably worth defining IDENTITY_FRAME(m) in the pattern of
FOREIGN_FRAME(m).
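
Following that pattern it would presumably just be:

	#define IDENTITY_FRAME(m)	((m) | IDENTITY_FRAME_BIT)

so the two returns above become 'return IDENTITY_FRAME(pfn);'.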

Ian.



* Re: [PATCH 1/8] xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY.
  2010-12-30 19:48   ` Konrad Rzeszutek Wilk
@ 2011-01-04 16:34     ` Ian Campbell
  -1 siblings, 0 replies; 41+ messages in thread
From: Ian Campbell @ 2011-01-04 16:34 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, Jeremy Fitzhardinge, hpa, Jan Beulich, xen-devel,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> With this patch, we diligently set regions that will be used by the
> balloon driver to be INVALID_P2M_ENTRY and under the ownership
> of the balloon driver. We are OK using the __set_phys_to_machine
> as we do not expect to be allocating any P2M middle or entries pages.

... because xen_build_mfn_list_list will have already allocated all such
pages up to xen_max_p2m_pfn.

(right?)

Ian.




* Re: [PATCH 1/8] xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY.
  2011-01-04 16:34     ` Ian Campbell
  (?)
@ 2011-01-04 16:45     ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 41+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-04 16:45 UTC (permalink / raw)
  To: Ian Campbell
  Cc: linux-kernel, Jeremy Fitzhardinge, hpa, Jan Beulich, xen-devel,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On Tue, Jan 04, 2011 at 04:34:48PM +0000, Ian Campbell wrote:
> On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> > With this patch, we diligently set regions that will be used by the
> > balloon driver to be INVALID_P2M_ENTRY and under the ownership
> > of the balloon driver. We are OK using the __set_phys_to_machine
> > as we do not expect to be allocating any P2M middle or entries pages.
> 
> ... because xen_build_mfn_list_list will have already allocated all such
> pages up to xen_max_p2m_pfn.

Correct. Let me update the description accordingly.


* Re: [PATCH 7/8] xen/mmu: Introduce IDENTITY_FRAME_BIT
  2011-01-04 16:26   ` Ian Campbell
@ 2011-01-04 16:45     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 41+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-04 16:45 UTC (permalink / raw)
  To: Ian Campbell
  Cc: linux-kernel, Jeremy Fitzhardinge, hpa, Jan Beulich, xen-devel,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On Tue, Jan 04, 2011 at 04:26:15PM +0000, Ian Campbell wrote:
> On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> > @@ -52,9 +53,12 @@ static inline unsigned long pfn_to_mfn(unsigned long pfn)
> >  
> >  	mfn = get_phys_to_machine(pfn);
> >  
> > -	if (mfn != INVALID_P2M_ENTRY)
> > +	if (mfn != INVALID_P2M_ENTRY) {
> >  		mfn &= ~FOREIGN_FRAME_BIT;
> >  
> > +		if (mfn & IDENTITY_FRAME_BIT)
> > +			mfn &= ~IDENTITY_FRAME_BIT;
> > +	}
> 
> I don't think the inner-if buys us anything here and the whole thing is
> equivalent to:
> 	if (mfn != INVALID_P2M_ENTRY)
> 		mfn &= ~(FOREIGN_FRAME_BIT|IDENTITY_FRAME_BIT);
> 
> Not sure if the FOREIGN_FRAME_BIT|IDENTITY_FRAME_BIT construct gets
> enough use to be worthy of a #define FRAME_TYPE_MASK etc.

Yeah, that is much better.
> 
> >  	return mfn;
> >  }
> >  
> > diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> > index d98bd43..d470435 100644
> > --- a/arch/x86/xen/mmu.c
> > +++ b/arch/x86/xen/mmu.c
> > @@ -425,10 +425,10 @@ unsigned long get_phys_to_machine(unsigned long pfn)
> >  	 * would be wrong.
> >  	 */
> >  	if (p2m_top[topidx] == p2m_mid_identity)
> > -		return pfn;
> > +		return pfn | IDENTITY_FRAME_BIT;
> 
> It's probably worth defining IDENTITY_FRAME(m) in the pattern of
> FOREIGN_FRAME(m).

<nods>


* Re: [PATCH 2/8] xen/mmu: Add the notion of identity (1-1) mapping.
  2010-12-30 19:48   ` Konrad Rzeszutek Wilk
  (?)
@ 2011-01-04 16:53   ` Ian Campbell
  2011-01-04 16:59     ` Ian Campbell
  2011-01-04 19:24     ` Konrad Rzeszutek Wilk
  -1 siblings, 2 replies; 41+ messages in thread
From: Ian Campbell @ 2011-01-04 16:53 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, Jeremy Fitzhardinge, hpa, Jan Beulich, xen-devel,
	Konrad Rzeszutek Wilk, Stefano Stabellini

Should patch 7/8 (add the IDENTITY_FRAME_BIT) be folded back into this
patch?

On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> Our P2M tree structure is three-level. On the leaf nodes
> we set the Machine Frame Number (MFN) of the PFN. What this means
> is that when one does: pfn_to_mfn(pfn), which is used when creating
> PTE entries, you get the real MFN of the hardware. When Xen sets
> up a guest it initially populates an array which has descending MFN
> values, as so:
> 
>  idx: 0,  1,       2
>  [0x290F, 0x290E, 0x290D, ..]
> 
> so pfn_to_mfn(2)==0x290D. If you start and restart many guests, that list
> starts looking quite random.
> 
> We graft this structure on our P2M tree structure and stick
> those MFNs in the leafs. But for all other leaf entries, or for the top
> root, or middle one, for which there is a void entry, we assume it is
> "missing". So
>  pfn_to_mfn(0xc0000)=INVALID_P2M_ENTRY.
> 
> We add the possibility of setting 1-1 mappings on certain regions, so
> that:
>  pfn_to_mfn(0xc0000)=0xc0000
> 
> The benefit of this is, that we can assume for non-RAM regions (think
> PCI BARs, or ACPI spaces), we can create mappings easily b/c we
> get the PFN value to match the MFN.
> 
> For this to work efficiently we introduce two new pages: p2m_identity
> and p2m_mid_identity.  All entries in p2m_identity are set to INVALID_P2M_ENTRY
> type,

This statement confused me at first. I think the piece of information
which is missing in the rest of the paragraph is that on lookup we spot
that the entry points to p2m_identity and return the identity value
instead of dereferencing and returning INVALID_P2M_ENTRY.

If the value is never actually (deliberately) dereferenced,
then perhaps for sanity/debugging the page should contain some other
invalid/poison value, so we can spot in stack traces etc. cases where it
has been erroneously dereferenced (e.g. 0xFFFFFFFE)? Or does that
confuse the tools/migration or something similar?

> @@ -410,6 +418,17 @@ unsigned long get_phys_to_machine(unsigned long pfn)
>  	mididx = p2m_mid_index(pfn);
>  	idx = p2m_index(pfn);
>  
> +	/*
> +	 * The INVALID_P2M_ENTRY is filled in both p2m_*identity
> +	 * and in p2m_*missing, so returning the INVALID_P2M_ENTRY
> +	 * would be wrong.
> +	 */
> +	if (p2m_top[topidx] == p2m_mid_identity)
> +		return pfn;
> +
> +	if (p2m_top[topidx][mididx] == p2m_identity)
> +		return pfn;
> +

If p2m_top[topidx] == p2m_mid_identity then also p2m_top[topidx][mididx]
== p2m_identity. Therefore I'm not sure if the first check is worthwhile
or not.

> @@ -517,6 +542,26 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
>  	mididx = p2m_mid_index(pfn);
>  	idx = p2m_index(pfn);
>  
> +	/* For sparse holes were the p2m leaf has real PFN along with
> +	 * PCI holes, stick in the PFN as the MFN value.
> +	 */
> +	if (pfn == mfn) {
> +		if (p2m_top[topidx] == p2m_mid_identity)
> +			return 1;
> +		if (p2m_top[topidx][mididx] == p2m_identity)
> +			return 1;

Should be "return true" throughout for a function returning bool. I
think it can also be simplified as per my comment above.

> +		/* Swap over from MISSING to IDENTITY if needed. */
> +		if (p2m_top[topidx] == p2m_mid_missing) {
> +			p2m_top[topidx] = p2m_mid_identity;
> +			return 1;
> +		}
> +		if (p2m_top[topidx][mididx] == p2m_missing) {
> +			p2m_top[topidx][mididx] = p2m_identity;
> +			return 1;
> +		}

I don't think I quite understand this bit.

If I do "__set_phys_to_machine(X, X)" where X happens to currently
correspond to p2m_mid_missing won't that cause all pfn entries in the
range covered by p2m_top[topidx] (i.e. quite a few, like 512^2 pages or
something) to switch from missing to identity? Similarly for
p2m_top[topidx][mididx]?

Perhaps ranges of identity bits are often well aligned with the
boundaries in the p2m 3-level tree but wouldn't that just be
coincidence?

Don't we need to completely populate the tree at this point and setup
the leaf nodes as appropriate? Which we can't do since this is
__set_phys_to_machine, so we need to fail and let set_phys_to_machine do
its thing? Or maybe this whole bit needs to be hoisted up to
set_phys_to_machine?

Ian.

> +	}
> +
>  	if (p2m_top[topidx][mididx] == p2m_missing)
>  		return mfn == INVALID_P2M_ENTRY;
>  




* Re: [PATCH 2/8] xen/mmu: Add the notion of identity (1-1) mapping.
  2011-01-04 16:53   ` Ian Campbell
@ 2011-01-04 16:59     ` Ian Campbell
  2011-01-04 17:20       ` [Xen-devel] " Ian Campbell
  2011-01-04 19:24     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 41+ messages in thread
From: Ian Campbell @ 2011-01-04 16:59 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, Jeremy Fitzhardinge, hpa, Jan Beulich, xen-devel,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On Tue, 2011-01-04 at 16:53 +0000, Ian Campbell wrote:
> 
> If I do "__set_phys_to_machine(X, X)" where X happens to currently
> correspond to p2m_mid_missing won't that cause all pfn entries in the
> range covered by p2m_top[topidx] (i.e. quite a few, like 512^2 pages
> or
> something) to switch from missing to identity? Similarly for
> p2m_top[topidx][mididx]?
> 
> Perhaps ranges of identity bits are often well aligned with the
> boundaries in the p2m 3-level tree but wouldn't that just be
> coincidence? 

I wonder if it would make sense to have a debugfs file which exports the
p2m, for the purposes of eye-balling it for this sort of issue and
comparing it to the e820 etc.?

Maybe the whole thing in raw form would be overkill but spitting out a
list of ranges of identity, invalid, normal pages, plus annotation
regarding whether those come from a p2m_mid_identity style page or a
leaf node, might be occasionally useful.
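
A minimal sketch of how such an export could be wired up with a
seq_file (the p2m_dump_show helper is visible in the patch above; the
directory handle and file name here are just illustrative -- and patch
5/8 apparently already does something along these lines, see below):

	static int p2m_dump_open(struct inode *inode, struct file *filp)
	{
		return single_open(filp, p2m_dump_show, NULL);
	}

	static const struct file_operations p2m_dump_fops = {
		.open		= p2m_dump_open,
		.read		= seq_read,
		.llseek		= seq_lseek,
		.release	= single_release,
	};

	/* e.g. from an initcall, d_xen_debug being a debugfs dir: */
	debugfs_create_file("p2m", 0400, d_xen_debug, NULL, &p2m_dump_fops);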

Ian.
-- 
Ian Campbell
Current Noise: Place of Skulls - The Watchers

Call toll free number before digging.



* Re: [PATCH 3/8] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.
  2010-12-30 19:48   ` Konrad Rzeszutek Wilk
@ 2011-01-04 17:18     ` Ian Campbell
  -1 siblings, 0 replies; 41+ messages in thread
From: Ian Campbell @ 2011-01-04 17:18 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, Jeremy Fitzhardinge, hpa, Jan Beulich, xen-devel,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> We walk the E820 region and start at 0 (for PV guests we start
> at ISA_END_ADDRESS)

I was trying to figure out what any of this had to do with HVM guests,
but you mean as opposed to dom0, which with my pedant hat on is also a
guest ;-).

>  and skip any E820 RAM regions. For all other
> regions and as well the gaps we set them to be identity mappings.
> 
> The reason we do not want to set the identity mapping from 0->
> ISA_END_ADDRESS when running as PV is b/c the kernel would
> try to read DMI information and fail (no permissions to read that).

The reason for this special case is that in domU we have already punched
a hole from 640k-1M into the e820 which the hypervisor gave us.

Should we perhaps be doing this identity mapping before we punch that
extra hole? i.e. setup ID mappings based on the hypervisors idea of the
guest e820 not the munged one we subsequently magicked up? Only the
original e820 is going to bear any possible resemblance to the identity
pages which the guest can actually see.

Ian.




* Re: [Xen-devel] Re: [PATCH 2/8] xen/mmu: Add the notion of identity (1-1) mapping.
  2011-01-04 16:59     ` Ian Campbell
@ 2011-01-04 17:20       ` Ian Campbell
  0 siblings, 0 replies; 41+ messages in thread
From: Ian Campbell @ 2011-01-04 17:20 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Jeremy Fitzhardinge, xen-devel, Stefano Stabellini, linux-kernel,
	Jan Beulich, Konrad Rzeszutek Wilk, hpa

On Tue, 2011-01-04 at 16:59 +0000, Ian Campbell wrote:
> On Tue, 2011-01-04 at 16:53 +0000, Ian Campbell wrote:
> > 
> > If I do "__set_phys_to_machine(X, X)" where X happens to currently
> > correspond to p2m_mid_missing won't that cause all pfn entries in the
> > range covered by p2m_top[topidx] (i.e. quite a few, like 512^2 pages
> > or
> > something) to switch from missing to identity? Similarly for
> > p2m_top[topidx][mididx]?
> > 
> > Perhaps ranges of identity bits are often well aligned with the
> > boundaries in the p2m 3-level tree but wouldn't that just be
> > coincidence? 
> 
> I wonder if it would make sense to have a debugfs file which exports the
> p2m, for the purposes of eye-balling it for this sort of issue?
> comparing to the e820 etc...
> 
> Maybe the whole thing in raw form would be overkill but spitting out a
> list of ranges of identity, invalid, normal pages, plus annotation
> regarding whether those come from a p2m_mid_identity style page or a
> leaf node, might be occasionally useful.

I should have read patch 5/8 first...



* Re: [PATCH 6/8] xen/debug: WARN_ON when 1-1 but no _PAGE_IOMAP flag set.
  2010-12-30 19:48   ` Konrad Rzeszutek Wilk
  (?)
@ 2011-01-04 17:24   ` Ian Campbell
  2011-01-04 18:46     ` Konrad Rzeszutek Wilk
  -1 siblings, 1 reply; 41+ messages in thread
From: Ian Campbell @ 2011-01-04 17:24 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, Jeremy Fitzhardinge, hpa, Jan Beulich, xen-devel,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> Only enabled if XEN_DEBUG_FS is enabled.

Bit of an odd configuration option to use. Perhaps co-opt
CONFIG_PARAVIRT_DEBUG instead?

Or maybe we need a new XEN_DEBUG option? or just make it a developer
only EXTRA_CFLAGS +=-DDEBUG thing?

Is this only temporary until the need for _PAGE_IOMAP is removed anyway?

Ian.




* Re: [PATCH 3/8] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.
  2011-01-04 17:18     ` Ian Campbell
  (?)
@ 2011-01-04 18:38     ` Konrad Rzeszutek Wilk
  2011-01-04 19:27       ` Ian Campbell
  -1 siblings, 1 reply; 41+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-04 18:38 UTC (permalink / raw)
  To: Ian Campbell
  Cc: linux-kernel, Jeremy Fitzhardinge, hpa, Jan Beulich, xen-devel,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On Tue, Jan 04, 2011 at 05:18:58PM +0000, Ian Campbell wrote:
> On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> > We walk the E820 region and start at 0 (for PV guests we start
> > at ISA_END_ADDRESS)
> 
> I was trying to figure out what any of this had to do with HVM guests,
> but you mean as opposed to dom0, which with my pedant hat on is also a
> guest ;-).
> 
> >  and skip any E820 RAM regions. For all other
> > regions and as well the gaps we set them to be identity mappings.
> > 
> > The reason we do not want to set the identity mapping from 0->
> > ISA_END_ADDRESS when running as PV is b/c the kernel would
> > try to read DMI information and fail (no permissions to read that).
> 
> The reason for this special case is that in domU we have already punched
> a hole from 640k-1M into the e820 which the hypervisor gave us.

For the privileged guest - yes. But the non-privileged one does not have
such a range and would end up failing.
> 
> Should we perhaps be doing this identity mapping before we punch that
> extra hole? i.e. setup ID mappings based on the hypervisors idea of the
> guest e820 not the munged one we subsequently magicked up? Only the

You mean the ISA_START_ADDRESS->ISA_END_ADDRESS we mark as reserved?

It sure would be easier (and it would mean we can return that memory
back to the hypervisor).

> original e820 is going to bear any possible resemblance to the identity
> pages which the guest can actually see.
> 
> Ian.
> 


* Re: [PATCH 6/8] xen/debug: WARN_ON when 1-1 but no _PAGE_IOMAP flag set.
  2011-01-04 17:24   ` Ian Campbell
@ 2011-01-04 18:46     ` Konrad Rzeszutek Wilk
  2011-01-04 19:20       ` Ian Campbell
  0 siblings, 1 reply; 41+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-04 18:46 UTC (permalink / raw)
  To: Ian Campbell
  Cc: linux-kernel, Jeremy Fitzhardinge, hpa, Jan Beulich, xen-devel,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On Tue, Jan 04, 2011 at 05:24:38PM +0000, Ian Campbell wrote:
> On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> > Only enabled if XEN_DEBUG_FS is enabled.
> 
> Bit of an odd configuration option to use. Perhaps co-opt
> CONFIG_PARAVIRT_DEBUG instead?
> 
> Or maybe we need a new XEN_DEBUG option? or just make it a developer
> only EXTRA_CFLAGS +=-DDEBUG thing?

I think the 'XEN_DEBUG' works better.
> 
> Is this only temporary until the need for _PAGE_IOMAP is removed anyway?

I was thinking of leaving it in as a way to weed out bugs, but I could
just as well keep it in my "debug" branch and not propose it upstream.

I am not sure how to remove _PAGE_IOMAP fully. We need some _flag_ to
signal 'xen_pte_val' that the PTE should not be looked up in the M2P.

Otherwise, for identity mappings, the value is 0xfffff.. (or 0x55555.. sometimes)
and the PTE ends up being messed up. Instead there is a check to see if
_PAGE_IOMAP is part of the PTE, and if so, no M2P lookup is done.

I guess we could do the M2P lookup regardless and just see if it is
0xfffff... and if so just return the PTE without any changes.
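
A rough sketch of that idea (purely illustrative -- the special values
and the shape of the check follow this discussion, not actual code):

	static pteval_t pte_mfn_to_pfn(pteval_t val)
	{
		if (val & _PAGE_PRESENT) {
			unsigned long mfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
			unsigned long pfn = mfn_to_pfn(mfn);

			/* M2P lookup came back poisoned: assume an
			 * identity/IO mapping and leave the PTE as is. */
			if (pfn == ~0UL || pfn == 0x5555555555555555UL)
				return val;

			val = ((pteval_t)pfn << PAGE_SHIFT) |
			      (val & PTE_FLAGS_MASK);
		}
		return val;
	}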

> 
> Ian.
> 


* Re: [PATCH 6/8] xen/debug: WARN_ON when 1-1 but no _PAGE_IOMAP flag set.
  2011-01-04 18:46     ` Konrad Rzeszutek Wilk
@ 2011-01-04 19:20       ` Ian Campbell
  2011-01-06 19:50         ` Stefano Stabellini
  0 siblings, 1 reply; 41+ messages in thread
From: Ian Campbell @ 2011-01-04 19:20 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, Jeremy Fitzhardinge, hpa, Jan Beulich, xen-devel,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On Tue, 2011-01-04 at 18:46 +0000, Konrad Rzeszutek Wilk wrote: 
> On Tue, Jan 04, 2011 at 05:24:38PM +0000, Ian Campbell wrote:
> > On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> > > Only enabled if XEN_DEBUG_FS is enabled.
> > 
> > Bit of an odd configuration option to use. Perhaps co-opt
> > CONFIG_PARAVIRT_DEBUG instead?
> > 
> > Or maybe we need a new XEN_DEBUG option? or just make it a developer
> > only EXTRA_CFLAGS +=-DDEBUG thing?
> 
> I think the 'XEN_DEBUG' works better.
> > 
> > Is this only temporary until the need for _PAGE_IOMAP is removed anyway?
> 
> I was thinking of leaving it in as a way to weed out bugs, but I could
> just as well keep it in my "debug" branch and not propose it upstream.
> 
> I am not sure how to remove _PAGE_IOMAP fully. We need some _flag_ to
> signal 'xen_pte_val' that the PTE should not be looked up in the M2P.
> 
> Otherwise, for identity mappings, the value is 0xfffff.. (or 0x55555.. sometimes)
> and the PTE ends up being messed up. Instead there is a check to see if
> _PAGE_IOMAP is part of the PTE, and if so, no M2P lookup is done.
> 
> I guess we could do the M2P lookup regardless and just see if it is
> 0xfffff... and if so just return the PTE without any changes.

Perhaps this ties in with the m2p overlay which Stefano+Jeremy have been
working on to deal with granted foreign pages? I/O pages are a bit like
foreign memory (if you squint enough)...

Ian.



* Re: [PATCH 2/8] xen/mmu: Add the notion of identity (1-1) mapping.
  2011-01-04 16:53   ` Ian Campbell
  2011-01-04 16:59     ` Ian Campbell
@ 2011-01-04 19:24     ` Konrad Rzeszutek Wilk
  2011-01-05 14:03       ` Ian Campbell
  1 sibling, 1 reply; 41+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-04 19:24 UTC (permalink / raw)
  To: Ian Campbell
  Cc: linux-kernel, Jeremy Fitzhardinge, hpa, Jan Beulich, xen-devel,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On Tue, Jan 04, 2011 at 04:53:48PM +0000, Ian Campbell wrote:
> Should patch 7/8 (add the IDENTITY_FRAME_BIT) be folded back into this
> patch?

It does not have to. This can work independently (it does not introduce
any regressions).
> 
> On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> > Our P2M tree structure is three-level. On the leaf nodes
> > we set the Machine Frame Number (MFN) of the PFN. What this means
> > is that when one does: pfn_to_mfn(pfn), which is used when creating
> > PTE entries, you get the real MFN of the hardware. When Xen sets
> > up a guest it initially populates an array which has descending MFN
> > values, as so:
> > 
> >  idx: 0,  1,       2
> >  [0x290F, 0x290E, 0x290D, ..]
> > 
> > so pfn_to_mfn(2)==0x290D. If you start and restart many guests, that list
> > starts looking quite random.
> > 
> > We graft this structure on our P2M tree structure and stick
> > those MFNs in the leafs. But for all other leaf entries, or for the top
> > root, or middle one, for which there is a void entry, we assume it is
> > "missing". So
> >  pfn_to_mfn(0xc0000)=INVALID_P2M_ENTRY.
> > 
> > We add the possibility of setting 1-1 mappings on certain regions, so
> > that:
> >  pfn_to_mfn(0xc0000)=0xc0000
> > 
> > The benefit of this is, that we can assume for non-RAM regions (think
> > PCI BARs, or ACPI spaces), we can create mappings easily b/c we
> > get the PFN value to match the MFN.
> > 
> > For this to work efficiently we introduce two new pages: p2m_identity
> > and p2m_mid_identity.  All entries in p2m_identity are set to INVALID_P2M_ENTRY
> > type,
> 
> This statement confused me at first. I think the piece of information
> which is missing in the rest of the paragraph is that on lookup we spot
> that the entry points to p2m_identity and return the identity value
> instead of dereferencing and returning INVALID_P2M_ENTRY.
> 
> If the value is never actually (deliberately) dereferenced,
> then perhaps for sanity/debugging the page should contain some other
> invalid/poison value, so we can spot in stack traces etc. cases where it
> has been erroneously dereferenced (e.g. 0xFFFFFFFE)? Or does that
> confuse the tools/migration or something similar?

Yup. The tools have checks for either valid PFNs or 0xFFFFFFF, but nothing
else. They flip out if you stick other values in .. oh, and sticking in
the same MFN (in case you were thinking of having them point to a dummy page)
makes the toolstack think you are hitting a race.

> 
> > @@ -410,6 +418,17 @@ unsigned long get_phys_to_machine(unsigned long pfn)
> >  	mididx = p2m_mid_index(pfn);
> >  	idx = p2m_index(pfn);
> >  
> > +	/*
> > +	 * The INVALID_P2M_ENTRY is filled in both p2m_*identity
> > +	 * and in p2m_*missing, so returning the INVALID_P2M_ENTRY
> > +	 * would be wrong.
> > +	 */
> > +	if (p2m_top[topidx] == p2m_mid_identity)
> > +		return pfn;
> > +
> > +	if (p2m_top[topidx][mididx] == p2m_identity)
> > +		return pfn;
> > +
> 
> If p2m_top[topidx] == p2m_mid_identity then also p2m_top[topidx][mididx]
> == p2m_identity. Therefore I'm not sure if the first check is worthwhile
> or not.

Good eye.

> 
> > @@ -517,6 +542,26 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
> >  	mididx = p2m_mid_index(pfn);
> >  	idx = p2m_index(pfn);
> >  
> > +	/* For sparse holes were the p2m leaf has real PFN along with
> > +	 * PCI holes, stick in the PFN as the MFN value.
> > +	 */
> > +	if (pfn == mfn) {
> > +		if (p2m_top[topidx] == p2m_mid_identity)
> > +			return 1;
> > +		if (p2m_top[topidx][mididx] == p2m_identity)
> > +			return 1;
> 
> Should be "return true" throughout for a function returning bool. I
> think it can also be simplified as per my comment above.

<nods>
> 
> > +		/* Swap over from MISSING to IDENTITY if needed. */
> > +		if (p2m_top[topidx] == p2m_mid_missing) {
> > +			p2m_top[topidx] = p2m_mid_identity;
> > +			return 1;
> > +		}
> > +		if (p2m_top[topidx][mididx] == p2m_missing) {
> > +			p2m_top[topidx][mididx] = p2m_identity;
> > +			return 1;
> > +		}
> 
> I don't think I quite understand this bit.
> 
> If I do "__set_phys_to_machine(X, X)" where X happens to currently
> correspond to p2m_mid_missing won't that cause all pfn entries in the
> range covered by p2m_top[topidx] (i.e. quite a few, like 512^2 pages or
> something) to switch from missing to identity? Similarly for
> p2m_top[topidx][mididx]?

Yup. Keep in mind that this patchset explores the more "conservative" approach.

All 'void' entries are pointing to the p2m_*MISSING instead of p2m_*IDENTITY
pages.

Hence the swap over from missing to identity is correct.
> 
> Perhaps ranges of identity bits are often well aligned with the
> boundaries in the p2m 3-level tree but wouldn't that just be
> coincidence?

I thought so, but for all the machines I've run this on, those boundaries
are there.

> 
> Don't we need to completely populate the tree at this point and setup
> the leaf nodes as appropriate? Which we can't do since this is

Not exactly. At this point (xen/setup, parsing the E820), the P2M has
been populated up to nr_pages. So, say we give 2G to the guest; that is,
p2m[0] and p2m[1] point to two pages "created" by extend_brk. The
contents of those pages point into mfn_list[..]. The rest of the p2m
(so, p2m[2]..p2m[511]) points to p2m_mid_missing.

When we start the identity mapping, we end up (for example) with 3GB to
4GB as identity mapping, so p2m[3] is now p2m_mid_identity. p2m[4] and
later are still p2m_mid_missing.
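
For reference, a sketch of the index arithmetic this layout implies on
64-bit with 4K pages (sizes assumed from the three-level description):

	/* each leaf page holds 512 entries and each mid page 512 leaf
	 * pointers, so one top-level entry spans 512*512*4K = 1GB */
	topidx = pfn / (512 * 512);	/* which 1GB chunk  */
	mididx = (pfn / 512) % 512;	/* which 2MB chunk  */
	idx    = pfn % 512;		/* which 4K page    */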

Though I've just thought of a nasty condition. Say those PCI regions
start at 2.5GB, and the user provided dom0_mem=max:3GB. Oh, and actually
that still works. The p2m[1] will be allocated by extend_brk, and we
just end up sticking p2m[1][473]..p2m[1][511] with the PFN |
IDENTITY_FRAME_BIT, which is OK as those contents used to be
INVALID_P2M_ENTRY (the page would have been created by extend_brk in
xen_build_dynamic_phys_to_machine and filled with the ~0 value).

> __set_phys_to_machine, so we need to fail and let set_phys_to_machine do
> its thing? Or maybe this whole bit needs to be hoisted up to
> set_phys_to_machine?

It can, but set_phys_to_machine would have to use extend_brk.

The danger I am seeing is if the non-RAM regions (and gaps) in the E820
are not 1GB (or 512MB under 32-bit) boundary aligned (2.36GB?) _and_
dom0_mem is employed so as to make the top-level P2M reach right up to
the non-boundary-aligned E820 void (so, dom0_mem=max:2GB for example).
That would create a proper p2m[0] and p2m[1], but p2m[3] would point to
p2m_mid_missing, and when we try to make p2m[3][231] identity (so the
~2.36GB area) we would discover it is p2m_missing and would over-write
it with p2m_identity .. and that would be OK as we cannot use the
balloon driver (max:2GB). However, if we used 'dom0_mem=max:2364MB,1GB'
I think we would hit a failure when the balloon driver tried to balloon
up, ending up hitting this BUG: phys_to_machine_mapping_valid(pfn) (as
it should have returned INVALID_P2M_ENTRY but instead returns the MFN).

A simple solution could be to make 'xen_build_dynamic_phys_to_machine'
cover the E820 regions between "System RAM" entries in case nr_pages does not
extend that far. Or introduce a 'set_phys_to_machine_early' that would do exactly the
same thing as 'set_phys_to_machine' but use extend_brk instead of __get_free_pages.

Ideas?
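
For that second option, a rough sketch of what such a helper might look
like (the name and the use of extend_brk follow the suggestion above;
the p2m_*_index/p2m_*_init helpers are the existing ones -- purely
illustrative, not actual code):

	/* Like set_phys_to_machine(), but usable before the page
	 * allocator is up: missing levels come out of the brk
	 * reservation instead of __get_free_pages(). */
	static void __init set_phys_to_machine_early(unsigned long pfn,
						     unsigned long mfn)
	{
		unsigned topidx = p2m_top_index(pfn);
		unsigned mididx = p2m_mid_index(pfn);

		if (p2m_top[topidx] == p2m_mid_missing) {
			unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE);

			p2m_mid_init(mid);	/* every leaf -> p2m_missing */
			p2m_top[topidx] = mid;
		}
		if (p2m_top[topidx][mididx] == p2m_missing) {
			unsigned long *leaf = extend_brk(PAGE_SIZE, PAGE_SIZE);

			p2m_init(leaf);		/* fill with INVALID_P2M_ENTRY */
			p2m_top[topidx][mididx] = leaf;
		}
		p2m_top[topidx][mididx][p2m_index(pfn)] = mfn;
	}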


* Re: [PATCH 3/8] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.
  2011-01-04 18:38     ` Konrad Rzeszutek Wilk
@ 2011-01-04 19:27       ` Ian Campbell
  2011-01-04 21:28           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 41+ messages in thread
From: Ian Campbell @ 2011-01-04 19:27 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, Jeremy Fitzhardinge, hpa, Jan Beulich, xen-devel,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On Tue, 2011-01-04 at 18:38 +0000, Konrad Rzeszutek Wilk wrote: 
> On Tue, Jan 04, 2011 at 05:18:58PM +0000, Ian Campbell wrote:
> > On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> > > We walk the E820 region and start at 0 (for PV guests we start
> > > at ISA_END_ADDRESS)
> > 
> > I was trying to figure out what any of this had to do with HVM guests,
> > but you mean as opposed to dom0, which with my pedant hat on is also a
> > guest ;-).
> > 
> > >  and skip any E820 RAM regions. For all other
> > > regions and as well the gaps we set them to be identity mappings.
> > > 
> > > The reasons we do not want to set the identity mapping from 0->
> > > ISA_END_ADDRESS when running as PV is b/c that the kernel would
> > > try to read DMI information and fail (no permissions to read that).
> > 
> > The reason for this special case is that in domU we have already punched
> > a hole from 640k-1M into the e820 which the hypervisor gave us.
> 
> For the privileged guest - yes. But the non-privileged one does not have
> such a range and would end up failing.

xen_memory_setup has: 
        e820_add_region(ISA_START_ADDRESS, ISA_END_ADDRESS - ISA_START_ADDRESS,
                        E820_RESERVED);
which is unconditional but is actually more for domU's benefit than
dom0's which already sees the host e820 presumably with the right hole
already in place, which we simply shadow, or maybe slightly extend,
here.

In a domU we do this because if you let these pages into the general
allocation pool then they will potentially get used as page table pages
(hence be R/O), but e.g. the DMI code tries to map them to probe for
signatures and does so R/W, which fails. We could try to find
every place in the kernel which does this, or we can simply reserve the
region, which stops it getting used for page tables or other special
things and is somewhat less surprising for non-Xen code.

> > Should we perhaps be doing this identity mapping before we punch that
> > extra hole? i.e. setup ID mappings based on the hypervisors idea of the
> > guest e820 not the munged one we subsequently magicked up? Only the
> 
> You mean the ISA_START_ADDRESS->ISA_END_ADDRESS we mark as reserved?

Yep. 

> It sure would be easier

> (and it would mean we can return that memory back to the hypervisor).

I don't think you can return it, since something like the DMI code which
wants to probe it expects to be able to map that PFN; if you've given
the MFN back then that will fail.

I suppose we could alias all such PFNs to the same scratch MFN but I'd
be concerned about some piece of code which expects to interact with
firmware scribbling over it and surprising some other piece of code
which interacts with the firmware...

Ian.



* Re: [PATCH 3/8] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.
  2011-01-04 19:27       ` Ian Campbell
@ 2011-01-04 21:28           ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 41+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-04 21:28 UTC (permalink / raw)
  To: Ian Campbell
  Cc: linux-kernel, Jeremy Fitzhardinge, hpa, Jan Beulich, xen-devel,
	Konrad Rzeszutek Wilk, Stefano Stabellini


> > For the privileged guest - yes. But the non-privileged one does not
> > have such a range and would end up failing.
> 
> xen_memory_setup has:
>         e820_add_region(ISA_START_ADDRESS, ISA_END_ADDRESS -
> ISA_START_ADDRESS, E820_RESERVED);
> which is unconditional but is actually more for domU's benefit than
> dom0's which already sees the host e820 presumably with the right hole
> already in place, which we simply shadow, or maybe slightly extend,
> here.

Actually we don't do anything with that region in the Dom0 case. We just
return the PFN without consulting the P2M for 0->0x100, while for DomU we
_do_ consult the P2M and set those in the PTE (look at xen_make_pte).

> 
> In a domU we do this because if you let these pages into the general
> allocation pool then they will potentially get used as page table pages
> (hence be R/O), but e.g. the DMI code tries to map them to probe for
> signatures and does so R/W, which fails. We could try to find
> every place in the kernel which does this, or we can simply reserve the
> region, which stops it getting used for page tables or other special
> things and is somewhat less surprising for non-Xen code.

Yeah, went down that hole once.. too many generic pieces of code.
.. snip..
> > You mean the ISA_START_ADDRESS->ISA_END_ADDRESS we mark as reserved?
> 
> Yep.
> 
> > It sure would be easier
> > 
> > (and it would mean we can return that memory back to the hypervisor).
> 
> I don't think you can return it, since something like the DMI code which
> wants to probe it expects to be able to map that PFN; if you've given
> the MFN back then that will fail.

Correct (for a non-privileged PV domain).
> 
> I suppose we could alias all such PFNs to the same scratch MFN but I'd

It actually works. I set up 0x1->0x100 to point to whatever the MFN was at
0x0, and released the pages from 0x1->0x100, and it worked for DomU PV guests
(and dom0, since I ended up stomping those regions with the PFN |
IDENTITY_FRAME_BIT).

However, the tools weren't happy ('xm save'). They did not like the same MFN
across a couple of entries in the P2M table and complained about a potential
race. But there is another way, and that is to special-case in 'xen_make_pte'
the case when we want to create a PTE for 0->ISA_END_ADDRESS and just give it
the MFN from P2M[0x0] (for !xen_initial_domain()), while having the PFNs from
0x1->0x100 freed and marked with IDENTITY_FRAME_BIT... But that all just
smacks of weird corner cases. Though the code that is there is already
special-casing access to that region. Maybe it would clear it up a bit.


> be concerned about some piece of code which expects to interact with
> firmware scribbling over it and surprising some other piece of code
> which interacts with the firmware...

Fortunately they all look for a signature first before trying to scribble.


* Re: [PATCH 2/8] xen/mmu: Add the notion of identity (1-1) mapping.
  2011-01-04 19:24     ` Konrad Rzeszutek Wilk
@ 2011-01-05 14:03       ` Ian Campbell
  0 siblings, 0 replies; 41+ messages in thread
From: Ian Campbell @ 2011-01-05 14:03 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, Jeremy Fitzhardinge, hpa, Jan Beulich, xen-devel,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On Tue, 2011-01-04 at 19:24 +0000, Konrad Rzeszutek Wilk wrote:
> On Tue, Jan 04, 2011 at 04:53:48PM +0000, Ian Campbell wrote:
> > Should patch 7/8 (add the IDENTITY_FRAME_BIT) be folded back into this
> > patch?
> 
> Does not have too. This can work independently (it does not introduce
> any regressions).

Without 7/8 don't you have the issue of confusing identity mapped PFNs
with ones which just happen to be identity?

> > 
> > > +		/* Swap over from MISSING to IDENTITY if needed. */
> > > +		if (p2m_top[topidx] == p2m_mid_missing) {
> > > +			p2m_top[topidx] = p2m_mid_identity;
> > > +			return 1;
> > > +		}
> > > +		if (p2m_top[topidx][mididx] == p2m_missing) {
> > > +			p2m_top[topidx][mididx] = p2m_identity;
> > > +			return 1;
> > > +		}
> > 
> > I don't think I quite understand this bit.
> > 
> > If I do "__set_phys_to_machine(X, X)" where X happens to currently
> > correspond to p2m_mid_missing won't that cause all pfn entries in the
> > range covered by p2m_top[topidx] (i.e. quite a few, like 512^2 pages or
> > something) to switch from missing to identity? Similarly for
> > p2m_top[topidx][mididx]?
> 
> Yup. Keep in mind that this patchset explores the more "conservative" approach.
> 
> All 'void' entries are pointing to the p2m_*MISSING instead of p2m_*IDENTITY
> pages.
> 
> Hence the swap over from missing to identity is correct.

How?

If I have a P2M such that P2M[X] == missing and P2M[X+1] == missing
(because they both happen to be in a region backed by p2m_mid_missing)
and I do __set_phys_to_machine(X, X) then this makes P2M[X] == identity,
which is fine. But it also makes P2M[X+1] == identity which is at best
counter-intuitive and at worst just totally wrong.

I also suspect we shouldn't be special casing __set_phys_to_machine(X,
X) in this way (since it is prone to false positives). We should
probably have a separate __set_phys_to_machine_identity(X) or something.
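
A sketch of what such a dedicated setter might look like (the name
follows the suggestion, IDENTITY_FRAME() is the macro proposed earlier
in the thread -- purely illustrative):

	static bool __set_phys_to_machine_identity(unsigned long pfn)
	{
		unsigned topidx = p2m_top_index(pfn);
		unsigned mididx = p2m_mid_index(pfn);

		/* Levels that are already shared identity pages
		 * cover this pfn by construction. */
		if (p2m_top[topidx] == p2m_mid_identity)
			return true;
		if (p2m_top[topidx][mididx] == p2m_identity)
			return true;

		/* Never flip a whole missing level to identity here;
		 * let the caller allocate a real leaf instead. */
		if (p2m_top[topidx][mididx] == p2m_missing)
			return false;

		p2m_top[topidx][mididx][p2m_index(pfn)] = IDENTITY_FRAME(pfn);
		return true;
	}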

> > 
> > Perhaps ranges of identity bits are often well aligned with the
> > boundaries in the p2m 3-level tree but wouldn't that just be
> > coincidence?
> 
> I thought so, but for all the machines I've run this on, those boundaries
> are there.

I really don't think we should rely on that, it's very likely just a
coincidence...

> > Don't we need to completely populate the tree at this point and setup
> > the leaf nodes as appropriate? Which we can't do since this is
> 
> Not exactly. At this point (xen/setup, parsing the E820), the P2M has
> been populated up to nr_pages. So, say we give 2G to the guest; that is,
> p2m[0] and p2m[1] point to two pages "created" by extend_brk. The
> contents of those pages point into mfn_list[..]. The rest of the p2m
> (so, p2m[2]..p2m[511]) points to p2m_mid_missing.
> 
> When we start the identity mapping, we end up (for example) with 3GB to
> 4GB as identity mapping, so p2m[3] is now p2m_mid_identity. p2m[4] and
> later are still p2m_mid_missing.
> 
> Though I've just thought of a nasty condition. Say those PCI regions
> start at 2.5GB, and the user provided dom0_mem=max:3GB. Oh, and actually
> that still works. The p2m[1] will be allocated by extend_brk, and we
> just end up sticking p2m[1][473]..p2m[1][511] with the PFN |
> IDENTITY_FRAME_BIT, which is OK as those contents used to be
> INVALID_P2M_ENTRY (the page would have been created by extend_brk in
> xen_build_dynamic_phys_to_machine and filled with the ~0 value).

> > __set_phys_to_machine, so we need to fail and let set_phys_to_machine do
> > its thing? Or maybe this whole bit needs to be hoisted up to
> > set_phys_to_machine?
> 
> It can, but set_phys_to_machine would have to use extend_brk.
> 
> The danger I am seeing is if the non-RAM regions (and gaps) in the E820
> are not 1GB (or 512MB under 32-bit) boundary aligned (2.36GB?) _and_
> dom0_mem is employed so as to make the top-level P2M reach right up to
> the non-boundary-aligned E820 void (so, dom0_mem=max:2GB for example).
> That would create a proper p2m[0] and p2m[1], but p2m[3] would point to
> p2m_mid_missing, and when we try to make p2m[3][231] identity (so the
> ~2.36GB area) we would discover it is p2m_missing and would over-write
> it with p2m_identity .. and that would be OK as we cannot use the
> balloon driver (max:2GB). However, if we used 'dom0_mem=max:2364MB,1GB'
> I think we would hit a failure when the balloon driver tried to balloon
> up, ending up hitting this BUG: phys_to_machine_mapping_valid(pfn) (as
> it should have returned INVALID_P2M_ENTRY but instead returns the MFN).

Yes, this is exactly the problem case I was thinking of.

> A simple solution could be to make 'xen_build_dynamic_phys_to_machine'
> cover the E820 regions between "System RAM" entries in case nr_pages
> does not extend that far.

I think the limit used by xen_build_dynamic_phys_to_machine should
likely be the last page of the last entry in the e820 even if the e820
ends with RAM, hole, IO memory (or the RAM has been clamped such that it
effectively looks like that). 

I guess it would be possible/wise to leave gaps in the case that the
hole is massive, i.e. only populate the RAM + IO memory bits.
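
A sketch of that limit computation (field names are per the kernel's
struct e820map; purely illustrative):

	/* end of the last e820 entry, rounded up to a pfn */
	struct e820entry *last = &e820.map[e820.nr_map - 1];
	unsigned long p2m_limit = PFN_UP(last->addr + last->size);

	if (p2m_limit > xen_max_p2m_pfn)
		xen_max_p2m_pfn = p2m_limit;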

Ian.



* Re: [PATCH 6/8] xen/debug: WARN_ON when 1-1 but no _PAGE_IOMAP flag set.
  2011-01-04 19:20       ` Ian Campbell
@ 2011-01-06 19:50         ` Stefano Stabellini
  2011-01-06 20:17           ` Keir Fraser
  0 siblings, 1 reply; 41+ messages in thread
From: Stefano Stabellini @ 2011-01-06 19:50 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Konrad Rzeszutek Wilk, linux-kernel, Jeremy Fitzhardinge, hpa,
	Jan Beulich, xen-devel, Konrad Rzeszutek Wilk,
	Stefano Stabellini, Keir Fraser

On Tue, 4 Jan 2011, Ian Campbell wrote:
> On Tue, 2011-01-04 at 18:46 +0000, Konrad Rzeszutek Wilk wrote: 
> > On Tue, Jan 04, 2011 at 05:24:38PM +0000, Ian Campbell wrote:
> > > On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> > > > Only enabled if XEN_DEBUG_FS is enabled.
> > > 
> > > Bit of an odd configuration option to use. Perhaps co-opt
> > > CONFIG_PARAVIRT_DEBUG instead?
> > > 
> > > Or maybe we need a new XEN_DEBUG option? or just make it a developer
> > > only EXTRA_CFLAGS +=-DDEBUG thing?
> > 
> > I think the 'XEN_DEBUG' works better.
> > > 
> > > Is this only temporary until the need for _PAGE_IOMAP is removed anyway?
> > 
> > I was thinking to leave it as a way to weed out bugs, but I could just
> > as well leave it in my "debug" branch and not propose it upstream.
> > 
> > I am not sure how to remove _PAGE_IOMAP fully. We need some _flag_ to
> > signal 'xen_pte_val' that the PTE should not be looked up in the M2P.
> > 
> > Otherwise, for identity mappings, the value is 0xfffff.. (or 0x55555.. sometimes)
> > and the PTE ends up being messed up. Instead there is a check to see if
> > _PAGE_IOMAP is part of the PTE, and if so, no M2P lookup is done.
> > 
> > I guess we could do the M2P lookup regardless and just see if it is
> > 0xfffff... and if so just return the PTE without any changes.
> 
> Perhaps this ties in with the m2p overlay which Stefano+Jeremy have been
> working on to deal with granted foreign pages? I/O pages are a bit like
> foreign memory (if you squint enough)...
 
In theory the m2p overlay could be used for this purpose, but in practice
the current m2p overlay API needs a struct page; it might also end up
stressing the hashtable too much.

Besides, I think Konrad's solution might be simpler: if the m2p returns
one of the two special values we just return the mfn from pte_mfn_to_pfn.
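
Roughly like this, as a sketch of the idea rather than a final patch
(assuming the m2p lookup comes back with ~0 for the missing-entry case):

    static pteval_t pte_mfn_to_pfn(pteval_t val)
    {
            if (val & _PAGE_PRESENT) {
                    unsigned long mfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
                    pteval_t flags = val & PTE_FLAGS_MASK;
                    unsigned long pfn = mfn_to_pfn(mfn);

                    /* No m2p entry (or the DOM_IO special value): not RAM,
                     * so leave the frame alone, i.e. treat it as 1-1. */
                    if (unlikely(pfn == ~0UL))
                            pfn = mfn;

                    val = ((pteval_t)pfn << PAGE_SHIFT) | flags;
            }
            return val;
    }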

Keir, could you confirm that the m2p entries of DOM_IO pages are always
0xffffff or 0x55555?


* Re: [PATCH 6/8] xen/debug: WARN_ON when 1-1 but no _PAGE_IOMAP flag set.
  2011-01-06 19:50         ` Stefano Stabellini
@ 2011-01-06 20:17           ` Keir Fraser
  2011-01-06 21:59             ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 41+ messages in thread
From: Keir Fraser @ 2011-01-06 20:17 UTC (permalink / raw)
  To: Stefano Stabellini, Ian Campbell
  Cc: Konrad Rzeszutek Wilk, linux-kernel, Jeremy Fitzhardinge, hpa,
	Jan Beulich, xen-devel, Konrad Rzeszutek Wilk

On 06/01/2011 19:50, "Stefano Stabellini" <stefano.stabellini@eu.citrix.com>
wrote:

>> Perhaps this ties in with the m2p overlay which Stefano+Jeremy have been
>> working on to deal with granted foreign pages? I/O pages are a bit like
>> foreign memory (if you squint enough)...
>  
> In theory the m2p overlay could be used for this purpose, but in practice
> the current m2p overlay API needs a struct page; it might also end up
> stressing the hashtable too much.
> 
> Besides, I think Konrad's solution might be simpler: if the m2p returns
> one of the two special values we just return the mfn from pte_mfn_to_pfn.
> 
> Keir, could you confirm that the m2p entries of DOM_IO pages are always
> 0xffffff or 0x55555?

Always 0x55...55 (for m2p entries that exist), else page fault on access to
the non-existent m2p entry (m2p entries only guaranteed to exist for ram).
Perhaps the 0xff...ff values come from Linux's own fixup code handling a
faulting read access of the m2p array? If so you could return 0x55...55
instead and avoid checking for 0xff...ff. I really don't know how you could
get 0xff...ff for non-RAM pages from Xen itself.
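
The fixup I mean is the exception-table path around the m2p read; a sketch of
the pattern (m2p_lookup() is a hypothetical name for it):

    /* Read the m2p with a fault fixup: __get_user() survives the page
     * fault on a non-existent entry and we substitute ~0UL, which is
     * where the 0xff...ff values would come from. */
    static inline unsigned long m2p_lookup(unsigned long mfn)
    {
            unsigned long pfn;

            if (__get_user(pfn, &machine_to_phys_mapping[mfn]))
                    pfn = ~0UL;

            return pfn;
    }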

 -- Keir




* Re: [PATCH 6/8] xen/debug: WARN_ON when 1-1 but no _PAGE_IOMAP flag set.
  2011-01-06 20:17           ` Keir Fraser
@ 2011-01-06 21:59             ` Konrad Rzeszutek Wilk
  2011-01-06 22:17               ` Keir Fraser
  0 siblings, 1 reply; 41+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-06 21:59 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Stefano Stabellini, Ian Campbell, linux-kernel,
	Jeremy Fitzhardinge, hpa, Jan Beulich, xen-devel,
	Konrad Rzeszutek Wilk

On Thu, Jan 06, 2011 at 08:17:38PM +0000, Keir Fraser wrote:
> On 06/01/2011 19:50, "Stefano Stabellini" <stefano.stabellini@eu.citrix.com>
> wrote:
> 
> >> Perhaps this ties in with the m2p overlay which Stefano+Jeremy have been
> >> working on to deal with granted foreign pages? I/O pages are a bit like
> >> foreign memory (if you squint enough)...
> >  
> > In theory the m2p overlay could be used for this purpose, but in practice
> > the current m2p overlay API needs a struct page; it might also end up
> > stressing the hashtable too much.
> > 
> > Besides, I think Konrad's solution might be simpler: if the m2p returns
> > one of the two special values we just return the mfn from pte_mfn_to_pfn.
> > 
> > Keir, could you confirm that the m2p entries of DOM_IO pages are always
> > 0xffffff or 0x55555?
> 
> Always 0x55...55 (for m2p entries that exist), else page fault on access to
> the non-existent m2p entry (m2p entries only guaranteed to exist for ram).
> Perhaps the 0xff...ff values come from Linux's own fixup code handling a
> faulting read access of the m2p array? If so you could return 0x55...55
> instead and avoid checking for 0xff...ff. I really don't know how you could
> get 0xff...ff for non-RAM pages from Xen itself.

The non-RAM pages are assigned to DOMID_IO (in arch_init_memory()); for example:

    /* First 1MB of RAM is historically marked as I/O. */
    for ( i = 0; i < 0x100; i++ )
        share_xen_page_with_guest(mfn_to_page(i), dom_io, XENSHARE_writable);

and share_xen_page_with_guest() sets that page's m2p entry to INVALID_M2P_ENTRY.

But I could also be reading the code wrongly?
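
A sketch of the relevant part, as I read xen/arch/x86/mm.c (heavily abridged):

    void share_xen_page_with_guest(struct page_info *page,
                                   struct domain *d, int readonly)
    {
            /* ... ownership/type-count setup elided ... */

            /* The m2p entry of the shared frame is explicitly invalidated,
             * which is where the INVALID_M2P_ENTRY comes from. */
            set_gpfn_from_mfn(page_to_mfn(page), INVALID_M2P_ENTRY);

            /* ... */
    }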

> 
>  -- Keir
> 


* Re: [PATCH 6/8] xen/debug: WARN_ON when 1-1 but no _PAGE_IOMAP flag set.
  2011-01-06 21:59             ` Konrad Rzeszutek Wilk
@ 2011-01-06 22:17               ` Keir Fraser
  0 siblings, 0 replies; 41+ messages in thread
From: Keir Fraser @ 2011-01-06 22:17 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Ian Campbell, linux-kernel,
	Jeremy Fitzhardinge, hpa, Jan Beulich, xen-devel,
	Konrad Rzeszutek Wilk

On 06/01/2011 21:59, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com> wrote:

>> Always 0x55...55 (for m2p entries that exist), else page fault on access to
>> the non-existent m2p entry (m2p entries only guaranteed to exist for ram).
>> Perhaps the 0xff...ff values come from Linux's own fixup code handling a
>> faulting read access of the m2p array? If so you could return 0x55...55
>> instead and avoid checking for 0xff...ff. I really don't know how you could
>> get 0xff...ff for non-RAM pages from Xen itself.
> 
> The non-RAM pages are assigned to DOMID_IO (in arch_init_memory()); for example:
> 
>     /* First 1MB of RAM is historically marked as I/O. */
>     for ( i = 0; i < 0x100; i++ )
>         share_xen_page_with_guest(mfn_to_page(i), dom_io, XENSHARE_writable);
> 
> and share_xen_page_with_guest() sets that page's m2p entry to INVALID_M2P_ENTRY.
> 
> But I could also be reading the code wrongly?

You're right, I missed that.

 -- Keir




end of thread [~2011-01-06 22:17 UTC]

Thread overview: 41+ messages
2010-12-30 19:48 [PATCH RFC v2] Consider E820 non-RAM and E820 gaps as 1-1 mappings Konrad Rzeszutek Wilk
2010-12-30 19:48 ` Konrad Rzeszutek Wilk
2010-12-30 19:48 ` [PATCH 1/8] xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY Konrad Rzeszutek Wilk
2010-12-30 19:48   ` Konrad Rzeszutek Wilk
2011-01-04 16:34   ` Ian Campbell
2011-01-04 16:34     ` Ian Campbell
2011-01-04 16:45     ` Konrad Rzeszutek Wilk
2010-12-30 19:48 ` [PATCH 2/8] xen/mmu: Add the notion of identity (1-1) mapping Konrad Rzeszutek Wilk
2010-12-30 19:48   ` Konrad Rzeszutek Wilk
2011-01-04 16:53   ` Ian Campbell
2011-01-04 16:59     ` Ian Campbell
2011-01-04 17:20       ` [Xen-devel] " Ian Campbell
2011-01-04 19:24     ` Konrad Rzeszutek Wilk
2011-01-05 14:03       ` Ian Campbell
2010-12-30 19:48 ` [PATCH 3/8] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps Konrad Rzeszutek Wilk
2010-12-30 19:48   ` Konrad Rzeszutek Wilk
2011-01-04 17:18   ` Ian Campbell
2011-01-04 17:18     ` Ian Campbell
2011-01-04 18:38     ` Konrad Rzeszutek Wilk
2011-01-04 19:27       ` Ian Campbell
2011-01-04 21:28         ` Konrad Rzeszutek Wilk
2011-01-04 21:28           ` Konrad Rzeszutek Wilk
2010-12-30 19:48 ` [PATCH 4/8] xen/mmu: Warn against races Konrad Rzeszutek Wilk
2010-12-30 19:48   ` Konrad Rzeszutek Wilk
2010-12-30 19:48 ` [PATCH 5/8] xen/debug: Print out all pages in the P2M Konrad Rzeszutek Wilk
2010-12-30 19:48   ` Konrad Rzeszutek Wilk
2010-12-30 19:48 ` [PATCH 6/8] xen/debug: WARN_ON when 1-1 but no _PAGE_IOMAP flag set Konrad Rzeszutek Wilk
2010-12-30 19:48   ` Konrad Rzeszutek Wilk
2011-01-04 17:24   ` Ian Campbell
2011-01-04 18:46     ` Konrad Rzeszutek Wilk
2011-01-04 19:20       ` Ian Campbell
2011-01-06 19:50         ` Stefano Stabellini
2011-01-06 20:17           ` Keir Fraser
2011-01-06 21:59             ` Konrad Rzeszutek Wilk
2011-01-06 22:17               ` Keir Fraser
2010-12-30 19:48 ` [PATCH 7/8] xen/mmu: Introduce IDENTITY_FRAME_BIT Konrad Rzeszutek Wilk
2010-12-30 19:48   ` Konrad Rzeszutek Wilk
2011-01-04 16:26   ` Ian Campbell
2011-01-04 16:45     ` Konrad Rzeszutek Wilk
2010-12-30 19:48 ` [PATCH 8/8] xen/mmu: Set _PAGE_IOMAP if PFN is in identity P2M Konrad Rzeszutek Wilk
2010-12-30 19:48   ` Konrad Rzeszutek Wilk
