* [PATCH v4] Consider E820 non-RAM and E820 gaps as 1-1 mappings.
@ 2011-01-31 22:44 Konrad Rzeszutek Wilk
  2011-01-31 22:44 ` [PATCH 01/11] xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY Konrad Rzeszutek Wilk
                   ` (10 more replies)
  0 siblings, 11 replies; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-31 22:44 UTC (permalink / raw)
  To: linux-kernel, Xen-devel, konrad, jeremy
  Cc: hpa, stefano.stabellini, Ian.Campbell

I am proposing this for 2.6.39.

This series augments how the Xen MMU deals with PFNs that point to physical
devices (PCI BARs and such).

The reason for this: there is no longer any need to trawl through code adding
VM_IO on mmap paths.

Long summary:
Under the Xen MMU we used to distinguish three different types of PFN in
the P2M tree: real MFNs, INVALID_P2M_ENTRY (missing PFN, used for ballooning),
and foreign MFNs (pages granted in from another guest).

If there was a device whose PCI BAR fell within the P2M, we would look
at the PTE flags, and if _PAGE_IOMAP was set we would just return
the PFN without consulting the P2M. We have a patch
(and some auxiliary ones for other subsystems) that sets this flag:
 x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas

This patchset proposes a different way of doing this, with which the patch
above and the other auxiliary ones are no longer necessary.

This approach is the one that H. Peter Anvin, Jeremy Fitzhardinge, and Ian Campbell
suggested. The mechanism is to treat the E820 non-RAM entries and E820 gaps
in the P2M tree structure as identity (1-1) mappings. Many thanks to Ian Campbell
and Stefano Stabellini for looking in detail at the patches and asking quite
difficult questions.

In the past we used to think of those regions as "missing" and under the ownership
of the balloon code. But the balloon code only operates on a specific region: the
end of the last E820 RAM region (basically any page past nr_pages is considered a
balloon-type page). [Honesty compels me to say that during run-time the balloon code
could own pages in different regions, but we do not have to worry about that as that
works OK and we only have to worry about the bootup case.]

Gaps in the E820 (which usually turn out to be PCI BAR space) would end up
with void entries pointing to the "missing" pages.

This patchset finds the ranges of non-RAM E820 entries and gaps and
marks them as "identity". So, for example, for this E820:

                    1GB                                           2GB
 /-------------------+---------\/----\         /----------\    /---+-----\
 | System RAM        | Sys RAM ||ACPI|         | reserved |    | Sys RAM |
 \-------------------+---------/\----/         \----------/    \---+-----/
                               ^- 1029MB                       ^- 2001MB

The identity range would be from 1029MB to 2001MB.

Since the E820 gaps can cross P2M level boundaries (keep in mind that the
P2M structure is a 3-level tree: the first level covers 1GB, the next one down
4MB, and then each individual page), we might have to allocate extra pages to
handle those violators. For large regions (1GB) we create a
page which holds pointers to a shared "p2m_identity" page. For smaller regions,
if necessary, we create pages wherein we can mark PFNs as 1-1 mappings, so that:
 pfn_to_mfn(pfn)==pfn.
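
To make that convention concrete, here is a minimal user-space sketch (not the
in-kernel code; the lookup table below is entirely made up) of how the three
kinds of P2M entries behave under pfn_to_mfn():

 #include <stdio.h>

 #define INVALID_P2M_ENTRY   (~0UL)
 #define FOREIGN_FRAME_BIT   (1UL << 31)
 #define IDENTITY_FRAME_BIT  (1UL << 30)
 #define IDENTITY_FRAME(m)   ((m) | IDENTITY_FRAME_BIT)

 /* Stand-in lookup: a few RAM PFNs with arbitrary MFNs, an identity-mapped
  * non-RAM hole at [0xc0000, 0x100000), everything else "missing". */
 static unsigned long get_phys_to_machine(unsigned long pfn)
 {
         if (pfn < 0x100)
                 return 0x290F - pfn;            /* made-up real MFN */
         if (pfn >= 0xc0000 && pfn < 0x100000)
                 return IDENTITY_FRAME(pfn);     /* 1-1 mapped */
         return INVALID_P2M_ENTRY;               /* missing */
 }

 static unsigned long pfn_to_mfn(unsigned long pfn)
 {
         unsigned long mfn = get_phys_to_machine(pfn);

         if (mfn != INVALID_P2M_ENTRY)           /* strip the tag bits */
                 mfn &= ~(FOREIGN_FRAME_BIT | IDENTITY_FRAME_BIT);
         return mfn;
 }

 int main(void)
 {
         printf("RAM:      %#lx\n", pfn_to_mfn(2));       /* 0x290d  */
         printf("identity: %#lx\n", pfn_to_mfn(0xc0000)); /* 0xc0000 */
         printf("missing:  %#lx\n", pfn_to_mfn(0x20000)); /* ~0      */
         return 0;
 }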

The two attached diagrams crudely explain how we are doing this. "P2M story"
(https://docs.google.com/drawings/edit?id=1LQy0np2GCkFtspoucgs5Y_TILI8eceY6rikXwtPqlTI&hl=en&authkey=CO_yv_cC)
is how the P2M is constructed and setup with balloon pages. The "P2M with 1-1.."
(https://docs.google.com/drawings/edit?id=1smqIRPYq2mSxmvpabuk_Ap6bQAW_aaZnOnjzqlcmxKc&hl=en&authkey=CI2iwKcE)
is how we insert the identity mappings in the P2M tree.

Also, the first patch "xen/mmu: Add the notion of identity (1-1) mapping."
has an exhaustive explanation.

For the balloon pages, the setting of the "missing" pages is mostly already done.
The initial case of carving the last E820 region for balloon ownership is augmented
to set those PFNs to missing and we also change the balloon code to be more
aggressive.

This patchset is also available under git:
 git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git devel/p2m-identity.v4.7

The diffstat:

 arch/x86/include/asm/xen/page.h |   24 ++++-
 arch/x86/xen/Kconfig            |    8 ++
 arch/x86/xen/mmu.c              |   72 +++++++++++++-
 arch/x86/xen/p2m.c              |  202 +++++++++++++++++++++++++++++++++++++--
 arch/x86/xen/setup.c            |  107 ++++++++++++++++++++-
 drivers/xen/balloon.c           |    2 +-
 6 files changed, 400 insertions(+), 15 deletions(-)

And the shortlog:
Konrad Rzeszutek Wilk (10):
      xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY.
      xen/mmu: Add the notion of identity (1-1) mapping.
      xen/mmu: Set _PAGE_IOMAP if PFN is an identity PFN.
      xen/mmu: BUG_ON when racing to swap middle leaf.
      xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.
      xen/setup: Skip over 1st gap after System RAM.
      x86/setup: Consult the raw E820 for zero sized E820 RAM regions.
      xen/debugfs: Add 'p2m' file for printing out the P2M layout.
      xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set.
      xen/m2p: No need to catch exceptions when we know that there is no RAM

Stefano Stabellini (1):
      xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set..


----
Changelog: [since v4, not posted]
 - Fixed corner-case bugs on machines with swiss-cheese-type E820 regions.

[since v3, not posted]
 - Made the passing of identity PFNs much simpler and cleaner.
 - Expanded the commit description.

[since v2 https://lkml.org/lkml/2010/12/30/163]
 - Added Reviewed-by.
 - Squashed some patches together..
 - Replaced p2m_mid_identity with using reserved_brk to allocate top
   identity entries. This protects us from non 1GB boundary conditions.
 - Expanded the commit descriptions.

[since v1 https://lkml.org/lkml/2010/12/21/255]:
 - Diagrams of P2M included.
 - More conservative approach used (memory that is not populated or 
   identity is considered "missing", instead of as identity).
 - Added IDENTITY_PAGE_FRAME bit to uniquely identify 1-1 mappings.
 - Optional debugfs file (xen/mmu/p2m) to print out the level and types in
   the P2M tree.
 - Lots of comments - if I missed any please prompt me.

P.S.
Along with the stable/ttm.pci-api.v4, I've been able to boot Dom0 on a variety
of PCIe type graphics hardware with X working (G31M, ATI ES1000, GeForce 6150SE,
HD 4350 Radeon, HD 3200 Radeon, GeForce 8600 GT). That test branch is located
at #master if you are curious.



* [PATCH 01/11] xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY.
  2011-01-31 22:44 [PATCH v4] Consider E820 non-RAM and E820 gaps as 1-1 mappings Konrad Rzeszutek Wilk
@ 2011-01-31 22:44 ` Konrad Rzeszutek Wilk
  2011-01-31 22:44 ` [PATCH 02/11] xen/mmu: Add the notion of identity (1-1) mapping Konrad Rzeszutek Wilk
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-31 22:44 UTC (permalink / raw)
  To: linux-kernel, Xen-devel, konrad, jeremy
  Cc: hpa, stefano.stabellini, Ian.Campbell, Konrad Rzeszutek Wilk

With this patch, we diligently set the regions that will be used by the
balloon driver to INVALID_P2M_ENTRY and place them under the ownership
of the balloon driver. We are OK using __set_phys_to_machine
as we do not expect to be allocating any P2M middle or entry pages.
set_phys_to_machine has the side effect of potentially allocating
new pages and we do not want that at this stage.

We can do this because xen_build_mfn_list_list will have already
allocated all such pages up to xen_max_p2m_pfn.

We also move the check for auto translated physmap down the
stack so it is present in __set_phys_to_machine.
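
As a rough sketch of the resulting split (simplified, not a verbatim quote of
the code below): __set_phys_to_machine() is a plain store that fails if the
leaf page is not there yet, while set_phys_to_machine() allocates the missing
levels and retries:

 bool set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 {
         if (unlikely(!__set_phys_to_machine(pfn, mfn))) {
                 /* Leaf (or middle) page missing: allocate it ... */
                 if (!alloc_p2m(pfn))
                         return false;
                 /* ... and retry the plain store. */
                 if (!__set_phys_to_machine(pfn, mfn))
                         return false;
         }
         return true;
 }

At early boot we can call the former directly because xen_build_mfn_list_list
has already built the tree up to xen_max_p2m_pfn.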

[v2: Rebased with mmu->p2m code split]
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/xen/page.h |    1 +
 arch/x86/xen/mmu.c              |    2 +-
 arch/x86/xen/p2m.c              |    9 ++++-----
 arch/x86/xen/setup.c            |    7 ++++++-
 drivers/xen/balloon.c           |    2 +-
 5 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index f25bdf2..8ea9772 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -41,6 +41,7 @@ extern unsigned int   machine_to_phys_order;
 
 extern unsigned long get_phys_to_machine(unsigned long pfn);
 extern bool set_phys_to_machine(unsigned long pfn, unsigned long mfn);
+extern bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn);
 
 extern int m2p_add_override(unsigned long mfn, struct page *page);
 extern int m2p_remove_override(struct page *page);
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 5e92b61..0180ae8 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2074,7 +2074,7 @@ static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order,
 			in_frames[i] = virt_to_mfn(vaddr);
 
 		MULTI_update_va_mapping(mcs.mc, vaddr, VOID_PTE, 0);
-		set_phys_to_machine(virt_to_pfn(vaddr), INVALID_P2M_ENTRY);
+		__set_phys_to_machine(virt_to_pfn(vaddr), INVALID_P2M_ENTRY);
 
 		if (out_frames)
 			out_frames[i] = virt_to_pfn(vaddr);
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index ddc81a0..df4e367 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -365,6 +365,10 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 {
 	unsigned topidx, mididx, idx;
 
+	if (unlikely(xen_feature(XENFEAT_auto_translated_physmap))) {
+		BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
+		return true;
+	}
 	if (unlikely(pfn >= MAX_P2M_PFN)) {
 		BUG_ON(mfn != INVALID_P2M_ENTRY);
 		return true;
@@ -384,11 +388,6 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 
 bool set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 {
-	if (unlikely(xen_feature(XENFEAT_auto_translated_physmap))) {
-		BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
-		return true;
-	}
-
 	if (unlikely(!__set_phys_to_machine(pfn, mfn)))  {
 		if (!alloc_p2m(pfn))
 			return false;
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index b5a7f92..7201800 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -52,6 +52,8 @@ phys_addr_t xen_extra_mem_start, xen_extra_mem_size;
 
 static __init void xen_add_extra_mem(unsigned long pages)
 {
+	unsigned long pfn;
+
 	u64 size = (u64)pages * PAGE_SIZE;
 	u64 extra_start = xen_extra_mem_start + xen_extra_mem_size;
 
@@ -66,6 +68,9 @@ static __init void xen_add_extra_mem(unsigned long pages)
 	xen_extra_mem_size += size;
 
 	xen_max_p2m_pfn = PFN_DOWN(extra_start + size);
+
+	for (pfn = PFN_DOWN(extra_start); pfn <= xen_max_p2m_pfn; pfn++)
+		__set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
 }
 
 static unsigned long __init xen_release_chunk(phys_addr_t start_addr,
@@ -104,7 +109,7 @@ static unsigned long __init xen_release_chunk(phys_addr_t start_addr,
 		WARN(ret != 1, "Failed to release memory %lx-%lx err=%d\n",
 		     start, end, ret);
 		if (ret == 1) {
-			set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
+			__set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
 			len++;
 		}
 	}
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 43f9f02..b1661cd 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -296,7 +296,7 @@ static int decrease_reservation(unsigned long nr_pages)
 	/* No more mappings: invalidate P2M and add to balloon. */
 	for (i = 0; i < nr_pages; i++) {
 		pfn = mfn_to_pfn(frame_list[i]);
-		set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
+		__set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
 		balloon_append(pfn_to_page(pfn));
 	}
 
-- 
1.7.1



* [PATCH 02/11] xen/mmu: Add the notion of identity (1-1) mapping.
  2011-01-31 22:44 [PATCH v4] Consider E820 non-RAM and E820 gaps as 1-1 mappings Konrad Rzeszutek Wilk
  2011-01-31 22:44 ` [PATCH 01/11] xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY Konrad Rzeszutek Wilk
@ 2011-01-31 22:44 ` Konrad Rzeszutek Wilk
  2011-02-01 21:33   ` Jeremy Fitzhardinge
  2011-01-31 22:44 ` [PATCH 03/11] xen/mmu: Set _PAGE_IOMAP if PFN is an identity PFN Konrad Rzeszutek Wilk
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-31 22:44 UTC (permalink / raw)
  To: linux-kernel, Xen-devel, konrad, jeremy
  Cc: hpa, stefano.stabellini, Ian.Campbell, Konrad Rzeszutek Wilk

Our P2M tree structure is a three-level tree. In the leaf nodes
we store the Machine Frame Number (MFN) of the PFN. What this means
is that when one does pfn_to_mfn(pfn), which is used when creating
PTE entries, you get the real MFN of the hardware. When Xen sets
up a guest it initially populates an array which has descending
(or ascending) MFN values, like so:

 idx: 0,      1,      2
 [0x290F, 0x290E, 0x290D, ..]

so pfn_to_mfn(2)==0x290D. If you start and restart many guests, that list
starts looking quite random.

We graft this structure onto our P2M tree structure and stick those
MFNs in the leaves. But for all other leaf entries, or for a top
or middle entry which is void, we assume it is
"missing". So
 pfn_to_mfn(0xc0000)=INVALID_P2M_ENTRY.

We add the possibility of setting 1-1 mappings on certain regions, so
that:
 pfn_to_mfn(0xc0000)=0xc0000

The benefit of this is that for non-RAM regions (think
PCI BARs, or ACPI space) we can create mappings easily because we
get the PFN value to match the MFN.

For this to work efficiently we introduce one new page, p2m_identity, and
allocate (via reserve_brk) any other pages we need to cover the edges
(1GB or 4MB boundary violations). All entries in p2m_identity are set to
INVALID_P2M_ENTRY (the Xen toolstack only recognizes that and real MFNs,
no other fancy values).

On lookup we spot that the entry points to p2m_identity and return the identity
value instead of dereferencing and returning INVALID_P2M_ENTRY. If the entry
points to an allocated page, we just proceed as before and return the PFN.
If the PFN has IDENTITY_FRAME_BIT set we unmask that in appropriate functions
(pfn_to_mfn).

The reason for having the IDENTITY_FRAME_BIT instead of just returning the
PFN is that we could find ourselves in a situation where pfn_to_mfn(pfn)==pfn
for a non-identity pfn. To protect ourselves against that we elect to set (and
get) the IDENTITY_FRAME_BIT on all identity-mapped PFNs.
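
A tiny illustration of the ambiguity (the numbers here are made up):

 #include <stdio.h>

 #define IDENTITY_FRAME_BIT  (1UL << 30)
 #define IDENTITY_FRAME(m)   ((m) | IDENTITY_FRAME_BIT)

 int main(void)
 {
         unsigned long coincidence = 0x5000;              /* RAM whose MFN == PFN */
         unsigned long identity = IDENTITY_FRAME(0x5000); /* explicitly 1-1 */

         /* 0x5000 vs 0x40005000: only the tagged entry is identity. */
         printf("%#lx vs %#lx -> bit set: %d vs %d\n",
                coincidence, identity,
                !!(coincidence & IDENTITY_FRAME_BIT),
                !!(identity & IDENTITY_FRAME_BIT));
         return 0;
 }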

This simplistic diagram is used to explain the more subtle pieces of code.
There is also a diagram of the P2M at the end that can help.
Imagine your E820 looking like so:

                   1GB                                           2GB
/-------------------+---------\/----\         /----------\    /---+-----\
| System RAM        | Sys RAM ||ACPI|         | reserved |    | Sys RAM |
\-------------------+---------/\----/         \----------/    \---+-----/
                              ^- 1029MB                       ^- 2001MB

[1029MB = 263424 (0x40500), 2001MB = 512256 (0x7D100), 2048MB = 524288 (0x80000)]

And dom0_mem=max:3GB,1GB is passed in to the guest, meaning memory past 1GB
is actually not present (would have to kick the balloon driver to put it in).

When we are told to set the PFNs for identity mapping (see patch: "xen/setup:
Set identity mapping for non-RAM E820 and E820 gaps.") we pass in the start
PFN and the end PFN (263424 and 512256 respectively). The first step is
to reserve_brk a top leaf page if p2m[1] is missing. The top leaf page
covers 512^2 of page estate (1GB), and in case the start or end PFN is not
aligned on 512^2*PAGE_SIZE (1GB) we loop over the 1GB-aligned PFNs from the
start PFN to the end PFN. We reserve_brk top leaf pages if they are missing
(meaning they point to p2m_mid_missing).

With the E820 example above, 263424 is not 1GB aligned so we allocate a
reserve_brk page which will cover the PFN estate from 0x40000 to 0x80000.
Each entry in the allocated page is "missing" (points to p2m_missing).

The next stage is to determine if we need to do a more granular boundary check
on the 4MB (or 2MB, depending on architecture) boundary of the start and end PFNs.
We check if the start PFN and end PFN violate that boundary check, and if
so reserve_brk a middle (p2m[x][y]) leaf page. This way we have a much finer
granularity for setting which PFNs are missing and which ones are identity.
In our example 263424 and 512256 both fail the check, so we reserve_brk two
pages, populate them with INVALID_P2M_ENTRY (so they both have "missing" values),
and assign them to p2m[1][2] and p2m[1][488] respectively.
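
For the curious, the index arithmetic behind those numbers can be checked with
a few lines of user-space C (this assumes the 64-bit layout with 512 entries
per page; on 32-bit the per-page count differs):

 #include <stdio.h>

 #define P2M_PER_PAGE      512UL   /* 64-bit: PAGE_SIZE / sizeof(unsigned long) */
 #define P2M_MID_PER_PAGE  512UL

 static void decompose(unsigned long pfn)
 {
         unsigned long top = pfn / (P2M_MID_PER_PAGE * P2M_PER_PAGE);
         unsigned long mid = (pfn / P2M_PER_PAGE) % P2M_MID_PER_PAGE;
         unsigned long idx = pfn % P2M_PER_PAGE;

         printf("pfn %lu (%#lx) -> p2m[%lu][%lu][%lu]\n",
                pfn, pfn, top, mid, idx);
 }

 int main(void)
 {
         decompose(263424);      /* 1029MB -> p2m[1][2][256]   */
         decompose(512256);      /* 2001MB -> p2m[1][488][256] */
         return 0;
 }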

At this point we would have reserve_brk'ed at minimum one page, but it could
be up to three. Each call to set_phys_range_identity has at maximum a three-page
cost. If we were to query the P2M at this stage, all the entries from the
start PFN through the end PFN (so 1029MB -> 2001MB) would return INVALID_P2M_ENTRY
("missing").

The next step is to walk from the start PFN to the end PFN setting
the IDENTITY_FRAME_BIT on each PFN. This is done in __set_phys_to_machine.
If we find that the middle leaf is pointing to p2m_missing we can swap it over
to p2m_identity - this way covering 4MB (or 2MB) of PFN space. At this point we
do not need to worry about boundary alignment (so no need to reserve_brk a middle
page and figure out which PFNs are "missing" and which ones are identity), as that
has been done earlier. If we find that the middle leaf is not occupied by
p2m_identity or p2m_missing, we dereference that page (which covers
512 PFNs) and set the appropriate PFNs with IDENTITY_FRAME_BIT. In our example
263424 and 512256 end up there, and we set p2m[1][2][256->511] and
p2m[1][488][0->256] with IDENTITY_FRAME_BIT set.

All other regions that are void (or not filled) either point to p2m_missing
(considered missing) or have the default value of INVALID_P2M_ENTRY (also
considered missing). In our case, p2m[1][2][0->255] and p2m[1][488][257->511]
contain the INVALID_P2M_ENTRY value and are considered "missing."

This is what the p2m ends up looking (for the E820 above) with this
fabulous drawing:

   p2m         /--------------\
 /-----\       | &mfn_list[0],|                           /-----------------\
 |  0  |------>| &mfn_list[1],|    /---------------\      | ~0, ~0, ..      |
 |-----|       |  ..., ~0, ~0 |    | ~0, ~0, [x]---+----->| IDENTITY [@256] |
 |  1  |---\   \--------------/    | [p2m_identity]+\     | IDENTITY [@257] |
 |-----|    \                      | [p2m_identity]+\\    | ....            |
 |  2  |--\  \-------------------->|  ...          | \\   \----------------/
 |-----|   \                       \---------------/  \\
 |  3  |\   \                                          \\  p2m_identity
 |-----| \   \-------------------->/---------------\   /-----------------\
 | ..  +->+                        | [p2m_identity]+-->| ~0, ~0, ~0, ... |
 \-----/ /                         | [p2m_identity]+-->| ..., ~0         |
        / /---------------\        | ....          |   \-----------------/
       /  | IDENTITY[@0]  |      /-+-[x], ~0, ~0.. |
      /   | IDENTITY[@256]|<----/  \---------------/
     /    | ~0, ~0, ....  |
    |     \---------------/
    |
    p2m_missing             p2m_missing
/------------------\     /------------\
| [p2m_mid_missing]+---->| ~0, ~0, ~0 |
| [p2m_mid_missing]+---->| ..., ~0    |
\------------------/     \------------/

where ~0 is INVALID_P2M_ENTRY and IDENTITY is (PFN | IDENTITY_FRAME_BIT).

[v4: Squished patches in just this one]
[v5: Changed code to use ranges, added ASCII art]
[v6: Rebased on top of xen->p2m code split]
[v7: Added RESERVE_BRK for potentially allocated pages]
[v8: Fixed alignment problem]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/xen/page.h |    6 ++-
 arch/x86/xen/p2m.c              |  109 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 112 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 8ea9772..47c1b59 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -30,7 +30,9 @@ typedef struct xpaddr {
 /**** MACHINE <-> PHYSICAL CONVERSION MACROS ****/
 #define INVALID_P2M_ENTRY	(~0UL)
 #define FOREIGN_FRAME_BIT	(1UL<<31)
+#define IDENTITY_FRAME_BIT	(1UL<<30)
 #define FOREIGN_FRAME(m)	((m) | FOREIGN_FRAME_BIT)
+#define IDENTITY_FRAME(m)	((m) | IDENTITY_FRAME_BIT)
 
 /* Maximum amount of memory we can handle in a domain in pages */
 #define MAX_DOMAIN_PAGES						\
@@ -42,6 +44,8 @@ extern unsigned int   machine_to_phys_order;
 extern unsigned long get_phys_to_machine(unsigned long pfn);
 extern bool set_phys_to_machine(unsigned long pfn, unsigned long mfn);
 extern bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn);
+extern unsigned long set_phys_range_identity(unsigned long pfn_s,
+					     unsigned long pfn_e);
 
 extern int m2p_add_override(unsigned long mfn, struct page *page);
 extern int m2p_remove_override(struct page *page);
@@ -58,7 +62,7 @@ static inline unsigned long pfn_to_mfn(unsigned long pfn)
 	mfn = get_phys_to_machine(pfn);
 
 	if (mfn != INVALID_P2M_ENTRY)
-		mfn &= ~FOREIGN_FRAME_BIT;
+		mfn &= ~(FOREIGN_FRAME_BIT | IDENTITY_FRAME_BIT);
 
 	return mfn;
 }
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index df4e367..19b0a65 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -59,9 +59,15 @@ static RESERVE_BRK_ARRAY(unsigned long **, p2m_top, P2M_TOP_PER_PAGE);
 static RESERVE_BRK_ARRAY(unsigned long, p2m_top_mfn, P2M_TOP_PER_PAGE);
 static RESERVE_BRK_ARRAY(unsigned long *, p2m_top_mfn_p, P2M_TOP_PER_PAGE);
 
+static RESERVE_BRK_ARRAY(unsigned long, p2m_identity, P2M_PER_PAGE);
+
 RESERVE_BRK(p2m_mid, PAGE_SIZE * (MAX_DOMAIN_PAGES / (P2M_PER_PAGE * P2M_MID_PER_PAGE)));
 RESERVE_BRK(p2m_mid_mfn, PAGE_SIZE * (MAX_DOMAIN_PAGES / (P2M_PER_PAGE * P2M_MID_PER_PAGE)));
 
+/* We might hit two boundary violations at the start and end, at max each
+ * boundary violation will require three middle nodes. */
+RESERVE_BRK(p2m_mid_identity, PAGE_SIZE * 2 * 3);
+
 static inline unsigned p2m_top_index(unsigned long pfn)
 {
 	BUG_ON(pfn >= MAX_P2M_PFN);
@@ -221,6 +227,9 @@ void __init xen_build_dynamic_phys_to_machine(void)
 	p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE);
 	p2m_top_init(p2m_top);
 
+	p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE);
+	p2m_init(p2m_identity);
+
 	/*
 	 * The domain builder gives us a pre-constructed p2m array in
 	 * mfn_list for all the pages initially given to us, so we just
@@ -272,6 +281,14 @@ unsigned long get_phys_to_machine(unsigned long pfn)
 	mididx = p2m_mid_index(pfn);
 	idx = p2m_index(pfn);
 
+	/*
+	 * The INVALID_P2M_ENTRY is filled in both p2m_*identity
+	 * and in p2m_*missing, so returning the INVALID_P2M_ENTRY
+	 * would be wrong.
+	 */
+	if (p2m_top[topidx][mididx] == p2m_identity)
+		return IDENTITY_FRAME(pfn);
+
 	return p2m_top[topidx][mididx][idx];
 }
 EXPORT_SYMBOL_GPL(get_phys_to_machine);
@@ -341,9 +358,11 @@ static bool alloc_p2m(unsigned long pfn)
 			p2m_top_mfn_p[topidx] = mid_mfn;
 	}
 
-	if (p2m_top[topidx][mididx] == p2m_missing) {
+	if (p2m_top[topidx][mididx] == p2m_identity ||
+	    p2m_top[topidx][mididx] == p2m_missing) {
 		/* p2m leaf page is missing */
 		unsigned long *p2m;
+		unsigned long *p2m_orig = p2m_top[topidx][mididx];
 
 		p2m = alloc_p2m_page();
 		if (!p2m)
@@ -351,7 +370,7 @@ static bool alloc_p2m(unsigned long pfn)
 
 		p2m_init(p2m);
 
-		if (cmpxchg(&mid[mididx], p2m_missing, p2m) != p2m_missing)
+		if (cmpxchg(&mid[mididx], p2m_orig, p2m) != p2m_orig)
 			free_p2m_page(p2m);
 		else
 			mid_mfn[mididx] = virt_to_mfn(p2m);
@@ -360,6 +379,78 @@ static bool alloc_p2m(unsigned long pfn)
 	return true;
 }
 
+bool __early_alloc_p2m(unsigned long pfn)
+{
+	unsigned topidx, mididx, idx;
+
+	topidx = p2m_top_index(pfn);
+	mididx = p2m_mid_index(pfn);
+	idx = p2m_index(pfn);
+
+	/* Pfff.. No boundary cross-over, lets get out. */
+	if (!idx)
+		return false;
+
+	WARN(p2m_top[topidx][mididx] == p2m_identity,
+		"P2M[%d][%d] == IDENTITY, should be MISSING (or alloced)!\n",
+		topidx, mididx);
+
+	/*
+	 * Could be done by xen_build_dynamic_phys_to_machine..
+	 */
+	if (p2m_top[topidx][mididx] != p2m_missing)
+		return false;
+
+	/* Boundary cross-over for the edges: */
+	if (idx) {
+		unsigned long *p2m = extend_brk(PAGE_SIZE, PAGE_SIZE);
+
+		p2m_init(p2m);
+
+		p2m_top[topidx][mididx] = p2m;
+
+	}
+	return idx != 0;
+}
+unsigned long set_phys_range_identity(unsigned long pfn_s,
+					unsigned long pfn_e)
+{
+	unsigned long pfn;
+
+	if (unlikely(pfn_s >= MAX_P2M_PFN || pfn_e >= MAX_P2M_PFN))
+		return 0;
+
+	if (unlikely(xen_feature(XENFEAT_auto_translated_physmap)))
+		return pfn_e - pfn_s;
+
+	for (pfn = (pfn_s & ~(P2M_MID_PER_PAGE * P2M_PER_PAGE - 1));
+		pfn < ALIGN(pfn_e, (P2M_MID_PER_PAGE * P2M_PER_PAGE));
+		pfn += P2M_MID_PER_PAGE * P2M_PER_PAGE)
+	{
+		unsigned topidx = p2m_top_index(pfn);
+		if (p2m_top[topidx] == p2m_mid_missing) {
+			unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
+
+			p2m_mid_init(mid);
+
+			p2m_top[topidx] = mid;
+		}
+	}
+
+	__early_alloc_p2m(pfn_s);
+	__early_alloc_p2m(pfn_e);
+
+	for (pfn = pfn_s; pfn < pfn_e; pfn++)
+		if (!__set_phys_to_machine(pfn, IDENTITY_FRAME(pfn)))
+			break;
+
+	WARN((pfn - pfn_s) != (pfn_e - pfn_s),
+		"Identity mapping failed. We are %ld short of 1-1 mappings!\n",
+		(pfn_e - pfn_s) - (pfn - pfn_s));
+
+	return pfn - pfn_s;
+}
+
 /* Try to install p2m mapping; fail if intermediate bits missing */
 bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 {
@@ -378,6 +469,20 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 	mididx = p2m_mid_index(pfn);
 	idx = p2m_index(pfn);
 
+	/* For sparse holes were the p2m leaf has real PFN along with
+	 * PCI holes, stick in the PFN as the MFN value.
+	 */
+	if (mfn != INVALID_P2M_ENTRY && (mfn & IDENTITY_FRAME_BIT)) {
+		if (p2m_top[topidx][mididx] == p2m_identity)
+			return true;
+
+		/* Swap over from MISSING to IDENTITY if needed. */
+		if (p2m_top[topidx][mididx] == p2m_missing) {
+			p2m_top[topidx][mididx] = p2m_identity;
+			return true;
+		}
+	}
+
 	if (p2m_top[topidx][mididx] == p2m_missing)
 		return mfn == INVALID_P2M_ENTRY;
 
-- 
1.7.1



* [PATCH 03/11] xen/mmu: Set _PAGE_IOMAP if PFN is an identity PFN.
  2011-01-31 22:44 [PATCH v4] Consider E820 non-RAM and E820 gaps as 1-1 mappings Konrad Rzeszutek Wilk
  2011-01-31 22:44 ` [PATCH 01/11] xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY Konrad Rzeszutek Wilk
  2011-01-31 22:44 ` [PATCH 02/11] xen/mmu: Add the notion of identity (1-1) mapping Konrad Rzeszutek Wilk
@ 2011-01-31 22:44 ` Konrad Rzeszutek Wilk
  2011-01-31 22:44 ` [PATCH 04/11] xen/mmu: BUG_ON when racing to swap middle leaf Konrad Rzeszutek Wilk
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-31 22:44 UTC (permalink / raw)
  To: linux-kernel, Xen-devel, konrad, jeremy
  Cc: hpa, stefano.stabellini, Ian.Campbell, Konrad Rzeszutek Wilk

If we find that the PFN is within the P2M as an identity
PFN, make sure to tack on the _PAGE_IOMAP flag.

Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c |   18 ++++++++++++++++--
 1 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 0180ae8..9c9e076 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -416,8 +416,12 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
 	if (val & _PAGE_PRESENT) {
 		unsigned long pfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
 		pteval_t flags = val & PTE_FLAGS_MASK;
-		unsigned long mfn = pfn_to_mfn(pfn);
+		unsigned long mfn;
 
+		if (!xen_feature(XENFEAT_auto_translated_physmap))
+			mfn = get_phys_to_machine(pfn);
+		else
+			mfn = pfn;
 		/*
 		 * If there's no mfn for the pfn, then just create an
 		 * empty non-present pte.  Unfortunately this loses
@@ -427,8 +431,18 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
 		if (unlikely(mfn == INVALID_P2M_ENTRY)) {
 			mfn = 0;
 			flags = 0;
+		} else {
+			/*
+			 * Paramount to do this test _after_ the
+			 * INVALID_P2M_ENTRY as INVALID_P2M_ENTRY &
+			 * IDENTITY_FRAME_BIT resolves to true.
+			 */
+			mfn &= ~FOREIGN_FRAME_BIT;
+			if (mfn & IDENTITY_FRAME_BIT) {
+				mfn &= ~IDENTITY_FRAME_BIT;
+				flags |= _PAGE_IOMAP;
+			}
 		}
-
 		val = ((pteval_t)mfn << PAGE_SHIFT) | flags;
 	}
 
-- 
1.7.1



* [PATCH 04/11] xen/mmu: BUG_ON when racing to swap middle leaf.
  2011-01-31 22:44 [PATCH v4] Consider E820 non-RAM and E820 gaps as 1-1 mappings Konrad Rzeszutek Wilk
                   ` (2 preceding siblings ...)
  2011-01-31 22:44 ` [PATCH 03/11] xen/mmu: Set _PAGE_IOMAP if PFN is an identity PFN Konrad Rzeszutek Wilk
@ 2011-01-31 22:44 ` Konrad Rzeszutek Wilk
  2011-02-01 21:34   ` Jeremy Fitzhardinge
  2011-01-31 22:44 ` [PATCH 05/11] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps Konrad Rzeszutek Wilk
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-31 22:44 UTC (permalink / raw)
  To: linux-kernel, Xen-devel, konrad, jeremy
  Cc: hpa, stefano.stabellini, Ian.Campbell, Konrad Rzeszutek Wilk

The initial bootup code uses set_phys_to_machine quite a lot, and after
bootup it would be used by the balloon driver. The balloon driver does have
a mutex lock so this should not be necessary - but just in case, add
a BUG_ON if we do hit this scenario.
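
The pattern being guarded is a publish-once compare-and-swap. A stand-alone
sketch of the idea (using a GCC builtin and plain integers here, rather than
the kernel's cmpxchg() on the real p2m page pointers):

 #include <stdio.h>

 /* 0 stands in for "p2m_missing", 1 for "p2m_identity". */
 static unsigned long shared_leaf;   /* stands in for p2m_top[topidx][mididx] */

 int main(void)
 {
         unsigned long old;

         shared_leaf = 0;            /* MISSING */

         /* Swap MISSING -> IDENTITY exactly once; a racing second swap
          * sees a different old value and does nothing. */
         old = __sync_val_compare_and_swap(&shared_leaf, 0UL, 1UL);
         printf("first swap:  saw MISSING? %d (leaf now %lu)\n",
                old == 0, shared_leaf);

         old = __sync_val_compare_and_swap(&shared_leaf, 0UL, 1UL);
         printf("second swap: saw MISSING? %d (leaf now %lu)\n",
                old == 0, shared_leaf);
         return 0;
 }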

[v2: Change from WARN to BUG_ON]
[v3: Rebased on top of xen->p2m code split]
Reviewed-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/p2m.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 19b0a65..fbbd2ab 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -478,7 +478,8 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 
 		/* Swap over from MISSING to IDENTITY if needed. */
 		if (p2m_top[topidx][mididx] == p2m_missing) {
-			p2m_top[topidx][mididx] = p2m_identity;
+			BUG_ON(cmpxchg(&p2m_top[topidx][mididx], p2m_missing,
+				p2m_identity) != p2m_missing);
 			return true;
 		}
 	}
-- 
1.7.1



* [PATCH 05/11] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.
  2011-01-31 22:44 [PATCH v4] Consider E820 non-RAM and E820 gaps as 1-1 mappings Konrad Rzeszutek Wilk
                   ` (3 preceding siblings ...)
  2011-01-31 22:44 ` [PATCH 04/11] xen/mmu: BUG_ON when racing to swap middle leaf Konrad Rzeszutek Wilk
@ 2011-01-31 22:44 ` Konrad Rzeszutek Wilk
  2011-02-01 22:32     ` Konrad Rzeszutek Wilk
  2011-01-31 22:44 ` [PATCH 06/11] xen/setup: Skip over 1st gap after System RAM Konrad Rzeszutek Wilk
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-31 22:44 UTC (permalink / raw)
  To: linux-kernel, Xen-devel, konrad, jeremy
  Cc: hpa, stefano.stabellini, Ian.Campbell, Konrad Rzeszutek Wilk

We walk the E820 regions, starting at 0 (for PV guests we start
at ISA_END_ADDRESS), and skip any E820 RAM regions. All other
regions, as well as the gaps, we set to be identity mappings.

The reason we do not want to set the identity mapping from 0 ->
ISA_END_ADDRESS when running as PV is that the kernel would
try to read DMI information and fail (no permission to read that).
There is a lot of gnarly code to deal with that weird region, so
we won't try to do a cleanup in this patch.

This code ends up calling set_phys_range_identity with the start
and end PFNs of the E820 regions that are non-RAM or are gaps.
On 99% of machines that means one big region right underneath the
4GB mark, usually starting at PFN 0xc0000 (or 0x80000) and going up to
0x100000.
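
As a rough user-space model of the walk (a toy, already-sorted E820 with
made-up addresses; the real code below also has to clamp against 'last' and
handle the tail of the map):

 #include <stdio.h>

 struct entry { unsigned long start, end; int ram; };

 int main(void)
 {
         struct entry map[] = {
                 { 0x00000000, 0x40500000, 1 },  /* System RAM */
                 { 0x40500000, 0x40503000, 0 },  /* ACPI       */
                 /* gap from 0x40503000 to 0x7d000000 */
                 { 0x7d000000, 0x7d100000, 0 },  /* reserved   */
                 { 0x7d100000, 0x80000000, 1 },  /* System RAM */
         };
         unsigned long start_pci = 0;
         int i;

         for (i = 0; i < (int)(sizeof(map) / sizeof(map[0])); i++) {
                 if (!map[i].ram)
                         continue;       /* non-RAM stays inside the open window */
                 if (map[i].start > start_pci)
                         printf("identity: %#lx -> %#lx\n",
                                start_pci, map[i].start);
                 start_pci = map[i].end; /* close the window after RAM */
         }
         return 0;
 }

With these numbers the only window printed is 0x40500000 -> 0x7d100000, i.e.
the 1029MB -> 2001MB identity range from the cover letter's example.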

[v2: Fix for E820 crossing 1MB region and clamp the start]
[v3: Squashed in code that does this over ranges]
[v4: Moved the comment to the correct spot]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/setup.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 7201800..c2a5b5f 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -143,6 +143,44 @@ static unsigned long __init xen_return_unused_memory(unsigned long max_pfn,
 	return released;
 }
 
+static unsigned long __init xen_set_identity(const struct e820map *e820)
+{
+	phys_addr_t last = xen_initial_domain() ? 0 : ISA_END_ADDRESS;
+	phys_addr_t start_pci = last;
+	int i;
+	unsigned long identity = 0;
+
+	for (i = 0; i < e820->nr_map; i++) {
+		phys_addr_t start = e820->map[i].addr;
+		phys_addr_t end = start + e820->map[i].size;
+
+		if (start < last)
+			start = last;
+
+		if (end <= start)
+			continue;
+
+		/* Skip over the 1MB region. */
+		if (last > end)
+			continue;
+
+		if (e820->map[i].type == E820_RAM) {
+			if (start > start_pci)
+				identity += set_phys_range_identity(
+					PFN_UP(start_pci), PFN_DOWN(start));
+			start_pci = end;
+			/* Without saving 'last' we would gooble RAM too. */
+			last = end;
+			continue;
+		}
+		start_pci = min(start, start_pci);
+		last = end;
+	}
+	if (last > start_pci)
+		identity += set_phys_range_identity(
+					PFN_UP(start_pci), PFN_DOWN(last));
+	return identity;
+}
 /**
  * machine_specific_memory_setup - Hook for machine specific memory setup.
  **/
@@ -156,6 +194,7 @@ char * __init xen_memory_setup(void)
 	struct xen_memory_map memmap;
 	unsigned long extra_pages = 0;
 	unsigned long extra_limit;
+	unsigned long identity_pages = 0;
 	int i;
 	int op;
 
@@ -251,6 +290,12 @@ char * __init xen_memory_setup(void)
 
 	xen_add_extra_mem(extra_pages);
 
+	/*
+	 * Set P2M for all non-RAM pages and E820 gaps to be identity
+	 * type PFNs.
+	 */
+	identity_pages = xen_set_identity(&e820);
+	printk(KERN_INFO "Set %ld page(s) to 1-1 mapping.\n", identity_pages);
 	return "Xen";
 }
 
-- 
1.7.1



* [PATCH 06/11] xen/setup: Skip over 1st gap after System RAM.
  2011-01-31 22:44 [PATCH v4] Consider E820 non-RAM and E820 gaps as 1-1 mappings Konrad Rzeszutek Wilk
                   ` (4 preceding siblings ...)
  2011-01-31 22:44 ` [PATCH 05/11] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps Konrad Rzeszutek Wilk
@ 2011-01-31 22:44 ` Konrad Rzeszutek Wilk
  2011-02-01 15:08   ` Ian Campbell
  2011-01-31 22:44 ` [PATCH 07/11] x86/setup: Consult the raw E820 for zero sized E820 RAM regions Konrad Rzeszutek Wilk
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-31 22:44 UTC (permalink / raw)
  To: linux-kernel, Xen-devel, konrad, jeremy
  Cc: hpa, stefano.stabellini, Ian.Campbell, Konrad Rzeszutek Wilk

If the kernel is booted with dom0_mem=max:512MB and the
machine has more than 512MB of RAM, the E820 we get is:

Xen: 0000000000100000 - 0000000020000000 (usable)
Xen: 00000000b7ee0000 - 00000000b7ee3000 (ACPI NVS)

while in actuality it is:

(XEN)  0000000000100000 - 00000000b7ee0000 (usable)
(XEN)  00000000b7ee0000 - 00000000b7ee3000 (ACPI NVS)

Based on that, we would determine that the "gap" between
0x20000 -> 0xb7ee0 is not System RAM and try to assign it to
1-1 mapping. This meant that later on, when we set up the page tables,
we would try to assign those regions to DOMID_IO and the
Xen hypervisor would fail such an operation. This patch
guards against that and makes the "gap" start after the first
non-RAM E820 region.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/setup.c |   20 ++++++++++++++++++--
 1 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index c2a5b5f..5b2ae49 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -147,6 +147,7 @@ static unsigned long __init xen_set_identity(const struct e820map *e820)
 {
 	phys_addr_t last = xen_initial_domain() ? 0 : ISA_END_ADDRESS;
 	phys_addr_t start_pci = last;
+	phys_addr_t ram_end = last;
 	int i;
 	unsigned long identity = 0;
 
@@ -168,11 +169,26 @@ static unsigned long __init xen_set_identity(const struct e820map *e820)
 			if (start > start_pci)
 				identity += set_phys_range_identity(
 					PFN_UP(start_pci), PFN_DOWN(start));
-			start_pci = end;
 			/* Without saving 'last' we would gooble RAM too. */
-			last = end;
+			start_pci = last = ram_end = end;
 			continue;
 		}
+		/* Gap found right after the 1st RAM region. Skip over it.
+		 * Why? That is b/c if we pass in dom0_mem=max:512MB and
+		 * have in reality 1GB, the E820 is clipped at 512MB.
+		 * In xen_set_pte_init we end up calling xen_set_domain_pte
+		 * which asks Xen hypervisor to alter the ownership of the MFN
+		 * to DOMID_IO. We would try to set that on PFNs from 512MB
+		 * up to the next System RAM region (likely from 0x20000->
+		 * 0x100000). But changing the ownership on "real" RAM regions
+		 * will infuriate Xen hypervisor and we will fail (WARN).
+		 * So instead of trying to set IDENTITY mapping on the gap
+		 * between the System RAM and the first non-RAM E820 region
+		 * we start at the non-RAM E820 region.*/
+		if (ram_end && start >= ram_end) {
+			start_pci = start;
+			ram_end = 0;
+		}
 		start_pci = min(start, start_pci);
 		last = end;
 	}
-- 
1.7.1



* [PATCH 07/11] x86/setup: Consult the raw E820 for zero sized E820 RAM regions.
  2011-01-31 22:44 [PATCH v4] Consider E820 non-RAM and E820 gaps as 1-1 mappings Konrad Rzeszutek Wilk
                   ` (5 preceding siblings ...)
  2011-01-31 22:44 ` [PATCH 06/11] xen/setup: Skip over 1st gap after System RAM Konrad Rzeszutek Wilk
@ 2011-01-31 22:44 ` Konrad Rzeszutek Wilk
  2011-02-01 17:52   ` Stefano Stabellini
  2011-01-31 22:44 ` [PATCH 08/11] xen/debugfs: Add 'p2m' file for printing out the P2M layout Konrad Rzeszutek Wilk
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-31 22:44 UTC (permalink / raw)
  To: linux-kernel, Xen-devel, konrad, jeremy
  Cc: hpa, stefano.stabellini, Ian.Campbell, Konrad Rzeszutek Wilk

When the Xen hypervisor provides us with an E820, it can
contain zero-sized RAM regions. Those are entries that have
been trimmed down because the user supplied the dom0_mem flag.

What this means is that there is RAM at those regions, and we
should _not_ be considering those regions for 1-1 mapping.

This dom0_mem parameter changes a nice looking E820 like this:

Xen: 0000000000000000 - 000000000009d000 (usable)
Xen: 000000000009d000 - 0000000000100000 (reserved)
Xen: 0000000000100000 - 000000009cf67000 (usable)
Xen: 000000009cf67000 - 000000009d103000 (ACPI NVS)
Xen: 000000009d103000 - 000000009f6bd000 (usable)
Xen: 000000009f6bd000 - 000000009f6bf000 (reserved)
Xen: 000000009f6bf000 - 000000009f714000 (usable)

(wherein we would happily set 9d->100, 9cf67->9d103, and
9f6bd->9f6bf to identity mapping) .. but with a dom0_mem
argument (say dom0_mem=700MB) it looks like so:

Xen: 0000000000000000 - 000000000009d000 (usable)
Xen: 000000000009d000 - 0000000000100000 (reserved)
Xen: 0000000000100000 - 000000002bc00000 (usable)
Xen: 000000009cf67000 - 000000009d103000 (ACPI NVS)
Xen: 000000009f6bd000 - 000000009f6bf000 (reserved)

We would set 9d->100 and 9cf67->9f6bf to identity
mapping. But the region from 9d103->9f6bd - which is
System RAM that a guest could be allocated from -
would then be considered identity, which is incorrect.

[Note: this printout of the E820 is after E820
sanitization, the raw E820 would look like this]:

Xen: 0000000000000000 - 000000000009d000 (usable)
Xen: 000000000009d000 - 0000000000100000 (reserved)
Xen: 0000000000100000 - 000000002bc00000 (usable)
Xen: 000000009cf67000 - 000000009d103000 (ACPI NVS)
Xen: 000000009d103000 - 000000009d103000 (usable)  <===
Xen: 000000009f6bd000 - 000000009f6bf000 (reserved)

[Notice the "usable" zero sized region]

This patch consults the non-sanitized version of the E820
and checks whether there is a zero-sized RAM region right before
the non-RAM E820 entry we are currently evaluating.
If so, we utilize the 'ram_end' value to piggyback on the
code introduced by the "xen/setup: Skip over 1st gap after
System RAM" patch. We also add a printk to help
us determine which region has been set to 1-1 mapping, and
add some sanity checking.

We must keep those zero-sized E820 RAM regions as is (so "missing"),
otherwise the M2P override code can malfunction if a guest
grant page is present in those regions.

Shifting the "xen_set_identity" to be called earlier (so that
we are using the non-sanitized version of the &e820) does not
work as we need to take into account the E820 after the
initial increase/decrease reservation done and addition of a
new E820 region in 'xen_add_extra_mem').

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/p2m.c   |    9 +++++++--
 arch/x86/xen/setup.c |   47 +++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 50 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index fbbd2ab..70bd49b 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -423,6 +423,9 @@ unsigned long set_phys_range_identity(unsigned long pfn_s,
 	if (unlikely(xen_feature(XENFEAT_auto_translated_physmap)))
 		return pfn_e - pfn_s;
 
+	if (pfn_s > pfn_e)
+		return 0;
+
 	for (pfn = (pfn_s & ~(P2M_MID_PER_PAGE * P2M_PER_PAGE - 1));
 		pfn < ALIGN(pfn_e, (P2M_MID_PER_PAGE * P2M_PER_PAGE));
 		pfn += P2M_MID_PER_PAGE * P2M_PER_PAGE)
@@ -444,9 +447,11 @@ unsigned long set_phys_range_identity(unsigned long pfn_s,
 		if (!__set_phys_to_machine(pfn, IDENTITY_FRAME(pfn)))
 			break;
 
-	WARN((pfn - pfn_s) != (pfn_e - pfn_s),
+	if (!WARN((pfn - pfn_s) != (pfn_e - pfn_s),
 		"Identity mapping failed. We are %ld short of 1-1 mappings!\n",
-		(pfn_e - pfn_s) - (pfn - pfn_s));
+		(pfn_e - pfn_s) - (pfn - pfn_s)))
+		printk(KERN_DEBUG "1-1 mapping on %lx->%lx\n", pfn_s, pfn);
+
 
 	return pfn - pfn_s;
 }
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 5b2ae49..e7ee04c 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -143,13 +143,16 @@ static unsigned long __init xen_return_unused_memory(unsigned long max_pfn,
 	return released;
 }
 
-static unsigned long __init xen_set_identity(const struct e820map *e820)
+static unsigned long __init xen_set_identity(const struct e820map *e820,
+					     const struct e820entry *list,
+					     ssize_t map_size)
 {
 	phys_addr_t last = xen_initial_domain() ? 0 : ISA_END_ADDRESS;
 	phys_addr_t start_pci = last;
 	phys_addr_t ram_end = last;
-	int i;
+	int i, j;
 	unsigned long identity = 0;
+	const struct e820entry *entry;
 
 	for (i = 0; i < e820->nr_map; i++) {
 		phys_addr_t start = e820->map[i].addr;
@@ -158,6 +161,8 @@ static unsigned long __init xen_set_identity(const struct e820map *e820)
 		if (start < last)
 			start = last;
 
+		/* Sadly, we do not get E820 entries with zero size after
+		 * sanitization. */
 		if (end <= start)
 			continue;
 
@@ -173,6 +178,37 @@ static unsigned long __init xen_set_identity(const struct e820map *e820)
 			start_pci = last = ram_end = end;
 			continue;
 		}
+		/* Consult the real non-sanitizied version of E820 to see
+		 * if there is a E820_RAM region with zero size right before
+		 * our non-RAM E820 entry. The 'zero size' are real RAM
+		 * regions which the hypervisor has truncated to zero size.
+		 * This is b/c the user supplied a dom0_mem flag to trim how
+		 * much RAM we can use.*/
+		for (j = 0, entry = list; j < map_size; j++, entry++) {
+			/* Found this non-RAM E820 region. If previous entry
+			 * is a zero sized E820 RAM region, then rethink.
+			 */
+			if (start == entry->addr) {
+				const struct e820entry *tmp = entry-1;
+				phys_addr_t ghost_ram = tmp->addr;
+
+				if ((tmp->type != E820_RAM) && (tmp->size != 0))
+					break;
+
+				if (ghost_ram > start_pci) {
+					identity += set_phys_range_identity(
+						PFN_UP(start_pci),
+						PFN_DOWN(ghost_ram));
+				}
+				/* We ought to reset it to the _end_ of the
+				 * E820 RAM region but since it is zero sized,
+				 * that would not work. Instead we reset it to
+				 * the start of non-RAM E820 region and the let
+				 * the code right below fix up the values.*/
+				ram_end = start;
+				break;
+			}
+		}
 		/* Gap found right after the 1st RAM region. Skip over it.
 		 * Why? That is b/c if we pass in dom0_mem=max:512MB and
 		 * have in reality 1GB, the E820 is clipped at 512MB.
@@ -308,9 +344,12 @@ char * __init xen_memory_setup(void)
 
 	/*
 	 * Set P2M for all non-RAM pages and E820 gaps to be identity
-	 * type PFNs.
+	 * type PFNs. We also supply it with the non-sanitized version
+	 * of the E820 - which can have zero size E820 RAM regions
+	 * that we _MUST_ consult so that we do not set 1-1 mapping
+	 * on RAM regions (which might be assigned to guests for example).
 	 */
-	identity_pages = xen_set_identity(&e820);
+	identity_pages = xen_set_identity(&e820, map, memmap.nr_entries);
 	printk(KERN_INFO "Set %ld page(s) to 1-1 mapping.\n", identity_pages);
 	return "Xen";
 }
-- 
1.7.1



* [PATCH 08/11] xen/debugfs: Add 'p2m' file for printing out the P2M layout.
  2011-01-31 22:44 [PATCH v4] Consider E820 non-RAM and E820 gaps as 1-1 mappings Konrad Rzeszutek Wilk
                   ` (6 preceding siblings ...)
  2011-01-31 22:44 ` [PATCH 07/11] x86/setup: Consult the raw E820 for zero sized E820 RAM regions Konrad Rzeszutek Wilk
@ 2011-01-31 22:44 ` Konrad Rzeszutek Wilk
  2011-01-31 22:44 ` [PATCH 09/11] xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set Konrad Rzeszutek Wilk
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-31 22:44 UTC (permalink / raw)
  To: linux-kernel, Xen-devel, konrad, jeremy
  Cc: hpa, stefano.stabellini, Ian.Campbell, Konrad Rzeszutek Wilk

We walk over the whole P2M tree and construct a simplified view of
which PFN regions belong to what level and what type they are.

Only enabled if CONFIG_XEN_DEBUG_FS is set.

[v2: UNKN->UNKNOWN, use uninitialized_var]
[v3: Rebased on top of mmu->p2m code split]
[v4: Fixed the else if]
Reviewed-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/xen/page.h |    3 +
 arch/x86/xen/mmu.c              |   14 +++++++
 arch/x86/xen/p2m.c              |   78 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 95 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 47c1b59..7b17fc5 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -52,6 +52,9 @@ extern int m2p_remove_override(struct page *page);
 extern struct page *m2p_find_override(unsigned long mfn);
 extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn);
 
+#ifdef CONFIG_XEN_DEBUG_FS
+extern int p2m_dump_show(struct seq_file *m, void *v);
+#endif
 static inline unsigned long pfn_to_mfn(unsigned long pfn)
 {
 	unsigned long mfn;
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 9c9e076..b13b6ca 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -46,6 +46,7 @@
 #include <linux/module.h>
 #include <linux/gfp.h>
 #include <linux/memblock.h>
+#include <linux/seq_file.h>
 
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
@@ -2367,6 +2368,18 @@ EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_range);
 
 #ifdef CONFIG_XEN_DEBUG_FS
 
+static int p2m_dump_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, p2m_dump_show, NULL);
+}
+
+static const struct file_operations p2m_dump_fops = {
+	.open		= p2m_dump_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
 static struct dentry *d_mmu_debug;
 
 static int __init xen_mmu_debugfs(void)
@@ -2422,6 +2435,7 @@ static int __init xen_mmu_debugfs(void)
 	debugfs_create_u32("prot_commit_batched", 0444, d_mmu_debug,
 			   &mmu_stats.prot_commit_batched);
 
+	debugfs_create_file("p2m", 0600, d_mmu_debug, NULL, &p2m_dump_fops);
 	return 0;
 }
 fs_initcall(xen_mmu_debugfs);
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 70bd49b..38ae7fe 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -30,6 +30,7 @@
 #include <linux/list.h>
 #include <linux/hash.h>
 #include <linux/sched.h>
+#include <linux/seq_file.h>
 
 #include <asm/cache.h>
 #include <asm/setup.h>
@@ -636,3 +637,80 @@ unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn)
 	return ret;
 }
 EXPORT_SYMBOL_GPL(m2p_find_override_pfn);
+
+#ifdef CONFIG_XEN_DEBUG_FS
+
+int p2m_dump_show(struct seq_file *m, void *v)
+{
+	static const char * const level_name[] = { "top", "middle",
+						"entry", "abnormal" };
+	static const char * const type_name[] = { "identity", "missing",
+						"pfn", "abnormal"};
+#define TYPE_IDENTITY 0
+#define TYPE_MISSING 1
+#define TYPE_PFN 2
+#define TYPE_UNKNOWN 3
+	unsigned long pfn, prev_pfn_type = 0, prev_pfn_level = 0;
+	unsigned int uninitialized_var(prev_level);
+	unsigned int uninitialized_var(prev_type);
+
+	if (!p2m_top)
+		return 0;
+
+	for (pfn = 0; pfn < MAX_DOMAIN_PAGES; pfn++) {
+		unsigned topidx = p2m_top_index(pfn);
+		unsigned mididx = p2m_mid_index(pfn);
+		unsigned idx = p2m_index(pfn);
+		unsigned lvl, type;
+
+		lvl = 4;
+		type = TYPE_UNKNOWN;
+		if (p2m_top[topidx] == p2m_mid_missing) {
+			lvl = 0; type = TYPE_MISSING;
+		} else if (p2m_top[topidx] == NULL) {
+			lvl = 0; type = TYPE_UNKNOWN;
+		} else if (p2m_top[topidx][mididx] == NULL) {
+			lvl = 1; type = TYPE_UNKNOWN;
+		} else if (p2m_top[topidx][mididx] == p2m_identity) {
+			lvl = 1; type = TYPE_IDENTITY;
+		} else if (p2m_top[topidx][mididx] == p2m_missing) {
+			lvl = 1; type = TYPE_MISSING;
+		} else if (p2m_top[topidx][mididx][idx] == 0) {
+			lvl = 2; type = TYPE_UNKNOWN;
+		} else if (p2m_top[topidx][mididx][idx] == IDENTITY_FRAME(pfn)) {
+			lvl = 2; type = TYPE_IDENTITY;
+		} else if (p2m_top[topidx][mididx][idx] == INVALID_P2M_ENTRY) {
+			lvl = 2; type = TYPE_MISSING;
+		} else if (p2m_top[topidx][mididx][idx] == pfn) {
+			lvl = 2; type = TYPE_PFN;
+		} else if (p2m_top[topidx][mididx][idx] != pfn) {
+			lvl = 2; type = TYPE_PFN;
+		}
+		if (pfn == 0) {
+			prev_level = lvl;
+			prev_type = type;
+		}
+		if (pfn == MAX_DOMAIN_PAGES-1) {
+			lvl = 3;
+			type = TYPE_UNKNOWN;
+		}
+		if (prev_type != type) {
+			seq_printf(m, " [0x%lx->0x%lx] %s\n",
+				prev_pfn_type, pfn, type_name[prev_type]);
+			prev_pfn_type = pfn;
+			prev_type = type;
+		}
+		if (prev_level != lvl) {
+			seq_printf(m, " [0x%lx->0x%lx] level %s\n",
+				prev_pfn_level, pfn, level_name[prev_level]);
+			prev_pfn_level = pfn;
+			prev_level = lvl;
+		}
+	}
+	return 0;
+#undef TYPE_IDENTITY
+#undef TYPE_MISSING
+#undef TYPE_PFN
+#undef TYPE_UNKNOWN
+}
+#endif
-- 
1.7.1



* [PATCH 09/11] xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set.
  2011-01-31 22:44 [PATCH v4] Consider E820 non-RAM and E820 gaps as 1-1 mappings Konrad Rzeszutek Wilk
                   ` (7 preceding siblings ...)
  2011-01-31 22:44 ` [PATCH 08/11] xen/debugfs: Add 'p2m' file for printing out the P2M layout Konrad Rzeszutek Wilk
@ 2011-01-31 22:44 ` Konrad Rzeszutek Wilk
  2011-01-31 22:44 ` [PATCH 10/11] xen/m2p: No need to catch exceptions when we know that there is no RAM Konrad Rzeszutek Wilk
  2011-01-31 22:44 ` [PATCH 11/11] xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set Konrad Rzeszutek Wilk
  10 siblings, 0 replies; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-31 22:44 UTC (permalink / raw)
  To: linux-kernel, Xen-devel, konrad, jeremy
  Cc: hpa, stefano.stabellini, Ian.Campbell, Konrad Rzeszutek Wilk

Only enabled if XEN_DEBUG is enabled. We print a warning
when:

 pfn_to_mfn(pfn) == pfn, but no VM_IO (_PAGE_IOMAP) flag set
	(and pfn is an identity mapped pfn)
 pfn_to_mfn(pfn) != pfn, and VM_IO flag is set.
	(ditto, pfn is an identity mapped pfn)

[v2: Make it dependent on CONFIG_XEN_DEBUG instead of ..DEBUG_FS]
[v3: Fix compiler warning]

Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/Kconfig |    8 ++++++++
 arch/x86/xen/mmu.c   |   38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 5b54892..e4343fe 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -48,3 +48,11 @@ config XEN_DEBUG_FS
 	help
 	  Enable statistics output and various tuning options in debugfs.
 	  Enabling this option may incur a significant performance overhead.
+
+config XEN_DEBUG
+	bool "Enable Xen debug checks"
+	depends on XEN
+	default n
+	help
+	  Enable various WARN_ON checks in the Xen MMU code.
+	  Enabling this option WILL incur a significant performance overhead.
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index b13b6ca..0c376a2 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -547,6 +547,41 @@ pte_t xen_make_pte(pteval_t pte)
 }
 PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte);
 
+#ifdef CONFIG_XEN_DEBUG
+pte_t xen_make_pte_debug(pteval_t pte)
+{
+	phys_addr_t addr = (pte & PTE_PFN_MASK);
+	phys_addr_t other_addr;
+	bool io_page = false;
+	pte_t _pte;
+
+	if (pte & _PAGE_IOMAP)
+		io_page = true;
+
+	_pte = xen_make_pte(pte);
+
+	if (!addr)
+		return _pte;
+
+	if (io_page &&
+	    (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
+		other_addr = pfn_to_mfn(addr >> PAGE_SHIFT) << PAGE_SHIFT;
+		WARN(addr != other_addr,
+			"0x%lx is using VM_IO, but it is 0x%lx!\n",
+			(unsigned long)addr, (unsigned long)other_addr);
+	} else {
+		pteval_t iomap_set = (_pte.pte & PTE_FLAGS_MASK) & _PAGE_IOMAP;
+		other_addr = (_pte.pte & PTE_PFN_MASK);
+		WARN((addr == other_addr) && (!io_page) && (!iomap_set),
+			"0x%lx is missing VM_IO (and wasn't fixed)!\n",
+			(unsigned long)addr);
+	}
+
+	return _pte;
+}
+PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte_debug);
+#endif
+
 pgd_t xen_make_pgd(pgdval_t pgd)
 {
 	pgd = pte_pfn_to_mfn(pgd);
@@ -1957,6 +1992,9 @@ __init void xen_ident_map_ISA(void)
 
 static __init void xen_post_allocator_init(void)
 {
+#ifdef CONFIG_XEN_DEBUG
+	pv_mmu_ops.make_pte = PV_CALLEE_SAVE(xen_make_pte_debug);
+#endif
 	pv_mmu_ops.set_pte = xen_set_pte;
 	pv_mmu_ops.set_pmd = xen_set_pmd;
 	pv_mmu_ops.set_pud = xen_set_pud;
-- 
1.7.1



* [PATCH 10/11] xen/m2p: No need to catch exceptions when we know that there is no RAM
  2011-01-31 22:44 [PATCH v4] Consider E820 non-RAM and E820 gaps as 1-1 mappings Konrad Rzeszutek Wilk
                   ` (8 preceding siblings ...)
  2011-01-31 22:44 ` [PATCH 09/11] xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set Konrad Rzeszutek Wilk
@ 2011-01-31 22:44 ` Konrad Rzeszutek Wilk
  2011-01-31 22:44 ` [PATCH 11/11] xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set Konrad Rzeszutek Wilk
  10 siblings, 0 replies; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-31 22:44 UTC (permalink / raw)
  To: linux-kernel, Xen-devel, konrad, jeremy
  Cc: hpa, stefano.stabellini, Ian.Campbell, Konrad Rzeszutek Wilk

. beyond what we think is the end of memory. However, there might
be more System RAM - but assigned to a guest. Hence jump to the
M2P override check and consult it.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/xen/page.h |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 7b17fc5..ed46ec2 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -85,6 +85,10 @@ static inline unsigned long mfn_to_pfn(unsigned long mfn)
 	if (xen_feature(XENFEAT_auto_translated_physmap))
 		return mfn;
 
+	if (unlikely((mfn >> machine_to_phys_order) != 0)) {
+		pfn = ~0;
+		goto try_override;
+	}
 	pfn = 0;
 	/*
 	 * The array access can fail (e.g., device space beyond end of RAM).
@@ -92,7 +96,7 @@ static inline unsigned long mfn_to_pfn(unsigned long mfn)
 	 * but we must handle the fault without crashing!
 	 */
 	__get_user(pfn, &machine_to_phys_mapping[mfn]);
-
+try_override:
 	/*
 	 * If this appears to be a foreign mfn (because the pfn
 	 * doesn't map back to the mfn), then check the local override
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 11/11] xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set..
  2011-01-31 22:44 [PATCH v4] Consider E820 non-RAM and E820 gaps as 1-1 mappings Konrad Rzeszutek Wilk
                   ` (9 preceding siblings ...)
  2011-01-31 22:44 ` [PATCH 10/11] xen/m2p: No need to catch exceptions when we know that there is no RAM Konrad Rzeszutek Wilk
@ 2011-01-31 22:44 ` Konrad Rzeszutek Wilk
  2011-02-01 17:52   ` Stefano Stabellini
  10 siblings, 1 reply; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-31 22:44 UTC (permalink / raw)
  To: linux-kernel, Xen-devel, konrad, jeremy
  Cc: hpa, stefano.stabellini, Ian.Campbell, Konrad Rzeszutek Wilk

From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

If we have the IDENTITY_FRAME bit set from the P2M, there
is no point in looking up MFN in the M2P override. This is
b/c the MFN is a physical MFN.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/xen/page.h |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index ed46ec2..e6f7f37 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -81,6 +81,7 @@ static inline int phys_to_machine_mapping_valid(unsigned long pfn)
 static inline unsigned long mfn_to_pfn(unsigned long mfn)
 {
 	unsigned long pfn;
+	unsigned long p2m_mfn;
 
 	if (xen_feature(XENFEAT_auto_translated_physmap))
 		return mfn;
@@ -102,7 +103,12 @@ try_override:
 	 * doesn't map back to the mfn), then check the local override
 	 * table to see if there's a better pfn to use.
 	 */
-	if (get_phys_to_machine(pfn) != mfn)
+	p2m_mfn = get_phys_to_machine(pfn);
+
+	if (p2m_mfn == IDENTITY_FRAME(mfn))
+		return pfn;
+
+	if (p2m_mfn != mfn)
 		pfn = m2p_find_override_pfn(mfn, pfn);
 
 	return pfn;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/11] xen/setup: Skip over 1st gap after System RAM.
  2011-01-31 22:44 ` [PATCH 06/11] xen/setup: Skip over 1st gap after System RAM Konrad Rzeszutek Wilk
@ 2011-02-01 15:08   ` Ian Campbell
  2011-02-01 17:14       ` H. Peter Anvin
  2011-02-01 22:28     ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 30+ messages in thread
From: Ian Campbell @ 2011-02-01 15:08 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, Xen-devel, konrad, jeremy, hpa, Stefano Stabellini

On Mon, 2011-01-31 at 22:44 +0000, Konrad Rzeszutek Wilk wrote:
> If the kernel is booted with dom0_mem=max:512MB and the
> machine has more than 512MB of RAM, the E820 we get is:
> 
> Xen: 0000000000100000 - 0000000020000000 (usable)
> Xen: 00000000b7ee0000 - 00000000b7ee3000 (ACPI NVS)
> 
> while in actuality it is:
> 
> (XEN)  0000000000100000 - 00000000b7ee0000 (usable)
> (XEN)  00000000b7ee0000 - 00000000b7ee3000 (ACPI NVS)
> 
> Based on that, we would determine that the "gap" between
> 0x20000 -> 0xb7ee0 is not System RAM and try to assign it to
> 1-1 mapping. This meant that later on when we setup the page tables
> we would try to assign those regions to DOMID_IO and the
> Xen hypervisor would fail such operation. This patch
> guards against that and sets the "gap" to be after the first
> non-RAM E820 region.

This seems dodgy to me and makes assumptions about the sanity of the
BIOS provided e820 maps. e.g. it's not impossible that there are systems
out there with 2 or more little holes under 1M etc.

The truncation (from 0xb7ee0000 to 0x20000000 in this case) happens in
the dom0 kernel not the hypervisor right? So we can at least know that
we've done it.

Can we do the identity setup before that truncation happens? If not, can
we not remember the untruncated map too and refer to it as necessary?
One way of doing that might be to insert an e820 region
covering the truncated region to identify it as such (perhaps
E820_UNUSABLE?) or maybe integrating e.g. with the memblock reservations
(or whatever the early enough allocator is).

The scheme we have is that all pre-ballooned memory goes at the end of
the e820 right, as opposed to allowing it to first fill truncated
regions such as this? 

Ian.
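
For illustration only -- this is not code from the thread at this point,
just a sketch of the "remember the untruncated map" idea (a later message
in this thread ends up with essentially this shape, so the names map_raw
and xen_set_identity are borrowed from there):

	/* Keep the raw, unclipped E820 exactly as the hypervisor gave it. */
	static struct e820entry map_raw[E820MAX] __initdata;

	/* In xen_memory_setup(), right after fetching XENMEM_memory_map: */
	memcpy(map_raw, map, sizeof(map));

	/* ... and once the sanitizing/extra-mem work is done, walk the
	 * untouched copy when deciding which ranges become identity. */
	identity_pages = xen_set_identity(map_raw, memmap.nr_entries);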

> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/xen/setup.c |   20 ++++++++++++++++++--
>  1 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index c2a5b5f..5b2ae49 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -147,6 +147,7 @@ static unsigned long __init xen_set_identity(const struct e820map *e820)
>  {
>  	phys_addr_t last = xen_initial_domain() ? 0 : ISA_END_ADDRESS;
>  	phys_addr_t start_pci = last;
> +	phys_addr_t ram_end = last;
>  	int i;
>  	unsigned long identity = 0;
>  
> @@ -168,11 +169,26 @@ static unsigned long __init xen_set_identity(const struct e820map *e820)
>  			if (start > start_pci)
>  				identity += set_phys_range_identity(
>  					PFN_UP(start_pci), PFN_DOWN(start));
> -			start_pci = end;
>  			/* Without saving 'last' we would gooble RAM too. */
> -			last = end;
> +			start_pci = last = ram_end = end;
>  			continue;
>  		}
> +		/* Gap found right after the 1st RAM region. Skip over it.
> +		 * Why? That is b/c if we pass in dom0_mem=max:512MB and
> +		 * have in reality 1GB, the E820 is clipped at 512MB.
> +		 * In xen_set_pte_init we end up calling xen_set_domain_pte
> +		 * which asks Xen hypervisor to alter the ownership of the MFN
> +		 * to DOMID_IO. We would try to set that on PFNs from 512MB
> +		 * up to the next System RAM region (likely from 0x20000->
> +		 * 0x100000). But changing the ownership on "real" RAM regions
> +		 * will infuriate Xen hypervisor and we will fail (WARN).
> +		 * So instead of trying to set IDENTITY mapping on the gap
> +		 * between the System RAM and the first non-RAM E820 region
> +		 * we start at the non-RAM E820 region.*/
> +		if (ram_end && start >= ram_end) {
> +			start_pci = start;
> +			ram_end = 0;
> +		}
>  		start_pci = min(start, start_pci);
>  		last = end;
>  	}



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/11] xen/setup: Skip over 1st gap after System RAM.
  2011-02-01 15:08   ` Ian Campbell
@ 2011-02-01 17:14       ` H. Peter Anvin
  2011-02-01 22:28     ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 30+ messages in thread
From: H. Peter Anvin @ 2011-02-01 17:14 UTC (permalink / raw)
  To: Ian Campbell, Konrad Rzeszutek Wilk
  Cc: linux-kernel, Xen-devel, konrad, jeremy, Stefano Stabellini

Not merely possible, it's fairly common.

"Ian Campbell" <Ian.Campbell@eu.citrix.com> wrote:

>On Mon, 2011-01-31 at 22:44 +0000, Konrad Rzeszutek Wilk wrote:
>> If the kernel is booted with dom0_mem=max:512MB and the
>> machine has more than 512MB of RAM, the E820 we get is:
>> 
>> Xen: 0000000000100000 - 0000000020000000 (usable)
>> Xen: 00000000b7ee0000 - 00000000b7ee3000 (ACPI NVS)
>> 
>> while in actuality it is:
>> 
>> (XEN)  0000000000100000 - 00000000b7ee0000 (usable)
>> (XEN)  00000000b7ee0000 - 00000000b7ee3000 (ACPI NVS)
>> 
>> Based on that, we would determine that the "gap" between
>> 0x20000 -> 0xb7ee0 is not System RAM and try to assign it to
>> 1-1 mapping. This meant that later on when we setup the page tables
>> we would try to assign those regions to DOMID_IO and the
>> Xen hypervisor would fail such operation. This patch
>> guards against that and sets the "gap" to be after the first
>> non-RAM E820 region.
>
>This seems dodgy to me and makes assumptions about the sanity of the
>BIOS provided e820 maps. e.g. it's not impossible that there are
>systems
>out there with 2 or more little holes under 1M etc.
>
>The truncation (from 0xb7ee0000 to 0x20000000 in this case) happens in
>the dom0 kernel not the hypervisor right? So we can at least know that
>we've done it.
>
>Can we do the identity setup before that truncation happens? If not, can
>we not remember the untruncated map too and refer to it as necessary?
>One way of doing that might be to insert an e820 region
>covering the truncated region to identify it as such (perhaps
>E820_UNUSABLE?) or maybe integrating e.g. with the memblock
>reservations
>(or whatever the early enough allocator is).
>
>The scheme we have is that all pre-ballooned memory goes at the end of
>the e820 right, as opposed to allowing it to first fill truncated
>regions such as this? 
>
>Ian.
>
>> 
>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> ---
>>  arch/x86/xen/setup.c |   20 ++++++++++++++++++--
>>  1 files changed, 18 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
>> index c2a5b5f..5b2ae49 100644
>> --- a/arch/x86/xen/setup.c
>> +++ b/arch/x86/xen/setup.c
>> @@ -147,6 +147,7 @@ static unsigned long __init
>xen_set_identity(const struct e820map *e820)
>>  {
>>  	phys_addr_t last = xen_initial_domain() ? 0 : ISA_END_ADDRESS;
>>  	phys_addr_t start_pci = last;
>> +	phys_addr_t ram_end = last;
>>  	int i;
>>  	unsigned long identity = 0;
>>  
>> @@ -168,11 +169,26 @@ static unsigned long __init
>xen_set_identity(const struct e820map *e820)
>>  			if (start > start_pci)
>>  				identity += set_phys_range_identity(
>>  					PFN_UP(start_pci), PFN_DOWN(start));
>> -			start_pci = end;
>>  			/* Without saving 'last' we would gooble RAM too. */
>> -			last = end;
>> +			start_pci = last = ram_end = end;
>>  			continue;
>>  		}
>> +		/* Gap found right after the 1st RAM region. Skip over it.
>> +		 * Why? That is b/c if we pass in dom0_mem=max:512MB and
>> +		 * have in reality 1GB, the E820 is clipped at 512MB.
>> +		 * In xen_set_pte_init we end up calling xen_set_domain_pte
>> +		 * which asks Xen hypervisor to alter the ownership of the MFN
>> +		 * to DOMID_IO. We would try to set that on PFNs from 512MB
>> +		 * up to the next System RAM region (likely from 0x20000->
>> +		 * 0x100000). But changing the ownership on "real" RAM regions
>> +		 * will infuriate Xen hypervisor and we will fail (WARN).
>> +		 * So instead of trying to set IDENTITY mapping on the gap
>> +		 * between the System RAM and the first non-RAM E820 region
>> +		 * we start at the non-RAM E820 region.*/
>> +		if (ram_end && start >= ram_end) {
>> +			start_pci = start;
>> +			ram_end = 0;
>> +		}
>>  		start_pci = min(start, start_pci);
>>  		last = end;
>>  	}

-- 
Sent from my mobile phone.  Please pardon any lack of formatting.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 07/11] x86/setup: Consult the raw E820 for zero sized E820 RAM regions.
  2011-01-31 22:44 ` [PATCH 07/11] x86/setup: Consult the raw E820 for zero sized E820 RAM regions Konrad Rzeszutek Wilk
@ 2011-02-01 17:52   ` Stefano Stabellini
  2011-02-01 22:29     ` [Xen-devel] " Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 30+ messages in thread
From: Stefano Stabellini @ 2011-02-01 17:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, Xen-devel, konrad, jeremy, hpa, Stefano Stabellini,
	Ian Campbell

On Mon, 31 Jan 2011, Konrad Rzeszutek Wilk wrote:
> When the Xen hypervisor provides us with an E820, it can
> contain zero sized RAM regions. Those are entries that have
> been trimmed down due to the user utilizing the dom0_mem flag.
> 
> What it means is that there is RAM at those regions, and we
> should _not_ be considering those regions as 1-1 mapping.
> 
> This dom0_mem parameter changes a nice looking E820 like this:
> 
> Xen: 0000000000000000 - 000000000009d000 (usable)
> Xen: 000000000009d000 - 0000000000100000 (reserved)
> Xen: 0000000000100000 - 000000009cf67000 (usable)
> Xen: 000000009cf67000 - 000000009d103000 (ACPI NVS)
> Xen: 000000009d103000 - 000000009f6bd000 (usable)
> Xen: 000000009f6bd000 - 000000009f6bf000 (reserved)
> Xen: 000000009f6bf000 - 000000009f714000 (usable)
> 
> (wherein we would happily set 9d->100, 9cf67->9d103, and
> 9f6bd->9f6bf to identity mapping) .. but with a dom0_mem
> argument (say dom0_mem=700MB) it looks as so:
> 
> Xen: 0000000000000000 - 000000000009d000 (usable)
> Xen: 000000000009d000 - 0000000000100000 (reserved)
> Xen: 0000000000100000 - 000000002bc00000 (usable)
> Xen: 000000009cf67000 - 000000009d103000 (ACPI NVS)
> Xen: 000000009f6bd000 - 000000009f6bf000 (reserved)
> 
> We would set 9d->100, and 9cf670->9f6bf to identity
> mapping. The region from 9d103->9f6bd - which is
> System RAM where a guest could be allocated from,
> would be considered identity which is incorrect.
> 
> [Note: this printout of the E820 is after E820
> sanitization, the raw E820 would look like this]:
> 
> Xen: 0000000000000000 - 000000000009d000 (usable)
> Xen: 000000000009d000 - 0000000000100000 (reserved)
> Xen: 0000000000100000 - 000000002bc00000 (usable)
> Xen: 000000009cf67000 - 000000009d103000 (ACPI NVS)
> Xen: 000000009d103000 - 000000009d103000 (usable)  <===
> Xen: 000000009f6bd000 - 000000009f6bf000 (reserved)
> 
> [Notice the "usable" zero sized region]
> 
> This patch consults the non-sanitized version of the E820
> and checks if there are zero-sized RAM regions right before
> the non-RAM E820 entry we are currently evaluating.
> If so, we utilize the 'ram_end' value to piggyback on the
> code introduced by "xen/setup: Pay attention to zero sized
> E820_RAM regions" patch. Also we add a printk to help
> us determine which region has been set to 1-1 mapping and
> add some sanity checking.
> 
> We must keep those regions zero-size E820 RAM regions
> as is (so missing), otherwise the M2P override code can
> malfunction if a guest grant page is present in those regions.
> 
> Shifting the "xen_set_identity" to be called earlier (so that
> we are using the non-sanitized version of the &e820) does not
> work as we need to take into account the E820 after the
> initial increase/decrease reservation done and addition of a
> new E820 region in 'xen_add_extra_mem').

Can we just call two different functions, one before sanitizing the e820
and another after xen_add_extra_mem?
We could just go through the original e820 and set as identity all the
non-ram regions, after all we don't want to risk setting as identity
valid ram regions.
If the problem is caused by xen_memory_setup modifying the e820, maybe
we could avoid doing that, adding all the extra memory regions to the
balloon without moving them together to the end.
It is just that this code (and xen_memory_setup) looks a bit too
complicated.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 11/11] xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set..
  2011-01-31 22:44 ` [PATCH 11/11] xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set Konrad Rzeszutek Wilk
@ 2011-02-01 17:52   ` Stefano Stabellini
  2011-02-01 20:29       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 30+ messages in thread
From: Stefano Stabellini @ 2011-02-01 17:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, Xen-devel, konrad, jeremy, hpa, Stefano Stabellini,
	Ian Campbell

On Mon, 31 Jan 2011, Konrad Rzeszutek Wilk wrote:
> From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> 
> If we have the IDENTITY_FRAME bit set from the P2M, there
> is no point in looking up MFN in the M2P override. This is
> b/c the MFN is a physical MFN.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/include/asm/xen/page.h |    8 +++++++-
>  1 files changed, 7 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
> index ed46ec2..e6f7f37 100644
> --- a/arch/x86/include/asm/xen/page.h
> +++ b/arch/x86/include/asm/xen/page.h
> @@ -81,6 +81,7 @@ static inline int phys_to_machine_mapping_valid(unsigned long pfn)
>  static inline unsigned long mfn_to_pfn(unsigned long mfn)
>  {
>  	unsigned long pfn;
> +	unsigned long p2m_mfn;
>  
>  	if (xen_feature(XENFEAT_auto_translated_physmap))
>  		return mfn;
> @@ -102,7 +103,12 @@ try_override:
>  	 * doesn't map back to the mfn), then check the local override
>  	 * table to see if there's a better pfn to use.
>  	 */
> -	if (get_phys_to_machine(pfn) != mfn)
> +	p2m_mfn = get_phys_to_machine(pfn);
> +
> +	if (p2m_mfn == IDENTITY_FRAME(mfn))
> +		return pfn;
> +
> +	if (p2m_mfn != mfn)
>  		pfn = m2p_find_override_pfn(mfn, pfn);
>  
>  	return pfn;
 

I have been thinking some more about it and now I have few questions:

1) is it possible for a single domain to have a valid mfn with the same
number as an identity mfn (apart from the IDENTITY_FRAME bit)?

2) is it true that mfn_to_pfn should never be called passing an identity
mfn (because we set _PAGE_IOMAP)?

3) what is the value returned by m2p(identity_mfn)? Is it a correct pfn
or maybe something like 0xfffff or 0xeeeee?


These are my guessed answers:

1) yes, it is possible.
For example mfn=0xc0100 might be a valid ram mfn present in the mfn_list
of a domU and also be present as 1:1 mfn of the 3G-4G region.
For this reason I think we should look in m2p_override first and check
for possible identity mapping later.
We might want to avoid these situations but the only way I can see to do
it would be to make sure that the 1:1 regions are always subset of
the host reserved regions, even for domUs.

2) yes indeed.
One more reason to look in the m2p_override first.

3) the returned pfn might be 0xfffff or 0xeeeee.
We should use the mfn value directly as pfn value to check for possible
identity mappings.


The resulting patch looks like the following:

---


diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index ed46ec2..7f9bae2 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -80,6 +80,7 @@ static inline int phys_to_machine_mapping_valid(unsigned long pfn)
 
 static inline unsigned long mfn_to_pfn(unsigned long mfn)
 {
+	int ret = 0;
 	unsigned long pfn;
 
 	if (xen_feature(XENFEAT_auto_translated_physmap))
@@ -95,15 +96,21 @@ static inline unsigned long mfn_to_pfn(unsigned long mfn)
 	 * In such cases it doesn't matter what we return (we return garbage),
 	 * but we must handle the fault without crashing!
 	 */
-	__get_user(pfn, &machine_to_phys_mapping[mfn]);
+	ret = __get_user(pfn, &machine_to_phys_mapping[mfn]);
 try_override:
 	/*
 	 * If this appears to be a foreign mfn (because the pfn
 	 * doesn't map back to the mfn), then check the local override
 	 * table to see if there's a better pfn to use.
 	 */
-	if (get_phys_to_machine(pfn) != mfn)
-		pfn = m2p_find_override_pfn(mfn, pfn);
+	if (ret < 0)
+		pfn = ~0;
+	else if (get_phys_to_machine(pfn) != mfn)
+		pfn = m2p_find_override_pfn(mfn, ~0);
+
+	if (pfn == ~0 &&
+			get_phys_to_machine(mfn) == IDENTITY_FRAME(mfn))
+		pfn = mfn;
 
 	return pfn;
 }

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 11/11] xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set..
  2011-02-01 17:52   ` Stefano Stabellini
@ 2011-02-01 20:29       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-02-01 20:29 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: linux-kernel, Xen-devel, konrad, jeremy, hpa, Ian Campbell

On Tue, Feb 01, 2011 at 05:52:29PM +0000, Stefano Stabellini wrote:
> On Mon, 31 Jan 2011, Konrad Rzeszutek Wilk wrote:
> > From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > 
> > If we have the IDENTITY_FRAME bit set from the P2M, there
> > is no point in looking up MFN in the M2P override. This is
> > b/c the MFN is a physical MFN.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  arch/x86/include/asm/xen/page.h |    8 +++++++-
> >  1 files changed, 7 insertions(+), 1 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
> > index ed46ec2..e6f7f37 100644
> > --- a/arch/x86/include/asm/xen/page.h
> > +++ b/arch/x86/include/asm/xen/page.h
> > @@ -81,6 +81,7 @@ static inline int phys_to_machine_mapping_valid(unsigned long pfn)
> >  static inline unsigned long mfn_to_pfn(unsigned long mfn)
> >  {
> >  	unsigned long pfn;
> > +	unsigned long p2m_mfn;
> >  
> >  	if (xen_feature(XENFEAT_auto_translated_physmap))
> >  		return mfn;
> > @@ -102,7 +103,12 @@ try_override:
> >  	 * doesn't map back to the mfn), then check the local override
> >  	 * table to see if there's a better pfn to use.
> >  	 */
> > -	if (get_phys_to_machine(pfn) != mfn)
> > +	p2m_mfn = get_phys_to_machine(pfn);
> > +
> > +	if (p2m_mfn == IDENTITY_FRAME(mfn))
> > +		return pfn;
> > +
> > +	if (p2m_mfn != mfn)
> >  		pfn = m2p_find_override_pfn(mfn, pfn);
> >  
> >  	return pfn;
>  
> 
> I have been thinking some more about it and now I have few questions:
> 
> 1) is it possible for a single domain to have a valid mfn with the same
> number as an identity mfn (apart from the IDENTITY_FRAME bit)?

Yes.
> 
> 2) is it true that mfn_to_pfn should never be called passing an identity
> mfn (because we set _PAGE_IOMAP)?

Yes. And currently the code checks for _PAGE_IOMAP and bypasses the
M2P.

> 
> 3) what is the value returned by m2p(identity_mfn)? Is it a correct pfn
> or maybe something like 0xfffff or 0xeeeee?

0xFFFFF... or 0x5555555..
> 
> 
> These are my guessed answers:
> 
> 1) yes, it is possible.
> For example mfn=0xc0100 might be a valid ram mfn present in the mfn_list
> of a domU and also be present as 1:1 mfn of the 3G-4G region.

If we consider it valid, then it would be in the E820 as an E820_RAM
type. The xen_setup_identity code would skip over that region and not
mark it as IDENTITY.

Keep in mind the code skips over small/big E820_RAM regions even if
those regions have reserved E820 regions on both sides.

> For this reason I think we should look in m2p_override first and check
> for possible identity mapping later.
> We might want to avoid these situations but the only way I can see to do
> it would be to make sure that the 1:1 regions are always subset of
> the host reserved regions, even for domUs.

Right, and they are...
> 
> 2) yes indeed.
> One more reason to look in the m2p_override first.

Not sure I understand.
> 
> 3) the returned pfn might be 0xfffff or 0xeeeee.
> We should use the mfn value directly as pfn value to check for possible
> identity mappings.

Aren't we doing that via 'get_phys_to_machine'? It returns the value
and if it has the IDENTITY_FRAME_BIT set, it is an identity mapping.

Or are you thinking of abolishing the IDENTITY_FRAME_BIT and checking the
M2P in conjunction with the P2M to see if the mfn is a 1-1 mapping?

> 
> 
> The resulting patch looks like the following:
> 
> ---
> 
> 
> diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
> index ed46ec2..7f9bae2 100644
> --- a/arch/x86/include/asm/xen/page.h
> +++ b/arch/x86/include/asm/xen/page.h
> @@ -80,6 +80,7 @@ static inline int phys_to_machine_mapping_valid(unsigned long pfn)
>  
>  static inline unsigned long mfn_to_pfn(unsigned long mfn)
>  {
> +	int ret = 0;
>  	unsigned long pfn;
>  
>  	if (xen_feature(XENFEAT_auto_translated_physmap))
> @@ -95,15 +96,21 @@ static inline unsigned long mfn_to_pfn(unsigned long mfn)
>  	 * In such cases it doesn't matter what we return (we return garbage),
>  	 * but we must handle the fault without crashing!
>  	 */
> -	__get_user(pfn, &machine_to_phys_mapping[mfn]);
> +	ret = __get_user(pfn, &machine_to_phys_mapping[mfn]);
>  try_override:
>  	/*
>  	 * If this appears to be a foreign mfn (because the pfn
>  	 * doesn't map back to the mfn), then check the local override
>  	 * table to see if there's a better pfn to use.
>  	 */
> -	if (get_phys_to_machine(pfn) != mfn)
> -		pfn = m2p_find_override_pfn(mfn, pfn);
> +	if (ret < 0)
> +		pfn = ~0;
> +	else if (get_phys_to_machine(pfn) != mfn)
> +		pfn = m2p_find_override_pfn(mfn, ~0);
> +
> +	if (pfn == ~0 &&

You should also check for 0x55555... then.

> +			get_phys_to_machine(mfn) == IDENTITY_FRAME(mfn))
> +		pfn = mfn;

So for identity type mfns we end up calling 'get_phys_to_machine(mfn)' twice,
I think?

Would it make sense to save the result of 'get_phys_to_machine(mfn)'
from the first call?

>  
>  	return pfn;
>  }
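
For readers following along, a hypothetical sketch of how the whole helper
could look with Stefano's restructuring above plus these review notes folded
in (not necessarily what ends up being merged):

static inline unsigned long mfn_to_pfn(unsigned long mfn)
{
	int ret = 0;
	unsigned long pfn;

	if (xen_feature(XENFEAT_auto_translated_physmap))
		return mfn;

	/* Beyond the M2P table there is certainly no RAM that we own. */
	if (unlikely((mfn >> machine_to_phys_order) != 0)) {
		pfn = ~0;
		goto try_override;
	}
	pfn = 0;
	/*
	 * The array access can fail (e.g., device space beyond end of RAM);
	 * the fault must be handled without crashing.
	 */
	ret = __get_user(pfn, &machine_to_phys_mapping[mfn]);
try_override:
	if (ret < 0)
		pfn = ~0;
	else if (get_phys_to_machine(pfn) != mfn)
		pfn = m2p_find_override_pfn(mfn, ~0);

	/*
	 * No usable answer from the M2P or the override table: see whether
	 * the MFN is one of our identity (1-1) frames.  Per the comment
	 * above, the 0x5555... filler the M2P may hand back would want the
	 * same treatment as ~0 here.
	 */
	if (pfn == ~0UL && get_phys_to_machine(mfn) == IDENTITY_FRAME(mfn))
		pfn = mfn;

	return pfn;
}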

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 02/11] xen/mmu: Add the notion of identity (1-1) mapping.
  2011-01-31 22:44 ` [PATCH 02/11] xen/mmu: Add the notion of identity (1-1) mapping Konrad Rzeszutek Wilk
@ 2011-02-01 21:33   ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2011-02-01 21:33 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, Xen-devel, konrad, hpa, stefano.stabellini, Ian.Campbell

On 01/31/2011 02:44 PM, Konrad Rzeszutek Wilk wrote:
> Our P2M tree structure is a three-level. On the leaf nodes
> we set the Machine Frame Number (MFN) of the PFN. What this means
> is that when one does: pfn_to_mfn(pfn), which is used when creating
> PTE entries, you get the real MFN of the hardware. When Xen sets
> up a guest it initially populates a array which has descending
> (or ascending) MFN values, as so:
>
>  idx: 0,  1,       2
>  [0x290F, 0x290E, 0x290D, ..]
>
> so pfn_to_mfn(2)==0x290D. If you start, restart many guests that list
> starts looking quite random.
>
> We graft this structure on our P2M tree structure and stick in
> those MFN in the leafs. But for all other leaf entries, or for the top
> root, or middle one, for which there is a void entry, we assume it is
> "missing". So
>  pfn_to_mfn(0xc0000)=INVALID_P2M_ENTRY.
>
> We add the possibility of setting 1-1 mappings on certain regions, so
> that:
>  pfn_to_mfn(0xc0000)=0xc0000
>
> The benefit of this is, that we can assume for non-RAM regions (think
> PCI BARs, or ACPI spaces), we can create mappings easily b/c we
> get the PFN value to match the MFN.
>
> For this to work efficiently we introduce one new page p2m_identity and
> allocate (via reserved_brk) any other pages we need to cover the sides
> (1GB or 4MB boundary violations). All entries in p2m_identity are set to
> INVALID_P2M_ENTRY type (Xen toolstack only recognizes that and MFNs,
> no other fancy value).
>
> On lookup we spot that the entry points to p2m_identity and return the identity
> value instead of dereferencing and returning INVALID_P2M_ENTRY. If the entry
> points to an allocated page, we just proceed as before and return the PFN.
> If the PFN has IDENTITY_FRAME_BIT set we unmask that in appropriate functions
> (pfn_to_mfn).
>
> The reason for having the IDENTITY_FRAME_BIT instead of just returning the
> PFN is that we could find ourselves where pfn_to_mfn(pfn)==pfn for a
> non-identity pfn. To protect ourselves against we elect to set (and get) the
> IDENTITY_FRAME_BIT on all identity mapped PFNs.
>
> This simplistic diagram is used to explain the more subtle piece of code.
> There is also a digram of the P2M at the end that can help.
> Imagine your E820 looking as so:
>
>                    1GB                                           2GB
> /-------------------+---------\/----\         /----------\    /---+-----\
> | System RAM        | Sys RAM ||ACPI|         | reserved |    | Sys RAM |
> \-------------------+---------/\----/         \----------/    \---+-----/
>                               ^- 1029MB                       ^- 2001MB
>
> [1029MB = 263424 (0x40500), 2001MB = 512256 (0x7D100), 2048MB = 524288 (0x80000)]
>
> And dom0_mem=max:3GB,1GB is passed in to the guest, meaning memory past 1GB
> is actually not present (would have to kick the balloon driver to put it in).
>
> When we are told to set the PFNs for identity mapping (see patch: "xen/setup:
> Set identity mapping for non-RAM E820 and E820 gaps.") we pass in the start
> of the PFN and the end PFN (263424 and 512256 respectively). The first step is
> to reserve_brk a top leaf page if the p2m[1] is missing. The top leaf page
> covers 512^2 of page estate (1GB) and in case the start or end PFN is not
> aligned on 512^2*PAGE_SIZE (1GB) we loop on aligned 1GB PFNs from start pfn to
> end pfn.  We reserve_brk top leaf pages if they are missing (means they point
> to p2m_mid_missing).
>
> With the E820 example above, 263424 is not 1GB aligned so we allocate a
> reserve_brk page which will cover the PFNs estate from 0x40000 to 0x80000.
> Each entry in the allocate page is "missing" (points to p2m_missing).
>
> Next stage is to determine if we need to do a more granular boundary check
> on the 4MB (or 2MB depending on architecture) off the start and end pfn's.
> We check if the start pfn and end pfn violate that boundary check, and if
> so reserve_brk a middle (p2m[x][y]) leaf page. This way we have a much finer
> granularity of setting which PFNs are missing and which ones are identity.
> In our example 263424 and 512256 both fail the check so we reserve_brk two
> pages. Populate them with INVALID_P2M_ENTRY (so they both have "missing" values)
> and assign them to p2m[1][2] and p2m[1][488] respectively.
>
> At this point we would at minimum reserve_brk one page, but could be up to
> three. Each call to set_phys_range_identity has at maximum a three page
> cost. If we were to query the P2M at this stage, all those entries from
> start PFN through end PFN (so 1029MB -> 2001MB) would return INVALID_P2M_ENTRY
> ("missing").
>
> The next step is to walk from the start pfn to the end pfn setting
> the IDENTITY_FRAME_BIT on each PFN. This is done in '__set_phys_to_machine'.
> If we find that the middle leaf is pointing to p2m_missing we can swap it over
> to p2m_identity - this way covering 4MB (or 2MB) PFN space.  At this point we
> do not need to worry about boundary aligment (so no need to reserve_brk a middle
> page, figure out which PFNs are "missing" and which ones are identity), as that
> has been done earlier.  If we find that the middle leaf is not occupied by
> p2m_identity or p2m_missing, we dereference that page (which covers
> 512 PFNs) and set the appropriate PFN with IDENTITY_FRAME_BIT. In our example
> 263424 and 512256 end up there, and we set from p2m[1][2][256->511] and
> p2m[1][488][0->256] with IDENTITY_FRAME_BIT set.
>
> All other regions that are void (or not filled) either point to p2m_missing
> (considered missing) or have the default value of INVALID_P2M_ENTRY (also
> considered missing). In our case, p2m[1][2][0->255] and p2m[1][488][257->511]
> contain the INVALID_P2M_ENTRY value and are considered "missing."
>
> This is what the p2m ends up looking (for the E820 above) with this
> fabulous drawing:
>
>    p2m         /--------------\
>  /-----\       | &mfn_list[0],|                           /-----------------\
>  |  0  |------>| &mfn_list[1],|    /---------------\      | ~0, ~0, ..      |
>  |-----|       |  ..., ~0, ~0 |    | ~0, ~0, [x]---+----->| IDENTITY [@256] |
>  |  1  |---\   \--------------/    | [p2m_identity]+\     | IDENTITY [@257] |
>  |-----|    \                      | [p2m_identity]+\\    | ....            |
>  |  2  |--\  \-------------------->|  ...          | \\   \----------------/
>  |-----|   \                       \---------------/  \\
>  |  3  |\   \                                          \\  p2m_identity
>  |-----| \   \-------------------->/---------------\   /-----------------\
>  | ..  +->+                        | [p2m_identity]+-->| ~0, ~0, ~0, ... |
>  \-----/ /                         | [p2m_identity]+-->| ..., ~0         |
>         / /---------------\        | ....          |   \-----------------/
>        /  | IDENTITY[@0]  |      /-+-[x], ~0, ~0.. |
>       /   | IDENTITY[@256]|<----/  \---------------/
>      /    | ~0, ~0, ....  |
>     |     \---------------/
>     |
>     p2m_missing             p2m_missing
> /------------------\     /------------\
> | [p2m_mid_missing]+---->| ~0, ~0, ~0 |
> | [p2m_mid_missing]+---->| ..., ~0    |
> \------------------/     \------------/
>
> where ~0 is INVALID_P2M_ENTRY. IDENTITY is (PFN | IDENTITY_BIT)
>
> [v4: Squished patches in just this one]
> [v5: Changed code to use ranges, added ASCII art]
> [v6: Rebased on top of xen->p2m code split]
> [v7: Added RESERVE_BRK for potentially allocated pages]
> [v8: Fixed alignment problem]
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/include/asm/xen/page.h |    6 ++-
>  arch/x86/xen/p2m.c              |  109 ++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 112 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
> index 8ea9772..47c1b59 100644
> --- a/arch/x86/include/asm/xen/page.h
> +++ b/arch/x86/include/asm/xen/page.h
> @@ -30,7 +30,9 @@ typedef struct xpaddr {
>  /**** MACHINE <-> PHYSICAL CONVERSION MACROS ****/
>  #define INVALID_P2M_ENTRY	(~0UL)
>  #define FOREIGN_FRAME_BIT	(1UL<<31)
> +#define IDENTITY_FRAME_BIT	(1UL<<30)

These need to be BITS_PER_LONG-1 and -2.

    J
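
i.e., presumably something along these lines (a sketch only, not the
committed hunk), so the marker bits stay at the top of the word on both
32-bit and 64-bit builds:

	#define FOREIGN_FRAME_BIT	(1UL<<(BITS_PER_LONG-1))
	#define IDENTITY_FRAME_BIT	(1UL<<(BITS_PER_LONG-2))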

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 04/11] xen/mmu: BUG_ON when racing to swap middle leaf.
  2011-01-31 22:44 ` [PATCH 04/11] xen/mmu: BUG_ON when racing to swap middle leaf Konrad Rzeszutek Wilk
@ 2011-02-01 21:34   ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2011-02-01 21:34 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, Xen-devel, konrad, hpa, stefano.stabellini, Ian.Campbell

On 01/31/2011 02:44 PM, Konrad Rzeszutek Wilk wrote:
> The initial bootup code uses set_phys_to_machine quite a lot, and after
> bootup it would be used by the balloon driver. The balloon driver does have
> mutex lock so this should not be necessary - but just in case, add
> a BUG_ON if we do hit this scenario.
>
> [v2: Change from WARN to BUG_ON]
> [v3: Rebased on top of xen->p2m code split]
> Reviewed-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/xen/p2m.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> index 19b0a65..fbbd2ab 100644
> --- a/arch/x86/xen/p2m.c
> +++ b/arch/x86/xen/p2m.c
> @@ -478,7 +478,8 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
>  
>  		/* Swap over from MISSING to IDENTITY if needed. */
>  		if (p2m_top[topidx][mididx] == p2m_missing) {
> -			p2m_top[topidx][mididx] = p2m_identity;
> +			BUG_ON(cmpxchg(&p2m_top[topidx][mididx], p2m_missing,
> +				p2m_identity) != p2m_missing);

Don't put side-effects in BUG_ONs.  Why is it a bug anyway?

    J
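
As a sketch of that point (assuming the types used elsewhere in p2m.c, where
p2m_top[topidx][mididx], p2m_missing and p2m_identity are unsigned long *):
keep the swap outside the assertion and only test its result, e.g.

	/* Swap over from MISSING to IDENTITY if needed. */
	if (p2m_top[topidx][mididx] == p2m_missing) {
		unsigned long *prev;

		prev = cmpxchg(&p2m_top[topidx][mididx], p2m_missing,
			       p2m_identity);
		BUG_ON(prev != p2m_missing);
	}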

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/11] xen/setup: Skip over 1st gap after System RAM.
  2011-02-01 15:08   ` Ian Campbell
  2011-02-01 17:14       ` H. Peter Anvin
@ 2011-02-01 22:28     ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-02-01 22:28 UTC (permalink / raw)
  To: Ian Campbell
  Cc: linux-kernel, Xen-devel, konrad, jeremy, hpa, Stefano Stabellini

On Tue, Feb 01, 2011 at 03:08:16PM +0000, Ian Campbell wrote:
> On Mon, 2011-01-31 at 22:44 +0000, Konrad Rzeszutek Wilk wrote:
> > If the kernel is booted with dom0_mem=max:512MB and the
> > machine has more than 512MB of RAM, the E820 we get is:
> > 
> > Xen: 0000000000100000 - 0000000020000000 (usable)
> > Xen: 00000000b7ee0000 - 00000000b7ee3000 (ACPI NVS)
> > 
> > while in actuality it is:
> > 
> > (XEN)  0000000000100000 - 00000000b7ee0000 (usable)
> > (XEN)  00000000b7ee0000 - 00000000b7ee3000 (ACPI NVS)
> > 
> > Based on that, we would determine that the "gap" between
> > 0x20000 -> 0xb7ee0 is not System RAM and try to assign it to
> > 1-1 mapping. This meant that later on when we setup the page tables
> > we would try to assign those regions to DOMID_IO and the
> > Xen hypervisor would fail such operation. This patch
> > guards against that and sets the "gap" to be after the first
> > non-RAM E820 region.
> 
> This seems dodgy to me and makes assumptions about the sanity of the
> BIOS provided e820 maps. e.g. it's not impossible that there are systems
> out there with 2 or more little holes under 1M etc.

[edit: I am dropping this patch.. explanation at the end of the email]

Are you thinking of something like this:

   000->a00 [RAM]
   b00->c00 [reserved]
   dff->f00 [RAM]

So there is a gap between a00->b00, and c00->dff, which is not System RAM
but real PCI space and as such should be considered an identity mapping?
(Let's ignore the fact that we consider any access under ISA_END_ADDRESS
to be identity.)

The hypervisor is the one that truncates the E820_RAM, so it is dangerous
to consider the gap after the end of E820_RAM to be not System RAM.
(You were actually hitting this way back when you ran my patchset the first time.)
[edit: actually I was mistaken, read below..]



> 
> The truncation (from 0xb7ee0000 to 0x20000000 in this case) happens in
> the dom0 kernel not the hypervisor right? So we can at least know that
> we've done it.

The hypervisor does this. [edit: I missed this code:

280                 if (map[i].type == E820_RAM && end > mem_end) {
281                         /* RAM off the end - may be partially included */
282                         u64 delta = min(map[i].size, end - mem_end);
283                         

which shows that we, dom0, truncate the E820 entries.]

> 
> Can we do the identity setup before that truncation happens? If not can

Sadly no. [edit: happily yes]

There are two types of truncation:

1). The hypervisor truncates the E820_RAM to the proper PFN number. If
there are E820_RAM regions past the first one (see the
example in "x86/setup: Consult the raw E820 for zero sized E820 RAM regions.")
it makes the size of those E820_RAM regions zero. The patch I mentioned
consults the 'raw' E820 to see if this exists.
[edit: ignore this please, the code in xen_memory_setup does the truncation]


2). The Linux kernel e820 library "sanitizes" the E820. This means
if you have E820_RAM regions with zero size they disappear. We can't
use the E820 before this sanitization b/c the increase/decrease code has to
run its course to fill up the E820_RAM past the 4GB.

> can we not remember the untruncated map too and refer to it as
> necessary. One way of doing that might be to insert an e820 region
> covering the truncated region to identify it as such (perhaps
> E820_UNUSABLE?) or maybe integrating e.g. with the memblock reservations
> (or whatever the early enough allocator is).
> 
> The scheme we have is that all pre-ballooned memory goes at the end of
> the e820 right, as opposed to allowing it to first fill truncated
> regions such as this? 

Correct. We siphon out any System RAM under 4GB that can
be siphoned out and expand the E820_RAM entry past 4GB by the count
of PFNs that we siphoned out.


[edit: reason why I am dropping this patch]

Your email got me thinking that maybe I missed something about the E820
and sure enough - I somehow skipped that whole process of figuring
out the delta and messing with e820->size. The weird part is
I knew about increase/decrease memory and the delta but somehow did not
connect that those numbers are gathered during the first loop over the E820.
Talk about tunnel vision.

Anyhow, your suggestion of referring to the raw, unmodified version of
the E820 makes this all work quite nicely and I can drop:

xen/setup: Skip over 1st gap after System RAM
x86/setup: Consult the raw E820 for zero sized E820 RAM regions.

I am attaching the patch I am talking about to the "xen/setup: Set
identity mapping for non-RAM E820 and E820 gaps" thread.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] Re: [PATCH 07/11] x86/setup: Consult the raw E820 for zero sized E820 RAM regions.
  2011-02-01 17:52   ` Stefano Stabellini
@ 2011-02-01 22:29     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-02-01 22:29 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: jeremy, Xen-devel, linux-kernel, Ian Campbell, konrad, hpa

On Tue, Feb 01, 2011 at 05:52:15PM +0000, Stefano Stabellini wrote:
> On Mon, 31 Jan 2011, Konrad Rzeszutek Wilk wrote:
> > When the Xen hypervisor provides us with an E820, it can
> > contain zero sized RAM regions. Those are entries that have
> > been trimmed down due to the user utilizing the dom0_mem flag.
> > 
> > What it means is that there is RAM at those regions, and we
> > should _not_ be considering those regions as 1-1 mapping.
> > 
> > This dom0_mem parameter changes a nice looking E820 like this:
> > 
> > Xen: 0000000000000000 - 000000000009d000 (usable)
> > Xen: 000000000009d000 - 0000000000100000 (reserved)
> > Xen: 0000000000100000 - 000000009cf67000 (usable)
> > Xen: 000000009cf67000 - 000000009d103000 (ACPI NVS)
> > Xen: 000000009d103000 - 000000009f6bd000 (usable)
> > Xen: 000000009f6bd000 - 000000009f6bf000 (reserved)
> > Xen: 000000009f6bf000 - 000000009f714000 (usable)
> > 
> > (wherein we would happily set 9d->100, 9cf67->9d103, and
> > 9f6bd->9f6bf to identity mapping) .. but with a dom0_mem
> > argument (say dom0_mem=700MB) it looks as so:
> > 
> > Xen: 0000000000000000 - 000000000009d000 (usable)
> > Xen: 000000000009d000 - 0000000000100000 (reserved)
> > Xen: 0000000000100000 - 000000002bc00000 (usable)
> > Xen: 000000009cf67000 - 000000009d103000 (ACPI NVS)
> > Xen: 000000009f6bd000 - 000000009f6bf000 (reserved)
> > 
> > We would set 9d->100, and 9cf670->9f6bf to identity
> > mapping. The region from 9d103->9f6bd - which is
> > System RAM where a guest could be allocated from,
> > would be considered identity which is incorrect.
> > 
> > [Note: this printout of the E820 is after E820
> > sanitization, the raw E820 would look like this]:
> > 
> > Xen: 0000000000000000 - 000000000009d000 (usable)
> > Xen: 000000000009d000 - 0000000000100000 (reserved)
> > Xen: 0000000000100000 - 000000002bc00000 (usable)
> > Xen: 000000009cf67000 - 000000009d103000 (ACPI NVS)
> > Xen: 000000009d103000 - 000000009d103000 (usable)  <===
> > Xen: 000000009f6bd000 - 000000009f6bf000 (reserved)
> > 
> > [Notice the "usable" zero sized region]
> > 
> > This patch consults the non-sanitized version of the E820
> > and checks if there are zero-sized RAM regions right before
> > the non-RAM E820 entry we are currently evaluating.
> > If so, we utilize the 'ram_end' value to piggyback on the
> > code introduced by "xen/setup: Pay attention to zero sized
> > E820_RAM regions" patch. Also we add a printk to help
> > us determine which region has been set to 1-1 mapping and
> > add some sanity checking.
> > 
> > We must keep those regions zero-size E820 RAM regions
> > as is (so missing), otherwise the M2P override code can
> > malfunction if a guest grant page is present in those regions.
> > 
> > Shifting the "xen_set_identity" to be called earlier (so that
> > we are using the non-sanitized version of the &e820) does not
> > work as we need to take into account the E820 after the
> > initial increase/decrease reservation done and addition of a
> > new E820 region in 'xen_add_extra_mem').
> 
> Can we just call two different functions, one before sanitizing the e820
> and another after xen_add_extra_mem?
> We could just go through the original e820 and set as identity all the
> non-ram regions, after all we don't want to risk setting as identity
> valid ram regions.
> If the problem is caused by xen_memory_setup modifying the e820, maybe
> we could avoid doing that, adding all the extra memory regions to the
> balloon without moving them together to the end.
> It is just that this code (and xen_memory_setup) looks a bit too
> complicated.

Another idea occurred to me while I was digesting Ian's comments,
and that is just to parse the "raw" E820 from the hypervisor. It works
quite nicely, so I am dropping this patch.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 05/11] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.
  2011-01-31 22:44 ` [PATCH 05/11] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps Konrad Rzeszutek Wilk
@ 2011-02-01 22:32     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-02-01 22:32 UTC (permalink / raw)
  To: linux-kernel, Xen-devel, konrad, jeremy
  Cc: hpa, stefano.stabellini, Ian.Campbell

On Mon, Jan 31, 2011 at 05:44:30PM -0500, Konrad Rzeszutek Wilk wrote:
> We walk the E820 region and start at 0 (for PV guests we start
> at ISA_END_ADDRESS) and skip any E820 RAM regions. For all other
> regions and as well the gaps we set them to be identity mappings.
> 
> The reasons we do not want to set the identity mapping from 0->
> ISA_END_ADDRESS when running as PV is b/c that the kernel would
> try to read DMI information and fail (no permissions to read that).
> There is a lot of gnarly code to deal with that weird region so
> we won't try to do a cleanup in this patch.
> 
> This code ends up calling 'set_phys_to_identity' with the start
> and end PFN of the the E820 that are non-RAM or have gaps.
> On 99% of machines that means one big region right underneath the
> 4GB mark. Usually starts at 0xc0000 (or 0x80000) and goes to
> 0x100000.
> 

Please consider this one instead. The change is that we 
use the unmodified E820 retrieved from the hypervisor. This E820
has no changes to the size of the System RAM (which were throwing off my
ranges earlier):

commit acaf45c9c7d1b0d87c047590b9bffa0d8a30cbee
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Tue Feb 1 17:15:30 2011 -0500

    xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.
    
    We walk the E820 map and start at 0 (for PV guests we start
    at ISA_END_ADDRESS) and skip any E820 RAM regions. For all other
    regions, as well as the gaps, we set them to be identity mappings.
    
    The reason we do not want to set the identity mapping from 0->
    ISA_END_ADDRESS when running as PV is b/c the kernel would
    try to read DMI information and fail (no permissions to read that).
    There is a lot of gnarly code to deal with that weird region so
    we won't try to do a cleanup in this patch.
    
    This code ends up calling 'set_phys_range_identity' with the start
    and end PFN of the E820 regions that are non-RAM or have gaps.
    On 99% of machines that means one big region right underneath the
    4GB mark. It usually starts at 0xc0000 (or 0x80000) and goes to
    0x100000.
    
    [v2: Fix for E820 crossing 1MB region and clamp the start]
    [v3: Squshed in code that does this over ranges]
    [v4: Moved the comment to the correct spot]
    [v5: Use the "raw" E820 from the hypervisor]
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 7201800..54d9379 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -143,12 +143,55 @@ static unsigned long __init xen_return_unused_memory(unsigned long max_pfn,
 	return released;
 }
 
+static unsigned long __init xen_set_identity(const struct e820entry *list,
+					     ssize_t map_size)
+{
+	phys_addr_t last = xen_initial_domain() ? 0 : ISA_END_ADDRESS;
+	phys_addr_t start_pci = last;
+	const struct e820entry *entry;
+	unsigned long identity = 0;
+	int i;
+
+	for (i = 0, entry = list; i < map_size; i++, entry++) {
+		phys_addr_t start = entry->addr;
+		phys_addr_t end = start + entry->size;
+
+		if (start < last)
+			start = last;
+
+		if (end <= start)
+			continue;
+
+		/* Skip over the 1MB region. */
+		if (last > end)
+			continue;
+
+		if (entry->type == E820_RAM) {
+			if (start > start_pci)
+				identity += set_phys_range_identity(
+						PFN_UP(start_pci), PFN_DOWN(start));
+
+		/* Without saving 'last' we would gobble RAM too
+			 * at the end of the loop. */
+			last = end;
+			start_pci = end;
+			continue;
+		}
+		start_pci = min(start, start_pci);
+		last = end;
+	}
+	if (last > start_pci)
+		identity += set_phys_range_identity(
+					PFN_UP(start_pci), PFN_DOWN(last));
+	return identity;
+}
 /**
  * machine_specific_memory_setup - Hook for machine specific memory setup.
  **/
 char * __init xen_memory_setup(void)
 {
 	static struct e820entry map[E820MAX] __initdata;
+	static struct e820entry map_raw[E820MAX] __initdata;
 
 	unsigned long max_pfn = xen_start_info->nr_pages;
 	unsigned long long mem_end;
@@ -156,6 +199,7 @@ char * __init xen_memory_setup(void)
 	struct xen_memory_map memmap;
 	unsigned long extra_pages = 0;
 	unsigned long extra_limit;
+	unsigned long identity_pages = 0;
 	int i;
 	int op;
 
@@ -181,6 +225,7 @@ char * __init xen_memory_setup(void)
 	}
 	BUG_ON(rc);
 
+	memcpy(map_raw, map, sizeof(map));
 	e820.nr_map = 0;
 	xen_extra_mem_start = mem_end;
 	for (i = 0; i < memmap.nr_entries; i++) {
@@ -251,6 +296,13 @@ char * __init xen_memory_setup(void)
 
 	xen_add_extra_mem(extra_pages);
 
+	/*
+	 * Set P2M for all non-RAM pages and E820 gaps to be identity
+	 * type PFNs. We supply it with the non-sanitized version
+	 * of the E820.
+	 */
+	identity_pages = xen_set_identity(map_raw, memmap.nr_entries);
+	printk(KERN_INFO "Set %ld page(s) to 1-1 mapping.\n", identity_pages);
 	return "Xen";
 }
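
For illustration only (not part of the patch): a minimal standalone sketch
of what the xen_set_identity() walk above would mark as identity, run over
a made-up raw E820. PFN_UP/PFN_DOWN, E820_RAM and the entry layout are
re-declared locally so this builds as a plain 64-bit userspace program.

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PFN_UP(x)	(((x) + (1ULL << PAGE_SHIFT) - 1) >> PAGE_SHIFT)
#define PFN_DOWN(x)	((x) >> PAGE_SHIFT)
#define E820_RAM	1
#define ISA_END_ADDRESS	0x100000

struct fake_e820entry {
	uint64_t addr, size;
	int type;
};

/* Made-up map: RAM up to 3GB, an ACPI blob at 3GB, a gap (PCI BARs), RAM above 4GB. */
static const struct fake_e820entry e820[] = {
	{ 0x000000000ULL, 0x0c0000000ULL, E820_RAM },	/* 0GB - 3GB  RAM  */
	{ 0x0c0000000ULL, 0x000100000ULL, 3 },		/* 3GB        ACPI */
	{ 0x100000000ULL, 0x040000000ULL, E820_RAM },	/* 4GB - 5GB  RAM  */
};

static unsigned long set_phys_range_identity(unsigned long s, unsigned long e)
{
	printf("identity: pfn 0x%lx - 0x%lx\n", s, e);
	return e - s;
}

int main(void)
{
	uint64_t last = ISA_END_ADDRESS;	/* domU case; dom0 would start at 0 */
	uint64_t start_pci = last;
	unsigned long identity = 0;
	unsigned int i;

	for (i = 0; i < sizeof(e820) / sizeof(e820[0]); i++) {
		uint64_t start = e820[i].addr;
		uint64_t end = start + e820[i].size;

		if (start < last)
			start = last;
		if (end <= start)
			continue;
		if (last > end)
			continue;

		if (e820[i].type == E820_RAM) {
			if (start > start_pci)
				identity += set_phys_range_identity(
						PFN_UP(start_pci), PFN_DOWN(start));
			last = end;
			start_pci = end;
			continue;
		}
		start_pci = start < start_pci ? start : start_pci;
		last = end;
	}
	if (last > start_pci)
		identity += set_phys_range_identity(PFN_UP(start_pci), PFN_DOWN(last));
	printf("Set %lu page(s) to 1-1 mapping.\n", identity);
	return 0;
}

With this map it prints a single range, pfn 0xc0000 - 0x100000: the ACPI
region plus the following gap coalesced into one block right under 4GB.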
 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 11/11] xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set..
  2011-02-01 20:29       ` Konrad Rzeszutek Wilk
  (?)
@ 2011-02-02 11:52       ` Stefano Stabellini
  2011-02-02 16:43           ` Konrad Rzeszutek Wilk
  -1 siblings, 1 reply; 30+ messages in thread
From: Stefano Stabellini @ 2011-02-02 11:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, linux-kernel, Xen-devel, konrad, jeremy, hpa,
	Ian Campbell

On Tue, 1 Feb 2011, Konrad Rzeszutek Wilk wrote:
> > I have been thinking some more about it and now I have few questions:
> > 
> > 1) is it possible for a single domain to have a valid mfn with the same
> > number as an identity mfn (apart from the IDENTITY_FRAME bit)?
> 
> Yes.
> > 
> > 2) is it true that mfn_to_pfn should never be called passing an identity
> > mfn (because we set _PAGE_IOMAP)?
> 
> Yes. And currently the code checks for _PAGE_IOMAP and bypasses the
> M2P.
> 
> > 
> > 3) what is the value returned by m2p(identity_mfn)? Is it a correct pfn
> > or may be something like 0xfffff or 0xeeeee?
> 
> 0xFFFFF... or 0x5555555..
> > 
> > 
> > These are my guessed answers:
> > 
> > 1) yes, it is possible.
> > For example mfn=0xc0100 might be a valid ram mfn present in the mfn_list
> > of a domU and also be present as 1:1 mfn of the 3G-4G region.
> 
> If we consider it valid, then it would be in the E820 as an E820_RAM
> type. The xen_set_identity code would skip over that region and not
> mark it as IDENTITY.
> 
> Keep in mind the code skips over small/big E820_RAM regions even if
> those regions have reserved E820 regions on both sides.
> 
> > For this reason I think we should look in m2p_override first and check
> > for possible identity mapping later.
> > We might want to avoid these situations but the only way I can see to do
> > it would be to make sure that the 1:1 regions are always a subset of
> > the host reserved regions, even for domUs.
> 
> Right, and they are...
> > 
> > 2) yes indeed.
> > One more reason to look in the m2p_override first.
> 
> Not sure I understand.

Considering that getting into this function with an identity mfn passed as
argument shouldn't happen at all, and considering that there are cases
in which the same mfn can correspond to a normal mapping, a 1:1 mapping,
or even a granted page, it is a good idea to check the m2p_override
*before* checking whether the mfn is an identity mfn.
That way, if there are two identical mfns, one granted and the other
belonging to an identity mapping, we return the pfn corresponding to the
granted mfn. That is the right answer because we shouldn't even be here
if the argument was an identity mfn.


> > 3) the returned pfn might be 0xfffff or 0xeeeee.
> > We should use the mfn value directly as pfn value to check for possible
> > identity mappings.
> 
> Aren't we doing that via 'get_phys_to_machine' ? It returns the value
> and if it has the IDENTITY_FRAME_BIT it is an identity.
> 
> Or are you thinking of abolishing the IDENTITY_FRAME_BIT and check the
> M2P in conjunction with the P2M to see if the mfn is a 1-1 mapping?
> 

Not at all.
The problem is the following:

pfn = m2p(mfn);

Let's suppose that pfn is garbage (0xFFFFF.. or 0x55555..); in this case
get_phys_to_machine(pfn) will return INVALID_P2M_ENTRY, which is not what
we want.
So we check for

get_phys_to_machine(mfn) == IDENTITY_FRAME(mfn)
                    ---
If it is true then it means that the mfn passed as argument belongs to
an identity mapping.



> > 
> > The resulting patch looks like the following:
> > 
> > ---
> > 
> > 
> > diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
> > index ed46ec2..7f9bae2 100644
> > --- a/arch/x86/include/asm/xen/page.h
> > +++ b/arch/x86/include/asm/xen/page.h
> > @@ -80,6 +80,7 @@ static inline int phys_to_machine_mapping_valid(unsigned long pfn)
> >  
> >  static inline unsigned long mfn_to_pfn(unsigned long mfn)
> >  {
> > +	int ret = 0;
> >  	unsigned long pfn;
> >  
> >  	if (xen_feature(XENFEAT_auto_translated_physmap))
> > @@ -95,15 +96,21 @@ static inline unsigned long mfn_to_pfn(unsigned long mfn)
> >  	 * In such cases it doesn't matter what we return (we return garbage),
> >  	 * but we must handle the fault without crashing!
> >  	 */
> > -	__get_user(pfn, &machine_to_phys_mapping[mfn]);
> > +	ret = __get_user(pfn, &machine_to_phys_mapping[mfn]);
> >  try_override:
> >  	/*
> >  	 * If this appears to be a foreign mfn (because the pfn
> >  	 * doesn't map back to the mfn), then check the local override
> >  	 * table to see if there's a better pfn to use.
> >  	 */
> > -	if (get_phys_to_machine(pfn) != mfn)
> > -		pfn = m2p_find_override_pfn(mfn, pfn);
> > +	if (ret < 0)
> > +		pfn = ~0;
> > +	else if (get_phys_to_machine(pfn) != mfn)
> > +		pfn = m2p_find_override_pfn(mfn, ~0);
> > +
> > +	if (pfn == ~0 &&
> 
> You should also check for 0x55555... then.

That is not necessary because pfn == ~0 at this point; this is the code
path:

ret = __get_user(pfn, &machine_to_phys_mapping[mfn]); /* ret is >= 0 and
                                                       pfn is 0x55555.. */
get_phys_to_machine(0x55555) != mfn /* true, I am assuming there are no
                                     valid p2m mappings at 0x55555.. */
pfn = m2p_find_override_pfn(mfn, ~0); /* the m2p_override doesn't find
                                         anything so it returns ~0 (the
                                         second argument), pfn == ~0 now */
if (pfn == ~0 && /* true */


maybe I should add a comment (and drink less caffeine)?


> > +			get_phys_to_machine(mfn) == IDENTITY_FRAME(mfn))
> > +		pfn = mfn;
> 
> So for identity type mfns we end up calling 'get_phys_to_machine(mfn)' twice
> I think?
> 
> Would it make sense to save the result of 'get_phys_to_machine(mfn)'
> the first call?

the first call is get_phys_to_machine(pfn), with pfn potentially
garbage; this call is get_phys_to_machine(mfn).

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] Re: [PATCH 11/11] xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set..
  2011-02-02 11:52       ` Stefano Stabellini
@ 2011-02-02 16:43           ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-02-02 16:43 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: jeremy, Xen-devel, linux-kernel, Ian Campbell, konrad, hpa

> > Not sure I understand.
> 
> Considering that getting in this function with an identity mfn passed as
> argument shouldn't happen at all, and considering that there are cases

True, unless a developer wants to check that p2m(m2p(mfn))=mfn and 
supplies the MFNs that cover the PCI BAR spaces.. but that is mostly
done when trying to troubleshoot something and normal folks won't do this.
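
(A minimal sketch of such a debug-only round-trip check, with a made-up
helper name and assuming asm/xen/page.h is already included; not part of
this series:)

/* Walk a suspected identity (e.g. PCI BAR) MFN range and verify that
 * p2m(m2p(mfn)) == mfn.  Purely for troubleshooting. */
static void check_identity_roundtrip(unsigned long start_mfn,
				     unsigned long end_mfn)
{
	unsigned long mfn;

	for (mfn = start_mfn; mfn < end_mfn; mfn++) {
		unsigned long pfn = mfn_to_pfn(mfn);

		if (pfn_to_mfn(pfn) != mfn)
			printk(KERN_WARNING "p2m(m2p(%lx)) = %lx, not 1-1\n",
			       mfn, pfn_to_mfn(pfn));
	}
}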


> in which the same mfn can correspond to a normal mapping, a 1:1 mapping
> or even a granted page, it is a good idea to check the m2p_override
> *before* checking if the mfn is an identity mfn.
> So that if there are two identical mfns, one granted and the other
> one belonging to an identity mapping, we would return the pfn

How can you have that? Are you passing through a PCI device to
a PV domain, making the PV domain have the netback/blkback device,
and then granting those pages to another domain?

> corresponding to the granted mfn, that is the right answer because we
> shouldn't even be here if the argument was an identity mfn.
> 
> 
> > > 3) the returned pfn might be 0xfffff or 0xeeeee.

From the machine_to_phys_mapping array. OK.
> > > We should use the mfn value directly as pfn value to check for possible
> > > identity mappings.
> > 
> > Aren't we doing that via 'get_phys_to_machine' ? It returns the value
> > and if it has the IDENTITY_FRAME_BIT it is an identity.
> > 
> > Or are you thinking of abolishing the IDENTITY_FRAME_BIT and check the
> > M2P in conjunction with the P2M to see if the mfn is a 1-1 mapping?
> > 
> 
> Not at all.
> The problem is the following:
> 
> pfn = m2p(mfn);

You are assuming that the 'mfn' is outside the scope of our machine_to_phys array
or that it is "garbage", such as 0xdeadbeef for example, I think.

In which case the pfn value is either 0xdeadbeef or INVALID_P2M_ENTRY.

If you provide that to get_phys_to_machine(0xdeadbeef) you get INVALID_P2M_ENTRY,
and you would call:
  pfn = m2p_find_override_pfn(mfn, pfn);

So we still end up checking the M2P override. If the pfn supplied
was INVALID_P2M_ENTRY, you would get the same value and still check the M2P override.


> 
> let's suppose that pfn is garbage (0xFFFFF.. or 0x55555..), in this case

0xFFFF or 0x5555 are not garbage. They are valid values. Garbage is the
value zero. 0xFFFF.. has two meanings: it is a missing page that can be
ballooned in, or it belongs to DOMID_IO. The 0x5555.. means it is
a shared DOMID_IO page.

> get_phys_to_machine(pfn) will return INVALID_P2M_ENTRY, which is not what
> we want.
> So we check for
> 
> get_phys_to_machine(mfn) == IDENTITY_FRAME(mfn)
>                     ---
> If it is true then it means that the mfn passed as argument belongs to
> an identity mapping.

<nods> Yup.
> 
> 
> 
> > > 
> > > The resulting patch looks like the following:
> > > 
> > > ---
> > > 
> > > 
> > > diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
> > > index ed46ec2..7f9bae2 100644
> > > --- a/arch/x86/include/asm/xen/page.h
> > > +++ b/arch/x86/include/asm/xen/page.h
> > > @@ -80,6 +80,7 @@ static inline int phys_to_machine_mapping_valid(unsigned long pfn)
> > >  
> > >  static inline unsigned long mfn_to_pfn(unsigned long mfn)
> > >  {
> > > +	int ret = 0;
> > >  	unsigned long pfn;
> > >  
> > >  	if (xen_feature(XENFEAT_auto_translated_physmap))
> > > @@ -95,15 +96,21 @@ static inline unsigned long mfn_to_pfn(unsigned long mfn)
> > >  	 * In such cases it doesn't matter what we return (we return garbage),
> > >  	 * but we must handle the fault without crashing!
> > >  	 */
> > > -	__get_user(pfn, &machine_to_phys_mapping[mfn]);
> > > +	ret = __get_user(pfn, &machine_to_phys_mapping[mfn]);
> > >  try_override:
> > >  	/*
> > >  	 * If this appears to be a foreign mfn (because the pfn
> > >  	 * doesn't map back to the mfn), then check the local override
> > >  	 * table to see if there's a better pfn to use.
> > >  	 */
> > > -	if (get_phys_to_machine(pfn) != mfn)
> > > -		pfn = m2p_find_override_pfn(mfn, pfn);
> > > +	if (ret < 0)
> > > +		pfn = ~0;
> > > +	else if (get_phys_to_machine(pfn) != mfn)
> > > +		pfn = m2p_find_override_pfn(mfn, ~0);
> > > +
> > > +	if (pfn == ~0 &&
> > 
> > You should also check for 0x55555... then.
> 
> that is not necessary because pfn == ~0 at this point, this is the code
> path:
> 
> ret = __get_user(pfn, &machine_to_phys_mapping[mfn]); /* ret is >= 0 and
>                                                        pfn is 0x55555.. */
> get_phys_to_machine(0x55555) != mfn /* true, I am assuming there are no
>                                      valid p2m mappings at 0x55555.. */

> pfn = m2p_find_override_pfn(mfn, ~0); /* the m2p_override doesn't find
>                                          anything so it returns ~0 (the
>                                          second argument), pfn == ~0 now */
> if (pfn == ~0 && /* true */
> 
> 
> maybe I should add a comment (and drink less caffeine)?

The first time I saw the patch I missed that you passed in 'mfn' to
the second get_phys_to_machine .. Comments are good here I think.
> 
> 
> > > +			get_phys_to_machine(mfn) == IDENTITY_FRAME(mfn))
> > > +		pfn = mfn;
> > 
> > So for identity type mfns we end up calling 'get_phys_to_machine(mfn)' twice
> > I think?
> > 
> > Would it make sense to save the result of 'get_phys_to_machine(mfn)'
> > the first call?
> 
> the first call is get_phys_to_machine(pfn), with pfn potentially
> garbage; this call is get_phys_to_machine(mfn).

I think what you are telling me is that it is pointless to check for
the IDENTITY_FRAME b/c it won't happen often, or at all.

So moving the code so that we do the hot paths first makes more
sense, and we should structure the code that way, right?

I agree with that sentiment; do you want to prep another version of this
patch with some more comments?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] Re: [PATCH 11/11] xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set..
  2011-02-02 16:43           ` Konrad Rzeszutek Wilk
@ 2011-02-02 18:32             ` Stefano Stabellini
  -1 siblings, 0 replies; 30+ messages in thread
From: Stefano Stabellini @ 2011-02-02 18:32 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, jeremy, Xen-devel, linux-kernel,
	Ian Campbell, konrad, hpa

On Wed, 2 Feb 2011, Konrad Rzeszutek Wilk wrote:
> > in which the same mfn can correspond to a normal mapping, a 1:1 mapping
> > or even a granted page, it is a good idea to check the m2p_override
> > *before* checking if the mfn is an identity mfn.
> > So that if there are two identical mfns, one granted and the other
> > one belonging to an identity mapping, we would return the pfn
> 
> How can you have that? Are you passing through a PCI device to
> a PV domain, making the PV domain have the netback/blkback device,
> and then granting those pages to another domain?

It is a hypothetical scenario: we have two domUs, dA and dB; dA grants a
page P (the corresponding mfn will be called mfn_P) to dB.  mfn_P
happens to be normal valid RAM in dA, but in dB another mfn called mfn_Q
belonging to a 1:1 region exists such that mfn_P == mfn_Q.
I think this scenario is possible; we just need two domUs with different
e820s.


> > let's suppose that pfn is garbage (0xFFFFF.. or 0x55555..), in this case
> 
> 0xFFFF or 0x5555 are not garbage. They are valid values. Garbage is the
> value zero. 0xFFFF.. has two meanings: It is a missing page that can be
> balloon-ed in, or it belongs to the DOMID_IO. The 0x5555.. means it is
> a shared DOMID_IO page.

I see, this is interesting.


> > ret = __get_user(pfn, &machine_to_phys_mapping[mfn]); /* ret is >= 0 and
> >                                                        pfn is 0x55555.. */
> > get_phys_to_machine(0x55555) != mfn /* true, I am assuming there are no
> >                                      valid p2m mappings at 0x55555.. */
> 
> > pfn = m2p_find_override_pfn(mfn, ~0); /* the m2p_ovverride doesn't find
> >                                          anything so it returns ~0 (the
> >                                          second argument), pfn == ~0 now */
> > if (pfn == ~0 && /* true */
> > 
> > 
> > maybe I should add a comment (and drink less caffeine)?
> 
> The first time I saw the patch I missed that you passed in 'mfn' to
> the second get_phys_to_machine .. Comments are good here I think.

Will do.


> > 
> > 
> > > > +			get_phys_to_machine(mfn) == IDENTITY_FRAME(mfn))
> > > > +		pfn = mfn;
> > > 
> > > So for identity type mfns we end up calling 'get_phys_to_machine(mfn)' twice
> > > I think?
> > > 
> > > Would it make sense to save the result of 'get_phys_to_machine(mfn)'
> > > the first call?
> > 
> > the first call is get_phys_to_machine(pfn), with pfn potentially
> > garbage; this call is get_phys_to_machine(mfn).
> 
> I think what you are telling me is that it is pointless to check for
> the IDENTITY_FRAME b/c it won't happen often or at all.
> 
> So moving the code so that we do the hot-paths first makes more
> sense and we should structure the code as so, right?
> 

Yes.
My point is that it makes sense to check the m2p_override first
because it is more likely to get a valid result than checking for an
identity mapping.
Besides, if both are available for the same mfn (the case described above),
we want to return the m2p_override result here.


> I agree with that sentiment, do you want to prep another patch that
> has this patch and also some more comments?

yes, appended.

---


diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index be118d8..16ba2a8 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -81,6 +81,7 @@ static inline int phys_to_machine_mapping_valid(unsigned long pfn)
 static inline unsigned long mfn_to_pfn(unsigned long mfn)
 {
 	unsigned long pfn;
+	int ret = 0;
 
 	if (xen_feature(XENFEAT_auto_translated_physmap))
 		return mfn;
@@ -95,15 +96,29 @@ static inline unsigned long mfn_to_pfn(unsigned long mfn)
 	 * In such cases it doesn't matter what we return (we return garbage),
 	 * but we must handle the fault without crashing!
 	 */
-	__get_user(pfn, &machine_to_phys_mapping[mfn]);
+	ret = __get_user(pfn, &machine_to_phys_mapping[mfn]);
 try_override:
-	/*
-	 * If this appears to be a foreign mfn (because the pfn
-	 * doesn't map back to the mfn), then check the local override
-	 * table to see if there's a better pfn to use.
+	/* ret might be < 0 if there are no entries in the m2p for mfn */
+	if (ret < 0)
+		pfn = ~0;
+	else if (get_phys_to_machine(pfn) != mfn)
+		/*
+		 * If this appears to be a foreign mfn (because the pfn
+		 * doesn't map back to the mfn), then check the local override
+		 * table to see if there's a better pfn to use.
+		 *
+		 * m2p_find_override_pfn returns ~0 if it doesn't find anything.
+		 */
+		pfn = m2p_find_override_pfn(mfn, ~0);
+
+	/* 
+	 * pfn is ~0 if there are no entries in the m2p for mfn or if the
+	 * entry doesn't map back to the mfn and m2p_override doesn't have a
+	 * valid entry for it.
 	 */
-	if (get_phys_to_machine(pfn) != mfn)
-		pfn = m2p_find_override_pfn(mfn, pfn);
+	if (pfn == ~0 &&
+			get_phys_to_machine(mfn) == IDENTITY_FRAME(mfn))
+		pfn = mfn;
 
 	return pfn;
 }
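
As a rough sanity check of the resulting lookup order (m2p first, then the
m2p_override, then the IDENTITY_FRAME test), here is a standalone userspace
mock with made-up tables and a made-up IDENTITY_BIT; it only illustrates
the ordering and is not the kernel code above:

#include <stdio.h>

#define INVALID_PFN	  (~0UL)
#define IDENTITY_BIT	  (1UL << 31)		/* stand-in for IDENTITY_FRAME_BIT */
#define IDENTITY_FRAME(m) ((m) | IDENTITY_BIT)

/* Fake tables, index = mfn (m2p, override) or pfn (p2m). */
static unsigned long m2p[8]      = { 0, 1, 2, INVALID_PFN, 6, 5, 6, 7 };
static unsigned long p2m[8]      = { 0, 1, 2, IDENTITY_FRAME(3), 4, 5, 6, 7 };
static unsigned long override[8] = { [4] = 2 };	/* mfn 4: granted page at pfn 2 */

static unsigned long get_phys_to_machine(unsigned long pfn)
{
	return pfn < 8 ? p2m[pfn] : INVALID_PFN;
}

static unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long def)
{
	return (mfn < 8 && override[mfn]) ? override[mfn] : def;
}

/* Same ordering as the patched mfn_to_pfn(), minus the __get_user handling. */
static unsigned long mock_mfn_to_pfn(unsigned long mfn)
{
	unsigned long pfn = mfn < 8 ? m2p[mfn] : INVALID_PFN;

	if (get_phys_to_machine(pfn) != mfn)
		pfn = m2p_find_override_pfn(mfn, INVALID_PFN);
	if (pfn == INVALID_PFN &&
	    get_phys_to_machine(mfn) == IDENTITY_FRAME(mfn))
		pfn = mfn;
	return pfn;
}

int main(void)
{
	printf("normal   mfn 1 -> pfn 0x%lx\n", mock_mfn_to_pfn(1)); /* 0x1 */
	printf("granted  mfn 4 -> pfn 0x%lx\n", mock_mfn_to_pfn(4)); /* 0x2, via the override */
	printf("identity mfn 3 -> pfn 0x%lx\n", mock_mfn_to_pfn(3)); /* 0x3, via IDENTITY_FRAME */
	printf("unknown  mfn 9 -> pfn 0x%lx\n", mock_mfn_to_pfn(9)); /* ~0 */
	return 0;
}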

^ permalink raw reply related	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2011-02-02 18:32 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-31 22:44 [PATCH v4] Consider E820 non-RAM and E820 gaps as 1-1 mappings Konrad Rzeszutek Wilk
2011-01-31 22:44 ` [PATCH 01/11] xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY Konrad Rzeszutek Wilk
2011-01-31 22:44 ` [PATCH 02/11] xen/mmu: Add the notion of identity (1-1) mapping Konrad Rzeszutek Wilk
2011-02-01 21:33   ` Jeremy Fitzhardinge
2011-01-31 22:44 ` [PATCH 03/11] xen/mmu: Set _PAGE_IOMAP if PFN is an identity PFN Konrad Rzeszutek Wilk
2011-01-31 22:44 ` [PATCH 04/11] xen/mmu: BUG_ON when racing to swap middle leaf Konrad Rzeszutek Wilk
2011-02-01 21:34   ` Jeremy Fitzhardinge
2011-01-31 22:44 ` [PATCH 05/11] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps Konrad Rzeszutek Wilk
2011-02-01 22:32   ` Konrad Rzeszutek Wilk
2011-01-31 22:44 ` [PATCH 06/11] xen/setup: Skip over 1st gap after System RAM Konrad Rzeszutek Wilk
2011-02-01 15:08   ` Ian Campbell
2011-02-01 17:14     ` H. Peter Anvin
2011-02-01 17:14       ` H. Peter Anvin
2011-02-01 22:28     ` Konrad Rzeszutek Wilk
2011-01-31 22:44 ` [PATCH 07/11] x86/setup: Consult the raw E820 for zero sized E820 RAM regions Konrad Rzeszutek Wilk
2011-02-01 17:52   ` Stefano Stabellini
2011-02-01 22:29     ` [Xen-devel] " Konrad Rzeszutek Wilk
2011-01-31 22:44 ` [PATCH 08/11] xen/debugfs: Add 'p2m' file for printing out the P2M layout Konrad Rzeszutek Wilk
2011-01-31 22:44 ` [PATCH 09/11] xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set Konrad Rzeszutek Wilk
2011-01-31 22:44 ` [PATCH 10/11] xen/m2p: No need to catch exceptions when we know that there is no RAM Konrad Rzeszutek Wilk
2011-01-31 22:44 ` [PATCH 11/11] xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set Konrad Rzeszutek Wilk
2011-02-01 17:52   ` Stefano Stabellini
2011-02-01 20:29     ` Konrad Rzeszutek Wilk
2011-02-01 20:29       ` Konrad Rzeszutek Wilk
2011-02-02 11:52       ` Stefano Stabellini
2011-02-02 16:43         ` [Xen-devel] " Konrad Rzeszutek Wilk
2011-02-02 18:32           ` [Xen-devel] " Stefano Stabellini
