linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).
@ 2012-04-16 17:15 Konrad Rzeszutek Wilk
  2012-04-16 17:15 ` [PATCH 1/8] xen/p2m: Move code around to allow for better re-usage Konrad Rzeszutek Wilk
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-04-16 17:15 UTC (permalink / raw)
  To: linux-kernel, david.vrabel, JBeulich; +Cc: xen-devel

Changelog v5 [since v4]:
 - used populate_physmap, fixed bugs.
[v2-v4: not posted]
 - reworked the code in setup.c to work properly.
[v1: https://lkml.org/lkml/2012/3/30/492]
 - initial patchset


These patches that did not change between the first posting and this one are:

      xen/p2m: Move code around to allow for better re-usage.
      xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
      xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
      xen/p2m: An early bootup variant of set_phys_to_machine

so if you read them before - you can skip them.

The purpose of these patches is three-fold:
 1) Fix the case where dom0_mem=X is not used and the machine boots with less
    memory than it should. A subsequent 'xl mem-set 0 X' is required to balloon up
    to the right amount. Here is an example of what it looks like with the older
    kernels and with a pvops (and the patches):

	2.6.32 SLES:
	Memory: 7538688k/8079432k available (3971k kernel code, 8192k absent, 532300k reserved, 2491k data, 348k init)
	MemTotal:        8063140 kB
	MemFree:         7421504 kB
	DirectMap4k:     8071240 kB
	Domain-0                                     0  7873     4     r-----     20.3

	3.3:
	Memory: 6486452k/9208688k available (5825k kernel code, 1136060k absent, 1586176k reserved, 2890k data, 692k init)
	MemTotal:        6716156 kB
	MemFree:         6365696 kB
	DirectMap4k:     8078192 kB
	Domain-0                                     0  6774     4     r-----     26.0

	3.3+patches:
	Memory: 7621460k/9208688k available (5817k kernel code, 1136060k absent, 451168k reserved, 2899k data, 692k init)
	MemTotal:        7849924 kB
	MemFree:         7500748 kB
	DirectMap4k:     8078192 kB
	Domain-0                                     0  7883     4     r-----     11.9

    This is implemented in:
    [PATCH 7/8] xen/setup: Populate freed MFNs from non-RAM E820 entries

 2). Fix where dom0_mem=XXXG is used (so the 'max' parameter is not involved). The
     bug is that we would ignore it:

	With dom0_mem=1G
	2.6.32 SLES:
	Memory: 650240k/1056768k available (3971k kernel code, 8192k absent, 397852k reserved, 2491k data, 348k init)
	3.3:
	Memory: 610884k/9435136k available (5817k kernel code, 1136060k absent, 7688192k reserved, 2899k data, 696k init)
	3.3+patches:
	Memory: 724184k/1053064k available (5817k kernel code, 4552k absent, 324328k reserved, 2899k data, 696k init)

   This is implemented in:
   [PATCH 6/8] xen/setup: Work properly with 'dom0_mem=X' or with not


Note, that the combination of 'dom0_mem=XXXG,max:YYYG' is perfectly fine, so there is no
need to fix that:

	With dom0_mem=1G,max:3G

	2.6.32 SLES:
	Memory: 650240k/1056768k available (3971k kernel code, 8192k absent, 397852k reserved, 2491k data, 348k init)
	3.3:
	Memory: 690324k/4281724k available (5826k kernel code, 1136060k absent, 2455340k reserved, 2891k data, 692k init)
	3.3+patches:
	Memory: 690960k/4281724k available (5824k kernel code, 1136060k absent, 2454704k reserved, 2893k data, 692k init)


Konrad Rzeszutek Wilk (8):
      xen/p2m: Move code around to allow for better re-usage.
      xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
      xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
      xen/p2m: An early bootup variant of set_phys_to_machine
      xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
      xen/setup: Work properly with 'dom0_mem=X' or with not dom0_mem.
      xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
      xen/setup: Combine the two hypercall functions - since they are quite similar.

 arch/x86/include/asm/xen/page.h |    1 +
 arch/x86/xen/p2m.c              |  104 ++++++++++++++++-----------
 arch/x86/xen/setup.c            |  152 +++++++++++++++++++++++++++++++++------
 3 files changed, 193 insertions(+), 64 deletions(-)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/8] xen/p2m: Move code around to allow for better re-usage.
  2012-04-16 17:15 [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5) Konrad Rzeszutek Wilk
@ 2012-04-16 17:15 ` Konrad Rzeszutek Wilk
  2012-04-16 17:15 ` [PATCH 2/8] xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument Konrad Rzeszutek Wilk
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-04-16 17:15 UTC (permalink / raw)
  To: linux-kernel, david.vrabel, JBeulich; +Cc: xen-devel, Konrad Rzeszutek Wilk

We are going to be using the early_alloc_p2m (and
early_alloc_p2m_middle) code in follow up patches which
are not related to setting identity pages.

Hence lets move the code out in its own function and
rename them as appropiate.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/p2m.c |   62 ++++++++++++++++++++++++++++-----------------------
 1 files changed, 34 insertions(+), 28 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 1b267e7..3cc3afe 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -499,7 +499,7 @@ static bool alloc_p2m(unsigned long pfn)
 	return true;
 }
 
-static bool __init __early_alloc_p2m(unsigned long pfn)
+static bool __init early_alloc_p2m_middle(unsigned long pfn)
 {
 	unsigned topidx, mididx, idx;
 
@@ -541,6 +541,36 @@ static bool __init __early_alloc_p2m(unsigned long pfn)
 	}
 	return idx != 0;
 }
+
+static bool __init early_alloc_p2m(unsigned long pfn)
+{
+	unsigned topidx = p2m_top_index(pfn);
+	unsigned long *mid_mfn_p;
+	unsigned long **mid;
+
+	mid = p2m_top[topidx];
+	mid_mfn_p = p2m_top_mfn_p[topidx];
+	if (mid == p2m_mid_missing) {
+		mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
+
+		p2m_mid_init(mid);
+
+		p2m_top[topidx] = mid;
+
+		BUG_ON(mid_mfn_p != p2m_mid_missing_mfn);
+	}
+	/* And the save/restore P2M tables.. */
+	if (mid_mfn_p == p2m_mid_missing_mfn) {
+		mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
+		p2m_mid_mfn_init(mid_mfn_p);
+
+		p2m_top_mfn_p[topidx] = mid_mfn_p;
+		p2m_top_mfn[topidx] = virt_to_mfn(mid_mfn_p);
+		/* Note: we don't set mid_mfn_p[midix] here,
+		 * look in early_alloc_p2m_middle */
+	}
+	return true;
+}
 unsigned long __init set_phys_range_identity(unsigned long pfn_s,
 				      unsigned long pfn_e)
 {
@@ -559,35 +589,11 @@ unsigned long __init set_phys_range_identity(unsigned long pfn_s,
 		pfn < ALIGN(pfn_e, (P2M_MID_PER_PAGE * P2M_PER_PAGE));
 		pfn += P2M_MID_PER_PAGE * P2M_PER_PAGE)
 	{
-		unsigned topidx = p2m_top_index(pfn);
-		unsigned long *mid_mfn_p;
-		unsigned long **mid;
-
-		mid = p2m_top[topidx];
-		mid_mfn_p = p2m_top_mfn_p[topidx];
-		if (mid == p2m_mid_missing) {
-			mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
-
-			p2m_mid_init(mid);
-
-			p2m_top[topidx] = mid;
-
-			BUG_ON(mid_mfn_p != p2m_mid_missing_mfn);
-		}
-		/* And the save/restore P2M tables.. */
-		if (mid_mfn_p == p2m_mid_missing_mfn) {
-			mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
-			p2m_mid_mfn_init(mid_mfn_p);
-
-			p2m_top_mfn_p[topidx] = mid_mfn_p;
-			p2m_top_mfn[topidx] = virt_to_mfn(mid_mfn_p);
-			/* Note: we don't set mid_mfn_p[midix] here,
-		 	 * look in __early_alloc_p2m */
-		}
+		WARN_ON(!early_alloc_p2m(pfn));
 	}
 
-	__early_alloc_p2m(pfn_s);
-	__early_alloc_p2m(pfn_e);
+	early_alloc_p2m_middle(pfn_s);
+	early_alloc_p2m_middle(pfn_e);
 
 	for (pfn = pfn_s; pfn < pfn_e; pfn++)
 		if (!__set_phys_to_machine(pfn, IDENTITY_FRAME(pfn)))
-- 
1.7.7.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/8] xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
  2012-04-16 17:15 [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5) Konrad Rzeszutek Wilk
  2012-04-16 17:15 ` [PATCH 1/8] xen/p2m: Move code around to allow for better re-usage Konrad Rzeszutek Wilk
@ 2012-04-16 17:15 ` Konrad Rzeszutek Wilk
  2012-04-16 17:15 ` [PATCH 3/8] xen/p2m: Collapse early_alloc_p2m_middle redundant checks Konrad Rzeszutek Wilk
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-04-16 17:15 UTC (permalink / raw)
  To: linux-kernel, david.vrabel, JBeulich; +Cc: xen-devel, Konrad Rzeszutek Wilk

For identity cases we want to call reserve_brk only on the boundary
conditions of the middle P2M (so P2M[x][y][0] = extend_brk). This is
to work around identify regions (PCI spaces, gaps in E820) which are not
aligned on 2MB regions.

However for the case were we want to allocate P2M middle leafs at the
early bootup stage, irregardless of this alignment check we need some
means of doing that.  For that we provide the new argument.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/p2m.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 3cc3afe..8b3a395 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -499,7 +499,7 @@ static bool alloc_p2m(unsigned long pfn)
 	return true;
 }
 
-static bool __init early_alloc_p2m_middle(unsigned long pfn)
+static bool __init early_alloc_p2m_middle(unsigned long pfn, bool check_boundary)
 {
 	unsigned topidx, mididx, idx;
 
@@ -508,7 +508,7 @@ static bool __init early_alloc_p2m_middle(unsigned long pfn)
 	idx = p2m_index(pfn);
 
 	/* Pfff.. No boundary cross-over, lets get out. */
-	if (!idx)
+	if (!idx && check_boundary)
 		return false;
 
 	WARN(p2m_top[topidx][mididx] == p2m_identity,
@@ -531,7 +531,7 @@ static bool __init early_alloc_p2m_middle(unsigned long pfn)
 		p2m_top[topidx][mididx] = p2m;
 
 		/* For save/restore we need to MFN of the P2M saved */
-		
+
 		mid_mfn_p = p2m_top_mfn_p[topidx];
 		WARN(mid_mfn_p[mididx] != virt_to_mfn(p2m_missing),
 			"P2M_TOP_P[%d][%d] != MFN of p2m_missing!\n",
@@ -592,8 +592,8 @@ unsigned long __init set_phys_range_identity(unsigned long pfn_s,
 		WARN_ON(!early_alloc_p2m(pfn));
 	}
 
-	early_alloc_p2m_middle(pfn_s);
-	early_alloc_p2m_middle(pfn_e);
+	early_alloc_p2m_middle(pfn_s, true);
+	early_alloc_p2m_middle(pfn_e, true);
 
 	for (pfn = pfn_s; pfn < pfn_e; pfn++)
 		if (!__set_phys_to_machine(pfn, IDENTITY_FRAME(pfn)))
-- 
1.7.7.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/8] xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
  2012-04-16 17:15 [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5) Konrad Rzeszutek Wilk
  2012-04-16 17:15 ` [PATCH 1/8] xen/p2m: Move code around to allow for better re-usage Konrad Rzeszutek Wilk
  2012-04-16 17:15 ` [PATCH 2/8] xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument Konrad Rzeszutek Wilk
@ 2012-04-16 17:15 ` Konrad Rzeszutek Wilk
  2012-04-16 17:15 ` [PATCH 4/8] xen/p2m: An early bootup variant of set_phys_to_machine Konrad Rzeszutek Wilk
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-04-16 17:15 UTC (permalink / raw)
  To: linux-kernel, david.vrabel, JBeulich; +Cc: xen-devel, Konrad Rzeszutek Wilk

At the start of the function we were checking for idx != 0
and bailing out. And later calling extend_brk if idx != 0.

That is unnecessary so remove that checks.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/p2m.c |   25 ++++++++++++-------------
 1 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 8b3a395..952edef 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -502,6 +502,8 @@ static bool alloc_p2m(unsigned long pfn)
 static bool __init early_alloc_p2m_middle(unsigned long pfn, bool check_boundary)
 {
 	unsigned topidx, mididx, idx;
+	unsigned long *p2m;
+	unsigned long *mid_mfn_p;
 
 	topidx = p2m_top_index(pfn);
 	mididx = p2m_mid_index(pfn);
@@ -522,24 +524,21 @@ static bool __init early_alloc_p2m_middle(unsigned long pfn, bool check_boundary
 		return false;
 
 	/* Boundary cross-over for the edges: */
-	if (idx) {
-		unsigned long *p2m = extend_brk(PAGE_SIZE, PAGE_SIZE);
-		unsigned long *mid_mfn_p;
+	p2m = extend_brk(PAGE_SIZE, PAGE_SIZE);
 
-		p2m_init(p2m);
+	p2m_init(p2m);
 
-		p2m_top[topidx][mididx] = p2m;
+	p2m_top[topidx][mididx] = p2m;
 
-		/* For save/restore we need to MFN of the P2M saved */
+	/* For save/restore we need to MFN of the P2M saved */
 
-		mid_mfn_p = p2m_top_mfn_p[topidx];
-		WARN(mid_mfn_p[mididx] != virt_to_mfn(p2m_missing),
-			"P2M_TOP_P[%d][%d] != MFN of p2m_missing!\n",
-			topidx, mididx);
-		mid_mfn_p[mididx] = virt_to_mfn(p2m);
+	mid_mfn_p = p2m_top_mfn_p[topidx];
+	WARN(mid_mfn_p[mididx] != virt_to_mfn(p2m_missing),
+		"P2M_TOP_P[%d][%d] != MFN of p2m_missing!\n",
+		topidx, mididx);
+	mid_mfn_p[mididx] = virt_to_mfn(p2m);
 
-	}
-	return idx != 0;
+	return true;
 }
 
 static bool __init early_alloc_p2m(unsigned long pfn)
-- 
1.7.7.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/8] xen/p2m: An early bootup variant of set_phys_to_machine
  2012-04-16 17:15 [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5) Konrad Rzeszutek Wilk
                   ` (2 preceding siblings ...)
  2012-04-16 17:15 ` [PATCH 3/8] xen/p2m: Collapse early_alloc_p2m_middle redundant checks Konrad Rzeszutek Wilk
@ 2012-04-16 17:15 ` Konrad Rzeszutek Wilk
  2012-04-16 17:15 ` [PATCH 6/8] xen/setup: Work properly with 'dom0_mem=X' or with not dom0_mem Konrad Rzeszutek Wilk
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-04-16 17:15 UTC (permalink / raw)
  To: linux-kernel, david.vrabel, JBeulich; +Cc: xen-devel, Konrad Rzeszutek Wilk

During early bootup we can't use alloc_page, so to allocate
leaf pages in the P2M we need to use extend_brk. For that
we are utilizing the early_alloc_p2m and early_alloc_p2m_middle
functions to do the job for us. This function follows the
same logic as set_phys_to_machine.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/xen/page.h |    1 +
 arch/x86/xen/p2m.c              |   15 +++++++++++++++
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index c34f96c..93971e8 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -44,6 +44,7 @@ extern unsigned long  machine_to_phys_nr;
 
 extern unsigned long get_phys_to_machine(unsigned long pfn);
 extern bool set_phys_to_machine(unsigned long pfn, unsigned long mfn);
+extern bool __init early_set_phys_to_machine(unsigned long pfn, unsigned long mfn);
 extern bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn);
 extern unsigned long set_phys_range_identity(unsigned long pfn_s,
 					     unsigned long pfn_e);
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 952edef..ffd08c4 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -570,6 +570,21 @@ static bool __init early_alloc_p2m(unsigned long pfn)
 	}
 	return true;
 }
+bool __init early_set_phys_to_machine(unsigned long pfn, unsigned long mfn)
+{
+	if (unlikely(!__set_phys_to_machine(pfn, mfn)))  {
+		if (!early_alloc_p2m(pfn))
+			return false;
+
+		if (!early_alloc_p2m_middle(pfn, false /* boundary crossover OK!*/))
+			return false;
+
+		if (!__set_phys_to_machine(pfn, mfn))
+			return false;
+	}
+
+	return true;
+}
 unsigned long __init set_phys_range_identity(unsigned long pfn_s,
 				      unsigned long pfn_e)
 {
-- 
1.7.7.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/8] xen/setup: Work properly with 'dom0_mem=X' or with not dom0_mem.
  2012-04-16 17:15 [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5) Konrad Rzeszutek Wilk
                   ` (3 preceding siblings ...)
  2012-04-16 17:15 ` [PATCH 4/8] xen/p2m: An early bootup variant of set_phys_to_machine Konrad Rzeszutek Wilk
@ 2012-04-16 17:15 ` Konrad Rzeszutek Wilk
  2012-05-03 11:54   ` [Xen-devel] " David Vrabel
  2012-04-16 17:15 ` [PATCH 7/8] xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM Konrad Rzeszutek Wilk
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-04-16 17:15 UTC (permalink / raw)
  To: linux-kernel, david.vrabel, JBeulich; +Cc: xen-devel, Konrad Rzeszutek Wilk

We ignored the X value and ended up populating up to
max(MAX_DOMAIN_PAGES, last E820_RAM entry).

This fixes it by figuring out how many RAM nr_pages the
hypervisor wanted to provide to us and cap the populate
hypercalls up to that.

The end result is (on a 8GB box):

dom0_mem=1G
-Memory: 610884k/9435136k available (5817k kernel code, 1136060k absent, 7688192k reserved, 2899k data, 696k init)
+Memory: 724184k/1053064k available (5817k kernel code, 4552k absent, 324328k reserved, 2899k data, 696k init)

no dom0_mem
-Memory: 7619036k/9435136k available (5817k kernel code, 1136060k absent, 680040k reserved, 2899k data, 696k init)
+Memory: 7621460k/9208688k available (5817k kernel code, 1136060k absent, 451168k reserved, 2899k data, 696k init)

[v1: Details added]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/setup.c |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 7b0ab77..0e82bf1 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -185,11 +185,29 @@ static unsigned long __init xen_get_max_pages(void)
 	 * the current maximum rather than the static maximum. In this
 	 * case the e820 map provided to us will cover the static
 	 * maximum region.
+	 *
+	 * The dom0_mem=min:X,max:Y tweaks options differently depending
+	 * on the version, but in general this is what we get:
+	 *                | XENMEM_maximum_reser  | nr_pages
+	 * --------------++-----------------------+-------------------
+	 *  no dom0_mem   | INT_MAX               | max_phys_pfn
+	 *  =3G           | INT_MAX               | 786432
+	 *  =max:3G       | 786432                | 786432
+	 *  =min:1G,max:3G| INT_MAX               | max_phys_fn
+	 *  =1G,max:3G    | INT_MAX               | 262144
+	 *  =min:1G,max:3G,2G | INT_MAX           | max_phys_fn
+	 *
+	 * The =3G is often used and it lead to us initially setting
+	 * 786432 and allowing dom0 to balloon up to the max_physical_pfn.
+	 * This is at odd with the classic XenOClassic so lets emulate
+	 * the classic behavior.
 	 */
 	if (xen_initial_domain()) {
 		ret = HYPERVISOR_memory_op(XENMEM_maximum_reservation, &domid);
 		if (ret > 0)
 			max_pages = ret;
+		if (ret == -1UL)
+			max_pages = xen_start_info->nr_pages;
 	}
 
 	return min(max_pages, MAX_DOMAIN_PAGES);
-- 
1.7.7.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 7/8] xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
  2012-04-16 17:15 [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5) Konrad Rzeszutek Wilk
                   ` (4 preceding siblings ...)
  2012-04-16 17:15 ` [PATCH 6/8] xen/setup: Work properly with 'dom0_mem=X' or with not dom0_mem Konrad Rzeszutek Wilk
@ 2012-04-16 17:15 ` Konrad Rzeszutek Wilk
  2012-05-03 11:56   ` [Xen-devel] " David Vrabel
  2012-04-16 17:15 ` [PATCH 8/8] xen/setup: Combine the two hypercall functions - since they are quite similar Konrad Rzeszutek Wilk
  2012-05-01 16:37 ` [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5) Konrad Rzeszutek Wilk
  7 siblings, 1 reply; 20+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-04-16 17:15 UTC (permalink / raw)
  To: linux-kernel, david.vrabel, JBeulich; +Cc: xen-devel, Konrad Rzeszutek Wilk

When the Xen hypervisor boots a PV kernel it hands it two pieces
of information: nr_pages and a made up E820 entry.

The nr_pages value defines the range from zero to nr_pages of PFNs
which have a valid Machine Frame Number (MFN) underneath it. The
E820 mirrors that (with the VGA hole):
BIOS-provided physical RAM map:
 Xen: 0000000000000000 - 00000000000a0000 (usable)
 Xen: 00000000000a0000 - 0000000000100000 (reserved)
 Xen: 0000000000100000 - 0000000080800000 (usable)

The fun comes when a PV guest that is run with a machine E820 - that
can either be the initial domain or a PCI PV guest, where the E820
looks like the normal thing:

BIOS-provided physical RAM map:
 Xen: 0000000000000000 - 000000000009e000 (usable)
 Xen: 000000000009ec00 - 0000000000100000 (reserved)
 Xen: 0000000000100000 - 0000000020000000 (usable)
 Xen: 0000000020000000 - 0000000020200000 (reserved)
 Xen: 0000000020200000 - 0000000040000000 (usable)
 Xen: 0000000040000000 - 0000000040200000 (reserved)
 Xen: 0000000040200000 - 00000000bad80000 (usable)
 Xen: 00000000bad80000 - 00000000badc9000 (ACPI NVS)
..
With that overlaying the nr_pages directly on the E820 does not
work as there are gaps and non-RAM regions that won't be used
by the memory allocator. The 'xen_release_chunk' helps with that
by punching holes in the P2M (PFN to MFN lookup tree) for those
regions and tells us that:

Freeing  20000-20200 pfn range: 512 pages freed
Freeing  40000-40200 pfn range: 512 pages freed
Freeing  bad80-badf4 pfn range: 116 pages freed
Freeing  badf6-bae7f pfn range: 137 pages freed
Freeing  bb000-100000 pfn range: 282624 pages freed
Released 283999 pages of unused memory

Those 283999 pages are subtracted from the nr_pages and are returned
to the hypervisor. The end result is that the initial domain
boots with 1GB less memory as the nr_pages has been subtracted by
the amount of pages residing within the PCI hole. It can balloon up
to that if desired using 'xl mem-set 0 8092', but the balloon driver
is not always compiled in for the initial domain.

This patch, implements the populate hypercall (XENMEM_populate_physmap)
which increases the the domain with the same amount of pages that
were released.

The other solution (that did not work) was to transplant the MFN in
the P2M tree - the ones that were going to be freed were put in
the E820_RAM regions past the nr_pages. But the modifications to the
M2P array (the other side of creating PTEs) were not carried away.
As the hypervisor is the only one capable of modifying that and the
only two hypercalls that would do this are: the update_va_mapping
(which won't work, as during initial bootup only PFNs up to nr_pages
are mapped in the guest) or via the populate hypercall.

The end result is that the kernel can now boot with the
nr_pages without having to subtract the 283999 pages.

On a 8GB machine, with various dom0_mem= parameters this is what we get:

no dom0_mem
-Memory: 6485264k/9435136k available (5817k kernel code, 1136060k absent, 1813812k reserved, 2899k data, 696k init)
+Memory: 7619036k/9435136k available (5817k kernel code, 1136060k absent, 680040k reserved, 2899k data, 696k init)

dom0_mem=3G
-Memory: 2616536k/9435136k available (5817k kernel code, 1136060k absent, 5682540k reserved, 2899k data, 696k init)
+Memory: 2703776k/9435136k available (5817k kernel code, 1136060k absent, 5595300k reserved, 2899k data, 696k init)

dom0_mem=max:3G
-Memory: 2696732k/4281724k available (5817k kernel code, 1136060k absent, 448932k reserved, 2899k data, 696k init)
+Memory: 2702204k/4281724k available (5817k kernel code, 1136060k absent, 443460k reserved, 2899k data, 696k init)

And the 'xm list' or 'xl list' now reflect what the dom0_mem=
argument is.

[v2: Use populate hypercall]
[v3: Remove debug printks]
[v4: Simplify code]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/setup.c |  116 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 0e82bf1..b44d409 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -26,7 +26,6 @@
 #include <xen/interface/memory.h>
 #include <xen/interface/physdev.h>
 #include <xen/features.h>
-
 #include "xen-ops.h"
 #include "vdso.h"
 
@@ -120,7 +119,105 @@ static unsigned long __init xen_release_chunk(unsigned long start,
 
 	return len;
 }
+static unsigned long __init xen_populate_physmap(unsigned long start,
+						 unsigned long end)
+{
+	struct xen_memory_reservation reservation = {
+		.address_bits = 0,
+		.extent_order = 0,
+		.domid        = DOMID_SELF
+	};
+	unsigned long len = 0;
+	int ret;
+
+	for (pfn = start; pfn < end; pfn++) {
+		unsigned long frame;
+
+		/* Make sure pfn does not exists to start with */
+		if (pfn_to_mfn(pfn) != INVALID_P2M_ENTRY)
+			continue;
 
+		frame = pfn;
+		set_xen_guest_handle(reservation.extent_start, &frame);
+		reservation.nr_extents = 1;
+
+		ret = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
+		WARN(ret != 1, "Failed to populate pfn %lx err=%d\n", pfn, ret);
+		if (ret == 1) {
+			if (!early_set_phys_to_machine(pfn, frame)) {
+				set_xen_guest_handle(reservation.extent_start, &frame);
+				reservation.nr_extents = 1;
+				ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation,
+							&reservation);
+				break;
+			}
+			len++;
+		} else
+			break;
+	}
+	if (len)
+		printk(KERN_INFO "Populated %lx-%lx pfn range: %lu pages added\n",
+		       start, end, len);
+	return len;
+}
+static unsigned long __init xen_populate_chunk(
+	const struct e820entry *list, size_t map_size,
+	unsigned long max_pfn, unsigned long *last_pfn,
+	unsigned long credits_left)
+{
+	const struct e820entry *entry;
+	unsigned int i;
+	unsigned long done = 0;
+	unsigned long dest_pfn;
+
+	for (i = 0, entry = list; i < map_size; i++, entry++) {
+		unsigned long credits = credits_left;
+		unsigned long s_pfn;
+		unsigned long e_pfn;
+		unsigned long pfns;
+		long capacity;
+
+		if (credits <= 0)
+			break;
+
+		if (entry->type != E820_RAM)
+			continue;
+
+		e_pfn = PFN_UP(entry->addr + entry->size);
+
+		/* We only care about E820 after the xen_start_info->nr_pages */
+		if (e_pfn <= max_pfn)
+			continue;
+
+		s_pfn = PFN_DOWN(entry->addr);
+		/* If the E820 falls within the nr_pages, we want to start
+		 * at the nr_pages PFN.
+		 * If that would mean going past the E820 entry, skip it
+		 */
+		if (s_pfn <= max_pfn) {
+			capacity = e_pfn - max_pfn;
+			dest_pfn = max_pfn;
+		} else {
+			/* last_pfn MUST be within E820_RAM regions */
+			if (*last_pfn && e_pfn >= *last_pfn)
+				s_pfn = *last_pfn;
+			capacity = e_pfn - s_pfn;
+			dest_pfn = s_pfn;
+		}
+		/* If we had filled this E820_RAM entry, go to the next one. */
+		if (capacity <= 0)
+			continue;
+
+		if (credits > capacity)
+			credits = capacity;
+
+		pfns = xen_populate_physmap(dest_pfn, dest_pfn + credits);
+		done += pfns;
+		credits_left -= pfns;
+		*last_pfn = (dest_pfn + pfns);
+	}
+	return done;
+}
 static unsigned long __init xen_set_identity_and_release(
 	const struct e820entry *list, size_t map_size, unsigned long nr_pages)
 {
@@ -143,7 +240,6 @@ static unsigned long __init xen_set_identity_and_release(
 	 */
 	for (i = 0, entry = list; i < map_size; i++, entry++) {
 		phys_addr_t end = entry->addr + entry->size;
-
 		if (entry->type == E820_RAM || i == map_size - 1) {
 			unsigned long start_pfn = PFN_DOWN(start);
 			unsigned long end_pfn = PFN_UP(end);
@@ -238,7 +334,9 @@ char * __init xen_memory_setup(void)
 	int rc;
 	struct xen_memory_map memmap;
 	unsigned long max_pages;
+	unsigned long last_pfn = 0;
 	unsigned long extra_pages = 0;
+	unsigned long populated;
 	int i;
 	int op;
 
@@ -278,9 +376,20 @@ char * __init xen_memory_setup(void)
 	 */
 	xen_released_pages = xen_set_identity_and_release(
 		map, memmap.nr_entries, max_pfn);
-	extra_pages += xen_released_pages;
 
 	/*
+	 * Populate back the non-RAM pages and E820 gaps that had been
+	 * released. */
+	populated = xen_populate_chunk(map, memmap.nr_entries,
+			max_pfn, &last_pfn, xen_released_pages);
+
+	extra_pages += (xen_released_pages - populated);
+
+	if (last_pfn > max_pfn) {
+		max_pfn = min(MAX_DOMAIN_PAGES, last_pfn);
+		mem_end = PFN_PHYS(max_pfn);
+	}
+	/*
 	 * Clamp the amount of extra memory to a EXTRA_MEM_RATIO
 	 * factor the base size.  On non-highmem systems, the base
 	 * size is the full initial memory allocation; on highmem it
@@ -293,7 +402,6 @@ char * __init xen_memory_setup(void)
 	 */
 	extra_pages = min(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)),
 			  extra_pages);
-
 	i = 0;
 	while (i < memmap.nr_entries) {
 		u64 addr = map[i].addr;
-- 
1.7.7.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 8/8] xen/setup: Combine the two hypercall functions - since they are quite similar.
  2012-04-16 17:15 [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5) Konrad Rzeszutek Wilk
                   ` (5 preceding siblings ...)
  2012-04-16 17:15 ` [PATCH 7/8] xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM Konrad Rzeszutek Wilk
@ 2012-04-16 17:15 ` Konrad Rzeszutek Wilk
  2012-05-03 11:58   ` [Xen-devel] " David Vrabel
  2012-05-01 16:37 ` [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5) Konrad Rzeszutek Wilk
  7 siblings, 1 reply; 20+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-04-16 17:15 UTC (permalink / raw)
  To: linux-kernel, david.vrabel, JBeulich; +Cc: xen-devel, Konrad Rzeszutek Wilk

They use the same set of arguments, so it is just the matter
of using the proper hypercall.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/setup.c |   81 ++++++++++++++++++-------------------------------
 1 files changed, 30 insertions(+), 51 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index b44d409..506a3e6 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -83,8 +83,8 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
 		__set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
 }
 
-static unsigned long __init xen_release_chunk(unsigned long start,
-					      unsigned long end)
+static unsigned long __init xen_do_chunk(unsigned long start,
+					 unsigned long end, bool release)
 {
 	struct xen_memory_reservation reservation = {
 		.address_bits = 0,
@@ -95,60 +95,36 @@ static unsigned long __init xen_release_chunk(unsigned long start,
 	unsigned long pfn;
 	int ret;
 
-	for(pfn = start; pfn < end; pfn++) {
-		unsigned long mfn = pfn_to_mfn(pfn);
-
-		/* Make sure pfn exists to start with */
-		if (mfn == INVALID_P2M_ENTRY || mfn_to_pfn(mfn) != pfn)
-			continue;
-
-		set_xen_guest_handle(reservation.extent_start, &mfn);
-		reservation.nr_extents = 1;
-
-		ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation,
-					   &reservation);
-		WARN(ret != 1, "Failed to release pfn %lx err=%d\n", pfn, ret);
-		if (ret == 1) {
-			__set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
-			len++;
-		}
-	}
-	if (len)
-		printk(KERN_INFO "Freeing  %lx-%lx pfn range: %lu pages freed\n",
-		       start, end, len);
-
-	return len;
-}
-static unsigned long __init xen_populate_physmap(unsigned long start,
-						 unsigned long end)
-{
-	struct xen_memory_reservation reservation = {
-		.address_bits = 0,
-		.extent_order = 0,
-		.domid        = DOMID_SELF
-	};
-	unsigned long len = 0;
-	int ret;
-
 	for (pfn = start; pfn < end; pfn++) {
 		unsigned long frame;
+		unsigned long mfn = pfn_to_mfn(pfn);
 
-		/* Make sure pfn does not exists to start with */
-		if (pfn_to_mfn(pfn) != INVALID_P2M_ENTRY)
-			continue;
-
-		frame = pfn;
+		if (release) {
+			/* Make sure pfn exists to start with */
+			if (mfn == INVALID_P2M_ENTRY || mfn_to_pfn(mfn) != pfn)
+				continue;
+			frame = mfn;
+		} else {
+			if (mfn != INVALID_P2M_ENTRY)
+				continue;
+			frame = pfn;
+		}
 		set_xen_guest_handle(reservation.extent_start, &frame);
 		reservation.nr_extents = 1;
 
-		ret = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
-		WARN(ret != 1, "Failed to populate pfn %lx err=%d\n", pfn, ret);
+		ret = HYPERVISOR_memory_op(release ? XENMEM_decrease_reservation : XENMEM_populate_physmap,
+					   &reservation);
+		WARN(ret != 1, "Failed to %s pfn %lx err=%d\n",
+		     release ? "release" : "populate", pfn, ret);
+
 		if (ret == 1) {
-			if (!early_set_phys_to_machine(pfn, frame)) {
+			if (!early_set_phys_to_machine(pfn, release ? INVALID_P2M_ENTRY : frame)) {
+				if (release)
+					break;
 				set_xen_guest_handle(reservation.extent_start, &frame);
 				reservation.nr_extents = 1;
 				ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation,
-							&reservation);
+							   &reservation);
 				break;
 			}
 			len++;
@@ -156,8 +132,11 @@ static unsigned long __init xen_populate_physmap(unsigned long start,
 			break;
 	}
 	if (len)
-		printk(KERN_INFO "Populated %lx-%lx pfn range: %lu pages added\n",
-		       start, end, len);
+		printk(KERN_INFO "%s %lx-%lx pfn range: %lu pages %s\n",
+		       release ? "Freeing" : "Populating",
+		       start, end, len,
+		       release ? "freed" : "added");
+
 	return len;
 }
 static unsigned long __init xen_populate_chunk(
@@ -211,7 +190,7 @@ static unsigned long __init xen_populate_chunk(
 		if (credits > capacity)
 			credits = capacity;
 
-		pfns = xen_populate_physmap(dest_pfn, dest_pfn + credits);
+		pfns = xen_do_chunk(dest_pfn, dest_pfn + credits, false);
 		done += pfns;
 		credits_left -= pfns;
 		*last_pfn = (dest_pfn + pfns);
@@ -249,8 +228,8 @@ static unsigned long __init xen_set_identity_and_release(
 
 			if (start_pfn < end_pfn) {
 				if (start_pfn < nr_pages)
-					released += xen_release_chunk(
-						start_pfn, min(end_pfn, nr_pages));
+					released += xen_do_chunk(
+						start_pfn, min(end_pfn, nr_pages), true);
 
 				identity += set_phys_range_identity(
 					start_pfn, end_pfn);
-- 
1.7.7.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).
  2012-04-16 17:15 [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5) Konrad Rzeszutek Wilk
                   ` (6 preceding siblings ...)
  2012-04-16 17:15 ` [PATCH 8/8] xen/setup: Combine the two hypercall functions - since they are quite similar Konrad Rzeszutek Wilk
@ 2012-05-01 16:37 ` Konrad Rzeszutek Wilk
  2012-05-02  9:05   ` Jan Beulich
  2012-05-03 11:48   ` David Vrabel
  7 siblings, 2 replies; 20+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-05-01 16:37 UTC (permalink / raw)
  To: linux-kernel, david.vrabel, JBeulich; +Cc: xen-devel

On Mon, Apr 16, 2012 at 01:15:31PM -0400, Konrad Rzeszutek Wilk wrote:
> Changelog v5 [since v4]:
>  - used populate_physmap, fixed bugs.
> [v2-v4: not posted]
>  - reworked the code in setup.c to work properly.
> [v1: https://lkml.org/lkml/2012/3/30/492]
>  - initial patchset

One bug I found was that with 'dom0_mem=max:1G' (with and without these
patches) I would get a bunch of

(XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
(XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17)

where the (0 of X), sometimes was 1, 2,3,4 or 17 -depending on the machine
I ran on it. I figured it out that the difference was in the ACPI tables
that are allocated - and that those regions - even though are returned
back to the hypervisor, cannot be repopulated. I can't find the actual
exact piece of code in the hypervisor to pin-point and say "Aha".

What I did was use the same metrix that the hypervisor uses to figure
out whether to deny the guest ballooning up - checking the d->tot_pages
against t->max_pages. For that the XENMEM_current_reservation is used.


>From e4568b678455f68d374277319fb5cc41f11b6c4f Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Thu, 26 Apr 2012 22:11:08 -0400
Subject: [PATCH] xen/setup: Cap amount to populate based on current tot_pages
 count.

Previous to this patch we would try to populate exactly up to
xen_released_pages number (so the ones that we released), but
that is incorrect as there are some pages that we thought were
released but in actuality were shared. Depending on the platform
it could be small amounts - 2 pages, but some had much higher - 17.

As such, lets use the XENMEM_current_reservation to get the exact
count of pages we are using, subtract it from nr_pages and use the
lesser of this and xen_released_pages to populate back.

This fixes errors such as:

(XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
(XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17)

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/setup.c |   16 ++++++++++++++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 506a3e6..8e7dcfd 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -287,7 +287,15 @@ static unsigned long __init xen_get_max_pages(void)
 
 	return min(max_pages, MAX_DOMAIN_PAGES);
 }
-
+static unsigned long xen_get_current_pages(void)
+{
+	domid_t domid = DOMID_SELF;
+	int ret;
+	ret = HYPERVISOR_memory_op(XENMEM_current_reservation, &domid);
+	if (ret > 0)
+		return ret;
+	return 0;
+}
 static void xen_align_and_add_e820_region(u64 start, u64 size, int type)
 {
 	u64 end = start + size;
@@ -358,7 +366,11 @@ char * __init xen_memory_setup(void)
 
 	/*
 	 * Populate back the non-RAM pages and E820 gaps that had been
-	 * released. */
+	 * released. But cap it as certain regions cannot be repopulated.
+	 */
+	if (xen_get_current_pages())
+		xen_released_pages = min(max_pfn - xen_get_current_pages(),
+				xen_released_pages);
 	populated = xen_populate_chunk(map, memmap.nr_entries,
 			max_pfn, &last_pfn, xen_released_pages);
 
-- 
1.7.7.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).
  2012-05-01 16:37 ` [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5) Konrad Rzeszutek Wilk
@ 2012-05-02  9:05   ` Jan Beulich
  2012-05-03 11:48   ` David Vrabel
  1 sibling, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2012-05-02  9:05 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: david.vrabel, xen-devel, linux-kernel

>>> On 01.05.12 at 18:37, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Mon, Apr 16, 2012 at 01:15:31PM -0400, Konrad Rzeszutek Wilk wrote:
>> Changelog v5 [since v4]:
>>  - used populate_physmap, fixed bugs.
>> [v2-v4: not posted]
>>  - reworked the code in setup.c to work properly.
>> [v1: https://lkml.org/lkml/2012/3/30/492]
>>  - initial patchset
> 
> One bug I found was that with 'dom0_mem=max:1G' (with and without these
> patches) I would get a bunch of
> 
> (XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
> (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 
> of 17)
> 
> where the (0 of X), sometimes was 1, 2,3,4 or 17 -depending on the machine
> I ran on it. I figured it out that the difference was in the ACPI tables
> that are allocated - and that those regions - even though are returned
> back to the hypervisor, cannot be repopulated. I can't find the actual
> exact piece of code in the hypervisor to pin-point and say "Aha".

Are you sure about this? For one, given that you wrote that you saw this
with as little as a single page off, I doubt there's any half way modern
system where ACPI tables all fit into a single page (2 would even seem to
be a theoretical minimum, as there ought to be one NVS region along with
the "normal" tables).

Second, the ability to repopulate shouldn't really depend on the nature of
the corresponding machine pages.

Jan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).
  2012-05-01 16:37 ` [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5) Konrad Rzeszutek Wilk
  2012-05-02  9:05   ` Jan Beulich
@ 2012-05-03 11:48   ` David Vrabel
  2012-05-03 15:15     ` [Xen-devel] " David Vrabel
  1 sibling, 1 reply; 20+ messages in thread
From: David Vrabel @ 2012-05-03 11:48 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: linux-kernel, JBeulich, xen-devel

On 01/05/12 17:37, Konrad Rzeszutek Wilk wrote:
> On Mon, Apr 16, 2012 at 01:15:31PM -0400, Konrad Rzeszutek Wilk wrote:
>> Changelog v5 [since v4]:
>>  - used populate_physmap, fixed bugs.
>> [v2-v4: not posted]
>>  - reworked the code in setup.c to work properly.
>> [v1: https://lkml.org/lkml/2012/3/30/492]
>>  - initial patchset
> 
> One bug I found was that with 'dom0_mem=max:1G' (with and without these
> patches) I would get a bunch of
> 
> (XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
> (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17)
> 
> where the (0 of X), sometimes was 1, 2,3,4 or 17 -depending on the machine
> I ran on it. I figured it out that the difference was in the ACPI tables
> that are allocated - and that those regions - even though are returned
> back to the hypervisor, cannot be repopulated. I can't find the actual
> exact piece of code in the hypervisor to pin-point and say "Aha".

It was tricky to track down what is going here but I think I see what's
happening.

The problem pages (on the system I looked at) were located just before
the ISA memory region (so PFN < a0) and so they are mapped in the
bootstrap page tables and have an additional ref so are not immediately
freed when the page is released.  They do get freed later on, presumably
when the page tables are swapped over.

I think the mapping needs to be removed with
HYPERVISOR_update_va_mapping() before releasing the page.  This is
already done for the ISA region in xen_ident_map_ISA().

I may be easier to avoid doing anything with the PFNs < 0x100 and take
the minimal lose of memory.

David

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [PATCH 6/8] xen/setup: Work properly with 'dom0_mem=X' or with not dom0_mem.
  2012-04-16 17:15 ` [PATCH 6/8] xen/setup: Work properly with 'dom0_mem=X' or with not dom0_mem Konrad Rzeszutek Wilk
@ 2012-05-03 11:54   ` David Vrabel
  0 siblings, 0 replies; 20+ messages in thread
From: David Vrabel @ 2012-05-03 11:54 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: linux-kernel, david.vrabel, JBeulich, xen-devel

On 16/04/12 18:15, Konrad Rzeszutek Wilk wrote:
> We ignored the X value and ended up populating up to
> max(MAX_DOMAIN_PAGES, last E820_RAM entry).
> 
> This fixes it by figuring out how many RAM nr_pages the
> hypervisor wanted to provide to us and cap the populate
> hypercalls up to that.

I don't think we should change the behavior again, particularly as the
dom0_mem documentation doesn't describe this proposed behaviour.

I would drop this patch.

David

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [PATCH 7/8] xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
  2012-04-16 17:15 ` [PATCH 7/8] xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM Konrad Rzeszutek Wilk
@ 2012-05-03 11:56   ` David Vrabel
  0 siblings, 0 replies; 20+ messages in thread
From: David Vrabel @ 2012-05-03 11:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: linux-kernel, david.vrabel, JBeulich, xen-devel

On 16/04/12 18:15, Konrad Rzeszutek Wilk wrote:
> 
> 
> This patch, implements the populate hypercall (XENMEM_populate_physmap)
> which increases the the domain with the same amount of pages that
> were released.

Provided the problem with the unreclaimable pages are resolved:

Acked-by: David Vrabel <david.vrabel@citrix.com>

David

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [PATCH 8/8] xen/setup: Combine the two hypercall functions - since they are quite similar.
  2012-04-16 17:15 ` [PATCH 8/8] xen/setup: Combine the two hypercall functions - since they are quite similar Konrad Rzeszutek Wilk
@ 2012-05-03 11:58   ` David Vrabel
  2012-05-03 15:37     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 20+ messages in thread
From: David Vrabel @ 2012-05-03 11:58 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: linux-kernel, david.vrabel, JBeulich, xen-devel

On 16/04/12 18:15, Konrad Rzeszutek Wilk wrote:
> They use the same set of arguments, so it is just the matter
> of using the proper hypercall.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: David Vrabel <david.vrabel@citrix.com>

But would it be better to fold this into the previous patch?

David

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).
  2012-05-03 11:48   ` David Vrabel
@ 2012-05-03 15:15     ` David Vrabel
  2012-05-03 16:27       ` David Vrabel
  0 siblings, 1 reply; 20+ messages in thread
From: David Vrabel @ 2012-05-03 15:15 UTC (permalink / raw)
  To: David Vrabel; +Cc: Konrad Rzeszutek Wilk, xen-devel, linux-kernel, JBeulich

On 03/05/12 12:48, David Vrabel wrote:
> On 01/05/12 17:37, Konrad Rzeszutek Wilk wrote:
>> On Mon, Apr 16, 2012 at 01:15:31PM -0400, Konrad Rzeszutek Wilk wrote:
>>> Changelog v5 [since v4]:
>>>  - used populate_physmap, fixed bugs.
>>> [v2-v4: not posted]
>>>  - reworked the code in setup.c to work properly.
>>> [v1: https://lkml.org/lkml/2012/3/30/492]
>>>  - initial patchset
>>
>> One bug I found was that with 'dom0_mem=max:1G' (with and without these
>> patches) I would get a bunch of
>>
>> (XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
>> (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17)
>>
>> where the (0 of X), sometimes was 1, 2,3,4 or 17 -depending on the machine
>> I ran on it. I figured it out that the difference was in the ACPI tables
>> that are allocated - and that those regions - even though are returned
>> back to the hypervisor, cannot be repopulated. I can't find the actual
>> exact piece of code in the hypervisor to pin-point and say "Aha".
> 
> It was tricky to track down what is going here but I think I see what's
> happening.
> 
> The problem pages (on the system I looked at) were located just before
> the ISA memory region (so PFN < a0) and so they are mapped in the
> bootstrap page tables and have an additional ref so are not immediately
> freed when the page is released.  They do get freed later on, presumably
> when the page tables are swapped over.

It's not the bootstrap page tables but those constructed in
xen_setup_kernel_pagetable() but this has the same effect.

> I think the mapping needs to be removed with
> HYPERVISOR_update_va_mapping() before releasing the page.  This is
> already done for the ISA region in xen_ident_map_ISA().

And here's a patch that does this.  I've not given it a lot of testing.

This is on top of your 8/8 patch and your "xen/setup: Cap amount to
populate based on current tot_pages count." patch is no longer needed.

David

8<---------------------
>From 17900ce942ed34ccc85b8e6fbce392118d95d9d3 Mon Sep 17 00:00:00 2001
From: David Vrabel <david.vrabel@citrix.com>
Date: Thu, 3 May 2012 15:57:00 +0100
Subject: [PATCH] xen: update VA mapping when releasing memory during setup

In xen_memory_setup(), if a page that is being released has a VA
mapping this must also be updated.  Otherwise, the page will be not
released completely -- it will still be referenced in Xen and won't be
freed util the mapping is removed and this prevents it from being
reallocated at a different PFN.

This was already being done for the ISA memory region in
xen_ident_map_ISA() but on many systems this was omitting a few pages
as many systems marked a few pages below the ISA memory region as
reserved in the e820 map.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/enlighten.c |    1 -
 arch/x86/xen/mmu.c       |   23 -----------------------
 arch/x86/xen/setup.c     |   41 ++++++++++++++++++++++++++++++++++-------
 arch/x86/xen/xen-ops.h   |    1 -
 4 files changed, 34 insertions(+), 32 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index a8f8844..ff9a20a 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1306,7 +1306,6 @@ asmlinkage void __init xen_start_kernel(void)
 
 	xen_raw_console_write("mapping kernel into physical memory\n");
 	pgd = xen_setup_kernel_pagetable(pgd, xen_start_info->nr_pages);
-	xen_ident_map_ISA();
 
 	/* Allocate and initialize top and mid mfn levels for p2m structure */
 	xen_build_mfn_list_list();
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index b8e2794..b756d8c 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1929,29 +1929,6 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
 #endif
 }
 
-void __init xen_ident_map_ISA(void)
-{
-	unsigned long pa;
-
-	/*
-	 * If we're dom0, then linear map the ISA machine addresses into
-	 * the kernel's address space.
-	 */
-	if (!xen_initial_domain())
-		return;
-
-	xen_raw_printk("Xen: setup ISA identity maps\n");
-
-	for (pa = ISA_START_ADDRESS; pa < ISA_END_ADDRESS; pa += PAGE_SIZE) {
-		pte_t pte = mfn_pte(PFN_DOWN(pa), PAGE_KERNEL_IO);
-
-		if (HYPERVISOR_update_va_mapping(PAGE_OFFSET + pa, pte, 0))
-			BUG();
-	}
-
-	xen_flush_tlb();
-}
-
 static void __init xen_post_allocator_init(void)
 {
 	pv_mmu_ops.set_pte = xen_set_pte;
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 506a3e6..d5f8714 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -139,6 +139,13 @@ static unsigned long __init xen_do_chunk(unsigned long start,
 
 	return len;
 }
+
+static unsigned long __init xen_release_chunk(unsigned long start,
+					      unsigned long end)
+{
+	return xen_do_chunk(start, end, true);
+}
+
 static unsigned long __init xen_populate_chunk(
 	const struct e820entry *list, size_t map_size,
 	unsigned long max_pfn, unsigned long *last_pfn,
@@ -197,6 +204,29 @@ static unsigned long __init xen_populate_chunk(
 	}
 	return done;
 }
+
+static void __init xen_set_identity_and_release_chunk(
+	unsigned long start_pfn, unsigned long end_pfn, unsigned long nr_pages,
+	unsigned long *released, unsigned long *identity)
+{
+	unsigned long pfn;
+
+	/*
+	 * If the PFNs are currently mapped, the VA mapping also needs
+	 * to be updated to be 1:1.
+	 */
+	for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++)
+		(void)HYPERVISOR_update_va_mapping(
+			(unsigned long)__va(pfn << PAGE_SHIFT),
+			mfn_pte(pfn, PAGE_KERNEL_IO), 0);
+
+	if (start_pfn < nr_pages)
+		*released += xen_release_chunk(
+			start_pfn, min(end_pfn, nr_pages));
+
+	*identity += set_phys_range_identity(start_pfn, end_pfn);
+}
+
 static unsigned long __init xen_set_identity_and_release(
 	const struct e820entry *list, size_t map_size, unsigned long nr_pages)
 {
@@ -226,14 +256,11 @@ static unsigned long __init xen_set_identity_and_release(
 			if (entry->type == E820_RAM)
 				end_pfn = PFN_UP(entry->addr);
 
-			if (start_pfn < end_pfn) {
-				if (start_pfn < nr_pages)
-					released += xen_do_chunk(
-						start_pfn, min(end_pfn, nr_pages), true);
+			if (start_pfn < end_pfn)
+				xen_set_identity_and_release_chunk(
+					start_pfn, end_pfn, nr_pages,
+					&released, &identity);
 
-				identity += set_phys_range_identity(
-					start_pfn, end_pfn);
-			}
 			start = end;
 		}
 	}
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index b095739..506fa08 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -28,7 +28,6 @@ void xen_setup_shared_info(void);
 void xen_build_mfn_list_list(void);
 void xen_setup_machphys_mapping(void);
 pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn);
-void xen_ident_map_ISA(void);
 void xen_reserve_top(void);
 extern unsigned long xen_max_p2m_pfn;
 
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [PATCH 8/8] xen/setup: Combine the two hypercall functions - since they are quite similar.
  2012-05-03 11:58   ` [Xen-devel] " David Vrabel
@ 2012-05-03 15:37     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 20+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-05-03 15:37 UTC (permalink / raw)
  To: David Vrabel
  Cc: Konrad Rzeszutek Wilk, xen-devel, linux-kernel, JBeulich, david.vrabel

On Thu, May 03, 2012 at 12:58:05PM +0100, David Vrabel wrote:
> On 16/04/12 18:15, Konrad Rzeszutek Wilk wrote:
> > They use the same set of arguments, so it is just the matter
> > of using the proper hypercall.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> Acked-by: David Vrabel <david.vrabel@citrix.com>
> 
> But would it be better to fold this into the previous patch?

Nah, one logical change per patch.
> 
> David
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).
  2012-05-03 15:15     ` [Xen-devel] " David Vrabel
@ 2012-05-03 16:27       ` David Vrabel
  2012-05-07 18:48         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 20+ messages in thread
From: David Vrabel @ 2012-05-03 16:27 UTC (permalink / raw)
  To: xen-devel; +Cc: Konrad Rzeszutek Wilk, linux-kernel, JBeulich

On 03/05/12 16:15, David Vrabel wrote:
> 
> xen: update VA mapping when releasing memory during setup
> 
> In xen_memory_setup(), if a page that is being released has a VA
> mapping this must also be updated.  Otherwise, the page will be not
> released completely -- it will still be referenced in Xen and won't be
> freed util the mapping is removed and this prevents it from being
> reallocated at a different PFN.
> 
> This was already being done for the ISA memory region in
> xen_ident_map_ISA() but on many systems this was omitting a few pages
> as many systems marked a few pages below the ISA memory region as
> reserved in the e820 map.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
[...]
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1929,29 +1929,6 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
>  #endif
>  }
>  
> -void __init xen_ident_map_ISA(void)
> -{
> -	unsigned long pa;
> -
> -	/*
> -	 * If we're dom0, then linear map the ISA machine addresses into
> -	 * the kernel's address space.
> -	 */
> -	if (!xen_initial_domain())
> -		return;

It might look like this test has gone, however the new code which
updates the VA mapping uses the e820 map and for a domU its map will not
have a ISA region so there's no mapping to be updated.

David

> -
> -	xen_raw_printk("Xen: setup ISA identity maps\n");
> -
> -	for (pa = ISA_START_ADDRESS; pa < ISA_END_ADDRESS; pa += PAGE_SIZE) {
> -		pte_t pte = mfn_pte(PFN_DOWN(pa), PAGE_KERNEL_IO);
> -
> -		if (HYPERVISOR_update_va_mapping(PAGE_OFFSET + pa, pte, 0))
> -			BUG();
> -	}
> -
> -	xen_flush_tlb();
> -}
> -
>  static void __init xen_post_allocator_init(void)
>  {
>  	pv_mmu_ops.set_pte = xen_set_pte;
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 506a3e6..d5f8714 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -139,6 +139,13 @@ static unsigned long __init xen_do_chunk(unsigned long start,
>  
>  	return len;
>  }
> +
> +static unsigned long __init xen_release_chunk(unsigned long start,
> +					      unsigned long end)
> +{
> +	return xen_do_chunk(start, end, true);
> +}
> +
>  static unsigned long __init xen_populate_chunk(
>  	const struct e820entry *list, size_t map_size,
>  	unsigned long max_pfn, unsigned long *last_pfn,
> @@ -197,6 +204,29 @@ static unsigned long __init xen_populate_chunk(
>  	}
>  	return done;
>  }
> +
> +static void __init xen_set_identity_and_release_chunk(
> +	unsigned long start_pfn, unsigned long end_pfn, unsigned long nr_pages,
> +	unsigned long *released, unsigned long *identity)
> +{
> +	unsigned long pfn;
> +
> +	/*
> +	 * If the PFNs are currently mapped, the VA mapping also needs
> +	 * to be updated to be 1:1.
> +	 */
> +	for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++)
> +		(void)HYPERVISOR_update_va_mapping(
> +			(unsigned long)__va(pfn << PAGE_SHIFT),
> +			mfn_pte(pfn, PAGE_KERNEL_IO), 0);
> +
> +	if (start_pfn < nr_pages)
> +		*released += xen_release_chunk(
> +			start_pfn, min(end_pfn, nr_pages));
> +
> +	*identity += set_phys_range_identity(start_pfn, end_pfn);
> +}
> +
>  static unsigned long __init xen_set_identity_and_release(
>  	const struct e820entry *list, size_t map_size, unsigned long nr_pages)
>  {
> @@ -226,14 +256,11 @@ static unsigned long __init xen_set_identity_and_release(
>  			if (entry->type == E820_RAM)
>  				end_pfn = PFN_UP(entry->addr);
>  
> -			if (start_pfn < end_pfn) {
> -				if (start_pfn < nr_pages)
> -					released += xen_do_chunk(
> -						start_pfn, min(end_pfn, nr_pages), true);
> +			if (start_pfn < end_pfn)
> +				xen_set_identity_and_release_chunk(
> +					start_pfn, end_pfn, nr_pages,
> +					&released, &identity);
>  
> -				identity += set_phys_range_identity(
> -					start_pfn, end_pfn);
> -			}
>  			start = end;
>  		}
>  	}
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index b095739..506fa08 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -28,7 +28,6 @@ void xen_setup_shared_info(void);
>  void xen_build_mfn_list_list(void);
>  void xen_setup_machphys_mapping(void);
>  pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn);
> -void xen_ident_map_ISA(void);
>  void xen_reserve_top(void);
>  extern unsigned long xen_max_p2m_pfn;
>  


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).
  2012-05-03 16:27       ` David Vrabel
@ 2012-05-07 18:48         ` Konrad Rzeszutek Wilk
  2012-05-08 18:12           ` David Vrabel
  0 siblings, 1 reply; 20+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-05-07 18:48 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, linux-kernel, JBeulich

On Thu, May 03, 2012 at 05:27:27PM +0100, David Vrabel wrote:
> On 03/05/12 16:15, David Vrabel wrote:
> > 
> > xen: update VA mapping when releasing memory during setup
> > 
> > In xen_memory_setup(), if a page that is being released has a VA
> > mapping this must also be updated.  Otherwise, the page will be not
> > released completely -- it will still be referenced in Xen and won't be
> > freed util the mapping is removed and this prevents it from being
> > reallocated at a different PFN.
> > 
> > This was already being done for the ISA memory region in
> > xen_ident_map_ISA() but on many systems this was omitting a few pages
> > as many systems marked a few pages below the ISA memory region as
> > reserved in the e820 map.
> > 
> > Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> > ---
> [...]
> > --- a/arch/x86/xen/mmu.c
> > +++ b/arch/x86/xen/mmu.c
> > @@ -1929,29 +1929,6 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
> >  #endif
> >  }
> >  
> > -void __init xen_ident_map_ISA(void)
> > -{
> > -	unsigned long pa;
> > -
> > -	/*
> > -	 * If we're dom0, then linear map the ISA machine addresses into
> > -	 * the kernel's address space.
> > -	 */
> > -	if (!xen_initial_domain())
> > -		return;
> 
> It might look like this test has gone, however the new code which
> updates the VA mapping uses the e820 map and for a domU its map will not
> have a ISA region so there's no mapping to be updated.

What if you use e820_hole=1 and the pci=xx in the guest?
> 
> David
> 
> > -
> > -	xen_raw_printk("Xen: setup ISA identity maps\n");
> > -
> > -	for (pa = ISA_START_ADDRESS; pa < ISA_END_ADDRESS; pa += PAGE_SIZE) {
> > -		pte_t pte = mfn_pte(PFN_DOWN(pa), PAGE_KERNEL_IO);
> > -
> > -		if (HYPERVISOR_update_va_mapping(PAGE_OFFSET + pa, pte, 0))
> > -			BUG();
> > -	}
> > -
> > -	xen_flush_tlb();
> > -}
> > -
> >  static void __init xen_post_allocator_init(void)
> >  {
> >  	pv_mmu_ops.set_pte = xen_set_pte;
> > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> > index 506a3e6..d5f8714 100644
> > --- a/arch/x86/xen/setup.c
> > +++ b/arch/x86/xen/setup.c
> > @@ -139,6 +139,13 @@ static unsigned long __init xen_do_chunk(unsigned long start,
> >  
> >  	return len;
> >  }
> > +
> > +static unsigned long __init xen_release_chunk(unsigned long start,
> > +					      unsigned long end)
> > +{
> > +	return xen_do_chunk(start, end, true);
> > +}
> > +
> >  static unsigned long __init xen_populate_chunk(
> >  	const struct e820entry *list, size_t map_size,
> >  	unsigned long max_pfn, unsigned long *last_pfn,
> > @@ -197,6 +204,29 @@ static unsigned long __init xen_populate_chunk(
> >  	}
> >  	return done;
> >  }
> > +
> > +static void __init xen_set_identity_and_release_chunk(
> > +	unsigned long start_pfn, unsigned long end_pfn, unsigned long nr_pages,
> > +	unsigned long *released, unsigned long *identity)
> > +{
> > +	unsigned long pfn;
> > +
> > +	/*
> > +	 * If the PFNs are currently mapped, the VA mapping also needs
> > +	 * to be updated to be 1:1.
> > +	 */
> > +	for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++)
> > +		(void)HYPERVISOR_update_va_mapping(
> > +			(unsigned long)__va(pfn << PAGE_SHIFT),
> > +			mfn_pte(pfn, PAGE_KERNEL_IO), 0);
> > +
> > +	if (start_pfn < nr_pages)
> > +		*released += xen_release_chunk(
> > +			start_pfn, min(end_pfn, nr_pages));
> > +
> > +	*identity += set_phys_range_identity(start_pfn, end_pfn);
> > +}
> > +
> >  static unsigned long __init xen_set_identity_and_release(
> >  	const struct e820entry *list, size_t map_size, unsigned long nr_pages)
> >  {
> > @@ -226,14 +256,11 @@ static unsigned long __init xen_set_identity_and_release(
> >  			if (entry->type == E820_RAM)
> >  				end_pfn = PFN_UP(entry->addr);
> >  
> > -			if (start_pfn < end_pfn) {
> > -				if (start_pfn < nr_pages)
> > -					released += xen_do_chunk(
> > -						start_pfn, min(end_pfn, nr_pages), true);
> > +			if (start_pfn < end_pfn)
> > +				xen_set_identity_and_release_chunk(
> > +					start_pfn, end_pfn, nr_pages,
> > +					&released, &identity);
> >  
> > -				identity += set_phys_range_identity(
> > -					start_pfn, end_pfn);
> > -			}
> >  			start = end;
> >  		}
> >  	}
> > diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> > index b095739..506fa08 100644
> > --- a/arch/x86/xen/xen-ops.h
> > +++ b/arch/x86/xen/xen-ops.h
> > @@ -28,7 +28,6 @@ void xen_setup_shared_info(void);
> >  void xen_build_mfn_list_list(void);
> >  void xen_setup_machphys_mapping(void);
> >  pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn);
> > -void xen_ident_map_ISA(void);
> >  void xen_reserve_top(void);
> >  extern unsigned long xen_max_p2m_pfn;
> >  
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).
  2012-05-07 18:48         ` Konrad Rzeszutek Wilk
@ 2012-05-08 18:12           ` David Vrabel
  2012-05-08 18:24             ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 20+ messages in thread
From: David Vrabel @ 2012-05-08 18:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: David Vrabel, xen-devel, linux-kernel, JBeulich

On 07/05/12 19:48, Konrad Rzeszutek Wilk wrote:
> On Thu, May 03, 2012 at 05:27:27PM +0100, David Vrabel wrote:
>> On 03/05/12 16:15, David Vrabel wrote:
>>>
>>> xen: update VA mapping when releasing memory during setup
>>>
>>> In xen_memory_setup(), if a page that is being released has a VA
>>> mapping this must also be updated.  Otherwise, the page will be not
>>> released completely -- it will still be referenced in Xen and won't be
>>> freed util the mapping is removed and this prevents it from being
>>> reallocated at a different PFN.
>>>
>>> This was already being done for the ISA memory region in
>>> xen_ident_map_ISA() but on many systems this was omitting a few pages
>>> as many systems marked a few pages below the ISA memory region as
>>> reserved in the e820 map.
>>>
>>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>>> ---
>> [...]
>>> --- a/arch/x86/xen/mmu.c
>>> +++ b/arch/x86/xen/mmu.c
>>> @@ -1929,29 +1929,6 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
>>>  #endif
>>>  }
>>>  
>>> -void __init xen_ident_map_ISA(void)
>>> -{
>>> -	unsigned long pa;
>>> -
>>> -	/*
>>> -	 * If we're dom0, then linear map the ISA machine addresses into
>>> -	 * the kernel's address space.
>>> -	 */
>>> -	if (!xen_initial_domain())
>>> -		return;
>>
>> It might look like this test has gone, however the new code which
>> updates the VA mapping uses the e820 map and for a domU its map will not
>> have a ISA region so there's no mapping to be updated.
> 
> What if you use e820_hole=1 and the pci=xx in the guest?

Are these xl configuration options? I'm not familiar with xl.

The PCI memory hole should be above 3 GiB so this hole will be will
above the memory that will be initially mapped at boot.

I've not managed to persuade my test box to passthrough a PCI device to
a guest (using xapi as the toolstack) to confirm, though.  I'll have
another go tomorrow.

David

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).
  2012-05-08 18:12           ` David Vrabel
@ 2012-05-08 18:24             ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 20+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-05-08 18:24 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, JBeulich, linux-kernel

On Tue, May 08, 2012 at 07:12:18PM +0100, David Vrabel wrote:
> On 07/05/12 19:48, Konrad Rzeszutek Wilk wrote:
> > On Thu, May 03, 2012 at 05:27:27PM +0100, David Vrabel wrote:
> >> On 03/05/12 16:15, David Vrabel wrote:
> >>>
> >>> xen: update VA mapping when releasing memory during setup
> >>>
> >>> In xen_memory_setup(), if a page that is being released has a VA
> >>> mapping this must also be updated.  Otherwise, the page will be not
> >>> released completely -- it will still be referenced in Xen and won't be
> >>> freed util the mapping is removed and this prevents it from being
> >>> reallocated at a different PFN.
> >>>
> >>> This was already being done for the ISA memory region in
> >>> xen_ident_map_ISA() but on many systems this was omitting a few pages
> >>> as many systems marked a few pages below the ISA memory region as
> >>> reserved in the e820 map.
> >>>
> >>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> >>> ---
> >> [...]
> >>> --- a/arch/x86/xen/mmu.c
> >>> +++ b/arch/x86/xen/mmu.c
> >>> @@ -1929,29 +1929,6 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
> >>>  #endif
> >>>  }
> >>>  
> >>> -void __init xen_ident_map_ISA(void)
> >>> -{
> >>> -	unsigned long pa;
> >>> -
> >>> -	/*
> >>> -	 * If we're dom0, then linear map the ISA machine addresses into
> >>> -	 * the kernel's address space.
> >>> -	 */
> >>> -	if (!xen_initial_domain())
> >>> -		return;
> >>
> >> It might look like this test has gone, however the new code which
> >> updates the VA mapping uses the e820 map and for a domU its map will not
> >> have a ISA region so there's no mapping to be updated.
> > 
> > What if you use e820_hole=1 and the pci=xx in the guest?
> 
> Are these xl configuration options? I'm not familiar with xl.

Yeah. Just have this in your guest config:

e820_hole=1
pci=["01:00.0"]

(And do the PCI bind/unbind to the xen-pciback module)

but looking at the source there is this comment:
/* Weed out anything under 1MB */

so I think we are fine.

> 
> The PCI memory hole should be above 3 GiB so this hole will be will
> above the memory that will be initially mapped at boot.
> 
> I've not managed to persuade my test box to passthrough a PCI device to
> a guest (using xapi as the toolstack) to confirm, though.  I'll have
> another go tomorrow.
> 
> David
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2012-05-08 18:30 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-16 17:15 [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5) Konrad Rzeszutek Wilk
2012-04-16 17:15 ` [PATCH 1/8] xen/p2m: Move code around to allow for better re-usage Konrad Rzeszutek Wilk
2012-04-16 17:15 ` [PATCH 2/8] xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument Konrad Rzeszutek Wilk
2012-04-16 17:15 ` [PATCH 3/8] xen/p2m: Collapse early_alloc_p2m_middle redundant checks Konrad Rzeszutek Wilk
2012-04-16 17:15 ` [PATCH 4/8] xen/p2m: An early bootup variant of set_phys_to_machine Konrad Rzeszutek Wilk
2012-04-16 17:15 ` [PATCH 6/8] xen/setup: Work properly with 'dom0_mem=X' or with not dom0_mem Konrad Rzeszutek Wilk
2012-05-03 11:54   ` [Xen-devel] " David Vrabel
2012-04-16 17:15 ` [PATCH 7/8] xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM Konrad Rzeszutek Wilk
2012-05-03 11:56   ` [Xen-devel] " David Vrabel
2012-04-16 17:15 ` [PATCH 8/8] xen/setup: Combine the two hypercall functions - since they are quite similar Konrad Rzeszutek Wilk
2012-05-03 11:58   ` [Xen-devel] " David Vrabel
2012-05-03 15:37     ` Konrad Rzeszutek Wilk
2012-05-01 16:37 ` [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5) Konrad Rzeszutek Wilk
2012-05-02  9:05   ` Jan Beulich
2012-05-03 11:48   ` David Vrabel
2012-05-03 15:15     ` [Xen-devel] " David Vrabel
2012-05-03 16:27       ` David Vrabel
2012-05-07 18:48         ` Konrad Rzeszutek Wilk
2012-05-08 18:12           ` David Vrabel
2012-05-08 18:24             ` Konrad Rzeszutek Wilk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).