* [PATCH] Fixes for v3.6 (v1).
From: Konrad Rzeszutek Wilk @ 2012-08-16 15:50 UTC
  To: linux-kernel, xen-devel

I sent a git pull request to Linus two days ago, and then realized
yesterday that we still have two more bugs that should be fixed.

So here are the two tiny fixes that I am proposing for v3.6.



* [PATCH 1/2] xen/p2m: Fix for 32-bit builds the "Reserve 8MB of _brk space for P2M"
From: Konrad Rzeszutek Wilk @ 2012-08-16 15:50 UTC
  To: linux-kernel, xen-devel; +Cc: Konrad Rzeszutek Wilk

The git commit 5bc6f9888db5739abfa0cae279b4b442e4db8049
xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back.

extended the _brk space to fit 1048576 PFN entries. The math is that
each P2M leaf can cover PAGE_SIZE/sizeof(unsigned long) PFNs. On 64-bit
that means 512 PFNs, on 32-bit it is 1024. If we want to cover 4GB
worth of PFNs, we need enough space to fit 1048576 unsigned longs.

On 64-bit:
1048576 * sizeof(unsigned long) (8) bytes = 8MB

On 32-bit:
1048576 * sizeof(unsigned long) (4) bytes = 4MB
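
(As an illustration only - a standalone userspace sketch of the math
above, not part of the patch:)

    #include <stdio.h>

    int main(void)
    {
        unsigned long pfns = 1048576UL; /* 4GB worth of 4KB pages */
        unsigned long per_leaf = 4096 / sizeof(unsigned long);
        unsigned long bytes = pfns * sizeof(unsigned long);

        printf("PFNs per P2M leaf: %lu\n", per_leaf); /* 512 or 1024 */
        printf("_brk space needed: %lu MB\n", bytes >> 20); /* 8 or 4 */
        return 0;
    }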

We fix that by using the above-mentioned math instead of the
predefined PMD_SIZE.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/p2m.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index b2e91d4..626c979 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -198,7 +198,8 @@ RESERVE_BRK(p2m_mid_identity, PAGE_SIZE * 2 * 3);
  * max we have seen is 395979, but that does not mean it can't be more.
  * But some machines can have 3GB I/O holes even. So lets reserve enough
  * for 4GB of I/O and E820 holes. */
-RESERVE_BRK(p2m_populated, PMD_SIZE * 4);
+RESERVE_BRK(p2m_populated, 1048576 * sizeof(unsigned long));
+
 static inline unsigned p2m_top_index(unsigned long pfn)
 {
 	BUG_ON(pfn >= MAX_P2M_PFN);
-- 
1.7.7.6



* [PATCH 2/2] Revert "xen PVonHVM: move shared_info to MMIO before kexec"
From: Konrad Rzeszutek Wilk @ 2012-08-16 15:50 UTC
  To: linux-kernel, xen-devel; +Cc: Konrad Rzeszutek Wilk

This reverts commit 00e37bdb0113a98408de42db85be002f21dbffd3.

During shutdown of PVHVM guests with more than 2 VCPUs on certain
machines we can hit a race where the shared_info page is not
replaced fast enough, the PV time clock retries reading the same
area over and over without success, and gets stuck in an
infinite loop.
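
(To make the failure mode concrete, here is a hedged, simplified model
of the pvclock read loop - loosely based on pvclock_clocksource_read();
the struct and names below are illustrative, not the kernel's:)

    #include <stdint.h>

    struct vcpu_time_model {
        volatile uint32_t version;     /* odd while being updated */
        volatile uint64_t system_time;
    };

    static uint64_t pvclock_read_model(struct vcpu_time_model *src)
    {
        uint32_t version;
        uint64_t time;

        do {
            version = src->version;  /* real code has rdtsc_barrier()s */
            time = src->system_time; /* plus a TSC-derived delta */
        } while ((src->version & 1) || version != src->version);

        /* If the shared_info pfn loses its backing mfn, every read
         * returns all-ones, the version check never settles, and this
         * loop spins forever - the hang described above. */
        return time;
    }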

Acked-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/enlighten.c   |  118 ++++---------------------------------------
 arch/x86/xen/suspend.c     |    2 +-
 arch/x86/xen/xen-ops.h     |    2 +-
 drivers/xen/platform-pci.c |   15 ------
 include/xen/events.h       |    2 -
 5 files changed, 13 insertions(+), 126 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index a6f8acb..f1814fc 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -31,7 +31,6 @@
 #include <linux/pci.h>
 #include <linux/gfp.h>
 #include <linux/memblock.h>
-#include <linux/syscore_ops.h>
 
 #include <xen/xen.h>
 #include <xen/interface/xen.h>
@@ -1472,130 +1471,38 @@ asmlinkage void __init xen_start_kernel(void)
 #endif
 }
 
-#ifdef CONFIG_XEN_PVHVM
-/*
- * The pfn containing the shared_info is located somewhere in RAM. This
- * will cause trouble if the current kernel is doing a kexec boot into a
- * new kernel. The new kernel (and its startup code) can not know where
- * the pfn is, so it can not reserve the page. The hypervisor will
- * continue to update the pfn, and as a result memory corruption occurs
- * in the new kernel.
- *
- * One way to work around this issue is to allocate a page in the
- * xen-platform pci device's BAR memory range. But pci init is done very
- * late and the shared_info page is already in use very early to read
- * the pvclock. So moving the pfn from RAM to MMIO is racy because some
- * code paths on other vcpus could access the pfn during the small
- * window when the old pfn is moved to the new pfn. There is even a
- * small window where the old pfn is not backed by a mfn, and during that
- * time all reads return -1.
- *
- * Because it is not known upfront where the MMIO region is located it
- * can not be used right from the start in xen_hvm_init_shared_info.
- *
- * To minimise trouble the move of the pfn is done shortly before kexec.
- * This does not eliminate the race because all vcpus are still online
- * when the syscore_ops will be called. But hopefully there is no work
- * pending at this point in time. Also the syscore_op is run last which
- * reduces the risk further.
- */
-
-static struct shared_info *xen_hvm_shared_info;
-
-static void xen_hvm_connect_shared_info(unsigned long pfn)
+void __ref xen_hvm_init_shared_info(void)
 {
+	int cpu;
 	struct xen_add_to_physmap xatp;
+	static struct shared_info *shared_info_page = 0;
 
+	if (!shared_info_page)
+		shared_info_page = (struct shared_info *)
+			extend_brk(PAGE_SIZE, PAGE_SIZE);
 	xatp.domid = DOMID_SELF;
 	xatp.idx = 0;
 	xatp.space = XENMAPSPACE_shared_info;
-	xatp.gpfn = pfn;
+	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
 	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
 		BUG();
 
-}
-static void xen_hvm_set_shared_info(struct shared_info *sip)
-{
-	int cpu;
-
-	HYPERVISOR_shared_info = sip;
+	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
 
 	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
 	 * page, we use it in the event channel upcall and in some pvclock
 	 * related functions. We don't need the vcpu_info placement
 	 * optimizations because we don't use any pv_mmu or pv_irq op on
 	 * HVM.
-	 * When xen_hvm_set_shared_info is run at boot time only vcpu 0 is
-	 * online but xen_hvm_set_shared_info is run at resume time too and
+	 * When xen_hvm_init_shared_info is run at boot time only vcpu 0 is
+	 * online but xen_hvm_init_shared_info is run at resume time too and
 	 * in that case multiple vcpus might be online. */
 	for_each_online_cpu(cpu) {
 		per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
 	}
 }
 
-/* Reconnect the shared_info pfn to a mfn */
-void xen_hvm_resume_shared_info(void)
-{
-	xen_hvm_connect_shared_info(__pa(xen_hvm_shared_info) >> PAGE_SHIFT);
-}
-
-#ifdef CONFIG_KEXEC
-static struct shared_info *xen_hvm_shared_info_kexec;
-static unsigned long xen_hvm_shared_info_pfn_kexec;
-
-/* Remember a pfn in MMIO space for kexec reboot */
-void __devinit xen_hvm_prepare_kexec(struct shared_info *sip, unsigned long pfn)
-{
-	xen_hvm_shared_info_kexec = sip;
-	xen_hvm_shared_info_pfn_kexec = pfn;
-}
-
-static void xen_hvm_syscore_shutdown(void)
-{
-	struct xen_memory_reservation reservation = {
-		.domid = DOMID_SELF,
-		.nr_extents = 1,
-	};
-	unsigned long prev_pfn;
-	int rc;
-
-	if (!xen_hvm_shared_info_kexec)
-		return;
-
-	prev_pfn = __pa(xen_hvm_shared_info) >> PAGE_SHIFT;
-	set_xen_guest_handle(reservation.extent_start, &prev_pfn);
-
-	/* Move pfn to MMIO, disconnects previous pfn from mfn */
-	xen_hvm_connect_shared_info(xen_hvm_shared_info_pfn_kexec);
-
-	/* Update pointers, following hypercall is also a memory barrier */
-	xen_hvm_set_shared_info(xen_hvm_shared_info_kexec);
-
-	/* Allocate new mfn for previous pfn */
-	do {
-		rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
-		if (rc == 0)
-			msleep(123);
-	} while (rc == 0);
-
-	/* Make sure the previous pfn is really connected to a (new) mfn */
-	BUG_ON(rc != 1);
-}
-
-static struct syscore_ops xen_hvm_syscore_ops = {
-	.shutdown = xen_hvm_syscore_shutdown,
-};
-#endif
-
-/* Use a pfn in RAM, may move to MMIO before kexec. */
-static void __init xen_hvm_init_shared_info(void)
-{
-	/* Remember pointer for resume */
-	xen_hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
-	xen_hvm_connect_shared_info(__pa(xen_hvm_shared_info) >> PAGE_SHIFT);
-	xen_hvm_set_shared_info(xen_hvm_shared_info);
-}
-
+#ifdef CONFIG_XEN_PVHVM
 static void __init init_hvm_pv_info(void)
 {
 	int major, minor;
@@ -1646,9 +1553,6 @@ static void __init xen_hvm_guest_init(void)
 	init_hvm_pv_info();
 
 	xen_hvm_init_shared_info();
-#ifdef CONFIG_KEXEC
-	register_syscore_ops(&xen_hvm_syscore_ops);
-#endif
 
 	if (xen_feature(XENFEAT_hvm_callback_vector))
 		xen_have_vector_callback = 1;
diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index ae8a00c..45329c8 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled)
 {
 #ifdef CONFIG_XEN_PVHVM
 	int cpu;
-	xen_hvm_resume_shared_info();
+	xen_hvm_init_shared_info();
 	xen_callback_vector();
 	xen_unplug_emulated_devices();
 	if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 1e4329e..202d4c1 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -41,7 +41,7 @@ void xen_enable_syscall(void);
 void xen_vcpu_restore(void);
 
 void xen_callback_vector(void);
-void xen_hvm_resume_shared_info(void);
+void xen_hvm_init_shared_info(void);
 void xen_unplug_emulated_devices(void);
 
 void __init xen_build_dynamic_phys_to_machine(void);
diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
index d4c50d6..97ca359 100644
--- a/drivers/xen/platform-pci.c
+++ b/drivers/xen/platform-pci.c
@@ -101,19 +101,6 @@ static int platform_pci_resume(struct pci_dev *pdev)
 	return 0;
 }
 
-static void __devinit prepare_shared_info(void)
-{
-#ifdef CONFIG_KEXEC
-	unsigned long addr;
-	struct shared_info *hvm_shared_info;
-
-	addr = alloc_xen_mmio(PAGE_SIZE);
-	hvm_shared_info = ioremap(addr, PAGE_SIZE);
-	memset(hvm_shared_info, 0, PAGE_SIZE);
-	xen_hvm_prepare_kexec(hvm_shared_info, addr >> PAGE_SHIFT);
-#endif
-}
-
 static int __devinit platform_pci_init(struct pci_dev *pdev,
 				       const struct pci_device_id *ent)
 {
@@ -151,8 +138,6 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
 	platform_mmio = mmio_addr;
 	platform_mmiolen = mmio_len;
 
-	prepare_shared_info();
-
 	if (!xen_have_vector_callback) {
 		ret = xen_allocate_irq(pdev);
 		if (ret) {
diff --git a/include/xen/events.h b/include/xen/events.h
index 9c641de..04399b2 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -58,8 +58,6 @@ void notify_remote_via_irq(int irq);
 
 void xen_irq_resume(void);
 
-void xen_hvm_prepare_kexec(struct shared_info *sip, unsigned long pfn);
-
 /* Clear an irq's pending state, in preparation for polling on it */
 void xen_clear_irq_pending(int irq);
 void xen_set_irq_pending(int irq);
-- 
1.7.7.6



* Re: [Xen-devel] [PATCH 1/2] xen/p2m: Fix for 32-bit builds the "Reserve 8MB of _brk space for P2M"
From: Konrad Rzeszutek Wilk @ 2012-08-16 17:32 UTC
  To: linux-kernel, xen-devel

On Thu, Aug 16, 2012 at 11:50:13AM -0400, Konrad Rzeszutek Wilk wrote:
> The git commit 5bc6f9888db5739abfa0cae279b4b442e4db8049
> xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back.
> 
> extended the _brk space to fit 1048576 PFN entries. The math is that
> each P2M leaf can cover PAGE_SIZE/sizeof(unsigned long) PFNs. On 64-bit
> that means 512 PFNs, on 32-bit it is 1024. If we want to cover 4GB
> worth of PFNs, we need enough space to fit 1048576 unsigned longs.

Scratch that patch. This is better, but even with that I am still
hitting some weird 32-bit cases.

 
From 5502d44e8c7293f6d81a7fdabe25e49845c25cf8 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Thu, 16 Aug 2012 10:57:09 -0400
Subject: [PATCH] xen/p2m: Fix for 32-bit builds the "Reserve 8MB of _brk
 space for P2M"

The git commit 5bc6f9888db5739abfa0cae279b4b442e4db8049
xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back.

extended the _brk space to fit 1048576 PFN entries. The math is that
each P2M leaf can cover PAGE_SIZE/sizeof(unsigned long) PFNs. On 64-bit
that means 512 PFNs, on 32-bit it is 1024. If we want to cover 4GB
worth of PFNs, we need enough space to fit 1048576 unsigned longs.

On 64-bit:
1048576 * sizeof(unsigned long) (8) bytes = 8MB

On 32-bit:
1048576 * sizeof(unsigned long) (4) bytes = 4MB

.. But if you look at the comment, it says 3GB, not 4GB, so let's
also fix that and reserve only enough space for 3GB of PFNs.

We fix that by using the above-mentioned math instead of the
predefined PMD_SIZE.
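
(A quick sanity check of the new constant - an illustrative C11
snippet, not part of the patch:)

    #include <assert.h>

    /* 3GB of 4KB pages is (3UL << 30) / 4096 = 786432 PFN entries. */
    static_assert((3UL << 30) / 4096 == 786432, "3GB worth of PFNs");
    /* 786432 * sizeof(unsigned long): 6MB on 64-bit, 3MB on 32-bit. */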

CC: stable@vger.kernel.org #only for 3.5
[v2: 4GB/3GB]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/p2m.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index b2e91d4..29244d0 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -196,9 +196,9 @@ RESERVE_BRK(p2m_mid_identity, PAGE_SIZE * 2 * 3);
 
 /* When we populate back during bootup, the amount of pages can vary. The
  * max we have seen is 395979, but that does not mean it can't be more.
- * But some machines can have 3GB I/O holes even. So lets reserve enough
- * for 4GB of I/O and E820 holes. */
-RESERVE_BRK(p2m_populated, PMD_SIZE * 4);
+ * Some machines can have 3GB I/O holes so lets reserve for that. */
+RESERVE_BRK(p2m_populated, 786432 * sizeof(unsigned long));
+
 static inline unsigned p2m_top_index(unsigned long pfn)
 {
 	BUG_ON(pfn >= MAX_P2M_PFN);
-- 
1.7.7.6



* Re: [Xen-devel] [PATCH 1/2] xen/p2m: Fix for 32-bit builds the "Reserve 8MB of _brk space for P2M"
From: Konrad Rzeszutek Wilk @ 2012-08-16 21:02 UTC
  To: linux-kernel, xen-devel

On Thu, Aug 16, 2012 at 01:32:15PM -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Aug 16, 2012 at 11:50:13AM -0400, Konrad Rzeszutek Wilk wrote:
> > [... patch description snipped ...]
> 
> Scratch that patch. This is better, but even with that I am still
> hitting some weird 32-bit cases.

So I thought about this some more and came up with this patch. It's an
RFC and I am going to run it through some overnight tests to see how they fare.


commit da858a92dbeb52fb3246e3d0f1dd57989b5b1734
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Fri Jul 27 16:05:47 2012 -0400

    xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID.
    
    If a P2M leaf is completely packed with INVALID_P2M_ENTRY or with
    1:1 PFNs (so IDENTITY_FRAME type PFNs), we can swap the P2M leaf
    with either a p2m_missing or p2m_identity respectively. The old
    page (which was created via extend_brk or was grafted on from the
    mfn_list) can be re-used for setting new PFNs.
    
    This also means we can remove git commit:
    5bc6f9888db5739abfa0cae279b4b442e4db8049
    xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back
    which tried to fix this.
    
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
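
(For background, a conceptual model of the _brk allocator mentioned
above. The real code lives in arch/x86/kernel/setup.c and uses
linker-section tricks; this simplified sketch is illustrative only:)

    /* Set up early in boot from linker symbols; names illustrative. */
    static unsigned long _brk_end;   /* current allocation cursor      */
    static unsigned long _brk_limit; /* end of all RESERVE_BRK() space */

    static void *extend_brk_model(unsigned long size, unsigned long align)
    {
        void *ret;

        _brk_end = (_brk_end + align - 1) & ~(align - 1); /* align up */
        ret = (void *)_brk_end;
        _brk_end += size;
        /* The real extend_brk() BUG()s if _brk_end passes _brk_limit,
         * which is why the RESERVE_BRK() sizing in this thread matters. */
        return ret;
    }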

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 29244d0..b6b7c10 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -194,11 +194,6 @@ RESERVE_BRK(p2m_mid_mfn, PAGE_SIZE * (MAX_DOMAIN_PAGES / (P2M_PER_PAGE * P2M_MID
  * boundary violation will require three middle nodes. */
 RESERVE_BRK(p2m_mid_identity, PAGE_SIZE * 2 * 3);
 
-/* When we populate back during bootup, the amount of pages can vary. The
- * max we have seen is 395979, but that does not mean it can't be more.
- * Some machines can have 3GB I/O holes so lets reserve for that. */
-RESERVE_BRK(p2m_populated, 786432 * sizeof(unsigned long));
-
 static inline unsigned p2m_top_index(unsigned long pfn)
 {
 	BUG_ON(pfn >= MAX_P2M_PFN);
@@ -575,12 +570,99 @@ static bool __init early_alloc_p2m(unsigned long pfn)
 	}
 	return true;
 }
+
+/*
+ * Skim over the P2M tree looking at pages that are either filled with
+ * INVALID_P2M_ENTRY or with 1:1 PFNs. If found, re-use that page and
+ * replace the P2M leaf with a p2m_missing or p2m_identity.
+ * Stick the old page in the new P2M tree location.
+ */
+bool __init early_can_reuse_p2m_middle(unsigned long set_pfn, unsigned long set_mfn)
+{
+	unsigned topidx;
+	unsigned mididx;
+	unsigned ident_pfns;
+	unsigned inv_pfns;
+	unsigned long *p2m;
+	unsigned long *mid_mfn_p;
+	unsigned idx;
+	unsigned long pfn;
+
+	/* We only look when this entails a P2M middle layer */
+	if (p2m_index(set_pfn))
+		return false;
+
+	for (pfn = 0; pfn <= MAX_DOMAIN_PAGES; pfn += P2M_PER_PAGE) {
+		topidx = p2m_top_index(pfn);
+
+		if (!p2m_top[topidx])
+			continue;
+
+		if (p2m_top[topidx] == p2m_mid_missing)
+			continue;
+
+		mididx = p2m_mid_index(pfn);
+		p2m = p2m_top[topidx][mididx];
+		if (!p2m)
+			continue;
+
+		if ((p2m == p2m_missing) || (p2m == p2m_identity))
+			continue;
+
+		if ((unsigned long)p2m == INVALID_P2M_ENTRY)
+			continue;
+
+		ident_pfns = 0;
+		inv_pfns = 0;
+		for (idx = 0; idx < P2M_PER_PAGE; idx++) {
+			/* IDENTITY_PFNs are 1:1 */
+			if (p2m[idx] == IDENTITY_FRAME(pfn + idx))
+				ident_pfns++;
+			else if (p2m[idx] == INVALID_P2M_ENTRY)
+				inv_pfns++;
+			else
+				break;
+		}
+		if ((ident_pfns == P2M_PER_PAGE) || (inv_pfns == P2M_PER_PAGE))
+			goto found;
+	}
+	return false;
+found:
+	/* Found one, replace old with p2m_identity or p2m_missing */
+	p2m_top[topidx][mididx] = (ident_pfns ? p2m_identity : p2m_missing);
+	/* And the other for save/restore.. */
+	mid_mfn_p = p2m_top_mfn_p[topidx];
+	/* NOTE: Even if it is a p2m_identity it should still point to
+	 * a page filled with INVALID_P2M_ENTRY entries. */
+	mid_mfn_p[mididx] = virt_to_mfn(p2m_missing);
+
+	/* Reset where we want to stick the old page in. */
+	topidx = p2m_top_index(set_pfn);
+	mididx = p2m_mid_index(set_pfn);
+
+	/* This shouldn't happen */
+	if (WARN_ON(p2m_top[topidx] == p2m_mid_missing))
+		early_alloc_p2m(set_pfn);
+
+	if (WARN_ON(p2m_top[topidx][mididx] != p2m_missing))
+		return false;
+
+	p2m_init(p2m);
+	p2m_top[topidx][mididx] = p2m;
+	mid_mfn_p = p2m_top_mfn_p[topidx];
+	mid_mfn_p[mididx] = virt_to_mfn(p2m);
+
+	return true;
+}
 bool __init early_set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 {
 	if (unlikely(!__set_phys_to_machine(pfn, mfn)))  {
 		if (!early_alloc_p2m(pfn))
 			return false;
 
+		if (early_can_reuse_p2m_middle(pfn, mfn))
+			return __set_phys_to_machine(pfn, mfn);
+
 		if (!early_alloc_p2m_middle(pfn, false /* boundary crossover OK!*/))
 			return false;
 


* Re: [Xen-devel] [PATCH 1/2] xen/p2m: Fix for 32-bit builds the "Reserve 8MB of _brk space for P2M"
From: David Vrabel @ 2012-08-17 11:14 UTC
  To: Konrad Rzeszutek Wilk; +Cc: linux-kernel, xen-devel

On 16/08/12 22:02, Konrad Rzeszutek Wilk wrote:
> 
> So I thought about this some more and came up with this patch. It's an
> RFC and I am going to run it through some overnight tests to see how they fare.
> 
> 
> commit da858a92dbeb52fb3246e3d0f1dd57989b5b1734
> Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Date:   Fri Jul 27 16:05:47 2012 -0400
> 
>     xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID.
>     
>     If a P2M leaf is completely packed with INVALID_P2M_ENTRY or with
>     1:1 PFNs (so IDENTITY_FRAME type PFNs), we can swap the P2M leaf
>     with either a p2m_missing or p2m_identity respectively. The old
>     page (which was created via extend_brk or was grafted on from the
>     mfn_list) can be re-used for setting new PFNs.

Does this actually find any p2m pages to reclaim?

xen_set_identity_and_release() is careful to set the largest possible
range as 1:1 and the comments at the top of p2m.c suggest the mid
entries will be made to point to p2m_identity already.

David

> [... remainder of quoted patch snipped ...]

* Re: [Xen-devel] [PATCH 1/2] xen/p2m: Fix for 32-bit builds the "Reserve 8MB of _brk space for P2M"
From: Konrad Rzeszutek Wilk @ 2012-08-17 13:06 UTC
  To: David Vrabel; +Cc: linux-kernel, xen-devel

On Fri, Aug 17, 2012 at 12:14:12PM +0100, David Vrabel wrote:
> On 16/08/12 22:02, Konrad Rzeszutek Wilk wrote:
> > [... patch description snipped ...]
> 
> Does this actually find any p2m pages to reclaim?

Very much so. When I run the kernel without dom0_mem, we end up returning
around 372300 pages back and then populating them back - they (mostly)
all get to re-use the transplanted mfn_list.

The ones in the 9a-100 range obviously don't.
> 
> xen_set_identity_and_release() is careful to set the largest possible
> range as 1:1 and the comments at the top of p2m.c suggest the mid
> entries will be made to point to p2m_identity already.

Right, and that is still true - for cases where there are no mid entries
(so P2M[3][400], for example, can point into the middle of the MMIO region).

But if you boot without dom0_mem=max, that region (P2M[3][400]) would at
the start be backed by the &mfn_list, so when we mark that region 1-1 it
ends up sticking a whole bunch of IDENTITY_FRAME(pfn) entries into the
&mfn_list.

This patch harvests the chunks of &mfn_list that are in that state and
re-uses them.

And without any dom0_mem= I seem to call extend_brk at most twice (to
allocate the top leafs P2M[4] and P2M[5]). Hmm, to be on the safe side I
should probably do 'reserve_brk(p2m_populated, 3 * PAGE_SIZE)' in case we
end up transplanting 3GB of PFNs into the P2M[4], P2M[5] and P2M[6] nodes.
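
(For orientation, an illustrative model of the three-level P2M
indexing being discussed. The names follow arch/x86/xen/p2m.c, but the
constants here assume 64-bit, i.e. 4096-byte pages and an 8-byte
unsigned long:)

    #define P2M_PER_PAGE     (4096 / sizeof(unsigned long))   /* 512 */
    #define P2M_MID_PER_PAGE (4096 / sizeof(unsigned long *)) /* 512 */

    static unsigned p2m_top_index_model(unsigned long pfn)
    {
        return pfn / (P2M_MID_PER_PAGE * P2M_PER_PAGE);
    }

    static unsigned p2m_mid_index_model(unsigned long pfn)
    {
        return (pfn / P2M_PER_PAGE) % P2M_MID_PER_PAGE;
    }

    static unsigned p2m_index_model(unsigned long pfn)
    {
        return pfn % P2M_PER_PAGE;
    }

    /* So P2M[3][400] starts at pfn 3*512*512 + 400*512 = 991232,
     * i.e. around 3.8GB - right inside a typical MMIO hole. */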

> [... remainder of quoted mail snipped ...]

* Re: [Xen-devel] [PATCH 1/2] xen/p2m: Fix for 32-bit builds the "Reserve 8MB of _brk space for P2M"
From: David Vrabel @ 2012-08-17 13:28 UTC
  To: Konrad Rzeszutek Wilk; +Cc: linux-kernel, xen-devel

On 17/08/12 14:06, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 17, 2012 at 12:14:12PM +0100, David Vrabel wrote:
>> On 16/08/12 22:02, Konrad Rzeszutek Wilk wrote:
>>> [... patch description snipped ...]
>>
>> Does this actually find any p2m pages to reclaim?
> 
> Very much so. When I run the kernel without dom0_mem, we end up returning
> around 372300 pages back and then populating them back - they (mostly)
> all get to re-use the transplanted mfn_list.
> 
> The ones in the 9a-100 range obviously don't.
>>
>> xen_set_identity_and_release() is careful to set the largest possible
>> range as 1:1 and the comments at the top of p2m.c suggest the mid
>> entries will be made to point to p2m_identity already.
> 
> Right, and that is still true - for cases where there are no mid entries
> (so P2M[3][400], for example, can point into the middle of the MMIO region).
> 
> But if you boot without dom0_mem=max, that region (P2M[3][400]) would at
> the start be backed by the &mfn_list, so when we mark that region 1-1 it
> ends up sticking a whole bunch of IDENTITY_FRAME(pfn) entries into the
> &mfn_list.

Ah, I see.  This makes sense now.

> This patch harvests the chunks of &mfn_list that are in that state and
> re-uses them.
> 
> And without any dom0_mem= I seem to call extend_brk at most twice (to
> allocate the top leafs P2M[4] and P2M[5]). Hmm, to be on the safe side I
> should probably do 'reserve_brk(p2m_populated, 3 * PAGE_SIZE)' in case we
> end up transplanting 3GB of PFNs into the P2M[4], P2M[5] and P2M[6] nodes.

That sounds sensible.

David


* Re: [Xen-devel] [PATCH 1/2] xen/p2m: Fix for 32-bit builds the "Reserve 8MB of _brk space for P2M"
From: Konrad Rzeszutek Wilk @ 2012-08-17 17:36 UTC
  To: David Vrabel; +Cc: xen-devel, linux-kernel

On Fri, Aug 17, 2012 at 02:28:51PM +0100, David Vrabel wrote:
> On 17/08/12 14:06, Konrad Rzeszutek Wilk wrote:
> > [... earlier discussion snipped ...]
>
> Ah, I see.  This makes sense now.
> 
> > This patch harvests the chunks of &mfn_list that are in that state and
> > re-uses them.
> > 
> > And without any dom0_mem= I seem to call extend_brk at most twice (to
> > allocate the top leafs P2M[4] and P2M[5]). Hmm, to be on the safe side I
> > should probably do 'reserve_brk(p2m_populated, 3 * PAGE_SIZE)' in case we
> > end up transplanting 3GB of PFNs into the P2M[4], P2M[5] and P2M[6] nodes.
> 
> That sounds sensible.

Here is an updated one (changed only to scale the reserve_brk down)
that I am thinking of sending to Linus next week.
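
(Back-of-envelope for the PAGE_SIZE * 3 reservation in the patch below
- an illustrative sketch using 64-bit figures, not kernel code:)

    static unsigned long p2m_populated_pages_needed(void)
    {
        /* One mid-level page maps P2M_PER_PAGE * P2M_MID_PER_PAGE =
         * 512 * 512 = 262144 pfns, i.e. 1GB of 4KB pages, so a 3GB
         * transplant needs at most three such pages. */
        unsigned long pfns_per_mid = 512UL * 512UL;
        unsigned long worst_case_pfns = (3UL << 30) >> 12; /* 786432 */

        return worst_case_pfns / pfns_per_mid; /* 3 => PAGE_SIZE * 3 */
    }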

From 250a41e0ecc433cdd553a364d0fc74c766425209 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Fri, 17 Aug 2012 09:27:35 -0400
Subject: [PATCH] xen/p2m: Reuse existing P2M leafs if they are filled with
 1:1 PFNs or INVALID.

If a P2M leaf is completely packed with INVALID_P2M_ENTRY or with
1:1 PFNs (so IDENTITY_FRAME type PFNs), we can swap the P2M leaf
with either a p2m_missing or p2m_identity respectively. The old
page (which was created via extend_brk or was grafted on from the
mfn_list) can be re-used for setting new PFNs.

This also means we can remove git commit:
5bc6f9888db5739abfa0cae279b4b442e4db8049
xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back
which tried to fix this, and we can make the amount that is required
to be reserved much smaller.

CC: stable@vger.kernel.org # for 3.5 only.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/p2m.c |   95 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 92 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index b2e91d4..d4b25546 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -196,9 +196,11 @@ RESERVE_BRK(p2m_mid_identity, PAGE_SIZE * 2 * 3);
 
 /* When we populate back during bootup, the amount of pages can vary. The
  * max we have seen is 395979, but that does not mean it can't be more.
- * But some machines can have 3GB I/O holes even. So lets reserve enough
- * for 4GB of I/O and E820 holes. */
-RESERVE_BRK(p2m_populated, PMD_SIZE * 4);
+ * Some machines can have 3GB I/O holes even. With early_can_reuse_p2m_middle
+ * it can re-use the Xen-provided mfn_list array, so we only need to allocate at
+ * most three P2M top nodes. */
+RESERVE_BRK(p2m_populated, PAGE_SIZE * 3);
+
 static inline unsigned p2m_top_index(unsigned long pfn)
 {
 	BUG_ON(pfn >= MAX_P2M_PFN);
@@ -575,12 +577,99 @@ static bool __init early_alloc_p2m(unsigned long pfn)
 	}
 	return true;
 }
+
+/*
+ * Skim over the P2M tree looking at pages that are either filled with
+ * INVALID_P2M_ENTRY or with 1:1 PFNs. If found, re-use that page and
+ * replace the P2M leaf with a p2m_missing or p2m_identity.
+ * Stick the old page in the new P2M tree location.
+ */
+bool __init early_can_reuse_p2m_middle(unsigned long set_pfn, unsigned long set_mfn)
+{
+	unsigned topidx;
+	unsigned mididx;
+	unsigned ident_pfns;
+	unsigned inv_pfns;
+	unsigned long *p2m;
+	unsigned long *mid_mfn_p;
+	unsigned idx;
+	unsigned long pfn;
+
+	/* We only look when this entails a P2M middle layer */
+	if (p2m_index(set_pfn))
+		return false;
+
+	for (pfn = 0; pfn <= MAX_DOMAIN_PAGES; pfn += P2M_PER_PAGE) {
+		topidx = p2m_top_index(pfn);
+
+		if (!p2m_top[topidx])
+			continue;
+
+		if (p2m_top[topidx] == p2m_mid_missing)
+			continue;
+
+		mididx = p2m_mid_index(pfn);
+		p2m = p2m_top[topidx][mididx];
+		if (!p2m)
+			continue;
+
+		if ((p2m == p2m_missing) || (p2m == p2m_identity))
+			continue;
+
+		if ((unsigned long)p2m == INVALID_P2M_ENTRY)
+			continue;
+
+		ident_pfns = 0;
+		inv_pfns = 0;
+		for (idx = 0; idx < P2M_PER_PAGE; idx++) {
+			/* IDENTITY_PFNs are 1:1 */
+			if (p2m[idx] == IDENTITY_FRAME(pfn + idx))
+				ident_pfns++;
+			else if (p2m[idx] == INVALID_P2M_ENTRY)
+				inv_pfns++;
+			else
+				break;
+		}
+		if ((ident_pfns == P2M_PER_PAGE) || (inv_pfns == P2M_PER_PAGE))
+			goto found;
+	}
+	return false;
+found:
+	/* Found one, replace old with p2m_identity or p2m_missing */
+	p2m_top[topidx][mididx] = (ident_pfns ? p2m_identity : p2m_missing);
+	/* And the other for save/restore.. */
+	mid_mfn_p = p2m_top_mfn_p[topidx];
+	/* NOTE: Even if it is a p2m_identity it should still point to
+	 * a page filled with INVALID_P2M_ENTRY entries. */
+	mid_mfn_p[mididx] = virt_to_mfn(p2m_missing);
+
+	/* Reset where we want to stick the old page in. */
+	topidx = p2m_top_index(set_pfn);
+	mididx = p2m_mid_index(set_pfn);
+
+	/* This shouldn't happen */
+	if (WARN_ON(p2m_top[topidx] == p2m_mid_missing))
+		early_alloc_p2m(set_pfn);
+
+	if (WARN_ON(p2m_top[topidx][mididx] != p2m_missing))
+		return false;
+
+	p2m_init(p2m);
+	p2m_top[topidx][mididx] = p2m;
+	mid_mfn_p = p2m_top_mfn_p[topidx];
+	mid_mfn_p[mididx] = virt_to_mfn(p2m);
+
+	return true;
+}
 bool __init early_set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 {
 	if (unlikely(!__set_phys_to_machine(pfn, mfn)))  {
 		if (!early_alloc_p2m(pfn))
 			return false;
 
+		if (early_can_reuse_p2m_middle(pfn, mfn))
+			return __set_phys_to_machine(pfn, mfn);
+
 		if (!early_alloc_p2m_middle(pfn, false /* boundary crossover OK!*/))
 			return false;
 
-- 
1.7.7.6


