All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] x86/xen: page fault and context switch performance (v2)
@ 2012-07-09 10:39 David Vrabel
  2012-07-09 10:39 ` [PATCH 1/4] xen/mm: do direct hypercall in xen_set_pte() if batching is unavailable David Vrabel
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: David Vrabel @ 2012-07-09 10:39 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, x86, H. Peter Anvin, linux-kernel

This series improves the performance of Xen PV guests when doing page
faults (32-bit guests only) and context switches.

Can an x86 maintainer ack patch 3 (x86: add desc_equal() to compare
GDT descriptors) so this series can go via Konrad's tree?  The other
patches are Xen-specific.

Changes since v1:

- fix (very!) slow boot on x86_64 by avoiding hypercalls when building
  the initial page tables.  These updates were mostly a) done to
  unpinned pages and b) often had invalid MFNs that Xen warns about.

- When building the initial page tables, zero PTEs for pages without
  an MFN yet.

- add desc_equal() instead of open coding it.

David


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/4] xen/mm: do direct hypercall in xen_set_pte() if batching is unavailable
  2012-07-09 10:39 [PATCH 0/4] x86/xen: page fault and context switch performance (v2) David Vrabel
@ 2012-07-09 10:39 ` David Vrabel
  2012-07-09 10:39 ` [PATCH 2/4] xen/mm: zero PTEs for non-present MFNs in the initial page table David Vrabel
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: David Vrabel @ 2012-07-09 10:39 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, x86, H. Peter Anvin, linux-kernel

From: David Vrabel <david.vrabel@citrix.com>

In xen_set_pte() if batching is unavailable (because the caller is in
an interrupt context such as handling a page fault) it would fall back
to using native_set_pte() and trapping and emulating the PTE write.

On 32-bit guests this requires two traps for each PTE write (one for
each dword of the PTE).  Instead, do one mmu_update hypercall
directly.

During construction of the initial page tables, continue to use
native_set_pte() because most of the PTEs being set are in writable
and unpinned pages (see phys_pmd_init() in arch/x86/mm/init_64.c) and
using a hypercall for this is very expensive.

This significantly improves page fault performance in 32-bit PV
guests.

lmbench3 test  Before    After     Improvement
----------------------------------------------
lat_pagefault  3.18 us   2.32 us   27%
lat_proc fork  356 us    313.3 us  11%

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/mmu.c |   30 +++++++++++++++++++++++++-----
 1 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 3a73785..3f1783a 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -308,8 +308,20 @@ static bool xen_batched_set_pte(pte_t *ptep, pte_t pteval)
 
 static inline void __xen_set_pte(pte_t *ptep, pte_t pteval)
 {
-	if (!xen_batched_set_pte(ptep, pteval))
-		native_set_pte(ptep, pteval);
+	if (!xen_batched_set_pte(ptep, pteval)) {
+		/*
+		 * Could call native_set_pte() here and trap and
+		 * emulate the PTE write but with 32-bit guests this
+		 * needs two traps (one for each of the two 32-bit
+		 * words in the PTE) so do one hypercall directly
+		 * instead.
+		 */
+		struct mmu_update u;
+
+		u.ptr = virt_to_machine(ptep).maddr | MMU_NORMAL_PT_UPDATE;
+		u.val = pte_val_ma(pteval);
+		HYPERVISOR_mmu_update(&u, 1, NULL, DOMID_SELF);
+	}
 }
 
 static void xen_set_pte(pte_t *ptep, pte_t pteval)
@@ -1416,13 +1428,21 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
 }
 #endif /* CONFIG_X86_64 */
 
-/* Init-time set_pte while constructing initial pagetables, which
-   doesn't allow RO pagetable pages to be remapped RW */
+/*
+ * Init-time set_pte while constructing initial pagetables, which
+ * doesn't allow RO page table pages to be remapped RW.
+ *
+ * Many of these PTE updates are done on unpinned and writable pages
+ * and doing a hypercall for these is unnecessary and expensive.  At
+ * this point it is not possible to tell if a page is pinned or not,
+ * so always write the PTE directly and rely on Xen trapping and
+ * emulating any updates as necessary.
+ */
 static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
 {
 	pte = mask_rw_pte(ptep, pte);
 
-	xen_set_pte(ptep, pte);
+	native_set_pte(ptep, pte);
 }
 
 static void pin_pagetable_pfn(unsigned cmd, unsigned long pfn)
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/4] xen/mm: zero PTEs for non-present MFNs in the initial page table
  2012-07-09 10:39 [PATCH 0/4] x86/xen: page fault and context switch performance (v2) David Vrabel
  2012-07-09 10:39 ` [PATCH 1/4] xen/mm: do direct hypercall in xen_set_pte() if batching is unavailable David Vrabel
@ 2012-07-09 10:39 ` David Vrabel
  2012-07-09 10:39 ` [PATCH 3/4] x86: add desc_equal() to compare GDT descriptors David Vrabel
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: David Vrabel @ 2012-07-09 10:39 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, x86, H. Peter Anvin, linux-kernel

From: David Vrabel <david.vrabel@citrix.com>

When constructing the initial page tables, if the MFN for a usable PFN
is missing in the p2m then that frame is initially ballooned out.  In
this case, zero the PTE (as in decrease_reservation() in
drivers/xen/balloon.c).

This is obviously safe instead of having an valid PTE with an MFN of
INVALID_P2M_ENTRY (~0).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/mmu.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 3f1783a..27336df 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1432,6 +1432,10 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
  * Init-time set_pte while constructing initial pagetables, which
  * doesn't allow RO page table pages to be remapped RW.
  *
+ * If there is no MFN for this PFN then this page is initially
+ * ballooned out so clear the PTE (as in decrease_reservation() in
+ * drivers/xen/balloon.c).
+ *
  * Many of these PTE updates are done on unpinned and writable pages
  * and doing a hypercall for these is unnecessary and expensive.  At
  * this point it is not possible to tell if a page is pinned or not,
@@ -1440,7 +1444,10 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
  */
 static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
 {
-	pte = mask_rw_pte(ptep, pte);
+	if (pte_mfn(pte) != INVALID_P2M_ENTRY)
+		pte = mask_rw_pte(ptep, pte);
+	else
+		pte = __pte_ma(0);
 
 	native_set_pte(ptep, pte);
 }
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/4] x86: add desc_equal() to compare GDT descriptors
  2012-07-09 10:39 [PATCH 0/4] x86/xen: page fault and context switch performance (v2) David Vrabel
  2012-07-09 10:39 ` [PATCH 1/4] xen/mm: do direct hypercall in xen_set_pte() if batching is unavailable David Vrabel
  2012-07-09 10:39 ` [PATCH 2/4] xen/mm: zero PTEs for non-present MFNs in the initial page table David Vrabel
@ 2012-07-09 10:39 ` David Vrabel
  2012-07-11 19:21   ` Konrad Rzeszutek Wilk
  2012-07-17 15:22   ` David Vrabel
  2012-07-09 10:39 ` [PATCH 4/4] x86/xen: avoid updating TLS descriptors if they haven't changed David Vrabel
  2012-07-09 14:38 ` [Xen-devel] [PATCH 0/4] x86/xen: page fault and context switch performance (v2) Ian Campbell
  4 siblings, 2 replies; 10+ messages in thread
From: David Vrabel @ 2012-07-09 10:39 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, x86, H. Peter Anvin, linux-kernel

From: David Vrabel <david.vrabel@citrix.com>

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/include/asm/desc.h |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 8bf1c06..d26f059 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -85,6 +85,12 @@ static inline int desc_empty(const void *ptr)
 	return !(desc[0] | desc[1]);
 }
 
+static inline bool desc_equal(const struct desc_struct *d1,
+			      const struct desc_struct *d2)
+{
+	return d1->a == d2->a && d1->b == d2->b;
+}
+
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
 #else
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/4] x86/xen: avoid updating TLS descriptors if they haven't changed
  2012-07-09 10:39 [PATCH 0/4] x86/xen: page fault and context switch performance (v2) David Vrabel
                   ` (2 preceding siblings ...)
  2012-07-09 10:39 ` [PATCH 3/4] x86: add desc_equal() to compare GDT descriptors David Vrabel
@ 2012-07-09 10:39 ` David Vrabel
  2012-07-09 14:38 ` [Xen-devel] [PATCH 0/4] x86/xen: page fault and context switch performance (v2) Ian Campbell
  4 siblings, 0 replies; 10+ messages in thread
From: David Vrabel @ 2012-07-09 10:39 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, x86, H. Peter Anvin, linux-kernel

From: David Vrabel <david.vrabel@citrix.com>

When switching tasks in a Xen PV guest, avoid updating the TLS
descriptors if they haven't changed.  This improves the speed of
context switches by almost 10% as much of the time the descriptors are
the same or only one is different.

The descriptors written into the GDT by Xen are modified from the
values passed in the update_descriptor hypercall so we keep shadow
copies of the three TLS descriptors to compare against.

lmbench3 test     Before  After  Improvement
--------------------------------------------
lat_ctx -s 32 24   7.19    6.52  9%
lat_pipe          12.56   11.66  7%

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/enlighten.c |   29 ++++++++++++++++++++++++++---
 1 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index ff962d4..33d53a1 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -124,6 +124,19 @@ struct shared_info *HYPERVISOR_shared_info = (void *)&xen_dummy_shared_info;
  */
 static int have_vcpu_info_placement = 1;
 
+struct tls_descs {
+	struct desc_struct desc[3];
+};
+
+/*
+ * Updating the 3 TLS descriptors in the GDT on every task switch is
+ * surprisingly expensive so we avoid updating them if they haven't
+ * changed.  Since Xen writes different descriptors than the one
+ * passed in the update_descriptor hypercall we keep shadow copies to
+ * compare against.
+ */
+static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);
+
 static void clamp_max_cpus(void)
 {
 #ifdef CONFIG_SMP
@@ -543,9 +556,19 @@ static void __init xen_load_gdt_boot(const struct desc_ptr *dtr)
 static void load_TLS_descriptor(struct thread_struct *t,
 				unsigned int cpu, unsigned int i)
 {
-	struct desc_struct *gdt = get_cpu_gdt_table(cpu);
-	xmaddr_t maddr = arbitrary_virt_to_machine(&gdt[GDT_ENTRY_TLS_MIN+i]);
-	struct multicall_space mc = __xen_mc_entry(0);
+	struct desc_struct *shadow = &per_cpu(shadow_tls_desc, cpu).desc[i];
+	struct desc_struct *gdt;
+	xmaddr_t maddr;
+	struct multicall_space mc;
+
+	if (desc_equal(shadow, &t->tls_array[i]))
+		return;
+
+	*shadow = t->tls_array[i];
+
+	gdt = get_cpu_gdt_table(cpu);
+	maddr = arbitrary_virt_to_machine(&gdt[GDT_ENTRY_TLS_MIN+i]);
+	mc = __xen_mc_entry(0);
 
 	MULTI_update_descriptor(mc.mc, maddr.maddr, t->tls_array[i]);
 }
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Xen-devel] [PATCH 0/4] x86/xen: page fault and context switch performance (v2)
  2012-07-09 10:39 [PATCH 0/4] x86/xen: page fault and context switch performance (v2) David Vrabel
                   ` (3 preceding siblings ...)
  2012-07-09 10:39 ` [PATCH 4/4] x86/xen: avoid updating TLS descriptors if they haven't changed David Vrabel
@ 2012-07-09 14:38 ` Ian Campbell
  4 siblings, 0 replies; 10+ messages in thread
From: Ian Campbell @ 2012-07-09 14:38 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, linux-kernel, x86, H. Peter Anvin, Konrad Rzeszutek Wilk

On Mon, 2012-07-09 at 06:39 -0400, David Vrabel wrote:
> This series improves the performance of Xen PV guests when doing page
> faults (32-bit guests only) and context switches.

These all look good to me, thanks David.

Acked-by: Ian Campbell <ian.campbell@citrix.com>



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/4] x86: add desc_equal() to compare GDT descriptors
  2012-07-09 10:39 ` [PATCH 3/4] x86: add desc_equal() to compare GDT descriptors David Vrabel
@ 2012-07-11 19:21   ` Konrad Rzeszutek Wilk
  2012-07-17 15:22   ` David Vrabel
  1 sibling, 0 replies; 10+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-11 19:21 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, x86, H. Peter Anvin, linux-kernel

On Mon, Jul 09, 2012 at 11:39:07AM +0100, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

> ---
>  arch/x86/include/asm/desc.h |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
> index 8bf1c06..d26f059 100644
> --- a/arch/x86/include/asm/desc.h
> +++ b/arch/x86/include/asm/desc.h
> @@ -85,6 +85,12 @@ static inline int desc_empty(const void *ptr)
>  	return !(desc[0] | desc[1]);
>  }
>  
> +static inline bool desc_equal(const struct desc_struct *d1,
> +			      const struct desc_struct *d2)
> +{
> +	return d1->a == d2->a && d1->b == d2->b;
> +}
> +
>  #ifdef CONFIG_PARAVIRT
>  #include <asm/paravirt.h>
>  #else
> -- 
> 1.7.2.5

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/4] x86: add desc_equal() to compare GDT descriptors
  2012-07-09 10:39 ` [PATCH 3/4] x86: add desc_equal() to compare GDT descriptors David Vrabel
  2012-07-11 19:21   ` Konrad Rzeszutek Wilk
@ 2012-07-17 15:22   ` David Vrabel
  2012-07-17 15:57       ` H. Peter Anvin
  1 sibling, 1 reply; 10+ messages in thread
From: David Vrabel @ 2012-07-17 15:22 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, Konrad Rzeszutek Wilk, x86, H. Peter Anvin, linux-kernel

On 09/07/12 11:39, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

ping?  Can we get an x86 an x86 maintainer for this patch, please?

David

> ---
>  arch/x86/include/asm/desc.h |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
> index 8bf1c06..d26f059 100644
> --- a/arch/x86/include/asm/desc.h
> +++ b/arch/x86/include/asm/desc.h
> @@ -85,6 +85,12 @@ static inline int desc_empty(const void *ptr)
>  	return !(desc[0] | desc[1]);
>  }
>  
> +static inline bool desc_equal(const struct desc_struct *d1,
> +			      const struct desc_struct *d2)
> +{
> +	return d1->a == d2->a && d1->b == d2->b;
> +}
> +
>  #ifdef CONFIG_PARAVIRT
>  #include <asm/paravirt.h>
>  #else

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/4] x86: add desc_equal() to compare GDT descriptors
  2012-07-17 15:22   ` David Vrabel
@ 2012-07-17 15:57       ` H. Peter Anvin
  0 siblings, 0 replies; 10+ messages in thread
From: H. Peter Anvin @ 2012-07-17 15:57 UTC (permalink / raw)
  To: David Vrabel, David Vrabel
  Cc: xen-devel, Konrad Rzeszutek Wilk, x86, linux-kernel

Sorry we're all travelling at the moment

David Vrabel <david.vrabel@citrix.com> wrote:

>On 09/07/12 11:39, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>> 
>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>
>ping?  Can we get an x86 an x86 maintainer for this patch, please?
>
>David
>
>> ---
>>  arch/x86/include/asm/desc.h |    6 ++++++
>>  1 files changed, 6 insertions(+), 0 deletions(-)
>> 
>> diff --git a/arch/x86/include/asm/desc.h
>b/arch/x86/include/asm/desc.h
>> index 8bf1c06..d26f059 100644
>> --- a/arch/x86/include/asm/desc.h
>> +++ b/arch/x86/include/asm/desc.h
>> @@ -85,6 +85,12 @@ static inline int desc_empty(const void *ptr)
>>  	return !(desc[0] | desc[1]);
>>  }
>>  
>> +static inline bool desc_equal(const struct desc_struct *d1,
>> +			      const struct desc_struct *d2)
>> +{
>> +	return d1->a == d2->a && d1->b == d2->b;
>> +}
>> +
>>  #ifdef CONFIG_PARAVIRT
>>  #include <asm/paravirt.h>
>>  #else

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/4] x86: add desc_equal() to compare GDT descriptors
@ 2012-07-17 15:57       ` H. Peter Anvin
  0 siblings, 0 replies; 10+ messages in thread
From: H. Peter Anvin @ 2012-07-17 15:57 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Konrad Rzeszutek Wilk, x86, linux-kernel

Sorry we're all travelling at the moment

David Vrabel <david.vrabel@citrix.com> wrote:

>On 09/07/12 11:39, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>> 
>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>
>ping?  Can we get an x86 an x86 maintainer for this patch, please?
>
>David
>
>> ---
>>  arch/x86/include/asm/desc.h |    6 ++++++
>>  1 files changed, 6 insertions(+), 0 deletions(-)
>> 
>> diff --git a/arch/x86/include/asm/desc.h
>b/arch/x86/include/asm/desc.h
>> index 8bf1c06..d26f059 100644
>> --- a/arch/x86/include/asm/desc.h
>> +++ b/arch/x86/include/asm/desc.h
>> @@ -85,6 +85,12 @@ static inline int desc_empty(const void *ptr)
>>  	return !(desc[0] | desc[1]);
>>  }
>>  
>> +static inline bool desc_equal(const struct desc_struct *d1,
>> +			      const struct desc_struct *d2)
>> +{
>> +	return d1->a == d2->a && d1->b == d2->b;
>> +}
>> +
>>  #ifdef CONFIG_PARAVIRT
>>  #include <asm/paravirt.h>
>>  #else

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-07-17 15:58 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-09 10:39 [PATCH 0/4] x86/xen: page fault and context switch performance (v2) David Vrabel
2012-07-09 10:39 ` [PATCH 1/4] xen/mm: do direct hypercall in xen_set_pte() if batching is unavailable David Vrabel
2012-07-09 10:39 ` [PATCH 2/4] xen/mm: zero PTEs for non-present MFNs in the initial page table David Vrabel
2012-07-09 10:39 ` [PATCH 3/4] x86: add desc_equal() to compare GDT descriptors David Vrabel
2012-07-11 19:21   ` Konrad Rzeszutek Wilk
2012-07-17 15:22   ` David Vrabel
2012-07-17 15:57     ` H. Peter Anvin
2012-07-17 15:57       ` H. Peter Anvin
2012-07-09 10:39 ` [PATCH 4/4] x86/xen: avoid updating TLS descriptors if they haven't changed David Vrabel
2012-07-09 14:38 ` [Xen-devel] [PATCH 0/4] x86/xen: page fault and context switch performance (v2) Ian Campbell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.