[PATCH 0/2] x86/mm: remove arch-specific PTE/PMD get-and-clear functions

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/2] x86/mm: remove arch-specific PTE/PMD get-and-clear functions
@ 2012-06-13 10:20 David Vrabel
  2012-06-13 10:20 ` [PATCH 1/2] x86/mm: remove arch-specific ptep_get_and_clear() function David Vrabel
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: David Vrabel @ 2012-06-13 10:20 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, H. Peter Anvin, x86, linux-kernel

This series removes the x86-specific implementation of
ptep_get_and_clear() and pmdp_get_and_clear().

The principal reason for this is it allows Xen paravitualized guests
to batch the PTE clears which is a significant performance
optimization of munmap() and mremap() -- the number of entries into
the hypervisor is reduced by about a factor of about 30 (60 in 32-bit
guests) for munmap().

There may be minimal gains on native and KVM guests due to the removal
of the locked xchg.

Removal of arch-specific functions where generic ones are suitable
seems to be a generally useful thing to me.

The full reasoning for why this is safe is included in the commit
message of patch 1 but to summarize.  The atomic get-and-clear does
not guarantee that the latest dirty/accessed bits are returned as TLB
as there is a still a window after the get-and-clear and before the
TLB flush that the bits may be updated on other processors.  So, user
space applications accessing pages that are being unmapped or remapped
already have unpredictable behaviour.

David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] x86/mm: remove arch-specific ptep_get_and_clear() function
  2012-06-13 10:20 [PATCH 0/2] x86/mm: remove arch-specific PTE/PMD get-and-clear functions David Vrabel
@ 2012-06-13 10:20 ` David Vrabel
  2012-06-15  9:41   ` David Vrabel
  2012-06-13 10:20 ` [PATCH 2/2] x86/mm: remove arch-specific pmdp_get_and_clear() function David Vrabel
  2012-06-13 14:04 ` [PATCH 0/2] x86/mm: remove arch-specific PTE/PMD get-and-clear functions Konrad Rzeszutek Wilk
  2 siblings, 1 reply; 11+ messages in thread
From: David Vrabel @ 2012-06-13 10:20 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, H. Peter Anvin, x86, linux-kernel

From: David Vrabel <david.vrabel@citrix.com>

x86 provides a ptep_get_and_clear() function instead of using the
generic implementation so it can atomically get and clear the PTE.

It is not clear why x86 has this requirement.  PTE updates are done
while the appropriate PTE lock are held, so presumably, it is an
attempt to ensure that the accessed and dirty bits of the PTE are
saved even if they have been updated by the hardware due to a
concurrent access on another CPU.

However, the atomic exchange is not sufficient for saving the dirty
bit as if there is a TLB entry for this page on another CPU then the
dirty bit may be written by that processor after the PTE is cleared
but before the TLB entry is flushed.  It is also not clear from the
Intel SDM[1] if the processor's read of the PTE and the write to set
the accessed bit are atomic or not.

With this patch the get/clear becomes a read of the PTE and call to
pte_clear() which allows the clears to be batched by some
paravirtualized guests (such as Xen guests).  This can lead to
significant performance improvements in munmap().  e.g., for Xen, a
trap-and-emulate[2] for every page becomes a hypercall for every 31
pages.

There should be no change in the performance on native or for KVM
guests.  There may be a performance regression with lguest guests as
an optimization for avoiding calling pte_update() when doing a full
teardown of an mm is removed.

As a consequence there is a slight increase of the window where a set
(by the processor) of the accessed or dirty bit may be lost.  This
shouldn't change the behaviour of user space in any meaningful way.
Any user space application accessing a mmap()'d region that is being
munmap()'d or mremap()'d is already going to have unpredicatable
behaviour -- the access may succeed, it may fault, or the write to the
mapped file may be lost.

[1] Intel Software Developer's Manual Vol 3A section 4.8 says nothing
on this.

[2] For 32-bit guests, two traps are required for every page to update
both halves of the PTE.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/include/asm/pgtable-2level.h |    9 --------
 arch/x86/include/asm/pgtable-3level.h |   16 --------------
 arch/x86/include/asm/pgtable.h        |   37 ---------------------------------
 arch/x86/include/asm/pgtable_64.h     |   13 -----------
 4 files changed, 0 insertions(+), 75 deletions(-)

diff --git a/arch/x86/include/asm/pgtable-2level.h b/arch/x86/include/asm/pgtable-2level.h
index 98391db..be7c20e 100644
--- a/arch/x86/include/asm/pgtable-2level.h
+++ b/arch/x86/include/asm/pgtable-2level.h
@@ -38,15 +38,6 @@ static inline void native_pte_clear(struct mm_struct *mm,
 }
 
 #ifdef CONFIG_SMP
-static inline pte_t native_ptep_get_and_clear(pte_t *xp)
-{
-	return __pte(xchg(&xp->pte_low, 0));
-}
-#else
-#define native_ptep_get_and_clear(xp) native_local_ptep_get_and_clear(xp)
-#endif
-
-#ifdef CONFIG_SMP
 static inline pmd_t native_pmdp_get_and_clear(pmd_t *xp)
 {
 	return __pmd(xchg((pmdval_t *)xp, 0));
diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
index 43876f1..42101e9 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -134,22 +134,6 @@ static inline void pud_clear(pud_t *pudp)
 }
 
 #ifdef CONFIG_SMP
-static inline pte_t native_ptep_get_and_clear(pte_t *ptep)
-{
-	pte_t res;
-
-	/* xchg acts as a barrier before the setting of the high bits */
-	res.pte_low = xchg(&ptep->pte_low, 0);
-	res.pte_high = ptep->pte_high;
-	ptep->pte_high = 0;
-
-	return res;
-}
-#else
-#define native_ptep_get_and_clear(xp) native_local_ptep_get_and_clear(xp)
-#endif
-
-#ifdef CONFIG_SMP
 union split_pmd {
 	struct {
 		u32 pmd_low;
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 49afb3f..4413bed 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -598,16 +598,6 @@ static inline int pgd_none(pgd_t pgd)
 
 extern int direct_gbpages;
 
-/* local pte updates need not use xchg for locking */
-static inline pte_t native_local_ptep_get_and_clear(pte_t *ptep)
-{
-	pte_t res = *ptep;
-
-	/* Pure native function needs no input for mm, addr */
-	native_pte_clear(NULL, 0, ptep);
-	return res;
-}
-
 static inline pmd_t native_local_pmdp_get_and_clear(pmd_t *pmdp)
 {
 	pmd_t res = *pmdp;
@@ -668,33 +658,6 @@ extern int ptep_test_and_clear_young(struct vm_area_struct *vma,
 extern int ptep_clear_flush_young(struct vm_area_struct *vma,
 				  unsigned long address, pte_t *ptep);
 
-#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
-static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
-				       pte_t *ptep)
-{
-	pte_t pte = native_ptep_get_and_clear(ptep);
-	pte_update(mm, addr, ptep);
-	return pte;
-}
-
-#define __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL
-static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
-					    unsigned long addr, pte_t *ptep,
-					    int full)
-{
-	pte_t pte;
-	if (full) {
-		/*
-		 * Full address destruction in progress; paravirt does not
-		 * care about updates and native needs no locking
-		 */
-		pte = native_local_ptep_get_and_clear(ptep);
-	} else {
-		pte = ptep_get_and_clear(mm, addr, ptep);
-	}
-	return pte;
-}
-
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pte_t *ptep)
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 975f709..cc12d27 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -69,19 +69,6 @@ static inline void native_pmd_clear(pmd_t *pmd)
 	native_set_pmd(pmd, native_make_pmd(0));
 }
 
-static inline pte_t native_ptep_get_and_clear(pte_t *xp)
-{
-#ifdef CONFIG_SMP
-	return native_make_pte(xchg(&xp->pte, 0));
-#else
-	/* native_local_ptep_get_and_clear,
-	   but duplicated because of cyclic dependency */
-	pte_t ret = *xp;
-	native_pte_clear(NULL, 0, xp);
-	return ret;
-#endif
-}
-
 static inline pmd_t native_pmdp_get_and_clear(pmd_t *xp)
 {
 #ifdef CONFIG_SMP
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] x86/mm: remove arch-specific pmdp_get_and_clear() function
  2012-06-13 10:20 [PATCH 0/2] x86/mm: remove arch-specific PTE/PMD get-and-clear functions David Vrabel
  2012-06-13 10:20 ` [PATCH 1/2] x86/mm: remove arch-specific ptep_get_and_clear() function David Vrabel
@ 2012-06-13 10:20 ` David Vrabel
  2012-06-13 14:04 ` [PATCH 0/2] x86/mm: remove arch-specific PTE/PMD get-and-clear functions Konrad Rzeszutek Wilk
  2 siblings, 0 replies; 11+ messages in thread
From: David Vrabel @ 2012-06-13 10:20 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, H. Peter Anvin, x86, linux-kernel

From: David Vrabel <david.vrabel@citrix.com>

Similar to the removal of the x86 specific ptep_get_and_clear(), do
the same for pmdp_get_and_clear().  The reasoning for this is the
same.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/include/asm/pgtable-2level.h |    9 ---------
 arch/x86/include/asm/pgtable-3level.h |   23 -----------------------
 arch/x86/include/asm/pgtable.h        |   17 -----------------
 arch/x86/include/asm/pgtable_64.h     |   13 -------------
 4 files changed, 0 insertions(+), 62 deletions(-)

diff --git a/arch/x86/include/asm/pgtable-2level.h b/arch/x86/include/asm/pgtable-2level.h
index be7c20e..9b08339 100644
--- a/arch/x86/include/asm/pgtable-2level.h
+++ b/arch/x86/include/asm/pgtable-2level.h
@@ -37,15 +37,6 @@ static inline void native_pte_clear(struct mm_struct *mm,
 	*xp = native_make_pte(0);
 }
 
-#ifdef CONFIG_SMP
-static inline pmd_t native_pmdp_get_and_clear(pmd_t *xp)
-{
-	return __pmd(xchg((pmdval_t *)xp, 0));
-}
-#else
-#define native_pmdp_get_and_clear(xp) native_local_pmdp_get_and_clear(xp)
-#endif
-
 /*
  * Bits _PAGE_BIT_PRESENT, _PAGE_BIT_FILE and _PAGE_BIT_PROTNONE are taken,
  * split up the 29 bits of offset into this range:
diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
index 42101e9..f081699 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -133,29 +133,6 @@ static inline void pud_clear(pud_t *pudp)
 	 */
 }
 
-#ifdef CONFIG_SMP
-union split_pmd {
-	struct {
-		u32 pmd_low;
-		u32 pmd_high;
-	};
-	pmd_t pmd;
-};
-static inline pmd_t native_pmdp_get_and_clear(pmd_t *pmdp)
-{
-	union split_pmd res, *orig = (union split_pmd *)pmdp;
-
-	/* xchg acts as a barrier before setting of the high bits */
-	res.pmd_low = xchg(&orig->pmd_low, 0);
-	res.pmd_high = orig->pmd_high;
-	orig->pmd_high = 0;
-
-	return res.pmd;
-}
-#else
-#define native_pmdp_get_and_clear(xp) native_local_pmdp_get_and_clear(xp)
-#endif
-
 /*
  * Bits 0, 6 and 7 are taken in the low part of the pte,
  * put the 32 bits of offset into the high part.
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 4413bed..34f576c 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -598,14 +598,6 @@ static inline int pgd_none(pgd_t pgd)
 
 extern int direct_gbpages;
 
-static inline pmd_t native_local_pmdp_get_and_clear(pmd_t *pmdp)
-{
-	pmd_t res = *pmdp;
-
-	native_pmd_clear(pmdp);
-	return res;
-}
-
 static inline void native_set_pte_at(struct mm_struct *mm, unsigned long addr,
 				     pte_t *ptep , pte_t pte)
 {
@@ -694,15 +686,6 @@ static inline int pmd_write(pmd_t pmd)
 	return pmd_flags(pmd) & _PAGE_RW;
 }
 
-#define __HAVE_ARCH_PMDP_GET_AND_CLEAR
-static inline pmd_t pmdp_get_and_clear(struct mm_struct *mm, unsigned long addr,
-				       pmd_t *pmdp)
-{
-	pmd_t pmd = native_pmdp_get_and_clear(pmdp);
-	pmd_update(mm, addr, pmdp);
-	return pmd;
-}
-
 #define __HAVE_ARCH_PMDP_SET_WRPROTECT
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pmd_t *pmdp)
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index cc12d27..92eb013 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -69,19 +69,6 @@ static inline void native_pmd_clear(pmd_t *pmd)
 	native_set_pmd(pmd, native_make_pmd(0));
 }
 
-static inline pmd_t native_pmdp_get_and_clear(pmd_t *xp)
-{
-#ifdef CONFIG_SMP
-	return native_make_pmd(xchg(&xp->pmd, 0));
-#else
-	/* native_local_pmdp_get_and_clear,
-	   but duplicated because of cyclic dependency */
-	pmd_t ret = *xp;
-	native_pmd_clear(xp);
-	return ret;
-#endif
-}
-
 static inline void native_set_pud(pud_t *pudp, pud_t pud)
 {
 	*pudp = pud;
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] x86/mm: remove arch-specific PTE/PMD get-and-clear functions
  2012-06-13 10:20 [PATCH 0/2] x86/mm: remove arch-specific PTE/PMD get-and-clear functions David Vrabel
  2012-06-13 10:20 ` [PATCH 1/2] x86/mm: remove arch-specific ptep_get_and_clear() function David Vrabel
  2012-06-13 10:20 ` [PATCH 2/2] x86/mm: remove arch-specific pmdp_get_and_clear() function David Vrabel
@ 2012-06-13 14:04 ` Konrad Rzeszutek Wilk
  2012-06-13 15:00   ` David Vrabel
  2 siblings, 1 reply; 11+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-06-13 14:04 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, H. Peter Anvin, x86, linux-kernel

On Wed, Jun 13, 2012 at 11:20:43AM +0100, David Vrabel wrote:
> This series removes the x86-specific implementation of
> ptep_get_and_clear() and pmdp_get_and_clear().
> 
> The principal reason for this is it allows Xen paravitualized guests
> to batch the PTE clears which is a significant performance
> optimization of munmap() and mremap() -- the number of entries into
> the hypervisor is reduced by about a factor of about 30 (60 in 32-bit
> guests) for munmap().
> 
> There may be minimal gains on native and KVM guests due to the removal
> of the locked xchg.

What about lguest?
> 
> Removal of arch-specific functions where generic ones are suitable
> seems to be a generally useful thing to me.
> 
> The full reasoning for why this is safe is included in the commit
> message of patch 1 but to summarize.  The atomic get-and-clear does
> not guarantee that the latest dirty/accessed bits are returned as TLB
> as there is a still a window after the get-and-clear and before the
> TLB flush that the bits may be updated on other processors.  So, user
> space applications accessing pages that are being unmapped or remapped
> already have unpredictable behaviour.
> 
> David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] x86/mm: remove arch-specific PTE/PMD get-and-clear functions
  2012-06-13 14:04 ` [PATCH 0/2] x86/mm: remove arch-specific PTE/PMD get-and-clear functions Konrad Rzeszutek Wilk
@ 2012-06-13 15:00   ` David Vrabel
  2012-06-14 18:29     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 11+ messages in thread
From: David Vrabel @ 2012-06-13 15:00 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, H. Peter Anvin, x86, linux-kernel

On 13/06/12 15:04, Konrad Rzeszutek Wilk wrote:
> On Wed, Jun 13, 2012 at 11:20:43AM +0100, David Vrabel wrote:
>> This series removes the x86-specific implementation of
>> ptep_get_and_clear() and pmdp_get_and_clear().
>>
>> The principal reason for this is it allows Xen paravitualized guests
>> to batch the PTE clears which is a significant performance
>> optimization of munmap() and mremap() -- the number of entries into
>> the hypervisor is reduced by about a factor of about 30 (60 in 32-bit
>> guests) for munmap().
>>
>> There may be minimal gains on native and KVM guests due to the removal
>> of the locked xchg.
> 
> What about lguest?

As I note in the description of patch 1:

"There may be a performance regression with lguest guests as
an optimization for avoiding calling pte_update() when doing a full
teardown of an mm is removed."

I don't know how much this performance regression would be or if the
performance of lguest guests is something people care about.

We could have an x86-specific ptep_get_and_clear_full() which looks like:

pte_t ptep_get_and_clear_full(
	struct mm_struct *mm, unsigned long addr, pte_t *ptep,
	int full)
{
	pte_t pte = *ptep;

	pte_clear(mm, address, ptep);
	if (!full)
		pte_update(mm, addr, ptep);

	return pte;
}

Which would have all the performance benefits of the proposed patch
without the performance regression with lguest.

David

>>
>> Removal of arch-specific functions where generic ones are suitable
>> seems to be a generally useful thing to me.
>>
>> The full reasoning for why this is safe is included in the commit
>> message of patch 1 but to summarize.  The atomic get-and-clear does
>> not guarantee that the latest dirty/accessed bits are returned as TLB
>> as there is a still a window after the get-and-clear and before the
>> TLB flush that the bits may be updated on other processors.  So, user
>> space applications accessing pages that are being unmapped or remapped
>> already have unpredictable behaviour.
>>
>> David


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] x86/mm: remove arch-specific PTE/PMD get-and-clear functions
  2012-06-13 15:00   ` David Vrabel
@ 2012-06-14 18:29     ` Konrad Rzeszutek Wilk
  2012-06-14 18:41       ` H. Peter Anvin
  0 siblings, 1 reply; 11+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-06-14 18:29 UTC (permalink / raw)
  To: David Vrabel, rusty; +Cc: xen-devel, H. Peter Anvin, x86, linux-kernel

On Wed, Jun 13, 2012 at 04:00:23PM +0100, David Vrabel wrote:
> On 13/06/12 15:04, Konrad Rzeszutek Wilk wrote:
> > On Wed, Jun 13, 2012 at 11:20:43AM +0100, David Vrabel wrote:
> >> This series removes the x86-specific implementation of
> >> ptep_get_and_clear() and pmdp_get_and_clear().
> >>
> >> The principal reason for this is it allows Xen paravitualized guests
> >> to batch the PTE clears which is a significant performance
> >> optimization of munmap() and mremap() -- the number of entries into
> >> the hypervisor is reduced by about a factor of about 30 (60 in 32-bit
> >> guests) for munmap().
> >>
> >> There may be minimal gains on native and KVM guests due to the removal
> >> of the locked xchg.
> > 
> > What about lguest?
> 
> As I note in the description of patch 1:
> 
> "There may be a performance regression with lguest guests as
> an optimization for avoiding calling pte_update() when doing a full
> teardown of an mm is removed."
> 
> I don't know how much this performance regression would be or if the
> performance of lguest guests is something people care about.
> 
> We could have an x86-specific ptep_get_and_clear_full() which looks like:
> 
> pte_t ptep_get_and_clear_full(
> 	struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> 	int full)
> {
> 	pte_t pte = *ptep;
> 
> 	pte_clear(mm, address, ptep);
> 	if (!full)
> 		pte_update(mm, addr, ptep);
> 
> 	return pte;
> }
> 
> Which would have all the performance benefits of the proposed patch
> without the performance regression with lguest.

Lets rope Rusty in this since he is the maintainer of lguest.
> 
> David
> 
> >>
> >> Removal of arch-specific functions where generic ones are suitable
> >> seems to be a generally useful thing to me.
> >>
> >> The full reasoning for why this is safe is included in the commit
> >> message of patch 1 but to summarize.  The atomic get-and-clear does
> >> not guarantee that the latest dirty/accessed bits are returned as TLB
> >> as there is a still a window after the get-and-clear and before the
> >> TLB flush that the bits may be updated on other processors.  So, user
> >> space applications accessing pages that are being unmapped or remapped
> >> already have unpredictable behaviour.
> >>
> >> David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] x86/mm: remove arch-specific PTE/PMD get-and-clear functions
  2012-06-14 18:29     ` Konrad Rzeszutek Wilk
@ 2012-06-14 18:41       ` H. Peter Anvin
  2012-06-18  9:13           ` Rusty Russell
  0 siblings, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2012-06-14 18:41 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: David Vrabel, rusty, xen-devel, x86, linux-kernel

On 06/14/2012 11:29 AM, Konrad Rzeszutek Wilk wrote:
> 
> Lets rope Rusty in this since he is the maintainer of lguest.
>

Also, let's have realistic expectations for lguest.

I don't want to break lguest just for sh*ts and giggles, but if lguest
is in the way of making forward progress for other things, it would have
to have a very strong case to not be sacrificed.

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] x86/mm: remove arch-specific ptep_get_and_clear() function
  2012-06-13 10:20 ` [PATCH 1/2] x86/mm: remove arch-specific ptep_get_and_clear() function David Vrabel
@ 2012-06-15  9:41   ` David Vrabel
  2012-06-15 10:49     ` [Xen-devel] " Keir Fraser
  0 siblings, 1 reply; 11+ messages in thread
From: David Vrabel @ 2012-06-15  9:41 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, Konrad Rzeszutek Wilk, H. Peter Anvin, x86, linux-kernel

On 13/06/12 11:20, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
> 
> x86 provides a ptep_get_and_clear() function instead of using the
> generic implementation so it can atomically get and clear the PTE.
> 
> It is not clear why x86 has this requirement.  PTE updates are done
> while the appropriate PTE lock are held, so presumably, it is an
> attempt to ensure that the accessed and dirty bits of the PTE are
> saved even if they have been updated by the hardware due to a
> concurrent access on another CPU.
> 
> However, the atomic exchange is not sufficient for saving the dirty
> bit as if there is a TLB entry for this page on another CPU then the
> dirty bit may be written by that processor after the PTE is cleared
> but before the TLB entry is flushed.  It is also not clear from the
> Intel SDM[1] if the processor's read of the PTE and the write to set
> the accessed bit are atomic or not.

This reasoning is probably not correct.  When a dirty bit must be
updated in a PTE the processor does a pagetable walk (possibly using any
cached page table structures).  The AMD APM section 5.4.2 states:

"The processor never sets the Accessed bit or the Dirty bit for a not
present page (P = 0)."

and

"If PTE[D] is cleared to 0, software can rely on the fact that the page
has not been written."

Thus this patch would /introduce/ a race where a dirty bit set would be
lost (rather than extending the window where this would happen).

However (and this is a weaker argument), no sensible userspace
application should be accessing pages that are being unmapped or
remapped (since it is unpredictable whether they will fault) so perhaps
this additional unpredictable behaviour is acceptable?

> With this patch the get/clear becomes a read of the PTE and call to
> pte_clear() which allows the clears to be batched by some
> paravirtualized guests (such as Xen guests).  This can lead to
> significant performance improvements in munmap().  e.g., for Xen, a
> trap-and-emulate[2] for every page becomes a hypercall for every 31
> pages.
> 
> There should be no change in the performance on native or for KVM
> guests.  There may be a performance regression with lguest guests as
> an optimization for avoiding calling pte_update() when doing a full
> teardown of an mm is removed.

Having looked further into how lguest works the performance penalty is
likely to not be as bad as I first thought.  pte_clear() does call
pte_update() but these are batched and the batch size seems to be up-to
64 calls so the performance penalty should be pretty minimal.

> [1] Intel Software Developer's Manual Vol 3A section 4.8 says nothing
> on this.
> 
> [2] For 32-bit guests, two traps are required for every page to update
> both halves of the PTE.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Xen-devel] [PATCH 1/2] x86/mm: remove arch-specific ptep_get_and_clear() function
  2012-06-15  9:41   ` David Vrabel
@ 2012-06-15 10:49     ` Keir Fraser
  0 siblings, 0 replies; 11+ messages in thread
From: Keir Fraser @ 2012-06-15 10:49 UTC (permalink / raw)
  To: David Vrabel
  Cc: linux-kernel, xen-devel, H. Peter Anvin, x86, Konrad Rzeszutek Wilk

On 15/06/2012 10:41, "David Vrabel" <david.vrabel@citrix.com> wrote:

> This reasoning is probably not correct.  When a dirty bit must be
> updated in a PTE the processor does a pagetable walk (possibly using any
> cached page table structures).  The AMD APM section 5.4.2 states:
> 
> "The processor never sets the Accessed bit or the Dirty bit for a not
> present page (P = 0)."
> 
> and
> 
> "If PTE[D] is cleared to 0, software can rely on the fact that the page
> has not been written."

Writing of dirty and accessed bits is done as part of the page-table walk on
TLB fill. A/D bits never have writeback caching semantics. It wouldn't be
safe: e.g., on unmap, TLB flushes happen after ptes have been cleared (to
avoid TLB-fill races), but that would mean that A/D updates could be lost
even on non-explicit unmaps (e.g., page out) which is obviously bad.

> Thus this patch would /introduce/ a race where a dirty bit set would be
> lost (rather than extending the window where this would happen).
> 
> However (and this is a weaker argument), no sensible userspace
> application should be accessing pages that are being unmapped or
> remapped (since it is unpredictable whether they will fault) so perhaps
> this additional unpredictable behaviour is acceptable?

If there's a big win to be had through batching, we're better off devising a
hypercall method for capturing the atomic rmw operation as it stands, rather
than subtly messing with semantics.

 -- Keir

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] x86/mm: remove arch-specific PTE/PMD get-and-clear functions
  2012-06-14 18:41       ` H. Peter Anvin
@ 2012-06-18  9:13           ` Rusty Russell
  0 siblings, 0 replies; 11+ messages in thread
From: Rusty Russell @ 2012-06-18  9:13 UTC (permalink / raw)
  To: H. Peter Anvin, Konrad Rzeszutek Wilk
  Cc: David Vrabel, xen-devel, x86, linux-kernel

On Thu, 14 Jun 2012 11:41:16 -0700, "H. Peter Anvin" <hpa@zytor.com> wrote:
> On 06/14/2012 11:29 AM, Konrad Rzeszutek Wilk wrote:
> > 
> > Lets rope Rusty in this since he is the maintainer of lguest.
> >
> 
> Also, let's have realistic expectations for lguest.
> 
> I don't want to break lguest just for sh*ts and giggles, but if lguest
> is in the way of making forward progress for other things, it would have
> to have a very strong case to not be sacrificed.

Exactly!  Please cc' me if you need me to unbreak lguest, but for
performance?  Not really...

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] x86/mm: remove arch-specific PTE/PMD get-and-clear functions
@ 2012-06-18  9:13           ` Rusty Russell
  0 siblings, 0 replies; 11+ messages in thread
From: Rusty Russell @ 2012-06-18  9:13 UTC (permalink / raw)
  To: H. Peter Anvin, Konrad Rzeszutek Wilk
  Cc: David Vrabel, xen-devel, x86, linux-kernel

On Thu, 14 Jun 2012 11:41:16 -0700, "H. Peter Anvin" <hpa@zytor.com> wrote:
> On 06/14/2012 11:29 AM, Konrad Rzeszutek Wilk wrote:
> > 
> > Lets rope Rusty in this since he is the maintainer of lguest.
> >
> 
> Also, let's have realistic expectations for lguest.
> 
> I don't want to break lguest just for sh*ts and giggles, but if lguest
> is in the way of making forward progress for other things, it would have
> to have a very strong case to not be sacrificed.

Exactly!  Please cc' me if you need me to unbreak lguest, but for
performance?  Not really...

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2012-06-18  9:29 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-13 10:20 [PATCH 0/2] x86/mm: remove arch-specific PTE/PMD get-and-clear functions David Vrabel
2012-06-13 10:20 ` [PATCH 1/2] x86/mm: remove arch-specific ptep_get_and_clear() function David Vrabel
2012-06-15  9:41   ` David Vrabel
2012-06-15 10:49     ` [Xen-devel] " Keir Fraser
2012-06-13 10:20 ` [PATCH 2/2] x86/mm: remove arch-specific pmdp_get_and_clear() function David Vrabel
2012-06-13 14:04 ` [PATCH 0/2] x86/mm: remove arch-specific PTE/PMD get-and-clear functions Konrad Rzeszutek Wilk
2012-06-13 15:00   ` David Vrabel
2012-06-14 18:29     ` Konrad Rzeszutek Wilk
2012-06-14 18:41       ` H. Peter Anvin
2012-06-18  9:13         ` Rusty Russell
2012-06-18  9:13           ` Rusty Russell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.