linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0 of 4] x86: cleanups from pmd lifetime series
@ 2008-02-01 16:39 Jeremy Fitzhardinge
  2008-02-01 16:39 ` [PATCH 1 of 4] x86: unify PAE/non-PAE pgd_ctor Jeremy Fitzhardinge
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Jeremy Fitzhardinge @ 2008-02-01 16:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Andi Kleen, Jan Beulich, Eduardo Pereira Habkost,
	Ian Campbell, H Peter Anvin

Hi Ingo,

Here's a followup set from that last batch of patches:
 1. fix up the pgd_ctor merge, so that non-PAE will end up getting
    kernel mappings
 2. revert "optimise-pud_clear-cr3-reload"
 3. only do a cr3 reload if pud_clear is being used on the active pagetable
 4. update documentation about PAE tlb flushing.

The third of these makes pud_clear more robust, since it doesn't rely
on it being followed by the right kind of TLB flush.  In practice it
shouldn't make any performance difference, since the only performance
critical paths pud_clear is used on are exit and execve, and they both
operate on some other pagetable at the time the old pagetable is being
pulled down.

It will generate TLB flushes in the case of a usermode process
munmapping a 1+G chunk of its address space, or something to do with
unsharing a hugetlbfs mapping.  I don't think either of these are
performance critical.

Thanks,
	J



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1 of 4] x86: unify PAE/non-PAE pgd_ctor
  2008-02-01 16:39 [PATCH 0 of 4] x86: cleanups from pmd lifetime series Jeremy Fitzhardinge
@ 2008-02-01 16:39 ` Jeremy Fitzhardinge
  2008-02-01 16:39 ` [PATCH 2 of 4] x86: revert "defer cr3 reload when doing pud_clear()" Jeremy Fitzhardinge
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Jeremy Fitzhardinge @ 2008-02-01 16:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Andi Kleen, Jan Beulich, Eduardo Pereira Habkost,
	Ian Campbell, H Peter Anvin

The constructors for PAE and non-PAE pgd_ctors are more or less
identical, and can be made into the same function.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: William Irwin <wli@holomorphy.com>

---
 arch/x86/mm/pgtable_32.c |   58 +++++++++++++++++-----------------------------
 1 file changed, 22 insertions(+), 36 deletions(-)

diff --git a/arch/x86/mm/pgtable_32.c b/arch/x86/mm/pgtable_32.c
--- a/arch/x86/mm/pgtable_32.c
+++ b/arch/x86/mm/pgtable_32.c
@@ -219,50 +219,39 @@
 	list_del(&page->lru);
 }
 
+#define UNSHARED_PTRS_PER_PGD				\
+	(SHARED_KERNEL_PMD ? USER_PTRS_PER_PGD : PTRS_PER_PGD)
 
-
-#if (PTRS_PER_PMD == 1)
-/* Non-PAE pgd constructor */
-static void pgd_ctor(void *pgd)
+static void pgd_ctor(void *p)
 {
+	pgd_t *pgd = p;
 	unsigned long flags;
 
-	/* !PAE, no pagetable sharing */
+	/* Clear usermode parts of PGD */
 	memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
 
 	spin_lock_irqsave(&pgd_lock, flags);
 
-	/* must happen under lock */
-	clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD,
-			swapper_pg_dir + USER_PTRS_PER_PGD,
-			KERNEL_PGD_PTRS);
-	paravirt_alloc_pd_clone(__pa(pgd) >> PAGE_SHIFT,
-				__pa(swapper_pg_dir) >> PAGE_SHIFT,
-				USER_PTRS_PER_PGD,
+	/* If the pgd points to a shared pagetable level (either the
+	   ptes in non-PAE, or shared PMD in PAE), then just copy the
+	   references from swapper_pg_dir. */
+	if (PAGETABLE_LEVELS == 2 ||
+	    (PAGETABLE_LEVELS == 3 && SHARED_KERNEL_PMD)) {
+		clone_pgd_range(pgd + USER_PTRS_PER_PGD,
+				swapper_pg_dir + USER_PTRS_PER_PGD,
 				KERNEL_PGD_PTRS);
-	pgd_list_add(pgd);
+		paravirt_alloc_pd_clone(__pa(pgd) >> PAGE_SHIFT,
+					__pa(swapper_pg_dir) >> PAGE_SHIFT,
+					USER_PTRS_PER_PGD,
+					KERNEL_PGD_PTRS);
+	}
+
+	/* list required to sync kernel mapping updates */
+	if (!SHARED_KERNEL_PMD)
+		pgd_list_add(pgd);
+
 	spin_unlock_irqrestore(&pgd_lock, flags);
 }
-#else  /* PTRS_PER_PMD > 1 */
-/* PAE pgd constructor */
-static void pgd_ctor(void *pgd)
-{
-	/* PAE, kernel PMD may be shared */
-
-	if (SHARED_KERNEL_PMD) {
-		clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD,
-				swapper_pg_dir + USER_PTRS_PER_PGD,
-				KERNEL_PGD_PTRS);
-	} else {
-		unsigned long flags;
-
-		memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
-		spin_lock_irqsave(&pgd_lock, flags);
-		pgd_list_add(pgd);
-		spin_unlock_irqrestore(&pgd_lock, flags);
-	}
-}
-#endif	/* PTRS_PER_PMD */
 
 static void pgd_dtor(void *pgd)
 {
@@ -275,9 +264,6 @@
 	pgd_list_del(pgd);
 	spin_unlock_irqrestore(&pgd_lock, flags);
 }
-
-#define UNSHARED_PTRS_PER_PGD				\
-	(SHARED_KERNEL_PMD ? USER_PTRS_PER_PGD : PTRS_PER_PGD)
 
 #ifdef CONFIG_X86_PAE
 /*



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 2 of 4] x86: revert "defer cr3 reload when doing pud_clear()"
  2008-02-01 16:39 [PATCH 0 of 4] x86: cleanups from pmd lifetime series Jeremy Fitzhardinge
  2008-02-01 16:39 ` [PATCH 1 of 4] x86: unify PAE/non-PAE pgd_ctor Jeremy Fitzhardinge
@ 2008-02-01 16:39 ` Jeremy Fitzhardinge
  2008-02-01 16:39 ` [PATCH 3 of 4] x86: pud_clear: only reload cr3 if necessary Jeremy Fitzhardinge
  2008-02-01 16:39 ` [PATCH 4 of 4] x86: update reference for PAE tlb flushing Jeremy Fitzhardinge
  3 siblings, 0 replies; 6+ messages in thread
From: Jeremy Fitzhardinge @ 2008-02-01 16:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Andi Kleen, Jan Beulich, Eduardo Pereira Habkost,
	Ian Campbell, H Peter Anvin

Revert "defer cr3 reload when doing pud_clear()" since I'm going to
replace it.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
---
 arch/x86/mm/pgtable_32.c         |    7 -------
 include/asm-x86/pgtable-3level.h |   21 ++++++---------------
 2 files changed, 6 insertions(+), 22 deletions(-)

diff --git a/arch/x86/mm/pgtable_32.c b/arch/x86/mm/pgtable_32.c
--- a/arch/x86/mm/pgtable_32.c
+++ b/arch/x86/mm/pgtable_32.c
@@ -373,13 +373,6 @@
 
 void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 {
-	/* This is called just after the pmd has been detached from
-	   the pgd, which requires a full tlb flush to be recognized
-	   by the CPU.  Rather than incurring multiple tlb flushes
-	   while the address space is being pulled down, make the tlb
-	   gathering machinery do a full flush when we're done. */
-	tlb->fullmm = 1;
-
 	paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
 	tlb_remove_page(tlb, virt_to_page(pmd));
 }
diff --git a/include/asm-x86/pgtable-3level.h b/include/asm-x86/pgtable-3level.h
--- a/include/asm-x86/pgtable-3level.h
+++ b/include/asm-x86/pgtable-3level.h
@@ -96,23 +96,14 @@
 	set_pud(pudp, __pud(0));
 
 	/*
-	 * In principle we need to do a cr3 reload here to make sure
-	 * the processor recognizes the changed pgd.  In practice, all
-	 * the places where pud_clear() gets called are followed by
-	 * full tlb flushes anyway, so we can defer the cost here.
+	 * Pentium-II erratum A13: in PAE mode we explicitly have to flush
+	 * the TLB via cr3 if the top-level pgd is changed...
 	 *
-	 * Specifically:
-	 *
-	 * mm/memory.c:free_pmd_range() - immediately after the
-	 * pud_clear() it does a pmd_free_tlb().  We change the
-	 * mmu_gather structure to do a full tlb flush (which has the
-	 * effect of reloading cr3) when the pagetable free is
-	 * complete.
-	 *
-	 * arch/x86/mm/hugetlbpage.c:huge_pmd_unshare() - the call to
-	 * this is followed by a flush_tlb_range, which on x86 does a
-	 * full tlb flush.
+	 * XXX I don't think we need to worry about this here, since
+	 * when clearing the pud, the calling code needs to flush the
+	 * tlb anyway.  But do it now for safety's sake. - jsgf
 	 */
+	write_cr3(read_cr3());
 }
 
 #define pud_page(pud) \



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 3 of 4] x86: pud_clear: only reload cr3 if necessary
  2008-02-01 16:39 [PATCH 0 of 4] x86: cleanups from pmd lifetime series Jeremy Fitzhardinge
  2008-02-01 16:39 ` [PATCH 1 of 4] x86: unify PAE/non-PAE pgd_ctor Jeremy Fitzhardinge
  2008-02-01 16:39 ` [PATCH 2 of 4] x86: revert "defer cr3 reload when doing pud_clear()" Jeremy Fitzhardinge
@ 2008-02-01 16:39 ` Jeremy Fitzhardinge
  2008-02-01 16:39 ` [PATCH 4 of 4] x86: update reference for PAE tlb flushing Jeremy Fitzhardinge
  3 siblings, 0 replies; 6+ messages in thread
From: Jeremy Fitzhardinge @ 2008-02-01 16:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Andi Kleen, Jan Beulich, Eduardo Pereira Habkost,
	Ian Campbell, H Peter Anvin

Rather than unconditionally reloading cr3, only do so if the pud we're
updating is within the active pgd.

This eliminates TLB flushes most of the time.  The
performance-critical uses of pud_clear are during execve and exit, but
in those cases cr3 is referring to some other pagetable.  The only
other use of pud_clear is during a large (1Gbyte+) munmap, and those
are sufficiently rare that a couple of cr3 reloads won't hurt.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
---
 include/asm-x86/pgtable-3level.h |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/asm-x86/pgtable-3level.h b/include/asm-x86/pgtable-3level.h
--- a/include/asm-x86/pgtable-3level.h
+++ b/include/asm-x86/pgtable-3level.h
@@ -93,17 +93,20 @@
 
 static inline void pud_clear(pud_t *pudp)
 {
+	unsigned long pgd;
+
 	set_pud(pudp, __pud(0));
 
 	/*
 	 * Pentium-II erratum A13: in PAE mode we explicitly have to flush
 	 * the TLB via cr3 if the top-level pgd is changed...
 	 *
-	 * XXX I don't think we need to worry about this here, since
-	 * when clearing the pud, the calling code needs to flush the
-	 * tlb anyway.  But do it now for safety's sake. - jsgf
+	 * Make sure the pud entry we're updating is within the
+	 * current pgd to avoid unnecessary TLB flushes.
 	 */
-	write_cr3(read_cr3());
+	pgd = read_cr3();
+	if (__pa(pudp) >= pgd && __pa(pudp) < (pgd + sizeof(pgd_t)*PTRS_PER_PGD))
+		write_cr3(pgd);
 }
 
 #define pud_page(pud) \



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 4 of 4] x86: update reference for PAE tlb flushing
  2008-02-01 16:39 [PATCH 0 of 4] x86: cleanups from pmd lifetime series Jeremy Fitzhardinge
                   ` (2 preceding siblings ...)
  2008-02-01 16:39 ` [PATCH 3 of 4] x86: pud_clear: only reload cr3 if necessary Jeremy Fitzhardinge
@ 2008-02-01 16:39 ` Jeremy Fitzhardinge
  3 siblings, 0 replies; 6+ messages in thread
From: Jeremy Fitzhardinge @ 2008-02-01 16:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Andi Kleen, Jan Beulich, Eduardo Pereira Habkost,
	Ian Campbell, H Peter Anvin

Remove bogus reference to "Pentium-II erratum A13" and point to the
actual canonical source of information about what requirements x86
processors have for PAE pagetable updates.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
---
 include/asm-x86/pgalloc_32.h     |    6 ++++--
 include/asm-x86/pgtable-3level.h |    6 ++++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/asm-x86/pgalloc_32.h b/include/asm-x86/pgalloc_32.h
--- a/include/asm-x86/pgalloc_32.h
+++ b/include/asm-x86/pgalloc_32.h
@@ -80,8 +80,10 @@
 	set_pud(pudp, __pud(__pa(pmd) | _PAGE_PRESENT));
 
 	/*
-	 * Pentium-II erratum A13: in PAE mode we explicitly have to flush
-	 * the TLB via cr3 if the top-level pgd is changed...
+	 * According to Intel App note "TLBs, Paging-Structure Caches,
+	 * and Their Invalidation", April 2007, document 317080-001,
+	 * section 8.1: in PAE mode we explicitly have to flush the
+	 * TLB via cr3 if the top-level pgd is changed...
 	 */
 	if (mm == current->active_mm)
 		write_cr3(read_cr3());
diff --git a/include/asm-x86/pgtable-3level.h b/include/asm-x86/pgtable-3level.h
--- a/include/asm-x86/pgtable-3level.h
+++ b/include/asm-x86/pgtable-3level.h
@@ -98,8 +98,10 @@
 	set_pud(pudp, __pud(0));
 
 	/*
-	 * Pentium-II erratum A13: in PAE mode we explicitly have to flush
-	 * the TLB via cr3 if the top-level pgd is changed...
+	 * According to Intel App note "TLBs, Paging-Structure Caches,
+	 * and Their Invalidation", April 2007, document 317080-001,
+	 * section 8.1: in PAE mode we explicitly have to flush the
+	 * TLB via cr3 if the top-level pgd is changed...
 	 *
 	 * Make sure the pud entry we're updating is within the
 	 * current pgd to avoid unnecessary TLB flushes.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 3 of 4] x86: pud_clear: only reload cr3 if necessary
  2008-01-28 23:48 [PATCH 0 of 4] x86: cleanups from pmd lifetime series Jeremy Fitzhardinge
@ 2008-01-28 23:48 ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 6+ messages in thread
From: Jeremy Fitzhardinge @ 2008-01-28 23:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Andi Kleen, Jan Beulich, Eduardo Pereira Habkost,
	Ian Campbell, H Peter Anvin

Rather than unconditionally reloading cr3, only do so if the pud we're
updating is within the active pgd.

This eliminates TLB flushes most of the time.  The
performance-critical uses of pud_clear are during execve and exit, but
in those cases cr3 is referring to some other pagetable.  The only
other use of pud_clear is during a large (1Gbyte+) munmap, and those
are sufficiently rare that a couple of cr3 reloads won't hurt.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
---
 include/asm-x86/pgtable-3level.h |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/asm-x86/pgtable-3level.h b/include/asm-x86/pgtable-3level.h
--- a/include/asm-x86/pgtable-3level.h
+++ b/include/asm-x86/pgtable-3level.h
@@ -93,17 +93,20 @@ static inline void native_pmd_clear(pmd_
 
 static inline void pud_clear(pud_t *pudp)
 {
+	unsigned long pgd;
+
 	set_pud(pudp, __pud(0));
 
 	/*
 	 * Pentium-II erratum A13: in PAE mode we explicitly have to flush
 	 * the TLB via cr3 if the top-level pgd is changed...
 	 *
-	 * XXX I don't think we need to worry about this here, since
-	 * when clearing the pud, the calling code needs to flush the
-	 * tlb anyway.  But do it now for safety's sake. - jsgf
+	 * Make sure the pud entry we're updating is within the
+	 * current pgd to avoid unnecessary TLB flushes.
 	 */
-	write_cr3(read_cr3());
+	pgd = read_cr3();
+	if (__pa(pudp) >= pgd && __pa(pudp) < (pgd + sizeof(pgd_t)*PTRS_PER_PGD))
+		write_cr3(pgd);
 }
 
 #define pud_page(pud) \



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2008-02-01 16:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-02-01 16:39 [PATCH 0 of 4] x86: cleanups from pmd lifetime series Jeremy Fitzhardinge
2008-02-01 16:39 ` [PATCH 1 of 4] x86: unify PAE/non-PAE pgd_ctor Jeremy Fitzhardinge
2008-02-01 16:39 ` [PATCH 2 of 4] x86: revert "defer cr3 reload when doing pud_clear()" Jeremy Fitzhardinge
2008-02-01 16:39 ` [PATCH 3 of 4] x86: pud_clear: only reload cr3 if necessary Jeremy Fitzhardinge
2008-02-01 16:39 ` [PATCH 4 of 4] x86: update reference for PAE tlb flushing Jeremy Fitzhardinge
  -- strict thread matches above, loose matches on Subject: below --
2008-01-28 23:48 [PATCH 0 of 4] x86: cleanups from pmd lifetime series Jeremy Fitzhardinge
2008-01-28 23:48 ` [PATCH 3 of 4] x86: pud_clear: only reload cr3 if necessary Jeremy Fitzhardinge

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).