* [PATCH 00/25] mm: Preemptibility -v7
From: Peter Zijlstra @ 2011-01-25 17:31 UTC
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang
This patch-set makes part of the mm a lot more preemptible. It converts
i_mmap_lock and anon_vma->lock to mutexes and makes mmu_gather fully
preemptible.
The main motivation was making mm_take_all_locks() preemptible, since it
appears people are nesting hundreds of spinlocks there.
The side-effect is that we can finally make mmu_gather preemptible,
something which lots of people have wanted to do for a long time.
It also gets us anon_vma refcounting, which seems to result in a nice
cleanup of the anon_vma lifetime rules wrt KSM and compaction.
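To illustrate the heart of the conversion, here is a minimal before/after
sketch (not taken from the patches themselves; the field and helper names
simply assume the s/lock/mutex/ rename mentioned in the changelog below):

    /* Before: anon_vma->lock is a spinlock, so everything done while
     * holding it (including mm_take_all_locks() nesting hundreds of
     * them) must not sleep. */
    struct anon_vma {
            spinlock_t lock;
            /* ... */
    };

    static inline void anon_vma_lock(struct anon_vma *anon_vma)
    {
            spin_lock(&anon_vma->lock);
    }

    /* After: a sleeping lock, so holders can be preempted. */
    struct anon_vma {
            struct mutex mutex;
            /* ... */
    };

    static inline void anon_vma_lock(struct anon_vma *anon_vma)
    {
            mutex_lock(&anon_vma->mutex);
    }

The address_space i_mmap_lock gets the same treatment.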
This patch-set is build and boot-tested on x86_64 (a previous version was
also tested on Dave's Niagara2 machines, and I suppose s390 was too when
Martin provided the conversion patch for his arch).
There are no known architectures left unconverted.
Yanmin ran the -v3 posting through the comprehensive Intel test farm
and didn't find any regressions.
( Not included in this posting are the 4 Sparc64 patches that implement
gup_fast; those can be applied separately after this series gets
anywhere. )
The full series (including the Sparc64 gup_fast bits) is also available in
git form (against something post .38-rc2) from:
git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-mmu_preempt.git
Changes since -v6:
Suggested by Hugh:
- reordered the patches
- changed to GFP_NOWAIT | __GFP_NOWARN to allocate mmu_gather pages
- s/lock/mutex/ for the spinlock to mutex conversion
- removed all DEFINE_PER_CPU(struct mmu_gather, mmu_gather) remnants
- split the i_mmap_lock and anon_vma->lock conversion
- removed some KSM wrappers
- avoid tlb_flush_mmu() while holding pte_lock
Other:
- remove the i_mmap_lock lockbreak in truncate (XXX)
- arch/tile __pte_free_tlb() change
TODO:
- decide if we want to actually remove the i_mmap_lock lockbreak
- figure out if LOCK vs MB works or add a smp_mb() to patch #23
- figure out what to do with sparc's tlb_batch lack of ->fullmm
* [PATCH 01/25] tile: Fix __pte_free_tlb
From: Peter Zijlstra @ 2011-01-25 17:31 UTC
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Chris Metcalf
Tile's __pte_free_tlb() implementation makes assumptions about the
generic mmu_gather implementation, cure this ;-)
[ Chris, from a quick look L2_USER_PGTABLE_PAGES is something like:
  1 << (24 - 16 + 3), which looks awfully large for an on-stack
  array. ]
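For reference, tlb_remove_page() already does the bookkeeping that tile
open-coded here; roughly (a sketch of the generic asm-generic/tlb.h helper
as it ends up after this series, not part of this patch):

    static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
    {
            tlb->need_flush = 1;
            if (tlb_fast_mode(tlb)) {
                    free_page_and_swap_cache(page);
                    return;
            }
            tlb->pages[tlb->nr++] = page;
            if (tlb->nr >= tlb->max)        /* FREE_PTE_NR before this series */
                    tlb_flush_mmu(tlb, 0, 0);
    }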
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/tile/mm/pgtable.c | 15 ++-------------
1 file changed, 2 insertions(+), 13 deletions(-)
Index: linux-2.6/arch/tile/mm/pgtable.c
===================================================================
--- linux-2.6.orig/arch/tile/mm/pgtable.c
+++ linux-2.6/arch/tile/mm/pgtable.c
@@ -252,19 +252,8 @@ void __pte_free_tlb(struct mmu_gather *t
int i;
pgtable_page_dtor(pte);
- tlb->need_flush = 1;
- if (tlb_fast_mode(tlb)) {
- struct page *pte_pages[L2_USER_PGTABLE_PAGES];
- for (i = 0; i < L2_USER_PGTABLE_PAGES; ++i)
- pte_pages[i] = pte + i;
- free_pages_and_swap_cache(pte_pages, L2_USER_PGTABLE_PAGES);
- return;
- }
- for (i = 0; i < L2_USER_PGTABLE_PAGES; ++i) {
- tlb->pages[tlb->nr++] = pte + i;
- if (tlb->nr >= FREE_PTE_NR)
- tlb_flush_mmu(tlb, 0, 0);
- }
+ for (i = 0; i < L2_USER_PGTABLE_PAGES; ++i)
+ tlb_remove_page(tlb, pte + i);
}
#ifndef __tilegx__
* Re: [PATCH 01/25] tile: Fix __pte_free_tlb
From: Chris Metcalf @ 2011-02-04 20:39 UTC
To: Peter Zijlstra
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds, linux-kernel, linux-arch,
linux-mm, Benjamin Herrenschmidt, David Miller, Hugh Dickins,
Mel Gorman, Nick Piggin, Paul McKenney, Yanmin Zhang
On 1/25/2011 12:31 PM, Peter Zijlstra wrote:
> Tile's __pte_free_tlb() implementation makes assumptions about the
> generic mmu_gather implementation, cure this ;-)
I assume you will take this patch into your tree? If so:
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
> [ Chris, from a quick look L2_USER_PGTABLE_PAGES is something like:
> 1 << (24 - 16 + 3), which looks awefully large for an on-stack
> array. ]
Yes, the pte_pages[] array in this routine is currently 2KB. We ship with
a 64KB page size, so the kernel stack has plenty of room. I do like that
your patch removes this buffer, however, since we're currently looking
into (re-)supporting 4KB pages, which would totally blow the kernel stack
in this routine.
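For concreteness, one reading of the estimate above (the exact tile
constants are an assumption here; only the 2KB and 64KB figures are from
this mail):

    sizeof(pte_pages) = L2_USER_PGTABLE_PAGES * sizeof(struct page *)
                      ~ (1 << (24 - 16)) * 8     /* 64KB pages: 256 * 8 = 2KB */
    /* With 4KB pages, assuming the span covered by one L2 table stays the
     * same, this becomes (1 << (24 - 12)) * 8 = 32KB -- far more than a
     * kernel stack can spare. */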
> Cc: Chris Metcalf <cmetcalf@tilera.com>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> arch/tile/mm/pgtable.c | 15 ++-------------
> 1 file changed, 2 insertions(+), 13 deletions(-)
>
> Index: linux-2.6/arch/tile/mm/pgtable.c
> ===================================================================
> --- linux-2.6.orig/arch/tile/mm/pgtable.c
> +++ linux-2.6/arch/tile/mm/pgtable.c
> @@ -252,19 +252,8 @@ void __pte_free_tlb(struct mmu_gather *t
> int i;
>
> pgtable_page_dtor(pte);
> - tlb->need_flush = 1;
> - if (tlb_fast_mode(tlb)) {
> - struct page *pte_pages[L2_USER_PGTABLE_PAGES];
> - for (i = 0; i < L2_USER_PGTABLE_PAGES; ++i)
> - pte_pages[i] = pte + i;
> - free_pages_and_swap_cache(pte_pages, L2_USER_PGTABLE_PAGES);
> - return;
> - }
> - for (i = 0; i < L2_USER_PGTABLE_PAGES; ++i) {
> - tlb->pages[tlb->nr++] = pte + i;
> - if (tlb->nr >= FREE_PTE_NR)
> - tlb_flush_mmu(tlb, 0, 0);
> - }
> + for (i = 0; i < L2_USER_PGTABLE_PAGES; ++i)
> + tlb_remove_page(tlb, pte + i);
> }
>
> #ifndef __tilegx__
>
>
--
Chris Metcalf, Tilera Corp.
http://www.tilera.com
* Re: [PATCH 01/25] tile: Fix __pte_free_tlb
From: Peter Zijlstra @ 2011-02-07 13:55 UTC
To: Chris Metcalf
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds, linux-kernel, linux-arch,
linux-mm, Benjamin Herrenschmidt, David Miller, Hugh Dickins,
Mel Gorman, Nick Piggin, Paul McKenney, Yanmin Zhang
On Fri, 2011-02-04 at 15:39 -0500, Chris Metcalf wrote:
> On 1/25/2011 12:31 PM, Peter Zijlstra wrote:
> > Tile's __pte_free_tlb() implementation makes assumptions about the
> > generic mmu_gather implementation, cure this ;-)
>
> I assume you will take this patch into your tree? If so:
>
> Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Feel free to take it yourself; this series might take a while to land.
> > [ Chris, from a quick look L2_USER_PGTABLE_PAGES is something like:
> > 1 << (24 - 16 + 3), which looks awefully large for an on-stack
> > array. ]
>
> Yes, the pte_pages[] array in this routine is 2KB currently. Currently we
> ship with 64KB pagesize, so the kernel stack has plenty of room. I do like
> that your patch removes this buffer, however, since we're currently looking
> into (re-)supporting 4KB pages, which would totally blow the kernel stack
> in this routine.
Ah, ok.
* Re: [PATCH 01/25] tile: Fix __pte_free_tlb
From: Chris Metcalf @ 2011-02-23 20:59 UTC
To: Peter Zijlstra
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds, linux-kernel, linux-arch,
linux-mm, Benjamin Herrenschmidt, David Miller, Hugh Dickins,
Mel Gorman, Nick Piggin, Paul McKenney, Yanmin Zhang
On 2/7/2011 8:55 AM, Peter Zijlstra wrote:
> On Fri, 2011-02-04 at 15:39 -0500, Chris Metcalf wrote:
>> On 1/25/2011 12:31 PM, Peter Zijlstra wrote:
>>> Tile's __pte_free_tlb() implementation makes assumptions about the
>>> generic mmu_gather implementation, cure this ;-)
>> I assume you will take this patch into your tree? If so:
>>
>> Acked-by: Chris Metcalf <cmetcalf@tilera.com>
> Feel free to take it yourself, this series might take a while to land..
Thanks, I took it into my tree (without the comment about the on-stack array).
--
Chris Metcalf, Tilera Corp.
http://www.tilera.com
* [PATCH 02/25] mm: Preemptible mmu_gather
From: Peter Zijlstra @ 2011-01-25 17:31 UTC
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Martin Schwidefsky,
Russell King, Paul Mundt, Jeff Dike, Tony Luck, Hugh Dickins
Make mmu_gather preemptible by using a small on-stack list and an
optional allocation to speed things up.
Preemptible mmu_gather is desired in general and usable once
i_mmap_lock becomes a mutex. Doing it before the mutex conversion
saves us from having to rework the code by moving the mmu_gather
bits inside the pte_lock.
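In short, callers now keep the mmu_gather on their own stack instead of
pinning a per-cpu one; a minimal sketch of the new calling convention (the
same pattern the hunks below apply to unmap_region() and friends):

    struct mmu_gather tlb;                  /* on the caller's stack */

    lru_add_drain();
    tlb_gather_mmu(&tlb, mm, 0);            /* was: tlb = tlb_gather_mmu(mm, 0) */
    update_hiwater_rss(mm);
    unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
    tlb_finish_mmu(&tlb, start, end);       /* frees the optional page array */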
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Tony Luck <tony.luck@intel.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
fs/exec.c | 10 ++++-----
include/asm-generic/tlb.h | 51 +++++++++++++++++++++++++++++-----------------
include/linux/mm.h | 2 -
mm/memory.c | 27 +++++-------------------
mm/mmap.c | 18 ++++++++--------
5 files changed, 54 insertions(+), 54 deletions(-)
Index: linux-2.6/fs/exec.c
===================================================================
--- linux-2.6.orig/fs/exec.c
+++ linux-2.6/fs/exec.c
@@ -550,7 +550,7 @@ static int shift_arg_pages(struct vm_are
unsigned long length = old_end - old_start;
unsigned long new_start = old_start - shift;
unsigned long new_end = old_end - shift;
- struct mmu_gather *tlb;
+ struct mmu_gather tlb;
BUG_ON(new_start > new_end);
@@ -576,12 +576,12 @@ static int shift_arg_pages(struct vm_are
return -ENOMEM;
lru_add_drain();
- tlb = tlb_gather_mmu(mm, 0);
+ tlb_gather_mmu(&tlb, mm, 0);
if (new_end > old_start) {
/*
* when the old and new regions overlap clear from new_end.
*/
- free_pgd_range(tlb, new_end, old_end, new_end,
+ free_pgd_range(&tlb, new_end, old_end, new_end,
vma->vm_next ? vma->vm_next->vm_start : 0);
} else {
/*
@@ -590,10 +590,10 @@ static int shift_arg_pages(struct vm_are
* have constraints on va-space that make this illegal (IA64) -
* for the others its just a little faster.
*/
- free_pgd_range(tlb, old_start, old_end, new_end,
+ free_pgd_range(&tlb, old_start, old_end, new_end,
vma->vm_next ? vma->vm_next->vm_start : 0);
}
- tlb_finish_mmu(tlb, new_end, old_end);
+ tlb_finish_mmu(&tlb, new_end, old_end);
/*
* Shrink the vma to just the new range. Always succeeds.
Index: linux-2.6/include/asm-generic/tlb.h
===================================================================
--- linux-2.6.orig/include/asm-generic/tlb.h
+++ linux-2.6/include/asm-generic/tlb.h
@@ -22,14 +22,8 @@
* and page free order so much..
*/
#ifdef CONFIG_SMP
- #ifdef ARCH_FREE_PTR_NR
- #define FREE_PTR_NR ARCH_FREE_PTR_NR
- #else
- #define FREE_PTE_NR 506
- #endif
#define tlb_fast_mode(tlb) ((tlb)->nr == ~0U)
#else
- #define FREE_PTE_NR 1
#define tlb_fast_mode(tlb) 1
#endif
@@ -39,30 +33,48 @@
struct mmu_gather {
struct mm_struct *mm;
unsigned int nr; /* set to ~0U means fast mode */
+ unsigned int max; /* nr < max */
unsigned int need_flush;/* Really unmapped some ptes? */
unsigned int fullmm; /* non-zero means full mm flush */
- struct page * pages[FREE_PTE_NR];
+#ifdef HAVE_ARCH_MMU_GATHER
+ struct arch_mmu_gather arch;
+#endif
+ struct page **pages;
+ struct page *local[8];
};
-/* Users of the generic TLB shootdown code must declare this storage space. */
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
+static inline void __tlb_alloc_page(struct mmu_gather *tlb)
+{
+ unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
+
+ if (addr) {
+ tlb->pages = (void *)addr;
+ tlb->max = PAGE_SIZE / sizeof(struct page *);
+ }
+}
/* tlb_gather_mmu
* Return a pointer to an initialized struct mmu_gather.
*/
-static inline struct mmu_gather *
-tlb_gather_mmu(struct mm_struct *mm, unsigned int full_mm_flush)
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
-
tlb->mm = mm;
- /* Use fast mode if only one CPU is online */
- tlb->nr = num_online_cpus() > 1 ? 0U : ~0U;
+ tlb->max = ARRAY_SIZE(tlb->local);
+ tlb->pages = tlb->local;
+
+ if (num_online_cpus() > 1) {
+ tlb->nr = 0;
+ __tlb_alloc_page(tlb);
+ } else /* Use fast mode if only one CPU is online */
+ tlb->nr = ~0U;
tlb->fullmm = full_mm_flush;
- return tlb;
+#ifdef HAVE_ARCH_MMU_GATHER
+ tlb->arch = ARCH_MMU_GATHER_INIT;
+#endif
}
static inline void
@@ -75,6 +87,8 @@ tlb_flush_mmu(struct mmu_gather *tlb, un
if (!tlb_fast_mode(tlb)) {
free_pages_and_swap_cache(tlb->pages, tlb->nr);
tlb->nr = 0;
+ if (tlb->pages == tlb->local)
+ __tlb_alloc_page(tlb);
}
}
@@ -90,7 +104,8 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
/* keep the page table cache within bounds */
check_pgt_cache();
- put_cpu_var(mmu_gathers);
+ if (tlb->pages != tlb->local)
+ free_pages((unsigned long)tlb->pages, 0);
}
/* tlb_remove_page
@@ -106,7 +121,7 @@ static inline void tlb_remove_page(struc
return;
}
tlb->pages[tlb->nr++] = page;
- if (tlb->nr >= FREE_PTE_NR)
+ if (tlb->nr >= tlb->max)
tlb_flush_mmu(tlb, 0, 0);
}
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -887,7 +887,7 @@ int zap_vma_ptes(struct vm_area_struct *
unsigned long size);
unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
unsigned long size, struct zap_details *);
-unsigned long unmap_vmas(struct mmu_gather **tlb,
+unsigned long unmap_vmas(struct mmu_gather *tlb,
struct vm_area_struct *start_vma, unsigned long start_addr,
unsigned long end_addr, unsigned long *nr_accounted,
struct zap_details *);
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -1121,17 +1121,14 @@ static unsigned long unmap_page_range(st
* ensure that any thus-far unmapped pages are flushed before unmap_vmas()
* drops the lock and schedules.
*/
-unsigned long unmap_vmas(struct mmu_gather **tlbp,
+unsigned long unmap_vmas(struct mmu_gather *tlb,
struct vm_area_struct *vma, unsigned long start_addr,
unsigned long end_addr, unsigned long *nr_accounted,
struct zap_details *details)
{
long zap_work = ZAP_BLOCK_SIZE;
- unsigned long tlb_start = 0; /* For tlb_finish_mmu */
- int tlb_start_valid = 0;
unsigned long start = start_addr;
spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
- int fullmm = (*tlbp)->fullmm;
struct mm_struct *mm = vma->vm_mm;
mmu_notifier_invalidate_range_start(mm, start_addr, end_addr);
@@ -1152,11 +1149,6 @@ unsigned long unmap_vmas(struct mmu_gath
untrack_pfn_vma(vma, 0, 0);
while (start != end) {
- if (!tlb_start_valid) {
- tlb_start = start;
- tlb_start_valid = 1;
- }
-
if (unlikely(is_vm_hugetlb_page(vma))) {
/*
* It is undesirable to test vma->vm_file as it
@@ -1177,7 +1169,7 @@ unsigned long unmap_vmas(struct mmu_gath
start = end;
} else
- start = unmap_page_range(*tlbp, vma,
+ start = unmap_page_range(tlb, vma,
start, end, &zap_work, details);
if (zap_work > 0) {
@@ -1185,19 +1177,13 @@ unsigned long unmap_vmas(struct mmu_gath
break;
}
- tlb_finish_mmu(*tlbp, tlb_start, start);
-
if (need_resched() ||
(i_mmap_lock && spin_needbreak(i_mmap_lock))) {
- if (i_mmap_lock) {
- *tlbp = NULL;
+ if (i_mmap_lock)
goto out;
- }
cond_resched();
}
- *tlbp = tlb_gather_mmu(vma->vm_mm, fullmm);
- tlb_start_valid = 0;
zap_work = ZAP_BLOCK_SIZE;
}
}
@@ -1217,16 +1203,15 @@ unsigned long zap_page_range(struct vm_a
unsigned long size, struct zap_details *details)
{
struct mm_struct *mm = vma->vm_mm;
- struct mmu_gather *tlb;
+ struct mmu_gather tlb;
unsigned long end = address + size;
unsigned long nr_accounted = 0;
lru_add_drain();
- tlb = tlb_gather_mmu(mm, 0);
+ tlb_gather_mmu(&tlb, mm, 0);
update_hiwater_rss(mm);
end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details);
- if (tlb)
- tlb_finish_mmu(tlb, address, end);
+ tlb_finish_mmu(&tlb, address, end);
return end;
}
Index: linux-2.6/mm/mmap.c
===================================================================
--- linux-2.6.orig/mm/mmap.c
+++ linux-2.6/mm/mmap.c
@@ -1913,17 +1913,17 @@ static void unmap_region(struct mm_struc
unsigned long start, unsigned long end)
{
struct vm_area_struct *next = prev? prev->vm_next: mm->mmap;
- struct mmu_gather *tlb;
+ struct mmu_gather tlb;
unsigned long nr_accounted = 0;
lru_add_drain();
- tlb = tlb_gather_mmu(mm, 0);
+ tlb_gather_mmu(&tlb, mm, 0);
update_hiwater_rss(mm);
unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
vm_unacct_memory(nr_accounted);
- free_pgtables(tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS,
- next? next->vm_start: 0);
- tlb_finish_mmu(tlb, start, end);
+ free_pgtables(&tlb, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
+ next ? next->vm_start : 0);
+ tlb_finish_mmu(&tlb, start, end);
}
/*
@@ -2265,7 +2265,7 @@ EXPORT_SYMBOL(do_brk);
/* Release all mmaps. */
void exit_mmap(struct mm_struct *mm)
{
- struct mmu_gather *tlb;
+ struct mmu_gather tlb;
struct vm_area_struct *vma;
unsigned long nr_accounted = 0;
unsigned long end;
@@ -2290,14 +2290,14 @@ void exit_mmap(struct mm_struct *mm)
lru_add_drain();
flush_cache_mm(mm);
- tlb = tlb_gather_mmu(mm, 1);
+ tlb_gather_mmu(&tlb, mm, 1);
/* update_hiwater_rss(mm) here? but nobody should be looking */
/* Use -1 here to ensure all VMAs in the mm are unmapped */
end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
vm_unacct_memory(nr_accounted);
- free_pgtables(tlb, vma, FIRST_USER_ADDRESS, 0);
- tlb_finish_mmu(tlb, 0, end);
+ free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, 0);
+ tlb_finish_mmu(&tlb, 0, end);
/*
* Walk the list again, actually closing and freeing it,
* [PATCH 02/25] mm: Preemptible mmu_gather
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Martin Schwidefsky,
Russell King, Paul Mundt, Jeff Dike, Tony Luck, Hugh Dickins
[-- Attachment #1: peter_zijlstra-mm-preemptible_mmu_gather.patch --]
[-- Type: text/plain, Size: 10478 bytes --]
Make mmu_gather preemptible by using a small on stack list and use
an option allocation to speed things up.
Preemptible mmu_gather is desired in general and usable once
i_mmap_lock becomes a mutex. Doing it before the mutex conversion
saves us from having to rework the code by moving the mmu_gather
bits inside the pte_lock.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Tony Luck <tony.luck@intel.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
fs/exec.c | 10 ++++-----
include/asm-generic/tlb.h | 51 +++++++++++++++++++++++++++++-----------------
include/linux/mm.h | 2 -
mm/memory.c | 27 +++++-------------------
mm/mmap.c | 18 ++++++++--------
5 files changed, 54 insertions(+), 54 deletions(-)
Index: linux-2.6/fs/exec.c
===================================================================
--- linux-2.6.orig/fs/exec.c
+++ linux-2.6/fs/exec.c
@@ -550,7 +550,7 @@ static int shift_arg_pages(struct vm_are
unsigned long length = old_end - old_start;
unsigned long new_start = old_start - shift;
unsigned long new_end = old_end - shift;
- struct mmu_gather *tlb;
+ struct mmu_gather tlb;
BUG_ON(new_start > new_end);
@@ -576,12 +576,12 @@ static int shift_arg_pages(struct vm_are
return -ENOMEM;
lru_add_drain();
- tlb = tlb_gather_mmu(mm, 0);
+ tlb_gather_mmu(&tlb, mm, 0);
if (new_end > old_start) {
/*
* when the old and new regions overlap clear from new_end.
*/
- free_pgd_range(tlb, new_end, old_end, new_end,
+ free_pgd_range(&tlb, new_end, old_end, new_end,
vma->vm_next ? vma->vm_next->vm_start : 0);
} else {
/*
@@ -590,10 +590,10 @@ static int shift_arg_pages(struct vm_are
* have constraints on va-space that make this illegal (IA64) -
* for the others its just a little faster.
*/
- free_pgd_range(tlb, old_start, old_end, new_end,
+ free_pgd_range(&tlb, old_start, old_end, new_end,
vma->vm_next ? vma->vm_next->vm_start : 0);
}
- tlb_finish_mmu(tlb, new_end, old_end);
+ tlb_finish_mmu(&tlb, new_end, old_end);
/*
* Shrink the vma to just the new range. Always succeeds.
Index: linux-2.6/include/asm-generic/tlb.h
===================================================================
--- linux-2.6.orig/include/asm-generic/tlb.h
+++ linux-2.6/include/asm-generic/tlb.h
@@ -22,14 +22,8 @@
* and page free order so much..
*/
#ifdef CONFIG_SMP
- #ifdef ARCH_FREE_PTR_NR
- #define FREE_PTR_NR ARCH_FREE_PTR_NR
- #else
- #define FREE_PTE_NR 506
- #endif
#define tlb_fast_mode(tlb) ((tlb)->nr == ~0U)
#else
- #define FREE_PTE_NR 1
#define tlb_fast_mode(tlb) 1
#endif
@@ -39,30 +33,48 @@
struct mmu_gather {
struct mm_struct *mm;
unsigned int nr; /* set to ~0U means fast mode */
+ unsigned int max; /* nr < max */
unsigned int need_flush;/* Really unmapped some ptes? */
unsigned int fullmm; /* non-zero means full mm flush */
- struct page * pages[FREE_PTE_NR];
+#ifdef HAVE_ARCH_MMU_GATHER
+ struct arch_mmu_gather arch;
+#endif
+ struct page **pages;
+ struct page *local[8];
};
-/* Users of the generic TLB shootdown code must declare this storage space. */
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
+static inline void __tlb_alloc_page(struct mmu_gather *tlb)
+{
+ unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
+
+ if (addr) {
+ tlb->pages = (void *)addr;
+ tlb->max = PAGE_SIZE / sizeof(struct page *);
+ }
+}
/* tlb_gather_mmu
* Return a pointer to an initialized struct mmu_gather.
*/
-static inline struct mmu_gather *
-tlb_gather_mmu(struct mm_struct *mm, unsigned int full_mm_flush)
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
-
tlb->mm = mm;
- /* Use fast mode if only one CPU is online */
- tlb->nr = num_online_cpus() > 1 ? 0U : ~0U;
+ tlb->max = ARRAY_SIZE(tlb->local);
+ tlb->pages = tlb->local;
+
+ if (num_online_cpus() > 1) {
+ tlb->nr = 0;
+ __tlb_alloc_page(tlb);
+ } else /* Use fast mode if only one CPU is online */
+ tlb->nr = ~0U;
tlb->fullmm = full_mm_flush;
- return tlb;
+#ifdef HAVE_ARCH_MMU_GATHER
+ tlb->arch = ARCH_MMU_GATHER_INIT;
+#endif
}
static inline void
@@ -75,6 +87,8 @@ tlb_flush_mmu(struct mmu_gather *tlb, un
if (!tlb_fast_mode(tlb)) {
free_pages_and_swap_cache(tlb->pages, tlb->nr);
tlb->nr = 0;
+ if (tlb->pages == tlb->local)
+ __tlb_alloc_page(tlb);
}
}
@@ -90,7 +104,8 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
/* keep the page table cache within bounds */
check_pgt_cache();
- put_cpu_var(mmu_gathers);
+ if (tlb->pages != tlb->local)
+ free_pages((unsigned long)tlb->pages, 0);
}
/* tlb_remove_page
@@ -106,7 +121,7 @@ static inline void tlb_remove_page(struc
return;
}
tlb->pages[tlb->nr++] = page;
- if (tlb->nr >= FREE_PTE_NR)
+ if (tlb->nr >= tlb->max)
tlb_flush_mmu(tlb, 0, 0);
}
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -887,7 +887,7 @@ int zap_vma_ptes(struct vm_area_struct *
unsigned long size);
unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
unsigned long size, struct zap_details *);
-unsigned long unmap_vmas(struct mmu_gather **tlb,
+unsigned long unmap_vmas(struct mmu_gather *tlb,
struct vm_area_struct *start_vma, unsigned long start_addr,
unsigned long end_addr, unsigned long *nr_accounted,
struct zap_details *);
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -1121,17 +1121,14 @@ static unsigned long unmap_page_range(st
* ensure that any thus-far unmapped pages are flushed before unmap_vmas()
* drops the lock and schedules.
*/
-unsigned long unmap_vmas(struct mmu_gather **tlbp,
+unsigned long unmap_vmas(struct mmu_gather *tlb,
struct vm_area_struct *vma, unsigned long start_addr,
unsigned long end_addr, unsigned long *nr_accounted,
struct zap_details *details)
{
long zap_work = ZAP_BLOCK_SIZE;
- unsigned long tlb_start = 0; /* For tlb_finish_mmu */
- int tlb_start_valid = 0;
unsigned long start = start_addr;
spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
- int fullmm = (*tlbp)->fullmm;
struct mm_struct *mm = vma->vm_mm;
mmu_notifier_invalidate_range_start(mm, start_addr, end_addr);
@@ -1152,11 +1149,6 @@ unsigned long unmap_vmas(struct mmu_gath
untrack_pfn_vma(vma, 0, 0);
while (start != end) {
- if (!tlb_start_valid) {
- tlb_start = start;
- tlb_start_valid = 1;
- }
-
if (unlikely(is_vm_hugetlb_page(vma))) {
/*
* It is undesirable to test vma->vm_file as it
@@ -1177,7 +1169,7 @@ unsigned long unmap_vmas(struct mmu_gath
start = end;
} else
- start = unmap_page_range(*tlbp, vma,
+ start = unmap_page_range(tlb, vma,
start, end, &zap_work, details);
if (zap_work > 0) {
@@ -1185,19 +1177,13 @@ unsigned long unmap_vmas(struct mmu_gath
break;
}
- tlb_finish_mmu(*tlbp, tlb_start, start);
-
if (need_resched() ||
(i_mmap_lock && spin_needbreak(i_mmap_lock))) {
- if (i_mmap_lock) {
- *tlbp = NULL;
+ if (i_mmap_lock)
goto out;
- }
cond_resched();
}
- *tlbp = tlb_gather_mmu(vma->vm_mm, fullmm);
- tlb_start_valid = 0;
zap_work = ZAP_BLOCK_SIZE;
}
}
@@ -1217,16 +1203,15 @@ unsigned long zap_page_range(struct vm_a
unsigned long size, struct zap_details *details)
{
struct mm_struct *mm = vma->vm_mm;
- struct mmu_gather *tlb;
+ struct mmu_gather tlb;
unsigned long end = address + size;
unsigned long nr_accounted = 0;
lru_add_drain();
- tlb = tlb_gather_mmu(mm, 0);
+ tlb_gather_mmu(&tlb, mm, 0);
update_hiwater_rss(mm);
end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details);
- if (tlb)
- tlb_finish_mmu(tlb, address, end);
+ tlb_finish_mmu(&tlb, address, end);
return end;
}
Index: linux-2.6/mm/mmap.c
===================================================================
--- linux-2.6.orig/mm/mmap.c
+++ linux-2.6/mm/mmap.c
@@ -1913,17 +1913,17 @@ static void unmap_region(struct mm_struc
unsigned long start, unsigned long end)
{
struct vm_area_struct *next = prev? prev->vm_next: mm->mmap;
- struct mmu_gather *tlb;
+ struct mmu_gather tlb;
unsigned long nr_accounted = 0;
lru_add_drain();
- tlb = tlb_gather_mmu(mm, 0);
+ tlb_gather_mmu(&tlb, mm, 0);
update_hiwater_rss(mm);
unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
vm_unacct_memory(nr_accounted);
- free_pgtables(tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS,
- next? next->vm_start: 0);
- tlb_finish_mmu(tlb, start, end);
+ free_pgtables(&tlb, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
+ next ? next->vm_start : 0);
+ tlb_finish_mmu(&tlb, start, end);
}
/*
@@ -2265,7 +2265,7 @@ EXPORT_SYMBOL(do_brk);
/* Release all mmaps. */
void exit_mmap(struct mm_struct *mm)
{
- struct mmu_gather *tlb;
+ struct mmu_gather tlb;
struct vm_area_struct *vma;
unsigned long nr_accounted = 0;
unsigned long end;
@@ -2290,14 +2290,14 @@ void exit_mmap(struct mm_struct *mm)
lru_add_drain();
flush_cache_mm(mm);
- tlb = tlb_gather_mmu(mm, 1);
+ tlb_gather_mmu(&tlb, mm, 1);
/* update_hiwater_rss(mm) here? but nobody should be looking */
/* Use -1 here to ensure all VMAs in the mm are unmapped */
end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
vm_unacct_memory(nr_accounted);
- free_pgtables(tlb, vma, FIRST_USER_ADDRESS, 0);
- tlb_finish_mmu(tlb, 0, end);
+ free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, 0);
+ tlb_finish_mmu(&tlb, 0, end);
/*
* Walk the list again, actually closing and freeing it,
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 02/25] mm: Preemptible mmu_gather
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Martin Schwidefsky,
Russell King, Paul Mundt, Jeff Dike, Tony Luck, Hugh Dickins
[-- Attachment #1: peter_zijlstra-mm-preemptible_mmu_gather.patch --]
[-- Type: text/plain, Size: 10478 bytes --]
Make mmu_gather preemptible by using a small on stack list and use
an option allocation to speed things up.
Preemptible mmu_gather is desired in general and usable once
i_mmap_lock becomes a mutex. Doing it before the mutex conversion
saves us from having to rework the code by moving the mmu_gather
bits inside the pte_lock.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Tony Luck <tony.luck@intel.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
fs/exec.c | 10 ++++-----
include/asm-generic/tlb.h | 51 +++++++++++++++++++++++++++++-----------------
include/linux/mm.h | 2 -
mm/memory.c | 27 +++++-------------------
mm/mmap.c | 18 ++++++++--------
5 files changed, 54 insertions(+), 54 deletions(-)
Index: linux-2.6/fs/exec.c
===================================================================
--- linux-2.6.orig/fs/exec.c
+++ linux-2.6/fs/exec.c
@@ -550,7 +550,7 @@ static int shift_arg_pages(struct vm_are
unsigned long length = old_end - old_start;
unsigned long new_start = old_start - shift;
unsigned long new_end = old_end - shift;
- struct mmu_gather *tlb;
+ struct mmu_gather tlb;
BUG_ON(new_start > new_end);
@@ -576,12 +576,12 @@ static int shift_arg_pages(struct vm_are
return -ENOMEM;
lru_add_drain();
- tlb = tlb_gather_mmu(mm, 0);
+ tlb_gather_mmu(&tlb, mm, 0);
if (new_end > old_start) {
/*
* when the old and new regions overlap clear from new_end.
*/
- free_pgd_range(tlb, new_end, old_end, new_end,
+ free_pgd_range(&tlb, new_end, old_end, new_end,
vma->vm_next ? vma->vm_next->vm_start : 0);
} else {
/*
@@ -590,10 +590,10 @@ static int shift_arg_pages(struct vm_are
* have constraints on va-space that make this illegal (IA64) -
* for the others its just a little faster.
*/
- free_pgd_range(tlb, old_start, old_end, new_end,
+ free_pgd_range(&tlb, old_start, old_end, new_end,
vma->vm_next ? vma->vm_next->vm_start : 0);
}
- tlb_finish_mmu(tlb, new_end, old_end);
+ tlb_finish_mmu(&tlb, new_end, old_end);
/*
* Shrink the vma to just the new range. Always succeeds.
Index: linux-2.6/include/asm-generic/tlb.h
===================================================================
--- linux-2.6.orig/include/asm-generic/tlb.h
+++ linux-2.6/include/asm-generic/tlb.h
@@ -22,14 +22,8 @@
* and page free order so much..
*/
#ifdef CONFIG_SMP
- #ifdef ARCH_FREE_PTR_NR
- #define FREE_PTR_NR ARCH_FREE_PTR_NR
- #else
- #define FREE_PTE_NR 506
- #endif
#define tlb_fast_mode(tlb) ((tlb)->nr == ~0U)
#else
- #define FREE_PTE_NR 1
#define tlb_fast_mode(tlb) 1
#endif
@@ -39,30 +33,48 @@
struct mmu_gather {
struct mm_struct *mm;
unsigned int nr; /* set to ~0U means fast mode */
+ unsigned int max; /* nr < max */
unsigned int need_flush;/* Really unmapped some ptes? */
unsigned int fullmm; /* non-zero means full mm flush */
- struct page * pages[FREE_PTE_NR];
+#ifdef HAVE_ARCH_MMU_GATHER
+ struct arch_mmu_gather arch;
+#endif
+ struct page **pages;
+ struct page *local[8];
};
-/* Users of the generic TLB shootdown code must declare this storage space. */
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
+static inline void __tlb_alloc_page(struct mmu_gather *tlb)
+{
+ unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
+
+ if (addr) {
+ tlb->pages = (void *)addr;
+ tlb->max = PAGE_SIZE / sizeof(struct page *);
+ }
+}
/* tlb_gather_mmu
* Return a pointer to an initialized struct mmu_gather.
*/
-static inline struct mmu_gather *
-tlb_gather_mmu(struct mm_struct *mm, unsigned int full_mm_flush)
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
-
tlb->mm = mm;
- /* Use fast mode if only one CPU is online */
- tlb->nr = num_online_cpus() > 1 ? 0U : ~0U;
+ tlb->max = ARRAY_SIZE(tlb->local);
+ tlb->pages = tlb->local;
+
+ if (num_online_cpus() > 1) {
+ tlb->nr = 0;
+ __tlb_alloc_page(tlb);
+ } else /* Use fast mode if only one CPU is online */
+ tlb->nr = ~0U;
tlb->fullmm = full_mm_flush;
- return tlb;
+#ifdef HAVE_ARCH_MMU_GATHER
+ tlb->arch = ARCH_MMU_GATHER_INIT;
+#endif
}
static inline void
@@ -75,6 +87,8 @@ tlb_flush_mmu(struct mmu_gather *tlb, un
if (!tlb_fast_mode(tlb)) {
free_pages_and_swap_cache(tlb->pages, tlb->nr);
tlb->nr = 0;
+ if (tlb->pages == tlb->local)
+ __tlb_alloc_page(tlb);
}
}
@@ -90,7 +104,8 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
/* keep the page table cache within bounds */
check_pgt_cache();
- put_cpu_var(mmu_gathers);
+ if (tlb->pages != tlb->local)
+ free_pages((unsigned long)tlb->pages, 0);
}
/* tlb_remove_page
@@ -106,7 +121,7 @@ static inline void tlb_remove_page(struc
return;
}
tlb->pages[tlb->nr++] = page;
- if (tlb->nr >= FREE_PTE_NR)
+ if (tlb->nr >= tlb->max)
tlb_flush_mmu(tlb, 0, 0);
}
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -887,7 +887,7 @@ int zap_vma_ptes(struct vm_area_struct *
unsigned long size);
unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
unsigned long size, struct zap_details *);
-unsigned long unmap_vmas(struct mmu_gather **tlb,
+unsigned long unmap_vmas(struct mmu_gather *tlb,
struct vm_area_struct *start_vma, unsigned long start_addr,
unsigned long end_addr, unsigned long *nr_accounted,
struct zap_details *);
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -1121,17 +1121,14 @@ static unsigned long unmap_page_range(st
* ensure that any thus-far unmapped pages are flushed before unmap_vmas()
* drops the lock and schedules.
*/
-unsigned long unmap_vmas(struct mmu_gather **tlbp,
+unsigned long unmap_vmas(struct mmu_gather *tlb,
struct vm_area_struct *vma, unsigned long start_addr,
unsigned long end_addr, unsigned long *nr_accounted,
struct zap_details *details)
{
long zap_work = ZAP_BLOCK_SIZE;
- unsigned long tlb_start = 0; /* For tlb_finish_mmu */
- int tlb_start_valid = 0;
unsigned long start = start_addr;
spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
- int fullmm = (*tlbp)->fullmm;
struct mm_struct *mm = vma->vm_mm;
mmu_notifier_invalidate_range_start(mm, start_addr, end_addr);
@@ -1152,11 +1149,6 @@ unsigned long unmap_vmas(struct mmu_gath
untrack_pfn_vma(vma, 0, 0);
while (start != end) {
- if (!tlb_start_valid) {
- tlb_start = start;
- tlb_start_valid = 1;
- }
-
if (unlikely(is_vm_hugetlb_page(vma))) {
/*
* It is undesirable to test vma->vm_file as it
@@ -1177,7 +1169,7 @@ unsigned long unmap_vmas(struct mmu_gath
start = end;
} else
- start = unmap_page_range(*tlbp, vma,
+ start = unmap_page_range(tlb, vma,
start, end, &zap_work, details);
if (zap_work > 0) {
@@ -1185,19 +1177,13 @@ unsigned long unmap_vmas(struct mmu_gath
break;
}
- tlb_finish_mmu(*tlbp, tlb_start, start);
-
if (need_resched() ||
(i_mmap_lock && spin_needbreak(i_mmap_lock))) {
- if (i_mmap_lock) {
- *tlbp = NULL;
+ if (i_mmap_lock)
goto out;
- }
cond_resched();
}
- *tlbp = tlb_gather_mmu(vma->vm_mm, fullmm);
- tlb_start_valid = 0;
zap_work = ZAP_BLOCK_SIZE;
}
}
@@ -1217,16 +1203,15 @@ unsigned long zap_page_range(struct vm_a
unsigned long size, struct zap_details *details)
{
struct mm_struct *mm = vma->vm_mm;
- struct mmu_gather *tlb;
+ struct mmu_gather tlb;
unsigned long end = address + size;
unsigned long nr_accounted = 0;
lru_add_drain();
- tlb = tlb_gather_mmu(mm, 0);
+ tlb_gather_mmu(&tlb, mm, 0);
update_hiwater_rss(mm);
end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details);
- if (tlb)
- tlb_finish_mmu(tlb, address, end);
+ tlb_finish_mmu(&tlb, address, end);
return end;
}
Index: linux-2.6/mm/mmap.c
===================================================================
--- linux-2.6.orig/mm/mmap.c
+++ linux-2.6/mm/mmap.c
@@ -1913,17 +1913,17 @@ static void unmap_region(struct mm_struc
unsigned long start, unsigned long end)
{
struct vm_area_struct *next = prev? prev->vm_next: mm->mmap;
- struct mmu_gather *tlb;
+ struct mmu_gather tlb;
unsigned long nr_accounted = 0;
lru_add_drain();
- tlb = tlb_gather_mmu(mm, 0);
+ tlb_gather_mmu(&tlb, mm, 0);
update_hiwater_rss(mm);
unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
vm_unacct_memory(nr_accounted);
- free_pgtables(tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS,
- next? next->vm_start: 0);
- tlb_finish_mmu(tlb, start, end);
+ free_pgtables(&tlb, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
+ next ? next->vm_start : 0);
+ tlb_finish_mmu(&tlb, start, end);
}
/*
@@ -2265,7 +2265,7 @@ EXPORT_SYMBOL(do_brk);
/* Release all mmaps. */
void exit_mmap(struct mm_struct *mm)
{
- struct mmu_gather *tlb;
+ struct mmu_gather tlb;
struct vm_area_struct *vma;
unsigned long nr_accounted = 0;
unsigned long end;
@@ -2290,14 +2290,14 @@ void exit_mmap(struct mm_struct *mm)
lru_add_drain();
flush_cache_mm(mm);
- tlb = tlb_gather_mmu(mm, 1);
+ tlb_gather_mmu(&tlb, mm, 1);
/* update_hiwater_rss(mm) here? but nobody should be looking */
/* Use -1 here to ensure all VMAs in the mm are unmapped */
end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
vm_unacct_memory(nr_accounted);
- free_pgtables(tlb, vma, FIRST_USER_ADDRESS, 0);
- tlb_finish_mmu(tlb, 0, end);
+ free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, 0);
+ tlb_finish_mmu(&tlb, 0, end);
/*
* Walk the list again, actually closing and freeing it,
--
^ permalink raw reply [flat|nested] 128+ messages in thread
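The hunks above convert zap_page_range(), unmap_region() and exit_mmap() from the old tlb_gather_mmu()-returned per-cpu gather to an mmu_gather that lives on the caller's stack. A minimal sketch of the resulting calling convention, modelled on the zap_page_range() hunk above (illustrative only, not further code from the series):

/*
 * Sketch of the new caller pattern: the gather is a stack variable, so
 * the unmap path may sleep between tlb_gather_mmu() and tlb_finish_mmu()
 * without having to drop and re-acquire a per-cpu gather as before.
 * Modelled on the zap_page_range() hunk above; the _sketch name is ours.
 */
static unsigned long zap_range_sketch(struct vm_area_struct *vma,
				      unsigned long address, unsigned long size)
{
	struct mm_struct *mm = vma->vm_mm;
	struct mmu_gather tlb;			/* on stack, fully preemptible */
	unsigned long end = address + size;
	unsigned long nr_accounted = 0;

	lru_add_drain();
	tlb_gather_mmu(&tlb, mm, 0);		/* 0: not a full-mm teardown */
	update_hiwater_rss(mm);
	end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, NULL);
	tlb_finish_mmu(&tlb, address, end);	/* flush TLB, free batched pages */
	return end;
}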
* [PATCH 03/25] powerpc: Preemptible mmu_gather
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang
[-- Attachment #1: peter_zijlstra-powerpc-preemptible_mmu_gather.patch --]
[-- Type: text/plain, Size: 8795 bytes --]
Fix up powerpc for the new mmu_gather code.
PPC has an extra batching queue to RCU-free the actual pagetable
allocations; use the arch extensions (HAVE_ARCH_MMU_GATHER) for that for now.
For the ppc64_tlb_batch, which tracks the vaddrs to unhash from the
hardware hash table, keep using per-cpu arrays but flush them on context
switch, using a TLF bit to track the lazy_mmu state.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/powerpc/include/asm/pgalloc.h | 4 ++--
arch/powerpc/include/asm/thread_info.h | 2 ++
arch/powerpc/include/asm/tlb.h | 10 ++++++++++
arch/powerpc/kernel/process.c | 23 ++++++++++++++++++++++-
arch/powerpc/mm/pgtable.c | 14 ++++----------
arch/powerpc/mm/tlb_hash32.c | 2 +-
arch/powerpc/mm/tlb_hash64.c | 12 +++++++-----
arch/powerpc/mm/tlb_nohash.c | 2 +-
8 files changed, 49 insertions(+), 20 deletions(-)
Index: linux-2.6/arch/powerpc/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/tlb.h
+++ linux-2.6/arch/powerpc/include/asm/tlb.h
@@ -28,6 +28,16 @@
#define tlb_start_vma(tlb, vma) do { } while (0)
#define tlb_end_vma(tlb, vma) do { } while (0)
+#define HAVE_ARCH_MMU_GATHER 1
+
+struct pte_freelist_batch;
+
+struct arch_mmu_gather {
+ struct pte_freelist_batch *batch;
+};
+
+#define ARCH_MMU_GATHER_INIT (struct arch_mmu_gather){ .batch = NULL, }
+
extern void tlb_flush(struct mmu_gather *tlb);
/* Get the generic bits... */
Index: linux-2.6/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/process.c
+++ linux-2.6/arch/powerpc/kernel/process.c
@@ -393,6 +393,9 @@ struct task_struct *__switch_to(struct t
struct thread_struct *new_thread, *old_thread;
unsigned long flags;
struct task_struct *last;
+#ifdef CONFIG_PPC_BOOK3S_64
+ struct ppc64_tlb_batch *batch;
+#endif
#ifdef CONFIG_SMP
/* avoid complexity of lazy save/restore of fpu
@@ -511,7 +514,17 @@ struct task_struct *__switch_to(struct t
old_thread->accum_tb += (current_tb - start_tb);
new_thread->start_tb = current_tb;
}
-#endif
+#endif /* CONFIG_PPC64 */
+
+#ifdef CONFIG_PPC_BOOK3S_64
+ batch = &__get_cpu_var(ppc64_tlb_batch);
+ if (batch->active) {
+ current_thread_info()->local_flags |= _TLF_LAZY_MMU;
+ if (batch->index)
+ __flush_tlb_pending(batch);
+ batch->active = 0;
+ }
+#endif /* CONFIG_PPC_BOOK3S_64 */
local_irq_save(flags);
@@ -526,6 +539,14 @@ struct task_struct *__switch_to(struct t
hard_irq_disable();
last = _switch(old_thread, new_thread);
+#ifdef CONFIG_PPC_BOOK3S_64
+ if (current_thread_info()->local_flags & _TLF_LAZY_MMU) {
+ current_thread_info()->local_flags &= ~_TLF_LAZY_MMU;
+ batch = &__get_cpu_var(ppc64_tlb_batch);
+ batch->active = 1;
+ }
+#endif /* CONFIG_PPC_BOOK3S_64 */
+
local_irq_restore(flags);
return last;
Index: linux-2.6/arch/powerpc/mm/pgtable.c
===================================================================
--- linux-2.6.orig/arch/powerpc/mm/pgtable.c
+++ linux-2.6/arch/powerpc/mm/pgtable.c
@@ -33,8 +33,6 @@
#include "mmu_decl.h"
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
#ifdef CONFIG_SMP
/*
@@ -43,7 +41,6 @@ DEFINE_PER_CPU(struct mmu_gather, mmu_ga
* freeing a page table page that is being walked without locks
*/
-static DEFINE_PER_CPU(struct pte_freelist_batch *, pte_freelist_cur);
static unsigned long pte_freelist_forced_free;
struct pte_freelist_batch
@@ -97,12 +94,10 @@ static void pte_free_submit(struct pte_f
void pgtable_free_tlb(struct mmu_gather *tlb, void *table, unsigned shift)
{
- /* This is safe since tlb_gather_mmu has disabled preemption */
- struct pte_freelist_batch **batchp = &__get_cpu_var(pte_freelist_cur);
+ struct pte_freelist_batch **batchp = &tlb->arch.batch;
unsigned long pgf;
- if (atomic_read(&tlb->mm->mm_users) < 2 ||
- cpumask_equal(mm_cpumask(tlb->mm), cpumask_of(smp_processor_id()))){
+ if (atomic_read(&tlb->mm->mm_users) < 2) {
pgtable_free(table, shift);
return;
}
@@ -124,10 +119,9 @@ void pgtable_free_tlb(struct mmu_gather
}
}
-void pte_free_finish(void)
+void pte_free_finish(struct mmu_gather *tlb)
{
- /* This is safe since tlb_gather_mmu has disabled preemption */
- struct pte_freelist_batch **batchp = &__get_cpu_var(pte_freelist_cur);
+ struct pte_freelist_batch **batchp = &tlb->arch.batch;
if (*batchp == NULL)
return;
Index: linux-2.6/arch/powerpc/mm/tlb_hash64.c
===================================================================
--- linux-2.6.orig/arch/powerpc/mm/tlb_hash64.c
+++ linux-2.6/arch/powerpc/mm/tlb_hash64.c
@@ -38,13 +38,11 @@ DEFINE_PER_CPU(struct ppc64_tlb_batch, p
* neesd to be flushed. This function will either perform the flush
* immediately or will batch it up if the current CPU has an active
* batch on it.
- *
- * Must be called from within some kind of spinlock/non-preempt region...
*/
void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned long pte, int huge)
{
- struct ppc64_tlb_batch *batch = &__get_cpu_var(ppc64_tlb_batch);
+ struct ppc64_tlb_batch *batch = &get_cpu_var(ppc64_tlb_batch);
unsigned long vsid, vaddr;
unsigned int psize;
int ssize;
@@ -99,6 +97,7 @@ void hpte_need_flush(struct mm_struct *m
*/
if (!batch->active) {
flush_hash_page(vaddr, rpte, psize, ssize, 0);
+ put_cpu_var(ppc64_tlb_batch);
return;
}
@@ -127,6 +126,7 @@ void hpte_need_flush(struct mm_struct *m
batch->index = ++i;
if (i >= PPC64_TLB_BATCH_NR)
__flush_tlb_pending(batch);
+ put_cpu_var(ppc64_tlb_batch);
}
/*
@@ -155,7 +155,7 @@ void __flush_tlb_pending(struct ppc64_tl
void tlb_flush(struct mmu_gather *tlb)
{
- struct ppc64_tlb_batch *tlbbatch = &__get_cpu_var(ppc64_tlb_batch);
+ struct ppc64_tlb_batch *tlbbatch = &get_cpu_var(ppc64_tlb_batch);
/* If there's a TLB batch pending, then we must flush it because the
* pages are going to be freed and we really don't want to have a CPU
@@ -164,8 +164,10 @@ void tlb_flush(struct mmu_gather *tlb)
if (tlbbatch->index)
__flush_tlb_pending(tlbbatch);
+ put_cpu_var(ppc64_tlb_batch);
+
/* Push out batch of freed page tables */
- pte_free_finish();
+ pte_free_finish(tlb);
}
/**
Index: linux-2.6/arch/powerpc/include/asm/thread_info.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/thread_info.h
+++ linux-2.6/arch/powerpc/include/asm/thread_info.h
@@ -139,10 +139,12 @@ static inline struct thread_info *curren
#define TLF_NAPPING 0 /* idle thread enabled NAP mode */
#define TLF_SLEEPING 1 /* suspend code enabled SLEEP mode */
#define TLF_RESTORE_SIGMASK 2 /* Restore signal mask in do_signal */
+#define TLF_LAZY_MMU 3 /* tlb_batch is active */
#define _TLF_NAPPING (1 << TLF_NAPPING)
#define _TLF_SLEEPING (1 << TLF_SLEEPING)
#define _TLF_RESTORE_SIGMASK (1 << TLF_RESTORE_SIGMASK)
+#define _TLF_LAZY_MMU (1 << TLF_LAZY_MMU)
#ifndef __ASSEMBLY__
#define HAVE_SET_RESTORE_SIGMASK 1
Index: linux-2.6/arch/powerpc/include/asm/pgalloc.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/pgalloc.h
+++ linux-2.6/arch/powerpc/include/asm/pgalloc.h
@@ -32,13 +32,13 @@ static inline void pte_free(struct mm_st
#ifdef CONFIG_SMP
extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, unsigned shift);
-extern void pte_free_finish(void);
+extern void pte_free_finish(struct mmu_gather *tlb);
#else /* CONFIG_SMP */
static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table, unsigned shift)
{
pgtable_free(table, shift);
}
-static inline void pte_free_finish(void) { }
+static inline void pte_free_finish(struct mmu_gather *tlb) { }
#endif /* !CONFIG_SMP */
static inline void __pte_free_tlb(struct mmu_gather *tlb, struct page *ptepage,
Index: linux-2.6/arch/powerpc/mm/tlb_hash32.c
===================================================================
--- linux-2.6.orig/arch/powerpc/mm/tlb_hash32.c
+++ linux-2.6/arch/powerpc/mm/tlb_hash32.c
@@ -73,7 +73,7 @@ void tlb_flush(struct mmu_gather *tlb)
}
/* Push out batch of freed page tables */
- pte_free_finish();
+ pte_free_finish(tlb);
}
/*
Index: linux-2.6/arch/powerpc/mm/tlb_nohash.c
===================================================================
--- linux-2.6.orig/arch/powerpc/mm/tlb_nohash.c
+++ linux-2.6/arch/powerpc/mm/tlb_nohash.c
@@ -301,7 +301,7 @@ void tlb_flush(struct mmu_gather *tlb)
flush_tlb_mm(tlb->mm);
/* Push out batch of freed page tables */
- pte_free_finish();
+ pte_free_finish(tlb);
}
/*
^ permalink raw reply [flat|nested] 128+ messages in thread
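The powerpc conversion above hinges on the HAVE_ARCH_MMU_GATHER hook: the RCU page-table freeing batch, formerly a per-cpu pointer, now travels inside the gather itself, and the per-cpu ppc64_tlb_batch is handed across context switches via _TLF_LAZY_MMU. A rough sketch of how the generic side is presumed to consume the arch hook; the struct layout here is inferred from this series, not quoted from asm-generic/tlb.h:

/*
 * Rough sketch (inferred, not quoted from asm-generic/tlb.h): with
 * HAVE_ARCH_MMU_GATHER defined, the generic mmu_gather carries the
 * arch-private state and initialises it from ARCH_MMU_GATHER_INIT,
 * so pgtable_free_tlb() can batch into tlb->arch.batch instead of a
 * per-cpu variable.
 */
struct mmu_gather {
	struct mm_struct	*mm;
	unsigned int		fullmm;
	/* ... generic page batching state ... */
#ifdef HAVE_ARCH_MMU_GATHER
	struct arch_mmu_gather	arch;	/* powerpc: { .batch = NULL } */
#endif
};

static inline void tlb_gather_mmu_sketch(struct mmu_gather *tlb,
					 struct mm_struct *mm,
					 unsigned int fullmm)
{
	tlb->mm = mm;
	tlb->fullmm = fullmm;
	/* ... generic init ... */
#ifdef HAVE_ARCH_MMU_GATHER
	tlb->arch = ARCH_MMU_GATHER_INIT;
#endif
}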
* [PATCH 04/25] sparc: Preemptible mmu_gather
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang
[-- Attachment #1: peter_zijlstra-sparc-preemptible_mmu_gather.patch --]
[-- Type: text/plain, Size: 8846 bytes --]
Rework the sparc mmu_gather usage to conform to the new world order :-)
Sparc mmu_gather does two things:
- tracks vaddrs to unhash
- tracks pages to free
Split these two things like powerpc has done and keep the vaddrs
in per-cpu data structures and flush them on context switch.
The remaining bits can then use the generic mmu_gather.
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/sparc/include/asm/pgalloc_64.h | 3 +
arch/sparc/include/asm/tlb_64.h | 91 ++---------------------------------
arch/sparc/include/asm/tlbflush_64.h | 12 +++-
arch/sparc/mm/tlb.c | 42 +++++++++-------
arch/sparc/mm/tsb.c | 15 +++--
5 files changed, 51 insertions(+), 112 deletions(-)
Index: linux-2.6/arch/sparc/include/asm/pgalloc_64.h
===================================================================
--- linux-2.6.orig/arch/sparc/include/asm/pgalloc_64.h
+++ linux-2.6/arch/sparc/include/asm/pgalloc_64.h
@@ -78,4 +78,7 @@ static inline void check_pgt_cache(void)
quicklist_trim(0, NULL, 25, 16);
}
+#define __pte_free_tlb(tlb, pte, addr) pte_free((tlb)->mm, pte)
+#define __pmd_free_tlb(tlb, pmd, addr) pmd_free((tlb)->mm, pmd)
+
#endif /* _SPARC64_PGALLOC_H */
Index: linux-2.6/arch/sparc/include/asm/tlb_64.h
===================================================================
--- linux-2.6.orig/arch/sparc/include/asm/tlb_64.h
+++ linux-2.6/arch/sparc/include/asm/tlb_64.h
@@ -7,66 +7,11 @@
#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
-#define TLB_BATCH_NR 192
-
-/*
- * For UP we don't need to worry about TLB flush
- * and page free order so much..
- */
-#ifdef CONFIG_SMP
- #define FREE_PTE_NR 506
- #define tlb_fast_mode(bp) ((bp)->pages_nr == ~0U)
-#else
- #define FREE_PTE_NR 1
- #define tlb_fast_mode(bp) 1
-#endif
-
-struct mmu_gather {
- struct mm_struct *mm;
- unsigned int pages_nr;
- unsigned int need_flush;
- unsigned int fullmm;
- unsigned int tlb_nr;
- unsigned long vaddrs[TLB_BATCH_NR];
- struct page *pages[FREE_PTE_NR];
-};
-
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
#ifdef CONFIG_SMP
extern void smp_flush_tlb_pending(struct mm_struct *,
unsigned long, unsigned long *);
#endif
-extern void __flush_tlb_pending(unsigned long, unsigned long, unsigned long *);
-extern void flush_tlb_pending(void);
-
-static inline struct mmu_gather *tlb_gather_mmu(struct mm_struct *mm, unsigned int full_mm_flush)
-{
- struct mmu_gather *mp = &get_cpu_var(mmu_gathers);
-
- BUG_ON(mp->tlb_nr);
-
- mp->mm = mm;
- mp->pages_nr = num_online_cpus() > 1 ? 0U : ~0U;
- mp->fullmm = full_mm_flush;
-
- return mp;
-}
-
-
-static inline void tlb_flush_mmu(struct mmu_gather *mp)
-{
- if (!mp->fullmm)
- flush_tlb_pending();
- if (mp->need_flush) {
- free_pages_and_swap_cache(mp->pages, mp->pages_nr);
- mp->pages_nr = 0;
- mp->need_flush = 0;
- }
-
-}
-
#ifdef CONFIG_SMP
extern void smp_flush_tlb_mm(struct mm_struct *mm);
#define do_flush_tlb_mm(mm) smp_flush_tlb_mm(mm)
@@ -74,38 +19,14 @@ extern void smp_flush_tlb_mm(struct mm_s
#define do_flush_tlb_mm(mm) __flush_tlb_mm(CTX_HWBITS(mm->context), SECONDARY_CONTEXT)
#endif
-static inline void tlb_finish_mmu(struct mmu_gather *mp, unsigned long start, unsigned long end)
-{
- tlb_flush_mmu(mp);
-
- if (mp->fullmm)
- mp->fullmm = 0;
-
- /* keep the page table cache within bounds */
- check_pgt_cache();
-
- put_cpu_var(mmu_gathers);
-}
-
-static inline void tlb_remove_page(struct mmu_gather *mp, struct page *page)
-{
- if (tlb_fast_mode(mp)) {
- free_page_and_swap_cache(page);
- return;
- }
- mp->need_flush = 1;
- mp->pages[mp->pages_nr++] = page;
- if (mp->pages_nr >= FREE_PTE_NR)
- tlb_flush_mmu(mp);
-}
-
-#define tlb_remove_tlb_entry(mp,ptep,addr) do { } while (0)
-#define pte_free_tlb(mp, ptepage, addr) pte_free((mp)->mm, ptepage)
-#define pmd_free_tlb(mp, pmdp, addr) pmd_free((mp)->mm, pmdp)
-#define pud_free_tlb(tlb,pudp, addr) __pud_free_tlb(tlb,pudp,addr)
+extern void __flush_tlb_pending(unsigned long, unsigned long, unsigned long *);
+extern void flush_tlb_pending(void);
-#define tlb_migrate_finish(mm) do { } while (0)
#define tlb_start_vma(tlb, vma) do { } while (0)
#define tlb_end_vma(tlb, vma) do { } while (0)
+#define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
+#define tlb_flush(tlb) flush_tlb_pending()
+
+#include <asm-generic/tlb.h>
#endif /* _SPARC64_TLB_H */
Index: linux-2.6/arch/sparc/include/asm/tlbflush_64.h
===================================================================
--- linux-2.6.orig/arch/sparc/include/asm/tlbflush_64.h
+++ linux-2.6/arch/sparc/include/asm/tlbflush_64.h
@@ -5,9 +5,17 @@
#include <asm/mmu_context.h>
/* TSB flush operations. */
-struct mmu_gather;
+
+#define TLB_BATCH_NR 192
+
+struct tlb_batch {
+ struct mm_struct *mm;
+ unsigned long tlb_nr;
+ unsigned long vaddrs[TLB_BATCH_NR];
+};
+
extern void flush_tsb_kernel_range(unsigned long start, unsigned long end);
-extern void flush_tsb_user(struct mmu_gather *mp);
+extern void flush_tsb_user(struct tlb_batch *tb);
/* TLB flush operations. */
Index: linux-2.6/arch/sparc/mm/tlb.c
===================================================================
--- linux-2.6.orig/arch/sparc/mm/tlb.c
+++ linux-2.6/arch/sparc/mm/tlb.c
@@ -19,33 +19,33 @@
/* Heavily inspired by the ppc64 code. */
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
+static DEFINE_PER_CPU(struct tlb_batch, tlb_batch);
void flush_tlb_pending(void)
{
- struct mmu_gather *mp = &get_cpu_var(mmu_gathers);
+ struct tlb_batch *tb = &get_cpu_var(tlb_batch);
- if (mp->tlb_nr) {
- flush_tsb_user(mp);
+ if (tb->tlb_nr) {
+ flush_tsb_user(tb);
- if (CTX_VALID(mp->mm->context)) {
+ if (CTX_VALID(tb->mm->context)) {
#ifdef CONFIG_SMP
- smp_flush_tlb_pending(mp->mm, mp->tlb_nr,
- &mp->vaddrs[0]);
+ smp_flush_tlb_pending(tb->mm, tb->tlb_nr,
+ &tb->vaddrs[0]);
#else
- __flush_tlb_pending(CTX_HWBITS(mp->mm->context),
- mp->tlb_nr, &mp->vaddrs[0]);
+ __flush_tlb_pending(CTX_HWBITS(tb->mm->context),
+ tb->tlb_nr, &tb->vaddrs[0]);
#endif
}
- mp->tlb_nr = 0;
+ tb->tlb_nr = 0;
}
- put_cpu_var(mmu_gathers);
+ put_cpu_var(tlb_batch);
}
void tlb_batch_add(struct mm_struct *mm, unsigned long vaddr, pte_t *ptep, pte_t orig)
{
- struct mmu_gather *mp = &__get_cpu_var(mmu_gathers);
+ struct tlb_batch *tb = &get_cpu_var(tlb_batch);
unsigned long nr;
vaddr &= PAGE_MASK;
@@ -77,21 +77,27 @@ void tlb_batch_add(struct mm_struct *mm,
no_cache_flush:
- if (mp->fullmm)
+ /*
+ if (tb->fullmm) {
+ put_cpu_var(tlb_batch);
return;
+ }
+ */
- nr = mp->tlb_nr;
+ nr = tb->tlb_nr;
- if (unlikely(nr != 0 && mm != mp->mm)) {
+ if (unlikely(nr != 0 && mm != tb->mm)) {
flush_tlb_pending();
nr = 0;
}
if (nr == 0)
- mp->mm = mm;
+ tb->mm = mm;
- mp->vaddrs[nr] = vaddr;
- mp->tlb_nr = ++nr;
+ tb->vaddrs[nr] = vaddr;
+ tb->tlb_nr = ++nr;
if (nr >= TLB_BATCH_NR)
flush_tlb_pending();
+
+ put_cpu_var(tlb_batch);
}
Index: linux-2.6/arch/sparc/mm/tsb.c
===================================================================
--- linux-2.6.orig/arch/sparc/mm/tsb.c
+++ linux-2.6/arch/sparc/mm/tsb.c
@@ -47,12 +47,13 @@ void flush_tsb_kernel_range(unsigned lon
}
}
-static void __flush_tsb_one(struct mmu_gather *mp, unsigned long hash_shift, unsigned long tsb, unsigned long nentries)
+static void __flush_tsb_one(struct tlb_batch *tb, unsigned long hash_shift,
+ unsigned long tsb, unsigned long nentries)
{
unsigned long i;
- for (i = 0; i < mp->tlb_nr; i++) {
- unsigned long v = mp->vaddrs[i];
+ for (i = 0; i < tb->tlb_nr; i++) {
+ unsigned long v = tb->vaddrs[i];
unsigned long tag, ent, hash;
v &= ~0x1UL;
@@ -65,9 +66,9 @@ static void __flush_tsb_one(struct mmu_g
}
}
-void flush_tsb_user(struct mmu_gather *mp)
+void flush_tsb_user(struct tlb_batch *tb)
{
- struct mm_struct *mm = mp->mm;
+ struct mm_struct *mm = tb->mm;
unsigned long nentries, base, flags;
spin_lock_irqsave(&mm->context.lock, flags);
@@ -76,7 +77,7 @@ void flush_tsb_user(struct mmu_gather *m
nentries = mm->context.tsb_block[MM_TSB_BASE].tsb_nentries;
if (tlb_type == cheetah_plus || tlb_type == hypervisor)
base = __pa(base);
- __flush_tsb_one(mp, PAGE_SHIFT, base, nentries);
+ __flush_tsb_one(tb, PAGE_SHIFT, base, nentries);
#ifdef CONFIG_HUGETLB_PAGE
if (mm->context.tsb_block[MM_TSB_HUGE].tsb) {
@@ -84,7 +85,7 @@ void flush_tsb_user(struct mmu_gather *m
nentries = mm->context.tsb_block[MM_TSB_HUGE].tsb_nentries;
if (tlb_type == cheetah_plus || tlb_type == hypervisor)
base = __pa(base);
- __flush_tsb_one(mp, HPAGE_SHIFT, base, nentries);
+ __flush_tsb_one(tb, HPAGE_SHIFT, base, nentries);
}
#endif
spin_unlock_irqrestore(&mm->context.lock, flags);
^ permalink raw reply [flat|nested] 128+ messages in thread
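With the gather itself now preemptible, the sparc per-cpu tlb_batch may only be touched with preemption disabled around each access, which is why the hunks above turn __get_cpu_var() into paired get_cpu_var()/put_cpu_var(). A hedged sketch of that access discipline, paraphrasing the tlb_batch_add() hunk rather than adding code to the series:

/*
 * Sketch of the access discipline the patch establishes: every use of
 * the per-cpu batch is bracketed by get_cpu_var()/put_cpu_var(), so a
 * now-preemptible caller cannot migrate CPUs while the batch is live.
 * Paraphrased from the tlb_batch_add() hunk above; the _sketch name is ours.
 */
static void tlb_batch_add_sketch(struct mm_struct *mm, unsigned long vaddr)
{
	struct tlb_batch *tb = &get_cpu_var(tlb_batch);	/* disables preemption */
	unsigned long nr = tb->tlb_nr;

	if (nr != 0 && mm != tb->mm) {	/* batch holds entries for another mm */
		flush_tlb_pending();
		nr = 0;
	}
	if (nr == 0)
		tb->mm = mm;

	tb->vaddrs[nr] = vaddr & PAGE_MASK;
	tb->tlb_nr = ++nr;
	if (nr >= TLB_BATCH_NR)
		flush_tlb_pending();

	put_cpu_var(tlb_batch);		/* re-enables preemption */
}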
* Re: [PATCH 04/25] sparc: Preemptible mmu_gather
@ 2011-01-25 20:30 ` David Miller
0 siblings, 0 replies; 128+ messages in thread
From: David Miller @ 2011-01-25 20:30 UTC (permalink / raw)
To: a.p.zijlstra
Cc: aarcange, avi, tglx, riel, mingo, akpm, torvalds, linux-kernel,
linux-arch, linux-mm, benh, hugh.dickins, mel, npiggin, paulmck,
yanmin_zhang
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Tue, 25 Jan 2011 18:31:15 +0100
> Rework the sparc mmu_gather usage to conform to the new world order :-)
>
> Sparc mmu_gather does two things:
> - tracks vaddrs to unhash
> - tracks pages to free
>
> Split these two things like powerpc has done and keep the vaddrs
> in per-cpu data structures and flush them on context switch.
>
> The remaining bits can then use the generic mmu_gather.
>
> Cc: David Miller <davem@davemloft.net>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: David S. Miller <davem@davemloft.net>
^ permalink raw reply [flat|nested] 128+ messages in thread
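A note on the sparc rework acked above: the split keeps page freeing in the caller-owned (and now preemptible) mmu_gather, while the virtual addresses to unhash stay in a small per-cpu batch that is drained whenever it fills up or the mm being torn down changes. Below is a minimal stand-alone C model of that idea; the type and function names are invented for illustration and are not the sparc implementation.

/* Stand-alone sketch of "vaddrs per-cpu, pages with the caller" (illustrative only). */
#include <stdio.h>
#include <stddef.h>

#define VADDR_BATCH_NR 8

struct vaddr_batch {                           /* per-cpu in the real code */
	int mm_id;                             /* which address space the vaddrs belong to */
	unsigned long vaddrs[VADDR_BATCH_NR];
	size_t nr;
};

static struct vaddr_batch this_cpu_batch;

static void flush_vaddr_batch(struct vaddr_batch *b)
{
	if (b->nr)
		printf("unhash %zu vaddrs of mm %d\n", b->nr, b->mm_id);
	b->nr = 0;
}

/* Called for every PTE being torn down; page freeing is left to the generic gather. */
static void batch_vaddr(int mm_id, unsigned long vaddr)
{
	struct vaddr_batch *b = &this_cpu_batch;

	if (b->nr && b->mm_id != mm_id)
		flush_vaddr_batch(b);          /* switched to another mm: drain first */
	b->mm_id = mm_id;
	b->vaddrs[b->nr++] = vaddr;
	if (b->nr == VADDR_BATCH_NR)
		flush_vaddr_batch(b);
}

int main(void)
{
	for (unsigned long v = 0; v < 10; v++)
		batch_vaddr(1, v << 12);
	batch_vaddr(2, 0x1000);                /* a new mm forces a drain of mm 1's vaddrs */
	flush_vaddr_batch(&this_cpu_batch);
	return 0;
}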
* Re: [PATCH 04/25] sparc: Preemptible mmu_gather
@ 2011-01-25 20:30 ` David Miller
0 siblings, 0 replies; 128+ messages in thread
From: David Miller @ 2011-01-25 20:30 UTC (permalink / raw)
To: a.p.zijlstra
Cc: aarcange, avi, tglx, riel, mingo, akpm, torvalds, linux-kernel,
linux-arch, linux-mm, benh, hugh.dickins, mel, npiggin, paulmck,
yanmin_zhang
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Tue, 25 Jan 2011 18:31:15 +0100
> Rework the sparc mmu_gather usage to conform to the new world order :-)
>
> Sparc mmu_gather does two things:
> - tracks vaddrs to unhash
> - tracks pages to free
>
> Split these two things like powerpc has done and keep the vaddrs
> in per-cpu data structures and flush them on context switch.
>
> The remaining bits can then use the generic mmu_gather.
>
> Cc: David Miller <davem@davemloft.net>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: David S. Miller <davem@davemloft.net>
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 05/25] s390: preemptible mmu_gather
2011-01-25 17:31 ` Peter Zijlstra
(?)
@ 2011-01-25 17:31 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Martin Schwidefsky
[-- Attachment #1: martin_schwidefsky-s390-preemptible_mmu_gather.patch --]
[-- Type: text/plain, Size: 2636 bytes --]
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
Adapt the stand-alone s390 mmu_gather implementation to the new
preemptible mmu_gather interface.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20101126145410.881573395@chello.nl>
---
arch/s390/include/asm/tlb.h | 43 +++++++++++++++++++++++++------------------
1 file changed, 25 insertions(+), 18 deletions(-)
Index: linux-2.6/arch/s390/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/s390/include/asm/tlb.h
+++ linux-2.6/arch/s390/include/asm/tlb.h
@@ -28,44 +28,50 @@
#include <asm/smp.h>
#include <asm/tlbflush.h>
-#ifndef CONFIG_SMP
-#define TLB_NR_PTRS 1
-#else
-#define TLB_NR_PTRS 508
-#endif
-
struct mmu_gather {
struct mm_struct *mm;
unsigned int fullmm;
unsigned int nr_ptes;
unsigned int nr_pxds;
- void *array[TLB_NR_PTRS];
+ unsigned int max;
+ void **array;
+ void *local[8];
};
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
-static inline struct mmu_gather *tlb_gather_mmu(struct mm_struct *mm,
- unsigned int full_mm_flush)
+static inline void __tlb_alloc_page(struct mmu_gather *tlb)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
+ unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
+
+ if (addr) {
+ tlb->array = (void *) addr;
+ tlb->max = PAGE_SIZE / sizeof(void *);
+ }
+}
+static inline void tlb_gather_mmu(struct mmu_gather *tlb,
+ struct mm_struct *mm,
+ unsigned int full_mm_flush)
+{
tlb->mm = mm;
+ tlb->max = ARRAY_SIZE(tlb->local);
+ tlb->array = tlb->local;
tlb->fullmm = full_mm_flush;
- tlb->nr_ptes = 0;
- tlb->nr_pxds = TLB_NR_PTRS;
if (tlb->fullmm)
__tlb_flush_mm(mm);
- return tlb;
+ else
+ __tlb_alloc_page(tlb);
+ tlb->nr_ptes = 0;
+ tlb->nr_pxds = tlb->max;
}
static inline void tlb_flush_mmu(struct mmu_gather *tlb,
unsigned long start, unsigned long end)
{
- if (!tlb->fullmm && (tlb->nr_ptes > 0 || tlb->nr_pxds < TLB_NR_PTRS))
+ if (!tlb->fullmm && (tlb->nr_ptes > 0 || tlb->nr_pxds < tlb->max))
__tlb_flush_mm(tlb->mm);
while (tlb->nr_ptes > 0)
page_table_free_rcu(tlb->mm, tlb->array[--tlb->nr_ptes]);
- while (tlb->nr_pxds < TLB_NR_PTRS)
+ while (tlb->nr_pxds < tlb->max)
crst_table_free_rcu(tlb->mm, tlb->array[tlb->nr_pxds++]);
}
@@ -79,7 +85,8 @@ static inline void tlb_finish_mmu(struct
/* keep the page table cache within bounds */
check_pgt_cache();
- put_cpu_var(mmu_gathers);
+ if (tlb->array != tlb->local)
+ free_pages((unsigned long) tlb->array, 0);
}
/*
^ permalink raw reply [flat|nested] 128+ messages in thread
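The pattern worth noting in the s390 conversion above (it reappears in the ia64 patch later in the series) is that the gather now lives on the caller's stack with a tiny built-in array, and __tlb_alloc_page() merely tries to upgrade it to a page-sized array with an allocation that may not sleep; if that fails, the code quietly keeps batching a handful of entries at a time. A stand-alone sketch of that fallback follows, with illustrative names and malloc standing in for the GFP_NOWAIT page allocation.

/* Illustrative model of the "small on-stack buffer, optional bigger allocation" pattern. */
#include <stdio.h>
#include <stdlib.h>

#define LOCAL_NR   8
#define PAGE_NR  512	/* ~ PAGE_SIZE / sizeof(void *) assuming 64-bit, 4 KiB pages */

struct gather {
	void **array;
	unsigned int max;
	unsigned int nr;
	void *local[LOCAL_NR];
};

static void gather_init(struct gather *g)
{
	g->array = g->local;	/* always usable, even if the allocation below fails */
	g->max = LOCAL_NR;
	g->nr = 0;

	void **bigger = malloc(PAGE_NR * sizeof(void *));	/* ~ GFP_NOWAIT | __GFP_NOWARN page */
	if (bigger) {
		g->array = bigger;
		g->max = PAGE_NR;
	}
}

static void gather_fini(struct gather *g)
{
	if (g->array != g->local)	/* only free what we allocated ourselves */
		free(g->array);
}

int main(void)
{
	struct gather g;		/* on the caller's stack, so fully preemptible */

	gather_init(&g);
	printf("batching up to %u entries per flush\n", g.max);
	gather_fini(&g);
	return 0;
}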
* [PATCH 05/25] s390: preemptible mmu_gather
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Martin Schwidefsky
[-- Attachment #1: martin_schwidefsky-s390-preemptible_mmu_gather.patch --]
[-- Type: text/plain, Size: 2932 bytes --]
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
Adapt the stand-alone s390 mmu_gather implementation to the new
preemptible mmu_gather interface.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20101126145410.881573395@chello.nl>
---
arch/s390/include/asm/tlb.h | 43 +++++++++++++++++++++++++------------------
1 file changed, 25 insertions(+), 18 deletions(-)
Index: linux-2.6/arch/s390/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/s390/include/asm/tlb.h
+++ linux-2.6/arch/s390/include/asm/tlb.h
@@ -28,44 +28,50 @@
#include <asm/smp.h>
#include <asm/tlbflush.h>
-#ifndef CONFIG_SMP
-#define TLB_NR_PTRS 1
-#else
-#define TLB_NR_PTRS 508
-#endif
-
struct mmu_gather {
struct mm_struct *mm;
unsigned int fullmm;
unsigned int nr_ptes;
unsigned int nr_pxds;
- void *array[TLB_NR_PTRS];
+ unsigned int max;
+ void **array;
+ void *local[8];
};
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
-static inline struct mmu_gather *tlb_gather_mmu(struct mm_struct *mm,
- unsigned int full_mm_flush)
+static inline void __tlb_alloc_page(struct mmu_gather *tlb)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
+ unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
+
+ if (addr) {
+ tlb->array = (void *) addr;
+ tlb->max = PAGE_SIZE / sizeof(void *);
+ }
+}
+static inline void tlb_gather_mmu(struct mmu_gather *tlb,
+ struct mm_struct *mm,
+ unsigned int full_mm_flush)
+{
tlb->mm = mm;
+ tlb->max = ARRAY_SIZE(tlb->local);
+ tlb->array = tlb->local;
tlb->fullmm = full_mm_flush;
- tlb->nr_ptes = 0;
- tlb->nr_pxds = TLB_NR_PTRS;
if (tlb->fullmm)
__tlb_flush_mm(mm);
- return tlb;
+ else
+ __tlb_alloc_page(tlb);
+ tlb->nr_ptes = 0;
+ tlb->nr_pxds = tlb->max;
}
static inline void tlb_flush_mmu(struct mmu_gather *tlb,
unsigned long start, unsigned long end)
{
- if (!tlb->fullmm && (tlb->nr_ptes > 0 || tlb->nr_pxds < TLB_NR_PTRS))
+ if (!tlb->fullmm && (tlb->nr_ptes > 0 || tlb->nr_pxds < tlb->max))
__tlb_flush_mm(tlb->mm);
while (tlb->nr_ptes > 0)
page_table_free_rcu(tlb->mm, tlb->array[--tlb->nr_ptes]);
- while (tlb->nr_pxds < TLB_NR_PTRS)
+ while (tlb->nr_pxds < tlb->max)
crst_table_free_rcu(tlb->mm, tlb->array[tlb->nr_pxds++]);
}
@@ -79,7 +85,8 @@ static inline void tlb_finish_mmu(struct
/* keep the page table cache within bounds */
check_pgt_cache();
- put_cpu_var(mmu_gathers);
+ if (tlb->array != tlb->local)
+ free_pages((unsigned long) tlb->array, 0);
}
/*
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 05/25] s390: preemptible mmu_gather
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Martin Schwidefsky
[-- Attachment #1: martin_schwidefsky-s390-preemptible_mmu_gather.patch --]
[-- Type: text/plain, Size: 2932 bytes --]
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
Adapt the stand-alone s390 mmu_gather implementation to the new
preemptible mmu_gather interface.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20101126145410.881573395@chello.nl>
---
arch/s390/include/asm/tlb.h | 43 +++++++++++++++++++++++++------------------
1 file changed, 25 insertions(+), 18 deletions(-)
Index: linux-2.6/arch/s390/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/s390/include/asm/tlb.h
+++ linux-2.6/arch/s390/include/asm/tlb.h
@@ -28,44 +28,50 @@
#include <asm/smp.h>
#include <asm/tlbflush.h>
-#ifndef CONFIG_SMP
-#define TLB_NR_PTRS 1
-#else
-#define TLB_NR_PTRS 508
-#endif
-
struct mmu_gather {
struct mm_struct *mm;
unsigned int fullmm;
unsigned int nr_ptes;
unsigned int nr_pxds;
- void *array[TLB_NR_PTRS];
+ unsigned int max;
+ void **array;
+ void *local[8];
};
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
-static inline struct mmu_gather *tlb_gather_mmu(struct mm_struct *mm,
- unsigned int full_mm_flush)
+static inline void __tlb_alloc_page(struct mmu_gather *tlb)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
+ unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
+
+ if (addr) {
+ tlb->array = (void *) addr;
+ tlb->max = PAGE_SIZE / sizeof(void *);
+ }
+}
+static inline void tlb_gather_mmu(struct mmu_gather *tlb,
+ struct mm_struct *mm,
+ unsigned int full_mm_flush)
+{
tlb->mm = mm;
+ tlb->max = ARRAY_SIZE(tlb->local);
+ tlb->array = tlb->local;
tlb->fullmm = full_mm_flush;
- tlb->nr_ptes = 0;
- tlb->nr_pxds = TLB_NR_PTRS;
if (tlb->fullmm)
__tlb_flush_mm(mm);
- return tlb;
+ else
+ __tlb_alloc_page(tlb);
+ tlb->nr_ptes = 0;
+ tlb->nr_pxds = tlb->max;
}
static inline void tlb_flush_mmu(struct mmu_gather *tlb,
unsigned long start, unsigned long end)
{
- if (!tlb->fullmm && (tlb->nr_ptes > 0 || tlb->nr_pxds < TLB_NR_PTRS))
+ if (!tlb->fullmm && (tlb->nr_ptes > 0 || tlb->nr_pxds < tlb->max))
__tlb_flush_mm(tlb->mm);
while (tlb->nr_ptes > 0)
page_table_free_rcu(tlb->mm, tlb->array[--tlb->nr_ptes]);
- while (tlb->nr_pxds < TLB_NR_PTRS)
+ while (tlb->nr_pxds < tlb->max)
crst_table_free_rcu(tlb->mm, tlb->array[tlb->nr_pxds++]);
}
@@ -79,7 +85,8 @@ static inline void tlb_finish_mmu(struct
/* keep the page table cache within bounds */
check_pgt_cache();
- put_cpu_var(mmu_gathers);
+ if (tlb->array != tlb->local)
+ free_pages((unsigned long) tlb->array, 0);
}
/*
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 06/25] arm: Preemptible mmu_gather
2011-01-25 17:31 ` Peter Zijlstra
(?)
@ 2011-01-25 17:31 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Russell King
[-- Attachment #1: peter_zijlstra-arm-preemptible_mmu_gather.patch --]
[-- Type: text/plain, Size: 1111 bytes --]
Fix up the arm mmu_gather code to conform to the new API.
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/arm/include/asm/tlb.h | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
Index: linux-2.6/arch/arm/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/arm/include/asm/tlb.h
+++ linux-2.6/arch/arm/include/asm/tlb.h
@@ -40,17 +40,11 @@ struct mmu_gather {
unsigned long range_end;
};
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
-static inline struct mmu_gather *
-tlb_gather_mmu(struct mm_struct *mm, unsigned int full_mm_flush)
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
-
tlb->mm = mm;
tlb->fullmm = full_mm_flush;
-
- return tlb;
}
static inline void
@@ -61,8 +55,6 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
/* keep the page table cache within bounds */
check_pgt_cache();
-
- put_cpu_var(mmu_gathers);
}
/*
^ permalink raw reply [flat|nested] 128+ messages in thread
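For the arm conversion above (and equally for the sh, um and ia64 patches that follow), the only real change at call sites is the calling convention: instead of receiving a pointer to a per-cpu mmu_gather, the caller now provides one on its own stack. The before/after sketch below is schematic and stubbed for illustration; it is not a quote of any kernel call site.

/* Schematic before/after of a tlb_gather_mmu() call site (stubbed, illustrative only). */
#include <stdio.h>

struct mm_struct { int id; };
struct mmu_gather { struct mm_struct *mm; unsigned int fullmm; };

/* New-style API: the caller owns the gather, so no per-cpu state is pinned. */
static void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int fullmm)
{
	tlb->mm = mm;
	tlb->fullmm = fullmm;
}

static void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
{
	printf("flushed mm %d over [%#lx, %#lx)\n", tlb->mm->id, start, end);
}

int main(void)
{
	struct mm_struct mm = { .id = 1 };

	/* Old style (removed by this series):
	 *	struct mmu_gather *tlb = tlb_gather_mmu(&mm, 0);   // get_cpu_var() inside
	 *	...
	 *	tlb_finish_mmu(tlb, start, end);                    // put_cpu_var() inside
	 */
	struct mmu_gather tlb;			/* new style: on the caller's stack */
	tlb_gather_mmu(&tlb, &mm, 0);
	/* ... unmap pages, queue them in tlb ... */
	tlb_finish_mmu(&tlb, 0x1000, 0x2000);
	return 0;
}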
* [PATCH 06/25] arm: Preemptible mmu_gather
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Russell King
[-- Attachment #1: peter_zijlstra-arm-preemptible_mmu_gather.patch --]
[-- Type: text/plain, Size: 1407 bytes --]
Fix up the arm mmu_gather code to conform to the new API.
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/arm/include/asm/tlb.h | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
Index: linux-2.6/arch/arm/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/arm/include/asm/tlb.h
+++ linux-2.6/arch/arm/include/asm/tlb.h
@@ -40,17 +40,11 @@ struct mmu_gather {
unsigned long range_end;
};
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
-static inline struct mmu_gather *
-tlb_gather_mmu(struct mm_struct *mm, unsigned int full_mm_flush)
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
-
tlb->mm = mm;
tlb->fullmm = full_mm_flush;
-
- return tlb;
}
static inline void
@@ -61,8 +55,6 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
/* keep the page table cache within bounds */
check_pgt_cache();
-
- put_cpu_var(mmu_gathers);
}
/*
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 06/25] arm: Preemptible mmu_gather
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Russell King
[-- Attachment #1: peter_zijlstra-arm-preemptible_mmu_gather.patch --]
[-- Type: text/plain, Size: 1407 bytes --]
Fix up the arm mmu_gather code to conform to the new API.
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/arm/include/asm/tlb.h | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
Index: linux-2.6/arch/arm/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/arm/include/asm/tlb.h
+++ linux-2.6/arch/arm/include/asm/tlb.h
@@ -40,17 +40,11 @@ struct mmu_gather {
unsigned long range_end;
};
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
-static inline struct mmu_gather *
-tlb_gather_mmu(struct mm_struct *mm, unsigned int full_mm_flush)
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
-
tlb->mm = mm;
tlb->fullmm = full_mm_flush;
-
- return tlb;
}
static inline void
@@ -61,8 +55,6 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
/* keep the page table cache within bounds */
check_pgt_cache();
-
- put_cpu_var(mmu_gathers);
}
/*
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 07/25] sh: Preemptible mmu_gather
2011-01-25 17:31 ` Peter Zijlstra
(?)
@ 2011-01-25 17:31 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Paul Mundt
[-- Attachment #1: peter_zijlstra-sh-preemptible_mmu_gather.patch --]
[-- Type: text/plain, Size: 1303 bytes --]
Fix up the sh mmu_gather code to conform to the new API.
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/sh/include/asm/tlb.h | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
Index: linux-2.6/arch/sh/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/sh/include/asm/tlb.h
+++ linux-2.6/arch/sh/include/asm/tlb.h
@@ -23,8 +23,6 @@ struct mmu_gather {
unsigned long start, end;
};
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
static inline void init_tlb_gather(struct mmu_gather *tlb)
{
tlb->start = TASK_SIZE;
@@ -36,17 +34,13 @@ static inline void init_tlb_gather(struc
}
}
-static inline struct mmu_gather *
-tlb_gather_mmu(struct mm_struct *mm, unsigned int full_mm_flush)
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
-
tlb->mm = mm;
tlb->fullmm = full_mm_flush;
init_tlb_gather(tlb);
-
- return tlb;
}
static inline void
@@ -57,8 +51,6 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
/* keep the page table cache within bounds */
check_pgt_cache();
-
- put_cpu_var(mmu_gathers);
}
static inline void
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 07/25] sh: Preemptible mmu_gather
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Paul Mundt
[-- Attachment #1: peter_zijlstra-sh-preemptible_mmu_gather.patch --]
[-- Type: text/plain, Size: 1599 bytes --]
Fix up the sh mmu_gather code to conform to the new API.
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/sh/include/asm/tlb.h | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
Index: linux-2.6/arch/sh/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/sh/include/asm/tlb.h
+++ linux-2.6/arch/sh/include/asm/tlb.h
@@ -23,8 +23,6 @@ struct mmu_gather {
unsigned long start, end;
};
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
static inline void init_tlb_gather(struct mmu_gather *tlb)
{
tlb->start = TASK_SIZE;
@@ -36,17 +34,13 @@ static inline void init_tlb_gather(struc
}
}
-static inline struct mmu_gather *
-tlb_gather_mmu(struct mm_struct *mm, unsigned int full_mm_flush)
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
-
tlb->mm = mm;
tlb->fullmm = full_mm_flush;
init_tlb_gather(tlb);
-
- return tlb;
}
static inline void
@@ -57,8 +51,6 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
/* keep the page table cache within bounds */
check_pgt_cache();
-
- put_cpu_var(mmu_gathers);
}
static inline void
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 07/25] sh: Preemptible mmu_gather
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Paul Mundt
[-- Attachment #1: peter_zijlstra-sh-preemptible_mmu_gather.patch --]
[-- Type: text/plain, Size: 1599 bytes --]
Fix up the sh mmu_gather code to conform to the new API.
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/sh/include/asm/tlb.h | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
Index: linux-2.6/arch/sh/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/sh/include/asm/tlb.h
+++ linux-2.6/arch/sh/include/asm/tlb.h
@@ -23,8 +23,6 @@ struct mmu_gather {
unsigned long start, end;
};
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
static inline void init_tlb_gather(struct mmu_gather *tlb)
{
tlb->start = TASK_SIZE;
@@ -36,17 +34,13 @@ static inline void init_tlb_gather(struc
}
}
-static inline struct mmu_gather *
-tlb_gather_mmu(struct mm_struct *mm, unsigned int full_mm_flush)
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
-
tlb->mm = mm;
tlb->fullmm = full_mm_flush;
init_tlb_gather(tlb);
-
- return tlb;
}
static inline void
@@ -57,8 +51,6 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
/* keep the page table cache within bounds */
check_pgt_cache();
-
- put_cpu_var(mmu_gathers);
}
static inline void
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 08/25] um: Preemptible mmu_gather
2011-01-25 17:31 ` Peter Zijlstra
(?)
@ 2011-01-25 17:31 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Jeff Dike
[-- Attachment #1: peter_zijlstra-um-preemptible_mmu_gather.patch --]
[-- Type: text/plain, Size: 1576 bytes --]
Fix up the um mmu_gather code to conform to the new API.
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/um/include/asm/tlb.h | 16 ++--------------
1 file changed, 2 insertions(+), 14 deletions(-)
Index: linux-2.6/arch/um/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/um/include/asm/tlb.h
+++ linux-2.6/arch/um/include/asm/tlb.h
@@ -22,9 +22,6 @@ struct mmu_gather {
unsigned int fullmm; /* non-zero means full mm flush */
};
-/* Users of the generic TLB shootdown code must declare this storage space. */
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
static inline void __tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep,
unsigned long address)
{
@@ -47,20 +44,13 @@ static inline void init_tlb_gather(struc
}
}
-/* tlb_gather_mmu
- * Return a pointer to an initialized struct mmu_gather.
- */
-static inline struct mmu_gather *
-tlb_gather_mmu(struct mm_struct *mm, unsigned int full_mm_flush)
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
-
tlb->mm = mm;
tlb->fullmm = full_mm_flush;
init_tlb_gather(tlb);
-
- return tlb;
}
extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
@@ -87,8 +77,6 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
/* keep the page table cache within bounds */
check_pgt_cache();
-
- put_cpu_var(mmu_gathers);
}
/* tlb_remove_page
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 08/25] um: Preemptible mmu_gather
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Jeff Dike
[-- Attachment #1: peter_zijlstra-um-preemptible_mmu_gather.patch --]
[-- Type: text/plain, Size: 1872 bytes --]
Fix up the um mmu_gather code to conform to the new API.
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/um/include/asm/tlb.h | 16 ++--------------
1 file changed, 2 insertions(+), 14 deletions(-)
Index: linux-2.6/arch/um/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/um/include/asm/tlb.h
+++ linux-2.6/arch/um/include/asm/tlb.h
@@ -22,9 +22,6 @@ struct mmu_gather {
unsigned int fullmm; /* non-zero means full mm flush */
};
-/* Users of the generic TLB shootdown code must declare this storage space. */
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
static inline void __tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep,
unsigned long address)
{
@@ -47,20 +44,13 @@ static inline void init_tlb_gather(struc
}
}
-/* tlb_gather_mmu
- * Return a pointer to an initialized struct mmu_gather.
- */
-static inline struct mmu_gather *
-tlb_gather_mmu(struct mm_struct *mm, unsigned int full_mm_flush)
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
-
tlb->mm = mm;
tlb->fullmm = full_mm_flush;
init_tlb_gather(tlb);
-
- return tlb;
}
extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
@@ -87,8 +77,6 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
/* keep the page table cache within bounds */
check_pgt_cache();
-
- put_cpu_var(mmu_gathers);
}
/* tlb_remove_page
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 08/25] um: Preemptible mmu_gather
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Jeff Dike
[-- Attachment #1: peter_zijlstra-um-preemptible_mmu_gather.patch --]
[-- Type: text/plain, Size: 1872 bytes --]
Fix up the um mmu_gather code to conform to the new API.
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/um/include/asm/tlb.h | 16 ++--------------
1 file changed, 2 insertions(+), 14 deletions(-)
Index: linux-2.6/arch/um/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/um/include/asm/tlb.h
+++ linux-2.6/arch/um/include/asm/tlb.h
@@ -22,9 +22,6 @@ struct mmu_gather {
unsigned int fullmm; /* non-zero means full mm flush */
};
-/* Users of the generic TLB shootdown code must declare this storage space. */
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
static inline void __tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep,
unsigned long address)
{
@@ -47,20 +44,13 @@ static inline void init_tlb_gather(struc
}
}
-/* tlb_gather_mmu
- * Return a pointer to an initialized struct mmu_gather.
- */
-static inline struct mmu_gather *
-tlb_gather_mmu(struct mm_struct *mm, unsigned int full_mm_flush)
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
-
tlb->mm = mm;
tlb->fullmm = full_mm_flush;
init_tlb_gather(tlb);
-
- return tlb;
}
extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
@@ -87,8 +77,6 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
/* keep the page table cache within bounds */
check_pgt_cache();
-
- put_cpu_var(mmu_gathers);
}
/* tlb_remove_page
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 09/25] ia64: Preemptible mmu_gather
2011-01-25 17:31 ` Peter Zijlstra
(?)
@ 2011-01-25 17:31 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Tony Luck
[-- Attachment #1: peter_zijlstra-ia64-preemptible_mmu_gather.patch --]
[-- Type: text/plain, Size: 3448 bytes --]
Fix up the ia64 mmu_gather code to conform to the new API.
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/ia64/include/asm/tlb.h | 41 +++++++++++++++++++++++++----------------
1 file changed, 25 insertions(+), 16 deletions(-)
Index: linux-2.6/arch/ia64/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/ia64/include/asm/tlb.h
+++ linux-2.6/arch/ia64/include/asm/tlb.h
@@ -47,21 +47,21 @@
#include <asm/machvec.h>
#ifdef CONFIG_SMP
-# define FREE_PTE_NR 2048
# define tlb_fast_mode(tlb) ((tlb)->nr == ~0U)
#else
-# define FREE_PTE_NR 0
# define tlb_fast_mode(tlb) (1)
#endif
struct mmu_gather {
struct mm_struct *mm;
unsigned int nr; /* == ~0U => fast mode */
+ unsigned int max;
unsigned char fullmm; /* non-zero means full mm flush */
unsigned char need_flush; /* really unmapped some PTEs? */
unsigned long start_addr;
unsigned long end_addr;
- struct page *pages[FREE_PTE_NR];
+ struct page **pages;
+ struct page *local[8];
};
struct ia64_tr_entry {
@@ -90,9 +90,6 @@ extern struct ia64_tr_entry *ia64_idtrs[
#define RR_RID_MASK 0x00000000ffffff00L
#define RR_TO_RID(val) ((val >> 8) & 0xffffff)
-/* Users of the generic TLB shootdown code must declare this storage space. */
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
/*
* Flush the TLB for address range START to END and, if not in fast mode, release the
* freed pages that where gathered up to this point.
@@ -147,15 +144,23 @@ ia64_tlb_flush_mmu (struct mmu_gather *t
}
}
-/*
- * Return a pointer to an initialized struct mmu_gather.
- */
-static inline struct mmu_gather *
-tlb_gather_mmu (struct mm_struct *mm, unsigned int full_mm_flush)
+static inline void __tlb_alloc_page(struct mmu_gather *tlb)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
+ unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
+
+ if (addr) {
+ tlb->pages = (void *)addr;
+ tlb->max = PAGE_SIZE / sizeof(void *);
+ }
+}
+
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
+{
tlb->mm = mm;
+ tlb->max = ARRAY_SIZE(tlb->local);
+ tlb->pages = tlb->local;
/*
* Use fast mode if only 1 CPU is online.
*
@@ -172,7 +177,6 @@ tlb_gather_mmu (struct mm_struct *mm, un
tlb->nr = (num_online_cpus() == 1) ? ~0U : 0;
tlb->fullmm = full_mm_flush;
tlb->start_addr = ~0UL;
- return tlb;
}
/*
@@ -180,7 +184,7 @@ tlb_gather_mmu (struct mm_struct *mm, un
* collected.
*/
static inline void
-tlb_finish_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
+tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
{
/*
* Note: tlb->nr may be 0 at this point, so we can't rely on tlb->start_addr and
@@ -191,7 +195,8 @@ tlb_finish_mmu (struct mmu_gather *tlb,
/* keep the page table cache within bounds */
check_pgt_cache();
- put_cpu_var(mmu_gathers);
+ if (tlb->pages != tlb->local)
+ free_pages((unsigned long)tlb->pages, 0);
}
/*
@@ -208,8 +213,12 @@ tlb_remove_page (struct mmu_gather *tlb,
free_page_and_swap_cache(page);
return;
}
+
+ if (!tlb->nr && tlb->pages == tlb->local)
+ __tlb_alloc_page(tlb);
+
tlb->pages[tlb->nr++] = page;
- if (tlb->nr >= FREE_PTE_NR)
+ if (tlb->nr >= tlb->max)
ia64_tlb_flush_mmu(tlb, tlb->start_addr, tlb->end_addr);
}
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 09/25] ia64: Preemptible mmu_gather
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Tony Luck
[-- Attachment #1: peter_zijlstra-ia64-preemptible_mmu_gather.patch --]
[-- Type: text/plain, Size: 3744 bytes --]
Fix up the ia64 mmu_gather code to conform to the new API.
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/ia64/include/asm/tlb.h | 41 +++++++++++++++++++++++++----------------
1 file changed, 25 insertions(+), 16 deletions(-)
Index: linux-2.6/arch/ia64/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/ia64/include/asm/tlb.h
+++ linux-2.6/arch/ia64/include/asm/tlb.h
@@ -47,21 +47,21 @@
#include <asm/machvec.h>
#ifdef CONFIG_SMP
-# define FREE_PTE_NR 2048
# define tlb_fast_mode(tlb) ((tlb)->nr == ~0U)
#else
-# define FREE_PTE_NR 0
# define tlb_fast_mode(tlb) (1)
#endif
struct mmu_gather {
struct mm_struct *mm;
unsigned int nr; /* == ~0U => fast mode */
+ unsigned int max;
unsigned char fullmm; /* non-zero means full mm flush */
unsigned char need_flush; /* really unmapped some PTEs? */
unsigned long start_addr;
unsigned long end_addr;
- struct page *pages[FREE_PTE_NR];
+ struct page **pages;
+ struct page *local[8];
};
struct ia64_tr_entry {
@@ -90,9 +90,6 @@ extern struct ia64_tr_entry *ia64_idtrs[
#define RR_RID_MASK 0x00000000ffffff00L
#define RR_TO_RID(val) ((val >> 8) & 0xffffff)
-/* Users of the generic TLB shootdown code must declare this storage space. */
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
/*
* Flush the TLB for address range START to END and, if not in fast mode, release the
* freed pages that where gathered up to this point.
@@ -147,15 +144,23 @@ ia64_tlb_flush_mmu (struct mmu_gather *t
}
}
-/*
- * Return a pointer to an initialized struct mmu_gather.
- */
-static inline struct mmu_gather *
-tlb_gather_mmu (struct mm_struct *mm, unsigned int full_mm_flush)
+static inline void __tlb_alloc_page(struct mmu_gather *tlb)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
+ unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
+
+ if (addr) {
+ tlb->pages = (void *)addr;
+ tlb->max = PAGE_SIZE / sizeof(void *);
+ }
+}
+
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
+{
tlb->mm = mm;
+ tlb->max = ARRAY_SIZE(tlb->local);
+ tlb->pages = tlb->local;
/*
* Use fast mode if only 1 CPU is online.
*
@@ -172,7 +177,6 @@ tlb_gather_mmu (struct mm_struct *mm, un
tlb->nr = (num_online_cpus() == 1) ? ~0U : 0;
tlb->fullmm = full_mm_flush;
tlb->start_addr = ~0UL;
- return tlb;
}
/*
@@ -180,7 +184,7 @@ tlb_gather_mmu (struct mm_struct *mm, un
* collected.
*/
static inline void
-tlb_finish_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
+tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
{
/*
* Note: tlb->nr may be 0 at this point, so we can't rely on tlb->start_addr and
@@ -191,7 +195,8 @@ tlb_finish_mmu (struct mmu_gather *tlb,
/* keep the page table cache within bounds */
check_pgt_cache();
- put_cpu_var(mmu_gathers);
+ if (tlb->pages != tlb->local)
+ free_pages((unsigned long)tlb->pages, 0);
}
/*
@@ -208,8 +213,12 @@ tlb_remove_page (struct mmu_gather *tlb,
free_page_and_swap_cache(page);
return;
}
+
+ if (!tlb->nr && tlb->pages == tlb->local)
+ __tlb_alloc_page(tlb);
+
tlb->pages[tlb->nr++] = page;
- if (tlb->nr >= FREE_PTE_NR)
+ if (tlb->nr >= tlb->max)
ia64_tlb_flush_mmu(tlb, tlb->start_addr, tlb->end_addr);
}
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 09/25] ia64: Preemptible mmu_gather
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Tony Luck
[-- Attachment #1: peter_zijlstra-ia64-preemptible_mmu_gather.patch --]
[-- Type: text/plain, Size: 3446 bytes --]
Fix up the ia64 mmu_gather code to conform to the new API.
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/ia64/include/asm/tlb.h | 41 +++++++++++++++++++++++++----------------
1 file changed, 25 insertions(+), 16 deletions(-)
Index: linux-2.6/arch/ia64/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/ia64/include/asm/tlb.h
+++ linux-2.6/arch/ia64/include/asm/tlb.h
@@ -47,21 +47,21 @@
#include <asm/machvec.h>
#ifdef CONFIG_SMP
-# define FREE_PTE_NR 2048
# define tlb_fast_mode(tlb) ((tlb)->nr == ~0U)
#else
-# define FREE_PTE_NR 0
# define tlb_fast_mode(tlb) (1)
#endif
struct mmu_gather {
struct mm_struct *mm;
unsigned int nr; /* == ~0U => fast mode */
+ unsigned int max;
unsigned char fullmm; /* non-zero means full mm flush */
unsigned char need_flush; /* really unmapped some PTEs? */
unsigned long start_addr;
unsigned long end_addr;
- struct page *pages[FREE_PTE_NR];
+ struct page **pages;
+ struct page *local[8];
};
struct ia64_tr_entry {
@@ -90,9 +90,6 @@ extern struct ia64_tr_entry *ia64_idtrs[
#define RR_RID_MASK 0x00000000ffffff00L
#define RR_TO_RID(val) ((val >> 8) & 0xffffff)
-/* Users of the generic TLB shootdown code must declare this storage space. */
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
/*
* Flush the TLB for address range START to END and, if not in fast mode, release the
* freed pages that where gathered up to this point.
@@ -147,15 +144,23 @@ ia64_tlb_flush_mmu (struct mmu_gather *t
}
}
-/*
- * Return a pointer to an initialized struct mmu_gather.
- */
-static inline struct mmu_gather *
-tlb_gather_mmu (struct mm_struct *mm, unsigned int full_mm_flush)
+static inline void __tlb_alloc_page(struct mmu_gather *tlb)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
+ unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
+
+ if (addr) {
+ tlb->pages = (void *)addr;
+ tlb->max = PAGE_SIZE / sizeof(void *);
+ }
+}
+
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
+{
tlb->mm = mm;
+ tlb->max = ARRAY_SIZE(tlb->local);
+ tlb->pages = tlb->local;
/*
* Use fast mode if only 1 CPU is online.
*
@@ -172,7 +177,6 @@ tlb_gather_mmu (struct mm_struct *mm, un
tlb->nr = (num_online_cpus() == 1) ? ~0U : 0;
tlb->fullmm = full_mm_flush;
tlb->start_addr = ~0UL;
- return tlb;
}
/*
@@ -180,7 +184,7 @@ tlb_gather_mmu (struct mm_struct *mm, un
* collected.
*/
static inline void
-tlb_finish_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
+tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
{
/*
* Note: tlb->nr may be 0 at this point, so we can't rely on tlb->start_addr and
@@ -191,7 +195,8 @@ tlb_finish_mmu (struct mmu_gather *tlb,
/* keep the page table cache within bounds */
check_pgt_cache();
- put_cpu_var(mmu_gathers);
+ if (tlb->pages != tlb->local)
+ free_pages((unsigned long)tlb->pages, 0);
}
/*
@@ -208,8 +213,12 @@ tlb_remove_page (struct mmu_gather *tlb,
free_page_and_swap_cache(page);
return;
}
+
+ if (!tlb->nr && tlb->pages == tlb->local)
+ __tlb_alloc_page(tlb);
+
tlb->pages[tlb->nr++] = page;
- if (tlb->nr >= FREE_PTE_NR)
+ if (tlb->nr >= tlb->max)
ia64_tlb_flush_mmu(tlb, tlb->start_addr, tlb->end_addr);
}
^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: [PATCH 09/25] ia64: Preemptible mmu_gather
2011-01-25 17:31 ` Peter Zijlstra
@ 2011-01-25 20:12 ` Tony Luck
-1 siblings, 0 replies; 128+ messages in thread
From: Tony Luck @ 2011-01-25 20:12 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds, linux-kernel, linux-arch,
linux-mm, Benjamin Herrenschmidt, David Miller, Hugh Dickins,
Mel Gorman, Nick Piggin, Paul McKenney, Yanmin Zhang
On Tue, Jan 25, 2011 at 9:31 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> struct mmu_gather {
> struct mm_struct *mm;
> unsigned int nr; /* == ~0U => fast mode */
> + unsigned int max;
> unsigned char fullmm; /* non-zero means full mm flush */
> unsigned char need_flush; /* really unmapped some PTEs? */
> unsigned long start_addr;
> unsigned long end_addr;
> - struct page *pages[FREE_PTE_NR];
> + struct page **pages;
> + struct page *local[8];
> };
Overall it looks OK - builds, boots & runs too. One question about
the above bit ... why "8" elements in the local[] array? This ought to be
a #define, maybe with a comment explaining the significance. It doesn't
seem to fill out struct mmu_gather to some "nice" size. I can't think
of why batching 8 at a time (in the fallback cannot allocate **pages case)
is a good number. So is there some science to the choice, or did you
pluck 8 out of the air?
Thanks
-Tony
^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: [PATCH 09/25] ia64: Preemptible mmu_gather
@ 2011-01-25 20:12 ` Tony Luck
0 siblings, 0 replies; 128+ messages in thread
From: Tony Luck @ 2011-01-25 20:12 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds, linux-kernel, linux-arch,
linux-mm, Benjamin Herrenschmidt, David Miller, Hugh Dickins,
Mel Gorman, Nick Piggin, Paul McKenney, Yanmin Zhang
On Tue, Jan 25, 2011 at 9:31 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> struct mmu_gather {
> struct mm_struct *mm;
> unsigned int nr; /* == ~0U => fast mode */
> + unsigned int max;
> unsigned char fullmm; /* non-zero means full mm flush */
> unsigned char need_flush; /* really unmapped some PTEs? */
> unsigned long start_addr;
> unsigned long end_addr;
> - struct page *pages[FREE_PTE_NR];
> + struct page **pages;
> + struct page *local[8];
> };
Overall it looks OK - builds, boots & runs too. One question about
the above bit ... why "8" elements in the local[] array? This ought to be
a #define, maybe with a comment explaining the significance. It doesn't
seem to fill out struct mmu_gather to some "nice" size. I can't think
of why batching 8 at a time (in the fallback cannot allocate **pages case)
is a good number. So is there some science to the choice, or did you
pluck 8 out of the air?
Thanks
-Tony
^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: [PATCH 09/25] ia64: Preemptible mmu_gather
2011-01-25 20:12 ` Tony Luck
@ 2011-01-25 20:22 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 20:22 UTC (permalink / raw)
To: Tony Luck
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds, linux-kernel, linux-arch,
linux-mm, Benjamin Herrenschmidt, David Miller, Hugh Dickins,
Mel Gorman, Nick Piggin, Paul McKenney, Yanmin Zhang
On Tue, 2011-01-25 at 12:12 -0800, Tony Luck wrote:
> On Tue, Jan 25, 2011 at 9:31 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > struct mmu_gather {
> > struct mm_struct *mm;
> > unsigned int nr; /* == ~0U => fast mode */
> > + unsigned int max;
> > unsigned char fullmm; /* non-zero means full mm flush */
> > unsigned char need_flush; /* really unmapped some PTEs? */
> > unsigned long start_addr;
> > unsigned long end_addr;
> > - struct page *pages[FREE_PTE_NR];
> > + struct page **pages;
> > + struct page *local[8];
> > };
>
> Overall it looks OK - builds, boots & runs too. One question about
> the above bit ... why "8" elements in the local[] array? This ought to be
> a #define, maybe with a comment explaining the significance. It doesn't
> seem to fill out struct mmu_gather to some "nice" size. I can't think
> of why batching 8 at a time (in the fallback cannot allocate **pages case)
> is a good number. So is there some science to the choice, or did you
> pluck 8 out of the air?
Yeah, pretty much a random number small enough to make struct mmu_gather
fit on stack; the reason it's not 1 is that a few more entries increase
performance a little, and freeing more pages increases the chance the
page allocation works next time around.
^ permalink raw reply [flat|nested] 128+ messages in thread
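To put rough numbers on the sizing discussed above (illustrative arithmetic, assuming a 64-bit build with 4 KiB pages): the eight-entry local[] array adds only 64 bytes to the on-stack mmu_gather, while a successfully allocated page holds PAGE_SIZE / sizeof(void *) = 512 entries per flush. The tiny check below is a hypothetical stand-alone calculation, not kernel code; IA64_GATHER_BUNDLE is the name Tony's revised patch introduces for the "8".

/* Back-of-the-envelope check of the batch sizes (assumptions: 64-bit pointers, 4 KiB pages). */
#include <stdio.h>

#define PAGE_SIZE 4096UL
#define IA64_GATHER_BUNDLE 8	/* the "8" Tony asked about */

int main(void)
{
	unsigned long local_bytes = IA64_GATHER_BUNDLE * sizeof(void *);
	unsigned long page_entries = PAGE_SIZE / sizeof(void *);

	printf("on-stack fallback: %d entries, %lu bytes\n",
	       IA64_GATHER_BUNDLE, local_bytes);		/* 8 entries, 64 bytes */
	printf("page-backed batch: %lu entries\n", page_entries);	/* 512 entries */
	return 0;
}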
* Re: [PATCH 09/25] ia64: Preemptible mmu_gather
@ 2011-01-25 20:22 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 20:22 UTC (permalink / raw)
To: Tony Luck
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds, linux-kernel, linux-arch,
linux-mm, Benjamin Herrenschmidt, David Miller, Hugh Dickins,
Mel Gorman, Nick Piggin, Paul McKenney, Yanmin Zhang
On Tue, 2011-01-25 at 12:12 -0800, Tony Luck wrote:
> On Tue, Jan 25, 2011 at 9:31 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > struct mmu_gather {
> > struct mm_struct *mm;
> > unsigned int nr; /* == ~0U => fast mode */
> > + unsigned int max;
> > unsigned char fullmm; /* non-zero means full mm flush */
> > unsigned char need_flush; /* really unmapped some PTEs? */
> > unsigned long start_addr;
> > unsigned long end_addr;
> > - struct page *pages[FREE_PTE_NR];
> > + struct page **pages;
> > + struct page *local[8];
> > };
>
> Overall it looks OK - builds, boots & runs too. One question about
> the above bit ... why "8" elements in the local[] array? This ought to be
> a #define, maybe with a comment explaining the significance. It doesn't
> seem to fill out struct mmu_gather to some "nice" size. I can't think
> of why batching 8 at a time (in the fallback cannot allocate **pages case)
> is a good number. So is there some science to the choice, or did you
> pluck 8 out of the air?
Yeah, pretty much a random number small enough to make struct mmu_gather
fit on stack; the reason it's not 1 is that a few more entries increase
performance a little, and freeing more pages increases the chance the
page allocation works next time around.
^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: [PATCH 09/25] ia64: Preemptible mmu_gather
2011-01-25 20:22 ` Peter Zijlstra
@ 2011-01-25 21:23 ` Tony Luck
-1 siblings, 0 replies; 128+ messages in thread
From: Tony Luck @ 2011-01-25 21:23 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds, linux-kernel, linux-arch,
linux-mm, Benjamin Herrenschmidt, David Miller, Hugh Dickins,
Mel Gorman, Nick Piggin, Paul McKenney, Yanmin Zhang
> Yeah, pretty much a random number small enough to make struct mmu_gather
> fit on stack, the reason its not 1 is that a few more entries increase
> performance a little and freeing more pages increases the chance the
> page allocation works next time around.
Okay ... then could you swap out your part 09/25 for this version that
has a #define and a comment. You can add
Acked-by: Tony Luck <tony.luck@intel.com>
-Tony
---
diff --git a/arch/ia64/include/asm/tlb.h b/arch/ia64/include/asm/tlb.h
index 23cce99..ec4ca06 100644
--- a/arch/ia64/include/asm/tlb.h
+++ b/arch/ia64/include/asm/tlb.h
@@ -47,21 +47,27 @@
#include <asm/machvec.h>
#ifdef CONFIG_SMP
-# define FREE_PTE_NR 2048
# define tlb_fast_mode(tlb) ((tlb)->nr == ~0U)
#else
-# define FREE_PTE_NR 0
# define tlb_fast_mode(tlb) (1)
#endif
+/*
+ * If we can't allocate a page to make a big batch of page pointers
+ * to work on, then just handle a few from the on-stack structure.
+ */
+#define IA64_GATHER_BUNDLE 8
+
struct mmu_gather {
struct mm_struct *mm;
unsigned int nr; /* == ~0U => fast mode */
+ unsigned int max;
unsigned char fullmm; /* non-zero means full mm flush */
unsigned char need_flush; /* really unmapped some PTEs? */
unsigned long start_addr;
unsigned long end_addr;
- struct page *pages[FREE_PTE_NR];
+ struct page **pages;
+ struct page *local[IA64_GATHER_BUNDLE];
};
struct ia64_tr_entry {
@@ -90,9 +96,6 @@ extern struct ia64_tr_entry *ia64_idtrs[NR_CPUS];
#define RR_RID_MASK 0x00000000ffffff00L
#define RR_TO_RID(val) ((val >> 8) & 0xffffff)
-/* Users of the generic TLB shootdown code must declare this storage space. */
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
/*
* Flush the TLB for address range START to END and, if not in fast mode, release the
* freed pages that where gathered up to this point.
@@ -147,15 +150,23 @@ ia64_tlb_flush_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long e
}
}
-/*
- * Return a pointer to an initialized struct mmu_gather.
- */
-static inline struct mmu_gather *
-tlb_gather_mmu (struct mm_struct *mm, unsigned int full_mm_flush)
+static inline void __tlb_alloc_page(struct mmu_gather *tlb)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
+ unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
+
+ if (addr) {
+ tlb->pages = (void *)addr;
+ tlb->max = PAGE_SIZE / sizeof(void *);
+ }
+}
+
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
+{
tlb->mm = mm;
+ tlb->max = ARRAY_SIZE(tlb->local);
+ tlb->pages = tlb->local;
/*
* Use fast mode if only 1 CPU is online.
*
@@ -172,7 +183,6 @@ tlb_gather_mmu (struct mm_struct *mm, unsigned int full_mm_flush)
tlb->nr = (num_online_cpus() == 1) ? ~0U : 0;
tlb->fullmm = full_mm_flush;
tlb->start_addr = ~0UL;
- return tlb;
}
/*
@@ -180,7 +190,7 @@ tlb_gather_mmu (struct mm_struct *mm, unsigned int full_mm_flush)
* collected.
*/
static inline void
-tlb_finish_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
+tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
{
/*
* Note: tlb->nr may be 0 at this point, so we can't rely on tlb->start_addr and
@@ -191,7 +201,8 @@ tlb_finish_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
/* keep the page table cache within bounds */
check_pgt_cache();
- put_cpu_var(mmu_gathers);
+ if (tlb->pages != tlb->local)
+ free_pages((unsigned long)tlb->pages, 0);
}
/*
@@ -208,8 +219,12 @@ tlb_remove_page (struct mmu_gather *tlb, struct page *page)
free_page_and_swap_cache(page);
return;
}
+
+ if (!tlb->nr && tlb->pages == tlb->local)
+ __tlb_alloc_page(tlb);
+
tlb->pages[tlb->nr++] = page;
- if (tlb->nr >= FREE_PTE_NR)
+ if (tlb->nr >= tlb->max)
ia64_tlb_flush_mmu(tlb, tlb->start_addr, tlb->end_addr);
}
^ permalink raw reply related [flat|nested] 128+ messages in thread
* Re: [PATCH 09/25] ia64: Preemptible mmu_gather
@ 2011-01-25 21:23 ` Tony Luck
0 siblings, 0 replies; 128+ messages in thread
From: Tony Luck @ 2011-01-25 21:23 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds, linux-kernel, linux-arch,
linux-mm, Benjamin Herrenschmidt, David Miller, Hugh Dickins,
Mel Gorman, Nick Piggin, Paul McKenney, Yanmin Zhang
> Yeah, pretty much a random number small enough to make struct mmu_gather
> fit on stack, the reason its not 1 is that a few more entries increase
> performance a little and freeing more pages increases the chance the
> page allocation works next time around.
Okay ... then could you swap out your part 09/25 for this version that
has a #define and a comment. You can add
Acked-by: Tony Luck <tony.luck@intel.com>
-Tony
---
diff --git a/arch/ia64/include/asm/tlb.h b/arch/ia64/include/asm/tlb.h
index 23cce99..ec4ca06 100644
--- a/arch/ia64/include/asm/tlb.h
+++ b/arch/ia64/include/asm/tlb.h
@@ -47,21 +47,27 @@
#include <asm/machvec.h>
#ifdef CONFIG_SMP
-# define FREE_PTE_NR 2048
# define tlb_fast_mode(tlb) ((tlb)->nr == ~0U)
#else
-# define FREE_PTE_NR 0
# define tlb_fast_mode(tlb) (1)
#endif
+/*
+ * If we can't allocate a page to make a big batch of page pointers
+ * to work on, then just handle a few from the on-stack structure.
+ */
+#define IA64_GATHER_BUNDLE 8
+
struct mmu_gather {
struct mm_struct *mm;
unsigned int nr; /* == ~0U => fast mode */
+ unsigned int max;
unsigned char fullmm; /* non-zero means full mm flush */
unsigned char need_flush; /* really unmapped some PTEs? */
unsigned long start_addr;
unsigned long end_addr;
- struct page *pages[FREE_PTE_NR];
+ struct page **pages;
+ struct page *local[IA64_GATHER_BUNDLE];
};
struct ia64_tr_entry {
@@ -90,9 +96,6 @@ extern struct ia64_tr_entry *ia64_idtrs[NR_CPUS];
#define RR_RID_MASK 0x00000000ffffff00L
#define RR_TO_RID(val) ((val >> 8) & 0xffffff)
-/* Users of the generic TLB shootdown code must declare this storage space. */
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
/*
* Flush the TLB for address range START to END and, if not in fast mode, release the
* freed pages that where gathered up to this point.
@@ -147,15 +150,23 @@ ia64_tlb_flush_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long e
}
}
-/*
- * Return a pointer to an initialized struct mmu_gather.
- */
-static inline struct mmu_gather *
-tlb_gather_mmu (struct mm_struct *mm, unsigned int full_mm_flush)
+static inline void __tlb_alloc_page(struct mmu_gather *tlb)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
+ unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
+
+ if (addr) {
+ tlb->pages = (void *)addr;
+ tlb->max = PAGE_SIZE / sizeof(void *);
+ }
+}
+
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
+{
tlb->mm = mm;
+ tlb->max = ARRAY_SIZE(tlb->local);
+ tlb->pages = tlb->local;
/*
* Use fast mode if only 1 CPU is online.
*
@@ -172,7 +183,6 @@ tlb_gather_mmu (struct mm_struct *mm, unsigned int full_mm_flush)
tlb->nr = (num_online_cpus() == 1) ? ~0U : 0;
tlb->fullmm = full_mm_flush;
tlb->start_addr = ~0UL;
- return tlb;
}
/*
@@ -180,7 +190,7 @@ tlb_gather_mmu (struct mm_struct *mm, unsigned int full_mm_flush)
* collected.
*/
static inline void
-tlb_finish_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
+tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
{
/*
* Note: tlb->nr may be 0 at this point, so we can't rely on tlb->start_addr and
@@ -191,7 +201,8 @@ tlb_finish_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
/* keep the page table cache within bounds */
check_pgt_cache();
- put_cpu_var(mmu_gathers);
+ if (tlb->pages != tlb->local)
+ free_pages((unsigned long)tlb->pages, 0);
}
/*
@@ -208,8 +219,12 @@ tlb_remove_page (struct mmu_gather *tlb, struct page *page)
free_page_and_swap_cache(page);
return;
}
+
+ if (!tlb->nr && tlb->pages == tlb->local)
+ __tlb_alloc_page(tlb);
+
tlb->pages[tlb->nr++] = page;
- if (tlb->nr >= FREE_PTE_NR)
+ if (tlb->nr >= tlb->max)
ia64_tlb_flush_mmu(tlb, tlb->start_addr, tlb->end_addr);
}
^ permalink raw reply related [flat|nested] 128+ messages in thread
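(For illustration only: the batching pattern discussed in the message above --
try to allocate a whole page of pointers and fall back to a small on-stack
bundle when that fails -- can be sketched as a standalone userspace C program.
All names below are made up for the sketch; this is not the ia64 or generic
kernel code itself, just the control flow it follows.)

/*
 * Illustrative userspace analogue of the gather pattern above: batch
 * pointers into a page-sized array when we can get one, otherwise fall
 * back to a small on-stack bundle (8 entries, like IA64_GATHER_BUNDLE).
 * Hypothetical names; not the kernel API.
 */
#include <stdio.h>
#include <stdlib.h>

#define LOCAL_BUNDLE	8
#define PAGE_BYTES	4096

struct gather {
	void **slots;			/* points at local[] or a heap page */
	unsigned int nr, max;
	void *local[LOCAL_BUNDLE];
};

static void gather_init(struct gather *g)
{
	g->slots = g->local;
	g->max = LOCAL_BUNDLE;
	g->nr = 0;
}

static void gather_flush(struct gather *g)
{
	for (unsigned int i = 0; i < g->nr; i++)
		free(g->slots[i]);	/* stand-in for freeing the gathered page */
	g->nr = 0;
}

static void gather_add(struct gather *g, void *p)
{
	/* Try to upgrade to a page-sized batch on first use (and after each flush). */
	if (!g->nr && g->slots == g->local) {
		void **page = malloc(PAGE_BYTES);
		if (page) {		/* allocation may fail; keep the small bundle */
			g->slots = page;
			g->max = PAGE_BYTES / sizeof(void *);
		}
	}
	g->slots[g->nr++] = p;
	if (g->nr >= g->max)		/* batch full: flush early, like tlb_flush_mmu() */
		gather_flush(g);
}

static void gather_finish(struct gather *g)
{
	gather_flush(g);
	if (g->slots != g->local)
		free(g->slots);
}

int main(void)
{
	struct gather g;

	gather_init(&g);
	for (int i = 0; i < 10000; i++)
		gather_add(&g, malloc(32));
	gather_finish(&g);
	puts("done");
	return 0;
}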
* Re: [PATCH 09/25] ia64: Preemptible mmu_gather
2011-01-25 21:23 ` Tony Luck
@ 2011-01-26 11:01 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-26 11:01 UTC (permalink / raw)
To: tony.luck
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds, linux-kernel, linux-arch,
linux-mm, Benjamin Herrenschmidt, David Miller, Hugh Dickins,
Mel Gorman, Nick Piggin, Paul McKenney, Yanmin Zhang
On Tue, 2011-01-25 at 13:23 -0800, tony.luck@intel.com wrote:
> Okay ... then could you swap out your part 09/25 for this version that
> has a #define and a comment. You can add
>
> Acked-by: Tony Luck <tony.luck@intel.com>
>
Thanks Tony!
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 10/25] mm: Now that all old mmu_gather code is gone, remove the storage
2011-01-25 17:31 ` Peter Zijlstra
(?)
@ 2011-01-25 17:31 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang
[-- Attachment #1: arch-remove-mmu_gather.patch --]
[-- Type: text/plain, Size: 8855 bytes --]
XXX fold all the mmu_gather preempt patches into one for submission
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/alpha/mm/init.c | 2 --
arch/arm/mm/mmu.c | 2 --
arch/avr32/mm/init.c | 2 --
arch/cris/mm/init.c | 2 --
arch/frv/mm/init.c | 2 --
arch/ia64/mm/init.c | 2 --
arch/m32r/mm/init.c | 2 --
arch/m68k/mm/init.c | 2 --
arch/microblaze/mm/init.c | 2 --
arch/mips/mm/init.c | 2 --
arch/mn10300/mm/init.c | 2 --
arch/parisc/mm/init.c | 2 --
arch/s390/mm/pgtable.c | 1 -
arch/score/mm/init.c | 2 --
arch/sh/mm/init.c | 1 -
arch/sparc/mm/init_32.c | 2 --
arch/tile/mm/init.c | 2 --
arch/um/kernel/smp.c | 3 ---
arch/x86/mm/init.c | 2 --
arch/xtensa/mm/mmu.c | 2 --
20 files changed, 39 deletions(-)
Index: linux-2.6/arch/alpha/mm/init.c
===================================================================
--- linux-2.6.orig/arch/alpha/mm/init.c
+++ linux-2.6/arch/alpha/mm/init.c
@@ -32,8 +32,6 @@
#include <asm/console.h>
#include <asm/tlb.h>
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
extern void die_if_kernel(char *,struct pt_regs *,long);
static struct pcb_struct original_pcb;
Index: linux-2.6/arch/arm/mm/mmu.c
===================================================================
--- linux-2.6.orig/arch/arm/mm/mmu.c
+++ linux-2.6/arch/arm/mm/mmu.c
@@ -31,8 +31,6 @@
#include "mm.h"
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
/*
* empty_zero_page is a special page that is used for
* zero-initialized data and COW.
Index: linux-2.6/arch/avr32/mm/init.c
===================================================================
--- linux-2.6.orig/arch/avr32/mm/init.c
+++ linux-2.6/arch/avr32/mm/init.c
@@ -25,8 +25,6 @@
#include <asm/setup.h>
#include <asm/sections.h>
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_data;
struct page *empty_zero_page;
Index: linux-2.6/arch/cris/mm/init.c
===================================================================
--- linux-2.6.orig/arch/cris/mm/init.c
+++ linux-2.6/arch/cris/mm/init.c
@@ -13,8 +13,6 @@
#include <linux/bootmem.h>
#include <asm/tlb.h>
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
unsigned long empty_zero_page;
extern char _stext, _edata, _etext; /* From linkerscript */
Index: linux-2.6/arch/frv/mm/init.c
===================================================================
--- linux-2.6.orig/arch/frv/mm/init.c
+++ linux-2.6/arch/frv/mm/init.c
@@ -41,8 +41,6 @@
#undef DEBUG
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
/*
* BAD_PAGE is the page that is used for page faults when linux
* is out-of-memory. Older versions of linux just did a
Index: linux-2.6/arch/ia64/mm/init.c
===================================================================
--- linux-2.6.orig/arch/ia64/mm/init.c
+++ linux-2.6/arch/ia64/mm/init.c
@@ -36,8 +36,6 @@
#include <asm/mca.h>
#include <asm/paravirt.h>
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
extern void ia64_tlb_init (void);
unsigned long MAX_DMA_ADDRESS = PAGE_OFFSET + 0x100000000UL;
Index: linux-2.6/arch/m32r/mm/init.c
===================================================================
--- linux-2.6.orig/arch/m32r/mm/init.c
+++ linux-2.6/arch/m32r/mm/init.c
@@ -35,8 +35,6 @@ extern char __init_begin, __init_end;
pgd_t swapper_pg_dir[1024];
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
/*
* Cache of MMU context last used.
*/
Index: linux-2.6/arch/m68k/mm/init.c
===================================================================
--- linux-2.6.orig/arch/m68k/mm/init.c
+++ linux-2.6/arch/m68k/mm/init.c
@@ -32,8 +32,6 @@
#include <asm/sections.h>
#include <asm/tlb.h>
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
pg_data_t pg_data_map[MAX_NUMNODES];
EXPORT_SYMBOL(pg_data_map);
Index: linux-2.6/arch/microblaze/mm/init.c
===================================================================
--- linux-2.6.orig/arch/microblaze/mm/init.c
+++ linux-2.6/arch/microblaze/mm/init.c
@@ -32,8 +32,6 @@ unsigned int __page_offset;
EXPORT_SYMBOL(__page_offset);
#else
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
static int init_bootmem_done;
#endif /* CONFIG_MMU */
Index: linux-2.6/arch/mips/mm/init.c
===================================================================
--- linux-2.6.orig/arch/mips/mm/init.c
+++ linux-2.6/arch/mips/mm/init.c
@@ -64,8 +64,6 @@
#endif /* CONFIG_MIPS_MT_SMTC */
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
/*
* We have up to 8 empty zeroed pages so we can map one of the right colour
* when needed. This is necessary only on R4000 / R4400 SC and MC versions
Index: linux-2.6/arch/mn10300/mm/init.c
===================================================================
--- linux-2.6.orig/arch/mn10300/mm/init.c
+++ linux-2.6/arch/mn10300/mm/init.c
@@ -37,8 +37,6 @@
#include <asm/tlb.h>
#include <asm/sections.h>
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
unsigned long highstart_pfn, highend_pfn;
#ifdef CONFIG_MN10300_HAS_ATOMIC_OPS_UNIT
Index: linux-2.6/arch/parisc/mm/init.c
===================================================================
--- linux-2.6.orig/arch/parisc/mm/init.c
+++ linux-2.6/arch/parisc/mm/init.c
@@ -31,8 +31,6 @@
#include <asm/mmzone.h>
#include <asm/sections.h>
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
extern int data_start;
#ifdef CONFIG_DISCONTIGMEM
Index: linux-2.6/arch/s390/mm/pgtable.c
===================================================================
--- linux-2.6.orig/arch/s390/mm/pgtable.c
+++ linux-2.6/arch/s390/mm/pgtable.c
@@ -36,7 +36,6 @@ struct rcu_table_freelist {
((PAGE_SIZE - sizeof(struct rcu_table_freelist)) \
/ sizeof(unsigned long))
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
static DEFINE_PER_CPU(struct rcu_table_freelist *, rcu_table_freelist);
static void __page_table_free(struct mm_struct *mm, unsigned long *table);
Index: linux-2.6/arch/score/mm/init.c
===================================================================
--- linux-2.6.orig/arch/score/mm/init.c
+++ linux-2.6/arch/score/mm/init.c
@@ -38,8 +38,6 @@
#include <asm/sections.h>
#include <asm/tlb.h>
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
unsigned long empty_zero_page;
EXPORT_SYMBOL_GPL(empty_zero_page);
Index: linux-2.6/arch/sh/mm/init.c
===================================================================
--- linux-2.6.orig/arch/sh/mm/init.c
+++ linux-2.6/arch/sh/mm/init.c
@@ -28,7 +28,6 @@
#include <asm/cache.h>
#include <asm/sizes.h>
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
pgd_t swapper_pg_dir[PTRS_PER_PGD];
void __init generic_mem_init(void)
Index: linux-2.6/arch/sparc/mm/init_32.c
===================================================================
--- linux-2.6.orig/arch/sparc/mm/init_32.c
+++ linux-2.6/arch/sparc/mm/init_32.c
@@ -37,8 +37,6 @@
#include <asm/prom.h>
#include <asm/leon.h>
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
unsigned long *sparc_valid_addr_bitmap;
EXPORT_SYMBOL(sparc_valid_addr_bitmap);
Index: linux-2.6/arch/tile/mm/init.c
===================================================================
--- linux-2.6.orig/arch/tile/mm/init.c
+++ linux-2.6/arch/tile/mm/init.c
@@ -71,8 +71,6 @@
unsigned long VMALLOC_RESERVE = CONFIG_VMALLOC_RESERVE;
#endif
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
/* Create an L2 page table */
static pte_t * __init alloc_pte(void)
{
Index: linux-2.6/arch/um/kernel/smp.c
===================================================================
--- linux-2.6.orig/arch/um/kernel/smp.c
+++ linux-2.6/arch/um/kernel/smp.c
@@ -7,9 +7,6 @@
#include "asm/pgalloc.h"
#include "asm/tlb.h"
-/* For some reason, mmu_gathers are referenced when CONFIG_SMP is off. */
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
#ifdef CONFIG_SMP
#include "linux/sched.h"
Index: linux-2.6/arch/x86/mm/init.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init.c
+++ linux-2.6/arch/x86/mm/init.c
@@ -16,8 +16,6 @@
#include <asm/tlb.h>
#include <asm/proto.h>
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
unsigned long __initdata e820_table_start;
unsigned long __meminitdata e820_table_end;
unsigned long __meminitdata e820_table_top;
Index: linux-2.6/arch/xtensa/mm/mmu.c
===================================================================
--- linux-2.6.orig/arch/xtensa/mm/mmu.c
+++ linux-2.6/arch/xtensa/mm/mmu.c
@@ -14,8 +14,6 @@
#include <asm/mmu_context.h>
#include <asm/page.h>
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
void __init paging_init(void)
{
memset(swapper_pg_dir, 0, PAGE_SIZE);
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 11/25] mm, powerpc: Move the RCU page-table freeing into generic code
2011-01-25 17:31 ` Peter Zijlstra
(?)
@ 2011-01-25 17:31 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang
[-- Attachment #1: peter_zijlstra-mm_powerpc-move_the_rcu_page-table_freeing_into.patch --]
[-- Type: text/plain, Size: 12550 bytes --]
In case other architectures require RCU-freed page tables to implement
gup_fast() and software-filled hashes and similar things, provide the
means to do so by moving the logic into generic code.
Requested-by: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/Kconfig | 3 +
arch/powerpc/Kconfig | 1
arch/powerpc/include/asm/pgalloc.h | 21 ++++++-
arch/powerpc/include/asm/tlb.h | 10 ---
arch/powerpc/mm/pgtable.c | 98 -------------------------------------
arch/powerpc/mm/tlb_hash32.c | 3 -
arch/powerpc/mm/tlb_hash64.c | 3 -
arch/powerpc/mm/tlb_nohash.c | 3 -
include/asm-generic/tlb.h | 57 +++++++++++++++++++--
mm/memory.c | 77 +++++++++++++++++++++++++++++
10 files changed, 151 insertions(+), 125 deletions(-)
Index: linux-2.6/arch/powerpc/include/asm/pgalloc.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/pgalloc.h
+++ linux-2.6/arch/powerpc/include/asm/pgalloc.h
@@ -31,14 +31,29 @@ static inline void pte_free(struct mm_st
#endif
#ifdef CONFIG_SMP
-extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, unsigned shift);
-extern void pte_free_finish(struct mmu_gather *tlb);
+struct mmu_gather;
+extern void tlb_remove_table(struct mmu_gather *, void *);
+
+static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift)
+{
+ unsigned long pgf = (unsigned long)table;
+ BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
+ pgf |= shift;
+ tlb_remove_table(tlb, (void *)pgf);
+}
+
+static inline void __tlb_remove_table(void *_table)
+{
+ void *table = (void *)((unsigned long)_table & ~MAX_PGTABLE_INDEX_SIZE);
+ unsigned shift = (unsigned long)_table & MAX_PGTABLE_INDEX_SIZE;
+
+ pgtable_free(table, shift);
+}
#else /* CONFIG_SMP */
static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table, unsigned shift)
{
pgtable_free(table, shift);
}
-static inline void pte_free_finish(struct mmu_gather *tlb) { }
#endif /* !CONFIG_SMP */
static inline void __pte_free_tlb(struct mmu_gather *tlb, struct page *ptepage,
Index: linux-2.6/arch/powerpc/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/tlb.h
+++ linux-2.6/arch/powerpc/include/asm/tlb.h
@@ -28,16 +28,6 @@
#define tlb_start_vma(tlb, vma) do { } while (0)
#define tlb_end_vma(tlb, vma) do { } while (0)
-#define HAVE_ARCH_MMU_GATHER 1
-
-struct pte_freelist_batch;
-
-struct arch_mmu_gather {
- struct pte_freelist_batch *batch;
-};
-
-#define ARCH_MMU_GATHER_INIT (struct arch_mmu_gather){ .batch = NULL, }
-
extern void tlb_flush(struct mmu_gather *tlb);
/* Get the generic bits... */
Index: linux-2.6/arch/powerpc/mm/pgtable.c
===================================================================
--- linux-2.6.orig/arch/powerpc/mm/pgtable.c
+++ linux-2.6/arch/powerpc/mm/pgtable.c
@@ -33,104 +33,6 @@
#include "mmu_decl.h"
-#ifdef CONFIG_SMP
-
-/*
- * Handle batching of page table freeing on SMP. Page tables are
- * queued up and send to be freed later by RCU in order to avoid
- * freeing a page table page that is being walked without locks
- */
-
-static unsigned long pte_freelist_forced_free;
-
-struct pte_freelist_batch
-{
- struct rcu_head rcu;
- unsigned int index;
- unsigned long tables[0];
-};
-
-#define PTE_FREELIST_SIZE \
- ((PAGE_SIZE - sizeof(struct pte_freelist_batch)) \
- / sizeof(unsigned long))
-
-static void pte_free_smp_sync(void *arg)
-{
- /* Do nothing, just ensure we sync with all CPUs */
-}
-
-/* This is only called when we are critically out of memory
- * (and fail to get a page in pte_free_tlb).
- */
-static void pgtable_free_now(void *table, unsigned shift)
-{
- pte_freelist_forced_free++;
-
- smp_call_function(pte_free_smp_sync, NULL, 1);
-
- pgtable_free(table, shift);
-}
-
-static void pte_free_rcu_callback(struct rcu_head *head)
-{
- struct pte_freelist_batch *batch =
- container_of(head, struct pte_freelist_batch, rcu);
- unsigned int i;
-
- for (i = 0; i < batch->index; i++) {
- void *table = (void *)(batch->tables[i] & ~MAX_PGTABLE_INDEX_SIZE);
- unsigned shift = batch->tables[i] & MAX_PGTABLE_INDEX_SIZE;
-
- pgtable_free(table, shift);
- }
-
- free_page((unsigned long)batch);
-}
-
-static void pte_free_submit(struct pte_freelist_batch *batch)
-{
- call_rcu_sched(&batch->rcu, pte_free_rcu_callback);
-}
-
-void pgtable_free_tlb(struct mmu_gather *tlb, void *table, unsigned shift)
-{
- struct pte_freelist_batch **batchp = &tlb->arch.batch;
- unsigned long pgf;
-
- if (atomic_read(&tlb->mm->mm_users) < 2) {
- pgtable_free(table, shift);
- return;
- }
-
- if (*batchp == NULL) {
- *batchp = (struct pte_freelist_batch *)__get_free_page(GFP_ATOMIC);
- if (*batchp == NULL) {
- pgtable_free_now(table, shift);
- return;
- }
- (*batchp)->index = 0;
- }
- BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
- pgf = (unsigned long)table | shift;
- (*batchp)->tables[(*batchp)->index++] = pgf;
- if ((*batchp)->index == PTE_FREELIST_SIZE) {
- pte_free_submit(*batchp);
- *batchp = NULL;
- }
-}
-
-void pte_free_finish(struct mmu_gather *tlb)
-{
- struct pte_freelist_batch **batchp = &tlb->arch.batch;
-
- if (*batchp == NULL)
- return;
- pte_free_submit(*batchp);
- *batchp = NULL;
-}
-
-#endif /* CONFIG_SMP */
-
static inline int is_exec_fault(void)
{
return current->thread.regs && TRAP(current->thread.regs) == 0x400;
Index: linux-2.6/arch/powerpc/mm/tlb_hash32.c
===================================================================
--- linux-2.6.orig/arch/powerpc/mm/tlb_hash32.c
+++ linux-2.6/arch/powerpc/mm/tlb_hash32.c
@@ -71,9 +71,6 @@ void tlb_flush(struct mmu_gather *tlb)
*/
_tlbia();
}
-
- /* Push out batch of freed page tables */
- pte_free_finish(tlb);
}
/*
Index: linux-2.6/arch/powerpc/mm/tlb_hash64.c
===================================================================
--- linux-2.6.orig/arch/powerpc/mm/tlb_hash64.c
+++ linux-2.6/arch/powerpc/mm/tlb_hash64.c
@@ -165,9 +165,6 @@ void tlb_flush(struct mmu_gather *tlb)
__flush_tlb_pending(tlbbatch);
put_cpu_var(ppc64_tlb_batch);
-
- /* Push out batch of freed page tables */
- pte_free_finish(tlb);
}
/**
Index: linux-2.6/arch/powerpc/mm/tlb_nohash.c
===================================================================
--- linux-2.6.orig/arch/powerpc/mm/tlb_nohash.c
+++ linux-2.6/arch/powerpc/mm/tlb_nohash.c
@@ -299,9 +299,6 @@ EXPORT_SYMBOL(flush_tlb_range);
void tlb_flush(struct mmu_gather *tlb)
{
flush_tlb_mm(tlb->mm);
-
- /* Push out batch of freed page tables */
- pte_free_finish(tlb);
}
/*
Index: linux-2.6/include/asm-generic/tlb.h
===================================================================
--- linux-2.6.orig/include/asm-generic/tlb.h
+++ linux-2.6/include/asm-generic/tlb.h
@@ -27,6 +27,49 @@
#define tlb_fast_mode(tlb) 1
#endif
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+/*
+ * Semi RCU freeing of the page directories.
+ *
+ * This is needed by some architectures to implement software pagetable walkers.
+ *
+ * gup_fast() and other software pagetable walkers do a lockless page-table
+ * walk and therefore need some synchronization with the freeing of the page
+ * directories. The chosen means to accomplish that is by disabling IRQs over
+ * the walk.
+ *
+ * Architectures that use IPIs to flush TLBs will then automagically DTRT,
+ * since we unlink the page, flush TLBs, free the page. Since the disabling of
+ * IRQs delays the completion of the TLB flush we can never observe an already
+ * freed page.
+ *
+ * Architectures that do not have this (PPC) need to delay the freeing by some
+ * other means, this is that means.
+ *
+ * What we do is batch the freed directory pages (tables) and RCU free them.
+ * We use the sched RCU variant, as that guarantees that IRQ/preempt disabling
+ * holds off grace periods.
+ *
+ * However, in order to batch these pages we need to allocate storage; this
+ * allocation is deep inside the MM code and can thus easily fail on memory
+ * pressure. To guarantee progress we fall back to single table freeing, see
+ * the implementation of tlb_remove_table_one().
+ *
+ */
+struct mmu_table_batch {
+ struct rcu_head rcu;
+ unsigned int nr;
+ void *tables[0];
+};
+
+#define MAX_TABLE_BATCH \
+ ((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *))
+
+extern void tlb_table_flush(struct mmu_gather *tlb);
+extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
+
+#endif
+
/* struct mmu_gather is an opaque type used by the mm code for passing around
* any data needed by arch specific code for tlb_remove_page.
*/
@@ -36,11 +79,12 @@ struct mmu_gather {
unsigned int max; /* nr < max */
unsigned int need_flush;/* Really unmapped some ptes? */
unsigned int fullmm; /* non-zero means full mm flush */
-#ifdef HAVE_ARCH_MMU_GATHER
- struct arch_mmu_gather arch;
-#endif
struct page **pages;
struct page *local[8];
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+ struct mmu_table_batch *batch;
+#endif
};
static inline void __tlb_alloc_page(struct mmu_gather *tlb)
@@ -72,8 +116,8 @@ tlb_gather_mmu(struct mmu_gather *tlb, s
tlb->fullmm = full_mm_flush;
-#ifdef HAVE_ARCH_MMU_GATHER
- tlb->arch = ARCH_MMU_GATHER_INIT;
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+ tlb->batch = NULL;
#endif
}
@@ -84,6 +128,9 @@ tlb_flush_mmu(struct mmu_gather *tlb, un
return;
tlb->need_flush = 0;
tlb_flush(tlb);
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+ tlb_table_flush(tlb);
+#endif
if (!tlb_fast_mode(tlb)) {
free_pages_and_swap_cache(tlb->pages, tlb->nr);
tlb->nr = 0;
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -193,6 +193,83 @@ static void check_sync_rss_stat(struct t
#endif
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+
+/*
+ * See the comment near struct mmu_table_batch.
+ */
+
+static void tlb_remove_table_smp_sync(void *arg)
+{
+ /* Simply deliver the interrupt */
+}
+
+static void tlb_remove_table_one(void *table)
+{
+ /*
+ * This isn't an RCU grace period and hence the page-tables cannot be
+ * assumed to be actually RCU-freed.
+ *
+ * It is however sufficient for software page-table walkers that rely on
+ * IRQ disabling. See the comment near struct mmu_table_batch.
+ */
+ smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
+ __tlb_remove_table(table);
+}
+
+static void tlb_remove_table_rcu(struct rcu_head *head)
+{
+ struct mmu_table_batch *batch;
+ int i;
+
+ batch = container_of(head, struct mmu_table_batch, rcu);
+
+ for (i = 0; i < batch->nr; i++)
+ __tlb_remove_table(batch->tables[i]);
+
+ free_page((unsigned long)batch);
+}
+
+void tlb_table_flush(struct mmu_gather *tlb)
+{
+ struct mmu_table_batch **batch = &tlb->batch;
+
+ if (*batch) {
+ call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
+ *batch = NULL;
+ }
+}
+
+void tlb_remove_table(struct mmu_gather *tlb, void *table)
+{
+ struct mmu_table_batch **batch = &tlb->batch;
+
+ tlb->need_flush = 1;
+
+ /*
+ * When there's less than two users of this mm there cannot be a
+ * concurrent page-table walk.
+ */
+ if (atomic_read(&tlb->mm->mm_users) < 2) {
+ __tlb_remove_table(table);
+ return;
+ }
+
+ if (*batch == NULL) {
+ *batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
+ if (*batch == NULL) {
+ tlb_remove_table_one(table);
+ return;
+ }
+ (*batch)->nr = 0;
+ }
+ (*batch)->tables[(*batch)->nr++] = table;
+ if ((*batch)->nr == MAX_TABLE_BATCH)
+ tlb_table_flush(tlb);
+}
+
+#endif
+
/*
* If a p?d_bad entry is found while walking page tables, report
* the error, before resetting entry to p?d_none. Usually (but
Index: linux-2.6/arch/Kconfig
===================================================================
--- linux-2.6.orig/arch/Kconfig
+++ linux-2.6/arch/Kconfig
@@ -178,4 +178,7 @@ config HAVE_ARCH_JUMP_LABEL
config HAVE_ARCH_MUTEX_CPU_RELAX
bool
+config HAVE_RCU_TABLE_FREE
+ bool
+
source "kernel/gcov/Kconfig"
Index: linux-2.6/arch/powerpc/Kconfig
===================================================================
--- linux-2.6.orig/arch/powerpc/Kconfig
+++ linux-2.6/arch/powerpc/Kconfig
@@ -134,6 +134,7 @@ config PPC
select HAVE_GENERIC_HARDIRQS
select HAVE_SPARSE_IRQ
select IRQ_PER_CPU
+ select HAVE_RCU_TABLE_FREE if SMP
config EARLY_PRINTK
bool
^ permalink raw reply [flat|nested] 128+ messages in thread
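(To make the control flow of tlb_remove_table()/tlb_table_flush() in the patch
above easier to follow, here is a minimal userspace sketch of the same idea:
queue table pointers into a page-sized batch, hand a full batch to a deferred
free step, and fall back to freeing a single entry synchronously when the batch
page cannot be allocated. The deferred step is only simulated -- there is no
real RCU grace period or mm_users short-circuit in this sketch -- and all names
are illustrative, not the kernel API.)

/*
 * Userspace analogue of the generic RCU table-freeing logic above.
 * defer_free_batch() stands in for call_rcu_sched(); in the kernel the
 * batch is only freed after a grace period, here it is freed directly.
 */
#include <stdio.h>
#include <stdlib.h>

#define PAGE_BYTES 4096

struct table_batch {
	unsigned int nr;
	void *tables[];			/* flexible array, fills the rest of the page */
};

#define MAX_BATCH \
	((PAGE_BYTES - sizeof(struct table_batch)) / sizeof(void *))

static void free_table(void *table)
{
	free(table);			/* stand-in for pgtable_free() */
}

/* Stand-in for the RCU callback: free every queued table, then the batch page. */
static void defer_free_batch(struct table_batch *batch)
{
	for (unsigned int i = 0; i < batch->nr; i++)
		free_table(batch->tables[i]);
	free(batch);
}

static void batch_flush(struct table_batch **batchp)
{
	if (*batchp) {
		defer_free_batch(*batchp);
		*batchp = NULL;
	}
}

static void remove_table(struct table_batch **batchp, void *table)
{
	if (*batchp == NULL) {
		*batchp = malloc(PAGE_BYTES);	/* may fail under memory pressure */
		if (*batchp == NULL) {
			free_table(table);	/* single-entry fallback keeps progress */
			return;
		}
		(*batchp)->nr = 0;
	}
	(*batchp)->tables[(*batchp)->nr++] = table;
	if ((*batchp)->nr == MAX_BATCH)
		batch_flush(batchp);
}

int main(void)
{
	struct table_batch *batch = NULL;

	for (int i = 0; i < 2000; i++)
		remove_table(&batch, malloc(64));
	batch_flush(&batch);
	puts("done");
	return 0;
}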
* [PATCH 11/25] mm, powerpc: Move the RCU page-table freeing into generic code
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang
[-- Attachment #1: peter_zijlstra-mm_powerpc-move_the_rcu_page-table_freeing_into.patch --]
[-- Type: text/plain, Size: 12846 bytes --]
In case other architectures require RCU freed page-tables to implement
gup_fast() and software filled hashes and similar things, provide the
means to do so by moving the logic into generic code.
Requested-by: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/Kconfig | 3 +
arch/powerpc/Kconfig | 1
arch/powerpc/include/asm/pgalloc.h | 21 ++++++-
arch/powerpc/include/asm/tlb.h | 10 ---
arch/powerpc/mm/pgtable.c | 98 -------------------------------------
arch/powerpc/mm/tlb_hash32.c | 3 -
arch/powerpc/mm/tlb_hash64.c | 3 -
arch/powerpc/mm/tlb_nohash.c | 3 -
include/asm-generic/tlb.h | 57 +++++++++++++++++++--
mm/memory.c | 77 +++++++++++++++++++++++++++++
10 files changed, 151 insertions(+), 125 deletions(-)
Index: linux-2.6/arch/powerpc/include/asm/pgalloc.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/pgalloc.h
+++ linux-2.6/arch/powerpc/include/asm/pgalloc.h
@@ -31,14 +31,29 @@ static inline void pte_free(struct mm_st
#endif
#ifdef CONFIG_SMP
-extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, unsigned shift);
-extern void pte_free_finish(struct mmu_gather *tlb);
+struct mmu_gather;
+extern void tlb_remove_table(struct mmu_gather *, void *);
+
+static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift)
+{
+ unsigned long pgf = (unsigned long)table;
+ BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
+ pgf |= shift;
+ tlb_remove_table(tlb, (void *)pgf);
+}
+
+static inline void __tlb_remove_table(void *_table)
+{
+ void *table = (void *)((unsigned long)_table & ~MAX_PGTABLE_INDEX_SIZE);
+ unsigned shift = (unsigned long)_table & MAX_PGTABLE_INDEX_SIZE;
+
+ pgtable_free(table, shift);
+}
#else /* CONFIG_SMP */
static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table, unsigned shift)
{
pgtable_free(table, shift);
}
-static inline void pte_free_finish(struct mmu_gather *tlb) { }
#endif /* !CONFIG_SMP */
static inline void __pte_free_tlb(struct mmu_gather *tlb, struct page *ptepage,
Index: linux-2.6/arch/powerpc/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/tlb.h
+++ linux-2.6/arch/powerpc/include/asm/tlb.h
@@ -28,16 +28,6 @@
#define tlb_start_vma(tlb, vma) do { } while (0)
#define tlb_end_vma(tlb, vma) do { } while (0)
-#define HAVE_ARCH_MMU_GATHER 1
-
-struct pte_freelist_batch;
-
-struct arch_mmu_gather {
- struct pte_freelist_batch *batch;
-};
-
-#define ARCH_MMU_GATHER_INIT (struct arch_mmu_gather){ .batch = NULL, }
-
extern void tlb_flush(struct mmu_gather *tlb);
/* Get the generic bits... */
Index: linux-2.6/arch/powerpc/mm/pgtable.c
===================================================================
--- linux-2.6.orig/arch/powerpc/mm/pgtable.c
+++ linux-2.6/arch/powerpc/mm/pgtable.c
@@ -33,104 +33,6 @@
#include "mmu_decl.h"
-#ifdef CONFIG_SMP
-
-/*
- * Handle batching of page table freeing on SMP. Page tables are
- * queued up and send to be freed later by RCU in order to avoid
- * freeing a page table page that is being walked without locks
- */
-
-static unsigned long pte_freelist_forced_free;
-
-struct pte_freelist_batch
-{
- struct rcu_head rcu;
- unsigned int index;
- unsigned long tables[0];
-};
-
-#define PTE_FREELIST_SIZE \
- ((PAGE_SIZE - sizeof(struct pte_freelist_batch)) \
- / sizeof(unsigned long))
-
-static void pte_free_smp_sync(void *arg)
-{
- /* Do nothing, just ensure we sync with all CPUs */
-}
-
-/* This is only called when we are critically out of memory
- * (and fail to get a page in pte_free_tlb).
- */
-static void pgtable_free_now(void *table, unsigned shift)
-{
- pte_freelist_forced_free++;
-
- smp_call_function(pte_free_smp_sync, NULL, 1);
-
- pgtable_free(table, shift);
-}
-
-static void pte_free_rcu_callback(struct rcu_head *head)
-{
- struct pte_freelist_batch *batch =
- container_of(head, struct pte_freelist_batch, rcu);
- unsigned int i;
-
- for (i = 0; i < batch->index; i++) {
- void *table = (void *)(batch->tables[i] & ~MAX_PGTABLE_INDEX_SIZE);
- unsigned shift = batch->tables[i] & MAX_PGTABLE_INDEX_SIZE;
-
- pgtable_free(table, shift);
- }
-
- free_page((unsigned long)batch);
-}
-
-static void pte_free_submit(struct pte_freelist_batch *batch)
-{
- call_rcu_sched(&batch->rcu, pte_free_rcu_callback);
-}
-
-void pgtable_free_tlb(struct mmu_gather *tlb, void *table, unsigned shift)
-{
- struct pte_freelist_batch **batchp = &tlb->arch.batch;
- unsigned long pgf;
-
- if (atomic_read(&tlb->mm->mm_users) < 2) {
- pgtable_free(table, shift);
- return;
- }
-
- if (*batchp == NULL) {
- *batchp = (struct pte_freelist_batch *)__get_free_page(GFP_ATOMIC);
- if (*batchp == NULL) {
- pgtable_free_now(table, shift);
- return;
- }
- (*batchp)->index = 0;
- }
- BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
- pgf = (unsigned long)table | shift;
- (*batchp)->tables[(*batchp)->index++] = pgf;
- if ((*batchp)->index == PTE_FREELIST_SIZE) {
- pte_free_submit(*batchp);
- *batchp = NULL;
- }
-}
-
-void pte_free_finish(struct mmu_gather *tlb)
-{
- struct pte_freelist_batch **batchp = &tlb->arch.batch;
-
- if (*batchp == NULL)
- return;
- pte_free_submit(*batchp);
- *batchp = NULL;
-}
-
-#endif /* CONFIG_SMP */
-
static inline int is_exec_fault(void)
{
return current->thread.regs && TRAP(current->thread.regs) == 0x400;
Index: linux-2.6/arch/powerpc/mm/tlb_hash32.c
===================================================================
--- linux-2.6.orig/arch/powerpc/mm/tlb_hash32.c
+++ linux-2.6/arch/powerpc/mm/tlb_hash32.c
@@ -71,9 +71,6 @@ void tlb_flush(struct mmu_gather *tlb)
*/
_tlbia();
}
-
- /* Push out batch of freed page tables */
- pte_free_finish(tlb);
}
/*
Index: linux-2.6/arch/powerpc/mm/tlb_hash64.c
===================================================================
--- linux-2.6.orig/arch/powerpc/mm/tlb_hash64.c
+++ linux-2.6/arch/powerpc/mm/tlb_hash64.c
@@ -165,9 +165,6 @@ void tlb_flush(struct mmu_gather *tlb)
__flush_tlb_pending(tlbbatch);
put_cpu_var(ppc64_tlb_batch);
-
- /* Push out batch of freed page tables */
- pte_free_finish(tlb);
}
/**
Index: linux-2.6/arch/powerpc/mm/tlb_nohash.c
===================================================================
--- linux-2.6.orig/arch/powerpc/mm/tlb_nohash.c
+++ linux-2.6/arch/powerpc/mm/tlb_nohash.c
@@ -299,9 +299,6 @@ EXPORT_SYMBOL(flush_tlb_range);
void tlb_flush(struct mmu_gather *tlb)
{
flush_tlb_mm(tlb->mm);
-
- /* Push out batch of freed page tables */
- pte_free_finish(tlb);
}
/*
Index: linux-2.6/include/asm-generic/tlb.h
===================================================================
--- linux-2.6.orig/include/asm-generic/tlb.h
+++ linux-2.6/include/asm-generic/tlb.h
@@ -27,6 +27,49 @@
#define tlb_fast_mode(tlb) 1
#endif
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+/*
+ * Semi RCU freeing of the page directories.
+ *
+ * This is needed by some architectures to implement software pagetable walkers.
+ *
+ * gup_fast() and other software pagetable walkers do a lockless page-table
+ * walk and therefore needs some synchronization with the freeing of the page
+ * directories. The chosen means to accomplish that is by disabling IRQs over
+ * the walk.
+ *
+ * Architectures that use IPIs to flush TLBs will then automagically DTRT,
+ * since we unlink the page, flush TLBs, free the page. Since the disabling of
+ * IRQs delays the completion of the TLB flush we can never observe an already
+ * freed page.
+ *
+ * Architectures that do not have this (PPC) need to delay the freeing by some
+ * other means, this is that means.
+ *
+ * What we do is batch the freed directory pages (tables) and RCU free them.
+ * We use the sched RCU variant, as that guarantees that IRQ/preempt disabling
+ * holds off grace periods.
+ *
+ * However, in order to batch these pages we need to allocate storage, this
+ * allocation is deep inside the MM code and can thus easily fail on memory
+ * pressure. To guarantee progress we fall back to single table freeing, see
+ * the implementation of tlb_remove_table_one().
+ *
+ */
+struct mmu_table_batch {
+ struct rcu_head rcu;
+ unsigned int nr;
+ void *tables[0];
+};
+
+#define MAX_TABLE_BATCH \
+ ((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *))
+
+extern void tlb_table_flush(struct mmu_gather *tlb);
+extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
+
+#endif
+
/* struct mmu_gather is an opaque type used by the mm code for passing around
* any data needed by arch specific code for tlb_remove_page.
*/
@@ -36,11 +79,12 @@ struct mmu_gather {
unsigned int max; /* nr < max */
unsigned int need_flush;/* Really unmapped some ptes? */
unsigned int fullmm; /* non-zero means full mm flush */
-#ifdef HAVE_ARCH_MMU_GATHER
- struct arch_mmu_gather arch;
-#endif
struct page **pages;
struct page *local[8];
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+ struct mmu_table_batch *batch;
+#endif
};
static inline void __tlb_alloc_page(struct mmu_gather *tlb)
@@ -72,8 +116,8 @@ tlb_gather_mmu(struct mmu_gather *tlb, s
tlb->fullmm = full_mm_flush;
-#ifdef HAVE_ARCH_MMU_GATHER
- tlb->arch = ARCH_MMU_GATHER_INIT;
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+ tlb->batch = NULL;
#endif
}
@@ -84,6 +128,9 @@ tlb_flush_mmu(struct mmu_gather *tlb, un
return;
tlb->need_flush = 0;
tlb_flush(tlb);
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+ tlb_table_flush(tlb);
+#endif
if (!tlb_fast_mode(tlb)) {
free_pages_and_swap_cache(tlb->pages, tlb->nr);
tlb->nr = 0;
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -193,6 +193,83 @@ static void check_sync_rss_stat(struct t
#endif
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+
+/*
+ * See the comment near struct mmu_table_batch.
+ */
+
+static void tlb_remove_table_smp_sync(void *arg)
+{
+ /* Simply deliver the interrupt */
+}
+
+static void tlb_remove_table_one(void *table)
+{
+ /*
+ * This isn't an RCU grace period and hence the page-tables cannot be
+ * assumed to be actually RCU-freed.
+ *
+ * It is however sufficient for software page-table walkers that rely on
+ * IRQ disabling. See the comment near struct mmu_table_batch.
+ */
+ smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
+ __tlb_remove_table(table);
+}
+
+static void tlb_remove_table_rcu(struct rcu_head *head)
+{
+ struct mmu_table_batch *batch;
+ int i;
+
+ batch = container_of(head, struct mmu_table_batch, rcu);
+
+ for (i = 0; i < batch->nr; i++)
+ __tlb_remove_table(batch->tables[i]);
+
+ free_page((unsigned long)batch);
+}
+
+void tlb_table_flush(struct mmu_gather *tlb)
+{
+ struct mmu_table_batch **batch = &tlb->batch;
+
+ if (*batch) {
+ call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
+ *batch = NULL;
+ }
+}
+
+void tlb_remove_table(struct mmu_gather *tlb, void *table)
+{
+ struct mmu_table_batch **batch = &tlb->batch;
+
+ tlb->need_flush = 1;
+
+ /*
+ * When there's less then two users of this mm there cannot be a
+ * concurrent page-table walk.
+ */
+ if (atomic_read(&tlb->mm->mm_users) < 2) {
+ __tlb_remove_table(table);
+ return;
+ }
+
+ if (*batch == NULL) {
+ *batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
+ if (*batch == NULL) {
+ tlb_remove_table_one(table);
+ return;
+ }
+ (*batch)->nr = 0;
+ }
+ (*batch)->tables[(*batch)->nr++] = table;
+ if ((*batch)->nr == MAX_TABLE_BATCH)
+ tlb_table_flush(tlb);
+}
+
+#endif
+
/*
* If a p?d_bad entry is found while walking page tables, report
* the error, before resetting entry to p?d_none. Usually (but
Index: linux-2.6/arch/Kconfig
===================================================================
--- linux-2.6.orig/arch/Kconfig
+++ linux-2.6/arch/Kconfig
@@ -178,4 +178,7 @@ config HAVE_ARCH_JUMP_LABEL
config HAVE_ARCH_MUTEX_CPU_RELAX
bool
+config HAVE_RCU_TABLE_FREE
+ bool
+
source "kernel/gcov/Kconfig"
Index: linux-2.6/arch/powerpc/Kconfig
===================================================================
--- linux-2.6.orig/arch/powerpc/Kconfig
+++ linux-2.6/arch/powerpc/Kconfig
@@ -134,6 +134,7 @@ config PPC
select HAVE_GENERIC_HARDIRQS
select HAVE_SPARSE_IRQ
select IRQ_PER_CPU
+ select HAVE_RCU_TABLE_FREE if SMP
config EARLY_PRINTK
bool
--
^ permalink raw reply [flat|nested] 128+ messages in thread
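For reference, the contract the generic code above expects from an architecture is small: select HAVE_RCU_TABLE_FREE in Kconfig and supply a __tlb_remove_table() that really frees one page-table page; the page-table freeing sites then hand their tables to tlb_remove_table() instead of freeing them directly. With 4K pages and 8-byte pointers, MAX_TABLE_BATCH works out to roughly 500 table pointers per batch page. The sketch below only illustrates the shape of the hook-up that the powerpc conversion follows; the arch glue shown here is hypothetical and not part of the posted series.
# arch/<arch>/Kconfig (hypothetical)
	select HAVE_RCU_TABLE_FREE if SMP
/*
 * Hypothetical arch glue, assuming pte pages were set up with
 * pgtable_page_ctor(); the generic code calls __tlb_remove_table()
 * either from the RCU-sched callback or, when the batch page cannot
 * be allocated (GFP_NOWAIT), synchronously after an IPI to all CPUs.
 */
static inline void __tlb_remove_table(void *table)
{
	pgtable_page_dtor(virt_to_page(table));
	free_page((unsigned long)table);
}
static inline void __pte_free_tlb(struct mmu_gather *tlb, struct page *ptepage,
				  unsigned long address)
{
	/* Defer the actual free until the grace period (or IPI) has passed. */
	tlb_remove_table(tlb, page_address(ptepage));
}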
* [PATCH 12/25] lockdep, mutex: Provide mutex_lock_nest_lock
@ 2011-01-25 17:31 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang
[-- Attachment #1: peter_zijlstra-lockdep_mutex-provide_mutex_lock_nest_lock.patch --]
[-- Type: text/plain, Size: 5442 bytes --]
Provide the mutex_lock_nest_lock() annotation.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/lockdep.h | 3 +++
include/linux/mutex.h | 9 +++++++++
kernel/mutex.c | 25 +++++++++++++++++--------
3 files changed, 29 insertions(+), 8 deletions(-)
Index: linux-2.6/include/linux/lockdep.h
===================================================================
--- linux-2.6.orig/include/linux/lockdep.h
+++ linux-2.6/include/linux/lockdep.h
@@ -495,12 +495,15 @@ static inline void print_irqtrace_events
#ifdef CONFIG_DEBUG_LOCK_ALLOC
# ifdef CONFIG_PROVE_LOCKING
# define mutex_acquire(l, s, t, i) lock_acquire(l, s, t, 0, 2, NULL, i)
+# define mutex_acquire_nest(l, s, t, n, i) lock_acquire(l, s, t, 0, 2, n, i)
# else
# define mutex_acquire(l, s, t, i) lock_acquire(l, s, t, 0, 1, NULL, i)
+# define mutex_acquire_nest(l, s, t, n, i) lock_acquire(l, s, t, 0, 1, n, i)
# endif
# define mutex_release(l, n, i) lock_release(l, n, i)
#else
# define mutex_acquire(l, s, t, i) do { } while (0)
+# define mutex_acquire_nest(l, s, t, n, i) do { } while (0)
# define mutex_release(l, n, i) do { } while (0)
#endif
Index: linux-2.6/include/linux/mutex.h
===================================================================
--- linux-2.6.orig/include/linux/mutex.h
+++ linux-2.6/include/linux/mutex.h
@@ -132,6 +132,7 @@ static inline int mutex_is_locked(struct
*/
#ifdef CONFIG_DEBUG_LOCK_ALLOC
extern void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
+extern void _mutex_lock_nest_lock(struct mutex *lock, struct lockdep_map *nest_lock);
extern int __must_check mutex_lock_interruptible_nested(struct mutex *lock,
unsigned int subclass);
extern int __must_check mutex_lock_killable_nested(struct mutex *lock,
@@ -140,6 +141,13 @@ extern int __must_check mutex_lock_killa
#define mutex_lock(lock) mutex_lock_nested(lock, 0)
#define mutex_lock_interruptible(lock) mutex_lock_interruptible_nested(lock, 0)
#define mutex_lock_killable(lock) mutex_lock_killable_nested(lock, 0)
+
+#define mutex_lock_nest_lock(lock, nest_lock) \
+do { \
+ typecheck(struct lockdep_map *, &(nest_lock)->dep_map); \
+ _mutex_lock_nest_lock(lock, &(nest_lock)->dep_map); \
+} while (0)
+
#else
extern void mutex_lock(struct mutex *lock);
extern int __must_check mutex_lock_interruptible(struct mutex *lock);
@@ -148,6 +156,7 @@ extern int __must_check mutex_lock_killa
# define mutex_lock_nested(lock, subclass) mutex_lock(lock)
# define mutex_lock_interruptible_nested(lock, subclass) mutex_lock_interruptible(lock)
# define mutex_lock_killable_nested(lock, subclass) mutex_lock_killable(lock)
+# define mutex_lock_nest_lock(lock, nest_lock) mutex_lock(lock)
#endif
/*
Index: linux-2.6/kernel/mutex.c
===================================================================
--- linux-2.6.orig/kernel/mutex.c
+++ linux-2.6/kernel/mutex.c
@@ -131,14 +131,14 @@ EXPORT_SYMBOL(mutex_unlock);
*/
static inline int __sched
__mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
- unsigned long ip)
+ struct lockdep_map *nest_lock, unsigned long ip)
{
struct task_struct *task = current;
struct mutex_waiter waiter;
unsigned long flags;
preempt_disable();
- mutex_acquire(&lock->dep_map, subclass, 0, ip);
+ mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
/*
@@ -276,16 +276,25 @@ void __sched
mutex_lock_nested(struct mutex *lock, unsigned int subclass)
{
might_sleep();
- __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, subclass, _RET_IP_);
+ __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, subclass, NULL, _RET_IP_);
}
EXPORT_SYMBOL_GPL(mutex_lock_nested);
+void __sched
+_mutex_lock_nest_lock(struct mutex *lock, struct lockdep_map *nest)
+{
+ might_sleep();
+ __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, 0, nest, _RET_IP_);
+}
+
+EXPORT_SYMBOL_GPL(_mutex_lock_nest_lock);
+
int __sched
mutex_lock_killable_nested(struct mutex *lock, unsigned int subclass)
{
might_sleep();
- return __mutex_lock_common(lock, TASK_KILLABLE, subclass, _RET_IP_);
+ return __mutex_lock_common(lock, TASK_KILLABLE, subclass, NULL, _RET_IP_);
}
EXPORT_SYMBOL_GPL(mutex_lock_killable_nested);
@@ -294,7 +303,7 @@ mutex_lock_interruptible_nested(struct m
{
might_sleep();
return __mutex_lock_common(lock, TASK_INTERRUPTIBLE,
- subclass, _RET_IP_);
+ subclass, NULL, _RET_IP_);
}
EXPORT_SYMBOL_GPL(mutex_lock_interruptible_nested);
@@ -400,7 +409,7 @@ __mutex_lock_slowpath(atomic_t *lock_cou
{
struct mutex *lock = container_of(lock_count, struct mutex, count);
- __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, 0, _RET_IP_);
+ __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, 0, NULL, _RET_IP_);
}
static noinline int __sched
@@ -408,7 +417,7 @@ __mutex_lock_killable_slowpath(atomic_t
{
struct mutex *lock = container_of(lock_count, struct mutex, count);
- return __mutex_lock_common(lock, TASK_KILLABLE, 0, _RET_IP_);
+ return __mutex_lock_common(lock, TASK_KILLABLE, 0, NULL, _RET_IP_);
}
static noinline int __sched
@@ -416,7 +425,7 @@ __mutex_lock_interruptible_slowpath(atom
{
struct mutex *lock = container_of(lock_count, struct mutex, count);
- return __mutex_lock_common(lock, TASK_INTERRUPTIBLE, 0, _RET_IP_);
+ return __mutex_lock_common(lock, TASK_INTERRUPTIBLE, 0, NULL, _RET_IP_);
}
#endif
^ permalink raw reply [flat|nested] 128+ messages in thread
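The annotation is the mutex counterpart of spin_lock_nest_lock(): it tells lockdep that many instances of one lock class are all taken under an outer lock that is already held, so the repeated acquisitions are serialized by that outer lock rather than flagged as a potential recursive deadlock. A hedged sketch of the intended use, modelled on the mm_take_all_locks()/vm_lock_mapping() conversion later in this series (the helper name is made up, and i_mmap_mutex is only introduced by patch 14):
/*
 * Hypothetical: take every file mapping's i_mmap_mutex while mmap_sem
 * is held; the nest_lock annotation points lockdep at mmap_sem as the
 * lock that serializes all these same-class acquisitions.
 */
static void lock_all_file_mappings(struct mm_struct *mm)
{
	struct vm_area_struct *vma;
	/* caller holds down_write(&mm->mmap_sem) */
	for (vma = mm->mmap; vma; vma = vma->vm_next) {
		if (vma->vm_file && vma->vm_file->f_mapping)
			mutex_lock_nest_lock(&vma->vm_file->f_mapping->i_mmap_mutex,
					     &mm->mmap_sem);
	}
}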
* [PATCH 13/25] mutex: Provide mutex_is_contended
@ 2011-01-25 17:31 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang
[-- Attachment #1: peter_zijlstra-mutex-provide_mutex_is_contended.patch --]
[-- Type: text/plain, Size: 741 bytes --]
Usable for lock-breaks and such.
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/mutex.h | 5 +++++
1 file changed, 5 insertions(+)
Index: linux-2.6/include/linux/mutex.h
===================================================================
--- linux-2.6.orig/include/linux/mutex.h
+++ linux-2.6/include/linux/mutex.h
@@ -126,6 +126,11 @@ static inline int mutex_is_locked(struct
return atomic_read(&lock->count) != 1;
}
+static inline int mutex_is_contended(struct mutex *lock)
+{
+ return atomic_read(&lock->count) < 0;
+}
+
/*
* See kernel/mutex.c for detailed documentation of these APIs.
* Also see Documentation/mutex-design.txt.
^ permalink raw reply [flat|nested] 128+ messages in thread
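mutex_is_contended() simply looks at the mutex count: the count goes negative once acquirers hit the slowpath, so a negative value is a cheap hint that someone is waiting on the lock. In this series it serves as the mutex counterpart of spin_needbreak() for voluntary lock-breaks; patch 14 pairs it with need_resched() in the unmap paths. A sketch of that pattern (the chunk size and helpers are hypothetical):
#define FROB_CHUNK	(1UL << 20)	/* made-up chunk size */
/*
 * Hypothetical chunked worker: instead of holding the mutex across the
 * whole range, back off whenever we should reschedule or someone else
 * wants the lock.
 */
static void frob_range_chunked(struct mutex *lock,
			       unsigned long start, unsigned long end)
{
	unsigned long addr, next;
	mutex_lock(lock);
	for (addr = start; addr < end; addr = next) {
		next = min(addr + FROB_CHUNK, end);
		frob_one_chunk(addr, next);		/* hypothetical helper */
		if (need_resched() || mutex_is_contended(lock)) {
			mutex_unlock(lock);
			cond_resched();
			mutex_lock(lock);
		}
	}
	mutex_unlock(lock);
}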
* [PATCH 14/25] mm: Convert i_mmap_lock to a mutex
@ 2011-01-25 17:31 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Hugh Dickins
[-- Attachment #1: peter_zijlstra-mm-convert_i_mmap_lock_to_mutexes.patch --]
[-- Type: text/plain, Size: 26158 bytes --]
Straightforward conversion of i_mmap_lock to a mutex.
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
Documentation/lockstat.txt | 2 +-
Documentation/vm/locking | 2 +-
arch/x86/mm/hugetlbpage.c | 4 ++--
fs/gfs2/main.c | 2 +-
fs/hugetlbfs/inode.c | 4 ++--
fs/inode.c | 2 +-
fs/nilfs2/page.c | 2 +-
include/linux/fs.h | 2 +-
include/linux/mm.h | 2 +-
include/linux/mmu_notifier.h | 2 +-
kernel/fork.c | 4 ++--
mm/filemap.c | 10 +++++-----
mm/filemap_xip.c | 4 ++--
mm/fremap.c | 4 ++--
mm/hugetlb.c | 14 +++++++-------
mm/memory-failure.c | 4 ++--
mm/memory.c | 22 +++++++++++-----------
mm/mmap.c | 22 +++++++++++-----------
mm/mremap.c | 4 ++--
mm/rmap.c | 28 ++++++++++++++--------------
20 files changed, 70 insertions(+), 70 deletions(-)
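The conversion itself is mechanical (every spin_lock()/spin_unlock() on i_mmap_lock becomes mutex_lock()/mutex_unlock() on i_mmap_mutex); the point is that the rmap and truncate walks may now sleep while holding the lock, which is also why the cond_resched_lock() calls below become plain cond_resched(). A hedged sketch of the resulting pattern, modelled on the rmap walks in this patch (the helper itself is hypothetical):
/*
 * Hypothetical file-backed rmap walk after the conversion: the prio tree
 * is still protected by the per-mapping lock, but sleeping while it is
 * held is now allowed.
 */
static void walk_file_vmas(struct address_space *mapping, struct page *page)
{
	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
	struct vm_area_struct *vma;
	struct prio_tree_iter iter;
	mutex_lock(&mapping->i_mmap_mutex);	/* was spin_lock(&...->i_mmap_lock) */
	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
		/* per-vma work goes here; it may now sleep or reschedule */
		cond_resched();
	}
	mutex_unlock(&mapping->i_mmap_mutex);
}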
Index: linux-2.6/arch/x86/mm/hugetlbpage.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/hugetlbpage.c
+++ linux-2.6/arch/x86/mm/hugetlbpage.c
@@ -72,7 +72,7 @@ static void huge_pmd_share(struct mm_str
if (!vma_shareable(vma, addr))
return;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
vma_prio_tree_foreach(svma, &iter, &mapping->i_mmap, idx, idx) {
if (svma == vma)
continue;
@@ -97,7 +97,7 @@ static void huge_pmd_share(struct mm_str
put_page(virt_to_page(spte));
spin_unlock(&mm->page_table_lock);
out:
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
}
/*
Index: linux-2.6/fs/hugetlbfs/inode.c
===================================================================
--- linux-2.6.orig/fs/hugetlbfs/inode.c
+++ linux-2.6/fs/hugetlbfs/inode.c
@@ -413,10 +413,10 @@ static int hugetlb_vmtruncate(struct ino
pgoff = offset >> PAGE_SHIFT;
i_size_write(inode, offset);
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
if (!prio_tree_empty(&mapping->i_mmap))
hugetlb_vmtruncate_list(&mapping->i_mmap, pgoff);
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
truncate_hugepages(inode, offset);
return 0;
}
Index: linux-2.6/fs/inode.c
===================================================================
--- linux-2.6.orig/fs/inode.c
+++ linux-2.6/fs/inode.c
@@ -310,7 +310,7 @@ void inode_init_once(struct inode *inode
INIT_LIST_HEAD(&inode->i_lru);
INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC);
spin_lock_init(&inode->i_data.tree_lock);
- spin_lock_init(&inode->i_data.i_mmap_lock);
+ mutex_init(&inode->i_data.i_mmap_mutex);
INIT_LIST_HEAD(&inode->i_data.private_list);
spin_lock_init(&inode->i_data.private_lock);
INIT_RAW_PRIO_TREE_ROOT(&inode->i_data.i_mmap);
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -639,7 +639,7 @@ struct address_space {
unsigned int i_mmap_writable;/* count VM_SHARED mappings */
struct prio_tree_root i_mmap; /* tree of private and shared mappings */
struct list_head i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
- spinlock_t i_mmap_lock; /* protect tree, count, list */
+ struct mutex i_mmap_mutex; /* protect tree, count, list */
unsigned int truncate_count; /* Cover race condition with truncate */
unsigned long nrpages; /* number of total pages */
pgoff_t writeback_index;/* writeback starts here */
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -876,7 +876,7 @@ struct zap_details {
struct address_space *check_mapping; /* Check page->mapping if set */
pgoff_t first_index; /* Lowest page->index to unmap */
pgoff_t last_index; /* Highest page->index to unmap */
- spinlock_t *i_mmap_lock; /* For unmap_mapping_range: */
+ struct mutex *i_mmap_mutex; /* For unmap_mapping_range: */
unsigned long truncate_count; /* Compare vm_truncate_count */
};
Index: linux-2.6/kernel/fork.c
===================================================================
--- linux-2.6.orig/kernel/fork.c
+++ linux-2.6/kernel/fork.c
@@ -376,7 +376,7 @@ static int dup_mmap(struct mm_struct *mm
get_file(file);
if (tmp->vm_flags & VM_DENYWRITE)
atomic_dec(&inode->i_writecount);
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
if (tmp->vm_flags & VM_SHARED)
mapping->i_mmap_writable++;
tmp->vm_truncate_count = mpnt->vm_truncate_count;
@@ -384,7 +384,7 @@ static int dup_mmap(struct mm_struct *mm
/* insert tmp into the share list, just after mpnt */
vma_prio_tree_add(tmp, mpnt);
flush_dcache_mmap_unlock(mapping);
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
}
/*
Index: linux-2.6/mm/filemap_xip.c
===================================================================
--- linux-2.6.orig/mm/filemap_xip.c
+++ linux-2.6/mm/filemap_xip.c
@@ -183,7 +183,7 @@ __xip_unmap (struct address_space * mapp
return;
retry:
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
mm = vma->vm_mm;
address = vma->vm_start +
@@ -201,7 +201,7 @@ __xip_unmap (struct address_space * mapp
page_cache_release(page);
}
}
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
if (locked) {
mutex_unlock(&xip_sparse_mutex);
Index: linux-2.6/mm/fremap.c
===================================================================
--- linux-2.6.orig/mm/fremap.c
+++ linux-2.6/mm/fremap.c
@@ -211,13 +211,13 @@ SYSCALL_DEFINE5(remap_file_pages, unsign
}
goto out;
}
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
flush_dcache_mmap_lock(mapping);
vma->vm_flags |= VM_NONLINEAR;
vma_prio_tree_remove(vma, &mapping->i_mmap);
vma_nonlinear_insert(vma, &mapping->i_mmap_nonlinear);
flush_dcache_mmap_unlock(mapping);
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
}
if (vma->vm_flags & VM_LOCKED) {
Index: linux-2.6/mm/hugetlb.c
===================================================================
--- linux-2.6.orig/mm/hugetlb.c
+++ linux-2.6/mm/hugetlb.c
@@ -2207,7 +2207,7 @@ void __unmap_hugepage_range(struct vm_ar
unsigned long sz = huge_page_size(h);
/*
- * A page gathering list, protected by per file i_mmap_lock. The
+ * A page gathering list, protected by per file i_mmap_mutex. The
* lock is used to avoid list corruption from multiple unmapping
* of the same page since we are using page->lru.
*/
@@ -2276,9 +2276,9 @@ void __unmap_hugepage_range(struct vm_ar
void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
unsigned long end, struct page *ref_page)
{
- spin_lock(&vma->vm_file->f_mapping->i_mmap_lock);
+ mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
__unmap_hugepage_range(vma, start, end, ref_page);
- spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock);
+ mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
}
/*
@@ -2310,7 +2310,7 @@ static int unmap_ref_private(struct mm_s
* this mapping should be shared between all the VMAs,
* __unmap_hugepage_range() is called as the lock is already held
*/
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
vma_prio_tree_foreach(iter_vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
/* Do not unmap the current VMA */
if (iter_vma == vma)
@@ -2328,7 +2328,7 @@ static int unmap_ref_private(struct mm_s
address, address + huge_page_size(h),
page);
}
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
return 1;
}
@@ -2812,7 +2812,7 @@ void hugetlb_change_protection(struct vm
BUG_ON(address >= end);
flush_cache_range(vma, address, end);
- spin_lock(&vma->vm_file->f_mapping->i_mmap_lock);
+ mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
spin_lock(&mm->page_table_lock);
for (; address < end; address += huge_page_size(h)) {
ptep = huge_pte_offset(mm, address);
@@ -2827,7 +2827,7 @@ void hugetlb_change_protection(struct vm
}
}
spin_unlock(&mm->page_table_lock);
- spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock);
+ mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
flush_tlb_range(vma, start, end);
}
Index: linux-2.6/mm/memory-failure.c
===================================================================
--- linux-2.6.orig/mm/memory-failure.c
+++ linux-2.6/mm/memory-failure.c
@@ -431,7 +431,7 @@ static void collect_procs_file(struct pa
*/
read_lock(&tasklist_lock);
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
for_each_process(tsk) {
pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
@@ -451,7 +451,7 @@ static void collect_procs_file(struct pa
add_to_kill(tsk, page, vma, to_kill, tkc);
}
}
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
read_unlock(&tasklist_lock);
}
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -1205,7 +1205,7 @@ unsigned long unmap_vmas(struct mmu_gath
{
long zap_work = ZAP_BLOCK_SIZE;
unsigned long start = start_addr;
- spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
+ struct mutex *i_mmap_mutex = details ? details->i_mmap_mutex : NULL;
struct mm_struct *mm = vma->vm_mm;
mmu_notifier_invalidate_range_start(mm, start_addr, end_addr);
@@ -1255,8 +1255,8 @@ unsigned long unmap_vmas(struct mmu_gath
}
if (need_resched() ||
- (i_mmap_lock && spin_needbreak(i_mmap_lock))) {
- if (i_mmap_lock)
+ (i_mmap_mutex && mutex_is_contended(i_mmap_mutex))) {
+ if (i_mmap_mutex)
goto out;
cond_resched();
}
@@ -2531,7 +2531,7 @@ static int do_wp_page(struct mm_struct *
/*
* Helper functions for unmap_mapping_range().
*
- * __ Notes on dropping i_mmap_lock to reduce latency while unmapping __
+ * __ Notes on dropping i_mmap_mutex to reduce latency while unmapping __
*
* We have to restart searching the prio_tree whenever we drop the lock,
* since the iterator is only valid while the lock is held, and anyway
@@ -2550,7 +2550,7 @@ static int do_wp_page(struct mm_struct *
* can't efficiently keep all vmas in step with mapping->truncate_count:
* so instead reset them all whenever it wraps back to 0 (then go to 1).
* mapping->truncate_count and vma->vm_truncate_count are protected by
- * i_mmap_lock.
+ * i_mmap_mutex.
*
* In order to make forward progress despite repeatedly restarting some
* large vma, note the restart_addr from unmap_vmas when it breaks out:
@@ -2600,7 +2600,7 @@ static int unmap_mapping_range_vma(struc
restart_addr = zap_page_range(vma, start_addr,
end_addr - start_addr, details);
- need_break = need_resched() || spin_needbreak(details->i_mmap_lock);
+ need_break = need_resched() || mutex_is_contended(details->i_mmap_mutex);
if (restart_addr >= end_addr) {
/* We have now completed this vma: mark it so */
@@ -2614,9 +2614,9 @@ static int unmap_mapping_range_vma(struc
goto again;
}
- spin_unlock(details->i_mmap_lock);
+ mutex_unlock(details->i_mmap_mutex);
cond_resched();
- spin_lock(details->i_mmap_lock);
+ mutex_lock(details->i_mmap_mutex);
return -EINTR;
}
@@ -2710,9 +2710,9 @@ void unmap_mapping_range(struct address_
details.last_index = hba + hlen - 1;
if (details.last_index < details.first_index)
details.last_index = ULONG_MAX;
- details.i_mmap_lock = &mapping->i_mmap_lock;
+ details.i_mmap_mutex = &mapping->i_mmap_mutex;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
/* Protect against endless unmapping loops */
mapping->truncate_count++;
@@ -2727,7 +2727,7 @@ void unmap_mapping_range(struct address_
unmap_mapping_range_tree(&mapping->i_mmap, &details);
if (unlikely(!list_empty(&mapping->i_mmap_nonlinear)))
unmap_mapping_range_list(&mapping->i_mmap_nonlinear, &details);
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
}
EXPORT_SYMBOL(unmap_mapping_range);
Index: linux-2.6/mm/mmap.c
===================================================================
--- linux-2.6.orig/mm/mmap.c
+++ linux-2.6/mm/mmap.c
@@ -190,7 +190,7 @@ int __vm_enough_memory(struct mm_struct
}
/*
- * Requires inode->i_mapping->i_mmap_lock
+ * Requires inode->i_mapping->i_mmap_mutex
*/
static void __remove_shared_vm_struct(struct vm_area_struct *vma,
struct file *file, struct address_space *mapping)
@@ -218,9 +218,9 @@ void unlink_file_vma(struct vm_area_stru
if (file) {
struct address_space *mapping = file->f_mapping;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
__remove_shared_vm_struct(vma, file, mapping);
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
}
}
@@ -465,7 +465,7 @@ static void vma_link(struct mm_struct *m
mapping = vma->vm_file->f_mapping;
if (mapping) {
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
vma->vm_truncate_count = mapping->truncate_count;
}
@@ -473,7 +473,7 @@ static void vma_link(struct mm_struct *m
__vma_link_file(vma);
if (mapping)
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
mm->map_count++;
validate_mm(mm);
@@ -576,7 +576,7 @@ again: remove_next = 1 + (end > next->
mapping = file->f_mapping;
if (!(vma->vm_flags & VM_NONLINEAR))
root = &mapping->i_mmap;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
if (importer &&
vma->vm_truncate_count != next->vm_truncate_count) {
/*
@@ -652,7 +652,7 @@ again: remove_next = 1 + (end > next->
if (anon_vma)
anon_vma_unlock(anon_vma);
if (mapping)
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
if (remove_next) {
if (file) {
@@ -2311,7 +2311,7 @@ void exit_mmap(struct mm_struct *mm)
/* Insert vm structure into process list sorted by address
* and into the inode's i_mmap tree. If vm_file is non-NULL
- * then i_mmap_lock is taken here.
+ * then i_mmap_mutex is taken here.
*/
int insert_vm_struct(struct mm_struct * mm, struct vm_area_struct * vma)
{
@@ -2553,7 +2553,7 @@ static void vm_lock_mapping(struct mm_st
*/
if (test_and_set_bit(AS_MM_ALL_LOCKS, &mapping->flags))
BUG();
- spin_lock_nest_lock(&mapping->i_mmap_lock, &mm->mmap_sem);
+ mutex_lock_nest_lock(&mapping->i_mmap_mutex, &mm->mmap_sem);
}
}
@@ -2580,7 +2580,7 @@ static void vm_lock_mapping(struct mm_st
* vma in this mm is backed by the same anon_vma or address_space.
*
* We can take all the locks in random order because the VM code
- * taking i_mmap_lock or anon_vma->lock outside the mmap_sem never
+ * taking i_mmap_mutex or anon_vma->lock outside the mmap_sem never
* takes more than one of them in a row. Secondly we're protected
* against a concurrent mm_take_all_locks() by the mm_all_locks_mutex.
*
@@ -2652,7 +2652,7 @@ static void vm_unlock_mapping(struct add
* AS_MM_ALL_LOCKS can't change to 0 from under us
* because we hold the mm_all_locks_mutex.
*/
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
if (!test_and_clear_bit(AS_MM_ALL_LOCKS,
&mapping->flags))
BUG();
Index: linux-2.6/mm/mremap.c
===================================================================
--- linux-2.6.orig/mm/mremap.c
+++ linux-2.6/mm/mremap.c
@@ -93,7 +93,7 @@ static void move_ptes(struct vm_area_str
* and we propagate stale pages into the dst afterward.
*/
mapping = vma->vm_file->f_mapping;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
if (new_vma->vm_truncate_count &&
new_vma->vm_truncate_count != vma->vm_truncate_count)
new_vma->vm_truncate_count = 0;
@@ -125,7 +125,7 @@ static void move_ptes(struct vm_area_str
pte_unmap(new_pte - 1);
pte_unmap_unlock(old_pte - 1, old_ptl);
if (mapping)
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end);
}
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -24,7 +24,7 @@
* inode->i_alloc_sem (vmtruncate_range)
* mm->mmap_sem
* page->flags PG_locked (lock_page)
- * mapping->i_mmap_lock
+ * mapping->i_mmap_mutex
* anon_vma->lock
* mm->page_table_lock or pte_lock
* zone->lru_lock (in mark_page_accessed, isolate_lru_page)
@@ -625,14 +625,14 @@ static int page_referenced_file(struct p
* The page lock not only makes sure that page->mapping cannot
* suddenly be NULLified by truncation, it makes sure that the
* structure at mapping cannot be freed and reused yet,
- * so we can safely take mapping->i_mmap_lock.
+ * so we can safely take mapping->i_mmap_mutex.
*/
BUG_ON(!PageLocked(page));
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
/*
- * i_mmap_lock does not stabilize mapcount at all, but mapcount
+ * i_mmap_mutex does not stabilize mapcount at all, but mapcount
* is more likely to be accurate if we note it after spinning.
*/
mapcount = page_mapcount(page);
@@ -654,7 +654,7 @@ static int page_referenced_file(struct p
break;
}
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
return referenced;
}
@@ -741,7 +741,7 @@ static int page_mkclean_file(struct addr
BUG_ON(PageAnon(page));
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
if (vma->vm_flags & VM_SHARED) {
unsigned long address = vma_address(page, vma);
@@ -750,7 +750,7 @@ static int page_mkclean_file(struct addr
ret += page_mkclean_one(page, vma, address);
}
}
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
return ret;
}
@@ -1101,7 +1101,7 @@ int try_to_unmap_one(struct page *page,
/*
* We need mmap_sem locking, Otherwise VM_LOCKED check makes
* unstable result and race. Plus, We can't wait here because
- * we now hold anon_vma->lock or mapping->i_mmap_lock.
+ * we now hold anon_vma->lock or mapping->i_mmap_mutex.
* if trylock failed, the page remain in evictable lru and later
* vmscan could retry to move the page to unevictable lru if the
* page is actually mlocked.
@@ -1327,7 +1327,7 @@ static int try_to_unmap_file(struct page
unsigned long max_nl_size = 0;
unsigned int mapcount;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
unsigned long address = vma_address(page, vma);
if (address == -EFAULT)
@@ -1373,7 +1373,7 @@ static int try_to_unmap_file(struct page
mapcount = page_mapcount(page);
if (!mapcount)
goto out;
- cond_resched_lock(&mapping->i_mmap_lock);
+ cond_resched();
max_nl_size = (max_nl_size + CLUSTER_SIZE - 1) & CLUSTER_MASK;
if (max_nl_cursor == 0)
@@ -1395,7 +1395,7 @@ static int try_to_unmap_file(struct page
}
vma->vm_private_data = (void *) max_nl_cursor;
}
- cond_resched_lock(&mapping->i_mmap_lock);
+ cond_resched();
max_nl_cursor += CLUSTER_SIZE;
} while (max_nl_cursor <= max_nl_size);
@@ -1407,7 +1407,7 @@ static int try_to_unmap_file(struct page
list_for_each_entry(vma, &mapping->i_mmap_nonlinear, shared.vm_set.list)
vma->vm_private_data = NULL;
out:
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
return ret;
}
@@ -1552,7 +1552,7 @@ static int rmap_walk_file(struct page *p
if (!mapping)
return ret;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
unsigned long address = vma_address(page, vma);
if (address == -EFAULT)
@@ -1566,7 +1566,7 @@ static int rmap_walk_file(struct page *p
* never contain migration ptes. Decide what to do about this
* limitation to linear when we need rmap_walk() on nonlinear.
*/
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
return ret;
}
Index: linux-2.6/fs/gfs2/main.c
===================================================================
--- linux-2.6.orig/fs/gfs2/main.c
+++ linux-2.6/fs/gfs2/main.c
@@ -62,7 +62,7 @@ static void gfs2_init_gl_aspace_once(voi
memset(mapping, 0, sizeof(*mapping));
INIT_RADIX_TREE(&mapping->page_tree, GFP_ATOMIC);
spin_lock_init(&mapping->tree_lock);
- spin_lock_init(&mapping->i_mmap_lock);
+ mutex_init(&mapping->i_mmap_mutex);
INIT_LIST_HEAD(&mapping->private_list);
spin_lock_init(&mapping->private_lock);
INIT_RAW_PRIO_TREE_ROOT(&mapping->i_mmap);
Index: linux-2.6/fs/nilfs2/page.c
===================================================================
--- linux-2.6.orig/fs/nilfs2/page.c
+++ linux-2.6/fs/nilfs2/page.c
@@ -500,7 +500,7 @@ void nilfs_mapping_init_once(struct addr
INIT_LIST_HEAD(&mapping->private_list);
spin_lock_init(&mapping->private_lock);
- spin_lock_init(&mapping->i_mmap_lock);
+ mutex_init(&mapping->i_mmap_mutex);
INIT_RAW_PRIO_TREE_ROOT(&mapping->i_mmap);
INIT_LIST_HEAD(&mapping->i_mmap_nonlinear);
}
Index: linux-2.6/Documentation/lockstat.txt
===================================================================
--- linux-2.6.orig/Documentation/lockstat.txt
+++ linux-2.6/Documentation/lockstat.txt
@@ -136,7 +136,7 @@ The integer part of the time values is i
dcache_lock: 1037 1161 0.38 45.32 774.51 6611 243371 0.15 306.48 77387.24
&inode->i_mutex: 161 286 18446744073709 62882.54 1244614.55 3653 20598 18446744073709 62318.60 1693822.74
&zone->lru_lock: 94 94 0.53 7.33 92.10 4366 32690 0.29 59.81 16350.06
- &inode->i_data.i_mmap_lock: 79 79 0.40 3.77 53.03 11779 87755 0.28 116.93 29898.44
+ &inode->i_data.i_mmap_mutex: 79 79 0.40 3.77 53.03 11779 87755 0.28 116.93 29898.44
&q->__queue_lock: 48 50 0.52 31.62 86.31 774 13131 0.17 113.08 12277.52
&rq->rq_lock_key: 43 47 0.74 68.50 170.63 3706 33929 0.22 107.99 17460.62
&rq->rq_lock_key#2: 39 46 0.75 6.68 49.03 2979 32292 0.17 125.17 17137.63
Index: linux-2.6/Documentation/vm/locking
===================================================================
--- linux-2.6.orig/Documentation/vm/locking
+++ linux-2.6/Documentation/vm/locking
@@ -66,7 +66,7 @@ in some cases it is not really needed. E
expand_stack(), it is hard to come up with a destructive scenario without
having the vmlist protection in this case.
-The page_table_lock nests with the inode i_mmap_lock and the kmem cache
+The page_table_lock nests with the inode i_mmap_mutex and the kmem cache
c_spinlock spinlocks. This is okay, since the kmem code asks for pages after
dropping c_spinlock. The page_table_lock also nests with pagecache_lock and
pagemap_lru_lock spinlocks, and no code asks for memory with these locks
Index: linux-2.6/include/linux/mmu_notifier.h
===================================================================
--- linux-2.6.orig/include/linux/mmu_notifier.h
+++ linux-2.6/include/linux/mmu_notifier.h
@@ -150,7 +150,7 @@ struct mmu_notifier_ops {
* Therefore notifier chains can only be traversed when either
*
* 1. mmap_sem is held.
- * 2. One of the reverse map locks is held (i_mmap_lock or anon_vma->lock).
+ * 2. One of the reverse map locks is held (i_mmap_mutex or anon_vma->lock).
* 3. No other concurrent thread can access the list (release)
*/
struct mmu_notifier {
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -58,16 +58,16 @@
/*
* Lock ordering:
*
- * ->i_mmap_lock (truncate_pagecache)
+ * ->i_mmap_mutex (truncate_pagecache)
* ->private_lock (__free_pte->__set_page_dirty_buffers)
* ->swap_lock (exclusive_swap_page, others)
* ->mapping->tree_lock
*
* ->i_mutex
- * ->i_mmap_lock (truncate->unmap_mapping_range)
+ * ->i_mmap_mutex (truncate->unmap_mapping_range)
*
* ->mmap_sem
- * ->i_mmap_lock
+ * ->i_mmap_mutex
* ->page_table_lock or pte_lock (various, mainly in memory.c)
* ->mapping->tree_lock (arch-dependent flush_dcache_mmap_lock)
*
@@ -84,7 +84,7 @@
* ->sb_lock (fs/fs-writeback.c)
* ->mapping->tree_lock (__sync_single_inode)
*
- * ->i_mmap_lock
+ * ->i_mmap_mutex
* ->anon_vma.lock (vma_adjust)
*
* ->anon_vma.lock
@@ -104,7 +104,7 @@
*
* (code doesn't rely on that order, so you could switch it around)
* ->tasklist_lock (memory_failure, collect_procs_ao)
- * ->i_mmap_lock
+ * ->i_mmap_mutex
*/
/*
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 14/25] mm: Convert i_mmap_lock to a mutex
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Hugh Dickins
[-- Attachment #1: peter_zijlstra-mm-convert_i_mmap_lock_to_mutexes.patch --]
[-- Type: text/plain, Size: 26454 bytes --]
Straightforward conversion of i_mmap_lock to a mutex.
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
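(Not part of the patch: for orientation, a minimal sketch of the pattern every
call site in the hunks below follows. The struct and helper names here are
invented for illustration; the real changes are in the per-file diffs. The
practical difference is that a mutex may be held across sleeping operations,
which is also why the cond_resched_lock() call sites in rmap.c become plain
cond_resched().)

/*
 * Illustration only -- not from the kernel tree.  Only the field name
 * matches the one this patch introduces; the surrounding struct and
 * helpers are made up for the example.
 */
#include <linux/mutex.h>

struct example_mapping {
	struct mutex i_mmap_mutex;		/* was: spinlock_t i_mmap_lock */
};

static void example_init_once(struct example_mapping *mapping)
{
	mutex_init(&mapping->i_mmap_mutex);	/* was: spin_lock_init() */
}

static void example_walk_i_mmap(struct example_mapping *mapping)
{
	mutex_lock(&mapping->i_mmap_mutex);	/* was: spin_lock(); may now sleep */
	/* ... walk the i_mmap prio tree under the mutex ... */
	mutex_unlock(&mapping->i_mmap_mutex);	/* was: spin_unlock() */
}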
Documentation/lockstat.txt | 2 +-
Documentation/vm/locking | 2 +-
arch/x86/mm/hugetlbpage.c | 4 ++--
fs/gfs2/main.c | 2 +-
fs/hugetlbfs/inode.c | 4 ++--
fs/inode.c | 2 +-
fs/nilfs2/page.c | 2 +-
include/linux/fs.h | 2 +-
include/linux/mm.h | 2 +-
include/linux/mmu_notifier.h | 2 +-
kernel/fork.c | 4 ++--
mm/filemap.c | 10 +++++-----
mm/filemap_xip.c | 4 ++--
mm/fremap.c | 4 ++--
mm/hugetlb.c | 14 +++++++-------
mm/memory-failure.c | 4 ++--
mm/memory.c | 22 +++++++++++-----------
mm/mmap.c | 22 +++++++++++-----------
mm/mremap.c | 4 ++--
mm/rmap.c | 28 ++++++++++++++--------------
20 files changed, 70 insertions(+), 70 deletions(-)
Index: linux-2.6/arch/x86/mm/hugetlbpage.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/hugetlbpage.c
+++ linux-2.6/arch/x86/mm/hugetlbpage.c
@@ -72,7 +72,7 @@ static void huge_pmd_share(struct mm_str
if (!vma_shareable(vma, addr))
return;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
vma_prio_tree_foreach(svma, &iter, &mapping->i_mmap, idx, idx) {
if (svma == vma)
continue;
@@ -97,7 +97,7 @@ static void huge_pmd_share(struct mm_str
put_page(virt_to_page(spte));
spin_unlock(&mm->page_table_lock);
out:
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
}
/*
Index: linux-2.6/fs/hugetlbfs/inode.c
===================================================================
--- linux-2.6.orig/fs/hugetlbfs/inode.c
+++ linux-2.6/fs/hugetlbfs/inode.c
@@ -413,10 +413,10 @@ static int hugetlb_vmtruncate(struct ino
pgoff = offset >> PAGE_SHIFT;
i_size_write(inode, offset);
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
if (!prio_tree_empty(&mapping->i_mmap))
hugetlb_vmtruncate_list(&mapping->i_mmap, pgoff);
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
truncate_hugepages(inode, offset);
return 0;
}
Index: linux-2.6/fs/inode.c
===================================================================
--- linux-2.6.orig/fs/inode.c
+++ linux-2.6/fs/inode.c
@@ -310,7 +310,7 @@ void inode_init_once(struct inode *inode
INIT_LIST_HEAD(&inode->i_lru);
INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC);
spin_lock_init(&inode->i_data.tree_lock);
- spin_lock_init(&inode->i_data.i_mmap_lock);
+ mutex_init(&inode->i_data.i_mmap_mutex);
INIT_LIST_HEAD(&inode->i_data.private_list);
spin_lock_init(&inode->i_data.private_lock);
INIT_RAW_PRIO_TREE_ROOT(&inode->i_data.i_mmap);
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -639,7 +639,7 @@ struct address_space {
unsigned int i_mmap_writable;/* count VM_SHARED mappings */
struct prio_tree_root i_mmap; /* tree of private and shared mappings */
struct list_head i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
- spinlock_t i_mmap_lock; /* protect tree, count, list */
+ struct mutex i_mmap_mutex; /* protect tree, count, list */
unsigned int truncate_count; /* Cover race condition with truncate */
unsigned long nrpages; /* number of total pages */
pgoff_t writeback_index;/* writeback starts here */
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -876,7 +876,7 @@ struct zap_details {
struct address_space *check_mapping; /* Check page->mapping if set */
pgoff_t first_index; /* Lowest page->index to unmap */
pgoff_t last_index; /* Highest page->index to unmap */
- spinlock_t *i_mmap_lock; /* For unmap_mapping_range: */
+ struct mutex *i_mmap_mutex; /* For unmap_mapping_range: */
unsigned long truncate_count; /* Compare vm_truncate_count */
};
Index: linux-2.6/kernel/fork.c
===================================================================
--- linux-2.6.orig/kernel/fork.c
+++ linux-2.6/kernel/fork.c
@@ -376,7 +376,7 @@ static int dup_mmap(struct mm_struct *mm
get_file(file);
if (tmp->vm_flags & VM_DENYWRITE)
atomic_dec(&inode->i_writecount);
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
if (tmp->vm_flags & VM_SHARED)
mapping->i_mmap_writable++;
tmp->vm_truncate_count = mpnt->vm_truncate_count;
@@ -384,7 +384,7 @@ static int dup_mmap(struct mm_struct *mm
/* insert tmp into the share list, just after mpnt */
vma_prio_tree_add(tmp, mpnt);
flush_dcache_mmap_unlock(mapping);
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
}
/*
Index: linux-2.6/mm/filemap_xip.c
===================================================================
--- linux-2.6.orig/mm/filemap_xip.c
+++ linux-2.6/mm/filemap_xip.c
@@ -183,7 +183,7 @@ __xip_unmap (struct address_space * mapp
return;
retry:
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
mm = vma->vm_mm;
address = vma->vm_start +
@@ -201,7 +201,7 @@ __xip_unmap (struct address_space * mapp
page_cache_release(page);
}
}
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
if (locked) {
mutex_unlock(&xip_sparse_mutex);
Index: linux-2.6/mm/fremap.c
===================================================================
--- linux-2.6.orig/mm/fremap.c
+++ linux-2.6/mm/fremap.c
@@ -211,13 +211,13 @@ SYSCALL_DEFINE5(remap_file_pages, unsign
}
goto out;
}
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
flush_dcache_mmap_lock(mapping);
vma->vm_flags |= VM_NONLINEAR;
vma_prio_tree_remove(vma, &mapping->i_mmap);
vma_nonlinear_insert(vma, &mapping->i_mmap_nonlinear);
flush_dcache_mmap_unlock(mapping);
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
}
if (vma->vm_flags & VM_LOCKED) {
Index: linux-2.6/mm/hugetlb.c
===================================================================
--- linux-2.6.orig/mm/hugetlb.c
+++ linux-2.6/mm/hugetlb.c
@@ -2207,7 +2207,7 @@ void __unmap_hugepage_range(struct vm_ar
unsigned long sz = huge_page_size(h);
/*
- * A page gathering list, protected by per file i_mmap_lock. The
+ * A page gathering list, protected by per file i_mmap_mutex. The
* lock is used to avoid list corruption from multiple unmapping
* of the same page since we are using page->lru.
*/
@@ -2276,9 +2276,9 @@ void __unmap_hugepage_range(struct vm_ar
void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
unsigned long end, struct page *ref_page)
{
- spin_lock(&vma->vm_file->f_mapping->i_mmap_lock);
+ mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
__unmap_hugepage_range(vma, start, end, ref_page);
- spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock);
+ mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
}
/*
@@ -2310,7 +2310,7 @@ static int unmap_ref_private(struct mm_s
* this mapping should be shared between all the VMAs,
* __unmap_hugepage_range() is called as the lock is already held
*/
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
vma_prio_tree_foreach(iter_vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
/* Do not unmap the current VMA */
if (iter_vma == vma)
@@ -2328,7 +2328,7 @@ static int unmap_ref_private(struct mm_s
address, address + huge_page_size(h),
page);
}
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
return 1;
}
@@ -2812,7 +2812,7 @@ void hugetlb_change_protection(struct vm
BUG_ON(address >= end);
flush_cache_range(vma, address, end);
- spin_lock(&vma->vm_file->f_mapping->i_mmap_lock);
+ mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
spin_lock(&mm->page_table_lock);
for (; address < end; address += huge_page_size(h)) {
ptep = huge_pte_offset(mm, address);
@@ -2827,7 +2827,7 @@ void hugetlb_change_protection(struct vm
}
}
spin_unlock(&mm->page_table_lock);
- spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock);
+ mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
flush_tlb_range(vma, start, end);
}
Index: linux-2.6/mm/memory-failure.c
===================================================================
--- linux-2.6.orig/mm/memory-failure.c
+++ linux-2.6/mm/memory-failure.c
@@ -431,7 +431,7 @@ static void collect_procs_file(struct pa
*/
read_lock(&tasklist_lock);
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
for_each_process(tsk) {
pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
@@ -451,7 +451,7 @@ static void collect_procs_file(struct pa
add_to_kill(tsk, page, vma, to_kill, tkc);
}
}
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
read_unlock(&tasklist_lock);
}
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -1205,7 +1205,7 @@ unsigned long unmap_vmas(struct mmu_gath
{
long zap_work = ZAP_BLOCK_SIZE;
unsigned long start = start_addr;
- spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
+ struct mutex *i_mmap_mutex = details ? details->i_mmap_mutex : NULL;
struct mm_struct *mm = vma->vm_mm;
mmu_notifier_invalidate_range_start(mm, start_addr, end_addr);
@@ -1255,8 +1255,8 @@ unsigned long unmap_vmas(struct mmu_gath
}
if (need_resched() ||
- (i_mmap_lock && spin_needbreak(i_mmap_lock))) {
- if (i_mmap_lock)
+ (i_mmap_mutex && mutex_is_contended(i_mmap_mutex))) {
+ if (i_mmap_mutex)
goto out;
cond_resched();
}
@@ -2531,7 +2531,7 @@ static int do_wp_page(struct mm_struct *
/*
* Helper functions for unmap_mapping_range().
*
- * __ Notes on dropping i_mmap_lock to reduce latency while unmapping __
+ * __ Notes on dropping i_mmap_mutex to reduce latency while unmapping __
*
* We have to restart searching the prio_tree whenever we drop the lock,
* since the iterator is only valid while the lock is held, and anyway
@@ -2550,7 +2550,7 @@ static int do_wp_page(struct mm_struct *
* can't efficiently keep all vmas in step with mapping->truncate_count:
* so instead reset them all whenever it wraps back to 0 (then go to 1).
* mapping->truncate_count and vma->vm_truncate_count are protected by
- * i_mmap_lock.
+ * i_mmap_mutex.
*
* In order to make forward progress despite repeatedly restarting some
* large vma, note the restart_addr from unmap_vmas when it breaks out:
@@ -2600,7 +2600,7 @@ static int unmap_mapping_range_vma(struc
restart_addr = zap_page_range(vma, start_addr,
end_addr - start_addr, details);
- need_break = need_resched() || spin_needbreak(details->i_mmap_lock);
+ need_break = need_resched() || mutex_is_contended(details->i_mmap_mutex);
if (restart_addr >= end_addr) {
/* We have now completed this vma: mark it so */
@@ -2614,9 +2614,9 @@ static int unmap_mapping_range_vma(struc
goto again;
}
- spin_unlock(details->i_mmap_lock);
+ mutex_unlock(details->i_mmap_mutex);
cond_resched();
- spin_lock(details->i_mmap_lock);
+ mutex_lock(details->i_mmap_mutex);
return -EINTR;
}
@@ -2710,9 +2710,9 @@ void unmap_mapping_range(struct address_
details.last_index = hba + hlen - 1;
if (details.last_index < details.first_index)
details.last_index = ULONG_MAX;
- details.i_mmap_lock = &mapping->i_mmap_lock;
+ details.i_mmap_mutex = &mapping->i_mmap_mutex;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
/* Protect against endless unmapping loops */
mapping->truncate_count++;
@@ -2727,7 +2727,7 @@ void unmap_mapping_range(struct address_
unmap_mapping_range_tree(&mapping->i_mmap, &details);
if (unlikely(!list_empty(&mapping->i_mmap_nonlinear)))
unmap_mapping_range_list(&mapping->i_mmap_nonlinear, &details);
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
}
EXPORT_SYMBOL(unmap_mapping_range);
Index: linux-2.6/mm/mmap.c
===================================================================
--- linux-2.6.orig/mm/mmap.c
+++ linux-2.6/mm/mmap.c
@@ -190,7 +190,7 @@ int __vm_enough_memory(struct mm_struct
}
/*
- * Requires inode->i_mapping->i_mmap_lock
+ * Requires inode->i_mapping->i_mmap_mutex
*/
static void __remove_shared_vm_struct(struct vm_area_struct *vma,
struct file *file, struct address_space *mapping)
@@ -218,9 +218,9 @@ void unlink_file_vma(struct vm_area_stru
if (file) {
struct address_space *mapping = file->f_mapping;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
__remove_shared_vm_struct(vma, file, mapping);
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
}
}
@@ -465,7 +465,7 @@ static void vma_link(struct mm_struct *m
mapping = vma->vm_file->f_mapping;
if (mapping) {
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
vma->vm_truncate_count = mapping->truncate_count;
}
@@ -473,7 +473,7 @@ static void vma_link(struct mm_struct *m
__vma_link_file(vma);
if (mapping)
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
mm->map_count++;
validate_mm(mm);
@@ -576,7 +576,7 @@ again: remove_next = 1 + (end > next->
mapping = file->f_mapping;
if (!(vma->vm_flags & VM_NONLINEAR))
root = &mapping->i_mmap;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
if (importer &&
vma->vm_truncate_count != next->vm_truncate_count) {
/*
@@ -652,7 +652,7 @@ again: remove_next = 1 + (end > next->
if (anon_vma)
anon_vma_unlock(anon_vma);
if (mapping)
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
if (remove_next) {
if (file) {
@@ -2311,7 +2311,7 @@ void exit_mmap(struct mm_struct *mm)
/* Insert vm structure into process list sorted by address
* and into the inode's i_mmap tree. If vm_file is non-NULL
- * then i_mmap_lock is taken here.
+ * then i_mmap_mutex is taken here.
*/
int insert_vm_struct(struct mm_struct * mm, struct vm_area_struct * vma)
{
@@ -2553,7 +2553,7 @@ static void vm_lock_mapping(struct mm_st
*/
if (test_and_set_bit(AS_MM_ALL_LOCKS, &mapping->flags))
BUG();
- spin_lock_nest_lock(&mapping->i_mmap_lock, &mm->mmap_sem);
+ mutex_lock_nest_lock(&mapping->i_mmap_mutex, &mm->mmap_sem);
}
}
@@ -2580,7 +2580,7 @@ static void vm_lock_mapping(struct mm_st
* vma in this mm is backed by the same anon_vma or address_space.
*
* We can take all the locks in random order because the VM code
- * taking i_mmap_lock or anon_vma->lock outside the mmap_sem never
+ * taking i_mmap_mutex or anon_vma->lock outside the mmap_sem never
* takes more than one of them in a row. Secondly we're protected
* against a concurrent mm_take_all_locks() by the mm_all_locks_mutex.
*
@@ -2652,7 +2652,7 @@ static void vm_unlock_mapping(struct add
* AS_MM_ALL_LOCKS can't change to 0 from under us
* because we hold the mm_all_locks_mutex.
*/
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
if (!test_and_clear_bit(AS_MM_ALL_LOCKS,
&mapping->flags))
BUG();
Index: linux-2.6/mm/mremap.c
===================================================================
--- linux-2.6.orig/mm/mremap.c
+++ linux-2.6/mm/mremap.c
@@ -93,7 +93,7 @@ static void move_ptes(struct vm_area_str
* and we propagate stale pages into the dst afterward.
*/
mapping = vma->vm_file->f_mapping;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
if (new_vma->vm_truncate_count &&
new_vma->vm_truncate_count != vma->vm_truncate_count)
new_vma->vm_truncate_count = 0;
@@ -125,7 +125,7 @@ static void move_ptes(struct vm_area_str
pte_unmap(new_pte - 1);
pte_unmap_unlock(old_pte - 1, old_ptl);
if (mapping)
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end);
}
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -24,7 +24,7 @@
* inode->i_alloc_sem (vmtruncate_range)
* mm->mmap_sem
* page->flags PG_locked (lock_page)
- * mapping->i_mmap_lock
+ * mapping->i_mmap_mutex
* anon_vma->lock
* mm->page_table_lock or pte_lock
* zone->lru_lock (in mark_page_accessed, isolate_lru_page)
@@ -625,14 +625,14 @@ static int page_referenced_file(struct p
* The page lock not only makes sure that page->mapping cannot
* suddenly be NULLified by truncation, it makes sure that the
* structure at mapping cannot be freed and reused yet,
- * so we can safely take mapping->i_mmap_lock.
+ * so we can safely take mapping->i_mmap_mutex.
*/
BUG_ON(!PageLocked(page));
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
/*
- * i_mmap_lock does not stabilize mapcount at all, but mapcount
+ * i_mmap_mutex does not stabilize mapcount at all, but mapcount
* is more likely to be accurate if we note it after spinning.
*/
mapcount = page_mapcount(page);
@@ -654,7 +654,7 @@ static int page_referenced_file(struct p
break;
}
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
return referenced;
}
@@ -741,7 +741,7 @@ static int page_mkclean_file(struct addr
BUG_ON(PageAnon(page));
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
if (vma->vm_flags & VM_SHARED) {
unsigned long address = vma_address(page, vma);
@@ -750,7 +750,7 @@ static int page_mkclean_file(struct addr
ret += page_mkclean_one(page, vma, address);
}
}
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
return ret;
}
@@ -1101,7 +1101,7 @@ int try_to_unmap_one(struct page *page,
/*
* We need mmap_sem locking, Otherwise VM_LOCKED check makes
* unstable result and race. Plus, We can't wait here because
- * we now hold anon_vma->lock or mapping->i_mmap_lock.
+ * we now hold anon_vma->lock or mapping->i_mmap_mutex.
* if trylock failed, the page remain in evictable lru and later
* vmscan could retry to move the page to unevictable lru if the
* page is actually mlocked.
@@ -1327,7 +1327,7 @@ static int try_to_unmap_file(struct page
unsigned long max_nl_size = 0;
unsigned int mapcount;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
unsigned long address = vma_address(page, vma);
if (address == -EFAULT)
@@ -1373,7 +1373,7 @@ static int try_to_unmap_file(struct page
mapcount = page_mapcount(page);
if (!mapcount)
goto out;
- cond_resched_lock(&mapping->i_mmap_lock);
+ cond_resched();
max_nl_size = (max_nl_size + CLUSTER_SIZE - 1) & CLUSTER_MASK;
if (max_nl_cursor == 0)
@@ -1395,7 +1395,7 @@ static int try_to_unmap_file(struct page
}
vma->vm_private_data = (void *) max_nl_cursor;
}
- cond_resched_lock(&mapping->i_mmap_lock);
+ cond_resched();
max_nl_cursor += CLUSTER_SIZE;
} while (max_nl_cursor <= max_nl_size);
@@ -1407,7 +1407,7 @@ static int try_to_unmap_file(struct page
list_for_each_entry(vma, &mapping->i_mmap_nonlinear, shared.vm_set.list)
vma->vm_private_data = NULL;
out:
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
return ret;
}
@@ -1552,7 +1552,7 @@ static int rmap_walk_file(struct page *p
if (!mapping)
return ret;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
unsigned long address = vma_address(page, vma);
if (address == -EFAULT)
@@ -1566,7 +1566,7 @@ static int rmap_walk_file(struct page *p
* never contain migration ptes. Decide what to do about this
* limitation to linear when we need rmap_walk() on nonlinear.
*/
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
return ret;
}
Index: linux-2.6/fs/gfs2/main.c
===================================================================
--- linux-2.6.orig/fs/gfs2/main.c
+++ linux-2.6/fs/gfs2/main.c
@@ -62,7 +62,7 @@ static void gfs2_init_gl_aspace_once(voi
memset(mapping, 0, sizeof(*mapping));
INIT_RADIX_TREE(&mapping->page_tree, GFP_ATOMIC);
spin_lock_init(&mapping->tree_lock);
- spin_lock_init(&mapping->i_mmap_lock);
+ mutex_init(&mapping->i_mmap_mutex);
INIT_LIST_HEAD(&mapping->private_list);
spin_lock_init(&mapping->private_lock);
INIT_RAW_PRIO_TREE_ROOT(&mapping->i_mmap);
Index: linux-2.6/fs/nilfs2/page.c
===================================================================
--- linux-2.6.orig/fs/nilfs2/page.c
+++ linux-2.6/fs/nilfs2/page.c
@@ -500,7 +500,7 @@ void nilfs_mapping_init_once(struct addr
INIT_LIST_HEAD(&mapping->private_list);
spin_lock_init(&mapping->private_lock);
- spin_lock_init(&mapping->i_mmap_lock);
+ mutex_init(&mapping->i_mmap_mutex);
INIT_RAW_PRIO_TREE_ROOT(&mapping->i_mmap);
INIT_LIST_HEAD(&mapping->i_mmap_nonlinear);
}
Index: linux-2.6/Documentation/lockstat.txt
===================================================================
--- linux-2.6.orig/Documentation/lockstat.txt
+++ linux-2.6/Documentation/lockstat.txt
@@ -136,7 +136,7 @@ The integer part of the time values is i
dcache_lock: 1037 1161 0.38 45.32 774.51 6611 243371 0.15 306.48 77387.24
&inode->i_mutex: 161 286 18446744073709 62882.54 1244614.55 3653 20598 18446744073709 62318.60 1693822.74
&zone->lru_lock: 94 94 0.53 7.33 92.10 4366 32690 0.29 59.81 16350.06
- &inode->i_data.i_mmap_lock: 79 79 0.40 3.77 53.03 11779 87755 0.28 116.93 29898.44
+ &inode->i_data.i_mmap_mutex: 79 79 0.40 3.77 53.03 11779 87755 0.28 116.93 29898.44
&q->__queue_lock: 48 50 0.52 31.62 86.31 774 13131 0.17 113.08 12277.52
&rq->rq_lock_key: 43 47 0.74 68.50 170.63 3706 33929 0.22 107.99 17460.62
&rq->rq_lock_key#2: 39 46 0.75 6.68 49.03 2979 32292 0.17 125.17 17137.63
Index: linux-2.6/Documentation/vm/locking
===================================================================
--- linux-2.6.orig/Documentation/vm/locking
+++ linux-2.6/Documentation/vm/locking
@@ -66,7 +66,7 @@ in some cases it is not really needed. E
expand_stack(), it is hard to come up with a destructive scenario without
having the vmlist protection in this case.
-The page_table_lock nests with the inode i_mmap_lock and the kmem cache
+The page_table_lock nests with the inode i_mmap_mutex and the kmem cache
c_spinlock spinlocks. This is okay, since the kmem code asks for pages after
dropping c_spinlock. The page_table_lock also nests with pagecache_lock and
pagemap_lru_lock spinlocks, and no code asks for memory with these locks
Index: linux-2.6/include/linux/mmu_notifier.h
===================================================================
--- linux-2.6.orig/include/linux/mmu_notifier.h
+++ linux-2.6/include/linux/mmu_notifier.h
@@ -150,7 +150,7 @@ struct mmu_notifier_ops {
* Therefore notifier chains can only be traversed when either
*
* 1. mmap_sem is held.
- * 2. One of the reverse map locks is held (i_mmap_lock or anon_vma->lock).
+ * 2. One of the reverse map locks is held (i_mmap_mutex or anon_vma->lock).
* 3. No other concurrent thread can access the list (release)
*/
struct mmu_notifier {
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -58,16 +58,16 @@
/*
* Lock ordering:
*
- * ->i_mmap_lock (truncate_pagecache)
+ * ->i_mmap_mutex (truncate_pagecache)
* ->private_lock (__free_pte->__set_page_dirty_buffers)
* ->swap_lock (exclusive_swap_page, others)
* ->mapping->tree_lock
*
* ->i_mutex
- * ->i_mmap_lock (truncate->unmap_mapping_range)
+ * ->i_mmap_mutex (truncate->unmap_mapping_range)
*
* ->mmap_sem
- * ->i_mmap_lock
+ * ->i_mmap_mutex
* ->page_table_lock or pte_lock (various, mainly in memory.c)
* ->mapping->tree_lock (arch-dependent flush_dcache_mmap_lock)
*
@@ -84,7 +84,7 @@
* ->sb_lock (fs/fs-writeback.c)
* ->mapping->tree_lock (__sync_single_inode)
*
- * ->i_mmap_lock
+ * ->i_mmap_mutex
* ->anon_vma.lock (vma_adjust)
*
* ->anon_vma.lock
@@ -104,7 +104,7 @@
*
* (code doesn't rely on that order, so you could switch it around)
* ->tasklist_lock (memory_failure, collect_procs_ao)
- * ->i_mmap_lock
+ * ->i_mmap_mutex
*/
/*
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 15/25] mm: Extended batches for generic mmu_gather
2011-01-25 17:31 ` Peter Zijlstra
(?)
@ 2011-01-25 17:31 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Hugh Dickins
[-- Attachment #1: peter_zijlstra-mm-extended_batches_for_generic_mmu_gather.patch --]
[-- Type: text/plain, Size: 5558 bytes --]
Instead of using a single batch (either the small on-stack array or an
allocated page), try to extend the batch every time it runs out, and only
flush once the extension fails or we're done.
Requested-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
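(Not part of the patch: a simplified userspace sketch of the batching scheme,
for orientation. The names, the fixed batch size and the use of malloc() are
invented for the example; the real implementation is the asm-generic/tlb.h
diff below, which allocates extra batches with GFP_NOWAIT | __GFP_NOWARN and
falls back to flushing when that allocation fails.)

#include <stdio.h>
#include <stdlib.h>

struct batch {
	struct batch *next;
	unsigned int nr, max;
	void *pages[8];
};

struct gather {
	struct batch local;		/* small "on-stack" batch */
	struct batch *active;		/* batch currently being filled */
};

static void gather_init(struct gather *g)
{
	g->local.next = NULL;
	g->local.nr = 0;
	g->local.max = 8;
	g->active = &g->local;
}

static void gather_flush(struct gather *g)
{
	struct batch *b;

	/* drain every batch, keep the allocated ones around for reuse */
	for (b = &g->local; b; b = b->next) {
		printf("freeing %u pages\n", b->nr);
		b->nr = 0;
	}
	g->active = &g->local;
}

static int next_batch(struct gather *g)
{
	struct batch *b;

	if (g->active->next) {		/* reuse a previously allocated batch */
		g->active = g->active->next;
		return 1;
	}
	b = malloc(sizeof(*b));		/* kernel: GFP_NOWAIT | __GFP_NOWARN page */
	if (!b)
		return 0;		/* extension failed */
	b->next = NULL;
	b->nr = 0;
	b->max = 8;
	g->active->next = b;
	g->active = b;
	return 1;
}

static void gather_page(struct gather *g, void *page)
{
	if (g->active->nr == g->active->max) {
		if (!next_batch(g))	/* can't extend: flush everything now */
			gather_flush(g);
	}
	g->active->pages[g->active->nr++] = page;
}

int main(void)
{
	struct gather g;
	int i;

	gather_init(&g);
	for (i = 0; i < 100; i++)
		gather_page(&g, NULL);
	gather_flush(&g);		/* final flush once we're done */
	return 0;
}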
include/asm-generic/tlb.h | 122 ++++++++++++++++++++++++++++++----------------
1 file changed, 82 insertions(+), 40 deletions(-)
Index: linux-2.6/include/asm-generic/tlb.h
===================================================================
--- linux-2.6.orig/include/asm-generic/tlb.h
+++ linux-2.6/include/asm-generic/tlb.h
@@ -17,16 +17,6 @@
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
-/*
- * For UP we don't need to worry about TLB flush
- * and page free order so much..
- */
-#ifdef CONFIG_SMP
- #define tlb_fast_mode(tlb) ((tlb)->nr == ~0U)
-#else
- #define tlb_fast_mode(tlb) 1
-#endif
-
#ifdef CONFIG_HAVE_RCU_TABLE_FREE
/*
* Semi RCU freeing of the page directories.
@@ -70,31 +60,66 @@ extern void tlb_remove_table(struct mmu_
#endif
+struct mmu_gather_batch {
+ struct mmu_gather_batch *next;
+ unsigned int nr;
+ unsigned int max;
+ struct page *pages[0];
+};
+
+#define MAX_GATHER_BATCH \
+ ((PAGE_SIZE - sizeof(struct mmu_gather_batch)) / sizeof(void *))
+
/* struct mmu_gather is an opaque type used by the mm code for passing around
* any data needed by arch specific code for tlb_remove_page.
*/
struct mmu_gather {
struct mm_struct *mm;
- unsigned int nr; /* set to ~0U means fast mode */
- unsigned int max; /* nr < max */
- unsigned int need_flush;/* Really unmapped some ptes? */
- unsigned int fullmm; /* non-zero means full mm flush */
- struct page **pages;
- struct page *local[8];
+ unsigned int need_flush : 1, /* Did free PTEs */
+ fast_mode : 1; /* No batching */
+ unsigned int fullmm; /* Flush full mm */
+
+ struct mmu_gather_batch *active;
+ struct mmu_gather_batch local;
+ struct page *__pages[8];
#ifdef CONFIG_HAVE_RCU_TABLE_FREE
struct mmu_table_batch *batch;
#endif
};
-static inline void __tlb_alloc_page(struct mmu_gather *tlb)
+/*
+ * For UP we don't need to worry about TLB flush
+ * and page free order so much..
+ */
+#ifdef CONFIG_SMP
+ #define tlb_fast_mode(tlb) (tlb->fast_mode)
+#else
+ #define tlb_fast_mode(tlb) 1
+#endif
+
+static inline int tlb_next_batch(struct mmu_gather *tlb)
{
- unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
+ struct mmu_gather_batch *batch;
- if (addr) {
- tlb->pages = (void *)addr;
- tlb->max = PAGE_SIZE / sizeof(struct page *);
+ batch = tlb->active;
+ if (batch->next) {
+ tlb->active = batch->next;
+ return 1;
}
+
+ batch = (void *)__get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
+ if (!batch)
+ return 0;
+
+ batch->next = NULL;
+ batch->nr = 0;
+ batch->max = MAX_GATHER_BATCH;
+
+ tlb->active->next = batch;
+ tlb->active = batch;
+
+ return 1;
}
/* tlb_gather_mmu
@@ -105,17 +130,16 @@ tlb_gather_mmu(struct mmu_gather *tlb, s
{
tlb->mm = mm;
- tlb->max = ARRAY_SIZE(tlb->local);
- tlb->pages = tlb->local;
-
- if (num_online_cpus() > 1) {
- tlb->nr = 0;
- __tlb_alloc_page(tlb);
- } else /* Use fast mode if only one CPU is online */
- tlb->nr = ~0U;
-
+ tlb->need_flush = 0;
+ if (num_online_cpus() == 1)
+ tlb->fast_mode = 1;
tlb->fullmm = full_mm_flush;
+ tlb->local.next = NULL;
+ tlb->local.nr = 0;
+ tlb->local.max = ARRAY_SIZE(tlb->__pages);
+ tlb->active = &tlb->local;
+
#ifdef CONFIG_HAVE_RCU_TABLE_FREE
tlb->batch = NULL;
#endif
@@ -124,6 +148,8 @@ tlb_gather_mmu(struct mmu_gather *tlb, s
static inline void
tlb_flush_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
{
+ struct mmu_gather_batch *batch;
+
if (!tlb->need_flush)
return;
tlb->need_flush = 0;
@@ -131,12 +157,14 @@ tlb_flush_mmu(struct mmu_gather *tlb, un
#ifdef CONFIG_HAVE_RCU_TABLE_FREE
tlb_table_flush(tlb);
#endif
- if (!tlb_fast_mode(tlb)) {
- free_pages_and_swap_cache(tlb->pages, tlb->nr);
- tlb->nr = 0;
- if (tlb->pages == tlb->local)
- __tlb_alloc_page(tlb);
+ if (tlb_fast_mode(tlb))
+ return;
+
+ for (batch = &tlb->local; batch; batch = batch->next) {
+ free_pages_and_swap_cache(batch->pages, batch->nr);
+ batch->nr = 0;
}
+ tlb->active = &tlb->local;
}
/* tlb_finish_mmu
@@ -146,13 +174,18 @@ tlb_flush_mmu(struct mmu_gather *tlb, un
static inline void
tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
{
+ struct mmu_gather_batch *batch, *next;
+
tlb_flush_mmu(tlb, start, end);
/* keep the page table cache within bounds */
check_pgt_cache();
- if (tlb->pages != tlb->local)
- free_pages((unsigned long)tlb->pages, 0);
+ for (batch = tlb->local.next; batch; batch = next) {
+ next = batch->next;
+ free_pages((unsigned long)batch, 0);
+ }
+ tlb->local.next = NULL;
}
/* tlb_remove_page
@@ -162,14 +195,23 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
*/
static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
+ struct mmu_gather_batch *batch;
+
tlb->need_flush = 1;
+
if (tlb_fast_mode(tlb)) {
free_page_and_swap_cache(page);
return;
}
- tlb->pages[tlb->nr++] = page;
- if (tlb->nr >= tlb->max)
- tlb_flush_mmu(tlb, 0, 0);
+
+ batch = tlb->active;
+ if (batch->nr == batch->max) {
+ if (!tlb_next_batch(tlb))
+ tlb_flush_mmu(tlb, 0, 0);
+ batch = tlb->active;
+ }
+
+ batch->pages[batch->nr++] = page;
}
/**
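To make the shape of the new scheme easier to see in one place, here is a
stand-alone user-space sketch of the same idea. The names and types below
(struct gather, gather_add() and friends) are made up for illustration and
are not the kernel API:

#include <stdio.h>
#include <stdlib.h>

#define BATCH_MAX 8                     /* stand-in for MAX_GATHER_BATCH */

struct batch {
        struct batch *next;
        unsigned int nr;
        void *items[BATCH_MAX];
};

struct gather {
        struct batch local;             /* the small "on-stack" batch */
        struct batch *active;
};

static void gather_init(struct gather *g)
{
        g->local.next = NULL;
        g->local.nr = 0;
        g->active = &g->local;
}

/* Mirror tlb_next_batch(): reuse a leftover batch or allocate a new one. */
static int next_batch(struct gather *g)
{
        struct batch *b;

        if (g->active->next) {
                g->active = g->active->next;
                return 1;
        }

        b = calloc(1, sizeof(*b));
        if (!b)
                return 0;               /* caller flushes instead */

        g->active->next = b;
        g->active = b;
        return 1;
}

/* Mirror tlb_flush_mmu(): process every batch, then rewind to the first. */
static void gather_flush(struct gather *g)
{
        struct batch *b;

        for (b = &g->local; b; b = b->next) {
                printf("flushing %u items\n", b->nr);
                b->nr = 0;
        }
        g->active = &g->local;
}

/* Mirror tlb_remove_page(): extend on overflow, flush only as a last resort. */
static void gather_add(struct gather *g, void *item)
{
        if (g->active->nr == BATCH_MAX && !next_batch(g))
                gather_flush(g);

        g->active->items[g->active->nr++] = item;
}

/* Mirror tlb_finish_mmu(): final flush, then free the chained batches. */
static void gather_finish(struct gather *g)
{
        struct batch *b, *next;

        gather_flush(g);
        for (b = g->local.next; b; b = next) {
                next = b->next;
                free(b);
        }
        g->local.next = NULL;
}

int main(void)
{
        struct gather g;
        int i, v[100];

        gather_init(&g);
        for (i = 0; i < 100; i++)
                gather_add(&g, &v[i]);
        gather_finish(&g);
        return 0;
}

The point of the structure is visible in gather_add(): the expensive flush
happens only when memory for another batch cannot be had, rather than every
time the previously single fixed-size array filled up.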
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 16/25] mm: Revert page_lock_anon_vma() lock annotation
2011-01-25 17:31 ` Peter Zijlstra
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Namhyung Kim,
Hugh Dickins
[-- Attachment #1: peter_zijlstra-mm-revert_page_lock_anon_vma_lock_annotation.patch --]
[-- Type: text/plain, Size: 1904 bytes --]
It's beyond ugly and gets in the way.
Cc: Namhyung Kim <namhyung@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/rmap.h | 15 +--------------
mm/rmap.c | 4 +---
2 files changed, 2 insertions(+), 17 deletions(-)
Index: linux-2.6/include/linux/rmap.h
===================================================================
--- linux-2.6.orig/include/linux/rmap.h
+++ linux-2.6/include/linux/rmap.h
@@ -243,20 +243,7 @@ int try_to_munlock(struct page *);
/*
* Called by memory-failure.c to kill processes.
*/
-struct anon_vma *__page_lock_anon_vma(struct page *page);
-
-static inline struct anon_vma *page_lock_anon_vma(struct page *page)
-{
- struct anon_vma *anon_vma;
-
- __cond_lock(RCU, anon_vma = __page_lock_anon_vma(page));
-
- /* (void) is needed to make gcc happy */
- (void) __cond_lock(&anon_vma->root->lock, anon_vma);
-
- return anon_vma;
-}
-
+struct anon_vma *page_lock_anon_vma(struct page *page);
void page_unlock_anon_vma(struct anon_vma *anon_vma);
int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma);
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -318,7 +318,7 @@ void __init anon_vma_init(void)
* Getting a lock on a stable anon_vma from a page off the LRU is
* tricky: page_lock_anon_vma rely on RCU to guard against the races.
*/
-struct anon_vma *__page_lock_anon_vma(struct page *page)
+struct anon_vma *page_lock_anon_vma(struct page *page)
{
struct anon_vma *anon_vma, *root_anon_vma;
unsigned long anon_mapping;
@@ -352,8 +352,6 @@ struct anon_vma *__page_lock_anon_vma(st
}
void page_unlock_anon_vma(struct anon_vma *anon_vma)
- __releases(&anon_vma->root->lock)
- __releases(RCU)
{
anon_vma_unlock(anon_vma);
rcu_read_unlock();
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 17/25] mm: Improve page_lock_anon_vma() comment
2011-01-25 17:31 ` Peter Zijlstra
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Hugh Dickins
[-- Attachment #1: peter_zijlstra-mm-improve_page_lock_anon_vma_comment.patch --]
[-- Type: text/plain, Size: 1685 bytes --]
A slightly more verbose comment to go along with the trickery in
page_lock_anon_vma().
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
mm/rmap.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -315,8 +315,22 @@ void __init anon_vma_init(void)
}
/*
- * Getting a lock on a stable anon_vma from a page off the LRU is
- * tricky: page_lock_anon_vma rely on RCU to guard against the races.
+ * Getting a lock on a stable anon_vma from a page off the LRU is tricky!
+ *
+ * Since there is no serialization what so ever against page_remove_rmap()
+ * the best this function can do is return a locked anon_vma that might
+ * have been relevant to this page.
+ *
+ * The page might have been remapped to a different anon_vma or the anon_vma
+ * returned may already be freed (and even reused).
+ *
+ * All users of this function must be very careful when walking the anon_vma
+ * chain and verify that the page in question is indeed mapped in it
+ * [ something equivalent to page_mapped_in_vma() ].
+ *
+ * Since anon_vma's slab is DESTROY_BY_RCU and we know from page_remove_rmap()
+ * that the anon_vma pointer from page->mapping is valid if there is a
+ * mapcount, we can dereference the anon_vma after observing those.
*/
struct anon_vma *page_lock_anon_vma(struct page *page)
{
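The contract described in the new comment -- the returned anon_vma is only
*possibly* relevant, and callers must re-validate after locking -- can be
illustrated with a small user-space analogue. The types are made up and the
generation field merely stands in for a page_mapped_in_vma()-style re-check;
this is not the kernel code:

#include <pthread.h>
#include <stdio.h>

struct anon_like {
        pthread_mutex_t lock;
        int generation;                 /* bumped when the object is recycled */
};

struct page_like {
        struct anon_like *mapping;      /* unstable: may change under us */
        int mapcount;
        int generation;                 /* what the page expects to find */
};

/* Best effort: lock whatever the page pointed at, if it pointed anywhere. */
static struct anon_like *lock_mapping(struct page_like *pg)
{
        struct anon_like *a = pg->mapping;

        if (!a || pg->mapcount <= 0)
                return NULL;
        pthread_mutex_lock(&a->lock);
        return a;                       /* caller must still validate */
}

int main(void)
{
        struct anon_like a = { PTHREAD_MUTEX_INITIALIZER, 1 };
        struct page_like pg = { &a, 1, 1 };
        struct anon_like *locked = lock_mapping(&pg);

        if (locked) {
                if (locked->generation == pg.generation)
                        printf("still mapped here, safe to walk\n");
                else
                        printf("object was recycled, bail out\n");
                pthread_mutex_unlock(&locked->lock);
        }
        return 0;
}

In the kernel the "generation" check is the caller walking the anon_vma chain
and verifying that the page really is mapped there, exactly as the comment
instructs.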
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 18/25] mm: Rename drop_anon_vma to put_anon_vma
2011-01-25 17:31 ` Peter Zijlstra
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Hugh Dickins
[-- Attachment #1: peter_zijlstra-mm-rename_drop_anon_vma_to_put_anon_vma.patch --]
[-- Type: text/plain, Size: 4344 bytes --]
The normal code pattern used in the kernel is: get/put.
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/rmap.h | 4 ++--
mm/ksm.c | 23 +++++------------------
mm/migrate.c | 4 ++--
mm/rmap.c | 4 ++--
4 files changed, 11 insertions(+), 24 deletions(-)
Index: linux-2.6/include/linux/rmap.h
===================================================================
--- linux-2.6.orig/include/linux/rmap.h
+++ linux-2.6/include/linux/rmap.h
@@ -87,7 +87,7 @@ static inline void get_anon_vma(struct a
atomic_inc(&anon_vma->external_refcount);
}
-void drop_anon_vma(struct anon_vma *);
+void put_anon_vma(struct anon_vma *);
#else
static inline void anonvma_external_refcount_init(struct anon_vma *anon_vma)
{
@@ -102,7 +102,7 @@ static inline void get_anon_vma(struct a
{
}
-static inline void drop_anon_vma(struct anon_vma *anon_vma)
+static inline void put_anon_vma(struct anon_vma *anon_vma)
{
}
#endif /* CONFIG_KSM */
Index: linux-2.6/mm/ksm.c
===================================================================
--- linux-2.6.orig/mm/ksm.c
+++ linux-2.6/mm/ksm.c
@@ -301,20 +301,6 @@ static inline int in_stable_tree(struct
return rmap_item->address & STABLE_FLAG;
}
-static void hold_anon_vma(struct rmap_item *rmap_item,
- struct anon_vma *anon_vma)
-{
- rmap_item->anon_vma = anon_vma;
- get_anon_vma(anon_vma);
-}
-
-static void ksm_drop_anon_vma(struct rmap_item *rmap_item)
-{
- struct anon_vma *anon_vma = rmap_item->anon_vma;
-
- drop_anon_vma(anon_vma);
-}
-
/*
* ksmd, and unmerge_and_remove_all_rmap_items(), must not touch an mm's
* page tables after it has passed through ksm_exit() - which, if necessary,
@@ -397,7 +383,7 @@ static void break_cow(struct rmap_item *
* It is not an accident that whenever we want to break COW
* to undo, we also need to drop a reference to the anon_vma.
*/
- ksm_drop_anon_vma(rmap_item);
+ put_anon_vma(rmap_item->anon_vma);
down_read(&mm->mmap_sem);
if (ksm_test_exit(mm))
@@ -466,7 +452,7 @@ static void remove_node_from_stable_tree
ksm_pages_sharing--;
else
ksm_pages_shared--;
- ksm_drop_anon_vma(rmap_item);
+ put_anon_vma(rmap_item->anon_vma);
rmap_item->address &= PAGE_MASK;
cond_resched();
}
@@ -554,7 +540,7 @@ static void remove_rmap_item_from_tree(s
else
ksm_pages_shared--;
- ksm_drop_anon_vma(rmap_item);
+ put_anon_vma(rmap_item->anon_vma);
rmap_item->address &= PAGE_MASK;
} else if (rmap_item->address & UNSTABLE_FLAG) {
@@ -949,7 +935,8 @@ static int try_to_merge_with_ksm_page(st
goto out;
/* Must get reference to anon_vma while still holding mmap_sem */
- hold_anon_vma(rmap_item, vma->anon_vma);
+ rmap_item->anon_vma = vma->anon_vma;
+ get_anon_vma(vma->anon_vma);
out:
up_read(&mm->mmap_sem);
return err;
Index: linux-2.6/mm/migrate.c
===================================================================
--- linux-2.6.orig/mm/migrate.c
+++ linux-2.6/mm/migrate.c
@@ -764,7 +764,7 @@ static int unmap_and_move(new_page_t get
/* Drop an anon_vma reference if we took one */
if (anon_vma)
- drop_anon_vma(anon_vma);
+ put_anon_vma(anon_vma);
uncharge:
if (!charge)
@@ -857,7 +857,7 @@ static int unmap_and_move_huge_page(new_
remove_migration_ptes(hpage, hpage);
if (anon_vma)
- drop_anon_vma(anon_vma);
+ put_anon_vma(anon_vma);
out:
unlock_page(hpage);
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -278,7 +278,7 @@ static void anon_vma_unlink(struct anon_
if (empty) {
/* We no longer need the root anon_vma */
if (anon_vma->root != anon_vma)
- drop_anon_vma(anon_vma->root);
+ put_anon_vma(anon_vma->root);
anon_vma_free(anon_vma);
}
}
@@ -1489,7 +1489,7 @@ int try_to_munlock(struct page *page)
* we know we are the last user, nobody else can get a reference and we
* can do the freeing without the lock.
*/
-void drop_anon_vma(struct anon_vma *anon_vma)
+void put_anon_vma(struct anon_vma *anon_vma)
{
BUG_ON(atomic_read(&anon_vma->external_refcount) <= 0);
if (atomic_dec_and_lock(&anon_vma->external_refcount, &anon_vma->root->lock)) {
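For completeness, the get/put convention being adopted here is the usual
refcounting shape. A minimal user-space sketch -- obj, get_obj() and
put_obj() are illustrative names, not kernel interfaces -- looks like:

#include <stdatomic.h>
#include <stdlib.h>

struct obj {
        atomic_int refcount;
        /* payload ... */
};

static struct obj *obj_alloc(void)
{
        struct obj *o = calloc(1, sizeof(*o));

        if (o)
                atomic_store(&o->refcount, 1);  /* creator holds the first ref */
        return o;
}

/* "get": take a reference; pairs with put_obj(). */
static void get_obj(struct obj *o)
{
        atomic_fetch_add(&o->refcount, 1);
}

/* "put": drop a reference; the last put frees the object. */
static void put_obj(struct obj *o)
{
        if (atomic_fetch_sub(&o->refcount, 1) == 1)
                free(o);
}

int main(void)
{
        struct obj *o = obj_alloc();

        if (!o)
                return 1;
        get_obj(o);     /* second user */
        put_obj(o);     /* ... done with it */
        put_obj(o);     /* creator's ref, object is freed here */
        return 0;
}

put_anon_vma() is the same shape, with the extra twist that the final
reference drop happens under the root anon_vma lock, as the mm/rmap.c hunk
above shows.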
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 19/25] mm: Move anon_vma ref out from under CONFIG_KSM
2011-01-25 17:31 ` Peter Zijlstra
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Hugh Dickins
[-- Attachment #1: peter_zijlstra-mm-move_anon_vma_ref_out_from_under_config_ksm.patch --]
[-- Type: text/plain, Size: 4561 bytes --]
We need an anon_vma refcount for preemptible anon_vma->lock.
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/rmap.h | 40 ++++------------------------------------
mm/rmap.c | 14 ++++++--------
2 files changed, 10 insertions(+), 44 deletions(-)
Index: linux-2.6/include/linux/rmap.h
===================================================================
--- linux-2.6.orig/include/linux/rmap.h
+++ linux-2.6/include/linux/rmap.h
@@ -27,18 +27,15 @@
struct anon_vma {
struct anon_vma *root; /* Root of this anon_vma tree */
spinlock_t lock; /* Serialize access to vma list */
-#if defined(CONFIG_KSM) || defined(CONFIG_MIGRATION)
-
/*
- * The external_refcount is taken by either KSM or page migration
- * to take a reference to an anon_vma when there is no
+ * The refcount is taken on an anon_vma when there is no
* guarantee that the vma of page tables will exist for
* the duration of the operation. A caller that takes
* the reference is responsible for clearing up the
* anon_vma if they are the last user on release
*/
- atomic_t external_refcount;
-#endif
+ atomic_t refcount;
+
/*
* NOTE: the LSB of the head.next is set by
* mm_take_all_locks() _after_ taking the above lock. So the
@@ -71,41 +68,12 @@ struct anon_vma_chain {
};
#ifdef CONFIG_MMU
-#if defined(CONFIG_KSM) || defined(CONFIG_MIGRATION)
-static inline void anonvma_external_refcount_init(struct anon_vma *anon_vma)
-{
- atomic_set(&anon_vma->external_refcount, 0);
-}
-
-static inline int anonvma_external_refcount(struct anon_vma *anon_vma)
-{
- return atomic_read(&anon_vma->external_refcount);
-}
-
static inline void get_anon_vma(struct anon_vma *anon_vma)
{
- atomic_inc(&anon_vma->external_refcount);
+ atomic_inc(&anon_vma->refcount);
}
void put_anon_vma(struct anon_vma *);
-#else
-static inline void anonvma_external_refcount_init(struct anon_vma *anon_vma)
-{
-}
-
-static inline int anonvma_external_refcount(struct anon_vma *anon_vma)
-{
- return 0;
-}
-
-static inline void get_anon_vma(struct anon_vma *anon_vma)
-{
-}
-
-static inline void put_anon_vma(struct anon_vma *anon_vma)
-{
-}
-#endif /* CONFIG_KSM */
static inline struct anon_vma *page_anon_vma(struct page *page)
{
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -272,7 +272,7 @@ static void anon_vma_unlink(struct anon_
list_del(&anon_vma_chain->same_anon_vma);
/* We must garbage collect the anon_vma if it's empty */
- empty = list_empty(&anon_vma->head) && !anonvma_external_refcount(anon_vma);
+ empty = list_empty(&anon_vma->head) && !atomic_read(&anon_vma->refcount);
anon_vma_unlock(anon_vma);
if (empty) {
@@ -303,7 +303,7 @@ static void anon_vma_ctor(void *data)
struct anon_vma *anon_vma = data;
spin_lock_init(&anon_vma->lock);
- anonvma_external_refcount_init(anon_vma);
+ atomic_set(&anon_vma->refcount, 0);
INIT_LIST_HEAD(&anon_vma->head);
}
@@ -1482,7 +1482,6 @@ int try_to_munlock(struct page *page)
return try_to_unmap_file(page, TTU_MUNLOCK);
}
-#if defined(CONFIG_KSM) || defined(CONFIG_MIGRATION)
/*
* Drop an anon_vma refcount, freeing the anon_vma and anon_vma->root
* if necessary. Be careful to do all the tests under the lock. Once
@@ -1491,8 +1490,8 @@ int try_to_munlock(struct page *page)
*/
void put_anon_vma(struct anon_vma *anon_vma)
{
- BUG_ON(atomic_read(&anon_vma->external_refcount) <= 0);
- if (atomic_dec_and_lock(&anon_vma->external_refcount, &anon_vma->root->lock)) {
+ BUG_ON(atomic_read(&anon_vma->refcount) <= 0);
+ if (atomic_dec_and_lock(&anon_vma->refcount, &anon_vma->root->lock)) {
struct anon_vma *root = anon_vma->root;
int empty = list_empty(&anon_vma->head);
int last_root_user = 0;
@@ -1503,8 +1502,8 @@ void put_anon_vma(struct anon_vma *anon_
* the refcount on the root and check if we need to free it.
*/
if (empty && anon_vma != root) {
- BUG_ON(atomic_read(&root->external_refcount) <= 0);
- last_root_user = atomic_dec_and_test(&root->external_refcount);
+ BUG_ON(atomic_read(&root->refcount) <= 0);
+ last_root_user = atomic_dec_and_test(&root->refcount);
root_empty = list_empty(&root->head);
}
anon_vma_unlock(anon_vma);
@@ -1516,7 +1515,6 @@ void put_anon_vma(struct anon_vma *anon_
}
}
}
-#endif
#ifdef CONFIG_MIGRATION
/*
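The put side of the generalised refcount leans on the dec-and-lock idiom:
drop the reference cheaply while other users remain, and only take the lock
when this might be the final drop. A rough user-space approximation --
dec_and_lock() below is a stand-in, not the kernel's atomic_dec_and_lock() --
looks like:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int dec_and_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
        int old = atomic_load(cnt);

        /* fast path: not the last reference, never touch the lock */
        while (old > 1) {
                if (atomic_compare_exchange_weak(cnt, &old, old - 1))
                        return 0;
        }

        /* slow path: we may be the last user, decrement under the lock */
        pthread_mutex_lock(lock);
        if (atomic_fetch_sub(cnt, 1) == 1)
                return 1;               /* last ref; caller frees, lock held */
        pthread_mutex_unlock(lock);
        return 0;
}

int main(void)
{
        atomic_int refs = 2;
        pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

        printf("first put frees? %d\n", dec_and_lock(&refs, &lock));   /* 0 */
        if (dec_and_lock(&refs, &lock)) {                               /* 1 */
                printf("last put, tearing down under the lock\n");
                pthread_mutex_unlock(&lock);
        }
        return 0;
}

When dec_and_lock() returns 1 the caller holds the lock with the count at
zero and can do the teardown serialized against concurrent lookups, which is
what put_anon_vma() does under anon_vma->root->lock in the hunk above.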
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 19/25] mm: Move anon_vma ref out from under CONFIG_KSM
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Hugh Dickins
[-- Attachment #1: peter_zijlstra-mm-move_anon_vma_ref_out_from_under_config_ksm.patch --]
[-- Type: text/plain, Size: 4857 bytes --]
We need an anon_vma refcount for preemptible anon_vma->lock.
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/rmap.h | 40 ++++------------------------------------
mm/rmap.c | 14 ++++++--------
2 files changed, 10 insertions(+), 44 deletions(-)
Index: linux-2.6/include/linux/rmap.h
===================================================================
--- linux-2.6.orig/include/linux/rmap.h
+++ linux-2.6/include/linux/rmap.h
@@ -27,18 +27,15 @@
struct anon_vma {
struct anon_vma *root; /* Root of this anon_vma tree */
spinlock_t lock; /* Serialize access to vma list */
-#if defined(CONFIG_KSM) || defined(CONFIG_MIGRATION)
-
/*
- * The external_refcount is taken by either KSM or page migration
- * to take a reference to an anon_vma when there is no
+ * The refcount is taken on an anon_vma when there is no
* guarantee that the vma of page tables will exist for
* the duration of the operation. A caller that takes
* the reference is responsible for clearing up the
* anon_vma if they are the last user on release
*/
- atomic_t external_refcount;
-#endif
+ atomic_t refcount;
+
/*
* NOTE: the LSB of the head.next is set by
* mm_take_all_locks() _after_ taking the above lock. So the
@@ -71,41 +68,12 @@ struct anon_vma_chain {
};
#ifdef CONFIG_MMU
-#if defined(CONFIG_KSM) || defined(CONFIG_MIGRATION)
-static inline void anonvma_external_refcount_init(struct anon_vma *anon_vma)
-{
- atomic_set(&anon_vma->external_refcount, 0);
-}
-
-static inline int anonvma_external_refcount(struct anon_vma *anon_vma)
-{
- return atomic_read(&anon_vma->external_refcount);
-}
-
static inline void get_anon_vma(struct anon_vma *anon_vma)
{
- atomic_inc(&anon_vma->external_refcount);
+ atomic_inc(&anon_vma->refcount);
}
void put_anon_vma(struct anon_vma *);
-#else
-static inline void anonvma_external_refcount_init(struct anon_vma *anon_vma)
-{
-}
-
-static inline int anonvma_external_refcount(struct anon_vma *anon_vma)
-{
- return 0;
-}
-
-static inline void get_anon_vma(struct anon_vma *anon_vma)
-{
-}
-
-static inline void put_anon_vma(struct anon_vma *anon_vma)
-{
-}
-#endif /* CONFIG_KSM */
static inline struct anon_vma *page_anon_vma(struct page *page)
{
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -272,7 +272,7 @@ static void anon_vma_unlink(struct anon_
list_del(&anon_vma_chain->same_anon_vma);
/* We must garbage collect the anon_vma if it's empty */
- empty = list_empty(&anon_vma->head) && !anonvma_external_refcount(anon_vma);
+ empty = list_empty(&anon_vma->head) && !atomic_read(&anon_vma->refcount);
anon_vma_unlock(anon_vma);
if (empty) {
@@ -303,7 +303,7 @@ static void anon_vma_ctor(void *data)
struct anon_vma *anon_vma = data;
spin_lock_init(&anon_vma->lock);
- anonvma_external_refcount_init(anon_vma);
+ atomic_set(&anon_vma->refcount, 0);
INIT_LIST_HEAD(&anon_vma->head);
}
@@ -1482,7 +1482,6 @@ int try_to_munlock(struct page *page)
return try_to_unmap_file(page, TTU_MUNLOCK);
}
-#if defined(CONFIG_KSM) || defined(CONFIG_MIGRATION)
/*
* Drop an anon_vma refcount, freeing the anon_vma and anon_vma->root
* if necessary. Be careful to do all the tests under the lock. Once
@@ -1491,8 +1490,8 @@ int try_to_munlock(struct page *page)
*/
void put_anon_vma(struct anon_vma *anon_vma)
{
- BUG_ON(atomic_read(&anon_vma->external_refcount) <= 0);
- if (atomic_dec_and_lock(&anon_vma->external_refcount, &anon_vma->root->lock)) {
+ BUG_ON(atomic_read(&anon_vma->refcount) <= 0);
+ if (atomic_dec_and_lock(&anon_vma->refcount, &anon_vma->root->lock)) {
struct anon_vma *root = anon_vma->root;
int empty = list_empty(&anon_vma->head);
int last_root_user = 0;
@@ -1503,8 +1502,8 @@ void put_anon_vma(struct anon_vma *anon_
* the refcount on the root and check if we need to free it.
*/
if (empty && anon_vma != root) {
- BUG_ON(atomic_read(&root->external_refcount) <= 0);
- last_root_user = atomic_dec_and_test(&root->external_refcount);
+ BUG_ON(atomic_read(&root->refcount) <= 0);
+ last_root_user = atomic_dec_and_test(&root->refcount);
root_empty = list_empty(&root->head);
}
anon_vma_unlock(anon_vma);
@@ -1516,7 +1515,6 @@ void put_anon_vma(struct anon_vma *anon_
}
}
}
-#endif
#ifdef CONFIG_MIGRATION
/*
--
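The put_anon_vma() above is built around atomic_dec_and_lock(): the reference
is dropped, and only the caller that drops the last one ends up holding the
root lock to do the teardown checks. As a rough userspace sketch of that
primitive's contract (an illustration, not the kernel implementation; C11
atomics and a pthread mutex stand in for atomic_t and the anon_vma spinlock):

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Contract of the primitive: drop one reference; if and only if that was
 * the last one, return true with the lock held so the caller can do the
 * teardown checks and free the object under it.
 */
static bool dec_and_lock(atomic_int *count, pthread_mutex_t *lock)
{
	int old = atomic_load(count);

	/* Fast path: clearly not the last reference, just decrement. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(count, &old, old - 1))
			return false;
	}

	/* Slow path: we may be the last user; decide under the lock. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(count, 1) == 1)
		return true;		/* count hit zero, lock stays held */
	pthread_mutex_unlock(lock);
	return false;
}

Patch 20 below removes the need for this dance by letting the refcount alone
decide when the anon_vma dies.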
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 20/25] mm: Simplify anon_vma refcounts
2011-01-25 17:31 ` Peter Zijlstra
@ 2011-01-25 17:31 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Hugh Dickins
[-- Attachment #1: peter_zijlstra-mm-simplify_anon_vma_refcounts.patch --]
[-- Type: text/plain, Size: 6088 bytes --]
This patch changes the anon_vma refcount to be 0 when the object is
free. It does this by adding 1 ref to being in use in the anon_vma
structure (iow. the anon_vma->head list is not empty).
This allows a simpler release scheme that does not have to check both
the refcount and the list, and it avoids taking a ref for each entry
on the list.
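Put differently: the anon_vma is born holding one reference that stands for
being in use, external users get/put around that, and the release path only
has to notice the count reaching zero. A minimal userspace sketch of the
lifetime rule (not part of the patch; C11 atomics stand in for the kernel's
atomic_t, and all names here are made up):

#include <stdatomic.h>
#include <stdlib.h>

struct obj {
	atomic_int refcount;		/* 0 only once the object is free */
};

static struct obj *obj_alloc(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o)
		atomic_init(&o->refcount, 1);	/* the "in use" reference */
	return o;
}

static void obj_get(struct obj *o)
{
	atomic_fetch_add(&o->refcount, 1);	/* an extra user, e.g. migration */
}

static void obj_put(struct obj *o)
{
	/* fetch_sub returns the old value; 1 means we were the last user */
	if (atomic_fetch_sub(&o->refcount, 1) == 1)
		free(o);
}

int main(void)
{
	struct obj *o = obj_alloc();

	if (!o)
		return 1;
	obj_get(o);	/* a second user appears */
	obj_put(o);	/* the second user is done */
	obj_put(o);	/* "in use" reference dropped: the object is freed here */
	return 0;
}

With that rule, anon_vma_unlink() can simply drop the in-use reference once
the vma list goes empty instead of inspecting both the list and a separate
external count.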
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/rmap.h | 11 +++++--
mm/rmap.c | 79 ++++++++++++++++++---------------------------------
2 files changed, 37 insertions(+), 53 deletions(-)
Index: linux-2.6/include/linux/rmap.h
===================================================================
--- linux-2.6.orig/include/linux/rmap.h
+++ linux-2.6/include/linux/rmap.h
@@ -73,7 +73,13 @@ static inline void get_anon_vma(struct a
atomic_inc(&anon_vma->refcount);
}
-void put_anon_vma(struct anon_vma *);
+void __put_anon_vma(struct anon_vma *anon_vma);
+
+static inline void put_anon_vma(struct anon_vma *anon_vma)
+{
+ if (atomic_dec_and_test(&anon_vma->refcount))
+ __put_anon_vma(anon_vma);
+}
static inline struct anon_vma *page_anon_vma(struct page *page)
{
@@ -116,7 +122,6 @@ void unlink_anon_vmas(struct vm_area_str
int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *);
int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
void __anon_vma_link(struct vm_area_struct *);
-void anon_vma_free(struct anon_vma *);
static inline void anon_vma_merge(struct vm_area_struct *vma,
struct vm_area_struct *next)
@@ -125,6 +130,8 @@ static inline void anon_vma_merge(struct
unlink_anon_vmas(next);
}
+struct anon_vma *page_get_anon_vma(struct page *page);
+
/*
* rmap interfaces called when adding or removing pte of page
*/
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -67,11 +67,24 @@ static struct kmem_cache *anon_vma_chain
static inline struct anon_vma *anon_vma_alloc(void)
{
- return kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL);
+ struct anon_vma *anon_vma;
+
+ anon_vma = kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL);
+ if (anon_vma) {
+ atomic_set(&anon_vma->refcount, 1);
+ /*
+ * Initialise the anon_vma root to point to itself. If called
+ * from fork, the root will be reset to the parents anon_vma.
+ */
+ anon_vma->root = anon_vma;
+ }
+
+ return anon_vma;
}
-void anon_vma_free(struct anon_vma *anon_vma)
+static inline void anon_vma_free(struct anon_vma *anon_vma)
{
+ VM_BUG_ON(atomic_read(&anon_vma->refcount));
kmem_cache_free(anon_vma_cachep, anon_vma);
}
@@ -133,11 +146,6 @@ int anon_vma_prepare(struct vm_area_stru
if (unlikely(!anon_vma))
goto out_enomem_free_avc;
allocated = anon_vma;
- /*
- * This VMA had no anon_vma yet. This anon_vma is
- * the root of any anon_vma tree that might form.
- */
- anon_vma->root = anon_vma;
}
anon_vma_lock(anon_vma);
@@ -156,7 +164,7 @@ int anon_vma_prepare(struct vm_area_stru
anon_vma_unlock(anon_vma);
if (unlikely(allocated))
- anon_vma_free(allocated);
+ put_anon_vma(allocated);
if (unlikely(avc))
anon_vma_chain_free(avc);
}
@@ -241,9 +249,9 @@ int anon_vma_fork(struct vm_area_struct
*/
anon_vma->root = pvma->anon_vma->root;
/*
- * With KSM refcounts, an anon_vma can stay around longer than the
- * process it belongs to. The root anon_vma needs to be pinned
- * until this anon_vma is freed, because the lock lives in the root.
+ * With refcounts, an anon_vma can stay around longer than the
+ * process it belongs to. The root anon_vma needs to be pinned until
+ * this anon_vma is freed, because the lock lives in the root.
*/
get_anon_vma(anon_vma->root);
/* Mark this anon_vma as the one where our new (COWed) pages go. */
@@ -253,7 +261,7 @@ int anon_vma_fork(struct vm_area_struct
return 0;
out_error_free_anon_vma:
- anon_vma_free(anon_vma);
+ put_anon_vma(anon_vma);
out_error:
unlink_anon_vmas(vma);
return -ENOMEM;
@@ -272,15 +280,11 @@ static void anon_vma_unlink(struct anon_
list_del(&anon_vma_chain->same_anon_vma);
/* We must garbage collect the anon_vma if it's empty */
- empty = list_empty(&anon_vma->head) && !atomic_read(&anon_vma->refcount);
+ empty = list_empty(&anon_vma->head);
anon_vma_unlock(anon_vma);
- if (empty) {
- /* We no longer need the root anon_vma */
- if (anon_vma->root != anon_vma)
- put_anon_vma(anon_vma->root);
- anon_vma_free(anon_vma);
- }
+ if (empty)
+ put_anon_vma(anon_vma);
}
void unlink_anon_vmas(struct vm_area_struct *vma)
@@ -1482,38 +1486,11 @@ int try_to_munlock(struct page *page)
return try_to_unmap_file(page, TTU_MUNLOCK);
}
-/*
- * Drop an anon_vma refcount, freeing the anon_vma and anon_vma->root
- * if necessary. Be careful to do all the tests under the lock. Once
- * we know we are the last user, nobody else can get a reference and we
- * can do the freeing without the lock.
- */
-void put_anon_vma(struct anon_vma *anon_vma)
-{
- BUG_ON(atomic_read(&anon_vma->refcount) <= 0);
- if (atomic_dec_and_lock(&anon_vma->refcount, &anon_vma->root->lock)) {
- struct anon_vma *root = anon_vma->root;
- int empty = list_empty(&anon_vma->head);
- int last_root_user = 0;
- int root_empty = 0;
-
- /*
- * The refcount on a non-root anon_vma got dropped. Drop
- * the refcount on the root and check if we need to free it.
- */
- if (empty && anon_vma != root) {
- BUG_ON(atomic_read(&root->refcount) <= 0);
- last_root_user = atomic_dec_and_test(&root->refcount);
- root_empty = list_empty(&root->head);
- }
- anon_vma_unlock(anon_vma);
-
- if (empty) {
- anon_vma_free(anon_vma);
- if (root_empty && last_root_user)
- anon_vma_free(root);
- }
- }
+void __put_anon_vma(struct anon_vma *anon_vma)
+{
+ if (anon_vma->root != anon_vma)
+ put_anon_vma(anon_vma->root);
+ anon_vma_free(anon_vma);
}
#ifdef CONFIG_MIGRATION
^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: [PATCH 20/25] mm: Simplify anon_vma refcounts
2011-01-25 17:31 ` Peter Zijlstra
@ 2011-01-25 20:16 ` Linus Torvalds
-1 siblings, 0 replies; 128+ messages in thread
From: Linus Torvalds @ 2011-01-25 20:16 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, linux-kernel, linux-arch, linux-mm,
Benjamin Herrenschmidt, David Miller, Hugh Dickins, Mel Gorman,
Nick Piggin, Paul McKenney, Yanmin Zhang, Hugh Dickins
On Wed, Jan 26, 2011 at 3:31 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> This patch changes the anon_vma refcount to be 0 when the object is
> free. It does this by adding 1 ref to being in use in the anon_vma
> structure (iow. the anon_vma->head list is not empty).
Why is this patch part of this series, rather than being an
independent patch before the whole series?
I think this part of the series is the only total no-brainer, ie we
should have done this from the beginning. The preemptability stuff I'm
more nervous about (performance issues? semantic differences?)
Linus
^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: [PATCH 20/25] mm: Simplify anon_vma refcounts
2011-01-25 20:16 ` Linus Torvalds
@ 2011-01-25 20:31 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 20:31 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, linux-kernel, linux-arch, linux-mm,
Benjamin Herrenschmidt, David Miller, Hugh Dickins, Mel Gorman,
Nick Piggin, Paul McKenney, Yanmin Zhang, Hugh Dickins
On Wed, 2011-01-26 at 06:16 +1000, Linus Torvalds wrote:
> On Wed, Jan 26, 2011 at 3:31 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >
> > This patch changes the anon_vma refcount to be 0 when the object is
> > free. It does this by adding 1 ref to being in use in the anon_vma
> > structure (iow. the anon_vma->head list is not empty).
>
> Why is this patch part of this series, rather than being an
> independent patch before the whole series?
>
> I think this part of the series is the only total no-brainer, ie we
> should have done this from the beginning. The preemptability stuff I'm
> more nervous about (performance issues? semantic differences?)
It relies on patch 19, which moves the anon_vma refcount out from under
CONFIG_goo.
If however you like it, I can move 19 and this patch up to the start and
have that go your way soon.
^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: [PATCH 20/25] mm: Simplify anon_vma refcounts
2011-01-25 20:31 ` Peter Zijlstra
@ 2011-01-25 20:37 ` Linus Torvalds
-1 siblings, 0 replies; 128+ messages in thread
From: Linus Torvalds @ 2011-01-25 20:37 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, linux-kernel, linux-arch, linux-mm,
Benjamin Herrenschmidt, David Miller, Hugh Dickins, Mel Gorman,
Nick Piggin, Paul McKenney, Yanmin Zhang, Hugh Dickins
On Wed, Jan 26, 2011 at 6:31 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> It relies on patch 19, which moves the anon_vma refcount out from under
> CONFIG_goo.
>
> If however you like it, I can move 19 and this patch up to the start and
> have that go your way soon.
So I'd prefer that, and keeping this issue separate. It seems like a
rather independent cleanup to me (there may be others, this is the
only one I reacted to when skimming through the series - reading email
using my small laptop screen and a somewhat flaky 3g hotspot
connection isn't exactly amenable to deep and careful thinking ;).
Linus
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 21/25] mm: Use refcounts for page_lock_anon_vma()
2011-01-25 17:31 ` Peter Zijlstra
@ 2011-01-25 17:31 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Hugh Dickins
[-- Attachment #1: peter_zijlstra-mm-use_refcounts_for_page_lock_anon_vma.patch --]
[-- Type: text/plain, Size: 3999 bytes --]
Convert page_lock_anon_vma() over to use refcounts. This is done to
prepare for the conversion of anon_vma from spinlock to mutex.
Sadly this increases the cost of page_lock_anon_vma() from one to two
atomics; a follow-up patch addresses this, so let's keep it simple for
now.
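The primitive doing the real work here is atomic_inc_not_zero(): the RCU-side
lookup can race with the final put, and SLAB_DESTROY_BY_RCU only guarantees
that the memory keeps being an anon_vma for the duration of the RCU read
side, so a reference may only be taken while the count is still non-zero. A
rough C11 sketch of that primitive (not the kernel implementation; the name
is illustrative):

#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if at least one is already held; 0 stays 0. */
static bool refcount_inc_not_zero(atomic_int *count)
{
	int old = atomic_load(count);

	do {
		if (old == 0)
			return false;	/* object is already being freed */
	} while (!atomic_compare_exchange_weak(count, &old, old + 1));

	return true;			/* reference taken */
}

If the increment fails, or the page turns out to be unmapped afterwards, the
caller simply ends up with NULL, as in the diff below.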
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
mm/migrate.c | 17 ++++-------------
mm/rmap.c | 42 +++++++++++++++++++++++++++---------------
2 files changed, 31 insertions(+), 28 deletions(-)
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -336,9 +336,9 @@ void __init anon_vma_init(void)
* that the anon_vma pointer from page->mapping is valid if there is a
* mapcount, we can dereference the anon_vma after observing those.
*/
-struct anon_vma *page_lock_anon_vma(struct page *page)
+struct anon_vma *page_get_anon_vma(struct page *page)
{
- struct anon_vma *anon_vma, *root_anon_vma;
+ struct anon_vma *anon_vma = NULL;
unsigned long anon_mapping;
rcu_read_lock();
@@ -349,30 +349,42 @@ struct anon_vma *page_lock_anon_vma(stru
goto out;
anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
- root_anon_vma = ACCESS_ONCE(anon_vma->root);
- spin_lock(&root_anon_vma->lock);
+ if (!atomic_inc_not_zero(&anon_vma->refcount)) {
+ anon_vma = NULL;
+ goto out;
+ }
/*
* If this page is still mapped, then its anon_vma cannot have been
- * freed. But if it has been unmapped, we have no security against
- * the anon_vma structure being freed and reused (for another anon_vma:
- * SLAB_DESTROY_BY_RCU guarantees that - so the spin_lock above cannot
- * corrupt): with anon_vma_prepare() or anon_vma_fork() redirecting
- * anon_vma->root before page_unlock_anon_vma() is called to unlock.
+ * freed. But if it has been unmapped, we have no security against the
+ * anon_vma structure being freed and reused (for another anon_vma:
+ * SLAB_DESTROY_BY_RCU guarantees that - so the atomic_inc_not_zero()
+ * above cannot corrupt).
*/
- if (page_mapped(page))
- return anon_vma;
-
- spin_unlock(&root_anon_vma->lock);
+ if (!page_mapped(page)) {
+ put_anon_vma(anon_vma);
+ anon_vma = NULL;
+ }
out:
rcu_read_unlock();
- return NULL;
+
+ return anon_vma;
+}
+
+struct anon_vma *page_lock_anon_vma(struct page *page)
+{
+ struct anon_vma *anon_vma = page_get_anon_vma(page);
+
+ if (anon_vma)
+ anon_vma_lock(anon_vma);
+
+ return anon_vma;
}
void page_unlock_anon_vma(struct anon_vma *anon_vma)
{
anon_vma_unlock(anon_vma);
- rcu_read_unlock();
+ put_anon_vma(anon_vma);
}
/*
Index: linux-2.6/mm/migrate.c
===================================================================
--- linux-2.6.orig/mm/migrate.c
+++ linux-2.6/mm/migrate.c
@@ -703,15 +703,11 @@ static int unmap_and_move(new_page_t get
* Only page_lock_anon_vma() understands the subtleties of
* getting a hold on an anon_vma from outside one of its mms.
*/
- anon_vma = page_lock_anon_vma(page);
+ anon_vma = page_get_anon_vma(page);
if (anon_vma) {
/*
- * Take a reference count on the anon_vma if the
- * page is mapped so that it is guaranteed to
- * exist when the page is remapped later
+ * Anon page
*/
- get_anon_vma(anon_vma);
- page_unlock_anon_vma(anon_vma);
} else if (PageSwapCache(page)) {
/*
* We cannot be sure that the anon_vma of an unmapped
@@ -840,13 +836,8 @@ static int unmap_and_move_huge_page(new_
lock_page(hpage);
}
- if (PageAnon(hpage)) {
- anon_vma = page_lock_anon_vma(hpage);
- if (anon_vma) {
- get_anon_vma(anon_vma);
- page_unlock_anon_vma(anon_vma);
- }
- }
+ if (PageAnon(hpage))
+ anon_vma = page_get_anon_vma(hpage);
try_to_unmap(hpage, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 22/25] mm: Convert anon_vma->lock to a mutex
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Hugh Dickins
[-- Attachment #1: peter_zijlstra-mm-anon_vma-lock_to_mutexes.patch --]
[-- Type: text/plain, Size: 7975 bytes --]
Straightforward conversion of anon_vma->lock to a mutex.
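The shape of the conversion, as a userspace sketch (pthread_mutex_t standing
in for struct mutex; illustrative only, not the patch itself): the lock keeps
living in the root of the anon_vma tree and only its type changes, which is
what lets the vma-list walks it serializes sleep and be preempted.

#include <pthread.h>

struct anon_vma_sketch {
	struct anon_vma_sketch *root;	/* root of the anon_vma tree */
	pthread_mutex_t mutex;		/* was a spinlock; serializes the vma list */
};

/* Lockers still always go through the root, exactly as before. */
static void sketch_anon_vma_lock(struct anon_vma_sketch *anon_vma)
{
	pthread_mutex_lock(&anon_vma->root->mutex);
}

static void sketch_anon_vma_unlock(struct anon_vma_sketch *anon_vma)
{
	pthread_mutex_unlock(&anon_vma->root->mutex);
}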
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/huge_mm.h | 8 ++------
include/linux/mmu_notifier.h | 2 +-
include/linux/rmap.h | 14 +++++++-------
mm/huge_memory.c | 4 ++--
mm/mmap.c | 10 +++++-----
mm/rmap.c | 8 ++++----
6 files changed, 21 insertions(+), 25 deletions(-)
Index: linux-2.6/include/linux/rmap.h
===================================================================
--- linux-2.6.orig/include/linux/rmap.h
+++ linux-2.6/include/linux/rmap.h
@@ -7,7 +7,7 @@
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/mm.h>
-#include <linux/spinlock.h>
+#include <linux/mutex.h>
#include <linux/memcontrol.h>
/*
@@ -26,7 +26,7 @@
*/
struct anon_vma {
struct anon_vma *root; /* Root of this anon_vma tree */
- spinlock_t lock; /* Serialize access to vma list */
+ struct mutex mutex; /* Serialize access to vma list */
/*
* The refcount is taken on an anon_vma when there is no
* guarantee that the vma of page tables will exist for
@@ -64,7 +64,7 @@ struct anon_vma_chain {
struct vm_area_struct *vma;
struct anon_vma *anon_vma;
struct list_head same_vma; /* locked by mmap_sem & page_table_lock */
- struct list_head same_anon_vma; /* locked by anon_vma->lock */
+ struct list_head same_anon_vma; /* locked by anon_vma->mutex */
};
#ifdef CONFIG_MMU
@@ -93,24 +93,24 @@ static inline void vma_lock_anon_vma(str
{
struct anon_vma *anon_vma = vma->anon_vma;
if (anon_vma)
- spin_lock(&anon_vma->root->lock);
+ mutex_lock(&anon_vma->root->mutex);
}
static inline void vma_unlock_anon_vma(struct vm_area_struct *vma)
{
struct anon_vma *anon_vma = vma->anon_vma;
if (anon_vma)
- spin_unlock(&anon_vma->root->lock);
+ mutex_unlock(&anon_vma->root->mutex);
}
static inline void anon_vma_lock(struct anon_vma *anon_vma)
{
- spin_lock(&anon_vma->root->lock);
+ mutex_lock(&anon_vma->root->mutex);
}
static inline void anon_vma_unlock(struct anon_vma *anon_vma)
{
- spin_unlock(&anon_vma->root->lock);
+ mutex_unlock(&anon_vma->root->mutex);
}
/*
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -25,7 +25,7 @@
* mm->mmap_sem
* page->flags PG_locked (lock_page)
* mapping->i_mmap_mutex
- * anon_vma->lock
+ * anon_vma->mutex
* mm->page_table_lock or pte_lock
* zone->lru_lock (in mark_page_accessed, isolate_lru_page)
* swap_lock (in swap_duplicate, swap_info_get)
@@ -39,7 +39,7 @@
*
* (code doesn't rely on that order so it could be switched around)
* ->tasklist_lock
- * anon_vma->lock (memory_failure, collect_procs_anon)
+ * anon_vma->mutex (memory_failure, collect_procs_anon)
* pte map lock
*/
@@ -306,7 +306,7 @@ static void anon_vma_ctor(void *data)
{
struct anon_vma *anon_vma = data;
- spin_lock_init(&anon_vma->lock);
+ mutex_init(&anon_vma->mutex);
atomic_set(&anon_vma->refcount, 0);
INIT_LIST_HEAD(&anon_vma->head);
}
@@ -1129,7 +1129,7 @@ int try_to_unmap_one(struct page *page,
/*
* We need mmap_sem locking, Otherwise VM_LOCKED check makes
* unstable result and race. Plus, We can't wait here because
- * we now hold anon_vma->lock or mapping->i_mmap_mutex.
+ * we now hold anon_vma->mutex or mapping->i_mmap_mutex.
* if trylock failed, the page remain in evictable lru and later
* vmscan could retry to move the page to unevictable lru if the
* page is actually mlocked.
Index: linux-2.6/mm/mmap.c
===================================================================
--- linux-2.6.orig/mm/mmap.c
+++ linux-2.6/mm/mmap.c
@@ -2523,15 +2523,15 @@ static void vm_lock_anon_vma(struct mm_s
* The LSB of head.next can't change from under us
* because we hold the mm_all_locks_mutex.
*/
- spin_lock_nest_lock(&anon_vma->root->lock, &mm->mmap_sem);
+ mutex_lock_nest_lock(&anon_vma->root->mutex, &mm->mmap_sem);
/*
* We can safely modify head.next after taking the
- * anon_vma->root->lock. If some other vma in this mm shares
+ * anon_vma->root->mutex. If some other vma in this mm shares
* the same anon_vma we won't take it again.
*
* No need of atomic instructions here, head.next
* can't change from under us thanks to the
- * anon_vma->root->lock.
+ * anon_vma->root->mutex.
*/
if (__test_and_set_bit(0, (unsigned long *)
&anon_vma->root->head.next))
@@ -2580,7 +2580,7 @@ static void vm_lock_mapping(struct mm_st
* vma in this mm is backed by the same anon_vma or address_space.
*
* We can take all the locks in random order because the VM code
- * taking i_mmap_mutex or anon_vma->lock outside the mmap_sem never
+ * taking i_mmap_mutex or anon_vma->mutex outside the mmap_sem never
* takes more than one of them in a row. Secondly we're protected
* against a concurrent mm_take_all_locks() by the mm_all_locks_mutex.
*
@@ -2636,7 +2636,7 @@ static void vm_unlock_anon_vma(struct an
*
* No need of atomic instructions here, head.next
* can't change from under us until we release the
- * anon_vma->root->lock.
+ * anon_vma->root->mutex.
*/
if (!__test_and_clear_bit(0, (unsigned long *)
&anon_vma->root->head.next))
Index: linux-2.6/include/linux/mmu_notifier.h
===================================================================
--- linux-2.6.orig/include/linux/mmu_notifier.h
+++ linux-2.6/include/linux/mmu_notifier.h
@@ -150,7 +150,7 @@ struct mmu_notifier_ops {
* Therefore notifier chains can only be traversed when either
*
* 1. mmap_sem is held.
- * 2. One of the reverse map locks is held (i_mmap_mutex or anon_vma->lock).
+ * 2. One of the reverse map locks is held (i_mmap_mutex or anon_vma->mutex).
* 3. No other concurrent thread can access the list (release)
*/
struct mmu_notifier {
Index: linux-2.6/mm/huge_memory.c
===================================================================
--- linux-2.6.orig/mm/huge_memory.c
+++ linux-2.6/mm/huge_memory.c
@@ -1128,7 +1128,7 @@ static int __split_huge_page_splitting(s
* We can't temporarily set the pmd to null in order
* to split it, the pmd must remain marked huge at all
* times or the VM won't take the pmd_trans_huge paths
- * and it won't wait on the anon_vma->root->lock to
+ * and it won't wait on the anon_vma->root->mutex to
* serialize against split_huge_page*.
*/
pmdp_splitting_flush_notify(vma, address, pmd);
@@ -1315,7 +1315,7 @@ static int __split_huge_page_map(struct
return ret;
}
-/* must be called with anon_vma->root->lock hold */
+/* must be called with anon_vma->root->mutex held */
static void __split_huge_page(struct page *page,
struct anon_vma *anon_vma)
{
Index: linux-2.6/include/linux/huge_mm.h
===================================================================
--- linux-2.6.orig/include/linux/huge_mm.h
+++ linux-2.6/include/linux/huge_mm.h
@@ -91,12 +91,8 @@ extern void __split_huge_page_pmd(struct
#define wait_split_huge_page(__anon_vma, __pmd) \
do { \
pmd_t *____pmd = (__pmd); \
- spin_unlock_wait(&(__anon_vma)->root->lock); \
- /* \
- * spin_unlock_wait() is just a loop in C and so the \
- * CPU can reorder anything around it. \
- */ \
- smp_mb(); \
+ anon_vma_lock(__anon_vma); \
+ anon_vma_unlock(__anon_vma); \
BUG_ON(pmd_trans_splitting(*____pmd) || \
pmd_trans_huge(*____pmd)); \
} while (0)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: [PATCH 22/25] mm: Convert anon_vma->lock to a mutex
2011-01-25 17:31 ` Peter Zijlstra
@ 2011-02-03 5:27 ` KOSAKI Motohiro
-1 siblings, 0 replies; 128+ messages in thread
From: KOSAKI Motohiro @ 2011-02-03 5:27 UTC (permalink / raw)
To: Peter Zijlstra
Cc: kosaki.motohiro, Andrea Arcangeli, Avi Kivity, Thomas Gleixner,
Rik van Riel, Ingo Molnar, akpm, Linus Torvalds, linux-kernel,
linux-arch, linux-mm, Benjamin Herrenschmidt, David Miller,
Hugh Dickins, Mel Gorman, Nick Piggin, Paul McKenney,
Yanmin Zhang, Hugh Dickins
> Straight fwd conversion of anon_vma->lock to a mutex.
>
> Acked-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Didn't I ack this series at a previous iteration? If not, hmm... I don't remember
the reason. Anyway
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: [PATCH 22/25] mm: Convert anon_vma->lock to a mutex
2011-02-03 5:27 ` KOSAKI Motohiro
@ 2011-02-03 15:04 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-02-03 15:04 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds, linux-kernel, linux-arch,
linux-mm, Benjamin Herrenschmidt, David Miller, Hugh Dickins,
Mel Gorman, Nick Piggin, Paul McKenney, Yanmin Zhang,
Hugh Dickins
On Thu, 2011-02-03 at 14:27 +0900, KOSAKI Motohiro wrote:
> > Straight fwd conversion of anon_vma->lock to a mutex.
> >
> > Acked-by: Hugh Dickins <hughd@google.com>
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
>
> Didn't I ack this series at a previous iteration? If not, hmm... I don't remember
> the reason.
I got a +1 email from you on the spinlock to mutex conversion patch; I
wasn't quite sure what tag that translated to.
> Anyway
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Is this for this particular patch, or for the series?
^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: [PATCH 22/25] mm: Convert anon_vma->lock to a mutex
2011-02-03 15:04 ` Peter Zijlstra
@ 2011-02-04 4:35 ` KOSAKI Motohiro
-1 siblings, 0 replies; 128+ messages in thread
From: KOSAKI Motohiro @ 2011-02-04 4:35 UTC (permalink / raw)
To: Peter Zijlstra
Cc: kosaki.motohiro, Andrea Arcangeli, Avi Kivity, Thomas Gleixner,
Rik van Riel, Ingo Molnar, akpm, Linus Torvalds, linux-kernel,
linux-arch, linux-mm, Benjamin Herrenschmidt, David Miller,
Hugh Dickins, Mel Gorman, Nick Piggin, Paul McKenney,
Yanmin Zhang, Hugh Dickins
> On Thu, 2011-02-03 at 14:27 +0900, KOSAKI Motohiro wrote:
> > > Straight fwd conversion of anon_vma->lock to a mutex.
> > >
> > > Acked-by: Hugh Dickins <hughd@google.com>
> > > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> >
> > Didn't I ack this series at a previous iteration? If not, hmm... I don't remember
> > the reason.
>
> I got a +1 email from you on the spinlock to mutex conversion patch; I
> wasn't quite sure what tag that translated to.
I'm sorry. Maybe I was busy and wrote an unclear mail; yes, it's my fault.
> > Anyway
> > Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>
> Is this for this particular patch, or for the series?
The series, please.
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 23/25] mm: Optimize page_lock_anon_vma() fast-path
2011-01-25 17:31 ` Peter Zijlstra
@ 2011-01-25 17:31 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang
[-- Attachment #1: peter_zijlstra-mm-optimize_page_lock_anon_vma_fast-path.patch --]
[-- Type: text/plain, Size: 3319 bytes --]
Optimize the page_lock_anon_vma() fast path to be one LOCKed op,
instead of two.
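In sketch form, for illustration and not part of the patch, the two fast paths compare as follows; the uncontended mutex_trylock() is a single LOCKed cmpxchg and the refcount is only read, not modified:
	/* before: two LOCKed operations per lookup */
	anon_vma = page_get_anon_vma(page);	/* atomic_inc_not_zero() */
	if (anon_vma)
		anon_vma_lock(anon_vma);	/* second LOCKed operation */

	/* after: one LOCKed operation when uncontended */
	if (mutex_trylock(&anon_vma->root->mutex)) {
		if (!atomic_read(&anon_vma->refcount)) {	/* plain load */
			anon_vma_unlock(anon_vma);
			anon_vma = NULL;
		}
	}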
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
mm/rmap.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 78 insertions(+), 4 deletions(-)
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -371,20 +371,75 @@ struct anon_vma *page_get_anon_vma(struc
return anon_vma;
}
+/*
+ * Similar to page_get_anon_vma() except it locks the anon_vma.
+ *
+ * It's a little more complex as it tries to keep the fast path to a single
+ * atomic op -- the trylock. If we fail the trylock, we fall back to getting a
+ * reference like with page_get_anon_vma() and then block on the mutex.
+ */
struct anon_vma *page_lock_anon_vma(struct page *page)
{
- struct anon_vma *anon_vma = page_get_anon_vma(page);
+ struct anon_vma *anon_vma = NULL;
+ unsigned long anon_mapping;
- if (anon_vma)
- anon_vma_lock(anon_vma);
+ rcu_read_lock();
+ anon_mapping = (unsigned long) ACCESS_ONCE(page->mapping);
+ if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
+ goto out;
+ if (!page_mapped(page))
+ goto out;
+
+ anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
+ if (mutex_trylock(&anon_vma->root->mutex)) {
+ /*
+ * If we observe a !0 refcount, then holding the lock ensures
+ * the anon_vma will not go away, see __put_anon_vma().
+ */
+ if (!atomic_read(&anon_vma->refcount)) {
+ anon_vma_unlock(anon_vma);
+ anon_vma = NULL;
+ }
+ goto out;
+ }
+
+ /* trylock failed, we got to sleep */
+ if (!atomic_inc_not_zero(&anon_vma->refcount)) {
+ anon_vma = NULL;
+ goto out;
+ }
+
+ if (!page_mapped(page)) {
+ put_anon_vma(anon_vma);
+ anon_vma = NULL;
+ goto out;
+ }
+
+ /* we pinned the anon_vma, it's safe to sleep */
+ rcu_read_unlock();
+ anon_vma_lock(anon_vma);
+
+ if (atomic_dec_and_test(&anon_vma->refcount)) {
+ /*
+ * Oops, we held the last refcount, release the lock
+ * and bail -- can't simply use put_anon_vma() because
+ * we'll deadlock on the anon_vma_lock() recursion.
+ */
+ anon_vma_unlock(anon_vma);
+ __put_anon_vma(anon_vma);
+ anon_vma = NULL;
+ }
return anon_vma;
+
+out:
+ rcu_read_unlock();
+ return anon_vma;
}
void page_unlock_anon_vma(struct anon_vma *anon_vma)
{
anon_vma_unlock(anon_vma);
- put_anon_vma(anon_vma);
}
/*
@@ -1500,6 +1555,25 @@ int try_to_munlock(struct page *page)
void __put_anon_vma(struct anon_vma *anon_vma)
{
+ /*
+ * Synchronize against page_lock_anon_vma() such that
+ * we can safely hold the lock without the anon_vma getting
+ * freed.
+ *
+ * Relies on the full mb implied by the atomic_dec_and_test() from
+ * put_anon_vma() against the lock implied by mutex_trylock() from
+ * page_lock_anon_vma(). This orders:
+ *
+ * page_lock_anon_vma() VS put_anon_vma()
+ * mutex_trylock() atomic_dec_and_test()
+ * LOCK MB
+ * atomic_read() mutex_is_locked()
+ */
+ if (mutex_is_locked(&anon_vma->root->mutex)) {
+ anon_vma_lock(anon_vma);
+ anon_vma_unlock(anon_vma);
+ }
+
if (anon_vma->root != anon_vma)
put_anon_vma(anon_vma->root);
anon_vma_free(anon_vma);
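To make the ordering argument above concrete, here is an illustrative interleaving (added for clarity, not part of the patch) of the race that the mutex_is_locked() handshake closes, with T1 in page_lock_anon_vma() and T2 performing the final put_anon_vma():
	T1: mutex_trylock(&anon_vma->root->mutex)     succeeds (LOCKed op)
	T2: atomic_dec_and_test(&anon_vma->refcount)  count reaches zero (full mb)
	T2: __put_anon_vma(): mutex_is_locked()       observes T1's lock
	T2: anon_vma_lock(); anon_vma_unlock()        waits until T1 lets go
	T1: atomic_read(&anon_vma->refcount) == 0     backs off and unlocks
	T2: anon_vma_free()                           safe, T1 is done with it
Either T1 sees the zero refcount and backs off, or T2 sees the held mutex and waits; the LOCK/MB pairing is what rules out both sides missing the other's store.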
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 24/25] mm: Remove i_mmap_mutex lockbreak
2011-01-25 17:31 ` Peter Zijlstra
@ 2011-01-25 17:31 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang
[-- Attachment #1: mm-fix-zap_block_size.patch --]
[-- Type: text/plain, Size: 16130 bytes --]
Hugh says:
"The only significant loser, I think, would be page reclaim (when
concurrent with truncation): could spin for a long time waiting for
the i_mmap_mutex it expects would soon be dropped? "
Counterpoints:
- cpu contention makes the spin stop (need_resched())
- zap pages should be freeing pages at a higher rate than reclaim
ever can
- shouldn't hold up reclaim more than lock_page() would
If we're going to do this, we can remove the mutex_is_contended()
patch from this series.
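For orientation, a sketch condensed from the diff below (not additional code): the latency bound moves from the old ZAP_BLOCK_SIZE work budget in unmap_vmas() to a reschedule point taken once per pmd inside zap_pmd_range():
	do {
		next = pmd_addr_end(addr, end);
		/* huge pmd handling elided */
		next = zap_pte_range(tlb, vma, pmd, addr, next, details);
		cond_resched();		/* preemption point, once per pmd */
	} while (pmd++, addr = next, addr != end);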
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/fs.h | 1
include/linux/mm.h | 2
include/linux/mm_types.h | 1
kernel/fork.c | 1
mm/memory.c | 191 ++++++-----------------------------------------
mm/mmap.c | 13 ---
mm/mremap.c | 3
7 files changed, 27 insertions(+), 185 deletions(-)
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -876,8 +876,6 @@ struct zap_details {
struct address_space *check_mapping; /* Check page->mapping if set */
pgoff_t first_index; /* Lowest page->index to unmap */
pgoff_t last_index; /* Highest page->index to unmap */
- struct mutex *i_mmap_mutex; /* For unmap_mapping_range: */
- unsigned long truncate_count; /* Compare vm_truncate_count */
};
struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -986,12 +986,12 @@ int copy_page_range(struct mm_struct *ds
static unsigned long zap_pte_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, unsigned long end,
- long *zap_work, struct zap_details *details)
+ struct zap_details *details)
{
struct mm_struct *mm = tlb->mm;
- pte_t *pte;
- spinlock_t *ptl;
int rss[NR_MM_COUNTERS];
+ spinlock_t *ptl;
+ pte_t *pte;
init_rss_vec(rss);
@@ -1000,12 +1000,9 @@ static unsigned long zap_pte_range(struc
do {
pte_t ptent = *pte;
if (pte_none(ptent)) {
- (*zap_work)--;
continue;
}
- (*zap_work) -= PAGE_SIZE;
-
if (pte_present(ptent)) {
struct page *page;
@@ -1072,7 +1069,7 @@ static unsigned long zap_pte_range(struc
print_bad_pte(vma, addr, ptent, NULL);
}
pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
- } while (pte++, addr += PAGE_SIZE, (addr != end && *zap_work > 0));
+ } while (pte++, addr += PAGE_SIZE, addr != end);
add_mm_rss_vec(mm, rss);
arch_leave_lazy_mmu_mode();
@@ -1084,7 +1081,7 @@ static unsigned long zap_pte_range(struc
static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, pud_t *pud,
unsigned long addr, unsigned long end,
- long *zap_work, struct zap_details *details)
+ struct zap_details *details)
{
pmd_t *pmd;
unsigned long next;
@@ -1096,19 +1093,15 @@ static inline unsigned long zap_pmd_rang
if (next-addr != HPAGE_PMD_SIZE) {
VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));
split_huge_page_pmd(vma->vm_mm, pmd);
- } else if (zap_huge_pmd(tlb, vma, pmd)) {
- (*zap_work)--;
+ } else if (zap_huge_pmd(tlb, vma, pmd))
continue;
- }
/* fall through */
}
- if (pmd_none_or_clear_bad(pmd)) {
- (*zap_work)--;
+ if (pmd_none_or_clear_bad(pmd))
continue;
- }
- next = zap_pte_range(tlb, vma, pmd, addr, next,
- zap_work, details);
- } while (pmd++, addr = next, (addr != end && *zap_work > 0));
+ next = zap_pte_range(tlb, vma, pmd, addr, next, details);
+ cond_resched();
+ } while (pmd++, addr = next, addr != end);
return addr;
}
@@ -1116,7 +1109,7 @@ static inline unsigned long zap_pmd_rang
static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, pgd_t *pgd,
unsigned long addr, unsigned long end,
- long *zap_work, struct zap_details *details)
+ struct zap_details *details)
{
pud_t *pud;
unsigned long next;
@@ -1124,13 +1117,10 @@ static inline unsigned long zap_pud_rang
pud = pud_offset(pgd, addr);
do {
next = pud_addr_end(addr, end);
- if (pud_none_or_clear_bad(pud)) {
- (*zap_work)--;
+ if (pud_none_or_clear_bad(pud))
continue;
- }
- next = zap_pmd_range(tlb, vma, pud, addr, next,
- zap_work, details);
- } while (pud++, addr = next, (addr != end && *zap_work > 0));
+ next = zap_pmd_range(tlb, vma, pud, addr, next, details);
+ } while (pud++, addr = next, addr != end);
return addr;
}
@@ -1138,7 +1128,7 @@ static inline unsigned long zap_pud_rang
static unsigned long unmap_page_range(struct mmu_gather *tlb,
struct vm_area_struct *vma,
unsigned long addr, unsigned long end,
- long *zap_work, struct zap_details *details)
+ struct zap_details *details)
{
pgd_t *pgd;
unsigned long next;
@@ -1152,13 +1142,10 @@ static unsigned long unmap_page_range(st
pgd = pgd_offset(vma->vm_mm, addr);
do {
next = pgd_addr_end(addr, end);
- if (pgd_none_or_clear_bad(pgd)) {
- (*zap_work)--;
+ if (pgd_none_or_clear_bad(pgd))
continue;
- }
- next = zap_pud_range(tlb, vma, pgd, addr, next,
- zap_work, details);
- } while (pgd++, addr = next, (addr != end && *zap_work > 0));
+ next = zap_pud_range(tlb, vma, pgd, addr, next, details);
+ } while (pgd++, addr = next, addr != end);
tlb_end_vma(tlb, vma);
mem_cgroup_uncharge_end();
@@ -1203,9 +1190,7 @@ unsigned long unmap_vmas(struct mmu_gath
unsigned long end_addr, unsigned long *nr_accounted,
struct zap_details *details)
{
- long zap_work = ZAP_BLOCK_SIZE;
unsigned long start = start_addr;
- struct mutex *i_mmap_mutex = details ? details->i_mmap_mutex : NULL;
struct mm_struct *mm = vma->vm_mm;
mmu_notifier_invalidate_range_start(mm, start_addr, end_addr);
@@ -1238,33 +1223,16 @@ unsigned long unmap_vmas(struct mmu_gath
* Since no pte has actually been setup, it is
* safe to do nothing in this case.
*/
- if (vma->vm_file) {
+ if (vma->vm_file)
unmap_hugepage_range(vma, start, end, NULL);
- zap_work -= (end - start) /
- pages_per_huge_page(hstate_vma(vma));
- }
start = end;
} else
start = unmap_page_range(tlb, vma,
- start, end, &zap_work, details);
-
- if (zap_work > 0) {
- BUG_ON(start != end);
- break;
- }
-
- if (need_resched() ||
- (i_mmap_mutex && mutex_is_contended(i_mmap_mutex))) {
- if (i_mmap_mutex)
- goto out;
- cond_resched();
- }
-
- zap_work = ZAP_BLOCK_SIZE;
+ start, end, details);
}
}
-out:
+
mmu_notifier_invalidate_range_end(mm, start_addr, end_addr);
return start; /* which is now the end (or restart) address */
}
@@ -2528,96 +2496,11 @@ static int do_wp_page(struct mm_struct *
return ret;
}
-/*
- * Helper functions for unmap_mapping_range().
- *
- * __ Notes on dropping i_mmap_mutex to reduce latency while unmapping __
- *
- * We have to restart searching the prio_tree whenever we drop the lock,
- * since the iterator is only valid while the lock is held, and anyway
- * a later vma might be split and reinserted earlier while lock dropped.
- *
- * The list of nonlinear vmas could be handled more efficiently, using
- * a placeholder, but handle it in the same way until a need is shown.
- * It is important to search the prio_tree before nonlinear list: a vma
- * may become nonlinear and be shifted from prio_tree to nonlinear list
- * while the lock is dropped; but never shifted from list to prio_tree.
- *
- * In order to make forward progress despite restarting the search,
- * vm_truncate_count is used to mark a vma as now dealt with, so we can
- * quickly skip it next time around. Since the prio_tree search only
- * shows us those vmas affected by unmapping the range in question, we
- * can't efficiently keep all vmas in step with mapping->truncate_count:
- * so instead reset them all whenever it wraps back to 0 (then go to 1).
- * mapping->truncate_count and vma->vm_truncate_count are protected by
- * i_mmap_mutex.
- *
- * In order to make forward progress despite repeatedly restarting some
- * large vma, note the restart_addr from unmap_vmas when it breaks out:
- * and restart from that address when we reach that vma again. It might
- * have been split or merged, shrunk or extended, but never shifted: so
- * restart_addr remains valid so long as it remains in the vma's range.
- * unmap_mapping_range forces truncate_count to leap over page-aligned
- * values so we can save vma's restart_addr in its truncate_count field.
- */
-#define is_restart_addr(truncate_count) (!((truncate_count) & ~PAGE_MASK))
-
-static void reset_vma_truncate_counts(struct address_space *mapping)
-{
- struct vm_area_struct *vma;
- struct prio_tree_iter iter;
-
- vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, 0, ULONG_MAX)
- vma->vm_truncate_count = 0;
- list_for_each_entry(vma, &mapping->i_mmap_nonlinear, shared.vm_set.list)
- vma->vm_truncate_count = 0;
-}
-
-static int unmap_mapping_range_vma(struct vm_area_struct *vma,
+static void unmap_mapping_range_vma(struct vm_area_struct *vma,
unsigned long start_addr, unsigned long end_addr,
struct zap_details *details)
{
- unsigned long restart_addr;
- int need_break;
-
- /*
- * files that support invalidating or truncating portions of the
- * file from under mmaped areas must have their ->fault function
- * return a locked page (and set VM_FAULT_LOCKED in the return).
- * This provides synchronisation against concurrent unmapping here.
- */
-
-again:
- restart_addr = vma->vm_truncate_count;
- if (is_restart_addr(restart_addr) && start_addr < restart_addr) {
- start_addr = restart_addr;
- if (start_addr >= end_addr) {
- /* Top of vma has been split off since last time */
- vma->vm_truncate_count = details->truncate_count;
- return 0;
- }
- }
-
- restart_addr = zap_page_range(vma, start_addr,
- end_addr - start_addr, details);
- need_break = need_resched() || mutex_is_contended(details->i_mmap_mutex);
-
- if (restart_addr >= end_addr) {
- /* We have now completed this vma: mark it so */
- vma->vm_truncate_count = details->truncate_count;
- if (!need_break)
- return 0;
- } else {
- /* Note restart_addr in vma's truncate_count field */
- vma->vm_truncate_count = restart_addr;
- if (!need_break)
- goto again;
- }
-
- mutex_unlock(details->i_mmap_mutex);
- cond_resched();
- mutex_lock(details->i_mmap_mutex);
- return -EINTR;
+ zap_page_range(vma, start_addr, end_addr - start_addr, details);
}
static inline void unmap_mapping_range_tree(struct prio_tree_root *root,
@@ -2627,12 +2510,8 @@ static inline void unmap_mapping_range_t
struct prio_tree_iter iter;
pgoff_t vba, vea, zba, zea;
-restart:
vma_prio_tree_foreach(vma, &iter, root,
details->first_index, details->last_index) {
- /* Skip quickly over those we have already dealt with */
- if (vma->vm_truncate_count == details->truncate_count)
- continue;
vba = vma->vm_pgoff;
vea = vba + ((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) - 1;
@@ -2644,11 +2523,10 @@ static inline void unmap_mapping_range_t
if (zea > vea)
zea = vea;
- if (unmap_mapping_range_vma(vma,
+ unmap_mapping_range_vma(vma,
((zba - vba) << PAGE_SHIFT) + vma->vm_start,
((zea - vba + 1) << PAGE_SHIFT) + vma->vm_start,
- details) < 0)
- goto restart;
+ details);
}
}
@@ -2663,15 +2541,9 @@ static inline void unmap_mapping_range_l
* across *all* the pages in each nonlinear VMA, not just the pages
* whose virtual address lies outside the file truncation point.
*/
-restart:
list_for_each_entry(vma, head, shared.vm_set.list) {
- /* Skip quickly over those we have already dealt with */
- if (vma->vm_truncate_count == details->truncate_count)
- continue;
details->nonlinear_vma = vma;
- if (unmap_mapping_range_vma(vma, vma->vm_start,
- vma->vm_end, details) < 0)
- goto restart;
+ unmap_mapping_range_vma(vma, vma->vm_start, vma->vm_end, details);
}
}
@@ -2710,19 +2582,8 @@ void unmap_mapping_range(struct address_
details.last_index = hba + hlen - 1;
if (details.last_index < details.first_index)
details.last_index = ULONG_MAX;
- details.i_mmap_mutex = &mapping->i_mmap_mutex;
mutex_lock(&mapping->i_mmap_mutex);
-
- /* Protect against endless unmapping loops */
- mapping->truncate_count++;
- if (unlikely(is_restart_addr(mapping->truncate_count))) {
- if (mapping->truncate_count == 0)
- reset_vma_truncate_counts(mapping);
- mapping->truncate_count++;
- }
- details.truncate_count = mapping->truncate_count;
-
if (unlikely(!prio_tree_empty(&mapping->i_mmap)))
unmap_mapping_range_tree(&mapping->i_mmap, &details);
if (unlikely(!list_empty(&mapping->i_mmap_nonlinear)))
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -640,7 +640,6 @@ struct address_space {
struct prio_tree_root i_mmap; /* tree of private and shared mappings */
struct list_head i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
struct mutex i_mmap_mutex; /* protect tree, count, list */
- unsigned int truncate_count; /* Cover race condition with truncate */
unsigned long nrpages; /* number of total pages */
pgoff_t writeback_index;/* writeback starts here */
const struct address_space_operations *a_ops; /* methods */
Index: linux-2.6/include/linux/mm_types.h
===================================================================
--- linux-2.6.orig/include/linux/mm_types.h
+++ linux-2.6/include/linux/mm_types.h
@@ -175,7 +175,6 @@ struct vm_area_struct {
units, *not* PAGE_CACHE_SIZE */
struct file * vm_file; /* File we map to (can be NULL). */
void * vm_private_data; /* was vm_pte (shared mem) */
- unsigned long vm_truncate_count;/* truncate_count or restart_addr */
#ifndef CONFIG_MMU
struct vm_region *vm_region; /* NOMMU mapping region */
Index: linux-2.6/kernel/fork.c
===================================================================
--- linux-2.6.orig/kernel/fork.c
+++ linux-2.6/kernel/fork.c
@@ -379,7 +379,6 @@ static int dup_mmap(struct mm_struct *mm
mutex_lock(&mapping->i_mmap_mutex);
if (tmp->vm_flags & VM_SHARED)
mapping->i_mmap_writable++;
- tmp->vm_truncate_count = mpnt->vm_truncate_count;
flush_dcache_mmap_lock(mapping);
/* insert tmp into the share list, just after mpnt */
vma_prio_tree_add(tmp, mpnt);
Index: linux-2.6/mm/mmap.c
===================================================================
--- linux-2.6.orig/mm/mmap.c
+++ linux-2.6/mm/mmap.c
@@ -464,10 +464,8 @@ static void vma_link(struct mm_struct *m
if (vma->vm_file)
mapping = vma->vm_file->f_mapping;
- if (mapping) {
+ if (mapping)
mutex_lock(&mapping->i_mmap_mutex);
- vma->vm_truncate_count = mapping->truncate_count;
- }
__vma_link(mm, vma, prev, rb_link, rb_parent);
__vma_link_file(vma);
@@ -577,16 +575,7 @@ again: remove_next = 1 + (end > next->
if (!(vma->vm_flags & VM_NONLINEAR))
root = &mapping->i_mmap;
mutex_lock(&mapping->i_mmap_mutex);
- if (importer &&
- vma->vm_truncate_count != next->vm_truncate_count) {
- /*
- * unmap_mapping_range might be in progress:
- * ensure that the expanding vma is rescanned.
- */
- importer->vm_truncate_count = 0;
- }
if (insert) {
- insert->vm_truncate_count = vma->vm_truncate_count;
/*
* Put into prio_tree now, so instantiated pages
* are visible to arm/parisc __flush_dcache_page
Index: linux-2.6/mm/mremap.c
===================================================================
--- linux-2.6.orig/mm/mremap.c
+++ linux-2.6/mm/mremap.c
@@ -94,9 +94,6 @@ static void move_ptes(struct vm_area_str
*/
mapping = vma->vm_file->f_mapping;
mutex_lock(&mapping->i_mmap_mutex);
- if (new_vma->vm_truncate_count &&
- new_vma->vm_truncate_count != vma->vm_truncate_count)
- new_vma->vm_truncate_count = 0;
}
/*
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 24/25] mm: Remove i_mmap_mutex lockbreak
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang
[-- Attachment #1: mm-fix-zap_block_size.patch --]
[-- Type: text/plain, Size: 16426 bytes --]
Hugh says:
"The only significant loser, I think, would be page reclaim (when
concurrent with truncation): could spin for a long time waiting for
the i_mmap_mutex it expects would soon be dropped? "
Counterpoints:
- cpu contention makes the spin stop (need_resched())
- zap pages should be freeing pages at a higher rate than reclaim
ever can
- shouldn't hold up reclaim more than lock_page() would
If we're going to do this, we can remove the mutex_is_contended()
patch from this series.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/fs.h | 1
include/linux/mm.h | 2
include/linux/mm_types.h | 1
kernel/fork.c | 1
mm/memory.c | 191 ++++++-----------------------------------------
mm/mmap.c | 13 ---
mm/mremap.c | 3
7 files changed, 27 insertions(+), 185 deletions(-)
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -876,8 +876,6 @@ struct zap_details {
struct address_space *check_mapping; /* Check page->mapping if set */
pgoff_t first_index; /* Lowest page->index to unmap */
pgoff_t last_index; /* Highest page->index to unmap */
- struct mutex *i_mmap_mutex; /* For unmap_mapping_range: */
- unsigned long truncate_count; /* Compare vm_truncate_count */
};
struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -986,12 +986,12 @@ int copy_page_range(struct mm_struct *ds
static unsigned long zap_pte_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, unsigned long end,
- long *zap_work, struct zap_details *details)
+ struct zap_details *details)
{
struct mm_struct *mm = tlb->mm;
- pte_t *pte;
- spinlock_t *ptl;
int rss[NR_MM_COUNTERS];
+ spinlock_t *ptl;
+ pte_t *pte;
init_rss_vec(rss);
@@ -1000,12 +1000,9 @@ static unsigned long zap_pte_range(struc
do {
pte_t ptent = *pte;
if (pte_none(ptent)) {
- (*zap_work)--;
continue;
}
- (*zap_work) -= PAGE_SIZE;
-
if (pte_present(ptent)) {
struct page *page;
@@ -1072,7 +1069,7 @@ static unsigned long zap_pte_range(struc
print_bad_pte(vma, addr, ptent, NULL);
}
pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
- } while (pte++, addr += PAGE_SIZE, (addr != end && *zap_work > 0));
+ } while (pte++, addr += PAGE_SIZE, addr != end);
add_mm_rss_vec(mm, rss);
arch_leave_lazy_mmu_mode();
@@ -1084,7 +1081,7 @@ static unsigned long zap_pte_range(struc
static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, pud_t *pud,
unsigned long addr, unsigned long end,
- long *zap_work, struct zap_details *details)
+ struct zap_details *details)
{
pmd_t *pmd;
unsigned long next;
@@ -1096,19 +1093,15 @@ static inline unsigned long zap_pmd_rang
if (next-addr != HPAGE_PMD_SIZE) {
VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));
split_huge_page_pmd(vma->vm_mm, pmd);
- } else if (zap_huge_pmd(tlb, vma, pmd)) {
- (*zap_work)--;
+ } else if (zap_huge_pmd(tlb, vma, pmd))
continue;
- }
/* fall through */
}
- if (pmd_none_or_clear_bad(pmd)) {
- (*zap_work)--;
+ if (pmd_none_or_clear_bad(pmd))
continue;
- }
- next = zap_pte_range(tlb, vma, pmd, addr, next,
- zap_work, details);
- } while (pmd++, addr = next, (addr != end && *zap_work > 0));
+ next = zap_pte_range(tlb, vma, pmd, addr, next, details);
+ cond_resched();
+ } while (pmd++, addr = next, addr != end);
return addr;
}
@@ -1116,7 +1109,7 @@ static inline unsigned long zap_pmd_rang
static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, pgd_t *pgd,
unsigned long addr, unsigned long end,
- long *zap_work, struct zap_details *details)
+ struct zap_details *details)
{
pud_t *pud;
unsigned long next;
@@ -1124,13 +1117,10 @@ static inline unsigned long zap_pud_rang
pud = pud_offset(pgd, addr);
do {
next = pud_addr_end(addr, end);
- if (pud_none_or_clear_bad(pud)) {
- (*zap_work)--;
+ if (pud_none_or_clear_bad(pud))
continue;
- }
- next = zap_pmd_range(tlb, vma, pud, addr, next,
- zap_work, details);
- } while (pud++, addr = next, (addr != end && *zap_work > 0));
+ next = zap_pmd_range(tlb, vma, pud, addr, next, details);
+ } while (pud++, addr = next, addr != end);
return addr;
}
@@ -1138,7 +1128,7 @@ static inline unsigned long zap_pud_rang
static unsigned long unmap_page_range(struct mmu_gather *tlb,
struct vm_area_struct *vma,
unsigned long addr, unsigned long end,
- long *zap_work, struct zap_details *details)
+ struct zap_details *details)
{
pgd_t *pgd;
unsigned long next;
@@ -1152,13 +1142,10 @@ static unsigned long unmap_page_range(st
pgd = pgd_offset(vma->vm_mm, addr);
do {
next = pgd_addr_end(addr, end);
- if (pgd_none_or_clear_bad(pgd)) {
- (*zap_work)--;
+ if (pgd_none_or_clear_bad(pgd))
continue;
- }
- next = zap_pud_range(tlb, vma, pgd, addr, next,
- zap_work, details);
- } while (pgd++, addr = next, (addr != end && *zap_work > 0));
+ next = zap_pud_range(tlb, vma, pgd, addr, next, details);
+ } while (pgd++, addr = next, addr != end);
tlb_end_vma(tlb, vma);
mem_cgroup_uncharge_end();
@@ -1203,9 +1190,7 @@ unsigned long unmap_vmas(struct mmu_gath
unsigned long end_addr, unsigned long *nr_accounted,
struct zap_details *details)
{
- long zap_work = ZAP_BLOCK_SIZE;
unsigned long start = start_addr;
- struct mutex *i_mmap_mutex = details ? details->i_mmap_mutex : NULL;
struct mm_struct *mm = vma->vm_mm;
mmu_notifier_invalidate_range_start(mm, start_addr, end_addr);
@@ -1238,33 +1223,16 @@ unsigned long unmap_vmas(struct mmu_gath
* Since no pte has actually been setup, it is
* safe to do nothing in this case.
*/
- if (vma->vm_file) {
+ if (vma->vm_file)
unmap_hugepage_range(vma, start, end, NULL);
- zap_work -= (end - start) /
- pages_per_huge_page(hstate_vma(vma));
- }
start = end;
} else
start = unmap_page_range(tlb, vma,
- start, end, &zap_work, details);
-
- if (zap_work > 0) {
- BUG_ON(start != end);
- break;
- }
-
- if (need_resched() ||
- (i_mmap_mutex && mutex_is_contended(i_mmap_mutex))) {
- if (i_mmap_mutex)
- goto out;
- cond_resched();
- }
-
- zap_work = ZAP_BLOCK_SIZE;
+ start, end, details);
}
}
-out:
+
mmu_notifier_invalidate_range_end(mm, start_addr, end_addr);
return start; /* which is now the end (or restart) address */
}
@@ -2528,96 +2496,11 @@ static int do_wp_page(struct mm_struct *
return ret;
}
-/*
- * Helper functions for unmap_mapping_range().
- *
- * __ Notes on dropping i_mmap_mutex to reduce latency while unmapping __
- *
- * We have to restart searching the prio_tree whenever we drop the lock,
- * since the iterator is only valid while the lock is held, and anyway
- * a later vma might be split and reinserted earlier while lock dropped.
- *
- * The list of nonlinear vmas could be handled more efficiently, using
- * a placeholder, but handle it in the same way until a need is shown.
- * It is important to search the prio_tree before nonlinear list: a vma
- * may become nonlinear and be shifted from prio_tree to nonlinear list
- * while the lock is dropped; but never shifted from list to prio_tree.
- *
- * In order to make forward progress despite restarting the search,
- * vm_truncate_count is used to mark a vma as now dealt with, so we can
- * quickly skip it next time around. Since the prio_tree search only
- * shows us those vmas affected by unmapping the range in question, we
- * can't efficiently keep all vmas in step with mapping->truncate_count:
- * so instead reset them all whenever it wraps back to 0 (then go to 1).
- * mapping->truncate_count and vma->vm_truncate_count are protected by
- * i_mmap_mutex.
- *
- * In order to make forward progress despite repeatedly restarting some
- * large vma, note the restart_addr from unmap_vmas when it breaks out:
- * and restart from that address when we reach that vma again. It might
- * have been split or merged, shrunk or extended, but never shifted: so
- * restart_addr remains valid so long as it remains in the vma's range.
- * unmap_mapping_range forces truncate_count to leap over page-aligned
- * values so we can save vma's restart_addr in its truncate_count field.
- */
-#define is_restart_addr(truncate_count) (!((truncate_count) & ~PAGE_MASK))
-
-static void reset_vma_truncate_counts(struct address_space *mapping)
-{
- struct vm_area_struct *vma;
- struct prio_tree_iter iter;
-
- vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, 0, ULONG_MAX)
- vma->vm_truncate_count = 0;
- list_for_each_entry(vma, &mapping->i_mmap_nonlinear, shared.vm_set.list)
- vma->vm_truncate_count = 0;
-}
-
-static int unmap_mapping_range_vma(struct vm_area_struct *vma,
+static void unmap_mapping_range_vma(struct vm_area_struct *vma,
unsigned long start_addr, unsigned long end_addr,
struct zap_details *details)
{
- unsigned long restart_addr;
- int need_break;
-
- /*
- * files that support invalidating or truncating portions of the
- * file from under mmaped areas must have their ->fault function
- * return a locked page (and set VM_FAULT_LOCKED in the return).
- * This provides synchronisation against concurrent unmapping here.
- */
-
-again:
- restart_addr = vma->vm_truncate_count;
- if (is_restart_addr(restart_addr) && start_addr < restart_addr) {
- start_addr = restart_addr;
- if (start_addr >= end_addr) {
- /* Top of vma has been split off since last time */
- vma->vm_truncate_count = details->truncate_count;
- return 0;
- }
- }
-
- restart_addr = zap_page_range(vma, start_addr,
- end_addr - start_addr, details);
- need_break = need_resched() || mutex_is_contended(details->i_mmap_mutex);
-
- if (restart_addr >= end_addr) {
- /* We have now completed this vma: mark it so */
- vma->vm_truncate_count = details->truncate_count;
- if (!need_break)
- return 0;
- } else {
- /* Note restart_addr in vma's truncate_count field */
- vma->vm_truncate_count = restart_addr;
- if (!need_break)
- goto again;
- }
-
- mutex_unlock(details->i_mmap_mutex);
- cond_resched();
- mutex_lock(details->i_mmap_mutex);
- return -EINTR;
+ zap_page_range(vma, start_addr, end_addr - start_addr, details);
}
static inline void unmap_mapping_range_tree(struct prio_tree_root *root,
@@ -2627,12 +2510,8 @@ static inline void unmap_mapping_range_t
struct prio_tree_iter iter;
pgoff_t vba, vea, zba, zea;
-restart:
vma_prio_tree_foreach(vma, &iter, root,
details->first_index, details->last_index) {
- /* Skip quickly over those we have already dealt with */
- if (vma->vm_truncate_count == details->truncate_count)
- continue;
vba = vma->vm_pgoff;
vea = vba + ((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) - 1;
@@ -2644,11 +2523,10 @@ static inline void unmap_mapping_range_t
if (zea > vea)
zea = vea;
- if (unmap_mapping_range_vma(vma,
+ unmap_mapping_range_vma(vma,
((zba - vba) << PAGE_SHIFT) + vma->vm_start,
((zea - vba + 1) << PAGE_SHIFT) + vma->vm_start,
- details) < 0)
- goto restart;
+ details);
}
}
@@ -2663,15 +2541,9 @@ static inline void unmap_mapping_range_l
* across *all* the pages in each nonlinear VMA, not just the pages
* whose virtual address lies outside the file truncation point.
*/
-restart:
list_for_each_entry(vma, head, shared.vm_set.list) {
- /* Skip quickly over those we have already dealt with */
- if (vma->vm_truncate_count == details->truncate_count)
- continue;
details->nonlinear_vma = vma;
- if (unmap_mapping_range_vma(vma, vma->vm_start,
- vma->vm_end, details) < 0)
- goto restart;
+ unmap_mapping_range_vma(vma, vma->vm_start, vma->vm_end, details);
}
}
@@ -2710,19 +2582,8 @@ void unmap_mapping_range(struct address_
details.last_index = hba + hlen - 1;
if (details.last_index < details.first_index)
details.last_index = ULONG_MAX;
- details.i_mmap_mutex = &mapping->i_mmap_mutex;
mutex_lock(&mapping->i_mmap_mutex);
-
- /* Protect against endless unmapping loops */
- mapping->truncate_count++;
- if (unlikely(is_restart_addr(mapping->truncate_count))) {
- if (mapping->truncate_count == 0)
- reset_vma_truncate_counts(mapping);
- mapping->truncate_count++;
- }
- details.truncate_count = mapping->truncate_count;
-
if (unlikely(!prio_tree_empty(&mapping->i_mmap)))
unmap_mapping_range_tree(&mapping->i_mmap, &details);
if (unlikely(!list_empty(&mapping->i_mmap_nonlinear)))
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -640,7 +640,6 @@ struct address_space {
struct prio_tree_root i_mmap; /* tree of private and shared mappings */
struct list_head i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
struct mutex i_mmap_mutex; /* protect tree, count, list */
- unsigned int truncate_count; /* Cover race condition with truncate */
unsigned long nrpages; /* number of total pages */
pgoff_t writeback_index;/* writeback starts here */
const struct address_space_operations *a_ops; /* methods */
Index: linux-2.6/include/linux/mm_types.h
===================================================================
--- linux-2.6.orig/include/linux/mm_types.h
+++ linux-2.6/include/linux/mm_types.h
@@ -175,7 +175,6 @@ struct vm_area_struct {
units, *not* PAGE_CACHE_SIZE */
struct file * vm_file; /* File we map to (can be NULL). */
void * vm_private_data; /* was vm_pte (shared mem) */
- unsigned long vm_truncate_count;/* truncate_count or restart_addr */
#ifndef CONFIG_MMU
struct vm_region *vm_region; /* NOMMU mapping region */
Index: linux-2.6/kernel/fork.c
===================================================================
--- linux-2.6.orig/kernel/fork.c
+++ linux-2.6/kernel/fork.c
@@ -379,7 +379,6 @@ static int dup_mmap(struct mm_struct *mm
mutex_lock(&mapping->i_mmap_mutex);
if (tmp->vm_flags & VM_SHARED)
mapping->i_mmap_writable++;
- tmp->vm_truncate_count = mpnt->vm_truncate_count;
flush_dcache_mmap_lock(mapping);
/* insert tmp into the share list, just after mpnt */
vma_prio_tree_add(tmp, mpnt);
Index: linux-2.6/mm/mmap.c
===================================================================
--- linux-2.6.orig/mm/mmap.c
+++ linux-2.6/mm/mmap.c
@@ -464,10 +464,8 @@ static void vma_link(struct mm_struct *m
if (vma->vm_file)
mapping = vma->vm_file->f_mapping;
- if (mapping) {
+ if (mapping)
mutex_lock(&mapping->i_mmap_mutex);
- vma->vm_truncate_count = mapping->truncate_count;
- }
__vma_link(mm, vma, prev, rb_link, rb_parent);
__vma_link_file(vma);
@@ -577,16 +575,7 @@ again: remove_next = 1 + (end > next->
if (!(vma->vm_flags & VM_NONLINEAR))
root = &mapping->i_mmap;
mutex_lock(&mapping->i_mmap_mutex);
- if (importer &&
- vma->vm_truncate_count != next->vm_truncate_count) {
- /*
- * unmap_mapping_range might be in progress:
- * ensure that the expanding vma is rescanned.
- */
- importer->vm_truncate_count = 0;
- }
if (insert) {
- insert->vm_truncate_count = vma->vm_truncate_count;
/*
* Put into prio_tree now, so instantiated pages
* are visible to arm/parisc __flush_dcache_page
Index: linux-2.6/mm/mremap.c
===================================================================
--- linux-2.6.orig/mm/mremap.c
+++ linux-2.6/mm/mremap.c
@@ -94,9 +94,6 @@ static void move_ptes(struct vm_area_str
*/
mapping = vma->vm_file->f_mapping;
mutex_lock(&mapping->i_mmap_mutex);
- if (new_vma->vm_truncate_count &&
- new_vma->vm_truncate_count != vma->vm_truncate_count)
- new_vma->vm_truncate_count = 0;
}
/*
--
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 24/25] mm: Remove i_mmap_mutex lockbreak
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang
[-- Attachment #1: mm-fix-zap_block_size.patch --]
[-- Type: text/plain, Size: 16426 bytes --]
Hugh says:
"The only significant loser, I think, would be page reclaim (when
concurrent with truncation): could spin for a long time waiting for
the i_mmap_mutex it expects would soon be dropped?"
Counterpoints:
- CPU contention makes the spin stop (need_resched())
- the zap path should be freeing pages at a higher rate than reclaim
ever can
- it shouldn't hold up reclaim more than lock_page() would
If we're going to do this, we can remove the mutex_is_contended()
patch from this series.
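( Purely for illustration, not kernel code: a tiny self-contained
  userspace C model of the control-flow change, using made-up names.
  The old scheme bounds the work done per lock hold and restarts from
  where it stopped; the new scheme walks the whole range in one go and
  relies on a per-pmd cond_resched() to keep latency down. )

#include <stdio.h>

#define NPMDS           8
#define ZAP_BUDGET      3       /* stand-in for the old ZAP_BLOCK_SIZE budget */

static void zap_one_pmd(int i)  { printf("zap pmd %d\n", i); }
static void preempt_point(void) { /* models cond_resched() */ }

/* Old shape: stop once the budget is spent so the caller can drop the
 * lock and later restart from the returned position. */
static int zap_range_old(int start)
{
        int i, work = ZAP_BUDGET;

        for (i = start; i < NPMDS && work > 0; i++, work--)
                zap_one_pmd(i);
        return i;                       /* restart position */
}

/* New shape: a single pass over the whole range, yielding after each
 * pmd instead of breaking out and restarting. */
static void zap_range_new(void)
{
        int i;

        for (i = 0; i < NPMDS; i++) {
                zap_one_pmd(i);
                preempt_point();
        }
}

int main(void)
{
        int restart = 0;

        while (restart < NPMDS)         /* old: restart loop in the caller */
                restart = zap_range_old(restart);

        zap_range_new();                /* new: one pass, no restarts */
        return 0;
}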
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/fs.h | 1
include/linux/mm.h | 2
include/linux/mm_types.h | 1
kernel/fork.c | 1
mm/memory.c | 191 ++++++-----------------------------------------
mm/mmap.c | 13 ---
mm/mremap.c | 3
7 files changed, 27 insertions(+), 185 deletions(-)
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -876,8 +876,6 @@ struct zap_details {
struct address_space *check_mapping; /* Check page->mapping if set */
pgoff_t first_index; /* Lowest page->index to unmap */
pgoff_t last_index; /* Highest page->index to unmap */
- struct mutex *i_mmap_mutex; /* For unmap_mapping_range: */
- unsigned long truncate_count; /* Compare vm_truncate_count */
};
struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -986,12 +986,12 @@ int copy_page_range(struct mm_struct *ds
static unsigned long zap_pte_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, unsigned long end,
- long *zap_work, struct zap_details *details)
+ struct zap_details *details)
{
struct mm_struct *mm = tlb->mm;
- pte_t *pte;
- spinlock_t *ptl;
int rss[NR_MM_COUNTERS];
+ spinlock_t *ptl;
+ pte_t *pte;
init_rss_vec(rss);
@@ -1000,12 +1000,9 @@ static unsigned long zap_pte_range(struc
do {
pte_t ptent = *pte;
if (pte_none(ptent)) {
- (*zap_work)--;
continue;
}
- (*zap_work) -= PAGE_SIZE;
-
if (pte_present(ptent)) {
struct page *page;
@@ -1072,7 +1069,7 @@ static unsigned long zap_pte_range(struc
print_bad_pte(vma, addr, ptent, NULL);
}
pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
- } while (pte++, addr += PAGE_SIZE, (addr != end && *zap_work > 0));
+ } while (pte++, addr += PAGE_SIZE, addr != end);
add_mm_rss_vec(mm, rss);
arch_leave_lazy_mmu_mode();
@@ -1084,7 +1081,7 @@ static unsigned long zap_pte_range(struc
static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, pud_t *pud,
unsigned long addr, unsigned long end,
- long *zap_work, struct zap_details *details)
+ struct zap_details *details)
{
pmd_t *pmd;
unsigned long next;
@@ -1096,19 +1093,15 @@ static inline unsigned long zap_pmd_rang
if (next-addr != HPAGE_PMD_SIZE) {
VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));
split_huge_page_pmd(vma->vm_mm, pmd);
- } else if (zap_huge_pmd(tlb, vma, pmd)) {
- (*zap_work)--;
+ } else if (zap_huge_pmd(tlb, vma, pmd))
continue;
- }
/* fall through */
}
- if (pmd_none_or_clear_bad(pmd)) {
- (*zap_work)--;
+ if (pmd_none_or_clear_bad(pmd))
continue;
- }
- next = zap_pte_range(tlb, vma, pmd, addr, next,
- zap_work, details);
- } while (pmd++, addr = next, (addr != end && *zap_work > 0));
+ next = zap_pte_range(tlb, vma, pmd, addr, next, details);
+ cond_resched();
+ } while (pmd++, addr = next, addr != end);
return addr;
}
@@ -1116,7 +1109,7 @@ static inline unsigned long zap_pmd_rang
static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, pgd_t *pgd,
unsigned long addr, unsigned long end,
- long *zap_work, struct zap_details *details)
+ struct zap_details *details)
{
pud_t *pud;
unsigned long next;
@@ -1124,13 +1117,10 @@ static inline unsigned long zap_pud_rang
pud = pud_offset(pgd, addr);
do {
next = pud_addr_end(addr, end);
- if (pud_none_or_clear_bad(pud)) {
- (*zap_work)--;
+ if (pud_none_or_clear_bad(pud))
continue;
- }
- next = zap_pmd_range(tlb, vma, pud, addr, next,
- zap_work, details);
- } while (pud++, addr = next, (addr != end && *zap_work > 0));
+ next = zap_pmd_range(tlb, vma, pud, addr, next, details);
+ } while (pud++, addr = next, addr != end);
return addr;
}
@@ -1138,7 +1128,7 @@ static inline unsigned long zap_pud_rang
static unsigned long unmap_page_range(struct mmu_gather *tlb,
struct vm_area_struct *vma,
unsigned long addr, unsigned long end,
- long *zap_work, struct zap_details *details)
+ struct zap_details *details)
{
pgd_t *pgd;
unsigned long next;
@@ -1152,13 +1142,10 @@ static unsigned long unmap_page_range(st
pgd = pgd_offset(vma->vm_mm, addr);
do {
next = pgd_addr_end(addr, end);
- if (pgd_none_or_clear_bad(pgd)) {
- (*zap_work)--;
+ if (pgd_none_or_clear_bad(pgd))
continue;
- }
- next = zap_pud_range(tlb, vma, pgd, addr, next,
- zap_work, details);
- } while (pgd++, addr = next, (addr != end && *zap_work > 0));
+ next = zap_pud_range(tlb, vma, pgd, addr, next, details);
+ } while (pgd++, addr = next, addr != end);
tlb_end_vma(tlb, vma);
mem_cgroup_uncharge_end();
@@ -1203,9 +1190,7 @@ unsigned long unmap_vmas(struct mmu_gath
unsigned long end_addr, unsigned long *nr_accounted,
struct zap_details *details)
{
- long zap_work = ZAP_BLOCK_SIZE;
unsigned long start = start_addr;
- struct mutex *i_mmap_mutex = details ? details->i_mmap_mutex : NULL;
struct mm_struct *mm = vma->vm_mm;
mmu_notifier_invalidate_range_start(mm, start_addr, end_addr);
@@ -1238,33 +1223,16 @@ unsigned long unmap_vmas(struct mmu_gath
* Since no pte has actually been setup, it is
* safe to do nothing in this case.
*/
- if (vma->vm_file) {
+ if (vma->vm_file)
unmap_hugepage_range(vma, start, end, NULL);
- zap_work -= (end - start) /
- pages_per_huge_page(hstate_vma(vma));
- }
start = end;
} else
start = unmap_page_range(tlb, vma,
- start, end, &zap_work, details);
-
- if (zap_work > 0) {
- BUG_ON(start != end);
- break;
- }
-
- if (need_resched() ||
- (i_mmap_mutex && mutex_is_contended(i_mmap_mutex))) {
- if (i_mmap_mutex)
- goto out;
- cond_resched();
- }
-
- zap_work = ZAP_BLOCK_SIZE;
+ start, end, details);
}
}
-out:
+
mmu_notifier_invalidate_range_end(mm, start_addr, end_addr);
return start; /* which is now the end (or restart) address */
}
@@ -2528,96 +2496,11 @@ static int do_wp_page(struct mm_struct *
return ret;
}
-/*
- * Helper functions for unmap_mapping_range().
- *
- * __ Notes on dropping i_mmap_mutex to reduce latency while unmapping __
- *
- * We have to restart searching the prio_tree whenever we drop the lock,
- * since the iterator is only valid while the lock is held, and anyway
- * a later vma might be split and reinserted earlier while lock dropped.
- *
- * The list of nonlinear vmas could be handled more efficiently, using
- * a placeholder, but handle it in the same way until a need is shown.
- * It is important to search the prio_tree before nonlinear list: a vma
- * may become nonlinear and be shifted from prio_tree to nonlinear list
- * while the lock is dropped; but never shifted from list to prio_tree.
- *
- * In order to make forward progress despite restarting the search,
- * vm_truncate_count is used to mark a vma as now dealt with, so we can
- * quickly skip it next time around. Since the prio_tree search only
- * shows us those vmas affected by unmapping the range in question, we
- * can't efficiently keep all vmas in step with mapping->truncate_count:
- * so instead reset them all whenever it wraps back to 0 (then go to 1).
- * mapping->truncate_count and vma->vm_truncate_count are protected by
- * i_mmap_mutex.
- *
- * In order to make forward progress despite repeatedly restarting some
- * large vma, note the restart_addr from unmap_vmas when it breaks out:
- * and restart from that address when we reach that vma again. It might
- * have been split or merged, shrunk or extended, but never shifted: so
- * restart_addr remains valid so long as it remains in the vma's range.
- * unmap_mapping_range forces truncate_count to leap over page-aligned
- * values so we can save vma's restart_addr in its truncate_count field.
- */
-#define is_restart_addr(truncate_count) (!((truncate_count) & ~PAGE_MASK))
-
-static void reset_vma_truncate_counts(struct address_space *mapping)
-{
- struct vm_area_struct *vma;
- struct prio_tree_iter iter;
-
- vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, 0, ULONG_MAX)
- vma->vm_truncate_count = 0;
- list_for_each_entry(vma, &mapping->i_mmap_nonlinear, shared.vm_set.list)
- vma->vm_truncate_count = 0;
-}
-
-static int unmap_mapping_range_vma(struct vm_area_struct *vma,
+static void unmap_mapping_range_vma(struct vm_area_struct *vma,
unsigned long start_addr, unsigned long end_addr,
struct zap_details *details)
{
- unsigned long restart_addr;
- int need_break;
-
- /*
- * files that support invalidating or truncating portions of the
- * file from under mmaped areas must have their ->fault function
- * return a locked page (and set VM_FAULT_LOCKED in the return).
- * This provides synchronisation against concurrent unmapping here.
- */
-
-again:
- restart_addr = vma->vm_truncate_count;
- if (is_restart_addr(restart_addr) && start_addr < restart_addr) {
- start_addr = restart_addr;
- if (start_addr >= end_addr) {
- /* Top of vma has been split off since last time */
- vma->vm_truncate_count = details->truncate_count;
- return 0;
- }
- }
-
- restart_addr = zap_page_range(vma, start_addr,
- end_addr - start_addr, details);
- need_break = need_resched() || mutex_is_contended(details->i_mmap_mutex);
-
- if (restart_addr >= end_addr) {
- /* We have now completed this vma: mark it so */
- vma->vm_truncate_count = details->truncate_count;
- if (!need_break)
- return 0;
- } else {
- /* Note restart_addr in vma's truncate_count field */
- vma->vm_truncate_count = restart_addr;
- if (!need_break)
- goto again;
- }
-
- mutex_unlock(details->i_mmap_mutex);
- cond_resched();
- mutex_lock(details->i_mmap_mutex);
- return -EINTR;
+ zap_page_range(vma, start_addr, end_addr - start_addr, details);
}
static inline void unmap_mapping_range_tree(struct prio_tree_root *root,
@@ -2627,12 +2510,8 @@ static inline void unmap_mapping_range_t
struct prio_tree_iter iter;
pgoff_t vba, vea, zba, zea;
-restart:
vma_prio_tree_foreach(vma, &iter, root,
details->first_index, details->last_index) {
- /* Skip quickly over those we have already dealt with */
- if (vma->vm_truncate_count == details->truncate_count)
- continue;
vba = vma->vm_pgoff;
vea = vba + ((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) - 1;
@@ -2644,11 +2523,10 @@ static inline void unmap_mapping_range_t
if (zea > vea)
zea = vea;
- if (unmap_mapping_range_vma(vma,
+ unmap_mapping_range_vma(vma,
((zba - vba) << PAGE_SHIFT) + vma->vm_start,
((zea - vba + 1) << PAGE_SHIFT) + vma->vm_start,
- details) < 0)
- goto restart;
+ details);
}
}
@@ -2663,15 +2541,9 @@ static inline void unmap_mapping_range_l
* across *all* the pages in each nonlinear VMA, not just the pages
* whose virtual address lies outside the file truncation point.
*/
-restart:
list_for_each_entry(vma, head, shared.vm_set.list) {
- /* Skip quickly over those we have already dealt with */
- if (vma->vm_truncate_count == details->truncate_count)
- continue;
details->nonlinear_vma = vma;
- if (unmap_mapping_range_vma(vma, vma->vm_start,
- vma->vm_end, details) < 0)
- goto restart;
+ unmap_mapping_range_vma(vma, vma->vm_start, vma->vm_end, details);
}
}
@@ -2710,19 +2582,8 @@ void unmap_mapping_range(struct address_
details.last_index = hba + hlen - 1;
if (details.last_index < details.first_index)
details.last_index = ULONG_MAX;
- details.i_mmap_mutex = &mapping->i_mmap_mutex;
mutex_lock(&mapping->i_mmap_mutex);
-
- /* Protect against endless unmapping loops */
- mapping->truncate_count++;
- if (unlikely(is_restart_addr(mapping->truncate_count))) {
- if (mapping->truncate_count == 0)
- reset_vma_truncate_counts(mapping);
- mapping->truncate_count++;
- }
- details.truncate_count = mapping->truncate_count;
-
if (unlikely(!prio_tree_empty(&mapping->i_mmap)))
unmap_mapping_range_tree(&mapping->i_mmap, &details);
if (unlikely(!list_empty(&mapping->i_mmap_nonlinear)))
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -640,7 +640,6 @@ struct address_space {
struct prio_tree_root i_mmap; /* tree of private and shared mappings */
struct list_head i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
struct mutex i_mmap_mutex; /* protect tree, count, list */
- unsigned int truncate_count; /* Cover race condition with truncate */
unsigned long nrpages; /* number of total pages */
pgoff_t writeback_index;/* writeback starts here */
const struct address_space_operations *a_ops; /* methods */
Index: linux-2.6/include/linux/mm_types.h
===================================================================
--- linux-2.6.orig/include/linux/mm_types.h
+++ linux-2.6/include/linux/mm_types.h
@@ -175,7 +175,6 @@ struct vm_area_struct {
units, *not* PAGE_CACHE_SIZE */
struct file * vm_file; /* File we map to (can be NULL). */
void * vm_private_data; /* was vm_pte (shared mem) */
- unsigned long vm_truncate_count;/* truncate_count or restart_addr */
#ifndef CONFIG_MMU
struct vm_region *vm_region; /* NOMMU mapping region */
Index: linux-2.6/kernel/fork.c
===================================================================
--- linux-2.6.orig/kernel/fork.c
+++ linux-2.6/kernel/fork.c
@@ -379,7 +379,6 @@ static int dup_mmap(struct mm_struct *mm
mutex_lock(&mapping->i_mmap_mutex);
if (tmp->vm_flags & VM_SHARED)
mapping->i_mmap_writable++;
- tmp->vm_truncate_count = mpnt->vm_truncate_count;
flush_dcache_mmap_lock(mapping);
/* insert tmp into the share list, just after mpnt */
vma_prio_tree_add(tmp, mpnt);
Index: linux-2.6/mm/mmap.c
===================================================================
--- linux-2.6.orig/mm/mmap.c
+++ linux-2.6/mm/mmap.c
@@ -464,10 +464,8 @@ static void vma_link(struct mm_struct *m
if (vma->vm_file)
mapping = vma->vm_file->f_mapping;
- if (mapping) {
+ if (mapping)
mutex_lock(&mapping->i_mmap_mutex);
- vma->vm_truncate_count = mapping->truncate_count;
- }
__vma_link(mm, vma, prev, rb_link, rb_parent);
__vma_link_file(vma);
@@ -577,16 +575,7 @@ again: remove_next = 1 + (end > next->
if (!(vma->vm_flags & VM_NONLINEAR))
root = &mapping->i_mmap;
mutex_lock(&mapping->i_mmap_mutex);
- if (importer &&
- vma->vm_truncate_count != next->vm_truncate_count) {
- /*
- * unmap_mapping_range might be in progress:
- * ensure that the expanding vma is rescanned.
- */
- importer->vm_truncate_count = 0;
- }
if (insert) {
- insert->vm_truncate_count = vma->vm_truncate_count;
/*
* Put into prio_tree now, so instantiated pages
* are visible to arm/parisc __flush_dcache_page
Index: linux-2.6/mm/mremap.c
===================================================================
--- linux-2.6.orig/mm/mremap.c
+++ linux-2.6/mm/mremap.c
@@ -94,9 +94,6 @@ static void move_ptes(struct vm_area_str
*/
mapping = vma->vm_file->f_mapping;
mutex_lock(&mapping->i_mmap_mutex);
- if (new_vma->vm_truncate_count &&
- new_vma->vm_truncate_count != vma->vm_truncate_count)
- new_vma->vm_truncate_count = 0;
}
/*
--
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 25/25] mm, arch: Ensure we never tlb_flush_mmu() from atomic context
2011-01-25 17:31 ` Peter Zijlstra
(?)
@ 2011-01-25 17:31 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang
[-- Attachment #1: mm-flush-vs-pte_lock.patch --]
[-- Type: text/plain, Size: 10365 bytes --]
Hugh noted that we could still end up flushing the batch from atomic
context because we do tlb_remove_page() while holding the pte_lock.
This will still generate immense latencies, more so now than ever
before due to the larger batches. Break tlb_remove_page() into two
functions, one that queues the page and one that flushes the queue.
Leave the tlb_remove_page() interface for now with the old semantics
but add a might_sleep() in there to detect callers from atomic
contexts.
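( Again only a sketch, not the kernel implementation: a self-contained
  userspace C model, with invented names, of the calling convention this
  split enables -- queue pages while the page-table lock is held, and
  only flush once the lock has been dropped, retrying if the range was
  not finished. )

#include <stdio.h>

#define BATCH_MAX       4

struct gather {
        void *pages[BATCH_MAX];
        int nr;
};

/* Models __tlb_remove_page(): queue a page and return 1 when the batch
 * is full, i.e. the caller must flush before queueing more. */
static int queue_page(struct gather *g, void *page)
{
        g->pages[g->nr++] = page;
        return g->nr == BATCH_MAX;
}

/* Models tlb_flush_mmu(): the part that is allowed to sleep, so it must
 * only run after the "pte lock" has been dropped. */
static void flush_batch(struct gather *g)
{
        printf("flush %d pages\n", g->nr);
        g->nr = 0;
}

int main(void)
{
        struct gather g = { .nr = 0 };
        int pages[10], i = 0, need_flush = 0;

        while (i < 10) {
                /* "lock held" section: only queue, never flush */
                for (; i < 10 && !need_flush; i++)
                        need_flush = queue_page(&g, &pages[i]);
                /* "lock dropped": safe to flush, then go again */
                if (need_flush) {
                        flush_batch(&g);
                        need_flush = 0;
                }
        }
        if (g.nr)
                flush_batch(&g);        /* final flush, as at tlb_finish_mmu() */
        return 0;
}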
XXX should probably fold back into the mmu_gather preempt patches for
the various architectures.
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/arm/include/asm/tlb.h | 17 ++++++++++++++++-
arch/ia64/include/asm/tlb.h | 22 ++++++++++++++++++----
arch/s390/include/asm/tlb.h | 18 ++++++++++++------
arch/sh/include/asm/tlb.h | 17 ++++++++++++++++-
arch/um/include/asm/tlb.h | 15 +++++++++++----
include/asm-generic/tlb.h | 22 +++++++++++++++-------
mm/memory.c | 14 +++++++++++---
7 files changed, 99 insertions(+), 26 deletions(-)
Index: linux-2.6/include/asm-generic/tlb.h
===================================================================
--- linux-2.6.orig/include/asm-generic/tlb.h
+++ linux-2.6/include/asm-generic/tlb.h
@@ -146,7 +146,7 @@ tlb_gather_mmu(struct mmu_gather *tlb, s
}
static inline void
-tlb_flush_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
+tlb_flush_mmu(struct mmu_gather *tlb)
{
struct mmu_gather_batch *batch;
@@ -176,7 +176,7 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
{
struct mmu_gather_batch *batch, *next;
- tlb_flush_mmu(tlb, start, end);
+ tlb_flush_mmu(tlb);
/* keep the page table cache within bounds */
check_pgt_cache();
@@ -193,7 +193,7 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
* handling the additional races in SMP caused by other CPUs caching valid
* mappings in their TLBs.
*/
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
struct mmu_gather_batch *batch;
@@ -201,17 +201,25 @@ static inline void tlb_remove_page(struc
if (tlb_fast_mode(tlb)) {
free_page_and_swap_cache(page);
- return;
+ return 0;
}
batch = tlb->active;
+ batch->pages[batch->nr++] = page;
if (batch->nr == batch->max) {
if (!tlb_next_batch(tlb))
- tlb_flush_mmu(tlb, 0, 0);
- batch = tlb->active;
+ return 1;
}
- batch->pages[batch->nr++] = page;
+ return 0;
+}
+
+static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+{
+ might_sleep();
+
+ if (__tlb_remove_page(tlb, page))
+ tlb_flush_mmu(tlb);
}
/**
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -990,11 +990,12 @@ static unsigned long zap_pte_range(struc
{
struct mm_struct *mm = tlb->mm;
int rss[NR_MM_COUNTERS];
+ int need_flush = 0;
spinlock_t *ptl;
pte_t *pte;
init_rss_vec(rss);
-
+again:
pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
arch_enter_lazy_mmu_mode();
do {
@@ -1048,7 +1049,7 @@ static unsigned long zap_pte_range(struc
page_remove_rmap(page);
if (unlikely(page_mapcount(page) < 0))
print_bad_pte(vma, addr, ptent, page);
- tlb_remove_page(tlb, page);
+ need_flush = __tlb_remove_page(tlb, page);
continue;
}
/*
@@ -1069,12 +1070,19 @@ static unsigned long zap_pte_range(struc
print_bad_pte(vma, addr, ptent, NULL);
}
pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
- } while (pte++, addr += PAGE_SIZE, addr != end);
+ } while (pte++, addr += PAGE_SIZE, (addr != end && !need_flush));
add_mm_rss_vec(mm, rss);
arch_leave_lazy_mmu_mode();
pte_unmap_unlock(pte - 1, ptl);
+ if (need_flush) {
+ need_flush = 0;
+ tlb_flush_mmu(tlb);
+ if (addr != end)
+ goto again;
+ }
+
return addr;
}
Index: linux-2.6/arch/arm/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/arm/include/asm/tlb.h
+++ linux-2.6/arch/arm/include/asm/tlb.h
@@ -93,7 +93,22 @@ tlb_end_vma(struct mmu_gather *tlb, stru
flush_tlb_range(vma, tlb->range_start, tlb->range_end);
}
-#define tlb_remove_page(tlb,page) free_page_and_swap_cache(page)
+static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+{
+ free_page_and_swap_cache(page);
+ return 0;
+}
+
+static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+{
+ might_sleep();
+ __tlb_remove_page(tlb, page);
+}
+
+static inline void tlb_flush_mmu(struct mmu_gather *tlb)
+{
+}
+
#define pte_free_tlb(tlb, ptep, addr) pte_free((tlb)->mm, ptep)
#define pmd_free_tlb(tlb, pmdp, addr) pmd_free((tlb)->mm, pmdp)
Index: linux-2.6/arch/ia64/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/ia64/include/asm/tlb.h
+++ linux-2.6/arch/ia64/include/asm/tlb.h
@@ -204,14 +204,13 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
* must be delayed until after the TLB has been flushed (see comments at the beginning of
* this file).
*/
-static inline void
-tlb_remove_page (struct mmu_gather *tlb, struct page *page)
+static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
tlb->need_flush = 1;
if (tlb_fast_mode(tlb)) {
free_page_and_swap_cache(page);
- return;
+ return 0;
}
if (!tlb->nr && tlb->pages == tlb->local)
@@ -219,7 +218,22 @@ tlb_remove_page (struct mmu_gather *tlb,
tlb->pages[tlb->nr++] = page;
if (tlb->nr >= tlb->max)
- ia64_tlb_flush_mmu(tlb, tlb->start_addr, tlb->end_addr);
+ return 1;
+
+ return 0;
+}
+
+static inline void tlb_flush_mmu(struct mmu_gather *tlb)
+{
+ ia64_tlb_flush_mmu(tlb, tlb->start_addr, tlb->end_addr);
+}
+
+static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+{
+ might_sleep();
+
+ if (__tlb_remove_page(tlb, page))
+ tlb_flush_mmu(tlb);
}
/*
Index: linux-2.6/arch/s390/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/s390/include/asm/tlb.h
+++ linux-2.6/arch/s390/include/asm/tlb.h
@@ -64,8 +64,7 @@ static inline void tlb_gather_mmu(struct
tlb->nr_pxds = tlb->max;
}
-static inline void tlb_flush_mmu(struct mmu_gather *tlb,
- unsigned long start, unsigned long end)
+static inline void tlb_flush_mmu(struct mmu_gather *tlb)
{
if (!tlb->fullmm && (tlb->nr_ptes > 0 || tlb->nr_pxds < tlb->max))
__tlb_flush_mm(tlb->mm);
@@ -78,7 +77,7 @@ static inline void tlb_flush_mmu(struct
static inline void tlb_finish_mmu(struct mmu_gather *tlb,
unsigned long start, unsigned long end)
{
- tlb_flush_mmu(tlb, start, end);
+ tlb_flush_mmu(tlb);
rcu_table_freelist_finish();
@@ -94,8 +93,15 @@ static inline void tlb_finish_mmu(struct
* tlb_ptep_clear_flush. In both flush modes the tlb fo a page cache page
* has already been freed, so just do free_page_and_swap_cache.
*/
+static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+{
+ free_page_and_swap_cache(page);
+ return 0;
+}
+
static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
+ might_sleep();
free_page_and_swap_cache(page);
}
@@ -109,7 +115,7 @@ static inline void pte_free_tlb(struct m
if (!tlb->fullmm) {
tlb->array[tlb->nr_ptes++] = pte;
if (tlb->nr_ptes >= tlb->nr_pxds)
- tlb_flush_mmu(tlb, 0, 0);
+ tlb_flush_mmu(tlb);
} else
page_table_free(tlb->mm, (unsigned long *) pte);
}
@@ -130,7 +136,7 @@ static inline void pmd_free_tlb(struct m
if (!tlb->fullmm) {
tlb->array[--tlb->nr_pxds] = pmd;
if (tlb->nr_ptes >= tlb->nr_pxds)
- tlb_flush_mmu(tlb, 0, 0);
+ tlb_flush_mmu(tlb);
} else
crst_table_free(tlb->mm, (unsigned long *) pmd);
#endif
@@ -152,7 +158,7 @@ static inline void pud_free_tlb(struct m
if (!tlb->fullmm) {
tlb->array[--tlb->nr_pxds] = pud;
if (tlb->nr_ptes >= tlb->nr_pxds)
- tlb_flush_mmu(tlb, 0, 0);
+ tlb_flush_mmu(tlb);
} else
crst_table_free(tlb->mm, (unsigned long *) pud);
#endif
Index: linux-2.6/arch/sh/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/sh/include/asm/tlb.h
+++ linux-2.6/arch/sh/include/asm/tlb.h
@@ -83,7 +83,22 @@ tlb_end_vma(struct mmu_gather *tlb, stru
}
}
-#define tlb_remove_page(tlb,page) free_page_and_swap_cache(page)
+static inline void tlb_flush_mmu(struct mmu_gather *tlb)
+{
+}
+
+static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+{
+ free_page_and_swap_cache(page);
+ return 0;
+}
+
+static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+{
+ might_sleep();
+ __tlb_remove_page(tlb, page);
+}
+
#define pte_free_tlb(tlb, ptep, addr) pte_free((tlb)->mm, ptep)
#define pmd_free_tlb(tlb, pmdp, addr) pmd_free((tlb)->mm, pmdp)
#define pud_free_tlb(tlb, pudp, addr) pud_free((tlb)->mm, pudp)
Index: linux-2.6/arch/um/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/um/include/asm/tlb.h
+++ linux-2.6/arch/um/include/asm/tlb.h
@@ -57,7 +57,7 @@ extern void flush_tlb_mm_range(struct mm
unsigned long end);
static inline void
-tlb_flush_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
+tlb_flush_mmu(struct mmu_gather *tlb)
{
if (!tlb->need_flush)
return;
@@ -73,7 +73,7 @@ tlb_flush_mmu(struct mmu_gather *tlb, un
static inline void
tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
{
- tlb_flush_mmu(tlb, start, end);
+ tlb_flush_mmu(tlb);
/* keep the page table cache within bounds */
check_pgt_cache();
@@ -84,11 +84,18 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
* while handling the additional races in SMP caused by other CPUs
* caching valid mappings in their TLBs.
*/
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
tlb->need_flush = 1;
free_page_and_swap_cache(page);
- return;
+ return 0;
+}
+
+static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+{
+ might_sleep();
+
+ __tlb_remove_page(tlb, page);
}
/**
^ permalink raw reply [flat|nested] 128+ messages in thread
* [PATCH 25/25] mm, arch: Ensure we never tlb_flush_mmu() from atomic context
@ 2011-01-25 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 17:31 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang
[-- Attachment #1: mm-flush-vs-pte_lock.patch --]
[-- Type: text/plain, Size: 10661 bytes --]
Hugh noted that we could still end up flushing the batch from atomic
context because we do tlb_remove_page() while holding the pte_lock.
This will still generate immense latencies, more so now than ever
before due to the larger batches. Break tlb_remove_page() into two
functions, one that queues the page and one that flushes the queue.
Leave the tlb_remove_page() interface for now with the old semantics
but add a might_sleep() in there to detect callers from atomic
contexts.
XXX should probably fold back into the mmu_gather preempt patches for
the various architectures.
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/arm/include/asm/tlb.h | 17 ++++++++++++++++-
arch/ia64/include/asm/tlb.h | 22 ++++++++++++++++++----
arch/s390/include/asm/tlb.h | 18 ++++++++++++------
arch/sh/include/asm/tlb.h | 17 ++++++++++++++++-
arch/um/include/asm/tlb.h | 15 +++++++++++----
include/asm-generic/tlb.h | 22 +++++++++++++++-------
mm/memory.c | 14 +++++++++++---
7 files changed, 99 insertions(+), 26 deletions(-)
Index: linux-2.6/include/asm-generic/tlb.h
===================================================================
--- linux-2.6.orig/include/asm-generic/tlb.h
+++ linux-2.6/include/asm-generic/tlb.h
@@ -146,7 +146,7 @@ tlb_gather_mmu(struct mmu_gather *tlb, s
}
static inline void
-tlb_flush_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
+tlb_flush_mmu(struct mmu_gather *tlb)
{
struct mmu_gather_batch *batch;
@@ -176,7 +176,7 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
{
struct mmu_gather_batch *batch, *next;
- tlb_flush_mmu(tlb, start, end);
+ tlb_flush_mmu(tlb);
/* keep the page table cache within bounds */
check_pgt_cache();
@@ -193,7 +193,7 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
* handling the additional races in SMP caused by other CPUs caching valid
* mappings in their TLBs.
*/
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
struct mmu_gather_batch *batch;
@@ -201,17 +201,25 @@ static inline void tlb_remove_page(struc
if (tlb_fast_mode(tlb)) {
free_page_and_swap_cache(page);
- return;
+ return 0;
}
batch = tlb->active;
+ batch->pages[batch->nr++] = page;
if (batch->nr == batch->max) {
if (!tlb_next_batch(tlb))
- tlb_flush_mmu(tlb, 0, 0);
- batch = tlb->active;
+ return 1;
}
- batch->pages[batch->nr++] = page;
+ return 0;
+}
+
+static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+{
+ might_sleep();
+
+ if (__tlb_remove_page(tlb, page))
+ tlb_flush_mmu(tlb);
}
/**
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -990,11 +990,12 @@ static unsigned long zap_pte_range(struc
{
struct mm_struct *mm = tlb->mm;
int rss[NR_MM_COUNTERS];
+ int need_flush = 0;
spinlock_t *ptl;
pte_t *pte;
init_rss_vec(rss);
-
+again:
pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
arch_enter_lazy_mmu_mode();
do {
@@ -1048,7 +1049,7 @@ static unsigned long zap_pte_range(struc
page_remove_rmap(page);
if (unlikely(page_mapcount(page) < 0))
print_bad_pte(vma, addr, ptent, page);
- tlb_remove_page(tlb, page);
+ need_flush = __tlb_remove_page(tlb, page);
continue;
}
/*
@@ -1069,12 +1070,19 @@ static unsigned long zap_pte_range(struc
print_bad_pte(vma, addr, ptent, NULL);
}
pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
- } while (pte++, addr += PAGE_SIZE, addr != end);
+ } while (pte++, addr += PAGE_SIZE, (addr != end && !need_flush));
add_mm_rss_vec(mm, rss);
arch_leave_lazy_mmu_mode();
pte_unmap_unlock(pte - 1, ptl);
+ if (need_flush) {
+ need_flush = 0;
+ tlb_flush_mmu(tlb);
+ if (addr != end)
+ goto again;
+ }
+
return addr;
}
Index: linux-2.6/arch/arm/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/arm/include/asm/tlb.h
+++ linux-2.6/arch/arm/include/asm/tlb.h
@@ -93,7 +93,22 @@ tlb_end_vma(struct mmu_gather *tlb, stru
flush_tlb_range(vma, tlb->range_start, tlb->range_end);
}
-#define tlb_remove_page(tlb,page) free_page_and_swap_cache(page)
+static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+{
+ free_page_and_swap_cache(page);
+ return 0;
+}
+
+static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+{
+ might_sleep();
+ __tlb_remove_page(tlb, page);
+}
+
+static inline void tlb_flush_mmu(struct mmu_gather *tlb)
+{
+}
+
#define pte_free_tlb(tlb, ptep, addr) pte_free((tlb)->mm, ptep)
#define pmd_free_tlb(tlb, pmdp, addr) pmd_free((tlb)->mm, pmdp)
Index: linux-2.6/arch/ia64/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/ia64/include/asm/tlb.h
+++ linux-2.6/arch/ia64/include/asm/tlb.h
@@ -204,14 +204,13 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
* must be delayed until after the TLB has been flushed (see comments at the beginning of
* this file).
*/
-static inline void
-tlb_remove_page (struct mmu_gather *tlb, struct page *page)
+static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
tlb->need_flush = 1;
if (tlb_fast_mode(tlb)) {
free_page_and_swap_cache(page);
- return;
+ return 0;
}
if (!tlb->nr && tlb->pages == tlb->local)
@@ -219,7 +218,22 @@ tlb_remove_page (struct mmu_gather *tlb,
tlb->pages[tlb->nr++] = page;
if (tlb->nr >= tlb->max)
- ia64_tlb_flush_mmu(tlb, tlb->start_addr, tlb->end_addr);
+ return 1;
+
+ return 0;
+}
+
+static inline void tlb_flush_mmu(struct mmu_gather *tlb)
+{
+ ia64_tlb_flush_mmu(tlb, tlb->start_addr, tlb->end_addr);
+}
+
+static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+{
+ might_sleep();
+
+ if (__tlb_remove_page(tlb, page))
+ tlb_flush_mmu(tlb);
}
/*
Index: linux-2.6/arch/s390/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/s390/include/asm/tlb.h
+++ linux-2.6/arch/s390/include/asm/tlb.h
@@ -64,8 +64,7 @@ static inline void tlb_gather_mmu(struct
tlb->nr_pxds = tlb->max;
}
-static inline void tlb_flush_mmu(struct mmu_gather *tlb,
- unsigned long start, unsigned long end)
+static inline void tlb_flush_mmu(struct mmu_gather *tlb)
{
if (!tlb->fullmm && (tlb->nr_ptes > 0 || tlb->nr_pxds < tlb->max))
__tlb_flush_mm(tlb->mm);
@@ -78,7 +77,7 @@ static inline void tlb_flush_mmu(struct
static inline void tlb_finish_mmu(struct mmu_gather *tlb,
unsigned long start, unsigned long end)
{
- tlb_flush_mmu(tlb, start, end);
+ tlb_flush_mmu(tlb);
rcu_table_freelist_finish();
@@ -94,8 +93,15 @@ static inline void tlb_finish_mmu(struct
* tlb_ptep_clear_flush. In both flush modes the tlb fo a page cache page
* has already been freed, so just do free_page_and_swap_cache.
*/
+static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+{
+ free_page_and_swap_cache(page);
+ return 0;
+}
+
static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
+ might_sleep();
free_page_and_swap_cache(page);
}
@@ -109,7 +115,7 @@ static inline void pte_free_tlb(struct m
if (!tlb->fullmm) {
tlb->array[tlb->nr_ptes++] = pte;
if (tlb->nr_ptes >= tlb->nr_pxds)
- tlb_flush_mmu(tlb, 0, 0);
+ tlb_flush_mmu(tlb);
} else
page_table_free(tlb->mm, (unsigned long *) pte);
}
@@ -130,7 +136,7 @@ static inline void pmd_free_tlb(struct m
if (!tlb->fullmm) {
tlb->array[--tlb->nr_pxds] = pmd;
if (tlb->nr_ptes >= tlb->nr_pxds)
- tlb_flush_mmu(tlb, 0, 0);
+ tlb_flush_mmu(tlb);
} else
crst_table_free(tlb->mm, (unsigned long *) pmd);
#endif
@@ -152,7 +158,7 @@ static inline void pud_free_tlb(struct m
if (!tlb->fullmm) {
tlb->array[--tlb->nr_pxds] = pud;
if (tlb->nr_ptes >= tlb->nr_pxds)
- tlb_flush_mmu(tlb, 0, 0);
+ tlb_flush_mmu(tlb);
} else
crst_table_free(tlb->mm, (unsigned long *) pud);
#endif
Index: linux-2.6/arch/sh/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/sh/include/asm/tlb.h
+++ linux-2.6/arch/sh/include/asm/tlb.h
@@ -83,7 +83,22 @@ tlb_end_vma(struct mmu_gather *tlb, stru
}
}
-#define tlb_remove_page(tlb,page) free_page_and_swap_cache(page)
+static inline void tlb_flush_mmu(struct mmu_gather *tlb)
+{
+}
+
+static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+{
+ free_page_and_swap_cache(page);
+ return 0;
+}
+
+static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+{
+ might_sleep();
+ __tlb_remove_page(tlb, page);
+}
+
#define pte_free_tlb(tlb, ptep, addr) pte_free((tlb)->mm, ptep)
#define pmd_free_tlb(tlb, pmdp, addr) pmd_free((tlb)->mm, pmdp)
#define pud_free_tlb(tlb, pudp, addr) pud_free((tlb)->mm, pudp)
Index: linux-2.6/arch/um/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/um/include/asm/tlb.h
+++ linux-2.6/arch/um/include/asm/tlb.h
@@ -57,7 +57,7 @@ extern void flush_tlb_mm_range(struct mm
unsigned long end);
static inline void
-tlb_flush_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
+tlb_flush_mmu(struct mmu_gather *tlb)
{
if (!tlb->need_flush)
return;
@@ -73,7 +73,7 @@ tlb_flush_mmu(struct mmu_gather *tlb, un
static inline void
tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
{
- tlb_flush_mmu(tlb, start, end);
+ tlb_flush_mmu(tlb);
/* keep the page table cache within bounds */
check_pgt_cache();
@@ -84,11 +84,18 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
* while handling the additional races in SMP caused by other CPUs
* caching valid mappings in their TLBs.
*/
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
tlb->need_flush = 1;
free_page_and_swap_cache(page);
- return;
+ return 0;
+}
+
+static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+{
+ might_sleep();
+
+ __tlb_remove_page(tlb, page);
}
/**
^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: [PATCH 00/25] mm: Preemptibility -v7
2011-01-25 17:31 ` Peter Zijlstra
@ 2011-01-25 18:32 ` Sam Ravnborg
-1 siblings, 0 replies; 128+ messages in thread
From: Sam Ravnborg @ 2011-01-25 18:32 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds, linux-kernel, linux-arch,
linux-mm, Benjamin Herrenschmidt, David Miller, Hugh Dickins,
Mel Gorman, Nick Piggin, Paul McKenney, Yanmin Zhang
On Tue, Jan 25, 2011 at 06:31:11PM +0100, Peter Zijlstra wrote:
>
> This patch-set makes part of the mm a lot more preemptible. It converts
> i_mmap_lock and anon_vma->lock to mutexes and makes mmu_gather fully
> preemptible.
>
> The main motivation was making mm_take_all_locks() preemptible, since it
> appears people are nesting hundreds of spinlocks there.
>
> The side-effects are that can finally make mmu_gather preemptible,
> something which lots of people have wanted to do for a long time.
>
> It also gets us anon_vma refcounting, which seems to result in a nice
> cleanup of the anon_vma lifetime rules wrt KSM and compaction.
>
> This patch-set is build and boot-tested on x86_64 (a previous version was
> also tested on Dave's Niagra2 machines, and I suppose s390 was too when
> Martin provided the conversion patch for his arch).
>
> There are no known architectures left unconverted.
Hi Peter.
Forgive my ignorance...
Why is this relevant for sparc64 but not for sparc32?
A quick grep turned up only this in sparc32-specific files:
mm/init_32.c:DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
Maybe this is just something sparc32 does not support?
Sam
^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: [PATCH 00/25] mm: Preemptibility -v7
2011-01-25 18:32 ` Sam Ravnborg
@ 2011-01-25 19:28 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 19:28 UTC (permalink / raw)
To: Sam Ravnborg
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds, linux-kernel, linux-arch,
linux-mm, Benjamin Herrenschmidt, David Miller, Hugh Dickins,
Mel Gorman, Nick Piggin, Paul McKenney, Yanmin Zhang
On Tue, 2011-01-25 at 19:32 +0100, Sam Ravnborg wrote:
> Foregive me my ignorance..
> Why is this relevant for sparc64 but not for sparc32?
>
> A quick grep showed up only this in sparc32 specific files:
>
> mm/init_32.c:DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
>
> Maybe this is just something sparc32 does not support?
sparc32 uses include/asm-generic/tlb.h, whereas sparc64 implements its
own variant.
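Concretely -- as an illustrative sketch only, not sparc32's actual
header -- an arch that relies on the generic code just supplies the
hooks asm-generic/tlb.h expects and then includes it, so it picks up
this series for free:

#define tlb_start_vma(tlb, vma)				do { } while (0)
#define tlb_end_vma(tlb, vma)				do { } while (0)
#define __tlb_remove_tlb_entry(tlb, ptep, address)	do { } while (0)
#define tlb_flush(tlb)					flush_tlb_mm((tlb)->mm)

#include <asm-generic/tlb.h>

sparc64, on the other hand, carries its own batching implementation and
therefore needs an explicit conversion.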
^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: [PATCH 00/25] mm: Preemptibility -v7
2011-01-25 19:28 ` Peter Zijlstra
@ 2011-01-25 19:41 ` Sam Ravnborg
-1 siblings, 0 replies; 128+ messages in thread
From: Sam Ravnborg @ 2011-01-25 19:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds, linux-kernel, linux-arch,
linux-mm, Benjamin Herrenschmidt, David Miller, Hugh Dickins,
Mel Gorman, Nick Piggin, Paul McKenney, Yanmin Zhang
On Tue, Jan 25, 2011 at 08:28:53PM +0100, Peter Zijlstra wrote:
> On Tue, 2011-01-25 at 19:32 +0100, Sam Ravnborg wrote:
>
> > Foregive me my ignorance..
> > Why is this relevant for sparc64 but not for sparc32?
> >
> > A quick grep showed up only this in sparc32 specific files:
> >
> > mm/init_32.c:DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
> >
> > Maybe this is just something sparc32 does not support?
>
> sparc32 uses include/asm-generic/tlb.h, whereas sparc64 implements its
> own variant.
Ahh, thanks!
Sam
^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: [PATCH 00/25] mm: Preemptibility -v7
2011-01-25 17:31 ` Peter Zijlstra
(?)
@ 2011-01-25 19:45 ` Andi Kleen
-1 siblings, 0 replies; 128+ messages in thread
From: Andi Kleen @ 2011-01-25 19:45 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, linux-arch, linux-kernel, linux-arch
Peter Zijlstra <a.p.zijlstra@chello.nl> writes:
> This patch-set makes part of the mm a lot more preemptible. It converts
> i_mmap_lock and anon_vma->lock to mutexes and makes mmu_gather fully
> preemptible.
>
> The main motivation was making mm_take_all_locks() preemptible, since it
> appears people are nesting hundreds of spinlocks there.
Just curious: why is mm_take_all_locks() a problem? As far as I can see
it's just used when starting KVM or GRU the first time. Is that a common
situation?
-Andi
--
Andi Kleen
Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 128+ messages in thread
[parent not found: <1295987985.28776.1118.camel@laptop>]
* Re: [PATCH 00/25] mm: Preemptibility -v7
[not found] ` <1295987985.28776.1118.camel@laptop>
@ 2011-01-25 20:47 ` Andi Kleen
0 siblings, 0 replies; 128+ messages in thread
From: Andi Kleen @ 2011-01-25 20:47 UTC (permalink / raw)
To: Peter Zijlstra, linux-kernel, linux-mm
> Its nesting hundreds of spinlocks (255+) make the preempt debug code
> unhappy, it also causes fun latencies when you do start KVM/GRU
> Although arguably that's the least convincing reason to do all this its
> the one that got me to actually compose this series -- I really should
> write a new leader..
Least Convincing is a good description...
Tuning operations which only happen once is not very interesting,
especially if it affects fast paths.
> Making all this preemptible also allows making the whole mmu_gather
> thing preemptible, which is something we've wanted to do for a long
> while, it also allows XPMEM or whatever that thing was called (Andrea
> knows) and of course, it moves a part of -rt upstream.
I thought the reason for the preempt off inside the mmu gather region was
to stay on the same CPU for local/remote flushes. How would it change that?
> If we decide to keep patch 24, it also simplifies the truncate path
> quite a bit.
That sounds like a good thing. Making truncate simpler is always good.
-Andi
^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: [PATCH 00/25] mm: Preemptibility -v7
2011-01-25 20:47 ` Andi Kleen
@ 2011-01-25 21:09 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-25 21:09 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, linux-mm, Hugh Dickins, benh
On Tue, 2011-01-25 at 12:47 -0800, Andi Kleen wrote:
>
> I thought the reason for the preempt off inside the mmu gather region was
> to stay on the same CPU for local/remote flushes. How would it change that?
AFAIK it's been preempt-off solely because it was always inside a number
of spinlocks; I know both Hugh and BenH worked on making it preemptible
long before I started this.
I remember Hugh and Nick talking about this at OLS '06 or '07, can't
really remember which.
As to local/remote flushes, there is no real way of saying where the
pages came from, due to on-demand paging and the scheduler never having
had a notion of a home node. Therefore freeing them wouldn't be any more
or less local if it were exposed to the same migration that the
allocation was.
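To make that concrete: after this series the caller side looks roughly
like the sketch below (names taken from the patches in this thread, not
verbatim kernel code). The gather state lives on the caller's stack, so
if the task migrates mid-unmap the batched pages simply travel with it
and are flushed and freed wherever it ends up:

	struct mmu_gather tlb;

	tlb_gather_mmu(&tlb, mm, 0);	/* 0 == not a full-mm teardown */
	tlb_start_vma(&tlb, vma);
	/* for each pte being torn down: */
	tlb_remove_tlb_entry(&tlb, ptep, addr);
	tlb_remove_page(&tlb, page);	/* may now sleep and flush the batch */
	tlb_end_vma(&tlb, vma);
	tlb_finish_mmu(&tlb, start, end);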
^ permalink raw reply [flat|nested] 128+ messages in thread
* [RFC][PATCH 26/25] mm, arch: Convert ia64, arm, sh to generic tlb
2011-01-25 17:31 ` Peter Zijlstra
@ 2011-01-26 13:13 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-26 13:13 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Avi Kivity, Thomas Gleixner, Rik van Riel, Ingo Molnar, akpm,
Linus Torvalds, linux-kernel, linux-arch, linux-mm,
Benjamin Herrenschmidt, David Miller, Hugh Dickins, Mel Gorman,
Nick Piggin, Paul McKenney, Yanmin Zhang, Russell King,
Tony Luck, Paul Mundt
I was looking at what makes the various arches use the !generic tlb
gather bits and found that for most of them the reason is the same, so I
looked at what it would take to support those cases in the generic code
as well.
The below is a quick unification; I still need to run through all the
details, since they all appear to do the same thing, just slightly
differently ;-)
I've compile-tested it on ia64 and arm (I seem to have lost my SH
compiler again).
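In short, an architecture selects HAVE_MMU_GATHER_RANGE and provides a
tlb_flush() that consumes the recorded range; the generic header then
supplies the range tracking and the vma hooks. A rough sketch of the
asm/tlb.h side (the arch also grows a "select HAVE_MMU_GATHER_RANGE if
MMU" in its Kconfig; foo_flush_tlb_range() is a made-up primitive -- see
the real conversions below for the actual details):

/* arch/foo/include/asm/tlb.h -- illustrative sketch only */
static inline void tlb_flush(struct mmu_gather *tlb);

#include <asm-generic/tlb.h>

static inline void tlb_flush(struct mmu_gather *tlb)
{
	if (tlb->fullmm)
		flush_tlb_mm(tlb->mm);
	else if (tlb->end)
		foo_flush_tlb_range(tlb->mm, tlb->start, tlb->end);
}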
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/Kconfig | 3
arch/arm/Kconfig | 1
arch/arm/include/asm/tlb.h | 90 --------------------
arch/arm/include/asm/tlbflush.h | 5 -
arch/ia64/Kconfig | 1
arch/ia64/include/asm/tlb.h | 178 +---------------------------------------
arch/sh/Kconfig | 1
arch/sh/include/asm/tlb.h | 99 +---------------------
include/asm-generic/tlb.h | 35 +++++++
9 files changed, 59 insertions(+), 354 deletions(-)
Index: linux-2.6/arch/Kconfig
===================================================================
--- linux-2.6.orig/arch/Kconfig
+++ linux-2.6/arch/Kconfig
@@ -181,4 +181,7 @@ config HAVE_ARCH_MUTEX_CPU_RELAX
config HAVE_RCU_TABLE_FREE
bool
+config HAVE_MMU_GATHER_RANGE
+ bool
+
source "kernel/gcov/Kconfig"
Index: linux-2.6/arch/arm/Kconfig
===================================================================
--- linux-2.6.orig/arch/arm/Kconfig
+++ linux-2.6/arch/arm/Kconfig
@@ -28,6 +28,7 @@ config ARM
select HAVE_C_RECORDMCOUNT
select HAVE_GENERIC_HARDIRQS
select HAVE_SPARSE_IRQ
+ select HAVE_MMU_GATHER_RANGE if MMU
help
The ARM series is a line of low-power-consumption RISC chip designs
licensed by ARM Ltd and targeted at embedded applications and
Index: linux-2.6/arch/arm/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/arm/include/asm/tlb.h
+++ linux-2.6/arch/arm/include/asm/tlb.h
@@ -23,96 +23,14 @@
#ifndef CONFIG_MMU
#include <linux/pagemap.h>
-#include <asm-generic/tlb.h>
#else /* !CONFIG_MMU */
-#include <asm/pgalloc.h>
-
-/*
- * TLB handling. This allows us to remove pages from the page
- * tables, and efficiently handle the TLB issues.
- */
-struct mmu_gather {
- struct mm_struct *mm;
- unsigned int fullmm;
- unsigned long range_start;
- unsigned long range_end;
-};
-
-static inline void
-tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
-{
- tlb->mm = mm;
- tlb->fullmm = full_mm_flush;
-}
-
-static inline void
-tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
-{
- if (tlb->fullmm)
- flush_tlb_mm(tlb->mm);
-
- /* keep the page table cache within bounds */
- check_pgt_cache();
-}
-
-/*
- * Memorize the range for the TLB flush.
- */
-static inline void
-tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep, unsigned long addr)
-{
- if (!tlb->fullmm) {
- if (addr < tlb->range_start)
- tlb->range_start = addr;
- if (addr + PAGE_SIZE > tlb->range_end)
- tlb->range_end = addr + PAGE_SIZE;
- }
-}
-
-/*
- * In the case of tlb vma handling, we can optimise these away in the
- * case where we're doing a full MM flush. When we're doing a munmap,
- * the vmas are adjusted to only cover the region to be torn down.
- */
-static inline void
-tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
-{
- if (!tlb->fullmm) {
- flush_cache_range(vma, vma->vm_start, vma->vm_end);
- tlb->range_start = TASK_SIZE;
- tlb->range_end = 0;
- }
-}
-
-static inline void
-tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
-{
- if (!tlb->fullmm && tlb->range_end > 0)
- flush_tlb_range(vma, tlb->range_start, tlb->range_end);
-}
-
-static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
- free_page_and_swap_cache(page);
- return 0;
-}
-
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
- might_sleep();
- __tlb_remove_page(tlb, page);
-}
-
-static inline void tlb_flush_mmu(struct mmu_gather *tlb)
-{
-}
+#define __pte_free_tlb(tlb, ptep, addr) pte_free((tlb)->mm, ptep)
+#define __pmd_free_tlb(tlb, pmdp, addr) pmd_free((tlb)->mm, pmdp)
-#define pte_free_tlb(tlb, ptep, addr) pte_free((tlb)->mm, ptep)
-#define pmd_free_tlb(tlb, pmdp, addr) pmd_free((tlb)->mm, pmdp)
+#endif /* CONFIG_MMU */
-#define tlb_migrate_finish(mm) do { } while (0)
+#include <asm-generic/tlb.h>
-#endif /* CONFIG_MMU */
#endif
Index: linux-2.6/arch/ia64/Kconfig
===================================================================
--- linux-2.6.orig/arch/ia64/Kconfig
+++ linux-2.6/arch/ia64/Kconfig
@@ -25,6 +25,7 @@ config IA64
select HAVE_GENERIC_HARDIRQS
select GENERIC_IRQ_PROBE
select GENERIC_PENDING_IRQ if SMP
+ select HAVE_MMU_GATHER_RANGE
select IRQ_PER_CPU
default y
help
Index: linux-2.6/arch/ia64/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/ia64/include/asm/tlb.h
+++ linux-2.6/arch/ia64/include/asm/tlb.h
@@ -46,30 +46,6 @@
#include <asm/tlbflush.h>
#include <asm/machvec.h>
-#ifdef CONFIG_SMP
-# define tlb_fast_mode(tlb) ((tlb)->nr == ~0U)
-#else
-# define tlb_fast_mode(tlb) (1)
-#endif
-
-/*
- * If we can't allocate a page to make a big batch of page pointers
- * to work on, then just handle a few from the on-stack structure.
- */
-#define IA64_GATHER_BUNDLE 8
-
-struct mmu_gather {
- struct mm_struct *mm;
- unsigned int nr; /* == ~0U => fast mode */
- unsigned int max;
- unsigned char fullmm; /* non-zero means full mm flush */
- unsigned char need_flush; /* really unmapped some PTEs? */
- unsigned long start_addr;
- unsigned long end_addr;
- struct page **pages;
- struct page *local[IA64_GATHER_BUNDLE];
-};
-
struct ia64_tr_entry {
u64 ifa;
u64 itir;
@@ -96,6 +72,12 @@ extern struct ia64_tr_entry *ia64_idtrs[
#define RR_RID_MASK 0x00000000ffffff00L
#define RR_TO_RID(val) ((val >> 8) & 0xffffff)
+static inline void tlb_flush(struct mmu_gather *tlb);
+
+#define tlb_migrate_finish(mm) platform_tlb_migrate_finish(mm)
+
+#include <asm-generic/tlb.h>
+
/*
* Flush the TLB for address range START to END and, if not in fast mode, release the
* freed pages that where gathered up to this point.
@@ -103,12 +85,6 @@ extern struct ia64_tr_entry *ia64_idtrs[
static inline void
ia64_tlb_flush_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
{
- unsigned int nr;
-
- if (!tlb->need_flush)
- return;
- tlb->need_flush = 0;
-
if (tlb->fullmm) {
/*
* Tearing down the entire address space. This happens both as a result
@@ -138,149 +114,11 @@ ia64_tlb_flush_mmu (struct mmu_gather *t
/* now flush the virt. page-table area mapping the address range: */
flush_tlb_range(&vma, ia64_thash(start), ia64_thash(end));
}
-
- /* lastly, release the freed pages */
- nr = tlb->nr;
- if (!tlb_fast_mode(tlb)) {
- unsigned long i;
- tlb->nr = 0;
- tlb->start_addr = ~0UL;
- for (i = 0; i < nr; ++i)
- free_page_and_swap_cache(tlb->pages[i]);
- }
-}
-
-static inline void __tlb_alloc_page(struct mmu_gather *tlb)
-{
- unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
-
- if (addr) {
- tlb->pages = (void *)addr;
- tlb->max = PAGE_SIZE / sizeof(void *);
- }
-}
-
-
-static inline void
-tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
-{
- tlb->mm = mm;
- tlb->max = ARRAY_SIZE(tlb->local);
- tlb->pages = tlb->local;
- /*
- * Use fast mode if only 1 CPU is online.
- *
- * It would be tempting to turn on fast-mode for full_mm_flush as well. But this
- * doesn't work because of speculative accesses and software prefetching: the page
- * table of "mm" may (and usually is) the currently active page table and even
- * though the kernel won't do any user-space accesses during the TLB shoot down, a
- * compiler might use speculation or lfetch.fault on what happens to be a valid
- * user-space address. This in turn could trigger a TLB miss fault (or a VHPT
- * walk) and re-insert a TLB entry we just removed. Slow mode avoids such
- * problems. (We could make fast-mode work by switching the current task to a
- * different "mm" during the shootdown.) --davidm 08/02/2002
- */
- tlb->nr = (num_online_cpus() == 1) ? ~0U : 0;
- tlb->fullmm = full_mm_flush;
- tlb->start_addr = ~0UL;
}
-/*
- * Called at the end of the shootdown operation to free up any resources that were
- * collected.
- */
-static inline void
-tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
+static inline void tlb_flush(struct mmu_gather *tlb)
{
- /*
- * Note: tlb->nr may be 0 at this point, so we can't rely on tlb->start_addr and
- * tlb->end_addr.
- */
- ia64_tlb_flush_mmu(tlb, start, end);
-
- /* keep the page table cache within bounds */
- check_pgt_cache();
-
- if (tlb->pages != tlb->local)
- free_pages((unsigned long)tlb->pages, 0);
+ ia64_tlb_flush_mmu(tlb, tlb->start, tlb->end);
}
-/*
- * Logically, this routine frees PAGE. On MP machines, the actual freeing of the page
- * must be delayed until after the TLB has been flushed (see comments at the beginning of
- * this file).
- */
-static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
- tlb->need_flush = 1;
-
- if (tlb_fast_mode(tlb)) {
- free_page_and_swap_cache(page);
- return 0;
- }
-
- if (!tlb->nr && tlb->pages == tlb->local)
- __tlb_alloc_page(tlb);
-
- tlb->pages[tlb->nr++] = page;
- if (tlb->nr >= tlb->max)
- return 1;
-
- return 0;
-}
-
-static inline void tlb_flush_mmu(struct mmu_gather *tlb)
-{
- ia64_tlb_flush_mmu(tlb, tlb->start_addr, tlb->end_addr);
-}
-
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
- might_sleep();
-
- if (__tlb_remove_page(tlb, page))
- tlb_flush_mmu(tlb);
-}
-
-/*
- * Remove TLB entry for PTE mapped at virtual address ADDRESS. This is called for any
- * PTE, not just those pointing to (normal) physical memory.
- */
-static inline void
-__tlb_remove_tlb_entry (struct mmu_gather *tlb, pte_t *ptep, unsigned long address)
-{
- if (tlb->start_addr == ~0UL)
- tlb->start_addr = address;
- tlb->end_addr = address + PAGE_SIZE;
-}
-
-#define tlb_migrate_finish(mm) platform_tlb_migrate_finish(mm)
-
-#define tlb_start_vma(tlb, vma) do { } while (0)
-#define tlb_end_vma(tlb, vma) do { } while (0)
-
-#define tlb_remove_tlb_entry(tlb, ptep, addr) \
-do { \
- tlb->need_flush = 1; \
- __tlb_remove_tlb_entry(tlb, ptep, addr); \
-} while (0)
-
-#define pte_free_tlb(tlb, ptep, address) \
-do { \
- tlb->need_flush = 1; \
- __pte_free_tlb(tlb, ptep, address); \
-} while (0)
-
-#define pmd_free_tlb(tlb, ptep, address) \
-do { \
- tlb->need_flush = 1; \
- __pmd_free_tlb(tlb, ptep, address); \
-} while (0)
-
-#define pud_free_tlb(tlb, pudp, address) \
-do { \
- tlb->need_flush = 1; \
- __pud_free_tlb(tlb, pudp, address); \
-} while (0)
-
#endif /* _ASM_IA64_TLB_H */
Index: linux-2.6/arch/sh/Kconfig
===================================================================
--- linux-2.6.orig/arch/sh/Kconfig
+++ linux-2.6/arch/sh/Kconfig
@@ -22,6 +22,7 @@ config SUPERH
select HAVE_SPARSE_IRQ
select RTC_LIB
select GENERIC_ATOMIC64
+ select HAVE_MMU_GATHER_RANGE if MMU
# Support the deprecated APIs until MFD and GPIOLIB catch up.
select GENERIC_HARDIRQS_NO_DEPRECATED if !MFD_SUPPORT && !GPIOLIB
help
Index: linux-2.6/arch/sh/include/asm/tlb.h
===================================================================
--- linux-2.6.orig/arch/sh/include/asm/tlb.h
+++ linux-2.6/arch/sh/include/asm/tlb.h
@@ -9,101 +9,11 @@
#include <linux/pagemap.h>
#ifdef CONFIG_MMU
-#include <asm/pgalloc.h>
-#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
-/*
- * TLB handling. This allows us to remove pages from the page
- * tables, and efficiently handle the TLB issues.
- */
-struct mmu_gather {
- struct mm_struct *mm;
- unsigned int fullmm;
- unsigned long start, end;
-};
-
-static inline void init_tlb_gather(struct mmu_gather *tlb)
-{
- tlb->start = TASK_SIZE;
- tlb->end = 0;
-
- if (tlb->fullmm) {
- tlb->start = 0;
- tlb->end = TASK_SIZE;
- }
-}
-
-static inline void
-tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
-{
- tlb->mm = mm;
- tlb->fullmm = full_mm_flush;
-
- init_tlb_gather(tlb);
-}
-
-static inline void
-tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
-{
- if (tlb->fullmm)
- flush_tlb_mm(tlb->mm);
-
- /* keep the page table cache within bounds */
- check_pgt_cache();
-}
-
-static inline void
-tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep, unsigned long address)
-{
- if (tlb->start > address)
- tlb->start = address;
- if (tlb->end < address + PAGE_SIZE)
- tlb->end = address + PAGE_SIZE;
-}
-
-/*
- * In the case of tlb vma handling, we can optimise these away in the
- * case where we're doing a full MM flush. When we're doing a munmap,
- * the vmas are adjusted to only cover the region to be torn down.
- */
-static inline void
-tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
-{
- if (!tlb->fullmm)
- flush_cache_range(vma, vma->vm_start, vma->vm_end);
-}
-
-static inline void
-tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
-{
- if (!tlb->fullmm && tlb->end) {
- flush_tlb_range(vma, tlb->start, tlb->end);
- init_tlb_gather(tlb);
- }
-}
-
-static inline void tlb_flush_mmu(struct mmu_gather *tlb)
-{
-}
-
-static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
- free_page_and_swap_cache(page);
- return 0;
-}
-
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
- might_sleep();
- __tlb_remove_page(tlb, page);
-}
-
-#define pte_free_tlb(tlb, ptep, addr) pte_free((tlb)->mm, ptep)
-#define pmd_free_tlb(tlb, pmdp, addr) pmd_free((tlb)->mm, pmdp)
-#define pud_free_tlb(tlb, pudp, addr) pud_free((tlb)->mm, pudp)
-
-#define tlb_migrate_finish(mm) do { } while (0)
+#define __pte_free_tlb(tlb, ptep, addr) pte_free((tlb)->mm, ptep)
+#define __pmd_free_tlb(tlb, pmdp, addr) pmd_free((tlb)->mm, pmdp)
+#define __pud_free_tlb(tlb, pudp, addr) pud_free((tlb)->mm, pudp)
#if defined(CONFIG_CPU_SH4) || defined(CONFIG_SUPERH64)
extern void tlb_wire_entry(struct vm_area_struct *, unsigned long, pte_t);
@@ -128,8 +38,9 @@ static inline void tlb_unwire_entry(void
#define __tlb_remove_tlb_entry(tlb, pte, address) do { } while (0)
#define tlb_flush(tlb) do { } while (0)
+#endif /* CONFIG_MMU */
+
#include <asm-generic/tlb.h>
-#endif /* CONFIG_MMU */
#endif /* __ASSEMBLY__ */
#endif /* __ASM_SH_TLB_H */
Index: linux-2.6/include/asm-generic/tlb.h
===================================================================
--- linux-2.6.orig/include/asm-generic/tlb.h
+++ linux-2.6/include/asm-generic/tlb.h
@@ -79,6 +79,10 @@ struct mmu_gather {
fast_mode : 1; /* No batching */
unsigned int fullmm; /* Flush full mm */
+#ifdef CONFIG_HAVE_MMU_GATHER_RANGE
+ unsigned long start, end;
+#endif
+
struct mmu_gather_batch *active;
struct mmu_gather_batch local;
struct page *__pages[8];
@@ -222,6 +226,35 @@ static inline void tlb_remove_page(struc
tlb_flush_mmu(tlb);
}
+#ifdef CONFIG_HAVE_MMU_GATHER_RANGE
+static inline void
+__tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep, unsigned long address)
+{
+ if (!tlb->fullmm) {
+ tlb->start = min(tlb->start, address);
+ address += PAGE_SIZE;
+ tlb->end = max(tlb->end, address);
+ }
+}
+
+static inline void
+tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+ if (!tlb->fullmm) {
+ flush_cache_range(vma, vma->vm_start, vma->vm_end);
+ tlb->start = TASK_SIZE;
+ tlb->end = 0;
+ }
+}
+
+static inline void
+tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+ if (!tlb->fullmm && tlb->end)
+ flush_tlb_range(vma, tlb->start, tlb->end);
+}
+#endif
+
/**
* tlb_remove_tlb_entry - remember a pte unmapping for later tlb invalidation.
*
@@ -255,6 +288,8 @@ static inline void tlb_remove_page(struc
__pmd_free_tlb(tlb, pmdp, address); \
} while (0)
+#ifndef tlb_migrate_finish
#define tlb_migrate_finish(mm) do {} while (0)
+#endif
#endif /* _ASM_GENERIC__TLB_H */
Index: linux-2.6/arch/arm/include/asm/tlbflush.h
===================================================================
--- linux-2.6.orig/arch/arm/include/asm/tlbflush.h
+++ linux-2.6/arch/arm/include/asm/tlbflush.h
@@ -10,12 +10,9 @@
#ifndef _ASMARM_TLBFLUSH_H
#define _ASMARM_TLBFLUSH_H
-
-#ifndef CONFIG_MMU
-
#define tlb_flush(tlb) ((void) tlb)
-#else /* CONFIG_MMU */
+#ifdef CONFIG_MMU
#include <asm/glue.h>
^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: [RFC][PATCH 26/25] mm, arch: Convert ia64, arm, sh to generic tlb
2011-01-26 13:13 ` Peter Zijlstra
(?)
@ 2011-01-26 19:19 ` Sam Ravnborg
2011-01-26 19:30 ` Peter Zijlstra
-1 siblings, 1 reply; 128+ messages in thread
From: Sam Ravnborg @ 2011-01-26 19:19 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds, linux-kernel, linux-arch
> Index: linux-2.6/include/asm-generic/tlb.h
> ===================================================================
> --- linux-2.6.orig/include/asm-generic/tlb.h
> +++ linux-2.6/include/asm-generic/tlb.h
> @@ -79,6 +79,10 @@ struct mmu_gather {
> fast_mode : 1; /* No batching */
> unsigned int fullmm; /* Flush full mm */
>
> +#ifdef CONFIG_HAVE_MMU_GATHER_RANGE
> + unsigned long start, end;
> +#endif
> +
> struct mmu_gather_batch *active;
> struct mmu_gather_batch local;
> struct page *__pages[8];
Not related to this patch per se.
But since you are touching this header anyway, could you fix the
following comment:
/* Users of the generic TLB shootdown code must declare this storage space. */
DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
If I understand this correctly, it should read:
/*
* Users of the generic TLB shootdown
* must define mmu_gathers like this:
* DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
*/
The main difference is "declare" => "define".
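A minimal illustration of that distinction, with the usual percpu
macros (sketch only):

/* In a header -- declaration: allocates no storage, merely promises
 * that the per-cpu variable is defined somewhere: */
DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);

/* In exactly one .c file per user of the generic code -- definition:
 * this is what actually instantiates the per-cpu storage: */
DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);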
If you have already updated this in your series, then please ignore this mail.
Sam
^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: [RFC][PATCH 26/25] mm, arch: Convert ia64, arm, sh to generic tlb
2011-01-26 19:19 ` Sam Ravnborg
@ 2011-01-26 19:30 ` Peter Zijlstra
2011-01-26 20:03 ` Sam Ravnborg
0 siblings, 1 reply; 128+ messages in thread
From: Peter Zijlstra @ 2011-01-26 19:30 UTC (permalink / raw)
To: Sam Ravnborg
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds, linux-kernel, linux-arch
On Wed, 2011-01-26 at 20:19 +0100, Sam Ravnborg wrote:
>
> /* Users of the generic TLB shootdown code must declare this storage space. */
> DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
>
> If I understand this correct then this should read:
>
> /*
> * Users of the generic TLB shootdown
> * must define mmu_gathers like this:
> * DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
> */
>
> The main difference is "declare" => "define".
>
> If you already updated this in you serie then please ignore this mail.
This series rewrites mmu_gather to remove this per-cpu data :-), see
patch #10 which removes the DEFINE_PER_CPU()s after all users are dead.
^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: [RFC][PATCH 26/25] mm, arch: Convert ia64, arm, sh to generic tlb
2011-01-26 19:30 ` Peter Zijlstra
@ 2011-01-26 20:03 ` Sam Ravnborg
0 siblings, 0 replies; 128+ messages in thread
From: Sam Ravnborg @ 2011-01-26 20:03 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds, linux-kernel, linux-arch
On Wed, Jan 26, 2011 at 08:30:18PM +0100, Peter Zijlstra wrote:
> On Wed, 2011-01-26 at 20:19 +0100, Sam Ravnborg wrote:
> >
> > /* Users of the generic TLB shootdown code must declare this storage space. */
> > DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
> >
> > If I understand this correct then this should read:
> >
> > /*
> > * Users of the generic TLB shootdown
> > * must define mmu_gathers like this:
> > * DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
> > */
> >
> > The main difference is "declare" => "define".
> >
> > If you already updated this in you serie then please ignore this mail.
>
> This series rewrites mmu_gather to remove this per-cpu data :-), see
> patch #10 which removes the DEFINE_PER_CPU()s after all users are dead.
Thanks again.
I better read the patch descriptions before asking next time.
Sam
^ permalink raw reply [flat|nested] 128+ messages in thread
* Re: [PATCH 00/25] mm: Preemptibility -v7
2011-01-25 17:31 ` Peter Zijlstra
@ 2011-02-17 12:06 ` Peter Zijlstra
-1 siblings, 0 replies; 128+ messages in thread
From: Peter Zijlstra @ 2011-02-17 12:06 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Avi Kivity, Thomas Gleixner, Rik van Riel, Ingo Molnar, akpm,
Linus Torvalds, linux-kernel, linux-arch, linux-mm,
Benjamin Herrenschmidt, David Miller, Hugh Dickins, Mel Gorman,
Nick Piggin, Paul McKenney, Yanmin Zhang
On Tue, 2011-01-25 at 18:31 +0100, Peter Zijlstra wrote:
> - figure out what to do with sparc's tlb_batch lack of ->fullmm
David, how does this look?
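For context, the generic unmap path already knows whether this is a full-mm
teardown and passes that down; condensed into a hypothetical helper (the
helper name and structure are illustrative, not part of the patch):

	/* sketch: how fullmm reaches the sparc hooks from the generic side */
	static void zap_one_pte(struct mmu_gather *tlb, struct mm_struct *mm,
				unsigned long addr, pte_t *pte)
	{
		pte_t ptent = *pte;

		if (pte_present(ptent)) {
			/* present ptes are torn down via ptep_get_and_clear_full() */
			ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
		} else {
			/*
			 * !present ptes go through pte_clear_not_present_full(),
			 * which with this patch becomes __set_pte_at(..., fullmm)
			 * on sparc, so tlb_batch_add() no longer needs tb->fullmm.
			 */
			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
		}
	}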
---
Index: linux-2.6/arch/sparc/mm/tlb.c
===================================================================
--- linux-2.6.orig/arch/sparc/mm/tlb.c
+++ linux-2.6/arch/sparc/mm/tlb.c
@@ -43,7 +43,8 @@ void flush_tlb_pending(void)
put_cpu_var(tlb_batch);
}
-void tlb_batch_add(struct mm_struct *mm, unsigned long vaddr, pte_t *ptep, pte_t orig)
+void tlb_batch_add(struct mm_struct *mm, unsigned long vaddr,
+ pte_t *ptep, pte_t orig, int fullmm)
{
struct tlb_batch *tb = &get_cpu_var(tlb_batch);
unsigned long nr;
@@ -77,12 +78,10 @@ void tlb_batch_add(struct mm_struct *mm,
no_cache_flush:
- /*
- if (tb->fullmm) {
+ if (fullmm) {
put_cpu_var(tlb_batch);
return;
}
- */
nr = tb->tlb_nr;
Index: linux-2.6/arch/sparc/include/asm/pgtable_64.h
===================================================================
--- linux-2.6.orig/arch/sparc/include/asm/pgtable_64.h
+++ linux-2.6/arch/sparc/include/asm/pgtable_64.h
@@ -655,9 +655,11 @@ static inline int pte_special(pte_t pte)
#define pte_unmap(pte) do { } while (0)
/* Actual page table PTE updates. */
-extern void tlb_batch_add(struct mm_struct *mm, unsigned long vaddr, pte_t *ptep, pte_t orig);
+extern void tlb_batch_add(struct mm_struct *mm, unsigned long vaddr,
+ pte_t *ptep, pte_t orig, int fullmm);
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte)
+static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte, int fullmm)
{
pte_t orig = *ptep;
@@ -670,12 +672,19 @@ static inline void set_pte_at(struct mm_
* and SUN4V pte layout, so this inline test is fine.
*/
if (likely(mm != &init_mm) && (pte_val(orig) & _PAGE_VALID))
- tlb_batch_add(mm, addr, ptep, orig);
+ tlb_batch_add(mm, addr, ptep, orig, fullmm);
}
+#define set_pte_at(mm,addr,ptep,pte) \
+ __set_pte_at((mm), (addr), (ptep), (pte), 0)
+
#define pte_clear(mm,addr,ptep) \
set_pte_at((mm), (addr), (ptep), __pte(0UL))
+#define __HAVE_ARCH_PTE_CLEAR_NOT_PRESENT_FULL
+#define pte_clear_not_present_full(mm,addr,ptep,fullmm) \
+ __set_pte_at((mm), (addr), (ptep), __pte(0UL), (fullmm))
+
#ifdef DCACHE_ALIASING_POSSIBLE
#define __HAVE_ARCH_MOVE_PTE
#define move_pte(pte, prot, old_addr, new_addr) \
^ permalink raw reply [flat|nested] 128+ messages in thread