* [PATCH] THP: mremap support and TLB optimization #2
@ 2011-07-28 14:26 Andrea Arcangeli
  2011-08-04 15:32 ` Andrea Arcangeli
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Andrea Arcangeli @ 2011-07-28 14:26 UTC (permalink / raw)
  To: linux-mm; +Cc: Mel Gorman, Johannes Weiner, Rik van Riel, Hugh Dickins

Hello,

this is the latest version of the mremap THP native implementation
plus optimizations.

So the first question is: am I right that the "- 1" I am removing
below was buggy? It's harmless because these old_end/next are page
aligned, but if PAGE_SIZE were 1 it would break, right? It's really
confusing to read even if it happens to work. Please let me know,
because that "- 1" is ironically the thing I'm least comfortable
about in this patch.

This fixes a couple of bugs that were still present in the previous
post.

Here are also some benchmarks with a proggy like this:

===
#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

#define SIZE (5UL*1024*1024*1024)

int main()
{
        static struct timeval oldstamp, newstamp;
	long diffsec;
	char *p, *p2, *p3, *p4;
	if (posix_memalign((void **)&p, 2*1024*1024, SIZE))
		perror("memalign"), exit(1);
	if (posix_memalign((void **)&p2, 2*1024*1024, SIZE))
		perror("memalign"), exit(1);
	if (posix_memalign((void **)&p3, 2*1024*1024, 4096))
		perror("memalign"), exit(1);

	memset(p, 0xff, SIZE);
	memset(p2, 0xff, SIZE);
	memset(p3, 0x77, 4096);
	gettimeofday(&oldstamp, NULL);
	p4 = mremap(p, SIZE, SIZE, MREMAP_FIXED|MREMAP_MAYMOVE, p3);
	gettimeofday(&newstamp, NULL);
	diffsec = newstamp.tv_sec - oldstamp.tv_sec;
	diffsec = newstamp.tv_usec - oldstamp.tv_usec + 1000000 * diffsec;
	printf("usec %ld\n", diffsec);
	if (p4 == MAP_FAILED || p4 != p3)
	//if (p4 == MAP_FAILED)
		perror("mremap"), exit(1);
	if (memcmp(p4, p2, SIZE))
		printf("mremap bug\n"), exit(1);
	printf("ok\n");

	return 0;
}
===
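
(The numbers below are from perf; something along the lines of
"perf stat -r 3 -e dTLB-loads,dTLB-load-misses,dTLB-stores,dTLB-store-misses,iTLB-loads,iTLB-load-misses ./largepage13"
reproduces this kind of output, with the event names as reported by perf list.)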

THP on

 Performance counter stats for './largepage13' (3 runs):

          69195836 dTLB-loads                 ( +-   3.546% )  (scaled from 50.30%)
             60708 dTLB-load-misses           ( +-  11.776% )  (scaled from 52.62%)
         676266476 dTLB-stores                ( +-   5.654% )  (scaled from 69.54%)
             29856 dTLB-store-misses          ( +-   4.081% )  (scaled from 89.22%)
        1055848782 iTLB-loads                 ( +-   4.526% )  (scaled from 80.18%)
              8689 iTLB-load-misses           ( +-   2.987% )  (scaled from 58.20%)

        7.314454164  seconds time elapsed   ( +-   0.023% )

THP off

 Performance counter stats for './largepage13' (3 runs):

        1967379311 dTLB-loads                 ( +-   0.506% )  (scaled from 60.59%)
           9238687 dTLB-load-misses           ( +-  22.547% )  (scaled from 61.87%)
        2014239444 dTLB-stores                ( +-   0.692% )  (scaled from 60.40%)
           3312335 dTLB-store-misses          ( +-   7.304% )  (scaled from 67.60%)
        6764372065 iTLB-loads                 ( +-   0.925% )  (scaled from 79.00%)
              8202 iTLB-load-misses           ( +-   0.475% )  (scaled from 70.55%)

        9.693655243  seconds time elapsed   ( +-   0.069% )

grep thp /proc/vmstat 
thp_fault_alloc 35849
thp_fault_fallback 0
thp_collapse_alloc 3
thp_collapse_alloc_failed 0
thp_split 0

thp_split 0 confirms no thp split despite plenty of hugepages allocated.

Measuring only the mremap time (so excluding the three long memsets
and the final memcmp that accesses 10GB of memory):

THP on

usec 14824
usec 14862
usec 14859

THP off

usec 256416
usec 255981
usec 255847

With an older kernel without the mremap optimizations (the patch
below optimizes the non-THP case too).

THP on

usec 392107
usec 390237
usec 404124

THP off

usec 444294
usec 445237
usec 445820

I guess a threaded program, which sends more IPIs on a large SMP
system, would show an even larger difference.

All debug options are off except DEBUG_VM to avoid skewing the
results.

The only limitation of the native 2M mremap (as exercised above) is
that both the source and destination addresses must be 2M aligned, or
the hugepmd can't be moved without a split.
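
As an aside, here is a sketch (untested, the helper name is made up,
not part of the patch) of how userland can get a 2M aligned block
without posix_memalign, by over-allocating an anonymous mapping and
trimming the unaligned head and the unused tail:

===
#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdint.h>
#include <stddef.h>

#define HPAGE_2M (2UL*1024*1024)

/*
 * Return a len-byte anonymous mapping whose start is 2M aligned.
 * len is assumed to be a multiple of the page size.
 */
static void *mmap_aligned_2m(size_t len)
{
	size_t padded = len + HPAGE_2M;
	char *raw = mmap(NULL, padded, PROT_READ|PROT_WRITE,
			 MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	uintptr_t start;

	if (raw == MAP_FAILED)
		return MAP_FAILED;
	start = ((uintptr_t)raw + HPAGE_2M - 1) & ~(HPAGE_2M - 1);
	/* drop the unaligned head */
	if (start > (uintptr_t)raw)
		munmap(raw, start - (uintptr_t)raw);
	/* drop whatever is left past start + len */
	if ((uintptr_t)raw + padded > start + len)
		munmap((void *)(start + len),
		       (uintptr_t)raw + padded - (start + len));
	return (void *)start;
}
===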

Patch follows:

===
Subject: thp: mremap support and TLB optimization

From: Andrea Arcangeli <aarcange@redhat.com>

This adds THP support to mremap (reducing the number of
split_huge_page calls).

This also replaces ptep_clear_flush with ptep_get_and_clear and relies
on a final flush_tlb_range, so a single TLB flush IPI is sent instead
of one IPI for each page.

It also removes a bogus (even if harmless) "- 1" subtraction in the
"next" calculation in move_page_tables().

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -22,6 +22,11 @@ extern int zap_huge_pmd(struct mmu_gathe
 extern int mincore_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 			unsigned long addr, unsigned long end,
 			unsigned char *vec);
+extern int move_huge_pmd(struct vm_area_struct *vma,
+			 struct vm_area_struct *new_vma,
+			 unsigned long old_addr,
+			 unsigned long new_addr, unsigned long old_end,
+			 pmd_t *old_pmd, pmd_t *new_pmd);
 extern int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 			unsigned long addr, pgprot_t newprot);
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1054,6 +1054,52 @@ int mincore_huge_pmd(struct vm_area_stru
 	return ret;
 }
 
+int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
+		  unsigned long old_addr,
+		  unsigned long new_addr, unsigned long old_end,
+		  pmd_t *old_pmd, pmd_t *new_pmd)
+{
+	int ret = 0;
+	pmd_t pmd;
+
+	struct mm_struct *mm = vma->vm_mm;
+
+	if ((old_addr & ~HPAGE_PMD_MASK) ||
+	    (new_addr & ~HPAGE_PMD_MASK) ||
+	    (old_addr + HPAGE_PMD_SIZE) > old_end ||
+	    new_vma->vm_flags & VM_NOHUGEPAGE)
+		goto out;
+
+	/*
+	 * The destination pmd shouldn't be established, free_pgtables()
+	 * should have released it.
+	 */
+	if (!pmd_none(*new_pmd)) {
+		WARN_ON(1);
+		VM_BUG_ON(pmd_trans_huge(*new_pmd));
+		goto out;
+	}
+
+	spin_lock(&mm->page_table_lock);
+	if (likely(pmd_trans_huge(*old_pmd))) {
+		if (pmd_trans_splitting(*old_pmd)) {
+			spin_unlock(&mm->page_table_lock);
+			wait_split_huge_page(vma->anon_vma, old_pmd);
+			ret = -1;
+		} else {
+			pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
+			VM_BUG_ON(!pmd_none(*new_pmd));
+			set_pmd_at(mm, new_addr, new_pmd, pmd);
+			spin_unlock(&mm->page_table_lock);
+			ret = 1;
+		}
+	} else
+		spin_unlock(&mm->page_table_lock);
+
+out:
+	return ret;
+}
+
 int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		unsigned long addr, pgprot_t newprot)
 {
diff --git a/mm/mremap.c b/mm/mremap.c
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -41,8 +41,7 @@ static pmd_t *get_old_pmd(struct mm_stru
 		return NULL;
 
 	pmd = pmd_offset(pud, addr);
-	split_huge_page_pmd(mm, pmd);
-	if (pmd_none_or_clear_bad(pmd))
+	if (pmd_none(*pmd))
 		return NULL;
 
 	return pmd;
@@ -65,8 +64,6 @@ static pmd_t *alloc_new_pmd(struct mm_st
 		return NULL;
 
 	VM_BUG_ON(pmd_trans_huge(*pmd));
-	if (pmd_none(*pmd) && __pte_alloc(mm, vma, pmd, addr))
-		return NULL;
 
 	return pmd;
 }
@@ -80,11 +77,7 @@ static void move_ptes(struct vm_area_str
 	struct mm_struct *mm = vma->vm_mm;
 	pte_t *old_pte, *new_pte, pte;
 	spinlock_t *old_ptl, *new_ptl;
-	unsigned long old_start;
 
-	old_start = old_addr;
-	mmu_notifier_invalidate_range_start(vma->vm_mm,
-					    old_start, old_end);
 	if (vma->vm_file) {
 		/*
 		 * Subtle point from Rajesh Venkatasubramanian: before
@@ -111,7 +104,7 @@ static void move_ptes(struct vm_area_str
 				   new_pte++, new_addr += PAGE_SIZE) {
 		if (pte_none(*old_pte))
 			continue;
-		pte = ptep_clear_flush(vma, old_addr, old_pte);
+		pte = ptep_get_and_clear(mm, old_addr, old_pte);
 		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
 		set_pte_at(mm, new_addr, new_pte, pte);
 	}
@@ -123,7 +116,6 @@ static void move_ptes(struct vm_area_str
 	pte_unmap_unlock(old_pte - 1, old_ptl);
 	if (mapping)
 		mutex_unlock(&mapping->i_mmap_mutex);
-	mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end);
 }
 
 #define LATENCY_LIMIT	(64 * PAGE_SIZE)
@@ -134,14 +126,17 @@ unsigned long move_page_tables(struct vm
 {
 	unsigned long extent, next, old_end;
 	pmd_t *old_pmd, *new_pmd;
+	bool need_flush = false;
 
 	old_end = old_addr + len;
 	flush_cache_range(vma, old_addr, old_end);
 
+	mmu_notifier_invalidate_range_start(vma->vm_mm, old_addr, old_end);
+
 	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
 		cond_resched();
 		next = (old_addr + PMD_SIZE) & PMD_MASK;
-		if (next - 1 > old_end)
+		if (next > old_end)
 			next = old_end;
 		extent = next - old_addr;
 		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
@@ -150,6 +145,23 @@ unsigned long move_page_tables(struct vm
 		new_pmd = alloc_new_pmd(vma->vm_mm, vma, new_addr);
 		if (!new_pmd)
 			break;
+		if (pmd_trans_huge(*old_pmd)) {
+			int err = 0;
+			if (extent == HPAGE_PMD_SIZE)
+				err = move_huge_pmd(vma, new_vma, old_addr,
+						    new_addr, old_end,
+						    old_pmd, new_pmd);
+			if (err > 0) {
+				need_flush = true;
+				continue;
+			} else if (!err)
+				split_huge_page_pmd(vma->vm_mm, old_pmd);
+			VM_BUG_ON(pmd_trans_huge(*old_pmd));
+		}
+		if (pmd_none(*new_pmd) && __pte_alloc(new_vma->vm_mm, new_vma,
+						      new_pmd,
+						      new_addr))
+			break;
 		next = (new_addr + PMD_SIZE) & PMD_MASK;
 		if (extent > next - new_addr)
 			extent = next - new_addr;
@@ -157,7 +169,12 @@ unsigned long move_page_tables(struct vm
 			extent = LATENCY_LIMIT;
 		move_ptes(vma, old_pmd, old_addr, old_addr + extent,
 				new_vma, new_pmd, new_addr);
+		need_flush = true;
 	}
+	if (likely(need_flush))
+		flush_tlb_range(vma, old_end-len, old_addr);
+
+	mmu_notifier_invalidate_range_end(vma->vm_mm, old_end-len, old_end);
 
 	return len + old_addr - old_end;	/* how much done */
 }


* Re: [PATCH] THP: mremap support and TLB optimization #2
  2011-07-28 14:26 [PATCH] THP: mremap support and TLB optimization #2 Andrea Arcangeli
@ 2011-08-04 15:32 ` Andrea Arcangeli
  2011-08-05 10:50 ` Johannes Weiner
  2011-08-05 15:25 ` Mel Gorman
  2 siblings, 0 replies; 9+ messages in thread
From: Andrea Arcangeli @ 2011-08-04 15:32 UTC (permalink / raw)
  To: linux-mm; +Cc: Mel Gorman, Johannes Weiner, Rik van Riel, Hugh Dickins

Hello everyone,

On Thu, Jul 28, 2011 at 04:26:31PM +0200, Andrea Arcangeli wrote:
> Hello,
> 
> this is the latest version of the mremap THP native implementation
> plus optimizations.
> 
> So the first question is: am I right that the "- 1" I am removing
> below was buggy? It's harmless because these old_end/next are page
> aligned, but if PAGE_SIZE were 1 it would break, right? It's really
> confusing to read even if it happens to work. Please let me know,
> because that "- 1" is ironically the thing I'm least comfortable
> about in this patch.
> diff --git a/mm/mremap.c b/mm/mremap.c

*snip*

> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -134,14 +126,17 @@ unsigned long move_page_tables(struct vm
>  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>  		cond_resched();
>  		next = (old_addr + PMD_SIZE) & PMD_MASK;
> -		if (next - 1 > old_end)
> +		if (next > old_end)
>  			next = old_end;
>  		extent = next - old_addr;
>  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);

Could somebody comment on the "- 1" removal? The rest is less
urgent. That change is independent of the mremap optimizations and
could be split off. For whatever reason it makes me uncomfortable
and I'd appreciate at least one positive or negative comment on it
from another pair of eyes ;). No need to review the rest for now if
you're busy.

Andrea


* Re: [PATCH] THP: mremap support and TLB optimization #2
  2011-07-28 14:26 [PATCH] THP: mremap support and TLB optimization #2 Andrea Arcangeli
  2011-08-04 15:32 ` Andrea Arcangeli
@ 2011-08-05 10:50 ` Johannes Weiner
  2011-08-05 15:52   ` Andrea Arcangeli
  2011-08-05 15:25 ` Mel Gorman
  2 siblings, 1 reply; 9+ messages in thread
From: Johannes Weiner @ 2011-08-05 10:50 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-mm, Mel Gorman, Rik van Riel, Hugh Dickins

On Thu, Jul 28, 2011 at 04:26:31PM +0200, Andrea Arcangeli wrote:
> Hello,
> 
> this is the latest version of the mremap THP native implementation
> plus optimizations.
> 
> So the first question is: am I right that the "- 1" I am removing
> below was buggy? It's harmless because these old_end/next are page
> aligned, but if PAGE_SIZE were 1 it would break, right? It's really
> confusing to read even if it happens to work. Please let me know,
> because that "- 1" is ironically the thing I'm least comfortable
> about in this patch.

> @@ -134,14 +126,17 @@ unsigned long move_page_tables(struct vm
>  {
>  	unsigned long extent, next, old_end;
>  	pmd_t *old_pmd, *new_pmd;
> +	bool need_flush = false;
>  
>  	old_end = old_addr + len;
>  	flush_cache_range(vma, old_addr, old_end);
>  
> +	mmu_notifier_invalidate_range_start(vma->vm_mm, old_addr, old_end);
> +
>  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>  		cond_resched();
>  		next = (old_addr + PMD_SIZE) & PMD_MASK;
> -		if (next - 1 > old_end)
> +		if (next > old_end)
>  			next = old_end;

If old_addr + PMD_SIZE overflows, next will be zero, thus smaller than
old_end and not fixed up.  This results in a bogus extent length here:

>  		extent = next - old_addr;

which I think can overrun old_addr + len if the extent should have
actually been smaller than the distance between new_addr and the next
PMD as well as that LATENCY_LIMIT to which extent is capped a few
lines below.  I haven't checked all the possibilities, though.

It could probably be

	if (next > old_end || next - 1 > old_end)
		next = old_end

to catch the theoretical next == old_end + 1 case, but PAGE_SIZE > 1
looks like a sensible assumption to me.
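
To make the overflow case concrete (hypothetical numbers, assuming a
64-bit unsigned long and the "- 1" already removed):

	old_addr = -PMD_SIZE;		/* last PMD of the address space, PMD aligned */
	old_end  = old_addr + PAGE_SIZE;	/* only one page left to move */
	next     = (old_addr + PMD_SIZE) & PMD_MASK;	/* wraps around to 0 */
	/* next > old_end is false, so next is not clamped back, and */
	extent   = next - old_addr;	/* == PMD_SIZE, way past old_end */

With the "- 1" in place, next - 1 is ULONG_MAX, the comparison
triggers and next gets clamped to old_end as intended.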


* Re: [PATCH] THP: mremap support and TLB optimization #2
  2011-07-28 14:26 [PATCH] THP: mremap support and TLB optimization #2 Andrea Arcangeli
  2011-08-04 15:32 ` Andrea Arcangeli
  2011-08-05 10:50 ` Johannes Weiner
@ 2011-08-05 15:25 ` Mel Gorman
  2011-08-05 16:21   ` Andrea Arcangeli
  2011-08-06 16:31   ` Andrea Arcangeli
  2 siblings, 2 replies; 9+ messages in thread
From: Mel Gorman @ 2011-08-05 15:25 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-mm, Johannes Weiner, Rik van Riel, Hugh Dickins

On Thu, Jul 28, 2011 at 04:26:31PM +0200, Andrea Arcangeli wrote:

*snip*

> +int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
> +		  unsigned long old_addr,
> +		  unsigned long new_addr, unsigned long old_end,
> +		  pmd_t *old_pmd, pmd_t *new_pmd)
> +{
> +	int ret = 0;
> +	pmd_t pmd;
> +
> +	struct mm_struct *mm = vma->vm_mm;
> +
> +	if ((old_addr & ~HPAGE_PMD_MASK) ||
> +	    (new_addr & ~HPAGE_PMD_MASK) ||

How could these conditions ever be true? We are here because it was
pmd_trans_huge. There should be no way this can be unaligned. If this
is paranoia, make it a BUG_ON.

> +	    (old_addr + HPAGE_PMD_SIZE) > old_end ||

Again, is this possible? The old addr was already huge.

> +	    new_vma->vm_flags & VM_NOHUGEPAGE)

This makes sense.

> +		goto out;
> +
> +	/*
> +	 * The destination pmd shouldn't be established, free_pgtables()
> +	 * should have released it.
> +	 */
> +	if (!pmd_none(*new_pmd)) {
> +		WARN_ON(1);
> +		VM_BUG_ON(pmd_trans_huge(*new_pmd));
> +		goto out;
> +	}
> +

Agreed that this should never happen. The mmap_sem is held for writing
and we are remapping to what should be empty space. It should not be
possible for a huge PMD to be established underneath us.

> +	spin_lock(&mm->page_table_lock);
> +	if (likely(pmd_trans_huge(*old_pmd))) {
> +		if (pmd_trans_splitting(*old_pmd)) {
> +			spin_unlock(&mm->page_table_lock);
> +			wait_split_huge_page(vma->anon_vma, old_pmd);
> +			ret = -1;
> +		} else {
> +			pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
> +			VM_BUG_ON(!pmd_none(*new_pmd));
> +			set_pmd_at(mm, new_addr, new_pmd, pmd);
> +			spin_unlock(&mm->page_table_lock);
> +			ret = 1;
> +		}
> +	} else
> +		spin_unlock(&mm->page_table_lock);
> +

The meaning of the return values of -1, 0, 1 with the caller doing

if (err)
...
else if (!err)
	...

is tricky to work out. split_huge_page only needs to be called if
returning 0. Would it be possible to have the split_huge_page called in
this function? The end of the function would then look like

return ret;

out_split:
split_huge_page_pmd()
return ret;

with either success or failure being returned instead of a tristate
which is easier to understand.
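
Roughly, the locked section and the tail could then look like this
(sketch only, untested, reusing the names from your patch; the early
bail-out checks would "goto out_split" as well):

	spin_lock(&mm->page_table_lock);
	if (unlikely(!pmd_trans_huge(*old_pmd))) {
		spin_unlock(&mm->page_table_lock);
		goto out_split;
	}
	if (pmd_trans_splitting(*old_pmd)) {
		spin_unlock(&mm->page_table_lock);
		wait_split_huge_page(vma->anon_vma, old_pmd);
		goto out_split;
	}
	pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
	VM_BUG_ON(!pmd_none(*new_pmd));
	set_pmd_at(mm, new_addr, new_pmd, pmd);
	spin_unlock(&mm->page_table_lock);
	return 1;

out_split:
	/* a no-op if the pmd is not (or no longer) huge by now */
	split_huge_page_pmd(mm, old_pmd);
	return 0;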

> +out:
> +	return ret;
> +}
> +
>  int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>  		unsigned long addr, pgprot_t newprot)
>  {
> diff --git a/mm/mremap.c b/mm/mremap.c
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -41,8 +41,7 @@ static pmd_t *get_old_pmd(struct mm_stru
>  		return NULL;
>  
>  	pmd = pmd_offset(pud, addr);
> -	split_huge_page_pmd(mm, pmd);
> -	if (pmd_none_or_clear_bad(pmd))
> +	if (pmd_none(*pmd))
>  		return NULL;
>  

Ok, this is changing to pmd_none because it could be a huge PMD and
pmd_none_or_clear_bad triggers on a huge PMD. Right?

>  	return pmd;
> @@ -65,8 +64,6 @@ static pmd_t *alloc_new_pmd(struct mm_st
>  		return NULL;
>  
>  	VM_BUG_ON(pmd_trans_huge(*pmd));
> -	if (pmd_none(*pmd) && __pte_alloc(mm, vma, pmd, addr))
> -		return NULL;
>  
>  	return pmd;
>  }
> @@ -80,11 +77,7 @@ static void move_ptes(struct vm_area_str
>  	struct mm_struct *mm = vma->vm_mm;
>  	pte_t *old_pte, *new_pte, pte;
>  	spinlock_t *old_ptl, *new_ptl;
> -	unsigned long old_start;
>  
> -	old_start = old_addr;
> -	mmu_notifier_invalidate_range_start(vma->vm_mm,
> -					    old_start, old_end);

The MMU notifier is now being called for a larger range. Previously it
would usually be ranges of 64 pages and now it looks like it happens
once for the entire range being remapped. This is not mentioned in
the leader. What are the consequences of having a large gap between
invalidate_start and invalidate_end? Would it be a big deal to call
the MMU notifier within move_huge_pmd()?

If it's safe to use larger ranges, it would be preferable to see it
in a separate patch or at the very least explained in the changelog.

>  	if (vma->vm_file) {
>  		/*
>  		 * Subtle point from Rajesh Venkatasubramanian: before
> @@ -111,7 +104,7 @@ static void move_ptes(struct vm_area_str
>  				   new_pte++, new_addr += PAGE_SIZE) {
>  		if (pte_none(*old_pte))
>  			continue;
> -		pte = ptep_clear_flush(vma, old_addr, old_pte);
> +		pte = ptep_get_and_clear(mm, old_addr, old_pte);

This looks like an unrelated optimisation. You hint at this in the
patch subject but it needs a separate patch or a better explanation in
the leader. If I'm reading this right, it looks like you are deferring
a TLB flush on a single page and calling one call later at the end of
move_page_tables. At a glance, that seems ok and would reduce IPIs
but I'm not thinking about it properly because I'm trying to think
about THP shenanigans :)

>  		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
>  		set_pte_at(mm, new_addr, new_pte, pte);
>  	}
> @@ -123,7 +116,6 @@ static void move_ptes(struct vm_area_str
>  	pte_unmap_unlock(old_pte - 1, old_ptl);
>  	if (mapping)
>  		mutex_unlock(&mapping->i_mmap_mutex);
> -	mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end);
>  }
>  
>  #define LATENCY_LIMIT	(64 * PAGE_SIZE)
> @@ -134,14 +126,17 @@ unsigned long move_page_tables(struct vm
>  {
>  	unsigned long extent, next, old_end;
>  	pmd_t *old_pmd, *new_pmd;
> +	bool need_flush = false;
>  
>  	old_end = old_addr + len;
>  	flush_cache_range(vma, old_addr, old_end);
>  
> +	mmu_notifier_invalidate_range_start(vma->vm_mm, old_addr, old_end);
> +
>  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>  		cond_resched();
>  		next = (old_addr + PMD_SIZE) & PMD_MASK;
> -		if (next - 1 > old_end)
> +		if (next > old_end)
>  			next = old_end;
>  		extent = next - old_addr;
>  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);

You asked if removing this "- 1" is correct. It's an overflow check for
a situation where old_addr + PMD_SIZE overflows. On what architecture
is it possible to call mremap() at the very top of the address space
or am I missing the point?

Otherwise I think the existing check is harmless if obscure. It's
reasonable to assume PAGE_SIZE will be > 1 and I'm not seeing why it is
required by the rest of the patch.

> @@ -150,6 +145,23 @@ unsigned long move_page_tables(struct vm
>  		new_pmd = alloc_new_pmd(vma->vm_mm, vma, new_addr);
>  		if (!new_pmd)
>  			break;
> +		if (pmd_trans_huge(*old_pmd)) {
> +			int err = 0;
> +			if (extent == HPAGE_PMD_SIZE)
> +				err = move_huge_pmd(vma, new_vma, old_addr,
> +						    new_addr, old_end,
> +						    old_pmd, new_pmd);
> +			if (err > 0) {
> +				need_flush = true;
> +				continue;
> +			} else if (!err)
> +				split_huge_page_pmd(vma->vm_mm, old_pmd);
> +			VM_BUG_ON(pmd_trans_huge(*old_pmd));

This tristate is hard to parse but I mentioned this already.

> +		}
> +		if (pmd_none(*new_pmd) && __pte_alloc(new_vma->vm_mm, new_vma,
> +						      new_pmd,
> +						      new_addr))
> +			break;
>  		next = (new_addr + PMD_SIZE) & PMD_MASK;
>  		if (extent > next - new_addr)
>  			extent = next - new_addr;
> @@ -157,7 +169,12 @@ unsigned long move_page_tables(struct vm
>  			extent = LATENCY_LIMIT;
>  		move_ptes(vma, old_pmd, old_addr, old_addr + extent,
>  				new_vma, new_pmd, new_addr);
> +		need_flush = true;
>  	}
> +	if (likely(need_flush))
> +		flush_tlb_range(vma, old_end-len, old_addr);
> +
> +	mmu_notifier_invalidate_range_end(vma->vm_mm, old_end-len, old_end);
>  
>  	return len + old_addr - old_end;	/* how much done */
>  }
> 

Functionally, I can't see a major problem with the patch. The
minor problems are that I'd like to see that tristate replaced for
readability, the optimisation better explained or in a separate patch,
and an explanation of why the larger ranges for mmu_notifiers are not
a problem.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH] THP: mremap support and TLB optimization #2
  2011-08-05 10:50 ` Johannes Weiner
@ 2011-08-05 15:52   ` Andrea Arcangeli
  0 siblings, 0 replies; 9+ messages in thread
From: Andrea Arcangeli @ 2011-08-05 15:52 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: linux-mm, Mel Gorman, Rik van Riel, Hugh Dickins

On Fri, Aug 05, 2011 at 12:50:47PM +0200, Johannes Weiner wrote:
> On Thu, Jul 28, 2011 at 04:26:31PM +0200, Andrea Arcangeli wrote:
> > Hello,
> > 
> > this is the latest version of the mremap THP native implementation
> > plus optimizations.
> > 
> > So the first question is: am I right that the "- 1" I am removing
> > below was buggy? It's harmless because these old_end/next are page
> > aligned, but if PAGE_SIZE were 1 it would break, right? It's really
> > confusing to read even if it happens to work. Please let me know,
> > because that "- 1" is ironically the thing I'm least comfortable
> > about in this patch.
> 
> > @@ -134,14 +126,17 @@ unsigned long move_page_tables(struct vm
> >  {
> >  	unsigned long extent, next, old_end;
> >  	pmd_t *old_pmd, *new_pmd;
> > +	bool need_flush = false;
> >  
> >  	old_end = old_addr + len;
> >  	flush_cache_range(vma, old_addr, old_end);
> >  
> > +	mmu_notifier_invalidate_range_start(vma->vm_mm, old_addr, old_end);
> > +
> >  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
> >  		cond_resched();
> >  		next = (old_addr + PMD_SIZE) & PMD_MASK;
> > -		if (next - 1 > old_end)
> > +		if (next > old_end)
> >  			next = old_end;
> 
> If old_addr + PMD_SIZE overflows, next will be zero, thus smaller than
> old_end and not fixed up.  This results in a bogus extent length here:

So basically this -1 is to prevent an overflow, and it's relying on
PAGE_SIZE > 1 to be safe, which is a safe assumption.

> >  		extent = next - old_addr;
> 
> which I think can overrun old_addr + len if the extent should have
> actually been smaller than the distance between new_addr and the next
> PMD as well as that LATENCY_LIMIT to which extent is capped a few
> lines below.  I haven't checked all the possibilities, though.
> 
> It could probably be
> 
> 	if (next > old_end || next - 1 > old_end)
> 		next = old_end
> 
> to catch the theoretical next == old_end + 1 case, but PAGE_SIZE > 1
> looks like a sensible assumption to me.

However, if old_end == 0, I doubt it could still work safely because
next would again be zero, leading to an extremely high extent. It
looks like the last page must not be allowed to be mapped for this to
work safely. sparc uses 0xf0000000 as TASK_SIZE. But being safe at
the PMD level is better in case it spans more than 32-4 bits. The
pmd_shift on sparc32 seems to be at most 23, so it doesn't look
problematic. Maybe some other arch would run into trouble, but it's
possible it never happens to be problematic and it was just an off by
one error as I thought. For this to break, userland must end less
than a PMD_SIZE before the end of the address space, and it's
possible no arch is like that. x86 has a pgd_size of 4m on 32bit
nopae, and a 2m pmd_size for pae 32/64bit, and the 32bit address
space ends at 3g... leaving 1g vs 4m or >=1g vs 2m.

But hey, I prefer to be safe against overflow even if no arch is
actually going to be in trouble and TASK_SIZE is always set at least
one PMD away from the overflow point, as seems to be the case on
sparc32 too.

So rather than the "-1", which looks more like an off by one error,
it's cleaner to do an explicit overflow check.

This first calculates the next pmd start, which may be 0 if it
overflows. Then it computes next - old_addr (0 - old_addr in the
overflow case) and stores it in extent. Then it compares
old_end - old_addr with extent (where extent is the distance from
old_addr to the next pmd start). If old_end - old_addr is smaller
than extent, it stores that into extent so we stop at old_end instead
of at the next pmd start. So the -1 goes away.

diff --git a/mm/mremap.c b/mm/mremap.c
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -136,9 +136,10 @@ unsigned long move_page_tables(struct vm
 	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
 		cond_resched();
 		next = (old_addr + PMD_SIZE) & PMD_MASK;
-		if (next > old_end)
-			next = old_end;
+		/* even if next overflowed, extent below will be ok */
 		extent = next - old_addr;
+		if (extent > old_end - old_addr)
+			extent = old_end - old_addr;
 		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
 		if (!old_pmd)
 			continue;


* Re: [PATCH] THP: mremap support and TLB optimization #2
  2011-08-05 15:25 ` Mel Gorman
@ 2011-08-05 16:21   ` Andrea Arcangeli
  2011-08-05 17:11     ` Mel Gorman
  2011-08-06 16:31   ` Andrea Arcangeli
  1 sibling, 1 reply; 9+ messages in thread
From: Andrea Arcangeli @ 2011-08-05 16:21 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, Johannes Weiner, Rik van Riel, Hugh Dickins

On Fri, Aug 05, 2011 at 04:25:16PM +0100, Mel Gorman wrote:
> > +int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
> > +		  unsigned long old_addr,
> > +		  unsigned long new_addr, unsigned long old_end,
> > +		  pmd_t *old_pmd, pmd_t *new_pmd)
> > +{
> > +	int ret = 0;
> > +	pmd_t pmd;
> > +
> > +	struct mm_struct *mm = vma->vm_mm;
> > +
> > +	if ((old_addr & ~HPAGE_PMD_MASK) ||
> > +	    (new_addr & ~HPAGE_PMD_MASK) ||
> 
> How could these conditions ever be true? We are here because it was
> pmd_trans_huge. There should be no way this can be unaligned. If this
> is paranoia, make it a BUG_ON.

It actually happens. old_addr/new_addr aren't aligned to hpage
beforehand, they come straight from the mremap syscall
parameters. Returning 0 is needed here.

> 
> > +	    (old_addr + HPAGE_PMD_SIZE) > old_end ||
> 
> Again, is this possible? The old addr was already huge.

This can happen too: old_end is also derived from syscall parameters
and isn't mangled to fit an hpage, it's just calculated as
old_addr+len (len passed as parameter). The length of the destination
vma shouldn't need checking; there's no new_end in the first place as
a parameter of move_page_tables, so the caller must ensure there's
enough space to copy in the destination. And if old_end-old_addr >=
HPAGE_PMD_SIZE and old_addr and new_addr are aligned, we're sure the
hugepmd is safe to create on the destination new_addr.

So I think all checks are needed here. In theory the caller could
check this stuff before calling move_huge_pmd but I try to be as less
invasive as possible in the common code that may be built with
TRANSPARENT_HUGEPAGE=n (these checks would still be computed in the
pmd_trans_huge(*old_pmd) case only but it's kind of nicer to hide them
in huge_memory.c).

> > +	/*
> > +	 * The destination pmd shouldn't be established, free_pgtables()
> > +	 * should have released it.
> > +	 */
> > +	if (!pmd_none(*new_pmd)) {
> > +		WARN_ON(1);
> > +		VM_BUG_ON(pmd_trans_huge(*new_pmd));
> > +		goto out;
> > +	}
> > +
> 
> Agreed that this should never happen. The mmap_sem is held for writing
> and we are remapping to what should be empty space. It should not be
> possible for a huge PMD to be established underneath us.

Yes. The code will work fine also if the new_pmd points to a pte that
wasn't freed by free_pgtables but I think it's kind of safer to have a
WARN_ON because if that really ever happens we would prefer to
release the pte here and proceed mapping the hugepmd instead of
splitting the source hugepmd.

If there's a hugepmd mapped instead it means the memory wasn't
munmapped. I could have run a mem compare of the pte against zero page
too but I didn't.. kind of overkill. But checking that there is no
hugepmd is fast.

> > +	spin_lock(&mm->page_table_lock);
> > +	if (likely(pmd_trans_huge(*old_pmd))) {
> > +		if (pmd_trans_splitting(*old_pmd)) {
> > +			spin_unlock(&mm->page_table_lock);
> > +			wait_split_huge_page(vma->anon_vma, old_pmd);
> > +			ret = -1;
> > +		} else {
> > +			pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
> > +			VM_BUG_ON(!pmd_none(*new_pmd));
> > +			set_pmd_at(mm, new_addr, new_pmd, pmd);
> > +			spin_unlock(&mm->page_table_lock);
> > +			ret = 1;
> > +		}
> > +	} else
> > +		spin_unlock(&mm->page_table_lock);
> > +
> 
> The meaning of the return values of -1, 0, 1 with the caller doing
> 
> if (err)
> ...
> else if (!err)
> 	...
> 
> is tricky to work out. split_huge_page only needs to be called if
> returning 0. Would it be possible to have the split_huge_page called in
> this function? The end of the function would then look like
> 
> return ret;
> 
> out_split:
> split_huge_page_pmd()
> return ret;
> 
> with either success or failure being returned instead of a tristate
> which is easier to understand.

So basically always return 0, regardless of whether it was
pmd_trans_splitting or we split it ourselves, and only return 1 when
move_huge_pmd was successful.

I'm afraid we have other tristates returned by the other mm
methods... if we clean up this one later we may also clean up the
others.

I'll try to clean this one up for now, ok.
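
On the caller side that would then reduce to something like this
(sketch only, untested):

		if (pmd_trans_huge(*old_pmd)) {
			int moved = 0;
			if (extent == HPAGE_PMD_SIZE)
				moved = move_huge_pmd(vma, new_vma, old_addr,
						      new_addr, old_end,
						      old_pmd, new_pmd);
			if (moved) {
				need_flush = true;
				continue;
			}
			/* a no-op if move_huge_pmd already split it */
			split_huge_page_pmd(vma->vm_mm, old_pmd);
			VM_BUG_ON(pmd_trans_huge(*old_pmd));
		}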

> > diff --git a/mm/mremap.c b/mm/mremap.c
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -41,8 +41,7 @@ static pmd_t *get_old_pmd(struct mm_stru
> >  		return NULL;
> >  
> >  	pmd = pmd_offset(pud, addr);
> > -	split_huge_page_pmd(mm, pmd);
> > -	if (pmd_none_or_clear_bad(pmd))
> > +	if (pmd_none(*pmd))
> >  		return NULL;
> >  
> 
> Ok, this is changing to pmd_none because it could be a huge PMD and
> pmd_none_or_clear_bad triggers on a huge PMD. Right?

Right. It'd bug on with a hugepmd. Maybe we could someday change
pmd_none_or_clear_bad not to choke on the PSE bit, but it was kind of
safer to keep assuming the PSE bit was "bad" in case we forgot some
split_huge_page somewhere; better to bug in that case than to
ignore it. But as time passes and THP is rock solid, I have to admit
it's becoming more a loss of reliability to consider the PSE bit "bad"
and to remove some pmd_none_or_clear_bad calls than a gain of
reliability from validating the THP case. Maybe we should change
that. It's a common problem not specific to mremap.
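
Something like the below is what I'd have in mind if we ever do it
(just a sketch, untested, and the name is made up):

static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
{
	pmd_t pmdval = *pmd;
	/* a trans huge pmd is not "bad", just not interesting at the pte level */
	if (pmd_none(pmdval) || pmd_trans_huge(pmdval))
		return 1;
	if (unlikely(pmd_bad(pmdval))) {
		pmd_clear_bad(pmd);
		return 1;
	}
	return 0;
}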

> > @@ -65,8 +64,6 @@ static pmd_t *alloc_new_pmd(struct mm_st
> >  		return NULL;
> >  
> >  	VM_BUG_ON(pmd_trans_huge(*pmd));
> > -	if (pmd_none(*pmd) && __pte_alloc(mm, vma, pmd, addr))
> > -		return NULL;
> >  
> >  	return pmd;
> >  }
> > @@ -80,11 +77,7 @@ static void move_ptes(struct vm_area_str
> >  	struct mm_struct *mm = vma->vm_mm;
> >  	pte_t *old_pte, *new_pte, pte;
> >  	spinlock_t *old_ptl, *new_ptl;
> > -	unsigned long old_start;
> >  
> > -	old_start = old_addr;
> > -	mmu_notifier_invalidate_range_start(vma->vm_mm,
> > -					    old_start, old_end);
> 
> The MMU notifier is now being called for a larger range. Previously it
> would usually be ranges of 64 pages and now it looks like it happens
> once for the entire range being remapped. This is not mentioned in
> the leader. What are the consequences of having a large gap between
> invalidate_start and invalidate_end? Would it be a big deal to call
> the MMU notifier within move_huge_pmd()?

Well it should improve performance. The only downside is that it will
stall secondary page faults for a longer time, but because the mremap
can now run faster with fewer IPIs, I think overall it should improve
performance. Also it's probably not too common to do much mremap on
apps with a secondary MMU attached so in this place the mmu notifier
is more about correctness I think and correctness remains. Userland
can always do mremap in smaller chunks if it needs and then it'll
really run faster with no downside. After all no app is supposed to
touch the source addresses while they're being migrated as it could
SEGFAULT at any time. So a very long mremap basically makes a mapping
"not available" for a longer time, just now it'll be shorter because
mremap will run faster. The mapping being available for the secondary
mmu was incidental implementation detail, now it's not available to
the secondary mmu during the move, like it is not available to
userland. Trapping sigsegv to modify the mapping while it's under
mremap I doubt anybody is depending on.

> If it's safe to use larger ranges, it would be preferable to see it
> in a separate patch or at the very least explained in the changelog.

I can split it off to a separate patch. I knew I should have done that
in the first place and rightfully I got caught :).

> >  	if (vma->vm_file) {
> >  		/*
> >  		 * Subtle point from Rajesh Venkatasubramanian: before
> > @@ -111,7 +104,7 @@ static void move_ptes(struct vm_area_str
> >  				   new_pte++, new_addr += PAGE_SIZE) {
> >  		if (pte_none(*old_pte))
> >  			continue;
> > -		pte = ptep_clear_flush(vma, old_addr, old_pte);
> > +		pte = ptep_get_and_clear(mm, old_addr, old_pte);
> 
> This looks like an unrelated optimisation. You hint at this in the
> patch subject but it needs a separate patch or a better explanation in
> the leader. If I'm reading this right, it looks like you are deferring
> a TLB flush on a single page and calling one call later at the end of
> move_page_tables. At a glance, that seems ok and would reduce IPIs
> but I'm not thinking about it properly because I'm trying to think
> about THP shenanigans :)

That's exactly what it does. Once I split it off you can concentrate
on the two parts separately. This is also the part that requires
moving the mmu notifier outside, along with the tlb flush.

The THP shenanigans don't require moving the mmu notifier outside.

The one IPI per page is a major bottleneck for java, and the lack of
hugepmd migration is also a major bottleneck; here we get both
combined, so we get 1 IPI for a ton of THP. The benchmark I ran was
single threaded on a 12 core system (and a single thread, if the
scheduler is doing its job, won't require any IPI), so you can only
imagine the boost it gets on heavily multithreaded apps that require
flooding IPIs on a large SMP (I didn't measure that as I was already
happy with what I got single threaded :).

> > +	mmu_notifier_invalidate_range_start(vma->vm_mm, old_addr, old_end);
> > +
> >  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
> >  		cond_resched();
> >  		next = (old_addr + PMD_SIZE) & PMD_MASK;
> > -		if (next - 1 > old_end)
> > +		if (next > old_end)
> >  			next = old_end;
> >  		extent = next - old_addr;
> >  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
> 
> You asked if removing this "- 1" is correct. It's an overflow check for
> a situation where old_addr + PMD_SIZE overflows. On what architecture
> is it possible to call mremap() at the very top of the address space
> or am I missing the point?
> 
> Otherwise I think the existing check is harmless if obscure. It's
> reasonable to assume PAGE_SIZE will be > 1 and I'm not seeing why it is
> required by the rest of the patch.

An arch where -TASK_SIZE is less than PMD_SIZE didn't cross my mind,
sorry; Johannes also pointed it out. I'd find this more readable than
an off-by-one-looking "-1" that seems erroneous.

diff --git a/mm/mremap.c b/mm/mremap.c
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -136,9 +136,10 @@ unsigned long move_page_tables(struct vm
 	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
 		cond_resched();
 		next = (old_addr + PMD_SIZE) & PMD_MASK;
-		if (next > old_end)
-			next = old_end;
+		/* even if next overflowed, extent below will be ok */
 		extent = next - old_addr;
+		if (extent > old_end - old_addr)
+			extent = old_end - old_addr;
 		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
 		if (!old_pmd)
 			continue;

What do you think? I can put the -1 back if you prefer. I doubt the
speed difference can matter here, all values should be in registers.

> > @@ -150,6 +145,23 @@ unsigned long move_page_tables(struct vm
> >  		new_pmd = alloc_new_pmd(vma->vm_mm, vma, new_addr);
> >  		if (!new_pmd)
> >  			break;
> > +		if (pmd_trans_huge(*old_pmd)) {
> > +			int err = 0;
> > +			if (extent == HPAGE_PMD_SIZE)
> > +				err = move_huge_pmd(vma, new_vma, old_addr,
> > +						    new_addr, old_end,
> > +						    old_pmd, new_pmd);
> > +			if (err > 0) {
> > +				need_flush = true;
> > +				continue;
> > +			} else if (!err)
> > +				split_huge_page_pmd(vma->vm_mm, old_pmd);
> > +			VM_BUG_ON(pmd_trans_huge(*old_pmd));
> 
> This tristate is hard to parse but I mentioned this already.

Yep I'll try to make it 0/1.

> Functionally, I can't see a major problem with the patch. The
> minor problems are that I'd like to see that tristate replaced for
> readability, the optimisation better explained or in a separate patch
> and an explanation why the larger ranges for mmu_notifiers is not
> a problem.

Thanks for the review, very helpful as usual. I'll try to submit a
3-patch series with the cleanups you suggested soon enough. Ideally
I'd like to replace the "-1" with the above change, which should also
guard against TASK_SIZE ending less than one pmd away from the end of
the address space, if you like it.

Andrea


* Re: [PATCH] THP: mremap support and TLB optimization #2
  2011-08-05 16:21   ` Andrea Arcangeli
@ 2011-08-05 17:11     ` Mel Gorman
  2011-08-05 18:18       ` Andrea Arcangeli
  0 siblings, 1 reply; 9+ messages in thread
From: Mel Gorman @ 2011-08-05 17:11 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-mm, Johannes Weiner, Rik van Riel, Hugh Dickins

On Fri, Aug 05, 2011 at 06:21:51PM +0200, Andrea Arcangeli wrote:
> On Fri, Aug 05, 2011 at 04:25:16PM +0100, Mel Gorman wrote:
> > > +int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
> > > +		  unsigned long old_addr,
> > > +		  unsigned long new_addr, unsigned long old_end,
> > > +		  pmd_t *old_pmd, pmd_t *new_pmd)
> > > +{
> > > +	int ret = 0;
> > > +	pmd_t pmd;
> > > +
> > > +	struct mm_struct *mm = vma->vm_mm;
> > > +
> > > +	if ((old_addr & ~HPAGE_PMD_MASK) ||
> > > +	    (new_addr & ~HPAGE_PMD_MASK) ||
> > 
> > How could these conditions ever be true? We are here because it was
> > pmd_trans_huge. There should be no way this can be unaligned. If this
> > is paranoia, make it a BUG_ON.
> 
> It actually happens. old_addr/new_addr aren't aligned to hpage
> beforehand, they come straight from the mremap syscall
> parameters. Returning 0 is needed here.
> 

Ah yes, of course. Just because the PMD is huge does not mean you
are calling mremap on an aligned address.

> > 
> > > +	    (old_addr + HPAGE_PMD_SIZE) > old_end ||
> > 
> > Again, is this possible? The old addr was already huge.
> 
> This can happen too: old_end is also derived from syscall parameters
> and isn't mangled to fit an hpage, it's just calculated as
> old_addr+len (len passed as parameter). The length of the destination
> vma shouldn't need checking; there's no new_end in the first place as
> a parameter of move_page_tables, so the caller must ensure there's
> enough space to copy in the destination. And if old_end-old_addr >=
> HPAGE_PMD_SIZE and old_addr and new_addr are aligned, we're sure the
> hugepmd is safe to create on the destination new_addr.
> 
> So I think all checks are needed here.

Yes, blindingly obvious now of course :/

> In theory the caller could
> check this stuff before calling move_huge_pmd but I try to be as less
> invasive as possible in the common code that may be built with
> TRANSPARENT_HUGEPAGE=n (these checks would still be computed in the
> pmd_trans_huge(*old_pmd) case only but it's kind of nicer to hide them
> in huge_memory.c).
> 

What you have at the moment is indeed nicer.

> > > +	/*
> > > +	 * The destination pmd shouldn't be established, free_pgtables()
> > > +	 * should have released it.
> > > +	 */
> > > +	if (!pmd_none(*new_pmd)) {
> > > +		WARN_ON(1);
> > > +		VM_BUG_ON(pmd_trans_huge(*new_pmd));
> > > +		goto out;
> > > +	}
> > > +
> > 
> > Agreed that this should never happen. The mmap_sem is held for writing
> > and we are remapping to what should be empty space. It should not be
> > possible for a huge PMD to be established underneath us.
> 
> Yes. The code will work fine also if the new_pmd points to a pte that
> wasn't freed by free_pgtables

As we hold mmap_sem for write, I would expect any munmap to have
completed by the time this is reached and we're hardly calling mremap
during exit.

> but I think it's kind of safer to have a
> WARN_ON because if that really ever happens we would prefer to
> release the pte here and proceed mapping the hugepmd instead of
> splitting the source hugepmd.
> 

It's healthy paranoia.

> If there's a hugepmd mapped instead it means the memory wasn't
> munmapped. I could have run a mem compare of the pte against zero page
> too but I didn't.. kind of overkill. But checking that there is no
> hugepmd is fast.
> 

I think the check is paranoid but there is no harm in that.

> > > +	spin_lock(&mm->page_table_lock);
> > > +	if (likely(pmd_trans_huge(*old_pmd))) {
> > > +		if (pmd_trans_splitting(*old_pmd)) {
> > > +			spin_unlock(&mm->page_table_lock);
> > > +			wait_split_huge_page(vma->anon_vma, old_pmd);
> > > +			ret = -1;
> > > +		} else {
> > > +			pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
> > > +			VM_BUG_ON(!pmd_none(*new_pmd));
> > > +			set_pmd_at(mm, new_addr, new_pmd, pmd);
> > > +			spin_unlock(&mm->page_table_lock);
> > > +			ret = 1;
> > > +		}
> > > +	} else
> > > +		spin_unlock(&mm->page_table_lock);
> > > +
> > 
> > The meaning of the return values of -1, 0, 1 with the caller doing
> > 
> > if (err)
> > ...
> > else if (!err)
> > 	...
> > 
> > is tricky to work out. split_huge_page only needs to be called if
> > returning 0. Would it be possible to have the split_huge_page called in
> > this function? The end of the function would then look like
> > 
> > return ret;
> > 
> > out_split:
> > split_huge_page_pmd()
> > return ret;
> > 
> > with either success or failure being returned instead of a tristate
> > which is easier to understand.
> 
> So basically always return 0 regardless if it was a
> pmd_trans_splitting or if we splitted it ourself. And only return 1 in
> case the move_huge_pmd was successful.
> 

Exactly.

> I'm afraid we've other trestates returned for the other mm
> methods... if we cleanup this one later we may also cleanup the
> others.
> 

There are other tristates but let's not make life hard. When I've needed
tristates in recent patches, I've used enums with symbolic names instead
of -1, 0, 1 but that would be overkill here.

> I'll try to clean up this one for now ok.
> 
> > > diff --git a/mm/mremap.c b/mm/mremap.c
> > > --- a/mm/mremap.c
> > > +++ b/mm/mremap.c
> > > @@ -41,8 +41,7 @@ static pmd_t *get_old_pmd(struct mm_stru
> > >  		return NULL;
> > >  
> > >  	pmd = pmd_offset(pud, addr);
> > > -	split_huge_page_pmd(mm, pmd);
> > > -	if (pmd_none_or_clear_bad(pmd))
> > > +	if (pmd_none(*pmd))
> > >  		return NULL;
> > >  
> > 
> > Ok, this is changing to pmd_none because it could be a huge PMD and
> > pmd_none_or_clear_bad triggers on a huge PMD. Right?
> 
> Right. It'd bug on with a hugepmd. Maybe we could someday change
> pmd_none_or_clear_bad not to choke on the PSE bit, but it was kind of
> safer to keep assuming the PSE bit was "bad" in case we forgotten some
> split_huge_page somewhere, better to bug in that case than to
> ignore.

Agreed.

> But as time passes and THP is rock solid, I've to admit it's
> becoming more a lack of reliability to consider the PSE bit "bad" and
> to remove some pmd_none_clear_bad than an increase of reliability to
> validate the THP case. Maybe we should change that. It's a common
> problem not specific to mremap.
> 

It's something worth doing in a few releases' time. It would be nice
to have the bad PMD checks back at some point.

> > > @@ -65,8 +64,6 @@ static pmd_t *alloc_new_pmd(struct mm_st
> > >  		return NULL;
> > >  
> > >  	VM_BUG_ON(pmd_trans_huge(*pmd));
> > > -	if (pmd_none(*pmd) && __pte_alloc(mm, vma, pmd, addr))
> > > -		return NULL;
> > >  
> > >  	return pmd;
> > >  }
> > > @@ -80,11 +77,7 @@ static void move_ptes(struct vm_area_str
> > >  	struct mm_struct *mm = vma->vm_mm;
> > >  	pte_t *old_pte, *new_pte, pte;
> > >  	spinlock_t *old_ptl, *new_ptl;
> > > -	unsigned long old_start;
> > >  
> > > -	old_start = old_addr;
> > > -	mmu_notifier_invalidate_range_start(vma->vm_mm,
> > > -					    old_start, old_end);
> > 
> > The MMU notifier is now being called for a larger range. Previously it
> > would usually be ranges of 64 pages and now it looks like it happens
> > once for the entire range being remapped. This is not mentioned in
> > the leader. What are the consequences of having a large gap between
> > invalidate_start and invalidate_end? Would it be a big deal to call
> > the MMU notifier within move_huge_pmd()?
> 
> Well it should improve performance. The only downside is that it will
> stall secondary page faults for a longer time, but because the mremap
> can now run faster with fewer IPIs, I think overall it should improve
> performance.

That's a tough call. Increased jitter within KVM might be unwelcome
but ordinarily the only people that might care about latencies are
real time people and I don't think they would care about the KVM case.

> Also it's probably not too common to do much mremap on
> apps with a secondary MMU attached so in this place the mmu notifier
> is more about correctness I think and correctness remains. Userland
> can always do mremap in smaller chunks if it needs and then it'll
> really run faster with no downside. After all no app is supposed to
> touch the source addresses while they're being migrated as it could
> SEGFAULT at any time. So a very long mremap basically makes a mapping
> "not available" for a longer time, just now it'll be shorter because
> mremap will run faster. The mapping being available for the secondary
> mmu was incidental implementation detail, now it's not available to
> the secondary mmu during the move, like it is not available to
> userland. Trapping sigsegv to modify the mapping while it's under
> mremap I doubt anybody is depending on.
> 

I agree with you that overall it's probably for the best but splitting
it out as a separate patch and cc'ing the KVM people would do no harm.
Is there any impact for things like xpmem that might be using MMU
notifiers in some creative manner?

> > If it's safe to use larger ranges, it would be preferable to see it
> > in a separate patch or at the very least explained in the changelog.
> 
> I can split it off to a separate patch. I knew I should have done that
> in the first place and rightfully I got caught :).
> 

:)

> > >  	if (vma->vm_file) {
> > >  		/*
> > >  		 * Subtle point from Rajesh Venkatasubramanian: before
> > > @@ -111,7 +104,7 @@ static void move_ptes(struct vm_area_str
> > >  				   new_pte++, new_addr += PAGE_SIZE) {
> > >  		if (pte_none(*old_pte))
> > >  			continue;
> > > -		pte = ptep_clear_flush(vma, old_addr, old_pte);
> > > +		pte = ptep_get_and_clear(mm, old_addr, old_pte);
> > 
> > This looks like an unrelated optimisation. You hint at this in the
> > patch subject but it needs a separate patch or a better explanation in
> > the leader. If I'm reading this right, it looks like you are deferring
> > a TLB flush on a single page and calling one call later at the end of
> > move_page_tables. At a glance, that seems ok and would reduce IPIs
> > but I'm not thinking about it properly because I'm trying to think
> > about THP shenanigans :)
> 
> That's exactly what it does. Once I split it off you can concentrate
> on the two parts separately. This is also the part that requires
> moving the mmu notifier outside, along with the tlb flush.
> 
> The THP shenanigans don't require moving the mmu notifier outside.
> 

Right. I did notice that it would be less churn if the optimisation
went in first but that was about it.
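
To make the deferred flush concrete, the per-pte loop after the change
would read roughly like the sketch below. This is only a sketch, not
the exact hunk: the file-mapping (i_mmap) locking is left out, and
need_flush is assumed to be an out-parameter so that move_page_tables()
can do a single flush_tlb_range() over the old range at the end instead
of one IPI per present pte.

static void move_ptes_sketch(struct vm_area_struct *vma, pmd_t *old_pmd,
			     unsigned long old_addr, unsigned long old_end,
			     struct vm_area_struct *new_vma, pmd_t *new_pmd,
			     unsigned long new_addr, bool *need_flush)
{
	struct mm_struct *mm = vma->vm_mm;
	pte_t *old_pte, *new_pte, pte;
	spinlock_t *old_ptl, *new_ptl;

	old_pte = pte_offset_map_lock(mm, old_pmd, old_addr, &old_ptl);
	new_pte = pte_offset_map(new_pmd, new_addr);
	new_ptl = pte_lockptr(mm, new_pmd);
	if (new_ptl != old_ptl)
		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);

	for (; old_addr < old_end; old_pte++, old_addr += PAGE_SIZE,
				   new_pte++, new_addr += PAGE_SIZE) {
		if (pte_none(*old_pte))
			continue;
		/*
		 * No ptep_clear_flush() here: the pte is cleared without
		 * an IPI, and the stale TLB entries for the whole old
		 * range are shot down once by the caller.
		 */
		pte = ptep_get_and_clear(mm, old_addr, old_pte);
		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
		set_pte_at(mm, new_addr, new_pte, pte);
		*need_flush = true;
	}

	if (new_ptl != old_ptl)
		spin_unlock(new_ptl);
	pte_unmap(new_pte - 1);
	pte_unmap_unlock(old_pte - 1, old_ptl);
}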

> The one IPI per page is a major bottleneck for java, and the lack of
> hugepmd migrate is another major bottleneck; here we get both
> combined, so we get 1 IPI for a ton of THP. The benchmark I ran was
> single threaded on a 12 core system (and single threaded, if the
> scheduler is doing well, won't require any IPI at all), so you can
> only imagine the boost it gets on heavily multithreaded apps that
> require flooding IPIs on large SMP (I didn't measure that, as I was
> already happy with what I got single threaded :).
> 

Yeah, I imagine it should also be a boost during GC if the JVM is
using mremap to migrate full pages from an old heap to a new one.

> > > +	mmu_notifier_invalidate_range_start(vma->vm_mm, old_addr, old_end);
> > > +
> > >  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
> > >  		cond_resched();
> > >  		next = (old_addr + PMD_SIZE) & PMD_MASK;
> > > -		if (next - 1 > old_end)
> > > +		if (next > old_end)
> > >  			next = old_end;
> > >  		extent = next - old_addr;
> > >  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
> > 
> > You asked if removing this "- 1" is correct. It's an overflow check for
> > a situation where old_addr + PMD_SIZE overflows. On what architecture
> > is it possible to call mremap() at the very top of the address space
> > or am I missing the point?
> > 
> > Otherwise I think the existing check is harmless if obscure. It's
> > reasonable to assume PAGE_SIZE will be > 1, and I'm not seeing why
> > removing it is required by the rest of the patch.
> 
> An arch where -TASK_SIZE is less than PMD_SIZE didn't cross my mind,
> sorry; Johannes also pointed it out. I'd find this more readable than
> an off-by-one -1 that looks erroneous.
> 
> diff --git a/mm/mremap.c b/mm/mremap.c
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -136,9 +136,10 @@ unsigned long move_page_tables(struct vm
>  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>  		cond_resched();
>  		next = (old_addr + PMD_SIZE) & PMD_MASK;
> -		if (next > old_end)
> -			next = old_end;
> +		/* even if next overflowed, extent below will be ok */
>  		extent = next - old_addr;
> +		if (extent > old_end - old_addr)
> +			extent = old_end - old_addr;
>  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>  		if (!old_pmd)
>  			continue;
> 
> What do you think? I can put the -1 back if you prefer. I doubt the
> speed difference can matter here, all values should be in registers.
> 

I don't feel very strongly about it. The change looks reasonable
and with a fresh head with the patch split out, it probably is more
readable.
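
For what it's worth, the wraparound case is easy to convince oneself
about with a throwaway userland program. The addresses below are made
up purely to force the wrap (no real TASK_SIZE reaches that high, which
is exactly the point of the discussion), and a 2MB PMD is assumed:

#include <stdio.h>

#define PMD_SIZE	(2UL * 1024 * 1024)
#define PMD_MASK	(~(PMD_SIZE - 1))

int main(void)
{
	/* hypothetical range sitting in the last pmd of the address space */
	unsigned long old_addr = 0UL - PMD_SIZE + 4096;
	unsigned long old_end  = 0UL - 4096;
	unsigned long next, extent;

	next = (old_addr + PMD_SIZE) & PMD_MASK;	/* wraps to 0 */
	extent = next - old_addr;	/* still the distance to the boundary */
	if (extent > old_end - old_addr)
		extent = old_end - old_addr;	/* clamp to what is left */

	printf("next %#lx extent %#lx len %#lx\n",
	       next, extent, old_end - old_addr);
	return 0;
}

Even though next wraps to 0, next - old_addr still comes out as
PMD_SIZE - 4096, and the clamp keeps extent from overshooting old_end.
With a plain "next > old_end" test alone the wrapped next would slip
through, which is what the "- 1" was quietly catching.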

> > > @@ -150,6 +145,23 @@ unsigned long move_page_tables(struct vm
> > >  		new_pmd = alloc_new_pmd(vma->vm_mm, vma, new_addr);
> > >  		if (!new_pmd)
> > >  			break;
> > > +		if (pmd_trans_huge(*old_pmd)) {
> > > +			int err = 0;
> > > +			if (extent == HPAGE_PMD_SIZE)
> > > +				err = move_huge_pmd(vma, new_vma, old_addr,
> > > +						    new_addr, old_end,
> > > +						    old_pmd, new_pmd);
> > > +			if (err > 0) {
> > > +				need_flush = true;
> > > +				continue;
> > > +			} else if (!err)
> > > +				split_huge_page_pmd(vma->vm_mm, old_pmd);
> > > +			VM_BUG_ON(pmd_trans_huge(*old_pmd));
> > 
> > This tristate is hard to parse but I mentioned this already.
> 
> Yep I'll try to make it 0/1.
> 
> > Functionally, I can't see a major problem with the patch. The
> > minor problems are that I'd like to see that tristate replaced for
> > readability, the optimisation better explained or in a separate
> > patch, and an explanation of why the larger ranges for
> > mmu_notifiers are not a problem.
> 
> Thanks for the review, very helpful as usual; I'll try to submit a
> 3-patch version with the cleanups you suggested soon enough. Ideally,
> if you like it, I'd like to replace the -1 with the above change,
> which should also guard against TASK_SIZE ending less than one pmd
> away from the end of the address space.
> 

Sounds good. Will review again when they come around.

-- 
Mel Gorman
SUSE Labs


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] THP: mremap support and TLB optimization #2
  2011-08-05 17:11     ` Mel Gorman
@ 2011-08-05 18:18       ` Andrea Arcangeli
  0 siblings, 0 replies; 9+ messages in thread
From: Andrea Arcangeli @ 2011-08-05 18:18 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, Johannes Weiner, Rik van Riel, Hugh Dickins

On Fri, Aug 05, 2011 at 06:11:26PM +0100, Mel Gorman wrote:
> That's a tough call. Increased jitter within KVM might be unwelcome,
> but ordinarily the only people that might care about latencies are
> real-time people, and I don't think they would care about the KVM case.

Ah, KVM can't possibly ever run mremap on regions backed by the mmu
notifier, so that's not an issue.

> I agree with you that overall it's probably for the best but splitting
> it out as a separate patch and cc'ing the KVM people would do no harm.
> Is there any impact for things like xpmem that might be using MMU
> notifiers in some creative manner?

KVM people are already safe: this mremap optimization won't affect
KVM, it's more for the JVM than anything else. xpmem currently can't
run on mmu notifiers because it requires being able to schedule at
least in the range_start/end methods (not in the invalidate_page
method), and without srcu in the mmu notifier core it can't schedule
even in range_start/end. The only other in-tree user is the GRU
driver.

I don't think it's a fundamental new limitation: the whole mremap
region becomes unavailable to the users of the primary MMU for the
whole duration, and it was more by accident that part of the region
was still available to the secondary mmu users during mremap, despite
not being "accessible" with predictable results on the primary mmu
(modulo trapping sigsegv). So I don't think we can make things worse
with this, not worse than they already were for the primary mmu, plus
the duration of the syscall will be shorter now.
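
To make the window explicit, the outer loop is meant to end up shaped
roughly like the sketch below (a sketch only, with the per-pmd work
reduced to a comment): a single invalidate_range_start/_end pair and a
single flush_tlb_range() bracket the whole move, instead of one
notifier window per move_ptes() batch and one flush per pte.

unsigned long move_page_tables_sketch(struct vm_area_struct *vma,
		unsigned long old_addr, struct vm_area_struct *new_vma,
		unsigned long new_addr, unsigned long len)
{
	unsigned long extent, next, old_end = old_addr + len;
	pmd_t *old_pmd, *new_pmd;
	bool need_flush = false;

	/* one secondary-MMU invalidate window for the whole move */
	mmu_notifier_invalidate_range_start(vma->vm_mm, old_addr, old_end);

	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
		cond_resched();
		next = (old_addr + PMD_SIZE) & PMD_MASK;
		/* even if next wrapped, the unsigned math stays correct */
		extent = next - old_addr;
		if (extent > old_end - old_addr)
			extent = old_end - old_addr;

		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
		if (!old_pmd)
			continue;
		new_pmd = alloc_new_pmd(vma->vm_mm, vma, new_addr);
		if (!new_pmd)
			break;

		/*
		 * Body elided in this sketch: either move the whole huge
		 * pmd or call move_ptes() over the extent, then remember
		 * that a flush is owed.
		 */
		need_flush = true;
	}

	/* one TLB shootdown (one round of IPIs) for everything moved */
	if (likely(need_flush))
		flush_tlb_range(vma, old_end - len, old_addr);

	mmu_notifier_invalidate_range_end(vma->vm_mm, old_end - len, old_end);

	return len + old_addr - old_end;	/* how much got done */
}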

> > The one IPI per page is a major bottleneck for java, and the lack of
> > hugepmd migrate is another major bottleneck; here we get both
> > combined, so we get 1 IPI for a ton of THP. The benchmark I ran was
> > single threaded on a 12 core system (and single threaded, if the
> > scheduler is doing well, won't require any IPI at all), so you can
> > only imagine the boost it gets on heavily multithreaded apps that
> > require flooding IPIs on large SMP (I didn't measure that, as I was
> > already happy with what I got single threaded :).
> > 
> 
> Yeah, I imagine it should also be a boost during GC if the JVM is
> using mremap to migrate full pages from an old heap to a new one.

Correct, it's the GC doing mremap AFAIK.

> I don't feel very strongly about it. The change looks reasonable
> and with a fresh head with the patch split out, it probably is more
> readable.

Yes, I plan to do the split-out. I just wanted to know if I should
change the -1 into the code that doesn't rely on PAGE_SIZE > 1 and
that explicitly says it's about the overflow, so that it feels less
obscure, or if I should leave the -1 as is.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] THP: mremap support and TLB optimization #2
  2011-08-05 15:25 ` Mel Gorman
  2011-08-05 16:21   ` Andrea Arcangeli
@ 2011-08-06 16:31   ` Andrea Arcangeli
  1 sibling, 0 replies; 9+ messages in thread
From: Andrea Arcangeli @ 2011-08-06 16:31 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, Johannes Weiner, Rik van Riel, Hugh Dickins

On Fri, Aug 05, 2011 at 04:25:16PM +0100, Mel Gorman wrote:
> The meaning of the return values of -1, 0, 1 with the caller doing
> 
> if (err)
> ...
> else if (!err)
> 	...
> 
> is tricky to work out. split_huge_page only needs to be called if
> returning 0. Would it be possible to have the split_huge_page called in
> this function? The end of the function would then look like

I'm doing the cleanup, but the problem is that to call split_huge_page
in move_huge_pmd I'd need to call move_huge_pmd even when extent <
HPAGE_PMD_SIZE. That's an unnecessary call... That is the reason for
the tristate.

Alternatively I could call split_huge_page even if move_huge_pmd
returns -1 (making it return 0 instead), but then it'd be another
unnecessary call.

Not sure anymore if it's worth removing the -1.
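
For comparison, the 0/1 variant weighed above would make the caller in
move_page_tables() read roughly like the sketch below, assuming
move_huge_pmd() is changed to return nonzero only when it moved the
huge pmd as a unit. It is easier to parse than the tristate, at the
cost of the extra split_huge_page_pmd() call mentioned above, which
only takes the page_table_lock and returns when the pmd is no longer
huge.

		if (pmd_trans_huge(*old_pmd)) {
			bool moved = false;
			if (extent == HPAGE_PMD_SIZE)
				moved = move_huge_pmd(vma, new_vma, old_addr,
						      new_addr, old_end,
						      old_pmd, new_pmd);
			if (moved) {
				need_flush = true;
				continue;
			}
			/*
			 * Either the extent wasn't a whole huge pmd or the
			 * huge pmd couldn't be moved as a unit: fall back
			 * to splitting it before move_ptes() runs on the
			 * extent.
			 */
			split_huge_page_pmd(vma->vm_mm, old_pmd);
			VM_BUG_ON(pmd_trans_huge(*old_pmd));
		}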


^ permalink raw reply	[flat|nested] 9+ messages in thread


Thread overview: 9+ messages
2011-07-28 14:26 [PATCH] THP: mremap support and TLB optimization #2 Andrea Arcangeli
2011-08-04 15:32 ` Andrea Arcangeli
2011-08-05 10:50 ` Johannes Weiner
2011-08-05 15:52   ` Andrea Arcangeli
2011-08-05 15:25 ` Mel Gorman
2011-08-05 16:21   ` Andrea Arcangeli
2011-08-05 17:11     ` Mel Gorman
2011-08-05 18:18       ` Andrea Arcangeli
2011-08-06 16:31   ` Andrea Arcangeli
