* [PATCH 0/2] Fix migration races in rmap_walk()
@ 2010-04-26 22:37 ` Mel Gorman
  0 siblings, 0 replies; 42+ messages in thread
From: Mel Gorman @ 2010-04-26 22:37 UTC (permalink / raw)
  To: Linux-MM, LKML
  Cc: Minchan Kim, KAMEZAWA Hiroyuki, Mel Gorman, Christoph Lameter,
	Andrea Arcangeli, Rik van Riel, Andrew Morton

After digging around a lot, I believe the following two patches are the
best way to close the race that allows a migration PTE to be left behind,
triggering a BUG check in migration_entry_to_page().

Patch one has fork() wait for migration to complete. Patch two has
vma_adjust() acquire the anon_vma lock it is aware of and makes rmap_walk()
aware that different VMAs can be encountered during the walk.

I dropped the use of the seq counter because there were still races in
place. For example, while the seq counter would catch when vma_adjust()
and rmap_walk() were looking at the same VMA, there was still insufficient
protection on the VMA list being modified.

The reproduction case was as follows:

1. Run kernel compilation in a loop
2. Start two processes that repeatedly fork()ed and manipulated mappings
3. Constantly compact memory using /proc/sys/vm/compact_memory (a minimal
   trigger loop is sketched after this list)
4. Optionally add/remove swap
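
A minimal sketch of a compaction trigger for step 3 (my illustration, not
necessarily the exact tool used for the original testing):

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
	    /* Writing "1" to /proc/sys/vm/compact_memory triggers
	     * compaction on all nodes, which keeps page migration
	     * running constantly. */
	    for (;;) {
		    FILE *f = fopen("/proc/sys/vm/compact_memory", "w");
		    if (!f) {
			    perror("compact_memory");
			    return 1;
		    }
		    fputs("1", f);
		    fclose(f);
		    usleep(100 * 1000);
	    }
    }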

With these two patches applied, I was unable to trigger the bug check in
migration_entry_to_page(), but it would be really helpful if Rik could
comment on the anon_vma locking requirements and whether patch 2 is 100%
safe or not. The tests have only been running for 8 hours but I'm posting
now anyway and will see how they survive running for a few days.

The other issues raised about expand_downwards will need to be re-examined
to see if they still exist, and transparent hugepage support will need
further thought on whether split_huge_page() can deal with these
situations.

 mm/ksm.c    |   13 +++++++++++++
 mm/memory.c |   25 ++++++++++++++++---------
 mm/mmap.c   |    6 ++++++
 mm/rmap.c   |   23 ++++++++++++++++++++---
 4 files changed, 55 insertions(+), 12 deletions(-)


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 1/2] mm,migration: During fork(), wait for migration to end if migration PTE is encountered
  2010-04-26 22:37 ` Mel Gorman
@ 2010-04-26 22:37   ` Mel Gorman
  -1 siblings, 0 replies; 42+ messages in thread
From: Mel Gorman @ 2010-04-26 22:37 UTC (permalink / raw)
  To: Linux-MM, LKML
  Cc: Minchan Kim, KAMEZAWA Hiroyuki, Mel Gorman, Christoph Lameter,
	Andrea Arcangeli, Rik van Riel, Andrew Morton

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

At page migration, we replace a pte with a migration entry, which has a
format similar to a swap entry, and replace it with the real pfn at the
end of migration. But there is a race with fork()'s copy_page_range().
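
For reference, a migration entry reuses the swp_entry_t format: its type
records whether the original pte was writable and its offset holds the
page's pfn. Roughly, as in include/linux/swapops.h of this era (paraphrased
from memory, details may differ):

    static inline swp_entry_t make_migration_entry(struct page *page, int write)
    {
	    BUG_ON(!PageLocked(page));
	    /* type = read/write migration entry, offset = pfn of the page */
	    return swp_entry(write ? SWP_MIGRATION_WRITE : SWP_MIGRATION_READ,
			    page_to_pfn(page));
    }

    static inline int is_migration_entry(swp_entry_t entry)
    {
	    return unlikely(swp_type(entry) == SWP_MIGRATION_READ ||
			    swp_type(entry) == SWP_MIGRATION_WRITE);
    }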

Assume page migration on CPU A and fork() on CPU B. On CPU A, a page of
a process is under migration. On CPU B, the page's pte is being copied.

	CPUA			CPU B
				do_fork()
				copy_mm() (from process 1 to process2)
				insert new vma to mmap_list (if inode/anon_vma)
	pte_lock(process1)
	unmap a page
	insert migration_entry
	pte_unlock(process1)

	migrate page copy
				copy_page_range
	remap new page by rmap_walk()
	pte_lock(process2)
	found no pte.
	pte_unlock(process2)
				pte lock(process2)
				pte lock(process1)
				copy migration entry to process2
				pte unlock(process1)
				pte unlock(process2)
	pte_lock(process1)
	replace migration entry
	to new page's pte.
	pte_unlock(process1)

Some serialization is therefore necessary. IIUC, this is a very rare
event, but it is reproducible if a lot of migration is happening while the
following program runs in parallel.

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/wait.h>

    #define SIZE (24*1048576UL)
    #define CHILDREN 100
    int main()
    {
	    int i = 0;
	    pid_t pids[CHILDREN];
	    char *buf = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
			    MAP_PRIVATE|MAP_ANONYMOUS,
			    -1, 0);
	    if (buf == MAP_FAILED) {
		    perror("mmap");
		    exit(-1);
	    }

	    /* No children have been forked yet */
	    memset(pids, -1, sizeof(pids));

	    while (++i) {
		    int j = i % CHILDREN;

		    if (j == 0) {
			    printf("Waiting on children\n");
			    for (j = 0; j < CHILDREN; j++) {
				    memset(buf, i, SIZE);
				    if (pids[j] != -1)
					    waitpid(pids[j], NULL, 0);
			    }
			    j = 0;
		    }

		    if ((pids[j] = fork()) == 0) {
			    memset(buf, i, SIZE);
			    exit(EXIT_SUCCESS);
		    }
	    }

	    munmap(buf, SIZE);
    }

This change makes copy_page_range() wait for the end of migration: when a
migration entry is encountered, the page table locks are released and the
copy is retried until the entry has been replaced.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/memory.c |   24 +++++++++++++++---------
 1 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 833952d..36dadd4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -675,15 +675,8 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			}
 			if (likely(!non_swap_entry(entry)))
 				rss[MM_SWAPENTS]++;
-			else if (is_write_migration_entry(entry) &&
-					is_cow_mapping(vm_flags)) {
-				/*
-				 * COW mappings require pages in both parent
-				 * and child to be set to read.
-				 */
-				make_migration_entry_read(&entry);
-				pte = swp_entry_to_pte(entry);
-				set_pte_at(src_mm, addr, src_pte, pte);
+			else {
+				BUG();
 			}
 		}
 		goto out_set_pte;
@@ -760,6 +753,19 @@ again:
 			progress++;
 			continue;
 		}
+		if (unlikely(!pte_present(*src_pte) && !pte_file(*src_pte))) {
+			entry = pte_to_swp_entry(*src_pte);
+			if (is_migration_entry(entry)) {
+				/*
+				 * Because copying the pte races with the pte
+				 * rewriting done by migration, release the
+				 * lock and retry.
+				 */
+				progress = 0;
+				entry.val = 0;
+				break;
+			}
+		}
 		entry.val = copy_one_pte(dst_mm, src_mm, dst_pte, src_pte,
 							vma, addr, rss);
 		if (entry.val)
-- 
1.6.5


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
  2010-04-26 22:37 ` Mel Gorman
@ 2010-04-26 22:37   ` Mel Gorman
  -1 siblings, 0 replies; 42+ messages in thread
From: Mel Gorman @ 2010-04-26 22:37 UTC (permalink / raw)
  To: Linux-MM, LKML
  Cc: Minchan Kim, KAMEZAWA Hiroyuki, Mel Gorman, Christoph Lameter,
	Andrea Arcangeli, Rik van Riel, Andrew Morton

vma_adjust() updates anon VMA information without taking any locks. In
contrast, file-backed mappings use the i_mmap_lock, and this lack of
locking can result in races with page migration. During rmap_walk(),
vma_address() can return -EFAULT for an address that will soon be valid.
This leaves a dangling migration PTE behind which can later cause a BUG_ON
to trigger when the page is faulted in.

With the recent anon_vma changes, more than one anon_vma->lock may need to
be taken within an anon_vma_chain, but a second lock cannot simply be spun
on without risking deadlock. Instead, the rmap walker trylocks the other
anon_vma locks it encounters; if an attempt fails, the lock already held
is released and the operation is restarted.
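
For reference, the deadlock being avoided is the usual ABBA lock-ordering
problem. A hypothetical interleaving (my illustration, not from the
original posting):

	CPU A (rmap_walk)		CPU B (another walker)
	spin_lock(&anon_vma_1->lock)
					spin_lock(&anon_vma_2->lock)
	spin_lock(&anon_vma_2->lock)
	... spins forever ...
					spin_lock(&anon_vma_1->lock)
					... spins forever ...

Using spin_trylock() for the second lock means one of the two walkers
backs off and retries instead of spinning.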

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/ksm.c  |   13 +++++++++++++
 mm/mmap.c |    6 ++++++
 mm/rmap.c |   22 +++++++++++++++++++---
 3 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 3666d43..baa5b4d 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1674,9 +1674,22 @@ again:
 		spin_lock(&anon_vma->lock);
 		list_for_each_entry(vmac, &anon_vma->head, same_anon_vma) {
 			vma = vmac->vma;
+
+			/* See comment in mm/rmap.c#rmap_walk_anon on locking */
+			if (anon_vma != vma->anon_vma) {
+				if (!spin_trylock(&vma->anon_vma->lock)) {
+					spin_unlock(&anon_vma->lock);
+					goto again;
+				}
+			}
+
 			if (rmap_item->address < vma->vm_start ||
 			    rmap_item->address >= vma->vm_end)
 				continue;
+
+			if (anon_vma != vma->anon_vma)
+				spin_unlock(&vma->anon_vma->lock);
+
 			/*
 			 * Initially we examine only the vma which covers this
 			 * rmap_item; but later, if there is still work to do,
diff --git a/mm/mmap.c b/mm/mmap.c
index f90ea92..61d6f1d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -578,6 +578,9 @@ again:			remove_next = 1 + (end > next->vm_end);
 		}
 	}
 
+	if (vma->anon_vma)
+		spin_lock(&vma->anon_vma->lock);
+
 	if (root) {
 		flush_dcache_mmap_lock(mapping);
 		vma_prio_tree_remove(vma, root);
@@ -620,6 +623,9 @@ again:			remove_next = 1 + (end > next->vm_end);
 	if (mapping)
 		spin_unlock(&mapping->i_mmap_lock);
 
+	if (vma->anon_vma)
+		spin_unlock(&vma->anon_vma->lock);
+
 	if (remove_next) {
 		if (file) {
 			fput(file);
diff --git a/mm/rmap.c b/mm/rmap.c
index 85f203e..bc313a6 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1368,15 +1368,31 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
 	 * are holding mmap_sem. Users without mmap_sem are required to
 	 * take a reference count to prevent the anon_vma disappearing
 	 */
+retry:
 	anon_vma = page_anon_vma(page);
 	if (!anon_vma)
 		return ret;
 	spin_lock(&anon_vma->lock);
 	list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
 		struct vm_area_struct *vma = avc->vma;
-		unsigned long address = vma_address(page, vma);
-		if (address == -EFAULT)
-			continue;
+		unsigned long address;
+
+		/*
+		 * Guard against deadlocks by not spinning against
+		 * vma->anon_vma->lock. If contention is found, release our
+		 * lock and try again until VMA list can be traversed without
+		 * contention.
+		 */
+		if (anon_vma != vma->anon_vma) {
+			if (!spin_trylock(&vma->anon_vma->lock)) {
+				spin_unlock(&anon_vma->lock);
+				goto retry;
+			}
+		}
+		address = vma_address(page, vma);
+		if (anon_vma != vma->anon_vma)
+			spin_unlock(&vma->anon_vma->lock);
+
 		ret = rmap_one(page, vma, address, arg);
 		if (ret != SWAP_AGAIN)
 			break;
-- 
1.6.5


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/2] Fix migration races in rmap_walk()
  2010-04-26 22:37 ` Mel Gorman
@ 2010-04-26 23:04   ` Andrea Arcangeli
  -1 siblings, 0 replies; 42+ messages in thread
From: Andrea Arcangeli @ 2010-04-26 23:04 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, Minchan Kim, KAMEZAWA Hiroyuki,
	Christoph Lameter, Rik van Riel, Andrew Morton

On Mon, Apr 26, 2010 at 11:37:56PM +0100, Mel Gorman wrote:
> The other issues raised about expand_downwards will need to be re-examined to
> see if they still exist and transparent hugepage support will need further
> thinking to see if split_huge_page() can deal with these situations.

So patch 1 is for aa.git too, and patch 2 is only for mainline with
the new anon-vma changes (patch 2 not needed in current aa.git, and if
I apply it, it'll deadlock so...) right?

split_huge_page is somewhat simpler and more strict in its checking
than migrate.c in this respect, and yes patch 2 will also need to be
extended to cover split_huge_page the moment I stop backing out the
new anon-vma code (but it won't be any different, whatever works for
migrate will also work for split_huge_page later).

For now I'm much more interested in patch 1. I'll leave patch 2 to
mainline digestion and check it later, hoping to find all issues fixed by
the time transparent hugepage gets merged.

About patch 1: it's very interesting, because I looked at the race between
fork and migrate yesterday and didn't see any issue, but I'm going to read
your patch 1 in detail now to understand what problem you're fixing.

Good you posted this fast, so I can try to help ;)
Andrea

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
@ 2010-04-26 23:05     ` Minchan Kim
  0 siblings, 0 replies; 42+ messages in thread
From: Minchan Kim @ 2010-04-26 23:05 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, KAMEZAWA Hiroyuki, Christoph Lameter,
	Andrea Arcangeli, Rik van Riel, Andrew Morton

On Tue, Apr 27, 2010 at 7:37 AM, Mel Gorman <mel@csn.ul.ie> wrote:
> vma_adjust() is updating anon VMA information without any locks taken.
> In contrast, file-backed mappings use the i_mmap_lock and this lack of
> locking can result in races with page migration. During rmap_walk(),
> vma_address() can return -EFAULT for an address that will soon be valid.
> This leaves a dangling migration PTE behind which can later cause a BUG_ON
> to trigger when the page is faulted in.
>
> With the recent anon_vma changes, there can be more than one anon_vma->lock
> that can be taken in a anon_vma_chain but a second lock cannot be spinned
> upon in case of deadlock. Instead, the rmap walker tries to take locks of
> different anon_vma's. If the attempt fails, the operation is restarted.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
>  mm/ksm.c  |   13 +++++++++++++
>  mm/mmap.c |    6 ++++++
>  mm/rmap.c |   22 +++++++++++++++++++---
>  3 files changed, 38 insertions(+), 3 deletions(-)
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 3666d43..baa5b4d 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1674,9 +1674,22 @@ again:
>                spin_lock(&anon_vma->lock);
>                list_for_each_entry(vmac, &anon_vma->head, same_anon_vma) {
>                        vma = vmac->vma;
> +
> +                       /* See comment in mm/rmap.c#rmap_walk_anon on locking */
> +                       if (anon_vma != vma->anon_vma) {
> +                               if (!spin_trylock(&vma->anon_vma->lock)) {
> +                                       spin_unlock(&anon_vma->lock);
> +                                       goto again;
> +                               }
> +                       }
> +
>                        if (rmap_item->address < vma->vm_start ||
>                            rmap_item->address >= vma->vm_end)
>                                continue;
> +
> +                       if (anon_vma != vma->anon_vma)
> +                               spin_unlock(&vma->anon_vma->lock);
> +
>                        /*
>                         * Initially we examine only the vma which covers this
>                         * rmap_item; but later, if there is still work to do,
> diff --git a/mm/mmap.c b/mm/mmap.c
> index f90ea92..61d6f1d 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -578,6 +578,9 @@ again:                      remove_next = 1 + (end > next->vm_end);
>                }
>        }
>
> +       if (vma->anon_vma)
> +               spin_lock(&vma->anon_vma->lock);
> +
>        if (root) {
>                flush_dcache_mmap_lock(mapping);
>                vma_prio_tree_remove(vma, root);
> @@ -620,6 +623,9 @@ again:                      remove_next = 1 + (end > next->vm_end);
>        if (mapping)
>                spin_unlock(&mapping->i_mmap_lock);
>
> +       if (vma->anon_vma)
> +               spin_unlock(&vma->anon_vma->lock);
> +
>        if (remove_next) {
>                if (file) {
>                        fput(file);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 85f203e..bc313a6 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1368,15 +1368,31 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
>         * are holding mmap_sem. Users without mmap_sem are required to
>         * take a reference count to prevent the anon_vma disappearing
>         */
> +retry:
>        anon_vma = page_anon_vma(page);
>        if (!anon_vma)
>                return ret;
>        spin_lock(&anon_vma->lock);
>        list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
>                struct vm_area_struct *vma = avc->vma;
> -               unsigned long address = vma_address(page, vma);
> -               if (address == -EFAULT)
> -                       continue;
> +               unsigned long address;
> +
> +               /*
> +                * Guard against deadlocks by not spinning against
> +                * vma->anon_vma->lock. If contention is found, release our
> +                * lock and try again until VMA list can be traversed without
> +                * contention.
> +                */
> +               if (anon_vma != vma->anon_vma) {
> +                       if (!spin_trylock(&vma->anon_vma->lock)) {
> +                               spin_unlock(&anon_vma->lock);
> +                               goto retry;
> +                       }
> +               }
> +               address = vma_address(page, vma);
> +               if (anon_vma != vma->anon_vma)
> +                       spin_unlock(&vma->anon_vma->lock);
> +

if (address == -EFAULT)
        continue;

>                ret = rmap_one(page, vma, address, arg);
>                if (ret != SWAP_AGAIN)
>                        break;
> --
> 1.6.5
>
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the  wrong VMA information
  2010-04-26 22:37   ` Mel Gorman
@ 2010-04-26 23:15     ` Minchan Kim
  -1 siblings, 0 replies; 42+ messages in thread
From: Minchan Kim @ 2010-04-26 23:15 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, KAMEZAWA Hiroyuki, Christoph Lameter,
	Andrea Arcangeli, Rik van Riel, Andrew Morton

On Tue, Apr 27, 2010 at 7:37 AM, Mel Gorman <mel@csn.ul.ie> wrote:
> vma_adjust() is updating anon VMA information without any locks taken.
> In contrast, file-backed mappings use the i_mmap_lock and this lack of
> locking can result in races with page migration. During rmap_walk(),
> vma_address() can return -EFAULT for an address that will soon be valid.
> This leaves a dangling migration PTE behind which can later cause a BUG_ON
> to trigger when the page is faulted in.
>
> With the recent anon_vma changes, there can be more than one anon_vma->lock
> that can be taken in a anon_vma_chain but a second lock cannot be spinned
> upon in case of deadlock. Instead, the rmap walker tries to take locks of
> different anon_vma's. If the attempt fails, the operation is restarted.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>

Actually, I am worried about a rollback approach like this. If we rarely
need to serialize anon_vmas, it is good enough, but I am not sure how
often we will need locking of anon_vmas like this. Whenever we do, we
would have to use a rollback approach like this again, and in my opinion
that's not good.

Rik, can't we make the anon_vma locking simpler?
Anyway, Mel's patch is the best option for now.

I hope the anon_vma locking can be improved without a rollback approach
in the near future.

Thanks, Mel.


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] mm,migration: During fork(), wait for migration to end if migration PTE is encountered
@ 2010-04-26 23:28     ` Minchan Kim
  0 siblings, 0 replies; 42+ messages in thread
From: Minchan Kim @ 2010-04-26 23:28 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, KAMEZAWA Hiroyuki, Christoph Lameter,
	Andrea Arcangeli, Rik van Riel, Andrew Morton

On Tue, Apr 27, 2010 at 7:37 AM, Mel Gorman <mel@csn.ul.ie> wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> At page migration, we replace pte with migration_entry, which has
> similar format as swap_entry and replace it with real pfn at the
> end of migration. But there is a race with fork()'s copy_page_range().
>
> Assume page migraion on CPU A and fork in CPU B. On CPU A, a page of
> a process is under migration. On CPU B, a page's pte is under copy.
>
>        CPUA                    CPU B
>                                do_fork()
>                                copy_mm() (from process 1 to process2)
>                                insert new vma to mmap_list (if inode/anon_vma)
>        pte_lock(process1)
>        unmap a page
>        insert migration_entry
>        pte_unlock(process1)
>
>        migrate page copy
>                                copy_page_range
>        remap new page by rmap_walk()
>        pte_lock(process2)
>        found no pte.
>        pte_unlock(process2)
>                                pte lock(process2)
>                                pte lock(process1)
>                                copy migration entry to process2
>                                pte unlock(process1)
>                                pte unlokc(process2)
>        pte_lock(process1)
>        replace migration entry
>        to new page's pte.
>        pte_unlock(process1)
>
> Then, some serialization is necessary. IIUC, this is very rare event but
> it is reproducible if a lot of migration is happening a lot with the
> following program running in parallel.
>
>    #include <stdio.h>
>    #include <string.h>
>    #include <stdlib.h>
>    #include <sys/mman.h>
>
>    #define SIZE (24*1048576UL)
>    #define CHILDREN 100
>    int main()
>    {
>            int i = 0;
>            pid_t pids[CHILDREN];
>            char *buf = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
>                            MAP_PRIVATE|MAP_ANONYMOUS,
>                            0, 0);
>            if (buf == MAP_FAILED) {
>                    perror("mmap");
>                    exit(-1);
>            }
>
>            while (++i) {
>                    int j = i % CHILDREN;
>
>                    if (j == 0) {
>                            printf("Waiting on children\n");
>                            for (j = 0; j < CHILDREN; j++) {
>                                    memset(buf, i, SIZE);
>                                    if (pids[j] != -1)
>                                            waitpid(pids[j], NULL, 0);
>                            }
>                            j = 0;
>                    }
>
>                    if ((pids[j] = fork()) == 0) {
>                            memset(buf, i, SIZE);
>                            exit(EXIT_SUCCESS);
>                    }
>            }
>
>            munmap(buf, SIZE);
>    }
>
> copy_page_range() can wait for the end of migration.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
  2010-04-26 22:37   ` Mel Gorman
@ 2010-04-27  0:07     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 42+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-04-27  0:07 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, Minchan Kim, Christoph Lameter, Andrea Arcangeli,
	Rik van Riel, Andrew Morton

On Mon, 26 Apr 2010 23:37:58 +0100
Mel Gorman <mel@csn.ul.ie> wrote:

> vma_adjust() is updating anon VMA information without any locks taken.
> In contrast, file-backed mappings use the i_mmap_lock and this lack of
> locking can result in races with page migration. During rmap_walk(),
> vma_address() can return -EFAULT for an address that will soon be valid.
> This leaves a dangling migration PTE behind which can later cause a BUG_ON
> to trigger when the page is faulted in.
> 
> With the recent anon_vma changes, there can be more than one anon_vma->lock
> that can be taken in a anon_vma_chain but a second lock cannot be spinned
> upon in case of deadlock. Instead, the rmap walker tries to take locks of
> different anon_vma's. If the attempt fails, the operation is restarted.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>

Ok, always acquiring vma->anon_vma->lock sounds very safe
(but slow).

I'll test this, too.

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>



> ---
>  mm/ksm.c  |   13 +++++++++++++
>  mm/mmap.c |    6 ++++++
>  mm/rmap.c |   22 +++++++++++++++++++---
>  3 files changed, 38 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 3666d43..baa5b4d 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1674,9 +1674,22 @@ again:
>  		spin_lock(&anon_vma->lock);
>  		list_for_each_entry(vmac, &anon_vma->head, same_anon_vma) {
>  			vma = vmac->vma;
> +
> +			/* See comment in mm/rmap.c#rmap_walk_anon on locking */
> +			if (anon_vma != vma->anon_vma) {
> +				if (!spin_trylock(&vma->anon_vma->lock)) {
> +					spin_unlock(&anon_vma->lock);
> +					goto again;
> +				}
> +			}
> +
>  			if (rmap_item->address < vma->vm_start ||
>  			    rmap_item->address >= vma->vm_end)
>  				continue;
> +
> +			if (anon_vma != vma->anon_vma)
> +				spin_unlock(&vma->anon_vma->lock);
> +
>  			/*
>  			 * Initially we examine only the vma which covers this
>  			 * rmap_item; but later, if there is still work to do,
> diff --git a/mm/mmap.c b/mm/mmap.c
> index f90ea92..61d6f1d 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -578,6 +578,9 @@ again:			remove_next = 1 + (end > next->vm_end);
>  		}
>  	}
>  
> +	if (vma->anon_vma)
> +		spin_lock(&vma->anon_vma->lock);
> +
>  	if (root) {
>  		flush_dcache_mmap_lock(mapping);
>  		vma_prio_tree_remove(vma, root);
> @@ -620,6 +623,9 @@ again:			remove_next = 1 + (end > next->vm_end);
>  	if (mapping)
>  		spin_unlock(&mapping->i_mmap_lock);
>  
> +	if (vma->anon_vma)
> +		spin_unlock(&vma->anon_vma->lock);
> +
>  	if (remove_next) {
>  		if (file) {
>  			fput(file);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 85f203e..bc313a6 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1368,15 +1368,31 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
>  	 * are holding mmap_sem. Users without mmap_sem are required to
>  	 * take a reference count to prevent the anon_vma disappearing
>  	 */
> +retry:
>  	anon_vma = page_anon_vma(page);
>  	if (!anon_vma)
>  		return ret;
>  	spin_lock(&anon_vma->lock);
>  	list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
>  		struct vm_area_struct *vma = avc->vma;
> -		unsigned long address = vma_address(page, vma);
> -		if (address == -EFAULT)
> -			continue;
> +		unsigned long address;
> +
> +		/*
> +		 * Guard against deadlocks by not spinning against
> +		 * vma->anon_vma->lock. If contention is found, release our
> +		 * lock and try again until VMA list can be traversed without
> +		 * contention.
> +		 */
> +		if (anon_vma != vma->anon_vma) {
> +			if (!spin_trylock(&vma->anon_vma->lock)) {
> +				spin_unlock(&anon_vma->lock);
> +				goto retry;
> +			}
> +		}
> +		address = vma_address(page, vma);
> +		if (anon_vma != vma->anon_vma)
> +			spin_unlock(&vma->anon_vma->lock);
> +
>  		ret = rmap_one(page, vma, address, arg);
>  		if (ret != SWAP_AGAIN)
>  			break;
> -- 
> 1.6.5
> 
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] mm,migration: During fork(), wait for migration to end if migration PTE is encountered
  2010-04-26 22:37   ` Mel Gorman
@ 2010-04-27  0:08     ` Rik van Riel
  -1 siblings, 0 replies; 42+ messages in thread
From: Rik van Riel @ 2010-04-27  0:08 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, Minchan Kim, KAMEZAWA Hiroyuki,
	Christoph Lameter, Andrea Arcangeli, Andrew Morton

On 04/26/2010 06:37 PM, Mel Gorman wrote:
> From: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>
> From: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>
> At page migration, we replace pte with migration_entry, which has
> similar format as swap_entry and replace it with real pfn at the
> end of migration. But there is a race with fork()'s copy_page_range().

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the  wrong VMA information
  2010-04-26 23:05     ` Minchan Kim
@ 2010-04-27  0:30       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 42+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-04-27  0:30 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Mel Gorman, Linux-MM, LKML, Christoph Lameter, Andrea Arcangeli,
	Rik van Riel, Andrew Morton

On Tue, 27 Apr 2010 08:05:26 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> On Tue, Apr 27, 2010 at 7:37 AM, Mel Gorman <mel@csn.ul.ie> wrote:
> > vma_adjust() is updating anon VMA information without any locks taken.
> > In contrast, file-backed mappings use the i_mmap_lock and this lack of
> > locking can result in races with page migration. During rmap_walk(),
> > vma_address() can return -EFAULT for an address that will soon be valid.
> > This leaves a dangling migration PTE behind which can later cause a BUG_ON
> > to trigger when the page is faulted in.
> >
> > With the recent anon_vma changes, there can be more than one anon_vma->lock
> > that can be taken in a anon_vma_chain but a second lock cannot be spinned
> > upon in case of deadlock. Instead, the rmap walker tries to take locks of
> > different anon_vma's. If the attempt fails, the operation is restarted.
> >
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > ---
> >  mm/ksm.c  |   13 +++++++++++++
> >  mm/mmap.c |    6 ++++++
> >  mm/rmap.c |   22 +++++++++++++++++++---
> >  3 files changed, 38 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/ksm.c b/mm/ksm.c
> > index 3666d43..baa5b4d 100644
> > --- a/mm/ksm.c
> > +++ b/mm/ksm.c
> > @@ -1674,9 +1674,22 @@ again:
> >                spin_lock(&anon_vma->lock);
> >                list_for_each_entry(vmac, &anon_vma->head, same_anon_vma) {
> >                        vma = vmac->vma;
> > +
> > +                       /* See comment in mm/rmap.c#rmap_walk_anon on locking */
> > +                       if (anon_vma != vma->anon_vma) {
> > +                               if (!spin_trylock(&vma->anon_vma->lock)) {
> > +                                       spin_unlock(&anon_vma->lock);
> > +                                       goto again;
> > +                               }
> > +                       }
> > +
> >                        if (rmap_item->address < vma->vm_start ||
> >                            rmap_item->address >= vma->vm_end)
> >                                continue;
> > +
> > +                       if (anon_vma != vma->anon_vma)
> > +                               spin_unlock(&vma->anon_vma->lock);
> > +
> >                        /*
> >                         * Initially we examine only the vma which covers this
> >                         * rmap_item; but later, if there is still work to do,
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index f90ea92..61d6f1d 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -578,6 +578,9 @@ again:                      remove_next = 1 + (end > next->vm_end);
> >                }
> >        }
> >
> > +       if (vma->anon_vma)
> > +               spin_lock(&vma->anon_vma->lock);
> > +
> >        if (root) {
> >                flush_dcache_mmap_lock(mapping);
> >                vma_prio_tree_remove(vma, root);
> > @@ -620,6 +623,9 @@ again:                      remove_next = 1 + (end > next->vm_end);
> >        if (mapping)
> >                spin_unlock(&mapping->i_mmap_lock);
> >
> > +       if (vma->anon_vma)
> > +               spin_unlock(&vma->anon_vma->lock);
> > +
> >        if (remove_next) {
> >                if (file) {
> >                        fput(file);
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 85f203e..bc313a6 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1368,15 +1368,31 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
> >         * are holding mmap_sem. Users without mmap_sem are required to
> >         * take a reference count to prevent the anon_vma disappearing
> >         */
> > +retry:
> >        anon_vma = page_anon_vma(page);
> >        if (!anon_vma)
> >                return ret;
> >        spin_lock(&anon_vma->lock);
> >        list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
> >                struct vm_area_struct *vma = avc->vma;
> > -               unsigned long address = vma_address(page, vma);
> > -               if (address == -EFAULT)
> > -                       continue;
> > +               unsigned long address;
> > +
> > +               /*
> > +                * Guard against deadlocks by not spinning against
> > +                * vma->anon_vma->lock. If contention is found, release our
> > +                * lock and try again until VMA list can be traversed without
> > +                * contention.
> > +                */
> > +               if (anon_vma != vma->anon_vma) {
> > +                       if (!spin_trylock(&vma->anon_vma->lock)) {
> > +                               spin_unlock(&anon_vma->lock);
> > +                               goto retry;
> > +                       }
> > +               }
> > +               address = vma_address(page, vma);
> > +               if (anon_vma != vma->anon_vma)
> > +                       spin_unlock(&vma->anon_vma->lock);
> > +
> 
> if (address == -EFAULT)
>         continue;
> 
Yes, thank you for pointing that out.

-Kame


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
  2010-04-26 22:37   ` Mel Gorman
@ 2010-04-27  0:30     ` Rik van Riel
  -1 siblings, 0 replies; 42+ messages in thread
From: Rik van Riel @ 2010-04-27  0:30 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, Minchan Kim, KAMEZAWA Hiroyuki,
	Christoph Lameter, Andrea Arcangeli, Andrew Morton

On 04/26/2010 06:37 PM, Mel Gorman wrote:

> diff --git a/mm/rmap.c b/mm/rmap.c
> index 85f203e..bc313a6 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1368,15 +1368,31 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
>   	 * are holding mmap_sem. Users without mmap_sem are required to
>   	 * take a reference count to prevent the anon_vma disappearing
>   	 */
> +retry:
>   	anon_vma = page_anon_vma(page);
>   	if (!anon_vma)
>   		return ret;
>   	spin_lock(&anon_vma->lock);
>   	list_for_each_entry(avc,&anon_vma->head, same_anon_vma) {
>   		struct vm_area_struct *vma = avc->vma;
> -		unsigned long address = vma_address(page, vma);
> -		if (address == -EFAULT)
> -			continue;
> +		unsigned long address;
> +
> +		/*
> +		 * Guard against deadlocks by not spinning against
> +		 * vma->anon_vma->lock. If contention is found, release our
> +		 * lock and try again until VMA list can be traversed without
> +		 * contention.
> +		 */
> +		if (anon_vma != vma->anon_vma) {
> +			if (!spin_trylock(&vma->anon_vma->lock)) {
> +				spin_unlock(&anon_vma->lock);
> +				goto retry;
> +			}
> +		}

If you're part way down the list, surely you'll need to
unlock multiple anon_vmas here before going to retry?

Otherwise you could forget to unlock one that you already
locked, and live-lock here.

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
  2010-04-27  0:30     ` Rik van Riel
@ 2010-04-27  0:31       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 42+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-04-27  0:31 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Mel Gorman, Linux-MM, LKML, Minchan Kim, Christoph Lameter,
	Andrea Arcangeli, Andrew Morton

On Mon, 26 Apr 2010 20:30:41 -0400
Rik van Riel <riel@redhat.com> wrote:

> On 04/26/2010 06:37 PM, Mel Gorman wrote:
> 
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 85f203e..bc313a6 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1368,15 +1368,31 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
> >   	 * are holding mmap_sem. Users without mmap_sem are required to
> >   	 * take a reference count to prevent the anon_vma disappearing
> >   	 */
> > +retry:
> >   	anon_vma = page_anon_vma(page);
> >   	if (!anon_vma)
> >   		return ret;
> >   	spin_lock(&anon_vma->lock);
> >   	list_for_each_entry(avc,&anon_vma->head, same_anon_vma) {
> >   		struct vm_area_struct *vma = avc->vma;
> > -		unsigned long address = vma_address(page, vma);
> > -		if (address == -EFAULT)
> > -			continue;
> > +		unsigned long address;
> > +
> > +		/*
> > +		 * Guard against deadlocks by not spinning against
> > +		 * vma->anon_vma->lock. If contention is found, release our
> > +		 * lock and try again until VMA list can be traversed without
> > +		 * contention.
> > +		 */
> > +		if (anon_vma != vma->anon_vma) {
> > +			if (!spin_trylock(&vma->anon_vma->lock)) {
> > +				spin_unlock(&anon_vma->lock);
> > +				goto retry;
> > +			}
> > +		}
> 
> If you're part way down the list, surely you'll need to
> unlock multiple anon_vmas here before going to retry?
> 
vma->anon_vma->lock is released after vma_address().
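
In other words, the walker never holds more than one extra anon_vma lock at a
time, so there is nothing further to unlock before a retry. A minimal sketch of
one loop iteration, simplified from the patch above:

	list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
		struct vm_area_struct *vma = avc->vma;

		/* take the second lock only when it differs from the held one */
		if (anon_vma != vma->anon_vma &&
		    !spin_trylock(&vma->anon_vma->lock)) {
			spin_unlock(&anon_vma->lock);	/* the only lock still held */
			goto retry;
		}

		address = vma_address(page, vma);

		/* the extra lock is dropped again before the next list entry */
		if (anon_vma != vma->anon_vma)
			spin_unlock(&vma->anon_vma->lock);
	}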

-Kame


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
  2010-04-27  0:31       ` KAMEZAWA Hiroyuki
@ 2010-04-27  2:13         ` Rik van Riel
  -1 siblings, 0 replies; 42+ messages in thread
From: Rik van Riel @ 2010-04-27  2:13 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mel Gorman, Linux-MM, LKML, Minchan Kim, Christoph Lameter,
	Andrea Arcangeli, Andrew Morton

On 04/26/2010 08:31 PM, KAMEZAWA Hiroyuki wrote:

>> If you're part way down the list, surely you'll need to
>> unlock multiple anon_vmas here before going to retry?
>>
> vma->anon_vma->lock is released after vma_address().

Doh, never mind.  Too much code in my mind at once...

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
  2010-04-27  0:07     ` KAMEZAWA Hiroyuki
@ 2010-04-27  3:50       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 42+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-04-27  3:50 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mel Gorman, Linux-MM, LKML, Minchan Kim, Christoph Lameter,
	Andrea Arcangeli, Rik van Riel, Andrew Morton

On Tue, 27 Apr 2010 09:07:06 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Mon, 26 Apr 2010 23:37:58 +0100
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > vma_adjust() is updating anon VMA information without any locks taken.
> > In contrast, file-backed mappings use the i_mmap_lock and this lack of
> > locking can result in races with page migration. During rmap_walk(),
> > vma_address() can return -EFAULT for an address that will soon be valid.
> > This leaves a dangling migration PTE behind which can later cause a BUG_ON
> > to trigger when the page is faulted in.
> > 
> > With the recent anon_vma changes, there can be more than one anon_vma->lock
> > that can be taken in a anon_vma_chain but a second lock cannot be spinned
> > upon in case of deadlock. Instead, the rmap walker tries to take locks of
> > different anon_vma's. If the attempt fails, the operation is restarted.
> > 
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> 
> Ok, acquiring vma->anon_vma->spin_lock always sounds very safe.
> (but slow.)
> 
> I'll test this, too.
> 
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 

Sorry. reproduced. It seems the same bug before patch. 
mapcount 1 -> unmap -> remap -> mapcount 0. And it was SwapCache.


Thanks,
-Kame


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
  2010-04-27  3:50       ` KAMEZAWA Hiroyuki
@ 2010-04-27  4:03         ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 42+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-04-27  4:03 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mel Gorman, Linux-MM, LKML, Minchan Kim, Christoph Lameter,
	Andrea Arcangeli, Rik van Riel, Andrew Morton

On Tue, 27 Apr 2010 12:50:40 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Tue, 27 Apr 2010 09:07:06 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > On Mon, 26 Apr 2010 23:37:58 +0100
> > Mel Gorman <mel@csn.ul.ie> wrote:
> > 
> > > vma_adjust() is updating anon VMA information without any locks taken.
> > > In contrast, file-backed mappings use the i_mmap_lock and this lack of
> > > locking can result in races with page migration. During rmap_walk(),
> > > vma_address() can return -EFAULT for an address that will soon be valid.
> > > This leaves a dangling migration PTE behind which can later cause a BUG_ON
> > > to trigger when the page is faulted in.
> > > 
> > > With the recent anon_vma changes, there can be more than one anon_vma->lock
> > > that can be taken in a anon_vma_chain but a second lock cannot be spinned
> > > upon in case of deadlock. Instead, the rmap walker tries to take locks of
> > > different anon_vma's. If the attempt fails, the operation is restarted.
> > > 
> > > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > 
> > Ok, acquiring vma->anon_vma->spin_lock always sounds very safe.
> > (but slow.)
> > 
> > I'll test this, too.
> > 
> > Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> 
> Sorry. reproduced. It seems the same bug before patch. 
> mapcount 1 -> unmap -> remap -> mapcount 0. And it was SwapCache.
> 
Sorry again. It was _not_ SwapCache. Just SwapBacked.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
  2010-04-27  3:50       ` KAMEZAWA Hiroyuki
@ 2010-04-27  8:59         ` Mel Gorman
  -1 siblings, 0 replies; 42+ messages in thread
From: Mel Gorman @ 2010-04-27  8:59 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Linux-MM, LKML, Minchan Kim, Christoph Lameter, Andrea Arcangeli,
	Rik van Riel, Andrew Morton

On Tue, Apr 27, 2010 at 12:50:40PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 27 Apr 2010 09:07:06 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > On Mon, 26 Apr 2010 23:37:58 +0100
> > Mel Gorman <mel@csn.ul.ie> wrote:
> > 
> > > vma_adjust() is updating anon VMA information without any locks taken.
> > > In contrast, file-backed mappings use the i_mmap_lock and this lack of
> > > locking can result in races with page migration. During rmap_walk(),
> > > vma_address() can return -EFAULT for an address that will soon be valid.
> > > This leaves a dangling migration PTE behind which can later cause a BUG_ON
> > > to trigger when the page is faulted in.
> > > 
> > > With the recent anon_vma changes, there can be more than one anon_vma->lock
> > > that can be taken in a anon_vma_chain but a second lock cannot be spinned
> > > upon in case of deadlock. Instead, the rmap walker tries to take locks of
> > > different anon_vma's. If the attempt fails, the operation is restarted.
> > > 
> > > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > 
> > Ok, acquiring vma->anon_vma->spin_lock always sounds very safe.
> > (but slow.)
> > 
> > I'll test this, too.
> > 
> > Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> 
> Sorry. reproduced. It seems the same bug before patch. 
> mapcount 1 -> unmap -> remap -> mapcount 0. And it was SwapCache.
> 

Same here, reproduced after 18 hours.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
  2010-04-27  8:59         ` Mel Gorman
@ 2010-04-27  9:09           ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 42+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-04-27  9:09 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, Minchan Kim, Christoph Lameter, Andrea Arcangeli,
	Rik van Riel, Andrew Morton

On Tue, 27 Apr 2010 09:59:51 +0100
Mel Gorman <mel@csn.ul.ie> wrote:

> On Tue, Apr 27, 2010 at 12:50:40PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Tue, 27 Apr 2010 09:07:06 +0900
> > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > 
> > > On Mon, 26 Apr 2010 23:37:58 +0100
> > > Mel Gorman <mel@csn.ul.ie> wrote:
> > > 
> > > > vma_adjust() is updating anon VMA information without any locks taken.
> > > > In contrast, file-backed mappings use the i_mmap_lock and this lack of
> > > > locking can result in races with page migration. During rmap_walk(),
> > > > vma_address() can return -EFAULT for an address that will soon be valid.
> > > > This leaves a dangling migration PTE behind which can later cause a BUG_ON
> > > > to trigger when the page is faulted in.
> > > > 
> > > > With the recent anon_vma changes, there can be more than one anon_vma->lock
> > > > that can be taken in a anon_vma_chain but a second lock cannot be spinned
> > > > upon in case of deadlock. Instead, the rmap walker tries to take locks of
> > > > different anon_vma's. If the attempt fails, the operation is restarted.
> > > > 
> > > > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > > 
> > > Ok, acquiring vma->anon_vma->spin_lock always sounds very safe.
> > > (but slow.)
> > > 
> > > I'll test this, too.
> > > 
> > > Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > 
> > 
> > Sorry. reproduced. It seems the same bug before patch. 
> > mapcount 1 -> unmap -> remap -> mapcount 0. And it was SwapCache.
> > 
> 
> Same here, reproduced after 18 hours.
> 
Hmm. It seems rmap_one() is called, so the race is not in vma_address()
but in remap_migration_pte().
I added more hooks for debugging, but I have not reproduced it yet.

Then again, it seems strange to have a race in remap_migration_pte() at all,
so I doubt my debug code, too ;)

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
  2010-04-26 23:05     ` Minchan Kim
@ 2010-04-27  9:17       ` Mel Gorman
  -1 siblings, 0 replies; 42+ messages in thread
From: Mel Gorman @ 2010-04-27  9:17 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Linux-MM, LKML, KAMEZAWA Hiroyuki, Christoph Lameter,
	Andrea Arcangeli, Rik van Riel, Andrew Morton

On Tue, Apr 27, 2010 at 08:05:26AM +0900, Minchan Kim wrote:
> On Tue, Apr 27, 2010 at 7:37 AM, Mel Gorman <mel@csn.ul.ie> wrote:
> > vma_adjust() is updating anon VMA information without any locks taken.
> > In contrast, file-backed mappings use the i_mmap_lock and this lack of
> > locking can result in races with page migration. During rmap_walk(),
> > vma_address() can return -EFAULT for an address that will soon be valid.
> > This leaves a dangling migration PTE behind which can later cause a BUG_ON
> > to trigger when the page is faulted in.
> >
> > With the recent anon_vma changes, there can be more than one anon_vma->lock
> > that can be taken in a anon_vma_chain but a second lock cannot be spinned
> > upon in case of deadlock. Instead, the rmap walker tries to take locks of
> > different anon_vma's. If the attempt fails, the operation is restarted.
> >
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > ---
> >  mm/ksm.c  |   13 +++++++++++++
> >  mm/mmap.c |    6 ++++++
> >  mm/rmap.c |   22 +++++++++++++++++++---
> >  3 files changed, 38 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/ksm.c b/mm/ksm.c
> > index 3666d43..baa5b4d 100644
> > --- a/mm/ksm.c
> > +++ b/mm/ksm.c
> > @@ -1674,9 +1674,22 @@ again:
> >                spin_lock(&anon_vma->lock);
> >                list_for_each_entry(vmac, &anon_vma->head, same_anon_vma) {
> >                        vma = vmac->vma;
> > +
> > +                       /* See comment in mm/rmap.c#rmap_walk_anon on locking */
> > +                       if (anon_vma != vma->anon_vma) {
> > +                               if (!spin_trylock(&vma->anon_vma->lock)) {
> > +                                       spin_unlock(&anon_vma->lock);
> > +                                       goto again;
> > +                               }
> > +                       }
> > +
> >                        if (rmap_item->address < vma->vm_start ||
> >                            rmap_item->address >= vma->vm_end)
> >                                continue;
> > +
> > +                       if (anon_vma != vma->anon_vma)
> > +                               spin_unlock(&vma->anon_vma->lock);
> > +
> >                        /*
> >                         * Initially we examine only the vma which covers this
> >                         * rmap_item; but later, if there is still work to do,
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index f90ea92..61d6f1d 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -578,6 +578,9 @@ again:                      remove_next = 1 + (end > next->vm_end);
> >                }
> >        }
> >
> > +       if (vma->anon_vma)
> > +               spin_lock(&vma->anon_vma->lock);
> > +
> >        if (root) {
> >                flush_dcache_mmap_lock(mapping);
> >                vma_prio_tree_remove(vma, root);
> > @@ -620,6 +623,9 @@ again:                      remove_next = 1 + (end > next->vm_end);
> >        if (mapping)
> >                spin_unlock(&mapping->i_mmap_lock);
> >
> > +       if (vma->anon_vma)
> > +               spin_unlock(&vma->anon_vma->lock);
> > +
> >        if (remove_next) {
> >                if (file) {
> >                        fput(file);
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 85f203e..bc313a6 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1368,15 +1368,31 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
> >         * are holding mmap_sem. Users without mmap_sem are required to
> >         * take a reference count to prevent the anon_vma disappearing
> >         */
> > +retry:
> >        anon_vma = page_anon_vma(page);
> >        if (!anon_vma)
> >                return ret;
> >        spin_lock(&anon_vma->lock);
> >        list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
> >                struct vm_area_struct *vma = avc->vma;
> > -               unsigned long address = vma_address(page, vma);
> > -               if (address == -EFAULT)
> > -                       continue;
> > +               unsigned long address;
> > +
> > +               /*
> > +                * Guard against deadlocks by not spinning against
> > +                * vma->anon_vma->lock. If contention is found, release our
> > +                * lock and try again until VMA list can be traversed without
> > +                * contention.
> > +                */
> > +               if (anon_vma != vma->anon_vma) {
> > +                       if (!spin_trylock(&vma->anon_vma->lock)) {
> > +                               spin_unlock(&anon_vma->lock);
> > +                               goto retry;
> > +                       }
> > +               }
> > +               address = vma_address(page, vma);
> > +               if (anon_vma != vma->anon_vma)
> > +                       spin_unlock(&vma->anon_vma->lock);
> > +
> 
> if (address == -EFAULT)
>         continue;
> 

Correct. Thanks.

> >                ret = rmap_one(page, vma, address, arg);
> >                if (ret != SWAP_AGAIN)
> >                        break;
> > --
> > 1.6.5
> >
> >
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
  2010-04-27  9:09           ` KAMEZAWA Hiroyuki
@ 2010-04-27 10:29             ` Mel Gorman
  -1 siblings, 0 replies; 42+ messages in thread
From: Mel Gorman @ 2010-04-27 10:29 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Linux-MM, LKML, Minchan Kim, Christoph Lameter, Andrea Arcangeli,
	Rik van Riel, Andrew Morton

On Tue, Apr 27, 2010 at 06:09:49PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 27 Apr 2010 09:59:51 +0100
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > On Tue, Apr 27, 2010 at 12:50:40PM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Tue, 27 Apr 2010 09:07:06 +0900
> > > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > 
> > > > On Mon, 26 Apr 2010 23:37:58 +0100
> > > > Mel Gorman <mel@csn.ul.ie> wrote:
> > > > 
> > > > > vma_adjust() is updating anon VMA information without any locks taken.
> > > > > In contrast, file-backed mappings use the i_mmap_lock and this lack of
> > > > > locking can result in races with page migration. During rmap_walk(),
> > > > > vma_address() can return -EFAULT for an address that will soon be valid.
> > > > > This leaves a dangling migration PTE behind which can later cause a BUG_ON
> > > > > to trigger when the page is faulted in.
> > > > > 
> > > > > With the recent anon_vma changes, there can be more than one anon_vma->lock
> > > > > that can be taken in a anon_vma_chain but a second lock cannot be spinned
> > > > > upon in case of deadlock. Instead, the rmap walker tries to take locks of
> > > > > different anon_vma's. If the attempt fails, the operation is restarted.
> > > > > 
> > > > > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > > > 
> > > > Ok, acquiring vma->anon_vma->spin_lock always sounds very safe.
> > > > (but slow.)
> > > > 
> > > > I'll test this, too.
> > > > 
> > > > Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > > 
> > > 
> > > Sorry. reproduced. It seems the same bug before patch. 
> > > mapcount 1 -> unmap -> remap -> mapcount 0. And it was SwapCache.
> > > 
> > 
> > Same here, reproduced after 18 hours.
> > 
> Hmm. It seems rmap_one() is called and the race is not in vma_address()
> but in remap_migration_pte().

It could have been in both, but the VMA's anon_vma lock should have been held
across rmap_one(). The bug still reproduces, but holding the lock is still the
right thing to do. This is the current version of patch 2/2.

==== CUT HERE ====

[PATCH] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information

vma_adjust() is updating anon VMA information without any locks taken.
In contrast, file-backed mappings use the i_mmap_lock and this lack of
locking can result in races with page migration. During rmap_walk(),
vma_address() can return -EFAULT for an address that will soon be valid.
This leaves a dangling migration PTE behind which can later cause a BUG_ON
to trigger when the page is faulted in.

With the recent anon_vma changes, there can be more than one anon_vma->lock
that can be taken in a anon_vma_chain but a second lock cannot be spinned
upon in case of deadlock. Instead, the rmap walker tries to take locks of
different anon_vma's. If the attempt fails, the operation is restarted.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/ksm.c  |   19 +++++++++++++++++--
 mm/mmap.c |    6 ++++++
 mm/rmap.c |   27 +++++++++++++++++++++++----
 3 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 3666d43..87c7531 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1674,9 +1674,19 @@ again:
 		spin_lock(&anon_vma->lock);
 		list_for_each_entry(vmac, &anon_vma->head, same_anon_vma) {
 			vma = vmac->vma;
+
+			/* See comment in mm/rmap.c#rmap_walk_anon on locking */
+			if (anon_vma != vma->anon_vma) {
+				if (!spin_trylock(&vma->anon_vma->lock)) {
+					spin_unlock(&anon_vma->lock);
+					goto again;
+				}
+			}
+
 			if (rmap_item->address < vma->vm_start ||
 			    rmap_item->address >= vma->vm_end)
-				continue;
+				goto next_vma;
+
 			/*
 			 * Initially we examine only the vma which covers this
 			 * rmap_item; but later, if there is still work to do,
@@ -1684,9 +1694,14 @@ again:
 			 * were forked from the original since ksmd passed.
 			 */
 			if ((rmap_item->mm == vma->vm_mm) == search_new_forks)
-				continue;
+				goto next_vma;
 
 			ret = rmap_one(page, vma, rmap_item->address, arg);
+
+next_vma:
+			if (anon_vma != vma->anon_vma)
+				spin_unlock(&vma->anon_vma->lock);
+
 			if (ret != SWAP_AGAIN) {
 				spin_unlock(&anon_vma->lock);
 				goto out;
diff --git a/mm/mmap.c b/mm/mmap.c
index f90ea92..61d6f1d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -578,6 +578,9 @@ again:			remove_next = 1 + (end > next->vm_end);
 		}
 	}
 
+	if (vma->anon_vma)
+		spin_lock(&vma->anon_vma->lock);
+
 	if (root) {
 		flush_dcache_mmap_lock(mapping);
 		vma_prio_tree_remove(vma, root);
@@ -620,6 +623,9 @@ again:			remove_next = 1 + (end > next->vm_end);
 	if (mapping)
 		spin_unlock(&mapping->i_mmap_lock);
 
+	if (vma->anon_vma)
+		spin_unlock(&vma->anon_vma->lock);
+
 	if (remove_next) {
 		if (file) {
 			fput(file);
diff --git a/mm/rmap.c b/mm/rmap.c
index 85f203e..7c2b7a9 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1368,18 +1368,37 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
 	 * are holding mmap_sem. Users without mmap_sem are required to
 	 * take a reference count to prevent the anon_vma disappearing
 	 */
+retry:
 	anon_vma = page_anon_vma(page);
 	if (!anon_vma)
 		return ret;
 	spin_lock(&anon_vma->lock);
 	list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
 		struct vm_area_struct *vma = avc->vma;
-		unsigned long address = vma_address(page, vma);
-		if (address == -EFAULT)
-			continue;
-		ret = rmap_one(page, vma, address, arg);
+		unsigned long address;
+
+		/*
+		 * Guard against deadlocks by not spinning against
+		 * vma->anon_vma->lock. If contention is found, release our lock and
+		 * try again until VMA list can be traversed without worrying about
+		 * the details of the VMA changing underneath us.
+		 */
+		if (anon_vma != vma->anon_vma) {
+			if (!spin_trylock(&vma->anon_vma->lock)) {
+				spin_unlock(&anon_vma->lock);
+				goto retry;
+			}
+		}
+		address = vma_address(page, vma);
+		if (address != -EFAULT)
+			ret = rmap_one(page, vma, address, arg);
+
+		if (anon_vma != vma->anon_vma)
+			spin_unlock(&vma->anon_vma->lock);
+
 		if (ret != SWAP_AGAIN)
 			break;
+		
 	}
 	spin_unlock(&anon_vma->lock);
 	return ret;
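
To make the race described in the changelog concrete, a rough interleaving with
the old, unlocked vma_adjust() (illustrative only, not kernel code):

	/*
	 * migration:    try_to_unmap() replaces the PTEs mapping page P with
	 *               migration entries
	 * vma_adjust(): rewrites vma->vm_start/vm_pgoff with no anon_vma lock
	 * rmap_walk():  vma_address(P, vma) is computed from the in-flux
	 *               fields and returns -EFAULT, so the VMA is skipped and
	 *               P's migration PTE is never removed
	 * later fault:  the address is valid again, the stale migration entry
	 *               is found and the BUG_ON fires
	 */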

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
  2010-04-27 10:29             ` Mel Gorman
@ 2010-04-27 15:37               ` Andrea Arcangeli
  -1 siblings, 0 replies; 42+ messages in thread
From: Andrea Arcangeli @ 2010-04-27 15:37 UTC (permalink / raw)
  To: Mel Gorman
  Cc: KAMEZAWA Hiroyuki, Linux-MM, LKML, Minchan Kim,
	Christoph Lameter, Rik van Riel, Andrew Morton

On Tue, Apr 27, 2010 at 11:29:05AM +0100, Mel Gorman wrote:
> It could have been in both but the vma lock should have been held across
> the rmap_one. It still reproduces but it's still the right thing to do.
> This is the current version of patch 2/2.

Well, keep in mind I reproduced the swapops bug with the 2.6.33 anon-vma
code, so it's unlikely that focusing on patch 2 will fix the bug in
swapops.h. If this is a bug in the new anon-vma code, it needs fixing
of course! But I doubt this bug is related to the swapops BUG hit by
execve on the bprm->p args.

I've yet to check patch 1 in detail, sorry. I'll let you know my
opinion on it as soon as I have.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
  2010-04-27 15:37               ` Andrea Arcangeli
@ 2010-04-27 16:35                 ` Mel Gorman
  -1 siblings, 0 replies; 42+ messages in thread
From: Mel Gorman @ 2010-04-27 16:35 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KAMEZAWA Hiroyuki, Linux-MM, LKML, Minchan Kim,
	Christoph Lameter, Rik van Riel, Andrew Morton

On Tue, Apr 27, 2010 at 05:37:59PM +0200, Andrea Arcangeli wrote:
> On Tue, Apr 27, 2010 at 11:29:05AM +0100, Mel Gorman wrote:
> > It could have been in both but the vma lock should have been held across
> > the rmap_one. It still reproduces but it's still the right thing to do.
> > This is the current version of patch 2/2.
> 
> Well, keep in mind I reproduced the swapops bug with the 2.6.33 anon-vma
> code, so it's unlikely that focusing on patch 2 will fix the bug in
> swapops.h. If this is a bug in the new anon-vma code, it needs fixing
> of course! But I doubt this bug is related to the swapops BUG hit by
> execve on the bprm->p args.
> 

Why do you doubt it's related to execve? From what I've seen during the day,
there is a race in execve where the VMA gets moved (under the anon_vma lock)
before the page tables are copied with move_ptes (without a lock). In that
window, execve can encounter and copy migration ptes before migration removes
them. This also applies to mainline because it only takes the RCU lock and
not the anon_vma->lock.
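
To make the window concrete, the ordering in shift_arg_pages() is roughly as
follows (paraphrased; the comments marking the window are annotation, not
code from fs/exec.c):

        /* The VMA is moved to its new, lower range first ... */
        vma_adjust(vma, new_start, old_end, vma->vm_pgoff, NULL);

        /*
         * ... so a concurrent rmap_walk()/remove_migration_ptes() running in
         * this window computes vma_address() against the adjusted layout,
         * finds no PTE mapped there yet (or gets -EFAULT) and skips the page.
         */

        /*
         * ... and only then are the PTEs, migration PTEs included, copied to
         * the new range, leaving behind a migration PTE that is never removed
         * and later triggers the BUG_ON in migration_entry_to_page().
         */
        move_page_tables(vma, old_start, vma, new_start, length);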

I have a prototype that "handles" the situation with the new anon_vma
code by removing the migration ptes it finds while moving page tables,
but it needs more work before it can be released.
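
For reference, the detection side of such a prototype would presumably look
something like the fragment below inside the PTE-copying loop of move_ptes().
This is only a sketch of the idea, not the prototype itself; old_pte is
move_ptes()'s loop cursor, and pte_to_swp_entry() and is_migration_entry()
come from linux/swapops.h:

        if (!pte_none(*old_pte) && !pte_present(*old_pte)) {
                swp_entry_t entry = pte_to_swp_entry(*old_pte);

                if (is_migration_entry(entry)) {
                        /*
                         * This migration PTE is about to be copied to the
                         * new address, where remove_migration_ptes() will
                         * never look for it.  The prototype would deal with
                         * it here (remove it, or back off and wait for
                         * migration to finish, as patch 1 does for fork())
                         * instead of moving it blindly.
                         */
                }
        }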

An alternative would be to split vma_adjust() into locked and unlocked
versions. shift_arg_pages() could then take the anon_vma lock to cover
both the VMA move and the page-table copy below:

        /*
         * cover the whole range: [new_start, old_end)
         */
        if (vma_adjust(vma, new_start, old_end, vma->vm_pgoff, NULL))
                return -ENOMEM;

        /*
         * move the page tables downwards, on failure we rely on
         * process cleanup to remove whatever mess we made.
         */
        if (length != move_page_tables(vma, old_start,
                                       vma, new_start, length))
                return -ENOMEM;
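
Purely as an illustration, the split might end up looking roughly like this
in shift_arg_pages(), where vma_adjust_locked() is a hypothetical name for a
vma_adjust() variant that expects the caller to already hold the anon_vma
lock, and err is a local introduced for the sketch:

        struct anon_vma *anon_vma = vma->anon_vma;
        int err;

        if (anon_vma)
                spin_lock(&anon_vma->lock);

        /* cover the whole range: [new_start, old_end) */
        err = vma_adjust_locked(vma, new_start, old_end, vma->vm_pgoff, NULL);

        /*
         * Move the page tables downwards under the same lock so that
         * rmap_walk() can never observe the adjusted VMA without the moved
         * PTEs.  Note that move_page_tables() may need to allocate page
         * tables, so holding a spinlock across it may itself be a problem.
         */
        if (!err && length != move_page_tables(vma, old_start,
                                               vma, new_start, length))
                err = -ENOMEM;

        if (anon_vma)
                spin_unlock(&anon_vma->lock);
        if (err)
                return -ENOMEM;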

It'd be messy to split up the locking of vma_adjust() like this though, and
exec() would hold the anon_vma locks for longer just to guard against
migration. It's not clear this is better than having move_ptes() handle
the migration PTEs it encounters itself.

> I've yet to check patch 1 in detail, sorry. I'll let you know my
> opinion on it as soon as I have.
> 

No problem. I still need to revisit all of these patches once I am
confident the swapops bug cannot be triggered any more.


^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2010-04-27 16:36 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-26 22:37 [PATCH 0/2] Fix migration races in rmap_walk() Mel Gorman
2010-04-26 22:37 ` Mel Gorman
2010-04-26 22:37 ` [PATCH 1/2] mm,migration: During fork(), wait for migration to end if migration PTE is encountered Mel Gorman
2010-04-26 22:37   ` Mel Gorman
2010-04-26 23:28   ` Minchan Kim
2010-04-26 23:28     ` Minchan Kim
2010-04-27  0:08   ` Rik van Riel
2010-04-27  0:08     ` Rik van Riel
2010-04-26 22:37 ` [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information Mel Gorman
2010-04-26 22:37   ` Mel Gorman
2010-04-26 23:05   ` Minchan Kim
2010-04-26 23:05     ` Minchan Kim
2010-04-27  0:30     ` KAMEZAWA Hiroyuki
2010-04-27  0:30       ` KAMEZAWA Hiroyuki
2010-04-27  9:17     ` Mel Gorman
2010-04-27  9:17       ` Mel Gorman
2010-04-26 23:15   ` Minchan Kim
2010-04-26 23:15     ` Minchan Kim
2010-04-27  0:07   ` KAMEZAWA Hiroyuki
2010-04-27  0:07     ` KAMEZAWA Hiroyuki
2010-04-27  3:50     ` KAMEZAWA Hiroyuki
2010-04-27  3:50       ` KAMEZAWA Hiroyuki
2010-04-27  4:03       ` KAMEZAWA Hiroyuki
2010-04-27  4:03         ` KAMEZAWA Hiroyuki
2010-04-27  8:59       ` Mel Gorman
2010-04-27  8:59         ` Mel Gorman
2010-04-27  9:09         ` KAMEZAWA Hiroyuki
2010-04-27  9:09           ` KAMEZAWA Hiroyuki
2010-04-27 10:29           ` Mel Gorman
2010-04-27 10:29             ` Mel Gorman
2010-04-27 15:37             ` Andrea Arcangeli
2010-04-27 15:37               ` Andrea Arcangeli
2010-04-27 16:35               ` Mel Gorman
2010-04-27 16:35                 ` Mel Gorman
2010-04-27  0:30   ` Rik van Riel
2010-04-27  0:30     ` Rik van Riel
2010-04-27  0:31     ` KAMEZAWA Hiroyuki
2010-04-27  0:31       ` KAMEZAWA Hiroyuki
2010-04-27  2:13       ` Rik van Riel
2010-04-27  2:13         ` Rik van Riel
2010-04-26 23:04 ` [PATCH 0/2] Fix migration races in rmap_walk() Andrea Arcangeli
2010-04-26 23:04   ` Andrea Arcangeli
