* [RFC][PATCH 0/7] some page_referenced() improvements
From: KOSAKI Motohiro @ 2009-12-04  8:40 UTC
  To: LKML
  Cc: kosaki.motohiro, linux-mm, Rik van Riel, Andrea Arcangeli, Larry Woodman

Hi,

Here is my refactoring and improvement patch set for page_referenced().
I think it solves Larry's AIM7 scalability issue.

I'll test these patches under a stress workload this weekend, but I hope
to receive reviews in the meantime.




* [PATCH 1/7] Replace page_mapping_inuse() with page_mapped()
From: KOSAKI Motohiro @ 2009-12-04  8:41 UTC
  To: LKML
  Cc: kosaki.motohiro, linux-mm, Rik van Riel, Andrea Arcangeli, Larry Woodman

From c0cd3ee2bb13567a36728600a86f43abac3125b5 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Wed, 2 Dec 2009 12:05:26 +0900
Subject: [PATCH 1/7] Replace page_mapping_inuse() with page_mapped()

The page reclaim logic needs to distinguish mapped from unmapped pages.
However, page_mapping_inuse() doesn't provide a proper test: it checks
whether the address space (i.e. the file) is mmap()ed anywhere. Why should
reclaim of this page care about the mapped state of unrelated pages? It is
irrelevant.

Thus, this patch replaces page_mapping_inuse() with page_mapped().

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
---
 mm/vmscan.c |   25 ++-----------------------
 1 files changed, 2 insertions(+), 23 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index da6cf42..4ba08da 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -262,27 +262,6 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
 	return ret;
 }
 
-/* Called without lock on whether page is mapped, so answer is unstable */
-static inline int page_mapping_inuse(struct page *page)
-{
-	struct address_space *mapping;
-
-	/* Page is in somebody's page tables. */
-	if (page_mapped(page))
-		return 1;
-
-	/* Be more reluctant to reclaim swapcache than pagecache */
-	if (PageSwapCache(page))
-		return 1;
-
-	mapping = page_mapping(page);
-	if (!mapping)
-		return 0;
-
-	/* File is mmap'd by somebody? */
-	return mapping_mapped(mapping);
-}
-
 static inline int is_page_cache_freeable(struct page *page)
 {
 	/*
@@ -649,7 +628,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * try_to_unmap moves it to unevictable list
 		 */
 		if (sc->order <= PAGE_ALLOC_COSTLY_ORDER &&
-					referenced && page_mapping_inuse(page)
+					referenced && page_mapped(page)
 					&& !(vm_flags & VM_LOCKED))
 			goto activate_locked;
 
@@ -1356,7 +1335,7 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 		}
 
 		/* page_referenced clears PageReferenced */
-		if (page_mapping_inuse(page) &&
+		if (page_mapped(page) &&
 		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags)) {
 			nr_rotated++;
 			/*
-- 
1.6.5.2
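
To see why the old test over-approximates, here is a small userspace
sketch of the two predicates. The mock types and function names are
hypothetical and only loosely mirror the kernel helpers discussed above;
this is an illustration, not kernel code.

#include <stdbool.h>
#include <stdio.h>

/* Userspace mock of the state the two predicates look at. */
struct mock_page {
	bool mapped;		/* this page sits in somebody's page table */
	bool file_mmapped;	/* the backing file is mmap()ed somewhere  */
};

/* Old test (page_mapping_inuse-like): also true when only *other*
 * pages of the same file are mapped. */
static bool old_test(const struct mock_page *p)
{
	return p->mapped || p->file_mmapped;
}

/* New test (page_mapped-like): only this page's own state matters. */
static bool new_test(const struct mock_page *p)
{
	return p->mapped;
}

int main(void)
{
	/* An unmapped page of a file that is mmap()ed elsewhere: the old
	 * test calls it "in use", the new one does not. */
	struct mock_page p = { .mapped = false, .file_mmapped = true };

	printf("old=%d new=%d\n", old_test(&p), new_test(&p));
	return 0;
}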





* [PATCH 2/7] Introduce __page_check_address
From: KOSAKI Motohiro @ 2009-12-04  8:42 UTC
  To: LKML
  Cc: kosaki.motohiro, linux-mm, Rik van Riel, Andrea Arcangeli, Larry Woodman

From 381108e1ff6309f45f45a67acf2a1dd66e41df4f Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Thu, 3 Dec 2009 15:01:42 +0900
Subject: [PATCH 2/7] Introduce __page_check_address

page_check_address() needs to take the pte lock, but that lock may be
contended. We therefore need a trylock variant, so this patch introduces
a new helper function.

It will be used by a later patch.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/rmap.c |   62 ++++++++++++++++++++++++++++++++++++++++++++++++------------
 1 files changed, 49 insertions(+), 13 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 278cd27..1b50425 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -268,44 +268,80 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
  * the page table lock when the pte is not present (helpful when reclaiming
  * highly shared pages).
  *
- * On success returns with pte mapped and locked.
+ * if @noblock is true, page_check_address may return -EAGAIN if lock is
+ * contended.
+ *
+ * Returns valid pte pointer if success.
+ * Returns -EFAULT if address seems invalid.
+ * Returns -EAGAIN if trylock failed.
  */
-pte_t *page_check_address(struct page *page, struct mm_struct *mm,
-			  unsigned long address, spinlock_t **ptlp, int sync)
+static pte_t *__page_check_address(struct page *page, struct mm_struct *mm,
+				   unsigned long address, spinlock_t **ptlp,
+				   int sync, int noblock)
 {
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
 	spinlock_t *ptl;
+	int err = -EFAULT;
 
 	pgd = pgd_offset(mm, address);
 	if (!pgd_present(*pgd))
-		return NULL;
+		goto out;
 
 	pud = pud_offset(pgd, address);
 	if (!pud_present(*pud))
-		return NULL;
+		goto out;
 
 	pmd = pmd_offset(pud, address);
 	if (!pmd_present(*pmd))
-		return NULL;
+		goto out;
 
 	pte = pte_offset_map(pmd, address);
 	/* Make a quick check before getting the lock */
-	if (!sync && !pte_present(*pte)) {
-		pte_unmap(pte);
-		return NULL;
-	}
+	if (!sync && !pte_present(*pte))
+		goto out_unmap;
 
 	ptl = pte_lockptr(mm, pmd);
-	spin_lock(ptl);
+	if (noblock) {
+		if (!spin_trylock(ptl)) {
+			err = -EAGAIN;
+			goto out_unmap;
+		}
+	} else
+		spin_lock(ptl);
+
 	if (pte_present(*pte) && page_to_pfn(page) == pte_pfn(*pte)) {
 		*ptlp = ptl;
 		return pte;
 	}
-	pte_unmap_unlock(pte, ptl);
-	return NULL;
+
+	spin_unlock(ptl);
+ out_unmap:
+	pte_unmap(pte);
+ out:
+	return ERR_PTR(err);
+}
+
+/*
+ * Check that @page is mapped at @address into @mm.
+ *
+ * If @sync is false, page_check_address may perform a racy check to avoid
+ * the page table lock when the pte is not present (helpful when reclaiming
+ * highly shared pages).
+ *
+ * On success returns with pte mapped and locked.
+ */
+pte_t *page_check_address(struct page *page, struct mm_struct *mm,
+			  unsigned long address, spinlock_t **ptlp, int sync)
+{
+	pte_t *pte;
+
+	pte = __page_check_address(page, mm, address, ptlp, sync, 0);
+	if (IS_ERR(pte))
+		return NULL;
+	return pte;
 }
 
 /**
-- 
1.6.5.2
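
The new helper folds a trylock and an error-pointer return into one path.
A standalone sketch of that calling convention, using a pthread mutex in
place of the pte lock and hand-rolled equivalents of the kernel's
ERR_PTR()/IS_ERR() macros (all names here are hypothetical):

#include <errno.h>
#include <pthread.h>
#include <stdio.h>

#define MAX_ERRNO	4095

/* Minimal stand-ins for ERR_PTR()/IS_ERR(). */
static inline void *err_ptr(long error)	{ return (void *)error; }
static inline int is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

static pthread_mutex_t ptl = PTHREAD_MUTEX_INITIALIZER;	/* "pte lock" */
static int pte_value = 42;				/* "pte" */

/* Analogue of __page_check_address(..., noblock): return the pte with
 * the lock held, or -EAGAIN encoded as a pointer when trylock fails. */
static int *lookup_pte(int noblock)
{
	if (noblock) {
		if (pthread_mutex_trylock(&ptl) != 0)
			return err_ptr(-EAGAIN);
	} else {
		pthread_mutex_lock(&ptl);
	}
	return &pte_value;		/* caller must unlock ptl */
}

int main(void)
{
	int *pte = lookup_pte(1);

	if (is_err(pte)) {
		printf("lock contended, caller retries later\n");
		return 0;
	}
	printf("pte = %d\n", *pte);
	pthread_mutex_unlock(&ptl);
	return 0;
}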





* [PATCH 3/7] VM_LOCKED check doesn't need pte lock
From: KOSAKI Motohiro @ 2009-12-04  8:42 UTC
  To: LKML
  Cc: kosaki.motohiro, linux-mm, Rik van Riel, Andrea Arcangeli, Larry Woodman

From 24f910b1ac966c21ea5aab825d1f26815b760304 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Thu, 3 Dec 2009 16:06:47 +0900
Subject: [PATCH 3/7] VM_LOCKED check doesn't need pte lock

Currently, page_referenced_one() checks VM_LOCKED after taking the pte
lock, but that is unnecessary: VM_LOCKED can be checked before taking the
lock.

This patch does exactly that.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/rmap.c |   13 ++++++-------
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 1b50425..fb0983a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -383,21 +383,21 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
 	spinlock_t *ptl;
 	int referenced = 0;
 
-	pte = page_check_address(page, mm, address, &ptl, 0);
-	if (!pte)
-		goto out;
-
 	/*
 	 * Don't want to elevate referenced for mlocked page that gets this far,
 	 * in order that it progresses to try_to_unmap and is moved to the
 	 * unevictable list.
 	 */
 	if (vma->vm_flags & VM_LOCKED) {
-		*mapcount = 1;	/* break early from loop */
+		*mapcount = 0;	/* break early from loop */
 		*vm_flags |= VM_LOCKED;
-		goto out_unmap;
+		goto out;
 	}
 
+	pte = page_check_address(page, mm, address, &ptl, 0);
+	if (!pte)
+		goto out;
+
 	if (ptep_clear_flush_young_notify(vma, address, pte)) {
 		/*
 		 * Don't treat a reference through a sequentially read
@@ -416,7 +416,6 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
 			rwsem_is_locked(&mm->mmap_sem))
 		referenced++;
 
-out_unmap:
 	(*mapcount)--;
 	pte_unmap_unlock(pte, ptl);
 
-- 
1.6.5.2
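
The change is purely an ordering one: test the lock-free VM_LOCKED flag
first, and take the pte lock only when the walk is really needed. A
userspace sketch of that ordering, with mock types and hypothetical names
(the VM_LOCKED value is arbitrary here):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define VM_LOCKED	0x2000			/* value unimportant for this sketch */

struct mock_vma { unsigned long vm_flags; };	/* stand-in for vm_area_struct */

static pthread_mutex_t pte_lock = PTHREAD_MUTEX_INITIALIZER;

/* Ordering after this patch: bail out on VM_LOCKED *before* touching the
 * pte lock, so mlocked vmas never add to pte-lock contention. */
static bool referenced_one(const struct mock_vma *vma)
{
	if (vma->vm_flags & VM_LOCKED)
		return false;			/* early exit, no lock taken */

	pthread_mutex_lock(&pte_lock);
	/* ... test and clear the pte young bit here ... */
	pthread_mutex_unlock(&pte_lock);
	return true;
}

int main(void)
{
	struct mock_vma mlocked = { VM_LOCKED };
	struct mock_vma normal  = { 0 };

	printf("mlocked=%d normal=%d\n",
	       referenced_one(&mlocked), referenced_one(&normal));
	return 0;
}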





* [PATCH 4/7] Replace page_referenced() with wipe_page_reference()
From: KOSAKI Motohiro @ 2009-12-04  8:43 UTC
  To: LKML
  Cc: kosaki.motohiro, linux-mm, Rik van Riel, Andrea Arcangeli, Larry Woodman

From d9110c2804a4b88e460edada140b8bb0f7eb3a18 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Fri, 4 Dec 2009 11:45:18 +0900
Subject: [PATCH 4/7] Replace page_referenced() with wipe_page_reference()

page_referenced() implies "test whether the page was referenced", but
shrink_active_list() uses it to drop the pte young bit, so it should be
renamed.

In addition, the vm_flags argument is really ugly. Instead, introduce a
new struct page_reference_context for collecting statistics.

This patch does not change any behavior.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 include/linux/ksm.h  |   11 +++--
 include/linux/rmap.h |   16 +++++--
 mm/ksm.c             |   17 +++++---
 mm/rmap.c            |  112 ++++++++++++++++++++++++-------------------------
 mm/vmscan.c          |   46 ++++++++++++--------
 5 files changed, 112 insertions(+), 90 deletions(-)

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index bed5f16..e1a60d3 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -85,8 +85,8 @@ static inline struct page *ksm_might_need_to_copy(struct page *page,
 	return ksm_does_need_to_copy(page, vma, address);
 }
 
-int page_referenced_ksm(struct page *page,
-			struct mem_cgroup *memcg, unsigned long *vm_flags);
+int wipe_page_reference_ksm(struct page *page, struct mem_cgroup *memcg,
+			    struct page_reference_context *refctx);
 int try_to_unmap_ksm(struct page *page, enum ttu_flags flags);
 int rmap_walk_ksm(struct page *page, int (*rmap_one)(struct page *,
 		  struct vm_area_struct *, unsigned long, void *), void *arg);
@@ -120,10 +120,11 @@ static inline struct page *ksm_might_need_to_copy(struct page *page,
 	return page;
 }
 
-static inline int page_referenced_ksm(struct page *page,
-			struct mem_cgroup *memcg, unsigned long *vm_flags)
+static inline int wipe_page_reference_ksm(struct page *page,
+					  struct mem_cgroup *memcg,
+					  struct page_reference_context *refctx)
 {
-	return 0;
+	return SWAP_SUCCESS;
 }
 
 static inline int try_to_unmap_ksm(struct page *page, enum ttu_flags flags)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index b019ae6..564d981 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -108,13 +108,21 @@ static inline void page_dup_rmap(struct page *page)
 	atomic_inc(&page->_mapcount);
 }
 
+struct page_reference_context {
+	int is_page_locked;
+	unsigned long referenced;
+	unsigned long exec_referenced;
+	int maybe_mlocked;	/* found VM_LOCKED, but it's unstable result */
+};
+
 /*
  * Called from mm/vmscan.c to handle paging out
  */
-int page_referenced(struct page *, int is_locked,
-			struct mem_cgroup *cnt, unsigned long *vm_flags);
-int page_referenced_one(struct page *, struct vm_area_struct *,
-	unsigned long address, unsigned int *mapcount, unsigned long *vm_flags);
+int wipe_page_reference(struct page *page, struct mem_cgroup *memcg,
+		    struct page_reference_context *refctx);
+int wipe_page_reference_one(struct page *page,
+			    struct page_reference_context *refctx,
+			    struct vm_area_struct *vma, unsigned long address);
 
 enum ttu_flags {
 	TTU_UNMAP = 0,			/* unmap mode */
diff --git a/mm/ksm.c b/mm/ksm.c
index c1647a2..3c121c8 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1544,15 +1544,15 @@ struct page *ksm_does_need_to_copy(struct page *page,
 	return new_page;
 }
 
-int page_referenced_ksm(struct page *page, struct mem_cgroup *memcg,
-			unsigned long *vm_flags)
+int wipe_page_reference_ksm(struct page *page, struct mem_cgroup *memcg,
+			    struct page_reference_context *refctx)
 {
 	struct stable_node *stable_node;
 	struct rmap_item *rmap_item;
 	struct hlist_node *hlist;
 	unsigned int mapcount = page_mapcount(page);
-	int referenced = 0;
 	int search_new_forks = 0;
+	int ret = SWAP_SUCCESS;
 
 	VM_BUG_ON(!PageKsm(page));
 	VM_BUG_ON(!PageLocked(page));
@@ -1582,10 +1582,15 @@ again:
 			if ((rmap_item->mm == vma->vm_mm) == search_new_forks)
 				continue;
 
-			referenced += page_referenced_one(page, vma,
-				rmap_item->address, &mapcount, vm_flags);
+			ret = wipe_page_reference_one(page, refctx, vma,
+						      rmap_item->address);
+			if (ret != SWAP_SUCCESS)
+				goto out;
+			mapcount--;
 			if (!search_new_forks || !mapcount)
 				break;
+			if (refctx->maybe_mlocked)
+				goto out;
 		}
 		spin_unlock(&anon_vma->lock);
 		if (!mapcount)
@@ -1594,7 +1599,7 @@ again:
 	if (!search_new_forks++)
 		goto again;
 out:
-	return referenced;
+	return ret;
 }
 
 int try_to_unmap_ksm(struct page *page, enum ttu_flags flags)
diff --git a/mm/rmap.c b/mm/rmap.c
index fb0983a..2f4451b 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -371,17 +371,16 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
 }
 
 /*
- * Subfunctions of page_referenced: page_referenced_one called
- * repeatedly from either page_referenced_anon or page_referenced_file.
+ * Subfunctions of wipe_page_reference: wipe_page_reference_one called
+ * repeatedly from either wipe_page_reference_anon or wipe_page_reference_file.
  */
-int page_referenced_one(struct page *page, struct vm_area_struct *vma,
-			unsigned long address, unsigned int *mapcount,
-			unsigned long *vm_flags)
+int wipe_page_reference_one(struct page *page,
+			    struct page_reference_context *refctx,
+			    struct vm_area_struct *vma, unsigned long address)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	pte_t *pte;
 	spinlock_t *ptl;
-	int referenced = 0;
 
 	/*
 	 * Don't want to elevate referenced for mlocked page that gets this far,
@@ -389,8 +388,7 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
 	 * unevictable list.
 	 */
 	if (vma->vm_flags & VM_LOCKED) {
-		*mapcount = 0;	/* break early from loop */
-		*vm_flags |= VM_LOCKED;
+		refctx->maybe_mlocked = 1;
 		goto out;
 	}
 
@@ -406,37 +404,38 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
 		 * mapping is already gone, the unmap path will have
 		 * set PG_referenced or activated the page.
 		 */
-		if (likely(!VM_SequentialReadHint(vma)))
-			referenced++;
+		if (likely(!VM_SequentialReadHint(vma))) {
+			refctx->referenced++;
+			if (vma->vm_flags & VM_EXEC) {
+				refctx->exec_referenced++;
+			}
+		}
 	}
 
 	/* Pretend the page is referenced if the task has the
 	   swap token and is in the middle of a page fault. */
 	if (mm != current->mm && has_swap_token(mm) &&
 			rwsem_is_locked(&mm->mmap_sem))
-		referenced++;
+		refctx->referenced++;
 
-	(*mapcount)--;
 	pte_unmap_unlock(pte, ptl);
 
-	if (referenced)
-		*vm_flags |= vma->vm_flags;
 out:
-	return referenced;
+	return SWAP_SUCCESS;
 }
 
-static int page_referenced_anon(struct page *page,
-				struct mem_cgroup *mem_cont,
-				unsigned long *vm_flags)
+static int wipe_page_reference_anon(struct page *page,
+				    struct mem_cgroup *memcg,
+				    struct page_reference_context *refctx)
 {
 	unsigned int mapcount;
 	struct anon_vma *anon_vma;
 	struct vm_area_struct *vma;
-	int referenced = 0;
+	int ret = SWAP_SUCCESS;
 
 	anon_vma = page_lock_anon_vma(page);
 	if (!anon_vma)
-		return referenced;
+		return ret;
 
 	mapcount = page_mapcount(page);
 	list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
@@ -448,20 +447,22 @@ static int page_referenced_anon(struct page *page,
 		 * counting on behalf of references from different
 		 * cgroups
 		 */
-		if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
+		if (memcg && !mm_match_cgroup(vma->vm_mm, memcg))
 			continue;
-		referenced += page_referenced_one(page, vma, address,
-						  &mapcount, vm_flags);
-		if (!mapcount)
+		ret = wipe_page_reference_one(page, refctx, vma, address);
+		if (ret != SWAP_SUCCESS)
+			break;
+		mapcount--;
+		if (!mapcount || refctx->maybe_mlocked)
 			break;
 	}
 
 	page_unlock_anon_vma(anon_vma);
-	return referenced;
+	return ret;
 }
 
 /**
- * page_referenced_file - referenced check for object-based rmap
+ * wipe_page_reference_file - wipe page reference for object-based rmap
  * @page: the page we're checking references on.
  * @mem_cont: target memory controller
  * @vm_flags: collect encountered vma->vm_flags who actually referenced the page
@@ -471,18 +472,18 @@ static int page_referenced_anon(struct page *page,
  * pointer, then walking the chain of vmas it holds.  It returns the number
  * of references it found.
  *
- * This function is only called from page_referenced for object-based pages.
+ * This function is only called from wipe_page_reference for object-based pages.
  */
-static int page_referenced_file(struct page *page,
-				struct mem_cgroup *mem_cont,
-				unsigned long *vm_flags)
+static int wipe_page_reference_file(struct page *page,
+				    struct mem_cgroup *memcg,
+				    struct page_reference_context *refctx)
 {
 	unsigned int mapcount;
 	struct address_space *mapping = page->mapping;
 	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
 	struct vm_area_struct *vma;
 	struct prio_tree_iter iter;
-	int referenced = 0;
+	int ret = SWAP_SUCCESS;
 
 	/*
 	 * The caller's checks on page->mapping and !PageAnon have made
@@ -516,65 +517,62 @@ static int page_referenced_file(struct page *page,
 		 * counting on behalf of references from different
 		 * cgroups
 		 */
-		if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
+		if (memcg && !mm_match_cgroup(vma->vm_mm, memcg))
 			continue;
-		referenced += page_referenced_one(page, vma, address,
-						  &mapcount, vm_flags);
-		if (!mapcount)
+		ret = wipe_page_reference_one(page, refctx, vma, address);
+		if (ret != SWAP_SUCCESS)
+			break;
+		mapcount--;
+		if (!mapcount || refctx->maybe_mlocked)
 			break;
 	}
 
 	spin_unlock(&mapping->i_mmap_lock);
-	return referenced;
+	return ret;
 }
 
 /**
- * page_referenced - test if the page was referenced
+ * wipe_page_reference - clear and test the page reference and pte young bit
  * @page: the page to test
- * @is_locked: caller holds lock on the page
- * @mem_cont: target memory controller
- * @vm_flags: collect encountered vma->vm_flags who actually referenced the page
+ * @memcg: target memory controller
+ * @refctx: context for collect some statistics
  *
  * Quick test_and_clear_referenced for all mappings to a page,
  * returns the number of ptes which referenced the page.
  */
-int page_referenced(struct page *page,
-		    int is_locked,
-		    struct mem_cgroup *mem_cont,
-		    unsigned long *vm_flags)
+int wipe_page_reference(struct page *page,
+			struct mem_cgroup *memcg,
+			struct page_reference_context *refctx)
 {
-	int referenced = 0;
 	int we_locked = 0;
+	int ret = SWAP_SUCCESS;
 
 	if (TestClearPageReferenced(page))
-		referenced++;
+		refctx->referenced++;
 
-	*vm_flags = 0;
 	if (page_mapped(page) && page_rmapping(page)) {
-		if (!is_locked && (!PageAnon(page) || PageKsm(page))) {
+		if (!refctx->is_page_locked &&
+		    (!PageAnon(page) || PageKsm(page))) {
 			we_locked = trylock_page(page);
 			if (!we_locked) {
-				referenced++;
+				refctx->referenced++;
 				goto out;
 			}
 		}
 		if (unlikely(PageKsm(page)))
-			referenced += page_referenced_ksm(page, mem_cont,
-								vm_flags);
+			ret = wipe_page_reference_ksm(page, memcg, refctx);
 		else if (PageAnon(page))
-			referenced += page_referenced_anon(page, mem_cont,
-								vm_flags);
+			ret = wipe_page_reference_anon(page, memcg, refctx);
 		else if (page->mapping)
-			referenced += page_referenced_file(page, mem_cont,
-								vm_flags);
+			ret = wipe_page_reference_file(page, memcg, refctx);
 		if (we_locked)
 			unlock_page(page);
 	}
 out:
 	if (page_test_and_clear_young(page))
-		referenced++;
+		refctx->referenced++;
 
-	return referenced;
+	return ret;
 }
 
 static int page_mkclean_one(struct page *page, struct vm_area_struct *vma,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4ba08da..0db9c06 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -569,7 +569,6 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 	struct pagevec freed_pvec;
 	int pgactivate = 0;
 	unsigned long nr_reclaimed = 0;
-	unsigned long vm_flags;
 
 	cond_resched();
 
@@ -578,7 +577,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		struct address_space *mapping;
 		struct page *page;
 		int may_enter_fs;
-		int referenced;
+		struct page_reference_context refctx = {
+			.is_page_locked = 1,
+		};
 
 		cond_resched();
 
@@ -620,16 +621,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				goto keep_locked;
 		}
 
-		referenced = page_referenced(page, 1,
-						sc->mem_cgroup, &vm_flags);
+		wipe_page_reference(page, sc->mem_cgroup, &refctx);
 		/*
 		 * In active use or really unfreeable?  Activate it.
 		 * If page which have PG_mlocked lost isoltation race,
 		 * try_to_unmap moves it to unevictable list
 		 */
 		if (sc->order <= PAGE_ALLOC_COSTLY_ORDER &&
-					referenced && page_mapped(page)
-					&& !(vm_flags & VM_LOCKED))
+		    page_mapped(page) && refctx.referenced &&
+		    !refctx.maybe_mlocked)
 			goto activate_locked;
 
 		/*
@@ -664,7 +664,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		}
 
 		if (PageDirty(page)) {
-			if (sc->order <= PAGE_ALLOC_COSTLY_ORDER && referenced)
+			if (sc->order <= PAGE_ALLOC_COSTLY_ORDER &&
+			    refctx.referenced)
 				goto keep_locked;
 			if (!may_enter_fs)
 				goto keep_locked;
@@ -1243,9 +1244,10 @@ static inline void note_zone_scanning_priority(struct zone *zone, int priority)
  *
  * If the pages are mostly unmapped, the processing is fast and it is
  * appropriate to hold zone->lru_lock across the whole operation.  But if
- * the pages are mapped, the processing is slow (page_referenced()) so we
- * should drop zone->lru_lock around each page.  It's impossible to balance
- * this, so instead we remove the pages from the LRU while processing them.
+ * the pages are mapped, the processing is slow (because wipe_page_reference()
+ * walk each ptes) so we should drop zone->lru_lock around each page.  It's
+ * impossible to balance this, so instead we remove the pages from the LRU
+ * while processing them.
  * It is safe to rely on PG_active against the non-LRU pages in here because
  * nobody will play with that bit on a non-LRU page.
  *
@@ -1291,7 +1293,6 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 {
 	unsigned long nr_taken;
 	unsigned long pgscanned;
-	unsigned long vm_flags;
 	LIST_HEAD(l_hold);	/* The pages which were snipped off */
 	LIST_HEAD(l_active);
 	LIST_HEAD(l_inactive);
@@ -1325,6 +1326,10 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 	spin_unlock_irq(&zone->lru_lock);
 
 	while (!list_empty(&l_hold)) {
+		struct page_reference_context refctx = {
+			.is_page_locked = 0,
+		};
+
 		cond_resched();
 		page = lru_to_page(&l_hold);
 		list_del(&page->lru);
@@ -1334,10 +1339,17 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 			continue;
 		}
 
-		/* page_referenced clears PageReferenced */
-		if (page_mapped(page) &&
-		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags)) {
+		if (!page_mapped(page)) {
+			ClearPageActive(page);	/* we are de-activating */
+			list_add(&page->lru, &l_inactive);
+			continue;
+		}
+
+		wipe_page_reference(page, sc->mem_cgroup, &refctx);
+
+		if (refctx.referenced)
 			nr_rotated++;
+		if (refctx.exec_referenced && page_is_file_cache(page)) {
 			/*
 			 * Identify referenced, file-backed active pages and
 			 * give them one more trip around the active list. So
@@ -1347,10 +1359,8 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 			 * IO, plus JVM can create lots of anon VM_EXEC pages,
 			 * so we ignore them here.
 			 */
-			if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) {
-				list_add(&page->lru, &l_active);
-				continue;
-			}
+			list_add(&page->lru, &l_active);
+			continue;
 		}
 
 		ClearPageActive(page);	/* we are de-activating */
-- 
1.6.5.2
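
The interface change can be summarized as: replace a returned count plus a
vm_flags out-parameter with one small context struct that the walkers fill
in. A minimal userspace sketch of that pattern; the field and function
names only loosely mirror the patch and are otherwise hypothetical:

#include <stdio.h>

#define SWAP_SUCCESS	0	/* mirrors the rmap return-code convention */

/* Caller-owned statistics block, in the spirit of page_reference_context. */
struct reference_context {
	int is_locked;
	unsigned long referenced;
	unsigned long exec_referenced;
	int maybe_mlocked;
};

/* Analogue of wipe_page_reference_one(): record findings into the context
 * instead of accumulating a count and OR-ing vma->vm_flags into an out
 * parameter. */
static int wipe_one(struct reference_context *ctx, int pte_young, int vm_exec)
{
	if (pte_young) {
		ctx->referenced++;
		if (vm_exec)
			ctx->exec_referenced++;
	}
	return SWAP_SUCCESS;
}

int main(void)
{
	struct reference_context ctx = { .is_locked = 1 };

	wipe_one(&ctx, 1, 1);	/* young pte in an executable mapping */
	wipe_one(&ctx, 1, 0);	/* young pte in a data mapping */
	printf("referenced=%lu exec_referenced=%lu\n",
	       ctx.referenced, ctx.exec_referenced);
	return 0;
}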





* [PATCH 5/7] Don't deactivate the page if trylock_page() fails.
From: KOSAKI Motohiro @ 2009-12-04  8:44 UTC
  To: LKML
  Cc: kosaki.motohiro, linux-mm, Rik van Riel, Andrea Arcangeli, Larry Woodman

From 7635eaa033cfcce7f351b5023952f23f0daffefe Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Fri, 4 Dec 2009 12:03:07 +0900
Subject: [PATCH 5/7] Don't deactivate the page if trylock_page() fails.

Currently, wipe_page_reference() increments refctx->referenced when
trylock_page() fails, but that is meaningless: shrink_active_list()
deactivates the page even though it was marked referenced. A page should
not be deactivated while its young bit is still set; that breaks the basic
theory of reclaim and decreases reclaim throughput.

This patch introduces a new SWAP_AGAIN return value for
wipe_page_reference().

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/rmap.c   |    5 ++++-
 mm/vmscan.c |   15 +++++++++++++--
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 2f4451b..b84f350 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -539,6 +539,9 @@ static int wipe_page_reference_file(struct page *page,
  *
  * Quick test_and_clear_referenced for all mappings to a page,
  * returns the number of ptes which referenced the page.
+ *
+ * SWAP_SUCCESS  - success to wipe all ptes
+ * SWAP_AGAIN    - temporary busy, try again later
  */
 int wipe_page_reference(struct page *page,
 			struct mem_cgroup *memcg,
@@ -555,7 +558,7 @@ int wipe_page_reference(struct page *page,
 		    (!PageAnon(page) || PageKsm(page))) {
 			we_locked = trylock_page(page);
 			if (!we_locked) {
-				refctx->referenced++;
+				ret = SWAP_AGAIN;
 				goto out;
 			}
 		}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0db9c06..9684e40 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -577,6 +577,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		struct address_space *mapping;
 		struct page *page;
 		int may_enter_fs;
+		int ret;
 		struct page_reference_context refctx = {
 			.is_page_locked = 1,
 		};
@@ -621,7 +622,11 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				goto keep_locked;
 		}
 
-		wipe_page_reference(page, sc->mem_cgroup, &refctx);
+		ret = wipe_page_reference(page, sc->mem_cgroup, &refctx);
+		if (ret == SWAP_AGAIN)
+			goto keep_locked;
+		VM_BUG_ON(ret != SWAP_SUCCESS);
+
 		/*
 		 * In active use or really unfreeable?  Activate it.
 		 * If page which have PG_mlocked lost isoltation race,
@@ -1326,6 +1331,7 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 	spin_unlock_irq(&zone->lru_lock);
 
 	while (!list_empty(&l_hold)) {
+		int ret;
 		struct page_reference_context refctx = {
 			.is_page_locked = 0,
 		};
@@ -1345,7 +1351,12 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 			continue;
 		}
 
-		wipe_page_reference(page, sc->mem_cgroup, &refctx);
+		ret = wipe_page_reference(page, sc->mem_cgroup, &refctx);
+		if (ret == SWAP_AGAIN) {
+			list_add(&page->lru, &l_active);
+			continue;
+		}
+		VM_BUG_ON(ret != SWAP_SUCCESS);
 
 		if (refctx.referenced)
 			nr_rotated++;
-- 
1.6.5.2
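
The caller-side effect is easiest to see in isolation: on SWAP_AGAIN the
page goes back on the active list instead of being deactivated with its
young bit intact. A compact sketch with a stub standing in for
wipe_page_reference(); names and constants are illustrative only:

#include <stdio.h>

enum { SWAP_SUCCESS, SWAP_AGAIN };	/* rmap-style return codes */

/* Stub for wipe_page_reference(): pretend the page lock was contended,
 * so the walk could not clear the young bits. */
static int wipe_page_reference_stub(void)
{
	return SWAP_AGAIN;
}

int main(void)
{
	/* Old behaviour: a failed trylock bumped "referenced" and the page
	 * could still be deactivated.  New behaviour: keep it active and
	 * retry on a later scan. */
	if (wipe_page_reference_stub() == SWAP_AGAIN)
		printf("keep page on the active list, retry later\n");
	else
		printf("young bits wiped, continue with the normal decision\n");
	return 0;
}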





* [PATCH 6/7] wipe_page_reference return SWAP_AGAIN if VM pressure is low and lock contention is detected.
  2009-12-04  8:40 ` KOSAKI Motohiro
@ 2009-12-04  8:45   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 38+ messages in thread
From: KOSAKI Motohiro @ 2009-12-04  8:45 UTC (permalink / raw)
  To: LKML
  Cc: kosaki.motohiro, linux-mm, Rik van Riel, Andrea Arcangeli, Larry Woodman

>From 3fb2a585729a37e205c5ea42ac6c48d4a6c0a29c Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Fri, 4 Dec 2009 12:54:37 +0900
Subject: [PATCH 6/7] wipe_page_reference return SWAP_AGAIN if VM pressure is low and lock contention is detected.

Larry Woodman reported that AIM7 causes serious pte lock and anon_vma_lock
contention on the current VM. The split-LRU VM (since 2.6.28) removed the
calc_reclaim_mapped() test, so shrink_active_list() now always calls
page_referenced() against mapped pages even when VM pressure is low.
Light VM pressure is a very common situation, and it easily creates pte
lock contention with page faults. The anon_vma_lock is then held for a
long time, which causes further lock contention, and fork/exit throughput
drops a lot.

	While running workloads that do lots of process forking, process
	exiting and page reclamation (AIM7) on large systems, very high system
	time (100%) and lots of lock contention were observed.

	CPU5:
	 [<ffffffff814afb48>] ? _spin_lock+0x27/0x48
	 [<ffffffff81101deb>] ? anon_vma_link+0x2a/0x5a
	 [<ffffffff8105d3d8>] ? dup_mm+0x242/0x40c
	 [<ffffffff8105e0a9>] ? copy_process+0xab1/0x12be
	 [<ffffffff8105ea07>] ? do_fork+0x151/0x330
	 [<ffffffff81058407>] ? default_wake_function+0x0/0x36
	 [<ffffffff814b0243>] ? _spin_lock_irqsave+0x2f/0x68
	 [<ffffffff810121d3>] ? stub_clone+0x13/0x20
	 [<ffffffff81011e02>] ? system_call_fastpath+0x16/0x1b

	CPU4:
	 [<ffffffff814afb4a>] ? _spin_lock+0x29/0x48
	 [<ffffffff81103062>] ? anon_vma_unlink+0x2a/0x84
	 [<ffffffff810fbab7>] ? free_pgtables+0x3c/0xe1
	 [<ffffffff810fd8b1>] ? exit_mmap+0xc5/0x110
	 [<ffffffff8105ce4c>] ? mmput+0x55/0xd9
	 [<ffffffff81061afd>] ? exit_mm+0x109/0x129
	 [<ffffffff81063846>] ? do_exit+0x1d7/0x712
	 [<ffffffff814b0243>] ? _spin_lock_irqsave+0x2f/0x68
	 [<ffffffff81063e07>] ? do_group_exit+0x86/0xb2
	 [<ffffffff81063e55>] ? sys_exit_group+0x22/0x3e
	 [<ffffffff81011e02>] ? system_call_fastpath+0x16/0x1b

	CPU0:
	 [<ffffffff814afb4a>] ? _spin_lock+0x29/0x48
	 [<ffffffff81101ad1>] ? page_check_address+0x9e/0x16f
	 [<ffffffff81101cb8>] ? page_referenced_one+0x53/0x10b
	 [<ffffffff81102f5a>] ? page_referenced+0xcd/0x167
	 [<ffffffff810eb32d>] ? shrink_active_list+0x1ed/0x2a3
	 [<ffffffff810ebde9>] ? shrink_zone+0xa06/0xa38
	 [<ffffffff8108440a>] ? getnstimeofday+0x64/0xce
	 [<ffffffff810ecaf9>] ? do_try_to_free_pages+0x1e5/0x362
	 [<ffffffff810ecd9f>] ? try_to_free_pages+0x7a/0x94
	 [<ffffffff810ea66f>] ? isolate_pages_global+0x0/0x242
	 [<ffffffff810e57b9>] ? __alloc_pages_nodemask+0x397/0x572
	 [<ffffffff810e3c1e>] ? __get_free_pages+0x19/0x6e
	 [<ffffffff8105d6c9>] ? copy_process+0xd1/0x12be
	 [<ffffffff81204eb2>] ? avc_has_perm+0x5c/0x84
	 [<ffffffff81130db8>] ? user_path_at+0x65/0xa3
	 [<ffffffff8105ea07>] ? do_fork+0x151/0x330
	 [<ffffffff810b7935>] ? check_for_new_grace_period+0x78/0xab
	 [<ffffffff810121d3>] ? stub_clone+0x13/0x20
	 [<ffffffff81011e02>] ? system_call_fastpath+0x16/0x1b

	------------------------------------------------------------------------------
	   PerfTop:     864 irqs/sec  kernel:99.7% [100000 cycles],  (all, 8 CPUs)
	------------------------------------------------------------------------------

	             samples    pcnt         RIP          kernel function
	  ______     _______   _____   ________________   _______________

	             3235.00 - 75.1% - ffffffff814afb21 : _spin_lock
	              670.00 - 15.6% - ffffffff81101a33 : page_check_address
	              165.00 -  3.8% - ffffffffa01cbc39 : rpc_sleep_on  [sunrpc]
	               40.00 -  0.9% - ffffffff81102113 : try_to_unmap_one
	               29.00 -  0.7% - ffffffff81101c65 : page_referenced_one
	               27.00 -  0.6% - ffffffff81101964 : vma_address
	                8.00 -  0.2% - ffffffff8125a5a0 : clear_page_c
	                6.00 -  0.1% - ffffffff8125a5f0 : copy_page_c
	                6.00 -  0.1% - ffffffff811023ca : try_to_unmap_anon
	                5.00 -  0.1% - ffffffff810fb014 : copy_page_range
	                5.00 -  0.1% - ffffffff810e4d18 : get_page_from_freelist

Therefore, use a trylock to avoid pte lock contention when VM pressure is
low.
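
In caller terms the heuristic looks roughly like the stand-alone sketch
below.  It is user-space C, not kernel code: the pthread mutex stands in
for the pte lock, DEF_PRIORITY is written out by hand, and
soft_try_from_priority()/lock_for_scan() are invented names that only
mirror the ".soft_try = (priority < DEF_PRIORITY - 2) ? 0 : 1"
initializer added to shrink_active_list() in the vmscan.c hunk below.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define DEF_PRIORITY	12	/* same value the kernel uses */

static pthread_mutex_t ptl = PTHREAD_MUTEX_INITIALIZER;

/* The first few reclaim passes (priority close to DEF_PRIORITY) are
 * "soft": give up on a contended lock.  Deeper passes block as before. */
static bool soft_try_from_priority(int priority)
{
	return priority >= DEF_PRIORITY - 2;
}

static bool lock_for_scan(bool soft_try)
{
	if (soft_try)
		return pthread_mutex_trylock(&ptl) == 0;	/* may give up */
	pthread_mutex_lock(&ptl);				/* must succeed */
	return true;
}

int main(void)
{
	for (int priority = DEF_PRIORITY; priority >= 0; priority--) {
		bool soft = soft_try_from_priority(priority);

		if (!lock_for_scan(soft)) {
			printf("priority %2d: contended, retry the page later\n",
			       priority);
			continue;
		}
		/* ... clear young bits under the lock here ... */
		pthread_mutex_unlock(&ptl);
		printf("priority %2d: scanned (%s)\n", priority,
		       soft ? "soft try" : "blocking");
	}
	return 0;
}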

Reported-by: Larry Woodman <lwoodman@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 include/linux/rmap.h |    4 ++++
 mm/rmap.c            |   16 ++++++++++++----
 mm/vmscan.c          |    1 +
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 564d981..499972e 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -110,6 +110,10 @@ static inline void page_dup_rmap(struct page *page)
 
 struct page_reference_context {
 	int is_page_locked;
+
+	/* if 1, we might give up to wipe when find lock contention. */
+	int soft_try;
+
 	unsigned long referenced;
 	unsigned long exec_referenced;
 	int maybe_mlocked;	/* found VM_LOCKED, but it's unstable result */
diff --git a/mm/rmap.c b/mm/rmap.c
index b84f350..5ae7c81 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -373,6 +373,9 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
 /*
  * Subfunctions of wipe_page_reference: wipe_page_reference_one called
  * repeatedly from either wipe_page_reference_anon or wipe_page_reference_file.
+ *
+ * SWAP_SUCCESS  - success
+ * SWAP_AGAIN    - give up to take lock, try later again
  */
 int wipe_page_reference_one(struct page *page,
 			    struct page_reference_context *refctx,
@@ -381,6 +384,7 @@ int wipe_page_reference_one(struct page *page,
 	struct mm_struct *mm = vma->vm_mm;
 	pte_t *pte;
 	spinlock_t *ptl;
+	int ret = SWAP_SUCCESS;
 
 	/*
 	 * Don't want to elevate referenced for mlocked page that gets this far,
@@ -392,10 +396,14 @@ int wipe_page_reference_one(struct page *page,
 		goto out;
 	}
 
-	pte = page_check_address(page, mm, address, &ptl, 0);
-	if (!pte)
+	pte = __page_check_address(page, mm, address, &ptl, 0,
+				   refctx->soft_try);
+	if (IS_ERR(pte)) {
+		if (PTR_ERR(pte) == -EAGAIN) {
+			ret = SWAP_AGAIN;
+		}
 		goto out;
-
+	}
 	if (ptep_clear_flush_young_notify(vma, address, pte)) {
 		/*
 		 * Don't treat a reference through a sequentially read
@@ -421,7 +429,7 @@ int wipe_page_reference_one(struct page *page,
 	pte_unmap_unlock(pte, ptl);
 
 out:
-	return SWAP_SUCCESS;
+	return ret;
 }
 
 static int wipe_page_reference_anon(struct page *page,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9684e40..16e8bd0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1334,6 +1334,7 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 		int ret;
 		struct page_reference_context refctx = {
 			.is_page_locked = 0,
+			.soft_try = (priority < DEF_PRIORITY - 2) ? 0 : 1,
 		};
 
 		cond_resched();
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 7/7] Try to mark PG_mlocked if wipe_page_reference find VM_LOCKED vma
  2009-12-04  8:40 ` KOSAKI Motohiro
@ 2009-12-04  8:46   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 38+ messages in thread
From: KOSAKI Motohiro @ 2009-12-04  8:46 UTC (permalink / raw)
  To: LKML
  Cc: kosaki.motohiro, linux-mm, Rik van Riel, Andrea Arcangeli, Larry Woodman

>From 519178d353926466fcb7411d19424c5e559b6b80 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Fri, 4 Dec 2009 16:51:20 +0900
Subject: [PATCH 7/7] Try to mark PG_mlocked if wipe_page_reference find VM_LOCKED vma

Both try_to_unmap() and wipe_page_reference() walk all of a page's ptes,
but the latter doesn't mark PG_mlocked although it may find a VM_LOCKED vma.

This patch makes it do so.
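
As a rough user-space analogy of the new out_mlock path (not kernel code:
"map_sem" stands in for mmap_sem, "vm_locked" for the VM_LOCKED vma flag,
and mark_mlocked() for mlock_vma_page()), the idea is to peek at the flag
only when the second lock can also be taken without sleeping:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_rwlock_t map_sem = PTHREAD_RWLOCK_INITIALIZER;
static bool vm_locked = true;

static void mark_mlocked(void)
{
	printf("page culled to the unevictable/mlocked side\n");
}

/* Called on the "could not take the pte lock" path: only if the map lock
 * is free as well do we look at the flag and cull the page. */
static bool try_cull_mlocked(void)
{
	bool culled = false;

	if (pthread_rwlock_tryrdlock(&map_sem) == 0) {
		if (vm_locked) {
			mark_mlocked();
			culled = true;
		}
		pthread_rwlock_unlock(&map_sem);
	}
	return culled;
}

int main(void)
{
	if (!try_cull_mlocked())
		printf("could not tell; just retry the page later\n");
	return 0;
}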

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/rmap.c   |   14 ++++++++++++++
 mm/vmscan.c |    2 ++
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 5ae7c81..cfda0a0 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -376,6 +376,7 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
  *
  * SWAP_SUCCESS  - success
  * SWAP_AGAIN    - give up to take lock, try later again
+ * SWAP_MLOCK    - the page is mlocked
  */
 int wipe_page_reference_one(struct page *page,
 			    struct page_reference_context *refctx,
@@ -401,6 +402,7 @@ int wipe_page_reference_one(struct page *page,
 	if (IS_ERR(pte)) {
 		if (PTR_ERR(pte) == -EAGAIN) {
 			ret = SWAP_AGAIN;
+			goto out_mlock;
 		}
 		goto out;
 	}
@@ -430,6 +432,17 @@ int wipe_page_reference_one(struct page *page,
 
 out:
 	return ret;
+
+out_mlock:
+	if (refctx->is_page_locked &&
+	    down_read_trylock(&vma->vm_mm->mmap_sem)) {
+		if (vma->vm_flags & VM_LOCKED) {
+			mlock_vma_page(page);
+			ret = SWAP_MLOCK;
+		}
+		up_read(&vma->vm_mm->mmap_sem);
+	}
+	return ret;
 }
 
 static int wipe_page_reference_anon(struct page *page,
@@ -550,6 +563,7 @@ static int wipe_page_reference_file(struct page *page,
  *
  * SWAP_SUCCESS  - success to wipe all ptes
  * SWAP_AGAIN    - temporary busy, try again later
+ * SWAP_MLOCK    - the page is mlocked
  */
 int wipe_page_reference(struct page *page,
 			struct mem_cgroup *memcg,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 16e8bd0..164fda7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -625,6 +625,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		ret = wipe_page_reference(page, sc->mem_cgroup, &refctx);
 		if (ret == SWAP_AGAIN)
 			goto keep_locked;
+		else if (ret == SWAP_MLOCK)
+			goto cull_mlocked;
 		VM_BUG_ON(ret != SWAP_SUCCESS);
 
 		/*
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/7] Introduce __page_check_address
  2009-12-04  8:42   ` KOSAKI Motohiro
@ 2009-12-06 14:55     ` Rik van Riel
  -1 siblings, 0 replies; 38+ messages in thread
From: Rik van Riel @ 2009-12-06 14:55 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrea Arcangeli, Larry Woodman

On 12/04/2009 03:42 AM, KOSAKI Motohiro wrote:
>  From 381108e1ff6309f45f45a67acf2a1dd66e41df4f Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
> Date: Thu, 3 Dec 2009 15:01:42 +0900
> Subject: [PATCH 2/7] Introduce __page_check_address
>
> page_check_address() needs to take the pte lock, but the lock might be
> contended. So we need a trylock version; this patch introduces a new
> helper function.
>
> It will be used in a later patch.
>
> Signed-off-by: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/7] VM_LOCKED check don't need pte lock
  2009-12-04  8:42   ` KOSAKI Motohiro
@ 2009-12-06 19:41     ` Rik van Riel
  -1 siblings, 0 replies; 38+ messages in thread
From: Rik van Riel @ 2009-12-06 19:41 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrea Arcangeli, Larry Woodman

On 12/04/2009 03:42 AM, KOSAKI Motohiro wrote:
>  From 24f910b1ac966c21ea5aab825d1f26815b760304 Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
> Date: Thu, 3 Dec 2009 16:06:47 +0900
> Subject: [PATCH 3/7] VM_LOCKED check don't need pte lock
>
> Currently, page_referenced_one() checks VM_LOCKED after taking the pte
> lock, but that's unnecessary. We can check VM_LOCKED before taking the
> lock.
>
> This patch does that.

Nice optimization.

> Signed-off-by: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/7] Replace page_referenced() with wipe_page_reference()
  2009-12-04  8:43   ` KOSAKI Motohiro
@ 2009-12-06 20:31     ` Rik van Riel
  -1 siblings, 0 replies; 38+ messages in thread
From: Rik van Riel @ 2009-12-06 20:31 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrea Arcangeli, Larry Woodman

On 12/04/2009 03:43 AM, KOSAKI Motohiro wrote:
>  From d9110c2804a4b88e460edada140b8bb0f7eb3a18 Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
> Date: Fri, 4 Dec 2009 11:45:18 +0900
> Subject: [PATCH 4/7] Replace page_referenced() with wipe_page_reference()
>
> page_referenced() implies "test whether the page was referenced", but
> shrink_active_list() uses it to drop the pte young bit, so it should be
> renamed.
>
> Also, the vm_flags argument is really ugly. Instead, introduce a new
> struct page_reference_context, which collects some statistics.
>
> This patch doesn't change any behavior.
>
> Signed-off-by: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 5/7] Don't deactivate the page if trylock_page() is failed.
  2009-12-04  8:44   ` KOSAKI Motohiro
@ 2009-12-06 20:34     ` Rik van Riel
  -1 siblings, 0 replies; 38+ messages in thread
From: Rik van Riel @ 2009-12-06 20:34 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrea Arcangeli, Larry Woodman

On 12/04/2009 03:44 AM, KOSAKI Motohiro wrote:
>  From 7635eaa033cfcce7f351b5023952f23f0daffefe Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
> Date: Fri, 4 Dec 2009 12:03:07 +0900
> Subject: [PATCH 5/7] Don't deactivate the page if trylock_page() is failed.
>
> Currently, wipe_page_reference() increments the refctx->referenced
> variable if trylock_page() fails, but that is meaningless:
> shrink_active_list() deactivates the page anyway, even though the page
> was reported as referenced. The page shouldn't be deactivated while its
> young bit is still set; that breaks basic reclaim theory and decreases
> reclaim throughput.
>
> This patch introduces a new SWAP_AGAIN return value for
> wipe_page_reference().
>
> Signed-off-by: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 6/7] wipe_page_reference return SWAP_AGAIN if VM pressure is low and lock contention is detected.
  2009-12-04  8:45   ` KOSAKI Motohiro
@ 2009-12-06 21:01     ` Rik van Riel
  -1 siblings, 0 replies; 38+ messages in thread
From: Rik van Riel @ 2009-12-06 21:01 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrea Arcangeli, Larry Woodman

On 12/04/2009 03:45 AM, KOSAKI Motohiro wrote:
>  From 3fb2a585729a37e205c5ea42ac6c48d4a6c0a29c Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
> Date: Fri, 4 Dec 2009 12:54:37 +0900
> Subject: [PATCH 6/7] wipe_page_reference return SWAP_AGAIN if VM pressure is low and lock contention is detected.
>
> Larry Woodman reported that AIM7 causes serious pte lock and anon_vma_lock
> contention on the current VM. The split-LRU VM (since 2.6.28) removed the
> calc_reclaim_mapped() test, so shrink_active_list() now always calls
> page_referenced() against mapped pages even when VM pressure is low.
> Light VM pressure is a very common situation, and it easily creates pte
> lock contention with page faults. The anon_vma_lock is then held for a
> long time, which causes further lock contention, and fork/exit throughput
> drops a lot.

It looks good to me.   Larry, does this patch series resolve
your issue?

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 7/7] Try to mark PG_mlocked if wipe_page_reference find VM_LOCKED vma
  2009-12-04  8:46   ` KOSAKI Motohiro
@ 2009-12-06 21:03     ` Rik van Riel
  -1 siblings, 0 replies; 38+ messages in thread
From: Rik van Riel @ 2009-12-06 21:03 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrea Arcangeli, Larry Woodman

On 12/04/2009 03:46 AM, KOSAKI Motohiro wrote:
>  From 519178d353926466fcb7411d19424c5e559b6b80 Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
> Date: Fri, 4 Dec 2009 16:51:20 +0900
> Subject: [PATCH 7/7] Try to mark PG_mlocked if wipe_page_reference find VM_LOCKED vma
>
> Both try_to_unmap() and wipe_page_reference() walk all of a page's ptes,
> but the latter doesn't mark PG_mlocked although it may find a VM_LOCKED vma.
>
> This patch makes it do so.
>
> Signed-off-by: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/7] some page_referenced() improvement
  2009-12-04  8:40 ` KOSAKI Motohiro
@ 2009-12-07  9:25   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 38+ messages in thread
From: KOSAKI Motohiro @ 2009-12-07  9:25 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: kosaki.motohiro, LKML, linux-mm, Rik van Riel, Andrea Arcangeli,
	Larry Woodman

> Hi
> 
> here is my refactoring and improvement patchset of page_referenced().
> I think it solve Larry's AIM7 scalability issue.
> 
> I'll test this patches on stress workload at this weekend. but I hope to
> receive guys review concurrently.

OK, these patches survived my stress workload for two days over the last weekend.




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/7] Replace page_mapping_inuse() with page_mapped()
  2009-12-04  8:41   ` KOSAKI Motohiro
@ 2009-12-07 11:27     ` Johannes Weiner
  -1 siblings, 0 replies; 38+ messages in thread
From: Johannes Weiner @ 2009-12-07 11:27 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Rik van Riel, Andrea Arcangeli, Larry Woodman

On Fri, Dec 04, 2009 at 05:41:35PM +0900, KOSAKI Motohiro wrote:
> From c0cd3ee2bb13567a36728600a86f43abac3125b5 Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Date: Wed, 2 Dec 2009 12:05:26 +0900
> Subject: [PATCH 1/7] Replace page_mapping_inuse() with page_mapped()
> 
> page reclaim logic need to distingish mapped and unmapped pages.
> However page_mapping_inuse() don't provide proper test way. it test
> the address space (i.e. file) is mmpad(). Why `page' reclaim need
> care unrelated page's mapped state? it's unrelated.
> 
> Thus, This patch replace page_mapping_inuse() with page_mapped()
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Reviewed-by: Rik van Riel <riel@redhat.com>

Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>

> ---
>  mm/vmscan.c |   25 ++-----------------------
>  1 files changed, 2 insertions(+), 23 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index da6cf42..4ba08da 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -262,27 +262,6 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
>  	return ret;
>  }
>  
> -/* Called without lock on whether page is mapped, so answer is unstable */
> -static inline int page_mapping_inuse(struct page *page)
> -{
> -	struct address_space *mapping;
> -
> -	/* Page is in somebody's page tables. */
> -	if (page_mapped(page))
> -		return 1;
> -
> -	/* Be more reluctant to reclaim swapcache than pagecache */
> -	if (PageSwapCache(page))
> -		return 1;
> -
> -	mapping = page_mapping(page);
> -	if (!mapping)
> -		return 0;
> -
> -	/* File is mmap'd by somebody? */
> -	return mapping_mapped(mapping);
> -}
> -
>  static inline int is_page_cache_freeable(struct page *page)
>  {
>  	/*
> @@ -649,7 +628,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  		 * try_to_unmap moves it to unevictable list
>  		 */
>  		if (sc->order <= PAGE_ALLOC_COSTLY_ORDER &&
> -					referenced && page_mapping_inuse(page)
> +					referenced && page_mapped(page)
>  					&& !(vm_flags & VM_LOCKED))
>  			goto activate_locked;
>  
> @@ -1356,7 +1335,7 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
>  		}
>  
>  		/* page_referenced clears PageReferenced */
> -		if (page_mapping_inuse(page) &&
> +		if (page_mapped(page) &&
>  		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags)) {
>  			nr_rotated++;
>  			/*
> -- 
> 1.6.5.2
> 
> 
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [early RFC][PATCH 8/7] vmscan: Don't deactivate many touched page
  2009-12-04  8:40 ` KOSAKI Motohiro
@ 2009-12-07 11:36   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 38+ messages in thread
From: KOSAKI Motohiro @ 2009-12-07 11:36 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: kosaki.motohiro, LKML, linux-mm, Rik van Riel, Larry Woodman


Andrea, can you please try the following patch on your workload?


>From a7758c66d36a136d5fbbcf0b042839445f0ca522 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Mon, 7 Dec 2009 18:37:20 +0900
Subject: [PATCH] [RFC] vmscan: Don't deactivate many touched page

Changelog
 o Changes from Andrea's original patch:
   - Rebase on top of my patches.
   - Use the list_cut_position()/list_splice_tail() pair instead of
     list_del()/list_add() to keep the pte scan fair.
   - Only apply the max-young threshold when soft_try is true, to
     avoid a wrong OOM side effect.
   - Return SWAP_AGAIN instead of a successful result if the max-young
     threshold is exceeded, so that pages whose pte young bits were not
     cleared don't get deactivated wrongly.
   - Add logic to handle KSM pages

Pages that are heavily shared and frequently used don't need to be
deactivated and run through try_to_unmap(). While VM pressure is low that
is pointless: such a page will likely be reactivated soon, so it only
wastes CPU.

This patch therefore stops the pte scan when wipe_page_reference()
finds lots of young pte bits.
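
The threshold-plus-rotation idea can be shown with a stand-alone toy
(user-space C, not kernel code): an index into a plain array stands in
for the list_cut_position()/list_splice_tail() rotation of the anon_vma
list, and MAX_YOUNG_BIT_CLEARED is shrunk from 64 to 3 so the early exit
is visible.

#include <stdio.h>

#define NR_MAPPINGS		8
#define MAX_YOUNG_BIT_CLEARED	3	/* 64 in the patch, shrunk for the demo */

static int young[NR_MAPPINGS];	/* young bit of each mapping's pte */
static int start;		/* where the next scan resumes */

/*
 * Scan the mappings of one page.  Stop early once enough young bits have
 * been found, and remember where we stopped so the next scan starts with
 * the mappings we skipped (the "rotate the scanned ptes to the list
 * tail" fairness in the patch).
 */
static int scan_page(void)
{
	int found = 0;

	for (int i = 0; i < NR_MAPPINGS; i++) {
		int idx = (start + i) % NR_MAPPINGS;

		if (young[idx]) {
			young[idx] = 0;
			found++;
		}
		if (found >= MAX_YOUNG_BIT_CLEARED) {
			start = (idx + 1) % NR_MAPPINGS;
			return -1;	/* like SWAP_AGAIN: keep the page active */
		}
	}
	return found;
}

int main(void)
{
	for (int i = 0; i < NR_MAPPINGS; i++)
		young[i] = 1;

	for (int pass = 1; pass <= 3; pass++) {
		int r = scan_page();

		if (r < 0)
			printf("pass %d: bailed out early, resume at mapping %d\n",
			       pass, start);
		else
			printf("pass %d: full scan, %d young bits cleared\n",
			       pass, r);
	}
	return 0;
}

Running it shows the first two passes bailing out after three young bits
and the third pass finishing the leftovers, which is the fairness the
list splicing is meant to preserve.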

Originally-Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 include/linux/rmap.h |   17 +++++++++++++++++
 mm/ksm.c             |    4 ++++
 mm/rmap.c            |   19 +++++++++++++++++++
 3 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 499972e..9ad69b5 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -128,6 +128,23 @@ int wipe_page_reference_one(struct page *page,
 			    struct page_reference_context *refctx,
 			    struct vm_area_struct *vma, unsigned long address);
 
+#define MAX_YOUNG_BIT_CLEARED 64
+/*
+ * if VM pressure is low and the page have too many active mappings, there isn't
+ * any reason to continue clear young bit of other ptes. Otherwise,
+ *  - Makes meaningless cpu wasting, many touched page sholdn't be reclaimed.
+ *  - Makes lots IPI for pte change and it might cause another sadly lock
+ *    contention. 
+ */
+static inline
+int too_many_young_bit_found(struct page_reference_context *refctx)
+{
+	if (refctx->soft_try &&
+	    refctx->referenced >= MAX_YOUNG_BIT_CLEARED)
+		return 1;
+	return 0;
+}
+
 enum ttu_flags {
 	TTU_UNMAP = 0,			/* unmap mode */
 	TTU_MIGRATION = 1,		/* migration mode */
diff --git a/mm/ksm.c b/mm/ksm.c
index 3c121c8..46ea519 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1586,6 +1586,10 @@ again:
 						      rmap_item->address);
 			if (ret != SWAP_SUCCESS)
 				goto out;
+			if (too_many_young_bit_found(refctx)) {
+				ret = SWAP_AGAIN;
+				goto out;
+			}
 			mapcount--;
 			if (!search_new_forks || !mapcount)
 				break;
diff --git a/mm/rmap.c b/mm/rmap.c
index cfda0a0..f4517f3 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -473,6 +473,21 @@ static int wipe_page_reference_anon(struct page *page,
 		ret = wipe_page_reference_one(page, refctx, vma, address);
 		if (ret != SWAP_SUCCESS)
 			break;
+		if (too_many_young_bit_found(refctx)) {
+			LIST_HEAD(tmp_list);
+
+			/*
+			 * The scanned ptes move to list tail. it help every ptes
+			 * on this page will be tested by ptep_clear_young().
+			 * Otherwise, this shortcut makes unfair thing.
+			 */
+			list_cut_position(&tmp_list,
+					  &vma->anon_vma_node,
+					  &anon_vma->head);
+			list_splice_tail(&tmp_list, &vma->anon_vma_node);
+			ret = SWAP_AGAIN;
+			break;
+		}
 		mapcount--;
 		if (!mapcount || refctx->maybe_mlocked)
 			break;
@@ -543,6 +558,10 @@ static int wipe_page_reference_file(struct page *page,
 		ret = wipe_page_reference_one(page, refctx, vma, address);
 		if (ret != SWAP_SUCCESS)
 			break;
+		if (too_many_young_bit_found(refctx)) {
+			ret = SWAP_AGAIN;
+			break;
+		}
 		mapcount--;
 		if (!mapcount || refctx->maybe_mlocked)
 			break;
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [early RFC][PATCH 8/7] vmscan: Don't deactivate many touched page
  2009-12-07 11:36   ` KOSAKI Motohiro
@ 2009-12-07 18:10     ` Rik van Riel
  -1 siblings, 0 replies; 38+ messages in thread
From: Rik van Riel @ 2009-12-07 18:10 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Andrea Arcangeli, LKML, linux-mm, Larry Woodman

On 12/07/2009 06:36 AM, KOSAKI Motohiro wrote:
>
> Andrea, Can you please try following patch on your workload?
>
>
>  From a7758c66d36a136d5fbbcf0b042839445f0ca522 Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
> Date: Mon, 7 Dec 2009 18:37:20 +0900
> Subject: [PATCH] [RFC] vmscan: Don't deactivate many touched page
>
> Changelog
>   o Changes from Andrea's original patch:
>     - Rebase on top of my patches.
>     - Use the list_cut_position()/list_splice_tail() pair instead of
>       list_del()/list_add() to keep the pte scan fair.
>     - Only apply the max-young threshold when soft_try is true, to
>       avoid a wrong OOM side effect.
>     - Return SWAP_AGAIN instead of a successful result if the max-young
>       threshold is exceeded, so that pages whose pte young bits were not
>       cleared don't get deactivated wrongly.
>     - Add logic to handle KSM pages

I like the concept and your changes, and really only
have a few small nitpicks :)

First, the VM uses a mix of "referenced", "accessed" and
"young".  We should probably avoid adding "active" to that
mix, and may even want to think about moving to just one
or two terms :)

> +#define MAX_YOUNG_BIT_CLEARED 64
> +/*
> + * if VM pressure is low and the page have too many active mappings, there isn't
> + * any reason to continue clear young bit of other ptes. Otherwise,
> + *  - Makes meaningless cpu wasting, many touched page sholdn't be reclaimed.
> + *  - Makes lots IPI for pte change and it might cause another sadly lock
> + *    contention.
> + */

If VM pressure is low and the page has lots of active users, we only
clear up to MAX_YOUNG_BIT_CLEARED accessed bits at a time.  Clearing
accessed bits takes CPU time, needs TLB invalidate IPIs and could
cause lock contention.  Since a heavily shared page is very likely
to be used again soon, the cost outweighs the benefit of making such
a heavily shared page a candidate for eviction.

> diff --git a/mm/rmap.c b/mm/rmap.c
> index cfda0a0..f4517f3 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -473,6 +473,21 @@ static int wipe_page_reference_anon(struct page *page,
>   		ret = wipe_page_reference_one(page, refctx, vma, address);
>   		if (ret != SWAP_SUCCESS)
>   			break;
> +		if (too_many_young_bit_found(refctx)) {
> +			LIST_HEAD(tmp_list);
> +
> +			/*
> +			 * The scanned ptes move to list tail. it help every ptes
> +			 * on this page will be tested by ptep_clear_young().
> +			 * Otherwise, this shortcut makes unfair thing.
> +			 */
> +			list_cut_position(&tmp_list,
> +					&vma->anon_vma_node,
> +					&anon_vma->head);
> +			list_splice_tail(&tmp_list,&vma->anon_vma_node);
> +			ret = SWAP_AGAIN;
> +			break;
> +		}

I do not understand the unfairness here, since all a page needs
to stay on the active list is >64 referenced PTEs.  It does not
matter which of the PTEs mapping the page were recently referenced.

However, rotating the anon vmas around may help spread out lock
pressure in the VM and help things that way, so the code looks
useful to me.

In short, you can give the next version of this patch my

Reviewed-by: Rik van Riel <riel@redhat.com>

All I have are comment nitpicks :)

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [early RFC][PATCH 8/7] vmscan: Don't deactivate many touched page
  2009-12-07 18:10     ` Rik van Riel
@ 2009-12-08  6:27       ` KOSAKI Motohiro
  0 siblings, 0 replies; 38+ messages in thread
From: KOSAKI Motohiro @ 2009-12-08  6:27 UTC (permalink / raw)
  To: Rik van Riel
  Cc: kosaki.motohiro, Andrea Arcangeli, LKML, linux-mm, Larry Woodman

> On 12/07/2009 06:36 AM, KOSAKI Motohiro wrote:
> >
> > Andrea, can you please try the following patch on your workload?
> >
> >
> >  From a7758c66d36a136d5fbbcf0b042839445f0ca522 Mon Sep 17 00:00:00 2001
> > From: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
> > Date: Mon, 7 Dec 2009 18:37:20 +0900
> > Subject: [PATCH] [RFC] vmscan: Don't deactivate many touched page
> >
> > Changelog
> >   o from andrea's original patch
> >     - Rebase on top of my patches.
> >     - Use a list_cut_position/list_splice_tail pair instead of
> >       list_del/list_add to keep the pte scan fair.
> >     - Only use the max young threshold when soft_try is true.
> >       This avoids a wrong OOM side effect.
> >     - Return SWAP_AGAIN instead of a success result if the max
> >       young threshold is exceeded. This prevents pages whose pte
> >       young bits were not cleared from being wrongly deactivated.
> >     - Add logic to handle KSM pages.
> 
> I like the concept and your changes, and really only
> have a few small nitpicks :)
> 
> First, the VM uses a mix of "referenced", "accessed" and
> "young".  We should probably avoid adding "active" to that
> mix, and may even want to think about moving to just one
> or two terms :)

Ah yes, certainly.


> > +#define MAX_YOUNG_BIT_CLEARED 64
> > +/*
> > + * If VM pressure is low and the page has too many active mappings, there is
> > + * no reason to keep clearing the young bits of the remaining ptes. Doing so
> > + *  - wastes CPU time, since a heavily touched page shouldn't be reclaimed, and
> > + *  - sends lots of IPIs for the pte changes, which can cause yet more lock
> > + *    contention.
> > + */
> 
> If VM pressure is low and the page has lots of active users, we only
> clear up to MAX_YOUNG_BIT_CLEARED accessed bits at a time.  Clearing
> accessed bits takes CPU time, needs TLB invalidate IPIs and could
> cause lock contention.  Since a heavily shared page is very likely
> to be used again soon, the cost outweighs the benefit of making such
> a heavily shared page a candidate for eviction.

Thanks. Will fix.


> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index cfda0a0..f4517f3 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -473,6 +473,21 @@ static int wipe_page_reference_anon(struct page *page,
> >   		ret = wipe_page_reference_one(page, refctx, vma, address);
> >   		if (ret != SWAP_SUCCESS)
> >   			break;
> > +		if (too_many_young_bit_found(refctx)) {
> > +			LIST_HEAD(tmp_list);
> > +
> > +			/*
> > +			 * Move the already scanned vmas to the list tail so that
> > +			 * every pte mapping this page eventually gets tested by
> > +			 * ptep_clear_young(). Otherwise this shortcut would be unfair.
> > +			 */
> > +			list_cut_position(&tmp_list,
> > +					&vma->anon_vma_node,
> > +					&anon_vma->head);
> > +			list_splice_tail(&tmp_list,&vma->anon_vma_node);
> > +			ret = SWAP_AGAIN;
> > +			break;
> > +		}
> 
> I do not understand the unfairness here, since all a page needs
> to stay on the active list is >64 referenced PTEs.  It does not
> matter which of the PTEs mapping the page were recently referenced.
> 
> However, rotating the anon vmas around may help spread out lock
> pressure in the VM and help things that way, so the code looks
> useful to me.

agreed. I have to rewrite the comment.


> In short, you can give the next version of this patch my
> 
> Reviewed-by: Rik van Riel <riel@redhat.com>
> 
> All I have are comment nitpicks :)

No, your comments are really valuable.

Thank you.



^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2009-12-08  6:27 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-12-04  8:40 [RFC][PATCH 0/7] some page_referenced() improvement KOSAKI Motohiro
2009-12-04  8:40 ` KOSAKI Motohiro
2009-12-04  8:41 ` [PATCH 1/7] Replace page_mapping_inuse() with page_mapped() KOSAKI Motohiro
2009-12-04  8:41   ` KOSAKI Motohiro
2009-12-07 11:27   ` Johannes Weiner
2009-12-07 11:27     ` Johannes Weiner
2009-12-04  8:42 ` [PATCH 2/7] Introduce __page_check_address KOSAKI Motohiro
2009-12-04  8:42   ` KOSAKI Motohiro
2009-12-06 14:55   ` Rik van Riel
2009-12-06 14:55     ` Rik van Riel
2009-12-04  8:42 ` [PATCH 3/7] VM_LOCKED check don't need pte lock KOSAKI Motohiro
2009-12-04  8:42   ` KOSAKI Motohiro
2009-12-06 19:41   ` Rik van Riel
2009-12-06 19:41     ` Rik van Riel
2009-12-04  8:43 ` [PATCH 4/7] Replace page_referenced() with wipe_page_reference() KOSAKI Motohiro
2009-12-04  8:43   ` KOSAKI Motohiro
2009-12-06 20:31   ` Rik van Riel
2009-12-06 20:31     ` Rik van Riel
2009-12-04  8:44 ` [PATCH 5/7] Don't deactivate the page if trylock_page() is failed KOSAKI Motohiro
2009-12-04  8:44   ` KOSAKI Motohiro
2009-12-06 20:34   ` Rik van Riel
2009-12-06 20:34     ` Rik van Riel
2009-12-04  8:45 ` [PATCH 6/7] wipe_page_reference return SWAP_AGAIN if VM pressure is low and lock contention is detected KOSAKI Motohiro
2009-12-04  8:45   ` KOSAKI Motohiro
2009-12-06 21:01   ` Rik van Riel
2009-12-06 21:01     ` Rik van Riel
2009-12-04  8:46 ` [PATCH 7/7] Try to mark PG_mlocked if wipe_page_reference find VM_LOCKED vma KOSAKI Motohiro
2009-12-04  8:46   ` KOSAKI Motohiro
2009-12-06 21:03   ` Rik van Riel
2009-12-06 21:03     ` Rik van Riel
2009-12-07  9:25 ` [RFC][PATCH 0/7] some page_referenced() improvement KOSAKI Motohiro
2009-12-07  9:25   ` KOSAKI Motohiro
2009-12-07 11:36 ` [early RFC][PATCH 8/7] vmscan: Don't deactivate many touched page KOSAKI Motohiro
2009-12-07 11:36   ` KOSAKI Motohiro
2009-12-07 18:10   ` Rik van Riel
2009-12-07 18:10     ` Rik van Riel
2009-12-08  6:27     ` KOSAKI Motohiro
2009-12-08  6:27       ` KOSAKI Motohiro
