* [PATCH 0/6] mm: prepare for ksm swapping
@ 2009-11-10 21:50 ` Hugh Dickins
  0 siblings, 0 replies; 62+ messages in thread
From: Hugh Dickins @ 2009-11-10 21:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Izik Eidus, Andrea Arcangeli, linux-kernel, linux-mm

Here's a series of six miscellaneous mm patches against 2.6.32-rc5-mm1,
intended to follow my earlier swap_info patches, or slot in just before
mmend in the mmotm series.

Apart from the sixth, they have some relevance to the KSM-page swapping
patches, following after a few days.  They clear away some mm cruft,
to let that series concentrate on ksm.c; but should stand on their own.

 include/linux/ksm.h        |    5 -
 include/linux/mm.h         |   17 +++
 include/linux/page-flags.h |    8 -
 include/linux/rmap.h       |    8 +
 mm/Kconfig                 |   14 --
 mm/internal.h              |   26 ++---
 mm/memory-failure.c        |    2 
 mm/memory.c                |    4 
 mm/migrate.c               |   11 --
 mm/mlock.c                 |    2 
 mm/page_alloc.c            |    4 
 mm/rmap.c                  |  174 +++++++++++------------------------
 mm/swapfile.c              |    2 
 13 files changed, 110 insertions(+), 167 deletions(-)

Hugh


* [PATCH 1/6] mm: define PAGE_MAPPING_FLAGS
  2009-11-10 21:50 ` Hugh Dickins
@ 2009-11-10 21:51   ` Hugh Dickins
  -1 siblings, 0 replies; 62+ messages in thread
From: Hugh Dickins @ 2009-11-10 21:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Izik Eidus, Andrea Arcangeli, linux-kernel, linux-mm

At present we define PageAnon(page) by the low PAGE_MAPPING_ANON bit
set in page->mapping, with the higher bits a pointer to the anon_vma;
and have defined PageKsm(page) as that with NULL anon_vma.

But KSM swapping will need to store a pointer there: so in preparation
for that, now define PAGE_MAPPING_FLAGS as the low two bits, including
PAGE_MAPPING_KSM (always set along with PAGE_MAPPING_ANON, until some
other use for the bit emerges).

Declare page_rmapping(page) to return the pointer part of page->mapping,
and page_anon_vma(page) to return the anon_vma pointer when that's what
it is.  Use these in a few appropriate places: notably, unuse_vma() has
been testing page->mapping, but it is better to test page_anon_vma()
(cases may be added in which flag bits are set without any pointer).
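
A minimal usage sketch (hypothetical caller, not part of this patch):
page->mapping now packs a pointer plus two low flag bits, so go through
these helpers rather than casting page->mapping by hand.

	struct anon_vma *anon_vma = page_anon_vma(page);

	if (anon_vma) {
		/* ordinary anon page: safe to use the anon_vma */
		spin_lock(&anon_vma->lock);
		spin_unlock(&anon_vma->lock);
	} else if (PageKsm(page)) {
		/* KSM page: page_rmapping() is reserved for KSM's own pointer */
	} else if (page_rmapping(page)) {
		/* file page: page->mapping is really a struct address_space */
	}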

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
---

 include/linux/ksm.h  |    5 +++--
 include/linux/mm.h   |   17 ++++++++++++++++-
 include/linux/rmap.h |    8 ++++++++
 mm/migrate.c         |   11 ++++-------
 mm/rmap.c            |    7 +++----
 mm/swapfile.c        |    2 +-
 6 files changed, 35 insertions(+), 15 deletions(-)

--- mm0/include/linux/ksm.h	2009-09-28 00:28:38.000000000 +0100
+++ mm1/include/linux/ksm.h	2009-11-04 10:52:45.000000000 +0000
@@ -38,7 +38,8 @@ static inline void ksm_exit(struct mm_st
  */
 static inline int PageKsm(struct page *page)
 {
-	return ((unsigned long)page->mapping == PAGE_MAPPING_ANON);
+	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
+				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
 }
 
 /*
@@ -47,7 +48,7 @@ static inline int PageKsm(struct page *p
 static inline void page_add_ksm_rmap(struct page *page)
 {
 	if (atomic_inc_and_test(&page->_mapcount)) {
-		page->mapping = (void *) PAGE_MAPPING_ANON;
+		page->mapping = (void *) (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
 		__inc_zone_page_state(page, NR_ANON_PAGES);
 	}
 }
--- mm0/include/linux/mm.h	2009-11-02 12:32:34.000000000 +0000
+++ mm1/include/linux/mm.h	2009-11-04 10:52:45.000000000 +0000
@@ -620,13 +620,22 @@ void page_address_init(void);
 /*
  * On an anonymous page mapped into a user virtual memory area,
  * page->mapping points to its anon_vma, not to a struct address_space;
- * with the PAGE_MAPPING_ANON bit set to distinguish it.
+ * with the PAGE_MAPPING_ANON bit set to distinguish it.  See rmap.h.
+ *
+ * On an anonymous page in a VM_MERGEABLE area, if CONFIG_KSM is enabled,
+ * the PAGE_MAPPING_KSM bit may be set along with the PAGE_MAPPING_ANON bit;
+ * and then page->mapping points, not to an anon_vma, but to a private
+ * structure which KSM associates with that merged page.  See ksm.h.
+ *
+ * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is currently never used.
  *
  * Please note that, confusingly, "page_mapping" refers to the inode
  * address_space which maps the page from disk; whereas "page_mapped"
  * refers to user virtual address space into which the page is mapped.
  */
 #define PAGE_MAPPING_ANON	1
+#define PAGE_MAPPING_KSM	2
+#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)
 
 extern struct address_space swapper_space;
 static inline struct address_space *page_mapping(struct page *page)
@@ -644,6 +653,12 @@ static inline struct address_space *page
 	return mapping;
 }
 
+/* Neutral page->mapping pointer to address_space or anon_vma or other */
+static inline void *page_rmapping(struct page *page)
+{
+	return (void *)((unsigned long)page->mapping & ~PAGE_MAPPING_FLAGS);
+}
+
 static inline int PageAnon(struct page *page)
 {
 	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
--- mm0/include/linux/rmap.h	2009-09-28 00:28:39.000000000 +0100
+++ mm1/include/linux/rmap.h	2009-11-04 10:52:45.000000000 +0000
@@ -39,6 +39,14 @@ struct anon_vma {
 
 #ifdef CONFIG_MMU
 
+static inline struct anon_vma *page_anon_vma(struct page *page)
+{
+	if (((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) !=
+					    PAGE_MAPPING_ANON)
+		return NULL;
+	return page_rmapping(page);
+}
+
 static inline void anon_vma_lock(struct vm_area_struct *vma)
 {
 	struct anon_vma *anon_vma = vma->anon_vma;
--- mm0/mm/migrate.c	2009-11-02 12:32:34.000000000 +0000
+++ mm1/mm/migrate.c	2009-11-04 10:52:45.000000000 +0000
@@ -172,17 +172,14 @@ static void remove_anon_migration_ptes(s
 {
 	struct anon_vma *anon_vma;
 	struct vm_area_struct *vma;
-	unsigned long mapping;
-
-	mapping = (unsigned long)new->mapping;
-
-	if (!mapping || (mapping & PAGE_MAPPING_ANON) == 0)
-		return;
 
 	/*
 	 * We hold the mmap_sem lock. So no need to call page_lock_anon_vma.
 	 */
-	anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON);
+	anon_vma = page_anon_vma(new);
+	if (!anon_vma)
+		return;
+
 	spin_lock(&anon_vma->lock);
 
 	list_for_each_entry(vma, &anon_vma->head, anon_vma_node)
--- mm0/mm/rmap.c	2009-11-02 12:32:34.000000000 +0000
+++ mm1/mm/rmap.c	2009-11-04 10:52:45.000000000 +0000
@@ -203,7 +203,7 @@ struct anon_vma *page_lock_anon_vma(stru
 
 	rcu_read_lock();
 	anon_mapping = (unsigned long) page->mapping;
-	if (!(anon_mapping & PAGE_MAPPING_ANON))
+	if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
 		goto out;
 	if (!page_mapped(page))
 		goto out;
@@ -248,8 +248,7 @@ vma_address(struct page *page, struct vm
 unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
 {
 	if (PageAnon(page)) {
-		if ((void *)vma->anon_vma !=
-		    (void *)page->mapping - PAGE_MAPPING_ANON)
+		if (vma->anon_vma != page_anon_vma(page))
 			return -EFAULT;
 	} else if (page->mapping && !(vma->vm_flags & VM_NONLINEAR)) {
 		if (!vma->vm_file ||
@@ -512,7 +511,7 @@ int page_referenced(struct page *page,
 		referenced++;
 
 	*vm_flags = 0;
-	if (page_mapped(page) && page->mapping) {
+	if (page_mapped(page) && page_rmapping(page)) {
 		if (PageAnon(page))
 			referenced += page_referenced_anon(page, mem_cont,
 								vm_flags);
--- mm0/mm/swapfile.c	2009-11-04 10:21:17.000000000 +0000
+++ mm1/mm/swapfile.c	2009-11-04 10:52:45.000000000 +0000
@@ -937,7 +937,7 @@ static int unuse_vma(struct vm_area_stru
 	unsigned long addr, end, next;
 	int ret;
 
-	if (page->mapping) {
+	if (page_anon_vma(page)) {
 		addr = page_address_in_vma(page, vma);
 		if (addr == -EFAULT)
 			return 0;


* [PATCH 2/6] mm: mlocking in try_to_unmap_one
  2009-11-10 21:50 ` Hugh Dickins
@ 2009-11-10 21:55   ` Hugh Dickins
  -1 siblings, 0 replies; 62+ messages in thread
From: Hugh Dickins @ 2009-11-10 21:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Izik Eidus, Andrea Arcangeli, Nick Piggin, KOSAKI Motohiro,
	Rik van Riel, Lee Schermerhorn, linux-kernel, linux-mm

There's contorted mlock/munlock handling in try_to_unmap_anon() and
try_to_unmap_file(), which we'd prefer not to repeat for KSM swapping.
Simplify it by moving it all down into try_to_unmap_one().

One thing is then lost: try_to_munlock()'s distinction between when no
vma holds the page mlocked, and when a vma does mlock it but we could
not get mmap_sem to set the page flag.  But its only caller takes no
interest in that distinction (and is better off testing SWAP_MLOCK anyway),
so let's keep the code simple and return SWAP_AGAIN for both cases.

try_to_unmap_file()'s TTU_MUNLOCK nonlinear handling was particularly
amusing: once unravelled, it turns out to have been choosing between
two different ways of doing the same nothing.  Ah, no, one way was
actually returning SWAP_FAIL when it meant to return SWAP_SUCCESS.
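
As a reminder of how the result is consumed, a sketch of that one caller,
munlock_vma_page(), after this patch (matching the mm/mlock.c hunk below;
anything other than SWAP_MLOCK now means nobody re-mlocked the page):

	ret = try_to_munlock(page);	/* try_to_unmap() with TTU_MUNLOCK */
	if (ret != SWAP_MLOCK)
		count_vm_event(UNEVICTABLE_PGMUNLOCKED);
	putback_lru_page(page);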

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
---

 mm/mlock.c |    2 
 mm/rmap.c  |  107 ++++++++++++---------------------------------------
 2 files changed, 27 insertions(+), 82 deletions(-)

--- mm1/mm/mlock.c	2009-09-28 00:28:41.000000000 +0100
+++ mm2/mm/mlock.c	2009-11-04 10:52:52.000000000 +0000
@@ -117,7 +117,7 @@ static void munlock_vma_page(struct page
 			/*
 			 * did try_to_unlock() succeed or punt?
 			 */
-			if (ret == SWAP_SUCCESS || ret == SWAP_AGAIN)
+			if (ret != SWAP_MLOCK)
 				count_vm_event(UNEVICTABLE_PGMUNLOCKED);
 
 			putback_lru_page(page);
--- mm1/mm/rmap.c	2009-11-04 10:52:45.000000000 +0000
+++ mm2/mm/rmap.c	2009-11-04 10:52:52.000000000 +0000
@@ -787,6 +787,8 @@ static int try_to_unmap_one(struct page
 			ret = SWAP_MLOCK;
 			goto out_unmap;
 		}
+		if (MLOCK_PAGES && TTU_ACTION(flags) == TTU_MUNLOCK)
+			goto out_unmap;
 	}
 	if (!(flags & TTU_IGNORE_ACCESS)) {
 		if (ptep_clear_flush_young_notify(vma, address, pte)) {
@@ -852,12 +854,22 @@ static int try_to_unmap_one(struct page
 	} else
 		dec_mm_counter(mm, file_rss);
 
-
 	page_remove_rmap(page);
 	page_cache_release(page);
 
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
+
+	if (MLOCK_PAGES && ret == SWAP_MLOCK) {
+		ret = SWAP_AGAIN;
+		if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
+			if (vma->vm_flags & VM_LOCKED) {
+				mlock_vma_page(page);
+				ret = SWAP_MLOCK;
+			}
+			up_read(&vma->vm_mm->mmap_sem);
+		}
+	}
 out:
 	return ret;
 }
@@ -979,23 +991,6 @@ static int try_to_unmap_cluster(unsigned
 	return ret;
 }
 
-/*
- * common handling for pages mapped in VM_LOCKED vmas
- */
-static int try_to_mlock_page(struct page *page, struct vm_area_struct *vma)
-{
-	int mlocked = 0;
-
-	if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
-		if (vma->vm_flags & VM_LOCKED) {
-			mlock_vma_page(page);
-			mlocked++;	/* really mlocked the page */
-		}
-		up_read(&vma->vm_mm->mmap_sem);
-	}
-	return mlocked;
-}
-
 /**
  * try_to_unmap_anon - unmap or unlock anonymous page using the object-based
  * rmap method
@@ -1016,42 +1011,19 @@ static int try_to_unmap_anon(struct page
 {
 	struct anon_vma *anon_vma;
 	struct vm_area_struct *vma;
-	unsigned int mlocked = 0;
 	int ret = SWAP_AGAIN;
-	int unlock = TTU_ACTION(flags) == TTU_MUNLOCK;
-
-	if (MLOCK_PAGES && unlikely(unlock))
-		ret = SWAP_SUCCESS;	/* default for try_to_munlock() */
 
 	anon_vma = page_lock_anon_vma(page);
 	if (!anon_vma)
 		return ret;
 
 	list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
-		if (MLOCK_PAGES && unlikely(unlock)) {
-			if (!((vma->vm_flags & VM_LOCKED) &&
-			      page_mapped_in_vma(page, vma)))
-				continue;  /* must visit all unlocked vmas */
-			ret = SWAP_MLOCK;  /* saw at least one mlocked vma */
-		} else {
-			ret = try_to_unmap_one(page, vma, flags);
-			if (ret == SWAP_FAIL || !page_mapped(page))
-				break;
-		}
-		if (ret == SWAP_MLOCK) {
-			mlocked = try_to_mlock_page(page, vma);
-			if (mlocked)
-				break;	/* stop if actually mlocked page */
-		}
+		ret = try_to_unmap_one(page, vma, flags);
+		if (ret != SWAP_AGAIN || !page_mapped(page))
+			break;
 	}
 
 	page_unlock_anon_vma(anon_vma);
-
-	if (mlocked)
-		ret = SWAP_MLOCK;	/* actually mlocked the page */
-	else if (ret == SWAP_MLOCK)
-		ret = SWAP_AGAIN;	/* saw VM_LOCKED vma */
-
 	return ret;
 }
 
@@ -1081,45 +1053,23 @@ static int try_to_unmap_file(struct page
 	unsigned long max_nl_cursor = 0;
 	unsigned long max_nl_size = 0;
 	unsigned int mapcount;
-	unsigned int mlocked = 0;
-	int unlock = TTU_ACTION(flags) == TTU_MUNLOCK;
-
-	if (MLOCK_PAGES && unlikely(unlock))
-		ret = SWAP_SUCCESS;	/* default for try_to_munlock() */
 
 	spin_lock(&mapping->i_mmap_lock);
 	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
-		if (MLOCK_PAGES && unlikely(unlock)) {
-			if (!((vma->vm_flags & VM_LOCKED) &&
-						page_mapped_in_vma(page, vma)))
-				continue;	/* must visit all vmas */
-			ret = SWAP_MLOCK;
-		} else {
-			ret = try_to_unmap_one(page, vma, flags);
-			if (ret == SWAP_FAIL || !page_mapped(page))
-				goto out;
-		}
-		if (ret == SWAP_MLOCK) {
-			mlocked = try_to_mlock_page(page, vma);
-			if (mlocked)
-				break;  /* stop if actually mlocked page */
-		}
+		ret = try_to_unmap_one(page, vma, flags);
+		if (ret != SWAP_AGAIN || !page_mapped(page))
+			goto out;
 	}
 
-	if (mlocked)
+	if (list_empty(&mapping->i_mmap_nonlinear))
 		goto out;
 
-	if (list_empty(&mapping->i_mmap_nonlinear))
+	/* We don't bother to try to find the munlocked page in nonlinears */
+	if (MLOCK_PAGES && TTU_ACTION(flags) == TTU_MUNLOCK)
 		goto out;
 
 	list_for_each_entry(vma, &mapping->i_mmap_nonlinear,
 						shared.vm_set.list) {
-		if (MLOCK_PAGES && unlikely(unlock)) {
-			if (!(vma->vm_flags & VM_LOCKED))
-				continue;	/* must visit all vmas */
-			ret = SWAP_MLOCK;	/* leave mlocked == 0 */
-			goto out;		/* no need to look further */
-		}
 		if (!MLOCK_PAGES && !(flags & TTU_IGNORE_MLOCK) &&
 			(vma->vm_flags & VM_LOCKED))
 			continue;
@@ -1161,10 +1111,9 @@ static int try_to_unmap_file(struct page
 			cursor = (unsigned long) vma->vm_private_data;
 			while ( cursor < max_nl_cursor &&
 				cursor < vma->vm_end - vma->vm_start) {
-				ret = try_to_unmap_cluster(cursor, &mapcount,
-								vma, page);
-				if (ret == SWAP_MLOCK)
-					mlocked = 2;	/* to return below */
+				if (try_to_unmap_cluster(cursor, &mapcount,
+						vma, page) == SWAP_MLOCK)
+					ret = SWAP_MLOCK;
 				cursor += CLUSTER_SIZE;
 				vma->vm_private_data = (void *) cursor;
 				if ((int)mapcount <= 0)
@@ -1185,10 +1134,6 @@ static int try_to_unmap_file(struct page
 		vma->vm_private_data = NULL;
 out:
 	spin_unlock(&mapping->i_mmap_lock);
-	if (mlocked)
-		ret = SWAP_MLOCK;	/* actually mlocked the page */
-	else if (ret == SWAP_MLOCK)
-		ret = SWAP_AGAIN;	/* saw VM_LOCKED vma */
 	return ret;
 }
 
@@ -1231,7 +1176,7 @@ int try_to_unmap(struct page *page, enum
  *
  * Return values are:
  *
- * SWAP_SUCCESS	- no vma's holding page mlocked.
+ * SWAP_AGAIN	- no vma is holding page mlocked, or,
  * SWAP_AGAIN	- page mapped in mlocked vma -- couldn't acquire mmap sem
  * SWAP_MLOCK	- page is now mlocked.
  */


* [PATCH 3/6] mm: CONFIG_MMU for PG_mlocked
  2009-11-10 21:50 ` Hugh Dickins
@ 2009-11-10 21:59   ` Hugh Dickins
  -1 siblings, 0 replies; 62+ messages in thread
From: Hugh Dickins @ 2009-11-10 21:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Izik Eidus, Andrea Arcangeli, Nick Piggin, KOSAKI Motohiro,
	Rik van Riel, Lee Schermerhorn, Andi Kleen, Wu Fengguang,
	linux-kernel, linux-mm

Remove three degrees of obfuscation, left over from when we had
CONFIG_UNEVICTABLE_LRU.  MLOCK_PAGES is CONFIG_HAVE_MLOCKED_PAGE_BIT
is CONFIG_HAVE_MLOCK is CONFIG_MMU.  rmap.o (and memory-failure.o)
are only built when CONFIG_MMU, so don't need such conditions at all.

Somehow, I feel no compulsion to remove the CONFIG_HAVE_MLOCK*
lines from 169 defconfigs: leave those to evolve in due course.
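
To illustrate the net effect on the guards, the page-flags.h case from
the hunk below: what used to test CONFIG_HAVE_MLOCKED_PAGE_BIT now tests
CONFIG_MMU directly, and the MLOCK_PAGES shorthand goes away.

	#ifdef CONFIG_MMU
	#define __PG_MLOCKED		(1 << PG_mlocked)
	#else
	#define __PG_MLOCKED		0
	#endif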

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
---

 include/linux/page-flags.h |    8 +++-----
 mm/Kconfig                 |    8 --------
 mm/internal.h              |   26 ++++++++++++--------------
 mm/memory-failure.c        |    2 --
 mm/page_alloc.c            |    4 ----
 mm/rmap.c                  |   17 +++++------------
 6 files changed, 20 insertions(+), 45 deletions(-)

--- mm2/include/linux/page-flags.h	2009-11-02 12:32:34.000000000 +0000
+++ mm3/include/linux/page-flags.h	2009-11-04 10:52:58.000000000 +0000
@@ -99,7 +99,7 @@ enum pageflags {
 	PG_buddy,		/* Page is free, on buddy lists */
 	PG_swapbacked,		/* Page is backed by RAM/swap */
 	PG_unevictable,		/* Page is "unevictable"  */
-#ifdef CONFIG_HAVE_MLOCKED_PAGE_BIT
+#ifdef CONFIG_MMU
 	PG_mlocked,		/* Page is vma mlocked */
 #endif
 #ifdef CONFIG_ARCH_USES_PG_UNCACHED
@@ -259,12 +259,10 @@ PAGEFLAG_FALSE(SwapCache)
 PAGEFLAG(Unevictable, unevictable) __CLEARPAGEFLAG(Unevictable, unevictable)
 	TESTCLEARFLAG(Unevictable, unevictable)
 
-#ifdef CONFIG_HAVE_MLOCKED_PAGE_BIT
-#define MLOCK_PAGES 1
+#ifdef CONFIG_MMU
 PAGEFLAG(Mlocked, mlocked) __CLEARPAGEFLAG(Mlocked, mlocked)
 	TESTSCFLAG(Mlocked, mlocked) __TESTCLEARFLAG(Mlocked, mlocked)
 #else
-#define MLOCK_PAGES 0
 PAGEFLAG_FALSE(Mlocked) SETPAGEFLAG_NOOP(Mlocked)
 	TESTCLEARFLAG_FALSE(Mlocked) __TESTCLEARFLAG_FALSE(Mlocked)
 #endif
@@ -393,7 +391,7 @@ static inline void __ClearPageTail(struc
 
 #endif /* !PAGEFLAGS_EXTENDED */
 
-#ifdef CONFIG_HAVE_MLOCKED_PAGE_BIT
+#ifdef CONFIG_MMU
 #define __PG_MLOCKED		(1 << PG_mlocked)
 #else
 #define __PG_MLOCKED		0
--- mm2/mm/Kconfig	2009-11-02 12:32:34.000000000 +0000
+++ mm3/mm/Kconfig	2009-11-04 10:52:58.000000000 +0000
@@ -203,14 +203,6 @@ config VIRT_TO_BUS
 	def_bool y
 	depends on !ARCH_NO_VIRT_TO_BUS
 
-config HAVE_MLOCK
-	bool
-	default y if MMU=y
-
-config HAVE_MLOCKED_PAGE_BIT
-	bool
-	default y if HAVE_MLOCK=y
-
 config MMU_NOTIFIER
 	bool
 
--- mm2/mm/internal.h	2009-09-28 00:28:41.000000000 +0100
+++ mm3/mm/internal.h	2009-11-04 10:52:58.000000000 +0000
@@ -63,17 +63,6 @@ static inline unsigned long page_order(s
 	return page_private(page);
 }
 
-#ifdef CONFIG_HAVE_MLOCK
-extern long mlock_vma_pages_range(struct vm_area_struct *vma,
-			unsigned long start, unsigned long end);
-extern void munlock_vma_pages_range(struct vm_area_struct *vma,
-			unsigned long start, unsigned long end);
-static inline void munlock_vma_pages_all(struct vm_area_struct *vma)
-{
-	munlock_vma_pages_range(vma, vma->vm_start, vma->vm_end);
-}
-#endif
-
 /*
  * unevictable_migrate_page() called only from migrate_page_copy() to
  * migrate unevictable flag to new page.
@@ -86,7 +75,16 @@ static inline void unevictable_migrate_p
 		SetPageUnevictable(new);
 }
 
-#ifdef CONFIG_HAVE_MLOCKED_PAGE_BIT
+#ifdef CONFIG_MMU
+extern long mlock_vma_pages_range(struct vm_area_struct *vma,
+			unsigned long start, unsigned long end);
+extern void munlock_vma_pages_range(struct vm_area_struct *vma,
+			unsigned long start, unsigned long end);
+static inline void munlock_vma_pages_all(struct vm_area_struct *vma)
+{
+	munlock_vma_pages_range(vma, vma->vm_start, vma->vm_end);
+}
+
 /*
  * Called only in fault path via page_evictable() for a new page
  * to determine if it's being mapped into a LOCKED vma.
@@ -144,7 +142,7 @@ static inline void mlock_migrate_page(st
 	}
 }
 
-#else /* CONFIG_HAVE_MLOCKED_PAGE_BIT */
+#else /* !CONFIG_MMU */
 static inline int is_mlocked_vma(struct vm_area_struct *v, struct page *p)
 {
 	return 0;
@@ -153,7 +151,7 @@ static inline void clear_page_mlock(stru
 static inline void mlock_vma_page(struct page *page) { }
 static inline void mlock_migrate_page(struct page *new, struct page *old) { }
 
-#endif /* CONFIG_HAVE_MLOCKED_PAGE_BIT */
+#endif /* !CONFIG_MMU */
 
 /*
  * Return the mem_map entry representing the 'offset' subpage within
--- mm2/mm/memory-failure.c	2009-11-02 12:32:34.000000000 +0000
+++ mm3/mm/memory-failure.c	2009-11-04 10:52:58.000000000 +0000
@@ -582,10 +582,8 @@ static struct page_state {
 	{ unevict|dirty, unevict|dirty,	"unevictable LRU", me_pagecache_dirty},
 	{ unevict,	unevict,	"unevictable LRU", me_pagecache_clean},
 
-#ifdef CONFIG_HAVE_MLOCKED_PAGE_BIT
 	{ mlock|dirty,	mlock|dirty,	"mlocked LRU",	me_pagecache_dirty },
 	{ mlock,	mlock,		"mlocked LRU",	me_pagecache_clean },
-#endif
 
 	{ lru|dirty,	lru|dirty,	"LRU",		me_pagecache_dirty },
 	{ lru|dirty,	lru,		"clean LRU",	me_pagecache_clean },
--- mm2/mm/page_alloc.c	2009-11-02 12:32:34.000000000 +0000
+++ mm3/mm/page_alloc.c	2009-11-04 10:52:58.000000000 +0000
@@ -487,7 +487,6 @@ static inline void __free_one_page(struc
 	zone->free_area[order].nr_free++;
 }
 
-#ifdef CONFIG_HAVE_MLOCKED_PAGE_BIT
 /*
  * free_page_mlock() -- clean up attempts to free and mlocked() page.
  * Page should not be on lru, so no need to fix that up.
@@ -503,9 +502,6 @@ static inline void free_page_mlock(struc
 	__dec_zone_page_state(page, NR_MLOCK);
 	__count_vm_event(UNEVICTABLE_MLOCKFREED);
 }
-#else
-static void free_page_mlock(struct page *page) { }
-#endif
 
 static inline int free_pages_check(struct page *page)
 {
--- mm2/mm/rmap.c	2009-11-04 10:52:52.000000000 +0000
+++ mm3/mm/rmap.c	2009-11-04 10:52:58.000000000 +0000
@@ -787,7 +787,7 @@ static int try_to_unmap_one(struct page
 			ret = SWAP_MLOCK;
 			goto out_unmap;
 		}
-		if (MLOCK_PAGES && TTU_ACTION(flags) == TTU_MUNLOCK)
+		if (TTU_ACTION(flags) == TTU_MUNLOCK)
 			goto out_unmap;
 	}
 	if (!(flags & TTU_IGNORE_ACCESS)) {
@@ -860,7 +860,7 @@ static int try_to_unmap_one(struct page
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
 
-	if (MLOCK_PAGES && ret == SWAP_MLOCK) {
+	if (ret == SWAP_MLOCK) {
 		ret = SWAP_AGAIN;
 		if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
 			if (vma->vm_flags & VM_LOCKED) {
@@ -937,11 +937,10 @@ static int try_to_unmap_cluster(unsigned
 		return ret;
 
 	/*
-	 * MLOCK_PAGES => feature is configured.
-	 * if we can acquire the mmap_sem for read, and vma is VM_LOCKED,
+	 * If we can acquire the mmap_sem for read, and vma is VM_LOCKED,
 	 * keep the sem while scanning the cluster for mlocking pages.
 	 */
-	if (MLOCK_PAGES && down_read_trylock(&vma->vm_mm->mmap_sem)) {
+	if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
 		locked_vma = (vma->vm_flags & VM_LOCKED);
 		if (!locked_vma)
 			up_read(&vma->vm_mm->mmap_sem); /* don't need it */
@@ -1065,14 +1064,11 @@ static int try_to_unmap_file(struct page
 		goto out;
 
 	/* We don't bother to try to find the munlocked page in nonlinears */
-	if (MLOCK_PAGES && TTU_ACTION(flags) == TTU_MUNLOCK)
+	if (TTU_ACTION(flags) == TTU_MUNLOCK)
 		goto out;
 
 	list_for_each_entry(vma, &mapping->i_mmap_nonlinear,
 						shared.vm_set.list) {
-		if (!MLOCK_PAGES && !(flags & TTU_IGNORE_MLOCK) &&
-			(vma->vm_flags & VM_LOCKED))
-			continue;
 		cursor = (unsigned long) vma->vm_private_data;
 		if (cursor > max_nl_cursor)
 			max_nl_cursor = cursor;
@@ -1105,9 +1101,6 @@ static int try_to_unmap_file(struct page
 	do {
 		list_for_each_entry(vma, &mapping->i_mmap_nonlinear,
 						shared.vm_set.list) {
-			if (!MLOCK_PAGES && !(flags & TTU_IGNORE_MLOCK) &&
-			    (vma->vm_flags & VM_LOCKED))
-				continue;
 			cursor = (unsigned long) vma->vm_private_data;
 			while ( cursor < max_nl_cursor &&
 				cursor < vma->vm_end - vma->vm_start) {


* [PATCH 4/6] mm: pass address down to rmap ones
  2009-11-10 21:50 ` Hugh Dickins
@ 2009-11-10 22:00   ` Hugh Dickins
  -1 siblings, 0 replies; 62+ messages in thread
From: Hugh Dickins @ 2009-11-10 22:00 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Izik Eidus, Andrea Arcangeli, linux-kernel, linux-mm

KSM swapping will know where page_referenced_one() and try_to_unmap_one()
should look.  It could hack page->index to get them to do what it wants,
but it seems cleaner now to pass the address down to them.

Make the same change to page_mkclean_one(), since it follows the same
pattern; but there's no real need in its case.
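
For context, the address now computed once in the callers is what
vma_address() returns; roughly (existing rmap.c helper, unchanged here):

	static inline unsigned long
	vma_address(struct page *page, struct vm_area_struct *vma)
	{
		pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
		unsigned long address;

		address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
		if (unlikely(address < vma->vm_start || address >= vma->vm_end))
			return -EFAULT;	/* page falls outside this vma */
		return address;
	}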

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
---

 mm/rmap.c |   53 ++++++++++++++++++++++++++--------------------------
 1 file changed, 27 insertions(+), 26 deletions(-)

--- mm3/mm/rmap.c	2009-11-04 10:52:58.000000000 +0000
+++ mm4/mm/rmap.c	2009-11-04 10:53:05.000000000 +0000
@@ -336,21 +336,15 @@ int page_mapped_in_vma(struct page *page
  * Subfunctions of page_referenced: page_referenced_one called
  * repeatedly from either page_referenced_anon or page_referenced_file.
  */
-static int page_referenced_one(struct page *page,
-			       struct vm_area_struct *vma,
-			       unsigned int *mapcount,
+static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
+			       unsigned long address, unsigned int *mapcount,
 			       unsigned long *vm_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	unsigned long address;
 	pte_t *pte;
 	spinlock_t *ptl;
 	int referenced = 0;
 
-	address = vma_address(page, vma);
-	if (address == -EFAULT)
-		goto out;
-
 	pte = page_check_address(page, mm, address, &ptl, 0);
 	if (!pte)
 		goto out;
@@ -408,6 +402,9 @@ static int page_referenced_anon(struct p
 
 	mapcount = page_mapcount(page);
 	list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
+		unsigned long address = vma_address(page, vma);
+		if (address == -EFAULT)
+			continue;
 		/*
 		 * If we are reclaiming on behalf of a cgroup, skip
 		 * counting on behalf of references from different
@@ -415,7 +412,7 @@ static int page_referenced_anon(struct p
 		 */
 		if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
 			continue;
-		referenced += page_referenced_one(page, vma,
+		referenced += page_referenced_one(page, vma, address,
 						  &mapcount, vm_flags);
 		if (!mapcount)
 			break;
@@ -473,6 +470,9 @@ static int page_referenced_file(struct p
 	mapcount = page_mapcount(page);
 
 	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
+		unsigned long address = vma_address(page, vma);
+		if (address == -EFAULT)
+			continue;
 		/*
 		 * If we are reclaiming on behalf of a cgroup, skip
 		 * counting on behalf of references from different
@@ -480,7 +480,7 @@ static int page_referenced_file(struct p
 		 */
 		if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
 			continue;
-		referenced += page_referenced_one(page, vma,
+		referenced += page_referenced_one(page, vma, address,
 						  &mapcount, vm_flags);
 		if (!mapcount)
 			break;
@@ -534,18 +534,14 @@ int page_referenced(struct page *page,
 	return referenced;
 }
 
-static int page_mkclean_one(struct page *page, struct vm_area_struct *vma)
+static int page_mkclean_one(struct page *page, struct vm_area_struct *vma,
+			    unsigned long address)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	unsigned long address;
 	pte_t *pte;
 	spinlock_t *ptl;
 	int ret = 0;
 
-	address = vma_address(page, vma);
-	if (address == -EFAULT)
-		goto out;
-
 	pte = page_check_address(page, mm, address, &ptl, 1);
 	if (!pte)
 		goto out;
@@ -577,8 +573,12 @@ static int page_mkclean_file(struct addr
 
 	spin_lock(&mapping->i_mmap_lock);
 	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
-		if (vma->vm_flags & VM_SHARED)
-			ret += page_mkclean_one(page, vma);
+		if (vma->vm_flags & VM_SHARED) {
+			unsigned long address = vma_address(page, vma);
+			if (address == -EFAULT)
+				continue;
+			ret += page_mkclean_one(page, vma, address);
+		}
 	}
 	spin_unlock(&mapping->i_mmap_lock);
 	return ret;
@@ -760,19 +760,14 @@ void page_remove_rmap(struct page *page)
  * repeatedly from either try_to_unmap_anon or try_to_unmap_file.
  */
 static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
-				enum ttu_flags flags)
+			    unsigned long address, enum ttu_flags flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	unsigned long address;
 	pte_t *pte;
 	pte_t pteval;
 	spinlock_t *ptl;
 	int ret = SWAP_AGAIN;
 
-	address = vma_address(page, vma);
-	if (address == -EFAULT)
-		goto out;
-
 	pte = page_check_address(page, mm, address, &ptl, 0);
 	if (!pte)
 		goto out;
@@ -1017,7 +1012,10 @@ static int try_to_unmap_anon(struct page
 		return ret;
 
 	list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
-		ret = try_to_unmap_one(page, vma, flags);
+		unsigned long address = vma_address(page, vma);
+		if (address == -EFAULT)
+			continue;
+		ret = try_to_unmap_one(page, vma, address, flags);
 		if (ret != SWAP_AGAIN || !page_mapped(page))
 			break;
 	}
@@ -1055,7 +1053,10 @@ static int try_to_unmap_file(struct page
 
 	spin_lock(&mapping->i_mmap_lock);
 	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
-		ret = try_to_unmap_one(page, vma, flags);
+		unsigned long address = vma_address(page, vma);
+		if (address == -EFAULT)
+			continue;
+		ret = try_to_unmap_one(page, vma, address, flags);
 		if (ret != SWAP_AGAIN || !page_mapped(page))
 			goto out;
 	}

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 4/6] mm: pass address down to rmap ones
@ 2009-11-10 22:00   ` Hugh Dickins
  0 siblings, 0 replies; 62+ messages in thread
From: Hugh Dickins @ 2009-11-10 22:00 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Izik Eidus, Andrea Arcangeli, linux-kernel, linux-mm

KSM swapping will know where page_referenced_one() and try_to_unmap_one()
should look.  It could hack page->index to get them to do what it wants,
but it seems cleaner now to pass the address down to them.

Make the same change to page_mkclean_one(), since it follows the same
pattern; but there's no real need in its case.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
---

 mm/rmap.c |   53 ++++++++++++++++++++++++++--------------------------
 1 file changed, 27 insertions(+), 26 deletions(-)

--- mm3/mm/rmap.c	2009-11-04 10:52:58.000000000 +0000
+++ mm4/mm/rmap.c	2009-11-04 10:53:05.000000000 +0000
@@ -336,21 +336,15 @@ int page_mapped_in_vma(struct page *page
  * Subfunctions of page_referenced: page_referenced_one called
  * repeatedly from either page_referenced_anon or page_referenced_file.
  */
-static int page_referenced_one(struct page *page,
-			       struct vm_area_struct *vma,
-			       unsigned int *mapcount,
+static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
+			       unsigned long address, unsigned int *mapcount,
 			       unsigned long *vm_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	unsigned long address;
 	pte_t *pte;
 	spinlock_t *ptl;
 	int referenced = 0;
 
-	address = vma_address(page, vma);
-	if (address == -EFAULT)
-		goto out;
-
 	pte = page_check_address(page, mm, address, &ptl, 0);
 	if (!pte)
 		goto out;
@@ -408,6 +402,9 @@ static int page_referenced_anon(struct p
 
 	mapcount = page_mapcount(page);
 	list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
+		unsigned long address = vma_address(page, vma);
+		if (address == -EFAULT)
+			continue;
 		/*
 		 * If we are reclaiming on behalf of a cgroup, skip
 		 * counting on behalf of references from different
@@ -415,7 +412,7 @@ static int page_referenced_anon(struct p
 		 */
 		if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
 			continue;
-		referenced += page_referenced_one(page, vma,
+		referenced += page_referenced_one(page, vma, address,
 						  &mapcount, vm_flags);
 		if (!mapcount)
 			break;
@@ -473,6 +470,9 @@ static int page_referenced_file(struct p
 	mapcount = page_mapcount(page);
 
 	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
+		unsigned long address = vma_address(page, vma);
+		if (address == -EFAULT)
+			continue;
 		/*
 		 * If we are reclaiming on behalf of a cgroup, skip
 		 * counting on behalf of references from different
@@ -480,7 +480,7 @@ static int page_referenced_file(struct p
 		 */
 		if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
 			continue;
-		referenced += page_referenced_one(page, vma,
+		referenced += page_referenced_one(page, vma, address,
 						  &mapcount, vm_flags);
 		if (!mapcount)
 			break;
@@ -534,18 +534,14 @@ int page_referenced(struct page *page,
 	return referenced;
 }
 
-static int page_mkclean_one(struct page *page, struct vm_area_struct *vma)
+static int page_mkclean_one(struct page *page, struct vm_area_struct *vma,
+			    unsigned long address)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	unsigned long address;
 	pte_t *pte;
 	spinlock_t *ptl;
 	int ret = 0;
 
-	address = vma_address(page, vma);
-	if (address == -EFAULT)
-		goto out;
-
 	pte = page_check_address(page, mm, address, &ptl, 1);
 	if (!pte)
 		goto out;
@@ -577,8 +573,12 @@ static int page_mkclean_file(struct addr
 
 	spin_lock(&mapping->i_mmap_lock);
 	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
-		if (vma->vm_flags & VM_SHARED)
-			ret += page_mkclean_one(page, vma);
+		if (vma->vm_flags & VM_SHARED) {
+			unsigned long address = vma_address(page, vma);
+			if (address == -EFAULT)
+				continue;
+			ret += page_mkclean_one(page, vma, address);
+		}
 	}
 	spin_unlock(&mapping->i_mmap_lock);
 	return ret;
@@ -760,19 +760,14 @@ void page_remove_rmap(struct page *page)
  * repeatedly from either try_to_unmap_anon or try_to_unmap_file.
  */
 static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
-				enum ttu_flags flags)
+			    unsigned long address, enum ttu_flags flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	unsigned long address;
 	pte_t *pte;
 	pte_t pteval;
 	spinlock_t *ptl;
 	int ret = SWAP_AGAIN;
 
-	address = vma_address(page, vma);
-	if (address == -EFAULT)
-		goto out;
-
 	pte = page_check_address(page, mm, address, &ptl, 0);
 	if (!pte)
 		goto out;
@@ -1017,7 +1012,10 @@ static int try_to_unmap_anon(struct page
 		return ret;
 
 	list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
-		ret = try_to_unmap_one(page, vma, flags);
+		unsigned long address = vma_address(page, vma);
+		if (address == -EFAULT)
+			continue;
+		ret = try_to_unmap_one(page, vma, address, flags);
 		if (ret != SWAP_AGAIN || !page_mapped(page))
 			break;
 	}
@@ -1055,7 +1053,10 @@ static int try_to_unmap_file(struct page
 
 	spin_lock(&mapping->i_mmap_lock);
 	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
-		ret = try_to_unmap_one(page, vma, flags);
+		unsigned long address = vma_address(page, vma);
+		if (address == -EFAULT)
+			continue;
+		ret = try_to_unmap_one(page, vma, address, flags);
 		if (ret != SWAP_AGAIN || !page_mapped(page))
 			goto out;
 	}


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 5/6] mm: stop ptlock enlarging struct page
  2009-11-10 21:50 ` Hugh Dickins
@ 2009-11-10 22:02   ` Hugh Dickins
  -1 siblings, 0 replies; 62+ messages in thread
From: Hugh Dickins @ 2009-11-10 22:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Izik Eidus, Andrea Arcangeli, Peter Zijlstra, Christoph Lameter,
	linux-kernel, linux-mm

CONFIG_DEBUG_SPINLOCK adds 12 or 16 bytes to a 32- or 64-bit spinlock_t,
and CONFIG_DEBUG_LOCK_ALLOC adds another 12 or 24 bytes to it: lockdep
enables both of those, and CONFIG_LOCK_STAT adds 8 or 16 bytes to that.

When 2.6.15 placed the split page table lock inside struct page (usually
sized 32 or 56 bytes), only CONFIG_DEBUG_SPINLOCK was a possibility, and
we ignored the enlargement (but fitted in CONFIG_GENERIC_LOCKBREAK's 4
by letting the spinlock_t occupy both page->private and page->mapping).

Should these debugging options be allowed to double the size of a struct
page, when only one minority use of the page (as a page table) needs to
fit a spinlock in there?  Perhaps not.

Take the easy way out: switch off SPLIT_PTLOCK_CPUS when DEBUG_SPINLOCK
or DEBUG_LOCK_ALLOC is in force.  I've sometimes tried to be cleverer,
kmallocing a cacheline for the spinlock when it doesn't fit, but given
up each time.  Falling back to mm->page_table_lock (as we do when ptlock
is not split) lets lockdep check out the strictest path anyway.

And now that some arches allow 8192 cpus, use 999999 for infinity.

(What has this got to do with KSM swapping?  It doesn't care about the
size of struct page, but may care about random junk in page->mapping -
to be explained separately later.)
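
For context, a rough sketch of how the Kconfig value takes effect
(paraphrased from include/linux/mm_types.h and mm.h of this era, not part
of this patch): the threshold is compared against NR_CPUS, so with NR_CPUS
capped at 8192 a value of 999999 can never be reached, pte_lockptr() falls
back to mm->page_table_lock, and no spinlock_t is embedded in struct page.

/* paraphrased: how SPLIT_PTLOCK_CPUS is consumed */
#define USE_SPLIT_PTLOCKS	(NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)

#if USE_SPLIT_PTLOCKS
/* split: the lock lives in the pte page's struct page */
#define pte_lockptr(mm, pmd)	({(void)(mm); __pte_lockptr(pmd_page(*(pmd)));})
#else
/* not split: one mm-wide lock, nothing added to struct page */
#define pte_lockptr(mm, pmd)	({(void)(pmd); &(mm)->page_table_lock;})
#endif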

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
---

 mm/Kconfig |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- mm4/mm/Kconfig	2009-11-04 10:52:58.000000000 +0000
+++ mm5/mm/Kconfig	2009-11-04 10:53:13.000000000 +0000
@@ -161,11 +161,13 @@ config PAGEFLAGS_EXTENDED
 # Default to 4 for wider testing, though 8 might be more appropriate.
 # ARM's adjust_pte (unused if VIPT) depends on mm-wide page_table_lock.
 # PA-RISC 7xxx's spinlock_t would enlarge struct page from 32 to 44 bytes.
+# DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC spinlock_t also enlarge struct page.
 #
 config SPLIT_PTLOCK_CPUS
 	int
-	default "4096" if ARM && !CPU_CACHE_VIPT
-	default "4096" if PARISC && !PA20
+	default "999999" if ARM && !CPU_CACHE_VIPT
+	default "999999" if PARISC && !PA20
+	default "999999" if DEBUG_SPINLOCK || DEBUG_LOCK_ALLOC
 	default "4"
 
 #

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 6/6] mm: sigbus instead of abusing oom
  2009-11-10 21:50 ` Hugh Dickins
@ 2009-11-10 22:06   ` Hugh Dickins
  -1 siblings, 0 replies; 62+ messages in thread
From: Hugh Dickins @ 2009-11-10 22:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Izik Eidus, Andrea Arcangeli, KAMEZAWA Hiroyuki, Minchan Kim,
	Andi Kleen, Wu Fengguang, linux-kernel, linux-mm

When do_nonlinear_fault() realizes that the page table must have been
corrupted for it to have been called, it does print_bad_pte() and
returns ... VM_FAULT_OOM, which is hard to understand.

It made some sense when I did it for 2.6.15, when do_page_fault()
just killed the current process; but nowadays it lets the OOM killer
decide who to kill - so page table corruption in one process would
be liable to kill another.

Change it to return VM_FAULT_SIGBUS instead: that doesn't guarantee
that the process will be killed, but is good enough for such a rare
abnormality, accompanied as it is by the "BUG: Bad page map" message.

And recent HWPOISON work has copied that code into do_swap_page(),
when it finds an impossible swap entry: fix that to VM_FAULT_SIGBUS too.
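
For context, a schematic of why the return value matters (simplified, not
exact arch code): VM_FAULT_OOM hands the decision to the OOM killer, which
chooses its own victim, whereas VM_FAULT_SIGBUS is delivered to the
faulting task itself.

	/* schematic of an arch do_page_fault() consuming the result */
	fault = handle_mm_fault(mm, vma, address, write ? FAULT_FLAG_WRITE : 0);
	if (unlikely(fault & VM_FAULT_ERROR)) {
		if (fault & VM_FAULT_OOM) {
			/* OOM killer picks a victim by its own heuristics,
			 * quite possibly not the process whose page table
			 * is corrupt */
		} else if (fault & VM_FAULT_SIGBUS) {
			/* force SIGBUS on current, the faulting task */
		}
	}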

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
---
This one has nothing whatever to do with KSM swapping,
just something that KAMEZAWA-san and Minchan noticed recently.

 mm/memory.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- mm5/mm/memory.c	2009-11-02 12:32:34.000000000 +0000
+++ mm6/mm/memory.c	2009-11-07 14:44:58.000000000 +0000
@@ -2529,7 +2529,7 @@ static int do_swap_page(struct mm_struct
 			ret = VM_FAULT_HWPOISON;
 		} else {
 			print_bad_pte(vma, address, orig_pte, NULL);
-			ret = VM_FAULT_OOM;
+			ret = VM_FAULT_SIGBUS;
 		}
 		goto out;
 	}
@@ -2925,7 +2925,7 @@ static int do_nonlinear_fault(struct mm_
 		 * Page table corrupted: show pte and kill process.
 		 */
 		print_bad_pte(vma, address, orig_pte, NULL);
-		return VM_FAULT_OOM;
+		return VM_FAULT_SIGBUS;
 	}
 
 	pgoff = pte_to_pgoff(orig_pte);

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 5/6] mm: stop ptlock enlarging struct page
  2009-11-10 22:02   ` Hugh Dickins
@ 2009-11-10 22:09     ` Peter Zijlstra
  -1 siblings, 0 replies; 62+ messages in thread
From: Peter Zijlstra @ 2009-11-10 22:09 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Izik Eidus, Andrea Arcangeli, Christoph Lameter,
	linux-kernel, linux-mm

On Tue, 2009-11-10 at 22:02 +0000, Hugh Dickins wrote:
> CONFIG_DEBUG_SPINLOCK adds 12 or 16 bytes to a 32- or 64-bit spinlock_t,
> and CONFIG_DEBUG_LOCK_ALLOC adds another 12 or 24 bytes to it: lockdep
> enables both of those, and CONFIG_LOCK_STAT adds 8 or 16 bytes to that.
> 
> When 2.6.15 placed the split page table lock inside struct page (usually
> sized 32 or 56 bytes), only CONFIG_DEBUG_SPINLOCK was a possibility, and
> we ignored the enlargement (but fitted in CONFIG_GENERIC_LOCKBREAK's 4
> by letting the spinlock_t occupy both page->private and page->mapping).
> 
> Should these debugging options be allowed to double the size of a struct
> page, when only one minority use of the page (as a page table) needs to
> fit a spinlock in there?  Perhaps not.
> 
> Take the easy way out: switch off SPLIT_PTLOCK_CPUS when DEBUG_SPINLOCK
> or DEBUG_LOCK_ALLOC is in force.  I've sometimes tried to be cleverer,
> kmallocing a cacheline for the spinlock when it doesn't fit, but given
> up each time.  Falling back to mm->page_table_lock (as we do when ptlock
> is not split) lets lockdep check out the strictest path anyway.

Why? We know lockdep bloats stuff we never cared about... and hiding a
popular CONFIG option from lockdep doesn't seem like a good idea to me.

> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> ---
> 
>  mm/Kconfig |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> --- mm4/mm/Kconfig	2009-11-04 10:52:58.000000000 +0000
> +++ mm5/mm/Kconfig	2009-11-04 10:53:13.000000000 +0000
> @@ -161,11 +161,13 @@ config PAGEFLAGS_EXTENDED
>  # Default to 4 for wider testing, though 8 might be more appropriate.
>  # ARM's adjust_pte (unused if VIPT) depends on mm-wide page_table_lock.
>  # PA-RISC 7xxx's spinlock_t would enlarge struct page from 32 to 44 bytes.
> +# DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC spinlock_t also enlarge struct page.
>  #
>  config SPLIT_PTLOCK_CPUS
>  	int
> -	default "4096" if ARM && !CPU_CACHE_VIPT
> -	default "4096" if PARISC && !PA20
> +	default "999999" if ARM && !CPU_CACHE_VIPT
> +	default "999999" if PARISC && !PA20
> +	default "999999" if DEBUG_SPINLOCK || DEBUG_LOCK_ALLOC
>  	default "4"
>  
>  #



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 5/6] mm: stop ptlock enlarging struct page
  2009-11-10 22:02   ` Hugh Dickins
@ 2009-11-10 22:14     ` Peter Zijlstra
  -1 siblings, 0 replies; 62+ messages in thread
From: Peter Zijlstra @ 2009-11-10 22:14 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Izik Eidus, Andrea Arcangeli, Christoph Lameter,
	linux-kernel, linux-mm


fwiw, in -rt we carry this, because there spinlock_t is huge even
without lockdep.


---
commit 27909c87933670deead6ab74274cf61ebffad5ac
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Fri Jul 3 08:44:54 2009 -0500

    mm: shrink the page frame to !-rt size
    
    The below is a boot-tested hack to shrink the page frame size back to
    normal.
    
    Should be a net win since there should be many fewer PTE-pages than
    page-frames.
    
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e52dfbb..fb2a7e9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -938,27 +938,85 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a
  * overflow into the next struct page (as it might with DEBUG_SPINLOCK).
  * When freeing, reset page->mapping so free_pages_check won't complain.
  */
+#ifndef CONFIG_PREEMPT_RT
+
 #define __pte_lockptr(page)	&((page)->ptl)
-#define pte_lock_init(_page)	do {					\
-	spin_lock_init(__pte_lockptr(_page));				\
-} while (0)
+
+static inline struct page *pte_lock_init(struct page *page)
+{
+	spin_lock_init(__pte_lockptr(page));
+	return page;
+}
+
 #define pte_lock_deinit(page)	((page)->mapping = NULL)
+
+#else /* PREEMPT_RT */
+
+/*
+ * On PREEMPT_RT the spinlock_t's are too large to embed in the
+ * page frame, hence it only has a pointer and we need to dynamically
+ * allocate the lock when we allocate PTE-pages.
+ *
+ * This is an overall win, since only a small fraction of the pages
+ * will be PTE pages under normal circumstances.
+ */
+
+#define __pte_lockptr(page)	((page)->ptl)
+
+/*
+ * Heinous hack, relies on the caller doing something like:
+ *
+ *   pte = alloc_pages(PGALLOC_GFP, 0);
+ *   if (pte)
+ *     pgtable_page_ctor(pte);
+ *   return pte;
+ *
+ * This ensures we release the page and return NULL when the
+ * lock allocation fails.
+ */
+static inline struct page *pte_lock_init(struct page *page)
+{
+	page->ptl = kmalloc(sizeof(spinlock_t), GFP_KERNEL);
+	if (page->ptl) {
+		spin_lock_init(__pte_lockptr(page));
+	} else {
+		__free_page(page);
+		page = NULL;
+	}
+	return page;
+}
+
+static inline void pte_lock_deinit(struct page *page)
+{
+	kfree(page->ptl);
+	page->mapping = NULL;
+}
+
+#endif /* PREEMPT_RT */
+
 #define pte_lockptr(mm, pmd)	({(void)(mm); __pte_lockptr(pmd_page(*(pmd)));})
 #else	/* !USE_SPLIT_PTLOCKS */
 /*
  * We use mm->page_table_lock to guard all pagetable pages of the mm.
  */
-#define pte_lock_init(page)	do {} while (0)
+static inline struct page *pte_lock_init(struct page *page) { return page; }
 #define pte_lock_deinit(page)	do {} while (0)
 #define pte_lockptr(mm, pmd)	({(void)(pmd); &(mm)->page_table_lock;})
 #endif /* USE_SPLIT_PTLOCKS */
 
-static inline void pgtable_page_ctor(struct page *page)
+static inline struct page *__pgtable_page_ctor(struct page *page)
 {
-	pte_lock_init(page);
-	inc_zone_page_state(page, NR_PAGETABLE);
+	page = pte_lock_init(page);
+	if (page)
+		inc_zone_page_state(page, NR_PAGETABLE);
+	return page;
 }
 
+#define pgtable_page_ctor(page)				\
+do {							\
+	page = __pgtable_page_ctor(page);		\
+} while (0)
+
 static inline void pgtable_page_dtor(struct page *page)
 {
 	pte_lock_deinit(page);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index bd79936..2b208da 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -69,7 +69,11 @@ struct page {
 						 */
 	    };
 #if USE_SPLIT_PTLOCKS
+#ifndef CONFIG_PREEMPT_RT
 	    spinlock_t ptl;
+#else
+	    spinlock_t *ptl;
+#endif
 #endif
 	    struct kmem_cache *slab;	/* SLUB: Pointer to slab */
 	    struct page *first_page;	/* Compound tail pages */



^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH 5/6] mm: stop ptlock enlarging struct page
  2009-11-10 22:09     ` Peter Zijlstra
@ 2009-11-10 22:24       ` Hugh Dickins
  -1 siblings, 0 replies; 62+ messages in thread
From: Hugh Dickins @ 2009-11-10 22:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Izik Eidus, Andrea Arcangeli, Christoph Lameter,
	linux-kernel, linux-mm

On Tue, 10 Nov 2009, Peter Zijlstra wrote:
> On Tue, 2009-11-10 at 22:02 +0000, Hugh Dickins wrote:
> > 
> > Take the easy way out: switch off SPLIT_PTLOCK_CPUS when DEBUG_SPINLOCK
> > or DEBUG_LOCK_ALLOC is in force.  I've sometimes tried to be cleverer,
> > kmallocing a cacheline for the spinlock when it doesn't fit, but given
> > up each time.  Falling back to mm->page_table_lock (as we do when ptlock
> > is not split) lets lockdep check out the strictest path anyway.
> 
> Why? We know lockdep bloats stuff we never cared about... and hiding a
> popular CONFIG option from lockdep doesn't seem like a good idea to me.

That's a fair opinion, and indeed I Cc'ed you in case it were yours.

I'd like to see how other people feel about it.  Personally I detest
and regret that bloat to struct page, when there's only one particular
use of a page that remotely excuses it.

If it were less tiresome, I'd have gone for the dynamic kmalloc; but
it seemed silly to make that effort when the Kconfig mod is so easy.

But so far as letting lockdep do its job goes, we're actually better
off using page_table_lock there, as I tried to explain: since that
lock is used for a few other purposes, lockdep is more likely to
catch an issue which the SPLIT_PTLOCK case could be hiding.

Hugh

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 5/6] mm: stop ptlock enlarging struct page
  2009-11-10 22:14     ` Peter Zijlstra
@ 2009-11-10 22:29       ` Hugh Dickins
  -1 siblings, 0 replies; 62+ messages in thread
From: Hugh Dickins @ 2009-11-10 22:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hugh Dickins, Andrew Morton, Izik Eidus, Andrea Arcangeli,
	Christoph Lameter, linux-kernel, linux-mm

On Tue, 10 Nov 2009, Peter Zijlstra wrote:
> 
> fwiw, in -rt we carry this, because there spinlock_t is huge even
> without lockdep.

Thanks, I may want to consider that; but I'm not keen on darting
off to another cacheline even for the non-debug spinlock case.

Hugh

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 3/6] mm: CONFIG_MMU for PG_mlocked
  2009-11-10 21:59   ` Hugh Dickins
@ 2009-11-11  1:22     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 62+ messages in thread
From: KOSAKI Motohiro @ 2009-11-11  1:22 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: kosaki.motohiro, Andrew Morton, Izik Eidus, Andrea Arcangeli,
	Nick Piggin, Rik van Riel, Lee Schermerhorn, Andi Kleen,
	Wu Fengguang, linux-kernel, linux-mm

> Remove three degrees of obfuscation, left over from when we had
> CONFIG_UNEVICTABLE_LRU.  MLOCK_PAGES is CONFIG_HAVE_MLOCKED_PAGE_BIT
> is CONFIG_HAVE_MLOCK is CONFIG_MMU.  rmap.o (and memory-failure.o)
> are only built when CONFIG_MMU, so don't need such conditions at all.
> 
> Somehow, I feel no compulsion to remove the CONFIG_HAVE_MLOCK*
> lines from 169 defconfigs: leave those to evolve in due course.
> 
> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>

I don't recall why Lee added this config option, but this change seems
very reasonable and I strongly like it.

At least, vmscan folks never said "please try to disable CONFIG_MLOCK".
That means this option didn't help our debugging.

	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 6/6] mm: sigbus instead of abusing oom
  2009-11-10 22:06   ` Hugh Dickins
@ 2009-11-11  2:37     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 62+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-11-11  2:37 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Izik Eidus, Andrea Arcangeli, Minchan Kim,
	Andi Kleen, Wu Fengguang, linux-kernel, linux-mm

On Tue, 10 Nov 2009 22:06:49 +0000 (GMT)
Hugh Dickins <hugh.dickins@tiscali.co.uk> wrote:

> When do_nonlinear_fault() realizes that the page table must have been
> corrupted for it to have been called, it does print_bad_pte() and
> returns ... VM_FAULT_OOM, which is hard to understand.
> 
> It made some sense when I did it for 2.6.15, when do_page_fault()
> just killed the current process; but nowadays it lets the OOM killer
> decide who to kill - so page table corruption in one process would
> be liable to kill another.
> 
> Change it to return VM_FAULT_SIGBUS instead: that doesn't guarantee
> that the process will be killed, but is good enough for such a rare
> abnormality, accompanied as it is by the "BUG: Bad page map" message.
> 
> And recent HWPOISON work has copied that code into do_swap_page(),
> when it finds an impossible swap entry: fix that to VM_FAULT_SIGBUS too.
> 
> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>

Thank you !
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 6/6] mm: sigbus instead of abusing oom
  2009-11-11  2:37     ` KAMEZAWA Hiroyuki
@ 2009-11-11  2:42       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 62+ messages in thread
From: KOSAKI Motohiro @ 2009-11-11  2:42 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro, Hugh Dickins, Andrew Morton, Izik Eidus,
	Andrea Arcangeli, Minchan Kim, Andi Kleen, Wu Fengguang,
	linux-kernel, linux-mm

> On Tue, 10 Nov 2009 22:06:49 +0000 (GMT)
> Hugh Dickins <hugh.dickins@tiscali.co.uk> wrote:
> 
> > When do_nonlinear_fault() realizes that the page table must have been
> > corrupted for it to have been called, it does print_bad_pte() and
> > returns ... VM_FAULT_OOM, which is hard to understand.
> > 
> > It made some sense when I did it for 2.6.15, when do_page_fault()
> > just killed the current process; but nowadays it lets the OOM killer
> > decide who to kill - so page table corruption in one process would
> > be liable to kill another.
> > 
> > Change it to return VM_FAULT_SIGBUS instead: that doesn't guarantee
> > that the process will be killed, but is good enough for such a rare
> > abnormality, accompanied as it is by the "BUG: Bad page map" message.
> > 
> > And recent HWPOISON work has copied that code into do_swap_page(),
> > when it finds an impossible swap entry: fix that to VM_FAULT_SIGBUS too.
> > 
> > Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> 
> Thank you !
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Thank you, me too.

	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 6/6] mm: sigbus instead of abusing oom
  2009-11-11  2:42       ` KOSAKI Motohiro
@ 2009-11-11  4:35         ` Wu Fengguang
  -1 siblings, 0 replies; 62+ messages in thread
From: Wu Fengguang @ 2009-11-11  4:35 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: KAMEZAWA Hiroyuki, Hugh Dickins, Andrew Morton, Izik Eidus,
	Andrea Arcangeli, Minchan Kim, Andi Kleen, linux-kernel,
	linux-mm

On Wed, Nov 11, 2009 at 10:42:04AM +0800, KOSAKI Motohiro wrote:
> > On Tue, 10 Nov 2009 22:06:49 +0000 (GMT)
> > Hugh Dickins <hugh.dickins@tiscali.co.uk> wrote:
> >
> > > When do_nonlinear_fault() realizes that the page table must have been
> > > corrupted for it to have been called, it does print_bad_pte() and
> > > returns ... VM_FAULT_OOM, which is hard to understand.
> > >
> > > It made some sense when I did it for 2.6.15, when do_page_fault()
> > > just killed the current process; but nowadays it lets the OOM killer
> > > decide who to kill - so page table corruption in one process would
> > > be liable to kill another.
> > >
> > > Change it to return VM_FAULT_SIGBUS instead: that doesn't guarantee
> > > that the process will be killed, but is good enough for such a rare
> > > abnormality, accompanied as it is by the "BUG: Bad page map" message.
> > >
> > > And recent HWPOISON work has copied that code into do_swap_page(),
> > > when it finds an impossible swap entry: fix that to VM_FAULT_SIGBUS too.
> > >
> > > Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> >
> > Thank you !
> > Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> Thank you, me too.
>
> 	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Thank you!

 	Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>


Some unrelated comments:

We observed that copy_to_user() on a hwpoison page would trigger 3
(duplicate) late kills (the last three lines below):

early kill:
        [   56.964041] virtual address 7fffcab7d000 found in vma
        [   56.964390]  7fffcab7d000 phys b4365000
        [   58.089254] Triggering MCE exception on CPU 0
        [   58.089563] Disabling lock debugging due to kernel taint
        [   58.089914] Machine check events logged
        [   58.090187] MCE exception done on CPU 0
        [   58.090462] MCE 0xb4365: page flags 0x100000000100068=uptodate,lru,active,mmap,anonymous,swapbacked count 1 mapcount 1
        [   58.091878] MCE 0xb4365: Killing copy_to_user_te:3768 early due to hardware memory corruption
        [   58.092425] MCE 0xb4365: dirty LRU page recovery: Recovered
late kill on copy_to_user():
        [   59.136331] Copy 4096 bytes to 00007fffcab7d000
        [   59.136641] MCE: Killing copy_to_user_te:3768 due to hardware memory corruption fault at 7fffcab7d000
        [   59.137231] MCE: Killing copy_to_user_te:3768 due to hardware memory corruption fault at 7fffcab7d000
        [   59.137812] MCE: Killing copy_to_user_te:3768 due to hardware memory corruption fault at 7fffcab7d001

And this patch does not affect it (a somewhat weird but harmless behavior).

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 6/6] mm: sigbus instead of abusing oom
  2009-11-10 22:06   ` Hugh Dickins
@ 2009-11-11  5:51     ` Minchan Kim
  -1 siblings, 0 replies; 62+ messages in thread
From: Minchan Kim @ 2009-11-11  5:51 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Izik Eidus, Andrea Arcangeli, KAMEZAWA Hiroyuki,
	Andi Kleen, Wu Fengguang, linux-kernel, linux-mm

On Wed, Nov 11, 2009 at 7:06 AM, Hugh Dickins
<hugh.dickins@tiscali.co.uk> wrote:
> When do_nonlinear_fault() realizes that the page table must have been
> corrupted for it to have been called, it does print_bad_pte() and
> returns ... VM_FAULT_OOM, which is hard to understand.
>
> It made some sense when I did it for 2.6.15, when do_page_fault()
> just killed the current process; but nowadays it lets the OOM killer
> decide who to kill - so page table corruption in one process would
> be liable to kill another.
>
> Change it to return VM_FAULT_SIGBUS instead: that doesn't guarantee
> that the process will be killed, but is good enough for such a rare
> abnormality, accompanied as it is by the "BUG: Bad page map" message.
>
> And recent HWPOISON work has copied that code into do_swap_page(),
> when it finds an impossible swap entry: fix that to VM_FAULT_SIGBUS too.
>
> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
I already agreed with this. :)
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/6] mm: mlocking in try_to_unmap_one
  2009-11-10 21:55   ` Hugh Dickins
@ 2009-11-11  7:56     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 62+ messages in thread
From: KOSAKI Motohiro @ 2009-11-11  7:56 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: kosaki.motohiro, Andrew Morton, Izik Eidus, Andrea Arcangeli,
	Nick Piggin, Rik van Riel, Lee Schermerhorn, linux-kernel,
	linux-mm

Hi Hugh

> @@ -1081,45 +1053,23 @@ static int try_to_unmap_file(struct page
>  	unsigned long max_nl_cursor = 0;
>  	unsigned long max_nl_size = 0;
>  	unsigned int mapcount;
> -	unsigned int mlocked = 0;
> -	int unlock = TTU_ACTION(flags) == TTU_MUNLOCK;
> -
> -	if (MLOCK_PAGES && unlikely(unlock))
> -		ret = SWAP_SUCCESS;	/* default for try_to_munlock() */
>  
>  	spin_lock(&mapping->i_mmap_lock);
>  	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
> -		if (MLOCK_PAGES && unlikely(unlock)) {
> -			if (!((vma->vm_flags & VM_LOCKED) &&
> -						page_mapped_in_vma(page, vma)))
> -				continue;	/* must visit all vmas */
> -			ret = SWAP_MLOCK;
> -		} else {
> -			ret = try_to_unmap_one(page, vma, flags);
> -			if (ret == SWAP_FAIL || !page_mapped(page))
> -				goto out;
> -		}
> -		if (ret == SWAP_MLOCK) {
> -			mlocked = try_to_mlock_page(page, vma);
> -			if (mlocked)
> -				break;  /* stop if actually mlocked page */
> -		}
> +		ret = try_to_unmap_one(page, vma, flags);
> +		if (ret != SWAP_AGAIN || !page_mapped(page))
> +			goto out;
>  	}
>  
> -	if (mlocked)
> +	if (list_empty(&mapping->i_mmap_nonlinear))
>  		goto out;
>
> -	if (list_empty(&mapping->i_mmap_nonlinear))
> +	/* We don't bother to try to find the munlocked page in nonlinears */
> +	if (MLOCK_PAGES && TTU_ACTION(flags) == TTU_MUNLOCK)
>  		goto out;

I have a dumb question.
Does this early-exit shortcut make any behavior change?




>  
>  	list_for_each_entry(vma, &mapping->i_mmap_nonlinear,
>  						shared.vm_set.list) {
> -		if (MLOCK_PAGES && unlikely(unlock)) {
> -			if (!(vma->vm_flags & VM_LOCKED))
> -				continue;	/* must visit all vmas */
> -			ret = SWAP_MLOCK;	/* leave mlocked == 0 */
> -			goto out;		/* no need to look further */
> -		}
>  		if (!MLOCK_PAGES && !(flags & TTU_IGNORE_MLOCK) &&
>  			(vma->vm_flags & VM_LOCKED))
>  			continue;
> @@ -1161,10 +1111,9 @@ static int try_to_unmap_file(struct page
>  			cursor = (unsigned long) vma->vm_private_data;
>  			while ( cursor < max_nl_cursor &&
>  				cursor < vma->vm_end - vma->vm_start) {
> -				ret = try_to_unmap_cluster(cursor, &mapcount,
> -								vma, page);
> -				if (ret == SWAP_MLOCK)
> -					mlocked = 2;	/* to return below */
> +				if (try_to_unmap_cluster(cursor, &mapcount,
> +						vma, page) == SWAP_MLOCK)
> +					ret = SWAP_MLOCK;
>  				cursor += CLUSTER_SIZE;
>  				vma->vm_private_data = (void *) cursor;
>  				if ((int)mapcount <= 0)
> @@ -1185,10 +1134,6 @@ static int try_to_unmap_file(struct page
>  		vma->vm_private_data = NULL;
>  out:
>  	spin_unlock(&mapping->i_mmap_lock);
> -	if (mlocked)
> -		ret = SWAP_MLOCK;	/* actually mlocked the page */
> -	else if (ret == SWAP_MLOCK)
> -		ret = SWAP_AGAIN;	/* saw VM_LOCKED vma */
>  	return ret;
>  }
>  



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 3/6] mm: CONFIG_MMU for PG_mlocked
  2009-11-11  1:22     ` KOSAKI Motohiro
@ 2009-11-11 10:48       ` Hugh Dickins
  -1 siblings, 0 replies; 62+ messages in thread
From: Hugh Dickins @ 2009-11-11 10:48 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Izik Eidus, Andrea Arcangeli, Nick Piggin,
	Rik van Riel, Lee Schermerhorn, Andi Kleen, Wu Fengguang,
	linux-kernel, linux-mm

On Wed, 11 Nov 2009, KOSAKI Motohiro wrote:
> > Remove three degrees of obfuscation, left over from when we had
> > CONFIG_UNEVICTABLE_LRU.  MLOCK_PAGES is CONFIG_HAVE_MLOCKED_PAGE_BIT
> > is CONFIG_HAVE_MLOCK is CONFIG_MMU.  rmap.o (and memory-failure.o)
> > are only built when CONFIG_MMU, so don't need such conditions at all.
> 
> I don't recall why Lee added this config option. But it seems very
> reasonable and I strongly like it.
> 
> At least, vmscan folks never said "please try to disable CONFIG_MLOCK".
> It means this option didn't help our debugging.
> 
> 	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Thanks.  CONFIG_HAVE_MLOCKED_PAGE_BIT and CONFIG_HAVE_MLOCK were both
just internal, automatically defaulted options which the user never
saw (except in .config).  I think they were there to sort out the
interdependencies between CONFIG_MMU and CONFIG_UNEVICTABLE_LRU,
and probably other historical issues while people decided whether
or not to go ahead with having a page bit for the thing.  So no
user should notice their disappearance: removing them just makes
the code clearer, that's all.

Hugh

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/6] mm: mlocking in try_to_unmap_one
  2009-11-11  7:56     ` KOSAKI Motohiro
@ 2009-11-11 11:36       ` Hugh Dickins
  -1 siblings, 0 replies; 62+ messages in thread
From: Hugh Dickins @ 2009-11-11 11:36 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Izik Eidus, Andrea Arcangeli, Nick Piggin,
	Rik van Riel, Lee Schermerhorn, linux-kernel, linux-mm

On Wed, 11 Nov 2009, KOSAKI Motohiro wrote:

Though it doesn't quite answer your question,
I'll just reinsert the last paragraph of my description here...

> > try_to_unmap_file()'s TTU_MUNLOCK nonlinear handling was particularly
> > amusing: once unravelled, it turns out to have been choosing between
> > two different ways of doing the same nothing.  Ah, no, one way was
> > actually returning SWAP_FAIL when it meant to return SWAP_SUCCESS.

... 
> > @@ -1081,45 +1053,23 @@ static int try_to_unmap_file(struct page
...
> >
> > -	if (list_empty(&mapping->i_mmap_nonlinear))
> > +	/* We don't bother to try to find the munlocked page in nonlinears */
> > +	if (MLOCK_PAGES && TTU_ACTION(flags) == TTU_MUNLOCK)
> >  		goto out;
> 
> I have a dumb question.
> Does this shortcut exit path make any behavior change?

Not dumb.  My intention was to make no behaviour change with any of
this patch; but in checking back before completing the description,
I suddenly realized that that shortcut intentionally avoids the

	if (max_nl_size == 0) {	/* all nonlinears locked or reserved ? */
		ret = SWAP_FAIL;
		goto out;
	}

(which doesn't show up in the patch: you'll have to look at rmap.c),
which used to have the effect of try_to_munlock() returning SWAP_FAIL
in the case when there were one or more VM_NONLINEAR vmas of the file,
but none of them (and none of the covering linear vmas) VM_LOCKED.

That should have been a SWAP_SUCCESS case, or with my changes
another SWAP_AGAIN, either of which would make munlock_vma_page()
				count_vm_event(UNEVICTABLE_PGMUNLOCKED);
which would be correct; but the SWAP_FAIL meant that count was not
incremented in this case.

Actually, I've double-fixed that, because I also changed
munlock_vma_page() to increment the count whenever ret != SWAP_MLOCK;
which seemed more appropriate, but would have been a no-op if
try_to_munlock() only returned SWAP_SUCCESS or SWAP_AGAIN or SWAP_MLOCK
as it claimed.

But I wasn't very inclined to boast of fixing that bug, since my testing
didn't give confidence that those /proc/vmstat unevictable_pgs_*lock*
counts are being properly maintained anyway - when I locked the same
pages in two vmas then unlocked them in both, I ended up with mlocked
bigger than munlocked (with or without my 2/6 patch); which I suspect
is wrong, but rather off my present course towards KSM swapping...

Hugh
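
[Editor's note: the double-mapping test described above is easy to reproduce from
userspace.  The sketch below is illustrative only; the file name, mapping size and
output format are arbitrary choices, not taken from this mail.  It maps the same
file twice, mlocks and then munlocks the pages through both vmas, and prints the
unevictable_pgs_mlocked / unevictable_pgs_munlocked lines of /proc/vmstat before
and after, so the two counters can be compared (they are system-wide, so an
otherwise idle machine gives the clearest numbers).]

/*
 * Illustrative sketch: lock the same file pages through two vmas,
 * unlock them in both, and show the unevictable_pgs_*lock* counters.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define LEN	(16 * 4096)

static void show_counters(const char *tag)
{
	char line[128];
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f)
		return;
	while (fgets(line, sizeof(line), f))
		if (strstr(line, "unevictable_pgs_mlocked") ||
		    strstr(line, "unevictable_pgs_munlocked"))
			printf("%s %s", tag, line);
	fclose(f);
}

int main(void)
{
	int fd = open("mlock-test.dat", O_RDWR | O_CREAT, 0600);
	char *a, *b;

	if (fd < 0 || ftruncate(fd, LEN) < 0)
		return 1;

	/* two MAP_SHARED mappings of the same file: two vmas, same pages */
	a = mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	b = mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (a == MAP_FAILED || b == MAP_FAILED)
		return 1;

	show_counters("before:");
	mlock(a, LEN);			/* lock the pages via the first vma */
	mlock(b, LEN);			/* and again via the second vma */
	munlock(a, LEN);		/* then unlock them in both */
	munlock(b, LEN);
	show_counters("after: ");

	munmap(a, LEN);
	munmap(b, LEN);
	close(fd);
	return 0;
}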

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 3/6] mm: CONFIG_MMU for PG_mlocked
  2009-11-10 21:59   ` Hugh Dickins
@ 2009-11-11 12:38     ` Andi Kleen
  -1 siblings, 0 replies; 62+ messages in thread
From: Andi Kleen @ 2009-11-11 12:38 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Izik Eidus, Andrea Arcangeli, Nick Piggin,
	KOSAKI Motohiro, Rik van Riel, Lee Schermerhorn, Andi Kleen,
	Wu Fengguang, linux-kernel, linux-mm

On Tue, Nov 10, 2009 at 09:59:23PM +0000, Hugh Dickins wrote:
> Remove three degrees of obfuscation, left over from when we had
> CONFIG_UNEVICTABLE_LRU.  MLOCK_PAGES is CONFIG_HAVE_MLOCKED_PAGE_BIT
> is CONFIG_HAVE_MLOCK is CONFIG_MMU.  rmap.o (and memory-failure.o)
> are only built when CONFIG_MMU, so don't need such conditions at all.

Thanks.  The memory-failure.c change looks good and indeed
it's overall less confusing.
-Andi

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/6] mm: mlocking in try_to_unmap_one
  2009-11-10 21:55   ` Hugh Dickins
@ 2009-11-13  6:30     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 62+ messages in thread
From: KOSAKI Motohiro @ 2009-11-13  6:30 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: kosaki.motohiro, Andrew Morton, Izik Eidus, Andrea Arcangeli,
	Nick Piggin, Rik van Riel, Lee Schermerhorn, linux-kernel,
	linux-mm

> @@ -787,6 +787,8 @@ static int try_to_unmap_one(struct page
>  			ret = SWAP_MLOCK;
>  			goto out_unmap;
>  		}
> +		if (MLOCK_PAGES && TTU_ACTION(flags) == TTU_MUNLOCK)
> +			goto out_unmap;
>  	}
>  	if (!(flags & TTU_IGNORE_ACCESS)) {
>  		if (ptep_clear_flush_young_notify(vma, address, pte)) {
> @@ -852,12 +854,22 @@ static int try_to_unmap_one(struct page
>  	} else
>  		dec_mm_counter(mm, file_rss);
>  
> -
>  	page_remove_rmap(page);
>  	page_cache_release(page);
>  
>  out_unmap:
>  	pte_unmap_unlock(pte, ptl);
> +
> +	if (MLOCK_PAGES && ret == SWAP_MLOCK) {
> +		ret = SWAP_AGAIN;
> +		if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
> +			if (vma->vm_flags & VM_LOCKED) {
> +				mlock_vma_page(page);
> +				ret = SWAP_MLOCK;
> +			}
> +			up_read(&vma->vm_mm->mmap_sem);
> +		}
> +	}
>  out:

Very small nit. How about this?


------------------------------------------------------------
From 9d4b507572eccf88dcaa02e650df59874216528c Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Fri, 13 Nov 2009 15:00:04 +0900
Subject: [PATCH] Simplify try_to_unmap_one()

SWAP_MLOCK means "We marked the page as PG_MLOCK, please move it to
unevictable-lru". So the following code is easily confused.

	if (vma->vm_flags & VM_LOCKED) {
		ret = SWAP_MLOCK;
		goto out_unmap;
	}

Plus, if the VMA doesn't have VM_LOCKED, we don't need to check
whether mlock_vma_page() needs to be called.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/rmap.c |   25 ++++++++++++-------------
 1 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 4440a86..81a168c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -784,10 +784,8 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	 * skipped over this mm) then we should reactivate it.
 	 */
 	if (!(flags & TTU_IGNORE_MLOCK)) {
-		if (vma->vm_flags & VM_LOCKED) {
-			ret = SWAP_MLOCK;
-			goto out_unmap;
-		}
+		if (vma->vm_flags & VM_LOCKED)
+			goto out_unlock;
 		if (MLOCK_PAGES && TTU_ACTION(flags) == TTU_MUNLOCK)
 			goto out_unmap;
 	}
@@ -856,18 +854,19 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
+out:
+	return ret;
 
-	if (MLOCK_PAGES && ret == SWAP_MLOCK) {
-		ret = SWAP_AGAIN;
-		if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
-			if (vma->vm_flags & VM_LOCKED) {
-				mlock_vma_page(page);
-				ret = SWAP_MLOCK;
-			}
-			up_read(&vma->vm_mm->mmap_sem);
+out_unlock:
+	pte_unmap_unlock(pte, ptl);
+
+	if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
+		if (vma->vm_flags & VM_LOCKED) {
+			mlock_vma_page(page);
+			ret = SWAP_MLOCK;
 		}
+		up_read(&vma->vm_mm->mmap_sem);
 	}
-out:
 	return ret;
 }
 
-- 
1.6.2.5






^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/6] mm: mlocking in try_to_unmap_one
  2009-11-11 11:36       ` Hugh Dickins
@ 2009-11-13  8:16         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 62+ messages in thread
From: KOSAKI Motohiro @ 2009-11-13  8:16 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: kosaki.motohiro, Andrew Morton, Izik Eidus, Andrea Arcangeli,
	Nick Piggin, Rik van Riel, Lee Schermerhorn, linux-kernel,
	linux-mm

> On Wed, 11 Nov 2009, KOSAKI Motohiro wrote:
> 
> Though it doesn't quite answer your question,
> I'll just reinsert the last paragraph of my description here...
> 
> > > try_to_unmap_file()'s TTU_MUNLOCK nonlinear handling was particularly
> > > amusing: once unravelled, it turns out to have been choosing between
> > > two different ways of doing the same nothing.  Ah, no, one way was
> > > actually returning SWAP_FAIL when it meant to return SWAP_SUCCESS.
> 
> ... 
> > > @@ -1081,45 +1053,23 @@ static int try_to_unmap_file(struct page
> ...
> > >
> > > -	if (list_empty(&mapping->i_mmap_nonlinear))
> > > +	/* We don't bother to try to find the munlocked page in nonlinears */
> > > +	if (MLOCK_PAGES && TTU_ACTION(flags) == TTU_MUNLOCK)
> > >  		goto out;
> > 
> > I have a dumb question.
> > Does this shortcut exit path make any behavior change?
> 
> Not dumb.  My intention was to make no behaviour change with any of
> this patch; but in checking back before completing the description,
> I suddenly realized that that shortcut intentionally avoids the
> 
> 	if (max_nl_size == 0) {	/* all nonlinears locked or reserved ? */
> 		ret = SWAP_FAIL;
> 		goto out;
> 	}
> 
> (which doesn't show up in the patch: you'll have to look at rmap.c),
> which used to have the effect of try_to_munlock() returning SWAP_FAIL
> in the case when there were one or more VM_NONLINEAR vmas of the file,
> but none of them (and none of the covering linear vmas) VM_LOCKED.
> 
> That should have been a SWAP_SUCCESS case, or with my changes
> another SWAP_AGAIN, either of which would make munlock_vma_page()
> 				count_vm_event(UNEVICTABLE_PGMUNLOCKED);
> which would be correct; but the SWAP_FAIL meant that count was not
> incremented in this case.

Ah, correct.
Then we lost the unevictability handling for nonlinear mapping pages, right?
if so, following additional patch makes more consistent?


> 
> Actually, I've double-fixed that, because I also changed
> munlock_vma_page() to increment the count whenever ret != SWAP_MLOCK;
> which seemed more appropriate, but would have been a no-op if
> try_to_munlock() only returned SWAP_SUCCESS or SWAP_AGAIN or SWAP_MLOCK
> as it claimed.
> 
> But I wasn't very inclined to boast of fixing that bug, since my testing
> didn't give confidence that those /proc/vmstat unevictable_pgs_*lock*
> counts are being properly maintained anyway - when I locked the same
> pages in two vmas then unlocked them in both, I ended up with mlocked
> bigger than munlocked (with or without my 2/6 patch); which I suspect
> is wrong, but rather off my present course towards KSM swapping...

Ah, the vmstat inconsistency is weird. I'll try to debug it later.
Thanks for this notice.


----------------------------------
From 3fd3bc58dc6505af73ecf92c981609ecf8b6ac40 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Fri, 13 Nov 2009 16:52:03 +0900
Subject: [PATCH] [RFC] mm: non linear mapping page don't mark as PG_mlocked

Now, try_to_unmap_file() lost the capability to treat VM_NONLINEAR.
Then, mlock() shouldn't mark the page of non linear mapping as
PG_mlocked. Otherwise the page continue to drinker walk between
evictable and unevictable lru.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/mlock.c |   37 +++++++++++++++++++++++--------------
 1 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index 48691fb..4187f9c 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -266,25 +266,34 @@ long mlock_vma_pages_range(struct vm_area_struct *vma,
 	if (vma->vm_flags & (VM_IO | VM_PFNMAP))
 		goto no_mlock;
 
-	if (!((vma->vm_flags & (VM_DONTEXPAND | VM_RESERVED)) ||
+	if ((vma->vm_flags & (VM_DONTEXPAND | VM_RESERVED)) ||
 			is_vm_hugetlb_page(vma) ||
-			vma == get_gate_vma(current))) {
+			vma == get_gate_vma(current)) {
+
+		/*
+		 * User mapped kernel pages or huge pages:
+		 * make these pages present to populate the ptes, but
+		 * fall thru' to reset VM_LOCKED--no need to unlock, and
+		 * return nr_pages so these don't get counted against task's
+		 * locked limit.  huge pages are already counted against
+		 * locked vm limit.
+		 */
+		make_pages_present(start, end);
+		goto no_mlock;
+	}
 
+	if (vma->vm_flags & VM_NONLINEAR)
+		/*
+		 * try_to_munmap() doesn't treat VM_NONLINEAR. let's make
+		 * consist.
+		 */
+		make_pages_present(start, end);
+	else
 		__mlock_vma_pages_range(vma, start, end);
 
-		/* Hide errors from mmap() and other callers */
-		return 0;
-	}
+	/* Hide errors from mmap() and other callers */
+	return 0;
 
-	/*
-	 * User mapped kernel pages or huge pages:
-	 * make these pages present to populate the ptes, but
-	 * fall thru' to reset VM_LOCKED--no need to unlock, and
-	 * return nr_pages so these don't get counted against task's
-	 * locked limit.  huge pages are already counted against
-	 * locked vm limit.
-	 */
-	make_pages_present(start, end);
 
 no_mlock:
 	vma->vm_flags &= ~VM_LOCKED;	/* and don't come back! */
-- 
1.6.2.5




^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/6] mm: mlocking in try_to_unmap_one
  2009-11-13  8:16         ` KOSAKI Motohiro
@ 2009-11-13  8:26           ` KOSAKI Motohiro
  -1 siblings, 0 replies; 62+ messages in thread
From: KOSAKI Motohiro @ 2009-11-13  8:26 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: kosaki.motohiro, Hugh Dickins, Andrew Morton, Izik Eidus,
	Andrea Arcangeli, Nick Piggin, Rik van Riel, Lee Schermerhorn,
	linux-kernel, linux-mm

> > On Wed, 11 Nov 2009, KOSAKI Motohiro wrote:
> > 
> > Though it doesn't quite answer your question,
> > I'll just reinsert the last paragraph of my description here...
> > 
> > > > try_to_unmap_file()'s TTU_MUNLOCK nonlinear handling was particularly
> > > > amusing: once unravelled, it turns out to have been choosing between
> > > > two different ways of doing the same nothing.  Ah, no, one way was
> > > > actually returning SWAP_FAIL when it meant to return SWAP_SUCCESS.
> > 
> > ... 
> > > > @@ -1081,45 +1053,23 @@ static int try_to_unmap_file(struct page
> > ...
> > > >
> > > > -	if (list_empty(&mapping->i_mmap_nonlinear))
> > > > +	/* We don't bother to try to find the munlocked page in nonlinears */
> > > > +	if (MLOCK_PAGES && TTU_ACTION(flags) == TTU_MUNLOCK)
> > > >  		goto out;
> > > 
> > > I have a dumb question.
> > > Does this shortcut exit path make any behavior change?
> > 
> > Not dumb.  My intention was to make no behaviour change with any of
> > this patch; but in checking back before completing the description,
> > I suddenly realized that that shortcut intentionally avoids the
> > 
> > 	if (max_nl_size == 0) {	/* all nonlinears locked or reserved ? */
> > 		ret = SWAP_FAIL;
> > 		goto out;
> > 	}
> > 
> > (which doesn't show up in the patch: you'll have to look at rmap.c),
> > which used to have the effect of try_to_munlock() returning SWAP_FAIL
> > in the case when there were one or more VM_NONLINEAR vmas of the file,
> > but none of them (and none of the covering linear vmas) VM_LOCKED.
> > 
> > That should have been a SWAP_SUCCESS case, or with my changes
> > another SWAP_AGAIN, either of which would make munlock_vma_page()
> > 				count_vm_event(UNEVICTABLE_PGMUNLOCKED);
> > which would be correct; but the SWAP_FAIL meant that count was not
> > incremented in this case.
> 
> Ah, correct.
> Then we lost the unevictability handling for nonlinear mapping pages, right?
> if so, following additional patch makes more consistent?

[indistinct muttering]

Probably we can remove VM_NONLINEAR completely. I've never seen a real user of it.






^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/6] mm: mlocking in try_to_unmap_one
  2009-11-13  8:26           ` KOSAKI Motohiro
@ 2009-11-13 11:50             ` Andrea Arcangeli
  -1 siblings, 0 replies; 62+ messages in thread
From: Andrea Arcangeli @ 2009-11-13 11:50 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Hugh Dickins, Andrew Morton, Izik Eidus, Nick Piggin,
	Rik van Riel, Lee Schermerhorn, linux-kernel, linux-mm

On Fri, Nov 13, 2009 at 05:26:14PM +0900, KOSAKI Motohiro wrote:
> Probably we can remove VM_NONLINEAR completely. I've never seen a real user of it.

Do you mean as a whole or in the mlock logic? databases are using
remap_file_pages on 32bit archs to avoid generating zillon of vmas on
tmpfs scattered mappings. On 64bits it could only be useful to some
emulators but with real shadow paging and nonlinear rmap already
created on shadow pagetables, it looks pretty useless on 64bit archs
to me.
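
[Editor's note: for readers unfamiliar with the interface, below is a small
illustrative sketch of the remap_file_pages() usage described above: one
MAP_SHARED window over a large file, with individual pages rebound to scattered
file offsets so the scattered view costs one vma instead of many.  The file name,
window size and offsets are arbitrary assumptions (the file must be large enough
to contain them); build with _GNU_SOURCE as shown.]

/*
 * Illustrative sketch: make a single vma nonlinear with remap_file_pages()
 * instead of creating one vma per scattered mapping.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long psize = sysconf(_SC_PAGESIZE);
	size_t window = 64 * psize;
	int fd = open("big.dat", O_RDWR);
	char *win;

	if (fd < 0)
		return 1;

	/* one linear vma covering the window */
	win = mmap(NULL, window, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (win == MAP_FAILED)
		return 1;

	/* rebind window pages 0 and 1 to scattered file pages 1024 and 512:
	 * the vma becomes nonlinear, but it is still just one vma */
	if (remap_file_pages(win, psize, 0, 1024, 0) < 0 ||
	    remap_file_pages(win + psize, psize, 0, 512, 0) < 0)
		perror("remap_file_pages");

	munmap(win, window);
	close(fd);
	return 0;
}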

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/6] mm: mlocking in try_to_unmap_one
  2009-11-13 11:50             ` Andrea Arcangeli
@ 2009-11-13 18:00               ` KOSAKI Motohiro
  -1 siblings, 0 replies; 62+ messages in thread
From: KOSAKI Motohiro @ 2009-11-13 18:00 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: kosaki.motohiro, Hugh Dickins, Andrew Morton, Izik Eidus,
	Nick Piggin, Rik van Riel, Lee Schermerhorn, linux-kernel,
	linux-mm

> On Fri, Nov 13, 2009 at 05:26:14PM +0900, KOSAKI Motohiro wrote:
> > Probably we can remove VM_NONLINEAR completely. I've never seen a real user of it.
> 
> Do you mean as a whole or in the mlock logic? databases are using
> remap_file_pages on 32bit archs to avoid generating zillon of vmas on
> tmpfs scattered mappings. On 64bits it could only be useful to some
> emulators but with real shadow paging and nonlinear rmap already
> created on shadow pagetables, it looks pretty useless on 64bit archs
> to me.

Hehe, you point out that kosaki is stupid and knowledgeless.
Thanks for correcting me.

Probably we have to maintain VM_NONLINEAR for one or two years. Two years
later, nobody will use a database on a 32bit machine.
(A low-end DB might be used on 32bit, but that's ok; it doesn't use VM_NONLINEAR.)





^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/6] mm: mlocking in try_to_unmap_one
  2009-11-13  6:30     ` KOSAKI Motohiro
@ 2009-11-15 22:16       ` Hugh Dickins
  -1 siblings, 0 replies; 62+ messages in thread
From: Hugh Dickins @ 2009-11-15 22:16 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Izik Eidus, Andrea Arcangeli, Nick Piggin,
	Rik van Riel, Lee Schermerhorn, linux-kernel, linux-mm

On Fri, 13 Nov 2009, KOSAKI Motohiro wrote:
> 
> Very small nit. How about this?

Yes, that takes it a stage further, I prefer that, thanks: but better
redo against mmotm, I removed the "MLOCK_PAGES && " in a later patch.

Hugh

> 
> 
> ------------------------------------------------------------
> From 9d4b507572eccf88dcaa02e650df59874216528c Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Date: Fri, 13 Nov 2009 15:00:04 +0900
> Subject: [PATCH] Simplify try_to_unmap_one()
> 
> SWAP_MLOCK means "We marked the page as PG_MLOCK, please move it to
> unevictable-lru". So the following code is easily confused.
> 
> 	if (vma->vm_flags & VM_LOCKED) {
> 		ret = SWAP_MLOCK;
> 		goto out_unmap;
> 	}
> 
> Plus, if the VMA doesn't have VM_LOCKED, we don't need to check
> whether mlock_vma_page() needs to be called.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  mm/rmap.c |   25 ++++++++++++-------------
>  1 files changed, 12 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 4440a86..81a168c 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -784,10 +784,8 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  	 * skipped over this mm) then we should reactivate it.
>  	 */
>  	if (!(flags & TTU_IGNORE_MLOCK)) {
> -		if (vma->vm_flags & VM_LOCKED) {
> -			ret = SWAP_MLOCK;
> -			goto out_unmap;
> -		}
> +		if (vma->vm_flags & VM_LOCKED)
> +			goto out_unlock;
>  		if (MLOCK_PAGES && TTU_ACTION(flags) == TTU_MUNLOCK)
>  			goto out_unmap;
>  	}
> @@ -856,18 +854,19 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  
>  out_unmap:
>  	pte_unmap_unlock(pte, ptl);
> +out:
> +	return ret;
>  
> -	if (MLOCK_PAGES && ret == SWAP_MLOCK) {
> -		ret = SWAP_AGAIN;
> -		if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
> -			if (vma->vm_flags & VM_LOCKED) {
> -				mlock_vma_page(page);
> -				ret = SWAP_MLOCK;
> -			}
> -			up_read(&vma->vm_mm->mmap_sem);
> +out_unlock:
> +	pte_unmap_unlock(pte, ptl);
> +
> +	if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
> +		if (vma->vm_flags & VM_LOCKED) {
> +			mlock_vma_page(page);
> +			ret = SWAP_MLOCK;
>  		}
> +		up_read(&vma->vm_mm->mmap_sem);
>  	}
> -out:
>  	return ret;
>  }
>  
> -- 
> 1.6.2.5

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/6] mm: mlocking in try_to_unmap_one
  2009-11-13  8:16         ` KOSAKI Motohiro
@ 2009-11-15 22:37           ` Hugh Dickins
  -1 siblings, 0 replies; 62+ messages in thread
From: Hugh Dickins @ 2009-11-15 22:37 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Izik Eidus, Andrea Arcangeli, Nick Piggin,
	Rik van Riel, Lee Schermerhorn, linux-kernel, linux-mm

On Fri, 13 Nov 2009, KOSAKI Motohiro wrote:
> if so, following additional patch makes more consistent?
> ----------------------------------
> From 3fd3bc58dc6505af73ecf92c981609ecf8b6ac40 Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Date: Fri, 13 Nov 2009 16:52:03 +0900
> Subject: [PATCH] [RFC] mm: non linear mapping page don't mark as PG_mlocked
> 
> Now, try_to_unmap_file() lost the capability to treat VM_NONLINEAR.

Now?
Genuine try_to_unmap_file() deals with VM_NONLINEAR (including VM_LOCKED)
much as it always did, I think.  But try_to_munlock() on a VM_NONLINEAR
has not been doing anything useful, I assume ever since it was added,
but haven't checked the history.

But so what?  try_to_munlock() has those down_read_trylock()s which make
it never quite reliable.  In the VM_NONLINEAR case it has simply been
giving up rather more easily.
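
[Editor's note: a toy userspace analogy (pthreads, not kernel code; all names are
invented for illustration, build with -pthread) of the best-effort behaviour
described above: when a scan only tries to take a lock, an entry whose lock
happens to be held at that moment is simply skipped, much as try_to_munlock()
skips a vma whose mmap_sem it cannot take.]

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;

/* stands in for a task holding its mmap_sem for a while */
static void *holder(void *arg)
{
	pthread_rwlock_wrlock(&lock);
	sleep(1);
	pthread_rwlock_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, holder, NULL);
	usleep(100 * 1000);	/* give the holder time to take the lock */

	/* the "munlock scan": try the lock, give up if it is contended */
	if (pthread_rwlock_tryrdlock(&lock) == 0) {
		printf("lock taken: entry scanned\n");
		pthread_rwlock_unlock(&lock);
	} else {
		printf("lock busy: entry skipped, result is best-effort\n");
	}

	pthread_join(t, NULL);
	return 0;
}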

> Then, mlock() shouldn't mark the page of non linear mapping as
> PG_mlocked. Otherwise the page continue to drinker walk between
> evictable and unevictable lru.

I do like your phrase "drinker walk".  But is it really worse than
the lazy discovery of the page being locked, which is how I thought
this stuff was originally supposed to work anyway.  I presume cases
were found in which the counts got so far out that it was a problem?

I liked the lazy discovery much better than trying to keep count;
can we just accept that VM_NONLINEAR may leave the counts further
away from exactitude?

I don't think this patch makes things more consistent, really.
It does make sys_remap_file_pages on an mlocked area inconsistent
with mlock on a sys_remap_file_pages area, doesn't it?

Hugh

> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  mm/mlock.c |   37 +++++++++++++++++++++++--------------
>  1 files changed, 23 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 48691fb..4187f9c 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -266,25 +266,34 @@ long mlock_vma_pages_range(struct vm_area_struct *vma,
>  	if (vma->vm_flags & (VM_IO | VM_PFNMAP))
>  		goto no_mlock;
>  
> -	if (!((vma->vm_flags & (VM_DONTEXPAND | VM_RESERVED)) ||
> +	if ((vma->vm_flags & (VM_DONTEXPAND | VM_RESERVED)) ||
>  			is_vm_hugetlb_page(vma) ||
> -			vma == get_gate_vma(current))) {
> +			vma == get_gate_vma(current)) {
> +
> +		/*
> +		 * User mapped kernel pages or huge pages:
> +		 * make these pages present to populate the ptes, but
> +		 * fall thru' to reset VM_LOCKED--no need to unlock, and
> +		 * return nr_pages so these don't get counted against task's
> +		 * locked limit.  huge pages are already counted against
> +		 * locked vm limit.
> +		 */
> +		make_pages_present(start, end);
> +		goto no_mlock;
> +	}
>  
> +	if (vma->vm_flags & VM_NONLINEAR)
> +		/*
> +		 * try_to_munmap() doesn't treat VM_NONLINEAR. let's make
> +		 * consist.
> +		 */
> +		make_pages_present(start, end);
> +	else
>  		__mlock_vma_pages_range(vma, start, end);
>  
> -		/* Hide errors from mmap() and other callers */
> -		return 0;
> -	}
> +	/* Hide errors from mmap() and other callers */
> +	return 0;
>  
> -	/*
> -	 * User mapped kernel pages or huge pages:
> -	 * make these pages present to populate the ptes, but
> -	 * fall thru' to reset VM_LOCKED--no need to unlock, and
> -	 * return nr_pages so these don't get counted against task's
> -	 * locked limit.  huge pages are already counted against
> -	 * locked vm limit.
> -	 */
> -	make_pages_present(start, end);
>  
>  no_mlock:
>  	vma->vm_flags &= ~VM_LOCKED;	/* and don't come back! */
> -- 
> 1.6.2.5

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/6] mm: mlocking in try_to_unmap_one
  2009-11-15 22:16       ` Hugh Dickins
@ 2009-11-16 23:34         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 62+ messages in thread
From: KOSAKI Motohiro @ 2009-11-16 23:34 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: kosaki.motohiro, Andrew Morton, Izik Eidus, Andrea Arcangeli,
	Nick Piggin, Rik van Riel, Lee Schermerhorn, linux-kernel,
	linux-mm

> On Fri, 13 Nov 2009, KOSAKI Motohiro wrote:
> > 
> > Very small nit. How about this?
> 
> Yes, that takes it a stage further, I prefer that, thanks: but better
> redo against mmotm, I removed the "MLOCK_PAGES && " in a later patch.

Ah, yes.
thanks.

I'll redo.




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/6] mm: mlocking in try_to_unmap_one
  2009-11-15 22:37           ` Hugh Dickins
@ 2009-11-17  2:00             ` KOSAKI Motohiro
  -1 siblings, 0 replies; 62+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  2:00 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: kosaki.motohiro, Andrew Morton, Izik Eidus, Andrea Arcangeli,
	Nick Piggin, Rik van Riel, Lee Schermerhorn, linux-kernel,
	linux-mm

> On Fri, 13 Nov 2009, KOSAKI Motohiro wrote:
> > if so, following additional patch makes more consistent?
> > ----------------------------------
> > From 3fd3bc58dc6505af73ecf92c981609ecf8b6ac40 Mon Sep 17 00:00:00 2001
> > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > Date: Fri, 13 Nov 2009 16:52:03 +0900
> > Subject: [PATCH] [RFC] mm: non linear mapping page don't mark as PG_mlocked
> > 
> > Now, try_to_unmap_file() lost the capability to treat VM_NONLINEAR.
> 
> Now?
> Genuine try_to_unmap_file() deals with VM_NONLINEAR (including VM_LOCKED)
> much as it always did, I think.  But try_to_munlock() on a VM_NONLINEAR
> has not been doing anything useful, I assume ever since it was added,
> but haven't checked the history.
> 
> But so what?  try_to_munlock() has those down_read_trylock()s which make
> it never quite reliable.  In the VM_NONLINEAR case it has simply been
> giving up rather more easily.

I caught your point, maybe. Thanks for correcting me. I agree with your lazy
discovery method.

So, can we add a more helpful comment? (see below)



> > Then, mlock() shouldn't mark the page of non linear mapping as
> > PG_mlocked. Otherwise the page continue to drinker walk between
> > evictable and unevictable lru.
> 
> I do like your phrase "drinker walk".  But is it really worse than
> the lazy discovery of the page being locked, which is how I thought
> this stuff was originally supposed to work anyway.  I presume cases
> were found in which the counts got so far out that it was a problem?
> 
> I liked the lazy discovery much better than trying to keep count;
> can we just accept that VM_NONLINEAR may leave the counts further
> away from exactitude?
> 
> I don't think this patch makes things more consistent, really.
> It does make sys_remap_file_pages on an mlocked area inconsistent
> with mlock on a sys_remap_file_pages area, doesn't it?

you are right.



From 7332f765dbaa1fbfe48cf8d53b20048f7f8105e0 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Tue, 17 Nov 2009 10:46:51 +0900
Subject: comment adding to mlocking in try_to_unmap_one

The current code doesn't explain why we don't bother with nonlinear mappings.
This patch adds a small explanation.


Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/rmap.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 81a168c..c631407 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1061,7 +1061,11 @@ static int try_to_unmap_file(struct page *page, enum ttu_flags flags)
 	if (list_empty(&mapping->i_mmap_nonlinear))
 		goto out;
 
-	/* We don't bother to try to find the munlocked page in nonlinears */
+	/*
+	 * We don't bother to try to find the munlocked page in nonlinears.
+	 * It's costly. Instead, later, page reclaim logic may call
+	 * try_to_unmap(TTU_MUNLOCK) and recover PG_mlocked lazily.
+	 */
 	if (MLOCK_PAGES && TTU_ACTION(flags) == TTU_MUNLOCK)
 		goto out;
 
-- 
1.6.2.5
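For anyone reading along, here is a minimal sketch (mine, not part of the
patch; names such as mlock_vma_page(), SWAP_MLOCK and TTU_IGNORE_MLOCK are
assumed from the 2.6.32-era rmap code) of the lazy path the new comment
refers to: when reclaim later unmaps the page and try_to_unmap_one() trips
over a VM_LOCKED vma, the page is simply re-marked mlocked rather than
reclaimed.

	/*
	 * Sketch only: roughly what the VM_LOCKED handling inside
	 * try_to_unmap_one() amounts to, not the actual kernel code.
	 */
	if (!(flags & TTU_IGNORE_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
		/*
		 * Lazy discovery: instead of keeping exact counts,
		 * mark the page mlocked now so vmscan moves it to
		 * the unevictable LRU instead of reclaiming it.
		 */
		mlock_vma_page(page);
		return SWAP_MLOCK;	/* caller gives up on this page */
	}
	/* otherwise carry on and unmap the pte as usual */

So even when try_to_munlock() gives up on a nonlinear vma, reclaim will
rediscover the mlock later and set PG_mlocked again.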




^ permalink raw reply related	[flat|nested] 62+ messages in thread


* Re: [PATCH 2/6] mm: mlocking in try_to_unmap_one
  2009-11-17  2:00             ` KOSAKI Motohiro
@ 2009-11-18 16:32               ` Hugh Dickins
  -1 siblings, 0 replies; 62+ messages in thread
From: Hugh Dickins @ 2009-11-18 16:32 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Izik Eidus, Andrea Arcangeli, Nick Piggin,
	Rik van Riel, Lee Schermerhorn, linux-kernel, linux-mm

On Tue, 17 Nov 2009, KOSAKI Motohiro wrote:
> 
> From 7332f765dbaa1fbfe48cf8d53b20048f7f8105e0 Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Date: Tue, 17 Nov 2009 10:46:51 +0900
> Subject: comment adding to mlocking in try_to_unmap_one
> 
> The current code doesn't explain why we don't bother with nonlinear
> mappings here.  Add a short comment explaining it.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>

(if the "MLOCK_PAGES && " is removed from this one too)

> ---
>  mm/rmap.c |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 81a168c..c631407 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1061,7 +1061,11 @@ static int try_to_unmap_file(struct page *page, enum ttu_flags flags)
>  	if (list_empty(&mapping->i_mmap_nonlinear))
>  		goto out;
>  
> -	/* We don't bother to try to find the munlocked page in nonlinears */
> +	/*
> +	 * We don't bother to try to find the munlocked page in nonlinears.
> +	 * It's costly. Instead, later, page reclaim logic may call
> +	 * try_to_unmap(TTU_MUNLOCK) and recover PG_mlocked lazily.
> +	 */
>  	if (MLOCK_PAGES && TTU_ACTION(flags) == TTU_MUNLOCK)
>  		goto out;
>  
> -- 
> 1.6.2.5

^ permalink raw reply	[flat|nested] 62+ messages in thread


* Re: [PATCH 1/6] mm: define PAGE_MAPPING_FLAGS
  2009-11-10 21:51   ` Hugh Dickins
@ 2009-11-19  0:25     ` Rik van Riel
  -1 siblings, 0 replies; 62+ messages in thread
From: Rik van Riel @ 2009-11-19  0:25 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Izik Eidus, Andrea Arcangeli, linux-kernel, linux-mm

On 11/10/2009 04:51 PM, Hugh Dickins wrote:
> At present we define PageAnon(page) by the low PAGE_MAPPING_ANON bit
> set in page->mapping, with the higher bits a pointer to the anon_vma;
> and have defined PageKsm(page) as that with NULL anon_vma.
>
> But KSM swapping will need to store a pointer there: so in preparation
> for that, now define PAGE_MAPPING_FLAGS as the low two bits, including
> PAGE_MAPPING_KSM (always set along with PAGE_MAPPING_ANON, until some
> other use for the bit emerges).
>
> Declare page_rmapping(page) to return the pointer part of page->mapping,
> and page_anon_vma(page) to return the anon_vma pointer when that's what
> it is.  Use these in a few appropriate places: notably, unuse_vma() has
> been testing page->mapping, but is better to be testing page_anon_vma()
> (cases may be added in which flag bits are set without any pointer).
>
> Signed-off-by: Hugh Dickins<hugh.dickins@tiscali.co.uk>
>
>    
Reviewed-by: Rik van Riel <riel@redhat.com>
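For readers following the description above, a minimal sketch of the new
bits and helpers it introduces (a sketch with the exact definitions assumed,
not copied from the patch):

#define PAGE_MAPPING_ANON	1
#define PAGE_MAPPING_KSM	2
#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)

/* KSM pages keep both low bits set in page->mapping. */
static inline int PageKsm(struct page *page)
{
	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
}

/* Pointer part of page->mapping, whatever the low flag bits say. */
static inline void *page_rmapping(struct page *page)
{
	return (void *)((unsigned long)page->mapping & ~PAGE_MAPPING_FLAGS);
}

/* The anon_vma pointer, but only when that is really what is stored. */
static inline struct anon_vma *page_anon_vma(struct page *page)
{
	if (((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) !=
						PAGE_MAPPING_ANON)
		return NULL;
	return page_rmapping(page);
}

With these, unuse_vma() and friends can test page_anon_vma(page) rather than
bare page->mapping, and stay correct once flag bits may be set without any
pointer behind them.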


^ permalink raw reply	[flat|nested] 62+ messages in thread


end of thread, other threads:[~2009-11-19  0:26 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-10 21:50 [PATCH 0/6] mm: prepare for ksm swapping Hugh Dickins
2009-11-10 21:50 ` Hugh Dickins
2009-11-10 21:51 ` [PATCH 1/6] mm: define PAGE_MAPPING_FLAGS Hugh Dickins
2009-11-10 21:51   ` Hugh Dickins
2009-11-19  0:25   ` Rik van Riel
2009-11-19  0:25     ` Rik van Riel
2009-11-10 21:55 ` [PATCH 2/6] mm: mlocking in try_to_unmap_one Hugh Dickins
2009-11-10 21:55   ` Hugh Dickins
2009-11-11  7:56   ` KOSAKI Motohiro
2009-11-11  7:56     ` KOSAKI Motohiro
2009-11-11 11:36     ` Hugh Dickins
2009-11-11 11:36       ` Hugh Dickins
2009-11-13  8:16       ` KOSAKI Motohiro
2009-11-13  8:16         ` KOSAKI Motohiro
2009-11-13  8:26         ` KOSAKI Motohiro
2009-11-13  8:26           ` KOSAKI Motohiro
2009-11-13 11:50           ` Andrea Arcangeli
2009-11-13 11:50             ` Andrea Arcangeli
2009-11-13 18:00             ` KOSAKI Motohiro
2009-11-13 18:00               ` KOSAKI Motohiro
2009-11-15 22:37         ` Hugh Dickins
2009-11-15 22:37           ` Hugh Dickins
2009-11-17  2:00           ` KOSAKI Motohiro
2009-11-17  2:00             ` KOSAKI Motohiro
2009-11-18 16:32             ` Hugh Dickins
2009-11-18 16:32               ` Hugh Dickins
2009-11-13  6:30   ` KOSAKI Motohiro
2009-11-13  6:30     ` KOSAKI Motohiro
2009-11-15 22:16     ` Hugh Dickins
2009-11-15 22:16       ` Hugh Dickins
2009-11-16 23:34       ` KOSAKI Motohiro
2009-11-16 23:34         ` KOSAKI Motohiro
2009-11-10 21:59 ` [PATCH 3/6] mm: CONFIG_MMU for PG_mlocked Hugh Dickins
2009-11-10 21:59   ` Hugh Dickins
2009-11-11  1:22   ` KOSAKI Motohiro
2009-11-11  1:22     ` KOSAKI Motohiro
2009-11-11 10:48     ` Hugh Dickins
2009-11-11 10:48       ` Hugh Dickins
2009-11-11 12:38   ` Andi Kleen
2009-11-11 12:38     ` Andi Kleen
2009-11-10 22:00 ` [PATCH 4/6] mm: pass address down to rmap ones Hugh Dickins
2009-11-10 22:00   ` Hugh Dickins
2009-11-10 22:02 ` [PATCH 5/6] mm: stop ptlock enlarging struct page Hugh Dickins
2009-11-10 22:02   ` Hugh Dickins
2009-11-10 22:09   ` Peter Zijlstra
2009-11-10 22:09     ` Peter Zijlstra
2009-11-10 22:24     ` Hugh Dickins
2009-11-10 22:24       ` Hugh Dickins
2009-11-10 22:14   ` Peter Zijlstra
2009-11-10 22:14     ` Peter Zijlstra
2009-11-10 22:29     ` Hugh Dickins
2009-11-10 22:29       ` Hugh Dickins
2009-11-10 22:06 ` [PATCH 6/6] mm: sigbus instead of abusing oom Hugh Dickins
2009-11-10 22:06   ` Hugh Dickins
2009-11-11  2:37   ` KAMEZAWA Hiroyuki
2009-11-11  2:37     ` KAMEZAWA Hiroyuki
2009-11-11  2:42     ` KOSAKI Motohiro
2009-11-11  2:42       ` KOSAKI Motohiro
2009-11-11  4:35       ` Wu Fengguang
2009-11-11  4:35         ` Wu Fengguang
2009-11-11  5:51   ` Minchan Kim
2009-11-11  5:51     ` Minchan Kim
