linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/4] mm: fix invalidate_complete_page2 lock ordering
@ 2012-09-19  3:51 Hugh Dickins
  2012-09-19  3:53 ` [PATCH 2/4] mm: remove vma arg from page_evictable Hugh Dickins
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Hugh Dickins @ 2012-09-19  3:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Sasha Levin, Mel Gorman, Rik van Riel, Johannes Weiner,
	Michel Lespinasse, Ying Han, linux-mm, linux-kernel

In fuzzing with trinity, lockdep protested "possible irq lock inversion
dependency detected" when isolate_lru_page() reenabled interrupts while
still holding the supposedly irq-safe tree_lock:

invalidate_inode_pages2
  invalidate_complete_page2
    spin_lock_irq(&mapping->tree_lock)
    clear_page_mlock
      isolate_lru_page
        spin_unlock_irq(&zone->lru_lock)

isolate_lru_page() is correct to enable interrupts unconditionally:
invalidate_complete_page2() is incorrect to call clear_page_mlock()
while holding tree_lock, which is supposed to nest inside lru_lock.

Both truncate_complete_page() and invalidate_complete_page() call
clear_page_mlock() before taking tree_lock to remove page from
radix_tree.  I guess invalidate_complete_page2() preferred to test
PageDirty (again) under tree_lock before committing to the munlock;
but since the page has already been unmapped, its state is already
somewhat inconsistent, and no worse if clear_page_mlock() moved up.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Deciphered-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ying Han <yinghan@google.com>
Cc: stable@vger.kernel.org
---
 mm/truncate.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- 3.6-rc6.orig/mm/truncate.c	2012-09-18 15:38:08.000000000 -0700
+++ 3.6-rc6/mm/truncate.c	2012-09-18 15:42:17.066731792 -0700
@@ -394,11 +394,12 @@ invalidate_complete_page2(struct address
 	if (page_has_private(page) && !try_to_release_page(page, GFP_KERNEL))
 		return 0;
 
+	clear_page_mlock(page);
+
 	spin_lock_irq(&mapping->tree_lock);
 	if (PageDirty(page))
 		goto failed;
 
-	clear_page_mlock(page);
 	BUG_ON(page_has_private(page));
 	__delete_from_page_cache(page);
 	spin_unlock_irq(&mapping->tree_lock);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/4] mm: remove vma arg from page_evictable
  2012-09-19  3:51 [PATCH 1/4] mm: fix invalidate_complete_page2 lock ordering Hugh Dickins
@ 2012-09-19  3:53 ` Hugh Dickins
  2012-09-19 16:46   ` Johannes Weiner
  2012-09-21 12:30   ` Mel Gorman
  2012-09-19  3:55 ` [PATCH 3/4] mm: clear_page_mlock in page_remove_rmap Hugh Dickins
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 15+ messages in thread
From: Hugh Dickins @ 2012-09-19  3:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Rik van Riel, Johannes Weiner, Michel Lespinasse,
	Ying Han, linux-mm, linux-kernel

page_evictable(page, vma) is an irritant: almost all its callers pass
NULL for vma.  Remove the vma arg and use mlocked_vma_newpage(vma, page)
explicitly in the couple of places it's needed.  But in those places we
don't even need page_evictable() itself!  They're dealing with a freshly
allocated anonymous page, which has no "mapping" and cannot be mlocked yet.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ying Han <yinghan@google.com>
---
 Documentation/vm/unevictable-lru.txt |   10 ++-------
 include/linux/swap.h                 |    2 -
 mm/internal.h                        |    5 +---
 mm/ksm.c                             |    2 -
 mm/rmap.c                            |    2 -
 mm/swap.c                            |    2 -
 mm/vmscan.c                          |   27 ++++++++-----------------
 7 files changed, 18 insertions(+), 32 deletions(-)

--- 3.6-rc6.orig/Documentation/vm/unevictable-lru.txt	2012-09-18 15:38:08.000000000 -0700
+++ 3.6-rc6/Documentation/vm/unevictable-lru.txt	2012-09-18 16:39:50.878992976 -0700
@@ -197,12 +197,8 @@ the pages are also "rescued" from the un
 freeing them.
 
 page_evictable() also checks for mlocked pages by testing an additional page
-flag, PG_mlocked (as wrapped by PageMlocked()).  If the page is NOT mlocked,
-and a non-NULL VMA is supplied, page_evictable() will check whether the VMA is
-VM_LOCKED via is_mlocked_vma().  is_mlocked_vma() will SetPageMlocked() and
-update the appropriate statistics if the vma is VM_LOCKED.  This method allows
-efficient "culling" of pages in the fault path that are being faulted in to
-VM_LOCKED VMAs.
+flag, PG_mlocked (as wrapped by PageMlocked()), which is set when a page is
+faulted into a VM_LOCKED vma, or found in a vma being VM_LOCKED.
 
 
 VMSCAN'S HANDLING OF UNEVICTABLE PAGES
@@ -651,7 +647,7 @@ PAGE RECLAIM IN shrink_*_list()
 -------------------------------
 
 shrink_active_list() culls any obviously unevictable pages - i.e.
-!page_evictable(page, NULL) - diverting these to the unevictable list.
+!page_evictable(page) - diverting these to the unevictable list.
 However, shrink_active_list() only sees unevictable pages that made it onto the
 active/inactive lru lists.  Note that these pages do not have PageUnevictable
 set - otherwise they would be on the unevictable list and shrink_active_list
--- 3.6-rc6.orig/include/linux/swap.h	2012-09-18 15:38:08.000000000 -0700
+++ 3.6-rc6/include/linux/swap.h	2012-09-18 16:39:50.878992976 -0700
@@ -281,7 +281,7 @@ static inline int zone_reclaim(struct zo
 }
 #endif
 
-extern int page_evictable(struct page *page, struct vm_area_struct *vma);
+extern int page_evictable(struct page *page);
 extern void check_move_unevictable_pages(struct page **, int nr_pages);
 
 extern unsigned long scan_unevictable_pages;
--- 3.6-rc6.orig/mm/internal.h	2012-09-18 15:38:08.000000000 -0700
+++ 3.6-rc6/mm/internal.h	2012-09-18 16:39:50.882992906 -0700
@@ -167,9 +167,8 @@ static inline void munlock_vma_pages_all
 }
 
 /*
- * Called only in fault path via page_evictable() for a new page
- * to determine if it's being mapped into a LOCKED vma.
- * If so, mark page as mlocked.
+ * Called only in fault path, to determine if a new page is being
+ * mapped into a LOCKED vma.  If it is, mark page as mlocked.
  */
 static inline int mlocked_vma_newpage(struct vm_area_struct *vma,
 				    struct page *page)
--- 3.6-rc6.orig/mm/ksm.c	2012-09-18 15:38:08.000000000 -0700
+++ 3.6-rc6/mm/ksm.c	2012-09-18 16:39:50.882992906 -0700
@@ -1582,7 +1582,7 @@ struct page *ksm_does_need_to_copy(struc
 		SetPageSwapBacked(new_page);
 		__set_page_locked(new_page);
 
-		if (page_evictable(new_page, vma))
+		if (!mlocked_vma_newpage(vma, new_page))
 			lru_cache_add_lru(new_page, LRU_ACTIVE_ANON);
 		else
 			add_page_to_unevictable_list(new_page);
--- 3.6-rc6.orig/mm/rmap.c	2012-09-18 15:38:08.000000000 -0700
+++ 3.6-rc6/mm/rmap.c	2012-09-18 16:39:50.882992906 -0700
@@ -1128,7 +1128,7 @@ void page_add_new_anon_rmap(struct page
 	else
 		__inc_zone_page_state(page, NR_ANON_TRANSPARENT_HUGEPAGES);
 	__page_set_anon_rmap(page, vma, address, 1);
-	if (page_evictable(page, vma))
+	if (!mlocked_vma_newpage(vma, page))
 		lru_cache_add_lru(page, LRU_ACTIVE_ANON);
 	else
 		add_page_to_unevictable_list(page);
--- 3.6-rc6.orig/mm/swap.c	2012-09-18 15:38:08.000000000 -0700
+++ 3.6-rc6/mm/swap.c	2012-09-18 16:39:50.882992906 -0700
@@ -742,7 +742,7 @@ void lru_add_page_tail(struct page *page
 
 	SetPageLRU(page_tail);
 
-	if (page_evictable(page_tail, NULL)) {
+	if (page_evictable(page_tail)) {
 		if (PageActive(page)) {
 			SetPageActive(page_tail);
 			active = 1;
--- 3.6-rc6.orig/mm/vmscan.c	2012-09-18 15:38:08.000000000 -0700
+++ 3.6-rc6/mm/vmscan.c	2012-09-18 16:39:50.882992906 -0700
@@ -553,7 +553,7 @@ void putback_lru_page(struct page *page)
 redo:
 	ClearPageUnevictable(page);
 
-	if (page_evictable(page, NULL)) {
+	if (page_evictable(page)) {
 		/*
 		 * For evictable pages, we can use the cache.
 		 * In event of a race, worst case is we end up with an
@@ -587,7 +587,7 @@ redo:
 	 * page is on unevictable list, it never be freed. To avoid that,
 	 * check after we added it to the list, again.
 	 */
-	if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL)) {
+	if (lru == LRU_UNEVICTABLE && page_evictable(page)) {
 		if (!isolate_lru_page(page)) {
 			put_page(page);
 			goto redo;
@@ -707,7 +707,7 @@ static unsigned long shrink_page_list(st
 
 		sc->nr_scanned++;
 
-		if (unlikely(!page_evictable(page, NULL)))
+		if (unlikely(!page_evictable(page)))
 			goto cull_mlocked;
 
 		if (!sc->may_unmap && page_mapped(page))
@@ -1186,7 +1186,7 @@ putback_inactive_pages(struct lruvec *lr
 
 		VM_BUG_ON(PageLRU(page));
 		list_del(&page->lru);
-		if (unlikely(!page_evictable(page, NULL))) {
+		if (unlikely(!page_evictable(page))) {
 			spin_unlock_irq(&zone->lru_lock);
 			putback_lru_page(page);
 			spin_lock_irq(&zone->lru_lock);
@@ -1439,7 +1439,7 @@ static void shrink_active_list(unsigned
 		page = lru_to_page(&l_hold);
 		list_del(&page->lru);
 
-		if (unlikely(!page_evictable(page, NULL))) {
+		if (unlikely(!page_evictable(page))) {
 			putback_lru_page(page);
 			continue;
 		}
@@ -3349,27 +3349,18 @@ int zone_reclaim(struct zone *zone, gfp_
 /*
  * page_evictable - test whether a page is evictable
  * @page: the page to test
- * @vma: the VMA in which the page is or will be mapped, may be NULL
  *
  * Test whether page is evictable--i.e., should be placed on active/inactive
- * lists vs unevictable list.  The vma argument is !NULL when called from the
- * fault path to determine how to instantate a new page.
+ * lists vs unevictable list.
  *
  * Reasons page might not be evictable:
  * (1) page's mapping marked unevictable
  * (2) page is part of an mlocked VMA
  *
  */
-int page_evictable(struct page *page, struct vm_area_struct *vma)
+int page_evictable(struct page *page)
 {
-
-	if (mapping_unevictable(page_mapping(page)))
-		return 0;
-
-	if (PageMlocked(page) || (vma && mlocked_vma_newpage(vma, page)))
-		return 0;
-
-	return 1;
+	return !mapping_unevictable(page_mapping(page)) && !PageMlocked(page);
 }
 
 #ifdef CONFIG_SHMEM
@@ -3407,7 +3398,7 @@ void check_move_unevictable_pages(struct
 		if (!PageLRU(page) || !PageUnevictable(page))
 			continue;
 
-		if (page_evictable(page, NULL)) {
+		if (page_evictable(page)) {
 			enum lru_list lru = page_lru_base_type(page);
 
 			VM_BUG_ON(PageActive(page));


* [PATCH 3/4] mm: clear_page_mlock in page_remove_rmap
  2012-09-19  3:51 [PATCH 1/4] mm: fix invalidate_complete_page2 lock ordering Hugh Dickins
  2012-09-19  3:53 ` [PATCH 2/4] mm: remove vma arg from page_evictable Hugh Dickins
@ 2012-09-19  3:55 ` Hugh Dickins
  2012-09-19 17:18   ` Johannes Weiner
  2012-09-19  3:57 ` [PATCH 4/4] mm: remove free_page_mlock Hugh Dickins
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Hugh Dickins @ 2012-09-19  3:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Rik van Riel, Johannes Weiner, Michel Lespinasse,
	Ying Han, linux-mm, linux-kernel

We had thought that pages could no longer get freed while still marked
as mlocked; but Johannes Weiner posted this program to demonstrate that
truncating an mlocked private file mapping containing COWed pages is
still mishandled:

#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>

int main(void)
{
	char *map;
	int fd;

	system("grep mlockfreed /proc/vmstat");
	fd = open("chigurh", O_CREAT|O_EXCL|O_RDWR);
	unlink("chigurh");
	ftruncate(fd, 4096);
	map = mmap(NULL, 4096, PROT_WRITE, MAP_PRIVATE, fd, 0);
	map[0] = 11;
	mlock(map, sizeof(fd));
	ftruncate(fd, 0);
	close(fd);
	munlock(map, sizeof(fd));
	munmap(map, 4096);
	system("grep mlockfreed /proc/vmstat");
	return 0;
}

The anon COWed pages are not caught by truncation's clear_page_mlock()
of the pagecache pages; but unmap_mapping_range() unmaps them, so we
ought to look out for them there in page_remove_rmap().  Indeed, why
should truncation or invalidation be doing the clear_page_mlock() when
removing from pagecache?  mlock is a property of mapping in userspace,
not a propertly of pagecache: an mlocked unmapped page is nonsensical.

Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ying Han <yinghan@google.com>
---
 mm/internal.h |    7 +------
 mm/memory.c   |   10 +++++-----
 mm/mlock.c    |   16 +++-------------
 mm/rmap.c     |    4 ++++
 mm/truncate.c |    4 ----
 5 files changed, 13 insertions(+), 28 deletions(-)

--- 3.6-rc6.orig/mm/internal.h	2012-09-18 16:39:50.000000000 -0700
+++ 3.6-rc6/mm/internal.h	2012-09-18 17:51:02.871288773 -0700
@@ -200,12 +200,7 @@ extern void munlock_vma_page(struct page
  * If called for a page that is still mapped by mlocked vmas, all we do
  * is revert to lazy LRU behaviour -- semantics are not broken.
  */
-extern void __clear_page_mlock(struct page *page);
-static inline void clear_page_mlock(struct page *page)
-{
-	if (unlikely(TestClearPageMlocked(page)))
-		__clear_page_mlock(page);
-}
+extern void clear_page_mlock(struct page *page);
 
 /*
  * mlock_migrate_page - called only from migrate_page_copy() to
--- 3.6-rc6.orig/mm/memory.c	2012-09-18 15:38:08.000000000 -0700
+++ 3.6-rc6/mm/memory.c	2012-09-18 17:51:02.871288773 -0700
@@ -1576,12 +1576,12 @@ split_fallthrough:
 		if (page->mapping && trylock_page(page)) {
 			lru_add_drain();  /* push cached pages to LRU */
 			/*
-			 * Because we lock page here and migration is
-			 * blocked by the pte's page reference, we need
-			 * only check for file-cache page truncation.
+			 * Because we lock page here, and migration is
+			 * blocked by the pte's page reference, and we
+			 * know the page is still mapped, we don't even
+			 * need to check for file-cache page truncation.
 			 */
-			if (page->mapping)
-				mlock_vma_page(page);
+			mlock_vma_page(page);
 			unlock_page(page);
 		}
 	}
--- 3.6-rc6.orig/mm/mlock.c	2012-09-18 15:38:08.000000000 -0700
+++ 3.6-rc6/mm/mlock.c	2012-09-18 17:51:02.871288773 -0700
@@ -51,13 +51,10 @@ EXPORT_SYMBOL(can_do_mlock);
 /*
  *  LRU accounting for clear_page_mlock()
  */
-void __clear_page_mlock(struct page *page)
+void clear_page_mlock(struct page *page)
 {
-	VM_BUG_ON(!PageLocked(page));
-
-	if (!page->mapping) {	/* truncated ? */
+	if (!TestClearPageMlocked(page))
 		return;
-	}
 
 	dec_zone_page_state(page, NR_MLOCK);
 	count_vm_event(UNEVICTABLE_PGCLEARED);
@@ -290,14 +287,7 @@ void munlock_vma_pages_range(struct vm_a
 		page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
 		if (page && !IS_ERR(page)) {
 			lock_page(page);
-			/*
-			 * Like in __mlock_vma_pages_range(),
-			 * because we lock page here and migration is
-			 * blocked by the elevated reference, we need
-			 * only check for file-cache page truncation.
-			 */
-			if (page->mapping)
-				munlock_vma_page(page);
+			munlock_vma_page(page);
 			unlock_page(page);
 			put_page(page);
 		}
--- 3.6-rc6.orig/mm/rmap.c	2012-09-18 16:39:50.000000000 -0700
+++ 3.6-rc6/mm/rmap.c	2012-09-18 17:51:02.871288773 -0700
@@ -1203,7 +1203,10 @@ void page_remove_rmap(struct page *page)
 	} else {
 		__dec_zone_page_state(page, NR_FILE_MAPPED);
 		mem_cgroup_dec_page_stat(page, MEMCG_NR_FILE_MAPPED);
+		mem_cgroup_end_update_page_stat(page, &locked, &flags);
 	}
+	if (unlikely(PageMlocked(page)))
+		clear_page_mlock(page);
 	/*
 	 * It would be tidy to reset the PageAnon mapping here,
 	 * but that might overwrite a racing page_add_anon_rmap
@@ -1213,6 +1216,7 @@ void page_remove_rmap(struct page *page)
 	 * Leaving it set also helps swapoff to reinstate ptes
 	 * faster for those pages still in swapcache.
 	 */
+	return;
 out:
 	if (!anon)
 		mem_cgroup_end_update_page_stat(page, &locked, &flags);
--- 3.6-rc6.orig/mm/truncate.c	2012-09-18 15:42:17.000000000 -0700
+++ 3.6-rc6/mm/truncate.c	2012-09-18 17:51:02.875288902 -0700
@@ -107,7 +107,6 @@ truncate_complete_page(struct address_sp
 
 	cancel_dirty_page(page, PAGE_CACHE_SIZE);
 
-	clear_page_mlock(page);
 	ClearPageMappedToDisk(page);
 	delete_from_page_cache(page);
 	return 0;
@@ -132,7 +131,6 @@ invalidate_complete_page(struct address_
 	if (page_has_private(page) && !try_to_release_page(page, 0))
 		return 0;
 
-	clear_page_mlock(page);
 	ret = remove_mapping(mapping, page);
 
 	return ret;
@@ -394,8 +392,6 @@ invalidate_complete_page2(struct address
 	if (page_has_private(page) && !try_to_release_page(page, GFP_KERNEL))
 		return 0;
 
-	clear_page_mlock(page);
-
 	spin_lock_irq(&mapping->tree_lock);
 	if (PageDirty(page))
 		goto failed;


* [PATCH 4/4] mm: remove free_page_mlock
  2012-09-19  3:51 [PATCH 1/4] mm: fix invalidate_complete_page2 lock ordering Hugh Dickins
  2012-09-19  3:53 ` [PATCH 2/4] mm: remove vma arg from page_evictable Hugh Dickins
  2012-09-19  3:55 ` [PATCH 3/4] mm: clear_page_mlock in page_remove_rmap Hugh Dickins
@ 2012-09-19  3:57 ` Hugh Dickins
  2012-09-19 17:21   ` Johannes Weiner
  2012-09-21 12:47   ` Mel Gorman
  2012-09-19 16:44 ` [PATCH 1/4] mm: fix invalidate_complete_page2 lock ordering Johannes Weiner
  2012-09-21 12:26 ` Mel Gorman
  4 siblings, 2 replies; 15+ messages in thread
From: Hugh Dickins @ 2012-09-19  3:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Rik van Riel, Johannes Weiner, Michel Lespinasse,
	Ying Han, linux-mm, linux-kernel

We should not be seeing non-0 unevictable_pgs_mlockfreed any longer.
So remove free_page_mlock() from the page freeing paths: __PG_MLOCKED
is already in PAGE_FLAGS_CHECK_AT_FREE, so free_pages_check() will now
be checking it, reporting "BUG: Bad page state" if it's ever found set.
Comment UNEVICTABLE_MLOCKFREED and unevictable_pgs_mlockfreed always 0.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ying Han <yinghan@google.com>
---
 include/linux/vm_event_item.h |    2 +-
 mm/page_alloc.c               |   17 -----------------
 mm/vmstat.c                   |    2 +-
 3 files changed, 2 insertions(+), 19 deletions(-)

--- 3.6-rc6.orig/include/linux/vm_event_item.h	2012-09-18 15:38:08.000000000 -0700
+++ 3.6-rc6/include/linux/vm_event_item.h	2012-09-18 20:04:42.516625261 -0700
@@ -52,7 +52,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
 		UNEVICTABLE_PGMUNLOCKED,
 		UNEVICTABLE_PGCLEARED,	/* on COW, page truncate */
 		UNEVICTABLE_PGSTRANDED,	/* unable to isolate on unlock */
-		UNEVICTABLE_MLOCKFREED,
+		UNEVICTABLE_MLOCKFREED,	/* no longer useful: always zero */
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 		THP_FAULT_ALLOC,
 		THP_FAULT_FALLBACK,
--- 3.6-rc6.orig/mm/page_alloc.c	2012-09-18 15:38:08.000000000 -0700
+++ 3.6-rc6/mm/page_alloc.c	2012-09-18 20:04:42.520625316 -0700
@@ -597,17 +597,6 @@ out:
 	zone->free_area[order].nr_free++;
 }
 
-/*
- * free_page_mlock() -- clean up attempts to free and mlocked() page.
- * Page should not be on lru, so no need to fix that up.
- * free_pages_check() will verify...
- */
-static inline void free_page_mlock(struct page *page)
-{
-	__dec_zone_page_state(page, NR_MLOCK);
-	__count_vm_event(UNEVICTABLE_MLOCKFREED);
-}
-
 static inline int free_pages_check(struct page *page)
 {
 	if (unlikely(page_mapcount(page) |
@@ -721,14 +710,11 @@ static bool free_pages_prepare(struct pa
 static void __free_pages_ok(struct page *page, unsigned int order)
 {
 	unsigned long flags;
-	int wasMlocked = __TestClearPageMlocked(page);
 
 	if (!free_pages_prepare(page, order))
 		return;
 
 	local_irq_save(flags);
-	if (unlikely(wasMlocked))
-		free_page_mlock(page);
 	__count_vm_events(PGFREE, 1 << order);
 	free_one_page(page_zone(page), page, order,
 					get_pageblock_migratetype(page));
@@ -1296,7 +1282,6 @@ void free_hot_cold_page(struct page *pag
 	struct per_cpu_pages *pcp;
 	unsigned long flags;
 	int migratetype;
-	int wasMlocked = __TestClearPageMlocked(page);
 
 	if (!free_pages_prepare(page, 0))
 		return;
@@ -1304,8 +1289,6 @@ void free_hot_cold_page(struct page *pag
 	migratetype = get_pageblock_migratetype(page);
 	set_page_private(page, migratetype);
 	local_irq_save(flags);
-	if (unlikely(wasMlocked))
-		free_page_mlock(page);
 	__count_vm_event(PGFREE);
 
 	/*
--- 3.6-rc6.orig/mm/vmstat.c	2012-09-18 15:38:08.000000000 -0700
+++ 3.6-rc6/mm/vmstat.c	2012-09-18 20:04:42.524625352 -0700
@@ -781,7 +781,7 @@ const char * const vmstat_text[] = {
 	"unevictable_pgs_munlocked",
 	"unevictable_pgs_cleared",
 	"unevictable_pgs_stranded",
-	"unevictable_pgs_mlockfreed",
+	"unevictable_pgs_mlockfreed",	/* no longer useful: always zero */
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	"thp_fault_alloc",


* Re: [PATCH 1/4] mm: fix invalidate_complete_page2 lock ordering
  2012-09-19  3:51 [PATCH 1/4] mm: fix invalidate_complete_page2 lock ordering Hugh Dickins
                   ` (2 preceding siblings ...)
  2012-09-19  3:57 ` [PATCH 4/4] mm: remove free_page_mlock Hugh Dickins
@ 2012-09-19 16:44 ` Johannes Weiner
  2012-09-21 12:26 ` Mel Gorman
  4 siblings, 0 replies; 15+ messages in thread
From: Johannes Weiner @ 2012-09-19 16:44 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Sasha Levin, Mel Gorman, Rik van Riel,
	Michel Lespinasse, Ying Han, linux-mm, linux-kernel

On Tue, Sep 18, 2012 at 08:51:47PM -0700, Hugh Dickins wrote:
> In fuzzing with trinity, lockdep protested "possible irq lock inversion
> dependency detected" when isolate_lru_page() reenabled interrupts while
> still holding the supposedly irq-safe tree_lock:
> 
> invalidate_inode_pages2
>   invalidate_complete_page2
>     spin_lock_irq(&mapping->tree_lock)
>     clear_page_mlock
>       isolate_lru_page
>         spin_unlock_irq(&zone->lru_lock)
> 
> isolate_lru_page() is correct to enable interrupts unconditionally:
> invalidate_complete_page2() is incorrect to call clear_page_mlock()
> while holding tree_lock, which is supposed to nest inside lru_lock.
> 
> Both truncate_complete_page() and invalidate_complete_page() call
> clear_page_mlock() before taking tree_lock to remove page from
> radix_tree.  I guess invalidate_complete_page2() preferred to test
> PageDirty (again) under tree_lock before committing to the munlock;
> but since the page has already been unmapped, its state is already
> somewhat inconsistent, and no worse if clear_page_mlock() moved up.
> 
> Reported-by: Sasha Levin <levinsasha928@gmail.com>
> Deciphered-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Ying Han <yinghan@google.com>
> Cc: stable@vger.kernel.org

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


* Re: [PATCH 2/4] mm: remove vma arg from page_evictable
  2012-09-19  3:53 ` [PATCH 2/4] mm: remove vma arg from page_evictable Hugh Dickins
@ 2012-09-19 16:46   ` Johannes Weiner
  2012-09-21 12:30   ` Mel Gorman
  1 sibling, 0 replies; 15+ messages in thread
From: Johannes Weiner @ 2012-09-19 16:46 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Mel Gorman, Rik van Riel, Michel Lespinasse,
	Ying Han, linux-mm, linux-kernel

On Tue, Sep 18, 2012 at 08:53:45PM -0700, Hugh Dickins wrote:
> page_evictable(page, vma) is an irritant: almost all its callers pass
> NULL for vma.  Remove the vma arg and use mlocked_vma_newpage(vma, page)
> explicitly in the couple of places it's needed.  But in those places we
> don't even need page_evictable() itself!  They're dealing with a freshly
> allocated anonymous page, which has no "mapping" and cannot be mlocked yet.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Ying Han <yinghan@google.com>

Much better.  With documentation updates and everything, thank you!

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


* Re: [PATCH 3/4] mm: clear_page_mlock in page_remove_rmap
  2012-09-19  3:55 ` [PATCH 3/4] mm: clear_page_mlock in page_remove_rmap Hugh Dickins
@ 2012-09-19 17:18   ` Johannes Weiner
  2012-09-19 21:52     ` Hugh Dickins
  0 siblings, 1 reply; 15+ messages in thread
From: Johannes Weiner @ 2012-09-19 17:18 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Mel Gorman, Rik van Riel, Michel Lespinasse,
	Ying Han, linux-mm, linux-kernel

On Tue, Sep 18, 2012 at 08:55:21PM -0700, Hugh Dickins wrote:
> We had thought that pages could no longer get freed while still marked
> as mlocked; but Johannes Weiner posted this program to demonstrate that
> truncating an mlocked private file mapping containing COWed pages is
> still mishandled:
> 
> #include <sys/types.h>
> #include <sys/mman.h>
> #include <sys/stat.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <stdio.h>
> 
> int main(void)
> {
> 	char *map;
> 	int fd;
> 
> 	system("grep mlockfreed /proc/vmstat");
> 	fd = open("chigurh", O_CREAT|O_EXCL|O_RDWR);
> 	unlink("chigurh");
> 	ftruncate(fd, 4096);
> 	map = mmap(NULL, 4096, PROT_WRITE, MAP_PRIVATE, fd, 0);
> 	map[0] = 11;
> 	mlock(map, sizeof(fd));
> 	ftruncate(fd, 0);
> 	close(fd);
> 	munlock(map, sizeof(fd));
> 	munmap(map, 4096);
> 	system("grep mlockfreed /proc/vmstat");
> 	return 0;
> }
> 
> The anon COWed pages are not caught by truncation's clear_page_mlock()
> of the pagecache pages; but unmap_mapping_range() unmaps them, so we
> ought to look out for them there in page_remove_rmap().  Indeed, why
> should truncation or invalidation be doing the clear_page_mlock() when
> removing from pagecache?  mlock is a property of mapping in userspace,
> not a propertly of pagecache: an mlocked unmapped page is nonsensical.

property?

> --- 3.6-rc6.orig/mm/memory.c	2012-09-18 15:38:08.000000000 -0700
> +++ 3.6-rc6/mm/memory.c	2012-09-18 17:51:02.871288773 -0700
> @@ -1576,12 +1576,12 @@ split_fallthrough:
>  		if (page->mapping && trylock_page(page)) {
>  			lru_add_drain();  /* push cached pages to LRU */
>  			/*
> -			 * Because we lock page here and migration is
> -			 * blocked by the pte's page reference, we need
> -			 * only check for file-cache page truncation.
> +			 * Because we lock page here, and migration is
> +			 * blocked by the pte's page reference, and we
> +			 * know the page is still mapped, we don't even
> +			 * need to check for file-cache page truncation.
>  			 */
> -			if (page->mapping)
> -				mlock_vma_page(page);
> +			mlock_vma_page(page);
>  			unlock_page(page);

So I don't see a reason for checking for truncation in current code,
but I also had a hard time figuring out from git history and list
archives when this was ever "needed" (flu brain does not help).

My conclusion is that it started out as a fix for when an early draft
of putback_lru_page dropped the page lock on truncated pages, but by
the time b291f00 "mlock: mlocked pages are unevictable" went into the
tree it was no longer needed for correctness, merely an optimization
to avoid moving pages between lists when they are to be freed soon
anyway.

Is this correct?

> --- 3.6-rc6.orig/mm/mlock.c	2012-09-18 15:38:08.000000000 -0700
> +++ 3.6-rc6/mm/mlock.c	2012-09-18 17:51:02.871288773 -0700
> @@ -51,13 +51,10 @@ EXPORT_SYMBOL(can_do_mlock);
>  /*
>   *  LRU accounting for clear_page_mlock()
>   */
> -void __clear_page_mlock(struct page *page)
> +void clear_page_mlock(struct page *page)
>  {
> -	VM_BUG_ON(!PageLocked(page));
> -
> -	if (!page->mapping) {	/* truncated ? */
> +	if (!TestClearPageMlocked(page))
>  		return;
> -	}
>  
>  	dec_zone_page_state(page, NR_MLOCK);
>  	count_vm_event(UNEVICTABLE_PGCLEARED);
> @@ -290,14 +287,7 @@ void munlock_vma_pages_range(struct vm_a
>  		page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
>  		if (page && !IS_ERR(page)) {
>  			lock_page(page);
> -			/*
> -			 * Like in __mlock_vma_pages_range(),
> -			 * because we lock page here and migration is
> -			 * blocked by the elevated reference, we need
> -			 * only check for file-cache page truncation.
> -			 */
> -			if (page->mapping)
> -				munlock_vma_page(page);
> +			munlock_vma_page(page);
>  			unlock_page(page);
>  			put_page(page);
>  		}
> --- 3.6-rc6.orig/mm/rmap.c	2012-09-18 16:39:50.000000000 -0700
> +++ 3.6-rc6/mm/rmap.c	2012-09-18 17:51:02.871288773 -0700
> @@ -1203,7 +1203,10 @@ void page_remove_rmap(struct page *page)
>  	} else {
>  		__dec_zone_page_state(page, NR_FILE_MAPPED);
>  		mem_cgroup_dec_page_stat(page, MEMCG_NR_FILE_MAPPED);
> +		mem_cgroup_end_update_page_stat(page, &locked, &flags);
>  	}
> +	if (unlikely(PageMlocked(page)))
> +		clear_page_mlock(page);
>  	/*
>  	 * It would be tidy to reset the PageAnon mapping here,
>  	 * but that might overwrite a racing page_add_anon_rmap
> @@ -1213,6 +1216,7 @@ void page_remove_rmap(struct page *page)
>  	 * Leaving it set also helps swapoff to reinstate ptes
>  	 * faster for those pages still in swapcache.
>  	 */
> +	return;
>  out:
>  	if (!anon)
>  		mem_cgroup_end_update_page_stat(page, &locked, &flags);

Would it be cleaner to fold this into the only goto site left?  One
certain upside of that would be the fantastic comment about leaving
page->mapping intact being the last operation in this function again :-)


* Re: [PATCH 4/4] mm: remove free_page_mlock
  2012-09-19  3:57 ` [PATCH 4/4] mm: remove free_page_mlock Hugh Dickins
@ 2012-09-19 17:21   ` Johannes Weiner
  2012-09-21 12:47   ` Mel Gorman
  1 sibling, 0 replies; 15+ messages in thread
From: Johannes Weiner @ 2012-09-19 17:21 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Mel Gorman, Rik van Riel, Michel Lespinasse,
	Ying Han, linux-mm, linux-kernel

On Tue, Sep 18, 2012 at 08:57:27PM -0700, Hugh Dickins wrote:
> We should not be seeing non-0 unevictable_pgs_mlockfreed any longer.
> So remove free_page_mlock() from the page freeing paths: __PG_MLOCKED
> is already in PAGE_FLAGS_CHECK_AT_FREE, so free_pages_check() will now
> be checking it, reporting "BUG: Bad page state" if it's ever found set.
> Comment UNEVICTABLE_MLOCKFREED and unevictable_pgs_mlockfreed always 0.

I would have just removed it because I don't see too many users
relying on it being there.  But I'm fine with keeping it for now.

> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Ying Han <yinghan@google.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


* Re: [PATCH 3/4] mm: clear_page_mlock in page_remove_rmap
  2012-09-19 17:18   ` Johannes Weiner
@ 2012-09-19 21:52     ` Hugh Dickins
  2012-09-20 16:09       ` Johannes Weiner
  0 siblings, 1 reply; 15+ messages in thread
From: Hugh Dickins @ 2012-09-19 21:52 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Mel Gorman, Rik van Riel, Michel Lespinasse,
	Ying Han, linux-mm, linux-kernel

On Wed, 19 Sep 2012, Johannes Weiner wrote:
> On Tue, Sep 18, 2012 at 08:55:21PM -0700, Hugh Dickins wrote:
> > We had thought that pages could no longer get freed while still marked
> > as mlocked; but Johannes Weiner posted this program to demonstrate that
> > truncating an mlocked private file mapping containing COWed pages is
> > still mishandled:
> > 
> > #include <sys/types.h>
> > #include <sys/mman.h>
> > #include <sys/stat.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> > #include <fcntl.h>
> > #include <stdio.h>
> > 
> > int main(void)
> > {
> > 	char *map;
> > 	int fd;
> > 
> > 	system("grep mlockfreed /proc/vmstat");
> > 	fd = open("chigurh", O_CREAT|O_EXCL|O_RDWR);
> > 	unlink("chigurh");
> > 	ftruncate(fd, 4096);
> > 	map = mmap(NULL, 4096, PROT_WRITE, MAP_PRIVATE, fd, 0);
> > 	map[0] = 11;
> > 	mlock(map, sizeof(fd));
> > 	ftruncate(fd, 0);
> > 	close(fd);
> > 	munlock(map, sizeof(fd));
> > 	munmap(map, 4096);
> > 	system("grep mlockfreed /proc/vmstat");
> > 	return 0;
> > }
> > 
> > The anon COWed pages are not caught by truncation's clear_page_mlock()
> > of the pagecache pages; but unmap_mapping_range() unmaps them, so we
> > ought to look out for them there in page_remove_rmap().  Indeed, why
> > should truncation or invalidation be doing the clear_page_mlock() when
> > removing from pagecache?  mlock is a property of mapping in userspace,
> > not a propertly of pagecache: an mlocked unmapped page is nonsensical.
> 
> property?

Indeed :) thanks.  I'll not post a v2 just for this, I'll have a peep
when/if it goes into akpm's tree, in the hope that he might turn out
to have magically corrected it on the way (thank you, Andrew).

> 
> > --- 3.6-rc6.orig/mm/memory.c	2012-09-18 15:38:08.000000000 -0700
> > +++ 3.6-rc6/mm/memory.c	2012-09-18 17:51:02.871288773 -0700
> > @@ -1576,12 +1576,12 @@ split_fallthrough:
> >  		if (page->mapping && trylock_page(page)) {
> >  			lru_add_drain();  /* push cached pages to LRU */
> >  			/*
> > -			 * Because we lock page here and migration is
> > -			 * blocked by the pte's page reference, we need
> > -			 * only check for file-cache page truncation.
> > +			 * Because we lock page here, and migration is
> > +			 * blocked by the pte's page reference, and we
> > +			 * know the page is still mapped, we don't even
> > +			 * need to check for file-cache page truncation.
> >  			 */
> > -			if (page->mapping)
> > -				mlock_vma_page(page);
> > +			mlock_vma_page(page);
> >  			unlock_page(page);
> 
> So I don't see a reason for checking for truncation in current code,
> but I also had a hard time figuring out from git history and list
> archives when this was ever "needed" (flu brain does not help).

Thanks a lot for looking through all these.

But my unflued brain curses your flued brain for asking hard questions
that mine has such difficulty answering.  So, please get well soon!

I do believe you're right that it was unnecessary even before my patch.

I came to look at it (and spent a long time pondering this very block)
because I had already removed the page->mapping checks from the
munlocking cases.  Without giving any thought as to whether the NULL
case could actually occur in those, it was clearly wrong to skip
munlocking if NULL did occur (after my other changes anyway:
I didn't stop to work out if they were right before or not).

A more interesting question, I think, is whether that mlocking block
actually needs the trylock_page and unlock_page: holding the pte
lock there in follow_page gives a lot of security.  I did not decide
one way or another (just as I simply updated the comment to reflect
the change being made, without rethinking it all): it simply needed
more time and thought than I had to give it, could be done separately
later, and would have delayed getting these patches out.

> 
> My conclusion is that it started out as a fix for when an early draft
> of putback_lru_page dropped the page lock on truncated pages, but at

I don't recall the history of putback_lru_page at all, that sounds an
odd thing for it to have done.  Your question prompted me to look back
at old 2008 saved mail (though I've not looked at marc.info), but I
didn't find the crucial stage where the page->mapping check got added
(but there is a comment that Kosaki-san had fixed a truncate race).

I believe it used to be necessary (or at least advisable) because
there was a get_user_pages to pin the pages to be mlocked, quite
separate from the loop down those pages to mlock them: it was a
real possibility that the pages could be truncated between those.
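[Editor's sketch, not part of the original mail: the two-phase arrangement Hugh describes might have looked roughly like the pseudocode below. This is a reconstruction for illustration only, not the actual historical code; the function names are borrowed from current mm/ conventions.]

```
	/*
	 * Hypothetical sketch of the old two-phase mlock path:
	 * pinning and mlocking were separate passes, so truncation
	 * could clear page->mapping in between them.
	 */
	npages = get_user_pages(...);		/* pass 1: pin the pages */

	/* <-- truncate_inode_pages() could run here, truncating
	 *     the file and clearing page->mapping */

	for (i = 0; i < npages; i++) {		/* pass 2: mlock them */
		lock_page(pages[i]);
		if (pages[i]->mapping)		/* hence the truncation check */
			mlock_vma_page(pages[i]);
		unlock_page(pages[i]);
		put_page(pages[i]);
	}
```

Once the mlocking moved inside get_user_pages itself, the window between pinning and mlocking disappeared, which is why the page->mapping check became unnecessary.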

> the time b291f00 "mlock: mlocked pages are unevictable" went into the
> tree it was merely an optimization to avoid moving pages
> between lists when they are to be freed soon anyway.

I doubt it was ever intended as an optimization: too rare a case to
optimize for.  I think it just got carried over when the mlocking moved
inside the get_user_pages, because removing it would have required more
thought - in just the same way as I'm leaving the trylock_page.

> 
> Is this correct?
> 
> > --- 3.6-rc6.orig/mm/mlock.c	2012-09-18 15:38:08.000000000 -0700
> > +++ 3.6-rc6/mm/mlock.c	2012-09-18 17:51:02.871288773 -0700
> > @@ -51,13 +51,10 @@ EXPORT_SYMBOL(can_do_mlock);
> >  /*
> >   *  LRU accounting for clear_page_mlock()
> >   */
> > -void __clear_page_mlock(struct page *page)
> > +void clear_page_mlock(struct page *page)
> >  {
> > -	VM_BUG_ON(!PageLocked(page));
> > -
> > -	if (!page->mapping) {	/* truncated ? */
> > +	if (!TestClearPageMlocked(page))
> >  		return;
> > -	}
> >  
> >  	dec_zone_page_state(page, NR_MLOCK);
> >  	count_vm_event(UNEVICTABLE_PGCLEARED);
> > @@ -290,14 +287,7 @@ void munlock_vma_pages_range(struct vm_a
> >  		page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
> >  		if (page && !IS_ERR(page)) {
> >  			lock_page(page);
> > -			/*
> > -			 * Like in __mlock_vma_pages_range(),
> > -			 * because we lock page here and migration is
> > -			 * blocked by the elevated reference, we need
> > -			 * only check for file-cache page truncation.
> > -			 */
> > -			if (page->mapping)
> > -				munlock_vma_page(page);
> > +			munlock_vma_page(page);
> >  			unlock_page(page);
> >  			put_page(page);
> >  		}
> > --- 3.6-rc6.orig/mm/rmap.c	2012-09-18 16:39:50.000000000 -0700
> > +++ 3.6-rc6/mm/rmap.c	2012-09-18 17:51:02.871288773 -0700
> > @@ -1203,7 +1203,10 @@ void page_remove_rmap(struct page *page)
> >  	} else {
> >  		__dec_zone_page_state(page, NR_FILE_MAPPED);
> >  		mem_cgroup_dec_page_stat(page, MEMCG_NR_FILE_MAPPED);
> > +		mem_cgroup_end_update_page_stat(page, &locked, &flags);
> >  	}
> > +	if (unlikely(PageMlocked(page)))
> > +		clear_page_mlock(page);
> >  	/*
> >  	 * It would be tidy to reset the PageAnon mapping here,
> >  	 * but that might overwrite a racing page_add_anon_rmap
> > @@ -1213,6 +1216,7 @@ void page_remove_rmap(struct page *page)
> >  	 * Leaving it set also helps swapoff to reinstate ptes
> >  	 * faster for those pages still in swapcache.
> >  	 */
> > +	return;
> >  out:
> >  	if (!anon)
> >  		mem_cgroup_end_update_page_stat(page, &locked, &flags);
> 
> Would it be cleaner to fold this into the only goto site left?  One
> certain upside of that would be the fantastic comment about leaving
> page->mapping intact being the last operation in this function again :-)

Yes and no: I wanted to do that, but look again and you'll see
that there are actually two "goto out"s there.

I dislike the way page_remove_rmap() looks these days, and have
contemplated splitting it into anon and file subfunctions; but
not been satisfied with the result of that either.  And I admit
that this latest patch does not make it prettier.

I find the (very necessary) mem_cgroup_begin/end_update_page_stat()
particularly constricting, and some things depend upon how those are
implemented (what locks might get taken).  I do have a patch to make
them somewhat easier to work with (I never find time to review its
memory barriers, and it doesn't seem urgent), but page_remove_rmap()
remains just as ugly even with that change.

Hugh


* Re: [PATCH 3/4] mm: clear_page_mlock in page_remove_rmap
  2012-09-19 21:52     ` Hugh Dickins
@ 2012-09-20 16:09       ` Johannes Weiner
  0 siblings, 0 replies; 15+ messages in thread
From: Johannes Weiner @ 2012-09-20 16:09 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Mel Gorman, Rik van Riel, Michel Lespinasse,
	Ying Han, linux-mm, linux-kernel

On Wed, Sep 19, 2012 at 02:52:53PM -0700, Hugh Dickins wrote:
> On Wed, 19 Sep 2012, Johannes Weiner wrote:
> > On Tue, Sep 18, 2012 at 08:55:21PM -0700, Hugh Dickins wrote:
> > > --- 3.6-rc6.orig/mm/memory.c	2012-09-18 15:38:08.000000000 -0700
> > > +++ 3.6-rc6/mm/memory.c	2012-09-18 17:51:02.871288773 -0700
> > > @@ -1576,12 +1576,12 @@ split_fallthrough:
> > >  		if (page->mapping && trylock_page(page)) {
> > >  			lru_add_drain();  /* push cached pages to LRU */
> > >  			/*
> > > -			 * Because we lock page here and migration is
> > > -			 * blocked by the pte's page reference, we need
> > > -			 * only check for file-cache page truncation.
> > > +			 * Because we lock page here, and migration is
> > > +			 * blocked by the pte's page reference, and we
> > > +			 * know the page is still mapped, we don't even
> > > +			 * need to check for file-cache page truncation.
> > >  			 */
> > > -			if (page->mapping)
> > > -				mlock_vma_page(page);
> > > +			mlock_vma_page(page);
> > >  			unlock_page(page);
> > 
> > So I don't see a reason for checking for truncation in current code,
> > but I also had a hard time figuring out from git history and list
> > archives when this was ever "needed" (flu brain does not help).
> 
> Thanks a lot for looking through all these.
> 
> But my unflued brain curses your flued brain for asking hard questions
> that mine has such difficulty answering.  So, please get well soon!
> 
> I do believe you're right that it was unnecessary even before my patch.
> 
> I came to look at it (and spent a long time pondering this very block)
> because I had already removed the page->mapping checks from the
> munlocking cases.  Without giving any thought as to whether the NULL
> case could actually occur in those, it was clearly wrong to skip
> munlocking if NULL did occur (after my other changes anyway:
> I didn't stop to work out if they were right before or not).
> 
> A more interesting question, I think, is whether that mlocking block
> actually needs the trylock_page and unlock_page: holding the pte
> lock there in follow_page gives a lot of security.  I did not decide
> one way or another (just as I simply updated the comment to reflect
> the change being made, without rethinking it all): it simply needed
> more time and thought than I had to give it, could be done separately
> later, and would have delayed getting these patches out.

Fair enough, it was just a mix of curiosity and making sure I did not
miss anything fundamental.  It looks like we agree, though :)

> > My conclusion is that it started out as a fix for when an early draft
> > of putback_lru_page dropped the page lock on truncated pages, but at
> 
> I don't recall the history of putback_lru_page at all, that sounds an
> odd thing for it to have done.  Your question prompted me to look back
> at old 2008 saved mail (though I've not looked at marc.info), but I
> didn't find the crucial stage where the page->mapping check got added
> (but there is a comment that Kosaki-san had fixed a truncate race).

This is what I was referring to: https://lkml.org/lkml/2008/6/19/72 -
but the base of this patch never appeared in Linus' tree.

> > > --- 3.6-rc6.orig/mm/rmap.c	2012-09-18 16:39:50.000000000 -0700
> > > +++ 3.6-rc6/mm/rmap.c	2012-09-18 17:51:02.871288773 -0700
> > > @@ -1203,7 +1203,10 @@ void page_remove_rmap(struct page *page)
> > >  	} else {
> > >  		__dec_zone_page_state(page, NR_FILE_MAPPED);
> > >  		mem_cgroup_dec_page_stat(page, MEMCG_NR_FILE_MAPPED);
> > > +		mem_cgroup_end_update_page_stat(page, &locked, &flags);
> > >  	}
> > > +	if (unlikely(PageMlocked(page)))
> > > +		clear_page_mlock(page);
> > >  	/*
> > >  	 * It would be tidy to reset the PageAnon mapping here,
> > >  	 * but that might overwrite a racing page_add_anon_rmap
> > > @@ -1213,6 +1216,7 @@ void page_remove_rmap(struct page *page)
> > >  	 * Leaving it set also helps swapoff to reinstate ptes
> > >  	 * faster for those pages still in swapcache.
> > >  	 */
> > > +	return;
> > >  out:
> > >  	if (!anon)
> > >  		mem_cgroup_end_update_page_stat(page, &locked, &flags);
> > 
> > Would it be cleaner to fold this into the only goto site left?  One
> > certain upside of that would be the fantastic comment about leaving
> > page->mapping intact being the last operation in this function again :-)
> 
> Yes and no: I wanted to do that, but look again and you'll see
> that there are actually two "goto out"s there.

Yes, I missed that.  No worries, then!

Please include in this patch:

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


* Re: [PATCH 1/4] mm: fix invalidate_complete_page2 lock ordering
  2012-09-19  3:51 [PATCH 1/4] mm: fix invalidate_complete_page2 lock ordering Hugh Dickins
                   ` (3 preceding siblings ...)
  2012-09-19 16:44 ` [PATCH 1/4] mm: fix invalidate_complete_page2 lock ordering Johannes Weiner
@ 2012-09-21 12:26 ` Mel Gorman
  4 siblings, 0 replies; 15+ messages in thread
From: Mel Gorman @ 2012-09-21 12:26 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Sasha Levin, Rik van Riel, Johannes Weiner,
	Michel Lespinasse, Ying Han, linux-mm, linux-kernel

On Tue, Sep 18, 2012 at 08:51:47PM -0700, Hugh Dickins wrote:
> In fuzzing with trinity, lockdep protested "possible irq lock inversion
> dependency detected" when isolate_lru_page() reenabled interrupts while
> still holding the supposedly irq-safe tree_lock:
> 
> invalidate_inode_pages2
>   invalidate_complete_page2
>     spin_lock_irq(&mapping->tree_lock)
>     clear_page_mlock
>       isolate_lru_page
>         spin_unlock_irq(&zone->lru_lock)
> 
> isolate_lru_page() is correct to enable interrupts unconditionally:
> invalidate_complete_page2() is incorrect to call clear_page_mlock()
> while holding tree_lock, which is supposed to nest inside lru_lock.
> 
> Both truncate_complete_page() and invalidate_complete_page() call
> clear_page_mlock() before taking tree_lock to remove page from
> radix_tree.  I guess invalidate_complete_page2() preferred to test
> PageDirty (again) under tree_lock before committing to the munlock;
> but since the page has already been unmapped, its state is already
> somewhat inconsistent, and no worse if clear_page_mlock() moved up.
> 
> Reported-by: Sasha Levin <levinsasha928@gmail.com>
> Deciphered-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Ying Han <yinghan@google.com>
> Cc: stable@vger.kernel.org

Acked-by: Mel Gorman <mel@csn.ul.ie>

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 2/4] mm: remove vma arg from page_evictable
  2012-09-19  3:53 ` [PATCH 2/4] mm: remove vma arg from page_evictable Hugh Dickins
  2012-09-19 16:46   ` Johannes Weiner
@ 2012-09-21 12:30   ` Mel Gorman
  1 sibling, 0 replies; 15+ messages in thread
From: Mel Gorman @ 2012-09-21 12:30 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Michel Lespinasse,
	Ying Han, linux-mm, linux-kernel

On Tue, Sep 18, 2012 at 08:53:45PM -0700, Hugh Dickins wrote:
> page_evictable(page, vma) is an irritant: almost all its callers pass
> NULL for vma.  Remove the vma arg and use mlocked_vma_newpage(vma, page)
> explicitly in the couple of places it's needed.  But in those places we
> don't even need page_evictable() itself!  They're dealing with a freshly
> allocated anonymous page, which has no "mapping" and cannot be mlocked yet.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Ying Han <yinghan@google.com>

Acked-by: Mel Gorman <mel@csn.ul.ie>

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 4/4] mm: remove free_page_mlock
  2012-09-19  3:57 ` [PATCH 4/4] mm: remove free_page_mlock Hugh Dickins
  2012-09-19 17:21   ` Johannes Weiner
@ 2012-09-21 12:47   ` Mel Gorman
  2012-09-21 22:56     ` [PATCH 5/4] mm: remove unevictable_pgs_mlockfreed Hugh Dickins
  1 sibling, 1 reply; 15+ messages in thread
From: Mel Gorman @ 2012-09-21 12:47 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Michel Lespinasse,
	Ying Han, linux-mm, linux-kernel

On Tue, Sep 18, 2012 at 08:57:27PM -0700, Hugh Dickins wrote:
> We should not be seeing non-0 unevictable_pgs_mlockfreed any longer.
> So remove free_page_mlock() from the page freeing paths: __PG_MLOCKED
> is already in PAGE_FLAGS_CHECK_AT_FREE, so free_pages_check() will now
> be checking it, reporting "BUG: Bad page state" if it's ever found set.
> Comment UNEVICTABLE_MLOCKFREED and unevictable_pgs_mlockfreed always 0.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Ying Han <yinghan@google.com>

Like Johannes, I think you should just drop the counter.  I find it very
unlikely that there is a tool that depends on it existing, because it's
very hard to draw any useful conclusions from its value, unlike say
pgscan* or pgfault.

Acked-by: Mel Gorman <mel@csn.ul.ie>

Thanks Hugh.

-- 
Mel Gorman
SUSE Labs


* [PATCH 5/4] mm: remove unevictable_pgs_mlockfreed
  2012-09-21 12:47   ` Mel Gorman
@ 2012-09-21 22:56     ` Hugh Dickins
  2012-09-21 23:32       ` Johannes Weiner
  0 siblings, 1 reply; 15+ messages in thread
From: Hugh Dickins @ 2012-09-21 22:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Rik van Riel, Johannes Weiner, Michel Lespinasse,
	Ying Han, linux-mm, linux-kernel

Simply remove UNEVICTABLE_MLOCKFREED and unevictable_pgs_mlockfreed
line from /proc/vmstat: Johannes and Mel point out that it was very
unlikely to have been used by any tool, and of course we can restore
it easily enough if that turns out to be wrong.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ying Han <yinghan@google.com>
---
 include/linux/vm_event_item.h |    1 -
 mm/vmstat.c                   |    1 -
 2 files changed, 2 deletions(-)

--- 3.6-rc6.orig/include/linux/vm_event_item.h	2012-09-18 20:04:42.000000000 -0700
+++ 3.6-rc6/include/linux/vm_event_item.h	2012-09-21 15:13:26.608016171 -0700
@@ -52,7 +52,6 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
 		UNEVICTABLE_PGMUNLOCKED,
 		UNEVICTABLE_PGCLEARED,	/* on COW, page truncate */
 		UNEVICTABLE_PGSTRANDED,	/* unable to isolate on unlock */
-		UNEVICTABLE_MLOCKFREED,	/* no longer useful: always zero */
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 		THP_FAULT_ALLOC,
 		THP_FAULT_FALLBACK,
--- 3.6-rc6.orig/mm/vmstat.c	2012-09-18 20:04:42.000000000 -0700
+++ 3.6-rc6/mm/vmstat.c	2012-09-21 15:13:43.724017386 -0700
@@ -781,7 +781,6 @@ const char * const vmstat_text[] = {
 	"unevictable_pgs_munlocked",
 	"unevictable_pgs_cleared",
 	"unevictable_pgs_stranded",
-	"unevictable_pgs_mlockfreed",	/* no longer useful: always zero */
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	"thp_fault_alloc",


* Re: [PATCH 5/4] mm: remove unevictable_pgs_mlockfreed
  2012-09-21 22:56     ` [PATCH 5/4] mm: remove unevictable_pgs_mlockfreed Hugh Dickins
@ 2012-09-21 23:32       ` Johannes Weiner
  0 siblings, 0 replies; 15+ messages in thread
From: Johannes Weiner @ 2012-09-21 23:32 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Mel Gorman, Rik van Riel, Michel Lespinasse,
	Ying Han, linux-mm, linux-kernel

On Fri, Sep 21, 2012 at 03:56:11PM -0700, Hugh Dickins wrote:
> Simply remove UNEVICTABLE_MLOCKFREED and unevictable_pgs_mlockfreed
> line from /proc/vmstat: Johannes and Mel point out that it was very
> unlikely to have been used by any tool, and of course we can restore
> it easily enough if that turns out to be wrong.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Ying Han <yinghan@google.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


end of thread, other threads:[~2012-09-21 23:32 UTC | newest]

Thread overview: 15+ messages
2012-09-19  3:51 [PATCH 1/4] mm: fix invalidate_complete_page2 lock ordering Hugh Dickins
2012-09-19  3:53 ` [PATCH 2/4] mm: remove vma arg from page_evictable Hugh Dickins
2012-09-19 16:46   ` Johannes Weiner
2012-09-21 12:30   ` Mel Gorman
2012-09-19  3:55 ` [PATCH 3/4] mm: clear_page_mlock in page_remove_rmap Hugh Dickins
2012-09-19 17:18   ` Johannes Weiner
2012-09-19 21:52     ` Hugh Dickins
2012-09-20 16:09       ` Johannes Weiner
2012-09-19  3:57 ` [PATCH 4/4] mm: remove free_page_mlock Hugh Dickins
2012-09-19 17:21   ` Johannes Weiner
2012-09-21 12:47   ` Mel Gorman
2012-09-21 22:56     ` [PATCH 5/4] mm: remove unevictable_pgs_mlockfreed Hugh Dickins
2012-09-21 23:32       ` Johannes Weiner
2012-09-19 16:44 ` [PATCH 1/4] mm: fix invalidate_complete_page2 lock ordering Johannes Weiner
2012-09-21 12:26 ` Mel Gorman
