* [PATCH 0/12] mm: page migration cleanups, and a little mlock
@ 2015-10-19  4:44 Hugh Dickins
  2015-10-19  4:45 ` [PATCH 1/12] mm Documentation: undoc non-linear vmas Hugh Dickins
                   ` (11 more replies)
  0 siblings, 12 replies; 39+ messages in thread
From: Hugh Dickins @ 2015-10-19  4:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Kirill A. Shutemov, Rik van Riel,
	Vlastimil Babka, Davidlohr Bueso, Oleg Nesterov, Sasha Levin,
	Andrey Konovalov, Dmitry Vyukov, KOSAKI Motohiro,
	Johannes Weiner, Michal Hocko, Joonsoo Kim, Rafael Aquini,
	Konstantin Khlebnikov, Naoya Horiguchi, Mel Gorman,
	Peter Zijlstra, Minchan Kim, Cyrill Gorcunov, Pavel Emelyanov,
	David Rientjes, Greg Thelen, linux-mm

Here's a series of mostly trivial cleanups to page migration, following
on from a preliminary cleanup to Documentation, and to the way rmap
calls mlock_vma_page (which has to be duplicated in page migration).

This started out as patch 04/24 "mm: make page migration's newpage
handling more robust" in my huge tmpfs series against v3.19 in February.
It was already a portmanteau then, and more has been thrown in since:
not much of relevance to tmpfs, just cleanups that looked worth making.

A few minor fixes on the way, nothing I think worth sending to stable.
Mostly trivial, but 12/12 and a few of the others deserve some thought.

Diffed against v4.3-rc6: I couldn't decide whether to use that or
v4.3-rc5-mm1 as base, but there's very little conflict either way:
4/12 should have the second arg to page_remove_rmap() if it goes after
Kirill's THP refcounting series, and 10/12 should have the TTU_FREE
block inserted if it goes after Minchan's MADV_FREE series.

 1/12 mm Documentation: undoc non-linear vmas
 2/12 mm: rmap use pte lock not mmap_sem to set PageMlocked
 3/12 mm: page migration fix PageMlocked on migrated pages
 4/12 mm: rename mem_cgroup_migrate to mem_cgroup_replace_page
 5/12 mm: correct a couple of page migration comments
 6/12 mm: page migration use the put_new_page whenever necessary
 7/12 mm: page migration trylock newpage at same level as oldpage
 8/12 mm: page migration remove_migration_ptes at lock+unlock level
 9/12 mm: simplify page migration's anon_vma comment and flow
10/12 mm: page migration use migration entry for swapcache too
11/12 mm: page migration avoid touching newpage until no going back
12/12 mm: migrate dirty page without clear_page_dirty_for_io etc

 Documentation/filesystems/proc.txt   |    1 
 Documentation/vm/page_migration      |   27 +-
 Documentation/vm/unevictable-lru.txt |  120 +-----------
 include/linux/memcontrol.h           |    7 
 mm/balloon_compaction.c              |   10 -
 mm/filemap.c                         |    2 
 mm/internal.h                        |    9 
 mm/memcontrol.c                      |   29 --
 mm/migrate.c                         |  244 ++++++++++++-------------
 mm/rmap.c                            |   99 ++++------
 mm/shmem.c                           |    2 
 11 files changed, 213 insertions(+), 337 deletions(-)

Hugh


* [PATCH 1/12] mm Documentation: undoc non-linear vmas
  2015-10-19  4:44 [PATCH 0/12] mm: page migration cleanups, and a little mlock Hugh Dickins
@ 2015-10-19  4:45 ` Hugh Dickins
  2015-10-19  9:16   ` Kirill A. Shutemov
  2015-11-05 17:29   ` Vlastimil Babka
  2015-10-19  4:50 ` [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked Hugh Dickins
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 39+ messages in thread
From: Hugh Dickins @ 2015-10-19  4:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christoph Lameter, Kirill A. Shutemov, linux-mm

While updating some mm Documentation, I came across a few straggling
references to the non-linear vmas which were happily removed in v4.0.
Delete them.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 Documentation/filesystems/proc.txt   |    1 
 Documentation/vm/page_migration      |   10 +--
 Documentation/vm/unevictable-lru.txt |   63 +------------------------
 3 files changed, 9 insertions(+), 65 deletions(-)

--- migrat.orig/Documentation/filesystems/proc.txt	2015-09-12 18:30:13.713047477 -0700
+++ migrat/Documentation/filesystems/proc.txt	2015-10-18 17:53:03.715313650 -0700
@@ -474,7 +474,6 @@ manner. The codes are the following:
     ac  - area is accountable
     nr  - swap space is not reserved for the area
     ht  - area uses huge tlb pages
-    nl  - non-linear mapping
     ar  - architecture specific flag
     dd  - do not include area into core dump
     sd  - soft-dirty flag
--- migrat.orig/Documentation/vm/page_migration	2009-06-09 20:05:27.000000000 -0700
+++ migrat/Documentation/vm/page_migration	2015-10-18 17:53:03.715313650 -0700
@@ -99,12 +99,10 @@ Steps:
 4. The new page is prepped with some settings from the old page so that
    accesses to the new page will discover a page with the correct settings.
 
-5. All the page table references to the page are converted
-   to migration entries or dropped (nonlinear vmas).
-   This decrease the mapcount of a page. If the resulting
-   mapcount is not zero then we do not migrate the page.
-   All user space processes that attempt to access the page
-   will now wait on the page lock.
+5. All the page table references to the page are converted to migration
+   entries. This decreases the mapcount of a page. If the resulting
+   mapcount is not zero then we do not migrate the page. All user space
+   processes that attempt to access the page will now wait on the page lock.
 
 6. The radix tree lock is taken. This will cause all processes trying
    to access the page via the mapping to block on the radix tree spinlock.
--- migrat.orig/Documentation/vm/unevictable-lru.txt	2015-08-30 11:34:09.000000000 -0700
+++ migrat/Documentation/vm/unevictable-lru.txt	2015-10-18 17:53:03.716313651 -0700
@@ -552,63 +552,17 @@ different reverse map mechanisms.
      is really unevictable or not.  In this case, try_to_unmap_anon() will
      return SWAP_AGAIN.
 
- (*) try_to_unmap_file() - linear mappings
+ (*) try_to_unmap_file()
 
      Unmapping of a mapped file page works the same as for anonymous mappings,
      except that the scan visits all VMAs that map the page's index/page offset
-     in the page's mapping's reverse map priority search tree.  It also visits
-     each VMA in the page's mapping's non-linear list, if the list is
-     non-empty.
+     in the page's mapping's reverse map interval search tree.
 
      As for anonymous pages, on encountering a VM_LOCKED VMA for a mapped file
      page, try_to_unmap_file() will attempt to acquire the associated
      mm_struct's mmap semaphore to mlock the page, returning SWAP_MLOCK if this
      is successful, and SWAP_AGAIN, if not.
 
- (*) try_to_unmap_file() - non-linear mappings
-
-     If a page's mapping contains a non-empty non-linear mapping VMA list, then
-     try_to_un{map|lock}() must also visit each VMA in that list to determine
-     whether the page is mapped in a VM_LOCKED VMA.  Again, the scan must visit
-     all VMAs in the non-linear list to ensure that the pages is not/should not
-     be mlocked.
-
-     If a VM_LOCKED VMA is found in the list, the scan could terminate.
-     However, there is no easy way to determine whether the page is actually
-     mapped in a given VMA - either for unmapping or testing whether the
-     VM_LOCKED VMA actually pins the page.
-
-     try_to_unmap_file() handles non-linear mappings by scanning a certain
-     number of pages - a "cluster" - in each non-linear VMA associated with the
-     page's mapping, for each file mapped page that vmscan tries to unmap.  If
-     this happens to unmap the page we're trying to unmap, try_to_unmap() will
-     notice this on return (page_mapcount(page) will be 0) and return
-     SWAP_SUCCESS.  Otherwise, it will return SWAP_AGAIN, causing vmscan to
-     recirculate this page.  We take advantage of the cluster scan in
-     try_to_unmap_cluster() as follows:
-
-	For each non-linear VMA, try_to_unmap_cluster() attempts to acquire the
-	mmap semaphore of the associated mm_struct for read without blocking.
-
-	If this attempt is successful and the VMA is VM_LOCKED,
-	try_to_unmap_cluster() will retain the mmap semaphore for the scan;
-	otherwise it drops it here.
-
-	Then, for each page in the cluster, if we're holding the mmap semaphore
-	for a locked VMA, try_to_unmap_cluster() calls mlock_vma_page() to
-	mlock the page.  This call is a no-op if the page is already locked,
-	but will mlock any pages in the non-linear mapping that happen to be
-	unlocked.
-
-	If one of the pages so mlocked is the page passed in to try_to_unmap(),
-	try_to_unmap_cluster() will return SWAP_MLOCK, rather than the default
-	SWAP_AGAIN.  This will allow vmscan to cull the page, rather than
-	recirculating it on the inactive list.
-
-	Again, if try_to_unmap_cluster() cannot acquire the VMA's mmap sem, it
-	returns SWAP_AGAIN, indicating that the page is mapped by a VM_LOCKED
-	VMA, but couldn't be mlocked.
-
 
 try_to_munlock() REVERSE MAP SCAN
 ---------------------------------
@@ -625,10 +579,9 @@ introduced a variant of try_to_unmap() c
 try_to_munlock() calls the same functions as try_to_unmap() for anonymous and
 mapped file pages with an additional argument specifying unlock versus unmap
 processing.  Again, these functions walk the respective reverse maps looking
-for VM_LOCKED VMAs.  When such a VMA is found for anonymous pages and file
-pages mapped in linear VMAs, as in the try_to_unmap() case, the functions
-attempt to acquire the associated mmap semaphore, mlock the page via
-mlock_vma_page() and return SWAP_MLOCK.  This effectively undoes the
+for VM_LOCKED VMAs.  When such a VMA is found, as in the try_to_unmap() case,
+the functions attempt to acquire the associated mmap semaphore, mlock the page
+via mlock_vma_page() and return SWAP_MLOCK.  This effectively undoes the
 pre-clearing of the page's PG_mlocked done by munlock_vma_page.
 
 If try_to_unmap() is unable to acquire a VM_LOCKED VMA's associated mmap
@@ -636,12 +589,6 @@ semaphore, it will return SWAP_AGAIN.  T
 recycle the page on the inactive list and hope that it has better luck with the
 page next time.
 
-For file pages mapped into non-linear VMAs, the try_to_munlock() logic works
-slightly differently.  On encountering a VM_LOCKED non-linear VMA that might
-map the page, try_to_munlock() returns SWAP_AGAIN without actually mlocking the
-page.  munlock_vma_page() will just leave the page unlocked and let vmscan deal
-with it - the usual fallback position.
-
 Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's
 reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA.
 However, the scan can terminate when it encounters a VM_LOCKED VMA and can


* [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked
  2015-10-19  4:44 [PATCH 0/12] mm: page migration cleanups, and a little mlock Hugh Dickins
  2015-10-19  4:45 ` [PATCH 1/12] mm Documentation: undoc non-linear vmas Hugh Dickins
@ 2015-10-19  4:50 ` Hugh Dickins
  2015-10-19  6:23   ` Vlastimil Babka
  2015-10-19  4:52 ` [PATCH 3/12] mm: page migration fix PageMlocked on migrated pages Hugh Dickins
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Hugh Dickins @ 2015-10-19  4:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Kirill A. Shutemov, Rik van Riel,
	Vlastimil Babka, Davidlohr Bueso, Oleg Nesterov, Sasha Levin,
	Andrey Konovalov, Dmitry Vyukov, KOSAKI Motohiro, linux-mm

KernelThreadSanitizer (ktsan) has shown that the down_read_trylock()
of mmap_sem in try_to_unmap_one() (when going to set PageMlocked on
a page found mapped in a VM_LOCKED vma) is ineffective against races
with exit_mmap()'s munlock_vma_pages_all(), because mmap_sem is not
held when tearing down an mm.

But that's okay, those races are benign; and although we've believed
for years in that ugly down_read_trylock(), it's unsuitable for the job,
and frustrates the good intention of setting PageMlocked when it fails.

It just doesn't matter if here we read vm_flags an instant before or
after a racing mlock() or munlock() or exit_mmap() sets or clears
VM_LOCKED: the syscalls (or exit) work their way up the address space
(taking pt locks after updating vm_flags) to establish the final state.

We do still need to be careful never to mark a page Mlocked (hence
unevictable) by any race that will not be corrected shortly after.
The page lock protects from many of the races, but not all (a page
is not necessarily locked when it's unmapped).  But the pte lock we
just dropped is good to cover the rest (and serializes even with
munlock_vma_pages_all(), so no special barriers required): now hold
on to the pte lock while calling mlock_vma_page().  Is that lock
ordering safe?  Yes, that's how follow_page_pte() calls it, and
how page_remove_rmap() calls the complementary clear_page_mlock().

This fixes the following case (though not a case which anyone has
complained of), which mmap_sem did not: truncation's preliminary
unmap_mapping_range() is supposed to remove even the anonymous COWs
of filecache pages, and that might race with try_to_unmap_one() on a
VM_LOCKED vma, so that mlock_vma_page() sets PageMlocked just after
zap_pte_range() unmaps the page, causing "Bad page state (mlocked)"
when freed.  The pte lock protects against this.

You could say that it also protects against the more ordinary case,
racing with the preliminary unmapping of a filecache page itself: but
in our current tree, that's independently protected by i_mmap_rwsem;
and that race would be why "Bad page state (mlocked)" was seen before
commit 48ec833b7851 ("Revert mm/memory.c: share the i_mmap_rwsem").

While we're here, make a related optimization in try_to_unmap_one():
if it's doing TTU_MUNLOCK, then there's no point at all in descending
the page tables and getting the pt lock, unless the vma is VM_LOCKED.
Yes, that can change racily, but it can change racily even without the
optimization: it's not critical.  Far better not to waste time here.
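
Putting the pieces together (just an illustrative sketch assembled from
the hunks in the diff below, not the complete function), the mlock
handling in try_to_unmap_one() ends up looking roughly like this:

        /* munlock has nothing to gain from examining un-locked vmas */
        if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
                goto out;

        pte = page_check_address(page, mm, address, &ptl, 0);
        if (!pte)
                goto out;

        if (!(flags & TTU_IGNORE_MLOCK)) {
                if (vma->vm_flags & VM_LOCKED) {
                        /* pte lock, not mmap_sem, covers the VM_LOCKED check */
                        mlock_vma_page(page);
                        ret = SWAP_MLOCK;
                        goto out_unmap;     /* pte_unmap_unlock(pte, ptl) */
                }
                if (flags & TTU_MUNLOCK)
                        goto out_unmap;
        }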

Stopped short of separating try_to_munlock_one() from try_to_unmap_one()
on this occasion, but that's probably the sensible next step - with a
rename, given that try_to_munlock()'s business is to try to set Mlocked.

Updated the unevictable-lru Documentation, to remove its reference to
mmap semaphore, but found a few more updates needed in just that area.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
I've cc'ed a lot of people on this one, since it originated in the
"Multiple potential races on vma->vm_flags" discussion.  But didn't
cc everyone on the not-very-interesting 1/12 "mm Documentation: undoc
non-linear vmas" - so if you apply just this to a 4.3-rc6 or mmotm tree,
yes, there will be rejects on unevictable-lru.txt, that's expected.

 Documentation/vm/unevictable-lru.txt |   65 ++++++-------------------
 mm/rmap.c                            |   36 ++++---------
 2 files changed, 29 insertions(+), 72 deletions(-)

--- migrat.orig/Documentation/vm/unevictable-lru.txt	2015-10-18 17:53:03.716313651 -0700
+++ migrat/Documentation/vm/unevictable-lru.txt	2015-10-18 17:53:07.098317501 -0700
@@ -531,37 +531,20 @@ map.
 
 try_to_unmap() is always called, by either vmscan for reclaim or for page
 migration, with the argument page locked and isolated from the LRU.  Separate
-functions handle anonymous and mapped file pages, as these types of pages have
-different reverse map mechanisms.
-
- (*) try_to_unmap_anon()
-
-     To unmap anonymous pages, each VMA in the list anchored in the anon_vma
-     must be visited - at least until a VM_LOCKED VMA is encountered.  If the
-     page is being unmapped for migration, VM_LOCKED VMAs do not stop the
-     process because mlocked pages are migratable.  However, for reclaim, if
-     the page is mapped into a VM_LOCKED VMA, the scan stops.
-
-     try_to_unmap_anon() attempts to acquire in read mode the mmap semaphore of
-     the mm_struct to which the VMA belongs.  If this is successful, it will
-     mlock the page via mlock_vma_page() - we wouldn't have gotten to
-     try_to_unmap_anon() if the page were already mlocked - and will return
-     SWAP_MLOCK, indicating that the page is unevictable.
-
-     If the mmap semaphore cannot be acquired, we are not sure whether the page
-     is really unevictable or not.  In this case, try_to_unmap_anon() will
-     return SWAP_AGAIN.
-
- (*) try_to_unmap_file()
-
-     Unmapping of a mapped file page works the same as for anonymous mappings,
-     except that the scan visits all VMAs that map the page's index/page offset
-     in the page's mapping's reverse map interval search tree.
-
-     As for anonymous pages, on encountering a VM_LOCKED VMA for a mapped file
-     page, try_to_unmap_file() will attempt to acquire the associated
-     mm_struct's mmap semaphore to mlock the page, returning SWAP_MLOCK if this
-     is successful, and SWAP_AGAIN, if not.
+functions handle anonymous and mapped file and KSM pages, as these types of
+pages have different reverse map lookup mechanisms, with different locking.
+In each case, whether rmap_walk_anon() or rmap_walk_file() or rmap_walk_ksm(),
+it will call try_to_unmap_one() for every VMA which might contain the page.
+
+When trying to reclaim, if try_to_unmap_one() finds the page in a VM_LOCKED
+VMA, it will then mlock the page via mlock_vma_page() instead of unmapping it,
+and return SWAP_MLOCK to indicate that the page is unevictable: and the scan
+stops there.
+
+mlock_vma_page() is called while holding the page table's lock (in addition
+to the page lock, and the rmap lock): to serialize against concurrent mlock or
+munlock or munmap system calls, mm teardown (munlock_vma_pages_all), reclaim,
+holepunching, and truncation of file pages and their anonymous COWed pages.
 
 
 try_to_munlock() REVERSE MAP SCAN
@@ -577,22 +560,15 @@ all PTEs from the page.  For this purpos
 introduced a variant of try_to_unmap() called try_to_munlock().
 
 try_to_munlock() calls the same functions as try_to_unmap() for anonymous and
-mapped file pages with an additional argument specifying unlock versus unmap
+mapped file and KSM pages with a flag argument specifying unlock versus unmap
 processing.  Again, these functions walk the respective reverse maps looking
 for VM_LOCKED VMAs.  When such a VMA is found, as in the try_to_unmap() case,
-the functions attempt to acquire the associated mmap semaphore, mlock the page
-via mlock_vma_page() and return SWAP_MLOCK.  This effectively undoes the
-pre-clearing of the page's PG_mlocked done by munlock_vma_page.
-
-If try_to_unmap() is unable to acquire a VM_LOCKED VMA's associated mmap
-semaphore, it will return SWAP_AGAIN.  This will allow shrink_page_list() to
-recycle the page on the inactive list and hope that it has better luck with the
-page next time.
+the functions mlock the page via mlock_vma_page() and return SWAP_MLOCK.  This
+undoes the pre-clearing of the page's PG_mlocked done by munlock_vma_page.
 
 Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's
 reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA.
-However, the scan can terminate when it encounters a VM_LOCKED VMA and can
-successfully acquire the VMA's mmap semaphore for read and mlock the page.
+However, the scan can terminate when it encounters a VM_LOCKED VMA.
 Although try_to_munlock() might be called a great many times when munlocking a
 large region or tearing down a large address space that has been mlocked via
 mlockall(), overall this is a fairly rare event.
@@ -620,11 +596,6 @@ Some examples of these unevictable pages
  (3) mlocked pages that could not be isolated from the LRU and moved to the
      unevictable list in mlock_vma_page().
 
- (4) Pages mapped into multiple VM_LOCKED VMAs, but try_to_munlock() couldn't
-     acquire the VMA's mmap semaphore to test the flags and set PageMlocked.
-     munlock_vma_page() was forced to let the page back on to the normal LRU
-     list for vmscan to handle.
-
 shrink_inactive_list() also diverts any unevictable pages that it finds on the
 inactive lists to the appropriate zone's unevictable list.
 
--- migrat.orig/mm/rmap.c	2015-09-12 18:30:20.857039763 -0700
+++ migrat/mm/rmap.c	2015-10-18 17:53:07.099317502 -0700
@@ -1304,6 +1304,10 @@ static int try_to_unmap_one(struct page
 	int ret = SWAP_AGAIN;
 	enum ttu_flags flags = (enum ttu_flags)arg;
 
+	/* munlock has nothing to gain from examining un-locked vmas */
+	if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
+		goto out;
+
 	pte = page_check_address(page, mm, address, &ptl, 0);
 	if (!pte)
 		goto out;
@@ -1314,9 +1318,12 @@ static int try_to_unmap_one(struct page
 	 * skipped over this mm) then we should reactivate it.
 	 */
 	if (!(flags & TTU_IGNORE_MLOCK)) {
-		if (vma->vm_flags & VM_LOCKED)
-			goto out_mlock;
-
+		if (vma->vm_flags & VM_LOCKED) {
+			/* Holding pte lock, we do *not* need mmap_sem here */
+			mlock_vma_page(page);
+			ret = SWAP_MLOCK;
+			goto out_unmap;
+		}
 		if (flags & TTU_MUNLOCK)
 			goto out_unmap;
 	}
@@ -1419,31 +1426,10 @@ static int try_to_unmap_one(struct page
 
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
-	if (ret != SWAP_FAIL && !(flags & TTU_MUNLOCK))
+	if (ret != SWAP_FAIL && ret != SWAP_MLOCK && !(flags & TTU_MUNLOCK))
 		mmu_notifier_invalidate_page(mm, address);
 out:
 	return ret;
-
-out_mlock:
-	pte_unmap_unlock(pte, ptl);
-
-
-	/*
-	 * We need mmap_sem locking, Otherwise VM_LOCKED check makes
-	 * unstable result and race. Plus, We can't wait here because
-	 * we now hold anon_vma->rwsem or mapping->i_mmap_rwsem.
-	 * if trylock failed, the page remain in evictable lru and later
-	 * vmscan could retry to move the page to unevictable lru if the
-	 * page is actually mlocked.
-	 */
-	if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
-		if (vma->vm_flags & VM_LOCKED) {
-			mlock_vma_page(page);
-			ret = SWAP_MLOCK;
-		}
-		up_read(&vma->vm_mm->mmap_sem);
-	}
-	return ret;
 }
 
 bool is_vma_temporary_stack(struct vm_area_struct *vma)


* [PATCH 3/12] mm: page migration fix PageMlocked on migrated pages
  2015-10-19  4:44 [PATCH 0/12] mm: page migration cleanups, and a little mlock Hugh Dickins
  2015-10-19  4:45 ` [PATCH 1/12] mm Documentation: undoc non-linear vmas Hugh Dickins
  2015-10-19  4:50 ` [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked Hugh Dickins
@ 2015-10-19  4:52 ` Hugh Dickins
  2015-11-05 18:18   ` Vlastimil Babka
  2015-10-19  4:54 ` [PATCH 4/12] mm: rename mem_cgroup_migrate to mem_cgroup_replace_page Hugh Dickins
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Hugh Dickins @ 2015-10-19  4:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christoph Lameter, Rik van Riel, linux-mm

Commit e6c509f85455 ("mm: use clear_page_mlock() in page_remove_rmap()")
in v3.7 inadvertently made mlock_migrate_page() impotent: page migration
unmaps the page from userspace before migrating, and that commit clears
PageMlocked on the final unmap, leaving mlock_migrate_page() with nothing
to do.  Not a serious bug, the next attempt at reclaiming the page would
fix it up; but a betrayal of page migration's intent - the new page ought
to emerge as PageMlocked.

I don't see how to fix it for mlock_migrate_page() itself; but easily
fixed in remove_migration_pte(), by calling mlock_vma_page() when the
vma is VM_LOCKED - under pte lock as in try_to_unmap_one().

Delete mlock_migrate_page()?  Not quite, it does still serve a purpose
for migrate_misplaced_transhuge_page(): where we could replace it by a
test, clear_page_mlock(), mlock_vma_page() sequence; but would that be
an improvement?  mlock_migrate_page() is fairly lean, and let's make
it leaner by skipping the irq save/restore now clearly not needed.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/internal.h |    9 ++++-----
 mm/migrate.c  |    6 ++++--
 2 files changed, 8 insertions(+), 7 deletions(-)

--- migrat.orig/mm/internal.h	2015-09-12 18:30:20.841039780 -0700
+++ migrat/mm/internal.h	2015-10-18 17:53:11.061322013 -0700
@@ -271,20 +271,19 @@ extern unsigned int munlock_vma_page(str
 extern void clear_page_mlock(struct page *page);
 
 /*
- * mlock_migrate_page - called only from migrate_page_copy() to
- * migrate the Mlocked page flag; update statistics.
+ * mlock_migrate_page - called only from migrate_misplaced_transhuge_page()
+ * (because that does not go through the full procedure of migration ptes):
+ * to migrate the Mlocked page flag; update statistics.
  */
 static inline void mlock_migrate_page(struct page *newpage, struct page *page)
 {
 	if (TestClearPageMlocked(page)) {
-		unsigned long flags;
 		int nr_pages = hpage_nr_pages(page);
 
-		local_irq_save(flags);
+		/* Holding pmd lock, no change in irq context: __mod is safe */
 		__mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
 		SetPageMlocked(newpage);
 		__mod_zone_page_state(page_zone(newpage), NR_MLOCK, nr_pages);
-		local_irq_restore(flags);
 	}
 }
 
--- migrat.orig/mm/migrate.c	2015-10-04 10:47:52.469445854 -0700
+++ migrat/mm/migrate.c	2015-10-18 17:53:11.062322014 -0700
@@ -171,6 +171,9 @@ static int remove_migration_pte(struct p
 	else
 		page_add_file_rmap(new);
 
+	if (vma->vm_flags & VM_LOCKED)
+		mlock_vma_page(new);
+
 	/* No need to invalidate - it was non-present before */
 	update_mmu_cache(vma, addr, ptep);
 unlock:
@@ -537,7 +540,6 @@ void migrate_page_copy(struct page *newp
 	cpupid = page_cpupid_xchg_last(page, -1);
 	page_cpupid_xchg_last(newpage, cpupid);
 
-	mlock_migrate_page(newpage, page);
 	ksm_migrate_page(newpage, page);
 	/*
 	 * Please do not reorder this without considering how mm/ksm.c's
@@ -1786,7 +1788,6 @@ fail_putback:
 			SetPageActive(page);
 		if (TestClearPageUnevictable(new_page))
 			SetPageUnevictable(page);
-		mlock_migrate_page(page, new_page);
 
 		unlock_page(new_page);
 		put_page(new_page);		/* Free it */
@@ -1828,6 +1829,7 @@ fail_putback:
 		goto fail_putback;
 	}
 
+	mlock_migrate_page(new_page, page);
 	mem_cgroup_migrate(page, new_page, false);
 
 	page_remove_rmap(page);


* [PATCH 4/12] mm: rename mem_cgroup_migrate to mem_cgroup_replace_page
  2015-10-19  4:44 [PATCH 0/12] mm: page migration cleanups, and a little mlock Hugh Dickins
                   ` (2 preceding siblings ...)
  2015-10-19  4:52 ` [PATCH 3/12] mm: page migration fix PageMlocked on migrated pages Hugh Dickins
@ 2015-10-19  4:54 ` Hugh Dickins
  2015-10-19 12:35   ` Johannes Weiner
  2015-12-02  9:33   ` [PATCH] mm: fix kerneldoc on mem_cgroup_replace_page Hugh Dickins
  2015-10-19  4:55 ` [PATCH 5/12] mm: correct a couple of page migration comments Hugh Dickins
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 39+ messages in thread
From: Hugh Dickins @ 2015-10-19  4:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Johannes Weiner, Michal Hocko, Greg Thelen, linux-mm

After v4.3's commit 0610c25daa3e ("memcg: fix dirty page migration")
mem_cgroup_migrate() doesn't have much to offer in page migration:
convert migrate_misplaced_transhuge_page() to set_page_memcg() instead.

Then rename mem_cgroup_migrate() to mem_cgroup_replace_page(), since its
remaining callers are replace_page_cache_page() and shmem_replace_page():
both of which passed lrucare true, so just eliminate that argument.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 include/linux/memcontrol.h |    7 ++-----
 mm/filemap.c               |    2 +-
 mm/memcontrol.c            |   29 ++++++++---------------------
 mm/migrate.c               |    5 ++---
 mm/shmem.c                 |    2 +-
 5 files changed, 14 insertions(+), 31 deletions(-)

--- migrat.orig/include/linux/memcontrol.h	2015-10-04 10:47:52.429445705 -0700
+++ migrat/include/linux/memcontrol.h	2015-10-18 17:53:14.323325727 -0700
@@ -297,8 +297,7 @@ void mem_cgroup_cancel_charge(struct pag
 void mem_cgroup_uncharge(struct page *page);
 void mem_cgroup_uncharge_list(struct list_head *page_list);
 
-void mem_cgroup_migrate(struct page *oldpage, struct page *newpage,
-			bool lrucare);
+void mem_cgroup_replace_page(struct page *oldpage, struct page *newpage);
 
 struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *);
 struct lruvec *mem_cgroup_page_lruvec(struct page *, struct zone *);
@@ -535,9 +534,7 @@ static inline void mem_cgroup_uncharge_l
 {
 }
 
-static inline void mem_cgroup_migrate(struct page *oldpage,
-				      struct page *newpage,
-				      bool lrucare)
+static inline void mem_cgroup_replace_page(struct page *old, struct page *new)
 {
 }
 
--- migrat.orig/mm/filemap.c	2015-10-11 14:55:52.800038322 -0700
+++ migrat/mm/filemap.c	2015-10-18 17:53:14.324325728 -0700
@@ -510,7 +510,7 @@ int replace_page_cache_page(struct page
 			__inc_zone_page_state(new, NR_SHMEM);
 		spin_unlock_irqrestore(&mapping->tree_lock, flags);
 		mem_cgroup_end_page_stat(memcg);
-		mem_cgroup_migrate(old, new, true);
+		mem_cgroup_replace_page(old, new);
 		radix_tree_preload_end();
 		if (freepage)
 			freepage(old);
--- migrat.orig/mm/memcontrol.c	2015-10-18 17:37:59.336284038 -0700
+++ migrat/mm/memcontrol.c	2015-10-18 17:53:14.326325730 -0700
@@ -4578,9 +4578,8 @@ static int mem_cgroup_move_account(struc
 		goto out;
 
 	/*
-	 * Prevent mem_cgroup_migrate() from looking at page->mem_cgroup
-	 * of its source page while we change it: page migration takes
-	 * both pages off the LRU, but page cache replacement doesn't.
+	 * Prevent mem_cgroup_replace_page() from looking at
+	 * page->mem_cgroup of its source page while we change it.
 	 */
 	if (!trylock_page(page))
 		goto out;
@@ -5547,7 +5546,7 @@ void mem_cgroup_uncharge_list(struct lis
 }
 
 /**
- * mem_cgroup_migrate - migrate a charge to another page
+ * mem_cgroup_replace_page - migrate a charge to another page
  * @oldpage: currently charged page
  * @newpage: page to transfer the charge to
  * @lrucare: either or both pages might be on the LRU already
@@ -5556,16 +5555,13 @@ void mem_cgroup_uncharge_list(struct lis
  *
  * Both pages must be locked, @newpage->mapping must be set up.
  */
-void mem_cgroup_migrate(struct page *oldpage, struct page *newpage,
-			bool lrucare)
+void mem_cgroup_replace_page(struct page *oldpage, struct page *newpage)
 {
 	struct mem_cgroup *memcg;
 	int isolated;
 
 	VM_BUG_ON_PAGE(!PageLocked(oldpage), oldpage);
 	VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
-	VM_BUG_ON_PAGE(!lrucare && PageLRU(oldpage), oldpage);
-	VM_BUG_ON_PAGE(!lrucare && PageLRU(newpage), newpage);
 	VM_BUG_ON_PAGE(PageAnon(oldpage) != PageAnon(newpage), newpage);
 	VM_BUG_ON_PAGE(PageTransHuge(oldpage) != PageTransHuge(newpage),
 		       newpage);
@@ -5577,25 +5573,16 @@ void mem_cgroup_migrate(struct page *old
 	if (newpage->mem_cgroup)
 		return;
 
-	/*
-	 * Swapcache readahead pages can get migrated before being
-	 * charged, and migration from compaction can happen to an
-	 * uncharged page when the PFN walker finds a page that
-	 * reclaim just put back on the LRU but has not released yet.
-	 */
+	/* Swapcache readahead pages can get replaced before being charged */
 	memcg = oldpage->mem_cgroup;
 	if (!memcg)
 		return;
 
-	if (lrucare)
-		lock_page_lru(oldpage, &isolated);
-
+	lock_page_lru(oldpage, &isolated);
 	oldpage->mem_cgroup = NULL;
+	unlock_page_lru(oldpage, isolated);
 
-	if (lrucare)
-		unlock_page_lru(oldpage, isolated);
-
-	commit_charge(newpage, memcg, lrucare);
+	commit_charge(newpage, memcg, true);
 }
 
 /*
--- migrat.orig/mm/migrate.c	2015-10-18 17:53:11.062322014 -0700
+++ migrat/mm/migrate.c	2015-10-18 17:53:14.326325730 -0700
@@ -30,7 +30,6 @@
 #include <linux/mempolicy.h>
 #include <linux/vmalloc.h>
 #include <linux/security.h>
-#include <linux/memcontrol.h>
 #include <linux/syscalls.h>
 #include <linux/hugetlb.h>
 #include <linux/hugetlb_cgroup.h>
@@ -1830,8 +1829,8 @@ fail_putback:
 	}
 
 	mlock_migrate_page(new_page, page);
-	mem_cgroup_migrate(page, new_page, false);
-
+	set_page_memcg(new_page, page_memcg(page));
+	set_page_memcg(page, NULL);
 	page_remove_rmap(page);
 
 	spin_unlock(ptl);
--- migrat.orig/mm/shmem.c	2015-09-12 18:30:20.857039763 -0700
+++ migrat/mm/shmem.c	2015-10-18 17:53:14.327325731 -0700
@@ -1023,7 +1023,7 @@ static int shmem_replace_page(struct pag
 		 */
 		oldpage = newpage;
 	} else {
-		mem_cgroup_migrate(oldpage, newpage, true);
+		mem_cgroup_replace_page(oldpage, newpage);
 		lru_cache_add_anon(newpage);
 		*pagep = newpage;
 	}


* [PATCH 5/12] mm: correct a couple of page migration comments
  2015-10-19  4:44 [PATCH 0/12] mm: page migration cleanups, and a little mlock Hugh Dickins
                   ` (3 preceding siblings ...)
  2015-10-19  4:54 ` [PATCH 4/12] mm: rename mem_cgroup_migrate to mem_cgroup_replace_page Hugh Dickins
@ 2015-10-19  4:55 ` Hugh Dickins
  2015-10-21 17:53   ` Rafael Aquini
  2015-10-19  4:57 ` [PATCH 6/12] mm: page migration use the put_new_page whenever necessary Hugh Dickins
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Hugh Dickins @ 2015-10-19  4:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christoph Lameter, Joonsoo Kim, Rafael Aquini, linux-mm

It's migrate.c not migration.c, and nowadays putback_movable_pages()
not putback_lru_pages().

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/migrate.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- migrat.orig/mm/migrate.c	2015-10-18 17:53:14.326325730 -0700
+++ migrat/mm/migrate.c	2015-10-18 17:53:17.579329434 -0700
@@ -1,5 +1,5 @@
 /*
- * Memory Migration functionality - linux/mm/migration.c
+ * Memory Migration functionality - linux/mm/migrate.c
  *
  * Copyright (C) 2006 Silicon Graphics, Inc., Christoph Lameter
  *
@@ -1113,7 +1113,7 @@ out:
  *
  * The function returns after 10 attempts or if no pages are movable any more
  * because the list has become empty or no retryable pages exist any more.
- * The caller should call putback_lru_pages() to return pages to the LRU
+ * The caller should call putback_movable_pages() to return pages to the LRU
  * or free list only if ret != 0.
  *
  * Returns the number of pages that were not migrated, or an error code.


* [PATCH 6/12] mm: page migration use the put_new_page whenever necessary
  2015-10-19  4:44 [PATCH 0/12] mm: page migration cleanups, and a little mlock Hugh Dickins
                   ` (4 preceding siblings ...)
  2015-10-19  4:55 ` [PATCH 5/12] mm: correct a couple of page migration comments Hugh Dickins
@ 2015-10-19  4:57 ` Hugh Dickins
  2015-11-05 18:31   ` Vlastimil Babka
  2015-10-19  4:59 ` [PATCH 7/12] mm: page migration trylock newpage at same level as oldpage Hugh Dickins
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Hugh Dickins @ 2015-10-19  4:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christoph Lameter, David Rientjes, linux-mm

I don't know of any problem from the way it's used in our current tree,
but there is one defect in page migration's custom put_new_page feature.

An unused newpage is expected to be released with the put_new_page(),
but there was one MIGRATEPAGE_SUCCESS (0) path which released it with
putback_lru_page(), which can be very wrong for a custom pool.

Fixed more easily by resetting put_new_page once it won't be needed,
than by adding a further flag to modify the rc test.
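
That is (a minimal sketch of the pattern only; the diff below changes
the real unmap_and_move() and unmap_and_move_huge_page()): forget the
custom callback as soon as newpage is actually used, so that the single
test on exit simply means "newpage was never used, give it back":

        rc = __unmap_and_move(page, newpage, force, mode);
        if (rc == MIGRATEPAGE_SUCCESS)
                put_new_page = NULL;    /* newpage is in use now */
        ...
        if (put_new_page)               /* unused newpage: back to its pool */
                put_new_page(newpage, private);
        else                            /* default allocation, or page in use */
                putback_lru_page(newpage);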

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/migrate.c |   19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

--- migrat.orig/mm/migrate.c	2015-10-18 17:53:17.579329434 -0700
+++ migrat/mm/migrate.c	2015-10-18 17:53:20.159332371 -0700
@@ -938,10 +938,11 @@ static ICE_noinline int unmap_and_move(n
 				   int force, enum migrate_mode mode,
 				   enum migrate_reason reason)
 {
-	int rc = 0;
+	int rc = MIGRATEPAGE_SUCCESS;
 	int *result = NULL;
-	struct page *newpage = get_new_page(page, private, &result);
+	struct page *newpage;
 
+	newpage = get_new_page(page, private, &result);
 	if (!newpage)
 		return -ENOMEM;
 
@@ -955,6 +956,8 @@ static ICE_noinline int unmap_and_move(n
 			goto out;
 
 	rc = __unmap_and_move(page, newpage, force, mode);
+	if (rc == MIGRATEPAGE_SUCCESS)
+		put_new_page = NULL;
 
 out:
 	if (rc != -EAGAIN) {
@@ -981,7 +984,7 @@ out:
 	 * it.  Otherwise, putback_lru_page() will drop the reference grabbed
 	 * during isolation.
 	 */
-	if (rc != MIGRATEPAGE_SUCCESS && put_new_page) {
+	if (put_new_page) {
 		ClearPageSwapBacked(newpage);
 		put_new_page(newpage, private);
 	} else if (unlikely(__is_movable_balloon_page(newpage))) {
@@ -1022,7 +1025,7 @@ static int unmap_and_move_huge_page(new_
 				struct page *hpage, int force,
 				enum migrate_mode mode)
 {
-	int rc = 0;
+	int rc = -EAGAIN;
 	int *result = NULL;
 	int page_was_mapped = 0;
 	struct page *new_hpage;
@@ -1044,8 +1047,6 @@ static int unmap_and_move_huge_page(new_
 	if (!new_hpage)
 		return -ENOMEM;
 
-	rc = -EAGAIN;
-
 	if (!trylock_page(hpage)) {
 		if (!force || mode != MIGRATE_SYNC)
 			goto out;
@@ -1070,8 +1071,10 @@ static int unmap_and_move_huge_page(new_
 	if (anon_vma)
 		put_anon_vma(anon_vma);
 
-	if (rc == MIGRATEPAGE_SUCCESS)
+	if (rc == MIGRATEPAGE_SUCCESS) {
 		hugetlb_cgroup_migrate(hpage, new_hpage);
+		put_new_page = NULL;
+	}
 
 	unlock_page(hpage);
 out:
@@ -1083,7 +1086,7 @@ out:
 	 * it.  Otherwise, put_page() will drop the reference grabbed during
 	 * isolation.
 	 */
-	if (rc != MIGRATEPAGE_SUCCESS && put_new_page)
+	if (put_new_page)
 		put_new_page(new_hpage, private);
 	else
 		putback_active_hugepage(new_hpage);


* [PATCH 7/12] mm: page migration trylock newpage at same level as oldpage
  2015-10-19  4:44 [PATCH 0/12] mm: page migration cleanups, and a little mlock Hugh Dickins
                   ` (5 preceding siblings ...)
  2015-10-19  4:57 ` [PATCH 6/12] mm: page migration use the put_new_page whenever necessary Hugh Dickins
@ 2015-10-19  4:59 ` Hugh Dickins
  2015-10-21 17:54   ` Rafael Aquini
  2015-10-19  5:01 ` [PATCH 8/12] mm: page migration remove_migration_ptes at lock+unlock level Hugh Dickins
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Hugh Dickins @ 2015-10-19  4:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Rafael Aquini, Konstantin Khlebnikov,
	Naoya Horiguchi, linux-mm

Clean up page migration a little by moving the trylock of newpage from
move_to_new_page() into __unmap_and_move(), where the old page has been
locked.  Adjust unmap_and_move_huge_page() and balloon_page_migrate()
accordingly.

But make one kind-of-functional change on the way: whereas trylock of
newpage used to BUG() if it failed, now simply return -EAGAIN if so.
Cutting out BUG()s is good, right?  But, to be honest, this is really
to extend the usefulness of the custom put_new_page feature, allowing
a pool of new pages to be shared perhaps with racing uses.

Use an "else" instead of that "skip_unmap" label.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/balloon_compaction.c |   10 +-------
 mm/migrate.c            |   46 +++++++++++++++++++++-----------------
 2 files changed, 28 insertions(+), 28 deletions(-)

--- migrat.orig/mm/balloon_compaction.c	2014-12-07 14:21:05.000000000 -0800
+++ migrat/mm/balloon_compaction.c	2015-10-18 17:53:22.486335020 -0700
@@ -199,23 +199,17 @@ int balloon_page_migrate(struct page *ne
 	struct balloon_dev_info *balloon = balloon_page_device(page);
 	int rc = -EAGAIN;
 
-	/*
-	 * Block others from accessing the 'newpage' when we get around to
-	 * establishing additional references. We should be the only one
-	 * holding a reference to the 'newpage' at this point.
-	 */
-	BUG_ON(!trylock_page(newpage));
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
 
 	if (WARN_ON(!__is_movable_balloon_page(page))) {
 		dump_page(page, "not movable balloon page");
-		unlock_page(newpage);
 		return rc;
 	}
 
 	if (balloon && balloon->migratepage)
 		rc = balloon->migratepage(balloon, newpage, page, mode);
 
-	unlock_page(newpage);
 	return rc;
 }
 #endif /* CONFIG_BALLOON_COMPACTION */
--- migrat.orig/mm/migrate.c	2015-10-18 17:53:20.159332371 -0700
+++ migrat/mm/migrate.c	2015-10-18 17:53:22.487335021 -0700
@@ -727,13 +727,8 @@ static int move_to_new_page(struct page
 	struct address_space *mapping;
 	int rc;
 
-	/*
-	 * Block others from accessing the page when we get around to
-	 * establishing additional references. We are the only one
-	 * holding a reference to the new page at this point.
-	 */
-	if (!trylock_page(newpage))
-		BUG();
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
 
 	/* Prepare mapping for the new page.*/
 	newpage->index = page->index;
@@ -774,9 +769,6 @@ static int move_to_new_page(struct page
 			remove_migration_ptes(page, newpage);
 		page->mapping = NULL;
 	}
-
-	unlock_page(newpage);
-
 	return rc;
 }
 
@@ -861,6 +853,17 @@ static int __unmap_and_move(struct page
 		}
 	}
 
+	/*
+	 * Block others from accessing the new page when we get around to
+	 * establishing additional references. We are usually the only one
+	 * holding a reference to newpage at this point. We used to have a BUG
+	 * here if trylock_page(newpage) fails, but would like to allow for
+	 * cases where there might be a race with the previous use of newpage.
+	 * This is much like races on refcount of oldpage: just don't BUG().
+	 */
+	if (unlikely(!trylock_page(newpage)))
+		goto out_unlock;
+
 	if (unlikely(isolated_balloon_page(page))) {
 		/*
 		 * A ballooned page does not need any special attention from
@@ -870,7 +873,7 @@ static int __unmap_and_move(struct page
 		 * the page migration right away (proteced by page lock).
 		 */
 		rc = balloon_page_migrate(newpage, page, mode);
-		goto out_unlock;
+		goto out_unlock_both;
 	}
 
 	/*
@@ -889,30 +892,27 @@ static int __unmap_and_move(struct page
 		VM_BUG_ON_PAGE(PageAnon(page), page);
 		if (page_has_private(page)) {
 			try_to_free_buffers(page);
-			goto out_unlock;
+			goto out_unlock_both;
 		}
-		goto skip_unmap;
-	}
-
-	/* Establish migration ptes or remove ptes */
-	if (page_mapped(page)) {
+	} else if (page_mapped(page)) {
+		/* Establish migration ptes */
 		try_to_unmap(page,
 			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
 		page_was_mapped = 1;
 	}
 
-skip_unmap:
 	if (!page_mapped(page))
 		rc = move_to_new_page(newpage, page, page_was_mapped, mode);
 
 	if (rc && page_was_mapped)
 		remove_migration_ptes(page, page);
 
+out_unlock_both:
+	unlock_page(newpage);
+out_unlock:
 	/* Drop an anon_vma reference if we took one */
 	if (anon_vma)
 		put_anon_vma(anon_vma);
-
-out_unlock:
 	unlock_page(page);
 out:
 	return rc;
@@ -1056,6 +1056,9 @@ static int unmap_and_move_huge_page(new_
 	if (PageAnon(hpage))
 		anon_vma = page_get_anon_vma(hpage);
 
+	if (unlikely(!trylock_page(new_hpage)))
+		goto put_anon;
+
 	if (page_mapped(hpage)) {
 		try_to_unmap(hpage,
 			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
@@ -1068,6 +1071,9 @@ static int unmap_and_move_huge_page(new_
 	if (rc != MIGRATEPAGE_SUCCESS && page_was_mapped)
 		remove_migration_ptes(hpage, hpage);
 
+	unlock_page(new_hpage);
+
+put_anon:
 	if (anon_vma)
 		put_anon_vma(anon_vma);
 


* [PATCH 8/12] mm: page migration remove_migration_ptes at lock+unlock level
  2015-10-19  4:44 [PATCH 0/12] mm: page migration cleanups, and a little mlock Hugh Dickins
                   ` (6 preceding siblings ...)
  2015-10-19  4:59 ` [PATCH 7/12] mm: page migration trylock newpage at same level as oldpage Hugh Dickins
@ 2015-10-19  5:01 ` Hugh Dickins
  2015-10-19  5:03 ` [PATCH 9/12] mm: simplify page migration's anon_vma comment and flow Hugh Dickins
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 39+ messages in thread
From: Hugh Dickins @ 2015-10-19  5:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Konstantin Khlebnikov, Naoya Horiguchi, linux-mm

Clean up page migration a little more by calling remove_migration_ptes()
from the same level, on success or on failure, from __unmap_and_move()
or from unmap_and_move_huge_page().

Don't reset page->mapping of a PageAnon old page in move_to_new_page(),
leave that to when the page is freed.  Except for here in page migration,
it has been an invariant that a PageAnon (bit set in page->mapping) page
stays PageAnon until it is freed, and I think we're safer to keep to that.

And with the above rearrangement, it's necessary because zap_pte_range()
wants to identify whether a migration entry represents a file or an anon
page, to update the appropriate rss stats without waiting on it.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/migrate.c |   34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

--- migrat.orig/mm/migrate.c	2015-10-18 17:53:22.487335021 -0700
+++ migrat/mm/migrate.c	2015-10-18 17:53:24.858337721 -0700
@@ -722,7 +722,7 @@ static int fallback_migrate_page(struct
  *  MIGRATEPAGE_SUCCESS - success
  */
 static int move_to_new_page(struct page *newpage, struct page *page,
-				int page_was_mapped, enum migrate_mode mode)
+				enum migrate_mode mode)
 {
 	struct address_space *mapping;
 	int rc;
@@ -755,19 +755,21 @@ static int move_to_new_page(struct page
 		 * space which also has its own migratepage callback. This
 		 * is the most common path for page migration.
 		 */
-		rc = mapping->a_ops->migratepage(mapping,
-						newpage, page, mode);
+		rc = mapping->a_ops->migratepage(mapping, newpage, page, mode);
 	else
 		rc = fallback_migrate_page(mapping, newpage, page, mode);
 
-	if (rc != MIGRATEPAGE_SUCCESS) {
+	/*
+	 * When successful, old pagecache page->mapping must be cleared before
+	 * page is freed; but stats require that PageAnon be left as PageAnon.
+	 */
+	if (rc == MIGRATEPAGE_SUCCESS) {
+		set_page_memcg(page, NULL);
+		if (!PageAnon(page))
+			page->mapping = NULL;
+	} else {
 		set_page_memcg(newpage, NULL);
 		newpage->mapping = NULL;
-	} else {
-		set_page_memcg(page, NULL);
-		if (page_was_mapped)
-			remove_migration_ptes(page, newpage);
-		page->mapping = NULL;
 	}
 	return rc;
 }
@@ -902,10 +904,11 @@ static int __unmap_and_move(struct page
 	}
 
 	if (!page_mapped(page))
-		rc = move_to_new_page(newpage, page, page_was_mapped, mode);
+		rc = move_to_new_page(newpage, page, mode);
 
-	if (rc && page_was_mapped)
-		remove_migration_ptes(page, page);
+	if (page_was_mapped)
+		remove_migration_ptes(page,
+			rc == MIGRATEPAGE_SUCCESS ? newpage : page);
 
 out_unlock_both:
 	unlock_page(newpage);
@@ -1066,10 +1069,11 @@ static int unmap_and_move_huge_page(new_
 	}
 
 	if (!page_mapped(hpage))
-		rc = move_to_new_page(new_hpage, hpage, page_was_mapped, mode);
+		rc = move_to_new_page(new_hpage, hpage, mode);
 
-	if (rc != MIGRATEPAGE_SUCCESS && page_was_mapped)
-		remove_migration_ptes(hpage, hpage);
+	if (page_was_mapped)
+		remove_migration_ptes(hpage,
+			rc == MIGRATEPAGE_SUCCESS ? new_hpage : hpage);
 
 	unlock_page(new_hpage);
 


* [PATCH 9/12] mm: simplify page migration's anon_vma comment and flow
  2015-10-19  4:44 [PATCH 0/12] mm: page migration cleanups, and a little mlock Hugh Dickins
                   ` (7 preceding siblings ...)
  2015-10-19  5:01 ` [PATCH 8/12] mm: page migration remove_migration_ptes at lock+unlock level Hugh Dickins
@ 2015-10-19  5:03 ` Hugh Dickins
  2015-10-19  5:05 ` [PATCH 10/12] mm: page migration use migration entry for swapcache too Hugh Dickins
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 39+ messages in thread
From: Hugh Dickins @ 2015-10-19  5:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christoph Lameter, Mel Gorman, Peter Zijlstra, linux-mm

__unmap_and_move() contains a long stale comment on page_get_anon_vma()
and PageSwapCache(), with an odd control flow that's hard to follow.
Mostly this reflects our confusion about the lifetime of an anon_vma,
in the early days of page migration, before we could take a reference
to one.  Nowadays this seems quite straightforward: cut it all down to
essentials.

I cannot see the relevance of swapcache here at all, so don't treat it
any differently: I believe the old comment reflects in part our anon_vma
confusions, and in part the original v2.6.16 page migration technique,
which used actual swap to migrate anon instead of swap-like migration
entries.  Why should a swapcache page not be migrated with the aid of
migration entry ptes like everything else?  So lose that comment now,
and enable migration entries for swapcache in the next patch.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/migrate.c |   36 ++++++++++--------------------------
 1 file changed, 10 insertions(+), 26 deletions(-)

--- migrat.orig/mm/migrate.c	2015-10-18 17:53:24.858337721 -0700
+++ migrat/mm/migrate.c	2015-10-18 17:53:27.120340297 -0700
@@ -819,6 +819,7 @@ static int __unmap_and_move(struct page
 			goto out_unlock;
 		wait_on_page_writeback(page);
 	}
+
 	/*
 	 * By try_to_unmap(), page->mapcount goes down to 0 here. In this case,
 	 * we cannot notice that anon_vma is freed while we migrates a page.
@@ -826,34 +827,15 @@ static int __unmap_and_move(struct page
 	 * of migration. File cache pages are no problem because of page_lock()
 	 * File Caches may use write_page() or lock_page() in migration, then,
 	 * just care Anon page here.
+	 *
+	 * Only page_get_anon_vma() understands the subtleties of
+	 * getting a hold on an anon_vma from outside one of its mms.
+	 * But if we cannot get anon_vma, then we won't need it anyway,
+	 * because that implies that the anon page is no longer mapped
+	 * (and cannot be remapped so long as we hold the page lock).
 	 */
-	if (PageAnon(page) && !PageKsm(page)) {
-		/*
-		 * Only page_lock_anon_vma_read() understands the subtleties of
-		 * getting a hold on an anon_vma from outside one of its mms.
-		 */
+	if (PageAnon(page) && !PageKsm(page))
 		anon_vma = page_get_anon_vma(page);
-		if (anon_vma) {
-			/*
-			 * Anon page
-			 */
-		} else if (PageSwapCache(page)) {
-			/*
-			 * We cannot be sure that the anon_vma of an unmapped
-			 * swapcache page is safe to use because we don't
-			 * know in advance if the VMA that this page belonged
-			 * to still exists. If the VMA and others sharing the
-			 * data have been freed, then the anon_vma could
-			 * already be invalid.
-			 *
-			 * To avoid this possibility, swapcache pages get
-			 * migrated but are not remapped when migration
-			 * completes
-			 */
-		} else {
-			goto out_unlock;
-		}
-	}
 
 	/*
 	 * Block others from accessing the new page when we get around to
@@ -898,6 +880,8 @@ static int __unmap_and_move(struct page
 		}
 	} else if (page_mapped(page)) {
 		/* Establish migration ptes */
+		VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma,
+				page);
 		try_to_unmap(page,
 			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
 		page_was_mapped = 1;


* [PATCH 10/12] mm: page migration use migration entry for swapcache too
  2015-10-19  4:44 [PATCH 0/12] mm: page migration cleanups, and a little mlock Hugh Dickins
                   ` (8 preceding siblings ...)
  2015-10-19  5:03 ` [PATCH 9/12] mm: simplify page migration's anon_vma comment and flow Hugh Dickins
@ 2015-10-19  5:05 ` Hugh Dickins
  2015-10-22 22:35   ` Cyrill Gorcunov
  2015-10-19  5:07 ` [PATCH 11/12] mm: page migration avoid touching newpage until no going back Hugh Dickins
  2015-10-19  5:11 ` [PATCH 12/12] mm: migrate dirty page without clear_page_dirty_for_io etc Hugh Dickins
  11 siblings, 1 reply; 39+ messages in thread
From: Hugh Dickins @ 2015-10-19  5:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Konstantin Khlebnikov, Mel Gorman,
	Minchan Kim, Cyrill Gorcunov, Pavel Emelyanov, linux-mm

Hitherto page migration has avoided using a migration entry for a
swapcache page mapped into userspace, apparently for historical reasons.
So any page blessed with swapcache would entail a minor fault when it's
next touched, which page migration otherwise tries to avoid.  Swapcache
in an mlocked area is rare, so won't often matter, but still better fixed.

Just rearrange the block in try_to_unmap_one(), to handle TTU_MIGRATION
before checking PageAnon, that's all (apart from some reindenting).

Well, no, that's not quite all: doesn't this by the way fix a soft_dirty
bug, that page migration of a file page was forgetting to transfer the
soft_dirty bit?  Probably not a serious bug: if I understand correctly,
soft_dirty afficionados usually have to handle file pages separately
anyway; but we publish the bit in /proc/<pid>/pagemap on file mappings
as well as anonymous, so page migration ought not to perturb it.
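
Leaving aside the earlier special cases, the resulting order of the
main cases in try_to_unmap_one() is roughly (a structural sketch only;
the diff below has the real code):

        if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION)) {
                /* migration, file or anon, swapcache or not: install a
                 * migration entry, carrying over the soft_dirty bit */
                entry = make_migration_entry(page, pte_write(pteval));
                swp_pte = swp_entry_to_pte(entry);
                if (pte_soft_dirty(pteval))
                        swp_pte = pte_swp_mksoft_dirty(swp_pte);
                set_pte_at(mm, address, pte, swp_pte);
        } else if (PageAnon(page)) {
                /* reclaim of an anon page, by now in swapcache:
                 * install its swap entry, also carrying soft_dirty */
        } else {
                /* reclaim of a file page: just drop the pte */
                dec_mm_counter(mm, MM_FILEPAGES);
        }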

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/rmap.c |   63 ++++++++++++++++++++++++----------------------------
 1 file changed, 30 insertions(+), 33 deletions(-)

--- migrat.orig/mm/rmap.c	2015-10-18 17:53:07.099317502 -0700
+++ migrat/mm/rmap.c	2015-10-18 17:53:29.244342714 -0700
@@ -1377,47 +1377,44 @@ static int try_to_unmap_one(struct page
 			dec_mm_counter(mm, MM_ANONPAGES);
 		else
 			dec_mm_counter(mm, MM_FILEPAGES);
+	} else if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION)) {
+		swp_entry_t entry;
+		pte_t swp_pte;
+		/*
+		 * Store the pfn of the page in a special migration
+		 * pte. do_swap_page() will wait until the migration
+		 * pte is removed and then restart fault handling.
+		 */
+		entry = make_migration_entry(page, pte_write(pteval));
+		swp_pte = swp_entry_to_pte(entry);
+		if (pte_soft_dirty(pteval))
+			swp_pte = pte_swp_mksoft_dirty(swp_pte);
+		set_pte_at(mm, address, pte, swp_pte);
 	} else if (PageAnon(page)) {
 		swp_entry_t entry = { .val = page_private(page) };
 		pte_t swp_pte;
-
-		if (PageSwapCache(page)) {
-			/*
-			 * Store the swap location in the pte.
-			 * See handle_pte_fault() ...
-			 */
-			if (swap_duplicate(entry) < 0) {
-				set_pte_at(mm, address, pte, pteval);
-				ret = SWAP_FAIL;
-				goto out_unmap;
-			}
-			if (list_empty(&mm->mmlist)) {
-				spin_lock(&mmlist_lock);
-				if (list_empty(&mm->mmlist))
-					list_add(&mm->mmlist, &init_mm.mmlist);
-				spin_unlock(&mmlist_lock);
-			}
-			dec_mm_counter(mm, MM_ANONPAGES);
-			inc_mm_counter(mm, MM_SWAPENTS);
-		} else if (IS_ENABLED(CONFIG_MIGRATION)) {
-			/*
-			 * Store the pfn of the page in a special migration
-			 * pte. do_swap_page() will wait until the migration
-			 * pte is removed and then restart fault handling.
-			 */
-			BUG_ON(!(flags & TTU_MIGRATION));
-			entry = make_migration_entry(page, pte_write(pteval));
+		/*
+		 * Store the swap location in the pte.
+		 * See handle_pte_fault() ...
+		 */
+		VM_BUG_ON_PAGE(!PageSwapCache(page), page);
+		if (swap_duplicate(entry) < 0) {
+			set_pte_at(mm, address, pte, pteval);
+			ret = SWAP_FAIL;
+			goto out_unmap;
 		}
+		if (list_empty(&mm->mmlist)) {
+			spin_lock(&mmlist_lock);
+			if (list_empty(&mm->mmlist))
+				list_add(&mm->mmlist, &init_mm.mmlist);
+			spin_unlock(&mmlist_lock);
+		}
+		dec_mm_counter(mm, MM_ANONPAGES);
+		inc_mm_counter(mm, MM_SWAPENTS);
 		swp_pte = swp_entry_to_pte(entry);
 		if (pte_soft_dirty(pteval))
 			swp_pte = pte_swp_mksoft_dirty(swp_pte);
 		set_pte_at(mm, address, pte, swp_pte);
-	} else if (IS_ENABLED(CONFIG_MIGRATION) &&
-		   (flags & TTU_MIGRATION)) {
-		/* Establish migration entry for a file page */
-		swp_entry_t entry;
-		entry = make_migration_entry(page, pte_write(pteval));
-		set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
 	} else
 		dec_mm_counter(mm, MM_FILEPAGES);
 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 11/12] mm: page migration avoid touching newpage until no going back
  2015-10-19  4:44 [PATCH 0/12] mm: page migration cleanups, and a little mlock Hugh Dickins
                   ` (9 preceding siblings ...)
  2015-10-19  5:05 ` [PATCH 10/12] mm: page migration use migration entry for swapcache too Hugh Dickins
@ 2015-10-19  5:07 ` Hugh Dickins
  2015-10-19  5:11 ` [PATCH 12/12] mm: migrate dirty page without clear_page_dirty_for_io etc Hugh Dickins
  11 siblings, 0 replies; 39+ messages in thread
From: Hugh Dickins @ 2015-10-19  5:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Johannes Weiner, Naoya Horiguchi, linux-mm

We have had trouble in the past from the way in which page migration's
newpage is initialized in dribs and drabs - see commit 8bdd63809160
("mm: fix direct reclaim writeback regression") which proposed a cleanup.

We have no actual problem now, but I think the procedure would be clearer
(and alternative get_new_page pools safer to implement) if we assert that
newpage is not touched until we are sure that it's going to be used -
except for taking the trylock on it in __unmap_and_move().

So shift the early initializations from move_to_new_page() into
migrate_page_move_mapping(), in both its mapping and NULL-mapping paths.
Similarly
migrate_huge_page_move_mapping(), but its NULL-mapping path can just be
deleted: you cannot reach hugetlbfs_migrate_page() with a NULL mapping.

Adjust stages 3 to 8 in the Documentation file accordingly.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 Documentation/vm/page_migration |   19 +++++------
 mm/migrate.c                    |   49 ++++++++++++------------------
 2 files changed, 30 insertions(+), 38 deletions(-)

--- migrat.orig/Documentation/vm/page_migration	2015-10-18 17:53:03.715313650 -0700
+++ migrat/Documentation/vm/page_migration	2015-10-18 17:53:31.871345704 -0700
@@ -92,27 +92,26 @@ Steps:
 
 2. Insure that writeback is complete.
 
-3. Prep the new page that we want to move to. It is locked
-   and set to not being uptodate so that all accesses to the new
-   page immediately lock while the move is in progress.
+3. Lock the new page that we want to move to. It is locked so that accesses to
+   this (not yet uptodate) page immediately lock while the move is in progress.
 
-4. The new page is prepped with some settings from the old page so that
-   accesses to the new page will discover a page with the correct settings.
-
-5. All the page table references to the page are converted to migration
+4. All the page table references to the page are converted to migration
    entries. This decreases the mapcount of a page. If the resulting
    mapcount is not zero then we do not migrate the page. All user space
    processes that attempt to access the page will now wait on the page lock.
 
-6. The radix tree lock is taken. This will cause all processes trying
+5. The radix tree lock is taken. This will cause all processes trying
    to access the page via the mapping to block on the radix tree spinlock.
 
-7. The refcount of the page is examined and we back out if references remain
+6. The refcount of the page is examined and we back out if references remain
    otherwise we know that we are the only one referencing this page.
 
-8. The radix tree is checked and if it does not contain the pointer to this
+7. The radix tree is checked and if it does not contain the pointer to this
    page then we back out because someone else modified the radix tree.
 
+8. The new page is prepped with some settings from the old page so that
+   accesses to the new page will discover a page with the correct settings.
+
 9. The radix tree is changed to point to the new page.
 
 10. The reference count of the old page is dropped because the radix tree
--- migrat.orig/mm/migrate.c	2015-10-18 17:53:27.120340297 -0700
+++ migrat/mm/migrate.c	2015-10-18 17:53:31.871345704 -0700
@@ -320,6 +320,14 @@ int migrate_page_move_mapping(struct add
 		/* Anonymous page without mapping */
 		if (page_count(page) != expected_count)
 			return -EAGAIN;
+
+		/* No turning back from here */
+		set_page_memcg(newpage, page_memcg(page));
+		newpage->index = page->index;
+		newpage->mapping = page->mapping;
+		if (PageSwapBacked(page))
+			SetPageSwapBacked(newpage);
+
 		return MIGRATEPAGE_SUCCESS;
 	}
 
@@ -355,8 +363,15 @@ int migrate_page_move_mapping(struct add
 	}
 
 	/*
-	 * Now we know that no one else is looking at the page.
+	 * Now we know that no one else is looking at the page:
+	 * no turning back from here.
 	 */
+	set_page_memcg(newpage, page_memcg(page));
+	newpage->index = page->index;
+	newpage->mapping = page->mapping;
+	if (PageSwapBacked(page))
+		SetPageSwapBacked(newpage);
+
 	get_page(newpage);	/* add cache reference */
 	if (PageSwapCache(page)) {
 		SetPageSwapCache(newpage);
@@ -403,12 +418,6 @@ int migrate_huge_page_move_mapping(struc
 	int expected_count;
 	void **pslot;
 
-	if (!mapping) {
-		if (page_count(page) != 1)
-			return -EAGAIN;
-		return MIGRATEPAGE_SUCCESS;
-	}
-
 	spin_lock_irq(&mapping->tree_lock);
 
 	pslot = radix_tree_lookup_slot(&mapping->page_tree,
@@ -426,6 +435,9 @@ int migrate_huge_page_move_mapping(struc
 		return -EAGAIN;
 	}
 
+	set_page_memcg(newpage, page_memcg(page));
+	newpage->index = page->index;
+	newpage->mapping = page->mapping;
 	get_page(newpage);
 
 	radix_tree_replace_slot(pslot, newpage);
@@ -730,21 +742,6 @@ static int move_to_new_page(struct page
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
 
-	/* Prepare mapping for the new page.*/
-	newpage->index = page->index;
-	newpage->mapping = page->mapping;
-	if (PageSwapBacked(page))
-		SetPageSwapBacked(newpage);
-
-	/*
-	 * Indirectly called below, migrate_page_copy() copies PG_dirty and thus
-	 * needs newpage's memcg set to transfer memcg dirty page accounting.
-	 * So perform memcg migration in two steps:
-	 * 1. set newpage->mem_cgroup (here)
-	 * 2. clear page->mem_cgroup (below)
-	 */
-	set_page_memcg(newpage, page_memcg(page));
-
 	mapping = page_mapping(page);
 	if (!mapping)
 		rc = migrate_page(mapping, newpage, page, mode);
@@ -767,9 +764,6 @@ static int move_to_new_page(struct page
 		set_page_memcg(page, NULL);
 		if (!PageAnon(page))
 			page->mapping = NULL;
-	} else {
-		set_page_memcg(newpage, NULL);
-		newpage->mapping = NULL;
 	}
 	return rc;
 }
@@ -971,10 +965,9 @@ out:
 	 * it.  Otherwise, putback_lru_page() will drop the reference grabbed
 	 * during isolation.
 	 */
-	if (put_new_page) {
-		ClearPageSwapBacked(newpage);
+	if (put_new_page)
 		put_new_page(newpage, private);
-	} else if (unlikely(__is_movable_balloon_page(newpage))) {
+	else if (unlikely(__is_movable_balloon_page(newpage))) {
 		/* drop our reference, page already in the balloon */
 		put_page(newpage);
 	} else


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 12/12] mm: migrate dirty page without clear_page_dirty_for_io etc
  2015-10-19  4:44 [PATCH 0/12] mm: page migration cleanups, and a little mlock Hugh Dickins
                   ` (10 preceding siblings ...)
  2015-10-19  5:07 ` [PATCH 11/12] mm: page migration avoid touching newpage until no going back Hugh Dickins
@ 2015-10-19  5:11 ` Hugh Dickins
  11 siblings, 0 replies; 39+ messages in thread
From: Hugh Dickins @ 2015-10-19  5:11 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Johannes Weiner, Michal Hocko, Peter Zijlstra,
	Greg Thelen, linux-mm

clear_page_dirty_for_io() has accumulated writeback and memcg subtleties
since v2.6.16 first introduced page migration; and the set_page_dirty()
which completed its migration of PageDirty later had to be moderated to
__set_page_dirty_nobuffers(); then PageSwapBacked had to skip that too.

No actual problems seen with this procedure recently, but if you look
into what the clear_page_dirty_for_io(page)+set_page_dirty(newpage) is
actually achieving, it turns out to be nothing more than moving the
PageDirty flag, and its NR_FILE_DIRTY stat from one zone to another.

It would be good to avoid a pile of irrelevant decrementations and
incrementations, and improper event counting, and unnecessary descent
of the radix_tree under tree_lock (to set the PAGECACHE_TAG_DIRTY which
radix_tree_replace_slot() left in place anyway).

Do the NR_FILE_DIRTY movement, like the other stats movements, while
interrupts still disabled in migrate_page_move_mapping(); and don't even
bother if the zone is the same.  Do the PageDirty movement there under
tree_lock too, where old page is frozen and newpage not yet visible:
bearing in mind that as soon as newpage becomes visible in radix_tree,
an un-page-locked set_page_dirty() might interfere (or perhaps that's
just not possible: anything doing so should already hold an additional
reference to the old page, preventing its migration; but play safe).

But we do still need to transfer PageDirty in migrate_page_copy(), for
those who don't go the mapping route through migrate_page_move_mapping().

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/migrate.c |   51 +++++++++++++++++++++++++++++--------------------
 1 file changed, 31 insertions(+), 20 deletions(-)

--- migrat.orig/mm/migrate.c	2015-10-18 17:53:31.871345704 -0700
+++ migrat/mm/migrate.c	2015-10-18 17:53:33.899348014 -0700
@@ -30,6 +30,7 @@
 #include <linux/mempolicy.h>
 #include <linux/vmalloc.h>
 #include <linux/security.h>
+#include <linux/backing-dev.h>
 #include <linux/syscalls.h>
 #include <linux/hugetlb.h>
 #include <linux/hugetlb_cgroup.h>
@@ -313,6 +314,8 @@ int migrate_page_move_mapping(struct add
 		struct buffer_head *head, enum migrate_mode mode,
 		int extra_count)
 {
+	struct zone *oldzone, *newzone;
+	int dirty;
 	int expected_count = 1 + extra_count;
 	void **pslot;
 
@@ -331,6 +334,9 @@ int migrate_page_move_mapping(struct add
 		return MIGRATEPAGE_SUCCESS;
 	}
 
+	oldzone = page_zone(page);
+	newzone = page_zone(newpage);
+
 	spin_lock_irq(&mapping->tree_lock);
 
 	pslot = radix_tree_lookup_slot(&mapping->page_tree,
@@ -378,6 +384,13 @@ int migrate_page_move_mapping(struct add
 		set_page_private(newpage, page_private(page));
 	}
 
+	/* Move dirty while page refs frozen and newpage not yet exposed */
+	dirty = PageDirty(page);
+	if (dirty) {
+		ClearPageDirty(page);
+		SetPageDirty(newpage);
+	}
+
 	radix_tree_replace_slot(pslot, newpage);
 
 	/*
@@ -387,6 +400,9 @@ int migrate_page_move_mapping(struct add
 	 */
 	page_unfreeze_refs(page, expected_count - 1);
 
+	spin_unlock(&mapping->tree_lock);
+	/* Leave irq disabled to prevent preemption while updating stats */
+
 	/*
 	 * If moved to a different zone then also account
 	 * the page for that zone. Other VM counters will be
@@ -397,13 +413,19 @@ int migrate_page_move_mapping(struct add
 	 * via NR_FILE_PAGES and NR_ANON_PAGES if they
 	 * are mapped to swap space.
 	 */
-	__dec_zone_page_state(page, NR_FILE_PAGES);
-	__inc_zone_page_state(newpage, NR_FILE_PAGES);
-	if (!PageSwapCache(page) && PageSwapBacked(page)) {
-		__dec_zone_page_state(page, NR_SHMEM);
-		__inc_zone_page_state(newpage, NR_SHMEM);
+	if (newzone != oldzone) {
+		__dec_zone_state(oldzone, NR_FILE_PAGES);
+		__inc_zone_state(newzone, NR_FILE_PAGES);
+		if (PageSwapBacked(page) && !PageSwapCache(page)) {
+			__dec_zone_state(oldzone, NR_SHMEM);
+			__inc_zone_state(newzone, NR_SHMEM);
+		}
+		if (dirty && mapping_cap_account_dirty(mapping)) {
+			__dec_zone_state(oldzone, NR_FILE_DIRTY);
+			__inc_zone_state(newzone, NR_FILE_DIRTY);
+		}
 	}
-	spin_unlock_irq(&mapping->tree_lock);
+	local_irq_enable();
 
 	return MIGRATEPAGE_SUCCESS;
 }
@@ -524,20 +546,9 @@ void migrate_page_copy(struct page *newp
 	if (PageMappedToDisk(page))
 		SetPageMappedToDisk(newpage);
 
-	if (PageDirty(page)) {
-		clear_page_dirty_for_io(page);
-		/*
-		 * Want to mark the page and the radix tree as dirty, and
-		 * redo the accounting that clear_page_dirty_for_io undid,
-		 * but we can't use set_page_dirty because that function
-		 * is actually a signal that all of the page has become dirty.
-		 * Whereas only part of our page may be dirty.
-		 */
-		if (PageSwapBacked(page))
-			SetPageDirty(newpage);
-		else
-			__set_page_dirty_nobuffers(newpage);
- 	}
+	/* Move dirty on pages not done by migrate_page_move_mapping() */
+	if (PageDirty(page))
+		SetPageDirty(newpage);
 
 	if (page_is_young(page))
 		set_page_young(newpage);


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked
  2015-10-19  4:50 ` [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked Hugh Dickins
@ 2015-10-19  6:23   ` Vlastimil Babka
  2015-10-19 11:20     ` Hugh Dickins
  0 siblings, 1 reply; 39+ messages in thread
From: Vlastimil Babka @ 2015-10-19  6:23 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton
  Cc: Christoph Lameter, Kirill A. Shutemov, Rik van Riel,
	Davidlohr Bueso, Oleg Nesterov, Sasha Levin, Andrey Konovalov,
	Dmitry Vyukov, KOSAKI Motohiro, linux-mm

On 10/19/2015 06:50 AM, Hugh Dickins wrote:
> KernelThreadSanitizer (ktsan) has shown that the down_read_trylock()
> of mmap_sem in try_to_unmap_one() (when going to set PageMlocked on
> a page found mapped in a VM_LOCKED vma) is ineffective against races
> with exit_mmap()'s munlock_vma_pages_all(), because mmap_sem is not
> held when tearing down an mm.
>
> But that's okay, those races are benign; and although we've believed

But didn't Kirill show that it's not so benign, and can leak memory?
- http://marc.info/?l=linux-mm&m=144196800325498&w=2
Although as I noted, it probably doesn't leak completely. But a page 
will remain unevictable, until its last user unmaps it, which is again 
not completely benign?
- http://marc.info/?l=linux-mm&m=144198536831589&w=2


> for years in that ugly down_read_trylock(), it's unsuitable for the job,
> and frustrates the good intention of setting PageMlocked when it fails.
>
> It just doesn't matter if here we read vm_flags an instant before or
> after a racing mlock() or munlock() or exit_mmap() sets or clears
> VM_LOCKED: the syscalls (or exit) work their way up the address space
> (taking pt locks after updating vm_flags) to establish the final state.
>
> We do still need to be careful never to mark a page Mlocked (hence
> unevictable) by any race that will not be corrected shortly after.

And waiting for the last user to unmap the page is not necessarily 
shortly after :)

Anyway pte lock looks like it could work, but I'll need to think about 
it some more, because...

> The page lock protects from many of the races, but not all (a page
> is not necessarily locked when it's unmapped).  But the pte lock we
> just dropped is good to cover the rest (and serializes even with
> munlock_vma_pages_all(),

Note how munlock_vma_pages_range() via __munlock_pagevec() does 
TestClearPageMlocked() without (or "between") pte or page lock. But the 
pte lock is being taken after clearing VM_LOCKED, so perhaps it's safe 
against try_to_unmap_one...

> so no special barriers required): now hold
> on to the pte lock while calling mlock_vma_page().  Is that lock
> ordering safe?  Yes, that's how follow_page_pte() calls it, and
> how page_remove_rmap() calls the complementary clear_page_mlock().
>
> This fixes the following case (though not a case which anyone has
> complained of), which mmap_sem did not: truncation's preliminary
> unmap_mapping_range() is supposed to remove even the anonymous COWs
> of filecache pages, and that might race with try_to_unmap_one() on a
> VM_LOCKED vma, so that mlock_vma_page() sets PageMlocked just after
> zap_pte_range() unmaps the page, causing "Bad page state (mlocked)"
> when freed.  The pte lock protects against this.
>
> You could say that it also protects against the more ordinary case,
> racing with the preliminary unmapping of a filecache page itself: but
> in our current tree, that's independently protected by i_mmap_rwsem;
> and that race would be why "Bad page state (mlocked)" was seen before
> commit 48ec833b7851 ("Revert mm/memory.c: share the i_mmap_rwsem").
>
> While we're here, make a related optimization in try_to_munmap_one():
> if it's doing TTU_MUNLOCK, then there's no point at all in descending
> the page tables and getting the pt lock, unless the vma is VM_LOCKED.
> Yes, that can change racily, but it can change racily even without the
> optimization: it's not critical.  Far better not to waste time here.
>
> Stopped short of separating try_to_munlock_one() from try_to_munmap_one()
> on this occasion, but that's probably the sensible next step - with a
> rename, given that try_to_munlock()'s business is to try to set Mlocked.
>
> Updated the unevictable-lru Documentation, to remove its reference to
> mmap semaphore, but found a few more updates needed in just that area.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> I've cc'ed a lot of people on this one, since it originated in the
> "Multiple potential races on vma->vm_flags" discussion.  But didn't
> cc everyone on the not-very-interesting 1/12 "mm Documentation: undoc
> non-linear vmas" - so if you apply just this to a 4.3-rc6 or mmotm tree,
> yes, there will be rejects on unevictable-lru.txt, that's expected.
>
>   Documentation/vm/unevictable-lru.txt |   65 ++++++-------------------
>   mm/rmap.c                            |   36 ++++---------
>   2 files changed, 29 insertions(+), 72 deletions(-)
>
> --- migrat.orig/Documentation/vm/unevictable-lru.txt	2015-10-18 17:53:03.716313651 -0700
> +++ migrat/Documentation/vm/unevictable-lru.txt	2015-10-18 17:53:07.098317501 -0700
> @@ -531,37 +531,20 @@ map.
>
>   try_to_unmap() is always called, by either vmscan for reclaim or for page
>   migration, with the argument page locked and isolated from the LRU.  Separate
> -functions handle anonymous and mapped file pages, as these types of pages have
> -different reverse map mechanisms.
> -
> - (*) try_to_unmap_anon()
> -
> -     To unmap anonymous pages, each VMA in the list anchored in the anon_vma
> -     must be visited - at least until a VM_LOCKED VMA is encountered.  If the
> -     page is being unmapped for migration, VM_LOCKED VMAs do not stop the
> -     process because mlocked pages are migratable.  However, for reclaim, if
> -     the page is mapped into a VM_LOCKED VMA, the scan stops.
> -
> -     try_to_unmap_anon() attempts to acquire in read mode the mmap semaphore of
> -     the mm_struct to which the VMA belongs.  If this is successful, it will
> -     mlock the page via mlock_vma_page() - we wouldn't have gotten to
> -     try_to_unmap_anon() if the page were already mlocked - and will return
> -     SWAP_MLOCK, indicating that the page is unevictable.
> -
> -     If the mmap semaphore cannot be acquired, we are not sure whether the page
> -     is really unevictable or not.  In this case, try_to_unmap_anon() will
> -     return SWAP_AGAIN.
> -
> - (*) try_to_unmap_file()
> -
> -     Unmapping of a mapped file page works the same as for anonymous mappings,
> -     except that the scan visits all VMAs that map the page's index/page offset
> -     in the page's mapping's reverse map interval search tree.
> -
> -     As for anonymous pages, on encountering a VM_LOCKED VMA for a mapped file
> -     page, try_to_unmap_file() will attempt to acquire the associated
> -     mm_struct's mmap semaphore to mlock the page, returning SWAP_MLOCK if this
> -     is successful, and SWAP_AGAIN, if not.
> +functions handle anonymous and mapped file and KSM pages, as these types of
> +pages have different reverse map lookup mechanisms, with different locking.
> +In each case, whether rmap_walk_anon() or rmap_walk_file() or rmap_walk_ksm(),
> +it will call try_to_unmap_one() for every VMA which might contain the page.
> +
> +When trying to reclaim, if try_to_unmap_one() finds the page in a VM_LOCKED
> +VMA, it will then mlock the page via mlock_vma_page() instead of unmapping it,
> +and return SWAP_MLOCK to indicate that the page is unevictable: and the scan
> +stops there.
> +
> +mlock_vma_page() is called while holding the page table's lock (in addition
> +to the page lock, and the rmap lock): to serialize against concurrent mlock or
> +munlock or munmap system calls, mm teardown (munlock_vma_pages_all), reclaim,
> +holepunching, and truncation of file pages and their anonymous COWed pages.
>
>
>   try_to_munlock() REVERSE MAP SCAN
> @@ -577,22 +560,15 @@ all PTEs from the page.  For this purpos
>   introduced a variant of try_to_unmap() called try_to_munlock().
>
>   try_to_munlock() calls the same functions as try_to_unmap() for anonymous and
> -mapped file pages with an additional argument specifying unlock versus unmap
> +mapped file and KSM pages with a flag argument specifying unlock versus unmap
>   processing.  Again, these functions walk the respective reverse maps looking
>   for VM_LOCKED VMAs.  When such a VMA is found, as in the try_to_unmap() case,
> -the functions attempt to acquire the associated mmap semaphore, mlock the page
> -via mlock_vma_page() and return SWAP_MLOCK.  This effectively undoes the
> -pre-clearing of the page's PG_mlocked done by munlock_vma_page.
> -
> -If try_to_unmap() is unable to acquire a VM_LOCKED VMA's associated mmap
> -semaphore, it will return SWAP_AGAIN.  This will allow shrink_page_list() to
> -recycle the page on the inactive list and hope that it has better luck with the
> -page next time.
> +the functions mlock the page via mlock_vma_page() and return SWAP_MLOCK.  This
> +undoes the pre-clearing of the page's PG_mlocked done by munlock_vma_page.
>
>   Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's
>   reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA.
> -However, the scan can terminate when it encounters a VM_LOCKED VMA and can
> -successfully acquire the VMA's mmap semaphore for read and mlock the page.
> +However, the scan can terminate when it encounters a VM_LOCKED VMA.
>   Although try_to_munlock() might be called a great many times when munlocking a
>   large region or tearing down a large address space that has been mlocked via
>   mlockall(), overall this is a fairly rare event.
> @@ -620,11 +596,6 @@ Some examples of these unevictable pages
>    (3) mlocked pages that could not be isolated from the LRU and moved to the
>        unevictable list in mlock_vma_page().
>
> - (4) Pages mapped into multiple VM_LOCKED VMAs, but try_to_munlock() couldn't
> -     acquire the VMA's mmap semaphore to test the flags and set PageMlocked.
> -     munlock_vma_page() was forced to let the page back on to the normal LRU
> -     list for vmscan to handle.
> -
>   shrink_inactive_list() also diverts any unevictable pages that it finds on the
>   inactive lists to the appropriate zone's unevictable list.
>
> --- migrat.orig/mm/rmap.c	2015-09-12 18:30:20.857039763 -0700
> +++ migrat/mm/rmap.c	2015-10-18 17:53:07.099317502 -0700
> @@ -1304,6 +1304,10 @@ static int try_to_unmap_one(struct page
>   	int ret = SWAP_AGAIN;
>   	enum ttu_flags flags = (enum ttu_flags)arg;
>
> +	/* munlock has nothing to gain from examining un-locked vmas */
> +	if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
> +		goto out;
> +
>   	pte = page_check_address(page, mm, address, &ptl, 0);
>   	if (!pte)
>   		goto out;
> @@ -1314,9 +1318,12 @@ static int try_to_unmap_one(struct page
>   	 * skipped over this mm) then we should reactivate it.
>   	 */
>   	if (!(flags & TTU_IGNORE_MLOCK)) {
> -		if (vma->vm_flags & VM_LOCKED)
> -			goto out_mlock;
> -
> +		if (vma->vm_flags & VM_LOCKED) {
> +			/* Holding pte lock, we do *not* need mmap_sem here */
> +			mlock_vma_page(page);
> +			ret = SWAP_MLOCK;
> +			goto out_unmap;
> +		}
>   		if (flags & TTU_MUNLOCK)
>   			goto out_unmap;
>   	}
> @@ -1419,31 +1426,10 @@ static int try_to_unmap_one(struct page
>
>   out_unmap:
>   	pte_unmap_unlock(pte, ptl);
> -	if (ret != SWAP_FAIL && !(flags & TTU_MUNLOCK))
> +	if (ret != SWAP_FAIL && ret != SWAP_MLOCK && !(flags & TTU_MUNLOCK))
>   		mmu_notifier_invalidate_page(mm, address);
>   out:
>   	return ret;
> -
> -out_mlock:
> -	pte_unmap_unlock(pte, ptl);
> -
> -
> -	/*
> -	 * We need mmap_sem locking, Otherwise VM_LOCKED check makes
> -	 * unstable result and race. Plus, We can't wait here because
> -	 * we now hold anon_vma->rwsem or mapping->i_mmap_rwsem.
> -	 * if trylock failed, the page remain in evictable lru and later
> -	 * vmscan could retry to move the page to unevictable lru if the
> -	 * page is actually mlocked.
> -	 */
> -	if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
> -		if (vma->vm_flags & VM_LOCKED) {
> -			mlock_vma_page(page);
> -			ret = SWAP_MLOCK;
> -		}
> -		up_read(&vma->vm_mm->mmap_sem);
> -	}
> -	return ret;
>   }
>
>   bool is_vma_temporary_stack(struct vm_area_struct *vma)
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/12] mm Documentation: undoc non-linear vmas
  2015-10-19  4:45 ` [PATCH 1/12] mm Documentation: undoc non-linear vmas Hugh Dickins
@ 2015-10-19  9:16   ` Kirill A. Shutemov
  2015-11-05 17:29   ` Vlastimil Babka
  1 sibling, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2015-10-19  9:16 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Christoph Lameter, Kirill A. Shutemov, linux-mm

On Sun, Oct 18, 2015 at 09:45:47PM -0700, Hugh Dickins wrote:
> While updating some mm Documentation, I came across a few straggling
> references to the non-linear vmas which were happily removed in v4.0.
> Delete them.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

> ---
>  Documentation/filesystems/proc.txt   |    1 
>  Documentation/vm/page_migration      |   10 +--
>  Documentation/vm/unevictable-lru.txt |   63 +------------------------
>  3 files changed, 9 insertions(+), 65 deletions(-)
> 
> --- migrat.orig/Documentation/filesystems/proc.txt	2015-09-12 18:30:13.713047477 -0700
> +++ migrat/Documentation/filesystems/proc.txt	2015-10-18 17:53:03.715313650 -0700
> @@ -474,7 +474,6 @@ manner. The codes are the following:
>      ac  - area is accountable
>      nr  - swap space is not reserved for the area
>      ht  - area uses huge tlb pages
> -    nl  - non-linear mapping
>      ar  - architecture specific flag
>      dd  - do not include area into core dump
>      sd  - soft-dirty flag
> --- migrat.orig/Documentation/vm/page_migration	2009-06-09 20:05:27.000000000 -0700
> +++ migrat/Documentation/vm/page_migration	2015-10-18 17:53:03.715313650 -0700
> @@ -99,12 +99,10 @@ Steps:
>  4. The new page is prepped with some settings from the old page so that
>     accesses to the new page will discover a page with the correct settings.
>  
> -5. All the page table references to the page are converted
> -   to migration entries or dropped (nonlinear vmas).
> -   This decrease the mapcount of a page. If the resulting
> -   mapcount is not zero then we do not migrate the page.
> -   All user space processes that attempt to access the page
> -   will now wait on the page lock.
> +5. All the page table references to the page are converted to migration
> +   entries. This decreases the mapcount of a page. If the resulting
> +   mapcount is not zero then we do not migrate the page. All user space
> +   processes that attempt to access the page will now wait on the page lock.
>  
>  6. The radix tree lock is taken. This will cause all processes trying
>     to access the page via the mapping to block on the radix tree spinlock.
> --- migrat.orig/Documentation/vm/unevictable-lru.txt	2015-08-30 11:34:09.000000000 -0700
> +++ migrat/Documentation/vm/unevictable-lru.txt	2015-10-18 17:53:03.716313651 -0700
> @@ -552,63 +552,17 @@ different reverse map mechanisms.
>       is really unevictable or not.  In this case, try_to_unmap_anon() will
>       return SWAP_AGAIN.
>  
> - (*) try_to_unmap_file() - linear mappings
> + (*) try_to_unmap_file()
>  
>       Unmapping of a mapped file page works the same as for anonymous mappings,
>       except that the scan visits all VMAs that map the page's index/page offset
> -     in the page's mapping's reverse map priority search tree.  It also visits
> -     each VMA in the page's mapping's non-linear list, if the list is
> -     non-empty.
> +     in the page's mapping's reverse map interval search tree.
>  
>       As for anonymous pages, on encountering a VM_LOCKED VMA for a mapped file
>       page, try_to_unmap_file() will attempt to acquire the associated
>       mm_struct's mmap semaphore to mlock the page, returning SWAP_MLOCK if this
>       is successful, and SWAP_AGAIN, if not.
>  
> - (*) try_to_unmap_file() - non-linear mappings
> -
> -     If a page's mapping contains a non-empty non-linear mapping VMA list, then
> -     try_to_un{map|lock}() must also visit each VMA in that list to determine
> -     whether the page is mapped in a VM_LOCKED VMA.  Again, the scan must visit
> -     all VMAs in the non-linear list to ensure that the pages is not/should not
> -     be mlocked.
> -
> -     If a VM_LOCKED VMA is found in the list, the scan could terminate.
> -     However, there is no easy way to determine whether the page is actually
> -     mapped in a given VMA - either for unmapping or testing whether the
> -     VM_LOCKED VMA actually pins the page.
> -
> -     try_to_unmap_file() handles non-linear mappings by scanning a certain
> -     number of pages - a "cluster" - in each non-linear VMA associated with the
> -     page's mapping, for each file mapped page that vmscan tries to unmap.  If
> -     this happens to unmap the page we're trying to unmap, try_to_unmap() will
> -     notice this on return (page_mapcount(page) will be 0) and return
> -     SWAP_SUCCESS.  Otherwise, it will return SWAP_AGAIN, causing vmscan to
> -     recirculate this page.  We take advantage of the cluster scan in
> -     try_to_unmap_cluster() as follows:
> -
> -	For each non-linear VMA, try_to_unmap_cluster() attempts to acquire the
> -	mmap semaphore of the associated mm_struct for read without blocking.
> -
> -	If this attempt is successful and the VMA is VM_LOCKED,
> -	try_to_unmap_cluster() will retain the mmap semaphore for the scan;
> -	otherwise it drops it here.
> -
> -	Then, for each page in the cluster, if we're holding the mmap semaphore
> -	for a locked VMA, try_to_unmap_cluster() calls mlock_vma_page() to
> -	mlock the page.  This call is a no-op if the page is already locked,
> -	but will mlock any pages in the non-linear mapping that happen to be
> -	unlocked.
> -
> -	If one of the pages so mlocked is the page passed in to try_to_unmap(),
> -	try_to_unmap_cluster() will return SWAP_MLOCK, rather than the default
> -	SWAP_AGAIN.  This will allow vmscan to cull the page, rather than
> -	recirculating it on the inactive list.
> -
> -	Again, if try_to_unmap_cluster() cannot acquire the VMA's mmap sem, it
> -	returns SWAP_AGAIN, indicating that the page is mapped by a VM_LOCKED
> -	VMA, but couldn't be mlocked.
> -
>  
>  try_to_munlock() REVERSE MAP SCAN
>  ---------------------------------
> @@ -625,10 +579,9 @@ introduced a variant of try_to_unmap() c
>  try_to_munlock() calls the same functions as try_to_unmap() for anonymous and
>  mapped file pages with an additional argument specifying unlock versus unmap
>  processing.  Again, these functions walk the respective reverse maps looking
> -for VM_LOCKED VMAs.  When such a VMA is found for anonymous pages and file
> -pages mapped in linear VMAs, as in the try_to_unmap() case, the functions
> -attempt to acquire the associated mmap semaphore, mlock the page via
> -mlock_vma_page() and return SWAP_MLOCK.  This effectively undoes the
> +for VM_LOCKED VMAs.  When such a VMA is found, as in the try_to_unmap() case,
> +the functions attempt to acquire the associated mmap semaphore, mlock the page
> +via mlock_vma_page() and return SWAP_MLOCK.  This effectively undoes the
>  pre-clearing of the page's PG_mlocked done by munlock_vma_page.
>  
>  If try_to_unmap() is unable to acquire a VM_LOCKED VMA's associated mmap
> @@ -636,12 +589,6 @@ semaphore, it will return SWAP_AGAIN.  T
>  recycle the page on the inactive list and hope that it has better luck with the
>  page next time.
>  
> -For file pages mapped into non-linear VMAs, the try_to_munlock() logic works
> -slightly differently.  On encountering a VM_LOCKED non-linear VMA that might
> -map the page, try_to_munlock() returns SWAP_AGAIN without actually mlocking the
> -page.  munlock_vma_page() will just leave the page unlocked and let vmscan deal
> -with it - the usual fallback position.
> -
>  Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's
>  reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA.
>  However, the scan can terminate when it encounters a VM_LOCKED VMA and can
> 

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked
  2015-10-19  6:23   ` Vlastimil Babka
@ 2015-10-19 11:20     ` Hugh Dickins
  2015-10-19 12:33       ` Vlastimil Babka
  2015-10-19 13:13       ` Kirill A. Shutemov
  0 siblings, 2 replies; 39+ messages in thread
From: Hugh Dickins @ 2015-10-19 11:20 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Hugh Dickins, Andrew Morton, Christoph Lameter,
	Kirill A. Shutemov, Rik van Riel, Davidlohr Bueso, Oleg Nesterov,
	Sasha Levin, Andrey Konovalov, Dmitry Vyukov, KOSAKI Motohiro,
	linux-mm

On Mon, 19 Oct 2015, Vlastimil Babka wrote:
> On 10/19/2015 06:50 AM, Hugh Dickins wrote:
> > KernelThreadSanitizer (ktsan) has shown that the down_read_trylock()
> > of mmap_sem in try_to_unmap_one() (when going to set PageMlocked on
> > a page found mapped in a VM_LOCKED vma) is ineffective against races
> > with exit_mmap()'s munlock_vma_pages_all(), because mmap_sem is not
> > held when tearing down an mm.
> > 
> > But that's okay, those races are benign; and although we've believed
> 
> But didn't Kirill show that it's not so benign, and can leak memory?
> - http://marc.info/?l=linux-mm&m=144196800325498&w=2

Kirill's race was this:

		CPU0				CPU1
exit_mmap()
  // mmap_sem is *not* taken
  munlock_vma_pages_all()
    munlock_vma_pages_range()
    					try_to_unmap_one()
					  down_read_trylock(&vma->vm_mm->mmap_sem))
					  !!(vma->vm_flags & VM_LOCKED) == true
      vma->vm_flags &= ~VM_LOCKED;
      <munlock the page>
      					  mlock_vma_page(page);
					  // mlocked pages is leaked.

Hmm, I pulled that in to say that it looked benign to me, that he was
missing all the subsequent "<munlock the page>" which would correct the
situation.  But now I look at it again, I agree with you both: lacking
any relevant locking on CPU1 at that point (it has already given up the
pte lock there), the whole of "<munlock the page>" could take place on
CPU0, before CPU1 reaches its mlock_vma_page(page), yes.

Oh, hold on, no: doesn't page lock prevent that one?  CPU1 has the page
lock throughout, so CPU0's <munlock the page> cannot complete before
CPU1's mlock_vma_page(page).  So now I disagree with you again!

> Although as I noted, it probably doesn't leak completely. But a page will
> remain unevictable, until its last user unmaps it, which is again not
> completely benign?
> - http://marc.info/?l=linux-mm&m=144198536831589&w=2

I agree that we'd be wrong to leave a page on the unevictable lru
indefinitely once it's actually evictable.  But I think my change is
only making the above case easier to think about: trylock on mmap_sem
is a confusing distraction from where the proper locking is done,
whether it be page lock or pte lock.

> 
> 
> > for years in that ugly down_read_trylock(), it's unsuitable for the job,
> > and frustrates the good intention of setting PageMlocked when it fails.
> > 
> > It just doesn't matter if here we read vm_flags an instant before or
> > after a racing mlock() or munlock() or exit_mmap() sets or clears
> > VM_LOCKED: the syscalls (or exit) work their way up the address space
> > (taking pt locks after updating vm_flags) to establish the final state.
> > 
> > We do still need to be careful never to mark a page Mlocked (hence
> > unevictable) by any race that will not be corrected shortly after.
> 
> And waiting for the last user to unmap the page is not necessarily shortly
> after :)
> 
> Anyway pte lock looks like it could work, but I'll need to think about it
> some more, because...
> 
> > The page lock protects from many of the races, but not all (a page
> > is not necessarily locked when it's unmapped).  But the pte lock we
> > just dropped is good to cover the rest (and serializes even with
> > munlock_vma_pages_all(),
> 
> Note how munlock_vma_pages_range() via __munlock_pagevec() does
> TestClearPageMlocked() without (or "between") pte or page lock. But the pte
> lock is being taken after clearing VM_LOCKED, so perhaps it's safe against
> try_to_unmap_one...

A mind-trick I found helpful for understanding the barriers here, is
to imagine that the munlocker repeats its "vma->vm_flags &= ~VM_LOCKED"
every time it takes the pte lock: it does not actually do that, it
doesn't need to of course; but that does help show that ~VM_LOCKED
must be visible to anyone getting that pte lock afterwards.
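
Put roughly into code - only an illustration of that ordering, not a
quote from the tree, with the pte-lock-taking abbreviated:

	/* munlocker: munlock()/mlock() syscall or exit_mmap() teardown */
	vma->vm_flags &= ~VM_LOCKED;	/* before walking the page tables */
	...
	spin_lock(ptl);			/* pte lock for one page table */
	/* per-page munlock work */
	spin_unlock(ptl);		/* RELEASE: publishes ~VM_LOCKED */

	/* try_to_unmap_one(), taking that same pte lock afterwards */
	spin_lock(ptl);			/* ACQUIRE: pairs with the release above */
	if (vma->vm_flags & VM_LOCKED)	/* so must observe the cleared flag */
		mlock_vma_page(page);
	spin_unlock(ptl);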

Hugh


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked
  2015-10-19 11:20     ` Hugh Dickins
@ 2015-10-19 12:33       ` Vlastimil Babka
  2015-10-19 19:17         ` Hugh Dickins
  2015-10-19 13:13       ` Kirill A. Shutemov
  1 sibling, 1 reply; 39+ messages in thread
From: Vlastimil Babka @ 2015-10-19 12:33 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Christoph Lameter, Kirill A. Shutemov,
	Rik van Riel, Davidlohr Bueso, Oleg Nesterov, Sasha Levin,
	Andrey Konovalov, Dmitry Vyukov, KOSAKI Motohiro, linux-mm

On 10/19/2015 01:20 PM, Hugh Dickins wrote:
> On Mon, 19 Oct 2015, Vlastimil Babka wrote:
>> On 10/19/2015 06:50 AM, Hugh Dickins wrote:
>>> KernelThreadSanitizer (ktsan) has shown that the down_read_trylock()
>>> of mmap_sem in try_to_unmap_one() (when going to set PageMlocked on
>>> a page found mapped in a VM_LOCKED vma) is ineffective against races
>>> with exit_mmap()'s munlock_vma_pages_all(), because mmap_sem is not
>>> held when tearing down an mm.
>>>
>>> But that's okay, those races are benign; and although we've believed
>>
>> But didn't Kirill show that it's not so benign, and can leak memory?
>> - http://marc.info/?l=linux-mm&m=144196800325498&w=2
> 
> Kirill's race was this:
> 
> 		CPU0				CPU1
> exit_mmap()
>    // mmap_sem is *not* taken
>    munlock_vma_pages_all()
>      munlock_vma_pages_range()
>      					try_to_unmap_one()
> 					  down_read_trylock(&vma->vm_mm->mmap_sem))
> 					  !!(vma->vm_flags & VM_LOCKED) == true
>        vma->vm_flags &= ~VM_LOCKED;
>        <munlock the page>
>        					  mlock_vma_page(page);
> 					  // mlocked pages is leaked.
> 
> Hmm, I pulled that in to say that it looked benign to me, that he was
> missing all the subsequent "<munlock the page>" which would correct the
> situation.  But now I look at it again, I agree with you both: lacking
> any relevant locking on CPU1 at that point (it has already given up the
> pte lock there), the whole of "<munlock the page>" could take place on
> CPU0, before CPU1 reaches its mlock_vma_page(page), yes.
> 
> Oh, hold on, no: doesn't page lock prevent that one?  CPU1 has the page
> lock throughout, so CPU0's <munlock the page> cannot complete before
> CPU1's mlock_vma_page(page).  So now I disagree with you again!


I think the page lock doesn't help with munlock_vma_pages_range(). If I
expand the race above:

	CPU0				CPU1
					
exit_mmap()
  // mmap_sem is *not* taken
  munlock_vma_pages_all()
    munlock_vma_pages_range()		
					lock_page()
					...
    					try_to_unmap_one()
					  down_read_trylock(&vma->vm_mm->mmap_sem))
					  !!(vma->vm_flags & VM_LOCKED) == true
      vma->vm_flags &= ~VM_LOCKED;
      __munlock_pagevec_fill
        // this briefly takes pte lock
      __munlock_pagevec()
	// Phase 1
	TestClearPageMlocked(page)

      					  mlock_vma_page(page);
					    TestSetPageMlocked(page)
					    // page is still mlocked
					...
					unlock_page()
        // Phase 2
        lock_page()
        if (!__putback_lru_fast_prepare())
          // true, because page_evictable(page) is false due to PageMlocked
          __munlock_isolated_page
          if (page_mapcount(page) > 1)
             try_to_munlock(page);
             // this will not help AFAICS

Now if CPU0 is the last mapper, it will unmap the page anyway
further in exit_mmap(). If not, it stays mlocked.

The key problem is that page lock doesn't cover the TestClearPageMlocked(page)
part on CPU0.
Your patch should help AFAICS. If CPU1 does the mlock under pte lock, the
TestClear... on CPU0 can happen only after that.
If CPU0 takes pte lock first, then CPU1 must see the VM_LOCKED flag cleared,
right?

>> Although as I noted, it probably doesn't leak completely. But a page will
>> remain unevictable, until its last user unmaps it, which is again not
>> completely benign?
>> - http://marc.info/?l=linux-mm&m=144198536831589&w=2
> 
> I agree that we'd be wrong to leave a page on the unevictable lru
> indefinitely once it's actually evictable.  But I think my change is
> only making the above case easier to think about: trylock on mmap_sem
> is a confusing distraction from where the proper locking is done,
> whether it be page lock or pte lock.
> 
>>
>>
>>> for years in that ugly down_read_trylock(), it's unsuitable for the job,
>>> and frustrates the good intention of setting PageMlocked when it fails.
>>>
>>> It just doesn't matter if here we read vm_flags an instant before or
>>> after a racing mlock() or munlock() or exit_mmap() sets or clears
>>> VM_LOCKED: the syscalls (or exit) work their way up the address space
>>> (taking pt locks after updating vm_flags) to establish the final state.
>>>
>>> We do still need to be careful never to mark a page Mlocked (hence
>>> unevictable) by any race that will not be corrected shortly after.
>>
>> And waiting for the last user to unmap the page is not necessarily shortly
>> after :)
>>
>> Anyway pte lock looks like it could work, but I'll need to think about it
>> some more, because...
>>
>>> The page lock protects from many of the races, but not all (a page
>>> is not necessarily locked when it's unmapped).  But the pte lock we
>>> just dropped is good to cover the rest (and serializes even with
>>> munlock_vma_pages_all(),
>>
>> Note how munlock_vma_pages_range() via __munlock_pagevec() does
>> TestClearPageMlocked() without (or "between") pte or page lock. But the pte
>> lock is being taken after clearing VM_LOCKED, so perhaps it's safe against
>> try_to_unmap_one...
> 
> A mind-trick I found helpful for understanding the barriers here, is
> to imagine that the munlocker repeats its "vma->vm_flags &= ~VM_LOCKED"
> every time it takes the pte lock: it does not actually do that, it
> doesn't need to of course; but that does help show that ~VM_LOCKED
> must be visible to anyone getting that pte lock afterwards.
> 
> Hugh
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/12] mm: rename mem_cgroup_migrate to mem_cgroup_replace_page
  2015-10-19  4:54 ` [PATCH 4/12] mm: rename mem_cgroup_migrate to mem_cgroup_replace_page Hugh Dickins
@ 2015-10-19 12:35   ` Johannes Weiner
  2015-12-02  9:33   ` [PATCH] mm: fix kerneldoc on mem_cgroup_replace_page Hugh Dickins
  1 sibling, 0 replies; 39+ messages in thread
From: Johannes Weiner @ 2015-10-19 12:35 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Christoph Lameter, Michal Hocko, Greg Thelen, linux-mm

On Sun, Oct 18, 2015 at 09:54:26PM -0700, Hugh Dickins wrote:
> After v4.3's commit 0610c25daa3e ("memcg: fix dirty page migration")
> mem_cgroup_migrate() doesn't have much to offer in page migration:
> convert migrate_misplaced_transhuge_page() to set_page_memcg() instead.
> 
> Then rename mem_cgroup_migrate() to mem_cgroup_replace_page(), since its
> remaining callers are replace_page_cache_page() and shmem_replace_page():
> both of whom passed lrucare true, so just eliminate that argument.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked
  2015-10-19 11:20     ` Hugh Dickins
  2015-10-19 12:33       ` Vlastimil Babka
@ 2015-10-19 13:13       ` Kirill A. Shutemov
  2015-10-19 19:53         ` Hugh Dickins
  2015-10-19 23:30         ` [PATCH " Davidlohr Bueso
  1 sibling, 2 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2015-10-19 13:13 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Vlastimil Babka, Andrew Morton, Christoph Lameter,
	Kirill A. Shutemov, Rik van Riel, Davidlohr Bueso, Oleg Nesterov,
	Sasha Levin, Andrey Konovalov, Dmitry Vyukov, KOSAKI Motohiro,
	linux-mm

On Mon, Oct 19, 2015 at 04:20:05AM -0700, Hugh Dickins wrote:
> > Note how munlock_vma_pages_range() via __munlock_pagevec() does
> > TestClearPageMlocked() without (or "between") pte or page lock. But the pte
> > lock is being taken after clearing VM_LOCKED, so perhaps it's safe against
> > try_to_unmap_one...
> 
> A mind-trick I found helpful for understanding the barriers here, is
> to imagine that the munlocker repeats its "vma->vm_flags &= ~VM_LOCKED"
> every time it takes the pte lock: it does not actually do that, it
> doesn't need to of course; but that does help show that ~VM_LOCKED
> must be visible to anyone getting that pte lock afterwards.

How can you make sure that any other codepath that changes vm_flags would
not make (vm_flags & VM_LOCKED) temporarily true while dealing with other
flags?

Compiler can convert things like "vma->vm_flags &= ~VM_FOO;" to whatever
it wants as long as end result is the same. It's very unlikely that it
will generate code to set all bits to one and then clear all which should
be cleared, but it's theoretically possible.

I think we need to have at least WRITE_ONCE() everywhere we update
vm_flags and READ_ONCE() where we read it without mmap_sem taken.
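
Something along these lines, only as a sketch (READ_ONCE()/WRITE_ONCE()
as defined in include/linux/compiler.h, call sites abbreviated):

	/* updater, e.g. the munlock path: a single store, so the compiler
	 * cannot re-stage vm_flags through intermediate values */
	WRITE_ONCE(vma->vm_flags, vma->vm_flags & ~VM_LOCKED);

	/* reader without mmap_sem, e.g. try_to_unmap_one() */
	if (READ_ONCE(vma->vm_flags) & VM_LOCKED) {
		mlock_vma_page(page);
		ret = SWAP_MLOCK;
	}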

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked
  2015-10-19 12:33       ` Vlastimil Babka
@ 2015-10-19 19:17         ` Hugh Dickins
  2015-10-19 20:52           ` Vlastimil Babka
  0 siblings, 1 reply; 39+ messages in thread
From: Hugh Dickins @ 2015-10-19 19:17 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Hugh Dickins, Andrew Morton, Christoph Lameter,
	Kirill A. Shutemov, Rik van Riel, Davidlohr Bueso, Oleg Nesterov,
	Sasha Levin, Andrey Konovalov, Dmitry Vyukov, KOSAKI Motohiro,
	linux-mm

On Mon, 19 Oct 2015, Vlastimil Babka wrote:
> On 10/19/2015 01:20 PM, Hugh Dickins wrote:
> > On Mon, 19 Oct 2015, Vlastimil Babka wrote:
> >> On 10/19/2015 06:50 AM, Hugh Dickins wrote:
> >>> KernelThreadSanitizer (ktsan) has shown that the down_read_trylock()
> >>> of mmap_sem in try_to_unmap_one() (when going to set PageMlocked on
> >>> a page found mapped in a VM_LOCKED vma) is ineffective against races
> >>> with exit_mmap()'s munlock_vma_pages_all(), because mmap_sem is not
> >>> held when tearing down an mm.
> >>>
> >>> But that's okay, those races are benign; and although we've believed
> >>
> >> But didn't Kirill show that it's not so benign, and can leak memory?
> >> - http://marc.info/?l=linux-mm&m=144196800325498&w=2
> > 
> > Kirill's race was this:
> > 
> > 		CPU0				CPU1
> > exit_mmap()
> >    // mmap_sem is *not* taken
> >    munlock_vma_pages_all()
> >      munlock_vma_pages_range()
> >      					try_to_unmap_one()
> > 					  down_read_trylock(&vma->vm_mm->mmap_sem))
> > 					  !!(vma->vm_flags & VM_LOCKED) == true
> >        vma->vm_flags &= ~VM_LOCKED;
> >        <munlock the page>
> >        					  mlock_vma_page(page);
> > 					  // mlocked pages is leaked.
> > 
> > Hmm, I pulled that in to say that it looked benign to me, that he was
> > missing all the subsequent "<munlock the page>" which would correct the
> > situation.  But now I look at it again, I agree with you both: lacking
> > any relevant locking on CPU1 at that point (it has already given up the
> > pte lock there), the whole of "<munlock the page>" could take place on
> > CPU0, before CPU1 reaches its mlock_vma_page(page), yes.
> > 
> > Oh, hold on, no: doesn't page lock prevent that one?  CPU1 has the page
> > lock throughout, so CPU0's <munlock the page> cannot complete before
> > CPU1's mlock_vma_page(page).  So now I disagree with you again!
> 
> 
> I think the page lock doesn't help with munlock_vma_pages_range(). If I
> expand the race above:
> 
> 	CPU0				CPU1
> 					
> exit_mmap()
>   // mmap_sem is *not* taken
>   munlock_vma_pages_all()
>     munlock_vma_pages_range()		
> 					lock_page()
> 					...
>     					try_to_unmap_one()
> 					  down_read_trylock(&vma->vm_mm->mmap_sem))
> 					  !!(vma->vm_flags & VM_LOCKED) == true
>       vma->vm_flags &= ~VM_LOCKED;
>       __munlock_pagevec_fill
>         // this briefly takes pte lock
>       __munlock_pagevec()
> 	// Phase 1
> 	TestClearPageMlocked(page)
> 
>       					  mlock_vma_page(page);
> 					    TestSetPageMlocked(page)
> 					    // page is still mlocked
> 					...
> 					unlock_page()
>         // Phase 2
>         lock_page()
>         if (!__putback_lru_fast_prepare())
>           // true, because page_evictable(page) is false due to PageMlocked
>           __munlock_isolated_page
>           if (page_mapcount(page) > 1)
>              try_to_munlock(page);
>              // this will not help AFAICS
> 
> Now if CPU0 is the last mapper, it will unmap the page anyway
> further in exit_mmap(). If not, it stays mlocked.
> 
> The key problem is that page lock doesn't cover the TestClearPageMlocked(page)
> part on CPU0.

Thank you for expanding: your diagram beats my words.  Yes, I now agree
with you again - but reserve the right to change my mind an infinite
number of times as we look into this for longer.

You can see why mm/mlock.c is not my favourite source file, and every
improvement to it seems to make it worse.  It doesn't help that most of
the functions named "munlock" are about trying to set the mlocked bit.

And while it's there on our screens, let me note that the "page_mapcount > 1"
"improvement" of mine is, I believe, less valid in the current multistage
procedure than when I first added it (though perhaps a look back would
prove me just as wrong back then).  But it errs on the safe side (never
marking something unevictable when it's evictable) since PageMlocked has
already been cleared, so I think that it's still an optimization well
worth making for the common case.

> Your patch should help AFAICS. If CPU1 does the mlock under pte lock, the
> TestClear... on CPU0 can happen only after that.
> If CPU0 takes pte lock first, then CPU1 must see the VM_LOCKED flag cleared,
> right?

Right - thanks a lot for giving it more thought.

Hugh


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked
  2015-10-19 13:13       ` Kirill A. Shutemov
@ 2015-10-19 19:53         ` Hugh Dickins
  2015-10-19 20:10           ` Kirill A. Shutemov
  2015-10-19 23:30         ` [PATCH " Davidlohr Bueso
  1 sibling, 1 reply; 39+ messages in thread
From: Hugh Dickins @ 2015-10-19 19:53 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Vlastimil Babka, Andrew Morton, Christoph Lameter,
	Kirill A. Shutemov, Rik van Riel, Davidlohr Bueso, Oleg Nesterov,
	Sasha Levin, Andrey Konovalov, Dmitry Vyukov, KOSAKI Motohiro,
	linux-mm

On Mon, 19 Oct 2015, Kirill A. Shutemov wrote:
> On Mon, Oct 19, 2015 at 04:20:05AM -0700, Hugh Dickins wrote:
> > > Note how munlock_vma_pages_range() via __munlock_pagevec() does
> > > TestClearPageMlocked() without (or "between") pte or page lock. But the pte
> > > lock is being taken after clearing VM_LOCKED, so perhaps it's safe against
> > > try_to_unmap_one...
> > 
> > A mind-trick I found helpful for understanding the barriers here, is
> > to imagine that the munlocker repeats its "vma->vm_flags &= ~VM_LOCKED"
> > every time it takes the pte lock: it does not actually do that, it
> > doesn't need to of course; but that does help show that ~VM_LOCKED
> > must be visible to anyone getting that pte lock afterwards.
> 
> How can you make sure that any other codepath that changes vm_flags would
> not make (vm_flags & VM_LOCKED) temporarily true while dealing with other
> flags?
> 
> Compiler can convert things like "vma->vm_flags &= ~VM_FOO;" to whatever
> it wants as long as end result is the same. It's very unlikely that it
> will generate code to set all bits to one and then clear all which should
> be cleared, but it's theoretically possible.

I think that's in the realm of the fanciful.  But yes, it quite often
turns out that what I think is fanciful, is something that Paul has
heard compiler writers say they want to do, even if he has managed
to discourage them from doing it so far.

But more to the point, when you write of "end result", the compiler
would have no idea that releasing mmap_sem is the point at which
end result must be established: wouldn't it have to establish end
result before the next unlock operation, and before the end of the
compilation unit?  pte unlock being the relevant unlock operation
in this case, at least with my patch if not without.

> 
> I think we need to have at least WRITE_ONCE() everywhere we update
> vm_flags and READ_ONCE() where we read it without mmap_sem taken.

Not a series I'll embark upon myself,
and the patch at hand doesn't make things worse.

Hugh


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked
  2015-10-19 19:53         ` Hugh Dickins
@ 2015-10-19 20:10           ` Kirill A. Shutemov
  2015-10-19 21:25             ` Vlastimil Babka
  0 siblings, 1 reply; 39+ messages in thread
From: Kirill A. Shutemov @ 2015-10-19 20:10 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Vlastimil Babka, Andrew Morton, Christoph Lameter,
	Kirill A. Shutemov, Rik van Riel, Davidlohr Bueso, Oleg Nesterov,
	Sasha Levin, Andrey Konovalov, Dmitry Vyukov, KOSAKI Motohiro,
	linux-mm, Paul E. McKenney

On Mon, Oct 19, 2015 at 12:53:17PM -0700, Hugh Dickins wrote:
> On Mon, 19 Oct 2015, Kirill A. Shutemov wrote:
> > On Mon, Oct 19, 2015 at 04:20:05AM -0700, Hugh Dickins wrote:
> > > > Note how munlock_vma_pages_range() via __munlock_pagevec() does
> > > > TestClearPageMlocked() without (or "between") pte or page lock. But the pte
> > > > lock is being taken after clearing VM_LOCKED, so perhaps it's safe against
> > > > try_to_unmap_one...
> > > 
> > > A mind-trick I found helpful for understanding the barriers here, is
> > > to imagine that the munlocker repeats its "vma->vm_flags &= ~VM_LOCKED"
> > > every time it takes the pte lock: it does not actually do that, it
> > > doesn't need to of course; but that does help show that ~VM_LOCKED
> > > must be visible to anyone getting that pte lock afterwards.
> > 
> > How can you make sure that any other codepath that changes vm_flags would
> > not make (vm_flags & VM_LOCKED) temporarily true while dealing with other
> > flags?
> > 
> > Compiler can convert things like "vma->vm_flags &= ~VM_FOO;" to whatever
> > it wants as long as end result is the same. It's very unlikely that it
> > will generate code to set all bits to one and then clear all which should
> > be cleared, but it's theoretically possible.
> 
> I think that's in the realm of the fanciful.  But yes, it quite often
> turns out that what I think is fanciful, is something that Paul has
> heard compiler writers say they want to do, even if he has managed
> to discourage them from doing it so far.
 
Paul always has links to pdfs with this kind of horror. ;)

> But more to the point, when you write of "end result", the compiler
> would have no idea that releasing mmap_sem is the point at which
> end result must be established: wouldn't it have to establish end
> result before the next unlock operation, and before the end of the
> compilation unit?  pte unlock being the relevant unlock operation
> in this case, at least with my patch if not without.
> 
> > 
> > I think we need to have at least WRITE_ONCE() everywhere we update
> > vm_flags and READ_ONCE() where we read it without mmap_sem taken.
> 
> Not a series I'll embark upon myself,
> and the patch at hand doesn't make things worse.

I think it does.

The patch changes locking rules for ->vm_flags without proper preparation
and documentation. It will strike back one day.

I know we have few other cases when we access ->vm_flags without mmap_sem,
but this doesn't justify introducing one more potentially weak codepath.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked
  2015-10-19 19:17         ` Hugh Dickins
@ 2015-10-19 20:52           ` Vlastimil Babka
  0 siblings, 0 replies; 39+ messages in thread
From: Vlastimil Babka @ 2015-10-19 20:52 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Christoph Lameter, Kirill A. Shutemov,
	Rik van Riel, Davidlohr Bueso, Oleg Nesterov, Sasha Levin,
	Andrey Konovalov, Dmitry Vyukov, KOSAKI Motohiro, linux-mm

On 10/19/2015 09:17 PM, Hugh Dickins wrote:
>> Now if CPU0 is the last mapper, it will unmap the page anyway
>> further in exit_mmap(). If not, it stays mlocked.
>>
>> The key problem is that page lock doesn't cover the TestClearPageMlocked(page)
>> part on CPU0.
>
> Thank you for expanding: your diagram beats my words.  Yes, I now agree
> with you again - but reserve the right the change my mind an infinite
> number of times as we look into this for longer.

Good :)

> You can see why mm/mlock.c is not my favourite source file, and every
> improvement to it seems to make it worse.

Thank you for not explicitly pointing out the authorship of the current 
pagevec-based munlock_vma_pages_range. In the unknown author's defense, 
it was my first series.

> It doesn't help that most of
> the functions named "munlock" are about trying to set the mlocked bit.

That's hopefully a much older issue. And I may add that it doesn't help 
that although we do atomic TestAndSet/Clear operations, it still subtly 
relies on other locks for correctness.

> And while it's there on our screens, let me note that "page_mapcount > 1"
> "improvement" of mine is, I believe, less valid in the current multistage
> procedure than when I first added it (though perhaps a look back would
> prove me just as wrong back then).  But it errs on the safe side (never
> marking something unevictable when it's evictable) since PageMlocked has
> already been cleared, so I think that it's still an optimization well
> worth making for the common case.

Sure.

>> Your patch should help AFAICS. If CPU1 does the mlock under pte lock, the
>> TestClear... on CPU0 can happen only after that.
>> If CPU0 takes pte lock first, then CPU1 must see the VM_LOCKED flag cleared,
>> right?
>
> Right - thanks a lot for giving it more thought.
>
> Hugh
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked
  2015-10-19 20:10           ` Kirill A. Shutemov
@ 2015-10-19 21:25             ` Vlastimil Babka
  2015-10-19 21:53               ` Kirill A. Shutemov
  2015-10-21 23:26               ` Hugh Dickins
  0 siblings, 2 replies; 39+ messages in thread
From: Vlastimil Babka @ 2015-10-19 21:25 UTC (permalink / raw)
  To: Kirill A. Shutemov, Hugh Dickins
  Cc: Andrew Morton, Christoph Lameter, Kirill A. Shutemov,
	Rik van Riel, Davidlohr Bueso, Oleg Nesterov, Sasha Levin,
	Andrey Konovalov, Dmitry Vyukov, KOSAKI Motohiro, linux-mm,
	Paul E. McKenney

On 10/19/2015 10:10 PM, Kirill A. Shutemov wrote:
> On Mon, Oct 19, 2015 at 12:53:17PM -0700, Hugh Dickins wrote:
>> On Mon, 19 Oct 2015, Kirill A. Shutemov wrote:
>>> On Mon, Oct 19, 2015 at 04:20:05AM -0700, Hugh Dickins wrote:
>>>>> Note how munlock_vma_pages_range() via __munlock_pagevec() does
>>>>> TestClearPageMlocked() without (or "between") pte or page lock. But the pte
>>>>> lock is being taken after clearing VM_LOCKED, so perhaps it's safe against
>>>>> try_to_unmap_one...
>>>>
>>>> A mind-trick I found helpful for understanding the barriers here, is
>>>> to imagine that the munlocker repeats its "vma->vm_flags &= ~VM_LOCKED"
>>>> every time it takes the pte lock: it does not actually do that, it
>>>> doesn't need to of course; but that does help show that ~VM_LOCKED
>>>> must be visible to anyone getting that pte lock afterwards.
>>>
>>> How can you make sure that any other codepath that changes vm_flags would
>>> not make (vm_flags & VM_LOCKED) temporarily true while dealing with other
>>> flags?
>>>
>>> Compiler can convert things like "vma->vm_flags &= ~VM_FOO;" to whatever
>>> it wants as long as end result is the same. It's very unlikely that it
>>> will generate code to set all bits to one and then clear all which should
>>> be cleared, but it's theoretically possible.

I think Linus would be very vocal about such compiler implementation. 
And I can imagine a lot of things in the kernel would break by those 
spuriously set bits. There must be a lot of stuff that's "theoretically 
possible within the standard" but no sane compiler does. I believe even 
compiler guys are not that insane. IIRC we've seen bugs like this and 
they were always treated as bugs and fixed.
The example I've heard often used for theoretically possible but insane 
stuff is that the compiler could make code randomly write over anything 
that's not volatile, as long as it restored the original values upon 
e.g. returning from the function. That just can't happen.
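
To make the concern concrete, here is a deliberately exaggerated sketch of
the kind of transformation being imagined - no known compiler emits
anything like it, and VM_FOO is only the placeholder flag from the mail
above, so this is purely hypothetical:

	/* What the source says is: vma->vm_flags &= ~VM_FOO; */
	static void clear_vm_foo(struct vm_area_struct *vma)
	{
		unsigned long old = vma->vm_flags;

		/*
		 * Transient garbage: a racing lockless reader would see
		 * VM_LOCKED (and every other flag) momentarily set.
		 */
		vma->vm_flags = ~0UL;

		/* The intended end result is restored afterwards. */
		vma->vm_flags = old & ~VM_FOO;
	}

A try_to_unmap_one() reading vm_flags between those two stores would see a
spurious VM_LOCKED, which is exactly the worry being discussed.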

>> I think that's in the realm of the fanciful.  But yes, it quite often
>> turns out that what I think is fanciful, is something that Paul has
>> heard compiler writers say they want to do, even if he has managed
>> to discourage them from doing it so far.
>
> Paul always has links to pdfs with this kind of horror. ;)
>
>> But more to the point, when you write of "end result", the compiler
>> would have no idea that releasing mmap_sem is the point at which
>> end result must be established:

Isn't releasing a lock one of those "release" barriers where previously
issued writes must become visible before the unlock takes place?

>> wouldn't it have to establish end
>> result before the next unlock operation, and before the end of the
>> compilation unit?

Now I'm lost in what you mean.

>> pte unlock being the relevant unlock operation
>> in this case, at least with my patch if not without.

Hm so IIUC Kirill's point is that try_to_unmap_one() is checking 
VM_LOCKED under pte lock, but somebody else might be modifying vm_flags 
under mmap_sem, and thus we have no protection.

>>>
>>> I think we need to have at least WRITE_ONCE() everywhere we update
>>> vm_flags and READ_ONCE() where we read it without mmap_sem taken.

It wouldn't hurt to check if seeing a stale value or using non-atomic 
RMW can be a problem somewhere. In this case it's testing, not changing, 
so RMW is not an issue. But the check shouldn't consider arbitrary 
changes made by a potentially crazy compiler.

>> Not a series I'll embark upon myself,
>> and the patch at hand doesn't make things worse.
>
> I think it does.

So what's the alternative? Hm could we keep the trylock on mmap_sem 
under pte lock? The ordering is wrong, but it's a trylock, so no danger 
of deadlock?

> The patch changes locking rules for ->vm_flags without proper preparation
> and documentation. It will strike back one day.
> I know we have few other cases when we access ->vm_flags without mmap_sem,
> but this doesn't justify introducing one more potentially weak codepath.
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked
  2015-10-19 21:25             ` Vlastimil Babka
@ 2015-10-19 21:53               ` Kirill A. Shutemov
  2015-10-21 23:26               ` Hugh Dickins
  1 sibling, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2015-10-19 21:53 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Hugh Dickins, Andrew Morton, Christoph Lameter,
	Kirill A. Shutemov, Rik van Riel, Davidlohr Bueso, Oleg Nesterov,
	Sasha Levin, Andrey Konovalov, Dmitry Vyukov, KOSAKI Motohiro,
	linux-mm, Paul E. McKenney

On Mon, Oct 19, 2015 at 11:25:56PM +0200, Vlastimil Babka wrote:
> >>>I think we need to have at least WRITE_ONCE() everywhere we update
> >>>vm_flags and READ_ONCE() where we read it without mmap_sem taken.
> 
> It wouldn't hurt to check if seeing a stale value or using non-atomic RMW
> can be a problem somewhere. In this case it's testing, not changing, so RMW
> is not an issue. But the check shouldn't consider arbitrary changes made by
> a potentially crazy compiler.
> 
> >>Not a series I'll embark upon myself,
> >>and the patch at hand doesn't make things worse.
> >
> >I think it does.
> 
> So what's the alternative? Hm could we keep the trylock on mmap_sem under
> pte lock? The ordering is wrong, but it's a trylock, so no danger of
> deadlock?

This patch is okay for now, as it fixes a bug. I mean, a more real bug than
it introduces ;)

But the situation with locking around ->vm_flags is getting worse.
->mmap_sem doesn't serve it well.  We need to come up with something better.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked
  2015-10-19 13:13       ` Kirill A. Shutemov
  2015-10-19 19:53         ` Hugh Dickins
@ 2015-10-19 23:30         ` Davidlohr Bueso
  1 sibling, 0 replies; 39+ messages in thread
From: Davidlohr Bueso @ 2015-10-19 23:30 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Vlastimil Babka, Andrew Morton, Christoph Lameter,
	Kirill A. Shutemov, Rik van Riel, Oleg Nesterov, Sasha Levin,
	Andrey Konovalov, Dmitry Vyukov, KOSAKI Motohiro, linux-mm

On Mon, 19 Oct 2015, Kirill A. Shutemov wrote:

>I think we need to have at least WRITE_ONCE() everywhere we update
>vm_flags and READ_ONCE() where we read it without mmap_sem taken.

Given the CPU-CPU interaction, lockless/racy vm_flags checks should
actually be using barrier pairing, afaict. This is obviously expensive,
but I cannot recall in what other places we play lockless games with vm_flags.

Perhaps it's time to introduce formal helpers around vma->vm_flags so that
we can encapsulate such things: __vma_[set/read]_vmflags() or whatever,
to be used for those racy scenarios.
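
As a strawman only (hypothetical names, nothing like this exists in the
tree today), such helpers might start out as no more than:

	static inline void __vma_set_vmflags(struct vm_area_struct *vma,
					     unsigned long flags)
	{
		WRITE_ONCE(vma->vm_flags, flags);
	}

	static inline unsigned long __vma_read_vmflags(struct vm_area_struct *vma)
	{
		return READ_ONCE(vma->vm_flags);
	}

with the barrier pairing (or lockdep assertions for writers still under
mmap_sem) added where a particular call site needs it.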

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/12] mm: correct a couple of page migration comments
  2015-10-19  4:55 ` [PATCH 5/12] mm: correct a couple of page migration comments Hugh Dickins
@ 2015-10-21 17:53   ` Rafael Aquini
  0 siblings, 0 replies; 39+ messages in thread
From: Rafael Aquini @ 2015-10-21 17:53 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, Christoph Lameter, Joonsoo Kim, linux-mm

On Sun, Oct 18, 2015 at 09:55:56PM -0700, Hugh Dickins wrote:
> It's migrate.c not migration.c, and nowadays putback_movable_pages()
> not putback_lru_pages().
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  mm/migrate.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> --- migrat.orig/mm/migrate.c	2015-10-18 17:53:14.326325730 -0700
> +++ migrat/mm/migrate.c	2015-10-18 17:53:17.579329434 -0700
> @@ -1,5 +1,5 @@
>  /*
> - * Memory Migration functionality - linux/mm/migration.c
> + * Memory Migration functionality - linux/mm/migrate.c
>   *
>   * Copyright (C) 2006 Silicon Graphics, Inc., Christoph Lameter
>   *
> @@ -1113,7 +1113,7 @@ out:
>   *
>   * The function returns after 10 attempts or if no pages are movable any more
>   * because the list has become empty or no retryable pages exist any more.
> - * The caller should call putback_lru_pages() to return pages to the LRU
> + * The caller should call putback_movable_pages() to return pages to the LRU
>   * or free list only if ret != 0.
>   *
>   * Returns the number of pages that were not migrated, or an error code.
Acked-by: Rafael Aquini <aquini@redhat.com>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/12] mm: page migration trylock newpage at same level as oldpage
  2015-10-19  4:59 ` [PATCH 7/12] mm: page migration trylock newpage at same level as oldpage Hugh Dickins
@ 2015-10-21 17:54   ` Rafael Aquini
  0 siblings, 0 replies; 39+ messages in thread
From: Rafael Aquini @ 2015-10-21 17:54 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Christoph Lameter, Konstantin Khlebnikov,
	Naoya Horiguchi, linux-mm

On Sun, Oct 18, 2015 at 09:59:11PM -0700, Hugh Dickins wrote:
> Clean up page migration a little by moving the trylock of newpage from
> move_to_new_page() into __unmap_and_move(), where the old page has been
> locked.  Adjust unmap_and_move_huge_page() and balloon_page_migrate()
> accordingly.
> 
> But make one kind-of-functional change on the way: whereas trylock of
> newpage used to BUG() if it failed, now simply return -EAGAIN if so.
> Cutting out BUG()s is good, right?  But, to be honest, this is really
> to extend the usefulness of the custom put_new_page feature, allowing
> a pool of new pages to be shared perhaps with racing uses.
> 
> Use an "else" instead of that "skip_unmap" label.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  mm/balloon_compaction.c |   10 +-------
>  mm/migrate.c            |   46 +++++++++++++++++++++-----------------
>  2 files changed, 28 insertions(+), 28 deletions(-)
> 
> --- migrat.orig/mm/balloon_compaction.c	2014-12-07 14:21:05.000000000 -0800
> +++ migrat/mm/balloon_compaction.c	2015-10-18 17:53:22.486335020 -0700
> @@ -199,23 +199,17 @@ int balloon_page_migrate(struct page *ne
>  	struct balloon_dev_info *balloon = balloon_page_device(page);
>  	int rc = -EAGAIN;
>  
> -	/*
> -	 * Block others from accessing the 'newpage' when we get around to
> -	 * establishing additional references. We should be the only one
> -	 * holding a reference to the 'newpage' at this point.
> -	 */
> -	BUG_ON(!trylock_page(newpage));
> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
> +	VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
>  
>  	if (WARN_ON(!__is_movable_balloon_page(page))) {
>  		dump_page(page, "not movable balloon page");
> -		unlock_page(newpage);
>  		return rc;
>  	}
>  
>  	if (balloon && balloon->migratepage)
>  		rc = balloon->migratepage(balloon, newpage, page, mode);
>  
> -	unlock_page(newpage);
>  	return rc;
>  }
>  #endif /* CONFIG_BALLOON_COMPACTION */
> --- migrat.orig/mm/migrate.c	2015-10-18 17:53:20.159332371 -0700
> +++ migrat/mm/migrate.c	2015-10-18 17:53:22.487335021 -0700
> @@ -727,13 +727,8 @@ static int move_to_new_page(struct page
>  	struct address_space *mapping;
>  	int rc;
>  
> -	/*
> -	 * Block others from accessing the page when we get around to
> -	 * establishing additional references. We are the only one
> -	 * holding a reference to the new page at this point.
> -	 */
> -	if (!trylock_page(newpage))
> -		BUG();
> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
> +	VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
>  
>  	/* Prepare mapping for the new page.*/
>  	newpage->index = page->index;
> @@ -774,9 +769,6 @@ static int move_to_new_page(struct page
>  			remove_migration_ptes(page, newpage);
>  		page->mapping = NULL;
>  	}
> -
> -	unlock_page(newpage);
> -
>  	return rc;
>  }
>  
> @@ -861,6 +853,17 @@ static int __unmap_and_move(struct page
>  		}
>  	}
>  
> +	/*
> +	 * Block others from accessing the new page when we get around to
> +	 * establishing additional references. We are usually the only one
> +	 * holding a reference to newpage at this point. We used to have a BUG
> +	 * here if trylock_page(newpage) fails, but would like to allow for
> +	 * cases where there might be a race with the previous use of newpage.
> +	 * This is much like races on refcount of oldpage: just don't BUG().
> +	 */
> +	if (unlikely(!trylock_page(newpage)))
> +		goto out_unlock;
> +
>  	if (unlikely(isolated_balloon_page(page))) {
>  		/*
>  		 * A ballooned page does not need any special attention from
> @@ -870,7 +873,7 @@ static int __unmap_and_move(struct page
>  		 * the page migration right away (proteced by page lock).
>  		 */
>  		rc = balloon_page_migrate(newpage, page, mode);
> -		goto out_unlock;
> +		goto out_unlock_both;
>  	}
>  
>  	/*
> @@ -889,30 +892,27 @@ static int __unmap_and_move(struct page
>  		VM_BUG_ON_PAGE(PageAnon(page), page);
>  		if (page_has_private(page)) {
>  			try_to_free_buffers(page);
> -			goto out_unlock;
> +			goto out_unlock_both;
>  		}
> -		goto skip_unmap;
> -	}
> -
> -	/* Establish migration ptes or remove ptes */
> -	if (page_mapped(page)) {
> +	} else if (page_mapped(page)) {
> +		/* Establish migration ptes */
>  		try_to_unmap(page,
>  			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
>  		page_was_mapped = 1;
>  	}
>  
> -skip_unmap:
>  	if (!page_mapped(page))
>  		rc = move_to_new_page(newpage, page, page_was_mapped, mode);
>  
>  	if (rc && page_was_mapped)
>  		remove_migration_ptes(page, page);
>  
> +out_unlock_both:
> +	unlock_page(newpage);
> +out_unlock:
>  	/* Drop an anon_vma reference if we took one */
>  	if (anon_vma)
>  		put_anon_vma(anon_vma);
> -
> -out_unlock:
>  	unlock_page(page);
>  out:
>  	return rc;
> @@ -1056,6 +1056,9 @@ static int unmap_and_move_huge_page(new_
>  	if (PageAnon(hpage))
>  		anon_vma = page_get_anon_vma(hpage);
>  
> +	if (unlikely(!trylock_page(new_hpage)))
> +		goto put_anon;
> +
>  	if (page_mapped(hpage)) {
>  		try_to_unmap(hpage,
>  			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
> @@ -1068,6 +1071,9 @@ static int unmap_and_move_huge_page(new_
>  	if (rc != MIGRATEPAGE_SUCCESS && page_was_mapped)
>  		remove_migration_ptes(hpage, hpage);
>  
> +	unlock_page(new_hpage);
> +
> +put_anon:
>  	if (anon_vma)
>  		put_anon_vma(anon_vma);
>  
Acked-by: Rafael Aquini <aquini@redhat.com>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked
  2015-10-19 21:25             ` Vlastimil Babka
  2015-10-19 21:53               ` Kirill A. Shutemov
@ 2015-10-21 23:26               ` Hugh Dickins
  2015-10-29 18:49                 ` [PATCH v2 " Hugh Dickins
  1 sibling, 1 reply; 39+ messages in thread
From: Hugh Dickins @ 2015-10-21 23:26 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kirill A. Shutemov, Hugh Dickins, Andrew Morton,
	Christoph Lameter, Kirill A. Shutemov, Rik van Riel,
	Davidlohr Bueso, Oleg Nesterov, Sasha Levin, Andrey Konovalov,
	Dmitry Vyukov, KOSAKI Motohiro, linux-mm, Paul E. McKenney

On Mon, 19 Oct 2015, Vlastimil Babka wrote:
> On 10/19/2015 10:10 PM, Kirill A. Shutemov wrote:
> > On Mon, Oct 19, 2015 at 12:53:17PM -0700, Hugh Dickins wrote:
> > > On Mon, 19 Oct 2015, Kirill A. Shutemov wrote:
> > > > On Mon, Oct 19, 2015 at 04:20:05AM -0700, Hugh Dickins wrote:
> > > > > > Note how munlock_vma_pages_range() via __munlock_pagevec() does
> > > > > > TestClearPageMlocked() without (or "between") pte or page lock. But
> > > > > > the pte
> > > > > > lock is being taken after clearing VM_LOCKED, so perhaps it's safe
> > > > > > against
> > > > > > try_to_unmap_one...
> > > > > 
> > > > > A mind-trick I found helpful for understanding the barriers here, is
> > > > > to imagine that the munlocker repeats its "vma->vm_flags &=
> > > > > ~VM_LOCKED"
> > > > > every time it takes the pte lock: it does not actually do that, it
> > > > > doesn't need to of course; but that does help show that ~VM_LOCKED
> > > > > must be visible to anyone getting that pte lock afterwards.
> > > > 
> > > > How can you make sure that any other codepath that changes vm_flags
> > > > would
> > > > not make (vm_flags & VM_LOCKED) temporarily true while dealing with other
> > > > flags?
> > > > 
> > > > Compiler can convert things like "vma->vm_flags &= ~VM_FOO;" to
> > > > whatever
> > > > it wants as long as end result is the same. It's very unlikely that it
> > > > will generate code to set all bits to one and then clear all which
> > > > should
> > > > be cleared, but it's theoretically possible.
> 
> I think Linus would be very vocal about such compiler implementation. And I
> can imagine a lot of things in the kernel would break by those spuriously set
> bits. There must be a lot of stuff that's "theoretically possible within the
> standard" but no sane compiler does. I believe even compiler guys are not
> that insane. IIRC we've seen bugs like this and they were always treated as
> bugs and fixed.
> The example I've heard often used for theoretically possible but insane stuff
> is that the compiler could make code randomly write over anything that's not
> volatile, as long as it restored the original values upon e.g. returning from
> the function. That just can't happen.
> 
> > > I think that's in the realm of the fanciful.  But yes, it quite often
> > > turns out that what I think is fanciful, is something that Paul has
> > > heard compiler writers say they want to do, even if he has managed
> > > to discourage them from doing it so far.
> > 
> > Paul always has links to pdfs with this kind of horror. ;)
> > 
> > > But more to the point, when you write of "end result", the compiler
> > > would have no idea that releasing mmap_sem is the point at which
> > > end result must be established:
> 
> Isn't releasing a lock one of those "release" barriers where previously
> issued writes must become visible before the unlock takes place?

Yes, as I understand it.

> 
> > > wouldn't it have to establish end
> > > result before the next unlock operation, and before the end of the
> > > compilation unit?
> 
> Now I'm lost in what you mean.

I just meant what you suggest above; plus, if the compiler cannot see
any unlock within its scope, it must complete the write before returning.

> 
> > > pte unlock being the relevant unlock operation
> > > in this case, at least with my patch if not without.
> 
> Hm so IIUC Kirill's point is that try_to_unmap_one() is checking VM_LOCKED
> under pte lock, but somebody else might be modifying vm_flags under mmap_sem,
> and thus we have no protection.

Yes, in trying to understand what it was that lost you above, I finally
grasped Kirill's (perfectly well stated) point: whereas I was focussed on
the valid interplay between try_to_unmap_one() and mlock(2) and munlock(2),
he was concerned about a non-mlocker-munlocker doing something else to
vm_flags (under exclusive mmap_sem) while we're in try_to_unmap_one().

Which indeed would be a problem (a problem of the "left page unevictable
when it's not in any locked vma" kind) if the kernel is to be built with
one of these hypothetical compilers which implements
vm_flags &= VM_WHATEVER or vm_flags |= VM_WHATEVER with an intermediate
vm_flags = -1 stage.

And we'd feel better about it if I could point to somewhere which
already absolutely depends upon this not happening; but I'll admit that
the first places I looked to for support, turned out already to have the
WRITE_ONCE when modifying.  And I don't feel quite as confident of Linus's
outrage in the "&=" or "|=" case, as in the straight "=" assignment case.

I'm pretty sure I could find examples if I spent the time on it (er, how
convincing an argument is that??), but there do seem to be a lot of more
urgent things to attend to, than looking for those examples, or rushing
to add some kind of notation in lots of places.

Clearly I should add a couple of comments, to the commit description if
not to the code: one to add the case you've convinced me I was also fixing,
one to acknowledge Kirill's point about creative compilers.  That I will
do, before the patch hits Linus's tree, but not written yet.

And we could argue over whether I should restore the trylock of mmap_sem
(but this time under pte lock).  Personally I'm against restoring it:
it limits the effectiveness of the re-mlock technique, to handle a
hypothetical case, which we all(?) agree is not an imminent problem,
and should eventually be handled in a better way.

Hugh

> 
> > > > 
> > > > I think we need to have at least WRITE_ONCE() everywhere we update
> > > > vm_flags and READ_ONCE() where we read it without mmap_sem taken.
> 
> It wouldn't hurt to check if seeing a stale value or using non-atomic RMW can
> be a problem somewhere. In this case it's testing, not changing, so RMW is
> not an issue. But the check shouldn't consider arbitrary changes made by a
> potentially crazy compiler.
> 
> > > Not a series I'll embark upon myself,
> > > and the patch at hand doesn't make things worse.
> > 
> > I think it does.
> 
> So what's the alternative? Hm could we keep the trylock on mmap_sem under pte
> lock? The ordering is wrong, but it's a trylock, so no danger of deadlock?
> 
> > The patch changes locking rules for ->vm_flags without proper preparation
> > and documentation. It will strike back one day.
> > I know we have few other cases when we access ->vm_flags without mmap_sem,
> > but this doesn't justify introducing one more potentially weak codepath.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 10/12] mm: page migration use migration entry for swapcache too
  2015-10-19  5:05 ` [PATCH 10/12] mm: page migration use migration entry for swapcache too Hugh Dickins
@ 2015-10-22 22:35   ` Cyrill Gorcunov
  0 siblings, 0 replies; 39+ messages in thread
From: Cyrill Gorcunov @ 2015-10-22 22:35 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Christoph Lameter, Konstantin Khlebnikov,
	Mel Gorman, Minchan Kim, Pavel Emelyanov, linux-mm

On Sun, Oct 18, 2015 at 10:05:28PM -0700, Hugh Dickins wrote:
> Hitherto page migration has avoided using a migration entry for a
> swapcache page mapped into userspace, apparently for historical reasons.
> So any page blessed with swapcache would entail a minor fault when it's
> next touched, which page migration otherwise tries to avoid.  Swapcache
> in an mlocked area is rare, so won't often matter, but still better fixed.
> 
> Just rearrange the block in try_to_unmap_one(), to handle TTU_MIGRATION
> before checking PageAnon, that's all (apart from some reindenting).
> 
> Well, no, that's not quite all: doesn't this by the way fix a soft_dirty
> bug, that page migration of a file page was forgetting to transfer the
> soft_dirty bit?  Probably not a serious bug: if I understand correctly,
> soft_dirty aficionados usually have to handle file pages separately
> anyway; but we publish the bit in /proc/<pid>/pagemap on file mappings
> as well as anonymous, so page migration ought not to perturb it.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Sorry for delay in response. Indeed this should fix the nit, thanks!
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked
  2015-10-21 23:26               ` Hugh Dickins
@ 2015-10-29 18:49                 ` Hugh Dickins
  2015-11-05 17:50                   ` Vlastimil Babka
  0 siblings, 1 reply; 39+ messages in thread
From: Hugh Dickins @ 2015-10-29 18:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Kirill A. Shutemov, Christoph Lameter,
	Kirill A. Shutemov, Rik van Riel, Davidlohr Bueso, Oleg Nesterov,
	Sasha Levin, Andrey Konovalov, Dmitry Vyukov, KOSAKI Motohiro,
	linux-mm, Paul E. McKenney

KernelThreadSanitizer (ktsan) has shown that the down_read_trylock() of
mmap_sem in try_to_unmap_one() (when going to set PageMlocked on a page
found mapped in a VM_LOCKED vma) is ineffective against races with
exit_mmap()'s munlock_vma_pages_all(), because mmap_sem is not held when
tearing down an mm.

But that's okay, those races are benign; and although we've believed for
years in that ugly down_read_trylock(), it's unsuitable for the job, and
frustrates the good intention of setting PageMlocked when it fails.

It just doesn't matter if here we read vm_flags an instant before or after
a racing mlock() or munlock() or exit_mmap() sets or clears VM_LOCKED: the
syscalls (or exit) work their way up the address space (taking pt locks
after updating vm_flags) to establish the final state.

We do still need to be careful never to mark a page Mlocked (hence
unevictable) by any race that will not be corrected shortly after.  The
page lock protects from many of the races, but not all (a page is not
necessarily locked when it's unmapped).  But the pte lock we just dropped
is good to cover the rest (and serializes even with
munlock_vma_pages_all(), so no special barriers required): now hold on to
the pte lock while calling mlock_vma_page().  Is that lock ordering safe? 
Yes, that's how follow_page_pte() calls it, and how page_remove_rmap()
calls the complementary clear_page_mlock().

This fixes the following case (though not a case which anyone has
complained of), which mmap_sem did not: truncation's preliminary
unmap_mapping_range() is supposed to remove even the anonymous COWs of
filecache pages, and that might race with try_to_unmap_one() on a
VM_LOCKED vma, so that mlock_vma_page() sets PageMlocked just after
zap_pte_range() unmaps the page, causing "Bad page state (mlocked)" when
freed.  The pte lock protects against this.

You could say that it also protects against the more ordinary case, racing
with the preliminary unmapping of a filecache page itself: but in our
current tree, that's independently protected by i_mmap_rwsem; and that
race would be why "Bad page state (mlocked)" was seen before commit
48ec833b7851 ("Revert mm/memory.c: share the i_mmap_rwsem").

Vlastimil Babka points out another race which this patch protects against.
try_to_unmap_one() might reach its mlock_vma_page() TestSetPageMlocked a
moment after munlock_vma_pages_all() did its Phase 1 TestClearPageMlocked:
leaving PageMlocked and unevictable when it should be evictable.  mmap_sem
is ineffective because exit_mmap() does not hold it; page lock ineffective
because __munlock_pagevec() only takes it afterwards, in Phase 2; pte lock
is effective because __munlock_pagevec_fill() takes it to get the page,
after VM_LOCKED was cleared from vm_flags, so visible to try_to_unmap_one.

Kirill Shutemov points out that if the compiler chooses to implement a
"vma->vm_flags &= VM_WHATEVER" or "vma->vm_flags |= VM_WHATEVER" operation
with an intermediate store of unrelated bits set, since I'm here foregoing
its usual protection by mmap_sem, try_to_unmap_one() might catch sight of
a spurious VM_LOCKED in vm_flags, and make the wrong decision.  This does
not appear to be an immediate problem, but we may want to define vm_flags
accessors in future, to guard against such a possibility.

While we're here, make a related optimization in try_to_unmap_one(): if
it's doing TTU_MUNLOCK, then there's no point at all in descending the
page tables and getting the pt lock, unless the vma is VM_LOCKED.  Yes,
that can change racily, but it can change racily even without the
optimization: it's not critical.  Far better not to waste time here.

Stopped short of separating try_to_munlock_one() from try_to_unmap_one()
on this occasion, but that's probably the sensible next step - with a
rename, given that try_to_munlock()'s business is to try to set Mlocked.

Updated the unevictable-lru Documentation, to remove its reference to mmap
semaphore, but found a few more updates needed in just that area.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
Andrew, please add back your Signed-off-by, and replace
mm-rmap-use-pte-lock-not-mmap_sem-to-set-pagemlocked.patch
by this version.  The patch is identical to what you have in mmotm,
but Vlastimil and Kirill made important points that needed recording
in the commit description: two paragraphs added above.

 Documentation/vm/unevictable-lru.txt |   65 ++++++-------------------
 mm/rmap.c                            |   36 ++++---------
 2 files changed, 29 insertions(+), 72 deletions(-)

diff -puN Documentation/vm/unevictable-lru.txt~mm-rmap-use-pte-lock-not-mmap_sem-to-set-pagemlocked Documentation/vm/unevictable-lru.txt
--- a/Documentation/vm/unevictable-lru.txt~mm-rmap-use-pte-lock-not-mmap_sem-to-set-pagemlocked
+++ a/Documentation/vm/unevictable-lru.txt
@@ -531,37 +531,20 @@ map.
 
 try_to_unmap() is always called, by either vmscan for reclaim or for page
 migration, with the argument page locked and isolated from the LRU.  Separate
-functions handle anonymous and mapped file pages, as these types of pages have
-different reverse map mechanisms.
-
- (*) try_to_unmap_anon()
-
-     To unmap anonymous pages, each VMA in the list anchored in the anon_vma
-     must be visited - at least until a VM_LOCKED VMA is encountered.  If the
-     page is being unmapped for migration, VM_LOCKED VMAs do not stop the
-     process because mlocked pages are migratable.  However, for reclaim, if
-     the page is mapped into a VM_LOCKED VMA, the scan stops.
-
-     try_to_unmap_anon() attempts to acquire in read mode the mmap semaphore of
-     the mm_struct to which the VMA belongs.  If this is successful, it will
-     mlock the page via mlock_vma_page() - we wouldn't have gotten to
-     try_to_unmap_anon() if the page were already mlocked - and will return
-     SWAP_MLOCK, indicating that the page is unevictable.
-
-     If the mmap semaphore cannot be acquired, we are not sure whether the page
-     is really unevictable or not.  In this case, try_to_unmap_anon() will
-     return SWAP_AGAIN.
-
- (*) try_to_unmap_file()
-
-     Unmapping of a mapped file page works the same as for anonymous mappings,
-     except that the scan visits all VMAs that map the page's index/page offset
-     in the page's mapping's reverse map interval search tree.
-
-     As for anonymous pages, on encountering a VM_LOCKED VMA for a mapped file
-     page, try_to_unmap_file() will attempt to acquire the associated
-     mm_struct's mmap semaphore to mlock the page, returning SWAP_MLOCK if this
-     is successful, and SWAP_AGAIN, if not.
+functions handle anonymous and mapped file and KSM pages, as these types of
+pages have different reverse map lookup mechanisms, with different locking.
+In each case, whether rmap_walk_anon() or rmap_walk_file() or rmap_walk_ksm(),
+it will call try_to_unmap_one() for every VMA which might contain the page.
+
+When trying to reclaim, if try_to_unmap_one() finds the page in a VM_LOCKED
+VMA, it will then mlock the page via mlock_vma_page() instead of unmapping it,
+and return SWAP_MLOCK to indicate that the page is unevictable: and the scan
+stops there.
+
+mlock_vma_page() is called while holding the page table's lock (in addition
+to the page lock, and the rmap lock): to serialize against concurrent mlock or
+munlock or munmap system calls, mm teardown (munlock_vma_pages_all), reclaim,
+holepunching, and truncation of file pages and their anonymous COWed pages.
 
 
 try_to_munlock() REVERSE MAP SCAN
@@ -577,22 +560,15 @@ all PTEs from the page.  For this purpos
 introduced a variant of try_to_unmap() called try_to_munlock().
 
 try_to_munlock() calls the same functions as try_to_unmap() for anonymous and
-mapped file pages with an additional argument specifying unlock versus unmap
+mapped file and KSM pages with a flag argument specifying unlock versus unmap
 processing.  Again, these functions walk the respective reverse maps looking
 for VM_LOCKED VMAs.  When such a VMA is found, as in the try_to_unmap() case,
-the functions attempt to acquire the associated mmap semaphore, mlock the page
-via mlock_vma_page() and return SWAP_MLOCK.  This effectively undoes the
-pre-clearing of the page's PG_mlocked done by munlock_vma_page.
-
-If try_to_unmap() is unable to acquire a VM_LOCKED VMA's associated mmap
-semaphore, it will return SWAP_AGAIN.  This will allow shrink_page_list() to
-recycle the page on the inactive list and hope that it has better luck with the
-page next time.
+the functions mlock the page via mlock_vma_page() and return SWAP_MLOCK.  This
+undoes the pre-clearing of the page's PG_mlocked done by munlock_vma_page.
 
 Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's
 reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA.
-However, the scan can terminate when it encounters a VM_LOCKED VMA and can
-successfully acquire the VMA's mmap semaphore for read and mlock the page.
+However, the scan can terminate when it encounters a VM_LOCKED VMA.
 Although try_to_munlock() might be called a great many times when munlocking a
 large region or tearing down a large address space that has been mlocked via
 mlockall(), overall this is a fairly rare event.
@@ -620,11 +596,6 @@ Some examples of these unevictable pages
  (3) mlocked pages that could not be isolated from the LRU and moved to the
      unevictable list in mlock_vma_page().
 
- (4) Pages mapped into multiple VM_LOCKED VMAs, but try_to_munlock() couldn't
-     acquire the VMA's mmap semaphore to test the flags and set PageMlocked.
-     munlock_vma_page() was forced to let the page back on to the normal LRU
-     list for vmscan to handle.
-
 shrink_inactive_list() also diverts any unevictable pages that it finds on the
 inactive lists to the appropriate zone's unevictable list.
 
diff -puN mm/rmap.c~mm-rmap-use-pte-lock-not-mmap_sem-to-set-pagemlocked mm/rmap.c
--- a/mm/rmap.c~mm-rmap-use-pte-lock-not-mmap_sem-to-set-pagemlocked
+++ a/mm/rmap.c
@@ -1304,6 +1304,10 @@ static int try_to_unmap_one(struct page
 	int ret = SWAP_AGAIN;
 	enum ttu_flags flags = (enum ttu_flags)arg;
 
+	/* munlock has nothing to gain from examining un-locked vmas */
+	if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
+		goto out;
+
 	pte = page_check_address(page, mm, address, &ptl, 0);
 	if (!pte)
 		goto out;
@@ -1314,9 +1318,12 @@ static int try_to_unmap_one(struct page
 	 * skipped over this mm) then we should reactivate it.
 	 */
 	if (!(flags & TTU_IGNORE_MLOCK)) {
-		if (vma->vm_flags & VM_LOCKED)
-			goto out_mlock;
-
+		if (vma->vm_flags & VM_LOCKED) {
+			/* Holding pte lock, we do *not* need mmap_sem here */
+			mlock_vma_page(page);
+			ret = SWAP_MLOCK;
+			goto out_unmap;
+		}
 		if (flags & TTU_MUNLOCK)
 			goto out_unmap;
 	}
@@ -1421,31 +1428,10 @@ static int try_to_unmap_one(struct page
 
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
-	if (ret != SWAP_FAIL && !(flags & TTU_MUNLOCK))
+	if (ret != SWAP_FAIL && ret != SWAP_MLOCK && !(flags & TTU_MUNLOCK))
 		mmu_notifier_invalidate_page(mm, address);
 out:
 	return ret;
-
-out_mlock:
-	pte_unmap_unlock(pte, ptl);
-
-
-	/*
-	 * We need mmap_sem locking, Otherwise VM_LOCKED check makes
-	 * unstable result and race. Plus, We can't wait here because
-	 * we now hold anon_vma->rwsem or mapping->i_mmap_rwsem.
-	 * if trylock failed, the page remain in evictable lru and later
-	 * vmscan could retry to move the page to unevictable lru if the
-	 * page is actually mlocked.
-	 */
-	if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
-		if (vma->vm_flags & VM_LOCKED) {
-			mlock_vma_page(page);
-			ret = SWAP_MLOCK;
-		}
-		up_read(&vma->vm_mm->mmap_sem);
-	}
-	return ret;
 }
 
 bool is_vma_temporary_stack(struct vm_area_struct *vma)
_


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/12] mm Documentation: undoc non-linear vmas
  2015-10-19  4:45 ` [PATCH 1/12] mm Documentation: undoc non-linear vmas Hugh Dickins
  2015-10-19  9:16   ` Kirill A. Shutemov
@ 2015-11-05 17:29   ` Vlastimil Babka
  1 sibling, 0 replies; 39+ messages in thread
From: Vlastimil Babka @ 2015-11-05 17:29 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton
  Cc: Christoph Lameter, Kirill A. Shutemov, linux-mm

On 10/19/2015 06:45 AM, Hugh Dickins wrote:
> While updating some mm Documentation, I came across a few straggling
> references to the non-linear vmas which were happily removed in v4.0.
> Delete them.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked
  2015-10-29 18:49                 ` [PATCH v2 " Hugh Dickins
@ 2015-11-05 17:50                   ` Vlastimil Babka
  0 siblings, 0 replies; 39+ messages in thread
From: Vlastimil Babka @ 2015-11-05 17:50 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton
  Cc: Kirill A. Shutemov, Christoph Lameter, Kirill A. Shutemov,
	Rik van Riel, Davidlohr Bueso, Oleg Nesterov, Sasha Levin,
	Andrey Konovalov, Dmitry Vyukov, KOSAKI Motohiro, linux-mm,
	Paul E. McKenney

On 10/29/2015 07:49 PM, Hugh Dickins wrote:
> KernelThreadSanitizer (ktsan) has shown that the down_read_trylock() of
> mmap_sem in try_to_unmap_one() (when going to set PageMlocked on a page
> found mapped in a VM_LOCKED vma) is ineffective against races with
> exit_mmap()'s munlock_vma_pages_all(), because mmap_sem is not held when
> tearing down an mm.
> 
> But that's okay, those races are benign; and although we've believed for
> years in that ugly down_read_trylock(), it's unsuitable for the job, and
> frustrates the good intention of setting PageMlocked when it fails.
> 
> It just doesn't matter if here we read vm_flags an instant before or after
> a racing mlock() or munlock() or exit_mmap() sets or clears VM_LOCKED: the
> syscalls (or exit) work their way up the address space (taking pt locks
> after updating vm_flags) to establish the final state.
> 
> We do still need to be careful never to mark a page Mlocked (hence
> unevictable) by any race that will not be corrected shortly after.  The
> page lock protects from many of the races, but not all (a page is not
> necessarily locked when it's unmapped).  But the pte lock we just dropped
> is good to cover the rest (and serializes even with
> munlock_vma_pages_all(), so no special barriers required): now hold on to
> the pte lock while calling mlock_vma_page().  Is that lock ordering safe? 
> Yes, that's how follow_page_pte() calls it, and how page_remove_rmap()
> calls the complementary clear_page_mlock().
> 
> This fixes the following case (though not a case which anyone has
> complained of), which mmap_sem did not: truncation's preliminary
> unmap_mapping_range() is supposed to remove even the anonymous COWs of
> filecache pages, and that might race with try_to_unmap_one() on a
> VM_LOCKED vma, so that mlock_vma_page() sets PageMlocked just after
> zap_pte_range() unmaps the page, causing "Bad page state (mlocked)" when
> freed.  The pte lock protects against this.
> 
> You could say that it also protects against the more ordinary case, racing
> with the preliminary unmapping of a filecache page itself: but in our
> current tree, that's independently protected by i_mmap_rwsem; and that
> race would be why "Bad page state (mlocked)" was seen before commit
> 48ec833b7851 ("Revert mm/memory.c: share the i_mmap_rwsem").
> 
> Vlastimil Babka points out another race which this patch protects against.
> try_to_unmap_one() might reach its mlock_vma_page() TestSetPageMlocked a
> moment after munlock_vma_pages_all() did its Phase 1 TestClearPageMlocked:
> leaving PageMlocked and unevictable when it should be evictable.  mmap_sem
> is ineffective because exit_mmap() does not hold it; page lock ineffective
> because __munlock_pagevec() only takes it afterwards, in Phase 2; pte lock
> is effective because __munlock_pagevec_fill() takes it to get the page,
> after VM_LOCKED was cleared from vm_flags, so visible to try_to_unmap_one.
> 
> Kirill Shutemov points out that if the compiler chooses to implement a
> "vma->vm_flags &= VM_WHATEVER" or "vma->vm_flags |= VM_WHATEVER" operation
> with an intermediate store of unrelated bits set, since I'm here foregoing
> its usual protection by mmap_sem, try_to_unmap_one() might catch sight of
> a spurious VM_LOCKED in vm_flags, and make the wrong decision.  This does
> not appear to be an immediate problem, but we may want to define vm_flags
> accessors in future, to guard against such a possibility.
> 
> While we're here, make a related optimization in try_to_unmap_one(): if
> it's doing TTU_MUNLOCK, then there's no point at all in descending the
> page tables and getting the pt lock, unless the vma is VM_LOCKED.  Yes,
> that can change racily, but it can change racily even without the
> optimization: it's not critical.  Far better not to waste time here.
> 
> Stopped short of separating try_to_munlock_one() from try_to_unmap_one()
> on this occasion, but that's probably the sensible next step - with a
> rename, given that try_to_munlock()'s business is to try to set Mlocked.
> 
> Updated the unevictable-lru Documentation, to remove its reference to mmap
> semaphore, but found a few more updates needed in just that area.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Sasha Levin <sasha.levin@oracle.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/12] mm: page migration fix PageMlocked on migrated pages
  2015-10-19  4:52 ` [PATCH 3/12] mm: page migration fix PageMlocked on migrated pages Hugh Dickins
@ 2015-11-05 18:18   ` Vlastimil Babka
  0 siblings, 0 replies; 39+ messages in thread
From: Vlastimil Babka @ 2015-11-05 18:18 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton; +Cc: Christoph Lameter, Rik van Riel, linux-mm

On 10/19/2015 06:52 AM, Hugh Dickins wrote:
> Commit e6c509f85455 ("mm: use clear_page_mlock() in page_remove_rmap()")
> in v3.7 inadvertently made mlock_migrate_page() impotent: page migration
> unmaps the page from userspace before migrating, and that commit clears
> PageMlocked on the final unmap, leaving mlock_migrate_page() with nothing
> to do.  Not a serious bug, the next attempt at reclaiming the page would
> fix it up; but a betrayal of page migration's intent - the new page ought
> to emerge as PageMlocked.
> 
> I don't see how to fix it for mlock_migrate_page() itself; but easily
> fixed in remove_migration_pte(), by calling mlock_vma_page() when the
> vma is VM_LOCKED - under pte lock as in try_to_unmap_one().
> 
> Delete mlock_migrate_page()?  Not quite, it does still serve a purpose
> for migrate_misplaced_transhuge_page(): where we could replace it by a
> test, clear_page_mlock(), mlock_vma_page() sequence; but would that be
> an improvement?  mlock_migrate_page() is fairly lean, and let's make
> it leaner by skipping the irq save/restore now clearly not needed.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 6/12] mm: page migration use the put_new_page whenever necessary
  2015-10-19  4:57 ` [PATCH 6/12] mm: page migration use the put_new_page whenever necessary Hugh Dickins
@ 2015-11-05 18:31   ` Vlastimil Babka
  2015-11-08 21:17     ` Hugh Dickins
  0 siblings, 1 reply; 39+ messages in thread
From: Vlastimil Babka @ 2015-11-05 18:31 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton; +Cc: Christoph Lameter, David Rientjes, linux-mm

On 10/19/2015 06:57 AM, Hugh Dickins wrote:
> I don't know of any problem from the way it's used in our current tree,
> but there is one defect in page migration's custom put_new_page feature.
> 
> An unused newpage is expected to be released with the put_new_page(),
> but there was one MIGRATEPAGE_SUCCESS (0) path which released it with
> putback_lru_page(): which can be very wrong for a custom pool.

I'm a bit confused. So there's no immediate bug to be fixed but there was one in
the mainline in the past? Or elsewhere?

> Fixed more easily by resetting put_new_page once it won't be needed,
> than by adding a further flag to modify the rc test.

What is "fixed" if there is no bug? :) Maybe "Further bugs would be
prevented..." or something?

> Signed-off-by: Hugh Dickins <hughd@google.com>

I agree it's less error-prone after your patch, so:

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/migrate.c |   19 +++++++++++--------
>  1 file changed, 11 insertions(+), 8 deletions(-)
> 
> --- migrat.orig/mm/migrate.c	2015-10-18 17:53:17.579329434 -0700
> +++ migrat/mm/migrate.c	2015-10-18 17:53:20.159332371 -0700
> @@ -938,10 +938,11 @@ static ICE_noinline int unmap_and_move(n
>  				   int force, enum migrate_mode mode,
>  				   enum migrate_reason reason)
>  {
> -	int rc = 0;
> +	int rc = MIGRATEPAGE_SUCCESS;
>  	int *result = NULL;
> -	struct page *newpage = get_new_page(page, private, &result);
> +	struct page *newpage;
>  
> +	newpage = get_new_page(page, private, &result);
>  	if (!newpage)
>  		return -ENOMEM;
>  
> @@ -955,6 +956,8 @@ static ICE_noinline int unmap_and_move(n
>  			goto out;
>  
>  	rc = __unmap_and_move(page, newpage, force, mode);
> +	if (rc == MIGRATEPAGE_SUCCESS)
> +		put_new_page = NULL;
>  
>  out:
>  	if (rc != -EAGAIN) {
> @@ -981,7 +984,7 @@ out:
>  	 * it.  Otherwise, putback_lru_page() will drop the reference grabbed
>  	 * during isolation.
>  	 */
> -	if (rc != MIGRATEPAGE_SUCCESS && put_new_page) {
> +	if (put_new_page) {
>  		ClearPageSwapBacked(newpage);
>  		put_new_page(newpage, private);
>  	} else if (unlikely(__is_movable_balloon_page(newpage))) {
> @@ -1022,7 +1025,7 @@ static int unmap_and_move_huge_page(new_
>  				struct page *hpage, int force,
>  				enum migrate_mode mode)
>  {
> -	int rc = 0;
> +	int rc = -EAGAIN;
>  	int *result = NULL;
>  	int page_was_mapped = 0;
>  	struct page *new_hpage;
> @@ -1044,8 +1047,6 @@ static int unmap_and_move_huge_page(new_
>  	if (!new_hpage)
>  		return -ENOMEM;
>  
> -	rc = -EAGAIN;
> -
>  	if (!trylock_page(hpage)) {
>  		if (!force || mode != MIGRATE_SYNC)
>  			goto out;
> @@ -1070,8 +1071,10 @@ static int unmap_and_move_huge_page(new_
>  	if (anon_vma)
>  		put_anon_vma(anon_vma);
>  
> -	if (rc == MIGRATEPAGE_SUCCESS)
> +	if (rc == MIGRATEPAGE_SUCCESS) {
>  		hugetlb_cgroup_migrate(hpage, new_hpage);
> +		put_new_page = NULL;
> +	}
>  
>  	unlock_page(hpage);
>  out:
> @@ -1083,7 +1086,7 @@ out:
>  	 * it.  Otherwise, put_page() will drop the reference grabbed during
>  	 * isolation.
>  	 */
> -	if (rc != MIGRATEPAGE_SUCCESS && put_new_page)
> +	if (put_new_page)
>  		put_new_page(new_hpage, private);
>  	else
>  		putback_active_hugepage(new_hpage);
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 6/12] mm: page migration use the put_new_page whenever necessary
  2015-11-05 18:31   ` Vlastimil Babka
@ 2015-11-08 21:17     ` Hugh Dickins
  0 siblings, 0 replies; 39+ messages in thread
From: Hugh Dickins @ 2015-11-08 21:17 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Hugh Dickins, Andrew Morton, Christoph Lameter, David Rientjes, linux-mm

On Thu, 5 Nov 2015, Vlastimil Babka wrote:
> On 10/19/2015 06:57 AM, Hugh Dickins wrote:
> > I don't know of any problem from the way it's used in our current tree,
> > but there is one defect in page migration's custom put_new_page feature.
> > 
> > An unused newpage is expected to be released with the put_new_page(),
> > but there was one MIGRATEPAGE_SUCCESS (0) path which released it with
> > putback_lru_page(): which can be very wrong for a custom pool.
> 
> I'm a bit confused. So there's no immediate bug to be fixed but there was one in
> the mainline in the past? Or elsewhere?

"elsewhere": I came across it (and several of the other issues addressed
in this patchset) when using migrate_pages() in my huge tmpfs work.

I admit that, until I came to reply to you, I had thought this oversight
resulted in a (minor) unintended inefficiency in compaction - still the
sole user of the put_new_page feature in current mainline.  I thought it
was permitting a waste of effort of the kind that put_new_page was added
to stop.

But that's not so: because it's limited to (the page_count 1 case of)
MIGRATEPAGE_SUCCESS, migrate_pages() will not retry on the old page, so
it does not matter that its migration target is diverted to the public
pool, instead of back to the private pool.

At least, I think it barely matters; but using putback_lru_page does
miss out on the "pfn > high_pfn" check in release_freepages().
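
For context, compaction's call looks roughly like this (names as in
mm/compaction.c around v4.3, shown only to illustrate the private pool
that put_new_page is meant to refill):

	/*
	 * compaction_alloc() hands out pages from cc->freepages (the
	 * private pool); compaction_free() should return unused targets
	 * to that pool rather than to the public LRU/free lists.
	 */
	err = migrate_pages(&cc->migratepages, compaction_alloc,
			    compaction_free, (unsigned long)cc,
			    cc->mode, MR_COMPACTION);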

> 
> > Fixed more easily by resetting put_new_page once it won't be needed,
> > than by adding a further flag to modify the rc test.
> 
> What is "fixed" if there is no bug? :) Maybe "Further bugs would be
> prevented..." or something?

I never claimed a "critical bug" there, nor asked for this to go to
stable.  I think it's fair to describe something as a "bug", where a
design is not quite working as intended; though "defect" is the
word I actually used.  It's reasonable to "fix" a "defect", isn't it?

> 
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> 
> I agree it's less error-prone after your patch, so:
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks, and for your other scrutiny and Acks too.

Hugh


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH] mm: fix kerneldoc on mem_cgroup_replace_page
  2015-10-19  4:54 ` [PATCH 4/12] mm: rename mem_cgroup_migrate to mem_cgroup_replace_page Hugh Dickins
  2015-10-19 12:35   ` Johannes Weiner
@ 2015-12-02  9:33   ` Hugh Dickins
  2015-12-02 10:17     ` Michal Hocko
  2015-12-02 16:57     ` Johannes Weiner
  1 sibling, 2 replies; 39+ messages in thread
From: Hugh Dickins @ 2015-12-02  9:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Christoph Lameter, Johannes Weiner, Michal Hocko,
	Greg Thelen, linux-mm

Whoops, I missed removing the kerneldoc comment of the lrucare arg
removed from mem_cgroup_replace_page; but it's a good comment, keep it.

Signed-off-by: Hugh Dickins <hughd@google.com>
---

 mm/memcontrol.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- 4.4-rc3/mm/memcontrol.c	2015-11-15 21:06:56.505752425 -0800
+++ linux/mm/memcontrol.c	2015-11-30 17:40:42.510193391 -0800
@@ -5511,11 +5511,11 @@ void mem_cgroup_uncharge_list(struct lis
  * mem_cgroup_replace_page - migrate a charge to another page
  * @oldpage: currently charged page
  * @newpage: page to transfer the charge to
- * @lrucare: either or both pages might be on the LRU already
  *
  * Migrate the charge from @oldpage to @newpage.
  *
  * Both pages must be locked, @newpage->mapping must be set up.
+ * Either or both pages might be on the LRU already.
  */
 void mem_cgroup_replace_page(struct page *oldpage, struct page *newpage)
 {


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] mm: fix kerneldoc on mem_cgroup_replace_page
  2015-12-02  9:33   ` [PATCH] mm: fix kerneldoc on mem_cgroup_replace_page Hugh Dickins
@ 2015-12-02 10:17     ` Michal Hocko
  2015-12-02 16:57     ` Johannes Weiner
  1 sibling, 0 replies; 39+ messages in thread
From: Michal Hocko @ 2015-12-02 10:17 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Christoph Lameter, Johannes Weiner, Greg Thelen, linux-mm

On Wed 02-12-15 01:33:03, Hugh Dickins wrote:
> Whoops, I missed removing the kerneldoc comment of the lrucare arg
> removed from mem_cgroup_replace_page; but it's a good comment, keep it.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
> 
>  mm/memcontrol.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- 4.4-rc3/mm/memcontrol.c	2015-11-15 21:06:56.505752425 -0800
> +++ linux/mm/memcontrol.c	2015-11-30 17:40:42.510193391 -0800
> @@ -5511,11 +5511,11 @@ void mem_cgroup_uncharge_list(struct lis
>   * mem_cgroup_replace_page - migrate a charge to another page
>   * @oldpage: currently charged page
>   * @newpage: page to transfer the charge to
> - * @lrucare: either or both pages might be on the LRU already
>   *
>   * Migrate the charge from @oldpage to @newpage.
>   *
>   * Both pages must be locked, @newpage->mapping must be set up.
> + * Either or both pages might be on the LRU already.
>   */
>  void mem_cgroup_replace_page(struct page *oldpage, struct page *newpage)
>  {

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] mm: fix kerneldoc on mem_cgroup_replace_page
  2015-12-02  9:33   ` [PATCH] mm: fix kerneldoc on mem_cgroup_replace_page Hugh Dickins
  2015-12-02 10:17     ` Michal Hocko
@ 2015-12-02 16:57     ` Johannes Weiner
  1 sibling, 0 replies; 39+ messages in thread
From: Johannes Weiner @ 2015-12-02 16:57 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Christoph Lameter, Michal Hocko, Greg Thelen, linux-mm

On Wed, Dec 02, 2015 at 01:33:03AM -0800, Hugh Dickins wrote:
> Whoops, I missed removing the kerneldoc comment of the lrucare arg
> removed from mem_cgroup_replace_page; but it's a good comment, keep it.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Thanks!

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2015-12-02 16:57 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-19  4:44 [PATCH 0/12] mm: page migration cleanups, and a little mlock Hugh Dickins
2015-10-19  4:45 ` [PATCH 1/12] mm Documentation: undoc non-linear vmas Hugh Dickins
2015-10-19  9:16   ` Kirill A. Shutemov
2015-11-05 17:29   ` Vlastimil Babka
2015-10-19  4:50 ` [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked Hugh Dickins
2015-10-19  6:23   ` Vlastimil Babka
2015-10-19 11:20     ` Hugh Dickins
2015-10-19 12:33       ` Vlastimil Babka
2015-10-19 19:17         ` Hugh Dickins
2015-10-19 20:52           ` Vlastimil Babka
2015-10-19 13:13       ` Kirill A. Shutemov
2015-10-19 19:53         ` Hugh Dickins
2015-10-19 20:10           ` Kirill A. Shutemov
2015-10-19 21:25             ` Vlastimil Babka
2015-10-19 21:53               ` Kirill A. Shutemov
2015-10-21 23:26               ` Hugh Dickins
2015-10-29 18:49                 ` [PATCH v2 " Hugh Dickins
2015-11-05 17:50                   ` Vlastimil Babka
2015-10-19 23:30         ` [PATCH " Davidlohr Bueso
2015-10-19  4:52 ` [PATCH 3/12] mm: page migration fix PageMlocked on migrated pages Hugh Dickins
2015-11-05 18:18   ` Vlastimil Babka
2015-10-19  4:54 ` [PATCH 4/12] mm: rename mem_cgroup_migrate to mem_cgroup_replace_page Hugh Dickins
2015-10-19 12:35   ` Johannes Weiner
2015-12-02  9:33   ` [PATCH] mm: fix kerneldoc on mem_cgroup_replace_page Hugh Dickins
2015-12-02 10:17     ` Michal Hocko
2015-12-02 16:57     ` Johannes Weiner
2015-10-19  4:55 ` [PATCH 5/12] mm: correct a couple of page migration comments Hugh Dickins
2015-10-21 17:53   ` Rafael Aquini
2015-10-19  4:57 ` [PATCH 6/12] mm: page migration use the put_new_page whenever necessary Hugh Dickins
2015-11-05 18:31   ` Vlastimil Babka
2015-11-08 21:17     ` Hugh Dickins
2015-10-19  4:59 ` [PATCH 7/12] mm: page migration trylock newpage at same level as oldpage Hugh Dickins
2015-10-21 17:54   ` Rafael Aquini
2015-10-19  5:01 ` [PATCH 8/12] mm: page migration remove_migration_ptes at lock+unlock level Hugh Dickins
2015-10-19  5:03 ` [PATCH 9/12] mm: simplify page migration's anon_vma comment and flow Hugh Dickins
2015-10-19  5:05 ` [PATCH 10/12] mm: page migration use migration entry for swapcache too Hugh Dickins
2015-10-22 22:35   ` Cyrill Gorcunov
2015-10-19  5:07 ` [PATCH 11/12] mm: page migration avoid touching newpage until no going back Hugh Dickins
2015-10-19  5:11 ` [PATCH 12/12] mm: migrate dirty page without clear_page_dirty_for_io etc Hugh Dickins
