* [PATCH 0/3] mm: thp: mapcount updates
@ 2016-05-06 15:03 ` Andrea Arcangeli
  0 siblings, 0 replies; 27+ messages in thread
From: Andrea Arcangeli @ 2016-05-06 15:03 UTC (permalink / raw)
  To: Andrew Morton, linux-mm, linux-kernel; +Cc: Alex Williamson, Kirill A. Shutemov

Hello,

1/3 is a bugfix: it fixes userland (not kernel) data corruption seen
with vfio (and, more generally, with any device driver) page pinning
done with get_user_pages. More testing of it under any load is welcome
(not necessarily a page pinning load using vfio).
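
To illustrate the failure mode, here is a simplified, hypothetical
driver-side sketch (not code from this series; only get_user_pages_fast()
is a real kernel API) of the pinning pattern that a false positive COW
breaks:

	/* hypothetical driver code: pin a userspace buffer for DMA */
	static int pin_user_buffer(unsigned long uaddr, int nr_pages,
				   struct page **pages)
	{
		/* write=1: the device will DMA into the buffer */
		int pinned = get_user_pages_fast(uaddr, nr_pages, 1, pages);

		/*
		 * From here on the device DMAs into "pages". If a later
		 * write protect fault takes a false positive COW, the
		 * process is silently switched to a fresh copy while the
		 * DMA keeps landing in the old, now disconnected page:
		 * the userland buffer stops seeing the device's writes.
		 */
		return pinned == nr_pages ? 0 : -EFAULT;
	}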

Along with the above I'm also sending 2/3 and 3/3, but those are not
meant to be merged upstream quickly: they're very low priority, and 2/3
in particular is not zero risk and hasn't had enough testing yet.
Queuing 2/3 in -mm to give it more exposure should be ok though; 2/3 is
only suitable for merging at the very opening of a merge window anyway.

Andrea Arcangeli (3):
  mm: thp: calculate the mapcount correctly for THP pages during WP
    faults
  mm: thp: microoptimize compound_mapcount()
  mm: thp: split_huge_pmd_address() comment improvement

 include/linux/mm.h   | 12 +++++++--
 include/linux/swap.h |  8 +++---
 mm/huge_memory.c     | 73 ++++++++++++++++++++++++++++++++++++++++++++--------
 mm/memory.c          | 22 ++++++++++------
 mm/swapfile.c        | 13 +++++-----
 5 files changed, 98 insertions(+), 30 deletions(-)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/3] mm: thp: calculate the mapcount correctly for THP pages during WP faults
  2016-05-06 15:03 ` Andrea Arcangeli
@ 2016-05-06 15:03   ` Andrea Arcangeli
  -1 siblings, 0 replies; 27+ messages in thread
From: Andrea Arcangeli @ 2016-05-06 15:03 UTC (permalink / raw)
  To: Andrew Morton, linux-mm, linux-kernel; +Cc: Alex Williamson, Kirill A. Shutemov

This provides full accuracy to the mapcount calculation in write
protect faults, so page pinning will not get broken by false positive
copy-on-writes.

total_mapcount() isn't the right calculation for reuse_swap_page(), so
this introduces page_trans_huge_mapcount(), which is effectively the
fully accurate equivalent of page_mapcount() when dealing with
Transparent Hugepages; however, because of its higher runtime cost, we
only use page_trans_huge_mapcount() during COW faults, where it is
strictly needed.

This also provides, at practically zero cost, the total_mapcount
information, which is needed to know whether we can still relocate the
page anon_vma to the local vma. If page_trans_huge_mapcount() returns 1
we can reuse the page no matter whether it's a pte or a pmd_trans_huge
triggering the fault, but we can only relocate the page anon_vma to the
local vma->anon_vma if we're sure this "vma" is the only one mapping
the whole THP physical range.
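
As a worked example of the two return values (a minimal userspace
sketch of the arithmetic only, with hypothetical numbers; not kernel
code): suppose the PMD mapping has been split and two processes each
PTE-map a different half of the 512 subpages, with no remaining PMD
mapping and PageDoubleMap not set.

	#include <stdio.h>

	#define HPAGE_PMD_NR 512

	int main(void)
	{
		int subpage_mapcount[HPAGE_PMD_NR];
		int i, highest = 0, total = 0, compound_mapcount = 0;

		/* every subpage is mapped exactly once, by one of the two mms */
		for (i = 0; i < HPAGE_PMD_NR; i++)
			subpage_mapcount[i] = 1;

		for (i = 0; i < HPAGE_PMD_NR; i++) {
			if (subpage_mapcount[i] > highest)
				highest = subpage_mapcount[i];
			total += subpage_mapcount[i];
		}
		highest += compound_mapcount;
		total += compound_mapcount;

		/*
		 * Prints 1 and 512: both processes may reuse their own
		 * subpages in a COW fault, but neither may relocate the
		 * page anon_vma to its local vma->anon_vma.
		 */
		printf("page_trans_huge_mapcount() = %d, total_mapcount = %d\n",
		       highest, total);
		return 0;
	}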

Kirill A. Shutemov discovered the problem with moving the page
anon_vma to the local vma->anon_vma in a previous version of this
patch and another problem in the way page_move_anon_rmap() was called.

Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 include/linux/mm.h   |  9 +++++++
 include/linux/swap.h |  8 ++++---
 mm/huge_memory.c     | 67 +++++++++++++++++++++++++++++++++++++++++++++-------
 mm/memory.c          | 22 ++++++++++-------
 mm/swapfile.c        | 13 +++++-----
 5 files changed, 93 insertions(+), 26 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a55e5be..263f229 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -500,11 +500,20 @@ static inline int page_mapcount(struct page *page)
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 int total_mapcount(struct page *page);
+int page_trans_huge_mapcount(struct page *page, int *total_mapcount);
 #else
 static inline int total_mapcount(struct page *page)
 {
 	return page_mapcount(page);
 }
+static inline int page_trans_huge_mapcount(struct page *page,
+					   int *total_mapcount)
+{
+	int mapcount = page_mapcount(page);
+	if (total_mapcount)
+		*total_mapcount = mapcount;
+	return mapcount;
+}
 #endif
 
 static inline struct page *virt_to_head_page(const void *x)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 2b83359..acef20d 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -418,7 +418,7 @@ extern sector_t swapdev_block(int, pgoff_t);
 extern int page_swapcount(struct page *);
 extern int swp_swapcount(swp_entry_t entry);
 extern struct swap_info_struct *page_swap_info(struct page *);
-extern int reuse_swap_page(struct page *);
+extern bool reuse_swap_page(struct page *, int *);
 extern int try_to_free_swap(struct page *);
 struct backing_dev_info;
 
@@ -513,8 +513,10 @@ static inline int swp_swapcount(swp_entry_t entry)
 	return 0;
 }
 
-#define reuse_swap_page(page) \
-	(!PageTransCompound(page) && page_mapcount(page) == 1)
+static inline bool reuse_swap_page(struct page *page, int *total_mapcount)
+{
+	return page_trans_huge_mapcount(page, total_mapcount) == 1;
+}
 
 static inline int try_to_free_swap(struct page *page)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 86f9f8b..9086793 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1298,15 +1298,9 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	VM_BUG_ON_PAGE(!PageCompound(page) || !PageHead(page), page);
 	/*
 	 * We can only reuse the page if nobody else maps the huge page or it's
-	 * part. We can do it by checking page_mapcount() on each sub-page, but
-	 * it's expensive.
-	 * The cheaper way is to check page_count() to be equal 1: every
-	 * mapcount takes page reference reference, so this way we can
-	 * guarantee, that the PMD is the only mapping.
-	 * This can give false negative if somebody pinned the page, but that's
-	 * fine.
+	 * part.
 	 */
-	if (page_mapcount(page) == 1 && page_count(page) == 1) {
+	if (page_trans_huge_mapcount(page, NULL) == 1) {
 		pmd_t entry;
 		entry = pmd_mkyoung(orig_pmd);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
@@ -2080,7 +2074,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		if (pte_write(pteval)) {
 			writable = true;
 		} else {
-			if (PageSwapCache(page) && !reuse_swap_page(page)) {
+			if (PageSwapCache(page) &&
+			    !reuse_swap_page(page, NULL)) {
 				unlock_page(page);
 				result = SCAN_SWAP_CACHE_PAGE;
 				goto out;
@@ -3225,6 +3220,60 @@ int total_mapcount(struct page *page)
 }
 
 /*
+ * This calculates accurately how many mappings a transparent hugepage
+ * has (unlike page_mapcount() which isn't fully accurate). This full
+ * accuracy is primarily needed to know if copy-on-write faults can
+ * reuse the page and change the mapping to read-write instead of
+ * copying them. At the same time this returns the total_mapcount too.
+ *
+ * The function returns the highest mapcount any one of the subpages
+ * has. If the return value is one, even if different processes are
+ * mapping different subpages of the transparent hugepage, they can
+ * all reuse it, because each process is reusing a different subpage.
+ *
+ * The total_mapcount is instead counting all virtual mappings of the
+ * subpages. If the total_mapcount is equal to "one", it tells the
+ * caller all mappings belong to the same "mm" and in turn the
+ * anon_vma of the transparent hugepage can become the vma->anon_vma
+ * local one as no other process may be mapping any of the subpages.
+ *
+ * It would be more accurate to replace page_mapcount() with
+ * page_trans_huge_mapcount(), however we only use
+ * page_trans_huge_mapcount() in the copy-on-write faults where we
+ * need full accuracy to avoid breaking page pinning, because
+ * page_trans_huge_mapcount() is slower than page_mapcount().
+ */
+int page_trans_huge_mapcount(struct page *page, int *total_mapcount)
+{
+	int i, ret, _total_mapcount, mapcount;
+
+	/* hugetlbfs shouldn't call it */
+	VM_BUG_ON_PAGE(PageHuge(page), page);
+
+	if (likely(!PageTransCompound(page)))
+		return atomic_read(&page->_mapcount) + 1;
+
+	page = compound_head(page);
+
+	_total_mapcount = ret = 0;
+	for (i = 0; i < HPAGE_PMD_NR; i++) {
+		mapcount = atomic_read(&page[i]._mapcount) + 1;
+		ret = max(ret, mapcount);
+		_total_mapcount += mapcount;
+	}
+	if (PageDoubleMap(page)) {
+		ret -= 1;
+		_total_mapcount -= HPAGE_PMD_NR;
+	}
+	mapcount = compound_mapcount(page);
+	ret += mapcount;
+	_total_mapcount += mapcount;
+	if (total_mapcount)
+		*total_mapcount = _total_mapcount;
+	return ret;
+}
+
+/*
  * This function splits huge page into normal pages. @page can point to any
  * subpage of huge page to split. Split doesn't change the position of @page.
  *
diff --git a/mm/memory.c b/mm/memory.c
index 93897f2..d7a2af3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2340,6 +2340,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * not dirty accountable.
 	 */
 	if (PageAnon(old_page) && !PageKsm(old_page)) {
+		int total_mapcount;
 		if (!trylock_page(old_page)) {
 			get_page(old_page);
 			pte_unmap_unlock(page_table, ptl);
@@ -2354,13 +2355,18 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			}
 			put_page(old_page);
 		}
-		if (reuse_swap_page(old_page)) {
-			/*
-			 * The page is all ours.  Move it to our anon_vma so
-			 * the rmap code will not search our parent or siblings.
-			 * Protected against the rmap code by the page lock.
-			 */
-			page_move_anon_rmap(old_page, vma, address);
+		if (reuse_swap_page(old_page, &total_mapcount)) {
+			if (total_mapcount == 1) {
+				/*
+				 * The page is all ours. Move it to
+				 * our anon_vma so the rmap code will
+				 * not search our parent or siblings.
+				 * Protected against the rmap code by
+				 * the page lock.
+				 */
+				page_move_anon_rmap(compound_head(old_page),
+						    vma, address);
+			}
 			unlock_page(old_page);
 			return wp_page_reuse(mm, vma, address, page_table, ptl,
 					     orig_pte, old_page, 0, 0);
@@ -2584,7 +2590,7 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	inc_mm_counter_fast(mm, MM_ANONPAGES);
 	dec_mm_counter_fast(mm, MM_SWAPENTS);
 	pte = mk_pte(page, vma->vm_page_prot);
-	if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
+	if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 		flags &= ~FAULT_FLAG_WRITE;
 		ret |= VM_FAULT_WRITE;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 83874ec..031713ab 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -922,18 +922,19 @@ out:
  * to it.  And as a side-effect, free up its swap: because the old content
  * on disk will never be read, and seeking back there to write new content
  * later would only waste time away from clustering.
+ *
+ * NOTE: total_mapcount should not be relied upon by the caller if
+ * reuse_swap_page() returns false, but it may be always overwritten
+ * (see the other implementation for CONFIG_SWAP=n).
  */
-int reuse_swap_page(struct page *page)
+bool reuse_swap_page(struct page *page, int *total_mapcount)
 {
 	int count;
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	if (unlikely(PageKsm(page)))
-		return 0;
-	/* The page is part of THP and cannot be reused */
-	if (PageTransCompound(page))
-		return 0;
-	count = page_mapcount(page);
+		return false;
+	count = page_trans_huge_mapcount(page, total_mapcount);
 	if (count <= 1 && PageSwapCache(page)) {
 		count += page_swapcount(page);
 		if (count == 1 && !PageWriteback(page)) {

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/3] mm: thp: microoptimize compound_mapcount()
  2016-05-06 15:03 ` Andrea Arcangeli
@ 2016-05-06 15:03   ` Andrea Arcangeli
  -1 siblings, 0 replies; 27+ messages in thread
From: Andrea Arcangeli @ 2016-05-06 15:03 UTC (permalink / raw)
  To: Andrew Morton, linux-mm, linux-kernel; +Cc: Alex Williamson, Kirill A. Shutemov

compound_mapcount() is only called after PageCompound() has already
been checked by the caller, so there's no point in checking it again.
Gcc may optimize the check away anyway because the function is inline,
but this removes the runtime check for sure and adds an assert in its
place.
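
For reference, a sketch of the effect (not part of the patch): with
CONFIG_DEBUG_VM disabled, VM_BUG_ON_PAGE() expands to a no-op at
runtime, so the function effectively reduces to:

	static inline int compound_mapcount(struct page *page)
	{
		/*
		 * PageCompound() is no longer re-checked here; with
		 * CONFIG_DEBUG_VM=y a violation now triggers BUG()
		 * instead of silently returning 0.
		 */
		page = compound_head(page);
		return atomic_read(compound_mapcount_ptr(page)) + 1;
	}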

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 include/linux/mm.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 263f229..726ba80 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -471,8 +471,7 @@ static inline atomic_t *compound_mapcount_ptr(struct page *page)
 
 static inline int compound_mapcount(struct page *page)
 {
-	if (!PageCompound(page))
-		return 0;
+	VM_BUG_ON_PAGE(!PageCompound(page), page);
 	page = compound_head(page);
 	return atomic_read(compound_mapcount_ptr(page)) + 1;
 }

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/3] mm: thp: split_huge_pmd_address() comment improvement
  2016-05-06 15:03 ` Andrea Arcangeli
@ 2016-05-06 15:04   ` Andrea Arcangeli
  -1 siblings, 0 replies; 27+ messages in thread
From: Andrea Arcangeli @ 2016-05-06 15:04 UTC (permalink / raw)
  To: Andrew Morton, linux-mm, linux-kernel; +Cc: Alex Williamson, Kirill A. Shutemov

The comment is partly wrong; this improves it by covering the case of
split_huge_pmd_address() being called by try_to_unmap_one() when
TTU_SPLIT_HUGE_PMD is set.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 mm/huge_memory.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9086793..1fbe13d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3031,8 +3031,10 @@ void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address,
 		return;
 
 	/*
-	 * Caller holds the mmap_sem write mode, so a huge pmd cannot
-	 * materialize from under us.
+	 * Caller holds the mmap_sem write mode or the anon_vma lock,
+	 * so a huge pmd cannot materialize from under us (khugepaged
+	 * holds both the mmap_sem write mode and the anon_vma lock
+	 * write mode).
 	 */
 	__split_huge_pmd(vma, pmd, address, freeze);
 }

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] mm: thp: microoptimize compound_mapcount()
  2016-05-06 15:03   ` Andrea Arcangeli
@ 2016-05-06 17:33     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 27+ messages in thread
From: Kirill A. Shutemov @ 2016-05-06 17:33 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andrew Morton, linux-mm, linux-kernel, Alex Williamson

On Fri, May 06, 2016 at 05:03:59PM +0200, Andrea Arcangeli wrote:
> compound_mapcount() is only called after PageCompound() has already
> been checked by the caller, so there's no point in checking it again.
> Gcc may optimize the check away anyway because the function is inline,
> but this removes the runtime check for sure and adds an assert in its
> place.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

> ---
>  include/linux/mm.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 263f229..726ba80 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -471,8 +471,7 @@ static inline atomic_t *compound_mapcount_ptr(struct page *page)
>  
>  static inline int compound_mapcount(struct page *page)
>  {
> -	if (!PageCompound(page))
> -		return 0;
> +	VM_BUG_ON_PAGE(!PageCompound(page), page);
>  	page = compound_head(page);
>  	return atomic_read(compound_mapcount_ptr(page)) + 1;
>  }

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/3] mm: thp: split_huge_pmd_address() comment improvement
  2016-05-06 15:04   ` Andrea Arcangeli
@ 2016-05-06 17:33     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 27+ messages in thread
From: Kirill A. Shutemov @ 2016-05-06 17:33 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andrew Morton, linux-mm, linux-kernel, Alex Williamson

On Fri, May 06, 2016 at 05:04:00PM +0200, Andrea Arcangeli wrote:
> The comment is partly wrong; this improves it by covering the case of
> split_huge_pmd_address() being called by try_to_unmap_one() when
> TTU_SPLIT_HUGE_PMD is set.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

> ---
>  mm/huge_memory.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 9086793..1fbe13d 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3031,8 +3031,10 @@ void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address,
>  		return;
>  
>  	/*
> -	 * Caller holds the mmap_sem write mode, so a huge pmd cannot
> -	 * materialize from under us.
> +	 * Caller holds the mmap_sem write mode or the anon_vma lock,
> +	 * so a huge pmd cannot materialize from under us (khugepaged
> +	 * holds both the mmap_sem write mode and the anon_vma lock
> +	 * write mode).
>  	 */
>  	__split_huge_pmd(vma, pmd, address, freeze);
>  }

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/3] mm: thp: calculate the mapcount correctly for THP pages during WP faults
  2016-05-06 15:03   ` Andrea Arcangeli
@ 2016-05-06 22:10     ` Alex Williamson
  -1 siblings, 0 replies; 27+ messages in thread
From: Alex Williamson @ 2016-05-06 22:10 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, linux-mm, linux-kernel, Kirill A. Shutemov

On Fri,  6 May 2016 17:03:58 +0200
Andrea Arcangeli <aarcange@redhat.com> wrote:

> This provides full accuracy to the mapcount calculation in write
> protect faults, so page pinning will not get broken by false positive
> copy-on-writes.
> 
> total_mapcount() isn't the right calculation for reuse_swap_page(), so
> this introduces page_trans_huge_mapcount(), which is effectively the
> fully accurate equivalent of page_mapcount() when dealing with
> Transparent Hugepages; however, because of its higher runtime cost, we
> only use page_trans_huge_mapcount() during COW faults, where it is
> strictly needed.
> 
> This also provides, at practically zero cost, the total_mapcount
> information, which is needed to know whether we can still relocate the
> page anon_vma to the local vma. If page_trans_huge_mapcount() returns 1
> we can reuse the page no matter whether it's a pte or a pmd_trans_huge
> triggering the fault, but we can only relocate the page anon_vma to the
> local vma->anon_vma if we're sure this "vma" is the only one mapping
> the whole THP physical range.
> 
> Kirill A. Shutemov discovered the problem with moving the page
> anon_vma to the local vma->anon_vma in a previous version of this
> patch and another problem in the way page_move_anon_rmap() was called.
> 
> Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---

I've been running my original vfio test case on this for over 6 hours
with no sign of error.

Tested-by: Alex Williamson <alex.williamson@redhat.com>

Can we also get a stable tag for this for v4.5 please?  Thanks,

Alex

>  include/linux/mm.h   |  9 +++++++
>  include/linux/swap.h |  8 ++++---
>  mm/huge_memory.c     | 67 +++++++++++++++++++++++++++++++++++++++++++++-------
>  mm/memory.c          | 22 ++++++++++-------
>  mm/swapfile.c        | 13 +++++-----
>  5 files changed, 93 insertions(+), 26 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a55e5be..263f229 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -500,11 +500,20 @@ static inline int page_mapcount(struct page *page)
>  
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  int total_mapcount(struct page *page);
> +int page_trans_huge_mapcount(struct page *page, int *total_mapcount);
>  #else
>  static inline int total_mapcount(struct page *page)
>  {
>  	return page_mapcount(page);
>  }
> +static inline int page_trans_huge_mapcount(struct page *page,
> +					   int *total_mapcount)
> +{
> +	int mapcount = page_mapcount(page);
> +	if (total_mapcount)
> +		*total_mapcount = mapcount;
> +	return mapcount;
> +}
>  #endif
>  
>  static inline struct page *virt_to_head_page(const void *x)
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 2b83359..acef20d 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -418,7 +418,7 @@ extern sector_t swapdev_block(int, pgoff_t);
>  extern int page_swapcount(struct page *);
>  extern int swp_swapcount(swp_entry_t entry);
>  extern struct swap_info_struct *page_swap_info(struct page *);
> -extern int reuse_swap_page(struct page *);
> +extern bool reuse_swap_page(struct page *, int *);
>  extern int try_to_free_swap(struct page *);
>  struct backing_dev_info;
>  
> @@ -513,8 +513,10 @@ static inline int swp_swapcount(swp_entry_t entry)
>  	return 0;
>  }
>  
> -#define reuse_swap_page(page) \
> -	(!PageTransCompound(page) && page_mapcount(page) == 1)
> +static inline bool reuse_swap_page(struct page *page, int *total_mapcount)
> +{
> +	return page_trans_huge_mapcount(page, total_mapcount) == 1;
> +}
>  
>  static inline int try_to_free_swap(struct page *page)
>  {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 86f9f8b..9086793 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1298,15 +1298,9 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  	VM_BUG_ON_PAGE(!PageCompound(page) || !PageHead(page), page);
>  	/*
>  	 * We can only reuse the page if nobody else maps the huge page or it's
> -	 * part. We can do it by checking page_mapcount() on each sub-page, but
> -	 * it's expensive.
> -	 * The cheaper way is to check page_count() to be equal 1: every
> -	 * mapcount takes page reference reference, so this way we can
> -	 * guarantee, that the PMD is the only mapping.
> -	 * This can give false negative if somebody pinned the page, but that's
> -	 * fine.
> +	 * part.
>  	 */
> -	if (page_mapcount(page) == 1 && page_count(page) == 1) {
> +	if (page_trans_huge_mapcount(page, NULL) == 1) {
>  		pmd_t entry;
>  		entry = pmd_mkyoung(orig_pmd);
>  		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
> @@ -2080,7 +2074,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>  		if (pte_write(pteval)) {
>  			writable = true;
>  		} else {
> -			if (PageSwapCache(page) && !reuse_swap_page(page)) {
> +			if (PageSwapCache(page) &&
> +			    !reuse_swap_page(page, NULL)) {
>  				unlock_page(page);
>  				result = SCAN_SWAP_CACHE_PAGE;
>  				goto out;
> @@ -3225,6 +3220,60 @@ int total_mapcount(struct page *page)
>  }
>  
>  /*
> + * This calculates accurately how many mappings a transparent hugepage
> + * has (unlike page_mapcount() which isn't fully accurate). This full
> + * accuracy is primarily needed to know if copy-on-write faults can
> + * reuse the page and change the mapping to read-write instead of
> + * copying them. At the same time this returns the total_mapcount too.
> + *
> + * The function returns the highest mapcount any one of the subpages
> + * has. If the return value is one, even if different processes are
> + * mapping different subpages of the transparent hugepage, they can
> + * all reuse it, because each process is reusing a different subpage.
> + *
> + * The total_mapcount is instead counting all virtual mappings of the
> + * subpages. If the total_mapcount is equal to "one", it tells the
> + * caller all mappings belong to the same "mm" and in turn the
> + * anon_vma of the transparent hugepage can become the vma->anon_vma
> + * local one as no other process may be mapping any of the subpages.
> + *
> + * It would be more accurate to replace page_mapcount() with
> + * page_trans_huge_mapcount(), however we only use
> + * page_trans_huge_mapcount() in the copy-on-write faults where we
> + * need full accuracy to avoid breaking page pinning, because
> + * page_trans_huge_mapcount() is slower than page_mapcount().
> + */
> +int page_trans_huge_mapcount(struct page *page, int *total_mapcount)
> +{
> +	int i, ret, _total_mapcount, mapcount;
> +
> +	/* hugetlbfs shouldn't call it */
> +	VM_BUG_ON_PAGE(PageHuge(page), page);
> +
> +	if (likely(!PageTransCompound(page)))
> +		return atomic_read(&page->_mapcount) + 1;
> +
> +	page = compound_head(page);
> +
> +	_total_mapcount = ret = 0;
> +	for (i = 0; i < HPAGE_PMD_NR; i++) {
> +		mapcount = atomic_read(&page[i]._mapcount) + 1;
> +		ret = max(ret, mapcount);
> +		_total_mapcount += mapcount;
> +	}
> +	if (PageDoubleMap(page)) {
> +		ret -= 1;
> +		_total_mapcount -= HPAGE_PMD_NR;
> +	}
> +	mapcount = compound_mapcount(page);
> +	ret += mapcount;
> +	_total_mapcount += mapcount;
> +	if (total_mapcount)
> +		*total_mapcount = _total_mapcount;
> +	return ret;
> +}
> +
> +/*
>   * This function splits huge page into normal pages. @page can point to any
>   * subpage of huge page to split. Split doesn't change the position of @page.
>   *
> diff --git a/mm/memory.c b/mm/memory.c
> index 93897f2..d7a2af3 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2340,6 +2340,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  	 * not dirty accountable.
>  	 */
>  	if (PageAnon(old_page) && !PageKsm(old_page)) {
> +		int total_mapcount;
>  		if (!trylock_page(old_page)) {
>  			get_page(old_page);
>  			pte_unmap_unlock(page_table, ptl);
> @@ -2354,13 +2355,18 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  			}
>  			put_page(old_page);
>  		}
> -		if (reuse_swap_page(old_page)) {
> -			/*
> -			 * The page is all ours.  Move it to our anon_vma so
> -			 * the rmap code will not search our parent or siblings.
> -			 * Protected against the rmap code by the page lock.
> -			 */
> -			page_move_anon_rmap(old_page, vma, address);
> +		if (reuse_swap_page(old_page, &total_mapcount)) {
> +			if (total_mapcount == 1) {
> +				/*
> +				 * The page is all ours. Move it to
> +				 * our anon_vma so the rmap code will
> +				 * not search our parent or siblings.
> +				 * Protected against the rmap code by
> +				 * the page lock.
> +				 */
> +				page_move_anon_rmap(compound_head(old_page),
> +						    vma, address);
> +			}
>  			unlock_page(old_page);
>  			return wp_page_reuse(mm, vma, address, page_table, ptl,
>  					     orig_pte, old_page, 0, 0);
> @@ -2584,7 +2590,7 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  	inc_mm_counter_fast(mm, MM_ANONPAGES);
>  	dec_mm_counter_fast(mm, MM_SWAPENTS);
>  	pte = mk_pte(page, vma->vm_page_prot);
> -	if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
> +	if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) {
>  		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
>  		flags &= ~FAULT_FLAG_WRITE;
>  		ret |= VM_FAULT_WRITE;
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 83874ec..031713ab 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -922,18 +922,19 @@ out:
>   * to it.  And as a side-effect, free up its swap: because the old content
>   * on disk will never be read, and seeking back there to write new content
>   * later would only waste time away from clustering.
> + *
> + * NOTE: total_mapcount should not be relied upon by the caller if
> + * reuse_swap_page() returns false, but it may be always overwritten
> + * (see the other implementation for CONFIG_SWAP=n).
>   */
> -int reuse_swap_page(struct page *page)
> +bool reuse_swap_page(struct page *page, int *total_mapcount)
>  {
>  	int count;
>  
>  	VM_BUG_ON_PAGE(!PageLocked(page), page);
>  	if (unlikely(PageKsm(page)))
> -		return 0;
> -	/* The page is part of THP and cannot be reused */
> -	if (PageTransCompound(page))
> -		return 0;
> -	count = page_mapcount(page);
> +		return false;
> +	count = page_trans_huge_mapcount(page, total_mapcount);
>  	if (count <= 1 && PageSwapCache(page)) {
>  		count += page_swapcount(page);
>  		if (count == 1 && !PageWriteback(page)) {

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/3] mm: thp: calculate the mapcount correctly for THP pages during WP faults
@ 2016-05-06 22:10     ` Alex Williamson
  0 siblings, 0 replies; 27+ messages in thread
From: Alex Williamson @ 2016-05-06 22:10 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, linux-mm, linux-kernel, Kirill A. Shutemov

On Fri,  6 May 2016 17:03:58 +0200
Andrea Arcangeli <aarcange@redhat.com> wrote:

> This will provide fully accuracy to the mapcount calculation in the
> write protect faults, so page pinning will not get broken by false
> positive copy-on-writes.
> 
> total_mapcount() isn't the right calculation needed in
> reuse_swap_page(), so this introduces a page_trans_huge_mapcount()
> that is effectively the full accurate return value for page_mapcount()
> if dealing with Transparent Hugepages, however we only use the
> page_trans_huge_mapcount() during COW faults where it strictly needed,
> due to its higher runtime cost.
> 
> This also provide at practical zero cost the total_mapcount
> information which is needed to know if we can still relocate the page
> anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we
> can reuse the page no matter if it's a pte or a pmd_trans_huge
> triggering the fault, but we can only relocate the page anon_vma to
> the local vma->anon_vma if we're sure it's only this "vma" mapping the
> whole THP physical range.
> 
> Kirill A. Shutemov discovered the problem with moving the page
> anon_vma to the local vma->anon_vma in a previous version of this
> patch and another problem in the way page_move_anon_rmap() was called.
> 
> Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---

I've been running my original vfio test case on this for over 6 hours
with no sign of error.

Tested-by: Alex Williamson <alex.williamson@redhat.com>

Can we also get a stable tag for this for v4.5 please?  Thanks,

Alex

>  include/linux/mm.h   |  9 +++++++
>  include/linux/swap.h |  8 ++++---
>  mm/huge_memory.c     | 67 +++++++++++++++++++++++++++++++++++++++++++++-------
>  mm/memory.c          | 22 ++++++++++-------
>  mm/swapfile.c        | 13 +++++-----
>  5 files changed, 93 insertions(+), 26 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a55e5be..263f229 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -500,11 +500,20 @@ static inline int page_mapcount(struct page *page)
>  
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  int total_mapcount(struct page *page);
> +int page_trans_huge_mapcount(struct page *page, int *total_mapcount);
>  #else
>  static inline int total_mapcount(struct page *page)
>  {
>  	return page_mapcount(page);
>  }
> +static inline int page_trans_huge_mapcount(struct page *page,
> +					   int *total_mapcount)
> +{
> +	int mapcount = page_mapcount(page);
> +	if (total_mapcount)
> +		*total_mapcount = mapcount;
> +	return mapcount;
> +}
>  #endif
>  
>  static inline struct page *virt_to_head_page(const void *x)
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 2b83359..acef20d 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -418,7 +418,7 @@ extern sector_t swapdev_block(int, pgoff_t);
>  extern int page_swapcount(struct page *);
>  extern int swp_swapcount(swp_entry_t entry);
>  extern struct swap_info_struct *page_swap_info(struct page *);
> -extern int reuse_swap_page(struct page *);
> +extern bool reuse_swap_page(struct page *, int *);
>  extern int try_to_free_swap(struct page *);
>  struct backing_dev_info;
>  
> @@ -513,8 +513,10 @@ static inline int swp_swapcount(swp_entry_t entry)
>  	return 0;
>  }
>  
> -#define reuse_swap_page(page) \
> -	(!PageTransCompound(page) && page_mapcount(page) == 1)
> +static inline bool reuse_swap_page(struct page *page, int *total_mapcount)
> +{
> +	return page_trans_huge_mapcount(page, total_mapcount) == 1;
> +}
>  
>  static inline int try_to_free_swap(struct page *page)
>  {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 86f9f8b..9086793 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1298,15 +1298,9 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  	VM_BUG_ON_PAGE(!PageCompound(page) || !PageHead(page), page);
>  	/*
>  	 * We can only reuse the page if nobody else maps the huge page or it's
> -	 * part. We can do it by checking page_mapcount() on each sub-page, but
> -	 * it's expensive.
> -	 * The cheaper way is to check page_count() to be equal 1: every
> -	 * mapcount takes page reference reference, so this way we can
> -	 * guarantee, that the PMD is the only mapping.
> -	 * This can give false negative if somebody pinned the page, but that's
> -	 * fine.
> +	 * part.
>  	 */
> -	if (page_mapcount(page) == 1 && page_count(page) == 1) {
> +	if (page_trans_huge_mapcount(page, NULL) == 1) {
>  		pmd_t entry;
>  		entry = pmd_mkyoung(orig_pmd);
>  		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
> @@ -2080,7 +2074,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>  		if (pte_write(pteval)) {
>  			writable = true;
>  		} else {
> -			if (PageSwapCache(page) && !reuse_swap_page(page)) {
> +			if (PageSwapCache(page) &&
> +			    !reuse_swap_page(page, NULL)) {
>  				unlock_page(page);
>  				result = SCAN_SWAP_CACHE_PAGE;
>  				goto out;
> @@ -3225,6 +3220,60 @@ int total_mapcount(struct page *page)
>  }
>  
>  /*
> + * This calculates accurately how many mappings a transparent hugepage
> + * has (unlike page_mapcount() which isn't fully accurate). This full
> + * accuracy is primarily needed to know if copy-on-write faults can
> + * reuse the page and change the mapping to read-write instead of
> + * copying them. At the same time this returns the total_mapcount too.
> + *
> + * The function returns the highest mapcount any one of the subpages
> + * has. If the return value is one, even if different processes are
> + * mapping different subpages of the transparent hugepage, they can
> + * all reuse it, because each process is reusing a different subpage.
> + *
> + * The total_mapcount is instead counting all virtual mappings of the
> + * subpages. If the total_mapcount is equal to "one", it tells the
> + * caller all mappings belong to the same "mm" and in turn the
> + * anon_vma of the transparent hugepage can become the vma->anon_vma
> + * local one as no other process may be mapping any of the subpages.
> + *
> + * It would be more accurate to replace page_mapcount() with
> + * page_trans_huge_mapcount(), however we only use
> + * page_trans_huge_mapcount() in the copy-on-write faults where we
> + * need full accuracy to avoid breaking page pinning, because
> + * page_trans_huge_mapcount() is slower than page_mapcount().
> + */
> +int page_trans_huge_mapcount(struct page *page, int *total_mapcount)
> +{
> +	int i, ret, _total_mapcount, mapcount;
> +
> +	/* hugetlbfs shouldn't call it */
> +	VM_BUG_ON_PAGE(PageHuge(page), page);
> +
> +	if (likely(!PageTransCompound(page)))
> +		return atomic_read(&page->_mapcount) + 1;
> +
> +	page = compound_head(page);
> +
> +	_total_mapcount = ret = 0;
> +	for (i = 0; i < HPAGE_PMD_NR; i++) {
> +		mapcount = atomic_read(&page[i]._mapcount) + 1;
> +		ret = max(ret, mapcount);
> +		_total_mapcount += mapcount;
> +	}
> +	if (PageDoubleMap(page)) {
> +		ret -= 1;
> +		_total_mapcount -= HPAGE_PMD_NR;
> +	}
> +	mapcount = compound_mapcount(page);
> +	ret += mapcount;
> +	_total_mapcount += mapcount;
> +	if (total_mapcount)
> +		*total_mapcount = _total_mapcount;
> +	return ret;
> +}
> +
> +/*
>   * This function splits huge page into normal pages. @page can point to any
>   * subpage of huge page to split. Split doesn't change the position of @page.
>   *
> diff --git a/mm/memory.c b/mm/memory.c
> index 93897f2..d7a2af3 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2340,6 +2340,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  	 * not dirty accountable.
>  	 */
>  	if (PageAnon(old_page) && !PageKsm(old_page)) {
> +		int total_mapcount;
>  		if (!trylock_page(old_page)) {
>  			get_page(old_page);
>  			pte_unmap_unlock(page_table, ptl);
> @@ -2354,13 +2355,18 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  			}
>  			put_page(old_page);
>  		}
> -		if (reuse_swap_page(old_page)) {
> -			/*
> -			 * The page is all ours.  Move it to our anon_vma so
> -			 * the rmap code will not search our parent or siblings.
> -			 * Protected against the rmap code by the page lock.
> -			 */
> -			page_move_anon_rmap(old_page, vma, address);
> +		if (reuse_swap_page(old_page, &total_mapcount)) {
> +			if (total_mapcount == 1) {
> +				/*
> +				 * The page is all ours. Move it to
> +				 * our anon_vma so the rmap code will
> +				 * not search our parent or siblings.
> +				 * Protected against the rmap code by
> +				 * the page lock.
> +				 */
> +				page_move_anon_rmap(compound_head(old_page),
> +						    vma, address);
> +			}
>  			unlock_page(old_page);
>  			return wp_page_reuse(mm, vma, address, page_table, ptl,
>  					     orig_pte, old_page, 0, 0);
> @@ -2584,7 +2590,7 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  	inc_mm_counter_fast(mm, MM_ANONPAGES);
>  	dec_mm_counter_fast(mm, MM_SWAPENTS);
>  	pte = mk_pte(page, vma->vm_page_prot);
> -	if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
> +	if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) {
>  		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
>  		flags &= ~FAULT_FLAG_WRITE;
>  		ret |= VM_FAULT_WRITE;
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 83874ec..031713ab 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -922,18 +922,19 @@ out:
>   * to it.  And as a side-effect, free up its swap: because the old content
>   * on disk will never be read, and seeking back there to write new content
>   * later would only waste time away from clustering.
> + *
> + * NOTE: total_mapcount should not be relied upon by the caller if
> + * reuse_swap_page() returns false, but it may be always overwritten
> + * (see the other implementation for CONFIG_SWAP=n).
>   */
> -int reuse_swap_page(struct page *page)
> +bool reuse_swap_page(struct page *page, int *total_mapcount)
>  {
>  	int count;
>  
>  	VM_BUG_ON_PAGE(!PageLocked(page), page);
>  	if (unlikely(PageKsm(page)))
> -		return 0;
> -	/* The page is part of THP and cannot be reused */
> -	if (PageTransCompound(page))
> -		return 0;
> -	count = page_mapcount(page);
> +		return false;
> +	count = page_trans_huge_mapcount(page, total_mapcount);
>  	if (count <= 1 && PageSwapCache(page)) {
>  		count += page_swapcount(page);
>  		if (count == 1 && !PageWriteback(page)) {

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/3] mm: thp: calculate the mapcount correctly for THP pages during WP faults
  2016-05-06 15:03   ` Andrea Arcangeli
@ 2016-05-09 22:26     ` Andrew Morton
  -1 siblings, 0 replies; 27+ messages in thread
From: Andrew Morton @ 2016-05-09 22:26 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-mm, linux-kernel, Alex Williamson, Kirill A. Shutemov

On Fri,  6 May 2016 17:03:58 +0200 Andrea Arcangeli <aarcange@redhat.com> wrote:

> This will provide fully accuracy to the mapcount calculation in the
> write protect faults, so page pinning will not get broken by false
> positive copy-on-writes.
> 
> total_mapcount() isn't the right calculation needed in
> reuse_swap_page(), so this introduces a page_trans_huge_mapcount()
> that is effectively the full accurate return value for page_mapcount()
> if dealing with Transparent Hugepages, however we only use the
> page_trans_huge_mapcount() during COW faults where it strictly needed,
> due to its higher runtime cost.
> 
> This also provide at practical zero cost the total_mapcount
> information which is needed to know if we can still relocate the page
> anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we
> can reuse the page no matter if it's a pte or a pmd_trans_huge
> triggering the fault, but we can only relocate the page anon_vma to
> the local vma->anon_vma if we're sure it's only this "vma" mapping the
> whole THP physical range.
> 
> Kirill A. Shutemov discovered the problem with moving the page
> anon_vma to the local vma->anon_vma in a previous version of this
> patch and another problem in the way page_move_anon_rmap() was called.

x86_64 allnoconfig:

include/linux/swap.h: In function 'reuse_swap_page':
include/linux/swap.h:518: error: implicit declaration of function 'page_trans_huge_mapcount'
In file included from include/linux/suspend.h:8,
                 from arch/x86/kernel/asm-offsets.c:12:
include/linux/mm.h: At top level:
include/linux/mm.h:509: error: static declaration of 'page_trans_huge_mapcount' follows non-static declaration
include/linux/swap.h:518: note: previous implicit declaration of 'page_trans_huge_mapcount' was here

include ordering I assume.  mm.h vs swap.h.


I did the below (which is a bit dumb) but got


mm/built-in.o: In function `do_wp_page':
memory.c:(.text+0x194b3): undefined reference to `page_trans_huge_mapcount'

and gave up.



From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm-thp-calculate-the-mapcount-correctly-for-thp-pages-during-wp-faults-fix

Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: <stable@vger.kernel.org>	[4.5]
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mm.h   |   10 +---------
 include/linux/swap.h |    2 ++
 mm/huge_memory.c     |   12 ++++++++++++
 3 files changed, 15 insertions(+), 9 deletions(-)

--- a/include/linux/mm.h~mm-thp-calculate-the-mapcount-correctly-for-thp-pages-during-wp-faults-fix
+++ a/include/linux/mm.h
@@ -500,21 +500,13 @@ static inline int page_mapcount(struct p
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 int total_mapcount(struct page *page);
-int page_trans_huge_mapcount(struct page *page, int *total_mapcount);
 #else
 static inline int total_mapcount(struct page *page)
 {
 	return page_mapcount(page);
 }
-static inline int page_trans_huge_mapcount(struct page *page,
-					   int *total_mapcount)
-{
-	int mapcount = page_mapcount(page);
-	if (total_mapcount)
-		*total_mapcount = mapcount;
-	return mapcount;
-}
 #endif
+int page_trans_huge_mapcount(struct page *page, int *total_mapcount);
 
 static inline struct page *virt_to_head_page(const void *x)
 {
--- a/include/linux/swap.h~mm-thp-calculate-the-mapcount-correctly-for-thp-pages-during-wp-faults-fix
+++ a/include/linux/swap.h
@@ -248,6 +248,8 @@ struct swap_info_struct {
 	struct swap_cluster_info discard_cluster_tail; /* list tail of discard clusters */
 };
 
+int page_trans_huge_mapcount(struct page *page, int *total_mapcount);
+
 /* linux/mm/workingset.c */
 void *workingset_eviction(struct address_space *mapping, struct page *page);
 bool workingset_refault(void *shadow);
--- a/mm/huge_memory.c~mm-thp-calculate-the-mapcount-correctly-for-thp-pages-during-wp-faults-fix
+++ a/mm/huge_memory.c
@@ -3217,6 +3217,7 @@ int total_mapcount(struct page *page)
 	return ret;
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
  * This calculates accurately how many mappings a transparent hugepage
  * has (unlike page_mapcount() which isn't fully accurate). This full
@@ -3271,6 +3272,17 @@ int page_trans_huge_mapcount(struct page
 	return ret;
 }
 
+#else
+
+int page_trans_huge_mapcount(struct page *page, int *total_mapcount)
+{
+	int mapcount = page_mapcount(page);
+	if (total_mapcount)
+		*total_mapcount = mapcount;
+	return mapcount;
+}
+#endif
+
 /*
  * This function splits huge page into normal pages. @page can point to any
  * subpage of huge page to split. Split doesn't change the position of @page.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/1] mm: thp: calculate the mapcount correctly for THP pages during WP faults
  2016-05-06 15:03   ` Andrea Arcangeli
@ 2016-05-10 19:21     ` Andrea Arcangeli
  -1 siblings, 0 replies; 27+ messages in thread
From: Andrea Arcangeli @ 2016-05-10 19:21 UTC (permalink / raw)
  To: Andrew Morton, linux-mm, linux-kernel; +Cc: Alex Williamson, Kirill A. Shutemov

This will provide full accuracy to the mapcount calculation in
write protect faults, so page pinning will not get broken by false
positive copy-on-writes.

total_mapcount() isn't the right calculation needed in
reuse_swap_page(), so this introduces a page_trans_huge_mapcount()
that is effectively the fully accurate return value of page_mapcount()
when dealing with Transparent Hugepages; however, we only use
page_trans_huge_mapcount() during COW faults, where it is strictly
needed, due to its higher runtime cost.

This also provides, at practically zero cost, the total_mapcount
information, which is needed to know if we can still relocate the page
anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we
can reuse the page no matter whether it's a pte or a pmd_trans_huge
triggering the fault, but we can only relocate the page anon_vma to
the local vma->anon_vma if we're sure it's only this "vma" mapping the
whole THP physical range.
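
As an aside, here is a minimal user-space sketch of the max-vs-sum
distinction above (purely illustrative, not kernel code: the helper
name, the plain array and the hard-coded subpage count are invented
for the example). If a parent and a child each map a different half
of the THP with ptes, every subpage mapcount is 1, so the "reuse"
value is 1, while the sum is 512 and the anon_vma cannot become local:

#include <stdio.h>

#define NR_SUBPAGES 512	/* HPAGE_PMD_NR with 4k pages on x86-64 */

/*
 * Mimics only the max/sum bookkeeping over per-subpage mapcounts,
 * ignoring compound_mapcount() and PageDoubleMap().
 */
static int sketch_mapcount(const int *subpage_mapcount, int *total)
{
	int i, ret = 0, sum = 0;

	for (i = 0; i < NR_SUBPAGES; i++) {
		if (subpage_mapcount[i] > ret)
			ret = subpage_mapcount[i];
		sum += subpage_mapcount[i];
	}
	if (total)
		*total = sum;
	return ret;
}

int main(void)
{
	int mapcount[NR_SUBPAGES], total, i;

	/* the parent maps the first half, the child the second half */
	for (i = 0; i < NR_SUBPAGES; i++)
		mapcount[i] = 1;

	printf("reuse value %d, total %d\n",
	       sketch_mapcount(mapcount, &total), total);
	/* prints "reuse value 1, total 512" */
	return 0;
}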

Kirill A. Shutemov discovered the problem with moving the page
anon_vma to the local vma->anon_vma in a previous version of this
patch and another problem in the way page_move_anon_rmap() was called.

Andrew Morton discovered that a previous version wouldn't build with
CONFIG_SWAP=n, because reuse_swap_page must be a macro to be able to
call page_trans_huge_mapcount from swap.h, so this uses a macro again
instead of an inline function. With this change the usage is at least
less dangerous than before, because "page" is now used only once,
while with the previous code reuse_swap_page(page++) would have called
page_mapcount on page+1 and would have incremented page twice instead
of just once.
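
As another aside, a self-contained sketch of the double evaluation
hazard described above (the macro and function names are invented
stand-ins, not the kernel definitions):

#include <stdio.h>

static int fake_is_compound(int page) { (void)page; return 0; }
static int fake_mapcount(int page) { (void)page; return 1; }
static int fake_huge_mapcount(int page, int *total)
{
	(void)page;
	if (total)
		*total = 1;
	return 1;
}

/* old shape: the argument is expanded twice */
#define REUSE_OLD(p)	(!fake_is_compound(p) && fake_mapcount(p) == 1)
/* new shape: the argument is expanded exactly once */
#define REUSE_NEW(p, t)	(fake_huge_mapcount(p, t) == 1)

int main(void)
{
	int page = 0;

	(void)REUSE_OLD(page++);
	/* "page" was evaluated twice by a single call: prints 2 */
	printf("old macro left page at %d\n", page);

	page = 0;
	(void)REUSE_NEW(page++, NULL);
	/* "page" was evaluated once: prints 1 */
	printf("new macro left page at %d\n", page);
	return 0;
}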

Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 include/linux/mm.h   |  9 +++++++
 include/linux/swap.h |  6 ++---
 mm/huge_memory.c     | 67 +++++++++++++++++++++++++++++++++++++++++++++-------
 mm/memory.c          | 22 ++++++++++-------
 mm/swapfile.c        | 13 +++++-----
 5 files changed, 91 insertions(+), 26 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 864d722..8f468e0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -500,11 +500,20 @@ static inline int page_mapcount(struct page *page)
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 int total_mapcount(struct page *page);
+int page_trans_huge_mapcount(struct page *page, int *total_mapcount);
 #else
 static inline int total_mapcount(struct page *page)
 {
 	return page_mapcount(page);
 }
+static inline int page_trans_huge_mapcount(struct page *page,
+					   int *total_mapcount)
+{
+	int mapcount = page_mapcount(page);
+	if (total_mapcount)
+		*total_mapcount = mapcount;
+	return mapcount;
+}
 #endif
 
 static inline struct page *virt_to_head_page(const void *x)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 0a4cd47..ad22035 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -418,7 +418,7 @@ extern sector_t swapdev_block(int, pgoff_t);
 extern int page_swapcount(struct page *);
 extern int swp_swapcount(swp_entry_t entry);
 extern struct swap_info_struct *page_swap_info(struct page *);
-extern int reuse_swap_page(struct page *);
+extern bool reuse_swap_page(struct page *, int *);
 extern int try_to_free_swap(struct page *);
 struct backing_dev_info;
 
@@ -513,8 +513,8 @@ static inline int swp_swapcount(swp_entry_t entry)
 	return 0;
 }
 
-#define reuse_swap_page(page) \
-	(!PageTransCompound(page) && page_mapcount(page) == 1)
+#define reuse_swap_page(page, total_mapcount) \
+	(page_trans_huge_mapcount(page, total_mapcount) == 1)
 
 static inline int try_to_free_swap(struct page *page)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f7daa7d..cbd2142 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1298,15 +1298,9 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	VM_BUG_ON_PAGE(!PageCompound(page) || !PageHead(page), page);
 	/*
 	 * We can only reuse the page if nobody else maps the huge page or it's
-	 * part. We can do it by checking page_mapcount() on each sub-page, but
-	 * it's expensive.
-	 * The cheaper way is to check page_count() to be equal 1: every
-	 * mapcount takes page reference reference, so this way we can
-	 * guarantee, that the PMD is the only mapping.
-	 * This can give false negative if somebody pinned the page, but that's
-	 * fine.
+	 * part.
 	 */
-	if (page_mapcount(page) == 1 && page_count(page) == 1) {
+	if (page_trans_huge_mapcount(page, NULL) == 1) {
 		pmd_t entry;
 		entry = pmd_mkyoung(orig_pmd);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
@@ -2079,7 +2073,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		if (pte_write(pteval)) {
 			writable = true;
 		} else {
-			if (PageSwapCache(page) && !reuse_swap_page(page)) {
+			if (PageSwapCache(page) &&
+			    !reuse_swap_page(page, NULL)) {
 				unlock_page(page);
 				result = SCAN_SWAP_CACHE_PAGE;
 				goto out;
@@ -3223,6 +3218,60 @@ int total_mapcount(struct page *page)
 }
 
 /*
+ * This calculates accurately how many mappings a transparent hugepage
+ * has (unlike page_mapcount() which isn't fully accurate). This full
+ * accuracy is primarily needed to know if copy-on-write faults can
+ * reuse the page and change the mapping to read-write instead of
+ * copying them. At the same time this returns the total_mapcount too.
+ *
+ * The function returns the highest mapcount any one of the subpages
+ * has. If the return value is one, even if different processes are
+ * mapping different subpages of the transparent hugepage, they can
+ * all reuse it, because each process is reusing a different subpage.
+ *
+ * The total_mapcount is instead counting all virtual mappings of the
+ * subpages. If the total_mapcount is equal to "one", it tells the
+ * caller all mappings belong to the same "mm" and in turn the
+ * anon_vma of the transparent hugepage can become the vma->anon_vma
+ * local one as no other process may be mapping any of the subpages.
+ *
+ * It would be more accurate to replace page_mapcount() with
+ * page_trans_huge_mapcount(), however we only use
+ * page_trans_huge_mapcount() in the copy-on-write faults where we
+ * need full accuracy to avoid breaking page pinning, because
+ * page_trans_huge_mapcount() is slower than page_mapcount().
+ */
+int page_trans_huge_mapcount(struct page *page, int *total_mapcount)
+{
+	int i, ret, _total_mapcount, mapcount;
+
+	/* hugetlbfs shouldn't call it */
+	VM_BUG_ON_PAGE(PageHuge(page), page);
+
+	if (likely(!PageTransCompound(page)))
+		return atomic_read(&page->_mapcount) + 1;
+
+	page = compound_head(page);
+
+	_total_mapcount = ret = 0;
+	for (i = 0; i < HPAGE_PMD_NR; i++) {
+		mapcount = atomic_read(&page[i]._mapcount) + 1;
+		ret = max(ret, mapcount);
+		_total_mapcount += mapcount;
+	}
+	if (PageDoubleMap(page)) {
+		ret -= 1;
+		_total_mapcount -= HPAGE_PMD_NR;
+	}
+	mapcount = compound_mapcount(page);
+	ret += mapcount;
+	_total_mapcount += mapcount;
+	if (total_mapcount)
+		*total_mapcount = _total_mapcount;
+	return ret;
+}
+
+/*
  * This function splits huge page into normal pages. @page can point to any
  * subpage of huge page to split. Split doesn't change the position of @page.
  *
diff --git a/mm/memory.c b/mm/memory.c
index 52c218e..07493e3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2373,6 +2373,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * not dirty accountable.
 	 */
 	if (PageAnon(old_page) && !PageKsm(old_page)) {
+		int total_mapcount;
 		if (!trylock_page(old_page)) {
 			get_page(old_page);
 			pte_unmap_unlock(page_table, ptl);
@@ -2387,13 +2388,18 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			}
 			put_page(old_page);
 		}
-		if (reuse_swap_page(old_page)) {
-			/*
-			 * The page is all ours.  Move it to our anon_vma so
-			 * the rmap code will not search our parent or siblings.
-			 * Protected against the rmap code by the page lock.
-			 */
-			page_move_anon_rmap(old_page, vma, address);
+		if (reuse_swap_page(old_page, &total_mapcount)) {
+			if (total_mapcount == 1) {
+				/*
+				 * The page is all ours. Move it to
+				 * our anon_vma so the rmap code will
+				 * not search our parent or siblings.
+				 * Protected against the rmap code by
+				 * the page lock.
+				 */
+				page_move_anon_rmap(compound_head(old_page),
+						    vma, address);
+			}
 			unlock_page(old_page);
 			return wp_page_reuse(mm, vma, address, page_table, ptl,
 					     orig_pte, old_page, 0, 0);
@@ -2617,7 +2623,7 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	inc_mm_counter_fast(mm, MM_ANONPAGES);
 	dec_mm_counter_fast(mm, MM_SWAPENTS);
 	pte = mk_pte(page, vma->vm_page_prot);
-	if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
+	if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 		flags &= ~FAULT_FLAG_WRITE;
 		ret |= VM_FAULT_WRITE;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 83874ec..031713ab 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -922,18 +922,19 @@ out:
  * to it.  And as a side-effect, free up its swap: because the old content
  * on disk will never be read, and seeking back there to write new content
  * later would only waste time away from clustering.
+ *
+ * NOTE: total_mapcount should not be relied upon by the caller if
+ * reuse_swap_page() returns false, but it may be always overwritten
+ * (see the other implementation for CONFIG_SWAP=n).
  */
-int reuse_swap_page(struct page *page)
+bool reuse_swap_page(struct page *page, int *total_mapcount)
 {
 	int count;
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	if (unlikely(PageKsm(page)))
-		return 0;
-	/* The page is part of THP and cannot be reused */
-	if (PageTransCompound(page))
-		return 0;
-	count = page_mapcount(page);
+		return false;
+	count = page_trans_huge_mapcount(page, total_mapcount);
 	if (count <= 1 && PageSwapCache(page)) {
 		count += page_swapcount(page);
 		if (count == 1 && !PageWriteback(page)) {

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [1/1] mm: thp: calculate the mapcount correctly for THP pages during WP faults
@ 2016-05-11 21:26         ` Mike Marciniszyn
  0 siblings, 0 replies; 27+ messages in thread
From: Mike Marciniszyn @ 2016-05-11 21:26 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andrew Morton, linux-mm, linux-kernel, linux-rdma

>
>Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name>
>Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
>

Our RDMA tests are seeing an issue with memory locking that bisects to
commit 61f5d698cc97 ("mm: re-enable THP").

The test program registers two rather large MRs (512M) and RDMA writes
data to a passive peer using the first and RDMA reads it back into the
second MR and compares that data.  The sizes are chosen randomly between
0 and 1024 bytes.

The test will get through a few iterations (<= 4) and then hits a
compare error.

Tracing indicates the kernel logical addresses associated with the
individual pages at registration ARE correct, and the data in the
"RDMA read response only" packets ARE correct.

The “corruption” occurs when the packet crosses two pages that are not
physically contiguous.  The second page reads back as zero in the
program.

It looks like the user VA at the point of the compare error no longer points
to the same physical address as was registered.  

This patch totally resolves the issue!

Tested-by: Mike Marciniszyn <mike.marciniszy@intel.com>
Tested-by: Josh Collier <josh.d.collier@intel.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/1] mm: thp: calculate the mapcount correctly for THP pages during WP faults
@ 2016-05-12 16:32         ` Andrea Arcangeli
  0 siblings, 0 replies; 27+ messages in thread
From: Andrea Arcangeli @ 2016-05-12 16:32 UTC (permalink / raw)
  To: Andrew Morton, linux-mm, linux-kernel, linux-rdma
  Cc: Alex Williamson, Kirill A. Shutemov, Luick, Dean, Marciniszyn, Mike

This will provide full accuracy to the mapcount calculation in
write protect faults, so page pinning will not get broken by false
positive copy-on-writes.

total_mapcount() isn't the right calculation needed in
reuse_swap_page(), so this introduces a page_trans_huge_mapcount()
that is effectively the fully accurate return value of page_mapcount()
when dealing with Transparent Hugepages; however, we only use
page_trans_huge_mapcount() during COW faults, where it is strictly
needed, due to its higher runtime cost.

This also provides, at practically zero cost, the total_mapcount
information, which is needed to know if we can still relocate the page
anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we
can reuse the page no matter whether it's a pte or a pmd_trans_huge
triggering the fault, but we can only relocate the page anon_vma to
the local vma->anon_vma if we're sure it's only this "vma" mapping the
whole THP physical range.

Kirill A. Shutemov discovered the problem with moving the page
anon_vma to the local vma->anon_vma in a previous version of this
patch and another problem in the way page_move_anon_rmap() was called.

Andrew Morton discovered that a previous version wouldn't build with
CONFIG_SWAP=n, because reuse_swap_page must be a macro to be able to
call page_trans_huge_mapcount from swap.h, so this uses a macro again
instead of an inline function. With this change the usage is at least
less dangerous than before, because "page" is now used only once,
while with the previous code reuse_swap_page(page++) would have called
page_mapcount on page+1 and would have incremented page twice instead
of just once.

Dean Luick noticed an uninitialized variable in a previous version
that could result in an rmap inefficiency for the non-THP case.

Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 include/linux/mm.h   |  9 +++++++
 include/linux/swap.h |  6 ++---
 mm/huge_memory.c     | 71 +++++++++++++++++++++++++++++++++++++++++++++-------
 mm/memory.c          | 22 ++++++++++------
 mm/swapfile.c        | 13 +++++-----
 5 files changed, 95 insertions(+), 26 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 864d722..8f468e0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -500,11 +500,20 @@ static inline int page_mapcount(struct page *page)
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 int total_mapcount(struct page *page);
+int page_trans_huge_mapcount(struct page *page, int *total_mapcount);
 #else
 static inline int total_mapcount(struct page *page)
 {
 	return page_mapcount(page);
 }
+static inline int page_trans_huge_mapcount(struct page *page,
+					   int *total_mapcount)
+{
+	int mapcount = page_mapcount(page);
+	if (total_mapcount)
+		*total_mapcount = mapcount;
+	return mapcount;
+}
 #endif
 
 static inline struct page *virt_to_head_page(const void *x)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 0a4cd47..ad22035 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -418,7 +418,7 @@ extern sector_t swapdev_block(int, pgoff_t);
 extern int page_swapcount(struct page *);
 extern int swp_swapcount(swp_entry_t entry);
 extern struct swap_info_struct *page_swap_info(struct page *);
-extern int reuse_swap_page(struct page *);
+extern bool reuse_swap_page(struct page *, int *);
 extern int try_to_free_swap(struct page *);
 struct backing_dev_info;
 
@@ -513,8 +513,8 @@ static inline int swp_swapcount(swp_entry_t entry)
 	return 0;
 }
 
-#define reuse_swap_page(page) \
-	(!PageTransCompound(page) && page_mapcount(page) == 1)
+#define reuse_swap_page(page, total_mapcount) \
+	(page_trans_huge_mapcount(page, total_mapcount) == 1)
 
 static inline int try_to_free_swap(struct page *page)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f7daa7d..b49ee12 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1298,15 +1298,9 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	VM_BUG_ON_PAGE(!PageCompound(page) || !PageHead(page), page);
 	/*
 	 * We can only reuse the page if nobody else maps the huge page or it's
-	 * part. We can do it by checking page_mapcount() on each sub-page, but
-	 * it's expensive.
-	 * The cheaper way is to check page_count() to be equal 1: every
-	 * mapcount takes page reference reference, so this way we can
-	 * guarantee, that the PMD is the only mapping.
-	 * This can give false negative if somebody pinned the page, but that's
-	 * fine.
+	 * part.
 	 */
-	if (page_mapcount(page) == 1 && page_count(page) == 1) {
+	if (page_trans_huge_mapcount(page, NULL) == 1) {
 		pmd_t entry;
 		entry = pmd_mkyoung(orig_pmd);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
@@ -2079,7 +2073,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		if (pte_write(pteval)) {
 			writable = true;
 		} else {
-			if (PageSwapCache(page) && !reuse_swap_page(page)) {
+			if (PageSwapCache(page) &&
+			    !reuse_swap_page(page, NULL)) {
 				unlock_page(page);
 				result = SCAN_SWAP_CACHE_PAGE;
 				goto out;
@@ -3223,6 +3218,64 @@ int total_mapcount(struct page *page)
 }
 
 /*
+ * This calculates accurately how many mappings a transparent hugepage
+ * has (unlike page_mapcount() which isn't fully accurate). This full
+ * accuracy is primarily needed to know if copy-on-write faults can
+ * reuse the page and change the mapping to read-write instead of
+ * copying them. At the same time this returns the total_mapcount too.
+ *
+ * The function returns the highest mapcount any one of the subpages
+ * has. If the return value is one, even if different processes are
+ * mapping different subpages of the transparent hugepage, they can
+ * all reuse it, because each process is reusing a different subpage.
+ *
+ * The total_mapcount is instead counting all virtual mappings of the
+ * subpages. If the total_mapcount is equal to "one", it tells the
+ * caller all mappings belong to the same "mm" and in turn the
+ * anon_vma of the transparent hugepage can become the vma->anon_vma
+ * local one as no other process may be mapping any of the subpages.
+ *
+ * It would be more accurate to replace page_mapcount() with
+ * page_trans_huge_mapcount(), however we only use
+ * page_trans_huge_mapcount() in the copy-on-write faults where we
+ * need full accuracy to avoid breaking page pinning, because
+ * page_trans_huge_mapcount() is slower than page_mapcount().
+ */
+int page_trans_huge_mapcount(struct page *page, int *total_mapcount)
+{
+	int i, ret, _total_mapcount, mapcount;
+
+	/* hugetlbfs shouldn't call it */
+	VM_BUG_ON_PAGE(PageHuge(page), page);
+
+	if (likely(!PageTransCompound(page))) {
+		mapcount = atomic_read(&page->_mapcount) + 1;
+		if (total_mapcount)
+			*total_mapcount = mapcount;
+		return mapcount;
+	}
+
+	page = compound_head(page);
+
+	_total_mapcount = ret = 0;
+	for (i = 0; i < HPAGE_PMD_NR; i++) {
+		mapcount = atomic_read(&page[i]._mapcount) + 1;
+		ret = max(ret, mapcount);
+		_total_mapcount += mapcount;
+	}
+	if (PageDoubleMap(page)) {
+		ret -= 1;
+		_total_mapcount -= HPAGE_PMD_NR;
+	}
+	mapcount = compound_mapcount(page);
+	ret += mapcount;
+	_total_mapcount += mapcount;
+	if (total_mapcount)
+		*total_mapcount = _total_mapcount;
+	return ret;
+}
+
+/*
  * This function splits huge page into normal pages. @page can point to any
  * subpage of huge page to split. Split doesn't change the position of @page.
  *
diff --git a/mm/memory.c b/mm/memory.c
index 52c218e..07493e3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2373,6 +2373,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * not dirty accountable.
 	 */
 	if (PageAnon(old_page) && !PageKsm(old_page)) {
+		int total_mapcount;
 		if (!trylock_page(old_page)) {
 			get_page(old_page);
 			pte_unmap_unlock(page_table, ptl);
@@ -2387,13 +2388,18 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			}
 			put_page(old_page);
 		}
-		if (reuse_swap_page(old_page)) {
-			/*
-			 * The page is all ours.  Move it to our anon_vma so
-			 * the rmap code will not search our parent or siblings.
-			 * Protected against the rmap code by the page lock.
-			 */
-			page_move_anon_rmap(old_page, vma, address);
+		if (reuse_swap_page(old_page, &total_mapcount)) {
+			if (total_mapcount == 1) {
+				/*
+				 * The page is all ours. Move it to
+				 * our anon_vma so the rmap code will
+				 * not search our parent or siblings.
+				 * Protected against the rmap code by
+				 * the page lock.
+				 */
+				page_move_anon_rmap(compound_head(old_page),
+						    vma, address);
+			}
 			unlock_page(old_page);
 			return wp_page_reuse(mm, vma, address, page_table, ptl,
 					     orig_pte, old_page, 0, 0);
@@ -2617,7 +2623,7 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	inc_mm_counter_fast(mm, MM_ANONPAGES);
 	dec_mm_counter_fast(mm, MM_SWAPENTS);
 	pte = mk_pte(page, vma->vm_page_prot);
-	if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
+	if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 		flags &= ~FAULT_FLAG_WRITE;
 		ret |= VM_FAULT_WRITE;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 83874ec..031713ab 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -922,18 +922,19 @@ out:
  * to it.  And as a side-effect, free up its swap: because the old content
  * on disk will never be read, and seeking back there to write new content
  * later would only waste time away from clustering.
+ *
+ * NOTE: total_mapcount should not be relied upon by the caller if
+ * reuse_swap_page() returns false, but it may be always overwritten
+ * (see the other implementation for CONFIG_SWAP=n).
  */
-int reuse_swap_page(struct page *page)
+bool reuse_swap_page(struct page *page, int *total_mapcount)
 {
 	int count;
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	if (unlikely(PageKsm(page)))
-		return 0;
-	/* The page is part of THP and cannot be reused */
-	if (PageTransCompound(page))
-		return 0;
-	count = page_mapcount(page);
+		return false;
+	count = page_trans_huge_mapcount(page, total_mapcount);
 	if (count <= 1 && PageSwapCache(page)) {
 		count += page_swapcount(page);
 		if (count == 1 && !PageWriteback(page)) {

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 1/1] mm: thp: calculate the mapcount correctly for THP pages during WP faults
@ 2016-05-12 16:32         ` Andrea Arcangeli
  0 siblings, 0 replies; 27+ messages in thread
From: Andrea Arcangeli @ 2016-05-12 16:32 UTC (permalink / raw)
  To: Andrew Morton, linux-mm, linux-kernel, linux-rdma
  Cc: Alex Williamson, Kirill A. Shutemov, Luick, Dean, Marciniszyn, Mike

This provides full accuracy in the mapcount calculation during write
protect faults, so page pinning will not get broken by false positive
copy-on-writes.

total_mapcount() isn't the right calculation for reuse_swap_page(), so
this introduces page_trans_huge_mapcount(), which is effectively the
fully accurate equivalent of page_mapcount() when dealing with
Transparent Hugepages. However, page_trans_huge_mapcount() is only
used during COW faults, where full accuracy is strictly needed,
because of its higher runtime cost.

This also provides, at practically zero cost, the total_mapcount
information, which is needed to know whether the page anon_vma can
still be relocated to the local vma. If page_trans_huge_mapcount()
returns 1 we can reuse the page no matter whether it's a pte or a
pmd_trans_huge that triggered the fault, but we can only relocate the
page anon_vma to the local vma->anon_vma if we're sure this "vma" is
the only one mapping the whole THP physical range.
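
As a minimal sketch of the resulting copy-on-write policy (kernel
context assumed; try_reuse_anon_thp() is a hypothetical helper
invented here for illustration, it is not part of the patch):

	#include <linux/mm.h>
	#include <linux/rmap.h>
	#include <linux/swap.h>

	/* "page" must be locked, as reuse_swap_page() asserts. */
	static bool try_reuse_anon_thp(struct page *page,
				       struct vm_area_struct *vma,
				       unsigned long address)
	{
		int total_mapcount;

		/*
		 * reuse_swap_page() now returns true when no other
		 * process maps (or holds a swap reference to) the
		 * faulting subpage, so the write may reuse it.
		 */
		if (!reuse_swap_page(page, &total_mapcount))
			return false;	/* somebody else maps it: copy */

		/*
		 * Relocating the anon_vma to the local vma->anon_vma
		 * is only safe when the *total* mapcount is 1, i.e.
		 * no other mm maps any subpage of the THP.
		 */
		if (total_mapcount == 1)
			page_move_anon_rmap(compound_head(page), vma,
					    address);

		return true;	/* write-enable the existing page */
	}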

In a previous version of this patch, Kirill A. Shutemov discovered a
problem with moving the page anon_vma to the local vma->anon_vma, and
another problem in the way page_move_anon_rmap() was called.

Andrew Morton discovered that a previous version wouldn't build with
CONFIG_SWAP=n, because reuse_swap_page() must be a macro in order to
call page_trans_huge_mapcount() from swap.h, so this uses a macro
again instead of an inline function. At least with this change the
macro is less dangerous than it was before, because "page" is now
expanded only once: with the previous code, reuse_swap_page(page++)
would have called page_mapcount() on page+1 and incremented page
twice instead of just once.
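
As a stand-alone user-space illustration of the double expansion
hazard just described (types and helpers are stubbed out, nothing
below is kernel code):

	#include <stdio.h>

	struct page { int mapcount; };

	static int PageTransCompound(struct page *p) { (void)p; return 0; }
	static int page_mapcount(struct page *p) { return p->mapcount; }

	/* old CONFIG_SWAP=n shape: "page" expands twice */
	#define reuse_swap_page_old(page) \
		(!PageTransCompound(page) && page_mapcount(page) == 1)

	int main(void)
	{
		struct page pages[2] = { { .mapcount = 1 }, { .mapcount = 5 } };
		struct page *page = pages;

		/*
		 * "page" is evaluated twice: page_mapcount() ends up
		 * looking at pages[1] and the pointer advances by two.
		 */
		int reusable = reuse_swap_page_old(page++);

		printf("reusable=%d, advanced by %td\n",
		       reusable, page - pages);
		return 0;
	}

The new single-expansion macro shape avoids both effects.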

In a previous version, Dean Luick noticed an uninitialized variable
that could result in an rmap inefficiency for the non-THP case.

Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 include/linux/mm.h   |  9 +++++++
 include/linux/swap.h |  6 ++---
 mm/huge_memory.c     | 71 +++++++++++++++++++++++++++++++++++++++++++++-------
 mm/memory.c          | 22 ++++++++++------
 mm/swapfile.c        | 13 +++++-----
 5 files changed, 95 insertions(+), 26 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 864d722..8f468e0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -500,11 +500,20 @@ static inline int page_mapcount(struct page *page)
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 int total_mapcount(struct page *page);
+int page_trans_huge_mapcount(struct page *page, int *total_mapcount);
 #else
 static inline int total_mapcount(struct page *page)
 {
 	return page_mapcount(page);
 }
+static inline int page_trans_huge_mapcount(struct page *page,
+					   int *total_mapcount)
+{
+	int mapcount = page_mapcount(page);
+	if (total_mapcount)
+		*total_mapcount = mapcount;
+	return mapcount;
+}
 #endif
 
 static inline struct page *virt_to_head_page(const void *x)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 0a4cd47..ad22035 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -418,7 +418,7 @@ extern sector_t swapdev_block(int, pgoff_t);
 extern int page_swapcount(struct page *);
 extern int swp_swapcount(swp_entry_t entry);
 extern struct swap_info_struct *page_swap_info(struct page *);
-extern int reuse_swap_page(struct page *);
+extern bool reuse_swap_page(struct page *, int *);
 extern int try_to_free_swap(struct page *);
 struct backing_dev_info;
 
@@ -513,8 +513,8 @@ static inline int swp_swapcount(swp_entry_t entry)
 	return 0;
 }
 
-#define reuse_swap_page(page) \
-	(!PageTransCompound(page) && page_mapcount(page) == 1)
+#define reuse_swap_page(page, total_mapcount) \
+	(page_trans_huge_mapcount(page, total_mapcount) == 1)
 
 static inline int try_to_free_swap(struct page *page)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f7daa7d..b49ee12 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1298,15 +1298,9 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	VM_BUG_ON_PAGE(!PageCompound(page) || !PageHead(page), page);
 	/*
 	 * We can only reuse the page if nobody else maps the huge page or it's
-	 * part. We can do it by checking page_mapcount() on each sub-page, but
-	 * it's expensive.
-	 * The cheaper way is to check page_count() to be equal 1: every
-	 * mapcount takes page reference reference, so this way we can
-	 * guarantee, that the PMD is the only mapping.
-	 * This can give false negative if somebody pinned the page, but that's
-	 * fine.
+	 * part.
 	 */
-	if (page_mapcount(page) == 1 && page_count(page) == 1) {
+	if (page_trans_huge_mapcount(page, NULL) == 1) {
 		pmd_t entry;
 		entry = pmd_mkyoung(orig_pmd);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
@@ -2079,7 +2073,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		if (pte_write(pteval)) {
 			writable = true;
 		} else {
-			if (PageSwapCache(page) && !reuse_swap_page(page)) {
+			if (PageSwapCache(page) &&
+			    !reuse_swap_page(page, NULL)) {
 				unlock_page(page);
 				result = SCAN_SWAP_CACHE_PAGE;
 				goto out;
@@ -3223,6 +3218,64 @@ int total_mapcount(struct page *page)
 }
 
 /*
+ * This calculates accurately how many mappings a transparent hugepage
+ * has (unlike page_mapcount() which isn't fully accurate). This full
+ * accuracy is primarily needed to know if copy-on-write faults can
+ * reuse the page and change the mapping to read-write instead of
+ * copying them. At the same time this returns the total_mapcount too.
+ *
+ * The function returns the highest mapcount any one of the subpages
+ * has. If the return value is one, even if different processes are
+ * mapping different subpages of the transparent hugepage, they can
+ * all reuse it, because each process is reusing a different subpage.
+ *
+ * The total_mapcount is instead counting all virtual mappings of the
+ * subpages. If the total_mapcount is equal to "one", it tells the
+ * caller all mappings belong to the same "mm" and in turn the
+ * anon_vma of the transparent hugepage can become the vma->anon_vma
+ * local one as no other process may be mapping any of the subpages.
+ *
+ * It would be more accurate to replace page_mapcount() with
+ * page_trans_huge_mapcount(), however we only use
+ * page_trans_huge_mapcount() in the copy-on-write faults where we
+ * need full accuracy to avoid breaking page pinning, because
+ * page_trans_huge_mapcount() is slower than page_mapcount().
+ */
+int page_trans_huge_mapcount(struct page *page, int *total_mapcount)
+{
+	int i, ret, _total_mapcount, mapcount;
+
+	/* hugetlbfs shouldn't call it */
+	VM_BUG_ON_PAGE(PageHuge(page), page);
+
+	if (likely(!PageTransCompound(page))) {
+		mapcount = atomic_read(&page->_mapcount) + 1;
+		if (total_mapcount)
+			*total_mapcount = mapcount;
+		return mapcount;
+	}
+
+	page = compound_head(page);
+
+	_total_mapcount = ret = 0;
+	for (i = 0; i < HPAGE_PMD_NR; i++) {
+		mapcount = atomic_read(&page[i]._mapcount) + 1;
+		ret = max(ret, mapcount);
+		_total_mapcount += mapcount;
+	}
+	if (PageDoubleMap(page)) {
+		ret -= 1;
+		_total_mapcount -= HPAGE_PMD_NR;
+	}
+	mapcount = compound_mapcount(page);
+	ret += mapcount;
+	_total_mapcount += mapcount;
+	if (total_mapcount)
+		*total_mapcount = _total_mapcount;
+	return ret;
+}
+
+/*
  * This function splits huge page into normal pages. @page can point to any
  * subpage of huge page to split. Split doesn't change the position of @page.
  *
diff --git a/mm/memory.c b/mm/memory.c
index 52c218e..07493e3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2373,6 +2373,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * not dirty accountable.
 	 */
 	if (PageAnon(old_page) && !PageKsm(old_page)) {
+		int total_mapcount;
 		if (!trylock_page(old_page)) {
 			get_page(old_page);
 			pte_unmap_unlock(page_table, ptl);
@@ -2387,13 +2388,18 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			}
 			put_page(old_page);
 		}
-		if (reuse_swap_page(old_page)) {
-			/*
-			 * The page is all ours.  Move it to our anon_vma so
-			 * the rmap code will not search our parent or siblings.
-			 * Protected against the rmap code by the page lock.
-			 */
-			page_move_anon_rmap(old_page, vma, address);
+		if (reuse_swap_page(old_page, &total_mapcount)) {
+			if (total_mapcount == 1) {
+				/*
+				 * The page is all ours. Move it to
+				 * our anon_vma so the rmap code will
+				 * not search our parent or siblings.
+				 * Protected against the rmap code by
+				 * the page lock.
+				 */
+				page_move_anon_rmap(compound_head(old_page),
+						    vma, address);
+			}
 			unlock_page(old_page);
 			return wp_page_reuse(mm, vma, address, page_table, ptl,
 					     orig_pte, old_page, 0, 0);
@@ -2617,7 +2623,7 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	inc_mm_counter_fast(mm, MM_ANONPAGES);
 	dec_mm_counter_fast(mm, MM_SWAPENTS);
 	pte = mk_pte(page, vma->vm_page_prot);
-	if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
+	if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 		flags &= ~FAULT_FLAG_WRITE;
 		ret |= VM_FAULT_WRITE;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 83874ec..031713ab 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -922,18 +922,19 @@ out:
  * to it.  And as a side-effect, free up its swap: because the old content
  * on disk will never be read, and seeking back there to write new content
  * later would only waste time away from clustering.
+ *
+ * NOTE: total_mapcount should not be relied upon by the caller if
+ * reuse_swap_page() returns false, but it may be always overwritten
+ * (see the other implementation for CONFIG_SWAP=n).
  */
-int reuse_swap_page(struct page *page)
+bool reuse_swap_page(struct page *page, int *total_mapcount)
 {
 	int count;
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	if (unlikely(PageKsm(page)))
-		return 0;
-	/* The page is part of THP and cannot be reused */
-	if (PageTransCompound(page))
-		return 0;
-	count = page_mapcount(page);
+		return false;
+	count = page_trans_huge_mapcount(page, total_mapcount);
 	if (count <= 1 && PageSwapCache(page)) {
 		count += page_swapcount(page);
 		if (count == 1 && !PageWriteback(page)) {

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [1/1] mm: thp: calculate the mapcount correctly for THP pages during WP faults
@ 2016-05-12 18:48             ` Mike Marciniszyn
  0 siblings, 0 replies; 27+ messages in thread
From: Mike Marciniszyn @ 2016-05-12 18:48 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, linux-mm, linux-kernel, linux-rdma,
	Alex Williamson, Kirill A. Shutemov, Luick, Dean

>
>Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name>
>Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
>

Andrea,

Perhaps add a v<n> to the subject for subsequent submissions, as well
as a version changelog in the email after the ---.
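
For example (illustrative only, not an actual submission):

    Subject: [PATCH v2 1/1] mm: thp: calculate the mapcount correctly for THP pages during WP faults

    <commit message>

    Signed-off-by: ...
    ---
    v2: pass compound_head() to page_move_anon_rmap() and only relocate
        the anon_vma when total_mapcount == 1

     include/linux/mm.h | ...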

I happened to know the differences, but others might not.

This patch also solves the >= 4.5-rc1 IB user memory registration thp bug
that results in memory corruption!

Reviewed-by: Dean Luick <dean.luick@intel.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Tested-by: Josh Collier <josh.d.collier@intel.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-05-12 18:48 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-06 15:03 [PATCH 0/3] mm: thp: mapcount updates Andrea Arcangeli
2016-05-06 15:03 ` Andrea Arcangeli
2016-05-06 15:03 ` [PATCH 1/3] mm: thp: calculate the mapcount correctly for THP pages during WP faults Andrea Arcangeli
2016-05-06 15:03   ` Andrea Arcangeli
2016-05-06 22:10   ` Alex Williamson
2016-05-06 22:10     ` Alex Williamson
2016-05-09 22:26   ` Andrew Morton
2016-05-09 22:26     ` Andrew Morton
2016-05-10 19:21   ` [PATCH 1/1] " Andrea Arcangeli
2016-05-10 19:21     ` Andrea Arcangeli
     [not found]     ` <1462908082-12657-1-git-send-email-aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-05-11 21:26       ` [1/1] " Mike Marciniszyn
2016-05-11 21:26         ` Mike Marciniszyn
2016-05-11 21:26         ` Mike Marciniszyn
2016-05-12 16:32       ` [PATCH 1/1] " Andrea Arcangeli
2016-05-12 16:32         ` Andrea Arcangeli
2016-05-12 16:32         ` Andrea Arcangeli
     [not found]         ` <1463070742-18401-1-git-send-email-aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-05-12 18:48           ` [1/1] " Mike Marciniszyn
2016-05-12 18:48             ` Mike Marciniszyn
2016-05-12 18:48             ` Mike Marciniszyn
2016-05-06 15:03 ` [PATCH 2/3] mm: thp: microoptimize compound_mapcount() Andrea Arcangeli
2016-05-06 15:03   ` Andrea Arcangeli
2016-05-06 17:33   ` Kirill A. Shutemov
2016-05-06 17:33     ` Kirill A. Shutemov
2016-05-06 15:04 ` [PATCH 3/3] mm: thp: split_huge_pmd_address() comment improvement Andrea Arcangeli
2016-05-06 15:04   ` Andrea Arcangeli
2016-05-06 17:33   ` Kirill A. Shutemov
2016-05-06 17:33     ` Kirill A. Shutemov
