* [PATCH 0/4] thp: simplify freeze_page() and unfreeze_page()
@ 2016-02-03 15:14 ` Kirill A. Shutemov
  0 siblings, 0 replies; 28+ messages in thread
From: Kirill A. Shutemov @ 2016-02-03 15:14 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli
  Cc: Hugh Dickins, Dave Hansen, Mel Gorman, Rik van Riel,
	Vlastimil Babka, Christoph Lameter, Naoya Horiguchi,
	Steve Capper, Aneesh Kumar K.V, Johannes Weiner, Michal Hocko,
	Jerome Marchand, Sasha Levin, linux-kernel, linux-mm,
	Kirill A. Shutemov

This patchset rewrites freeze_page() and unfreeze_page() using try_to_unmap()
and remove_migration_ptes(). The result is much simpler, but somewhat slower.
See the last patch for details.

I did a quick sanity check. More testing is required.

Any comments?

Kirill A. Shutemov (4):
  rmap: introduce rmap_walk_locked()
  rmap: extend try_to_unmap() to be usable by split_huge_page()
  mm: make remove_migration_ptes() available beyond mm/migration.c
  thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers

 include/linux/huge_mm.h |   7 ++
 include/linux/rmap.h    |   6 ++
 mm/huge_memory.c        | 219 ++++++------------------------------------------
 mm/migrate.c            |  13 +--
 mm/rmap.c               |  49 ++++++++---
 5 files changed, 83 insertions(+), 211 deletions(-)

-- 
2.7.0

* [PATCH 1/4] rmap: introduce rmap_walk_locked()
  2016-02-03 15:14 ` Kirill A. Shutemov
@ 2016-02-03 15:14   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 28+ messages in thread
From: Kirill A. Shutemov @ 2016-02-03 15:14 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli
  Cc: Hugh Dickins, Dave Hansen, Mel Gorman, Rik van Riel,
	Vlastimil Babka, Christoph Lameter, Naoya Horiguchi,
	Steve Capper, Aneesh Kumar K.V, Johannes Weiner, Michal Hocko,
	Jerome Marchand, Sasha Levin, linux-kernel, linux-mm,
	Kirill A. Shutemov

rmap_walk_locked() is the same as rmap_walk(), but the caller takes care
of the relevant rmap lock. It only supports anonymous pages for now.

This is preparation for switching THP splitting from the custom rmap walk
in freeze_page()/unfreeze_page() to the generic one.
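
For illustration only (not part of the patch): a minimal sketch of how a
caller that already holds the anon_vma lock could use the new interface.
The function and callback names below are made up.

	static int my_rmap_one(struct page *page, struct vm_area_struct *vma,
			unsigned long address, void *arg)
	{
		/* inspect or update the mapping of 'page' at 'address' */
		return SWAP_AGAIN;	/* keep walking */
	}

	static void walk_with_lock_held(struct page *page)
	{
		struct rmap_walk_control rwc = {
			.rmap_one = my_rmap_one,
		};

		/*
		 * The caller already took the anon_vma lock (e.g. via
		 * page_lock_anon_vma_read()), so use the _locked variant,
		 * which does not take or release the lock itself.
		 */
		rmap_walk_locked(page, &rwc);
	}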

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/rmap.h |  1 +
 mm/rmap.c            | 25 +++++++++++++++++++++----
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index bdf597c4f0be..23a03fbeef61 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -280,6 +280,7 @@ struct rmap_walk_control {
 };
 
 int rmap_walk(struct page *page, struct rmap_walk_control *rwc);
+int rmap_walk_locked(struct page *page, struct rmap_walk_control *rwc);
 
 #else	/* !CONFIG_MMU */
 
diff --git a/mm/rmap.c b/mm/rmap.c
index 79f3bf047f38..a9cffb784502 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1719,14 +1719,21 @@ static struct anon_vma *rmap_walk_anon_lock(struct page *page,
  * vm_flags for that VMA.  That should be OK, because that vma shouldn't be
  * LOCKED.
  */
-static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc)
+static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc,
+		bool locked)
 {
 	struct anon_vma *anon_vma;
 	pgoff_t pgoff;
 	struct anon_vma_chain *avc;
 	int ret = SWAP_AGAIN;
 
-	anon_vma = rmap_walk_anon_lock(page, rwc);
+	if (locked) {
+		anon_vma = page_anon_vma(page);
+		/* anon_vma disappear under us? */
+		VM_BUG_ON_PAGE(!anon_vma, page);
+	} else {
+		anon_vma = rmap_walk_anon_lock(page, rwc);
+	}
 	if (!anon_vma)
 		return ret;
 
@@ -1746,7 +1753,9 @@ static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc)
 		if (rwc->done && rwc->done(page))
 			break;
 	}
-	anon_vma_unlock_read(anon_vma);
+
+	if (!locked)
+		anon_vma_unlock_read(anon_vma);
 	return ret;
 }
 
@@ -1808,11 +1817,19 @@ int rmap_walk(struct page *page, struct rmap_walk_control *rwc)
 	if (unlikely(PageKsm(page)))
 		return rmap_walk_ksm(page, rwc);
 	else if (PageAnon(page))
-		return rmap_walk_anon(page, rwc);
+		return rmap_walk_anon(page, rwc, false);
 	else
 		return rmap_walk_file(page, rwc);
 }
 
+/* Like rmap_walk, but caller holds relevant rmap lock */
+int rmap_walk_locked(struct page *page, struct rmap_walk_control *rwc)
+{
+	/* only for anon pages for now */
+	VM_BUG_ON_PAGE(!PageAnon(page) || PageKsm(page), page);
+	return rmap_walk_anon(page, rwc, true);
+}
+
 #ifdef CONFIG_HUGETLB_PAGE
 /*
  * The following three functions are for anonymous (private mapped) hugepages.
-- 
2.7.0

* [PATCH 2/4] rmap: extend try_to_unmap() to be usable by split_huge_page()
  2016-02-03 15:14 ` Kirill A. Shutemov
@ 2016-02-03 15:14   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 28+ messages in thread
From: Kirill A. Shutemov @ 2016-02-03 15:14 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli
  Cc: Hugh Dickins, Dave Hansen, Mel Gorman, Rik van Riel,
	Vlastimil Babka, Christoph Lameter, Naoya Horiguchi,
	Steve Capper, Aneesh Kumar K.V, Johannes Weiner, Michal Hocko,
	Jerome Marchand, Sasha Levin, linux-kernel, linux-mm,
	Kirill A. Shutemov

The patch adds support for two ttu_flags:

  - TTU_SPLIT_HUGE_PMD splits the PMD, if one is present, before trying to
    unmap the page;

  - TTU_RMAP_LOCKED indicates that the caller holds the relevant rmap lock.

Apart from these flags, the patch changes rwc->done to !page_mapcount()
instead of !page_mapped(). try_to_unmap() works at the pte level, so we are
really interested in whether this small page is mapped, not the compound
page it is part of.
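
For illustration only, a sketch that mirrors freeze_page() from the last
patch of the series: the caller holds the anon_vma lock (hence
TTU_RMAP_LOCKED), and only the first call needs TTU_SPLIT_HUGE_PMD, since
splitting the PMD once covers the whole compound page. The helper name is
made up.

	static void freeze_thp_sketch(struct page *head)
	{
		enum ttu_flags ttu_flags = TTU_MIGRATION | TTU_IGNORE_MLOCK |
				TTU_IGNORE_ACCESS | TTU_RMAP_LOCKED;
		int i, ret;

		/* The first call also splits a huge PMD mapping, if any */
		ret = try_to_unmap(head, ttu_flags | TTU_SPLIT_HUGE_PMD);
		/* Then unmap the remaining subpages at the pte level */
		for (i = 1; !ret && i < HPAGE_PMD_NR; i++)
			ret = try_to_unmap(head + i, ttu_flags);
		VM_BUG_ON(ret);
	}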

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/huge_mm.h |  7 +++++++
 include/linux/rmap.h    |  3 +++
 mm/huge_memory.c        |  5 +----
 mm/rmap.c               | 24 ++++++++++++++++--------
 4 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 459fd25b378e..c47067151ffd 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -111,6 +111,9 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 			__split_huge_pmd(__vma, __pmd, __address);	\
 	}  while (0)
 
+
+void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address);
+
 #if HPAGE_PMD_ORDER >= MAX_ORDER
 #error "hugepages can't be allocated by the buddy allocator"
 #endif
@@ -178,6 +181,10 @@ static inline int split_huge_page(struct page *page)
 static inline void deferred_split_huge_page(struct page *page) {}
 #define split_huge_pmd(__vma, __pmd, __address)	\
 	do { } while (0)
+
+static inline void split_huge_pmd_address(struct vm_area_struct *vma,
+		unsigned long address) {}
+
 static inline int hugepage_madvise(struct vm_area_struct *vma,
 				   unsigned long *vm_flags, int advice)
 {
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 23a03fbeef61..1fdde1ee7042 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -86,6 +86,7 @@ enum ttu_flags {
 	TTU_MIGRATION = 2,		/* migration mode */
 	TTU_MUNLOCK = 4,		/* munlock mode */
 	TTU_LZFREE = 8,			/* lazy free mode */
+	TTU_SPLIT_HUGE_PMD = 16,	/* split huge PMD if any */
 
 	TTU_IGNORE_MLOCK = (1 << 8),	/* ignore mlock */
 	TTU_IGNORE_ACCESS = (1 << 9),	/* don't age */
@@ -93,6 +94,8 @@ enum ttu_flags {
 	TTU_BATCH_FLUSH = (1 << 11),	/* Batch TLB flushes where possible
 					 * and caller guarantees they will
 					 * do a final flush if necessary */
+	TTU_RMAP_LOCKED = (1 << 12)	/* do not grab rmap lock:
+					 * caller holds it */
 };
 
 #ifdef CONFIG_MMU
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 581709e837b7..4213c76c8434 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2970,15 +2970,12 @@ out:
 	}
 }
 
-static void split_huge_pmd_address(struct vm_area_struct *vma,
-				    unsigned long address)
+void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address)
 {
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
 
-	VM_BUG_ON(!(address & ~HPAGE_PMD_MASK));
-
 	pgd = pgd_offset(vma->vm_mm, address);
 	if (!pgd_present(*pgd))
 		return;
diff --git a/mm/rmap.c b/mm/rmap.c
index a9cffb784502..2cafd48caf85 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1435,6 +1435,8 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
 		goto out;
 
+	if (flags & TTU_SPLIT_HUGE_PMD)
+		split_huge_pmd_address(vma, address);
 	pte = page_check_address(page, mm, address, &ptl, 0);
 	if (!pte)
 		goto out;
@@ -1580,10 +1582,10 @@ static bool invalid_migration_vma(struct vm_area_struct *vma, void *arg)
 	return is_vma_temporary_stack(vma);
 }
 
-static int page_not_mapped(struct page *page)
+static int page_mapcount_is_zero(struct page *page)
 {
-	return !page_mapped(page);
-};
+	return !page_mapcount(page);
+}
 
 /**
  * try_to_unmap - try to remove all page table mappings to a page
@@ -1610,12 +1612,10 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
 	struct rmap_walk_control rwc = {
 		.rmap_one = try_to_unmap_one,
 		.arg = &rp,
-		.done = page_not_mapped,
+		.done = page_mapcount_is_zero,
 		.anon_lock = page_lock_anon_vma_read,
 	};
 
-	VM_BUG_ON_PAGE(!PageHuge(page) && PageTransHuge(page), page);
-
 	/*
 	 * During exec, a temporary VMA is setup and later moved.
 	 * The VMA is moved under the anon_vma lock but not the
@@ -1627,9 +1627,12 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
 	if ((flags & TTU_MIGRATION) && !PageKsm(page) && PageAnon(page))
 		rwc.invalid_vma = invalid_migration_vma;
 
-	ret = rmap_walk(page, &rwc);
+	if (flags & TTU_RMAP_LOCKED)
+		ret = rmap_walk_locked(page, &rwc);
+	else
+		ret = rmap_walk(page, &rwc);
 
-	if (ret != SWAP_MLOCK && !page_mapped(page)) {
+	if (ret != SWAP_MLOCK && !page_mapcount(page)) {
 		ret = SWAP_SUCCESS;
 		if (rp.lazyfreed && !PageDirty(page))
 			ret = SWAP_LZFREE;
@@ -1637,6 +1640,11 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
 	return ret;
 }
 
+static int page_not_mapped(struct page *page)
+{
+	return !page_mapped(page);
+};
+
 /**
  * try_to_munlock - try to munlock a page
  * @page: the page to be munlocked
-- 
2.7.0

* [PATCH 3/4] mm: make remove_migration_ptes() available beyond mm/migration.c
  2016-02-03 15:14 ` Kirill A. Shutemov
@ 2016-02-03 15:14   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 28+ messages in thread
From: Kirill A. Shutemov @ 2016-02-03 15:14 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli
  Cc: Hugh Dickins, Dave Hansen, Mel Gorman, Rik van Riel,
	Vlastimil Babka, Christoph Lameter, Naoya Horiguchi,
	Steve Capper, Aneesh Kumar K.V, Johannes Weiner, Michal Hocko,
	Jerome Marchand, Sasha Levin, linux-kernel, linux-mm,
	Kirill A. Shutemov

The patch makes remove_migration_ptes() available to be used in
split_huge_page().

A new parameter, 'locked', is added: as with try_to_unmap(), we need a way
to indicate that the caller holds the rmap lock.
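
For illustration only, the intended new call site (unfreeze_page() in the
last patch of the series); existing callers simply pass 'false' and keep
their current behaviour. The helper name is made up.

	static void unfreeze_thp_sketch(struct page *head)
	{
		int i;

		/* The caller holds the anon_vma lock, hence locked == true */
		for (i = 0; i < HPAGE_PMD_NR; i++)
			remove_migration_ptes(head + i, head + i, true);
	}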

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/rmap.h |  2 ++
 mm/migrate.c         | 13 ++++++++-----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 1fdde1ee7042..675b070f489a 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -257,6 +257,8 @@ int page_mkclean(struct page *);
  */
 int try_to_munlock(struct page *);
 
+void remove_migration_ptes(struct page *old, struct page *new, bool locked);
+
 /*
  * Called by memory-failure.c to kill processes.
  */
diff --git a/mm/migrate.c b/mm/migrate.c
index b1034f9c77e7..244a267fdb61 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -186,14 +186,17 @@ out:
  * Get rid of all migration entries and replace them by
  * references to the indicated page.
  */
-static void remove_migration_ptes(struct page *old, struct page *new)
+void remove_migration_ptes(struct page *old, struct page *new, bool locked)
 {
 	struct rmap_walk_control rwc = {
 		.rmap_one = remove_migration_pte,
 		.arg = old,
 	};
 
-	rmap_walk(new, &rwc);
+	if (locked)
+		rmap_walk_locked(new, &rwc);
+	else
+		rmap_walk(new, &rwc);
 }
 
 /*
@@ -698,7 +701,7 @@ static int writeout(struct address_space *mapping, struct page *page)
 	 * At this point we know that the migration attempt cannot
 	 * be successful.
 	 */
-	remove_migration_ptes(page, page);
+	remove_migration_ptes(page, page, false);
 
 	rc = mapping->a_ops->writepage(page, &wbc);
 
@@ -897,7 +900,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 
 	if (page_was_mapped)
 		remove_migration_ptes(page,
-			rc == MIGRATEPAGE_SUCCESS ? newpage : page);
+			rc == MIGRATEPAGE_SUCCESS ? newpage : page, false);
 
 out_unlock_both:
 	unlock_page(newpage);
@@ -1065,7 +1068,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 
 	if (page_was_mapped)
 		remove_migration_ptes(hpage,
-			rc == MIGRATEPAGE_SUCCESS ? new_hpage : hpage);
+			rc == MIGRATEPAGE_SUCCESS ? new_hpage : hpage, false);
 
 	unlock_page(new_hpage);
 
-- 
2.7.0

* [PATCH 4/4] thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers
  2016-02-03 15:14 ` Kirill A. Shutemov
@ 2016-02-03 15:14   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 28+ messages in thread
From: Kirill A. Shutemov @ 2016-02-03 15:14 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli
  Cc: Hugh Dickins, Dave Hansen, Mel Gorman, Rik van Riel,
	Vlastimil Babka, Christoph Lameter, Naoya Horiguchi,
	Steve Capper, Aneesh Kumar K.V, Johannes Weiner, Michal Hocko,
	Jerome Marchand, Sasha Levin, linux-kernel, linux-mm,
	Kirill A. Shutemov

The freeze_page() and unfreeze_page() helpers have evolved into rather
complex beasts. It would be nice to cut the complexity of this code.

This patch rewrites freeze_page() using the standard try_to_unmap().
unfreeze_page() is rewritten with remove_migration_ptes().

The result is much simpler.

But the new variant is somewhat slower. The current helpers iterate over
the VMAs the compound page is mapped to, and then over the ptes within each
VMA. The new helpers iterate over each small page, then over the VMAs the
small page is mapped to, and only then find the relevant pte.

We have also lost the optimization that allowed splitting the PMD directly
into migration entries.

I don't think the slowdown is critical, considering how much simpler the
result is and that split_huge_page() is quite rare nowadays. It only
happens due to memory pressure or migration.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c | 214 +++++++------------------------------------------------
 1 file changed, 24 insertions(+), 190 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4213c76c8434..3bab26781f8d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2825,7 +2825,7 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
 }
 
 static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
-		unsigned long haddr, bool freeze)
+		unsigned long haddr)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct page *page;
@@ -2867,18 +2867,12 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		 * transferred to avoid any possibility of altering
 		 * permissions across VMAs.
 		 */
-		if (freeze) {
-			swp_entry_t swp_entry;
-			swp_entry = make_migration_entry(page + i, write);
-			entry = swp_entry_to_pte(swp_entry);
-		} else {
-			entry = mk_pte(page + i, vma->vm_page_prot);
-			entry = maybe_mkwrite(entry, vma);
-			if (!write)
-				entry = pte_wrprotect(entry);
-			if (!young)
-				entry = pte_mkold(entry);
-		}
+		entry = mk_pte(page + i, vma->vm_page_prot);
+		entry = maybe_mkwrite(entry, vma);
+		if (!write)
+			entry = pte_wrprotect(entry);
+		if (!young)
+			entry = pte_mkold(entry);
 		if (dirty)
 			SetPageDirty(page + i);
 		pte = pte_offset_map(&_pmd, haddr);
@@ -2931,13 +2925,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	 */
 	pmdp_invalidate(vma, haddr, pmd);
 	pmd_populate(mm, pmd, pgtable);
-
-	if (freeze) {
-		for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
-			page_remove_rmap(page + i, false);
-			put_page(page + i);
-		}
-	}
 }
 
 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
@@ -2958,7 +2945,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 			page = NULL;
 	} else if (!pmd_devmap(*pmd))
 		goto out;
-	__split_huge_pmd_locked(vma, pmd, haddr, false);
+	__split_huge_pmd_locked(vma, pmd, haddr);
 out:
 	spin_unlock(ptl);
 	mmu_notifier_invalidate_range_end(mm, haddr, haddr + HPAGE_PMD_SIZE);
@@ -3035,180 +3022,27 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
 	}
 }
 
-static void freeze_page_vma(struct vm_area_struct *vma, struct page *page,
-		unsigned long address)
+static void freeze_page(struct page *page)
 {
-	unsigned long haddr = address & HPAGE_PMD_MASK;
-	spinlock_t *ptl;
-	pgd_t *pgd;
-	pud_t *pud;
-	pmd_t *pmd;
-	pte_t *pte;
-	int i, nr = HPAGE_PMD_NR;
-
-	/* Skip pages which doesn't belong to the VMA */
-	if (address < vma->vm_start) {
-		int off = (vma->vm_start - address) >> PAGE_SHIFT;
-		page += off;
-		nr -= off;
-		address = vma->vm_start;
-	}
-
-	pgd = pgd_offset(vma->vm_mm, address);
-	if (!pgd_present(*pgd))
-		return;
-	pud = pud_offset(pgd, address);
-	if (!pud_present(*pud))
-		return;
-	pmd = pmd_offset(pud, address);
-	ptl = pmd_lock(vma->vm_mm, pmd);
-	if (!pmd_present(*pmd)) {
-		spin_unlock(ptl);
-		return;
-	}
-	if (pmd_trans_huge(*pmd)) {
-		if (page == pmd_page(*pmd))
-			__split_huge_pmd_locked(vma, pmd, haddr, true);
-		spin_unlock(ptl);
-		return;
-	}
-	spin_unlock(ptl);
-
-	pte = pte_offset_map_lock(vma->vm_mm, pmd, address, &ptl);
-	for (i = 0; i < nr; i++, address += PAGE_SIZE, page++, pte++) {
-		pte_t entry, swp_pte;
-		swp_entry_t swp_entry;
-
-		/*
-		 * We've just crossed page table boundary: need to map next one.
-		 * It can happen if THP was mremaped to non PMD-aligned address.
-		 */
-		if (unlikely(address == haddr + HPAGE_PMD_SIZE)) {
-			pte_unmap_unlock(pte - 1, ptl);
-			pmd = mm_find_pmd(vma->vm_mm, address);
-			if (!pmd)
-				return;
-			pte = pte_offset_map_lock(vma->vm_mm, pmd,
-					address, &ptl);
-		}
-
-		if (!pte_present(*pte))
-			continue;
-		if (page_to_pfn(page) != pte_pfn(*pte))
-			continue;
-		flush_cache_page(vma, address, page_to_pfn(page));
-		entry = ptep_clear_flush(vma, address, pte);
-		if (pte_dirty(entry))
-			SetPageDirty(page);
-		swp_entry = make_migration_entry(page, pte_write(entry));
-		swp_pte = swp_entry_to_pte(swp_entry);
-		if (pte_soft_dirty(entry))
-			swp_pte = pte_swp_mksoft_dirty(swp_pte);
-		set_pte_at(vma->vm_mm, address, pte, swp_pte);
-		page_remove_rmap(page, false);
-		put_page(page);
-	}
-	pte_unmap_unlock(pte - 1, ptl);
-}
-
-static void freeze_page(struct anon_vma *anon_vma, struct page *page)
-{
-	struct anon_vma_chain *avc;
-	pgoff_t pgoff = page_to_pgoff(page);
+	enum ttu_flags ttu_flags = TTU_MIGRATION | TTU_IGNORE_MLOCK |
+		TTU_IGNORE_ACCESS | TTU_RMAP_LOCKED;
+	int i, ret;
 
 	VM_BUG_ON_PAGE(!PageHead(page), page);
 
-	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff,
-			pgoff + HPAGE_PMD_NR - 1) {
-		unsigned long address = __vma_address(page, avc->vma);
-
-		mmu_notifier_invalidate_range_start(avc->vma->vm_mm,
-				address, address + HPAGE_PMD_SIZE);
-		freeze_page_vma(avc->vma, page, address);
-		mmu_notifier_invalidate_range_end(avc->vma->vm_mm,
-				address, address + HPAGE_PMD_SIZE);
-	}
-}
-
-static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
-		unsigned long address)
-{
-	spinlock_t *ptl;
-	pmd_t *pmd;
-	pte_t *pte, entry;
-	swp_entry_t swp_entry;
-	unsigned long haddr = address & HPAGE_PMD_MASK;
-	int i, nr = HPAGE_PMD_NR;
-
-	/* Skip pages which doesn't belong to the VMA */
-	if (address < vma->vm_start) {
-		int off = (vma->vm_start - address) >> PAGE_SHIFT;
-		page += off;
-		nr -= off;
-		address = vma->vm_start;
-	}
-
-	pmd = mm_find_pmd(vma->vm_mm, address);
-	if (!pmd)
-		return;
-
-	pte = pte_offset_map_lock(vma->vm_mm, pmd, address, &ptl);
-	for (i = 0; i < nr; i++, address += PAGE_SIZE, page++, pte++) {
-		/*
-		 * We've just crossed page table boundary: need to map next one.
-		 * It can happen if THP was mremaped to non-PMD aligned address.
-		 */
-		if (unlikely(address == haddr + HPAGE_PMD_SIZE)) {
-			pte_unmap_unlock(pte - 1, ptl);
-			pmd = mm_find_pmd(vma->vm_mm, address);
-			if (!pmd)
-				return;
-			pte = pte_offset_map_lock(vma->vm_mm, pmd,
-					address, &ptl);
-		}
-
-		if (!is_swap_pte(*pte))
-			continue;
-
-		swp_entry = pte_to_swp_entry(*pte);
-		if (!is_migration_entry(swp_entry))
-			continue;
-		if (migration_entry_to_page(swp_entry) != page)
-			continue;
-
-		get_page(page);
-		page_add_anon_rmap(page, vma, address, false);
-
-		entry = pte_mkold(mk_pte(page, vma->vm_page_prot));
-		if (PageDirty(page))
-			entry = pte_mkdirty(entry);
-		if (is_write_migration_entry(swp_entry))
-			entry = maybe_mkwrite(entry, vma);
-
-		flush_dcache_page(page);
-		set_pte_at(vma->vm_mm, address, pte, entry);
-
-		/* No need to invalidate - it was non-present before */
-		update_mmu_cache(vma, address, pte);
-	}
-	pte_unmap_unlock(pte - 1, ptl);
+	/* We only need TTU_SPLIT_HUGE_PMD once */
+	ret = try_to_unmap(page, ttu_flags | TTU_SPLIT_HUGE_PMD);
+	for (i = 1; !ret && i < HPAGE_PMD_NR; i++)
+		ret = try_to_unmap(page + i, ttu_flags);
+	VM_BUG_ON(ret);
 }
 
-static void unfreeze_page(struct anon_vma *anon_vma, struct page *page)
+static void unfreeze_page(struct page *page)
 {
-	struct anon_vma_chain *avc;
-	pgoff_t pgoff = page_to_pgoff(page);
-
-	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
-			pgoff, pgoff + HPAGE_PMD_NR - 1) {
-		unsigned long address = __vma_address(page, avc->vma);
+	int i;
 
-		mmu_notifier_invalidate_range_start(avc->vma->vm_mm,
-				address, address + HPAGE_PMD_SIZE);
-		unfreeze_page_vma(avc->vma, page, address);
-		mmu_notifier_invalidate_range_end(avc->vma->vm_mm,
-				address, address + HPAGE_PMD_SIZE);
-	}
+	for (i = 0; i < HPAGE_PMD_NR; i++)
+		remove_migration_ptes(page + i, page + i, true);
 }
 
 static void __split_huge_page_tail(struct page *head, int tail,
@@ -3286,7 +3120,7 @@ static void __split_huge_page(struct page *page, struct list_head *list)
 	ClearPageCompound(head);
 	spin_unlock_irq(&zone->lru_lock);
 
-	unfreeze_page(page_anon_vma(head), head);
+	unfreeze_page(head);
 
 	for (i = 0; i < HPAGE_PMD_NR; i++) {
 		struct page *subpage = head + i;
@@ -3382,7 +3216,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	}
 
 	mlocked = PageMlocked(page);
-	freeze_page(anon_vma, head);
+	freeze_page(head);
 	VM_BUG_ON_PAGE(compound_mapcount(head), head);
 
 	/* Make sure the page is not on per-CPU pagevec as it takes pin */
@@ -3411,7 +3245,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 		BUG();
 	} else {
 		spin_unlock_irqrestore(&pgdata->split_queue_lock, flags);
-		unfreeze_page(anon_vma, head);
+		unfreeze_page(head);
 		ret = -EBUSY;
 	}
 
-- 
2.7.0

* Re: [PATCH 4/4] thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers
  2016-02-03 15:14   ` Kirill A. Shutemov
@ 2016-02-03 15:42     ` Dave Hansen
  -1 siblings, 0 replies; 28+ messages in thread
From: Dave Hansen @ 2016-02-03 15:42 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton, Andrea Arcangeli
  Cc: Hugh Dickins, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	Sasha Levin, linux-kernel, linux-mm

On 02/03/2016 07:14 AM, Kirill A. Shutemov wrote:
> But the new variant is somewhat slower. The current helpers iterate over
> the VMAs the compound page is mapped to, and then over the ptes within each
> VMA. The new helpers iterate over each small page, then over the VMAs the
> small page is mapped to, and only then find the relevant pte.

The code simplification here is really attractive.  Can you quantify
what the slowdown is?  Is it noticeable, or would it be in the noise
during all the other stuff that happens under memory pressure?

* Re: [PATCH 1/4] rmap: introduce rmap_walk_locked()
  2016-02-03 15:14   ` Kirill A. Shutemov
@ 2016-02-03 22:40     ` Andrew Morton
  -1 siblings, 0 replies; 28+ messages in thread
From: Andrew Morton @ 2016-02-03 22:40 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrea Arcangeli, Hugh Dickins, Dave Hansen, Mel Gorman,
	Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, Sasha Levin, linux-kernel,
	linux-mm

On Wed,  3 Feb 2016 18:14:16 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> rmap_walk_locked() is the same as rmap_walk(), but the caller takes care
> of the relevant rmap lock. It only supports anonymous pages for now.
> 
> This is preparation for switching THP splitting from the custom rmap walk
> in freeze_page()/unfreeze_page() to the generic one.
> 
> ...
>
> +/* Like rmap_walk, but caller holds relevant rmap lock */
> +int rmap_walk_locked(struct page *page, struct rmap_walk_control *rwc)
> +{
> +	/* only for anon pages for now */
> +	VM_BUG_ON_PAGE(!PageAnon(page) || PageKsm(page), page);
> +	return rmap_walk_anon(page, rwc, true);
> +}

Should be rmap_walk_anon_locked()?

* Re: [PATCH 4/4] thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers
  2016-02-03 15:42     ` Dave Hansen
@ 2016-02-03 22:43       ` Andrew Morton
  -1 siblings, 0 replies; 28+ messages in thread
From: Andrew Morton @ 2016-02-03 22:43 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Hugh Dickins, Mel Gorman,
	Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, Sasha Levin, linux-kernel,
	linux-mm

On Wed, 3 Feb 2016 07:42:01 -0800 Dave Hansen <dave.hansen@intel.com> wrote:

> On 02/03/2016 07:14 AM, Kirill A. Shutemov wrote:
> > But the new variant is somewhat slower. The current helpers iterate over
> > the VMAs the compound page is mapped to, and then over the ptes within each
> > VMA. The new helpers iterate over each small page, then over the VMAs the
> > small page is mapped to, and only then find the relevant pte.
> 
> The code simplification here is really attractive.  Can you quantify
> what the slowdown is?  Is it noticeable, or would it be in the noise
> during all the other stuff that happens under memory pressure?

yup.  And the "more testing is required" is a bit worrisome.  Is this
code really ready for getting pounded upon in -next?

* Re: [PATCH 1/4] rmap: introduce rmap_walk_locked()
  2016-02-03 22:40     ` Andrew Morton
@ 2016-02-03 22:45       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 28+ messages in thread
From: Kirill A. Shutemov @ 2016-02-03 22:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andrea Arcangeli, Hugh Dickins, Dave Hansen, Mel Gorman,
	Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, Sasha Levin, linux-kernel,
	linux-mm

On Wed, Feb 03, 2016 at 02:40:19PM -0800, Andrew Morton wrote:
> On Wed,  3 Feb 2016 18:14:16 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> 
> > rmap_walk_locked() is the same as rmap_walk(), but the caller takes care
> > of the relevant rmap lock. It only supports anonymous pages for now.
> > 
> > This is preparation for switching THP splitting from the custom rmap walk
> > in freeze_page()/unfreeze_page() to the generic one.
> > 
> > ...
> >
> > +/* Like rmap_walk, but caller holds relevant rmap lock */
> > +int rmap_walk_locked(struct page *page, struct rmap_walk_control *rwc)
> > +{
> > +	/* only for anon pages for now */
> > +	VM_BUG_ON_PAGE(!PageAnon(page) || PageKsm(page), page);
> > +	return rmap_walk_anon(page, rwc, true);
> > +}
> 
> Should be rmap_walk_anon_locked()?

I'm leaving the interface open for further extension to file mappings, once
it is needed. The interface mirrors plain rmap_walk().

If you prefer that I rename the function, I can do that too.
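
For reference, the dispatch in plain rmap_walk() that the locked variant
mirrors looks roughly like this (paraphrased from mm/rmap.c, not quoted;
the third argument to rmap_walk_anon() is the locked flag this series
threads through, so treat the exact signature as an assumption):

	int rmap_walk(struct page *page, struct rmap_walk_control *rwc)
	{
		if (unlikely(PageKsm(page)))
			return rmap_walk_ksm(page, rwc);
		else if (PageAnon(page))
			/* takes and drops the anon_vma lock itself */
			return rmap_walk_anon(page, rwc, false);
		else
			/* takes and drops i_mmap_rwsem itself */
			return rmap_walk_file(page, rwc);
	}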

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers
  2016-02-03 22:43       ` Andrew Morton
@ 2016-02-03 22:53         ` Kirill A. Shutemov
  -1 siblings, 0 replies; 28+ messages in thread
From: Kirill A. Shutemov @ 2016-02-03 22:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Hansen, Andrea Arcangeli, Hugh Dickins, Mel Gorman,
	Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, Sasha Levin, linux-kernel,
	linux-mm

On Wed, Feb 03, 2016 at 02:43:16PM -0800, Andrew Morton wrote:
> On Wed, 3 Feb 2016 07:42:01 -0800 Dave Hansen <dave.hansen@intel.com> wrote:
> 
> > On 02/03/2016 07:14 AM, Kirill A. Shutemov wrote:
> > > But the new variant is somewhat slower. Current helpers iterates over
> > > VMAs the compound page is mapped to, and then over ptes within this VMA.
> > > New helpers iterates over small page, then over VMA the small page
> > > mapped to, and only then find relevant pte.
> > 
> > The code simplification here is really attractive.  Can you quantify
> > what the slowdown is?  Is it noticeable, or would it be in the noise
> > during all the other stuff that happens under memory pressure?
> 
> yup.

It is not really clear how to quantify this properly. Let me think more
about it.

> And the "more testing is required" is a bit worrisome.  Is this
> code really ready for getting pounded upon in -next?

By now it has survived 5+ hours of fuzzing in 16 VMs in parallel. I'll
continue with other tests tomorrow.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/4] rmap: introduce rmap_walk_locked()
  2016-02-03 22:45       ` Kirill A. Shutemov
@ 2016-02-03 22:56         ` Andrew Morton
  -1 siblings, 0 replies; 28+ messages in thread
From: Andrew Morton @ 2016-02-03 22:56 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrea Arcangeli, Hugh Dickins, Dave Hansen, Mel Gorman,
	Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, Sasha Levin, linux-kernel,
	linux-mm

On Thu, 4 Feb 2016 01:45:07 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> On Wed, Feb 03, 2016 at 02:40:19PM -0800, Andrew Morton wrote:
> > On Wed,  3 Feb 2016 18:14:16 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> > 
> > > rmap_walk_locked() is the same as rmap_walk(), but caller takes care
> > > about relevant rmap lock. It only supports anonymous pages for now.
> > > 
> > > It's preparation to switch THP splitting from custom rmap walk in
> > > freeze_page()/unfreeze_page() to generic one.
> > > 
> > > ...
> > >
> > > +/* Like rmap_walk, but caller holds relevant rmap lock */
> > > +int rmap_walk_locked(struct page *page, struct rmap_walk_control *rwc)
> > > +{
> > > +	/* only for anon pages for now */
> > > +	VM_BUG_ON_PAGE(!PageAnon(page) || PageKsm(page), page);
> > > +	return rmap_walk_anon(page, rwc, true);
> > > +}
> > 
> > Should be rmap_walk_anon_locked()?
> 
> I leave interface open for further extension for file mappings, once it
> will be needed. Interface is mirroring plain rmap_walk()

hm, yes, I see.

> If you prefer to rename the function, I can do it too.

Well, what does "unlocked" mean in the context of rmap_walk_ksm() and
rmap_walk_file()?  That the caller holds totally different locks.  I
expect that sitting down and writing out the interface definition for
such an rmap_walk_locked() would reveal that we shouldn't have created
it.

I mean, if the caller is to call such an rmap_walk_locked(), he first
needs to work out if it's a ksm page or an anon page or a file page,
then take the appropriate lock and then call rmap_walk_locked(). 
That's silly - at this point he should directly call
rmap_walk_ksm_locked()?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers
  2016-02-03 15:42     ` Dave Hansen
@ 2016-02-04 14:27       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 28+ messages in thread
From: Kirill A. Shutemov @ 2016-02-04 14:27 UTC (permalink / raw)
  To: Dave Hansen, Andrea Arcangeli, Andrew Morton
  Cc: Kirill A. Shutemov, Hugh Dickins, Mel Gorman, Rik van Riel,
	Vlastimil Babka, Christoph Lameter, Naoya Horiguchi,
	Steve Capper, Aneesh Kumar K.V, Johannes Weiner, Michal Hocko,
	Jerome Marchand, Sasha Levin, linux-kernel, linux-mm

On Wed, Feb 03, 2016 at 07:42:01AM -0800, Dave Hansen wrote:
> On 02/03/2016 07:14 AM, Kirill A. Shutemov wrote:
> > But the new variant is somewhat slower. Current helpers iterates over
> > VMAs the compound page is mapped to, and then over ptes within this VMA.
> > New helpers iterates over small page, then over VMA the small page
> > mapped to, and only then find relevant pte.
> 
> The code simplification here is really attractive.  Can you quantify
> what the slowdown is?  Is it noticeable, or would it be in the noise
> during all the other stuff that happens under memory pressure?

I don't know how to quantify it within the whole memory-pressure picture.
There are just too many variables to get a clear sense of split_huge_page()'s
contribution.

I've tried to measure split_huge_page() performance itself.

Testcase:

	#define _GNU_SOURCE
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <sys/mman.h>

	#define MB (1024UL * 1024)
	#define SIZE (4 * 1024 * 2 * MB)
	#define BASE ((void *)0x400000000000)

	#define FORKS 0

	int main()
	{
		char *p;
		unsigned long i;

		p = mmap(BASE, SIZE, PROT_READ | PROT_WRITE,
				MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE,
				-1, 0);
		if (p == MAP_FAILED)
			perror("mmap"), exit(1);

		for (i = 0; i < SIZE; i += 2 * MB) {
			munmap(p + i, 4096);
		}

		for (i = 0; i < FORKS; i++) {
			if (!fork())
				pause();
		}

		system("grep thp /proc/vmstat");
		system("time /bin/echo 3 > /proc/sys/vm/drop_caches");
		system("grep thp /proc/vmstat");
		return 0;
	}

Basically, we allocate 4k THPs, make them partially unmapped, optionally
fork() the process multiple times and then trigger the shrinker, measuring
how long it takes.

The optional fork()s make the THPs shared, meaning we need to freeze/unfreeze
ptes in multiple VMAs.

The numbers don't look pretty:

		FORKS == 0		FORKS == 100
Baseline:	1.93s ± 0.017s		32.08s ± 0.246s
Patched:	5.636s ± 0.021s		405.943s ± 6.126s
Slowdown:	2.92x			12.65x

With FORKS == 100, it looks especially bad. But having that many mappings
of a page is uncommon.
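
For scale, a rough back-of-the-envelope figure, assuming the shrinker ends
up splitting all ~4096 THPs in the 8GiB area: that is about 0.5ms vs 1.4ms
per split_huge_page() call with FORKS == 0, and about 8ms vs 99ms per call
with FORKS == 100.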

Any comments?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/4] rmap: introduce rmap_walk_locked()
  2016-02-03 22:56         ` Andrew Morton
@ 2016-02-04 14:37           ` Kirill A. Shutemov
  -1 siblings, 0 replies; 28+ messages in thread
From: Kirill A. Shutemov @ 2016-02-04 14:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Hugh Dickins, Dave Hansen,
	Mel Gorman, Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, Sasha Levin, linux-kernel,
	linux-mm

On Wed, Feb 03, 2016 at 02:56:07PM -0800, Andrew Morton wrote:
> On Thu, 4 Feb 2016 01:45:07 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> 
> > On Wed, Feb 03, 2016 at 02:40:19PM -0800, Andrew Morton wrote:
> > > On Wed,  3 Feb 2016 18:14:16 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> > > 
> > > > rmap_walk_locked() is the same as rmap_walk(), but caller takes care
> > > > about relevant rmap lock. It only supports anonymous pages for now.
> > > > 
> > > > It's preparation to switch THP splitting from custom rmap walk in
> > > > freeze_page()/unfreeze_page() to generic one.
> > > > 
> > > > ...
> > > >
> > > > +/* Like rmap_walk, but caller holds relevant rmap lock */
> > > > +int rmap_walk_locked(struct page *page, struct rmap_walk_control *rwc)
> > > > +{
> > > > +	/* only for anon pages for now */
> > > > +	VM_BUG_ON_PAGE(!PageAnon(page) || PageKsm(page), page);
> > > > +	return rmap_walk_anon(page, rwc, true);
> > > > +}
> > > 
> > > Should be rmap_walk_anon_locked()?
> > 
> > I leave interface open for further extension for file mappings, once it
> > will be needed. Interface is mirroring plain rmap_walk()
> 
> hm, yes, I see.
> 
> > If you prefer to rename the function, I can do it too.
> 
> Well, what does "unlocked" mean in the context of rmap_walk_ksm() and
> rmap_walk_file()?

For rmap_walk_file(), the caller should take the i_mmap lock on page->mapping,
at least for read.
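
As a sketch, assuming rmap_walk_locked() grows file support at some point,
a file-page caller would look something like this (rwc set up as for plain
rmap_walk(); everything else omitted):

	i_mmap_lock_read(page->mapping);
	rmap_walk_locked(page, &rwc);
	i_mmap_unlock_read(page->mapping);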

I'm not sure about KSM.

> That the caller holds totally different locks.  I expect that sitting
> down and writing out the interface definition for such an
> rmap_walk_locked() would reveal that we shouldn't have created it.
> 
> I mean, if the caller is to call such an rmap_walk_locked(), he first
> needs to work out if it's a ksm page or an anon page or a file page,
> then take the appropriate lock and then call rmap_walk_locked(). 
> That's silly - at this point he should directly call
> rmap_walk_ksm_locked()?

It makes sense if you have multiple pages to process and it's known that
they share a reverse mapping.

Or if you want to keep the reverse mapping locked to maintain continuity
with other operations.

In the THP case, we have 512 subpages to unmap and we want to keep the
anon_vma locked until the THP is split.
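
A rough sketch of that pattern (head is the THP's head page; the TTU_* flag
names follow the patch descriptions and are assumptions about the final
interface rather than quotes from it; error handling omitted):

	struct anon_vma *anon_vma = page_get_anon_vma(head);
	int i;

	anon_vma_lock_write(anon_vma);

	/* "freeze": split the PMD mapping, then unmap each subpage,
	 * all under the single anon_vma lock */
	try_to_unmap(head, TTU_MIGRATION | TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD);
	for (i = 1; i < HPAGE_PMD_NR; i++)
		try_to_unmap(head + i, TTU_MIGRATION | TTU_RMAP_LOCKED);

	/* ... split the compound page, then remap ("unfreeze") the subpages ... */

	anon_vma_unlock_write(anon_vma);
	put_anon_vma(anon_vma);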

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers
  2016-02-03 15:42     ` Dave Hansen
@ 2016-02-04 23:58       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 28+ messages in thread
From: Kirill A. Shutemov @ 2016-02-04 23:58 UTC (permalink / raw)
  To: Dave Hansen, Andrew Morton, Andrea Arcangeli
  Cc: Kirill A. Shutemov, Hugh Dickins, Mel Gorman, Rik van Riel,
	Vlastimil Babka, Christoph Lameter, Naoya Horiguchi,
	Steve Capper, Aneesh Kumar K.V, Johannes Weiner, Michal Hocko,
	Jerome Marchand, Sasha Levin, linux-kernel, linux-mm

On Wed, Feb 03, 2016 at 07:42:01AM -0800, Dave Hansen wrote:
> On 02/03/2016 07:14 AM, Kirill A. Shutemov wrote:
> > But the new variant is somewhat slower. Current helpers iterates over
> > VMAs the compound page is mapped to, and then over ptes within this VMA.
> > New helpers iterates over small page, then over VMA the small page
> > mapped to, and only then find relevant pte.
> 
> The code simplification here is really attractive.  Can you quantify
> what the slowdown is?  Is it noticeable, or would it be in the noise
> during all the other stuff that happens under memory pressure?

Okay, here's a more realistic scenario: migrating 8GiB worth of THP.

Testcase:
	#define _GNU_SOURCE
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <sys/mman.h>
	#include <linux/mempolicy.h>
	#include <numaif.h>

	#define MB (1024UL * 1024)
	#define SIZE (4 * 1024 * 2 * MB)
	#define BASE ((void *)0x400000000000)

	#include <time.h>

	void timespec_diff(struct timespec *start, struct timespec *stop,
			struct timespec *result)
	{
		if ((stop->tv_nsec - start->tv_nsec) < 0) {
			result->tv_sec = stop->tv_sec - start->tv_sec - 1;
			result->tv_nsec = stop->tv_nsec - start->tv_nsec + 1000000000;
		} else {
			result->tv_sec = stop->tv_sec - start->tv_sec;
			result->tv_nsec = stop->tv_nsec - start->tv_nsec;
		}
	}

	int main()
	{
		char *p;
		unsigned long ret, node_mask;
		struct timespec start, stop, result;

		node_mask = 0b01;
		ret = set_mempolicy(MPOL_BIND, &node_mask, 64);
		if (ret == -1)
			perror("set_mempolicy"), exit(1);
		p = mmap(BASE, SIZE, PROT_READ | PROT_WRITE,
				MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE,
				-1, 0);
		if (p == MAP_FAILED)
			perror("mmap"), exit(1);

		system("grep thp /proc/vmstat");
		clock_gettime(CLOCK_MONOTONIC, &start);
		node_mask = 0b10;
		ret = mbind(p, SIZE, MPOL_BIND, &node_mask, 64, MPOL_MF_MOVE);
		if (ret == -1)
			perror("mbind"), exit(1);
		clock_gettime(CLOCK_MONOTONIC, &stop);
		system("grep thp /proc/vmstat");

		timespec_diff(&start, &stop, &result);
		printf("--------------------------\n");
		printf("%ld.%09lds\n", result.tv_sec, result.tv_nsec);

		return 0;
	}

Baseline: 25.146s ± 0.141s
Patched:  28.684s ± 0.298s
Slowdown: 1.14x
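
Per page, that is roughly (28.684 - 25.146)s / 4096 ≈ 0.86ms of extra
overhead for each 2MiB THP migrated, assuming the whole 8GiB area stays
THP-backed.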

Can we tolerate this?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2016-02-04 23:59 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-03 15:14 [PATCH 0/4] thp: simplify freeze_page() and unfreeze_page() Kirill A. Shutemov
2016-02-03 15:14 ` Kirill A. Shutemov
2016-02-03 15:14 ` [PATCH 1/4] rmap: introduce rmap_walk_locked() Kirill A. Shutemov
2016-02-03 15:14   ` Kirill A. Shutemov
2016-02-03 22:40   ` Andrew Morton
2016-02-03 22:40     ` Andrew Morton
2016-02-03 22:45     ` Kirill A. Shutemov
2016-02-03 22:45       ` Kirill A. Shutemov
2016-02-03 22:56       ` Andrew Morton
2016-02-03 22:56         ` Andrew Morton
2016-02-04 14:37         ` Kirill A. Shutemov
2016-02-04 14:37           ` Kirill A. Shutemov
2016-02-03 15:14 ` [PATCH 2/4] rmap: extend try_to_unmap() to be usable by split_huge_page() Kirill A. Shutemov
2016-02-03 15:14   ` Kirill A. Shutemov
2016-02-03 15:14 ` [PATCH 3/4] mm: make remove_migration_ptes() beyond mm/migration.c Kirill A. Shutemov
2016-02-03 15:14   ` Kirill A. Shutemov
2016-02-03 15:14 ` [PATCH 4/4] thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers Kirill A. Shutemov
2016-02-03 15:14   ` Kirill A. Shutemov
2016-02-03 15:42   ` Dave Hansen
2016-02-03 15:42     ` Dave Hansen
2016-02-03 22:43     ` Andrew Morton
2016-02-03 22:43       ` Andrew Morton
2016-02-03 22:53       ` Kirill A. Shutemov
2016-02-03 22:53         ` Kirill A. Shutemov
2016-02-04 14:27     ` Kirill A. Shutemov
2016-02-04 14:27       ` Kirill A. Shutemov
2016-02-04 23:58     ` Kirill A. Shutemov
2016-02-04 23:58       ` Kirill A. Shutemov
