linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE
@ 2020-12-02  5:23 Pavel Tatashin
  2020-12-02  5:23 ` [PATCH 1/6] mm/gup: perform check_dax_vmas only when FS_DAX is enabled Pavel Tatashin
                   ` (6 more replies)
  0 siblings, 7 replies; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-02  5:23 UTC (permalink / raw)
  To: pasha.tatashin, linux-kernel, linux-mm, akpm, vbabka, mhocko,
	david, osalvador, dan.j.williams, sashal, tyhicks,
	iamjoonsoo.kim, mike.kravetz, rostedt, mingo, jgg, peterz,
	mgorman, willy, rientjes, jhubbard

When page is pinned it cannot be moved and its physical address stays
the same until pages is unpinned.

This is useful functionality to allows userland to implementation DMA
access. For example, it is used by vfio in vfio_pin_pages().

However, this functionality breaks memory hotplug/hotremove assumptions
that pages in ZONE_MOVABLE can always be migrated.

This patch series fixes this issue by forcing new allocations during
page pinning to omit ZONE_MOVABLE, and also to migrate any existing
pages from ZONE_MOVABLE during pinning.

It uses the same scheme logic that is currently used by CMA, and extends
the functionality for all allocations.

For more information read the discussion [1] about this problem.

[1] https://lore.kernel.org/lkml/CA+CK2bBffHBxjmb9jmSKacm0fJMinyt3Nhk8Nx6iudcQSj80_w@mail.gmail.com/

Pavel Tatashin (6):
  mm/gup: perform check_dax_vmas only when FS_DAX is enabled
  mm/gup: don't pin migrated cma pages in movable zone
  mm/gup: make __gup_longterm_locked common
  mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_NOMOVABLE
  mm: honor PF_MEMALLOC_NOMOVABLE for all allocations
  mm/gup: migrate pinned pages out of movable zone

 include/linux/migrate.h        |  1 +
 include/linux/sched.h          |  2 +-
 include/linux/sched/mm.h       | 21 +++------
 include/trace/events/migrate.h |  3 +-
 mm/gup.c                       | 82 +++++++++++++---------------------
 mm/hugetlb.c                   |  4 +-
 mm/page_alloc.c                | 26 ++++++-----
 7 files changed, 58 insertions(+), 81 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 1/6] mm/gup: perform check_dax_vmas only when FS_DAX is enabled
  2020-12-02  5:23 [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE Pavel Tatashin
@ 2020-12-02  5:23 ` Pavel Tatashin
  2020-12-02 16:22   ` Ira Weiny
                     ` (2 more replies)
  2020-12-02  5:23 ` [PATCH 2/6] mm/gup: don't pin migrated cma pages in movable zone Pavel Tatashin
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-02  5:23 UTC (permalink / raw)
  To: pasha.tatashin, linux-kernel, linux-mm, akpm, vbabka, mhocko,
	david, osalvador, dan.j.williams, sashal, tyhicks,
	iamjoonsoo.kim, mike.kravetz, rostedt, mingo, jgg, peterz,
	mgorman, willy, rientjes, jhubbard

There is no need to check_dax_vmas() and run through the npage loop of
pinned pages if FS_DAX is not enabled.

Add a stub check_dax_vmas() function for no-FS_DAX case.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 mm/gup.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index 98eb8e6d2609..cdb8b9eeb016 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1568,6 +1568,7 @@ struct page *get_dump_page(unsigned long addr)
 #endif /* CONFIG_ELF_CORE */
 
 #if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)
+#ifdef CONFIG_FS_DAX
 static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
 {
 	long i;
@@ -1586,6 +1587,12 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
 	}
 	return false;
 }
+#else
+static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
+{
+	return false;
+}
+#endif
 
 #ifdef CONFIG_CMA
 static long check_and_migrate_cma_pages(struct mm_struct *mm,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 2/6] mm/gup: don't pin migrated cma pages in movable zone
  2020-12-02  5:23 [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE Pavel Tatashin
  2020-12-02  5:23 ` [PATCH 1/6] mm/gup: perform check_dax_vmas only when FS_DAX is enabled Pavel Tatashin
@ 2020-12-02  5:23 ` Pavel Tatashin
  2020-12-02 16:31   ` David Hildenbrand
                     ` (2 more replies)
  2020-12-02  5:23 ` [PATCH 3/6] mm/gup: make __gup_longterm_locked common Pavel Tatashin
                   ` (4 subsequent siblings)
  6 siblings, 3 replies; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-02  5:23 UTC (permalink / raw)
  To: pasha.tatashin, linux-kernel, linux-mm, akpm, vbabka, mhocko,
	david, osalvador, dan.j.williams, sashal, tyhicks,
	iamjoonsoo.kim, mike.kravetz, rostedt, mingo, jgg, peterz,
	mgorman, willy, rientjes, jhubbard

In order not to fragment CMA the pinned pages are migrated. However,
they are migrated to ZONE_MOVABLE, which also should not have pinned pages.

Remove __GFP_MOVABLE, so pages can be migrated to zones where pinning
is allowed.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 mm/gup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index cdb8b9eeb016..3a76c005a3e2 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1610,7 +1610,7 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
 	long ret = nr_pages;
 	struct migration_target_control mtc = {
 		.nid = NUMA_NO_NODE,
-		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN,
+		.gfp_mask = GFP_USER | __GFP_NOWARN,
 	};
 
 check_again:
-- 
2.25.1


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 3/6] mm/gup: make __gup_longterm_locked common
  2020-12-02  5:23 [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE Pavel Tatashin
  2020-12-02  5:23 ` [PATCH 1/6] mm/gup: perform check_dax_vmas only when FS_DAX is enabled Pavel Tatashin
  2020-12-02  5:23 ` [PATCH 2/6] mm/gup: don't pin migrated cma pages in movable zone Pavel Tatashin
@ 2020-12-02  5:23 ` Pavel Tatashin
  2020-12-02 16:31   ` Ira Weiny
  2020-12-03  8:03   ` John Hubbard
  2020-12-02  5:23 ` [PATCH 4/6] mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_NOMOVABLE Pavel Tatashin
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-02  5:23 UTC (permalink / raw)
  To: pasha.tatashin, linux-kernel, linux-mm, akpm, vbabka, mhocko,
	david, osalvador, dan.j.williams, sashal, tyhicks,
	iamjoonsoo.kim, mike.kravetz, rostedt, mingo, jgg, peterz,
	mgorman, willy, rientjes, jhubbard

__gup_longterm_locked() has CMA || FS_DAX version and a common stub
version. In the preparation of prohibiting longterm pinning of pages from
movable zone make the CMA || FS_DAX version common, and delete the stub
version.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 mm/gup.c | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 3a76c005a3e2..0e2de888a8b0 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1567,7 +1567,6 @@ struct page *get_dump_page(unsigned long addr)
 }
 #endif /* CONFIG_ELF_CORE */
 
-#if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)
 #ifdef CONFIG_FS_DAX
 static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
 {
@@ -1757,18 +1756,6 @@ static long __gup_longterm_locked(struct mm_struct *mm,
 		kfree(vmas_tmp);
 	return rc;
 }
-#else /* !CONFIG_FS_DAX && !CONFIG_CMA */
-static __always_inline long __gup_longterm_locked(struct mm_struct *mm,
-						  unsigned long start,
-						  unsigned long nr_pages,
-						  struct page **pages,
-						  struct vm_area_struct **vmas,
-						  unsigned int flags)
-{
-	return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
-				       NULL, flags);
-}
-#endif /* CONFIG_FS_DAX || CONFIG_CMA */
 
 static bool is_valid_gup_flags(unsigned int gup_flags)
 {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 4/6] mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_NOMOVABLE
  2020-12-02  5:23 [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE Pavel Tatashin
                   ` (2 preceding siblings ...)
  2020-12-02  5:23 ` [PATCH 3/6] mm/gup: make __gup_longterm_locked common Pavel Tatashin
@ 2020-12-02  5:23 ` Pavel Tatashin
  2020-12-03  8:04   ` John Hubbard
  2020-12-03  8:57   ` Michal Hocko
  2020-12-02  5:23 ` [PATCH 5/6] mm: honor PF_MEMALLOC_NOMOVABLE for all allocations Pavel Tatashin
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-02  5:23 UTC (permalink / raw)
  To: pasha.tatashin, linux-kernel, linux-mm, akpm, vbabka, mhocko,
	david, osalvador, dan.j.williams, sashal, tyhicks,
	iamjoonsoo.kim, mike.kravetz, rostedt, mingo, jgg, peterz,
	mgorman, willy, rientjes, jhubbard

PF_MEMALLOC_NOCMA is used for longterm pinning and has an effect of
clearing _GFP_MOVABLE or prohibiting allocations from ZONE_MOVABLE.

We will prohibit allocating any pages that are getting
longterm pinned from ZONE_MOVABLE, and we would want to unify and re-use
this flag. So, rename it to generic PF_MEMALLOC_NOMOVABLE.
Also re-name:
memalloc_nocma_save()/memalloc_nocma_restore
to
memalloc_nomovable_save()/memalloc_nomovable_restore()
and make the new functions common.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 include/linux/sched.h    |  2 +-
 include/linux/sched/mm.h | 21 +++++----------------
 mm/gup.c                 |  4 ++--
 mm/hugetlb.c             |  4 ++--
 mm/page_alloc.c          |  4 ++--
 5 files changed, 12 insertions(+), 23 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 76cd21fa5501..f1bf05f5f5fa 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1548,7 +1548,7 @@ extern struct pid *cad_pid;
 #define PF_SWAPWRITE		0x00800000	/* Allowed to write to swap */
 #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
 #define PF_MCE_EARLY		0x08000000      /* Early kill for mce process policy */
-#define PF_MEMALLOC_NOCMA	0x10000000	/* All allocation request will have _GFP_MOVABLE cleared */
+#define PF_MEMALLOC_NOMOVABLE	0x10000000	/* All allocation request will have _GFP_MOVABLE cleared */
 #define PF_FREEZER_SKIP		0x40000000	/* Freezer should not count it as freezable */
 #define PF_SUSPEND_TASK		0x80000000      /* This thread called freeze_processes() and should not be frozen */
 
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index d5ece7a9a403..5bb9a6b69479 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -254,29 +254,18 @@ static inline void memalloc_noreclaim_restore(unsigned int flags)
 	current->flags = (current->flags & ~PF_MEMALLOC) | flags;
 }
 
-#ifdef CONFIG_CMA
-static inline unsigned int memalloc_nocma_save(void)
+static inline unsigned int memalloc_nomovable_save(void)
 {
-	unsigned int flags = current->flags & PF_MEMALLOC_NOCMA;
+	unsigned int flags = current->flags & PF_MEMALLOC_NOMOVABLE;
 
-	current->flags |= PF_MEMALLOC_NOCMA;
+	current->flags |= PF_MEMALLOC_NOMOVABLE;
 	return flags;
 }
 
-static inline void memalloc_nocma_restore(unsigned int flags)
+static inline void memalloc_nomovable_restore(unsigned int flags)
 {
-	current->flags = (current->flags & ~PF_MEMALLOC_NOCMA) | flags;
+	current->flags = (current->flags & ~PF_MEMALLOC_NOMOVABLE) | flags;
 }
-#else
-static inline unsigned int memalloc_nocma_save(void)
-{
-	return 0;
-}
-
-static inline void memalloc_nocma_restore(unsigned int flags)
-{
-}
-#endif
 
 #ifdef CONFIG_MEMCG
 DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
diff --git a/mm/gup.c b/mm/gup.c
index 0e2de888a8b0..724d8a65e1df 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1726,7 +1726,7 @@ static long __gup_longterm_locked(struct mm_struct *mm,
 			if (!vmas_tmp)
 				return -ENOMEM;
 		}
-		flags = memalloc_nocma_save();
+		flags = memalloc_nomovable_save();
 	}
 
 	rc = __get_user_pages_locked(mm, start, nr_pages, pages,
@@ -1749,7 +1749,7 @@ static long __gup_longterm_locked(struct mm_struct *mm,
 		rc = check_and_migrate_cma_pages(mm, start, rc, pages,
 						 vmas_tmp, gup_flags);
 out:
-		memalloc_nocma_restore(flags);
+		memalloc_nomovable_restore(flags);
 	}
 
 	if (vmas_tmp != vmas)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 37f15c3c24dc..02213c74ed6b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1033,10 +1033,10 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
 static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
 {
 	struct page *page;
-	bool nocma = !!(current->flags & PF_MEMALLOC_NOCMA);
+	bool nomovable = !!(current->flags & PF_MEMALLOC_NOMOVABLE);
 
 	list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
-		if (nocma && is_migrate_cma_page(page))
+		if (nomovable && is_migrate_cma_page(page))
 			continue;
 
 		if (PageHWPoison(page))
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index eaa227a479e4..611799c72da5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3772,8 +3772,8 @@ static inline unsigned int current_alloc_flags(gfp_t gfp_mask,
 #ifdef CONFIG_CMA
 	unsigned int pflags = current->flags;
 
-	if (!(pflags & PF_MEMALLOC_NOCMA) &&
-			gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
+	if (!(pflags & PF_MEMALLOC_NOMOVABLE) &&
+	    gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
 		alloc_flags |= ALLOC_CMA;
 
 #endif
-- 
2.25.1


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 5/6] mm: honor PF_MEMALLOC_NOMOVABLE for all allocations
  2020-12-02  5:23 [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE Pavel Tatashin
                   ` (3 preceding siblings ...)
  2020-12-02  5:23 ` [PATCH 4/6] mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_NOMOVABLE Pavel Tatashin
@ 2020-12-02  5:23 ` Pavel Tatashin
  2020-12-03  8:17   ` John Hubbard
  2020-12-03  9:17   ` Michal Hocko
  2020-12-02  5:23 ` [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone Pavel Tatashin
  2020-12-04  4:02 ` [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE Joonsoo Kim
  6 siblings, 2 replies; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-02  5:23 UTC (permalink / raw)
  To: pasha.tatashin, linux-kernel, linux-mm, akpm, vbabka, mhocko,
	david, osalvador, dan.j.williams, sashal, tyhicks,
	iamjoonsoo.kim, mike.kravetz, rostedt, mingo, jgg, peterz,
	mgorman, willy, rientjes, jhubbard

PF_MEMALLOC_NOMOVABLE is only honored for CMA allocations, extend
this flag to work for any allocations by removing __GFP_MOVABLE from
gfp_mask when this flag is passed in the current context, thus
prohibiting allocations from ZONE_MOVABLE.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 mm/hugetlb.c    |  2 +-
 mm/page_alloc.c | 26 ++++++++++++++++----------
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 02213c74ed6b..00e786201d8b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1036,7 +1036,7 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
 	bool nomovable = !!(current->flags & PF_MEMALLOC_NOMOVABLE);
 
 	list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
-		if (nomovable && is_migrate_cma_page(page))
+		if (nomovable && is_migrate_movable(get_pageblock_migratetype(page)))
 			continue;
 
 		if (PageHWPoison(page))
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 611799c72da5..7a6d86d0bc5f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3766,20 +3766,25 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
 	return alloc_flags;
 }
 
-static inline unsigned int current_alloc_flags(gfp_t gfp_mask,
-					unsigned int alloc_flags)
+static inline unsigned int cma_alloc_flags(gfp_t gfp_mask,
+					   unsigned int alloc_flags)
 {
 #ifdef CONFIG_CMA
-	unsigned int pflags = current->flags;
-
-	if (!(pflags & PF_MEMALLOC_NOMOVABLE) &&
-	    gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
+	if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
 		alloc_flags |= ALLOC_CMA;
-
 #endif
 	return alloc_flags;
 }
 
+static inline gfp_t current_gfp_checkmovable(gfp_t gfp_mask)
+{
+	unsigned int pflags = current->flags;
+
+	if ((pflags & PF_MEMALLOC_NOMOVABLE))
+		return gfp_mask & ~__GFP_MOVABLE;
+	return gfp_mask;
+}
+
 /*
  * get_page_from_freelist goes through the zonelist trying to allocate
  * a page.
@@ -4423,7 +4428,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 	} else if (unlikely(rt_task(current)) && !in_interrupt())
 		alloc_flags |= ALLOC_HARDER;
 
-	alloc_flags = current_alloc_flags(gfp_mask, alloc_flags);
+	alloc_flags = cma_alloc_flags(gfp_mask, alloc_flags);
 
 	return alloc_flags;
 }
@@ -4725,7 +4730,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 
 	reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
 	if (reserve_flags)
-		alloc_flags = current_alloc_flags(gfp_mask, reserve_flags);
+		alloc_flags = cma_alloc_flags(gfp_mask, reserve_flags);
 
 	/*
 	 * Reset the nodemask and zonelist iterators if memory policies can be
@@ -4894,7 +4899,7 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
 	if (should_fail_alloc_page(gfp_mask, order))
 		return false;
 
-	*alloc_flags = current_alloc_flags(gfp_mask, *alloc_flags);
+	*alloc_flags = cma_alloc_flags(gfp_mask, *alloc_flags);
 
 	/* Dirty zone balancing only done in the fast path */
 	ac->spread_dirty_pages = (gfp_mask & __GFP_WRITE);
@@ -4932,6 +4937,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
 	}
 
 	gfp_mask &= gfp_allowed_mask;
+	gfp_mask = current_gfp_checkmovable(gfp_mask);
 	alloc_mask = gfp_mask;
 	if (!prepare_alloc_pages(gfp_mask, order, preferred_nid, nodemask, &ac, &alloc_mask, &alloc_flags))
 		return NULL;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-02  5:23 [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE Pavel Tatashin
                   ` (4 preceding siblings ...)
  2020-12-02  5:23 ` [PATCH 5/6] mm: honor PF_MEMALLOC_NOMOVABLE for all allocations Pavel Tatashin
@ 2020-12-02  5:23 ` Pavel Tatashin
  2020-12-02 16:35   ` Jason Gunthorpe
                     ` (2 more replies)
  2020-12-04  4:02 ` [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE Joonsoo Kim
  6 siblings, 3 replies; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-02  5:23 UTC (permalink / raw)
  To: pasha.tatashin, linux-kernel, linux-mm, akpm, vbabka, mhocko,
	david, osalvador, dan.j.williams, sashal, tyhicks,
	iamjoonsoo.kim, mike.kravetz, rostedt, mingo, jgg, peterz,
	mgorman, willy, rientjes, jhubbard

We do not allocate pin pages in ZONE_MOVABLE, but if pages were already
allocated before pinning they need to migrated to a different zone.
Currently, we migrate movable CMA pages only. Generalize the function
that migrates CMA pages to migrate all movable pages.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 include/linux/migrate.h        |  1 +
 include/trace/events/migrate.h |  3 +-
 mm/gup.c                       | 56 +++++++++++++---------------------
 3 files changed, 24 insertions(+), 36 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 0f8d1583fa8e..00bab23d1ee5 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -27,6 +27,7 @@ enum migrate_reason {
 	MR_MEMPOLICY_MBIND,
 	MR_NUMA_MISPLACED,
 	MR_CONTIG_RANGE,
+	MR_LONGTERM_PIN,
 	MR_TYPES
 };
 
diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
index 4d434398d64d..363b54ce104c 100644
--- a/include/trace/events/migrate.h
+++ b/include/trace/events/migrate.h
@@ -20,7 +20,8 @@
 	EM( MR_SYSCALL,		"syscall_or_cpuset")		\
 	EM( MR_MEMPOLICY_MBIND,	"mempolicy_mbind")		\
 	EM( MR_NUMA_MISPLACED,	"numa_misplaced")		\
-	EMe(MR_CONTIG_RANGE,	"contig_range")
+	EM( MR_CONTIG_RANGE,	"contig_range")			\
+	EMe(MR_LONGTERM_PIN,	"longterm_pin")
 
 /*
  * First define the enums in the above macros to be exported to userspace
diff --git a/mm/gup.c b/mm/gup.c
index 724d8a65e1df..1d511f65f8a7 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1593,19 +1593,18 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
 }
 #endif
 
-#ifdef CONFIG_CMA
-static long check_and_migrate_cma_pages(struct mm_struct *mm,
-					unsigned long start,
-					unsigned long nr_pages,
-					struct page **pages,
-					struct vm_area_struct **vmas,
-					unsigned int gup_flags)
+static long check_and_migrate_movable_pages(struct mm_struct *mm,
+					    unsigned long start,
+					    unsigned long nr_pages,
+					    struct page **pages,
+					    struct vm_area_struct **vmas,
+					    unsigned int gup_flags)
 {
 	unsigned long i;
 	unsigned long step;
 	bool drain_allow = true;
 	bool migrate_allow = true;
-	LIST_HEAD(cma_page_list);
+	LIST_HEAD(page_list);
 	long ret = nr_pages;
 	struct migration_target_control mtc = {
 		.nid = NUMA_NO_NODE,
@@ -1623,13 +1622,12 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
 		 */
 		step = compound_nr(head) - (pages[i] - head);
 		/*
-		 * If we get a page from the CMA zone, since we are going to
-		 * be pinning these entries, we might as well move them out
-		 * of the CMA zone if possible.
+		 * If we get a movable page, since we are going to be pinning
+		 * these entries, try to move them out if possible.
 		 */
-		if (is_migrate_cma_page(head)) {
+		if (is_migrate_movable(get_pageblock_migratetype(head))) {
 			if (PageHuge(head))
-				isolate_huge_page(head, &cma_page_list);
+				isolate_huge_page(head, &page_list);
 			else {
 				if (!PageLRU(head) && drain_allow) {
 					lru_add_drain_all();
@@ -1637,7 +1635,7 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
 				}
 
 				if (!isolate_lru_page(head)) {
-					list_add_tail(&head->lru, &cma_page_list);
+					list_add_tail(&head->lru, &page_list);
 					mod_node_page_state(page_pgdat(head),
 							    NR_ISOLATED_ANON +
 							    page_is_file_lru(head),
@@ -1649,7 +1647,7 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
 		i += step;
 	}
 
-	if (!list_empty(&cma_page_list)) {
+	if (!list_empty(&page_list)) {
 		/*
 		 * drop the above get_user_pages reference.
 		 */
@@ -1659,7 +1657,7 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
 			for (i = 0; i < nr_pages; i++)
 				put_page(pages[i]);
 
-		if (migrate_pages(&cma_page_list, alloc_migration_target, NULL,
+		if (migrate_pages(&page_list, alloc_migration_target, NULL,
 			(unsigned long)&mtc, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
 			/*
 			 * some of the pages failed migration. Do get_user_pages
@@ -1667,17 +1665,16 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
 			 */
 			migrate_allow = false;
 
-			if (!list_empty(&cma_page_list))
-				putback_movable_pages(&cma_page_list);
+			if (!list_empty(&page_list))
+				putback_movable_pages(&page_list);
 		}
 		/*
 		 * We did migrate all the pages, Try to get the page references
-		 * again migrating any new CMA pages which we failed to isolate
-		 * earlier.
+		 * again migrating any pages which we failed to isolate earlier.
 		 */
 		ret = __get_user_pages_locked(mm, start, nr_pages,
-						   pages, vmas, NULL,
-						   gup_flags);
+					      pages, vmas, NULL,
+					      gup_flags);
 
 		if ((ret > 0) && migrate_allow) {
 			nr_pages = ret;
@@ -1688,17 +1685,6 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
 
 	return ret;
 }
-#else
-static long check_and_migrate_cma_pages(struct mm_struct *mm,
-					unsigned long start,
-					unsigned long nr_pages,
-					struct page **pages,
-					struct vm_area_struct **vmas,
-					unsigned int gup_flags)
-{
-	return nr_pages;
-}
-#endif /* CONFIG_CMA */
 
 /*
  * __gup_longterm_locked() is a wrapper for __get_user_pages_locked which
@@ -1746,8 +1732,8 @@ static long __gup_longterm_locked(struct mm_struct *mm,
 			goto out;
 		}
 
-		rc = check_and_migrate_cma_pages(mm, start, rc, pages,
-						 vmas_tmp, gup_flags);
+		rc = check_and_migrate_movable_pages(mm, start, rc, pages,
+						     vmas_tmp, gup_flags);
 out:
 		memalloc_nomovable_restore(flags);
 	}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/6] mm/gup: perform check_dax_vmas only when FS_DAX is enabled
  2020-12-02  5:23 ` [PATCH 1/6] mm/gup: perform check_dax_vmas only when FS_DAX is enabled Pavel Tatashin
@ 2020-12-02 16:22   ` Ira Weiny
  2020-12-02 18:15     ` Pavel Tatashin
  2020-12-02 16:29   ` Jason Gunthorpe
  2020-12-03  7:59   ` John Hubbard
  2 siblings, 1 reply; 67+ messages in thread
From: Ira Weiny @ 2020-12-02 16:22 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: linux-kernel, linux-mm, akpm, vbabka, mhocko, david, osalvador,
	dan.j.williams, sashal, tyhicks, iamjoonsoo.kim, mike.kravetz,
	rostedt, mingo, jgg, peterz, mgorman, willy, rientjes, jhubbard

On Wed, Dec 02, 2020 at 12:23:25AM -0500, Pavel Tatashin wrote:
> There is no need to check_dax_vmas() and run through the npage loop of
> pinned pages if FS_DAX is not enabled.
> 
> Add a stub check_dax_vmas() function for no-FS_DAX case.

This looks like a good idea.

> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> ---
>  mm/gup.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 98eb8e6d2609..cdb8b9eeb016 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1568,6 +1568,7 @@ struct page *get_dump_page(unsigned long addr)
>  #endif /* CONFIG_ELF_CORE */
>  
>  #if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)

In addition, I think it would make a lot of sense to clean up this config as
well like this:

08:20:10 > git di
diff --git a/mm/gup.c b/mm/gup.c
index 102877ed77a4..92cfda220aeb 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1567,7 +1567,7 @@ struct page *get_dump_page(unsigned long addr)
 }
 #endif /* CONFIG_ELF_CORE */
 
-#if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)
+#ifdef CONFIG_FS_DAX
 static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
 {
        long i;
@@ -1586,6 +1586,12 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
        }
        return false;
 }
+#else
+static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
+{
+       return false;
+}
+#endif /* CONFIG_FS_DAX */
 
 #ifdef CONFIG_CMA
 static long check_and_migrate_cma_pages(struct mm_struct *mm,
@@ -1691,6 +1697,7 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
 }
 #endif /* CONFIG_CMA */
 
+#if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)
 /*
  * __gup_longterm_locked() is a wrapper for __get_user_pages_locked which
  * allows us to process the FOLL_LONGTERM flag.


That makes it more clear what is going on with __gup_longterm_locked() and
places both CMA and FS_DAX code within their own blocks.

Ira

> +#ifdef CONFIG_FS_DAX
>  static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
>  {
>  	long i;
> @@ -1586,6 +1587,12 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
>  	}
>  	return false;
>  }
> +#else
> +static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
> +{
> +	return false;
> +}
> +#endif
>  
>  #ifdef CONFIG_CMA
>  static long check_and_migrate_cma_pages(struct mm_struct *mm,
> -- 
> 2.25.1
> 
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/6] mm/gup: perform check_dax_vmas only when FS_DAX is enabled
  2020-12-02  5:23 ` [PATCH 1/6] mm/gup: perform check_dax_vmas only when FS_DAX is enabled Pavel Tatashin
  2020-12-02 16:22   ` Ira Weiny
@ 2020-12-02 16:29   ` Jason Gunthorpe
  2020-12-02 18:16     ` Pavel Tatashin
  2020-12-03  7:59   ` John Hubbard
  2 siblings, 1 reply; 67+ messages in thread
From: Jason Gunthorpe @ 2020-12-02 16:29 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: linux-kernel, linux-mm, akpm, vbabka, mhocko, david, osalvador,
	dan.j.williams, sashal, tyhicks, iamjoonsoo.kim, mike.kravetz,
	rostedt, mingo, peterz, mgorman, willy, rientjes, jhubbard

On Wed, Dec 02, 2020 at 12:23:25AM -0500, Pavel Tatashin wrote:
> There is no need to check_dax_vmas() and run through the npage loop of
> pinned pages if FS_DAX is not enabled.
> 
> Add a stub check_dax_vmas() function for no-FS_DAX case.
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> ---\0
>  mm/gup.c | 7 +++++++
>  1 file changed, 7 insertions(+)

I have a patch to delete check_dax_vmas that is just waiting on me to
figure out how to test with dax. That makes all this ifdefery much
simpler

Jason

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 2/6] mm/gup: don't pin migrated cma pages in movable zone
  2020-12-02  5:23 ` [PATCH 2/6] mm/gup: don't pin migrated cma pages in movable zone Pavel Tatashin
@ 2020-12-02 16:31   ` David Hildenbrand
  2020-12-02 18:17     ` Pavel Tatashin
  2020-12-03  8:01   ` John Hubbard
  2020-12-03  8:46   ` Michal Hocko
  2 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2020-12-02 16:31 UTC (permalink / raw)
  To: Pavel Tatashin, linux-kernel, linux-mm, akpm, vbabka, mhocko,
	osalvador, dan.j.williams, sashal, tyhicks, iamjoonsoo.kim,
	mike.kravetz, rostedt, mingo, jgg, peterz, mgorman, willy,
	rientjes, jhubbard

On 02.12.20 06:23, Pavel Tatashin wrote:
> In order not to fragment CMA the pinned pages are migrated. However,
> they are migrated to ZONE_MOVABLE, which also should not have pinned pages.
> 
> Remove __GFP_MOVABLE, so pages can be migrated to zones where pinning
> is allowed.
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> ---
>  mm/gup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index cdb8b9eeb016..3a76c005a3e2 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1610,7 +1610,7 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
>  	long ret = nr_pages;
>  	struct migration_target_control mtc = {
>  		.nid = NUMA_NO_NODE,
> -		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN,
> +		.gfp_mask = GFP_USER | __GFP_NOWARN,
>  	};
>  
>  check_again:
> 


Looks like the right thing to me, thanks!

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 3/6] mm/gup: make __gup_longterm_locked common
  2020-12-02  5:23 ` [PATCH 3/6] mm/gup: make __gup_longterm_locked common Pavel Tatashin
@ 2020-12-02 16:31   ` Ira Weiny
  2020-12-02 16:33     ` Ira Weiny
  2020-12-03  8:03   ` John Hubbard
  1 sibling, 1 reply; 67+ messages in thread
From: Ira Weiny @ 2020-12-02 16:31 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: linux-kernel, linux-mm, akpm, vbabka, mhocko, david, osalvador,
	dan.j.williams, sashal, tyhicks, iamjoonsoo.kim, mike.kravetz,
	rostedt, mingo, jgg, peterz, mgorman, willy, rientjes, jhubbard

On Wed, Dec 02, 2020 at 12:23:27AM -0500, Pavel Tatashin wrote:
> __gup_longterm_locked() has CMA || FS_DAX version and a common stub
> version. In the preparation of prohibiting longterm pinning of pages from
> movable zone make the CMA || FS_DAX version common, and delete the stub
> version.
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> ---
>  mm/gup.c | 13 -------------
>  1 file changed, 13 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 3a76c005a3e2..0e2de888a8b0 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1567,7 +1567,6 @@ struct page *get_dump_page(unsigned long addr)
>  }
>  #endif /* CONFIG_ELF_CORE */
>  
> -#if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)
>  #ifdef CONFIG_FS_DAX
>  static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
>  {
> @@ -1757,18 +1756,6 @@ static long __gup_longterm_locked(struct mm_struct *mm,
>  		kfree(vmas_tmp);
>  	return rc;
>  }

Isn't this going to potentially allocate vmas_tmp only to not need it when
!FS_DAX and !CMA?

Ira

> -#else /* !CONFIG_FS_DAX && !CONFIG_CMA */
> -static __always_inline long __gup_longterm_locked(struct mm_struct *mm,
> -						  unsigned long start,
> -						  unsigned long nr_pages,
> -						  struct page **pages,
> -						  struct vm_area_struct **vmas,
> -						  unsigned int flags)
> -{
> -	return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
> -				       NULL, flags);
> -}
> -#endif /* CONFIG_FS_DAX || CONFIG_CMA */
>  
>  static bool is_valid_gup_flags(unsigned int gup_flags)
>  {
> -- 
> 2.25.1
> 
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 3/6] mm/gup: make __gup_longterm_locked common
  2020-12-02 16:31   ` Ira Weiny
@ 2020-12-02 16:33     ` Ira Weiny
  2020-12-02 18:19       ` Pavel Tatashin
  0 siblings, 1 reply; 67+ messages in thread
From: Ira Weiny @ 2020-12-02 16:33 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: linux-kernel, linux-mm, akpm, vbabka, mhocko, david, osalvador,
	dan.j.williams, sashal, tyhicks, iamjoonsoo.kim, mike.kravetz,
	rostedt, mingo, jgg, peterz, mgorman, willy, rientjes, jhubbard

On Wed, Dec 02, 2020 at 08:31:45AM -0800, 'Ira Weiny' wrote:
> On Wed, Dec 02, 2020 at 12:23:27AM -0500, Pavel Tatashin wrote:
> > __gup_longterm_locked() has CMA || FS_DAX version and a common stub
> > version. In the preparation of prohibiting longterm pinning of pages from
> > movable zone make the CMA || FS_DAX version common, and delete the stub
> > version.
> > 
> > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> > ---
> >  mm/gup.c | 13 -------------
> >  1 file changed, 13 deletions(-)
> > 
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 3a76c005a3e2..0e2de888a8b0 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -1567,7 +1567,6 @@ struct page *get_dump_page(unsigned long addr)
> >  }
> >  #endif /* CONFIG_ELF_CORE */
> >  
> > -#if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)
> >  #ifdef CONFIG_FS_DAX
> >  static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
> >  {
> > @@ -1757,18 +1756,6 @@ static long __gup_longterm_locked(struct mm_struct *mm,
> >  		kfree(vmas_tmp);
> >  	return rc;
> >  }
> 
> Isn't this going to potentially allocate vmas_tmp only to not need it when
> !FS_DAX and !CMA?

To clarify, when FOLL_LONGTERM is set...

IRa

> 
> Ira
> 
> > -#else /* !CONFIG_FS_DAX && !CONFIG_CMA */
> > -static __always_inline long __gup_longterm_locked(struct mm_struct *mm,
> > -						  unsigned long start,
> > -						  unsigned long nr_pages,
> > -						  struct page **pages,
> > -						  struct vm_area_struct **vmas,
> > -						  unsigned int flags)
> > -{
> > -	return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
> > -				       NULL, flags);
> > -}
> > -#endif /* CONFIG_FS_DAX || CONFIG_CMA */
> >  
> >  static bool is_valid_gup_flags(unsigned int gup_flags)
> >  {
> > -- 
> > 2.25.1
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-02  5:23 ` [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone Pavel Tatashin
@ 2020-12-02 16:35   ` Jason Gunthorpe
  2020-12-03  0:19     ` Pavel Tatashin
  2020-12-03  8:22   ` John Hubbard
  2020-12-04  4:13   ` Joonsoo Kim
  2 siblings, 1 reply; 67+ messages in thread
From: Jason Gunthorpe @ 2020-12-02 16:35 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: linux-kernel, linux-mm, akpm, vbabka, mhocko, david, osalvador,
	dan.j.williams, sashal, tyhicks, iamjoonsoo.kim, mike.kravetz,
	rostedt, mingo, peterz, mgorman, willy, rientjes, jhubbard

On Wed, Dec 02, 2020 at 12:23:30AM -0500, Pavel Tatashin wrote:
>  /*
>   * First define the enums in the above macros to be exported to userspace
> diff --git a/mm/gup.c b/mm/gup.c
> index 724d8a65e1df..1d511f65f8a7 100644
> +++ b/mm/gup.c
> @@ -1593,19 +1593,18 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
>  }
>  #endif
>  
> -#ifdef CONFIG_CMA
> -static long check_and_migrate_cma_pages(struct mm_struct *mm,
> -					unsigned long start,
> -					unsigned long nr_pages,
> -					struct page **pages,
> -					struct vm_area_struct **vmas,
> -					unsigned int gup_flags)
> +static long check_and_migrate_movable_pages(struct mm_struct *mm,
> +					    unsigned long start,
> +					    unsigned long nr_pages,
> +					    struct page **pages,
> +					    struct vm_area_struct **vmas,
> +					    unsigned int gup_flags)
>  {
>  	unsigned long i;
>  	unsigned long step;
>  	bool drain_allow = true;
>  	bool migrate_allow = true;
> -	LIST_HEAD(cma_page_list);
> +	LIST_HEAD(page_list);
>  	long ret = nr_pages;
>  	struct migration_target_control mtc = {
>  		.nid = NUMA_NO_NODE,
> @@ -1623,13 +1622,12 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
>  		 */
>  		step = compound_nr(head) - (pages[i] - head);
>  		/*
> -		 * If we get a page from the CMA zone, since we are going to
> -		 * be pinning these entries, we might as well move them out
> -		 * of the CMA zone if possible.
> +		 * If we get a movable page, since we are going to be pinning
> +		 * these entries, try to move them out if possible.
>  		 */
> -		if (is_migrate_cma_page(head)) {
> +		if (is_migrate_movable(get_pageblock_migratetype(head))) {
>  			if (PageHuge(head))

It is a good moment to say, I really dislike how this was implemented
in the first place.

Scanning the output of gup just to do the is_migrate_movable() test is
kind of nonsense and slow. It would be better/faster to handle this
directly while gup is scanning the page tables and adding pages to the
list.

Now that this becoming more general, can you take a moment to see if a
better implementation could be possible?

Also, something takes care of the gup fast path too?

Jason

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/6] mm/gup: perform check_dax_vmas only when FS_DAX is enabled
  2020-12-02 16:22   ` Ira Weiny
@ 2020-12-02 18:15     ` Pavel Tatashin
  0 siblings, 0 replies; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-02 18:15 UTC (permalink / raw)
  To: Ira Weiny
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Jason Gunthorpe, Peter Zijlstra, Mel Gorman,
	Matthew Wilcox, David Rientjes, John Hubbard

On Wed, Dec 2, 2020 at 11:22 AM Ira Weiny <ira.weiny@intel.com> wrote:
>
> On Wed, Dec 02, 2020 at 12:23:25AM -0500, Pavel Tatashin wrote:
> > There is no need to check_dax_vmas() and run through the npage loop of
> > pinned pages if FS_DAX is not enabled.
> >
> > Add a stub check_dax_vmas() function for no-FS_DAX case.
>
> This looks like a good idea.
>
> >
> > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> > ---
> >  mm/gup.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 98eb8e6d2609..cdb8b9eeb016 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -1568,6 +1568,7 @@ struct page *get_dump_page(unsigned long addr)
> >  #endif /* CONFIG_ELF_CORE */
> >
> >  #if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)
>
> In addition, I think it would make a lot of sense to clean up this config as
> well like this:
>
> 08:20:10 > git di
> diff --git a/mm/gup.c b/mm/gup.c
> index 102877ed77a4..92cfda220aeb 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1567,7 +1567,7 @@ struct page *get_dump_page(unsigned long addr)
>  }
>  #endif /* CONFIG_ELF_CORE */
>
> -#if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)
> +#ifdef CONFIG_FS_DAX
>  static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
>  {
>         long i;
> @@ -1586,6 +1586,12 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
>         }
>         return false;
>  }
> +#else
> +static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
> +{
> +       return false;
> +}
> +#endif /* CONFIG_FS_DAX */
>
>  #ifdef CONFIG_CMA
>  static long check_and_migrate_cma_pages(struct mm_struct *mm,
> @@ -1691,6 +1697,7 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
>  }
>  #endif /* CONFIG_CMA */
>
> +#if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)
>  /*
>   * __gup_longterm_locked() is a wrapper for __get_user_pages_locked which
>   * allows us to process the FOLL_LONGTERM flag.
>
>
> That makes it more clear what is going on with __gup_longterm_locked() and
> places both CMA and FS_DAX code within their own blocks.

Hi Ira,

Thank you for your comments. I think the end result of what you are
suggesting is the same for this series when __gup_longterm_locked is
made common. " #if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)"
mess is removed in that patch.

Pasha

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/6] mm/gup: perform check_dax_vmas only when FS_DAX is enabled
  2020-12-02 16:29   ` Jason Gunthorpe
@ 2020-12-02 18:16     ` Pavel Tatashin
  0 siblings, 0 replies; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-02 18:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

On Wed, Dec 2, 2020 at 11:30 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Dec 02, 2020 at 12:23:25AM -0500, Pavel Tatashin wrote:
> > There is no need to check_dax_vmas() and run through the npage loop of
> > pinned pages if FS_DAX is not enabled.
> >
> > Add a stub check_dax_vmas() function for no-FS_DAX case.
> >
> > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> > ---
> >  mm/gup.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
>
> I have a patch to delete check_dax_vmas that is just waiting on me to
> figure out how to test with dax. That makes all this ifdefery much
> simpler

Hi Jason,

Yeap, that would be nice. I made this change as a preparation for
moving __gup_longterm_locked into common code, so when you send your
patch it can remove both versions of check_dax_vmas.

Thank you,
Pasha

>
> Jason

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 2/6] mm/gup: don't pin migrated cma pages in movable zone
  2020-12-02 16:31   ` David Hildenbrand
@ 2020-12-02 18:17     ` Pavel Tatashin
  0 siblings, 0 replies; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-02 18:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	Oscar Salvador, Dan Williams, Sasha Levin, Tyler Hicks,
	Joonsoo Kim, mike.kravetz, Steven Rostedt, Ingo Molnar,
	Jason Gunthorpe, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

> Looks like the right thing to me, thanks!
>
> Reviewed-by: David Hildenbrand <david@redhat.com>

Thank you,
Pasha

>
> --
> Thanks,
>
> David / dhildenb
>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 3/6] mm/gup: make __gup_longterm_locked common
  2020-12-02 16:33     ` Ira Weiny
@ 2020-12-02 18:19       ` Pavel Tatashin
  2020-12-03  0:03         ` Pavel Tatashin
  0 siblings, 1 reply; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-02 18:19 UTC (permalink / raw)
  To: Ira Weiny
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Jason Gunthorpe, Peter Zijlstra, Mel Gorman,
	Matthew Wilcox, David Rientjes, John Hubbard

On Wed, Dec 2, 2020 at 11:33 AM Ira Weiny <ira.weiny@intel.com> wrote:
>
> On Wed, Dec 02, 2020 at 08:31:45AM -0800, 'Ira Weiny' wrote:
> > On Wed, Dec 02, 2020 at 12:23:27AM -0500, Pavel Tatashin wrote:
> > > __gup_longterm_locked() has CMA || FS_DAX version and a common stub
> > > version. In the preparation of prohibiting longterm pinning of pages from
> > > movable zone make the CMA || FS_DAX version common, and delete the stub
> > > version.
> > >
> > > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> > > ---
> > >  mm/gup.c | 13 -------------
> > >  1 file changed, 13 deletions(-)
> > >
> > > diff --git a/mm/gup.c b/mm/gup.c
> > > index 3a76c005a3e2..0e2de888a8b0 100644
> > > --- a/mm/gup.c
> > > +++ b/mm/gup.c
> > > @@ -1567,7 +1567,6 @@ struct page *get_dump_page(unsigned long addr)
> > >  }
> > >  #endif /* CONFIG_ELF_CORE */
> > >
> > > -#if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)
> > >  #ifdef CONFIG_FS_DAX
> > >  static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
> > >  {
> > > @@ -1757,18 +1756,6 @@ static long __gup_longterm_locked(struct mm_struct *mm,
> > >             kfree(vmas_tmp);
> > >     return rc;
> > >  }
> >
> > Isn't this going to potentially allocate vmas_tmp only to not need it when
> > !FS_DAX and !CMA?
>
> To clarify, when FOLL_LONGTERM is set...

Yes, this is the case. We need that because once migration is checked
for all allocations, not only CMA, we need vmas_tmp for all cases.

Pasha

>
> IRa
>
> >
> > Ira
> >
> > > -#else /* !CONFIG_FS_DAX && !CONFIG_CMA */
> > > -static __always_inline long __gup_longterm_locked(struct mm_struct *mm,
> > > -                                             unsigned long start,
> > > -                                             unsigned long nr_pages,
> > > -                                             struct page **pages,
> > > -                                             struct vm_area_struct **vmas,
> > > -                                             unsigned int flags)
> > > -{
> > > -   return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
> > > -                                  NULL, flags);
> > > -}
> > > -#endif /* CONFIG_FS_DAX || CONFIG_CMA */
> > >
> > >  static bool is_valid_gup_flags(unsigned int gup_flags)
> > >  {
> > > --
> > > 2.25.1
> > >
> > >
> >

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 3/6] mm/gup: make __gup_longterm_locked common
  2020-12-02 18:19       ` Pavel Tatashin
@ 2020-12-03  0:03         ` Pavel Tatashin
  0 siblings, 0 replies; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-03  0:03 UTC (permalink / raw)
  To: Ira Weiny
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Jason Gunthorpe, Peter Zijlstra, Mel Gorman,
	Matthew Wilcox, David Rientjes, John Hubbard

On Wed, Dec 2, 2020 at 1:19 PM Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
>
> On Wed, Dec 2, 2020 at 11:33 AM Ira Weiny <ira.weiny@intel.com> wrote:
> >
> > On Wed, Dec 02, 2020 at 08:31:45AM -0800, 'Ira Weiny' wrote:
> > > On Wed, Dec 02, 2020 at 12:23:27AM -0500, Pavel Tatashin wrote:
> > > > __gup_longterm_locked() has CMA || FS_DAX version and a common stub
> > > > version. In the preparation of prohibiting longterm pinning of pages from
> > > > movable zone make the CMA || FS_DAX version common, and delete the stub
> > > > version.
> > > >
> > > > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> > > > ---
> > > >  mm/gup.c | 13 -------------
> > > >  1 file changed, 13 deletions(-)
> > > >
> > > > diff --git a/mm/gup.c b/mm/gup.c
> > > > index 3a76c005a3e2..0e2de888a8b0 100644
> > > > --- a/mm/gup.c
> > > > +++ b/mm/gup.c
> > > > @@ -1567,7 +1567,6 @@ struct page *get_dump_page(unsigned long addr)
> > > >  }
> > > >  #endif /* CONFIG_ELF_CORE */
> > > >
> > > > -#if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)
> > > >  #ifdef CONFIG_FS_DAX
> > > >  static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
> > > >  {
> > > > @@ -1757,18 +1756,6 @@ static long __gup_longterm_locked(struct mm_struct *mm,
> > > >             kfree(vmas_tmp);
> > > >     return rc;
> > > >  }
> > >
> > > Isn't this going to potentially allocate vmas_tmp only to not need it when
> > > !FS_DAX and !CMA?
> >
> > To clarify, when FOLL_LONGTERM is set...
>
> Yes, this is the case. We need that because once migration is checked
> for all allocations, not only CMA, we need vmas_tmp for all cases.

A slight correction, we only need vmas_tmp for check_dax_vmas().
Potentially, we could wrap vmas_tmp allocation in a #ifdef for FS_DAX,
but I do not think this is really needed.

Pasha

>
> Pasha
>
> >
> > IRa
> >
> > >
> > > Ira
> > >
> > > > -#else /* !CONFIG_FS_DAX && !CONFIG_CMA */
> > > > -static __always_inline long __gup_longterm_locked(struct mm_struct *mm,
> > > > -                                             unsigned long start,
> > > > -                                             unsigned long nr_pages,
> > > > -                                             struct page **pages,
> > > > -                                             struct vm_area_struct **vmas,
> > > > -                                             unsigned int flags)
> > > > -{
> > > > -   return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
> > > > -                                  NULL, flags);
> > > > -}
> > > > -#endif /* CONFIG_FS_DAX || CONFIG_CMA */
> > > >
> > > >  static bool is_valid_gup_flags(unsigned int gup_flags)
> > > >  {
> > > > --
> > > > 2.25.1
> > > >
> > > >
> > >

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-02 16:35   ` Jason Gunthorpe
@ 2020-12-03  0:19     ` Pavel Tatashin
  2020-12-03  1:08       ` Jason Gunthorpe
  0 siblings, 1 reply; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-03  0:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

> It is a good moment to say, I really dislike how this was implemented
> in the first place.
>
> Scanning the output of gup just to do the is_migrate_movable() test is
> kind of nonsense and slow. It would be better/faster to handle this
> directly while gup is scanning the page tables and adding pages to the
> list.

Hi Jason,

I assume you mean to migrate pages as soon as they are followed and
skip those that are faulted, as we already know that faulted pages are
allocated from nomovable zone.

The place would be:

__get_user_pages()
      while(more pages)
          get_gate_page()
          follow_hugetlb_page()
          follow_page_mask()

          if (!page)
               faultin_page()

          if (page && !faulted && (gup_flags & FOLL_LONGTERM) )
                check_and_migrate this page

I looked at that function, and I do not think the code will be cleaner
there, as that function already has a complicated loop.  The only
drawback with the current approach that I see is that
check_and_migrate_movable_pages() has to check once the faulted pages.
This is while not optimal is not horrible. The FOLL_LONGTERM should
not happen too frequently, so having one extra nr_pages loop should
not hurt the performance. Also, I checked and most of the users of
FOLL_LONGTERM pin only one page at a time. Which means the extra loop
is only to check a single page.

We could discuss improving this code farther. For example, I still
think it would be a good idea to pass an appropriate gfp_mask via
fault_flags from gup_flags instead of using
PF_MEMALLOC_NOMOVABLE (previously PF_MEMALLOC_NOCMA) per context flag.
However, those changes can come after this series. The current series
fixes a bug where hot-remove is not working with making minimal amount
of changes, so it is easy to backport it to stable kernels.

Thank you,
Pasha



>
> Now that this becoming more general, can you take a moment to see if a
> better implementation could be possible?
>
> Also, something takes care of the gup fast path too?
>
> Jason

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-03  0:19     ` Pavel Tatashin
@ 2020-12-03  1:08       ` Jason Gunthorpe
  2020-12-03  1:34         ` Pavel Tatashin
  0 siblings, 1 reply; 67+ messages in thread
From: Jason Gunthorpe @ 2020-12-03  1:08 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

On Wed, Dec 02, 2020 at 07:19:45PM -0500, Pavel Tatashin wrote:
> > It is a good moment to say, I really dislike how this was implemented
> > in the first place.
> >
> > Scanning the output of gup just to do the is_migrate_movable() test is
> > kind of nonsense and slow. It would be better/faster to handle this
> > directly while gup is scanning the page tables and adding pages to the
> > list.
> 
> Hi Jason,
> 
> I assume you mean to migrate pages as soon as they are followed and
> skip those that are faulted, as we already know that faulted pages are
> allocated from nomovable zone.
> 
> The place would be:
> 
> __get_user_pages()
>       while(more pages)
>           get_gate_page()
>           follow_hugetlb_page()
>           follow_page_mask()
> 
>           if (!page)
>                faultin_page()
> 
>           if (page && !faulted && (gup_flags & FOLL_LONGTERM) )
>                 check_and_migrate this page

Either here or perhaps even lower down the call chain when the page is
captured, similar to how GUP fast would detect it. (how is that done
anyhow?)

> I looked at that function, and I do not think the code will be cleaner
> there, as that function already has a complicated loop.  

That function is complicated for its own reasons.. But complicated is
not the point here..

> The only drawback with the current approach that I see is that
> check_and_migrate_movable_pages() has to check once the faulted
> pages.

Yes

> This is while not optimal is not horrible. 

It is.

> The FOLL_LONGTERM should not happen too frequently, so having one
> extra nr_pages loop should not hurt the performance. 

FOLL_LONGTERM is typically used with very large regions, for instance
we are benchmarking around the 300G level. It takes 10s of seconds for
get_user_pages to operate. There are many inefficiencies in this
path. This extra work of re-scanning the list is part of the cost.

Further, having these special wrappers just for FOLL_LONGTERM has a
spill over complexity on the entire rest of the callchain up to here,
we now have endless wrappers and varieties of function calls that
generally are happening because the longterm path needs to end up in a
different place than other paths.

IMHO this is due to the lack of integration with the primary loop
above

> Also, I checked and most of the users of FOLL_LONGTERM pin only one
> page at a time. Which means the extra loop is only to check a single
> page.

Er, I don't know what you checked but those are not the cases I
see. Two big users are vfio and rdma. Both are pinning huge ranges of
memory in very typical use cases.

> However, those changes can come after this series. The current series
> fixes a bug where hot-remove is not working with making minimal amount
> of changes, so it is easy to backport it to stable kernels.

This is a good point, good enough that you should probably continue as
is

Jason

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-03  1:08       ` Jason Gunthorpe
@ 2020-12-03  1:34         ` Pavel Tatashin
  2020-12-03 14:17           ` Jason Gunthorpe
  0 siblings, 1 reply; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-03  1:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

On Wed, Dec 2, 2020 at 8:08 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Dec 02, 2020 at 07:19:45PM -0500, Pavel Tatashin wrote:
> > > It is a good moment to say, I really dislike how this was implemented
> > > in the first place.
> > >
> > > Scanning the output of gup just to do the is_migrate_movable() test is
> > > kind of nonsense and slow. It would be better/faster to handle this
> > > directly while gup is scanning the page tables and adding pages to the
> > > list.
> >
> > Hi Jason,
> >
> > I assume you mean to migrate pages as soon as they are followed and
> > skip those that are faulted, as we already know that faulted pages are
> > allocated from nomovable zone.
> >
> > The place would be:
> >
> > __get_user_pages()
> >       while(more pages)
> >           get_gate_page()
> >           follow_hugetlb_page()
> >           follow_page_mask()
> >
> >           if (!page)
> >                faultin_page()
> >
> >           if (page && !faulted && (gup_flags & FOLL_LONGTERM) )
> >                 check_and_migrate this page
>
> Either here or perhaps even lower down the call chain when the page is
> captured, similar to how GUP fast would detect it. (how is that done
> anyhow?)

Ah, thank you for pointing this out. I think I need to address it here:

https://soleen.com/source/xref/linux/mm/gup.c?r=96e1fac1#94

static __maybe_unused struct page *try_grab_compound_head()
              if (unlikely(flags & FOLL_LONGTERM) &&  is_migrate_cma_page(page))
                   return NULL;

I need to change is_migrate_cma_page() to all migratable pages. Will
study, and send an update with this fix.

>
> > I looked at that function, and I do not think the code will be cleaner
> > there, as that function already has a complicated loop.
>
> That function is complicated for its own reasons.. But complicated is
> not the point here..
>
> > The only drawback with the current approach that I see is that
> > check_and_migrate_movable_pages() has to check once the faulted
> > pages.
>
> Yes
>
> > This is while not optimal is not horrible.
>
> It is.
>
> > The FOLL_LONGTERM should not happen too frequently, so having one
> > extra nr_pages loop should not hurt the performance.
>
> FOLL_LONGTERM is typically used with very large regions, for instance
> we are benchmarking around the 300G level. It takes 10s of seconds for
> get_user_pages to operate. There are many inefficiencies in this
> path. This extra work of re-scanning the list is part of the cost.

OK, I did not realize that pinning was for such large regions, the
path must be optimized.

>
> Further, having these special wrappers just for FOLL_LONGTERM has a
> spill over complexity on the entire rest of the callchain up to here,
> we now have endless wrappers and varieties of function calls that
> generally are happening because the longterm path needs to end up in a
> different place than other paths.
>
> IMHO this is due to the lack of integration with the primary loop
> above
>
> > Also, I checked and most of the users of FOLL_LONGTERM pin only one
> > page at a time. Which means the extra loop is only to check a single
> > page.
>
> Er, I don't know what you checked but those are not the cases I
> see. Two big users are vfio and rdma. Both are pinning huge ranges of
> memory in very typical use cases.

What I meant is the users of the interface do it incrementally not in
large chunks. For example:

vfio_pin_pages_remote
   vaddr_get_pfn
        ret = pin_user_pages_remote(mm, vaddr, 1, flags |
FOLL_LONGTERM, page, NULL, NULL);
1 -> pin only one pages at a time

RDMA indeed can do it in one chunk though. Regardless, the VFIO should
probably be optimized to do it in a larger chunk, and the code path
should be optimized for the reasons you gave above.

>
> > However, those changes can come after this series. The current series
> > fixes a bug where hot-remove is not working with making minimal amount
> > of changes, so it is easy to backport it to stable kernels.
>
> This is a good point, good enough that you should probably continue as
> is

I will continue looking into this code, and see if I can address your
concerns in a follow-up fixes.


Thanks,
Pasha

>
> Jason

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/6] mm/gup: perform check_dax_vmas only when FS_DAX is enabled
  2020-12-02  5:23 ` [PATCH 1/6] mm/gup: perform check_dax_vmas only when FS_DAX is enabled Pavel Tatashin
  2020-12-02 16:22   ` Ira Weiny
  2020-12-02 16:29   ` Jason Gunthorpe
@ 2020-12-03  7:59   ` John Hubbard
  2020-12-03 14:52     ` Pavel Tatashin
  2 siblings, 1 reply; 67+ messages in thread
From: John Hubbard @ 2020-12-03  7:59 UTC (permalink / raw)
  To: Pavel Tatashin, linux-kernel, linux-mm, akpm, vbabka, mhocko,
	david, osalvador, dan.j.williams, sashal, tyhicks,
	iamjoonsoo.kim, mike.kravetz, rostedt, mingo, jgg, peterz,
	mgorman, willy, rientjes

On 12/1/20 9:23 PM, Pavel Tatashin wrote:
> There is no need to check_dax_vmas() and run through the npage loop of
> pinned pages if FS_DAX is not enabled.
> 
> Add a stub check_dax_vmas() function for no-FS_DAX case.
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> ---
>   mm/gup.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 98eb8e6d2609..cdb8b9eeb016 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1568,6 +1568,7 @@ struct page *get_dump_page(unsigned long addr)
>   #endif /* CONFIG_ELF_CORE */
>   
>   #if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)
> +#ifdef CONFIG_FS_DAX
>   static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
>   {
>   	long i;
> @@ -1586,6 +1587,12 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
>   	}
>   	return false;
>   }
> +#else
> +static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
> +{
> +	return false;
> +}
> +#endif
>   
>   #ifdef CONFIG_CMA
>   static long check_and_migrate_cma_pages(struct mm_struct *mm,
> 

Looks obviously correct, and the follow-up simplication is very nice.

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 2/6] mm/gup: don't pin migrated cma pages in movable zone
  2020-12-02  5:23 ` [PATCH 2/6] mm/gup: don't pin migrated cma pages in movable zone Pavel Tatashin
  2020-12-02 16:31   ` David Hildenbrand
@ 2020-12-03  8:01   ` John Hubbard
  2020-12-03  8:46   ` Michal Hocko
  2 siblings, 0 replies; 67+ messages in thread
From: John Hubbard @ 2020-12-03  8:01 UTC (permalink / raw)
  To: Pavel Tatashin, linux-kernel, linux-mm, akpm, vbabka, mhocko,
	david, osalvador, dan.j.williams, sashal, tyhicks,
	iamjoonsoo.kim, mike.kravetz, rostedt, mingo, jgg, peterz,
	mgorman, willy, rientjes

On 12/1/20 9:23 PM, Pavel Tatashin wrote:
> In order not to fragment CMA the pinned pages are migrated. However,
> they are migrated to ZONE_MOVABLE, which also should not have pinned pages.
> 
> Remove __GFP_MOVABLE, so pages can be migrated to zones where pinning
> is allowed.
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> ---
>   mm/gup.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index cdb8b9eeb016..3a76c005a3e2 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1610,7 +1610,7 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
>   	long ret = nr_pages;
>   	struct migration_target_control mtc = {
>   		.nid = NUMA_NO_NODE,
> -		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN,
> +		.gfp_mask = GFP_USER | __GFP_NOWARN,
>   	};
>   
>   check_again:
> 

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

...while I was here, I noticed this, which is outside the scope of your patchset, but
I thought I'd just mention it anyway in case anyone cares:

static inline bool is_migrate_movable(int mt)
{
	return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE;
}

...that really should take an "enum migratetype mt" instead of an int, I think.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 3/6] mm/gup: make __gup_longterm_locked common
  2020-12-02  5:23 ` [PATCH 3/6] mm/gup: make __gup_longterm_locked common Pavel Tatashin
  2020-12-02 16:31   ` Ira Weiny
@ 2020-12-03  8:03   ` John Hubbard
  2020-12-03 15:02     ` Pavel Tatashin
  1 sibling, 1 reply; 67+ messages in thread
From: John Hubbard @ 2020-12-03  8:03 UTC (permalink / raw)
  To: Pavel Tatashin, linux-kernel, linux-mm, akpm, vbabka, mhocko,
	david, osalvador, dan.j.williams, sashal, tyhicks,
	iamjoonsoo.kim, mike.kravetz, rostedt, mingo, jgg, peterz,
	mgorman, willy, rientjes

On 12/1/20 9:23 PM, Pavel Tatashin wrote:
> __gup_longterm_locked() has CMA || FS_DAX version and a common stub
> version. In the preparation of prohibiting longterm pinning of pages from
> movable zone make the CMA || FS_DAX version common, and delete the stub
> version.
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> ---
>   mm/gup.c | 13 -------------
>   1 file changed, 13 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 3a76c005a3e2..0e2de888a8b0 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1567,7 +1567,6 @@ struct page *get_dump_page(unsigned long addr)
>   }
>   #endif /* CONFIG_ELF_CORE */
>   
> -#if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)
>   #ifdef CONFIG_FS_DAX
>   static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
>   {
> @@ -1757,18 +1756,6 @@ static long __gup_longterm_locked(struct mm_struct *mm,
>   		kfree(vmas_tmp);
>   	return rc;
>   }
> -#else /* !CONFIG_FS_DAX && !CONFIG_CMA */
> -static __always_inline long __gup_longterm_locked(struct mm_struct *mm,
> -						  unsigned long start,
> -						  unsigned long nr_pages,
> -						  struct page **pages,
> -						  struct vm_area_struct **vmas,
> -						  unsigned int flags)
> -{
> -	return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
> -				       NULL, flags);
> -}
> -#endif /* CONFIG_FS_DAX || CONFIG_CMA */
>   
>   static bool is_valid_gup_flags(unsigned int gup_flags)
>   {
> 

At last some simplification here, yea!

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 4/6] mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_NOMOVABLE
  2020-12-02  5:23 ` [PATCH 4/6] mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_NOMOVABLE Pavel Tatashin
@ 2020-12-03  8:04   ` John Hubbard
  2020-12-03 15:02     ` Pavel Tatashin
  2020-12-03  8:57   ` Michal Hocko
  1 sibling, 1 reply; 67+ messages in thread
From: John Hubbard @ 2020-12-03  8:04 UTC (permalink / raw)
  To: Pavel Tatashin, linux-kernel, linux-mm, akpm, vbabka, mhocko,
	david, osalvador, dan.j.williams, sashal, tyhicks,
	iamjoonsoo.kim, mike.kravetz, rostedt, mingo, jgg, peterz,
	mgorman, willy, rientjes

On 12/1/20 9:23 PM, Pavel Tatashin wrote:
> PF_MEMALLOC_NOCMA is used for longterm pinning and has an effect of
> clearing _GFP_MOVABLE or prohibiting allocations from ZONE_MOVABLE.
> 
> We will prohibit allocating any pages that are getting
> longterm pinned from ZONE_MOVABLE, and we would want to unify and re-use
> this flag. So, rename it to generic PF_MEMALLOC_NOMOVABLE.
> Also re-name:
> memalloc_nocma_save()/memalloc_nocma_restore
> to
> memalloc_nomovable_save()/memalloc_nomovable_restore()
> and make the new functions common.
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>

Looks accurate, and grep didn't find any lingering rename candidates
after this series is applied. And it's a good rename.


Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA

> ---
>   include/linux/sched.h    |  2 +-
>   include/linux/sched/mm.h | 21 +++++----------------
>   mm/gup.c                 |  4 ++--
>   mm/hugetlb.c             |  4 ++--
>   mm/page_alloc.c          |  4 ++--
>   5 files changed, 12 insertions(+), 23 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 76cd21fa5501..f1bf05f5f5fa 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1548,7 +1548,7 @@ extern struct pid *cad_pid;
>   #define PF_SWAPWRITE		0x00800000	/* Allowed to write to swap */
>   #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
>   #define PF_MCE_EARLY		0x08000000      /* Early kill for mce process policy */
> -#define PF_MEMALLOC_NOCMA	0x10000000	/* All allocation request will have _GFP_MOVABLE cleared */
> +#define PF_MEMALLOC_NOMOVABLE	0x10000000	/* All allocation request will have _GFP_MOVABLE cleared */
>   #define PF_FREEZER_SKIP		0x40000000	/* Freezer should not count it as freezable */
>   #define PF_SUSPEND_TASK		0x80000000      /* This thread called freeze_processes() and should not be frozen */
>   
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index d5ece7a9a403..5bb9a6b69479 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -254,29 +254,18 @@ static inline void memalloc_noreclaim_restore(unsigned int flags)
>   	current->flags = (current->flags & ~PF_MEMALLOC) | flags;
>   }
>   
> -#ifdef CONFIG_CMA
> -static inline unsigned int memalloc_nocma_save(void)
> +static inline unsigned int memalloc_nomovable_save(void)
>   {
> -	unsigned int flags = current->flags & PF_MEMALLOC_NOCMA;
> +	unsigned int flags = current->flags & PF_MEMALLOC_NOMOVABLE;
>   
> -	current->flags |= PF_MEMALLOC_NOCMA;
> +	current->flags |= PF_MEMALLOC_NOMOVABLE;
>   	return flags;
>   }
>   
> -static inline void memalloc_nocma_restore(unsigned int flags)
> +static inline void memalloc_nomovable_restore(unsigned int flags)
>   {
> -	current->flags = (current->flags & ~PF_MEMALLOC_NOCMA) | flags;
> +	current->flags = (current->flags & ~PF_MEMALLOC_NOMOVABLE) | flags;
>   }
> -#else
> -static inline unsigned int memalloc_nocma_save(void)
> -{
> -	return 0;
> -}
> -
> -static inline void memalloc_nocma_restore(unsigned int flags)
> -{
> -}
> -#endif
>   
>   #ifdef CONFIG_MEMCG
>   DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
> diff --git a/mm/gup.c b/mm/gup.c
> index 0e2de888a8b0..724d8a65e1df 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1726,7 +1726,7 @@ static long __gup_longterm_locked(struct mm_struct *mm,
>   			if (!vmas_tmp)
>   				return -ENOMEM;
>   		}
> -		flags = memalloc_nocma_save();
> +		flags = memalloc_nomovable_save();
>   	}
>   
>   	rc = __get_user_pages_locked(mm, start, nr_pages, pages,
> @@ -1749,7 +1749,7 @@ static long __gup_longterm_locked(struct mm_struct *mm,
>   		rc = check_and_migrate_cma_pages(mm, start, rc, pages,
>   						 vmas_tmp, gup_flags);
>   out:
> -		memalloc_nocma_restore(flags);
> +		memalloc_nomovable_restore(flags);
>   	}
>   
>   	if (vmas_tmp != vmas)
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 37f15c3c24dc..02213c74ed6b 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1033,10 +1033,10 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
>   static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
>   {
>   	struct page *page;
> -	bool nocma = !!(current->flags & PF_MEMALLOC_NOCMA);
> +	bool nomovable = !!(current->flags & PF_MEMALLOC_NOMOVABLE);
>   
>   	list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
> -		if (nocma && is_migrate_cma_page(page))
> +		if (nomovable && is_migrate_cma_page(page))
>   			continue;
>   
>   		if (PageHWPoison(page))
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index eaa227a479e4..611799c72da5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3772,8 +3772,8 @@ static inline unsigned int current_alloc_flags(gfp_t gfp_mask,
>   #ifdef CONFIG_CMA
>   	unsigned int pflags = current->flags;
>   
> -	if (!(pflags & PF_MEMALLOC_NOCMA) &&
> -			gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> +	if (!(pflags & PF_MEMALLOC_NOMOVABLE) &&
> +	    gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
>   		alloc_flags |= ALLOC_CMA;
>   
>   #endif
> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 5/6] mm: honor PF_MEMALLOC_NOMOVABLE for all allocations
  2020-12-02  5:23 ` [PATCH 5/6] mm: honor PF_MEMALLOC_NOMOVABLE for all allocations Pavel Tatashin
@ 2020-12-03  8:17   ` John Hubbard
  2020-12-03 15:06     ` Pavel Tatashin
  2020-12-03  9:17   ` Michal Hocko
  1 sibling, 1 reply; 67+ messages in thread
From: John Hubbard @ 2020-12-03  8:17 UTC (permalink / raw)
  To: Pavel Tatashin, linux-kernel, linux-mm, akpm, vbabka, mhocko,
	david, osalvador, dan.j.williams, sashal, tyhicks,
	iamjoonsoo.kim, mike.kravetz, rostedt, mingo, jgg, peterz,
	mgorman, willy, rientjes

On 12/1/20 9:23 PM, Pavel Tatashin wrote:
> PF_MEMALLOC_NOMOVABLE is only honored for CMA allocations, extend
> this flag to work for any allocations by removing __GFP_MOVABLE from
> gfp_mask when this flag is passed in the current context, thus
> prohibiting allocations from ZONE_MOVABLE.
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> ---
>   mm/hugetlb.c    |  2 +-
>   mm/page_alloc.c | 26 ++++++++++++++++----------
>   2 files changed, 17 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 02213c74ed6b..00e786201d8b 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1036,7 +1036,7 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
>   	bool nomovable = !!(current->flags & PF_MEMALLOC_NOMOVABLE);
>   
>   	list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
> -		if (nomovable && is_migrate_cma_page(page))
> +		if (nomovable && is_migrate_movable(get_pageblock_migratetype(page)))


I wonder if we should add a helper, like is_migrate_cma_page(), that avoids having
to call get_pageblock_migratetype() at all of the callsites?


>   			continue;
>   
>   		if (PageHWPoison(page))
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 611799c72da5..7a6d86d0bc5f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3766,20 +3766,25 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
>   	return alloc_flags;
>   }
>   
> -static inline unsigned int current_alloc_flags(gfp_t gfp_mask,
> -					unsigned int alloc_flags)
> +static inline unsigned int cma_alloc_flags(gfp_t gfp_mask,
> +					   unsigned int alloc_flags)

Actually, maybe the original name should be left intact. This handles current alloc
flags, which right now happen to only cover CMA flags, so the original name seems
accurate, right?


thanks,

John Hubbard
NVIDIA

>   {
>   #ifdef CONFIG_CMA
> -	unsigned int pflags = current->flags;
> -
> -	if (!(pflags & PF_MEMALLOC_NOMOVABLE) &&
> -	    gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> +	if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
>   		alloc_flags |= ALLOC_CMA;
> -
>   #endif
>   	return alloc_flags;
>   }
>   
> +static inline gfp_t current_gfp_checkmovable(gfp_t gfp_mask)
> +{
> +	unsigned int pflags = current->flags;
> +
> +	if ((pflags & PF_MEMALLOC_NOMOVABLE))
> +		return gfp_mask & ~__GFP_MOVABLE;
> +	return gfp_mask;
> +}
> +
>   /*
>    * get_page_from_freelist goes through the zonelist trying to allocate
>    * a page.
> @@ -4423,7 +4428,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
>   	} else if (unlikely(rt_task(current)) && !in_interrupt())
>   		alloc_flags |= ALLOC_HARDER;
>   
> -	alloc_flags = current_alloc_flags(gfp_mask, alloc_flags);
> +	alloc_flags = cma_alloc_flags(gfp_mask, alloc_flags);
>   
>   	return alloc_flags;
>   }
> @@ -4725,7 +4730,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>   
>   	reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
>   	if (reserve_flags)
> -		alloc_flags = current_alloc_flags(gfp_mask, reserve_flags);
> +		alloc_flags = cma_alloc_flags(gfp_mask, reserve_flags);
>   
>   	/*
>   	 * Reset the nodemask and zonelist iterators if memory policies can be
> @@ -4894,7 +4899,7 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
>   	if (should_fail_alloc_page(gfp_mask, order))
>   		return false;
>   
> -	*alloc_flags = current_alloc_flags(gfp_mask, *alloc_flags);
> +	*alloc_flags = cma_alloc_flags(gfp_mask, *alloc_flags);
>   
>   	/* Dirty zone balancing only done in the fast path */
>   	ac->spread_dirty_pages = (gfp_mask & __GFP_WRITE);
> @@ -4932,6 +4937,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>   	}
>   
>   	gfp_mask &= gfp_allowed_mask;
> +	gfp_mask = current_gfp_checkmovable(gfp_mask);
>   	alloc_mask = gfp_mask;
>   	if (!prepare_alloc_pages(gfp_mask, order, preferred_nid, nodemask, &ac, &alloc_mask, &alloc_flags))
>   		return NULL;
> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-02  5:23 ` [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone Pavel Tatashin
  2020-12-02 16:35   ` Jason Gunthorpe
@ 2020-12-03  8:22   ` John Hubbard
  2020-12-03 15:55     ` Pavel Tatashin
  2020-12-04  4:13   ` Joonsoo Kim
  2 siblings, 1 reply; 67+ messages in thread
From: John Hubbard @ 2020-12-03  8:22 UTC (permalink / raw)
  To: Pavel Tatashin, linux-kernel, linux-mm, akpm, vbabka, mhocko,
	david, osalvador, dan.j.williams, sashal, tyhicks,
	iamjoonsoo.kim, mike.kravetz, rostedt, mingo, jgg, peterz,
	mgorman, willy, rientjes

On 12/1/20 9:23 PM, Pavel Tatashin wrote:
> We do not allocate pin pages in ZONE_MOVABLE, but if pages were already
> allocated before pinning they need to migrated to a different zone.
> Currently, we migrate movable CMA pages only. Generalize the function
> that migrates CMA pages to migrate all movable pages.
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> ---
>   include/linux/migrate.h        |  1 +
>   include/trace/events/migrate.h |  3 +-
>   mm/gup.c                       | 56 +++++++++++++---------------------
>   3 files changed, 24 insertions(+), 36 deletions(-)
> 

I like the cleanup so far, even at this point it's a relief to finally
see the nested ifdefs get removed.

One naming nit/idea below, but this looks fine as is, so:

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 0f8d1583fa8e..00bab23d1ee5 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -27,6 +27,7 @@ enum migrate_reason {
>   	MR_MEMPOLICY_MBIND,
>   	MR_NUMA_MISPLACED,
>   	MR_CONTIG_RANGE,
> +	MR_LONGTERM_PIN,
>   	MR_TYPES
>   };
>   
> diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
> index 4d434398d64d..363b54ce104c 100644
> --- a/include/trace/events/migrate.h
> +++ b/include/trace/events/migrate.h
> @@ -20,7 +20,8 @@
>   	EM( MR_SYSCALL,		"syscall_or_cpuset")		\
>   	EM( MR_MEMPOLICY_MBIND,	"mempolicy_mbind")		\
>   	EM( MR_NUMA_MISPLACED,	"numa_misplaced")		\
> -	EMe(MR_CONTIG_RANGE,	"contig_range")
> +	EM( MR_CONTIG_RANGE,	"contig_range")			\
> +	EMe(MR_LONGTERM_PIN,	"longterm_pin")
>   
>   /*
>    * First define the enums in the above macros to be exported to userspace
> diff --git a/mm/gup.c b/mm/gup.c
> index 724d8a65e1df..1d511f65f8a7 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1593,19 +1593,18 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
>   }
>   #endif
>   
> -#ifdef CONFIG_CMA
> -static long check_and_migrate_cma_pages(struct mm_struct *mm,
> -					unsigned long start,
> -					unsigned long nr_pages,
> -					struct page **pages,
> -					struct vm_area_struct **vmas,
> -					unsigned int gup_flags)
> +static long check_and_migrate_movable_pages(struct mm_struct *mm,
> +					    unsigned long start,
> +					    unsigned long nr_pages,
> +					    struct page **pages,
> +					    struct vm_area_struct **vmas,
> +					    unsigned int gup_flags)
>   {
>   	unsigned long i;
>   	unsigned long step;
>   	bool drain_allow = true;
>   	bool migrate_allow = true;
> -	LIST_HEAD(cma_page_list);
> +	LIST_HEAD(page_list);


Maybe naming it "movable_page_list", would be a nice touch.


thanks,
-- 
John Hubbard
NVIDIA

>   	long ret = nr_pages;
>   	struct migration_target_control mtc = {
>   		.nid = NUMA_NO_NODE,
> @@ -1623,13 +1622,12 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
>   		 */
>   		step = compound_nr(head) - (pages[i] - head);
>   		/*
> -		 * If we get a page from the CMA zone, since we are going to
> -		 * be pinning these entries, we might as well move them out
> -		 * of the CMA zone if possible.
> +		 * If we get a movable page, since we are going to be pinning
> +		 * these entries, try to move them out if possible.
>   		 */
> -		if (is_migrate_cma_page(head)) {
> +		if (is_migrate_movable(get_pageblock_migratetype(head))) {
>   			if (PageHuge(head))
> -				isolate_huge_page(head, &cma_page_list);
> +				isolate_huge_page(head, &page_list);
>   			else {
>   				if (!PageLRU(head) && drain_allow) {
>   					lru_add_drain_all();
> @@ -1637,7 +1635,7 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
>   				}
>   
>   				if (!isolate_lru_page(head)) {
> -					list_add_tail(&head->lru, &cma_page_list);
> +					list_add_tail(&head->lru, &page_list);
>   					mod_node_page_state(page_pgdat(head),
>   							    NR_ISOLATED_ANON +
>   							    page_is_file_lru(head),
> @@ -1649,7 +1647,7 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
>   		i += step;
>   	}
>   
> -	if (!list_empty(&cma_page_list)) {
> +	if (!list_empty(&page_list)) {
>   		/*
>   		 * drop the above get_user_pages reference.
>   		 */
> @@ -1659,7 +1657,7 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
>   			for (i = 0; i < nr_pages; i++)
>   				put_page(pages[i]);
>   
> -		if (migrate_pages(&cma_page_list, alloc_migration_target, NULL,
> +		if (migrate_pages(&page_list, alloc_migration_target, NULL,
>   			(unsigned long)&mtc, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
>   			/*
>   			 * some of the pages failed migration. Do get_user_pages
> @@ -1667,17 +1665,16 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
>   			 */
>   			migrate_allow = false;
>   
> -			if (!list_empty(&cma_page_list))
> -				putback_movable_pages(&cma_page_list);
> +			if (!list_empty(&page_list))
> +				putback_movable_pages(&page_list);
>   		}
>   		/*
>   		 * We did migrate all the pages, Try to get the page references
> -		 * again migrating any new CMA pages which we failed to isolate
> -		 * earlier.
> +		 * again migrating any pages which we failed to isolate earlier.
>   		 */
>   		ret = __get_user_pages_locked(mm, start, nr_pages,
> -						   pages, vmas, NULL,
> -						   gup_flags);
> +					      pages, vmas, NULL,
> +					      gup_flags);
>   
>   		if ((ret > 0) && migrate_allow) {
>   			nr_pages = ret;
> @@ -1688,17 +1685,6 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
>   
>   	return ret;
>   }
> -#else
> -static long check_and_migrate_cma_pages(struct mm_struct *mm,
> -					unsigned long start,
> -					unsigned long nr_pages,
> -					struct page **pages,
> -					struct vm_area_struct **vmas,
> -					unsigned int gup_flags)
> -{
> -	return nr_pages;
> -}
> -#endif /* CONFIG_CMA */
>   
>   /*
>    * __gup_longterm_locked() is a wrapper for __get_user_pages_locked which
> @@ -1746,8 +1732,8 @@ static long __gup_longterm_locked(struct mm_struct *mm,
>   			goto out;
>   		}
>   
> -		rc = check_and_migrate_cma_pages(mm, start, rc, pages,
> -						 vmas_tmp, gup_flags);
> +		rc = check_and_migrate_movable_pages(mm, start, rc, pages,
> +						     vmas_tmp, gup_flags);
>   out:
>   		memalloc_nomovable_restore(flags);
>   	}
> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 2/6] mm/gup: don't pin migrated cma pages in movable zone
  2020-12-02  5:23 ` [PATCH 2/6] mm/gup: don't pin migrated cma pages in movable zone Pavel Tatashin
  2020-12-02 16:31   ` David Hildenbrand
  2020-12-03  8:01   ` John Hubbard
@ 2020-12-03  8:46   ` Michal Hocko
  2020-12-03 14:58     ` Pavel Tatashin
  2 siblings, 1 reply; 67+ messages in thread
From: Michal Hocko @ 2020-12-03  8:46 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: linux-kernel, linux-mm, akpm, vbabka, david, osalvador,
	dan.j.williams, sashal, tyhicks, iamjoonsoo.kim, mike.kravetz,
	rostedt, mingo, jgg, peterz, mgorman, willy, rientjes, jhubbard

On Wed 02-12-20 00:23:26, Pavel Tatashin wrote:
> In order not to fragment CMA the pinned pages are migrated. However,
> they are migrated to ZONE_MOVABLE, which also should not have pinned pages.
> 
> Remove __GFP_MOVABLE, so pages can be migrated to zones where pinning
> is allowed.

I was wondering why we do have __GFP_MOVABLE at all. Took a shovel
and... 41b4dc14ee807 says:
: We have well defined scope API to exclude CMA region.  Use it rather than
: manipulating gfp_mask manually.  With this change, we can now restore
: __GFP_MOVABLE for gfp_mask like as usual migration target allocation.  It
: would result in that the ZONE_MOVABLE is also searched by page allocator.
: For hugetlb, gfp_mask is redefined since it has a regular allocation mask
: filter for migration target.  __GPF_NOWARN is added to hugetlb gfp_mask
: filter since a new user for gfp_mask filter, gup, want to be silent when
: allocation fails.

This clearly states that the priority was to increase the migration
target success rate rather than bother with the pinning aspect of the
target page. So I believe we have simply ignored/missed the point of the
movable zone guarantees back then and that was a mistake.
 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>

I have to admit I am not really sure about the failure path. The code is
just too convoluted to follow. I presume the pin will fail in that case.
Anyway this wouldn't be anything new in this path. Movable zone
exclusion can make the failure slightly more possible in some setups but
fundamentally nothing new there.

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/gup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index cdb8b9eeb016..3a76c005a3e2 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1610,7 +1610,7 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
>  	long ret = nr_pages;
>  	struct migration_target_control mtc = {
>  		.nid = NUMA_NO_NODE,
> -		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN,
> +		.gfp_mask = GFP_USER | __GFP_NOWARN,
>  	};
>  
>  check_again:
> -- 
> 2.25.1
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 4/6] mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_NOMOVABLE
  2020-12-02  5:23 ` [PATCH 4/6] mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_NOMOVABLE Pavel Tatashin
  2020-12-03  8:04   ` John Hubbard
@ 2020-12-03  8:57   ` Michal Hocko
  2020-12-03 15:02     ` Pavel Tatashin
  1 sibling, 1 reply; 67+ messages in thread
From: Michal Hocko @ 2020-12-03  8:57 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: linux-kernel, linux-mm, akpm, vbabka, david, osalvador,
	dan.j.williams, sashal, tyhicks, iamjoonsoo.kim, mike.kravetz,
	rostedt, mingo, jgg, peterz, mgorman, willy, rientjes, jhubbard

On Wed 02-12-20 00:23:28, Pavel Tatashin wrote:
> PF_MEMALLOC_NOCMA is used for longterm pinning and has an effect of
> clearing _GFP_MOVABLE or prohibiting allocations from ZONE_MOVABLE.

This is not precise. You are mixing the implementation and the intention
here. I would say "PF_MEMALLOC_NOCMA is used ot guarantee that the
allocator will not return pages that might belong to CMA region. This is
currently used for long term gup to make sure that such pins are not
going to be done on any CMA pages."
 
> We will prohibit allocating any pages that are getting
> longterm pinned from ZONE_MOVABLE, and we would want to unify and re-use
> this flag. So, rename it to generic PF_MEMALLOC_NOMOVABLE.
> Also re-name:
> memalloc_nocma_save()/memalloc_nocma_restore
> to
> memalloc_nomovable_save()/memalloc_nomovable_restore()
> and make the new functions common.

This is hard to parse for me. I would go with something like:
"
When PF_MEMALLOC_NOCMA has been introduced we haven't realized that it
is focusing on CMA pages too much and that there is larger class of
pages that need the same treatment. MOVABLE zone cannot contain
any long term pins as well so it makes sense to reuse and redefine this
flag for that usecase as well. Rename the flag to PF_MEMALLOC_NOMOVABLE
which defines an allocation context which can only get pages suitable
for long-term pins.
"

Btw. the naming is hard but PF_MEMALLOC_NOMOVABLE is a bit misnomer. CMA
pages are not implicitly movable. So in fact we do care more about
pinning than movability. PF_MEMALLOC_PIN or something similar would be
better fit for the overal intention.

Other than that looks good to me. Thanks!
 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> ---
>  include/linux/sched.h    |  2 +-
>  include/linux/sched/mm.h | 21 +++++----------------
>  mm/gup.c                 |  4 ++--
>  mm/hugetlb.c             |  4 ++--
>  mm/page_alloc.c          |  4 ++--
>  5 files changed, 12 insertions(+), 23 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 76cd21fa5501..f1bf05f5f5fa 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1548,7 +1548,7 @@ extern struct pid *cad_pid;
>  #define PF_SWAPWRITE		0x00800000	/* Allowed to write to swap */
>  #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
>  #define PF_MCE_EARLY		0x08000000      /* Early kill for mce process policy */
> -#define PF_MEMALLOC_NOCMA	0x10000000	/* All allocation request will have _GFP_MOVABLE cleared */
> +#define PF_MEMALLOC_NOMOVABLE	0x10000000	/* All allocation request will have _GFP_MOVABLE cleared */
>  #define PF_FREEZER_SKIP		0x40000000	/* Freezer should not count it as freezable */
>  #define PF_SUSPEND_TASK		0x80000000      /* This thread called freeze_processes() and should not be frozen */
>  
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index d5ece7a9a403..5bb9a6b69479 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -254,29 +254,18 @@ static inline void memalloc_noreclaim_restore(unsigned int flags)
>  	current->flags = (current->flags & ~PF_MEMALLOC) | flags;
>  }
>  
> -#ifdef CONFIG_CMA
> -static inline unsigned int memalloc_nocma_save(void)
> +static inline unsigned int memalloc_nomovable_save(void)
>  {
> -	unsigned int flags = current->flags & PF_MEMALLOC_NOCMA;
> +	unsigned int flags = current->flags & PF_MEMALLOC_NOMOVABLE;
>  
> -	current->flags |= PF_MEMALLOC_NOCMA;
> +	current->flags |= PF_MEMALLOC_NOMOVABLE;
>  	return flags;
>  }
>  
> -static inline void memalloc_nocma_restore(unsigned int flags)
> +static inline void memalloc_nomovable_restore(unsigned int flags)
>  {
> -	current->flags = (current->flags & ~PF_MEMALLOC_NOCMA) | flags;
> +	current->flags = (current->flags & ~PF_MEMALLOC_NOMOVABLE) | flags;
>  }
> -#else
> -static inline unsigned int memalloc_nocma_save(void)
> -{
> -	return 0;
> -}
> -
> -static inline void memalloc_nocma_restore(unsigned int flags)
> -{
> -}
> -#endif
>  
>  #ifdef CONFIG_MEMCG
>  DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
> diff --git a/mm/gup.c b/mm/gup.c
> index 0e2de888a8b0..724d8a65e1df 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1726,7 +1726,7 @@ static long __gup_longterm_locked(struct mm_struct *mm,
>  			if (!vmas_tmp)
>  				return -ENOMEM;
>  		}
> -		flags = memalloc_nocma_save();
> +		flags = memalloc_nomovable_save();
>  	}
>  
>  	rc = __get_user_pages_locked(mm, start, nr_pages, pages,
> @@ -1749,7 +1749,7 @@ static long __gup_longterm_locked(struct mm_struct *mm,
>  		rc = check_and_migrate_cma_pages(mm, start, rc, pages,
>  						 vmas_tmp, gup_flags);
>  out:
> -		memalloc_nocma_restore(flags);
> +		memalloc_nomovable_restore(flags);
>  	}
>  
>  	if (vmas_tmp != vmas)
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 37f15c3c24dc..02213c74ed6b 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1033,10 +1033,10 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
>  static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
>  {
>  	struct page *page;
> -	bool nocma = !!(current->flags & PF_MEMALLOC_NOCMA);
> +	bool nomovable = !!(current->flags & PF_MEMALLOC_NOMOVABLE);
>  
>  	list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
> -		if (nocma && is_migrate_cma_page(page))
> +		if (nomovable && is_migrate_cma_page(page))
>  			continue;
>  
>  		if (PageHWPoison(page))
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index eaa227a479e4..611799c72da5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3772,8 +3772,8 @@ static inline unsigned int current_alloc_flags(gfp_t gfp_mask,
>  #ifdef CONFIG_CMA
>  	unsigned int pflags = current->flags;
>  
> -	if (!(pflags & PF_MEMALLOC_NOCMA) &&
> -			gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> +	if (!(pflags & PF_MEMALLOC_NOMOVABLE) &&
> +	    gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
>  		alloc_flags |= ALLOC_CMA;
>  
>  #endif
> -- 
> 2.25.1
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 5/6] mm: honor PF_MEMALLOC_NOMOVABLE for all allocations
  2020-12-02  5:23 ` [PATCH 5/6] mm: honor PF_MEMALLOC_NOMOVABLE for all allocations Pavel Tatashin
  2020-12-03  8:17   ` John Hubbard
@ 2020-12-03  9:17   ` Michal Hocko
  2020-12-03 15:15     ` Pavel Tatashin
  1 sibling, 1 reply; 67+ messages in thread
From: Michal Hocko @ 2020-12-03  9:17 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: linux-kernel, linux-mm, akpm, vbabka, david, osalvador,
	dan.j.williams, sashal, tyhicks, iamjoonsoo.kim, mike.kravetz,
	rostedt, mingo, jgg, peterz, mgorman, willy, rientjes, jhubbard

On Wed 02-12-20 00:23:29, Pavel Tatashin wrote:
[...]
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 611799c72da5..7a6d86d0bc5f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3766,20 +3766,25 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
>  	return alloc_flags;
>  }
>  
> -static inline unsigned int current_alloc_flags(gfp_t gfp_mask,
> -					unsigned int alloc_flags)
> +static inline unsigned int cma_alloc_flags(gfp_t gfp_mask,
> +					   unsigned int alloc_flags)
>  {
>  #ifdef CONFIG_CMA
> -	unsigned int pflags = current->flags;
> -
> -	if (!(pflags & PF_MEMALLOC_NOMOVABLE) &&
> -	    gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> +	if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
>  		alloc_flags |= ALLOC_CMA;
> -
>  #endif
>  	return alloc_flags;
>  }
>  
> +static inline gfp_t current_gfp_checkmovable(gfp_t gfp_mask)
> +{
> +	unsigned int pflags = current->flags;
> +
> +	if ((pflags & PF_MEMALLOC_NOMOVABLE))
> +		return gfp_mask & ~__GFP_MOVABLE;
> +	return gfp_mask;
> +}
> +

It sucks that we have to control both ALLOC and gfp flags. But wouldn't
it be simpler and more straightforward to keep current_alloc_flags as is
(module PF rename) and hook the gfp mask evaluation into current_gfp_context
and move it up before the first allocation attempt? All scope flags
should be applicable to the hot path as well. It would add few cycles to
there but the question is whether that would be noticeable over just
handling PF_MEMALLOC_NOMOVABLE on its own. The cache line would be
pulled in anyway.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-03  1:34         ` Pavel Tatashin
@ 2020-12-03 14:17           ` Jason Gunthorpe
  2020-12-03 16:40             ` Pavel Tatashin
  2020-12-04 20:05             ` Daniel Jordan
  0 siblings, 2 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2020-12-03 14:17 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

On Wed, Dec 02, 2020 at 08:34:32PM -0500, Pavel Tatashin wrote:

> > Either here or perhaps even lower down the call chain when the page is
> > captured, similar to how GUP fast would detect it. (how is that done
> > anyhow?)
> 
> Ah, thank you for pointing this out. I think I need to address it here:
> 
> https://soleen.com/source/xref/linux/mm/gup.c?r=96e1fac1#94
> 
> static __maybe_unused struct page *try_grab_compound_head()
>               if (unlikely(flags & FOLL_LONGTERM) &&  is_migrate_cma_page(page))
>                    return NULL;
> 
> I need to change is_migrate_cma_page() to all migratable pages. Will
> study, and send an update with this fix.

Yes, missing the two flows is a common error :(

Looking at this code some more.. How is it even correct?

1633  				if (!isolate_lru_page(head)) {
1634  					list_add_tail(&head->lru, &cma_page_list);

Here we are only running under the read side of the mmap sem so multiple
GUPs can be calling that sequence in parallel. I don't see any
obvious exclusion that will prevent corruption of head->lru. The first
GUP thread to do isolate_lru_page() will ClearPageLRU() and the second
GUP thread will be a NOP for isolate_lru_page().

They will both race list_add_tail and other list ops. That is not OK.

> What I meant is the users of the interface do it incrementally not in
> large chunks. For example:
> 
> vfio_pin_pages_remote
>    vaddr_get_pfn
>         ret = pin_user_pages_remote(mm, vaddr, 1, flags |
> FOLL_LONGTERM, page, NULL, NULL);
> 1 -> pin only one pages at a time

I don't know why vfio does this, it is why it so ridiculously slow at
least.

Jason

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/6] mm/gup: perform check_dax_vmas only when FS_DAX is enabled
  2020-12-03  7:59   ` John Hubbard
@ 2020-12-03 14:52     ` Pavel Tatashin
  0 siblings, 0 replies; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-03 14:52 UTC (permalink / raw)
  To: John Hubbard
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Jason Gunthorpe, Peter Zijlstra, Mel Gorman,
	Matthew Wilcox, David Rientjes

> Looks obviously correct, and the follow-up simplication is very nice.
>
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>

Thank you,
Pasha

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 2/6] mm/gup: don't pin migrated cma pages in movable zone
  2020-12-03  8:46   ` Michal Hocko
@ 2020-12-03 14:58     ` Pavel Tatashin
  0 siblings, 0 replies; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-03 14:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Jason Gunthorpe, Peter Zijlstra, Mel Gorman,
	Matthew Wilcox, David Rientjes, John Hubbard

On Thu, Dec 3, 2020 at 3:46 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 02-12-20 00:23:26, Pavel Tatashin wrote:
> > In order not to fragment CMA the pinned pages are migrated. However,
> > they are migrated to ZONE_MOVABLE, which also should not have pinned pages.
> >
> > Remove __GFP_MOVABLE, so pages can be migrated to zones where pinning
> > is allowed.
>
> I was wondering why we do have __GFP_MOVABLE at all. Took a shovel
> and... 41b4dc14ee807 says:
> : We have well defined scope API to exclude CMA region.  Use it rather than
> : manipulating gfp_mask manually.  With this change, we can now restore
> : __GFP_MOVABLE for gfp_mask like as usual migration target allocation.  It
> : would result in that the ZONE_MOVABLE is also searched by page allocator.
> : For hugetlb, gfp_mask is redefined since it has a regular allocation mask
> : filter for migration target.  __GPF_NOWARN is added to hugetlb gfp_mask
> : filter since a new user for gfp_mask filter, gup, want to be silent when
> : allocation fails.
>
> This clearly states that the priority was to increase the migration
> target success rate rather than bother with the pinning aspect of the
> target page. So I believe we have simply ignored/missed the point of the
> movable zone guarantees back then and that was a mistake.
>
> > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
>
> I have to admit I am not really sure about the failure path. The code is
> just too convoluted to follow. I presume the pin will fail in that case.
> Anyway this wouldn't be anything new in this path. Movable zone
> exclusion can make the failure slightly more possible in some setups but
> fundamentally nothing new there.

I've been trying to keep this series simple for easier backport, and
not to introduce new changes beside increasing the scope of pages
which are not allowed to be pinned. This area, however, requires some
inspection and fixes, something that Jason also mentioned in another
patch.

> Acked-by: Michal Hocko <mhocko@suse.com>

Thank you,
Pasha

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 4/6] mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_NOMOVABLE
  2020-12-03  8:57   ` Michal Hocko
@ 2020-12-03 15:02     ` Pavel Tatashin
  0 siblings, 0 replies; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-03 15:02 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Jason Gunthorpe, Peter Zijlstra, Mel Gorman,
	Matthew Wilcox, David Rientjes, John Hubbard

On Thu, Dec 3, 2020 at 3:57 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 02-12-20 00:23:28, Pavel Tatashin wrote:
> > PF_MEMALLOC_NOCMA is used for longterm pinning and has an effect of
> > clearing _GFP_MOVABLE or prohibiting allocations from ZONE_MOVABLE.
>
> This is not precise. You are mixing the implementation and the intention
> here. I would say "PF_MEMALLOC_NOCMA is used ot guarantee that the
> allocator will not return pages that might belong to CMA region. This is
> currently used for long term gup to make sure that such pins are not
> going to be done on any CMA pages."
>
> > We will prohibit allocating any pages that are getting
> > longterm pinned from ZONE_MOVABLE, and we would want to unify and re-use
> > this flag. So, rename it to generic PF_MEMALLOC_NOMOVABLE.
> > Also re-name:
> > memalloc_nocma_save()/memalloc_nocma_restore
> > to
> > memalloc_nomovable_save()/memalloc_nomovable_restore()
> > and make the new functions common.
>
> This is hard to parse for me. I would go with something like:
> "
> When PF_MEMALLOC_NOCMA has been introduced we haven't realized that it
> is focusing on CMA pages too much and that there is larger class of
> pages that need the same treatment. MOVABLE zone cannot contain
> any long term pins as well so it makes sense to reuse and redefine this
> flag for that usecase as well. Rename the flag to PF_MEMALLOC_NOMOVABLE
> which defines an allocation context which can only get pages suitable
> for long-term pins.
> "

I will address the above with your suggested wording.

>
> Btw. the naming is hard but PF_MEMALLOC_NOMOVABLE is a bit misnomer. CMA
> pages are not implicitly movable. So in fact we do care more about
> pinning than movability. PF_MEMALLOC_PIN or something similar would be
> better fit for the overal intention.

Sounds good, I will rename with PF_MEMALLOC_PIN

>
> Other than that looks good to me. Thanks!

Thank you,
Pasha

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 4/6] mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_NOMOVABLE
  2020-12-03  8:04   ` John Hubbard
@ 2020-12-03 15:02     ` Pavel Tatashin
  0 siblings, 0 replies; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-03 15:02 UTC (permalink / raw)
  To: John Hubbard
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Jason Gunthorpe, Peter Zijlstra, Mel Gorman,
	Matthew Wilcox, David Rientjes

>
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>

Thank you for your review.
Pasha

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 3/6] mm/gup: make __gup_longterm_locked common
  2020-12-03  8:03   ` John Hubbard
@ 2020-12-03 15:02     ` Pavel Tatashin
  0 siblings, 0 replies; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-03 15:02 UTC (permalink / raw)
  To: John Hubbard
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Jason Gunthorpe, Peter Zijlstra, Mel Gorman,
	Matthew Wilcox, David Rientjes

> Reviewed-by: John Hubbard <jhubbard@nvidia.com>

Thank you,
Pasha

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 5/6] mm: honor PF_MEMALLOC_NOMOVABLE for all allocations
  2020-12-03  8:17   ` John Hubbard
@ 2020-12-03 15:06     ` Pavel Tatashin
  2020-12-03 16:51       ` John Hubbard
  0 siblings, 1 reply; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-03 15:06 UTC (permalink / raw)
  To: John Hubbard
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Jason Gunthorpe, Peter Zijlstra, Mel Gorman,
	Matthew Wilcox, David Rientjes

On Thu, Dec 3, 2020 at 3:17 AM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 12/1/20 9:23 PM, Pavel Tatashin wrote:
> > PF_MEMALLOC_NOMOVABLE is only honored for CMA allocations, extend
> > this flag to work for any allocations by removing __GFP_MOVABLE from
> > gfp_mask when this flag is passed in the current context, thus
> > prohibiting allocations from ZONE_MOVABLE.
> >
> > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> > ---
> >   mm/hugetlb.c    |  2 +-
> >   mm/page_alloc.c | 26 ++++++++++++++++----------
> >   2 files changed, 17 insertions(+), 11 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 02213c74ed6b..00e786201d8b 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1036,7 +1036,7 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> >       bool nomovable = !!(current->flags & PF_MEMALLOC_NOMOVABLE);
> >
> >       list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
> > -             if (nomovable && is_migrate_cma_page(page))
> > +             if (nomovable && is_migrate_movable(get_pageblock_migratetype(page)))
>
>
> I wonder if we should add a helper, like is_migrate_cma_page(), that avoids having
> to call get_pageblock_migratetype() at all of the callsites?

Good idea, I will add it.

>
>
> >                       continue;
> >
> >               if (PageHWPoison(page))
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 611799c72da5..7a6d86d0bc5f 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3766,20 +3766,25 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
> >       return alloc_flags;
> >   }
> >
> > -static inline unsigned int current_alloc_flags(gfp_t gfp_mask,
> > -                                     unsigned int alloc_flags)
> > +static inline unsigned int cma_alloc_flags(gfp_t gfp_mask,
> > +                                        unsigned int alloc_flags)
>
> Actually, maybe the original name should be left intact. This handles current alloc
> flags, which right now happen to only cover CMA flags, so the original name seems
> accurate, right?

The reason I re-named it is because we do not access current context
anymore, only use gfp_mask to get cma flag.
>> -     unsigned int pflags = current->flags;

So, keeping "current" in the function name makes its intent misleading.

Thank you,
Pasha

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 5/6] mm: honor PF_MEMALLOC_NOMOVABLE for all allocations
  2020-12-03  9:17   ` Michal Hocko
@ 2020-12-03 15:15     ` Pavel Tatashin
  2020-12-04  8:43       ` Michal Hocko
  0 siblings, 1 reply; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-03 15:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Jason Gunthorpe, Peter Zijlstra, Mel Gorman,
	Matthew Wilcox, David Rientjes, John Hubbard

On Thu, Dec 3, 2020 at 4:17 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 02-12-20 00:23:29, Pavel Tatashin wrote:
> [...]
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 611799c72da5..7a6d86d0bc5f 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3766,20 +3766,25 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
> >       return alloc_flags;
> >  }
> >
> > -static inline unsigned int current_alloc_flags(gfp_t gfp_mask,
> > -                                     unsigned int alloc_flags)
> > +static inline unsigned int cma_alloc_flags(gfp_t gfp_mask,
> > +                                        unsigned int alloc_flags)
> >  {
> >  #ifdef CONFIG_CMA
> > -     unsigned int pflags = current->flags;
> > -
> > -     if (!(pflags & PF_MEMALLOC_NOMOVABLE) &&
> > -         gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> > +     if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> >               alloc_flags |= ALLOC_CMA;
> > -
> >  #endif
> >       return alloc_flags;
> >  }
> >
> > +static inline gfp_t current_gfp_checkmovable(gfp_t gfp_mask)
> > +{
> > +     unsigned int pflags = current->flags;
> > +
> > +     if ((pflags & PF_MEMALLOC_NOMOVABLE))
> > +             return gfp_mask & ~__GFP_MOVABLE;
> > +     return gfp_mask;
> > +}
> > +
>
> It sucks that we have to control both ALLOC and gfp flags. But wouldn't
> it be simpler and more straightforward to keep current_alloc_flags as is
> (module PF rename) and hook the gfp mask evaluation into current_gfp_context
> and move it up before the first allocation attempt?

We could do that, but perhaps as a separate patch? I am worried about
hidden implication of adding extra scope (GFP_NOIO|GFP_NOFS) to the
fast path. Also, current_gfp_context() is used elsewhere, and in some
places removing __GFP_MOVABLE from gfp_mask means that we will need to
also change other things. For example [1], in try_to_free_pages() we
call current_gfp_context(gfp_mask) which can reduce the maximum zone
idx, yet we simply set it to: reclaim_idx = gfp_zone(gfp_mask), not to
the newly determined gfp_mask.

[1] https://soleen.com/source/xref/linux/mm/vmscan.c?r=2da9f630#3239


 All scope flags
> should be applicable to the hot path as well. It would add few cycles to
> there but the question is whether that would be noticeable over just
> handling PF_MEMALLOC_NOMOVABLE on its own. The cache line would be
> pulled in anyway.

Let's try it in a separate patch? I will add it in the next version of
this series.

Thank you,
Pasha

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-03  8:22   ` John Hubbard
@ 2020-12-03 15:55     ` Pavel Tatashin
  0 siblings, 0 replies; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-03 15:55 UTC (permalink / raw)
  To: John Hubbard
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Jason Gunthorpe, Peter Zijlstra, Mel Gorman,
	Matthew Wilcox, David Rientjes

> I like the cleanup so far, even at this point it's a relief to finally
> see the nested ifdefs get removed.
>
> One naming nit/idea below, but this looks fine as is, so:
>
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>

Thank you for reviewing this series.

> Maybe naming it "movable_page_list", would be a nice touch.

Sure, I will change it.

Thank you,
Pasha

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-03 14:17           ` Jason Gunthorpe
@ 2020-12-03 16:40             ` Pavel Tatashin
  2020-12-03 16:59               ` Jason Gunthorpe
  2020-12-04 20:05             ` Daniel Jordan
  1 sibling, 1 reply; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-03 16:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

> Looking at this code some more.. How is it even correct?
>
> 1633                            if (!isolate_lru_page(head)) {
> 1634                                    list_add_tail(&head->lru, &cma_page_list);
>
> Here we are only running under the read side of the mmap sem so multiple
> GUPs can be calling that sequence in parallel. I don't see any
> obvious exclusion that will prevent corruption of head->lru. The first
> GUP thread to do isolate_lru_page() will ClearPageLRU() and the second
> GUP thread will be a NOP for isolate_lru_page().
>
> They will both race list_add_tail and other list ops. That is not OK.

Good question. I studied it, and I do not see how this is OK. Worse,
this race is also exposable as a syscall instead of via driver: two
move_pages() run simultaneously. Perhaps in other places?

move_pages()
  kernel_move_pages()
    mmget()
    do_pages_move()
      add_page_for_migratio()
         mmap_read_lock(mm);
         list_add_tail(&head->lru, pagelist); <- Not protected

>
> > What I meant is the users of the interface do it incrementally not in
> > large chunks. For example:
> >
> > vfio_pin_pages_remote
> >    vaddr_get_pfn
> >         ret = pin_user_pages_remote(mm, vaddr, 1, flags |
> > FOLL_LONGTERM, page, NULL, NULL);
> > 1 -> pin only one pages at a time
>
> I don't know why vfio does this, it is why it so ridiculously slow at
> least.

Agreed.

>
> Jason

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 5/6] mm: honor PF_MEMALLOC_NOMOVABLE for all allocations
  2020-12-03 15:06     ` Pavel Tatashin
@ 2020-12-03 16:51       ` John Hubbard
  0 siblings, 0 replies; 67+ messages in thread
From: John Hubbard @ 2020-12-03 16:51 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Jason Gunthorpe, Peter Zijlstra, Mel Gorman,
	Matthew Wilcox, David Rientjes

On 12/3/20 7:06 AM, Pavel Tatashin wrote:
...
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 611799c72da5..7a6d86d0bc5f 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -3766,20 +3766,25 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
>>>        return alloc_flags;
>>>    }
>>>
>>> -static inline unsigned int current_alloc_flags(gfp_t gfp_mask,
>>> -                                     unsigned int alloc_flags)
>>> +static inline unsigned int cma_alloc_flags(gfp_t gfp_mask,
>>> +                                        unsigned int alloc_flags)
>>
>> Actually, maybe the original name should be left intact. This handles current alloc
>> flags, which right now happen to only cover CMA flags, so the original name seems
>> accurate, right?
> 
> The reason I re-named it is because we do not access current context
> anymore, only use gfp_mask to get cma flag.
>>> -     unsigned int pflags = current->flags;
> 
> So, keeping "current" in the function name makes its intent misleading.
> 

OK, I see. That sounds OK then.


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-03 16:40             ` Pavel Tatashin
@ 2020-12-03 16:59               ` Jason Gunthorpe
  2020-12-03 17:14                 ` Pavel Tatashin
  0 siblings, 1 reply; 67+ messages in thread
From: Jason Gunthorpe @ 2020-12-03 16:59 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

On Thu, Dec 03, 2020 at 11:40:15AM -0500, Pavel Tatashin wrote:
> > Looking at this code some more.. How is it even correct?
> >
> > 1633                            if (!isolate_lru_page(head)) {
> > 1634                                    list_add_tail(&head->lru, &cma_page_list);
> >
> > Here we are only running under the read side of the mmap sem so multiple
> > GUPs can be calling that sequence in parallel. I don't see any
> > obvious exclusion that will prevent corruption of head->lru. The first
> > GUP thread to do isolate_lru_page() will ClearPageLRU() and the second
> > GUP thread will be a NOP for isolate_lru_page().
> >
> > They will both race list_add_tail and other list ops. That is not OK.
> 
> Good question. I studied it, and I do not see how this is OK. Worse,
> this race is also exposable as a syscall instead of via driver: two
> move_pages() run simultaneously. Perhaps in other places?
> 
> move_pages()
>   kernel_move_pages()
>     mmget()
>     do_pages_move()
>       add_page_for_migratio()
>          mmap_read_lock(mm);
>          list_add_tail(&head->lru, pagelist); <- Not protected

When this was CMA only it might have been rarer to trigger, but this
move stuff sounds like it makes it much more broadly, eg on typical
servers with RDMA exposed/etc

Seems like it needs fixing as part of this too :\

Page at a time inside the gup loop could address both concerns, unsure
about batching performance here though..

Jason

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-03 16:59               ` Jason Gunthorpe
@ 2020-12-03 17:14                 ` Pavel Tatashin
  2020-12-03 19:15                   ` Pavel Tatashin
  0 siblings, 1 reply; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-03 17:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

On Thu, Dec 3, 2020 at 11:59 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Dec 03, 2020 at 11:40:15AM -0500, Pavel Tatashin wrote:
> > > Looking at this code some more.. How is it even correct?
> > >
> > > 1633                            if (!isolate_lru_page(head)) {
> > > 1634                                    list_add_tail(&head->lru, &cma_page_list);
> > >
> > > Here we are only running under the read side of the mmap sem so multiple
> > > GUPs can be calling that sequence in parallel. I don't see any
> > > obvious exclusion that will prevent corruption of head->lru. The first
> > > GUP thread to do isolate_lru_page() will ClearPageLRU() and the second
> > > GUP thread will be a NOP for isolate_lru_page().
> > >
> > > They will both race list_add_tail and other list ops. That is not OK.
> >
> > Good question. I studied it, and I do not see how this is OK. Worse,
> > this race is also exposable as a syscall instead of via driver: two
> > move_pages() run simultaneously. Perhaps in other places?
> >
> > move_pages()
> >   kernel_move_pages()
> >     mmget()
> >     do_pages_move()
> >       add_page_for_migratio()
> >          mmap_read_lock(mm);
> >          list_add_tail(&head->lru, pagelist); <- Not protected
>
> When this was CMA only it might have been rarer to trigger, but this
> move stuff sounds like it makes it much more broadly, eg on typical
> servers with RDMA exposed/etc
>
> Seems like it needs fixing as part of this too :\

Just to clarify the stack that I showed above is outside of gup, it is
the same issue that you pointed out that happens elsewhere. I suspect
there might be more. All of them should be addressed together.

Pasha

>
> Page at a time inside the gup loop could address both concerns, unsure
> about batching performance here though..
>
> Jason

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-03 17:14                 ` Pavel Tatashin
@ 2020-12-03 19:15                   ` Pavel Tatashin
  2020-12-03 19:36                     ` Jason Gunthorpe
  0 siblings, 1 reply; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-03 19:15 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

> > > > Looking at this code some more.. How is it even correct?
> > > >
> > > > 1633                            if (!isolate_lru_page(head)) {
> > > > 1634                                    list_add_tail(&head->lru, &cma_page_list);
> > > >
> > > > Here we are only running under the read side of the mmap sem so multiple
> > > > GUPs can be calling that sequence in parallel. I don't see any
> > > > obvious exclusion that will prevent corruption of head->lru. The first
> > > > GUP thread to do isolate_lru_page() will ClearPageLRU() and the second
> > > > GUP thread will be a NOP for isolate_lru_page().
> > > >
> > > > They will both race list_add_tail and other list ops. That is not OK.
> > >
> > > Good question. I studied it, and I do not see how this is OK. Worse,
> > > this race is also exposable as a syscall instead of via driver: two
> > > move_pages() run simultaneously. Perhaps in other places?
> > >
> > > move_pages()
> > >   kernel_move_pages()
> > >     mmget()
> > >     do_pages_move()
> > >       add_page_for_migratio()
> > >          mmap_read_lock(mm);
> > >          list_add_tail(&head->lru, pagelist); <- Not protected
> >
> > When this was CMA only it might have been rarer to trigger, but this
> > move stuff sounds like it makes it much more broadly, eg on typical
> > servers with RDMA exposed/etc
> >
> > Seems like it needs fixing as part of this too :\
>
> Just to clarify the stack that I showed above is outside of gup, it is
> the same issue that you pointed out that happens elsewhere. I suspect
> there might be more. All of them should be addressed together.

Hi Jason,

I studied some more, and I think this is not a race:
list_add_tail(&head->lru, &cma_page_list) is called only when
isolate_lru_page(head) succeeds.
isolate_lru_page(head) succeeds only when PageLRU(head) is true.
However, in this function we also clear LRU flag before returning
success.
This means, that if we race with another thread, the other thread
won't get to unprotected list_add_tail(&head->lru, &cma_page_list)
until head is is back on LRU list.

Please let me know if I am missing anything.

Thank you,
Pasha

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-03 19:15                   ` Pavel Tatashin
@ 2020-12-03 19:36                     ` Jason Gunthorpe
  2020-12-04 16:24                       ` Pavel Tatashin
  0 siblings, 1 reply; 67+ messages in thread
From: Jason Gunthorpe @ 2020-12-03 19:36 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

On Thu, Dec 03, 2020 at 02:15:36PM -0500, Pavel Tatashin wrote:

> I studied some more, and I think this is not a race:
> list_add_tail(&head->lru, &cma_page_list) is called only when
> isolate_lru_page(head) succeeds.
> isolate_lru_page(head) succeeds only when PageLRU(head) is true.
> However, in this function we also clear LRU flag before returning
> success.
> This means, that if we race with another thread, the other thread
> won't get to unprotected list_add_tail(&head->lru, &cma_page_list)
> until head is is back on LRU list.

Oh interesting, I totally didn't see how that LRU stuff is
working. So.. this creates a ridiculously expensive spin lock? Not
broken, but yikes :|

Jason

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE
  2020-12-02  5:23 [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE Pavel Tatashin
                   ` (5 preceding siblings ...)
  2020-12-02  5:23 ` [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone Pavel Tatashin
@ 2020-12-04  4:02 ` Joonsoo Kim
  2020-12-04 15:55   ` Pavel Tatashin
  6 siblings, 1 reply; 67+ messages in thread
From: Joonsoo Kim @ 2020-12-04  4:02 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: linux-kernel, linux-mm, akpm, vbabka, mhocko, david, osalvador,
	dan.j.williams, sashal, tyhicks, mike.kravetz, rostedt, mingo,
	jgg, peterz, mgorman, willy, rientjes, jhubbard

Hello,

On Wed, Dec 02, 2020 at 12:23:24AM -0500, Pavel Tatashin wrote:
> When page is pinned it cannot be moved and its physical address stays
> the same until pages is unpinned.
> 
> This is useful functionality to allows userland to implementation DMA
> access. For example, it is used by vfio in vfio_pin_pages().
> 
> However, this functionality breaks memory hotplug/hotremove assumptions
> that pages in ZONE_MOVABLE can always be migrated.
> 
> This patch series fixes this issue by forcing new allocations during
> page pinning to omit ZONE_MOVABLE, and also to migrate any existing
> pages from ZONE_MOVABLE during pinning.

I love what this patchset does, but, at least, it's better to consider
the side-effect of this patchset and inform it in somewhere. IIUC,
ZONE_MOVABLE exists for two purposes.

1) increasing availability of THP
2) memory hot-unplug

Potential issue would come from the case 1). They uses ZONE_MOVABLE
for THP availability and hard guarantee for migration isn't required
until now. So, there would be a system with following congifuration.

- memory layout: ZONE_NORMAL-512MB, ZONE_MOVABLE-512MB
- memory usage: unmovable-256MB, movable pinned-256MB, movable
  unpinned-512MB

With this patchset, movable pinned should be placed in ZONE_NORMAL so
512MB is required for ZONE_NORMAL. ZONE_NORMAL would be exhausted and
system performance would be highly afftect according to memory usage
pattern.

I'm not sure whether such configuration exists or not, but, at least,
it's better to write down this risk on commit message or something
else.

Thanks.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-02  5:23 ` [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone Pavel Tatashin
  2020-12-02 16:35   ` Jason Gunthorpe
  2020-12-03  8:22   ` John Hubbard
@ 2020-12-04  4:13   ` Joonsoo Kim
  2020-12-04 17:43     ` Pavel Tatashin
  2 siblings, 1 reply; 67+ messages in thread
From: Joonsoo Kim @ 2020-12-04  4:13 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: linux-kernel, linux-mm, akpm, vbabka, mhocko, david, osalvador,
	dan.j.williams, sashal, tyhicks, mike.kravetz, rostedt, mingo,
	jgg, peterz, mgorman, willy, rientjes, jhubbard

On Wed, Dec 02, 2020 at 12:23:30AM -0500, Pavel Tatashin wrote:
> We do not allocate pin pages in ZONE_MOVABLE, but if pages were already
> allocated before pinning they need to migrated to a different zone.
> Currently, we migrate movable CMA pages only. Generalize the function
> that migrates CMA pages to migrate all movable pages.
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> ---
>  include/linux/migrate.h        |  1 +
>  include/trace/events/migrate.h |  3 +-
>  mm/gup.c                       | 56 +++++++++++++---------------------
>  3 files changed, 24 insertions(+), 36 deletions(-)
> 
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 0f8d1583fa8e..00bab23d1ee5 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -27,6 +27,7 @@ enum migrate_reason {
>  	MR_MEMPOLICY_MBIND,
>  	MR_NUMA_MISPLACED,
>  	MR_CONTIG_RANGE,
> +	MR_LONGTERM_PIN,
>  	MR_TYPES
>  };
>  
> diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
> index 4d434398d64d..363b54ce104c 100644
> --- a/include/trace/events/migrate.h
> +++ b/include/trace/events/migrate.h
> @@ -20,7 +20,8 @@
>  	EM( MR_SYSCALL,		"syscall_or_cpuset")		\
>  	EM( MR_MEMPOLICY_MBIND,	"mempolicy_mbind")		\
>  	EM( MR_NUMA_MISPLACED,	"numa_misplaced")		\
> -	EMe(MR_CONTIG_RANGE,	"contig_range")
> +	EM( MR_CONTIG_RANGE,	"contig_range")			\
> +	EMe(MR_LONGTERM_PIN,	"longterm_pin")
>  
>  /*
>   * First define the enums in the above macros to be exported to userspace
> diff --git a/mm/gup.c b/mm/gup.c
> index 724d8a65e1df..1d511f65f8a7 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1593,19 +1593,18 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
>  }
>  #endif
>  
> -#ifdef CONFIG_CMA
> -static long check_and_migrate_cma_pages(struct mm_struct *mm,
> -					unsigned long start,
> -					unsigned long nr_pages,
> -					struct page **pages,
> -					struct vm_area_struct **vmas,
> -					unsigned int gup_flags)
> +static long check_and_migrate_movable_pages(struct mm_struct *mm,
> +					    unsigned long start,
> +					    unsigned long nr_pages,
> +					    struct page **pages,
> +					    struct vm_area_struct **vmas,
> +					    unsigned int gup_flags)
>  {
>  	unsigned long i;
>  	unsigned long step;
>  	bool drain_allow = true;
>  	bool migrate_allow = true;
> -	LIST_HEAD(cma_page_list);
> +	LIST_HEAD(page_list);
>  	long ret = nr_pages;
>  	struct migration_target_control mtc = {
>  		.nid = NUMA_NO_NODE,
> @@ -1623,13 +1622,12 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
>  		 */
>  		step = compound_nr(head) - (pages[i] - head);
>  		/*
> -		 * If we get a page from the CMA zone, since we are going to
> -		 * be pinning these entries, we might as well move them out
> -		 * of the CMA zone if possible.
> +		 * If we get a movable page, since we are going to be pinning
> +		 * these entries, try to move them out if possible.
>  		 */
> -		if (is_migrate_cma_page(head)) {
> +		if (is_migrate_movable(get_pageblock_migratetype(head))) {

is_migrate_movable() isn't a check for the ZONE. It's a check for the
MIGRATE_TYPE. MIGRATE_TYPE doesn't require hard guarantee for
migration, and, most of memory, including ZONE_NORMAL, is
MIGRATE_MOVABLE. With this code, long term gup would always fails due
to not enough memory. I think that correct change would be
"is_migrate_cma_page(hear) && zone == ZONE_MOVABLE".

Patch #5 also has this problem. Please fix it too.

Thanks.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 5/6] mm: honor PF_MEMALLOC_NOMOVABLE for all allocations
  2020-12-03 15:15     ` Pavel Tatashin
@ 2020-12-04  8:43       ` Michal Hocko
  2020-12-04  8:54         ` Michal Hocko
  0 siblings, 1 reply; 67+ messages in thread
From: Michal Hocko @ 2020-12-04  8:43 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Jason Gunthorpe, Peter Zijlstra, Mel Gorman,
	Matthew Wilcox, David Rientjes, John Hubbard

On Thu 03-12-20 10:15:41, Pavel Tatashin wrote:
> On Thu, Dec 3, 2020 at 4:17 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Wed 02-12-20 00:23:29, Pavel Tatashin wrote:
> > [...]
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 611799c72da5..7a6d86d0bc5f 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -3766,20 +3766,25 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
> > >       return alloc_flags;
> > >  }
> > >
> > > -static inline unsigned int current_alloc_flags(gfp_t gfp_mask,
> > > -                                     unsigned int alloc_flags)
> > > +static inline unsigned int cma_alloc_flags(gfp_t gfp_mask,
> > > +                                        unsigned int alloc_flags)
> > >  {
> > >  #ifdef CONFIG_CMA
> > > -     unsigned int pflags = current->flags;
> > > -
> > > -     if (!(pflags & PF_MEMALLOC_NOMOVABLE) &&
> > > -         gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> > > +     if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> > >               alloc_flags |= ALLOC_CMA;
> > > -
> > >  #endif
> > >       return alloc_flags;
> > >  }
> > >
> > > +static inline gfp_t current_gfp_checkmovable(gfp_t gfp_mask)
> > > +{
> > > +     unsigned int pflags = current->flags;
> > > +
> > > +     if ((pflags & PF_MEMALLOC_NOMOVABLE))
> > > +             return gfp_mask & ~__GFP_MOVABLE;
> > > +     return gfp_mask;
> > > +}
> > > +
> >
> > It sucks that we have to control both ALLOC and gfp flags. But wouldn't
> > it be simpler and more straightforward to keep current_alloc_flags as is
> > (module PF rename) and hook the gfp mask evaluation into current_gfp_context
> > and move it up before the first allocation attempt?
> 
> We could do that, but perhaps as a separate patch? I am worried about
> hidden implication of adding extra scope (GFP_NOIO|GFP_NOFS) to the
> fast path.

Why?

> Also, current_gfp_context() is used elsewhere, and in some
> places removing __GFP_MOVABLE from gfp_mask means that we will need to
> also change other things. For example [1], in try_to_free_pages() we
> call current_gfp_context(gfp_mask) which can reduce the maximum zone
> idx, yet we simply set it to: reclaim_idx = gfp_zone(gfp_mask), not to
> the newly determined gfp_mask.

Yes and the direct reclaim should honor the movable zone restriction.
Why should we reclaim ZONE_MOVABLE when the allocation cannot really
allocate from it? Or have I misunderstood your concern?

> 
> [1] https://soleen.com/source/xref/linux/mm/vmscan.c?r=2da9f630#3239
> 
> 
>  All scope flags
> > should be applicable to the hot path as well. It would add few cycles to
> > there but the question is whether that would be noticeable over just
> > handling PF_MEMALLOC_NOMOVABLE on its own. The cache line would be
> > pulled in anyway.
> 
> Let's try it in a separate patch? I will add it in the next version of
> this series.

Separate patch or not is up to you. But I do not see a strong reason why
this cannot be addressed in a single one.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 5/6] mm: honor PF_MEMALLOC_NOMOVABLE for all allocations
  2020-12-04  8:43       ` Michal Hocko
@ 2020-12-04  8:54         ` Michal Hocko
  2020-12-04 16:07           ` Pavel Tatashin
  0 siblings, 1 reply; 67+ messages in thread
From: Michal Hocko @ 2020-12-04  8:54 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Jason Gunthorpe, Peter Zijlstra, Mel Gorman,
	Matthew Wilcox, David Rientjes, John Hubbard

On Fri 04-12-20 09:43:13, Michal Hocko wrote:
> On Thu 03-12-20 10:15:41, Pavel Tatashin wrote:
[...]
> > Also, current_gfp_context() is used elsewhere, and in some
> > places removing __GFP_MOVABLE from gfp_mask means that we will need to
> > also change other things. For example [1], in try_to_free_pages() we
> > call current_gfp_context(gfp_mask) which can reduce the maximum zone
> > idx, yet we simply set it to: reclaim_idx = gfp_zone(gfp_mask), not to
> > the newly determined gfp_mask.
> 
> Yes and the direct reclaim should honor the movable zone restriction.
> Why should we reclaim ZONE_MOVABLE when the allocation cannot really
> allocate from it? Or have I misunderstood your concern?

Btw. if we have gfp mask properly filtered for the fast path then we can
remove the additional call to current_gfp_context from the direct
reclaim path. Something for a separate patch.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE
  2020-12-04  4:02 ` [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE Joonsoo Kim
@ 2020-12-04 15:55   ` Pavel Tatashin
  2020-12-04 16:10     ` Jason Gunthorpe
  0 siblings, 1 reply; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-04 15:55 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, mike.kravetz, Steven Rostedt, Ingo Molnar,
	Jason Gunthorpe, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

On Thu, Dec 3, 2020 at 11:03 PM Joonsoo Kim <js1304@gmail.com> wrote:
>
> Hello,
>
> On Wed, Dec 02, 2020 at 12:23:24AM -0500, Pavel Tatashin wrote:
> > When page is pinned it cannot be moved and its physical address stays
> > the same until pages is unpinned.
> >
> > This is useful functionality to allows userland to implementation DMA
> > access. For example, it is used by vfio in vfio_pin_pages().
> >
> > However, this functionality breaks memory hotplug/hotremove assumptions
> > that pages in ZONE_MOVABLE can always be migrated.
> >
> > This patch series fixes this issue by forcing new allocations during
> > page pinning to omit ZONE_MOVABLE, and also to migrate any existing
> > pages from ZONE_MOVABLE during pinning.
>
> I love what this patchset does, but, at least, it's better to consider
> the side-effect of this patchset and inform it in somewhere. IIUC,
> ZONE_MOVABLE exists for two purposes.
>
> 1) increasing availability of THP
> 2) memory hot-unplug
>
> Potential issue would come from the case 1). They uses ZONE_MOVABLE
> for THP availability and hard guarantee for migration isn't required
> until now. So, there would be a system with following congifuration.
>
> - memory layout: ZONE_NORMAL-512MB, ZONE_MOVABLE-512MB
> - memory usage: unmovable-256MB, movable pinned-256MB, movable
>   unpinned-512MB
>
> With this patchset, movable pinned should be placed in ZONE_NORMAL so
> 512MB is required for ZONE_NORMAL. ZONE_NORMAL would be exhausted and
> system performance would be highly afftect according to memory usage
> pattern.
>
> I'm not sure whether such configuration exists or not, but, at least,
> it's better to write down this risk on commit message or something
> else.

Yes, this indeed could be a problem for some configurations. I will
add your comment to the commit log of one of the patches.

Thank you,
Pasha

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 5/6] mm: honor PF_MEMALLOC_NOMOVABLE for all allocations
  2020-12-04  8:54         ` Michal Hocko
@ 2020-12-04 16:07           ` Pavel Tatashin
  0 siblings, 0 replies; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-04 16:07 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Jason Gunthorpe, Peter Zijlstra, Mel Gorman,
	Matthew Wilcox, David Rientjes, John Hubbard

On Fri, Dec 4, 2020 at 3:54 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Fri 04-12-20 09:43:13, Michal Hocko wrote:
> > On Thu 03-12-20 10:15:41, Pavel Tatashin wrote:
> [...]
> > > Also, current_gfp_context() is used elsewhere, and in some
> > > places removing __GFP_MOVABLE from gfp_mask means that we will need to
> > > also change other things. For example [1], in try_to_free_pages() we
> > > call current_gfp_context(gfp_mask) which can reduce the maximum zone
> > > idx, yet we simply set it to: reclaim_idx = gfp_zone(gfp_mask), not to
> > > the newly determined gfp_mask.
> >
> > Yes and the direct reclaim should honor the movable zone restriction.
> > Why should we reclaim ZONE_MOVABLE when the allocation cannot really
> > allocate from it? Or have I misunderstood your concern?
>
> Btw. if we have gfp mask properly filtered for the fast path then we can
> remove the additional call to current_gfp_context from the direct
> reclaim path. Something for a separate patch.

Good point. I am thinking to make a preparation patch at the beginning
of the series where we move current_gfp_context() to the fast path,
and also address all other cases where this call is not going to be
needed anymore, or where the gfp_mask will needed to be set according
to what current_gfp_context() returned.

Thanks,
Pasha

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE
  2020-12-04 15:55   ` Pavel Tatashin
@ 2020-12-04 16:10     ` Jason Gunthorpe
  2020-12-04 17:50       ` Pavel Tatashin
  0 siblings, 1 reply; 67+ messages in thread
From: Jason Gunthorpe @ 2020-12-04 16:10 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Joonsoo Kim, LKML, linux-mm, Andrew Morton, Vlastimil Babka,
	Michal Hocko, David Hildenbrand, Oscar Salvador, Dan Williams,
	Sasha Levin, Tyler Hicks, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

On Fri, Dec 04, 2020 at 10:55:30AM -0500, Pavel Tatashin wrote:
> On Thu, Dec 3, 2020 at 11:03 PM Joonsoo Kim <js1304@gmail.com> wrote:
> >
> > Hello,
> >
> > On Wed, Dec 02, 2020 at 12:23:24AM -0500, Pavel Tatashin wrote:
> > > When page is pinned it cannot be moved and its physical address stays
> > > the same until pages is unpinned.
> > >
> > > This is useful functionality to allows userland to implementation DMA
> > > access. For example, it is used by vfio in vfio_pin_pages().
> > >
> > > However, this functionality breaks memory hotplug/hotremove assumptions
> > > that pages in ZONE_MOVABLE can always be migrated.
> > >
> > > This patch series fixes this issue by forcing new allocations during
> > > page pinning to omit ZONE_MOVABLE, and also to migrate any existing
> > > pages from ZONE_MOVABLE during pinning.
> >
> > I love what this patchset does, but, at least, it's better to consider
> > the side-effect of this patchset and inform it in somewhere. IIUC,
> > ZONE_MOVABLE exists for two purposes.
> >
> > 1) increasing availability of THP
> > 2) memory hot-unplug
> >
> > Potential issue would come from the case 1). They uses ZONE_MOVABLE
> > for THP availability and hard guarantee for migration isn't required
> > until now. So, there would be a system with following congifuration.
> >
> > - memory layout: ZONE_NORMAL-512MB, ZONE_MOVABLE-512MB
> > - memory usage: unmovable-256MB, movable pinned-256MB, movable
> >   unpinned-512MB
> >
> > With this patchset, movable pinned should be placed in ZONE_NORMAL so
> > 512MB is required for ZONE_NORMAL. ZONE_NORMAL would be exhausted and
> > system performance would be highly afftect according to memory usage
> > pattern.
> >
> > I'm not sure whether such configuration exists or not, but, at least,
> > it's better to write down this risk on commit message or something
> > else.
> 
> Yes, this indeed could be a problem for some configurations. I will
> add your comment to the commit log of one of the patches.

It sounds like there is some inherent tension here, breaking THP's
when doing pin_user_pages() is a really nasty thing to do. DMA
benefits greatly from THP.

I know nothing about ZONE_MOVABLE, is this auto-setup or an admin
option? If the result of this patch is standard systems can no longer
pin > 80% of their memory I have some regression concerns..

Jason

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-03 19:36                     ` Jason Gunthorpe
@ 2020-12-04 16:24                       ` Pavel Tatashin
  2020-12-04 17:06                         ` Jason Gunthorpe
  0 siblings, 1 reply; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-04 16:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

On Thu, Dec 3, 2020 at 2:36 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Dec 03, 2020 at 02:15:36PM -0500, Pavel Tatashin wrote:
>
> > I studied some more, and I think this is not a race:
> > list_add_tail(&head->lru, &cma_page_list) is called only when
> > isolate_lru_page(head) succeeds.
> > isolate_lru_page(head) succeeds only when PageLRU(head) is true.
> > However, in this function we also clear LRU flag before returning
> > success.
> > This means, that if we race with another thread, the other thread
> > won't get to unprotected list_add_tail(&head->lru, &cma_page_list)
> > until head is is back on LRU list.
>
> Oh interesting, I totally didn't see how that LRU stuff is
> working. So.. this creates a ridiculously expensive spin lock? Not
> broken, but yikes :|

Not really a spin lock, the second thread won't be able to isolate
this page, and will skip migration of this page.

>
> Jason

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-04 16:24                       ` Pavel Tatashin
@ 2020-12-04 17:06                         ` Jason Gunthorpe
  0 siblings, 0 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2020-12-04 17:06 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

On Fri, Dec 04, 2020 at 11:24:56AM -0500, Pavel Tatashin wrote:
> On Thu, Dec 3, 2020 at 2:36 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Thu, Dec 03, 2020 at 02:15:36PM -0500, Pavel Tatashin wrote:
> >
> > > I studied some more, and I think this is not a race:
> > > list_add_tail(&head->lru, &cma_page_list) is called only when
> > > isolate_lru_page(head) succeeds.
> > > isolate_lru_page(head) succeeds only when PageLRU(head) is true.
> > > However, in this function we also clear LRU flag before returning
> > > success.
> > > This means, that if we race with another thread, the other thread
> > > won't get to unprotected list_add_tail(&head->lru, &cma_page_list)
> > > until head is is back on LRU list.
> >
> > Oh interesting, I totally didn't see how that LRU stuff is
> > working. So.. this creates a ridiculously expensive spin lock? Not
> > broken, but yikes :|
> 
> Not really a spin lock, the second thread won't be able to isolate
> this page, and will skip migration of this page.

It looks like the intent is that it will call gup again, then goto
check_again, and once again try to isolate the LRU. ie it loops.

If it gets to a point where all the CMA pages fail to isolate then it
simply exits with success as the cma_page_list will be empty.

Is this a bug? It seems like a bug, the invariant here is to not
return with a CMA page, so why do we have a path that does return with
a CMA page?

Jason

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-04  4:13   ` Joonsoo Kim
@ 2020-12-04 17:43     ` Pavel Tatashin
  2020-12-07  7:13       ` Joonsoo Kim
  0 siblings, 1 reply; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-04 17:43 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, mike.kravetz, Steven Rostedt, Ingo Molnar,
	Jason Gunthorpe, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

On Thu, Dec 3, 2020 at 11:14 PM Joonsoo Kim <js1304@gmail.com> wrote:
>
> On Wed, Dec 02, 2020 at 12:23:30AM -0500, Pavel Tatashin wrote:
> > We do not allocate pin pages in ZONE_MOVABLE, but if pages were already
> > allocated before pinning they need to migrated to a different zone.
> > Currently, we migrate movable CMA pages only. Generalize the function
> > that migrates CMA pages to migrate all movable pages.
> >
> > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> > ---
> >  include/linux/migrate.h        |  1 +
> >  include/trace/events/migrate.h |  3 +-
> >  mm/gup.c                       | 56 +++++++++++++---------------------
> >  3 files changed, 24 insertions(+), 36 deletions(-)
> >
> > diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> > index 0f8d1583fa8e..00bab23d1ee5 100644
> > --- a/include/linux/migrate.h
> > +++ b/include/linux/migrate.h
> > @@ -27,6 +27,7 @@ enum migrate_reason {
> >       MR_MEMPOLICY_MBIND,
> >       MR_NUMA_MISPLACED,
> >       MR_CONTIG_RANGE,
> > +     MR_LONGTERM_PIN,
> >       MR_TYPES
> >  };
> >
> > diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
> > index 4d434398d64d..363b54ce104c 100644
> > --- a/include/trace/events/migrate.h
> > +++ b/include/trace/events/migrate.h
> > @@ -20,7 +20,8 @@
> >       EM( MR_SYSCALL,         "syscall_or_cpuset")            \
> >       EM( MR_MEMPOLICY_MBIND, "mempolicy_mbind")              \
> >       EM( MR_NUMA_MISPLACED,  "numa_misplaced")               \
> > -     EMe(MR_CONTIG_RANGE,    "contig_range")
> > +     EM( MR_CONTIG_RANGE,    "contig_range")                 \
> > +     EMe(MR_LONGTERM_PIN,    "longterm_pin")
> >
> >  /*
> >   * First define the enums in the above macros to be exported to userspace
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 724d8a65e1df..1d511f65f8a7 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -1593,19 +1593,18 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
> >  }
> >  #endif
> >
> > -#ifdef CONFIG_CMA
> > -static long check_and_migrate_cma_pages(struct mm_struct *mm,
> > -                                     unsigned long start,
> > -                                     unsigned long nr_pages,
> > -                                     struct page **pages,
> > -                                     struct vm_area_struct **vmas,
> > -                                     unsigned int gup_flags)
> > +static long check_and_migrate_movable_pages(struct mm_struct *mm,
> > +                                         unsigned long start,
> > +                                         unsigned long nr_pages,
> > +                                         struct page **pages,
> > +                                         struct vm_area_struct **vmas,
> > +                                         unsigned int gup_flags)
> >  {
> >       unsigned long i;
> >       unsigned long step;
> >       bool drain_allow = true;
> >       bool migrate_allow = true;
> > -     LIST_HEAD(cma_page_list);
> > +     LIST_HEAD(page_list);
> >       long ret = nr_pages;
> >       struct migration_target_control mtc = {
> >               .nid = NUMA_NO_NODE,
> > @@ -1623,13 +1622,12 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
> >                */
> >               step = compound_nr(head) - (pages[i] - head);
> >               /*
> > -              * If we get a page from the CMA zone, since we are going to
> > -              * be pinning these entries, we might as well move them out
> > -              * of the CMA zone if possible.
> > +              * If we get a movable page, since we are going to be pinning
> > +              * these entries, try to move them out if possible.
> >                */
> > -             if (is_migrate_cma_page(head)) {
> > +             if (is_migrate_movable(get_pageblock_migratetype(head))) {
>
> is_migrate_movable() isn't a check for the ZONE. It's a check for the
> MIGRATE_TYPE. MIGRATE_TYPE doesn't require hard guarantee for
> migration, and, most of memory, including ZONE_NORMAL, is
> MIGRATE_MOVABLE. With this code, long term gup would always fails due
> to not enough memory. I think that correct change would be
> "is_migrate_cma_page(hear) && zone == ZONE_MOVABLE".

Good point. The above should be OR not AND.

zone_idx(page_zone(head)) == ZONE_MOVABLE || is_migrate_cma_page(hear)

Pasha

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE
  2020-12-04 16:10     ` Jason Gunthorpe
@ 2020-12-04 17:50       ` Pavel Tatashin
  2020-12-04 18:01         ` David Hildenbrand
  2020-12-07  7:12         ` Joonsoo Kim
  0 siblings, 2 replies; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-04 17:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Joonsoo Kim, LKML, linux-mm, Andrew Morton, Vlastimil Babka,
	Michal Hocko, David Hildenbrand, Oscar Salvador, Dan Williams,
	Sasha Levin, Tyler Hicks, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

> > Yes, this indeed could be a problem for some configurations. I will
> > add your comment to the commit log of one of the patches.
>
> It sounds like there is some inherent tension here, breaking THP's
> when doing pin_user_pages() is a really nasty thing to do. DMA
> benefits greatly from THP.
>
> I know nothing about ZONE_MOVABLE, is this auto-setup or an admin
> option? If the result of this patch is standard systems can no longer
> pin > 80% of their memory I have some regression concerns..

ZONE_MOVABLE can be configured via kernel parameter, or when memory
nodes are onlined after hot-add; so this is something that admins
configure. ZONE_MOVABLE is designed to gurantee memory hot-plug
functionality, and not availability of THP, however, I did not know
about the use case where some admins might configure ZONE_MOVABLE to
increase availability of THP because pages are always migratable in
them. The thing is, if we fragment ZONE_MOVABLE by pinning pages in
it, the availability of THP also suffers.  We can migrate pages in
ZONE_NORMAL, just not guaranteed, so we can create THP in ZONE_NORMAL
as well, which is the usual case.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE
  2020-12-04 17:50       ` Pavel Tatashin
@ 2020-12-04 18:01         ` David Hildenbrand
  2020-12-04 18:10           ` Pavel Tatashin
  2020-12-07  7:12         ` Joonsoo Kim
  1 sibling, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2020-12-04 18:01 UTC (permalink / raw)
  To: Pavel Tatashin, Jason Gunthorpe
  Cc: Joonsoo Kim, LKML, linux-mm, Andrew Morton, Vlastimil Babka,
	Michal Hocko, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, mike.kravetz, Steven Rostedt, Ingo Molnar,
	Peter Zijlstra, Mel Gorman, Matthew Wilcox, David Rientjes,
	John Hubbard

On 04.12.20 18:50, Pavel Tatashin wrote:
>>> Yes, this indeed could be a problem for some configurations. I will
>>> add your comment to the commit log of one of the patches.
>>
>> It sounds like there is some inherent tension here, breaking THP's
>> when doing pin_user_pages() is a really nasty thing to do. DMA
>> benefits greatly from THP.
>>
>> I know nothing about ZONE_MOVABLE, is this auto-setup or an admin
>> option? If the result of this patch is standard systems can no longer
>> pin > 80% of their memory I have some regression concerns..
> 
> ZONE_MOVABLE can be configured via kernel parameter, or when memory
> nodes are onlined after hot-add; so this is something that admins
> configure. ZONE_MOVABLE is designed to gurantee memory hot-plug
> functionality, and not availability of THP, however, I did not know
> about the use case where some admins might configure ZONE_MOVABLE to
> increase availability of THP because pages are always migratable in
> them. The thing is, if we fragment ZONE_MOVABLE by pinning pages in
> it, the availability of THP also suffers.  We can migrate pages in
> ZONE_NORMAL, just not guaranteed, so we can create THP in ZONE_NORMAL
> as well, which is the usual case.

Right, we should document this at some place to make admins aware of
this. Something like

"Techniques that rely on long-term pinnings of memory (especially, RDMA
and vfio) are fundamentally problematic with ZONE_MOVABLE and,
therefore, memory hotunplug. Pinned pages cannot reside on ZONE_MOVABLE,
to guarantee that memory can still get hotunplugged - be aware that
pinning can fail even if there is plenty of free memory in ZONE_MOVABLE.
In addition, using ZONE_MOVABLE might make page pinning more expensive,
because pages have to be migrated off that zone first."

BTW, you might also want to update the comment for ZONE_MOVABLE in
include/linux/mmzone.h at the end of this series, removing the special
case of pinned pages (1.) and maybe adding what happens when trying to
pin pages on ZONE_MOVABLE.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE
  2020-12-04 18:01         ` David Hildenbrand
@ 2020-12-04 18:10           ` Pavel Tatashin
  0 siblings, 0 replies; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-04 18:10 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Jason Gunthorpe, Joonsoo Kim, LKML, linux-mm, Andrew Morton,
	Vlastimil Babka, Michal Hocko, Oscar Salvador, Dan Williams,
	Sasha Levin, Tyler Hicks, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

> > ZONE_MOVABLE can be configured via kernel parameter, or when memory
> > nodes are onlined after hot-add; so this is something that admins
> > configure. ZONE_MOVABLE is designed to gurantee memory hot-plug
> > functionality, and not availability of THP, however, I did not know
> > about the use case where some admins might configure ZONE_MOVABLE to
> > increase availability of THP because pages are always migratable in
> > them. The thing is, if we fragment ZONE_MOVABLE by pinning pages in
> > it, the availability of THP also suffers.  We can migrate pages in
> > ZONE_NORMAL, just not guaranteed, so we can create THP in ZONE_NORMAL
> > as well, which is the usual case.
>
> Right, we should document this at some place to make admins aware of
> this. Something like
>
> "Techniques that rely on long-term pinnings of memory (especially, RDMA
> and vfio) are fundamentally problematic with ZONE_MOVABLE and,
> therefore, memory hotunplug. Pinned pages cannot reside on ZONE_MOVABLE,
> to guarantee that memory can still get hotunplugged - be aware that
> pinning can fail even if there is plenty of free memory in ZONE_MOVABLE.
> In addition, using ZONE_MOVABLE might make page pinning more expensive,
> because pages have to be migrated off that zone first."

Thanks, I will add this.

>
> BTW, you might also want to update the comment for ZONE_MOVABLE in
> include/linux/mmzone.h at the end of this series, removing the special
> case of pinned pages (1.) and maybe adding what happens when trying to
> pin pages on ZONE_MOVABLE.

Will do it.

Thank you,
Pasha

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-03 14:17           ` Jason Gunthorpe
  2020-12-03 16:40             ` Pavel Tatashin
@ 2020-12-04 20:05             ` Daniel Jordan
  2020-12-04 20:16               ` Pavel Tatashin
  2020-12-04 20:52               ` Jason Gunthorpe
  1 sibling, 2 replies; 67+ messages in thread
From: Daniel Jordan @ 2020-12-04 20:05 UTC (permalink / raw)
  To: Jason Gunthorpe, Pavel Tatashin, Alex Williamson
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

Jason Gunthorpe <jgg@ziepe.ca> writes:

> On Wed, Dec 02, 2020 at 08:34:32PM -0500, Pavel Tatashin wrote:
>> What I meant is the users of the interface do it incrementally not in
>> large chunks. For example:
>> 
>> vfio_pin_pages_remote
>>    vaddr_get_pfn
>>         ret = pin_user_pages_remote(mm, vaddr, 1, flags |
>> FOLL_LONGTERM, page, NULL, NULL);
>> 1 -> pin only one pages at a time
>
> I don't know why vfio does this, it is why it so ridiculously slow at
> least.

Well Alex can correct me, but I went digging and a comment from the
first type1 vfio commit says the iommu API didn't promise to unmap
subpages of previous mappings, so doing page at a time gave flexibility
at the cost of inefficiency.

Then 166fd7d94afd allowed the iommu to use larger pages in vfio, but
vfio kept pinning pages at a time.  I couldn't find an explanation for
why that stayed the same.

Yesterday I tried optimizing vfio to skip gup calls for tail pages after
Matthew pointed out this same issue to me by coincidence last week.
Currently debugging, but if there's a fundamental reason this won't work
on the vfio side, it'd be nice to know.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-04 20:05             ` Daniel Jordan
@ 2020-12-04 20:16               ` Pavel Tatashin
  2020-12-08  2:27                 ` Daniel Jordan
  2020-12-04 20:52               ` Jason Gunthorpe
  1 sibling, 1 reply; 67+ messages in thread
From: Pavel Tatashin @ 2020-12-04 20:16 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Jason Gunthorpe, Alex Williamson, LKML, linux-mm, Andrew Morton,
	Vlastimil Babka, Michal Hocko, David Hildenbrand, Oscar Salvador,
	Dan Williams, Sasha Levin, Tyler Hicks, Joonsoo Kim,
	mike.kravetz, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
	Mel Gorman, Matthew Wilcox, David Rientjes, John Hubbard

On Fri, Dec 4, 2020 at 3:06 PM Daniel Jordan <daniel.m.jordan@oracle.com> wrote:
>
> Jason Gunthorpe <jgg@ziepe.ca> writes:
>
> > On Wed, Dec 02, 2020 at 08:34:32PM -0500, Pavel Tatashin wrote:
> >> What I meant is the users of the interface do it incrementally not in
> >> large chunks. For example:
> >>
> >> vfio_pin_pages_remote
> >>    vaddr_get_pfn
> >>         ret = pin_user_pages_remote(mm, vaddr, 1, flags |
> >> FOLL_LONGTERM, page, NULL, NULL);
> >> 1 -> pin only one pages at a time
> >
> > I don't know why vfio does this, it is why it so ridiculously slow at
> > least.
>
> Well Alex can correct me, but I went digging and a comment from the
> first type1 vfio commit says the iommu API didn't promise to unmap
> subpages of previous mappings, so doing page at a time gave flexibility
> at the cost of inefficiency.
>
> Then 166fd7d94afd allowed the iommu to use larger pages in vfio, but
> vfio kept pinning pages at a time.  I couldn't find an explanation for
> why that stayed the same.
>
> Yesterday I tried optimizing vfio to skip gup calls for tail pages after
> Matthew pointed out this same issue to me by coincidence last week.
> Currently debugging, but if there's a fundamental reason this won't work
> on the vfio side, it'd be nice to know.

Hi Daniel,

I do not think there are any fundamental reasons why it won't work. I
have also thinking increasing VFIO chunking for a different reason:

If a client touches pages before doing a VFIO DMA map, those pages
might be huge, and pinning a small page at a time and migrating a
small page at a time can break-up the huge pages. So, it is not only
inefficient to pin, but it can also inadvertently slow down the
runtime.

Thank you,
Pasha

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-04 20:05             ` Daniel Jordan
  2020-12-04 20:16               ` Pavel Tatashin
@ 2020-12-04 20:52               ` Jason Gunthorpe
  2020-12-08  2:48                 ` Daniel Jordan
  1 sibling, 1 reply; 67+ messages in thread
From: Jason Gunthorpe @ 2020-12-04 20:52 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Pavel Tatashin, Alex Williamson, LKML, linux-mm, Andrew Morton,
	Vlastimil Babka, Michal Hocko, David Hildenbrand, Oscar Salvador,
	Dan Williams, Sasha Levin, Tyler Hicks, Joonsoo Kim,
	mike.kravetz, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
	Mel Gorman, Matthew Wilcox, David Rientjes, John Hubbard

On Fri, Dec 04, 2020 at 03:05:46PM -0500, Daniel Jordan wrote:
> Jason Gunthorpe <jgg@ziepe.ca> writes:
> 
> > On Wed, Dec 02, 2020 at 08:34:32PM -0500, Pavel Tatashin wrote:
> >> What I meant is the users of the interface do it incrementally not in
> >> large chunks. For example:
> >> 
> >> vfio_pin_pages_remote
> >>    vaddr_get_pfn
> >>         ret = pin_user_pages_remote(mm, vaddr, 1, flags |
> >> FOLL_LONGTERM, page, NULL, NULL);
> >> 1 -> pin only one pages at a time
> >
> > I don't know why vfio does this, it is why it so ridiculously slow at
> > least.
> 
> Well Alex can correct me, but I went digging and a comment from the
> first type1 vfio commit says the iommu API didn't promise to unmap
> subpages of previous mappings, so doing page at a time gave flexibility
> at the cost of inefficiency.

iommu restrictions are not related to with gup. vfio needs to get the
page list from the page tables as efficiently as possible, then you
break it up into what you want to feed into the IOMMU how the iommu
wants.

vfio must maintain a page list to call unpin_user_pages() anyhow, so
it makes alot of sense to assemble the page list up front, then do the
iommu, instead of trying to do both things page at a time.

It would be smart to rebuild vfio to use scatter lists to store the
page list and then break the sgl into pages for iommu
configuration. SGLs will consume alot less memory for the usual case
of THPs backing the VFIO registrations.

ib_umem_get() has some example of how to code this, I've been thinking
we could make this some common API, and it could be further optimized.

> Yesterday I tried optimizing vfio to skip gup calls for tail pages after
> Matthew pointed out this same issue to me by coincidence last week.

Please don't just hack up vfio like this. Everyone needs faster gup,
we really need to solve this in the core code. Plus this is tricky,
vfio is already using follow_pfn wrongly, drivers should not be open
coding MM stuff.

> Currently debugging, but if there's a fundamental reason this won't work
> on the vfio side, it'd be nice to know.

AFAIK there is no guarentee that just because you see a compound head
that the remaining pages in the page tables are actually the tail
pages. This is only true sometimes, for instance if an entire huge
page is placed in a page table level.

I belive Ralph pointed to some case where we might break a huge page
from PMD to PTEs then later COW one of the PTEs. In this case the
compound head will be visible but the page map will be non-contiguous
and the page flags on each 4k entry will be different.

Only GUP's page walkers know that the compound page is actually at a
PMD level and can safely apply the 'everything is the same'
optimization.

The solution here is to make core gup faster, espcially for the cases
where it is returning huge pages. We can approach this by:
 - Batching the compound & tail page acquisition for higher page
   levels, eg gup fast does this already, look at record_subpages()
   gup slow needs it too
 - Batching unpin for compound & tail page, the opposite of the 'refs'
   arg for try_grab_compound_head()
 - Devise some API where get_user_pages can directly return
   contiguous groups of pages to avoid memory traffic
 - Reduce the cost of a FOLL_LONGTERM pin eg here is a start:
    https://lore.kernel.org/linux-mm/0-v1-5551df3ed12e+b8-gup_dax_speedup_jgg@nvidia.com
   And CMA should get some similar treatment. Scanning the output page
   list multiple times is slow.

I would like to get to a point where the main GUP walker functions can
output in more formats than just page array. For instance directly
constructing and chaining a biovec or sgl would dramatically improve
perfomance and decrease memory consumption. Being able to write in
hmm_range_fault's pfn&flags output format would delete a whole bunch
of duplicated code.

Jason

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE
  2020-12-04 17:50       ` Pavel Tatashin
  2020-12-04 18:01         ` David Hildenbrand
@ 2020-12-07  7:12         ` Joonsoo Kim
  2020-12-07 12:13           ` Michal Hocko
  1 sibling, 1 reply; 67+ messages in thread
From: Joonsoo Kim @ 2020-12-07  7:12 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Jason Gunthorpe, LKML, linux-mm, Andrew Morton, Vlastimil Babka,
	Michal Hocko, David Hildenbrand, Oscar Salvador, Dan Williams,
	Sasha Levin, Tyler Hicks, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

On Fri, Dec 04, 2020 at 12:50:56PM -0500, Pavel Tatashin wrote:
> > > Yes, this indeed could be a problem for some configurations. I will
> > > add your comment to the commit log of one of the patches.
> >
> > It sounds like there is some inherent tension here, breaking THP's
> > when doing pin_user_pages() is a really nasty thing to do. DMA
> > benefits greatly from THP.
> >
> > I know nothing about ZONE_MOVABLE, is this auto-setup or an admin
> > option? If the result of this patch is standard systems can no longer
> > pin > 80% of their memory I have some regression concerns..
> 
> ZONE_MOVABLE can be configured via kernel parameter, or when memory
> nodes are onlined after hot-add; so this is something that admins
> configure. ZONE_MOVABLE is designed to gurantee memory hot-plug

Just note, the origin of ZONE_MOVABLE is to provide availability of
huge page, especially, hugetlb page. AFAIK, not guarantee memory
hot-plug. See following commit that introduces the ZONE_MOVABLE.

2a1e274 Create the ZONE_MOVABLE zone

> functionality, and not availability of THP, however, I did not know
> about the use case where some admins might configure ZONE_MOVABLE to

The usecase is lightly mentioned in previous discussion.

http://lkml.kernel.org/r/alpine.DEB.2.23.453.2011221300100.2830030@chino.kir.corp.google.com

Anyway, I agree with your other arguments and this patchset.

Thanks.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-04 17:43     ` Pavel Tatashin
@ 2020-12-07  7:13       ` Joonsoo Kim
  0 siblings, 0 replies; 67+ messages in thread
From: Joonsoo Kim @ 2020-12-07  7:13 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, mike.kravetz, Steven Rostedt, Ingo Molnar,
	Jason Gunthorpe, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

On Fri, Dec 04, 2020 at 12:43:29PM -0500, Pavel Tatashin wrote:
> On Thu, Dec 3, 2020 at 11:14 PM Joonsoo Kim <js1304@gmail.com> wrote:
> >
> > On Wed, Dec 02, 2020 at 12:23:30AM -0500, Pavel Tatashin wrote:
> > > We do not allocate pin pages in ZONE_MOVABLE, but if pages were already
> > > allocated before pinning they need to migrated to a different zone.
> > > Currently, we migrate movable CMA pages only. Generalize the function
> > > that migrates CMA pages to migrate all movable pages.
> > >
> > > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> > > ---
> > >  include/linux/migrate.h        |  1 +
> > >  include/trace/events/migrate.h |  3 +-
> > >  mm/gup.c                       | 56 +++++++++++++---------------------
> > >  3 files changed, 24 insertions(+), 36 deletions(-)
> > >
> > > diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> > > index 0f8d1583fa8e..00bab23d1ee5 100644
> > > --- a/include/linux/migrate.h
> > > +++ b/include/linux/migrate.h
> > > @@ -27,6 +27,7 @@ enum migrate_reason {
> > >       MR_MEMPOLICY_MBIND,
> > >       MR_NUMA_MISPLACED,
> > >       MR_CONTIG_RANGE,
> > > +     MR_LONGTERM_PIN,
> > >       MR_TYPES
> > >  };
> > >
> > > diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
> > > index 4d434398d64d..363b54ce104c 100644
> > > --- a/include/trace/events/migrate.h
> > > +++ b/include/trace/events/migrate.h
> > > @@ -20,7 +20,8 @@
> > >       EM( MR_SYSCALL,         "syscall_or_cpuset")            \
> > >       EM( MR_MEMPOLICY_MBIND, "mempolicy_mbind")              \
> > >       EM( MR_NUMA_MISPLACED,  "numa_misplaced")               \
> > > -     EMe(MR_CONTIG_RANGE,    "contig_range")
> > > +     EM( MR_CONTIG_RANGE,    "contig_range")                 \
> > > +     EMe(MR_LONGTERM_PIN,    "longterm_pin")
> > >
> > >  /*
> > >   * First define the enums in the above macros to be exported to userspace
> > > diff --git a/mm/gup.c b/mm/gup.c
> > > index 724d8a65e1df..1d511f65f8a7 100644
> > > --- a/mm/gup.c
> > > +++ b/mm/gup.c
> > > @@ -1593,19 +1593,18 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
> > >  }
> > >  #endif
> > >
> > > -#ifdef CONFIG_CMA
> > > -static long check_and_migrate_cma_pages(struct mm_struct *mm,
> > > -                                     unsigned long start,
> > > -                                     unsigned long nr_pages,
> > > -                                     struct page **pages,
> > > -                                     struct vm_area_struct **vmas,
> > > -                                     unsigned int gup_flags)
> > > +static long check_and_migrate_movable_pages(struct mm_struct *mm,
> > > +                                         unsigned long start,
> > > +                                         unsigned long nr_pages,
> > > +                                         struct page **pages,
> > > +                                         struct vm_area_struct **vmas,
> > > +                                         unsigned int gup_flags)
> > >  {
> > >       unsigned long i;
> > >       unsigned long step;
> > >       bool drain_allow = true;
> > >       bool migrate_allow = true;
> > > -     LIST_HEAD(cma_page_list);
> > > +     LIST_HEAD(page_list);
> > >       long ret = nr_pages;
> > >       struct migration_target_control mtc = {
> > >               .nid = NUMA_NO_NODE,
> > > @@ -1623,13 +1622,12 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
> > >                */
> > >               step = compound_nr(head) - (pages[i] - head);
> > >               /*
> > > -              * If we get a page from the CMA zone, since we are going to
> > > -              * be pinning these entries, we might as well move them out
> > > -              * of the CMA zone if possible.
> > > +              * If we get a movable page, since we are going to be pinning
> > > +              * these entries, try to move them out if possible.
> > >                */
> > > -             if (is_migrate_cma_page(head)) {
> > > +             if (is_migrate_movable(get_pageblock_migratetype(head))) {
> >
> > is_migrate_movable() isn't a check for the ZONE. It's a check for the
> > MIGRATE_TYPE. MIGRATE_TYPE doesn't require hard guarantee for
> > migration, and, most of memory, including ZONE_NORMAL, is
> > MIGRATE_MOVABLE. With this code, long term gup would always fails due
> > to not enough memory. I think that correct change would be
> > "is_migrate_cma_page(hear) && zone == ZONE_MOVABLE".
> 
> Good point. The above should be OR not AND.
> 
> zone_idx(page_zone(head)) == ZONE_MOVABLE || is_migrate_cma_page(hear)

Yep!

Thanks.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE
  2020-12-07  7:12         ` Joonsoo Kim
@ 2020-12-07 12:13           ` Michal Hocko
  0 siblings, 0 replies; 67+ messages in thread
From: Michal Hocko @ 2020-12-07 12:13 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pavel Tatashin, Jason Gunthorpe, LKML, linux-mm, Andrew Morton,
	Vlastimil Babka, David Hildenbrand, Oscar Salvador, Dan Williams,
	Sasha Levin, Tyler Hicks, mike.kravetz, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard

On Mon 07-12-20 16:12:50, Joonsoo Kim wrote:
> On Fri, Dec 04, 2020 at 12:50:56PM -0500, Pavel Tatashin wrote:
> > > > Yes, this indeed could be a problem for some configurations. I will
> > > > add your comment to the commit log of one of the patches.
> > >
> > > It sounds like there is some inherent tension here, breaking THP's
> > > when doing pin_user_pages() is a really nasty thing to do. DMA
> > > benefits greatly from THP.
> > >
> > > I know nothing about ZONE_MOVABLE, is this auto-setup or an admin
> > > option? If the result of this patch is standard systems can no longer
> > > pin > 80% of their memory I have some regression concerns..
> > 
> > ZONE_MOVABLE can be configured via kernel parameter, or when memory
> > nodes are onlined after hot-add; so this is something that admins
> > configure. ZONE_MOVABLE is designed to gurantee memory hot-plug
> 
> Just note, the origin of ZONE_MOVABLE is to provide availability of
> huge page, especially, hugetlb page. AFAIK, not guarantee memory
> hot-plug. See following commit that introduces the ZONE_MOVABLE.
> 
> 2a1e274 Create the ZONE_MOVABLE zone
> 
> > functionality, and not availability of THP, however, I did not know
> > about the use case where some admins might configure ZONE_MOVABLE to
> 
> The usecase is lightly mentioned in previous discussion.
> 
> http://lkml.kernel.org/r/alpine.DEB.2.23.453.2011221300100.2830030@chino.kir.corp.google.com
> 
> Anyway, I agree with your other arguments and this patchset.

Yes, historically the original motivation for the movable zone was to
help creating large pages via compaction. I also do remember Mel
not being particularly happy about that.

The thing is that the movability constrain is just too strict for this
usecases because the movable zone, especially a lot of it, might be
causing similar to lowmem/highmem problems very well known from 32b
world. So an admin had to be always very careful when configuring to not
cause zone pressure problems.

Later on, with a higher demand on the memory hotplug - especially the
hotremove usecases - it has become clear that the only reliable way for
the memory offlining is to rule out any unmovable memory out of the way
and that is why a rather strong properly of movable zone was relied on.

In the end we are in two rather different requirements here. One for
optimization and one for correctness. In this case I would much rather
focus on the correctness aspect.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-04 20:16               ` Pavel Tatashin
@ 2020-12-08  2:27                 ` Daniel Jordan
  0 siblings, 0 replies; 67+ messages in thread
From: Daniel Jordan @ 2020-12-08  2:27 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Jason Gunthorpe, Alex Williamson, LKML, linux-mm, Andrew Morton,
	Vlastimil Babka, Michal Hocko, David Hildenbrand, Oscar Salvador,
	Dan Williams, Sasha Levin, Tyler Hicks, Joonsoo Kim,
	mike.kravetz, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
	Mel Gorman, Matthew Wilcox, David Rientjes, John Hubbard

Pavel Tatashin <pasha.tatashin@soleen.com> writes:

> On Fri, Dec 4, 2020 at 3:06 PM Daniel Jordan <daniel.m.jordan@oracle.com> wrote:
>>
>> Jason Gunthorpe <jgg@ziepe.ca> writes:
>>
>> > On Wed, Dec 02, 2020 at 08:34:32PM -0500, Pavel Tatashin wrote:
>> >> What I meant is the users of the interface do it incrementally not in
>> >> large chunks. For example:
>> >>
>> >> vfio_pin_pages_remote
>> >>    vaddr_get_pfn
>> >>         ret = pin_user_pages_remote(mm, vaddr, 1, flags |
>> >> FOLL_LONGTERM, page, NULL, NULL);
>> >> 1 -> pin only one pages at a time
>> >
>> > I don't know why vfio does this, it is why it so ridiculously slow at
>> > least.
>>
>> Well Alex can correct me, but I went digging and a comment from the
>> first type1 vfio commit says the iommu API didn't promise to unmap
>> subpages of previous mappings, so doing page at a time gave flexibility
>> at the cost of inefficiency.
>>
>> Then 166fd7d94afd allowed the iommu to use larger pages in vfio, but
>> vfio kept pinning pages at a time.  I couldn't find an explanation for
>> why that stayed the same.
>>
>> Yesterday I tried optimizing vfio to skip gup calls for tail pages after
>> Matthew pointed out this same issue to me by coincidence last week.
>> Currently debugging, but if there's a fundamental reason this won't work
>> on the vfio side, it'd be nice to know.
>
> Hi Daniel,
>
> I do not think there are any fundamental reasons why it won't work. I
> have also thinking increasing VFIO chunking for a different reason:
>
> If a client touches pages before doing a VFIO DMA map, those pages
> might be huge, and pinning a small page at a time and migrating a
> small page at a time can break-up the huge pages. So, it is not only
> inefficient to pin, but it can also inadvertently slow down the
> runtime.

Hi Pasha,

I see, and I'm curious, do you experience this case where a user has
touched the pages before doing a VFIO DMA map, and if so where?

The usual situation on my side is that the pages are faulted in during
pinning.

Daniel

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-04 20:52               ` Jason Gunthorpe
@ 2020-12-08  2:48                 ` Daniel Jordan
  2020-12-08 13:24                   ` Jason Gunthorpe
  0 siblings, 1 reply; 67+ messages in thread
From: Daniel Jordan @ 2020-12-08  2:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Pavel Tatashin, Alex Williamson, LKML, linux-mm, Andrew Morton,
	Vlastimil Babka, Michal Hocko, David Hildenbrand, Oscar Salvador,
	Dan Williams, Sasha Levin, Tyler Hicks, Joonsoo Kim,
	mike.kravetz, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
	Mel Gorman, Matthew Wilcox, David Rientjes, John Hubbard

Jason Gunthorpe <jgg@ziepe.ca> writes:
> On Fri, Dec 04, 2020 at 03:05:46PM -0500, Daniel Jordan wrote:
>> Well Alex can correct me, but I went digging and a comment from the
>> first type1 vfio commit says the iommu API didn't promise to unmap
>> subpages of previous mappings, so doing page at a time gave flexibility
>> at the cost of inefficiency.
>
> iommu restrictions are not related to with gup. vfio needs to get the
> page list from the page tables as efficiently as possible, then you
> break it up into what you want to feed into the IOMMU how the iommu
> wants.
>
> vfio must maintain a page list to call unpin_user_pages() anyhow, so

It does in some cases but not others, namely the expensive
VFIO_IOMMU_MAP_DMA/UNMAP_DMA path where the iommu page tables are used
to find the pfns when unpinning.

I don't see why vfio couldn't do as you say, though, and the worst case
memory overhead of using scatterlist to remember the pfns of a 300g VM
backed by huge but physically discontiguous pages is only a few meg, not
bad at all.

> it makes alot of sense to assemble the page list up front, then do the
> iommu, instead of trying to do both things page at a time.
>
> It would be smart to rebuild vfio to use scatter lists to store the
> page list and then break the sgl into pages for iommu
> configuration. SGLs will consume alot less memory for the usual case
> of THPs backing the VFIO registrations.
>
> ib_umem_get() has some example of how to code this, I've been thinking
> we could make this some common API, and it could be further optimized.

Agreed, great suggestions, both above and in the rest of your response.

>> Yesterday I tried optimizing vfio to skip gup calls for tail pages after
>> Matthew pointed out this same issue to me by coincidence last week.
>
> Please don't just hack up vfio like this.

Yeah, you've cured me of that idea.  I'll see where I get experimenting
with some of this stuff.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
  2020-12-08  2:48                 ` Daniel Jordan
@ 2020-12-08 13:24                   ` Jason Gunthorpe
  0 siblings, 0 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2020-12-08 13:24 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Pavel Tatashin, Alex Williamson, LKML, linux-mm, Andrew Morton,
	Vlastimil Babka, Michal Hocko, David Hildenbrand, Oscar Salvador,
	Dan Williams, Sasha Levin, Tyler Hicks, Joonsoo Kim,
	mike.kravetz, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
	Mel Gorman, Matthew Wilcox, David Rientjes, John Hubbard

On Mon, Dec 07, 2020 at 09:48:48PM -0500, Daniel Jordan wrote:
> Jason Gunthorpe <jgg@ziepe.ca> writes:
> > On Fri, Dec 04, 2020 at 03:05:46PM -0500, Daniel Jordan wrote:
> >> Well Alex can correct me, but I went digging and a comment from the
> >> first type1 vfio commit says the iommu API didn't promise to unmap
> >> subpages of previous mappings, so doing page at a time gave flexibility
> >> at the cost of inefficiency.
> >
> > iommu restrictions are not related to with gup. vfio needs to get the
> > page list from the page tables as efficiently as possible, then you
> > break it up into what you want to feed into the IOMMU how the iommu
> > wants.
> >
> > vfio must maintain a page list to call unpin_user_pages() anyhow, so
> 
> It does in some cases but not others, namely the expensive
> VFIO_IOMMU_MAP_DMA/UNMAP_DMA path where the iommu page tables are used
> to find the pfns when unpinning.

Oh, I see.. Well, that is still possible, but vfio really needs to
batch operations, eg call pin_user_pages() with some larger buffer and
store those into the iommu and then reverse this to build up
contiguous runs of pages to unpin

> I don't see why vfio couldn't do as you say, though, and the worst case
> memory overhead of using scatterlist to remember the pfns of a 300g VM
> backed by huge but physically discontiguous pages is only a few meg, not
> bad at all.

Yes, but 0 is still better.. I would start by focusing on batching
pin_user_pages.

Jason

^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2020-12-08 13:25 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-02  5:23 [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE Pavel Tatashin
2020-12-02  5:23 ` [PATCH 1/6] mm/gup: perform check_dax_vmas only when FS_DAX is enabled Pavel Tatashin
2020-12-02 16:22   ` Ira Weiny
2020-12-02 18:15     ` Pavel Tatashin
2020-12-02 16:29   ` Jason Gunthorpe
2020-12-02 18:16     ` Pavel Tatashin
2020-12-03  7:59   ` John Hubbard
2020-12-03 14:52     ` Pavel Tatashin
2020-12-02  5:23 ` [PATCH 2/6] mm/gup: don't pin migrated cma pages in movable zone Pavel Tatashin
2020-12-02 16:31   ` David Hildenbrand
2020-12-02 18:17     ` Pavel Tatashin
2020-12-03  8:01   ` John Hubbard
2020-12-03  8:46   ` Michal Hocko
2020-12-03 14:58     ` Pavel Tatashin
2020-12-02  5:23 ` [PATCH 3/6] mm/gup: make __gup_longterm_locked common Pavel Tatashin
2020-12-02 16:31   ` Ira Weiny
2020-12-02 16:33     ` Ira Weiny
2020-12-02 18:19       ` Pavel Tatashin
2020-12-03  0:03         ` Pavel Tatashin
2020-12-03  8:03   ` John Hubbard
2020-12-03 15:02     ` Pavel Tatashin
2020-12-02  5:23 ` [PATCH 4/6] mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_NOMOVABLE Pavel Tatashin
2020-12-03  8:04   ` John Hubbard
2020-12-03 15:02     ` Pavel Tatashin
2020-12-03  8:57   ` Michal Hocko
2020-12-03 15:02     ` Pavel Tatashin
2020-12-02  5:23 ` [PATCH 5/6] mm: honor PF_MEMALLOC_NOMOVABLE for all allocations Pavel Tatashin
2020-12-03  8:17   ` John Hubbard
2020-12-03 15:06     ` Pavel Tatashin
2020-12-03 16:51       ` John Hubbard
2020-12-03  9:17   ` Michal Hocko
2020-12-03 15:15     ` Pavel Tatashin
2020-12-04  8:43       ` Michal Hocko
2020-12-04  8:54         ` Michal Hocko
2020-12-04 16:07           ` Pavel Tatashin
2020-12-02  5:23 ` [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone Pavel Tatashin
2020-12-02 16:35   ` Jason Gunthorpe
2020-12-03  0:19     ` Pavel Tatashin
2020-12-03  1:08       ` Jason Gunthorpe
2020-12-03  1:34         ` Pavel Tatashin
2020-12-03 14:17           ` Jason Gunthorpe
2020-12-03 16:40             ` Pavel Tatashin
2020-12-03 16:59               ` Jason Gunthorpe
2020-12-03 17:14                 ` Pavel Tatashin
2020-12-03 19:15                   ` Pavel Tatashin
2020-12-03 19:36                     ` Jason Gunthorpe
2020-12-04 16:24                       ` Pavel Tatashin
2020-12-04 17:06                         ` Jason Gunthorpe
2020-12-04 20:05             ` Daniel Jordan
2020-12-04 20:16               ` Pavel Tatashin
2020-12-08  2:27                 ` Daniel Jordan
2020-12-04 20:52               ` Jason Gunthorpe
2020-12-08  2:48                 ` Daniel Jordan
2020-12-08 13:24                   ` Jason Gunthorpe
2020-12-03  8:22   ` John Hubbard
2020-12-03 15:55     ` Pavel Tatashin
2020-12-04  4:13   ` Joonsoo Kim
2020-12-04 17:43     ` Pavel Tatashin
2020-12-07  7:13       ` Joonsoo Kim
2020-12-04  4:02 ` [PATCH 0/6] prohibit pinning pages in ZONE_MOVABLE Joonsoo Kim
2020-12-04 15:55   ` Pavel Tatashin
2020-12-04 16:10     ` Jason Gunthorpe
2020-12-04 17:50       ` Pavel Tatashin
2020-12-04 18:01         ` David Hildenbrand
2020-12-04 18:10           ` Pavel Tatashin
2020-12-07  7:12         ` Joonsoo Kim
2020-12-07 12:13           ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).