linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH V2 1/4] mm: Export alloc_migrate_huge_page
@ 2018-09-06  5:43 Aneesh Kumar K.V
  2018-09-06  5:43 ` [RFC PATCH V2 2/4] mm: Add get_user_pages_cma_migrate Aneesh Kumar K.V
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Aneesh Kumar K.V @ 2018-09-06  5:43 UTC (permalink / raw)
  To: akpm, Alexey Kardashevskiy, mpe
  Cc: linux-mm, linux-kernel, linuxppc-dev, Aneesh Kumar K.V

We want to use this to support customized huge page migration.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 include/linux/hugetlb.h | 2 ++
 mm/hugetlb.c            | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index c39d9170a8a0..98c9c6dc308c 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -357,6 +357,8 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
 				nodemask_t *nmask);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
 				unsigned long address);
+struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
+				     int nid, nodemask_t *nmask);
 int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
 			pgoff_t idx);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 47566bb0b4b1..88881b3f8628 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1586,8 +1586,8 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
 	return page;
 }
 
-static struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
-		int nid, nodemask_t *nmask)
+struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
+				     int nid, nodemask_t *nmask)
 {
 	struct page *page;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH V2 2/4] mm: Add get_user_pages_cma_migrate
  2018-09-06  5:43 [RFC PATCH V2 1/4] mm: Export alloc_migrate_huge_page Aneesh Kumar K.V
@ 2018-09-06  5:43 ` Aneesh Kumar K.V
  2018-09-06 12:45   ` Michal Hocko
  2018-09-06  5:43 ` [RFC PATCH V2 3/4] powerpc/mm/iommu: Allow large IOMMU page size only for hugetlb backing Aneesh Kumar K.V
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Aneesh Kumar K.V @ 2018-09-06  5:43 UTC (permalink / raw)
  To: akpm, Alexey Kardashevskiy, mpe
  Cc: linux-mm, linux-kernel, linuxppc-dev, Aneesh Kumar K.V

This helper does a get_user_pages_fast and if it find pages in the CMA area
it will try to migrate them before taking page reference. This makes sure that
we don't keep non-movable pages (due to page reference count) in the CMA area.
Not able to move pages out of CMA area result in CMA allocation failures.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 include/linux/migrate.h |   3 ++
 mm/migrate.c            | 108 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 111 insertions(+)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index f2b4abbca55e..d82b35afd2eb 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -286,6 +286,9 @@ static inline int migrate_vma(const struct migrate_vma_ops *ops,
 }
 #endif /* IS_ENABLED(CONFIG_MIGRATE_VMA_HELPER) */
 
+extern int get_user_pages_cma_migrate(unsigned long start, int nr_pages, int write,
+				      struct page **pages);
+
 #endif /* CONFIG_MIGRATION */
 
 #endif /* _LINUX_MIGRATE_H */
diff --git a/mm/migrate.c b/mm/migrate.c
index c27e97b5b69d..c26288d407ae 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -3008,3 +3008,111 @@ int migrate_vma(const struct migrate_vma_ops *ops,
 }
 EXPORT_SYMBOL(migrate_vma);
 #endif /* defined(MIGRATE_VMA_HELPER) */
+
+static struct page *new_non_cma_page(struct page *page, unsigned long private)
+{
+	/*
+	 * We want to make sure we allocate the new page from the same node
+	 * as the source page.
+	 */
+	int nid = page_to_nid(page);
+	gfp_t gfp_mask = GFP_USER | __GFP_THISNODE;
+
+	if (PageHighMem(page))
+		gfp_mask |= __GFP_HIGHMEM;
+
+	if (PageHuge(page)) {
+
+		struct hstate *h = page_hstate(page);
+		/*
+		 * We don't want to dequeue from the pool because pool pages will
+		 * mostly be from the CMA region.
+		 */
+		return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
+
+	} else if (PageTransHuge(page)) {
+		struct page *thp;
+		gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_THISNODE;
+
+		/*
+		 * Remove the movable mask so that we don't allocate from
+		 * CMA area again.
+		 */
+		thp_gfpmask &= ~__GFP_MOVABLE;
+		thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
+		if (!thp)
+			return NULL;
+		prep_transhuge_page(thp);
+		return thp;
+	}
+
+	return __alloc_pages_node(nid, gfp_mask, 0);
+}
+
+int get_user_pages_cma_migrate(unsigned long start, int nr_pages, int write,
+			       struct page **pages)
+{
+	int i, ret;
+	bool drain_allow = true;
+	bool migrate_allow = true;
+	LIST_HEAD(cma_page_list);
+
+get_user_again:
+	ret = get_user_pages_fast(start, nr_pages, write, pages);
+	if (ret <= 0)
+		return ret;
+
+	for (i = 0; i < ret; ++i) {
+		/*
+		 * If we get a page from the CMA zone, since we are going to
+		 * be pinning these entries, we might as well move them out
+		 * of the CMA zone if possible.
+		 */
+		if (is_migrate_cma_page(pages[i]) && migrate_allow) {
+			if (PageHuge(pages[i]))
+				isolate_huge_page(pages[i], &cma_page_list);
+			else {
+				struct page *head = compound_head(pages[i]);
+
+				if (!PageLRU(head) && drain_allow) {
+					lru_add_drain_all();
+					drain_allow = false;
+				}
+
+				if (!isolate_lru_page(head)) {
+					list_add_tail(&head->lru, &cma_page_list);
+					mod_node_page_state(page_pgdat(head),
+							    NR_ISOLATED_ANON +
+							    page_is_file_cache(head),
+							    hpage_nr_pages(head));
+				}
+			}
+		}
+	}
+	if (!list_empty(&cma_page_list)) {
+		/*
+		 * drop the above get_user_pages reference.
+		 */
+		for (i = 0; i < ret; ++i)
+			put_page(pages[i]);
+
+		if (migrate_pages(&cma_page_list, new_non_cma_page,
+				  NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
+			/*
+			 * some of the pages failed migration. Do get_user_pages
+			 * without migration.
+			 */
+			migrate_allow = false;
+
+			if (!list_empty(&cma_page_list))
+				putback_movable_pages(&cma_page_list);
+		}
+		/*
+		 * We did migrate all the pages, Try to get the page references again
+		 * migrating any new CMA pages which we failed to isolate earlier.
+		 */
+		drain_allow = true;
+		goto get_user_again;
+	}
+	return ret;
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH V2 3/4] powerpc/mm/iommu: Allow large IOMMU page size only for hugetlb backing
  2018-09-06  5:43 [RFC PATCH V2 1/4] mm: Export alloc_migrate_huge_page Aneesh Kumar K.V
  2018-09-06  5:43 ` [RFC PATCH V2 2/4] mm: Add get_user_pages_cma_migrate Aneesh Kumar K.V
@ 2018-09-06  5:43 ` Aneesh Kumar K.V
  2018-09-06  5:43 ` [RFC PATCH V2 4/4] powerpc/mm/iommu: Allow migration of cma allocated pages during mm_iommu_get Aneesh Kumar K.V
  2018-09-06 12:31 ` [RFC PATCH V2 1/4] mm: Export alloc_migrate_huge_page Michal Hocko
  3 siblings, 0 replies; 14+ messages in thread
From: Aneesh Kumar K.V @ 2018-09-06  5:43 UTC (permalink / raw)
  To: akpm, Alexey Kardashevskiy, mpe
  Cc: linux-mm, linux-kernel, linuxppc-dev, Aneesh Kumar K.V

THP pages can get split during different code paths. An incremented reference
count do imply we will not split the compound page. But the pmd entry can be
converted to level 4 pte entries. Keep the code simpler by allowing large
IOMMU page size only if the guest ram is backed by hugetlb pages.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/mm/mmu_context_iommu.c | 16 ++--------------
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index c9ee9e23845f..f472965f7638 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -212,21 +212,9 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
 		}
 populate:
 		pageshift = PAGE_SHIFT;
-		if (mem->pageshift > PAGE_SHIFT && PageCompound(page)) {
-			pte_t *pte;
+		if (mem->pageshift > PAGE_SHIFT && PageHuge(page)) {
 			struct page *head = compound_head(page);
-			unsigned int compshift = compound_order(head);
-			unsigned int pteshift;
-
-			local_irq_save(flags); /* disables as well */
-			pte = find_linux_pte(mm->pgd, cur_ua, NULL, &pteshift);
-
-			/* Double check it is still the same pinned page */
-			if (pte && pte_page(*pte) == head &&
-			    pteshift == compshift + PAGE_SHIFT)
-				pageshift = max_t(unsigned int, pteshift,
-						PAGE_SHIFT);
-			local_irq_restore(flags);
+			pageshift = compound_order(head) + PAGE_SHIFT;
 		}
 		mem->pageshift = min(mem->pageshift, pageshift);
 		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH V2 4/4] powerpc/mm/iommu: Allow migration of cma allocated pages during mm_iommu_get
  2018-09-06  5:43 [RFC PATCH V2 1/4] mm: Export alloc_migrate_huge_page Aneesh Kumar K.V
  2018-09-06  5:43 ` [RFC PATCH V2 2/4] mm: Add get_user_pages_cma_migrate Aneesh Kumar K.V
  2018-09-06  5:43 ` [RFC PATCH V2 3/4] powerpc/mm/iommu: Allow large IOMMU page size only for hugetlb backing Aneesh Kumar K.V
@ 2018-09-06  5:43 ` Aneesh Kumar K.V
  2018-09-06 12:53   ` Michal Hocko
  2018-09-06 12:31 ` [RFC PATCH V2 1/4] mm: Export alloc_migrate_huge_page Michal Hocko
  3 siblings, 1 reply; 14+ messages in thread
From: Aneesh Kumar K.V @ 2018-09-06  5:43 UTC (permalink / raw)
  To: akpm, Alexey Kardashevskiy, mpe
  Cc: linux-mm, linux-kernel, linuxppc-dev, Aneesh Kumar K.V

Current code doesn't do page migration if the page allocated is a compound page.
With HugeTLB migration support, we can end up allocating hugetlb pages from
CMA region. Also THP pages can be allocated from CMA region. This patch updates
the code to handle compound pages correctly.

This use the new helper get_user_pages_cma_migrate. It does one get_user_pages
with right count, instead of doing one get_user_pages per page. That avoids
reading page table multiple times.

The patch also convert the hpas member of mm_iommu_table_group_mem_t to a union.
We use the same storage location to store pointers to struct page. We cannot
update alll the code path use struct page *, because we access hpas in real mode
and we can't do that struct page * to pfn conversion in real mode.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/mm/mmu_context_iommu.c | 124 +++++++++-------------------
 1 file changed, 37 insertions(+), 87 deletions(-)

diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index f472965f7638..607acd03ab06 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -20,6 +20,7 @@
 #include <linux/swap.h>
 #include <asm/mmu_context.h>
 #include <asm/pte-walk.h>
+#include <linux/mm_inline.h>
 
 static DEFINE_MUTEX(mem_list_mutex);
 
@@ -30,8 +31,18 @@ struct mm_iommu_table_group_mem_t {
 	atomic64_t mapped;
 	unsigned int pageshift;
 	u64 ua;			/* userspace address */
-	u64 entries;		/* number of entries in hpas[] */
-	u64 *hpas;		/* vmalloc'ed */
+	u64 entries;		/* number of entries in hpages[] */
+	/*
+	 * in mm_iommu_get we temporarily use this to store
+	 * struct page address.
+	 *
+	 * We need to convert ua to hpa in real mode. Make it
+	 * simpler by storing physicall address.
+	 */
+	union {
+		struct page **hpages;	/* vmalloc'ed */
+		phys_addr_t *hpas;
+	};
 };
 
 static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
@@ -74,63 +85,12 @@ bool mm_iommu_preregistered(struct mm_struct *mm)
 }
 EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
 
-/*
- * Taken from alloc_migrate_target with changes to remove CMA allocations
- */
-struct page *new_iommu_non_cma_page(struct page *page, unsigned long private)
-{
-	gfp_t gfp_mask = GFP_USER;
-	struct page *new_page;
-
-	if (PageCompound(page))
-		return NULL;
-
-	if (PageHighMem(page))
-		gfp_mask |= __GFP_HIGHMEM;
-
-	/*
-	 * We don't want the allocation to force an OOM if possibe
-	 */
-	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
-	return new_page;
-}
-
-static int mm_iommu_move_page_from_cma(struct page *page)
-{
-	int ret = 0;
-	LIST_HEAD(cma_migrate_pages);
-
-	/* Ignore huge pages for now */
-	if (PageCompound(page))
-		return -EBUSY;
-
-	lru_add_drain();
-	ret = isolate_lru_page(page);
-	if (ret)
-		return ret;
-
-	list_add(&page->lru, &cma_migrate_pages);
-	put_page(page); /* Drop the gup reference */
-
-	ret = migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page,
-				NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE);
-	if (ret) {
-		if (!list_empty(&cma_migrate_pages))
-			putback_movable_pages(&cma_migrate_pages);
-	}
-
-	return 0;
-}
-
 long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem)
 {
 	struct mm_iommu_table_group_mem_t *mem;
-	long i, j, ret = 0, locked_entries = 0;
+	long i, ret = 0, locked_entries = 0;
 	unsigned int pageshift;
-	unsigned long flags;
-	unsigned long cur_ua;
-	struct page *page = NULL;
 
 	mutex_lock(&mem_list_mutex);
 
@@ -177,47 +137,37 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
 		goto unlock_exit;
 	}
 
+	ret = get_user_pages_cma_migrate(ua, entries, 1, mem->hpages);
+	if (ret != entries) {
+		/* free the reference taken */
+		for (i = 0; i < ret; i++)
+			put_page(mem->hpages[i]);
+
+		vfree(mem->hpas);
+		kfree(mem);
+		ret = -EFAULT;
+		goto unlock_exit;
+	} else
+		ret = 0;
+
+	pageshift = PAGE_SHIFT;
 	for (i = 0; i < entries; ++i) {
-		cur_ua = ua + (i << PAGE_SHIFT);
-		if (1 != get_user_pages_fast(cur_ua,
-					1/* pages */, 1/* iswrite */, &page)) {
-			ret = -EFAULT;
-			for (j = 0; j < i; ++j)
-				put_page(pfn_to_page(mem->hpas[j] >>
-						PAGE_SHIFT));
-			vfree(mem->hpas);
-			kfree(mem);
-			goto unlock_exit;
-		}
+		struct page *page = mem->hpages[i];
 		/*
-		 * If we get a page from the CMA zone, since we are going to
-		 * be pinning these entries, we might as well move them out
-		 * of the CMA zone if possible. NOTE: faulting in + migration
-		 * can be expensive. Batching can be considered later
+		 * Allow to use larger than 64k IOMMU pages. Only do that
+		 * if we are backed by hugetlb.
 		 */
-		if (is_migrate_cma_page(page)) {
-			if (mm_iommu_move_page_from_cma(page))
-				goto populate;
-			if (1 != get_user_pages_fast(cur_ua,
-						1/* pages */, 1/* iswrite */,
-						&page)) {
-				ret = -EFAULT;
-				for (j = 0; j < i; ++j)
-					put_page(pfn_to_page(mem->hpas[j] >>
-								PAGE_SHIFT));
-				vfree(mem->hpas);
-				kfree(mem);
-				goto unlock_exit;
-			}
-		}
-populate:
-		pageshift = PAGE_SHIFT;
-		if (mem->pageshift > PAGE_SHIFT && PageHuge(page)) {
+		if ((mem->pageshift > PAGE_SHIFT) && PageHuge(page)) {
 			struct page *head = compound_head(page);
 			pageshift = compound_order(head) + PAGE_SHIFT;
 		}
 		mem->pageshift = min(mem->pageshift, pageshift);
+		/*
+		 * We don't need struct page reference any more, switch
+		 * physicall address.
+		 */
 		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
+
 	}
 
 	atomic64_set(&mem->mapped, 1);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH V2 1/4] mm: Export alloc_migrate_huge_page
  2018-09-06  5:43 [RFC PATCH V2 1/4] mm: Export alloc_migrate_huge_page Aneesh Kumar K.V
                   ` (2 preceding siblings ...)
  2018-09-06  5:43 ` [RFC PATCH V2 4/4] powerpc/mm/iommu: Allow migration of cma allocated pages during mm_iommu_get Aneesh Kumar K.V
@ 2018-09-06 12:31 ` Michal Hocko
  2018-09-06 12:35   ` Michal Hocko
  3 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2018-09-06 12:31 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: akpm, Alexey Kardashevskiy, mpe, linux-mm, linux-kernel, linuxppc-dev

On Thu 06-09-18 11:13:39, Aneesh Kumar K.V wrote:
> We want to use this to support customized huge page migration.

Please be much more specific. Ideally including the user. Btw. why do
you want to skip the hugetlb pools? In other words alloc_huge_page_node*
which are intended to an external use?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH V2 1/4] mm: Export alloc_migrate_huge_page
  2018-09-06 12:31 ` [RFC PATCH V2 1/4] mm: Export alloc_migrate_huge_page Michal Hocko
@ 2018-09-06 12:35   ` Michal Hocko
  2018-09-06 13:32     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2018-09-06 12:35 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: akpm, Alexey Kardashevskiy, mpe, linux-mm, linux-kernel, linuxppc-dev

On Thu 06-09-18 14:31:11, Michal Hocko wrote:
> On Thu 06-09-18 11:13:39, Aneesh Kumar K.V wrote:
> > We want to use this to support customized huge page migration.
> 
> Please be much more specific. Ideally including the user. Btw. why do
> you want to skip the hugetlb pools? In other words alloc_huge_page_node*
> which are intended to an external use?

Ups, I have now found http://lkml.kernel.org/r/20180906054342.25094-2-aneesh.kumar@linux.ibm.com
which ended up in a different email folder so I have missed it. It would
be much better to merge those two to make the user immediately obvious.
There is a good reason to keep newly added functions closer to their
users.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH V2 2/4] mm: Add get_user_pages_cma_migrate
  2018-09-06  5:43 ` [RFC PATCH V2 2/4] mm: Add get_user_pages_cma_migrate Aneesh Kumar K.V
@ 2018-09-06 12:45   ` Michal Hocko
  2018-09-06 13:23     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2018-09-06 12:45 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: akpm, Alexey Kardashevskiy, mpe, linux-mm, linux-kernel, linuxppc-dev

On Thu 06-09-18 11:13:40, Aneesh Kumar K.V wrote:
> This helper does a get_user_pages_fast and if it find pages in the CMA area
> it will try to migrate them before taking page reference. This makes sure that
> we don't keep non-movable pages (due to page reference count) in the CMA area.
> Not able to move pages out of CMA area result in CMA allocation failures.

Again, there is no user so it is hard to guess the intention completely.
There is no documentation to describe the expected context and
assumptions about locking etc.

As noted in the previous email. You should better describe why you are
bypassing hugetlb pools. I assume that the reason is to guarantee a
forward progress because those might be sitting in the CMA pools
already, right?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH V2 4/4] powerpc/mm/iommu: Allow migration of cma allocated pages during mm_iommu_get
  2018-09-06  5:43 ` [RFC PATCH V2 4/4] powerpc/mm/iommu: Allow migration of cma allocated pages during mm_iommu_get Aneesh Kumar K.V
@ 2018-09-06 12:53   ` Michal Hocko
  2018-09-06 13:30     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2018-09-06 12:53 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: akpm, Alexey Kardashevskiy, mpe, linux-mm, linux-kernel, linuxppc-dev

On Thu 06-09-18 11:13:42, Aneesh Kumar K.V wrote:
> Current code doesn't do page migration if the page allocated is a compound page.
> With HugeTLB migration support, we can end up allocating hugetlb pages from
> CMA region. Also THP pages can be allocated from CMA region. This patch updates
> the code to handle compound pages correctly.
> 
> This use the new helper get_user_pages_cma_migrate. It does one get_user_pages
> with right count, instead of doing one get_user_pages per page. That avoids
> reading page table multiple times.
> 
> The patch also convert the hpas member of mm_iommu_table_group_mem_t to a union.
> We use the same storage location to store pointers to struct page. We cannot
> update alll the code path use struct page *, because we access hpas in real mode
> and we can't do that struct page * to pfn conversion in real mode.

I am not fmailiar with this code so bear with me. I am completely
missing the purpose of this patch. The changelog doesn't really explain
that AFAICS. I can only guess that you do not want to establish long
pins on CMA pages, right? So whenever you are about to pin a page that
is in CMA you migrate it away to a different !__GFP_MOVABLE page, right?
If that is the case then how do you handle pins which are already in
zone_movable? I do not see any specific check for those.

Btw. why is this a proper thing to do? Problems with longterm pins are
not only for CMA/ZONE_MOVABLE pages. Pinned pages are not reclaimable as
well so there is a risk of OOMs if there are too many of them. We have
discussed approaches that would allow to force pin invalidation/revocation
at LSF/MM. Isn't that a more appropriate solution to the problem you are
seeing?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH V2 2/4] mm: Add get_user_pages_cma_migrate
  2018-09-06 12:45   ` Michal Hocko
@ 2018-09-06 13:23     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 14+ messages in thread
From: Aneesh Kumar K.V @ 2018-09-06 13:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, Alexey Kardashevskiy, mpe, linux-mm, linux-kernel, linuxppc-dev

On 09/06/2018 06:15 PM, Michal Hocko wrote:
> On Thu 06-09-18 11:13:40, Aneesh Kumar K.V wrote:
>> This helper does a get_user_pages_fast and if it find pages in the CMA area
>> it will try to migrate them before taking page reference. This makes sure that
>> we don't keep non-movable pages (due to page reference count) in the CMA area.
>> Not able to move pages out of CMA area result in CMA allocation failures.
> 
> Again, there is no user so it is hard to guess the intention completely.
> There is no documentation to describe the expected context and
> assumptions about locking etc.
> 

patch 4 is the user for the new helper. I will add the documentation 
update.

> As noted in the previous email. You should better describe why you are
> bypassing hugetlb pools. I assume that the reason is to guarantee a
> forward progress because those might be sitting in the CMA pools
> already, right?
> 

The reason for that is explained in the code

+		struct hstate *h = page_hstate(page);
+		/*
+		 * We don't want to dequeue from the pool because pool pages will
+		 * mostly be from the CMA region.
+		 */
+		return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);

-aneesh


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH V2 4/4] powerpc/mm/iommu: Allow migration of cma allocated pages during mm_iommu_get
  2018-09-06 12:53   ` Michal Hocko
@ 2018-09-06 13:30     ` Aneesh Kumar K.V
  2018-09-07  9:03       ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Aneesh Kumar K.V @ 2018-09-06 13:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, Alexey Kardashevskiy, mpe, linux-mm, linux-kernel, linuxppc-dev

On 09/06/2018 06:23 PM, Michal Hocko wrote:
> On Thu 06-09-18 11:13:42, Aneesh Kumar K.V wrote:
>> Current code doesn't do page migration if the page allocated is a compound page.
>> With HugeTLB migration support, we can end up allocating hugetlb pages from
>> CMA region. Also THP pages can be allocated from CMA region. This patch updates
>> the code to handle compound pages correctly.
>>
>> This use the new helper get_user_pages_cma_migrate. It does one get_user_pages
>> with right count, instead of doing one get_user_pages per page. That avoids
>> reading page table multiple times.
>>
>> The patch also convert the hpas member of mm_iommu_table_group_mem_t to a union.
>> We use the same storage location to store pointers to struct page. We cannot
>> update alll the code path use struct page *, because we access hpas in real mode
>> and we can't do that struct page * to pfn conversion in real mode.
> 
> I am not fmailiar with this code so bear with me. I am completely
> missing the purpose of this patch. The changelog doesn't really explain
> that AFAICS. I can only guess that you do not want to establish long
> pins on CMA pages, right? So whenever you are about to pin a page that
> is in CMA you migrate it away to a different !__GFP_MOVABLE page, right?

That is right.

> If that is the case then how do you handle pins which are already in
> zone_movable? I do not see any specific check for those.


> 
> Btw. why is this a proper thing to do? Problems with longterm pins are
> not only for CMA/ZONE_MOVABLE pages. Pinned pages are not reclaimable as
> well so there is a risk of OOMs if there are too many of them. We have
> discussed approaches that would allow to force pin invalidation/revocation
> at LSF/MM. Isn't that a more appropriate solution to the problem you are
> seeing?
> 

The CMA area is used on powerpc platforms to allocate guest specific 
page table (hash page table). If we don't have sufficient free pages we 
fail to allocate hash page table that result in failure to start guest.

Now with vfio, we end up pinning the entire guest RAM. There is a 
possibility that these guest RAM  pages got allocated from CMA region. 
We already do supporting migrating those pages out except for compound 
pages. What this patch does is to start supporting compound page 
migration that got allocated out of CMA region (ie, THP pages and 
hugetlb pages if platform supported hugetlb migration).

Now to do that I added a helper get_user_pages_cma_migrate().

I agree that long term pinned pages do have other issues. The patchset 
is not solving that issue.

-aneesh


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH V2 1/4] mm: Export alloc_migrate_huge_page
  2018-09-06 12:35   ` Michal Hocko
@ 2018-09-06 13:32     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 14+ messages in thread
From: Aneesh Kumar K.V @ 2018-09-06 13:32 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, Alexey Kardashevskiy, mpe, linux-mm, linux-kernel, linuxppc-dev

On 09/06/2018 06:05 PM, Michal Hocko wrote:
> On Thu 06-09-18 14:31:11, Michal Hocko wrote:
>> On Thu 06-09-18 11:13:39, Aneesh Kumar K.V wrote:
>>> We want to use this to support customized huge page migration.
>>
>> Please be much more specific. Ideally including the user. Btw. why do
>> you want to skip the hugetlb pools? In other words alloc_huge_page_node*
>> which are intended to an external use?
> 
> Ups, I have now found http://lkml.kernel.org/r/20180906054342.25094-2-aneesh.kumar@linux.ibm.com
> which ended up in a different email folder so I have missed it. It would
> be much better to merge those two to make the user immediately obvious.
> There is a good reason to keep newly added functions closer to their
> users.
> 

It is the same series. I will fold the patch 1 and 2.

-aneesh


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH V2 4/4] powerpc/mm/iommu: Allow migration of cma allocated pages during mm_iommu_get
  2018-09-06 13:30     ` Aneesh Kumar K.V
@ 2018-09-07  9:03       ` Michal Hocko
  2018-09-07 11:15         ` Aneesh Kumar K.V
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2018-09-07  9:03 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: akpm, Alexey Kardashevskiy, mpe, linux-mm, linux-kernel, linuxppc-dev

On Thu 06-09-18 19:00:43, Aneesh Kumar K.V wrote:
> On 09/06/2018 06:23 PM, Michal Hocko wrote:
> > On Thu 06-09-18 11:13:42, Aneesh Kumar K.V wrote:
> > > Current code doesn't do page migration if the page allocated is a compound page.
> > > With HugeTLB migration support, we can end up allocating hugetlb pages from
> > > CMA region. Also THP pages can be allocated from CMA region. This patch updates
> > > the code to handle compound pages correctly.
> > > 
> > > This use the new helper get_user_pages_cma_migrate. It does one get_user_pages
> > > with right count, instead of doing one get_user_pages per page. That avoids
> > > reading page table multiple times.
> > > 
> > > The patch also convert the hpas member of mm_iommu_table_group_mem_t to a union.
> > > We use the same storage location to store pointers to struct page. We cannot
> > > update alll the code path use struct page *, because we access hpas in real mode
> > > and we can't do that struct page * to pfn conversion in real mode.
> > 
> > I am not fmailiar with this code so bear with me. I am completely
> > missing the purpose of this patch. The changelog doesn't really explain
> > that AFAICS. I can only guess that you do not want to establish long
> > pins on CMA pages, right? So whenever you are about to pin a page that
> > is in CMA you migrate it away to a different !__GFP_MOVABLE page, right?
> 
> That is right.
> 
> > If that is the case then how do you handle pins which are already in
> > zone_movable? I do not see any specific check for those.
> 
> 
> > 
> > Btw. why is this a proper thing to do? Problems with longterm pins are
> > not only for CMA/ZONE_MOVABLE pages. Pinned pages are not reclaimable as
> > well so there is a risk of OOMs if there are too many of them. We have
> > discussed approaches that would allow to force pin invalidation/revocation
> > at LSF/MM. Isn't that a more appropriate solution to the problem you are
> > seeing?
> > 
> 
> The CMA area is used on powerpc platforms to allocate guest specific page
> table (hash page table). If we don't have sufficient free pages we fail to
> allocate hash page table that result in failure to start guest.
> 
> Now with vfio, we end up pinning the entire guest RAM. There is a
> possibility that these guest RAM  pages got allocated from CMA region. We
> already do supporting migrating those pages out except for compound pages.
> What this patch does is to start supporting compound page migration that got
> allocated out of CMA region (ie, THP pages and hugetlb pages if platform
> supported hugetlb migration).

This definitely belongs to the changelog.

> Now to do that I added a helper get_user_pages_cma_migrate().
> 
> I agree that long term pinned pages do have other issues. The patchset is
> not solving that issue.

It would be great to note why a generic approach is not viable. I assume
the main reason is that those pins are pretty much permanent for the
guest lifetime so the situation has to be handled in advance. In other
words, more information please. 
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH V2 4/4] powerpc/mm/iommu: Allow migration of cma allocated pages during mm_iommu_get
  2018-09-07  9:03       ` Michal Hocko
@ 2018-09-07 11:15         ` Aneesh Kumar K.V
  2018-09-07 11:25           ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Aneesh Kumar K.V @ 2018-09-07 11:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, Alexey Kardashevskiy, mpe, linux-mm, linux-kernel, linuxppc-dev

On 09/07/2018 02:33 PM, Michal Hocko wrote:
> On Thu 06-09-18 19:00:43, Aneesh Kumar K.V wrote:
>> On 09/06/2018 06:23 PM, Michal Hocko wrote:
>>> On Thu 06-09-18 11:13:42, Aneesh Kumar K.V wrote:
>>>> Current code doesn't do page migration if the page allocated is a compound page.
>>>> With HugeTLB migration support, we can end up allocating hugetlb pages from
>>>> CMA region. Also THP pages can be allocated from CMA region. This patch updates
>>>> the code to handle compound pages correctly.
>>>>
>>>> This use the new helper get_user_pages_cma_migrate. It does one get_user_pages
>>>> with right count, instead of doing one get_user_pages per page. That avoids
>>>> reading page table multiple times.
>>>>
>>>> The patch also convert the hpas member of mm_iommu_table_group_mem_t to a union.
>>>> We use the same storage location to store pointers to struct page. We cannot
>>>> update alll the code path use struct page *, because we access hpas in real mode
>>>> and we can't do that struct page * to pfn conversion in real mode.
>>>
>>> I am not fmailiar with this code so bear with me. I am completely
>>> missing the purpose of this patch. The changelog doesn't really explain
>>> that AFAICS. I can only guess that you do not want to establish long
>>> pins on CMA pages, right? So whenever you are about to pin a page that
>>> is in CMA you migrate it away to a different !__GFP_MOVABLE page, right?
>>
>> That is right.
>>
>>> If that is the case then how do you handle pins which are already in
>>> zone_movable? I do not see any specific check for those.
>>
>>
>>>
>>> Btw. why is this a proper thing to do? Problems with longterm pins are
>>> not only for CMA/ZONE_MOVABLE pages. Pinned pages are not reclaimable as
>>> well so there is a risk of OOMs if there are too many of them. We have
>>> discussed approaches that would allow to force pin invalidation/revocation
>>> at LSF/MM. Isn't that a more appropriate solution to the problem you are
>>> seeing?
>>>
>>
>> The CMA area is used on powerpc platforms to allocate guest specific page
>> table (hash page table). If we don't have sufficient free pages we fail to
>> allocate hash page table that result in failure to start guest.
>>
>> Now with vfio, we end up pinning the entire guest RAM. There is a
>> possibility that these guest RAM  pages got allocated from CMA region. We
>> already do supporting migrating those pages out except for compound pages.
>> What this patch does is to start supporting compound page migration that got
>> allocated out of CMA region (ie, THP pages and hugetlb pages if platform
>> supported hugetlb migration).
> 
> This definitely belongs to the changelog.
> 
>> Now to do that I added a helper get_user_pages_cma_migrate().
>>
>> I agree that long term pinned pages do have other issues. The patchset is
>> not solving that issue.
> 
> It would be great to note why a generic approach is not viable. I assume
> the main reason is that those pins are pretty much permanent for the
> guest lifetime so the situation has to be handled in advance. In other
> words, more information please.
> 

That is correct. I will add these details to commit message. And will 
also do a cover letter for the patch series.

-aneesh


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH V2 4/4] powerpc/mm/iommu: Allow migration of cma allocated pages during mm_iommu_get
  2018-09-07 11:15         ` Aneesh Kumar K.V
@ 2018-09-07 11:25           ` Michal Hocko
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2018-09-07 11:25 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: akpm, Alexey Kardashevskiy, mpe, linux-mm, linux-kernel, linuxppc-dev

On Fri 07-09-18 16:45:09, Aneesh Kumar K.V wrote:
> On 09/07/2018 02:33 PM, Michal Hocko wrote:
> > On Thu 06-09-18 19:00:43, Aneesh Kumar K.V wrote:
> > > On 09/06/2018 06:23 PM, Michal Hocko wrote:
> > > > On Thu 06-09-18 11:13:42, Aneesh Kumar K.V wrote:
> > > > > Current code doesn't do page migration if the page allocated is a compound page.
> > > > > With HugeTLB migration support, we can end up allocating hugetlb pages from
> > > > > CMA region. Also THP pages can be allocated from CMA region. This patch updates
> > > > > the code to handle compound pages correctly.
> > > > > 
> > > > > This use the new helper get_user_pages_cma_migrate. It does one get_user_pages
> > > > > with right count, instead of doing one get_user_pages per page. That avoids
> > > > > reading page table multiple times.
> > > > > 
> > > > > The patch also convert the hpas member of mm_iommu_table_group_mem_t to a union.
> > > > > We use the same storage location to store pointers to struct page. We cannot
> > > > > update alll the code path use struct page *, because we access hpas in real mode
> > > > > and we can't do that struct page * to pfn conversion in real mode.
> > > > 
> > > > I am not fmailiar with this code so bear with me. I am completely
> > > > missing the purpose of this patch. The changelog doesn't really explain
> > > > that AFAICS. I can only guess that you do not want to establish long
> > > > pins on CMA pages, right? So whenever you are about to pin a page that
> > > > is in CMA you migrate it away to a different !__GFP_MOVABLE page, right?
> > > 
> > > That is right.
> > > 
> > > > If that is the case then how do you handle pins which are already in
> > > > zone_movable? I do not see any specific check for those.
> > > 
> > > 
> > > > 
> > > > Btw. why is this a proper thing to do? Problems with longterm pins are
> > > > not only for CMA/ZONE_MOVABLE pages. Pinned pages are not reclaimable as
> > > > well so there is a risk of OOMs if there are too many of them. We have
> > > > discussed approaches that would allow to force pin invalidation/revocation
> > > > at LSF/MM. Isn't that a more appropriate solution to the problem you are
> > > > seeing?
> > > > 
> > > 
> > > The CMA area is used on powerpc platforms to allocate guest specific page
> > > table (hash page table). If we don't have sufficient free pages we fail to
> > > allocate hash page table that result in failure to start guest.
> > > 
> > > Now with vfio, we end up pinning the entire guest RAM. There is a
> > > possibility that these guest RAM  pages got allocated from CMA region. We
> > > already do supporting migrating those pages out except for compound pages.
> > > What this patch does is to start supporting compound page migration that got
> > > allocated out of CMA region (ie, THP pages and hugetlb pages if platform
> > > supported hugetlb migration).
> > 
> > This definitely belongs to the changelog.
> > 
> > > Now to do that I added a helper get_user_pages_cma_migrate().
> > > 
> > > I agree that long term pinned pages do have other issues. The patchset is
> > > not solving that issue.
> > 
> > It would be great to note why a generic approach is not viable. I assume
> > the main reason is that those pins are pretty much permanent for the
> > guest lifetime so the situation has to be handled in advance. In other
> > words, more information please.
> > 
> 
> That is correct. I will add these details to commit message. And will also
> do a cover letter for the patch series.

OK, then the early migration makes some sense. Although I suspect this
will lead to other issues (OOM in kernel zones) but revocation approach
is clearly not usable. An excessive pinning simply sucks.

Thanks a lot for the updated information though!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-09-07 11:25 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-06  5:43 [RFC PATCH V2 1/4] mm: Export alloc_migrate_huge_page Aneesh Kumar K.V
2018-09-06  5:43 ` [RFC PATCH V2 2/4] mm: Add get_user_pages_cma_migrate Aneesh Kumar K.V
2018-09-06 12:45   ` Michal Hocko
2018-09-06 13:23     ` Aneesh Kumar K.V
2018-09-06  5:43 ` [RFC PATCH V2 3/4] powerpc/mm/iommu: Allow large IOMMU page size only for hugetlb backing Aneesh Kumar K.V
2018-09-06  5:43 ` [RFC PATCH V2 4/4] powerpc/mm/iommu: Allow migration of cma allocated pages during mm_iommu_get Aneesh Kumar K.V
2018-09-06 12:53   ` Michal Hocko
2018-09-06 13:30     ` Aneesh Kumar K.V
2018-09-07  9:03       ` Michal Hocko
2018-09-07 11:15         ` Aneesh Kumar K.V
2018-09-07 11:25           ` Michal Hocko
2018-09-06 12:31 ` [RFC PATCH V2 1/4] mm: Export alloc_migrate_huge_page Michal Hocko
2018-09-06 12:35   ` Michal Hocko
2018-09-06 13:32     ` Aneesh Kumar K.V

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).