All of lore.kernel.org
 help / color / mirror / Atom feed
* [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
@ 2016-07-14  4:25 ` Balbir Singh
  0 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2016-07-14  4:25 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Benjamin Herrenschmidt, Alexey Kardashevskiy,
	Paul Mackerras, Michael Ellerman


From: Balbir Singh <bsingharora@gmail.com>
Subject: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA

When PCI Device pass-through is enabled via VFIO, KVM-PPC will
pin pages using get_user_pages_fast(). One of the downsides of
the pinning is that the page could be in CMA region. The CMA
region is used for other allocations like the hash page table.
Ideally we want the pinned pages to be from non CMA region.

This patch (currently only for KVM PPC with VFIO) forcefully
migrates the pages out (huge pages are omitted for the moment).
There are more efficient ways of doing this, but that might
be elaborate and might impact a larger audience beyond just
the kvm ppc implementation.

The magic is in new_iommu_non_cma_page() which allocates the
new page from a non CMA region.

I've tested the patches lightly at my end, but there might be bugs
For example if after lru_add_drain(), the page is not isolated
is this a BUG?

Previous discussion was at
http://permalink.gmane.org/gmane.linux.kernel.mm/136738

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
---
 arch/powerpc/include/asm/mmu_context.h |  1 +
 arch/powerpc/mm/mmu_context_iommu.c    | 80 ++++++++++++++++++++++++++++++++--
 2 files changed, 77 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 9d2cd0c..475d1be 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 struct mm_iommu_table_group_mem_t;
 
+extern int isolate_lru_page(struct page *page);	/* from internal.h */
 extern bool mm_iommu_preregistered(void);
 extern long mm_iommu_get(unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem);
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index da6a216..c18f742 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -15,6 +15,9 @@
 #include <linux/rculist.h>
 #include <linux/vmalloc.h>
 #include <linux/mutex.h>
+#include <linux/migrate.h>
+#include <linux/hugetlb.h>
+#include <linux/swap.h>
 #include <asm/mmu_context.h>
 
 static DEFINE_MUTEX(mem_list_mutex);
@@ -72,6 +75,54 @@ bool mm_iommu_preregistered(void)
 }
 EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
 
+/*
+ * Taken from alloc_migrate_target with changes to remove CMA allocations
+ */
+struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
+					int **resultp)
+{
+	gfp_t gfp_mask = GFP_USER;
+	struct page *new_page;
+
+	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
+		return NULL;
+
+	if (PageHighMem(page))
+		gfp_mask |= __GFP_HIGHMEM;
+
+	/*
+	 * We don't want the allocation to force an OOM if possibe
+	 */
+	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
+	return new_page;
+}
+
+static int mm_iommu_move_page_from_cma(struct page *page)
+{
+	int ret;
+	LIST_HEAD(cma_migrate_pages);
+
+	/* Ignore huge pages for now */
+	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
+		return -EBUSY;
+
+	lru_add_drain();
+	ret = isolate_lru_page(page);
+	if (ret)
+		get_page(page); /* Potential BUG? */
+
+	list_add(&page->lru, &cma_migrate_pages);
+	put_page(page); /* Drop the gup reference */
+
+	ret = migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page,
+				NULL, 0, MIGRATE_SYNC, MR_CMA);
+	if (ret) {
+		if (!list_empty(&cma_migrate_pages))
+			putback_movable_pages(&cma_migrate_pages);
+	}
+	return 0;
+}
+
 long mm_iommu_get(unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem)
 {
@@ -124,15 +175,36 @@ long mm_iommu_get(unsigned long ua, unsigned long entries,
 	for (i = 0; i < entries; ++i) {
 		if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
 					1/* pages */, 1/* iswrite */, &page)) {
+			ret = -EFAULT;
 			for (j = 0; j < i; ++j)
-				put_page(pfn_to_page(
-						mem->hpas[j] >> PAGE_SHIFT));
+				put_page(pfn_to_page(mem->hpas[j] >>
+						PAGE_SHIFT));
 			vfree(mem->hpas);
 			kfree(mem);
-			ret = -EFAULT;
 			goto unlock_exit;
 		}
-
+		/*
+		 * If we get a page from the CMA zone, since we are going to
+		 * be pinning these entries, we might as well move them out
+		 * of the CMA zone if possible. NOTE: faulting in + migration
+		 * can be expensive. Batching can be considered later
+		 */
+		if (get_pageblock_migratetype(page) == MIGRATE_CMA) {
+			if (mm_iommu_move_page_from_cma(page))
+				goto populate;
+			if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
+						1/* pages */, 1/* iswrite */,
+						&page)) {
+				ret = -EFAULT;
+				for (j = 0; j < i; ++j)
+					put_page(pfn_to_page(mem->hpas[j] >>
+								PAGE_SHIFT));
+				vfree(mem->hpas);
+				kfree(mem);
+				goto unlock_exit;
+			}
+		}
+populate:
 		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
 	}
 
-- 
2.5.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
@ 2016-07-14  4:25 ` Balbir Singh
  0 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2016-07-14  4:25 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Benjamin Herrenschmidt, Alexey Kardashevskiy,
	Paul Mackerras, Michael Ellerman


From: Balbir Singh <bsingharora@gmail.com>
Subject: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA

When PCI Device pass-through is enabled via VFIO, KVM-PPC will
pin pages using get_user_pages_fast(). One of the downsides of
the pinning is that the page could be in CMA region. The CMA
region is used for other allocations like the hash page table.
Ideally we want the pinned pages to be from non CMA region.

This patch (currently only for KVM PPC with VFIO) forcefully
migrates the pages out (huge pages are omitted for the moment).
There are more efficient ways of doing this, but that might
be elaborate and might impact a larger audience beyond just
the kvm ppc implementation.

The magic is in new_iommu_non_cma_page() which allocates the
new page from a non CMA region.

I've tested the patches lightly at my end, but there might be bugs
For example if after lru_add_drain(), the page is not isolated
is this a BUG?

Previous discussion was at
http://permalink.gmane.org/gmane.linux.kernel.mm/136738

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
---
 arch/powerpc/include/asm/mmu_context.h |  1 +
 arch/powerpc/mm/mmu_context_iommu.c    | 80 ++++++++++++++++++++++++++++++++--
 2 files changed, 77 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 9d2cd0c..475d1be 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 struct mm_iommu_table_group_mem_t;
 
+extern int isolate_lru_page(struct page *page);	/* from internal.h */
 extern bool mm_iommu_preregistered(void);
 extern long mm_iommu_get(unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem);
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index da6a216..c18f742 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -15,6 +15,9 @@
 #include <linux/rculist.h>
 #include <linux/vmalloc.h>
 #include <linux/mutex.h>
+#include <linux/migrate.h>
+#include <linux/hugetlb.h>
+#include <linux/swap.h>
 #include <asm/mmu_context.h>
 
 static DEFINE_MUTEX(mem_list_mutex);
@@ -72,6 +75,54 @@ bool mm_iommu_preregistered(void)
 }
 EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
 
+/*
+ * Taken from alloc_migrate_target with changes to remove CMA allocations
+ */
+struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
+					int **resultp)
+{
+	gfp_t gfp_mask = GFP_USER;
+	struct page *new_page;
+
+	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
+		return NULL;
+
+	if (PageHighMem(page))
+		gfp_mask |= __GFP_HIGHMEM;
+
+	/*
+	 * We don't want the allocation to force an OOM if possibe
+	 */
+	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
+	return new_page;
+}
+
+static int mm_iommu_move_page_from_cma(struct page *page)
+{
+	int ret;
+	LIST_HEAD(cma_migrate_pages);
+
+	/* Ignore huge pages for now */
+	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
+		return -EBUSY;
+
+	lru_add_drain();
+	ret = isolate_lru_page(page);
+	if (ret)
+		get_page(page); /* Potential BUG? */
+
+	list_add(&page->lru, &cma_migrate_pages);
+	put_page(page); /* Drop the gup reference */
+
+	ret = migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page,
+				NULL, 0, MIGRATE_SYNC, MR_CMA);
+	if (ret) {
+		if (!list_empty(&cma_migrate_pages))
+			putback_movable_pages(&cma_migrate_pages);
+	}
+	return 0;
+}
+
 long mm_iommu_get(unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem)
 {
@@ -124,15 +175,36 @@ long mm_iommu_get(unsigned long ua, unsigned long entries,
 	for (i = 0; i < entries; ++i) {
 		if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
 					1/* pages */, 1/* iswrite */, &page)) {
+			ret = -EFAULT;
 			for (j = 0; j < i; ++j)
-				put_page(pfn_to_page(
-						mem->hpas[j] >> PAGE_SHIFT));
+				put_page(pfn_to_page(mem->hpas[j] >>
+						PAGE_SHIFT));
 			vfree(mem->hpas);
 			kfree(mem);
-			ret = -EFAULT;
 			goto unlock_exit;
 		}
-
+		/*
+		 * If we get a page from the CMA zone, since we are going to
+		 * be pinning these entries, we might as well move them out
+		 * of the CMA zone if possible. NOTE: faulting in + migration
+		 * can be expensive. Batching can be considered later
+		 */
+		if (get_pageblock_migratetype(page) == MIGRATE_CMA) {
+			if (mm_iommu_move_page_from_cma(page))
+				goto populate;
+			if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
+						1/* pages */, 1/* iswrite */,
+						&page)) {
+				ret = -EFAULT;
+				for (j = 0; j < i; ++j)
+					put_page(pfn_to_page(mem->hpas[j] >>
+								PAGE_SHIFT));
+				vfree(mem->hpas);
+				kfree(mem);
+				goto unlock_exit;
+			}
+		}
+populate:
 		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
 	}
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
@ 2016-07-14  4:25 ` Balbir Singh
  0 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2016-07-14  4:25 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Benjamin Herrenschmidt, Alexey Kardashevskiy,
	Paul Mackerras, Michael Ellerman


From: Balbir Singh <bsingharora@gmail.com>
Subject: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA

When PCI Device pass-through is enabled via VFIO, KVM-PPC will
pin pages using get_user_pages_fast(). One of the downsides of
the pinning is that the page could be in CMA region. The CMA
region is used for other allocations like the hash page table.
Ideally we want the pinned pages to be from non CMA region.

This patch (currently only for KVM PPC with VFIO) forcefully
migrates the pages out (huge pages are omitted for the moment).
There are more efficient ways of doing this, but that might
be elaborate and might impact a larger audience beyond just
the kvm ppc implementation.

The magic is in new_iommu_non_cma_page() which allocates the
new page from a non CMA region.

I've tested the patches lightly at my end, but there might be bugs
For example if after lru_add_drain(), the page is not isolated
is this a BUG?

Previous discussion was at
http://permalink.gmane.org/gmane.linux.kernel.mm/136738

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
---
 arch/powerpc/include/asm/mmu_context.h |  1 +
 arch/powerpc/mm/mmu_context_iommu.c    | 80 ++++++++++++++++++++++++++++++++--
 2 files changed, 77 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 9d2cd0c..475d1be 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 struct mm_iommu_table_group_mem_t;
 
+extern int isolate_lru_page(struct page *page);	/* from internal.h */
 extern bool mm_iommu_preregistered(void);
 extern long mm_iommu_get(unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem);
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index da6a216..c18f742 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -15,6 +15,9 @@
 #include <linux/rculist.h>
 #include <linux/vmalloc.h>
 #include <linux/mutex.h>
+#include <linux/migrate.h>
+#include <linux/hugetlb.h>
+#include <linux/swap.h>
 #include <asm/mmu_context.h>
 
 static DEFINE_MUTEX(mem_list_mutex);
@@ -72,6 +75,54 @@ bool mm_iommu_preregistered(void)
 }
 EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
 
+/*
+ * Taken from alloc_migrate_target with changes to remove CMA allocations
+ */
+struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
+					int **resultp)
+{
+	gfp_t gfp_mask = GFP_USER;
+	struct page *new_page;
+
+	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
+		return NULL;
+
+	if (PageHighMem(page))
+		gfp_mask |= __GFP_HIGHMEM;
+
+	/*
+	 * We don't want the allocation to force an OOM if possibe
+	 */
+	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
+	return new_page;
+}
+
+static int mm_iommu_move_page_from_cma(struct page *page)
+{
+	int ret;
+	LIST_HEAD(cma_migrate_pages);
+
+	/* Ignore huge pages for now */
+	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
+		return -EBUSY;
+
+	lru_add_drain();
+	ret = isolate_lru_page(page);
+	if (ret)
+		get_page(page); /* Potential BUG? */
+
+	list_add(&page->lru, &cma_migrate_pages);
+	put_page(page); /* Drop the gup reference */
+
+	ret = migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page,
+				NULL, 0, MIGRATE_SYNC, MR_CMA);
+	if (ret) {
+		if (!list_empty(&cma_migrate_pages))
+			putback_movable_pages(&cma_migrate_pages);
+	}
+	return 0;
+}
+
 long mm_iommu_get(unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem)
 {
@@ -124,15 +175,36 @@ long mm_iommu_get(unsigned long ua, unsigned long entries,
 	for (i = 0; i < entries; ++i) {
 		if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
 					1/* pages */, 1/* iswrite */, &page)) {
+			ret = -EFAULT;
 			for (j = 0; j < i; ++j)
-				put_page(pfn_to_page(
-						mem->hpas[j] >> PAGE_SHIFT));
+				put_page(pfn_to_page(mem->hpas[j] >>
+						PAGE_SHIFT));
 			vfree(mem->hpas);
 			kfree(mem);
-			ret = -EFAULT;
 			goto unlock_exit;
 		}
-
+		/*
+		 * If we get a page from the CMA zone, since we are going to
+		 * be pinning these entries, we might as well move them out
+		 * of the CMA zone if possible. NOTE: faulting in + migration
+		 * can be expensive. Batching can be considered later
+		 */
+		if (get_pageblock_migratetype(page) = MIGRATE_CMA) {
+			if (mm_iommu_move_page_from_cma(page))
+				goto populate;
+			if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
+						1/* pages */, 1/* iswrite */,
+						&page)) {
+				ret = -EFAULT;
+				for (j = 0; j < i; ++j)
+					put_page(pfn_to_page(mem->hpas[j] >>
+								PAGE_SHIFT));
+				vfree(mem->hpas);
+				kfree(mem);
+				goto unlock_exit;
+			}
+		}
+populate:
 		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
 	}
 
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
  2016-07-14  4:25 ` Balbir Singh
  (?)
@ 2016-08-31  4:14   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 27+ messages in thread
From: Alexey Kardashevskiy @ 2016-08-31  4:14 UTC (permalink / raw)
  To: bsingharora, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman

On 14/07/16 14:25, Balbir Singh wrote:
> 
> From: Balbir Singh <bsingharora@gmail.com>
> Subject: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
> 
> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
> pin pages using get_user_pages_fast(). One of the downsides of
> the pinning is that the page could be in CMA region. The CMA
> region is used for other allocations like the hash page table.
> Ideally we want the pinned pages to be from non CMA region.
> 
> This patch (currently only for KVM PPC with VFIO) forcefully
> migrates the pages out (huge pages are omitted for the moment).
> There are more efficient ways of doing this, but that might
> be elaborate and might impact a larger audience beyond just
> the kvm ppc implementation.
> 
> The magic is in new_iommu_non_cma_page() which allocates the
> new page from a non CMA region.
> 
> I've tested the patches lightly at my end, but there might be bugs
> For example if after lru_add_drain(), the page is not isolated
> is this a BUG?
> 
> Previous discussion was at
> http://permalink.gmane.org/gmane.linux.kernel.mm/136738
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Signed-off-by: Balbir Singh <bsingharora@gmail.com>



Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>



> ---
>  arch/powerpc/include/asm/mmu_context.h |  1 +
>  arch/powerpc/mm/mmu_context_iommu.c    | 80 ++++++++++++++++++++++++++++++++--
>  2 files changed, 77 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 9d2cd0c..475d1be 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>  struct mm_iommu_table_group_mem_t;
>  
> +extern int isolate_lru_page(struct page *page);	/* from internal.h */
>  extern bool mm_iommu_preregistered(void);
>  extern long mm_iommu_get(unsigned long ua, unsigned long entries,
>  		struct mm_iommu_table_group_mem_t **pmem);
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index da6a216..c18f742 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -15,6 +15,9 @@
>  #include <linux/rculist.h>
>  #include <linux/vmalloc.h>
>  #include <linux/mutex.h>
> +#include <linux/migrate.h>
> +#include <linux/hugetlb.h>
> +#include <linux/swap.h>
>  #include <asm/mmu_context.h>
>  
>  static DEFINE_MUTEX(mem_list_mutex);
> @@ -72,6 +75,54 @@ bool mm_iommu_preregistered(void)
>  }
>  EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
>  
> +/*
> + * Taken from alloc_migrate_target with changes to remove CMA allocations
> + */
> +struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
> +					int **resultp)
> +{
> +	gfp_t gfp_mask = GFP_USER;
> +	struct page *new_page;
> +
> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
> +		return NULL;
> +
> +	if (PageHighMem(page))
> +		gfp_mask |= __GFP_HIGHMEM;
> +
> +	/*
> +	 * We don't want the allocation to force an OOM if possibe
> +	 */
> +	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
> +	return new_page;
> +}
> +
> +static int mm_iommu_move_page_from_cma(struct page *page)
> +{
> +	int ret;
> +	LIST_HEAD(cma_migrate_pages);
> +
> +	/* Ignore huge pages for now */
> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
> +		return -EBUSY;
> +
> +	lru_add_drain();
> +	ret = isolate_lru_page(page);
> +	if (ret)
> +		get_page(page); /* Potential BUG? */
> +
> +	list_add(&page->lru, &cma_migrate_pages);
> +	put_page(page); /* Drop the gup reference */
> +
> +	ret = migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page,
> +				NULL, 0, MIGRATE_SYNC, MR_CMA);
> +	if (ret) {
> +		if (!list_empty(&cma_migrate_pages))
> +			putback_movable_pages(&cma_migrate_pages);
> +	}
> +	return 0;
> +}
> +
>  long mm_iommu_get(unsigned long ua, unsigned long entries,
>  		struct mm_iommu_table_group_mem_t **pmem)
>  {
> @@ -124,15 +175,36 @@ long mm_iommu_get(unsigned long ua, unsigned long entries,
>  	for (i = 0; i < entries; ++i) {
>  		if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
>  					1/* pages */, 1/* iswrite */, &page)) {
> +			ret = -EFAULT;
>  			for (j = 0; j < i; ++j)
> -				put_page(pfn_to_page(
> -						mem->hpas[j] >> PAGE_SHIFT));
> +				put_page(pfn_to_page(mem->hpas[j] >>
> +						PAGE_SHIFT));
>  			vfree(mem->hpas);
>  			kfree(mem);
> -			ret = -EFAULT;
>  			goto unlock_exit;
>  		}
> -
> +		/*
> +		 * If we get a page from the CMA zone, since we are going to
> +		 * be pinning these entries, we might as well move them out
> +		 * of the CMA zone if possible. NOTE: faulting in + migration
> +		 * can be expensive. Batching can be considered later
> +		 */
> +		if (get_pageblock_migratetype(page) == MIGRATE_CMA) {
> +			if (mm_iommu_move_page_from_cma(page))
> +				goto populate;
> +			if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
> +						1/* pages */, 1/* iswrite */,
> +						&page)) {
> +				ret = -EFAULT;
> +				for (j = 0; j < i; ++j)
> +					put_page(pfn_to_page(mem->hpas[j] >>
> +								PAGE_SHIFT));
> +				vfree(mem->hpas);
> +				kfree(mem);
> +				goto unlock_exit;
> +			}
> +		}
> +populate:
>  		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
>  	}
>  
> 


-- 
Alexey

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
@ 2016-08-31  4:14   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 27+ messages in thread
From: Alexey Kardashevskiy @ 2016-08-31  4:14 UTC (permalink / raw)
  To: bsingharora, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman

On 14/07/16 14:25, Balbir Singh wrote:
> 
> From: Balbir Singh <bsingharora@gmail.com>
> Subject: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
> 
> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
> pin pages using get_user_pages_fast(). One of the downsides of
> the pinning is that the page could be in CMA region. The CMA
> region is used for other allocations like the hash page table.
> Ideally we want the pinned pages to be from non CMA region.
> 
> This patch (currently only for KVM PPC with VFIO) forcefully
> migrates the pages out (huge pages are omitted for the moment).
> There are more efficient ways of doing this, but that might
> be elaborate and might impact a larger audience beyond just
> the kvm ppc implementation.
> 
> The magic is in new_iommu_non_cma_page() which allocates the
> new page from a non CMA region.
> 
> I've tested the patches lightly at my end, but there might be bugs
> For example if after lru_add_drain(), the page is not isolated
> is this a BUG?
> 
> Previous discussion was at
> http://permalink.gmane.org/gmane.linux.kernel.mm/136738
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Signed-off-by: Balbir Singh <bsingharora@gmail.com>



Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>



> ---
>  arch/powerpc/include/asm/mmu_context.h |  1 +
>  arch/powerpc/mm/mmu_context_iommu.c    | 80 ++++++++++++++++++++++++++++++++--
>  2 files changed, 77 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 9d2cd0c..475d1be 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>  struct mm_iommu_table_group_mem_t;
>  
> +extern int isolate_lru_page(struct page *page);	/* from internal.h */
>  extern bool mm_iommu_preregistered(void);
>  extern long mm_iommu_get(unsigned long ua, unsigned long entries,
>  		struct mm_iommu_table_group_mem_t **pmem);
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index da6a216..c18f742 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -15,6 +15,9 @@
>  #include <linux/rculist.h>
>  #include <linux/vmalloc.h>
>  #include <linux/mutex.h>
> +#include <linux/migrate.h>
> +#include <linux/hugetlb.h>
> +#include <linux/swap.h>
>  #include <asm/mmu_context.h>
>  
>  static DEFINE_MUTEX(mem_list_mutex);
> @@ -72,6 +75,54 @@ bool mm_iommu_preregistered(void)
>  }
>  EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
>  
> +/*
> + * Taken from alloc_migrate_target with changes to remove CMA allocations
> + */
> +struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
> +					int **resultp)
> +{
> +	gfp_t gfp_mask = GFP_USER;
> +	struct page *new_page;
> +
> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
> +		return NULL;
> +
> +	if (PageHighMem(page))
> +		gfp_mask |= __GFP_HIGHMEM;
> +
> +	/*
> +	 * We don't want the allocation to force an OOM if possibe
> +	 */
> +	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
> +	return new_page;
> +}
> +
> +static int mm_iommu_move_page_from_cma(struct page *page)
> +{
> +	int ret;
> +	LIST_HEAD(cma_migrate_pages);
> +
> +	/* Ignore huge pages for now */
> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
> +		return -EBUSY;
> +
> +	lru_add_drain();
> +	ret = isolate_lru_page(page);
> +	if (ret)
> +		get_page(page); /* Potential BUG? */
> +
> +	list_add(&page->lru, &cma_migrate_pages);
> +	put_page(page); /* Drop the gup reference */
> +
> +	ret = migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page,
> +				NULL, 0, MIGRATE_SYNC, MR_CMA);
> +	if (ret) {
> +		if (!list_empty(&cma_migrate_pages))
> +			putback_movable_pages(&cma_migrate_pages);
> +	}
> +	return 0;
> +}
> +
>  long mm_iommu_get(unsigned long ua, unsigned long entries,
>  		struct mm_iommu_table_group_mem_t **pmem)
>  {
> @@ -124,15 +175,36 @@ long mm_iommu_get(unsigned long ua, unsigned long entries,
>  	for (i = 0; i < entries; ++i) {
>  		if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
>  					1/* pages */, 1/* iswrite */, &page)) {
> +			ret = -EFAULT;
>  			for (j = 0; j < i; ++j)
> -				put_page(pfn_to_page(
> -						mem->hpas[j] >> PAGE_SHIFT));
> +				put_page(pfn_to_page(mem->hpas[j] >>
> +						PAGE_SHIFT));
>  			vfree(mem->hpas);
>  			kfree(mem);
> -			ret = -EFAULT;
>  			goto unlock_exit;
>  		}
> -
> +		/*
> +		 * If we get a page from the CMA zone, since we are going to
> +		 * be pinning these entries, we might as well move them out
> +		 * of the CMA zone if possible. NOTE: faulting in + migration
> +		 * can be expensive. Batching can be considered later
> +		 */
> +		if (get_pageblock_migratetype(page) == MIGRATE_CMA) {
> +			if (mm_iommu_move_page_from_cma(page))
> +				goto populate;
> +			if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
> +						1/* pages */, 1/* iswrite */,
> +						&page)) {
> +				ret = -EFAULT;
> +				for (j = 0; j < i; ++j)
> +					put_page(pfn_to_page(mem->hpas[j] >>
> +								PAGE_SHIFT));
> +				vfree(mem->hpas);
> +				kfree(mem);
> +				goto unlock_exit;
> +			}
> +		}
> +populate:
>  		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
>  	}
>  
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
@ 2016-08-31  4:14   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 27+ messages in thread
From: Alexey Kardashevskiy @ 2016-08-31  4:14 UTC (permalink / raw)
  To: bsingharora, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman

On 14/07/16 14:25, Balbir Singh wrote:
> 
> From: Balbir Singh <bsingharora@gmail.com>
> Subject: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
> 
> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
> pin pages using get_user_pages_fast(). One of the downsides of
> the pinning is that the page could be in CMA region. The CMA
> region is used for other allocations like the hash page table.
> Ideally we want the pinned pages to be from non CMA region.
> 
> This patch (currently only for KVM PPC with VFIO) forcefully
> migrates the pages out (huge pages are omitted for the moment).
> There are more efficient ways of doing this, but that might
> be elaborate and might impact a larger audience beyond just
> the kvm ppc implementation.
> 
> The magic is in new_iommu_non_cma_page() which allocates the
> new page from a non CMA region.
> 
> I've tested the patches lightly at my end, but there might be bugs
> For example if after lru_add_drain(), the page is not isolated
> is this a BUG?
> 
> Previous discussion was at
> http://permalink.gmane.org/gmane.linux.kernel.mm/136738
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Signed-off-by: Balbir Singh <bsingharora@gmail.com>



Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>



> ---
>  arch/powerpc/include/asm/mmu_context.h |  1 +
>  arch/powerpc/mm/mmu_context_iommu.c    | 80 ++++++++++++++++++++++++++++++++--
>  2 files changed, 77 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 9d2cd0c..475d1be 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>  struct mm_iommu_table_group_mem_t;
>  
> +extern int isolate_lru_page(struct page *page);	/* from internal.h */
>  extern bool mm_iommu_preregistered(void);
>  extern long mm_iommu_get(unsigned long ua, unsigned long entries,
>  		struct mm_iommu_table_group_mem_t **pmem);
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index da6a216..c18f742 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -15,6 +15,9 @@
>  #include <linux/rculist.h>
>  #include <linux/vmalloc.h>
>  #include <linux/mutex.h>
> +#include <linux/migrate.h>
> +#include <linux/hugetlb.h>
> +#include <linux/swap.h>
>  #include <asm/mmu_context.h>
>  
>  static DEFINE_MUTEX(mem_list_mutex);
> @@ -72,6 +75,54 @@ bool mm_iommu_preregistered(void)
>  }
>  EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
>  
> +/*
> + * Taken from alloc_migrate_target with changes to remove CMA allocations
> + */
> +struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
> +					int **resultp)
> +{
> +	gfp_t gfp_mask = GFP_USER;
> +	struct page *new_page;
> +
> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
> +		return NULL;
> +
> +	if (PageHighMem(page))
> +		gfp_mask |= __GFP_HIGHMEM;
> +
> +	/*
> +	 * We don't want the allocation to force an OOM if possibe
> +	 */
> +	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
> +	return new_page;
> +}
> +
> +static int mm_iommu_move_page_from_cma(struct page *page)
> +{
> +	int ret;
> +	LIST_HEAD(cma_migrate_pages);
> +
> +	/* Ignore huge pages for now */
> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
> +		return -EBUSY;
> +
> +	lru_add_drain();
> +	ret = isolate_lru_page(page);
> +	if (ret)
> +		get_page(page); /* Potential BUG? */
> +
> +	list_add(&page->lru, &cma_migrate_pages);
> +	put_page(page); /* Drop the gup reference */
> +
> +	ret = migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page,
> +				NULL, 0, MIGRATE_SYNC, MR_CMA);
> +	if (ret) {
> +		if (!list_empty(&cma_migrate_pages))
> +			putback_movable_pages(&cma_migrate_pages);
> +	}
> +	return 0;
> +}
> +
>  long mm_iommu_get(unsigned long ua, unsigned long entries,
>  		struct mm_iommu_table_group_mem_t **pmem)
>  {
> @@ -124,15 +175,36 @@ long mm_iommu_get(unsigned long ua, unsigned long entries,
>  	for (i = 0; i < entries; ++i) {
>  		if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
>  					1/* pages */, 1/* iswrite */, &page)) {
> +			ret = -EFAULT;
>  			for (j = 0; j < i; ++j)
> -				put_page(pfn_to_page(
> -						mem->hpas[j] >> PAGE_SHIFT));
> +				put_page(pfn_to_page(mem->hpas[j] >>
> +						PAGE_SHIFT));
>  			vfree(mem->hpas);
>  			kfree(mem);
> -			ret = -EFAULT;
>  			goto unlock_exit;
>  		}
> -
> +		/*
> +		 * If we get a page from the CMA zone, since we are going to
> +		 * be pinning these entries, we might as well move them out
> +		 * of the CMA zone if possible. NOTE: faulting in + migration
> +		 * can be expensive. Batching can be considered later
> +		 */
> +		if (get_pageblock_migratetype(page) = MIGRATE_CMA) {
> +			if (mm_iommu_move_page_from_cma(page))
> +				goto populate;
> +			if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
> +						1/* pages */, 1/* iswrite */,
> +						&page)) {
> +				ret = -EFAULT;
> +				for (j = 0; j < i; ++j)
> +					put_page(pfn_to_page(mem->hpas[j] >>
> +								PAGE_SHIFT));
> +				vfree(mem->hpas);
> +				kfree(mem);
> +				goto unlock_exit;
> +			}
> +		}
> +populate:
>  		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
>  	}
>  
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
  2016-08-31  4:14   ` Alexey Kardashevskiy
  (?)
@ 2016-09-06  1:55     ` Balbir Singh
  -1 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2016-09-06  1:55 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman



On 31/08/16 14:14, Alexey Kardashevskiy wrote:
> On 14/07/16 14:25, Balbir Singh wrote:
>>
>> From: Balbir Singh <bsingharora@gmail.com>
>> Subject: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
>>
>> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
>> pin pages using get_user_pages_fast(). One of the downsides of
>> the pinning is that the page could be in CMA region. The CMA
>> region is used for other allocations like the hash page table.
>> Ideally we want the pinned pages to be from non CMA region.
>>
>> This patch (currently only for KVM PPC with VFIO) forcefully
>> migrates the pages out (huge pages are omitted for the moment).
>> There are more efficient ways of doing this, but that might
>> be elaborate and might impact a larger audience beyond just
>> the kvm ppc implementation.
>>
>> The magic is in new_iommu_non_cma_page() which allocates the
>> new page from a non CMA region.
>>
>> I've tested the patches lightly at my end, but there might be bugs
>> For example if after lru_add_drain(), the page is not isolated
>> is this a BUG?
>>
>> Previous discussion was at
>> http://permalink.gmane.org/gmane.linux.kernel.mm/136738
>>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Paul Mackerras <paulus@ozlabs.org>
>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>
>> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> 
> 
> 
> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 

Thanks! I tested this patch against latest mainline and here are the test results


System RAM - 64GB

VM instance 1 - size 55GB

Before patch - nr_free_cma after launch 8900
After patch - nr_free_cma after launch 39500

Balbir Singh.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
@ 2016-09-06  1:55     ` Balbir Singh
  0 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2016-09-06  1:55 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman



On 31/08/16 14:14, Alexey Kardashevskiy wrote:
> On 14/07/16 14:25, Balbir Singh wrote:
>>
>> From: Balbir Singh <bsingharora@gmail.com>
>> Subject: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
>>
>> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
>> pin pages using get_user_pages_fast(). One of the downsides of
>> the pinning is that the page could be in CMA region. The CMA
>> region is used for other allocations like the hash page table.
>> Ideally we want the pinned pages to be from non CMA region.
>>
>> This patch (currently only for KVM PPC with VFIO) forcefully
>> migrates the pages out (huge pages are omitted for the moment).
>> There are more efficient ways of doing this, but that might
>> be elaborate and might impact a larger audience beyond just
>> the kvm ppc implementation.
>>
>> The magic is in new_iommu_non_cma_page() which allocates the
>> new page from a non CMA region.
>>
>> I've tested the patches lightly at my end, but there might be bugs
>> For example if after lru_add_drain(), the page is not isolated
>> is this a BUG?
>>
>> Previous discussion was at
>> http://permalink.gmane.org/gmane.linux.kernel.mm/136738
>>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Paul Mackerras <paulus@ozlabs.org>
>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>
>> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> 
> 
> 
> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 

Thanks! I tested this patch against latest mainline and here are the test results


System RAM - 64GB

VM instance 1 - size 55GB

Before patch - nr_free_cma after launch 8900
After patch - nr_free_cma after launch 39500

Balbir Singh.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
@ 2016-09-06  1:55     ` Balbir Singh
  0 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2016-09-06  1:55 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman



On 31/08/16 14:14, Alexey Kardashevskiy wrote:
> On 14/07/16 14:25, Balbir Singh wrote:
>>
>> From: Balbir Singh <bsingharora@gmail.com>
>> Subject: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
>>
>> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
>> pin pages using get_user_pages_fast(). One of the downsides of
>> the pinning is that the page could be in CMA region. The CMA
>> region is used for other allocations like the hash page table.
>> Ideally we want the pinned pages to be from non CMA region.
>>
>> This patch (currently only for KVM PPC with VFIO) forcefully
>> migrates the pages out (huge pages are omitted for the moment).
>> There are more efficient ways of doing this, but that might
>> be elaborate and might impact a larger audience beyond just
>> the kvm ppc implementation.
>>
>> The magic is in new_iommu_non_cma_page() which allocates the
>> new page from a non CMA region.
>>
>> I've tested the patches lightly at my end, but there might be bugs
>> For example if after lru_add_drain(), the page is not isolated
>> is this a BUG?
>>
>> Previous discussion was at
>> http://permalink.gmane.org/gmane.linux.kernel.mm/136738
>>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Paul Mackerras <paulus@ozlabs.org>
>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>
>> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> 
> 
> 
> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 

Thanks! I tested this patch against latest mainline and here are the test results


System RAM - 64GB

VM instance 1 - size 55GB

Before patch - nr_free_cma after launch 8900
After patch - nr_free_cma after launch 39500

Balbir Singh.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
  2016-07-14  4:25 ` Balbir Singh
  (?)
@ 2016-09-06  5:49   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 27+ messages in thread
From: Aneesh Kumar K.V @ 2016-09-06  5:49 UTC (permalink / raw)
  To: bsingharora, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Benjamin Herrenschmidt, Alexey Kardashevskiy,
	Paul Mackerras, Michael Ellerman

Balbir Singh <bsingharora@gmail.com> writes:

> From: Balbir Singh <bsingharora@gmail.com>
> Subject: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
>
> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
> pin pages using get_user_pages_fast(). One of the downsides of
> the pinning is that the page could be in CMA region. The CMA
> region is used for other allocations like the hash page table.
> Ideally we want the pinned pages to be from non CMA region.
>
> This patch (currently only for KVM PPC with VFIO) forcefully
> migrates the pages out (huge pages are omitted for the moment).
> There are more efficient ways of doing this, but that might
> be elaborate and might impact a larger audience beyond just
> the kvm ppc implementation.
>
> The magic is in new_iommu_non_cma_page() which allocates the
> new page from a non CMA region.
>
> I've tested the patches lightly at my end, but there might be bugs
> For example if after lru_add_drain(), the page is not isolated
> is this a BUG?
>
> Previous discussion was at
> http://permalink.gmane.org/gmane.linux.kernel.mm/136738
>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>
> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> ---
>  arch/powerpc/include/asm/mmu_context.h |  1 +
>  arch/powerpc/mm/mmu_context_iommu.c    | 80 ++++++++++++++++++++++++++++++++--
>  2 files changed, 77 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 9d2cd0c..475d1be 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>  struct mm_iommu_table_group_mem_t;
>  
> +extern int isolate_lru_page(struct page *page);	/* from internal.h */
>  extern bool mm_iommu_preregistered(void);
>  extern long mm_iommu_get(unsigned long ua, unsigned long entries,
>  		struct mm_iommu_table_group_mem_t **pmem);
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index da6a216..c18f742 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -15,6 +15,9 @@
>  #include <linux/rculist.h>
>  #include <linux/vmalloc.h>
>  #include <linux/mutex.h>
> +#include <linux/migrate.h>
> +#include <linux/hugetlb.h>
> +#include <linux/swap.h>
>  #include <asm/mmu_context.h>
>  
>  static DEFINE_MUTEX(mem_list_mutex);
> @@ -72,6 +75,54 @@ bool mm_iommu_preregistered(void)
>  }
>  EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
>  
> +/*
> + * Taken from alloc_migrate_target with changes to remove CMA allocations
> + */
> +struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
> +					int **resultp)
> +{
> +	gfp_t gfp_mask = GFP_USER;
> +	struct page *new_page;
> +
> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
> +		return NULL;

Doesn't a PageCompound check cover all ?


> +
> +	if (PageHighMem(page))
> +		gfp_mask |= __GFP_HIGHMEM;
> +
> +	/*
> +	 * We don't want the allocation to force an OOM if possibe
> +	 */
> +	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
> +	return new_page;
> +}
> +
> +static int mm_iommu_move_page_from_cma(struct page *page)
> +{
> +	int ret;
> +	LIST_HEAD(cma_migrate_pages);
> +
> +	/* Ignore huge pages for now */
> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
> +		return -EBUSY;
> +
> +	lru_add_drain();

I guess I asked this last time. Shouldn't this be lru_add_drain_all() ?
What if the page is in other cpu's pagevec ?


> +	ret = isolate_lru_page(page);
> +	if (ret)
> +		get_page(page); /* Potential BUG? */
> +
> +	list_add(&page->lru, &cma_migrate_pages);

Is that correct ? if we failed the isolate_lru_page(), can we be sure we
are not on lru at all ? ie, what if the page was on other cpu pagevec ?


> +	put_page(page); /* Drop the gup reference */
> +

Where is get user page (gup) here ? . I guess you mean drop the
reference taken above ?


> +	ret = migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page,
> +				NULL, 0, MIGRATE_SYNC, MR_CMA);
> +	if (ret) {
> +		if (!list_empty(&cma_migrate_pages))
> +			putback_movable_pages(&cma_migrate_pages);
> +	}
> +	return 0;
> +}
> +

I guess the plan was to not do it one page at a time and switch this to list
of pages which we need to migrate. Any reason why that is not tried ?

>  long mm_iommu_get(unsigned long ua, unsigned long entries,
>  		struct mm_iommu_table_group_mem_t **pmem)
>  {
> @@ -124,15 +175,36 @@ long mm_iommu_get(unsigned long ua, unsigned long entries,
>  	for (i = 0; i < entries; ++i) {
>  		if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
>  					1/* pages */, 1/* iswrite */, &page)) {
> +			ret = -EFAULT;
>  			for (j = 0; j < i; ++j)
> -				put_page(pfn_to_page(
> -						mem->hpas[j] >> PAGE_SHIFT));
> +				put_page(pfn_to_page(mem->hpas[j] >>
> +						PAGE_SHIFT));
>  			vfree(mem->hpas);
>  			kfree(mem);
> -			ret = -EFAULT;
>  			goto unlock_exit;
>  		}
> -
> +		/*
> +		 * If we get a page from the CMA zone, since we are going to
> +		 * be pinning these entries, we might as well move them out
> +		 * of the CMA zone if possible. NOTE: faulting in + migration
> +		 * can be expensive. Batching can be considered later
> +		 */
> +		if (get_pageblock_migratetype(page) == MIGRATE_CMA) {
> +			if (mm_iommu_move_page_from_cma(page))
> +				goto populate;
> +			if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
> +						1/* pages */, 1/* iswrite */,
> +						&page)) {
> +				ret = -EFAULT;
> +				for (j = 0; j < i; ++j)
> +					put_page(pfn_to_page(mem->hpas[j] >>
> +								PAGE_SHIFT));
> +				vfree(mem->hpas);
> +				kfree(mem);
> +				goto unlock_exit;
> +			}
> +		}
> +populate:
>  		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
>  	}
>  
> -- 
> 2.5.5
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-aneesh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
@ 2016-09-06  5:49   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 27+ messages in thread
From: Aneesh Kumar K.V @ 2016-09-06  5:49 UTC (permalink / raw)
  To: bsingharora, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Benjamin Herrenschmidt, Alexey Kardashevskiy,
	Paul Mackerras, Michael Ellerman

Balbir Singh <bsingharora@gmail.com> writes:

> From: Balbir Singh <bsingharora@gmail.com>
> Subject: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
>
> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
> pin pages using get_user_pages_fast(). One of the downsides of
> the pinning is that the page could be in CMA region. The CMA
> region is used for other allocations like the hash page table.
> Ideally we want the pinned pages to be from non CMA region.
>
> This patch (currently only for KVM PPC with VFIO) forcefully
> migrates the pages out (huge pages are omitted for the moment).
> There are more efficient ways of doing this, but that might
> be elaborate and might impact a larger audience beyond just
> the kvm ppc implementation.
>
> The magic is in new_iommu_non_cma_page() which allocates the
> new page from a non CMA region.
>
> I've tested the patches lightly at my end, but there might be bugs
> For example if after lru_add_drain(), the page is not isolated
> is this a BUG?
>
> Previous discussion was at
> http://permalink.gmane.org/gmane.linux.kernel.mm/136738
>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>
> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> ---
>  arch/powerpc/include/asm/mmu_context.h |  1 +
>  arch/powerpc/mm/mmu_context_iommu.c    | 80 ++++++++++++++++++++++++++++++++--
>  2 files changed, 77 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 9d2cd0c..475d1be 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>  struct mm_iommu_table_group_mem_t;
>  
> +extern int isolate_lru_page(struct page *page);	/* from internal.h */
>  extern bool mm_iommu_preregistered(void);
>  extern long mm_iommu_get(unsigned long ua, unsigned long entries,
>  		struct mm_iommu_table_group_mem_t **pmem);
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index da6a216..c18f742 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -15,6 +15,9 @@
>  #include <linux/rculist.h>
>  #include <linux/vmalloc.h>
>  #include <linux/mutex.h>
> +#include <linux/migrate.h>
> +#include <linux/hugetlb.h>
> +#include <linux/swap.h>
>  #include <asm/mmu_context.h>
>  
>  static DEFINE_MUTEX(mem_list_mutex);
> @@ -72,6 +75,54 @@ bool mm_iommu_preregistered(void)
>  }
>  EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
>  
> +/*
> + * Taken from alloc_migrate_target with changes to remove CMA allocations
> + */
> +struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
> +					int **resultp)
> +{
> +	gfp_t gfp_mask = GFP_USER;
> +	struct page *new_page;
> +
> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
> +		return NULL;

Doesn't a PageCompound check cover all ?


> +
> +	if (PageHighMem(page))
> +		gfp_mask |= __GFP_HIGHMEM;
> +
> +	/*
> +	 * We don't want the allocation to force an OOM if possibe
> +	 */
> +	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
> +	return new_page;
> +}
> +
> +static int mm_iommu_move_page_from_cma(struct page *page)
> +{
> +	int ret;
> +	LIST_HEAD(cma_migrate_pages);
> +
> +	/* Ignore huge pages for now */
> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
> +		return -EBUSY;
> +
> +	lru_add_drain();

I guess I asked this last time. Shouldn't this be lru_add_drain_all() ?
What if the page is in other cpu's pagevec ?


> +	ret = isolate_lru_page(page);
> +	if (ret)
> +		get_page(page); /* Potential BUG? */
> +
> +	list_add(&page->lru, &cma_migrate_pages);

Is that correct ? if we failed the isolate_lru_page(), can we be sure we
are not on lru at all ? ie, what if the page was on other cpu pagevec ?


> +	put_page(page); /* Drop the gup reference */
> +

Where is get user page (gup) here ? . I guess you mean drop the
reference taken above ?


> +	ret = migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page,
> +				NULL, 0, MIGRATE_SYNC, MR_CMA);
> +	if (ret) {
> +		if (!list_empty(&cma_migrate_pages))
> +			putback_movable_pages(&cma_migrate_pages);
> +	}
> +	return 0;
> +}
> +

I guess the plan was to not do it one page at a time and switch this to list
of pages which we need to migrate. Any reason why that is not tried ?

>  long mm_iommu_get(unsigned long ua, unsigned long entries,
>  		struct mm_iommu_table_group_mem_t **pmem)
>  {
> @@ -124,15 +175,36 @@ long mm_iommu_get(unsigned long ua, unsigned long entries,
>  	for (i = 0; i < entries; ++i) {
>  		if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
>  					1/* pages */, 1/* iswrite */, &page)) {
> +			ret = -EFAULT;
>  			for (j = 0; j < i; ++j)
> -				put_page(pfn_to_page(
> -						mem->hpas[j] >> PAGE_SHIFT));
> +				put_page(pfn_to_page(mem->hpas[j] >>
> +						PAGE_SHIFT));
>  			vfree(mem->hpas);
>  			kfree(mem);
> -			ret = -EFAULT;
>  			goto unlock_exit;
>  		}
> -
> +		/*
> +		 * If we get a page from the CMA zone, since we are going to
> +		 * be pinning these entries, we might as well move them out
> +		 * of the CMA zone if possible. NOTE: faulting in + migration
> +		 * can be expensive. Batching can be considered later
> +		 */
> +		if (get_pageblock_migratetype(page) == MIGRATE_CMA) {
> +			if (mm_iommu_move_page_from_cma(page))
> +				goto populate;
> +			if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
> +						1/* pages */, 1/* iswrite */,
> +						&page)) {
> +				ret = -EFAULT;
> +				for (j = 0; j < i; ++j)
> +					put_page(pfn_to_page(mem->hpas[j] >>
> +								PAGE_SHIFT));
> +				vfree(mem->hpas);
> +				kfree(mem);
> +				goto unlock_exit;
> +			}
> +		}
> +populate:
>  		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
>  	}
>  
> -- 
> 2.5.5
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-aneesh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
@ 2016-09-06  5:49   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 27+ messages in thread
From: Aneesh Kumar K.V @ 2016-09-06  5:49 UTC (permalink / raw)
  To: bsingharora, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Benjamin Herrenschmidt, Alexey Kardashevskiy,
	Paul Mackerras, Michael Ellerman

Balbir Singh <bsingharora@gmail.com> writes:

> From: Balbir Singh <bsingharora@gmail.com>
> Subject: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
>
> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
> pin pages using get_user_pages_fast(). One of the downsides of
> the pinning is that the page could be in CMA region. The CMA
> region is used for other allocations like the hash page table.
> Ideally we want the pinned pages to be from non CMA region.
>
> This patch (currently only for KVM PPC with VFIO) forcefully
> migrates the pages out (huge pages are omitted for the moment).
> There are more efficient ways of doing this, but that might
> be elaborate and might impact a larger audience beyond just
> the kvm ppc implementation.
>
> The magic is in new_iommu_non_cma_page() which allocates the
> new page from a non CMA region.
>
> I've tested the patches lightly at my end, but there might be bugs
> For example if after lru_add_drain(), the page is not isolated
> is this a BUG?
>
> Previous discussion was at
> http://permalink.gmane.org/gmane.linux.kernel.mm/136738
>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>
> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> ---
>  arch/powerpc/include/asm/mmu_context.h |  1 +
>  arch/powerpc/mm/mmu_context_iommu.c    | 80 ++++++++++++++++++++++++++++++++--
>  2 files changed, 77 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 9d2cd0c..475d1be 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>  struct mm_iommu_table_group_mem_t;
>  
> +extern int isolate_lru_page(struct page *page);	/* from internal.h */
>  extern bool mm_iommu_preregistered(void);
>  extern long mm_iommu_get(unsigned long ua, unsigned long entries,
>  		struct mm_iommu_table_group_mem_t **pmem);
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index da6a216..c18f742 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -15,6 +15,9 @@
>  #include <linux/rculist.h>
>  #include <linux/vmalloc.h>
>  #include <linux/mutex.h>
> +#include <linux/migrate.h>
> +#include <linux/hugetlb.h>
> +#include <linux/swap.h>
>  #include <asm/mmu_context.h>
>  
>  static DEFINE_MUTEX(mem_list_mutex);
> @@ -72,6 +75,54 @@ bool mm_iommu_preregistered(void)
>  }
>  EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
>  
> +/*
> + * Taken from alloc_migrate_target with changes to remove CMA allocations
> + */
> +struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
> +					int **resultp)
> +{
> +	gfp_t gfp_mask = GFP_USER;
> +	struct page *new_page;
> +
> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
> +		return NULL;

Doesn't a PageCompound check cover all ?


> +
> +	if (PageHighMem(page))
> +		gfp_mask |= __GFP_HIGHMEM;
> +
> +	/*
> +	 * We don't want the allocation to force an OOM if possibe
> +	 */
> +	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
> +	return new_page;
> +}
> +
> +static int mm_iommu_move_page_from_cma(struct page *page)
> +{
> +	int ret;
> +	LIST_HEAD(cma_migrate_pages);
> +
> +	/* Ignore huge pages for now */
> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
> +		return -EBUSY;
> +
> +	lru_add_drain();

I guess I asked this last time. Shouldn't this be lru_add_drain_all() ?
What if the page is in other cpu's pagevec ?


> +	ret = isolate_lru_page(page);
> +	if (ret)
> +		get_page(page); /* Potential BUG? */
> +
> +	list_add(&page->lru, &cma_migrate_pages);

Is that correct ? if we failed the isolate_lru_page(), can we be sure we
are not on lru at all ? ie, what if the page was on other cpu pagevec ?


> +	put_page(page); /* Drop the gup reference */
> +

Where is get user page (gup) here ? . I guess you mean drop the
reference taken above ?


> +	ret = migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page,
> +				NULL, 0, MIGRATE_SYNC, MR_CMA);
> +	if (ret) {
> +		if (!list_empty(&cma_migrate_pages))
> +			putback_movable_pages(&cma_migrate_pages);
> +	}
> +	return 0;
> +}
> +

I guess the plan was to not do it one page at a time and switch this to list
of pages which we need to migrate. Any reason why that is not tried ?

>  long mm_iommu_get(unsigned long ua, unsigned long entries,
>  		struct mm_iommu_table_group_mem_t **pmem)
>  {
> @@ -124,15 +175,36 @@ long mm_iommu_get(unsigned long ua, unsigned long entries,
>  	for (i = 0; i < entries; ++i) {
>  		if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
>  					1/* pages */, 1/* iswrite */, &page)) {
> +			ret = -EFAULT;
>  			for (j = 0; j < i; ++j)
> -				put_page(pfn_to_page(
> -						mem->hpas[j] >> PAGE_SHIFT));
> +				put_page(pfn_to_page(mem->hpas[j] >>
> +						PAGE_SHIFT));
>  			vfree(mem->hpas);
>  			kfree(mem);
> -			ret = -EFAULT;
>  			goto unlock_exit;
>  		}
> -
> +		/*
> +		 * If we get a page from the CMA zone, since we are going to
> +		 * be pinning these entries, we might as well move them out
> +		 * of the CMA zone if possible. NOTE: faulting in + migration
> +		 * can be expensive. Batching can be considered later
> +		 */
> +		if (get_pageblock_migratetype(page) = MIGRATE_CMA) {
> +			if (mm_iommu_move_page_from_cma(page))
> +				goto populate;
> +			if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
> +						1/* pages */, 1/* iswrite */,
> +						&page)) {
> +				ret = -EFAULT;
> +				for (j = 0; j < i; ++j)
> +					put_page(pfn_to_page(mem->hpas[j] >>
> +								PAGE_SHIFT));
> +				vfree(mem->hpas);
> +				kfree(mem);
> +				goto unlock_exit;
> +			}
> +		}
> +populate:
>  		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
>  	}
>  
> -- 
> 2.5.5
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-aneesh


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
  2016-09-06  1:55     ` Balbir Singh
  (?)
@ 2016-09-06  6:27       ` Balbir Singh
  -1 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2016-09-06  6:27 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman


When PCI Device pass-through is enabled via VFIO, KVM-PPC will
pin pages using get_user_pages_fast(). One of the downsides of
the pinning is that the page could be in CMA region. The CMA
region is used for other allocations like the hash page table.
Ideally we want the pinned pages to be from non CMA region.

This patch (currently only for KVM PPC with VFIO) forcefully
migrates the pages out (huge pages are omitted for the moment).
There are more efficient ways of doing this, but that might
be elaborate and might impact a larger audience beyond just
the kvm ppc implementation.

The magic is in new_iommu_non_cma_page() which allocates the
new page from a non CMA region.

I've tested the patches lightly at my end. The full solution
requires migration of THP pages in the CMA region. That work
will be done incrementally on top of this.

Previous discussion was at
http://permalink.gmane.org/gmane.linux.kernel.mm/136738

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/mmu_context.h |  1 +
 arch/powerpc/mm/mmu_context_iommu.c    | 81 ++++++++++++++++++++++++++++++++--
 2 files changed, 78 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 9d2cd0c..475d1be 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 struct mm_iommu_table_group_mem_t;
 
+extern int isolate_lru_page(struct page *page);	/* from internal.h */
 extern bool mm_iommu_preregistered(void);
 extern long mm_iommu_get(unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem);
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index da6a216..e0f1c33 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -15,6 +15,9 @@
 #include <linux/rculist.h>
 #include <linux/vmalloc.h>
 #include <linux/mutex.h>
+#include <linux/migrate.h>
+#include <linux/hugetlb.h>
+#include <linux/swap.h>
 #include <asm/mmu_context.h>
 
 static DEFINE_MUTEX(mem_list_mutex);
@@ -72,6 +75,55 @@ bool mm_iommu_preregistered(void)
 }
 EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
 
+/*
+ * Taken from alloc_migrate_target with changes to remove CMA allocations
+ */
+struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
+					int **resultp)
+{
+	gfp_t gfp_mask = GFP_USER;
+	struct page *new_page;
+
+	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
+		return NULL;
+
+	if (PageHighMem(page))
+		gfp_mask |= __GFP_HIGHMEM;
+
+	/*
+	 * We don't want the allocation to force an OOM if possibe
+	 */
+	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
+	return new_page;
+}
+
+static int mm_iommu_move_page_from_cma(struct page *page)
+{
+	int ret = 0;
+	LIST_HEAD(cma_migrate_pages);
+
+	/* Ignore huge pages for now */
+	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
+		return -EBUSY;
+
+	lru_add_drain();
+	ret = isolate_lru_page(page);
+	if (ret)
+		return ret;
+
+	list_add(&page->lru, &cma_migrate_pages);
+	put_page(page); /* Drop the gup reference */
+
+	ret = migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page,
+				NULL, 0, MIGRATE_SYNC, MR_CMA);
+	if (ret) {
+		if (!list_empty(&cma_migrate_pages))
+			putback_movable_pages(&cma_migrate_pages);
+	}
+
+	return 0;
+}
+
 long mm_iommu_get(unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem)
 {
@@ -124,15 +176,36 @@ long mm_iommu_get(unsigned long ua, unsigned long entries,
 	for (i = 0; i < entries; ++i) {
 		if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
 					1/* pages */, 1/* iswrite */, &page)) {
+			ret = -EFAULT;
 			for (j = 0; j < i; ++j)
-				put_page(pfn_to_page(
-						mem->hpas[j] >> PAGE_SHIFT));
+				put_page(pfn_to_page(mem->hpas[j] >>
+						PAGE_SHIFT));
 			vfree(mem->hpas);
 			kfree(mem);
-			ret = -EFAULT;
 			goto unlock_exit;
 		}
-
+		/*
+		 * If we get a page from the CMA zone, since we are going to
+		 * be pinning these entries, we might as well move them out
+		 * of the CMA zone if possible. NOTE: faulting in + migration
+		 * can be expensive. Batching can be considered later
+		 */
+		if (get_pageblock_migratetype(page) == MIGRATE_CMA) {
+			if (mm_iommu_move_page_from_cma(page))
+				goto populate;
+			if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
+						1/* pages */, 1/* iswrite */,
+						&page)) {
+				ret = -EFAULT;
+				for (j = 0; j < i; ++j)
+					put_page(pfn_to_page(mem->hpas[j] >>
+								PAGE_SHIFT));
+				vfree(mem->hpas);
+				kfree(mem);
+				goto unlock_exit;
+			}
+		}
+populate:
 		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
 	}
 
-- 
2.5.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
@ 2016-09-06  6:27       ` Balbir Singh
  0 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2016-09-06  6:27 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman


When PCI Device pass-through is enabled via VFIO, KVM-PPC will
pin pages using get_user_pages_fast(). One of the downsides of
the pinning is that the page could be in CMA region. The CMA
region is used for other allocations like the hash page table.
Ideally we want the pinned pages to be from non CMA region.

This patch (currently only for KVM PPC with VFIO) forcefully
migrates the pages out (huge pages are omitted for the moment).
There are more efficient ways of doing this, but that might
be elaborate and might impact a larger audience beyond just
the kvm ppc implementation.

The magic is in new_iommu_non_cma_page() which allocates the
new page from a non CMA region.

I've tested the patches lightly at my end. The full solution
requires migration of THP pages in the CMA region. That work
will be done incrementally on top of this.

Previous discussion was at
http://permalink.gmane.org/gmane.linux.kernel.mm/136738

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/mmu_context.h |  1 +
 arch/powerpc/mm/mmu_context_iommu.c    | 81 ++++++++++++++++++++++++++++++++--
 2 files changed, 78 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 9d2cd0c..475d1be 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 struct mm_iommu_table_group_mem_t;
 
+extern int isolate_lru_page(struct page *page);	/* from internal.h */
 extern bool mm_iommu_preregistered(void);
 extern long mm_iommu_get(unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem);
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index da6a216..e0f1c33 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -15,6 +15,9 @@
 #include <linux/rculist.h>
 #include <linux/vmalloc.h>
 #include <linux/mutex.h>
+#include <linux/migrate.h>
+#include <linux/hugetlb.h>
+#include <linux/swap.h>
 #include <asm/mmu_context.h>
 
 static DEFINE_MUTEX(mem_list_mutex);
@@ -72,6 +75,55 @@ bool mm_iommu_preregistered(void)
 }
 EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
 
+/*
+ * Taken from alloc_migrate_target with changes to remove CMA allocations
+ */
+struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
+					int **resultp)
+{
+	gfp_t gfp_mask = GFP_USER;
+	struct page *new_page;
+
+	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
+		return NULL;
+
+	if (PageHighMem(page))
+		gfp_mask |= __GFP_HIGHMEM;
+
+	/*
+	 * We don't want the allocation to force an OOM if possibe
+	 */
+	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
+	return new_page;
+}
+
+static int mm_iommu_move_page_from_cma(struct page *page)
+{
+	int ret = 0;
+	LIST_HEAD(cma_migrate_pages);
+
+	/* Ignore huge pages for now */
+	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
+		return -EBUSY;
+
+	lru_add_drain();
+	ret = isolate_lru_page(page);
+	if (ret)
+		return ret;
+
+	list_add(&page->lru, &cma_migrate_pages);
+	put_page(page); /* Drop the gup reference */
+
+	ret = migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page,
+				NULL, 0, MIGRATE_SYNC, MR_CMA);
+	if (ret) {
+		if (!list_empty(&cma_migrate_pages))
+			putback_movable_pages(&cma_migrate_pages);
+	}
+
+	return 0;
+}
+
 long mm_iommu_get(unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem)
 {
@@ -124,15 +176,36 @@ long mm_iommu_get(unsigned long ua, unsigned long entries,
 	for (i = 0; i < entries; ++i) {
 		if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
 					1/* pages */, 1/* iswrite */, &page)) {
+			ret = -EFAULT;
 			for (j = 0; j < i; ++j)
-				put_page(pfn_to_page(
-						mem->hpas[j] >> PAGE_SHIFT));
+				put_page(pfn_to_page(mem->hpas[j] >>
+						PAGE_SHIFT));
 			vfree(mem->hpas);
 			kfree(mem);
-			ret = -EFAULT;
 			goto unlock_exit;
 		}
-
+		/*
+		 * If we get a page from the CMA zone, since we are going to
+		 * be pinning these entries, we might as well move them out
+		 * of the CMA zone if possible. NOTE: faulting in + migration
+		 * can be expensive. Batching can be considered later
+		 */
+		if (get_pageblock_migratetype(page) == MIGRATE_CMA) {
+			if (mm_iommu_move_page_from_cma(page))
+				goto populate;
+			if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
+						1/* pages */, 1/* iswrite */,
+						&page)) {
+				ret = -EFAULT;
+				for (j = 0; j < i; ++j)
+					put_page(pfn_to_page(mem->hpas[j] >>
+								PAGE_SHIFT));
+				vfree(mem->hpas);
+				kfree(mem);
+				goto unlock_exit;
+			}
+		}
+populate:
 		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
 	}
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
@ 2016-09-06  6:27       ` Balbir Singh
  0 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2016-09-06  6:27 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman


When PCI Device pass-through is enabled via VFIO, KVM-PPC will
pin pages using get_user_pages_fast(). One of the downsides of
the pinning is that the page could be in CMA region. The CMA
region is used for other allocations like the hash page table.
Ideally we want the pinned pages to be from non CMA region.

This patch (currently only for KVM PPC with VFIO) forcefully
migrates the pages out (huge pages are omitted for the moment).
There are more efficient ways of doing this, but that might
be elaborate and might impact a larger audience beyond just
the kvm ppc implementation.

The magic is in new_iommu_non_cma_page() which allocates the
new page from a non CMA region.

I've tested the patches lightly at my end. The full solution
requires migration of THP pages in the CMA region. That work
will be done incrementally on top of this.

Previous discussion was at
http://permalink.gmane.org/gmane.linux.kernel.mm/136738

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/mmu_context.h |  1 +
 arch/powerpc/mm/mmu_context_iommu.c    | 81 ++++++++++++++++++++++++++++++++--
 2 files changed, 78 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 9d2cd0c..475d1be 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 struct mm_iommu_table_group_mem_t;
 
+extern int isolate_lru_page(struct page *page);	/* from internal.h */
 extern bool mm_iommu_preregistered(void);
 extern long mm_iommu_get(unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem);
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index da6a216..e0f1c33 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -15,6 +15,9 @@
 #include <linux/rculist.h>
 #include <linux/vmalloc.h>
 #include <linux/mutex.h>
+#include <linux/migrate.h>
+#include <linux/hugetlb.h>
+#include <linux/swap.h>
 #include <asm/mmu_context.h>
 
 static DEFINE_MUTEX(mem_list_mutex);
@@ -72,6 +75,55 @@ bool mm_iommu_preregistered(void)
 }
 EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
 
+/*
+ * Taken from alloc_migrate_target with changes to remove CMA allocations
+ */
+struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
+					int **resultp)
+{
+	gfp_t gfp_mask = GFP_USER;
+	struct page *new_page;
+
+	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
+		return NULL;
+
+	if (PageHighMem(page))
+		gfp_mask |= __GFP_HIGHMEM;
+
+	/*
+	 * We don't want the allocation to force an OOM if possibe
+	 */
+	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
+	return new_page;
+}
+
+static int mm_iommu_move_page_from_cma(struct page *page)
+{
+	int ret = 0;
+	LIST_HEAD(cma_migrate_pages);
+
+	/* Ignore huge pages for now */
+	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
+		return -EBUSY;
+
+	lru_add_drain();
+	ret = isolate_lru_page(page);
+	if (ret)
+		return ret;
+
+	list_add(&page->lru, &cma_migrate_pages);
+	put_page(page); /* Drop the gup reference */
+
+	ret = migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page,
+				NULL, 0, MIGRATE_SYNC, MR_CMA);
+	if (ret) {
+		if (!list_empty(&cma_migrate_pages))
+			putback_movable_pages(&cma_migrate_pages);
+	}
+
+	return 0;
+}
+
 long mm_iommu_get(unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem)
 {
@@ -124,15 +176,36 @@ long mm_iommu_get(unsigned long ua, unsigned long entries,
 	for (i = 0; i < entries; ++i) {
 		if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
 					1/* pages */, 1/* iswrite */, &page)) {
+			ret = -EFAULT;
 			for (j = 0; j < i; ++j)
-				put_page(pfn_to_page(
-						mem->hpas[j] >> PAGE_SHIFT));
+				put_page(pfn_to_page(mem->hpas[j] >>
+						PAGE_SHIFT));
 			vfree(mem->hpas);
 			kfree(mem);
-			ret = -EFAULT;
 			goto unlock_exit;
 		}
-
+		/*
+		 * If we get a page from the CMA zone, since we are going to
+		 * be pinning these entries, we might as well move them out
+		 * of the CMA zone if possible. NOTE: faulting in + migration
+		 * can be expensive. Batching can be considered later
+		 */
+		if (get_pageblock_migratetype(page) = MIGRATE_CMA) {
+			if (mm_iommu_move_page_from_cma(page))
+				goto populate;
+			if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
+						1/* pages */, 1/* iswrite */,
+						&page)) {
+				ret = -EFAULT;
+				for (j = 0; j < i; ++j)
+					put_page(pfn_to_page(mem->hpas[j] >>
+								PAGE_SHIFT));
+				vfree(mem->hpas);
+				kfree(mem);
+				goto unlock_exit;
+			}
+		}
+populate:
 		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
 	}
 
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
  2016-09-06  5:49   ` Aneesh Kumar K.V
  (?)
@ 2016-09-06  7:46     ` Balbir Singh
  -1 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2016-09-06  7:46 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Benjamin Herrenschmidt, Alexey Kardashevskiy,
	Paul Mackerras, Michael Ellerman



On 06/09/16 15:49, Aneesh Kumar K.V wrote:
> Balbir Singh <bsingharora@gmail.com> writes:
> 
>> From: Balbir Singh <bsingharora@gmail.com>
>> Subject: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
>>
>> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
>> pin pages using get_user_pages_fast(). One of the downsides of
>> the pinning is that the page could be in CMA region. The CMA
>> region is used for other allocations like the hash page table.
>> Ideally we want the pinned pages to be from non CMA region.
>>
>> This patch (currently only for KVM PPC with VFIO) forcefully
>> migrates the pages out (huge pages are omitted for the moment).
>> There are more efficient ways of doing this, but that might
>> be elaborate and might impact a larger audience beyond just
>> the kvm ppc implementation.
>>
>> The magic is in new_iommu_non_cma_page() which allocates the
>> new page from a non CMA region.
>>
>> I've tested the patches lightly at my end, but there might be bugs
>> For example if after lru_add_drain(), the page is not isolated
>> is this a BUG?
>>
>> Previous discussion was at
>> http://permalink.gmane.org/gmane.linux.kernel.mm/136738
>>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Paul Mackerras <paulus@ozlabs.org>
>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>
>> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
>> ---
>>  arch/powerpc/include/asm/mmu_context.h |  1 +
>>  arch/powerpc/mm/mmu_context_iommu.c    | 80 ++++++++++++++++++++++++++++++++--
>>  2 files changed, 77 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
>> index 9d2cd0c..475d1be 100644
>> --- a/arch/powerpc/include/asm/mmu_context.h
>> +++ b/arch/powerpc/include/asm/mmu_context.h
>> @@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
>>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>>  struct mm_iommu_table_group_mem_t;
>>  
>> +extern int isolate_lru_page(struct page *page);	/* from internal.h */
>>  extern bool mm_iommu_preregistered(void);
>>  extern long mm_iommu_get(unsigned long ua, unsigned long entries,
>>  		struct mm_iommu_table_group_mem_t **pmem);
>> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
>> index da6a216..c18f742 100644
>> --- a/arch/powerpc/mm/mmu_context_iommu.c
>> +++ b/arch/powerpc/mm/mmu_context_iommu.c
>> @@ -15,6 +15,9 @@
>>  #include <linux/rculist.h>
>>  #include <linux/vmalloc.h>
>>  #include <linux/mutex.h>
>> +#include <linux/migrate.h>
>> +#include <linux/hugetlb.h>
>> +#include <linux/swap.h>
>>  #include <asm/mmu_context.h>
>>  
>>  static DEFINE_MUTEX(mem_list_mutex);
>> @@ -72,6 +75,54 @@ bool mm_iommu_preregistered(void)
>>  }
>>  EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
>>  
>> +/*
>> + * Taken from alloc_migrate_target with changes to remove CMA allocations
>> + */
>> +struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
>> +					int **resultp)
>> +{
>> +	gfp_t gfp_mask = GFP_USER;
>> +	struct page *new_page;
>> +
>> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
>> +		return NULL;
> 
> Doesn't a PageCompound check cover all ?

Yes, I was being overly conservative with the checks

> 
> 
>> +
>> +	if (PageHighMem(page))
>> +		gfp_mask |= __GFP_HIGHMEM;
>> +
>> +	/*
>> +	 * We don't want the allocation to force an OOM if possibe
>> +	 */
>> +	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
>> +	return new_page;
>> +}
>> +
>> +static int mm_iommu_move_page_from_cma(struct page *page)
>> +{
>> +	int ret;
>> +	LIST_HEAD(cma_migrate_pages);
>> +
>> +	/* Ignore huge pages for now */
>> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
>> +		return -EBUSY;
>> +
>> +	lru_add_drain();
> 
> I guess I asked this last time. Shouldn't this be lru_add_drain_all() ?
> What if the page is in other cpu's pagevec ?

lru_add_drain_all() is too expensive for a per-page migration. This is best
effort. If it is on the pagevec of another CPU, we skip it -- see v3

> 
> 
>> +	ret = isolate_lru_page(page);
>> +	if (ret)
>> +		get_page(page); /* Potential BUG? */
>> +
>> +	list_add(&page->lru, &cma_migrate_pages);
> 
> Is that correct ? if we failed the isolate_lru_page(), can we be sure we
> are not on lru at all ? ie, what if the page was on other cpu pagevec ?
> 
> 

Fixed in v3

>> +	put_page(page); /* Drop the gup reference */
>> +
> 
> Where is get user page (gup) here ? . I guess you mean drop the
> reference taken above ?
> 

I say gup, because we'll do gup after this point if migration fails
and that we reacquire the reference lost here.

> 
>> +	ret = migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page,
>> +				NULL, 0, MIGRATE_SYNC, MR_CMA);
>> +	if (ret) {
>> +		if (!list_empty(&cma_migrate_pages))
>> +			putback_movable_pages(&cma_migrate_pages);
>> +	}
>> +	return 0;
>> +}
>> +
> 
> I guess the plan was to not do it one page at a time and switch this to list
> of pages which we need to migrate. Any reason why that is not tried ?
> 

Yes, it is a TODO. Here is my order of preference

1. get this in
2. Get THP migration in -- larger workset
3. Do page aggregation for both 1 and 2


Balbir Singh.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
@ 2016-09-06  7:46     ` Balbir Singh
  0 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2016-09-06  7:46 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Benjamin Herrenschmidt, Alexey Kardashevskiy,
	Paul Mackerras, Michael Ellerman



On 06/09/16 15:49, Aneesh Kumar K.V wrote:
> Balbir Singh <bsingharora@gmail.com> writes:
> 
>> From: Balbir Singh <bsingharora@gmail.com>
>> Subject: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
>>
>> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
>> pin pages using get_user_pages_fast(). One of the downsides of
>> the pinning is that the page could be in CMA region. The CMA
>> region is used for other allocations like the hash page table.
>> Ideally we want the pinned pages to be from non CMA region.
>>
>> This patch (currently only for KVM PPC with VFIO) forcefully
>> migrates the pages out (huge pages are omitted for the moment).
>> There are more efficient ways of doing this, but that might
>> be elaborate and might impact a larger audience beyond just
>> the kvm ppc implementation.
>>
>> The magic is in new_iommu_non_cma_page() which allocates the
>> new page from a non CMA region.
>>
>> I've tested the patches lightly at my end, but there might be bugs
>> For example if after lru_add_drain(), the page is not isolated
>> is this a BUG?
>>
>> Previous discussion was at
>> http://permalink.gmane.org/gmane.linux.kernel.mm/136738
>>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Paul Mackerras <paulus@ozlabs.org>
>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>
>> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
>> ---
>>  arch/powerpc/include/asm/mmu_context.h |  1 +
>>  arch/powerpc/mm/mmu_context_iommu.c    | 80 ++++++++++++++++++++++++++++++++--
>>  2 files changed, 77 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
>> index 9d2cd0c..475d1be 100644
>> --- a/arch/powerpc/include/asm/mmu_context.h
>> +++ b/arch/powerpc/include/asm/mmu_context.h
>> @@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
>>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>>  struct mm_iommu_table_group_mem_t;
>>  
>> +extern int isolate_lru_page(struct page *page);	/* from internal.h */
>>  extern bool mm_iommu_preregistered(void);
>>  extern long mm_iommu_get(unsigned long ua, unsigned long entries,
>>  		struct mm_iommu_table_group_mem_t **pmem);
>> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
>> index da6a216..c18f742 100644
>> --- a/arch/powerpc/mm/mmu_context_iommu.c
>> +++ b/arch/powerpc/mm/mmu_context_iommu.c
>> @@ -15,6 +15,9 @@
>>  #include <linux/rculist.h>
>>  #include <linux/vmalloc.h>
>>  #include <linux/mutex.h>
>> +#include <linux/migrate.h>
>> +#include <linux/hugetlb.h>
>> +#include <linux/swap.h>
>>  #include <asm/mmu_context.h>
>>  
>>  static DEFINE_MUTEX(mem_list_mutex);
>> @@ -72,6 +75,54 @@ bool mm_iommu_preregistered(void)
>>  }
>>  EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
>>  
>> +/*
>> + * Taken from alloc_migrate_target with changes to remove CMA allocations
>> + */
>> +struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
>> +					int **resultp)
>> +{
>> +	gfp_t gfp_mask = GFP_USER;
>> +	struct page *new_page;
>> +
>> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
>> +		return NULL;
> 
> Doesn't a PageCompound check cover all ?

Yes, I was being overly conservative with the checks

> 
> 
>> +
>> +	if (PageHighMem(page))
>> +		gfp_mask |= __GFP_HIGHMEM;
>> +
>> +	/*
>> +	 * We don't want the allocation to force an OOM if possibe
>> +	 */
>> +	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
>> +	return new_page;
>> +}
>> +
>> +static int mm_iommu_move_page_from_cma(struct page *page)
>> +{
>> +	int ret;
>> +	LIST_HEAD(cma_migrate_pages);
>> +
>> +	/* Ignore huge pages for now */
>> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
>> +		return -EBUSY;
>> +
>> +	lru_add_drain();
> 
> I guess I asked this last time. Shouldn't this be lru_add_drain_all() ?
> What if the page is in other cpu's pagevec ?

lru_add_drain_all() is too expensive for a per-page migration. This is best
effort. If it is on the pagevec of another CPU, we skip it -- see v3

> 
> 
>> +	ret = isolate_lru_page(page);
>> +	if (ret)
>> +		get_page(page); /* Potential BUG? */
>> +
>> +	list_add(&page->lru, &cma_migrate_pages);
> 
> Is that correct ? if we failed the isolate_lru_page(), can we be sure we
> are not on lru at all ? ie, what if the page was on other cpu pagevec ?
> 
> 

Fixed in v3

>> +	put_page(page); /* Drop the gup reference */
>> +
> 
> Where is get user page (gup) here ? . I guess you mean drop the
> reference taken above ?
> 

I say gup, because we'll do gup after this point if migration fails
and that we reacquire the reference lost here.

> 
>> +	ret = migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page,
>> +				NULL, 0, MIGRATE_SYNC, MR_CMA);
>> +	if (ret) {
>> +		if (!list_empty(&cma_migrate_pages))
>> +			putback_movable_pages(&cma_migrate_pages);
>> +	}
>> +	return 0;
>> +}
>> +
> 
> I guess the plan was to not do it one page at a time and switch this to list
> of pages which we need to migrate. Any reason why that is not tried ?
> 

Yes, it is a TODO. Here is my order of preference

1. get this in
2. Get THP migration in -- larger workset
3. Do page aggregation for both 1 and 2


Balbir Singh.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
@ 2016-09-06  7:46     ` Balbir Singh
  0 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2016-09-06  7:46 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Benjamin Herrenschmidt, Alexey Kardashevskiy,
	Paul Mackerras, Michael Ellerman



On 06/09/16 15:49, Aneesh Kumar K.V wrote:
> Balbir Singh <bsingharora@gmail.com> writes:
> 
>> From: Balbir Singh <bsingharora@gmail.com>
>> Subject: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
>>
>> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
>> pin pages using get_user_pages_fast(). One of the downsides of
>> the pinning is that the page could be in CMA region. The CMA
>> region is used for other allocations like the hash page table.
>> Ideally we want the pinned pages to be from non CMA region.
>>
>> This patch (currently only for KVM PPC with VFIO) forcefully
>> migrates the pages out (huge pages are omitted for the moment).
>> There are more efficient ways of doing this, but that might
>> be elaborate and might impact a larger audience beyond just
>> the kvm ppc implementation.
>>
>> The magic is in new_iommu_non_cma_page() which allocates the
>> new page from a non CMA region.
>>
>> I've tested the patches lightly at my end, but there might be bugs
>> For example if after lru_add_drain(), the page is not isolated
>> is this a BUG?
>>
>> Previous discussion was at
>> http://permalink.gmane.org/gmane.linux.kernel.mm/136738
>>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Paul Mackerras <paulus@ozlabs.org>
>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>
>> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
>> ---
>>  arch/powerpc/include/asm/mmu_context.h |  1 +
>>  arch/powerpc/mm/mmu_context_iommu.c    | 80 ++++++++++++++++++++++++++++++++--
>>  2 files changed, 77 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
>> index 9d2cd0c..475d1be 100644
>> --- a/arch/powerpc/include/asm/mmu_context.h
>> +++ b/arch/powerpc/include/asm/mmu_context.h
>> @@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
>>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>>  struct mm_iommu_table_group_mem_t;
>>  
>> +extern int isolate_lru_page(struct page *page);	/* from internal.h */
>>  extern bool mm_iommu_preregistered(void);
>>  extern long mm_iommu_get(unsigned long ua, unsigned long entries,
>>  		struct mm_iommu_table_group_mem_t **pmem);
>> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
>> index da6a216..c18f742 100644
>> --- a/arch/powerpc/mm/mmu_context_iommu.c
>> +++ b/arch/powerpc/mm/mmu_context_iommu.c
>> @@ -15,6 +15,9 @@
>>  #include <linux/rculist.h>
>>  #include <linux/vmalloc.h>
>>  #include <linux/mutex.h>
>> +#include <linux/migrate.h>
>> +#include <linux/hugetlb.h>
>> +#include <linux/swap.h>
>>  #include <asm/mmu_context.h>
>>  
>>  static DEFINE_MUTEX(mem_list_mutex);
>> @@ -72,6 +75,54 @@ bool mm_iommu_preregistered(void)
>>  }
>>  EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
>>  
>> +/*
>> + * Taken from alloc_migrate_target with changes to remove CMA allocations
>> + */
>> +struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
>> +					int **resultp)
>> +{
>> +	gfp_t gfp_mask = GFP_USER;
>> +	struct page *new_page;
>> +
>> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
>> +		return NULL;
> 
> Doesn't a PageCompound check cover all ?

Yes, I was being overly conservative with the checks

> 
> 
>> +
>> +	if (PageHighMem(page))
>> +		gfp_mask |= __GFP_HIGHMEM;
>> +
>> +	/*
>> +	 * We don't want the allocation to force an OOM if possibe
>> +	 */
>> +	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
>> +	return new_page;
>> +}
>> +
>> +static int mm_iommu_move_page_from_cma(struct page *page)
>> +{
>> +	int ret;
>> +	LIST_HEAD(cma_migrate_pages);
>> +
>> +	/* Ignore huge pages for now */
>> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
>> +		return -EBUSY;
>> +
>> +	lru_add_drain();
> 
> I guess I asked this last time. Shouldn't this be lru_add_drain_all() ?
> What if the page is in other cpu's pagevec ?

lru_add_drain_all() is too expensive for a per-page migration. This is best
effort. If it is on the pagevec of another CPU, we skip it -- see v3

> 
> 
>> +	ret = isolate_lru_page(page);
>> +	if (ret)
>> +		get_page(page); /* Potential BUG? */
>> +
>> +	list_add(&page->lru, &cma_migrate_pages);
> 
> Is that correct ? if we failed the isolate_lru_page(), can we be sure we
> are not on lru at all ? ie, what if the page was on other cpu pagevec ?
> 
> 

Fixed in v3

>> +	put_page(page); /* Drop the gup reference */
>> +
> 
> Where is get user page (gup) here ? . I guess you mean drop the
> reference taken above ?
> 

I say gup, because we'll do gup after this point if migration fails
and that we reacquire the reference lost here.

> 
>> +	ret = migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page,
>> +				NULL, 0, MIGRATE_SYNC, MR_CMA);
>> +	if (ret) {
>> +		if (!list_empty(&cma_migrate_pages))
>> +			putback_movable_pages(&cma_migrate_pages);
>> +	}
>> +	return 0;
>> +}
>> +
> 
> I guess the plan was to not do it one page at a time and switch this to list
> of pages which we need to migrate. Any reason why that is not tried ?
> 

Yes, it is a TODO. Here is my order of preference

1. get this in
2. Get THP migration in -- larger workset
3. Do page aggregation for both 1 and 2


Balbir Singh.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
  2016-09-06  6:27       ` Balbir Singh
  (?)
@ 2016-09-06 11:54         ` Anshuman Khandual
  -1 siblings, 0 replies; 27+ messages in thread
From: Anshuman Khandual @ 2016-09-06 11:54 UTC (permalink / raw)
  To: Balbir Singh, Alexey Kardashevskiy, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Paul Mackerras

On 09/06/2016 11:57 AM, Balbir Singh wrote:
> 
> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
> pin pages using get_user_pages_fast(). One of the downsides of
> the pinning is that the page could be in CMA region. The CMA
> region is used for other allocations like the hash page table.
> Ideally we want the pinned pages to be from non CMA region.
> 
> This patch (currently only for KVM PPC with VFIO) forcefully
> migrates the pages out (huge pages are omitted for the moment).
> There are more efficient ways of doing this, but that might
> be elaborate and might impact a larger audience beyond just
> the kvm ppc implementation.
> 
> The magic is in new_iommu_non_cma_page() which allocates the
> new page from a non CMA region.
> 
> I've tested the patches lightly at my end. The full solution
> requires migration of THP pages in the CMA region. That work
> will be done incrementally on top of this.
> 
> Previous discussion was at
> http://permalink.gmane.org/gmane.linux.kernel.mm/136738
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/include/asm/mmu_context.h |  1 +
>  arch/powerpc/mm/mmu_context_iommu.c    | 81 ++++++++++++++++++++++++++++++++--
>  2 files changed, 78 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 9d2cd0c..475d1be 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>  struct mm_iommu_table_group_mem_t;
> 
> +extern int isolate_lru_page(struct page *page);	/* from internal.h */

Small nit, cant we just add "mm/internal.h" header here with full path ?

>  extern bool mm_iommu_preregistered(void);
>  extern long mm_iommu_get(unsigned long ua, unsigned long entries,
>  		struct mm_iommu_table_group_mem_t **pmem);
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index da6a216..e0f1c33 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -15,6 +15,9 @@
>  #include <linux/rculist.h>
>  #include <linux/vmalloc.h>
>  #include <linux/mutex.h>
> +#include <linux/migrate.h>
> +#include <linux/hugetlb.h>
> +#include <linux/swap.h>
>  #include <asm/mmu_context.h>
> 
>  static DEFINE_MUTEX(mem_list_mutex);
> @@ -72,6 +75,55 @@ bool mm_iommu_preregistered(void)
>  }
>  EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
> 
> +/*
> + * Taken from alloc_migrate_target with changes to remove CMA allocations
> + */
> +struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
> +					int **resultp)
> +{
> +	gfp_t gfp_mask = GFP_USER;
> +	struct page *new_page;
> +
> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
> +		return NULL;
> +
> +	if (PageHighMem(page))
> +		gfp_mask |= __GFP_HIGHMEM;
> +
> +	/*
> +	 * We don't want the allocation to force an OOM if possibe
> +	 */
> +	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);

So what guarantees that the new page too wont come from MIGRATE_CMA
page block ? Is absence of __GFP_MOVABLE flag enough. Also should not
we be checking that migrate type of the new allocated page is indeed
not MIGRATE_CMA ?


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
@ 2016-09-06 11:54         ` Anshuman Khandual
  0 siblings, 0 replies; 27+ messages in thread
From: Anshuman Khandual @ 2016-09-06 11:54 UTC (permalink / raw)
  To: Balbir Singh, Alexey Kardashevskiy, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Paul Mackerras

On 09/06/2016 11:57 AM, Balbir Singh wrote:
> 
> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
> pin pages using get_user_pages_fast(). One of the downsides of
> the pinning is that the page could be in CMA region. The CMA
> region is used for other allocations like the hash page table.
> Ideally we want the pinned pages to be from non CMA region.
> 
> This patch (currently only for KVM PPC with VFIO) forcefully
> migrates the pages out (huge pages are omitted for the moment).
> There are more efficient ways of doing this, but that might
> be elaborate and might impact a larger audience beyond just
> the kvm ppc implementation.
> 
> The magic is in new_iommu_non_cma_page() which allocates the
> new page from a non CMA region.
> 
> I've tested the patches lightly at my end. The full solution
> requires migration of THP pages in the CMA region. That work
> will be done incrementally on top of this.
> 
> Previous discussion was at
> http://permalink.gmane.org/gmane.linux.kernel.mm/136738
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/include/asm/mmu_context.h |  1 +
>  arch/powerpc/mm/mmu_context_iommu.c    | 81 ++++++++++++++++++++++++++++++++--
>  2 files changed, 78 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 9d2cd0c..475d1be 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>  struct mm_iommu_table_group_mem_t;
> 
> +extern int isolate_lru_page(struct page *page);	/* from internal.h */

Small nit, cant we just add "mm/internal.h" header here with full path ?

>  extern bool mm_iommu_preregistered(void);
>  extern long mm_iommu_get(unsigned long ua, unsigned long entries,
>  		struct mm_iommu_table_group_mem_t **pmem);
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index da6a216..e0f1c33 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -15,6 +15,9 @@
>  #include <linux/rculist.h>
>  #include <linux/vmalloc.h>
>  #include <linux/mutex.h>
> +#include <linux/migrate.h>
> +#include <linux/hugetlb.h>
> +#include <linux/swap.h>
>  #include <asm/mmu_context.h>
> 
>  static DEFINE_MUTEX(mem_list_mutex);
> @@ -72,6 +75,55 @@ bool mm_iommu_preregistered(void)
>  }
>  EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
> 
> +/*
> + * Taken from alloc_migrate_target with changes to remove CMA allocations
> + */
> +struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
> +					int **resultp)
> +{
> +	gfp_t gfp_mask = GFP_USER;
> +	struct page *new_page;
> +
> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
> +		return NULL;
> +
> +	if (PageHighMem(page))
> +		gfp_mask |= __GFP_HIGHMEM;
> +
> +	/*
> +	 * We don't want the allocation to force an OOM if possibe
> +	 */
> +	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);

So what guarantees that the new page too wont come from MIGRATE_CMA
page block ? Is absence of __GFP_MOVABLE flag enough. Also should not
we be checking that migrate type of the new allocated page is indeed
not MIGRATE_CMA ?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
@ 2016-09-06 11:54         ` Anshuman Khandual
  0 siblings, 0 replies; 27+ messages in thread
From: Anshuman Khandual @ 2016-09-06 11:54 UTC (permalink / raw)
  To: Balbir Singh, Alexey Kardashevskiy, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Paul Mackerras

On 09/06/2016 11:57 AM, Balbir Singh wrote:
> 
> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
> pin pages using get_user_pages_fast(). One of the downsides of
> the pinning is that the page could be in CMA region. The CMA
> region is used for other allocations like the hash page table.
> Ideally we want the pinned pages to be from non CMA region.
> 
> This patch (currently only for KVM PPC with VFIO) forcefully
> migrates the pages out (huge pages are omitted for the moment).
> There are more efficient ways of doing this, but that might
> be elaborate and might impact a larger audience beyond just
> the kvm ppc implementation.
> 
> The magic is in new_iommu_non_cma_page() which allocates the
> new page from a non CMA region.
> 
> I've tested the patches lightly at my end. The full solution
> requires migration of THP pages in the CMA region. That work
> will be done incrementally on top of this.
> 
> Previous discussion was at
> http://permalink.gmane.org/gmane.linux.kernel.mm/136738
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/include/asm/mmu_context.h |  1 +
>  arch/powerpc/mm/mmu_context_iommu.c    | 81 ++++++++++++++++++++++++++++++++--
>  2 files changed, 78 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 9d2cd0c..475d1be 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>  struct mm_iommu_table_group_mem_t;
> 
> +extern int isolate_lru_page(struct page *page);	/* from internal.h */

Small nit, cant we just add "mm/internal.h" header here with full path ?

>  extern bool mm_iommu_preregistered(void);
>  extern long mm_iommu_get(unsigned long ua, unsigned long entries,
>  		struct mm_iommu_table_group_mem_t **pmem);
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index da6a216..e0f1c33 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -15,6 +15,9 @@
>  #include <linux/rculist.h>
>  #include <linux/vmalloc.h>
>  #include <linux/mutex.h>
> +#include <linux/migrate.h>
> +#include <linux/hugetlb.h>
> +#include <linux/swap.h>
>  #include <asm/mmu_context.h>
> 
>  static DEFINE_MUTEX(mem_list_mutex);
> @@ -72,6 +75,55 @@ bool mm_iommu_preregistered(void)
>  }
>  EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
> 
> +/*
> + * Taken from alloc_migrate_target with changes to remove CMA allocations
> + */
> +struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
> +					int **resultp)
> +{
> +	gfp_t gfp_mask = GFP_USER;
> +	struct page *new_page;
> +
> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
> +		return NULL;
> +
> +	if (PageHighMem(page))
> +		gfp_mask |= __GFP_HIGHMEM;
> +
> +	/*
> +	 * We don't want the allocation to force an OOM if possibe
> +	 */
> +	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);

So what guarantees that the new page too wont come from MIGRATE_CMA
page block ? Is absence of __GFP_MOVABLE flag enough. Also should not
we be checking that migrate type of the new allocated page is indeed
not MIGRATE_CMA ?


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
  2016-09-06 11:54         ` Anshuman Khandual
  (?)
@ 2016-09-06 23:53           ` Balbir Singh
  -1 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2016-09-06 23:53 UTC (permalink / raw)
  To: Anshuman Khandual, Alexey Kardashevskiy, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Paul Mackerras



On 06/09/16 21:54, Anshuman Khandual wrote:
> On 09/06/2016 11:57 AM, Balbir Singh wrote:
>>
>> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
>> pin pages using get_user_pages_fast(). One of the downsides of
>> the pinning is that the page could be in CMA region. The CMA
>> region is used for other allocations like the hash page table.
>> Ideally we want the pinned pages to be from non CMA region.
>>
>> This patch (currently only for KVM PPC with VFIO) forcefully
>> migrates the pages out (huge pages are omitted for the moment).
>> There are more efficient ways of doing this, but that might
>> be elaborate and might impact a larger audience beyond just
>> the kvm ppc implementation.
>>
>> The magic is in new_iommu_non_cma_page() which allocates the
>> new page from a non CMA region.
>>
>> I've tested the patches lightly at my end. The full solution
>> requires migration of THP pages in the CMA region. That work
>> will be done incrementally on top of this.
>>
>> Previous discussion was at
>> http://permalink.gmane.org/gmane.linux.kernel.mm/136738
>>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Paul Mackerras <paulus@ozlabs.org>
>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>
>> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
>> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  arch/powerpc/include/asm/mmu_context.h |  1 +
>>  arch/powerpc/mm/mmu_context_iommu.c    | 81 ++++++++++++++++++++++++++++++++--
>>  2 files changed, 78 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
>> index 9d2cd0c..475d1be 100644
>> --- a/arch/powerpc/include/asm/mmu_context.h
>> +++ b/arch/powerpc/include/asm/mmu_context.h
>> @@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
>>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>>  struct mm_iommu_table_group_mem_t;
>>
>> +extern int isolate_lru_page(struct page *page);	/* from internal.h */
> 
> Small nit, cant we just add "mm/internal.h" header here with full path ?
> 

I did not think it was worth including it here

>>  extern bool mm_iommu_preregistered(void);
>>  extern long mm_iommu_get(unsigned long ua, unsigned long entries,
>>  		struct mm_iommu_table_group_mem_t **pmem);
>> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
>> index da6a216..e0f1c33 100644
>> --- a/arch/powerpc/mm/mmu_context_iommu.c
>> +++ b/arch/powerpc/mm/mmu_context_iommu.c
>> @@ -15,6 +15,9 @@
>>  #include <linux/rculist.h>
>>  #include <linux/vmalloc.h>
>>  #include <linux/mutex.h>
>> +#include <linux/migrate.h>
>> +#include <linux/hugetlb.h>
>> +#include <linux/swap.h>
>>  #include <asm/mmu_context.h>
>>
>>  static DEFINE_MUTEX(mem_list_mutex);
>> @@ -72,6 +75,55 @@ bool mm_iommu_preregistered(void)
>>  }
>>  EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
>>
>> +/*
>> + * Taken from alloc_migrate_target with changes to remove CMA allocations
>> + */
>> +struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
>> +					int **resultp)
>> +{
>> +	gfp_t gfp_mask = GFP_USER;
>> +	struct page *new_page;
>> +
>> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
>> +		return NULL;
>> +
>> +	if (PageHighMem(page))
>> +		gfp_mask |= __GFP_HIGHMEM;
>> +
>> +	/*
>> +	 * We don't want the allocation to force an OOM if possibe
>> +	 */
>> +	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
> 
> So what guarantees that the new page too wont come from MIGRATE_CMA
> page block ? Is absence of __GFP_MOVABLE flag enough.

I think so, that is what I am relying on and checked for

 Also should not
> we be checking that migrate type of the new allocated page is indeed
> not MIGRATE_CMA ?
> 

I don't think that is required, may be I can do a VM_WARN_ON for debugging

Balbir Singh.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
@ 2016-09-06 23:53           ` Balbir Singh
  0 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2016-09-06 23:53 UTC (permalink / raw)
  To: Anshuman Khandual, Alexey Kardashevskiy, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Paul Mackerras



On 06/09/16 21:54, Anshuman Khandual wrote:
> On 09/06/2016 11:57 AM, Balbir Singh wrote:
>>
>> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
>> pin pages using get_user_pages_fast(). One of the downsides of
>> the pinning is that the page could be in CMA region. The CMA
>> region is used for other allocations like the hash page table.
>> Ideally we want the pinned pages to be from non CMA region.
>>
>> This patch (currently only for KVM PPC with VFIO) forcefully
>> migrates the pages out (huge pages are omitted for the moment).
>> There are more efficient ways of doing this, but that might
>> be elaborate and might impact a larger audience beyond just
>> the kvm ppc implementation.
>>
>> The magic is in new_iommu_non_cma_page() which allocates the
>> new page from a non CMA region.
>>
>> I've tested the patches lightly at my end. The full solution
>> requires migration of THP pages in the CMA region. That work
>> will be done incrementally on top of this.
>>
>> Previous discussion was at
>> http://permalink.gmane.org/gmane.linux.kernel.mm/136738
>>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Paul Mackerras <paulus@ozlabs.org>
>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>
>> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
>> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  arch/powerpc/include/asm/mmu_context.h |  1 +
>>  arch/powerpc/mm/mmu_context_iommu.c    | 81 ++++++++++++++++++++++++++++++++--
>>  2 files changed, 78 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
>> index 9d2cd0c..475d1be 100644
>> --- a/arch/powerpc/include/asm/mmu_context.h
>> +++ b/arch/powerpc/include/asm/mmu_context.h
>> @@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
>>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>>  struct mm_iommu_table_group_mem_t;
>>
>> +extern int isolate_lru_page(struct page *page);	/* from internal.h */
> 
> Small nit, cant we just add "mm/internal.h" header here with full path ?
> 

I did not think it was worth including it here

>>  extern bool mm_iommu_preregistered(void);
>>  extern long mm_iommu_get(unsigned long ua, unsigned long entries,
>>  		struct mm_iommu_table_group_mem_t **pmem);
>> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
>> index da6a216..e0f1c33 100644
>> --- a/arch/powerpc/mm/mmu_context_iommu.c
>> +++ b/arch/powerpc/mm/mmu_context_iommu.c
>> @@ -15,6 +15,9 @@
>>  #include <linux/rculist.h>
>>  #include <linux/vmalloc.h>
>>  #include <linux/mutex.h>
>> +#include <linux/migrate.h>
>> +#include <linux/hugetlb.h>
>> +#include <linux/swap.h>
>>  #include <asm/mmu_context.h>
>>
>>  static DEFINE_MUTEX(mem_list_mutex);
>> @@ -72,6 +75,55 @@ bool mm_iommu_preregistered(void)
>>  }
>>  EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
>>
>> +/*
>> + * Taken from alloc_migrate_target with changes to remove CMA allocations
>> + */
>> +struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
>> +					int **resultp)
>> +{
>> +	gfp_t gfp_mask = GFP_USER;
>> +	struct page *new_page;
>> +
>> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
>> +		return NULL;
>> +
>> +	if (PageHighMem(page))
>> +		gfp_mask |= __GFP_HIGHMEM;
>> +
>> +	/*
>> +	 * We don't want the allocation to force an OOM if possibe
>> +	 */
>> +	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
> 
> So what guarantees that the new page too wont come from MIGRATE_CMA
> page block ? Is absence of __GFP_MOVABLE flag enough.

I think so, that is what I am relying on and checked for

 Also should not
> we be checking that migrate type of the new allocated page is indeed
> not MIGRATE_CMA ?
> 

I don't think that is required, may be I can do a VM_WARN_ON for debugging

Balbir Singh.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
@ 2016-09-06 23:53           ` Balbir Singh
  0 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2016-09-06 23:53 UTC (permalink / raw)
  To: Anshuman Khandual, Alexey Kardashevskiy, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Paul Mackerras



On 06/09/16 21:54, Anshuman Khandual wrote:
> On 09/06/2016 11:57 AM, Balbir Singh wrote:
>>
>> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
>> pin pages using get_user_pages_fast(). One of the downsides of
>> the pinning is that the page could be in CMA region. The CMA
>> region is used for other allocations like the hash page table.
>> Ideally we want the pinned pages to be from non CMA region.
>>
>> This patch (currently only for KVM PPC with VFIO) forcefully
>> migrates the pages out (huge pages are omitted for the moment).
>> There are more efficient ways of doing this, but that might
>> be elaborate and might impact a larger audience beyond just
>> the kvm ppc implementation.
>>
>> The magic is in new_iommu_non_cma_page() which allocates the
>> new page from a non CMA region.
>>
>> I've tested the patches lightly at my end. The full solution
>> requires migration of THP pages in the CMA region. That work
>> will be done incrementally on top of this.
>>
>> Previous discussion was at
>> http://permalink.gmane.org/gmane.linux.kernel.mm/136738
>>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Paul Mackerras <paulus@ozlabs.org>
>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>
>> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
>> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  arch/powerpc/include/asm/mmu_context.h |  1 +
>>  arch/powerpc/mm/mmu_context_iommu.c    | 81 ++++++++++++++++++++++++++++++++--
>>  2 files changed, 78 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
>> index 9d2cd0c..475d1be 100644
>> --- a/arch/powerpc/include/asm/mmu_context.h
>> +++ b/arch/powerpc/include/asm/mmu_context.h
>> @@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
>>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>>  struct mm_iommu_table_group_mem_t;
>>
>> +extern int isolate_lru_page(struct page *page);	/* from internal.h */
> 
> Small nit, cant we just add "mm/internal.h" header here with full path ?
> 

I did not think it was worth including it here

>>  extern bool mm_iommu_preregistered(void);
>>  extern long mm_iommu_get(unsigned long ua, unsigned long entries,
>>  		struct mm_iommu_table_group_mem_t **pmem);
>> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
>> index da6a216..e0f1c33 100644
>> --- a/arch/powerpc/mm/mmu_context_iommu.c
>> +++ b/arch/powerpc/mm/mmu_context_iommu.c
>> @@ -15,6 +15,9 @@
>>  #include <linux/rculist.h>
>>  #include <linux/vmalloc.h>
>>  #include <linux/mutex.h>
>> +#include <linux/migrate.h>
>> +#include <linux/hugetlb.h>
>> +#include <linux/swap.h>
>>  #include <asm/mmu_context.h>
>>
>>  static DEFINE_MUTEX(mem_list_mutex);
>> @@ -72,6 +75,55 @@ bool mm_iommu_preregistered(void)
>>  }
>>  EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
>>
>> +/*
>> + * Taken from alloc_migrate_target with changes to remove CMA allocations
>> + */
>> +struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
>> +					int **resultp)
>> +{
>> +	gfp_t gfp_mask = GFP_USER;
>> +	struct page *new_page;
>> +
>> +	if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
>> +		return NULL;
>> +
>> +	if (PageHighMem(page))
>> +		gfp_mask |= __GFP_HIGHMEM;
>> +
>> +	/*
>> +	 * We don't want the allocation to force an OOM if possibe
>> +	 */
>> +	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
> 
> So what guarantees that the new page too wont come from MIGRATE_CMA
> page block ? Is absence of __GFP_MOVABLE flag enough.

I think so, that is what I am relying on and checked for

 Also should not
> we be checking that migrate type of the new allocated page is indeed
> not MIGRATE_CMA ?
> 

I don't think that is required, may be I can do a VM_WARN_ON for debugging

Balbir Singh.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [v3] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
  2016-09-06  6:27       ` Balbir Singh
  (?)
@ 2016-09-29 13:13         ` Michael Ellerman
  -1 siblings, 0 replies; 27+ messages in thread
From: Michael Ellerman @ 2016-09-29 13:13 UTC (permalink / raw)
  To: Balbir Singh, Alexey Kardashevskiy, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Paul Mackerras

On Tue, 2016-06-09 at 06:27:31 UTC, Balbir Singh wrote:
> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
> pin pages using get_user_pages_fast(). One of the downsides of
> the pinning is that the page could be in CMA region. The CMA
> region is used for other allocations like the hash page table.
> Ideally we want the pinned pages to be from non CMA region.
> 
> This patch (currently only for KVM PPC with VFIO) forcefully
> migrates the pages out (huge pages are omitted for the moment).
> There are more efficient ways of doing this, but that might
> be elaborate and might impact a larger audience beyond just
> the kvm ppc implementation.
> 
> The magic is in new_iommu_non_cma_page() which allocates the
> new page from a non CMA region.
> 
> I've tested the patches lightly at my end. The full solution
> requires migration of THP pages in the CMA region. That work
> will be done incrementally on top of this.
> 
> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/2e5bbb5461f138cac631fe21b4ad95

cheers

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [v3] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
@ 2016-09-29 13:13         ` Michael Ellerman
  0 siblings, 0 replies; 27+ messages in thread
From: Michael Ellerman @ 2016-09-29 13:13 UTC (permalink / raw)
  To: Balbir Singh, Alexey Kardashevskiy, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Paul Mackerras

On Tue, 2016-06-09 at 06:27:31 UTC, Balbir Singh wrote:
> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
> pin pages using get_user_pages_fast(). One of the downsides of
> the pinning is that the page could be in CMA region. The CMA
> region is used for other allocations like the hash page table.
> Ideally we want the pinned pages to be from non CMA region.
> 
> This patch (currently only for KVM PPC with VFIO) forcefully
> migrates the pages out (huge pages are omitted for the moment).
> There are more efficient ways of doing this, but that might
> be elaborate and might impact a larger audience beyond just
> the kvm ppc implementation.
> 
> The magic is in new_iommu_non_cma_page() which allocates the
> new page from a non CMA region.
> 
> I've tested the patches lightly at my end. The full solution
> requires migration of THP pages in the CMA region. That work
> will be done incrementally on top of this.
> 
> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/2e5bbb5461f138cac631fe21b4ad95

cheers

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [v3] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
@ 2016-09-29 13:13         ` Michael Ellerman
  0 siblings, 0 replies; 27+ messages in thread
From: Michael Ellerman @ 2016-09-29 13:13 UTC (permalink / raw)
  To: Balbir Singh, Alexey Kardashevskiy, linuxppc-dev, kvm-ppc, kvm
  Cc: linux-mm, Paul Mackerras

On Tue, 2016-06-09 at 06:27:31 UTC, Balbir Singh wrote:
> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
> pin pages using get_user_pages_fast(). One of the downsides of
> the pinning is that the page could be in CMA region. The CMA
> region is used for other allocations like the hash page table.
> Ideally we want the pinned pages to be from non CMA region.
> 
> This patch (currently only for KVM PPC with VFIO) forcefully
> migrates the pages out (huge pages are omitted for the moment).
> There are more efficient ways of doing this, but that might
> be elaborate and might impact a larger audience beyond just
> the kvm ppc implementation.
> 
> The magic is in new_iommu_non_cma_page() which allocates the
> new page from a non CMA region.
> 
> I've tested the patches lightly at my end. The full solution
> requires migration of THP pages in the CMA region. That work
> will be done incrementally on top of this.
> 
> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/2e5bbb5461f138cac631fe21b4ad95

cheers

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-09-29 13:13 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-14  4:25 [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA Balbir Singh
2016-07-14  4:25 ` Balbir Singh
2016-07-14  4:25 ` Balbir Singh
2016-08-31  4:14 ` Alexey Kardashevskiy
2016-08-31  4:14   ` Alexey Kardashevskiy
2016-08-31  4:14   ` Alexey Kardashevskiy
2016-09-06  1:55   ` Balbir Singh
2016-09-06  1:55     ` Balbir Singh
2016-09-06  1:55     ` Balbir Singh
2016-09-06  6:27     ` [PATCH v3] " Balbir Singh
2016-09-06  6:27       ` Balbir Singh
2016-09-06  6:27       ` Balbir Singh
2016-09-06 11:54       ` Anshuman Khandual
2016-09-06 11:54         ` Anshuman Khandual
2016-09-06 11:54         ` Anshuman Khandual
2016-09-06 23:53         ` Balbir Singh
2016-09-06 23:53           ` Balbir Singh
2016-09-06 23:53           ` Balbir Singh
2016-09-29 13:13       ` [v3] " Michael Ellerman
2016-09-29 13:13         ` Michael Ellerman
2016-09-29 13:13         ` Michael Ellerman
2016-09-06  5:49 ` [RESEND][v2][PATCH] " Aneesh Kumar K.V
2016-09-06  5:49   ` Aneesh Kumar K.V
2016-09-06  5:49   ` Aneesh Kumar K.V
2016-09-06  7:46   ` Balbir Singh
2016-09-06  7:46     ` Balbir Singh
2016-09-06  7:46     ` Balbir Singh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.