linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv4 0/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path
@ 2019-06-13 10:44 Pingfan Liu
  2019-06-13 10:45 ` [PATCHv4 1/3] mm/gup: rename nr as nr_pinned in get_user_pages_fast() Pingfan Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Pingfan Liu @ 2019-06-13 10:44 UTC (permalink / raw)
  To: linux-mm
  Cc: Pingfan Liu, Ira Weiny, Andrew Morton, Mike Rapoport,
	Dan Williams, Matthew Wilcox, John Hubbard, Aneesh Kumar K.V,
	Keith Busch, Christoph Hellwig, Shuah Khan, linux-kernel

These three patches have no dependency of each other, but related with the same purpose
to improve get_user_page_fast(), patch [2/3]. Put them together.

v3->v4:
  Place the check on FOLL_LONGTERM in gup_pte_range() instead of get_user_page_fast()

Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-kernel@vger.kernel.org

Pingfan Liu (3):
  mm/gup: rename nr as nr_pinned in get_user_pages_fast()
  mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path
  mm/gup_benchemark: add LONGTERM_BENCHMARK test in gup fast path

 mm/gup.c                                   | 46 +++++++++++++++++++++++-------
 mm/gup_benchmark.c                         | 11 +++++--
 tools/testing/selftests/vm/gup_benchmark.c | 10 +++++--
 3 files changed, 52 insertions(+), 15 deletions(-)

-- 
2.7.5


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCHv4 1/3] mm/gup: rename nr as nr_pinned in get_user_pages_fast()
  2019-06-13 10:44 [PATCHv4 0/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path Pingfan Liu
@ 2019-06-13 10:45 ` Pingfan Liu
  2019-06-13 21:28   ` Ira Weiny
  2019-06-13 10:45 ` [PATCHv4 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path Pingfan Liu
  2019-06-13 10:45 ` [PATCHv4 3/3] mm/gup_benchemark: add LONGTERM_BENCHMARK test " Pingfan Liu
  2 siblings, 1 reply; 9+ messages in thread
From: Pingfan Liu @ 2019-06-13 10:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Pingfan Liu, Ira Weiny, Andrew Morton, Mike Rapoport,
	Dan Williams, Matthew Wilcox, John Hubbard, Aneesh Kumar K.V,
	Keith Busch, Christoph Hellwig, Shuah Khan, linux-kernel

To better reflect the held state of pages and make code self-explaining,
rename nr as nr_pinned.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-kernel@vger.kernel.org
---
 mm/gup.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index f173fcb..766ae54 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2216,7 +2216,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
 			unsigned int gup_flags, struct page **pages)
 {
 	unsigned long addr, len, end;
-	int nr = 0, ret = 0;
+	int nr_pinned = 0, ret = 0;
 
 	start &= PAGE_MASK;
 	addr = start;
@@ -2231,25 +2231,25 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
 
 	if (gup_fast_permitted(start, nr_pages)) {
 		local_irq_disable();
-		gup_pgd_range(addr, end, gup_flags, pages, &nr);
+		gup_pgd_range(addr, end, gup_flags, pages, &nr_pinned);
 		local_irq_enable();
-		ret = nr;
+		ret = nr_pinned;
 	}
 
-	if (nr < nr_pages) {
+	if (nr_pinned < nr_pages) {
 		/* Try to get the remaining pages with get_user_pages */
-		start += nr << PAGE_SHIFT;
-		pages += nr;
+		start += nr_pinned << PAGE_SHIFT;
+		pages += nr_pinned;
 
-		ret = __gup_longterm_unlocked(start, nr_pages - nr,
+		ret = __gup_longterm_unlocked(start, nr_pages - nr_pinned,
 					      gup_flags, pages);
 
 		/* Have to be a bit careful with return values */
-		if (nr > 0) {
+		if (nr_pinned > 0) {
 			if (ret < 0)
-				ret = nr;
+				ret = nr_pinned;
 			else
-				ret += nr;
+				ret += nr_pinned;
 		}
 	}
 
-- 
2.7.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCHv4 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path
  2019-06-13 10:44 [PATCHv4 0/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path Pingfan Liu
  2019-06-13 10:45 ` [PATCHv4 1/3] mm/gup: rename nr as nr_pinned in get_user_pages_fast() Pingfan Liu
@ 2019-06-13 10:45 ` Pingfan Liu
  2019-06-13 21:39   ` Ira Weiny
  2019-06-13 10:45 ` [PATCHv4 3/3] mm/gup_benchemark: add LONGTERM_BENCHMARK test " Pingfan Liu
  2 siblings, 1 reply; 9+ messages in thread
From: Pingfan Liu @ 2019-06-13 10:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Pingfan Liu, Ira Weiny, Andrew Morton, Mike Rapoport,
	Dan Williams, Matthew Wilcox, John Hubbard, Aneesh Kumar K.V,
	Keith Busch, Christoph Hellwig, Shuah Khan, linux-kernel

FOLL_LONGTERM suggests a pin which is going to be given to hardware and
can't move. It would truncate CMA permanently and should be excluded.

FOLL_LONGTERM has already been checked in the slow path, but not checked in
the fast path, which means a possible leak of CMA page to longterm pinned
requirement through this crack.

Place a check in gup_pte_range() in the fast path.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-kernel@vger.kernel.org
---
 mm/gup.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index 766ae54..de1b03f 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1757,6 +1757,14 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 		page = pte_page(pte);
 
+		/*
+		 * FOLL_LONGTERM suggests a pin given to hardware. Prevent it
+		 * from truncating CMA area
+		 */
+		if (unlikely(flags & FOLL_LONGTERM) &&
+			is_migrate_cma_page(page))
+			goto pte_unmap;
+
 		head = try_get_compound_head(page, 1);
 		if (!head)
 			goto pte_unmap;
@@ -1900,6 +1908,12 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);
 
+	if (unlikely(flags & FOLL_LONGTERM) &&
+		is_migrate_cma_page(page)) {
+		*nr -= refs;
+		return 0;
+	}
+
 	head = try_get_compound_head(pmd_page(orig), refs);
 	if (!head) {
 		*nr -= refs;
@@ -1941,6 +1955,12 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);
 
+	if (unlikely(flags & FOLL_LONGTERM) &&
+		is_migrate_cma_page(page)) {
+		*nr -= refs;
+		return 0;
+	}
+
 	head = try_get_compound_head(pud_page(orig), refs);
 	if (!head) {
 		*nr -= refs;
@@ -1978,6 +1998,12 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);
 
+	if (unlikely(flags & FOLL_LONGTERM) &&
+		is_migrate_cma_page(page)) {
+		*nr -= refs;
+		return 0;
+	}
+
 	head = try_get_compound_head(pgd_page(orig), refs);
 	if (!head) {
 		*nr -= refs;
-- 
2.7.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCHv4 3/3] mm/gup_benchemark: add LONGTERM_BENCHMARK test in gup fast path
  2019-06-13 10:44 [PATCHv4 0/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path Pingfan Liu
  2019-06-13 10:45 ` [PATCHv4 1/3] mm/gup: rename nr as nr_pinned in get_user_pages_fast() Pingfan Liu
  2019-06-13 10:45 ` [PATCHv4 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path Pingfan Liu
@ 2019-06-13 10:45 ` Pingfan Liu
  2019-06-13 21:42   ` Ira Weiny
  2 siblings, 1 reply; 9+ messages in thread
From: Pingfan Liu @ 2019-06-13 10:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Pingfan Liu, Ira Weiny, Andrew Morton, Mike Rapoport,
	Dan Williams, Matthew Wilcox, John Hubbard, Aneesh Kumar K.V,
	Keith Busch, Christoph Hellwig, Shuah Khan, linux-kernel

Introduce a GUP_LONGTERM_BENCHMARK ioctl to test longterm pin in gup fast
path.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-kernel@vger.kernel.org
---
 mm/gup_benchmark.c                         | 11 +++++++++--
 tools/testing/selftests/vm/gup_benchmark.c | 10 +++++++---
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c
index 7dd602d..83f3378 100644
--- a/mm/gup_benchmark.c
+++ b/mm/gup_benchmark.c
@@ -6,8 +6,9 @@
 #include <linux/debugfs.h>
 
 #define GUP_FAST_BENCHMARK	_IOWR('g', 1, struct gup_benchmark)
-#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
-#define GUP_BENCHMARK		_IOWR('g', 3, struct gup_benchmark)
+#define GUP_FAST_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
+#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 3, struct gup_benchmark)
+#define GUP_BENCHMARK		_IOWR('g', 4, struct gup_benchmark)
 
 struct gup_benchmark {
 	__u64 get_delta_usec;
@@ -53,6 +54,11 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
 			nr = get_user_pages_fast(addr, nr, gup->flags & 1,
 						 pages + i);
 			break;
+		case GUP_FAST_LONGTERM_BENCHMARK:
+			nr = get_user_pages_fast(addr, nr,
+					(gup->flags & 1) | FOLL_LONGTERM,
+					 pages + i);
+			break;
 		case GUP_LONGTERM_BENCHMARK:
 			nr = get_user_pages(addr, nr,
 					    (gup->flags & 1) | FOLL_LONGTERM,
@@ -96,6 +102,7 @@ static long gup_benchmark_ioctl(struct file *filep, unsigned int cmd,
 
 	switch (cmd) {
 	case GUP_FAST_BENCHMARK:
+	case GUP_FAST_LONGTERM_BENCHMARK:
 	case GUP_LONGTERM_BENCHMARK:
 	case GUP_BENCHMARK:
 		break;
diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c
index c0534e2..ade8acb 100644
--- a/tools/testing/selftests/vm/gup_benchmark.c
+++ b/tools/testing/selftests/vm/gup_benchmark.c
@@ -15,8 +15,9 @@
 #define PAGE_SIZE sysconf(_SC_PAGESIZE)
 
 #define GUP_FAST_BENCHMARK	_IOWR('g', 1, struct gup_benchmark)
-#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
-#define GUP_BENCHMARK		_IOWR('g', 3, struct gup_benchmark)
+#define GUP_FAST_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
+#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 3, struct gup_benchmark)
+#define GUP_BENCHMARK		_IOWR('g', 4, struct gup_benchmark)
 
 struct gup_benchmark {
 	__u64 get_delta_usec;
@@ -37,7 +38,7 @@ int main(int argc, char **argv)
 	char *file = "/dev/zero";
 	char *p;
 
-	while ((opt = getopt(argc, argv, "m:r:n:f:tTLUSH")) != -1) {
+	while ((opt = getopt(argc, argv, "m:r:n:f:tTlLUSH")) != -1) {
 		switch (opt) {
 		case 'm':
 			size = atoi(optarg) * MB;
@@ -54,6 +55,9 @@ int main(int argc, char **argv)
 		case 'T':
 			thp = 0;
 			break;
+		case 'l':
+			cmd = GUP_FAST_LONGTERM_BENCHMARK;
+			break;
 		case 'L':
 			cmd = GUP_LONGTERM_BENCHMARK;
 			break;
-- 
2.7.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCHv4 1/3] mm/gup: rename nr as nr_pinned in get_user_pages_fast()
  2019-06-13 10:45 ` [PATCHv4 1/3] mm/gup: rename nr as nr_pinned in get_user_pages_fast() Pingfan Liu
@ 2019-06-13 21:28   ` Ira Weiny
  0 siblings, 0 replies; 9+ messages in thread
From: Ira Weiny @ 2019-06-13 21:28 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-mm, Andrew Morton, Mike Rapoport, Dan Williams,
	Matthew Wilcox, John Hubbard, Aneesh Kumar K.V, Keith Busch,
	Christoph Hellwig, Shuah Khan, linux-kernel

On Thu, Jun 13, 2019 at 06:45:00PM +0800, Pingfan Liu wrote:
> To better reflect the held state of pages and make code self-explaining,
> rename nr as nr_pinned.
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: linux-kernel@vger.kernel.org
> ---
>  mm/gup.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index f173fcb..766ae54 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2216,7 +2216,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
>  			unsigned int gup_flags, struct page **pages)
>  {
>  	unsigned long addr, len, end;
> -	int nr = 0, ret = 0;
> +	int nr_pinned = 0, ret = 0;
>  
>  	start &= PAGE_MASK;
>  	addr = start;
> @@ -2231,25 +2231,25 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
>  
>  	if (gup_fast_permitted(start, nr_pages)) {
>  		local_irq_disable();
> -		gup_pgd_range(addr, end, gup_flags, pages, &nr);
> +		gup_pgd_range(addr, end, gup_flags, pages, &nr_pinned);
>  		local_irq_enable();
> -		ret = nr;
> +		ret = nr_pinned;
>  	}
>  
> -	if (nr < nr_pages) {
> +	if (nr_pinned < nr_pages) {
>  		/* Try to get the remaining pages with get_user_pages */
> -		start += nr << PAGE_SHIFT;
> -		pages += nr;
> +		start += nr_pinned << PAGE_SHIFT;
> +		pages += nr_pinned;
>  
> -		ret = __gup_longterm_unlocked(start, nr_pages - nr,
> +		ret = __gup_longterm_unlocked(start, nr_pages - nr_pinned,
>  					      gup_flags, pages);
>  
>  		/* Have to be a bit careful with return values */
> -		if (nr > 0) {
> +		if (nr_pinned > 0) {
>  			if (ret < 0)
> -				ret = nr;
> +				ret = nr_pinned;
>  			else
> -				ret += nr;
> +				ret += nr_pinned;
>  		}
>  	}
>  
> -- 
> 2.7.5
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHv4 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path
  2019-06-13 10:45 ` [PATCHv4 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path Pingfan Liu
@ 2019-06-13 21:39   ` Ira Weiny
  2019-06-14 15:24     ` Pingfan Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Ira Weiny @ 2019-06-13 21:39 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-mm, Andrew Morton, Mike Rapoport, Dan Williams,
	Matthew Wilcox, John Hubbard, Aneesh Kumar K.V, Keith Busch,
	Christoph Hellwig, Shuah Khan, linux-kernel

On Thu, Jun 13, 2019 at 06:45:01PM +0800, Pingfan Liu wrote:
> FOLL_LONGTERM suggests a pin which is going to be given to hardware and
> can't move. It would truncate CMA permanently and should be excluded.
> 
> FOLL_LONGTERM has already been checked in the slow path, but not checked in
> the fast path, which means a possible leak of CMA page to longterm pinned
> requirement through this crack.
> 
> Place a check in gup_pte_range() in the fast path.
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: linux-kernel@vger.kernel.org
> ---
>  mm/gup.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 766ae54..de1b03f 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1757,6 +1757,14 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>  		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
>  		page = pte_page(pte);
>  
> +		/*
> +		 * FOLL_LONGTERM suggests a pin given to hardware. Prevent it
> +		 * from truncating CMA area
> +		 */
> +		if (unlikely(flags & FOLL_LONGTERM) &&
> +			is_migrate_cma_page(page))
> +			goto pte_unmap;
> +
>  		head = try_get_compound_head(page, 1);
>  		if (!head)
>  			goto pte_unmap;
> @@ -1900,6 +1908,12 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
>  		refs++;
>  	} while (addr += PAGE_SIZE, addr != end);
>  
> +	if (unlikely(flags & FOLL_LONGTERM) &&
> +		is_migrate_cma_page(page)) {
> +		*nr -= refs;
> +		return 0;
> +	}
> +

Why can't we place this check before the while loop and skip subtracting the
page count?

Can is_migrate_cma_page() operate on any "subpage" of a compound page? 

Here this calls is_magrate_cma_page() on the tail page of the compound page.

I'm not an expert on compound pages nor cma handling so is this ok?

It seems like you need to call is_migrate_cma_page() on each page within the
while loop?

>  	head = try_get_compound_head(pmd_page(orig), refs);
>  	if (!head) {
>  		*nr -= refs;
> @@ -1941,6 +1955,12 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
>  		refs++;
>  	} while (addr += PAGE_SIZE, addr != end);
>  
> +	if (unlikely(flags & FOLL_LONGTERM) &&
> +		is_migrate_cma_page(page)) {
> +		*nr -= refs;
> +		return 0;
> +	}
> +

Same comment here.

>  	head = try_get_compound_head(pud_page(orig), refs);
>  	if (!head) {
>  		*nr -= refs;
> @@ -1978,6 +1998,12 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
>  		refs++;
>  	} while (addr += PAGE_SIZE, addr != end);
>  
> +	if (unlikely(flags & FOLL_LONGTERM) &&
> +		is_migrate_cma_page(page)) {
> +		*nr -= refs;
> +		return 0;
> +	}
> +

And here.

Ira

>  	head = try_get_compound_head(pgd_page(orig), refs);
>  	if (!head) {
>  		*nr -= refs;
> -- 
> 2.7.5
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHv4 3/3] mm/gup_benchemark: add LONGTERM_BENCHMARK test in gup fast path
  2019-06-13 10:45 ` [PATCHv4 3/3] mm/gup_benchemark: add LONGTERM_BENCHMARK test " Pingfan Liu
@ 2019-06-13 21:42   ` Ira Weiny
  2019-06-13 21:49     ` Ira Weiny
  0 siblings, 1 reply; 9+ messages in thread
From: Ira Weiny @ 2019-06-13 21:42 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-mm, Andrew Morton, Mike Rapoport, Dan Williams,
	Matthew Wilcox, John Hubbard, Aneesh Kumar K.V, Keith Busch,
	Christoph Hellwig, Shuah Khan, linux-kernel

On Thu, Jun 13, 2019 at 06:45:02PM +0800, Pingfan Liu wrote:
> Introduce a GUP_LONGTERM_BENCHMARK ioctl to test longterm pin in gup fast
> path.
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: linux-kernel@vger.kernel.org
> ---
>  mm/gup_benchmark.c                         | 11 +++++++++--
>  tools/testing/selftests/vm/gup_benchmark.c | 10 +++++++---
>  2 files changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c
> index 7dd602d..83f3378 100644
> --- a/mm/gup_benchmark.c
> +++ b/mm/gup_benchmark.c
> @@ -6,8 +6,9 @@
>  #include <linux/debugfs.h>
>  
>  #define GUP_FAST_BENCHMARK	_IOWR('g', 1, struct gup_benchmark)
> -#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
> -#define GUP_BENCHMARK		_IOWR('g', 3, struct gup_benchmark)
> +#define GUP_FAST_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
> +#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 3, struct gup_benchmark)
> +#define GUP_BENCHMARK		_IOWR('g', 4, struct gup_benchmark)

But I really like this addition!  Thanks!

But why not just add GUP_FAST_LONGTERM_BENCHMARK to the end of this list (value
4)?  I know the user space test program is probably expected to be lock step
with this code but it seems odd to redefine GUP_LONGTERM_BENCHMARK and
GUP_BENCHMARK with this change.

Ira

>  
>  struct gup_benchmark {
>  	__u64 get_delta_usec;
> @@ -53,6 +54,11 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
>  			nr = get_user_pages_fast(addr, nr, gup->flags & 1,
>  						 pages + i);
>  			break;
> +		case GUP_FAST_LONGTERM_BENCHMARK:
> +			nr = get_user_pages_fast(addr, nr,
> +					(gup->flags & 1) | FOLL_LONGTERM,
> +					 pages + i);
> +			break;
>  		case GUP_LONGTERM_BENCHMARK:
>  			nr = get_user_pages(addr, nr,
>  					    (gup->flags & 1) | FOLL_LONGTERM,
> @@ -96,6 +102,7 @@ static long gup_benchmark_ioctl(struct file *filep, unsigned int cmd,
>  
>  	switch (cmd) {
>  	case GUP_FAST_BENCHMARK:
> +	case GUP_FAST_LONGTERM_BENCHMARK:
>  	case GUP_LONGTERM_BENCHMARK:
>  	case GUP_BENCHMARK:
>  		break;
> diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c
> index c0534e2..ade8acb 100644
> --- a/tools/testing/selftests/vm/gup_benchmark.c
> +++ b/tools/testing/selftests/vm/gup_benchmark.c
> @@ -15,8 +15,9 @@
>  #define PAGE_SIZE sysconf(_SC_PAGESIZE)
>  
>  #define GUP_FAST_BENCHMARK	_IOWR('g', 1, struct gup_benchmark)
> -#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
> -#define GUP_BENCHMARK		_IOWR('g', 3, struct gup_benchmark)
> +#define GUP_FAST_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
> +#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 3, struct gup_benchmark)
> +#define GUP_BENCHMARK		_IOWR('g', 4, struct gup_benchmark)
>  
>  struct gup_benchmark {
>  	__u64 get_delta_usec;
> @@ -37,7 +38,7 @@ int main(int argc, char **argv)
>  	char *file = "/dev/zero";
>  	char *p;
>  
> -	while ((opt = getopt(argc, argv, "m:r:n:f:tTLUSH")) != -1) {
> +	while ((opt = getopt(argc, argv, "m:r:n:f:tTlLUSH")) != -1) {
>  		switch (opt) {
>  		case 'm':
>  			size = atoi(optarg) * MB;
> @@ -54,6 +55,9 @@ int main(int argc, char **argv)
>  		case 'T':
>  			thp = 0;
>  			break;
> +		case 'l':
> +			cmd = GUP_FAST_LONGTERM_BENCHMARK;
> +			break;
>  		case 'L':
>  			cmd = GUP_LONGTERM_BENCHMARK;
>  			break;
> -- 
> 2.7.5
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHv4 3/3] mm/gup_benchemark: add LONGTERM_BENCHMARK test in gup fast path
  2019-06-13 21:42   ` Ira Weiny
@ 2019-06-13 21:49     ` Ira Weiny
  0 siblings, 0 replies; 9+ messages in thread
From: Ira Weiny @ 2019-06-13 21:49 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-mm, Andrew Morton, Mike Rapoport, Dan Williams,
	Matthew Wilcox, John Hubbard, Aneesh Kumar K.V, Keith Busch,
	Christoph Hellwig, Shuah Khan, linux-kernel

On Thu, Jun 13, 2019 at 02:42:47PM -0700, 'Ira Weiny' wrote:
> On Thu, Jun 13, 2019 at 06:45:02PM +0800, Pingfan Liu wrote:
> > Introduce a GUP_LONGTERM_BENCHMARK ioctl to test longterm pin in gup fast
> > path.
> > 
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: Ira Weiny <ira.weiny@intel.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Mike Rapoport <rppt@linux.ibm.com>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> > Cc: Keith Busch <keith.busch@intel.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Shuah Khan <shuah@kernel.org>
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >  mm/gup_benchmark.c                         | 11 +++++++++--
> >  tools/testing/selftests/vm/gup_benchmark.c | 10 +++++++---
> >  2 files changed, 16 insertions(+), 5 deletions(-)
> > 
> > diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c
> > index 7dd602d..83f3378 100644
> > --- a/mm/gup_benchmark.c
> > +++ b/mm/gup_benchmark.c
> > @@ -6,8 +6,9 @@
> >  #include <linux/debugfs.h>
> >  
> >  #define GUP_FAST_BENCHMARK	_IOWR('g', 1, struct gup_benchmark)
> > -#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
> > -#define GUP_BENCHMARK		_IOWR('g', 3, struct gup_benchmark)
> > +#define GUP_FAST_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
> > +#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 3, struct gup_benchmark)
> > +#define GUP_BENCHMARK		_IOWR('g', 4, struct gup_benchmark)
> 
> But I really like this addition!  Thanks!
> 
> But why not just add GUP_FAST_LONGTERM_BENCHMARK to the end of this list (value
> 4)?  I know the user space test program is probably expected to be lock step
> with this code but it seems odd to redefine GUP_LONGTERM_BENCHMARK and
> GUP_BENCHMARK with this change.

I see that Andrew pull this change.  So if others don't think this renumbering
is an issue feel free to add my:

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

> 
> Ira
> 
> >  
> >  struct gup_benchmark {
> >  	__u64 get_delta_usec;
> > @@ -53,6 +54,11 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
> >  			nr = get_user_pages_fast(addr, nr, gup->flags & 1,
> >  						 pages + i);
> >  			break;
> > +		case GUP_FAST_LONGTERM_BENCHMARK:
> > +			nr = get_user_pages_fast(addr, nr,
> > +					(gup->flags & 1) | FOLL_LONGTERM,
> > +					 pages + i);
> > +			break;
> >  		case GUP_LONGTERM_BENCHMARK:
> >  			nr = get_user_pages(addr, nr,
> >  					    (gup->flags & 1) | FOLL_LONGTERM,
> > @@ -96,6 +102,7 @@ static long gup_benchmark_ioctl(struct file *filep, unsigned int cmd,
> >  
> >  	switch (cmd) {
> >  	case GUP_FAST_BENCHMARK:
> > +	case GUP_FAST_LONGTERM_BENCHMARK:
> >  	case GUP_LONGTERM_BENCHMARK:
> >  	case GUP_BENCHMARK:
> >  		break;
> > diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c
> > index c0534e2..ade8acb 100644
> > --- a/tools/testing/selftests/vm/gup_benchmark.c
> > +++ b/tools/testing/selftests/vm/gup_benchmark.c
> > @@ -15,8 +15,9 @@
> >  #define PAGE_SIZE sysconf(_SC_PAGESIZE)
> >  
> >  #define GUP_FAST_BENCHMARK	_IOWR('g', 1, struct gup_benchmark)
> > -#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
> > -#define GUP_BENCHMARK		_IOWR('g', 3, struct gup_benchmark)
> > +#define GUP_FAST_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
> > +#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 3, struct gup_benchmark)
> > +#define GUP_BENCHMARK		_IOWR('g', 4, struct gup_benchmark)
> >  
> >  struct gup_benchmark {
> >  	__u64 get_delta_usec;
> > @@ -37,7 +38,7 @@ int main(int argc, char **argv)
> >  	char *file = "/dev/zero";
> >  	char *p;
> >  
> > -	while ((opt = getopt(argc, argv, "m:r:n:f:tTLUSH")) != -1) {
> > +	while ((opt = getopt(argc, argv, "m:r:n:f:tTlLUSH")) != -1) {
> >  		switch (opt) {
> >  		case 'm':
> >  			size = atoi(optarg) * MB;
> > @@ -54,6 +55,9 @@ int main(int argc, char **argv)
> >  		case 'T':
> >  			thp = 0;
> >  			break;
> > +		case 'l':
> > +			cmd = GUP_FAST_LONGTERM_BENCHMARK;
> > +			break;
> >  		case 'L':
> >  			cmd = GUP_LONGTERM_BENCHMARK;
> >  			break;
> > -- 
> > 2.7.5
> > 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHv4 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path
  2019-06-13 21:39   ` Ira Weiny
@ 2019-06-14 15:24     ` Pingfan Liu
  0 siblings, 0 replies; 9+ messages in thread
From: Pingfan Liu @ 2019-06-14 15:24 UTC (permalink / raw)
  To: Ira Weiny
  Cc: linux-mm, Andrew Morton, Mike Rapoport, Dan Williams,
	Matthew Wilcox, John Hubbard, Aneesh Kumar K.V, Keith Busch,
	Christoph Hellwig, Shuah Khan, LKML, mike.kravetz,
	David Rientjes

Cc Mike, David, who is an expert of hugetlb and thp

On Fri, Jun 14, 2019 at 5:37 AM Ira Weiny <ira.weiny@intel.com> wrote:
>
> On Thu, Jun 13, 2019 at 06:45:01PM +0800, Pingfan Liu wrote:
> > FOLL_LONGTERM suggests a pin which is going to be given to hardware and
> > can't move. It would truncate CMA permanently and should be excluded.
> >
> > FOLL_LONGTERM has already been checked in the slow path, but not checked in
> > the fast path, which means a possible leak of CMA page to longterm pinned
> > requirement through this crack.
> >
> > Place a check in gup_pte_range() in the fast path.
> >
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: Ira Weiny <ira.weiny@intel.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Mike Rapoport <rppt@linux.ibm.com>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> > Cc: Keith Busch <keith.busch@intel.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Shuah Khan <shuah@kernel.org>
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >  mm/gup.c | 26 ++++++++++++++++++++++++++
> >  1 file changed, 26 insertions(+)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 766ae54..de1b03f 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -1757,6 +1757,14 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
> >               VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
> >               page = pte_page(pte);
> >
> > +             /*
> > +              * FOLL_LONGTERM suggests a pin given to hardware. Prevent it
> > +              * from truncating CMA area
> > +              */
> > +             if (unlikely(flags & FOLL_LONGTERM) &&
> > +                     is_migrate_cma_page(page))
> > +                     goto pte_unmap;
> > +
> >               head = try_get_compound_head(page, 1);
> >               if (!head)
> >                       goto pte_unmap;
> > @@ -1900,6 +1908,12 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
> >               refs++;
> >       } while (addr += PAGE_SIZE, addr != end);
> >
> > +     if (unlikely(flags & FOLL_LONGTERM) &&
> > +             is_migrate_cma_page(page)) {
> > +             *nr -= refs;
> > +             return 0;
> > +     }
> > +
>
> Why can't we place this check before the while loop and skip subtracting the
> page count?
Yes, that will be better.

>
> Can is_migrate_cma_page() operate on any "subpage" of a compound page?
For gigantic page, __alloc_gigantic_page() allocate from
MIGRATE_MOVABLE pageblock. For page order < MAX_ORDER, pages are
allocated from either free_list[MIGRATE_MOVABLE] or
free_list[MIGRATE_CMA]. So all subpage have the same migrate type.

Thanks,
  Pingfan
>
> Here this calls is_magrate_cma_page() on the tail page of the compound page.
>
> I'm not an expert on compound pages nor cma handling so is this ok?
>
> It seems like you need to call is_migrate_cma_page() on each page within the
> while loop?
>
> >       head = try_get_compound_head(pmd_page(orig), refs);
> >       if (!head) {
> >               *nr -= refs;
> > @@ -1941,6 +1955,12 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
> >               refs++;
> >       } while (addr += PAGE_SIZE, addr != end);
> >
> > +     if (unlikely(flags & FOLL_LONGTERM) &&
> > +             is_migrate_cma_page(page)) {
> > +             *nr -= refs;
> > +             return 0;
> > +     }
> > +
>
> Same comment here.
>
> >       head = try_get_compound_head(pud_page(orig), refs);
> >       if (!head) {
> >               *nr -= refs;
> > @@ -1978,6 +1998,12 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
> >               refs++;
> >       } while (addr += PAGE_SIZE, addr != end);
> >
> > +     if (unlikely(flags & FOLL_LONGTERM) &&
> > +             is_migrate_cma_page(page)) {
> > +             *nr -= refs;
> > +             return 0;
> > +     }
> > +
>
> And here.
>
> Ira
>
> >       head = try_get_compound_head(pgd_page(orig), refs);
> >       if (!head) {
> >               *nr -= refs;
> > --
> > 2.7.5
> >

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-06-14 15:24 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-13 10:44 [PATCHv4 0/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path Pingfan Liu
2019-06-13 10:45 ` [PATCHv4 1/3] mm/gup: rename nr as nr_pinned in get_user_pages_fast() Pingfan Liu
2019-06-13 21:28   ` Ira Weiny
2019-06-13 10:45 ` [PATCHv4 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path Pingfan Liu
2019-06-13 21:39   ` Ira Weiny
2019-06-14 15:24     ` Pingfan Liu
2019-06-13 10:45 ` [PATCHv4 3/3] mm/gup_benchemark: add LONGTERM_BENCHMARK test " Pingfan Liu
2019-06-13 21:42   ` Ira Weiny
2019-06-13 21:49     ` Ira Weiny

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).