linux-mm.kvack.org archive mirror
* [PATCHv5 0/3] fix omission of check on FOLL_LONGTERM in gup fast path
@ 2020-02-28 11:32 Pingfan Liu
  2020-02-28 11:32 ` [PATCHv5 1/3] mm/gup: rename nr as nr_pinned in internal_get_user_pages_fast() Pingfan Liu
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Pingfan Liu @ 2020-02-28 11:32 UTC (permalink / raw)
  To: linux-mm
  Cc: Pingfan Liu, Ira Weiny, Andrew Morton, Mike Rapoport,
	Dan Williams, Matthew Wilcox, John Hubbard, Aneesh Kumar K.V,
	Keith Busch, Christoph Hellwig, Shuah Khan, linux-kernel

The previous series (V4),
https://lore.kernel.org/patchwork/project/lkml/list/?series=397950
was dropped from the mm tree because it conflicted with "RFC: switch the
remaining architectures to use generic GUP" [1].

I have rebased it and am sending it out as V5.
V4 -> V5: moved the patched code to follow the upstream code changes.

[1]: https://lore.kernel.org/linux-mm/20190601074959.14036-1-hch@lst.de/ 

Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org

---
Pingfan Liu (3):
  mm/gup: rename nr as nr_pinned in get_user_pages_fast()
  mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path
  mm/gup_benchemark: add LONGTERM_BENCHMARK test in gup fast path

 mm/gup.c                                   | 46 +++++++++++++++++++-----------
 mm/gup_benchmark.c                         |  7 +++++
 tools/testing/selftests/vm/gup_benchmark.c |  6 +++-
 3 files changed, 41 insertions(+), 18 deletions(-)

-- 
2.7.5




* [PATCHv5 1/3] mm/gup: rename nr as nr_pinned in internal_get_user_pages_fast()
  2020-02-28 11:32 [PATCHv5 0/3] fix omission of check on FOLL_LONGTERM in gup fast path Pingfan Liu
@ 2020-02-28 11:32 ` Pingfan Liu
  2020-02-28 11:32 ` [PATCHv5 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path Pingfan Liu
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Pingfan Liu @ 2020-02-28 11:32 UTC (permalink / raw)
  To: linux-mm
  Cc: Pingfan Liu, Ira Weiny, Andrew Morton, Mike Rapoport,
	Dan Williams, Matthew Wilcox, John Hubbard, Aneesh Kumar K.V,
	Keith Busch, Christoph Hellwig, Shuah Khan, linux-kernel

To better reflect the held state of pages and make code self-explaining,
rename nr as nr_pinned.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/gup.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 1b521e0..cd8075e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2432,7 +2432,7 @@ static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
 					struct page **pages)
 {
 	unsigned long addr, len, end;
-	int nr = 0, ret = 0;
+	int nr_pinned = 0, ret = 0;
 
 	if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
 				       FOLL_FORCE | FOLL_PIN)))
@@ -2451,25 +2451,25 @@ static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
 	if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) &&
 	    gup_fast_permitted(start, end)) {
 		local_irq_disable();
-		gup_pgd_range(addr, end, gup_flags, pages, &nr);
+		gup_pgd_range(addr, end, gup_flags, pages, &nr_pinned);
 		local_irq_enable();
-		ret = nr;
+		ret = nr_pinned;
 	}
 
-	if (nr < nr_pages) {
+	if (nr_pinned < nr_pages) {
 		/* Try to get the remaining pages with get_user_pages */
-		start += nr << PAGE_SHIFT;
-		pages += nr;
+		start += nr_pinned << PAGE_SHIFT;
+		pages += nr_pinned;
 
-		ret = __gup_longterm_unlocked(start, nr_pages - nr,
+		ret = __gup_longterm_unlocked(start, nr_pages - nr_pinned,
 					      gup_flags, pages);
 
 		/* Have to be a bit careful with return values */
-		if (nr > 0) {
+		if (nr_pinned > 0) {
 			if (ret < 0)
-				ret = nr;
+				ret = nr_pinned;
 			else
-				ret += nr;
+				ret += nr_pinned;
 		}
 	}
 
-- 
2.7.5
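
The "Have to be a bit careful with return values" branch in the hunk above can be read as a small pure function. The following is a minimal userspace sketch of that merge, not kernel code; the name merge_gup_result() and its parameters are hypothetical and exist only for illustration.

```c
#include <assert.h>

/*
 * Userspace model (not kernel code) of the return-value merge at the end of
 * internal_get_user_pages_fast().
 *
 * nr_pinned: pages the fast path already pinned (>= 0)
 * slow_ret:  result of the slow path for the remaining pages; a negative
 *            errno on failure, otherwise the number of extra pages pinned
 */
int merge_gup_result(int nr_pinned, int slow_ret)
{
	assert(nr_pinned >= 0);

	if (nr_pinned > 0) {
		if (slow_ret < 0)
			return nr_pinned;	/* report the partial success */
		return slow_ret + nr_pinned;	/* fast-path + slow-path pages */
	}
	return slow_ret;	/* the fast path pinned nothing */
}
```

Under this model, an error from the slow path is only visible to the caller when the fast path pinned no pages at all; otherwise the caller sees a short count.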




* [PATCHv5 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path
  2020-02-28 11:32 [PATCHv5 0/3] fix omission of check on FOLL_LONGTERM in gup fast path Pingfan Liu
  2020-02-28 11:32 ` [PATCHv5 1/3] mm/gup: rename nr as nr_pinned in internal_get_user_pages_fast() Pingfan Liu
@ 2020-02-28 11:32 ` Pingfan Liu
  2020-02-28 13:44   ` Jason Gunthorpe
                     ` (2 more replies)
  2020-02-28 11:32 ` [PATCHv5 3/3] mm/gup_benchemark: add LONGTERM_BENCHMARK test " Pingfan Liu
  2020-03-02 23:42 ` [PATCHv5 0/3] fix omission of check on FOLL_LONGTERM " John Hubbard
  3 siblings, 3 replies; 15+ messages in thread
From: Pingfan Liu @ 2020-02-28 11:32 UTC (permalink / raw)
  To: linux-mm
  Cc: Pingfan Liu, Ira Weiny, Andrew Morton, Mike Rapoport,
	Dan Williams, Matthew Wilcox, John Hubbard, Aneesh Kumar K.V,
	Keith Busch, Christoph Hellwig, Shuah Khan, linux-kernel

FOLL_LONGTERM indicates a pin that is going to be handed to hardware and
cannot be moved. Such a pin would truncate a CMA area permanently, so CMA
pages should be excluded.

The slow path already checks FOLL_LONGTERM, but the fast path does not,
so a CMA page can leak into a long-term pin through this gap.

Place a check in try_get_compound_head() in the fast path.

A note about the check:
All subpages of a huge page have the same migrate type, because the huge
page is allocated either from a free_list[] or by alloc_contig_range() with
the MIGRATE_MOVABLE parameter. So it is enough to check a single subpage
with is_migrate_cma_page(subpage).

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/gup.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index cd8075e..f0d6804 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -33,9 +33,21 @@ struct follow_page_context {
  * Return the compound head page with ref appropriately incremented,
  * or NULL if that failed.
  */
-static inline struct page *try_get_compound_head(struct page *page, int refs)
+static inline struct page *try_get_compound_head(struct page *page, int refs,
+	unsigned int flags)
 {
-	struct page *head = compound_head(page);
+	struct page *head;
+
+	/*
+	 * Huge page's subpages have the same migrate type due to either
+	 * allocation from a free_list[] or alloc_contig_range() with param
+	 * MIGRATE_MOVABLE. So it is enough to check on a single subpage.
+	 */
+	if (unlikely(flags & FOLL_LONGTERM) &&
+		is_migrate_cma_page(page))
+		return NULL;
+
+	head = compound_head(page);
 
 	if (WARN_ON_ONCE(page_ref_count(head) < 0))
 		return NULL;
@@ -1908,7 +1920,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 		page = pte_page(pte);
 
-		head = try_get_compound_head(page, 1);
+		head = try_get_compound_head(page, 1, flags);
 		if (!head)
 			goto pte_unmap;
 
@@ -2083,7 +2095,7 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 	page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
-	head = try_get_compound_head(head, refs);
+	head = try_get_compound_head(head, refs, flags);
 	if (!head)
 		return 0;
 
@@ -2142,7 +2154,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 	page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
-	head = try_get_compound_head(pmd_page(orig), refs);
+	head = try_get_compound_head(pmd_page(orig), refs, flags);
 	if (!head)
 		return 0;
 
@@ -2174,7 +2186,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 	page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
-	head = try_get_compound_head(pud_page(orig), refs);
+	head = try_get_compound_head(pud_page(orig), refs, flags);
 	if (!head)
 		return 0;
 
@@ -2203,7 +2215,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 	page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
-	head = try_get_compound_head(pgd_page(orig), refs);
+	head = try_get_compound_head(pgd_page(orig), refs, flags);
 	if (!head)
 		return 0;
 
-- 
2.7.5
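
The single-subpage check in the commit message relies on the migrate type being recorded per pageblock rather than per page. The following is a minimal userspace sketch of that lookup and of the fast-path gate, not kernel code. Every MODEL_* constant, the table contents, and both function names are hypothetical, and the sketch only covers subpages that share a pageblock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model constants, not the kernel's configuration. */
#define MODEL_PAGEBLOCK_ORDER	9	/* 512 base pages per pageblock */
#define MODEL_NR_PAGEBLOCKS	4

enum model_migratetype { MODEL_MOVABLE, MODEL_CMA };

/* One migrate type is recorded per pageblock, not per page. */
static const enum model_migratetype model_pageblock_type[MODEL_NR_PAGEBLOCKS] = {
	MODEL_MOVABLE, MODEL_CMA, MODEL_CMA, MODEL_MOVABLE,
};

/* Model of is_migrate_cma_page(): look up the pageblock that holds the pfn. */
bool model_is_cma_pfn(unsigned long pfn)
{
	unsigned long block = pfn >> MODEL_PAGEBLOCK_ORDER;

	assert(block < MODEL_NR_PAGEBLOCKS);
	return model_pageblock_type[block] == MODEL_CMA;
}

/* Model of the fast-path gate: refuse a long-term pin of a CMA page. */
bool model_fast_path_may_pin(unsigned long pfn, bool longterm)
{
	return !(longterm && model_is_cma_pfn(pfn));
}
```

Any two pfns that fall into the same pageblock give the same answer here, which is the property the commit message depends on for subpages of one huge page.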




* [PATCHv5 3/3] mm/gup_benchemark: add LONGTERM_BENCHMARK test in gup fast path
  2020-02-28 11:32 [PATCHv5 0/3] fix omission of check on FOLL_LONGTERM in gup fast path Pingfan Liu
  2020-02-28 11:32 ` [PATCHv5 1/3] mm/gup: rename nr as nr_pinned in internal_get_user_pages_fast() Pingfan Liu
  2020-02-28 11:32 ` [PATCHv5 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path Pingfan Liu
@ 2020-02-28 11:32 ` Pingfan Liu
  2020-02-28 15:43   ` Alexander Duyck
  2020-03-02 23:42 ` [PATCHv5 0/3] fix omission of check on FOLL_LONGTERM " John Hubbard
  3 siblings, 1 reply; 15+ messages in thread
From: Pingfan Liu @ 2020-02-28 11:32 UTC (permalink / raw)
  To: linux-mm
  Cc: Pingfan Liu, Ira Weiny, Andrew Morton, Mike Rapoport,
	Dan Williams, Matthew Wilcox, John Hubbard, Aneesh Kumar K.V,
	Keith Busch, Christoph Hellwig, Shuah Khan, linux-kernel

Introduce a GUP_FAST_LONGTERM_BENCHMARK ioctl to test long-term pins in the
gup fast path.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/gup_benchmark.c                         | 7 +++++++
 tools/testing/selftests/vm/gup_benchmark.c | 6 +++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c
index 8dba38e..bf61e7a 100644
--- a/mm/gup_benchmark.c
+++ b/mm/gup_benchmark.c
@@ -8,6 +8,7 @@
 #define GUP_FAST_BENCHMARK	_IOWR('g', 1, struct gup_benchmark)
 #define GUP_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
 #define GUP_BENCHMARK		_IOWR('g', 3, struct gup_benchmark)
+#define GUP_FAST_LONGTERM_BENCHMARK	_IOWR('g', 4, struct gup_benchmark)
 
 struct gup_benchmark {
 	__u64 get_delta_usec;
@@ -57,6 +58,11 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
 			nr = get_user_pages_fast(addr, nr, gup->flags,
 						 pages + i);
 			break;
+		case GUP_FAST_LONGTERM_BENCHMARK:
+			nr = get_user_pages_fast(addr, nr,
+					(gup->flags & 1) | FOLL_LONGTERM,
+					 pages + i);
+			break;
 		case GUP_LONGTERM_BENCHMARK:
 			nr = get_user_pages(addr, nr,
 					    gup->flags | FOLL_LONGTERM,
@@ -103,6 +109,7 @@ static long gup_benchmark_ioctl(struct file *filep, unsigned int cmd,
 
 	switch (cmd) {
 	case GUP_FAST_BENCHMARK:
+	case GUP_FAST_LONGTERM_BENCHMARK:
 	case GUP_LONGTERM_BENCHMARK:
 	case GUP_BENCHMARK:
 		break;
diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c
index 389327e..5a01c538 100644
--- a/tools/testing/selftests/vm/gup_benchmark.c
+++ b/tools/testing/selftests/vm/gup_benchmark.c
@@ -17,6 +17,7 @@
 #define GUP_FAST_BENCHMARK	_IOWR('g', 1, struct gup_benchmark)
 #define GUP_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
 #define GUP_BENCHMARK		_IOWR('g', 3, struct gup_benchmark)
+#define GUP_FAST_LONGTERM_BENCHMARK	_IOWR('g', 4, struct gup_benchmark)
 
 /* Just the flags we need, copied from mm.h: */
 #define FOLL_WRITE	0x01	/* check pte is writable */
@@ -40,7 +41,7 @@ int main(int argc, char **argv)
 	char *file = "/dev/zero";
 	char *p;
 
-	while ((opt = getopt(argc, argv, "m:r:n:f:tTLUwSH")) != -1) {
+	while ((opt = getopt(argc, argv, "m:r:n:f:tTlLUwSH")) != -1) {
 		switch (opt) {
 		case 'm':
 			size = atoi(optarg) * MB;
@@ -57,6 +58,9 @@ int main(int argc, char **argv)
 		case 'T':
 			thp = 0;
 			break;
+		case 'l':
+			cmd = GUP_FAST_LONGTERM_BENCHMARK;
+			break;
 		case 'L':
 			cmd = GUP_LONGTERM_BENCHMARK;
 			break;
-- 
2.7.5




* Re: [PATCHv5 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path
  2020-02-28 11:32 ` [PATCHv5 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path Pingfan Liu
@ 2020-02-28 13:44   ` Jason Gunthorpe
  2020-03-02  2:25     ` Pingfan Liu
  2020-02-28 22:34   ` Ira Weiny
  2020-03-02 23:51   ` John Hubbard
  2 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2020-02-28 13:44 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-mm, Ira Weiny, Andrew Morton, Mike Rapoport, Dan Williams,
	Matthew Wilcox, John Hubbard, Aneesh Kumar K.V, Keith Busch,
	Christoph Hellwig, Shuah Khan, linux-kernel

On Fri, Feb 28, 2020 at 07:32:29PM +0800, Pingfan Liu wrote:
> FOLL_LONGTERM suggests a pin which is going to be given to hardware and
> can't move. It would truncate CMA permanently and should be excluded.
> 
> FOLL_LONGTERM has already been checked in the slow path, but not checked in
> the fast path, which means a possible leak of CMA page to longterm pinned
> requirement through this crack.
> 
> Place a check in try_get_compound_head() in the fast path.
> 
> Some note about the check:
> Huge page's subpages have the same migrate type due to either
> allocation from a free_list[] or alloc_contig_range() with param
> MIGRATE_MOVABLE. So it is enough to check on a single subpage
> by is_migrate_cma_page(subpage)
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Shuah Khan <shuah@kernel.org>
> To: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
>  mm/gup.c | 26 +++++++++++++++++++-------
>  1 file changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index cd8075e..f0d6804 100644
> +++ b/mm/gup.c
> @@ -33,9 +33,21 @@ struct follow_page_context {
>   * Return the compound head page with ref appropriately incremented,
>   * or NULL if that failed.
>   */
> -static inline struct page *try_get_compound_head(struct page *page, int refs)
> +static inline struct page *try_get_compound_head(struct page *page, int refs,
> +	unsigned int flags)
>  {
> -	struct page *head = compound_head(page);
> +	struct page *head;
> +
> +	/*
> +	 * Huge page's subpages have the same migrate type due to either
> +	 * allocation from a free_list[] or alloc_contig_range() with param
> +	 * MIGRATE_MOVABLE. So it is enough to check on a single subpage.
> +	 */
> +	if (unlikely(flags & FOLL_LONGTERM) &&
> +		is_migrate_cma_page(page))
> +		return NULL;

This doesn't seem very good actually.

If I understand properly, if the system has randomly decided to place,
say, an anonymous page in a CMA region when an application did mmap(),
then when the application tries to use this page with a LONGTERM pin
it gets an immediate failure because of the above.

This is not OK - the application should not be subject to random failures
related to long term pins beyond its direct control.

Essentially, failures should only originate from the application using
specific mmap scenarios, not randomly based on something the MM did,
and certainly never for anonymous memory.

I think the correct action here is to trigger migration of the page so
it is not in CMA.

Jason



* Re: [PATCHv5 3/3] mm/gup_benchemark: add LONGTERM_BENCHMARK test in gup fast path
  2020-02-28 11:32 ` [PATCHv5 3/3] mm/gup_benchemark: add LONGTERM_BENCHMARK test " Pingfan Liu
@ 2020-02-28 15:43   ` Alexander Duyck
  2020-03-02  2:38     ` Pingfan Liu
  0 siblings, 1 reply; 15+ messages in thread
From: Alexander Duyck @ 2020-02-28 15:43 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-mm, Ira Weiny, Andrew Morton, Mike Rapoport, Dan Williams,
	Matthew Wilcox, John Hubbard, Aneesh Kumar K.V, Keith Busch,
	Christoph Hellwig, Shuah Khan, LKML

On Fri, Feb 28, 2020 at 3:35 AM Pingfan Liu <kernelfans@gmail.com> wrote:
>
> Introduce a GUP_LONGTERM_BENCHMARK ioctl to test longterm pin in gup fast
> path.

The title of the patch has a typo in it. There is only one 'e' in "benchmark".

> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Shuah Khan <shuah@kernel.org>
> To: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  mm/gup_benchmark.c                         | 7 +++++++
>  tools/testing/selftests/vm/gup_benchmark.c | 6 +++++-
>  2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c
> index 8dba38e..bf61e7a 100644
> --- a/mm/gup_benchmark.c
> +++ b/mm/gup_benchmark.c
> @@ -8,6 +8,7 @@
>  #define GUP_FAST_BENCHMARK     _IOWR('g', 1, struct gup_benchmark)
>  #define GUP_LONGTERM_BENCHMARK _IOWR('g', 2, struct gup_benchmark)
>  #define GUP_BENCHMARK          _IOWR('g', 3, struct gup_benchmark)
> +#define GUP_FAST_LONGTERM_BENCHMARK    _IOWR('g', 4, struct gup_benchmark)
>
>  struct gup_benchmark {
>         __u64 get_delta_usec;
> @@ -57,6 +58,11 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
>                         nr = get_user_pages_fast(addr, nr, gup->flags,
>                                                  pages + i);
>                         break;
> +               case GUP_FAST_LONGTERM_BENCHMARK:
> +                       nr = get_user_pages_fast(addr, nr,
> +                                       (gup->flags & 1) | FOLL_LONGTERM,
> +                                        pages + i);
> +                       break;

If I am not mistaken the mask of gup->flags is redundant. It is
already masked by FOLL_WRITE several lines before this switch
statement.

>                 case GUP_LONGTERM_BENCHMARK:
>                         nr = get_user_pages(addr, nr,
>                                             gup->flags | FOLL_LONGTERM,
> @@ -103,6 +109,7 @@ static long gup_benchmark_ioctl(struct file *filep, unsigned int cmd,
>
>         switch (cmd) {
>         case GUP_FAST_BENCHMARK:
> +       case GUP_FAST_LONGTERM_BENCHMARK:
>         case GUP_LONGTERM_BENCHMARK:
>         case GUP_BENCHMARK:
>                 break;
> diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c
> index 389327e..5a01c538 100644
> --- a/tools/testing/selftests/vm/gup_benchmark.c
> +++ b/tools/testing/selftests/vm/gup_benchmark.c
> @@ -17,6 +17,7 @@
>  #define GUP_FAST_BENCHMARK     _IOWR('g', 1, struct gup_benchmark)
>  #define GUP_LONGTERM_BENCHMARK _IOWR('g', 2, struct gup_benchmark)
>  #define GUP_BENCHMARK          _IOWR('g', 3, struct gup_benchmark)
> +#define GUP_FAST_LONGTERM_BENCHMARK    _IOWR('g', 4, struct gup_benchmark)
>
>  /* Just the flags we need, copied from mm.h: */
>  #define FOLL_WRITE     0x01    /* check pte is writable */
> @@ -40,7 +41,7 @@ int main(int argc, char **argv)
>         char *file = "/dev/zero";
>         char *p;
>
> -       while ((opt = getopt(argc, argv, "m:r:n:f:tTLUwSH")) != -1) {
> +       while ((opt = getopt(argc, argv, "m:r:n:f:tTlLUwSH")) != -1) {
>                 switch (opt) {
>                 case 'm':
>                         size = atoi(optarg) * MB;
> @@ -57,6 +58,9 @@ int main(int argc, char **argv)
>                 case 'T':
>                         thp = 0;
>                         break;
> +               case 'l':
> +                       cmd = GUP_FAST_LONGTERM_BENCHMARK;
> +                       break;
>                 case 'L':
>                         cmd = GUP_LONGTERM_BENCHMARK;
>                         break;
> --
> 2.7.5
>
>
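
The redundancy pointed out above for the "(gup->flags & 1)" mask can be checked with a small userspace sketch. It assumes, as the review comment does, that the flags were already reduced to FOLL_WRITE before the switch statement. FOLL_WRITE uses the value quoted in the selftest hunk; MODEL_FOLL_LONGTERM is a placeholder bit chosen for this sketch, and both function names are hypothetical.

```c
#include <assert.h>

#define FOLL_WRITE		0x01	/* value quoted in the selftest hunk */
/* Placeholder bit for this sketch only; the real value lives in mm.h. */
#define MODEL_FOLL_LONGTERM	0x10000

/* Flags as built by the patch: filter first, then mask with 1 again. */
unsigned int flags_with_extra_mask(unsigned int user_flags)
{
	unsigned int flags = user_flags & FOLL_WRITE;	/* assumed earlier filtering */

	return (flags & 1) | MODEL_FOLL_LONGTERM;
}

/* Flags without the second mask. */
unsigned int flags_without_extra_mask(unsigned int user_flags)
{
	unsigned int flags = user_flags & FOLL_WRITE;	/* assumed earlier filtering */

	return flags | MODEL_FOLL_LONGTERM;
}
```

If the earlier filtering holds, the two functions agree for every input, because masking a value that is already 0 or 1 with 1 changes nothing.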



* Re: [PATCHv5 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path
  2020-02-28 11:32 ` [PATCHv5 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path Pingfan Liu
  2020-02-28 13:44   ` Jason Gunthorpe
@ 2020-02-28 22:34   ` Ira Weiny
  2020-03-02  2:28     ` Pingfan Liu
  2020-03-02 23:51   ` John Hubbard
  2 siblings, 1 reply; 15+ messages in thread
From: Ira Weiny @ 2020-02-28 22:34 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-mm, Andrew Morton, Mike Rapoport, Dan Williams,
	Matthew Wilcox, John Hubbard, Aneesh Kumar K.V, Keith Busch,
	Christoph Hellwig, Shuah Khan, linux-kernel

On Fri, Feb 28, 2020 at 07:32:29PM +0800, Pingfan Liu wrote:
> FOLL_LONGTERM suggests a pin which is going to be given to hardware and
> can't move. It would truncate CMA permanently and should be excluded.

I don't understand what is 'truncated' here?

I generally agree with Jason that this is going to be confusing to the user.

Ira

> 
> FOLL_LONGTERM has already been checked in the slow path, but not checked in
> the fast path, which means a possible leak of CMA page to longterm pinned
> requirement through this crack.
> 
> Place a check in try_get_compound_head() in the fast path.
> 
> Some note about the check:
> Huge page's subpages have the same migrate type due to either
> allocation from a free_list[] or alloc_contig_range() with param
> MIGRATE_MOVABLE. So it is enough to check on a single subpage
> by is_migrate_cma_page(subpage)
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Shuah Khan <shuah@kernel.org>
> To: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  mm/gup.c | 26 +++++++++++++++++++-------
>  1 file changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index cd8075e..f0d6804 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -33,9 +33,21 @@ struct follow_page_context {
>   * Return the compound head page with ref appropriately incremented,
>   * or NULL if that failed.
>   */
> -static inline struct page *try_get_compound_head(struct page *page, int refs)
> +static inline struct page *try_get_compound_head(struct page *page, int refs,
> +	unsigned int flags)
>  {
> -	struct page *head = compound_head(page);
> +	struct page *head;
> +
> +	/*
> +	 * Huge page's subpages have the same migrate type due to either
> +	 * allocation from a free_list[] or alloc_contig_range() with param
> +	 * MIGRATE_MOVABLE. So it is enough to check on a single subpage.
> +	 */
> +	if (unlikely(flags & FOLL_LONGTERM) &&
> +		is_migrate_cma_page(page))
> +		return NULL;
> +
> +	head = compound_head(page);
>  
>  	if (WARN_ON_ONCE(page_ref_count(head) < 0))
>  		return NULL;
> @@ -1908,7 +1920,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>  		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
>  		page = pte_page(pte);
>  
> -		head = try_get_compound_head(page, 1);
> +		head = try_get_compound_head(page, 1, flags);
>  		if (!head)
>  			goto pte_unmap;
>  
> @@ -2083,7 +2095,7 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
>  	page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
>  	refs = record_subpages(page, addr, end, pages + *nr);
>  
> -	head = try_get_compound_head(head, refs);
> +	head = try_get_compound_head(head, refs, flags);
>  	if (!head)
>  		return 0;
>  
> @@ -2142,7 +2154,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
>  	page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
>  	refs = record_subpages(page, addr, end, pages + *nr);
>  
> -	head = try_get_compound_head(pmd_page(orig), refs);
> +	head = try_get_compound_head(pmd_page(orig), refs, flags);
>  	if (!head)
>  		return 0;
>  
> @@ -2174,7 +2186,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
>  	page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
>  	refs = record_subpages(page, addr, end, pages + *nr);
>  
> -	head = try_get_compound_head(pud_page(orig), refs);
> +	head = try_get_compound_head(pud_page(orig), refs, flags);
>  	if (!head)
>  		return 0;
>  
> @@ -2203,7 +2215,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
>  	page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
>  	refs = record_subpages(page, addr, end, pages + *nr);
>  
> -	head = try_get_compound_head(pgd_page(orig), refs);
> +	head = try_get_compound_head(pgd_page(orig), refs, flags);
>  	if (!head)
>  		return 0;
>  
> -- 
> 2.7.5
> 
> 



* Re: [PATCHv5 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path
  2020-02-28 13:44   ` Jason Gunthorpe
@ 2020-03-02  2:25     ` Pingfan Liu
  2020-03-02 13:08       ` Jason Gunthorpe
  0 siblings, 1 reply; 15+ messages in thread
From: Pingfan Liu @ 2020-03-02  2:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Linux-MM, Ira Weiny, Andrew Morton, Mike Rapoport, Dan Williams,
	Matthew Wilcox, John Hubbard, Aneesh Kumar K.V, Keith Busch,
	Christoph Hellwig, Shuah Khan, LKML

On Fri, Feb 28, 2020 at 9:44 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Feb 28, 2020 at 07:32:29PM +0800, Pingfan Liu wrote:
> > FOLL_LONGTERM suggests a pin which is going to be given to hardware and
> > can't move. It would truncate CMA permanently and should be excluded.
> >
> > FOLL_LONGTERM has already been checked in the slow path, but not checked in
> > the fast path, which means a possible leak of CMA page to longterm pinned
> > requirement through this crack.
> >
> > Place a check in try_get_compound_head() in the fast path.
> >
> > Some note about the check:
> > Huge page's subpages have the same migrate type due to either
> > allocation from a free_list[] or alloc_contig_range() with param
> > MIGRATE_MOVABLE. So it is enough to check on a single subpage
> > by is_migrate_cma_page(subpage)
> >
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: Ira Weiny <ira.weiny@intel.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Mike Rapoport <rppt@linux.ibm.com>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> > Cc: Keith Busch <keith.busch@intel.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Shuah Khan <shuah@kernel.org>
> > To: linux-mm@kvack.org
> > Cc: linux-kernel@vger.kernel.org
> >  mm/gup.c | 26 +++++++++++++++++++-------
> >  1 file changed, 19 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index cd8075e..f0d6804 100644
> > +++ b/mm/gup.c
> > @@ -33,9 +33,21 @@ struct follow_page_context {
> >   * Return the compound head page with ref appropriately incremented,
> >   * or NULL if that failed.
> >   */
> > -static inline struct page *try_get_compound_head(struct page *page, int refs)
> > +static inline struct page *try_get_compound_head(struct page *page, int refs,
> > +     unsigned int flags)
> >  {
> > -     struct page *head = compound_head(page);
> > +     struct page *head;
> > +
> > +     /*
> > +      * Huge page's subpages have the same migrate type due to either
> > +      * allocation from a free_list[] or alloc_contig_range() with param
> > +      * MIGRATE_MOVABLE. So it is enough to check on a single subpage.
> > +      */
> > +     if (unlikely(flags & FOLL_LONGTERM) &&
> > +             is_migrate_cma_page(page))
> > +             return NULL;
>
> This doesn't seem very good actually.
>
> If I understand properly, if the system has randomly decided to place,
> say, an anonymous page in a CMA region when an application did mmap(),
> then when the application tries to use this page with a LONGTERM pin
> it gets an immediate failure because of the above.
No, actually, it will fall back to the slow path, which migrates the page
and then serves the LONGTERM pin.

This patch just aims to fix the leak in the gup fast path; the gup slow
path already has logic to guard CMA against LONGTERM pins.
>
> This is not OK - the application should not be subject to random failures
> related to long term pins beyond its direct control.
>
> Essentially, failures should only originate from the application using
> specific mmap scenarios, not randomly based on something the MM did,
> and certainly never for anonymous memory.
>
> I think the correct action here is to trigger migration of the page so
> it is not in CMA.
In fact, it does this. A failure in the gup fast path falls back to the
slow path, where __gup_longterm_locked->check_and_migrate_cma_pages()
does the migration.

Thanks,
Pingfan
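
The fallback described in this reply can be modelled in userspace as follows. This is a minimal sketch under the assumptions stated in the reply, not kernel code: struct model_page and all three functions are hypothetical, and clearing in_cma merely stands in for the migration that check_and_migrate_cma_pages() is said to perform.

```c
#include <assert.h>
#include <stdbool.h>

struct model_page {
	bool in_cma;	/* page currently sits in a CMA area */
	bool pinned;
};

/* Fast path: runs with interrupts disabled, so it only refuses a CMA page. */
bool model_gup_fast(struct model_page *page, bool longterm)
{
	assert(page);
	if (longterm && page->in_cma)
		return false;	/* refuse; the caller falls back */
	page->pinned = true;
	return true;
}

/* Slow path: moves the page out of CMA before taking a long-term pin. */
bool model_gup_slow(struct model_page *page, bool longterm)
{
	assert(page);
	if (longterm && page->in_cma)
		page->in_cma = false;	/* stands in for the migration step */
	page->pinned = true;
	return true;
}

/* Combined entry point: try the fast path first, then fall back. */
bool model_get_user_page(struct model_page *page, bool longterm)
{
	if (model_gup_fast(page, longterm))
		return true;
	return model_gup_slow(page, longterm);
}
```

In this model a long-term pin of a CMA page still succeeds from the caller's point of view; the refusal in the fast path is only a detour through the slow path.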



* Re: [PATCHv5 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path
  2020-02-28 22:34   ` Ira Weiny
@ 2020-03-02  2:28     ` Pingfan Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Pingfan Liu @ 2020-03-02  2:28 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Linux-MM, Andrew Morton, Mike Rapoport, Dan Williams,
	Matthew Wilcox, John Hubbard, Aneesh Kumar K.V,
	Christoph Hellwig, Shuah Khan, LKML

On Sat, Feb 29, 2020 at 6:34 AM Ira Weiny <ira.weiny@intel.com> wrote:
>
> On Fri, Feb 28, 2020 at 07:32:29PM +0800, Pingfan Liu wrote:
> > FOLL_LONGTERM suggests a pin which is going to be given to hardware and
> > can't move. It would truncate CMA permanently and should be excluded.
>
> I don't understand what is 'truncated' here?
A pinned page truncates a contiguous area and prevents CMA from
reclaiming that contiguous area.
>
> I generally agree with Jason that this is going to be confusing to the user.
Please see the reply in the other mail.

Thanks,
Pingfan
[...]
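
The "truncate" effect explained above can be illustrated with a small userspace sketch. It assumes a CMA area modelled as an array of per-page flags; the function name and this representation are hypothetical and do not reflect how the kernel tracks CMA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Length of the longest run of pages that are not long-term pinned. */
size_t longest_reclaimable_run(const bool *pinned, size_t nr_pages)
{
	size_t best = 0, run = 0;

	assert(pinned || nr_pages == 0);
	for (size_t i = 0; i < nr_pages; i++) {
		run = pinned[i] ? 0 : run + 1;	/* a pinned page ends the run */
		if (run > best)
			best = run;
	}
	return best;
}
```

With an 8-page area and no pins the whole area can be reclaimed as one block; a single pin on page 3 leaves at most 4 contiguous pages for as long as the pin is held.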



* Re: [PATCHv5 3/3] mm/gup_benchemark: add LONGTERM_BENCHMARK test in gup fast path
  2020-02-28 15:43   ` Alexander Duyck
@ 2020-03-02  2:38     ` Pingfan Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Pingfan Liu @ 2020-03-02  2:38 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: linux-mm, Ira Weiny, Andrew Morton, Mike Rapoport, Dan Williams,
	Matthew Wilcox, John Hubbard, Aneesh Kumar K.V, Keith Busch,
	Christoph Hellwig, Shuah Khan, LKML

On Fri, Feb 28, 2020 at 11:43 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Fri, Feb 28, 2020 at 3:35 AM Pingfan Liu <kernelfans@gmail.com> wrote:
> >
> > Introduce a GUP_LONGTERM_BENCHMARK ioctl to test longterm pin in gup fast
> > path.
>
> The title of the patch has a typo in it. There is only one 'e' in "benchmark".
Yes, it should be GUP_FAST_LONGTERM_BENCHMARK
>
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: Ira Weiny <ira.weiny@intel.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Mike Rapoport <rppt@linux.ibm.com>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> > Cc: Keith Busch <keith.busch@intel.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Shuah Khan <shuah@kernel.org>
> > To: linux-mm@kvack.org
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >  mm/gup_benchmark.c                         | 7 +++++++
> >  tools/testing/selftests/vm/gup_benchmark.c | 6 +++++-
> >  2 files changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c
> > index 8dba38e..bf61e7a 100644
> > --- a/mm/gup_benchmark.c
> > +++ b/mm/gup_benchmark.c
> > @@ -8,6 +8,7 @@
> >  #define GUP_FAST_BENCHMARK     _IOWR('g', 1, struct gup_benchmark)
> >  #define GUP_LONGTERM_BENCHMARK _IOWR('g', 2, struct gup_benchmark)
> >  #define GUP_BENCHMARK          _IOWR('g', 3, struct gup_benchmark)
> > +#define GUP_FAST_LONGTERM_BENCHMARK    _IOWR('g', 4, struct gup_benchmark)
> >
> >  struct gup_benchmark {
> >         __u64 get_delta_usec;
> > @@ -57,6 +58,11 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
> >                         nr = get_user_pages_fast(addr, nr, gup->flags,
> >                                                  pages + i);
> >                         break;
> > +               case GUP_FAST_LONGTERM_BENCHMARK:
> > +                       nr = get_user_pages_fast(addr, nr,
> > +                                       (gup->flags & 1) | FOLL_LONGTERM,
> > +                                        pages + i);
> > +                       break;
>
> If I am not mistaken the mask of gup->flags is redundant. It is
> already masked by FOLL_WRITE several lines before this switch
> statement.
Yes, you are right. Thanks for your kind review.
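
The redundancy Alexander points out can be checked with a small userspace
sketch. The constants and helper names below are assumptions made for
illustration (MODEL_FOLL_WRITE as bit 0, MODEL_FOLL_LONGTERM as some other
bit), and model_filter_flags() stands in for the masking that the review
says happens before the switch statement.

```c
#include <assert.h>

/* Assumed values for illustration; only "WRITE is bit 0" matters here. */
#define MODEL_FOLL_WRITE    0x01u
#define MODEL_FOLL_LONGTERM 0x10000u

/* Stand-in for the filtering done before the switch statement. */
static unsigned int model_filter_flags(unsigned int user_flags)
{
	return user_flags & MODEL_FOLL_WRITE;
}

/* Flags for the fast longterm case, built without the extra "& 1". */
static unsigned int model_longterm_flags(unsigned int user_flags)
{
	unsigned int filtered = model_filter_flags(user_flags);

	/* After filtering, masking with 1 again cannot change anything. */
	assert(((filtered & 1u) | MODEL_FOLL_LONGTERM) ==
	       (filtered | MODEL_FOLL_LONGTERM));

	return filtered | MODEL_FOLL_LONGTERM;
}
```

Whatever user space passes in, only bit 0 survives the filter, so
"gup->flags | FOLL_LONGTERM" would build the same value as the masked form
in the patch.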

Regards,
Pingfan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv5 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path
  2020-03-02  2:25     ` Pingfan Liu
@ 2020-03-02 13:08       ` Jason Gunthorpe
  2020-03-03 13:39         ` Pingfan Liu
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2020-03-02 13:08 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Linux-MM, Ira Weiny, Andrew Morton, Mike Rapoport, Dan Williams,
	Matthew Wilcox, John Hubbard, Aneesh Kumar K.V, Keith Busch,
	Christoph Hellwig, Shuah Khan, LKML

On Mon, Mar 02, 2020 at 10:25:52AM +0800, Pingfan Liu wrote:
> On Fri, Feb 28, 2020 at 9:44 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Feb 28, 2020 at 07:32:29PM +0800, Pingfan Liu wrote:
> > > FOLL_LONGTERM suggests a pin which is going to be given to hardware and
> > > can't move. It would truncate CMA permanently and should be excluded.
> > >
> > > FOLL_LONGTERM has already been checked in the slow path, but not checked in
> > > the fast path, which means a possible leak of CMA page to longterm pinned
> > > requirement through this crack.
> > >
> > > Place a check in try_get_compound_head() in the fast path.
> > >
> > > Some note about the check:
> > > Huge page's subpages have the same migrate type due to either
> > > allocation from a free_list[] or alloc_contig_range() with param
> > > MIGRATE_MOVABLE. So it is enough to check on a single subpage
> > > by is_migrate_cma_page(subpage)
> > >
> > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > Cc: Ira Weiny <ira.weiny@intel.com>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: Mike Rapoport <rppt@linux.ibm.com>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: Matthew Wilcox <willy@infradead.org>
> > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> > > Cc: Keith Busch <keith.busch@intel.com>
> > > Cc: Christoph Hellwig <hch@infradead.org>
> > > Cc: Shuah Khan <shuah@kernel.org>
> > > To: linux-mm@kvack.org
> > > Cc: linux-kernel@vger.kernel.org
> > >  mm/gup.c | 26 +++++++++++++++++++-------
> > >  1 file changed, 19 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/mm/gup.c b/mm/gup.c
> > > index cd8075e..f0d6804 100644
> > > +++ b/mm/gup.c
> > > @@ -33,9 +33,21 @@ struct follow_page_context {
> > >   * Return the compound head page with ref appropriately incremented,
> > >   * or NULL if that failed.
> > >   */
> > > -static inline struct page *try_get_compound_head(struct page *page, int refs)
> > > +static inline struct page *try_get_compound_head(struct page *page, int refs,
> > > +     unsigned int flags)
> > >  {
> > > -     struct page *head = compound_head(page);
> > > +     struct page *head;
> > > +
> > > +     /*
> > > +      * Huge page's subpages have the same migrate type due to either
> > > +      * allocation from a free_list[] or alloc_contig_range() with param
> > > +      * MIGRATE_MOVABLE. So it is enough to check on a single subpage.
> > > +      */
> > > +     if (unlikely(flags & FOLL_LONGTERM) &&
> > > +             is_migrate_cma_page(page))
> > > +             return NULL;
> >
> > This doesn't seem very good actually.
> >
> > If I understand properly, if the system has randomly decided to place,
> > say, an anonymous page in a CMA region when an application did mmap(),
> > then when the application tries to use this page with a LONGTERM pin
> > it gets an immediate failure because of the above.
> No, actually, it will fall back to the slow path, which migrates the page and
> serves the LONGTERM pin.
> 
> This patch just aims to fix the leakage in the gup fast path, while in the gup
> slow path, there is already logic to guard CMA against LONGTERM pins.
> >
> > This is not OK - the application should not be subject to random failures
> > related to long term pins beyond its direct control.
> >
> > Essentially, failures should only originate from the application using
> > specific mmap scenarios, not randomly based on something the MM did,
> > and certainly never for anonymous memory.
> >
> > I think the correct action here is to trigger migration of the page so
> > it is not in CMA.
> In fact, it does this. The failure in gup fast path will fall back to
> slow path, where __gup_longterm_locked->check_and_migrate_cma_pages()
> does the migration.

It is probably worth revising the commit message so this flow is clear

Jason


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv5 0/3] fix omission of check on FOLL_LONGTERM in gup fast path
  2020-02-28 11:32 [PATCHv5 0/3] fix omission of check on FOLL_LONGTERM in gup fast path Pingfan Liu
                   ` (2 preceding siblings ...)
  2020-02-28 11:32 ` [PATCHv5 3/3] mm/gup_benchemark: add LONGTERM_BENCHMARK test " Pingfan Liu
@ 2020-03-02 23:42 ` John Hubbard
  3 siblings, 0 replies; 15+ messages in thread
From: John Hubbard @ 2020-03-02 23:42 UTC (permalink / raw)
  To: Pingfan Liu, linux-mm
  Cc: Ira Weiny, Andrew Morton, Mike Rapoport, Dan Williams,
	Matthew Wilcox, Aneesh Kumar K.V, Keith Busch, Christoph Hellwig,
	Shuah Khan, linux-kernel

On 2/28/20 3:32 AM, Pingfan Liu wrote:
> The last V4 series:
> https://lore.kernel.org/patchwork/project/lkml/list/?series=397950, and be
> dropped from mm tree due to conflict with "RFC: switch the remaining
> architectures to use generic GUP" [1]
> 
> I rebase it and sent out V5.
> V4 -> V5: move around the patched code due to code change.
> 
> [1]: https://lore.kernel.org/linux-mm/20190601074959.14036-1-hch@lst.de/ 
> 

Hi,

This whole series conflicts pretty significantly with the upcoming "track
FOLL_PIN pages" patch series that is in mmotm. Can you please try to resolve
this a bit more? In other words:

The easiest way is to rebase against mmotm and target Linux 5.7, but I'm assuming
that since you've based this on today's linux.git (5.6-rc*), you want this to go
into 5.6, right?

If that's the case, then let's find the minimal fix for 5.6, and put the remaining
stuff (name changes, etc) into mmotm where it will have to fit in with the other
upcoming changes, please.


thanks,
-- 
John Hubbard
NVIDIA

> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Shuah Khan <shuah@kernel.org>
> To: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> 
> ---
> Pingfan Liu (3):
>   mm/gup: rename nr as nr_pinned in get_user_pages_fast()
>   mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path
>   mm/gup_benchemark: add LONGTERM_BENCHMARK test in gup fast path
> 
>  mm/gup.c                                   | 46 +++++++++++++++++++-----------
>  mm/gup_benchmark.c                         |  7 +++++
>  tools/testing/selftests/vm/gup_benchmark.c |  6 +++-
>  3 files changed, 41 insertions(+), 18 deletions(-)
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv5 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path
  2020-02-28 11:32 ` [PATCHv5 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path Pingfan Liu
  2020-02-28 13:44   ` Jason Gunthorpe
  2020-02-28 22:34   ` Ira Weiny
@ 2020-03-02 23:51   ` John Hubbard
  2020-03-03 13:38     ` Pingfan Liu
  2 siblings, 1 reply; 15+ messages in thread
From: John Hubbard @ 2020-03-02 23:51 UTC (permalink / raw)
  To: Pingfan Liu, linux-mm
  Cc: Ira Weiny, Andrew Morton, Mike Rapoport, Dan Williams,
	Matthew Wilcox, Aneesh Kumar K.V, Keith Busch, Christoph Hellwig,
	Shuah Khan, linux-kernel

On 2/28/20 3:32 AM, Pingfan Liu wrote:
> FOLL_LONGTERM suggests a pin which is going to be given to hardware and
> can't move. It would truncate CMA permanently and should be excluded.
> 
> FOLL_LONGTERM has already been checked in the slow path, but not checked in
> the fast path, which means a possible leak of CMA page to longterm pinned
> requirement through this crack.
> 
> Place a check in try_get_compound_head() in the fast path.
> 
> Some note about the check:
> Huge page's subpages have the same migrate type due to either
> allocation from a free_list[] or alloc_contig_range() with param
> MIGRATE_MOVABLE. So it is enough to check on a single subpage
> by is_migrate_cma_page(subpage)
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Shuah Khan <shuah@kernel.org>
> To: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  mm/gup.c | 26 +++++++++++++++++++-------
>  1 file changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index cd8075e..f0d6804 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -33,9 +33,21 @@ struct follow_page_context {
>   * Return the compound head page with ref appropriately incremented,
>   * or NULL if that failed.
>   */
> -static inline struct page *try_get_compound_head(struct page *page, int refs)
> +static inline struct page *try_get_compound_head(struct page *page, int refs,
> +	unsigned int flags)


ohhh...please please look at the latest gup.c in mmotm, and this one in particular:

    commit 0ea2781c3de4 mm/gup: track FOLL_PIN pages

...where you'll see that there is a concept of "try_get*" vs. "try_grab*". This is going
to be a huge mess if we do it as above, from a code structure point of view.

The "grab" functions take gup flags, the "get" functions do not.

Anyway, as I said in reply to the cover letter, I'm really uncomfortable with this 
being applied to linux.git. So maybe if we see a fix to mmotm, it will be clearer how
to port that back to linux.git (assuming that you need 5.6 fixed--do you though?)


thanks,
-- 
John Hubbard
NVIDIA


>  {
> -	struct page *head = compound_head(page);
> +	struct page *head;
> +
> +	/*
> +	 * Huge page's subpages have the same migrate type due to either
> +	 * allocation from a free_list[] or alloc_contig_range() with param
> +	 * MIGRATE_MOVABLE. So it is enough to check on a single subpage.
> +	 */
> +	if (unlikely(flags & FOLL_LONGTERM) &&
> +		is_migrate_cma_page(page))
> +		return NULL;
> +
> +	head = compound_head(page);
>  
>  	if (WARN_ON_ONCE(page_ref_count(head) < 0))
>  		return NULL;
> @@ -1908,7 +1920,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>  		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
>  		page = pte_page(pte);
>  
> -		head = try_get_compound_head(page, 1);
> +		head = try_get_compound_head(page, 1, flags);
>  		if (!head)
>  			goto pte_unmap;
>  
> @@ -2083,7 +2095,7 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
>  	page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
>  	refs = record_subpages(page, addr, end, pages + *nr);
>  
> -	head = try_get_compound_head(head, refs);
> +	head = try_get_compound_head(head, refs, flags);
>  	if (!head)
>  		return 0;
>  
> @@ -2142,7 +2154,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
>  	page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
>  	refs = record_subpages(page, addr, end, pages + *nr);
>  
> -	head = try_get_compound_head(pmd_page(orig), refs);
> +	head = try_get_compound_head(pmd_page(orig), refs, flags);
>  	if (!head)
>  		return 0;
>  
> @@ -2174,7 +2186,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
>  	page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
>  	refs = record_subpages(page, addr, end, pages + *nr);
>  
> -	head = try_get_compound_head(pud_page(orig), refs);
> +	head = try_get_compound_head(pud_page(orig), refs, flags);
>  	if (!head)
>  		return 0;
>  
> @@ -2203,7 +2215,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
>  	page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
>  	refs = record_subpages(page, addr, end, pages + *nr);
>  
> -	head = try_get_compound_head(pgd_page(orig), refs);
> +	head = try_get_compound_head(pgd_page(orig), refs, flags);
>  	if (!head)
>  		return 0;
>  
> 





^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv5 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path
  2020-03-02 23:51   ` John Hubbard
@ 2020-03-03 13:38     ` Pingfan Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Pingfan Liu @ 2020-03-03 13:38 UTC (permalink / raw)
  To: John Hubbard
  Cc: Linux-MM, Ira Weiny, Andrew Morton, Mike Rapoport, Dan Williams,
	Matthew Wilcox, Aneesh Kumar K.V, Christoph Hellwig, Shuah Khan,
	LKML

On Tue, Mar 3, 2020 at 7:51 AM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 2/28/20 3:32 AM, Pingfan Liu wrote:
> > FOLL_LONGTERM suggests a pin which is going to be given to hardware and
> > can't move. It would truncate CMA permanently and should be excluded.
> >
> > FOLL_LONGTERM has already been checked in the slow path, but not checked in
> > the fast path, which means a possible leak of CMA page to longterm pinned
> > requirement through this crack.
> >
> > Place a check in try_get_compound_head() in the fast path.
> >
> > Some note about the check:
> > Huge page's subpages have the same migrate type due to either
> > allocation from a free_list[] or alloc_contig_range() with param
> > MIGRATE_MOVABLE. So it is enough to check on a single subpage
> > by is_migrate_cma_page(subpage)
> >
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: Ira Weiny <ira.weiny@intel.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Mike Rapoport <rppt@linux.ibm.com>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> > Cc: Keith Busch <keith.busch@intel.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Shuah Khan <shuah@kernel.org>
> > To: linux-mm@kvack.org
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >  mm/gup.c | 26 +++++++++++++++++++-------
> >  1 file changed, 19 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index cd8075e..f0d6804 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -33,9 +33,21 @@ struct follow_page_context {
> >   * Return the compound head page with ref appropriately incremented,
> >   * or NULL if that failed.
> >   */
> > -static inline struct page *try_get_compound_head(struct page *page, int refs)
> > +static inline struct page *try_get_compound_head(struct page *page, int refs,
> > +     unsigned int flags)
>
>
> ohhh...please please look at the latest gup.c in mmotm, and this one in particular:
>
>     commit 0ea2781c3de4 mm/gup: track FOLL_PIN pages
>
> ...where you'll see that there is a concept of "try_get*" vs. "try_grab*". This is going
> to be a huge mess if we do it as above, from a code structure point of view.
>
> The "grab" functions take gup flags, the "get" functions do not.
>
> Anyway, as I said in reply to the cover letter, I'm really uncomfortable with this
> being applied to linux.git. So maybe if we see a fix to mmotm, it will be clearer how
> to port that back to linux.git (assuming that you need 5.6 fixed--do you though?)
Sure, I will first read your series and figure out how to rebase my
patches on mmotm.

Thanks,
Pingfan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv5 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path
  2020-03-02 13:08       ` Jason Gunthorpe
@ 2020-03-03 13:39         ` Pingfan Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Pingfan Liu @ 2020-03-03 13:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Linux-MM, Ira Weiny, Andrew Morton, Mike Rapoport, Dan Williams,
	Matthew Wilcox, John Hubbard, Aneesh Kumar K.V, Keith Busch,
	Christoph Hellwig, Shuah Khan, LKML

On Mon, Mar 2, 2020 at 9:08 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Mon, Mar 02, 2020 at 10:25:52AM +0800, Pingfan Liu wrote:
> > On Fri, Feb 28, 2020 at 9:44 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Fri, Feb 28, 2020 at 07:32:29PM +0800, Pingfan Liu wrote:
> > > > FOLL_LONGTERM suggests a pin which is going to be given to hardware and
> > > > can't move. It would truncate CMA permanently and should be excluded.
> > > >
> > > > FOLL_LONGTERM has already been checked in the slow path, but not checked in
> > > > the fast path, which means a possible leak of CMA page to longterm pinned
> > > > requirement through this crack.
> > > >
> > > > Place a check in try_get_compound_head() in the fast path.
> > > >
> > > > Some note about the check:
> > > > Huge page's subpages have the same migrate type due to either
> > > > allocation from a free_list[] or alloc_contig_range() with param
> > > > MIGRATE_MOVABLE. So it is enough to check on a single subpage
> > > > by is_migrate_cma_page(subpage)
> > > >
> > > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > > Cc: Ira Weiny <ira.weiny@intel.com>
> > > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > > Cc: Mike Rapoport <rppt@linux.ibm.com>
> > > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > > Cc: Matthew Wilcox <willy@infradead.org>
> > > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> > > > Cc: Keith Busch <keith.busch@intel.com>
> > > > Cc: Christoph Hellwig <hch@infradead.org>
> > > > Cc: Shuah Khan <shuah@kernel.org>
> > > > To: linux-mm@kvack.org
> > > > Cc: linux-kernel@vger.kernel.org
> > > >  mm/gup.c | 26 +++++++++++++++++++-------
> > > >  1 file changed, 19 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/mm/gup.c b/mm/gup.c
> > > > index cd8075e..f0d6804 100644
> > > > +++ b/mm/gup.c
> > > > @@ -33,9 +33,21 @@ struct follow_page_context {
> > > >   * Return the compound head page with ref appropriately incremented,
> > > >   * or NULL if that failed.
> > > >   */
> > > > -static inline struct page *try_get_compound_head(struct page *page, int refs)
> > > > +static inline struct page *try_get_compound_head(struct page *page, int refs,
> > > > +     unsigned int flags)
> > > >  {
> > > > -     struct page *head = compound_head(page);
> > > > +     struct page *head;
> > > > +
> > > > +     /*
> > > > +      * Huge page's subpages have the same migrate type due to either
> > > > +      * allocation from a free_list[] or alloc_contig_range() with param
> > > > +      * MIGRATE_MOVABLE. So it is enough to check on a single subpage.
> > > > +      */
> > > > +     if (unlikely(flags & FOLL_LONGTERM) &&
> > > > +             is_migrate_cma_page(page))
> > > > +             return NULL;
> > >
> > > This doesn't seem very good actually.
> > >
> > > If I understand properly, if the system has randomly decided to place,
> > > say, an anonymous page in a CMA region when an application did mmap(),
> > > then when the application tries to use this page with a LONGTERM pin
> > > it gets an immediate failure because of the above.
> > No, actually, it will fall back to the slow path, which migrates the page and
> > serves the LONGTERM pin.
> >
> > This patch just aims to fix the leakage in the gup fast path, while in the gup
> > slow path, there is already logic to guard CMA against LONGTERM pins.
> > >
> > > This is not OK - the application should not be subject to random failures
> > > related to long term pins beyond its direct control.
> > >
> > > Essentially, failures should only originate from the application using
> > > specific mmap scenarios, not randomly based on something the MM did,
> > > and certainly never for anonymous memory.
> > >
> > > I think the correct action here is to trigger migration of the page so
> > > it is not in CMA.
> > In fact, it does this. The failure in gup fast path will fall back to
> > slow path, where __gup_longterm_locked->check_and_migrate_cma_pages()
> > does the migration.
>
> It is probably worth revising the commit message so this flow is clear
OK.

Thanks,
Pingfan


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-03-03 13:39 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-28 11:32 [PATCHv5 0/3] fix omission of check on FOLL_LONGTERM in gup fast path Pingfan Liu
2020-02-28 11:32 ` [PATCHv5 1/3] mm/gup: rename nr as nr_pinned in internal_get_user_pages_fast() Pingfan Liu
2020-02-28 11:32 ` [PATCHv5 2/3] mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path Pingfan Liu
2020-02-28 13:44   ` Jason Gunthorpe
2020-03-02  2:25     ` Pingfan Liu
2020-03-02 13:08       ` Jason Gunthorpe
2020-03-03 13:39         ` Pingfan Liu
2020-02-28 22:34   ` Ira Weiny
2020-03-02  2:28     ` Pingfan Liu
2020-03-02 23:51   ` John Hubbard
2020-03-03 13:38     ` Pingfan Liu
2020-02-28 11:32 ` [PATCHv5 3/3] mm/gup_benchemark: add LONGTERM_BENCHMARK test " Pingfan Liu
2020-02-28 15:43   ` Alexander Duyck
2020-03-02  2:38     ` Pingfan Liu
2020-03-02 23:42 ` [PATCHv5 0/3] fix omission of check on FOLL_LONGTERM " John Hubbard
