linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] mm/gup: some cleanups
@ 2022-01-31  5:17 John Hubbard
  2022-01-31  5:17 ` [PATCH 1/4] mm: Fix invalid page pointer returned with FOLL_PIN gups John Hubbard
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: John Hubbard @ 2022-01-31  5:17 UTC (permalink / raw)
  To: Andrew Morton, Peter Xu, Jason Gunthorpe
  Cc: Jan Kara, Claudio Imbrenda, Kirill A . Shutemov, Alex Williamson,
	Andrea Arcangeli, Jérôme Glisse, LKML, linux-mm,
	John Hubbard

Hi Peter, Jason and all,

I'm including Peter's patch as the first one in this tiny series. (The
commit description has my r-b tag in place of my Cc.) The second patch
is the follow-up that I had in mind when we were discussing that fix
[1].

Plus, a couple more small removals that I had queued up:

The third patch removes a completely unused routine:
pin_user_pages_locked().

The fourth patch removes a similar routine, get_user_pages_locked(),
which has only one caller, and that caller is not simplified by calling
get_user_pages_locked().

[1] https://lore.kernel.org/all/20220125033700.69705-1-peterx@redhat.com/

thanks,
John Hubbard

John Hubbard (3):
  mm/gup: clean up follow_pfn_pte() slightly
  mm/gup: remove unused pin_user_pages_locked()
  mm/gup: remove get_user_pages_locked()

Peter Xu (1):
  mm: Fix invalid page pointer returned with FOLL_PIN gups

 include/linux/mm.h |   4 --
 mm/gup.c           | 100 ++++-----------------------------------------
 mm/mempolicy.c     |  22 +++++-----
 3 files changed, 17 insertions(+), 109 deletions(-)


base-commit: 26291c54e111ff6ba87a164d85d4a4e134b7315c
-- 
2.35.0



* [PATCH 1/4] mm: Fix invalid page pointer returned with FOLL_PIN gups
  2022-01-31  5:17 [PATCH 0/4] mm/gup: some cleanups John Hubbard
@ 2022-01-31  5:17 ` John Hubbard
  2022-02-02 14:04   ` Christoph Hellwig
  2022-01-31  5:17 ` [PATCH 2/4] mm/gup: clean up follow_pfn_pte() slightly John Hubbard
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: John Hubbard @ 2022-01-31  5:17 UTC (permalink / raw)
  To: Andrew Morton, Peter Xu, Jason Gunthorpe
  Cc: Jan Kara, Claudio Imbrenda, Kirill A . Shutemov, Alex Williamson,
	Andrea Arcangeli, Jérôme Glisse, LKML, linux-mm,
	John Hubbard

From: Peter Xu <peterx@redhat.com>

Alex reported an invalid page pointer being returned from
pin_user_pages_remote() in vfio, after upstream commit 4b6c33b32296
("vfio/type1: Prepare for batched pinning with struct vfio_batch").
This problem breaks NVIDIA vfio mdev.

It turns out that it's not the fault of the vfio commit; however, after
vfio switched to a full page buffer to store the page pointers, it
started to expose the problem more readily.

The problem is that for VM_PFNMAP vmas, we should normally fail with
-EFAULT, and then vfio will carry on to handle the MMIO regions.
However, when the bug triggers, follow_page_mask() returns -EEXIST for
such a page, which causes GUP to jump over the current page, leaving
that entry in **pages untouched. The caller is not aware of this, so it
references the page as usual, even though the pointer data can be
anything.
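
To illustrate, a minimal sketch of how a caller trips over this (a
hypothetical caller, not the actual vfio code):

	struct page *pages[1];
	long ret;

	ret = pin_user_pages_remote(mm, vaddr, 1, FOLL_WRITE, pages,
				    NULL, NULL);
	if (ret == 1) {
		/*
		 * Before this fix, a VM_PFNMAP pte was silently skipped:
		 * it was counted in ret, but pages[0] was never written,
		 * so it still holds whatever garbage was there before.
		 */
		void *va = page_address(pages[0]); /* invalid pointer */
	}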

We have had that -EEXIST logic since commit 1027e4436b6a ("mm: make GUP
handle pfn mapping unless FOLL_GET is requested"), which seems very
reasonable. It could be that when GUP was reworked for FOLL_PIN, this
special path was overlooked in commit 3faa52c03f44 ("mm/gup: track
FOLL_PIN pages"), even though that commit rightfully touched up
follow_devmap_pud() to check FOLL_PIN when it needs to return -EEXIST.

Attach the Fixes tag to the FOLL_PIN rework commit, as it happened
later than 1027e4436b6a.

Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: 3faa52c03f44 ("mm/gup: track FOLL_PIN pages")
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Debugged-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/gup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index f0af462ac1e2..65575ae3602f 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -440,7 +440,7 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
 		pte_t *pte, unsigned int flags)
 {
 	/* No page to get reference */
-	if (flags & FOLL_GET)
+	if (flags & (FOLL_GET | FOLL_PIN))
 		return -EFAULT;
 
 	if (flags & FOLL_TOUCH) {
-- 
2.35.0



* [PATCH 2/4] mm/gup: clean up follow_pfn_pte() slightly
  2022-01-31  5:17 [PATCH 0/4] mm/gup: some cleanups John Hubbard
  2022-01-31  5:17 ` [PATCH 1/4] mm: Fix invalid page pointer returned with FOLL_PIN gups John Hubbard
@ 2022-01-31  5:17 ` John Hubbard
  2022-01-31 13:36   ` Jason Gunthorpe
  2022-01-31  5:17 ` [PATCH 3/4] mm/gup: remove unused pin_user_pages_locked() John Hubbard
  2022-01-31  5:17 ` [PATCH 4/4] mm/gup: remove get_user_pages_locked() John Hubbard
  3 siblings, 1 reply; 14+ messages in thread
From: John Hubbard @ 2022-01-31  5:17 UTC (permalink / raw)
  To: Andrew Morton, Peter Xu, Jason Gunthorpe
  Cc: Jan Kara, Claudio Imbrenda, Kirill A . Shutemov, Alex Williamson,
	Andrea Arcangeli, Jérôme Glisse, LKML, linux-mm,
	John Hubbard

Regardless of any FOLL_* flags, get_user_pages() and its variants
should handle PFN-only entries by stopping early if the caller expects
**pages to be filled in.

This makes for a more reliable API, as compared to the previous approach
of skipping over such entries (and thus leaving them silently
unwritten).
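
To illustrate the intended contract (a sketch with a hypothetical
caller; this is not code from the tree):

	struct page *pages[1];
	long ret;

	/* pages == NULL: prefault only; PFN-only entries are skipped. */
	ret = get_user_pages(addr, 1, 0, NULL, NULL);

	/*
	 * pages != NULL: GUP now stops when it hits a PFN-only entry,
	 * instead of leaving pages[0] unwritten; the caller never gets
	 * back a count that includes an unfilled entry.
	 */
	ret = get_user_pages(addr, 1, 0, pages, NULL);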

Cc: Peter Xu <peterx@redhat.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 mm/gup.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 65575ae3602f..8633bca12eab 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -439,10 +439,6 @@ static struct page *no_page_table(struct vm_area_struct *vma,
 static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
 		pte_t *pte, unsigned int flags)
 {
-	/* No page to get reference */
-	if (flags & (FOLL_GET | FOLL_PIN))
-		return -EFAULT;
-
 	if (flags & FOLL_TOUCH) {
 		pte_t entry = *pte;
 
@@ -1180,8 +1176,14 @@ static long __get_user_pages(struct mm_struct *mm,
 		} else if (PTR_ERR(page) == -EEXIST) {
 			/*
 			 * Proper page table entry exists, but no corresponding
-			 * struct page.
+			 * struct page. If the caller expects **pages to be
+			 * filled in, bail out now, because that can't be done
+			 * for this page.
 			 */
+			if (pages) {
+				page = ERR_PTR(-EFAULT);
+				goto out;
+			}
 			goto next_page;
 		} else if (IS_ERR(page)) {
 			ret = PTR_ERR(page);
-- 
2.35.0



* [PATCH 3/4] mm/gup: remove unused pin_user_pages_locked()
  2022-01-31  5:17 [PATCH 0/4] mm/gup: some cleanups John Hubbard
  2022-01-31  5:17 ` [PATCH 1/4] mm: Fix invalid page pointer returned with FOLL_PIN gups John Hubbard
  2022-01-31  5:17 ` [PATCH 2/4] mm/gup: clean up follow_pfn_pte() slightly John Hubbard
@ 2022-01-31  5:17 ` John Hubbard
  2022-01-31 14:06   ` David Hildenbrand
                     ` (2 more replies)
  2022-01-31  5:17 ` [PATCH 4/4] mm/gup: remove get_user_pages_locked() John Hubbard
  3 siblings, 3 replies; 14+ messages in thread
From: John Hubbard @ 2022-01-31  5:17 UTC (permalink / raw)
  To: Andrew Morton, Peter Xu, Jason Gunthorpe
  Cc: Jan Kara, Claudio Imbrenda, Kirill A . Shutemov, Alex Williamson,
	Andrea Arcangeli, Jérôme Glisse, LKML, linux-mm,
	John Hubbard

This routine was used for a short while, but then the calling code was
refactored and the only caller was removed.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 include/linux/mm.h |  2 --
 mm/gup.c           | 29 -----------------------------
 2 files changed, 31 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 213cc569b192..80c540c17d83 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1918,8 +1918,6 @@ long pin_user_pages(unsigned long start, unsigned long nr_pages,
 		    struct vm_area_struct **vmas);
 long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
 		    unsigned int gup_flags, struct page **pages, int *locked);
-long pin_user_pages_locked(unsigned long start, unsigned long nr_pages,
-		    unsigned int gup_flags, struct page **pages, int *locked);
 long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 		    struct page **pages, unsigned int gup_flags);
 long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
diff --git a/mm/gup.c b/mm/gup.c
index 8633bca12eab..58d01a96ab30 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3120,32 +3120,3 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 	return get_user_pages_unlocked(start, nr_pages, pages, gup_flags);
 }
 EXPORT_SYMBOL(pin_user_pages_unlocked);
-
-/*
- * pin_user_pages_locked() is the FOLL_PIN variant of get_user_pages_locked().
- * Behavior is the same, except that this one sets FOLL_PIN and rejects
- * FOLL_GET.
- */
-long pin_user_pages_locked(unsigned long start, unsigned long nr_pages,
-			   unsigned int gup_flags, struct page **pages,
-			   int *locked)
-{
-	/*
-	 * FIXME: Current FOLL_LONGTERM behavior is incompatible with
-	 * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on
-	 * vmas.  As there are no users of this flag in this call we simply
-	 * disallow this option for now.
-	 */
-	if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM))
-		return -EINVAL;
-
-	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
-	if (WARN_ON_ONCE(gup_flags & FOLL_GET))
-		return -EINVAL;
-
-	gup_flags |= FOLL_PIN;
-	return __get_user_pages_locked(current->mm, start, nr_pages,
-				       pages, NULL, locked,
-				       gup_flags | FOLL_TOUCH);
-}
-EXPORT_SYMBOL(pin_user_pages_locked);
-- 
2.35.0



* [PATCH 4/4] mm/gup: remove get_user_pages_locked()
  2022-01-31  5:17 [PATCH 0/4] mm/gup: some cleanups John Hubbard
                   ` (2 preceding siblings ...)
  2022-01-31  5:17 ` [PATCH 3/4] mm/gup: remove unused pin_user_pages_locked() John Hubbard
@ 2022-01-31  5:17 ` John Hubbard
  2022-01-31 12:05   ` Jan Kara
  2022-01-31 13:36   ` Jason Gunthorpe
  3 siblings, 2 replies; 14+ messages in thread
From: John Hubbard @ 2022-01-31  5:17 UTC (permalink / raw)
  To: Andrew Morton, Peter Xu, Jason Gunthorpe
  Cc: Jan Kara, Claudio Imbrenda, Kirill A . Shutemov, Alex Williamson,
	Andrea Arcangeli, Jérôme Glisse, LKML, linux-mm,
	John Hubbard

This unravels the rat's nest of APIs in mm/gup.c a bit more.
get_user_pages_locked() was not helping at all, so remove it.

Also, lookup_node() has only a single caller, but it is still worth
having a clearer locking policy there. Change it so that the caller
both takes and releases the mmap_lock, leaving lookup_node() with the
sole job of translating a virtual address into a NUMA node ID.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 include/linux/mm.h |  2 --
 mm/gup.c           | 59 ----------------------------------------------
 mm/mempolicy.c     | 22 ++++++++---------
 3 files changed, 10 insertions(+), 73 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80c540c17d83..528ef1cb4f3a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1916,8 +1916,6 @@ long get_user_pages(unsigned long start, unsigned long nr_pages,
 long pin_user_pages(unsigned long start, unsigned long nr_pages,
 		    unsigned int gup_flags, struct page **pages,
 		    struct vm_area_struct **vmas);
-long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
-		    unsigned int gup_flags, struct page **pages, int *locked);
 long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 		    struct page **pages, unsigned int gup_flags);
 long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
diff --git a/mm/gup.c b/mm/gup.c
index 58d01a96ab30..4a43c79f0972 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2119,65 +2119,6 @@ long get_user_pages(unsigned long start, unsigned long nr_pages,
 }
 EXPORT_SYMBOL(get_user_pages);
 
-/**
- * get_user_pages_locked() - variant of get_user_pages()
- *
- * @start:      starting user address
- * @nr_pages:   number of pages from start to pin
- * @gup_flags:  flags modifying lookup behaviour
- * @pages:      array that receives pointers to the pages pinned.
- *              Should be at least nr_pages long. Or NULL, if caller
- *              only intends to ensure the pages are faulted in.
- * @locked:     pointer to lock flag indicating whether lock is held and
- *              subsequently whether VM_FAULT_RETRY functionality can be
- *              utilised. Lock must initially be held.
- *
- * It is suitable to replace the form:
- *
- *      mmap_read_lock(mm);
- *      do_something()
- *      get_user_pages(mm, ..., pages, NULL);
- *      mmap_read_unlock(mm);
- *
- *  to:
- *
- *      int locked = 1;
- *      mmap_read_lock(mm);
- *      do_something()
- *      get_user_pages_locked(mm, ..., pages, &locked);
- *      if (locked)
- *          mmap_read_unlock(mm);
- *
- * We can leverage the VM_FAULT_RETRY functionality in the page fault
- * paths better by using either get_user_pages_locked() or
- * get_user_pages_unlocked().
- *
- */
-long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
-			   unsigned int gup_flags, struct page **pages,
-			   int *locked)
-{
-	/*
-	 * FIXME: Current FOLL_LONGTERM behavior is incompatible with
-	 * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on
-	 * vmas.  As there are no users of this flag in this call we simply
-	 * disallow this option for now.
-	 */
-	if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM))
-		return -EINVAL;
-	/*
-	 * FOLL_PIN must only be set internally by the pin_user_pages*() APIs,
-	 * never directly by the caller, so enforce that:
-	 */
-	if (WARN_ON_ONCE(gup_flags & FOLL_PIN))
-		return -EINVAL;
-
-	return __get_user_pages_locked(current->mm, start, nr_pages,
-				       pages, NULL, locked,
-				       gup_flags | FOLL_TOUCH);
-}
-EXPORT_SYMBOL(get_user_pages_locked);
-
 /*
  * get_user_pages_unlocked() is suitable to replace the form:
  *
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 028e8dd82b44..040d88354cfa 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -907,17 +907,15 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes)
 static int lookup_node(struct mm_struct *mm, unsigned long addr)
 {
 	struct page *p = NULL;
-	int err;
+	int ret;
 
-	int locked = 1;
-	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
-	if (err > 0) {
-		err = page_to_nid(p);
+	mmap_assert_locked(mm);
+	ret = get_user_pages(addr & PAGE_MASK, 1, 0, &p, NULL);
+	if (ret > 0) {
+		ret = page_to_nid(p);
 		put_page(p);
 	}
-	if (locked)
-		mmap_read_unlock(mm);
-	return err;
+	return ret;
 }
 
 /* Retrieve NUMA policy */
@@ -968,15 +966,15 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
 	if (flags & MPOL_F_NODE) {
 		if (flags & MPOL_F_ADDR) {
 			/*
-			 * Take a refcount on the mpol, lookup_node()
-			 * will drop the mmap_lock, so after calling
-			 * lookup_node() only "pol" remains valid, "vma"
-			 * is stale.
+			 * Take a refcount on the mpol, because we are about to
+			 * drop the mmap_lock, after which only "pol" remains
+			 * valid, "vma" is stale.
 			 */
 			pol_refcount = pol;
 			vma = NULL;
 			mpol_get(pol);
 			err = lookup_node(mm, addr);
+			mmap_read_unlock(mm);
 			if (err < 0)
 				goto out;
 			*policy = err;
-- 
2.35.0



* Re: [PATCH 4/4] mm/gup: remove get_user_pages_locked()
  2022-01-31  5:17 ` [PATCH 4/4] mm/gup: remove get_user_pages_locked() John Hubbard
@ 2022-01-31 12:05   ` Jan Kara
  2022-01-31 20:01     ` John Hubbard
  2022-01-31 13:36   ` Jason Gunthorpe
  1 sibling, 1 reply; 14+ messages in thread
From: Jan Kara @ 2022-01-31 12:05 UTC (permalink / raw)
  To: John Hubbard
  Cc: Andrew Morton, Peter Xu, Jason Gunthorpe, Jan Kara,
	Claudio Imbrenda, Kirill A . Shutemov, Alex Williamson,
	Andrea Arcangeli, Jérôme Glisse, LKML, linux-mm

On Sun 30-01-22 21:17:52, John Hubbard wrote:
> This unravels the rat's nest of APIs in mm/gup.c a bit more.
> get_user_pages_locked() was not helping at all, so remove it.
> 
> Also, lookup_node() has only a single caller, but it is still worth
> having a clearer locking policy there. Change it so that the caller
> both takes and releases the mmap_lock, leaving lookup_node() with the
> sole job of translating a virtual address into a NUMA node ID.
> 
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>

Well, the point of the _locked() GUP variants is that we can unlock
mmap_sem when reading a page from the disk during a page fault (hidden
behind VM_FAULT_RETRY). So the _locked() variants are about reducing
mmap_sem latency, rather than about code readability. In this
particular case, I don't think using the _locked() variant in
lookup_node() is very beneficial (generally I would not expect to take
a fault there), but at least the justification in the commit message
should be different :).
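
For reference, the usage pattern that the _locked() variants enable
looks roughly like this (a sketch, based on the kernel-doc that this
patch removes):

	int locked = 1;
	struct page *page;
	long ret;

	mmap_read_lock(mm);
	ret = get_user_pages_locked(start, 1, 0, &page, &locked);
	/*
	 * If the fault had to read the page from disk, GUP may have
	 * dropped mmap_lock (VM_FAULT_RETRY) rather than holding it
	 * across the I/O; in that case, "locked" is now 0.
	 */
	if (locked)
		mmap_read_unlock(mm);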

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 4/4] mm/gup: remove get_user_pages_locked()
  2022-01-31  5:17 ` [PATCH 4/4] mm/gup: remove get_user_pages_locked() John Hubbard
  2022-01-31 12:05   ` Jan Kara
@ 2022-01-31 13:36   ` Jason Gunthorpe
  2022-01-31 20:01     ` John Hubbard
  1 sibling, 1 reply; 14+ messages in thread
From: Jason Gunthorpe @ 2022-01-31 13:36 UTC (permalink / raw)
  To: John Hubbard
  Cc: Andrew Morton, Peter Xu, Jan Kara, Claudio Imbrenda,
	Kirill A . Shutemov, Alex Williamson, Andrea Arcangeli,
	Jérôme Glisse, LKML, linux-mm

On Sun, Jan 30, 2022 at 09:17:52PM -0800, John Hubbard wrote:

> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 028e8dd82b44..040d88354cfa 100644
> +++ b/mm/mempolicy.c
> @@ -907,17 +907,15 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes)
>  static int lookup_node(struct mm_struct *mm, unsigned long addr)
>  {
>  	struct page *p = NULL;
> -	int err;
> +	int ret;
>  
> -	int locked = 1;
> -	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
> -	if (err > 0) {
> -		err = page_to_nid(p);
> +	mmap_assert_locked(mm);
> +	ret = get_user_pages(addr & PAGE_MASK, 1, 0, &p, NULL);
> +	if (ret > 0) {
> +		ret = page_to_nid(p);
>  		put_page(p);
>  	}
> -	if (locked)
> -		mmap_read_unlock(mm);
> -	return err;
> +	return ret;
>  }
>  
>  /* Retrieve NUMA policy */
> @@ -968,15 +966,15 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
>  	if (flags & MPOL_F_NODE) {
>  		if (flags & MPOL_F_ADDR) {
>  			/*
> -			 * Take a refcount on the mpol, lookup_node()
> -			 * will drop the mmap_lock, so after calling
> -			 * lookup_node() only "pol" remains valid, "vma"
> -			 * is stale.
> +			 * Take a refcount on the mpol, because we are about to
> +			 * drop the mmap_lock, after which only "pol" remains
> +			 * valid, "vma" is stale.
>  			 */
>  			pol_refcount = pol;
>  			vma = NULL;
>  			mpol_get(pol);
>  			err = lookup_node(mm, addr);
> +			mmap_read_unlock(mm);

How about moving the mmap_read_unlock() up one line, and then using
get_user_pages_fast()?

I'm guessing that in most cases here the PTE will be present, so that
should be a net win?
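
Something like this, perhaps (an untested sketch):

	static int lookup_node(struct mm_struct *mm, unsigned long addr)
	{
		struct page *p = NULL;
		int ret;

		/* Caller would mmap_read_unlock(mm) before calling this. */
		ret = get_user_pages_fast(addr & PAGE_MASK, 1, 0, &p);
		if (ret > 0) {
			ret = page_to_nid(p);
			put_page(p);
		}
		return ret;
	}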

Jason


* Re: [PATCH 2/4] mm/gup: clean up follow_pfn_pte() slightly
  2022-01-31  5:17 ` [PATCH 2/4] mm/gup: clean up follow_pfn_pte() slightly John Hubbard
@ 2022-01-31 13:36   ` Jason Gunthorpe
  0 siblings, 0 replies; 14+ messages in thread
From: Jason Gunthorpe @ 2022-01-31 13:36 UTC (permalink / raw)
  To: John Hubbard
  Cc: Andrew Morton, Peter Xu, Jan Kara, Claudio Imbrenda,
	Kirill A . Shutemov, Alex Williamson, Andrea Arcangeli,
	Jérôme Glisse, LKML, linux-mm

On Sun, Jan 30, 2022 at 09:17:50PM -0800, John Hubbard wrote:
> Regardless of any FOLL_* flags, get_user_pages() and its variants
> should handle PFN-only entries by stopping early if the caller expects
> **pages to be filled in.
> 
> This makes for a more reliable API, as compared to the previous approach
> of skipping over such entries (and thus leaving them silently
> unwritten).
> 
> Cc: Peter Xu <peterx@redhat.com>
> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  mm/gup.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)

I still think it should be squashed into the previous patch, but
otherwise

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason


* Re: [PATCH 3/4] mm/gup: remove unused pin_user_pages_locked()
  2022-01-31  5:17 ` [PATCH 3/4] mm/gup: remove unused pin_user_pages_locked() John Hubbard
@ 2022-01-31 14:06   ` David Hildenbrand
  2022-01-31 14:43   ` Jason Gunthorpe
  2022-02-02 14:05   ` Christoph Hellwig
  2 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2022-01-31 14:06 UTC (permalink / raw)
  To: John Hubbard, Andrew Morton, Peter Xu, Jason Gunthorpe
  Cc: Jan Kara, Claudio Imbrenda, Kirill A . Shutemov, Alex Williamson,
	Andrea Arcangeli, Jérôme Glisse, LKML, linux-mm

On 31.01.22 06:17, John Hubbard wrote:
> This routine was used for a short while, but then the calling code was
> refactored and the only caller was removed.
> 
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



* Re: [PATCH 3/4] mm/gup: remove unused pin_user_pages_locked()
  2022-01-31  5:17 ` [PATCH 3/4] mm/gup: remove unused pin_user_pages_locked() John Hubbard
  2022-01-31 14:06   ` David Hildenbrand
@ 2022-01-31 14:43   ` Jason Gunthorpe
  2022-02-02 14:05   ` Christoph Hellwig
  2 siblings, 0 replies; 14+ messages in thread
From: Jason Gunthorpe @ 2022-01-31 14:43 UTC (permalink / raw)
  To: John Hubbard
  Cc: Andrew Morton, Peter Xu, Jan Kara, Claudio Imbrenda,
	Kirill A . Shutemov, Alex Williamson, Andrea Arcangeli,
	Jérôme Glisse, LKML, linux-mm

On Sun, Jan 30, 2022 at 09:17:51PM -0800, John Hubbard wrote:
> This routine was used for a short while, but then the calling code was
> refactored and the only caller was removed.
> 
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  include/linux/mm.h |  2 --
>  mm/gup.c           | 29 -----------------------------
>  2 files changed, 31 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason


* Re: [PATCH 4/4] mm/gup: remove get_user_pages_locked()
  2022-01-31 12:05   ` Jan Kara
@ 2022-01-31 20:01     ` John Hubbard
  0 siblings, 0 replies; 14+ messages in thread
From: John Hubbard @ 2022-01-31 20:01 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Peter Xu, Jason Gunthorpe, Claudio Imbrenda,
	Kirill A . Shutemov, Alex Williamson, Andrea Arcangeli,
	Jérôme Glisse, LKML, linux-mm

On 1/31/22 04:05, Jan Kara wrote:
> On Sun 30-01-22 21:17:52, John Hubbard wrote:
>> This unravels the rat's nest of APIs in mm/gup.c a bit more.
>> get_user_pages_locked() was not helping at all, so remove it.
>>
>> Also, lookup_node() has only a single caller, but it is still worth
>> having a clearer locking policy there. Change it so that the caller
>> both takes and releases the mmap_lock, leaving lookup_node() with the
>> sole job of translating a virtual address into a NUMA node ID.
>>
>> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> 
> Well, the point of the _locked() GUP variants is that we can unlock
> mmap_sem when reading a page from the disk during a page fault (hidden
> behind VM_FAULT_RETRY). So the _locked() variants are about reducing
> mmap_sem latency, rather than about code readability. In this
> particular case, I don't think using the _locked() variant in
> lookup_node() is very beneficial (generally I would not expect to take
> a fault there), but at least the justification in the commit message
> should be different :).
> 
> 								Honza

I'll rewrite this commit description to cover this point properly.
Jason also suggested using gup-fast, which I like.
  thanks,
-- 
John Hubbard
NVIDIA


* Re: [PATCH 4/4] mm/gup: remove get_user_pages_locked()
  2022-01-31 13:36   ` Jason Gunthorpe
@ 2022-01-31 20:01     ` John Hubbard
  0 siblings, 0 replies; 14+ messages in thread
From: John Hubbard @ 2022-01-31 20:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andrew Morton, Peter Xu, Jan Kara, Claudio Imbrenda,
	Kirill A . Shutemov, Alex Williamson, Andrea Arcangeli,
	Jérôme Glisse, LKML, linux-mm

On 1/31/22 05:36, Jason Gunthorpe wrote:
...
>> @@ -968,15 +966,15 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
>>   	if (flags & MPOL_F_NODE) {
>>   		if (flags & MPOL_F_ADDR) {
>>   			/*
>> -			 * Take a refcount on the mpol, lookup_node()
>> -			 * will drop the mmap_lock, so after calling
>> -			 * lookup_node() only "pol" remains valid, "vma"
>> -			 * is stale.
>> +			 * Take a refcount on the mpol, because we are about to
>> +			 * drop the mmap_lock, after which only "pol" remains
>> +			 * valid, "vma" is stale.
>>   			 */
>>   			pol_refcount = pol;
>>   			vma = NULL;
>>   			mpol_get(pol);
>>   			err = lookup_node(mm, addr);
>> +			mmap_read_unlock(mm);
> 
> How about moving the mmap_read_unlock() up one line, and then using
> get_user_pages_fast()?
> 
> I'm guessing that in most cases here the PTE will be present, so that
> should be a net win?

Neat, I'll do that.


thanks,
-- 
John Hubbard
NVIDIA


* Re: [PATCH 1/4] mm: Fix invalid page pointer returned with FOLL_PIN gups
  2022-01-31  5:17 ` [PATCH 1/4] mm: Fix invalid page pointer returned with FOLL_PIN gups John Hubbard
@ 2022-02-02 14:04   ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2022-02-02 14:04 UTC (permalink / raw)
  To: John Hubbard
  Cc: Andrew Morton, Peter Xu, Jason Gunthorpe, Jan Kara,
	Claudio Imbrenda, Kirill A . Shutemov, Alex Williamson,
	Andrea Arcangeli, Jérôme Glisse, LKML, linux-mm

On Sun, Jan 30, 2022 at 09:17:49PM -0800, John Hubbard wrote:
> From: Peter Xu <peterx@redhat.com>
> 
> Alex reported an invalid page pointer being returned from
> pin_user_pages_remote() in vfio, after upstream commit 4b6c33b32296
> ("vfio/type1: Prepare for batched pinning with struct vfio_batch").
> This problem breaks NVIDIA vfio mdev.

I haven't actually seen any nvidia mdev module in the tree.


* Re: [PATCH 3/4] mm/gup: remove unused pin_user_pages_locked()
  2022-01-31  5:17 ` [PATCH 3/4] mm/gup: remove unused pin_user_pages_locked() John Hubbard
  2022-01-31 14:06   ` David Hildenbrand
  2022-01-31 14:43   ` Jason Gunthorpe
@ 2022-02-02 14:05   ` Christoph Hellwig
  2 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2022-02-02 14:05 UTC (permalink / raw)
  To: John Hubbard
  Cc: Andrew Morton, Peter Xu, Jason Gunthorpe, Jan Kara,
	Claudio Imbrenda, Kirill A . Shutemov, Alex Williamson,
	Andrea Arcangeli, Jérôme Glisse, LKML, linux-mm

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

