linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Linux-kernel-mentees][PATCH v5 0/1] get_user_pages changes
@ 2019-08-09 19:38 Bharath Vedartham
  2019-08-09 19:38 ` [Linux-kernel-mentees][PATCH v5 1/1] sgi-gru: Remove *pte_lookup functions, Convert to get_user_page*() Bharath Vedartham
  0 siblings, 1 reply; 10+ messages in thread
From: Bharath Vedartham @ 2019-08-09 19:38 UTC (permalink / raw)
  To: jhubbard, gregkh, sivanich, arnd
  Cc: ira.weiny, jglisse, william.kucharski, hch, linux-kernel,
	linux-mm, linux-kernel-mentees, Bharath Vedartham

In this 5th version of the patch series, I have compressed the patches
of the v2 patch series into one patch. This was suggested by Christoph Hellwig.
The suggestion was to remove the pte_lookup functions and use the 
get_user_pages* functions directly instead of the pte_lookup functions.

There is nothing different in this series compared to the v2
series, It essentially compresses the 3 patches of the original series
into one patch.

This series survives a compile test.

Bharath Vedartham (1):
  sgi-gru: Remove *pte_lookup functions

 drivers/misc/sgi-gru/grufault.c | 112 +++++++++-------------------------------
 1 file changed, 24 insertions(+), 88 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Linux-kernel-mentees][PATCH v5 1/1] sgi-gru: Remove *pte_lookup functions, Convert to get_user_page*()
  2019-08-09 19:38 [Linux-kernel-mentees][PATCH v5 0/1] get_user_pages changes Bharath Vedartham
@ 2019-08-09 19:38 ` Bharath Vedartham
  2019-08-13 14:50   ` Dimitri Sivanich
  2019-08-18 17:58   ` Bharath Vedartham
  0 siblings, 2 replies; 10+ messages in thread
From: Bharath Vedartham @ 2019-08-09 19:38 UTC (permalink / raw)
  To: jhubbard, gregkh, sivanich, arnd
  Cc: ira.weiny, jglisse, william.kucharski, hch, linux-kernel,
	linux-mm, linux-kernel-mentees, Bharath Vedartham

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

As part of this conversion, the *pte_lookup functions can be removed and
be easily replaced with get_user_pages_fast() functions. In the case of
atomic lookup, __get_user_pages_fast() is used, because it does not fall
back to the slow path: get_user_pages(). get_user_pages_fast(), on the other
hand, first calls __get_user_pages_fast(), but then falls back to the
slow path if __get_user_pages_fast() fails.

Also: remove unnecessary CONFIG_HUGETLB ifdefs.

Cc: Ira Weiny <ira.weiny@intel.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-kernel-mentees@lists.linuxfoundation.org
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Bharath Vedartham <linux.bhar@gmail.com>
---
This is a fold of the 3 patches in the v2 patch series.
The review tags were given to the individual patches.

Changes since v3
	- Used gup flags in get_user_pages_fast rather than
	boolean flags.
Changes since v4
	- Updated changelog according to John Hubbard.
---
 drivers/misc/sgi-gru/grufault.c | 112 +++++++++-------------------------------
 1 file changed, 24 insertions(+), 88 deletions(-)

diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index 4b713a8..304e9c5 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -166,96 +166,20 @@ static void get_clear_fault_map(struct gru_state *gru,
 }
 
 /*
- * Atomic (interrupt context) & non-atomic (user context) functions to
- * convert a vaddr into a physical address. The size of the page
- * is returned in pageshift.
- * 	returns:
- * 		  0 - successful
- * 		< 0 - error code
- * 		  1 - (atomic only) try again in non-atomic context
- */
-static int non_atomic_pte_lookup(struct vm_area_struct *vma,
-				 unsigned long vaddr, int write,
-				 unsigned long *paddr, int *pageshift)
-{
-	struct page *page;
-
-#ifdef CONFIG_HUGETLB_PAGE
-	*pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
-#else
-	*pageshift = PAGE_SHIFT;
-#endif
-	if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
-		return -EFAULT;
-	*paddr = page_to_phys(page);
-	put_page(page);
-	return 0;
-}
-
-/*
- * atomic_pte_lookup
+ * mmap_sem is already helod on entry to this function. This guarantees
+ * existence of the page tables.
  *
- * Convert a user virtual address to a physical address
  * Only supports Intel large pages (2MB only) on x86_64.
- *	ZZZ - hugepage support is incomplete
- *
- * NOTE: mmap_sem is already held on entry to this function. This
- * guarantees existence of the page tables.
+ *	ZZZ - hugepage support is incomplete.
  */
-static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
-	int write, unsigned long *paddr, int *pageshift)
-{
-	pgd_t *pgdp;
-	p4d_t *p4dp;
-	pud_t *pudp;
-	pmd_t *pmdp;
-	pte_t pte;
-
-	pgdp = pgd_offset(vma->vm_mm, vaddr);
-	if (unlikely(pgd_none(*pgdp)))
-		goto err;
-
-	p4dp = p4d_offset(pgdp, vaddr);
-	if (unlikely(p4d_none(*p4dp)))
-		goto err;
-
-	pudp = pud_offset(p4dp, vaddr);
-	if (unlikely(pud_none(*pudp)))
-		goto err;
-
-	pmdp = pmd_offset(pudp, vaddr);
-	if (unlikely(pmd_none(*pmdp)))
-		goto err;
-#ifdef CONFIG_X86_64
-	if (unlikely(pmd_large(*pmdp)))
-		pte = *(pte_t *) pmdp;
-	else
-#endif
-		pte = *pte_offset_kernel(pmdp, vaddr);
-
-	if (unlikely(!pte_present(pte) ||
-		     (write && (!pte_write(pte) || !pte_dirty(pte)))))
-		return 1;
-
-	*paddr = pte_pfn(pte) << PAGE_SHIFT;
-#ifdef CONFIG_HUGETLB_PAGE
-	*pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
-#else
-	*pageshift = PAGE_SHIFT;
-#endif
-	return 0;
-
-err:
-	return 1;
-}
-
 static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
 		    int write, int atomic, unsigned long *gpa, int *pageshift)
 {
 	struct mm_struct *mm = gts->ts_mm;
 	struct vm_area_struct *vma;
 	unsigned long paddr;
-	int ret, ps;
+	int ret;
+	struct page *page;
 
 	vma = find_vma(mm, vaddr);
 	if (!vma)
@@ -263,21 +187,33 @@ static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
 
 	/*
 	 * Atomic lookup is faster & usually works even if called in non-atomic
-	 * context.
+	 * context. get_user_pages_fast does atomic lookup before falling back to
+	 * slow gup.
 	 */
 	rmb();	/* Must/check ms_range_active before loading PTEs */
-	ret = atomic_pte_lookup(vma, vaddr, write, &paddr, &ps);
-	if (ret) {
-		if (atomic)
+	if (atomic) {
+		ret = __get_user_pages_fast(vaddr, 1, write, &page);
+		if (!ret)
 			goto upm;
-		if (non_atomic_pte_lookup(vma, vaddr, write, &paddr, &ps))
+	} else {
+		ret = get_user_pages_fast(vaddr, 1, write ? FOLL_WRITE : 0, &page);
+		if (!ret)
 			goto inval;
 	}
+
+	paddr = page_to_phys(page);
+	put_user_page(page);
+
+	if (unlikely(is_vm_hugetlb_page(vma)))
+		*pageshift = HPAGE_SHIFT;
+	else
+		*pageshift = PAGE_SHIFT;
+
 	if (is_gru_paddr(paddr))
 		goto inval;
-	paddr = paddr & ~((1UL << ps) - 1);
+	paddr = paddr & ~((1UL << *pageshift) - 1);
 	*gpa = uv_soc_phys_ram_to_gpa(paddr);
-	*pageshift = ps;
+
 	return VTOP_SUCCESS;
 
 inval:
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Linux-kernel-mentees][PATCH v5 1/1] sgi-gru: Remove *pte_lookup functions, Convert to get_user_page*()
  2019-08-09 19:38 ` [Linux-kernel-mentees][PATCH v5 1/1] sgi-gru: Remove *pte_lookup functions, Convert to get_user_page*() Bharath Vedartham
@ 2019-08-13 14:50   ` Dimitri Sivanich
  2019-08-13 17:23     ` Bharath Vedartham
  2019-08-18 17:58   ` Bharath Vedartham
  1 sibling, 1 reply; 10+ messages in thread
From: Dimitri Sivanich @ 2019-08-13 14:50 UTC (permalink / raw)
  To: Bharath Vedartham
  Cc: jhubbard, gregkh, sivanich, arnd, ira.weiny, jglisse,
	william.kucharski, hch, linux-kernel, linux-mm,
	linux-kernel-mentees

Bharath,

I do not believe that __get_user_pages_fast will work for the atomic case, as
there is no guarantee that the 'current->mm' will be the correct one for the
process in question, as the process might have moved away from the cpu that is
handling interrupts for it's context.

On Sat, Aug 10, 2019 at 01:08:17AM +0530, Bharath Vedartham wrote:
> For pages that were retained via get_user_pages*(), release those pages
> via the new put_user_page*() routines, instead of via put_page() or
> release_pages().
> 
> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions").
> 
> As part of this conversion, the *pte_lookup functions can be removed and
> be easily replaced with get_user_pages_fast() functions. In the case of
> atomic lookup, __get_user_pages_fast() is used, because it does not fall
> back to the slow path: get_user_pages(). get_user_pages_fast(), on the other
> hand, first calls __get_user_pages_fast(), but then falls back to the
> slow path if __get_user_pages_fast() fails.
> 
> Also: remove unnecessary CONFIG_HUGETLB ifdefs.
> 
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Dimitri Sivanich <sivanich@sgi.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: William Kucharski <william.kucharski@oracle.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: linux-kernel-mentees@lists.linuxfoundation.org
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
> Signed-off-by: Bharath Vedartham <linux.bhar@gmail.com>
> ---
> This is a fold of the 3 patches in the v2 patch series.
> The review tags were given to the individual patches.
> 
> Changes since v3
> 	- Used gup flags in get_user_pages_fast rather than
> 	boolean flags.
> Changes since v4
> 	- Updated changelog according to John Hubbard.
> ---
>  drivers/misc/sgi-gru/grufault.c | 112 +++++++++-------------------------------
>  1 file changed, 24 insertions(+), 88 deletions(-)
> 
> diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> index 4b713a8..304e9c5 100644
> --- a/drivers/misc/sgi-gru/grufault.c
> +++ b/drivers/misc/sgi-gru/grufault.c
> @@ -166,96 +166,20 @@ static void get_clear_fault_map(struct gru_state *gru,
>  }
>  
>  /*
> - * Atomic (interrupt context) & non-atomic (user context) functions to
> - * convert a vaddr into a physical address. The size of the page
> - * is returned in pageshift.
> - * 	returns:
> - * 		  0 - successful
> - * 		< 0 - error code
> - * 		  1 - (atomic only) try again in non-atomic context
> - */
> -static int non_atomic_pte_lookup(struct vm_area_struct *vma,
> -				 unsigned long vaddr, int write,
> -				 unsigned long *paddr, int *pageshift)
> -{
> -	struct page *page;
> -
> -#ifdef CONFIG_HUGETLB_PAGE
> -	*pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> -#else
> -	*pageshift = PAGE_SHIFT;
> -#endif
> -	if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
> -		return -EFAULT;
> -	*paddr = page_to_phys(page);
> -	put_page(page);
> -	return 0;
> -}
> -
> -/*
> - * atomic_pte_lookup
> + * mmap_sem is already helod on entry to this function. This guarantees
> + * existence of the page tables.
>   *
> - * Convert a user virtual address to a physical address
>   * Only supports Intel large pages (2MB only) on x86_64.
> - *	ZZZ - hugepage support is incomplete
> - *
> - * NOTE: mmap_sem is already held on entry to this function. This
> - * guarantees existence of the page tables.
> + *	ZZZ - hugepage support is incomplete.
>   */
> -static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
> -	int write, unsigned long *paddr, int *pageshift)
> -{
> -	pgd_t *pgdp;
> -	p4d_t *p4dp;
> -	pud_t *pudp;
> -	pmd_t *pmdp;
> -	pte_t pte;
> -
> -	pgdp = pgd_offset(vma->vm_mm, vaddr);
> -	if (unlikely(pgd_none(*pgdp)))
> -		goto err;
> -
> -	p4dp = p4d_offset(pgdp, vaddr);
> -	if (unlikely(p4d_none(*p4dp)))
> -		goto err;
> -
> -	pudp = pud_offset(p4dp, vaddr);
> -	if (unlikely(pud_none(*pudp)))
> -		goto err;
> -
> -	pmdp = pmd_offset(pudp, vaddr);
> -	if (unlikely(pmd_none(*pmdp)))
> -		goto err;
> -#ifdef CONFIG_X86_64
> -	if (unlikely(pmd_large(*pmdp)))
> -		pte = *(pte_t *) pmdp;
> -	else
> -#endif
> -		pte = *pte_offset_kernel(pmdp, vaddr);
> -
> -	if (unlikely(!pte_present(pte) ||
> -		     (write && (!pte_write(pte) || !pte_dirty(pte)))))
> -		return 1;
> -
> -	*paddr = pte_pfn(pte) << PAGE_SHIFT;
> -#ifdef CONFIG_HUGETLB_PAGE
> -	*pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> -#else
> -	*pageshift = PAGE_SHIFT;
> -#endif
> -	return 0;
> -
> -err:
> -	return 1;
> -}
> -
>  static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
>  		    int write, int atomic, unsigned long *gpa, int *pageshift)
>  {
>  	struct mm_struct *mm = gts->ts_mm;
>  	struct vm_area_struct *vma;
>  	unsigned long paddr;
> -	int ret, ps;
> +	int ret;
> +	struct page *page;
>  
>  	vma = find_vma(mm, vaddr);
>  	if (!vma)
> @@ -263,21 +187,33 @@ static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
>  
>  	/*
>  	 * Atomic lookup is faster & usually works even if called in non-atomic
> -	 * context.
> +	 * context. get_user_pages_fast does atomic lookup before falling back to
> +	 * slow gup.
>  	 */
>  	rmb();	/* Must/check ms_range_active before loading PTEs */
> -	ret = atomic_pte_lookup(vma, vaddr, write, &paddr, &ps);
> -	if (ret) {
> -		if (atomic)
> +	if (atomic) {
> +		ret = __get_user_pages_fast(vaddr, 1, write, &page);
> +		if (!ret)
>  			goto upm;
> -		if (non_atomic_pte_lookup(vma, vaddr, write, &paddr, &ps))
> +	} else {
> +		ret = get_user_pages_fast(vaddr, 1, write ? FOLL_WRITE : 0, &page);
> +		if (!ret)
>  			goto inval;
>  	}
> +
> +	paddr = page_to_phys(page);
> +	put_user_page(page);
> +
> +	if (unlikely(is_vm_hugetlb_page(vma)))
> +		*pageshift = HPAGE_SHIFT;
> +	else
> +		*pageshift = PAGE_SHIFT;
> +
>  	if (is_gru_paddr(paddr))
>  		goto inval;
> -	paddr = paddr & ~((1UL << ps) - 1);
> +	paddr = paddr & ~((1UL << *pageshift) - 1);
>  	*gpa = uv_soc_phys_ram_to_gpa(paddr);
> -	*pageshift = ps;
> +
>  	return VTOP_SUCCESS;
>  
>  inval:
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Linux-kernel-mentees][PATCH v5 1/1] sgi-gru: Remove *pte_lookup functions, Convert to get_user_page*()
  2019-08-13 14:50   ` Dimitri Sivanich
@ 2019-08-13 17:23     ` Bharath Vedartham
  2019-08-13 18:19       ` Dimitri Sivanich
  0 siblings, 1 reply; 10+ messages in thread
From: Bharath Vedartham @ 2019-08-13 17:23 UTC (permalink / raw)
  To: Dimitri Sivanich
  Cc: jhubbard, gregkh, arnd, ira.weiny, jglisse, william.kucharski,
	hch, linux-kernel, linux-mm, linux-kernel-mentees

On Tue, Aug 13, 2019 at 09:50:29AM -0500, Dimitri Sivanich wrote:
> Bharath,
> 
> I do not believe that __get_user_pages_fast will work for the atomic case, as
> there is no guarantee that the 'current->mm' will be the correct one for the
> process in question, as the process might have moved away from the cpu that is
> handling interrupts for it's context.
So what your saying is, there may be cases where current->mm != gts->ts_mm
right? __get_user_pages_fast and get_user_pages do assume current->mm.

These changes were inspired a bit from kvm. In kvm/kvm_main.c,
hva_to_pfn_fast uses __get_user_pages_fast. THe comment above the
function states it runs in atomic context.

Just curious, get_user_pages also uses current->mm. Do you think that is
also an issue? 

Do you feel using get_user_pages_remote would be a better idea? We can
specify the mm_struct in get_user_pages_remote?

Thank you
Bharath
> On Sat, Aug 10, 2019 at 01:08:17AM +0530, Bharath Vedartham wrote:
> > For pages that were retained via get_user_pages*(), release those pages
> > via the new put_user_page*() routines, instead of via put_page() or
> > release_pages().
> > 
> > This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
> > ("mm: introduce put_user_page*(), placeholder versions").
> > 
> > As part of this conversion, the *pte_lookup functions can be removed and
> > be easily replaced with get_user_pages_fast() functions. In the case of
> > atomic lookup, __get_user_pages_fast() is used, because it does not fall
> > back to the slow path: get_user_pages(). get_user_pages_fast(), on the other
> > hand, first calls __get_user_pages_fast(), but then falls back to the
> > slow path if __get_user_pages_fast() fails.
> > 
> > Also: remove unnecessary CONFIG_HUGETLB ifdefs.
> > 
> > Cc: Ira Weiny <ira.weiny@intel.com>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Dimitri Sivanich <sivanich@sgi.com>
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > Cc: William Kucharski <william.kucharski@oracle.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-mm@kvack.org
> > Cc: linux-kernel-mentees@lists.linuxfoundation.org
> > Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> > Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> > Reviewed-by: William Kucharski <william.kucharski@oracle.com>
> > Signed-off-by: Bharath Vedartham <linux.bhar@gmail.com>
> > ---
> > This is a fold of the 3 patches in the v2 patch series.
> > The review tags were given to the individual patches.
> > 
> > Changes since v3
> > 	- Used gup flags in get_user_pages_fast rather than
> > 	boolean flags.
> > Changes since v4
> > 	- Updated changelog according to John Hubbard.
> > ---
> >  drivers/misc/sgi-gru/grufault.c | 112 +++++++++-------------------------------
> >  1 file changed, 24 insertions(+), 88 deletions(-)
> > 
> > diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> > index 4b713a8..304e9c5 100644
> > --- a/drivers/misc/sgi-gru/grufault.c
> > +++ b/drivers/misc/sgi-gru/grufault.c
> > @@ -166,96 +166,20 @@ static void get_clear_fault_map(struct gru_state *gru,
> >  }
> >  
> >  /*
> > - * Atomic (interrupt context) & non-atomic (user context) functions to
> > - * convert a vaddr into a physical address. The size of the page
> > - * is returned in pageshift.
> > - * 	returns:
> > - * 		  0 - successful
> > - * 		< 0 - error code
> > - * 		  1 - (atomic only) try again in non-atomic context
> > - */
> > -static int non_atomic_pte_lookup(struct vm_area_struct *vma,
> > -				 unsigned long vaddr, int write,
> > -				 unsigned long *paddr, int *pageshift)
> > -{
> > -	struct page *page;
> > -
> > -#ifdef CONFIG_HUGETLB_PAGE
> > -	*pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> > -#else
> > -	*pageshift = PAGE_SHIFT;
> > -#endif
> > -	if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
> > -		return -EFAULT;
> > -	*paddr = page_to_phys(page);
> > -	put_page(page);
> > -	return 0;
> > -}
> > -
> > -/*
> > - * atomic_pte_lookup
> > + * mmap_sem is already helod on entry to this function. This guarantees
> > + * existence of the page tables.
> >   *
> > - * Convert a user virtual address to a physical address
> >   * Only supports Intel large pages (2MB only) on x86_64.
> > - *	ZZZ - hugepage support is incomplete
> > - *
> > - * NOTE: mmap_sem is already held on entry to this function. This
> > - * guarantees existence of the page tables.
> > + *	ZZZ - hugepage support is incomplete.
> >   */
> > -static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
> > -	int write, unsigned long *paddr, int *pageshift)
> > -{
> > -	pgd_t *pgdp;
> > -	p4d_t *p4dp;
> > -	pud_t *pudp;
> > -	pmd_t *pmdp;
> > -	pte_t pte;
> > -
> > -	pgdp = pgd_offset(vma->vm_mm, vaddr);
> > -	if (unlikely(pgd_none(*pgdp)))
> > -		goto err;
> > -
> > -	p4dp = p4d_offset(pgdp, vaddr);
> > -	if (unlikely(p4d_none(*p4dp)))
> > -		goto err;
> > -
> > -	pudp = pud_offset(p4dp, vaddr);
> > -	if (unlikely(pud_none(*pudp)))
> > -		goto err;
> > -
> > -	pmdp = pmd_offset(pudp, vaddr);
> > -	if (unlikely(pmd_none(*pmdp)))
> > -		goto err;
> > -#ifdef CONFIG_X86_64
> > -	if (unlikely(pmd_large(*pmdp)))
> > -		pte = *(pte_t *) pmdp;
> > -	else
> > -#endif
> > -		pte = *pte_offset_kernel(pmdp, vaddr);
> > -
> > -	if (unlikely(!pte_present(pte) ||
> > -		     (write && (!pte_write(pte) || !pte_dirty(pte)))))
> > -		return 1;
> > -
> > -	*paddr = pte_pfn(pte) << PAGE_SHIFT;
> > -#ifdef CONFIG_HUGETLB_PAGE
> > -	*pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> > -#else
> > -	*pageshift = PAGE_SHIFT;
> > -#endif
> > -	return 0;
> > -
> > -err:
> > -	return 1;
> > -}
> > -
> >  static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
> >  		    int write, int atomic, unsigned long *gpa, int *pageshift)
> >  {
> >  	struct mm_struct *mm = gts->ts_mm;
> >  	struct vm_area_struct *vma;
> >  	unsigned long paddr;
> > -	int ret, ps;
> > +	int ret;
> > +	struct page *page;
> >  
> >  	vma = find_vma(mm, vaddr);
> >  	if (!vma)
> > @@ -263,21 +187,33 @@ static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
> >  
> >  	/*
> >  	 * Atomic lookup is faster & usually works even if called in non-atomic
> > -	 * context.
> > +	 * context. get_user_pages_fast does atomic lookup before falling back to
> > +	 * slow gup.
> >  	 */
> >  	rmb();	/* Must/check ms_range_active before loading PTEs */
> > -	ret = atomic_pte_lookup(vma, vaddr, write, &paddr, &ps);
> > -	if (ret) {
> > -		if (atomic)
> > +	if (atomic) {
> > +		ret = __get_user_pages_fast(vaddr, 1, write, &page);
> > +		if (!ret)
> >  			goto upm;
> > -		if (non_atomic_pte_lookup(vma, vaddr, write, &paddr, &ps))
> > +	} else {
> > +		ret = get_user_pages_fast(vaddr, 1, write ? FOLL_WRITE : 0, &page);
> > +		if (!ret)
> >  			goto inval;
> >  	}
> > +
> > +	paddr = page_to_phys(page);
> > +	put_user_page(page);
> > +
> > +	if (unlikely(is_vm_hugetlb_page(vma)))
> > +		*pageshift = HPAGE_SHIFT;
> > +	else
> > +		*pageshift = PAGE_SHIFT;
> > +
> >  	if (is_gru_paddr(paddr))
> >  		goto inval;
> > -	paddr = paddr & ~((1UL << ps) - 1);
> > +	paddr = paddr & ~((1UL << *pageshift) - 1);
> >  	*gpa = uv_soc_phys_ram_to_gpa(paddr);
> > -	*pageshift = ps;
> > +
> >  	return VTOP_SUCCESS;
> >  
> >  inval:
> > -- 
> > 2.7.4
> > 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Linux-kernel-mentees][PATCH v5 1/1] sgi-gru: Remove *pte_lookup functions, Convert to get_user_page*()
  2019-08-13 17:23     ` Bharath Vedartham
@ 2019-08-13 18:19       ` Dimitri Sivanich
  2019-08-14 17:30         ` Bharath Vedartham
  0 siblings, 1 reply; 10+ messages in thread
From: Dimitri Sivanich @ 2019-08-13 18:19 UTC (permalink / raw)
  To: Bharath Vedartham
  Cc: jhubbard, gregkh, arnd, ira.weiny, jglisse, william.kucharski,
	hch, linux-kernel, linux-mm, linux-kernel-mentees, sivanich

On Tue, Aug 13, 2019 at 10:53:01PM +0530, Bharath Vedartham wrote:
> On Tue, Aug 13, 2019 at 09:50:29AM -0500, Dimitri Sivanich wrote:
> > Bharath,
> > 
> > I do not believe that __get_user_pages_fast will work for the atomic case, as
> > there is no guarantee that the 'current->mm' will be the correct one for the
> > process in question, as the process might have moved away from the cpu that is
> > handling interrupts for it's context.
> So what your saying is, there may be cases where current->mm != gts->ts_mm
> right? __get_user_pages_fast and get_user_pages do assume current->mm.

Correct, in the case of atomic context.

> 
> These changes were inspired a bit from kvm. In kvm/kvm_main.c,
> hva_to_pfn_fast uses __get_user_pages_fast. THe comment above the
> function states it runs in atomic context.
> 
> Just curious, get_user_pages also uses current->mm. Do you think that is
> also an issue? 

Not in non-atomic context.  Notice that it is currently done that way.

> 
> Do you feel using get_user_pages_remote would be a better idea? We can
> specify the mm_struct in get_user_pages_remote?

From that standpoint maybe, but is it safe in interrupt context?

> 
> Thank you
> Bharath
> > On Sat, Aug 10, 2019 at 01:08:17AM +0530, Bharath Vedartham wrote:
> > > For pages that were retained via get_user_pages*(), release those pages
> > > via the new put_user_page*() routines, instead of via put_page() or
> > > release_pages().
> > > 
> > > This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
> > > ("mm: introduce put_user_page*(), placeholder versions").
> > > 
> > > As part of this conversion, the *pte_lookup functions can be removed and
> > > be easily replaced with get_user_pages_fast() functions. In the case of
> > > atomic lookup, __get_user_pages_fast() is used, because it does not fall
> > > back to the slow path: get_user_pages(). get_user_pages_fast(), on the other
> > > hand, first calls __get_user_pages_fast(), but then falls back to the
> > > slow path if __get_user_pages_fast() fails.
> > > 
> > > Also: remove unnecessary CONFIG_HUGETLB ifdefs.
> > > 
> > > Cc: Ira Weiny <ira.weiny@intel.com>
> > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > Cc: Jérôme Glisse <jglisse@redhat.com>
> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Cc: Dimitri Sivanich <sivanich@sgi.com>
> > > Cc: Arnd Bergmann <arnd@arndb.de>
> > > Cc: William Kucharski <william.kucharski@oracle.com>
> > > Cc: Christoph Hellwig <hch@lst.de>
> > > Cc: linux-kernel@vger.kernel.org
> > > Cc: linux-mm@kvack.org
> > > Cc: linux-kernel-mentees@lists.linuxfoundation.org
> > > Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> > > Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> > > Reviewed-by: William Kucharski <william.kucharski@oracle.com>
> > > Signed-off-by: Bharath Vedartham <linux.bhar@gmail.com>
> > > ---
> > > This is a fold of the 3 patches in the v2 patch series.
> > > The review tags were given to the individual patches.
> > > 
> > > Changes since v3
> > > 	- Used gup flags in get_user_pages_fast rather than
> > > 	boolean flags.
> > > Changes since v4
> > > 	- Updated changelog according to John Hubbard.
> > > ---
> > >  drivers/misc/sgi-gru/grufault.c | 112 +++++++++-------------------------------
> > >  1 file changed, 24 insertions(+), 88 deletions(-)
> > > 
> > > diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> > > index 4b713a8..304e9c5 100644
> > > --- a/drivers/misc/sgi-gru/grufault.c
> > > +++ b/drivers/misc/sgi-gru/grufault.c
> > > @@ -166,96 +166,20 @@ static void get_clear_fault_map(struct gru_state *gru,
> > >  }
> > >  
> > >  /*
> > > - * Atomic (interrupt context) & non-atomic (user context) functions to
> > > - * convert a vaddr into a physical address. The size of the page
> > > - * is returned in pageshift.
> > > - * 	returns:
> > > - * 		  0 - successful
> > > - * 		< 0 - error code
> > > - * 		  1 - (atomic only) try again in non-atomic context
> > > - */
> > > -static int non_atomic_pte_lookup(struct vm_area_struct *vma,
> > > -				 unsigned long vaddr, int write,
> > > -				 unsigned long *paddr, int *pageshift)
> > > -{
> > > -	struct page *page;
> > > -
> > > -#ifdef CONFIG_HUGETLB_PAGE
> > > -	*pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> > > -#else
> > > -	*pageshift = PAGE_SHIFT;
> > > -#endif
> > > -	if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
> > > -		return -EFAULT;
> > > -	*paddr = page_to_phys(page);
> > > -	put_page(page);
> > > -	return 0;
> > > -}
> > > -
> > > -/*
> > > - * atomic_pte_lookup
> > > + * mmap_sem is already helod on entry to this function. This guarantees
> > > + * existence of the page tables.
> > >   *
> > > - * Convert a user virtual address to a physical address
> > >   * Only supports Intel large pages (2MB only) on x86_64.
> > > - *	ZZZ - hugepage support is incomplete
> > > - *
> > > - * NOTE: mmap_sem is already held on entry to this function. This
> > > - * guarantees existence of the page tables.
> > > + *	ZZZ - hugepage support is incomplete.
> > >   */
> > > -static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
> > > -	int write, unsigned long *paddr, int *pageshift)
> > > -{
> > > -	pgd_t *pgdp;
> > > -	p4d_t *p4dp;
> > > -	pud_t *pudp;
> > > -	pmd_t *pmdp;
> > > -	pte_t pte;
> > > -
> > > -	pgdp = pgd_offset(vma->vm_mm, vaddr);
> > > -	if (unlikely(pgd_none(*pgdp)))
> > > -		goto err;
> > > -
> > > -	p4dp = p4d_offset(pgdp, vaddr);
> > > -	if (unlikely(p4d_none(*p4dp)))
> > > -		goto err;
> > > -
> > > -	pudp = pud_offset(p4dp, vaddr);
> > > -	if (unlikely(pud_none(*pudp)))
> > > -		goto err;
> > > -
> > > -	pmdp = pmd_offset(pudp, vaddr);
> > > -	if (unlikely(pmd_none(*pmdp)))
> > > -		goto err;
> > > -#ifdef CONFIG_X86_64
> > > -	if (unlikely(pmd_large(*pmdp)))
> > > -		pte = *(pte_t *) pmdp;
> > > -	else
> > > -#endif
> > > -		pte = *pte_offset_kernel(pmdp, vaddr);
> > > -
> > > -	if (unlikely(!pte_present(pte) ||
> > > -		     (write && (!pte_write(pte) || !pte_dirty(pte)))))
> > > -		return 1;
> > > -
> > > -	*paddr = pte_pfn(pte) << PAGE_SHIFT;
> > > -#ifdef CONFIG_HUGETLB_PAGE
> > > -	*pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> > > -#else
> > > -	*pageshift = PAGE_SHIFT;
> > > -#endif
> > > -	return 0;
> > > -
> > > -err:
> > > -	return 1;
> > > -}
> > > -
> > >  static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
> > >  		    int write, int atomic, unsigned long *gpa, int *pageshift)
> > >  {
> > >  	struct mm_struct *mm = gts->ts_mm;
> > >  	struct vm_area_struct *vma;
> > >  	unsigned long paddr;
> > > -	int ret, ps;
> > > +	int ret;
> > > +	struct page *page;
> > >  
> > >  	vma = find_vma(mm, vaddr);
> > >  	if (!vma)
> > > @@ -263,21 +187,33 @@ static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
> > >  
> > >  	/*
> > >  	 * Atomic lookup is faster & usually works even if called in non-atomic
> > > -	 * context.
> > > +	 * context. get_user_pages_fast does atomic lookup before falling back to
> > > +	 * slow gup.
> > >  	 */
> > >  	rmb();	/* Must/check ms_range_active before loading PTEs */
> > > -	ret = atomic_pte_lookup(vma, vaddr, write, &paddr, &ps);
> > > -	if (ret) {
> > > -		if (atomic)
> > > +	if (atomic) {
> > > +		ret = __get_user_pages_fast(vaddr, 1, write, &page);
> > > +		if (!ret)
> > >  			goto upm;
> > > -		if (non_atomic_pte_lookup(vma, vaddr, write, &paddr, &ps))
> > > +	} else {
> > > +		ret = get_user_pages_fast(vaddr, 1, write ? FOLL_WRITE : 0, &page);
> > > +		if (!ret)
> > >  			goto inval;
> > >  	}
> > > +
> > > +	paddr = page_to_phys(page);
> > > +	put_user_page(page);
> > > +
> > > +	if (unlikely(is_vm_hugetlb_page(vma)))
> > > +		*pageshift = HPAGE_SHIFT;
> > > +	else
> > > +		*pageshift = PAGE_SHIFT;
> > > +
> > >  	if (is_gru_paddr(paddr))
> > >  		goto inval;
> > > -	paddr = paddr & ~((1UL << ps) - 1);
> > > +	paddr = paddr & ~((1UL << *pageshift) - 1);
> > >  	*gpa = uv_soc_phys_ram_to_gpa(paddr);
> > > -	*pageshift = ps;
> > > +
> > >  	return VTOP_SUCCESS;
> > >  
> > >  inval:
> > > -- 
> > > 2.7.4
> > > 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Linux-kernel-mentees][PATCH v5 1/1] sgi-gru: Remove *pte_lookup functions, Convert to get_user_page*()
  2019-08-13 18:19       ` Dimitri Sivanich
@ 2019-08-14 17:30         ` Bharath Vedartham
  2019-08-14 17:38           ` Jason Gunthorpe
  0 siblings, 1 reply; 10+ messages in thread
From: Bharath Vedartham @ 2019-08-14 17:30 UTC (permalink / raw)
  To: Dimitri Sivanich
  Cc: jhubbard, gregkh, arnd, ira.weiny, jglisse, william.kucharski,
	hch, linux-kernel, linux-mm, linux-kernel-mentees

On Tue, Aug 13, 2019 at 01:19:38PM -0500, Dimitri Sivanich wrote:
> On Tue, Aug 13, 2019 at 10:53:01PM +0530, Bharath Vedartham wrote:
> > On Tue, Aug 13, 2019 at 09:50:29AM -0500, Dimitri Sivanich wrote:
> > > Bharath,
> > > 
> > > I do not believe that __get_user_pages_fast will work for the atomic case, as
> > > there is no guarantee that the 'current->mm' will be the correct one for the
> > > process in question, as the process might have moved away from the cpu that is
> > > handling interrupts for it's context.
> > So what your saying is, there may be cases where current->mm != gts->ts_mm
> > right? __get_user_pages_fast and get_user_pages do assume current->mm.
> 
> Correct, in the case of atomic context.
> 
> > 
> > These changes were inspired a bit from kvm. In kvm/kvm_main.c,
> > hva_to_pfn_fast uses __get_user_pages_fast. THe comment above the
> > function states it runs in atomic context.
> > 
> > Just curious, get_user_pages also uses current->mm. Do you think that is
> > also an issue? 
> 
> Not in non-atomic context.  Notice that it is currently done that way.
> 
> > 
> > Do you feel using get_user_pages_remote would be a better idea? We can
> > specify the mm_struct in get_user_pages_remote?
> 
> From that standpoint maybe, but is it safe in interrupt context?
Hmm.. The gup maintainers seemed fine with the code..

Now this is only an issue if gru_vtop can be executed in an interrupt
context. 

get_user_pages_remote is not valid in an interrupt context(if CONFIG_MMU
is set). If we follow the function, in __get_user_pages, cond_resched()
is called which definitly confirms that we can't run this function in an
interrupt context. 

I think we might need some advice from the gup maintainers here.
Note that the comment on the function __get_user_pages_fast states that
__get_user_pages_fast is IRQ-safe.

Thank you
Bharath
> > 
> > Thank you
> > Bharath
> > > On Sat, Aug 10, 2019 at 01:08:17AM +0530, Bharath Vedartham wrote:
> > > > For pages that were retained via get_user_pages*(), release those pages
> > > > via the new put_user_page*() routines, instead of via put_page() or
> > > > release_pages().
> > > > 
> > > > This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
> > > > ("mm: introduce put_user_page*(), placeholder versions").
> > > > 
> > > > As part of this conversion, the *pte_lookup functions can be removed and
> > > > be easily replaced with get_user_pages_fast() functions. In the case of
> > > > atomic lookup, __get_user_pages_fast() is used, because it does not fall
> > > > back to the slow path: get_user_pages(). get_user_pages_fast(), on the other
> > > > hand, first calls __get_user_pages_fast(), but then falls back to the
> > > > slow path if __get_user_pages_fast() fails.
> > > > 
> > > > Also: remove unnecessary CONFIG_HUGETLB ifdefs.
> > > > 
> > > > Cc: Ira Weiny <ira.weiny@intel.com>
> > > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > > Cc: Jérôme Glisse <jglisse@redhat.com>
> > > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > Cc: Dimitri Sivanich <sivanich@sgi.com>
> > > > Cc: Arnd Bergmann <arnd@arndb.de>
> > > > Cc: William Kucharski <william.kucharski@oracle.com>
> > > > Cc: Christoph Hellwig <hch@lst.de>
> > > > Cc: linux-kernel@vger.kernel.org
> > > > Cc: linux-mm@kvack.org
> > > > Cc: linux-kernel-mentees@lists.linuxfoundation.org
> > > > Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> > > > Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> > > > Reviewed-by: William Kucharski <william.kucharski@oracle.com>
> > > > Signed-off-by: Bharath Vedartham <linux.bhar@gmail.com>
> > > > ---
> > > > This is a fold of the 3 patches in the v2 patch series.
> > > > The review tags were given to the individual patches.
> > > > 
> > > > Changes since v3
> > > > 	- Used gup flags in get_user_pages_fast rather than
> > > > 	boolean flags.
> > > > Changes since v4
> > > > 	- Updated changelog according to John Hubbard.
> > > > ---
> > > >  drivers/misc/sgi-gru/grufault.c | 112 +++++++++-------------------------------
> > > >  1 file changed, 24 insertions(+), 88 deletions(-)
> > > > 
> > > > diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> > > > index 4b713a8..304e9c5 100644
> > > > --- a/drivers/misc/sgi-gru/grufault.c
> > > > +++ b/drivers/misc/sgi-gru/grufault.c
> > > > @@ -166,96 +166,20 @@ static void get_clear_fault_map(struct gru_state *gru,
> > > >  }
> > > >  
> > > >  /*
> > > > - * Atomic (interrupt context) & non-atomic (user context) functions to
> > > > - * convert a vaddr into a physical address. The size of the page
> > > > - * is returned in pageshift.
> > > > - * 	returns:
> > > > - * 		  0 - successful
> > > > - * 		< 0 - error code
> > > > - * 		  1 - (atomic only) try again in non-atomic context
> > > > - */
> > > > -static int non_atomic_pte_lookup(struct vm_area_struct *vma,
> > > > -				 unsigned long vaddr, int write,
> > > > -				 unsigned long *paddr, int *pageshift)
> > > > -{
> > > > -	struct page *page;
> > > > -
> > > > -#ifdef CONFIG_HUGETLB_PAGE
> > > > -	*pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> > > > -#else
> > > > -	*pageshift = PAGE_SHIFT;
> > > > -#endif
> > > > -	if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
> > > > -		return -EFAULT;
> > > > -	*paddr = page_to_phys(page);
> > > > -	put_page(page);
> > > > -	return 0;
> > > > -}
> > > > -
> > > > -/*
> > > > - * atomic_pte_lookup
> > > > + * mmap_sem is already helod on entry to this function. This guarantees
> > > > + * existence of the page tables.
> > > >   *
> > > > - * Convert a user virtual address to a physical address
> > > >   * Only supports Intel large pages (2MB only) on x86_64.
> > > > - *	ZZZ - hugepage support is incomplete
> > > > - *
> > > > - * NOTE: mmap_sem is already held on entry to this function. This
> > > > - * guarantees existence of the page tables.
> > > > + *	ZZZ - hugepage support is incomplete.
> > > >   */
> > > > -static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
> > > > -	int write, unsigned long *paddr, int *pageshift)
> > > > -{
> > > > -	pgd_t *pgdp;
> > > > -	p4d_t *p4dp;
> > > > -	pud_t *pudp;
> > > > -	pmd_t *pmdp;
> > > > -	pte_t pte;
> > > > -
> > > > -	pgdp = pgd_offset(vma->vm_mm, vaddr);
> > > > -	if (unlikely(pgd_none(*pgdp)))
> > > > -		goto err;
> > > > -
> > > > -	p4dp = p4d_offset(pgdp, vaddr);
> > > > -	if (unlikely(p4d_none(*p4dp)))
> > > > -		goto err;
> > > > -
> > > > -	pudp = pud_offset(p4dp, vaddr);
> > > > -	if (unlikely(pud_none(*pudp)))
> > > > -		goto err;
> > > > -
> > > > -	pmdp = pmd_offset(pudp, vaddr);
> > > > -	if (unlikely(pmd_none(*pmdp)))
> > > > -		goto err;
> > > > -#ifdef CONFIG_X86_64
> > > > -	if (unlikely(pmd_large(*pmdp)))
> > > > -		pte = *(pte_t *) pmdp;
> > > > -	else
> > > > -#endif
> > > > -		pte = *pte_offset_kernel(pmdp, vaddr);
> > > > -
> > > > -	if (unlikely(!pte_present(pte) ||
> > > > -		     (write && (!pte_write(pte) || !pte_dirty(pte)))))
> > > > -		return 1;
> > > > -
> > > > -	*paddr = pte_pfn(pte) << PAGE_SHIFT;
> > > > -#ifdef CONFIG_HUGETLB_PAGE
> > > > -	*pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> > > > -#else
> > > > -	*pageshift = PAGE_SHIFT;
> > > > -#endif
> > > > -	return 0;
> > > > -
> > > > -err:
> > > > -	return 1;
> > > > -}
> > > > -
> > > >  static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
> > > >  		    int write, int atomic, unsigned long *gpa, int *pageshift)
> > > >  {
> > > >  	struct mm_struct *mm = gts->ts_mm;
> > > >  	struct vm_area_struct *vma;
> > > >  	unsigned long paddr;
> > > > -	int ret, ps;
> > > > +	int ret;
> > > > +	struct page *page;
> > > >  
> > > >  	vma = find_vma(mm, vaddr);
> > > >  	if (!vma)
> > > > @@ -263,21 +187,33 @@ static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
> > > >  
> > > >  	/*
> > > >  	 * Atomic lookup is faster & usually works even if called in non-atomic
> > > > -	 * context.
> > > > +	 * context. get_user_pages_fast does atomic lookup before falling back to
> > > > +	 * slow gup.
> > > >  	 */
> > > >  	rmb();	/* Must/check ms_range_active before loading PTEs */
> > > > -	ret = atomic_pte_lookup(vma, vaddr, write, &paddr, &ps);
> > > > -	if (ret) {
> > > > -		if (atomic)
> > > > +	if (atomic) {
> > > > +		ret = __get_user_pages_fast(vaddr, 1, write, &page);
> > > > +		if (!ret)
> > > >  			goto upm;
> > > > -		if (non_atomic_pte_lookup(vma, vaddr, write, &paddr, &ps))
> > > > +	} else {
> > > > +		ret = get_user_pages_fast(vaddr, 1, write ? FOLL_WRITE : 0, &page);
> > > > +		if (!ret)
> > > >  			goto inval;
> > > >  	}
> > > > +
> > > > +	paddr = page_to_phys(page);
> > > > +	put_user_page(page);
> > > > +
> > > > +	if (unlikely(is_vm_hugetlb_page(vma)))
> > > > +		*pageshift = HPAGE_SHIFT;
> > > > +	else
> > > > +		*pageshift = PAGE_SHIFT;
> > > > +
> > > >  	if (is_gru_paddr(paddr))
> > > >  		goto inval;
> > > > -	paddr = paddr & ~((1UL << ps) - 1);
> > > > +	paddr = paddr & ~((1UL << *pageshift) - 1);
> > > >  	*gpa = uv_soc_phys_ram_to_gpa(paddr);
> > > > -	*pageshift = ps;
> > > > +
> > > >  	return VTOP_SUCCESS;
> > > >  
> > > >  inval:
> > > > -- 
> > > > 2.7.4
> > > > 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Linux-kernel-mentees][PATCH v5 1/1] sgi-gru: Remove *pte_lookup functions, Convert to get_user_page*()
  2019-08-14 17:30         ` Bharath Vedartham
@ 2019-08-14 17:38           ` Jason Gunthorpe
  2019-08-14 18:00             ` Bharath Vedartham
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2019-08-14 17:38 UTC (permalink / raw)
  To: Bharath Vedartham
  Cc: Dimitri Sivanich, jhubbard, gregkh, arnd, ira.weiny, jglisse,
	william.kucharski, hch, linux-kernel, linux-mm,
	linux-kernel-mentees

On Wed, Aug 14, 2019 at 11:00:34PM +0530, Bharath Vedartham wrote:
> On Tue, Aug 13, 2019 at 01:19:38PM -0500, Dimitri Sivanich wrote:
> > On Tue, Aug 13, 2019 at 10:53:01PM +0530, Bharath Vedartham wrote:
> > > On Tue, Aug 13, 2019 at 09:50:29AM -0500, Dimitri Sivanich wrote:
> > > > Bharath,
> > > > 
> > > > I do not believe that __get_user_pages_fast will work for the atomic case, as
> > > > there is no guarantee that the 'current->mm' will be the correct one for the
> > > > process in question, as the process might have moved away from the cpu that is
> > > > handling interrupts for it's context.
> > > So what your saying is, there may be cases where current->mm != gts->ts_mm
> > > right? __get_user_pages_fast and get_user_pages do assume current->mm.
> > 
> > Correct, in the case of atomic context.
> > 
> > > 
> > > These changes were inspired a bit from kvm. In kvm/kvm_main.c,
> > > hva_to_pfn_fast uses __get_user_pages_fast. THe comment above the
> > > function states it runs in atomic context.
> > > 
> > > Just curious, get_user_pages also uses current->mm. Do you think that is
> > > also an issue? 
> > 
> > Not in non-atomic context.  Notice that it is currently done that way.
> > 
> > > 
> > > Do you feel using get_user_pages_remote would be a better idea? We can
> > > specify the mm_struct in get_user_pages_remote?
> > 
> > From that standpoint maybe, but is it safe in interrupt context?
> Hmm.. The gup maintainers seemed fine with the code..
> 
> Now this is only an issue if gru_vtop can be executed in an interrupt
> context. 
> 
> get_user_pages_remote is not valid in an interrupt context(if CONFIG_MMU
> is set). If we follow the function, in __get_user_pages, cond_resched()
> is called which definitly confirms that we can't run this function in an
> interrupt context. 
> 
> I think we might need some advice from the gup maintainers here.
> Note that the comment on the function __get_user_pages_fast states that
> __get_user_pages_fast is IRQ-safe.

vhost is doing some approach where they switch current to the target
then call __get_user_pages_fast in an IRQ context, that might be a
reasonable pattern

If this is a regular occurance we should probably add a
get_atomic_user_pages_remote() to make the pattern clear.

Jason

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Linux-kernel-mentees][PATCH v5 1/1] sgi-gru: Remove *pte_lookup functions, Convert to get_user_page*()
  2019-08-14 17:38           ` Jason Gunthorpe
@ 2019-08-14 18:00             ` Bharath Vedartham
  0 siblings, 0 replies; 10+ messages in thread
From: Bharath Vedartham @ 2019-08-14 18:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Dimitri Sivanich, jhubbard, gregkh, arnd, ira.weiny, jglisse,
	william.kucharski, hch, linux-kernel, linux-mm,
	linux-kernel-mentees

On Wed, Aug 14, 2019 at 02:38:30PM -0300, Jason Gunthorpe wrote:
> On Wed, Aug 14, 2019 at 11:00:34PM +0530, Bharath Vedartham wrote:
> > On Tue, Aug 13, 2019 at 01:19:38PM -0500, Dimitri Sivanich wrote:
> > > On Tue, Aug 13, 2019 at 10:53:01PM +0530, Bharath Vedartham wrote:
> > > > On Tue, Aug 13, 2019 at 09:50:29AM -0500, Dimitri Sivanich wrote:
> > > > > Bharath,
> > > > > 
> > > > > I do not believe that __get_user_pages_fast will work for the atomic case, as
> > > > > there is no guarantee that the 'current->mm' will be the correct one for the
> > > > > process in question, as the process might have moved away from the cpu that is
> > > > > handling interrupts for it's context.
> > > > So what your saying is, there may be cases where current->mm != gts->ts_mm
> > > > right? __get_user_pages_fast and get_user_pages do assume current->mm.
> > > 
> > > Correct, in the case of atomic context.
> > > 
> > > > 
> > > > These changes were inspired a bit from kvm. In kvm/kvm_main.c,
> > > > hva_to_pfn_fast uses __get_user_pages_fast. THe comment above the
> > > > function states it runs in atomic context.
> > > > 
> > > > Just curious, get_user_pages also uses current->mm. Do you think that is
> > > > also an issue? 
> > > 
> > > Not in non-atomic context.  Notice that it is currently done that way.
> > > 
> > > > 
> > > > Do you feel using get_user_pages_remote would be a better idea? We can
> > > > specify the mm_struct in get_user_pages_remote?
> > > 
> > > From that standpoint maybe, but is it safe in interrupt context?
> > Hmm.. The gup maintainers seemed fine with the code..
> > 
> > Now this is only an issue if gru_vtop can be executed in an interrupt
> > context. 
> > 
> > get_user_pages_remote is not valid in an interrupt context(if CONFIG_MMU
> > is set). If we follow the function, in __get_user_pages, cond_resched()
> > is called which definitly confirms that we can't run this function in an
> > interrupt context. 
> > 
> > I think we might need some advice from the gup maintainers here.
> > Note that the comment on the function __get_user_pages_fast states that
> > __get_user_pages_fast is IRQ-safe.
> 
> vhost is doing some approach where they switch current to the target
> then call __get_user_pages_fast in an IRQ context, that might be a
> reasonable pattern
> 
> If this is a regular occurance we should probably add a
> get_atomic_user_pages_remote() to make the pattern clear.
> 
> Jason

That makes sense. get_atomic_user_pages_remote() should not be hard to
write. AFAIKS __get_user_pages_fast is special_cased for current, we
could probably just add a new parameter of the mm_struct to the page
table walking code in gup.c

But till then I think we can approach this by the way vhost approaches
this problem by switching current to the target. 

Thank you
Bharath

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Linux-kernel-mentees][PATCH v5 1/1] sgi-gru: Remove *pte_lookup functions, Convert to get_user_page*()
  2019-08-09 19:38 ` [Linux-kernel-mentees][PATCH v5 1/1] sgi-gru: Remove *pte_lookup functions, Convert to get_user_page*() Bharath Vedartham
  2019-08-13 14:50   ` Dimitri Sivanich
@ 2019-08-18 17:58   ` Bharath Vedartham
  2019-08-18 18:25     ` Dimitri Sivanich
  1 sibling, 1 reply; 10+ messages in thread
From: Bharath Vedartham @ 2019-08-18 17:58 UTC (permalink / raw)
  To: jhubbard, gregkh, sivanich, arnd
  Cc: ira.weiny, jglisse, william.kucharski, hch, linux-kernel,
	linux-mm, linux-kernel-mentees

Hi Dimitri,

Can you confirm that this driver will run gru_vtop() in interrupt
context?

If so, I ll send you another set of patches in which I don't change the
*pte_lookup functions but only change put_page to put_user_page and
remove the ifdef for CONFIG_HUGETLB_PAGE.

Thank you for your time.

Thank you
Bharath

On Sat, Aug 10, 2019 at 01:08:17AM +0530, Bharath Vedartham wrote:
> For pages that were retained via get_user_pages*(), release those pages
> via the new put_user_page*() routines, instead of via put_page() or
> release_pages().
> 
> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions").
> 
> As part of this conversion, the *pte_lookup functions can be removed and
> be easily replaced with get_user_pages_fast() functions. In the case of
> atomic lookup, __get_user_pages_fast() is used, because it does not fall
> back to the slow path: get_user_pages(). get_user_pages_fast(), on the other
> hand, first calls __get_user_pages_fast(), but then falls back to the
> slow path if __get_user_pages_fast() fails.
> 
> Also: remove unnecessary CONFIG_HUGETLB ifdefs.
> 
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Dimitri Sivanich <sivanich@sgi.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: William Kucharski <william.kucharski@oracle.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: linux-kernel-mentees@lists.linuxfoundation.org
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
> Signed-off-by: Bharath Vedartham <linux.bhar@gmail.com>
> ---
> This is a fold of the 3 patches in the v2 patch series.
> The review tags were given to the individual patches.
> 
> Changes since v3
> 	- Used gup flags in get_user_pages_fast rather than
> 	boolean flags.
> Changes since v4
> 	- Updated changelog according to John Hubbard.
> ---
>  drivers/misc/sgi-gru/grufault.c | 112 +++++++++-------------------------------
>  1 file changed, 24 insertions(+), 88 deletions(-)
> 
> diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> index 4b713a8..304e9c5 100644
> --- a/drivers/misc/sgi-gru/grufault.c
> +++ b/drivers/misc/sgi-gru/grufault.c
> @@ -166,96 +166,20 @@ static void get_clear_fault_map(struct gru_state *gru,
>  }
>  
>  /*
> - * Atomic (interrupt context) & non-atomic (user context) functions to
> - * convert a vaddr into a physical address. The size of the page
> - * is returned in pageshift.
> - * 	returns:
> - * 		  0 - successful
> - * 		< 0 - error code
> - * 		  1 - (atomic only) try again in non-atomic context
> - */
> -static int non_atomic_pte_lookup(struct vm_area_struct *vma,
> -				 unsigned long vaddr, int write,
> -				 unsigned long *paddr, int *pageshift)
> -{
> -	struct page *page;
> -
> -#ifdef CONFIG_HUGETLB_PAGE
> -	*pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> -#else
> -	*pageshift = PAGE_SHIFT;
> -#endif
> -	if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
> -		return -EFAULT;
> -	*paddr = page_to_phys(page);
> -	put_page(page);
> -	return 0;
> -}
> -
> -/*
> - * atomic_pte_lookup
> + * mmap_sem is already helod on entry to this function. This guarantees
> + * existence of the page tables.
>   *
> - * Convert a user virtual address to a physical address
>   * Only supports Intel large pages (2MB only) on x86_64.
> - *	ZZZ - hugepage support is incomplete
> - *
> - * NOTE: mmap_sem is already held on entry to this function. This
> - * guarantees existence of the page tables.
> + *	ZZZ - hugepage support is incomplete.
>   */
> -static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
> -	int write, unsigned long *paddr, int *pageshift)
> -{
> -	pgd_t *pgdp;
> -	p4d_t *p4dp;
> -	pud_t *pudp;
> -	pmd_t *pmdp;
> -	pte_t pte;
> -
> -	pgdp = pgd_offset(vma->vm_mm, vaddr);
> -	if (unlikely(pgd_none(*pgdp)))
> -		goto err;
> -
> -	p4dp = p4d_offset(pgdp, vaddr);
> -	if (unlikely(p4d_none(*p4dp)))
> -		goto err;
> -
> -	pudp = pud_offset(p4dp, vaddr);
> -	if (unlikely(pud_none(*pudp)))
> -		goto err;
> -
> -	pmdp = pmd_offset(pudp, vaddr);
> -	if (unlikely(pmd_none(*pmdp)))
> -		goto err;
> -#ifdef CONFIG_X86_64
> -	if (unlikely(pmd_large(*pmdp)))
> -		pte = *(pte_t *) pmdp;
> -	else
> -#endif
> -		pte = *pte_offset_kernel(pmdp, vaddr);
> -
> -	if (unlikely(!pte_present(pte) ||
> -		     (write && (!pte_write(pte) || !pte_dirty(pte)))))
> -		return 1;
> -
> -	*paddr = pte_pfn(pte) << PAGE_SHIFT;
> -#ifdef CONFIG_HUGETLB_PAGE
> -	*pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> -#else
> -	*pageshift = PAGE_SHIFT;
> -#endif
> -	return 0;
> -
> -err:
> -	return 1;
> -}
> -
>  static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
>  		    int write, int atomic, unsigned long *gpa, int *pageshift)
>  {
>  	struct mm_struct *mm = gts->ts_mm;
>  	struct vm_area_struct *vma;
>  	unsigned long paddr;
> -	int ret, ps;
> +	int ret;
> +	struct page *page;
>  
>  	vma = find_vma(mm, vaddr);
>  	if (!vma)
> @@ -263,21 +187,33 @@ static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
>  
>  	/*
>  	 * Atomic lookup is faster & usually works even if called in non-atomic
> -	 * context.
> +	 * context. get_user_pages_fast does atomic lookup before falling back to
> +	 * slow gup.
>  	 */
>  	rmb();	/* Must/check ms_range_active before loading PTEs */
> -	ret = atomic_pte_lookup(vma, vaddr, write, &paddr, &ps);
> -	if (ret) {
> -		if (atomic)
> +	if (atomic) {
> +		ret = __get_user_pages_fast(vaddr, 1, write, &page);
> +		if (!ret)
>  			goto upm;
> -		if (non_atomic_pte_lookup(vma, vaddr, write, &paddr, &ps))
> +	} else {
> +		ret = get_user_pages_fast(vaddr, 1, write ? FOLL_WRITE : 0, &page);
> +		if (!ret)
>  			goto inval;
>  	}
> +
> +	paddr = page_to_phys(page);
> +	put_user_page(page);
> +
> +	if (unlikely(is_vm_hugetlb_page(vma)))
> +		*pageshift = HPAGE_SHIFT;
> +	else
> +		*pageshift = PAGE_SHIFT;
> +
>  	if (is_gru_paddr(paddr))
>  		goto inval;
> -	paddr = paddr & ~((1UL << ps) - 1);
> +	paddr = paddr & ~((1UL << *pageshift) - 1);
>  	*gpa = uv_soc_phys_ram_to_gpa(paddr);
> -	*pageshift = ps;
> +
>  	return VTOP_SUCCESS;
>  
>  inval:
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Linux-kernel-mentees][PATCH v5 1/1] sgi-gru: Remove *pte_lookup functions, Convert to get_user_page*()
  2019-08-18 17:58   ` Bharath Vedartham
@ 2019-08-18 18:25     ` Dimitri Sivanich
  0 siblings, 0 replies; 10+ messages in thread
From: Dimitri Sivanich @ 2019-08-18 18:25 UTC (permalink / raw)
  To: Bharath Vedartham
  Cc: jhubbard, gregkh, sivanich, arnd, ira.weiny, jglisse,
	william.kucharski, hch, linux-kernel, linux-mm,
	linux-kernel-mentees

Yes it will.

On Sun, Aug 18, 2019 at 11:28:24PM +0530, Bharath Vedartham wrote:
> Hi Dimitri,
> 
> Can you confirm that this driver will run gru_vtop() in interrupt
> context?
> 
> If so, I ll send you another set of patches in which I don't change the
> *pte_lookup functions but only change put_page to put_user_page and
> remove the ifdef for CONFIG_HUGETLB_PAGE.
> 
> Thank you for your time.
> 
> Thank you
> Bharath
> 
> On Sat, Aug 10, 2019 at 01:08:17AM +0530, Bharath Vedartham wrote:
> > For pages that were retained via get_user_pages*(), release those pages
> > via the new put_user_page*() routines, instead of via put_page() or
> > release_pages().
> > 
> > This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
> > ("mm: introduce put_user_page*(), placeholder versions").
> > 
> > As part of this conversion, the *pte_lookup functions can be removed and
> > be easily replaced with get_user_pages_fast() functions. In the case of
> > atomic lookup, __get_user_pages_fast() is used, because it does not fall
> > back to the slow path: get_user_pages(). get_user_pages_fast(), on the other
> > hand, first calls __get_user_pages_fast(), but then falls back to the
> > slow path if __get_user_pages_fast() fails.
> > 
> > Also: remove unnecessary CONFIG_HUGETLB ifdefs.
> > 
> > Cc: Ira Weiny <ira.weiny@intel.com>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Dimitri Sivanich <sivanich@sgi.com>
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > Cc: William Kucharski <william.kucharski@oracle.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-mm@kvack.org
> > Cc: linux-kernel-mentees@lists.linuxfoundation.org
> > Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> > Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> > Reviewed-by: William Kucharski <william.kucharski@oracle.com>
> > Signed-off-by: Bharath Vedartham <linux.bhar@gmail.com>
> > ---
> > This is a fold of the 3 patches in the v2 patch series.
> > The review tags were given to the individual patches.
> > 
> > Changes since v3
> > 	- Used gup flags in get_user_pages_fast rather than
> > 	boolean flags.
> > Changes since v4
> > 	- Updated changelog according to John Hubbard.
> > ---
> >  drivers/misc/sgi-gru/grufault.c | 112 +++++++++-------------------------------
> >  1 file changed, 24 insertions(+), 88 deletions(-)
> > 
> > diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> > index 4b713a8..304e9c5 100644
> > --- a/drivers/misc/sgi-gru/grufault.c
> > +++ b/drivers/misc/sgi-gru/grufault.c
> > @@ -166,96 +166,20 @@ static void get_clear_fault_map(struct gru_state *gru,
> >  }
> >  
> >  /*
> > - * Atomic (interrupt context) & non-atomic (user context) functions to
> > - * convert a vaddr into a physical address. The size of the page
> > - * is returned in pageshift.
> > - * 	returns:
> > - * 		  0 - successful
> > - * 		< 0 - error code
> > - * 		  1 - (atomic only) try again in non-atomic context
> > - */
> > -static int non_atomic_pte_lookup(struct vm_area_struct *vma,
> > -				 unsigned long vaddr, int write,
> > -				 unsigned long *paddr, int *pageshift)
> > -{
> > -	struct page *page;
> > -
> > -#ifdef CONFIG_HUGETLB_PAGE
> > -	*pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> > -#else
> > -	*pageshift = PAGE_SHIFT;
> > -#endif
> > -	if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
> > -		return -EFAULT;
> > -	*paddr = page_to_phys(page);
> > -	put_page(page);
> > -	return 0;
> > -}
> > -
> > -/*
> > - * atomic_pte_lookup
> > + * mmap_sem is already helod on entry to this function. This guarantees
> > + * existence of the page tables.
> >   *
> > - * Convert a user virtual address to a physical address
> >   * Only supports Intel large pages (2MB only) on x86_64.
> > - *	ZZZ - hugepage support is incomplete
> > - *
> > - * NOTE: mmap_sem is already held on entry to this function. This
> > - * guarantees existence of the page tables.
> > + *	ZZZ - hugepage support is incomplete.
> >   */
> > -static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
> > -	int write, unsigned long *paddr, int *pageshift)
> > -{
> > -	pgd_t *pgdp;
> > -	p4d_t *p4dp;
> > -	pud_t *pudp;
> > -	pmd_t *pmdp;
> > -	pte_t pte;
> > -
> > -	pgdp = pgd_offset(vma->vm_mm, vaddr);
> > -	if (unlikely(pgd_none(*pgdp)))
> > -		goto err;
> > -
> > -	p4dp = p4d_offset(pgdp, vaddr);
> > -	if (unlikely(p4d_none(*p4dp)))
> > -		goto err;
> > -
> > -	pudp = pud_offset(p4dp, vaddr);
> > -	if (unlikely(pud_none(*pudp)))
> > -		goto err;
> > -
> > -	pmdp = pmd_offset(pudp, vaddr);
> > -	if (unlikely(pmd_none(*pmdp)))
> > -		goto err;
> > -#ifdef CONFIG_X86_64
> > -	if (unlikely(pmd_large(*pmdp)))
> > -		pte = *(pte_t *) pmdp;
> > -	else
> > -#endif
> > -		pte = *pte_offset_kernel(pmdp, vaddr);
> > -
> > -	if (unlikely(!pte_present(pte) ||
> > -		     (write && (!pte_write(pte) || !pte_dirty(pte)))))
> > -		return 1;
> > -
> > -	*paddr = pte_pfn(pte) << PAGE_SHIFT;
> > -#ifdef CONFIG_HUGETLB_PAGE
> > -	*pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> > -#else
> > -	*pageshift = PAGE_SHIFT;
> > -#endif
> > -	return 0;
> > -
> > -err:
> > -	return 1;
> > -}
> > -
> >  static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
> >  		    int write, int atomic, unsigned long *gpa, int *pageshift)
> >  {
> >  	struct mm_struct *mm = gts->ts_mm;
> >  	struct vm_area_struct *vma;
> >  	unsigned long paddr;
> > -	int ret, ps;
> > +	int ret;
> > +	struct page *page;
> >  
> >  	vma = find_vma(mm, vaddr);
> >  	if (!vma)
> > @@ -263,21 +187,33 @@ static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
> >  
> >  	/*
> >  	 * Atomic lookup is faster & usually works even if called in non-atomic
> > -	 * context.
> > +	 * context. get_user_pages_fast does atomic lookup before falling back to
> > +	 * slow gup.
> >  	 */
> >  	rmb();	/* Must/check ms_range_active before loading PTEs */
> > -	ret = atomic_pte_lookup(vma, vaddr, write, &paddr, &ps);
> > -	if (ret) {
> > -		if (atomic)
> > +	if (atomic) {
> > +		ret = __get_user_pages_fast(vaddr, 1, write, &page);
> > +		if (!ret)
> >  			goto upm;
> > -		if (non_atomic_pte_lookup(vma, vaddr, write, &paddr, &ps))
> > +	} else {
> > +		ret = get_user_pages_fast(vaddr, 1, write ? FOLL_WRITE : 0, &page);
> > +		if (!ret)
> >  			goto inval;
> >  	}
> > +
> > +	paddr = page_to_phys(page);
> > +	put_user_page(page);
> > +
> > +	if (unlikely(is_vm_hugetlb_page(vma)))
> > +		*pageshift = HPAGE_SHIFT;
> > +	else
> > +		*pageshift = PAGE_SHIFT;
> > +
> >  	if (is_gru_paddr(paddr))
> >  		goto inval;
> > -	paddr = paddr & ~((1UL << ps) - 1);
> > +	paddr = paddr & ~((1UL << *pageshift) - 1);
> >  	*gpa = uv_soc_phys_ram_to_gpa(paddr);
> > -	*pageshift = ps;
> > +
> >  	return VTOP_SUCCESS;
> >  
> >  inval:
> > -- 
> > 2.7.4
> > 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-08-18 18:26 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-09 19:38 [Linux-kernel-mentees][PATCH v5 0/1] get_user_pages changes Bharath Vedartham
2019-08-09 19:38 ` [Linux-kernel-mentees][PATCH v5 1/1] sgi-gru: Remove *pte_lookup functions, Convert to get_user_page*() Bharath Vedartham
2019-08-13 14:50   ` Dimitri Sivanich
2019-08-13 17:23     ` Bharath Vedartham
2019-08-13 18:19       ` Dimitri Sivanich
2019-08-14 17:30         ` Bharath Vedartham
2019-08-14 17:38           ` Jason Gunthorpe
2019-08-14 18:00             ` Bharath Vedartham
2019-08-18 17:58   ` Bharath Vedartham
2019-08-18 18:25     ` Dimitri Sivanich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).