linux-next.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC v1 0/2] add callbacks for inaccessible pages
@ 2020-02-28 15:43 Claudio Imbrenda
  2020-02-28 15:43 ` [RFC v1 1/2] fixup for 9947ea2c1e608e32669d5caeb67b3e3fba3309e8 "mm/gup: track FOLL_PIN pages" Claudio Imbrenda
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Claudio Imbrenda @ 2020-02-28 15:43 UTC (permalink / raw)
  To: linux-next, akpm
  Cc: borntraeger, david, aarcange, linux-mm, frankja, sfr, jhubbard,
	linux-kernel, linux-s390

This patchset has a fixup for gup/mm, and provides the necessary arch
hooks to enable protected virtualization.

Claudio Imbrenda (2):
  mm/gup: fixup for 9947ea2c1e608e32 "mm/gup: track FOLL_PIN pages"
  mm/gup/writeback: add callbacks for inaccessible pages

 include/linux/gfp.h |  6 ++++++
 mm/gup.c            | 24 ++++++++++++++++++++----
 mm/page-writeback.c |  5 +++++
 3 files changed, 31 insertions(+), 4 deletions(-)

-- 
2.24.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC v1 1/2] fixup for 9947ea2c1e608e32669d5caeb67b3e3fba3309e8 "mm/gup: track FOLL_PIN pages"
  2020-02-28 15:43 [RFC v1 0/2] add callbacks for inaccessible pages Claudio Imbrenda
@ 2020-02-28 15:43 ` Claudio Imbrenda
  2020-02-28 15:45   ` Claudio Imbrenda
  2020-02-28 15:43 ` [RFC v1 1/2] mm/gup: fixup for 9947ea2c1e608e32 " Claudio Imbrenda
  2020-02-28 15:43 ` [RFC v1 2/2] mm/gup/writeback: add callbacks for inaccessible pages Claudio Imbrenda
  2 siblings, 1 reply; 14+ messages in thread
From: Claudio Imbrenda @ 2020-02-28 15:43 UTC (permalink / raw)
  To: linux-next, akpm
  Cc: borntraeger, david, aarcange, linux-mm, frankja, sfr, jhubbard,
	linux-kernel, linux-s390

in case pin fails, we need to unpin, just a put_page will not be enough

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 mm/gup.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index f589299b0d4a..0b9a806898f3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2134,7 +2134,10 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 			goto pte_unmap;
 
 		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
-			put_page(head);
+			if (flags & FOLL_GET)
+				put_page(head);
+			else if (flags & FOLL_PIN)
+				unpin_user_page(head);
 			goto pte_unmap;
 		}
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC v1 1/2] mm/gup: fixup for 9947ea2c1e608e32 "mm/gup: track FOLL_PIN pages"
  2020-02-28 15:43 [RFC v1 0/2] add callbacks for inaccessible pages Claudio Imbrenda
  2020-02-28 15:43 ` [RFC v1 1/2] fixup for 9947ea2c1e608e32669d5caeb67b3e3fba3309e8 "mm/gup: track FOLL_PIN pages" Claudio Imbrenda
@ 2020-02-28 15:43 ` Claudio Imbrenda
  2020-02-28 23:08   ` John Hubbard
  2020-02-28 15:43 ` [RFC v1 2/2] mm/gup/writeback: add callbacks for inaccessible pages Claudio Imbrenda
  2 siblings, 1 reply; 14+ messages in thread
From: Claudio Imbrenda @ 2020-02-28 15:43 UTC (permalink / raw)
  To: linux-next, akpm
  Cc: borntraeger, david, aarcange, linux-mm, frankja, sfr, jhubbard,
	linux-kernel, linux-s390

In case pin fails, we need to unpin, a simple put_page will not be enough

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 mm/gup.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index f589299b0d4a..0b9a806898f3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2134,7 +2134,10 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 			goto pte_unmap;
 
 		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
-			put_page(head);
+			if (flags & FOLL_GET)
+				put_page(head);
+			else if (flags & FOLL_PIN)
+				unpin_user_page(head);
 			goto pte_unmap;
 		}
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC v1 2/2] mm/gup/writeback: add callbacks for inaccessible pages
  2020-02-28 15:43 [RFC v1 0/2] add callbacks for inaccessible pages Claudio Imbrenda
  2020-02-28 15:43 ` [RFC v1 1/2] fixup for 9947ea2c1e608e32669d5caeb67b3e3fba3309e8 "mm/gup: track FOLL_PIN pages" Claudio Imbrenda
  2020-02-28 15:43 ` [RFC v1 1/2] mm/gup: fixup for 9947ea2c1e608e32 " Claudio Imbrenda
@ 2020-02-28 15:43 ` Claudio Imbrenda
  2020-02-28 16:08   ` Christian Borntraeger
  2 siblings, 1 reply; 14+ messages in thread
From: Claudio Imbrenda @ 2020-02-28 15:43 UTC (permalink / raw)
  To: linux-next, akpm
  Cc: borntraeger, david, aarcange, linux-mm, frankja, sfr, jhubbard,
	linux-kernel, linux-s390, Will Deacon

With the introduction of protected KVM guests on s390 there is now a
concept of inaccessible pages. These pages need to be made accessible
before the host can access them.

While cpu accesses will trigger a fault that can be resolved, I/O
accesses will just fail.  We need to add a callback into architecture
code for places that will do I/O, namely when writeback is started or
when a page reference is taken.

This is not only to enable paging, file backing etc, it is also
necessary to protect the host against a malicious user space. For
example a bad QEMU could simply start direct I/O on such protected
memory.  We do not want userspace to be able to trigger I/O errors and
thus we the logic is "whenever somebody accesses that page (gup) or
does I/O, make sure that this page can be accessed". When the guest
tries to access that page we will wait in the page fault handler for
writeback to have finished and for the page_ref to be the expected
value.

On s390x the function is not supposed to fail, so it is ok to use a
WARN_ON on failure. If we ever need some more finegrained handling
we can tackle this when we know the details.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 include/linux/gfp.h |  6 ++++++
 mm/gup.c            | 19 ++++++++++++++++---
 mm/page-writeback.c |  5 +++++
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index e5b817cb86e7..be2754841369 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -485,6 +485,12 @@ static inline void arch_free_page(struct page *page, int order) { }
 #ifndef HAVE_ARCH_ALLOC_PAGE
 static inline void arch_alloc_page(struct page *page, int order) { }
 #endif
+#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
+static inline int arch_make_page_accessible(struct page *page)
+{
+	return 0;
+}
+#endif
 
 struct page *
 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
diff --git a/mm/gup.c b/mm/gup.c
index 0b9a806898f3..86fff6e4e4f3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -391,6 +391,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	struct page *page;
 	spinlock_t *ptl;
 	pte_t *ptep, pte;
+	int ret;
 
 	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
 	if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
@@ -449,8 +450,6 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		if (is_zero_pfn(pte_pfn(pte))) {
 			page = pte_page(pte);
 		} else {
-			int ret;
-
 			ret = follow_pfn_pte(vma, address, ptep, flags);
 			page = ERR_PTR(ret);
 			goto out;
@@ -458,7 +457,6 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	}
 
 	if (flags & FOLL_SPLIT && PageTransCompound(page)) {
-		int ret;
 		get_page(page);
 		pte_unmap_unlock(ptep, ptl);
 		lock_page(page);
@@ -475,6 +473,14 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		page = ERR_PTR(-ENOMEM);
 		goto out;
 	}
+	if (flags & FOLL_PIN) {
+		ret = arch_make_page_accessible(page);
+		if (ret) {
+			unpin_user_page(page);
+			page = ERR_PTR(ret);
+			goto out;
+		}
+	}
 	if (flags & FOLL_TOUCH) {
 		if ((flags & FOLL_WRITE) &&
 		    !pte_dirty(pte) && !PageDirty(page))
@@ -2143,6 +2149,13 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 
 		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 
+		if (flags & FOLL_PIN) {
+			ret = arch_make_page_accessible(page);
+			if (ret) {
+				unpin_user_page(page);
+				goto pte_unmap;
+			}
+		}
 		SetPageReferenced(page);
 		pages[*nr] = page;
 		(*nr)++;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index ab5a3cee8ad3..8384be5a2758 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2807,6 +2807,11 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 	}
 	unlock_page_memcg(page);
+	/*
+	 * If writeback has been triggered on a page that cannot be made
+	 * accessible, it is too late.
+	 */
+	WARN_ON(arch_make_page_accessible(page));
 	return ret;
 
 }
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC v1 1/2] fixup for 9947ea2c1e608e32669d5caeb67b3e3fba3309e8 "mm/gup: track FOLL_PIN pages"
  2020-02-28 15:43 ` [RFC v1 1/2] fixup for 9947ea2c1e608e32669d5caeb67b3e3fba3309e8 "mm/gup: track FOLL_PIN pages" Claudio Imbrenda
@ 2020-02-28 15:45   ` Claudio Imbrenda
  0 siblings, 0 replies; 14+ messages in thread
From: Claudio Imbrenda @ 2020-02-28 15:45 UTC (permalink / raw)
  To: linux-next, akpm
  Cc: borntraeger, david, aarcange, linux-mm, frankja, sfr, jhubbard,
	linux-kernel, linux-s390

sorry, this one was a mistake, please ignore


On Fri, 28 Feb 2020 16:43:20 +0100
Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:

> in case pin fails, we need to unpin, just a put_page will not be
> enough
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---
>  mm/gup.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index f589299b0d4a..0b9a806898f3 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2134,7 +2134,10 @@ static int gup_pte_range(pmd_t pmd, unsigned
> long addr, unsigned long end, goto pte_unmap;
>  
>  		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> -			put_page(head);
> +			if (flags & FOLL_GET)
> +				put_page(head);
> +			else if (flags & FOLL_PIN)
> +				unpin_user_page(head);
>  			goto pte_unmap;
>  		}
>  


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v1 2/2] mm/gup/writeback: add callbacks for inaccessible pages
  2020-02-28 15:43 ` [RFC v1 2/2] mm/gup/writeback: add callbacks for inaccessible pages Claudio Imbrenda
@ 2020-02-28 16:08   ` Christian Borntraeger
  2020-02-29  0:08     ` John Hubbard
  2020-03-01  3:47     ` Andrew Morton
  0 siblings, 2 replies; 14+ messages in thread
From: Christian Borntraeger @ 2020-02-28 16:08 UTC (permalink / raw)
  To: Claudio Imbrenda, linux-next, akpm
  Cc: david, aarcange, linux-mm, frankja, sfr, jhubbard, linux-kernel,
	linux-s390, Will Deacon

Andrew,

while patch 1 is a fixup for the FOLL_PIN work in your patch queue,
I would really love to see this patch in 5.7. The exploitation code
of kvm/s390 is in Linux next also scheduled for 5.7.

Christian

On 28.02.20 16:43, Claudio Imbrenda wrote:
> With the introduction of protected KVM guests on s390 there is now a
> concept of inaccessible pages. These pages need to be made accessible
> before the host can access them.
> 
> While cpu accesses will trigger a fault that can be resolved, I/O
> accesses will just fail.  We need to add a callback into architecture
> code for places that will do I/O, namely when writeback is started or
> when a page reference is taken.
> 
> This is not only to enable paging, file backing etc, it is also
> necessary to protect the host against a malicious user space. For
> example a bad QEMU could simply start direct I/O on such protected
> memory.  We do not want userspace to be able to trigger I/O errors and
> thus we the logic is "whenever somebody accesses that page (gup) or
> does I/O, make sure that this page can be accessed". When the guest
> tries to access that page we will wait in the page fault handler for
> writeback to have finished and for the page_ref to be the expected
> value.
> 
> On s390x the function is not supposed to fail, so it is ok to use a
> WARN_ON on failure. If we ever need some more finegrained handling
> we can tackle this when we know the details.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Acked-by: Will Deacon <will@kernel.org>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  include/linux/gfp.h |  6 ++++++
>  mm/gup.c            | 19 ++++++++++++++++---
>  mm/page-writeback.c |  5 +++++
>  3 files changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index e5b817cb86e7..be2754841369 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -485,6 +485,12 @@ static inline void arch_free_page(struct page *page, int order) { }
>  #ifndef HAVE_ARCH_ALLOC_PAGE
>  static inline void arch_alloc_page(struct page *page, int order) { }
>  #endif
> +#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
> +static inline int arch_make_page_accessible(struct page *page)
> +{
> +	return 0;
> +}
> +#endif
>  
>  struct page *
>  __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> diff --git a/mm/gup.c b/mm/gup.c
> index 0b9a806898f3..86fff6e4e4f3 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -391,6 +391,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>  	struct page *page;
>  	spinlock_t *ptl;
>  	pte_t *ptep, pte;
> +	int ret;
>  
>  	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
>  	if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
> @@ -449,8 +450,6 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>  		if (is_zero_pfn(pte_pfn(pte))) {
>  			page = pte_page(pte);
>  		} else {
> -			int ret;
> -
>  			ret = follow_pfn_pte(vma, address, ptep, flags);
>  			page = ERR_PTR(ret);
>  			goto out;
> @@ -458,7 +457,6 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>  	}
>  
>  	if (flags & FOLL_SPLIT && PageTransCompound(page)) {
> -		int ret;
>  		get_page(page);
>  		pte_unmap_unlock(ptep, ptl);
>  		lock_page(page);
> @@ -475,6 +473,14 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>  		page = ERR_PTR(-ENOMEM);
>  		goto out;
>  	}
> +	if (flags & FOLL_PIN) {
> +		ret = arch_make_page_accessible(page);
> +		if (ret) {
> +			unpin_user_page(page);
> +			page = ERR_PTR(ret);
> +			goto out;
> +		}
> +	}
>  	if (flags & FOLL_TOUCH) {
>  		if ((flags & FOLL_WRITE) &&
>  		    !pte_dirty(pte) && !PageDirty(page))
> @@ -2143,6 +2149,13 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>  
>  		VM_BUG_ON_PAGE(compound_head(page) != head, page);
>  
> +		if (flags & FOLL_PIN) {
> +			ret = arch_make_page_accessible(page);
> +			if (ret) {
> +				unpin_user_page(page);
> +				goto pte_unmap;
> +			}
> +		}
>  		SetPageReferenced(page);
>  		pages[*nr] = page;
>  		(*nr)++;
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index ab5a3cee8ad3..8384be5a2758 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2807,6 +2807,11 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
>  		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
>  	}
>  	unlock_page_memcg(page);
> +	/*
> +	 * If writeback has been triggered on a page that cannot be made
> +	 * accessible, it is too late.
> +	 */
> +	WARN_ON(arch_make_page_accessible(page));
>  	return ret;
>  
>  }
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v1 1/2] mm/gup: fixup for 9947ea2c1e608e32 "mm/gup: track FOLL_PIN pages"
  2020-02-28 15:43 ` [RFC v1 1/2] mm/gup: fixup for 9947ea2c1e608e32 " Claudio Imbrenda
@ 2020-02-28 23:08   ` John Hubbard
  2020-02-29 10:51     ` Claudio Imbrenda
  2020-03-02 13:46     ` Michal Hocko
  0 siblings, 2 replies; 14+ messages in thread
From: John Hubbard @ 2020-02-28 23:08 UTC (permalink / raw)
  To: Claudio Imbrenda, linux-next, akpm
  Cc: borntraeger, david, aarcange, linux-mm, frankja, sfr,
	linux-kernel, linux-s390, Jan Kara, Kirill A. Shutemov

On 2/28/20 7:43 AM, Claudio Imbrenda wrote:
> In case pin fails, we need to unpin, a simple put_page will not be enough
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---
>  mm/gup.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index f589299b0d4a..0b9a806898f3 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2134,7 +2134,10 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>  			goto pte_unmap;
>  
>  		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> -			put_page(head);
> +			if (flags & FOLL_GET)
> +				put_page(head);
> +			else if (flags & FOLL_PIN)
> +				unpin_user_page(head);

Hi Claudio,

Instead, I think that should actually be:

		put_compound_head(page, 1, flags);

which does a bit more (bug checks and /proc/vmstat instrumentation) than your diff,
but has the same basic idea: call the right "put" function.  

...oh, actually, I see you have the commit hash in the subject line. Instead, it should 
be in the commit description. Let's maybe change the subject line to approx:

    mm/gup: Fix a missing put_compound_head() call in gup_pte_range()

And the write up...how about something like this, if you like it:


try_grab_compound_head() must be undone via put_compound_head(), not put_page().
This was missed in the original implementation of the gup/dma tracking patch, so
fix it now.

Fixes: 0ea2781c3de4 ("mm/gup: track FOLL_PIN pages")


    
(Aside: I'm using the linux-next commit hash. How does one get the correct hash before
it goes to mainline? I guess maintainer scripts fix all those up?)

It's also good to Cc some reviewers in case I'm overlooking something, so I'm adding
Jan and Kirill.

thanks,
-- 
John Hubbard
NVIDIA
>  			goto pte_unmap;
>  		}
>  
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v1 2/2] mm/gup/writeback: add callbacks for inaccessible pages
  2020-02-28 16:08   ` Christian Borntraeger
@ 2020-02-29  0:08     ` John Hubbard
  2020-02-29 10:49       ` Claudio Imbrenda
  2020-03-01  3:47     ` Andrew Morton
  1 sibling, 1 reply; 14+ messages in thread
From: John Hubbard @ 2020-02-29  0:08 UTC (permalink / raw)
  To: Christian Borntraeger, Claudio Imbrenda, linux-next, akpm
  Cc: david, aarcange, linux-mm, frankja, sfr, linux-kernel,
	linux-s390, Will Deacon

On 2/28/20 8:08 AM, Christian Borntraeger wrote:
> Andrew,
> 
> while patch 1 is a fixup for the FOLL_PIN work in your patch queue,
> I would really love to see this patch in 5.7. The exploitation code
> of kvm/s390 is in Linux next also scheduled for 5.7.
> 
> Christian
> 
> On 28.02.20 16:43, Claudio Imbrenda wrote:
>> With the introduction of protected KVM guests on s390 there is now a
>> concept of inaccessible pages. These pages need to be made accessible
>> before the host can access them.
>>
>> While cpu accesses will trigger a fault that can be resolved, I/O
>> accesses will just fail.  We need to add a callback into architecture
>> code for places that will do I/O, namely when writeback is started or
>> when a page reference is taken.
>>
>> This is not only to enable paging, file backing etc, it is also
>> necessary to protect the host against a malicious user space. For
>> example a bad QEMU could simply start direct I/O on such protected
>> memory.  We do not want userspace to be able to trigger I/O errors and
>> thus we the logic is "whenever somebody accesses that page (gup) or


I actually kind of like the sound of that: "We the logic of the kernel,
in order to form a more perfect computer..." :)

Probably this wording is what you want, though:

"thus the logic is "whenever somebody (gup) accesses that page or"


...
>> @@ -458,7 +457,6 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>>  	}
>>  
>>  	if (flags & FOLL_SPLIT && PageTransCompound(page)) {
>> -		int ret;
>>  		get_page(page);
>>  		pte_unmap_unlock(ptep, ptl);
>>  		lock_page(page);
>> @@ -475,6 +473,14 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>>  		page = ERR_PTR(-ENOMEM);
>>  		goto out;
>>  	}
>> +	if (flags & FOLL_PIN) {


What about FOLL_GET? Unless your calling code has some sort of BUG_ON(flags & FOLL_GET),
I'm not sure it's a good idea to leave that case unhandled.


>> +		ret = arch_make_page_accessible(page);
>> +		if (ret) {
>> +			unpin_user_page(page);
>> +			page = ERR_PTR(ret);
>> +			goto out;
>> +		}
>> +	}
>>  	if (flags & FOLL_TOUCH) {
>>  		if ((flags & FOLL_WRITE) &&
>>  		    !pte_dirty(pte) && !PageDirty(page))
>> @@ -2143,6 +2149,13 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>>  
>>  		VM_BUG_ON_PAGE(compound_head(page) != head, page);
>>  
>> +		if (flags & FOLL_PIN) {
>> +			ret = arch_make_page_accessible(page);
>> +			if (ret) {
>> +				unpin_user_page(page);


Same concern as above, about leaving FOLL_GET unhandled.


thanks,
-- 
John Hubbard
NVIDIA




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v1 2/2] mm/gup/writeback: add callbacks for inaccessible pages
  2020-02-29  0:08     ` John Hubbard
@ 2020-02-29 10:49       ` Claudio Imbrenda
  2020-02-29 20:07         ` John Hubbard
  0 siblings, 1 reply; 14+ messages in thread
From: Claudio Imbrenda @ 2020-02-29 10:49 UTC (permalink / raw)
  To: John Hubbard
  Cc: Christian Borntraeger, linux-next, akpm, david, aarcange,
	linux-mm, frankja, sfr, linux-kernel, linux-s390, Will Deacon

On Fri, 28 Feb 2020 16:08:23 -0800
John Hubbard <jhubbard@nvidia.com> wrote:

> On 2/28/20 8:08 AM, Christian Borntraeger wrote:
> > Andrew,
> > 
> > while patch 1 is a fixup for the FOLL_PIN work in your patch queue,
> > I would really love to see this patch in 5.7. The exploitation code
> > of kvm/s390 is in Linux next also scheduled for 5.7.
> > 
> > Christian
> > 
> > On 28.02.20 16:43, Claudio Imbrenda wrote:  
> >> With the introduction of protected KVM guests on s390 there is now
> >> a concept of inaccessible pages. These pages need to be made
> >> accessible before the host can access them.
> >>
> >> While cpu accesses will trigger a fault that can be resolved, I/O
> >> accesses will just fail.  We need to add a callback into
> >> architecture code for places that will do I/O, namely when
> >> writeback is started or when a page reference is taken.
> >>
> >> This is not only to enable paging, file backing etc, it is also
> >> necessary to protect the host against a malicious user space. For
> >> example a bad QEMU could simply start direct I/O on such protected
> >> memory.  We do not want userspace to be able to trigger I/O errors
> >> and thus we the logic is "whenever somebody accesses that page
> >> (gup) or  
> 
> 
> I actually kind of like the sound of that: "We the logic of the
> kernel, in order to form a more perfect computer..." :)
> 
> Probably this wording is what you want, though:
> 
> "thus the logic is "whenever somebody (gup) accesses that page or"
> 
> 
> ...
> >> @@ -458,7 +457,6 @@ static struct page *follow_page_pte(struct
> >> vm_area_struct *vma, }
> >>  
> >>  	if (flags & FOLL_SPLIT && PageTransCompound(page)) {
> >> -		int ret;
> >>  		get_page(page);
> >>  		pte_unmap_unlock(ptep, ptl);
> >>  		lock_page(page);
> >> @@ -475,6 +473,14 @@ static struct page *follow_page_pte(struct
> >> vm_area_struct *vma, page = ERR_PTR(-ENOMEM);
> >>  		goto out;
> >>  	}
> >> +	if (flags & FOLL_PIN) {  
> 
> 
> What about FOLL_GET? Unless your calling code has some sort of
> BUG_ON(flags & FOLL_GET), I'm not sure it's a good idea to leave that
> case unhandled.

if I understood the semantics of FOLL_PIN correctly, then we don't need
to make the page accessible for FOLL_GET. FOLL_PIN indicates intent to
access the content of the page, whereas FOLL_GET is only for the struct
page. 

if we are not touching the content of the page, there is no need to
make it accessible 

> >> +		ret = arch_make_page_accessible(page);
> >> +		if (ret) {
> >> +			unpin_user_page(page);
> >> +			page = ERR_PTR(ret);
> >> +			goto out;
> >> +		}
> >> +	}
> >>  	if (flags & FOLL_TOUCH) {
> >>  		if ((flags & FOLL_WRITE) &&
> >>  		    !pte_dirty(pte) && !PageDirty(page))
> >> @@ -2143,6 +2149,13 @@ static int gup_pte_range(pmd_t pmd,
> >> unsigned long addr, unsigned long end, 
> >>  		VM_BUG_ON_PAGE(compound_head(page) != head, page);
> >>  
> >> +		if (flags & FOLL_PIN) {
> >> +			ret = arch_make_page_accessible(page);
> >> +			if (ret) {
> >> +				unpin_user_page(page);  
> 
> 
> Same concern as above, about leaving FOLL_GET unhandled.

and same answer as above :)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v1 1/2] mm/gup: fixup for 9947ea2c1e608e32 "mm/gup: track FOLL_PIN pages"
  2020-02-28 23:08   ` John Hubbard
@ 2020-02-29 10:51     ` Claudio Imbrenda
  2020-02-29 20:09       ` John Hubbard
  2020-03-02 13:46     ` Michal Hocko
  1 sibling, 1 reply; 14+ messages in thread
From: Claudio Imbrenda @ 2020-02-29 10:51 UTC (permalink / raw)
  To: John Hubbard
  Cc: linux-next, akpm, borntraeger, david, aarcange, linux-mm,
	frankja, sfr, linux-kernel, linux-s390, Jan Kara,
	Kirill A. Shutemov

On Fri, 28 Feb 2020 15:08:35 -0800
John Hubbard <jhubbard@nvidia.com> wrote:

> On 2/28/20 7:43 AM, Claudio Imbrenda wrote:
> > In case pin fails, we need to unpin, a simple put_page will not be
> > enough
> > 
> > Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> > ---
> >  mm/gup.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/gup.c b/mm/gup.c
> > index f589299b0d4a..0b9a806898f3 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -2134,7 +2134,10 @@ static int gup_pte_range(pmd_t pmd, unsigned
> > long addr, unsigned long end, goto pte_unmap;
> >  
> >  		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> > -			put_page(head);
> > +			if (flags & FOLL_GET)
> > +				put_page(head);
> > +			else if (flags & FOLL_PIN)
> > +				unpin_user_page(head);  
> 
> Hi Claudio,
> 
> Instead, I think that should actually be:
> 
> 		put_compound_head(page, 1, flags);

that makes sense, yes :)

I'll fix it in the next iteration

> 
> which does a bit more (bug checks and /proc/vmstat instrumentation)
> than your diff, but has the same basic idea: call the right "put"
> function.  
> 
> ...oh, actually, I see you have the commit hash in the subject line.
> Instead, it should be in the commit description. Let's maybe change
> the subject line to approx:
> 
>     mm/gup: Fix a missing put_compound_head() call in gup_pte_range()
> 
> And the write up...how about something like this, if you like it:
> 
> 
> try_grab_compound_head() must be undone via put_compound_head(), not
> put_page(). This was missed in the original implementation of the
> gup/dma tracking patch, so fix it now.
> 
> Fixes: 0ea2781c3de4 ("mm/gup: track FOLL_PIN pages")
> 
> 
>     
> (Aside: I'm using the linux-next commit hash. How does one get the
> correct hash before it goes to mainline? I guess maintainer scripts
> fix all those up?)

my idea was that my patch should be used as fix-up, so the actual
content of the commit message is not relevant

> It's also good to Cc some reviewers in case I'm overlooking
> something, so I'm adding Jan and Kirill.
> 
> thanks,


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v1 2/2] mm/gup/writeback: add callbacks for inaccessible pages
  2020-02-29 10:49       ` Claudio Imbrenda
@ 2020-02-29 20:07         ` John Hubbard
  0 siblings, 0 replies; 14+ messages in thread
From: John Hubbard @ 2020-02-29 20:07 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: Christian Borntraeger, linux-next, akpm, david, aarcange,
	linux-mm, frankja, sfr, linux-kernel, linux-s390, Will Deacon

On 2/29/20 2:49 AM, Claudio Imbrenda wrote:
>> ...
>>>> @@ -458,7 +457,6 @@ static struct page *follow_page_pte(struct
>>>> vm_area_struct *vma, }
>>>>   
>>>>   	if (flags & FOLL_SPLIT && PageTransCompound(page)) {
>>>> -		int ret;
>>>>   		get_page(page);
>>>>   		pte_unmap_unlock(ptep, ptl);
>>>>   		lock_page(page);
>>>> @@ -475,6 +473,14 @@ static struct page *follow_page_pte(struct
>>>> vm_area_struct *vma, page = ERR_PTR(-ENOMEM);
>>>>   		goto out;
>>>>   	}
>>>> +	if (flags & FOLL_PIN) {
>>
>>
>> What about FOLL_GET? Unless your calling code has some sort of
>> BUG_ON(flags & FOLL_GET), I'm not sure it's a good idea to leave that
>> case unhandled.
> 
> if I understood the semantics of FOLL_PIN correctly, then we don't need
> to make the page accessible for FOLL_GET. FOLL_PIN indicates intent to
> access the content of the page, whereas FOLL_GET is only for the struct
> page.
> 
> if we are not touching the content of the page, there is no need to
> make it accessible
> 


OK, I hope I'm not overlooking anything, but that sounds correct to me.


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v1 1/2] mm/gup: fixup for 9947ea2c1e608e32 "mm/gup: track FOLL_PIN pages"
  2020-02-29 10:51     ` Claudio Imbrenda
@ 2020-02-29 20:09       ` John Hubbard
  0 siblings, 0 replies; 14+ messages in thread
From: John Hubbard @ 2020-02-29 20:09 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: linux-next, akpm, borntraeger, david, aarcange, linux-mm,
	frankja, sfr, linux-kernel, linux-s390, Jan Kara,
	Kirill A. Shutemov

On 2/29/20 2:51 AM, Claudio Imbrenda wrote:

> my idea was that my patch should be used as fix-up, so the actual
> content of the commit message is not relevant
> 


aha, OK. Yes, a fixup would be nice.


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v1 2/2] mm/gup/writeback: add callbacks for inaccessible pages
  2020-02-28 16:08   ` Christian Borntraeger
  2020-02-29  0:08     ` John Hubbard
@ 2020-03-01  3:47     ` Andrew Morton
  1 sibling, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2020-03-01  3:47 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Claudio Imbrenda, linux-next, david, aarcange, linux-mm, frankja,
	sfr, jhubbard, linux-kernel, linux-s390, Will Deacon

On Fri, 28 Feb 2020 17:08:19 +0100 Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> while patch 1 is a fixup for the FOLL_PIN work in your patch queue,
> I would really love to see this patch in 5.7. The exploitation code
> of kvm/s390 is in Linux next also scheduled for 5.7.

Sounds good.  My inbox eagerly awaits v2 ;)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v1 1/2] mm/gup: fixup for 9947ea2c1e608e32 "mm/gup: track FOLL_PIN pages"
  2020-02-28 23:08   ` John Hubbard
  2020-02-29 10:51     ` Claudio Imbrenda
@ 2020-03-02 13:46     ` Michal Hocko
  1 sibling, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2020-03-02 13:46 UTC (permalink / raw)
  To: John Hubbard
  Cc: Claudio Imbrenda, linux-next, akpm, borntraeger, david, aarcange,
	linux-mm, frankja, sfr, linux-kernel, linux-s390, Jan Kara,
	Kirill A. Shutemov

On Fri 28-02-20 15:08:35, John Hubbard wrote:
[...]
> (Aside: I'm using the linux-next commit hash. How does one get the correct hash before
> it goes to mainline? I guess maintainer scripts fix all those up?)

There is no such maging going on AFAIK. Please just do not use sha1 from
linux-next unless it is really clear that those are not going to change.
So essentially everything from mmotm is out of question.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-03-02 13:46 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-28 15:43 [RFC v1 0/2] add callbacks for inaccessible pages Claudio Imbrenda
2020-02-28 15:43 ` [RFC v1 1/2] fixup for 9947ea2c1e608e32669d5caeb67b3e3fba3309e8 "mm/gup: track FOLL_PIN pages" Claudio Imbrenda
2020-02-28 15:45   ` Claudio Imbrenda
2020-02-28 15:43 ` [RFC v1 1/2] mm/gup: fixup for 9947ea2c1e608e32 " Claudio Imbrenda
2020-02-28 23:08   ` John Hubbard
2020-02-29 10:51     ` Claudio Imbrenda
2020-02-29 20:09       ` John Hubbard
2020-03-02 13:46     ` Michal Hocko
2020-02-28 15:43 ` [RFC v1 2/2] mm/gup/writeback: add callbacks for inaccessible pages Claudio Imbrenda
2020-02-28 16:08   ` Christian Borntraeger
2020-02-29  0:08     ` John Hubbard
2020-02-29 10:49       ` Claudio Imbrenda
2020-02-29 20:07         ` John Hubbard
2020-03-01  3:47     ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).