From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: John Hubbard <jhubbard@nvidia.com>
Cc: <linux-next@vger.kernel.org>, <akpm@linux-foundation.org>,
	<jack@suse.cz>, <kirill@shutemov.name>, <borntraeger@de.ibm.com>,
	<david@redhat.com>, <aarcange@redhat.com>, <linux-mm@kvack.org>,
	<frankja@linux.ibm.com>, <sfr@canb.auug.org.au>,
	<linux-kernel@vger.kernel.org>, <linux-s390@vger.kernel.org>,
	Will Deacon <will@kernel.org>
Subject: Re: [PATCH v2 2/2] mm/gup/writeback: add callbacks for inaccessible pages
Date: Tue, 3 Mar 2020 11:41:49 +0100
Message-ID: <20200303114149.54c072d1@p-imbrenda>
In-Reply-To: <99903e77-7720-678e-35c5-6eb9e35e7fcb@nvidia.com>

On Mon, 2 Mar 2020 23:59:32 -0800
John Hubbard <jhubbard@nvidia.com> wrote:

> On 3/2/20 4:25 PM, Claudio Imbrenda wrote:
> > With the introduction of protected KVM guests on s390 there is now a
> > concept of inaccessible pages. These pages need to be made
> > accessible before the host can access them.
> > 
> > While CPU accesses will trigger a fault that can be resolved, I/O
> > accesses will just fail.  We need to add a callback into
> > architecture code for places that will do I/O, namely when
> > writeback is started or when a page reference is taken.
> > 
> > This is not only to enable paging, file backing etc.; it is also
> > necessary to protect the host against a malicious user space.  For
> > example a bad QEMU could simply start direct I/O on such protected
> > memory.  We do not want userspace to be able to trigger I/O errors
> > and thus the logic is "whenever somebody accesses that page (gup)
> > or does I/O, make sure that this page can be accessed".  When the
> > guest tries to access that page we will wait in the page fault
> > handler for writeback to have finished and for the page_ref to be
> > the expected value.
> > 
> > On s390x the function is not supposed to fail, so it is ok to use a
> > WARN_ON on failure. If we ever need some more fine-grained handling
> > we can tackle this when we know the details.
> > 
> > Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> > Acked-by: Will Deacon <will@kernel.org>
> > Reviewed-by: David Hildenbrand <david@redhat.com>
> > Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > ---
> >   include/linux/gfp.h |  6 ++++++
> >   mm/gup.c            | 27 ++++++++++++++++++++++++---
> >   mm/page-writeback.c |  5 +++++
> >   3 files changed, 35 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> > index e5b817cb86e7..be2754841369 100644
> > --- a/include/linux/gfp.h
> > +++ b/include/linux/gfp.h
> > @@ -485,6 +485,12 @@ static inline void arch_free_page(struct page *page, int order) { }
> >   #ifndef HAVE_ARCH_ALLOC_PAGE
> >   static inline void arch_alloc_page(struct page *page, int order) { }
> >   #endif
> > +#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
> > +static inline int arch_make_page_accessible(struct page *page)
> > +{
> > +	return 0;
> > +}
> > +#endif
> >   
> >   struct page *
> >   __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 81a95fbe9901..15c47e0e86f8 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -413,6 +413,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
> >   	struct page *page;
> >   	spinlock_t *ptl;
> >   	pte_t *ptep, pte;
> > +	int ret;
> >   
> >   	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
> >   	if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
> > @@ -471,8 +472,6 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
> >   		if (is_zero_pfn(pte_pfn(pte))) {
> >   			page = pte_page(pte);
> >   		} else {
> > -			int ret;
> > -
> >   			ret = follow_pfn_pte(vma, address, ptep, flags);
> >   			page = ERR_PTR(ret);
> >   			goto out;
> > @@ -480,7 +479,6 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
> >   	}
> >   
> >   	if (flags & FOLL_SPLIT && PageTransCompound(page)) {
> > -		int ret;
> >   		get_page(page);
> >   		pte_unmap_unlock(ptep, ptl);
> >   		lock_page(page);
> > @@ -497,6 +495,19 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
> >   		page = ERR_PTR(-ENOMEM);
> >   		goto out;
> >   	}
> > +	/*
> > +	 * We need to make the page accessible if we are actually going to
> > +	 * poke at its content (pin), otherwise we can leave it inaccessible.
> > +	 * If we cannot make the page accessible, fail.
> > +	 */
> > +	if (flags & FOLL_PIN) {
> > +		ret = arch_make_page_accessible(page);
> > +		if (ret) {
> > +			unpin_user_page(page);
> > +			page = ERR_PTR(ret);
> > +			goto out;
> > +		}
> > +	}  
> 
> 
> That looks good.
> 
> 
> >   	if (flags & FOLL_TOUCH) {
> >   		if ((flags & FOLL_WRITE) &&
> >   		    !pte_dirty(pte) && !PageDirty(page))
> > @@ -2162,6 +2173,16 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
> >   
> >   		VM_BUG_ON_PAGE(compound_head(page) != head, page);
> >   
> > +		/*
> > +		 * We need to make the page accessible if we are actually
> > +		 * going to poke at its content (pin), otherwise we can
> > +		 * leave it inaccessible. If the page cannot be made
> > +		 * accessible, fail.
> > +		 */
> 
> 
> This part looks good, so these two points are just nits:
> 
> That's a little bit of repeating what the code does, in the comments.
> How about:
> 
> 		/*
> 		 * We need to make the page accessible if and only if we are
> 		 * going to access its content (the FOLL_PIN case). Please see
> 		 * Documentation/core-api/pin_user_pages.rst for details.
> 		 */
> 
> 
> > +		if ((flags & FOLL_PIN) && arch_make_page_accessible(page)) {
> > +			unpin_user_page(page);
> > +			goto pte_unmap;
> > +		}  
> 
> 
> Your style earlier in the patch was easier on the reader, why not
> stay consistent with that (and with this file, which tends also to do
> this), so:
> 
> 		if (flags & FOLL_PIN) {
> 			ret = arch_make_page_accessible(page);
> 			if (ret) {
> 				unpin_user_page(page);
> 				goto pte_unmap;
> 			}
> 		}
> 
> 
> 
> 
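
To make the suggested explicit-ret style concrete, here is a minimal
user-space sketch of the pattern, not kernel code. Only the names
arch_make_page_accessible(), unpin_user_page(), FOLL_PIN and FOLL_GET
come from the patch; the struct page layout, the flag values, the
simulated failure and pin_and_expose() are hypothetical stand-ins
chosen for illustration.

```c
#include <errno.h>

/* Hypothetical stand-ins; the flag values are illustrative only. */
#define FOLL_GET 0x1u
#define FOLL_PIN 0x2u

struct page {
	int refs;          /* stand-in for the page reference/pin count */
	int inaccessible;  /* 1 = simulate a page that cannot be made accessible */
};

/* Stand-in for the arch hook: 0 on success, negative errno on failure. */
static int arch_make_page_accessible(struct page *page)
{
	return page->inaccessible ? -EFAULT : 0;
}

/* Stand-in for dropping the reference taken earlier. */
static void unpin_user_page(struct page *page)
{
	page->refs--;
}

/*
 * Take a reference, then make the page accessible only in the
 * FOLL_PIN case. On failure the reference is dropped again and the
 * hook's error code is returned to the caller.
 */
static int pin_and_expose(struct page *page, unsigned int flags)
{
	int ret;

	page->refs++;	/* stand-in for grabbing the page */

	if (flags & FOLL_PIN) {
		ret = arch_make_page_accessible(page);
		if (ret) {
			unpin_user_page(page);
			return ret;
		}
	}
	return 0;
}
```

With this shape the error code stays available in ret, which is what
the follow_page_pte() hunk needs for ERR_PTR(ret); a FOLL_GET-only
caller never invokes the hook at all.
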
> >   		SetPageReferenced(page);
> >   		pages[*nr] = page;
> >   		(*nr)++;
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index ab5a3cee8ad3..8384be5a2758 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -2807,6 +2807,11 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
> >   		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
> >   	}
> >   	unlock_page_memcg(page);
> > +	/*
> > +	 * If writeback has been triggered on a page that cannot be made
> > +	 * accessible, it is too late.
> > +	 */
> > +	WARN_ON(arch_make_page_accessible(page));  
> 
> 
> I'm not deep enough into this area to know if a) this is correct, and
> b) if there are any other places that need
> arch_make_page_accessible() calls. So I'll rely on other reviewers to
> help check on that.
> 
> 
> >   	return ret;
> >   
> >   }
> >   
> 
> Anyway, I don't see any problems, and as I said, those documentation
> and style points are just nitpicks, not bugs.


These are minor fixes, and I mostly agree with you. I'll fix them and
send a v3 soon™

Thanks for the comments!
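
For anyone following along, the default-stub pattern from the gfp.h
hunk and the writeback check can be tried outside the kernel. This is
a minimal sketch under stated assumptions: struct page is reduced to a
hypothetical one-field stand-in, start_writeback() is an illustrative
helper rather than __test_set_page_writeback(), and the fprintf() only
approximates what WARN_ON() would report.

```c
#include <stdio.h>

struct page { int id; };	/* hypothetical stand-in for struct page */

/*
 * Generic fallback: an architecture that needs the hook would define
 * HAVE_ARCH_MAKE_PAGE_ACCESSIBLE and provide its own implementation;
 * everyone else gets a stub that always succeeds.
 */
#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
static inline int arch_make_page_accessible(struct page *page)
{
	(void)page;
	return 0;
}
#endif

/*
 * Illustrative writeback entry point. Returns 1 if a warning was
 * emitted, 0 otherwise; writeback itself is not aborted, because at
 * this point it is too late to back out.
 */
static int start_writeback(struct page *page)
{
	int failed = arch_make_page_accessible(page) != 0;

	if (failed)
		fprintf(stderr, "warning: page %d not accessible\n", page->id);
	return failed;
}
```

Since the stub is a static inline that returns 0, architectures
without the hook should see the call compiled away.
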

