All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>,
	linux-next@vger.kernel.org, akpm@linux-foundation.org,
	jack@suse.cz, kirill@shutemov.name, "Edgecombe,
	Rick P" <rick.p.edgecombe@intel.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>,
	borntraeger@de.ibm.com, david@redhat.com, aarcange@redhat.com,
	linux-mm@kvack.org, frankja@linux.ibm.com, sfr@canb.auug.org.au,
	jhubbard@nvidia.com, linux-kernel@vger.kernel.org,
	linux-s390@vger.kernel.org, Will Deacon <will@kernel.org>,
	"Williams, Dan J" <dan.j.williams@intel.com>
Subject: Re: [PATCH v4 2/2] mm/gup/writeback: add callbacks for inaccessible pages
Date: Thu, 16 Apr 2020 00:17:54 +0200	[thread overview]
Message-ID: <20200415221754.GM2483@worktop.programming.kicks-ass.net> (raw)
In-Reply-To: <3ae46945-0c7b-03cd-700a-a6fe8003c6ab@intel.com>

On Wed, Apr 15, 2020 at 02:52:31PM -0700, Dave Hansen wrote:
> On 3/6/20 5:25 AM, Claudio Imbrenda wrote:
> > +	/*
> > +	 * We need to make the page accessible if and only if we are going
> > +	 * to access its content (the FOLL_PIN case).  Please see
> > +	 * Documentation/core-api/pin_user_pages.rst for details.
> > +	 */
> > +	if (flags & FOLL_PIN) {
> > +		ret = arch_make_page_accessible(page);
> > +		if (ret) {
> > +			unpin_user_page(page);
> > +			page = ERR_PTR(ret);
> > +			goto out;
> > +		}
> > +	}
> 
> Thanks, Claudio, for a really thorough refresher on this in private mail.
> 
> But, I think this mechanism probably hooks into the wrong place.  I
> don't doubt that it *functions* on s390, but I think these calls are
> misplaced.  I think the end result is that no other architecture will
> have a chance to use the same hooks.  They're far too s390-specific even
> for a concept that's not limited to s390.
> 
> get_user_pages(FOLL_PIN) does *not* mean "the kernel will access this
> page's contents".  The kmap() family is really what we use for that.
> kmap()s are often *preceded* by get_user_pages(), which is probably why
> this works for you, though.
> 
> Yes, the docs do say that FOLL_PIN is for accessing the pages.  But,
> there's a crucial thing that it leaves out: *WHO* will be accessing the
> pages.  For Direct IO, for instance, the CPU isn't touching the page at
> all.  It's always a device.  Also, crucially, the page contents are
> *not* accessible from the CPU's perspective after a gup.  They're not
> accessible until a kmap().  They're also not even accessible for
> *devices* after a gup.  There's a _separate_ mapping process that's
> requires to make them accessible to the CPU.

I think the crucial detail is that we can fail gup(), while we cannot
ever fail kmap() or whatever else a device needs to do.

> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -2764,7 +2764,7 @@ int test_clear_page_writeback(struct page *page)
> >  int __test_set_page_writeback(struct page *page, bool keep_write)
> >  {
> >  	struct address_space *mapping = page_mapping(page);
> > -	int ret;
> > +	int ret, access_ret;
> >  
> >  	lock_page_memcg(page);
> >  	if (mapping && mapping_use_writeback_tags(mapping)) {
> > @@ -2807,6 +2807,13 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
> >  		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
> >  	}
> >  	unlock_page_memcg(page);
> > +	access_ret = arch_make_page_accessible(page);
> > +	/*
> > +	 * If writeback has been triggered on a page that cannot be made
> > +	 * accessible, it is too late to recover here.
> > +	 */
> > +	VM_BUG_ON_PAGE(access_ret != 0, page);
> > +
> >  	return ret;
> >  
> >  }
> 
> I think this one really shows the cracks in the approach.  Pages being
> swapped *don't* have get_user_pages() done on them since we've already
> got the physical page at the time writeback and aren't looking at PTEs.

I suspect this happens because FOLL_TOUCH or something later does
set_page_dirty() on the page, which then eventually gets it in
writeback.

Failing gup() ealier, should ensure the above VM_BUG never happens,
unless someone is doing dodgy things.

> Why do I care?
> 
> I was looking at AMD's SEV (Secure Encrypted Virtualization) code which
> is in the kernel which shares some implementation details with the
> not-in-the-tree Intel MKTME.  SEV currently has a concept of guest pages
> being encrypted and being gibberish to the host, plus a handshake to
> share guest-selected pages.  Some of the side-effects of exposing the
> gibberish to the host aren't great (I think it can break cache coherency
> if a stray write occurs) and it would be nice to get better behavior.
> 
> But, to get better behavior, the host kernel might need to remove pages
> from its direct map, making them inaccessible. 

But for SEV we would actually need to fail this
arch_make_page_acesssible() thing, right? The encrypted guest pages
cannot be sanely accessed by the host IIRC, ever. Isn't their encryption
key linked to the phys addr of the page?

> I was hoping to reuse
> arch_make_page_accessible() for obvious reasons.  But, get_user_pages()
> is not the right spot to map pages because they might not *ever* be
> accessed by the CPU, only devices.

I'm confused, why does it matter who accesses it? The point is that they
want to access it through this vaddr/mapping.

  reply	other threads:[~2020-04-15 22:18 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-06 13:25 [PATCH v4 0/2] add callbacks for inaccessible pages Claudio Imbrenda
2020-03-06 13:25 ` [PATCH v4 1/2] mm/gup: fixup for 9947ea2c1e608e32 "mm/gup: track FOLL_PIN pages" Claudio Imbrenda
2020-03-06 13:25 ` [PATCH v4 2/2] mm/gup/writeback: add callbacks for inaccessible pages Claudio Imbrenda
2020-04-13 20:22   ` Dave Hansen
2020-04-14 16:03     ` Claudio Imbrenda
2020-04-14 18:50       ` Dave Hansen
2020-04-15  9:26         ` Claudio Imbrenda
2020-04-15 11:39           ` Janosch Frank
2020-04-15 21:52   ` Dave Hansen
2020-04-15 22:17     ` Peter Zijlstra [this message]
2020-04-15 23:34       ` Dave Hansen
2020-04-16 12:15         ` Claudio Imbrenda
2020-04-16 14:20           ` Dave Hansen
2020-04-16 14:59             ` Claudio Imbrenda
2020-04-16 15:36               ` Dave Hansen
2020-04-16 16:34                 ` Claudio Imbrenda
2020-04-16 19:02                   ` Dave Hansen
2020-04-21 21:31                     ` Dave Hansen
2020-04-28 19:43                       ` Dave Hansen
2020-04-28 20:02                         ` Christian Borntraeger
2020-04-28 23:39                         ` Claudio Imbrenda
2020-04-29  0:42                           ` Dave Hansen
2020-04-16 11:51     ` Claudio Imbrenda

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200415221754.GM2483@worktop.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=borntraeger@de.ibm.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=frankja@linux.ibm.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=jack@suse.cz \
    --cc=jhubbard@nvidia.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-next@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=sean.j.christopherson@intel.com \
    --cc=sfr@canb.auug.org.au \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.