From: Peter Zijlstra <peterz@infradead.org>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>,
	linux-next@vger.kernel.org, akpm@linux-foundation.org,
	jack@suse.cz, kirill@shutemov.name, "Edgecombe,
	Rick P" <rick.p.edgecombe@intel.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>,
	borntraeger@de.ibm.com, david@redhat.com, aarcange@redhat.com,
	linux-mm@kvack.org, frankja@linux.ibm.com, sfr@canb.auug.org.au,
	jhubbard@nvidia.com, linux-kernel@vger.kernel.org,
	linux-s390@vger.kernel.org, Will Deacon <will@kernel.org>,
	"Williams, Dan J" <dan.j.williams@intel.com>
Subject: Re: [PATCH v4 2/2] mm/gup/writeback: add callbacks for inaccessible pages
Date: Thu, 16 Apr 2020 00:17:54 +0200
Message-ID: <20200415221754.GM2483@worktop.programming.kicks-ass.net>
In-Reply-To: <3ae46945-0c7b-03cd-700a-a6fe8003c6ab@intel.com>

On Wed, Apr 15, 2020 at 02:52:31PM -0700, Dave Hansen wrote:
> On 3/6/20 5:25 AM, Claudio Imbrenda wrote:
> > +	/*
> > +	 * We need to make the page accessible if and only if we are going
> > +	 * to access its content (the FOLL_PIN case).  Please see
> > +	 * Documentation/core-api/pin_user_pages.rst for details.
> > +	 */
> > +	if (flags & FOLL_PIN) {
> > +		ret = arch_make_page_accessible(page);
> > +		if (ret) {
> > +			unpin_user_page(page);
> > +			page = ERR_PTR(ret);
> > +			goto out;
> > +		}
> > +	}
> 
> Thanks, Claudio, for a really thorough refresher on this in private mail.
> 
> But, I think this mechanism probably hooks into the wrong place.  I
> don't doubt that it *functions* on s390, but I think these calls are
> misplaced.  I think the end result is that no other architecture will
> have a chance to use the same hooks.  They're far too s390-specific even
> for a concept that's not limited to s390.
> 
> get_user_pages(FOLL_PIN) does *not* mean "the kernel will access this
> page's contents".  The kmap() family is really what we use for that.
> kmap()s are often *preceded* by get_user_pages(), which is probably why
> this works for you, though.
> 
> Yes, the docs do say that FOLL_PIN is for accessing the pages.  But,
> there's a crucial thing that it leaves out: *WHO* will be accessing the
> pages.  For Direct IO, for instance, the CPU isn't touching the page at
> all.  It's always a device.  Also, crucially, the page contents are
> *not* accessible from the CPU's perspective after a gup.  They're not
> accessible until a kmap().  They're also not even accessible for
> *devices* after a gup.  There's a _separate_ mapping process that's
> required to make them accessible to the CPU.

I think the crucial detail is that we can fail gup(), while we cannot
ever fail kmap() or whatever else a device needs to do.
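
To make that asymmetry concrete, here is a minimal userspace sketch in C.
It is not kernel code: struct model_page and the model_*() helpers are
hypothetical stand-ins, and the can_export flag is an assumption used only
to force the failure case. It shows that a pin-style call has a return
value in which to report that the page could not be made accessible, while
a kmap()-style call only hands back a pointer and has no error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel's definition. */
struct model_page {
	bool secure;     /* page is currently inaccessible to the host */
	bool can_export; /* assumption: whether the arch hook can succeed */
	int pincount;    /* number of outstanding pins in this model */
};

/* Models arch_make_page_accessible(): 0 on success, negative errno on failure. */
static int model_make_accessible(struct model_page *p)
{
	if (!p->secure)
		return 0;
	if (!p->can_export)
		return -EFAULT;
	p->secure = false;
	return 0;
}

/* Models the FOLL_PIN path in the quoted hunk: the caller can see a failure. */
static int model_pin(struct model_page *p)
{
	int ret;

	p->pincount++;
	ret = model_make_accessible(p);
	if (ret) {
		p->pincount--; /* mirrors the unpin_user_page() in the patch */
		return ret;
	}
	return 0;
}

/* Models a kmap()-style step: it returns a pointer and cannot report failure. */
static void *model_map(struct model_page *p)
{
	/* By this point the page must already be accessible. */
	assert(!p->secure);
	return p;
}
```

In this model, model_pin() on a page with can_export == false returns
-EFAULT and leaves pincount at 0, so the caller never reaches model_map().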

> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -2764,7 +2764,7 @@ int test_clear_page_writeback(struct page *page)
> >  int __test_set_page_writeback(struct page *page, bool keep_write)
> >  {
> >  	struct address_space *mapping = page_mapping(page);
> > -	int ret;
> > +	int ret, access_ret;
> >  
> >  	lock_page_memcg(page);
> >  	if (mapping && mapping_use_writeback_tags(mapping)) {
> > @@ -2807,6 +2807,13 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
> >  		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
> >  	}
> >  	unlock_page_memcg(page);
> > +	access_ret = arch_make_page_accessible(page);
> > +	/*
> > +	 * If writeback has been triggered on a page that cannot be made
> > +	 * accessible, it is too late to recover here.
> > +	 */
> > +	VM_BUG_ON_PAGE(access_ret != 0, page);
> > +
> >  	return ret;
> >  
> >  }
> 
> I think this one really shows the cracks in the approach.  Pages being
> swapped *don't* have get_user_pages() done on them since we've already
> got the physical page at the time of writeback and aren't looking at PTEs.

I suspect this happens because FOLL_TOUCH or something later does
set_page_dirty() on the page, which then eventually gets it in
writeback.

Failing gup() earlier should ensure the above VM_BUG never happens,
unless someone is doing dodgy things.
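
A small userspace sketch in C of that ordering argument follows. It is a
model under stated assumptions, not the kernel's implementation: struct
wb_page and the wb_*() helpers are hypothetical, a page is only dirtied
after a successful pin, and a page that was made accessible is assumed to
stay accessible until writeback starts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical page model; field names are illustrative only. */
struct wb_page {
	bool secure;     /* page is currently inaccessible to the host */
	bool can_export; /* assumption: whether the arch hook can succeed */
	bool dirty;      /* set only after a successful pin in this model */
};

/* Models arch_make_page_accessible(): 0 on success, negative errno on failure. */
static int wb_make_accessible(struct wb_page *p)
{
	if (!p->secure)
		return 0;
	if (!p->can_export)
		return -EFAULT;
	p->secure = false;
	return 0;
}

/* Models gup(FOLL_PIN) followed by set_page_dirty(): fail before dirtying. */
static int wb_pin_and_dirty(struct wb_page *p)
{
	int ret = wb_make_accessible(p);

	if (ret)
		return ret; /* page never becomes dirty, so never reaches writeback */
	p->dirty = true;
	return 0;
}

/* Models the writeback hook: returns true if the page entered writeback. */
static bool wb_start_writeback(struct wb_page *p)
{
	int ret;

	if (!p->dirty)
		return false;
	ret = wb_make_accessible(p);
	assert(ret == 0); /* stands in for VM_BUG_ON_PAGE(access_ret != 0, page) */
	(void)ret;
	p->dirty = false;
	return true;
}
```

Under those assumptions the assertion in wb_start_writeback() is
unreachable for a page whose pin failed, because such a page is never
marked dirty.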

> Why do I care?
> 
> I was looking at AMD's SEV (Secure Encrypted Virtualization) code which
> is in the kernel which shares some implementation details with the
> not-in-the-tree Intel MKTME.  SEV currently has a concept of guest pages
> being encrypted and being gibberish to the host, plus a handshake to
> share guest-selected pages.  Some of the side-effects of exposing the
> gibberish to the host aren't great (I think it can break cache coherency
> if a stray write occurs) and it would be nice to get better behavior.
> 
> But, to get better behavior, the host kernel might need to remove pages
> from its direct map, making them inaccessible. 

But for SEV we would actually need to fail this
arch_make_page_accessible() thing, right? The encrypted guest pages
cannot be sanely accessed by the host IIRC, ever. Isn't their encryption
key linked to the phys addr of the page?

> I was hoping to reuse
> arch_make_page_accessible() for obvious reasons.  But, get_user_pages()
> is not the right spot to map pages because they might not *ever* be
> accessed by the CPU, only devices.

I'm confused, why does it matter who accesses it? The point is that they
want to access it through this vaddr/mapping.
