linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Janosch Frank <frankja@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Marc Zyngier <maz@kernel.org>,
	Tom Lendacky <thomas.lendacky@amd.com>, KVM <kvm@vger.kernel.org>,
	Cornelia Huck <cohuck@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	Thomas Huth <thuth@redhat.com>,
	Ulrich Weigand <Ulrich.Weigand@de.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	linux-s390 <linux-s390@vger.kernel.org>,
	Michael Mueller <mimu@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	linux-mm@kvack.org, kvm-ppc@vger.kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages
Date: Thu, 13 Feb 2020 11:56:02 -0800	[thread overview]
Message-ID: <20200213195602.GD18610@linux.intel.com> (raw)
In-Reply-To: <28792269-e053-ac70-a344-45612ee5c729@de.ibm.com>

On Mon, Feb 10, 2020 at 06:27:04PM +0100, Christian Borntraeger wrote:
> CC Marc Zyngier for KVM on ARM.  Marc, see below. Will there be any
> use for this on KVM/ARM in the future?
> 
> CC Sean Christopherson/Tom Lendacky. Any obvious use case for Intel/AMD
> to have a callback before a page is used for I/O?

Yes?

> Andrew (or other mm people) any chance to get an ACK for this change?
> I could then carry that via s390 or KVM tree. Or if you want to carry
> that yourself I can send an updated version (we need to kind of 
> synchronize that Linus will pull the KVM changes after the mm changes).
> 
> Andrea asked if others would benefit from this, so here are some more
> information about this (and I can also put this into the patch
> description).  So we have talked to the POWER folks. They do not use
> the standard normal memory management, instead they have a hard split
> between secure and normal memory. The secure memory  is the handled by
> the hypervisor as device memory and the ultravisor and the hypervisor
> move this forth and back when needed.
> 
> On s390 there is no *separate* pool of physical pages that are secure.
> Instead, *any* physical page can be marked as secure or not, by
> setting a bit in a per-page data structure that hardware uses to stop
> unauthorized access.  (That bit is under control of the ultravisor.)
> 
> Note that one side effect of this strategy is that the decision
> *which* secure pages to encrypt and then swap out is actually done by
> the hypervisor, not the ultravisor.  In our case, the hypervisor is
> Linux/KVM, so we're using the regular Linux memory management scheme
> (active/inactive LRU lists etc.) to make this decision.  The advantage
> is that the Ultravisor code does not need to itself implement any
> memory management code, making it a lot simpler.

Disclaimer: I'm not familiar with s390 guest page faults or UV.  I tried
to give myself a crash course, apologies if I'm way out in left field...

AIUI, pages will first be added to a secure guest by converting a normal,
non-secure page to secure and stuffing it into the guest page tables.  To
swap a page from a secure guest, arch_make_page_accessible() will be called
to encrypt the page in place so that it can be accessed by the untrusted
kernel/VMM and written out to disk.  And to fault the page back in, on s390
a secure guest access to a non-secure page will generate a page fault with
a dedicated type.  That fault routes directly to
do_non_secure_storage_access(), which converts the page to secure and thus
makes it re-accessible to the guest.

That all sounds sane and usable for Intel.

My big question is the follow/get flows, more on that below.

> However, in the end this is why we need the hook into Linux memory
> management: once Linux has decided to swap a page out, we need to get
> a chance to tell the Ultravisor to "export" the page (i.e., encrypt
> its contents and mark it no longer secure).
> 
> As outlined below this should be a no-op for anybody not opting in.
> 
> Christian                                   
> 
> On 07.02.20 12:39, Christian Borntraeger wrote:
> > From: Claudio Imbrenda <imbrenda@linux.ibm.com>
> > 
> > With the introduction of protected KVM guests on s390 there is now a
> > concept of inaccessible pages. These pages need to be made accessible
> > before the host can access them.
> > 
> > While cpu accesses will trigger a fault that can be resolved, I/O
> > accesses will just fail.  We need to add a callback into architecture
> > code for places that will do I/O, namely when writeback is started or
> > when a page reference is taken.
> > 
> > Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> > Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > ---
> >  include/linux/gfp.h | 6 ++++++
> >  mm/gup.c            | 2 ++
> >  mm/page-writeback.c | 1 +
> >  3 files changed, 9 insertions(+)
> > 
> > diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> > index e5b817cb86e7..be2754841369 100644
> > --- a/include/linux/gfp.h
> > +++ b/include/linux/gfp.h
> > @@ -485,6 +485,12 @@ static inline void arch_free_page(struct page *page, int order) { }
> >  #ifndef HAVE_ARCH_ALLOC_PAGE
> >  static inline void arch_alloc_page(struct page *page, int order) { }
> >  #endif
> > +#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
> > +static inline int arch_make_page_accessible(struct page *page)
> > +{
> > +	return 0;
> > +}
> > +#endif
> >  
> >  struct page *
> >  __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 7646bf993b25..a01262cd2821 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -257,6 +257,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
> >  			page = ERR_PTR(-ENOMEM);
> >  			goto out;
> >  		}
> > +		arch_make_page_accessible(page);

As Will pointed out, the return value definitely needs to be checked, there
will undoubtedly be scenarios where the page cannot be made accessible.

What is the use case for calling arch_make_page_accessible() in the follow()
and gup() paths?  Live migration is the only thing that comes to mind, and
for live migration I would expect you would want to keep the secure guest
running when copying pages to the target, i.e. use pre-copy.  That would
conflict with converting the page in place.  Rather, migration would use a
separate dedicated path to copy the encrypted contents of the secure page to
a completely different page, and send *that* across the wire so that the
guest can continue accessing the original page.

Am I missing a need to do this for the swap/reclaim case?  Or is there a
completely different use case I'm overlooking?

Tangentially related, hooks here could be quite useful for sanity checking
the kernel/KVM and/or debugging kernel/KVM bugs.  Would it make sense to
pass a param to arch_make_page_accessible() to provide some information as
to why the page needs to be made accessible?

> >  	}
> >  	if (flags & FOLL_TOUCH) {
> >  		if ((flags & FOLL_WRITE) &&
> > @@ -1870,6 +1871,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
> >  
> >  		VM_BUG_ON_PAGE(compound_head(page) != head, page);
> >  
> > +		arch_make_page_accessible(page);
> >  		SetPageReferenced(page);
> >  		pages[*nr] = page;
> >  		(*nr)++;
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 2caf780a42e7..0f0bd14571b1 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -2806,6 +2806,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
> >  		inc_lruvec_page_state(page, NR_WRITEBACK);
> >  		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
> >  	}
> > +	arch_make_page_accessible(page);
> >  	unlock_page_memcg(page);
> 
> As outlined by Ulrich, we can move the callback after the unlock.
> 
> >  	return ret;
> >  
> > 
> 


  parent reply	other threads:[~2020-02-13 19:56 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
2020-02-07 11:39 ` [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages Christian Borntraeger
2020-02-10 17:27   ` Christian Borntraeger
2020-02-11 11:26     ` Will Deacon
2020-02-11 11:43       ` Christian Borntraeger
2020-02-13 14:48       ` Christian Borntraeger
2020-02-18 16:02         ` Will Deacon
2020-02-13 19:56     ` Sean Christopherson [this message]
2020-02-13 20:13       ` Christian Borntraeger
2020-02-13 20:46         ` Sean Christopherson
2020-02-17 20:55         ` Tom Lendacky
2020-02-17 21:14           ` Christian Borntraeger
2020-02-10 18:17   ` David Hildenbrand
2020-02-10 18:28     ` Christian Borntraeger
2020-02-10 18:43       ` David Hildenbrand
2020-02-10 18:51         ` Christian Borntraeger
2020-02-18  3:36   ` Tian, Kevin
2020-02-18  6:44     ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt pages Christian Borntraeger
2020-02-10 12:26   ` David Hildenbrand
2020-02-10 18:38     ` Christian Borntraeger
2020-02-10 19:33       ` David Hildenbrand
2020-02-11  9:23         ` [PATCH v2 RFC] " Christian Borntraeger
2020-02-12 11:52           ` Christian Borntraeger
2020-02-12 12:16           ` David Hildenbrand
2020-02-12 12:22             ` Christian Borntraeger
2020-02-12 12:47               ` David Hildenbrand
2020-02-12 12:39           ` Cornelia Huck
2020-02-12 12:44             ` Christian Borntraeger
2020-02-12 13:07               ` Cornelia Huck
2020-02-10 18:56     ` [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt Ulrich Weigand
2020-02-10 12:40   ` [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt pages David Hildenbrand
2020-02-07 11:39 ` [PATCH 05/35] s390/mm: provide memory management functions for protected KVM guests Christian Borntraeger
2020-02-12 13:42   ` Cornelia Huck
2020-02-13  7:43     ` Christian Borntraeger
2020-02-13  8:44       ` Cornelia Huck
2020-02-14 17:59   ` David Hildenbrand
2020-02-14 21:17     ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 06/35] s390/mm: add (non)secure page access exceptions handlers Christian Borntraeger
2020-02-14 18:05   ` David Hildenbrand
2020-02-14 19:59     ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 10/35] KVM: s390: protvirt: Secure memory is not mergeable Christian Borntraeger
2020-02-07 11:39 ` [PATCH 11/35] KVM: s390/mm: Make pages accessible before destroying the guest Christian Borntraeger
2020-02-14 18:40   ` David Hildenbrand
2020-02-07 11:39 ` [PATCH 21/35] KVM: s390/mm: handle guest unpin events Christian Borntraeger
2020-02-10 14:58   ` Thomas Huth
2020-02-11 13:21     ` Cornelia Huck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200213195602.GD18610@linux.intel.com \
    --to=sean.j.christopherson@intel.com \
    --cc=Ulrich.Weigand@de.ibm.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=borntraeger@de.ibm.com \
    --cc=cohuck@redhat.com \
    --cc=david@redhat.com \
    --cc=frankja@linux.vnet.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=mimu@linux.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=thomas.lendacky@amd.com \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).