linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Tom Lendacky <thomas.lendacky@amd.com>
To: Christian Borntraeger <borntraeger@de.ibm.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Janosch Frank <frankja@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Marc Zyngier <maz@kernel.org>, KVM <kvm@vger.kernel.org>,
	Cornelia Huck <cohuck@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	Thomas Huth <thuth@redhat.com>,
	Ulrich Weigand <Ulrich.Weigand@de.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	linux-s390 <linux-s390@vger.kernel.org>,
	Michael Mueller <mimu@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	linux-mm@kvack.org, kvm-ppc@vger.kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages
Date: Mon, 17 Feb 2020 14:55:34 -0600	[thread overview]
Message-ID: <6da2e3d0-a2be-18d2-3548-b546052d14e3@amd.com> (raw)
In-Reply-To: <e2c41b25-6d6d-6685-3450-2e3e8d84efd1@de.ibm.com>

On 2/13/20 2:13 PM, Christian Borntraeger wrote:
> 
> 
> On 13.02.20 20:56, Sean Christopherson wrote:
>> On Mon, Feb 10, 2020 at 06:27:04PM +0100, Christian Borntraeger wrote:
>>> CC Marc Zyngier for KVM on ARM.  Marc, see below. Will there be any
>>> use for this on KVM/ARM in the future?
>>>
>>> CC Sean Christopherson/Tom Lendacky. Any obvious use case for Intel/AMD
>>> to have a callback before a page is used for I/O?

From an SEV-SNP perspective, I don't think so. The SEV-SNP architecture
uses page states and having the hypervisor change the state from beneath
the guest might trigger the guest into thinking it's being attacked vs
just allowing the I/O to fail. Is this a concern with flooding the console
with I/O error messages?

>>
>> Yes?
>>
>>> Andrew (or other mm people) any chance to get an ACK for this change?
>>> I could then carry that via s390 or KVM tree. Or if you want to carry
>>> that yourself I can send an updated version (we need to kind of 
>>> synchronize that Linus will pull the KVM changes after the mm changes).
>>>
>>> Andrea asked if others would benefit from this, so here are some more
>>> information about this (and I can also put this into the patch
>>> description).  So we have talked to the POWER folks. They do not use
>>> the standard normal memory management, instead they have a hard split
>>> between secure and normal memory. The secure memory  is the handled by
>>> the hypervisor as device memory and the ultravisor and the hypervisor
>>> move this forth and back when needed.
>>>
>>> On s390 there is no *separate* pool of physical pages that are secure.
>>> Instead, *any* physical page can be marked as secure or not, by
>>> setting a bit in a per-page data structure that hardware uses to stop
>>> unauthorized access.  (That bit is under control of the ultravisor.)
>>>
>>> Note that one side effect of this strategy is that the decision
>>> *which* secure pages to encrypt and then swap out is actually done by
>>> the hypervisor, not the ultravisor.  In our case, the hypervisor is
>>> Linux/KVM, so we're using the regular Linux memory management scheme
>>> (active/inactive LRU lists etc.) to make this decision.  The advantage
>>> is that the Ultravisor code does not need to itself implement any
>>> memory management code, making it a lot simpler.
>>
>> Disclaimer: I'm not familiar with s390 guest page faults or UV.  I tried
>> to give myself a crash course, apologies if I'm way out in left field...
>>
>> AIUI, pages will first be added to a secure guest by converting a normal,
>> non-secure page to secure and stuffing it into the guest page tables.  To
>> swap a page from a secure guest, arch_make_page_accessible() will be called
>> to encrypt the page in place so that it can be accessed by the untrusted
>> kernel/VMM and written out to disk.  And to fault the page back in, on s390
>> a secure guest access to a non-secure page will generate a page fault with
>> a dedicated type.  That fault routes directly to
>> do_non_secure_storage_access(), which converts the page to secure and thus
>> makes it re-accessible to the guest.
>>
>> That all sounds sane and usable for Intel.
>>
>> My big question is the follow/get flows, more on that below.
>>
>>> However, in the end this is why we need the hook into Linux memory
>>> management: once Linux has decided to swap a page out, we need to get
>>> a chance to tell the Ultravisor to "export" the page (i.e., encrypt
>>> its contents and mark it no longer secure).
>>>
>>> As outlined below this should be a no-op for anybody not opting in.
>>>
>>> Christian                                   
>>>
>>> On 07.02.20 12:39, Christian Borntraeger wrote:
>>>> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>>>
>>>> With the introduction of protected KVM guests on s390 there is now a
>>>> concept of inaccessible pages. These pages need to be made accessible
>>>> before the host can access them.
>>>>
>>>> While cpu accesses will trigger a fault that can be resolved, I/O
>>>> accesses will just fail.  We need to add a callback into architecture
>>>> code for places that will do I/O, namely when writeback is started or
>>>> when a page reference is taken.
>>>>
>>>> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>>> ---
>>>>  include/linux/gfp.h | 6 ++++++
>>>>  mm/gup.c            | 2 ++
>>>>  mm/page-writeback.c | 1 +
>>>>  3 files changed, 9 insertions(+)
>>>>
>>>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>>>> index e5b817cb86e7..be2754841369 100644
>>>> --- a/include/linux/gfp.h
>>>> +++ b/include/linux/gfp.h
>>>> @@ -485,6 +485,12 @@ static inline void arch_free_page(struct page *page, int order) { }
>>>>  #ifndef HAVE_ARCH_ALLOC_PAGE
>>>>  static inline void arch_alloc_page(struct page *page, int order) { }
>>>>  #endif
>>>> +#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
>>>> +static inline int arch_make_page_accessible(struct page *page)
>>>> +{
>>>> +	return 0;
>>>> +}
>>>> +#endif
>>>>  
>>>>  struct page *
>>>>  __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>>>> diff --git a/mm/gup.c b/mm/gup.c
>>>> index 7646bf993b25..a01262cd2821 100644
>>>> --- a/mm/gup.c
>>>> +++ b/mm/gup.c
>>>> @@ -257,6 +257,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>>>>  			page = ERR_PTR(-ENOMEM);
>>>>  			goto out;
>>>>  		}
>>>> +		arch_make_page_accessible(page);
>>
>> As Will pointed out, the return value definitely needs to be checked, there
>> will undoubtedly be scenarios where the page cannot be made accessible.
> 
> Actually onm s390 this should always succeed unless we have a bug.
> 
> But we can certainly provide a variant of that patch that does check the return
> value. 
> Proper error handling for gup and WARN_ON for pae-writeback.
>>
>> What is the use case for calling arch_make_page_accessible() in the follow()
>> and gup() paths?  Live migration is the only thing that comes to mind, and
>> for live migration I would expect you would want to keep the secure guest
>> running when copying pages to the target, i.e. use pre-copy.  That would
>> conflict with converting the page in place.  Rather, migration would use a
>> separate dedicated path to copy the encrypted contents of the secure page to
>> a completely different page, and send *that* across the wire so that the
>> guest can continue accessing the original page.
>> Am I missing a need to do this for the swap/reclaim case?  Or is there a
>> completely different use case I'm overlooking?
> 
> This is actually to protect the host against a malicious user space. For 
> example a bad QEMU could simply start direct I/O on such protected memory.
> We do not want userspace to be able to trigger I/O errors and thus we
> implemented the logic to "whenever somebody accesses that page (gup) or
> doing I/O, make sure that this page can be accessed. When the guest tries
> to access that page we will wait in the page fault handler for writeback to
> have finished and for the page_ref to be the expected value.

So in this case, when the guest tries to access the page, the page may now
be corrupted because I/O was allowed to be done to it? Or will the I/O
have been blocked in some way, but without generating the I/O error?

Thanks,
Tom

> 
> 
> 
>>
>> Tangentially related, hooks here could be quite useful for sanity checking
>> the kernel/KVM and/or debugging kernel/KVM bugs.  Would it make sense to
>> pass a param to arch_make_page_accessible() to provide some information as
>> to why the page needs to be made accessible?
> 
> Some kind of enum that can be used optionally to optimize things?
> 
>>
>>>>  	}
>>>>  	if (flags & FOLL_TOUCH) {
>>>>  		if ((flags & FOLL_WRITE) &&
>>>> @@ -1870,6 +1871,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>>>>  
>>>>  		VM_BUG_ON_PAGE(compound_head(page) != head, page);
>>>>  
>>>> +		arch_make_page_accessible(page);
>>>>  		SetPageReferenced(page);
>>>>  		pages[*nr] = page;
>>>>  		(*nr)++;
>>>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>>>> index 2caf780a42e7..0f0bd14571b1 100644
>>>> --- a/mm/page-writeback.c
>>>> +++ b/mm/page-writeback.c
>>>> @@ -2806,6 +2806,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
>>>>  		inc_lruvec_page_state(page, NR_WRITEBACK);
>>>>  		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
>>>>  	}
>>>> +	arch_make_page_accessible(page);
>>>>  	unlock_page_memcg(page);
>>>
>>> As outlined by Ulrich, we can move the callback after the unlock.
>>>
>>>>  	return ret;
>>>>  
>>>>
>>>
> 


  parent reply	other threads:[~2020-02-17 20:55 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
2020-02-07 11:39 ` [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages Christian Borntraeger
2020-02-10 17:27   ` Christian Borntraeger
2020-02-11 11:26     ` Will Deacon
2020-02-11 11:43       ` Christian Borntraeger
2020-02-13 14:48       ` Christian Borntraeger
2020-02-18 16:02         ` Will Deacon
2020-02-13 19:56     ` Sean Christopherson
2020-02-13 20:13       ` Christian Borntraeger
2020-02-13 20:46         ` Sean Christopherson
2020-02-17 20:55         ` Tom Lendacky [this message]
2020-02-17 21:14           ` Christian Borntraeger
2020-02-10 18:17   ` David Hildenbrand
2020-02-10 18:28     ` Christian Borntraeger
2020-02-10 18:43       ` David Hildenbrand
2020-02-10 18:51         ` Christian Borntraeger
2020-02-18  3:36   ` Tian, Kevin
2020-02-18  6:44     ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt pages Christian Borntraeger
2020-02-10 12:26   ` David Hildenbrand
2020-02-10 18:38     ` Christian Borntraeger
2020-02-10 19:33       ` David Hildenbrand
2020-02-11  9:23         ` [PATCH v2 RFC] " Christian Borntraeger
2020-02-12 11:52           ` Christian Borntraeger
2020-02-12 12:16           ` David Hildenbrand
2020-02-12 12:22             ` Christian Borntraeger
2020-02-12 12:47               ` David Hildenbrand
2020-02-12 12:39           ` Cornelia Huck
2020-02-12 12:44             ` Christian Borntraeger
2020-02-12 13:07               ` Cornelia Huck
2020-02-10 18:56     ` [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt Ulrich Weigand
2020-02-10 12:40   ` [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt pages David Hildenbrand
2020-02-07 11:39 ` [PATCH 05/35] s390/mm: provide memory management functions for protected KVM guests Christian Borntraeger
2020-02-12 13:42   ` Cornelia Huck
2020-02-13  7:43     ` Christian Borntraeger
2020-02-13  8:44       ` Cornelia Huck
2020-02-14 17:59   ` David Hildenbrand
2020-02-14 21:17     ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 06/35] s390/mm: add (non)secure page access exceptions handlers Christian Borntraeger
2020-02-14 18:05   ` David Hildenbrand
2020-02-14 19:59     ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 10/35] KVM: s390: protvirt: Secure memory is not mergeable Christian Borntraeger
2020-02-07 11:39 ` [PATCH 11/35] KVM: s390/mm: Make pages accessible before destroying the guest Christian Borntraeger
2020-02-14 18:40   ` David Hildenbrand
2020-02-07 11:39 ` [PATCH 21/35] KVM: s390/mm: handle guest unpin events Christian Borntraeger
2020-02-10 14:58   ` Thomas Huth
2020-02-11 13:21     ` Cornelia Huck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6da2e3d0-a2be-18d2-3548-b546052d14e3@amd.com \
    --to=thomas.lendacky@amd.com \
    --cc=Ulrich.Weigand@de.ibm.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=borntraeger@de.ibm.com \
    --cc=cohuck@redhat.com \
    --cc=david@redhat.com \
    --cc=frankja@linux.vnet.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=mimu@linux.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=sean.j.christopherson@intel.com \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).