From: Tom Lendacky <thomas.lendacky@amd.com>
To: Christian Borntraeger <borntraeger@de.ibm.com>,
Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Janosch Frank <frankja@linux.vnet.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Marc Zyngier <maz@kernel.org>, KVM <kvm@vger.kernel.org>,
Cornelia Huck <cohuck@redhat.com>,
David Hildenbrand <david@redhat.com>,
Thomas Huth <thuth@redhat.com>,
Ulrich Weigand <Ulrich.Weigand@de.ibm.com>,
Claudio Imbrenda <imbrenda@linux.ibm.com>,
Andrea Arcangeli <aarcange@redhat.com>,
linux-s390 <linux-s390@vger.kernel.org>,
Michael Mueller <mimu@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
linux-mm@kvack.org, kvm-ppc@vger.kernel.org,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages
Date: Mon, 17 Feb 2020 14:55:34 -0600 [thread overview]
Message-ID: <6da2e3d0-a2be-18d2-3548-b546052d14e3@amd.com> (raw)
In-Reply-To: <e2c41b25-6d6d-6685-3450-2e3e8d84efd1@de.ibm.com>
On 2/13/20 2:13 PM, Christian Borntraeger wrote:
>
>
> On 13.02.20 20:56, Sean Christopherson wrote:
>> On Mon, Feb 10, 2020 at 06:27:04PM +0100, Christian Borntraeger wrote:
>>> CC Marc Zyngier for KVM on ARM. Marc, see below. Will there be any
>>> use for this on KVM/ARM in the future?
>>>
>>> CC Sean Christopherson/Tom Lendacky. Any obvious use case for Intel/AMD
>>> to have a callback before a page is used for I/O?
From an SEV-SNP perspective, I don't think so. The SEV-SNP architecture
uses page states and having the hypervisor change the state from beneath
the guest might trigger the guest into thinking it's being attacked vs
just allowing the I/O to fail. Is this a concern with flooding the console
with I/O error messages?
>>
>> Yes?
>>
>>> Andrew (or other mm people) any chance to get an ACK for this change?
>>> I could then carry that via s390 or KVM tree. Or if you want to carry
>>> that yourself I can send an updated version (we need to kind of
>>> synchronize that Linus will pull the KVM changes after the mm changes).
>>>
>>> Andrea asked if others would benefit from this, so here are some more
>>> information about this (and I can also put this into the patch
>>> description). So we have talked to the POWER folks. They do not use
>>> the standard normal memory management, instead they have a hard split
>>> between secure and normal memory. The secure memory is the handled by
>>> the hypervisor as device memory and the ultravisor and the hypervisor
>>> move this forth and back when needed.
>>>
>>> On s390 there is no *separate* pool of physical pages that are secure.
>>> Instead, *any* physical page can be marked as secure or not, by
>>> setting a bit in a per-page data structure that hardware uses to stop
>>> unauthorized access. (That bit is under control of the ultravisor.)
>>>
>>> Note that one side effect of this strategy is that the decision
>>> *which* secure pages to encrypt and then swap out is actually done by
>>> the hypervisor, not the ultravisor. In our case, the hypervisor is
>>> Linux/KVM, so we're using the regular Linux memory management scheme
>>> (active/inactive LRU lists etc.) to make this decision. The advantage
>>> is that the Ultravisor code does not need to itself implement any
>>> memory management code, making it a lot simpler.
>>
>> Disclaimer: I'm not familiar with s390 guest page faults or UV. I tried
>> to give myself a crash course, apologies if I'm way out in left field...
>>
>> AIUI, pages will first be added to a secure guest by converting a normal,
>> non-secure page to secure and stuffing it into the guest page tables. To
>> swap a page from a secure guest, arch_make_page_accessible() will be called
>> to encrypt the page in place so that it can be accessed by the untrusted
>> kernel/VMM and written out to disk. And to fault the page back in, on s390
>> a secure guest access to a non-secure page will generate a page fault with
>> a dedicated type. That fault routes directly to
>> do_non_secure_storage_access(), which converts the page to secure and thus
>> makes it re-accessible to the guest.
>>
>> That all sounds sane and usable for Intel.
>>
>> My big question is the follow/get flows, more on that below.
>>
>>> However, in the end this is why we need the hook into Linux memory
>>> management: once Linux has decided to swap a page out, we need to get
>>> a chance to tell the Ultravisor to "export" the page (i.e., encrypt
>>> its contents and mark it no longer secure).
>>>
>>> As outlined below this should be a no-op for anybody not opting in.
>>>
>>> Christian
>>>
>>> On 07.02.20 12:39, Christian Borntraeger wrote:
>>>> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>>>
>>>> With the introduction of protected KVM guests on s390 there is now a
>>>> concept of inaccessible pages. These pages need to be made accessible
>>>> before the host can access them.
>>>>
>>>> While cpu accesses will trigger a fault that can be resolved, I/O
>>>> accesses will just fail. We need to add a callback into architecture
>>>> code for places that will do I/O, namely when writeback is started or
>>>> when a page reference is taken.
>>>>
>>>> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>>> ---
>>>> include/linux/gfp.h | 6 ++++++
>>>> mm/gup.c | 2 ++
>>>> mm/page-writeback.c | 1 +
>>>> 3 files changed, 9 insertions(+)
>>>>
>>>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>>>> index e5b817cb86e7..be2754841369 100644
>>>> --- a/include/linux/gfp.h
>>>> +++ b/include/linux/gfp.h
>>>> @@ -485,6 +485,12 @@ static inline void arch_free_page(struct page *page, int order) { }
>>>> #ifndef HAVE_ARCH_ALLOC_PAGE
>>>> static inline void arch_alloc_page(struct page *page, int order) { }
>>>> #endif
>>>> +#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
>>>> +static inline int arch_make_page_accessible(struct page *page)
>>>> +{
>>>> + return 0;
>>>> +}
>>>> +#endif
>>>>
>>>> struct page *
>>>> __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>>>> diff --git a/mm/gup.c b/mm/gup.c
>>>> index 7646bf993b25..a01262cd2821 100644
>>>> --- a/mm/gup.c
>>>> +++ b/mm/gup.c
>>>> @@ -257,6 +257,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>>>> page = ERR_PTR(-ENOMEM);
>>>> goto out;
>>>> }
>>>> + arch_make_page_accessible(page);
>>
>> As Will pointed out, the return value definitely needs to be checked, there
>> will undoubtedly be scenarios where the page cannot be made accessible.
>
> Actually onm s390 this should always succeed unless we have a bug.
>
> But we can certainly provide a variant of that patch that does check the return
> value.
> Proper error handling for gup and WARN_ON for pae-writeback.
>>
>> What is the use case for calling arch_make_page_accessible() in the follow()
>> and gup() paths? Live migration is the only thing that comes to mind, and
>> for live migration I would expect you would want to keep the secure guest
>> running when copying pages to the target, i.e. use pre-copy. That would
>> conflict with converting the page in place. Rather, migration would use a
>> separate dedicated path to copy the encrypted contents of the secure page to
>> a completely different page, and send *that* across the wire so that the
>> guest can continue accessing the original page.
>> Am I missing a need to do this for the swap/reclaim case? Or is there a
>> completely different use case I'm overlooking?
>
> This is actually to protect the host against a malicious user space. For
> example a bad QEMU could simply start direct I/O on such protected memory.
> We do not want userspace to be able to trigger I/O errors and thus we
> implemented the logic to "whenever somebody accesses that page (gup) or
> doing I/O, make sure that this page can be accessed. When the guest tries
> to access that page we will wait in the page fault handler for writeback to
> have finished and for the page_ref to be the expected value.
So in this case, when the guest tries to access the page, the page may now
be corrupted because I/O was allowed to be done to it? Or will the I/O
have been blocked in some way, but without generating the I/O error?
Thanks,
Tom
>
>
>
>>
>> Tangentially related, hooks here could be quite useful for sanity checking
>> the kernel/KVM and/or debugging kernel/KVM bugs. Would it make sense to
>> pass a param to arch_make_page_accessible() to provide some information as
>> to why the page needs to be made accessible?
>
> Some kind of enum that can be used optionally to optimize things?
>
>>
>>>> }
>>>> if (flags & FOLL_TOUCH) {
>>>> if ((flags & FOLL_WRITE) &&
>>>> @@ -1870,6 +1871,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>>>>
>>>> VM_BUG_ON_PAGE(compound_head(page) != head, page);
>>>>
>>>> + arch_make_page_accessible(page);
>>>> SetPageReferenced(page);
>>>> pages[*nr] = page;
>>>> (*nr)++;
>>>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>>>> index 2caf780a42e7..0f0bd14571b1 100644
>>>> --- a/mm/page-writeback.c
>>>> +++ b/mm/page-writeback.c
>>>> @@ -2806,6 +2806,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
>>>> inc_lruvec_page_state(page, NR_WRITEBACK);
>>>> inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
>>>> }
>>>> + arch_make_page_accessible(page);
>>>> unlock_page_memcg(page);
>>>
>>> As outlined by Ulrich, we can move the callback after the unlock.
>>>
>>>> return ret;
>>>>
>>>>
>>>
>
next prev parent reply other threads:[~2020-02-17 20:55 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
2020-02-07 11:39 ` [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages Christian Borntraeger
2020-02-10 17:27 ` Christian Borntraeger
2020-02-11 11:26 ` Will Deacon
2020-02-11 11:43 ` Christian Borntraeger
2020-02-13 14:48 ` Christian Borntraeger
2020-02-18 16:02 ` Will Deacon
2020-02-13 19:56 ` Sean Christopherson
2020-02-13 20:13 ` Christian Borntraeger
2020-02-13 20:46 ` Sean Christopherson
2020-02-17 20:55 ` Tom Lendacky [this message]
2020-02-17 21:14 ` Christian Borntraeger
2020-02-10 18:17 ` David Hildenbrand
2020-02-10 18:28 ` Christian Borntraeger
2020-02-10 18:43 ` David Hildenbrand
2020-02-10 18:51 ` Christian Borntraeger
2020-02-18 3:36 ` Tian, Kevin
2020-02-18 6:44 ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt pages Christian Borntraeger
2020-02-10 12:26 ` David Hildenbrand
2020-02-10 18:38 ` Christian Borntraeger
2020-02-10 19:33 ` David Hildenbrand
2020-02-11 9:23 ` [PATCH v2 RFC] " Christian Borntraeger
2020-02-12 11:52 ` Christian Borntraeger
2020-02-12 12:16 ` David Hildenbrand
2020-02-12 12:22 ` Christian Borntraeger
2020-02-12 12:47 ` David Hildenbrand
2020-02-12 12:39 ` Cornelia Huck
2020-02-12 12:44 ` Christian Borntraeger
2020-02-12 13:07 ` Cornelia Huck
2020-02-10 18:56 ` [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt Ulrich Weigand
2020-02-10 12:40 ` [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt pages David Hildenbrand
2020-02-07 11:39 ` [PATCH 05/35] s390/mm: provide memory management functions for protected KVM guests Christian Borntraeger
2020-02-12 13:42 ` Cornelia Huck
2020-02-13 7:43 ` Christian Borntraeger
2020-02-13 8:44 ` Cornelia Huck
2020-02-14 17:59 ` David Hildenbrand
2020-02-14 21:17 ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 06/35] s390/mm: add (non)secure page access exceptions handlers Christian Borntraeger
2020-02-14 18:05 ` David Hildenbrand
2020-02-14 19:59 ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 10/35] KVM: s390: protvirt: Secure memory is not mergeable Christian Borntraeger
2020-02-07 11:39 ` [PATCH 11/35] KVM: s390/mm: Make pages accessible before destroying the guest Christian Borntraeger
2020-02-14 18:40 ` David Hildenbrand
2020-02-07 11:39 ` [PATCH 21/35] KVM: s390/mm: handle guest unpin events Christian Borntraeger
2020-02-10 14:58 ` Thomas Huth
2020-02-11 13:21 ` Cornelia Huck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6da2e3d0-a2be-18d2-3548-b546052d14e3@amd.com \
--to=thomas.lendacky@amd.com \
--cc=Ulrich.Weigand@de.ibm.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=borntraeger@de.ibm.com \
--cc=cohuck@redhat.com \
--cc=david@redhat.com \
--cc=frankja@linux.vnet.ibm.com \
--cc=gor@linux.ibm.com \
--cc=imbrenda@linux.ibm.com \
--cc=kvm-ppc@vger.kernel.org \
--cc=kvm@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-s390@vger.kernel.org \
--cc=maz@kernel.org \
--cc=mimu@linux.ibm.com \
--cc=pbonzini@redhat.com \
--cc=sean.j.christopherson@intel.com \
--cc=thuth@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).