From: Brijesh Singh <brijesh.singh@amd.com>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	David Rientjes <rientjes@google.com>
Cc: brijesh.singh@amd.com, Borislav Petkov <bp@alien8.de>,
	Andy Lutomirski <luto@kernel.org>,
	Sean Christopherson <seanjc@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andi Kleen <ak@linux.intel.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Jon Grimm <jon.grimm@amd.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Christoph Hellwig <hch@lst.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Ingo Molnar <mingo@redhat.com>, Joerg Roedel <jroedel@suse.de>,
	x86@kernel.org, linux-mm@kvack.org
Subject: Re: AMD SEV-SNP/Intel TDX: validation of memory pages
Date: Tue, 2 Feb 2021 18:16:41 -0600
Message-ID: <961a2736-9bc9-43e1-1e75-6d373fe9590b@amd.com>
In-Reply-To: <20210202160205.3wfchtibq2sd7pe5@black.fi.intel.com>


On 2/2/21 10:02 AM, Kirill A. Shutemov wrote:
> On Mon, Feb 01, 2021 at 05:51:09PM -0800, David Rientjes wrote:
>> Hi everybody,
>>
>> I'd like to kick-start the discussion on lazy validation of guest memory
>> for the purposes of AMD SEV-SNP and Intel TDX.
>>
>> Both AMD SEV-SNP and Intel TDX require validation of guest memory before
>> it may be used by the guest.  This is needed for integrity protection from
>> a potentially malicious hypervisor or other host components.
>>
>> For AMD SEV-SNP, the hypervisor assigns a page to the guest using the new
>> RMPUPDATE instruction.  The guest then transitions the page to usable with
>> the new PVALIDATE instruction[1].  This sets the Validated flag in the
>> Reverse Map Table (RMP) for a guest addressable page, which opts into
>> hardware and firmware integrity protection.  This may only be done by the
>> guest itself and until that time, the guest cannot access the page.
>>
>> The guest can only PVALIDATE memory for a gPA once; the RMP then
>> guarantees for each hPA that there is only a single gPA mapping.  This
>> validation can either be done all up front at the time the guest is booted
>> or it can be done lazily at runtime on fault if the guest keeps track of
>> Valid vs Invalid pages.  Because doing PVALIDATE for all guest memory at
>> boot would be extremely lengthy, I'd like to discuss the options for doing
>> it lazily.
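(Purely as an illustrative aside: a rough sketch of what a guest-side
PVALIDATE wrapper might look like, assuming the register interface described
in the SNP spec [1]: RAX = guest virtual address, RCX = page size (0 = 4K,
1 = 2M), RDX = 1 to validate / 0 to rescind, return code in EAX, and CF set
when the RMP entry was already in the requested state. The function name and
error convention are made up.)

/* Raw opcode bytes, since older assemblers do not know the mnemonic. */
static int snp_pvalidate(unsigned long vaddr, int page_size, int validate)
{
	unsigned char no_change;
	int rc;

	asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
		     "setc %[no_change]"
		     : "=a" (rc), [no_change] "=qm" (no_change)
		     : "a" (vaddr), "c" (page_size), "d" (validate)
		     : "memory", "cc");

	/* CF with a zero return code hints at a double-validation attempt. */
	if (!rc && no_change)
		return -1;

	return rc;
}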
>>
>> Similarly, for Intel TDX, the hypervisor unmaps the gPA from the shared
>> EPT and invalidates the tlb and all caches for the TD's vcpus; it then
>> adds a page to the gPA address space for a TD by using the new
>> TDH.MEM.PAGE.AUG call.  The TDG.MEM.PAGE.ACCEPT TDCALL[2] then allows a
>> guest to accept a guest page for a gPA and initialize it using the private
>> key for that TD.  This may only be done by the TD itself and until that
>> time, the gPA cannot be used within the TD.
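(Similarly, a hypothetical sketch of the TDX side, assuming the
TDG.MEM.PAGE.ACCEPT interface from the module spec [2]: leaf number in RAX,
RCX = GPA with the mapping level in the low bits (0 = 4K, 1 = 2M, 2 = 1G),
status returned in RAX. The names here are illustrative only.)

#define TDCALL_ACCEPT_PAGE	6UL	/* TDG.MEM.PAGE.ACCEPT leaf per the spec */

static unsigned long tdx_accept_page(unsigned long gpa, unsigned long level)
{
	unsigned long rcx = gpa | level;
	unsigned long ret;

	/* "tdcall" == opcode bytes 0x66 0x0F 0x01 0xCC on older assemblers. */
	asm volatile("tdcall"
		     : "=a" (ret), "+c" (rcx)
		     : "a" (TDCALL_ACCEPT_PAGE)
		     : "memory");

	return ret;	/* 0 means success */
}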
>>
>> Both AMD SEV-SNP and Intel TDX support hugepages.  SEV-SNP supports 2MB
>> whereas TDX has accept TDCALL support for 2MB and 1GB.
>>
>> I believe the UEFI ECR[3] adding the unaccepted memory type to
>> EFI_MEMORY_TYPE was accepted in December.  This should enable the guest to
>> learn what memory has not yet been validated (or accepted) by the firmware
>> if validation of all guest memory is not done completely up front.
>>
>> This likely requires a pre-validation of all memory that can be accessed
>> when handling a #VC (or #VE for TDX) such as IST stacks, including memory
>> in the x86 boot sequence that must be validated before the core mm
>> subsystem is up and running to handle the lazy validation.  I believe
>> lazy validation can be done by the core mm after that, perhaps by
>> maintaining a new "validated" bit in struct page flags.
>>
>> Has anybody looked into this or, even better, is anybody currently working
>> on this?
> It's likely I'm going to do this on Intel side, but I have not looked
> deeply into it.
>
>> I think quite invasive changes are needed for the guest to support lazy
>> validation/acceptance to core areas that lots of people on the recipient
>> list have strong opinions about.  Some things that come to mind:
>>
>>  - Annotations for pages that must be pre-validated in the x86 boot
>>    sequence, including IST stacks
>>
>>  - Proliferation of these annotations throughout any kernel code that can
>>    access memory for #VC or #VE
>>
>>  - Handling lazy validation of guest memory through the core mm layer,
>>    most likely involving a bit in struct page flags to track their status
>>
>>  - Any need for validating memory that is not backed by struct page that
>>    needs to be special-cased
>>
>>  - Any concerns about this for the DMA layer
>>
>> One possibility for minimal disruption to the boot entry code is to
>> require the guest BIOS to validate 4GB and below, and then leave 4GB and
>> above to be done lazily (the true amount of memory will actually be less
>> due to the MMIO hole).
> [ As I haven't looked into the actual code, I may say total garbage below... ]
>
> Pre-validating 4GB would indeed be the easiest way to go, but it's going to be
> too slow.
>
> The more realistic option is for the BIOS to pre-validate the memory where
> the kernel and initrd are placed, plus a few dozen megs for runtime. It
> means the decompression code would need to be aware of the validation.


I was thinking that having the BIOS validate the lower 4GB will simplify the
changes to the kernel entry code path as well as provide a clean approach to
supporting kexec.

My initial thoughts are:

- The BIOS or VMM validates the lower 4GB of memory.

- The BIOS marks the memory above 4GB as unaccepted in the e820/EFI memmap.

- Kernel early boot can be achieved with minimal (or no) changes.

- If an unaccepted memory type is discovered, then allocate a bitmap that
can be used to keep track of the state (e.g., which pages are validated).
We can also explore whether removing the unaccepted flag from the memmap
range will work.

- On #VC/#VE, look at the bitmap to see whether we need to validate the
pages. To speed things up, we can validate more than one page per #VC/#VE
(see the rough sketch after this list).

- If we get kexec'd, then rebuild the e820/memmap based on the bitmap so
that we don't double-validate.
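
To make the bitmap idea above a bit more concrete, here is a very rough
sketch; every name in it (arch_accept_page(), the batch size, the globals) is
hypothetical, and locking and bounds checks are omitted:

#include <linux/bitops.h>
#include <linux/kernel.h>
#include <linux/mm.h>

/* Hypothetical arch hook: PVALIDATE on SNP, TDG.MEM.PAGE.ACCEPT on TDX. */
int arch_accept_page(unsigned long paddr);

#define ACCEPT_BATCH	32	/* pages validated per fault; tunable */

static unsigned long *unaccepted_bitmap;  /* one bit per 4K pfn; set = unaccepted */
static unsigned long bitmap_base_pfn;     /* first pfn the bitmap covers */

/* Called from the #VC/#VE handler when the faulting page may be unaccepted. */
static void lazy_validate(unsigned long fault_pfn)
{
	unsigned long i;

	for (i = 0; i < ACCEPT_BATCH; i++) {
		unsigned long pfn = fault_pfn + i;

		/* Never validate twice: only act if the bit was still set. */
		if (!test_and_clear_bit(pfn - bitmap_base_pfn, unaccepted_bitmap))
			continue;

		if (arch_accept_page(pfn << PAGE_SHIFT))
			panic("unexpected validation failure for pfn 0x%lx", pfn);
	}
}

The same bitmap would be what the kexec path consults to rebuild the
e820/memmap, so the next kernel never issues a second validation for an
already validated range.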


>
> The critical thing is that once memory is validated we must not validate
> it again. It's a possible VMM->guest attack vector. We must track precisely
> what memory has been validated and stop the guest upon detecting an
> unexpected second validation request.
>
> It also means that we have to keep the information when control gets
> passed from the decompression code to the real kernel. A page flag is no
> good for this.
>
> My initial thought is that we can use the e820/EFI memmap to keep track of
> the information: remove the unaccepted memory flag from the range that got
> accepted.
>
> The decompression code validates the memory that it needs for
> decompression, modifies the memmap accordingly, and passes control to the
> main kernel.
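(To illustrate that, a hypothetical sketch of how the decompression stub
could adjust the zero-page e820 table after validating [base, base + size).
E820_TYPE_UNACCEPTED does not exist yet and stands in for whatever the UEFI
ECR [3] ends up defining; the sketch also assumes the accepted chunk starts
at the entry's base, that a free table slot exists, and glosses over edge
cases.)

#include <linux/types.h>
#include <asm/bootparam.h>
#include <asm/e820/types.h>

#define E820_TYPE_UNACCEPTED	0x55aa	/* placeholder; no real value assigned */

static void e820_mark_accepted(struct boot_params *bp, u64 base, u64 size)
{
	int i;

	for (i = 0; i < bp->e820_entries; i++) {
		struct boot_e820_entry *e = &bp->e820_table[i];

		if (e->type != E820_TYPE_UNACCEPTED || e->addr != base)
			continue;

		/* Shrink the unaccepted entry by the amount just validated... */
		e->addr += size;
		e->size -= size;

		/* ...and account the validated chunk as normal RAM. */
		bp->e820_table[bp->e820_entries].addr = base;
		bp->e820_table[bp->e820_entries].size = size;
		bp->e820_table[bp->e820_entries].type = E820_TYPE_RAM;
		bp->e820_entries++;
		return;
	}
}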
>
> The main kernel may accept memory via #VE/#VC, but ideally it needs to
> stay within the memory accepted by the decompression code for the initial
> boot.
>
> I think the bulk of the memory validation can be done via existing machinery:
> we already have deferred struct page initialization code in the kernel, and
> I believe we can hook into it for this purpose.
>
> Any comments?
>

