Date: Mon, 19 Jul 2021 20:30:15 -0700
From: "Andy Lutomirski" <luto@kernel.org>
To: "Erdem Aktas"
Cc: "Joerg Roedel", "David Rientjes", "Borislav Petkov",
    "Sean Christopherson", "Andrew Morton", "Vlastimil Babka",
    "Kirill A. Shutemov", "Andi Kleen", "Brijesh Singh", "Tom Lendacky",
    "Jon Grimm", "Thomas Gleixner", "Peter Zijlstra (Intel)",
    "Paolo Bonzini", "Ingo Molnar", "Kaplan, David", "Varad Gautam",
    "Dario Faggioli", "the arch/x86 maintainers",
    linux-mm@kvack.org, linux-coco@lists.linux.dev
Subject: Re: Runtime Memory Validation in Intel-TDX and AMD-SNP

On Mon, Jul 19, 2021, at 6:51 PM, Erdem Aktas wrote:
> With the new UEFI memory type, option 2 seems like a better option to
> me.
>
> I was thinking that, while the new UEFI memory type is not yet
> supported, option 3 can be implemented as a temporary solution. IMO,
> this is crucial for reasonable boot performance.
>
> > There's one exception to this, which is the previous memory view in
> > crash kernels. But that's a relatively obscure case and there might be
> > other solutions for this.
>
> I think this is an important angle. It might cause reliability issues.
> If the kexec kernel does not know which page is shared or private, it
> can use a previously shared page as a code page, which will not work.
> It is also a security concern. Hosts can always cause crashes, which
> force guests to do kexec for a crash dump. If the kexec kernel does not
> know which pages were validated before, it might be compromised with
> page replay attacks.

What’s the attack you have in mind?  With TDX, the guest using the
wrong shared vs secure type should, at worst, cause crashes.  With SEV,
I can imagine it’s possible for a guest to read or write the ciphertext
of a private page, but actually turning that into an attack seems like
it would require convincing a guest to use the same page with both
modes.

>
> Also kexec is not only for crash dumps. For warm resets, the kexec
> kernel needs to know the valid page map.
>
> >> Also in general I don't think it will really happen, at least initially.
> >> All the shared buffers we use are allocated and never freed. So such a
> >> problem could be deferred.
>
> Does it not depend on kernel configs? Currently, there is a valid
> control path in dma_alloc_coherent which might alloc and free shared
> pages.
>
> >> At the risk of asking a potentially silly question, would it be
> >> reasonable to treat non-validated memory as not-present for kernel
> >> purposes and hot-add it in a thread as it gets validated?
>
> My concern with this is that it assumes that all the present memory is
> private. UEFI might have some pages which are shared and therefore are
> also present.

Why is this a problem?  In TDX, I don’t think shared pages need any
sort of validation. The private memory needs acceptance, but only DoS
should be possible by getting it wrong. If EFI passed in a messy map
with shared and private transitions all over, there will be a lot of
extents in the map, but what actually goes wrong?

> -Erdem
>
> On Mon, Jul 19, 2021 at 5:26 PM Andy Lutomirski <luto@kernel.org> wrote:
>> On 7/19/21 5:58 AM, Joerg Roedel wrote:
>>
>> > Memory Validation through the Boot Process and in the Running System
>> > --------------------------------------------------------------------
>> >
>> > The memory is validated throughout the boot process as described below.
>> > These steps assume a firmware is present, but this proposal does not
>> > strictly require a firmware. The tasks done by the firmware can also be
>> > done by the hypervisor before starting the guest. The steps are:
>> >
>> >       1. The firmware validates all memory which will not be owned by
>> >          the boot loader or the OS.
>> >
>> >       2. The firmware also validates the first X MB of memory, just
>> >          enough to run a boot loader and to load the compressed Linux
>> >          kernel image. X is not expected to be very large, 64 or 128
>> >          MB should be enough. This pre-validation should not cause
>> >          significant delays in the boot process.
>> >
>> >       3. The validated memory is marked E820-Usable in struct
>> >          boot_params for the Linux decompressor. The rest of the
>> >          memory is also passed to Linux via new special E820 entries
>> >          which mark the memory as Usable-but-Invalid.
>> >
>> >       4. When the Linux decompressor takes over control, it evaluates
>> >          the E820 table and calculates the total amount of memory
>> >          available to Linux (valid and invalid memory).
>> >
>> >          The decompressor allocates a physically contiguous data
>> >          structure at a random memory location which is big enough to
>> >          hold the validation states of all 4kb pages available to
>> >          the guest. This data structure will be called the Validation
>> >          Bitmap through the rest of this document. The Validation
>> >          Bitmap is indexed by page frame numbers.
>>
>> At the risk of asking a potentially silly question, would it be
>> reasonable to treat non-validated memory as not-present for kernel
>> purposes and hot-add it in a thread as it gets validated?  Or would this
>> result in poor system behavior before enough memory is validated?
>> Perhaps we should block instead of failing allocations if we want more
>> memory than is currently validated?
>>
>> --Andy
>>
