From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Mike Rapoport <rppt@kernel.org>
Cc: Joerg Roedel <jroedel@suse.de>, Andi Kleen <ak@linux.intel.com>,
David Rientjes <rientjes@google.com>,
Borislav Petkov <bp@alien8.de>, Andy Lutomirski <luto@kernel.org>,
Sean Christopherson <seanjc@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Vlastimil Babka <vbabka@suse.cz>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Brijesh Singh <brijesh.singh@amd.com>,
Tom Lendacky <thomas.lendacky@amd.com>,
Jon Grimm <jon.grimm@amd.com>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Ingo Molnar <mingo@redhat.com>,
"Kaplan, David" <David.Kaplan@amd.com>,
Varad Gautam <varad.gautam@suse.com>,
Dario Faggioli <dfaggioli@suse.com>,
x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev
Subject: Re: Runtime Memory Validation in Intel-TDX and AMD-SNP
Date: Sun, 25 Jul 2021 21:28:28 +0300 [thread overview]
Message-ID: <20210725182828.6o57hc6j72urwxkz@box.shutemov.name> (raw)
In-Reply-To: <YP0r/fC0MlESX2IW@kernel.org>
On Sun, Jul 25, 2021 at 12:16:45PM +0300, Mike Rapoport wrote:
> On Fri, Jul 23, 2021 at 07:29:59PM +0300, Kirill A. Shutemov wrote:
> > On Fri, Jul 23, 2021 at 06:23:39PM +0300, Mike Rapoport wrote:
> > > > @@ -1318,9 +1327,14 @@ void __init e820__memblock_setup(void)
> > > > if (entry->type == E820_TYPE_SOFT_RESERVED)
> > > > memblock_reserve(entry->addr, entry->size);
> > > >
> > > > - if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
> > > > + if (entry->type != E820_TYPE_RAM &&
> > > > + entry->type != E820_TYPE_RESERVED_KERN &&
> > > > + entry->type != E820_TYPE_UNACCEPTED)
> > > > continue;
> > >
> > > If I understand correctly, you assume that
> > >
> > > * E820_TYPE_RAM and E820_TYPE_RESERVED_KERN regions are already accepted by
> > > firmware/booloader
> > > * E820_TYPE_UNACCEPTED would have been E820_SYSTEM_RAM if we'd disabled
> > > encryption
> > >
> > > What happens with other types? Particularly E820_TYPE_ACPI and
> > > E820_TYPE_NVS that may reside in memory and might have been accepted by
> > > BIOS.
> >
> > Any accessible memory that not marked as UNACCEPTED has to be accepted
> > before kernel gets control.
>
> Hmm, that would mean that everything that runs before the kernel must
> maintain precise E820 map. If we use 2M chunk as basic unit for accepting
> memory, the firmware must also use the same basic unit. E.g. we can't have
> an ACPI table squeezed between E820_TYPE_UNACCEPTED.
No. See mark_unaccepted(). Any chunks that cannot be accepted with 2M, get
accepted upfront, so we will not need to track them.
(I've just realized that mark_unaccepted() is buggy if 'start' and 'end'
are in the same 2M. Will fix.)
> Using e820 table would also mean that bootloader must be able to modify
> e820 and it also must follow the 2M rule.
>
> I think that using a dedicated data structure would be more robust than
> hooking into e820 table.
Maybe. We can construct the bitmap in the decompresser and translate
EFI_UNACCEPTED_MEMORY to E820_TYPE_RAM. I will look into this.
> > > > + if (entry->type == E820_TYPE_UNACCEPTED)
> > > > + mark_unaccepted(entry->addr, end);
> > > > +
> > > > memblock_add(entry->addr, entry->size);
> > > > }
> > > >
> > > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > > index 72920af0b3c0..db9d1bcac9ed 100644
> > > > --- a/arch/x86/kernel/setup.c
> > > > +++ b/arch/x86/kernel/setup.c
> > > > @@ -944,6 +944,8 @@ void __init setup_arch(char **cmdline_p)
> > > > if (movable_node_is_enabled())
> > > > memblock_set_bottom_up(true);
> > > > #endif
> > > > + /* TODO: make conditional */
> > > > + memblock_set_bottom_up(true);
> > >
> > > If memory is accepted during memblock allocations this should not really
> > > matter.
> > > Bottom up would be preferable if we'd like to reuse as much of already
> > > accepted memory as possible before page allocator is up.
> >
> > One of the main reason for this feature is to speed up boot time and
> > re-usinging preaccepted memory fits the goal.
>
> Using bottom up also means that early allocations end up in DMA zones,
> which probably not a problem for VMs in general, but who knows what path
> through devices people would want to use...
Good point. Maybe we can drop it. Will see based on performance
evaluation.
>
> > > > --- a/mm/memblock.c
> > > > +++ b/mm/memblock.c
> > > > @@ -814,6 +814,7 @@ int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
> > > > memblock_dbg("%s: [%pa-%pa] %pS\n", __func__,
> > > > &base, &end, (void *)_RET_IP_);
> > > >
> > > > + accept_pages(base, base + size);
> > >
> > > Hmm, I'm not sure memblock_reserve() is the right place to accept pages. It
> > > can be called to reserve memory owned by firmware which not necessarily
> > > would be encrypted. Besides, memblock_reserve() may be called for absent
> > > memory, could be it'll confuse TDX/SEV?
> >
> > Such memory will not be marked as unaccepted and accept_pages() will do
> > nothing.
> >
> > > Ideally, the call to accept_pages() should live in
> > > memblock_alloc_range_nid(), but unfortunately there still stale
> > > memblock_find_in_range() + memblock_reserve() pairs in x86 setup code.
> >
> > memblock_reserve() is the root of memory allocation in the early boot and
> > it is natual place to do the trick. Unless we have a good reason to move
> > it somewhere I would keep it here.
>
> I think it is better to accept memory that is actually allocated rather
> than marked as being used. It'll make it more robust against future changes
> in memblock_reserve() callers and in what is accept_pages() in your patch.
I disagree.
If we move accept_pages() up to callers we will make less robust: any new
user of memblock_reserve() has to consider if accept_pages() is needed and
like would ignore it since it's not essential for any non-TDX/non-SEV use
case.
--
Kirill A. Shutemov
next prev parent reply other threads:[~2021-07-25 18:28 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-19 12:58 Runtime Memory Validation in Intel-TDX and AMD-SNP Joerg Roedel
2021-07-19 13:07 ` Matthew Wilcox
2021-07-19 15:02 ` Joerg Roedel
2021-07-19 20:39 ` Andi Kleen
2021-07-20 8:55 ` Joerg Roedel
2021-07-20 9:34 ` Dr. David Alan Gilbert
2021-07-20 11:50 ` Joerg Roedel
2021-07-20 0:26 ` Andy Lutomirski
[not found] ` <CAAYXXYwFzrf8uY-PFkMRSG28+HztfGdJft8kB3Y3keWCx9K8TQ@mail.gmail.com>
2021-07-20 2:00 ` Erdem Aktas
2021-07-20 5:17 ` Andi Kleen
2021-07-20 9:11 ` Joerg Roedel
2021-07-20 17:32 ` Andi Kleen
2021-07-20 23:09 ` Erdem Aktas
2021-07-21 0:38 ` Andi Kleen
2021-07-22 17:31 ` Marc Orr
2021-07-26 18:55 ` Joerg Roedel
[not found] ` <eacb9c1f-2c61-4a7f-b5a3-7bf579e6cbf6@www.fastmail.com>
2021-07-20 19:54 ` Erdem Aktas
2021-07-20 22:01 ` Andi Kleen
2021-07-20 23:55 ` Erdem Aktas
2021-07-21 0:35 ` Andi Kleen
2021-07-21 8:51 ` Joerg Roedel
2021-07-20 8:44 ` Joerg Roedel
2021-07-20 14:14 ` Dave Hansen
2021-07-20 17:30 ` Kirill A. Shutemov
2021-07-21 9:20 ` Mike Rapoport
2021-07-21 10:02 ` Kirill A. Shutemov
2021-07-21 10:22 ` Mike Rapoport
2021-07-21 10:53 ` Joerg Roedel
2021-07-21 9:25 ` Joerg Roedel
2021-07-21 10:25 ` Kirill A. Shutemov
2021-07-21 10:48 ` Joerg Roedel
2021-07-22 15:46 ` David Hildenbrand
2021-07-26 19:02 ` Joerg Roedel
2021-07-27 9:34 ` David Hildenbrand
2021-08-02 10:19 ` Joerg Roedel
2021-08-02 18:47 ` David Hildenbrand
2021-07-22 15:57 ` David Hildenbrand
2021-07-22 19:51 ` Kirill A. Shutemov
2021-07-23 15:23 ` Mike Rapoport
2021-07-23 16:29 ` Kirill A. Shutemov
2021-07-25 9:16 ` Mike Rapoport
2021-07-25 18:28 ` Kirill A. Shutemov [this message]
2021-07-26 10:00 ` Mike Rapoport
2021-07-26 11:53 ` Kirill A. Shutemov
2021-07-26 19:13 ` Joerg Roedel
2021-07-26 23:02 ` Erdem Aktas
2021-07-26 23:54 ` Kirill A. Shutemov
2021-07-27 1:35 ` Erdem Aktas
2021-07-23 11:04 ` Varad Gautam
2021-07-23 14:34 ` Kaplan, David
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210725182828.6o57hc6j72urwxkz@box.shutemov.name \
--to=kirill@shutemov.name \
--cc=David.Kaplan@amd.com \
--cc=ak@linux.intel.com \
--cc=akpm@linux-foundation.org \
--cc=bp@alien8.de \
--cc=brijesh.singh@amd.com \
--cc=dfaggioli@suse.com \
--cc=jon.grimm@amd.com \
--cc=jroedel@suse.de \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-coco@lists.linux.dev \
--cc=linux-mm@kvack.org \
--cc=luto@kernel.org \
--cc=mingo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=rientjes@google.com \
--cc=rppt@kernel.org \
--cc=seanjc@google.com \
--cc=tglx@linutronix.de \
--cc=thomas.lendacky@amd.com \
--cc=varad.gautam@suse.com \
--cc=vbabka@suse.cz \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).