From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Dave Hansen <dave.hansen@intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Borislav Petkov <bp@alien8.de>, Andy Lutomirski <luto@kernel.org>,
Sean Christopherson <seanjc@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Joerg Roedel <jroedel@suse.de>, Ard Biesheuvel <ardb@kernel.org>,
Andi Kleen <ak@linux.intel.com>,
Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@linux.intel.com>,
David Rientjes <rientjes@google.com>,
Vlastimil Babka <vbabka@suse.cz>,
Tom Lendacky <thomas.lendacky@amd.com>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Ingo Molnar <mingo@redhat.com>,
Varad Gautam <varad.gautam@suse.com>,
Dario Faggioli <dfaggioli@suse.com>,
Brijesh Singh <brijesh.singh@amd.com>,
Mike Rapoport <rppt@kernel.org>,
David Hildenbrand <david@redhat.com>,
x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev,
linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCHv4 4/8] x86/boot/compressed: Handle unaccepted memory
Date: Sat, 9 Apr 2022 23:20:35 +0300
Message-ID: <20220409202035.plaiekzuihov4kvq@box.shutemov.name>
In-Reply-To: <043469ae-427c-b2bb-89ff-db8975894266@intel.com>
On Fri, Apr 08, 2022 at 10:57:17AM -0700, Dave Hansen wrote:
> On 4/5/22 16:43, Kirill A. Shutemov wrote:
> > Firmware is responsible for accepting memory where compressed kernel
> > image and initrd land. But kernel has to accept memory for decompression
> > buffer: accept memory just before decompression starts.
>
> I think I'd appreciate a sentence or two more about what's going on.
> How about something like this?
>
> The firmware starts the kernel by booting into the "compressed" kernel
> stub. That stub's job is to decompress the full kernel image and then
> jump to it.
>
> The firmware will pre-accept the memory used to run the stub. But, the
> stub is responsible for accepting the memory into which it decompresses
> the main kernel. Accept memory just before decompression starts.
>
> The stub is also responsible for choosing a physical address in which to
> place the decompressed kernel image. The KASLR mechanism will randomize
> this physical address. Since the accepted memory region is relatively
> small, KASLR would be quite ineffective if it only used the pre-accepted
> area (EFI_CONVENTIONAL_MEMORY). Ensure that KASLR randomizes among the
> entire physical address space by also including EFI_UNACCEPTED_MEMORY.
Sure, looks good.
> > diff --git a/arch/x86/boot/compressed/bitmap.c b/arch/x86/boot/compressed/bitmap.c
> > index bf58b259380a..ba2de61c0823 100644
> > --- a/arch/x86/boot/compressed/bitmap.c
> > +++ b/arch/x86/boot/compressed/bitmap.c
> > @@ -2,6 +2,48 @@
> > /* Taken from lib/string.c */
> >
> > #include <linux/bitmap.h>
> > +#include <linux/math.h>
> > +#include <linux/minmax.h>
> > +
> > +unsigned long _find_next_bit(const unsigned long *addr1,
> > + const unsigned long *addr2, unsigned long nbits,
> > + unsigned long start, unsigned long invert, unsigned long le)
> > +{
> > + unsigned long tmp, mask;
> > +
> > + if (unlikely(start >= nbits))
> > + return nbits;
> > +
> > + tmp = addr1[start / BITS_PER_LONG];
> > + if (addr2)
> > + tmp &= addr2[start / BITS_PER_LONG];
> > + tmp ^= invert;
> > +
> > + /* Handle 1st word. */
> > + mask = BITMAP_FIRST_WORD_MASK(start);
> > + if (le)
> > + mask = swab(mask);
> > +
> > + tmp &= mask;
> > +
> > + start = round_down(start, BITS_PER_LONG);
> > +
> > + while (!tmp) {
> > + start += BITS_PER_LONG;
> > + if (start >= nbits)
> > + return nbits;
> > +
> > + tmp = addr1[start / BITS_PER_LONG];
> > + if (addr2)
> > + tmp &= addr2[start / BITS_PER_LONG];
> > + tmp ^= invert;
> > + }
> > +
> > + if (le)
> > + tmp = swab(tmp);
> > +
> > + return min(start + __ffs(tmp), nbits);
> > +}
> >
> > void __bitmap_set(unsigned long *map, unsigned int start, int len)
> > {
> > @@ -22,3 +64,23 @@ void __bitmap_set(unsigned long *map, unsigned int start, int len)
> > *p |= mask_to_set;
> > }
> > }
> > +
> > +void __bitmap_clear(unsigned long *map, unsigned int start, int len)
> > +{
> > + unsigned long *p = map + BIT_WORD(start);
> > + const unsigned int size = start + len;
> > + int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
> > + unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
> > +
> > + while (len - bits_to_clear >= 0) {
> > + *p &= ~mask_to_clear;
> > + len -= bits_to_clear;
> > + bits_to_clear = BITS_PER_LONG;
> > + mask_to_clear = ~0UL;
> > + p++;
> > + }
> > + if (len) {
> > + mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
> > + *p &= ~mask_to_clear;
> > + }
> > +}
>
> It's a real shame that we have to duplicate this code. Is there
> anything crazy we could do here like
>
> #include "../../../lib/find_bit.c"
>
> ?
Well, it would require fracturing source files on the kernel side.
__bitmap_set() and __bitmap_clear() are now in lib/bitmap.c.
_find_next_bit() is in lib/find_bit.c.
Both lib/bitmap.c and lib/find_bit.c contain a lot of code that is not used
here. I guess we would need to split them into a few pieces to do this in a
sane way. Do you want me to go down this path?
>
> > diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> > index 411b268bc0a2..59db90626042 100644
> > --- a/arch/x86/boot/compressed/kaslr.c
> > +++ b/arch/x86/boot/compressed/kaslr.c
> > @@ -725,10 +725,20 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
> > * but in practice there's firmware where using that memory leads
> > * to crashes.
> > *
> > - * Only EFI_CONVENTIONAL_MEMORY is guaranteed to be free.
> > + * Only EFI_CONVENTIONAL_MEMORY and EFI_UNACCEPTED_MEMORY (if
> > + * supported) are guaranteed to be free.
> > */
> > - if (md->type != EFI_CONVENTIONAL_MEMORY)
> > +
> > + switch (md->type) {
> > + case EFI_CONVENTIONAL_MEMORY:
> > + break;
> > + case EFI_UNACCEPTED_MEMORY:
> > + if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY))
> > + break;
> > continue;
> > + default:
> > + continue;
> > + }
> >
> > if (efi_soft_reserve_enabled() &&
> > (md->attribute & EFI_MEMORY_SP))
> > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> > index fa8969fad011..c1d9d71a6615 100644
> > --- a/arch/x86/boot/compressed/misc.c
> > +++ b/arch/x86/boot/compressed/misc.c
> > @@ -18,6 +18,7 @@
> > #include "../string.h"
> > #include "../voffset.h"
> > #include <asm/bootparam_utils.h>
> > +#include <asm/unaccepted_memory.h>
> >
> > /*
> > * WARNING!!
> > @@ -43,6 +44,9 @@
> > void *memmove(void *dest, const void *src, size_t n);
> > #endif
> >
> > +#undef __pa
> > +#define __pa(x) ((unsigned long)(x))
>
> Those #undef's always worry me. Why is this one needed?
arch/x86/boot/compressed/misc.c:47:9: warning: '__pa' macro redefined [-Wmacro-redefined]
#define __pa(x) ((unsigned long)(x))
^
arch/x86/include/asm/page.h:47:9: note: previous definition is here
#define __pa(x) __phys_addr((unsigned long)(x))
Note that sev.c does the same. At least we are consistent :)
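
To illustrate why the #undef is there, here is a minimal user-space sketch. It assumes a simplified stand-in for the asm/page.h definition (the real one calls __phys_addr()); the 0x1000 offset and the pa_of() helper are hypothetical and exist only for this example. The decompressor runs identity-mapped, so a plain cast is all the translation it needs.

```c
/*
 * Stand-in for the kernel-proper definition pulled in through headers.
 * Assumption: simplified; the real macro calls __phys_addr().
 */
#define __pa(x) ((unsigned long)(x) - 0x1000UL)

/*
 * The decompressor is identity-mapped: virtual == physical.
 * Drop the inherited definition first; redefining the macro without
 * the #undef is what triggers -Wmacro-redefined.
 */
#undef __pa
#define __pa(x) ((unsigned long)(x))

/* Hypothetical helper: returns the "physical" address of a pointer. */
unsigned long pa_of(const void *p)
{
	return __pa(p);
}
```

With the override in place, pa_of(&var) is simply the numeric value of the pointer.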
> > /*
> > * This is set up by the setup-routine at boot-time
> > */
> > @@ -451,6 +455,13 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
> > #endif
> >
> > debug_putstr("\nDecompressing Linux... ");
> > +
> > + if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY) &&
> > + boot_params->unaccepted_memory) {
> > + debug_putstr("Accepting memory... ");
> > + accept_memory(__pa(output), __pa(output) + needed_size);
> > + }
> > +
> > __decompress(input_data, input_len, NULL, NULL, output, output_len,
> > NULL, error);
> > parse_elf(output);
> > diff --git a/arch/x86/boot/compressed/unaccepted_memory.c b/arch/x86/boot/compressed/unaccepted_memory.c
> > index d363acf59c08..3ebab63789bb 100644
> > --- a/arch/x86/boot/compressed/unaccepted_memory.c
> > +++ b/arch/x86/boot/compressed/unaccepted_memory.c
> > @@ -51,3 +51,17 @@ void mark_unaccepted(struct boot_params *params, u64 start, u64 end)
> > bitmap_set((unsigned long *)params->unaccepted_memory,
> > start / PMD_SIZE, (end - start) / PMD_SIZE);
> > }
> > +
> > +void accept_memory(phys_addr_t start, phys_addr_t end)
> > +{
> > + unsigned long *unaccepted_memory;
> > + unsigned int rs, re;
> > +
> > + unaccepted_memory = (unsigned long *)boot_params->unaccepted_memory;
> > + rs = start / PMD_SIZE;
>
> OK, so start is a physical address, PMD_SIZE is 2^21, and 'rs' is an
> unsigned int. That means 'rs' can, at most, represent a physical
> address at 2^(21+32), or 2^53. That's cutting it a *bit* close, don't
> you think?
>
> Could we please just give 'rs' and 're' real names and make them
> 'unsigned long's, please? It will surely save at least one other person
> from doing math. The find_next_bit() functions seem to take ulongs anyway.
Okay. Are 'range_start' and 'range_end' good enough names?
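
For reference, the arithmetic behind the concern can be checked with a small user-space sketch. It assumes a 32-bit unsigned int and uses uint64_t in place of phys_addr_t; both helper names are made up for this example.

```c
#include <stdint.h>

#define PMD_SHIFT	21
#define PMD_SIZE	(1ULL << PMD_SHIFT)

/* Index narrowed to 'unsigned int': the upper bits are silently dropped. */
unsigned int pmd_index_uint(uint64_t phys)
{
	return (unsigned int)(phys / PMD_SIZE);
}

/* Index kept at 64 bits: no truncation for any physical address. */
uint64_t pmd_index_wide(uint64_t phys)
{
	return phys / PMD_SIZE;
}
```

The last 2M region below 2^53 still gets index 0xffffffff, but an address at 2^53 has index 2^32, which wraps to 0 in the narrow variant and would alias the first 2M of memory.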
>
> > + for_each_set_bitrange_from(rs, re, unaccepted_memory,
> > + DIV_ROUND_UP(end, PMD_SIZE)) {
> > + __accept_memory(rs * PMD_SIZE, re * PMD_SIZE);
> > + bitmap_clear(unaccepted_memory, rs, re - rs);
> > + }
> > +}
>
> Could we please introduce some intermediate variable? For instance:
>
> unsigned long bitmap_size = DIV_ROUND_UP(end, PMD_SIZE);
>
> That will make this all a lot easier to read.
Okay.
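
Roughly what I have in mind with both changes applied, as a user-space sketch rather than the actual patch. The open-coded bit helpers stand in for for_each_set_bitrange_from() and bitmap_clear(), and fake_accept() is a hypothetical replacement for __accept_memory() that only counts bytes.

```c
#include <limits.h>

#define PMD_SIZE		(1ULL << 21)
#define BITS_PER_LONG		(sizeof(unsigned long) * CHAR_BIT)
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* Hypothetical stand-in for __accept_memory(): just counts accepted bytes. */
unsigned long long accepted_bytes;

static void fake_accept(unsigned long long start, unsigned long long end)
{
	accepted_bytes += end - start;
}

static int sketch_test_bit(const unsigned long *map, unsigned long nr)
{
	return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

static void sketch_clear_bit(unsigned long *map, unsigned long nr)
{
	map[nr / BITS_PER_LONG] &= ~(1UL << (nr % BITS_PER_LONG));
}

void accept_memory_sketch(unsigned long *unaccepted_memory,
			  unsigned long long start, unsigned long long end)
{
	unsigned long range_start = start / PMD_SIZE;
	unsigned long bitmap_size = DIV_ROUND_UP(end, PMD_SIZE);
	unsigned long range_end, i;

	while (range_start < bitmap_size) {
		/* Skip 2M regions that are already accepted (bit clear). */
		if (!sketch_test_bit(unaccepted_memory, range_start)) {
			range_start++;
			continue;
		}

		/* Extend the run of set bits, bounded by bitmap_size. */
		range_end = range_start;
		while (range_end < bitmap_size &&
		       sketch_test_bit(unaccepted_memory, range_end))
			range_end++;

		/* Accept the whole run at once, then mark it accepted. */
		fake_accept(range_start * PMD_SIZE, range_end * PMD_SIZE);
		for (i = range_start; i < range_end; i++)
			sketch_clear_bit(unaccepted_memory, i);

		range_start = range_end;
	}
}
```

Bits at or beyond bitmap_size are left untouched, so memory past 'end' stays unaccepted.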
>
> > diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h
> > index cbc24040b853..f1f835d3cd78 100644
> > --- a/arch/x86/include/asm/unaccepted_memory.h
> > +++ b/arch/x86/include/asm/unaccepted_memory.h
> > @@ -9,4 +9,6 @@ struct boot_params;
> >
> > void mark_unaccepted(struct boot_params *params, u64 start, u64 num);
> >
> > +void accept_memory(phys_addr_t start, phys_addr_t end);
> > +
> > #endif
>
--
Kirill A. Shutemov