All of lore.kernel.org
 help / color / mirror / Atom feed
From: "H. Peter Anvin" <hpa@zytor.com>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: pbonzini@redhat.com, ebiggers@kernel.org, x86@kernel.org,
	linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
	ardb@kernel.org, kraxel@redhat.com, bp@alien8.de,
	philmd@linaro.org
Subject: Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data
Date: Fri, 30 Dec 2022 11:13:32 -0800	[thread overview]
Message-ID: <6C1D0560-6D77-4733-9B8D-5184935AEC62@zytor.com> (raw)
In-Reply-To: <Y68K4mPuz6edQkCX@zx2c4.com>

On December 30, 2022 7:59:30 AM PST, "Jason A. Donenfeld" <Jason@zx2c4.com> wrote:
>Hi,
>
>On Wed, Dec 28, 2022 at 11:31:34PM -0800, H. Peter Anvin wrote:
>> On December 28, 2022 6:31:07 PM PST, "Jason A. Donenfeld" <Jason@zx2c4.com> wrote:
>> >Hi,
>> >
>> >Read this message in a fixed width text editor with a lot of columns.
>> >
>> >On Wed, Dec 28, 2022 at 03:58:12PM -0800, H. Peter Anvin wrote:
>> >> Glad you asked.
>> >> 
>> >> So the kernel load addresses are parameterized in the kernel image
>> >> setup header. One of the things that are so parameterized are the size
>> >> and possible realignment of the kernel image in memory.
>> >> 
>> >> I'm very confused where you are getting the 64 MB number from. There
>> >> should not be any such limitation.
>> >
>> >Currently, QEMU appends it to the kernel image, not to the initramfs as
>> >you suggest below. So, that winds up looking, currently, like:
>> >
>> >          kernel image            setup_data
>> >   |--------------------------||----------------|
>> >0x100000                  0x100000+l1     0x100000+l1+l2
>> >
>> >The problem is that this decompresses to 0x1000000 (one more zero). So
>> >if l1 is > (0x1000000-0x100000), then this winds up looking like:
>> >
>> >          kernel image            setup_data
>> >   |--------------------------||----------------|
>> >0x100000                  0x100000+l1     0x100000+l1+l2
>> >
>> >                                 d e c o m p r e s s e d   k e r n e l
>> >		     |-------------------------------------------------------------|
>> >                0x1000000                                                     0x1000000+l3 
>> >
>> >The decompressed kernel seemingly overwriting the compressed kernel
>> >image isn't a problem, because that gets relocated to a higher address
>> >early on in the boot process. setup_data, however, stays in the same
>> >place, since those links are self referential and nothing fixes them up.
>> >So the decompressed kernel clobbers it.
>> >
>> >The solution in this commit adds a bunch of padding between the kernel
>> >image and setup_data to avoid this. That looks like this:
>> >
>> >          kernel image                            padding                               setup_data
>> >   |--------------------------||---------------------------------------------------||----------------|
>> >0x100000                  0x100000+l1                                         0x1000000+l3      0x1000000+l3+l2
>> >
>> >                                 d e c o m p r e s s e d   k e r n e l
>> >		     |-------------------------------------------------------------|
>> >                0x1000000                                                     0x1000000+l3 
>> >
>> >This way, the decompressed kernel doesn't clobber setup_data.
>> >
>> >The problem is that if 0x1000000+l3-0x100000 is around 62 megabytes,
>> >then the bootloader crashes when trying to dereference setup_data's
>> >->len param at the end of initialize_identity_maps() in ident_map_64.c.
>> >I don't know why it does this. If I could remove the 62 megabyte
>> >restriction, then I could keep with this technique and all would be
>> >well.
>> >
>> >> In general, setup_data should be able to go anywhere the initrd can
>> >> go, and so is subject to the same address cap (896 MB for old kernels,
>> >> 4 GB on newer ones; this address too is enumerated in the header.)
>> >
>> >It would be theoretically possible to attach it to the initrd image
>> >instead of to the kernel image. As a last resort, I guess I can look
>> >into doing that. However, that's going to require some serious rework
>> >and plumbing of a lot of different components. So if I can make it work
>> >as is, that'd be ideal. However, I need to figure out this weird 62 meg
>> >limitation.
>> >
>> >Any ideas on that?
>> >
>> >Jason
>> 
>> As far as a crash... that sounds like a big and a pretty serious one at that.
>> 
>> Could you let me know what kernel you are using and how *exactly* you are booting it?
>
>I'll attach a .config file. Apply the patch at the top of this thread to
>qemu, except make one modification:
>
>diff --git a/hw/i386/x86.c b/hw/i386/x86.c
>index 628fd2b2e9..a61ee23e13 100644
>--- a/hw/i386/x86.c
>+++ b/hw/i386/x86.c
>@@ -1097,7 +1097,7 @@ void x86_load_linux(X86MachineState *x86ms,
> 
>             /* The early stage can't address past around 64 MB from the original
>              * mapping, so just give up in that case. */
>-            if (padded_size < 62 * 1024 * 1024)
>+            if (true || padded_size < 62 * 1024 * 1024)
>                 kernel_size = padded_size;
>             else {
>                 fprintf(stderr, "qemu: Kernel image too large to hold setup_data\n");
>
>Then build qemu. Run it with `-kernel bzImage`, based on the kernel
>built with the .config I attached.
>
>You'll see that the CPU triple faults when hitting this line:
>
>        sd = (struct setup_data *)boot_params->hdr.setup_data;
>        while (sd) {
>                unsigned long sd_addr = (unsigned long)sd;
>
>                kernel_add_identity_map(sd_addr, sd_addr + sizeof(*sd) + sd->len);  <----
>                sd = (struct setup_data *)sd->next;
>        }
>
>, because it dereferences *sd. This does not happen if the decompressed
>size of the kernel is < 62 megs.
>
>So that's the "big and pretty serious" bug that might be worthy of
>investigation.
>
>Jason

No kidding. Dereferencing data *before you map it* is generally frowned upon.

This needs to be split into to making calls.

*Facepalm*

  parent reply	other threads:[~2022-12-30 19:14 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-28 14:38 [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data Jason A. Donenfeld
2022-12-28 16:02 ` Philippe Mathieu-Daudé
2022-12-28 16:30   ` Jason A. Donenfeld
2022-12-28 16:57     ` Jason A. Donenfeld
2022-12-28 23:58       ` H. Peter Anvin
2022-12-29  2:13         ` H. Peter Anvin
2022-12-29  2:31         ` Jason A. Donenfeld
2022-12-29  7:28           ` Philippe Mathieu-Daudé
2022-12-29  7:30           ` H. Peter Anvin
2022-12-29  7:31           ` H. Peter Anvin
2022-12-29 12:47             ` Borislav Petkov
2022-12-30 15:54               ` Jason A. Donenfeld
2022-12-30 17:01                 ` Borislav Petkov
2022-12-30 17:07                   ` Jason A. Donenfeld
2022-12-30 19:54                     ` Borislav Petkov
2022-12-30 21:58                       ` H. Peter Anvin
2022-12-30 22:10                         ` Jason A. Donenfeld
2022-12-31  1:06                           ` H. Peter Anvin
2022-12-31  1:14                             ` H. Peter Anvin
2022-12-31 12:55                             ` Jason A. Donenfeld
2022-12-31 13:40                             ` Borislav Petkov
2022-12-31 13:44                               ` Jason A. Donenfeld
2022-12-31 13:48                                 ` Borislav Petkov
2022-12-31 13:51                                   ` Jason A. Donenfeld
2022-12-31 14:24                                     ` Borislav Petkov
2022-12-31 18:22                                       ` Jason A. Donenfeld
2022-12-31 19:00                                         ` Borislav Petkov
2023-01-01  3:21                                           ` H. Peter Anvin
2023-01-01  3:31                                             ` H. Peter Anvin
2023-01-02  6:01                                               ` Borislav Petkov
2023-01-02  6:17                                                 ` Borislav Petkov
2023-01-02  9:32                                                   ` Ard Biesheuvel
2023-01-02 13:36                                                     ` Borislav Petkov
2023-01-02 15:03                                                       ` Ard Biesheuvel
2023-01-02  5:50                                             ` Borislav Petkov
2023-01-01  4:33                                         ` H. Peter Anvin
2023-01-01  4:55                                           ` Mika Penttilä
2023-01-01  5:13                                             ` H. Peter Anvin
2022-12-30 15:59             ` Jason A. Donenfeld
2022-12-30 16:21               ` Jason A. Donenfeld
2022-12-30 19:13               ` H. Peter Anvin [this message]
2022-12-31  9:48               ` Borislav Petkov
2022-12-31 12:54                 ` Jason A. Donenfeld
2022-12-31 13:35                   ` Borislav Petkov
2022-12-31 13:42                     ` Jason A. Donenfeld
2022-12-30 18:30 ` Jason A. Donenfeld

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6C1D0560-6D77-4733-9B8D-5184935AEC62@zytor.com \
    --to=hpa@zytor.com \
    --cc=Jason@zx2c4.com \
    --cc=ardb@kernel.org \
    --cc=bp@alien8.de \
    --cc=ebiggers@kernel.org \
    --cc=kraxel@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.