linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Jason A. Donenfeld" <Jason@zx2c4.com>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: pbonzini@redhat.com, ebiggers@kernel.org, x86@kernel.org,
	linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
	ardb@kernel.org, kraxel@redhat.com, bp@alien8.de,
	philmd@linaro.org
Subject: Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data
Date: Fri, 30 Dec 2022 16:59:30 +0100	[thread overview]
Message-ID: <Y68K4mPuz6edQkCX@zx2c4.com> (raw)
In-Reply-To: <AF921575-0968-434A-8B46-095B78C209C1@zytor.com>

Hi,

On Wed, Dec 28, 2022 at 11:31:34PM -0800, H. Peter Anvin wrote:
> On December 28, 2022 6:31:07 PM PST, "Jason A. Donenfeld" <Jason@zx2c4.com> wrote:
> >Hi,
> >
> >Read this message in a fixed width text editor with a lot of columns.
> >
> >On Wed, Dec 28, 2022 at 03:58:12PM -0800, H. Peter Anvin wrote:
> >> Glad you asked.
> >> 
> >> So the kernel load addresses are parameterized in the kernel image
> >> setup header. One of the things that are so parameterized are the size
> >> and possible realignment of the kernel image in memory.
> >> 
> >> I'm very confused where you are getting the 64 MB number from. There
> >> should not be any such limitation.
> >
> >Currently, QEMU appends it to the kernel image, not to the initramfs as
> >you suggest below. So, that winds up looking, currently, like:
> >
> >          kernel image            setup_data
> >   |--------------------------||----------------|
> >0x100000                  0x100000+l1     0x100000+l1+l2
> >
> >The problem is that this decompresses to 0x1000000 (one more zero). So
> >if l1 is > (0x1000000-0x100000), then this winds up looking like:
> >
> >          kernel image            setup_data
> >   |--------------------------||----------------|
> >0x100000                  0x100000+l1     0x100000+l1+l2
> >
> >                                 d e c o m p r e s s e d   k e r n e l
> >		     |-------------------------------------------------------------|
> >                0x1000000                                                     0x1000000+l3 
> >
> >The decompressed kernel seemingly overwriting the compressed kernel
> >image isn't a problem, because that gets relocated to a higher address
> >early on in the boot process. setup_data, however, stays in the same
> >place, since those links are self referential and nothing fixes them up.
> >So the decompressed kernel clobbers it.
> >
> >The solution in this commit adds a bunch of padding between the kernel
> >image and setup_data to avoid this. That looks like this:
> >
> >          kernel image                            padding                               setup_data
> >   |--------------------------||---------------------------------------------------||----------------|
> >0x100000                  0x100000+l1                                         0x1000000+l3      0x1000000+l3+l2
> >
> >                                 d e c o m p r e s s e d   k e r n e l
> >		     |-------------------------------------------------------------|
> >                0x1000000                                                     0x1000000+l3 
> >
> >This way, the decompressed kernel doesn't clobber setup_data.
> >
> >The problem is that if 0x1000000+l3-0x100000 is around 62 megabytes,
> >then the bootloader crashes when trying to dereference setup_data's
> >->len param at the end of initialize_identity_maps() in ident_map_64.c.
> >I don't know why it does this. If I could remove the 62 megabyte
> >restriction, then I could keep with this technique and all would be
> >well.
> >
> >> In general, setup_data should be able to go anywhere the initrd can
> >> go, and so is subject to the same address cap (896 MB for old kernels,
> >> 4 GB on newer ones; this address too is enumerated in the header.)
> >
> >It would be theoretically possible to attach it to the initrd image
> >instead of to the kernel image. As a last resort, I guess I can look
> >into doing that. However, that's going to require some serious rework
> >and plumbing of a lot of different components. So if I can make it work
> >as is, that'd be ideal. However, I need to figure out this weird 62 meg
> >limitation.
> >
> >Any ideas on that?
> >
> >Jason
> 
> As far as a crash... that sounds like a big and a pretty serious one at that.
> 
> Could you let me know what kernel you are using and how *exactly* you are booting it?

I'll attach a .config file. Apply the patch at the top of this thread to
qemu, except make one modification:

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 628fd2b2e9..a61ee23e13 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1097,7 +1097,7 @@ void x86_load_linux(X86MachineState *x86ms,
 
             /* The early stage can't address past around 64 MB from the original
              * mapping, so just give up in that case. */
-            if (padded_size < 62 * 1024 * 1024)
+            if (true || padded_size < 62 * 1024 * 1024)
                 kernel_size = padded_size;
             else {
                 fprintf(stderr, "qemu: Kernel image too large to hold setup_data\n");

Then build qemu. Run it with `-kernel bzImage`, based on the kernel
built with the .config I attached.

You'll see that the CPU triple faults when hitting this line:

        sd = (struct setup_data *)boot_params->hdr.setup_data;
        while (sd) {
                unsigned long sd_addr = (unsigned long)sd;

                kernel_add_identity_map(sd_addr, sd_addr + sizeof(*sd) + sd->len);  <----
                sd = (struct setup_data *)sd->next;
        }

, because it dereferences *sd. This does not happen if the decompressed
size of the kernel is < 62 megs.

So that's the "big and pretty serious" bug that might be worthy of
investigation.

Jason

  parent reply	other threads:[~2022-12-30 15:59 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-28 14:38 [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data Jason A. Donenfeld
2022-12-28 16:02 ` Philippe Mathieu-Daudé
2022-12-28 16:30   ` Jason A. Donenfeld
2022-12-28 16:57     ` Jason A. Donenfeld
2022-12-28 23:58       ` H. Peter Anvin
2022-12-29  2:13         ` H. Peter Anvin
2022-12-29  2:31         ` Jason A. Donenfeld
2022-12-29  7:28           ` Philippe Mathieu-Daudé
2022-12-29  7:30           ` H. Peter Anvin
2022-12-29  7:31           ` H. Peter Anvin
2022-12-29 12:47             ` Borislav Petkov
2022-12-30 15:54               ` Jason A. Donenfeld
2022-12-30 17:01                 ` Borislav Petkov
2022-12-30 17:07                   ` Jason A. Donenfeld
2022-12-30 19:54                     ` Borislav Petkov
2022-12-30 21:58                       ` H. Peter Anvin
2022-12-30 22:10                         ` Jason A. Donenfeld
2022-12-31  1:06                           ` H. Peter Anvin
2022-12-31  1:14                             ` H. Peter Anvin
2022-12-31 12:55                             ` Jason A. Donenfeld
2022-12-31 13:40                             ` Borislav Petkov
2022-12-31 13:44                               ` Jason A. Donenfeld
2022-12-31 13:48                                 ` Borislav Petkov
2022-12-31 13:51                                   ` Jason A. Donenfeld
2022-12-31 14:24                                     ` Borislav Petkov
2022-12-31 18:22                                       ` Jason A. Donenfeld
2022-12-31 19:00                                         ` Borislav Petkov
2023-01-01  3:21                                           ` H. Peter Anvin
2023-01-01  3:31                                             ` H. Peter Anvin
2023-01-02  6:01                                               ` Borislav Petkov
2023-01-02  6:17                                                 ` Borislav Petkov
2023-01-02  9:32                                                   ` Ard Biesheuvel
2023-01-02 13:36                                                     ` Borislav Petkov
2023-01-02 15:03                                                       ` Ard Biesheuvel
2023-01-02  5:50                                             ` Borislav Petkov
2023-01-01  4:33                                         ` H. Peter Anvin
2023-01-01  4:55                                           ` Mika Penttilä
2023-01-01  5:13                                             ` H. Peter Anvin
2022-12-30 15:59             ` Jason A. Donenfeld [this message]
2022-12-30 16:21               ` Jason A. Donenfeld
2022-12-30 19:13               ` H. Peter Anvin
2022-12-31  9:48               ` Borislav Petkov
2022-12-31 12:54                 ` Jason A. Donenfeld
2022-12-31 13:35                   ` Borislav Petkov
2022-12-31 13:42                     ` Jason A. Donenfeld
2022-12-30 18:30 ` Jason A. Donenfeld

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y68K4mPuz6edQkCX@zx2c4.com \
    --to=jason@zx2c4.com \
    --cc=ardb@kernel.org \
    --cc=bp@alien8.de \
    --cc=ebiggers@kernel.org \
    --cc=hpa@zytor.com \
    --cc=kraxel@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).