Re: [PATCH 2/6] x86/boot: Move compressed kernel to end of decompression buffer

From: Kees Cook <keescook@chromium.org>
To: Ingo Molnar <mingo@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	Baoquan He <bhe@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Borislav Petkov <bp@alien8.de>, Vivek Goyal <vgoyal@redhat.com>,
	Andy Lutomirski <luto@kernel.org>,
	Lasse Collin <lasse.collin@tukaani.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Young <dyoung@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/6] x86/boot: Move compressed kernel to end of decompression buffer
Date: Fri, 29 Apr 2016 00:48:54 -0700
Message-ID: <CAGXu5jLyeaUV_Pe7d-GoTdb+PxsTynkGAw+MzK3xd+_gCqbnvg@mail.gmail.com>
In-Reply-To: <20160429071805.GC28320@gmail.com>

On Fri, Apr 29, 2016 at 12:18 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Kees Cook <keescook@chromium.org> wrote:
>
>> From: Yinghai Lu <yinghai@kernel.org>
>>
>> This change makes later calculations about where the kernel is located
>> easier to reason about. To better understand this change, we must first
>> clarify what VO and ZO are. They were introduced in commits by hpa:
>>
>> 77d1a49 x86, boot: make symbols from the main vmlinux available
>> 37ba7ab x86, boot: make kernel_alignment adjustable; new bzImage fields
>>
>> Specifically:
>>
>> VO:
>> - uncompressed kernel image
>> - size: VO__end - VO__text ("VO_INIT_SIZE" define)
>>
>> ZO:
>> - bootable compressed kernel image (boot/compressed/vmlinux)
>> - head text + compressed kernel (VO and relocs table) + decompressor code
>> - size: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)
>>
>> The INIT_SIZE definition is used to find the larger of the two image sizes:
>>
>>  #define ZO_INIT_SIZE    (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
>>  #define VO_INIT_SIZE    (VO__end - VO__text)
>>  #if ZO_INIT_SIZE > VO_INIT_SIZE
>>  #define INIT_SIZE ZO_INIT_SIZE
>>  #else
>>  #define INIT_SIZE VO_INIT_SIZE
>>  #endif
>>
>> The current code uses extract_offset to decide where to position the
>> copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
>> currently includes the extract_offset.)
>
> Yeah, so I rewrote the above to:
>
> =================>
> This change makes later calculations about where the kernel is located
> easier to reason about. To better understand this change, we must first
> clarify what 'VO' and 'ZO' are. These values were introduced in commits
> by hpa:
>
>   77d1a4999502 ("x86, boot: make symbols from the main vmlinux available")
>   37ba7ab5e33c ("x86, boot: make kernel_alignment adjustable; new bzImage fields")
>
> Specifically:
>
> All names prefixed with 'VO_':
>
>  - relate to the uncompressed kernel image
>
>  - the size of the VO image is: VO__end-VO__text ("VO_INIT_SIZE" define)
>
> All names prefixed with 'ZO_':
>
>  - relate to the bootable compressed kernel image (boot/compressed/vmlinux),
>    which is composed of the following memory areas:
>      - head text
>      - compressed kernel (VO image and relocs table)
>      - decompressor code
>
>  - the size of the ZO image is: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)
>
> The 'INIT_SIZE' value is used to find the larger of the two image sizes:
>
>  #define ZO_INIT_SIZE    (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
>  #define VO_INIT_SIZE    (VO__end - VO__text)
>
>  #if ZO_INIT_SIZE > VO_INIT_SIZE
>  # define INIT_SIZE ZO_INIT_SIZE
>  #else
>  # define INIT_SIZE VO_INIT_SIZE
>  #endif
>
> The current code uses extract_offset to decide where to position the
> copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
> currently includes the extract_offset.)
> <=================
>
> Assuming the edits I made are correct, this is the point where the changelog lost
> me. It does not explain why ZO_z_extract_offset exists. Why isn't the ZO copied to
> offset 0?
>
> I had to go into arch/x86/boot/compressed/mkpiggy.c, where ZO_z_extract_offset is
> generated, to find the answer: it's needed because we are trying to minimize the
> amount of RAM used for the whole act of creating an uncompressed, executable,
> properly relocation-linked kernel image in system memory. We do this so that
> kernels can be booted on even very small systems.
>
> To achieve the goal of minimal memory consumption we have implemented an in-place
> decompression strategy: instead of cleanly separating the VO and ZO images and
> also allocating some memory for the decompression code's runtime needs, we instead
> create this elaborate layout of memory buffers where the output (decompressed)
> stream, as it progresses, overlaps with and destroys the input (compressed)
> stream. This can only be done safely if the ZO image is placed at the end of the
> VO range, plus a certain amount of safety distance to make sure that when the last
> bytes of the VO range are decompressed, the compressed stream pointer is safely
> beyond the end of the VO range. Correct?
>
> This is a very essential central concept to the whole code, but nowhere is it
> described clearly!

That would certainly be worth calling out in the description, true.
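
To make the overlap constraint concrete, here is a toy sketch in C. It
is not the kernel's decompressor: the run-length format, the function
name inplace_rle_decode() and the buffer sizes are all made up for
illustration. It decodes from the tail of a buffer into its head and
refuses any write that would land on input that has not been read yet,
which is the condition the safety distance has to guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy run-length format (hypothetical, for illustration only): the input
 * is a sequence of (count, value) byte pairs, each expanding to `count`
 * copies of `value`.
 *
 * Decode in place: the input lives at buf[in_off .. in_off + in_len) and
 * the output is written from buf[0] upward. Returns the number of bytes
 * written, or -1 if a write would not fit in the buffer or would clobber
 * input that has not been read yet.
 */
static long inplace_rle_decode(unsigned char *buf, size_t buf_size,
                               size_t in_off, size_t in_len)
{
    size_t in = in_off;            /* read pointer (compressed stream) */
    size_t end = in_off + in_len;  /* one past the last input byte */
    size_t out = 0;                /* write pointer (decompressed stream) */

    assert(end <= buf_size && in_len % 2 == 0);

    while (in < end) {
        unsigned char count = buf[in];
        unsigned char value = buf[in + 1];

        in += 2;  /* this pair is consumed and may now be overwritten */

        if (out + count > buf_size)
            return -1;  /* output does not fit in the buffer */
        if (in < end && out + count > in)
            return -1;  /* write pointer would pass the read pointer */

        memset(buf + out, value, count);
        out += count;
    }
    return (long)out;
}
```

With the 6-byte stream {4,'A', 1,'B', 1,'C'}, placing the input at
offset 0 of a 6-byte buffer is rejected on the first write, while
placing it at offset 2 of an 8-byte buffer (output size plus a 2-byte
margin, input flush against the end) decodes to "AAAABC".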

> But more importantly, especially in view of address space randomization, we should
> realize that the days of 8 MB i386-DX systems are gone, and we should get rid of
> all this crazy obfuscation that is hindering development in this area. I also
> suspect that the actual temporary allocation size reduction savings from this
> trick are relatively small, compared to the resulting total memory size.
>
> So my suggestion: let's just cleanly separate all the data areas and not try to do
> any clever overlapping: the benefit will be minimal, and any system that has main
> RAM less than twice the VO+ZO image sizes is fundamentally unbootable and
> unusable anyway.
>
> I.e. have a really clean size calculation of:
>
>         ZO + VO + decompressor-stacks-size + decompressor-data-size
>
> and decompress accordingly without tricks, without overlaps, without any chance
> for corruption - and, most importantly, without this metric ton of obfuscation
> that very few people have managed to fight their way through in the last couple of
> years, and which hinders essential features ...
>
> Agreed?

I don't agree. We do still have embedded systems running x86 kernels,
and we have cases where we're running multiple kernels in memory (like
kdump). I think the memory savings are worth the complexity, especially
since the complexity is being reduced by this patch. But that's not
all:

If we moved the compressed kernel after the buffer, all we'd accomplish
is taking up more memory. We'd still have the head_*.S complexity of
handling the relocation and handling the copy, we'd still have the
extraction, etc, etc. The only change would literally be replacing
extract_offset with INIT_SIZE. Everything else would be the same.
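
As a rough illustration of the footprint difference, here is a small
sketch in C. The first helper mirrors the INIT_SIZE definition quoted
above. The second models the non-overlapping layout by placing ZO just
past the VO range, which is my simplification of "changing
extract_offset to INIT_SIZE". The function names are hypothetical, and
the sketch ignores the decompressor's stack and heap.

```c
#include <stddef.h>

/*
 * Mirrors the quoted INIT_SIZE definition: the buffer must hold whichever
 * is larger, the copied ZO (which starts at extract_offset) or the VO.
 */
static size_t init_size_inplace(size_t zo_size, size_t vo_size,
                                size_t extract_offset)
{
    size_t zo_init = zo_size + extract_offset;

    return zo_init > vo_size ? zo_init : vo_size;
}

/*
 * Hypothetical non-overlapping layout: ZO is copied just past the VO
 * range, so the buffer grows to vo_size + zo_size.
 */
static size_t init_size_separate(size_t zo_size, size_t vo_size)
{
    return init_size_inplace(zo_size, vo_size, vo_size);
}
```

With made-up sizes of 24 MiB for VO, 6 MiB for ZO and an extract_offset
of 19 MiB, the in-place layout needs 25 MiB and the separated layout
needs 30 MiB; the copy and extraction steps are the same in both.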

If we moved the decompressed kernel after the compressed kernel,
(ignoring KASLR for a moment) then we'd end up in a confusing
situation where the kernel would be running somewhere other than where
the boot loader asked it to load. I don't even want to think about the
weird bug reports we might get from a change like that from old or
weird loaders.

This patch gets us a more reasonable layout with less complexity and
no change to the memory footprint, all without changing the
expectations of the boot loader. I really think this should stand.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security