Re: [PATCH RFC 1/2] x86/boot: Introduce the setup_header2

From: Daniel Kiper <daniel.kiper@oracle.com>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
	dpsmith@apertussolutions.com, eric.snowberg@oracle.com,
	kanth.ghatraju@oracle.com, konrad.wilk@oracle.com,
	ross.philipson@oracle.com
Subject: Re: [PATCH RFC 1/2] x86/boot: Introduce the setup_header2
Date: Fri, 14 Jun 2019 13:06:50 +0200	[thread overview]
Message-ID: <20190614110650.ytg7dvyimstu2lis@tomti.i.net-space.pl> (raw)
In-Reply-To: <95fd235b-b4e5-c547-3625-b23ef66c5d4f@zytor.com>

I am working on new version of patches but I have some concerns. Please
look below for more details...

On Thu, Jun 06, 2019 at 03:06:30PM -0700, H. Peter Anvin wrote:
> On 5/24/19 2:55 AM, Daniel Kiper wrote:
> > Due to limited space left in the setup header it was decided to
> > introduce the setup_header2. Its role is to communicate Linux kernel
> > supported features to the boot loader. Starting from now this is the
> > primary way to communicate things to the boot loader.
> >
> > Suggested-by: H. Peter Anvin <hpa@zytor.com>
> > Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
> > Reviewed-by: Ross Philipson <ross.philipson@oracle.com>
> > Reviewed-by: Eric Snowberg <eric.snowberg@oracle.com>
> > ---
> > I know that setup_header2 is not the best name. There were some
> > alternatives proposed like setup_header_extra, setup_header_addendum,
> > setup_header_more, ext_setup_header, extended_setup_header, extended_header
> > and extended_setup. Sadly, I am not happy with any of them. So,
> > leaving setup_header2 as is but still looking for better name.
> > Probably shorter == better...
>
> I would say kernel_info. The relationships between the headers are analogous
> to the various data sections:
>
> 	setup_header		= .data
> 	boot_params/setup_data	= .bss
>
> What is missing from the above list? That's right:
>
> 	kernel_info		= .rodata
>
> We have been (ab)using .data for things that could go into .rodata or .bss for
> a long time, for lack of alternatives and -- especially early on -- intertia.
> Also, the BIOS stub is responsible for creating boot_params, so it isn't
> available to a BIOS-based loader (setup_data is, though.)
>
> setup_header is permanently limited to 144 bytes due to the reach of the
> 2-byte jump field, which doubles as a length field for the structure, combined
> with the size of the "hole" in struct boot_params that a protected-mode loader
> or the BIOS stub has to copy it into. It is currently 119 bytes long, which
> leaves us with 25 very precious bytes. This isn't something that can be fixed
> without revising the boot protocol entirely, breaking backwards compatibility.
>
> boot_params proper is limited to 4096 bytes, but can be arbitrarily extended
> by adding setup_data entries. It cannot be used to communicate properties of
> the kernel image, because it is .bss and has no image-provided content.
>
> kernel_info solves this by providing an extensible place for information about
> the kernel image. It is readonly, because the kernel cannot rely on a
> bootloader copying its contents anywhere, but that is OK; if it becomes
> necessary it can still contain data items that an enabled bootloader would be
> expected to copy into a setup_data chunk.
>
> ^ The above or some variant thereof may be a good thing to put both in your
> patch comments as well as in the boot protocol documentation.

Will do...

> While we are making a change that bumps the version number anyway, there is
> another change I would like to make to the boot protocol which we might as
> well do at the same time. setup_data is a bit awkward to use for extremely
> large data objects, both because the setup_data header has to be adjacent to
> the data object, and because it has a 32-bit length field. However, it is
> important that intermediate stages of the boot process have a way to identify
> which chunks of memory are occupied by kernel data.

Is not possible to "identify which chunks of memory are occupied by
kernel data" today? I think that it is. So, this does not look like
a valid point for change. Am I missing something?

> Thus I think we should introduce a uniform way to specify such indirect data.
> We define a new setup_data type we can maybe call SETUP_INDIRECT; a
> SETUP_INDIRECT data item would be an array of structures of the form:

OK.

> struct setup_indirect {
> 	__u32 type;
> 	__u32 reserved;	/* Reserved, must be set to zero */
> 	__u64 len;
> 	__u64 addr;
> };
>
> ... where type is itself simply a SETUP_* type -- although we probably don't
> want to let it be SETUP_INDIRECT itself since making it a tree structure could
> require a lot of stack space in something that needs to parse it, and stack
> space can be limited in boot contexts.

Yeah...

> This would be particularly useful for having SETUP_INITRAMFS, if it becomes
> desirable to allow the kernel to parse a non-contiguous set of memory regions
> for the initramfs.

OK.

> It might be a good idea to immediately start out struct kernel_info with
> either a high mark or a bitmask of SETUP_* types that the kernel supports. A

High mark seems better for me here.

> bitmask would be more flexible, but would need provisions to be grown in the
> future.

Yep.

Anyway, I agree with the idea but I am not sure it makes sense to
introduce something which does not have users right now. Does it?

> Which leads me to yet another thought.
>
> We probably want to make the contents of kernel_info a bit more structured to
> allow for content that may need to be extended in the future, or is inherently
> variable length (like strings.)
>
> This would lend itself to a structure such as:
>
> 	- Magic number
> 	- Length of total structure

My first proposal have both fields...

> ... followed by a list of data chunks, each prefixed by a length field. The
> first data chunk would be the main (root) structure; other data structures are

I am not sure that I understand the idea of the main (root) structure.
Should it point to itself?

> pointed to from the root structure using offsets from the beginning of the
> structure (the magic number field.)

Sounds nice but I think that it is an overkill in many simple cases,
e.g. look at MLE entry point. In case of MLE entry point we do not need
nothing fancy. So, I think that this filed should be a member of the
kernel_info which points directly to new MLE entry point. I can agree
that we can use the mechanism mentioned by you above in more complicated
cases and it can be described in Documentation/x86/boot.rst. But I would
not enforce it for everything. Just for the sake of simplicity.

> As an implementation detail, strings can of course be "pooled" into a single
> data chunk as long as they are zero-terminated.

OK.

> I have intentionally avoided specifying a type field for each data chunk;
> history shows that it is generally a bad idea to have multiple ways to derive
> the same information, as different implementations will do it differently,
> resulting in bugs when things change.

Make sense for me.

So, it seems to me that we have to have at least two patches:
  - one introducing the kernel_info structure,
  - another introducing the setup_indirect and bumping the
    boot protocol version number.

This thing has at least one cons: first patch introduces a new boot
protocol feature but it is not reflected in the protocol version change.
Not perfect but I do not think that we should bump the version number in
first patch. do you have better idea?

Daniel