It was <2020-06-01 Mon 19:25>, when Russell King - ARM Linux admin wrote:
> On Mon, Jun 01, 2020 at 06:19:52PM +0200, Lukasz Stelmach wrote:
>> It was <2020-06-01 Mon 15:55>, when Russell King - ARM Linux admin wrote:
>> > On Mon, Jun 01, 2020 at 04:27:52PM +0200, Łukasz Stelmach wrote:
>> >> Add DCSZ tag which holds dynamic memory (stack, bss, malloc pool)
>> >> requirements of the decompressor code.
>> >
>> > Why do we need to know the stack and BSS size, when the userspace
>> > kexec tool doesn't need to know this to perform the same function?
>> 
>> 
>> kexec-tools loads zImage as low in DRAM as possible and relies on two
>> assumptions:
>> 
>>     + the zImage will copy itself to make enough room for the kernel,
>>     + sizeof(zImage+mem) < sizeof(kernel+mem), which is true because
>>       of compression.
>> 
>>        DRAM start
>>        + 0x8000
>> 
>> zImage    |-----------|-----|-------|
>>             text+data   bss   stack 
>> 
>>                  text+data           bss   
>> kernel    |---------------------|-------------------|
>> 
>> 
>> initrd                                              |-initrd-|-dtb-|
>
> This is actually incorrect, because the decompressor will relocate
> itself to avoid the area that it is going to decompress
> into.

I described the state right after kexec(8) invocation.

> So, while the decompressor runs, in the above situation it
> ends up as:
>
>
> ram    |------------------------------------------------------...
>                  text+data           bss   
> kernel    |---------------------|-------------------|
> zImage                          |-----------|-----|-------|
>                                   text+data   bss   stack+malloc

And I am trying to achieve this state before the decompressor starts, so
it won't need to copy itself during boot. The only exception is (as we
discussed under a different patch) when edata_size >= 128-eps MiB, because
loading zImage above 128 MiB prevents it from properly detecting
physical memory. In such an unlikely case my code behaves like kexec-tools
and loads zImage low. That is why I suggested that passing detailed
information about the memory layout to the decompressor would help.
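
To make the placement rule above concrete, here is a rough sketch, not
the code from the patch. The helper name, the LOW_LOAD_OFFSET constant
(taken from the "DRAM start + 0x8000" diagram) and the reading of "eps"
as the whole zImage footprint (image plus BSS, stack and malloc pool)
are my assumptions for illustration only.

```c
#include <stdint.h>

#define SZ_128M         (128ull << 20)
#define LOW_LOAD_OFFSET 0x8000ull /* assumed: "DRAM start + 0x8000" */

/*
 * Hypothetical helper: return the offset from the start of DRAM at
 * which zImage should be loaded.
 *
 * Preferred spot: right behind where the decompressed kernel's
 * text+data will end, so the decompressor need not copy itself.
 * Fallback: load low (as kexec-tools does) if zImage would otherwise
 * reach beyond the first 128 MiB.
 */
static uint64_t zimage_load_offset(uint64_t text_offset,
				   uint64_t edata_size,
				   uint64_t zimage_footprint)
{
	uint64_t high = text_offset + edata_size;

	/* Assumption: "eps" is the full zImage footprint. */
	if (high + zimage_footprint > SZ_128M)
		return LOW_LOAD_OFFSET;

	return high;
}
```

With an 8 MiB edata_size and a 6 MiB footprint this picks
TEXT_OFFSET + 8 MiB; with a 127 MiB edata_size it falls back to the
low address.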

> Where "text+data" is actually smaller than the image size that
> was loaded - the part of the image that performs the relocation
> is discarded (the first chunk of code up to "restart" - 200
> bytes.)  The BSS is typically smaller than 200 bytes, so we've
> been able to get away without knowing the actual BSS size so
> far.
>
>
> ram    |--|-----------------------------------------|---------...
>        |<>| TEXT_OFFSET
> kernel    |---------------------|-------------------|
>           |<----edata_size----->|<-----bss_size---->|
>           |<---------------kernel_size------------->|
> zImage                          |-----------|-----|-------|
>                                 |<-------len------->| (initial)
>                                 |<----------len------------>| (final)
>
> The "final" len value is what the decompressor prints as the "zImage
> requires" debugging value.
>
> Hence, the size that the decompressed kernel requires is kernel_size.
>
> The size that the decompressor requires is edata_size + len(final).
>
> Now, if you intend to load the kernel to ram + TEXT_OFFSET + edata_size
> then it isn't going to lose the first 200 bytes of code, so as you
> correctly point out, we need to know the BSS size.
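
The size arithmetic quoted above can be written down as a small sketch.
The struct and function names are made up for this mail; only the field
meanings follow the diagram. Taking the larger of the two sizes as the
area to keep free is my assumption, not something stated above.

```c
#include <stdint.h>

/* Hypothetical container for the sizes named in the diagram. */
struct boot_layout {
	uint64_t edata_size; /* decompressed kernel text+data */
	uint64_t bss_size;   /* decompressed kernel BSS */
	uint64_t zimage_len; /* "final" len: zImage + its BSS, stack, malloc */
};

/* Space the decompressed kernel requires: edata_size + bss_size. */
static uint64_t kernel_size(const struct boot_layout *l)
{
	return l->edata_size + l->bss_size;
}

/* Space the decompressor requires: edata_size + len(final). */
static uint64_t decompressor_size(const struct boot_layout *l)
{
	return l->edata_size + l->zimage_len;
}

/* Assumption: the area to keep free is the larger of the two. */
static uint64_t reserve_size(const struct boot_layout *l)
{
	uint64_t k = kernel_size(l);
	uint64_t d = decompressor_size(l);

	return k > d ? k : d;
}
```

All three values are measured from ram + TEXT_OFFSET.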

A note on terminology: can we keep the terms zImage and kernel separate,
where zImage is what is loaded with kexec and kernel is the decompressed
code loaded at TEXT_OFFSET? I believe it will help us avoid mistakes.

>> >> +struct arm_zimage_tag_dc {
>> >> +	struct tag_header hdr;
>> >> +	union {
>> >> +#define ZIMAGE_TAG_DECOMP_SIZE ARM_ZIMAGE_MAGIC4
>> >> +		struct zimage_decomp_size {
>> >> +			__le32 bss_size;
>> >> +			__le32 stack_size;
>> >> +			__le32 malloc_size;
>> >> +		} decomp_size;
>
> You certainly don't need to know all this.  All you need to know is
> how much space the decompressor requires after the end of the image,
> encompassing the BSS size, stack size and malloc size, which is one
> value.

I agree. However, since we are not fighting here for every single byte,
I'd rather add them as separate values and make the tag, even if only
slightly, more future-proof.
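
Even with three separate fields, a loader that only wants the single
value you describe can sum them. A minimal sketch, assuming the tag
payload is exactly the three consecutive __le32 fields from the quoted
struct (bss_size, stack_size, malloc_size); get_le32() and
decomp_extra_size() are hypothetical names, not part of the patch:

```c
#include <stdint.h>

/* Read a little-endian 32-bit value regardless of host endianness. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Space the decompressor needs after the end of the image.
 * Assumed payload layout: bss_size, stack_size, malloc_size.
 * The sum is 64-bit so three 32-bit fields cannot overflow it.
 */
static uint64_t decomp_extra_size(const uint8_t payload[12])
{
	uint64_t bss    = get_le32(payload);
	uint64_t stack  = get_le32(payload + 4);
	uint64_t malloc_pool = get_le32(payload + 8);

	return bss + stack + malloc_pool;
}
```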

Kind regards,
-- 
Łukasz Stelmach
Samsung R&D Institute Poland
Samsung Electronics