Re: [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images

From: David Edmondson <dme@dme.org>
To: "Alex Bennée" <alex.bennee@linaro.org>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>
Cc: "Kevin Wolf" <kwolf@redhat.com>,
	"Peter Maydell" <peter.maydell@linaro.org>,
	qemu-block@nongnu.org, "Michael S. Tsirkin" <mst@redhat.com>,
	"Xu Yandong" <xuyandong2@huawei.com>,
	qemu-devel@nongnu.org, "Markus Armbruster" <armbru@redhat.com>,
	"Shannon Zhao" <shannon.zhaosl@gmail.com>,
	"Zheng Xiang" <zhengxiang9@huawei.com>,
	qemu-arm@nongnu.org, "haibinzhang(张海斌)" <haibinzhang@tencent.com>,
	"Mihai Carabas" <mihai.carabas@oracle.com>,
	"Igor Mammedov" <imammedo@redhat.com>,
	"Max Reitz" <mreitz@redhat.com>, "John Snow" <jsnow@redhat.com>
Subject: Re: [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images
Date: Mon, 07 Dec 2020 12:07:40 +0000	[thread overview]
Message-ID: <cun5z5du8dv.fsf@zarquon.hh.sledj.net> (raw)
In-Reply-To: <87ft55vad4.fsf@linaro.org>

Sorry for the delay in replying - holiday then distractions.

On Thursday, 2020-11-19 at 11:45:11 GMT, Alex Bennée wrote:

> Philippe Mathieu-Daudé <philmd@redhat.com> writes:
>
>> On 11/16/20 2:48 PM, Markus Armbruster wrote:
>>> Philippe Mathieu-Daudé <philmd@redhat.com> writes:
>>> 
>>>> Hi David,
>>>>
>>>> On 11/16/20 11:42 AM, David Edmondson wrote:
>>>>> Currently ARM UEFI images are typically built as 2MB/768kB flash
>>>>> images for code and variables respectively. These images are both then
>>>>> padded out to 64MB before being loaded by QEMU.
>>>>>
>>>>> Because the images are 64MB each, QEMU allocates 128MB of memory to
>>>>> read them, and then proceeds to read all 128MB from disk (dirtying the
>>>>> memory). Of this 128MB less than 3MB is useful - the rest is zero
>>>>> padding.
>>>>
>>>> 2 years ago I commented the same problem, and suggested to access the
>>>> underlying storage by "block", as this is a "block storage".
>>>>
>>>> Back then the response was "do not try to fix something that works".
>>>> This is why we choose the big hammer "do not accept image size not
>>>> matching device size" way.
>>>>
>>>> While your series seems to help, it only postpone the same
>>>> implementation problem. If what you want is use the least memory
>>>> required, I still believe accessing the device by block is the
>>>> best approach.
>>> 
>>> "Do not try to fix something that works" is hard to disagree with.
>>> However, at least some users seem to disagree with "this works".  Enough
>>> to overcome the resistance to change?
>>
>> Yeah, at least 3 big users so far:
>>
>> - Huawei
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg607292.html
>> - Tencent
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg742066.html
>> - Oracle
>> (this series).
>>
>> Then Huawei tried the MicroVM approach:
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg680103.html
>>
>> I simply wanted to save David time by remembering this other approach,
>> since Peter already commented on David's one (see Huawei link).
>
> IIRC the two questions that came up were:
>
>   - what would reading memory not covered by a file look like (0's or
>     something more like real HW, 7f?).
>
>   - what happens when the guest writes beyond the bounds of a backing
>     file?

In the non-backward-compatible-case (pre virt-5.2 in the patches, but
obviously not actually in 5.2, where the memory range is declared to the
guest as the extent of the file) neither of these issues should arise.

Reading/writing outside of the declared range should generate a fault,
much like any other non-existent region.

In the backward-compatible case the patches are obviously flawed - a
write outside of the extent of the backing file would have nowhere to
go.

In order to figure out how to move forward, I think that the summary of
the previous conversations is that either:
- it's not possible to reduce the size of the memory region declared to
  the guest from 64M+64M because of $reasons, or
- we don't want to reduce the size of the memory region declared to the
  guest from 64M+64M because of $reasons.

If that's right, I'd be curious to better understand $reasons, but have
no basis on which to argue.

Presuming that it's not acceptable to reduce the declared extent of the
flash regions, I will probably look to implement Philippe's suggestion:

> If what you want is use the least memory required, I still believe
> accessing the device by block is the best approach.

...which I interpret to mean that the pflash drivers should not allocate
memory and read the images (later writing back modified blocks), but
handle each IO request from the caller by reading and writing blocks
from the underlying device as necessary.

If this interpretation is wrong, please let me know :-)

Looking at doing that, it seems that a new variant of
memory_region_init_rom_device() that does not allocate memory will be
required, unless there anything already available that performs a
similar function?

Booting a VM with AAVMF and writing some variables to the flash doesn't
exercise much of the pflash functionality. In particular it would seem
like the right time to fix the batch write omission in the current
code. Is there another consumer of the pflash drivers that is likely to
exercise more? (I can write tests that run in the guest, of course,
presuming that the Linux driver can be pushed into using those paths.)

> I'm guessing for these cloudy type applications no one cares about
> persistence of EFI variables? Maybe we just need a formulation for the
> second pflash which is explicit about writes being ephemeral while also
> being accepted?

Our current deployments on x86 do not have writable flash for VARS,
though I believe that was problematic on arm64 until a recent
change. It's my suspicion that assuming we can continue to present
read-only VARS generally is going to be proven wrong before too long.

dme.
-- 
I'm not living in the real world, no more, no more.