From: "Philippe Mathieu-Daudé" <philmd@redhat.com>
To: David Edmondson <david.edmondson@oracle.com>, qemu-block@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>,
	qemu-devel@nongnu.org, Max Reitz <mreitz@redhat.com>
Subject: Re: [RFC PATCH 0/3] hw/pflash_cfi01: Reduce memory consumption when flash image is smaller than region
Date: Tue, 16 Feb 2021 16:03:05 +0100
Message-ID: <df4db595-c2db-4fa8-0a4b-1403117dcc76@redhat.com>
In-Reply-To: <20210216142721.1985543-1-david.edmondson@oracle.com>

On 2/16/21 3:27 PM, David Edmondson wrote:
> As described in
> https://lore.kernel.org/r/20201116104216.439650-1-david.edmondson@oracle.com,
> I'd like to reduce the amount of memory consumed by QEMU mapping UEFI
> images on aarch64.
> 
> To recap:
> 
>> Currently ARM UEFI images are typically built as 2MB/768kB flash
>> images for code and variables respectively. These images are both
>> then padded out to 64MB before being loaded by QEMU.
>>
>> Because the images are 64MB each, QEMU allocates 128MB of memory to
>> read them, and then proceeds to read all 128MB from disk (dirtying
>> the memory). Of this 128MB less than 3MB is useful - the rest is
>> zero padding.
>>
>> On a machine with 100 VMs this wastes over 12GB of memory.
> 
> There were objections to my previous patch because it changed the size
> of the regions reported to the guest via the memory map (the reported
> size depended on the size of the image).
> 
> This is a smaller patch which only helps with read-only flash images:
> it changes the memory region that covers the entire flash area to be
> IO rather than RAM, and loads the flash image into a smaller
> sub-region of the more traditional mixed IO/ROMD type.
> 
> All read/write operations to areas outside of the underlying block
> device are handled directly (reads return 0 and writes fail, which is
> okay because this path only supports read-only devices).
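
A minimal sketch of that layout, using the QEMU memory API (the region
names, the 64 MiB/2 MiB sizes and the padding ops below are illustrative
assumptions, not code taken from the patch):

/* Sketch only: a 64 MiB pure-IO container with a small ROMD
 * sub-region that holds the real image. */
#include "qemu/osdep.h"
#include "qemu/units.h"
#include "exec/memory.h"
#include "qapi/error.h"

typedef struct {
    MemoryRegion container;   /* 64 MiB pure-IO region */
    MemoryRegion image;       /* small ROMD region with the real data */
} PFlashSketch;

static uint64_t pad_read(void *opaque, hwaddr addr, unsigned size)
{
    return 0;    /* reads outside the backing image see zeroes */
}

static void pad_write(void *opaque, hwaddr addr, uint64_t val,
                      unsigned size)
{
    /* writes outside the image are dropped; acceptable because this
     * path only supports read-only devices */
}

static const MemoryRegionOps pad_ops = {
    .read = pad_read,
    .write = pad_write,
    .endianness = DEVICE_NATIVE_ENDIAN,
};

static void sketch_map(PFlashSketch *s, Object *owner,
                       const MemoryRegionOps *romd_ops,
                       uint64_t image_size)
{
    /* The region covering the whole flash area is IO, not RAM... */
    memory_region_init_io(&s->container, owner, &pad_ops, s,
                          "pflash.padding", 64 * MiB);
    /* ...and only the real image gets a mixed IO/ROMD sub-region. */
    memory_region_init_rom_device(&s->image, owner, romd_ops, s,
                                  "pflash.image", image_size,
                                  &error_fatal);
    memory_region_add_subregion(&s->container, 0, &s->image);
}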
> 
> This reduces the memory consumption for the read-only AAVMF code image
> from 64MB to around 2MB (presuming that the UEFI image is adjusted
> accordingly). It does nothing to improve the memory consumption caused
> by the read-write AAVMF vars image.

So for each VM this changes from 64 + 64 to 2 + 64 MiB.

100 VMs now use roughly 6.6GB instead of 12.8GB. Quite an improvement
already :)

> There was a suggestion in a previous thread that perhaps the pflash
> driver could be re-worked to use the block IO interfaces to access the
> underlying device "on demand" rather than reading in the entire image
> at startup (at least, that's how I understood the comment).
> 
> I looked at implementing this and struggled to get it to work for all
> of the required use cases. Specifically, there are several code paths
> that expect to retrieve a pointer to the flat memory image of the
> pflash device and manipulate it directly (examples include the Malta
> board and encrypted memory support on x86), or write the entire image
> to storage (after migration).
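
For concreteness, such a caller looks roughly like this (a sketch
modelled on how the Malta board patches its generated boot code into
the flash image; the names here are illustrative):

/* Once a caller holds a raw pointer into the flash image, the pflash
 * model can no longer intercept the accesses, which is what rules out
 * a purely on-demand IO implementation for these paths. */
#include "qemu/osdep.h"
#include "exec/memory.h"
#include "hw/block/flash.h"

static void patch_boot_code(PFlashCFI01 *flash_dev,
                            const void *boot_code, size_t boot_code_size)
{
    MemoryRegion *flash_mem = pflash_cfi01_get_memory(flash_dev);
    uint8_t *base = memory_region_get_ram_ptr(flash_mem);

    memcpy(base, boot_code, boot_code_size);   /* direct manipulation */
}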

IIUC these are specific uses when the machine is paused. For Malta we
can map a ROM instead.

I don't know about encrypted x86 machines.

> My implementation was based around mapping the flash region only for
> IO, which meant that every read or write had to be handled directly by
> the pflash driver (there was no ROMD style operation), which also made
> booting an aarch64 VM noticeably slower - getting through the firmware
> went from under 1 second to around 10 seconds.
> 
> Improving the writeable device support requires some more general
> infrastructure, I think, but I'm not familiar with everything that
> QEMU currently provides, and would be very happy to learn otherwise.

I am not a block expert, but I wonder if something like this could
be used:

- create a raw (parent) block image of 64MiB

- add a raw (child) block with your 768kB VARS file

- add a null-co (child) block of 63MiB + 256KiB

- pass the parent block to the pflash device

Regards,

Phil.



Thread overview: 9+ messages
2021-02-16 14:27 [RFC PATCH 0/3] hw/pflash_cfi01: Reduce memory consumption when flash image is smaller than region David Edmondson
2021-02-16 14:27 ` [RFC PATCH 1/3] hw/pflash_cfi*: Replace DPRINTF with trace events David Edmondson
2021-02-16 14:27 ` [RFC PATCH 2/3] hw/pflash_cfi01: Correct the type of PFlashCFI01.ro David Edmondson
2021-02-16 14:27 ` [RFC PATCH 3/3] hw/pflash_cfi01: Allow read-only devices to have a smaller backing device David Edmondson
2021-02-16 15:03 ` Philippe Mathieu-Daudé [this message]
2021-02-16 15:22   ` [RFC PATCH 0/3] hw/pflash_cfi01: Reduce memory consumption when flash image is smaller than region David Edmondson
2021-02-16 15:44     ` Philippe Mathieu-Daudé
2021-02-16 15:53       ` David Edmondson
2021-02-18 10:34         ` David Edmondson
