All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images
@ 2020-11-16 10:42 David Edmondson
  2020-11-16 10:42 ` [RFC PATCH 1/5] hw/block: blk_check_size_and_read_all should report backend name David Edmondson
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: David Edmondson @ 2020-11-16 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Peter Maydell, qemu-block, Michael S. Tsirkin,
	Max Reitz, David Edmondson, Shannon Zhao, qemu-arm,
	Igor Mammedov, John Snow

Currently ARM UEFI images are typically built as 2MB/768kB flash
images for code and variables respectively. These images are both then
padded out to 64MB before being loaded by QEMU.

Because the images are 64MB each, QEMU allocates 128MB of memory to
read them, and then proceeds to read all 128MB from disk (dirtying the
memory). Of this 128MB less than 3MB is useful - the rest is zero
padding.

On a machine with 100 VMs this wastes over 12GB of memory.

This set of patches aims to reclaim the wasted memory by allowing QEMU
to respect the size of the flash images and allocate only the
necessary memory. This will, of course, require that the flash build
process be modified to avoid padding the images to 64MB.

Because existing machine types expect the full 128MB reserved for
flash to be occupied, do so for machine types older than virt-5.2. The
changes are beneficial even in this case, because while the full 128MB
of memory is allocated, only that required to actually load the flash
images from disk is touched.

David Edmondson (5):
  hw/block: blk_check_size_and_read_all should report backend name
  hw/block: Flash images can be smaller than the device
  hw/arm: Convert assertions about flash image size to error_report
  hw/arm: Flash image mapping follows image size
  hw/arm: Only minimise flash size on older machines

 hw/arm/trace-events      |  2 +
 hw/arm/virt-acpi-build.c | 30 ++++++++------
 hw/arm/virt.c            | 86 +++++++++++++++++++++++++++++-----------
 hw/block/block.c         | 26 ++++++------
 include/hw/arm/virt.h    |  2 +
 5 files changed, 97 insertions(+), 49 deletions(-)

-- 
2.28.0



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC PATCH 1/5] hw/block: blk_check_size_and_read_all should report backend name
  2020-11-16 10:42 [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images David Edmondson
@ 2020-11-16 10:42 ` David Edmondson
  2020-11-16 11:23   ` Philippe Mathieu-Daudé
  2020-11-19 11:56   ` Alex Bennée
  2020-11-16 10:42 ` [RFC PATCH 2/5] hw/block: Flash images can be smaller than the device David Edmondson
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: David Edmondson @ 2020-11-16 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Peter Maydell, qemu-block, Michael S. Tsirkin,
	Max Reitz, David Edmondson, Shannon Zhao, qemu-arm,
	Igor Mammedov, John Snow

If there are problems examining or reading data from the block
backend, the error messages should include an appropriate identifier
to assist in diagnoses.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
---
 hw/block/block.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/block/block.c b/hw/block/block.c
index 1e34573da7..8b284e1f14 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -20,9 +20,6 @@
  * BDRV_REQUEST_MAX_BYTES.
  * On success, return true.
  * On failure, store an error through @errp and return false.
- * Note that the error messages do not identify the block backend.
- * TODO Since callers don't either, this can result in confusing
- * errors.
  * This function not intended for actual block devices, which read on
  * demand.  It's for things like memory devices that (ab)use a block
  * backend to provide persistence.
@@ -32,17 +29,19 @@ bool blk_check_size_and_read_all(BlockBackend *blk, void *buf, hwaddr size,
 {
     int64_t blk_len;
     int ret;
+    const char *name = blk_name(blk);
 
     blk_len = blk_getlength(blk);
     if (blk_len < 0) {
         error_setg_errno(errp, -blk_len,
-                         "can't get size of block backend");
+                         "can't get size of block backend %s",
+                         name);
         return false;
     }
     if (blk_len != size) {
         error_setg(errp, "device requires %" HWADDR_PRIu " bytes, "
-                   "block backend provides %" PRIu64 " bytes",
-                   size, blk_len);
+                   "block backend %s provides %" PRIu64 " bytes",
+                   size, name, blk_len);
         return false;
     }
 
@@ -55,7 +54,8 @@ bool blk_check_size_and_read_all(BlockBackend *blk, void *buf, hwaddr size,
     assert(size <= BDRV_REQUEST_MAX_BYTES);
     ret = blk_pread(blk, 0, buf, size);
     if (ret < 0) {
-        error_setg_errno(errp, -ret, "can't read block backend");
+        error_setg_errno(errp, -ret, "can't read block backend %s",
+                         name);
         return false;
     }
     return true;
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 2/5] hw/block: Flash images can be smaller than the device
  2020-11-16 10:42 [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images David Edmondson
  2020-11-16 10:42 ` [RFC PATCH 1/5] hw/block: blk_check_size_and_read_all should report backend name David Edmondson
@ 2020-11-16 10:42 ` David Edmondson
  2020-11-16 10:42 ` [RFC PATCH 3/5] hw/arm: Convert assertions about flash image size to error_report David Edmondson
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: David Edmondson @ 2020-11-16 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Peter Maydell, qemu-block, Michael S. Tsirkin,
	Max Reitz, David Edmondson, Shannon Zhao, qemu-arm,
	Igor Mammedov, John Snow

When loading a flash image into a device, allow the image to be
smaller than the extent of the device.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
---
 hw/block/block.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/hw/block/block.c b/hw/block/block.c
index 8b284e1f14..40262546bd 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -16,8 +16,8 @@
 
 /*
  * Read the entire contents of @blk into @buf.
- * @blk's contents must be @size bytes, and @size must be at most
- * BDRV_REQUEST_MAX_BYTES.
+ * @blk's contents must not be more than @size bytes, and must be at
+ * most BDRV_REQUEST_MAX_BYTES in length.
  * On success, return true.
  * On failure, store an error through @errp and return false.
  * This function not intended for actual block devices, which read on
@@ -38,10 +38,10 @@ bool blk_check_size_and_read_all(BlockBackend *blk, void *buf, hwaddr size,
                          name);
         return false;
     }
-    if (blk_len != size) {
-        error_setg(errp, "device requires %" HWADDR_PRIu " bytes, "
-                   "block backend %s provides %" PRIu64 " bytes",
-                   size, name, blk_len);
+    if (blk_len > size) {
+        error_setg(errp, "block backend %s is too large for device "
+                   "(%" PRIu64 " > %" HWADDR_PRIu ")",
+                   name, blk_len, size);
         return false;
     }
 
@@ -51,8 +51,8 @@ bool blk_check_size_and_read_all(BlockBackend *blk, void *buf, hwaddr size,
      * should probably rework the device to be more like an actual
      * block device and read only on demand.
      */
-    assert(size <= BDRV_REQUEST_MAX_BYTES);
-    ret = blk_pread(blk, 0, buf, size);
+    assert(blk_len <= BDRV_REQUEST_MAX_BYTES);
+    ret = blk_pread(blk, 0, buf, blk_len);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "can't read block backend %s",
                          name);
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 3/5] hw/arm: Convert assertions about flash image size to error_report
  2020-11-16 10:42 [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images David Edmondson
  2020-11-16 10:42 ` [RFC PATCH 1/5] hw/block: blk_check_size_and_read_all should report backend name David Edmondson
  2020-11-16 10:42 ` [RFC PATCH 2/5] hw/block: Flash images can be smaller than the device David Edmondson
@ 2020-11-16 10:42 ` David Edmondson
  2020-11-19 11:39   ` Alex Bennée
  2020-11-16 10:42 ` [RFC PATCH 4/5] hw/arm: Flash image mapping follows image size David Edmondson
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: David Edmondson @ 2020-11-16 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Peter Maydell, qemu-block, Michael S. Tsirkin,
	Max Reitz, David Edmondson, Shannon Zhao, qemu-arm,
	Igor Mammedov, John Snow

Rather than throwing an assertion, provide a more detailed report if a
flash image is inappropriately sized or aligned.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
---
 hw/arm/virt.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 27dbeb549e..f9f10987dc 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -967,9 +967,21 @@ static void virt_flash_map1(PFlashCFI01 *flash,
                             MemoryRegion *sysmem)
 {
     DeviceState *dev = DEVICE(flash);
+    const char *name = blk_name(pflash_cfi01_get_blk(flash));
+
+    if (size == 0 || !QEMU_IS_ALIGNED(size, VIRT_FLASH_SECTOR_SIZE)) {
+        error_report("system firmware block device %s has invalid size "
+                     "%" PRId64, name, size);
+        info_report("its size must be a non-zero multiple of 0x%" PRIx64,
+                    VIRT_FLASH_SECTOR_SIZE);
+        exit(1);
+    }
+    if (!(size / VIRT_FLASH_SECTOR_SIZE <= UINT32_MAX)) {
+        error_report("system firmware block device %s is too large "
+                     "(%" PRId64 ")", name, size);
+        exit(1);
+    }
 
-    assert(QEMU_IS_ALIGNED(size, VIRT_FLASH_SECTOR_SIZE));
-    assert(size / VIRT_FLASH_SECTOR_SIZE <= UINT32_MAX);
     qdev_prop_set_uint32(dev, "num-blocks", size / VIRT_FLASH_SECTOR_SIZE);
     sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
 
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 4/5] hw/arm: Flash image mapping follows image size
  2020-11-16 10:42 [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images David Edmondson
                   ` (2 preceding siblings ...)
  2020-11-16 10:42 ` [RFC PATCH 3/5] hw/arm: Convert assertions about flash image size to error_report David Edmondson
@ 2020-11-16 10:42 ` David Edmondson
  2020-11-16 10:42 ` [RFC PATCH 5/5] hw/arm: Only minimise flash size on older machines David Edmondson
  2020-11-16 11:39 ` [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images Philippe Mathieu-Daudé
  5 siblings, 0 replies; 17+ messages in thread
From: David Edmondson @ 2020-11-16 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Peter Maydell, qemu-block, Michael S. Tsirkin,
	Max Reitz, David Edmondson, Shannon Zhao, qemu-arm,
	Igor Mammedov, John Snow

When mapping flash images into the bottom 128MB, create mappings that
match the size of the underlying block device rather than 64MB.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
---
 hw/arm/trace-events      |  2 +
 hw/arm/virt-acpi-build.c | 29 ++++++++-------
 hw/arm/virt.c            | 79 +++++++++++++++++++++-------------------
 include/hw/arm/virt.h    |  1 +
 4 files changed, 60 insertions(+), 51 deletions(-)

diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index a335ee891d..a9174f8fba 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -53,3 +53,5 @@ smmuv3_notify_flag_add(const char *iommu) "ADD SMMUNotifier node for iommu mr=%s
 smmuv3_notify_flag_del(const char *iommu) "DEL SMMUNotifier node for iommu mr=%s"
 smmuv3_inv_notifiers_iova(const char *name, uint16_t asid, uint64_t iova, uint8_t tg, uint64_t num_pages) "iommu mr=%s asid=%d iova=0x%"PRIx64" tg=%d num_pages=0x%"PRIx64
 
+# virt.c
+virt_flash_map1(const char *name, uint64_t base, uint64_t size) "map %s at 0x%"PRIx64" + 0x%"PRIx64
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 9747a6458f..2c08d36624 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -102,28 +102,31 @@ static void acpi_dsdt_add_fw_cfg(Aml *scope, const MemMapEntry *fw_cfg_memmap)
     aml_append(scope, dev);
 }
 
-static void acpi_dsdt_add_flash(Aml *scope, const MemMapEntry *flash_memmap)
+static void acpi_dsdt_add_flash1(Aml *scope, int index,
+                                 hwaddr base, hwaddr size)
 {
     Aml *dev, *crs;
-    hwaddr base = flash_memmap->base;
-    hwaddr size = flash_memmap->size / 2;
 
-    dev = aml_device("FLS0");
+    dev = aml_device("FLS%u", index);
     aml_append(dev, aml_name_decl("_HID", aml_string("LNRO0015")));
-    aml_append(dev, aml_name_decl("_UID", aml_int(0)));
+    aml_append(dev, aml_name_decl("_UID", aml_int(index)));
 
     crs = aml_resource_template();
     aml_append(crs, aml_memory32_fixed(base, size, AML_READ_WRITE));
     aml_append(dev, aml_name_decl("_CRS", crs));
     aml_append(scope, dev);
+}
 
-    dev = aml_device("FLS1");
-    aml_append(dev, aml_name_decl("_HID", aml_string("LNRO0015")));
-    aml_append(dev, aml_name_decl("_UID", aml_int(1)));
-    crs = aml_resource_template();
-    aml_append(crs, aml_memory32_fixed(base + size, size, AML_READ_WRITE));
-    aml_append(dev, aml_name_decl("_CRS", crs));
-    aml_append(scope, dev);
+static void acpi_dsdt_add_flash(Aml *scope, const MemMapEntry *flash_memmap,
+    PFlashCFI01 *flash[2])
+{
+    acpi_dsdt_add_flash1(scope, 0,
+                         flash_memmap->base,
+                         virt_flash_size(flash[0]));
+
+    acpi_dsdt_add_flash1(scope, 1,
+                         flash_memmap->base + flash_memmap->size / 2,
+                         virt_flash_size(flash[1]));
 }
 
 static void acpi_dsdt_add_virtio(Aml *scope,
@@ -603,7 +606,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     acpi_dsdt_add_uart(scope, &memmap[VIRT_UART],
                        (irqmap[VIRT_UART] + ARM_SPI_BASE));
     if (vmc->acpi_expose_flash) {
-        acpi_dsdt_add_flash(scope, &memmap[VIRT_FLASH]);
+        acpi_dsdt_add_flash(scope, &memmap[VIRT_FLASH], vms->flash);
     }
     acpi_dsdt_add_fw_cfg(scope, &memmap[VIRT_FW_CFG]);
     acpi_dsdt_add_virtio(scope, &memmap[VIRT_MMIO],
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index f9f10987dc..03ec844bf3 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -50,6 +50,7 @@
 #include "sysemu/sysemu.h"
 #include "sysemu/tpm.h"
 #include "sysemu/kvm.h"
+#include "sysemu/block-backend.h"
 #include "hw/loader.h"
 #include "exec/address-spaces.h"
 #include "qemu/bitops.h"
@@ -78,6 +79,7 @@
 #include "hw/virtio/virtio-iommu.h"
 #include "hw/char/pl011.h"
 #include "qemu/guest-random.h"
+#include "trace.h"
 
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
     static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
@@ -931,6 +933,11 @@ static void create_virtio_devices(const VirtMachineState *vms)
 
 #define VIRT_FLASH_SECTOR_SIZE (256 * KiB)
 
+int64_t virt_flash_size(PFlashCFI01 *flash)
+{
+    return blk_getlength(pflash_cfi01_get_blk(flash));
+}
+
 static PFlashCFI01 *virt_flash_create1(VirtMachineState *vms,
                                         const char *name,
                                         const char *alias_prop_name)
@@ -969,6 +976,8 @@ static void virt_flash_map1(PFlashCFI01 *flash,
     DeviceState *dev = DEVICE(flash);
     const char *name = blk_name(pflash_cfi01_get_blk(flash));
 
+    trace_virt_flash_map1(name, base, size);
+
     if (size == 0 || !QEMU_IS_ALIGNED(size, VIRT_FLASH_SECTOR_SIZE)) {
         error_report("system firmware block device %s has invalid size "
                      "%" PRId64, name, size);
@@ -995,63 +1004,57 @@ static void virt_flash_map(VirtMachineState *vms,
                            MemoryRegion *secure_sysmem)
 {
     /*
-     * Map two flash devices to fill the VIRT_FLASH space in the memmap.
+     * Map two flash devices in the VIRT_FLASH space in the memmap.
      * sysmem is the system memory space. secure_sysmem is the secure view
      * of the system, and the first flash device should be made visible only
      * there. The second flash device is visible to both secure and nonsecure.
      * If sysmem == secure_sysmem this means there is no separate Secure
      * address space and both flash devices are generally visible.
      */
-    hwaddr flashsize = vms->memmap[VIRT_FLASH].size / 2;
-    hwaddr flashbase = vms->memmap[VIRT_FLASH].base;
+    MemMapEntry *m = &vms->memmap[VIRT_FLASH];
 
-    virt_flash_map1(vms->flash[0], flashbase, flashsize,
-                    secure_sysmem);
-    virt_flash_map1(vms->flash[1], flashbase + flashsize, flashsize,
-                    sysmem);
+    virt_flash_map1(vms->flash[0], m->base,
+                    virt_flash_size(vms->flash[0]), secure_sysmem);
+
+    virt_flash_map1(vms->flash[1], m->base + m->size / 2,
+                    virt_flash_size(vms->flash[1]), sysmem);
 }
 
 static void virt_flash_fdt(VirtMachineState *vms,
                            MemoryRegion *sysmem,
                            MemoryRegion *secure_sysmem)
 {
-    hwaddr flashsize = vms->memmap[VIRT_FLASH].size / 2;
-    hwaddr flashbase = vms->memmap[VIRT_FLASH].base;
+    bool secure = sysmem != secure_sysmem;
+    MemMapEntry *m = &vms->memmap[VIRT_FLASH];
+    hwaddr flashbase0 = m->base;
+    hwaddr flashbase1 = m->base + m->size / 2;
+    hwaddr flashsize0 = virt_flash_size(vms->flash[0]);
+    hwaddr flashsize1 = virt_flash_size(vms->flash[1]);
     char *nodename;
 
-    if (sysmem == secure_sysmem) {
-        /* Report both flash devices as a single node in the DT */
-        nodename = g_strdup_printf("/flash@%" PRIx64, flashbase);
-        qemu_fdt_add_subnode(vms->fdt, nodename);
-        qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cfi-flash");
-        qemu_fdt_setprop_sized_cells(vms->fdt, nodename, "reg",
-                                     2, flashbase, 2, flashsize,
-                                     2, flashbase + flashsize, 2, flashsize);
-        qemu_fdt_setprop_cell(vms->fdt, nodename, "bank-width", 4);
-        g_free(nodename);
+    if (secure) {
+        nodename = g_strdup_printf("/secflash@%" PRIx64, flashbase0);
     } else {
-        /*
-         * Report the devices as separate nodes so we can mark one as
-         * only visible to the secure world.
-         */
-        nodename = g_strdup_printf("/secflash@%" PRIx64, flashbase);
-        qemu_fdt_add_subnode(vms->fdt, nodename);
-        qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cfi-flash");
-        qemu_fdt_setprop_sized_cells(vms->fdt, nodename, "reg",
-                                     2, flashbase, 2, flashsize);
-        qemu_fdt_setprop_cell(vms->fdt, nodename, "bank-width", 4);
+        nodename = g_strdup_printf("/flash@%" PRIx64, flashbase0);
+    }
+    qemu_fdt_add_subnode(vms->fdt, nodename);
+    qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cfi-flash");
+    qemu_fdt_setprop_sized_cells(vms->fdt, nodename, "reg",
+                                 2, flashbase0, 2, flashsize0);
+    qemu_fdt_setprop_cell(vms->fdt, nodename, "bank-width", 4);
+    if (secure) {
         qemu_fdt_setprop_string(vms->fdt, nodename, "status", "disabled");
         qemu_fdt_setprop_string(vms->fdt, nodename, "secure-status", "okay");
-        g_free(nodename);
-
-        nodename = g_strdup_printf("/flash@%" PRIx64, flashbase);
-        qemu_fdt_add_subnode(vms->fdt, nodename);
-        qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cfi-flash");
-        qemu_fdt_setprop_sized_cells(vms->fdt, nodename, "reg",
-                                     2, flashbase + flashsize, 2, flashsize);
-        qemu_fdt_setprop_cell(vms->fdt, nodename, "bank-width", 4);
-        g_free(nodename);
     }
+    g_free(nodename);
+
+    nodename = g_strdup_printf("/flash@%" PRIx64, flashbase1);
+    qemu_fdt_add_subnode(vms->fdt, nodename);
+    qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cfi-flash");
+    qemu_fdt_setprop_sized_cells(vms->fdt, nodename, "reg",
+                                 2, flashbase1, 2, flashsize1);
+    qemu_fdt_setprop_cell(vms->fdt, nodename, "bank-width", 4);
+    g_free(nodename);
 }
 
 static bool virt_firmware_init(VirtMachineState *vms,
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index aad6d69841..ee21d691ea 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -172,6 +172,7 @@ OBJECT_DECLARE_TYPE(VirtMachineState, VirtMachineClass, VIRT_MACHINE)
 
 void virt_acpi_setup(VirtMachineState *vms);
 bool virt_is_acpi_enabled(VirtMachineState *vms);
+int64_t virt_flash_size(PFlashCFI01 *flash);
 
 /* Return the number of used redistributor regions  */
 static inline int virt_gicv3_redist_region_count(VirtMachineState *vms)
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 5/5] hw/arm: Only minimise flash size on older machines
  2020-11-16 10:42 [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images David Edmondson
                   ` (3 preceding siblings ...)
  2020-11-16 10:42 ` [RFC PATCH 4/5] hw/arm: Flash image mapping follows image size David Edmondson
@ 2020-11-16 10:42 ` David Edmondson
  2020-11-16 11:39 ` [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images Philippe Mathieu-Daudé
  5 siblings, 0 replies; 17+ messages in thread
From: David Edmondson @ 2020-11-16 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Peter Maydell, qemu-block, Michael S. Tsirkin,
	Max Reitz, David Edmondson, Shannon Zhao, qemu-arm,
	Igor Mammedov, John Snow

Prior to 5.2 the flash images loaded into the bottom 128MB always
filled the region. Ensure that this continues to be the case.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
---
 hw/arm/virt-acpi-build.c | 11 +++---
 hw/arm/virt.c            | 79 ++++++++++++++++++++++++++--------------
 include/hw/arm/virt.h    |  3 +-
 3 files changed, 60 insertions(+), 33 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 2c08d36624..6e3d72a9e9 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -117,16 +117,17 @@ static void acpi_dsdt_add_flash1(Aml *scope, int index,
     aml_append(scope, dev);
 }
 
-static void acpi_dsdt_add_flash(Aml *scope, const MemMapEntry *flash_memmap,
-    PFlashCFI01 *flash[2])
+static void acpi_dsdt_add_flash(Aml *scope, VirtMachineState *vms)
 {
+    MemMapEntry *flash_memmap = &vms->memmap[VIRT_FLASH];
+
     acpi_dsdt_add_flash1(scope, 0,
                          flash_memmap->base,
-                         virt_flash_size(flash[0]));
+                         virt_flash_size(vms, vms->flash[0]));
 
     acpi_dsdt_add_flash1(scope, 1,
                          flash_memmap->base + flash_memmap->size / 2,
-                         virt_flash_size(flash[1]));
+                         virt_flash_size(vms, vms->flash[1]));
 }
 
 static void acpi_dsdt_add_virtio(Aml *scope,
@@ -606,7 +607,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     acpi_dsdt_add_uart(scope, &memmap[VIRT_UART],
                        (irqmap[VIRT_UART] + ARM_SPI_BASE));
     if (vmc->acpi_expose_flash) {
-        acpi_dsdt_add_flash(scope, &memmap[VIRT_FLASH], vms->flash);
+        acpi_dsdt_add_flash(scope, vms);
     }
     acpi_dsdt_add_fw_cfg(scope, &memmap[VIRT_FW_CFG]);
     acpi_dsdt_add_virtio(scope, &memmap[VIRT_MMIO],
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 03ec844bf3..e851622cb5 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -933,9 +933,15 @@ static void create_virtio_devices(const VirtMachineState *vms)
 
 #define VIRT_FLASH_SECTOR_SIZE (256 * KiB)
 
-int64_t virt_flash_size(PFlashCFI01 *flash)
+int64_t virt_flash_size(VirtMachineState *vms, PFlashCFI01 *flash)
 {
-    return blk_getlength(pflash_cfi01_get_blk(flash));
+    VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
+
+    if (vmc->maximize_flash_size) {
+        return vms->memmap[VIRT_FLASH].size / 2;
+    } else {
+        return blk_getlength(pflash_cfi01_get_blk(flash));
+    }
 }
 
 static PFlashCFI01 *virt_flash_create1(VirtMachineState *vms,
@@ -1014,47 +1020,65 @@ static void virt_flash_map(VirtMachineState *vms,
     MemMapEntry *m = &vms->memmap[VIRT_FLASH];
 
     virt_flash_map1(vms->flash[0], m->base,
-                    virt_flash_size(vms->flash[0]), secure_sysmem);
+                    virt_flash_size(vms, vms->flash[0]), secure_sysmem);
 
     virt_flash_map1(vms->flash[1], m->base + m->size / 2,
-                    virt_flash_size(vms->flash[1]), sysmem);
+                    virt_flash_size(vms, vms->flash[1]), sysmem);
 }
 
 static void virt_flash_fdt(VirtMachineState *vms,
                            MemoryRegion *sysmem,
                            MemoryRegion *secure_sysmem)
 {
+    VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
     bool secure = sysmem != secure_sysmem;
     MemMapEntry *m = &vms->memmap[VIRT_FLASH];
     hwaddr flashbase0 = m->base;
     hwaddr flashbase1 = m->base + m->size / 2;
-    hwaddr flashsize0 = virt_flash_size(vms->flash[0]);
-    hwaddr flashsize1 = virt_flash_size(vms->flash[1]);
+    hwaddr flashsize0 = virt_flash_size(vms, vms->flash[0]);
+    hwaddr flashsize1 = virt_flash_size(vms, vms->flash[1]);
     char *nodename;
 
-    if (secure) {
-        nodename = g_strdup_printf("/secflash@%" PRIx64, flashbase0);
-    } else {
+    if (vmc->maximize_flash_size && !secure) {
+        /* Report both flash devices as a single node in the DT */
         nodename = g_strdup_printf("/flash@%" PRIx64, flashbase0);
-    }
-    qemu_fdt_add_subnode(vms->fdt, nodename);
-    qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cfi-flash");
-    qemu_fdt_setprop_sized_cells(vms->fdt, nodename, "reg",
-                                 2, flashbase0, 2, flashsize0);
-    qemu_fdt_setprop_cell(vms->fdt, nodename, "bank-width", 4);
-    if (secure) {
-        qemu_fdt_setprop_string(vms->fdt, nodename, "status", "disabled");
-        qemu_fdt_setprop_string(vms->fdt, nodename, "secure-status", "okay");
-    }
-    g_free(nodename);
+        qemu_fdt_add_subnode(vms->fdt, nodename);
+        qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cfi-flash");
+        qemu_fdt_setprop_sized_cells(vms->fdt, nodename, "reg",
+                                     2, flashbase0, 2, flashsize0,
+                                     2, flashbase1, 2, flashsize1);
+        qemu_fdt_setprop_cell(vms->fdt, nodename, "bank-width", 4);
+        g_free(nodename);
+    } else {
+        /*
+         * If we are not intending to fill the flash region or one is
+         * device is secure, report two distinct nodes.
+         */
+        if (secure) {
+            nodename = g_strdup_printf("/secflash@%" PRIx64, flashbase0);
+        } else {
+            nodename = g_strdup_printf("/flash@%" PRIx64, flashbase0);
+        }
+        qemu_fdt_add_subnode(vms->fdt, nodename);
+        qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cfi-flash");
+        qemu_fdt_setprop_sized_cells(vms->fdt, nodename, "reg",
+                                     2, flashbase0, 2, flashsize0);
+        qemu_fdt_setprop_cell(vms->fdt, nodename, "bank-width", 4);
+        if (secure) {
+            qemu_fdt_setprop_string(vms->fdt, nodename, "status", "disabled");
+            qemu_fdt_setprop_string(vms->fdt, nodename,
+                                    "secure-status", "okay");
+        }
+        g_free(nodename);
 
-    nodename = g_strdup_printf("/flash@%" PRIx64, flashbase1);
-    qemu_fdt_add_subnode(vms->fdt, nodename);
-    qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cfi-flash");
-    qemu_fdt_setprop_sized_cells(vms->fdt, nodename, "reg",
-                                 2, flashbase1, 2, flashsize1);
-    qemu_fdt_setprop_cell(vms->fdt, nodename, "bank-width", 4);
-    g_free(nodename);
+        nodename = g_strdup_printf("/flash@%" PRIx64, flashbase1);
+        qemu_fdt_add_subnode(vms->fdt, nodename);
+        qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cfi-flash");
+        qemu_fdt_setprop_sized_cells(vms->fdt, nodename, "reg",
+                                     2, flashbase1, 2, flashsize1);
+        qemu_fdt_setprop_cell(vms->fdt, nodename, "bank-width", 4);
+        g_free(nodename);
+    }
 }
 
 static bool virt_firmware_init(VirtMachineState *vms,
@@ -2614,6 +2638,7 @@ static void virt_machine_5_1_options(MachineClass *mc)
     virt_machine_5_2_options(mc);
     compat_props_add(mc->compat_props, hw_compat_5_1, hw_compat_5_1_len);
     vmc->no_kvm_steal_time = true;
+    vmc->maximize_flash_size = true;
 }
 DEFINE_VIRT_MACHINE(5, 1)
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index ee21d691ea..1135e7e165 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -127,6 +127,7 @@ struct VirtMachineClass {
     bool kvm_no_adjvtime;
     bool no_kvm_steal_time;
     bool acpi_expose_flash;
+    bool maximize_flash_size;
 };
 
 struct VirtMachineState {
@@ -172,7 +173,7 @@ OBJECT_DECLARE_TYPE(VirtMachineState, VirtMachineClass, VIRT_MACHINE)
 
 void virt_acpi_setup(VirtMachineState *vms);
 bool virt_is_acpi_enabled(VirtMachineState *vms);
-int64_t virt_flash_size(PFlashCFI01 *flash);
+int64_t virt_flash_size(VirtMachineState *vms, PFlashCFI01 *flash);
 
 /* Return the number of used redistributor regions  */
 static inline int virt_gicv3_redist_region_count(VirtMachineState *vms)
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 1/5] hw/block: blk_check_size_and_read_all should report backend name
  2020-11-16 10:42 ` [RFC PATCH 1/5] hw/block: blk_check_size_and_read_all should report backend name David Edmondson
@ 2020-11-16 11:23   ` Philippe Mathieu-Daudé
  2020-11-16 13:29     ` David Edmondson
  2020-11-19 11:56   ` Alex Bennée
  1 sibling, 1 reply; 17+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-11-16 11:23 UTC (permalink / raw)
  To: David Edmondson, qemu-devel
  Cc: Kevin Wolf, Peter Maydell, qemu-block, Michael S. Tsirkin,
	Max Reitz, Shannon Zhao, qemu-arm, Igor Mammedov, John Snow

On 11/16/20 11:42 AM, David Edmondson wrote:
> If there are problems examining or reading data from the block
> backend, the error messages should include an appropriate identifier
> to assist in diagnoses.
> 
> Signed-off-by: David Edmondson <david.edmondson@oracle.com>
> ---
>  hw/block/block.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/block/block.c b/hw/block/block.c
> index 1e34573da7..8b284e1f14 100644
> --- a/hw/block/block.c
> +++ b/hw/block/block.c
> @@ -20,9 +20,6 @@
>   * BDRV_REQUEST_MAX_BYTES.
>   * On success, return true.
>   * On failure, store an error through @errp and return false.
> - * Note that the error messages do not identify the block backend.
> - * TODO Since callers don't either, this can result in confusing
> - * errors.
>   * This function not intended for actual block devices, which read on
>   * demand.  It's for things like memory devices that (ab)use a block
>   * backend to provide persistence.
> @@ -32,17 +29,19 @@ bool blk_check_size_and_read_all(BlockBackend *blk, void *buf, hwaddr size,
>  {
>      int64_t blk_len;
>      int ret;
> +    const char *name = blk_name(blk);
>  
>      blk_len = blk_getlength(blk);
>      if (blk_len < 0) {
>          error_setg_errno(errp, -blk_len,
> -                         "can't get size of block backend");
> +                         "can't get size of block backend %s",

Maybe '%s' to notice empty name?

> +                         name);
>          return false;
>      }
>      if (blk_len != size) {
>          error_setg(errp, "device requires %" HWADDR_PRIu " bytes, "
> -                   "block backend provides %" PRIu64 " bytes",
> -                   size, blk_len);
> +                   "block backend %s provides %" PRIu64 " bytes",
> +                   size, name, blk_len);
>          return false;
>      }
>  
> @@ -55,7 +54,8 @@ bool blk_check_size_and_read_all(BlockBackend *blk, void *buf, hwaddr size,
>      assert(size <= BDRV_REQUEST_MAX_BYTES);
>      ret = blk_pread(blk, 0, buf, size);
>      if (ret < 0) {
> -        error_setg_errno(errp, -ret, "can't read block backend");
> +        error_setg_errno(errp, -ret, "can't read block backend %s",
> +                         name);
>          return false;
>      }
>      return true;
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images
  2020-11-16 10:42 [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images David Edmondson
                   ` (4 preceding siblings ...)
  2020-11-16 10:42 ` [RFC PATCH 5/5] hw/arm: Only minimise flash size on older machines David Edmondson
@ 2020-11-16 11:39 ` Philippe Mathieu-Daudé
  2020-11-16 13:43   ` David Edmondson
  2020-11-16 13:48   ` Markus Armbruster
  5 siblings, 2 replies; 17+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-11-16 11:39 UTC (permalink / raw)
  To: David Edmondson, qemu-devel
  Cc: Kevin Wolf, Peter Maydell, qemu-block, Michael S. Tsirkin,
	Markus Armbruster, Max Reitz, Shannon Zhao, qemu-arm,
	Igor Mammedov, John Snow

Hi David,

On 11/16/20 11:42 AM, David Edmondson wrote:
> Currently ARM UEFI images are typically built as 2MB/768kB flash
> images for code and variables respectively. These images are both then
> padded out to 64MB before being loaded by QEMU.
> 
> Because the images are 64MB each, QEMU allocates 128MB of memory to
> read them, and then proceeds to read all 128MB from disk (dirtying the
> memory). Of this 128MB less than 3MB is useful - the rest is zero
> padding.

2 years ago I commented the same problem, and suggested to access the
underlying storage by "block", as this is a "block storage".

Back then the response was "do not try to fix something that works".
This is why we choose the big hammer "do not accept image size not
matching device size" way.

While your series seems to help, it only postpone the same
implementation problem. If what you want is use the least memory
required, I still believe accessing the device by block is the
best approach.

Regards,

Phil.

> 
> On a machine with 100 VMs this wastes over 12GB of memory.
> 
> This set of patches aims to reclaim the wasted memory by allowing QEMU
> to respect the size of the flash images and allocate only the
> necessary memory. This will, of course, require that the flash build
> process be modified to avoid padding the images to 64MB.
> 
> Because existing machine types expect the full 128MB reserved for
> flash to be occupied, do so for machine types older than virt-5.2. The
> changes are beneficial even in this case, because while the full 128MB
> of memory is allocated, only that required to actually load the flash
> images from disk is touched.
> 
> David Edmondson (5):
>   hw/block: blk_check_size_and_read_all should report backend name
>   hw/block: Flash images can be smaller than the device
>   hw/arm: Convert assertions about flash image size to error_report
>   hw/arm: Flash image mapping follows image size
>   hw/arm: Only minimise flash size on older machines
> 
>  hw/arm/trace-events      |  2 +
>  hw/arm/virt-acpi-build.c | 30 ++++++++------
>  hw/arm/virt.c            | 86 +++++++++++++++++++++++++++++-----------
>  hw/block/block.c         | 26 ++++++------
>  include/hw/arm/virt.h    |  2 +
>  5 files changed, 97 insertions(+), 49 deletions(-)
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 1/5] hw/block: blk_check_size_and_read_all should report backend name
  2020-11-16 11:23   ` Philippe Mathieu-Daudé
@ 2020-11-16 13:29     ` David Edmondson
  0 siblings, 0 replies; 17+ messages in thread
From: David Edmondson @ 2020-11-16 13:29 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Kevin Wolf, Peter Maydell, qemu-block, Michael S. Tsirkin,
	Max Reitz, Shannon Zhao, qemu-arm, Igor Mammedov, John Snow

On Monday, 2020-11-16 at 12:23:24 +01, Philippe Mathieu-Daudé wrote:

> On 11/16/20 11:42 AM, David Edmondson wrote:
>> If there are problems examining or reading data from the block
>> backend, the error messages should include an appropriate identifier
>> to assist in diagnoses.
>> 
>> Signed-off-by: David Edmondson <david.edmondson@oracle.com>
>> ---
>>  hw/block/block.c | 14 +++++++-------
>>  1 file changed, 7 insertions(+), 7 deletions(-)
>> 
>> diff --git a/hw/block/block.c b/hw/block/block.c
>> index 1e34573da7..8b284e1f14 100644
>> --- a/hw/block/block.c
>> +++ b/hw/block/block.c
>> @@ -20,9 +20,6 @@
>>   * BDRV_REQUEST_MAX_BYTES.
>>   * On success, return true.
>>   * On failure, store an error through @errp and return false.
>> - * Note that the error messages do not identify the block backend.
>> - * TODO Since callers don't either, this can result in confusing
>> - * errors.
>>   * This function not intended for actual block devices, which read on
>>   * demand.  It's for things like memory devices that (ab)use a block
>>   * backend to provide persistence.
>> @@ -32,17 +29,19 @@ bool blk_check_size_and_read_all(BlockBackend *blk, void *buf, hwaddr size,
>>  {
>>      int64_t blk_len;
>>      int ret;
>> +    const char *name = blk_name(blk);
>>  
>>      blk_len = blk_getlength(blk);
>>      if (blk_len < 0) {
>>          error_setg_errno(errp, -blk_len,
>> -                         "can't get size of block backend");
>> +                         "can't get size of block backend %s",
>
> Maybe '%s' to notice empty name?

Okay.

>> +                         name);
>>          return false;
>>      }
>>      if (blk_len != size) {
>>          error_setg(errp, "device requires %" HWADDR_PRIu " bytes, "
>> -                   "block backend provides %" PRIu64 " bytes",
>> -                   size, blk_len);
>> +                   "block backend %s provides %" PRIu64 " bytes",
>> +                   size, name, blk_len);
>>          return false;
>>      }
>>  
>> @@ -55,7 +54,8 @@ bool blk_check_size_and_read_all(BlockBackend *blk, void *buf, hwaddr size,
>>      assert(size <= BDRV_REQUEST_MAX_BYTES);
>>      ret = blk_pread(blk, 0, buf, size);
>>      if (ret < 0) {
>> -        error_setg_errno(errp, -ret, "can't read block backend");
>> +        error_setg_errno(errp, -ret, "can't read block backend %s",
>> +                         name);
>>          return false;
>>      }
>>      return true;
>> 

dme.
-- 
She looks like Eve Marie Saint in "On the Waterfront".


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images
  2020-11-16 11:39 ` [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images Philippe Mathieu-Daudé
@ 2020-11-16 13:43   ` David Edmondson
  2020-11-16 13:48   ` Markus Armbruster
  1 sibling, 0 replies; 17+ messages in thread
From: David Edmondson @ 2020-11-16 13:43 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Kevin Wolf, Peter Maydell, qemu-block, Michael S. Tsirkin,
	Markus Armbruster, Max Reitz, Shannon Zhao, qemu-arm,
	Igor Mammedov, John Snow

On Monday, 2020-11-16 at 12:39:46 +01, Philippe Mathieu-Daudé wrote:

> Hi David,
>
> On 11/16/20 11:42 AM, David Edmondson wrote:
>> Currently ARM UEFI images are typically built as 2MB/768kB flash
>> images for code and variables respectively. These images are both then
>> padded out to 64MB before being loaded by QEMU.
>> 
>> Because the images are 64MB each, QEMU allocates 128MB of memory to
>> read them, and then proceeds to read all 128MB from disk (dirtying the
>> memory). Of this 128MB less than 3MB is useful - the rest is zero
>> padding.
>
> 2 years ago I commented the same problem, and suggested to access the
> underlying storage by "block", as this is a "block storage".
>
> Back then the response was "do not try to fix something that works".
> This is why we choose the big hammer "do not accept image size not
> matching device size" way.
>
> While your series seems to help, it only postpone the same
> implementation problem. If what you want is use the least memory
> required, I still believe accessing the device by block is the
> best approach.

I agree that would reduce the size further, but it feels like it may be
a case of diminishing returns.

Currently the consumption for a single guest is 128MB. This change will
bring it down under 3MB, with the block approach perhaps reducing that
to zero (there will be some buffer block usage presumably, and perhaps a
small cache would make sense, so it won't really be zero).

Scaling that up for 100 guests, we're going from 12.5GB now down to
under 300MB with the changes, and again something around zero with the
block based approach.

I guess that it would mean that the machine model wouldn't have to
change, as we could return zeros for reads outside the underlying device
extent. This seems like the most interesting part - are there likely to
be any consequential *benefits* from reducing the amount of guest
address space consumed by the flash regions?

> Regards,
>
> Phil.
>
>> 
>> On a machine with 100 VMs this wastes over 12GB of memory.
>> 
>> This set of patches aims to reclaim the wasted memory by allowing QEMU
>> to respect the size of the flash images and allocate only the
>> necessary memory. This will, of course, require that the flash build
>> process be modified to avoid padding the images to 64MB.
>> 
>> Because existing machine types expect the full 128MB reserved for
>> flash to be occupied, do so for machine types older than virt-5.2. The
>> changes are beneficial even in this case, because while the full 128MB
>> of memory is allocated, only that required to actually load the flash
>> images from disk is touched.
>> 
>> David Edmondson (5):
>>   hw/block: blk_check_size_and_read_all should report backend name
>>   hw/block: Flash images can be smaller than the device
>>   hw/arm: Convert assertions about flash image size to error_report
>>   hw/arm: Flash image mapping follows image size
>>   hw/arm: Only minimise flash size on older machines
>> 
>>  hw/arm/trace-events      |  2 +
>>  hw/arm/virt-acpi-build.c | 30 ++++++++------
>>  hw/arm/virt.c            | 86 +++++++++++++++++++++++++++++-----------
>>  hw/block/block.c         | 26 ++++++------
>>  include/hw/arm/virt.h    |  2 +
>>  5 files changed, 97 insertions(+), 49 deletions(-)
>> 

dme.
-- 
Please forgive me if I act a little strange, for I know not what I do.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images
  2020-11-16 11:39 ` [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images Philippe Mathieu-Daudé
  2020-11-16 13:43   ` David Edmondson
@ 2020-11-16 13:48   ` Markus Armbruster
  2020-11-19  6:09     ` Philippe Mathieu-Daudé
  1 sibling, 1 reply; 17+ messages in thread
From: Markus Armbruster @ 2020-11-16 13:48 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Kevin Wolf, Peter Maydell, qemu-block, Michael S. Tsirkin,
	qemu-devel, Max Reitz, David Edmondson, Shannon Zhao, qemu-arm,
	Igor Mammedov, John Snow

Philippe Mathieu-Daudé <philmd@redhat.com> writes:

> Hi David,
>
> On 11/16/20 11:42 AM, David Edmondson wrote:
>> Currently ARM UEFI images are typically built as 2MB/768kB flash
>> images for code and variables respectively. These images are both then
>> padded out to 64MB before being loaded by QEMU.
>> 
>> Because the images are 64MB each, QEMU allocates 128MB of memory to
>> read them, and then proceeds to read all 128MB from disk (dirtying the
>> memory). Of this 128MB less than 3MB is useful - the rest is zero
>> padding.
>
> 2 years ago I commented the same problem, and suggested to access the
> underlying storage by "block", as this is a "block storage".
>
> Back then the response was "do not try to fix something that works".
> This is why we choose the big hammer "do not accept image size not
> matching device size" way.
>
> While your series seems to help, it only postpone the same
> implementation problem. If what you want is use the least memory
> required, I still believe accessing the device by block is the
> best approach.

"Do not try to fix something that works" is hard to disagree with.
However, at least some users seem to disagree with "this works".  Enough
to overcome the resistance to change?



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images
  2020-11-16 13:48   ` Markus Armbruster
@ 2020-11-19  6:09     ` Philippe Mathieu-Daudé
  2020-11-19 11:45       ` Alex Bennée
  0 siblings, 1 reply; 17+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-11-19  6:09 UTC (permalink / raw)
  To: Markus Armbruster, David Edmondson
  Cc: Kevin Wolf, Peter Maydell, qemu-block, Michael S. Tsirkin,
	Xu Yandong, qemu-devel, Max Reitz, Shannon Zhao, Zheng Xiang,
	qemu-arm, haibinzhang(张海斌),
	Igor Mammedov, John Snow

On 11/16/20 2:48 PM, Markus Armbruster wrote:
> Philippe Mathieu-Daudé <philmd@redhat.com> writes:
> 
>> Hi David,
>>
>> On 11/16/20 11:42 AM, David Edmondson wrote:
>>> Currently ARM UEFI images are typically built as 2MB/768kB flash
>>> images for code and variables respectively. These images are both then
>>> padded out to 64MB before being loaded by QEMU.
>>>
>>> Because the images are 64MB each, QEMU allocates 128MB of memory to
>>> read them, and then proceeds to read all 128MB from disk (dirtying the
>>> memory). Of this 128MB less than 3MB is useful - the rest is zero
>>> padding.
>>
>> 2 years ago I commented the same problem, and suggested to access the
>> underlying storage by "block", as this is a "block storage".
>>
>> Back then the response was "do not try to fix something that works".
>> This is why we choose the big hammer "do not accept image size not
>> matching device size" way.
>>
>> While your series seems to help, it only postpone the same
>> implementation problem. If what you want is use the least memory
>> required, I still believe accessing the device by block is the
>> best approach.
> 
> "Do not try to fix something that works" is hard to disagree with.
> However, at least some users seem to disagree with "this works".  Enough
> to overcome the resistance to change?

Yeah, at least 3 big users so far:

- Huawei
https://www.mail-archive.com/qemu-devel@nongnu.org/msg607292.html
- Tencent
https://www.mail-archive.com/qemu-devel@nongnu.org/msg742066.html
- Oracle
(this series).

Then Huawei tried the MicroVM approach:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg680103.html

I simply wanted to save David time by remembering this other approach,
since Peter already commented on David's one (see Huawei link).

Regards,

Phil.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 3/5] hw/arm: Convert assertions about flash image size to error_report
  2020-11-16 10:42 ` [RFC PATCH 3/5] hw/arm: Convert assertions about flash image size to error_report David Edmondson
@ 2020-11-19 11:39   ` Alex Bennée
  0 siblings, 0 replies; 17+ messages in thread
From: Alex Bennée @ 2020-11-19 11:39 UTC (permalink / raw)
  To: David Edmondson
  Cc: Kevin Wolf, Peter Maydell, qemu-block, Michael S. Tsirkin,
	qemu-devel, Max Reitz, Shannon Zhao, qemu-arm, Igor Mammedov,
	John Snow


David Edmondson <david.edmondson@oracle.com> writes:

> Rather than throwing an assertion, provide a more detailed report if a
> flash image is inappropriately sized or aligned.
>
> Signed-off-by: David Edmondson <david.edmondson@oracle.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images
  2020-11-19  6:09     ` Philippe Mathieu-Daudé
@ 2020-11-19 11:45       ` Alex Bennée
  2020-11-19 11:57         ` Philippe Mathieu-Daudé
  2020-12-07 12:07         ` David Edmondson
  0 siblings, 2 replies; 17+ messages in thread
From: Alex Bennée @ 2020-11-19 11:45 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Kevin Wolf, Peter Maydell, qemu-devel, qemu-block,
	Michael S. Tsirkin, Xu Yandong, Markus Armbruster, Max Reitz,
	David Edmondson, Shannon Zhao, Zheng Xiang, qemu-arm,
	haibinzhang(张海斌),
	Igor Mammedov, John Snow


Philippe Mathieu-Daudé <philmd@redhat.com> writes:

> On 11/16/20 2:48 PM, Markus Armbruster wrote:
>> Philippe Mathieu-Daudé <philmd@redhat.com> writes:
>> 
>>> Hi David,
>>>
>>> On 11/16/20 11:42 AM, David Edmondson wrote:
>>>> Currently ARM UEFI images are typically built as 2MB/768kB flash
>>>> images for code and variables respectively. These images are both then
>>>> padded out to 64MB before being loaded by QEMU.
>>>>
>>>> Because the images are 64MB each, QEMU allocates 128MB of memory to
>>>> read them, and then proceeds to read all 128MB from disk (dirtying the
>>>> memory). Of this 128MB less than 3MB is useful - the rest is zero
>>>> padding.
>>>
>>> 2 years ago I commented the same problem, and suggested to access the
>>> underlying storage by "block", as this is a "block storage".
>>>
>>> Back then the response was "do not try to fix something that works".
>>> This is why we choose the big hammer "do not accept image size not
>>> matching device size" way.
>>>
>>> While your series seems to help, it only postpone the same
>>> implementation problem. If what you want is use the least memory
>>> required, I still believe accessing the device by block is the
>>> best approach.
>> 
>> "Do not try to fix something that works" is hard to disagree with.
>> However, at least some users seem to disagree with "this works".  Enough
>> to overcome the resistance to change?
>
> Yeah, at least 3 big users so far:
>
> - Huawei
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg607292.html
> - Tencent
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg742066.html
> - Oracle
> (this series).
>
> Then Huawei tried the MicroVM approach:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg680103.html
>
> I simply wanted to save David time by remembering this other approach,
> since Peter already commented on David's one (see Huawei link).

IIRC the two questions that came up were:

  - what would reading memory not covered by a file look like (0's or
    something more like real HW, 7f?).

  - what happens when the guest writes beyond the bounds of a backing
    file?

I'm guessing for these cloudy type applications no one cares about
persistence of EFI variables? Maybe we just need a formulation for the
second pflash which is explicit about writes being ephemeral while also
being accepted?

>
> Regards,
>
> Phil.


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 1/5] hw/block: blk_check_size_and_read_all should report backend name
  2020-11-16 10:42 ` [RFC PATCH 1/5] hw/block: blk_check_size_and_read_all should report backend name David Edmondson
  2020-11-16 11:23   ` Philippe Mathieu-Daudé
@ 2020-11-19 11:56   ` Alex Bennée
  1 sibling, 0 replies; 17+ messages in thread
From: Alex Bennée @ 2020-11-19 11:56 UTC (permalink / raw)
  To: David Edmondson
  Cc: Kevin Wolf, Peter Maydell, qemu-block, Michael S. Tsirkin,
	qemu-devel, Max Reitz, Shannon Zhao, qemu-arm, Igor Mammedov,
	John Snow


David Edmondson <david.edmondson@oracle.com> writes:

> If there are problems examining or reading data from the block
> backend, the error messages should include an appropriate identifier
> to assist in diagnoses.
>
> Signed-off-by: David Edmondson <david.edmondson@oracle.com>

With Phillipe's suggested ''s:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images
  2020-11-19 11:45       ` Alex Bennée
@ 2020-11-19 11:57         ` Philippe Mathieu-Daudé
  2020-12-07 12:07         ` David Edmondson
  1 sibling, 0 replies; 17+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-11-19 11:57 UTC (permalink / raw)
  To: Alex Bennée, Ard Biesheuvel, Leif Lindholm
  Cc: Kevin Wolf, Peter Maydell, qemu-devel, qemu-block,
	Michael S. Tsirkin, Xu Yandong, Markus Armbruster, Max Reitz,
	David Edmondson, Shannon Zhao, Zheng Xiang, qemu-arm,
	haibinzhang(张海斌),
	Igor Mammedov, John Snow

+Ard & Leif for EDK2.

On 11/19/20 12:45 PM, Alex Bennée wrote:
> Philippe Mathieu-Daudé <philmd@redhat.com> writes:
>> On 11/16/20 2:48 PM, Markus Armbruster wrote:
>>> Philippe Mathieu-Daudé <philmd@redhat.com> writes:
>>>
>>>> Hi David,
>>>>
>>>> On 11/16/20 11:42 AM, David Edmondson wrote:
>>>>> Currently ARM UEFI images are typically built as 2MB/768kB flash
>>>>> images for code and variables respectively. These images are both then
>>>>> padded out to 64MB before being loaded by QEMU.
>>>>>
>>>>> Because the images are 64MB each, QEMU allocates 128MB of memory to
>>>>> read them, and then proceeds to read all 128MB from disk (dirtying the
>>>>> memory). Of this 128MB less than 3MB is useful - the rest is zero
>>>>> padding.
>>>>
>>>> 2 years ago I commented the same problem, and suggested to access the
>>>> underlying storage by "block", as this is a "block storage".
>>>>
>>>> Back then the response was "do not try to fix something that works".
>>>> This is why we choose the big hammer "do not accept image size not
>>>> matching device size" way.
>>>>
>>>> While your series seems to help, it only postpone the same
>>>> implementation problem. If what you want is use the least memory
>>>> required, I still believe accessing the device by block is the
>>>> best approach.
>>>
>>> "Do not try to fix something that works" is hard to disagree with.
>>> However, at least some users seem to disagree with "this works".  Enough
>>> to overcome the resistance to change?
>>
>> Yeah, at least 3 big users so far:
>>
>> - Huawei
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg607292.html
>> - Tencent
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg742066.html
>> - Oracle
>> (this series).
>>
>> Then Huawei tried the MicroVM approach:
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg680103.html
>>
>> I simply wanted to save David time by remembering this other approach,
>> since Peter already commented on David's one (see Huawei link).
> 
> IIRC the two questions that came up were:
> 
>   - what would reading memory not covered by a file look like (0's or
>     something more like real HW, 7f?).

For NOR flashes erased bit is high, programmed bit is low, so: 0xff.

> 
>   - what happens when the guest writes beyond the bounds of a backing
>     file?

Report an hardware error, so guest firmware have a chance to do
do something (not sure what, beside rebooting...).

> 
> I'm guessing for these cloudy type applications no one cares about
> persistence of EFI variables? Maybe we just need a formulation for the
> second pflash which is explicit about writes being ephemeral while also
> being accepted?

Someone suggested adding a new machine type to QEMU to be able to use
smaller flash for cloud usage (but I don't remember who). Then EDK2
could be built with for this new flash size.

Regards,

Phil.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images
  2020-11-19 11:45       ` Alex Bennée
  2020-11-19 11:57         ` Philippe Mathieu-Daudé
@ 2020-12-07 12:07         ` David Edmondson
  1 sibling, 0 replies; 17+ messages in thread
From: David Edmondson @ 2020-12-07 12:07 UTC (permalink / raw)
  To: Alex Bennée, Philippe Mathieu-Daudé
  Cc: Kevin Wolf, Peter Maydell, qemu-block, Michael S. Tsirkin,
	Xu Yandong, qemu-devel, Markus Armbruster, Shannon Zhao,
	Zheng Xiang, qemu-arm, haibinzhang(张海斌),
	Mihai Carabas, Igor Mammedov, Max Reitz, John Snow

Sorry for the delay in replying - holiday then distractions.

On Thursday, 2020-11-19 at 11:45:11 GMT, Alex Bennée wrote:

> Philippe Mathieu-Daudé <philmd@redhat.com> writes:
>
>> On 11/16/20 2:48 PM, Markus Armbruster wrote:
>>> Philippe Mathieu-Daudé <philmd@redhat.com> writes:
>>> 
>>>> Hi David,
>>>>
>>>> On 11/16/20 11:42 AM, David Edmondson wrote:
>>>>> Currently ARM UEFI images are typically built as 2MB/768kB flash
>>>>> images for code and variables respectively. These images are both then
>>>>> padded out to 64MB before being loaded by QEMU.
>>>>>
>>>>> Because the images are 64MB each, QEMU allocates 128MB of memory to
>>>>> read them, and then proceeds to read all 128MB from disk (dirtying the
>>>>> memory). Of this 128MB less than 3MB is useful - the rest is zero
>>>>> padding.
>>>>
>>>> 2 years ago I commented the same problem, and suggested to access the
>>>> underlying storage by "block", as this is a "block storage".
>>>>
>>>> Back then the response was "do not try to fix something that works".
>>>> This is why we choose the big hammer "do not accept image size not
>>>> matching device size" way.
>>>>
>>>> While your series seems to help, it only postpone the same
>>>> implementation problem. If what you want is use the least memory
>>>> required, I still believe accessing the device by block is the
>>>> best approach.
>>> 
>>> "Do not try to fix something that works" is hard to disagree with.
>>> However, at least some users seem to disagree with "this works".  Enough
>>> to overcome the resistance to change?
>>
>> Yeah, at least 3 big users so far:
>>
>> - Huawei
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg607292.html
>> - Tencent
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg742066.html
>> - Oracle
>> (this series).
>>
>> Then Huawei tried the MicroVM approach:
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg680103.html
>>
>> I simply wanted to save David time by remembering this other approach,
>> since Peter already commented on David's one (see Huawei link).
>
> IIRC the two questions that came up were:
>
>   - what would reading memory not covered by a file look like (0's or
>     something more like real HW, 7f?).
>
>   - what happens when the guest writes beyond the bounds of a backing
>     file?

In the non-backward-compatible-case (pre virt-5.2 in the patches, but
obviously not actually in 5.2, where the memory range is declared to the
guest as the extent of the file) neither of these issues should arise.

Reading/writing outside of the declared range should generate a fault,
much like any other non-existent region.

In the backward-compatible case the patches are obviously flawed - a
write outside of the extent of the backing file would have nowhere to
go.

In order to figure out how to move forward, I think that the summary of
the previous conversations is that either:
- it's not possible to reduce the size of the memory region declared to
  the guest from 64M+64M because of $reasons, or
- we don't want to reduce the size of the memory region declared to the
  guest from 64M+64M because of $reasons.

If that's right, I'd be curious to better understand $reasons, but have
no basis on which to argue.

Presuming that it's not acceptable to reduce the declared extent of the
flash regions, I will probably look to implement Philippe's suggestion:

> If what you want is use the least memory required, I still believe
> accessing the device by block is the best approach.

...which I interpret to mean that the pflash drivers should not allocate
memory and read the images (later writing back modified blocks), but
handle each IO request from the caller by reading and writing blocks
from the underlying device as necessary.

If this interpretation is wrong, please let me know :-)

Looking at doing that, it seems that a new variant of
memory_region_init_rom_device() that does not allocate memory will be
required, unless there anything already available that performs a
similar function?

Booting a VM with AAVMF and writing some variables to the flash doesn't
exercise much of the pflash functionality. In particular it would seem
like the right time to fix the batch write omission in the current
code. Is there another consumer of the pflash drivers that is likely to
exercise more? (I can write tests that run in the guest, of course,
presuming that the Linux driver can be pushed into using those paths.)

> I'm guessing for these cloudy type applications no one cares about
> persistence of EFI variables? Maybe we just need a formulation for the
> second pflash which is explicit about writes being ephemeral while also
> being accepted?

Our current deployments on x86 do not have writable flash for VARS,
though I believe that was problematic on arm64 until a recent
change. It's my suspicion that assuming we can continue to present
read-only VARS generally is going to be proven wrong before too long.

dme.
-- 
I'm not living in the real world, no more, no more.


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-12-07 13:27 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-16 10:42 [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images David Edmondson
2020-11-16 10:42 ` [RFC PATCH 1/5] hw/block: blk_check_size_and_read_all should report backend name David Edmondson
2020-11-16 11:23   ` Philippe Mathieu-Daudé
2020-11-16 13:29     ` David Edmondson
2020-11-19 11:56   ` Alex Bennée
2020-11-16 10:42 ` [RFC PATCH 2/5] hw/block: Flash images can be smaller than the device David Edmondson
2020-11-16 10:42 ` [RFC PATCH 3/5] hw/arm: Convert assertions about flash image size to error_report David Edmondson
2020-11-19 11:39   ` Alex Bennée
2020-11-16 10:42 ` [RFC PATCH 4/5] hw/arm: Flash image mapping follows image size David Edmondson
2020-11-16 10:42 ` [RFC PATCH 5/5] hw/arm: Only minimise flash size on older machines David Edmondson
2020-11-16 11:39 ` [RFC PATCH 0/5] ARM: reduce the memory consumed when mapping UEFI flash images Philippe Mathieu-Daudé
2020-11-16 13:43   ` David Edmondson
2020-11-16 13:48   ` Markus Armbruster
2020-11-19  6:09     ` Philippe Mathieu-Daudé
2020-11-19 11:45       ` Alex Bennée
2020-11-19 11:57         ` Philippe Mathieu-Daudé
2020-12-07 12:07         ` David Edmondson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.