All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] hw/pflash_cfi01: Reduce memory consumption when flash image is smaller than region
@ 2021-02-16 14:27 David Edmondson
  2021-02-16 14:27 ` [RFC PATCH 1/3] hw/pflash_cfi*: Replace DPRINTF with trace events David Edmondson
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: David Edmondson @ 2021-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, David Edmondson, Philippe Mathieu-Daudé,
	qemu-block, Max Reitz

As described in
https://lore.kernel.org/r/20201116104216.439650-1-david.edmondson@oracle.com,
I'd like to reduce the amount of memory consumed by QEMU mapping UEFI
images on aarch64.

To recap:

> Currently ARM UEFI images are typically built as 2MB/768kB flash
> images for code and variables respectively. These images are both
> then padded out to 64MB before being loaded by QEMU.
>
> Because the images are 64MB each, QEMU allocates 128MB of memory to
> read them, and then proceeds to read all 128MB from disk (dirtying
> the memory). Of this 128MB less than 3MB is useful - the rest is
> zero padding.
>
> On a machine with 100 VMs this wastes over 12GB of memory.

There were objections to my previous patch because it changed the size
of the regions reported to the guest via the memory map (the reported
size depended on the size of the image).

This is a smaller patch which only helps with read-only flash images,
as it does so by changing the memory region that covers the entire
region to be IO rather than RAM, and loads the flash image into a
smaller sub-region that is the more traditional mixed IO/ROMD type.

All read/write operations to areas outside of the underlying block
device are handled directly (reads return 0, writes fail (which is
okay, because this path only supports read-only devices)).

This reduces the memory consumption for the read-only AAVMF code image
from 64MB to around 2MB (presuming that the UEFI image is adjusted
accordingly). It does nothing to improve the memory consumption caused
by the read-write AAVMF vars image.

There was a suggestion in a previous thread that perhaps the pflash
driver could be re-worked to use the block IO interfaces to access the
underlying device "on demand" rather than reading in the entire image
at startup (at least, that's how I understood the comment).

I looked at implementing this and struggled to get it to work for all
of the required use cases. Specifically, there are several code paths
that expect to retrieve a pointer to the flat memory image of the
pflash device and manipulate it directly (examples include the Malta
board and encrypted memory support on x86), or write the entire image
to storage (after migration).

My implementation was based around mapping the flash region only for
IO, which meant that every read or write had to be handled directly by
the pflash driver (there was no ROMD style operation), which also made
booting an aarch64 VM noticeably slower - getting through the firmware
went from under 1 second to around 10 seconds.

Improving the writeable device support requires some more general
infrastructure, I think, but I'm not familiar with everything that
QEMU currently provides, and would be very happy to learn otherwise.

David Edmondson (3):
  hw/pflash_cfi*: Replace DPRINTF with trace events
  hw/pflash_cfi01: Correct the type of PFlashCFI01.ro
  hw/pflash_cfi01: Allow read-only devices to have a smaller backing
    device

 hw/block/pflash_cfi01.c | 197 +++++++++++++++++++++++++---------------
 hw/block/pflash_cfi02.c |  75 ++++++---------
 hw/block/trace-events   |  42 +++++++--
 3 files changed, 186 insertions(+), 128 deletions(-)

-- 
2.30.0



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC PATCH 1/3] hw/pflash_cfi*: Replace DPRINTF with trace events
  2021-02-16 14:27 [RFC PATCH 0/3] hw/pflash_cfi01: Reduce memory consumption when flash image is smaller than region David Edmondson
@ 2021-02-16 14:27 ` David Edmondson
  2021-02-16 14:27 ` [RFC PATCH 2/3] hw/pflash_cfi01: Correct the type of PFlashCFI01.ro David Edmondson
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: David Edmondson @ 2021-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, David Edmondson, Philippe Mathieu-Daudé,
	qemu-block, Max Reitz

Rather than having a device specific debug implementation in
pflash_cfi01.c and pflash_cfi02.c, use the standard tracing facility.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
---
 hw/block/pflash_cfi01.c | 78 +++++++++++++++++------------------------
 hw/block/pflash_cfi02.c | 75 +++++++++++++++------------------------
 hw/block/trace-events   | 39 ++++++++++++++++-----
 3 files changed, 91 insertions(+), 101 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 22287a1522..9e1f3b42c6 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -56,16 +56,6 @@
 #include "sysemu/runstate.h"
 #include "trace.h"
 
-/* #define PFLASH_DEBUG */
-#ifdef PFLASH_DEBUG
-#define DPRINTF(fmt, ...)                                   \
-do {                                                        \
-    fprintf(stderr, "PFLASH: " fmt , ## __VA_ARGS__);       \
-} while (0)
-#else
-#define DPRINTF(fmt, ...) do { } while (0)
-#endif
-
 #define PFLASH_BE          0
 #define PFLASH_SECURE      1
 
@@ -152,10 +142,8 @@ static uint32_t pflash_cfi_query(PFlashCFI01 *pfl, hwaddr offset)
          * wider part.
          */
         if (pfl->device_width != 1 || pfl->bank_width > 4) {
-            DPRINTF("%s: Unsupported device configuration: "
-                    "device_width=%d, max_device_width=%d\n",
-                    __func__, pfl->device_width,
-                    pfl->max_device_width);
+            trace_pflash_unsupported_device_configuration(
+                pfl->name, pfl->device_width, pfl->max_device_width);
             return 0;
         }
         /* CFI query data is repeated, rather than zero padded for
@@ -205,14 +193,14 @@ static uint32_t pflash_devid_query(PFlashCFI01 *pfl, hwaddr offset)
     switch (boff & 0xFF) {
     case 0:
         resp = pfl->ident0;
-        trace_pflash_manufacturer_id(resp);
+        trace_pflash_manufacturer_id(pfl->name, resp);
         break;
     case 1:
         resp = pfl->ident1;
-        trace_pflash_device_id(resp);
+        trace_pflash_device_id(pfl->name, resp);
         break;
     default:
-        trace_pflash_device_info(offset);
+        trace_pflash_device_info(pfl->name, offset);
         return 0;
     }
     /* Replicate responses for each device in bank. */
@@ -260,10 +248,9 @@ static uint32_t pflash_data_read(PFlashCFI01 *pfl, hwaddr offset,
         }
         break;
     default:
-        DPRINTF("BUG in %s\n", __func__);
         abort();
     }
-    trace_pflash_data_read(offset, width, ret);
+    trace_pflash_data_read(pfl->name, offset, width, ret);
     return ret;
 }
 
@@ -277,7 +264,7 @@ static uint32_t pflash_read(PFlashCFI01 *pfl, hwaddr offset,
     switch (pfl->cmd) {
     default:
         /* This should never happen : reset state & treat it as a read */
-        DPRINTF("%s: unknown command state: %x\n", __func__, pfl->cmd);
+        trace_pflash_read_unknown_state(pfl->name, pfl->cmd);
         pfl->wcycle = 0;
         /*
          * The command 0x00 is not assigned by the CFI open standard,
@@ -313,7 +300,7 @@ static uint32_t pflash_read(PFlashCFI01 *pfl, hwaddr offset,
              */
             ret |= pfl->status << 16;
         }
-        DPRINTF("%s: status %x\n", __func__, ret);
+        trace_pflash_read_status(pfl->name, ret);
         break;
     case 0x90:
         if (!pfl->device_width) {
@@ -328,14 +315,14 @@ static uint32_t pflash_read(PFlashCFI01 *pfl, hwaddr offset,
             switch (boff) {
             case 0:
                 ret = pfl->ident0 << 8 | pfl->ident1;
-                trace_pflash_manufacturer_id(ret);
+                trace_pflash_manufacturer_id(pfl->name, ret);
                 break;
             case 1:
                 ret = pfl->ident2 << 8 | pfl->ident3;
-                trace_pflash_device_id(ret);
+                trace_pflash_device_id(pfl->name, ret);
                 break;
             default:
-                trace_pflash_device_info(boff);
+                trace_pflash_device_info(pfl->name, boff);
                 ret = 0;
                 break;
             }
@@ -380,7 +367,7 @@ static uint32_t pflash_read(PFlashCFI01 *pfl, hwaddr offset,
 
         break;
     }
-    trace_pflash_io_read(offset, width, ret, pfl->cmd, pfl->wcycle);
+    trace_pflash_io_read(pfl->name, offset, width, ret, pfl->cmd, pfl->wcycle);
 
     return ret;
 }
@@ -410,7 +397,7 @@ static inline void pflash_data_write(PFlashCFI01 *pfl, hwaddr offset,
 {
     uint8_t *p = pfl->storage;
 
-    trace_pflash_data_write(offset, width, value, pfl->counter);
+    trace_pflash_data_write(pfl->name, offset, width, value, pfl->counter);
     switch (width) {
     case 1:
         p[offset] = value;
@@ -449,7 +436,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
 
     cmd = value;
 
-    trace_pflash_io_write(offset, width, value, pfl->wcycle);
+    trace_pflash_io_write(pfl->name, offset, width, value, pfl->wcycle);
     if (!pfl->wcycle) {
         /* Set the device in I/O access mode */
         memory_region_rom_device_set_romd(&pfl->mem, false);
@@ -463,14 +450,13 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
             goto mode_read_array;
         case 0x10: /* Single Byte Program */
         case 0x40: /* Single Byte Program */
-            DPRINTF("%s: Single Byte Program\n", __func__);
+            trace_pflash_write(pfl->name, "single byte program (0)");
             break;
         case 0x20: /* Block erase */
             p = pfl->storage;
             offset &= ~(pfl->sector_len - 1);
 
-            DPRINTF("%s: block erase at " TARGET_FMT_plx " bytes %x\n",
-                    __func__, offset, (unsigned)pfl->sector_len);
+            trace_pflash_write_block_erase(pfl->name, offset, pfl->sector_len);
 
             if (!pfl->ro) {
                 memset(p + offset, 0xff, pfl->sector_len);
@@ -481,25 +467,25 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
             pfl->status |= 0x80; /* Ready! */
             break;
         case 0x50: /* Clear status bits */
-            DPRINTF("%s: Clear status bits\n", __func__);
+            trace_pflash_write(pfl->name, "clear status bits");
             pfl->status = 0x0;
             goto mode_read_array;
         case 0x60: /* Block (un)lock */
-            DPRINTF("%s: Block unlock\n", __func__);
+            trace_pflash_write(pfl->name, "block unlock");
             break;
         case 0x70: /* Status Register */
-            DPRINTF("%s: Read status register\n", __func__);
+            trace_pflash_write(pfl->name, "read status register");
             pfl->cmd = cmd;
             return;
         case 0x90: /* Read Device ID */
-            DPRINTF("%s: Read Device information\n", __func__);
+            trace_pflash_write(pfl->name, "read device information");
             pfl->cmd = cmd;
             return;
         case 0x98: /* CFI query */
-            DPRINTF("%s: CFI query\n", __func__);
+            trace_pflash_write(pfl->name, "CFI query");
             break;
         case 0xe8: /* Write to buffer */
-            DPRINTF("%s: Write to buffer\n", __func__);
+            trace_pflash_write(pfl->name, "write to buffer");
             /* FIXME should save @offset, @width for case 1+ */
             qemu_log_mask(LOG_UNIMP,
                           "%s: Write to buffer emulation is flawed\n",
@@ -507,10 +493,10 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
             pfl->status |= 0x80; /* Ready! */
             break;
         case 0xf0: /* Probe for AMD flash */
-            DPRINTF("%s: Probe for AMD flash\n", __func__);
+            trace_pflash_write(pfl->name, "probe for AMD flash");
             goto mode_read_array;
         case 0xff: /* Read Array */
-            DPRINTF("%s: Read array mode\n", __func__);
+            trace_pflash_write(pfl->name, "read array mode");
             goto mode_read_array;
         default:
             goto error_flash;
@@ -522,7 +508,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
         switch (pfl->cmd) {
         case 0x10: /* Single Byte Program */
         case 0x40: /* Single Byte Program */
-            DPRINTF("%s: Single Byte Program\n", __func__);
+            trace_pflash_write(pfl->name, "single byte program (1)");
             if (!pfl->ro) {
                 pflash_data_write(pfl, offset, value, width, be);
                 pflash_update(pfl, offset, width);
@@ -553,7 +539,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
             } else {
                 value = extract32(value, 0, pfl->bank_width * 8);
             }
-            DPRINTF("%s: block write of %x bytes\n", __func__, value);
+            trace_pflash_write_block(pfl->name, value);
             pfl->counter = value;
             pfl->wcycle++;
             break;
@@ -567,7 +553,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
             } else if (cmd == 0xff) { /* Read Array */
                 goto mode_read_array;
             } else {
-                DPRINTF("%s: Unknown (un)locking command\n", __func__);
+                trace_pflash_write(pfl->name, "unknown (un)locking command");
                 goto mode_read_array;
             }
             break;
@@ -575,7 +561,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
             if (cmd == 0xff) { /* Read Array */
                 goto mode_read_array;
             } else {
-                DPRINTF("%s: leaving query mode\n", __func__);
+                trace_pflash_write(pfl->name, "leaving query mode");
             }
             break;
         default:
@@ -603,7 +589,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
                 hwaddr mask = pfl->writeblock_size - 1;
                 mask = ~mask;
 
-                DPRINTF("%s: block write finished\n", __func__);
+                trace_pflash_write(pfl->name, "block write finished");
                 pfl->wcycle++;
                 if (!pfl->ro) {
                     /* Flush the entire write buffer onto backing storage.  */
@@ -642,7 +628,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
         break;
     default:
         /* Should never happen */
-        DPRINTF("%s: invalid write state\n",  __func__);
+        trace_pflash_write(pfl->name, "invalid write state");
         goto mode_read_array;
     }
     return;
@@ -653,7 +639,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
                   "\n", __func__, offset, pfl->wcycle, pfl->cmd, value);
 
  mode_read_array:
-    trace_pflash_reset();
+    trace_pflash_reset(pfl->name);
     memory_region_rom_device_set_romd(&pfl->mem, true);
     pfl->wcycle = 0;
     pfl->cmd = 0x00; /* This model reset value for READ_ARRAY (not CFI) */
@@ -1022,7 +1008,7 @@ static void postload_update_cb(void *opaque, int running, RunState state)
     qemu_del_vm_change_state_handler(pfl->vmstate);
     pfl->vmstate = NULL;
 
-    DPRINTF("%s: updating bdrv for %s\n", __func__, pfl->name);
+    trace_pflash_postload_cb(pfl->name);
     pflash_update(pfl, 0, pfl->sector_len * pfl->nb_blocs);
 }
 
diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
index 7962cff745..b6de18d3ad 100644
--- a/hw/block/pflash_cfi02.c
+++ b/hw/block/pflash_cfi02.c
@@ -48,14 +48,6 @@
 #include "migration/vmstate.h"
 #include "trace.h"
 
-#define PFLASH_DEBUG false
-#define DPRINTF(fmt, ...)                                  \
-do {                                                       \
-    if (PFLASH_DEBUG) {                                    \
-        fprintf(stderr, "PFLASH: " fmt, ## __VA_ARGS__);   \
-    }                                                      \
-} while (0)
-
 #define PFLASH_LAZY_ROMD_THRESHOLD 42
 
 /*
@@ -220,7 +212,7 @@ static void pflash_timer(void *opaque)
 {
     PFlashCFI02 *pfl = opaque;
 
-    trace_pflash_timer_expired(pfl->cmd);
+    trace_pflash_timer_expired(pfl->name, pfl->cmd);
     if (pfl->cmd == 0x30) {
         /*
          * Sector erase. If DQ3 is 0 when the timer expires, then the 50
@@ -233,11 +225,10 @@ static void pflash_timer(void *opaque)
             uint64_t timeout = pflash_erase_time(pfl);
             timer_mod(&pfl->timer,
                       qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + timeout);
-            DPRINTF("%s: erase timeout fired; erasing %d sectors\n",
-                    __func__, pfl->sectors_to_erase);
+            trace_pflash_erase_timeout(pfl->name, pfl->sectors_to_erase);
             return;
         }
-        DPRINTF("%s: sector erase complete\n", __func__);
+        trace_pflash_erase_complete(pfl->name);
         bitmap_zero(pfl->sector_erase_map, pfl->total_sectors);
         pfl->sectors_to_erase = 0;
         reset_dq3(pfl);
@@ -262,7 +253,7 @@ static uint64_t pflash_data_read(PFlashCFI02 *pfl, hwaddr offset,
 {
     uint8_t *p = (uint8_t *)pfl->storage + offset;
     uint64_t ret = pfl->be ? ldn_be_p(p, width) : ldn_le_p(p, width);
-    trace_pflash_data_read(offset, width, ret);
+    trace_pflash_data_read(pfl->name, offset, width, ret);
     return ret;
 }
 
@@ -325,7 +316,7 @@ static uint64_t pflash_read(void *opaque, hwaddr offset, unsigned int width)
     switch (pfl->cmd) {
     default:
         /* This should never happen : reset state & treat it as a read*/
-        DPRINTF("%s: unknown command state: %x\n", __func__, pfl->cmd);
+        trace_pflash_read_unknown_state(pfl->name, pfl->cmd);
         pfl->wcycle = 0;
         pfl->cmd = 0;
         /* fall through to the read code */
@@ -338,7 +329,7 @@ static uint64_t pflash_read(void *opaque, hwaddr offset, unsigned int width)
             toggle_dq2(pfl);
             /* Status register read */
             ret = pfl->status;
-            DPRINTF("%s: status %" PRIx64 "\n", __func__, ret);
+            trace_pflash_read_status(pfl->name, ret);
             break;
         }
         /* Flash area read */
@@ -363,7 +354,7 @@ static uint64_t pflash_read(void *opaque, hwaddr offset, unsigned int width)
         default:
             ret = pflash_data_read(pfl, offset, width);
         }
-        DPRINTF("%s: ID " TARGET_FMT_plx " %" PRIx64 "\n", __func__, boff, ret);
+        trace_pflash_read_done(pfl->name, boff, ret);
         break;
     case 0x10: /* Chip Erase */
     case 0x30: /* Sector Erase */
@@ -375,7 +366,7 @@ static uint64_t pflash_read(void *opaque, hwaddr offset, unsigned int width)
         toggle_dq6(pfl);
         /* Status register read */
         ret = pfl->status;
-        DPRINTF("%s: status %" PRIx64 "\n", __func__, ret);
+        trace_pflash_read_status(pfl->name, ret);
         break;
     case 0x98:
         /* CFI query mode */
@@ -386,7 +377,7 @@ static uint64_t pflash_read(void *opaque, hwaddr offset, unsigned int width)
         }
         break;
     }
-    trace_pflash_io_read(offset, width, ret, pfl->cmd, pfl->wcycle);
+    trace_pflash_io_read(pfl->name, offset, width, ret, pfl->cmd, pfl->wcycle);
 
     return ret;
 }
@@ -415,9 +406,8 @@ static void pflash_sector_erase(PFlashCFI02 *pfl, hwaddr offset)
     SectorInfo sector_info = pflash_sector_info(pfl, offset);
     uint64_t sector_len = sector_info.len;
     offset &= ~(sector_len - 1);
-    DPRINTF("%s: start sector erase at %0*" PRIx64 "-%0*" PRIx64 "\n",
-            __func__, pfl->width * 2, offset,
-            pfl->width * 2, offset + sector_len - 1);
+    trace_pflash_sector_erase_start(pfl->name, pfl->width * 2, offset,
+                                    pfl->width * 2, offset + sector_len - 1);
     if (!pfl->ro) {
         uint8_t *p = pfl->storage;
         memset(p + offset, 0xff, sector_len);
@@ -438,7 +428,7 @@ static void pflash_write(void *opaque, hwaddr offset, uint64_t value,
     uint8_t *p;
     uint8_t cmd;
 
-    trace_pflash_io_write(offset, width, value, pfl->wcycle);
+    trace_pflash_io_write(pfl->name, offset, width, value, pfl->wcycle);
     cmd = value;
     if (pfl->cmd != 0xA0) {
         /* Reset does nothing during chip erase and sector erase. */
@@ -496,27 +486,24 @@ static void pflash_write(void *opaque, hwaddr offset, uint64_t value,
             return;
         }
         if (boff != pfl->unlock_addr0 || cmd != 0xAA) {
-            DPRINTF("%s: unlock0 failed " TARGET_FMT_plx " %02x %04x\n",
-                    __func__, boff, cmd, pfl->unlock_addr0);
+            trace_pflash_unlock0_failed(pfl->name, boff, cmd, pfl->unlock_addr0);
             goto reset_flash;
         }
-        DPRINTF("%s: unlock sequence started\n", __func__);
+        trace_pflash_write(pfl->name, "unlock sequence started");
         break;
     case 1:
         /* We started an unlock sequence */
     check_unlock1:
         if (boff != pfl->unlock_addr1 || cmd != 0x55) {
-            DPRINTF("%s: unlock1 failed " TARGET_FMT_plx " %02x\n", __func__,
-                    boff, cmd);
+            trace_pflash_unlock1_failed(pfl->name, boff, cmd);
             goto reset_flash;
         }
-        DPRINTF("%s: unlock sequence done\n", __func__);
+        trace_pflash_write(pfl->name, "unlock sequence done");
         break;
     case 2:
         /* We finished an unlock sequence */
         if (!pfl->bypass && boff != pfl->unlock_addr0) {
-            DPRINTF("%s: command failed " TARGET_FMT_plx " %02x\n", __func__,
-                    boff, cmd);
+            trace_pflash_write_failed(pfl->name, boff, cmd);
             goto reset_flash;
         }
         switch (cmd) {
@@ -527,10 +514,10 @@ static void pflash_write(void *opaque, hwaddr offset, uint64_t value,
         case 0x90: /* Autoselect */
         case 0xA0: /* Program */
             pfl->cmd = cmd;
-            DPRINTF("%s: starting command %02x\n", __func__, cmd);
+            trace_pflash_write_start(pfl->name, cmd);
             break;
         default:
-            DPRINTF("%s: unknown command %02x\n", __func__, cmd);
+            trace_pflash_write_unknown(pfl->name, cmd);
             goto reset_flash;
         }
         break;
@@ -548,7 +535,7 @@ static void pflash_write(void *opaque, hwaddr offset, uint64_t value,
                 }
                 goto reset_flash;
             }
-            trace_pflash_data_write(offset, width, value, 0);
+            trace_pflash_data_write(pfl->name, offset, width, value, 0);
             if (!pfl->ro) {
                 p = (uint8_t *)pfl->storage + offset;
                 if (pfl->be) {
@@ -586,8 +573,7 @@ static void pflash_write(void *opaque, hwaddr offset, uint64_t value,
             }
             /* fall through */
         default:
-            DPRINTF("%s: invalid write for command %02x\n",
-                    __func__, pfl->cmd);
+            trace_pflash_write_invalid(pfl->name, pfl->cmd);
             goto reset_flash;
         }
     case 4:
@@ -600,8 +586,7 @@ static void pflash_write(void *opaque, hwaddr offset, uint64_t value,
             goto check_unlock1;
         default:
             /* Should never happen */
-            DPRINTF("%s: invalid command state %02x (wc 4)\n",
-                    __func__, pfl->cmd);
+            trace_pflash_write_invalid_state(pfl->name, pfl->cmd, 5);
             goto reset_flash;
         }
         break;
@@ -613,12 +598,11 @@ static void pflash_write(void *opaque, hwaddr offset, uint64_t value,
         switch (cmd) {
         case 0x10: /* Chip Erase */
             if (boff != pfl->unlock_addr0) {
-                DPRINTF("%s: chip erase: invalid address " TARGET_FMT_plx "\n",
-                        __func__, offset);
+                trace_pflash_chip_erase_invalid(pfl->name, offset);
                 goto reset_flash;
             }
             /* Chip erase */
-            DPRINTF("%s: start chip erase\n", __func__);
+            trace_pflash_chip_erase_start(pfl->name);
             if (!pfl->ro) {
                 memset(pfl->storage, 0xff, pfl->chip_len);
                 pflash_update(pfl, 0, pfl->chip_len);
@@ -632,7 +616,7 @@ static void pflash_write(void *opaque, hwaddr offset, uint64_t value,
             pflash_sector_erase(pfl, offset);
             break;
         default:
-            DPRINTF("%s: invalid command %02x (wc 5)\n", __func__, cmd);
+            trace_pflash_write_invalid_command(pfl->name, cmd);
             goto reset_flash;
         }
         pfl->cmd = cmd;
@@ -683,19 +667,18 @@ static void pflash_write(void *opaque, hwaddr offset, uint64_t value,
             return;
         default:
             /* Should never happen */
-            DPRINTF("%s: invalid command state %02x (wc 6)\n",
-                    __func__, pfl->cmd);
+            trace_pflash_write_invalid_state(pfl->name, pfl->cmd, 6);
             goto reset_flash;
         }
         break;
     /* Special values for CFI queries */
     case WCYCLE_CFI:
     case WCYCLE_AUTOSELECT_CFI:
-        DPRINTF("%s: invalid write in CFI query mode\n", __func__);
+        trace_pflash_write(pfl->name, "invalid write in CFI query mode");
         goto reset_flash;
     default:
         /* Should never happen */
-        DPRINTF("%s: invalid write state (wc 7)\n",  __func__);
+        trace_pflash_write(pfl->name, "invalid write state (wc 7)");
         goto reset_flash;
     }
     pfl->wcycle++;
@@ -704,7 +687,7 @@ static void pflash_write(void *opaque, hwaddr offset, uint64_t value,
 
     /* Reset flash */
  reset_flash:
-    trace_pflash_reset();
+    trace_pflash_reset(pfl->name);
     pfl->bypass = 0;
     pfl->wcycle = 0;
     pfl->cmd = 0;
diff --git a/hw/block/trace-events b/hw/block/trace-events
index c1537e3ac0..a715a2e173 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -6,15 +6,36 @@ fdc_ioport_write(uint8_t reg, uint8_t value) "write reg 0x%02x val 0x%02x"
 
 # pflash_cfi01.c
 # pflash_cfi02.c
-pflash_reset(void) "reset"
-pflash_timer_expired(uint8_t cmd) "command 0x%02x done"
-pflash_io_read(uint64_t offset, unsigned size, uint32_t value, uint8_t cmd, uint8_t wcycle) "offset:0x%04"PRIx64" size:%u value:0x%04x cmd:0x%02x wcycle:%u"
-pflash_io_write(uint64_t offset, unsigned size, uint32_t value, uint8_t wcycle) "offset:0x%04"PRIx64" size:%u value:0x%04x wcycle:%u"
-pflash_data_read(uint64_t offset, unsigned size, uint32_t value) "data offset:0x%04"PRIx64" size:%u value:0x%04x"
-pflash_data_write(uint64_t offset, unsigned size, uint32_t value, uint64_t counter) "data offset:0x%04"PRIx64" size:%u value:0x%04x counter:0x%016"PRIx64
-pflash_manufacturer_id(uint16_t id) "Read Manufacturer ID: 0x%04x"
-pflash_device_id(uint16_t id) "Read Device ID: 0x%04x"
-pflash_device_info(uint64_t offset) "Read Device Information offset:0x%04"PRIx64
+pflash_chip_erase_invalid(const char *name, uint64_t offset) "%s: chip erase: invalid address 0x%" PRIx64
+pflash_chip_erase_start(const char *name) "%s: start chip erase"
+pflash_data_read(const char *name, uint64_t offset, unsigned size, uint32_t value) "%s: data offset:0x%04"PRIx64" size:%u value:0x%04x"
+pflash_data_write(const char *name, uint64_t offset, unsigned size, uint32_t value, uint64_t counter) "%s: data offset:0x%04"PRIx64" size:%u value:0x%04x counter:0x%016"PRIx64
+pflash_device_id(const char *name, uint16_t id) "%s: read device ID: 0x%04x"
+pflash_device_info(const char *name, uint64_t offset) "%s: read device information offset:0x%04" PRIx64
+pflash_erase_complete(const char *name) "%s: sector erase complete"
+pflash_erase_timeout(const char *name, int count) "%s: erase timeout fired; erasing %d sectors"
+pflash_io_read(const char *name, uint64_t offset, unsigned int size, uint32_t value, uint8_t cmd, uint8_t wcycle) "%s: offset:0x%04" PRIx64 " size:%u value:0x%04x cmd:0x%02x wcycle:%u"
+pflash_io_write(const char *name, uint64_t offset, unsigned int size, uint32_t value, uint8_t wcycle) "%s: offset:0x%04"PRIx64" size:%u value:0x%04x wcycle:%u"
+pflash_manufacturer_id(const char *name, uint16_t id) "%s: read manufacturer ID: 0x%04x"
+pflash_postload_cb(const char *name)  "%s: updating bdrv"
+pflash_read_done(const char *name, uint64_t offset, uint64_t ret) "%s: ID:0x%" PRIx64 " ret:0x%" PRIx64
+pflash_read_status(const char *name, uint32_t ret) "%s: status:0x%x"
+pflash_read_unknown_state(const char *name, uint8_t cmd) "%s: unknown command state:0x%x"
+pflash_reset(const char *name) "%s: reset"
+pflash_sector_erase_start(const char *name, int width1, uint64_t start, int width2, uint64_t end) "%s: start sector erase at: 0x%0*" PRIx64 "-0x%0*" PRIx64
+pflash_timer_expired(const char *name, uint8_t cmd) "%s: command 0x%02x done"
+pflash_unlock0_failed(const char *name, uint64_t offset, uint8_t cmd, uint16_t addr0) "%s: unlock0 failed 0x%" PRIx64 " 0x%02x 0x%04x"
+pflash_unlock1_failed(const char *name, uint64_t offset, uint8_t cmd) "%s: unlock0 failed 0x%" PRIx64 " 0x%02x"
+pflash_unsupported_device_configuration(const char *name, uint8_t width, uint8_t max) "%s: unsupported device configuration: device_width:%d max_device_width:%d"
+pflash_write(const char *name, const char *str) "%s: %s"
+pflash_write_block(const char *name, uint32_t value) "%s: block write: bytes:0x%x"
+pflash_write_block_erase(const char *name, uint64_t offset, uint64_t len) "%s: block erase offset:0x%" PRIx64 " bytes:0x%lx"
+pflash_write_failed(const char *name, uint64_t offset, uint8_t cmd) "%s: command failed 0x%" PRIx64 " 0x%02x"
+pflash_write_invalid(const char *name, uint8_t cmd) "%s: invalid write for command 0x%02x"
+pflash_write_invalid_command(const char *name, uint8_t cmd) "%s: invalid command 0x%02x (wc 5)"
+pflash_write_invalid_state(const char *name, uint8_t cmd, int wc) "%s: invalid command state 0x%02x (wc %d)"
+pflash_write_start(const char *name, uint8_t cmd) "%s: starting command 0x%02x"
+pflash_write_unknown(const char *name, uint8_t cmd) "%s: unknown command 0x%02x"
 
 # virtio-blk.c
 virtio_blk_req_complete(void *vdev, void *req, int status) "vdev %p req %p status %d"
-- 
2.30.0



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH 2/3] hw/pflash_cfi01: Correct the type of PFlashCFI01.ro
  2021-02-16 14:27 [RFC PATCH 0/3] hw/pflash_cfi01: Reduce memory consumption when flash image is smaller than region David Edmondson
  2021-02-16 14:27 ` [RFC PATCH 1/3] hw/pflash_cfi*: Replace DPRINTF with trace events David Edmondson
@ 2021-02-16 14:27 ` David Edmondson
  2021-02-16 14:27 ` [RFC PATCH 3/3] hw/pflash_cfi01: Allow read-only devices to have a smaller backing device David Edmondson
  2021-02-16 15:03 ` [RFC PATCH 0/3] hw/pflash_cfi01: Reduce memory consumption when flash image is smaller than region Philippe Mathieu-Daudé
  3 siblings, 0 replies; 9+ messages in thread
From: David Edmondson @ 2021-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, David Edmondson, Philippe Mathieu-Daudé,
	qemu-block, Max Reitz

PFlashCFI01.ro is a bool, declare it as such.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
---
 hw/block/pflash_cfi01.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 9e1f3b42c6..6b21b4af52 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -72,7 +72,7 @@ struct PFlashCFI01 {
     uint8_t max_device_width;  /* max device width in bytes */
     uint32_t features;
     uint8_t wcycle; /* if 0, the flash is read normally */
-    int ro;
+    bool ro;
     uint8_t cmd;
     uint8_t status;
     uint16_t ident0;
@@ -738,7 +738,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
             return;
         }
     } else {
-        pfl->ro = 0;
+        pfl->ro = false;
     }
 
     if (pfl->blk) {
-- 
2.30.0



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH 3/3] hw/pflash_cfi01: Allow read-only devices to have a smaller backing device
  2021-02-16 14:27 [RFC PATCH 0/3] hw/pflash_cfi01: Reduce memory consumption when flash image is smaller than region David Edmondson
  2021-02-16 14:27 ` [RFC PATCH 1/3] hw/pflash_cfi*: Replace DPRINTF with trace events David Edmondson
  2021-02-16 14:27 ` [RFC PATCH 2/3] hw/pflash_cfi01: Correct the type of PFlashCFI01.ro David Edmondson
@ 2021-02-16 14:27 ` David Edmondson
  2021-02-16 15:03 ` [RFC PATCH 0/3] hw/pflash_cfi01: Reduce memory consumption when flash image is smaller than region Philippe Mathieu-Daudé
  3 siblings, 0 replies; 9+ messages in thread
From: David Edmondson @ 2021-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, David Edmondson, Philippe Mathieu-Daudé,
	qemu-block, Max Reitz

If a flash device is to be read-only, allow the backing device to be
smaller than the extent of the flash device by mapping it as a
subregion of the flash device region.

Return zeroes for all reads of the flash device beyond the extent of
the backing device.

Writes beyond the extent of the backing device fail, as the device is
read-only.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
---
 hw/block/pflash_cfi01.c | 115 +++++++++++++++++++++++++++++++---------
 hw/block/trace-events   |   3 ++
 2 files changed, 93 insertions(+), 25 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 6b21b4af52..0d305c7d87 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -83,6 +83,8 @@ struct PFlashCFI01 {
     uint64_t counter;
     unsigned int writeblock_size;
     MemoryRegion mem;
+    MemoryRegion mem_outer;
+    char outer_name[64];
     char *name;
     void *storage;
     VMChangeStateEntry *vmstate;
@@ -425,7 +427,6 @@ static inline void pflash_data_write(PFlashCFI01 *pfl, hwaddr offset,
         }
         break;
     }
-
 }
 
 static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
@@ -646,8 +647,39 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
 }
 
 
-static MemTxResult pflash_mem_read_with_attrs(void *opaque, hwaddr addr, uint64_t *value,
-                                              unsigned len, MemTxAttrs attrs)
+static MemTxResult pflash_outer_read_with_attrs(void *opaque, hwaddr addr,
+                                                    uint64_t *value,
+                                                    unsigned len,
+                                                    MemTxAttrs attrs)
+{
+    PFlashCFI01 *pfl = opaque;
+
+    trace_pflash_outer_read(pfl->name, addr, len);
+    *value = 0;
+    return MEMTX_OK;
+}
+
+static MemTxResult pflash_outer_write_with_attrs(void *opaque, hwaddr addr,
+                                                     uint64_t value,
+                                                     unsigned len,
+                                                     MemTxAttrs attrs)
+{
+    PFlashCFI01 *pfl = opaque;
+
+    trace_pflash_outer_write(pfl->name, addr, len);
+    assert(pfl->ro);
+    return MEMTX_ERROR;
+}
+
+static const MemoryRegionOps pflash_cfi01_outer_ops = {
+    .read_with_attrs = pflash_outer_read_with_attrs,
+    .write_with_attrs = pflash_outer_write_with_attrs,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+static MemTxResult pflash_mem_read_with_attrs(void *opaque, hwaddr addr,
+                                              uint64_t *value, unsigned len,
+                                              MemTxAttrs attrs)
 {
     PFlashCFI01 *pfl = opaque;
     bool be = !!(pfl->features & (1 << PFLASH_BE));
@@ -660,8 +692,9 @@ static MemTxResult pflash_mem_read_with_attrs(void *opaque, hwaddr addr, uint64_
     return MEMTX_OK;
 }
 
-static MemTxResult pflash_mem_write_with_attrs(void *opaque, hwaddr addr, uint64_t value,
-                                               unsigned len, MemTxAttrs attrs)
+static MemTxResult pflash_mem_write_with_attrs(void *opaque, hwaddr addr,
+                                               uint64_t value, unsigned len,
+                                               MemTxAttrs attrs)
 {
     PFlashCFI01 *pfl = opaque;
     bool be = !!(pfl->features & (1 << PFLASH_BE));
@@ -684,7 +717,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
 {
     ERRP_GUARD();
     PFlashCFI01 *pfl = PFLASH_CFI01(dev);
-    uint64_t total_len;
+    uint64_t outer_len, inner_len;
     int ret;
     uint64_t blocks_per_device, sector_len_per_device, device_len;
     int num_devices;
@@ -702,7 +735,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
         return;
     }
 
-    total_len = pfl->sector_len * pfl->nb_blocs;
+    outer_len = pfl->sector_len * pfl->nb_blocs;
 
     /* These are only used to expose the parameters of each device
      * in the cfi_table[].
@@ -717,36 +750,68 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
     }
     device_len = sector_len_per_device * blocks_per_device;
 
-    memory_region_init_rom_device(
-        &pfl->mem, OBJECT(dev),
-        &pflash_cfi01_ops,
-        pfl,
-        pfl->name, total_len, errp);
-    if (*errp) {
-        return;
-    }
-
-    pfl->storage = memory_region_get_ram_ptr(&pfl->mem);
-    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &pfl->mem);
-
     if (pfl->blk) {
         uint64_t perm;
+
         pfl->ro = !blk_supports_write_perm(pfl->blk);
         perm = BLK_PERM_CONSISTENT_READ | (pfl->ro ? 0 : BLK_PERM_WRITE);
         ret = blk_set_perm(pfl->blk, perm, BLK_PERM_ALL, errp);
         if (ret < 0) {
             return;
         }
+
+        if (pfl->ro) {
+            /*
+             * A read-only flash device can utilise an underlying
+             * block device that is smaller than the flash device, but
+             * not one that is larger.
+             */
+            inner_len = blk_getlength(pfl->blk);
+
+            if (inner_len > outer_len) {
+                error_setg(errp,
+                           "block backend provides %" HWADDR_PRIu " bytes, "
+                           "device limited to %" PRIu64 " bytes",
+                           inner_len, outer_len);
+                return;
+            }
+        } else {
+            /*
+             * If the flash device is to be writable, the underlying
+             * block device must be large enough to accommodate it.
+             */
+            inner_len = outer_len;
+        }
     } else {
         pfl->ro = false;
+        inner_len = outer_len;
     }
 
-    if (pfl->blk) {
-        if (!blk_check_size_and_read_all(pfl->blk, pfl->storage, total_len,
-                                         errp)) {
-            vmstate_unregister_ram(&pfl->mem, DEVICE(pfl));
-            return;
-        }
+    trace_pflash_realize(pfl->name, pfl->ro, inner_len, outer_len);
+
+    snprintf(pfl->outer_name, sizeof(pfl->outer_name),
+             "%s container", pfl->name);
+    memory_region_init_io(&pfl->mem_outer, OBJECT(dev),
+                          &pflash_cfi01_outer_ops,
+                          pfl, pfl->outer_name, outer_len);
+
+    memory_region_init_rom_device(&pfl->mem, OBJECT(dev),
+                                  &pflash_cfi01_ops,
+                                  pfl, pfl->name, inner_len, errp);
+    if (*errp) {
+        return;
+    }
+
+    memory_region_add_subregion(&pfl->mem_outer, 0, &pfl->mem);
+
+    pfl->storage = memory_region_get_ram_ptr(&pfl->mem);
+    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &pfl->mem_outer);
+    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &pfl->mem);
+
+    if (pfl->blk &&
+        !blk_check_size_and_read_all(pfl->blk, pfl->storage, inner_len, errp)) {
+        vmstate_unregister_ram(&pfl->mem, DEVICE(pfl));
+        return;
     }
 
     /* Default to devices being used at their maximum device width. This was
diff --git a/hw/block/trace-events b/hw/block/trace-events
index a715a2e173..85b501e23e 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -17,10 +17,13 @@ pflash_erase_timeout(const char *name, int count) "%s: erase timeout fired; eras
 pflash_io_read(const char *name, uint64_t offset, unsigned int size, uint32_t value, uint8_t cmd, uint8_t wcycle) "%s: offset:0x%04" PRIx64 " size:%u value:0x%04x cmd:0x%02x wcycle:%u"
 pflash_io_write(const char *name, uint64_t offset, unsigned int size, uint32_t value, uint8_t wcycle) "%s: offset:0x%04"PRIx64" size:%u value:0x%04x wcycle:%u"
 pflash_manufacturer_id(const char *name, uint16_t id) "%s: read manufacturer ID: 0x%04x"
+pflash_outer_read(const char *name, uint64_t addr, unsigned int len) "%s: addr:0x%" PRIx64 " len:%d"
+pflash_outer_write(const char *name, uint64_t addr, unsigned int len) "%s: addr:0x%" PRIx64 " len:%d"
 pflash_postload_cb(const char *name)  "%s: updating bdrv"
 pflash_read_done(const char *name, uint64_t offset, uint64_t ret) "%s: ID:0x%" PRIx64 " ret:0x%" PRIx64
 pflash_read_status(const char *name, uint32_t ret) "%s: status:0x%x"
 pflash_read_unknown_state(const char *name, uint8_t cmd) "%s: unknown command state:0x%x"
+pflash_realize(const char *name, bool ro, uint64_t blk_len, uint64_t total_len) "%s: ro:%d blk_len:0x%" PRIx64 " total_len:0x%" PRIx64
 pflash_reset(const char *name) "%s: reset"
 pflash_sector_erase_start(const char *name, int width1, uint64_t start, int width2, uint64_t end) "%s: start sector erase at: 0x%0*" PRIx64 "-0x%0*" PRIx64
 pflash_timer_expired(const char *name, uint8_t cmd) "%s: command 0x%02x done"
-- 
2.30.0



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 0/3] hw/pflash_cfi01: Reduce memory consumption when flash image is smaller than region
  2021-02-16 14:27 [RFC PATCH 0/3] hw/pflash_cfi01: Reduce memory consumption when flash image is smaller than region David Edmondson
                   ` (2 preceding siblings ...)
  2021-02-16 14:27 ` [RFC PATCH 3/3] hw/pflash_cfi01: Allow read-only devices to have a smaller backing device David Edmondson
@ 2021-02-16 15:03 ` Philippe Mathieu-Daudé
  2021-02-16 15:22   ` David Edmondson
  3 siblings, 1 reply; 9+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-16 15:03 UTC (permalink / raw)
  To: David Edmondson, qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

On 2/16/21 3:27 PM, David Edmondson wrote:
> As described in
> https://lore.kernel.org/r/20201116104216.439650-1-david.edmondson@oracle.com,
> I'd like to reduce the amount of memory consumed by QEMU mapping UEFI
> images on aarch64.
> 
> To recap:
> 
>> Currently ARM UEFI images are typically built as 2MB/768kB flash
>> images for code and variables respectively. These images are both
>> then padded out to 64MB before being loaded by QEMU.
>>
>> Because the images are 64MB each, QEMU allocates 128MB of memory to
>> read them, and then proceeds to read all 128MB from disk (dirtying
>> the memory). Of this 128MB less than 3MB is useful - the rest is
>> zero padding.
>>
>> On a machine with 100 VMs this wastes over 12GB of memory.
> 
> There were objections to my previous patch because it changed the size
> of the regions reported to the guest via the memory map (the reported
> size depended on the size of the image).
> 
> This is a smaller patch which only helps with read-only flash images,
> as it does so by changing the memory region that covers the entire
> region to be IO rather than RAM, and loads the flash image into a
> smaller sub-region that is the more traditional mixed IO/ROMD type.
> 
> All read/write operations to areas outside of the underlying block
> device are handled directly (reads return 0, writes fail (which is
> okay, because this path only supports read-only devices)).
> 
> This reduces the memory consumption for the read-only AAVMF code image
> from 64MB to around 2MB (presuming that the UEFI image is adjusted
> accordingly). It does nothing to improve the memory consumption caused
> by the read-write AAVMF vars image.

So for each VM this changes from 64 + 64 to 2 + 64 MiB.

100 VMs now use 6.5GB instead of 400MB. Quite an improvement already :)

> There was a suggestion in a previous thread that perhaps the pflash
> driver could be re-worked to use the block IO interfaces to access the
> underlying device "on demand" rather than reading in the entire image
> at startup (at least, that's how I understood the comment).
> 
> I looked at implementing this and struggled to get it to work for all
> of the required use cases. Specifically, there are several code paths
> that expect to retrieve a pointer to the flat memory image of the
> pflash device and manipulate it directly (examples include the Malta
> board and encrypted memory support on x86), or write the entire image
> to storage (after migration).

IIUC these are specific uses when the machine is paused. For Malta we
can map a ROM instead.

I don't know about encrypted x86 machines.

> My implementation was based around mapping the flash region only for
> IO, which meant that every read or write had to be handled directly by
> the pflash driver (there was no ROMD style operation), which also made
> booting an aarch64 VM noticeably slower - getting through the firmware
> went from under 1 second to around 10 seconds.
> 
> Improving the writeable device support requires some more general
> infrastructure, I think, but I'm not familiar with everything that
> QEMU currently provides, and would be very happy to learn otherwise.

I am not a block expert, but I wonder if something like this could
be used:

- create a raw (parent) block image of 64MiB

- add a raw (child) block with your 768kB of VARS file

- add a null-co (child) block of 63Mib + 256kiB

- pass the parent block to the pflash device

Regards,

Phil.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 0/3] hw/pflash_cfi01: Reduce memory consumption when flash image is smaller than region
  2021-02-16 15:03 ` [RFC PATCH 0/3] hw/pflash_cfi01: Reduce memory consumption when flash image is smaller than region Philippe Mathieu-Daudé
@ 2021-02-16 15:22   ` David Edmondson
  2021-02-16 15:44     ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 9+ messages in thread
From: David Edmondson @ 2021-02-16 15:22 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

On Tuesday, 2021-02-16 at 16:03:05 +01, Philippe Mathieu-Daudé wrote:

> I am not a block expert, but I wonder if something like this could
> be used:
>
> - create a raw (parent) block image of 64MiB
>
> - add a raw (child) block with your 768kB of VARS file
>
> - add a null-co (child) block of 63Mib + 256kiB
>
> - pass the parent block to the pflash device

I'm not clear how this would behave if there is a write to the device at
(say) 1MiB.

More philosophically, how should it behave?

My expectation was that if the machine says that there is 64MiB of
writable flash, we have to allow writes throughout the full 64MiB and
(significantly) persist them to the backing block device.

Just because the backing block device started out 768KiB big doesn't
mean that we should not write to the remaining extent if that's what the
VM attempts.

Would the above approach achieve that? (It doesn't sound like it.)

dme.
-- 
No visible means of support and you have not seen nothing yet.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 0/3] hw/pflash_cfi01: Reduce memory consumption when flash image is smaller than region
  2021-02-16 15:22   ` David Edmondson
@ 2021-02-16 15:44     ` Philippe Mathieu-Daudé
  2021-02-16 15:53       ` David Edmondson
  0 siblings, 1 reply; 9+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-16 15:44 UTC (permalink / raw)
  To: David Edmondson, qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

On 2/16/21 4:22 PM, David Edmondson wrote:
> On Tuesday, 2021-02-16 at 16:03:05 +01, Philippe Mathieu-Daudé wrote:
> 
>> I am not a block expert, but I wonder if something like this could
>> be used:
>>
>> - create a raw (parent) block image of 64MiB
>>
>> - add a raw (child) block with your 768kB of VARS file
>>
>> - add a null-co (child) block of 63Mib + 256kiB
>>
>> - pass the parent block to the pflash device
> 
> I'm not clear how this would behave if there is a write to the device at
> (say) 1MiB.

Discarded.

> More philosophically, how should it behave?

null-co: reads return zeroes, writes are discarded.

> My expectation was that if the machine says that there is 64MiB of
> writable flash, we have to allow writes throughout the full 64MiB and
> (significantly) persist them to the backing block device.
> 
> Just because the backing block device started out 768KiB big doesn't
> mean that we should not write to the remaining extent if that's what the
> VM attempts.
> 
> Would the above approach achieve that? (It doesn't sound like it.)

Well this was a simple suggestion if you know your guest won't access
anything beside these 768KiB (IIRC AAVMF "consumes" the flash devices
and doesn't expose them to the guest via ACPI/other).

If you are into memory optimization, this is worth considering.
Else it doesn't make sense.

AAVMF is designed for virtual world. Is the virtual world interested in
doing firmware upgrade on the CODE flash? Unlikely, if you want to
upgrade AAVMF you'd do it on the host. Why not mount the CODE flash as
ROM? I suppose because AAVMF does CFI MMIO accesses to detect the flash,
but I wonder what is the point if this flash will be ever written...

Regards,

Phil.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 0/3] hw/pflash_cfi01: Reduce memory consumption when flash image is smaller than region
  2021-02-16 15:44     ` Philippe Mathieu-Daudé
@ 2021-02-16 15:53       ` David Edmondson
  2021-02-18 10:34         ` David Edmondson
  0 siblings, 1 reply; 9+ messages in thread
From: David Edmondson @ 2021-02-16 15:53 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

On Tuesday, 2021-02-16 at 16:44:58 +01, Philippe Mathieu-Daudé wrote:

> On 2/16/21 4:22 PM, David Edmondson wrote:
>> On Tuesday, 2021-02-16 at 16:03:05 +01, Philippe Mathieu-Daudé wrote:
>> 
>>> I am not a block expert, but I wonder if something like this could
>>> be used:
>>>
>>> - create a raw (parent) block image of 64MiB
>>>
>>> - add a raw (child) block with your 768kB of VARS file
>>>
>>> - add a null-co (child) block of 63Mib + 256kiB
>>>
>>> - pass the parent block to the pflash device
>> 
>> I'm not clear how this would behave if there is a write to the device at
>> (say) 1MiB.
>
> Discarded.
>
>> More philosophically, how should it behave?
>
> null-co: reads return zeroes, writes are discarded.
>
>> My expectation was that if the machine says that there is 64MiB of
>> writable flash, we have to allow writes throughout the full 64MiB and
>> (significantly) persist them to the backing block device.
>> 
>> Just because the backing block device started out 768KiB big doesn't
>> mean that we should not write to the remaining extent if that's what the
>> VM attempts.
>> 
>> Would the above approach achieve that? (It doesn't sound like it.)
>
> Well this was a simple suggestion if you know your guest won't access
> anything beside these 768KiB (IIRC AAVMF "consumes" the flash devices
> and doesn't expose them to the guest via ACPI/other).

If that's the case, mirroring the approach in the patch that I sent
should work - we run the 768kiB as a subregion with IO/ROMD behaviour,
fail or discard writes to the rest and read as zero.

> If you are into memory optimization, this is worth considering.
> Else it doesn't make sense.
>
> AAVMF is designed for virtual world. Is the virtual world interested in
> doing firmware upgrade on the CODE flash? Unlikely, if you want to
> upgrade AAVMF you'd do it on the host. Why not mount the CODE flash as
> ROM? I suppose because AAVMF does CFI MMIO accesses to detect the flash,
> but I wonder what is the point if this flash will be ever written...

I don't expect that the CODE flash will ever be written by the guest, no
(it's generally presented to the guest as read-only), and currently the
VARS flash is also often presented read-only as well, but I don't think
that is a situation that we can maintain given increasing use (DBX
updates, for example).

If it's acceptable to limit flash writes to the extent of the underlying
block device then things are straightforward. It's not clear to me that
this is the case.

dme.
-- 
She's got no name, but she is family.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 0/3] hw/pflash_cfi01: Reduce memory consumption when flash image is smaller than region
  2021-02-16 15:53       ` David Edmondson
@ 2021-02-18 10:34         ` David Edmondson
  0 siblings, 0 replies; 9+ messages in thread
From: David Edmondson @ 2021-02-18 10:34 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

On Tuesday, 2021-02-16 at 15:53:48 GMT, David Edmondson wrote:

> On Tuesday, 2021-02-16 at 16:44:58 +01, Philippe Mathieu-Daudé wrote:
>
>> On 2/16/21 4:22 PM, David Edmondson wrote:
>>> On Tuesday, 2021-02-16 at 16:03:05 +01, Philippe Mathieu-Daudé wrote:
>>> 
>>>> I am not a block expert, but I wonder if something like this could
>>>> be used:
>>>>
>>>> - create a raw (parent) block image of 64MiB
>>>>
>>>> - add a raw (child) block with your 768kB of VARS file
>>>>
>>>> - add a null-co (child) block of 63Mib + 256kiB
>>>>
>>>> - pass the parent block to the pflash device
>>> 
>>> I'm not clear how this would behave if there is a write to the device at
>>> (say) 1MiB.
>>
>> Discarded.
>>
>>> More philosophically, how should it behave?
>>
>> null-co: reads return zeroes, writes are discarded.
>>
>>> My expectation was that if the machine says that there is 64MiB of
>>> writable flash, we have to allow writes throughout the full 64MiB and
>>> (significantly) persist them to the backing block device.
>>> 
>>> Just because the backing block device started out 768KiB big doesn't
>>> mean that we should not write to the remaining extent if that's what the
>>> VM attempts.
>>> 
>>> Would the above approach achieve that? (It doesn't sound like it.)
>>
>> Well this was a simple suggestion if you know your guest won't access
>> anything beside these 768KiB (IIRC AAVMF "consumes" the flash devices
>> and doesn't expose them to the guest via ACPI/other).
>
> If that's the case, mirroring the approach in the patch that I sent
> should work - we run the 768kiB as a subregion with IO/ROMD behaviour,
> fail or discard writes to the rest and read as zero.
>
>> If you are into memory optimization, this is worth considering.
>> Else it doesn't make sense.
>>
>> AAVMF is designed for virtual world. Is the virtual world interested in
>> doing firmware upgrade on the CODE flash? Unlikely, if you want to
>> upgrade AAVMF you'd do it on the host. Why not mount the CODE flash as
>> ROM? I suppose because AAVMF does CFI MMIO accesses to detect the flash,
>> but I wonder what is the point if this flash will be ever written...
>
> I don't expect that the CODE flash will ever be written by the guest, no
> (it's generally presented to the guest as read-only), and currently the
> VARS flash is also often presented read-only as well, but I don't think
> that is a situation that we can maintain given increasing use (DBX
> updates, for example).
>
> If it's acceptable to limit flash writes to the extent of the underlying
> block device then things are straightforward. It's not clear to me that
> this is the case.

Looking at the AAVMF implementation, it seems to me that if the initial
VARS image is prepared as being 768KiB in size (which it is), none of
AAVMF itself will attempt to write outside of that extent.

If that is correct, it would support the idea of allowing writes only to
the sub-region of the 64MiB that is covered by the backing block device.

This would allow the same approach as used for read-only devices to be
used for read-write devices - it's mostly a matter of deleting some code
from the patch that I sent. The additional patch on top of the previous
series would be:
https://p.dme.org/0004-hw-pflash_cfi01-Allow-devices-to-have-a-smaller-back.patch.html
(Presumably this would just be squashed into the previous patch in
reality.)

If AAVMF is not going to try to use any of the region outside of the
VARS, do we need to persist writes to the rest of the region?

If so, would it be acceptable for those writes to persist only for the
lifetime of a QEMU instance? (Memory could be allocated "on demand" so
that a read after write would behave, but the content would be thrown
away when QEMU exits.)

dme.
-- 
Everyone I know, goes away in the end.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-02-18 10:35 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-16 14:27 [RFC PATCH 0/3] hw/pflash_cfi01: Reduce memory consumption when flash image is smaller than region David Edmondson
2021-02-16 14:27 ` [RFC PATCH 1/3] hw/pflash_cfi*: Replace DPRINTF with trace events David Edmondson
2021-02-16 14:27 ` [RFC PATCH 2/3] hw/pflash_cfi01: Correct the type of PFlashCFI01.ro David Edmondson
2021-02-16 14:27 ` [RFC PATCH 3/3] hw/pflash_cfi01: Allow read-only devices to have a smaller backing device David Edmondson
2021-02-16 15:03 ` [RFC PATCH 0/3] hw/pflash_cfi01: Reduce memory consumption when flash image is smaller than region Philippe Mathieu-Daudé
2021-02-16 15:22   ` David Edmondson
2021-02-16 15:44     ` Philippe Mathieu-Daudé
2021-02-16 15:53       ` David Edmondson
2021-02-18 10:34         ` David Edmondson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.