All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v2 0/3] hw/block/pflash: Mmap read-only backend files with MAP_SHARED
@ 2021-03-01 11:53 Philippe Mathieu-Daudé
  2021-03-01 11:53 ` [RFC PATCH v2 1/3] exec/memory: Introduce memory_region_init_rom_device_from_file() Philippe Mathieu-Daudé
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-03-01 11:53 UTC (permalink / raw)
  To: qemu-devel, David Edmondson
  Cc: Kevin Wolf, qemu-block, Xu Yandong, Markus Armbruster, Max Reitz,
	haibinzhang, Stefan Hajnoczi, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Stefano Garzarella

Since v1:
- check driver is "raw" (David)
- ignore CFI02 for now

Hi,

This series aims to reduce the memory footprint of flash devices
when the backing file is read-only.

When a backing file is read-only, the model considers the flash
is in "protected" mode. No write are allowed, but the MMIO
state machine is still usable.

This series introduces a new memory region helper to mmap files
and use it with the pflash device (only with read-only backing
files).

The goal is to reduce QEMU's memory footprint when multiple VMs
are instantiated using the same read-only backing file, which is
the case with the CODE flash from OVMF and AAVMF.

Previous attempts:

- Huawei
https://www.mail-archive.com/qemu-devel@nongnu.org/msg607292.html
- Tencent
https://www.mail-archive.com/qemu-devel@nongnu.org/msg742066.html
- Oracle
https://www.mail-archive.com/qemu-devel@nongnu.org/msg760065.html

RFC because yet another approach to tackle this technical debt,
and very little tested.

Regards,

Phil.

Philippe Mathieu-Daudé (3):
  exec/memory: Introduce memory_region_init_rom_device_from_file()
  hw/block/pflash: Move code around
  hw/block/pflash: use memory_region_init_rom_device_from_file()

 include/exec/memory.h   | 85 +++++++++++++++++++++++++++++++++++
 hw/block/pflash_cfi01.c | 49 +++++++++++++++------
 hw/block/pflash_cfi02.c | 18 ++++----
 softmmu/memory.c        | 98 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 228 insertions(+), 22 deletions(-)

-- 
2.26.2




^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC PATCH v2 1/3] exec/memory: Introduce memory_region_init_rom_device_from_file()
  2021-03-01 11:53 [RFC PATCH v2 0/3] hw/block/pflash: Mmap read-only backend files with MAP_SHARED Philippe Mathieu-Daudé
@ 2021-03-01 11:53 ` Philippe Mathieu-Daudé
  2021-03-01 11:53 ` [RFC PATCH v2 2/3] hw/block/pflash: Move code around Philippe Mathieu-Daudé
  2021-03-01 11:53 ` [RFC PATCH v2 3/3] hw/block/pflash: use memory_region_init_rom_device_from_file() Philippe Mathieu-Daudé
  2 siblings, 0 replies; 7+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-03-01 11:53 UTC (permalink / raw)
  To: qemu-devel, David Edmondson
  Cc: Kevin Wolf, qemu-block, Xu Yandong, Markus Armbruster, Max Reitz,
	haibinzhang, Stefan Hajnoczi, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Stefano Garzarella

Introduce memory_region_init_rom_device_from_file() which mmap
the backing file of ROM devices. This allows to reduce QEMU memory
footprint as the same file can be shared between multiple instances
of QEMU.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 include/exec/memory.h | 85 +++++++++++++++++++++++++++++++++++++
 softmmu/memory.c      | 98 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 183 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index c6fb714e499..bacf7495003 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -487,6 +487,9 @@ struct MemoryRegion {
     const char *name;
     unsigned ioeventfd_nb;
     MemoryRegionIoeventfd *ioeventfds;
+#ifndef CONFIG_POSIX
+    gchar *contents;
+#endif
 };
 
 struct IOMMUMemoryRegion {
@@ -1131,6 +1134,43 @@ void memory_region_init_rom_device_nomigrate(MemoryRegion *mr,
                                              uint64_t size,
                                              Error **errp);
 
+/**
+ * memory_region_init_rom_device_from_file_nomigrate:
+ * Initialize a ROM memory region from the specified backing file.
+ * Writes are handled via callbacks.
+ *
+ * Note that this function does not do anything to cause the data in the
+ * RAM side of the memory region to be migrated; that is the responsibility
+ * of the caller.
+ *
+ * @mr: the #MemoryRegion to be initialized.
+ * @owner: the object that tracks the region's reference count
+ * @ops: callbacks for write access handling (must not be NULL).
+ * @opaque: passed to the read and write callbacks of the @ops structure.
+ * @name: Region name, becomes part of RAMBlock name used in migration stream
+ *        must be unique within any device
+ * @size: size of the region.
+ * @ram_flags: specify the properties of the ram block, which can be one
+ *             or bit-or of following values
+ *             - RAM_SHARED: mmap the backing file or device with MAP_SHARED
+ *             - RAM_PMEM: the backend @mem_path is persistent memory
+ *             Other bits are ignored.
+ * @path: specify the backing file
+ * @readonly: true to open @path for reading, false for read/write.
+ * @errp: pointer to Error*, to store an error if it happens.
+ */
+void memory_region_init_rom_device_from_file_nomigrate(MemoryRegion *mr,
+                                                       Object *owner,
+                                                       const MemoryRegionOps *ops,
+                                                       void *opaque,
+                                                       const char *name,
+                                                       uint64_t size,
+                                                       uint64_t align,
+                                                       uint32_t ram_flags,
+                                                       const char *path,
+                                                       bool readonly,
+                                                       Error **errp);
+
 /**
  * memory_region_init_iommu: Initialize a memory region of a custom type
  * that translates addresses
@@ -1249,6 +1289,51 @@ void memory_region_init_rom_device(MemoryRegion *mr,
                                    Error **errp);
 
 
+/**
+ * memory_region_init_rom_device_from_file:
+ * Initialize a ROM memory region from the specified backing file.
+ * Writes are handled via callbacks.
+ *
+ * This function initializes a memory region backed by RAM for reads
+ * and callbacks for writes, and arranges for the RAM backing to
+ * be migrated (by calling vmstate_register_ram()
+ * if @owner is a DeviceState, or vmstate_register_ram_global() if
+ * @owner is NULL).
+ *
+ * TODO: Currently we restrict @owner to being either NULL (for
+ * global RAM regions with no owner) or devices, so that we can
+ * give the RAM block a unique name for migration purposes.
+ * We should lift this restriction and allow arbitrary Objects.
+ * If you pass a non-NULL non-device @owner then we will assert.
+ *
+ * @mr: the #MemoryRegion to be initialized.
+ * @owner: the object that tracks the region's reference count
+ * @ops: callbacks for write access handling (must not be NULL).
+ * @opaque: passed to the read and write callbacks of the @ops structure.
+ * @name: Region name, becomes part of RAMBlock name used in migration stream
+ *        must be unique within any device
+ * @size: size of the region.
+ * @ram_flags: specify the properties of the ram block, which can be one
+ *             or bit-or of following values
+ *             - RAM_SHARED: mmap the backing file or device with MAP_SHARED
+ *             - RAM_PMEM: the backend @mem_path is persistent memory
+ *             Other bits are ignored.
+ * @path: specify the backing file
+ * @readonly: true to open @path for reading, false for read/write.
+ * @errp: pointer to Error*, to store an error if it happens.
+ */
+void memory_region_init_rom_device_from_file(MemoryRegion *mr,
+                                             Object *owner,
+                                             const MemoryRegionOps *ops,
+                                             void *opaque,
+                                             const char *name,
+                                             uint64_t size,
+                                             uint64_t align,
+                                             uint32_t ram_flags,
+                                             const char *path,
+                                             bool readonly,
+                                             Error **errp);
+
 /**
  * memory_region_owner: get a memory region's owner.
  *
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 874a8fccdee..ea1892a8cd6 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1120,6 +1120,14 @@ static void memory_region_destructor_ram(MemoryRegion *mr)
     qemu_ram_free(mr->ram_block);
 }
 
+#ifndef CONFIG_POSIX
+static void memory_region_destructor_contents(MemoryRegion *mr)
+{
+    qemu_ram_free(mr->ram_block);
+    g_free(mr->contents);
+}
+#endif
+
 static bool memory_region_need_escape(char c)
 {
     return c == '/' || c == '[' || c == '\\' || c == ']';
@@ -1712,6 +1720,96 @@ void memory_region_init_rom_device_nomigrate(MemoryRegion *mr,
     }
 }
 
+void memory_region_init_rom_device_from_file_nomigrate(MemoryRegion *mr,
+                                                       Object *owner,
+                                                       const MemoryRegionOps *ops,
+                                                       void *opaque,
+                                                       const char *name,
+                                                       uint64_t size,
+                                                       uint64_t align,
+                                                       uint32_t ram_flags,
+                                                       const char *path,
+                                                       bool readonly,
+                                                       Error **errp)
+{
+    Error *err = NULL;
+
+    assert(ops);
+#ifdef CONFIG_POSIX
+    memory_region_init(mr, owner, name, size);
+    mr->opaque = opaque;
+    mr->ops = ops;
+    mr->rom_device = true;
+    mr->readonly = readonly;
+    mr->ram = true;
+    mr->align = align;
+    mr->terminates = true;
+    mr->destructor = memory_region_destructor_ram;
+    mr->ram_block = qemu_ram_alloc_from_file(size, mr, ram_flags, path,
+                                             readonly, &err);
+    if (err) {
+        mr->size = int128_zero();
+        object_unparent(OBJECT(mr));
+        error_propagate(errp, err);
+    }
+#else
+    g_autoptr(GError) gerr = NULL;
+    gsize len;
+
+    memory_region_init(mr, owner, name, size);
+    mr->ops = ops;
+    mr->opaque = opaque;
+    mr->terminates = true;
+    mr->rom_device = true;
+
+    if (!g_file_get_contents(path, &mr->contents, &len, &gerr)) {
+        error_setg(errp, "Unable to read '%s': %s", path, gerr->message);
+        return;
+    }
+    mr->destructor = memory_region_destructor_contents;
+    mr->contents = g_realloc(mr->contents, size);
+    mr->ram_block = qemu_ram_alloc_from_ptr(size, mr->contents, mr, &err);
+    if (err) {
+        mr->size = int128_zero();
+        object_unparent(OBJECT(mr));
+        error_propagate(errp, err);
+    }
+#endif
+}
+
+void memory_region_init_rom_device_from_file(MemoryRegion *mr,
+                                             Object *owner,
+                                             const MemoryRegionOps *ops,
+                                             void *opaque,
+                                             const char *name,
+                                             uint64_t size,
+                                             uint64_t align,
+                                             uint32_t ram_flags,
+                                             const char *path,
+                                             bool readonly,
+                                             Error **errp)
+{
+    DeviceState *owner_dev;
+    Error *err = NULL;
+
+    memory_region_init_rom_device_from_file_nomigrate(mr, owner, ops, opaque,
+                                                      name, size, align,
+                                                      ram_flags, path, readonly,
+                                                      &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    /* This will assert if owner is neither NULL nor a DeviceState.
+     * We only want the owner here for the purposes of defining a
+     * unique name for migration. TODO: Ideally we should implement
+     * a naming scheme for Objects which are not DeviceStates, in
+     * which case we can relax this restriction.
+     */
+    owner_dev = DEVICE(owner);
+    vmstate_register_ram(mr, owner_dev);
+}
+
 void memory_region_init_iommu(void *_iommu_mr,
                               size_t instance_size,
                               const char *mrtypename,
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFC PATCH v2 2/3] hw/block/pflash: Move code around
  2021-03-01 11:53 [RFC PATCH v2 0/3] hw/block/pflash: Mmap read-only backend files with MAP_SHARED Philippe Mathieu-Daudé
  2021-03-01 11:53 ` [RFC PATCH v2 1/3] exec/memory: Introduce memory_region_init_rom_device_from_file() Philippe Mathieu-Daudé
@ 2021-03-01 11:53 ` Philippe Mathieu-Daudé
  2021-03-01 11:53 ` [RFC PATCH v2 3/3] hw/block/pflash: use memory_region_init_rom_device_from_file() Philippe Mathieu-Daudé
  2 siblings, 0 replies; 7+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-03-01 11:53 UTC (permalink / raw)
  To: qemu-devel, David Edmondson
  Cc: Kevin Wolf, qemu-block, Xu Yandong, Markus Armbruster, Max Reitz,
	haibinzhang, Stefan Hajnoczi, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Stefano Garzarella

First do the block checks, so we know if it is read-only or not.
Then create the MemoryRegion. This will allow optimization in
the next commit.

Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 hw/block/pflash_cfi01.c | 24 ++++++++++++------------
 hw/block/pflash_cfi02.c | 18 +++++++++---------
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 22287a1522e..a5fa8d8b74a 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -731,18 +731,6 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
     }
     device_len = sector_len_per_device * blocks_per_device;
 
-    memory_region_init_rom_device(
-        &pfl->mem, OBJECT(dev),
-        &pflash_cfi01_ops,
-        pfl,
-        pfl->name, total_len, errp);
-    if (*errp) {
-        return;
-    }
-
-    pfl->storage = memory_region_get_ram_ptr(&pfl->mem);
-    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &pfl->mem);
-
     if (pfl->blk) {
         uint64_t perm;
         pfl->ro = !blk_supports_write_perm(pfl->blk);
@@ -755,6 +743,18 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
         pfl->ro = 0;
     }
 
+    memory_region_init_rom_device(
+        &pfl->mem, OBJECT(dev),
+        &pflash_cfi01_ops,
+        pfl,
+        pfl->name, total_len, errp);
+    if (*errp) {
+        return;
+    }
+
+    pfl->storage = memory_region_get_ram_ptr(&pfl->mem);
+    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &pfl->mem);
+
     if (pfl->blk) {
         if (!blk_check_size_and_read_all(pfl->blk, pfl->storage, total_len,
                                          errp)) {
diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
index 7962cff7455..4f62ce8917d 100644
--- a/hw/block/pflash_cfi02.c
+++ b/hw/block/pflash_cfi02.c
@@ -791,15 +791,6 @@ static void pflash_cfi02_realize(DeviceState *dev, Error **errp)
         return;
     }
 
-    memory_region_init_rom_device(&pfl->orig_mem, OBJECT(pfl),
-                                  &pflash_cfi02_ops, pfl, pfl->name,
-                                  pfl->chip_len, errp);
-    if (*errp) {
-        return;
-    }
-
-    pfl->storage = memory_region_get_ram_ptr(&pfl->orig_mem);
-
     if (pfl->blk) {
         uint64_t perm;
         pfl->ro = !blk_supports_write_perm(pfl->blk);
@@ -812,6 +803,15 @@ static void pflash_cfi02_realize(DeviceState *dev, Error **errp)
         pfl->ro = 0;
     }
 
+    memory_region_init_rom_device(&pfl->orig_mem, OBJECT(pfl),
+                                  &pflash_cfi02_ops, pfl, pfl->name,
+                                  pfl->chip_len, errp);
+    if (*errp) {
+        return;
+    }
+
+    pfl->storage = memory_region_get_ram_ptr(&pfl->orig_mem);
+
     if (pfl->blk) {
         if (!blk_check_size_and_read_all(pfl->blk, pfl->storage,
                                          pfl->chip_len, errp)) {
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFC PATCH v2 3/3] hw/block/pflash: use memory_region_init_rom_device_from_file()
  2021-03-01 11:53 [RFC PATCH v2 0/3] hw/block/pflash: Mmap read-only backend files with MAP_SHARED Philippe Mathieu-Daudé
  2021-03-01 11:53 ` [RFC PATCH v2 1/3] exec/memory: Introduce memory_region_init_rom_device_from_file() Philippe Mathieu-Daudé
  2021-03-01 11:53 ` [RFC PATCH v2 2/3] hw/block/pflash: Move code around Philippe Mathieu-Daudé
@ 2021-03-01 11:53 ` Philippe Mathieu-Daudé
  2021-03-01 18:13   ` Stefan Hajnoczi
  2 siblings, 1 reply; 7+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-03-01 11:53 UTC (permalink / raw)
  To: qemu-devel, David Edmondson
  Cc: Kevin Wolf, qemu-block, Xu Yandong, Markus Armbruster, Max Reitz,
	haibinzhang, Stefan Hajnoczi, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Stefano Garzarella

If the block drive is read-only we will model a "protected" flash
device. We can thus use memory_region_init_rom_device_from_file()
which mmap the backing file when creating the MemoryRegion.
If the same backing file is used by multiple QEMU instances, this
reduces the memory footprint (this is often the case with the
CODE flash image from OVMF and AAVMF).

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 hw/block/pflash_cfi01.c | 39 +++++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index a5fa8d8b74a..ec290636298 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -702,6 +702,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
     int ret;
     uint64_t blocks_per_device, sector_len_per_device, device_len;
     int num_devices;
+    bool romd_mr_shared_mapped;
 
     if (pfl->sector_len == 0) {
         error_setg(errp, "attribute \"sector-length\" not specified or zero.");
@@ -743,19 +744,41 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
         pfl->ro = 0;
     }
 
-    memory_region_init_rom_device(
-        &pfl->mem, OBJECT(dev),
-        &pflash_cfi01_ops,
-        pfl,
-        pfl->name, total_len, errp);
-    if (*errp) {
-        return;
+    if (pfl->ro && pfl->blk) {
+        BlockDriverState *bs = blk_bs(pfl->blk);
+
+        /* If "raw" driver used, try to mmap the backing file as RAM_SHARED */
+        if (bs->drv == &bdrv_raw) { /* FIXME check offset=0 ? */
+            Error *local_err = NULL;
+
+            memory_region_init_rom_device_from_file(&pfl->mem, OBJECT(dev),
+                                                    &pflash_cfi01_ops, pfl,
+                                                    pfl->name, total_len,
+                                                    qemu_real_host_page_size,
+                                                    RAM_SHARED,
+                                                    bs->exact_filename,
+                                                    true, &local_err);
+            if (local_err) {
+                error_report_err(local_err);
+                /* fall back to memory_region_init_rom_device() */
+            } else {
+                romd_mr_shared_mapped = true;
+            }
+        }
+    }
+    if (!romd_mr_shared_mapped) {
+        memory_region_init_rom_device(&pfl->mem, OBJECT(dev),
+                                      &pflash_cfi01_ops, pfl,
+                                      pfl->name, total_len, errp);
+        if (*errp) {
+            return;
+        }
     }
 
     pfl->storage = memory_region_get_ram_ptr(&pfl->mem);
     sysbus_init_mmio(SYS_BUS_DEVICE(dev), &pfl->mem);
 
-    if (pfl->blk) {
+    if (pfl->blk && !romd_mr_shared_mapped) {
         if (!blk_check_size_and_read_all(pfl->blk, pfl->storage, total_len,
                                          errp)) {
             vmstate_unregister_ram(&pfl->mem, DEVICE(pfl));
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH v2 3/3] hw/block/pflash: use memory_region_init_rom_device_from_file()
  2021-03-01 11:53 ` [RFC PATCH v2 3/3] hw/block/pflash: use memory_region_init_rom_device_from_file() Philippe Mathieu-Daudé
@ 2021-03-01 18:13   ` Stefan Hajnoczi
  2021-03-02  7:04     ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Hajnoczi @ 2021-03-01 18:13 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Kevin Wolf, qemu-block, Xu Yandong, Markus Armbruster,
	qemu-devel, David Edmondson, haibinzhang, Paolo Bonzini,
	Max Reitz, Stefano Garzarella

[-- Attachment #1: Type: text/plain, Size: 4467 bytes --]

On Mon, Mar 01, 2021 at 12:53:29PM +0100, Philippe Mathieu-Daudé wrote:
> If the block drive is read-only we will model a "protected" flash
> device. We can thus use memory_region_init_rom_device_from_file()
> which mmap the backing file when creating the MemoryRegion.
> If the same backing file is used by multiple QEMU instances, this
> reduces the memory footprint (this is often the case with the
> CODE flash image from OVMF and AAVMF).
> 
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
>  hw/block/pflash_cfi01.c | 39 +++++++++++++++++++++++++++++++--------
>  1 file changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
> index a5fa8d8b74a..ec290636298 100644
> --- a/hw/block/pflash_cfi01.c
> +++ b/hw/block/pflash_cfi01.c
> @@ -702,6 +702,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
>      int ret;
>      uint64_t blocks_per_device, sector_len_per_device, device_len;
>      int num_devices;
> +    bool romd_mr_shared_mapped;
>  
>      if (pfl->sector_len == 0) {
>          error_setg(errp, "attribute \"sector-length\" not specified or zero.");
> @@ -743,19 +744,41 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
>          pfl->ro = 0;
>      }
>  
> -    memory_region_init_rom_device(
> -        &pfl->mem, OBJECT(dev),
> -        &pflash_cfi01_ops,
> -        pfl,
> -        pfl->name, total_len, errp);
> -    if (*errp) {
> -        return;
> +    if (pfl->ro && pfl->blk) {
> +        BlockDriverState *bs = blk_bs(pfl->blk);
> +
> +        /* If "raw" driver used, try to mmap the backing file as RAM_SHARED */
> +        if (bs->drv == &bdrv_raw) { /* FIXME check offset=0 ? */

Bypassing the block layer is tricky because there are a lot of features
that conflict (you already pointed out the offset= option). Checking
bdrv_raw is not enough because the underlying protocol driver could be
GlusterFS, iSCSI, etc.

I think the goal here is to avoid changing the command-line/QMP so that
users don't need to modify their guests. Therefore changing the pflash
qdev properties is not desirable (we could have added a separate code
path that bypasses the block layer cleanly). This seems like a
worthwhile optimization that the block layer should support. I suggest
adding a new API like:

  /* Returns a filename string if @blk supports read-only mmap */
  char *blk_get_read_only_mmap_filename(BlockBackend *blk, Error **errp);

Then block/raw-format.c would forward the call to bs->file and
block/raw-posix.c would implement it by returning a new filename string
when bs->read_only is true.

FWIW this API isn't perfect because the file could be reopened with QMP
and the existing mmap would remain in place.

> +            Error *local_err = NULL;
> +
> +            memory_region_init_rom_device_from_file(&pfl->mem, OBJECT(dev),
> +                                                    &pflash_cfi01_ops, pfl,
> +                                                    pfl->name, total_len,
> +                                                    qemu_real_host_page_size,
> +                                                    RAM_SHARED,
> +                                                    bs->exact_filename,
> +                                                    true, &local_err);
> +            if (local_err) {
> +                error_report_err(local_err);
> +                /* fall back to memory_region_init_rom_device() */
> +            } else {
> +                romd_mr_shared_mapped = true;
> +            }
> +        }
> +    }
> +    if (!romd_mr_shared_mapped) {
> +        memory_region_init_rom_device(&pfl->mem, OBJECT(dev),
> +                                      &pflash_cfi01_ops, pfl,
> +                                      pfl->name, total_len, errp);
> +        if (*errp) {
> +            return;
> +        }
>      }
>  
>      pfl->storage = memory_region_get_ram_ptr(&pfl->mem);
>      sysbus_init_mmio(SYS_BUS_DEVICE(dev), &pfl->mem);
>  
> -    if (pfl->blk) {
> +    if (pfl->blk && !romd_mr_shared_mapped) {
>          if (!blk_check_size_and_read_all(pfl->blk, pfl->storage, total_len,
>                                           errp)) {
>              vmstate_unregister_ram(&pfl->mem, DEVICE(pfl));
> -- 
> 2.26.2
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH v2 3/3] hw/block/pflash: use memory_region_init_rom_device_from_file()
  2021-03-01 18:13   ` Stefan Hajnoczi
@ 2021-03-02  7:04     ` Philippe Mathieu-Daudé
  2021-03-02 10:54       ` Stefan Hajnoczi
  0 siblings, 1 reply; 7+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-03-02  7:04 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, qemu-block, Xu Yandong, Markus Armbruster,
	qemu-devel, David Edmondson, haibinzhang, Paolo Bonzini,
	Max Reitz, Stefano Garzarella

On 3/1/21 7:13 PM, Stefan Hajnoczi wrote:
> On Mon, Mar 01, 2021 at 12:53:29PM +0100, Philippe Mathieu-Daudé wrote:
>> If the block drive is read-only we will model a "protected" flash
>> device. We can thus use memory_region_init_rom_device_from_file()
>> which mmap the backing file when creating the MemoryRegion.
>> If the same backing file is used by multiple QEMU instances, this
>> reduces the memory footprint (this is often the case with the
>> CODE flash image from OVMF and AAVMF).
>>
>> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
>> ---
>>  hw/block/pflash_cfi01.c | 39 +++++++++++++++++++++++++++++++--------
>>  1 file changed, 31 insertions(+), 8 deletions(-)
>>
>> diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
>> index a5fa8d8b74a..ec290636298 100644
>> --- a/hw/block/pflash_cfi01.c
>> +++ b/hw/block/pflash_cfi01.c
>> @@ -702,6 +702,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
>>      int ret;
>>      uint64_t blocks_per_device, sector_len_per_device, device_len;
>>      int num_devices;
>> +    bool romd_mr_shared_mapped;
>>  
>>      if (pfl->sector_len == 0) {
>>          error_setg(errp, "attribute \"sector-length\" not specified or zero.");
>> @@ -743,19 +744,41 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
>>          pfl->ro = 0;
>>      }
>>  
>> -    memory_region_init_rom_device(
>> -        &pfl->mem, OBJECT(dev),
>> -        &pflash_cfi01_ops,
>> -        pfl,
>> -        pfl->name, total_len, errp);
>> -    if (*errp) {
>> -        return;
>> +    if (pfl->ro && pfl->blk) {
>> +        BlockDriverState *bs = blk_bs(pfl->blk);
>> +
>> +        /* If "raw" driver used, try to mmap the backing file as RAM_SHARED */
>> +        if (bs->drv == &bdrv_raw) { /* FIXME check offset=0 ? */
> 
> Bypassing the block layer is tricky because there are a lot of features
> that conflict (you already pointed out the offset= option). Checking
> bdrv_raw is not enough because the underlying protocol driver could be
> GlusterFS, iSCSI, etc.

OK.

> I think the goal here is to avoid changing the command-line/QMP so that
> users don't need to modify their guests. Therefore changing the pflash
> qdev properties is not desirable (we could have added a separate code
> path that bypasses the block layer cleanly).

Yes, this is the limitation.

> This seems like a
> worthwhile optimization that the block layer should support. I suggest
> adding a new API like:
> 
>   /* Returns a filename string if @blk supports read-only mmap */
>   char *blk_get_read_only_mmap_filename(BlockBackend *blk, Error **errp);
> 
> Then block/raw-format.c would forward the call to bs->file and
> block/raw-posix.c would implement it by returning a new filename string
> when bs->read_only is true.

Thanks :) Kevin suggested something similar too.

> 
> FWIW this API isn't perfect because the file could be reopened with QMP
> and the existing mmap would remain in place.

Can you show me a QMP example or point me at the command?
This shouldn't happen with the pflash.

Thanks for reviewing,

Phil.



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH v2 3/3] hw/block/pflash: use memory_region_init_rom_device_from_file()
  2021-03-02  7:04     ` Philippe Mathieu-Daudé
@ 2021-03-02 10:54       ` Stefan Hajnoczi
  0 siblings, 0 replies; 7+ messages in thread
From: Stefan Hajnoczi @ 2021-03-02 10:54 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Kevin Wolf, qemu-block, Xu Yandong, Markus Armbruster,
	qemu-devel, David Edmondson, haibinzhang, Paolo Bonzini,
	Max Reitz, Stefano Garzarella

[-- Attachment #1: Type: text/plain, Size: 1065 bytes --]

On Tue, Mar 02, 2021 at 08:04:46AM +0100, Philippe Mathieu-Daudé wrote:
> On 3/1/21 7:13 PM, Stefan Hajnoczi wrote:
> > On Mon, Mar 01, 2021 at 12:53:29PM +0100, Philippe Mathieu-Daudé wrote:
> > FWIW this API isn't perfect because the file could be reopened with QMP
> > and the existing mmap would remain in place.
> 
> Can you show me a QMP example or point me at the command?

x-blockdev-change and other commands can reopen or reconfigure the
BlockDriverState graph - the mmap user would not be aware of this.

For example, block_set_io_throttle won't take effect if the guest has
the device mmapped.

> This shouldn't happen with the pflash.

It's not possible to say that because pflash has a
DEFINE_PROP_DRIVE("drive") property. The storage is backed by a
--drive/--blockdev and the user could send any QMP command that operates
on drives :(.

Users probably won't but there is nothing stopping them.

The block layer has a permission system (BLK_PERM_*). Maybe it's
possible to use it to lock a BDS while mmap is active?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-03-02 10:55 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-01 11:53 [RFC PATCH v2 0/3] hw/block/pflash: Mmap read-only backend files with MAP_SHARED Philippe Mathieu-Daudé
2021-03-01 11:53 ` [RFC PATCH v2 1/3] exec/memory: Introduce memory_region_init_rom_device_from_file() Philippe Mathieu-Daudé
2021-03-01 11:53 ` [RFC PATCH v2 2/3] hw/block/pflash: Move code around Philippe Mathieu-Daudé
2021-03-01 11:53 ` [RFC PATCH v2 3/3] hw/block/pflash: use memory_region_init_rom_device_from_file() Philippe Mathieu-Daudé
2021-03-01 18:13   ` Stefan Hajnoczi
2021-03-02  7:04     ` Philippe Mathieu-Daudé
2021-03-02 10:54       ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.