All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/1] Exogenous memory management via -mem-path
@ 2012-11-20 12:10 Peter Feiner
  2012-11-20 12:10 ` [Qemu-devel] [PATCH 1/1] exec: make -mem-path filenames deterministic Peter Feiner
  2013-01-02 19:34 ` [Qemu-devel] [PATCH 0/1] Exogenous memory management via -mem-path Peter Feiner
  0 siblings, 2 replies; 11+ messages in thread
From: Peter Feiner @ 2012-11-20 12:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: andreslc, aliguori, peter

This patch proposes a framework that empowers QEMU to implement powerful memory
management techniques with very low cost. The insight is to leverage existing
sub-systems, composing functionality in a traditional UNIX style. The key to
enabling this approach is to expand the -mem-path option to provide more control
over how backing memory files are named, opened, and mapped.

Two seemingly complex memory management techniques can now be achieved quite
simply:

    1. Post-copy migration.

        A. Boot a VM with RAM backed by MAP_SHARED files.
        B. Pause the VM and save device state.
        C. Share RAM files using a network file system.
        D. Copy saved device state to target host.
        E. On the target host, create a VM with the saved device state. Back RAM
           with files on the network mount.

        As the VM runs on the target, pages are loaded on demand by the
        network file system.

    2. VM cloning with shared CoW memory

        A. Boot a parent VM, run for a while, then pause.
        B. Save the parent's device state.
        C. Create child VMs from the saved state. Back RAM with MAP_PRIVATE
           overlays of the parent's RAM.

        Pages are shared by the children until they're COW'd.

        Note that kernel shared memory (KSM, or TPS - transparent page sharing -
        in VMWare parlance) is complimentary to cloning. KSM discovers pages
        that happen to be identical and shares them. Although KSM can find all
        redundant pages, doing so comes at a cost. Cloning, on the other hand,
        exploits sharing opportunities that it knows about for free. However, as
        children are longer lived and COW more pages, the memory savings due to
        cloning diminish.

By simply granting more control over the manipulation of backing RAM files, QEMU
can enable these features *with no additional costs or changes to its internal
memory management*.

Thanks to MMU notifiers, kvm is already compatible with all file systems and
MAP_SHARED and MAP_PRIVATE. Furthermore, thanks to -mem-path, QEMU is already
wired to back RAMBlocks with files. However, since these backing files are
currently randomly named (i.e., using mkstemp), it's impossible to know the
correspondence between different VMs' backing files.

This patch's essential feature is making the names of the RAMBlocks' backing
files deterministic and unique. Then, given predictable backing file names, this
patch makes QEMU configurable to use existing RAMBlock backing files. These new
features are exposed to the user via arguments to the -mem-path command:

-mem-path \
    path[,mode=temp|open|create][,mmap=private|shared][,nofallback][,anyfs]

A detailed explanation of each argument follows.

mode:

    Naming of the RAMBlock filenames is controlled by the new mode argument for
    -mem-path. When mode is "create", files with deterministic names are created
    and opened. When mode is "temp", which is the default, the names are random
    and the files are unlinked, as per the current behavior of -mem-path. The
    "open" mode is the same as "create" but it fails if the files don't already
    exist; the "open" mode is included as a sanity checking measure when
    creating children.

    To make backing files' names' deterministic and unique, a RAMBlock's offset
    is included in its backing file's name. The offset is a function of the
    order in which RAMBlocks are allocated and their size, which, in turn, is a
    function of the order in which devices are initialized. Hence, given the
    same QEMU command line, two QEMU processes will have RAMBlocks with the same
    offsets. Note that Xen's live migration also relies on RAMBlocks having the
    same offsets between QEMU processes (see xen_read_physmap).

    In addition to the offset, the size of the RamBlock and the (non-unique)
    name of the RamBlock's MemoryRegion included in the file name:

    qemu_back_mem.OFFSET+SIZE.NAME[.RANDOM]

    The size and name are included for extra sanity checking when the -mem-path
    mode is "open". The optional random suffix is added when the mode is "temp".

mmap:

    Whether the RAMBlock files are mmap'd with MAP_PRIVATE or MAP_SHARED is
    governed by the mmap argument. As explained above, MAP_PRIVATE is necessary
    for sharing memory between children. The motivation for MAP_SHARED is to
    make saving the parent's memory state a zero-cost operation during step 2B
    of the clone algorithm by using the parent's RAMBlock files in place for the
    children. This is best illustrated by an example:

    $ mkdir /tmp/mem
    $ qemu -mempath /tmp/mem,mode=create,mmap=shared -m 8g -qmp stdio disk.qcow2
    # Warm up the parent (e.g., fill buffer cache, load apps) ...

    # In the parent's qmp shell:
    {"execute": "qmp_capabilities"}
    {"execute": "stop"}
    {"execute": "xen-save-devices-state", "arguments": {"filename": "devices"}}
    {"execute": "quit"}

    # Now we have a tiny file with the parent's device state and
    # and a bunch of memory backing files:
    $ ls -lh devices
    ...  80K ... devices
    $ ls -lh /tmp/mem
    ... 8.0G ... qemu_back_mem.0+200000000.pc.ram
    ... 128K ... qemu_back_mem.200000000+20000.pc.bios
    ... 128K ... qemu_back_mem.200020000+20000.pc.rom
    ... 8.0M ... qemu_back_mem.200040000+800000.vga.vram
    ...  64K ... qemu_back_mem.200840000+10000.cirrus_vga.rom
    ... 128K ... qemu_back_mem.200850000+20000.e1000.rom

    # Launch the children (mode=open, mmap=private):
    $ for i in $(seq 100); do \
        qemu-img create -o backing_file=disk.qcow2 -f qcow2 child$i.qcow2; \
        qemu -mem-path /tmp/memory,mode=open,mmap=private \
             -incoming "exec:cat devices" \
             -m 8g \
             child$i.qcow2 &; \
      done
    # Now we have 100 8GB VMs sharing memory!

nofallback:

    Abort if a RAMBlock can't be backed by a file. Currently, QEMU falls back to
    anonymous VMAs. This fallback only makes sense if you're using -mem-path as
    a transparent optimization (as is the case with hugetlbfs).

anyfs:
    
    -mem-path currently prints a warning if the specified path isn't on a
    hugetlbfs mount. This option squelches that warning.

In conclusion, by constraining -mem-path's implementation with deterministic
file names and control over mmap flags, QEMU enables efficient VM cloning.
Moreover, novel memory-management techniques can be implemented without adding
any complexity to QEMU. For example, the example becomes post-copy live
migration if /tmp/mem is shared using NFS and a child is launched on another
host.

Peter Feiner (1):
  exec: make -mem-path filenames deterministic

 cpu-all.h       |    5 ++++
 exec.c          |   60 ++++++++++++++++++++++++++++++++++--------------------
 qemu-config.c   |   26 +++++++++++++++++++++++
 qemu-options.hx |   24 +++++++++++++++++++--
 vl.c            |   43 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 131 insertions(+), 27 deletions(-)

-- 
1.7.5.4

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH 1/1] exec: make -mem-path filenames deterministic
  2012-11-20 12:10 [Qemu-devel] [PATCH 0/1] Exogenous memory management via -mem-path Peter Feiner
@ 2012-11-20 12:10 ` Peter Feiner
  2012-11-29 16:29   ` Peter Feiner
  2013-01-07 19:55   ` Anthony Liguori
  2013-01-02 19:34 ` [Qemu-devel] [PATCH 0/1] Exogenous memory management via -mem-path Peter Feiner
  1 sibling, 2 replies; 11+ messages in thread
From: Peter Feiner @ 2012-11-20 12:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: andreslc, aliguori, peter

This patch makes the -mem-path filenames deterministic and allows some control
over how QEMU mmaps the files. Given this control, QEMU can be used to implement
exogenous memory management techniques quite simply. Two examples follow:

    1. Post-copy migration (mmap=shared for origin, save device state, path in
       NFS available on both hosts, mmap=shared for destination).
    2. memory sharing via VM cloning (mmap=shared for master, save device state,
       then mmap=private for clones).

These new features are exposed via arguments to -mem-path:

-mem-path \
    path[,mode=temp|open|create][,mmap=private|shared][,nofallback][,anyfs]

The backing file names are made deterministic by including their RAMBlocks'
offsets, which are unique given a qemu command line. Note that Xen's live
migration relies on RAMBlocks having the same offsets between QEMU processes
(see xen_read_physmap). The new file name format is

    qemu_back_mem.OFFSET+SIZE.NAME[.RANDOM]

where SIZE and NAME are added for extra sanity checking when mode="open".

By default, the old -mem-path behavior is preserved. I.e., mode="temp" is used,
which adds a random suffix to the deterministic name and unlinks the files.

Signed-off-by: Peter Feiner <peter@gridcentric.ca>
---
 cpu-all.h       |    5 ++++
 exec.c          |   60 ++++++++++++++++++++++++++++++++++--------------------
 qemu-config.c   |   26 +++++++++++++++++++++++
 qemu-options.hx |   24 +++++++++++++++++++--
 vl.c            |   43 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 131 insertions(+), 27 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index c9c51b8..a83f71e 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -500,6 +500,11 @@ typedef struct RAMList {
 extern RAMList ram_list;
 
 extern const char *mem_path;
+extern int mem_temp;
+extern int mem_create;
+extern int mem_fallback;
+extern int mem_shared;
+extern int mem_hugetlbfs;
 extern int mem_prealloc;
 
 /* Flags stored in the low bits of the TLB virtual address.  These are
diff --git a/exec.c b/exec.c
index 8435de0..75b146c 100644
--- a/exec.c
+++ b/exec.c
@@ -2346,7 +2346,7 @@ void qemu_flush_coalesced_mmio_buffer(void)
 
 #define HUGETLBFS_MAGIC       0x958458f6
 
-static long gethugepagesize(const char *path)
+static long getfspagesize(const char *path, int want_hugetlbfs)
 {
     struct statfs fs;
     int ret;
@@ -2360,7 +2360,7 @@ static long gethugepagesize(const char *path)
         return 0;
     }
 
-    if (fs.f_type != HUGETLBFS_MAGIC)
+    if (want_hugetlbfs && fs.f_type != HUGETLBFS_MAGIC)
         fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
 
     return fs.f_bsize;
@@ -2373,17 +2373,15 @@ static void *file_ram_alloc(RAMBlock *block,
     char *filename;
     void *area;
     int fd;
-#ifdef MAP_POPULATE
     int flags;
-#endif
-    unsigned long hpagesize;
+    unsigned long pagesize;
 
-    hpagesize = gethugepagesize(path);
-    if (!hpagesize) {
+    pagesize = getfspagesize(path, mem_hugetlbfs);
+    if (!pagesize) {
         return NULL;
     }
 
-    if (memory < hpagesize) {
+    if (memory < pagesize) {
         return NULL;
     }
 
@@ -2392,20 +2390,35 @@ static void *file_ram_alloc(RAMBlock *block,
         return NULL;
     }
 
-    if (asprintf(&filename, "%s/qemu_back_mem.XXXXXX", path) == -1) {
+    if (asprintf(&filename, "%s/qemu_back_mem.%"PRIx64"+%"PRIx64".%s",
+                 path, block->offset, memory, block->mr->name) == -1) {
         return NULL;
     }
 
-    fd = mkstemp(filename);
+    if (mem_temp) {
+        if (asprintf(&filename, "%s.XXXXXX", filename) == -1) {
+            return NULL;
+        }
+        fd = mkstemp(filename);
+    } else {
+        flags = O_RDWR | O_CLOEXEC | (mem_create ? O_CREAT : 0);
+        fd = open(filename, flags, 0600);
+    }
+
     if (fd < 0) {
-        perror("unable to create backing store for hugepages");
+        fprintf(stderr, "unable to create backing store %s "
+                "for guest memory: %s\n", filename, strerror(errno));
         free(filename);
         return NULL;
     }
-    unlink(filename);
+
+    if (mem_temp) {
+        unlink(filename);
+    }
+
     free(filename);
 
-    memory = (memory+hpagesize-1) & ~(hpagesize-1);
+    memory = (memory+pagesize-1) & ~(pagesize-1);
 
     /*
      * ftruncate is not supported by hugetlbfs in older
@@ -2416,16 +2429,16 @@ static void *file_ram_alloc(RAMBlock *block,
     if (ftruncate(fd, memory))
         perror("ftruncate");
 
+    flags = mem_shared ? MAP_SHARED : MAP_PRIVATE;
 #ifdef MAP_POPULATE
     /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
-     * MAP_PRIVATE is requested.  For mem_prealloc we mmap as MAP_SHARED
+     * MAP_PRIVATE is requested.  For mem_prealloc we always mmap as MAP_SHARED
      * to sidestep this quirk.
      */
-    flags = mem_prealloc ? MAP_POPULATE | MAP_SHARED : MAP_PRIVATE;
-    area = mmap(0, memory, PROT_READ | PROT_WRITE, flags, fd, 0);
-#else
-    area = mmap(0, memory, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+    if (mem_prealloc)
+        flags = MAP_POPULATE | MAP_SHARED;
 #endif
+    area = mmap(0, memory, PROT_READ | PROT_WRITE, flags, fd, 0);
     if (area == MAP_FAILED) {
         perror("file_ram_alloc: can't mmap RAM pages");
         close(fd);
@@ -2561,6 +2574,10 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
 #if defined (__linux__) && !defined(TARGET_S390X)
             new_block->host = file_ram_alloc(new_block, size, mem_path);
             if (!new_block->host) {
+                if (!mem_fallback) {
+                    fprintf(stderr, "failed to allocate ram file for %s\n", mr->name);
+                    exit(1);
+                }
                 new_block->host = qemu_vmalloc(size);
                 memory_try_enable_merging(new_block->host, size);
             }
@@ -2675,11 +2692,10 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
                 if (mem_path) {
 #if defined(__linux__) && !defined(TARGET_S390X)
                     if (block->fd) {
+                        flags = mem_shared ? MAP_SHARED : MAP_PRIVATE;
 #ifdef MAP_POPULATE
-                        flags |= mem_prealloc ? MAP_POPULATE | MAP_SHARED :
-                            MAP_PRIVATE;
-#else
-                        flags |= MAP_PRIVATE;
+                        if (mem_prealloc)
+                            flags = MAP_POPULATE | MAP_SHARED;
 #endif
                         area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
                                     flags, block->fd, offset);
diff --git a/qemu-config.c b/qemu-config.c
index 10d1ba4..45b24d2 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -691,6 +691,31 @@ static QemuOptsList qemu_object_opts = {
     },
 };
 
+static QemuOptsList qemu_mempath_opts = {
+    .name = "mempath",
+    .implied_opt_name = "path",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_mempath_opts.head),
+    .desc = {
+        {
+            .name = "path",
+            .type = QEMU_OPT_STRING,
+        },{
+            .name = "mmap",
+            .type = QEMU_OPT_STRING,
+        },{
+            .name = "mode",
+            .type = QEMU_OPT_STRING,
+        },{
+            .name = "anyfs",
+            .type = QEMU_OPT_BOOL,
+        },{
+            .name = "fallback",
+            .type = QEMU_OPT_BOOL,
+        },
+        { /* end of list */ }
+    },
+};
+
 static QemuOptsList *vm_config_groups[32] = {
     &qemu_drive_opts,
     &qemu_chardev_opts,
@@ -709,6 +734,7 @@ static QemuOptsList *vm_config_groups[32] = {
     &qemu_sandbox_opts,
     &qemu_add_fd_opts,
     &qemu_object_opts,
+    &qemu_mempath_opts,
     NULL,
 };
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 9bb29d3..8972397 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -443,10 +443,28 @@ gigabytes respectively.
 ETEXI
 
 DEF("mem-path", HAS_ARG, QEMU_OPTION_mempath,
-    "-mem-path FILE  provide backing storage for guest RAM\n", QEMU_ARCH_ALL)
+    "-mem-path path[,mmap=private|shared][,mode=temp|open|create][,nofallback][,anyfs]\n"
+    "          provide backing storage for guest RAM\n",
+    QEMU_ARCH_ALL)
 STEXI
-@item -mem-path @var{path}
-Allocate guest RAM from a temporarily created file in @var{path}.
+@item -mem-path @var{path}[,mmap=private|shared][,mode=temp|open|create][,nofallback][,anyfs]
+Allocate guest RAM from files in @var{path}.
+@item mmap=private|shared
+Specifies how backing files are mmap'd. "private" indicates a COW mapping, thus
+leaving the underlying file unchanged. "shared" indicates a write-through
+mapping, thus reflecting the guest's memory in the underlying file. Default is
+private.
+@item mode=temp|open|create
+Determines how backing files are created and opened. "temp" indicates that
+unique temporary files are created in @var{path} and unlinked after opening.
+Both "create" and "open" indicate that deterministic names are used and the
+files aren't unlinked. The The difference between "create" and "open" is that
+"open" expects the files to already exist. Default is temp.
+@item nofallback
+Fail if backing files cannot be used for guest RAM (e.g., permission error, bad
+path). By default, RAM allocation falls back to normal method.
+@item anyfs
+Suppressing warnings if @var{path} isn't on a hugetlbfs filesystem.
 ETEXI
 
 #ifdef MAP_POPULATE
diff --git a/vl.c b/vl.c
index c8e9c78..3b7cea7 100644
--- a/vl.c
+++ b/vl.c
@@ -185,6 +185,11 @@ static int display_remote;
 const char* keyboard_layout = NULL;
 ram_addr_t ram_size;
 const char *mem_path = NULL;
+int mem_shared = 0;
+int mem_hugetlbfs = 1;
+int mem_temp = 1;
+int mem_create = 1;
+int mem_fallback = 1;
 #ifdef MAP_POPULATE
 int mem_prealloc = 0; /* force preallocation of physical target memory */
 #endif
@@ -2938,9 +2943,43 @@ int main(int argc, char **argv, char **envp)
                 }
                 break;
             }
-            case QEMU_OPTION_mempath:
-                mem_path = optarg;
+            case QEMU_OPTION_mempath: {
+                const char *value;
+                opts = qemu_opts_parse(qemu_find_opts("mempath"), optarg, 1);
+                if (!opts) {
+                    exit(1);
+                }
+
+                mem_path = qemu_opt_get(opts, "path");
+                mem_hugetlbfs = !qemu_opt_get_bool(opts, "anyfs", 0);
+                mem_fallback = qemu_opt_get_bool(opts, "fallback", 1);
+
+                value = qemu_opt_get(opts, "mode");
+                if (value == NULL || !strcmp(value, "temp")) {
+                    mem_temp = 1;
+                } else {
+                    mem_temp = 0;
+                    if (!strcmp(value, "create")) {
+                        mem_create = 1;
+                    } else if (!strcmp(value, "open")) {
+                        mem_create = 0;
+                    } else {
+                        fprintf(stderr, "Invalid -mem-path mode value.\n");
+                        exit(1);
+                    }
+                }
+
+                value = qemu_opt_get(opts, "mmap");
+                if (value == NULL || !strcmp(value, "private")) {
+                    mem_shared = 0;
+                } else if (!strcmp(value, "shared")) {
+                    mem_shared = 1;
+                } else {
+                    fprintf(stderr, "Invalid -mem-path mmap value.\n");
+                    exit(1);
+                }
                 break;
+            }
 #ifdef MAP_POPULATE
             case QEMU_OPTION_mem_prealloc:
                 mem_prealloc = 1;
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 1/1] exec: make -mem-path filenames deterministic
  2012-11-20 12:10 ` [Qemu-devel] [PATCH 1/1] exec: make -mem-path filenames deterministic Peter Feiner
@ 2012-11-29 16:29   ` Peter Feiner
  2013-01-07 19:55   ` Anthony Liguori
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Feiner @ 2012-11-29 16:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Andres Lagar-Cavilla, aliguori, Peter Feiner

[-- Attachment #1: Type: text/plain, Size: 12416 bytes --]

Ping?

P.S. Please note a typo in the cover letter: when I referred to KSM, I
meant "kernel samepage merging", not "kernel shared memory" :-)


On Tue, Nov 20, 2012 at 7:10 AM, Peter Feiner <peter@gridcentric.ca> wrote:

> This patch makes the -mem-path filenames deterministic and allows some
> control
> over how QEMU mmaps the files. Given this control, QEMU can be used to
> implement
> exogenous memory management techniques quite simply. Two examples follow:
>
>     1. Post-copy migration (mmap=shared for origin, save device state,
> path in
>        NFS available on both hosts, mmap=shared for destination).
>     2. memory sharing via VM cloning (mmap=shared for master, save device
> state,
>        then mmap=private for clones).
>
> These new features are exposed via arguments to -mem-path:
>
> -mem-path \
>     path[,mode=temp|open|create][,mmap=private|shared][,nofallback][,anyfs]
>
> The backing file names are made deterministic by including their RAMBlocks'
> offsets, which are unique given a qemu command line. Note that Xen's live
> migration relies on RAMBlocks having the same offsets between QEMU
> processes
> (see xen_read_physmap). The new file name format is
>
>     qemu_back_mem.OFFSET+SIZE.NAME[.RANDOM]
>
> where SIZE and NAME are added for extra sanity checking when mode="open".
>
> By default, the old -mem-path behavior is preserved. I.e., mode="temp" is
> used,
> which adds a random suffix to the deterministic name and unlinks the files.
>
> Signed-off-by: Peter Feiner <peter@gridcentric.ca>
> ---
>  cpu-all.h       |    5 ++++
>  exec.c          |   60
> ++++++++++++++++++++++++++++++++++--------------------
>  qemu-config.c   |   26 +++++++++++++++++++++++
>  qemu-options.hx |   24 +++++++++++++++++++--
>  vl.c            |   43 +++++++++++++++++++++++++++++++++++++-
>  5 files changed, 131 insertions(+), 27 deletions(-)
>
> diff --git a/cpu-all.h b/cpu-all.h
> index c9c51b8..a83f71e 100644
> --- a/cpu-all.h
> +++ b/cpu-all.h
> @@ -500,6 +500,11 @@ typedef struct RAMList {
>  extern RAMList ram_list;
>
>  extern const char *mem_path;
> +extern int mem_temp;
> +extern int mem_create;
> +extern int mem_fallback;
> +extern int mem_shared;
> +extern int mem_hugetlbfs;
>  extern int mem_prealloc;
>
>  /* Flags stored in the low bits of the TLB virtual address.  These are
> diff --git a/exec.c b/exec.c
> index 8435de0..75b146c 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2346,7 +2346,7 @@ void qemu_flush_coalesced_mmio_buffer(void)
>
>  #define HUGETLBFS_MAGIC       0x958458f6
>
> -static long gethugepagesize(const char *path)
> +static long getfspagesize(const char *path, int want_hugetlbfs)
>  {
>      struct statfs fs;
>      int ret;
> @@ -2360,7 +2360,7 @@ static long gethugepagesize(const char *path)
>          return 0;
>      }
>
> -    if (fs.f_type != HUGETLBFS_MAGIC)
> +    if (want_hugetlbfs && fs.f_type != HUGETLBFS_MAGIC)
>          fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
>
>      return fs.f_bsize;
> @@ -2373,17 +2373,15 @@ static void *file_ram_alloc(RAMBlock *block,
>      char *filename;
>      void *area;
>      int fd;
> -#ifdef MAP_POPULATE
>      int flags;
> -#endif
> -    unsigned long hpagesize;
> +    unsigned long pagesize;
>
> -    hpagesize = gethugepagesize(path);
> -    if (!hpagesize) {
> +    pagesize = getfspagesize(path, mem_hugetlbfs);
> +    if (!pagesize) {
>          return NULL;
>      }
>
> -    if (memory < hpagesize) {
> +    if (memory < pagesize) {
>          return NULL;
>      }
>
> @@ -2392,20 +2390,35 @@ static void *file_ram_alloc(RAMBlock *block,
>          return NULL;
>      }
>
> -    if (asprintf(&filename, "%s/qemu_back_mem.XXXXXX", path) == -1) {
> +    if (asprintf(&filename, "%s/qemu_back_mem.%"PRIx64"+%"PRIx64".%s",
> +                 path, block->offset, memory, block->mr->name) == -1) {
>          return NULL;
>      }
>
> -    fd = mkstemp(filename);
> +    if (mem_temp) {
> +        if (asprintf(&filename, "%s.XXXXXX", filename) == -1) {
> +            return NULL;
> +        }
> +        fd = mkstemp(filename);
> +    } else {
> +        flags = O_RDWR | O_CLOEXEC | (mem_create ? O_CREAT : 0);
> +        fd = open(filename, flags, 0600);
> +    }
> +
>      if (fd < 0) {
> -        perror("unable to create backing store for hugepages");
> +        fprintf(stderr, "unable to create backing store %s "
> +                "for guest memory: %s\n", filename, strerror(errno));
>          free(filename);
>          return NULL;
>      }
> -    unlink(filename);
> +
> +    if (mem_temp) {
> +        unlink(filename);
> +    }
> +
>      free(filename);
>
> -    memory = (memory+hpagesize-1) & ~(hpagesize-1);
> +    memory = (memory+pagesize-1) & ~(pagesize-1);
>
>      /*
>       * ftruncate is not supported by hugetlbfs in older
> @@ -2416,16 +2429,16 @@ static void *file_ram_alloc(RAMBlock *block,
>      if (ftruncate(fd, memory))
>          perror("ftruncate");
>
> +    flags = mem_shared ? MAP_SHARED : MAP_PRIVATE;
>  #ifdef MAP_POPULATE
>      /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the
> case
> -     * MAP_PRIVATE is requested.  For mem_prealloc we mmap as MAP_SHARED
> +     * MAP_PRIVATE is requested.  For mem_prealloc we always mmap as
> MAP_SHARED
>       * to sidestep this quirk.
>       */
> -    flags = mem_prealloc ? MAP_POPULATE | MAP_SHARED : MAP_PRIVATE;
> -    area = mmap(0, memory, PROT_READ | PROT_WRITE, flags, fd, 0);
> -#else
> -    area = mmap(0, memory, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> +    if (mem_prealloc)
> +        flags = MAP_POPULATE | MAP_SHARED;
>  #endif
> +    area = mmap(0, memory, PROT_READ | PROT_WRITE, flags, fd, 0);
>      if (area == MAP_FAILED) {
>          perror("file_ram_alloc: can't mmap RAM pages");
>          close(fd);
> @@ -2561,6 +2574,10 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size,
> void *host,
>  #if defined (__linux__) && !defined(TARGET_S390X)
>              new_block->host = file_ram_alloc(new_block, size, mem_path);
>              if (!new_block->host) {
> +                if (!mem_fallback) {
> +                    fprintf(stderr, "failed to allocate ram file for
> %s\n", mr->name);
> +                    exit(1);
> +                }
>                  new_block->host = qemu_vmalloc(size);
>                  memory_try_enable_merging(new_block->host, size);
>              }
> @@ -2675,11 +2692,10 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t
> length)
>                  if (mem_path) {
>  #if defined(__linux__) && !defined(TARGET_S390X)
>                      if (block->fd) {
> +                        flags = mem_shared ? MAP_SHARED : MAP_PRIVATE;
>  #ifdef MAP_POPULATE
> -                        flags |= mem_prealloc ? MAP_POPULATE | MAP_SHARED
> :
> -                            MAP_PRIVATE;
> -#else
> -                        flags |= MAP_PRIVATE;
> +                        if (mem_prealloc)
> +                            flags = MAP_POPULATE | MAP_SHARED;
>  #endif
>                          area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
>                                      flags, block->fd, offset);
> diff --git a/qemu-config.c b/qemu-config.c
> index 10d1ba4..45b24d2 100644
> --- a/qemu-config.c
> +++ b/qemu-config.c
> @@ -691,6 +691,31 @@ static QemuOptsList qemu_object_opts = {
>      },
>  };
>
> +static QemuOptsList qemu_mempath_opts = {
> +    .name = "mempath",
> +    .implied_opt_name = "path",
> +    .head = QTAILQ_HEAD_INITIALIZER(qemu_mempath_opts.head),
> +    .desc = {
> +        {
> +            .name = "path",
> +            .type = QEMU_OPT_STRING,
> +        },{
> +            .name = "mmap",
> +            .type = QEMU_OPT_STRING,
> +        },{
> +            .name = "mode",
> +            .type = QEMU_OPT_STRING,
> +        },{
> +            .name = "anyfs",
> +            .type = QEMU_OPT_BOOL,
> +        },{
> +            .name = "fallback",
> +            .type = QEMU_OPT_BOOL,
> +        },
> +        { /* end of list */ }
> +    },
> +};
> +
>  static QemuOptsList *vm_config_groups[32] = {
>      &qemu_drive_opts,
>      &qemu_chardev_opts,
> @@ -709,6 +734,7 @@ static QemuOptsList *vm_config_groups[32] = {
>      &qemu_sandbox_opts,
>      &qemu_add_fd_opts,
>      &qemu_object_opts,
> +    &qemu_mempath_opts,
>      NULL,
>  };
>
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 9bb29d3..8972397 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -443,10 +443,28 @@ gigabytes respectively.
>  ETEXI
>
>  DEF("mem-path", HAS_ARG, QEMU_OPTION_mempath,
> -    "-mem-path FILE  provide backing storage for guest RAM\n",
> QEMU_ARCH_ALL)
> +    "-mem-path
> path[,mmap=private|shared][,mode=temp|open|create][,nofallback][,anyfs]\n"
> +    "          provide backing storage for guest RAM\n",
> +    QEMU_ARCH_ALL)
>  STEXI
> -@item -mem-path @var{path}
> -Allocate guest RAM from a temporarily created file in @var{path}.
> +@item -mem-path
> @var{path}[,mmap=private|shared][,mode=temp|open|create][,nofallback][,anyfs]
> +Allocate guest RAM from files in @var{path}.
> +@item mmap=private|shared
> +Specifies how backing files are mmap'd. "private" indicates a COW
> mapping, thus
> +leaving the underlying file unchanged. "shared" indicates a write-through
> +mapping, thus reflecting the guest's memory in the underlying file.
> Default is
> +private.
> +@item mode=temp|open|create
> +Determines how backing files are created and opened. "temp" indicates that
> +unique temporary files are created in @var{path} and unlinked after
> opening.
> +Both "create" and "open" indicate that deterministic names are used and
> the
> +files aren't unlinked. The The difference between "create" and "open" is
> that
> +"open" expects the files to already exist. Default is temp.
> +@item nofallback
> +Fail if backing files cannot be used for guest RAM (e.g., permission
> error, bad
> +path). By default, RAM allocation falls back to normal method.
> +@item anyfs
> +Suppressing warnings if @var{path} isn't on a hugetlbfs filesystem.
>  ETEXI
>
>  #ifdef MAP_POPULATE
> diff --git a/vl.c b/vl.c
> index c8e9c78..3b7cea7 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -185,6 +185,11 @@ static int display_remote;
>  const char* keyboard_layout = NULL;
>  ram_addr_t ram_size;
>  const char *mem_path = NULL;
> +int mem_shared = 0;
> +int mem_hugetlbfs = 1;
> +int mem_temp = 1;
> +int mem_create = 1;
> +int mem_fallback = 1;
>  #ifdef MAP_POPULATE
>  int mem_prealloc = 0; /* force preallocation of physical target memory */
>  #endif
> @@ -2938,9 +2943,43 @@ int main(int argc, char **argv, char **envp)
>                  }
>                  break;
>              }
> -            case QEMU_OPTION_mempath:
> -                mem_path = optarg;
> +            case QEMU_OPTION_mempath: {
> +                const char *value;
> +                opts = qemu_opts_parse(qemu_find_opts("mempath"), optarg,
> 1);
> +                if (!opts) {
> +                    exit(1);
> +                }
> +
> +                mem_path = qemu_opt_get(opts, "path");
> +                mem_hugetlbfs = !qemu_opt_get_bool(opts, "anyfs", 0);
> +                mem_fallback = qemu_opt_get_bool(opts, "fallback", 1);
> +
> +                value = qemu_opt_get(opts, "mode");
> +                if (value == NULL || !strcmp(value, "temp")) {
> +                    mem_temp = 1;
> +                } else {
> +                    mem_temp = 0;
> +                    if (!strcmp(value, "create")) {
> +                        mem_create = 1;
> +                    } else if (!strcmp(value, "open")) {
> +                        mem_create = 0;
> +                    } else {
> +                        fprintf(stderr, "Invalid -mem-path mode
> value.\n");
> +                        exit(1);
> +                    }
> +                }
> +
> +                value = qemu_opt_get(opts, "mmap");
> +                if (value == NULL || !strcmp(value, "private")) {
> +                    mem_shared = 0;
> +                } else if (!strcmp(value, "shared")) {
> +                    mem_shared = 1;
> +                } else {
> +                    fprintf(stderr, "Invalid -mem-path mmap value.\n");
> +                    exit(1);
> +                }
>                  break;
> +            }
>  #ifdef MAP_POPULATE
>              case QEMU_OPTION_mem_prealloc:
>                  mem_prealloc = 1;
> --
> 1.7.5.4
>
>

[-- Attachment #2: Type: text/html, Size: 14461 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 0/1] Exogenous memory management via -mem-path
  2012-11-20 12:10 [Qemu-devel] [PATCH 0/1] Exogenous memory management via -mem-path Peter Feiner
  2012-11-20 12:10 ` [Qemu-devel] [PATCH 1/1] exec: make -mem-path filenames deterministic Peter Feiner
@ 2013-01-02 19:34 ` Peter Feiner
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Feiner @ 2013-01-02 19:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: Andres Lagar-Cavilla, aliguori, Peter Feiner

On Tue, Nov 20, 2012 at 7:10 AM, Peter Feiner <peter@gridcentric.ca> wrote:
>
> This patch proposes a framework that empowers QEMU to implement powerful memory
> management techniques with very low cost. The insight is to leverage existing
> sub-systems, composing functionality in a traditional UNIX style. The key to
> enabling this approach is to expand the -mem-path option to provide more control
> over how backing memory files are named, opened, and mapped.

Ping again.

I've rebased using the latest qemu master:
http://github.com/peterfeiner/qemu/commits/mem-path-options

Thanks!
Peter

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 1/1] exec: make -mem-path filenames deterministic
  2012-11-20 12:10 ` [Qemu-devel] [PATCH 1/1] exec: make -mem-path filenames deterministic Peter Feiner
  2012-11-29 16:29   ` Peter Feiner
@ 2013-01-07 19:55   ` Anthony Liguori
  2013-01-08 15:59     ` Peter Feiner
  1 sibling, 1 reply; 11+ messages in thread
From: Anthony Liguori @ 2013-01-07 19:55 UTC (permalink / raw)
  To: Peter Feiner, qemu-devel; +Cc: andreslc

Peter Feiner <peter@gridcentric.ca> writes:

> This patch makes the -mem-path filenames deterministic and allows some control
> over how QEMU mmaps the files. Given this control, QEMU can be used to implement
> exogenous memory management techniques quite simply. Two examples follow:
>
>     1. Post-copy migration (mmap=shared for origin, save device state, path in
>        NFS available on both hosts, mmap=shared for destination).
>     2. memory sharing via VM cloning (mmap=shared for master, save device state,
>        then mmap=private for clones).
>
> These new features are exposed via arguments to -mem-path:
>
> -mem-path \
>     path[,mode=temp|open|create][,mmap=private|shared][,nofallback][,anyfs]
>
> The backing file names are made deterministic by including their RAMBlocks'
> offsets, which are unique given a qemu command line. Note that Xen's live
> migration relies on RAMBlocks having the same offsets between QEMU processes
> (see xen_read_physmap). The new file name format is
>
>     qemu_back_mem.OFFSET+SIZE.NAME[.RANDOM]
>
> where SIZE and NAME are added for extra sanity checking when mode="open".
>
> By default, the old -mem-path behavior is preserved. I.e., mode="temp" is used,
> which adds a random suffix to the deterministic name and unlinks the files.
>
> Signed-off-by: Peter Feiner <peter@gridcentric.ca>


This is not reasonable IMHO.

I was okay with sticking a name on a ramblock, but encoding a guest PA
offset turns this into a supported ABI which I'm not willing to do.

A one line change is one thing, but not a complex new option that
introduces an ABI only for a proprietary product that's jumping through hoops to keep
from contributing useful logic to QEMU.

Regards,

Anthony Liguori

> ---
>  cpu-all.h       |    5 ++++
>  exec.c          |   60 ++++++++++++++++++++++++++++++++++--------------------
>  qemu-config.c   |   26 +++++++++++++++++++++++
>  qemu-options.hx |   24 +++++++++++++++++++--
>  vl.c            |   43 +++++++++++++++++++++++++++++++++++++-
>  5 files changed, 131 insertions(+), 27 deletions(-)
>
> diff --git a/cpu-all.h b/cpu-all.h
> index c9c51b8..a83f71e 100644
> --- a/cpu-all.h
> +++ b/cpu-all.h
> @@ -500,6 +500,11 @@ typedef struct RAMList {
>  extern RAMList ram_list;
>  
>  extern const char *mem_path;
> +extern int mem_temp;
> +extern int mem_create;
> +extern int mem_fallback;
> +extern int mem_shared;
> +extern int mem_hugetlbfs;
>  extern int mem_prealloc;
>  
>  /* Flags stored in the low bits of the TLB virtual address.  These are
> diff --git a/exec.c b/exec.c
> index 8435de0..75b146c 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2346,7 +2346,7 @@ void qemu_flush_coalesced_mmio_buffer(void)
>  
>  #define HUGETLBFS_MAGIC       0x958458f6
>  
> -static long gethugepagesize(const char *path)
> +static long getfspagesize(const char *path, int want_hugetlbfs)
>  {
>      struct statfs fs;
>      int ret;
> @@ -2360,7 +2360,7 @@ static long gethugepagesize(const char *path)
>          return 0;
>      }
>  
> -    if (fs.f_type != HUGETLBFS_MAGIC)
> +    if (want_hugetlbfs && fs.f_type != HUGETLBFS_MAGIC)
>          fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
>  
>      return fs.f_bsize;
> @@ -2373,17 +2373,15 @@ static void *file_ram_alloc(RAMBlock *block,
>      char *filename;
>      void *area;
>      int fd;
> -#ifdef MAP_POPULATE
>      int flags;
> -#endif
> -    unsigned long hpagesize;
> +    unsigned long pagesize;
>  
> -    hpagesize = gethugepagesize(path);
> -    if (!hpagesize) {
> +    pagesize = getfspagesize(path, mem_hugetlbfs);
> +    if (!pagesize) {
>          return NULL;
>      }
>  
> -    if (memory < hpagesize) {
> +    if (memory < pagesize) {
>          return NULL;
>      }
>  
> @@ -2392,20 +2390,35 @@ static void *file_ram_alloc(RAMBlock *block,
>          return NULL;
>      }
>  
> -    if (asprintf(&filename, "%s/qemu_back_mem.XXXXXX", path) == -1) {
> +    if (asprintf(&filename, "%s/qemu_back_mem.%"PRIx64"+%"PRIx64".%s",
> +                 path, block->offset, memory, block->mr->name) == -1) {
>          return NULL;
>      }
>  
> -    fd = mkstemp(filename);
> +    if (mem_temp) {
> +        if (asprintf(&filename, "%s.XXXXXX", filename) == -1) {
> +            return NULL;
> +        }
> +        fd = mkstemp(filename);
> +    } else {
> +        flags = O_RDWR | O_CLOEXEC | (mem_create ? O_CREAT : 0);
> +        fd = open(filename, flags, 0600);
> +    }
> +
>      if (fd < 0) {
> -        perror("unable to create backing store for hugepages");
> +        fprintf(stderr, "unable to create backing store %s "
> +                "for guest memory: %s\n", filename, strerror(errno));
>          free(filename);
>          return NULL;
>      }
> -    unlink(filename);
> +
> +    if (mem_temp) {
> +        unlink(filename);
> +    }
> +
>      free(filename);
>  
> -    memory = (memory+hpagesize-1) & ~(hpagesize-1);
> +    memory = (memory+pagesize-1) & ~(pagesize-1);
>  
>      /*
>       * ftruncate is not supported by hugetlbfs in older
> @@ -2416,16 +2429,16 @@ static void *file_ram_alloc(RAMBlock *block,
>      if (ftruncate(fd, memory))
>          perror("ftruncate");
>  
> +    flags = mem_shared ? MAP_SHARED : MAP_PRIVATE;
>  #ifdef MAP_POPULATE
>      /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
> -     * MAP_PRIVATE is requested.  For mem_prealloc we mmap as MAP_SHARED
> +     * MAP_PRIVATE is requested.  For mem_prealloc we always mmap as MAP_SHARED
>       * to sidestep this quirk.
>       */
> -    flags = mem_prealloc ? MAP_POPULATE | MAP_SHARED : MAP_PRIVATE;
> -    area = mmap(0, memory, PROT_READ | PROT_WRITE, flags, fd, 0);
> -#else
> -    area = mmap(0, memory, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> +    if (mem_prealloc)
> +        flags = MAP_POPULATE | MAP_SHARED;
>  #endif
> +    area = mmap(0, memory, PROT_READ | PROT_WRITE, flags, fd, 0);
>      if (area == MAP_FAILED) {
>          perror("file_ram_alloc: can't mmap RAM pages");
>          close(fd);
> @@ -2561,6 +2574,10 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
>  #if defined (__linux__) && !defined(TARGET_S390X)
>              new_block->host = file_ram_alloc(new_block, size, mem_path);
>              if (!new_block->host) {
> +                if (!mem_fallback) {
> +                    fprintf(stderr, "failed to allocate ram file for %s\n", mr->name);
> +                    exit(1);
> +                }
>                  new_block->host = qemu_vmalloc(size);
>                  memory_try_enable_merging(new_block->host, size);
>              }
> @@ -2675,11 +2692,10 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
>                  if (mem_path) {
>  #if defined(__linux__) && !defined(TARGET_S390X)
>                      if (block->fd) {
> +                        flags = mem_shared ? MAP_SHARED : MAP_PRIVATE;
>  #ifdef MAP_POPULATE
> -                        flags |= mem_prealloc ? MAP_POPULATE | MAP_SHARED :
> -                            MAP_PRIVATE;
> -#else
> -                        flags |= MAP_PRIVATE;
> +                        if (mem_prealloc)
> +                            flags = MAP_POPULATE | MAP_SHARED;
>  #endif
>                          area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
>                                      flags, block->fd, offset);
> diff --git a/qemu-config.c b/qemu-config.c
> index 10d1ba4..45b24d2 100644
> --- a/qemu-config.c
> +++ b/qemu-config.c
> @@ -691,6 +691,31 @@ static QemuOptsList qemu_object_opts = {
>      },
>  };
>  
> +static QemuOptsList qemu_mempath_opts = {
> +    .name = "mempath",
> +    .implied_opt_name = "path",
> +    .head = QTAILQ_HEAD_INITIALIZER(qemu_mempath_opts.head),
> +    .desc = {
> +        {
> +            .name = "path",
> +            .type = QEMU_OPT_STRING,
> +        },{
> +            .name = "mmap",
> +            .type = QEMU_OPT_STRING,
> +        },{
> +            .name = "mode",
> +            .type = QEMU_OPT_STRING,
> +        },{
> +            .name = "anyfs",
> +            .type = QEMU_OPT_BOOL,
> +        },{
> +            .name = "fallback",
> +            .type = QEMU_OPT_BOOL,
> +        },
> +        { /* end of list */ }
> +    },
> +};
> +
>  static QemuOptsList *vm_config_groups[32] = {
>      &qemu_drive_opts,
>      &qemu_chardev_opts,
> @@ -709,6 +734,7 @@ static QemuOptsList *vm_config_groups[32] = {
>      &qemu_sandbox_opts,
>      &qemu_add_fd_opts,
>      &qemu_object_opts,
> +    &qemu_mempath_opts,
>      NULL,
>  };
>  
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 9bb29d3..8972397 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -443,10 +443,28 @@ gigabytes respectively.
>  ETEXI
>  
>  DEF("mem-path", HAS_ARG, QEMU_OPTION_mempath,
> -    "-mem-path FILE  provide backing storage for guest RAM\n", QEMU_ARCH_ALL)
> +    "-mem-path path[,mmap=private|shared][,mode=temp|open|create][,nofallback][,anyfs]\n"
> +    "          provide backing storage for guest RAM\n",
> +    QEMU_ARCH_ALL)
>  STEXI
> -@item -mem-path @var{path}
> -Allocate guest RAM from a temporarily created file in @var{path}.
> +@item -mem-path @var{path}[,mmap=private|shared][,mode=temp|open|create][,nofallback][,anyfs]
> +Allocate guest RAM from files in @var{path}.
> +@item mmap=private|shared
> +Specifies how backing files are mmap'd. "private" indicates a COW mapping, thus
> +leaving the underlying file unchanged. "shared" indicates a write-through
> +mapping, thus reflecting the guest's memory in the underlying file. Default is
> +private.
> +@item mode=temp|open|create
> +Determines how backing files are created and opened. "temp" indicates that
> +unique temporary files are created in @var{path} and unlinked after opening.
> +Both "create" and "open" indicate that deterministic names are used and the
> +files aren't unlinked. The The difference between "create" and "open" is that
> +"open" expects the files to already exist. Default is temp.
> +@item nofallback
> +Fail if backing files cannot be used for guest RAM (e.g., permission error, bad
> +path). By default, RAM allocation falls back to normal method.
> +@item anyfs
> +Suppressing warnings if @var{path} isn't on a hugetlbfs filesystem.
>  ETEXI
>  
>  #ifdef MAP_POPULATE
> diff --git a/vl.c b/vl.c
> index c8e9c78..3b7cea7 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -185,6 +185,11 @@ static int display_remote;
>  const char* keyboard_layout = NULL;
>  ram_addr_t ram_size;
>  const char *mem_path = NULL;
> +int mem_shared = 0;
> +int mem_hugetlbfs = 1;
> +int mem_temp = 1;
> +int mem_create = 1;
> +int mem_fallback = 1;
>  #ifdef MAP_POPULATE
>  int mem_prealloc = 0; /* force preallocation of physical target memory */
>  #endif
> @@ -2938,9 +2943,43 @@ int main(int argc, char **argv, char **envp)
>                  }
>                  break;
>              }
> -            case QEMU_OPTION_mempath:
> -                mem_path = optarg;
> +            case QEMU_OPTION_mempath: {
> +                const char *value;
> +                opts = qemu_opts_parse(qemu_find_opts("mempath"), optarg, 1);
> +                if (!opts) {
> +                    exit(1);
> +                }
> +
> +                mem_path = qemu_opt_get(opts, "path");
> +                mem_hugetlbfs = !qemu_opt_get_bool(opts, "anyfs", 0);
> +                mem_fallback = qemu_opt_get_bool(opts, "fallback", 1);
> +
> +                value = qemu_opt_get(opts, "mode");
> +                if (value == NULL || !strcmp(value, "temp")) {
> +                    mem_temp = 1;
> +                } else {
> +                    mem_temp = 0;
> +                    if (!strcmp(value, "create")) {
> +                        mem_create = 1;
> +                    } else if (!strcmp(value, "open")) {
> +                        mem_create = 0;
> +                    } else {
> +                        fprintf(stderr, "Invalid -mem-path mode value.\n");
> +                        exit(1);
> +                    }
> +                }
> +
> +                value = qemu_opt_get(opts, "mmap");
> +                if (value == NULL || !strcmp(value, "private")) {
> +                    mem_shared = 0;
> +                } else if (!strcmp(value, "shared")) {
> +                    mem_shared = 1;
> +                } else {
> +                    fprintf(stderr, "Invalid -mem-path mmap value.\n");
> +                    exit(1);
> +                }
>                  break;
> +            }
>  #ifdef MAP_POPULATE
>              case QEMU_OPTION_mem_prealloc:
>                  mem_prealloc = 1;
> -- 
> 1.7.5.4

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 1/1] exec: make -mem-path filenames deterministic
  2013-01-07 19:55   ` Anthony Liguori
@ 2013-01-08 15:59     ` Peter Feiner
  2013-01-08 19:04       ` Anthony Liguori
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Feiner @ 2013-01-08 15:59 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Andres Lagar-Cavilla, qemu-devel

> This is not reasonable IMHO.
>
> I was okay with sticking a name on a ramblock, but encoding a guest PA
> offset turns this into a supported ABI which I'm not willing to do.
>
> A one line change is one thing, but not a complex new option that
> introduces an ABI only for a proprietary product that's jumping through hoops to keep
> from contributing useful logic to QEMU.

Hi Anthony,

Thanks for getting back to me.

Sticking a name on the ramblock file would suite our product just
fine. Indeed, this is what we had agreed upon at the KVM forum.
However, I submitted a more complex patch in an attempt to expose a
more general & easy to use feature; I was trying to make a more useful
contribution than the simple patch :-)

Perhaps I can assuage your ABI concern and argue the utility of this
patch vs the one-line version. However, if you aren't satisfied,
please let me know and I'll resubmit the one-line version.

On ABI: This patch doesn't add a new ABI. QEMU already has this ABI
due to Xen live migration.

When a Xen domain is booted, a new domain is created with an empty
physmap. Then QEMU is launched. QEMU creates its ramblocks and, via
memory callbacks (xen_add_to_physmap), populates Xen's physmap using
ramblock sizes & offsets.

On incoming migration, the Xen toolstack creates a new domain,
populates its physmap, and copies RAM from the outgoing migration.
When QEMU is launched, it populates its Xen memory model (i.e.,
XenIOState) by reading the domain's existing physmap from xenstore.
When QEMU creates ramblocks, the callbacks in xen-all.c _ignore_ the
new ramblocks because their offsets are already in the physmap. If the
new ramblocks had different sizes & offsets than those from the
outgoing QEMU process, then QEMU's memory model would be inconsistent
with Xen's (i.e., the physmap maintained by the hypervisor and the
XenIOState maintained in userspace). In particular, QEMU would expect
memory at a particular physmap offset that wouldn't have been
populated by the Xen toolstack during live migration.

On utility: Just adding ramblock names to backing file paths makes
post-copy migration & cloning possible, but involves some painful VFS
contortions, which I give a detailed example of below. On the other
hand, these new -mem-path parameters make post-copy migration &
cloning simple by leveraging an existing QMP command, existing
filesystems, and kernel behavior. Put another way, the useful logic
for memory sharing and post-copy live migration already exists in the
kernel and a myriad of filesystems.  A fairly small patch (albeit not
one line) enables that logic in QEMU.

Peter

Detailed example:

Suppose you have a patched QEMU that adds ramblock names to their
backing files and you want to implement memory sharing via cloning.
When clones come up, each of their ramblocks' backing files need to
contain the same data as the corresponding backing file from the
parent (obviously you want those new backing files to somehow share
pages and COW). The basic idea is to save the parent's ramblock files
and arrange for the clones to open them.

You can see the parent's ramblock files easily enough by looking at
the unlinked ramblock files (e.g., /proc/pid/fd/10 is a symlink to
/tmp/qemu_back_mem.pc.ram.WHFZYw (deleted), /proc/pid/fd/11 is a
symlink to /tmp/qemu_back_mem.vga.vram.WT1yQW (deleted), etc.).
Unfortunately, since they're all mapped MAP_PRIVATE, these symlinks,
when opened, will give all zeros. So you can either implement your own
filesystem that gives you a backdoor to the MAP_PRIVATE pages (fast
but complicated), or you can use qemu's monitor to dump guest RAM
(slow but works).

When a clone runs and creates a new backing file using mkstemp, you
need to arrange for that backing file to somehow contain the same data
as the corresponding file from the parent. There is an obvious
heuristic for determining this correspondence: parse the ramblock name
from the child's file and use the matching file from the parent.
Correctness aside (e.g., multiple ramblocks can have the same name,
e.g., e1000.rom, but this is moot because the _important_ ramblocks,
i.e., pc.ram and vga.ram, are unique in the emulated system we care
about), implementing this heuristic is a pain. To see the file being
created, you need to implement a custom file system. Moreover, to
share memory with another file that's been opened MAP_PRIVATE, you
have to implement your own VMA operations. Oye!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 1/1] exec: make -mem-path filenames deterministic
  2013-01-08 15:59     ` Peter Feiner
@ 2013-01-08 19:04       ` Anthony Liguori
  2013-01-08 19:59         ` Peter Maydell
  2013-03-01 17:21         ` [Qemu-devel] [PATCH v2] " peter
  0 siblings, 2 replies; 11+ messages in thread
From: Anthony Liguori @ 2013-01-08 19:04 UTC (permalink / raw)
  To: Peter Feiner; +Cc: Andres Lagar-Cavilla, qemu-devel

Peter Feiner <peter@gridcentric.ca> writes:

>> This is not reasonable IMHO.
>>
>> I was okay with sticking a name on a ramblock, but encoding a guest PA
>> offset turns this into a supported ABI which I'm not willing to do.
>>
>> A one line change is one thing, but not a complex new option that
>> introduces an ABI only for a proprietary product that's jumping through hoops to keep
>> from contributing useful logic to QEMU.
>
> Hi Anthony,
>
> Thanks for getting back to me.
>
> Sticking a name on the ramblock file would suite our product just
> fine. Indeed, this is what we had agreed upon at the KVM forum.
> However, I submitted a more complex patch in an attempt to expose a
> more general & easy to use feature; I was trying to make a more useful
> contribution than the simple patch :-)
>
> Perhaps I can assuage your ABI concern and argue the utility of this
> patch vs the one-line version. However, if you aren't satisfied,
> please let me know and I'll resubmit the one-line version.

Yes, please submit the oneliner.

> On ABI: This patch doesn't add a new ABI. QEMU already has this ABI
> due to Xen live migration.
>
> When a Xen domain is booted, a new domain is created with an empty
> physmap. Then QEMU is launched. QEMU creates its ramblocks and, via
> memory callbacks (xen_add_to_physmap), populates Xen's physmap using
> ramblock sizes & offsets.
>
> On incoming migration, the Xen toolstack creates a new domain,
> populates its physmap, and copies RAM from the outgoing migration.
> When QEMU is launched, it populates its Xen memory model (i.e.,
> XenIOState) by reading the domain's existing physmap from xenstore.
> When QEMU creates ramblocks, the callbacks in xen-all.c _ignore_ the
> new ramblocks because their offsets are already in the physmap. If the
> new ramblocks had different sizes & offsets than those from the
> outgoing QEMU process, then QEMU's memory model would be inconsistent
> with Xen's (i.e., the physmap maintained by the hypervisor and the
> XenIOState maintained in userspace). In particular, QEMU would expect
> memory at a particular physmap offset that wouldn't have been
> populated by the Xen toolstack during live migration.

This is an internal detail between Xen and QEMU.  That doesn't mean it's
a general public API.

I'm fairly certain that Xen does not support arbitrary versions of QEMU
to be used as qemu-dm.

Regards,

Anthony Liguori

>
> On utility: Just adding ramblock names to backing file paths makes
> post-copy migration & cloning possible, but involves some painful VFS
> contortions, which I give a detailed example of below. On the other
> hand, these new -mem-path parameters make post-copy migration &
> cloning simple by leveraging an existing QMP command, existing
> filesystems, and kernel behavior. Put another way, the useful logic
> for memory sharing and post-copy live migration already exists in the
> kernel and a myriad of filesystems.  A fairly small patch (albeit not
> one line) enables that logic in QEMU.
>
> Peter
>
> Detailed example:
>
> Suppose you have a patched QEMU that adds ramblock names to their
> backing files and you want to implement memory sharing via cloning.
> When clones come up, each of their ramblocks' backing files need to
> contain the same data as the corresponding backing file from the
> parent (obviously you want those new backing files to somehow share
> pages and COW). The basic idea is to save the parent's ramblock files
> and arrange for the clones to open them.
>
> You can see the parent's ramblock files easily enough by looking at
> the unlinked ramblock files (e.g., /proc/pid/fd/10 is a symlink to
> /tmp/qemu_back_mem.pc.ram.WHFZYw (deleted), /proc/pid/fd/11 is a
> symlink to /tmp/qemu_back_mem.vga.vram.WT1yQW (deleted), etc.).
> Unfortunately, since they're all mapped MAP_PRIVATE, these symlinks,
> when opened, will give all zeros. So you can either implement your own
> filesystem that gives you a backdoor to the MAP_PRIVATE pages (fast
> but complicated), or you can use qemu's monitor to dump guest RAM
> (slow but works).
>
> When a clone runs and creates a new backing file using mkstemp, you
> need to arrange for that backing file to somehow contain the same data
> as the corresponding file from the parent. There is an obvious
> heuristic for determining this correspondence: parse the ramblock name
> from the child's file and use the matching file from the parent.
> Correctness aside (e.g., multiple ramblocks can have the same name,
> e.g., e1000.rom, but this is moot because the _important_ ramblocks,
> i.e., pc.ram and vga.ram, are unique in the emulated system we care
> about), implementing this heuristic is a pain. To see the file being
> created, you need to implement a custom file system. Moreover, to
> share memory with another file that's been opened MAP_PRIVATE, you
> have to implement your own VMA operations. Oye!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 1/1] exec: make -mem-path filenames deterministic
  2013-01-08 19:04       ` Anthony Liguori
@ 2013-01-08 19:59         ` Peter Maydell
  2013-03-01 17:21         ` [Qemu-devel] [PATCH v2] " peter
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Maydell @ 2013-01-08 19:59 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Andres Lagar-Cavilla, Peter Feiner, qemu-devel

On 8 January 2013 19:04, Anthony Liguori <aliguori@us.ibm.com> wrote:
> This is an internal detail between Xen and QEMU.  That doesn't mean it's
> a general public API.
>
> I'm fairly certain that Xen does not support arbitrary versions of QEMU
> to be used as qemu-dm.

I would have thought that would be a large part of the point of
upstreaming the Xen stuff into QEMU?

-- PMM

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v2] exec: make -mem-path filenames deterministic
  2013-01-08 19:04       ` Anthony Liguori
  2013-01-08 19:59         ` Peter Maydell
@ 2013-03-01 17:21         ` peter
  2013-03-01 18:47           ` Andreas Färber
  1 sibling, 1 reply; 11+ messages in thread
From: peter @ 2013-03-01 17:21 UTC (permalink / raw)
  To: qemu-devel, aliguori; +Cc: andreslc, peter

From: Peter Feiner <peter@gridcentric.ca>

Adds ramblocks' names to their backing files when using -mem-path.  Eases
introspection and debugging.

Signed-off-by: Peter Feiner <peter@gridcentric.ca>
---

On Tue, Jan 8, 2013 at 2:04 PM, Anthony Liguori <aliguori@us.ibm.com> wrote:
>
> Yes, please submit the oneliner.

Here it is :)

The commit should probably be called "exec: add ramblocks' names to -mem-path
files" since the paths aren't deterministic.

v1 -> v2: Just add ramblock name to mkstemp template. 

Thanks,
Peter

 exec.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/exec.c b/exec.c
index a41bcb8..16a5452 100644
--- a/exec.c
+++ b/exec.c
@@ -865,7 +865,8 @@ static void *file_ram_alloc(RAMBlock *block,
         return NULL;
     }
 
-    filename = g_strdup_printf("%s/qemu_back_mem.XXXXXX", path);
+    filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
+                               block->mr->name);
 
     fd = mkstemp(filename);
     if (fd < 0) {
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v2] exec: make -mem-path filenames deterministic
  2013-03-01 17:21         ` [Qemu-devel] [PATCH v2] " peter
@ 2013-03-01 18:47           ` Andreas Färber
  2013-03-01 19:20             ` Peter Feiner
  0 siblings, 1 reply; 11+ messages in thread
From: Andreas Färber @ 2013-03-01 18:47 UTC (permalink / raw)
  To: peter; +Cc: andreslc, aliguori, qemu-devel

Am 01.03.2013 18:21, schrieb peter@gridcentric.ca:
> From: Peter Feiner <peter@gridcentric.ca>
> 
> Adds ramblocks' names to their backing files when using -mem-path.  Eases
> introspection and debugging.
> 
> Signed-off-by: Peter Feiner <peter@gridcentric.ca>
> ---
> 
> On Tue, Jan 8, 2013 at 2:04 PM, Anthony Liguori <aliguori@us.ibm.com> wrote:
>>
>> Yes, please submit the oneliner.
> 
> Here it is :)
> 
> The commit should probably be called "exec: add ramblocks' names to -mem-path
> files" since the paths aren't deterministic.
> 
> v1 -> v2: Just add ramblock name to mkstemp template. 
> 
> Thanks,
> Peter
> 
>  exec.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/exec.c b/exec.c
> index a41bcb8..16a5452 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -865,7 +865,8 @@ static void *file_ram_alloc(RAMBlock *block,
>          return NULL;
>      }
>  
> -    filename = g_strdup_printf("%s/qemu_back_mem.XXXXXX", path);
> +    filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
> +                               block->mr->name);

Is it safe to use a MemoryRegion name as part of mkstemp()? "/../" seems
rather unlikely but a "/" less so.

Also note that v2 is expected to be posted as a top-level patch since
the patches tool does not cope well with "hidden" reposts.

Andreas

>  
>      fd = mkstemp(filename);
>      if (fd < 0) {
> 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v2] exec: make -mem-path filenames deterministic
  2013-03-01 18:47           ` Andreas Färber
@ 2013-03-01 19:20             ` Peter Feiner
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Feiner @ 2013-03-01 19:20 UTC (permalink / raw)
  To: Andreas Färber; +Cc: Andres Lagar-Cavilla, Anthony Liguori, qemu-devel

On Fri, Mar 1, 2013 at 1:47 PM, Andreas Färber <afaerber@suse.de> wrote:
> Am 01.03.2013 18:21, schrieb peter@gridcentric.ca:
>
> Is it safe to use a MemoryRegion name as part of mkstemp()? "/../" seems
> rather unlikely but a "/" less so.

Since mkstemp essentially calls open, "/" does what you'd expect. So a
MemoryRegion name with "/" would cause problems. I can sanitize the
name by replacing "/" with "_". I'll post this as v3.

> Also note that v2 is expected to be posted as a top-level patch since
> the patches tool does not cope well with "hidden" reposts.

Thanks for the tip. I'll post v3 without "In-Reply-To".

Peter

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2013-03-01 19:20 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-20 12:10 [Qemu-devel] [PATCH 0/1] Exogenous memory management via -mem-path Peter Feiner
2012-11-20 12:10 ` [Qemu-devel] [PATCH 1/1] exec: make -mem-path filenames deterministic Peter Feiner
2012-11-29 16:29   ` Peter Feiner
2013-01-07 19:55   ` Anthony Liguori
2013-01-08 15:59     ` Peter Feiner
2013-01-08 19:04       ` Anthony Liguori
2013-01-08 19:59         ` Peter Maydell
2013-03-01 17:21         ` [Qemu-devel] [PATCH v2] " peter
2013-03-01 18:47           ` Andreas Färber
2013-03-01 19:20             ` Peter Feiner
2013-01-02 19:34 ` [Qemu-devel] [PATCH 0/1] Exogenous memory management via -mem-path Peter Feiner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.