qemu-devel.nongnu.org archive mirror
* [RFC v4 PATCH 00/49] Initial support of multi-process qemu
@ 2019-10-24  9:08 Jagannathan Raman
  2019-10-24  9:08 ` [RFC v4 PATCH 01/49] multi-process: memory: alloc RAM from file at offset Jagannathan Raman
                   ` (53 more replies)
  0 siblings, 54 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

The multi-process project started with a presentation by Marc-Andre
(Red Hat) and Konrad Wilk (Oracle) in October 2017 [1] and continued with
Jag's BoF at KVM Forum 2018. It is now a prototype, presented in this
patchset. John & Elena will present the status of the project at KVM
Forum 2019.

This first series enables the emulation of lsi53c895a in a separate process.

We posted the Proof Of Concept patches [2] before the BoF session in 2018.
Subsequently, we posted RFC v1 [3], RFC v2 [4] and RFC v3 [5] of this series. 

We want to present version 4 of this series, which incorporates the feedback
we received for v3 & adds support for live migrating the remote process.

The following people contributed to this patchset:

John G Johnson <john.g.johnson@oracle.com>
Jagannathan Raman <jag.raman@oracle.com>
Elena Ufimtseva <elena.ufimtseva@oracle.com>
Kanth Ghatraju <kanth.ghatraju@oracle.com>

For the full concept writeup on QEMU disaggregation, refer to
docs/devel/qemu-multiprocess.rst. For usage information, refer to
docs/qemu-multiprocess.txt.

We are planning on making the following improvements in the future:
 - Performance improvements
 - Libvirt support
 - Enforcement of security policies
 - blockdev support

We welcome all your ideas, concerns, and questions for this patchset.

Thank you!

[1]: http://events17.linuxfoundation.org/sites/events/files/slides/KVM%20FORUM%20multi-process.pdf
[1]: https://www.youtube.com/watch?v=Kq1-coHh7lg
[2]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg566538.html
[3]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg602285.html
[4]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg624877.html
[5]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg642000.html

Elena Ufimtseva (22):
  multi-process: add a command line option for debug file
  multi-process: introduce proxy object
  multi-process: build remote command line args
  multi-process: configure remote side devices
  multi-process: add qdev_proxy_add to create proxy devices
  multi-process: remote: add setup_devices and setup_drive msg
    processing
  multi-process: remote: use fd for socket from parent process
  multi-process: remote: add create_done condition
  multi-process: add processing of remote drive and device command line
  multi-process: refactor vl.c code to re-use in remote
  multi-process: add remote option
  multi-process: add remote options parser
  multi-process: add parse_cmdline in remote process
  multi-process: send heartbeat messages to remote
  multi-process: handle heartbeat messages in remote process
  multi-process/mon: choose HMP commands based on target
  multi-process/mig: Load VMSD in the proxy object
  multi-process/mig: refactor runstate_check into common file
  multi-process/mig: Synchronize runstate of remote process
  multi-process/mig: Restore the VMSD in remote process
  multi-process: Enable support for multiple devices in remote
  multi-process: add configure and usage information

Jagannathan Raman (26):
  multi-process: memory: alloc RAM from file at offset
  multi-process: util: Add qemu_thread_cancel() to cancel running thread
  multi-process: Add stub functions to facilitate build of multi-process
  multi-process: Add config option for multi-process QEMU
  multi-process: build system for remote device process
  multi-process: define mpqemu-link object
  multi-process: add functions to synchronize proxy and remote endpoints
  multi-process: setup PCI host bridge for remote device
  multi-process: setup a machine object for remote device process
  multi-process: setup memory manager for remote device
  multi-process: remote process initialization
  multi-process: PCI BAR read/write handling for proxy & remote
    endpoints
  multi-process: Add LSI device proxy object
  multi-process: Synchronize remote memory
  multi-process: create IOHUB object to handle irq
  multi-process: Introduce build flags to separate remote process code
  multi-process: Use separate MMIO communication channel
  multi-process: perform device reset in the remote process
  multi-process/mon: stub functions to enable QMP module for remote
    process
  multi-process/mon: enable QMP module support in the remote process
  multi-process/mon: Refactor monitor/chardev functions out of vl.c
  multi-process/mon: Initialize QMP module for remote processes
  multi-process: prevent duplicate memory initialization in remote
  multi-process/mig: build migration module in the remote process
  multi-process/mig: Enable VMSD save in the Proxy object
  multi-process/mig: Send VMSD of remote to the Proxy object

John G Johnson (1):
  multi-process: add the concept description to
    docs/devel/qemu-multiprocess

 Makefile                            |    2 +
 Makefile.objs                       |   39 ++
 Makefile.target                     |   94 ++-
 accel/stubs/kvm-stub.c              |    5 +
 accel/stubs/tcg-stub.c              |  106 ++++
 backends/Makefile.objs              |    2 +
 block/Makefile.objs                 |    2 +
 chardev/char.c                      |   14 +
 configure                           |   15 +
 docs/devel/index.rst                |    1 +
 docs/devel/qemu-multiprocess.rst    | 1102 +++++++++++++++++++++++++++++++++++
 docs/qemu-multiprocess.txt          |   86 +++
 exec.c                              |   14 +-
 hmp-commands-info.hx                |   10 +
 hmp-commands.hx                     |   25 +-
 hw/Makefile.objs                    |    9 +
 hw/block/Makefile.objs              |    2 +
 hw/core/Makefile.objs               |   17 +
 hw/nvram/Makefile.objs              |    2 +
 hw/pci/Makefile.objs                |    4 +
 hw/proxy/Makefile.objs              |    1 +
 hw/proxy/memory-sync.c              |  226 +++++++
 hw/proxy/proxy-lsi53c895a.c         |   97 +++
 hw/proxy/qemu-proxy.c               |  807 +++++++++++++++++++++++++
 hw/scsi/Makefile.objs               |    2 +
 include/chardev/char.h              |    1 +
 include/exec/address-spaces.h       |    2 +
 include/exec/ram_addr.h             |    2 +-
 include/hw/pci/pci_ids.h            |    3 +
 include/hw/proxy/memory-sync.h      |   51 ++
 include/hw/proxy/proxy-lsi53c895a.h |   42 ++
 include/hw/proxy/qemu-proxy.h       |  125 ++++
 include/hw/qdev-core.h              |    2 +
 include/io/mpqemu-link.h            |  214 +++++++
 include/monitor/monitor.h           |    2 +
 include/monitor/qdev.h              |   25 +
 include/qemu-common.h               |    8 +
 include/qemu/log.h                  |    1 +
 include/qemu/mmap-alloc.h           |    3 +-
 include/qemu/thread.h               |    1 +
 include/remote/iohub.h              |   63 ++
 include/remote/machine.h            |   48 ++
 include/remote/memory.h             |   34 ++
 include/remote/pcihost.h            |   59 ++
 include/sysemu/runstate.h           |    3 +
 io/Makefile.objs                    |    2 +
 io/mpqemu-link.c                    |  351 +++++++++++
 memory.c                            |    2 +-
 migration/Makefile.objs             |   12 +
 migration/savevm.c                  |   63 ++
 migration/savevm.h                  |    3 +
 monitor/Makefile.objs               |    3 +
 monitor/misc.c                      |   84 +--
 monitor/monitor-internal.h          |   38 ++
 monitor/monitor.c                   |   83 ++-
 net/Makefile.objs                   |    2 +
 qapi/Makefile.objs                  |    2 +
 qdev-monitor.c                      |  270 ++++++++-
 qemu-options.hx                     |   21 +
 qom/Makefile.objs                   |    4 +
 remote/Makefile.objs                |    6 +
 remote/iohub.c                      |  159 +++++
 remote/machine.c                    |  133 +++++
 remote/memory.c                     |   99 ++++
 remote/pcihost.c                    |   85 +++
 remote/remote-main.c                |  633 ++++++++++++++++++++
 remote/remote-opts.c                |  131 +++++
 remote/remote-opts.h                |   31 +
 replay/Makefile.objs                |    2 +-
 rules.mak                           |    2 +-
 runstate.c                          |   41 ++
 scripts/hxtool                      |   44 +-
 stubs/audio.c                       |   12 +
 stubs/gdbstub.c                     |   21 +
 stubs/machine-init-done.c           |    4 +
 stubs/migration.c                   |  211 +++++++
 stubs/monitor.c                     |   72 +++
 stubs/net-stub.c                    |  121 ++++
 stubs/qapi-misc.c                   |   43 ++
 stubs/qapi-target.c                 |   49 ++
 stubs/replay.c                      |   26 +
 stubs/runstate-check.c              |    3 +
 stubs/ui-stub.c                     |  130 +++++
 stubs/vl-stub.c                     |  193 ++++++
 stubs/vmstate.c                     |   20 +
 stubs/xen-mapcache.c                |   22 +
 ui/Makefile.objs                    |    2 +
 util/log.c                          |    2 +
 util/mmap-alloc.c                   |    7 +-
 util/oslib-posix.c                  |    2 +-
 util/qemu-thread-posix.c            |   10 +
 vl-parse.c                          |  161 +++++
 vl.c                                |  310 ++++------
 vl.h                                |   54 ++
 94 files changed, 6908 insertions(+), 246 deletions(-)
 create mode 100644 docs/devel/qemu-multiprocess.rst
 create mode 100644 docs/qemu-multiprocess.txt
 create mode 100644 hw/proxy/Makefile.objs
 create mode 100644 hw/proxy/memory-sync.c
 create mode 100644 hw/proxy/proxy-lsi53c895a.c
 create mode 100644 hw/proxy/qemu-proxy.c
 create mode 100644 include/hw/proxy/memory-sync.h
 create mode 100644 include/hw/proxy/proxy-lsi53c895a.h
 create mode 100644 include/hw/proxy/qemu-proxy.h
 create mode 100644 include/io/mpqemu-link.h
 create mode 100644 include/remote/iohub.h
 create mode 100644 include/remote/machine.h
 create mode 100644 include/remote/memory.h
 create mode 100644 include/remote/pcihost.h
 create mode 100644 io/mpqemu-link.c
 create mode 100644 remote/Makefile.objs
 create mode 100644 remote/iohub.c
 create mode 100644 remote/machine.c
 create mode 100644 remote/memory.c
 create mode 100644 remote/pcihost.c
 create mode 100644 remote/remote-main.c
 create mode 100644 remote/remote-opts.c
 create mode 100644 remote/remote-opts.h
 create mode 100644 runstate.c
 mode change 100644 => 100755 scripts/hxtool
 create mode 100644 stubs/audio.c
 create mode 100644 stubs/migration.c
 create mode 100644 stubs/net-stub.c
 create mode 100644 stubs/qapi-misc.c
 create mode 100644 stubs/qapi-target.c
 create mode 100644 stubs/ui-stub.c
 create mode 100644 stubs/vl-stub.c
 create mode 100644 stubs/xen-mapcache.c
 create mode 100644 vl-parse.c
 create mode 100644 vl.h

-- 
1.8.3.1




* [RFC v4 PATCH 01/49] multi-process: memory: alloc RAM from file at offset
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
@ 2019-10-24  9:08 ` Jagannathan Raman
  2019-10-24  9:08 ` [RFC v4 PATCH 02/49] multi-process: util: Add qemu_thread_cancel() to cancel running thread Jagannathan Raman
                   ` (52 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

Allow a RAM MemoryRegion to be created from an offset in a file, instead
of always mapping from offset 0. This is needed to synchronize RAM
between QEMU and the remote process, and is used by the following patches.

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
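Note (illustration only, not part of the patch): the change below threads
an offset down to the final mmap() call. A minimal standalone sketch of
mapping a file region at a non-zero offset follows. The helper names
map_file_at_offset() and demo_offset_mapping() are hypothetical and do not
exist in QEMU; the sketch also assumes the offset must be page-aligned,
which is a requirement of mmap() itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a QEMU function): map `size` bytes of `fd`
 * starting at `offset`. mmap() rejects offsets that are not a multiple
 * of the page size, so check that up front.
 */
static void *map_file_at_offset(int fd, size_t size, off_t offset)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || offset % page != 0) {
        return MAP_FAILED;
    }
    return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
}

/*
 * Write a marker into the second page of a temporary file through a
 * shared mapping, then read it back with pread() at the same offset.
 * Returns 0 on success, -1 on any failure.
 */
static int demo_offset_mapping(void)
{
    char path[] = "/tmp/mmap-offset-XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    char buf[6] = { 0 };
    ssize_t n;
    char *p;
    int fd;

    fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    unlink(path);
    if (page <= 0 || ftruncate(fd, 2 * page) != 0) {
        close(fd);
        return -1;
    }

    p = map_file_at_offset(fd, (size_t)page, (off_t)page);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memcpy(p, "guest", 6);              /* lands at file offset `page` */
    msync(p, (size_t)page, MS_SYNC);    /* flush before reading via the fd */

    n = pread(fd, buf, sizeof(buf), (off_t)page);
    munmap(p, (size_t)page);
    close(fd);
    return (n == 6 && strcmp(buf, "guest") == 0) ? 0 : -1;
}
```

Two processes that map the same file with MAP_SHARED at the same offset
see the same pages, which is presumably what the RAM synchronization in
later patches relies on; that use is not shown in this patch.
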
 exec.c                    | 11 +++++++----
 include/exec/ram_addr.h   |  2 +-
 include/qemu/mmap-alloc.h |  3 ++-
 memory.c                  |  2 +-
 util/mmap-alloc.c         |  7 ++++---
 util/oslib-posix.c        |  2 +-
 6 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/exec.c b/exec.c
index fb0943c..08c4181 100644
--- a/exec.c
+++ b/exec.c
@@ -1871,6 +1871,7 @@ static void *file_ram_alloc(RAMBlock *block,
                             ram_addr_t memory,
                             int fd,
                             bool truncate,
+                            off_t offset,
                             Error **errp)
 {
     MachineState *ms = MACHINE(qdev_get_machine());
@@ -1922,7 +1923,8 @@ static void *file_ram_alloc(RAMBlock *block,
     }
 
     area = qemu_ram_mmap(fd, memory, block->mr->align,
-                         block->flags & RAM_SHARED, block->flags & RAM_PMEM);
+                         block->flags & RAM_SHARED, block->flags & RAM_PMEM,
+                         offset);
     if (area == MAP_FAILED) {
         error_setg_errno(errp, errno,
                          "unable to map backing store for guest RAM");
@@ -2309,7 +2311,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
 #ifdef CONFIG_POSIX
 RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
                                  uint32_t ram_flags, int fd,
-                                 Error **errp)
+                                 off_t offset, Error **errp)
 {
     RAMBlock *new_block;
     Error *local_err = NULL;
@@ -2354,7 +2356,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
     new_block->used_length = size;
     new_block->max_length = size;
     new_block->flags = ram_flags;
-    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
+    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, offset,
+                                     errp);
     if (!new_block->host) {
         g_free(new_block);
         return NULL;
@@ -2384,7 +2387,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
         return NULL;
     }
 
-    block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, errp);
+    block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, 0, errp);
     if (!block) {
         if (created) {
             unlink(mem_path);
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index ad158bb..92134c0 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -159,7 +159,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
                                    Error **errp);
 RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
                                  uint32_t ram_flags, int fd,
-                                 Error **errp);
+                                 off_t offset, Error **errp);
 
 RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
                                   MemoryRegion *mr, Error **errp);
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index e786266..4f57985 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -25,7 +25,8 @@ void *qemu_ram_mmap(int fd,
                     size_t size,
                     size_t align,
                     bool shared,
-                    bool is_pmem);
+                    bool is_pmem,
+                    off_t start);
 
 void qemu_ram_munmap(int fd, void *ptr, size_t size);
 
diff --git a/memory.c b/memory.c
index c952eab..c25c74f 100644
--- a/memory.c
+++ b/memory.c
@@ -1602,7 +1602,7 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
     mr->destructor = memory_region_destructor_ram;
     mr->ram_block = qemu_ram_alloc_from_fd(size, mr,
                                            share ? RAM_SHARED : 0,
-                                           fd, &err);
+                                           fd, 0, &err);
     mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
     if (err) {
         mr->size = int128_zero();
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index f7f177d..4b727bd 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -86,7 +86,8 @@ void *qemu_ram_mmap(int fd,
                     size_t size,
                     size_t align,
                     bool shared,
-                    bool is_pmem)
+                    bool is_pmem,
+                    off_t start)
 {
     int flags;
     int map_sync_flags = 0;
@@ -147,7 +148,7 @@ void *qemu_ram_mmap(int fd,
     offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
 
     ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
-               flags | map_sync_flags, fd, 0);
+               flags | map_sync_flags, fd, start);
 
     if (ptr == MAP_FAILED && map_sync_flags) {
         if (errno == ENOTSUP) {
@@ -172,7 +173,7 @@ void *qemu_ram_mmap(int fd,
          * we will remove these flags to handle compatibility.
          */
         ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
-                   flags, fd, 0);
+                   flags, fd, start);
     }
 
     if (ptr == MAP_FAILED) {
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index f869338..bdfcdcf 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -205,7 +205,7 @@ void *qemu_memalign(size_t alignment, size_t size)
 void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
 {
     size_t align = QEMU_VMALLOC_ALIGN;
-    void *ptr = qemu_ram_mmap(-1, size, align, shared, false);
+    void *ptr = qemu_ram_mmap(-1, size, align, shared, false, 0);
 
     if (ptr == MAP_FAILED) {
         return NULL;
-- 
1.8.3.1




* [RFC v4 PATCH 02/49] multi-process: util: Add qemu_thread_cancel() to cancel running thread
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
  2019-10-24  9:08 ` [RFC v4 PATCH 01/49] multi-process: memory: alloc RAM from file at offset Jagannathan Raman
@ 2019-10-24  9:08 ` Jagannathan Raman
  2019-11-13 15:30   ` Stefan Hajnoczi
  2019-10-24  9:08 ` [RFC v4 PATCH 03/49] multi-process: add a command line option for debug file Jagannathan Raman
                   ` (51 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

Add qemu_thread_cancel() to cancel a given running thread.
This will be needed in the following patches.

Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
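Note (illustration only, not part of the patch): qemu_thread_cancel()
wraps pthread_cancel(), which only requests cancellation. With the default
deferred cancellation type the target thread stops at its next
cancellation point, and the caller normally joins it afterwards. A minimal
standalone sketch of that pattern follows; spin_worker() and
cancel_and_join_demo() are hypothetical names, and how later patches in
this series use the new function is not shown here.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Worker that never returns on its own; sleep() is a cancellation point. */
static void *spin_worker(void *arg)
{
    (void)arg;
    for (;;) {
        sleep(1);
    }
}

/*
 * Start the worker, request cancellation, and join it.
 * Returns 1 if the thread ended because it was cancelled,
 * 0 if it ended some other way, -1 on error.
 */
static int cancel_and_join_demo(void)
{
    pthread_t tid;
    void *ret = NULL;

    if (pthread_create(&tid, NULL, spin_worker, NULL) != 0) {
        return -1;
    }
    if (pthread_cancel(tid) != 0) {
        return -1;
    }
    if (pthread_join(tid, &ret) != 0) {
        return -1;
    }
    return ret == PTHREAD_CANCELED;
}
```

A cancelled thread does not run the rest of its function, so any locks or
allocations it holds need cleanup handlers (pthread_cleanup_push) or
must be avoided in the cancellable section.
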
 include/qemu/thread.h    |  1 +
 util/qemu-thread-posix.c | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index 047db03..fe7fa5a 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -175,6 +175,7 @@ void qemu_thread_create(QemuThread *thread, const char *name,
                         void *(*start_routine)(void *),
                         void *arg, int mode);
 void *qemu_thread_join(QemuThread *thread);
+void qemu_thread_cancel(QemuThread *thread);
 void qemu_thread_get_self(QemuThread *thread);
 bool qemu_thread_is_self(QemuThread *thread);
 void qemu_thread_exit(void *retval);
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index 838980a..2fd85ed 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -590,3 +590,13 @@ void *qemu_thread_join(QemuThread *thread)
     }
     return ret;
 }
+
+void qemu_thread_cancel(QemuThread *thread)
+{
+    int err;
+
+    err = pthread_cancel(thread->thread);
+    if (err) {
+        error_exit(err, __func__);
+    }
+}
-- 
1.8.3.1




* [RFC v4 PATCH 03/49] multi-process: add a command line option for debug file
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
  2019-10-24  9:08 ` [RFC v4 PATCH 01/49] multi-process: memory: alloc RAM from file at offset Jagannathan Raman
  2019-10-24  9:08 ` [RFC v4 PATCH 02/49] multi-process: util: Add qemu_thread_cancel() to cancel running thread Jagannathan Raman
@ 2019-10-24  9:08 ` Jagannathan Raman
  2019-11-13 15:35   ` Stefan Hajnoczi
  2019-10-24  9:08 ` [RFC v4 PATCH 04/49] multi-process: Add stub functions to facilitate build of multi-process Jagannathan Raman
                   ` (50 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Add a LOG_REMOTE_DEBUG log item. It can be enabled with the -d rdebug
command line option when starting QEMU.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
---
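Note (illustration only, not part of the patch): the diff below adds a
bit flag and a matching { mask, name, help } entry to a NULL-terminated
table. A minimal standalone sketch of how a comma-separated option string
can be turned into a mask with such a table follows. All DEMO_/demo_
names are hypothetical, and the "other" entry is made up; only the
"rdebug" name and the (1 << 18) value come from the patch. This is not
QEMU's actual parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_LOG_OTHER         (1 << 17)   /* made-up entry for the demo */
#define DEMO_LOG_REMOTE_DEBUG  (1 << 18)   /* value taken from the patch */

typedef struct {
    int mask;
    const char *name;
    const char *help;
} DemoLogItem;

/* NULL-terminated table, in the same shape as the one the patch extends. */
static const DemoLogItem demo_log_items[] = {
    { DEMO_LOG_OTHER, "other", "illustrative entry" },
    { DEMO_LOG_REMOTE_DEBUG, "rdebug", "log remote debug" },
    { 0, NULL, NULL },
};

/*
 * Convert a comma-separated list of item names into a bit mask.
 * Returns 0 if the list is empty or contains an unknown name.
 */
static int demo_str_to_log_mask(const char *str)
{
    int mask = 0;

    while (*str) {
        size_t len = strcspn(str, ",");
        const DemoLogItem *item;

        for (item = demo_log_items; item->name; item++) {
            if (strlen(item->name) == len &&
                memcmp(item->name, str, len) == 0) {
                mask |= item->mask;
                break;
            }
        }
        if (!item->name) {
            return 0;           /* unknown name: reject the whole list */
        }
        str += len;
        if (*str == ',') {
            str++;
        }
    }
    return mask;
}
```

Code that wants to log only when the flag is set would then test the
resulting mask against DEMO_LOG_REMOTE_DEBUG before printing.
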
 include/qemu/log.h | 1 +
 util/log.c         | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/qemu/log.h b/include/qemu/log.h
index b097a6c..ca6f490 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -45,6 +45,7 @@ static inline bool qemu_log_separate(void)
 /* LOG_TRACE (1 << 15) is defined in log-for-trace.h */
 #define CPU_LOG_TB_OP_IND  (1 << 16)
 #define CPU_LOG_TB_FPU     (1 << 17)
+#define LOG_REMOTE_DEBUG   (1 << 18)
 
 /* Lock output for a series of related logs.  Since this is not needed
  * for a single qemu_log / qemu_log_mask / qemu_log_mask_and_addr, we
diff --git a/util/log.c b/util/log.c
index 1d1b33f..78e3e82 100644
--- a/util/log.c
+++ b/util/log.c
@@ -273,6 +273,8 @@ const QEMULogItem qemu_log_items[] = {
     { CPU_LOG_TB_NOCHAIN, "nochain",
       "do not chain compiled TBs so that \"exec\" and \"cpu\" show\n"
       "complete traces" },
+    { LOG_REMOTE_DEBUG, "rdebug",
+      "log remote debug" },
     { 0, NULL, NULL },
 };
 
-- 
1.8.3.1




* [RFC v4 PATCH 04/49] multi-process: Add stub functions to facilitate build of multi-process
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (2 preceding siblings ...)
  2019-10-24  9:08 ` [RFC v4 PATCH 03/49] multi-process: add a command line option for debug file Jagannathan Raman
@ 2019-10-24  9:08 ` Jagannathan Raman
  2019-10-24  9:08 ` [RFC v4 PATCH 05/49] multi-process: Add config option for multi-process QEMU Jagannathan Raman
                   ` (49 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

Add stub functions that are needed at compile time but not at
runtime.

Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
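Note (illustration only, not part of the patch): most stubs below are
ordinary empty definitions, and stubs/monitor.c additionally marks
hmp_handle_error with #pragma weak. A minimal standalone sketch of the
weak-stub idea follows, using the GCC/Clang weak attribute. The demo_
names are hypothetical; the -ENOSYS return mirrors what several stubs in
this patch return.

```c
#include <assert.h>
#include <errno.h>

/*
 * Weak default definition (GCC/Clang extension). If another object file
 * in the link provides a normal definition of the same symbol, the linker
 * picks that one; otherwise this stub satisfies the reference.
 */
__attribute__((weak)) int demo_monitor_get_cpu_index(void)
{
    return -ENOSYS;
}

/* Caller that builds either way and treats "not implemented" as CPU 0. */
static int demo_current_cpu_or_zero(void)
{
    int idx = demo_monitor_get_cpu_index();

    return idx < 0 ? 0 : idx;
}
```

A stub like this only has to satisfy the linker; it is acceptable as long
as the code path that reaches it is never taken at runtime, or the
fallback value is harmless.
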
 accel/stubs/kvm-stub.c    |  5 +++
 accel/stubs/tcg-stub.c    | 96 +++++++++++++++++++++++++++++++++++++++++++++++
 stubs/audio.c             | 12 ++++++
 stubs/machine-init-done.c |  4 ++
 stubs/monitor.c           | 41 ++++++++++++++++++++
 stubs/net-stub.c          | 31 +++++++++++++++
 stubs/replay.c            | 14 +++++++
 stubs/vl-stub.c           | 79 ++++++++++++++++++++++++++++++++++++++
 stubs/vmstate.c           | 20 ++++++++++
 stubs/xen-mapcache.c      | 22 +++++++++++
 10 files changed, 324 insertions(+)
 create mode 100644 stubs/audio.c
 create mode 100644 stubs/net-stub.c
 create mode 100644 stubs/vl-stub.c
 create mode 100644 stubs/xen-mapcache.c

diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 6feb66e..f129dfb 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -31,6 +31,7 @@ bool kvm_allowed;
 bool kvm_readonly_mem_allowed;
 bool kvm_ioeventfd_any_length_allowed;
 bool kvm_msi_use_devid;
+bool kvm_halt_in_kernel_allowed;
 
 int kvm_destroy_vcpu(CPUState *cpu)
 {
@@ -58,6 +59,10 @@ void kvm_cpu_synchronize_post_init(CPUState *cpu)
 {
 }
 
+void kvm_cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+}
+
 int kvm_cpu_exec(CPUState *cpu)
 {
     abort();
diff --git a/accel/stubs/tcg-stub.c b/accel/stubs/tcg-stub.c
index e2d23ed..9b55fb0 100644
--- a/accel/stubs/tcg-stub.c
+++ b/accel/stubs/tcg-stub.c
@@ -15,11 +15,107 @@
 #include "cpu.h"
 #include "tcg/tcg.h"
 #include "exec/exec-all.h"
+#include "translate-all.h"
+#include "exec/ram_addr.h"
+
+bool parallel_cpus;
 
 void tb_flush(CPUState *cpu)
 {
 }
 
+void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr)
+{
+}
+
+void tb_invalidate_phys_range(ram_addr_t start, ram_addr_t end)
+{
+}
+
+void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end)
+{
+}
+
+void tb_invalidate_phys_page_fast(struct page_collection *pages,
+                                  tb_page_addr_t start, int len,
+                                  uintptr_t retaddr)
+{
+}
+
+void tlb_init(CPUState *cpu)
+{
+}
+
 void tlb_set_dirty(CPUState *cpu, target_ulong vaddr)
 {
 }
+
+void tlb_flush(CPUState *cpu)
+{
+}
+
+void tlb_flush_page(CPUState *cpu, target_ulong addr)
+{
+}
+
+void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
+{
+}
+
+void tcg_region_init(void)
+{
+}
+
+void tcg_register_thread(void)
+{
+}
+
+void tcg_flush_softmmu_tlb(CPUState *cs)
+{
+}
+
+void cpu_loop_exit_noexc(CPUState *cpu)
+{
+    cpu->exception_index = -1;
+    cpu_loop_exit(cpu);
+}
+
+void cpu_loop_exit(CPUState *cpu)
+{
+    cpu->can_do_io = 1;
+    siglongjmp(cpu->jmp_env, 1);
+}
+
+void cpu_reloading_memory_map(void)
+{
+}
+
+int cpu_exec(CPUState *cpu)
+{
+    return 0;
+}
+
+void cpu_exec_step_atomic(CPUState *cpu)
+{
+}
+
+bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc, bool will_exit)
+{
+    return false;
+}
+
+void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
+{
+    while (1) {
+    }
+}
+
+struct page_collection *
+page_collection_lock(tb_page_addr_t start, tb_page_addr_t end)
+{
+    return NULL;
+}
+
+void page_collection_unlock(struct page_collection *set)
+{
+}
diff --git a/stubs/audio.c b/stubs/audio.c
new file mode 100644
index 0000000..8ae3b0f
--- /dev/null
+++ b/stubs/audio.c
@@ -0,0 +1,12 @@
+#include "qemu/osdep.h"
+#include "audio/audio.h"
+
+AudioState *audio_state_by_name(const char *name)
+{
+    return NULL;
+}
+
+const char *audio_get_id(QEMUSoundCard *card)
+{
+    return NULL;
+}
diff --git a/stubs/machine-init-done.c b/stubs/machine-init-done.c
index cd8e813..3deabc9 100644
--- a/stubs/machine-init-done.c
+++ b/stubs/machine-init-done.c
@@ -6,3 +6,7 @@ bool machine_init_done = true;
 void qemu_add_machine_init_done_notifier(Notifier *notify)
 {
 }
+
+void qemu_remove_machine_init_done_notifier(Notifier *notify)
+{
+}
diff --git a/stubs/monitor.c b/stubs/monitor.c
index c3e9a2e..17d2493 100644
--- a/stubs/monitor.c
+++ b/stubs/monitor.c
@@ -2,9 +2,19 @@
 #include "qapi/error.h"
 #include "qapi/qapi-emit-events.h"
 #include "monitor/monitor.h"
+#include "qapi/qapi-types-misc.h"
+#include "qapi/qapi-commands-misc.h"
+#include "qapi/qapi-types-qom.h"
+#include "qapi/qapi-commands-qdev.h"
+#include "hw/qdev-core.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/runstate.h"
+#include "monitor/hmp.h"
 
 __thread Monitor *cur_mon;
 
+#pragma weak hmp_handle_error
+
 int monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
 {
     abort();
@@ -27,3 +37,34 @@ void monitor_init_hmp(Chardev *chr, bool use_readline)
 void qapi_event_emit(QAPIEvent event, QDict *qdict)
 {
 }
+
+int monitor_get_cpu_index(void)
+{
+    return -ENOSYS;
+}
+int monitor_printf(Monitor *mon, const char *fmt, ...)
+{
+    return -ENOSYS;
+}
+
+bool monitor_cur_is_qmp(void)
+{
+    return false;
+}
+
+ObjectPropertyInfoList *qmp_device_list_properties(const char *typename,
+                                                   Error **errp)
+{
+    return NULL;
+}
+
+VMChangeStateEntry *qdev_add_vm_change_state_handler(DeviceState *dev,
+                                                     VMChangeStateHandler *cb,
+                                                     void *opaque)
+{
+    return NULL;
+}
+
+void hmp_handle_error(Monitor *mon, Error **errp)
+{
+}
diff --git a/stubs/net-stub.c b/stubs/net-stub.c
new file mode 100644
index 0000000..cb2274b
--- /dev/null
+++ b/stubs/net-stub.c
@@ -0,0 +1,31 @@
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "net/net.h"
+
+int qemu_find_net_clients_except(const char *id, NetClientState **ncs,
+                                 NetClientDriver type, int max)
+{
+    return -ENOSYS;
+}
+
+NetClientState *net_hub_port_find(int hub_id)
+{
+    return NULL;
+}
+
+int net_hub_id_for_client(NetClientState *nc, int *id)
+{
+    return -ENOSYS;
+}
+
+int qemu_show_nic_models(const char *arg, const char *const *models)
+{
+    return -ENOSYS;
+}
+
+int qemu_find_nic_model(NICInfo *nd, const char * const *models,
+                        const char *default_model)
+{
+    return -ENOSYS;
+}
+
diff --git a/stubs/replay.c b/stubs/replay.c
index 10b3925..c0b51cb 100644
--- a/stubs/replay.c
+++ b/stubs/replay.c
@@ -79,3 +79,17 @@ void replay_mutex_lock(void)
 void replay_mutex_unlock(void)
 {
 }
+
+bool replay_has_checkpoint(void)
+{
+    return false;
+}
+
+int replay_get_instructions(void)
+{
+    return 0;
+}
+
+void replay_account_executed_instructions(void)
+{
+}
diff --git a/stubs/vl-stub.c b/stubs/vl-stub.c
new file mode 100644
index 0000000..fff72be
--- /dev/null
+++ b/stubs/vl-stub.c
@@ -0,0 +1,79 @@
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/uuid.h"
+#include "sysemu/sysemu.h"
+#include "exec/cpu-common.h"
+#include "exec/gdbstub.h"
+#include "sysemu/replay.h"
+#include "disas/disas.h"
+#include "sysemu/runstate.h"
+
+bool tcg_allowed;
+bool xen_allowed;
+bool boot_strict;
+bool qemu_uuid_set;
+
+int mem_prealloc;
+int smp_cpus;
+int vga_interface_type = VGA_NONE;
+int smp_cores = 1;
+int smp_threads = 1;
+int icount_align_option;
+int boot_menu;
+
+unsigned int max_cpus;
+const uint32_t arch_type;
+const char *mem_path;
+uint8_t qemu_extra_params_fw[2];
+uint8_t *boot_splash_filedata;
+size_t boot_splash_filedata_size;
+struct syminfo *syminfos;
+
+ram_addr_t ram_size;
+MachineState *current_machine;
+QemuUUID qemu_uuid;
+
+int runstate_is_running(void)
+{
+    return 0;
+}
+
+void runstate_set(RunState new_state)
+{
+}
+
+void vm_state_notify(int running, RunState state)
+{
+}
+
+bool qemu_vmstop_requested(RunState *r)
+{
+    return false;
+}
+
+void qemu_system_debug_request(void)
+{
+}
+
+char *qemu_find_file(int type, const char *name)
+{
+    return NULL;
+}
+
+void gdb_set_stop_cpu(CPUState *cpu)
+{
+}
+
+void replay_enable_events(void)
+{
+}
+
+void replay_disable_events(void)
+{
+}
+
+#ifdef TARGET_I386
+void x86_cpu_list(void)
+{
+}
+#endif
diff --git a/stubs/vmstate.c b/stubs/vmstate.c
index e1e89b8..a9824bc 100644
--- a/stubs/vmstate.c
+++ b/stubs/vmstate.c
@@ -1,8 +1,11 @@
 #include "qemu/osdep.h"
 #include "migration/vmstate.h"
+#include "migration/misc.h"
 
 const VMStateDescription vmstate_dummy = {};
 
+const VMStateInfo vmstate_info_timer;
+
 int vmstate_register_with_alias_id(DeviceState *dev,
                                    int instance_id,
                                    const VMStateDescription *vmsd,
@@ -23,3 +26,20 @@ bool vmstate_check_only_migratable(const VMStateDescription *vmsd)
 {
     return true;
 }
+
+void vmstate_register_ram(MemoryRegion *mr, DeviceState *dev)
+{
+}
+
+void vmstate_unregister_ram(MemoryRegion *mr, DeviceState *dev)
+{
+}
+
+void vmstate_register_ram_global(MemoryRegion *mr)
+{
+}
+
+bool migration_is_idle(void)
+{
+    return true;
+}
diff --git a/stubs/xen-mapcache.c b/stubs/xen-mapcache.c
new file mode 100644
index 0000000..af5c031
--- /dev/null
+++ b/stubs/xen-mapcache.c
@@ -0,0 +1,22 @@
+#include "qemu/osdep.h"
+#include "exec/hwaddr.h"
+#include "exec/cpu-common.h"
+#include "sysemu/xen-mapcache.h"
+
+#ifdef CONFIG_XEN
+
+void xen_invalidate_map_cache_entry(uint8_t *buffer)
+{
+}
+
+uint8_t *xen_map_cache(hwaddr phys_addr, hwaddr size, uint8_t lock, bool dma)
+{
+    return NULL;
+}
+
+ram_addr_t xen_ram_addr_from_mapcache(void *ptr)
+{
+    return 0;
+}
+
+#endif
-- 
1.8.3.1




* [RFC v4 PATCH 05/49] multi-process: Add config option for multi-process QEMU
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (3 preceding siblings ...)
  2019-10-24  9:08 ` [RFC v4 PATCH 04/49] multi-process: Add stub functions to facilate build of multi-process Jagannathan Raman
@ 2019-10-24  9:08 ` Jagannathan Raman
  2019-10-24  9:08 ` [RFC v4 PATCH 06/49] multi-process: build system for remote device process Jagannathan Raman
                   ` (48 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

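Add --enable-mpqemu and --disable-mpqemu configure options (disabled by
default). When enabled, configure emits CONFIG_MPQEMU=y so that the
multi-process code can be built conditionally.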

Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
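A minimal sketch of how C code could be guarded by the new option,
assuming CONFIG_MPQEMU reaches the compiler as a preprocessor macro via
the generated host config header. The function name below is
hypothetical and is not part of this series.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: reports whether multi-process support was
 * compiled in. CONFIG_MPQEMU is assumed to be defined by the build
 * system when configure is run with --enable-mpqemu.
 */
bool sketch_mpqemu_built_in(void)
{
#ifdef CONFIG_MPQEMU
    return true;
#else
    return false;
#endif
}
```

Without the macro defined, the helper returns false, which matches the
"no" default chosen in configure.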
 configure | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/configure b/configure
index 145fcab..135afa9 100755
--- a/configure
+++ b/configure
@@ -498,6 +498,7 @@ libxml2=""
 debug_mutex="no"
 libpmem=""
 default_devices="yes"
+mpqemu="no"
 
 supported_cpu="no"
 supported_os="no"
@@ -1529,6 +1530,10 @@ for opt do
   ;;
   --disable-xkbcommon) xkbcommon=no
   ;;
+  --enable-mpqemu) mpqemu=yes
+  ;;
+  --disable-mpqemu) mpqemu=no
+  ;;
   *)
       echo "ERROR: unknown option $opt"
       echo "Try '$0 --help' for more information"
@@ -1813,6 +1818,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   debug-mutex     mutex debugging support
   libpmem         libpmem support
   xkbcommon       xkbcommon support
+  mpqemu          multi-process QEMU support
 
 NOTE: The object files are built at the place where configure is launched
 EOF
@@ -6439,6 +6445,7 @@ echo "capstone          $capstone"
 echo "libpmem support   $libpmem"
 echo "libudev           $libudev"
 echo "default devices   $default_devices"
+echo "multiprocess QEMU $mpqemu"
 
 if test "$supported_cpu" = "no"; then
     echo
@@ -7241,6 +7248,10 @@ if test "$libpmem" = "yes" ; then
   echo "CONFIG_LIBPMEM=y" >> $config_host_mak
 fi
 
+if test "$mpqemu" = "yes" ; then
+  echo "CONFIG_MPQEMU=y" >> $config_host_mak
+fi
+
 if test "$bochs" = "yes" ; then
   echo "CONFIG_BOCHS=y" >> $config_host_mak
 fi
-- 
1.8.3.1




* [RFC v4 PATCH 06/49] multi-process: build system for remote device process
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (4 preceding siblings ...)
  2019-10-24  9:08 ` [RFC v4 PATCH 05/49] multi-process: Add config option for multi-process QEMU Jagannathan Raman
@ 2019-10-24  9:08 ` Jagannathan Raman
  2019-10-24  9:08 ` [RFC v4 PATCH 07/49] multi-process: define mpqemu-link object Jagannathan Raman
                   ` (47 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

Modify the Makefiles to build the remote device process
(qemu-scsi-dev) when CONFIG_MPQEMU is set, and add a minimal
main() function for that process.

Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
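The main() added here only calls module_call_init(MODULE_INIT_QOM). The
following is a simplified sketch of the constructor-based registration
pattern that call relies on, assuming a GCC or Clang toolchain. All
sketch_* names are hypothetical stand-ins, not the real QEMU
definitions.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SKETCH_INIT_QOM, SKETCH_INIT_MAX } sketch_init_type;

typedef struct SketchModuleEntry {
    void (*init)(void);
    struct SketchModuleEntry *next;
} SketchModuleEntry;

/* One singly linked list of init hooks per init type. */
static SketchModuleEntry *sketch_lists[SKETCH_INIT_MAX];

void sketch_register_module_init(SketchModuleEntry *e, sketch_init_type t)
{
    assert(e != NULL && e->init != NULL);
    e->next = sketch_lists[t];
    sketch_lists[t] = e;
}

/* Run every hook registered for type t; return how many were run. */
int sketch_module_call_init(sketch_init_type t)
{
    int n = 0;

    for (SketchModuleEntry *e = sketch_lists[t]; e; e = e->next) {
        e->init();
        n++;
    }
    return n;
}

/* The constructor runs before main() and only queues fn for later. */
#define sketch_type_init(fn)                                          \
    static SketchModuleEntry fn##_entry = { .init = fn };             \
    static void __attribute__((constructor)) fn##_ctor(void)          \
    {                                                                 \
        sketch_register_module_init(&fn##_entry, SKETCH_INIT_QOM);    \
    }

static int sketch_types_registered;

static void sketch_link_register_types(void)
{
    sketch_types_registered++;
}

sketch_type_init(sketch_link_register_types)
```

Under these assumptions, nothing is registered until main() calls
sketch_module_call_init(SKETCH_INIT_QOM), which is why even a very small
main() is enough to bring up the type system.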
 Makefile                |  2 ++
 Makefile.objs           | 22 ++++++++++++++++++++
 Makefile.target         | 53 +++++++++++++++++++++++++++++++++++++++++++++++--
 backends/Makefile.objs  |  2 ++
 block/Makefile.objs     |  2 ++
 hw/Makefile.objs        |  7 +++++++
 hw/block/Makefile.objs  |  2 ++
 hw/core/Makefile.objs   | 16 +++++++++++++++
 hw/nvram/Makefile.objs  |  2 ++
 hw/pci/Makefile.objs    |  4 ++++
 hw/scsi/Makefile.objs   |  2 ++
 migration/Makefile.objs |  2 ++
 qom/Makefile.objs       |  3 +++
 remote/Makefile.objs    |  1 +
 remote/remote-main.c    | 37 ++++++++++++++++++++++++++++++++++
 stubs/replay.c          |  4 ++++
 16 files changed, 159 insertions(+), 2 deletions(-)
 create mode 100644 remote/Makefile.objs
 create mode 100644 remote/remote-main.c

diff --git a/Makefile b/Makefile
index d20e7ff..c4d5048 100644
--- a/Makefile
+++ b/Makefile
@@ -437,6 +437,8 @@ dummy := $(call unnest-vars,, \
                 qom-obj-y \
                 io-obj-y \
                 common-obj-y \
+                remote-pci-obj-y \
+                remote-lsi-obj-y \
                 common-obj-m \
                 ui-obj-y \
                 ui-obj-m \
diff --git a/Makefile.objs b/Makefile.objs
index abcbd89..c2ac261 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -21,6 +21,28 @@ block-obj-$(CONFIG_REPLICATION) += replication.o
 
 block-obj-m = block/
 
+#########################################################
+# remote-pci-obj-y is common code used by remote devices
+
+remote-pci-obj-$(CONFIG_MPQEMU) += hw/
+remote-pci-obj-$(CONFIG_MPQEMU) += qom/
+remote-pci-obj-$(CONFIG_MPQEMU) += backends/
+remote-pci-obj-$(CONFIG_MPQEMU) += block/
+remote-pci-obj-$(CONFIG_MPQEMU) += migration/
+remote-pci-obj-$(CONFIG_MPQEMU) += remote/
+
+remote-pci-obj-$(CONFIG_MPQEMU) += cpus-common.o
+remote-pci-obj-$(CONFIG_MPQEMU) += dma-helpers.o
+remote-pci-obj-$(CONFIG_MPQEMU) += blockdev.o
+remote-pci-obj-$(CONFIG_MPQEMU) += qdev-monitor.o
+remote-pci-obj-$(CONFIG_MPQEMU) += bootdevice.o
+remote-pci-obj-$(CONFIG_MPQEMU) += iothread.o
+
+##############################################################
+# remote-lsi-obj-y is code used to implement remote LSI device
+
+remote-lsi-obj-$(CONFIG_MPQEMU) += hw/
+
 #######################################################################
 # crypto-obj-y is code used by both qemu system emulation and qemu-img
 
diff --git a/Makefile.target b/Makefile.target
index 5e91623..e454aae 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -36,7 +36,12 @@ QEMU_PROG_BUILD = $(QEMU_PROG)
 endif
 endif
 
-PROGS=$(QEMU_PROG) $(QEMU_PROGW)
+ifdef CONFIG_MPQEMU
+SCSI_DEV_PROG=qemu-scsi-dev
+SCSI_DEV_BUILD = $(SCSI_DEV_PROG)
+endif
+
+PROGS=$(QEMU_PROG) $(QEMU_PROGW) $(SCSI_DEV_PROG)
 STPFILES=
 
 config-target.h: config-target.h-timestamp
@@ -119,6 +124,18 @@ obj-y += disas.o
 obj-$(call notempty,$(TARGET_XML_FILES)) += gdbstub-xml.o
 LIBS := $(libs_cpu) $(LIBS)
 
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += accel/stubs/kvm-stub.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += accel/stubs/tcg-stub.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += accel/stubs/hax-stub.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += accel/stubs/whpx-stub.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/vl-stub.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/net-stub.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/monitor.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/replay.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/xen-mapcache.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/audio.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/monitor.o
+
 #########################################################
 # Linux user emulator target
 
@@ -175,6 +192,17 @@ endif # CONFIG_SOFTMMU
 dummy := $(call unnest-vars,,obj-y)
 all-obj-y := $(obj-y)
 
+dummy := $(call unnest-vars,..,remote-pci-tgt-obj-y)
+all-remote-pci-obj-y := $(remote-pci-tgt-obj-y)
+
+all-remote-pci-obj-y += memory.o
+all-remote-pci-obj-y += exec.o
+all-remote-pci-obj-y += ioport.o
+all-remote-pci-obj-y += cpus.o
+
+remote-pci-obj-y :=
+remote-lsi-obj-y :=
+
 include $(SRC_PATH)/Makefile.objs
 dummy := $(call unnest-vars,.., \
                authz-obj-y \
@@ -186,7 +214,10 @@ dummy := $(call unnest-vars,.., \
                qom-obj-y \
                io-obj-y \
                common-obj-y \
-               common-obj-m)
+               common-obj-m \
+               remote-pci-obj-y \
+               remote-lsi-obj-y)
+
 all-obj-y += $(common-obj-y)
 all-obj-y += $(qom-obj-y)
 all-obj-$(CONFIG_SOFTMMU) += $(authz-obj-y)
@@ -195,8 +226,19 @@ all-obj-$(CONFIG_USER_ONLY) += $(crypto-user-obj-y)
 all-obj-$(CONFIG_SOFTMMU) += $(crypto-obj-y)
 all-obj-$(CONFIG_SOFTMMU) += $(io-obj-y)
 
+all-remote-pci-obj-y += $(authz-obj-y)
+all-remote-pci-obj-y += $(block-obj-y)
+all-remote-pci-obj-y += $(crypto-obj-y)
+all-remote-pci-obj-y += $(io-obj-y)
+all-remote-pci-obj-y += $(chardev-obj-y)
+all-remote-pci-obj-y += $(remote-pci-obj-y)
+
+
+all-remote-lsi-obj-y += $(all-remote-pci-obj-y) $(remote-lsi-obj-y)
+
 ifdef CONFIG_SOFTMMU
 $(QEMU_PROG_BUILD): config-devices.mak
+$(SCSI_DEV_BUILD): config-devices.mak
 endif
 
 COMMON_LDADDS = ../libqemuutil.a
@@ -209,6 +251,13 @@ ifdef CONFIG_DARWIN
 	$(call quiet-command,SetFile -a C $@,"SETFILE","$(TARGET_DIR)$@")
 endif
 
+$(SCSI_DEV_BUILD): $(all-remote-lsi-obj-y) $(COMMON_LDADDS)
+	$(call LINK, $(filter-out %.mak, $^))
+ifdef CONFIG_DARWIN
+	$(call quiet-command,Rez -append $(SRC_PATH)/pc-bios/qemu.rsrc -o $@,"REZ","$(TARGET_DIR)$@")
+	$(call quiet-command,SetFile -a C $@,"SETFILE","$(TARGET_DIR)$@")
+endif
+
 gdbstub-xml.c: $(TARGET_XML_FILES) $(SRC_PATH)/scripts/feature_to_c.sh
 	$(call quiet-command,rm -f $@ && $(SHELL) $(SRC_PATH)/scripts/feature_to_c.sh $@ $(TARGET_XML_FILES),"GEN","$(TARGET_DIR)$@")
 
diff --git a/backends/Makefile.objs b/backends/Makefile.objs
index f069111..bd52cbc 100644
--- a/backends/Makefile.objs
+++ b/backends/Makefile.objs
@@ -17,3 +17,5 @@ endif
 common-obj-$(call land,$(CONFIG_VHOST_USER),$(CONFIG_VIRTIO)) += vhost-user.o
 
 common-obj-$(CONFIG_LINUX) += hostmem-memfd.o
+
+remote-pci-obj-$(CONFIG_MPQEMU) += hostmem.o
diff --git a/block/Makefile.objs b/block/Makefile.objs
index e394fe0..adcc1b0 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -67,3 +67,5 @@ qcow.o-libs        := -lz
 linux-aio.o-libs   := -laio
 parallels.o-cflags := $(LIBXML2_CFLAGS)
 parallels.o-libs   := $(LIBXML2_LIBS)
+
+remote-pci-obj-$(CONFIG_MPQEMU) += stream.o
diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index ece6cc3..4e28053 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -43,3 +43,10 @@ endif
 
 common-obj-y += $(devices-dirs-y)
 obj-y += $(devices-dirs-y)
+
+remote-pci-obj-$(CONFIG_MPQEMU) += core/
+remote-pci-obj-$(CONFIG_MPQEMU) += block/
+remote-pci-obj-$(CONFIG_MPQEMU) += pci/
+remote-pci-obj-$(CONFIG_MPQEMU) += nvram/
+
+remote-lsi-obj-$(CONFIG_MPQEMU) += scsi/
diff --git a/hw/block/Makefile.objs b/hw/block/Makefile.objs
index f5f643f..7286fbd 100644
--- a/hw/block/Makefile.objs
+++ b/hw/block/Makefile.objs
@@ -15,3 +15,5 @@ obj-$(CONFIG_VIRTIO_BLK) += virtio-blk.o
 obj-$(CONFIG_VHOST_USER_BLK) += vhost-user-blk.o
 
 obj-y += dataplane/
+
+remote-pci-obj-$(CONFIG_MPQEMU) += block.o cdrom.o hd-geometry.o
diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index fd0550d..9ef6b42 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -28,3 +28,19 @@ common-obj-$(CONFIG_SOFTMMU) += null-machine.o
 obj-$(CONFIG_SOFTMMU) += machine-qmp-cmds.o
 obj-$(CONFIG_SOFTMMU) += numa.o
 common-obj-$(CONFIG_SOFTMMU) += machine-hmp-cmds.o
+
+remote-pci-obj-$(CONFIG_MPQEMU) += qdev-properties.o
+remote-pci-obj-$(CONFIG_MPQEMU) += qdev.o
+remote-pci-obj-$(CONFIG_MPQEMU) += bus.o
+remote-pci-obj-$(CONFIG_MPQEMU) += irq.o
+remote-pci-obj-$(CONFIG_MPQEMU) += hotplug.o
+remote-pci-obj-$(CONFIG_MPQEMU) += machine.o
+remote-pci-obj-$(CONFIG_MPQEMU) += fw-path-provider.o
+remote-pci-obj-$(CONFIG_MPQEMU) += reset.o
+remote-pci-obj-$(CONFIG_MPQEMU) += sysbus.o
+remote-pci-obj-$(CONFIG_MPQEMU) += loader.o
+remote-pci-obj-$(CONFIG_MPQEMU) += nmi.o
+remote-pci-obj-$(CONFIG_MPQEMU) += qdev-properties-system.o
+remote-pci-obj-$(CONFIG_MPQEMU) += qdev-fw.o
+remote-pci-obj-$(CONFIG_MPQEMU) += numa.o
+remote-pci-obj-$(CONFIG_MPQEMU) += cpu.o
diff --git a/hw/nvram/Makefile.objs b/hw/nvram/Makefile.objs
index 26f7b4c..9802a31 100644
--- a/hw/nvram/Makefile.objs
+++ b/hw/nvram/Makefile.objs
@@ -6,3 +6,5 @@ common-obj-y += chrp_nvram.o
 common-obj-$(CONFIG_MAC_NVRAM) += mac_nvram.o
 obj-$(CONFIG_PSERIES) += spapr_nvram.o
 obj-$(CONFIG_NRF51_SOC) += nrf51_nvm.o
+
+remote-pci-obj-$(CONFIG_MPQEMU) += fw_cfg.o
diff --git a/hw/pci/Makefile.objs b/hw/pci/Makefile.objs
index c78f2fb..955be54 100644
--- a/hw/pci/Makefile.objs
+++ b/hw/pci/Makefile.objs
@@ -12,3 +12,7 @@ common-obj-$(CONFIG_PCI_EXPRESS) += pcie_port.o pcie_host.o
 
 common-obj-$(call lnot,$(CONFIG_PCI)) += pci-stub.o
 common-obj-$(CONFIG_ALL) += pci-stub.o
+
+remote-pci-obj-$(CONFIG_MPQEMU) += pci.o pci_bridge.o
+remote-pci-obj-$(CONFIG_MPQEMU) += msi.o msix.o
+remote-pci-obj-$(CONFIG_MPQEMU) += pcie.o
diff --git a/hw/scsi/Makefile.objs b/hw/scsi/Makefile.objs
index 54b36ed..ef97770 100644
--- a/hw/scsi/Makefile.objs
+++ b/hw/scsi/Makefile.objs
@@ -13,3 +13,5 @@ obj-y += virtio-scsi.o virtio-scsi-dataplane.o
 obj-$(CONFIG_VHOST_SCSI) += vhost-scsi-common.o vhost-scsi.o
 obj-$(CONFIG_VHOST_USER_SCSI) += vhost-scsi-common.o vhost-user-scsi.o
 endif
+
+remote-lsi-obj-$(CONFIG_MPQEMU) += scsi-generic.o scsi-bus.o lsi53c895a.o scsi-disk.o emulation.o
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index a4f3baf..016b6ab 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -13,3 +13,5 @@ common-obj-$(CONFIG_RDMA) += rdma.o
 common-obj-$(CONFIG_LIVE_BLOCK_MIGRATION) += block.o
 
 rdma.o-libs := $(RDMA_LIBS)
+
+remote-pci-obj-$(CONFIG_MPQEMU) += qemu-file.o vmstate.o qjson.o vmstate-types.o
diff --git a/qom/Makefile.objs b/qom/Makefile.objs
index f9d7735..07e50e5 100644
--- a/qom/Makefile.objs
+++ b/qom/Makefile.objs
@@ -2,3 +2,6 @@ qom-obj-y = object.o container.o qom-qobject.o
 qom-obj-y += object_interfaces.o
 
 common-obj-$(CONFIG_SOFTMMU) += qom-hmp-cmds.o qom-qmp-cmds.o
+
+remote-pci-obj-$(CONFIG_MPQEMU) += object.o qom-qobject.o container.o
+remote-pci-obj-$(CONFIG_MPQEMU) += object_interfaces.o
diff --git a/remote/Makefile.objs b/remote/Makefile.objs
new file mode 100644
index 0000000..a9b2256
--- /dev/null
+++ b/remote/Makefile.objs
@@ -0,0 +1 @@
+remote-pci-obj-$(CONFIG_MPQEMU) += remote-main.o
diff --git a/remote/remote-main.c b/remote/remote-main.c
new file mode 100644
index 0000000..cccad12
--- /dev/null
+++ b/remote/remote-main.c
@@ -0,0 +1,37 @@
+/*
+ * Remote device initialization
+ *
+ * Copyright 2019, Oracle and/or its affiliates.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include <stdio.h>
+
+#include "qemu/module.h"
+
+int main(int argc, char *argv[])
+{
+    module_call_init(MODULE_INIT_QOM);
+
+    return 0;
+}
diff --git a/stubs/replay.c b/stubs/replay.c
index c0b51cb..4a966ff 100644
--- a/stubs/replay.c
+++ b/stubs/replay.c
@@ -93,3 +93,7 @@ int replay_get_instructions(void)
 void replay_account_executed_instructions(void)
 {
 }
+
+void replay_add_blocker(Error *reason)
+{
+}
-- 
1.8.3.1




* [RFC v4 PATCH 07/49] multi-process: define mpqemu-link object
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (5 preceding siblings ...)
  2019-10-24  9:08 ` [RFC v4 PATCH 06/49] multi-process: build system for remote device process Jagannathan Raman
@ 2019-10-24  9:08 ` Jagannathan Raman
  2019-11-11 16:41   ` Stefan Hajnoczi
  2019-11-13 15:53   ` Stefan Hajnoczi
  2019-10-24  9:08 ` [RFC v4 PATCH 08/49] multi-process: add functions to synchronize proxy and remote endpoints Jagannathan Raman
                   ` (46 subsequent siblings)
  53 siblings, 2 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

Define the mpqemu-link object, which forms the communication link
between QEMU and the remote emulation program.
Add functions to configure the members of an mpqemu-link instance.
Add functions to send and receive messages over the communication
channel.
Add a GMainLoop to handle events received on the communication channel.

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 v1 -> v2:
   - Use default context for main loop instead of a new context

 v2 -> v3:
   - Enabled multi-channel support in the communication link

 v3 -> v4:
  - Change the name of proxy-link to mpqemu-link
  - Use separate locks for sending and receiving messages
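
The send and receive paths in this patch pass file descriptors as
SCM_RIGHTS ancillary data. The following standalone sketch shows that
technique on a plain UNIX socket, assuming a POSIX system. The sketch_*
names and SKETCH_MAX_FDS are hypothetical, and locking and the message
header are left out.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define SKETCH_MAX_FDS 8 /* hypothetical; mirrors REMOTE_MAX_FDS */

/* Ancillary-data buffer, aligned for struct cmsghdr. */
typedef union {
    char control[CMSG_SPACE(SKETCH_MAX_FDS * sizeof(int))];
    struct cmsghdr align;
} SketchCtl;

/* Send len payload bytes plus nfds descriptors with a single sendmsg(). */
int sketch_send_fds(int sock, const void *buf, size_t len,
                    const int *fds, int nfds)
{
    SketchCtl u;
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr hdr;
    ssize_t rc;

    assert(buf != NULL && len > 0);
    if (nfds < 0 || nfds > SKETCH_MAX_FDS) {
        return -ERANGE;
    }

    memset(&hdr, 0, sizeof(hdr));
    memset(&u, 0, sizeof(u));
    hdr.msg_iov = &iov;
    hdr.msg_iovlen = 1;

    if (nfds > 0) {
        size_t fdsize = (size_t)nfds * sizeof(int);
        struct cmsghdr *chdr;

        hdr.msg_control = &u;
        hdr.msg_controllen = CMSG_SPACE(fdsize);
        chdr = CMSG_FIRSTHDR(&hdr);
        chdr->cmsg_len = CMSG_LEN(fdsize);
        chdr->cmsg_level = SOL_SOCKET;
        chdr->cmsg_type = SCM_RIGHTS;
        memcpy(CMSG_DATA(chdr), fds, fdsize);
    }

    do {
        rc = sendmsg(sock, &hdr, 0);
    } while (rc < 0 && errno == EINTR);

    return rc < 0 ? -errno : 0;
}

/*
 * Receive payload bytes into buf. Descriptors passed alongside are
 * stored in fds, which must have room for SKETCH_MAX_FDS entries.
 * Returns the number of descriptors received or a negative errno.
 */
int sketch_recv_fds(int sock, void *buf, size_t len, int *fds)
{
    SketchCtl u;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr hdr;
    struct cmsghdr *chdr;
    ssize_t rc;
    int nfds = 0;

    memset(&hdr, 0, sizeof(hdr));
    memset(&u, 0, sizeof(u));
    hdr.msg_iov = &iov;
    hdr.msg_iovlen = 1;
    hdr.msg_control = &u;
    hdr.msg_controllen = sizeof(u);

    do {
        rc = recvmsg(sock, &hdr, 0);
    } while (rc < 0 && errno == EINTR);
    if (rc < 0) {
        return -errno;
    }

    for (chdr = CMSG_FIRSTHDR(&hdr); chdr; chdr = CMSG_NXTHDR(&hdr, chdr)) {
        if (chdr->cmsg_level == SOL_SOCKET && chdr->cmsg_type == SCM_RIGHTS) {
            size_t fdsize = chdr->cmsg_len - CMSG_LEN(0);

            nfds = (int)(fdsize / sizeof(int));
            memcpy(fds, CMSG_DATA(chdr), fdsize);
            break;
        }
    }
    return nfds;
}
```

A caller could create a socketpair(), send one end of a pipe across it,
and then read from the received descriptor. The descriptor number
usually differs on the receiving side, but it refers to the same open
file.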

 include/io/mpqemu-link.h | 150 +++++++++++++++++++++++
 io/Makefile.objs         |   2 +
 io/mpqemu-link.c         | 309 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 461 insertions(+)
 create mode 100644 include/io/mpqemu-link.h
 create mode 100644 io/mpqemu-link.c

diff --git a/include/io/mpqemu-link.h b/include/io/mpqemu-link.h
new file mode 100644
index 0000000..345c67e
--- /dev/null
+++ b/include/io/mpqemu-link.h
@@ -0,0 +1,150 @@
+/*
+ * Communication channel between QEMU and remote device process
+ *
+ * Copyright 2019, Oracle and/or its affiliates.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef MPQEMU_LINK_H
+#define MPQEMU_LINK_H
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include <stddef.h>
+#include <stdint.h>
+#include <pthread.h>
+
+#include "qom/object.h"
+#include "qemu/thread.h"
+
+#define TYPE_MPQEMU_LINK "mpqemu-link"
+#define MPQEMU_LINK(obj) \
+    OBJECT_CHECK(MPQemuLinkState, (obj), TYPE_MPQEMU_LINK)
+
+#define REMOTE_MAX_FDS 8
+
+#define MPQEMU_MSG_HDR_SIZE offsetof(MPQemuMsg, data1.u64)
+
+/**
+ * mpqemu_cmd_t:
+ * CONF_READ        PCI config. space read
+ * CONF_WRITE       PCI config. space write
+ *
+ * mpqemu_cmd_t enum type to specify the command to be executed on the remote
+ * device.
+ */
+typedef enum {
+    INIT = 0,
+    CONF_READ,
+    CONF_WRITE,
+    MAX,
+} mpqemu_cmd_t;
+
+/**
+ * MPQemuMsg:
+ * @cmd: The remote command
+ * @bytestream: Indicates if the data to be shared is structured (data1)
+ *              or unstructured (data2)
+ * @size: Size of the data to be shared
+ * @data1: Structured data
+ * @fds: File descriptors to be shared with remote device
+ * @data2: Unstructured data
+ *
+ * MPQemuMsg Format of the message sent to the remote device from QEMU.
+ *
+ */
+typedef struct {
+    mpqemu_cmd_t cmd;
+    int bytestream;
+    size_t size;
+
+    union {
+        uint64_t u64;
+    } data1;
+
+    int fds[REMOTE_MAX_FDS];
+    int num_fds;
+
+    uint8_t *data2;
+} MPQemuMsg;
+
+struct conf_data_msg {
+    uint32_t addr;
+    uint32_t val;
+    int l;
+};
+
+/**
+ * MPQemuChannel:
+ * @gsrc: GSource object to be used by loop
+ * @gpfd: GPollFD object containing the socket & events to monitor
+ * @sock: Socket to send/receive communication, same as the one in gpfd
+ * @send_lock: Mutex to synchronize access to the send stream
+ * @recv_lock: Mutex to synchronize access to the recv stream
+ *
+ * Defines a channel of the communication link
+ * between QEMU and the remote process
+ */
+
+typedef struct MPQemuChannel {
+    GSource gsrc;
+    GPollFD gpfd;
+    int sock;
+    QemuMutex send_lock;
+    QemuMutex recv_lock;
+} MPQemuChannel;
+
+typedef void (*mpqemu_link_callback)(GIOCondition cond, MPQemuChannel *chan);
+
+/*
+ * MPQemuLinkState Instance info. of the communication
+ * link between QEMU and remote process. The Link could
+ * be made up of multiple channels.
+ *
+ * ctx        GMainContext to be used for communication
+ * loop       Main loop that would be used to poll for incoming data
+ * com        Communication channel to transport control messages
+ *
+ */
+
+typedef struct MPQemuLinkState {
+    Object obj;
+
+    GMainContext *ctx;
+    GMainLoop *loop;
+
+    MPQemuChannel *com;
+
+    mpqemu_link_callback callback;
+} MPQemuLinkState;
+
+MPQemuLinkState *mpqemu_link_create(void);
+void mpqemu_link_finalize(MPQemuLinkState *s);
+
+void mpqemu_msg_send(MPQemuLinkState *s, MPQemuMsg *msg, MPQemuChannel *chan);
+int mpqemu_msg_recv(MPQemuLinkState *s, MPQemuMsg *msg, MPQemuChannel *chan);
+
+void mpqemu_init_channel(MPQemuLinkState *s, MPQemuChannel **chan, int fd);
+void mpqemu_destroy_channel(MPQemuChannel *chan);
+void mpqemu_link_set_callback(MPQemuLinkState *s, mpqemu_link_callback callback);
+void mpqemu_start_coms(MPQemuLinkState *s);
+
+#endif
diff --git a/io/Makefile.objs b/io/Makefile.objs
index 9a20fce..5875ab0 100644
--- a/io/Makefile.objs
+++ b/io/Makefile.objs
@@ -10,3 +10,5 @@ io-obj-y += channel-util.o
 io-obj-y += dns-resolver.o
 io-obj-y += net-listener.o
 io-obj-y += task.o
+
+io-obj-$(CONFIG_MPQEMU) += mpqemu-link.o
diff --git a/io/mpqemu-link.c b/io/mpqemu-link.c
new file mode 100644
index 0000000..b39f4d0
--- /dev/null
+++ b/io/mpqemu-link.c
@@ -0,0 +1,309 @@
+/*
+ * Communication channel between QEMU and remote device process
+ *
+ * Copyright 2019, Oracle and/or its affiliates.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include <assert.h>
+#include <errno.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <unistd.h>
+#include <limits.h>
+#include <poll.h>
+
+#include "qemu/module.h"
+#include "io/mpqemu-link.h"
+#include "qemu/log.h"
+
+GSourceFuncs gsrc_funcs;
+
+static void mpqemu_link_inst_init(Object *obj)
+{
+    MPQemuLinkState *s = MPQEMU_LINK(obj);
+
+    s->ctx = g_main_context_ref(g_main_context_default());
+    s->loop = g_main_loop_new(s->ctx, FALSE);
+}
+
+static const TypeInfo mpqemu_link_info = {
+    .name = TYPE_MPQEMU_LINK,
+    .parent = TYPE_OBJECT,
+    .instance_size = sizeof(MPQemuLinkState),
+    .instance_init = mpqemu_link_inst_init,
+};
+
+static void mpqemu_link_register_types(void)
+{
+    type_register_static(&mpqemu_link_info);
+}
+
+type_init(mpqemu_link_register_types)
+
+MPQemuLinkState *mpqemu_link_create(void)
+{
+    return MPQEMU_LINK(object_new(TYPE_MPQEMU_LINK));
+}
+
+void mpqemu_link_finalize(MPQemuLinkState *s)
+{
+    g_main_loop_quit(s->loop);
+    g_main_loop_unref(s->loop);
+    g_main_context_unref(s->ctx);
+
+    mpqemu_destroy_channel(s->com);
+
+    object_unref(OBJECT(s));
+}
+
+void mpqemu_msg_send(MPQemuLinkState *s, MPQemuMsg *msg, MPQemuChannel *chan)
+{
+    int rc;
+    uint8_t *data;
+    union {
+        char control[CMSG_SPACE(REMOTE_MAX_FDS * sizeof(int))];
+        struct cmsghdr align;
+    } u;
+    struct msghdr hdr;
+    struct cmsghdr *chdr;
+    int sock = chan->sock;
+    QemuMutex *lock = &chan->send_lock;
+
+    struct iovec iov = {
+        .iov_base = (char *) msg,
+        .iov_len = MPQEMU_MSG_HDR_SIZE,
+    };
+
+    memset(&hdr, 0, sizeof(hdr));
+    memset(&u, 0, sizeof(u));
+
+    hdr.msg_iov = &iov;
+    hdr.msg_iovlen = 1;
+
+    if (msg->num_fds > REMOTE_MAX_FDS) {
+        qemu_log_mask(LOG_REMOTE_DEBUG, "%s: Max FDs exceeded\n", __func__);
+        return;
+    }
+
+    if (msg->num_fds > 0) {
+        size_t fdsize = msg->num_fds * sizeof(int);
+
+        hdr.msg_control = &u;
+        hdr.msg_controllen = sizeof(u);
+
+        chdr = CMSG_FIRSTHDR(&hdr);
+        chdr->cmsg_len = CMSG_LEN(fdsize);
+        chdr->cmsg_level = SOL_SOCKET;
+        chdr->cmsg_type = SCM_RIGHTS;
+        memcpy(CMSG_DATA(chdr), msg->fds, fdsize);
+        hdr.msg_controllen = CMSG_SPACE(fdsize);
+    }
+
+    qemu_mutex_lock(lock);
+
+    do {
+        rc = sendmsg(sock, &hdr, 0);
+    } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
+
+    if (rc < 0) {
+        qemu_log_mask(LOG_REMOTE_DEBUG, "%s - sendmsg rc is %d, errno is %d,"
+                      " sock %d\n", __func__, rc, errno, sock);
+        qemu_mutex_unlock(lock);
+        return;
+    }
+
+    if (msg->bytestream) {
+        data = msg->data2;
+    } else {
+        data = (uint8_t *)msg + MPQEMU_MSG_HDR_SIZE;
+    }
+
+    do {
+        rc = write(sock, data, msg->size);
+    } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
+
+    qemu_mutex_unlock(lock);
+}
+
+
+int mpqemu_msg_recv(MPQemuLinkState *s, MPQemuMsg *msg, MPQemuChannel *chan)
+{
+    int rc;
+    uint8_t *data;
+    union {
+        char control[CMSG_SPACE(REMOTE_MAX_FDS * sizeof(int))];
+        struct cmsghdr align;
+    } u;
+    struct msghdr hdr;
+    struct cmsghdr *chdr;
+    size_t fdsize;
+    int sock = chan->sock;
+    QemuMutex *lock = &chan->recv_lock;
+
+    struct iovec iov = {
+        .iov_base = (char *) msg,
+        .iov_len = MPQEMU_MSG_HDR_SIZE,
+    };
+
+    memset(&hdr, 0, sizeof(hdr));
+    memset(&u, 0, sizeof(u));
+
+    hdr.msg_iov = &iov;
+    hdr.msg_iovlen = 1;
+    hdr.msg_control = &u;
+    hdr.msg_controllen = sizeof(u);
+
+    qemu_mutex_lock(lock);
+
+    do {
+        rc = recvmsg(sock, &hdr, 0);
+    } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
+
+    if (rc < 0) {
+        qemu_log_mask(LOG_REMOTE_DEBUG, "%s - recvmsg rc is %d, errno is %d,"
+                      " sock %d\n", __func__, rc, errno, sock);
+        qemu_mutex_unlock(lock);
+        return rc;
+    }
+
+    msg->num_fds = 0;
+    for (chdr = CMSG_FIRSTHDR(&hdr); chdr != NULL;
+         chdr = CMSG_NXTHDR(&hdr, chdr)) {
+        if ((chdr->cmsg_level == SOL_SOCKET) &&
+            (chdr->cmsg_type == SCM_RIGHTS)) {
+            fdsize = chdr->cmsg_len - CMSG_LEN(0);
+            msg->num_fds = fdsize / sizeof(int);
+            if (msg->num_fds > REMOTE_MAX_FDS) {
+                /*
+                 * TODO: Security issue detected. Sender never sends more
+                 * than REMOTE_MAX_FDS. Signal this condition to the admin.
+                 */
+                qemu_log_mask(LOG_REMOTE_DEBUG, "%s: Max FDs exceeded\n", __func__);
+                qemu_mutex_unlock(lock);
+                return -ERANGE;
+            }
+
+            memcpy(msg->fds, CMSG_DATA(chdr), fdsize);
+            break;
+        }
+    }
+
+    if (msg->size && msg->bytestream) {
+        msg->data2 = calloc(1, msg->size);
+        data = msg->data2;
+    } else {
+        data = (uint8_t *)&msg->data1;
+    }
+
+    if (msg->size) {
+        do {
+            rc = read(sock, data, msg->size);
+        } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
+    }
+
+    qemu_mutex_unlock(lock);
+
+    return rc;
+}
+
+static gboolean mpqemu_link_handler_prepare(GSource *gsrc, gint *timeout)
+{
+    g_assert(timeout);
+
+    *timeout = -1;
+
+    return FALSE;
+}
+
+static gboolean mpqemu_link_handler_check(GSource *gsrc)
+{
+    MPQemuChannel *chan = (MPQemuChannel *)gsrc;
+
+    return chan->gpfd.events & chan->gpfd.revents;
+}
+
+static gboolean mpqemu_link_handler_dispatch(GSource *gsrc, GSourceFunc func,
+                                             gpointer data)
+{
+    MPQemuLinkState *s = (MPQemuLinkState *)data;
+    MPQemuChannel *chan = (MPQemuChannel *)gsrc;
+
+    s->callback(chan->gpfd.revents, chan);
+
+    if ((chan->gpfd.revents & G_IO_HUP) || (chan->gpfd.revents & G_IO_ERR)) {
+        return G_SOURCE_REMOVE;
+    }
+
+    return G_SOURCE_CONTINUE;
+}
+
+void mpqemu_link_set_callback(MPQemuLinkState *s, mpqemu_link_callback callback)
+{
+    s->callback = callback;
+}
+
+void mpqemu_init_channel(MPQemuLinkState *s, MPQemuChannel **chan, int fd)
+{
+    MPQemuChannel *src;
+
+    gsrc_funcs = (GSourceFuncs){
+        .prepare = mpqemu_link_handler_prepare,
+        .check = mpqemu_link_handler_check,
+        .dispatch = mpqemu_link_handler_dispatch,
+        .finalize = NULL,
+    };
+
+    src = (MPQemuChannel *)g_source_new(&gsrc_funcs, sizeof(MPQemuChannel));
+
+    src->sock = fd;
+    qemu_mutex_init(&src->send_lock);
+    qemu_mutex_init(&src->recv_lock);
+
+    g_source_set_callback(&src->gsrc, NULL, (gpointer)s, NULL);
+    src->gpfd.fd = fd;
+    src->gpfd.events = G_IO_IN | G_IO_HUP | G_IO_ERR;
+    g_source_add_poll(&src->gsrc, &src->gpfd);
+
+    *chan = src;
+}
+
+void mpqemu_destroy_channel(MPQemuChannel *chan)
+{
+    g_source_unref(&chan->gsrc);
+    close(chan->sock);
+    qemu_mutex_destroy(&chan->send_lock);
+    qemu_mutex_destroy(&chan->recv_lock);
+}
+
+void mpqemu_start_coms(MPQemuLinkState *s)
+{
+    guint id = g_source_attach(&s->com->gsrc, s->ctx);
+    g_assert(id);
+
+    g_main_loop_run(s->loop);
+}
-- 
1.8.3.1




* [RFC v4 PATCH 08/49] multi-process: add functions to synchronize proxy and remote endpoints
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (6 preceding siblings ...)
  2019-10-24  9:08 ` [RFC v4 PATCH 07/49] multi-process: define mpqemu-link object Jagannathan Raman
@ 2019-10-24  9:08 ` Jagannathan Raman
  2019-10-24  9:08 ` [RFC v4 PATCH 09/49] multi-process: setup PCI host bridge for remote device Jagannathan Raman
                   ` (45 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

In some cases, for example an MMIO read, QEMU has to wait for the remote
process to complete a command before proceeding. An eventfd-based
mechanism is added to synchronize QEMU and the remote process.

Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 v1 -> v2:
   - Added timeout to synchronization functions
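
For readers following the synchronization scheme, here is a minimal
standalone sketch of the eventfd handshake. It is not the code in this
patch: sketch_notify() and sketch_wait() are hypothetical stand-ins for
notify_proxy() and wait_for_remote(), and the timeout is a parameter
rather than the fixed 1000 ms. The value is biased by one because an
eventfd read blocks while the counter is zero, so a result of 0 could not
otherwise be signalled. eventfd rejects a write of 0xffffffffffffffff with
EINVAL, so the sketch refuses values whose biased form would not fit.

```c
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sentinel returned on timeout or error. */
#define SKETCH_WAIT_FAILED UINT64_MAX

/* Post val on the eventfd, biased by one. Returns 0 on success, -1 on error. */
static int sketch_notify(int efd, uint64_t val)
{
    if (val >= UINT64_MAX - 1) {
        /* val + 1 would exceed the largest value an eventfd can hold. */
        errno = ERANGE;
        return -1;
    }

    val += 1;

    return write(efd, &val, sizeof(val)) == (ssize_t)sizeof(val) ? 0 : -1;
}

/* Wait up to timeout_ms for a value; returns it with the bias removed. */
static uint64_t sketch_wait(int efd, int timeout_ms)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    uint64_t val;
    int ret;

    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret <= 0) {
        /* 0 means the wait timed out, a negative value is a poll error. */
        return SKETCH_WAIT_FAILED;
    }

    if (read(efd, &val, sizeof(val)) != (ssize_t)sizeof(val)) {
        return SKETCH_WAIT_FAILED;
    }

    return val - 1;
}
```

A caller would create the descriptor with eventfd(0, 0), pass it to the
peer, and treat SKETCH_WAIT_FAILED as a timeout or error.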

 include/io/mpqemu-link.h |  9 +++++++++
 io/mpqemu-link.c         | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/include/io/mpqemu-link.h b/include/io/mpqemu-link.h
index 345c67e..dee2dd3 100644
--- a/include/io/mpqemu-link.h
+++ b/include/io/mpqemu-link.h
@@ -30,7 +30,9 @@
 
 #include <stddef.h>
 #include <stdint.h>
+#include <unistd.h>
 #include <pthread.h>
+#include <sys/eventfd.h>
 
 #include "qom/object.h"
 #include "qemu/thread.h"
@@ -147,4 +149,11 @@ void mpqemu_destroy_channel(MPQemuChannel *chan);
 void mpqemu_link_set_callback(MPQemuLinkState *s, mpqemu_link_callback callback);
 void mpqemu_start_coms(MPQemuLinkState *s);
 
+#define GET_REMOTE_WAIT eventfd(0, 0)
+#define PUT_REMOTE_WAIT(wait) close(wait)
+#define PROXY_LINK_WAIT_DONE 1
+
+uint64_t wait_for_remote(int efd);
+void notify_proxy(int efd, uint64_t val);
+
 #endif
diff --git a/io/mpqemu-link.c b/io/mpqemu-link.c
index b39f4d0..696aeb1 100644
--- a/io/mpqemu-link.c
+++ b/io/mpqemu-link.c
@@ -231,6 +231,46 @@ int mpqemu_msg_recv(MPQemuLinkState *s, MPQemuMsg *msg, MPQemuChannel *chan)
     return rc;
 }
 
+uint64_t wait_for_remote(int efd)
+{
+    struct pollfd pfd = { .fd = efd, .events = POLLIN };
+    uint64_t val;
+    int ret;
+
+    ret = poll(&pfd, 1, 1000);
+
+    switch (ret) {
+    case 0:
+        qemu_log_mask(LOG_REMOTE_DEBUG, "Error wait_for_remote: Timed out\n");
+        /* TODO: Kick-off error recovery */
+        return ULLONG_MAX;
+    case -1:
+        qemu_log_mask(LOG_REMOTE_DEBUG, "Poll error wait_for_remote: %s\n",
+                      strerror(errno));
+        return ULLONG_MAX;
+    default:
+        if (read(efd, &val, sizeof(val)) == -1) {
+            qemu_log_mask(LOG_REMOTE_DEBUG, "Error wait_for_remote: %s\n",
+                          strerror(errno));
+            return ULLONG_MAX;
+        }
+    }
+
+    val = (val == ULLONG_MAX) ? val : (val - 1);
+
+    return val;
+}
+
+void notify_proxy(int efd, uint64_t val)
+{
+    val = (val == ULLONG_MAX) ? val : (val + 1);
+
+    if (write(efd, &val, sizeof(val)) == -1) {
+        qemu_log_mask(LOG_REMOTE_DEBUG, "Error notify_proxy: %s\n",
+                      strerror(errno));
+    }
+}
+
 static gboolean mpqemu_link_handler_prepare(GSource *gsrc, gint *timeout)
 {
     g_assert(timeout);
-- 
1.8.3.1




* [RFC v4 PATCH 09/49] multi-process: setup PCI host bridge for remote device
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (7 preceding siblings ...)
  2019-10-24  9:08 ` [RFC v4 PATCH 08/49] multi-process: add functions to synchronize proxy and remote endpoints Jagannathan Raman
@ 2019-10-24  9:08 ` Jagannathan Raman
  2019-11-13 16:07   ` Stefan Hajnoczi
  2019-10-24  9:08 ` [RFC v4 PATCH 10/49] multi-process: setup a machine object for remote device process Jagannathan Raman
                   ` (44 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

A PCI host bridge is set up for the remote device process. It is
implemented using the remote-pcihost object, which is an extension of
the PCI host bridge set up by QEMU.
remote-pcihost configures a PCI bus that the remote PCI device can
attach to.

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 hw/pci/Makefile.objs     |  2 +-
 include/remote/pcihost.h | 59 +++++++++++++++++++++++++++++++++
 remote/Makefile.objs     |  1 +
 remote/pcihost.c         | 85 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 146 insertions(+), 1 deletion(-)
 create mode 100644 include/remote/pcihost.h
 create mode 100644 remote/pcihost.c

diff --git a/hw/pci/Makefile.objs b/hw/pci/Makefile.objs
index 955be54..90693a7 100644
--- a/hw/pci/Makefile.objs
+++ b/hw/pci/Makefile.objs
@@ -13,6 +13,6 @@ common-obj-$(CONFIG_PCI_EXPRESS) += pcie_port.o pcie_host.o
 common-obj-$(call lnot,$(CONFIG_PCI)) += pci-stub.o
 common-obj-$(CONFIG_ALL) += pci-stub.o
 
-remote-pci-obj-$(CONFIG_MPQEMU) += pci.o pci_bridge.o
+remote-pci-obj-$(CONFIG_MPQEMU) += pci.o pci_bridge.o pci_host.o pcie_host.o
 remote-pci-obj-$(CONFIG_MPQEMU) += msi.o msix.o
 remote-pci-obj-$(CONFIG_MPQEMU) += pcie.o
diff --git a/include/remote/pcihost.h b/include/remote/pcihost.h
new file mode 100644
index 0000000..b3c711d
--- /dev/null
+++ b/include/remote/pcihost.h
@@ -0,0 +1,59 @@
+/*
+ * PCI Host for remote device
+ *
+ * Copyright 2019, Oracle and/or its affiliates.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef REMOTE_PCIHOST_H
+#define REMOTE_PCIHOST_H
+
+#include <stddef.h>
+#include <stdint.h>
+
+#include "exec/memory.h"
+#include "hw/pci/pcie_host.h"
+
+#define TYPE_REMOTE_HOST_DEVICE "remote-pcihost"
+#define REMOTE_HOST_DEVICE(obj) \
+    OBJECT_CHECK(RemPCIHost, (obj), TYPE_REMOTE_HOST_DEVICE)
+
+typedef struct RemPCIHost {
+    /*< private >*/
+    PCIExpressHost parent_obj;
+    /*< public >*/
+
+    /*
+     * Memory Controller Hub (MCH) may not be necessary for the emulation
+     * program. The two important reasons for implementing a PCI host in the
+     * emulation program are:
+     * - Provide a PCI bus for IO devices
+     * - Enable translation of guest PA to the PCI bar regions
+     *
+     * For both the above mentioned purposes, it doesn't look like we would
+     * need the MCH
+     */
+
+    MemoryRegion *mr_pci_mem;
+    MemoryRegion *mr_sys_mem;
+    MemoryRegion *mr_sys_io;
+} RemPCIHost;
+
+#endif
diff --git a/remote/Makefile.objs b/remote/Makefile.objs
index a9b2256..2757f5a 100644
--- a/remote/Makefile.objs
+++ b/remote/Makefile.objs
@@ -1 +1,2 @@
 remote-pci-obj-$(CONFIG_MPQEMU) += remote-main.o
+remote-pci-obj-$(CONFIG_MPQEMU) += pcihost.o
diff --git a/remote/pcihost.c b/remote/pcihost.c
new file mode 100644
index 0000000..0f43057
--- /dev/null
+++ b/remote/pcihost.c
@@ -0,0 +1,85 @@
+/*
+ * Remote PCI host device
+ *
+ * Copyright 2019, Oracle and/or its affiliates.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <sys/types.h>
+
+#include "qemu/osdep.h"
+#include "hw/pci/pci.h"
+#include "hw/pci/pci_host.h"
+#include "hw/pci/pcie_host.h"
+#include "hw/qdev-properties.h"
+#include "remote/pcihost.h"
+#include "exec/memory.h"
+
+static const char *remote_host_root_bus_path(PCIHostState *host_bridge,
+                                             PCIBus *rootbus)
+{
+    return "0000:00";
+}
+
+static void remote_host_realize(DeviceState *dev, Error **errp)
+{
+    PCIHostState *pci = PCI_HOST_BRIDGE(dev);
+    RemPCIHost *s = REMOTE_HOST_DEVICE(dev);
+
+    /*
+     * TODO: the name of the bus would be provided by QEMU. Use
+     * "pcie.0" for now.
+     */
+    pci->bus = pci_root_bus_new(DEVICE(s), "pcie.0",
+                                s->mr_pci_mem, s->mr_sys_io,
+                                0, TYPE_PCIE_BUS);
+}
+
+static Property remote_host_props[] = {
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void remote_host_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
+
+    hc->root_bus_path = remote_host_root_bus_path;
+    dc->realize = remote_host_realize;
+    dc->props = remote_host_props;
+
+    dc->user_creatable = false;
+    set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
+    dc->fw_name = "pci";
+}
+
+static const TypeInfo remote_host_info = {
+    .name = TYPE_REMOTE_HOST_DEVICE,
+    .parent = TYPE_PCIE_HOST_BRIDGE,
+    .instance_size = sizeof(RemPCIHost),
+    .class_init = remote_host_class_init,
+};
+
+static void remote_host_register_types(void)
+{
+    type_register_static(&remote_host_info);
+}
+
+type_init(remote_host_register_types)
-- 
1.8.3.1




* [RFC v4 PATCH 10/49] multi-process: setup a machine object for remote device process
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (8 preceding siblings ...)
  2019-10-24  9:08 ` [RFC v4 PATCH 09/49] multi-process: setup PCI host bridge for remote device Jagannathan Raman
@ 2019-10-24  9:08 ` Jagannathan Raman
  2019-11-13 16:22   ` Stefan Hajnoczi
  2019-10-24  9:08 ` [RFC v4 PATCH 11/49] multi-process: setup memory manager for remote device Jagannathan Raman
                   ` (43 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

The remote-machine object sets up the subsystems of the remote device
process. It instantiates the PCI host bridge object and initializes the
RAM, IO and PCI memory regions.

Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 exec.c                        |   3 +-
 include/exec/address-spaces.h |   2 +
 include/remote/machine.h      |  46 ++++++++++++++++
 remote/Makefile.objs          |   1 +
 remote/machine.c              | 118 ++++++++++++++++++++++++++++++++++++++++++
 remote/remote-main.c          |   7 +++
 6 files changed, 175 insertions(+), 2 deletions(-)
 create mode 100644 include/remote/machine.h
 create mode 100644 remote/machine.c

diff --git a/exec.c b/exec.c
index 08c4181..129a8a6 100644
--- a/exec.c
+++ b/exec.c
@@ -192,7 +192,6 @@ typedef struct subpage_t {
 #define PHYS_SECTION_UNASSIGNED 0
 
 static void io_mem_init(void);
-static void memory_map_init(void);
 static void tcg_log_global_after_sync(MemoryListener *listener);
 static void tcg_commit(MemoryListener *listener);
 
@@ -2989,7 +2988,7 @@ static void tcg_commit(MemoryListener *listener)
     tlb_flush(cpuas->cpu);
 }
 
-static void memory_map_init(void)
+void memory_map_init(void)
 {
     system_memory = g_malloc(sizeof(*system_memory));
 
diff --git a/include/exec/address-spaces.h b/include/exec/address-spaces.h
index db8bfa9..56a877b 100644
--- a/include/exec/address-spaces.h
+++ b/include/exec/address-spaces.h
@@ -33,6 +33,8 @@ MemoryRegion *get_system_memory(void);
  */
 MemoryRegion *get_system_io(void);
 
+void memory_map_init(void);
+
 extern AddressSpace address_space_memory;
 extern AddressSpace address_space_io;
 
diff --git a/include/remote/machine.h b/include/remote/machine.h
new file mode 100644
index 0000000..a00732d
--- /dev/null
+++ b/include/remote/machine.h
@@ -0,0 +1,46 @@
+/*
+ * Remote machine configuration
+ *
+ * Copyright 2019, Oracle and/or its affiliates.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef REMOTE_MACHINE_H
+#define REMOTE_MACHINE_H
+
+#include "qemu/osdep.h"
+#include "qom/object.h"
+#include "hw/boards.h"
+#include "remote/pcihost.h"
+#include "qemu/notify.h"
+
+typedef struct RemMachineState {
+    MachineState parent_obj;
+
+    RemPCIHost *host;
+} RemMachineState;
+
+#define TYPE_REMOTE_MACHINE "remote-machine"
+#define REMOTE_MACHINE(obj) \
+    OBJECT_CHECK(RemMachineState, (obj), TYPE_REMOTE_MACHINE)
+
+void qemu_run_machine_init_done_notifiers(void);
+
+#endif
diff --git a/remote/Makefile.objs b/remote/Makefile.objs
index 2757f5a..13d4c48 100644
--- a/remote/Makefile.objs
+++ b/remote/Makefile.objs
@@ -1,2 +1,3 @@
 remote-pci-obj-$(CONFIG_MPQEMU) += remote-main.o
 remote-pci-obj-$(CONFIG_MPQEMU) += pcihost.o
+remote-pci-obj-$(CONFIG_MPQEMU) += machine.o
diff --git a/remote/machine.c b/remote/machine.c
new file mode 100644
index 0000000..4ce197d
--- /dev/null
+++ b/remote/machine.c
@@ -0,0 +1,118 @@
+/*
+ * Machine for remote device
+ *
+ * Copyright 2019, Oracle and/or its affiliates.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <stdint.h>
+#include <sys/types.h>
+
+#include "qemu/osdep.h"
+#include "remote/pcihost.h"
+#include "remote/machine.h"
+#include "exec/address-spaces.h"
+#include "exec/memory.h"
+#include "exec/ioport.h"
+#include "exec/ramlist.h"
+#include "qemu/thread.h"
+#include "qom/object.h"
+#include "qemu/module.h"
+#include "qapi/error.h"
+#include "qemu/main-loop.h"
+#include "qemu-common.h"
+#include "sysemu/sysemu.h"
+#include "qemu/notify.h"
+
+static NotifierList machine_init_done_notifiers =
+    NOTIFIER_LIST_INITIALIZER(machine_init_done_notifiers);
+
+bool machine_init_done;
+
+void qemu_add_machine_init_done_notifier(Notifier *notify)
+{
+    notifier_list_add(&machine_init_done_notifiers, notify);
+    if (machine_init_done) {
+        notify->notify(notify, NULL);
+    }
+}
+
+void qemu_remove_machine_init_done_notifier(Notifier *notify)
+{
+    notifier_remove(notify);
+}
+
+void qemu_run_machine_init_done_notifiers(void)
+{
+    machine_init_done = true;
+    notifier_list_notify(&machine_init_done_notifiers, NULL);
+}
+
+static void remote_machine_init(Object *obj)
+{
+    RemMachineState *s = REMOTE_MACHINE(obj);
+    RemPCIHost *rem_host;
+    MemoryRegion *system_memory, *system_io, *pci_memory;
+
+    Error *local_err = NULL;
+
+    qemu_mutex_init(&ram_list.mutex);
+
+    object_property_add_child(object_get_root(), "machine", obj, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+    }
+
+    memory_map_init();
+
+    system_memory = get_system_memory();
+    system_io = get_system_io();
+
+    pci_memory = g_new(MemoryRegion, 1);
+    memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
+
+    rem_host = REMOTE_HOST_DEVICE(qdev_create(NULL, TYPE_REMOTE_HOST_DEVICE));
+
+    rem_host->mr_pci_mem = pci_memory;
+    rem_host->mr_sys_mem = system_memory;
+    rem_host->mr_sys_io = system_io;
+
+    s->host = rem_host;
+
+    qemu_mutex_lock_iothread();
+    memory_region_add_subregion_overlap(system_memory, 0x0, pci_memory, -1);
+    qemu_mutex_unlock_iothread();
+
+    qdev_init_nofail(DEVICE(rem_host));
+}
+
+static const TypeInfo remote_machine = {
+    .name = TYPE_REMOTE_MACHINE,
+    .parent = TYPE_MACHINE,
+    .instance_size = sizeof(RemMachineState),
+    .instance_init = remote_machine_init,
+};
+
+static void remote_machine_register_types(void)
+{
+    type_register_static(&remote_machine);
+}
+
+type_init(remote_machine_register_types);
diff --git a/remote/remote-main.c b/remote/remote-main.c
index cccad12..bf44310 100644
--- a/remote/remote-main.c
+++ b/remote/remote-main.c
@@ -28,10 +28,17 @@
 #include <stdio.h>
 
 #include "qemu/module.h"
+#include "remote/pcihost.h"
+#include "remote/machine.h"
+#include "hw/boards.h"
+#include "hw/qdev-core.h"
+#include "qemu/main-loop.h"
 
 int main(int argc, char *argv[])
 {
     module_call_init(MODULE_INIT_QOM);
 
+    current_machine = MACHINE(REMOTE_MACHINE(object_new(TYPE_REMOTE_MACHINE)));
+
     return 0;
 }
-- 
1.8.3.1




* [RFC v4 PATCH 11/49] multi-process: setup memory manager for remote device
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (9 preceding siblings ...)
  2019-10-24  9:08 ` [RFC v4 PATCH 10/49] multi-process: setup a machine object for remote device process Jagannathan Raman
@ 2019-10-24  9:08 ` Jagannathan Raman
  2019-11-13 16:33   ` Stefan Hajnoczi
  2019-10-24  9:08 ` [RFC v4 PATCH 12/49] multi-process: remote process initialization Jagannathan Raman
                   ` (42 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

The sync_sysmem_msg_t message format is defined. It is used to send the
file descriptors of the RAM regions to the remote device.
RAM on the remote device is configured with a set of file descriptors.
Old RAM regions are deleted and new regions, each backed by an fd, are
added to the RAM.

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 Makefile.target          |  2 +
 include/io/mpqemu-link.h | 11 ++++++
 include/remote/memory.h  | 34 +++++++++++++++++
 remote/memory.c          | 99 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 146 insertions(+)
 create mode 100644 include/remote/memory.h
 create mode 100644 remote/memory.c

diff --git a/Makefile.target b/Makefile.target
index e454aae..547f10e 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -136,6 +136,8 @@ remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/xen-mapcache.o
 remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/audio.o
 remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/monitor.o
 
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += remote/memory.o
+
 #########################################################
 # Linux user emulator target
 
diff --git a/include/io/mpqemu-link.h b/include/io/mpqemu-link.h
index dee2dd3..7ef8207 100644
--- a/include/io/mpqemu-link.h
+++ b/include/io/mpqemu-link.h
@@ -36,6 +36,8 @@
 
 #include "qom/object.h"
 #include "qemu/thread.h"
+#include "exec/cpu-common.h"
+#include "exec/hwaddr.h"
 
 #define TYPE_MPQEMU_LINK "mpqemu-link"
 #define MPQEMU_LINK(obj) \
@@ -49,6 +51,7 @@
  * mpqemu_cmd_t:
  * CONF_READ        PCI config. space read
  * CONF_WRITE       PCI config. space write
+ * SYNC_SYSMEM      Shares QEMU's RAM with remote device's RAM
  *
  * proc_cmd_t enum type to specify the command to be executed on the remote
  * device.
@@ -57,6 +60,7 @@ typedef enum {
     INIT = 0,
     CONF_READ,
     CONF_WRITE,
+    SYNC_SYSMEM,
     MAX,
 } mpqemu_cmd_t;
 
@@ -74,12 +78,19 @@ typedef enum {
  *
  */
 typedef struct {
+    hwaddr gpas[REMOTE_MAX_FDS];
+    uint64_t sizes[REMOTE_MAX_FDS];
+    ram_addr_t offsets[REMOTE_MAX_FDS];
+} sync_sysmem_msg_t;
+
+typedef struct {
     mpqemu_cmd_t cmd;
     int bytestream;
     size_t size;
 
     union {
         uint64_t u64;
+        sync_sysmem_msg_t sync_sysmem;
     } data1;
 
     int fds[REMOTE_MAX_FDS];
diff --git a/include/remote/memory.h b/include/remote/memory.h
new file mode 100644
index 0000000..0bb637f
--- /dev/null
+++ b/include/remote/memory.h
@@ -0,0 +1,34 @@
+/*
+ * Memory manager for remote device
+ *
+ * Copyright 2019, Oracle and/or its affiliates.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef REMOTE_MEMORY_H
+#define REMOTE_MEMORY_H
+
+#include "qemu/osdep.h"
+#include "exec/hwaddr.h"
+#include "io/mpqemu-link.h"
+
+void remote_sysmem_reconfig(MPQemuMsg *msg, Error **errp);
+
+#endif
diff --git a/remote/memory.c b/remote/memory.c
new file mode 100644
index 0000000..a70027e
--- /dev/null
+++ b/remote/memory.c
@@ -0,0 +1,99 @@
+/*
+ * Memory manager for remote device
+ *
+ * Copyright 2019, Oracle and/or its affiliates.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <stdint.h>
+#include <sys/types.h>
+
+#include "qemu/osdep.h"
+#include "qemu/queue.h"
+#include "qemu-common.h"
+#include "remote/memory.h"
+#include "exec/memory.h"
+#include "exec/address-spaces.h"
+#include "cpu.h"
+#include "exec/ram_addr.h"
+#include "io/mpqemu-link.h"
+#include "qemu/main-loop.h"
+#include "qapi/error.h"
+
+static void remote_ram_destructor(MemoryRegion *mr)
+{
+    qemu_ram_free(mr->ram_block);
+}
+
+static void remote_ram_init_from_fd(MemoryRegion *mr, int fd, uint64_t size,
+                                    ram_addr_t offset, Error **errp)
+{
+    char *name = g_strdup_printf("%d", fd);
+
+    memory_region_init(mr, NULL, name, size);
+    mr->ram = true;
+    mr->terminates = true;
+    mr->destructor = NULL;
+    mr->align = 0;
+    mr->ram_block = qemu_ram_alloc_from_fd(size, mr, RAM_SHARED, fd, offset,
+                                           errp);
+    mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
+
+    g_free(name);
+}
+
+void remote_sysmem_reconfig(MPQemuMsg *msg, Error **errp)
+{
+    sync_sysmem_msg_t *sysmem_info = &msg->data1.sync_sysmem;
+    MemoryRegion *sysmem, *subregion, *next;
+    Error *local_err = NULL;
+    int region;
+
+    sysmem = get_system_memory();
+
+    qemu_mutex_lock_iothread();
+
+    memory_region_transaction_begin();
+
+    QTAILQ_FOREACH_SAFE(subregion, &sysmem->subregions, subregions_link, next) {
+        if (subregion->ram) {
+            memory_region_del_subregion(sysmem, subregion);
+            remote_ram_destructor(subregion);
+        }
+    }
+
+    for (region = 0; region < msg->num_fds; region++) {
+        subregion = g_new(MemoryRegion, 1);
+        remote_ram_init_from_fd(subregion, msg->fds[region],
+                                sysmem_info->sizes[region],
+                                sysmem_info->offsets[region], &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            break;
+        }
+
+        memory_region_add_subregion(sysmem, sysmem_info->gpas[region],
+                                    subregion);
+    }
+
+    memory_region_transaction_commit();
+
+    qemu_mutex_unlock_iothread();
+}
-- 
1.8.3.1




* [RFC v4 PATCH 12/49] multi-process: remote process initialization
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (10 preceding siblings ...)
  2019-10-24  9:08 ` [RFC v4 PATCH 11/49] multi-process: setup memory manager for remote device Jagannathan Raman
@ 2019-10-24  9:08 ` Jagannathan Raman
  2019-11-13 16:38   ` Stefan Hajnoczi
  2019-10-24  9:08 ` [RFC v4 PATCH 13/49] multi-process: introduce proxy object Jagannathan Raman
                   ` (41 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

Add the handler that processes messages from QEMU, initialize the
remote process main loop, and handle the SYNC_SYSMEM message by
updating the remote process's "system_memory" container using the
shared file descriptors received from QEMU.
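
The dispatch pattern described above can be sketched as a small standalone
program fragment. This is only an illustration under stated assumptions:
the demo_* names, the DemoLink bookkeeping and the CMD_* values are
hypothetical stand-ins, not the MPQemuMsg/MPQemuLinkState API of this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical command set, loosely modeled on the commands in this series. */
typedef enum {
    CMD_INIT,
    CMD_CONF_READ,
    CMD_CONF_WRITE,
    CMD_SYNC_SYSMEM,
    CMD_MAX,
} demo_cmd_t;

/* Hypothetical message: only the command field matters for dispatch. */
typedef struct {
    demo_cmd_t cmd;
} DemoMsg;

/* Hypothetical link state: tracks whether the channel is still usable. */
typedef struct {
    bool link_open;
    int handled[CMD_MAX];   /* per-command counters, for illustration only */
} DemoLink;

/*
 * Dispatch one message. Known commands are counted and reported as handled;
 * an unknown command marks the link as closed, mirroring the "finalize the
 * link on error" path of the handler in the patch.
 */
static bool demo_process_msg(DemoLink *link, const DemoMsg *msg)
{
    switch (msg->cmd) {
    case CMD_INIT:
    case CMD_CONF_READ:
    case CMD_CONF_WRITE:
    case CMD_SYNC_SYSMEM:
        link->handled[msg->cmd]++;
        return true;
    default:
        link->link_open = false;
        return false;
    }
}
```

A caller would stop polling the channel as soon as demo_process_msg()
returns false, since the link is no longer valid at that point.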

Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 v1 -> v2:
   - Separate thread for message processing is removed

 v2 -> v3:
   - Added multi-channel support in the remote end

 remote/remote-main.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)

diff --git a/remote/remote-main.c b/remote/remote-main.c
index bf44310..7689b57 100644
--- a/remote/remote-main.c
+++ b/remote/remote-main.c
@@ -26,6 +26,7 @@
 #include "qemu-common.h"
 
 #include <stdio.h>
+#include <unistd.h>
 
 #include "qemu/module.h"
 #include "remote/pcihost.h"
@@ -33,12 +34,91 @@
 #include "hw/boards.h"
 #include "hw/qdev-core.h"
 #include "qemu/main-loop.h"
+#include "remote/memory.h"
+#include "io/mpqemu-link.h"
+#include "qapi/error.h"
+#include "qemu/main-loop.h"
+#include "sysemu/cpus.h"
+#include "qemu-common.h"
+#include "hw/pci/pci.h"
+#include "qemu/thread.h"
+#include "qemu/main-loop.h"
+#include "qemu/config-file.h"
+#include "sysemu/sysemu.h"
+#include "block/block.h"
+
+static MPQemuLinkState *mpqemu_link;
+PCIDevice *remote_pci_dev;
+
+static void process_msg(GIOCondition cond, MPQemuChannel *chan)
+{
+    MPQemuMsg *msg = NULL;
+    Error *err = NULL;
+
+    if ((cond & G_IO_HUP) || (cond & G_IO_ERR)) {
+        error_setg(&err, "socket closed, cond is %d", cond);
+        goto finalize_loop;
+    }
+
+    msg = g_malloc0(sizeof(MPQemuMsg));
+
+    if (mpqemu_msg_recv(mpqemu_link, msg, chan) < 0) {
+        error_setg(&err, "Failed to receive message");
+        goto finalize_loop;
+    }
+
+    switch (msg->cmd) {
+    case INIT:
+        break;
+    case CONF_WRITE:
+        break;
+    case CONF_READ:
+        break;
+    default:
+        error_setg(&err, "Unknown command");
+        goto finalize_loop;
+    }
+
+    g_free(msg);
+
+    return;
+
+finalize_loop:
+    error_report_err(err);
+    g_free(msg);
+    mpqemu_link_finalize(mpqemu_link);
+    mpqemu_link = NULL;
+}
 
 int main(int argc, char *argv[])
 {
+    Error *err = NULL;
+
     module_call_init(MODULE_INIT_QOM);
 
+    bdrv_init_with_whitelist();
+
+    if (qemu_init_main_loop(&err)) {
+        error_report_err(err);
+        return -EBUSY;
+    }
+
+    qemu_init_cpu_loop();
+
+    page_size_init();
+
     current_machine = MACHINE(REMOTE_MACHINE(object_new(TYPE_REMOTE_MACHINE)));
 
+    mpqemu_link = mpqemu_link_create();
+    if (!mpqemu_link) {
+        printf("Could not create MPQemu link\n");
+        return -1;
+    }
+
+    mpqemu_init_channel(mpqemu_link, &mpqemu_link->com, STDIN_FILENO);
+    mpqemu_link_set_callback(mpqemu_link, process_msg);
+
+    mpqemu_start_coms(mpqemu_link);
+
     return 0;
 }
-- 
1.8.3.1




* [RFC v4 PATCH 13/49] multi-process: introduce proxy object
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (11 preceding siblings ...)
  2019-10-24  9:08 ` [RFC v4 PATCH 12/49] multi-process: remote process initialization Jagannathan Raman
@ 2019-10-24  9:08 ` Jagannathan Raman
  2019-11-21 11:09   ` Stefan Hajnoczi
  2019-10-24  9:08 ` [RFC v4 PATCH 14/49] mutli-process: build remote command line args Jagannathan Raman
                   ` (40 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Defines a PCI device proxy object as a child of TYPE_PCI_DEVICE.
The PCI proxy object is responsible for registering PCI BARs and the
MemoryRegionOps that handle accesses to those BARs, and for forwarding
those accesses to the remote device.
The PCI proxy object also intercepts config space reads and writes and
forwards them to the remote device over the communication channel set
up by the proxy-link object.
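
As a rough illustration of the config space forwarding described above, the
fragment below serializes an access into a byte payload on the proxy side
and validates it on the remote side. The struct layout and the demo_* helper
names are assumptions made for this sketch; they are modeled on the addr,
val and l fields used in this patch but are not the series' actual
definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed payload layout: config space offset, value and access length. */
struct demo_conf_data {
    uint32_t addr;
    uint32_t val;
    int l;
};

/*
 * Proxy side: copy the access description into a freshly allocated byte
 * buffer. Returns NULL on allocation failure; the caller frees the buffer.
 */
static uint8_t *demo_pack_conf(uint32_t addr, uint32_t val, int len,
                               size_t *size)
{
    struct demo_conf_data conf;
    uint8_t *buf;

    memset(&conf, 0, sizeof(conf));   /* also clears any padding bytes */
    conf.addr = addr;
    conf.val = val;
    conf.l = len;

    buf = malloc(sizeof(conf));
    if (!buf) {
        return NULL;
    }
    memcpy(buf, &conf, sizeof(conf));
    *size = sizeof(conf);
    return buf;
}

/*
 * Remote side: refuse payloads of the wrong size before interpreting them.
 * Returns 0 on success and -1 if the size does not match.
 */
static int demo_unpack_conf(const uint8_t *buf, size_t size,
                            struct demo_conf_data *out)
{
    if (size != sizeof(*out)) {
        return -1;
    }
    memcpy(out, buf, sizeof(*out));
    return 0;
}
```

Checking the payload size before casting or copying is a design choice of
this sketch; it keeps a malformed message from being read past its end.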

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
---
 hw/Makefile.objs              |   2 +
 hw/proxy/Makefile.objs        |   1 +
 hw/proxy/qemu-proxy.c         | 247 ++++++++++++++++++++++++++++++++++++++++++
 include/hw/proxy/qemu-proxy.h |  81 ++++++++++++++
 remote/remote-main.c          |  28 +++++
 5 files changed, 359 insertions(+)
 create mode 100644 hw/proxy/Makefile.objs
 create mode 100644 hw/proxy/qemu-proxy.c
 create mode 100644 include/hw/proxy/qemu-proxy.h

diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 4e28053..e016100 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -44,6 +44,8 @@ endif
 common-obj-y += $(devices-dirs-y)
 obj-y += $(devices-dirs-y)
 
+common-obj-$(CONFIG_MPQEMU) += proxy/
+
 remote-pci-obj-$(CONFIG_MPQEMU) += core/
 remote-pci-obj-$(CONFIG_MPQEMU) += block/
 remote-pci-obj-$(CONFIG_MPQEMU) += pci/
diff --git a/hw/proxy/Makefile.objs b/hw/proxy/Makefile.objs
new file mode 100644
index 0000000..eb81624
--- /dev/null
+++ b/hw/proxy/Makefile.objs
@@ -0,0 +1 @@
+common-obj-$(CONFIG_MPQEMU) += qemu-proxy.o
diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
new file mode 100644
index 0000000..baba4da
--- /dev/null
+++ b/hw/proxy/qemu-proxy.c
@@ -0,0 +1,247 @@
+/*
+ * Copyright 2019, Oracle and/or its affiliates.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <unistd.h>
+#include <assert.h>
+#include <string.h>
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "io/mpqemu-link.h"
+#include "exec/memory.h"
+#include "exec/cpu-common.h"
+#include "exec/address-spaces.h"
+#include "qemu/int128.h"
+#include "qemu/range.h"
+#include "hw/pci/pci.h"
+#include "qemu/option.h"
+#include "qemu/config-file.h"
+#include "qapi/qmp/qjson.h"
+#include "qapi/qmp/qstring.h"
+#include "sysemu/sysemu.h"
+#include "hw/proxy/qemu-proxy.h"
+
+static void pci_proxy_dev_realize(PCIDevice *dev, Error **errp);
+
+int remote_spawn(PCIProxyDev *pdev, const char *command, Error **errp)
+{
+    char *args[3];
+    pid_t rpid;
+    int fd[2] = {-1, -1};
+    Error *local_error = NULL;
+
+    if (pdev->managed) {
+        /* Child is forked by external program (such as libvirt). */
+        return -1;
+    }
+
+    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) {
+        error_setg(errp, "Unable to create unix socket.");
+        return -1;
+    }
+    /* TODO: Restrict the forked process' permissions and capabilities. */
+    rpid = qemu_fork(&local_error);
+
+    if (rpid == -1) {
+        error_setg(errp, "Unable to spawn emulation program.");
+        close(fd[0]);
+        close(fd[1]);
+        return -1;
+    }
+
+    if (rpid == 0) {
+        close(fd[0]);
+
+        args[0] = g_strdup(command);
+        args[1] = g_strdup_printf("%d", fd[1]);
+        args[2] = NULL;
+        execvp(args[0], (char *const *)args);
+        exit(1);
+    }
+    pdev->remote_pid = rpid;
+    pdev->rsocket = fd[0];
+
+    close(fd[1]);
+
+    return 0;
+}
+
+static int get_proxy_sock(PCIDevice *dev)
+{
+    PCIProxyDev *pdev;
+
+    pdev = PCI_PROXY_DEV(dev);
+
+    return pdev->rsocket;
+}
+
+static void set_proxy_sock(PCIDevice *dev, int socket)
+{
+    PCIProxyDev *pdev;
+
+    pdev = PCI_PROXY_DEV(dev);
+
+    pdev->rsocket = socket;
+    pdev->managed = true;
+
+}
+
+static int config_op_send(PCIProxyDev *dev, uint32_t addr, uint32_t *val, int l,
+                          unsigned int op)
+{
+    MPQemuMsg msg;
+    struct conf_data_msg conf_data;
+    int wait;
+
+    memset(&msg, 0, sizeof(MPQemuMsg));
+    conf_data.addr = addr;
+    conf_data.val = (op == CONF_WRITE) ? *val : 0;
+    conf_data.l = l;
+
+    msg.data2 = (uint8_t *)malloc(sizeof(conf_data));
+    if (!msg.data2) {
+        return -ENOMEM;
+    }
+
+    memcpy(msg.data2, (const uint8_t *)&conf_data, sizeof(conf_data));
+    msg.size = sizeof(conf_data);
+    msg.cmd = op;
+    msg.bytestream = 1;
+
+    if (op == CONF_WRITE) {
+        msg.num_fds = 0;
+    } else {
+        wait = GET_REMOTE_WAIT;
+        msg.num_fds = 1;
+        msg.fds[0] = wait;
+    }
+
+    mpqemu_msg_send(dev->mpqemu_link, &msg, dev->mpqemu_link->com);
+
+    if (op == CONF_READ) {
+        *val = (uint32_t)wait_for_remote(wait);
+        PUT_REMOTE_WAIT(wait);
+    }
+
+    free(msg.data2);
+
+    return 0;
+}
+
+static uint32_t pci_proxy_read_config(PCIDevice *d, uint32_t addr, int len)
+{
+    uint32_t val;
+
+    (void)pci_default_read_config(d, addr, len);
+
+    config_op_send(PCI_PROXY_DEV(d), addr, &val, len, CONF_READ);
+
+    return val;
+}
+
+static void pci_proxy_write_config(PCIDevice *d, uint32_t addr, uint32_t val,
+                                   int l)
+{
+    pci_default_write_config(d, addr, val, l);
+
+    config_op_send(PCI_PROXY_DEV(d), addr, &val, l, CONF_WRITE);
+}
+
+static void pci_proxy_dev_class_init(ObjectClass *klass, void *data)
+{
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+    k->realize = pci_proxy_dev_realize;
+    k->config_read = pci_proxy_read_config;
+    k->config_write = pci_proxy_write_config;
+}
+
+static const TypeInfo pci_proxy_dev_type_info = {
+    .name          = TYPE_PCI_PROXY_DEV,
+    .parent        = TYPE_PCI_DEVICE,
+    .instance_size = sizeof(PCIProxyDev),
+    .abstract      = true,
+    .class_size    = sizeof(PCIProxyDevClass),
+    .class_init    = pci_proxy_dev_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
+        { },
+    },
+};
+
+static void pci_proxy_dev_register_types(void)
+{
+    type_register_static(&pci_proxy_dev_type_info);
+}
+
+type_init(pci_proxy_dev_register_types)
+
+static void init_proxy(PCIDevice *dev, char *command, Error **errp)
+{
+    PCIProxyDev *pdev = PCI_PROXY_DEV(dev);
+    Error *local_error = NULL;
+
+    if (!pdev->managed) {
+        if (command) {
+            remote_spawn(pdev, command, &local_error);
+        } else {
+            return;
+        }
+    } else {
+        pdev->remote_pid = atoi(pdev->rid);
+        if (pdev->remote_pid == -1) {
+            error_setg(errp, "Remote PID is -1");
+            return;
+        }
+    }
+
+    pdev->mpqemu_link = mpqemu_link_create();
+
+    if (!pdev->mpqemu_link) {
+        error_setg(errp, "Failed to create proxy link");
+        return;
+    }
+
+    mpqemu_init_channel(pdev->mpqemu_link, &pdev->mpqemu_link->com,
+                        pdev->socket);
+}
+
+static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
+{
+    PCIProxyDev *dev = PCI_PROXY_DEV(device);
+    PCIProxyDevClass *k = PCI_PROXY_DEV_GET_CLASS(dev);
+    Error *local_err = NULL;
+
+    if (k->realize) {
+        k->realize(dev, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+        }
+    }
+
+    dev->set_proxy_sock = set_proxy_sock;
+    dev->get_proxy_sock = get_proxy_sock;
+    dev->init_proxy = init_proxy;
+}
diff --git a/include/hw/proxy/qemu-proxy.h b/include/hw/proxy/qemu-proxy.h
new file mode 100644
index 0000000..3648a77
--- /dev/null
+++ b/include/hw/proxy/qemu-proxy.h
@@ -0,0 +1,81 @@
+/*
+ * Copyright 2019, Oracle and/or its affiliates.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef QEMU_PROXY_H
+#define QEMU_PROXY_H
+
+#include "io/mpqemu-link.h"
+
+#define TYPE_PCI_PROXY_DEV "pci-proxy-dev"
+
+#define PCI_PROXY_DEV(obj) \
+            OBJECT_CHECK(PCIProxyDev, (obj), TYPE_PCI_PROXY_DEV)
+
+#define PCI_PROXY_DEV_CLASS(klass) \
+            OBJECT_CLASS_CHECK(PCIProxyDevClass, (klass), TYPE_PCI_PROXY_DEV)
+
+#define PCI_PROXY_DEV_GET_CLASS(obj) \
+            OBJECT_GET_CLASS(PCIProxyDevClass, (obj), TYPE_PCI_PROXY_DEV)
+
+typedef struct PCIProxyDev {
+    PCIDevice parent_dev;
+
+    int n_mr_sections;
+    MemoryRegionSection *mr_sections;
+
+    MPQemuLinkState *mpqemu_link;
+
+    EventNotifier intr;
+    EventNotifier resample;
+
+    pid_t remote_pid;
+    int rsocket;
+    int socket;
+
+    char *rid;
+
+    bool managed;
+    char *dev_id;
+
+    QLIST_ENTRY(PCIProxyDev) next;
+
+    void (*set_proxy_sock) (PCIDevice *dev, int socket);
+    int (*get_proxy_sock) (PCIDevice *dev);
+
+    void (*set_remote_opts) (PCIDevice *dev, QDict *qdict, unsigned int cmd);
+    void (*proxy_ready) (PCIDevice *dev);
+    void (*init_proxy) (PCIDevice *pdev, char *command, Error **errp);
+
+} PCIProxyDev;
+
+typedef struct PCIProxyDevClass {
+    PCIDeviceClass parent_class;
+
+    void (*realize)(PCIProxyDev *dev, Error **errp);
+
+    char *command;
+} PCIProxyDevClass;
+
+int remote_spawn(PCIProxyDev *pdev, const char *command, Error **errp);
+
+
+#endif /* QEMU_PROXY_H */
diff --git a/remote/remote-main.c b/remote/remote-main.c
index 7689b57..6c2eb91 100644
--- a/remote/remote-main.c
+++ b/remote/remote-main.c
@@ -50,6 +50,32 @@
 static MPQemuLinkState *mpqemu_link;
 PCIDevice *remote_pci_dev;
 
+static void process_config_write(MPQemuMsg *msg)
+{
+    struct conf_data_msg *conf = (struct conf_data_msg *)msg->data2;
+
+    qemu_mutex_lock_iothread();
+    pci_default_write_config(remote_pci_dev, conf->addr, conf->val, conf->l);
+    qemu_mutex_unlock_iothread();
+}
+
+static void process_config_read(MPQemuMsg *msg)
+{
+    struct conf_data_msg *conf = (struct conf_data_msg *)msg->data2;
+    uint32_t val;
+    int wait;
+
+    wait = msg->fds[0];
+
+    qemu_mutex_lock_iothread();
+    val = pci_default_read_config(remote_pci_dev, conf->addr, conf->l);
+    qemu_mutex_unlock_iothread();
+
+    notify_proxy(wait, val);
+
+    PUT_REMOTE_WAIT(wait);
+}
+
 static void process_msg(GIOCondition cond, MPQemuChannel *chan)
 {
     MPQemuMsg *msg = NULL;
@@ -71,8 +97,10 @@ static void process_msg(GIOCondition cond, MPQemuChannel *chan)
     case INIT:
         break;
     case CONF_WRITE:
+        process_config_write(msg);
         break;
     case CONF_READ:
+        process_config_read(msg);
         break;
     default:
         error_setg(&err, "Unknown command");
-- 
1.8.3.1




* [RFC v4 PATCH 14/49] mutli-process: build remote command line args
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (12 preceding siblings ...)
  2019-10-24  9:08 ` [RFC v4 PATCH 13/49] multi-process: introduce proxy object Jagannathan Raman
@ 2019-10-24  9:08 ` Jagannathan Raman
  2019-11-21 11:23   ` Stefan Hajnoczi
  2019-10-24  9:08 ` [RFC v4 PATCH 15/49] multi-process: PCI BAR read/write handling for proxy & remote endpoints Jagannathan Raman
                   ` (39 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
---
 New patch in v3

 hw/proxy/qemu-proxy.c         | 80 +++++++++++++++++++++++++++++++++----------
 include/hw/proxy/qemu-proxy.h |  2 +-
 2 files changed, 62 insertions(+), 20 deletions(-)

diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
index baba4da..ca7dd1a 100644
--- a/hw/proxy/qemu-proxy.c
+++ b/hw/proxy/qemu-proxy.c
@@ -45,47 +45,89 @@
 
 static void pci_proxy_dev_realize(PCIDevice *dev, Error **errp);
 
+static int add_argv(char *command_str, char **argv, int argc)
+{
+    int max_args = 64;
+
+    if (argc < max_args - 1) {
+        argv[argc++] = command_str;
+        argv[argc] = 0;
+    } else {
+        return 0;
+    }
+
+    return argc;
+}
+
+static int make_argv(char *command_str, char **argv, int argc)
+{
+    int max_args = 64;
+
+    char *p2 = strtok(command_str, " ");
+    while (p2 && argc < max_args - 1) {
+        argv[argc++] = p2;
+        p2 = strtok(0, " ");
+    }
+    argv[argc] = 0;
+
+    return argc;
+}
+
 int remote_spawn(PCIProxyDev *pdev, const char *command, Error **errp)
 {
-    char *args[3];
     pid_t rpid;
     int fd[2] = {-1, -1};
     Error *local_error = NULL;
+    char *argv[64];
+    int argc = 0, _argc;
+    char *sfd;
+    char *exec_dir;
+    int rc = -EINVAL;
 
     if (pdev->managed) {
         /* Child is forked by external program (such as libvirt). */
-        return -1;
+        return rc;
     }
 
     if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) {
         error_setg(errp, "Unable to create unix socket.");
-        return -1;
+        return rc;
     }
+    exec_dir = g_strdup_printf("%s/%s", qemu_get_exec_dir(), "qemu-scsi-dev");
+    argc = add_argv(exec_dir, argv, argc);
+    sfd = g_strdup_printf("%d", fd[1]);
+    argc = add_argv(sfd, argv, argc);
+    _argc = argc;
+    argc = make_argv((char *)command, argv, argc);
+
     /* TODO: Restrict the forked process' permissions and capabilities. */
     rpid = qemu_fork(&local_error);
 
     if (rpid == -1) {
         error_setg(errp, "Unable to spawn emulation program.");
         close(fd[0]);
-        close(fd[1]);
-        return -1;
+        goto fail;
     }
 
     if (rpid == 0) {
         close(fd[0]);
-
-        args[0] = g_strdup(command);
-        args[1] = g_strdup_printf("%d", fd[1]);
-        args[2] = NULL;
-        execvp(args[0], (char *const *)args);
+        execvp(argv[0], (char *const *)argv);
         exit(1);
     }
     pdev->remote_pid = rpid;
-    pdev->rsocket = fd[0];
+    pdev->rsocket = fd[1];
+    pdev->socket = fd[0];
 
+    rc = 0;
+
+fail:
     close(fd[1]);
 
-    return 0;
+    for (int i = 0; i < _argc; i++) {
+        g_free(argv[i]);
+    }
+
+    return rc;
 }
 
 static int get_proxy_sock(PCIDevice *dev)
@@ -94,7 +136,7 @@ static int get_proxy_sock(PCIDevice *dev)
 
     pdev = PCI_PROXY_DEV(dev);
 
-    return pdev->rsocket;
+    return pdev->socket;
 }
 
 static void set_proxy_sock(PCIDevice *dev, int socket)
@@ -103,7 +145,7 @@ static void set_proxy_sock(PCIDevice *dev, int socket)
 
     pdev = PCI_PROXY_DEV(dev);
 
-    pdev->rsocket = socket;
+    pdev->socket = socket;
     pdev->managed = true;
 
 }
@@ -198,16 +240,16 @@ static void pci_proxy_dev_register_types(void)
 
 type_init(pci_proxy_dev_register_types)
 
-static void init_proxy(PCIDevice *dev, char *command, Error **errp)
+static void init_proxy(PCIDevice *dev, char *command, bool need_spawn, Error **errp)
 {
     PCIProxyDev *pdev = PCI_PROXY_DEV(dev);
     Error *local_error = NULL;
 
     if (!pdev->managed) {
-        if (command) {
-            remote_spawn(pdev, command, &local_error);
-        } else {
-            return;
+        if (need_spawn) {
+            if (remote_spawn(pdev, command, &local_error)) {
+                return;
+            }
         }
     } else {
         pdev->remote_pid = atoi(pdev->rid);
diff --git a/include/hw/proxy/qemu-proxy.h b/include/hw/proxy/qemu-proxy.h
index 3648a77..f97b103 100644
--- a/include/hw/proxy/qemu-proxy.h
+++ b/include/hw/proxy/qemu-proxy.h
@@ -63,7 +63,7 @@ typedef struct PCIProxyDev {
 
     void (*set_remote_opts) (PCIDevice *dev, QDict *qdict, unsigned int cmd);
     void (*proxy_ready) (PCIDevice *dev);
-    void (*init_proxy) (PCIDevice *pdev, char *command, Error **errp);
+    void (*init_proxy) (PCIDevice *dev, char *command, bool need_spawn, Error **errp);
 
 } PCIProxyDev;
 
-- 
1.8.3.1




* [RFC v4 PATCH 15/49] multi-process: PCI BAR read/write handling for proxy & remote endpoints
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (13 preceding siblings ...)
  2019-10-24  9:08 ` [RFC v4 PATCH 14/49] mutli-process: build remote command line args Jagannathan Raman
@ 2019-10-24  9:08 ` Jagannathan Raman
  2019-11-21 11:33   ` Stefan Hajnoczi
  2019-10-24  9:08 ` [RFC v4 PATCH 16/49] multi-process: Add LSI device proxy object Jagannathan Raman
                   ` (38 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

The proxy device object implements handlers for PCI BAR writes and reads.
The handlers send a BAR_WRITE or BAR_READ message to the remote process,
carrying the BAR address and, for writes, the value to be written.
The remote process implements the handlers for the BAR_WRITE and BAR_READ
messages.
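
One detail of the BAR read path is narrowing the 64-bit result to the
requested access size before it is returned to the proxy. The fragment below
sketches that step with bit masks. This is a swapped-in technique: the patch
itself narrows through pointer casts, and it does not handle 8-byte reads,
so accepting size 8 here is an assumption of the sketch. The demo_narrow()
name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Narrow a raw 64-bit value to an access of 'size' bytes.
 * Masking keeps the low-order bytes regardless of host endianness.
 * Returns 0 on success, -1 for an unsupported access size.
 */
static int demo_narrow(uint64_t raw, unsigned size, uint64_t *out)
{
    switch (size) {
    case 1:
        *out = raw & 0xffu;
        return 0;
    case 2:
        *out = raw & 0xffffu;
        return 0;
    case 4:
        *out = raw & 0xffffffffu;
        return 0;
    case 8:
        *out = raw;
        return 0;
    default:
        return -1;
    }
}
```

Returning an error code for unsupported sizes lets the caller decide how to
answer the waiting side instead of silently dropping the request.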

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
---
 hw/proxy/qemu-proxy.c         | 77 +++++++++++++++++++++++++++++++++++++++++++
 include/hw/proxy/qemu-proxy.h | 21 ++++++++++--
 include/io/mpqemu-link.h      | 12 +++++++
 remote/remote-main.c          | 73 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 181 insertions(+), 2 deletions(-)

diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
index ca7dd1a..e1f62d7 100644
--- a/hw/proxy/qemu-proxy.c
+++ b/hw/proxy/qemu-proxy.c
@@ -275,6 +275,7 @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
     PCIProxyDev *dev = PCI_PROXY_DEV(device);
     PCIProxyDevClass *k = PCI_PROXY_DEV_GET_CLASS(dev);
     Error *local_err = NULL;
+    int r;
 
     if (k->realize) {
         k->realize(dev, &local_err);
@@ -283,7 +284,83 @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
         }
     }
 
+    for (r = 0; r < PCI_NUM_REGIONS; r++) {
+        if (!dev->region[r].present) {
+            continue;
+        }
+
+        dev->region[r].dev = dev;
+
+        pci_register_bar(PCI_DEVICE(dev), r, dev->region[r].type,
+                         &dev->region[r].mr);
+    }
+
     dev->set_proxy_sock = set_proxy_sock;
     dev->get_proxy_sock = get_proxy_sock;
     dev->init_proxy = init_proxy;
 }
+
+static void send_bar_access_msg(PCIProxyDev *dev, MemoryRegion *mr,
+                                bool write, hwaddr addr, uint64_t *val,
+                                unsigned size, bool memory)
+{
+    MPQemuLinkState *mpqemu_link = dev->mpqemu_link;
+    MPQemuMsg msg;
+    int wait;
+
+    memset(&msg, 0, sizeof(MPQemuMsg));
+
+    msg.bytestream = 0;
+    msg.size = sizeof(msg.data1);
+    msg.data1.bar_access.addr = mr->addr + addr;
+    msg.data1.bar_access.size = size;
+    msg.data1.bar_access.memory = memory;
+
+    if (write) {
+        msg.cmd = BAR_WRITE;
+        msg.data1.bar_access.val = *val;
+    } else {
+        wait = GET_REMOTE_WAIT;
+
+        msg.cmd = BAR_READ;
+        msg.num_fds = 1;
+        msg.fds[0] = wait;
+    }
+
+    mpqemu_msg_send(mpqemu_link, &msg, mpqemu_link->com);
+
+    if (!write) {
+        *val = wait_for_remote(wait);
+        PUT_REMOTE_WAIT(wait);
+    }
+}
+
+void proxy_default_bar_write(void *opaque, hwaddr addr, uint64_t val,
+                             unsigned size)
+{
+    ProxyMemoryRegion *pmr = opaque;
+
+    send_bar_access_msg(pmr->dev, &pmr->mr, true, addr, &val, size,
+                        pmr->memory);
+}
+
+uint64_t proxy_default_bar_read(void *opaque, hwaddr addr, unsigned size)
+{
+    ProxyMemoryRegion *pmr = opaque;
+    uint64_t val;
+
+    send_bar_access_msg(pmr->dev, &pmr->mr, false, addr, &val, size,
+                        pmr->memory);
+
+    return val;
+}
+
+const MemoryRegionOps proxy_default_ops = {
+    .read = proxy_default_bar_read,
+    .write = proxy_default_bar_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .impl = {
+        .min_access_size = 1,
+        .max_access_size = 1,
+    },
+};
diff --git a/include/hw/proxy/qemu-proxy.h b/include/hw/proxy/qemu-proxy.h
index f97b103..5f57822 100644
--- a/include/hw/proxy/qemu-proxy.h
+++ b/include/hw/proxy/qemu-proxy.h
@@ -36,7 +36,19 @@
 #define PCI_PROXY_DEV_GET_CLASS(obj) \
             OBJECT_GET_CLASS(PCIProxyDevClass, (obj), TYPE_PCI_PROXY_DEV)
 
-typedef struct PCIProxyDev {
+typedef struct PCIProxyDev PCIProxyDev;
+
+typedef struct ProxyMemoryRegion {
+    PCIProxyDev *dev;
+    MemoryRegion mr;
+    bool memory;
+    bool present;
+    uint8_t type;
+} ProxyMemoryRegion;
+
+extern const MemoryRegionOps proxy_default_ops;
+
+struct PCIProxyDev {
     PCIDevice parent_dev;
 
     int n_mr_sections;
@@ -65,7 +77,8 @@ typedef struct PCIProxyDev {
     void (*proxy_ready) (PCIDevice *dev);
     void (*init_proxy) (PCIDevice *dev, char *command, bool need_spawn, Error **errp);
 
-} PCIProxyDev;
+    ProxyMemoryRegion region[PCI_NUM_REGIONS];
+};
 
 typedef struct PCIProxyDevClass {
     PCIDeviceClass parent_class;
@@ -77,5 +90,9 @@ typedef struct PCIProxyDevClass {
 
 int remote_spawn(PCIProxyDev *pdev, const char *command, Error **errp);
 
+void proxy_default_bar_write(void *opaque, hwaddr addr, uint64_t val,
+                             unsigned size);
+
+uint64_t proxy_default_bar_read(void *opaque, hwaddr addr, unsigned size);
 
 #endif /* QEMU_PROXY_H */
diff --git a/include/io/mpqemu-link.h b/include/io/mpqemu-link.h
index 7ef8207..89f04c5 100644
--- a/include/io/mpqemu-link.h
+++ b/include/io/mpqemu-link.h
@@ -52,6 +52,8 @@
  * CONF_READ        PCI config. space read
  * CONF_WRITE       PCI config. space write
  * SYNC_SYSMEM      Shares QEMU's RAM with remote device's RAM
+ * BAR_WRITE        Writes to PCI BAR region
+ * BAR_READ         Reads from PCI BAR region
  *
  * proc_cmd_t enum type to specify the command to be executed on the remote
  * device.
@@ -61,6 +63,8 @@ typedef enum {
     CONF_READ,
     CONF_WRITE,
     SYNC_SYSMEM,
+    BAR_WRITE,
+    BAR_READ,
     MAX,
 } mpqemu_cmd_t;
 
@@ -84,6 +88,13 @@ typedef struct {
 } sync_sysmem_msg_t;
 
 typedef struct {
+    hwaddr addr;
+    uint64_t val;
+    unsigned size;
+    bool memory;
+} bar_access_msg_t;
+
+typedef struct {
     mpqemu_cmd_t cmd;
     int bytestream;
     size_t size;
@@ -91,6 +102,7 @@ typedef struct {
     union {
         uint64_t u64;
         sync_sysmem_msg_t sync_sysmem;
+        bar_access_msg_t bar_access;
     } data1;
 
     int fds[REMOTE_MAX_FDS];
diff --git a/remote/remote-main.c b/remote/remote-main.c
index 6c2eb91..49b27d5 100644
--- a/remote/remote-main.c
+++ b/remote/remote-main.c
@@ -46,6 +46,7 @@
 #include "qemu/config-file.h"
 #include "sysemu/sysemu.h"
 #include "block/block.h"
+#include "exec/memattrs.h"
 
 static MPQemuLinkState *mpqemu_link;
 PCIDevice *remote_pci_dev;
@@ -76,6 +77,66 @@ static void process_config_read(MPQemuMsg *msg)
     PUT_REMOTE_WAIT(wait);
 }
 
+/* TODO: confirm memtx attrs. */
+static void process_bar_write(MPQemuMsg *msg, Error **errp)
+{
+    bar_access_msg_t *bar_access = &msg->data1.bar_access;
+    AddressSpace *as =
+        bar_access->memory ? &address_space_memory : &address_space_io;
+    MemTxResult res;
+
+    res = address_space_rw(as, bar_access->addr, MEMTXATTRS_UNSPECIFIED,
+                           (uint8_t *)&bar_access->val, bar_access->size, true);
+
+    if (res != MEMTX_OK) {
+        error_setg(errp, "Could not perform address space write operation,"
+                   " inaccessible address: %lx.", bar_access->addr);
+    }
+}
+
+static void process_bar_read(MPQemuMsg *msg, Error **errp)
+{
+    bar_access_msg_t *bar_access = &msg->data1.bar_access;
+    AddressSpace *as;
+    int wait = msg->fds[0];
+    MemTxResult res;
+    uint64_t val = 0;
+
+    as = bar_access->memory ? &address_space_memory : &address_space_io;
+
+    assert(bar_access->size <= sizeof(uint64_t));
+
+    res = address_space_rw(as, bar_access->addr, MEMTXATTRS_UNSPECIFIED,
+                           (uint8_t *)&val, bar_access->size, false);
+
+    if (res != MEMTX_OK) {
+        error_setg(errp, "Could not perform address space read operation,"
+                   " inaccessible address: %lx.", bar_access->addr);
+        val = (uint64_t)-1;
+        goto fail;
+    }
+
+    switch (bar_access->size) {
+    case 4:
+        val = *((uint32_t *)&val);
+        break;
+    case 2:
+        val = *((uint16_t *)&val);
+        break;
+    case 1:
+        val = *((uint8_t *)&val);
+        break;
+    default:
+        error_setg(errp, "Invalid PCI BAR read size");
+        return;
+    }
+
+fail:
+    notify_proxy(wait, val);
+
+    PUT_REMOTE_WAIT(wait);
+}
+
 static void process_msg(GIOCondition cond, MPQemuChannel *chan)
 {
     MPQemuMsg *msg = NULL;
@@ -102,6 +163,18 @@ static void process_msg(GIOCondition cond, MPQemuChannel *chan)
     case CONF_READ:
         process_config_read(msg);
         break;
+    case BAR_WRITE:
+        process_bar_write(msg, &err);
+        if (err) {
+            goto finalize_loop;
+        }
+        break;
+    case BAR_READ:
+        process_bar_read(msg, &err);
+        if (err) {
+            goto finalize_loop;
+        }
+        break;
     default:
         error_setg(&err, "Unknown command");
         goto finalize_loop;
-- 
1.8.3.1




* [RFC v4 PATCH 16/49] multi-process: Add LSI device proxy object
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (14 preceding siblings ...)
  2019-10-24  9:08 ` [RFC v4 PATCH 15/49] multi-process: PCI BAR read/write handling for proxy & remote endpoints Jagannathan Raman
@ 2019-10-24  9:08 ` Jagannathan Raman
  2019-11-21 11:35   ` Stefan Hajnoczi
  2019-10-24  9:08 ` [RFC v4 PATCH 17/49] multi-process: Synchronize remote memory Jagannathan Raman
                   ` (37 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

Add the proxy-lsi53c895a object, derived from the pci-proxy-dev
object. It serves as the proxy for the lsi53c895a device
instantiated by the remote process.
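
As a rough illustration of what the proxy exposes, the standalone sketch
below mirrors the three BARs that proxy_lsi_realize() sets up in this patch
(a 256-byte I/O BAR, a 0x400-byte MMIO BAR and a 0x2000-byte RAM BAR) and
shows how an access to one of them might be described before it is forwarded.
The names sketch_bar, sketch_access and sketch_build_access are hypothetical
and exist only for this example; the real code uses the PCIProxyDev regions
and bar_access_msg_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified description of one proxied BAR. */
struct sketch_bar {
    const char *name;
    bool memory;      /* true: memory BAR, false: I/O BAR */
    uint64_t size;    /* BAR size in bytes */
};

/* Names and sizes taken from proxy_lsi_realize() in this patch. */
static const struct sketch_bar sketch_lsi_bars[] = {
    { "proxy-lsi-io",   false, 256    },
    { "proxy-lsi-mmio", true,  0x400  },
    { "proxy-lsi-ram",  true,  0x2000 },
};

/* Hypothetical stand-in for the fields carried by a BAR access message. */
struct sketch_access {
    uint64_t addr;    /* guest address of the access */
    uint64_t val;     /* value to write (unused for reads) */
    unsigned size;    /* access width in bytes */
    bool memory;      /* memory or I/O address space */
};

/*
 * Describe an access of 'size' bytes at 'offset' inside BAR 'bar', where
 * 'base' is the guest address the BAR is mapped at. Returns false if the
 * BAR index, the access width or the offset is out of range.
 */
static bool sketch_build_access(size_t bar, uint64_t base, uint64_t offset,
                                unsigned size, uint64_t val,
                                struct sketch_access *out)
{
    size_t nbars = sizeof(sketch_lsi_bars) / sizeof(sketch_lsi_bars[0]);

    if (bar >= nbars) {
        return false;
    }
    if (size != 1 && size != 2 && size != 4) {
        return false;
    }
    if (offset > sketch_lsi_bars[bar].size - size) {
        return false;
    }

    out->addr = base + offset;
    out->val = val;
    out->size = size;
    out->memory = sketch_lsi_bars[bar].memory;
    return true;
}
```

The sketch accepts only 1-, 2- and 4-byte accesses, matching the widths
handled by the remote BAR read handler in this series.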

Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 hw/proxy/Makefile.objs              |  1 +
 hw/proxy/proxy-lsi53c895a.c         | 91 +++++++++++++++++++++++++++++++++++++
 include/hw/proxy/proxy-lsi53c895a.h | 42 +++++++++++++++++
 3 files changed, 134 insertions(+)
 create mode 100644 hw/proxy/proxy-lsi53c895a.c
 create mode 100644 include/hw/proxy/proxy-lsi53c895a.h

diff --git a/hw/proxy/Makefile.objs b/hw/proxy/Makefile.objs
index eb81624..f562f5a 100644
--- a/hw/proxy/Makefile.objs
+++ b/hw/proxy/Makefile.objs
@@ -1 +1,2 @@
 common-obj-$(CONFIG_MPQEMU) += qemu-proxy.o
+common-obj-$(CONFIG_MPQEMU) += proxy-lsi53c895a.o
diff --git a/hw/proxy/proxy-lsi53c895a.c b/hw/proxy/proxy-lsi53c895a.c
new file mode 100644
index 0000000..7734ae2
--- /dev/null
+++ b/hw/proxy/proxy-lsi53c895a.c
@@ -0,0 +1,91 @@
+/*
+ * Copyright 2019, Oracle and/or its affiliates.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <sys/types.h>
+
+#include "qemu/osdep.h"
+#include "hw/qdev-core.h"
+#include "qemu/bitops.h"
+#include "hw/pci/pci.h"
+#include "hw/proxy/qemu-proxy.h"
+#include "hw/proxy/proxy-lsi53c895a.h"
+#include "exec/memory.h"
+
+static void proxy_lsi_realize(PCIProxyDev *dev, Error **errp)
+{
+    ProxyLSIState *s = LSI_PROXY_DEV(dev);
+    PCIDevice *pci_dev = PCI_DEVICE(dev);
+    uint8_t *pci_conf = pci_dev->config;
+
+    pci_conf[PCI_LATENCY_TIMER] = 0xff;
+    pci_conf[PCI_INTERRUPT_PIN] = 0x01;
+
+    dev->region[0].present = true;
+    dev->region[0].type = PCI_BASE_ADDRESS_SPACE_IO;
+    memory_region_init_io(&dev->region[0].mr, OBJECT(s), &proxy_default_ops,
+                          &dev->region[0], "proxy-lsi-io", 256);
+
+    dev->region[1].present = true;
+    dev->region[1].memory = true;
+    dev->region[1].type = PCI_BASE_ADDRESS_SPACE_MEMORY;
+    memory_region_init_io(&dev->region[1].mr, OBJECT(s), &proxy_default_ops,
+                          &dev->region[1], "proxy-lsi-mmio", 0x400);
+
+    dev->region[2].present = true;
+    dev->region[2].memory = true;
+    dev->region[2].type = PCI_BASE_ADDRESS_SPACE_MEMORY;
+    memory_region_init_io(&dev->region[2].mr, OBJECT(s), &proxy_default_ops,
+                          &dev->region[2], "proxy-lsi-ram", 0x2000);
+}
+
+static void proxy_lsi_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *pci_class = PCI_DEVICE_CLASS(klass);
+    PCIProxyDevClass *proxy_class = PCI_PROXY_DEV_CLASS(klass);
+
+    proxy_class->realize = proxy_lsi_realize;
+    proxy_class->command = g_strdup("qemu-scsi-dev");
+
+    pci_class->vendor_id = PCI_VENDOR_ID_LSI_LOGIC;
+    pci_class->device_id = PCI_DEVICE_ID_LSI_53C895A;
+    pci_class->class_id = PCI_CLASS_STORAGE_SCSI;
+    pci_class->subsystem_id = 0x1000;
+
+    set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
+
+    dc->desc = "LSI Proxy Device";
+}
+
+static const TypeInfo lsi_proxy_dev_type_info = {
+    .name          = TYPE_PROXY_LSI53C895A,
+    .parent        = TYPE_PCI_PROXY_DEV,
+    .instance_size = sizeof(ProxyLSIState),
+    .class_init    = proxy_lsi_class_init,
+};
+
+static void lsi_proxy_dev_register_types(void)
+{
+    type_register_static(&lsi_proxy_dev_type_info);
+}
+
+type_init(lsi_proxy_dev_register_types)
diff --git a/include/hw/proxy/proxy-lsi53c895a.h b/include/hw/proxy/proxy-lsi53c895a.h
new file mode 100644
index 0000000..8afb8f3
--- /dev/null
+++ b/include/hw/proxy/proxy-lsi53c895a.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright 2019, Oracle and/or its affiliates.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef LSI_PROXY_H
+#define LSI_PROXY_H
+
+#include "hw/proxy/qemu-proxy.h"
+
+#define TYPE_PROXY_LSI53C895A "proxy-lsi53c895a"
+
+#define LSI_PROXY_DEV(obj) \
+            OBJECT_CHECK(ProxyLSIState, (obj), TYPE_PROXY_LSI53C895A)
+
+typedef struct ProxyLSIState {
+    PCIProxyDev parent_dev;
+
+    MemoryRegion mmio_io;
+    MemoryRegion ram_io;
+    MemoryRegion io_io;
+
+} ProxyLSIState;
+
+#endif
-- 
1.8.3.1




* [RFC v4 PATCH 17/49] multi-process: Synchronize remote memory
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (15 preceding siblings ...)
  2019-10-24  9:08 ` [RFC v4 PATCH 16/49] multi-process: Add LSI device proxy object Jagannathan Raman
@ 2019-10-24  9:08 ` Jagannathan Raman
  2019-11-21 11:44   ` Stefan Hajnoczi
  2019-10-24  9:08 ` [RFC v4 PATCH 18/49] multi-process: create IOHUB object to handle irq Jagannathan Raman
                   ` (36 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

Add the memory-listener object, which keeps the view of RAM in sync
between QEMU and the remote process. A MemoryListener is registered
for the system-memory AddressSpace. When the listener commits changes
to memory, it sends a SYNC_SYSMEM message to the remote process, which
handles it in the SYNC_SYSMEM message handler.
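
The listener also coalesces adjacent RAM sections before sending them. The
standalone sketch below is a simplified illustration of that idea, not the
code in this patch: a new section is folded into the previous one when both
are backed by the same fd and are contiguous in guest-physical and host
address space. All names (sketch_section, sketch_sync, sketch_region_add)
and the fixed-size table are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_SECTIONS 8

/* Hypothetical, flattened view of one RAM section. */
struct sketch_section {
    uint64_t gpa;     /* guest-physical start address */
    uint64_t host;    /* host virtual start address */
    uint64_t size;    /* length in bytes */
    int fd;           /* fd backing the memory */
};

struct sketch_sync {
    struct sketch_section sections[SKETCH_MAX_SECTIONS];
    size_t n;
};

/*
 * Record a section, merging it into the previous entry when possible.
 * Returns 0 on success and -1 if the table is full.
 */
static int sketch_region_add(struct sketch_sync *s, struct sketch_section sec)
{
    if (s->n > 0) {
        struct sketch_section *prev = &s->sections[s->n - 1];
        uint64_t prev_end = prev->gpa + prev->size;

        /*
         * Mergeable: same backing fd, the new section starts inside or
         * right after the previous one, and the gpa-to-host offset is
         * identical for both.
         */
        if (sec.fd == prev->fd &&
            sec.gpa >= prev->gpa && sec.gpa <= prev_end &&
            sec.host - prev->host == sec.gpa - prev->gpa) {
            uint64_t end = sec.gpa + sec.size;

            if (end > prev_end) {
                prev->size = end - prev->gpa;
            }
            return 0;
        }
    }

    if (s->n == SKETCH_MAX_SECTIONS) {
        return -1;
    }
    s->sections[s->n++] = sec;
    return 0;
}
```

Merging keeps the number of entries, and therefore the number of fds carried
by one SYNC_SYSMEM message, small; the patch asserts that this count stays
within REMOTE_MAX_FDS.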

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 v2 -> v3:
   - Refactored code to obtain fd from host address, added
     get_fd_from_hostaddr().
   - Discovered a bug which results in invalid FDs (-1) being
     sent over to the remote process. Fixed this by checking
     if the FD value is valid before sending over to remote.

 Makefile.target                |   1 +
 hw/proxy/memory-sync.c         | 226 +++++++++++++++++++++++++++++++++++++++++
 hw/proxy/qemu-proxy.c          |   5 +
 include/hw/proxy/memory-sync.h |  51 ++++++++++
 include/hw/proxy/qemu-proxy.h  |   2 +
 remote/remote-main.c           |  11 ++
 6 files changed, 296 insertions(+)
 create mode 100644 hw/proxy/memory-sync.c
 create mode 100644 include/hw/proxy/memory-sync.h

diff --git a/Makefile.target b/Makefile.target
index 547f10e..eb1ac34 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -122,6 +122,7 @@ obj-$(CONFIG_TCG) += fpu/softfloat.o
 obj-y += target/$(TARGET_BASE_ARCH)/
 obj-y += disas.o
 obj-$(call notempty,$(TARGET_XML_FILES)) += gdbstub-xml.o
+obj-$(CONFIG_MPQEMU) += hw/proxy/memory-sync.o
 LIBS := $(libs_cpu) $(LIBS)
 
 remote-pci-tgt-obj-$(CONFIG_MPQEMU) += accel/stubs/kvm-stub.o
diff --git a/hw/proxy/memory-sync.c b/hw/proxy/memory-sync.c
new file mode 100644
index 0000000..da24a25
--- /dev/null
+++ b/hw/proxy/memory-sync.c
@@ -0,0 +1,226 @@
+/*
+ * Copyright 2019, Oracle and/or its affiliates.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <sys/types.h>
+#include <stdio.h>
+#include <string.h>
+
+#include "qemu/osdep.h"
+#include "qemu/compiler.h"
+#include "qemu/int128.h"
+#include "qemu/range.h"
+#include "exec/memory.h"
+#include "exec/cpu-common.h"
+#include "cpu.h"
+#include "exec/ram_addr.h"
+#include "exec/address-spaces.h"
+#include "io/mpqemu-link.h"
+#include "hw/proxy/memory-sync.h"
+
+static const TypeInfo remote_mem_sync_type_info = {
+    .name          = TYPE_MEMORY_LISTENER,
+    .parent        = TYPE_OBJECT,
+    .instance_size = sizeof(RemoteMemSync),
+};
+
+static void remote_mem_sync_register_types(void)
+{
+    type_register_static(&remote_mem_sync_type_info);
+}
+
+type_init(remote_mem_sync_register_types)
+
+static void proxy_ml_begin(MemoryListener *listener)
+{
+    RemoteMemSync *sync = container_of(listener, RemoteMemSync, listener);
+    int mrs;
+
+    for (mrs = 0; mrs < sync->n_mr_sections; mrs++) {
+        memory_region_unref(sync->mr_sections[mrs].mr);
+    }
+
+    g_free(sync->mr_sections);
+    sync->mr_sections = NULL;
+    sync->n_mr_sections = 0;
+}
+
+static int get_fd_from_hostaddr(uint64_t host, ram_addr_t *offset)
+{
+    MemoryRegion *mr;
+    ram_addr_t off;
+
+    mr = memory_region_from_host((void *)(uintptr_t)host, &off);
+
+    if (offset) {
+        *offset = off;
+    }
+
+    return memory_region_get_fd(mr);
+}
+
+static bool proxy_mrs_can_merge(uint64_t host, uint64_t prev_host, size_t size)
+{
+    bool merge;
+    int fd1, fd2;
+
+    fd1 = get_fd_from_hostaddr(host, NULL);
+
+    fd2 = get_fd_from_hostaddr(prev_host, NULL);
+
+    merge = (fd1 == fd2);
+
+    merge &= ((prev_host + size) == host);
+
+    return merge;
+}
+
+static void proxy_ml_region_addnop(MemoryListener *listener,
+                                   MemoryRegionSection *section)
+{
+    RemoteMemSync *sync = container_of(listener, RemoteMemSync, listener);
+    bool need_add = true;
+    uint64_t mrs_size, mrs_gpa, mrs_page;
+    uintptr_t mrs_host;
+    RAMBlock *mrs_rb;
+    MemoryRegionSection *prev_sec;
+
+    if (!(memory_region_is_ram(section->mr) &&
+          !memory_region_is_rom(section->mr))) {
+        return;
+    }
+
+    mrs_rb = section->mr->ram_block;
+    mrs_page = (uint64_t)qemu_ram_pagesize(mrs_rb);
+    mrs_size = int128_get64(section->size);
+    mrs_gpa = section->offset_within_address_space;
+    mrs_host = (uintptr_t)memory_region_get_ram_ptr(section->mr) +
+               section->offset_within_region;
+
+    if (get_fd_from_hostaddr(mrs_host, NULL) <= 0) {
+        return;
+    }
+
+    mrs_host = mrs_host & ~(mrs_page - 1);
+    mrs_gpa = mrs_gpa & ~(mrs_page - 1);
+    mrs_size = ROUND_UP(mrs_size, mrs_page);
+
+    if (sync->n_mr_sections) {
+        prev_sec = sync->mr_sections + (sync->n_mr_sections - 1);
+        uint64_t prev_gpa_start = prev_sec->offset_within_address_space;
+        uint64_t prev_size = int128_get64(prev_sec->size);
+        uint64_t prev_gpa_end   = range_get_last(prev_gpa_start, prev_size);
+        uint64_t prev_host_start =
+            (uintptr_t)memory_region_get_ram_ptr(prev_sec->mr) +
+            prev_sec->offset_within_region;
+        uint64_t prev_host_end = range_get_last(prev_host_start, prev_size);
+
+        if (mrs_gpa <= (prev_gpa_end + 1)) {
+            if (mrs_gpa < prev_gpa_start) {
+                assert(0);
+            }
+
+            if ((section->mr == prev_sec->mr) &&
+                proxy_mrs_can_merge(mrs_host, prev_host_start,
+                                    (mrs_gpa - prev_gpa_start))) {
+                uint64_t max_end = MAX(prev_host_end, mrs_host + mrs_size);
+                need_add = false;
+                prev_sec->offset_within_address_space =
+                    MIN(prev_gpa_start, mrs_gpa);
+                prev_sec->offset_within_region =
+                    MIN(prev_host_start, mrs_host) -
+                    (uintptr_t)memory_region_get_ram_ptr(prev_sec->mr);
+                prev_sec->size = int128_make64(max_end - MIN(prev_host_start,
+                                                             mrs_host));
+            }
+        }
+    }
+
+    if (need_add) {
+        ++sync->n_mr_sections;
+        sync->mr_sections = g_renew(MemoryRegionSection, sync->mr_sections,
+                                    sync->n_mr_sections);
+        sync->mr_sections[sync->n_mr_sections - 1] = *section;
+        sync->mr_sections[sync->n_mr_sections - 1].fv = NULL;
+        memory_region_ref(section->mr);
+    }
+}
+
+static void proxy_ml_commit(MemoryListener *listener)
+{
+    RemoteMemSync *sync = container_of(listener, RemoteMemSync, listener);
+    MPQemuMsg msg;
+    MemoryRegionSection section;
+    ram_addr_t offset;
+    uintptr_t host_addr;
+    int region;
+
+    memset(&msg, 0, sizeof(MPQemuMsg));
+
+    msg.cmd = SYNC_SYSMEM;
+    msg.bytestream = 0;
+    msg.num_fds = sync->n_mr_sections;
+    msg.size = sizeof(msg.data1);
+    assert(msg.num_fds <= REMOTE_MAX_FDS);
+
+    for (region = 0; region < sync->n_mr_sections; region++) {
+        section = sync->mr_sections[region];
+        msg.data1.sync_sysmem.gpas[region] =
+            section.offset_within_address_space;
+        msg.data1.sync_sysmem.sizes[region] = int128_get64(section.size);
+        host_addr = (uintptr_t)memory_region_get_ram_ptr(section.mr) +
+                    section.offset_within_region;
+        msg.fds[region] = get_fd_from_hostaddr(host_addr, &offset);
+        msg.data1.sync_sysmem.offsets[region] = offset;
+    }
+    mpqemu_msg_send(sync->mpqemu_link, &msg, sync->mpqemu_link->com);
+}
+
+void deconfigure_memory_sync(RemoteMemSync *sync)
+{
+    memory_listener_unregister(&sync->listener);
+}
+
+/*
+ * TODO: Memory Sync need not be instantiated once for every proxy device.
+ *       All remote devices are going to get the exact same updates at the
+ *       same time. It therefore makes sense to have a broadcast model.
+ *
+ *       Broadcast model would involve running the MemorySync object in a
+ *       thread. MemorySync would contain a list of mpqemu-link objects
+ *       that need notification. proxy_ml_commit() could send the same
+ *       message to all the links at the same time.
+ */
+void configure_memory_sync(RemoteMemSync *sync, MPQemuLinkState *mpqemu_link)
+{
+    sync->n_mr_sections = 0;
+    sync->mr_sections = NULL;
+
+    sync->mpqemu_link = mpqemu_link;
+
+    sync->listener.begin = proxy_ml_begin;
+    sync->listener.commit = proxy_ml_commit;
+    sync->listener.region_add = proxy_ml_region_addnop;
+    sync->listener.region_nop = proxy_ml_region_addnop;
+    sync->listener.priority = 10;
+
+    memory_listener_register(&sync->listener, &address_space_memory);
+}
diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
index e1f62d7..71770ca 100644
--- a/hw/proxy/qemu-proxy.c
+++ b/hw/proxy/qemu-proxy.c
@@ -42,6 +42,8 @@
 #include "qapi/qmp/qstring.h"
 #include "sysemu/sysemu.h"
 #include "hw/proxy/qemu-proxy.h"
+#include "hw/proxy/memory-sync.h"
+#include "qom/object.h"
 
 static void pci_proxy_dev_realize(PCIDevice *dev, Error **errp);
 
@@ -268,6 +270,8 @@ static void init_proxy(PCIDevice *dev, char *command, bool need_spawn, Error **e
 
     mpqemu_init_channel(pdev->mpqemu_link, &pdev->mpqemu_link->com,
                         pdev->socket);
+
+    configure_memory_sync(pdev->sync, pdev->mpqemu_link);
 }
 
 static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
@@ -298,6 +302,7 @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
     dev->set_proxy_sock = set_proxy_sock;
     dev->get_proxy_sock = get_proxy_sock;
     dev->init_proxy = init_proxy;
+    dev->sync = REMOTE_MEM_SYNC(object_new(TYPE_MEMORY_LISTENER));
 }
 
 static void send_bar_access_msg(PCIProxyDev *dev, MemoryRegion *mr,
diff --git a/include/hw/proxy/memory-sync.h b/include/hw/proxy/memory-sync.h
new file mode 100644
index 0000000..cb94995
--- /dev/null
+++ b/include/hw/proxy/memory-sync.h
@@ -0,0 +1,51 @@
+/*
+ * Copyright 2019, Oracle and/or its affiliates.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef MEMORY_SYNC_H
+#define MEMORY_SYNC_H
+
+#include <sys/types.h>
+
+#include "qemu/osdep.h"
+#include "qom/object.h"
+#include "exec/memory.h"
+#include "io/mpqemu-link.h"
+
+#define TYPE_MEMORY_LISTENER "memory-listener"
+#define REMOTE_MEM_SYNC(obj) \
+            OBJECT_CHECK(RemoteMemSync, (obj), TYPE_MEMORY_LISTENER)
+
+typedef struct RemoteMemSync {
+    Object obj;
+
+    MemoryListener listener;
+
+    int n_mr_sections;
+    MemoryRegionSection *mr_sections;
+
+    MPQemuLinkState *mpqemu_link;
+} RemoteMemSync;
+
+void configure_memory_sync(RemoteMemSync *sync, MPQemuLinkState *mpqemu_link);
+void deconfigure_memory_sync(RemoteMemSync *sync);
+
+#endif
diff --git a/include/hw/proxy/qemu-proxy.h b/include/hw/proxy/qemu-proxy.h
index 5f57822..7475bba 100644
--- a/include/hw/proxy/qemu-proxy.h
+++ b/include/hw/proxy/qemu-proxy.h
@@ -24,6 +24,7 @@
 #define QEMU_PROXY_H
 
 #include "io/mpqemu-link.h"
+#include "hw/proxy/memory-sync.h"
 
 #define TYPE_PCI_PROXY_DEV "pci-proxy-dev"
 
@@ -56,6 +57,7 @@ struct PCIProxyDev {
 
     MPQemuLinkState *mpqemu_link;
 
+    RemoteMemSync *sync;
     EventNotifier intr;
     EventNotifier resample;
 
diff --git a/remote/remote-main.c b/remote/remote-main.c
index 49b27d5..9fe4b87 100644
--- a/remote/remote-main.c
+++ b/remote/remote-main.c
@@ -47,6 +47,7 @@
 #include "sysemu/sysemu.h"
 #include "block/block.h"
 #include "exec/memattrs.h"
+#include "exec/address-spaces.h"
 
 static MPQemuLinkState *mpqemu_link;
 PCIDevice *remote_pci_dev;
@@ -175,6 +176,16 @@ static void process_msg(GIOCondition cond, MPQemuChannel *chan)
             goto finalize_loop;
         }
         break;
+    case SYNC_SYSMEM:
+        /*
+         * TODO: ensure no active DMA is happening when
+         * sysmem is being updated
+         */
+        remote_sysmem_reconfig(msg, &err);
+        if (err) {
+            goto finalize_loop;
+        }
+        break;
     default:
         error_setg(&err, "Unknown command");
         goto finalize_loop;
-- 
1.8.3.1




* [RFC v4 PATCH 18/49] multi-process: create IOHUB object to handle irq
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (16 preceding siblings ...)
  2019-10-24  9:08 ` [RFC v4 PATCH 17/49] multi-process: Synchronize remote memory Jagannathan Raman
@ 2019-10-24  9:08 ` Jagannathan Raman
  2019-11-21 12:02   ` Stefan Hajnoczi
  2019-10-24  9:09 ` [RFC v4 PATCH 19/49] multi-process: configure remote side devices Jagannathan Raman
                   ` (35 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

Add the IOHUB object to manage PCI IRQs. It uses the KVM_IRQFD
ioctl to create an irqfd for injecting PCI interrupts into the guest.
The IOHUB object forwards the irqfd to the remote process, which
uses this fd to send interrupts directly to the guest, bypassing QEMU.
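
The irqfd is an ordinary Linux eventfd, so raising an interrupt from the
remote process amounts to writing to that fd. The standalone sketch below
shows only the eventfd counter semantics; in this patch the consumer is KVM
(the fd is registered through KVM_IRQFD), whereas here the same process
reads the fd back purely for illustration. The helper names sketch_irq_raise
and sketch_irq_consume are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Signal the eventfd once by adding 1 to its counter; 0 on success. */
static int sketch_irq_raise(int irqfd)
{
    uint64_t one = 1;

    return write(irqfd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Read and reset the eventfd counter. Returns the number of signals that
 * were pending, or 0 if none were pending or the read failed (the fd is
 * expected to be non-blocking).
 */
static uint64_t sketch_irq_consume(int irqfd)
{
    uint64_t count = 0;

    if (read(irqfd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return 0;
    }
    return count;
}
```

A caller would create the fd with eventfd(0, EFD_NONBLOCK), pass it to the
peer process over the socket, and let the peer call sketch_irq_raise().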

Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 Makefile.target               |   1 +
 hw/proxy/Makefile.objs        |   1 -
 hw/proxy/qemu-proxy.c         |  54 ++++++++++++++
 include/hw/pci/pci_ids.h      |   3 +
 include/hw/proxy/qemu-proxy.h |   5 ++
 include/io/mpqemu-link.h      |   8 +++
 include/remote/iohub.h        |  63 +++++++++++++++++
 include/remote/machine.h      |   2 +
 remote/Makefile.objs          |   1 +
 remote/iohub.c                | 159 ++++++++++++++++++++++++++++++++++++++++++
 remote/machine.c              |  15 ++++
 remote/remote-main.c          |   4 ++
 12 files changed, 315 insertions(+), 1 deletion(-)
 create mode 100644 include/remote/iohub.h
 create mode 100644 remote/iohub.c

diff --git a/Makefile.target b/Makefile.target
index eb1ac34..f16b74a 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -123,6 +123,7 @@ obj-y += target/$(TARGET_BASE_ARCH)/
 obj-y += disas.o
 obj-$(call notempty,$(TARGET_XML_FILES)) += gdbstub-xml.o
 obj-$(CONFIG_MPQEMU) += hw/proxy/memory-sync.o
+obj-$(CONFIG_MPQEMU) += hw/proxy/qemu-proxy.o
 LIBS := $(libs_cpu) $(LIBS)
 
 remote-pci-tgt-obj-$(CONFIG_MPQEMU) += accel/stubs/kvm-stub.o
diff --git a/hw/proxy/Makefile.objs b/hw/proxy/Makefile.objs
index f562f5a..ca89109 100644
--- a/hw/proxy/Makefile.objs
+++ b/hw/proxy/Makefile.objs
@@ -1,2 +1 @@
-common-obj-$(CONFIG_MPQEMU) += qemu-proxy.o
 common-obj-$(CONFIG_MPQEMU) += proxy-lsi53c895a.o
diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
index 71770ca..bd7dd35 100644
--- a/hw/proxy/qemu-proxy.c
+++ b/hw/proxy/qemu-proxy.c
@@ -27,6 +27,9 @@
 #include <unistd.h>
 #include <assert.h>
 #include <string.h>
+#include <linux/kvm.h>
+#include <errno.h>
+
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "io/mpqemu-link.h"
@@ -44,6 +47,9 @@
 #include "hw/proxy/qemu-proxy.h"
 #include "hw/proxy/memory-sync.h"
 #include "qom/object.h"
+#include "qemu/event_notifier.h"
+#include "sysemu/kvm.h"
+#include "util/event_notifier-posix.c"
 
 static void pci_proxy_dev_realize(PCIDevice *dev, Error **errp);
 
@@ -242,6 +248,53 @@ static void pci_proxy_dev_register_types(void)
 
 type_init(pci_proxy_dev_register_types)
 
+static void proxy_intx_update(PCIDevice *pci_dev)
+{
+    PCIProxyDev *dev = PCI_PROXY_DEV(pci_dev);
+    PCIINTxRoute route;
+    int pin = pci_get_byte(pci_dev->config + PCI_INTERRUPT_PIN) - 1;
+
+    if (dev->irqfd.fd) {
+        dev->irqfd.flags = KVM_IRQFD_FLAG_DEASSIGN;
+        (void) kvm_vm_ioctl(kvm_state, KVM_IRQFD, &dev->irqfd);
+        memset(&dev->irqfd, 0, sizeof(struct kvm_irqfd));
+    }
+
+    route = pci_device_route_intx_to_irq(pci_dev, pin);
+
+    dev->irqfd.fd = event_notifier_get_fd(&dev->intr);
+    dev->irqfd.resamplefd = event_notifier_get_fd(&dev->resample);
+    dev->irqfd.gsi = route.irq;
+    dev->irqfd.flags |= KVM_IRQFD_FLAG_RESAMPLE;
+    (void) kvm_vm_ioctl(kvm_state, KVM_IRQFD, &dev->irqfd);
+}
+
+static void setup_irqfd(PCIProxyDev *dev)
+{
+    PCIDevice *pci_dev = PCI_DEVICE(dev);
+    MPQemuMsg msg;
+
+    event_notifier_init(&dev->intr, 0);
+    event_notifier_init(&dev->resample, 0);
+
+    memset(&msg, 0, sizeof(MPQemuMsg));
+    msg.cmd = SET_IRQFD;
+    msg.num_fds = 2;
+    msg.fds[0] = event_notifier_get_fd(&dev->intr);
+    msg.fds[1] = event_notifier_get_fd(&dev->resample);
+    msg.data1.set_irqfd.intx =
+        pci_get_byte(pci_dev->config + PCI_INTERRUPT_PIN) - 1;
+    msg.size = sizeof(msg.data1);
+
+    mpqemu_msg_send(dev->mpqemu_link, &msg, dev->mpqemu_link->com);
+
+    memset(&dev->irqfd, 0, sizeof(struct kvm_irqfd));
+
+    proxy_intx_update(pci_dev);
+
+    pci_device_set_intx_routing_notifier(pci_dev, proxy_intx_update);
+}
+
 static void init_proxy(PCIDevice *dev, char *command, bool need_spawn, Error **errp)
 {
     PCIProxyDev *pdev = PCI_PROXY_DEV(dev);
@@ -272,6 +325,7 @@ static void init_proxy(PCIDevice *dev, char *command, bool need_spawn, Error **e
                         pdev->socket);
 
     configure_memory_sync(pdev->sync, pdev->mpqemu_link);
+    setup_irqfd(pdev);
 }
 
 static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
index 0abe27a..9cc5e28 100644
--- a/include/hw/pci/pci_ids.h
+++ b/include/hw/pci/pci_ids.h
@@ -191,6 +191,9 @@
 #define PCI_DEVICE_ID_SUN_SIMBA          0x5000
 #define PCI_DEVICE_ID_SUN_SABRE          0xa000
 
+#define PCI_VENDOR_ID_ORACLE             0x108e
+#define PCI_DEVICE_ID_REMOTE_IOHUB       0xb000
+
 #define PCI_VENDOR_ID_CMD                0x1095
 #define PCI_DEVICE_ID_CMD_646            0x0646
 
diff --git a/include/hw/proxy/qemu-proxy.h b/include/hw/proxy/qemu-proxy.h
index 7475bba..0fad7e3 100644
--- a/include/hw/proxy/qemu-proxy.h
+++ b/include/hw/proxy/qemu-proxy.h
@@ -23,8 +23,11 @@
 #ifndef QEMU_PROXY_H
 #define QEMU_PROXY_H
 
+#include <linux/kvm.h>
+
 #include "io/mpqemu-link.h"
 #include "hw/proxy/memory-sync.h"
+#include "qemu/event_notifier.h"
 
 #define TYPE_PCI_PROXY_DEV "pci-proxy-dev"
 
@@ -58,6 +61,8 @@ struct PCIProxyDev {
     MPQemuLinkState *mpqemu_link;
 
     RemoteMemSync *sync;
+    struct kvm_irqfd irqfd;
+
     EventNotifier intr;
     EventNotifier resample;
 
diff --git a/include/io/mpqemu-link.h b/include/io/mpqemu-link.h
index 89f04c5..1885ad7 100644
--- a/include/io/mpqemu-link.h
+++ b/include/io/mpqemu-link.h
@@ -54,6 +54,8 @@
  * SYNC_SYSMEM      Shares QEMU's RAM with remote device's RAM
  * BAR_WRITE        Writes to PCI BAR region
  * BAR_READ         Reads from PCI BAR region
+ * SET_IRQFD        Sets the IRQFD to be used to raise interrupts directly
+ *                  from remote device
  *
  * proc_cmd_t enum type to specify the command to be executed on the remote
  * device.
@@ -65,6 +67,7 @@ typedef enum {
     SYNC_SYSMEM,
     BAR_WRITE,
     BAR_READ,
+    SET_IRQFD,
     MAX,
 } mpqemu_cmd_t;
 
@@ -95,6 +98,10 @@ typedef struct {
 } bar_access_msg_t;
 
 typedef struct {
+    int intx;
+} set_irqfd_msg_t;
+
+typedef struct {
     mpqemu_cmd_t cmd;
     int bytestream;
     size_t size;
@@ -103,6 +110,7 @@ typedef struct {
         uint64_t u64;
         sync_sysmem_msg_t sync_sysmem;
         bar_access_msg_t bar_access;
+        set_irqfd_msg_t set_irqfd;
     } data1;
 
     int fds[REMOTE_MAX_FDS];
diff --git a/include/remote/iohub.h b/include/remote/iohub.h
new file mode 100644
index 0000000..7ae41e9
--- /dev/null
+++ b/include/remote/iohub.h
@@ -0,0 +1,63 @@
+/*
+ * IO Hub for remote device
+ *
+ * Copyright 2019, Oracle and/or its affiliates. All rights reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef REMOTE_IOHUB_H
+#define REMOTE_IOHUB_H
+
+#include <sys/types.h>
+
+#include "qemu/osdep.h"
+#include "hw/pci/pci.h"
+#include "qemu/event_notifier.h"
+#include "qemu/thread-posix.h"
+#include "io/mpqemu-link.h"
+
+#define REMOTE_IOHUB_NB_PIRQS    8
+
+#define REMOTE_IOHUB_DEV         31
+#define REMOTE_IOHUB_FUNC        0
+
+#define TYPE_REMOTE_IOHUB_DEVICE "remote-iohub"
+#define REMOTE_IOHUB_DEVICE(obj) \
+    OBJECT_CHECK(RemoteIOHubState, (obj), TYPE_REMOTE_IOHUB_DEVICE)
+
+typedef struct RemoteIOHubState {
+    PCIDevice d;
+    uint8_t irq_num[PCI_SLOT_MAX][PCI_NUM_PINS];
+    EventNotifier irqfds[REMOTE_IOHUB_NB_PIRQS];
+    EventNotifier resamplefds[REMOTE_IOHUB_NB_PIRQS];
+    unsigned int irq_level[REMOTE_IOHUB_NB_PIRQS];
+    QemuMutex irq_level_lock[REMOTE_IOHUB_NB_PIRQS];
+} RemoteIOHubState;
+
+typedef struct ResampleToken {
+    RemoteIOHubState *iohub;
+    int pirq;
+} ResampleToken;
+
+int remote_iohub_map_irq(PCIDevice *pci_dev, int intx);
+void remote_iohub_set_irq(void *opaque, int pirq, int level);
+void process_set_irqfd_msg(PCIDevice *pci_dev, MPQemuMsg *msg);
+
+#endif
diff --git a/include/remote/machine.h b/include/remote/machine.h
index a00732d..0a16cc6 100644
--- a/include/remote/machine.h
+++ b/include/remote/machine.h
@@ -30,11 +30,13 @@
 #include "hw/boards.h"
 #include "remote/pcihost.h"
 #include "qemu/notify.h"
+#include "remote/iohub.h"
 
 typedef struct RemMachineState {
     MachineState parent_obj;
 
     RemPCIHost *host;
+    RemoteIOHubState *iohub;
 } RemMachineState;
 
 #define TYPE_REMOTE_MACHINE "remote-machine"
diff --git a/remote/Makefile.objs b/remote/Makefile.objs
index 13d4c48..cbb3065 100644
--- a/remote/Makefile.objs
+++ b/remote/Makefile.objs
@@ -1,3 +1,4 @@
 remote-pci-obj-$(CONFIG_MPQEMU) += remote-main.o
 remote-pci-obj-$(CONFIG_MPQEMU) += pcihost.o
 remote-pci-obj-$(CONFIG_MPQEMU) += machine.o
+remote-pci-obj-$(CONFIG_MPQEMU) += iohub.o
diff --git a/remote/iohub.c b/remote/iohub.c
new file mode 100644
index 0000000..dad92c9
--- /dev/null
+++ b/remote/iohub.c
@@ -0,0 +1,159 @@
+/*
+ * Remote IO Hub
+ *
+ * Copyright 2019, Oracle and/or its affiliates. All rights reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <sys/types.h>
+
+#include "qemu/osdep.h"
+#include "hw/pci/pci.h"
+#include "hw/pci/pci_ids.h"
+#include "hw/pci/pci_bus.h"
+#include "remote/iohub.h"
+#include "qemu/thread.h"
+#include "hw/boards.h"
+#include "remote/machine.h"
+#include "qemu/main-loop.h"
+
+static void remote_iohub_initfn(Object *obj)
+{
+    RemoteIOHubState *iohub = REMOTE_IOHUB_DEVICE(obj);
+    int slot, intx, pirq;
+
+    memset(&iohub->irqfds, 0, sizeof(iohub->irqfds));
+    memset(&iohub->resamplefds, 0, sizeof(iohub->resamplefds));
+
+    for (slot = 0; slot < PCI_SLOT_MAX; slot++) {
+        for (intx = 0; intx < PCI_NUM_PINS; intx++) {
+            iohub->irq_num[slot][intx] = (slot + intx) % 4 + 4;
+        }
+    }
+
+    for (pirq = 0; pirq < REMOTE_IOHUB_NB_PIRQS; pirq++) {
+        qemu_mutex_init(&iohub->irq_level_lock[pirq]);
+        iohub->irq_level[pirq] = 0;
+    }
+}
+
+static void remote_iohub_class_init(ObjectClass *klass, void *data)
+{
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+    k->vendor_id = PCI_VENDOR_ID_ORACLE;
+    k->device_id = PCI_DEVICE_ID_REMOTE_IOHUB;
+}
+
+static const TypeInfo remote_iohub_info = {
+    .name       = TYPE_REMOTE_IOHUB_DEVICE,
+    .parent     = TYPE_PCI_DEVICE,
+    .instance_size = sizeof(RemoteIOHubState),
+    .instance_init = remote_iohub_initfn,
+    .class_init  = remote_iohub_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
+        { }
+    }
+};
+
+static void remote_iohub_register(void)
+{
+    type_register_static(&remote_iohub_info);
+}
+
+type_init(remote_iohub_register);
+
+int remote_iohub_map_irq(PCIDevice *pci_dev, int intx)
+{
+    BusState *bus = qdev_get_parent_bus(&pci_dev->qdev);
+    PCIBus *pci_bus = PCI_BUS(bus);
+    PCIDevice *pci_iohub =
+        pci_bus->devices[PCI_DEVFN(REMOTE_IOHUB_DEV, REMOTE_IOHUB_FUNC)];
+    RemoteIOHubState *iohub = REMOTE_IOHUB_DEVICE(pci_iohub);
+
+    return iohub->irq_num[PCI_SLOT(pci_dev->devfn)][intx];
+}
+
+/*
+ * TODO: Using lock to set the interrupt level could become a
+ *       performance bottleneck. Check if atomic arithmetic
+ *       is possible.
+ */
+void remote_iohub_set_irq(void *opaque, int pirq, int level)
+{
+    RemoteIOHubState *iohub = opaque;
+
+    assert(pirq >= 0);
+    assert(pirq < REMOTE_IOHUB_NB_PIRQS);
+
+    qemu_mutex_lock(&iohub->irq_level_lock[pirq]);
+
+    if (level) {
+        if (++iohub->irq_level[pirq] == 1) {
+            event_notifier_set(&iohub->irqfds[pirq]);
+        }
+    } else if (iohub->irq_level[pirq] > 0) {
+        iohub->irq_level[pirq]--;
+    }
+
+    qemu_mutex_unlock(&iohub->irq_level_lock[pirq]);
+}
+
+static void intr_resample_handler(void *opaque)
+{
+    ResampleToken *token = opaque;
+    RemoteIOHubState *iohub = token->iohub;
+    uint64_t val;
+    int pirq, s;
+
+    pirq = token->pirq;
+
+    s = read(event_notifier_get_fd(&iohub->resamplefds[pirq]), &val,
+             sizeof(uint64_t));
+
+    assert(s >= 0);
+
+    qemu_mutex_lock(&iohub->irq_level_lock[pirq]);
+
+    if (iohub->irq_level[pirq]) {
+        event_notifier_set(&iohub->irqfds[pirq]);
+    }
+
+    qemu_mutex_unlock(&iohub->irq_level_lock[pirq]);
+}
+
+void process_set_irqfd_msg(PCIDevice *pci_dev, MPQemuMsg *msg)
+{
+    RemMachineState *machine = REMOTE_MACHINE(current_machine);
+    RemoteIOHubState *iohub = machine->iohub;
+    ResampleToken *token;
+    int pirq = remote_iohub_map_irq(pci_dev, msg->data1.set_irqfd.intx);
+
+    assert(msg->num_fds == 2);
+
+    event_notifier_init_fd(&iohub->irqfds[pirq], msg->fds[0]);
+    event_notifier_init_fd(&iohub->resamplefds[pirq], msg->fds[1]);
+
+    token = g_malloc0(sizeof(ResampleToken));
+    token->iohub = iohub;
+    token->pirq = pirq;
+
+    qemu_set_fd_handler(msg->fds[1], intr_resample_handler, NULL, token);
+}
diff --git a/remote/machine.c b/remote/machine.c
index 4ce197d..5b03167 100644
--- a/remote/machine.c
+++ b/remote/machine.c
@@ -40,6 +40,8 @@
 #include "qemu-common.h"
 #include "sysemu/sysemu.h"
 #include "qemu/notify.h"
+#include "hw/pci/pci_host.h"
+#include "remote/iohub.h"
 
 static NotifierList machine_init_done_notifiers =
     NOTIFIER_LIST_INITIALIZER(machine_init_done_notifiers);
@@ -70,6 +72,8 @@ static void remote_machine_init(Object *obj)
     RemMachineState *s = REMOTE_MACHINE(obj);
     RemPCIHost *rem_host;
     MemoryRegion *system_memory, *system_io, *pci_memory;
+    PCIHostState *pci_host;
+    PCIDevice *pci_dev;
 
     Error *error_abort = NULL;
 
@@ -101,6 +105,17 @@ static void remote_machine_init(Object *obj)
     qemu_mutex_unlock_iothread();
 
     qdev_init_nofail(DEVICE(rem_host));
+
+    pci_host = PCI_HOST_BRIDGE(rem_host);
+    pci_dev = pci_create_simple_multifunction(pci_host->bus,
+                                              PCI_DEVFN(REMOTE_IOHUB_DEV,
+                                                        REMOTE_IOHUB_FUNC),
+                                              true, TYPE_REMOTE_IOHUB_DEVICE);
+
+    s->iohub = REMOTE_IOHUB_DEVICE(pci_dev);
+
+    pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
+                 s->iohub, REMOTE_IOHUB_NB_PIRQS);
 }
 
 static const TypeInfo remote_machine = {
diff --git a/remote/remote-main.c b/remote/remote-main.c
index 9fe4b87..cede97c 100644
--- a/remote/remote-main.c
+++ b/remote/remote-main.c
@@ -48,6 +48,7 @@
 #include "block/block.h"
 #include "exec/memattrs.h"
 #include "exec/address-spaces.h"
+#include "remote/iohub.h"
 
 static MPQemuLinkState *mpqemu_link;
 PCIDevice *remote_pci_dev;
@@ -186,6 +187,9 @@ static void process_msg(GIOCondition cond, MPQemuChannel *chan)
             goto finalize_loop;
         }
         break;
+    case SET_IRQFD:
+        process_set_irqfd_msg(remote_pci_dev, msg);
+        break;
     default:
         error_setg(&err, "Unknown command");
         goto finalize_loop;
-- 
1.8.3.1
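
As a rough illustration of the interrupt routing in the patch above, the
following standalone sketch models the slot/INTx to PIRQ mapping and the
per-PIRQ level counter. The names SketchHub, sketch_map_irq and
sketch_set_irq are hypothetical, locking is left out, and the eventfd
notification is replaced by a return value; it is not the QEMU code itself.

```c
#include <assert.h>

#define SKETCH_NB_PIRQS 8

/* Hypothetical stand-in for RemoteIOHubState: only the level counters. */
typedef struct SketchHub {
    unsigned int irq_level[SKETCH_NB_PIRQS];
} SketchHub;

/* Same formula as remote_iohub_initfn(): results fall in PIRQ 4..7. */
static int sketch_map_irq(int slot, int intx)
{
    return (slot + intx) % 4 + 4;
}

/*
 * Mirrors the counting in remote_iohub_set_irq(), without the mutex.
 * Returns 1 when the irqfd would be signalled (0 -> 1 transition),
 * 0 otherwise.
 */
static int sketch_set_irq(SketchHub *hub, int pirq, int level)
{
    assert(pirq >= 0 && pirq < SKETCH_NB_PIRQS);

    if (level) {
        return ++hub->irq_level[pirq] == 1;
    }
    if (hub->irq_level[pirq] > 0) {
        hub->irq_level[pirq]--;
    }
    return 0;
}
```

Because several devices can share one PIRQ, the counter only triggers a
notification on the first assertion; later assertions just raise the count.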




* [RFC v4 PATCH 19/49] multi-process: configure remote side devices
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (17 preceding siblings ...)
  2019-10-24  9:08 ` [RFC v4 PATCH 18/49] multi-process: create IOHUB object to handle irq Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-11-21 12:05   ` Stefan Hajnoczi
  2019-10-24  9:09 ` [RFC v4 PATCH 20/49] multi-process: add qdev_proxy_add to create proxy devices Jagannathan Raman
                   ` (34 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Add functions to configure remote devices.
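
A minimal sketch of how such a configuration message could be framed,
assuming the options have already been serialized to a JSON string.
SketchMsg, the SKETCH_* command values and sketch_fill_opts_msg are
hypothetical, reduced stand-ins for MPQemuMsg and set_remote_opts(); the
actual send over the link is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-in for MPQemuMsg. */
typedef struct SketchMsg {
    int cmd;
    int bytestream;
    size_t size;
    const unsigned char *data2;
    int num_fds;
} SketchMsg;

/* Hypothetical command values; the real ones come from mpqemu_cmd_t. */
enum { SKETCH_DEV_OPTS = 1, SKETCH_DRIVE_OPTS = 2 };

/* Fill a message that carries a NUL-terminated JSON string by reference. */
static void sketch_fill_opts_msg(SketchMsg *msg, const char *json, int cmd)
{
    memset(msg, 0, sizeof(*msg));
    msg->cmd = cmd;
    msg->bytestream = 1;                 /* payload travels in data2 */
    msg->data2 = (const unsigned char *)json;
    msg->size = strlen(json) + 1;        /* include the terminating NUL */
    msg->num_fds = 0;                    /* no file descriptors attached */
}
```

Counting the terminating NUL in the size lets the receiver treat the
payload as a C string without copying it first.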

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/proxy/qemu-proxy.c         | 39 ++++++++++++++++++++++++++++++++++++++-
 include/hw/proxy/qemu-proxy.h |  2 ++
 include/io/mpqemu-link.h      |  4 ++++
 3 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
index bd7dd35..3b84055 100644
--- a/hw/proxy/qemu-proxy.c
+++ b/hw/proxy/qemu-proxy.c
@@ -50,8 +50,43 @@
 #include "qemu/event_notifier.h"
 #include "sysemu/kvm.h"
 #include "util/event_notifier-posix.c"
+#include "hw/boards.h"
+#include "include/qemu/log.h"
 
 static void pci_proxy_dev_realize(PCIDevice *dev, Error **errp);
+static void setup_irqfd(PCIProxyDev *dev);
+
+static void proxy_ready(PCIDevice *dev)
+{
+    PCIProxyDev *pdev = PCI_PROXY_DEV(dev);
+
+    setup_irqfd(pdev);
+}
+
+static void set_remote_opts(PCIDevice *dev, QDict *qdict, unsigned int cmd)
+{
+    QString *qstr;
+    MPQemuMsg msg;
+    const char *str;
+    PCIProxyDev *pdev;
+
+    pdev = PCI_PROXY_DEV(dev);
+
+    qstr = qobject_to_json(QOBJECT(qdict));
+    str = qstring_get_str(qstr);
+
+    memset(&msg, 0, sizeof(MPQemuMsg));
+
+    msg.data2 = (uint8_t *)str;
+    msg.cmd = cmd;
+    msg.bytestream = 1;
+    msg.size = qstring_get_length(qstr) + 1;
+    msg.num_fds = 0;
+
+    mpqemu_msg_send(pdev->mpqemu_link, &msg, pdev->mpqemu_link->com);
+
+    return;
+}
 
 static int add_argv(char *command_str, char **argv, int argc)
 {
@@ -325,7 +360,6 @@ static void init_proxy(PCIDevice *dev, char *command, bool need_spawn, Error **e
                         pdev->socket);
 
     configure_memory_sync(pdev->sync, pdev->mpqemu_link);
-    setup_irqfd(pdev);
 }
 
 static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
@@ -357,6 +391,9 @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
     dev->get_proxy_sock = get_proxy_sock;
     dev->init_proxy = init_proxy;
     dev->sync = REMOTE_MEM_SYNC(object_new(TYPE_MEMORY_LISTENER));
+
+    dev->set_remote_opts = set_remote_opts;
+    dev->proxy_ready = proxy_ready;
 }
 
 static void send_bar_access_msg(PCIProxyDev *dev, MemoryRegion *mr,
diff --git a/include/hw/proxy/qemu-proxy.h b/include/hw/proxy/qemu-proxy.h
index 0fad7e3..80aadf9 100644
--- a/include/hw/proxy/qemu-proxy.h
+++ b/include/hw/proxy/qemu-proxy.h
@@ -28,6 +28,8 @@
 #include "io/mpqemu-link.h"
 #include "hw/proxy/memory-sync.h"
 #include "qemu/event_notifier.h"
+#include "hw/pci/pci.h"
+#include "block/qdict.h"
 
 #define TYPE_PCI_PROXY_DEV "pci-proxy-dev"
 
diff --git a/include/io/mpqemu-link.h b/include/io/mpqemu-link.h
index 1885ad7..3145b0e 100644
--- a/include/io/mpqemu-link.h
+++ b/include/io/mpqemu-link.h
@@ -68,6 +68,10 @@ typedef enum {
     BAR_WRITE,
     BAR_READ,
     SET_IRQFD,
+    DEV_OPTS,
+    DRIVE_OPTS,
+    DEVICE_ADD,
+    DEVICE_DEL,
     MAX,
 } mpqemu_cmd_t;
 
-- 
1.8.3.1




* [RFC v4 PATCH 20/49] multi-process: add qdev_proxy_add to create proxy devices
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (18 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 19/49] multi-process: configure remote side devices Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-11-21 12:16   ` Stefan Hajnoczi
  2019-10-24  9:09 ` [RFC v4 PATCH 21/49] multi-process: remote: add setup_devices and setup_drive msg processing Jagannathan Raman
                   ` (33 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Proxy devices are created while the command line options are parsed.
The parsed options are sent to the remote process as messages
containing JSON strings.

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 v1 -> v2:
   - parse socket and command suboptions of drive/device commands

 hw/proxy/qemu-proxy.c         |   3 +-
 include/hw/proxy/qemu-proxy.h |   7 ++
 include/monitor/qdev.h        |  25 +++++
 qdev-monitor.c                | 254 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 288 insertions(+), 1 deletion(-)

diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
index 3b84055..fc1c731 100644
--- a/hw/proxy/qemu-proxy.c
+++ b/hw/proxy/qemu-proxy.c
@@ -337,7 +337,8 @@ static void init_proxy(PCIDevice *dev, char *command, bool need_spawn, Error **e
 
     if (!pdev->managed) {
         if (need_spawn) {
-            if (!remote_spawn(pdev, command, &local_error)) {
+            if (remote_spawn(pdev, command, &local_error)) {
+                fprintf(stderr, "remote spawn failed\n");
                 return;
             }
         }
diff --git a/include/hw/proxy/qemu-proxy.h b/include/hw/proxy/qemu-proxy.h
index 80aadf9..ac61a9b 100644
--- a/include/hw/proxy/qemu-proxy.h
+++ b/include/hw/proxy/qemu-proxy.h
@@ -99,6 +99,13 @@ typedef struct PCIProxyDevClass {
 
 int remote_spawn(PCIProxyDev *pdev, const char *command, Error **errp);
 
+typedef struct PCIProxyDevList {
+    QLIST_HEAD(, PCIProxyDev) devices;
+} proxy_dev_list_t;
+
+extern QemuMutex proxy_list_lock;
+extern proxy_dev_list_t proxy_dev_list;
+
 void proxy_default_bar_write(void *opaque, hwaddr addr, uint64_t val,
                              unsigned size);
 
diff --git a/include/monitor/qdev.h b/include/monitor/qdev.h
index eaa947d..0bc355a 100644
--- a/include/monitor/qdev.h
+++ b/include/monitor/qdev.h
@@ -1,13 +1,38 @@
 #ifndef MONITOR_QDEV_H
 #define MONITOR_QDEV_H
 
+#include "hw/proxy/qemu-proxy.h"
+
 /*** monitor commands ***/
 
 void hmp_info_qtree(Monitor *mon, const QDict *qdict);
 void hmp_info_qdm(Monitor *mon, const QDict *qdict);
 void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp);
 
+DeviceState *qdev_remote_add(QemuOpts *opts, bool device, Error **errp);
+void qdev_proxy_fire(void);
+
 int qdev_device_help(QemuOpts *opts);
+DeviceState *qdev_proxy_add(const char *rid, const char *id, char *bus,
+                            char *command, int rsocket, bool managed,
+                            Error **errp);
+
+struct remote_process {
+    int rid;
+    int remote_pid;
+    unsigned int type;
+    int socket;
+    char *command;
+    QemuOpts *opts;
+
+    QLIST_ENTRY(remote_process) next;
+};
+
+void remote_process_register(struct remote_process *p);
+
+struct remote_process *get_remote_process_type(unsigned int type);
+struct remote_process *get_remote_process_rid(unsigned int rid);
+
 DeviceState *qdev_device_add(QemuOpts *opts, Error **errp);
 void qdev_set_id(DeviceState *dev, const char *id);
 
diff --git a/qdev-monitor.c b/qdev-monitor.c
index 148df9c..eeff43e 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -35,6 +35,17 @@
 #include "sysemu/block-backend.h"
 #include "sysemu/sysemu.h"
 #include "migration/misc.h"
+#include "hw/boards.h"
+#include "hw/proxy/qemu-proxy.h"
+#include "qapi/qmp/qjson.h"
+#include "qapi/qmp/qstring.h"
+#include "sysemu/sysemu.h"
+#include "hw/proxy/proxy-lsi53c895a.h"
+#include "include/qemu/cutils.h"
+#include "include/qemu/log.h"
+#include "qapi/qmp/qlist.h"
+#include "hw/proxy/qemu-proxy.h"
+#include "io/mpqemu-link.h"
 
 /*
  * Aliases were a bad idea from the start.  Let's keep them
@@ -47,6 +58,8 @@ typedef struct QDevAlias
     uint32_t arch_mask;
 } QDevAlias;
 
+proxy_dev_list_t proxy_dev_list;
+QemuMutex proxy_list_lock;
 /* Please keep this table sorted by typename. */
 static const QDevAlias qdev_alias_table[] = {
     { "e1000", "e1000-82540em" },
@@ -562,6 +575,247 @@ void qdev_set_id(DeviceState *dev, const char *id)
     }
 }
 
+static QLIST_HEAD(, remote_process) remote_processes;
+
+void remote_process_register(struct remote_process *p)
+{
+    QLIST_INSERT_HEAD(&remote_processes, p, next);
+}
+
+struct remote_process *get_remote_process_rid(unsigned int rid)
+{
+    struct remote_process *p;
+
+    QLIST_FOREACH(p, &remote_processes, next) {
+        if (rid == p->rid) {
+            return p;
+        }
+    }
+    return NULL;
+}
+
+struct remote_process *get_remote_process_type(unsigned int type)
+{
+    struct remote_process *p;
+
+    QLIST_FOREACH(p, &remote_processes, next) {
+        if (type == p->type) {
+            return p;
+        }
+    }
+    return NULL;
+}
+
+#if defined(CONFIG_MPQEMU)
+
+static PCIProxyDev *get_proxy_object_rid(const char *rid)
+{
+    PCIProxyDev *entry;
+    if (!proxy_list_lock.initialized) {
+        QLIST_INIT(&proxy_dev_list.devices);
+        qemu_mutex_init(&proxy_list_lock);
+    }
+
+    qemu_mutex_lock(&proxy_list_lock);
+    QLIST_FOREACH(entry, &proxy_dev_list.devices, next) {
+        if (strncmp(entry->rid, rid, strlen(entry->rid)) == 0) {
+            qemu_mutex_unlock(&proxy_list_lock);
+            return entry;
+        }
+    }
+    qemu_mutex_unlock(&proxy_list_lock);
+
+    return NULL;
+}
+
+#define MAX_RID_LENGTH 10
+void qdev_proxy_fire(void)
+{
+    PCIProxyDev *entry;
+
+    QLIST_FOREACH(entry, &proxy_dev_list.devices, next) {
+        if (entry->proxy_ready) {
+            entry->proxy_ready(PCI_DEVICE(entry));
+        }
+    }
+}
+
+DeviceState *qdev_proxy_add(const char *rid, const char *id, char *bus,
+                            char *command, int rsocket, bool managed,
+                            Error **errp)
+{
+    DeviceState *ds;
+    PCIProxyDev *pdev, *old_pdev;
+    QemuOpts *proxy_opts;
+    const char *proxy_type;
+    Error *local_err = NULL;
+    QDict *qdict;
+    const char *str;
+    bool need_spawn = false;
+    bool remote_exists = false;
+
+    if (strlen(rid) > MAX_RID_LENGTH) {
+        error_setg(errp, "rid %s is too long.", rid);
+        return NULL;
+    }
+
+    old_pdev = get_proxy_object_rid(rid);
+    if (old_pdev) {
+        remote_exists = true;
+        if (old_pdev->dev_id) {
+            if (id) {
+                if (strncmp(id, old_pdev->dev_id,
+                            strlen(old_pdev->dev_id)) == 0) {
+                    return DEVICE(old_pdev);
+                }
+            } else {
+            /* check if device belongs to this proxy, use bus */
+                if (bus) {
+                    if (strncmp(bus, old_pdev->dev_id,
+                                strlen(old_pdev->dev_id)) == 0) {
+                        return DEVICE(old_pdev);
+                    }
+                }
+            }
+        }
+    }
+
+    proxy_opts = qemu_opts_create(&qemu_device_opts, NULL, 0,
+                                  errp);
+
+    /* TODO: remove hardcoded type and add appropriate type identification. */
+    proxy_type = TYPE_PROXY_LSI53C895A;
+
+    qemu_opts_set_id(proxy_opts, (char *)rid);
+    qemu_opt_set(proxy_opts, "driver", proxy_type, &local_err);
+
+    qdict = qemu_opts_to_qdict(proxy_opts, NULL);
+    str = qstring_get_str(qobject_to_json(QOBJECT(qdict)));
+
+    ds = qdev_device_add(proxy_opts, &local_err);
+    if (!ds) {
+        error_setg(errp, "Could not create proxy device"
+                      " with opts %s.", str);
+        qemu_opts_del(proxy_opts);
+        return NULL;
+    }
+    qdev_set_id(ds, qemu_opts_id(proxy_opts));
+
+    pdev = PCI_PROXY_DEV(ds);
+    if (!pdev) {
+        error_setg(errp, "qdev_device_add failed.");
+        qemu_opts_del(proxy_opts);
+        return NULL;
+    }
+    pdev->rid = g_strdup(rid);
+    if (old_pdev) {
+        pdev->rsocket = old_pdev->rsocket;
+        pdev->socket = old_pdev->socket;
+        pdev->remote_pid = old_pdev->remote_pid;
+    } else {
+        pdev->rsocket = managed ? rsocket : -1;
+        pdev->socket = managed ? rsocket : -1;
+
+    }
+    pdev->managed = managed;
+
+    /* With no libvirt, we will need to spawn. For now, every time. */
+    if (!remote_exists) {
+        need_spawn = true;
+    }
+
+    pdev->init_proxy(PCI_DEVICE(ds), command, need_spawn, errp);
+
+    qemu_mutex_lock(&proxy_list_lock);
+    QLIST_INSERT_HEAD(&proxy_dev_list.devices, pdev, next);
+    qemu_mutex_unlock(&proxy_list_lock);
+
+    qemu_opts_del(proxy_opts);
+    return ds;
+}
+
+DeviceState *qdev_remote_add(QemuOpts *opts, bool device, Error **errp)
+{
+    PCIProxyDev *pdev = NULL;
+    DeviceState *dev;
+    const char *rid, *rsocket = NULL, *command = NULL;
+    QDict *qdict_new;
+    const char *id = NULL;
+    const char *driver = NULL;
+    const char *bus = NULL;
+
+    if (!proxy_list_lock.initialized) {
+        QLIST_INIT(&proxy_dev_list.devices);
+        qemu_mutex_init(&proxy_list_lock);
+    }
+
+    rid = qemu_opt_get(opts, "rid");
+    if (!rid) {
+        error_setg(errp, "rdevice option needs rid specified.");
+        return NULL;
+    }
+    if (device) {
+        driver = qemu_opt_get(opts, "driver");
+        /* TODO: properly identify the device class. */
+        if (strncmp(driver, "lsi", 3) == 0) {
+            id = qemu_opts_id(opts);
+            if (!id) {
+                error_setg(errp, "qdev_remote_add option needs id specified.");
+                return NULL;
+            }
+        }
+    }
+
+    rsocket = qemu_opt_get(opts, "socket");
+    if (rsocket) {
+        if (strlen(rsocket) > MAX_RID_LENGTH) {
+            error_setg(errp, "Socket number is incorrect.");
+            return NULL;
+        }
+    }
+    /*
+     * TODO: verify command with known commands and on remote end.
+     * How else can we verify the binary we launch without libvirtd support?
+     */
+    command = qemu_opt_get(opts, "command");
+    if (!rsocket && !command) {
+        error_setg(errp, "rdevice option needs socket or command specified.");
+        return NULL;
+    }
+
+    bus = qemu_opt_get(opts, "bus");
+    dev = qdev_proxy_add(rid, id, (char *)bus, (char *)command,
+                         rsocket ? atoi(rsocket) : -1,
+                         rsocket ? true : false, errp);
+    if (!dev) {
+        error_setg(errp, "qdev_proxy_add error.");
+        return NULL;
+    }
+
+    qdict_new = qemu_opts_to_qdict(opts, NULL);
+
+    if (!qdict_new) {
+        error_setg(errp, "Could not parse rdevice options.");
+        return NULL;
+    }
+
+    pdev = PCI_PROXY_DEV(dev);
+    if (!pdev->set_remote_opts) {
+        /* TODO: destroy proxy? */
+        error_setg(errp, "set_remote_opts failed.");
+        return NULL;
+    } else {
+        if (id && !pdev->dev_id) {
+            pdev->dev_id = g_strdup(id);
+        }
+        pdev->set_remote_opts(PCI_DEVICE(pdev), qdict_new,
+                              device ? DEV_OPTS : DRIVE_OPTS);
+    }
+
+    return dev;
+}
+#endif /*defined(CONFIG_MPQEMU)*/
+
 DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)
 {
     DeviceClass *dc;
-- 
1.8.3.1




* [RFC v4 PATCH 21/49] multi-process: remote: add setup_devices and setup_drive msg processing
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (19 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 20/49] multi-process: add qdev_proxy_add to create proxy devices Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 22/49] multi-process: remote: use fd for socket from parent process Jagannathan Raman
                   ` (32 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Receive the configuration messages on the remote side and build the
device objects from their JSON descriptions.
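
Before a device is created, the remote side drops the options that only
the proxy consumes. The sketch below shows that filtering step on a plain
key/value array; SketchOpt, sketch_is_proxy_only and
sketch_strip_proxy_opts are hypothetical names, and the real code works on
QemuOpts via qemu_opt_unset() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key/value pair standing in for one QemuOpts entry. */
typedef struct SketchOpt {
    const char *key;
    const char *value;
} SketchOpt;

/*
 * Keys removed in setup_device(). "bus" and "addr" are dropped only
 * because the patch uses the default PCI bus/address for now.
 */
static int sketch_is_proxy_only(const char *key)
{
    static const char *const proxy_only[] = {
        "rid", "socket", "remote", "command", "bus", "addr",
    };
    size_t i;

    for (i = 0; i < sizeof(proxy_only) / sizeof(proxy_only[0]); i++) {
        if (strcmp(key, proxy_only[i]) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Compact the array in place; returns the number of options kept. */
static size_t sketch_strip_proxy_opts(SketchOpt *opts, size_t n)
{
    size_t in, out = 0;

    for (in = 0; in < n; in++) {
        if (!sketch_is_proxy_only(opts[in].key)) {
            opts[out++] = opts[in];
        }
    }
    return out;
}
```

The drive path in the patch removes only "rid", "socket", "remote" and
"command"; a variant of the table without "bus" and "addr" would model it.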

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
---
 v1 -> v2:
   - for new command line suboptions with libvirtd support, clean
     the options before creating drives/devices
   - use default pci bus/address for now

 include/hw/qdev-core.h |   2 +
 qdev-monitor.c         |   2 +-
 remote/remote-main.c   | 231 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 234 insertions(+), 1 deletion(-)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index aa123f8..19b117d 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -357,6 +357,8 @@ BusState *qdev_get_parent_bus(DeviceState *dev);
 
 DeviceState *qdev_find_recursive(BusState *bus, const char *id);
 
+DeviceState *find_device_state(const char *id, Error **errp);
+
 /* Returns 0 to walk children, > 0 to skip walk, < 0 to terminate walk. */
 typedef int (qbus_walkerfn)(BusState *bus, void *opaque);
 typedef int (qdev_walkerfn)(DeviceState *dev, void *opaque);
diff --git a/qdev-monitor.c b/qdev-monitor.c
index eeff43e..e1d05e4 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -1025,7 +1025,7 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
     object_unref(OBJECT(dev));
 }
 
-static DeviceState *find_device_state(const char *id, Error **errp)
+DeviceState *find_device_state(const char *id, Error **errp)
 {
     Object *obj;
 
diff --git a/remote/remote-main.c b/remote/remote-main.c
index cede97c..5b3ffd8 100644
--- a/remote/remote-main.c
+++ b/remote/remote-main.c
@@ -49,6 +49,21 @@
 #include "exec/memattrs.h"
 #include "exec/address-spaces.h"
 #include "remote/iohub.h"
+#include "qapi/qmp/qjson.h"
+#include "qapi/qmp/qobject.h"
+#include "qemu/option.h"
+#include "qemu/config-file.h"
+#include "monitor/qdev.h"
+#include "qapi/qmp/qdict.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/blockdev.h"
+#include "block/block.h"
+#include "qapi/qmp/qstring.h"
+#include "hw/qdev-properties.h"
+#include "hw/scsi/scsi.h"
+#include "block/qdict.h"
+#include "qapi/qmp/qlist.h"
+#include "qemu/log.h"
 
 static MPQemuLinkState *mpqemu_link;
 PCIDevice *remote_pci_dev;
@@ -139,6 +154,200 @@ fail:
     PUT_REMOTE_WAIT(wait);
 }
 
+static void process_device_add_msg(MPQemuMsg *msg)
+{
+    Error *local_err = NULL;
+    const char *json = (const char *)msg->data2;
+    int wait = msg->fds[0];
+    QObject *qobj = NULL;
+    QDict *qdict = NULL;
+    QemuOpts *opts = NULL;
+
+    qobj = qobject_from_json(json, &local_err);
+    if (local_err) {
+        goto fail;
+    }
+
+    qdict = qobject_to(QDict, qobj);
+    assert(qdict);
+
+    opts = qemu_opts_from_qdict(qemu_find_opts("device"), qdict, &local_err);
+    if (local_err) {
+        goto fail;
+    }
+
+    (void)qdev_device_add(opts, &local_err);
+    if (local_err) {
+        goto fail;
+    }
+
+fail:
+    if (local_err) {
+        error_report_err(local_err);
+        /* TODO: communicate the exact error message to proxy */
+    }
+
+    notify_proxy(wait, 1);
+
+    PUT_REMOTE_WAIT(wait);
+}
+
+static void process_device_del_msg(MPQemuMsg *msg)
+{
+    Error *local_err = NULL;
+    DeviceState *dev = NULL;
+    const char *json = (const char *)msg->data2;
+    int wait = msg->fds[0];
+    QObject *qobj = NULL;
+    QDict *qdict = NULL;
+    const char *id;
+
+    qobj = qobject_from_json(json, &local_err);
+    if (local_err) {
+        goto fail;
+    }
+
+    qdict = qobject_to(QDict, qobj);
+    assert(qdict);
+
+    id = qdict_get_try_str(qdict, "id");
+    assert(id);
+
+    dev = find_device_state(id, &local_err);
+    if (local_err) {
+        goto fail;
+    }
+
+    if (dev) {
+        qdev_unplug(dev, &local_err);
+    }
+
+fail:
+    if (local_err) {
+        error_report_err(local_err);
+        /* TODO: communicate the exact error message to proxy */
+    }
+
+    notify_proxy(wait, 1);
+
+    PUT_REMOTE_WAIT(wait);
+}
+
+static int init_drive(QDict *rqdict, Error **errp)
+{
+    QemuOpts *opts;
+    Error *local_error = NULL;
+
+    if (rqdict != NULL && qdict_size(rqdict) > 0) {
+        opts = qemu_opts_from_qdict(&qemu_drive_opts,
+                                    rqdict, &local_error);
+        if (!opts) {
+            error_propagate(errp, local_error);
+            return -EINVAL;
+        }
+    } else {
+        return -EINVAL;
+    }
+
+    qemu_opt_unset(opts, "rid");
+    qemu_opt_unset(opts, "socket");
+    qemu_opt_unset(opts, "remote");
+    qemu_opt_unset(opts, "command");
+
+    if (drive_new(opts, IF_IDE, &local_error) == NULL) {
+        error_propagate(errp, local_error);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static int setup_drive(MPQemuMsg *msg, Error **errp)
+{
+    QObject *obj;
+    QDict *qdict;
+    QString *qstr;
+    Error *local_error = NULL;
+    int rc = -EINVAL;
+
+    if (!msg->data2) {
+        return rc;
+    }
+
+    qstr = qstring_from_str((char *)msg->data2);
+    obj = qobject_from_json(qstring_get_str(qstr), &local_error);
+    if (!obj) {
+        error_propagate(errp, local_error);
+        return rc;
+    }
+
+    qdict = qobject_to(QDict, obj);
+    if (!qdict) {
+        return rc;
+    }
+
+    if (init_drive(qdict, &local_error)) {
+        error_setg(errp, "init_drive failed in setup_drive.");
+        return rc;
+    }
+
+    return 0;
+}
+
+static int setup_device(MPQemuMsg *msg, Error **errp)
+{
+    QObject *obj;
+    QDict *qdict;
+    QString *qstr;
+    QemuOpts *opts;
+    DeviceState *dev = NULL;
+    int rc = -EINVAL;
+    Error *local_error = NULL;
+
+    if (!msg->data2) {
+        return rc;
+    }
+
+    qstr = qstring_from_str((char *)msg->data2);
+    obj = qobject_from_json(qstring_get_str(qstr), &local_error);
+    if (!obj) {
+        error_setg(errp, "Could not get object!");
+        return rc;
+    }
+
+    qdict = qobject_to(QDict, obj);
+    if (!qdict) {
+        return rc;
+    }
+
+    g_assert(qdict_size(qdict) > 1);
+
+    opts = qemu_opts_from_qdict(&qemu_device_opts, qdict, &local_error);
+    qemu_opt_unset(opts, "rid");
+    qemu_opt_unset(opts, "socket");
+    qemu_opt_unset(opts, "remote");
+    qemu_opt_unset(opts, "command");
+    /*
+     * TODO: use the bus and addr from the device options. For now
+     * we use default value.
+     */
+    qemu_opt_unset(opts, "bus");
+    qemu_opt_unset(opts, "addr");
+
+    dev = qdev_device_add(opts, &local_error);
+    if (!dev) {
+        error_setg(errp, "Could not add device %s.",
+                   qstring_get_str(qobject_to_json(QOBJECT(qdict))));
+        return rc;
+    }
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+        remote_pci_dev = PCI_DEVICE(dev);
+    }
+    qemu_opts_del(opts);
+
+    return 0;
+}
+
 static void process_msg(GIOCondition cond, MPQemuChannel *chan)
 {
     MPQemuMsg *msg = NULL;
@@ -184,11 +393,33 @@ static void process_msg(GIOCondition cond, MPQemuChannel *chan)
          */
         remote_sysmem_reconfig(msg, &err);
         if (err) {
+            error_report_err(err);
             goto finalize_loop;
         }
         break;
     case SET_IRQFD:
         process_set_irqfd_msg(remote_pci_dev, msg);
+        qdev_machine_creation_done();
+        qemu_mutex_lock_iothread();
+        qemu_run_machine_init_done_notifiers();
+        qemu_mutex_unlock_iothread();
+
+        break;
+    case DRIVE_OPTS:
+        if (setup_drive(msg, &err)) {
+            error_report_err(err);
+        }
+        break;
+    case DEV_OPTS:
+        if (setup_device(msg, &err)) {
+            error_report_err(err);
+        }
+        break;
+    case DEVICE_ADD:
+        process_device_add_msg(msg);
+        break;
+    case DEVICE_DEL:
+        process_device_del_msg(msg);
         break;
     default:
         error_setg(&err, "Unknown command");
-- 
1.8.3.1




* [RFC v4 PATCH 22/49] multi-process: remote: use fd for socket from parent process
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (20 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 21/49] multi-process: remote: add setup_devices and setup_drive msg processing Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 23/49] multi-process: remote: add create_done condition Jagannathan Raman
                   ` (31 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
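Reader's note: the hunk below stops using STDIN_FILENO for the channel
and instead takes a descriptor number from argv[1], parsed with
qemu_parse_fd(). The following is a minimal standalone sketch of that
kind of parsing, not the QEMU implementation. parse_fd_arg() is a
hypothetical name, and its exact rejection rules (empty string, trailing
characters, negative or out-of-range values) are assumptions.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a non-negative decimal file descriptor
 * number from a string. Returns the descriptor, or -1 on any error.
 */
int parse_fd_arg(const char *s)
{
    char *end = NULL;
    long val;

    if (s == NULL || *s == '\0') {
        return -1;              /* nothing to parse */
    }

    errno = 0;
    val = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0') {
        return -1;              /* overflow or trailing garbage */
    }
    if (val < 0 || val > INT_MAX) {
        return -1;              /* not a plausible descriptor */
    }

    return (int)val;
}
```

A caller would presumably check argc before reading argv[1] and treat a
-1 result as a fatal startup error, as the hunk below does.
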
 remote/remote-main.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/remote/remote-main.c b/remote/remote-main.c
index 5b3ffd8..cb2829e 100644
--- a/remote/remote-main.c
+++ b/remote/remote-main.c
@@ -64,6 +64,7 @@
 #include "block/qdict.h"
 #include "qapi/qmp/qlist.h"
 #include "qemu/log.h"
+#include "qemu/cutils.h"
 
 static MPQemuLinkState *mpqemu_link;
 PCIDevice *remote_pci_dev;
@@ -440,6 +441,7 @@ finalize_loop:
 int main(int argc, char *argv[])
 {
     Error *err = NULL;
+    int fd = -1;
 
     module_call_init(MODULE_INIT_QOM);
 
@@ -462,7 +464,13 @@ int main(int argc, char *argv[])
         return -1;
     }
 
-    mpqemu_init_channel(mpqemu_link, &mpqemu_link->com, STDIN_FILENO);
+    fd = qemu_parse_fd(argv[1]);
+    if (fd == -1) {
+        printf("Failed to parse fd for remote process.\n");
+        return -EINVAL;
+    }
+
+    mpqemu_init_channel(mpqemu_link, &mpqemu_link->com, fd);
     mpqemu_link_set_callback(mpqemu_link, process_msg);
 
     mpqemu_start_coms(mpqemu_link);
-- 
1.8.3.1




* [RFC v4 PATCH 23/49] multi-process: remote: add create_done condition
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (21 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 22/49] multi-process: remote: use fd for socket from parent process Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 24/49] multi-process: add processing of remote drive and device command line Jagannathan Raman
                   ` (30 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Do not allow the BAR/MMIO handlers and IRQ setup to run before
the configuration of the devices completes.

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
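Reader's note: the gating described above can be sketched in isolation
as follows. This is an illustration only, not the code in this patch:
the enum, the struct and dispatch_msg() are hypothetical names. As in
the hunk below, a guarded message that arrives too early is simply
skipped, and the flag is set once the SET_IRQFD step has run.

```c
#include <stdbool.h>

/* Hypothetical subset of the message commands. */
enum msg_cmd { MSG_CONF_WRITE, MSG_BAR_WRITE, MSG_SET_IRQFD };

struct remote_state {
    bool create_done;   /* set once device setup has finished */
    int handled;        /* number of guarded messages acted on */
};

/* Returns true if the message was acted on, false if it was skipped. */
bool dispatch_msg(struct remote_state *st, enum msg_cmd cmd)
{
    switch (cmd) {
    case MSG_CONF_WRITE:
    case MSG_BAR_WRITE:
        if (!st->create_done) {
            return false;       /* too early: device not configured yet */
        }
        st->handled++;
        return true;
    case MSG_SET_IRQFD:
        st->create_done = true; /* last setup step opens the gate */
        return true;
    }
    return false;
}
```

One design point worth noting: a skipped message produces no reply in
this sketch, so a sender that waits for a response would need its own
timeout or an explicit error path.
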
 remote/remote-main.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/remote/remote-main.c b/remote/remote-main.c
index cb2829e..80d8c02 100644
--- a/remote/remote-main.c
+++ b/remote/remote-main.c
@@ -68,6 +68,7 @@
 
 static MPQemuLinkState *mpqemu_link;
 PCIDevice *remote_pci_dev;
+bool create_done;
 
 static void process_config_write(MPQemuMsg *msg)
 {
@@ -370,21 +371,31 @@ static void process_msg(GIOCondition cond, MPQemuChannel *chan)
     case INIT:
         break;
     case CONF_WRITE:
-        process_config_write(msg);
+        if (create_done) {
+            process_config_write(msg);
+        }
+
         break;
     case CONF_READ:
-        process_config_read(msg);
+        if (create_done) {
+            process_config_read(msg);
+        }
+
         break;
     case BAR_WRITE:
-        process_bar_write(msg, &err);
-        if (err) {
-            goto finalize_loop;
+        if (create_done) {
+            process_bar_write(msg, &err);
+            if (err) {
+                error_report_err(err);
+            }
         }
         break;
     case BAR_READ:
-        process_bar_read(msg, &err);
-        if (err) {
-            goto finalize_loop;
+        if (create_done) {
+            process_bar_read(msg, &err);
+            if (err) {
+                error_report_err(err);
+            }
         }
         break;
     case SYNC_SYSMEM:
@@ -404,7 +415,7 @@ static void process_msg(GIOCondition cond, MPQemuChannel *chan)
         qemu_mutex_lock_iothread();
         qemu_run_machine_init_done_notifiers();
         qemu_mutex_unlock_iothread();
-
+        create_done = true;
         break;
     case DRIVE_OPTS:
         if (setup_drive(msg, &err)) {
-- 
1.8.3.1




* [RFC v4 PATCH 24/49] multi-process: add processing of remote drive and device command line
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (22 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 23/49] multi-process: remote: add create_done condition Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 25/49] multi-process: Introduce build flags to separate remote process code Jagannathan Raman
                   ` (29 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Add processing of the drive and device command line options.
After the remote devices are created along with their proxies,
signal the proxies to finish the configuration steps.

Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 v1 -> v2:
   - change command line option for remote process drive/device to
     use existing -drive/-device options
   - process drive and device options only after non-remote devices
     and drives are added
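
Reader's note: the "non-remote first, remote afterwards" ordering can be
sketched as a small classification step. This is a hedged illustration,
not the vl.c code: struct drive_opt, enum action and classify_drive()
are hypothetical names. How and where the pass counter is advanced is
not visible in the hunks of this patch, so the sketch takes it as a
parameter.

```c
#include <stdbool.h>

/* What to do with one drive entry during a given pass. */
enum action { DRIVE_SKIP, DRIVE_ADD_LOCAL, DRIVE_ADD_REMOTE };

/* Hypothetical stand-in for a parsed -drive entry. */
struct drive_opt {
    const char *id;
    bool remote;    /* true if the entry carries a "remote" option */
};

/*
 * Pass 0 creates only local drives; any later pass creates only
 * remote ones. Everything else is skipped for that pass.
 */
enum action classify_drive(const struct drive_opt *opt, int pass)
{
    if (pass != 0 && opt->remote) {
        return DRIVE_ADD_REMOTE;
    }
    if (pass == 0 && !opt->remote) {
        return DRIVE_ADD_LOCAL;
    }
    return DRIVE_SKIP;
}
```

Walking the same option list twice with this predicate visits every
entry exactly once for creation, with all local drives handled before
any remote one.
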

 vl.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/vl.c b/vl.c
index 4489cfb..b19c57b 100644
--- a/vl.c
+++ b/vl.c
@@ -35,6 +35,11 @@
 #include "sysemu/runstate.h"
 #include "sysemu/seccomp.h"
 #include "sysemu/tcg.h"
+#include "qapi/qmp/qdict.h"
+#include "block/qdict.h"
+#include "qapi/qmp/qstring.h"
+#include "qapi/qmp/qjson.h"
+#include "qapi/qmp/qlist.h"
 
 #ifdef CONFIG_SDL
 #if defined(__APPLE__) || defined(main)
@@ -1136,11 +1141,45 @@ static int cleanup_add_fd(void *opaque, QemuOpts *opts, Error **errp)
 #define MTD_OPTS ""
 #define SD_OPTS ""
 
+#if defined(CONFIG_MPQEMU)
+static int rdrive_init_func(void *opaque, QemuOpts *opts, Error **errp)
+{
+    DeviceState *dev;
+
+    dev = qdev_remote_add(opts, false /* this is drive */, errp);
+    if (!dev) {
+        error_setg(errp, "qdev_remote_add failed for drive.");
+        return -1;
+    }
+    object_unref(OBJECT(dev));
+    return 0;
+}
+#endif
+
+#if defined(CONFIG_MPQEMU)
+static int pass;
+#endif
+
 static int drive_init_func(void *opaque, QemuOpts *opts, Error **errp)
 {
     BlockInterfaceType *block_default_type = opaque;
 
+#if defined(CONFIG_MPQEMU)
+    const char *remote;
+
+    remote = qemu_opt_get(opts, "remote");
+    if (pass && remote) {
+        return rdrive_init_func(opaque, opts, errp);
+    } else {
+        if (!remote && !pass) {
+            drive_new(opts, *block_default_type, errp);
+        }
+    }
+
+    return 0;
+#else
     return drive_new(opts, *block_default_type, errp) == NULL;
+#endif
 }
 
 static int drive_enable_snapshot(void *opaque, QemuOpts *opts, Error **errp)
@@ -2199,10 +2238,35 @@ static int device_help_func(void *opaque, QemuOpts *opts, Error **errp)
     return qdev_device_help(opts);
 }
 
+#if defined(CONFIG_MPQEMU)
+static int rdevice_init_func(void *opaque, QemuOpts *opts, Error **errp)
+{
+    DeviceState *dev;
+
+    dev = qdev_remote_add(opts, true /* this is device */, errp);
+    if (!dev) {
+        error_setg(errp, "qdev_remote_add failed for device.");
+        return -1;
+    }
+    object_unref(OBJECT(dev));
+    return 0;
+}
+#endif
+
 static int device_init_func(void *opaque, QemuOpts *opts, Error **errp)
 {
     DeviceState *dev;
 
+#if defined(CONFIG_MPQEMU)
+    const char *remote;
+
+    remote = qemu_opt_get(opts, "remote");
+    if (remote) {
+        /* This will be a remote process */
+        return rdevice_init_func(opaque, opts, errp);
+    }
+#endif
+
     dev = qdev_device_add(opts, errp);
     if (!dev) {
         return -1;
@@ -4348,6 +4412,17 @@ int main(int argc, char **argv, char **envp)
     /* Check if IGD GFX passthrough. */
     igd_gfx_passthru();
 
+#if defined(CONFIG_MPQEMU)
+    /*
+     * Parse the list for remote drives here as we launch PCIProxyDev here and
+     * need PCI host initialized. As a TODO: could defer init of PCIProxyDev instead.
+     */
+    if (qemu_opts_foreach(qemu_find_opts("drive"), drive_init_func,
+                          &machine_class->block_default_type, &error_fatal)) {
+        exit(0);
+    }
+#endif
+
     /* init generic devices */
     rom_set_order_override(FW_CFG_ORDER_OVERRIDE_DEVICE);
     qemu_opts_foreach(qemu_find_opts("device"),
@@ -4405,6 +4480,9 @@ int main(int argc, char **argv, char **envp)
     qemu_register_reset(qbus_reset_all_fn, sysbus_get_default());
     qemu_run_machine_init_done_notifiers();
 
+#if defined(CONFIG_MPQEMU)
+    qdev_proxy_fire();
+#endif
     if (rom_check_and_register_reset() != 0) {
         error_report("rom check and register reset failed");
         exit(1);
-- 
1.8.3.1




* [RFC v4 PATCH 25/49] multi-process: Introduce build flags to separate remote process code
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (23 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 24/49] multi-process: add processing of remote drive and device command line Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 26/49] multi-process: refractor vl.c code to re-use in remote Jagannathan Raman
                   ` (28 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

Introduce SCSI_PROCESS & REMOTE_PROCESS build flags to separate
code that applies only to remote processes.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 New patch in v3
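
Reader's note: on the C side, a flag such as -DREMOTE_PROCESS is
normally consumed with a preprocessor conditional. The snippet below is
a minimal sketch of that pattern under the assumption that the flag is
only ever defined or undefined; built_for_remote_process() is a
hypothetical function and does not appear in this series.

```c
#include <stdbool.h>

/*
 * Reports whether this translation unit was compiled with
 * -DREMOTE_PROCESS, i.e. as part of the remote process binary.
 */
bool built_for_remote_process(void)
{
#ifdef REMOTE_PROCESS
    return true;
#else
    return false;
#endif
}
```

Because the flag is attached to one link target in Makefile.target, the
same source file can be compiled both ways in a single build, once for
the main binary and once for the remote one.
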

 Makefile.target | 4 ++++
 rules.mak       | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/Makefile.target b/Makefile.target
index f16b74a..0ca40f1 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -255,6 +255,10 @@ ifdef CONFIG_DARWIN
 	$(call quiet-command,SetFile -a C $@,"SETFILE","$(TARGET_DIR)$@")
 endif
 
+ifdef CONFIG_MPQEMU
+$(SCSI_DEV_BUILD): REMOTE_FLAGS = -DREMOTE_PROCESS -DSCSI_PROCESS
+endif
+
 $(SCSI_DEV_BUILD): $(all-remote-lsi-obj-y) $(COMMON_LDADDS)
 	$(call LINK, $(filter-out %.mak, $^))
 ifdef CONFIG_DARWIN
diff --git a/rules.mak b/rules.mak
index 967295d..22e0c36 100644
--- a/rules.mak
+++ b/rules.mak
@@ -67,7 +67,7 @@ expand-objs = $(strip $(sort $(filter %.o,$1)) \
 
 %.o: %.c
 	$(call quiet-command,$(CC) $(QEMU_LOCAL_INCLUDES) $(QEMU_INCLUDES) \
-	       $(QEMU_CFLAGS) $(QEMU_DGFLAGS) $(CFLAGS) $($@-cflags) \
+	       $(QEMU_CFLAGS) $(QEMU_DGFLAGS) $(CFLAGS) $($@-cflags) $(REMOTE_FLAGS) \
 	       -c -o $@ $<,"CC","$(TARGET_DIR)$@")
 %.o: %.rc
 	$(call quiet-command,$(WINDRES) -I. -o $@ $<,"RC","$(TARGET_DIR)$@")
-- 
1.8.3.1




* [RFC v4 PATCH 26/49] multi-process: refractor vl.c code to re-use in remote
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (24 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 25/49] multi-process: Introduce build flags to separate remote process code Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 27/49] multi-process: add remote option Jagannathan Raman
                   ` (27 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
---
 New patch in v3
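
Reader's note: lookup_opt(), which this patch moves into vl-parse.c, is
a table-driven option matcher. The sketch below shows the same idea in
a self-contained form. It is not the QEMU function: find_opt(), struct
opt_desc, OPT_HAS_ARG and the three table entries are hypothetical, and
unlike lookup_opt() it returns NULL on error instead of calling exit().

```c
#include <stddef.h>
#include <string.h>

#define OPT_HAS_ARG 0x0001

struct opt_desc {
    const char *name;
    int flags;
};

/* Hypothetical option table, terminated by a NULL name. */
static const struct opt_desc opt_table[] = {
    { "h", 0 },
    { "drive", OPT_HAS_ARG },
    { "device", OPT_HAS_ARG },
    { NULL, 0 },
};

/*
 * Match argv[*pind] against the table. On success, store the option's
 * argument (or NULL) in *parg, advance *pind past what was consumed and
 * return the table entry. On failure, return NULL and leave *pind alone.
 */
const struct opt_desc *find_opt(int argc, char **argv,
                                int *pind, const char **parg)
{
    const char *r = argv[*pind];
    const struct opt_desc *p;
    int ind = *pind + 1;

    if (r[0] != '-') {
        return NULL;                    /* not an option at all */
    }
    if (r[1] == '-') {
        r++;                            /* treat --foo the same as -foo */
    }
    for (p = opt_table; p->name; p++) {
        if (strcmp(p->name, r + 1) == 0) {
            break;
        }
    }
    if (!p->name) {
        return NULL;                    /* unknown option */
    }
    if (p->flags & OPT_HAS_ARG) {
        if (ind >= argc) {
            return NULL;                /* required argument is missing */
        }
        *parg = argv[ind++];
    } else {
        *parg = NULL;
    }
    *pind = ind;
    return p;
}
```

Returning NULL rather than exiting keeps the sketch easy to test; a
command line parser that cannot continue after a bad option may prefer
the report-and-exit behaviour of the original.
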

 Makefile.objs        |   2 +
 remote/Makefile.objs |   1 +
 vl-parse.c           | 158 +++++++++++++++++++++++++++++++++++++++++++++++++++
 vl.c                 | 152 +------------------------------------------------
 vl.h                 |  54 ++++++++++++++++++
 5 files changed, 216 insertions(+), 151 deletions(-)
 create mode 100644 vl-parse.c
 create mode 100644 vl.h

diff --git a/Makefile.objs b/Makefile.objs
index c2ac261..c23ccaa 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -104,6 +104,8 @@ qemu-seccomp.o-libs := $(SECCOMP_LIBS)
 
 common-obj-$(CONFIG_FDT) += device_tree.o
 
+common-obj-y += vl-parse.o
+
 ######################################################################
 # qapi
 
diff --git a/remote/Makefile.objs b/remote/Makefile.objs
index cbb3065..c1349ad 100644
--- a/remote/Makefile.objs
+++ b/remote/Makefile.objs
@@ -2,3 +2,4 @@ remote-pci-obj-$(CONFIG_MPQEMU) += remote-main.o
 remote-pci-obj-$(CONFIG_MPQEMU) += pcihost.o
 remote-pci-obj-$(CONFIG_MPQEMU) += machine.o
 remote-pci-obj-$(CONFIG_MPQEMU) += iohub.o
+remote-pci-obj-$(CONFIG_MPQEMU) +=../vl-parse.o
diff --git a/vl-parse.c b/vl-parse.c
new file mode 100644
index 0000000..4e2bd7c
--- /dev/null
+++ b/vl-parse.c
@@ -0,0 +1,158 @@
+/*
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qapi/error.h"
+#include "qemu/cutils.h"
+#include "qemu/error-report.h"
+#include "monitor/qdev.h"
+#include "monitor/qdev.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/runstate.h"
+#include "qemu/option.h"
+#include "qemu-options.h"
+#include "sysemu/blockdev.h"
+
+#include "chardev/char.h"
+#include "monitor/monitor.h"
+#include "qemu/config-file.h"
+
+#include "sysemu/arch_init.h"
+
+#include "vl.h"
+
+/***********************************************************/
+/* QEMU Block devices */
+
+static const QEMUOption qemu_options[] = {
+    { "h", 0, QEMU_OPTION_h, QEMU_ARCH_ALL },
+#define QEMU_OPTIONS_GENERATE_OPTIONS
+#include "qemu-options-wrapper.h"
+    { NULL },
+};
+
+const QEMUOption *lookup_opt(int argc, char **argv,
+                                    const char **poptarg, int *poptind)
+{
+    const QEMUOption *popt;
+    int optind = *poptind;
+    char *r = argv[optind];
+    const char *optarg;
+
+    loc_set_cmdline(argv, optind, 1);
+    optind++;
+    /* Treat --foo the same as -foo.  */
+    if (r[1] == '-') {
+        r++;
+    }
+    popt = qemu_options;
+    if (!popt) {
+        error_report("No valid qemu_options");
+    }
+    for (;;) {
+        if (!popt->name) {
+            error_report("invalid option");
+            exit(1);
+            popt++;
+            continue;
+        }
+        if (!strcmp(popt->name, r + 1)) {
+            break;
+        }
+        popt++;
+    }
+    if (popt->flags & HAS_ARG) {
+        if (optind >= argc) {
+            error_report("optind %d, argc %d", optind, argc);
+            error_report("requires an argument");
+            exit(1);
+        }
+        optarg = argv[optind++];
+        loc_set_cmdline(argv, optind - 2, 2);
+    } else {
+        optarg = NULL;
+    }
+
+    *poptarg = optarg;
+    *poptind = optind;
+
+    return popt;
+}
+
+int drive_init_func(void *opaque, QemuOpts *opts, Error **errp)
+{
+    BlockInterfaceType *block_default_type = opaque;
+
+    if (!drive_new(opts, *block_default_type, errp)) {
+        error_report_err(*errp);
+    }
+
+    return 0;
+}
+
+#if defined(CONFIG_MPQEMU)
+int rdrive_init_func(void *opaque, QemuOpts *opts, Error **errp)
+{
+    DeviceState *dev;
+
+    dev = qdev_remote_add(opts, false /* this is drive */, errp);
+    if (!dev) {
+        error_setg(errp, "qdev_remote_add failed for drive.");
+        return -1;
+    }
+    object_unref(OBJECT(dev));
+    return 0;
+}
+#endif
+
+#if defined(CONFIG_MPQEMU)
+int rdevice_init_func(void *opaque, QemuOpts *opts, Error **errp)
+{
+    DeviceState *dev;
+
+    dev = qdev_remote_add(opts, true /* this is device */, errp);
+    if (!dev) {
+        error_setg(errp, "qdev_remote_add failed for device.");
+        return -1;
+    }
+    return 0;
+}
+#endif
+
+int device_init_func(void *opaque, QemuOpts *opts, Error **errp)
+{
+    DeviceState *dev;
+    const char *remote = NULL;
+
+    remote = qemu_opt_get(opts, "rid");
+    if (remote) {
+        return 0;
+    }
+
+    dev = qdev_device_add(opts, errp);
+    if (!dev) {
+        return -1;
+    }
+    object_unref(OBJECT(dev));
+    return 0;
+}
diff --git a/vl.c b/vl.c
index b19c57b..3fef694 100644
--- a/vl.c
+++ b/vl.c
@@ -40,6 +40,7 @@
 #include "qapi/qmp/qstring.h"
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qlist.h"
+#include "vl.h"
 
 #ifdef CONFIG_SDL
 #if defined(__APPLE__) || defined(main)
@@ -1134,54 +1135,6 @@ static int cleanup_add_fd(void *opaque, QemuOpts *opts, Error **errp)
 /***********************************************************/
 /* QEMU Block devices */
 
-#define HD_OPTS "media=disk"
-#define CDROM_OPTS "media=cdrom"
-#define FD_OPTS ""
-#define PFLASH_OPTS ""
-#define MTD_OPTS ""
-#define SD_OPTS ""
-
-#if defined(CONFIG_MPQEMU)
-static int rdrive_init_func(void *opaque, QemuOpts *opts, Error **errp)
-{
-    DeviceState *dev;
-
-    dev = qdev_remote_add(opts, false /* this is drive */, errp);
-    if (!dev) {
-        error_setg(errp, "qdev_remote_add failed for drive.");
-        return -1;
-    }
-    object_unref(OBJECT(dev));
-    return 0;
-}
-#endif
-
-#if defined(CONFIG_MPQEMU)
-static int pass;
-#endif
-
-static int drive_init_func(void *opaque, QemuOpts *opts, Error **errp)
-{
-    BlockInterfaceType *block_default_type = opaque;
-
-#if defined(CONFIG_MPQEMU)
-    const char *remote;
-
-    remote = qemu_opt_get(opts, "remote");
-    if (pass && remote) {
-        return rdrive_init_func(opaque, opts, errp);
-    } else {
-        if (!remote && !pass) {
-            drive_new(opts, *block_default_type, errp);
-        }
-    }
-
-    return 0;
-#else
-    return drive_new(opts, *block_default_type, errp) == NULL;
-#endif
-}
-
 static int drive_enable_snapshot(void *opaque, QemuOpts *opts, Error **errp)
 {
     if (qemu_opt_get(opts, "snapshot") == NULL) {
@@ -1877,21 +1830,6 @@ static void help(int exitcode)
     exit(exitcode);
 }
 
-#define HAS_ARG 0x0001
-
-typedef struct QEMUOption {
-    const char *name;
-    int flags;
-    int index;
-    uint32_t arch_mask;
-} QEMUOption;
-
-static const QEMUOption qemu_options[] = {
-    { "h", 0, QEMU_OPTION_h, QEMU_ARCH_ALL },
-#define QEMU_OPTIONS_GENERATE_OPTIONS
-#include "qemu-options-wrapper.h"
-    { NULL },
-};
 
 typedef struct VGAInterfaceInfo {
     const char *opt_name;    /* option name */
@@ -2238,43 +2176,6 @@ static int device_help_func(void *opaque, QemuOpts *opts, Error **errp)
     return qdev_device_help(opts);
 }
 
-#if defined(CONFIG_MPQEMU)
-static int rdevice_init_func(void *opaque, QemuOpts *opts, Error **errp)
-{
-    DeviceState *dev;
-
-    dev = qdev_remote_add(opts, true /* this is device */, errp);
-    if (!dev) {
-        error_setg(errp, "qdev_remote_add failed for device.");
-        return -1;
-    }
-    object_unref(OBJECT(dev));
-    return 0;
-}
-#endif
-
-static int device_init_func(void *opaque, QemuOpts *opts, Error **errp)
-{
-    DeviceState *dev;
-
-#if defined(CONFIG_MPQEMU)
-    const char *remote;
-
-    remote = qemu_opt_get(opts, "remote");
-    if (remote) {
-        /* This will be a remote process */
-        return rdevice_init_func(opaque, opts, errp);
-    }
-#endif
-
-    dev = qdev_device_add(opts, errp);
-    if (!dev) {
-        return -1;
-    }
-    object_unref(OBJECT(dev));
-    return 0;
-}
-
 static int chardev_init_func(void *opaque, QemuOpts *opts, Error **errp)
 {
     Error *local_err = NULL;
@@ -2604,46 +2505,6 @@ static void qemu_run_machine_init_done_notifiers(void)
     notifier_list_notify(&machine_init_done_notifiers, NULL);
 }
 
-static const QEMUOption *lookup_opt(int argc, char **argv,
-                                    const char **poptarg, int *poptind)
-{
-    const QEMUOption *popt;
-    int optind = *poptind;
-    char *r = argv[optind];
-    const char *optarg;
-
-    loc_set_cmdline(argv, optind, 1);
-    optind++;
-    /* Treat --foo the same as -foo.  */
-    if (r[1] == '-')
-        r++;
-    popt = qemu_options;
-    for(;;) {
-        if (!popt->name) {
-            error_report("invalid option");
-            exit(1);
-        }
-        if (!strcmp(popt->name, r + 1))
-            break;
-        popt++;
-    }
-    if (popt->flags & HAS_ARG) {
-        if (optind >= argc) {
-            error_report("requires an argument");
-            exit(1);
-        }
-        optarg = argv[optind++];
-        loc_set_cmdline(argv, optind - 2, 2);
-    } else {
-        optarg = NULL;
-    }
-
-    *poptarg = optarg;
-    *poptind = optind;
-
-    return popt;
-}
-
 static MachineClass *select_machine(void)
 {
     GSList *machines = object_class_get_list(TYPE_MACHINE, false);
@@ -4412,17 +4273,6 @@ int main(int argc, char **argv, char **envp)
     /* Check if IGD GFX passthrough. */
     igd_gfx_passthru();
 
-#if defined(CONFIG_MPQEMU)
-    /*
-     * Parse the list for remote drives here as we launch PCIProxyDev here and
-     * need PCI host initialized. As a TODO: could defer init of PCIProxyDev instead.
-     */
-    if (qemu_opts_foreach(qemu_find_opts("drive"), drive_init_func,
-                          &machine_class->block_default_type, &error_fatal)) {
-        exit(0);
-    }
-#endif
-
     /* init generic devices */
     rom_set_order_override(FW_CFG_ORDER_OVERRIDE_DEVICE);
     qemu_opts_foreach(qemu_find_opts("device"),
diff --git a/vl.h b/vl.h
new file mode 100644
index 0000000..8c40fed
--- /dev/null
+++ b/vl.h
@@ -0,0 +1,54 @@
+/*
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef VL_H
+#define VL_H
+
+/***********************************************************/
+/* QEMU Block devices */
+
+#define HD_OPTS "media=disk"
+#define CDROM_OPTS "media=cdrom"
+#define FD_OPTS ""
+#define PFLASH_OPTS ""
+#define MTD_OPTS ""
+#define SD_OPTS ""
+
+
+#define HAS_ARG 0x0001
+typedef struct QEMUOption {
+    const char *name;
+    int flags;
+    int index;
+    uint32_t arch_mask;
+} QEMUOption;
+
+const QEMUOption *lookup_opt(int argc, char **argv,
+                                    const char **poptarg, int *poptind);
+
+int drive_init_func(void *opaque, QemuOpts *opts, Error **errp);
+int device_init_func(void *opaque, QemuOpts *opts, Error **errp);
+int rdrive_init_func(void *opaque, QemuOpts *opts, Error **errp);
+int rdevice_init_func(void *opaque, QemuOpts *opts, Error **errp);
+
+#endif /* VL_H */
+
-- 
1.8.3.1




* [RFC v4 PATCH 27/49] multi-process: add remote option
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (25 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 26/49] multi-process: refractor vl.c code to re-use in remote Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 28/49] multi-process: add remote options parser Jagannathan Raman
                   ` (26 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
---
 New patch in v3

 qemu-options.hx | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/qemu-options.hx b/qemu-options.hx
index 996b6fb..4734e8e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -27,6 +27,27 @@ STEXI
 Display version information and exit
 ETEXI
 
+DEF("remote", HAS_ARG, QEMU_OPTION_remote,
+    "-remote socket[,prop[=value][,...]]\n"
+    "                add remote process\n"
+    "                prop=value,... sets driver properties\n"
+    "                use '-remote help' to print all possible properties\n",
+    QEMU_ARCH_ALL)
+STEXI
+@table @option
+@item rid
+@findex -rid
+Remote process ID.
+@item socket
+@findex -socket
+Remote process socket.
+@item command
+@findex -command
+Remote process command.
+
+@end table
+ETEXI
+
 DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
     "-machine [type=]name[,prop[=value][,...]]\n"
     "                selects emulated machine ('-machine help' for list)\n"
-- 
1.8.3.1




* [RFC v4 PATCH 28/49] multi-process: add remote options parser
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (26 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 27/49] multi-process: add remote option Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 29/49] multi-process: add parse_cmdline in remote process Jagannathan Raman
                   ` (25 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
---
 New patch in v3
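
 The parser in this patch converts "rid" and "socket" with atoi(), which
 cannot tell a missing or malformed value from 0. As a minimal sketch of
 stricter validation (parse_nonneg_int is a hypothetical helper, not part
 of this series, and assumes only the C standard library), the
 conversion could look like this:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of the patch: convert an option string
 * to a non-negative int. Returns false for NULL, empty, malformed,
 * negative, or out-of-range input, and leaves *out untouched then.
 */
static bool parse_nonneg_int(const char *str, int *out)
{
    char *end = NULL;
    long val;

    if (str == NULL || *str == '\0') {
        return false;
    }

    errno = 0;
    val = strtol(str, &end, 10);
    if (errno != 0 || *end != '\0' || val < 0 || val > INT_MAX) {
        return false;
    }

    *out = (int)val;
    return true;
}
```

 A caller would check the return value before using the result, so that
 a NULL string from a missing option is reported as an error instead of
 being dereferenced.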

 vl.c | 117 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 117 insertions(+)

diff --git a/vl.c b/vl.c
index 3fef694..1417ff2 100644
--- a/vl.c
+++ b/vl.c
@@ -280,6 +280,28 @@ static QemuOptsList qemu_option_rom_opts = {
     },
 };
 
+static QemuOptsList qemu_remote_opts = {
+    .name = "remote",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_remote_opts.head),
+    .desc = {
+        {
+            .name = "rid",
+            .type = QEMU_OPT_NUMBER,
+            .help = "id of the remote process"
+        },{
+            .name = "socket",
+            .type = QEMU_OPT_NUMBER,
+            .help = "Socket for remote",
+        },{
+            .name = "command",
+            .type = QEMU_OPT_STRING,
+            .help = "command to run",
+        },
+        { /* end of list */ }
+    },
+};
+
+
 static QemuOptsList qemu_machine_opts = {
     .name = "machine",
     .implied_opt_name = "type",
@@ -347,6 +369,87 @@ static QemuOptsList qemu_boot_opts = {
     },
 };
 
+#if defined(CONFIG_MPQEMU)
+static int device_remote_add(void *opaque, QemuOpts *opts, Error **errp)
+{
+    unsigned int rid = *(unsigned int *)opaque;
+    const char *opt_rid = NULL;
+    struct remote_process *p = NULL;
+
+    opt_rid = qemu_opt_get(opts, "rid");
+    if (!opt_rid) {
+        return 0;
+    }
+
+    p = get_remote_process_rid(rid);
+    if (!p) {
+        return -EINVAL;
+    }
+
+    if (atoi(opt_rid) == rid) {
+        qemu_opt_set(opts, "command", p->command, errp);
+        rdevice_init_func(opaque, opts, errp);
+        qemu_opts_del(opts);
+    }
+    return 0;
+}
+
+static int parse_remote(void *opaque, QemuOpts *opts, Error **errp)
+{
+    int rid;
+    int socket;
+    char *c_sock;
+    const char *command = NULL;
+    struct remote_process r_proc;
+
+    rid = atoi(qemu_opt_get(opts, "rid"));
+    if (rid < 0) {
+        error_setg(errp, "rid is required.");
+        return -1;
+    }
+    if (get_remote_process_rid(rid)) {
+        error_setg(errp, "There is already a process with rid %d", rid);
+        goto cont_devices;
+    }
+
+    c_sock = (char *)qemu_opt_get(opts, "socket");
+    if (c_sock) {
+        socket = atoi(c_sock);
+    } else {
+        socket = -1;
+    }
+
+    command = qemu_opt_get(opts, "command");
+
+    if (socket <= STDERR_FILENO && socket != -1) {
+        socket = -1;
+    }
+
+    if (!command && socket < 0) {
+        error_setg(errp, "No valid socket or command defined for remote.");
+        return -1;
+    }
+
+    if (rid < 0) {
+        error_setg(errp, "rid option is required and must be non-negative");
+        return -1;
+    }
+    r_proc.rid = rid;
+    r_proc.socket = socket;
+    r_proc.command = g_strdup(command);
+    remote_process_register(&r_proc);
+
+ cont_devices:
+    if (qemu_opts_foreach(qemu_find_opts("device"), device_remote_add,
+                          &rid, NULL)) {
+        error_setg(errp, "Could not process some of the remote devices.");
+    }
+
+    return 0;
+}
+
+#endif
+
 static QemuOptsList qemu_add_fd_opts = {
     .name = "add-fd",
     .head = QTAILQ_HEAD_INITIALIZER(qemu_add_fd_opts.head),
@@ -2826,6 +2929,7 @@ int main(int argc, char **argv, char **envp)
     qemu_add_opts(&qemu_icount_opts);
     qemu_add_opts(&qemu_semihosting_config_opts);
     qemu_add_opts(&qemu_fw_cfg_opts);
+    qemu_add_opts(&qemu_remote_opts);
     module_call_init(MODULE_INIT_OPTS);
 
     runstate_init();
@@ -3678,6 +3782,14 @@ int main(int argc, char **argv, char **envp)
                 exit(1);
 #endif
                 break;
+            case QEMU_OPTION_remote:
+                opts = qemu_opts_parse_noisily(qemu_find_opts("remote"),
+                                               optarg, false);
+                if (!opts) {
+                    exit(1);
+                }
+                break;
+
             case QEMU_OPTION_object:
                 opts = qemu_opts_parse_noisily(qemu_find_opts("object"),
                                                optarg, true);
@@ -4278,6 +4390,11 @@ int main(int argc, char **argv, char **envp)
     qemu_opts_foreach(qemu_find_opts("device"),
                       device_init_func, NULL, &error_fatal);
 
+#ifdef CONFIG_MPQEMU
+    qemu_opts_foreach(qemu_find_opts("remote"),
+                      parse_remote, NULL, &error_fatal);
+#endif
+
     cpu_synchronize_all_post_init();
 
     rom_reset_order_override();
-- 
1.8.3.1




* [RFC v4 PATCH 29/49] multi-process: add parse_cmdline in remote process
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (27 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 28/49] multi-process: add remote options parser Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 30/49] multi-process: send heartbeat messages to remote Jagannathan Raman
                   ` (24 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
---
 New patch in v3

 remote/Makefile.objs |   1 +
 remote/remote-main.c |  11 +++++
 remote/remote-opts.c | 115 +++++++++++++++++++++++++++++++++++++++++++++++++++
 remote/remote-opts.h |  31 ++++++++++++++
 4 files changed, 158 insertions(+)
 create mode 100644 remote/remote-opts.c
 create mode 100644 remote/remote-opts.h

diff --git a/remote/Makefile.objs b/remote/Makefile.objs
index c1349ad..a677fce 100644
--- a/remote/Makefile.objs
+++ b/remote/Makefile.objs
@@ -1,4 +1,5 @@
 remote-pci-obj-$(CONFIG_MPQEMU) += remote-main.o
+remote-pci-obj-$(CONFIG_MPQEMU) += remote-opts.o
 remote-pci-obj-$(CONFIG_MPQEMU) += pcihost.o
 remote-pci-obj-$(CONFIG_MPQEMU) += machine.o
 remote-pci-obj-$(CONFIG_MPQEMU) += iohub.o
diff --git a/remote/remote-main.c b/remote/remote-main.c
index 80d8c02..729f7e9 100644
--- a/remote/remote-main.c
+++ b/remote/remote-main.c
@@ -65,6 +65,7 @@
 #include "qapi/qmp/qlist.h"
 #include "qemu/log.h"
 #include "qemu/cutils.h"
+#include "remote-opts.h"
 
 static MPQemuLinkState *mpqemu_link;
 PCIDevice *remote_pci_dev;
@@ -469,6 +470,13 @@ int main(int argc, char *argv[])
 
     current_machine = MACHINE(REMOTE_MACHINE(object_new(TYPE_REMOTE_MACHINE)));
 
+    qemu_add_opts(&qemu_device_opts);
+    qemu_add_opts(&qemu_drive_opts);
+    qemu_add_drive_opts(&qemu_legacy_drive_opts);
+    qemu_add_drive_opts(&qemu_common_drive_opts);
+    qemu_add_drive_opts(&qemu_drive_opts);
+    qemu_add_drive_opts(&bdrv_runtime_opts);
+
     mpqemu_link = mpqemu_link_create();
     if (!mpqemu_link) {
         printf("Could not create MPQemu link\n");
@@ -482,6 +490,9 @@ int main(int argc, char *argv[])
     }
 
     mpqemu_init_channel(mpqemu_link, &mpqemu_link->com, fd);
+
+    parse_cmdline(argc - 2, argv + 2, NULL);
+
     mpqemu_link_set_callback(mpqemu_link, process_msg);
 
     mpqemu_start_coms(mpqemu_link);
diff --git a/remote/remote-opts.c b/remote/remote-opts.c
new file mode 100644
index 0000000..0ebe6b1
--- /dev/null
+++ b/remote/remote-opts.c
@@ -0,0 +1,115 @@
+/*
+ * Remote device initialization
+ *
+ * Copyright 2019, Oracle and/or its affiliates.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include <stdio.h>
+#include <unistd.h>
+
+#include "qemu/module.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qemu-common.h"
+
+#include "remote/pcihost.h"
+#include "remote/machine.h"
+#include "hw/boards.h"
+#include "hw/qdev-core.h"
+#include "qemu/main-loop.h"
+#include "remote/memory.h"
+#include "io/mpqemu-link.h"
+#include "qapi/error.h"
+#include "qemu-options.h"
+#include "sysemu/arch_init.h"
+
+#include "qapi/qmp/qjson.h"
+#include "qapi/qmp/qobject.h"
+#include "qemu/option.h"
+#include "qemu/config-file.h"
+#include "monitor/qdev.h"
+#include "qapi/qmp/qdict.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/blockdev.h"
+#include "block/block.h"
+#include "remote/remote-opts.h"
+#include "include/qemu-common.h"
+
+#include "vl.h"
+/*
+ * In the remote process, we parse only a subset of options. The code
+ * is taken from vl.c for re-use in the remote command line parser.
+ */
+void parse_cmdline(int argc, char **argv, char **envp)
+{
+    int optind;
+    const char *optarg;
+    MachineClass *mc;
+
+    /* from vl.c */
+    optind = 0;
+
+    /* second pass of option parsing */
+
+    for (;;) {
+        if (optind >= argc) {
+            break;
+        }
+        if (argv[optind][0] != '-') {
+            loc_set_cmdline(argv, optind, 1);
+            drive_add(IF_DEFAULT, 0, argv[optind++], HD_OPTS);
+        } else {
+            const QEMUOption *popt;
+
+            popt = lookup_opt(argc, argv, &optarg, &optind);
+            #ifndef REMOTE_PROCESS
+            if (!(popt->arch_mask & arch_type)) {
+                error_report("Option not supported for this target, %x arch_mask, %x arch_type",
+                             popt->arch_mask, arch_type);
+                exit(1);
+            }
+            #endif
+            switch (popt->index) {
+            case QEMU_OPTION_drive:
+                if (drive_def(optarg) == NULL) {
+                    fprintf(stderr, "Could not init drive\n");
+                    exit(1);
+                }
+                break;
+            default:
+                break;
+            }
+        }
+    }
+    mc = MACHINE_GET_CLASS(current_machine);
+
+    mc->block_default_type = IF_IDE;
+    if (qemu_opts_foreach(qemu_find_opts("drive"), drive_init_func,
+                          &mc->block_default_type, &error_fatal)) {
+        /* We printed help */
+        exit(0);
+    }
+
+    return;
+}
diff --git a/remote/remote-opts.h b/remote/remote-opts.h
new file mode 100644
index 0000000..e15c29b
--- /dev/null
+++ b/remote/remote-opts.h
@@ -0,0 +1,31 @@
+/*
+ * Remote device initialization
+ *
+ * Copyright 2019, Oracle and/or its affiliates.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef REMOTE_OPTS_H
+#define REMOTE_OPTS_H
+
+void parse_cmdline(int argc, char **argv, char **envp);
+
+#endif
+
-- 
1.8.3.1




* [RFC v4 PATCH 30/49] multi-process: send heartbeat messages to remote
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (28 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 29/49] multi-process: add parse_cmdline in remote process Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-11-11 16:27   ` Stefan Hajnoczi
  2019-10-24  9:09 ` [RFC v4 PATCH 31/49] multi-process: handle heartbeat messages in remote process Jagannathan Raman
                   ` (23 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

To detect hung remote processes, the proxy periodically sends
heartbeat messages and waits for each remote process to confirm
that it is alive.

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
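 The heartbeat in this patch hands an eventfd to the remote process and
 waits on it for the reply. The definitions of wait_for_remote() and
 notify_proxy() are outside this patch, so the following is only a
 sketch of the same idea with a bounded wait, assuming Linux eventfd(2)
 and poll(2). demo_notify and demo_wait_reply are hypothetical names,
 not functions from this series:

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the remote side: post a reply value on the
 * eventfd. The value must be non-zero, since adding 0 to the eventfd
 * counter does not make it readable.
 */
static int demo_notify(int efd, uint64_t val)
{
    return write(efd, &val, sizeof(val)) == (ssize_t)sizeof(val) ? 0 : -1;
}

/*
 * Hypothetical stand-in for the proxy side: wait up to timeout_ms for
 * a reply. Returns 0 and stores the value on success, -1 on timeout or
 * error. A retry after EINTR restarts the full timeout, which is good
 * enough for a sketch.
 */
static int demo_wait_reply(int efd, int timeout_ms, uint64_t *val)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc <= 0) {
        return -1;
    }
    if (read(efd, val, sizeof(*val)) != (ssize_t)sizeof(*val)) {
        return -1;
    }
    return 0;
}
```

 With a bounded wait, a remote process that never answers shows up as a
 failed heartbeat instead of blocking the caller indefinitely.
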
 hw/proxy/qemu-proxy.c    | 101 +++++++++++++++++++++++++++++++++++++++++++++++
 include/io/mpqemu-link.h |   1 +
 2 files changed, 102 insertions(+)

diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
index fc1c731..691b991 100644
--- a/hw/proxy/qemu-proxy.c
+++ b/hw/proxy/qemu-proxy.c
@@ -53,14 +53,96 @@
 #include "hw/boards.h"
 #include "include/qemu/log.h"
 
+QEMUTimer *hb_timer;
 static void pci_proxy_dev_realize(PCIDevice *dev, Error **errp);
 static void setup_irqfd(PCIProxyDev *dev);
+static void pci_dev_exit(PCIDevice *dev);
+static void start_heartbeat_timer(void);
+static void stop_heartbeat_timer(void);
+static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx);
+static void broadcast_msg(MPQemuMsg *msg, bool need_reply);
+
+static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx)
+{
+    /* TODO: Add proper handler. */
+    printf("Child (pid %d) is dead? Signal is %d, Exit code is %d.\n",
+           siginfo->si_pid, siginfo->si_signo, siginfo->si_code);
+}
+
+static void broadcast_msg(MPQemuMsg *msg, bool need_reply)
+{
+    PCIProxyDev *entry;
+    unsigned int pid;
+    int wait;
+
+    QLIST_FOREACH(entry, &proxy_dev_list.devices, next) {
+        if (need_reply) {
+            wait = eventfd(0, EFD_NONBLOCK);
+            msg->num_fds = 1;
+            msg->fds[0] = wait;
+        }
+
+        mpqemu_msg_send(entry->mpqemu_link, msg, entry->mpqemu_link->com);
+        if (need_reply) {
+            pid = (uint32_t)wait_for_remote(wait);
+            close(wait);
+            /* TODO: Add proper handling. */
+            if (pid) {
+                need_reply = 0;
+            }
+        }
+    }
+}
+
+#define NOP_INTERVAL 1000000
+
+static void remote_ping(void *opaque)
+{
+    MPQemuMsg msg;
+
+    memset(&msg, 0, sizeof(MPQemuMsg));
+
+    msg.num_fds = 0;
+    msg.cmd = PROXY_PING;
+    msg.bytestream = 0;
+    msg.size = 0;
+
+    broadcast_msg(&msg, true);
+    timer_mod(hb_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + NOP_INTERVAL);
+
+}
+
+static void start_heartbeat_timer(void)
+{
+    hb_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
+                            remote_ping,
+                            &proxy_dev_list);
+    timer_mod(hb_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + NOP_INTERVAL);
+
+}
+
+static void stop_heartbeat_timer(void)
+{
+    timer_del(hb_timer);
+    timer_free(hb_timer);
+}
+
+static void set_sigchld_handler(void)
+{
+    struct sigaction sa_sigterm;
+    memset(&sa_sigterm, 0, sizeof(sa_sigterm));
+    sa_sigterm.sa_sigaction = childsig_handler;
+    sa_sigterm.sa_flags = SA_SIGINFO | SA_NOCLDWAIT | SA_NOCLDSTOP;
+    sigaction(SIGCHLD, &sa_sigterm, NULL);
+}
 
 static void proxy_ready(PCIDevice *dev)
 {
     PCIProxyDev *pdev = PCI_PROXY_DEV(dev);
 
     setup_irqfd(pdev);
+    set_sigchld_handler();
+    start_heartbeat_timer();
 }
 
 static void set_remote_opts(PCIDevice *dev, QDict *qdict, unsigned int cmd)
@@ -259,6 +341,7 @@ static void pci_proxy_dev_class_init(ObjectClass *klass, void *data)
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
     k->realize = pci_proxy_dev_realize;
+    k->exit = pci_dev_exit;
     k->config_read = pci_proxy_read_config;
     k->config_write = pci_proxy_write_config;
 }
@@ -397,6 +480,24 @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
     dev->proxy_ready = proxy_ready;
 }
 
+static void pci_dev_exit(PCIDevice *pdev)
+{
+    PCIProxyDev *entry, *sentry;
+    PCIProxyDev *dev = PCI_PROXY_DEV(pdev);
+
+    stop_heartbeat_timer();
+
+    QLIST_FOREACH_SAFE(entry, &proxy_dev_list.devices, next, sentry) {
+        if (entry->remote_pid == dev->remote_pid) {
+            QLIST_REMOVE(entry, next);
+        }
+    }
+
+    if (!QLIST_EMPTY(&proxy_dev_list.devices)) {
+        start_heartbeat_timer();
+    }
+}
+
 static void send_bar_access_msg(PCIProxyDev *dev, MemoryRegion *mr,
                                 bool write, hwaddr addr, uint64_t *val,
                                 unsigned size, bool memory)
diff --git a/include/io/mpqemu-link.h b/include/io/mpqemu-link.h
index 3145b0e..16a913b 100644
--- a/include/io/mpqemu-link.h
+++ b/include/io/mpqemu-link.h
@@ -72,6 +72,7 @@ typedef enum {
     DRIVE_OPTS,
     DEVICE_ADD,
     DEVICE_DEL,
+    PROXY_PING,
     MAX,
 } mpqemu_cmd_t;
 
-- 
1.8.3.1




* [RFC v4 PATCH 31/49] multi-process: handle heartbeat messages in remote process
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (29 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 30/49] multi-process: send heartbeat messages to remote Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 32/49] multi-process: Use separate MMIO communication channel Jagannathan Raman
                   ` (22 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

If the remote process is alive, it responds to the proxy's heartbeat
messages.

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 remote/remote-main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/remote/remote-main.c b/remote/remote-main.c
index 729f7e9..27e4492 100644
--- a/remote/remote-main.c
+++ b/remote/remote-main.c
@@ -355,6 +355,7 @@ static void process_msg(GIOCondition cond, MPQemuChannel *chan)
 {
     MPQemuMsg *msg = NULL;
     Error *err = NULL;
+    int wait;
 
     if ((cond & G_IO_HUP) || (cond & G_IO_ERR)) {
         error_setg(&err, "socket closed, cond is %d", cond);
@@ -434,6 +435,11 @@ static void process_msg(GIOCondition cond, MPQemuChannel *chan)
     case DEVICE_DEL:
         process_device_del_msg(msg);
         break;
+    case PROXY_PING:
+        wait = msg->fds[0];
+        notify_proxy(wait, (uint32_t)getpid());
+        PUT_REMOTE_WAIT(wait);
+        break;
     default:
         error_setg(&err, "Unknown command");
         goto finalize_loop;
-- 
1.8.3.1




* [RFC v4 PATCH 32/49] multi-process: Use separate MMIO communication channel
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (30 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 31/49] multi-process: handle heartbeat messages in remote process Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-11-11 16:21   ` Stefan Hajnoczi
  2019-10-24  9:09 ` [RFC v4 PATCH 33/49] multi-process: perform device reset in the remote process Jagannathan Raman
                   ` (21 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

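Use a separate communication channel for MMIO to improve
performance. BAR accesses are sent over the new channel, and read
results are returned in MMIO_RETURN messages instead of through
an eventfd.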

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
---
 New patch in v3
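
 For readers unfamiliar with the pattern, here is a minimal sketch of a
 request/reply exchange over a dedicated socketpair with a receive
 timeout, which is the mechanism this patch uses for BAR reads. It is
 not the mpqemu-link code: demo_msg_t, the DEMO_* constants and the
 demo_* functions are hypothetical, and the timeout is given in
 milliseconds only to keep the example quick to run.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical message layout, much simpler than MPQemuMsg. */
typedef struct {
    uint32_t cmd;
    uint64_t addr;
    uint64_t val;
} demo_msg_t;

enum { DEMO_BAR_READ = 1, DEMO_MMIO_RETURN = 2 };

/*
 * Create the channel. sv[0] is the proxy end and gets a receive
 * timeout, sv[1] is the remote end. timeout_ms must be positive,
 * because a zero timeval disables the timeout.
 */
static int demo_channel_create(int sv[2], int timeout_ms)
{
    struct timeval tv = {
        .tv_sec = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,
    };

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    if (setsockopt(sv[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    return 0;
}

/* Send one fixed-size message; 0 on success, -1 otherwise. */
static int demo_send(int fd, const demo_msg_t *m)
{
    return send(fd, m, sizeof(*m), 0) == (ssize_t)sizeof(*m) ? 0 : -1;
}

/* Receive one fixed-size message; -1 on error, timeout or short read. */
static int demo_recv(int fd, demo_msg_t *m)
{
    ssize_t n = recv(fd, m, sizeof(*m), MSG_WAITALL);

    return n == (ssize_t)sizeof(*m) ? 0 : -1;
}
```

 The proxy end sends a DEMO_BAR_READ request and then blocks in
 demo_recv() until the remote end answers with DEMO_MMIO_RETURN or the
 timeout expires, so a stuck remote cannot stall a read forever.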

 hw/proxy/qemu-proxy.c         | 39 +++++++++++++++++++++++++++------------
 include/hw/proxy/qemu-proxy.h |  1 +
 include/io/mpqemu-link.h      |  7 +++++++
 io/mpqemu-link.c              |  2 ++
 qdev-monitor.c                |  1 +
 remote/remote-main.c          | 18 +++++++++++++-----
 6 files changed, 51 insertions(+), 17 deletions(-)

diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
index 691b991..74aecd3 100644
--- a/hw/proxy/qemu-proxy.c
+++ b/hw/proxy/qemu-proxy.c
@@ -201,20 +201,22 @@ static int make_argv(char *command_str, char **argv, int argc)
 int remote_spawn(PCIProxyDev *pdev, const char *command, Error **errp)
 {
     pid_t rpid;
-    int fd[2] = {-1, -1};
+    int fd[2], mmio[2];
     Error *local_error = NULL;
     char *argv[64];
     int argc = 0, _argc;
     char *sfd;
     char *exec_dir;
     int rc = -EINVAL;
+    struct timeval timeout = {.tv_sec = 10, .tv_usec = 0};
 
     if (pdev->managed) {
         /* Child is forked by external program (such as libvirt). */
         return rc;
     }
 
-    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) {
+    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd) ||
+        socketpair(AF_UNIX, SOCK_STREAM, 0, mmio)) {
         error_setg(errp, "Unable to create unix socket.");
         return rc;
     }
@@ -222,6 +224,8 @@ int remote_spawn(PCIProxyDev *pdev, const char *command, Error **errp)
     argc = add_argv(exec_dir, argv, argc);
     sfd = g_strdup_printf("%d", fd[1]);
     argc = add_argv(sfd, argv, argc);
+    sfd = g_strdup_printf("%d", mmio[1]);
+    argc = add_argv(sfd, argv, argc);
     _argc = argc;
     argc = make_argv((char *)command, argv, argc);
 
@@ -231,22 +235,32 @@ int remote_spawn(PCIProxyDev *pdev, const char *command, Error **errp)
     if (rpid == -1) {
         error_setg(errp, "Unable to spawn emulation program.");
         close(fd[0]);
+        close(mmio[0]);
         goto fail;
     }
 
     if (rpid == 0) {
         close(fd[0]);
+        close(mmio[0]);
         execvp(argv[0], (char *const *)argv);
         exit(1);
     }
     pdev->remote_pid = rpid;
     pdev->rsocket = fd[1];
     pdev->socket = fd[0];
+    pdev->mmio_sock = mmio[0];
+
+    if (setsockopt(mmio[0], SOL_SOCKET, SO_RCVTIMEO, (char *)&timeout,
+                   sizeof(timeout)) < 0) {
+        error_setg(errp, "Unable to set timeout for socket");
+        goto fail;
+    }
 
     rc = 0;
 
 fail:
     close(fd[1]);
+    close(mmio[1]);
 
     for (int i = 0; i < _argc; i++) {
         g_free(argv[i]);
@@ -443,6 +457,9 @@ static void init_proxy(PCIDevice *dev, char *command, bool need_spawn, Error **e
     mpqemu_init_channel(pdev->mpqemu_link, &pdev->mpqemu_link->com,
                         pdev->socket);
 
+    mpqemu_init_channel(pdev->mpqemu_link, &pdev->mpqemu_link->mmio,
+                        pdev->mmio_sock);
+
     configure_memory_sync(pdev->sync, pdev->mpqemu_link);
 }
 
@@ -503,8 +520,7 @@ static void send_bar_access_msg(PCIProxyDev *dev, MemoryRegion *mr,
                                 unsigned size, bool memory)
 {
     MPQemuLinkState *mpqemu_link = dev->mpqemu_link;
-    MPQemuMsg msg;
-    int wait;
+    MPQemuMsg msg, ret;
 
     memset(&msg, 0, sizeof(MPQemuMsg));
 
@@ -518,19 +534,18 @@ static void send_bar_access_msg(PCIProxyDev *dev, MemoryRegion *mr,
         msg.cmd = BAR_WRITE;
         msg.data1.bar_access.val = *val;
     } else {
-        wait = GET_REMOTE_WAIT;
-
         msg.cmd = BAR_READ;
-        msg.num_fds = 1;
-        msg.fds[0] = wait;
     }
 
-    mpqemu_msg_send(mpqemu_link, &msg, mpqemu_link->com);
+    mpqemu_msg_send(mpqemu_link, &msg, mpqemu_link->mmio);
 
-    if (!write) {
-        *val = wait_for_remote(wait);
-        PUT_REMOTE_WAIT(wait);
+    if (write) {
+        return;
     }
+
+    mpqemu_msg_recv(mpqemu_link, &ret, mpqemu_link->mmio);
+
+    *val = ret.data1.mmio_ret.val;
 }
 
 void proxy_default_bar_write(void *opaque, hwaddr addr, uint64_t val,
diff --git a/include/hw/proxy/qemu-proxy.h b/include/hw/proxy/qemu-proxy.h
index ac61a9b..5e858cc 100644
--- a/include/hw/proxy/qemu-proxy.h
+++ b/include/hw/proxy/qemu-proxy.h
@@ -71,6 +71,7 @@ struct PCIProxyDev {
     pid_t remote_pid;
     int rsocket;
     int socket;
+    int mmio_sock;
 
     char *rid;
 
diff --git a/include/io/mpqemu-link.h b/include/io/mpqemu-link.h
index 16a913b..4911eea 100644
--- a/include/io/mpqemu-link.h
+++ b/include/io/mpqemu-link.h
@@ -73,6 +73,7 @@ typedef enum {
     DEVICE_ADD,
     DEVICE_DEL,
     PROXY_PING,
+    MMIO_RETURN,
     MAX,
 } mpqemu_cmd_t;
 
@@ -107,6 +108,10 @@ typedef struct {
 } set_irqfd_msg_t;
 
 typedef struct {
+    uint64_t val;
+} mmio_ret_msg_t;
+
+typedef struct {
     mpqemu_cmd_t cmd;
     int bytestream;
     size_t size;
@@ -116,6 +121,7 @@ typedef struct {
         sync_sysmem_msg_t sync_sysmem;
         bar_access_msg_t bar_access;
         set_irqfd_msg_t set_irqfd;
+        mmio_ret_msg_t mmio_ret;
     } data1;
 
     int fds[REMOTE_MAX_FDS];
@@ -170,6 +176,7 @@ typedef struct MPQemuLinkState {
     GMainLoop *loop;
 
     MPQemuChannel *com;
+    MPQemuChannel *mmio;
 
     mpqemu_link_callback callback;
 } MPQemuLinkState;
diff --git a/io/mpqemu-link.c b/io/mpqemu-link.c
index 696aeb1..d91936e 100644
--- a/io/mpqemu-link.c
+++ b/io/mpqemu-link.c
@@ -77,6 +77,7 @@ void mpqemu_link_finalize(MPQemuLinkState *s)
     g_main_loop_quit(s->loop);
 
     mpqemu_destroy_channel(s->com);
+    mpqemu_destroy_channel(s->mmio);
 
     object_unref(OBJECT(s));
 }
@@ -344,6 +345,7 @@ void mpqemu_start_coms(MPQemuLinkState *s)
 {
 
     g_assert(g_source_attach(&s->com->gsrc, s->ctx));
+    g_assert(g_source_attach(&s->mmio->gsrc, s->ctx));
 
     g_main_loop_run(s->loop);
 }
diff --git a/qdev-monitor.c b/qdev-monitor.c
index e1d05e4..f38849e 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -711,6 +711,7 @@ DeviceState *qdev_proxy_add(const char *rid, const char *id, char *bus,
     if (old_pdev) {
         pdev->rsocket = old_pdev->rsocket;
         pdev->socket = old_pdev->socket;
+        pdev->mmio_sock = old_pdev->mmio_sock;
         pdev->remote_pid = old_pdev->remote_pid;
     } else {
         pdev->rsocket = managed ? rsocket : -1;
diff --git a/remote/remote-main.c b/remote/remote-main.c
index 27e4492..0a1326d 100644
--- a/remote/remote-main.c
+++ b/remote/remote-main.c
@@ -117,8 +117,8 @@ static void process_bar_write(MPQemuMsg *msg, Error **errp)
 static void process_bar_read(MPQemuMsg *msg, Error **errp)
 {
     bar_access_msg_t *bar_access = &msg->data1.bar_access;
+    MPQemuMsg ret = { 0 };
     AddressSpace *as;
-    int wait = msg->fds[0];
     MemTxResult res;
     uint64_t val = 0;
 
@@ -152,9 +152,10 @@ static void process_bar_read(MPQemuMsg *msg, Error **errp)
     }
 
 fail:
-    notify_proxy(wait, val);
-
-    PUT_REMOTE_WAIT(wait);
+    ret.cmd = MMIO_RETURN;
+    ret.data1.mmio_ret.val = val;
+    ret.size = sizeof(ret.data1);
+    mpqemu_msg_send(mpqemu_link, &ret, mpqemu_link->mmio);
 }
 
 static void process_device_add_msg(MPQemuMsg *msg)
@@ -497,7 +498,14 @@ int main(int argc, char *argv[])
 
     mpqemu_init_channel(mpqemu_link, &mpqemu_link->com, fd);
 
-    parse_cmdline(argc - 2, argv + 2, NULL);
+    fd = qemu_parse_fd(argv[2]);
+    if (fd == -1) {
+        printf("Failed to parse fd for remote process.\n");
+        return -EINVAL;
+    }
+    mpqemu_init_channel(mpqemu_link, &mpqemu_link->mmio, fd);
+
+    parse_cmdline(argc - 3, argv + 3, NULL);
 
     mpqemu_link_set_callback(mpqemu_link, process_msg);
 
-- 
1.8.3.1




* [RFC v4 PATCH 33/49] multi-process: perform device reset in the remote process
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (31 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 32/49] multi-process: Use separate MMIO communication channel Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-11-11 16:19   ` Stefan Hajnoczi
  2019-10-24  9:09 ` [RFC v4 PATCH 34/49] multi-process/mon: choose HMP commands based on target Jagannathan Raman
                   ` (20 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

Perform device reset in the remote process when QEMU performs
device reset. This is required to reset the internal state
(such as registers) of emulated devices.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 New patch in v3

 hw/proxy/proxy-lsi53c895a.c   |  6 ++++++
 hw/proxy/qemu-proxy.c         | 14 ++++++++++++++
 include/hw/proxy/qemu-proxy.h |  2 ++
 include/io/mpqemu-link.h      |  1 +
 remote/remote-main.c          | 11 +++++++++++
 5 files changed, 34 insertions(+)

diff --git a/hw/proxy/proxy-lsi53c895a.c b/hw/proxy/proxy-lsi53c895a.c
index 7734ae2..f6bd8a1 100644
--- a/hw/proxy/proxy-lsi53c895a.c
+++ b/hw/proxy/proxy-lsi53c895a.c
@@ -57,6 +57,11 @@ static void proxy_lsi_realize(PCIProxyDev *dev, Error **errp)
                           &dev->region[2], "proxy-lsi-ram", 0x2000);
 }
 
+static void proxy_lsi_reset(DeviceState *dev)
+{
+    proxy_device_reset(dev);
+}
+
 static void proxy_lsi_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -74,6 +79,7 @@ static void proxy_lsi_class_init(ObjectClass *klass, void *data)
     set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
 
     dc->desc = "LSI Proxy Device";
+    dc->reset = proxy_lsi_reset;
 }
 
 static const TypeInfo lsi_proxy_dev_type_info = {
diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
index 74aecd3..5aada67 100644
--- a/hw/proxy/qemu-proxy.c
+++ b/hw/proxy/qemu-proxy.c
@@ -577,3 +577,17 @@ const MemoryRegionOps proxy_default_ops = {
         .max_access_size = 1,
     },
 };
+
+void proxy_device_reset(DeviceState *dev)
+{
+    PCIProxyDev *pdev = PCI_PROXY_DEV(dev);
+    MPQemuMsg msg;
+
+    memset(&msg, 0, sizeof(MPQemuMsg));
+
+    msg.bytestream = 0;
+    msg.size = sizeof(msg.data1);
+    msg.cmd = DEVICE_RESET;
+
+    mpqemu_msg_send(pdev->mpqemu_link, &msg, pdev->mpqemu_link->com);
+}
diff --git a/include/hw/proxy/qemu-proxy.h b/include/hw/proxy/qemu-proxy.h
index 5e858cc..672303c 100644
--- a/include/hw/proxy/qemu-proxy.h
+++ b/include/hw/proxy/qemu-proxy.h
@@ -112,4 +112,6 @@ void proxy_default_bar_write(void *opaque, hwaddr addr, uint64_t val,
 
 uint64_t proxy_default_bar_read(void *opaque, hwaddr addr, unsigned size);
 
+void proxy_device_reset(DeviceState *dev);
+
 #endif /* QEMU_PROXY_H */
diff --git a/include/io/mpqemu-link.h b/include/io/mpqemu-link.h
index 4911eea..6fcc6f5 100644
--- a/include/io/mpqemu-link.h
+++ b/include/io/mpqemu-link.h
@@ -74,6 +74,7 @@ typedef enum {
     DEVICE_DEL,
     PROXY_PING,
     MMIO_RETURN,
+    DEVICE_RESET,
     MAX,
 } mpqemu_cmd_t;
 
diff --git a/remote/remote-main.c b/remote/remote-main.c
index 0a1326d..4459d26 100644
--- a/remote/remote-main.c
+++ b/remote/remote-main.c
@@ -66,8 +66,11 @@
 #include "qemu/log.h"
 #include "qemu/cutils.h"
 #include "remote-opts.h"
+#include "monitor/monitor.h"
+#include "sysemu/reset.h"
 
 static MPQemuLinkState *mpqemu_link;
+
 PCIDevice *remote_pci_dev;
 bool create_done;
 
@@ -237,6 +240,11 @@ fail:
     PUT_REMOTE_WAIT(wait);
 }
 
+static void process_device_reset_msg(MPQemuMsg *msg)
+{
+    qemu_devices_reset();
+}
+
 static int init_drive(QDict *rqdict, Error **errp)
 {
     QemuOpts *opts;
@@ -441,6 +449,9 @@ static void process_msg(GIOCondition cond, MPQemuChannel *chan)
         notify_proxy(wait, (uint32_t)getpid());
         PUT_REMOTE_WAIT(wait);
         break;
+    case DEVICE_RESET:
+        process_device_reset_msg(msg);
+        break;
     default:
         error_setg(&err, "Unknown command");
         goto finalize_loop;
-- 
1.8.3.1




* [RFC v4 PATCH 34/49] multi-process/mon: choose HMP commands based on target
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (32 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 33/49] multi-process: perform device reset in the remote process Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 35/49] multi-process/mon: stub functions to enable QMP module for remote process Jagannathan Raman
                   ` (19 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Add a "targets" field to the HMP command definitions to select the
targets for which each command is supported.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 New patch in v4
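
 The idea of tagging each command with the targets it supports, and
 then selecting only the commands for one target, can be sketched in C
 as below. This is an illustration under stated assumptions, not the
 mechanism of this patch: the patch does the selection at build time in
 scripts/hxtool with grep, whereas the sketch filters a table at run
 time. demo_hmp_cmd and both functions are hypothetical names, and the
 comma-separated "targets" format is an assumption (the patch only ever
 uses the single value "scsi").

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command entry; targets is NULL when the command is untagged. */
typedef struct {
    const char *name;
    const char *targets;   /* assumed format: "scsi" or "scsi,net" */
} demo_hmp_cmd;

/* Return true if target appears as a whole token in cmd->targets. */
static bool demo_cmd_supports(const demo_hmp_cmd *cmd, const char *target)
{
    size_t tlen;
    const char *p;

    assert(cmd != NULL && target != NULL);

    tlen = strlen(target);
    p = cmd->targets;
    if (p == NULL || tlen == 0) {
        return false;
    }

    while (*p != '\0') {
        size_t len = strcspn(p, ",");   /* length of the current token */

        if (len == tlen && strncmp(p, target, tlen) == 0) {
            return true;
        }
        p += len;
        if (*p == ',') {
            p++;                        /* skip the separator */
        }
    }
    return false;
}

/* Count how many commands in the table are available for target. */
static size_t demo_count_for_target(const demo_hmp_cmd *cmds, size_t n,
                                    const char *target)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++) {
        if (demo_cmd_supports(&cmds[i], target)) {
            count++;
        }
    }
    return count;
}
```

 Whole-token matching is a deliberate choice in the sketch; the grep
 pattern in hxtoh_tgt() matches substrings instead.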

 hmp-commands-info.hx | 10 ++++++++++
 hmp-commands.hx      | 20 ++++++++++++++++++++
 scripts/hxtool       | 44 ++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 72 insertions(+), 2 deletions(-)
 mode change 100644 => 100755 scripts/hxtool

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 257ee7d..631cc76 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -19,6 +19,7 @@ ETEXI
         .params     = "",
         .help       = "show the version of QEMU",
         .cmd        = hmp_info_version,
+        .targets    = "scsi",
         .flags      = "p",
     },
 
@@ -48,6 +49,7 @@ ETEXI
         .params     = "",
         .help       = "show the character devices",
         .cmd        = hmp_info_chardev,
+        .targets    = "scsi",
         .flags      = "p",
     },
 
@@ -64,6 +66,7 @@ ETEXI
         .help       = "show info of one block device or all block devices "
                       "(-n: show named nodes; -v: show details)",
         .cmd        = hmp_info_block,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -78,6 +81,7 @@ ETEXI
         .params     = "",
         .help       = "show block device statistics",
         .cmd        = hmp_info_blockstats,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -92,6 +96,7 @@ ETEXI
         .params     = "",
         .help       = "show progress of ongoing block device operations",
         .cmd        = hmp_info_block_jobs,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -167,6 +172,7 @@ ETEXI
         .params     = "",
         .help       = "show the command line history",
         .cmd        = hmp_info_history,
+        .targets    = "scsi",
         .flags      = "p",
     },
 
@@ -224,6 +230,7 @@ ETEXI
         .params     = "",
         .help       = "show PCI info",
         .cmd        = hmp_info_pci,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -630,6 +637,7 @@ ETEXI
         .params     = "",
         .help       = "show device tree",
         .cmd        = hmp_info_qtree,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -644,6 +652,7 @@ ETEXI
         .params     = "",
         .help       = "show qdev device model list",
         .cmd        = hmp_info_qdm,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -658,6 +667,7 @@ ETEXI
         .params     = "[path]",
         .help       = "show QOM composition tree",
         .cmd        = hmp_info_qom_tree,
+        .targets    = "scsi",
         .flags      = "p",
     },
 
diff --git a/hmp-commands.hx b/hmp-commands.hx
index cfcc044..6d9674b 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -49,6 +49,7 @@ ETEXI
         .params     = "",
         .help       = "quit the emulator",
         .cmd        = hmp_quit,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -82,6 +83,7 @@ ETEXI
         .params     = "device size",
         .help       = "resize a block image",
         .cmd        = hmp_block_resize,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -99,6 +101,7 @@ ETEXI
         .params     = "device [speed [base]]",
         .help       = "copy data from a backing file into a block device",
         .cmd        = hmp_block_stream,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -113,6 +116,7 @@ ETEXI
         .params     = "device speed",
         .help       = "set maximum speed for a background block operation",
         .cmd        = hmp_block_job_set_speed,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -129,6 +133,7 @@ ETEXI
                       "\n\t\t\t if you want to abort the operation immediately"
                       "\n\t\t\t instead of keep running until data is in sync)",
         .cmd        = hmp_block_job_cancel,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -143,6 +148,7 @@ ETEXI
         .params     = "device",
         .help       = "stop an active background block operation",
         .cmd        = hmp_block_job_complete,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -158,6 +164,7 @@ ETEXI
         .params     = "device",
         .help       = "pause an active background block operation",
         .cmd        = hmp_block_job_pause,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -172,6 +179,7 @@ ETEXI
         .params     = "device",
         .help       = "resume a paused background block operation",
         .cmd        = hmp_block_job_resume,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -186,6 +194,7 @@ ETEXI
         .params     = "[-f] device",
         .help       = "eject a removable medium (use -f to force it)",
         .cmd        = hmp_eject,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -200,6 +209,7 @@ ETEXI
         .params     = "device",
         .help       = "remove host block device",
         .cmd        = hmp_drive_del,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -219,6 +229,7 @@ ETEXI
         .params     = "device filename [format [read-only-mode]]",
         .help       = "change a removable medium, optional format",
         .cmd        = hmp_change,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -732,6 +743,7 @@ ETEXI
         .help       = "add device, like -device on the command line",
         .cmd        = hmp_device_add,
         .command_completion = device_add_completion,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -747,6 +759,7 @@ ETEXI
         .help       = "remove device",
         .cmd        = hmp_device_del,
         .command_completion = device_del_completion,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -1351,6 +1364,7 @@ ETEXI
                       "The -c flag requests QEMU to compress backup data\n\t\t\t"
                       "(if the target format supports it).\n\t\t\t",
         .cmd        = hmp_drive_backup,
+        .targets    = "scsi",
     },
 STEXI
 @item drive_backup
@@ -1368,6 +1382,7 @@ ETEXI
                       "[,readonly=on|off][,copy-on-read=on|off]",
         .help       = "add drive to PCI storage controller",
         .cmd        = hmp_drive_add,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -1816,6 +1831,7 @@ ETEXI
         .help       = "add chardev",
         .cmd        = hmp_chardev_add,
         .command_completion = chardev_add_completion,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -1831,6 +1847,7 @@ ETEXI
         .params     = "id args",
         .help       = "change chardev",
         .cmd        = hmp_chardev_change,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -1848,6 +1865,7 @@ ETEXI
         .help       = "remove chardev",
         .cmd        = hmp_chardev_remove,
         .command_completion = chardev_remove_completion,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -1864,6 +1882,7 @@ ETEXI
         .help       = "send a break on chardev",
         .cmd        = hmp_chardev_send_break,
         .command_completion = chardev_remove_completion,
+        .targets    = "scsi",
     },
 
 STEXI
@@ -1938,6 +1957,7 @@ ETEXI
         .params     = "[subcommand]",
         .help       = "show various information about the system state",
         .cmd        = hmp_info_help,
+        .targets    = "scsi",
         .sub_table  = hmp_info_cmds,
         .flags      = "p",
     },
diff --git a/scripts/hxtool b/scripts/hxtool
old mode 100644
new mode 100755
index 7d7c428..02fbbd8
--- a/scripts/hxtool
+++ b/scripts/hxtool
@@ -10,7 +10,14 @@ hxtoh()
             STEXI*|ETEXI*) flag=$(($flag^1))
             ;;
             *)
-            test $flag -eq 1 && printf "%s\n" "$str"
+            # Skip lines that contain ".targets"; they are only used to generate
+            # the per-target HMP command lists for multi-process QEMU.
+            echo $str | grep -q '.targets'
+            if [ $? -eq 0 ]; then
+                continue
+            else
+                test $flag -eq 1 && printf "%s\n" "$str"
+            fi
             ;;
         esac
     done
@@ -53,16 +60,49 @@ hxtotexi()
             print_texi_heading "$(expr "$str" : "ARCHHEADING(\(.*\),.*)")"
             ;;
             *)
-            test $flag -eq 1 && printf '%s\n' "$str"
+            # Skip lines that contain ".targets"; they are only used to generate
+            # the per-target HMP command lists for multi-process QEMU.
+            echo $str | grep -q '.targets'
+            if [ $? -eq 0 ]; then
+                continue
+            else
+                test $flag -eq 1 && printf '%s\n' "$str"
+            fi
             ;;
         esac
         line=$((line+1))
     done
 }
 
+hxtoh_tgt()
+{
+    section=""
+    flag=1
+    use_section=0
+    while read -r str; do
+        # Print section if it has ".targets" and the second argument passed to the
+        # script, such as "scsi".
+        echo "$str" | grep -q -E ".targets.*$1"
+        if [ $? -eq 0 ]; then
+            use_section=1
+            continue
+        fi
+        case $str in
+            HXCOMM*)
+            ;;
+            STEXI*|ETEXI*) flag=$(($flag^1)); test $use_section -eq 1 && printf '%s' "$section"; section=""; use_section=0
+            ;;
+            *)
+            test $flag -eq 1 && section="${section} ${str} ${IFS}"
+            ;;
+        esac
+    done
+}
+
 case "$1" in
 "-h") hxtoh ;;
 "-t") hxtotexi ;;
+"-tgt") hxtoh_tgt $2 ;;
 *) exit 1 ;;
 esac
 
-- 
1.8.3.1




* [RFC v4 PATCH 35/49] multi-process/mon: stub functions to enable QMP module for remote process
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (33 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 34/49] multi-process/mon: choose HMP commands based on target Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 36/49] multi-process/mon: enable QMP module support in the " Jagannathan Raman
                   ` (18 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

The QMP module does not need some functions when it runs independently
in the remote process. However, these functions are still required for
compilation, so they are stubbed out. The stub functions raise an
assertion if QEMU is built in debug mode (--enable-debug).

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 New patch in v3
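
 The stub pattern described above can be sketched in isolation as
 follows. This is a minimal illustration, not code from this series:
 DEMO_CONFIG_DEBUG, demo_debug_assert and the two demo_* stubs are
 hypothetical names that mirror CONFIG_DEBUG, qemu_debug_assert and the
 stubbed functions. One deliberate difference: the release variant here
 expands to ((void)0) rather than to nothing, which keeps the macro
 usable as an expression.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * In debug builds the macro forwards to assert(); otherwise it is a no-op.
 * Note that assert() itself is also compiled out when NDEBUG is defined.
 */
#ifdef DEMO_CONFIG_DEBUG
#define demo_debug_assert(x) assert(x)
#else
#define demo_debug_assert(x) ((void)0)
#endif

/* Stub returning an error code: trips in debug builds, fails quietly otherwise. */
int demo_load_snapshot(const char *name)
{
    (void)name;               /* unused in the stub */
    demo_debug_assert(0);

    return -ENOSYS;
}

/* Stub returning a pointer: callers see NULL in non-debug builds. */
void *demo_query_status(void)
{
    demo_debug_assert(0);

    return NULL;
}
```

 Built without DEMO_CONFIG_DEBUG, a call to either stub simply returns
 its failure value; built with it, the same call aborts at the stub, so
 an unexpected caller is easy to find.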

 accel/stubs/tcg-stub.c |  10 +++
 configure              |   4 ++
 include/qemu-common.h  |   8 +++
 stubs/gdbstub.c        |  21 +++++++
 stubs/migration.c      | 162 +++++++++++++++++++++++++++++++++++++++++++++++++
 stubs/monitor.c        |  31 ++++++++++
 stubs/net-stub.c       |  69 +++++++++++++++++++++
 stubs/qapi-misc.c      |  41 +++++++++++++
 stubs/qapi-target.c    |  49 +++++++++++++++
 stubs/ui-stub.c        | 130 +++++++++++++++++++++++++++++++++++++++
 stubs/vl-stub.c        |  92 ++++++++++++++++++++++++++++
 11 files changed, 617 insertions(+)
 create mode 100644 stubs/migration.c
 create mode 100644 stubs/qapi-misc.c
 create mode 100644 stubs/qapi-target.c
 create mode 100644 stubs/ui-stub.c

diff --git a/accel/stubs/tcg-stub.c b/accel/stubs/tcg-stub.c
index 9b55fb0..c2d72fe 100644
--- a/accel/stubs/tcg-stub.c
+++ b/accel/stubs/tcg-stub.c
@@ -119,3 +119,13 @@ page_collection_lock(tb_page_addr_t start, tb_page_addr_t end)
 void page_collection_unlock(struct page_collection *set)
 {
 }
+
+void dump_exec_info(void)
+{
+    qemu_debug_assert(0);
+}
+
+void dump_opcount_info(void)
+{
+    qemu_debug_assert(0);
+}
diff --git a/configure b/configure
index 135afa9..aaf25eb 100755
--- a/configure
+++ b/configure
@@ -7252,6 +7252,10 @@ if test "$mpqemu" = "yes" ; then
   echo "CONFIG_MPQEMU=y" >> $config_host_mak
 fi
 
+if test "$debug" = "yes" ; then
+  echo "CONFIG_DEBUG=y" >> $config_host_mak
+fi
+
 if test "$bochs" = "yes" ; then
   echo "CONFIG_BOCHS=y" >> $config_host_mak
 fi
diff --git a/include/qemu-common.h b/include/qemu-common.h
index 8d84db9..277d145 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -10,6 +10,8 @@
 #ifndef QEMU_COMMON_H
 #define QEMU_COMMON_H
 
+#include <assert.h>
+
 #define TFR(expr) do { if ((expr) != -1) break; } while (errno == EINTR)
 
 /* Copyright string for -version arguments, About dialogs, etc */
@@ -129,4 +131,10 @@ void page_size_init(void);
  * returned. */
 bool dump_in_progress(void);
 
+#ifdef CONFIG_DEBUG
+#define qemu_debug_assert(x) assert(x)
+#else
+#define qemu_debug_assert(x)
+#endif
+
 #endif
diff --git a/stubs/gdbstub.c b/stubs/gdbstub.c
index 2b7aee5..28c574a 100644
--- a/stubs/gdbstub.c
+++ b/stubs/gdbstub.c
@@ -1,6 +1,27 @@
 #include "qemu/osdep.h"
+#include "qemu-common.h"
 #include "exec/gdbstub.h"       /* xml_builtin */
 
 const char *const xml_builtin[][2] = {
   { NULL, NULL }
 };
+
+#ifdef CONFIG_USER_ONLY
+
+int gdbserver_start(int port)
+{
+    qemu_debug_assert(0);
+
+    return -ENOSYS;
+}
+
+#else
+
+int gdbserver_start(const char *device)
+{
+    qemu_debug_assert(0);
+
+    return -ENOSYS;
+}
+
+#endif
diff --git a/stubs/migration.c b/stubs/migration.c
new file mode 100644
index 0000000..28ccf80
--- /dev/null
+++ b/stubs/migration.c
@@ -0,0 +1,162 @@
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "migration/misc.h"
+#include "migration/snapshot.h"
+#include "qapi/qapi-types-migration.h"
+#include "qapi/qapi-commands-migration.h"
+#include "qapi/qapi-types-net.h"
+
+MigrationInfo *qmp_query_migrate(Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
+                                  Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+MigrationParameters *qmp_query_migrate_parameters(Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+void qmp_migrate_start_postcopy(Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+void qmp_migrate_cancel(Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+void qmp_migrate_continue(MigrationStatus state, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+void qmp_migrate_set_downtime(double value, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+void qmp_migrate_set_speed(int64_t value, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+void qmp_migrate_set_cache_size(int64_t value, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+int64_t qmp_query_migrate_cache_size(Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return 0;
+}
+
+void qmp_migrate(const char *uri, bool has_blk, bool blk,
+                 bool has_inc, bool inc, bool has_detach, bool detach,
+                 bool has_resume, bool resume, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+void qmp_migrate_incoming(const char *uri, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+void qmp_migrate_recover(const char *uri, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+void qmp_migrate_pause(Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+void qmp_x_colo_lost_heartbeat(Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+void qmp_xen_save_devices_state(const char *filename, bool has_live, bool live,
+                                Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+void qmp_xen_set_replication(bool enable, bool primary,
+                             bool has_failover, bool failover,
+                             Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+ReplicationStatus *qmp_query_xen_replication_status(Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+void qmp_xen_colo_do_checkpoint(Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+COLOStatus *qmp_query_colo_status(Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+void migration_global_dump(Monitor *mon)
+{
+    qemu_debug_assert(0);
+}
+
+int load_snapshot(const char *name, Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return -ENOSYS;
+}
+
+int save_snapshot(const char *name, Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return -ENOSYS;
+}
+
+AnnounceParameters *migrate_announce_params(void)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
diff --git a/stubs/monitor.c b/stubs/monitor.c
index 17d2493..8fc129f 100644
--- a/stubs/monitor.c
+++ b/stubs/monitor.c
@@ -1,4 +1,5 @@
 #include "qemu/osdep.h"
+#include "qemu-common.h"
 #include "qapi/error.h"
 #include "qapi/qapi-emit-events.h"
 #include "monitor/monitor.h"
@@ -10,6 +11,24 @@
 #include "sysemu/sysemu.h"
 #include "sysemu/runstate.h"
 #include "monitor/hmp.h"
+#include "monitor/qdev.h"
+#include "sysemu/blockdev.h"
+#include "sysemu/sysemu.h"
+
+#include "qapi/qapi-types-block-core.h"
+#include "qapi/qapi-commands-block-core.h"
+
+#pragma weak cur_mon
+#pragma weak monitor_vprintf
+#pragma weak monitor_get_fd
+#pragma weak monitor_init
+#pragma weak qapi_event_emit
+#pragma weak monitor_get_cpu_index
+#pragma weak monitor_printf
+#pragma weak monitor_cur_is_qmp
+#pragma weak qmp_device_list_properties
+#pragma weak monitor_init_qmp
+#pragma weak monitor_init_hmp
 
 __thread Monitor *cur_mon;
 
@@ -17,11 +36,13 @@ __thread Monitor *cur_mon;
 
 int monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
 {
+    qemu_debug_assert(0);
     abort();
 }
 
 int monitor_get_fd(Monitor *mon, const char *name, Error **errp)
 {
+    qemu_debug_assert(0);
     error_setg(errp, "only QEMU supports file descriptor passing");
     return -1;
 }
@@ -32,29 +53,39 @@ void monitor_init_qmp(Chardev *chr, bool pretty)
 
 void monitor_init_hmp(Chardev *chr, bool use_readline)
 {
+    qemu_debug_assert(0);
 }
 
 void qapi_event_emit(QAPIEvent event, QDict *qdict)
 {
+    qemu_debug_assert(0);
 }
 
 int monitor_get_cpu_index(void)
 {
+    qemu_debug_assert(0);
+
     return -ENOSYS;
 }
 int monitor_printf(Monitor *mon, const char *fmt, ...)
 {
+    qemu_debug_assert(0);
+
     return -ENOSYS;
 }
 
 bool monitor_cur_is_qmp(void)
 {
+    qemu_debug_assert(0);
+
     return false;
 }
 
 ObjectPropertyInfoList *qmp_device_list_properties(const char *typename,
                                                    Error **errp)
 {
+    qemu_debug_assert(0);
+
     return NULL;
 }
 
diff --git a/stubs/net-stub.c b/stubs/net-stub.c
index cb2274b..962827e 100644
--- a/stubs/net-stub.c
+++ b/stubs/net-stub.c
@@ -2,6 +2,9 @@
 #include "qemu-common.h"
 #include "net/net.h"
 
+#include "qapi/qapi-commands-net.h"
+#include "qapi/qapi-commands-rocker.h"
+
 int qemu_find_net_clients_except(const char *id, NetClientState **ncs,
                                  NetClientDriver type, int max)
 {
@@ -29,3 +32,69 @@ int qemu_find_nic_model(NICInfo *nd, const char * const *models,
     return -ENOSYS;
 }
 
+void qmp_set_link(const char *name, bool up, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+void qmp_netdev_del(const char *id, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+RxFilterInfoList *qmp_query_rx_filter(bool has_name, const char *name,
+                                      Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+void qmp_announce_self(AnnounceParameters *params, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+RockerSwitch *qmp_query_rocker(const char *name, Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+RockerPortList *qmp_query_rocker_ports(const char *name, Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+RockerOfDpaFlowList *qmp_query_rocker_of_dpa_flows(const char *name,
+                                                   bool has_tbl_id,
+                                                   uint32_t tbl_id,
+                                                   Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+RockerOfDpaGroupList *qmp_query_rocker_of_dpa_groups(const char *name,
+                                                     bool has_type,
+                                                     uint8_t type,
+                                                     Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+void qmp_netdev_add(QDict *qdict, QObject **ret, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+void netdev_add(QemuOpts *opts, Error **errp)
+{
+    qemu_debug_assert(0);
+}
diff --git a/stubs/qapi-misc.c b/stubs/qapi-misc.c
new file mode 100644
index 0000000..3eeedd9
--- /dev/null
+++ b/stubs/qapi-misc.c
@@ -0,0 +1,41 @@
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "qapi/qapi-commands-misc.h"
+#include "./qapi/qapi-types-dump.h"
+#include "qapi/qapi-commands-dump.h"
+
+void qmp_dump_guest_memory(bool paging, const char *file,
+                           bool has_detach, bool detach,
+                           bool has_begin, int64_t begin, bool has_length,
+                           int64_t length, bool has_format,
+                           DumpGuestMemoryFormat format, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+DumpQueryResult *qmp_query_dump(Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+DumpGuestMemoryCapability *qmp_query_dump_guest_memory_capability(Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+void qmp_xen_load_devices_state(const char *filename, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+bool dump_in_progress(void)
+{
+    qemu_debug_assert(0);
+
+    return FALSE;
+}
diff --git a/stubs/qapi-target.c b/stubs/qapi-target.c
new file mode 100644
index 0000000..b3a3ffc
--- /dev/null
+++ b/stubs/qapi-target.c
@@ -0,0 +1,49 @@
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "qapi/qapi-types-misc-target.h"
+#include "qapi/qapi-commands-misc-target.h"
+#include "qapi/qapi-types-machine-target.h"
+#include "qapi/qapi-commands-machine-target.h"
+
+void qmp_rtc_reset_reinjection(Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+SevInfo *qmp_query_sev(Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+SevLaunchMeasureInfo *qmp_query_sev_launch_measure(Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+SevCapability *qmp_query_sev_capabilities(Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+CpuModelExpansionInfo *qmp_query_cpu_model_expansion(CpuModelExpansionType type,
+                                                     CpuModelInfo *model,
+                                                     Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+CpuDefinitionInfoList *qmp_query_cpu_definitions(Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
diff --git a/stubs/ui-stub.c b/stubs/ui-stub.c
new file mode 100644
index 0000000..a5a63ea
--- /dev/null
+++ b/stubs/ui-stub.c
@@ -0,0 +1,130 @@
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "ui/console.h"
+#include "ui/input.h"
+#include "ui/qemu-spice.h"
+
+#include "qapi/qapi-types-ui.h"
+#include "qapi/qapi-commands-ui.h"
+
+void qmp_screendump(const char *filename, bool has_device, const char *device,
+                    bool has_head, int64_t head, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+VncInfo *qmp_query_vnc(Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+VncInfo2List *qmp_query_vnc_servers(Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+MouseInfoList *qmp_query_mice(Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+void qmp_send_key(KeyValueList *keys, bool has_hold_time, int64_t hold_time,
+                  Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+void qmp_input_send_event(bool has_device, const char *device,
+                          bool has_head, int64_t head,
+                          InputEventList *events, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+void vnc_display_open(const char *id, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+void vnc_display_add_client(const char *id, int csock, bool skipauth)
+{
+    qemu_debug_assert(0);
+}
+
+void qemu_input_queue_rel(QemuConsole *src, InputAxis axis, int value)
+{
+    qemu_debug_assert(0);
+}
+
+void qemu_input_queue_btn(QemuConsole *src, InputButton btn, bool down)
+{
+    qemu_debug_assert(0);
+}
+
+void qemu_input_event_sync(void)
+{
+    qemu_debug_assert(0);
+}
+
+void qemu_input_update_buttons(QemuConsole *src, uint32_t *button_map,
+                               uint32_t button_old, uint32_t button_new)
+{
+    qemu_debug_assert(0);
+}
+
+#ifdef CONFIG_SPICE
+
+int using_spice;
+
+SpiceInfo *qmp_query_spice(Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+int qemu_spice_migrate_info(const char *hostname, int port, int tls_port,
+                            const char *subject)
+{
+    qemu_debug_assert(0);
+
+    return -ENOSYS;
+}
+
+int qemu_spice_display_add_client(int csock, int skipauth, int tls)
+{
+    qemu_debug_assert(0);
+
+    return -ENOSYS;
+}
+
+int qemu_spice_set_passwd(const char *passwd, bool fail_if_conn,
+                          bool disconnect_if_conn)
+{
+    qemu_debug_assert(0);
+
+    return -ENOSYS;
+}
+
+int qemu_spice_set_pw_expire(time_t expires)
+{
+    qemu_debug_assert(0);
+
+    return -ENOSYS;
+}
+
+#endif
+
+int index_from_key(const char *key, size_t key_length)
+{
+    qemu_debug_assert(0);
+
+    return -ENOSYS;
+}
diff --git a/stubs/vl-stub.c b/stubs/vl-stub.c
index fff72be..606f078 100644
--- a/stubs/vl-stub.c
+++ b/stubs/vl-stub.c
@@ -8,6 +8,12 @@
 #include "disas/disas.h"
 #include "sysemu/runstate.h"
 
+#include "qapi/qapi-commands-ui.h"
+#include "qapi/qapi-commands-run-state.h"
+#include "sysemu/watchdog.h"
+#include "disas/disas.h"
+#include "audio/audio.h"
+
 bool tcg_allowed;
 bool xen_allowed;
 bool boot_strict;
@@ -21,6 +27,8 @@ int smp_threads = 1;
 int icount_align_option;
 int boot_menu;
 
+#pragma weak arch_type
+
 unsigned int max_cpus;
 const uint32_t arch_type;
 const char *mem_path;
@@ -33,6 +41,11 @@ ram_addr_t ram_size;
 MachineState *current_machine;
 QemuUUID qemu_uuid;
 
+int singlestep;
+const char *qemu_name;
+int no_shutdown;
+int autostart;
+
 int runstate_is_running(void)
 {
     return 0;
@@ -77,3 +90,82 @@ void x86_cpu_list(void)
 {
 }
 #endif
+
+void qemu_system_shutdown_request(ShutdownCause reason)
+{
+    qemu_debug_assert(0);
+}
+
+void qemu_system_reset_request(ShutdownCause reason)
+{
+    qemu_debug_assert(0);
+}
+
+void qemu_system_powerdown_request(void)
+{
+    qemu_debug_assert(0);
+}
+
+void qemu_exit_preconfig_request(void)
+{
+    qemu_debug_assert(0);
+}
+
+bool runstate_needs_reset(void)
+{
+    qemu_debug_assert(0);
+
+    return FALSE;
+}
+
+bool qemu_wakeup_suspend_enabled(void)
+{
+    qemu_debug_assert(0);
+
+    return FALSE;
+}
+
+void qemu_system_wakeup_request(WakeupReason reason, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+DisplayOptions *qmp_query_display_options(Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+StatusInfo *qmp_query_status(Error **errp)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+void qmp_watchdog_set_action(WatchdogAction action, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+int select_watchdog_action(const char *p)
+{
+    qemu_debug_assert(0);
+
+    return -1;
+}
+
+void monitor_disas(Monitor *mon, CPUState *cpu,
+                   target_ulong pc, int nb_insn, int is_physical)
+{
+    qemu_debug_assert(0);
+}
+
+int wav_start_capture(AudioState *state, CaptureState *s, const char *path,
+                      int freq, int bits, int nchannels)
+{
+    qemu_debug_assert(0);
+
+    return -1;
+}
-- 
1.8.3.1




* [RFC v4 PATCH 36/49] multi-process/mon: enable QMP module support in the remote process
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (34 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 35/49] multi-process/mon: stub functions to enable QMP module for remote process Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 37/49] multi-process/mon: Refactor monitor/chardev functions out of vl.c Jagannathan Raman
                   ` (17 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

Build system changes to enable the QMP module in the remote process.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 New patch in v3

 Makefile.objs              |  9 +++++
 Makefile.target            | 34 +++++++++++++++++--
 hmp-commands.hx            |  5 +--
 hw/core/Makefile.objs      |  1 +
 monitor/Makefile.objs      |  3 ++
 monitor/misc.c             | 84 +++++++++++++++++++++++++---------------------
 monitor/monitor-internal.h | 38 +++++++++++++++++++++
 qapi/Makefile.objs         |  2 ++
 qom/Makefile.objs          |  1 +
 ui/Makefile.objs           |  2 ++
 10 files changed, 137 insertions(+), 42 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index c23ccaa..c72db88 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -30,6 +30,7 @@ remote-pci-obj-$(CONFIG_MPQEMU) += backends/
 remote-pci-obj-$(CONFIG_MPQEMU) += block/
 remote-pci-obj-$(CONFIG_MPQEMU) += migration/
 remote-pci-obj-$(CONFIG_MPQEMU) += remote/
+remote-pci-obj-$(CONFIG_MPQEMU) += monitor/
 
 remote-pci-obj-$(CONFIG_MPQEMU) += cpus-common.o
 remote-pci-obj-$(CONFIG_MPQEMU) += dma-helpers.o
@@ -42,6 +43,9 @@ remote-pci-obj-$(CONFIG_MPQEMU) += iothread.o
 # remote-lsi-obj-y is code used to implement remote LSI device
 
 remote-lsi-obj-$(CONFIG_MPQEMU) += hw/
+remote-lsi-obj-$(CONFIG_MPQEMU) += ui/
+
+remote-lsi-obj-$(CONFIG_MPQEMU) += device-hotplug.o
 
 #######################################################################
 # crypto-obj-y is code used by both qemu system emulation and qemu-img
@@ -112,6 +116,11 @@ common-obj-y += vl-parse.o
 common-obj-y += qapi/
 endif
 
+remote-pci-obj-$(CONFIG_MPQEMU) += qapi/
+remote-pci-obj-$(CONFIG_MPQEMU) += blockdev-nbd.o
+remote-pci-obj-$(CONFIG_MPQEMU) += job-qmp.o
+remote-pci-obj-$(CONFIG_MPQEMU) += balloon.o
+
 #######################################################################
 # Target-independent parts used in system and user emulation
 common-obj-y += cpus-common.o
diff --git a/Makefile.target b/Makefile.target
index 0ca40f1..8010998 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -132,13 +132,31 @@ remote-pci-tgt-obj-$(CONFIG_MPQEMU) += accel/stubs/hax-stub.o
 remote-pci-tgt-obj-$(CONFIG_MPQEMU) += accel/stubs/whpx-stub.o
 remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/vl-stub.o
 remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/net-stub.o
-remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/monitor.o
 remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/replay.o
 remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/xen-mapcache.o
 remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/audio.o
 remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/monitor.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/migration.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/ui-stub.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/gdbstub.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/qapi-target.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/qapi-misc.o
 
 remote-pci-tgt-obj-$(CONFIG_MPQEMU) += remote/memory.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += arch_init.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += monitor/misc.o
+
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += qapi/qapi-introspect.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += qapi/qapi-commands-block-core.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += qapi/qapi-commands-block.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += qapi/qapi-commands-misc.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += qapi/qapi-commands.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += qapi/qapi-commands-machine-target.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += qapi/qapi-commands-misc-target.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += qapi/qapi-visit-machine-target.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += qapi/qapi-visit-misc-target.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += qapi/qapi-types-machine-target.o
+remote-pci-tgt-obj-$(CONFIG_MPQEMU) += qapi/qapi-types-misc-target.o
 
 #########################################################
 # Linux user emulator target
@@ -191,6 +209,10 @@ endif
 generated-files-y += hmp-commands.h hmp-commands-info.h
 generated-files-y += config-devices.h
 
+ifdef CONFIG_MPQEMU
+generated-files-y += hmp-scsi-commands.h hmp-scsi-commands-info.h
+endif
+
 endif # CONFIG_SOFTMMU
 
 dummy := $(call unnest-vars,,obj-y)
@@ -275,10 +297,18 @@ hmp-commands.h: $(SRC_PATH)/hmp-commands.hx $(SRC_PATH)/scripts/hxtool
 hmp-commands-info.h: $(SRC_PATH)/hmp-commands-info.hx $(SRC_PATH)/scripts/hxtool
 	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"GEN","$(TARGET_DIR)$@")
 
+ifdef CONFIG_MPQEMU
+hmp-scsi-commands.h: $(SRC_PATH)/hmp-commands.hx $(SRC_PATH)/scripts/hxtool
+	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -tgt scsi < $< > $@)
+
+hmp-scsi-commands-info.h: $(SRC_PATH)/hmp-commands-info.hx $(SRC_PATH)/scripts/hxtool
+	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -tgt scsi < $< > $@)
+endif
+
 clean: clean-target
 	rm -f *.a *~ $(PROGS)
 	rm -f $(shell find . -name '*.[od]')
-	rm -f hmp-commands.h gdbstub-xml.c
+	rm -f hmp-commands.h gdbstub-xml.c hmp-scsi-commands.h hmp-scsi-commands-info.h
 	rm -f trace/generated-helpers.c trace/generated-helpers.c-timestamp
 ifdef CONFIG_TRACE_SYSTEMTAP
 	rm -f *.stp
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 6d9674b..534f272 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -14,7 +14,8 @@ ETEXI
         .args_type  = "name:S?",
         .params     = "[cmd]",
         .help       = "show the help",
-        .cmd        = do_help_cmd,
+        .cmd        = hmp_do_help_cmd,
+        .targets    = "scsi",
         .flags      = "p",
     },
 
@@ -618,7 +619,7 @@ ETEXI
         .args_type  = "fmt:/,val:l",
         .params     = "/fmt expr",
         .help       = "print expression value (use $reg for CPU register access)",
-        .cmd        = do_print,
+        .cmd        = hmp_do_print,
     },
 
 STEXI
diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index 9ef6b42..721bc5f 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -44,3 +44,4 @@ remote-pci-obj-$(CONFIG_MPQEMU) += qdev-properties-system.o
 remote-pci-obj-$(CONFIG_MPQEMU) += qdev-fw.o
 remote-pci-obj-$(CONFIG_MPQEMU) += numa.o
 remote-pci-obj-$(CONFIG_MPQEMU) += cpu.o
+remote-pci-obj-$(CONFIG_MPQEMU) += machine-qmp-cmds.o
diff --git a/monitor/Makefile.objs b/monitor/Makefile.objs
index e91a858..11c42ec 100644
--- a/monitor/Makefile.objs
+++ b/monitor/Makefile.objs
@@ -1,3 +1,6 @@
 obj-y += misc.o
 common-obj-y += monitor.o qmp.o hmp.o
 common-obj-y += qmp-cmds.o hmp-cmds.o
+
+remote-pci-obj-$(CONFIG_MPQEMU) += monitor.o qmp.o hmp.o
+remote-pci-obj-$(CONFIG_MPQEMU) += qmp-cmds.o hmp-cmds.o
diff --git a/monitor/misc.c b/monitor/misc.c
index aef16f6..400ba06 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -176,12 +176,12 @@ int hmp_compare_cmd(const char *name, const char *list)
     return 0;
 }
 
-static void do_help_cmd(Monitor *mon, const QDict *qdict)
+void hmp_do_help_cmd(Monitor *mon, const QDict *qdict)
 {
     help_cmd(mon, qdict_get_try_str(qdict, "name"));
 }
 
-static void hmp_trace_event(Monitor *mon, const QDict *qdict)
+void hmp_trace_event(Monitor *mon, const QDict *qdict)
 {
     const char *tp_name = qdict_get_str(qdict, "name");
     bool new_state = qdict_get_bool(qdict, "option");
@@ -225,7 +225,7 @@ static void hmp_trace_file(Monitor *mon, const QDict *qdict)
 }
 #endif
 
-static void hmp_info_help(Monitor *mon, const QDict *qdict)
+void hmp_info_help(Monitor *mon, const QDict *qdict)
 {
     help_cmd(mon, "info");
 }
@@ -436,7 +436,7 @@ int monitor_get_cpu_index(void)
     return cs ? cs->cpu_index : UNASSIGNED_CPU_INDEX;
 }
 
-static void hmp_info_registers(Monitor *mon, const QDict *qdict)
+void hmp_info_registers(Monitor *mon, const QDict *qdict)
 {
     bool all_cpus = qdict_get_try_bool(qdict, "cpustate_all", false);
     CPUState *cs;
@@ -459,7 +459,7 @@ static void hmp_info_registers(Monitor *mon, const QDict *qdict)
 }
 
 #ifdef CONFIG_TCG
-static void hmp_info_jit(Monitor *mon, const QDict *qdict)
+void hmp_info_jit(Monitor *mon, const QDict *qdict)
 {
     if (!tcg_enabled()) {
         error_report("JIT information is only available with accel=tcg");
@@ -470,13 +470,13 @@ static void hmp_info_jit(Monitor *mon, const QDict *qdict)
     dump_drift_info();
 }
 
-static void hmp_info_opcount(Monitor *mon, const QDict *qdict)
+void hmp_info_opcount(Monitor *mon, const QDict *qdict)
 {
     dump_opcount_info();
 }
 #endif
 
-static void hmp_info_sync_profile(Monitor *mon, const QDict *qdict)
+void hmp_info_sync_profile(Monitor *mon, const QDict *qdict)
 {
     int64_t max = qdict_get_try_int(qdict, "max", 10);
     bool mean = qdict_get_try_bool(qdict, "mean", false);
@@ -487,7 +487,7 @@ static void hmp_info_sync_profile(Monitor *mon, const QDict *qdict)
     qsp_report(max, sort_by, coalesce);
 }
 
-static void hmp_info_history(Monitor *mon, const QDict *qdict)
+void hmp_info_history(Monitor *mon, const QDict *qdict)
 {
     MonitorHMP *hmp_mon = container_of(mon, MonitorHMP, common);
     int i;
@@ -507,7 +507,7 @@ static void hmp_info_history(Monitor *mon, const QDict *qdict)
     }
 }
 
-static void hmp_info_cpustats(Monitor *mon, const QDict *qdict)
+void hmp_info_cpustats(Monitor *mon, const QDict *qdict)
 {
     CPUState *cs = mon_get_cpu();
 
@@ -518,7 +518,7 @@ static void hmp_info_cpustats(Monitor *mon, const QDict *qdict)
     cpu_dump_statistics(cs, 0);
 }
 
-static void hmp_info_trace_events(Monitor *mon, const QDict *qdict)
+void hmp_info_trace_events(Monitor *mon, const QDict *qdict)
 {
     const char *name = qdict_get_try_str(qdict, "name");
     bool has_vcpu = qdict_haskey(qdict, "vcpu");
@@ -578,7 +578,7 @@ void qmp_client_migrate_info(const char *protocol, const char *hostname,
     error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "protocol", "spice");
 }
 
-static void hmp_logfile(Monitor *mon, const QDict *qdict)
+void hmp_logfile(Monitor *mon, const QDict *qdict)
 {
     Error *err = NULL;
 
@@ -588,7 +588,7 @@ static void hmp_logfile(Monitor *mon, const QDict *qdict)
     }
 }
 
-static void hmp_log(Monitor *mon, const QDict *qdict)
+void hmp_log(Monitor *mon, const QDict *qdict)
 {
     int mask;
     const char *items = qdict_get_str(qdict, "items");
@@ -605,7 +605,7 @@ static void hmp_log(Monitor *mon, const QDict *qdict)
     qemu_set_log(mask);
 }
 
-static void hmp_singlestep(Monitor *mon, const QDict *qdict)
+void hmp_singlestep(Monitor *mon, const QDict *qdict)
 {
     const char *option = qdict_get_try_str(qdict, "option");
     if (!option || !strcmp(option, "on")) {
@@ -617,7 +617,7 @@ static void hmp_singlestep(Monitor *mon, const QDict *qdict)
     }
 }
 
-static void hmp_gdbserver(Monitor *mon, const QDict *qdict)
+void hmp_gdbserver(Monitor *mon, const QDict *qdict)
 {
     const char *device = qdict_get_try_str(qdict, "device");
     if (!device)
@@ -633,7 +633,7 @@ static void hmp_gdbserver(Monitor *mon, const QDict *qdict)
     }
 }
 
-static void hmp_watchdog_action(Monitor *mon, const QDict *qdict)
+void hmp_watchdog_action(Monitor *mon, const QDict *qdict)
 {
     const char *action = qdict_get_str(qdict, "action");
     if (select_watchdog_action(action) == -1) {
@@ -775,7 +775,7 @@ static void memory_dump(Monitor *mon, int count, int format, int wsize,
     }
 }
 
-static void hmp_memory_dump(Monitor *mon, const QDict *qdict)
+void hmp_memory_dump(Monitor *mon, const QDict *qdict)
 {
     int count = qdict_get_int(qdict, "count");
     int format = qdict_get_int(qdict, "format");
@@ -785,7 +785,7 @@ static void hmp_memory_dump(Monitor *mon, const QDict *qdict)
     memory_dump(mon, count, format, size, addr, 0);
 }
 
-static void hmp_physical_memory_dump(Monitor *mon, const QDict *qdict)
+void hmp_physical_memory_dump(Monitor *mon, const QDict *qdict)
 {
     int count = qdict_get_int(qdict, "count");
     int format = qdict_get_int(qdict, "format");
@@ -815,7 +815,7 @@ static void *gpa2hva(MemoryRegion **p_mr, hwaddr addr, Error **errp)
     return qemu_map_ram_ptr(mrs.mr->ram_block, mrs.offset_within_region);
 }
 
-static void hmp_gpa2hva(Monitor *mon, const QDict *qdict)
+void hmp_gpa2hva(Monitor *mon, const QDict *qdict)
 {
     hwaddr addr = qdict_get_int(qdict, "addr");
     Error *local_err = NULL;
@@ -835,7 +835,7 @@ static void hmp_gpa2hva(Monitor *mon, const QDict *qdict)
     memory_region_unref(mr);
 }
 
-static void hmp_gva2gpa(Monitor *mon, const QDict *qdict)
+void hmp_gva2gpa(Monitor *mon, const QDict *qdict)
 {
     target_ulong addr = qdict_get_int(qdict, "addr");
     MemTxAttrs attrs;
@@ -890,7 +890,7 @@ out:
     return ret;
 }
 
-static void hmp_gpa2hpa(Monitor *mon, const QDict *qdict)
+void hmp_gpa2hpa(Monitor *mon, const QDict *qdict)
 {
     hwaddr addr = qdict_get_int(qdict, "addr");
     Error *local_err = NULL;
@@ -917,7 +917,7 @@ static void hmp_gpa2hpa(Monitor *mon, const QDict *qdict)
 }
 #endif
 
-static void do_print(Monitor *mon, const QDict *qdict)
+void hmp_do_print(Monitor *mon, const QDict *qdict)
 {
     int format = qdict_get_int(qdict, "format");
     hwaddr val = qdict_get_int(qdict, "val");
@@ -943,7 +943,7 @@ static void do_print(Monitor *mon, const QDict *qdict)
     monitor_printf(mon, "\n");
 }
 
-static void hmp_sum(Monitor *mon, const QDict *qdict)
+void hmp_sum(Monitor *mon, const QDict *qdict)
 {
     uint32_t addr;
     uint16_t sum;
@@ -963,7 +963,7 @@ static void hmp_sum(Monitor *mon, const QDict *qdict)
 
 static int mouse_button_state;
 
-static void hmp_mouse_move(Monitor *mon, const QDict *qdict)
+void hmp_mouse_move(Monitor *mon, const QDict *qdict)
 {
     int dx, dy, dz, button;
     const char *dx_str = qdict_get_str(qdict, "dx_str");
@@ -987,7 +987,7 @@ static void hmp_mouse_move(Monitor *mon, const QDict *qdict)
     qemu_input_event_sync();
 }
 
-static void hmp_mouse_button(Monitor *mon, const QDict *qdict)
+void hmp_mouse_button(Monitor *mon, const QDict *qdict)
 {
     static uint32_t bmap[INPUT_BUTTON__MAX] = {
         [INPUT_BUTTON_LEFT]       = MOUSE_EVENT_LBUTTON,
@@ -1004,7 +1004,7 @@ static void hmp_mouse_button(Monitor *mon, const QDict *qdict)
     mouse_button_state = button_state;
 }
 
-static void hmp_ioport_read(Monitor *mon, const QDict *qdict)
+void hmp_ioport_read(Monitor *mon, const QDict *qdict)
 {
     int size = qdict_get_int(qdict, "size");
     int addr = qdict_get_int(qdict, "addr");
@@ -1038,7 +1038,7 @@ static void hmp_ioport_read(Monitor *mon, const QDict *qdict)
                    suffix, addr, size * 2, val);
 }
 
-static void hmp_ioport_write(Monitor *mon, const QDict *qdict)
+void hmp_ioport_write(Monitor *mon, const QDict *qdict)
 {
     int size = qdict_get_int(qdict, "size");
     int addr = qdict_get_int(qdict, "addr");
@@ -1060,7 +1060,7 @@ static void hmp_ioport_write(Monitor *mon, const QDict *qdict)
     }
 }
 
-static void hmp_boot_set(Monitor *mon, const QDict *qdict)
+void hmp_boot_set(Monitor *mon, const QDict *qdict)
 {
     Error *local_err = NULL;
     const char *bootdevice = qdict_get_str(qdict, "bootdevice");
@@ -1073,7 +1073,7 @@ static void hmp_boot_set(Monitor *mon, const QDict *qdict)
     }
 }
 
-static void hmp_info_mtree(Monitor *mon, const QDict *qdict)
+void hmp_info_mtree(Monitor *mon, const QDict *qdict)
 {
     bool flatview = qdict_get_try_bool(qdict, "flatview", false);
     bool dispatch_tree = qdict_get_try_bool(qdict, "dispatch_tree", false);
@@ -1086,7 +1086,7 @@ static void hmp_info_mtree(Monitor *mon, const QDict *qdict)
 
 int64_t dev_time;
 
-static void hmp_info_profile(Monitor *mon, const QDict *qdict)
+void hmp_info_profile(Monitor *mon, const QDict *qdict)
 {
     static int64_t last_cpu_exec_time;
     int64_t cpu_exec_time;
@@ -1103,7 +1103,7 @@ static void hmp_info_profile(Monitor *mon, const QDict *qdict)
     dev_time = 0;
 }
 #else
-static void hmp_info_profile(Monitor *mon, const QDict *qdict)
+void hmp_info_profile(Monitor *mon, const QDict *qdict)
 {
     monitor_printf(mon, "Internal profiler not compiled\n");
 }
@@ -1112,7 +1112,7 @@ static void hmp_info_profile(Monitor *mon, const QDict *qdict)
 /* Capture support */
 static QLIST_HEAD (capture_list_head, CaptureState) capture_head;
 
-static void hmp_info_capture(Monitor *mon, const QDict *qdict)
+void hmp_info_capture(Monitor *mon, const QDict *qdict)
 {
     int i;
     CaptureState *s;
@@ -1123,7 +1123,7 @@ static void hmp_info_capture(Monitor *mon, const QDict *qdict)
     }
 }
 
-static void hmp_stopcapture(Monitor *mon, const QDict *qdict)
+void hmp_stopcapture(Monitor *mon, const QDict *qdict)
 {
     int i;
     int n = qdict_get_int(qdict, "n");
@@ -1139,7 +1139,7 @@ static void hmp_stopcapture(Monitor *mon, const QDict *qdict)
     }
 }
 
-static void hmp_wavcapture(Monitor *mon, const QDict *qdict)
+void hmp_wavcapture(Monitor *mon, const QDict *qdict)
 {
     const char *path = qdict_get_str(qdict, "path");
     int freq = qdict_get_try_int(qdict, "freq", 44100);
@@ -1192,7 +1192,7 @@ static void hmp_warn_acl(void)
     warn_acl = true;
 }
 
-static void hmp_acl_show(Monitor *mon, const QDict *qdict)
+void hmp_acl_show(Monitor *mon, const QDict *qdict)
 {
     const char *aclname = qdict_get_str(qdict, "aclname");
     QAuthZList *auth = find_auth(mon, aclname);
@@ -1219,7 +1219,7 @@ static void hmp_acl_show(Monitor *mon, const QDict *qdict)
     }
 }
 
-static void hmp_acl_reset(Monitor *mon, const QDict *qdict)
+void hmp_acl_reset(Monitor *mon, const QDict *qdict)
 {
     const char *aclname = qdict_get_str(qdict, "aclname");
     QAuthZList *auth = find_auth(mon, aclname);
@@ -1236,7 +1236,7 @@ static void hmp_acl_reset(Monitor *mon, const QDict *qdict)
     monitor_printf(mon, "acl: removed all rules\n");
 }
 
-static void hmp_acl_policy(Monitor *mon, const QDict *qdict)
+void hmp_acl_policy(Monitor *mon, const QDict *qdict)
 {
     const char *aclname = qdict_get_str(qdict, "aclname");
     const char *policy = qdict_get_str(qdict, "policy");
@@ -1277,7 +1277,7 @@ static QAuthZListFormat hmp_acl_get_format(const char *match)
     }
 }
 
-static void hmp_acl_add(Monitor *mon, const QDict *qdict)
+void hmp_acl_add(Monitor *mon, const QDict *qdict)
 {
     const char *aclname = qdict_get_str(qdict, "aclname");
     const char *match = qdict_get_str(qdict, "match");
@@ -1330,7 +1330,7 @@ static void hmp_acl_add(Monitor *mon, const QDict *qdict)
     }
 }
 
-static void hmp_acl_remove(Monitor *mon, const QDict *qdict)
+void hmp_acl_remove(Monitor *mon, const QDict *qdict)
 {
     const char *aclname = qdict_get_str(qdict, "aclname");
     const char *match = qdict_get_str(qdict, "match");
@@ -1799,13 +1799,21 @@ int monitor_fd_param(Monitor *mon, const char *fdname, Error **errp)
 
 /* Please update hmp-commands.hx when adding or changing commands */
 static HMPCommand hmp_info_cmds[] = {
+#if defined(SCSI_PROCESS)
+#include "hmp-scsi-commands-info.h"
+#else
 #include "hmp-commands-info.h"
+#endif
     { NULL, NULL, },
 };
 
 /* hmp_cmds and hmp_info_cmds would be sorted at runtime */
 HMPCommand hmp_cmds[] = {
+#if defined(SCSI_PROCESS)
+#include "hmp-scsi-commands.h"
+#else
 #include "hmp-commands.h"
+#endif
     { NULL, NULL, },
 };
 
diff --git a/monitor/monitor-internal.h b/monitor/monitor-internal.h
index d78f5ca..6ea7211 100644
--- a/monitor/monitor-internal.h
+++ b/monitor/monitor-internal.h
@@ -179,4 +179,42 @@ void help_cmd(Monitor *mon, const char *name);
 void handle_hmp_command(MonitorHMP *mon, const char *cmdline);
 int hmp_compare_cmd(const char *name, const char *list);
 
+void hmp_do_help_cmd(Monitor *mon, const QDict *qdict);
+void hmp_trace_event(Monitor *mon, const QDict *qdict);
+void hmp_info_help(Monitor *mon, const QDict *qdict);
+void hmp_info_registers(Monitor *mon, const QDict *qdict);
+void hmp_info_jit(Monitor *mon, const QDict *qdict);
+void hmp_info_opcount(Monitor *mon, const QDict *qdict);
+void hmp_info_sync_profile(Monitor *mon, const QDict *qdict);
+void hmp_info_history(Monitor *mon, const QDict *qdict);
+void hmp_info_cpustats(Monitor *mon, const QDict *qdict);
+void hmp_info_trace_events(Monitor *mon, const QDict *qdict);
+void hmp_logfile(Monitor *mon, const QDict *qdict);
+void hmp_log(Monitor *mon, const QDict *qdict);
+void hmp_singlestep(Monitor *mon, const QDict *qdict);
+void hmp_gdbserver(Monitor *mon, const QDict *qdict);
+void hmp_watchdog_action(Monitor *mon, const QDict *qdict);
+void hmp_memory_dump(Monitor *mon, const QDict *qdict);
+void hmp_physical_memory_dump(Monitor *mon, const QDict *qdict);
+void hmp_gpa2hva(Monitor *mon, const QDict *qdict);
+void hmp_gva2gpa(Monitor *mon, const QDict *qdict);
+void hmp_gpa2hpa(Monitor *mon, const QDict *qdict);
+void hmp_do_print(Monitor *mon, const QDict *qdict);
+void hmp_sum(Monitor *mon, const QDict *qdict);
+void hmp_mouse_move(Monitor *mon, const QDict *qdict);
+void hmp_mouse_button(Monitor *mon, const QDict *qdict);
+void hmp_ioport_read(Monitor *mon, const QDict *qdict);
+void hmp_ioport_write(Monitor *mon, const QDict *qdict);
+void hmp_boot_set(Monitor *mon, const QDict *qdict);
+void hmp_info_mtree(Monitor *mon, const QDict *qdict);
+void hmp_info_profile(Monitor *mon, const QDict *qdict);
+void hmp_info_capture(Monitor *mon, const QDict *qdict);
+void hmp_stopcapture(Monitor *mon, const QDict *qdict);
+void hmp_wavcapture(Monitor *mon, const QDict *qdict);
+void hmp_acl_show(Monitor *mon, const QDict *qdict);
+void hmp_acl_reset(Monitor *mon, const QDict *qdict);
+void hmp_acl_policy(Monitor *mon, const QDict *qdict);
+void hmp_acl_add(Monitor *mon, const QDict *qdict);
+void hmp_acl_remove(Monitor *mon, const QDict *qdict);
+
 #endif
diff --git a/qapi/Makefile.objs b/qapi/Makefile.objs
index dd3f5e6..059ad08 100644
--- a/qapi/Makefile.objs
+++ b/qapi/Makefile.objs
@@ -30,3 +30,5 @@ obj-y += $(QAPI_TARGET_MODULES:%=qapi-events-%.o)
 obj-y += qapi-events.o
 obj-y += $(QAPI_TARGET_MODULES:%=qapi-commands-%.o)
 obj-y += qapi-commands.o
+
+remote-pci-obj-$(CONFIG_MPQEMU) += $(QAPI_COMMON_MODULES:%=qapi-commands-%.o)
diff --git a/qom/Makefile.objs b/qom/Makefile.objs
index 07e50e5..16603d7 100644
--- a/qom/Makefile.objs
+++ b/qom/Makefile.objs
@@ -5,3 +5,4 @@ common-obj-$(CONFIG_SOFTMMU) += qom-hmp-cmds.o qom-qmp-cmds.o
 
 remote-pci-obj-$(CONFIG_MPQEMU) += object.o qom-qobject.o container.o
 remote-pci-obj-$(CONFIG_MPQEMU) += object_interfaces.o
+remote-pci-obj-$(CONFIG_MPQEMU) += qom-qmp-cmds.o qom-hmp-cmds.o
diff --git a/ui/Makefile.objs b/ui/Makefile.objs
index e6da6ff..c3ac572 100644
--- a/ui/Makefile.objs
+++ b/ui/Makefile.objs
@@ -68,3 +68,5 @@ console-gl.o-libs += $(OPENGL_LIBS)
 egl-helpers.o-libs += $(OPENGL_LIBS)
 egl-context.o-libs += $(OPENGL_LIBS)
 egl-headless.o-libs += $(OPENGL_LIBS)
+
+remote-lsi-obj-$(CONFIG_MPQEMU) += vnc-stubs.o
-- 
1.8.3.1




* [RFC v4 PATCH 37/49] multi-process/mon: Refactor monitor/chardev functions out of vl.c
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (35 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 36/49] multi-process/mon: enable QMP module support in the " Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 38/49] multi-process/mon: Initialize QMP module for remote processes Jagannathan Raman
                   ` (16 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

Some of the monitor and chardev initialization helpers in vl.c are also
needed by the remote process. Move these functions into shared files
(monitor/monitor.c and chardev/char.c) so that both QEMU and the remote
process can use them.
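
For readers skimming the diff, the "mode" handling that mon_init_func()
carries with it into monitor/monitor.c can be sketched standalone as
below. This is only an illustration: MonMode and mon_mode_from_opt() are
hypothetical names that do not exist in QEMU, and the real function sets
a local "bool qmp" and reports errors through errp instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type; the real code uses a local "bool qmp". */
typedef enum {
    MON_MODE_HMP,      /* "readline": human monitor (HMP) */
    MON_MODE_QMP,      /* "control": QMP monitor */
    MON_MODE_INVALID   /* any other string is rejected */
} MonMode;

/* Map the "mode" option string to a monitor type, as mon_init_func() does. */
MonMode mon_mode_from_opt(const char *mode)
{
    if (mode == NULL) {
        mode = "readline";      /* same default as mon_init_func() */
    }
    if (strcmp(mode, "readline") == 0) {
        return MON_MODE_HMP;
    }
    if (strcmp(mode, "control") == 0) {
        return MON_MODE_QMP;
    }
    return MON_MODE_INVALID;    /* real code: error_setg() and return -1 */
}
```

Because this logic now lives in a shared file, the remote process applies
the same default and the same rejection of unknown modes as the main QEMU
binary.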

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 New patch in v3

 v3 -> v4
   - Moved monitor related functions to monitor.c and chardev functions
     to char.c

 chardev/char.c            | 14 ++++++++
 include/chardev/char.h    |  1 +
 include/monitor/monitor.h |  2 ++
 monitor/monitor.c         | 83 +++++++++++++++++++++++++++++++++++++++++-
 remote/remote-main.c      |  1 +
 remote/remote-opts.c      |  1 +
 vl-parse.c                | 35 +++++++++---------
 vl.c                      | 91 -----------------------------------------------
 8 files changed, 119 insertions(+), 109 deletions(-)

diff --git a/chardev/char.c b/chardev/char.c
index 7b6b2cb..9f339ec 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -1168,4 +1168,18 @@ static void register_types(void)
     qemu_add_machine_init_done_notifier(&chardev_machine_done_notify);
 }
 
+int chardev_init_func(void *opaque, QemuOpts *opts, Error **errp)
+{
+    Error *local_err = NULL;
+
+    if (!qemu_chr_new_from_opts(opts, NULL, &local_err)) {
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return -1;
+        }
+        exit(0);
+    }
+    return 0;
+}
+
 type_init(register_types);
diff --git a/include/chardev/char.h b/include/chardev/char.h
index 087b202..bad076c 100644
--- a/include/chardev/char.h
+++ b/include/chardev/char.h
@@ -290,4 +290,5 @@ GSource *qemu_chr_timeout_add_ms(Chardev *chr, guint ms,
 /* console.c */
 void qemu_chr_parse_vc(QemuOpts *opts, ChardevBackend *backend, Error **errp);
 
+int chardev_init_func(void *opaque, QemuOpts *opts, Error **errp);
 #endif
diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index a81eeff..ed1963a 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -42,5 +42,7 @@ int monitor_fdset_get_fd(int64_t fdset_id, int flags);
 int monitor_fdset_dup_fd_add(int64_t fdset_id, int dup_fd);
 void monitor_fdset_dup_fd_remove(int dup_fd);
 int64_t monitor_fdset_dup_fd_find(int dup_fd);
+void monitor_parse(const char *optarg, const char *mode, bool pretty);
+int mon_init_func(void *opaque, QemuOpts *opts, Error **errp);
 
 #endif /* MONITOR_H */
diff --git a/monitor/monitor.c b/monitor/monitor.c
index 12898b6..18bcb57 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -33,7 +33,10 @@
 #include "sysemu/qtest.h"
 #include "sysemu/sysemu.h"
 #include "trace.h"
-
+#include "qemu/cutils.h"
+#include "qemu/option.h"
+#include "qemu-options.h"
+#include "qemu/config-file.h"
 /*
  * To prevent flooding clients, events can be throttled. The
  * throttling is calculated globally, rather than per-Monitor
@@ -609,6 +612,84 @@ void monitor_init_globals_core(void)
                                    NULL);
 }
 
+void monitor_parse(const char *optarg, const char *mode, bool pretty)
+{
+    static int monitor_device_index;
+    QemuOpts *opts;
+    const char *p;
+    char label[32];
+
+    if (strstart(optarg, "chardev:", &p)) {
+        snprintf(label, sizeof(label), "%s", p);
+    } else {
+        snprintf(label, sizeof(label), "compat_monitor%d",
+                 monitor_device_index);
+        opts = qemu_chr_parse_compat(label, optarg, true);
+        if (!opts) {
+            error_report("parse error: %s", optarg);
+            exit(1);
+        }
+    }
+
+    opts = qemu_opts_create(qemu_find_opts("mon"), label, 1, &error_fatal);
+    qemu_opt_set(opts, "mode", mode, &error_abort);
+    qemu_opt_set(opts, "chardev", label, &error_abort);
+    if (!strcmp(mode, "control")) {
+        qemu_opt_set_bool(opts, "pretty", pretty, &error_abort);
+    } else {
+        assert(pretty == false);
+    }
+    monitor_device_index++;
+}
+
+int mon_init_func(void *opaque, QemuOpts *opts, Error **errp)
+{
+    Chardev *chr;
+    bool qmp;
+    bool pretty = false;
+    const char *chardev;
+    const char *mode;
+
+    mode = qemu_opt_get(opts, "mode");
+    if (mode == NULL) {
+        mode = "readline";
+    }
+    if (strcmp(mode, "readline") == 0) {
+        qmp = false;
+    } else if (strcmp(mode, "control") == 0) {
+        qmp = true;
+    } else {
+        error_setg(errp, "unknown monitor mode \"%s\"", mode);
+        return -1;
+    }
+
+    if (!qmp && qemu_opt_get(opts, "pretty")) {
+        warn_report("'pretty' is deprecated for HMP monitors, it has no effect "
+                    "and will be removed in future versions");
+    }
+    if (qemu_opt_get_bool(opts, "pretty", 0)) {
+        pretty = true;
+    }
+
+    chardev = qemu_opt_get(opts, "chardev");
+    if (!chardev) {
+        error_report("chardev is required");
+        exit(1);
+    }
+    chr = qemu_chr_find(chardev);
+    if (chr == NULL) {
+        error_setg(errp, "chardev \"%s\" not found", chardev);
+        return -1;
+    }
+
+    if (qmp) {
+        monitor_init_qmp(chr, pretty);
+    } else {
+        monitor_init_hmp(chr, true);
+    }
+    return 0;
+}
+
 QemuOptsList qemu_mon_opts = {
     .name = "mon",
     .implied_opt_name = "chardev",
diff --git a/remote/remote-main.c b/remote/remote-main.c
index 4459d26..30182dc 100644
--- a/remote/remote-main.c
+++ b/remote/remote-main.c
@@ -67,6 +67,7 @@
 #include "qemu/cutils.h"
 #include "remote-opts.h"
 #include "monitor/monitor.h"
+#include "chardev/char.h"
 #include "sysemu/reset.h"
 
 static MPQemuLinkState *mpqemu_link;
diff --git a/remote/remote-opts.c b/remote/remote-opts.c
index 0ebe6b1..7de6195 100644
--- a/remote/remote-opts.c
+++ b/remote/remote-opts.c
@@ -55,6 +55,7 @@
 #include "block/block.h"
 #include "remote/remote-opts.h"
 #include "include/qemu-common.h"
+#include "monitor/monitor.h"
 
 #include "vl.h"
 /*
diff --git a/vl-parse.c b/vl-parse.c
index 4e2bd7c..011a9ef 100644
--- a/vl-parse.c
+++ b/vl-parse.c
@@ -110,49 +110,50 @@ int drive_init_func(void *opaque, QemuOpts *opts, Error **errp)
     return 0;
 }
 
-#if defined(CONFIG_MPQEMU)
-int rdrive_init_func(void *opaque, QemuOpts *opts, Error **errp)
+int device_init_func(void *opaque, QemuOpts *opts, Error **errp)
 {
     DeviceState *dev;
+    const char *remote = NULL;
 
-    dev = qdev_remote_add(opts, false /* this is drive */, errp);
+    remote = qemu_opt_get(opts, "rid");
+    if (remote) {
+        return 0;
+    }
+
+    dev = qdev_device_add(opts, errp);
     if (!dev) {
-        error_setg(errp, "qdev_remote_add failed for drive.");
         return -1;
     }
     object_unref(OBJECT(dev));
     return 0;
 }
-#endif
 
 #if defined(CONFIG_MPQEMU)
-int rdevice_init_func(void *opaque, QemuOpts *opts, Error **errp)
+int rdrive_init_func(void *opaque, QemuOpts *opts, Error **errp)
 {
     DeviceState *dev;
 
-    dev = qdev_remote_add(opts, true /* this is device */, errp);
+    dev = qdev_remote_add(opts, false /* this is drive */, errp);
     if (!dev) {
-        error_setg(errp, "qdev_remote_add failed for device.");
+        error_setg(errp, "qdev_remote_add failed for drive.");
         return -1;
     }
+    object_unref(OBJECT(dev));
     return 0;
 }
 #endif
 
-int device_init_func(void *opaque, QemuOpts *opts, Error **errp)
+#if defined(CONFIG_MPQEMU)
+int rdevice_init_func(void *opaque, QemuOpts *opts, Error **errp)
 {
     DeviceState *dev;
-    const char *remote = NULL;
 
-    remote = qemu_opt_get(opts, "rid");
-    if (remote) {
-        return 0;
-    }
-
-    dev = qdev_device_add(opts, errp);
+    dev = qdev_remote_add(opts, true /* this is device */, errp);
     if (!dev) {
+        error_setg(errp, "qdev_remote_add failed for device.");
         return -1;
     }
-    object_unref(OBJECT(dev));
     return 0;
 }
+#endif
+
diff --git a/vl.c b/vl.c
index 1417ff2..a6a0db8 100644
--- a/vl.c
+++ b/vl.c
@@ -2279,19 +2279,6 @@ static int device_help_func(void *opaque, QemuOpts *opts, Error **errp)
     return qdev_device_help(opts);
 }
 
-static int chardev_init_func(void *opaque, QemuOpts *opts, Error **errp)
-{
-    Error *local_err = NULL;
-
-    if (!qemu_chr_new_from_opts(opts, NULL, &local_err)) {
-        if (local_err) {
-            error_propagate(errp, local_err);
-            return -1;
-        }
-        exit(0);
-    }
-    return 0;
-}
 
 #ifdef CONFIG_VIRTFS
 static int fsdev_init_func(void *opaque, QemuOpts *opts, Error **errp)
@@ -2300,84 +2287,6 @@ static int fsdev_init_func(void *opaque, QemuOpts *opts, Error **errp)
 }
 #endif
 
-static int mon_init_func(void *opaque, QemuOpts *opts, Error **errp)
-{
-    Chardev *chr;
-    bool qmp;
-    bool pretty = false;
-    const char *chardev;
-    const char *mode;
-
-    mode = qemu_opt_get(opts, "mode");
-    if (mode == NULL) {
-        mode = "readline";
-    }
-    if (strcmp(mode, "readline") == 0) {
-        qmp = false;
-    } else if (strcmp(mode, "control") == 0) {
-        qmp = true;
-    } else {
-        error_setg(errp, "unknown monitor mode \"%s\"", mode);
-        return -1;
-    }
-
-    if (!qmp && qemu_opt_get(opts, "pretty")) {
-        warn_report("'pretty' is deprecated for HMP monitors, it has no effect "
-                    "and will be removed in future versions");
-    }
-    if (qemu_opt_get_bool(opts, "pretty", 0)) {
-        pretty = true;
-    }
-
-    chardev = qemu_opt_get(opts, "chardev");
-    if (!chardev) {
-        error_report("chardev is required");
-        exit(1);
-    }
-    chr = qemu_chr_find(chardev);
-    if (chr == NULL) {
-        error_setg(errp, "chardev \"%s\" not found", chardev);
-        return -1;
-    }
-
-    if (qmp) {
-        monitor_init_qmp(chr, pretty);
-    } else {
-        monitor_init_hmp(chr, true);
-    }
-    return 0;
-}
-
-static void monitor_parse(const char *optarg, const char *mode, bool pretty)
-{
-    static int monitor_device_index = 0;
-    QemuOpts *opts;
-    const char *p;
-    char label[32];
-
-    if (strstart(optarg, "chardev:", &p)) {
-        snprintf(label, sizeof(label), "%s", p);
-    } else {
-        snprintf(label, sizeof(label), "compat_monitor%d",
-                 monitor_device_index);
-        opts = qemu_chr_parse_compat(label, optarg, true);
-        if (!opts) {
-            error_report("parse error: %s", optarg);
-            exit(1);
-        }
-    }
-
-    opts = qemu_opts_create(qemu_find_opts("mon"), label, 1, &error_fatal);
-    qemu_opt_set(opts, "mode", mode, &error_abort);
-    qemu_opt_set(opts, "chardev", label, &error_abort);
-    if (!strcmp(mode, "control")) {
-        qemu_opt_set_bool(opts, "pretty", pretty, &error_abort);
-    } else {
-        assert(pretty == false);
-    }
-    monitor_device_index++;
-}
-
 struct device_config {
     enum {
         DEV_USB,       /* -usbdevice     */
-- 
1.8.3.1




* [RFC v4 PATCH 38/49] multi-process/mon: Initialize QMP module for remote processes
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (36 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 37/49] multi-process/mon: Refactor monitor/chardev functions out of vl.c Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 39/49] multi-process: prevent duplicate memory initialization in remote Jagannathan Raman
                   ` (15 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 New patch in v3

 remote/remote-main.c | 11 +++++++++++
 remote/remote-opts.c | 10 ++++++++++
 2 files changed, 21 insertions(+)

diff --git a/remote/remote-main.c b/remote/remote-main.c
index 30182dc..341b7cf 100644
--- a/remote/remote-main.c
+++ b/remote/remote-main.c
@@ -69,6 +69,7 @@
 #include "monitor/monitor.h"
 #include "chardev/char.h"
 #include "sysemu/reset.h"
+#include "vl.h"
 
 static MPQemuLinkState *mpqemu_link;
 
@@ -476,6 +477,8 @@ int main(int argc, char *argv[])
 
     module_call_init(MODULE_INIT_QOM);
 
+    monitor_init_globals();
+
     bdrv_init_with_whitelist();
 
     if (qemu_init_main_loop(&err)) {
@@ -491,6 +494,8 @@ int main(int argc, char *argv[])
 
     qemu_add_opts(&qemu_device_opts);
     qemu_add_opts(&qemu_drive_opts);
+    qemu_add_opts(&qemu_chardev_opts);
+    qemu_add_opts(&qemu_mon_opts);
     qemu_add_drive_opts(&qemu_legacy_drive_opts);
     qemu_add_drive_opts(&qemu_common_drive_opts);
     qemu_add_drive_opts(&qemu_drive_opts);
@@ -521,6 +526,12 @@ int main(int argc, char *argv[])
 
     mpqemu_link_set_callback(mpqemu_link, process_msg);
 
+    qemu_opts_foreach(qemu_find_opts("chardev"),
+                      chardev_init_func, NULL, &error_fatal);
+
+    qemu_opts_foreach(qemu_find_opts("mon"),
+                      mon_init_func, NULL, &error_fatal);
+
     mpqemu_start_coms(mpqemu_link);
 
     return 0;
diff --git a/remote/remote-opts.c b/remote/remote-opts.c
index 7de6195..1b1824e 100644
--- a/remote/remote-opts.c
+++ b/remote/remote-opts.c
@@ -98,6 +98,16 @@ void parse_cmdline(int argc, char **argv, char **envp)
                     exit(1);
                 }
                 break;
+            case QEMU_OPTION_qmp:
+                monitor_parse(optarg, "control", false);
+                break;
+            case QEMU_OPTION_monitor:
+                if (!strncmp(optarg, "stdio", 5)) {
+                    warn_report("STDIO not supported in remote process");
+                } else if (strncmp(optarg, "none", 4)) {
+                    monitor_parse(optarg, "readline", false);
+                }
+                break;
             default:
                 break;
             }
-- 
1.8.3.1




* [RFC v4 PATCH 39/49] multi-process: prevent duplicate memory initialization in remote
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (37 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 38/49] multi-process/mon: Initialize QMP module for remote processes Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 40/49] multi-process/mig: build migration module in the remote process Jagannathan Raman
                   ` (14 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

When multiple controllers are configured in a remote process, it is
better for the memory to be managed by only one of the proxy objects
for that process, in order to conserve file descriptors. Add a
"mem_init" flag for this purpose.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 New patch in v3
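
 The once-only guard this patch introduces can be sketched in isolation.
 This is a minimal illustration, not the patch itself: ProxySketch and
 every *_sketch function below are hypothetical stand-ins for PCIProxyDev,
 configure_memory_sync(), init_proxy() and qdev_proxy_add(). A counter
 replaces the real memory-sync setup so the effect can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PCIProxyDev; only what the guard needs. */
typedef struct ProxySketch {
    bool mem_init;    /* true once memory sync is set up for the process */
    int *sync_calls;  /* counts how often the setup actually ran */
} ProxySketch;

/* Stand-in for configure_memory_sync(): just records the call. */
static void configure_memory_sync_sketch(ProxySketch *p)
{
    (*p->sync_calls)++;
}

/* Mirrors the guard added to init_proxy(). */
static void init_proxy_sketch(ProxySketch *p)
{
    if (!p->mem_init) {
        p->mem_init = true;
        configure_memory_sync_sketch(p);
    }
}

/*
 * Mirrors the qdev_proxy_add() change: a proxy that reuses an existing
 * remote process (old != NULL) is marked as already initialized.
 */
static void proxy_add_sketch(ProxySketch *p, const ProxySketch *old,
                             int *counter)
{
    p->sync_calls = counter;
    p->mem_init = (old != NULL);
}

/* Number of memory-sync setups for nr_proxies sharing one remote process. */
static int count_sync_setups(int nr_proxies)
{
    ProxySketch first = {0};
    int calls = 0;
    int i;

    assert(nr_proxies >= 0);
    if (nr_proxies == 0) {
        return 0;
    }

    proxy_add_sketch(&first, NULL, &calls);
    init_proxy_sketch(&first);

    for (i = 1; i < nr_proxies; i++) {
        ProxySketch next = {0};

        proxy_add_sketch(&next, &first, &calls);
        init_proxy_sketch(&next);
    }
    return calls;
}
```

 In this sketch count_sync_setups() yields one setup whether one or
 several proxies share the remote process, which is the behaviour the
 flag is meant to provide.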

 hw/proxy/qemu-proxy.c         | 13 ++++++++++++-
 include/hw/proxy/qemu-proxy.h |  1 +
 qdev-monitor.c                |  2 +-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
index 5aada67..623a6c5 100644
--- a/hw/proxy/qemu-proxy.c
+++ b/hw/proxy/qemu-proxy.c
@@ -350,6 +350,13 @@ static void pci_proxy_write_config(PCIDevice *d, uint32_t addr, uint32_t val,
     config_op_send(PCI_PROXY_DEV(d), addr, &val, l, CONF_WRITE);
 }
 
+static void pci_proxy_dev_inst_init(Object *obj)
+{
+    PCIProxyDev *dev = PCI_PROXY_DEV(obj);
+
+    dev->mem_init = false;
+}
+
 static void pci_proxy_dev_class_init(ObjectClass *klass, void *data)
 {
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
@@ -364,6 +371,7 @@ static const TypeInfo pci_proxy_dev_type_info = {
     .name          = TYPE_PCI_PROXY_DEV,
     .parent        = TYPE_PCI_DEVICE,
     .instance_size = sizeof(PCIProxyDev),
+    .instance_init = pci_proxy_dev_inst_init,
     .abstract      = true,
     .class_size    = sizeof(PCIProxyDevClass),
     .class_init    = pci_proxy_dev_class_init,
@@ -460,7 +468,10 @@ static void init_proxy(PCIDevice *dev, char *command, bool need_spawn, Error **e
     mpqemu_init_channel(pdev->mpqemu_link, &pdev->mpqemu_link->mmio,
                         pdev->mmio_sock);
 
-    configure_memory_sync(pdev->sync, pdev->mpqemu_link);
+    if (!pdev->mem_init) {
+        pdev->mem_init = true;
+        configure_memory_sync(pdev->sync, pdev->mpqemu_link);
+    }
 }
 
 static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
diff --git a/include/hw/proxy/qemu-proxy.h b/include/hw/proxy/qemu-proxy.h
index 672303c..17e07ac 100644
--- a/include/hw/proxy/qemu-proxy.h
+++ b/include/hw/proxy/qemu-proxy.h
@@ -63,6 +63,7 @@ struct PCIProxyDev {
     MPQemuLinkState *mpqemu_link;
 
     RemoteMemSync *sync;
+    bool mem_init;
     struct kvm_irqfd irqfd;
 
     EventNotifier intr;
diff --git a/qdev-monitor.c b/qdev-monitor.c
index f38849e..2a2c10b 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -713,10 +713,10 @@ DeviceState *qdev_proxy_add(const char *rid, const char *id, char *bus,
         pdev->socket = old_pdev->socket;
         pdev->mmio_sock = old_pdev->mmio_sock;
         pdev->remote_pid = old_pdev->remote_pid;
+        pdev->mem_init = true;
     } else {
         pdev->rsocket = managed ? rsocket : -1;
         pdev->socket = managed ? rsocket : -1;
-
     }
     pdev->managed = managed;
 
-- 
1.8.3.1




* [RFC v4 PATCH 40/49] multi-process/mig: build migration module in the remote process
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (38 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 39/49] multi-process: prevent duplicate memory initialization in remote Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 41/49] multi-process/mig: Enable VMSD save in the Proxy object Jagannathan Raman
                   ` (13 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

Add Makefile support to enable migration in the remote process.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 New patch in v4

 Makefile.objs           |  4 ++++
 Makefile.target         |  1 +
 migration/Makefile.objs | 12 +++++++++++-
 net/Makefile.objs       |  2 ++
 replay/Makefile.objs    |  2 +-
 stubs/migration.c       | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 stubs/net-stub.c        | 21 +++++++++++++++++++++
 stubs/qapi-misc.c       |  2 ++
 stubs/replay.c          |  8 ++++++++
 stubs/vl-stub.c         | 22 ++++++++++++++++++++++
 vl-parse.c              |  3 +++
 vl.c                    |  2 --
 12 files changed, 124 insertions(+), 4 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index c72db88..ebb1938 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -79,6 +79,8 @@ common-obj-y += qdev-monitor.o device-hotplug.o
 common-obj-$(CONFIG_WIN32) += os-win32.o
 common-obj-$(CONFIG_POSIX) += os-posix.o
 
+remote-pci-obj-$(CONFIG_POSIX) += os-posix.o
+
 common-obj-$(CONFIG_LINUX) += fsdev/
 
 common-obj-y += migration/
@@ -110,6 +112,8 @@ common-obj-$(CONFIG_FDT) += device_tree.o
 
 common-obj-y += vl-parse.o
 
+remote-pci-obj-$(CONFIG_MPQEMU) += net/
+
 ######################################################################
 # qapi
 
diff --git a/Makefile.target b/Makefile.target
index 8010998..b60a837 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -225,6 +225,7 @@ all-remote-pci-obj-y += memory.o
 all-remote-pci-obj-y += exec.o
 all-remote-pci-obj-y += ioport.o
 all-remote-pci-obj-y += cpus.o
+all-remote-pci-obj-y += migration/ram.o
 
 remote-pci-obj-y :=
 remote-lsi-obj-y :=
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 016b6ab..c9682c6 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -14,4 +14,14 @@ common-obj-$(CONFIG_LIVE_BLOCK_MIGRATION) += block.o
 
 rdma.o-libs := $(RDMA_LIBS)
 
-remote-pci-obj-$(CONFIG_MPQEMU) += qemu-file.o vmstate.o qjson.o vmstate-types.o
+remote-pci-obj-$(CONFIG_MPQEMU) += migration.o socket.o fd.o exec.o
+remote-pci-obj-$(CONFIG_MPQEMU) += tls.o channel.o savevm.o
+remote-pci-obj-$(CONFIG_MPQEMU) += colo.o colo-failover.o
+remote-pci-obj-$(CONFIG_MPQEMU) += vmstate.o vmstate-types.o page_cache.o
+remote-pci-obj-$(CONFIG_MPQEMU) += qemu-file.o global_state.o
+remote-pci-obj-$(CONFIG_MPQEMU) += qemu-file-channel.o
+remote-pci-obj-$(CONFIG_MPQEMU) += xbzrle.o postcopy-ram.o
+remote-pci-obj-$(CONFIG_MPQEMU) += qjson.o
+remote-pci-obj-$(CONFIG_MPQEMU) += block-dirty-bitmap.o
+remote-pci-obj-$(CONFIG_RDMA) += rdma.o
+remote-pci-obj-$(CONFIG_MPQEMU) += block.o
diff --git a/net/Makefile.objs b/net/Makefile.objs
index c5d076d..a8ad986 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -30,3 +30,5 @@ common-obj-$(CONFIG_WIN32) += tap-win32.o
 vde.o-libs = $(VDE_LIBS)
 
 common-obj-$(CONFIG_CAN_BUS) += can/
+
+remote-pci-obj-$(CONFIG_MPQEMU) += announce.o
diff --git a/replay/Makefile.objs b/replay/Makefile.objs
index cee6539..c64b968 100644
--- a/replay/Makefile.objs
+++ b/replay/Makefile.objs
@@ -6,4 +6,4 @@ common-obj-y += replay-input.o
 common-obj-y += replay-char.o
 common-obj-y += replay-snapshot.o
 common-obj-y += replay-net.o
-common-obj-y += replay-audio.o
\ No newline at end of file
+common-obj-y += replay-audio.o
diff --git a/stubs/migration.c b/stubs/migration.c
index 28ccf80..dbd12db 100644
--- a/stubs/migration.c
+++ b/stubs/migration.c
@@ -6,6 +6,35 @@
 #include "qapi/qapi-types-migration.h"
 #include "qapi/qapi-commands-migration.h"
 #include "qapi/qapi-types-net.h"
+#include "net/filter.h"
+#include "net/colo-compare.h"
+
+#pragma weak qmp_query_migrate_capabilities
+#pragma weak qmp_query_migrate_parameters
+#pragma weak migrate_announce_params
+#pragma weak qmp_query_migrate
+#pragma weak qmp_migrate_set_capabilities
+#pragma weak qmp_migrate_set_parameters
+#pragma weak qmp_migrate_incoming
+#pragma weak qmp_migrate_recover
+#pragma weak qmp_migrate_pause
+#pragma weak qmp_migrate
+#pragma weak qmp_migrate_cancel
+#pragma weak qmp_migrate_continue
+#pragma weak qmp_migrate_set_cache_size
+#pragma weak qmp_query_migrate_cache_size
+#pragma weak qmp_migrate_set_speed
+#pragma weak qmp_migrate_set_downtime
+#pragma weak qmp_migrate_start_postcopy
+#pragma weak migration_global_dump
+#pragma weak save_snapshot
+#pragma weak qmp_xen_save_devices_state
+#pragma weak load_snapshot
+#pragma weak qmp_xen_set_replication
+#pragma weak qmp_query_xen_replication_status
+#pragma weak qmp_xen_colo_do_checkpoint
+#pragma weak qmp_query_colo_status
+#pragma weak qmp_x_colo_lost_heartbeat
 
 MigrationInfo *qmp_query_migrate(Error **errp)
 {
@@ -160,3 +189,23 @@ AnnounceParameters *migrate_announce_params(void)
 
     return NULL;
 }
+
+void colo_notify_filters_event(int event, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+void colo_notify_compares_event(void *opaque, int event, Error **errp)
+{
+    qemu_debug_assert(0);
+}
+
+void colo_compare_register_notifier(Notifier *notify)
+{
+    qemu_debug_assert(0);
+}
+
+void colo_compare_unregister_notifier(Notifier *notify)
+{
+    qemu_debug_assert(0);
+}
diff --git a/stubs/net-stub.c b/stubs/net-stub.c
index 962827e..ddfd1e4 100644
--- a/stubs/net-stub.c
+++ b/stubs/net-stub.c
@@ -5,6 +5,8 @@
 #include "qapi/qapi-commands-net.h"
 #include "qapi/qapi-commands-rocker.h"
 
+#pragma weak qmp_announce_self
+
 int qemu_find_net_clients_except(const char *id, NetClientState **ncs,
                                  NetClientDriver type, int max)
 {
@@ -98,3 +100,22 @@ void netdev_add(QemuOpts *opts, Error **errp)
 {
     qemu_debug_assert(0);
 }
+
+NetClientState *qemu_get_queue(NICState *nic)
+{
+    qemu_debug_assert(0);
+
+    return NULL;
+}
+
+ssize_t qemu_send_packet_raw(NetClientState *nc, const uint8_t *buf, int size)
+{
+    qemu_debug_assert(0);
+
+    return 0;
+}
+
+void qemu_foreach_nic(qemu_nic_foreach func, void *opaque)
+{
+    qemu_debug_assert(0);
+}
diff --git a/stubs/qapi-misc.c b/stubs/qapi-misc.c
index 3eeedd9..824eac1 100644
--- a/stubs/qapi-misc.c
+++ b/stubs/qapi-misc.c
@@ -5,6 +5,8 @@
 #include "./qapi/qapi-types-dump.h"
 #include "qapi/qapi-commands-dump.h"
 
+#pragma weak qmp_xen_load_devices_state
+
 void qmp_dump_guest_memory(bool paging, const char *file,
                            bool has_detach, bool detach,
                            bool has_begin, int64_t begin, bool has_length,
diff --git a/stubs/replay.c b/stubs/replay.c
index 4a966ff..6fcd995 100644
--- a/stubs/replay.c
+++ b/stubs/replay.c
@@ -1,4 +1,5 @@
 #include "qemu/osdep.h"
+#include "qemu-common.h"
 #include "sysemu/replay.h"
 
 ReplayMode replay_mode;
@@ -97,3 +98,10 @@ void replay_account_executed_instructions(void)
 void replay_add_blocker(Error *reason)
 {
 }
+
+bool replay_can_snapshot(void)
+{
+    qemu_debug_assert(0);
+
+    return false;
+}
diff --git a/stubs/vl-stub.c b/stubs/vl-stub.c
index 606f078..460d1f3 100644
--- a/stubs/vl-stub.c
+++ b/stubs/vl-stub.c
@@ -169,3 +169,25 @@ int wav_start_capture(AudioState *state, CaptureState *s, const char *path,
 
     return -1;
 }
+
+void qemu_system_killed(int signal, pid_t pid)
+{
+    qemu_debug_assert(0);
+}
+
+void qemu_system_reset(ShutdownCause reason)
+{
+    qemu_debug_assert(0);
+}
+
+bool runstate_store(char *str, size_t size)
+{
+    qemu_debug_assert(0);
+
+    return false;
+}
+
+void qemu_add_exit_notifier(Notifier *notify)
+{
+    qemu_debug_assert(0);
+}
diff --git a/vl-parse.c b/vl-parse.c
index 011a9ef..1c8ecbe 100644
--- a/vl-parse.c
+++ b/vl-parse.c
@@ -41,6 +41,9 @@
 
 #include "vl.h"
 
+int only_migratable; /* turn it off unless user states otherwise */
+bool enable_mlock;
+
 /***********************************************************/
 /* QEMU Block devices */
 
diff --git a/vl.c b/vl.c
index a6a0db8..8a26d81 100644
--- a/vl.c
+++ b/vl.c
@@ -148,7 +148,6 @@ const char* keyboard_layout = NULL;
 ram_addr_t ram_size;
 const char *mem_path = NULL;
 int mem_prealloc = 0; /* force preallocation of physical target memory */
-bool enable_mlock = false;
 bool enable_cpu_pm = false;
 int nb_nics;
 NICInfo nd_table[MAX_NICS];
@@ -189,7 +188,6 @@ const char *prom_envs[MAX_PROM_ENVS];
 int boot_menu;
 bool boot_strict;
 uint8_t *boot_splash_filedata;
-int only_migratable; /* turn it off unless user states otherwise */
 bool wakeup_suspend_enabled;
 
 int icount_align_option;
-- 
1.8.3.1




* [RFC v4 PATCH 41/49] multi-process/mig: Enable VMSD save in the Proxy object
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (39 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 40/49] multi-process/mig: build migration module in the remote process Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-11-13 15:50   ` Daniel P. Berrangé
  2019-10-24  9:09 ` [RFC v4 PATCH 42/49] multi-process/mig: Send VMSD of remote to " Jagannathan Raman
                   ` (12 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

Collect the VMSD from the remote process on the source and save
it to the channel leading to the destination.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 New patch in v4
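
 For discussion, here is a sketch of collecting a byte stream of known
 size from a socketpair into a growable buffer. It is not what the patch
 does: the patch uses a detached reader thread and polls migsize through
 a volatile read, whereas this sketch assumes the size is known before
 reading and uses a blocking read() bounded by that size. The names
 collect_stream, roundtrip_check and CHUNK are hypothetical. The patch
 grows its buffer by getpagesize(); a smaller step is used here so the
 demo exercises the growth path with little data.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 256  /* growth step for the sketch, not the patch's page size */

/*
 * Read exactly `expected` bytes from fd into a heap buffer that grows in
 * CHUNK steps. Returns the buffer (caller frees) or NULL on error or if
 * the stream ends early.
 */
static uint8_t *collect_stream(int fd, size_t expected, size_t *out_len)
{
    size_t cap = CHUNK, len = 0;
    uint8_t *buf = malloc(cap);

    assert(out_len != NULL);
    if (!buf) {
        return NULL;
    }
    while (len < expected) {
        size_t want = expected - len;
        ssize_t n;

        if (len == cap) {
            uint8_t *tmp = realloc(buf, cap + CHUNK);

            if (!tmp) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap += CHUNK;
        }
        if (want > cap - len) {
            want = cap - len;
        }
        n = read(fd, buf + len, want);
        if (n < 0 && errno == EINTR) {
            continue;           /* interrupted, retry */
        }
        if (n <= 0) {
            free(buf);          /* error or premature EOF */
            return NULL;
        }
        len += (size_t)n;
    }
    *out_len = len;
    return buf;
}

/*
 * Returns 0 if `size` bytes written to one end of a socketpair arrive
 * intact on the other end. Single-threaded, so `size` must stay below
 * the socket buffer size.
 */
static int roundtrip_check(size_t size)
{
    uint8_t *src, *dst = NULL;
    size_t i, got = 0;
    int fd[2], ok = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) {
        return -1;
    }
    src = malloc(size + 1);
    if (src) {
        for (i = 0; i < size; i++) {
            src[i] = (uint8_t)(i * 7);
        }
        if (write(fd[1], src, size) == (ssize_t)size) {
            dst = collect_stream(fd[0], size, &got);
            ok = dst && got == size && memcmp(src, dst, size) == 0;
        }
    }
    close(fd[0]);
    close(fd[1]);
    free(src);
    free(dst);
    return ok ? 0 : -1;
}
```

 Bounding the read by the reported size means no thread cancellation or
 busy-wait is needed in the sketch. Whether that ordering fits the
 START_MIG_OUT handshake is left open here.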

 hw/proxy/qemu-proxy.c         | 132 ++++++++++++++++++++++++++++++++++++++++++
 include/hw/proxy/qemu-proxy.h |   2 +
 include/io/mpqemu-link.h      |   1 +
 3 files changed, 135 insertions(+)

diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
index 623a6c5..ce72e6a 100644
--- a/hw/proxy/qemu-proxy.c
+++ b/hw/proxy/qemu-proxy.c
@@ -52,6 +52,14 @@
 #include "util/event_notifier-posix.c"
 #include "hw/boards.h"
 #include "include/qemu/log.h"
+#include "io/channel.h"
+#include "migration/qemu-file-types.h"
+#include "qapi/error.h"
+#include "io/channel-util.h"
+#include "migration/qemu-file-channel.h"
+#include "migration/qemu-file.h"
+#include "migration/migration.h"
+#include "migration/vmstate.h"
 
 QEMUTimer *hb_timer;
 static void pci_proxy_dev_realize(PCIDevice *dev, Error **errp);
@@ -62,6 +70,9 @@ static void stop_heartbeat_timer(void);
 static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx);
 static void broadcast_msg(MPQemuMsg *msg, bool need_reply);
 
+#define PAGE_SIZE getpagesize()
+uint8_t *mig_data;
+
 static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx)
 {
     /* TODO: Add proper handler. */
@@ -357,14 +368,135 @@ static void pci_proxy_dev_inst_init(Object *obj)
     dev->mem_init = false;
 }
 
+typedef struct {
+    QEMUFile *rem;
+    PCIProxyDev *dev;
+} proxy_mig_data;
+
+static void *proxy_mig_out(void *opaque)
+{
+    proxy_mig_data *data = opaque;
+    PCIProxyDev *dev = data->dev;
+    uint8_t byte;
+    uint64_t data_size = PAGE_SIZE;
+
+    mig_data = g_malloc(data_size);
+
+    while (true) {
+        byte = qemu_get_byte(data->rem);
+        mig_data[dev->migsize++] = byte;
+        if (dev->migsize == data_size) {
+            data_size += PAGE_SIZE;
+            mig_data = g_realloc(mig_data, data_size);
+        }
+    }
+
+    return NULL;
+}
+
+static int proxy_pre_save(void *opaque)
+{
+    PCIProxyDev *pdev = opaque;
+    proxy_mig_data *mig_data;
+    QEMUFile *f_remote;
+    MPQemuMsg msg = {0};
+    QemuThread thread;
+    Error *err = NULL;
+    QIOChannel *ioc;
+    uint64_t size;
+    int fd[2];
+
+    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) {
+        return -1;
+    }
+
+    ioc = qio_channel_new_fd(fd[0], &err);
+    if (err) {
+        error_report_err(err);
+        return -1;
+    }
+
+    qio_channel_set_name(QIO_CHANNEL(ioc), "PCIProxyDevice-mig");
+
+    f_remote = qemu_fopen_channel_input(ioc);
+
+    pdev->migsize = 0;
+
+    mig_data = g_malloc0(sizeof(proxy_mig_data));
+    mig_data->rem = f_remote;
+    mig_data->dev = pdev;
+
+    qemu_thread_create(&thread, "Proxy MIG_OUT", proxy_mig_out, mig_data,
+                       QEMU_THREAD_DETACHED);
+
+    msg.cmd = START_MIG_OUT;
+    msg.bytestream = 0;
+    msg.num_fds = 2;
+    msg.fds[0] = fd[1];
+    msg.fds[1] = GET_REMOTE_WAIT;
+
+    mpqemu_msg_send(pdev->mpqemu_link, &msg, pdev->mpqemu_link->com);
+    size = wait_for_remote(msg.fds[1]);
+    PUT_REMOTE_WAIT(msg.fds[1]);
+
+    assert(size != ULLONG_MAX);
+
+    /*
+     * migsize is being updated by a separate thread. Using volatile to
+     * instruct the compiler to fetch the value of this variable from
+     * memory during every read
+     */
+    while (*((volatile uint64_t *)&pdev->migsize) < size) {
+    }
+
+    qemu_thread_cancel(&thread);
+
+    qemu_fclose(f_remote);
+    close(fd[1]);
+
+    return 0;
+}
+
+static int proxy_post_save(void *opaque)
+{
+    MigrationState *ms = migrate_get_current();
+    PCIProxyDev *pdev = opaque;
+    uint64_t pos = 0;
+
+    while (pos < pdev->migsize) {
+        qemu_put_byte(ms->to_dst_file, mig_data[pos]);
+        pos++;
+    }
+
+    qemu_fflush(ms->to_dst_file);
+
+    return 0;
+}
+
+const VMStateDescription vmstate_pci_proxy_device = {
+    .name = "PCIProxyDevice",
+    .version_id = 2,
+    .minimum_version_id = 1,
+    .pre_save = proxy_pre_save,
+    .post_save = proxy_post_save,
+    .fields = (VMStateField[]) {
+        VMSTATE_PCI_DEVICE(parent_dev, PCIProxyDev),
+        VMSTATE_UINT64(migsize, PCIProxyDev),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static void pci_proxy_dev_class_init(ObjectClass *klass, void *data)
 {
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+    DeviceClass *dc = DEVICE_CLASS(klass);
 
     k->realize = pci_proxy_dev_realize;
     k->exit = pci_dev_exit;
     k->config_read = pci_proxy_read_config;
     k->config_write = pci_proxy_write_config;
+
+    dc->vmsd = &vmstate_pci_proxy_device;
 }
 
 static const TypeInfo pci_proxy_dev_type_info = {
diff --git a/include/hw/proxy/qemu-proxy.h b/include/hw/proxy/qemu-proxy.h
index 17e07ac..b122e6d 100644
--- a/include/hw/proxy/qemu-proxy.h
+++ b/include/hw/proxy/qemu-proxy.h
@@ -89,6 +89,8 @@ struct PCIProxyDev {
     void (*init_proxy) (PCIDevice *dev, char *command, bool need_spawn, Error **errp);
 
     ProxyMemoryRegion region[PCI_NUM_REGIONS];
+
+    uint64_t migsize;
 };
 
 typedef struct PCIProxyDevClass {
diff --git a/include/io/mpqemu-link.h b/include/io/mpqemu-link.h
index 6fcc6f5..0ed7750 100644
--- a/include/io/mpqemu-link.h
+++ b/include/io/mpqemu-link.h
@@ -75,6 +75,7 @@ typedef enum {
     PROXY_PING,
     MMIO_RETURN,
     DEVICE_RESET,
+    START_MIG_OUT,
     MAX,
 } mpqemu_cmd_t;
 
-- 
1.8.3.1




* [RFC v4 PATCH 42/49] multi-process/mig: Send VMSD of remote to the Proxy object
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (40 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 41/49] multi-process/mig: Enable VMSD save in the Proxy object Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 43/49] multi-process/mig: Load VMSD in the proxy object Jagannathan Raman
                   ` (11 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

On the source side, the remote process sends its VMSD to the Proxy
object.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 New patch in v4
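
 The loop in qemu_remote_savevm() follows a framing pattern: skip entries
 that need no saving, then emit header, payload and footer per entry, and
 finish with an end-of-stream marker. A simplified sketch of that pattern
 follows. Everything in it is hypothetical: the SK_* marker values are
 placeholders and are not claimed to match QEMU's stream format, the
 header is reduced to a marker plus a one-byte id, and SketchEntry stands
 in for SaveStateEntry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder markers for the sketch only. */
enum { SK_EOF = 0xA0, SK_SECTION_FULL = 0xA1, SK_SECTION_FOOTER = 0xA2 };

/* Hypothetical stand-in for a savevm handler entry. */
typedef struct {
    uint8_t id;
    int needed;           /* mirrors the vmstate_save_needed() check */
    const uint8_t *data;  /* device state payload */
    size_t len;
} SketchEntry;

/*
 * Frame every needed entry into `out` and terminate with SK_EOF.
 * Returns the number of bytes written, or 0 if `out` is too small.
 */
static size_t sketch_savevm(const SketchEntry *e, size_t n,
                            uint8_t *out, size_t cap)
{
    size_t pos = 0, i, j;

    for (i = 0; i < n; i++) {
        if (!e[i].needed) {
            continue;                    /* nothing to save for this entry */
        }
        if (cap - pos < e[i].len + 3) {
            return 0;
        }
        out[pos++] = SK_SECTION_FULL;    /* section header */
        out[pos++] = e[i].id;
        for (j = 0; j < e[i].len; j++) {
            out[pos++] = e[i].data[j];   /* payload */
        }
        out[pos++] = SK_SECTION_FOOTER;  /* section footer */
    }
    if (pos == cap) {
        return 0;
    }
    out[pos++] = SK_EOF;                 /* end of stream */
    return pos;
}

/* Frames two needed entries and one skipped entry; returns stream length. */
static size_t sketch_demo_len(void)
{
    static const uint8_t a[] = { 0x11, 0x22 };
    static const uint8_t b[] = { 0x33, 0x44, 0x55 };
    static const uint8_t c[] = { 0x66 };
    const SketchEntry entries[] = {
        { 1, 1, a, sizeof(a) },
        { 2, 0, b, sizeof(b) },   /* skipped: not needed */
        { 3, 1, c, sizeof(c) },
    };
    uint8_t out[64];
    size_t len = sketch_savevm(entries, 3, out, sizeof(out));

    assert(len > 0 && out[len - 1] == SK_EOF);
    return len;
}
```

 The length returned by the framing function plays the role that
 qemu_ftell(f) plays in process_start_mig_out(): it is the byte count
 the receiving side has to collect.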

 migration/savevm.c   | 27 +++++++++++++++++++++++++++
 migration/savevm.h   |  2 ++
 remote/remote-main.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 72 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 8d95e26..0c84142 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2903,3 +2903,30 @@ bool vmstate_check_only_migratable(const VMStateDescription *vmsd)
 
     return !(vmsd && vmsd->unmigratable);
 }
+
+int qemu_remote_savevm(QEMUFile *f)
+{
+    SaveStateEntry *se;
+    int ret;
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (!se->vmsd || !vmstate_save_needed(se->vmsd, se->opaque)) {
+            continue;
+        }
+
+        save_section_header(f, se, QEMU_VM_SECTION_FULL);
+
+        ret = vmstate_save(f, se, NULL);
+        if (ret) {
+            qemu_file_set_error(f, ret);
+            return ret;
+        }
+
+        save_section_footer(f, se);
+    }
+
+    qemu_put_byte(f, QEMU_VM_EOF);
+    qemu_fflush(f);
+
+    return 0;
+}
diff --git a/migration/savevm.h b/migration/savevm.h
index 51a4b9c..a6582ac 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -64,4 +64,6 @@ void qemu_loadvm_state_cleanup(void);
 int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
 int qemu_load_device_state(QEMUFile *f);
 
+int qemu_remote_savevm(QEMUFile *f);
+
 #endif
diff --git a/remote/remote-main.c b/remote/remote-main.c
index 341b7cf..0284039 100644
--- a/remote/remote-main.c
+++ b/remote/remote-main.c
@@ -66,6 +66,16 @@
 #include "qemu/log.h"
 #include "qemu/cutils.h"
 #include "remote-opts.h"
+#include "qapi/error.h"
+#include "io/channel-util.h"
+
+#include "io/channel.h"
+#include "io/channel-socket.h"
+#include "migration/qemu-file-types.h"
+#include "migration/savevm.h"
+#include "migration/qemu-file-channel.h"
+#include "migration/qemu-file.h"
+
 #include "monitor/monitor.h"
 #include "chardev/char.h"
 #include "sysemu/reset.h"
@@ -362,6 +372,36 @@ static int setup_device(MPQemuMsg *msg, Error **errp)
     return 0;
 }
 
+static void process_start_mig_out(MPQemuMsg *msg)
+{
+    int wait = msg->fds[1];
+    Error *err = NULL;
+    QIOChannel *ioc;
+    QEMUFile *f;
+
+    ioc = qio_channel_new_fd(msg->fds[0], &err);
+    if (err) {
+        error_report_err(err);
+        return;
+    }
+
+    qio_channel_set_name(QIO_CHANNEL(ioc), "remote-migration-channel");
+
+    f = qemu_fopen_channel_output(ioc);
+
+    bdrv_drain_all();
+    (void)bdrv_flush_all();
+
+    (void)qemu_remote_savevm(f);
+
+    qemu_fflush(f);
+
+    notify_proxy(wait, (uint64_t)qemu_ftell(f));
+    PUT_REMOTE_WAIT(wait);
+
+    qemu_fclose(f);
+}
+
 static void process_msg(GIOCondition cond, MPQemuChannel *chan)
 {
     MPQemuMsg *msg = NULL;
@@ -454,6 +494,9 @@ static void process_msg(GIOCondition cond, MPQemuChannel *chan)
     case DEVICE_RESET:
         process_device_reset_msg(msg);
         break;
+    case START_MIG_OUT:
+        process_start_mig_out(msg);
+        break;
     default:
         error_setg(&err, "Unknown command");
         goto finalize_loop;
-- 
1.8.3.1




* [RFC v4 PATCH 43/49] multi-process/mig: Load VMSD in the proxy object
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (41 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 42/49] multi-process/mig: Send VMSD of remote to " Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 44/49] multi-process/mig: refactor runstate_check into common file Jagannathan Raman
                   ` (10 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

On the destination, the Proxy object reads the VMSD of the remote
process (saved on the source) from the incoming migration stream and
sends it to the destination's remote process.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 New patch in v4
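
For readers following the data path: proxy_post_load() forwards exactly
migsize bytes from the incoming stream to the remote process. The sketch
below shows the same idea with plain POSIX file descriptors instead of
QEMUFile. It is only an illustration under that assumption; relay_bytes()
is a hypothetical helper and is not part of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper (not in the patch): copy exactly `size` bytes from
 * src_fd to dst_fd in chunks. Returns 0 on success or a negative errno.
 */
static int relay_bytes(int src_fd, int dst_fd, uint64_t size)
{
    unsigned char buf[4096];

    assert(src_fd >= 0 && dst_fd >= 0);

    while (size > 0) {
        size_t want = size < sizeof(buf) ? (size_t)size : sizeof(buf);
        ssize_t got = read(src_fd, buf, want);
        size_t off = 0;

        if (got < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted, retry the read */
            }
            return -errno;
        }
        if (got == 0) {
            return -EPIPE;          /* source ended before `size` bytes */
        }

        /* write() may be partial, so loop until the chunk is flushed */
        while (off < (size_t)got) {
            ssize_t put = write(dst_fd, buf + off, (size_t)got - off);

            if (put < 0) {
                if (errno == EINTR) {
                    continue;
                }
                return -errno;
            }
            off += (size_t)put;
        }

        size -= (uint64_t)got;
    }

    return 0;
}
```

Copying in chunks rather than one byte at a time reduces per-byte call
overhead, and a short source is reported as an error instead of being
forwarded silently.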

 hw/proxy/qemu-proxy.c    | 50 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/io/mpqemu-link.h |  1 +
 2 files changed, 51 insertions(+)

diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
index ce72e6a..c85ffb3 100644
--- a/hw/proxy/qemu-proxy.c
+++ b/hw/proxy/qemu-proxy.c
@@ -473,12 +473,62 @@ static int proxy_post_save(void *opaque)
     return 0;
 }
 
+static int proxy_post_load(void *opaque, int version_id)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PCIProxyDev *pdev = opaque;
+    QEMUFile *f_remote;
+    MPQemuMsg msg = {0};
+    Error *err = NULL;
+    QIOChannel *ioc;
+    uint64_t size;
+    uint8_t byte;
+    int fd[2];
+
+    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) {
+        return -1;
+    }
+
+    ioc = qio_channel_new_fd(fd[0], &err);
+    if (err) {
+        error_report_err(err);
+        return -1;
+    }
+
+    qio_channel_set_name(QIO_CHANNEL(ioc), "proxy-migration-channel");
+
+    f_remote = qemu_fopen_channel_output(ioc);
+
+    msg.cmd = START_MIG_IN;
+    msg.bytestream = 0;
+    msg.num_fds = 1;
+    msg.fds[0] = fd[1];
+
+    mpqemu_msg_send(pdev->mpqemu_link, &msg, pdev->mpqemu_link->com);
+
+    size = pdev->migsize;
+
+    while (size) {
+        byte = qemu_get_byte(mis->from_src_file);
+        qemu_put_byte(f_remote, byte);
+        size--;
+    }
+
+    qemu_fflush(f_remote);
+    qemu_fclose(f_remote);
+
+    close(fd[1]);
+
+    return 0;
+}
+
 const VMStateDescription vmstate_pci_proxy_device = {
     .name = "PCIProxyDevice",
     .version_id = 2,
     .minimum_version_id = 1,
     .pre_save = proxy_pre_save,
     .post_save = proxy_post_save,
+    .post_load = proxy_post_load,
     .fields = (VMStateField[]) {
         VMSTATE_PCI_DEVICE(parent_dev, PCIProxyDev),
         VMSTATE_UINT64(migsize, PCIProxyDev),
diff --git a/include/io/mpqemu-link.h b/include/io/mpqemu-link.h
index 0ed7750..05dc55e 100644
--- a/include/io/mpqemu-link.h
+++ b/include/io/mpqemu-link.h
@@ -76,6 +76,7 @@ typedef enum {
     MMIO_RETURN,
     DEVICE_RESET,
     START_MIG_OUT,
+    START_MIG_IN,
     MAX,
 } mpqemu_cmd_t;
 
-- 
1.8.3.1




* [RFC v4 PATCH 44/49] multi-process/mig: refactor runstate_check into common file
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (42 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 43/49] multi-process/mig: Load VMSD in the proxy object Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 45/49] multi-process/mig: Synchronize runstate of remote process Jagannathan Raman
                   ` (9 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

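Move runstate_check() and current_run_state out of vl.c into a new
common file, runstate.c, so that the remote process can link against
them as well.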

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 New patch in v4

 Makefile.objs             |  2 ++
 include/sysemu/runstate.h |  2 ++
 runstate.c                | 36 ++++++++++++++++++++++++++++++++++++
 stubs/runstate-check.c    |  3 +++
 vl-parse.c                |  1 -
 vl.c                      | 10 ----------
 6 files changed, 43 insertions(+), 11 deletions(-)
 create mode 100644 runstate.c

diff --git a/Makefile.objs b/Makefile.objs
index ebb1938..66fbee0 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -38,6 +38,7 @@ remote-pci-obj-$(CONFIG_MPQEMU) += blockdev.o
 remote-pci-obj-$(CONFIG_MPQEMU) += qdev-monitor.o
 remote-pci-obj-$(CONFIG_MPQEMU) += bootdevice.o
 remote-pci-obj-$(CONFIG_MPQEMU) += iothread.o
+remote-pci-obj-$(CONFIG_MPQEMU) += runstate.o
 
 ##############################################################
 # remote-lsi-obj-y is code used to implement remote LSI device
@@ -111,6 +112,7 @@ qemu-seccomp.o-libs := $(SECCOMP_LIBS)
 common-obj-$(CONFIG_FDT) += device_tree.o
 
 common-obj-y += vl-parse.o
+common-obj-y += runstate.o
 
 remote-pci-obj-$(CONFIG_MPQEMU) += net/
 
diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 0b41555..e89ebf8 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -4,6 +4,8 @@
 #include "qapi/qapi-types-run-state.h"
 #include "qemu/notify.h"
 
+extern RunState current_run_state;
+
 bool runstate_check(RunState state);
 void runstate_set(RunState new_state);
 int runstate_is_running(void);
diff --git a/runstate.c b/runstate.c
new file mode 100644
index 0000000..273345a
--- /dev/null
+++ b/runstate.c
@@ -0,0 +1,36 @@
+/*
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "sysemu/runstate.h"
+
+/***********************************************************/
+/* QEMU state */
+
+RunState current_run_state = RUN_STATE_PRECONFIG;
+
+bool runstate_check(RunState state)
+{
+    return current_run_state == state;
+}
diff --git a/stubs/runstate-check.c b/stubs/runstate-check.c
index 2ccda2b..3038bcb 100644
--- a/stubs/runstate-check.c
+++ b/stubs/runstate-check.c
@@ -1,6 +1,9 @@
 #include "qemu/osdep.h"
 
 #include "sysemu/runstate.h"
+
+#pragma weak runstate_check
+
 bool runstate_check(RunState state)
 {
     return state == RUN_STATE_PRELAUNCH;
diff --git a/vl-parse.c b/vl-parse.c
index 1c8ecbe..3bf1f0f 100644
--- a/vl-parse.c
+++ b/vl-parse.c
@@ -159,4 +159,3 @@ int rdevice_init_func(void *opaque, QemuOpts *opts, Error **errp)
     return 0;
 }
 #endif
-
diff --git a/vl.c b/vl.c
index 8a26d81..725429b 100644
--- a/vl.c
+++ b/vl.c
@@ -665,11 +665,6 @@ static int default_driver_check(void *opaque, QemuOpts *opts, Error **errp)
     return 0;
 }
 
-/***********************************************************/
-/* QEMU state */
-
-static RunState current_run_state = RUN_STATE_PRECONFIG;
-
 /* We use RUN_STATE__MAX but any invalid value will do */
 static RunState vmstop_requested = RUN_STATE__MAX;
 static QemuMutex vmstop_lock;
@@ -777,11 +772,6 @@ static const RunStateTransition runstate_transitions_def[] = {
 
 static bool runstate_valid_transitions[RUN_STATE__MAX][RUN_STATE__MAX];
 
-bool runstate_check(RunState state)
-{
-    return current_run_state == state;
-}
-
 bool runstate_store(char *str, size_t size)
 {
     const char *state = RunState_str(current_run_state);
-- 
1.8.3.1




* [RFC v4 PATCH 45/49] multi-process/mig: Synchronize runstate of remote process
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (43 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 44/49] multi-process/mig: refactor runstate_check into common file Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-11-11 16:17   ` Stefan Hajnoczi
  2019-10-24  9:09 ` [RFC v4 PATCH 46/49] multi-process/mig: Restore the VMSD in " Jagannathan Raman
                   ` (8 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Synchronize the runstate of the remote process with that of QEMU

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 New patch in v4
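
The message flow added here is: the proxy's VM change state handler packs
the new RunState into a RUNSTATE_SET message, and the remote's message
loop applies it. A reduced sketch of that round trip follows. All Demo*
and demo_* names are hypothetical stand-ins for MPQemuMsg, mpqemu_cmd_t
and remote_runstate_set(), and the real transport (mpqemu_msg_send) is
replaced by a direct function call.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for RunState and mpqemu_cmd_t (illustrative values only) */
typedef enum {
    DEMO_RUN_STATE_INMIGRATE,
    DEMO_RUN_STATE_PAUSED,
    DEMO_RUN_STATE_RUNNING
} DemoRunState;

typedef enum {
    DEMO_CMD_RUNSTATE_SET,
    DEMO_CMD_MAX
} DemoCmd;

/* Reduced message: a command tag plus a union of per-command payloads */
typedef struct {
    DemoCmd cmd;
    size_t size;
    union {
        struct {
            DemoRunState state;
        } runstate;
    } data1;
} DemoMsg;

/* State tracked on the "remote" side of the sketch */
static DemoRunState demo_remote_state = DEMO_RUN_STATE_PAUSED;

/* Sender side: what a change-state handler would fill in */
static DemoMsg demo_make_runstate_msg(DemoRunState state)
{
    DemoMsg msg = { 0 };

    msg.cmd = DEMO_CMD_RUNSTATE_SET;
    msg.size = sizeof(msg.data1);
    msg.data1.runstate.state = state;

    return msg;
}

/* Receiver side: dispatch on the command tag, reject unknown commands */
static int demo_process_msg(const DemoMsg *msg)
{
    assert(msg != NULL);

    switch (msg->cmd) {
    case DEMO_CMD_RUNSTATE_SET:
        demo_remote_state = msg->data1.runstate.state;
        return 0;
    default:
        return -1;
    }
}
```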

 hw/proxy/qemu-proxy.c         | 18 ++++++++++++++++++
 include/hw/proxy/qemu-proxy.h |  2 ++
 include/io/mpqemu-link.h      |  8 ++++++++
 include/sysemu/runstate.h     |  1 +
 qdev-monitor.c                | 13 ++++++++++++-
 remote/remote-main.c          |  4 ++++
 remote/remote-opts.c          |  5 +++++
 runstate.c                    |  5 +++++
 8 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
index c85ffb3..eff299b 100644
--- a/hw/proxy/qemu-proxy.c
+++ b/hw/proxy/qemu-proxy.c
@@ -44,6 +44,7 @@
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qstring.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/runstate.h"
 #include "hw/proxy/qemu-proxy.h"
 #include "hw/proxy/memory-sync.h"
 #include "qom/object.h"
@@ -656,6 +657,19 @@ static void init_proxy(PCIDevice *dev, char *command, bool need_spawn, Error **e
     }
 }
 
+static void proxy_vm_state_change(void *opaque, int running, RunState state)
+{
+    PCIProxyDev *dev = opaque;
+    MPQemuMsg msg = { 0 };
+
+    msg.cmd = RUNSTATE_SET;
+    msg.bytestream = 0;
+    msg.size = sizeof(msg.data1);
+    msg.data1.runstate.state = state;
+
+    mpqemu_msg_send(dev->mpqemu_link, &msg, dev->mpqemu_link->com);
+}
+
 static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
 {
     PCIProxyDev *dev = PCI_PROXY_DEV(device);
@@ -681,6 +695,8 @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
                          &dev->region[r].mr);
     }
 
+    dev->vmcse = qemu_add_vm_change_state_handler(proxy_vm_state_change, dev);
+
     dev->set_proxy_sock = set_proxy_sock;
     dev->get_proxy_sock = get_proxy_sock;
     dev->init_proxy = init_proxy;
@@ -706,6 +722,8 @@ static void pci_dev_exit(PCIDevice *pdev)
     if (!QLIST_EMPTY(&proxy_dev_list.devices)) {
         start_heartbeat_timer();
     }
+
+    qemu_del_vm_change_state_handler(dev->vmcse);
 }
 
 static void send_bar_access_msg(PCIProxyDev *dev, MemoryRegion *mr,
diff --git a/include/hw/proxy/qemu-proxy.h b/include/hw/proxy/qemu-proxy.h
index b122e6d..7fe987d 100644
--- a/include/hw/proxy/qemu-proxy.h
+++ b/include/hw/proxy/qemu-proxy.h
@@ -90,6 +90,8 @@ struct PCIProxyDev {
 
     ProxyMemoryRegion region[PCI_NUM_REGIONS];
 
+    VMChangeStateEntry *vmcse;
+
     uint64_t migsize;
 };
 
diff --git a/include/io/mpqemu-link.h b/include/io/mpqemu-link.h
index 05dc55e..f5a0bbb 100644
--- a/include/io/mpqemu-link.h
+++ b/include/io/mpqemu-link.h
@@ -39,6 +39,8 @@
 #include "exec/cpu-common.h"
 #include "exec/hwaddr.h"
 
+#include "qapi/qapi-types-run-state.h"
+
 #define TYPE_MPQEMU_LINK "mpqemu-link"
 #define MPQEMU_LINK(obj) \
     OBJECT_CHECK(MPQemuLinkState, (obj), TYPE_MPQEMU_LINK)
@@ -77,6 +79,7 @@ typedef enum {
     DEVICE_RESET,
     START_MIG_OUT,
     START_MIG_IN,
+    RUNSTATE_SET,
     MAX,
 } mpqemu_cmd_t;
 
@@ -115,6 +118,10 @@ typedef struct {
 } mmio_ret_msg_t;
 
 typedef struct {
+    RunState state;
+} runstate_msg_t;
+
+typedef struct {
     mpqemu_cmd_t cmd;
     int bytestream;
     size_t size;
@@ -125,6 +132,7 @@ typedef struct {
         bar_access_msg_t bar_access;
         set_irqfd_msg_t set_irqfd;
         mmio_ret_msg_t mmio_ret;
+        runstate_msg_t runstate;
     } data1;
 
     int fds[REMOTE_MAX_FDS];
diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index e89ebf8..c7ad916 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -8,6 +8,7 @@ extern RunState current_run_state;
 
 bool runstate_check(RunState state);
 void runstate_set(RunState new_state);
+void remote_runstate_set(RunState state);
 int runstate_is_running(void);
 bool runstate_needs_reset(void);
 bool runstate_store(char *str, size_t size);
diff --git a/qdev-monitor.c b/qdev-monitor.c
index 2a2c10b..c6aa35c 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -34,6 +34,7 @@
 #include "qemu/qemu-print.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/runstate.h"
 #include "migration/misc.h"
 #include "hw/boards.h"
 #include "hw/proxy/qemu-proxy.h"
@@ -641,7 +642,7 @@ void qdev_proxy_fire(void)
 }
 
 DeviceState *qdev_proxy_add(const char *rid, const char *id, char *bus,
-                            char *command, int rsocket, bool managed,
+                            char *cmd, int rsocket, bool managed,
                             Error **errp)
 {
     DeviceState *ds;
@@ -653,6 +654,7 @@ DeviceState *qdev_proxy_add(const char *rid, const char *id, char *bus,
     const char *str;
     bool need_spawn = false;
     bool remote_exists = false;
+    char *command;
 
     if (strlen(rid) > MAX_RID_LENGTH) {
         error_setg(errp, "rid %s is too long.", rid);
@@ -725,6 +727,12 @@ DeviceState *qdev_proxy_add(const char *rid, const char *id, char *bus,
         need_spawn = true;
     }
 
+    if (runstate_check(RUN_STATE_INMIGRATE)) {
+        command = g_strdup_printf("%s %s", cmd, "-incoming defer");
+    } else {
+        command = g_strdup(cmd);
+    }
+
     pdev->init_proxy(PCI_DEVICE(ds), command, need_spawn, errp);
 
     qemu_mutex_lock(&proxy_list_lock);
@@ -732,6 +740,9 @@ DeviceState *qdev_proxy_add(const char *rid, const char *id, char *bus,
     qemu_mutex_unlock(&proxy_list_lock);
 
     qemu_opts_del(proxy_opts);
+
+    g_free(command);
+
     return ds;
 }
 
diff --git a/remote/remote-main.c b/remote/remote-main.c
index 0284039..2de5ddf 100644
--- a/remote/remote-main.c
+++ b/remote/remote-main.c
@@ -45,6 +45,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/config-file.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/runstate.h"
 #include "block/block.h"
 #include "exec/memattrs.h"
 #include "exec/address-spaces.h"
@@ -497,6 +498,9 @@ static void process_msg(GIOCondition cond, MPQemuChannel *chan)
     case START_MIG_OUT:
         process_start_mig_out(msg);
         break;
+    case RUNSTATE_SET:
+        remote_runstate_set(msg->data1.runstate.state);
+        break;
     default:
         error_setg(&err, "Unknown command");
         goto finalize_loop;
diff --git a/remote/remote-opts.c b/remote/remote-opts.c
index 1b1824e..d3ae221 100644
--- a/remote/remote-opts.c
+++ b/remote/remote-opts.c
@@ -56,8 +56,10 @@
 #include "remote/remote-opts.h"
 #include "include/qemu-common.h"
 #include "monitor/monitor.h"
+#include "sysemu/runstate.h"
 
 #include "vl.h"
+
 /*
  * In remote process, we parse only subset of options. The code
  * taken from vl.c to re-use in remote command line parser.
@@ -101,6 +103,9 @@ void parse_cmdline(int argc, char **argv, char **envp)
             case QEMU_OPTION_qmp:
                 monitor_parse(optarg, "control", false);
                 break;
+            case QEMU_OPTION_incoming:
+                remote_runstate_set(RUN_STATE_INMIGRATE);
+                break;
             case QEMU_OPTION_monitor:
                 if (!strncmp(optarg, "stdio", 5)) {
                     warn_report("STDIO not supported in remote process");
diff --git a/runstate.c b/runstate.c
index 273345a..9c5c627 100644
--- a/runstate.c
+++ b/runstate.c
@@ -34,3 +34,8 @@ bool runstate_check(RunState state)
 {
     return current_run_state == state;
 }
+
+void remote_runstate_set(RunState state)
+{
+    current_run_state = state;
+}
-- 
1.8.3.1




* [RFC v4 PATCH 46/49] multi-process/mig: Restore the VMSD in remote process
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (44 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 45/49] multi-process/mig: Synchronize runstate of remote process Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-24  9:09 ` [RFC v4 PATCH 47/49] multi-process: Enable support for multiple devices in remote Jagannathan Raman
                   ` (7 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

The remote process accepts the VMSD from the Proxy object and
restores it.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 New patch in v4
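
qemu_remote_loadvm() is a loop that reads a section-type byte and
dispatches on it until an EOF marker is seen. The sketch below shows that
loop shape over an in-memory buffer, with every failure leaving the loop
immediately. The record layout (type byte, one length byte, payload) and
all demo_* names are assumptions made for illustration only; they are not
the QEMU migration stream format.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative section tags for this sketch */
enum {
    DEMO_SECTION_EOF = 0x00,
    DEMO_SECTION_FULL = 0x04
};

/*
 * Parse records until the EOF marker. Returns the number of full
 * sections consumed, or a negative errno on malformed input.
 */
static int demo_loadvm(const uint8_t *buf, size_t len)
{
    size_t pos = 0;
    int sections = 0;

    assert(buf != NULL);

    while (pos < len) {
        uint8_t type = buf[pos++];

        switch (type) {
        case DEMO_SECTION_FULL: {
            size_t payload;

            if (pos >= len) {
                return -EIO;        /* missing length byte */
            }
            payload = buf[pos++];
            if (payload > len - pos) {
                return -EIO;        /* payload runs past the buffer */
            }
            pos += payload;         /* a real loader would restore state */
            sections++;
            break;                  /* leaves the switch, loop continues */
        }
        case DEMO_SECTION_EOF:
            return sections;
        default:
            return -EINVAL;         /* unknown section type */
        }
    }

    return -EIO;                    /* ran out of data before EOF */
}
```

Note that a bare break inside a switch only leaves the switch, so error
paths in such a loop need a return or goto to stop further parsing.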

 migration/savevm.c   | 36 ++++++++++++++++++++++++++++++++++++
 migration/savevm.h   |  1 +
 remote/remote-main.c | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 71 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 0c84142..d730cd1 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2930,3 +2930,39 @@ int qemu_remote_savevm(QEMUFile *f)
 
     return 0;
 }
+
+int qemu_remote_loadvm(QEMUFile *f)
+{
+    uint8_t section_type;
+    int ret = 0;
+
+    qemu_mutex_lock_iothread();
+
+    while (true) {
+        section_type = qemu_get_byte(f);
+
+        if (qemu_file_get_error(f)) {
+            ret = qemu_file_get_error(f);
+            break;
+        }
+
+        switch (section_type) {
+        case QEMU_VM_SECTION_FULL:
+            ret = qemu_loadvm_section_start_full(f, NULL);
+            if (ret < 0) {
+                goto out;
+            }
+            break;
+        case QEMU_VM_EOF:
+            goto out;
+        default:
+            ret = -EINVAL;
+            goto out;
+        }
+    }
+
+out:
+    qemu_mutex_unlock_iothread();
+
+    return ret;
+}
diff --git a/migration/savevm.h b/migration/savevm.h
index a6582ac..415b72c 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -65,5 +65,6 @@ int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
 int qemu_load_device_state(QEMUFile *f);
 
 int qemu_remote_savevm(QEMUFile *f);
+int qemu_remote_loadvm(QEMUFile *f);
 
 #endif
diff --git a/remote/remote-main.c b/remote/remote-main.c
index 2de5ddf..600c894 100644
--- a/remote/remote-main.c
+++ b/remote/remote-main.c
@@ -81,6 +81,7 @@
 #include "chardev/char.h"
 #include "sysemu/reset.h"
 #include "vl.h"
+#include "migration/misc.h"
 
 static MPQemuLinkState *mpqemu_link;
 
@@ -403,6 +404,30 @@ static void process_start_mig_out(MPQemuMsg *msg)
     qemu_fclose(f);
 }
 
+static int process_start_mig_in(MPQemuMsg *msg)
+{
+    Error *err = NULL;
+    QIOChannel *ioc;
+    QEMUFile *f;
+    int rc = -EINVAL;
+
+    ioc = qio_channel_new_fd(msg->fds[0], &err);
+    if (err) {
+        error_report_err(err);
+        return rc;
+    }
+
+    qio_channel_set_name(QIO_CHANNEL(ioc), "remote-migration-channel");
+
+    f = qemu_fopen_channel_input(ioc);
+
+    rc = qemu_remote_loadvm(f);
+
+    qemu_fclose(f);
+
+    return rc;
+}
+
 static void process_msg(GIOCondition cond, MPQemuChannel *chan)
 {
     MPQemuMsg *msg = NULL;
@@ -498,6 +523,13 @@ static void process_msg(GIOCondition cond, MPQemuChannel *chan)
     case START_MIG_OUT:
         process_start_mig_out(msg);
         break;
+    case START_MIG_IN:
+        if (process_start_mig_in(msg))
+        {
+            error_setg(&err, "Incoming migration failed.");
+            goto finalize_loop;
+        }
+        break;
     case RUNSTATE_SET:
         remote_runstate_set(msg->data1.runstate.state);
         break;
@@ -569,6 +601,8 @@ int main(int argc, char *argv[])
     }
     mpqemu_init_channel(mpqemu_link, &mpqemu_link->mmio, fd);
 
+    migration_object_init();
+
     parse_cmdline(argc - 3, argv + 3, NULL);
 
     mpqemu_link_set_callback(mpqemu_link, process_msg);
-- 
1.8.3.1




* [RFC v4 PATCH 47/49] multi-process: Enable support for multiple devices in remote
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (45 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 46/49] multi-process/mig: Restore the VMSD in " Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-11-11 16:15   ` Stefan Hajnoczi
  2019-10-24  9:09 ` [RFC v4 PATCH 48/49] multi-process: add the concept description to docs/devel/qemu-multiprocess Jagannathan Raman
                   ` (6 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Add support to allow multiple devices to be configured in the
remote process

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 New patch in v4
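
The remote side keeps its devices in an array indexed by the id carried
in each message, growing it on demand. A minimal sketch of such a table
follows, assuming plain realloc() in place of g_realloc(). The
DEMO_MAX_DEVICES cap, the NULL-filling of new slots and the
bounds-checked lookup are illustrative additions and are not what the
patch does; all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_DEVICES 256    /* hypothetical cap for this sketch */

typedef struct {
    void **slots;               /* one pointer per device id */
    uint64_t nr;                /* number of allocated slots */
} DemoDevTable;

/* Store dev under id, growing the table if needed. 0 on success. */
static int demo_table_set(DemoDevTable *t, uint64_t id, void *dev)
{
    assert(t != NULL);

    if (id >= DEMO_MAX_DEVICES) {
        return -1;              /* refuse ids from an untrusted peer */
    }

    if (id >= t->nr) {
        uint64_t new_nr = id + 1;
        uint64_t i;
        void **grown = realloc(t->slots, (size_t)new_nr * sizeof(void *));

        if (!grown) {
            return -1;
        }
        /* realloc leaves new memory uninitialized, so clear the gap */
        for (i = t->nr; i < new_nr; i++) {
            grown[i] = NULL;
        }
        t->slots = grown;
        t->nr = new_nr;
    }

    t->slots[id] = dev;
    return 0;
}

/* Bounds-checked lookup: NULL for unknown or unset ids */
static void *demo_table_get(const DemoDevTable *t, uint64_t id)
{
    assert(t != NULL);

    return id < t->nr ? t->slots[id] : NULL;
}
```

With a lookup of this kind, a message carrying an id that was never set
up yields NULL, which the caller can reject before touching the device.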

 hw/proxy/qemu-proxy.c         |  3 +++
 include/hw/proxy/qemu-proxy.h |  3 +++
 include/io/mpqemu-link.h      |  1 +
 qdev-monitor.c                |  2 ++
 remote/remote-main.c          | 34 ++++++++++++++++++++++++----------
 5 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
index eff299b..2231c36 100644
--- a/hw/proxy/qemu-proxy.c
+++ b/hw/proxy/qemu-proxy.c
@@ -176,6 +176,7 @@ static void set_remote_opts(PCIDevice *dev, QDict *qdict, unsigned int cmd)
     msg.bytestream = 1;
     msg.size = qstring_get_length(qstr) + 1;
     msg.num_fds = 0;
+    msg.id = pdev->id;
 
     mpqemu_msg_send(pdev->mpqemu_link, &msg, pdev->mpqemu_link->com);
 
@@ -322,6 +323,7 @@ static int config_op_send(PCIProxyDev *dev, uint32_t addr, uint32_t *val, int l,
     msg.size = sizeof(conf_data);
     msg.cmd = op;
     msg.bytestream = 1;
+    msg.id = dev->id;
 
     if (op == CONF_WRITE) {
         msg.num_fds = 0;
@@ -602,6 +604,7 @@ static void setup_irqfd(PCIProxyDev *dev)
 
     memset(&msg, 0, sizeof(MPQemuMsg));
     msg.cmd = SET_IRQFD;
+    msg.id = dev->id;
     msg.num_fds = 2;
     msg.fds[0] = event_notifier_get_fd(&dev->intr);
     msg.fds[1] = event_notifier_get_fd(&dev->resample);
diff --git a/include/hw/proxy/qemu-proxy.h b/include/hw/proxy/qemu-proxy.h
index 7fe987d..6a0a574 100644
--- a/include/hw/proxy/qemu-proxy.h
+++ b/include/hw/proxy/qemu-proxy.h
@@ -57,6 +57,9 @@ extern const MemoryRegionOps proxy_default_ops;
 struct PCIProxyDev {
     PCIDevice parent_dev;
 
+    uint64_t id;
+    uint64_t nr_devices;
+
     int n_mr_sections;
     MemoryRegionSection *mr_sections;
 
diff --git a/include/io/mpqemu-link.h b/include/io/mpqemu-link.h
index f5a0bbb..ba81515 100644
--- a/include/io/mpqemu-link.h
+++ b/include/io/mpqemu-link.h
@@ -124,6 +124,7 @@ typedef struct {
 typedef struct {
     mpqemu_cmd_t cmd;
     int bytestream;
+    uint64_t id;
     size_t size;
 
     union {
diff --git a/qdev-monitor.c b/qdev-monitor.c
index c6aa35c..70a7a5a 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -716,9 +716,11 @@ DeviceState *qdev_proxy_add(const char *rid, const char *id, char *bus,
         pdev->mmio_sock = old_pdev->mmio_sock;
         pdev->remote_pid = old_pdev->remote_pid;
         pdev->mem_init = true;
+        pdev->id = old_pdev->nr_devices++;
     } else {
         pdev->rsocket = managed ? rsocket : -1;
         pdev->socket = managed ? rsocket : -1;
+        pdev->id = pdev->nr_devices++;
     }
     pdev->managed = managed;
 
diff --git a/remote/remote-main.c b/remote/remote-main.c
index 600c894..93b8500 100644
--- a/remote/remote-main.c
+++ b/remote/remote-main.c
@@ -85,7 +85,8 @@
 
 static MPQemuLinkState *mpqemu_link;
 
-PCIDevice *remote_pci_dev;
+PCIDevice **remote_pci_devs;
+uint64_t nr_devices;
 bool create_done;
 
 static void process_config_write(MPQemuMsg *msg)
@@ -93,7 +94,8 @@ static void process_config_write(MPQemuMsg *msg)
     struct conf_data_msg *conf = (struct conf_data_msg *)msg->data2;
 
     qemu_mutex_lock_iothread();
-    pci_default_write_config(remote_pci_dev, conf->addr, conf->val, conf->l);
+    pci_default_write_config(remote_pci_devs[msg->id], conf->addr, conf->val,
+                             conf->l);
     qemu_mutex_unlock_iothread();
 }
 
@@ -106,7 +108,8 @@ static void process_config_read(MPQemuMsg *msg)
     wait = msg->fds[0];
 
     qemu_mutex_lock_iothread();
-    val = pci_default_read_config(remote_pci_dev, conf->addr, conf->l);
+    val = pci_default_read_config(remote_pci_devs[msg->id], conf->addr,
+                                  conf->l);
     qemu_mutex_unlock_iothread();
 
     notify_proxy(wait, val);
@@ -366,9 +369,17 @@ static int setup_device(MPQemuMsg *msg, Error **errp)
                    qstring_get_str(qobject_to_json(QOBJECT(qdict))));
         return rc;
     }
+
     if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
-        remote_pci_dev = PCI_DEVICE(dev);
+        if (nr_devices <= msg->id) {
+            nr_devices = msg->id + 1;
+            remote_pci_devs = g_realloc(remote_pci_devs,
+                                        nr_devices * sizeof(PCIDevice *));
+        }
+
+        remote_pci_devs[msg->id] = PCI_DEVICE(dev);
     }
+
     qemu_opts_del(opts);
 
     return 0;
@@ -489,12 +500,15 @@ static void process_msg(GIOCondition cond, MPQemuChannel *chan)
         }
         break;
     case SET_IRQFD:
-        process_set_irqfd_msg(remote_pci_dev, msg);
-        qdev_machine_creation_done();
-        qemu_mutex_lock_iothread();
-        qemu_run_machine_init_done_notifiers();
-        qemu_mutex_unlock_iothread();
-        create_done = true;
+        process_set_irqfd_msg(remote_pci_devs[msg->id], msg);
+
+        if (!create_done) {
+            qdev_machine_creation_done();
+            qemu_mutex_lock_iothread();
+            qemu_run_machine_init_done_notifiers();
+            qemu_mutex_unlock_iothread();
+            create_done = true;
+        }
         break;
     case DRIVE_OPTS:
         if (setup_drive(msg, &err)) {
-- 
1.8.3.1




* [RFC v4 PATCH 48/49] multi-process: add the concept description to docs/devel/qemu-multiprocess
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (46 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 47/49] multi-process: Enable support for multiple devices in remote Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-10-25 19:33   ` Elena Ufimtseva
                     ` (2 more replies)
  2019-10-24  9:09 ` [RFC v4 PATCH 49/49] multi-process: add configure and usage information Jagannathan Raman
                   ` (5 subsequent siblings)
  53 siblings, 3 replies; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: John G Johnson <john.g.johnson@oracle.com>

Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 v2 -> v3:
   - Updated with latest design of this project

 v3 -> v4:
   - Updated document to RST format

 docs/devel/index.rst             |    1 +
 docs/devel/qemu-multiprocess.rst | 1102 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 1103 insertions(+)
 create mode 100644 docs/devel/qemu-multiprocess.rst

diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index 1ec61fc..edd3fe3 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -22,3 +22,4 @@ Contents:
    decodetree
    secure-coding-practices
    tcg
+   qemu-multiprocess
diff --git a/docs/devel/qemu-multiprocess.rst b/docs/devel/qemu-multiprocess.rst
new file mode 100644
index 0000000..2c42c6e
--- /dev/null
+++ b/docs/devel/qemu-multiprocess.rst
@@ -0,0 +1,1102 @@
+Disaggregating QEMU
+===================
+
+QEMU is often used as the hypervisor for virtual machines running in the
+Oracle cloud. Since one of the advantages of cloud computing is the
+ability to run many VMs from different tenants in the same cloud
+infrastructure, a guest that compromised its hypervisor could
+potentially use the hypervisor's access privileges to access data it is
+not authorized for.
+
+QEMU can be susceptible to security attacks because it is a large,
+monolithic program that provides many features to the VMs it services.
+Many of these features can be configured out of QEMU, but even a reduced
+configuration QEMU has a large amount of code a guest can potentially
+attack in order to gain additional privileges.
+
+QEMU services
+-------------
+
+QEMU can be broadly described as providing three main services. One is a
+VM control point, where VMs can be created, migrated, re-configured, and
+destroyed. A second is to emulate the CPU instructions within the VM,
+often accelerated by HW virtualization features such as Intel's VT
+extensions. Finally, it provides IO services to the VM by emulating HW
+IO devices, such as disk and network devices.
+
+A disaggregated QEMU
+~~~~~~~~~~~~~~~~~~~~
+
+A disaggregated QEMU involves separating QEMU services into separate
+host processes. Each of these processes can be given only the privileges
+it needs to provide its service, e.g., a disk service could be given
+access only to the disk images it provides, and not be allowed to
+access other files, or any network devices. An attacker who compromised
+this service would not be able to use this exploit to access files or
+devices beyond what the disk service was given access to.
+
+A QEMU control process would remain, but in disaggregated mode, it would
+be a control point that executes the processes needed to support the VM
+being created, but have no direct interfaces to the VM. During VM
+execution, it would still provide the user interface to hot-plug devices
+or live migrate the VM.
+
+A first step in creating a disaggregated QEMU is to separate IO services
+from the main QEMU program, which would continue to provide CPU
+emulation, i.e., the control process would also be the CPU emulation
+process. In a later phase, CPU emulation could be separated from the
+control process.
+
+Disaggregating IO services
+--------------------------
+
+Disaggregating IO services is a good place to begin disaggregating QEMU
+for a couple of reasons. One is that the sheer number of IO devices QEMU
+can emulate provides a large surface of interfaces which could
+potentially be exploited, and, indeed, have been a source of exploits in
+the past. Another is that the modular nature of QEMU device emulation
+code provides interface points where the QEMU functions that perform
+device emulation can be separated from the QEMU functions that manage
+the emulation of guest CPU instructions.
+
+QEMU device emulation
+~~~~~~~~~~~~~~~~~~~~~
+
+QEMU uses an object-oriented SW architecture for device emulation code.
+Configured objects are all compiled into the QEMU binary, then objects
+are instantiated by name when used by the guest VM. For example, the
+code to emulate a device named "foo" is always present in QEMU, but its
+instantiation code is only run when the device is included in the target
+VM (e.g., via the QEMU command line as *-device foo*).
+
+The object model is hierarchical, so device emulation code names its
+parent object (such as "pci-device" for a PCI device) and QEMU will
+instantiate a parent object before calling the device's instantiation
+code.
+
+Current separation models
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In order to separate the device emulation code from the CPU emulation
+code, the device object code must run in a different process. There are
+a couple of existing QEMU features that can run emulation code
+separately from the main QEMU process. These are examined below.
+
+vhost user model
+^^^^^^^^^^^^^^^^
+
+Virtio guest device drivers can be connected to vhost user applications
+in order to perform their IO operations. This model uses special virtio
+device drivers in the guest and vhost user device objects in QEMU, but
+once the QEMU vhost user code has configured the vhost user application,
+mission-mode IO is performed by the application. The vhost user
+application is a daemon process that can be contacted via a known UNIX
+domain socket.
+
+vhost socket
+''''''''''''
+
+As mentioned above, one of the tasks of the vhost device object within
+QEMU is to contact the vhost application and send it configuration
+information about this device instance. As part of the configuration
+process, the application can also be sent other file descriptors over
+the socket, which then can be used by the vhost user application in
+various ways, some of which are described below.
+
+vhost MMIO store acceleration
+'''''''''''''''''''''''''''''
+
+VMs are often run using HW virtualization features via the KVM kernel
+driver. This driver allows QEMU to accelerate the emulation of guest CPU
+instructions by running the guest in a virtual HW mode. When the guest
+executes instructions that cannot be executed by virtual HW mode,
+execution returns to the KVM driver so it can inform QEMU to emulate the
+instructions in SW.
+
+One of the events that can cause a return to QEMU is when a guest device
+driver accesses an IO location. QEMU then dispatches the memory
+operation to the corresponding QEMU device object. In the case of a
+vhost user device, the memory operation would need to be sent over a
+socket to the vhost application. This path is accelerated by the QEMU
+virtio code by setting up an eventfd file descriptor through which the
+vhost application can directly receive MMIO store notifications from the
+KVM driver, instead of needing them to be sent to the QEMU process first.
+
+vhost interrupt acceleration
+''''''''''''''''''''''''''''
+
+Another optimization used by the vhost application is the ability to
+directly inject interrupts into the VM via the KVM driver, again,
+bypassing the need to send the interrupt back to the QEMU process first.
+The QEMU virtio setup code configures the KVM driver with an eventfd
+that triggers the device interrupt in the guest when the eventfd is
+written. This irqfd file descriptor is then passed to the vhost user
+application program.
+
+vhost access to guest memory
+''''''''''''''''''''''''''''
+
+The vhost application is also allowed to directly access guest memory,
+instead of needing to send the data as messages to QEMU. This is also
+done with file descriptors sent to the vhost user application by QEMU.
+These descriptors can be passed to ``mmap()`` by the vhost application
+to map the guest address space into the vhost application.
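+
+As a rough illustration of this descriptor-passing scheme (a minimal
+Python sketch, not QEMU or vhost code), the fragment below backs a small
+region with a temporary file, sends its descriptor over a socket pair,
+and maps it on the receiving side. The region size, the ``b"mem"``
+message and all variable names are assumptions made for the example; it
+needs Python 3.9 or later on a UNIX host.
+
+::
+
+    import mmap
+    import socket
+    import tempfile
+
+    RAM_SIZE = 4096  # assumption: illustrative "guest RAM" size
+
+    # "QEMU" side: back the region with a file so it has a descriptor.
+    backing = tempfile.TemporaryFile()
+    backing.truncate(RAM_SIZE)
+    qemu_view = mmap.mmap(backing.fileno(), RAM_SIZE)
+
+    # A socket pair stands in for the QEMU <-> application socket.
+    qemu_sock, app_sock = socket.socketpair(socket.AF_UNIX,
+                                            socket.SOCK_STREAM)
+    socket.send_fds(qemu_sock, [b"mem"], [backing.fileno()])
+
+    # "Application" side: receive the descriptor, map the same pages.
+    msg, fds, _flags, _addr = socket.recv_fds(app_sock, 16, 1)
+    app_view = mmap.mmap(fds[0], RAM_SIZE)
+
+    # A store through one mapping is visible through the other.
+    qemu_view[0:5] = b"hello"
+    shared = bytes(app_view[0:5])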
+
+IOMMUs introduce another level of complexity, since the address given to
+the guest virtio device to DMA to or from is not a guest physical
+address. This case is handled by having vhost code within QEMU register
+as a listener for IOMMU mapping changes. The vhost application maintains
+a cache of IOMMU translations: sending translation requests back to
+QEMU on cache misses, and in turn receiving flush requests from QEMU
+when mappings are purged.
+
+applicability to device separation
+''''''''''''''''''''''''''''''''''
+
+Much of the vhost model can be re-used by separated device emulation. In
+particular, the ideas of using a socket between QEMU and the device
+emulation application, using a file descriptor to inject interrupts into
+the VM via KVM, and allowing the application to ``mmap()`` guest memory
+should be reused.
+
+There are, however, some notable differences between how a vhost
+application works and the needs of separated device emulation. The most
+basic is that vhost uses custom virtio device drivers which always
+trigger IO with MMIO stores. A separated device emulation model must
+work with existing IO device models and guest device drivers. MMIO loads
+break vhost store acceleration since they are synchronous: guest
+progress cannot continue until the load has been emulated. By contrast,
+stores are asynchronous: the guest can continue after the store event
+has been sent to the vhost application.
+
+Another difference is that in the vhost user model, a single daemon can
+support multiple QEMU instances. This is contrary to the security regime
+desired, in which the emulation application should only be allowed to
+access the files or devices the VM it's running on behalf of can access.
+
+qemu-io model
+^^^^^^^^^^^^^
+
+Qemu-io is a test harness used to test changes to the QEMU block backend
+object code (e.g., the code that implements disk images for disk driver
+emulation). Qemu-io is not a device emulation application per se, but it
+does compile the QEMU block objects into a separate binary from the main
+QEMU one. This could be useful for disk device emulation, since its
+emulation applications will need to include the QEMU block objects.
+
+New separation model based on proxy objects
+-------------------------------------------
+
+A different model based on proxy objects in the QEMU program
+communicating with remote emulation programs could provide separation
+while minimizing the changes needed to the device emulation code. The
+rest of this section is a discussion of how a proxy object model would
+work.
+
+Remote emulation processes
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The remote emulation process will run the QEMU object hierarchy without
+modification. The device emulation objects will also be based on the
+QEMU code, because for anything but the simplest device, it would not be
+tractable to re-implement both the object model and the many device
+backends that QEMU has.
+
+The processes will communicate with the QEMU process over UNIX domain
+sockets. The processes can be executed either as standalone processes,
+or be executed by QEMU. In both cases, the host backends an emulation
+process will provide are specified on its command line, as they would
+be for QEMU. For example:
+
+::
+
+    disk-proc -blockdev driver=file,node-name=file0,filename=disk-file0  \
+    -blockdev driver=qcow2,node-name=drive0,file=file0
+
+would indicate process *disk-proc* uses a qcow2 emulated disk named
+*drive0*, stored in the host file *disk-file0*, as its backend.
+
+Emulation processes may emulate more than one guest controller. A common
+configuration might be to put all controllers of the same device class
+(e.g., disk, network, etc.) in a single process, so that all backends of
+the same type can be managed by a single QMP monitor.
+
+communication with QEMU
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Remote emulation processes will recognize a *-socket* argument that
+specifies the path of a UNIX domain socket used to communicate with the
+QEMU process. If no *-socket* argument is present, the process will use
+file descriptor 0 to communicate with QEMU. For example,
+
+::
+
+    disk-proc -socket /tmp/disk0-sock <backend list>
+
+will communicate with QEMU using the socket path */tmp/disk0-sock*.
+
+remote process QMP monitor
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Remote emulation processes can be monitored via QMP, similar to QEMU
+itself. The QMP monitor socket is specified the same way as for a QEMU
+process. For example,
+
+::
+
+    disk-proc -qmp unix:/tmp/disk-mon,server
+
+can be monitored over the UNIX socket path */tmp/disk-mon*.
+
+QEMU command line
+~~~~~~~~~~~~~~~~~
+
+The QEMU command line options will need to be modified to indicate which
+items are emulated by a separate program, and which remain emulated by
+QEMU itself.
+
+identifying remote emulation processes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Remote emulation processes will be identified to QEMU using a *-remote*
+command line option. This option can either specify a command that QEMU
+will execute, or can specify a UNIX domain socket that QEMU can use to
+connect to an existing process. Both forms require an "id" option that
+identifies the process to later *-device* options. The process version
+is:
+
+::
+
+    -remote id=disk-proc,command="disk-proc <backend list>"
+
+And the socket version is:
+
+::
+
+    -remote id=disk-proc,socket="/tmp/disk0-sock"
+
+In the latter case, the remote process must be given the same socket on
+its command line when it is executed:
+
+::
+
+    disk-proc -socket /tmp/disk0-sock <backend list>
+
+identifying devices emulated remotely
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Devices that are to be emulated in a separate process will identify the
+remote process with a "remote" option on their *-device* command line
+specification. For example, an LSI SCSI controller and disk can be
+specified as:
+
+::
+
+    -device lsi53c895a,id=scsi0
+    -device scsi-hd,drive=drive0,bus=scsi0.0,scsi-id=0
+
+If these devices are emulated by remote process "disk-proc," as
+described in the previous section, the QEMU command line would be:
+
+::
+
+    -device lsi53c895a,id=scsi0,remote=disk-proc
+    -device scsi-hd,drive=drive0,bus=scsi0.0,scsi-id=0,remote=disk-proc
+
+Some devices are implicitly created by the machine object. For example,
+the q35 machine object will create its PCI bus, and attach an ich9-ahci
+IDE controller to it. In this case, options will need to be added to the
+*-machine* command line. For example,
+
+::
+
+    -machine pc-q35,ide-remote=disk-proc
+
+will use the remote process with an "id" of "disk-proc" to emulate the
+IDE controller and its disks.
+
+The disks themselves still need to be specified with the "remote"
+option, as in the example above, e.g.,
+
+::
+
+    -device ide-hd,drive=drive0,bus=ide.0,unit=0,remote=disk-proc
+
+QEMU management of remote processes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Each *-remote* instance on the QEMU command line will create a remote
+process proxy instance in QEMU. These instances will be held on a
+*QList* that can be searched by their "id" property. The remote process
+proxy will also establish a communication channel between QEMU and the
+remote process. This can be done in one of two ways: direct execution of
+the process by QEMU with ``fork()`` and ``exec()`` system calls, or by
+connecting to an existing process.
+
+direct execution
+^^^^^^^^^^^^^^^^
+
+When the remote process is directly executed, the remote process proxy
+will set up a communication channel between itself and the emulation
+process. This channel will be created using ``socketpair()`` and the
+remote process side of the pair will be given to the process as file
+descriptor 0.
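+
+The direct-execution path can be sketched in Python as follows. This is
+only a hedged illustration of the ``socketpair()`` plus file descriptor 0
+convention, not the proxy code itself: the ``CHILD`` script is a
+hypothetical stand-in for an emulation program, and the ``init`` and
+``ack:`` messages are invented for the example.
+
+::
+
+    import socket
+    import subprocess
+    import sys
+
+    # Hypothetical stand-in for the emulation program: it treats fd 0
+    # as its socket to QEMU and acknowledges one message.
+    CHILD = (
+        "import socket\n"
+        "s = socket.fromfd(0, socket.AF_UNIX, socket.SOCK_STREAM)\n"
+        "s.sendall(b'ack:' + s.recv(64))\n"
+    )
+
+    def spawn_remote():
+        qemu_end, remote_end = socket.socketpair(socket.AF_UNIX,
+                                                 socket.SOCK_STREAM)
+        # Popen installs remote_end as the child's fd 0 before exec.
+        proc = subprocess.Popen([sys.executable, "-c", CHILD],
+                                stdin=remote_end)
+        remote_end.close()  # the parent keeps only its own end
+        return qemu_end, proc
+
+    sock, proc = spawn_remote()
+    sock.sendall(b"init")
+    reply = sock.recv(64)
+    proc.wait()
+    sock.close()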
+
+connecting to an existing process
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Some environments wish to deny QEMU the ability to execute ``fork()``
+and ``exec()``. In these cases, emulation processes will be started before
+QEMU, and a UNIX domain socket will be given to each emulation process
+to communicate with QEMU over. After communication is established, the
+socket will be unlinked from the file system space by the QEMU process.
+
+communication with emulation process
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+primary socket
+''''''''''''''
+
+Whether the process was executed by QEMU or externally, there will be a
+primary socket for communication between QEMU and the remote process.
+This channel will handle configuration commands from QEMU to the
+process, either from the QEMU command line, or from QMP commands that
+affect the devices being emulated by the process. This channel will only
+allow one message to be pending at a time; if additional messages
+arrive, they must wait for previous ones to be acknowledged from the
+remote side.
+
+secondary sockets
+'''''''''''''''''
+
+The primary socket can pass the file descriptors of secondary sockets
+for operations that occur in parallel with commands on the primary
+channel. These include MMIO operations generated by the guest, interrupt
+notifications generated by the devices being emulated, or *vmstate* for
+live migration. These secondary sockets will be created at the behest of
+the device proxies that require them. A disk device proxy wouldn't need
+any secondary sockets, but a disk controller device proxy may need both
+an MMIO socket and an interrupt socket.
+
+emulation process attached via QMP command
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+There will be a new "attach-process" QMP command to facilitate device
+hot-plug. This command's arguments will be the same as the *-remote*
+command line when it's used to attach to a remote process. i.e., it will
+need an "id" argument so that hot-plugged devices can later find it, and
+a "socket" argument to identify the UNIX domain socket that will be used
+to communicate with QEMU.
+
+QEMU device proxy objects
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+QEMU has an object model based on sub-classes inherited from the
+"object" super-class. The sub-classes that are of interest here are the
+"device" and "bus" sub-classes whose child sub-classes make up the
+device tree of a QEMU emulated system.
+
+The proxy object model will use device proxy objects to replace the
+device emulation code within the QEMU process. These objects will live
+in the same place in the object and bus hierarchies as the objects they
+replace. i.e., the proxy object for an LSI SCSI controller will be a
+sub-class of the "pci-device" class, and will have the same PCI bus
+parent and the same SCSI bus child objects as the LSI controller object
+it replaces.
+
+After the QEMU command line has been parsed, the remote devices will be
+instantiated in the same manner as local devices are (i.e., with
+``qdev_device_add()``). In order to distinguish them from regular
+*-device* device objects, their class name will be the name of the class
+they replace, with "-proxy" appended. e.g., the "lsi53c895a" proxy class
+will be "lsi53c895a-proxy."
+
+device JSON description
+^^^^^^^^^^^^^^^^^^^^^^^
+
+The remote process needs a JSON representation of the command line
+options used to create the object. This JSON representation is used to
+create the corresponding object in the emulation process. e.g., for an
+LSI SCSI controller invoked as:
+
+::
+
+     -device lsi53c895a,id=scsi0,remote=lsi-scsi
+
+the proxy object would create a
+
+::
+
+    { "driver" : "lsi53c895a", "id" : "scsi0" }
+
+JSON description. The "driver" option is assigned to the device name
+when the command line is parsed, so the "-proxy" appended by the command
+line parsing code is removed. The "remote" option isn't needed in the
+JSON description since it only applies to the proxy object in the QEMU
+process.
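+
+The mapping from a *-device* option string to this JSON description can
+be sketched as below. This is an assumed, simplified parser written for
+illustration (``device_json`` is a hypothetical name, and real QEMU
+option parsing handles quoting and implied keys that this ignores).
+
+::
+
+    import json
+
+    def device_json(optstr):
+        """Build the description sent to the remote process."""
+        driver, *props = optstr.split(",")
+        desc = {"driver": driver}
+        for prop in props:
+            key, _, value = prop.partition("=")
+            if key != "remote":  # only meaningful to the proxy in QEMU
+                desc[key] = value
+        return json.dumps(desc)
+
+    text = device_json("lsi53c895a,id=scsi0,remote=lsi-scsi")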
+
+device object whitelist
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Some device objects may not need a proxy. These are devices with no
+direct guest interfaces (e.g., no MMIO, PIO, or interrupts). There will
+be a whitelist of such devices, and any devices on this list will not be
+instantiated in QEMU. Their JSON representation will still be sent to
+the remote process, so the object can be created there.
+
+object initialization
+^^^^^^^^^^^^^^^^^^^^^
+
+QEMU object initialization occurs in two phases. The first
+initialization happens once per object class (i.e., there can be many
+SCSI disks in an emulated system, but the "scsi-hd" class has its
+``class_init()`` function called only once). The second phase happens
+when each object's ``instance_init()`` function is called to initialize
+each instance of the object.
+
+All device objects are sub-classes of the "device" class, so they also
+have a ``realize()`` function that is called after ``instance_init()``
+is called and after the object's static properties have been
+initialized. Many device objects don't even provide an ``instance_init()``
+function, and do all their per-instance work in ``realize()``.
+
+class\_init
+'''''''''''
+
+The ``class_init()`` method of a proxy object will, in general, behave
+similarly to the object it replaces, including setting any static
+properties and methods needed by the proxy.
+
+instance\_init / realize
+''''''''''''''''''''''''
+
+The ``instance_init()`` and ``realize()`` functions would only need to
+perform tasks related to being a proxy, such as registering its own
+MMIO handlers, or creating a child bus that other proxy devices can be
+attached to later.
+
+Other tasks are device-specific. For example, PCI device objects
+will initialize the PCI config space in order to make a valid PCI device
+tree within the QEMU process.
+
+address space registration
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Most devices are driven by guest device driver accesses to IO addresses
+or ports. The QEMU device emulation code uses QEMU's memory region
+function calls (such as ``memory_region_init_io()``) to add callback
+functions that QEMU will invoke when the guest accesses the device's
+areas of the IO address space. When a guest driver does access the
+device, the VM will exit HW virtualization mode and return to QEMU,
+which will then look up and execute the corresponding callback function.
+
+A proxy object would need to mirror the memory region calls the actual
+device emulator would perform in its initialization code, but with its
+own callbacks. When invoked by QEMU as a result of a guest IO operation,
+they will forward the operation to the device emulation process.
+
+PCI config space
+^^^^^^^^^^^^^^^^
+
+PCI devices also have a configuration space that can be accessed by the
+guest driver. Guest accesses to this space are not handled by the device
+emulation object, but by its PCI parent object. Much of this space is
+read-only, but certain registers (especially BAR and MSI-related ones)
+need to be propagated to the emulation process.
+
+PCI parent proxy
+''''''''''''''''
+
+One way to propagate guest PCI config accesses is to create a
+"pci-device-proxy" class that can serve as the parent of a PCI device
+proxy object. This class's parent would be "pci-device" and it would
+override the PCI parent's ``config_read()`` and ``config_write()``
+methods with ones that forward these operations to the emulation
+program.
+
+interrupt receipt
+^^^^^^^^^^^^^^^^^
+
+A proxy for a device that generates interrupts will need to create a
+socket to receive interrupt indications from the emulation process. An
+incoming interrupt indication would then be sent up to its bus parent to
+be injected into the guest. For example, a PCI device object may use
+``pci_set_irq()``.
+
+live migration
+^^^^^^^^^^^^^^
+
+The proxy will register to save and restore any *vmstate* it needs over
+a live migration event. The device proxy does not need to manage the
+remote device's *vmstate*; that will be handled by the remote process
+proxy (see below).
+
+QEMU remote device operation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Generic device operations, such as DMA, will be performed by the remote
+process proxy by sending messages to the remote process.
+
+DMA operations
+^^^^^^^^^^^^^^
+
+DMA operations would be handled much like vhost applications do. One of
+the initial messages sent to the emulation process is a guest memory
+table. Each entry in this table consists of a file descriptor and size
+that the emulation process can ``mmap()`` to directly access guest
+memory, similar to ``vhost_user_set_mem_table()``. Note that guest memory
+must be backed by file descriptors, such as when QEMU is given the
+*-mem-path* command line option.
+
+IOMMU operations
+^^^^^^^^^^^^^^^^
+
+When the emulated system includes an IOMMU, the remote process proxy in
+QEMU will need to create a socket for IOMMU requests from the emulation
+process. It will handle those requests with an
+``address_space_get_iotlb_entry()`` call. In order to handle IOMMU
+unmaps, the remote process proxy will also register as a listener on the
+device's DMA address space. When an IOMMU memory region is created
+within the DMA address space, an IOMMU notifier for unmaps will be added
+to the memory region that will forward unmaps to the emulation process
+over the IOMMU socket.
+
+device hot-plug via QMP
+^^^^^^^^^^^^^^^^^^^^^^^
+
+A QMP "device\_add" command can add a device emulated by a remote
+process. It needs to add a "remote" option to the command, just as the
+*-device* command line option does. The remote process may either be one
+started at QEMU startup, or be one added by the "attach-process" QMP
+command described above. In either case, the remote process proxy will
+forward the new device's JSON description to the corresponding emulation
+process.
+
+live migration
+^^^^^^^^^^^^^^
+
+The remote process proxy will also register for live migration
+notifications with ``vmstate_register()``. When called to save state,
+the proxy will send the remote process a secondary socket file
+descriptor to save the remote process's device *vmstate* over. The
+incoming byte stream length and data will be saved as the proxy's
+*vmstate*. When the proxy is resumed on its new host, this *vmstate*
+will be extracted, and a secondary socket file descriptor will be sent
+to the new remote process through which it receives the *vmstate* in
+order to restore the devices there.
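+
+The save and restore halves of this exchange can be sketched in Python
+as below. The framing (an 8-byte big-endian length followed by the
+data) and the function names are assumptions chosen for the
+illustration; the text above only says that the stream's length and
+data are saved as the proxy's *vmstate*.
+
+::
+
+    import socket
+    import struct
+
+    def save_remote_state(qemu_end):
+        """Drain the remote's state stream, return <length><data>."""
+        chunks = []
+        while True:
+            chunk = qemu_end.recv(4096)
+            if not chunk:  # remote closed its end: stream complete
+                break
+            chunks.append(chunk)
+        data = b"".join(chunks)
+        return struct.pack("!Q", len(data)) + data
+
+    def load_remote_state(blob, qemu_end):
+        """Feed the saved data to the remote process on the new host."""
+        (length,) = struct.unpack_from("!Q", blob)
+        qemu_end.sendall(blob[8:8 + length])
+
+    # Source host: the remote writes its state, then closes the socket.
+    src_qemu, src_remote = socket.socketpair()
+    src_remote.sendall(b"lsi-device-state")
+    src_remote.close()
+    blob = save_remote_state(src_qemu)
+
+    # Destination host: the new remote reads the state back.
+    dst_qemu, dst_remote = socket.socketpair()
+    load_remote_state(blob, dst_qemu)
+    restored = dst_remote.recv(64)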
+
+device emulation in remote process
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The parts of QEMU that the emulation program will need include the
+object model; the memory emulation objects; the device emulation objects
+of the targeted device, and any dependent devices; and the device's
+backends. It will also need code to set up the machine environment,
+handle requests from the QEMU process, and route machine-level requests
+(such as interrupts or IOMMU mappings) back to the QEMU process.
+
+initialization
+^^^^^^^^^^^^^^
+
+The process initialization sequence will follow the same sequence
+followed by QEMU. It will first initialize the backend objects, then
+device emulation objects. The JSON descriptions sent by the QEMU process
+will drive which objects need to be created.
+
+-  address spaces
+
+Before the device objects are created, the initial address spaces and
+memory regions must be configured with ``memory_map_init()``. This
+creates a RAM memory region object (*system\_memory*) and an IO memory
+region object (*system\_io*).
+
+-  RAM
+
+RAM memory region creation will follow how ``pc_memory_init()`` creates
+them, but must use ``memory_region_init_ram_from_fd()`` instead of
+``memory_region_allocate_system_memory()``. The file descriptors needed
+will be supplied by the guest memory table from above. Those RAM regions
+would then be added to the *system\_memory* memory region with
+``memory_region_add_subregion()``.
+
+-  PCI
+
+IO initialization will be driven by the JSON descriptions sent from the
+QEMU process. For a PCI device, a PCI bus will need to be created with
+``pci_root_bus_new()``, and a PCI memory region will need to be created
+and added to the *system\_memory* memory region with
+``memory_region_add_subregion_overlap()``. The overlap version is
+required for architectures where PCI memory overlaps with RAM memory.
+
+MMIO handling
+^^^^^^^^^^^^^
+
+The device emulation objects will use ``memory_region_init_io()`` to
+install their MMIO handlers, and ``pci_register_bar()`` to associate
+those handlers with a PCI BAR, as they do within QEMU currently.
+
+In order to use ``address_space_rw()`` in the emulation process to
+handle MMIO requests from QEMU, the PCI physical addresses must be the
+same in the QEMU process and the device emulation process. In order to
+accomplish that, guest BAR programming must also be forwarded from QEMU
+to the emulation process.
+
+interrupt injection
+^^^^^^^^^^^^^^^^^^^
+
+When device emulation wants to inject an interrupt into the VM, the
+request climbs the device's bus object hierarchy until the point where a
+bus object knows how to signal the interrupt to the guest. The details
+depend on the type of interrupt being raised.
+
+-  PCI pin interrupts
+
+On x86 systems, there is an emulated IOAPIC object attached to the root
+PCI bus object, and the root PCI object forwards interrupt requests to
+it. The IOAPIC object, in turn, calls the KVM driver to inject the
+corresponding interrupt into the VM. The simplest way to handle this in
+an emulation process would be to set up the root PCI bus driver (via
+``pci_bus_irqs()``) to send an interrupt request back to the QEMU
+process, and have the device proxy object reflect it up the PCI tree
+there.
+
+-  PCI MSI/X interrupts
+
+PCI MSI/X interrupts are implemented in HW as DMA writes to a
+CPU-specific PCI address. In QEMU on x86, a KVM APIC object receives
+these DMA writes, then calls into the KVM driver to inject the interrupt
+into the VM. A simple emulation process implementation would be to send
+the MSI DMA address from QEMU as a message at initialization, then
+install an address space handler at that address which forwards the MSI
+message back to QEMU.
+
+DMA operations
+^^^^^^^^^^^^^^
+
+When an emulation object wants to DMA into or out of guest memory, it
+first must use ``dma_memory_map()`` to convert the DMA address to a local
+virtual address. The emulation process memory region objects setup above
+will be used to translate the DMA address to a local virtual address the
+device emulation code can access.
+
+IOMMU
+^^^^^
+
+When an IOMMU is in use in QEMU, DMA translation uses IOMMU memory
+regions to translate the DMA address to a guest physical address before
+that physical address can be translated to a local virtual address. The
+emulation process will need similar functionality.
+
+-  IOTLB cache
+
+The emulation process will maintain a cache of recent IOMMU translations
+(the IOTLB). When the ``translate()`` callback of an IOMMU memory region
+is invoked, the IOTLB cache will be searched for an entry that will map
+the DMA address to a guest PA. On a cache miss, a message will be sent
+back to QEMU requesting the corresponding translation entry, which will
+both be used to return a guest address and be added to the cache.
+
+-  IOTLB purge
+
+The IOMMU emulation will also need to act on unmap requests from QEMU.
+These happen when the guest IOMMU driver purges an entry from the
+guest's translation table.
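+
+The cache-miss and purge behaviour described above can be sketched as
+follows. This is a hypothetical Python model, not QEMU code: the
+``ask_qemu`` callback stands in for the translation request message,
+and the fixed 4 KiB page granularity is an assumption of the example.
+
+::
+
+    class IotlbCache:
+        def __init__(self, ask_qemu, page_size=4096):
+            self.ask_qemu = ask_qemu  # stand-in for a message to QEMU
+            self.page_size = page_size
+            self.entries = {}         # IOVA page -> guest PA of page
+            self.misses = 0
+
+        def translate(self, iova):
+            page, offset = divmod(iova, self.page_size)
+            if page not in self.entries:
+                # Cache miss: request the translation and remember it.
+                self.misses += 1
+                base = page * self.page_size
+                self.entries[page] = self.ask_qemu(base)
+            return self.entries[page] + offset
+
+        def unmap(self, iova, length):
+            # Purge every cached page overlapping [iova, iova + length).
+            first = iova // self.page_size
+            last = (iova + length - 1) // self.page_size
+            for page in range(first, last + 1):
+                self.entries.pop(page, None)
+
+    cache = IotlbCache(ask_qemu=lambda iova: 0x100000 + iova)
+    cache.translate(0x2010)      # miss: entry fetched from "QEMU"
+    cache.translate(0x2020)      # hit: same page
+    cache.unmap(0x2000, 0x1000)  # guest purged the mapping
+    cache.translate(0x2010)      # miss again after the purge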
+
+live migration
+^^^^^^^^^^^^^^
+
+When a remote process receives a live migration indication from QEMU, it
+will set up a channel using the received file descriptor with
+``qio_channel_socket_new_fd()``. This channel will be used to create a
+*QEMUFile* that can be passed to ``qemu_save_device_state()`` to send
+the process's device state back to QEMU. This method will be reversed on
+restore: the channel will be passed to ``qemu_loadvm_state()`` to
+restore the device state.
+
+Accelerating device emulation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The messages that are required to be sent between QEMU and the emulation
+process can add considerable latency to IO operations. The optimizations
+described below attempt to ameliorate this effect by allowing the
+emulation process to communicate directly with the kernel KVM driver.
+The KVM file descriptors created would be passed to the emulation process
+via initialization messages, much as the guest memory table is.
+
+MMIO acceleration
+^^^^^^^^^^^^^^^^^
+
+Vhost user applications can receive guest virtio driver stores directly
+from KVM. The issue with the eventfd mechanism used by vhost user is
+that it does not pass any data with the event indication, so it cannot
+handle guest loads or guest stores that carry store data. This concept
+could, however, be expanded to cover more cases.
+
+The expanded idea would require a new type of KVM device:
+*KVM\_DEV\_TYPE\_USER*. This device has two file descriptors: a master
+descriptor that QEMU can use for configuration, and a slave descriptor
+that the emulation process can use to receive MMIO notifications. QEMU
+would create both descriptors using the KVM driver, and pass the slave
+descriptor to the emulation process via an initialization message.
+
+data structures
+'''''''''''''''
+
+-  guest physical range
+
+The guest physical range structure describes the address range that a
+device will respond to. It includes the base and length of the range, as
+well as which bus the range resides on (e.g., on an x86 machine, it can
+specify whether the range refers to memory or IO addresses).
+
+A device can have multiple physical address ranges it responds to (e.g.,
+a PCI device can have multiple BARs), so the structure will also include
+an enumerated identifier to specify which of the device's ranges is
+being referred to.
+
++--------+----------------------------+
+| Name   | Description                |
++========+============================+
+| addr   | range base address         |
++--------+----------------------------+
+| len    | range length               |
++--------+----------------------------+
+| bus    | addr type (memory or IO)   |
++--------+----------------------------+
+| id     | range ID (e.g., PCI BAR)   |
++--------+----------------------------+
+
+-  MMIO request structure
+
+This structure describes an MMIO operation. It includes which guest
+physical range the MMIO was within, the offset within that range, the
+MMIO type (e.g., load or store), and its length and data. It also
+includes a sequence number that can be used to reply to the MMIO, and
+the CPU that issued the MMIO.
+
++----------+------------------------+
+| Name     | Description            |
++==========+========================+
+| rid      | range MMIO is within   |
++----------+------------------------+
+| offset   | offset within *rid*    |
++----------+------------------------+
+| type     | e.g., load or store    |
++----------+------------------------+
+| len      | MMIO length            |
++----------+------------------------+
+| data     | store data             |
++----------+------------------------+
+| seq      | sequence ID            |
++----------+------------------------+
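+
+One possible in-memory and wire representation of this structure is
+sketched below in Python. The field widths, the little-endian packed
+layout and the ``MMIO_LOAD`` / ``MMIO_STORE`` values are assumptions
+made for the example; the table above fixes only the field names.
+
+::
+
+    import struct
+    from dataclasses import dataclass
+
+    MMIO_LOAD, MMIO_STORE = 0, 1  # assumed encodings of "type"
+    _FMT = "<IQBBQI"  # rid, offset, type, len, data, seq (assumed)
+
+    @dataclass
+    class MmioRequest:
+        rid: int     # range the MMIO is within
+        offset: int  # offset within that range
+        type: int    # MMIO_LOAD or MMIO_STORE
+        len: int     # access length in bytes
+        data: int    # store data (unused for loads)
+        seq: int     # sequence ID used to match the reply
+
+        def pack(self):
+            return struct.pack(_FMT, self.rid, self.offset, self.type,
+                               self.len, self.data, self.seq)
+
+        @classmethod
+        def unpack(cls, raw):
+            return cls(*struct.unpack(_FMT, raw))
+
+    req = MmioRequest(rid=0, offset=0x10, type=MMIO_STORE, len=4,
+                      data=0xdeadbeef, seq=7)
+    wire = req.pack()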
+
+-  MMIO request queues
+
+MMIO request queues are FIFO arrays of MMIO request structures. There
+are two queues: the pending queue is for MMIOs that haven't been read by
+the emulation program, and the sent queue is for MMIOs that haven't been
+acknowledged. The main use of the second queue is to validate MMIO
+replies from the emulation program.
+
+-  scoreboard
+
+Each CPU in the VM is emulated in QEMU by a separate thread, so multiple
+MMIOs may be waiting to be consumed by an emulation program and multiple
+threads may be waiting for MMIO replies. The scoreboard would contain a
+wait queue and sequence number for the per-CPU threads, allowing them to
+be individually woken when the MMIO reply is received from the emulation
+program. It also tracks the number of posted MMIO stores to the device
+that haven't been replied to, in order to satisfy the PCI constraint
+that a load to a device will not complete until all previous stores to
+that device have been completed.
+
+-  device shadow memory
+
+Some MMIO loads do not have device side-effects. These MMIOs can be
+completed without sending a MMIO request to the emulation program if the
+emulation program shares a shadow image of the device's memory image
+with the KVM driver.
+
+The emulation program will ask the KVM driver to allocate memory for the
+shadow image, and will then use ``mmap()`` to directly access it. The
+emulation program can control KVM access to the shadow image by sending
+KVM an access map telling it which areas of the image have no
+side-effects (and can be completed immediately), and which require a
+MMIO request to the emulation program. The access map can also inform
+the KVM driver which access sizes are allowed to the image.
+
+master descriptor
+'''''''''''''''''
+
+The master descriptor is used by QEMU to configure the new KVM device.
+The descriptor would be returned by the KVM driver when QEMU issues a
+*KVM\_CREATE\_DEVICE* ``ioctl()`` with a *KVM\_DEV\_TYPE\_USER* type.
+
+KVM\_DEV\_TYPE\_USER device ops
+
+
+The *KVM\_DEV\_TYPE\_USER* operations vector will be registered by a
+``kvm_register_device_ops()`` call when the KVM system is initialized by
+``kvm_init()``. These device ops are called by the KVM driver when QEMU
+executes certain ``ioctl()`` operations on its KVM file descriptor. They
+include:
+
+-  create
+
+This routine is called when QEMU issues a *KVM\_CREATE\_DEVICE*
+``ioctl()`` on its per-VM file descriptor. It will allocate and
+initialize a KVM user device specific data structure, and assign the
+*kvm\_device* private field to it.
+
+-  ioctl
+
+This routine is invoked when QEMU issues an ``ioctl()`` on the master
+descriptor. The ``ioctl()`` commands supported are defined by the KVM
+device type. *KVM\_DEV\_TYPE\_USER* ones will need several commands:
+
+*KVM\_DEV\_USER\_SLAVE\_FD* creates the slave file descriptor that will
+be passed to the device emulation program. Only one slave can be created
+by each master descriptor. The file operations performed by this
+descriptor are described below.
+
+The *KVM\_DEV\_USER\_PA\_RANGE* command configures a guest physical
+address range that the slave descriptor will receive MMIO notifications
+for. The range is specified by a guest physical range structure
+argument. For buses that assign addresses to devices dynamically, this
+command can be executed while the guest is running, such as the case
+when a guest changes a device's PCI BAR registers.
+
+*KVM\_DEV\_USER\_PA\_RANGE* will use ``kvm_io_bus_register_dev()`` to
+register *kvm\_io\_device\_ops* callbacks to be invoked when the guest
+performs a MMIO operation within the range. When a range is changed,
+``kvm_io_bus_unregister_dev()`` is used to remove the previous
+instantiation.
+
+*KVM\_DEV\_USER\_TIMEOUT* will configure a timeout value that specifies
+how long KVM will wait for the emulation process to respond to a MMIO
+indication.
+
+-  destroy
+
+This routine is called when the VM instance is destroyed. It will need
+to destroy the slave descriptor and free any memory allocated by the
+driver, as well as the *kvm\_device* structure itself.
+
+slave descriptor
+''''''''''''''''
+
+The slave descriptor will have its own file operations vector, which
+responds to system calls on the descriptor performed by the device
+emulation program.
+
+-  read
+
+A read returns any pending MMIO requests from the KVM driver as MMIO
+request structures. Multiple structures can be returned if there are
+multiple MMIO operations pending. The MMIO requests are moved from the
+pending queue to the sent queue, and if there are threads waiting for
+space in the pending queue to add new MMIO operations, they will be woken
+here.
+
+-  write
+
+A write also consists of a set of MMIO requests. They are compared to
+the MMIO requests in the sent queue. Matches are removed from the sent
+queue, and any threads waiting for the reply are woken. If a store is
+removed, then the number of posted stores in the per-CPU scoreboard is
+decremented. When the number is zero, and a non side-effect load was
+waiting for posted stores to complete, the load is continued.
+
+-  ioctl
+
+There are several ioctl()s that can be performed on the slave
+descriptor.
+
+A *KVM\_DEV\_USER\_SHADOW\_SIZE* ``ioctl()`` causes the KVM driver to
+allocate memory for the shadow image. This memory can later be
+``mmap()``\ ed by the emulation process to share the emulation's view of
+device memory with the KVM driver.
+
+A *KVM\_DEV\_USER\_SHADOW\_CTRL* ``ioctl()`` controls access to the
+shadow image. It will send the KVM driver a shadow control map, which
+specifies which areas of the image can complete guest loads without
+sending the load request to the emulation program. It will also specify
+the size of load operations that are allowed.
+
+-  poll
+
+An emulation program will use the ``poll()`` call with a *POLLIN* flag
+to determine if there are MMIO requests waiting to be read. It will
+return if the pending MMIO request queue is not empty.
+
+-  mmap
+
+This call allows the emulation program to directly access the shadow
+image allocated by the KVM driver. As device emulation updates device
+memory, changes with no side-effects will be reflected in the shadow,
+and the KVM driver can satisfy guest loads from the shadow image without
+needing to wait for the emulation program.
+
+kvm\_io\_device ops
+'''''''''''''''''''
+
+Each KVM per-CPU thread can handle MMIO operations on behalf of the guest
+VM. KVM will use the MMIO's guest physical address to search for a
+matching *kvm\_io\_device* to see if the MMIO can be handled by the KVM
+driver instead of exiting back to QEMU. If a match is found, the
+corresponding callback will be invoked.
+
+-  read
+
+This callback is invoked when the guest performs a load to the device.
+Loads with side-effects must be handled synchronously, with the KVM
+driver putting the QEMU thread to sleep waiting for the emulation
+process reply before re-starting the guest. Loads that do not have
+side-effects may be optimized by satisfying them from the shadow image,
+if there are no outstanding stores to the device by this CPU. PCI memory
+ordering demands that a load cannot complete before all older stores to
+the same device have been completed.
+
+-  write
+
+Stores can be handled asynchronously unless the pending MMIO request
+queue is full. In this case, the QEMU thread must sleep waiting for
+space in the queue. Stores will increment the number of posted stores in
+the per-CPU scoreboard, in order to implement the PCI ordering
+constraint above.
+
+interrupt acceleration
+^^^^^^^^^^^^^^^^^^^^^^
+
+This performance optimization would work much like a vhost user
+application does, where the QEMU process sets up *eventfds* that cause
+the device's corresponding interrupt to be triggered by the KVM driver.
+These irq file descriptors are sent to the emulation process at
+initialization, and are used when the emulation code raises a device
+interrupt.
+
+intx acceleration
+'''''''''''''''''
+
+Traditional PCI pin interrupts are level based, so, in addition to an
+irq file descriptor, a re-sampling file descriptor needs to be sent to
+the emulation program. This second file descriptor allows multiple
+devices sharing an irq to be notified when the interrupt has been
+acknowledged by the guest, so they can re-trigger the interrupt if their
+device has not de-asserted its interrupt.
+
+intx irq descriptor
+
+
+The irq descriptors are created by the proxy object
+using ``event_notifier_init()`` to create the irq and re-sampling
+*eventfds*, and ``kvm_vm_ioctl(KVM_IRQFD)`` to bind them to an interrupt.
+The interrupt route can be found with
+``pci_device_route_intx_to_irq()``.
+
+intx routing changes
+
+
+Intx routing can be changed when the guest programs the APIC the device
+pin is connected to. The proxy object in QEMU will use
+``pci_device_set_intx_routing_notifier()`` to be informed of any guest
+changes to the route. This handler will broadly follow the VFIO
+interrupt logic to change the route: de-assigning the existing irq
+descriptor from its route, then assigning it the new route (see
+``vfio_intx_update()``).
+
+MSI/X acceleration
+''''''''''''''''''
+
+MSI/X interrupts are sent as DMA transactions to the host. The interrupt
+data contains a vector that is programmed by the guest. A device may have
+multiple MSI interrupts associated with it, so multiple irq descriptors
+may need to be sent to the emulation program.
+
+MSI/X irq descriptor
+
+
+This case will also follow the VFIO example. For each MSI/X interrupt,
+an *eventfd* is created, a virtual interrupt is allocated by
+``kvm_irqchip_add_msi_route()``, and the virtual interrupt is bound to
+the eventfd with ``kvm_irqchip_add_irqfd_notifier()``.
+
+MSI/X config space changes
+
+
+The guest may dynamically update several MSI-related tables in the
+device's PCI config space. These include per-MSI interrupt enables and
+vector data. Additionally, MSIX tables exist in device memory space, not
+config space. Much like the BAR case above, the proxy object must look
+at guest config space programming to keep the MSI interrupt state
+consistent between QEMU and the emulation program.
+
+--------------
+
+Disaggregated CPU emulation
+---------------------------
+
+After IO services have been disaggregated, a second phase would be to
+separate a process to handle CPU instruction emulation from the main
+QEMU control function. There are no object separation points for this
+code, so the first task would be to create one.
+
+Host access controls
+--------------------
+
+Separating QEMU relies on the host OS's access restriction mechanisms to
+enforce that the differing processes can only access the objects they
+are entitled to. There are a couple of types of mechanisms usually provided
+by general purpose OSs.
+
+Discretionary access control
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Discretionary access control allows each user to control who can access
+their files. In Linux, this type of control is usually too coarse for
+QEMU separation, since it only provides three separate access controls:
+one for the same user ID, the second for user IDs with the same group
+ID, and the third for all other user IDs. Each device instance would
+need a separate user ID to provide access control, which is likely to be
+unwieldy for dynamically created VMs.
+
+Mandatory access control
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Mandatory access control allows the OS to impose an additional set of
+controls, managed by the OS itself, on top of discretionary access. It also
+adds other attributes to processes and files such as types, roles, and
+categories, and can establish rules for how processes and files can
+interact.
+
+Type enforcement
+^^^^^^^^^^^^^^^^
+
+Type enforcement assigns a *type* attribute to processes and files, and
+allows rules to be written on what operations a process with a given
+type can perform on a file with a given type. QEMU separation could take
+advantage of type enforcement by running the emulation processes with
+different types, both from the main QEMU process, and from the emulation
+processes of different classes of devices.
+
+For example, guest disk images and disk emulation processes could have
+types separate from the main QEMU process and non-disk emulation
+processes, and the type rules could prevent processes other than disk
+emulation ones from accessing guest disk images. Similarly, network
+emulation processes can have a type separate from the main QEMU process
+and non-network emulation processes, and only that type can access the
+host tun/tap device used to provide guest networking.
+
+Category enforcement
+^^^^^^^^^^^^^^^^^^^^
+
+Category enforcement assigns a set of numbers within a given range to
+the process or file. The process is granted access to the file if the
+process's set is a superset of the file's set. This enforcement can be
+used to separate multiple instances of devices in the same class.
+
+For example, if there are multiple disk devices provided to a guest,
+each device emulation process could be provisioned with a separate
+category. The different device emulation processes would not be able to
+access each other's backing disk images.
+
+Alternatively, categories could be used in lieu of the type enforcement
+scheme described above. In this scenario, different categories would be
+used to prevent device emulation processes in different classes from
+accessing resources assigned to other classes.
-- 
1.8.3.1




* [RFC v4 PATCH 49/49] multi-process: add configure and usage information
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (47 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 48/49] multi-process: add the concept description to docs/devel/qemu-multiprocess Jagannathan Raman
@ 2019-10-24  9:09 ` Jagannathan Raman
  2019-11-07 14:02   ` Stefan Hajnoczi
  2019-10-25  2:08 ` [RFC v4 PATCH 00/49] Initial support of multi-process qemu no-reply
                   ` (4 subsequent siblings)
  53 siblings, 1 reply; 140+ messages in thread
From: Jagannathan Raman @ 2019-10-24  9:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, john.g.johnson, kraxel, jag.raman,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
---
 docs/qemu-multiprocess.txt | 86 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 86 insertions(+)
 create mode 100644 docs/qemu-multiprocess.txt

diff --git a/docs/qemu-multiprocess.txt b/docs/qemu-multiprocess.txt
new file mode 100644
index 0000000..c29f4df
--- /dev/null
+++ b/docs/qemu-multiprocess.txt
@@ -0,0 +1,86 @@
+Multi-process QEMU
+==================
+
+This document describes how to configure and use multi-process qemu.
+For the design document, refer to docs/devel/qemu-multiprocess.rst.
+
+1) Configuration
+----------------
+
+To enable support for multi-process QEMU, add --enable-mpqemu
+to the list of options for the "configure" script.
+
+
+2) Usage
+--------
+
+To start qemu with devices intended to run in a separate emulation
+process without libvirtd support, the following should be used on the QEMU
+command line. As of now, we only support the emulation of lsi53c895a
+in a separate process.
+
+* Since parts of the RAM are shared between QEMU & remote process, a
+  memory-backend-file is required to facilitate this, as follows:
+
+  -object memory-backend-file,id=mem,mem-path=/dev/shm/,size=4096M,share=on
+
+* The devices to be emulated in the separate process are defined as
+  before, with the addition of the "rid" suboption, which serves as a
+  remote group identifier.
+
+  -device <device options>,rid="remote process id"
+
+  For example, for non multi-process qemu:
+    -device lsi53c895a,id=scsi0
+    -device scsi-hd,drive=drive0,bus=scsi0.0,scsi-id=0
+    -drive id=drive0,file=data-disk.img
+
+  and for multi-process qemu and no libvirt
+  support (i.e. QEMU forks child processes):
+    -device lsi53c895a,id=scsi0,rid=0
+    -device scsi-hd,drive=drive0,bus=scsi0.0,scsi-id=0,rid="0"
+
+* The command-line options for the remote process are added to the "command"
+  suboption of the newly added "-remote" option.
+
+   -remote [socket],rid=,command="..."
+
+  The drives to be emulated by the remote process are specified as part of
+  this command sub-option. The device to be used to connect to the monitor
+  is also specified as part of this suboption.
+
+  For example, the following option adds a drive and monitor to the remote
+  process:
+  -remote rid=0,command="-drive id=drive0,,file=data-disk.img -monitor unix:/home/qmp-sock,,server,,nowait"
+
+  Note: There's an issue with this "command" suboption which we are in the
+  process of fixing. To work around this issue, additional "comma"
+  characters are required, as illustrated above and in the example below.
+
+* Example QEMU command-line to launch lsi53c895a in a remote process
+
+  #!/bin/sh
+  qemu-system-x86_64 \
+  -name "OL7.4" \
+  -machine q35,accel=kvm \
+  -smp sockets=1,cores=1,threads=1 \
+  -cpu host \
+  -m 2048 \
+  -object memory-backend-file,id=mem,mem-path=/dev/shm/,size=2G,share=on \
+  -numa node,memdev=mem \
+  -device virtio-scsi-pci,id=virtio_scsi_pci0 \
+  -drive id=drive_image1,if=none,format=raw,file=/root/ol7.qcow2 \
+  -device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0 \
+  -boot d \
+  -monitor stdio \
+  -vnc :0 \
+  -device lsi53c895a,id=lsi0,remote,rid=8,command="qemu-scsi-dev" \
+  -device scsi-hd,id=drive2,drive=drive_image2,bus=lsi0.0,scsi-id=0,remote,rid=8,command="qemu-scsi-dev" \
+  -remote rid=8,command="-drive id=drive_image2,,file=/root/remote-process-disk.img -monitor unix:/home/qmp-sock,,server,,nowait"
+
+  The monitor can be connected to using the following command:
+  socat /home/qmp-sock stdio
+
+  After hotplugging disks to the remote process, please execute the
+  following command in the guest to refresh the list of storage devices:
+  rescan_scsi_bus.sh -a
-- 
1.8.3.1




* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (48 preceding siblings ...)
  2019-10-24  9:09 ` [RFC v4 PATCH 49/49] multi-process: add configure and usage information Jagannathan Raman
@ 2019-10-25  2:08 ` no-reply
  2019-10-25  2:08 ` no-reply
                   ` (3 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: no-reply @ 2019-10-25  2:08 UTC (permalink / raw)
  To: jag.raman
  Cc: elena.ufimtseva, fam, john.g.johnson, qemu-devel, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, thuth,
	ehabkost, konrad.wilk, dgilbert, liran.alon, stefanha, rth,
	kwolf, berrange, mreitz, ross.lagerwall, marcandre.lureau,
	pbonzini

Patchew URL: https://patchew.org/QEMU/cover.1571905346.git.jag.raman@oracle.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  TEST    iotest-qcow2: 268
Failures: 071 099 120 184 186
Failed 5 of 109 iotests
make: *** [check-tests/check-block.sh] Error 1
make: *** Waiting for unfinished jobs....
  TEST    check-qtest-aarch64: tests/qos-test
Traceback (most recent call last):
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=7fcd537d324f4e2997dd5a6e772bce59', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-k9w5q7fm/src/docker-src.2019-10-24-21.54.49.17993:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=7fcd537d324f4e2997dd5a6e772bce59
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-k9w5q7fm/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    13m26.478s
user    0m8.947s


The full log is available at
http://patchew.org/logs/cover.1571905346.git.jag.raman@oracle.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (49 preceding siblings ...)
  2019-10-25  2:08 ` [RFC v4 PATCH 00/49] Initial support of multi-process qemu no-reply
@ 2019-10-25  2:08 ` no-reply
  2019-10-25  2:10 ` no-reply
                   ` (2 subsequent siblings)
  53 siblings, 0 replies; 140+ messages in thread
From: no-reply @ 2019-10-25  2:08 UTC (permalink / raw)
  To: jag.raman
  Cc: elena.ufimtseva, fam, john.g.johnson, qemu-devel, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, thuth,
	ehabkost, konrad.wilk, dgilbert, liran.alon, stefanha, rth,
	kwolf, berrange, mreitz, ross.lagerwall, marcandre.lureau,
	pbonzini

Patchew URL: https://patchew.org/QEMU/cover.1571905346.git.jag.raman@oracle.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      util/aio-wait.o
  CC      util/thread-pool.o

Warning, treated as error:
/tmp/qemu-test/src/docs/devel/index.rst:13:toctree contains reference to nonexisting document 'multi-process'
  CC      util/qemu-timer.o
  CC      util/main-loop.o
make: *** [Makefile:1003: docs/devel/index.html] Error 2
make: *** Waiting for unfinished jobs....
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 662, in <module>
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=f573792346ea4c8cb0e6066fdd1f1ddf', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-pe_jt46w/src/docker-src.2019-10-24-22.05.58.3272:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=f573792346ea4c8cb0e6066fdd1f1ddf
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-pe_jt46w/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    2m19.037s
user    0m7.875s


The full log is available at
http://patchew.org/logs/cover.1571905346.git.jag.raman@oracle.com/testing.docker-mingw@fedora/?type=message.


* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (50 preceding siblings ...)
  2019-10-25  2:08 ` no-reply
@ 2019-10-25  2:10 ` no-reply
  2019-11-21 12:46 ` Stefan Hajnoczi
  2019-12-10  6:47 ` [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update Elena Ufimtseva
  53 siblings, 0 replies; 140+ messages in thread
From: no-reply @ 2019-10-25  2:10 UTC (permalink / raw)
  To: jag.raman
  Cc: elena.ufimtseva, fam, john.g.johnson, qemu-devel, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, thuth,
	ehabkost, konrad.wilk, dgilbert, liran.alon, stefanha, rth,
	kwolf, berrange, mreitz, ross.lagerwall, marcandre.lureau,
	pbonzini

Patchew URL: https://patchew.org/QEMU/cover.1571905346.git.jag.raman@oracle.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [RFC v4 PATCH 00/49] Initial support of multi-process qemu
Type: series
Message-id: cover.1571905346.git.jag.raman@oracle.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
75d605b multi-process: add configure and usage information
1724569 multi-process: add the concept description to docs/devel/qemu-multiprocess
15328b7 multi-process: Enable support for multiple devices in remote
5cf79267 multi-process/mig: Restore the VMSD in remote process
97c15c9 multi-process/mig: Synchronize runstate of remote process
1b67b17 multi-process/mig: refactor runstate_check into common file
55e4b53 multi-process/mig: Load VMSD in the proxy object
7e1b819 multi-process/mig: Send VMSD of remote to the Proxy object
399f9bf multi-process/mig: Enable VMSD save in the Proxy object
645fc27 multi-process/mig: build migration module in the remote process
f0265a7 multi-process: prevent duplicate memory initialization in remote
a0faf9e multi-process/mon: Initialize QMP module for remote processes
58310d9 multi-process/mon: Refactor monitor/chardev functions out of vl.c
ae4fd02 multi-process/mon: enable QMP module support in the remote process
d7e2191 multi-process/mon: stub functions to enable QMP module for remote process
2c67c85 multi-process/mon: choose HMP commands based on target
cb5ab3e multi-process: perform device reset in the remote process
3214409 multi-process: Use separate MMIO communication channel
c34c47f multi-process: handle heartbeat messages in remote process
ccd9230 multi-process: send heartbeat messages to remote
f4991d6 multi-process: add parse_cmdline in remote process
a90d6c6 multi-process: add remote options parser
d616c9d multi-process: add remote option
45c18dd multi-process: refractor vl.c code to re-use in remote
57b6105 multi-process: Introduce build flags to separate remote process code
1295409 multi-process: add processing of remote drive and device command line
e876f72 multi-process: remote: add create_done condition
effa3f0 multi-process: remote: use fd for socket from parent process
de69c89 multi-process: remote: add setup_devices and setup_drive msg processing
afc658d multi-process: add qdev_proxy_add to create proxy devices
70e0f47 multi-process: configure remote side devices
701a141 multi-process: create IOHUB object to handle irq
613c372 multi-process: Synchronize remote memory
f5183a9 multi-process: Add LSI device proxy object
7778ec0 multi-process: PCI BAR read/write handling for proxy & remote endpoints
736d74d mutli-process: build remote command line args
650f3d1 multi-process: introduce proxy object
9374d1a multi-process: remote process initialization
1965b18 multi-process: setup memory manager for remote device
719f823 multi-process: setup a machine object for remote device process
59bdda6 multi-process: setup PCI host bridge for remote device
139cdee multi-process: add functions to synchronize proxy and remote endpoints
1b3c1f0 multi-process: define mpqemu-link object
ac130ca multi-process: build system for remote device process
93774c4 multi-process: Add config option for multi-process QEMU
7e5f9b2 multi-process: Add stub functions to facilate build of multi-process
73740e6 multi-process: add a command line option for debug file
59f2f7d multi-process: util: Add qemu_thread_cancel() to cancel running thread
18b29a1 multi-process: memory: alloc RAM from file at offset

=== OUTPUT BEGIN ===
1/49 Checking commit 18b29a1306b2 (multi-process: memory: alloc RAM from file at offset)
2/49 Checking commit 59f2f7d24d20 (multi-process: util: Add qemu_thread_cancel() to cancel running thread)
3/49 Checking commit 73740e6eff21 (multi-process: add a command line option for debug file)
4/49 Checking commit 7e5f9b26d48c (multi-process: Add stub functions to facilate build of multi-process)
ERROR: suspect code indent for conditional statements (4, 4)
#137: FILE: accel/stubs/tcg-stub.c:109:
+    while (1) {
+    }

WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#151: 
new file mode 100644

total: 1 errors, 1 warnings, 376 lines checked

Patch 4/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

5/49 Checking commit 93774c4a2c28 (multi-process: Add config option for multi-process QEMU)
6/49 Checking commit ac130ca326fe (multi-process: build system for remote device process)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#286: 
new file mode 100644

total: 0 errors, 1 warnings, 244 lines checked

Patch 6/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
7/49 Checking commit 1b3c1f0e0495 (multi-process: define mpqemu-link object)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#20: 
new file mode 100644

WARNING: line over 80 characters
#171: FILE: include/io/mpqemu-link.h:147:
+void mpqemu_link_set_callback(MPQemuLinkState *s, mpqemu_link_callback callback);

WARNING: line over 80 characters
#397: FILE: io/mpqemu-link.c:207:
+                qemu_log_mask(LOG_REMOTE_DEBUG, "%s: Max FDs exceeded\n", __func__);

total: 0 errors, 3 warnings, 464 lines checked

Patch 7/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
8/49 Checking commit 139cdee6cbf6 (multi-process: add functions to synchronize proxy and remote endpoints)
9/49 Checking commit 59bdda61a183 (multi-process: setup PCI host bridge for remote device)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#31: 
new file mode 100644

total: 0 errors, 1 warnings, 153 lines checked

Patch 9/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
10/49 Checking commit 719f82346cb6 (multi-process: setup a machine object for remote device process)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#51: 
new file mode 100644

total: 0 errors, 1 warnings, 207 lines checked

Patch 10/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
11/49 Checking commit 1965b18cc69e (multi-process: setup memory manager for remote device)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#81: 
new file mode 100644

total: 0 errors, 1 warnings, 182 lines checked

Patch 11/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
12/49 Checking commit 9374d1ad3f30 (multi-process: remote process initialization)
13/49 Checking commit 650f3d106d82 (multi-process: introduce proxy object)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#34: 
new file mode 100644

total: 0 errors, 1 warnings, 379 lines checked

Patch 13/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
14/49 Checking commit 736d74d6623b (mutli-process: build remote command line args)
WARNING: line over 80 characters
#141: FILE: hw/proxy/qemu-proxy.c:243:
+static void init_proxy(PCIDevice *dev, char *command, bool need_spawn, Error **errp)

WARNING: line over 80 characters
#167: FILE: include/hw/proxy/qemu-proxy.h:66:
+    void (*init_proxy) (PCIDevice *dev, char *command, bool need_spawn, Error **errp);

total: 0 errors, 2 warnings, 146 lines checked

Patch 14/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
15/49 Checking commit 7778ec04e0e6 (multi-process: PCI BAR read/write handling for proxy & remote endpoints)
16/49 Checking commit f5183a9f01b5 (multi-process: Add LSI device proxy object)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#24: 
new file mode 100644

total: 0 errors, 1 warnings, 135 lines checked

Patch 16/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
17/49 Checking commit 613c372a8221 (multi-process: Synchronize remote memory)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#32: 
new file mode 100644

total: 0 errors, 1 warnings, 344 lines checked

Patch 17/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
18/49 Checking commit 701a141ec1a3 (multi-process: create IOHUB object to handle irq)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#202: 
new file mode 100644

total: 0 errors, 1 warnings, 435 lines checked

Patch 18/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
19/49 Checking commit 70e0f47f0652 (multi-process: configure remote side devices)
20/49 Checking commit afc658d34c34 (multi-process: add qdev_proxy_add to create proxy devices)
21/49 Checking commit de69c899ae12 (multi-process: remote: add setup_devices and setup_drive msg processing)
22/49 Checking commit effa3f06a627 (multi-process: remote: use fd for socket from parent process)
23/49 Checking commit e876f72db9ae (multi-process: remote: add create_done condition)
24/49 Checking commit 129540993d75 (multi-process: add processing of remote drive and device command line)
WARNING: Block comments use a leading /* on a separate line
#41: FILE: vl.c:1149:
+    dev = qdev_remote_add(opts, false /* this is drive */, errp);

WARNING: Block comments use a leading /* on a separate line
#87: FILE: vl.c:2246:
+    dev = qdev_remote_add(opts, true /* this is device */, errp);

WARNING: line over 80 characters
#121: FILE: vl.c:4418:
+     * need PCI host initialized. As a TODO: could defer init of PCIProxyDev instead.

total: 0 errors, 3 warnings, 117 lines checked

Patch 24/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
25/49 Checking commit 57b61051819d (multi-process: Introduce build flags to separate remote process code)
26/49 Checking commit 45c18ddb523b (multi-process: refractor vl.c code to re-use in remote)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#35: 
new file mode 100644

WARNING: Block comments use a leading /* on a separate line
#157: FILE: vl-parse.c:118:
+    dev = qdev_remote_add(opts, false /* this is drive */, errp);

WARNING: Block comments use a leading /* on a separate line
#172: FILE: vl-parse.c:133:
+    dev = qdev_remote_add(opts, true /* this is device */, errp);

total: 0 errors, 3 warnings, 412 lines checked

Patch 26/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
27/49 Checking commit d616c9d52b3e (multi-process: add remote option)
28/49 Checking commit a90d6c6eebff (multi-process: add remote options parser)
WARNING: Block comments use a leading /* on a separate line
#37: FILE: vl.c:300:
+        { /* end of list */ }

total: 0 errors, 1 warnings, 147 lines checked

Patch 28/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
29/49 Checking commit f4991d61c243 (multi-process: add parse_cmdline in remote process)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#59: 
new file mode 100644

ERROR: line over 90 characters
#151: FILE: remote/remote-opts.c:88:
+                error_report("Option not supported for this target, %x arch_mask, %x arch_type",

total: 1 errors, 1 warnings, 180 lines checked

Patch 29/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

30/49 Checking commit ccd92304b77d (multi-process: send heartbeat messages to remote)
31/49 Checking commit c34c47ff9925 (multi-process: handle heartbeat messages in remote process)
32/49 Checking commit 32144092e7eb (multi-process: Use separate MMIO communication channel)
33/49 Checking commit cb5ab3eb7d63 (multi-process: perform device reset in the remote process)
34/49 Checking commit 2c67c85818aa (multi-process/mon: choose HMP commands based on target)
35/49 Checking commit d7e21910a6c4 (multi-process/mon: stub functions to enable QMP module for remote process)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#108: 
new file mode 100644

total: 0 errors, 1 warnings, 722 lines checked

Patch 35/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
36/49 Checking commit ae4fd022fa7e (multi-process/mon: enable QMP module support in the remote process)
37/49 Checking commit 58310d9c635b (multi-process/mon: Refactor monitor/chardev functions out of vl.c)
38/49 Checking commit a0faf9ea04b0 (multi-process/mon: Initialize QMP module for remote processes)
39/49 Checking commit f0265a76742b (multi-process: prevent duplicate memory initialization in remote)
40/49 Checking commit 645fc2738aec (multi-process/mig: build migration module in the remote process)
41/49 Checking commit 399f9bf4ba56 (multi-process/mig: Enable VMSD save in the Proxy object)
ERROR: suspect code indent for conditional statements (4, 4)
#126: FILE: hw/proxy/qemu-proxy.c:449:
+    while (*((volatile uint64_t *)&pdev->migsize) < size) {
+    }

total: 1 errors, 0 warnings, 173 lines checked

Patch 41/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

42/49 Checking commit 7e1b8192f562 (multi-process/mig: Send VMSD of remote to the Proxy object)
43/49 Checking commit 55e4b53ec008 (multi-process/mig: Load VMSD in the proxy object)
44/49 Checking commit 1b67b1722798 (multi-process/mig: refactor runstate_check into common file)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#48: 
new file mode 100644

total: 0 errors, 1 warnings, 92 lines checked

Patch 44/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
45/49 Checking commit 97c15c99f41f (multi-process/mig: Synchronize runstate of remote process)
46/49 Checking commit 5cf792672396 (multi-process/mig: Restore the VMSD in remote process)
ERROR: that open brace { should be on the previous line
#118: FILE: remote/remote-main.c:527:
+        if (process_start_mig_in(msg))
+        {

total: 1 errors, 0 warnings, 103 lines checked

Patch 46/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

47/49 Checking commit 15328b7aae2d (multi-process: Enable support for multiple devices in remote)
48/49 Checking commit 1724569b4d71 (multi-process: add the concept description to docs/devel/qemu-multiprocess)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#22: 
new file mode 100644

total: 0 errors, 1 warnings, 1106 lines checked

Patch 48/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
49/49 Checking commit 75d605b31739 (multi-process: add configure and usage information)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#13: 
new file mode 100644

total: 0 errors, 1 warnings, 86 lines checked

Patch 49/49 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/cover.1571905346.git.jag.raman@oracle.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [RFC v4 PATCH 48/49] multi-process: add the concept description to docs/devel/qemu-multiprocess
  2019-10-24  9:09 ` [RFC v4 PATCH 48/49] multi-process: add the concept description to docs/devel/qemu-multiprocess Jagannathan Raman
@ 2019-10-25 19:33   ` Elena Ufimtseva
  2019-11-07 15:50   ` Stefan Hajnoczi
  2019-11-11 15:41   ` Stefan Hajnoczi
  2 siblings, 0 replies; 140+ messages in thread
From: Elena Ufimtseva @ 2019-10-25 19:33 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: fam, john.g.johnson, thuth, berrange, ehabkost, konrad.wilk,
	quintela, mst, qemu-devel, armbru, ross.lagerwall, mreitz,
	kanth.ghatraju, kraxel, stefanha, pbonzini, liran.alon,
	marcandre.lureau, kwolf, dgilbert, rth

On Thu, Oct 24, 2019 at 05:09:29AM -0400, Jagannathan Raman wrote:
> From: John G Johnson <john.g.johnson@oracle.com>
> 
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  v2 -> v3:
>    - Updated with latest design of this project
> 
>  v3 -> v4:
>   - Updated document to RST format
>

Hi,

The warning was reported against this patch because the index entry for the
multi-process document is incorrect, as pointed out by the automated tests.

"/tmp/qemu-test/src/docs/devel/index.rst:13:toctree contains reference to nonexisting document 'multi-process'".

A corrected version of this patch is available. Should it be sent with the
next series, or can the corrected version be attached here?

Thank you!

Elena, Jag and JJ.  
>  docs/devel/index.rst             |    1 +
>  docs/devel/qemu-multiprocess.rst | 1102 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 1103 insertions(+)
>  create mode 100644 docs/devel/qemu-multiprocess.rst
> 
> diff --git a/docs/devel/index.rst b/docs/devel/index.rst
> index 1ec61fc..edd3fe3 100644
> --- a/docs/devel/index.rst
> +++ b/docs/devel/index.rst
> @@ -22,3 +22,4 @@ Contents:
>     decodetree
>     secure-coding-practices
>     tcg
> +   multi-process
> diff --git a/docs/devel/qemu-multiprocess.rst b/docs/devel/qemu-multiprocess.rst
> new file mode 100644
> index 0000000..2c42c6e
> --- /dev/null
> +++ b/docs/devel/qemu-multiprocess.rst
> @@ -0,0 +1,1102 @@
> +Disaggregating QEMU
> +===================
> +
> +QEMU is often used as the hypervisor for virtual machines running in the
> +Oracle cloud. Since one of the advantages of cloud computing is the
> +ability to run many VMs from different tenants in the same cloud
> +infrastructure, a guest that compromised its hypervisor could
> +potentially use the hypervisor's access privileges to access data it is
> +not authorized for.
> +
> +QEMU can be susceptible to security attack because it is a large,
> +monolithic program that provides many features to the VMs it services.
> +Many of these features can be configured out of QEMU, but even a reduced
> +configuration QEMU has a large amount of code a guest can potentially
> +attack in order to gain additional privileges.
> +
> +QEMU services
> +-------------
> +
> +QEMU can be broadly described as providing three main services. One is a
> +VM control point, where VMs can be created, migrated, re-configured, and
> +destroyed. A second is to emulate the CPU instructions within the VM,
> +often accelerated by HW virtualization features such as Intel's VT
> +extensions. Finally, it provides IO services to the VM by emulating HW
> +IO devices, such as disk and network devices.
> +
> +A disaggregated QEMU
> +~~~~~~~~~~~~~~~~~~~~
> +
> +A disaggregated QEMU involves separating QEMU services into separate
> +host processes. Each of these processes can be given only the privileges
> +it needs to provide its service, e.g., a disk service could be given
> +access only to the disk images it provides, and not be allowed to
> +access other files, or any network devices. An attacker who compromised
> +this service would not be able to use this exploit to access files or
> +devices beyond what the disk service was given access to.
> +
> +A QEMU control process would remain, but in disaggregated mode, it would
> +be a control point that executes the processes needed to support the VM
> +being created, but have no direct interfaces to the VM. During VM
> +execution, it would still provide the user interface to hot-plug devices
> +or live migrate the VM.
> +
> +A first step in creating a disaggregated QEMU is to separate IO services
> +from the main QEMU program, which would continue to provide CPU
> +emulation. i.e., the control process would also be the CPU emulation
> +process. In a later phase, CPU emulation could be separated from the
> +control process.
> +
> +Disaggregating IO services
> +--------------------------
> +
> +Disaggregating IO services is a good place to begin QEMU disaggregation
> +for a couple of reasons. One is that the sheer number of IO devices QEMU
> +can emulate provides a large surface of interfaces which could
> +potentially be exploited, and, indeed, have been a source of exploits in
> +the past. Another is that the modular nature of QEMU device emulation
> +code provides interface points where the QEMU functions that perform
> +device emulation can be separated from the QEMU functions that manage
> +the emulation of guest CPU instructions.
> +
> +QEMU device emulation
> +~~~~~~~~~~~~~~~~~~~~~
> +
> +QEMU uses an object-oriented SW architecture for device emulation code.
> +Configured objects are all compiled into the QEMU binary, then objects
> +are instantiated by name when used by the guest VM. For example, the
> +code to emulate a device named "foo" is always present in QEMU, but its
> +instantiation code is only run when the device is included in the target
> +VM. (e.g., via the QEMU command line as *-device foo*)
> +
> +The object model is hierarchical, so device emulation code names its
> +parent object (such as "pci-device" for a PCI device) and QEMU will
> +instantiate a parent object before calling the device's instantiation
> +code.
> +
> +Current separation models
> +~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +In order to separate the device emulation code from the CPU emulation
> +code, the device object code must run in a different process. There are
> +a couple of existing QEMU features that can run emulation code
> +separately from the main QEMU process. These are examined below.
> +
> +vhost user model
> +^^^^^^^^^^^^^^^^
> +
> +Virtio guest device drivers can be connected to vhost user applications
> +in order to perform their IO operations. This model uses special virtio
> +device drivers in the guest and vhost user device objects in QEMU, but
> +once the QEMU vhost user code has configured the vhost user application,
> +mission-mode IO is performed by the application. The vhost user
> +application is a daemon process that can be contacted via a known UNIX
> +domain socket.
> +
> +vhost socket
> +''''''''''''
> +
> +As mentioned above, one of the tasks of the vhost device object within
> +QEMU is to contact the vhost application and send it configuration
> +information about this device instance. As part of the configuration
> +process, the application can also be sent other file descriptors over
> +the socket, which then can be used by the vhost user application in
> +various ways, some of which are described below.
> +
> +vhost MMIO store acceleration
> +'''''''''''''''''''''''''''''
> +
> +VMs are often run using HW virtualization features via the KVM kernel
> +driver. This driver allows QEMU to accelerate the emulation of guest CPU
> +instructions by running the guest in a virtual HW mode. When the guest
> +executes instructions that cannot be executed by virtual HW mode,
> +execution returns to the KVM driver so it can inform QEMU to emulate the
> +instructions in SW.
> +
> +One of the events that can cause a return to QEMU is when a guest device
> +driver accesses an IO location. QEMU then dispatches the memory
> +operation to the corresponding QEMU device object. In the case of a
> +vhost user device, the memory operation would need to be sent over a
> +socket to the vhost application. This path is accelerated by the QEMU
> +virtio code by setting up an eventfd file descriptor through which the
> +vhost application can directly receive MMIO store notifications from the
> +KVM driver, instead of needing them to be sent to the QEMU process first.
> +
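A minimal sketch of the eventfd behaviour this section relies on, assuming
a Linux host. No KVM is involved: notify() stands in for the KVM driver
signalling a guest store, and drain() for the vhost application waking up.
Both helper names are hypothetical and are not taken from QEMU.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Signal the eventfd: each write adds its value to a kernel-side counter.
 * Returns 0 on success, -1 on error.
 */
static int notify(int efd)
{
    uint64_t one = 1;

    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Consume pending notifications: one read returns the accumulated counter
 * and resets it to zero. Returns the count, or 0 on error.
 */
static uint64_t drain(int efd)
{
    uint64_t count = 0;

    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return 0;
    }
    return count;
}

/* Two notifications sent before the reader runs coalesce into one wakeup. */
static uint64_t eventfd_demo(void)
{
    int efd = eventfd(0, EFD_CLOEXEC);
    uint64_t seen;

    assert(efd >= 0);
    notify(efd);
    notify(efd);
    seen = drain(efd);
    close(efd);
    return seen;
}
```

The coalescing is why this only suits stores: the reader learns that
something was written, not what a load should return.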
> +vhost interrupt acceleration
> +''''''''''''''''''''''''''''
> +
> +Another optimization used by the vhost application is the ability to
> +directly inject interrupts into the VM via the KVM driver, again,
> +bypassing the need to send the interrupt back to the QEMU process first.
> +The QEMU virtio setup code configures the KVM driver with an eventfd
> +that triggers the device interrupt in the guest when the eventfd is
> +written. This irqfd file descriptor is then passed to the vhost user
> +application program.
> +
> +vhost access to guest memory
> +''''''''''''''''''''''''''''
> +
> +The vhost application is also allowed to directly access guest memory,
> +instead of needing to send the data as messages to QEMU. This is also
> +done with file descriptors sent to the vhost user application by QEMU.
> +These descriptors can be passed to ``mmap()`` by the vhost application
> +to map the guest address space into the vhost application.
> +
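A sketch of the shared-mapping idea above, assuming guest RAM is backed by
a file descriptor. Here both mappings live in one process for brevity; in
the real setup the second mmap() would be done by the application after
receiving the descriptor. create_ram_fd(), map_ram() and RAM_SIZE are
made-up names for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define RAM_SIZE 4096

/* Create an unlinked temporary file of RAM_SIZE bytes to act as guest RAM. */
static int create_ram_fd(void)
{
    char path[] = "/tmp/guest-ram-XXXXXX";
    int fd = mkstemp(path);

    if (fd < 0) {
        return -1;
    }
    unlink(path);
    if (ftruncate(fd, RAM_SIZE) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Map the RAM file shared, so stores are visible through every mapping. */
static void *map_ram(int fd)
{
    void *p = mmap(NULL, RAM_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Write through one view and read the same bytes back through the other. */
static int shared_ram_demo(void)
{
    int fd = create_ram_fd();
    char *qemu_view, *app_view;
    int same;

    assert(fd >= 0);
    qemu_view = map_ram(fd);
    app_view = map_ram(fd);
    assert(qemu_view != NULL && app_view != NULL);

    strcpy(qemu_view + 128, "dma-buffer");
    same = strcmp(app_view + 128, "dma-buffer") == 0;

    munmap(qemu_view, RAM_SIZE);
    munmap(app_view, RAM_SIZE);
    close(fd);
    return same;
}
```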
> +IOMMUs introduce another level of complexity, since the address given to
> +the guest virtio device to DMA to or from is not a guest physical
> +address. This case is handled by having vhost code within QEMU register
> +as a listener for IOMMU mapping changes. The vhost application maintains
> +a cache of IOMMU translations: sending translation requests back to
> +QEMU on cache misses, and in turn receiving flush requests from QEMU
> +when mappings are purged.
> +
> +applicability to device separation
> +''''''''''''''''''''''''''''''''''
> +
> +Much of the vhost model can be re-used by separated device emulation. In
> +particular, the ideas of using a socket between QEMU and the device
> +emulation application, using a file descriptor to inject interrupts into
> +the VM via KVM, and allowing the application to ``mmap()`` the guest
> +should be re-used.
> +
> +There are, however, some notable differences between how a vhost
> +application works and the needs of separated device emulation. The most
> +basic is that vhost uses custom virtio device drivers which always
> +trigger IO with MMIO stores. A separated device emulation model must
> +work with existing IO device models and guest device drivers. MMIO loads
> +break vhost store acceleration since they are synchronous - guest
> +progress cannot continue until the load has been emulated. By contrast,
> +stores are asynchronous: the guest can continue after the store event
> +has been sent to the vhost application.
> +
> +Another difference is that in the vhost user model, a single daemon can
> +support multiple QEMU instances. This is contrary to the security regime
> +desired, in which the emulation application should only be allowed to
> +access the files or devices the VM it's running on behalf of can access.
> +
> +qemu-io model
> +^^^^^^^^^^^^^
> +
> +Qemu-io is a test harness used to test changes to the QEMU block backend
> +object code. (e.g., the code that implements disk images for disk driver
> +emulation) Qemu-io is not a device emulation application per se, but it
> +does compile the QEMU block objects into a separate binary from the main
> +QEMU one. This could be useful for disk device emulation, since its
> +emulation applications will need to include the QEMU block objects.
> +
> +New separation model based on proxy objects
> +-------------------------------------------
> +
> +A different model based on proxy objects in the QEMU program
> +communicating with remote emulation programs could provide separation
> +while minimizing the changes needed to the device emulation code. The
> +rest of this section is a discussion of how a proxy object model would
> +work.
> +
> +Remote emulation processes
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The remote emulation process will run the QEMU object hierarchy without
> +modification. The device emulation objects will also be based on the
> +QEMU code, because for anything but the simplest device, it would not be
> +tractable to re-implement both the object model and the many device
> +backends that QEMU has.
> +
> +The processes will communicate with the QEMU process over UNIX domain
> +sockets. The processes can be executed either as standalone processes,
> +or be executed by QEMU. In both cases, the host backends the emulation
> +processes will provide are specified on their command lines, as they
> +would be for QEMU. For example:
> +
> +::
> +
> +    disk-proc -blockdev driver=file,node-name=file0,filename=disk-file0  \
> +    -blockdev driver=qcow2,node-name=drive0,file=file0
> +
> +would indicate process *disk-proc* uses a qcow2 emulated disk named
> +*drive0*, stored in the file *disk-file0*, as its backend.
> +
> +Emulation processes may emulate more than one guest controller. A common
> +configuration might be to put all controllers of the same device class
> +(e.g., disk, network, etc.) in a single process, so that all backends of
> +the same type can be managed by a single QMP monitor.
> +
> +communication with QEMU
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Remote emulation processes will recognize a *-socket* argument that
> +specifies the path of a UNIX domain socket used to communicate with the
> +QEMU process. If no *-socket* argument is present, the process will use
> +file descriptor 0 to communicate with QEMU. For example,
> +
> +::
> +
> +    disk-proc -socket /tmp/disk0-sock <backend list>
> +
> +will communicate with QEMU using the socket path */tmp/disk0-sock*.
> +
> +remote process QMP monitor
> +^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Remote emulation processes can be monitored via QMP, similar to QEMU
> +itself. The QMP monitor socket is specified the same as for a QEMU
> +process:
> +
> +::
> +
> +    disk-proc -qmp unix:/tmp/disk-mon,server
> +
> +can be monitored over the UNIX socket path */tmp/disk-mon*.
> +
> +QEMU command line
> +~~~~~~~~~~~~~~~~~
> +
> +The QEMU command line options will need to be modified to indicate which
> +items are emulated by a separate program, and which remain emulated by
> +QEMU itself.
> +
> +identifying remote emulation processes
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Remote emulation processes will be identified to QEMU using a *-remote*
> +command line option. This option can either specify a command that QEMU
> +will execute, or can specify a UNIX domain socket that QEMU can use to
> +connect to an existing process. Both forms require an "id" option that
> +identifies the process to later *-device* options. The process version
> +is:
> +
> +::
> +
> +    -remote id=disk-proc,command="disk-proc <backend list>"
> +
> +And the socket version is:
> +
> +::
> +
> +    -remote id=disk-proc,socket="/tmp/disk0-sock"
> +
> +In the latter case, the remote process must be given the same socket on
> +its command line when it is executed:
> +
> +::
> +
> +    disk-proc -socket /tmp/disk0-sock <backend list>
> +
> +identifying devices emulated remotely
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Devices that are to be emulated in a separate process will identify
> +the remote process with a "remote" option on their *-device* command
> +line specification. e.g., an LSI SCSI controller and disk can be
> +specified as:
> +
> +::
> +
> +    -device lsi53c895a,id=scsi0
> +    -device scsi-hd,drive=drive0,bus=scsi0.0,scsi-id=0
> +
> +If these devices are emulated by remote process "disk-proc," as
> +described in the previous section, the QEMU command line would be:
> +
> +::
> +
> +    -device lsi53c895a,id=scsi0,remote=disk-proc
> +    -device scsi-hd,drive=drive0,bus=scsi0.0,scsi-id=0,remote=disk-proc
> +
> +Some devices are implicitly created by the machine object. e.g., the q35
> +machine object will create its PCI bus, and attach an ich9-ahci IDE
> +controller to it. In this case, options will need to be added to the
> +*-machine* command line. e.g.,
> +
> +::
> +
> +    -machine pc-q35,ide-remote=disk-proc
> +
> +will use the remote process with an "id" of "disk-proc" to emulate the
> +IDE controller and its disks.
> +
> +The disks themselves still need to be specified with the "remote" option,
> +as in the example above. e.g.,
> +
> +::
> +
> +    -device ide-hd,drive=drive0,bus=ide.0,unit=0,remote=disk-proc
> +
> +QEMU management of remote processes
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Each *-remote* instance on the QEMU command line will create a remote
> +process proxy instance in QEMU. They will be held on a *QList* that can
> +be searched by their "id" property. The remote process proxy will also
> +establish a communication channel between QEMU and the remote process.
> +This can be done in one of two methods: direct execution of the
> +process by QEMU with ``fork()`` and ``exec()`` system calls, or by
> +connecting to an existing process.
> +
> +direct execution
> +^^^^^^^^^^^^^^^^
> +
> +When the remote process is directly executed, the remote process proxy
> +will set up a communication channel between itself and the emulation
> +process. This channel will be created using ``socketpair()`` and the
> +remote process side of the pair will be given to the process as file
> +descriptor 0.
> +
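The direct execution path can be sketched as follows. This is an
illustration under stated assumptions, not the code from the series:
spawn_remote() and remote_main() are hypothetical names, and the child
calls a local function where a real proxy would exec() the emulation
program.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the emulation process: echo one message received on fd 0. */
static void remote_main(void)
{
    char buf[64];
    ssize_t n = read(STDIN_FILENO, buf, sizeof(buf));

    if (n <= 0 || write(STDIN_FILENO, buf, (size_t)n) != n) {
        _exit(1);
    }
    _exit(0);
}

/*
 * Create a socketpair, fork, and install the child's end as fd 0.
 * Returns the parent's end of the channel, or -1 on error.
 */
static int spawn_remote(pid_t *pid)
{
    int sv[2];

    assert(pid != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    *pid = fork();
    if (*pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (*pid == 0) {
        close(sv[0]);
        if (sv[1] != STDIN_FILENO) {
            dup2(sv[1], STDIN_FILENO);
            close(sv[1]);
        }
        /* A real proxy would exec() the emulation program here. */
        remote_main();
    }
    close(sv[1]);
    return sv[0];
}

/* Send one message to the child and check that it comes back unchanged. */
static int direct_exec_demo(void)
{
    pid_t pid;
    char reply[64] = "";
    int status = -1;
    int ok;
    int fd = spawn_remote(&pid);

    if (fd < 0) {
        return 0;
    }
    ok = write(fd, "ping", 5) == 5 && read(fd, reply, sizeof(reply)) == 5;
    close(fd);
    waitpid(pid, &status, 0);
    return ok && strcmp(reply, "ping") == 0 &&
           WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Because the socket is bidirectional, the child can use fd 0 for both
directions, which matches the single-descriptor convention described
above.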
> +connecting to an existing process
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Some environments wish to deny QEMU the ability to execute ``fork()``
> +and ``exec()``. In these cases, emulation processes will be started before
> +QEMU, and a UNIX domain socket will be given to each emulation process
> +to communicate with QEMU over. After communication is established, the
> +socket will be unlinked from the file system space by the QEMU process.
> +
> +communication with emulation process
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +primary socket
> +''''''''''''''
> +
> +Whether the process was executed by QEMU or externally, there will be a
> +primary socket for communication between QEMU and the remote process.
> +This channel will handle configuration commands from QEMU to the
> +process, either from the QEMU command line, or from QMP commands that
> +affect the devices being emulated by the process. This channel will only
> +allow one message to be pending at a time; if additional messages
> +arrive, they must wait for previous ones to be acknowledged from the
> +remote side.
> +
> +secondary sockets
> +'''''''''''''''''
> +
> +The primary socket can pass the file descriptors of secondary sockets
> +for operations that occur in parallel with commands on the primary
> +channel. These include MMIO operations generated by the guest, interrupt
> +notifications generated by the devices being emulated, or *vmstate* for
> +live migration. These secondary sockets will be created at the behest of
> +the device proxies that require them. A disk device proxy wouldn't need
> +any secondary sockets, but a disk controller device proxy may need both
> +an MMIO socket and an interrupt socket.
> +
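Passing a secondary socket over the primary one is normally done with
SCM_RIGHTS ancillary data on a UNIX domain socket. The sketch below
assumes that mechanism; send_fd() and recv_fd() are hypothetical helpers
and the one-byte payload is an arbitrary choice, not the message format
used by this series.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union fd_cmsg {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send fd over sock together with a one-byte payload. 0 on success. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(). Returns the new fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Hand one end of an "MMIO" socketpair across the "primary" socketpair. */
static int fd_passing_demo(void)
{
    int primary[2], mmio[2], received;
    char buf[8] = "";
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, primary) < 0) {
        return 0;
    }
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, mmio) < 0) {
        close(primary[0]);
        close(primary[1]);
        return 0;
    }
    assert(send_fd(primary[0], mmio[1]) == 0);
    received = recv_fd(primary[1]);

    /* The received number differs, but it refers to the same socket. */
    ok = received >= 0 && write(mmio[0], "store", 6) == 6 &&
         read(received, buf, sizeof(buf)) == 6 &&
         strcmp(buf, "store") == 0;

    if (received >= 0) {
        close(received);
    }
    close(mmio[0]);
    close(mmio[1]);
    close(primary[0]);
    close(primary[1]);
    return ok;
}
```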
> +emulation process attached via QMP command
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +There will be a new "attach-process" QMP command to facilitate device
> +hot-plug. This command's arguments will be the same as the *-remote*
> +command line when it's used to attach to a remote process. i.e., it will
> +need an "id" argument so that hot-plugged devices can later find it, and
> +a "socket" argument to identify the UNIX domain socket that will be used
> +to communicate with QEMU.
> +
> +QEMU device proxy objects
> +~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +QEMU has an object model based on sub-classes inherited from the
> +"object" super-class. The sub-classes that are of interest here are the
> +"device" and "bus" sub-classes whose child sub-classes make up the
> +device tree of a QEMU emulated system.
> +
> +The proxy object model will use device proxy objects to replace the
> +device emulation code within the QEMU process. These objects will live
> +in the same place in the object and bus hierarchies as the objects they
> +replace. i.e., the proxy object for an LSI SCSI controller will be a
> +sub-class of the "pci-device" class, and will have the same PCI bus
> +parent and the same SCSI bus child objects as the LSI controller object
> +it replaces.
> +
> +After the QEMU command line has been parsed, the remote devices will be
> +instantiated in the same manner as local devices are. (i.e.,
> +``qdev_device_add()``). In order to distinguish them from regular
> +*-device* device objects, their class name will be the name of the class
> +they replace, with "-proxy" appended. e.g., the "lsi53c895a" proxy class
> +will be "lsi53c895a-proxy."
> +
> +device JSON description
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +The remote process needs a JSON representation of the command line
> +options used to create the object. This JSON representation is used to
> +create the corresponding object in the emulation process. e.g., for an
> +LSI SCSI controller invoked as:
> +
> +::
> +
> +     -device lsi53c895a,id=scsi0,remote=lsi-scsi
> +
> +the proxy object would create a
> +
> +::
> +
> +    { "driver" : "lsi53c895a", "id" : "scsi0" }
> +
> +JSON description. The "driver" option is assigned to the device name
> +when the command line is parsed, so the "-proxy" appended by the command
> +line parsing code is removed. The "remote" option isn't needed in the
> +JSON description since it only applies to the proxy object in the QEMU
> +process.
> +
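The transformation described above can be illustrated with a small
standalone helper. This is only a sketch: struct opt and
build_device_json() are hypothetical and do not use QEMU's option or JSON
code, and option values are assumed to need no JSON escaping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One parsed "key=value" option; a hypothetical stand-in for QEMU's types. */
struct opt {
    const char *key;
    const char *value;
};

/* Append len bytes of s to out, keeping it NUL-terminated. 0 on success. */
static int append(char *out, size_t cap, size_t *used,
                  const char *s, size_t len)
{
    if (*used + len + 1 > cap) {
        return -1;
    }
    memcpy(out + *used, s, len);
    *used += len;
    out[*used] = '\0';
    return 0;
}

/*
 * Build the description sent to the remote process: drop the proxy-only
 * "remote" option and strip a trailing "-proxy" from the driver name.
 * Returns 0 on success, -1 if out is too small.
 */
static int build_device_json(const struct opt *opts, size_t n,
                             char *out, size_t cap)
{
    static const char suffix[] = "-proxy";
    const size_t slen = sizeof(suffix) - 1;
    size_t used = 0, i;
    int first = 1;

    assert(opts != NULL || n == 0);
    if (cap == 0) {
        return -1;
    }
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        const char *key = opts[i].key;
        const char *val = opts[i].value;
        size_t vlen = strlen(val);

        if (strcmp(key, "remote") == 0) {
            continue;               /* only meaningful to the proxy */
        }
        if (strcmp(key, "driver") == 0 && vlen > slen &&
            strcmp(val + vlen - slen, suffix) == 0) {
            vlen -= slen;           /* "lsi53c895a-proxy" -> "lsi53c895a" */
        }
        if (append(out, cap, &used, first ? "{ \"" : ", \"", 3) ||
            append(out, cap, &used, key, strlen(key)) ||
            append(out, cap, &used, "\" : \"", 5) ||
            append(out, cap, &used, val, vlen) ||
            append(out, cap, &used, "\"", 1)) {
            return -1;
        }
        first = 0;
    }
    return append(out, cap, &used, first ? "{ }" : " }", first ? 3 : 2);
}
```

Given driver=lsi53c895a-proxy, id=scsi0 and remote=lsi-scsi, the helper
emits the same text as the JSON example quoted above.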
> +device object whitelist
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Some device objects may not need a proxy. These are devices with no
> +direct guest interfaces. (e.g., no MMIO, PIO, or interrupts). There will
> +be a whitelist of such devices, and any devices on this list will not be
> +instantiated in QEMU. Their JSON representation will still be sent to
> +the remote process, so the object can be created there.
> +
> +object initialization
> +^^^^^^^^^^^^^^^^^^^^^
> +
> +QEMU object initialization occurs in two phases. The first
> +initialization happens once per object class. (i.e., there can be many
> +SCSI disks in an emulated system, but the "scsi-hd" class has its
> +``class_init()`` function called only once) The second phase happens
> +when each object's ``instance_init()`` function is called to initialize
> +each instance of the object.
> +
> +All device objects are sub-classes of the "device" class, so they also
> +have a ``realize()`` function that is called after ``instance_init()``
> +is called and after the object's static properties have been
> +initialized. Many device objects don't even provide an instance\_init()
> +function, and do all their per-instance work in ``realize()``.
> +
> +class\_init
> +'''''''''''
> +
> +The ``class_init()`` method of a proxy object will, in general, behave
> +similarly to the object it replaces, including setting any static
> +properties and methods needed by the proxy.
> +
> +instance\_init / realize
> +''''''''''''''''''''''''
> +
> +The ``instance_init()`` and ``realize()`` functions would only need to
> +perform tasks related to being a proxy, such as registering its own
> +MMIO handlers, or creating a child bus that other proxy devices can be
> +attached to later.
> +
> +Other tasks are device-specific. For example, PCI device objects
> +will initialize the PCI config space in order to make a valid PCI device
> +tree within the QEMU process.
> +
> +address space registration
> +^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Most devices are driven by guest device driver accesses to IO addresses
> +or ports. The QEMU device emulation code uses QEMU's memory region
> +function calls (such as ``memory_region_init_io()``) to add callback
> +functions that QEMU will invoke when the guest accesses the device's
> +areas of the IO address space. When a guest driver does access the
> +device, the VM will exit HW virtualization mode and return to QEMU,
> +which will then look up and execute the corresponding callback function.
> +
> +A proxy object would need to mirror the memory region calls the actual
> +device emulator would perform in its initialization code, but with its
> +own callbacks. When invoked by QEMU as a result of a guest IO operation,
> +they will forward the operation to the device emulation process.
> +
> +PCI config space
> +^^^^^^^^^^^^^^^^
> +
> +PCI devices also have a configuration space that can be accessed by the
> +guest driver. Guest accesses to this space are not handled by the device
> +emulation object, but by its PCI parent object. Much of this space is
> +read-only, but certain registers (especially BAR and MSI-related ones)
> +need to be propagated to the emulation process.
> +
> +PCI parent proxy
> +''''''''''''''''
> +
> +One way to propagate guest PCI config accesses is to create a
> +"pci-device-proxy" class that can serve as the parent of a PCI device
> +proxy object. This class's parent would be "pci-device" and it would
> +override the PCI parent's ``config_read()`` and ``config_write()``
> +methods with ones that forward these operations to the emulation
> +program.
> +
> +interrupt receipt
> +^^^^^^^^^^^^^^^^^
> +
> +A proxy for a device that generates interrupts will need to create a
> +socket to receive interrupt indications from the emulation process. An
> +incoming interrupt indication would then be sent up to its bus parent to
> +be injected into the guest. For example, a PCI device object may use
> +``pci_set_irq()``.
> +
> +live migration
> +^^^^^^^^^^^^^^
> +
> +The proxy will register to save and restore any *vmstate* it needs over
> +a live migration event. The device proxy does not need to manage the
> +remote device's *vmstate*; that will be handled by the remote process
> +proxy (see below).
> +
> +QEMU remote device operation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Generic device operations, such as DMA, will be performed by the remote
> +process proxy by sending messages to the remote process.
> +
> +DMA operations
> +^^^^^^^^^^^^^^
> +
> +DMA operations would be handled much like vhost applications do. One of
> +the initial messages sent to the emulation process is a guest memory
> +table. Each entry in this table consists of a file descriptor and size
> +that the emulation process can ``mmap()`` to directly access guest
> +memory, similar to ``vhost_user_set_mem_table()``. Note guest memory
> +must be backed by file descriptors, such as when QEMU is given the
> +*-mem-path* command line option.
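
A rough sketch of how the emulation process might consume such a table follows. The struct layout is an assumption: the text only names a file descriptor and a size per entry, so the gpa and host fields, and the names map_guest_mem() and gpa_to_host(), are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical table entry; only fd and size are named in the text. */
struct guest_mem_entry {
    int      fd;    /* file descriptor backing this RAM region */
    uint64_t gpa;   /* guest physical base address (assumed field) */
    uint64_t size;  /* region length in bytes */
    void    *host;  /* local mapping, filled in by map_guest_mem() */
};

/* mmap() every entry; on failure earlier mappings are left in place. */
static int map_guest_mem(struct guest_mem_entry *tbl, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        void *p = mmap(NULL, (size_t)tbl[i].size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, tbl[i].fd, 0);

        if (p == MAP_FAILED) {
            return -1;
        }
        tbl[i].host = p;
    }
    return 0;
}

/* Translate [gpa, gpa + len) to a local pointer, or NULL if unmapped. */
static void *gpa_to_host(const struct guest_mem_entry *tbl, size_t n,
                         uint64_t gpa, uint64_t len)
{
    assert(tbl || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (gpa >= tbl[i].gpa && len <= tbl[i].size &&
            gpa - tbl[i].gpa <= tbl[i].size - len) {
            return (uint8_t *)tbl[i].host + (gpa - tbl[i].gpa);
        }
    }
    return NULL;
}
```

MAP_SHARED is what makes device writes visible to the guest, which is why the backing memory has to come from a shareable file descriptor in the first place.
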
> +
> +IOMMU operations
> +^^^^^^^^^^^^^^^^
> +
> +When the emulated system includes an IOMMU, the remote process proxy in
> +QEMU will need to create a socket for IOMMU requests from the emulation
> +process. It will handle those requests with an
> +``address_space_get_iotlb_entry()`` call. In order to handle IOMMU
> +unmaps, the remote process proxy will also register as a listener on the
> +device's DMA address space. When an IOMMU memory region is created
> +within the DMA address space, an IOMMU notifier for unmaps will be added
> +to the memory region that will forward unmaps to the emulation process
> +over the IOMMU socket.
> +
> +device hot-plug via QMP
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +A QMP "device\_add" command can add a device emulated by a remote
> +process. It needs to add a "remote" option to the command, just as the
> +*-device* command line option does. The remote process may either be one
> +started at QEMU startup, or be one added by the "add-process" QMP
> +command described above. In either case, the remote process proxy will
> +forward the new device's JSON description to the corresponding emulation
> +process.
> +
> +live migration
> +^^^^^^^^^^^^^^
> +
> +The remote process proxy will also register for live migration
> +notifications with ``vmstate_register()``. When called to save state,
> +the proxy will send the remote process a secondary socket file
> +descriptor to save the remote process's device *vmstate* over. The
> +incoming byte stream length and data will be saved as the proxy's
> +*vmstate*. When the proxy is resumed on its new host, this *vmstate*
> +will be extracted, and a secondary socket file descriptor will be sent
> +to the new remote process through which it receives the *vmstate* in
> +order to restore the devices there.
> +
> +device emulation in remote process
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The parts of QEMU that the emulation program will need include the
> +object model; the memory emulation objects; the device emulation objects
> +of the targeted device, and any dependent devices; and, the device's
> +backends. It will also need code to set up the machine environment,
> +handle requests from the QEMU process, and route machine-level requests
> +(such as interrupts or IOMMU mappings) back to the QEMU process.
> +
> +initialization
> +''''''''''''''
> +
> +The process initialization sequence will follow the same sequence
> +followed by QEMU. It will first initialize the backend objects, then
> +device emulation objects. The JSON descriptions sent by the QEMU process
> +will drive which objects need to be created.
> +
> +-  address spaces
> +
> +Before the device objects are created, the initial address spaces and
> +memory regions must be configured with ``memory_map_init()``. This
> +creates a RAM memory region object (*system\_memory*) and an IO memory
> +region object (*system\_io*).
> +
> +-  RAM
> +
> +RAM memory region creation will follow how ``pc_memory_init()`` creates
> +them, but must use ``memory_region_init_ram_from_fd()`` instead of
> +``memory_region_allocate_system_memory()``. The file descriptors needed
> +will be supplied by the guest memory table from above. Those RAM regions
> +would then be added to the *system\_memory* memory region with
> +``memory_region_add_subregion()``.
> +
> +-  PCI
> +
> +IO initialization will be driven by the JSON descriptions sent from the
> +QEMU process. For a PCI device, a PCI bus will need to be created with
> +``pci_root_bus_new()``, and a PCI memory region will need to be created
> +and added to the *system\_memory* memory region with
> +``memory_region_add_subregion_overlap()``. The overlap version is
> +required for architectures where PCI memory overlaps with RAM memory.
> +
> +MMIO handling
> +'''''''''''''
> +
> +The device emulation objects will use ``memory_region_init_io()`` to
> +install their MMIO handlers, and ``pci_register_bar()`` to associate
> +those handlers with a PCI BAR, as they do within QEMU currently.
> +
> +In order to use ``address_space_rw()`` in the emulation process to
> +handle MMIO requests from QEMU, the PCI physical addresses must be the
> +same in the QEMU process and the device emulation process. In order to
> +accomplish that, guest BAR programming must also be forwarded from QEMU
> +to the emulation process.
> +
> +interrupt injection
> +'''''''''''''''''''
> +
> +When device emulation wants to inject an interrupt into the VM, the
> +request climbs the device's bus object hierarchy until the point where a
> +bus object knows how to signal the interrupt to the guest. The details
> +depend on the type of interrupt being raised.
> +
> +-  PCI pin interrupts
> +
> +On x86 systems, there is an emulated IOAPIC object attached to the root
> +PCI bus object, and the root PCI object forwards interrupt requests to
> +it. The IOAPIC object, in turn, calls the KVM driver to inject the
> +corresponding interrupt into the VM. The simplest way to handle this in
> +an emulation process would be to set up the root PCI bus driver (via
> +``pci_bus_irqs()``) to send an interrupt request back to the QEMU
> +process, and have the device proxy object reflect it up the PCI tree
> +there.
> +
> +-  PCI MSI/X interrupts
> +
> +PCI MSI/X interrupts are implemented in HW as DMA writes to a
> +CPU-specific PCI address. In QEMU on x86, a KVM APIC object receives
> +these DMA writes, then calls into the KVM driver to inject the interrupt
> +into the VM. A simple emulation process implementation would be to send
> +the MSI DMA address from QEMU as a message at initialization, then
> +install an address space handler at that address which forwards the MSI
> +message back to QEMU.
> +
> +DMA operations
> +''''''''''''''
> +
> +When an emulation object wants to DMA into or out of guest memory, it
> +first must use dma\_memory\_map() to convert the DMA address to a local
> +virtual address. The emulation process memory region objects set up above
> +will be used to translate the DMA address to a local virtual address the
> +device emulation code can access.
> +
> +IOMMU
> +'''''
> +
> +When an IOMMU is in use in QEMU, DMA translation uses IOMMU memory
> +regions to translate the DMA address to a guest physical address before
> +that physical address can be translated to a local virtual address. The
> +emulation process will need similar functionality.
> +
> +-  IOTLB cache
> +
> +The emulation process will maintain a cache of recent IOMMU translations
> +(the IOTLB). When the translate() callback of an IOMMU memory region is
> +invoked, the IOTLB cache will be searched for an entry that will map the
> +DMA address to a guest PA. On a cache miss, a message will be sent back
> +to QEMU requesting the corresponding translation entry, which will both be
> +used to return a guest address and be added to the cache.
> +
> +-  IOTLB purge
> +
> +The IOMMU emulation will also need to act on unmap requests from QEMU.
> +These happen when the guest IOMMU driver purges an entry from the
> +guest's translation table.
> +
> +live migration
> +''''''''''''''
> +
> +When a remote process receives a live migration indication from QEMU, it
> +will set up a channel using the received file descriptor with
> +``qio_channel_socket_new_fd()``. This channel will be used to create a
> +*QEMUfile* that can be passed to ``qemu_save_device_state()`` to send
> +the process's device state back to QEMU. This method will be reversed on
> +restore - the channel will be passed to ``qemu_loadvm_state()`` to
> +restore the device state.
> +
> +Accelerating device emulation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The messages that are required to be sent between QEMU and the emulation
> +process can add considerable latency to IO operations. The optimizations
> +described below attempt to ameliorate this effect by allowing the
> +emulation process to communicate directly with the kernel KVM driver.
> +The KVM file descriptors created would be passed to the emulation process
> +via initialization messages, much like the guest memory table is done.
> +
> +MMIO acceleration
> +^^^^^^^^^^^^^^^^^
> +
> +Vhost user applications can receive guest virtio driver stores directly
> +from KVM. The issue with the eventfd mechanism used by vhost user is
> +that it does not pass any data with the event indication, so it cannot
> +handle guest loads or guest stores that carry store data. This concept
> +could, however, be expanded to cover more cases.
> +
> +The expanded idea would require a new type of KVM device:
> +*KVM\_DEV\_TYPE\_USER*. This device has two file descriptors: a master
> +descriptor that QEMU can use for configuration, and a slave descriptor
> +that the emulation process can use to receive MMIO notifications. QEMU
> +would create both descriptors using the KVM driver, and pass the slave
> +descriptor to the emulation process via an initialization message.
> +
> +data structures
> +'''''''''''''''
> +
> +-  guest physical range
> +
> +The guest physical range structure describes the address range that a
> +device will respond to. It includes the base and length of the range, as
> +well as which bus the range resides on (e.g., on an x86 machine, it can
> +specify whether the range refers to memory or IO addresses).
> +
> +A device can have multiple physical address ranges it responds to (e.g.,
> +a PCI device can have multiple BARs), so the structure will also include
> +an enumerated identifier to specify which of the device's ranges is
> +being referred to.
> +
> ++--------+----------------------------+
> +| Name   | Description                |
> ++========+============================+
> +| addr   | range base address         |
> ++--------+----------------------------+
> +| len    | range length               |
> ++--------+----------------------------+
> +| bus    | addr type (memory or IO)   |
> ++--------+----------------------------+
> +| id     | range ID (e.g., PCI BAR)   |
> ++--------+----------------------------+
> +
> +-  MMIO request structure
> +
> +This structure describes an MMIO operation. It includes which guest
> +physical range the MMIO was within, the offset within that range, the
> +MMIO type (e.g., load or store), and its length and data. It also
> +includes a sequence number that can be used to reply to the MMIO, and
> +the CPU that issued the MMIO.
> +
> ++----------+------------------------+
> +| Name     | Description            |
> ++==========+========================+
> +| rid      | range MMIO is within   |
> ++----------+------------------------+
> +| offset   | offset within *rid*    |
> ++----------+------------------------+
> +| type     | e.g., load or store    |
> ++----------+------------------------+
> +| len      | MMIO length            |
> ++----------+------------------------+
> +| data     | store data             |
> ++----------+------------------------+
> +| seq      | sequence ID            |
> ++----------+------------------------+
> +
> +-  MMIO request queues
> +
> +MMIO request queues are FIFO arrays of MMIO request structures. There
> +are two queues: the pending queue is for MMIOs that haven't been read by the
> +emulation program, and the sent queue is for MMIOs that haven't been
> +acknowledged. The main use of the second queue is to validate MMIO
> +replies from the emulation program.
> +
> +-  scoreboard
> +
> +Each CPU in the VM is emulated in QEMU by a separate thread, so multiple
> +MMIOs may be waiting to be consumed by an emulation program and multiple
> +threads may be waiting for MMIO replies. The scoreboard would contain a
> +wait queue and sequence number for the per-CPU threads, allowing them to
> +be individually woken when the MMIO reply is received from the emulation
> +program. It also tracks the number of posted MMIO stores to the device
> +that haven't been replied to, in order to satisfy the PCI constraint
> +that a load to a device will not complete until all previous stores to
> +that device have been completed.
> +
> +-  device shadow memory
> +
> +Some MMIO loads do not have device side-effects. These MMIOs can be
> +completed without sending a MMIO request to the emulation program if the
> +emulation program shares a shadow image of the device's memory image
> +with the KVM driver.
> +
> +The emulation program will ask the KVM driver to allocate memory for the
> +shadow image, and will then use ``mmap()`` to directly access it. The
> +emulation program can control KVM access to the shadow image by sending
> +KVM an access map telling it which areas of the image have no
> +side-effects (and can be completed immediately), and which require a
> +MMIO request to the emulation program. The access map can also inform
> +the KVM driver which size accesses are allowed to the image.
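
The structures in the tables above might look roughly like this in C. Field widths, the queue depth, and all identifiers are assumptions; the cpu field comes from the prose (the table omits it). mmio_deliver() models the pending-to-sent move that a read on the slave descriptor is described as doing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum mmio_bus  { MMIO_BUS_MEMORY, MMIO_BUS_IO };
enum mmio_type { MMIO_LOAD, MMIO_STORE };

struct guest_pa_range {
    uint64_t      addr;   /* range base address */
    uint64_t      len;    /* range length */
    enum mmio_bus bus;    /* addr type (memory or IO) */
    uint32_t      id;     /* range ID (e.g., PCI BAR) */
};

struct mmio_req {
    uint32_t       rid;     /* range MMIO is within */
    uint64_t       offset;  /* offset within rid */
    enum mmio_type type;    /* load or store */
    uint32_t       len;     /* MMIO length */
    uint64_t       data;    /* store data */
    uint64_t       seq;     /* sequence ID */
    uint32_t       cpu;     /* issuing CPU (from the prose, not the table) */
};

#define MMIO_QUEUE_LEN 16   /* assumed depth */

struct mmio_queue {
    struct mmio_req req[MMIO_QUEUE_LEN];
    unsigned head, count;   /* FIFO: oldest entry at head */
};

static bool mmio_queue_push(struct mmio_queue *q, const struct mmio_req *r)
{
    assert(q->count <= MMIO_QUEUE_LEN);
    if (q->count == MMIO_QUEUE_LEN) {
        return false;       /* full: the caller would have to sleep */
    }
    q->req[(q->head + q->count) % MMIO_QUEUE_LEN] = *r;
    q->count++;
    return true;
}

static bool mmio_queue_pop(struct mmio_queue *q, struct mmio_req *r)
{
    if (q->count == 0) {
        return false;
    }
    *r = q->req[q->head];
    q->head = (q->head + 1) % MMIO_QUEUE_LEN;
    q->count--;
    return true;
}

/* Hand up to max pending requests to the reader and park them on sent. */
static unsigned mmio_deliver(struct mmio_queue *pending,
                             struct mmio_queue *sent,
                             struct mmio_req *out, unsigned max)
{
    unsigned n = 0;

    while (n < max && pending->count > 0 && sent->count < MMIO_QUEUE_LEN) {
        mmio_queue_pop(pending, &out[n]);
        mmio_queue_push(sent, &out[n]);
        n++;
    }
    return n;
}
```

Keeping delivered requests on the sent queue is what lets a later reply be validated against something the driver actually issued.
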
> +
> +master descriptor
> +'''''''''''''''''
> +
> +The master descriptor is used by QEMU to configure the new KVM device.
> +The descriptor would be returned by the KVM driver when QEMU issues a
> +*KVM\_CREATE\_DEVICE* ``ioctl()`` with a *KVM\_DEV\_TYPE\_USER* type.
> +
> +KVM\_DEV\_TYPE\_USER device ops
> +
> +
> +The *KVM\_DEV\_TYPE\_USER* operations vector will be registered by a
> +``kvm_register_device_ops()`` call when the KVM system is initialized by
> +``kvm_init()``. These device ops are called by the KVM driver when QEMU
> +executes certain ``ioctl()`` operations on its KVM file descriptor. They
> +include:
> +
> +-  create
> +
> +This routine is called when QEMU issues a *KVM\_CREATE\_DEVICE*
> +``ioctl()`` on its per-VM file descriptor. It will allocate and
> +initialize a KVM user device specific data structure, and assign the
> +*kvm\_device* private field to it.
> +
> +-  ioctl
> +
> +This routine is invoked when QEMU issues an ``ioctl()`` on the master
> +descriptor. The ``ioctl()`` commands supported are defined by the KVM
> +device type. *KVM\_DEV\_TYPE\_USER* ones will need several commands:
> +
> +*KVM\_DEV\_USER\_SLAVE\_FD* creates the slave file descriptor that will
> +be passed to the device emulation program. Only one slave can be created
> +by each master descriptor. The file operations performed by this
> +descriptor are described below.
> +
> +The *KVM\_DEV\_USER\_PA\_RANGE* command configures a guest physical
> +address range that the slave descriptor will receive MMIO notifications
> +for. The range is specified by a guest physical range structure
> +argument. For buses that assign addresses to devices dynamically, this
> +command can be executed while the guest is running, such as the case
> +when a guest changes a device's PCI BAR registers.
> +
> +*KVM\_DEV\_USER\_PA\_RANGE* will use ``kvm_io_bus_register_dev()`` to
> +register *kvm\_io\_device\_ops* callbacks to be invoked when the guest
> +performs a MMIO operation within the range. When a range is changed,
> +``kvm_io_bus_unregister_dev()`` is used to remove the previous
> +instantiation.
> +
> +*KVM\_DEV\_USER\_TIMEOUT* will configure a timeout value that specifies
> +how long KVM will wait for the emulation process to respond to a MMIO
> +indication.
> +
> +-  destroy
> +
> +This routine is called when the VM instance is destroyed. It will need
> +to destroy the slave descriptor; and free any memory allocated by the
> +driver, as well as the *kvm\_device* structure itself.
> +
> +slave descriptor
> +''''''''''''''''
> +
> +The slave descriptor will have its own file operations vector, which
> +responds to system calls on the descriptor performed by the device
> +emulation program.
> +
> +-  read
> +
> +A read returns any pending MMIO requests from the KVM driver as MMIO
> +request structures. Multiple structures can be returned if there are
> +multiple MMIO operations pending. The MMIO requests are moved from the
> +pending queue to the sent queue, and if there are threads waiting for
> +space in the pending queue to add new MMIO operations, they will be woken
> +here.
> +
> +-  write
> +
> +A write also consists of a set of MMIO requests. They are compared to
> +the MMIO requests in the sent queue. Matches are removed from the sent
> +queue, and any threads waiting for the reply are woken. If a store is
> +removed, then the number of posted stores in the per-CPU scoreboard is
> +decremented. When the number is zero, and a non side-effect load was
> +waiting for posted stores to complete, the load is continued.
> +
> +-  ioctl
> +
> +There are several ioctl()s that can be performed on the slave
> +descriptor.
> +
> +A *KVM\_DEV\_USER\_SHADOW\_SIZE* ``ioctl()`` causes the KVM driver to
> +allocate memory for the shadow image. This memory can later be
> +``mmap()``\ ed by the emulation process to share the emulation's view of
> +device memory with the KVM driver.
> +
> +A *KVM\_DEV\_USER\_SHADOW\_CTRL* ``ioctl()`` controls access to the
> +shadow image. It will send the KVM driver a shadow control map, which
> +specifies which areas of the image can complete guest loads without
> +sending the load request to the emulation program. It will also specify
> +the size of load operations that are allowed.
> +
> +-  poll
> +
> +An emulation program will use the ``poll()`` call with a *POLLIN* flag
> +to determine if there are MMIO requests waiting to be read. It will
> +return if the pending MMIO request queue is not empty.
> +
> +-  mmap
> +
> +This call allows the emulation program to directly access the shadow
> +image allocated by the KVM driver. As device emulation updates device
> +memory, changes with no side-effects will be reflected in the shadow,
> +and the KVM driver can satisfy guest loads from the shadow image without
> +needing to wait for the emulation program.
> +
> +kvm\_io\_device ops
> +'''''''''''''''''''
> +
> +Each KVM per-CPU thread can handle MMIO operations on behalf of the guest
> +VM. KVM will use the MMIO's guest physical address to search for a
> +matching *kvm\_io\_device* to see if the MMIO can be handled by the KVM
> +driver instead of exiting back to QEMU. If a match is found, the
> +corresponding callback will be invoked.
> +
> +-  read
> +
> +This callback is invoked when the guest performs a load to the device.
> +Loads with side-effects must be handled synchronously, with the KVM
> +driver putting the QEMU thread to sleep waiting for the emulation
> +process reply before re-starting the guest. Loads that do not have
> +side-effects may be optimized by satisfying them from the shadow image,
> +if there are no outstanding stores to the device by this CPU. PCI memory
> +ordering demands that a load cannot complete before all older stores to
> +the same device have been completed.
> +
> +-  write
> +
> +Stores can be handled asynchronously unless the pending MMIO request
> +queue is full. In this case, the QEMU thread must sleep waiting for
> +space in the queue. Stores will increment the number of posted stores in
> +the per-CPU scoreboard, in order to implement the PCI ordering
> +constraint above.
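
The load/store ordering rule reduces to a small decision function. This is a sketch of the policy only, with hypothetical names; the real scoreboard would also hold the wait queue and sequence number.

```c
#include <assert.h>
#include <stdbool.h>

enum load_path {
    LOAD_FROM_SHADOW,       /* complete immediately from the shadow image */
    LOAD_WAIT_FOR_STORES,   /* side-effect free, but older stores pending */
    LOAD_SEND_TO_EMULATOR   /* has side effects: synchronous round trip */
};

/* Per-CPU scoreboard, reduced to the posted-store counter. */
struct scoreboard {
    unsigned posted_stores;
};

/* kvm_io_device write path: a store was queued without waiting. */
static void store_posted(struct scoreboard *sb)
{
    sb->posted_stores++;
}

/* Slave descriptor write path: the emulator acknowledged a store. */
static void store_acked(struct scoreboard *sb)
{
    assert(sb->posted_stores > 0);
    sb->posted_stores--;
}

/* kvm_io_device read path: decide how a guest load may complete. */
static enum load_path classify_load(const struct scoreboard *sb,
                                    bool has_side_effects)
{
    if (has_side_effects) {
        return LOAD_SEND_TO_EMULATOR;
    }
    return sb->posted_stores ? LOAD_WAIT_FOR_STORES : LOAD_FROM_SHADOW;
}
```
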
> +
> +interrupt acceleration
> +^^^^^^^^^^^^^^^^^^^^^^
> +
> +This performance optimization would work much like a vhost user
> +application does, where the QEMU process sets up *eventfds* that cause
> +the device's corresponding interrupt to be triggered by the KVM driver.
> +These irq file descriptors are sent to the emulation process at
> +initialization, and are used when the emulation code raises a device
> +interrupt.
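
For readers unfamiliar with the eventfd mechanism this relies on, here is a minimal Linux-only sketch of the signalling half. It does not show the KVM binding; irq_raise() and irq_consume() are illustrative names, and in the real design KVM, not user code, would be the consumer.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Emulation side: signal the interrupt by adding 1 to the counter. */
static int irq_raise(int irqfd)
{
    uint64_t one = 1;

    assert(irqfd >= 0);
    return write(irqfd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Consumer side: fetch the accumulated count and reset it to zero. */
static int irq_consume(int irqfd, uint64_t *count)
{
    return read(irqfd, count, sizeof(*count)) == (ssize_t)sizeof(*count)
           ? 0 : -1;
}
```

An eventfd carries only a counter, which is the limitation the MMIO acceleration section points out: it can say that something happened, but not pass load or store data along with it.
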
> +
> +intx acceleration
> +'''''''''''''''''
> +
> +Traditional PCI pin interrupts are level based, so, in addition to an
> +irq file descriptor, a re-sampling file descriptor needs to be sent to
> +the emulation program. This second file descriptor allows multiple
> +devices sharing an irq to be notified when the interrupt has been
> +acknowledged by the guest, so they can re-trigger the interrupt if their
> +device has not de-asserted its interrupt.
> +
> +intx irq descriptor
> +
> +
> +The irq descriptors are created by the proxy object using
> +``event_notifier_init()`` to create the irq and re-sampling
> +*eventfds*, and ``kvm_vm_ioctl(KVM_IRQFD)`` to bind them to an interrupt.
> +The interrupt route can be found with
> +``pci_device_route_intx_to_irq()``.
> +
> +intx routing changes
> +
> +
> +Intx routing can be changed when the guest programs the APIC the device
> +pin is connected to. The proxy object in QEMU will use
> +``pci_device_set_intx_routing_notifier()`` to be informed of any guest
> +changes to the route. This handler will broadly follow the VFIO
> +interrupt logic to change the route: de-assigning the existing irq
> +descriptor from its route, then assigning it the new route. (see
> +``vfio_intx_update()``)
> +
> +MSI/X acceleration
> +''''''''''''''''''
> +
> +MSI/X interrupts are sent as DMA transactions to the host. The interrupt
> +data contains a vector that is programmed by the guest. A device may have
> +multiple MSI interrupts associated with it, so multiple irq descriptors
> +may need to be sent to the emulation program.
> +
> +MSI/X irq descriptor
> +
> +
> +This case will also follow the VFIO example. For each MSI/X interrupt,
> +an *eventfd* is created, a virtual interrupt is allocated by
> +``kvm_irqchip_add_msi_route()``, and the virtual interrupt is bound to
> +the eventfd with ``kvm_irqchip_add_irqfd_notifier()``.
> +
> +MSI/X config space changes
> +
> +
> +The guest may dynamically update several MSI-related tables in the
> +device's PCI config space. These include per-MSI interrupt enables and
> +vector data. Additionally, MSIX tables exist in device memory space, not
> +config space. Much like the BAR case above, the proxy object must look
> +at guest config space programming to keep the MSI interrupt state
> +consistent between QEMU and the emulation program.
> +
> +--------------
> +
> +Disaggregated CPU emulation
> +---------------------------
> +
> +After IO services have been disaggregated, a second phase would be to
> +separate a process to handle CPU instruction emulation from the main
> +QEMU control function. There are no object separation points for this
> +code, so the first task would be to create one.
> +
> +Host access controls
> +--------------------
> +
> +Separating QEMU relies on the host OS's access restriction mechanisms to
> +enforce that the differing processes can only access the objects they
> +are entitled to. There are a couple of types of mechanisms usually provided
> +by general purpose OSs.
> +
> +Discretionary access control
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Discretionary access control allows each user to control who can access
> +their files. In Linux, this type of control is usually too coarse for
> +QEMU separation, since it only provides three separate access controls:
> +one for the same user ID, the second for user IDs with the same group
> +ID, and the third for all other user IDs. Each device instance would
> +need a separate user ID to provide access control, which is likely to be
> +unwieldy for dynamically created VMs.
> +
> +Mandatory access control
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Mandatory access control allows the OS to add an additional set of
> +controls on top of discretionary access for the OS to control. It also
> +adds other attributes to processes and files such as types, roles, and
> +categories, and can establish rules for how processes and files can
> +interact.
> +
> +Type enforcement
> +^^^^^^^^^^^^^^^^
> +
> +Type enforcement assigns a *type* attribute to processes and files, and
> +allows rules to be written on what operations a process with a given
> +type can perform on a file with a given type. QEMU separation could take
> +advantage of type enforcement by running the emulation processes with
> +different types, both from the main QEMU process, and from the emulation
> +processes of different classes of devices.
> +
> +For example, guest disk images and disk emulation processes could have
> +types separate from the main QEMU process and non-disk emulation
> +processes, and the type rules could prevent processes other than disk
> +emulation ones from accessing guest disk images. Similarly, network
> +emulation processes can have a type separate from the main QEMU process
> +and non-network emulation processes, and only that type can access the
> +host tun/tap device used to provide guest networking.
> +
> +Category enforcement
> +^^^^^^^^^^^^^^^^^^^^
> +
> +Category enforcement assigns a set of numbers within a given range to
> +the process or file. The process is granted access to the file if the
> +process's set is a superset of the file's set. This enforcement can be
> +used to separate multiple instances of devices in the same class.
> +
> +For example, if there are multiple disk devices provided to a guest,
> +each device emulation process could be provisioned with a separate
> +category. The different device emulation processes would not be able to
> +access each other's backing disk images.
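
The superset rule is easy to state as a bitmask check. This is a sketch of the rule itself, not of any particular MAC implementation; the 64-category limit and the names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit n set means category n is in the set (64 categories assumed). */
typedef uint64_t category_set;

static category_set category(unsigned n)
{
    assert(n < 64);
    return (category_set)1 << n;
}

/* Access is granted when the process's set is a superset of the file's. */
static bool may_access(category_set process, category_set file)
{
    return (process & file) == file;
}
```

With one category per disk, may_access(category(1), category(2)) is false, so the emulation process for the first disk cannot open the second disk's image, while a process holding both categories could open either.
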
> +
> +Alternatively, categories could be used in lieu of the type enforcement
> +scheme described above. In this scenario, different categories would be
> +used to prevent device emulation processes in different classes from
> +accessing resources assigned to other classes.
> -- 
> 1.8.3.1
> 


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 49/49] multi-process: add configure and usage information
  2019-10-24  9:09 ` [RFC v4 PATCH 49/49] multi-process: add configure and usage information Jagannathan Raman
@ 2019-11-07 14:02   ` Stefan Hajnoczi
  2019-11-07 14:33     ` Michael S. Tsirkin
  2019-11-07 14:39     ` Daniel P. Berrangé
  0 siblings, 2 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-07 14:02 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, john.g.johnson, qemu-devel, kraxel,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

[-- Attachment #1: Type: text/plain, Size: 4768 bytes --]

On Thu, Oct 24, 2019 at 05:09:30AM -0400, Jagannathan Raman wrote:
> From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> ---
>  docs/qemu-multiprocess.txt | 86 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 86 insertions(+)
>  create mode 100644 docs/qemu-multiprocess.txt
> 
> diff --git a/docs/qemu-multiprocess.txt b/docs/qemu-multiprocess.txt
> new file mode 100644
> index 0000000..c29f4df
> --- /dev/null
> +++ b/docs/qemu-multiprocess.txt
> @@ -0,0 +1,86 @@
> +Multi-process QEMU
> +==================
> +
> +This document describes how to configure and use multi-process qemu.
> +For the design document refer to docs/devel/qemu-multiprocess.
> +
> +1) Configuration
> +----------------
> +
> +To enable support for multi-process add --enable-mpqemu
> +to the list of options for the "configure" script.
> +
> +
> +2) Usage
> +--------
> +
> +To start qemu with devices intended to run in a separate emulation
> +process without libvirtd support, the following should be used on QEMU
> +command line. As of now, we only support the emulation of lsi53c895a
> +in a separate process
> +
> +* Since parts of the RAM are shared between QEMU & remote process, a
> +  memory-backend-file is required to facilitate this, as follows:
> +
> +  -object memory-backend-file,id=mem,mem-path=/dev/shm/,size=4096M,share=on
> +
> +* The devices to be emulated in the separate process are defined as
> +  before with addition of "rid" suboption that serves as a remote group
> +  identificator.
> +
> +  -device <device options>,rid="remote process id"
> +
> +  For exmaple, for non multi-process qemu:

s/exmaple/example/

> +    -device lsi53c895a,id=scsi0 device
> +    -device scsi-hd,drive=drive0,bus=scsi0.0,scsi-id=0
> +    -drive id=drive0,file=data-disk.img
> +
> +  and for multi-process qemu and no libvirt
> +  support (i.e. QEMU forks child processes):
> +    -device lsi53c895a,id=scsi0,rid=0
> +    -device scsi-hd,drive=drive0,bus=scsi0.0,scsi-id=0,rid="0"
> +
> +* The command-line options for the remote process is added to the "command"

s/is added/are added/

> +  suboption of the newly added "-remote" option. 
> +
> +   -remote [socket],rid=,command="..."
> +
> +  The drives to be emulated by the remote process are specified as part of
> +  this command sub-option. The device to be used to connect to the monitor
> +  is also specified as part of this suboption.
> +
> +  For example, the following option adds a drive and monitor to the remote
> +  process:
> +  -remote rid=0,command="-drive id=drive0,,file=data-disk.img -monitor unix:/home/qmp-sock,,server,,nowait"
> +
> +  Note: There's an issue with this "command" subtion which we are in the

s/subtion/sub-option/

> +  process of fixing. To work around this issue, it requires additional
> +  "comma" characters as illustrated above, and in the example below.
> +
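
A side note on the doubled commas: this looks like the usual QEMU option
syntax, where a literal comma inside an option value is written as ",,".
If so, a launcher script can generate the escaping mechanically. A
minimal Python sketch, assuming that is all the workaround needs;
escape_commas() and remote_option() are names made up for this
illustration, and shell quoting is left to the caller:

```python
def escape_commas(value):
    """Double every comma so it survives QEMU's key=value option parsing."""
    return value.replace(",", ",,")


def remote_option(rid, remote_args):
    """Build the argument of -remote from the remote process's own options.

    rid and command are the suboption names used in this patch; the helper
    itself is hypothetical.
    """
    command = " ".join(remote_args)
    return "rid={},command={}".format(rid, escape_commas(command))


# Rebuilds the option string from the example above (without shell quotes).
opt = remote_option(0, [
    "-drive", "id=drive0,file=data-disk.img",
    "-monitor", "unix:/home/qmp-sock,server,nowait",
])
```
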
> +* Example QEMU command-line to launch lsi53c895a in a remote process
> +
> +  #/bin/sh
> +  qemu-system-x86_64 \
> +  -name "OL7.4" \
> +  -machine q35,accel=kvm \
> +  -smp sockets=1,cores=1,threads=1 \
> +  -cpu host \
> +  -m 2048 \
> +  -object memory-backend-file,id=mem,mem-path=/dev/shm/,size=2G,share=on \
> +  -numa node,memdev=mem \
> +  -device virtio-scsi-pci,id=virtio_scsi_pci0 \
> +  -drive id=drive_image1,if=none,format=raw,file=/root/ol7.qcow2 \
> +  -device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0 \
> +  -boot d \
> +  -monitor stdio \
> +  -vnc :0 \
> +  -device lsi53c895a,id=lsi0,remote,rid=8,command="qemu-scsi-dev" \
> +  -device scsi-hd,id=drive2,drive=drive_image2,bus=lsi0.0,scsi-id=0,remote,rid=8,command="qemu-scsi-dev"\
> +  -remote rid=8,command="-drive id=drive_image2,,file=/root/remote-process-disk.img -monitor unix:/home/qmp-sock,,server,,nowait"
> +
> +  We could connect to the monitor using the following command:
> +  socat /home/qmp-sock stdio
> +
> +  After hotplugging disks to the remote process, please execute the
> +  following command in the guest to refresh the list of storage devices:
> +  rescan_scsi_bus.sh -a

This documentation suggests that QEMU spawns the remote processes.  How
does this work with unprivileged QEMU?  Is there an additional step where
QEMU drops privileges after having spawned remote processes?

Remote processes require access to resources that the main QEMU
process does not need access to, so I'm wondering how this process model
ensures that each process has only the privileges it needs.

Stefan

* Re: [RFC v4 PATCH 49/49] multi-process: add configure and usage information
  2019-11-07 14:02   ` Stefan Hajnoczi
@ 2019-11-07 14:33     ` Michael S. Tsirkin
  2019-11-08 11:17       ` Stefan Hajnoczi
  2019-11-07 14:39     ` Daniel P. Berrangé
  1 sibling, 1 reply; 140+ messages in thread
From: Michael S. Tsirkin @ 2019-11-07 14:33 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: elena.ufimtseva, fam, john.g.johnson, qemu-devel, kraxel,
	Jagannathan Raman, quintela, armbru, kanth.ghatraju, thuth,
	ehabkost, konrad.wilk, dgilbert, liran.alon, stefanha, rth,
	kwolf, berrange, mreitz, ross.lagerwall, marcandre.lureau,
	pbonzini

On Thu, Nov 07, 2019 at 03:02:20PM +0100, Stefan Hajnoczi wrote:
> This documentation suggests that QEMU spawns the remote processes.  How
> does this work with unprivileged QEMU?  Is there an additional step where
> QEMU drops privileges after having spawned remote processes?
> 
> Remote processes require access to resources that the main QEMU
> process does not need access to, so I'm wondering how this process model
> ensures that each process has only the privileges it needs.

I guess you have something like capabilities in mind?

When using something like SELinux, privileges are per binary,
so the order of startup doesn't matter.

-- 
MST


* Re: [RFC v4 PATCH 49/49] multi-process: add configure and usage information
  2019-11-07 14:02   ` Stefan Hajnoczi
  2019-11-07 14:33     ` Michael S. Tsirkin
@ 2019-11-07 14:39     ` Daniel P. Berrangé
  2019-11-07 15:53       ` Jag Raman
  1 sibling, 1 reply; 140+ messages in thread
From: Daniel P. Berrangé @ 2019-11-07 14:39 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: elena.ufimtseva, fam, john.g.johnson, qemu-devel, kraxel,
	Jagannathan Raman, quintela, mst, armbru, kanth.ghatraju, thuth,
	ehabkost, konrad.wilk, dgilbert, liran.alon, stefanha, rth,
	kwolf, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

On Thu, Nov 07, 2019 at 03:02:20PM +0100, Stefan Hajnoczi wrote:
> On Thu, Oct 24, 2019 at 05:09:30AM -0400, Jagannathan Raman wrote:
> > From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > 
> > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> > Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> > ---
> >  docs/qemu-multiprocess.txt | 86 ++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 86 insertions(+)
> >  create mode 100644 docs/qemu-multiprocess.txt
> > 
> > diff --git a/docs/qemu-multiprocess.txt b/docs/qemu-multiprocess.txt
> > new file mode 100644
> > index 0000000..c29f4df
> > --- /dev/null
> > +++ b/docs/qemu-multiprocess.txt
> > @@ -0,0 +1,86 @@
> > +Multi-process QEMU
> > +==================
> > +
> > +This document describes how to configure and use multi-process qemu.
> > +For the design document refer to docs/devel/qemu-multiprocess.
> > +
> > +1) Configuration
> > +----------------
> > +
> > +To enable support for multi-process add --enable-mpqemu
> > +to the list of options for the "configure" script.
> > +
> > +
> > +2) Usage
> > +--------
> > +
> > +To start qemu with devices intended to run in a separate emulation
> > +process without libvirtd support, the following should be used on QEMU
> > +command line. As of now, we only support the emulation of lsi53c895a
> > +in a separate process
> > +
> > +* Since parts of the RAM are shared between QEMU & remote process, a
> > +  memory-backend-file is required to facilitate this, as follows:
> > +
> > +  -object memory-backend-file,id=mem,mem-path=/dev/shm/,size=4096M,share=on
> > +
> > +* The devices to be emulated in the separate process are defined as
> > +  before with addition of "rid" suboption that serves as a remote group
> > +  identificator.
> > +
> > +  -device <device options>,rid="remote process id"
> > +
> > +  For exmaple, for non multi-process qemu:
> 
> s/exmaple/example/
> 
> > +    -device lsi53c895a,id=scsi0 device
> > +    -device scsi-hd,drive=drive0,bus=scsi0.0,scsi-id=0
> > +    -drive id=drive0,file=data-disk.img
> > +
> > +  and for multi-process qemu and no libvirt
> > +  support (i.e. QEMU forks child processes):
> > +    -device lsi53c895a,id=scsi0,rid=0
> > +    -device scsi-hd,drive=drive0,bus=scsi0.0,scsi-id=0,rid="0"
> > +
> > +* The command-line options for the remote process is added to the "command"
> 
> s/is added/are added/
> 
> > +  suboption of the newly added "-remote" option. 
> > +
> > +   -remote [socket],rid=,command="..."
> > +
> > +  The drives to be emulated by the remote process are specified as part of
> > +  this command sub-option. The device to be used to connect to the monitor
> > +  is also specified as part of this suboption.
> > +
> > +  For example, the following option adds a drive and monitor to the remote
> > +  process:
> > +  -remote rid=0,command="-drive id=drive0,,file=data-disk.img -monitor unix:/home/qmp-sock,,server,,nowait"
> > +
> > +  Note: There's an issue with this "command" subtion which we are in the
> 
> s/subtion/sub-option/
> 
> > +  process of fixing. To work around this issue, it requires additional
> > +  "comma" characters as illustrated above, and in the example below.
> > +
> > +* Example QEMU command-line to launch lsi53c895a in a remote process
> > +
> > +  #/bin/sh
> > +  qemu-system-x86_64 \
> > +  -name "OL7.4" \
> > +  -machine q35,accel=kvm \
> > +  -smp sockets=1,cores=1,threads=1 \
> > +  -cpu host \
> > +  -m 2048 \
> > +  -object memory-backend-file,id=mem,mem-path=/dev/shm/,size=2G,share=on \
> > +  -numa node,memdev=mem \
> > +  -device virtio-scsi-pci,id=virtio_scsi_pci0 \
> > +  -drive id=drive_image1,if=none,format=raw,file=/root/ol7.qcow2 \
> > +  -device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0 \
> > +  -boot d \
> > +  -monitor stdio \
> > +  -vnc :0 \
> > +  -device lsi53c895a,id=lsi0,remote,rid=8,command="qemu-scsi-dev" \
> > +  -device scsi-hd,id=drive2,drive=drive_image2,bus=lsi0.0,scsi-id=0,remote,rid=8,command="qemu-scsi-dev"\
> > +  -remote rid=8,command="-drive id=drive_image2,,file=/root/remote-process-disk.img -monitor unix:/home/qmp-sock,,server,,nowait"
> > +
> > +  We could connect to the monitor using the following command:
> > +  socat /home/qmp-sock stdio
> > +
> > +  After hotplugging disks to the remote process, please execute the
> > +  following command in the guest to refresh the list of storage devices:
> > +  rescan_scsi_bus.sh -a
> 
> This documentation suggests that QEMU spawns the remote processes.  How
> does this work with unprivileged QEMU?  Is there an additional step where
> QEMU drops privileges after having spawned remote processes?

This syntax is for the simple case without privilege separation.
If differing privilege levels are needed, then whatever spawns QEMU
should spawn the remote helper process ahead of time, and then just
pass the UNIX socket path to the -remote arg, instead of using
the 'command' parameter.
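
As a rough illustration of that ordering, here is a Python sketch of
what a management layer might compute. It assumes the "-socket" helper
argument and the "-remote id=...,socket=..." spelling from the design
document in patch 48; the usage text in this patch spells the identifier
"rid", so the exact option names may differ. build_command_lines() is a
hypothetical name, and the privilege setup itself is out of scope here.

```python
def build_command_lines(remote_id, socket_path, helper, backend_args):
    """Return (helper_argv, qemu_remote_args) for a pre-spawned helper.

    The caller starts helper_argv first, under whatever confinement it
    wants for the helper, and only then starts QEMU with
    qemu_remote_args appended to its own command line.
    """
    helper_argv = [helper, "-socket", socket_path] + list(backend_args)
    qemu_remote_args = [
        "-remote", "id={},socket={}".format(remote_id, socket_path),
    ]
    return helper_argv, qemu_remote_args


helper_argv, qemu_args = build_command_lines(
    "disk-proc", "/tmp/disk0-sock", "qemu-scsi-dev",
    ["-drive", "id=drive_image2,file=/root/remote-process-disk.img"],
)
```

Since QEMU only ever sees a socket path in this mode, it needs neither
fork()/exec() rights nor access to the helper's disk images.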

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



* Re: [RFC v4 PATCH 48/49] multi-process: add the concept description to docs/devel/qemu-multiprocess
  2019-10-24  9:09 ` [RFC v4 PATCH 48/49] multi-process: add the concept description to docs/devel/qemu-multiprocess Jagannathan Raman
  2019-10-25 19:33   ` Elena Ufimtseva
@ 2019-11-07 15:50   ` Stefan Hajnoczi
  2019-11-11 15:41   ` Stefan Hajnoczi
  2 siblings, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-07 15:50 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, john.g.johnson, qemu-devel, kraxel,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

On Thu, Oct 24, 2019 at 05:09:29AM -0400, Jagannathan Raman wrote:
> diff --git a/docs/devel/qemu-multiprocess.rst b/docs/devel/qemu-multiprocess.rst
> new file mode 100644
> index 0000000..2c42c6e
> --- /dev/null
> +++ b/docs/devel/qemu-multiprocess.rst
> @@ -0,0 +1,1102 @@
> +Disaggregating QEMU
> +===================
> +
> +QEMU is often used as the hypervisor for virtual machines running in the
> +Oracle cloud. Since one of the advantages of cloud computing is the
> +ability to run many VMs from different tenants in the same cloud
> +infrastructure, a guest that compromised its hypervisor could
> +potentially use the hypervisor's access privileges to access data it is
> +not authorized for.
> +
> +QEMU can be susceptible to security attack because it is a large,
> +monolithic program that provides many features to the VMs it services.
> +Many of these feature can be configured out of QEMU, but even a reduced
> +configuration QEMU has a large amount of code a guest can potentially
> +attack in order to gain additional privileges.

The "additional privileges" are only host userspace code execution (i.e.
syscalls) within an unprivileged process that is sandboxed using seccomp
and SELinux on a properly configured system.  If QEMU has access to
resources that do not belong to the guest then you have not configured
QEMU correctly (libvirt handles a lot of this setup for you).

I think it's more accurate to describe the motivation for multi-process
QEMU in terms of the principle of least privilege: each component in the
system should only have access to the resources that it needs to perform
its job.  That way people don't get the impression that QEMU is a
trusted component with access to resources that must be kept from the
guest.

> +QEMU services
> +-------------
> +
> +QEMU can be broadly described as providing three main services. One is a
> +VM control point, where VMs can be created, migrated, re-configured, and
> +destroyed. A second is to emulate the CPU instructions within the VM,
> +often accelerated by HW virtualization features such as Intel's VT
> +extensions. Finally, it provides IO services to the VM by emulating HW
> +IO devices, such as disk and network devices.
> +
> +A disaggregated QEMU
> +~~~~~~~~~~~~~~~~~~~~
> +
> +A disaggregated QEMU involves separating QEMU services into separate
> +host processes. Each of these processes can be given only the privileges
> +it needs to provide its service, e.g., a disk service could be given
> +access only the the disk images it provides, and not be allowed to
> +access other files, or any network devices. An attacker who compromised
> +this service would not be able to use this exploit to access files or
> +devices beyond what the disk service was given access to.
> +
> +A QEMU control process would remain, but in disaggregated mode, it would
> +be a control point that executes the processes needed to support the VM
> +being created, but have no direct interfaces to the VM. During VM
> +execution, it would still provide the user interface to hot-plug devices
> +or live migrate the VM.

"it would be a control point that executes the processes needed to
support the VM being created"

libvirt does the sandboxing setup.  I think the responsibility of
executing and sandboxing device processes would also be left to libvirt,
not to QEMU.

Perhaps it's best to leave this sentence out and enable both approaches
(1. QEMU executes device processes, 2. management tool executes device
processes).

> +A first step in creating a disaggregated QEMU is to separate IO services
> +from the main QEMU program, which would continue to provide CPU
> +emulation. i.e., the control process would also be the CPU emulation
> +process. In a later phase, CPU emulation could be separated from the
> +control process.
> +
> +Disaggregating IO services
> +--------------------------
> +
> +Disaggregating IO services is a good place to begin QEMU disaggregating
> +for a couple of reasons. One is the sheer number of IO devices QEMU can
> +emulate provides a large surface of interfaces which could potentially
> +be exploited, and, indeed, have been a source of exploits in the past.
> +Another is the modular nature of QEMU device emulation code provides
> +interface points where the QEMU functions that perform device emulation
> +can be separated from the QEMU functions that manage the emulation of
> +guest CPU instructions.
> +
> +QEMU device emulation
> +~~~~~~~~~~~~~~~~~~~~~
> +
> +QEMU uses a object oriented SW architecture for device emulation code.
> +Configured objects are all compiled into the QEMU binary, then objects
> +are instantiated by name when used by the guest VM. For example, the
> +code to emulate a device named "foo" is always present in QEMU, but its
> +instantiation code is only run when the device is included in the target
> +VM. (e.g., via the QEMU command line as *-device foo*)
> +
> +The object model is hierarchical, so device emulation code names its
> +parent object (such as "pci-device" for a PCI device) and QEMU will
> +instantiate a parent object before calling the device's instantiation
> +code.
> +
> +Current separation models
> +~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +In order to separate the device emulation code from the CPU emulation
> +code, the device object code must run in a different process. There are
> +a couple of existing QEMU features that can run emulation code
> +separately from the main QEMU process. These are examined below.
> +
> +vhost user model
> +^^^^^^^^^^^^^^^^
> +
> +Virtio guest device drivers can be connected to vhost user applications
> +in order to perform their IO operations. This model uses special virtio
> +device drivers in the guest and vhost user device objects in QEMU, but
> +once the QEMU vhost user code has configured the vhost user application,
> +mission-mode IO is performed by the application. The vhost user
> +application is a daemon process that can be contacted via a known UNIX
> +domain socket.
> +
> +vhost socket
> +''''''''''''
> +
> +As mentioned above, one of the tasks of the vhost device object within
> +QEMU is to contact the vhost application and send it configuration
> +information about this device instance. As part of the configuration
> +process, the application can also be sent other file descriptors over
> +the socket, which then can be used by the vhost user application in
> +various ways, some of which are described below.
> +
> +vhost MMIO store acceleration
> +'''''''''''''''''''''''''''''
> +
> +VMs are often run using HW virtualization features via the KVM kernel
> +driver. This driver allows QEMU to accelerate the emulation of guest CPU
> +instructions by running the guest in a virtual HW mode. When the guest
> +executes instructions that cannot be executed by virtual HW mode,
> +execution returns to the KVM driver so it can inform QEMU to emulate the
> +instructions in SW.
> +
> +One of the events that can cause a return to QEMU is when a guest device
> +driver accesses an IO location. QEMU then dispatches the memory
> +operation to the corresponding QEMU device object. In the case of a
> +vhost user device, the memory operation would need to be sent over a
> +socket to the vhost application. This path is accelerated by the QEMU
> +virtio code by setting up an eventfd file descriptor that the vhost
> +application can directly receive MMIO store notifications from the KVM
> +driver, instead of needing them to be sent to the QEMU process first.
> +
> +vhost interrupt acceleration
> +''''''''''''''''''''''''''''
> +
> +Another optimization used by the vhost application is the ability to
> +directly inject interrupts into the VM via the KVM driver, again,
> +bypassing the need to send the interrupt back to the QEMU process first.
> +The QEMU virtio setup code configures the KVM driver with an eventfd
> +that triggers the device interrupt in the guest when the eventfd is
> +written. This irqfd file descriptor is then passed to the vhost user
> +application program.
> +
> +vhost access to guest memory
> +''''''''''''''''''''''''''''
> +
> +The vhost application is also allowed to directly access guest memory,
> +instead of needing to send the data as messages to QEMU. This is also
> +done with file descriptors sent to the vhost user application by QEMU.
> +These descriptors can be passed to ``mmap()`` by the vhost application
> +to map the guest address space into the vhost application.
> +
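
For readers less familiar with the mechanism, the effect of mapping the
same file descriptor in two places can be shown with a small Python
sketch. This only illustrates shared mappings in general, not the vhost
user protocol; map_shared() is a name made up here, and a temporary file
stands in for the guest RAM backing file.

```python
import mmap
import os
import tempfile


def map_shared(fd, size):
    """Map size bytes of fd read/write with MAP_SHARED semantics."""
    return mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                     prot=mmap.PROT_READ | mmap.PROT_WRITE)


# A temporary file stands in for the guest RAM backing file.
size = mmap.PAGESIZE
backing = tempfile.TemporaryFile()
os.ftruncate(backing.fileno(), size)

# The second descriptor plays the role of the one received by the
# application; both mappings refer to the same pages.
app_fd = os.dup(backing.fileno())
qemu_view = map_shared(backing.fileno(), size)
app_view = map_shared(app_fd, size)

qemu_view[0:4] = b"DMA!"
seen_by_app = bytes(app_view[0:4])
```
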
> +IOMMUs introduce another level of complexity, since the address given to
> +the guest virtio device to DMA to or from is not a guest physical
> +address. This case is handled by having vhost code within QEMU register
> +as a listener for IOMMU mapping changes. The vhost application maintains
> +a cache of IOMMMU translations: sending translation requests back to
> +QEMU on cache misses, and in turn receiving flush requests from QEMU
> +when mappings are purged.
> +
> +applicability to device separation
> +''''''''''''''''''''''''''''''''''
> +
> +Much of the vhost model can be re-used by separated device emulation. In
> +particular, the ideas of using a socket between QEMU and the device
> +emulation application, using a file descriptor to inject interrupts into
> +the VM via KVM, and allowing the application to ``mmap()`` the guest
> +should be re used.
> +
> +There are, however, some notable differences between how a vhost
> +application works and the needs of separated device emulation. The most
> +basic is that vhost uses custom virtio device drivers which always
> +trigger IO with MMIO stores. A separated device emulation model must
> +work with existing IO device models and guest device drivers. MMIO loads
> +break vhost store acceleration since they are synchronous - guest
> +progress cannot continue until the load has been emulated. By contrast,
> +stores are asynchronous, the guest can continue after the store event
> +has been sent to the vhost application.
> +
> +Another difference is that in the vhost user model, a single daemon can
> +support multiple QEMU instances. This is contrary to the security regime
> +desired, in which the emulation application should only be allowed to
> +access the files or devices the VM it's running on behalf of can access.
> +#### qemu-io model
> +
> +Qemu-io is a test harness used to test changes to the QEMU block backend
> +object code. (e.g., the code that implements disk images for disk driver
> +emulation) Qemu-io is not a device emulation application per se, but it
> +does compile the QEMU block objects into a separate binary from the main
> +QEMU one. This could be useful for disk device emulation, since its
> +emulation applications will need to include the QEMU block objects.
> +
> +New separation model based on proxy objects
> +-------------------------------------------
> +
> +A different model based on proxy objects in the QEMU program
> +communicating with remote emulation programs could provide separation
> +while minimizing the changes needed to the device emulation code. The
> +rest of this section is a discussion of how a proxy object model would
> +work.
> +
> +Remote emulation processes
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The remote emulation process will run the QEMU object hierarchy without
> +modification. The device emulation objects will be also be based on the
> +QEMU code, because for anything but the simplest device, it would not be
> +a tractable to re-implement both the object model and the many device
> +backends that QEMU has.
> +
> +The processes will communicate with the QEMU process over UNIX domain
> +sockets. The processes can be executed either as standalone processes,
> +or be executed by QEMU. In both cases, the host backends the emulation
> +processes will provide are specified on its command line, as they would
> +be for QEMU. For example:
> +
> +::
> +
> +    disk-proc -blockdev driver=file,node-name=file0,filename=disk-file0  \
> +    -blockdev driver=qcow2,node-name=drive0,file=file0
> +
> +would indicate process *disk-proc* uses a qcow2 emulated disk named
> +*file0* as its backend.
> +
> +Emulation processes may emulate more than one guest controller. A common
> +configuration might be to put all controllers of the same device class
> +(e.g., disk, network, etc.) in a single process, so that all backends of
> +the same type can be managed by a single QMP monitor.
> +
> +communication with QEMU
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Remote emulation processes will recognize a *-socket* argument that
> +specifies the path of a UNIX domain socket used to communicate with the
> +QEMU process. If no *-socket* argument is present, the process will use
> +file descriptor 0 to communicate with QEMU. For example,
> +
> +::
> +
> +    disk-proc -socket /tmp/disk0-sock <backend list>
> +
> +will communicate with QEMU using the socket path */tmp/dik0-sock*.

s/dik/disk/

> +
> +remote process QMP monitor
> +^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Remote emulation processes can be monitored via QMP, similar to QEMU
> +itself. The QMP monitor socket is specified the same as for a QEMU
> +process:
> +
> +::
> +
> +    disk-proc -qmp unix:/tmp/disk-mon,server
> +
> +can be monitored over the UNIX socket path */tmp/disk-mon*.
> +
> +QEMU command line
> +~~~~~~~~~~~~~~~~~
> +
> +The QEMU command line options will need to be modified to indicate which
> +items are emulated by a separate program, and which remain emulated by
> +QEMU itself.
> +
> +identifying remote emulation processes
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Remote emulation processes will be identified to QEMU using a *-remote*
> +command line option. This option can either specify a command that QEMU
> +will execute, or can specify a UNIX domain socket that QEMU can use to
> +connect to an existing process. Both forms require a "id" option that
> +identifies the process to later *-device* options. The process version
> +is:
> +
> +::
> +
> +    -remote id=disk-proc,command="disk-proc <backend list>"
> +
> +And the socket version is:
> +
> +::
> +
> +    -remote id=disk-proc,socket="/tmp/disk0-sock"
> +
> +In the latter case, the remote process must be given the same socket on
> +its command line when it is executed:
> +
> +::
> +
> +    disk-proc -socket /tmp/disk0-sock <backend list>
> +
> +identifying devices emulated remotely
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Devices that are to be emulated in a separate process will be identify

s/be//

> +the remote process with a "remote" option on their *-device* command
> +line specification. e.g., an LSI SCSI controller and disk can be
> +specified as:
> +
> +::
> +
> +    -device lsi53c895a,id=scsi0
> +    -device scsi-hd,drive=drive0,bus=scsi0.0,scsi-id=0
> +
> +If these devices are emulated by remote process "disk-proc," as
> +described in the previous section, the QEMU command line would be:
> +
> +::
> +
> +    -device lsi53c895a,id=scsi0,remote=disk-proc
> +    -device scsi-hd,drive=drive0,bus=scsi0.0,scsi-id=0,remote=disk-proc

The next patch documents rid=.  This seems to be the same as remote=?
Please use remote= everywhere.

> +
> +Some devices are implicitly created by the machine object. e.g., the q35
> +machine object will create its PCI bus, and attach an ich9-ahci IDE
> +controller to it. In this case, options will need to be added to the
> +*-machine* command line. e.g.,
> +
> +::
> +
> +    -machine pc-q35,ide-remote=disk-proc
> +
> +will use the remote process with an "id" of "disk-proc" to emulate the
> +IDE controller and its disks.

It might be possible to avoid introducing special-purpose *-remote=
parameters using the -set command-line option.  If you know the id of
the on-board device then you can set the remote= property on it:

  -set piix4-ide.ide0.remote=disk-proc

I haven't tried this but if it works then no code changes are required.

> +The disks themselves still need to be specified with *-remote* option,
> +as in the example above. e.g.,
> +
> +::
> +
> +    -device ide-hd,drive=drive0,bus=ide.0,unit=0,remote=disk-proc
> +
> +QEMU management of remote processes
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Each *-remote* instance on the QEMU command line will create a remote
> +process proxy instance in QEMU. They will be held on a *QList* that can
> +be searched for by its "id" property. The remote process proxy will also
> +establish a communication channel between QEMU and the remote process.
> +This can be done in one of two methods: direction execution of the
> +process by QEMU with ``fork()`` and ``exec()`` system calls, or by
> +connecting to an existing process.
> +
> +direct execution
> +^^^^^^^^^^^^^^^^
> +
> +When the remote process is directly executed, the remote process proxy
> +will setup a communication channel between itself and the emulation
> +process. This channel will be created using ``socketpair()`` and the
> +remote process side of the pair will be given to the process as file
> +descriptor 0.
> +
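
To check my reading of this section, here is a Python sketch of the
described sequence: socketpair(), fork(), and the remote side installed
as file descriptor 0 before exec(). QEMU would do this in C, so this is
only a model. spawn_with_socketpair() and the echoing child are
hypothetical stand-ins, not code from the series.

```python
import os
import socket
import sys

# Stand-in for the emulation program: it treats descriptor 0 as its
# channel to QEMU and acknowledges one message.
CHILD_CODE = (
    "import socket; "
    "s = socket.socket(fileno=0); "
    "s.sendall(b'ack:' + s.recv(64))"
)


def spawn_with_socketpair(argv):
    """Fork and exec argv with one end of a socketpair as descriptor 0."""
    parent_end, child_end = socket.socketpair(socket.AF_UNIX,
                                              socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        try:
            # Child: keep only the remote side of the pair, as fd 0.
            parent_end.close()
            os.dup2(child_end.fileno(), 0)
            os.execv(argv[0], argv)
        finally:
            # Only reached if dup2() or execv() failed.
            os._exit(127)
    # Parent: the proxy keeps its own end and drops the child's.
    child_end.close()
    return pid, parent_end
```

The child needs no socket path at all in this mode, which matches the
"no -socket argument means use descriptor 0" rule earlier in the
document.
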
> +connecting to an existing process
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Some environments wish to deny QEMU the ability to execute ``fork()``
> +and ``exec()`` In these case, emulation processes will be started before
> +QEMU, and a UNIX domain socket will be given to each emulation process
> +to communicate with QEMU over. After communication is established, the
> +socket will be unlinked from the file system space by the QEMU process.
> +
> +communication with emulation process
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +primary socket
> +''''''''''''''
> +
> +Whether the process was executed by QEMU or externally, there will be a
> +primary socket for communication between QEMU and the remote process.
> +This channel will handle configuration commands from QEMU to the
> +process, either from the QEMU command line, or from QMP commands that
> +affect the devices being emulated by the process. This channel will only
> +allow one message to be pending at a time; if additional messages
> +arrive, they must wait for previous ones to be acknowledged from the
> +remote side.
> +
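
The "one message pending at a time" rule amounts to a lock held across
send and acknowledge. A Python sketch of that discipline follows. The
4-byte length prefix and the "ack:" reply are assumptions made for this
illustration (the document does not define a wire format), and
PrimaryChannel, recv_exact() and serve_acks() are hypothetical names.

```python
import socket
import threading


def recv_exact(sock, count):
    """Read exactly count bytes or raise if the peer goes away."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("peer closed the primary socket")
        data += chunk
    return data


def send_framed(sock, payload):
    sock.sendall(len(payload).to_bytes(4, "big") + payload)


def recv_framed(sock):
    return recv_exact(sock, int.from_bytes(recv_exact(sock, 4), "big"))


class PrimaryChannel:
    """Allows a single outstanding request; later callers wait their turn."""

    def __init__(self, sock):
        self._sock = sock
        self._lock = threading.Lock()

    def request(self, payload):
        # The lock is held until the acknowledgement has been read.
        with self._lock:
            send_framed(self._sock, payload)
            return recv_framed(self._sock)


def serve_acks(sock, count):
    """Toy remote side: acknowledge count messages in arrival order."""
    for _ in range(count):
        send_framed(sock, b"ack:" + recv_framed(sock))
```
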
> +secondary sockets
> +'''''''''''''''''
> +
> +The primary socket can pass the file descriptors of secondary sockets
> +for operations that occur in parallel with commands on the primary
> +channel. These include MMIO operations generated by the guest, interrupt
> +notifications generated by the devices being emulated, or *vmstate* for
> +live migration. These secondary sockets will be created at the behest of
> +the device proxies that require them. A disk device proxy wouldn't need
> +any secondary sockets, but a disk controller device proxy may need both
> +an MMIO socket and an interrupt socket.
> +
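
Passing a secondary socket over the primary one is plain SCM_RIGHTS
descriptor passing. A Python sketch (3.9 or later, UNIX only) of what
that looks like; the "mmio" label and both helper names are made up
here, and the real message layout is whatever the series defines.

```python
import socket


def send_secondary(primary, label, secondary):
    """Send one descriptor plus a short label over the primary socket."""
    socket.send_fds(primary, [label], [secondary.fileno()])


def recv_secondary(primary):
    """Receive the label and wrap the passed descriptor in a socket."""
    label, fds, _flags, _addr = socket.recv_fds(primary, 64, 1)
    return label, socket.socket(fileno=fds[0])


# QEMU side creates the pair and hands one end to the remote process.
qemu_primary, remote_primary = socket.socketpair()
qemu_mmio, remote_mmio = socket.socketpair()
send_secondary(qemu_primary, b"mmio", remote_mmio)
remote_mmio.close()  # the descriptor in flight keeps the socket alive

label, received = recv_secondary(remote_primary)
```
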
> +emulation process attached via QMP command
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +There will be a new "attach-process" QMP command to facilitate device

The QMP command name "remote-add" would be consistent with object-add.
(There is also netdev_add and device_add but their names use underscores
for legacy reasons.)

> +hot-plug. This command's arguments will be the same as the *-remote*
> +command line when it's used to attach to a remote process. i.e., it will
> +need an "id" argument so that hot-plugged devices can later find it, and
> +a "socket" argument to identify the UNIX domain socket that will be used
> +to communicate with QEMU.
> +
> +QEMU device proxy objects
> +~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +QEMU has an object model based on sub-classes inherited from the
> +"object" super-class. The sub-classes that are of interest here are the
> +"device" and "bus" sub-classes whose child sub-classes make up the
> +device tree of a QEMU emulated system.
> +
> +The proxy object model will use device proxy objects to replace the
> +device emulation code within the QEMU process. These objects will live
> +in the same place in the object and bus hierarchies as the objects they
> +replace. i.e., the proxy object for an LSI SCSI controller will be a
> +sub-class of the "pci-device" class, and will have the same PCI bus
> +parent and the same SCSI bus child objects as the LSI controller object
> +it replaces.
> +
> +After the QEMU command line has been parsed, the remote devices will be
> +instantiated in the same manner as local devices are. (i.e.,
> +``qdev_device_add()``). In order to distinguish them from regular
> +*-device* device objects, their class name will be the name of the class
> +it replaces, with "-proxy" appended. e.g., the "lsi53c895a" proxy class
> +will be "lsi53c895a-proxy."

Did you consider defining just -device pci-device-proxy,remote=ID and
then transferring the device-specific details (e.g. PCI Configuration
Space, BARs, and interrupt configuration) over the socket during
initialization?

That way it's not necessary to write proxy devices.  There is just one
PCI proxy device that automatically reflects the information from the
device emulation process.

> +
> +device JSON description
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +The remote process needs a JSON representation of the command line
> +options used to create the object. This JSON representation is used to
> +create the corresponding object in the emulation process. e.g., for an
> +LSI SCSI controller invoked as:
> +
> +::
> +
> +     -device lsi53c895a,id=scsi0,remote=lsi-scsi
> +
> +the proxy object would create a
> +
> +::
> +
> +    { "driver" : "lsi53c895a", "id" : "scsi0" }
> +
> +JSON description. The "driver" option is assigned to the device name
> +when the command line is parsed, so the "-proxy" appended by the command
> +line parsing code is removed. The "remote" option isn't needed in the
> +JSON description since it only applies to the proxy object in the QEMU
> +process.
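
If I read this correctly, the transformation amounts to something like the
sketch below. The function name and the way options are passed in are my own
assumptions; only the "-proxy" suffix, the "remote" option and the resulting
JSON come from the text above.

```python
import json

PROXY_SUFFIX = "-proxy"


def remote_description(driver, props):
    """Build the JSON text sent to the emulation process.

    driver: class name as seen by QEMU after parsing,
            e.g. "lsi53c895a-proxy"
    props:  dict of the remaining -device options
    """
    # Undo the suffix added by the command line parsing code.
    if driver.endswith(PROXY_SUFFIX):
        driver = driver[: -len(PROXY_SUFFIX)]
    desc = {"driver": driver}
    # "remote" only selects the emulation process; it is not forwarded.
    desc.update({k: v for k, v in props.items() if k != "remote"})
    return json.dumps(desc)
```

Calling remote_description("lsi53c895a-proxy", {"id": "scsi0", "remote":
"lsi-scsi"}) yields the description shown in the example above.
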
> +
> +device object whitelist
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Some device objects may not need a proxy. These are devices with no
> +direct guest interfaces (e.g., no MMIO, PIO, or interrupts). There will
> +be a whitelist of such devices, and any devices on this list will not be
> +instantiated in QEMU. Their JSON representation will still be sent to
> +the remote process, so the object can be created there.
> +
> +object initialization
> +^^^^^^^^^^^^^^^^^^^^^
> +
> +QEMU object initialization occurs in two phases. The first
> +initialization happens once per object class (i.e., there can be many
> +SCSI disks in an emulated system, but the "scsi-hd" class has its
> +``class_init()`` function called only once). The second phase happens
> +when each object's ``instance_init()`` function is called to initialize
> +each instance of the object.
> +
> +All device objects are sub-classes of the "device" class, so they also
> +have a ``realize()`` function that is called after ``instance_init()``
> +is called and after the object's static properties have been
> +initialized. Many device objects don't even provide an instance\_init()
> +function, and do all their per-instance work in ``realize()``.
> +
> +class\_init
> +'''''''''''
> +
> +The ``class_init()`` method of a proxy object will, in general, behave
> +similarly to the object it replaces, including setting any static
> +properties and methods needed by the proxy.
> +
> +instance\_init / realize
> +''''''''''''''''''''''''
> +
> +The ``instance_init()`` and ``realize()`` functions would only need to
> +perform tasks related to being a proxy, such as registering its own
> +MMIO handlers, or creating a child bus that other proxy devices can be
> +attached to later.
> +
> +Other tasks are device-specific. For example, PCI device objects
> +will initialize the PCI config space in order to make a valid PCI device
> +tree within the QEMU process.
> +
> +address space registration
> +^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Most devices are driven by guest device driver accesses to IO addresses
> +or ports. The QEMU device emulation code uses QEMU's memory region
> +function calls (such as ``memory_region_init_io()``) to add callback
> +functions that QEMU will invoke when the guest accesses the device's
> +areas of the IO address space. When a guest driver does access the
> +device, the VM will exit HW virtualization mode and return to QEMU,
> +which will then look up and execute the corresponding callback function.
> +
> +A proxy object would need to mirror the memory region calls the actual
> +device emulator would perform in its initialization code, but with its
> +own callbacks. When invoked by QEMU as a result of a guest IO operation,
> +they will forward the operation to the device emulation process.
> +
> +PCI config space
> +^^^^^^^^^^^^^^^^
> +
> +PCI devices also have a configuration space that can be accessed by the
> +guest driver. Guest accesses to this space are not handled by the device
> +emulation object, but by its PCI parent object. Much of this space is
> +read-only, but certain registers (especially BAR and MSI-related ones)
> +need to be propagated to the emulation process.
> +
> +PCI parent proxy
> +''''''''''''''''
> +
> +One way to propagate guest PCI config accesses is to create a
> +"pci-device-proxy" class that can serve as the parent of a PCI device
> +proxy object. This class's parent would be "pci-device" and it would
> +override the PCI parent's ``config_read()`` and ``config_write()``
> +methods with ones that forward these operations to the emulation
> +program.
> +
> +interrupt receipt
> +^^^^^^^^^^^^^^^^^
> +
> +A proxy for a device that generates interrupts will need to create a
> +socket to receive interrupt indications from the emulation process. An
> +incoming interrupt indication would then be sent up to its bus parent to
> +be injected into the guest. For example, a PCI device object may use
> +``pci_set_irq()``.
> +
> +live migration
> +^^^^^^^^^^^^^^
> +
> +The proxy will register to save and restore any *vmstate* it needs over
> +a live migration event. The device proxy does not need to manage the
> +remote device's *vmstate*; that will be handled by the remote process
> +proxy (see below).
> +
> +QEMU remote device operation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Generic device operations, such as DMA, will be performs by the remote

s/performs/performed/

> +process proxy by sending messages to the remote process.
> +
> +DMA operations
> +^^^^^^^^^^^^^^
> +
> +DMA operations would be handled much like vhost applications do. One of
> +the initial messages sent to the emulation process is a guest memory
> +table. Each entry in this table consists of a file descriptor and size
> +that the emulation process can ``mmap()`` to directly access guest
> +memory, similar to ``vhost_user_set_mem_table()``. Note guest memory
> +must be backed by file descriptors, such as when QEMU is given the
> +*-mem-path* command line option.
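
For readers less familiar with the vhost-style approach, the effect of
mapping the same file descriptor in two places can be sketched as follows.
This is only an illustration under simplifying assumptions: both mappings
live in one process and the file is a temporary file, whereas in the design
above the descriptor would travel over the UNIX socket and the second
mapping would be made by the emulation process.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE


def share_region(size=PAGE):
    """Map one file-backed region twice and show that writes are shared."""
    with tempfile.TemporaryFile() as backing:
        backing.truncate(size)             # stands in for a -mem-path file
        fd = backing.fileno()
        # On Unix, mmap.mmap() defaults to a shared, read/write mapping.
        qemu_view = mmap.mmap(fd, size)    # mapping held by QEMU
        remote_view = mmap.mmap(fd, size)  # mapping held by the emulator
        try:
            remote_view[0:4] = b"DMA!"     # device model writes guest RAM
            return bytes(qemu_view[0:4])   # QEMU sees the same bytes
        finally:
            qemu_view.close()
            remote_view.close()
```

This is also why anonymous guest RAM does not work here: without a file
descriptor there is nothing for the second process to map.
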
> +
> +IOMMU operations
> +^^^^^^^^^^^^^^^^
> +
> +When the emulated system includes an IOMMU, the remote process proxy in
> +QEMU will need to create a socket for IOMMU requests from the emulation
> +process. It will handle those requests with an
> +``address_space_get_iotlb_entry()`` call. In order to handle IOMMU
> +unmaps, the remote process proxy will also register as a listener on the
> +device's DMA address space. When an IOMMU memory region is created
> +within the DMA address space, an IOMMU notifier for unmaps will be added
> +to the memory region that will forward unmaps to the emulation process
> +over the IOMMU socket.
> +
> +device hot-plug via QMP
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +A QMP "device\_add" command can add a device emulated by a remote
> +process. It needs to add a "remote" option to the command, just as the
> +*-device* command line option does. The remote process may either be one

device_add parameters are parsed by the same code as -device.  It
shouldn't be necessary to add a "remote" option to device_add.

> +started at QEMU startup, or be one added by the "add-process" QMP
> +command described above. In either case, the remote process proxy will
> +forward the new device's JSON description to the corresponding emulation
> +process.
> +
> +live migration
> +^^^^^^^^^^^^^^
> +
> +The remote process proxy will also register for live migration
> +notifications with ``vmstate_register()``. When called to save state,
> +the proxy will send the remote process a secondary socket file
> +descriptor to save the remote process's device *vmstate* over. The
> +incoming byte stream length and data will be saved as the proxy's
> +*vmstate*. When the proxy is resumed on its new host, this *vmstate*
> +will be extracted, and a secondary socket file descriptor will be sent
> +to the new remote process through which it receives the *vmstate* in
> +order to restore the devices there.
> +
> +device emulation in remote process
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The parts of QEMU that the emulation program will need include the
> +object model; the memory emulation objects; the device emulation objects
> +of the targeted device, and any dependent devices; and, the device's
> +backends. It will also need code to set up the machine environment,
> +handle requests from the QEMU process, and route machine-level requests
> +(such as interrupts or IOMMU mappings) back to the QEMU process.
> +
> +initialization
> +''''''''''''''
> +
> +The process initialization sequence will follow the same sequence
> +followed by QEMU. It will first initialize the backend objects, then
> +device emulation objects. The JSON descriptions sent by the QEMU process
> +will drive which objects need to be created.
> +
> +-  address spaces
> +
> +Before the device objects are created, the initial address spaces and
> +memory regions must be configured with ``memory_map_init()``. This
> +creates a RAM memory region object (*system\_memory*) and an IO memory
> +region object (*system\_io*).
> +
> +-  RAM
> +
> +RAM memory region creation will follow how ``pc_memory_init()`` creates
> +them, but must use ``memory_region_init_ram_from_fd()`` instead of
> +``memory_region_allocate_system_memory()``. The file descriptors needed
> +will be supplied by the guest memory table from above. Those RAM regions
> +would then be added to the *system\_memory* memory region with
> +``memory_region_add_subregion()``.
> +
> +-  PCI
> +
> +IO initialization will be driven by the JSON descriptions sent from the
> +QEMU process. For a PCI device, a PCI bus will need to be created with
> +``pci_root_bus_new()``, and a PCI memory region will need to be created
> +and added to the *system\_memory* memory region with
> +``memory_region_add_subregion_overlap()``. The overlap version is
> +required for architectures where PCI memory overlaps with RAM memory.
> +
> +MMIO handling
> +'''''''''''''
> +
> +The device emulation objects will use ``memory_region_init_io()`` to
> +install their MMIO handlers, and ``pci_register_bar()`` to associate
> +those handlers with a PCI BAR, as they do within QEMU currently.
> +
> +In order to use ``address_space_rw()`` in the emulation process to
> +handle MMIO requests from QEMU, the PCI physical addresses must be the
> +same in the QEMU process and the device emulation process. In order to
> +accomplish that, guest BAR programming must also be forwarded from QEMU
> +to the emulation process.
> +
> +interrupt injection
> +'''''''''''''''''''
> +
> +When device emulation wants to inject an interrupt into the VM, the
> +request climbs the device's bus object hierarchy until the point where a
> +bus object knows how to signal the interrupt to the guest. The details
> +depend on the type of interrupt being raised.
> +
> +-  PCI pin interrupts
> +
> +On x86 systems, there is an emulated IOAPIC object attached to the root
> +PCI bus object, and the root PCI object forwards interrupt requests to
> +it. The IOAPIC object, in turn, calls the KVM driver to inject the
> +corresponding interrupt into the VM. The simplest way to handle this in
> +an emulation process would be to set up the root PCI bus driver (via
> +``pci_bus_irqs()``) to send an interrupt request back to the QEMU
> +process, and have the device proxy object reflect it up the PCI tree
> +there.
> +
> +-  PCI MSI/X interrupts
> +
> +PCI MSI/X interrupts are implemented in HW as DMA writes to a
> +CPU-specific PCI address. In QEMU on x86, a KVM APIC object receives
> +these DMA writes, then calls into the KVM driver to inject the interrupt
> +into the VM. A simple emulation process implementation would be to send
> +the MSI DMA address from QEMU as a message at initialization, then
> +install an address space handler at that address which forwards the MSI
> +message back to QEMU.
> +
> +DMA operations
> +''''''''''''''
> +
> +When an emulation object wants to DMA into or out of guest memory, it
> +first must use dma\_memory\_map() to convert the DMA address to a local
> +virtual address. The emulation process memory region objects set up above
> +will be used to translate the DMA address to a local virtual address the
> +device emulation code can access.
> +
> +IOMMU
> +'''''
> +
> +When an IOMMU is in use in QEMU, DMA translation uses IOMMU memory
> +regions to translate the DMA address to a guest physical address before
> +that physical address can be translated to a local virtual address. The
> +emulation process will need similar functionality.
> +
> +-  IOTLB cache
> +
> +The emulation process will maintain a cache of recent IOMMU translations
> +(the IOTLB). When the translate() callback of an IOMMU memory region is
> +invoked, the IOTLB cache will be searched for an entry that will map the
> +DMA address to a guest PA. On a cache miss, a message will be sent back
> +to QEMU requesting the corresponding translation entry, which will both be
> +used to return a guest address and be added to the cache.
> +
> +-  IOTLB purge
> +
> +The IOMMU emulation will also need to act on unmap requests from QEMU.
> +These happen when the guest IOMMU driver purges an entry from the
> +guest's translation table.
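
The cache and purge behaviour described in these two items can be sketched
roughly as below. The class name, the 4 KiB page granularity and the fetch
callback (standing in for the message sent back to QEMU on a miss) are
assumptions for illustration only.

```python
PAGE_SIZE = 4096  # assumed translation granularity


class IOTLBCache:
    def __init__(self, fetch):
        # fetch(iova_page) -> guest physical page address; stands in for
        # the translation request sent back to QEMU on a cache miss.
        self._fetch = fetch
        self._entries = {}
        self.misses = 0

    def translate(self, iova):
        """Translate a DMA address to a guest physical address."""
        page = iova & ~(PAGE_SIZE - 1)
        offset = iova & (PAGE_SIZE - 1)
        if page not in self._entries:
            self.misses += 1
            self._entries[page] = self._fetch(page)
        return self._entries[page] + offset

    def unmap(self, iova, size):
        """Drop every cached page overlapping [iova, iova + size)."""
        if size <= 0:
            return
        first = iova & ~(PAGE_SIZE - 1)
        last = (iova + size - 1) & ~(PAGE_SIZE - 1)
        for page in range(first, last + PAGE_SIZE, PAGE_SIZE):
            self._entries.pop(page, None)
```

A second access to the same page is served from the cache; after an unmap
covering that page, the next access goes back to QEMU.
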
> +
> +live migration
> +''''''''''''''
> +
> +When a remote process receives a live migration indication from QEMU, it
> +will set up a channel using the received file descriptor with
> +``qio_channel_socket_new_fd()``. This channel will be used to create a
> +*QEMUfile* that can be passed to ``qemu_save_device_state()`` to send
> +the process's device state back to QEMU. This method will be reversed on
> +restore - the channel will be passed to ``qemu_loadvm_state()`` to
> +restore the device state.
> +

I have reviewed up to here... :)


* Re: [RFC v4 PATCH 49/49] multi-process: add configure and usage information
  2019-11-07 14:39     ` Daniel P. Berrangé
@ 2019-11-07 15:53       ` Jag Raman
  2019-11-08 11:14         ` Stefan Hajnoczi
  0 siblings, 1 reply; 140+ messages in thread
From: Jag Raman @ 2019-11-07 15:53 UTC (permalink / raw)
  To: Daniel P. Berrangé, Stefan Hajnoczi, mst
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost, quintela,
	konrad.wilk, qemu-devel, armbru, ross.lagerwall, mreitz,
	kanth.ghatraju, kraxel, stefanha, pbonzini, liran.alon,
	marcandre.lureau, kwolf, dgilbert, rth



On 11/7/2019 9:39 AM, Daniel P. Berrangé wrote:
> On Thu, Nov 07, 2019 at 03:02:20PM +0100, Stefan Hajnoczi wrote:
>> On Thu, Oct 24, 2019 at 05:09:30AM -0400, Jagannathan Raman wrote:
>>> From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>>>
>>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>>> ---
>>>   docs/qemu-multiprocess.txt | 86 ++++++++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 86 insertions(+)
>>>   create mode 100644 docs/qemu-multiprocess.txt
>>>
>>> diff --git a/docs/qemu-multiprocess.txt b/docs/qemu-multiprocess.txt
>>> new file mode 100644
>>> index 0000000..c29f4df
>>> --- /dev/null
>>> +++ b/docs/qemu-multiprocess.txt
>>> @@ -0,0 +1,86 @@
>>> +Multi-process QEMU
>>> +==================
>>> +
>>> +This document describes how to configure and use multi-process qemu.
>>> +For the design document refer to docs/devel/qemu-multiprocess.
>>> +
>>> +1) Configuration
>>> +----------------
>>> +
>>> +To enable support for multi-process add --enable-mpqemu
>>> +to the list of options for the "configure" script.
>>> +
>>> +
>>> +2) Usage
>>> +--------
>>> +
>>> +To start qemu with devices intended to run in a separate emulation
>>> +process without libvirtd support, the following should be used on QEMU
>>> +command line. As of now, we only support the emulation of lsi53c895a
>>> +in a separate process
>>> +
>>> +* Since parts of the RAM are shared between QEMU & remote process, a
>>> +  memory-backend-file is required to facilitate this, as follows:
>>> +
>>> +  -object memory-backend-file,id=mem,mem-path=/dev/shm/,size=4096M,share=on
>>> +
>>> +* The devices to be emulated in the separate process are defined as
>>> +  before with addition of "rid" suboption that serves as a remote group
>>> +  identifier.
>>> +
>>> +  -device <device options>,rid="remote process id"
>>> +
>>> +  For exmaple, for non multi-process qemu:
>>
>> s/exmaple/example/
>>
>>> +    -device lsi53c895a,id=scsi0 device
>>> +    -device scsi-hd,drive=drive0,bus=scsi0.0,scsi-id=0
>>> +    -drive id=drive0,file=data-disk.img
>>> +
>>> +  and for multi-process qemu and no libvirt
>>> +  support (i.e. QEMU forks child processes):
>>> +    -device lsi53c895a,id=scsi0,rid=0
>>> +    -device scsi-hd,drive=drive0,bus=scsi0.0,scsi-id=0,rid="0"
>>> +
>>> +* The command-line options for the remote process is added to the "command"
>>
>> s/is added/are added/
>>
>>> +  suboption of the newly added "-remote" option.
>>> +
>>> +   -remote [socket],rid=,command="..."
>>> +
>>> +  The drives to be emulated by the remote process are specified as part of
>>> +  this command sub-option. The device to be used to connect to the monitor
>>> +  is also specified as part of this suboption.
>>> +
>>> +  For example, the following option adds a drive and monitor to the remote
>>> +  process:
>>> +  -remote rid=0,command="-drive id=drive0,,file=data-disk.img -monitor unix:/home/qmp-sock,,server,,nowait"
>>> +
>>> +  Note: There's an issue with this "command" subtion which we are in the
>>
>> s/subtion/sub-option/
>>
>>> +  process of fixing. To work around this issue, it requires additional
>>> +  "comma" characters as illustrated above, and in the example below.
>>> +
>>> +* Example QEMU command-line to launch lsi53c895a in a remote process
>>> +
>>> +  #!/bin/sh
>>> +  qemu-system-x86_64 \
>>> +  -name "OL7.4" \
>>> +  -machine q35,accel=kvm \
>>> +  -smp sockets=1,cores=1,threads=1 \
>>> +  -cpu host \
>>> +  -m 2048 \
>>> +  -object memory-backend-file,id=mem,mem-path=/dev/shm/,size=2G,share=on \
>>> +  -numa node,memdev=mem \
>>> +  -device virtio-scsi-pci,id=virtio_scsi_pci0 \
>>> +  -drive id=drive_image1,if=none,format=raw,file=/root/ol7.qcow2 \
>>> +  -device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0 \
>>> +  -boot d \
>>> +  -monitor stdio \
>>> +  -vnc :0 \
>>> +  -device lsi53c895a,id=lsi0,remote,rid=8,command="qemu-scsi-dev" \
>>> +  -device scsi-hd,id=drive2,drive=drive_image2,bus=lsi0.0,scsi-id=0,remote,rid=8,command="qemu-scsi-dev"\
>>> +  -remote rid=8,command="-drive id=drive_image2,,file=/root/remote-process-disk.img -monitor unix:/home/qmp-sock,,server,,nowait"
>>> +
>>> +  We could connect to the monitor using the following command:
>>> +  socat /home/qmp-sock stdio
>>> +
>>> +  After hotplugging disks to the remote process, please execute the
>>> +  following command in the guest to refresh the list of storage devices:
>>> +  rescan_scsi_bus.sh -a
>>
>> This documentation suggests that QEMU spawns the remote processes.  How
>> does this work with unprivileged QEMU?  Is there an additional step where
>> QEMU drops privileges after having spawned remote processes?
> 
> This syntax is for the simple case without privilege separation.
> If differing privilege levels are needed, then whatever spawns QEMU
> should spawn the remote helper process ahead of time, and then just
> pass the UNIX socket path to the -remote arg, instead of using
> the 'command' parameter.
> 
> Regards,
> Daniel

Thank you, Stefan, Michael & Daniel, for your comments. I had a chance
to sit down with my teammates to understand the feedback you gave at the
KVM Forum. Thank you for that, as well.

We currently support two ways of launching the remote process - one is
self-launch through QEMU, as outlined in this patch series. The other
approach is using an Orchestrator like libvirt (we haven't had the
chance to submit those patches for review yet).

In the case where libvirt is involved, it would assume the
responsibility of spawning the remote process first and pass in the info
required to connect to the remote process via command-line arguments to
QEMU. This support in QEMU is available in the current series. We
haven't sent the libvirt side of patches out for review yet. It would be
easier to upstream libvirt once the QEMU side of things is firmed up.

In the case of self-launch, our understanding is that QEMU has the
privilege to fork() the remote process until the "-sandbox" argument is
processed. However, if an Orchestrator prohibits QEMU from spawning
other processes from the get-go, then the Orchestrator would assume the
responsibility of spawning the remote process as well - like Daniel just
pointed out.
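
As a rough illustration of the launch step that is common to both cases
(whoever does the spawning, QEMU or the orchestrator), the sketch below
creates a socket pair, starts a helper that inherits one end, and keeps the
other end. It is not how the series implements it: the helper here is a
stand-in Python one-liner, and names such as launch_helper() and the
"ready" greeting are hypothetical.

```python
import socket
import subprocess
import sys

# Stand-in for the remote helper; a real helper would be a separate binary
# that receives the inherited descriptor number on its command line.
HELPER_CODE = (
    "import socket, sys\n"
    "s = socket.socket(fileno=int(sys.argv[1]))\n"
    "s.sendall(b'ready')\n"
)


def launch_helper():
    """Spawn a helper that inherits one end of a socketpair."""
    parent_end, child_end = socket.socketpair()
    proc = subprocess.Popen(
        [sys.executable, "-c", HELPER_CODE, str(child_end.fileno())],
        pass_fds=[child_end.fileno()],  # keep this descriptor open in the child
    )
    child_end.close()  # the launcher keeps only its own end
    try:
        greeting = parent_end.recv(16)
    finally:
        parent_end.close()
        proc.wait()
    return greeting
```

In the orchestrator case the launcher would hand its end of the socket to
QEMU rather than use it itself, which is what lets QEMU run without the
ability to spawn processes.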

In both cases, we intend to apply the security policies required to
confine the remote process externally - probably through SELinux. We
haven't had the chance to upstream the SELinux policies yet, but we
previously sent a sample of the policies for your comments. Like Michael
pointed out earlier, the SELinux policies are per binary.

Thank you very much!
--
Jag

> 



* Re: [RFC v4 PATCH 49/49] multi-process: add configure and usage information
  2019-11-07 15:53       ` Jag Raman
@ 2019-11-08 11:14         ` Stefan Hajnoczi
  0 siblings, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-08 11:14 UTC (permalink / raw)
  To: Jag Raman
  Cc: elena.ufimtseva, fam, john.g.johnson, Stefan Hajnoczi,
	qemu-devel, kraxel, quintela, mst, armbru, kanth.ghatraju, thuth,
	ehabkost, konrad.wilk, dgilbert, liran.alon, rth, kwolf,
	Daniel P. Berrangé,
	mreitz, ross.lagerwall, marcandre.lureau, pbonzini

On Thu, Nov 07, 2019 at 10:53:27AM -0500, Jag Raman wrote:
> 
> 
> On 11/7/2019 9:39 AM, Daniel P. Berrangé wrote:
> > On Thu, Nov 07, 2019 at 03:02:20PM +0100, Stefan Hajnoczi wrote:
> > > On Thu, Oct 24, 2019 at 05:09:30AM -0400, Jagannathan Raman wrote:
> > > > From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > > > 
> > > > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > > > Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> > > > Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> > > > ---
> > > >   docs/qemu-multiprocess.txt | 86 ++++++++++++++++++++++++++++++++++++++++++++++
> > > >   1 file changed, 86 insertions(+)
> > > >   create mode 100644 docs/qemu-multiprocess.txt
> > > > 
> > > > diff --git a/docs/qemu-multiprocess.txt b/docs/qemu-multiprocess.txt
> > > > new file mode 100644
> > > > index 0000000..c29f4df
> > > > --- /dev/null
> > > > +++ b/docs/qemu-multiprocess.txt
> > > > @@ -0,0 +1,86 @@
> > > > +Multi-process QEMU
> > > > +==================
> > > > +
> > > > +This document describes how to configure and use multi-process qemu.
> > > > +For the design document refer to docs/devel/qemu-multiprocess.
> > > > +
> > > > +1) Configuration
> > > > +----------------
> > > > +
> > > > +To enable support for multi-process add --enable-mpqemu
> > > > +to the list of options for the "configure" script.
> > > > +
> > > > +
> > > > +2) Usage
> > > > +--------
> > > > +
> > > > +To start qemu with devices intended to run in a separate emulation
> > > > +process without libvirtd support, the following should be used on QEMU
> > > > +command line. As of now, we only support the emulation of lsi53c895a
> > > > +in a separate process
> > > > +
> > > > +* Since parts of the RAM are shared between QEMU & remote process, a
> > > > +  memory-backend-file is required to facilitate this, as follows:
> > > > +
> > > > +  -object memory-backend-file,id=mem,mem-path=/dev/shm/,size=4096M,share=on
> > > > +
> > > > +* The devices to be emulated in the separate process are defined as
> > > > +  before with addition of "rid" suboption that serves as a remote group
> > > > +  identifier.
> > > > +
> > > > +  -device <device options>,rid="remote process id"
> > > > +
> > > > +  For exmaple, for non multi-process qemu:
> > > 
> > > s/exmaple/example/
> > > 
> > > > +    -device lsi53c895a,id=scsi0 device
> > > > +    -device scsi-hd,drive=drive0,bus=scsi0.0,scsi-id=0
> > > > +    -drive id=drive0,file=data-disk.img
> > > > +
> > > > +  and for multi-process qemu and no libvirt
> > > > +  support (i.e. QEMU forks child processes):
> > > > +    -device lsi53c895a,id=scsi0,rid=0
> > > > +    -device scsi-hd,drive=drive0,bus=scsi0.0,scsi-id=0,rid="0"
> > > > +
> > > > +* The command-line options for the remote process is added to the "command"
> > > 
> > > s/is added/are added/
> > > 
> > > > +  suboption of the newly added "-remote" option.
> > > > +
> > > > +   -remote [socket],rid=,command="..."
> > > > +
> > > > +  The drives to be emulated by the remote process are specified as part of
> > > > +  this command sub-option. The device to be used to connect to the monitor
> > > > +  is also specified as part of this suboption.
> > > > +
> > > > +  For example, the following option adds a drive and monitor to the remote
> > > > +  process:
> > > > +  -remote rid=0,command="-drive id=drive0,,file=data-disk.img -monitor unix:/home/qmp-sock,,server,,nowait"
> > > > +
> > > > +  Note: There's an issue with this "command" subtion which we are in the
> > > 
> > > s/subtion/sub-option/
> > > 
> > > > +  process of fixing. To work around this issue, it requires additional
> > > > +  "comma" characters as illustrated above, and in the example below.
> > > > +
> > > > +* Example QEMU command-line to launch lsi53c895a in a remote process
> > > > +
> > > > +  #!/bin/sh
> > > > +  qemu-system-x86_64 \
> > > > +  -name "OL7.4" \
> > > > +  -machine q35,accel=kvm \
> > > > +  -smp sockets=1,cores=1,threads=1 \
> > > > +  -cpu host \
> > > > +  -m 2048 \
> > > > +  -object memory-backend-file,id=mem,mem-path=/dev/shm/,size=2G,share=on \
> > > > +  -numa node,memdev=mem \
> > > > +  -device virtio-scsi-pci,id=virtio_scsi_pci0 \
> > > > +  -drive id=drive_image1,if=none,format=raw,file=/root/ol7.qcow2 \
> > > > +  -device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0 \
> > > > +  -boot d \
> > > > +  -monitor stdio \
> > > > +  -vnc :0 \
> > > > +  -device lsi53c895a,id=lsi0,remote,rid=8,command="qemu-scsi-dev" \
> > > > +  -device scsi-hd,id=drive2,drive=drive_image2,bus=lsi0.0,scsi-id=0,remote,rid=8,command="qemu-scsi-dev"\
> > > > +  -remote rid=8,command="-drive id=drive_image2,,file=/root/remote-process-disk.img -monitor unix:/home/qmp-sock,,server,,nowait"
> > > > +
> > > > +  We could connect to the monitor using the following command:
> > > > +  socat /home/qmp-sock stdio
> > > > +
> > > > +  After hotplugging disks to the remote process, please execute the
> > > > +  following command in the guest to refresh the list of storage devices:
> > > > +  rescan_scsi_bus.sh -a
> > > 
> > > This documentation suggests that QEMU spawns the remote processes.  How
> > > > does this work with unprivileged QEMU?  Is there an additional step where
> > > QEMU drops privileges after having spawned remote processes?
> > 
> > This syntax is for the simple case without privilege separation.
> > If differing privilege levels are needed, then whatever spawns QEMU
> > should spawn the remote helper process ahead of time, and then just
> > pass the UNIX socket path to the -remote arg, instead of using
> > the 'command' parameter.
> > 
> > Regards,
> > Daniel
> 
> Thank you, Stefan, Michael & Daniel, for your comments. I had a chance
> to sit down with my teammates to understand the feedback you gave at the
> KVM Forum. Thank you for that, as well.
> 
> We currently support two ways of launching the remote process - one is
> self-launch through QEMU, as outlined in this patch series. The other
> approach is using an Orchestrator like libvirt (we haven't had the
> chance to submit those patches for review yet).
> 
> In the case where libvirt is involved, it would assume the
> responsibility of spawning the remote process first and pass in the info
> required to connect to the remote process via command-line arguments to
> QEMU. This support in QEMU is available in the current series. We
> haven't sent the libvirt side of patches out for review yet. It would be
> easier to upstream libvirt once the QEMU side of things is firmed up.
> 
> In the case of self-launch, our understanding is that QEMU has the
> privilege to fork() the remote process until the "-sandbox" argument is
> processed. However, if an Orchestrator prohibits QEMU from spawning
> other processes from the get-go, then the Orchestrator would assume the
> responsibility of spawning the remote process as well - like Daniel just
> pointed out.
> 
> In both cases, we intend to apply the security policies required to
> confine the remote process externally - probably through SELinux. We
> haven't had the chance to upstream the SELinux policies yet, but we
> previously sent a sample of the policies for your comments. Like Michael
> pointed out earlier, the SELinux policies are per binary.

Sounds good, please document -remote socket= as an alternative to
-remote command= so it's clear that both approaches are supported.

Stefan


* Re: [RFC v4 PATCH 49/49] multi-process: add configure and usage information
  2019-11-07 14:33     ` Michael S. Tsirkin
@ 2019-11-08 11:17       ` Stefan Hajnoczi
  2019-11-08 11:32         ` Daniel P. Berrangé
  0 siblings, 1 reply; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-08 11:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: elena.ufimtseva, fam, john.g.johnson, Stefan Hajnoczi,
	qemu-devel, kraxel, Jagannathan Raman, quintela, armbru,
	kanth.ghatraju, thuth, ehabkost, konrad.wilk, dgilbert,
	liran.alon, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

On Thu, Nov 07, 2019 at 09:33:45AM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 07, 2019 at 03:02:20PM +0100, Stefan Hajnoczi wrote:
> > This documentation suggests that QEMU spawns the remote processes.  How
> > does this work with unprivileged QEMU?  Is there an additional step where
> > QEMU drops privileges after having spawned remote processes?
> > 
> > Remote processes require accesses to resources that the main QEMU
> > process does not need access to, so I'm wondering how this process model
> > ensures that each process has only the privileges it needs.
> 
> I guess you have something like capabilities in mind?

Or namespaces (unshare(2)).

> When using something like selinux, privileges are per binary
> so the order of startup doesn't matter.

For static SELinux policies that makes sense, thanks for explaining.

Does libvirt also perform dynamic (i.e. per-instance) SELinux
configuration?  I guess that cannot be associated with a specific binary
because multiple QEMU instances launch the same binary yet need to be
differentiated.

Stefan


* Re: [RFC v4 PATCH 49/49] multi-process: add configure and usage information
  2019-11-08 11:17       ` Stefan Hajnoczi
@ 2019-11-08 11:32         ` Daniel P. Berrangé
  0 siblings, 0 replies; 140+ messages in thread
From: Daniel P. Berrangé @ 2019-11-08 11:32 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: elena.ufimtseva, fam, john.g.johnson, Stefan Hajnoczi,
	qemu-devel, kraxel, Jagannathan Raman, quintela,
	Michael S. Tsirkin, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, rth, kwolf, mreitz,
	ross.lagerwall, marcandre.lureau, pbonzini

On Fri, Nov 08, 2019 at 12:17:41PM +0100, Stefan Hajnoczi wrote:
> On Thu, Nov 07, 2019 at 09:33:45AM -0500, Michael S. Tsirkin wrote:
> > On Thu, Nov 07, 2019 at 03:02:20PM +0100, Stefan Hajnoczi wrote:
> > > This documentation suggests that QEMU spawns the remote processes.  How
> > > does this work with unprivileged QEMU?  Is there an additional step where
> > > QEMU drops privileges after having spawned remote processes?
> > > 
> > > Remote processes require accesses to resources that the main QEMU
> > > process does not need access to, so I'm wondering how this process model
> > > ensures that each process has only the privileges it needs.
> > 
> > I guess you have something like capabilities in mind?
> 
> Or namespaces (unshare(2)).
> 
> > When using something like selinux, priviledges are per binary
> > so the order of startup doesn't matter.
> 
> For static SELinux policies that make sense, thanks for explaining.
> 
> Does libvirt also perform dynamic (i.e. per-instance) SELinux
> configuration?  I guess that cannot be associated with a specific binary
> because multiple QEMU instances launch the same binary yet need to be
> differentiated.

In a traditional SELinux approach, the SELinux context used for any
process is determined by a combination of the label on the binary
and a transition rule.

e.g. if the qemu-system-x86_64 file is labelled qemu_exec_t, and
there's a context qemu_t for the QEMU process, a transition
rule is defined "virtd_t + qemu_exec_t -> qemu_t". This says
that when a process with context "virtd_t" execs a binary labelled
qemu_exec_t, the new process gets qemu_t.

With sVirt, however, we can't rely on automatic transitions, because
we need to assign a unique MCS tag for each VM. Thus libvirtd will
explicitly tell SELinux what label to apply.

In the case of multiprocess QEMU, if using sVirt from libvirt, then
we'll need to continue setting the explicit labels as we'll still
need the MCS tags for each helper process.
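
To make the per-VM labelling concrete, here is a minimal sketch of how a
management layer might build a label carrying a unique MCS pair. The helper
name format_mcs_label(), the "system_u:system_r" prefix and the c0..c1023
category range are assumptions for illustration, not libvirt's actual code.
The resulting string is what would be handed to a libselinux call such as
setexeccon() before exec'ing QEMU or one of its helpers.

```c
#include <stddef.h>
#include <stdio.h>

#define MCS_CATEGORY_MAX 1023   /* assumption: categories c0..c1023 */

/*
 * Format "system_u:system_r:<type>:s0:cA,cB" with A < B.
 * Returns 0 on success, -1 on invalid categories or a too-small buffer.
 */
static int format_mcs_label(char *buf, size_t len, const char *type,
                            unsigned c1, unsigned c2)
{
    int n;

    /* A VM needs two distinct categories so the pair is unique. */
    if (c1 == c2 || c1 > MCS_CATEGORY_MAX || c2 > MCS_CATEGORY_MAX) {
        return -1;
    }
    if (c1 > c2) {
        unsigned tmp = c1;
        c1 = c2;
        c2 = tmp;
    }

    n = snprintf(buf, len, "system_u:system_r:%s:s0:c%u,c%u", type, c1, c2);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Every helper process of one VM would get the same category pair, so the
MCS check still separates VMs from each other even when all helpers share
one binary.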

If not using libvirt and sVirt, and wanting automatic SELinux
transitions for QEMU helper processes, then each helper would
need to be a separate binary on disk so that each helper can
be given a distinct file label, which in turn lets you define
a set of transitions for each helper according to its expected
access needs.

Having said all that, I don't think it's worth worrying about
this. Anyone who cares about SELinux with QEMU will want to
be using sVirt or an equivalent approach to assign unique
MCS per VM. And thus automatic transitions are not possible
even if we had distinct binaries for each helper.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 48/49] multi-process: add the concept description to docs/devel/qemu-multiprocess
  2019-10-24  9:09 ` [RFC v4 PATCH 48/49] multi-process: add the concept description to docs/devel/qemu-multiprocess Jagannathan Raman
  2019-10-25 19:33   ` Elena Ufimtseva
  2019-11-07 15:50   ` Stefan Hajnoczi
@ 2019-11-11 15:41   ` Stefan Hajnoczi
  2 siblings, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-11 15:41 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, quintela, berrange, mst, qemu-devel, armbru,
	ross.lagerwall, kanth.ghatraju, kraxel, kwolf, pbonzini,
	liran.alon, marcandre.lureau, mreitz, dgilbert, rth

[-- Attachment #1: Type: text/plain, Size: 675 bytes --]

On Thu, Oct 24, 2019 at 05:09:29AM -0400, Jagannathan Raman wrote:
> +Accelerating device emulation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The messages that are required to be sent between QEMU and the emulation
> +process can add considerable latency to IO operations. The optimizations
> +described below attempt to ameliorate this effect by allowing the
> +emulation process to communicate directly with the kernel KVM driver.
> +The KVM file descriptors created wold be passed to the emulation process

s/wold/would/

I skipped the acceleration section for now because it requires kvm.ko
changes.  I'll focus the remainder of the review on the patches as they
are now.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 47/49] multi-process: Enable support for multiple devices in remote
  2019-10-24  9:09 ` [RFC v4 PATCH 47/49] multi-process: Enable support for multiple devices in remote Jagannathan Raman
@ 2019-11-11 16:15   ` Stefan Hajnoczi
  2019-11-13 16:21     ` Jag Raman
  0 siblings, 1 reply; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-11 16:15 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, quintela, berrange, mst, qemu-devel, armbru,
	ross.lagerwall, kanth.ghatraju, kraxel, kwolf, pbonzini,
	liran.alon, marcandre.lureau, mreitz, dgilbert, rth

[-- Attachment #1: Type: text/plain, Size: 1025 bytes --]

On Thu, Oct 24, 2019 at 05:09:28AM -0400, Jagannathan Raman wrote:
> @@ -93,7 +94,8 @@ static void process_config_write(MPQemuMsg *msg)
>      struct conf_data_msg *conf = (struct conf_data_msg *)msg->data2;
>  
>      qemu_mutex_lock_iothread();
> -    pci_default_write_config(remote_pci_dev, conf->addr, conf->val, conf->l);
> +    pci_default_write_config(remote_pci_devs[msg->id], conf->addr, conf->val,
> +                             conf->l);
>      qemu_mutex_unlock_iothread();
>  }
>  
> @@ -106,7 +108,8 @@ static void process_config_read(MPQemuMsg *msg)
>      wait = msg->fds[0];
>  
>      qemu_mutex_lock_iothread();
> -    val = pci_default_read_config(remote_pci_dev, conf->addr, conf->l);
> +    val = pci_default_read_config(remote_pci_devs[msg->id], conf->addr,
> +                                  conf->l);
>      qemu_mutex_unlock_iothread();
>  
>      notify_proxy(wait, val);

msg->id was read from a socket and hasn't been validated before indexing
into remote_pci_devs[].
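
A minimal sketch of the kind of check meant here, using stand-in types
rather than the real PCIDevice and MPQemuMsg. MAX_REMOTE_DEVS, RemoteDev
and lookup_remote_dev() are hypothetical names; the point is only that the
id is range-checked and the slot is tested before any dereference.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_REMOTE_DEVS 8       /* hypothetical size of the device table */

/* Stand-in for PCIDevice; the real type lives in QEMU. */
typedef struct RemoteDev {
    int present;
} RemoteDev;

static RemoteDev *remote_devs[MAX_REMOTE_DEVS];

/*
 * Map an untrusted id received from the socket to a device.
 * Returns NULL when the id is out of range or the slot is empty, so the
 * caller can drop the message instead of indexing past the table.
 */
static RemoteDev *lookup_remote_dev(uint64_t id)
{
    if (id >= MAX_REMOTE_DEVS) {
        return NULL;
    }
    return remote_devs[id];
}
```

process_config_write() and process_config_read() would then bail out (and
ideally report the error to the proxy) when the lookup returns NULL.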

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 45/49] multi-process/mig: Synchronize runstate of remote process
  2019-10-24  9:09 ` [RFC v4 PATCH 45/49] multi-process/mig: Synchronize runstate of remote process Jagannathan Raman
@ 2019-11-11 16:17   ` Stefan Hajnoczi
  2019-11-13 16:33     ` Jag Raman
  0 siblings, 1 reply; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-11 16:17 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, quintela, berrange, mst, qemu-devel, armbru,
	ross.lagerwall, kanth.ghatraju, kraxel, kwolf, pbonzini,
	liran.alon, marcandre.lureau, mreitz, dgilbert, rth

[-- Attachment #1: Type: text/plain, Size: 968 bytes --]

On Thu, Oct 24, 2019 at 05:09:26AM -0400, Jagannathan Raman wrote:
> @@ -656,6 +657,19 @@ static void init_proxy(PCIDevice *dev, char *command, bool need_spawn, Error **e
>      }
>  }
>  
> +static void proxy_vm_state_change(void *opaque, int running, RunState state)
> +{
> +    PCIProxyDev *dev = opaque;
> +    MPQemuMsg msg = { 0 };
> +
> +    msg.cmd = RUNSTATE_SET;
> +    msg.bytestream = 0;
> +    msg.size = sizeof(msg.data1);
> +    msg.data1.runstate.state = state;
> +
> +    mpqemu_msg_send(dev->mpqemu_link, &msg, dev->mpqemu_link->com);
> +}

Changing vm state is a barrier operation - devices must not dirty memory
afterwards.  This function doesn't have barrier semantics, it sends off
the message without waiting for the remote process to finish processing
it.  This means there is a race condition where QEMU has changed the vm
state but devices could still dirty memory.  Please wait for a reply to
prevent this.
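
As a rough illustration of the request/acknowledge pattern, here is a
self-contained sketch over a plain socket. The struct layouts, the
CMD_RUNSTATE_SET value and all function names are made up for the example
and are not the mpqemu-link wire format.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum { CMD_RUNSTATE_SET = 1 };          /* hypothetical command code */

struct req { uint32_t cmd; uint32_t state; };
struct ack { uint32_t cmd; int32_t status; };

/* Read exactly len bytes; 0 on success, -1 on error or EOF. */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR) {
            continue;
        }
        if (n <= 0) {
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Write exactly len bytes; 0 on success, -1 on error. */
static int write_full(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR) {
            continue;
        }
        if (n <= 0) {
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Remote side: apply one request, and only then acknowledge it. */
static int remote_handle_one(int fd, uint32_t *applied_state)
{
    struct req r;
    struct ack a;

    if (read_full(fd, &r, sizeof(r)) < 0) {
        return -1;
    }
    *applied_state = r.state;   /* a real device would quiesce DMA here */
    a.cmd = r.cmd;
    a.status = 0;
    return write_full(fd, &a, sizeof(a));
}

/* Proxy side: returns 0 only once the remote has acknowledged. */
static int set_runstate_sync(int fd, uint32_t state)
{
    struct req r = { CMD_RUNSTATE_SET, state };
    struct ack a;

    if (write_full(fd, &r, sizeof(r)) < 0 ||
        read_full(fd, &a, sizeof(a)) < 0) {
        return -1;
    }
    return (a.cmd == r.cmd && a.status == 0) ? 0 : -1;
}
```

Because set_runstate_sync() does not return until the acknowledgement
arrives, the caller knows the remote device has seen the new state before
the vm state change handler completes.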

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 33/49] multi-process: perform device reset in the remote process
  2019-10-24  9:09 ` [RFC v4 PATCH 33/49] multi-process: perform device reset in the remote process Jagannathan Raman
@ 2019-11-11 16:19   ` Stefan Hajnoczi
  2019-11-13 16:15     ` Jag Raman
  0 siblings, 1 reply; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-11 16:19 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, quintela, berrange, mst, qemu-devel, armbru,
	ross.lagerwall, kanth.ghatraju, kraxel, kwolf, pbonzini,
	liran.alon, marcandre.lureau, mreitz, dgilbert, rth

[-- Attachment #1: Type: text/plain, Size: 598 bytes --]

On Thu, Oct 24, 2019 at 05:09:14AM -0400, Jagannathan Raman wrote:
> +void proxy_device_reset(DeviceState *dev)
> +{
> +    PCIProxyDev *pdev = PCI_PROXY_DEV(dev);
> +    MPQemuMsg msg;
> +
> +    memset(&msg, 0, sizeof(MPQemuMsg));
> +
> +    msg.bytestream = 0;
> +    msg.size = sizeof(msg.data1);
> +    msg.cmd = DEVICE_RESET;
> +
> +    mpqemu_msg_send(pdev->mpqemu_link, &msg, pdev->mpqemu_link->com);
> +}

Device reset must wait for the remote process to finish reset, otherwise
the remote device could still be running after proxy_device_reset()
returns from sending the message.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 32/49] multi-process: Use separate MMIO communication channel
  2019-10-24  9:09 ` [RFC v4 PATCH 32/49] multi-process: Use separate MMIO communication channel Jagannathan Raman
@ 2019-11-11 16:21   ` Stefan Hajnoczi
  2019-11-13 16:14     ` Jag Raman
  0 siblings, 1 reply; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-11 16:21 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, quintela, berrange, mst, qemu-devel, armbru,
	ross.lagerwall, kanth.ghatraju, kraxel, kwolf, pbonzini,
	liran.alon, marcandre.lureau, mreitz, dgilbert, rth

[-- Attachment #1: Type: text/plain, Size: 158 bytes --]

On Thu, Oct 24, 2019 at 05:09:13AM -0400, Jagannathan Raman wrote:
> Using a separate communication channel for MMIO helps
> with improving Performance

Why?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 30/49] multi-process: send heartbeat messages to remote
  2019-10-24  9:09 ` [RFC v4 PATCH 30/49] multi-process: send heartbeat messages to remote Jagannathan Raman
@ 2019-11-11 16:27   ` Stefan Hajnoczi
  2019-11-13 16:01     ` Jag Raman
  0 siblings, 1 reply; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-11 16:27 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, quintela, berrange, mst, qemu-devel, armbru,
	ross.lagerwall, kanth.ghatraju, kraxel, kwolf, pbonzini,
	liran.alon, marcandre.lureau, mreitz, dgilbert, rth

[-- Attachment #1: Type: text/plain, Size: 1375 bytes --]

On Thu, Oct 24, 2019 at 05:09:11AM -0400, Jagannathan Raman wrote:
> +static void broadcast_msg(MPQemuMsg *msg, bool need_reply)
> +{
> +    PCIProxyDev *entry;
> +    unsigned int pid;
> +    int wait;
> +
> +    QLIST_FOREACH(entry, &proxy_dev_list.devices, next) {
> +        if (need_reply) {
> +            wait = eventfd(0, EFD_NONBLOCK);
> +            msg->num_fds = 1;
> +            msg->fds[0] = wait;
> +        }
> +
> +        mpqemu_msg_send(entry->mpqemu_link, msg, entry->mpqemu_link->com);
> +        if (need_reply) {
> +            pid = (uint32_t)wait_for_remote(wait);

Sometimes QEMU really needs to wait for the remote process before it can
make progress.  I think this is not one of those cases though.

Since QEMU is event-driven it's problematic to invoke blocking system
calls.  The remote process might not respond for a significant amount of
time.  Other QEMU threads will be held up waiting for the QEMU global
mutex in the meantime (because we hold it!).

Please implement heartbeat/ping asynchronously.  The wait eventfd should
be read by an event loop fd handler instead.  That way QEMU can continue
with running the VM while waiting for the remote process.

This will also improve guest performance because there will be less
jitter (random latency because the event loop is held up waiting for
remote processes for short periods of time).
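
Roughly, the asynchronous variant looks like the sketch below. It uses a
toy poll()-based loop so that it is self-contained; in QEMU the handler
would instead be registered with the main loop (for example through
qemu_set_fd_handler()). The Heartbeat struct and the function names are
invented for the example.

```c
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct Heartbeat {
    int wait_fd;        /* eventfd the remote process signals */
    unsigned replies;   /* heartbeat replies seen so far */
} Heartbeat;

static int heartbeat_init(Heartbeat *hb)
{
    hb->replies = 0;
    hb->wait_fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    return hb->wait_fd < 0 ? -1 : 0;
}

/*
 * fd handler: runs only when the loop saw the eventfd become readable.
 * The fd is non-blocking, so this never stalls the loop.
 */
static void heartbeat_reply_ready(Heartbeat *hb)
{
    uint64_t val;

    if (read(hb->wait_fd, &val, sizeof(val)) == (ssize_t)sizeof(val)) {
        hb->replies++;
    }
}

/*
 * One iteration of a toy event loop.
 * Returns the number of handlers run, or -1 on a poll() error.
 */
static int event_loop_iterate(Heartbeat *hb, int timeout_ms)
{
    struct pollfd pfd = { .fd = hb->wait_fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0) {
        return errno == EINTR ? 0 : -1;
    }
    if (n > 0 && (pfd.revents & POLLIN)) {
        heartbeat_reply_ready(hb);
        return 1;
    }
    return 0;
}
```

The heartbeat timer would send the ping and return immediately; a missing
reply is then detected by a later timer tick checking the reply counter,
not by blocking in the sender.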

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 07/49] multi-process: define mpqemu-link object
  2019-10-24  9:08 ` [RFC v4 PATCH 07/49] multi-process: define mpqemu-link object Jagannathan Raman
@ 2019-11-11 16:41   ` Stefan Hajnoczi
  2019-11-13 15:47     ` Jag Raman
  2019-11-13 15:53   ` Stefan Hajnoczi
  1 sibling, 1 reply; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-11 16:41 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, quintela, berrange, mst, qemu-devel, armbru,
	ross.lagerwall, kanth.ghatraju, kraxel, kwolf, pbonzini,
	liran.alon, marcandre.lureau, mreitz, dgilbert, rth

[-- Attachment #1: Type: text/plain, Size: 2956 bytes --]

On Thu, Oct 24, 2019 at 05:08:48AM -0400, Jagannathan Raman wrote:
> +int mpqemu_msg_recv(MPQemuLinkState *s, MPQemuMsg *msg, MPQemuChannel *chan)
> +{
> +    int rc;
> +    uint8_t *data;
> +    union {
> +        char control[CMSG_SPACE(REMOTE_MAX_FDS * sizeof(int))];
> +        struct cmsghdr align;
> +    } u;
> +    struct msghdr hdr;
> +    struct cmsghdr *chdr;
> +    size_t fdsize;
> +    int sock = chan->sock;
> +    QemuMutex *lock = &chan->recv_lock;
> +
> +    struct iovec iov = {
> +        .iov_base = (char *) msg,
> +        .iov_len = MPQEMU_MSG_HDR_SIZE,
> +    };
> +
> +    memset(&hdr, 0, sizeof(hdr));
> +    memset(&u, 0, sizeof(u));
> +
> +    hdr.msg_iov = &iov;
> +    hdr.msg_iovlen = 1;
> +    hdr.msg_control = &u;
> +    hdr.msg_controllen = sizeof(u);
> +
> +    qemu_mutex_lock(lock);
> +
> +    do {
> +        rc = recvmsg(sock, &hdr, 0);
> +    } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
> +
> +    if (rc < 0) {
> +        qemu_log_mask(LOG_REMOTE_DEBUG, "%s - recvmsg rc is %d, errno is %d,"
> +                      " sock %d\n", __func__, rc, errno, sock);
> +        qemu_mutex_unlock(lock);
> +        return rc;
> +    }
> +
> +    msg->num_fds = 0;
> +    for (chdr = CMSG_FIRSTHDR(&hdr); chdr != NULL;
> +         chdr = CMSG_NXTHDR(&hdr, chdr)) {
> +        if ((chdr->cmsg_level == SOL_SOCKET) &&
> +            (chdr->cmsg_type == SCM_RIGHTS)) {
> +            fdsize = chdr->cmsg_len - CMSG_LEN(0);
> +            msg->num_fds = fdsize / sizeof(int);
> +            if (msg->num_fds > REMOTE_MAX_FDS) {
> +                /*
> +                 * TODO: Security issue detected. Sender never sends more
> +                 * than REMOTE_MAX_FDS. This condition should be signaled to
> +                 * the admin
> +                 */
> +                qemu_log_mask(LOG_REMOTE_DEBUG, "%s: Max FDs exceeded\n", __func__);
> +                return -ERANGE;
> +            }
> +
> +            memcpy(msg->fds, CMSG_DATA(chdr), fdsize);
> +            break;
> +        }
> +    }
> +
> +    if (msg->size && msg->bytestream) {
> +        msg->data2 = calloc(1, msg->size);
> +        data = msg->data2;
> +    } else {
> +        data = (uint8_t *)&msg->data1;
> +    }
> +
> +    if (msg->size) {
> +        do {
> +            rc = read(sock, data, msg->size);
> +        } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
> +    }
> +
> +    qemu_mutex_unlock(lock);
> +
> +    return rc;
> +}

This code is still insecure.  Until the communication between processes
is made secure this series does not meet its goal of providing process
isolation.

1. An attacker can overflow msg->data1 easily by setting msg->size but
   not msg->bytestream.
2. An attacker can allocate data2, all mpqemu_msg_recv() callers
   need to free it to prevent memory leaks.
3. mpqemu_msg_recv() callers generally do not validate untrusted msg
   fields.  All the code needs to be audited.
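
For points 1 and 2, the header could be validated before any payload is
read, along these lines. This is a sketch on a simplified stand-in struct:
Msg, MSG_MAX_BYTESTREAM and the helper names are hypothetical and the
layout does not match MPQemuMsg.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MSG_MAX_BYTESTREAM (1u << 20)   /* hypothetical cap on data2 size */

/* Simplified stand-in for MPQemuMsg. */
typedef struct Msg {
    int bytestream;
    size_t size;
    union {
        uint64_t u64;
        uint8_t raw[64];
    } data1;
    uint8_t *data2;
} Msg;

/*
 * Choose the buffer the payload may be read into, based on an untrusted
 * header. Returns 0 with *buf set (NULL for a header-only message), or
 * -1 when the header must be rejected.
 */
static int msg_prepare_payload(Msg *msg, uint8_t **buf)
{
    *buf = NULL;
    msg->data2 = NULL;

    if (msg->size == 0) {
        return 0;                       /* nothing to read */
    }
    if (msg->bytestream) {
        if (msg->size > MSG_MAX_BYTESTREAM) {
            return -1;                  /* refuse unbounded allocations */
        }
        msg->data2 = calloc(1, msg->size);
        if (!msg->data2) {
            return -1;
        }
        *buf = msg->data2;
        return 0;
    }
    if (msg->size > sizeof(msg->data1)) {
        return -1;                      /* would overflow data1 */
    }
    *buf = (uint8_t *)&msg->data1;
    return 0;
}

/* Every receiver calls this once it is done with the message. */
static void msg_cleanup(Msg *msg)
{
    free(msg->data2);
    msg->data2 = NULL;
}
```

Pairing every successful receive with msg_cleanup() keeps the data2
ownership rule in one place instead of relying on each caller.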

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 02/49] multi-process: util: Add qemu_thread_cancel() to cancel running thread
  2019-10-24  9:08 ` [RFC v4 PATCH 02/49] multi-process: util: Add qemu_thread_cancel() to cancel running thread Jagannathan Raman
@ 2019-11-13 15:30   ` Stefan Hajnoczi
  2019-11-13 15:38     ` Jag Raman
  0 siblings, 1 reply; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-13 15:30 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, quintela, berrange, mst, qemu-devel, armbru,
	ross.lagerwall, kanth.ghatraju, kraxel, kwolf, pbonzini,
	liran.alon, marcandre.lureau, mreitz, dgilbert, rth

[-- Attachment #1: Type: text/plain, Size: 662 bytes --]

On Thu, Oct 24, 2019 at 05:08:43AM -0400, Jagannathan Raman wrote:
> qemu_thread_cancel() added to destroy a given running thread.
> This will be needed in the following patches.
> 
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> ---
>  include/qemu/thread.h    |  1 +
>  util/qemu-thread-posix.c | 10 ++++++++++
>  2 files changed, 11 insertions(+)

Is this still needed?  I thought previous discussion concluded that
thread cancellation is hard to get right and it's not actually used by
this series?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 03/49] multi-process: add a command line option for debug file
  2019-10-24  9:08 ` [RFC v4 PATCH 03/49] multi-process: add a command line option for debug file Jagannathan Raman
@ 2019-11-13 15:35   ` Stefan Hajnoczi
  0 siblings, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-13 15:35 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, quintela, berrange, mst, qemu-devel, armbru,
	ross.lagerwall, kanth.ghatraju, kraxel, kwolf, pbonzini,
	liran.alon, marcandre.lureau, mreitz, dgilbert, rth

[-- Attachment #1: Type: text/plain, Size: 536 bytes --]

On Thu, Oct 24, 2019 at 05:08:44AM -0400, Jagannathan Raman wrote:
> From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> 
> Can be used with -d rdebug command options when starting qemu.
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> ---
>  include/qemu/log.h | 1 +
>  util/log.c         | 2 ++
>  2 files changed, 3 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 02/49] multi-process: util: Add qemu_thread_cancel() to cancel running thread
  2019-11-13 15:30   ` Stefan Hajnoczi
@ 2019-11-13 15:38     ` Jag Raman
  2019-11-13 15:51       ` Daniel P. Berrangé
  0 siblings, 1 reply; 140+ messages in thread
From: Jag Raman @ 2019-11-13 15:38 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, liran.alon, rth, quintela, qemu-devel, armbru,
	ross.lagerwall, mst, kraxel, kwolf, pbonzini, berrange, mreitz,
	kanth.ghatraju, dgilbert, marcandre.lureau



On 11/13/2019 10:30 AM, Stefan Hajnoczi wrote:
> On Thu, Oct 24, 2019 at 05:08:43AM -0400, Jagannathan Raman wrote:
>> qemu_thread_cancel() added to destroy a given running thread.
>> This will be needed in the following patches.
>>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> ---
>>   include/qemu/thread.h    |  1 +
>>   util/qemu-thread-posix.c | 10 ++++++++++
>>   2 files changed, 11 insertions(+)
> 
> Is this still needed?  I thought previous discussion concluded that
> thread cancellation is hard to get right and it's not actually used by
> this series?

Hi Stefan,

This is used in PATCH 41/49.

Thank you very much!
--
Jag

> 
> Stefan
> 


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 07/49] multi-process: define mpqemu-link object
  2019-11-11 16:41   ` Stefan Hajnoczi
@ 2019-11-13 15:47     ` Jag Raman
  0 siblings, 0 replies; 140+ messages in thread
From: Jag Raman @ 2019-11-13 15:47 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, quintela, berrange, mst, qemu-devel, armbru,
	ross.lagerwall, kanth.ghatraju, kraxel, kwolf, pbonzini,
	liran.alon, marcandre.lureau, mreitz, dgilbert, rth



On 11/11/2019 11:41 AM, Stefan Hajnoczi wrote:
> On Thu, Oct 24, 2019 at 05:08:48AM -0400, Jagannathan Raman wrote:
>> +int mpqemu_msg_recv(MPQemuLinkState *s, MPQemuMsg *msg, MPQemuChannel *chan)
>> +{
>> +    int rc;
>> +    uint8_t *data;
>> +    union {
>> +        char control[CMSG_SPACE(REMOTE_MAX_FDS * sizeof(int))];
>> +        struct cmsghdr align;
>> +    } u;
>> +    struct msghdr hdr;
>> +    struct cmsghdr *chdr;
>> +    size_t fdsize;
>> +    int sock = chan->sock;
>> +    QemuMutex *lock = &chan->recv_lock;
>> +
>> +    struct iovec iov = {
>> +        .iov_base = (char *) msg,
>> +        .iov_len = MPQEMU_MSG_HDR_SIZE,
>> +    };
>> +
>> +    memset(&hdr, 0, sizeof(hdr));
>> +    memset(&u, 0, sizeof(u));
>> +
>> +    hdr.msg_iov = &iov;
>> +    hdr.msg_iovlen = 1;
>> +    hdr.msg_control = &u;
>> +    hdr.msg_controllen = sizeof(u);
>> +
>> +    qemu_mutex_lock(lock);
>> +
>> +    do {
>> +        rc = recvmsg(sock, &hdr, 0);
>> +    } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
>> +
>> +    if (rc < 0) {
>> +        qemu_log_mask(LOG_REMOTE_DEBUG, "%s - recvmsg rc is %d, errno is %d,"
>> +                      " sock %d\n", __func__, rc, errno, sock);
>> +        qemu_mutex_unlock(lock);
>> +        return rc;
>> +    }
>> +
>> +    msg->num_fds = 0;
>> +    for (chdr = CMSG_FIRSTHDR(&hdr); chdr != NULL;
>> +         chdr = CMSG_NXTHDR(&hdr, chdr)) {
>> +        if ((chdr->cmsg_level == SOL_SOCKET) &&
>> +            (chdr->cmsg_type == SCM_RIGHTS)) {
>> +            fdsize = chdr->cmsg_len - CMSG_LEN(0);
>> +            msg->num_fds = fdsize / sizeof(int);
>> +            if (msg->num_fds > REMOTE_MAX_FDS) {
>> +                /*
>> +                 * TODO: Security issue detected. Sender never sends more
>> +                 * than REMOTE_MAX_FDS. This condition should be signaled to
>> +                 * the admin
>> +                 */
>> +                qemu_log_mask(LOG_REMOTE_DEBUG, "%s: Max FDs exceeded\n", __func__);
>> +                return -ERANGE;
>> +            }
>> +
>> +            memcpy(msg->fds, CMSG_DATA(chdr), fdsize);
>> +            break;
>> +        }
>> +    }
>> +
>> +    if (msg->size && msg->bytestream) {
>> +        msg->data2 = calloc(1, msg->size);
>> +        data = msg->data2;
>> +    } else {
>> +        data = (uint8_t *)&msg->data1;
>> +    }
>> +
>> +    if (msg->size) {
>> +        do {
>> +            rc = read(sock, data, msg->size);
>> +        } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
>> +    }
>> +
>> +    qemu_mutex_unlock(lock);
>> +
>> +    return rc;
>> +}
> 
> This code is still insecure.  Until the communication between processes
> is made secure this series does not meet its goal of providing process
> isolation.
> 
> 1. An attacker can overflow msg->data1 easily by setting msg->size but
>     not msg->bytestream.

We will add a check to ensure that msg->size does not exceed
sizeof(msg->data1) if msg->bytestream is not set.

> 2. An attacker can allocate data2, all mpqemu_msg_recv() callers
>     need to free it to prevent memory leaks.

We will address this memory leak.

> 3. mpqemu_msg_recv() callers generally do not validate untrusted msg
>     fields.  All the code needs to be audited.

mpqemu_msg_recv() callers validate the num_fds field. But we will add
more fields for validation by the callers.

Thanks!
--
Jag

> 
> Stefan
> 


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 41/49] multi-process/mig: Enable VMSD save in the Proxy object
  2019-10-24  9:09 ` [RFC v4 PATCH 41/49] multi-process/mig: Enable VMSD save in the Proxy object Jagannathan Raman
@ 2019-11-13 15:50   ` Daniel P. Berrangé
  2019-11-13 16:32     ` Jag Raman
  0 siblings, 1 reply; 140+ messages in thread
From: Daniel P. Berrangé @ 2019-11-13 15:50 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, quintela, mst, qemu-devel, armbru, ross.lagerwall,
	mreitz, kanth.ghatraju, kraxel, stefanha, pbonzini, liran.alon,
	marcandre.lureau, kwolf, dgilbert, rth

On Thu, Oct 24, 2019 at 05:09:22AM -0400, Jagannathan Raman wrote:
> Collect the VMSD from remote process on the source and save
> it to the channel leading to the destination
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  New patch in v4
> 
>  hw/proxy/qemu-proxy.c         | 132 ++++++++++++++++++++++++++++++++++++++++++
>  include/hw/proxy/qemu-proxy.h |   2 +
>  include/io/mpqemu-link.h      |   1 +
>  3 files changed, 135 insertions(+)
> 
> diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
> index 623a6c5..ce72e6a 100644
> --- a/hw/proxy/qemu-proxy.c
> +++ b/hw/proxy/qemu-proxy.c
> @@ -52,6 +52,14 @@
>  #include "util/event_notifier-posix.c"
>  #include "hw/boards.h"
>  #include "include/qemu/log.h"
> +#include "io/channel.h"
> +#include "migration/qemu-file-types.h"
> +#include "qapi/error.h"
> +#include "io/channel-util.h"
> +#include "migration/qemu-file-channel.h"
> +#include "migration/qemu-file.h"
> +#include "migration/migration.h"
> +#include "migration/vmstate.h"
>  
>  QEMUTimer *hb_timer;
>  static void pci_proxy_dev_realize(PCIDevice *dev, Error **errp);
> @@ -62,6 +70,9 @@ static void stop_heartbeat_timer(void);
>  static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx);
>  static void broadcast_msg(MPQemuMsg *msg, bool need_reply);
>  
> +#define PAGE_SIZE getpagesize()
> +uint8_t *mig_data;
> +
>  static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx)
>  {
>      /* TODO: Add proper handler. */
> @@ -357,14 +368,135 @@ static void pci_proxy_dev_inst_init(Object *obj)
>      dev->mem_init = false;
>  }
>  
> +typedef struct {
> +    QEMUFile *rem;
> +    PCIProxyDev *dev;
> +} proxy_mig_data;
> +
> +static void *proxy_mig_out(void *opaque)
> +{
> +    proxy_mig_data *data = opaque;
> +    PCIProxyDev *dev = data->dev;
> +    uint8_t byte;
> +    uint64_t data_size = PAGE_SIZE;
> +
> +    mig_data = g_malloc(data_size);
> +
> +    while (true) {
> +        byte = qemu_get_byte(data->rem);

There is a pretty large set of APIs hiding behind the qemu_get_byte
call, which does not give me confidence that...

> +        mig_data[dev->migsize++] = byte;
> +        if (dev->migsize == data_size) {
> +            data_size += PAGE_SIZE;
> +            mig_data = g_realloc(mig_data, data_size);
> +        }
> +    }
> +
> +    return NULL;
> +}
> +
> +static int proxy_pre_save(void *opaque)
> +{
> +    PCIProxyDev *pdev = opaque;
> +    proxy_mig_data *mig_data;
> +    QEMUFile *f_remote;
> +    MPQemuMsg msg = {0};
> +    QemuThread thread;
> +    Error *err = NULL;
> +    QIOChannel *ioc;
> +    uint64_t size;
> +    int fd[2];
> +
> +    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) {
> +        return -1;
> +    }
> +
> +    ioc = qio_channel_new_fd(fd[0], &err);
> +    if (err) {
> +        error_report_err(err);
> +        return -1;
> +    }
> +
> +    qio_channel_set_name(QIO_CHANNEL(ioc), "PCIProxyDevice-mig");
> +
> +    f_remote = qemu_fopen_channel_input(ioc);
> +
> +    pdev->migsize = 0;
> +
> +    mig_data = g_malloc0(sizeof(proxy_mig_data));
> +    mig_data->rem = f_remote;
> +    mig_data->dev = pdev;
> +
> +    qemu_thread_create(&thread, "Proxy MIG_OUT", proxy_mig_out, mig_data,
> +                       QEMU_THREAD_DETACHED);
> +
> +    msg.cmd = START_MIG_OUT;
> +    msg.bytestream = 0;
> +    msg.num_fds = 2;
> +    msg.fds[0] = fd[1];
> +    msg.fds[1] = GET_REMOTE_WAIT;
> +
> +    mpqemu_msg_send(pdev->mpqemu_link, &msg, pdev->mpqemu_link->com);
> +    size = wait_for_remote(msg.fds[1]);
> +    PUT_REMOTE_WAIT(msg.fds[1]);
> +
> +    assert(size != ULLONG_MAX);
> +
> +    /*
> +     * migsize is being update by a separate thread. Using volatile to
> +     * instruct the compiler to fetch the value of this variable from
> +     * memory during every read
> +     */
> +    while (*((volatile uint64_t *)&pdev->migsize) < size) {
> +    }
> +
> +    qemu_thread_cancel(&thread);

....this is a safe way to stop the thread executing without
resulting in memory being leaked.

In addition thread cancellation is asynchronous, so the thread
may still be using the QEMUFile object while....

> +    qemu_fclose(f_remote);

..this is closing it. This feels like it is a crash danger.
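
One common way to avoid cancellation altogether, shown here only as a
sketch and not as something proposed in this thread, is to let the reader
thread exit on its own at end-of-file and to join it before releasing
anything it uses. Collector, collector_thread() and collect_all() are
hypothetical names, and a pipe stands in for the migration channel.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct Collector {
    int fd;             /* read end, owned by the caller */
    uint8_t *data;      /* bytes collected so far */
    size_t len;
    size_t cap;
    int failed;
} Collector;

/* Reads until EOF, growing the buffer as needed, then returns. */
static void *collector_thread(void *opaque)
{
    Collector *c = opaque;

    for (;;) {
        ssize_t n;

        if (c->len == c->cap) {
            size_t ncap = c->cap ? c->cap * 2 : 4096;
            uint8_t *p = realloc(c->data, ncap);
            if (!p) {
                c->failed = 1;
                break;
            }
            c->data = p;
            c->cap = ncap;
        }
        n = read(c->fd, c->data + c->len, c->cap - c->len);
        if (n < 0 && errno == EINTR) {
            continue;
        }
        if (n < 0) {
            c->failed = 1;
            break;
        }
        if (n == 0) {
            break;              /* writer closed its end: we are done */
        }
        c->len += (size_t)n;
    }
    return NULL;
}

/*
 * Push payload through a pipe to the collector and return the number of
 * bytes it gathered (buffer in *out, freed by the caller), or -1.
 */
static long collect_all(const void *payload, size_t len, uint8_t **out)
{
    Collector c = { 0 };
    const uint8_t *p = payload;
    pthread_t tid;
    int fds[2];

    if (pipe(fds) < 0) {
        return -1;
    }
    c.fd = fds[0];
    if (pthread_create(&tid, NULL, collector_thread, &c) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    while (len > 0) {
        ssize_t n = write(fds[1], p, len);
        if (n < 0 && errno == EINTR) {
            continue;
        }
        if (n <= 0) {
            break;
        }
        p += n;
        len -= (size_t)n;
    }
    close(fds[1]);              /* EOF tells the collector to finish */
    pthread_join(tid, NULL);    /* thread has exited; nothing is in use */
    close(fds[0]);              /* only now release what it was reading */

    if (c.failed || len > 0) {
        free(c.data);
        return -1;
    }
    *out = c.data;
    return (long)c.len;
}
```

The ordering at the end is the point: signal completion, join, then close.
The busy-wait on migsize and the cancel call both disappear, and the
buffer is owned by exactly one thread at any time.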


> +    close(fd[1]);
> +
> +    return 0;
> +}

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 02/49] multi-process: util: Add qemu_thread_cancel() to cancel running thread
  2019-11-13 15:38     ` Jag Raman
@ 2019-11-13 15:51       ` Daniel P. Berrangé
  2019-11-13 16:04         ` Jag Raman
  0 siblings, 1 reply; 140+ messages in thread
From: Daniel P. Berrangé @ 2019-11-13 15:51 UTC (permalink / raw)
  To: Jag Raman
  Cc: elena.ufimtseva, fam, thuth, ross.lagerwall, ehabkost,
	john.g.johnson, mst, konrad.wilk, qemu-devel, armbru, quintela,
	liran.alon, kraxel, Stefan Hajnoczi, pbonzini, kwolf,
	marcandre.lureau, mreitz, kanth.ghatraju, dgilbert, rth

On Wed, Nov 13, 2019 at 10:38:06AM -0500, Jag Raman wrote:
> 
> 
> On 11/13/2019 10:30 AM, Stefan Hajnoczi wrote:
> > On Thu, Oct 24, 2019 at 05:08:43AM -0400, Jagannathan Raman wrote:
> > > qemu_thread_cancel() added to destroy a given running thread.
> > > This will be needed in the following patches.
> > > 
> > > Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> > > Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> > > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > > ---
> > >   include/qemu/thread.h    |  1 +
> > >   util/qemu-thread-posix.c | 10 ++++++++++
> > >   2 files changed, 11 insertions(+)
> > 
> > Is this still needed?  I thought previous discussion concluded that
> > thread cancellation is hard to get right and it's not actually used by
> > this series?
> 
> Hi Stefan,
> 
> This is used in PATCH 41/49.

I don't believe the cancellation usage in that patch is safe :-)

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 07/49] multi-process: define mpqemu-link object
  2019-10-24  9:08 ` [RFC v4 PATCH 07/49] multi-process: define mpqemu-link object Jagannathan Raman
  2019-11-11 16:41   ` Stefan Hajnoczi
@ 2019-11-13 15:53   ` Stefan Hajnoczi
  2019-11-18 15:26     ` Jag Raman
  1 sibling, 1 reply; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-13 15:53 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, quintela, berrange, mst, qemu-devel, armbru,
	ross.lagerwall, kanth.ghatraju, kraxel, kwolf, pbonzini,
	liran.alon, marcandre.lureau, mreitz, dgilbert, rth

On Thu, Oct 24, 2019 at 05:08:48AM -0400, Jagannathan Raman wrote:
> +#ifndef MPQEMU_LINK_H
> +#define MPQEMU_LINK_H
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include <stddef.h>
> +#include <stdint.h>

These are already included by "qemu/osdep.h".

> +#include <pthread.h>

Is <pthread.h> needed?

> +
> +#include "qom/object.h"
> +#include "qemu/thread.h"
> +
> +#define TYPE_MPQEMU_LINK "mpqemu-link"
> +#define MPQEMU_LINK(obj) \
> +    OBJECT_CHECK(MPQemuLinkState, (obj), TYPE_MPQEMU_LINK)
> +
> +#define REMOTE_MAX_FDS 8
> +
> +#define MPQEMU_MSG_HDR_SIZE offsetof(MPQemuMsg, data1.u64)
> +
> +/**
> + * mpqemu_cmd_t:
> + * CONF_READ        PCI config. space read
> + * CONF_WRITE       PCI config. space write
> + *
> + * proc_cmd_t enum type to specify the command to be executed on the remote
> + * device.
> + */
> +typedef enum {
> +    INIT = 0,
> +    CONF_READ,
> +    CONF_WRITE,
> +    MAX,
> +} mpqemu_cmd_t;

Please allow for future non-PCI devices by clearly naming PCI-specific
commands and including a bus type in the initialization messages.
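
For illustration only, a minimal sketch of what a bus-aware command set
could look like. Every name below (MPQEMU_BUS_PCI, the MPQEMU_CMD_*
constants, mpqemu_init_msg_t, mpqemu_cmd_is_pci()) is hypothetical and
not taken from the posted patch; it only shows one possible naming
scheme in which PCI-specific commands are clearly marked and the INIT
message carries a bus type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bus identifiers carried in the INIT message. */
typedef enum {
    MPQEMU_BUS_PCI = 0,
    MPQEMU_BUS_MAX,
} mpqemu_bus_t;

/* Hypothetical command set: bus-neutral first, PCI-specific prefixed. */
typedef enum {
    MPQEMU_CMD_INIT = 0,
    MPQEMU_CMD_PCI_CONF_READ,
    MPQEMU_CMD_PCI_CONF_WRITE,
    MPQEMU_CMD_MAX,
} mpqemu_cmd_t;

/* Hypothetical INIT payload: tells the remote which bus to set up. */
typedef struct {
    uint32_t bus_type;   /* one of mpqemu_bus_t */
} mpqemu_init_msg_t;

/* True for commands that only make sense for a PCI device. */
static bool mpqemu_cmd_is_pci(mpqemu_cmd_t cmd)
{
    return cmd == MPQEMU_CMD_PCI_CONF_READ ||
           cmd == MPQEMU_CMD_PCI_CONF_WRITE;
}
```

With a layout like this, a future non-PCI bus would add its own
prefixed commands and a new mpqemu_bus_t value without renaming the
existing ones.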

> diff --git a/io/mpqemu-link.c b/io/mpqemu-link.c
> new file mode 100644
> index 0000000..b39f4d0
> --- /dev/null
> +++ b/io/mpqemu-link.c
> @@ -0,0 +1,309 @@
> +/*
> + * Communication channel between QEMU and remote device process
> + *
> + * Copyright 2019, Oracle and/or its affiliates.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include <assert.h>
> +#include <errno.h>
> +#include <pthread.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +#include <sys/un.h>
> +#include <unistd.h>
> +#include <limits.h>
> +#include <poll.h>

Many of these are already included by "qemu/osdep.h".  Some of them
shouldn't be used directly because QEMU or glib have abstractions that
hide the platform-specific differences (e.g. pthread, poll).

> +MPQemuLinkState *mpqemu_link_create(void)
> +{
> +    return MPQEMU_LINK(object_new(TYPE_MPQEMU_LINK));
> +}

I'm not sure what the purpose of this object is.  mpqemu_link_create()
suggests the objects will be created internally instead of via -object
mpqemu-link,..., which is unusual.

mpqemu_msg_send() and mpqemu_msg_recv() seem to be the main functions
but they do not even use their MPQemuLinkState *s argument.

> +void mpqemu_start_coms(MPQemuLinkState *s)
> +{
> +
> +    g_assert(g_source_attach(&s->com->gsrc, s->ctx));
> +
> +    g_main_loop_run(s->loop);
> +}

There is already IOThread if you need an event loop thread.  But does
this need to be its own thread?  The communication should be
asynchronous and therefore it can run in the main event loop or any
existing IOThread.


* Re: [RFC v4 PATCH 30/49] multi-process: send heartbeat messages to remote
  2019-11-11 16:27   ` Stefan Hajnoczi
@ 2019-11-13 16:01     ` Jag Raman
  2019-11-21 12:19       ` Stefan Hajnoczi
  0 siblings, 1 reply; 140+ messages in thread
From: Jag Raman @ 2019-11-13 16:01 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, liran.alon, rth, quintela, qemu-devel, armbru,
	ross.lagerwall, mst, kraxel, kwolf, pbonzini, berrange, mreitz,
	kanth.ghatraju, dgilbert, marcandre.lureau



On 11/11/2019 11:27 AM, Stefan Hajnoczi wrote:
> On Thu, Oct 24, 2019 at 05:09:11AM -0400, Jagannathan Raman wrote:
>> +static void broadcast_msg(MPQemuMsg *msg, bool need_reply)
>> +{
>> +    PCIProxyDev *entry;
>> +    unsigned int pid;
>> +    int wait;
>> +
>> +    QLIST_FOREACH(entry, &proxy_dev_list.devices, next) {
>> +        if (need_reply) {
>> +            wait = eventfd(0, EFD_NONBLOCK);
>> +            msg->num_fds = 1;
>> +            msg->fds[0] = wait;
>> +        }
>> +
>> +        mpqemu_msg_send(entry->mpqemu_link, msg, entry->mpqemu_link->com);
>> +        if (need_reply) {
>> +            pid = (uint32_t)wait_for_remote(wait);
> 
> Sometimes QEMU really needs to wait for the remote process before it can
> make progress.  I think this is not one of those cases though.
> 
> Since QEMU is event-driven it's problematic to invoke blocking system
> calls.  The remote process might not respond for a significant amount of
> time.  Other QEMU threads will be held up waiting for the QEMU global
> mutex in the meantime (because we hold it!).

There are places where we wait synchronously for the remote process.
However, these synchronous waits carry a timeout to prevent the hang
situation you described above.

We will add error recovery in the future: we will respawn the remote
process if QEMU times out waiting for it.

> 
> Please implement heartbeat/ping asynchronously.  The wait eventfd should
> be read by an event loop fd handler instead.  That way QEMU can continue
> with running the VM while waiting for the remote process.

In the current implementation, the heartbeat/ping is asynchronous.
start_heartbeat_timer() sets up a timer to perform the ping.

Thanks!
--
Jag

> 
> This will also improve guest performance because there will be less
> jitter (random latency because the event loop is held up waiting for
> remote processes for short periods of time).
> 



* Re: [RFC v4 PATCH 02/49] multi-process: util: Add qemu_thread_cancel() to cancel running thread
  2019-11-13 15:51       ` Daniel P. Berrangé
@ 2019-11-13 16:04         ` Jag Raman
  2019-11-13 16:35           ` Daniel P. Berrangé
  0 siblings, 1 reply; 140+ messages in thread
From: Jag Raman @ 2019-11-13 16:04 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: elena.ufimtseva, fam, thuth, ross.lagerwall, ehabkost,
	john.g.johnson, mst, konrad.wilk, qemu-devel, armbru, quintela,
	liran.alon, kraxel, Stefan Hajnoczi, pbonzini, kwolf,
	marcandre.lureau, mreitz, kanth.ghatraju, dgilbert, rth



On 11/13/2019 10:51 AM, Daniel P. Berrangé wrote:
> On Wed, Nov 13, 2019 at 10:38:06AM -0500, Jag Raman wrote:
>>
>>
>> On 11/13/2019 10:30 AM, Stefan Hajnoczi wrote:
>>> On Thu, Oct 24, 2019 at 05:08:43AM -0400, Jagannathan Raman wrote:
>>>> qemu_thread_cancel() added to destroy a given running thread.
>>>> This will be needed in the following patches.
>>>>
>>>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>>>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>>>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>>>> ---
>>>>    include/qemu/thread.h    |  1 +
>>>>    util/qemu-thread-posix.c | 10 ++++++++++
>>>>    2 files changed, 11 insertions(+)
>>>
>>> Is this still needed?  I thought previous discussion concluded that
>>> thread cancellation is hard to get right and it's not actually used by
>>> this series?
>>
>> Hi Stefan,
>>
>> This is used in PATCH 41/49.
> 
> I don't believe the cancellation usage in that patch is safe :-)

Thanks for the feedback, we will address that.

May I please ask why it is not safe? Any clarification will help us to
find a better alternative.

Thank you very much!
--
Jag

> 
> Regards,
> Daniel
> 



* Re: [RFC v4 PATCH 09/49] multi-process: setup PCI host bridge for remote device
  2019-10-24  9:08 ` [RFC v4 PATCH 09/49] multi-process: setup PCI host bridge for remote device Jagannathan Raman
@ 2019-11-13 16:07   ` Stefan Hajnoczi
  2019-11-18 15:25     ` Jag Raman
  0 siblings, 1 reply; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-13 16:07 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, quintela, berrange, mst, qemu-devel, armbru,
	ross.lagerwall, kanth.ghatraju, kraxel, kwolf, pbonzini,
	liran.alon, marcandre.lureau, mreitz, dgilbert, rth

On Thu, Oct 24, 2019 at 05:08:50AM -0400, Jagannathan Raman wrote:
> +static void remote_host_realize(DeviceState *dev, Error **errp)
> +{
> +    PCIHostState *pci = PCI_HOST_BRIDGE(dev);
> +    RemPCIHost *s = REMOTE_HOST_DEVICE(dev);
> +
> +    /*
> +     * TODO: the name of the bus would be provided by QEMU. Use
> +     * "pcie.0" for now.
> +     */
> +    pci->bus = pci_root_bus_new(DEVICE(s), "pcie.0",
> +                                s->mr_pci_mem, s->mr_sys_io,
> +                                0, TYPE_PCIE_BUS);

The PCI bus name could be a property and then whatever instantiates
RemPCIHost could set it.

Machine types usually hardcode the name because they assume there is
only one machine instance.  In the case of mpqemu this is an okay
starting point, but maybe multiple busses will become necessary if the
device emulation process handles multiple device instances - especially
if they are served to multiple guests like in a software-defined network
switch use case.


* Re: [RFC v4 PATCH 32/49] multi-process: Use separate MMIO communication channel
  2019-11-11 16:21   ` Stefan Hajnoczi
@ 2019-11-13 16:14     ` Jag Raman
  2019-11-21 12:31       ` Stefan Hajnoczi
  0 siblings, 1 reply; 140+ messages in thread
From: Jag Raman @ 2019-11-13 16:14 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, liran.alon, rth, quintela, qemu-devel, armbru,
	ross.lagerwall, mst, kraxel, kwolf, pbonzini, berrange, mreitz,
	kanth.ghatraju, dgilbert, marcandre.lureau



On 11/11/2019 11:21 AM, Stefan Hajnoczi wrote:
> On Thu, Oct 24, 2019 at 05:09:13AM -0400, Jagannathan Raman wrote:
>> Using a separate communication channel for MMIO helps
>> with improving Performance
> 
> Why?

Initiating an IO operation typically involves multiple MMIO accesses.
In some legacy devices like LSI, completion of IO operations is also
detected by polling MMIO registers. Therefore, MMIO traffic can be heavy
in some cases and can noticeably affect performance.

Having a dedicated channel for MMIO ensures that it doesn't have to
compete with other messages to the remote process, especially when there
are multiple devices emulated by a single remote process.

Thanks!
--
Jag

> 



* Re: [RFC v4 PATCH 33/49] multi-process: perform device reset in the remote process
  2019-11-11 16:19   ` Stefan Hajnoczi
@ 2019-11-13 16:15     ` Jag Raman
  0 siblings, 0 replies; 140+ messages in thread
From: Jag Raman @ 2019-11-13 16:15 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, liran.alon, rth, quintela, qemu-devel, armbru,
	ross.lagerwall, mst, kraxel, kwolf, pbonzini, berrange, mreitz,
	kanth.ghatraju, dgilbert, marcandre.lureau



On 11/11/2019 11:19 AM, Stefan Hajnoczi wrote:
> On Thu, Oct 24, 2019 at 05:09:14AM -0400, Jagannathan Raman wrote:
>> +void proxy_device_reset(DeviceState *dev)
>> +{
>> +    PCIProxyDev *pdev = PCI_PROXY_DEV(dev);
>> +    MPQemuMsg msg;
>> +
>> +    memset(&msg, 0, sizeof(MPQemuMsg));
>> +
>> +    msg.bytestream = 0;
>> +    msg.size = sizeof(msg.data1);
>> +    msg.cmd = DEVICE_RESET;
>> +
>> +    mpqemu_msg_send(pdev->mpqemu_link, &msg, pdev->mpqemu_link->com);
>> +}
> 
> Device reset must wait for the remote process to finish reset, otherwise
> the remote device could still be running after proxy_device_reset()
> returns from sending the message.

Thanks for the feedback. We will wait for the reset to complete.
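
A minimal sketch of such a bounded wait, assuming the reset message
carries an eventfd that the remote writes to once reset is done. The
helper name wait_for_ack() and its signature are hypothetical; the
series' own wait_for_remote() is not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Block until the remote signals completion on efd or timeout_ms
 * elapses.  Returns 0 and stores the eventfd counter in *val on
 * success, -1 on timeout or error.  An EINTR restarts the full
 * timeout, which is acceptable for a sketch.
 */
static int wait_for_ack(int efd, int timeout_ms, uint64_t *val)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    int ret;

    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret <= 0 || !(pfd.revents & POLLIN)) {
        return -1;      /* timed out or poll() failed */
    }
    if (read(efd, val, sizeof(*val)) != (ssize_t)sizeof(*val)) {
        return -1;
    }
    return 0;
}
```

proxy_device_reset() would send DEVICE_RESET with the eventfd attached
and return only after wait_for_ack() succeeds, so the remote device is
known to be quiescent by then.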

--
Jag

> 
> Stefan
> 



* Re: [RFC v4 PATCH 47/49] multi-process: Enable support for multiple devices in remote
  2019-11-11 16:15   ` Stefan Hajnoczi
@ 2019-11-13 16:21     ` Jag Raman
  0 siblings, 0 replies; 140+ messages in thread
From: Jag Raman @ 2019-11-13 16:21 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, liran.alon, rth, quintela, qemu-devel, armbru,
	ross.lagerwall, mst, kraxel, kwolf, pbonzini, berrange, mreitz,
	kanth.ghatraju, dgilbert, marcandre.lureau



On 11/11/2019 11:15 AM, Stefan Hajnoczi wrote:
> On Thu, Oct 24, 2019 at 05:09:28AM -0400, Jagannathan Raman wrote:
>> @@ -93,7 +94,8 @@ static void process_config_write(MPQemuMsg *msg)
>>       struct conf_data_msg *conf = (struct conf_data_msg *)msg->data2;
>>   
>>       qemu_mutex_lock_iothread();
>> -    pci_default_write_config(remote_pci_dev, conf->addr, conf->val, conf->l);
>> +    pci_default_write_config(remote_pci_devs[msg->id], conf->addr, conf->val,
>> +                             conf->l);
>>       qemu_mutex_unlock_iothread();
>>   }
>>   
>> @@ -106,7 +108,8 @@ static void process_config_read(MPQemuMsg *msg)
>>       wait = msg->fds[0];
>>   
>>       qemu_mutex_lock_iothread();
>> -    val = pci_default_read_config(remote_pci_dev, conf->addr, conf->l);
>> +    val = pci_default_read_config(remote_pci_devs[msg->id], conf->addr,
>> +                                  conf->l);
>>       qemu_mutex_unlock_iothread();
>>   
>>       notify_proxy(wait, val);
> 
> msg->id was read from a socket and hasn't been validated before indexing
> into remote_pci_devs[].

We see the common thread, w.r.t your concerns about security. Thanks for
pointing them out.

We will fix this and other similar issues in the future.
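
As a sketch of the kind of check meant here, assuming a fixed-size
device table: the id is validated before it is ever used as an index.
MAX_REMOTE_DEVS, the uint64_t id type and remote_dev_lookup() are
hypothetical stand-ins, and PCIDevice is only forward-declared so the
example builds outside the QEMU tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REMOTE_DEVS 8   /* hypothetical table size */

typedef struct PCIDevice PCIDevice;   /* opaque stand-in for QEMU's type */

static PCIDevice *remote_pci_devs[MAX_REMOTE_DEVS];

/* Return the device for an id read off the socket, or NULL if bogus. */
static PCIDevice *remote_dev_lookup(uint64_t id)
{
    if (id >= MAX_REMOTE_DEVS) {
        return NULL;             /* out of range: never index the array */
    }
    return remote_pci_devs[id];  /* may still be NULL for an empty slot */
}
```

process_config_read() and process_config_write() would then bail out
(and report an error to the proxy) when the lookup returns NULL.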

Thank you very much!
--
Jag

> 



* Re: [RFC v4 PATCH 10/49] multi-process: setup a machine object for remote device process
  2019-10-24  9:08 ` [RFC v4 PATCH 10/49] multi-process: setup a machine object for remote device process Jagannathan Raman
@ 2019-11-13 16:22   ` Stefan Hajnoczi
  2019-11-18 15:29     ` Jag Raman
  0 siblings, 1 reply; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-13 16:22 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, quintela, berrange, mst, qemu-devel, armbru,
	ross.lagerwall, kanth.ghatraju, kraxel, kwolf, pbonzini,
	liran.alon, marcandre.lureau, mreitz, dgilbert, rth

On Thu, Oct 24, 2019 at 05:08:51AM -0400, Jagannathan Raman wrote:
> +static NotifierList machine_init_done_notifiers =
> +    NOTIFIER_LIST_INITIALIZER(machine_init_done_notifiers);
> +
> +bool machine_init_done;
> +
> +void qemu_add_machine_init_done_notifier(Notifier *notify)
> +{
> +    notifier_list_add(&machine_init_done_notifiers, notify);
> +    if (machine_init_done) {
> +        notify->notify(notify, NULL);
> +    }
> +}
> +
> +void qemu_remove_machine_init_done_notifier(Notifier *notify)
> +{
> +    notifier_remove(notify);
> +}
> +
> +void qemu_run_machine_init_done_notifiers(void)
> +{
> +    machine_init_done = true;
> +    notifier_list_notify(&machine_init_done_notifiers, NULL);
> +}

qemu_add_machine_init_done_notifier() is already defined in vl.c.
Please share the implementation instead of duplicating it into the
remote program.

> +
> +static void remote_machine_init(Object *obj)
> +{
> +    RemMachineState *s = REMOTE_MACHINE(obj);
> +    RemPCIHost *rem_host;
> +    MemoryRegion *system_memory, *system_io, *pci_memory;
> +
> +    Error *error_abort = NULL;
> +
> +    qemu_mutex_init(&ram_list.mutex);

Please keep global initialization separate from RemMachineState (e.g. do
it in main() or a function called by main()).  This function should only
initialize RemMachineState.

> +
> +    object_property_add_child(object_get_root(), "machine", obj, &error_abort);
> +    if (error_abort) {
> +        error_report_err(error_abort);
> +    }
> +
> +    memory_map_init();

This is global init, please move it elsewhere.

> +
> +    system_memory = get_system_memory();
> +    system_io = get_system_io();
> +
> +    pci_memory = g_new(MemoryRegion, 1);
> +    memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
> +
> +    rem_host = REMOTE_HOST_DEVICE(qdev_create(NULL, TYPE_REMOTE_HOST_DEVICE));
> +
> +    rem_host->mr_pci_mem = pci_memory;
> +    rem_host->mr_sys_mem = system_memory;
> +    rem_host->mr_sys_io = system_io;
> +
> +    s->host = rem_host;

Both s and rem_host are QOM objects.  There should be a child property
relationship between them here.  It will ensure that rem_host is cleaned
up when s is cleaned up.  Please use that instead of a regular C
pointer.


* Re: [RFC v4 PATCH 41/49] multi-process/mig: Enable VMSD save in the Proxy object
  2019-11-13 15:50   ` Daniel P. Berrangé
@ 2019-11-13 16:32     ` Jag Raman
  2019-11-13 17:11       ` Daniel P. Berrangé
  0 siblings, 1 reply; 140+ messages in thread
From: Jag Raman @ 2019-11-13 16:32 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, quintela, mst, qemu-devel, armbru, ross.lagerwall,
	mreitz, kanth.ghatraju, kraxel, stefanha, pbonzini, liran.alon,
	marcandre.lureau, kwolf, dgilbert, rth



On 11/13/2019 10:50 AM, Daniel P. Berrangé wrote:
> On Thu, Oct 24, 2019 at 05:09:22AM -0400, Jagannathan Raman wrote:
>> Collect the VMSD from remote process on the source and save
>> it to the channel leading to the destination
>>
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>> ---
>>   New patch in v4
>>
>>   hw/proxy/qemu-proxy.c         | 132 ++++++++++++++++++++++++++++++++++++++++++
>>   include/hw/proxy/qemu-proxy.h |   2 +
>>   include/io/mpqemu-link.h      |   1 +
>>   3 files changed, 135 insertions(+)
>>
>> diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
>> index 623a6c5..ce72e6a 100644
>> --- a/hw/proxy/qemu-proxy.c
>> +++ b/hw/proxy/qemu-proxy.c
>> @@ -52,6 +52,14 @@
>>   #include "util/event_notifier-posix.c"
>>   #include "hw/boards.h"
>>   #include "include/qemu/log.h"
>> +#include "io/channel.h"
>> +#include "migration/qemu-file-types.h"
>> +#include "qapi/error.h"
>> +#include "io/channel-util.h"
>> +#include "migration/qemu-file-channel.h"
>> +#include "migration/qemu-file.h"
>> +#include "migration/migration.h"
>> +#include "migration/vmstate.h"
>>   
>>   QEMUTimer *hb_timer;
>>   static void pci_proxy_dev_realize(PCIDevice *dev, Error **errp);
>> @@ -62,6 +70,9 @@ static void stop_heartbeat_timer(void);
>>   static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx);
>>   static void broadcast_msg(MPQemuMsg *msg, bool need_reply);
>>   
>> +#define PAGE_SIZE getpagesize()
>> +uint8_t *mig_data;
>> +
>>   static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx)
>>   {
>>       /* TODO: Add proper handler. */
>> @@ -357,14 +368,135 @@ static void pci_proxy_dev_inst_init(Object *obj)
>>       dev->mem_init = false;
>>   }
>>   
>> +typedef struct {
>> +    QEMUFile *rem;
>> +    PCIProxyDev *dev;
>> +} proxy_mig_data;
>> +
>> +static void *proxy_mig_out(void *opaque)
>> +{
>> +    proxy_mig_data *data = opaque;
>> +    PCIProxyDev *dev = data->dev;
>> +    uint8_t byte;
>> +    uint64_t data_size = PAGE_SIZE;
>> +
>> +    mig_data = g_malloc(data_size);
>> +
>> +    while (true) {
>> +        byte = qemu_get_byte(data->rem);
> 
> There is a pretty large set of APIs hiding behind the qemu_get_byte
> call, which does not give me confidence that...
> 
>> +        mig_data[dev->migsize++] = byte;
>> +        if (dev->migsize == data_size) {
>> +            data_size += PAGE_SIZE;
>> +            mig_data = g_realloc(mig_data, data_size);
>> +        }
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
>> +static int proxy_pre_save(void *opaque)
>> +{
>> +    PCIProxyDev *pdev = opaque;
>> +    proxy_mig_data *mig_data;
>> +    QEMUFile *f_remote;
>> +    MPQemuMsg msg = {0};
>> +    QemuThread thread;
>> +    Error *err = NULL;
>> +    QIOChannel *ioc;
>> +    uint64_t size;
>> +    int fd[2];
>> +
>> +    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) {
>> +        return -1;
>> +    }
>> +
>> +    ioc = qio_channel_new_fd(fd[0], &err);
>> +    if (err) {
>> +        error_report_err(err);
>> +        return -1;
>> +    }
>> +
>> +    qio_channel_set_name(QIO_CHANNEL(ioc), "PCIProxyDevice-mig");
>> +
>> +    f_remote = qemu_fopen_channel_input(ioc);
>> +
>> +    pdev->migsize = 0;
>> +
>> +    mig_data = g_malloc0(sizeof(proxy_mig_data));
>> +    mig_data->rem = f_remote;
>> +    mig_data->dev = pdev;
>> +
>> +    qemu_thread_create(&thread, "Proxy MIG_OUT", proxy_mig_out, mig_data,
>> +                       QEMU_THREAD_DETACHED);
>> +
>> +    msg.cmd = START_MIG_OUT;
>> +    msg.bytestream = 0;
>> +    msg.num_fds = 2;
>> +    msg.fds[0] = fd[1];
>> +    msg.fds[1] = GET_REMOTE_WAIT;
>> +
>> +    mpqemu_msg_send(pdev->mpqemu_link, &msg, pdev->mpqemu_link->com);
>> +    size = wait_for_remote(msg.fds[1]);
>> +    PUT_REMOTE_WAIT(msg.fds[1]);
>> +
>> +    assert(size != ULLONG_MAX);
>> +
>> +    /*
>> +     * migsize is being update by a separate thread. Using volatile to
>> +     * instruct the compiler to fetch the value of this variable from
>> +     * memory during every read
>> +     */
>> +    while (*((volatile uint64_t *)&pdev->migsize) < size) {
>> +    }
>> +
>> +    qemu_thread_cancel(&thread);
> 
> ....this is a safe way to stop the thread executing without
> resulting in memory being leaked.
> 
> In addition thread cancellation is asynchronous, so the thread
> may still be using the QEMUFile object while....
> 
>> +    qemu_fclose(f_remote);

The above "wait_for_remote()" call waits for the remote process to
finish migration and return the size of the VMSD.

It should be safe to cancel the thread and close the file, once the
remote process is done sending the VMSD and we have read "size" bytes
from it, is it not?
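
One way to avoid cancellation altogether is to let the reader finish
on its own. The sketch below is not from the patch: it assumes the
remote closes its end of the socketpair after sending the VMSD, and
that the proxy closes its own copy of fd[1] once it has been passed
over, so read() eventually returns 0. read_until_eof() is a
hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read from fd until the peer closes its end (read() returns 0).
 * Returns a malloc'd buffer and stores its length in *len, or
 * returns NULL on error.
 */
static uint8_t *read_until_eof(int fd, size_t *len)
{
    size_t cap = 4096, used = 0;
    uint8_t *buf = malloc(cap);

    if (!buf) {
        return NULL;
    }
    for (;;) {
        ssize_t n;

        if (used == cap) {              /* grow before the next read */
            uint8_t *tmp = realloc(buf, cap * 2);
            if (!tmp) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        n = read(fd, buf + used, cap - used);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            free(buf);
            return NULL;
        }
        if (n == 0) {                   /* EOF: sender closed its end */
            break;
        }
        used += (size_t)n;
    }
    *len = used;
    return buf;
}
```

A thread whose body is just this loop returns by itself, so it could be
created with QEMU_THREAD_JOINABLE and reaped with qemu_thread_join()
before the file is closed, with no busy-wait on migsize.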

Thank you very much!
--
Jag

> 
> ..this is closing it. This feels like it is a crash danger.
> 
> 
>> +    close(fd[1]);
>> +
>> +    return 0;
>> +}
> 
> Regards,
> Daniel
> 



* Re: [RFC v4 PATCH 11/49] multi-process: setup memory manager for remote device
  2019-10-24  9:08 ` [RFC v4 PATCH 11/49] multi-process: setup memory manager for remote device Jagannathan Raman
@ 2019-11-13 16:33   ` Stefan Hajnoczi
  2019-11-13 16:34     ` Jag Raman
  0 siblings, 1 reply; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-13 16:33 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, quintela, berrange, mst, qemu-devel, armbru,
	ross.lagerwall, kanth.ghatraju, kraxel, kwolf, pbonzini,
	liran.alon, marcandre.lureau, mreitz, dgilbert, rth

On Thu, Oct 24, 2019 at 05:08:52AM -0400, Jagannathan Raman wrote:
> +static void remote_ram_destructor(MemoryRegion *mr)
> +{
> +    qemu_ram_free(mr->ram_block);
> +}
> +
> +static void remote_ram_init_from_fd(MemoryRegion *mr, int fd, uint64_t size,
> +                                    ram_addr_t offset, Error **errp)
> +{
> +    char *name = g_strdup_printf("%d", fd);
> +
> +    memory_region_init(mr, NULL, name, size);
> +    mr->ram = true;
> +    mr->terminates = true;
> +    mr->destructor = NULL;
> +    mr->align = 0;
> +    mr->ram_block = qemu_ram_alloc_from_fd(size, mr, RAM_SHARED, fd, offset,
> +                                           errp);
> +    mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
> +
> +    g_free(name);
> +}

This is not specific to remote/memory.c and could be shared in case
something else in QEMU wants to initialize from an fd.

> +
> +void remote_sysmem_reconfig(MPQemuMsg *msg, Error **errp)
> +{
> +    sync_sysmem_msg_t *sysmem_info = &msg->data1.sync_sysmem;

A possible security issue with MPQemuMsg: was the message size
validated before we access msg->data1.sync_sysmem?

If not, then we might access uninitialized data.  I didn't see if there
is a single place in the code that always zeroes msg, but I think the
answer is no.  Accessing uninitialized data could expose the old
contents of the stack/heap to the other process.  Information leaks like
this can be used to defeat address-space randomization because the other
process may learn about our memory layout if there are memory addresses
in the uninitialized data.
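
A minimal sketch of such a check, under the assumption that msg->size
holds the number of payload bytes actually received. The struct
layouts below are simplified, hypothetical stand-ins (the real field
lists are not shown in this hunk), and msg_has_sync_sysmem() is an
invented name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the patch's message types. */
typedef struct {
    uint64_t gpas[8];
    uint64_t sizes[8];
} sync_sysmem_msg_t;

typedef struct {
    int cmd;
    size_t size;                 /* payload bytes actually received */
    union {
        uint64_t u64;
        sync_sysmem_msg_t sync_sysmem;
    } data1;
} MPQemuMsg;

/* True only if the sender supplied exactly a sync_sysmem payload. */
static bool msg_has_sync_sysmem(const MPQemuMsg *msg)
{
    return msg->size == sizeof(msg->data1.sync_sysmem);
}
```

remote_sysmem_reconfig() would reject the message via errp when this
returns false, and the receive path would zero the MPQemuMsg before
filling it so that short messages never expose stale stack or heap
contents.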


* Re: [RFC v4 PATCH 45/49] multi-process/mig: Synchronize runstate of remote process
  2019-11-11 16:17   ` Stefan Hajnoczi
@ 2019-11-13 16:33     ` Jag Raman
  0 siblings, 0 replies; 140+ messages in thread
From: Jag Raman @ 2019-11-13 16:33 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, liran.alon, rth, quintela, qemu-devel, armbru,
	ross.lagerwall, mst, kraxel, kwolf, pbonzini, berrange, mreitz,
	kanth.ghatraju, dgilbert, marcandre.lureau



On 11/11/2019 11:17 AM, Stefan Hajnoczi wrote:
> On Thu, Oct 24, 2019 at 05:09:26AM -0400, Jagannathan Raman wrote:
>> @@ -656,6 +657,19 @@ static void init_proxy(PCIDevice *dev, char *command, bool need_spawn, Error **e
>>       }
>>   }
>>   
>> +static void proxy_vm_state_change(void *opaque, int running, RunState state)
>> +{
>> +    PCIProxyDev *dev = opaque;
>> +    MPQemuMsg msg = { 0 };
>> +
>> +    msg.cmd = RUNSTATE_SET;
>> +    msg.bytestream = 0;
>> +    msg.size = sizeof(msg.data1);
>> +    msg.data1.runstate.state = state;
>> +
>> +    mpqemu_msg_send(dev->mpqemu_link, &msg, dev->mpqemu_link->com);
>> +}
> 
> Changing vm state is a barrier operation - devices must not dirty memory
> afterwards.  This function doesn't have barrier semantics, it sends off
> the message without waiting for the remote process to finish processing
> it.  This means there is a race condition where QEMU has changed the vm
> state but devices could still dirty memory.  Please wait for a reply to
> prevent this.

Got it, thanks! Will do.

--
Jag

> 
> Stefan
> 



* Re: [RFC v4 PATCH 11/49] multi-process: setup memory manager for remote device
  2019-11-13 16:33   ` Stefan Hajnoczi
@ 2019-11-13 16:34     ` Jag Raman
  0 siblings, 0 replies; 140+ messages in thread
From: Jag Raman @ 2019-11-13 16:34 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, quintela, berrange, mst, qemu-devel, armbru,
	ross.lagerwall, kanth.ghatraju, kraxel, kwolf, pbonzini,
	liran.alon, marcandre.lureau, mreitz, dgilbert, rth



On 11/13/2019 11:33 AM, Stefan Hajnoczi wrote:
> On Thu, Oct 24, 2019 at 05:08:52AM -0400, Jagannathan Raman wrote:
>> +static void remote_ram_destructor(MemoryRegion *mr)
>> +{
>> +    qemu_ram_free(mr->ram_block);
>> +}
>> +
>> +static void remote_ram_init_from_fd(MemoryRegion *mr, int fd, uint64_t size,
>> +                                    ram_addr_t offset, Error **errp)
>> +{
>> +    char *name = g_strdup_printf("%d", fd);
>> +
>> +    memory_region_init(mr, NULL, name, size);
>> +    mr->ram = true;
>> +    mr->terminates = true;
>> +    mr->destructor = NULL;
>> +    mr->align = 0;
>> +    mr->ram_block = qemu_ram_alloc_from_fd(size, mr, RAM_SHARED, fd, offset,
>> +                                           errp);
>> +    mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
>> +
>> +    g_free(name);
>> +}
> 
> This is not specific to remote/memory.c and could be shared in case
> something else in QEMU wants to initialize from an fd.
> 
>> +
>> +void remote_sysmem_reconfig(MPQemuMsg *msg, Error **errp)
>> +{
>> +    sync_sysmem_msg_t *sysmem_info = &msg->data1.sync_sysmem;
> 
> A possible security issue with MPQemuMsg: was the message size
> validated before we access msg->data1.sync_sysmem?
> 
> If not, then we might access uninitialized data.  I didn't see if there
> is a single place in the code that always zeroes msg, but I think the
> answer is no.  Accessing uninitialized data could expose the old
> contents of the stack/heap to the other process.  Information leaks like
> this can be used to defeat address-space randomization because the other
> process may learn about our memory layout if there are memory addresses
> in the uninitialized data.

Thanks for the feedback. Will do.

--
Jag

> 



* Re: [RFC v4 PATCH 02/49] multi-process: util: Add qemu_thread_cancel() to cancel running thread
  2019-11-13 16:04         ` Jag Raman
@ 2019-11-13 16:35           ` Daniel P. Berrangé
  0 siblings, 0 replies; 140+ messages in thread
From: Daniel P. Berrangé @ 2019-11-13 16:35 UTC (permalink / raw)
  To: Jag Raman
  Cc: elena.ufimtseva, fam, thuth, ehabkost, konrad.wilk,
	john.g.johnson, rth, mst, qemu-devel, armbru, ross.lagerwall,
	mreitz, liran.alon, kraxel, Stefan Hajnoczi, kanth.ghatraju,
	pbonzini, kwolf, quintela, dgilbert, marcandre.lureau

On Wed, Nov 13, 2019 at 11:04:58AM -0500, Jag Raman wrote:
> 
> 
> On 11/13/2019 10:51 AM, Daniel P. Berrangé wrote:
> > On Wed, Nov 13, 2019 at 10:38:06AM -0500, Jag Raman wrote:
> > > 
> > > 
> > > On 11/13/2019 10:30 AM, Stefan Hajnoczi wrote:
> > > > On Thu, Oct 24, 2019 at 05:08:43AM -0400, Jagannathan Raman wrote:
> > > > > qemu_thread_cancel() added to destroy a given running thread.
> > > > > This will be needed in the following patches.
> > > > > 
> > > > > Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> > > > > Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> > > > > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > > > > ---
> > > > >    include/qemu/thread.h    |  1 +
> > > > >    util/qemu-thread-posix.c | 10 ++++++++++
> > > > >    2 files changed, 11 insertions(+)
> > > > 
> > > > Is this still needed?  I thought previous discussion concluded that
> > > > thread cancellation is hard to get right and it's not actually used by
> > > > this series?
> > > 
> > > Hi Stefan,
> > > 
> > > This is used in PATCH 41/49.
> > 
> > I don't believe the cancellation usage in that patch is safe :-)
> 
> Thanks for the feedback, we will address that.
> 
> May I please ask why it is not safe? Any clarification will help us to
> find a better alternative.

I put some comments inline in the patch 41 explaining my thoughts.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC v4 PATCH 12/49] multi-process: remote process initialization
  2019-10-24  9:08 ` [RFC v4 PATCH 12/49] multi-process: remote process initialization Jagannathan Raman
@ 2019-11-13 16:38   ` Stefan Hajnoczi
  0 siblings, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-13 16:38 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, quintela, berrange, mst, qemu-devel, armbru,
	ross.lagerwall, kanth.ghatraju, kraxel, kwolf, pbonzini,
	liran.alon, marcandre.lureau, mreitz, dgilbert, rth


On Thu, Oct 24, 2019 at 05:08:53AM -0400, Jagannathan Raman wrote:
>  int main(int argc, char *argv[])
>  {
> +    Error *err = NULL;
> +
>      module_call_init(MODULE_INIT_QOM);
>  
> +    bdrv_init_with_whitelist();
> +
> +    if (qemu_init_main_loop(&err)) {
> +        error_report_err(err);
> +        return -EBUSY;
> +    }
> +
> +    qemu_init_cpu_loop();
> +
> +    page_size_init();
> +
>      current_machine = MACHINE(REMOTE_MACHINE(object_new(TYPE_REMOTE_MACHINE)));
>  
> +    mpqemu_link = mpqemu_link_create();
> +    if (!mpqemu_link) {
> +        printf("Could not create MPQemu link\n");
> +        return -1;
> +    }
> +
> +    mpqemu_init_channel(mpqemu_link, &mpqemu_link->com, STDIN_FILENO);
> +    mpqemu_link_set_callback(mpqemu_link, process_msg);
> +
> +    mpqemu_start_coms(mpqemu_link);

Can you use util/main-loop.c instead of an mpqemu-specific event loop?
I think that file is needed anyway because lots of QEMU code depends on
it.



* Re: [RFC v4 PATCH 41/49] multi-process/mig: Enable VMSD save in the Proxy object
  2019-11-13 16:32     ` Jag Raman
@ 2019-11-13 17:11       ` Daniel P. Berrangé
  2019-11-18 15:42         ` Jag Raman
  0 siblings, 1 reply; 140+ messages in thread
From: Daniel P. Berrangé @ 2019-11-13 17:11 UTC (permalink / raw)
  To: Jag Raman
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, liran.alon, rth, quintela, qemu-devel, armbru,
	ross.lagerwall, mst, kraxel, stefanha, pbonzini, kanth.ghatraju,
	mreitz, kwolf, dgilbert, marcandre.lureau

On Wed, Nov 13, 2019 at 11:32:09AM -0500, Jag Raman wrote:
> 
> 
> On 11/13/2019 10:50 AM, Daniel P. Berrangé wrote:
> > On Thu, Oct 24, 2019 at 05:09:22AM -0400, Jagannathan Raman wrote:
> > > Collect the VMSD from remote process on the source and save
> > > it to the channel leading to the destination
> > > 
> > > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > > Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> > > Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> > > ---
> > >   New patch in v4
> > > 
> > >   hw/proxy/qemu-proxy.c         | 132 ++++++++++++++++++++++++++++++++++++++++++
> > >   include/hw/proxy/qemu-proxy.h |   2 +
> > >   include/io/mpqemu-link.h      |   1 +
> > >   3 files changed, 135 insertions(+)
> > > 
> > > diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
> > > index 623a6c5..ce72e6a 100644
> > > --- a/hw/proxy/qemu-proxy.c
> > > +++ b/hw/proxy/qemu-proxy.c
> > > @@ -52,6 +52,14 @@
> > >   #include "util/event_notifier-posix.c"
> > >   #include "hw/boards.h"
> > >   #include "include/qemu/log.h"
> > > +#include "io/channel.h"
> > > +#include "migration/qemu-file-types.h"
> > > +#include "qapi/error.h"
> > > +#include "io/channel-util.h"
> > > +#include "migration/qemu-file-channel.h"
> > > +#include "migration/qemu-file.h"
> > > +#include "migration/migration.h"
> > > +#include "migration/vmstate.h"
> > >   QEMUTimer *hb_timer;
> > >   static void pci_proxy_dev_realize(PCIDevice *dev, Error **errp);
> > > @@ -62,6 +70,9 @@ static void stop_heartbeat_timer(void);
> > >   static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx);
> > >   static void broadcast_msg(MPQemuMsg *msg, bool need_reply);
> > > +#define PAGE_SIZE getpagesize()
> > > +uint8_t *mig_data;
> > > +
> > >   static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx)
> > >   {
> > >       /* TODO: Add proper handler. */
> > > @@ -357,14 +368,135 @@ static void pci_proxy_dev_inst_init(Object *obj)
> > >       dev->mem_init = false;
> > >   }
> > > +typedef struct {
> > > +    QEMUFile *rem;
> > > +    PCIProxyDev *dev;
> > > +} proxy_mig_data;
> > > +
> > > +static void *proxy_mig_out(void *opaque)
> > > +{
> > > +    proxy_mig_data *data = opaque;
> > > +    PCIProxyDev *dev = data->dev;
> > > +    uint8_t byte;
> > > +    uint64_t data_size = PAGE_SIZE;
> > > +
> > > +    mig_data = g_malloc(data_size);
> > > +
> > > +    while (true) {
> > > +        byte = qemu_get_byte(data->rem);
> > 
> > There is a pretty large set of APIs hiding behind the qemu_get_byte
> > call, which does not give me confidence that...
> > 
> > > +        mig_data[dev->migsize++] = byte;
> > > +        if (dev->migsize == data_size) {
> > > +            data_size += PAGE_SIZE;
> > > +            mig_data = g_realloc(mig_data, data_size);
> > > +        }
> > > +    }
> > > +
> > > +    return NULL;
> > > +}
> > > +
> > > +static int proxy_pre_save(void *opaque)
> > > +{
> > > +    PCIProxyDev *pdev = opaque;
> > > +    proxy_mig_data *mig_data;
> > > +    QEMUFile *f_remote;
> > > +    MPQemuMsg msg = {0};
> > > +    QemuThread thread;
> > > +    Error *err = NULL;
> > > +    QIOChannel *ioc;
> > > +    uint64_t size;
> > > +    int fd[2];
> > > +
> > > +    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) {
> > > +        return -1;
> > > +    }
> > > +
> > > +    ioc = qio_channel_new_fd(fd[0], &err);
> > > +    if (err) {
> > > +        error_report_err(err);
> > > +        return -1;
> > > +    }
> > > +
> > > +    qio_channel_set_name(QIO_CHANNEL(ioc), "PCIProxyDevice-mig");
> > > +
> > > +    f_remote = qemu_fopen_channel_input(ioc);
> > > +
> > > +    pdev->migsize = 0;
> > > +
> > > +    mig_data = g_malloc0(sizeof(proxy_mig_data));
> > > +    mig_data->rem = f_remote;
> > > +    mig_data->dev = pdev;
> > > +
> > > +    qemu_thread_create(&thread, "Proxy MIG_OUT", proxy_mig_out, mig_data,
> > > +                       QEMU_THREAD_DETACHED);
> > > +
> > > +    msg.cmd = START_MIG_OUT;
> > > +    msg.bytestream = 0;
> > > +    msg.num_fds = 2;
> > > +    msg.fds[0] = fd[1];
> > > +    msg.fds[1] = GET_REMOTE_WAIT;
> > > +
> > > +    mpqemu_msg_send(pdev->mpqemu_link, &msg, pdev->mpqemu_link->com);
> > > +    size = wait_for_remote(msg.fds[1]);
> > > +    PUT_REMOTE_WAIT(msg.fds[1]);
> > > +
> > > +    assert(size != ULLONG_MAX);
> > > +
> > > +    /*
> > > +     * migsize is being update by a separate thread. Using volatile to
> > > +     * instruct the compiler to fetch the value of this variable from
> > > +     * memory during every read
> > > +     */
> > > +    while (*((volatile uint64_t *)&pdev->migsize) < size) {
> > > +    }
> > > +
> > > +    qemu_thread_cancel(&thread);
> > 
> > ....this is a safe way to stop the thread executing without
> > resulting in memory being leaked.
> > 
> > In addition thread cancellation is asynchronous, so the thread
> > may still be using the QEMUFile object while....
> > 
> > > +    qemu_fclose(f_remote);
> 
> The above "wait_for_remote()" call waits for the remote process to
> finish with Migration, and return the size of the VMSD.
> 
> It should be safe to cancel the thread and close the file, once the
> remote process is done sending the VMSD and we have read "size" bytes
> from it, is it not?

Ok, so the thread is doing 

    while (true) {
        byte = qemu_get_byte(data->rem);
        ...do something with byte...
    }

so when the thread is cancelled it is almost certainly in the
qemu_get_byte() call. Since you say wait_for_remote() syncs
with the end of migration, I'll presume there's no more data
to be read but the file is still open.

If we're using a blocking FD here we'll probably be stuck in
read() when we're cancelled, and cancellation would probably
be ok from looking at the current impl of QEMUFile / QIOChannel.
If we're handling any error scenario though there could be a
"Error *local_err" that needs freeing before cancellation.

If the fclose is processed before cancellation takes effect
on the target thread, though, we could have a race.

  1. proxy_mig_out blocked in read from qemu_fill_buffer

  2. main thread requests async cancel

  3. main thread calls qemu_fclose which closes the FD
     and frees the QEMUFile object

  4. proxy_mig_out thread returns from read() with
     ret == 0 (EOF)

  5. proxy_mig_out thread calls qemu_file_set_error_obj
     on a QEMUFile object freed in (3): use after free. Oops.

  6. ..async cancel request gets delivered....

Admittedly, it is fairly unlikely for the async cancel
to be delayed for so long that this sequence happens, but
unexpected things can happen when we really don't want them.

IMHO the safe way to deal with this would be a lock-step
sequence between the threads:

   1. proxy_mig_out blocked in read from qemu_fill_buffer
   
   2. main thread shuts down the FD with qemu_file_shutdown(),
      closing both directions

   3. proxy_mig_out returns from read with ret == 0 (EOF)

   4. proxy_mig_out thread breaks out of its infinite loop
      due to EOF and exits

   5. main thread calls pthread_join on proxy_mig_out

   6. main thread calls qemu_fclose()

The safety of this sequence is easier to reason about than that of
the cancel-based approach, IMHO.
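
For illustration, here is a minimal sketch of the lock-step sequence
above using plain POSIX sockets and pthreads instead of QEMUFile and
QIOChannel. Everything in it is an assumption made for the example:
Reader, reader_main() and lockstep_demo() are hypothetical names,
shutdown(2) stands in for qemu_file_shutdown(), pthread_join() for the
join in step 5, and close(2) for qemu_fclose(). The condition variable
is also an addition of this sketch; it replaces the busy-wait on
migsize so that the main thread sleeps until the reader has consumed
the expected number of bytes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical reader state; a stand-in for the proxy's migration data. */
typedef struct {
    int fd;                   /* read end of the socketpair */
    pthread_mutex_t lock;     /* protects len */
    pthread_cond_t progress;  /* signalled whenever len grows */
    size_t len;               /* bytes consumed so far */
} Reader;

/* Steps 1, 3 and 4: block in read(), leave the loop on EOF or error. */
static void *reader_main(void *opaque)
{
    Reader *r = opaque;
    uint8_t chunk[256];
    ssize_t n;

    while ((n = read(r->fd, chunk, sizeof(chunk))) > 0) {
        /* A real reader would also store the bytes; this one counts them. */
        pthread_mutex_lock(&r->lock);
        r->len += (size_t)n;
        pthread_cond_signal(&r->progress);
        pthread_mutex_unlock(&r->lock);
    }
    return NULL;
}

/* Returns the number of bytes the reader consumed, or -1 on setup failure. */
static long lockstep_demo(const void *payload, size_t size)
{
    Reader r = { .fd = -1 };
    pthread_t tid;
    ssize_t sent;
    size_t expect;
    int fd[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd) < 0) {
        return -1;
    }
    r.fd = fd[0];
    pthread_mutex_init(&r.lock, NULL);
    pthread_cond_init(&r.progress, NULL);

    if (pthread_create(&tid, NULL, reader_main, &r) != 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    /* Stand-in for the remote process streaming its device state. */
    sent = write(fd[1], payload, size);
    expect = sent > 0 ? (size_t)sent : 0;

    /* Sleep until the reader has consumed everything that was sent. */
    pthread_mutex_lock(&r.lock);
    while (r.len < expect) {
        pthread_cond_wait(&r.progress, &r.lock);
    }
    pthread_mutex_unlock(&r.lock);

    /* Step 2: shut down both directions so the reader's read() returns 0. */
    shutdown(fd[0], SHUT_RDWR);

    /* Step 5: wait until the reader has really left its loop. */
    pthread_join(tid, NULL);
    assert(r.len == expect);

    /* Step 6: only now release what the reader was using. */
    close(fd[0]);
    close(fd[1]);
    pthread_cond_destroy(&r.progress);
    pthread_mutex_destroy(&r.lock);
    return (long)r.len;
}
```

Calling lockstep_demo("VMSD", 4) is expected to return 4. The point of
the ordering is that nothing the reader touches is released until
pthread_join() has returned, so no use after free window remains. The
sketch relies on shutdown() waking a thread blocked in read() on the
same socket, which is the behaviour on Linux.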

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC v4 PATCH 09/49] multi-process: setup PCI host bridge for remote device
  2019-11-13 16:07   ` Stefan Hajnoczi
@ 2019-11-18 15:25     ` Jag Raman
  2019-11-21 10:37       ` Stefan Hajnoczi
  0 siblings, 1 reply; 140+ messages in thread
From: Jag Raman @ 2019-11-18 15:25 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, liran.alon, rth, quintela, qemu-devel, armbru,
	ross.lagerwall, mst, kraxel, kwolf, pbonzini, berrange, mreitz,
	kanth.ghatraju, dgilbert, marcandre.lureau



On 11/13/2019 11:07 AM, Stefan Hajnoczi wrote:
> On Thu, Oct 24, 2019 at 05:08:50AM -0400, Jagannathan Raman wrote:
>> +static void remote_host_realize(DeviceState *dev, Error **errp)
>> +{
>> +    PCIHostState *pci = PCI_HOST_BRIDGE(dev);
>> +    RemPCIHost *s = REMOTE_HOST_DEVICE(dev);
>> +
>> +    /*
>> +     * TODO: the name of the bus would be provided by QEMU. Use
>> +     * "pcie.0" for now.
>> +     */
>> +    pci->bus = pci_root_bus_new(DEVICE(s), "pcie.0",
>> +                                s->mr_pci_mem, s->mr_sys_io,
>> +                                0, TYPE_PCIE_BUS);
> 
> The PCI bus name could be a property and then whatever instantiates
> RemPCIHost could set it.
> 
> Machine types usually hardcode the name because they assume there is
> only one machine instance.  In the case of mpqemu this is an okay
> starting point, but maybe multiple busses will become necessary if the
> device emulation process handles multiple device instances - especially
> if they are served to multiple guests like in a software-defined network
> switch use case.

Are you referring to a case where a single remote process will emulate
devices from multiple guests?

We haven't thought about that application. But we will certainly add the
ability to specify the name of the bus as a parameter.

Thank you very much!
--
Jag

> 



* Re: [RFC v4 PATCH 07/49] multi-process: define mpqemu-link object
  2019-11-13 15:53   ` Stefan Hajnoczi
@ 2019-11-18 15:26     ` Jag Raman
  0 siblings, 0 replies; 140+ messages in thread
From: Jag Raman @ 2019-11-18 15:26 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, liran.alon, rth, quintela, qemu-devel, armbru,
	ross.lagerwall, mst, kraxel, kwolf, pbonzini, berrange, mreitz,
	kanth.ghatraju, dgilbert, marcandre.lureau



On 11/13/2019 10:53 AM, Stefan Hajnoczi wrote:
> On Thu, Oct 24, 2019 at 05:08:48AM -0400, Jagannathan Raman wrote:
>> +#ifndef MPQEMU_LINK_H
>> +#define MPQEMU_LINK_H
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu-common.h"
>> +
>> +#include <stddef.h>
>> +#include <stdint.h>
> 
> These are already included by "qemu/osdep.h".
> 
>> +#include <pthread.h>
> 
> Is <pthread.h> needed?

It's not needed. We'll remove it.

> 
>> +
>> +#include "qom/object.h"
>> +#include "qemu/thread.h"
>> +
>> +#define TYPE_MPQEMU_LINK "mpqemu-link"
>> +#define MPQEMU_LINK(obj) \
>> +    OBJECT_CHECK(MPQemuLinkState, (obj), TYPE_MPQEMU_LINK)
>> +
>> +#define REMOTE_MAX_FDS 8
>> +
>> +#define MPQEMU_MSG_HDR_SIZE offsetof(MPQemuMsg, data1.u64)
>> +
>> +/**
>> + * mpqemu_cmd_t:
>> + * CONF_READ        PCI config. space read
>> + * CONF_WRITE       PCI config. space write
>> + *
>> + * proc_cmd_t enum type to specify the command to be executed on the remote
>> + * device.
>> + */
>> +typedef enum {
>> +    INIT = 0,
>> +    CONF_READ,
>> +    CONF_WRITE,
>> +    MAX,
>> +} mpqemu_cmd_t;
> 
> Please allow for future non-PCI devices by clearly naming PCI-specific
> commands and including a bus type in the initialization messages.

OK, will do.

> 
>> diff --git a/io/mpqemu-link.c b/io/mpqemu-link.c
>> new file mode 100644
>> index 0000000..b39f4d0
>> --- /dev/null
>> +++ b/io/mpqemu-link.c
>> @@ -0,0 +1,309 @@
>> +/*
>> + * Communication channel between QEMU and remote device process
>> + *
>> + * Copyright 2019, Oracle and/or its affiliates.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>> + * of this software and associated documentation files (the "Software"), to deal
>> + * in the Software without restriction, including without limitation the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> + * copies of the Software, and to permit persons to whom the Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu-common.h"
>> +
>> +#include <assert.h>
>> +#include <errno.h>
>> +#include <pthread.h>
>> +#include <stdio.h>
>> +#include <stdlib.h>
>> +#include <sys/types.h>
>> +#include <sys/socket.h>
>> +#include <sys/un.h>
>> +#include <unistd.h>
>> +#include <limits.h>
>> +#include <poll.h>
> 
> Many of these are already included by "qemu/osdep.h".  Some of them
> shouldn't be used directly because QEMU or glib have abstractions that
> hide the platform-specific differences (e.g. pthread, poll).
> 
>> +MPQemuLinkState *mpqemu_link_create(void)
>> +{
>> +    return MPQEMU_LINK(object_new(TYPE_MPQEMU_LINK));
>> +}
> 
> I'm not sure what the purpose of this object is.  mpqemu_link_create()
> suggests the objects will be created internally instead of via -object
> mpqemu-link,..., which is unusual.
> 
> mpqemu_msg_send() and mpqemu_msg_recv() seem to be the main functions
> but they do not even use their MPQemuLinkState *s argument.

The LINK object is made up of multiple CHANNEL objects. For example, a
link between QEMU & the remote process could comprise multiple
channels.

You're correct, mpqemu_msg_send() & mpqemu_msg_recv() don't use the
argument "s". This was a consequence of adding the multi-channel
support, before which this argument was used. We will fix this in the
next revision.

Thank you!
--
Jag

> 
>> +void mpqemu_start_coms(MPQemuLinkState *s)
>> +{
>> +
>> +    g_assert(g_source_attach(&s->com->gsrc, s->ctx));
>> +
>> +    g_main_loop_run(s->loop);
>> +}
> 
> There is already IOThread if you need an event loop thread.  But does
> this need to be its own thread?  The communication should be
> asynchronous and therefore it can run in the main event loop or any
> existing IOThread.
> 



* Re: [RFC v4 PATCH 10/49] multi-process: setup a machine object for remote device process
  2019-11-13 16:22   ` Stefan Hajnoczi
@ 2019-11-18 15:29     ` Jag Raman
  0 siblings, 0 replies; 140+ messages in thread
From: Jag Raman @ 2019-11-18 15:29 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, quintela, berrange, mst, qemu-devel, armbru,
	ross.lagerwall, kanth.ghatraju, kraxel, kwolf, pbonzini,
	liran.alon, marcandre.lureau, mreitz, dgilbert, rth



On 11/13/2019 11:22 AM, Stefan Hajnoczi wrote:
> On Thu, Oct 24, 2019 at 05:08:51AM -0400, Jagannathan Raman wrote:
>> +static NotifierList machine_init_done_notifiers =
>> +    NOTIFIER_LIST_INITIALIZER(machine_init_done_notifiers);
>> +
>> +bool machine_init_done;
>> +
>> +void qemu_add_machine_init_done_notifier(Notifier *notify)
>> +{
>> +    notifier_list_add(&machine_init_done_notifiers, notify);
>> +    if (machine_init_done) {
>> +        notify->notify(notify, NULL);
>> +    }
>> +}
>> +
>> +void qemu_remove_machine_init_done_notifier(Notifier *notify)
>> +{
>> +    notifier_remove(notify);
>> +}
>> +
>> +void qemu_run_machine_init_done_notifiers(void)
>> +{
>> +    machine_init_done = true;
>> +    notifier_list_notify(&machine_init_done_notifiers, NULL);
>> +}
> 
> qemu_add_machine_init_done_notifier() is already defined in vl.c.
> Please share the implementation instead of duplicating it into the
> remote program.
> 
>> +
>> +static void remote_machine_init(Object *obj)
>> +{
>> +    RemMachineState *s = REMOTE_MACHINE(obj);
>> +    RemPCIHost *rem_host;
>> +    MemoryRegion *system_memory, *system_io, *pci_memory;
>> +
>> +    Error *error_abort = NULL;
>> +
>> +    qemu_mutex_init(&ram_list.mutex);
> 
> Please keep global initialization separate from RemMachineState (e.g. do
> it in main() or a function called by main()).  This function should only
> initialize RemMachineState.

OK, will do!

> 
>> +
>> +    object_property_add_child(object_get_root(), "machine", obj, &error_abort);
>> +    if (error_abort) {
>> +        error_report_err(error_abort);
>> +    }
>> +
>> +    memory_map_init();
> 
> This is global init, please move it elsewhere.

Got it, thank you!

> 
>> +
>> +    system_memory = get_system_memory();
>> +    system_io = get_system_io();
>> +
>> +    pci_memory = g_new(MemoryRegion, 1);
>> +    memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
>> +
>> +    rem_host = REMOTE_HOST_DEVICE(qdev_create(NULL, TYPE_REMOTE_HOST_DEVICE));
>> +
>> +    rem_host->mr_pci_mem = pci_memory;
>> +    rem_host->mr_sys_mem = system_memory;
>> +    rem_host->mr_sys_io = system_io;
>> +
>> +    s->host = rem_host;
> 
> Both s and rem_host are QOM objects.  There should be a child property
> relationship between them here.  It will ensure that rem_host is cleaned
> up when s is cleaned up.  Please use that instead of a regular C
> pointer.

OK, will add a property linking these two as parent-child.

Thank you!
--
Jag

> 



* Re: [RFC v4 PATCH 41/49] multi-process/mig: Enable VMSD save in the Proxy object
  2019-11-13 17:11       ` Daniel P. Berrangé
@ 2019-11-18 15:42         ` Jag Raman
  2019-11-22 10:34           ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 140+ messages in thread
From: Jag Raman @ 2019-11-18 15:42 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: elena.ufimtseva, fam, thuth, john.g.johnson, ehabkost,
	konrad.wilk, liran.alon, rth, quintela, qemu-devel, armbru,
	ross.lagerwall, mst, kraxel, stefanha, pbonzini, kanth.ghatraju,
	mreitz, kwolf, dgilbert, marcandre.lureau



On 11/13/2019 12:11 PM, Daniel P. Berrangé wrote:
> On Wed, Nov 13, 2019 at 11:32:09AM -0500, Jag Raman wrote:
>>
>>
>> On 11/13/2019 10:50 AM, Daniel P. Berrangé wrote:
>>> On Thu, Oct 24, 2019 at 05:09:22AM -0400, Jagannathan Raman wrote:
>>>> Collect the VMSD from remote process on the source and save
>>>> it to the channel leading to the destination
>>>>
>>>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>>>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>>>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>>>> ---
>>>>    New patch in v4
>>>>
>>>>    hw/proxy/qemu-proxy.c         | 132 ++++++++++++++++++++++++++++++++++++++++++
>>>>    include/hw/proxy/qemu-proxy.h |   2 +
>>>>    include/io/mpqemu-link.h      |   1 +
>>>>    3 files changed, 135 insertions(+)
>>>>
>>>> diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
>>>> index 623a6c5..ce72e6a 100644
>>>> --- a/hw/proxy/qemu-proxy.c
>>>> +++ b/hw/proxy/qemu-proxy.c
>>>> @@ -52,6 +52,14 @@
>>>>    #include "util/event_notifier-posix.c"
>>>>    #include "hw/boards.h"
>>>>    #include "include/qemu/log.h"
>>>> +#include "io/channel.h"
>>>> +#include "migration/qemu-file-types.h"
>>>> +#include "qapi/error.h"
>>>> +#include "io/channel-util.h"
>>>> +#include "migration/qemu-file-channel.h"
>>>> +#include "migration/qemu-file.h"
>>>> +#include "migration/migration.h"
>>>> +#include "migration/vmstate.h"
>>>>    QEMUTimer *hb_timer;
>>>>    static void pci_proxy_dev_realize(PCIDevice *dev, Error **errp);
>>>> @@ -62,6 +70,9 @@ static void stop_heartbeat_timer(void);
>>>>    static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx);
>>>>    static void broadcast_msg(MPQemuMsg *msg, bool need_reply);
>>>> +#define PAGE_SIZE getpagesize()
>>>> +uint8_t *mig_data;
>>>> +
>>>>    static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx)
>>>>    {
>>>>        /* TODO: Add proper handler. */
>>>> @@ -357,14 +368,135 @@ static void pci_proxy_dev_inst_init(Object *obj)
>>>>        dev->mem_init = false;
>>>>    }
>>>> +typedef struct {
>>>> +    QEMUFile *rem;
>>>> +    PCIProxyDev *dev;
>>>> +} proxy_mig_data;
>>>> +
>>>> +static void *proxy_mig_out(void *opaque)
>>>> +{
>>>> +    proxy_mig_data *data = opaque;
>>>> +    PCIProxyDev *dev = data->dev;
>>>> +    uint8_t byte;
>>>> +    uint64_t data_size = PAGE_SIZE;
>>>> +
>>>> +    mig_data = g_malloc(data_size);
>>>> +
>>>> +    while (true) {
>>>> +        byte = qemu_get_byte(data->rem);
>>>
>>> There is a pretty large set of APIs hiding behind the qemu_get_byte
>>> call, which does not give me confidence that...
>>>
>>>> +        mig_data[dev->migsize++] = byte;
>>>> +        if (dev->migsize == data_size) {
>>>> +            data_size += PAGE_SIZE;
>>>> +            mig_data = g_realloc(mig_data, data_size);
>>>> +        }
>>>> +    }
>>>> +
>>>> +    return NULL;
>>>> +}
>>>> +
>>>> +static int proxy_pre_save(void *opaque)
>>>> +{
>>>> +    PCIProxyDev *pdev = opaque;
>>>> +    proxy_mig_data *mig_data;
>>>> +    QEMUFile *f_remote;
>>>> +    MPQemuMsg msg = {0};
>>>> +    QemuThread thread;
>>>> +    Error *err = NULL;
>>>> +    QIOChannel *ioc;
>>>> +    uint64_t size;
>>>> +    int fd[2];
>>>> +
>>>> +    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) {
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    ioc = qio_channel_new_fd(fd[0], &err);
>>>> +    if (err) {
>>>> +        error_report_err(err);
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    qio_channel_set_name(QIO_CHANNEL(ioc), "PCIProxyDevice-mig");
>>>> +
>>>> +    f_remote = qemu_fopen_channel_input(ioc);
>>>> +
>>>> +    pdev->migsize = 0;
>>>> +
>>>> +    mig_data = g_malloc0(sizeof(proxy_mig_data));
>>>> +    mig_data->rem = f_remote;
>>>> +    mig_data->dev = pdev;
>>>> +
>>>> +    qemu_thread_create(&thread, "Proxy MIG_OUT", proxy_mig_out, mig_data,
>>>> +                       QEMU_THREAD_DETACHED);
>>>> +
>>>> +    msg.cmd = START_MIG_OUT;
>>>> +    msg.bytestream = 0;
>>>> +    msg.num_fds = 2;
>>>> +    msg.fds[0] = fd[1];
>>>> +    msg.fds[1] = GET_REMOTE_WAIT;
>>>> +
>>>> +    mpqemu_msg_send(pdev->mpqemu_link, &msg, pdev->mpqemu_link->com);
>>>> +    size = wait_for_remote(msg.fds[1]);
>>>> +    PUT_REMOTE_WAIT(msg.fds[1]);
>>>> +
>>>> +    assert(size != ULLONG_MAX);
>>>> +
>>>> +    /*
>>>> +     * migsize is being update by a separate thread. Using volatile to
>>>> +     * instruct the compiler to fetch the value of this variable from
>>>> +     * memory during every read
>>>> +     */
>>>> +    while (*((volatile uint64_t *)&pdev->migsize) < size) {
>>>> +    }
>>>> +
>>>> +    qemu_thread_cancel(&thread);
>>>
>>> ....this is a safe way to stop the thread executing without
>>> resulting in memory being leaked.
>>>
>>> In addition thread cancellation is asynchronous, so the thread
>>> may still be using the QEMUFile object while....
>>>
>>>> +    qemu_fclose(f_remote);
>>
>> The above "wait_for_remote()" call waits for the remote process to
>> finish with Migration, and return the size of the VMSD.
>>
>> It should be safe to cancel the thread and close the file, once the
>> remote process is done sending the VMSD and we have read "size" bytes
>> from it, is it not?
> 
> Ok, so the thread is doing
> 
>      while (true) {
>          byte = qemu_get_byte(data->rem);
>          ...do something with byte...
>      }
> 
> so when the thread is cancelled it is almost certainly in the
> qemu_get_byte() call. Since you say wait_for_remote() syncs
> with the end of migration, I'll presume there's no more data
> to be read but the file is still open.
> 
> If we're using a blocking FD here we'll probably be stuck in
> read() when we're cancelled, and cancellation would probably
> be ok from looking at the current impl of QEMUFile / QIOChannel.
> If we're handling any error scenario though there could be a
> "Error *local_err" that needs freeing before cancellation.
> 
> If the fclose is processed before cancellation takes effect
> on the target thread, though, we could have a race.
> 
>    1. proxy_mig_out blocked in read from qemu_fill_buffer
> 
>    2. main thread requests async cancel
> 
>    3. main thread calls qemu_fclose which closes the FD
>       and frees the QEMUFile object
> 
>    4. proxy_mig_out thread returns from read() with
>       ret == 0 (EOF)

This wasn't happening. It would be convenient if it did.

When the file was closed by the main thread, the async thread was still
hung in qemu_fill_buffer(), instead of returning 0 (EOF). That's the
reason why we took the thread-cancellation route. We'd be glad to remove
qemu_thread_cancel().

> 
>    5. proxy_mig_out thread calls qemu_file_set_error_obj
>       on a QEMUFile object freed in (3): use after free. Oops.
> 
>    6. ..async cancel request gets delivered....
> 
> admittedly it is fairly unlikely for the async cancel
> to be delayed for so long that this sequence happens, but
> unexpected things can happen when we really don't want them.

Absolutely, we don't want to leave anything to chance.

> 
> IMHO the safe way to deal with this would be a lock-step
> sequence between the threads
> 
>     1. proxy_mig_out blocked in read from qemu_fill_buffer
>     
>     2. main thread closes the FD with qemu_file_shutdown()
>        closing both directions

Will give qemu_file_shutdown() a try.

Thank you!
--
Jag

> 
>     3. proxy_mig_out returns from read with ret == 0 (EOF)
> 
>     4. proxy_mig_out thread breaks out of its infinite loop
>        due to EOF and exits
> 
>     5. main thread calls pthread_join on proxy_mig_out
> 
>     6. main thread calls qemu_fclose()
> 
> this is easier to reason about the safety of than the cancel based
> approach IMHO.
> 
> Regards,
> Daniel
> 



* Re: [RFC v4 PATCH 09/49] multi-process: setup PCI host bridge for remote device
  2019-11-18 15:25     ` Jag Raman
@ 2019-11-21 10:37       ` Stefan Hajnoczi
  0 siblings, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-21 10:37 UTC (permalink / raw)
  To: Jag Raman
  Cc: elena.ufimtseva, fam, john.g.johnson, qemu-devel, kraxel,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, Stefan Hajnoczi, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini


On Mon, Nov 18, 2019 at 10:25:59AM -0500, Jag Raman wrote:
> On 11/13/2019 11:07 AM, Stefan Hajnoczi wrote:
> > On Thu, Oct 24, 2019 at 05:08:50AM -0400, Jagannathan Raman wrote:
> > > +static void remote_host_realize(DeviceState *dev, Error **errp)
> > > +{
> > > +    PCIHostState *pci = PCI_HOST_BRIDGE(dev);
> > > +    RemPCIHost *s = REMOTE_HOST_DEVICE(dev);
> > > +
> > > +    /*
> > > +     * TODO: the name of the bus would be provided by QEMU. Use
> > > +     * "pcie.0" for now.
> > > +     */
> > > +    pci->bus = pci_root_bus_new(DEVICE(s), "pcie.0",
> > > +                                s->mr_pci_mem, s->mr_sys_io,
> > > +                                0, TYPE_PCIE_BUS);
> > 
> > The PCI bus name could be a property and then whatever instantiates
> > RemPCIHost could set it.
> > 
> > Machine types usually hardcode the name because they assume there is
> > only one machine instance.  In the case of mpqemu this is an okay
> > starting point, but maybe multiple busses will become necessary if the
> > device emulation process handles multiple device instances - especially
> > if they are served to multiple guests like in a software-defined network
> > switch use case.
> 
> Are you referring to a case where a single remote process will emulate
> devices from multiple guests?
> 
> We haven't thought about that application. But we will certainly add the
> ability to specify the name of the bus as a parameter.

Sooner or later someone will want to run multiple devices in one device
emulation process, but it's not critical to support it in this patch
series.  I think it can be implemented later without breaking any stable
interfaces.

Stefan



* Re: [RFC v4 PATCH 13/49] multi-process: introduce proxy object
  2019-10-24  9:08 ` [RFC v4 PATCH 13/49] multi-process: introduce proxy object Jagannathan Raman
@ 2019-11-21 11:09   ` Stefan Hajnoczi
  0 siblings, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-21 11:09 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, john.g.johnson, qemu-devel, kraxel,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini


On Thu, Oct 24, 2019 at 05:08:54AM -0400, Jagannathan Raman wrote:
> diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
> new file mode 100644
> index 0000000..baba4da
> --- /dev/null
> +++ b/hw/proxy/qemu-proxy.c
> @@ -0,0 +1,247 @@
> +/*
> + * Copyright 2019, Oracle and/or its affiliates.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +#include <unistd.h>
> +#include <assert.h>
> +#include <string.h>
> +#include "qemu/osdep.h"

Most of these includes are not necessary.  Please see "Include
directives" in CODING_STYLE.rst.  "qemu/osdep.h" is always first (even
before system headers) and it already includes the common system
headers.

> +int remote_spawn(PCIProxyDev *pdev, const char *command, Error **errp)
> +{
> +    char *args[3];
> +    pid_t rpid;
> +    int fd[2] = {-1, -1};
> +    Error *local_error = NULL;
> +
> +    if (pdev->managed) {
> +        /* Child is forked by external program (such as libvirt). */
> +        return -1;
> +    }
> +
> +    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) {
> +        error_setg(errp, "Unable to create unix socket.");
> +        return -1;
> +    }
> +    /* TODO: Restrict the forked process' permissions and capabilities. */
> +    rpid = qemu_fork(&local_error);
> +
> +    if (rpid == -1) {
> +        error_setg(errp, "Unable to spawn emulation program.");
> +        close(fd[0]);
> +        close(fd[1]);
> +        return -1;
> +    }
> +
> +    if (rpid == 0) {
> +        close(fd[0]);
> +
> +        args[0] = g_strdup(command);
> +        args[1] = g_strdup_printf("%d", fd[1]);
> +        args[2] = NULL;
> +        execvp(args[0], (char *const *)args);

execv(3) is safer because it doesn't search PATH.  Unless searching PATH
is really needed I would use that instead just in case this is ever
deployed in an environment where an attacker controls a directory in
PATH or is able to set PATH.
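
For illustration only, here is a minimal sketch of the execv() variant.
The helper names is_absolute_path() and exec_remote() are hypothetical
and not part of this series.  The sketch assumes the caller has already
resolved the emulation binary to an absolute path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: true only for absolute paths, so PATH is never used. */
static bool is_absolute_path(const char *path)
{
    return path != NULL && path[0] == '/';
}

/*
 * Hypothetical helper: replace the current process image with 'path'.
 * Returns -1 if the path is rejected or if execv() itself fails;
 * on success it does not return.
 */
static int exec_remote(const char *path, char *const argv[])
{
    if (!is_absolute_path(path)) {
        return -1;
    }
    assert(argv != NULL && argv[0] != NULL);
    execv(path, argv);   /* unlike execvp(), this never searches PATH */
    return -1;
}
```

Rejecting relative names up front means a bare "qemu-scsi-dev" fails
loudly instead of being looked up in whatever PATH the child inherited.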

> +static int config_op_send(PCIProxyDev *dev, uint32_t addr, uint32_t *val, int l,
> +                          unsigned int op)
> +{
> +    MPQemuMsg msg;
> +    struct conf_data_msg conf_data;
> +    int wait;
> +
> +    memset(&msg, 0, sizeof(MPQemuMsg));
> +    conf_data.addr = addr;
> +    conf_data.val = (op == CONF_WRITE) ? *val : 0;
> +    conf_data.l = l;
> +
> +    msg.data2 = (uint8_t *)malloc(sizeof(conf_data));
> +    if (!msg.data2) {
> +        return -ENOMEM;
> +    }
> +
> +    memcpy(msg.data2, (const uint8_t *)&conf_data, sizeof(conf_data));
> +    msg.size = sizeof(conf_data);

Why malloc msg.data2 instead of simply pointing it at conf_data?
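
Roughly what I have in mind, as a sketch only.  SketchMsg and
fill_config_msg() are simplified, hypothetical stand-ins for MPQemuMsg
and the code above, and the sketch assumes the send is synchronous, so
conf_data stays alive on the caller's stack until the message has been
written out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the structures in the patch (layout assumed). */
struct conf_data_msg {
    uint32_t addr;
    uint32_t val;
    int l;
};

typedef struct {
    uint8_t *data2;
    size_t size;
} SketchMsg;

/*
 * Fill 'msg' so that data2 points directly at the caller's conf_data.
 * No heap allocation, so there is nothing to free and no -ENOMEM path.
 */
static void fill_config_msg(SketchMsg *msg, struct conf_data_msg *conf,
                            uint32_t addr, uint32_t val, int len)
{
    assert(msg != NULL && conf != NULL);

    memset(msg, 0, sizeof(*msg));
    conf->addr = addr;
    conf->val = val;
    conf->l = len;

    msg->data2 = (uint8_t *)conf;
    msg->size = sizeof(*conf);
}
```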

> +    msg.cmd = op;
> +    msg.bytestream = 1;
> +
> +    if (op == CONF_WRITE) {
> +        msg.num_fds = 0;
> +    } else {
> +        wait = GET_REMOTE_WAIT;

It seems slow to create an fd and pass it for each 32-bit PCI
Configuration Space read operation.  This doesn't need to be changed
right now, but eventually the protocol should handle this more
efficiently.

> +        msg.num_fds = 1;
> +        msg.fds[0] = wait;
> +    }
> +
> +    mpqemu_msg_send(dev->mpqemu_link, &msg, dev->mpqemu_link->com);
> +
> +    if (op == CONF_READ) {
> +        *val = (uint32_t)wait_for_remote(wait);
> +        PUT_REMOTE_WAIT(wait);
> +    }

Waiting for the eventfd blocks the event loop.  This means timers and
other fds won't be serviced when the remote is slow to respond.  Please
avoid blocking operations in event loop threads.
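
A minimal, Linux-only sketch of the non-blocking alternative follows.
PendingReply and reply_ready() are hypothetical names.  The sketch
assumes the eventfd was created with EFD_NONBLOCK and that the handler
is invoked by the event loop when the fd becomes readable (in QEMU that
registration would typically go through something like
qemu_set_fd_handler(), which is omitted here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical per-request state kept while the reply is outstanding. */
typedef struct {
    int fd;          /* eventfd opened with EFD_NONBLOCK */
    bool done;       /* set once the reply value has arrived */
    uint64_t val;    /* reply value read from the eventfd */
} PendingReply;

/*
 * fd read handler: never blocks.  If the counter is still zero the
 * read fails with EAGAIN and we simply wait for the next notification.
 */
static void reply_ready(void *opaque)
{
    PendingReply *p = opaque;
    uint64_t v;

    assert(p != NULL);
    if (read(p->fd, &v, sizeof(v)) == (ssize_t)sizeof(v)) {
        p->val = v;
        p->done = true;
    }
}
```

The caller then completes the config read from the handler (or a
continuation) instead of sitting in wait_for_remote().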

> +
> +    free(msg.data2);
> +
> +    return 0;
> +}
> +
> +static uint32_t pci_proxy_read_config(PCIDevice *d, uint32_t addr, int len)
> +{
> +    uint32_t val;
> +
> +    (void)pci_default_read_config(d, addr, len);

What is the purpose of this call?

> +static const TypeInfo pci_proxy_dev_type_info = {
> +    .name          = TYPE_PCI_PROXY_DEV,
> +    .parent        = TYPE_PCI_DEVICE,
> +    .instance_size = sizeof(PCIProxyDev),
> +    .abstract      = true,
> +    .class_size    = sizeof(PCIProxyDevClass),
> +    .class_init    = pci_proxy_dev_class_init,
> +    .interfaces = (InterfaceInfo[]) {
> +        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
> +        { },
> +    },
> +};

It would be nice for -device pci-proxy-dev to work as a placeholder for
*any* PCI bus device without the need to define concrete subclasses.
Could the protocol exchange the PCI device configuration (similar to
VFIO and muser ioctls) so that this single object can act as any remote
PCI device?

> diff --git a/include/hw/proxy/qemu-proxy.h b/include/hw/proxy/qemu-proxy.h
> new file mode 100644
> index 0000000..3648a77
> --- /dev/null
> +++ b/include/hw/proxy/qemu-proxy.h
> @@ -0,0 +1,81 @@
> +/*
> + * Copyright 2019, Oracle and/or its affiliates.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#ifndef QEMU_PROXY_H
> +#define QEMU_PROXY_H
> +
> +#include "io/mpqemu-link.h"
> +
> +#define TYPE_PCI_PROXY_DEV "pci-proxy-dev"
> +
> +#define PCI_PROXY_DEV(obj) \
> +            OBJECT_CHECK(PCIProxyDev, (obj), TYPE_PCI_PROXY_DEV)
> +
> +#define PCI_PROXY_DEV_CLASS(klass) \
> +            OBJECT_CLASS_CHECK(PCIProxyDevClass, (klass), TYPE_PCI_PROXY_DEV)
> +
> +#define PCI_PROXY_DEV_GET_CLASS(obj) \
> +            OBJECT_GET_CLASS(PCIProxyDevClass, (obj), TYPE_PCI_PROXY_DEV)
> +
> +typedef struct PCIProxyDev {
> +    PCIDevice parent_dev;
> +
> +    int n_mr_sections;
> +    MemoryRegionSection *mr_sections;

Unused.

> +
> +    MPQemuLinkState *mpqemu_link;
> +
> +    EventNotifier intr;
> +    EventNotifier resample;

Unused.

> +
> +    pid_t remote_pid;
> +    int rsocket;
> +    int socket;

What is the difference between rsocket and socket?  Why is socket only
read in this patch and never written?

> +
> +    char *rid;

Can remote_pid and rid be unified?  They store the same value in
different representations.

> +
> +    bool managed;
> +    char *dev_id;

dev_id is unused.

> +
> +    QLIST_ENTRY(PCIProxyDev) next;

Unused.

> +
> +    void (*set_proxy_sock) (PCIDevice *dev, int socket);
> +    int (*get_proxy_sock) (PCIDevice *dev);
> +
> +    void (*set_remote_opts) (PCIDevice *dev, QDict *qdict, unsigned int cmd);
> +    void (*proxy_ready) (PCIDevice *dev);

Unused.

> +    void (*init_proxy) (PCIDevice *pdev, char *command, Error **errp);

Why are these function pointers not in PCIProxyDevClass?

> +
> +} PCIProxyDev;
> +
> +typedef struct PCIProxyDevClass {
> +    PCIDeviceClass parent_class;
> +
> +    void (*realize)(PCIProxyDev *dev, Error **errp);
> +
> +    char *command;
> +} PCIProxyDevClass;
> +
> +int remote_spawn(PCIProxyDev *pdev, const char *command, Error **errp);

Does this function need to be publicly visible?

> diff --git a/remote/remote-main.c b/remote/remote-main.c
> index 7689b57..6c2eb91 100644
> --- a/remote/remote-main.c
> +++ b/remote/remote-main.c
> @@ -50,6 +50,32 @@
>  static MPQemuLinkState *mpqemu_link;
>  PCIDevice *remote_pci_dev;
>  
> +static void process_config_write(MPQemuMsg *msg)
> +{
> +    struct conf_data_msg *conf = (struct conf_data_msg *)msg->data2;
> +
> +    qemu_mutex_lock_iothread();
> +    pci_default_write_config(remote_pci_dev, conf->addr, conf->val, conf->l);
> +    qemu_mutex_unlock_iothread();
> +}
> +
> +static void process_config_read(MPQemuMsg *msg)
> +{
> +    struct conf_data_msg *conf = (struct conf_data_msg *)msg->data2;
> +    uint32_t val;
> +    int wait;
> +
> +    wait = msg->fds[0];
> +
> +    qemu_mutex_lock_iothread();
> +    val = pci_default_read_config(remote_pci_dev, conf->addr, conf->l);
> +    qemu_mutex_unlock_iothread();
> +
> +    notify_proxy(wait, val);
> +
> +    PUT_REMOTE_WAIT(wait);
> +}

Input validation is missing in these message handler functions.  I won't
look out for this in patches that follow anymore.  All message handler
functions need to be audited.  They must check the message size before
accessing fields, that fds[0] was indeed passed, etc.
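
As a sketch of the kind of check I mean (not a complete audit):
SketchMsg, SKETCH_MAX_FDS and config_read_msg_valid() are hypothetical,
the conf_data_msg layout is assumed, and 256 bytes is the conventional
PCI configuration space size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_FDS      8    /* assumed limit, not from the patch */
#define SKETCH_PCI_CFG_SIZE 256  /* conventional PCI config space */

struct conf_data_msg {
    uint32_t addr;
    uint32_t val;
    int l;
};

typedef struct {
    int num_fds;
    int fds[SKETCH_MAX_FDS];
    size_t size;
    uint8_t *data2;
} SketchMsg;

/* Validate an untrusted CONF_READ message before touching its fields. */
static bool config_read_msg_valid(const SketchMsg *msg)
{
    struct conf_data_msg conf;

    assert(msg != NULL);

    /* The payload must be present and exactly the expected size. */
    if (msg->data2 == NULL || msg->size != sizeof(conf)) {
        return false;
    }
    /* Exactly one fd (the wait eventfd) must have been passed. */
    if (msg->num_fds != 1 || msg->fds[0] < 0) {
        return false;
    }

    /* Copy out first so later checks and uses see the same values. */
    memcpy(&conf, msg->data2, sizeof(conf));

    if (conf.l != 1 && conf.l != 2 && conf.l != 4) {
        return false;
    }
    if (conf.addr > SKETCH_PCI_CFG_SIZE - (uint32_t)conf.l) {
        return false;
    }
    return true;
}
```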

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 14/49] mutli-process: build remote command line args
  2019-10-24  9:08 ` [RFC v4 PATCH 14/49] mutli-process: build remote command line args Jagannathan Raman
@ 2019-11-21 11:23   ` Stefan Hajnoczi
  0 siblings, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-21 11:23 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, john.g.johnson, qemu-devel, kraxel,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

[-- Attachment #1: Type: text/plain, Size: 5398 bytes --]

On Thu, Oct 24, 2019 at 05:08:55AM -0400, Jagannathan Raman wrote:
> From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> ---
>  New patch in v3
> 
>  hw/proxy/qemu-proxy.c         | 80 +++++++++++++++++++++++++++++++++----------
>  include/hw/proxy/qemu-proxy.h |  2 +-
>  2 files changed, 62 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
> index baba4da..ca7dd1a 100644
> --- a/hw/proxy/qemu-proxy.c
> +++ b/hw/proxy/qemu-proxy.c
> @@ -45,47 +45,89 @@
>  
>  static void pci_proxy_dev_realize(PCIDevice *dev, Error **errp);
>  
> +static int add_argv(char *command_str, char **argv, int argc)
> +{
> +    int max_args = 64;
> +
> +    if (argc < max_args - 1) {
> +        argv[argc++] = command_str;
> +        argv[argc] = 0;
> +    } else {
> +        return 0;
> +    }
> +
> +    return argc;
> +}
> +
> +static int make_argv(char *command_str, char **argv, int argc)
> +{
> +    int max_args = 64;
> +
> +    char *p2 = strtok(command_str, " ");
> +    while (p2 && argc < max_args - 1) {
> +        argv[argc++] = p2;
> +        p2 = strtok(0, " ");
> +    }
> +    argv[argc] = 0;
> +
> +    return argc;
> +}

So "command" isn't really the command-line, it's a string of options to
append to the hardcoded qemu-scsi-dev command?

The command-line string construction needs to be cleaned up and the
hardcoded qemu-scsi-dev needs to be replaced with an argument.
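
One possible shape for the tokenizing part, as a sketch only.
append_args() is a hypothetical name.  It assumes the caller passes a
writable copy of the option string (rather than casting away const), and
it uses the reentrant strtok_r() and reports overflow instead of
silently truncating.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split the writable buffer 'buf' on spaces and append the tokens to
 * argv starting at index argc.  Tokens point into 'buf', which the
 * caller owns.  Returns the new argc, or -1 if the tokens do not fit
 * in max_args slots (including the terminating NULL).
 */
static int append_args(char *buf, char **argv, int argc, int max_args)
{
    char *save = NULL;
    char *tok;

    assert(buf != NULL && argv != NULL);
    assert(argc >= 0 && argc < max_args);

    for (tok = strtok_r(buf, " ", &save); tok != NULL;
         tok = strtok_r(NULL, " ", &save)) {
        if (argc >= max_args - 1) {
            return -1;
        }
        argv[argc++] = tok;
    }
    argv[argc] = NULL;

    return argc;
}
```

The program to execute would then be passed in as argv[0] by the caller
rather than being hardcoded inside remote_spawn().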

> +
>  int remote_spawn(PCIProxyDev *pdev, const char *command, Error **errp)

Error handling code is currently inconsistent because there is an int
-errno return value and an errp argument.  For example, errp isn't set
when pdev->managed == true.

The int -errno return value isn't needed.  It can just be bool and
errp should be set in every single error code path.
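
To illustrate the convention, here is a self-contained sketch.
SketchError, sketch_error_set() and remote_spawn_sketch() are
hypothetical stand-ins for QEMU's Error, error_setg() and the function
above; the actual socketpair/fork/exec steps are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Minimal stand-in for an error object; the caller frees it. */
typedef struct {
    const char *msg;
} SketchError;

static void sketch_error_set(SketchError **errp, const char *msg)
{
    if (errp != NULL) {
        assert(*errp == NULL);   /* never overwrite an earlier error */
        *errp = malloc(sizeof(**errp));
        if (*errp != NULL) {
            (*errp)->msg = msg;
        }
    }
}

/* Returns true on success; on every failure path errp is set. */
static bool remote_spawn_sketch(bool managed, const char *command,
                                SketchError **errp)
{
    if (managed) {
        sketch_error_set(errp, "remote process is managed externally");
        return false;
    }
    if (command == NULL || command[0] == '\0') {
        sketch_error_set(errp, "no command given");
        return false;
    }
    /* socketpair(), fork and exec would follow, each setting errp. */
    return true;
}
```

With this shape the caller only has to test the bool, and there is no
way to get a failure without a message attached.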

>  {
> -    char *args[3];
>      pid_t rpid;
>      int fd[2] = {-1, -1};
>      Error *local_error = NULL;
> +    char *argv[64];
> +    int argc = 0, _argc;
> +    char *sfd;
> +    char *exec_dir;
> +    int rc = -EINVAL;
>  
>      if (pdev->managed) {
>          /* Child is forked by external program (such as libvirt). */
> -        return -1;
> +        return rc;
>      }
>  
>      if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) {
>          error_setg(errp, "Unable to create unix socket.");
> -        return -1;
> +        return rc;
>      }
> +    exec_dir = g_strdup_printf("%s/%s", qemu_get_exec_dir(), "qemu-scsi-dev");
> +    argc = add_argv(exec_dir, argv, argc);
> +    sfd = g_strdup_printf("%d", fd[1]);
> +    argc = add_argv(sfd, argv, argc);
> +    _argc = argc;
> +    argc = make_argv((char *)command, argv, argc);
> +
>      /* TODO: Restrict the forked process' permissions and capabilities. */
>      rpid = qemu_fork(&local_error);
>  
>      if (rpid == -1) {
>          error_setg(errp, "Unable to spawn emulation program.");
>          close(fd[0]);
> -        close(fd[1]);
> -        return -1;
> +        goto fail;
>      }
>  
>      if (rpid == 0) {
>          close(fd[0]);
> -
> -        args[0] = g_strdup(command);
> -        args[1] = g_strdup_printf("%d", fd[1]);
> -        args[2] = NULL;
> -        execvp(args[0], (char *const *)args);
> +        execvp(argv[0], (char *const *)argv);
>          exit(1);
>      }
>      pdev->remote_pid = rpid;
> -    pdev->rsocket = fd[0];
> +    pdev->rsocket = fd[1];
> +    pdev->socket = fd[0];

Please choose meaningful names for these fields.  I'm not sure why both
need to be kept around though...

>  
> +    rc = 0;
> +
> +fail:
>      close(fd[1]);
>  
> -    return 0;
> +    for (int i = 0; i < _argc; i++) {
> +        g_free(argv[i]);
> +    }
> +
> +    return rc;
>  }
>  
>  static int get_proxy_sock(PCIDevice *dev)
> @@ -94,7 +136,7 @@ static int get_proxy_sock(PCIDevice *dev)
>  
>      pdev = PCI_PROXY_DEV(dev);
>  
> -    return pdev->rsocket;
> +    return pdev->socket;
>  }
>  
>  static void set_proxy_sock(PCIDevice *dev, int socket)
> @@ -103,7 +145,7 @@ static void set_proxy_sock(PCIDevice *dev, int socket)
>  
>      pdev = PCI_PROXY_DEV(dev);
>  
> -    pdev->rsocket = socket;
> +    pdev->socket = socket;
>      pdev->managed = true;
>  
>  }
> @@ -198,16 +240,16 @@ static void pci_proxy_dev_register_types(void)
>  
>  type_init(pci_proxy_dev_register_types)
>  
> -static void init_proxy(PCIDevice *dev, char *command, Error **errp)
> +static void init_proxy(PCIDevice *dev, char *command, bool need_spawn, Error **errp)
>  {
>      PCIProxyDev *pdev = PCI_PROXY_DEV(dev);
>      Error *local_error = NULL;
>  
>      if (!pdev->managed) {
> -        if (command) {
> -            remote_spawn(pdev, command, &local_error);
> -        } else {
> -            return;
> +        if (need_spawn) {

pdev->managed, command == NULL, need_spawn are all ways of checking
whether the remote process needs to be spawned.  Why are all of them
necessary and can they be simplified?

> +            if (!remote_spawn(pdev, command, &local_error)) {
> +                return;

local_error needs to be propagated.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 15/49] multi-process: PCI BAR read/write handling for proxy & remote endpoints
  2019-10-24  9:08 ` [RFC v4 PATCH 15/49] multi-process: PCI BAR read/write handling for proxy & remote endpoints Jagannathan Raman
@ 2019-11-21 11:33   ` Stefan Hajnoczi
  0 siblings, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-21 11:33 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, john.g.johnson, qemu-devel, kraxel,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

[-- Attachment #1: Type: text/plain, Size: 1697 bytes --]

On Thu, Oct 24, 2019 at 05:08:56AM -0400, Jagannathan Raman wrote:
> +const MemoryRegionOps proxy_default_ops = {

Unused.  Please structure patch series so that each patch is a
self-contained logical change.  It should be possible to review the
series in order from start to finish.

If there is no user yet and this is a public API then there need to be
doc comments describing the API.

> diff --git a/include/io/mpqemu-link.h b/include/io/mpqemu-link.h
> index 7ef8207..89f04c5 100644
> --- a/include/io/mpqemu-link.h
> +++ b/include/io/mpqemu-link.h
> @@ -52,6 +52,8 @@
>   * CONF_READ        PCI config. space read
>   * CONF_WRITE       PCI config. space write
>   * SYNC_SYSMEM      Shares QEMU's RAM with remote device's RAM
> + * BAR_WRITE        Writes to PCI BAR region
> + * BAR_READ         Reads from PCI BAR region

Is it possible to generalize this to memory regions instead of PCI BARs?
That way non-PCI devices will be able to use the same protocol messages
and code.  VFIO describes BARs generically too for the same reason, see
<linux/vfio.h> struct vfio_region_info.

>   *
>   * proc_cmd_t enum type to specify the command to be executed on the remote
>   * device.
> @@ -61,6 +63,8 @@ typedef enum {
>      CONF_READ,
>      CONF_WRITE,
>      SYNC_SYSMEM,
> +    BAR_WRITE,
> +    BAR_READ,
>      MAX,
>  } mpqemu_cmd_t;
>  
> @@ -84,6 +88,13 @@ typedef struct {
>  } sync_sysmem_msg_t;
>  
>  typedef struct {
> +    hwaddr addr;
> +    uint64_t val;
> +    unsigned size;
> +    bool memory;

Why is this field necessary?  Whether this is a memory access or not
should be implicit from the address/BAR we are accessing.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 16/49] multi-process: Add LSI device proxy object
  2019-10-24  9:08 ` [RFC v4 PATCH 16/49] multi-process: Add LSI device proxy object Jagannathan Raman
@ 2019-11-21 11:35   ` Stefan Hajnoczi
  0 siblings, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-21 11:35 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, john.g.johnson, qemu-devel, kraxel,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

[-- Attachment #1: Type: text/plain, Size: 368 bytes --]

On Thu, Oct 24, 2019 at 05:08:57AM -0400, Jagannathan Raman wrote:
> Adds proxy-lsi53c895a object, as a derivative of the pci-proxy-dev
> object. This object is the proxy for the lsi53c895a object
> instantiated by the remote process.

The same information could be fetched from the remote process.  That
would eliminate the need for per-device proxy objects.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 17/49] multi-process: Synchronize remote memory
  2019-10-24  9:08 ` [RFC v4 PATCH 17/49] multi-process: Synchronize remote memory Jagannathan Raman
@ 2019-11-21 11:44   ` Stefan Hajnoczi
  0 siblings, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-21 11:44 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, john.g.johnson, qemu-devel, kraxel,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

[-- Attachment #1: Type: text/plain, Size: 1362 bytes --]

On Thu, Oct 24, 2019 at 05:08:58AM -0400, Jagannathan Raman wrote:
> +static const TypeInfo remote_mem_sync_type_info = {
> +    .name          = TYPE_MEMORY_LISTENER,
> +    .parent        = TYPE_OBJECT,
> +    .instance_size = sizeof(RemoteMemSync),
> +};
> +
> +static void remote_mem_sync_register_types(void)
> +{
> +    type_register_static(&remote_mem_sync_type_info);
> +}
> +
> +type_init(remote_mem_sync_register_types)

Why is a QEMU Object necessary for the memory listener?  QEMU Objects
are used for the device model and -object.  The memory listener is an
internal concept that doesn't need to be exposed as a QEMU Object.  It's
fine to use plain C structs and functions, not everything needs to be a
QEMU Object.

> +/*
> + * TODO: Memory Sync need not be instantianted once per every proxy device.
> + *       All remote devices are going to get the exact same updates at the
> + *       same time. It therefore makes sense to have a broadcast model.
> + *
> + *       Broadcast model would involve running the MemorySync object in a
> + *       thread. MemorySync would contain a list of mpqemu-link objects
> + *       that need notification. proxy_ml_commit() could send the same
> + *       message to all the links at the same time.

Once mpqemu-link is made event-loop friendly (asynchronous) it won't be
necessary to create more threads.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 18/49] multi-process: create IOHUB object to handle irq
  2019-10-24  9:08 ` [RFC v4 PATCH 18/49] multi-process: create IOHUB object to handle irq Jagannathan Raman
@ 2019-11-21 12:02   ` Stefan Hajnoczi
  0 siblings, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-21 12:02 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, john.g.johnson, qemu-devel, kraxel,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

[-- Attachment #1: Type: text/plain, Size: 1833 bytes --]

On Thu, Oct 24, 2019 at 05:08:59AM -0400, Jagannathan Raman wrote:

I don't know the interrupt code well enough to decide whether it's
necessary to do so much work and tie the protocol to the KVM API.  The
main QEMU process already has the KVM API code and the ability to deal
with these things.  I was expecting something much simpler, like
protocol messages that pass a single eventfd for raising an interrupt
and no state tracking interrupt levels.

> +static void intr_resample_handler(void *opaque)
> +{
> +    ResampleToken *token = opaque;
> +    RemoteIOHubState *iohub = token->iohub;
> +    uint64_t val;
> +    int pirq, s;
> +
> +    pirq = token->pirq;
> +
> +    s = read(event_notifier_get_fd(&iohub->resamplefds[pirq]), &val,
> +             sizeof(uint64_t));

Please use event_notifier_test_and_clear().

> +
> +    assert(s >= 0);
> +
> +    qemu_mutex_lock(&iohub->irq_level_lock[pirq]);
> +
> +    if (iohub->irq_level[pirq]) {
> +        event_notifier_set(&iohub->irqfds[pirq]);
> +    }
> +
> +    qemu_mutex_unlock(&iohub->irq_level_lock[pirq]);
> +}
> +
> +void process_set_irqfd_msg(PCIDevice *pci_dev, MPQemuMsg *msg)

This function doesn't gracefully handle the case where SET_IRQFD is sent
multiple times for the same interrupt.

> +{
> +    RemMachineState *machine = REMOTE_MACHINE(current_machine);
> +    RemoteIOHubState *iohub = machine->iohub;
> +    ResampleToken *token;
> +    int pirq = remote_iohub_map_irq(pci_dev, msg->data1.set_irqfd.intx);
> +
> +    assert(msg->num_fds == 2);
> +
> +    event_notifier_init_fd(&iohub->irqfds[pirq], msg->fds[0]);
> +    event_notifier_init_fd(&iohub->resamplefds[pirq], msg->fds[1]);

event_notifier_cleanup() is missing.

> +
> +    token = g_malloc0(sizeof(ResampleToken));

I couldn't find a g_free() and wonder if this needs to be malloced at
all.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 19/49] multi-process: configure remote side devices
  2019-10-24  9:09 ` [RFC v4 PATCH 19/49] multi-process: configure remote side devices Jagannathan Raman
@ 2019-11-21 12:05   ` Stefan Hajnoczi
  0 siblings, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-21 12:05 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, john.g.johnson, qemu-devel, kraxel,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

[-- Attachment #1: Type: text/plain, Size: 696 bytes --]

On Thu, Oct 24, 2019 at 05:09:00AM -0400, Jagannathan Raman wrote:
> +static void set_remote_opts(PCIDevice *dev, QDict *qdict, unsigned int cmd)
> +{
> +    QString *qstr;
> +    MPQemuMsg msg;
> +    const char *str;
> +    PCIProxyDev *pdev;
> +
> +    pdev = PCI_PROXY_DEV(dev);
> +
> +    qstr = qobject_to_json(QOBJECT(qdict));

qstr is leaked.

> +    str = qstring_get_str(qstr);
> +
> +    memset(&msg, 0, sizeof(MPQemuMsg));
> +
> +    msg.data2 = (uint8_t *)str;
> +    msg.cmd = cmd;
> +    msg.bytestream = 1;
> +    msg.size = qstring_get_length(qstr) + 1;
> +    msg.num_fds = 0;
> +
> +    mpqemu_msg_send(pdev->mpqemu_link, &msg, pdev->mpqemu_link->com);
> +
> +    return;
> +}

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 20/49] multi-process: add qdev_proxy_add to create proxy devices
  2019-10-24  9:09 ` [RFC v4 PATCH 20/49] multi-process: add qdev_proxy_add to create proxy devices Jagannathan Raman
@ 2019-11-21 12:16   ` Stefan Hajnoczi
  0 siblings, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-21 12:16 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, john.g.johnson, qemu-devel, kraxel,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

[-- Attachment #1: Type: text/plain, Size: 4364 bytes --]

On Thu, Oct 24, 2019 at 05:09:01AM -0400, Jagannathan Raman wrote:
> diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
> index 3b84055..fc1c731 100644
> --- a/hw/proxy/qemu-proxy.c
> +++ b/hw/proxy/qemu-proxy.c
> @@ -337,7 +337,8 @@ static void init_proxy(PCIDevice *dev, char *command, bool need_spawn, Error **e
>  
>      if (!pdev->managed) {
>          if (need_spawn) {
> -            if (!remote_spawn(pdev, command, &local_error)) {
> +            if (remote_spawn(pdev, command, &local_error)) {
> +                fprintf(stderr, "remote spawn failed\n");
>                  return;
>              }
>          }

Looks like a fix for a bug in a previous patch.  Please squash it.
Also, please propagate local_error and do not use fprintf in a function
that has an errp argument for reporting errors.

> +#if defined(CONFIG_MPQEMU)

Maybe these functions should be in a separate file that the makefile
includes when CONFIG_MPQEMU is defined.

> +
> +static PCIProxyDev *get_proxy_object_rid(const char *rid)
> +{
> +    PCIProxyDev *entry;
> +    if (!proxy_list_lock.initialized) {
> +        QLIST_INIT(&proxy_dev_list.devices);
> +        qemu_mutex_init(&proxy_list_lock);
> +    }

This locking approach is broken since exactly-once initialization
semantics are required to avoid races during initialization.  Is the
lock needed at all?
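
If the lock does turn out to be needed, static initialization avoids the
race entirely.  Below is a sketch using plain pthreads; ProxyEntry,
proxy_list_add() and proxy_list_find() are hypothetical names.  In QEMU
the analogous approach would be a statically initialized list head plus
a mutex set up once at startup, but that wiring is an assumption and not
shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

typedef struct ProxyEntry {
    const char *rid;
    struct ProxyEntry *next;
} ProxyEntry;

/* Both objects are valid before any code runs: no lazy init, no race. */
static pthread_mutex_t proxy_list_lock = PTHREAD_MUTEX_INITIALIZER;
static ProxyEntry *proxy_list_head;   /* NULL means the list is empty */

static void proxy_list_add(ProxyEntry *e)
{
    assert(e != NULL && e->rid != NULL);

    pthread_mutex_lock(&proxy_list_lock);
    e->next = proxy_list_head;
    proxy_list_head = e;
    pthread_mutex_unlock(&proxy_list_lock);
}

/* Returns the entry with a matching rid, or NULL if there is none. */
static ProxyEntry *proxy_list_find(const char *rid)
{
    ProxyEntry *e;

    assert(rid != NULL);

    pthread_mutex_lock(&proxy_list_lock);
    for (e = proxy_list_head; e != NULL; e = e->next) {
        if (strcmp(e->rid, rid) == 0) {
            break;
        }
    }
    pthread_mutex_unlock(&proxy_list_lock);

    return e;
}
```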

> +DeviceState *qdev_remote_add(QemuOpts *opts, bool device, Error **errp)
> +{
> +    PCIProxyDev *pdev = NULL;
> +    DeviceState *dev;
> +    const char *rid, *rsocket = NULL, *command = NULL;
> +    QDict *qdict_new;
> +    const char *id = NULL;
> +    const char *driver = NULL;
> +    const char *bus = NULL;
> +
> +    if (!proxy_list_lock.initialized) {
> +        QLIST_INIT(&proxy_dev_list.devices);
> +        qemu_mutex_init(&proxy_list_lock);
> +    }
> +
> +    rid = qemu_opt_get(opts, "rid");
> +    if (!rid) {
> +        error_setg(errp, "rdevice option needs rid specified.");
> +        return NULL;
> +    }
> +    if (device) {
> +        driver = qemu_opt_get(opts, "driver");
> +        /* TODO: properly identify the device class. */
> +        if (strncmp(driver, "lsi", 3) == 0) {
> +            id = qemu_opts_id(opts);
> +            if (!id) {
> +                error_setg(errp, "qdev_remote_add option needs id specified.");
> +                return NULL;
> +            }
> +        }
> +    }
> +
> +    rsocket = qemu_opt_get(opts, "socket");
> +    if (rsocket) {
> +        if (strlen(rsocket) > MAX_RID_LENGTH) {
> +            error_setg(errp, "Socket number is incorrect.");
> +            return NULL;
> +        }
> +    }
> +    /*
> +     * TODO: verify command with known commands and on remote end.
> +     * How else can we verify the binary we launch without libvirtd support?
> +     */
> +    command = qemu_opt_get(opts, "command");
> +    if (!rsocket && !command) {
> +        error_setg(errp, "rdevice option needs socket or command specified.");
> +        return NULL;
> +    }
> +
> +    bus = qemu_opt_get(opts, "bus");
> +    dev = qdev_proxy_add(rid, id, (char *)bus, (char *)command,
> +                         rsocket ? atoi(rsocket) : -1,
> +                         rsocket ? true : false, errp);
> +    if (!dev) {
> +        error_setg(errp, "qdev_proxy_add error.");
> +        return NULL;
> +    }
> +
> +    qdict_new = qemu_opts_to_qdict(opts, NULL);
> +
> +    if (!qdict_new) {
> +        error_setg(errp, "Could not parse rdevice options.");
> +        return NULL;
> +    }
> +
> +    pdev = PCI_PROXY_DEV(dev);
> +    if (!pdev->set_remote_opts) {
> +        /* TODO: destroy proxy? */
> +        error_setg(errp, "set_remote_opts failed.");
> +        return NULL;
> +    } else {
> +        if (id && !pdev->dev_id) {
> +            pdev->dev_id = g_strdup(id);
> +        }
> +        pdev->set_remote_opts(PCI_DEVICE(pdev), qdict_new,
> +                              device ? DEV_OPTS : DRIVE_OPTS);

This function needs to be able to return an error if setting the options
failed.  A response message needs to be defined in the protocol to
support this.

Is DRIVE_OPTS still needed?  I thought the drives would be configured in
the remote process and no proxy objects were needed on the QEMU side?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 30/49] multi-process: send heartbeat messages to remote
  2019-11-13 16:01     ` Jag Raman
@ 2019-11-21 12:19       ` Stefan Hajnoczi
  0 siblings, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-21 12:19 UTC (permalink / raw)
  To: Jag Raman
  Cc: elena.ufimtseva, fam, john.g.johnson, qemu-devel, kraxel,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, Stefan Hajnoczi, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

[-- Attachment #1: Type: text/plain, Size: 2155 bytes --]

On Wed, Nov 13, 2019 at 11:01:07AM -0500, Jag Raman wrote:
> 
> 
> On 11/11/2019 11:27 AM, Stefan Hajnoczi wrote:
> > On Thu, Oct 24, 2019 at 05:09:11AM -0400, Jagannathan Raman wrote:
> > > +static void broadcast_msg(MPQemuMsg *msg, bool need_reply)
> > > +{
> > > +    PCIProxyDev *entry;
> > > +    unsigned int pid;
> > > +    int wait;
> > > +
> > > +    QLIST_FOREACH(entry, &proxy_dev_list.devices, next) {
> > > +        if (need_reply) {
> > > +            wait = eventfd(0, EFD_NONBLOCK);
> > > +            msg->num_fds = 1;
> > > +            msg->fds[0] = wait;
> > > +        }
> > > +
> > > +        mpqemu_msg_send(entry->mpqemu_link, msg, entry->mpqemu_link->com);
> > > +        if (need_reply) {
> > > +            pid = (uint32_t)wait_for_remote(wait);
> > 
> > Sometimes QEMU really needs to wait for the remote process before it can
> > make progress.  I think this is not one of those cases though.
> > 
> > Since QEMU is event-driven it's problematic to invoke blocking system
> > calls.  The remote process might not respond for a significant amount of
> > time.  Other QEMU threads will be held up waiting for the QEMU global
> > mutex in the meantime (because we hold it!).
> 
> There are places where we wait synchronously for the remote process.
> However, these synchronous waits carry a timeout to prevent the hang
> situation you described above.
> 
> We will add an error recovery in the future. That is, we will respawn
> the remote process if the QEMU times out waiting for it.

Even with a timeout, in the meantime the event loop is blocked.  That
means timers will be delayed by a large amount, the monitor will be
unresponsive, etc.

> > 
> > Please implement heartbeat/ping asynchronously.  The wait eventfd should
> > be read by an event loop fd handler instead.  That way QEMU can continue
> > with running the VM while waiting for the remote process.
> 
> In the current implementation, the heartbeat/ping is asynchronous.
> start_heartbeat_timer() sets up a timer to perform the ping.

The heartbeat/ping is synchronous because broadcast_msg() blocks.
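
A minimal sketch of the non-blocking alternative, assuming Linux eventfd
and a plain poll() loop standing in for QEMU's main loop.  The Heartbeat
type and the heartbeat_*/event_loop_iterate names are hypothetical and do
not come from the patch series; in QEMU the handler would presumably be
registered with qemu_set_fd_handler() instead of being polled by hand.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical per-device heartbeat state. */
typedef struct {
    int wait_fd;          /* eventfd the remote signals when it replies */
    int awaiting_reply;   /* nonzero while a ping is outstanding */
    uint64_t last_value;  /* value the remote wrote to the eventfd */
} Heartbeat;

/* Create the eventfd that would be passed to the remote with the ping. */
static int heartbeat_start(Heartbeat *hb)
{
    hb->wait_fd = eventfd(0, EFD_NONBLOCK);
    if (hb->wait_fd < 0) {
        return -1;
    }
    hb->awaiting_reply = 1;
    hb->last_value = 0;
    return 0;
}

/* fd handler: runs only once the event loop sees the eventfd readable. */
static void heartbeat_on_readable(Heartbeat *hb)
{
    uint64_t val;

    if (read(hb->wait_fd, &val, sizeof(val)) == (ssize_t)sizeof(val)) {
        hb->last_value = val;
        hb->awaiting_reply = 0;
        close(hb->wait_fd);
        hb->wait_fd = -1;
    }
}

/*
 * One iteration of a stand-in event loop.  Returns 1 if the handler ran,
 * 0 if nothing was ready within timeout_ms, -1 on poll() error.
 */
static int event_loop_iterate(Heartbeat *hb, int timeout_ms)
{
    struct pollfd pfd = { .fd = hb->wait_fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);

    if (n > 0 && (pfd.revents & POLLIN)) {
        heartbeat_on_readable(hb);
        return 1;
    }
    return n < 0 ? -1 : 0;
}
```

The point of the design is that sending the ping returns immediately; the
reply is consumed later by the handler, so timers and the monitor keep
running while the remote process is slow to answer.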

Stefan

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 32/49] multi-process: Use separate MMIO communication channel
  2019-11-13 16:14     ` Jag Raman
@ 2019-11-21 12:31       ` Stefan Hajnoczi
  0 siblings, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-21 12:31 UTC (permalink / raw)
  To: Jag Raman
  Cc: elena.ufimtseva, fam, john.g.johnson, qemu-devel, kraxel,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, Stefan Hajnoczi, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

On Wed, Nov 13, 2019 at 11:14:50AM -0500, Jag Raman wrote:
> On 11/11/2019 11:21 AM, Stefan Hajnoczi wrote:
> > On Thu, Oct 24, 2019 at 05:09:13AM -0400, Jagannathan Raman wrote:
> > > Using a separate communication channel for MMIO helps
> > > with improving Performance
> > 
> > Why?
> 
> Typical initiation of IO operations involves multiple MMIO accesses per
> IO operation. In some legacy devices like LSI, the completion of the IO
> operations is also accomplished by polling on MMIO registers. Therefore,
> MMIO traffic can be hefty in some cases and contribute to Performance.
> 
> Having a dedicated channel for MMIO ensures that it doesn't have to
> compete with other messages to the remote process, especially when there
> are multiple devices emulated by a single remote process.

A vCPU doing a polling read on an MMIO register will cause a BAR_READ
message to be sent to the remote process.  The vCPU thread waits for the
response to this message.

When there are multiple remote devices each has its own socket, so
communication with different remote processes does not interfere.

The only scenarios I can think of are:
1. Interference within a single device between vCPUs and/or the QEMU
   monitor.
2. A single process serving multiple devices that is implemented in a
   way such that different devices interfere with each other.

It sounds like you are saying the problem is #2, but this is still
unclear to me.  If the remote process can be implemented in a way such
that there is no interference when each device has a special MMIO
socket, then why can't it be implemented in a way such that there is no
interference when each device's main socket is used (each device has
its own!).

Maybe I've missed the point.  It would be good if you could explain in
more detail.

Stefan

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (51 preceding siblings ...)
  2019-10-25  2:10 ` no-reply
@ 2019-11-21 12:46 ` Stefan Hajnoczi
  2019-12-10  6:47 ` [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update Elena Ufimtseva
  53 siblings, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-11-21 12:46 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, john.g.johnson, qemu-devel, kraxel,
	quintela, mst, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, stefanha, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau, pbonzini

On Thu, Oct 24, 2019 at 05:08:41AM -0400, Jagannathan Raman wrote:
> Started with the presentation in October 2017 made by Marc-Andre (Red Hat)
> and Konrad Wilk (Oracle) [1], and continued by Jag's BoF at KVM Forum 2018,
> the multi-process project is now a prototype and presented in this patchset.
> John & Elena will present the status of this project in KVM Forum 2019.
> 
> This first series enables the emulation of lsi53c895a in a separate process.
> 
> We posted the Proof Of Concept patches [2] before the BoF session in 2018.
> Subsequently, we posted RFC v1 [3], RFC v2 [4] and RFC v3 [5] of this series. 
> 
> We want to present version 4 of this series, which incorporates the feedback
> we received for v3 & adds support for live migrating the remote process.
> 
> Following people contributed to this patchset:
> 
> John G Johnson <john.g.johnson@oracle.com>
> Jagannathan Raman <jag.raman@oracle.com>
> Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Kanth Ghatraju <kanth.ghatraju@oracle.com>
> 
> For full concept writeup about QEMU disaggregation refer to
> docs/devel/qemu-multiprocess.rst. Please refer to 
> docs/qemu-multiprocess.txt for usage information.
> 
> We are planning on making the following improvements in the future:
>  - Performance improvements
>  - Libvirt support
>  - Enforcement of security policies
>  - blockdev support
> 
> We welcome all your ideas, concerns, and questions for this patchset.
> 
> Thank you!

I've wrapped up for v4.  There is more to review in detail but I've
posted enough comments so that I'd like to see the next revision before
investing more time.

The main topics:

1. It's possible to have just one proxy device per bus type (PCI, USB,
   etc).  The proxy device instance can be initialized by communicating
   with the remote process to inquire about its memory regions,
   interrupts, etc.  This removes the need to hardcode that information
   into per-device proxy objects, which is tedious and can get
   out-of-sync with the real device emulation code.

   This is becoming similar to doing VFIO or muser over a socket...

2. Security and code quality.  Missing input validation and resource
   leaks don't inspire confidence :(.

   Please run scripts/checkpatch.pl on the code.

Stefan

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 41/49] multi-process/mig: Enable VMSD save in the Proxy object
  2019-11-18 15:42         ` Jag Raman
@ 2019-11-22 10:34           ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 140+ messages in thread
From: Dr. David Alan Gilbert @ 2019-11-22 10:34 UTC (permalink / raw)
  To: Jag Raman
  Cc: elena.ufimtseva, fam, thuth, Daniel P. Berrangé,
	ehabkost, john.g.johnson, liran.alon, rth, konrad.wilk,
	qemu-devel, armbru, ross.lagerwall, quintela, mst, kraxel,
	stefanha, pbonzini, kanth.ghatraju, mreitz, kwolf,
	marcandre.lureau

* Jag Raman (jag.raman@oracle.com) wrote:
> 
> 
> On 11/13/2019 12:11 PM, Daniel P. Berrangé wrote:
> > On Wed, Nov 13, 2019 at 11:32:09AM -0500, Jag Raman wrote:
> > > 
> > > 
> > > On 11/13/2019 10:50 AM, Daniel P. Berrangé wrote:
> > > > On Thu, Oct 24, 2019 at 05:09:22AM -0400, Jagannathan Raman wrote:
> > > > > Collect the VMSD from remote process on the source and save
> > > > > it to the channel leading to the destination
> > > > > 
> > > > > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > > > > Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> > > > > Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> > > > > ---
> > > > >    New patch in v4
> > > > > 
> > > > >    hw/proxy/qemu-proxy.c         | 132 ++++++++++++++++++++++++++++++++++++++++++
> > > > >    include/hw/proxy/qemu-proxy.h |   2 +
> > > > >    include/io/mpqemu-link.h      |   1 +
> > > > >    3 files changed, 135 insertions(+)
> > > > > 
> > > > > diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
> > > > > index 623a6c5..ce72e6a 100644
> > > > > --- a/hw/proxy/qemu-proxy.c
> > > > > +++ b/hw/proxy/qemu-proxy.c
> > > > > @@ -52,6 +52,14 @@
> > > > >    #include "util/event_notifier-posix.c"
> > > > >    #include "hw/boards.h"
> > > > >    #include "include/qemu/log.h"
> > > > > +#include "io/channel.h"
> > > > > +#include "migration/qemu-file-types.h"
> > > > > +#include "qapi/error.h"
> > > > > +#include "io/channel-util.h"
> > > > > +#include "migration/qemu-file-channel.h"
> > > > > +#include "migration/qemu-file.h"
> > > > > +#include "migration/migration.h"
> > > > > +#include "migration/vmstate.h"
> > > > >    QEMUTimer *hb_timer;
> > > > >    static void pci_proxy_dev_realize(PCIDevice *dev, Error **errp);
> > > > > @@ -62,6 +70,9 @@ static void stop_heartbeat_timer(void);
> > > > >    static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx);
> > > > >    static void broadcast_msg(MPQemuMsg *msg, bool need_reply);
> > > > > +#define PAGE_SIZE getpagesize()
> > > > > +uint8_t *mig_data;
> > > > > +
> > > > >    static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx)
> > > > >    {
> > > > >        /* TODO: Add proper handler. */
> > > > > @@ -357,14 +368,135 @@ static void pci_proxy_dev_inst_init(Object *obj)
> > > > >        dev->mem_init = false;
> > > > >    }
> > > > > +typedef struct {
> > > > > +    QEMUFile *rem;
> > > > > +    PCIProxyDev *dev;
> > > > > +} proxy_mig_data;
> > > > > +
> > > > > +static void *proxy_mig_out(void *opaque)
> > > > > +{
> > > > > +    proxy_mig_data *data = opaque;
> > > > > +    PCIProxyDev *dev = data->dev;
> > > > > +    uint8_t byte;
> > > > > +    uint64_t data_size = PAGE_SIZE;
> > > > > +
> > > > > +    mig_data = g_malloc(data_size);
> > > > > +
> > > > > +    while (true) {
> > > > > +        byte = qemu_get_byte(data->rem);
> > > > 
> > > > There is a pretty large set of APIs hiding behind the qemu_get_byte
> > > > call, which does not give me confidence that...
> > > > 
> > > > > +        mig_data[dev->migsize++] = byte;
> > > > > +        if (dev->migsize == data_size) {
> > > > > +            data_size += PAGE_SIZE;
> > > > > +            mig_data = g_realloc(mig_data, data_size);
> > > > > +        }
> > > > > +    }
> > > > > +
> > > > > +    return NULL;
> > > > > +}
> > > > > +
> > > > > +static int proxy_pre_save(void *opaque)
> > > > > +{
> > > > > +    PCIProxyDev *pdev = opaque;
> > > > > +    proxy_mig_data *mig_data;
> > > > > +    QEMUFile *f_remote;
> > > > > +    MPQemuMsg msg = {0};
> > > > > +    QemuThread thread;
> > > > > +    Error *err = NULL;
> > > > > +    QIOChannel *ioc;
> > > > > +    uint64_t size;
> > > > > +    int fd[2];
> > > > > +
> > > > > +    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) {
> > > > > +        return -1;
> > > > > +    }
> > > > > +
> > > > > +    ioc = qio_channel_new_fd(fd[0], &err);
> > > > > +    if (err) {
> > > > > +        error_report_err(err);
> > > > > +        return -1;
> > > > > +    }
> > > > > +
> > > > > +    qio_channel_set_name(QIO_CHANNEL(ioc), "PCIProxyDevice-mig");
> > > > > +
> > > > > +    f_remote = qemu_fopen_channel_input(ioc);
> > > > > +
> > > > > +    pdev->migsize = 0;
> > > > > +
> > > > > +    mig_data = g_malloc0(sizeof(proxy_mig_data));
> > > > > +    mig_data->rem = f_remote;
> > > > > +    mig_data->dev = pdev;
> > > > > +
> > > > > +    qemu_thread_create(&thread, "Proxy MIG_OUT", proxy_mig_out, mig_data,
> > > > > +                       QEMU_THREAD_DETACHED);
> > > > > +
> > > > > +    msg.cmd = START_MIG_OUT;
> > > > > +    msg.bytestream = 0;
> > > > > +    msg.num_fds = 2;
> > > > > +    msg.fds[0] = fd[1];
> > > > > +    msg.fds[1] = GET_REMOTE_WAIT;
> > > > > +
> > > > > +    mpqemu_msg_send(pdev->mpqemu_link, &msg, pdev->mpqemu_link->com);
> > > > > +    size = wait_for_remote(msg.fds[1]);
> > > > > +    PUT_REMOTE_WAIT(msg.fds[1]);
> > > > > +
> > > > > +    assert(size != ULLONG_MAX);
> > > > > +
> > > > > +    /*
> > > > > +     * migsize is being updated by a separate thread. Using volatile to
> > > > > +     * instruct the compiler to fetch the value of this variable from
> > > > > +     * memory during every read
> > > > > +     */
> > > > > +    while (*((volatile uint64_t *)&pdev->migsize) < size) {
> > > > > +    }
> > > > > +
> > > > > +    qemu_thread_cancel(&thread);
> > > > 
> > > > ....this is a safe way to stop the thread executing without
> > > > resulting in memory being leaked.
> > > > 
> > > > In addition thread cancellation is asynchronous, so the thread
> > > > may still be using the QEMUFile object while....
> > > > 
> > > > > +    qemu_fclose(f_remote);
> > > 
> > > The above "wait_for_remote()" call waits for the remote process to
> > > finish with Migration, and return the size of the VMSD.
> > > 
> > > It should be safe to cancel the thread and close the file, once the
> > > remote process is done sending the VMSD and we have read "size" bytes
> > > from it, is it not?
> > 
> > Ok, so the thread is doing
> > 
> >      while (true) {
> >          byte = qemu_get_byte(data->rem);
> >          ...do something with byte...
> >      }
> > 
> > so when the thread is cancelled it is almost certainly in the
> > qemu_get_byte() call. Since you say wait_for_remote() syncs
> > with the end of migration, I'll presume there's no more data
> > to be read but the file is still open.
> > 
> > If we're using a blocking FD here we'll probably be stuck in
> > read() when we're cancelled, and cancellation would probably
> > be ok from looking at the current impl of QEMUFile / QIOChannel.
> > If we're handling any error scenario though there could be a
> > "Error *local_err" that needs freeing before cancellation.
> > 
> > If the fclose is processed before cancellation takes effect
> > on the target thread though we could have a race.
> > 
> >    1. proxy_mig_out blocked in read from qemu_fill_buffer
> > 
> >    2. main thread requests async cancel
> > 
> >    3. main thread calls qemu_fclose which closes the FD
> >       and frees the QEMUFile object
> > 
> >    4. proxy_mig_out thread returns from read() with
> >       ret == 0 (EOF)
> 
> This wasn't happening. It would be convenient if it did.
> 
> When the file was closed by the main thread, the async thread was still
> hung at qemu_fill_buffer(), instead of returning 0 (EOF). That's the reason
> why we took the thread-cancellation route. We'd be glad to remove
> qemu_thread_cancel().
> 
> > 
> >    5. proxy_mig_out thread calls qemu_file_set_error_obj
> >       on a QEMUFile object freed in (3). Use after free. Oops.
> > 
> >    6. ..async cancel request gets delivered....
> > 
> > admittedly it is fairly unlikely for the async cancel
> > to be delayed for so long that this sequence happens, but
> > unexpected things can happen when we really don't want them.
> 
> Absolutely, we don't want to leave anything to chance.
> 
> > 
> > IMHO the safe way to deal with this would be a lock-step
> > sequence between the threads
> > 
> >     1. proxy_mig_out blocked in read from qemu_fill_buffer
> >     2. main thread closes the FD with qemu_file_shutdown()
> >        closing both directions
> 
> Will give qemu_file_shutdown() a try.

Yes, shutdown() is quite nice - but note it does need to be a socket.
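
The lock-step sequence Daniel describes in the quoted text (shutdown, EOF,
join, close) can be sketched roughly as follows, using plain POSIX calls
rather than the QEMUFile/QIOChannel layer.  Reader, reader_thread() and
stop_reader() are hypothetical names for illustration; the fd is assumed to
be one end of a socketpair(), since shutdown() only works on sockets.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical state shared between the main thread and the reader. */
typedef struct {
    int fd;             /* socket end the reader thread reads from */
    size_t bytes_read;  /* total payload bytes consumed so far */
    int saw_eof;        /* set when read() returned 0 */
} Reader;

/* Steps 1, 3 and 4: block in read(), leave the loop on EOF. */
static void *reader_thread(void *opaque)
{
    Reader *r = opaque;
    char buf[256];

    for (;;) {
        ssize_t n = read(r->fd, buf, sizeof(buf));
        if (n > 0) {
            r->bytes_read += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR) {
            continue;
        }
        r->saw_eof = (n == 0);
        break;
    }
    return NULL;
}

/* Steps 2, 5 and 6, run by the main thread.  Returns 0 on success. */
static int stop_reader(pthread_t thread, Reader *r)
{
    /* Step 2: wake the blocked read() without freeing anything yet. */
    if (shutdown(r->fd, SHUT_RDWR) < 0) {
        return -1;
    }
    /* Step 5: wait until the reader has really exited. */
    if (pthread_join(thread, NULL) != 0) {
        return -1;
    }
    /* Step 6: only now is it safe to release the descriptor. */
    close(r->fd);
    r->fd = -1;
    return 0;
}
```

Because the descriptor is closed only after pthread_join() returns, the
reader can never touch a freed object, which is the property the
cancel-based approach could not guarantee.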

Dave

> Thank you!
> --
> Jag
> 
> > 
> >     3. proxy_mig_out returns from read with ret == 0 (EOF)
> > 
> >     4. proxy_mig_out thread breaks out of its infinite loop
> >        due to EOF and exits
> > 
> >     5. main thread calls pthread_join on proxy_mig_out
> > 
> >     6. main thread calls qemu_fclose()
> > 
> > this is easier to reason about the safety of than the cancel based
> > approach IMHO.
> > 
> > Regards,
> > Daniel
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
                   ` (52 preceding siblings ...)
  2019-11-21 12:46 ` Stefan Hajnoczi
@ 2019-12-10  6:47 ` Elena Ufimtseva
  2019-12-13 10:41   ` Stefan Hajnoczi
  53 siblings, 1 reply; 140+ messages in thread
From: Elena Ufimtseva @ 2019-12-10  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: fam, john.g.johnson, mst, kraxel, jag.raman, quintela, armbru,
	kanth.ghatraju, felipe, thuth, ehabkost, konrad.wilk, dgilbert,
	liran.alon, stefanha, rth, kwolf, berrange, mreitz,
	ross.lagerwall, marcandre.lureau, pbonzini


Hi

We would like to give a short update to the community about the multi-process project.

Firstly, we appreciate the feedback and all the productive discussions we had
at KVM Forum 2019.
As an outcome of the conference, we have switched gears and are investigating
ways of using the muser framework in our project.

At this moment we are working on the evaluation and a first prototype
of qemu-multiprocess based on the muser framework.
We first heard about it at the conference from the presentation given by
Thanos Makatos and Swapnil Ingle from Nutanix.
Their presentation is available at https://static.sched.com/hosted_files/kvmforum2019/3b/muser.pdf
along with a GitHub link to the source repo.
After the conversation we had with a group of people including Felipe Franciosi,
Stefan Hajnoczi, Daniel Berrangé, Konrad Wilk, Peter Maydell, John Johnson and a few others
(apologies if some names are missing), we have gathered important answers on how to move
forward with qemu-multiprocess.

At this moment we are working on the first stage of the project with the help of
the Nutanix developers.
The questions we have gathered so far will be addressed with the muser
and QEMU developers after we finish the first stage and make sure we understand
what it will take for us to move on to the next stage.

We will also incorporate the relevant review comments that Stefan provided
on v4 of the patchset. Thank you, Stefan.

If anyone has any further suggestions or questions about the status,
please reply to this email.

Thank you

JJ, Jag & Elena

On Thu, Oct 24, 2019 at 05:08:41AM -0400, Jagannathan Raman wrote:
> Started with the presentation in October 2017 made by Marc-Andre (Red Hat)
> and Konrad Wilk (Oracle) [1], and continued by Jag's BoF at KVM Forum 2018,
> the multi-process project is now a prototype and presented in this patchset.
> John & Elena will present the status of this project in KVM Forum 2019.
> 
> This first series enables the emulation of lsi53c895a in a separate process.
> 
> We posted the Proof Of Concept patches [2] before the BoF session in 2018.
> Subsequently, we posted RFC v1 [3], RFC v2 [4] and RFC v3 [5] of this series. 
> 
> We want to present version 4 of this series, which incorporates the feedback
> we received for v3 & adds support for live migrating the remote process.
> 
> Following people contributed to this patchset:
> 
> John G Johnson <john.g.johnson@oracle.com>
> Jagannathan Raman <jag.raman@oracle.com>
> Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Kanth Ghatraju <kanth.ghatraju@oracle.com>
> 
> For full concept writeup about QEMU disaggregation refer to
> docs/devel/qemu-multiprocess.rst. Please refer to 
> docs/qemu-multiprocess.txt for usage information.
> 
> We are planning on making the following improvements in the future:
>  - Performance improvements
>  - Libvirt support
>  - Enforcement of security policies
>  - blockdev support
> 
> We welcome all your ideas, concerns, and questions for this patchset.
> 
> Thank you!
> 
> [1]: http://events17.linuxfoundation.org/sites/events/files/slides/KVM%20FORUM%20multi-process.pdf
> [1]: https://www.youtube.com/watch?v=Kq1-coHh7lg
> [2]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg566538.html
> [3]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg602285.html
> [4]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg624877.html
> [5]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg642000.html
> 
> Elena Ufimtseva (22):
>   multi-process: add a command line option for debug file
>   multi-process: introduce proxy object
>   mutli-process: build remote command line args
>   multi-process: configure remote side devices
>   multi-process: add qdev_proxy_add to create proxy devices
>   multi-process: remote: add setup_devices and setup_drive msg
>     processing
>   multi-process: remote: use fd for socket from parent process
>   multi-process: remote: add create_done condition
>   multi-process: add processing of remote drive and device command line
>   multi-process: refractor vl.c code to re-use in remote
>   multi-process: add remote option
>   multi-process: add remote options parser
>   multi-process: add parse_cmdline in remote process
>   multi-process: send heartbeat messages to remote
>   multi-process: handle heartbeat messages in remote process
>   multi-process/mon: choose HMP commands based on target
>   multi-process/mig: Load VMSD in the proxy object
>   multi-process/mig: refactor runstate_check into common file
>   multi-process/mig: Synchronize runstate of remote process
>   multi-process/mig: Restore the VMSD in remote process
>   multi-process: Enable support for multiple devices in remote
>   multi-process: add configure and usage information
> 
> Jagannathan Raman (26):
>   multi-process: memory: alloc RAM from file at offset
>   multi-process: util: Add qemu_thread_cancel() to cancel running thread
>   multi-process: Add stub functions to facilate build of multi-process
>   multi-process: Add config option for multi-process QEMU
>   multi-process: build system for remote device process
>   multi-process: define mpqemu-link object
>   multi-process: add functions to synchronize proxy and remote endpoints
>   multi-process: setup PCI host bridge for remote device
>   multi-process: setup a machine object for remote device process
>   multi-process: setup memory manager for remote device
>   multi-process: remote process initialization
>   multi-process: PCI BAR read/write handling for proxy & remote
>     endpoints
>   multi-process: Add LSI device proxy object
>   multi-process: Synchronize remote memory
>   multi-process: create IOHUB object to handle irq
>   multi-process: Introduce build flags to separate remote process code
>   multi-process: Use separate MMIO communication channel
>   multi-process: perform device reset in the remote process
>   multi-process/mon: stub functions to enable QMP module for remote
>     process
>   multi-process/mon: enable QMP module support in the remote process
>   multi-process/mon: Refactor monitor/chardev functions out of vl.c
>   multi-process/mon: Initialize QMP module for remote processes
>   multi-process: prevent duplicate memory initialization in remote
>   multi-process/mig: build migration module in the remote process
>   multi-process/mig: Enable VMSD save in the Proxy object
>   multi-process/mig: Send VMSD of remote to the Proxy object
> 
> John G Johnson (1):
>   multi-process: add the concept description to
>     docs/devel/qemu-multiprocess
> 
>  Makefile                            |    2 +
>  Makefile.objs                       |   39 ++
>  Makefile.target                     |   94 ++-
>  accel/stubs/kvm-stub.c              |    5 +
>  accel/stubs/tcg-stub.c              |  106 ++++
>  backends/Makefile.objs              |    2 +
>  block/Makefile.objs                 |    2 +
>  chardev/char.c                      |   14 +
>  configure                           |   15 +
>  docs/devel/index.rst                |    1 +
>  docs/devel/qemu-multiprocess.rst    | 1102 +++++++++++++++++++++++++++++++++++
>  docs/qemu-multiprocess.txt          |   86 +++
>  exec.c                              |   14 +-
>  hmp-commands-info.hx                |   10 +
>  hmp-commands.hx                     |   25 +-
>  hw/Makefile.objs                    |    9 +
>  hw/block/Makefile.objs              |    2 +
>  hw/core/Makefile.objs               |   17 +
>  hw/nvram/Makefile.objs              |    2 +
>  hw/pci/Makefile.objs                |    4 +
>  hw/proxy/Makefile.objs              |    1 +
>  hw/proxy/memory-sync.c              |  226 +++++++
>  hw/proxy/proxy-lsi53c895a.c         |   97 +++
>  hw/proxy/qemu-proxy.c               |  807 +++++++++++++++++++++++++
>  hw/scsi/Makefile.objs               |    2 +
>  include/chardev/char.h              |    1 +
>  include/exec/address-spaces.h       |    2 +
>  include/exec/ram_addr.h             |    2 +-
>  include/hw/pci/pci_ids.h            |    3 +
>  include/hw/proxy/memory-sync.h      |   51 ++
>  include/hw/proxy/proxy-lsi53c895a.h |   42 ++
>  include/hw/proxy/qemu-proxy.h       |  125 ++++
>  include/hw/qdev-core.h              |    2 +
>  include/io/mpqemu-link.h            |  214 +++++++
>  include/monitor/monitor.h           |    2 +
>  include/monitor/qdev.h              |   25 +
>  include/qemu-common.h               |    8 +
>  include/qemu/log.h                  |    1 +
>  include/qemu/mmap-alloc.h           |    3 +-
>  include/qemu/thread.h               |    1 +
>  include/remote/iohub.h              |   63 ++
>  include/remote/machine.h            |   48 ++
>  include/remote/memory.h             |   34 ++
>  include/remote/pcihost.h            |   59 ++
>  include/sysemu/runstate.h           |    3 +
>  io/Makefile.objs                    |    2 +
>  io/mpqemu-link.c                    |  351 +++++++++++
>  memory.c                            |    2 +-
>  migration/Makefile.objs             |   12 +
>  migration/savevm.c                  |   63 ++
>  migration/savevm.h                  |    3 +
>  monitor/Makefile.objs               |    3 +
>  monitor/misc.c                      |   84 +--
>  monitor/monitor-internal.h          |   38 ++
>  monitor/monitor.c                   |   83 ++-
>  net/Makefile.objs                   |    2 +
>  qapi/Makefile.objs                  |    2 +
>  qdev-monitor.c                      |  270 ++++++++-
>  qemu-options.hx                     |   21 +
>  qom/Makefile.objs                   |    4 +
>  remote/Makefile.objs                |    6 +
>  remote/iohub.c                      |  159 +++++
>  remote/machine.c                    |  133 +++++
>  remote/memory.c                     |   99 ++++
>  remote/pcihost.c                    |   85 +++
>  remote/remote-main.c                |  633 ++++++++++++++++++++
>  remote/remote-opts.c                |  131 +++++
>  remote/remote-opts.h                |   31 +
>  replay/Makefile.objs                |    2 +-
>  rules.mak                           |    2 +-
>  runstate.c                          |   41 ++
>  scripts/hxtool                      |   44 +-
>  stubs/audio.c                       |   12 +
>  stubs/gdbstub.c                     |   21 +
>  stubs/machine-init-done.c           |    4 +
>  stubs/migration.c                   |  211 +++++++
>  stubs/monitor.c                     |   72 +++
>  stubs/net-stub.c                    |  121 ++++
>  stubs/qapi-misc.c                   |   43 ++
>  stubs/qapi-target.c                 |   49 ++
>  stubs/replay.c                      |   26 +
>  stubs/runstate-check.c              |    3 +
>  stubs/ui-stub.c                     |  130 +++++
>  stubs/vl-stub.c                     |  193 ++++++
>  stubs/vmstate.c                     |   20 +
>  stubs/xen-mapcache.c                |   22 +
>  ui/Makefile.objs                    |    2 +
>  util/log.c                          |    2 +
>  util/mmap-alloc.c                   |    7 +-
>  util/oslib-posix.c                  |    2 +-
>  util/qemu-thread-posix.c            |   10 +
>  vl-parse.c                          |  161 +++++
>  vl.c                                |  310 ++++------
>  vl.h                                |   54 ++
>  94 files changed, 6908 insertions(+), 246 deletions(-)
>  create mode 100644 docs/devel/qemu-multiprocess.rst
>  create mode 100644 docs/qemu-multiprocess.txt
>  create mode 100644 hw/proxy/Makefile.objs
>  create mode 100644 hw/proxy/memory-sync.c
>  create mode 100644 hw/proxy/proxy-lsi53c895a.c
>  create mode 100644 hw/proxy/qemu-proxy.c
>  create mode 100644 include/hw/proxy/memory-sync.h
>  create mode 100644 include/hw/proxy/proxy-lsi53c895a.h
>  create mode 100644 include/hw/proxy/qemu-proxy.h
>  create mode 100644 include/io/mpqemu-link.h
>  create mode 100644 include/remote/iohub.h
>  create mode 100644 include/remote/machine.h
>  create mode 100644 include/remote/memory.h
>  create mode 100644 include/remote/pcihost.h
>  create mode 100644 io/mpqemu-link.c
>  create mode 100644 remote/Makefile.objs
>  create mode 100644 remote/iohub.c
>  create mode 100644 remote/machine.c
>  create mode 100644 remote/memory.c
>  create mode 100644 remote/pcihost.c
>  create mode 100644 remote/remote-main.c
>  create mode 100644 remote/remote-opts.c
>  create mode 100644 remote/remote-opts.h
>  create mode 100644 runstate.c
>  mode change 100644 => 100755 scripts/hxtool
>  create mode 100644 stubs/audio.c
>  create mode 100644 stubs/migration.c
>  create mode 100644 stubs/net-stub.c
>  create mode 100644 stubs/qapi-misc.c
>  create mode 100644 stubs/qapi-target.c
>  create mode 100644 stubs/ui-stub.c
>  create mode 100644 stubs/vl-stub.c
>  create mode 100644 stubs/xen-mapcache.c
>  create mode 100644 vl-parse.c
>  create mode 100644 vl.h
> 
> -- 
> 1.8.3.1
> 


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-10  6:47 ` [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update Elena Ufimtseva
@ 2019-12-13 10:41   ` Stefan Hajnoczi
  2019-12-16 19:46     ` Elena Ufimtseva
  0 siblings, 1 reply; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-12-13 10:41 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: fam, john.g.johnson, mst, qemu-devel, kraxel, jag.raman,
	quintela, armbru, kanth.ghatraju, felipe, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, rth, kwolf, berrange, mreitz,
	ross.lagerwall, marcandre.lureau, pbonzini

On Mon, Dec 09, 2019 at 10:47:17PM -0800, Elena Ufimtseva wrote:
> At this moment we are working on the first stage of the project with help of
> the Nutanix developers.
> The questions we have gathered so far will be addressed with muser
> and Qemu developers after we finish the first stage and make sure we understand
> what it will take for us to move onto the next stage.
> 
> We will also incorporate relevant review from Stefan that he provided
> on the series 4 of the patchset. Thank you Stefan.
> 
> If anyone has any further suggestions or questions about the status,
> please reply to this email.

Hi Elena,
At KVM Forum we discussed spending 1 or 2 weeks trying out muser.  A few
weeks have passed and from your email it sounds like this "next stage"
might be a lot of work.

Is there a work-in-progress muser patch series you can post to start the
discussion early?  That way we can avoid reviewers like myself asking
you to make changes after you have invested a lot of time.

It's good that you are in touch with the muser developers (via private
discussion?  I haven't seen much activity on #muser IRC).

Stefan


* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-13 10:41   ` Stefan Hajnoczi
@ 2019-12-16 19:46     ` Elena Ufimtseva
  2019-12-16 19:57       ` Felipe Franciosi
  0 siblings, 1 reply; 140+ messages in thread
From: Elena Ufimtseva @ 2019-12-16 19:46 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: fam, john.g.johnson, mst, qemu-devel, kraxel, jag.raman,
	quintela, armbru, kanth.ghatraju, felipe, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, rth, kwolf, berrange, mreitz,
	ross.lagerwall, marcandre.lureau, pbonzini

On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
> On Mon, Dec 09, 2019 at 10:47:17PM -0800, Elena Ufimtseva wrote:
> > At this moment we are working on the first stage of the project with help of
> > the Nutanix developers.
> > The questions we have gathered so far will be addressed with muser
> > and Qemu developers after we finish the first stage and make sure we understand
> > what it will take for us to move onto the next stage.
> > 
> > We will also incorporate relevant review from Stefan that he provided
> > on the series 4 of the patchset. Thank you Stefan.
> > 
> > If anyone have any further suggestions or questions about the status,
> > please reply to this email.
> 
> Hi Elena,
> At KVM Forum we discussed spending 1 or 2 weeks trying out muser.  A few
> weeks have passed and from your email it sounds like this "next stage"
> might be a lot of work.
>

Hi Stefan

Perhaps we were not clear enough about our work in the previous email.
Our assumption was that the question from KVM Forum was whether muser
can be used to achieve the same thing we have now.
We should have clearly answered yes to this question.  We have not yet
discovered any major roadblocks.
At the moment, we are mostly engaged in learning the code and discussing
the design, plus some coding to answer specific questions.
We understand that the best way to make progress is to work with the
upstream community at an early stage; we agree with this and will present
the proposal shortly for discussion.
 
> Is there a work-in-progress muser patch series you can post to start the
> discussion early?  That way we can avoid reviewers like myself asking
> you to make changes after you have invested a lot of time.
>

Absolutely, that is our plan. At the moment we do not have the patches
ready for review. We have set up a milestone internally and will send
that early version as a tarball once it is complete.
Would a meeting also help us stay on the same page?
 
> It's good that you are in touch with the muser developers (via private
> discussion?  I haven't seen much activity on #muser IRC).
>

We use IRC (I know Jag got some answers there) and GitHub for issues
(one of which was addressed). We are hoping to get the conversation going
over email.

JJ, Jag and Elena 
> Stefan





* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-16 19:46     ` Elena Ufimtseva
@ 2019-12-16 19:57       ` Felipe Franciosi
  2019-12-17 16:33         ` Stefan Hajnoczi
  0 siblings, 1 reply; 140+ messages in thread
From: Felipe Franciosi @ 2019-12-16 19:57 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: fam, john.g.johnson, Swapnil Ingle, mst, qemu-devel, kraxel,
	jag.raman, quintela, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, Stefan Hajnoczi,
	Thanos Makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

Heya,

> On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
> 
> On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
>>> On Mon, Dec 09, 2019 at 10:47:17PM -0800, Elena Ufimtseva wrote:
>>> At this moment we are working on the first stage of the project with help of
>>> the Nutanix developers.
>>> The questions we have gathered so far will be addressed with muser
>>> and Qemu developers after we finish the first stage and make sure we understand
>>> what it will take for us to move onto the next stage.
>>> 
>>> We will also incorporate relevant review from Stefan that he provided
>>> on the series 4 of the patchset. Thank you Stefan.
>>> 
>>> If anyone have any further suggestions or questions about the status,
>>> please reply to this email.
>> 
>> Hi Elena,
>> At KVM Forum we discussed spending 1 or 2 weeks trying out muser.  A few
>> weeks have passed and from your email it sounds like this "next stage"
>> might be a lot of work.
>> 
> 
> Hi Stefan
> 
> Perhaps we were not too clear about our work in the previous email.
> Our assumption was that the question that came from KVM Forum was
> if muser can be used to achieve the same what we have now.
> We should have answered clearly yes to this question.  We have not yet
> discovered major road blocks.
> At the moment, we are mostly engaged in learning the code and discussing
> the design, plus some coding to answer the specific questions.
> We understand that the best way to make a progress is to work with the
> upstream community on early stages and we agree with this and will present
> the proposal shortly for discussion.
> 
>> Is there a work-in-progress muser patch series you can post to start the
>> discussion early?  That way we can avoid reviewers like myself asking
>> you to make changes after you have invested a lot of time.
>> 
> 
> Absolutely, that is our plan. At the moment we do not have the patches
> ready for the review. We have setup internally a milestone and will be
> sending that early version as a tarball after we have it completed.
> Would be also a meeting something that could help us to stay on the same
> page?

Please loop us in if you do set up a meeting.

> 
>> It's good that you are in touch with the muser developers (via private
>> discussion?  I haven't seen much activity on #muser IRC).
>> 
> 
> We use IRC (I know Jag got some answers there) and github for issues
> (one of which was addressed).

I thought there was only the one. Let us know if you run into any other bugs. We are looking forward to hearing about people’s experience and addressing issues that come with uses we didn’t foresee or test.

Cheers,
Felipe

> We are hoping to get the conversation going over
> the email.
> 
> JJ, Jag and Elena 
>> Stefan
> 
> 


* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-16 19:57       ` Felipe Franciosi
@ 2019-12-17 16:33         ` Stefan Hajnoczi
  2019-12-17 22:57           ` Felipe Franciosi
  2020-01-02 16:01           ` Elena Ufimtseva
  0 siblings, 2 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-12-17 16:33 UTC (permalink / raw)
  To: Felipe Franciosi, Elena Ufimtseva
  Cc: fam, john.g.johnson, Swapnil Ingle, mst, qemu-devel, kraxel,
	jag.raman, quintela, armbru, kanth.ghatraju, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, pbonzini, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau,
	Thanos Makatos

[-- Attachment #1: Type: text/plain, Size: 1732 bytes --]

On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
> > On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
> > On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
> >> Is there a work-in-progress muser patch series you can post to start the
> >> discussion early?  That way we can avoid reviewers like myself asking
> >> you to make changes after you have invested a lot of time.
> >> 
> > 
> > Absolutely, that is our plan. At the moment we do not have the patches
> > ready for the review. We have setup internally a milestone and will be
> > sending that early version as a tarball after we have it completed.
> > Would be also a meeting something that could help us to stay on the same
> > page?
> 
> Please loop us in if you so set up a meeting.

There is a bi-weekly KVM Community Call that we can use for phone
discussions:

  https://calendar.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ

Or we can schedule a one-off call at any time :).

Questions I've seen when discussing muser with people have been:

1. Can unprivileged containers create muser devices?  If not, this is a
   blocker for use cases that want to avoid root privileges entirely.

2. Does muser need to be in the kernel (e.g. slower to develop/ship,
   security reasons)?  A similar library could be implemented in
   userspace along the lines of the vhost-user protocol.  Although VMMs
   would then need to use a new libmuser-client library instead of
   reusing their VFIO code to access the device.

3. Should this feature be Linux-only?  vhost-user can be implemented on
   non-Linux OSes...

Stefan


* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-17 16:33         ` Stefan Hajnoczi
@ 2019-12-17 22:57           ` Felipe Franciosi
  2019-12-18  0:00             ` Paolo Bonzini
                               ` (2 more replies)
  2020-01-02 16:01           ` Elena Ufimtseva
  1 sibling, 3 replies; 140+ messages in thread
From: Felipe Franciosi @ 2019-12-17 22:57 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson, qemu-devel,
	kraxel, jag.raman, quintela, mst, armbru, kanth.ghatraju, thuth,
	ehabkost, konrad.wilk, dgilbert, liran.alon, Thanos Makatos, rth,
	kwolf, berrange, mreitz, ross.lagerwall, marcandre.lureau,
	pbonzini



> On Dec 17, 2019, at 5:33 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
>>> On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
>>> On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
>>>> Is there a work-in-progress muser patch series you can post to start the
>>>> discussion early?  That way we can avoid reviewers like myself asking
>>>> you to make changes after you have invested a lot of time.
>>>> 
>>> 
>>> Absolutely, that is our plan. At the moment we do not have the patches
>>> ready for the review. We have setup internally a milestone and will be
>>> sending that early version as a tarball after we have it completed.
>>> Would be also a meeting something that could help us to stay on the same
>>> page?
>> 
>> Please loop us in if you so set up a meeting.
> 
> There is a bi-weekly KVM Community Call that we can use for phone
> discussions:
> 
>  https://calendar.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
> 
> Or we can schedule a one-off call at any time :).

Sounds good either way, whenever it's needed.

> 
> Questions I've seen when discussing muser with people have been:
> 
> 1. Can unprivileged containers create muser devices?  If not, this is a
>   blocker for use cases that want to avoid root privileges entirely.

Yes you can. Muser device creation follows the same process as general
mdev device creation (ie. you write to a sysfs path). That creates an
entry in /dev/vfio and the control plane can further drop privileges
there (set selinux contexts, &c.)
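
For illustration, a minimal Python sketch of that creation step. It
assumes the standard mdev convention of writing a UUID to a "create"
node under mdev_supported_types/<type>/; the exact node muser.ko
exposes is not given here, so the demo writes to a stand-in temporary
directory. The function name create_mdev is hypothetical.

```python
import os
import tempfile
import uuid


def create_mdev(create_path: str, dev_uuid: str) -> None:
    """Ask an mdev parent to instantiate a device by writing a UUID."""
    # On a real host this is a sysfs node ending in
    # mdev_supported_types/<type>/create, and the write normally
    # requires root. That is the privileged step discussed above.
    with open(create_path, "w") as f:
        f.write(dev_uuid)


if __name__ == "__main__":
    # Stand-in directory so the sketch runs without root or muser.ko.
    with tempfile.TemporaryDirectory() as fake_sysfs:
        node = os.path.join(fake_sysfs, "create")
        create_mdev(node, str(uuid.uuid4()))
        with open(node) as f:
            print("wrote UUID:", f.read())
```

Whoever performs this write needs permission on the sysfs node; the
process that later opens the resulting /dev/vfio entry does not.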

> 
> 2. Does muser need to be in the kernel (e.g. slower to develop/ship,
>   security reasons)?  A similar library could be implemented in
>   userspace along the lines of the vhost-user protocol.  Although VMMs
>   would then need to use a new libmuser-client library instead of
>   reusing their VFIO code to access the device.

Doing it in userspace was the flow we proposed at last year's KVM
Forum (Edinburgh), but it got turned down. That's why we pursued the
kernel approach, which turned out to have some advantages:
- No changes needed to Qemu
- No Qemu needed at all for userspace drivers
- Device emulation process restart is trivial
  (it therefore makes device code upgrades much easier)

Having said that, nothing stops us from enhancing libmuser to talk
directly to Qemu (for the Qemu case). I envision at least two ways of
doing that:
- Hooking up libmuser with Qemu directly (eg. over a unix socket)
- Hooking Qemu with CUSE and implementing the muser.ko interface

For the latter, libmuser would talk to a character device just like it
talks to the vfio character device. We "just" need to implement that
backend in Qemu. :)

> 
> 3. Should this feature be Linux-only?  vhost-user can be implemented on
>   non-Linux OSes...

The userspace approach discussed above certainly can be more portable.
Currently, muser depends on MDEV+VFIO and that's where the restriction
comes from.

F.

> 
> Stefan



* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-17 22:57           ` Felipe Franciosi
@ 2019-12-18  0:00             ` Paolo Bonzini
  2019-12-19 13:36               ` Stefan Hajnoczi
  2019-12-19 11:55             ` Stefan Hajnoczi
  2019-12-19 12:50             ` Daniel P. Berrangé
  2 siblings, 1 reply; 140+ messages in thread
From: Paolo Bonzini @ 2019-12-18  0:00 UTC (permalink / raw)
  To: Felipe Franciosi, Stefan Hajnoczi
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson, qemu-devel,
	kraxel, jag.raman, quintela, mst, armbru, kanth.ghatraju, thuth,
	ehabkost, konrad.wilk, dgilbert, liran.alon, rth, kwolf,
	berrange, mreitz, ross.lagerwall, marcandre.lureau,
	Thanos Makatos

On 17/12/19 23:57, Felipe Franciosi wrote:
> Doing it in userspace was the flow we proposed back in last year's KVM
> Forum (Edinburgh), but it got turned down.

I think the time since then has shown that essentially the cat is out of
the bag.  I didn't really like the idea of devices outside QEMU---and I
still don't---but if something like "VFIO over AF_UNIX" turns out to be
the cleanest way to implement multi-process QEMU device models, I am not
going to pull an RMS and block that from happening.  Assuming I could
even do so!

Paolo




* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-17 22:57           ` Felipe Franciosi
  2019-12-18  0:00             ` Paolo Bonzini
@ 2019-12-19 11:55             ` Stefan Hajnoczi
  2019-12-19 12:33               ` Felipe Franciosi
  2019-12-19 12:50             ` Daniel P. Berrangé
  2 siblings, 1 reply; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-12-19 11:55 UTC (permalink / raw)
  To: Felipe Franciosi
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson, qemu-devel,
	kraxel, jag.raman, quintela, mst, armbru, kanth.ghatraju, thuth,
	ehabkost, konrad.wilk, dgilbert, liran.alon, Stefan Hajnoczi,
	pbonzini, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, Thanos Makatos

[-- Attachment #1: Type: text/plain, Size: 3024 bytes --]

On Tue, Dec 17, 2019 at 10:57:17PM +0000, Felipe Franciosi wrote:
> > On Dec 17, 2019, at 5:33 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
> >>> On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
> >>> On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
> > Questions I've seen when discussing muser with people have been:
> > 
> > 1. Can unprivileged containers create muser devices?  If not, this is a
> >   blocker for use cases that want to avoid root privileges entirely.
> 
> Yes you can. Muser device creation follows the same process as general
> mdev device creation (ie. you write to a sysfs path). That creates an
> entry in /dev/vfio and the control plane can further drop privileges
> there (set selinux contexts, &c.)

In this case there is still a privileged step during setup.  What about
completely unprivileged scenarios like a regular user without root or a
rootless container?

> > 2. Does muser need to be in the kernel (e.g. slower to develop/ship,
> >   security reasons)?  A similar library could be implemented in
> >   userspace along the lines of the vhost-user protocol.  Although VMMs
> >   would then need to use a new libmuser-client library instead of
> >   reusing their VFIO code to access the device.
> 
> Doing it in userspace was the flow we proposed back in last year's KVM
> Forum (Edinburgh), but it got turned down. That's why we procured the
> kernel approach, which turned out to have some advantages:
> - No changes needed to Qemu
> - No Qemu needed at all for userspace drivers
> - Device emulation process restart is trivial
>   (it therefore makes device code upgrades much easier)
> 
> Having said that, nothing stops us from enhancing libmuser to talk
> directly to Qemu (for the Qemu case). I envision at least two ways of
> doing that:
> - Hooking up libmuser with Qemu directly (eg. over a unix socket)
> - Hooking Qemu with CUSE and implementing the muser.ko interface
> 
> For the latter, libmuser would talk to a character device just like it
> talks to the vfio character device. We "just" need to implement that
> backend in Qemu. :)

What about:
 * libmuser's API stays mostly unchanged but the library speaks a
   VFIO-over-UNIX domain sockets protocol instead of talking to
   mdev/vfio in the host kernel.
 * VMMs can implement this protocol directly for POSIX-portable and
   unprivileged operation.
 * A CUSE VFIO adapter simulates /dev/vfio so that VFIO-only VMMs can
   still take advantage of libmuser devices.
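
To make the first two bullets concrete, here is a rough Python sketch
(3.9 or later, for socket.send_fds) of how one side could hand the
other a file descriptor backing a device region over a UNIX domain
socket. The message layout (REGION_MSG) and the function names are
invented for illustration only; no such protocol has been defined yet.

```python
import mmap
import os
import socket
import struct
import tempfile

# Hypothetical wire format: region index (u32) + region size (u64).
REGION_MSG = struct.Struct("<IQ")


def send_region(sock, index, size, fd):
    """Send a region description; the fd travels as SCM_RIGHTS data."""
    socket.send_fds(sock, [REGION_MSG.pack(index, size)], [fd])


def recv_region(sock):
    """Receive a region description and the descriptor that backs it."""
    data, fds, _flags, _addr = socket.recv_fds(sock, REGION_MSG.size, 1)
    index, size = REGION_MSG.unpack(data)
    return index, size, fds[0]


if __name__ == "__main__":
    vmm, device = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.TemporaryFile() as backing:
        backing.truncate(4096)
        send_region(device, 0, 4096, backing.fileno())
        index, size, fd = recv_region(vmm)
        # The receiver maps the same pages the sender owns.
        with mmap.mmap(fd, size) as region:
            region[0:4] = b"BAR0"
        os.close(fd)
        backing.seek(0)
        print(index, size, backing.read(4))
    vmm.close()
    device.close()
```

Nothing here needs root or a kernel module, which is the point of the
proposal; a real protocol would also have to cover interrupts, DMA
mappings and config space access.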

Assuming this is feasible, would you lose any important
features/advantages of the muser.ko approach?  I don't know enough about
VFIO to identify any blocker or obvious performance problems.

Regarding recovery, it seems straightforward to keep state in a tmpfs
file that can be reopened when the device is restarted.  I don't think
kernel code is necessary?
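
As a rough sketch of what I mean, in Python and with made-up names
(save_state, load_state) and a made-up JSON state format: the emulator
writes its state to a file in a tmpfs directory (for example /dev/shm
on Linux, which is an assumption about the deployment) and a restarted
process simply reads it back.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Persist device state so a restarted emulator can pick it up."""
    # Write to a temporary name and rename, so a crash in the middle
    # of a write never leaves a torn state file behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_state(path, default):
    """Return the saved state, or a copy of default on first start."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return dict(default)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as state_dir:
        path = os.path.join(state_dir, "dev0.state")
        state = load_state(path, {"irq_enabled": False})
        state["irq_enabled"] = True
        save_state(path, state)
        # A "restarted" emulator sees what its predecessor saved.
        print(load_state(path, {"irq_enabled": False}))
```

This only covers plain data; file descriptors such as eventfds cannot
be stored this way and would have to be handed over again.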

Stefan


* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-19 11:55             ` Stefan Hajnoczi
@ 2019-12-19 12:33               ` Felipe Franciosi
  2019-12-19 12:55                 ` Daniel P. Berrangé
  2019-12-19 16:40                 ` Jag Raman
  0 siblings, 2 replies; 140+ messages in thread
From: Felipe Franciosi @ 2019-12-19 12:33 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson, qemu-devel,
	Walker, Benjamin, kraxel, jag.raman, Harris, James R, quintela,
	mst, armbru, kanth.ghatraju, thuth, ehabkost, konrad.wilk,
	dgilbert, liran.alon, pbonzini, rth, kwolf, berrange, mreitz,
	ross.lagerwall, marcandre.lureau, Thanos Makatos

Hello,

(I've added Jim and Ben from the SPDK team to the thread.)

> On Dec 19, 2019, at 11:55 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> 
> On Tue, Dec 17, 2019 at 10:57:17PM +0000, Felipe Franciosi wrote:
>>> On Dec 17, 2019, at 5:33 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>> On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
>>>>> On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
>>>>> On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
>>> Questions I've seen when discussing muser with people have been:
>>> 
>>> 1. Can unprivileged containers create muser devices?  If not, this is a
>>>  blocker for use cases that want to avoid root privileges entirely.
>> 
>> Yes you can. Muser device creation follows the same process as general
>> mdev device creation (ie. you write to a sysfs path). That creates an
>> entry in /dev/vfio and the control plane can further drop privileges
>> there (set selinux contexts, &c.)
> 
> In this case there is still a privileged step during setup.  What about
> completely unprivileged scenarios like a regular user without root or a
> rootless container?

Oh, I see what you are saying. I suppose we need to investigate
adjusting the privileges of the sysfs path correctly beforehand to
allow devices to be created by non-root users. The credentials used on
creation should be reflected on the vfio endpoint (ie. /dev/vfio/<group>).

I need to look into that and get back to you.

> 
>>> 2. Does muser need to be in the kernel (e.g. slower to develop/ship,
>>>  security reasons)?  A similar library could be implemented in
>>>  userspace along the lines of the vhost-user protocol.  Although VMMs
>>>  would then need to use a new libmuser-client library instead of
>>>  reusing their VFIO code to access the device.
>> 
>> Doing it in userspace was the flow we proposed back in last year's KVM
>> Forum (Edinburgh), but it got turned down. That's why we procured the
>> kernel approach, which turned out to have some advantages:
>> - No changes needed to Qemu
>> - No Qemu needed at all for userspace drivers
>> - Device emulation process restart is trivial
>>  (it therefore makes device code upgrades much easier)
>> 
>> Having said that, nothing stops us from enhancing libmuser to talk
>> directly to Qemu (for the Qemu case). I envision at least two ways of
>> doing that:
>> - Hooking up libmuser with Qemu directly (eg. over a unix socket)
>> - Hooking Qemu with CUSE and implementing the muser.ko interface
>> 
>> For the latter, libmuser would talk to a character device just like it
>> talks to the vfio character device. We "just" need to implement that
>> backend in Qemu. :)
> 
> What about:
> * libmuser's API stays mostly unchanged but the library speaks a
>   VFIO-over-UNIX domain sockets protocol instead of talking to
>   mdev/vfio in the host kernel.

As I said above, there are advantages to the kernel model. The key one
is transparent device emulation restarts. Today, muser.ko keeps the
"device memory" internally in a prefix tree. Upon restart, a new
device emulator can recover state (eg. from a state file in /dev/shm
or similar) and remap the same memory that is already configured to
the guest via Qemu. We have a pending work item for muser.ko to also
keep the eventfds so we can recover those, too. Another advantage is
working with any userspace driver and not requiring a VMM at all.

If done entirely in userspace, the device emulator needs to allocate
the device memory somewhere that remains accessible (eg. tmpfs), with
the difference that now we may be talking about non-trivial amounts of
memory. Also, that may not be the kind of content you want lingering
around the filesystem (for the same reasons Qemu unlinks memory files
from /dev/hugepages after mmap'ing it).
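
To illustrate the trade-off, a small Python sketch of the
map-then-unlink pattern (the helper name map_and_unlink is made up
for this example). The mapping stays usable after the name is gone,
but for the same reason a restarted emulator can no longer find the
memory by path; some other process would have to keep a descriptor
and pass it back.

```python
import mmap
import os
import tempfile


def map_and_unlink(path, size):
    """Create a backing file, map it shared, then remove its name."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.ftruncate(fd, size)
        mem = mmap.mmap(fd, size)  # shared, read/write by default on Unix
    finally:
        # The mapping holds its own reference to the file, so both
        # the descriptor and the directory entry can go away.
        os.close(fd)
        os.unlink(path)
    return mem


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "device-mem")
        mem = map_and_unlink(path, 4096)
        mem[0:5] = b"hello"
        print(os.path.exists(path), bytes(mem[0:5]))
        mem.close()
```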

That's why I'd prefer to rephrase what you said to "in addition"
instead of "instead".

> * VMMs can implement this protocol directly for POSIX-portable and
>   unprivileged operation.
> * A CUSE VFIO adapter simulates /dev/vfio so that VFIO-only VMMs can
>   still take advantage of libmuser devices.

I'm happy with that.
We need to think the credential aspect through to ensure nodes can
be created in the right places with the right privileges.

> 
> Assuming this is feasible, would you lose any important
> features/advantages of the muser.ko approach?  I don't know enough about
> VFIO to identify any blocker or obvious performance problems.

That's what I elaborated on above: muser.ko can keep various metadata
(and other resources) about the device in the kernel and grant them
back to userspace as needed. There are ways around it, but they
require some orchestration with tmpfs and the VMM (only so much can
be kept in tmpfs; the eventfds need to be retransmitted from the
machine emulator on request).

Restarting is a critical aspect of this. One key use case for the
project is to be able to emulate various devices from one process (for
polling). That must be able to restart for upgrades or recovery.

> 
> Regarding recovery, it seems straightforward to keep state in a tmpfs
> file that can be reopened when the device is restarted.  I don't think
> kernel code is necessary?

It adds a dependency, but isn't a show stopper. If we can work through
permission issues, making sure the VMM can reconnect and retransmit
eventfds and other state, then it should be ok.

To be clear: I'm very happy to have a userspace-only option for this,
I just don't want to ditch the kernel module (yet, anyway). :)

F.

> 
> Stefan



* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-17 22:57           ` Felipe Franciosi
  2019-12-18  0:00             ` Paolo Bonzini
  2019-12-19 11:55             ` Stefan Hajnoczi
@ 2019-12-19 12:50             ` Daniel P. Berrangé
  2019-12-19 16:46               ` Daniel P. Berrangé
  2 siblings, 1 reply; 140+ messages in thread
From: Daniel P. Berrangé @ 2019-12-19 12:50 UTC (permalink / raw)
  To: Felipe Franciosi
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson, qemu-devel,
	kraxel, jag.raman, quintela, mst, armbru, kanth.ghatraju, thuth,
	ehabkost, konrad.wilk, dgilbert, liran.alon, Stefan Hajnoczi,
	Thanos Makatos, rth, kwolf, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

On Tue, Dec 17, 2019 at 10:57:17PM +0000, Felipe Franciosi wrote:
> 
> 
> > On Dec 17, 2019, at 5:33 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
> >>> On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
> >>> On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
> >>>> Is there a work-in-progress muser patch series you can post to start the
> >>>> discussion early?  That way we can avoid reviewers like myself asking
> >>>> you to make changes after you have invested a lot of time.
> >>>> 
> >>> 
> >>> Absolutely, that is our plan. At the moment we do not have the patches
> >>> ready for the review. We have setup internally a milestone and will be
> >>> sending that early version as a tarball after we have it completed.
> >>> Would be also a meeting something that could help us to stay on the same
> >>> page?
> >> 
> >> Please loop us in if you so set up a meeting.
> > 
> > There is a bi-weekly KVM Community Call that we can use for phone
> > discussions:
> > 
> >  https://calendar.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
> > 
> > Or we can schedule a one-off call at any time :).
> 
> Sounds good either way, whenever it's needed.
> 
> > 
> > Questions I've seen when discussing muser with people have been:
> > 
> > 1. Can unprivileged containers create muser devices?  If not, this is a
> >   blocker for use cases that want to avoid root privileges entirely.
> 
> Yes you can. Muser device creation follows the same process as general
> mdev device creation (ie. you write to a sysfs path). That creates an
> entry in /dev/vfio and the control plane can further drop privileges
> there (set selinux contexts, &c.)

This isn't what I'd describe / consider as unprivileged, as AFAICT
although QEMU can use it unprivileged, this still requires a privileged
management process to do the setup in sysfs.

I think it is desirable to be able to support a fully unprivileged
model where nothing requires elevated privileges, neither libvirtd
nor QEMU.

I think this basically ends up at the same requirement as support
for non-Linux hosts. We have to assume that some desirable deployment
scenarios will not be able to use Linux kernel features, either because
they lack privileges, or are simply non-Linux hosts.

> > 2. Does muser need to be in the kernel (e.g. slower to develop/ship,
> >   security reasons)?  A similar library could be implemented in
> >   userspace along the lines of the vhost-user protocol.  Although VMMs
> >   would then need to use a new libmuser-client library instead of
> >   reusing their VFIO code to access the device.
> 
> Doing it in userspace was the flow we proposed back in last year's KVM
> Forum (Edinburgh), but it got turned down. That's why we procured the
> kernel approach, which turned out to have some advantages:
> - No changes needed to Qemu
> - No Qemu needed at all for userspace drivers
> - Device emulation process restart is trivial
>   (it therefore makes device code upgrades much easier)
> 
> Having said that, nothing stops us from enhancing libmuser to talk
> directly to Qemu (for the Qemu case). I envision at least two ways of
> doing that:
> - Hooking up libmuser with Qemu directly (eg. over a unix socket)

A UNIX socket, or localhost TCP socket, sounds most appealing from a
portability POV.

> - Hooking Qemu with CUSE and implementing the muser.ko interface

Perhaps I'm misunderstanding, but wouldn't a CUSE interface
still have issues with something needing to be privileged to
do the initial setup, and also still lack OS portability?

> For the latter, libmuser would talk to a character device just like it
> talks to the vfio character device. We "just" need to implement that
> backend in Qemu. :)
> 
> > 
> > 3. Should this feature be Linux-only?  vhost-user can be implemented on
> >   non-Linux OSes...
> 
> The userspace approach discussed above certainly can be more portable.
> Currently, muser depends on MDEV+VFIO and that's where the restriction
> comes from.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-19 12:33               ` Felipe Franciosi
@ 2019-12-19 12:55                 ` Daniel P. Berrangé
  2019-12-20  9:47                   ` Stefan Hajnoczi
  2019-12-19 16:40                 ` Jag Raman
  1 sibling, 1 reply; 140+ messages in thread
From: Daniel P. Berrangé @ 2019-12-19 12:55 UTC (permalink / raw)
  To: Felipe Franciosi
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson,
	Stefan Hajnoczi, qemu-devel, Walker, Benjamin, kraxel, jag.raman,
	Harris, James R, quintela, mst, armbru, kanth.ghatraju, thuth,
	ehabkost, konrad.wilk, dgilbert, liran.alon, pbonzini, rth,
	kwolf, mreitz, ross.lagerwall, marcandre.lureau, Thanos Makatos

On Thu, Dec 19, 2019 at 12:33:15PM +0000, Felipe Franciosi wrote:
> Hello,
> 
> (I've added Jim and Ben from the SPDK team to the thread.)
> 
> > On Dec 19, 2019, at 11:55 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > 
> > On Tue, Dec 17, 2019 at 10:57:17PM +0000, Felipe Franciosi wrote:
> >>> On Dec 17, 2019, at 5:33 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >>> On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
> >>>>> On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
> >>>>> On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
> >>> Questions I've seen when discussing muser with people have been:
> >>> 
> >>> 1. Can unprivileged containers create muser devices?  If not, this is a
> >>>  blocker for use cases that want to avoid root privileges entirely.
> >> 
> >> Yes you can. Muser device creation follows the same process as general
> >> mdev device creation (ie. you write to a sysfs path). That creates an
> >> entry in /dev/vfio and the control plane can further drop privileges
> >> there (set selinux contexts, &c.)
> > 
> > In this case there is still a privileged step during setup.  What about
> > completely unprivileged scenarios like a regular user without root or a
> > rootless container?
> 
> Oh, I see what you are saying. I suppose we need to investigate
> adjusting the privileges of the sysfs path correctly beforehand to
> allow devices to be created by non-root users. The credentials used on
> creation should be reflected on the vfio endpoint (ie. /dev/vfio/<group>).
> 
> I need to look into that and get back to you.
> 
> > 
> >>> 2. Does muser need to be in the kernel (e.g. slower to develop/ship,
> >>>  security reasons)?  A similar library could be implemented in
> >>>  userspace along the lines of the vhost-user protocol.  Although VMMs
> >>>  would then need to use a new libmuser-client library instead of
> >>>  reusing their VFIO code to access the device.
> >> 
> >> Doing it in userspace was the flow we proposed back in last year's KVM
> >> Forum (Edinburgh), but it got turned down. That's why we procured the
> >> kernel approach, which turned out to have some advantages:
> >> - No changes needed to Qemu
> >> - No Qemu needed at all for userspace drivers
> >> - Device emulation process restart is trivial
> >>  (it therefore makes device code upgrades much easier)
> >> 
> >> Having said that, nothing stops us from enhancing libmuser to talk
> >> directly to Qemu (for the Qemu case). I envision at least two ways of
> >> doing that:
> >> - Hooking up libmuser with Qemu directly (eg. over a unix socket)
> >> - Hooking Qemu with CUSE and implementing the muser.ko interface
> >> 
> >> For the latter, libmuser would talk to a character device just like it
> >> talks to the vfio character device. We "just" need to implement that
> >> backend in Qemu. :)
> > 
> > What about:
> > * libmuser's API stays mostly unchanged but the library speaks a
> >   VFIO-over-UNIX domain sockets protocol instead of talking to
> >   mdev/vfio in the host kernel.
> 
> As I said above, there are advantages to the kernel model. The key one
> is transparent device emulation restarts. Today, muser.ko keeps the
> "device memory" internally in a prefix tree. Upon restart, a new
> device emulator can recover state (eg. from a state file in /dev/shm
> or similar) and remap the same memory that is already configured to
> the guest via Qemu. We have a pending work item for muser.ko to also
> keep the eventfds so we can recover those, too. Another advantage is
> working with any userspace driver and not requiring a VMM at all.
> 
> If done entirely in userspace, the device emulator needs to allocate
> the device memory somewhere that remains accessible (eg. tmpfs), with
> the difference that now we may be talking about non-trivial amounts of
> memory. Also, that may not be the kind of content you want lingering
> around the filesystem (for the same reasons Qemu unlinks memory files
> from /dev/hugepages after mmap'ing it).
> 
> That's why I'd prefer to rephrase what you said to "in addition"
> instead of "instead".
> 
> > * VMMs can implement this protocol directly for POSIX-portable and
> >   unprivileged operation.
> > * A CUSE VFIO adapter simulates /dev/vfio so that VFIO-only VMMs can
> >   still take advantage of libmuser devices.
> 
> I'm happy with that.
> We need to think the credential aspect throughout to ensure nodes can
> be created in the right places with the right privileges.
> 
> > 
> > Assuming this is feasible, would you lose any important
> > features/advantages of the muser.ko approach?  I don't know enough about
> > VFIO to identify any blocker or obvious performance problems.
> 
> That's what I elaborated above. The fact that muser.ko can keep
> various metadata (and other resources) about the device in the kernel
> and grant it back to userspace as needed. There are ways around it,
> but it requires some orchestration with tmpfs and the VMM (only so
> much can be kept in tmpfs; the eventfds need to be retransmitted from
> the machine emulator on request).
> 
> Restarting is a critical aspect of this. One key use case for the
> project is to be able to emulate various devices from one process (for
> polling). That must be able to restart for upgrades or recovery.
> 
> > 
> > Regarding recovery, it seems straightforward to keep state in a tmpfs
> > file that can be reopened when the device is restarted.  I don't think
> > kernel code is necessary?
> 
> It adds a dependency, but isn't a show stopper. If we can work through
> permission issues, making sure the VMM can reconnect and retransmit
> eventfds and other state, then it should be ok.
> 
> To be clear: I'm very happy to have a userspace-only option for this,
> I just don't want to ditch the kernel module (yet, anyway). :)

If it doesn't create too large a burden to support both, then I think
it is very desirable. IIUC, this is proposing a kernel-based solution as
the optimized solution, and a userspace UNIX-socket-based option as the
generic "works everywhere" fallback.



Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-18  0:00             ` Paolo Bonzini
@ 2019-12-19 13:36               ` Stefan Hajnoczi
  2019-12-20 17:15                 ` John G Johnson
  0 siblings, 1 reply; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-12-19 13:36 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson, qemu-devel,
	kraxel, jag.raman, quintela, mst, armbru, kanth.ghatraju,
	Felipe Franciosi, thuth, ehabkost, konrad.wilk, dgilbert,
	liran.alon, Stefan Hajnoczi, rth, kwolf, berrange, mreitz,
	ross.lagerwall, marcandre.lureau, Thanos Makatos


On Wed, Dec 18, 2019 at 01:00:55AM +0100, Paolo Bonzini wrote:
> On 17/12/19 23:57, Felipe Franciosi wrote:
> > Doing it in userspace was the flow we proposed back in last year's KVM
> > Forum (Edinburgh), but it got turned down.
> 
> I think the time since then has shown that essentially the cat is out of
> the bag.  I didn't really like the idea of devices outside QEMU---and I
> still don't---but if something like "VFIO over AF_UNIX" turns out to be
> the cleanest way to implement multi-process QEMU device models, I am not
> going to pull an RMS and block that from happening.  Assuming I could
> even do so!

There is a range of approaches that will influence how out-of-process
devices can be licensed and distributed.

A VFIO-over-UNIX domain sockets approach means a stable API so that any
license (including proprietary) is possible.

Another approach is a QEMU-centric unstable protocol.  I'll call this
the qdev-over-UNIX domain sockets approach.  Maintaining an out-of-tree
device is expensive and ugly since the protocol changes between QEMU
versions in ways that are incompatible and undetectable.

On top of that, the initialization protocol message could include the
QEMU version string that the device was compiled against.  If the
version string doesn't match then QEMU will refuse to talk to the
device.
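
A minimal sketch of such a version check, written in Python only for
brevity. The JSON encoding, the "hello" message type and the
"qemu_version" field name are assumptions made for illustration; no
such wire format has been defined in this thread.

```python
import json

# Hypothetical version string; a real build would supply its own.
QEMU_VERSION = "4.2.0"


def make_hello(built_against: str) -> bytes:
    """Device side: encode an init message carrying the build-time version."""
    return json.dumps({"type": "hello", "qemu_version": built_against}).encode()


def accept_hello(raw: bytes, running_version: str = QEMU_VERSION) -> bool:
    """QEMU side: accept the device only on an exact version-string match."""
    msg = json.loads(raw.decode())
    return (msg.get("type") == "hello"
            and msg.get("qemu_version") == running_version)
```

An exact string comparison is the strictest possible policy: a device
built against one QEMU release is refused by every other release.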

Distributing a single device executable that works with many QEMUs (e.g.
CentOS, Ubuntu) and versions becomes difficult.

I want to mention that we have the option of doing this if there are
strong concerns about out-of-tree devices.  It does have downsides:
1. Inability to share devices with other VMMs.
2. Probably won't replace vhost-user due to the out-of-tree limitations.
3. Can still be circumvented by a motivated device author.

Stefan


* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-19 12:33               ` Felipe Franciosi
  2019-12-19 12:55                 ` Daniel P. Berrangé
@ 2019-12-19 16:40                 ` Jag Raman
  1 sibling, 0 replies; 140+ messages in thread
From: Jag Raman @ 2019-12-19 16:40 UTC (permalink / raw)
  To: Felipe Franciosi, Stefan Hajnoczi
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson, qemu-devel,
	Walker, Benjamin, kraxel, Harris, James R, quintela, mst, armbru,
	kanth.ghatraju, thuth, ehabkost, konrad.wilk, dgilbert,
	liran.alon, pbonzini, rth, kwolf, berrange, mreitz,
	ross.lagerwall, marcandre.lureau, Thanos Makatos



On 12/19/2019 7:33 AM, Felipe Franciosi wrote:
> Hello,
> 
> (I've added Jim and Ben from the SPDK team to the thread.)
> 
>> On Dec 19, 2019, at 11:55 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>
>> On Tue, Dec 17, 2019 at 10:57:17PM +0000, Felipe Franciosi wrote:
>>>> On Dec 17, 2019, at 5:33 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>>> On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
>>>>>> On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
>>>>>> On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
>>>> Questions I've seen when discussing muser with people have been:
>>>>
>>>> 1. Can unprivileged containers create muser devices?  If not, this is a
>>>>   blocker for use cases that want to avoid root privileges entirely.
>>>
>>> Yes you can. Muser device creation follows the same process as general
>>> mdev device creation (ie. you write to a sysfs path). That creates an
>>> entry in /dev/vfio and the control plane can further drop privileges
>>> there (set selinux contexts, &c.)
>>
>> In this case there is still a privileged step during setup.  What about
>> completely unprivileged scenarios like a regular user without root or a
>> rootless container?
> 
> Oh, I see what you are saying. I suppose we need to investigate
> adjusting the privileges of the sysfs path correctly beforehand to
> allow devices to be created by non-root users. The credentials used on
> creation should be reflected on the vfio endpoint (ie. /dev/vfio/<group>).
> 
> I need to look into that and get back to you.

As a prerequisite to using the "vfio-pci" device in QEMU, the user
assigns the PCI device on the host bus to the VFIO kernel driver by
writing to "/sys/bus/pci/drivers/vfio-pci/new_id" and
"/sys/bus/pci/drivers/vfio-pci/bind".

I believe a privileged control plane is required to perform these
prerequisite steps. Therefore, I wonder how rootless containers or
unprivileged users currently go about using a VFIO device with QEMU/KVM.
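
To make the privileged step concrete, here is a rough sketch of those
two sysfs writes in Python. The helper names, the sysfs_root parameter
and the example IDs are hypothetical; on a real host the writes need
root, the device usually has to be unbound from its current driver
first, and the exact sequence can vary between kernels.

```python
import os


def vfio_bind_writes(bdf, vendor_id, device_id, sysfs_root="/sys"):
    """Return the (path, data) pairs for handing a PCI device to vfio-pci.

    bdf is the PCI address, e.g. "0000:01:00.0". sysfs_root exists only
    so the plan can be exercised against a scratch directory.
    """
    drv = os.path.join(sysfs_root, "bus/pci/drivers/vfio-pci")
    return [
        # Teach vfio-pci about this vendor/device ID pair.
        (os.path.join(drv, "new_id"), f"{vendor_id:04x} {device_id:04x}"),
        # Ask vfio-pci to bind the device at this address.
        (os.path.join(drv, "bind"), bdf),
    ]


def apply_writes(writes):
    """Perform the writes; under the real /sys this is the privileged part."""
    for path, data in writes:
        with open(path, "w") as f:
            f.write(data)
```

Keeping the plan separate from the writes shows that only apply_writes()
needs elevated privileges, which is the part a control plane would have
to do on behalf of an unprivileged user.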

Thanks!
--
Jag

> 
>>
>>>> 2. Does muser need to be in the kernel (e.g. slower to develop/ship,
>>>>   security reasons)?  A similar library could be implemented in
>>>>   userspace along the lines of the vhost-user protocol.  Although VMMs
>>>>   would then need to use a new libmuser-client library instead of
>>>>   reusing their VFIO code to access the device.
>>>
>>> Doing it in userspace was the flow we proposed back in last year's KVM
>>> Forum (Edinburgh), but it got turned down. That's why we procured the
>>> kernel approach, which turned out to have some advantages:
>>> - No changes needed to Qemu
>>> - No Qemu needed at all for userspace drivers
>>> - Device emulation process restart is trivial
>>>   (it therefore makes device code upgrades much easier)
>>>
>>> Having said that, nothing stops us from enhancing libmuser to talk
>>> directly to Qemu (for the Qemu case). I envision at least two ways of
>>> doing that:
>>> - Hooking up libmuser with Qemu directly (eg. over a unix socket)
>>> - Hooking Qemu with CUSE and implementing the muser.ko interface
>>>
>>> For the latter, libmuser would talk to a character device just like it
>>> talks to the vfio character device. We "just" need to implement that
>>> backend in Qemu. :)
>>
>> What about:
>> * libmuser's API stays mostly unchanged but the library speaks a
>>    VFIO-over-UNIX domain sockets protocol instead of talking to
>>    mdev/vfio in the host kernel.
> 
> As I said above, there are advantages to the kernel model. The key one
> is transparent device emulation restarts. Today, muser.ko keeps the
> "device memory" internally in a prefix tree. Upon restart, a new
> device emulator can recover state (eg. from a state file in /dev/shm
> or similar) and remap the same memory that is already configured to
> the guest via Qemu. We have a pending work item for muser.ko to also
> keep the eventfds so we can recover those, too. Another advantage is
> working with any userspace driver and not requiring a VMM at all.
> 
> If done entirely in userspace, the device emulator needs to allocate
> the device memory somewhere that remains accessible (eg. tmpfs), with
> the difference that now we may be talking about non-trivial amounts of
> memory. Also, that may not be the kind of content you want lingering
> around the filesystem (for the same reasons Qemu unlinks memory files
> from /dev/hugepages after mmap'ing it).
> 
> That's why I'd prefer to rephrase what you said to "in addition"
> instead of "instead".
> 
>> * VMMs can implement this protocol directly for POSIX-portable and
>>    unprivileged operation.
>> * A CUSE VFIO adapter simulates /dev/vfio so that VFIO-only VMMs can
>>    still take advantage of libmuser devices.
> 
> I'm happy with that.
> We need to think the credential aspect throughout to ensure nodes can
> be created in the right places with the right privileges.
> 
>>
>> Assuming this is feasible, would you lose any important
>> features/advantages of the muser.ko approach?  I don't know enough about
>> VFIO to identify any blocker or obvious performance problems.
> 
> That's what I elaborated above. The fact that muser.ko can keep
> various metadata (and other resources) about the device in the kernel
> and grant it back to userspace as needed. There are ways around it,
> but it requires some orchestration with tmpfs and the VMM (only so
> much can be kept in tmpfs; the eventfds need to be retransmitted from
> the machine emulator on request).
> 
> Restarting is a critical aspect of this. One key use case for the
> project is to be able to emulate various devices from one process (for
> polling). That must be able to restart for upgrades or recovery.
> 
>>
>> Regarding recovery, it seems straightforward to keep state in a tmpfs
>> file that can be reopened when the device is restarted.  I don't think
>> kernel code is necessary?
> 
> It adds a dependency, but isn't a show stopper. If we can work through
> permission issues, making sure the VMM can reconnect and retransmit
> eventfds and other state, then it should be ok.
> 
> To be clear: I'm very happy to have a userspace-only option for this,
> I just don't want to ditch the kernel module (yet, anyway). :)
> 
> F.
> 
>>
>> Stefan
> 



* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-19 12:50             ` Daniel P. Berrangé
@ 2019-12-19 16:46               ` Daniel P. Berrangé
  0 siblings, 0 replies; 140+ messages in thread
From: Daniel P. Berrangé @ 2019-12-19 16:46 UTC (permalink / raw)
  To: Felipe Franciosi
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson, qemu-devel,
	kraxel, jag.raman, quintela, mst, armbru, kanth.ghatraju, thuth,
	ehabkost, konrad.wilk, dgilbert, liran.alon, Stefan Hajnoczi,
	pbonzini, rth, kwolf, mreitz, ross.lagerwall, marcandre.lureau,
	Thanos Makatos

On Thu, Dec 19, 2019 at 12:50:21PM +0000, Daniel P. Berrangé wrote:
> On Tue, Dec 17, 2019 at 10:57:17PM +0000, Felipe Franciosi wrote:
> > 
> > 
> > > On Dec 17, 2019, at 5:33 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > 
> > > On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
> > >>> On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
> > >>> On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
> > >>>> Is there a work-in-progress muser patch series you can post to start the
> > >>>> discussion early?  That way we can avoid reviewers like myself asking
> > >>>> you to make changes after you have invested a lot of time.
> > >>>> 
> > >>> 
> > >>> Absolutely, that is our plan. At the moment we do not have the patches
> > >>> ready for the review. We have setup internally a milestone and will be
> > >>> sending that early version as a tarball after we have it completed.
> > >>> Would be also a meeting something that could help us to stay on the same
> > >>> page?
> > >> 
> > >> Please loop us in if you so set up a meeting.
> > > 
> > > There is a bi-weekly KVM Community Call that we can use for phone
> > > discussions:
> > > 
> > >  https://calendar.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
> > > 
> > > Or we can schedule a one-off call at any time :).
> > 
> > Sounds good either way, whenever it's needed.
> > 
> > > 
> > > Questions I've seen when discussing muser with people have been:
> > > 
> > > 1. Can unprivileged containers create muser devices?  If not, this is a
> > >   blocker for use cases that want to avoid root privileges entirely.
> > 
> > Yes you can. Muser device creation follows the same process as general
> > mdev device creation (ie. you write to a sysfs path). That creates an
> > entry in /dev/vfio and the control plane can further drop privileges
> > there (set selinux contexts, &c.)
> 
> This isn't what I'd describe / consider as unprivileged, as AFAICT
> although QEMU can use it unprivileged, this still requires a privileged
> management process to do the setup in sysfs.
> 
> I think it is desirable to be able support a fully unprivileged
> model where there is nothing requiring elevated privileges, neither
> libvirtd or QEMU.
> 
> I think this basically ends up at the same requirement as support
> for non-Linux hosts. We have to assume that some desirable deployment
> scenarios will not be able to use Linux kernel features, either because
> they lack privileges, or are simply non-Linux hosts.
> 
> > > 2. Does muser need to be in the kernel (e.g. slower to develop/ship,
> > >   security reasons)?  A similar library could be implemented in
> > >   userspace along the lines of the vhost-user protocol.  Although VMMs
> > >   would then need to use a new libmuser-client library instead of
> > >   reusing their VFIO code to access the device.
> > 
> > Doing it in userspace was the flow we proposed back in last year's KVM
> > Forum (Edinburgh), but it got turned down. That's why we procured the
> > kernel approach, which turned out to have some advantages:
> > - No changes needed to Qemu
> > - No Qemu needed at all for userspace drivers
> > - Device emulation process restart is trivial
> >   (it therefore makes device code upgrades much easier)
> > 
> > Having said that, nothing stops us from enhancing libmuser to talk
> > directly to Qemu (for the Qemu case). I envision at least two ways of
> > doing that:
> > - Hooking up libmuser with Qemu directly (eg. over a unix socket)
> 
> A UNIX socket, or localhost TCP socket, sounds most appealing from a
> portability POV.

Felipe reminded me on IRC that muser needs FD passing, so a TCP
localhost socket is not an option.  A UNIX socket would still give us
portability to any platform except Windows. It is not the
end of the world to lack Windows support.
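
For anyone unfamiliar with why this rules out TCP: descriptors travel
as SCM_RIGHTS ancillary data, which only AF_UNIX sockets carry. A
small Python sketch (needs Python 3.9+ for socket.send_fds and
socket.recv_fds); the pipe stands in for an eventfd and the function
name is made up for the example.

```python
import os
import socket


def pass_fd_demo() -> bytes:
    """Send a pipe's write end over a UNIX socket pair and use it remotely."""
    vmm, dev = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()
    try:
        # "VMM" side: ship the descriptor as SCM_RIGHTS ancillary data.
        socket.send_fds(vmm, [b"fd"], [w])
        # "Device" side: receive a new descriptor for the same pipe.
        _data, fds, _flags, _addr = socket.recv_fds(dev, 16, 1)
        os.write(fds[0], b"kick")
        os.close(fds[0])
        # The original read end sees what the receiver wrote.
        return os.read(r, 4)
    finally:
        os.close(r)
        os.close(w)
        vmm.close()
        dev.close()
```

The received descriptor is a different number in the receiving process
but refers to the same open file, which is what makes sharing eventfds
and memory regions between processes possible.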

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-19 12:55                 ` Daniel P. Berrangé
@ 2019-12-20  9:47                   ` Stefan Hajnoczi
  2019-12-20  9:50                     ` Paolo Bonzini
  2019-12-20 10:22                     ` Daniel P. Berrangé
  0 siblings, 2 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2019-12-20  9:47 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson, qemu-devel,
	Walker, Benjamin, kraxel, jag.raman, Harris, James R, quintela,
	mst, armbru, kanth.ghatraju, Felipe Franciosi, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, pbonzini, rth, kwolf, mreitz,
	ross.lagerwall, marcandre.lureau, Thanos Makatos


On Thu, Dec 19, 2019 at 12:55:04PM +0000, Daniel P. Berrangé wrote:
> On Thu, Dec 19, 2019 at 12:33:15PM +0000, Felipe Franciosi wrote:
> > > On Dec 19, 2019, at 11:55 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > > On Tue, Dec 17, 2019 at 10:57:17PM +0000, Felipe Franciosi wrote:
> > >>> On Dec 17, 2019, at 5:33 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > >>> On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
> > >>>>> On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
> > >>>>> On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
> > To be clear: I'm very happy to have a userspace-only option for this,
> > I just don't want to ditch the kernel module (yet, anyway). :)
> 
> If it doesn't create too large of a burden to support both, then I think
> it is very desirable. IIUC, this is saying a kernel based solution as the
> optimized/optimal solution, and userspace UNIX socket based option as the
> generic "works everywhere" fallback solution.

I'm slightly in favor of the kernel implementation because it keeps us
better aligned with VFIO.  That means solving problems in one place only
and less reinventing the wheel.

Knowing that a userspace implementation is possible is a plus though.
Maybe that option will become attractive in the future and someone will
develop it.  In fact, a userspace implementation may be a cool Google
Summer of Code project idea that I'd like to co-mentor.

Stefan


* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-20  9:47                   ` Stefan Hajnoczi
@ 2019-12-20  9:50                     ` Paolo Bonzini
  2019-12-20 14:14                       ` Felipe Franciosi
  2019-12-20 10:22                     ` Daniel P. Berrangé
  1 sibling, 1 reply; 140+ messages in thread
From: Paolo Bonzini @ 2019-12-20  9:50 UTC (permalink / raw)
  To: Stefan Hajnoczi, Daniel P. Berrangé
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson, qemu-devel,
	Walker, Benjamin, kraxel, jag.raman, Harris, James R, quintela,
	mst, armbru, kanth.ghatraju, Felipe Franciosi, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, rth, kwolf, mreitz,
	ross.lagerwall, marcandre.lureau, Thanos Makatos

On 20/12/19 10:47, Stefan Hajnoczi wrote:
>> If it doesn't create too large of a burden to support both, then I think
>> it is very desirable. IIUC, this is saying a kernel based solution as the
>> optimized/optimal solution, and userspace UNIX socket based option as the
>> generic "works everywhere" fallback solution.
> I'm slightly in favor of the kernel implementation because it keeps us
> better aligned with VFIO.  That means solving problems in one place only
> and less reinventing the wheel.

I think there are going to be some differences with VFIO anyway.

For example, currently VFIO requires pinning user memory.  Is that a
limitation for muser too?  If so, that would be a big disadvantage; if
not, however, management tools need to learn that muser devices, unlike
other VFIO devices, do not prevent overcommit.

Paolo




* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-20  9:47                   ` Stefan Hajnoczi
  2019-12-20  9:50                     ` Paolo Bonzini
@ 2019-12-20 10:22                     ` Daniel P. Berrangé
  2020-01-02 10:42                       ` Stefan Hajnoczi
  1 sibling, 1 reply; 140+ messages in thread
From: Daniel P. Berrangé @ 2019-12-20 10:22 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson, qemu-devel,
	Walker, Benjamin, kraxel, jag.raman, Harris, James R, quintela,
	mst, armbru, kanth.ghatraju, Felipe Franciosi, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, Thanos Makatos, rth, kwolf,
	mreitz, ross.lagerwall, marcandre.lureau, pbonzini

On Fri, Dec 20, 2019 at 09:47:12AM +0000, Stefan Hajnoczi wrote:
> On Thu, Dec 19, 2019 at 12:55:04PM +0000, Daniel P. Berrangé wrote:
> > On Thu, Dec 19, 2019 at 12:33:15PM +0000, Felipe Franciosi wrote:
> > > > On Dec 19, 2019, at 11:55 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > > > On Tue, Dec 17, 2019 at 10:57:17PM +0000, Felipe Franciosi wrote:
> > > >>> On Dec 17, 2019, at 5:33 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > >>> On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
> > > >>>>> On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
> > > >>>>> On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
> > > To be clear: I'm very happy to have a userspace-only option for this,
> > > I just don't want to ditch the kernel module (yet, anyway). :)
> > 
> > If it doesn't create too large of a burden to support both, then I think
> > it is very desirable. IIUC, this is saying a kernel based solution as the
> > optimized/optimal solution, and userspace UNIX socket based option as the
> > generic "works everywhere" fallback solution.
> 
> I'm slightly in favor of the kernel implementation because it keeps us
> better aligned with VFIO.  That means solving problems in one place only
> and less reinventing the wheel.
> 
> Knowing that a userspace implementation is possible is a plus though.
> Maybe that option will become attractive in the future and someone will
> develop it.  In fact, a userspace implementation may be a cool Google
> Summer of Code project idea that I'd like to co-mentor.

If it is technically viable as an approach, then I think we should be
treating a fully unprivileged muser-over-UNIX socket as a higher priority
than just "maybe a GSoC student will want to do it".

Libvirt is getting a strong message from the KubeVirt project that they
want to be running both libvirtd and QEMU fully unprivileged. This allows
their containers to be unprivileged. Anything that requires privileges
means jumping through extra hoops, writing custom code in KubeVirt to do
things outside libvirt in side-loaded privileged containers, and this
limits how and where those features can be used.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-20  9:50                     ` Paolo Bonzini
@ 2019-12-20 14:14                       ` Felipe Franciosi
  2019-12-20 15:25                         ` Alex Williamson
  0 siblings, 1 reply; 140+ messages in thread
From: Felipe Franciosi @ 2019-12-20 14:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson,
	Stefan Hajnoczi, qemu-devel, Walker, Benjamin, kraxel, jag.raman,
	Harris, James R, quintela, mst, armbru, kanth.ghatraju, thuth,
	ehabkost, konrad.wilk, dgilbert, liran.alon, rth, kwolf,
	Daniel P. Berrangé,
	mreitz, ross.lagerwall, marcandre.lureau, Thanos Makatos



> On Dec 20, 2019, at 9:50 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 20/12/19 10:47, Stefan Hajnoczi wrote:
>>> If it doesn't create too large of a burden to support both, then I think
>>> it is very desirable. IIUC, this is saying a kernel based solution as the
>>> optimized/optimal solution, and userspace UNIX socket based option as the
>>> generic "works everywhere" fallback solution.
>> I'm slightly in favor of the kernel implementation because it keeps us
>> better aligned with VFIO.  That means solving problems in one place only
>> and less reinventing the wheel.
> 
> I think there are anyway going to be some differences with VFIO.
> 
> For example, currently VFIO requires pinning user memory.  Is that a
> limitation for muser too?  If so, that would be a big disadvantage; if
> not, however, management tools need to learn that muser devices unlike
> other VFIO devices do not prevent overcommit.

More or less. We pin them today, but I think we don't really have to.
I created an issue to look into it:
https://github.com/nutanix/muser/issues/28

In any case, if Qemu is ballooning and calls UNMAP_DMA for memory that
has been ballooned out, then we would release it.

The reason we keep it pinned is to support libmuser restarts. IIRC,
VFIO doesn't need to pin pages for mdev devices (that's the job of the
mdev driver on the other end via vfio_pin_pages()). It only keeps the
DMA entries in an RB tree.

If my understanding is right, then we can probably just keep the map
Qemu registered (without holding the pages) and call vfio_pin_pages()
on demand when libmuser restarts.

For context, this is how the DMA memory registration works today:

1) Qemu calls ioctl(vfio_fd, VFIO_IOMMU_MAP_DMA, &vm_map);

2) The iommu driver notifies muser.ko

3) Muser.ko pins the pages (in get_dma_map(), called from below)
(https://github.com/nutanix/muser/blob/master/kmod/muser.c#L711)

4) Muser.ko notifies libmuser about the memory registration
(The iommu driver context goes to sleep, hence the pinning)

5) Libmuser wakes up and calls mmap() on muser.ko

6) Muser.ko inserts the VM memory in libmuser's context
(https://github.com/nutanix/muser/blob/master/kmod/muser.c#L543)

7) Libmuser tells muser.ko that it's done

8) Muser.ko iommu callback context that was sleeping wakes up

9) Muser.ko places the memory in a "dma_list" for the mudev and returns.

We could potentially modify the last step to unpin and keep only what
we need for a future call to vfio_pin_pages(), but I need to check if
that works.
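
As an illustration of what step 1 hands to the kernel, here is a
sketch that packs the argument structure in Python. It assumes the
struct vfio_iommu_type1_dma_map layout and flag values from
<linux/vfio.h> (argsz, flags, vaddr, iova, size); pack_dma_map is a
made-up helper, and nothing here issues the ioctl itself.

```python
import struct

# Flag values as defined in <linux/vfio.h>.
VFIO_DMA_MAP_FLAG_READ = 1 << 0
VFIO_DMA_MAP_FLAG_WRITE = 1 << 1

# __u32 argsz, __u32 flags, __u64 vaddr, __u64 iova, __u64 size
_DMA_MAP_FMT = "=IIQQQ"


def pack_dma_map(vaddr, iova, size,
                 flags=VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE):
    """Build the byte buffer a VMM would pass for one guest memory region.

    vaddr is the VMM's virtual address of the region, iova the address
    the device will use for DMA, and size the region length in bytes.
    """
    argsz = struct.calcsize(_DMA_MAP_FMT)
    return struct.pack(_DMA_MAP_FMT, argsz, flags, vaddr, iova, size)
```

The (iova, size) pair is the part that would have to be remembered for
a later vfio_pin_pages() call if the pages are not kept pinned.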

Cheers,
Felipe

> 
> Paolo
> 




* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-20 14:14                       ` Felipe Franciosi
@ 2019-12-20 15:25                         ` Alex Williamson
  2019-12-20 16:00                           ` Felipe Franciosi
  2020-02-25  9:16                           ` Thanos Makatos
  0 siblings, 2 replies; 140+ messages in thread
From: Alex Williamson @ 2019-12-20 15:25 UTC (permalink / raw)
  To: Felipe Franciosi
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson,
	Stefan Hajnoczi, qemu-devel, Walker, Benjamin, kraxel, jag.raman,
	Harris, James R, quintela, mst, armbru, kanth.ghatraju, thuth,
	ehabkost, konrad.wilk, dgilbert, liran.alon, Thanos Makatos, rth,
	kwolf, Daniel P. Berrangé,
	mreitz, ross.lagerwall, marcandre.lureau, Paolo Bonzini

On Fri, 20 Dec 2019 14:14:33 +0000
Felipe Franciosi <felipe@nutanix.com> wrote:

> > On Dec 20, 2019, at 9:50 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > 
> > On 20/12/19 10:47, Stefan Hajnoczi wrote:  
> >>> If it doesn't create too large of a burden to support both, then I think
> >>> it is very desirable. IIUC, this is saying a kernel based solution as the
> >>> optimized/optimal solution, and userspace UNIX socket based option as the
> >>> generic "works everywhere" fallback solution.  
> >> I'm slightly in favor of the kernel implementation because it keeps us
> >> better aligned with VFIO.  That means solving problems in one place only
> >> and less reinventing the wheel.  
> > 
> > I think there are anyway going to be some differences with VFIO.
> > 
> > For example, currently VFIO requires pinning user memory.  Is that a
> > limitation for muser too?  If so, that would be a big disadvantage; if
> > not, however, management tools need to learn that muser devices unlike
> > other VFIO devices do not prevent overcommit.  
> 
> More or less. We pin them today, but I think we don't really have to.
> I created an issue to look into it:
> https://github.com/nutanix/muser/issues/28
> 
> In any case, if Qemu is ballooning and calls UNMAP_DMA for memory that
> has been ballooned out, then we would release it.

That's exactly the problem with ballooning and vfio, it doesn't unmap
memory, it just zaps it out of the VM, to be demand faulted back in
later.  It's very vCPU-centric.  Memory hotplug is the only case where
we'll see a memory region get unmapped.
 
> The reason we keep it pinned is to support libmuser restarts. IIRC,
> VFIO doesn't need to pin pages for mdev devices (that's the job of the
> mdev driver on the other end via vfio_pin_pages()). It only keeps the
> DMA entries in a RB tree.
> 
> If my understanding is right, then we can probably just keep the map
> Qemu registered (without holding the pages) and call vfio_pin_pages()
> on demand when libmuser restarts.
> 
> For context, this is how the DMA memory registration works today:
> 
> 1) Qemu calls ioctl(vfio_fd, IOMMU_MAP_DMA, &vm_map);
> 
> 2) The iommu driver notifies muser.ko
> 
> 3) Muser.ko pins the pages (in get_dma_map(), called from below)
> (https://github.com/nutanix/muser/blob/master/kmod/muser.c#L711)

Yikes, it pins every page??  vfio_pin_pages() intends for the vendor
driver to be much smarter than this :-\  Thanks,

Alex
 
> 4) Muser.ko notifies libmuser about the memory registration
> (The iommu driver context goes to sleep, hence the pinning)
> 
> 5) Libmuser wakes up and calls mmap() on muser.ko
> 
> 6) Muser.ko inserts the VM memory in libmuser's context
> (https://github.com/nutanix/muser/blob/master/kmod/muser.c#L543)
> 
> 7) Libmuser tells muser.ko that it's done
> 
> 8) Muser.ko iommu callback context that was sleeping wakes up
> 
> 9) Muser.ko places the memory in a "dma_list" for the mudev and returns.
> 
> We could potentially modify the last step to unpin and keep only what
> we need for a future call to vfio_pin_pages(), but I need to check if
> that works.
> 
> Cheers,
> Felipe
> 
> > 
> > Paolo
> >   
> 
> 




* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-20 15:25                         ` Alex Williamson
@ 2019-12-20 16:00                           ` Felipe Franciosi
  2020-02-25  9:16                           ` Thanos Makatos
  1 sibling, 0 replies; 140+ messages in thread
From: Felipe Franciosi @ 2019-12-20 16:00 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson,
	Stefan Hajnoczi, qemu-devel, Walker, Benjamin, kraxel, jag.raman,
	Harris, James R, quintela, mst, armbru, kanth.ghatraju, thuth,
	ehabkost, konrad.wilk, dgilbert, liran.alon, Thanos Makatos, rth,
	kwolf, Daniel P. Berrangé,
	mreitz, ross.lagerwall, marcandre.lureau, Paolo Bonzini

Heya,

> On Dec 20, 2019, at 3:25 PM, Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> On Fri, 20 Dec 2019 14:14:33 +0000
> Felipe Franciosi <felipe@nutanix.com> wrote:
> 
>>> On Dec 20, 2019, at 9:50 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> 
>>> On 20/12/19 10:47, Stefan Hajnoczi wrote:  
>>>>> If it doesn't create too large of a burden to support both, then I think
>>>>> it is very desirable. IIUC, this is saying a kernel based solution as the
>>>>> optimized/optimal solution, and userspace UNIX socket based option as the
>>>>> generic "works everywhere" fallback solution.  
>>>> I'm slightly in favor of the kernel implementation because it keeps us
>>>> better aligned with VFIO.  That means solving problems in one place only
>>>> and less reinventing the wheel.  
>>> 
>>> I think there are anyway going to be some differences with VFIO.
>>> 
>>> For example, currently VFIO requires pinning user memory.  Is that a
>>> limitation for muser too?  If so, that would be a big disadvantage; if
>>> not, however, management tools need to learn that muser devices unlike
>>> other VFIO devices do not prevent overcommit.  
>> 
>> More or less. We pin them today, but I think we don't really have to.
>> I created an issue to look into it:
>> https://github.com/nutanix/muser/issues/28 
>> 
>> In any case, if Qemu is ballooning and calls UNMAP_DMA for memory that
>> has been ballooned out, then we would release it.
> 
> That's exactly the problem with ballooning and vfio, it doesn't unmap
> memory, it just zaps it out of the VM, to be demand faulted back in
> later.  It's very vCPU-centric.  Memory hotplug is the only case where
> we'll see a memory region get unmapped.
> 
>> The reason we keep it pinned is to support libmuser restarts. IIRC,
>> VFIO doesn't need to pin pages for mdev devices (that's the job of the
>> mdev driver on the other end via vfio_pin_pages()). It only keeps the
>> DMA entries in a RB tree.
>> 
>> If my understanding is right, then we can probably just keep the map
>> Qemu registered (without holding the pages) and call vfio_pin_pages()
>> on demand when libmuser restarts.
>> 
>> For context, this is how the DMA memory registration works today:
>> 
>> 1) Qemu calls ioctl(vfio_fd, IOMMU_MAP_DMA, &vm_map);
>> 
>> 2) The iommu driver notifies muser.ko
>> 
>> 3) Muser.ko pins the pages (in get_dma_map(), called from below)
>> (https://github.com/nutanix/muser/blob/master/kmod/muser.c#L711)
> 
> Yikes, it pins every page??  vfio_pin_pages() intends for the vendor
> driver to be much smarter than this :-\  Thanks,

We can't afford a kernel round trip every time we need to translate
GPAs, so that's how we solved it. There's an action item to pin in
groups of 512 (which is the limit we saw in vfio_pin_pages()). Can you
elaborate on the problems of the approach and whether there's
something better we can do?
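
A minimal sketch of the batching arithmetic behind that action item, assuming 4 KiB pages and taking the 512-page limit mentioned above as given. The helper name pin_batches is hypothetical, and no real pinning call is made; the generator only yields the (first page frame, page count) pairs that each call would cover.

```python
PAGE_SIZE = 4096
MAX_PAGES_PER_CALL = 512  # per-call limit mentioned in the thread


def pin_batches(iova, size, batch_pages=MAX_PAGES_PER_CALL):
    """Yield (first_pfn, npages) tuples covering [iova, iova + size)."""
    if size <= 0:
        raise ValueError("size must be positive")
    pfn = iova // PAGE_SIZE
    # Round the end up so a partially covered last page is included.
    end_pfn = (iova + size + PAGE_SIZE - 1) // PAGE_SIZE
    while pfn < end_pfn:
        npages = min(batch_pages, end_pfn - pfn)
        yield pfn, npages
        pfn += npages
```

For example, a 5 MiB region starting at address 0 spans 1280 pages and would be covered by three batches of 512, 512 and 256 pages.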

F.

> 
> Alex
> 
>> 4) Muser.ko notifies libmuser about the memory registration
>> (The iommu driver context goes to sleep, hence the pinning)
>> 
>> 5) Libmuser wakes up and calls mmap() on muser.ko
>> 
>> 6) Muser.ko inserts the VM memory in libmuser's context
>> (https://github.com/nutanix/muser/blob/master/kmod/muser.c#L543)
>> 
>> 7) Libmuser tells muser.ko that it's done
>> 
>> 8) Muser.ko iommu callback context that was sleeping wakes up
>> 
>> 9) Muser.ko places the memory in a "dma_list" for the mudev and returns.
>> 
>> We could potentially modify the last step to unpin and keep only what
>> we need for a future call to vfio_pin_pages(), but I need to check if
>> that works.
>> 
>> Cheers,
>> Felipe
>> 
>>> 
>>> Paolo




* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-19 13:36               ` Stefan Hajnoczi
@ 2019-12-20 17:15                 ` John G Johnson
  2020-01-02 10:00                   ` Stefan Hajnoczi
  2020-01-02 10:04                   ` Stefan Hajnoczi
  0 siblings, 2 replies; 140+ messages in thread
From: John G Johnson @ 2019-12-20 17:15 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, mst, qemu-devel, kraxel,
	jag.raman, quintela, armbru, kanth.ghatraju, Felipe Franciosi,
	thuth, ehabkost, konrad.wilk, dgilbert, liran.alon,
	Stefan Hajnoczi, Thanos Makatos, rth, kwolf, berrange, mreitz,
	ross.lagerwall, marcandre.lureau, Paolo Bonzini



> On Dec 19, 2019, at 5:36 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> 
> On Wed, Dec 18, 2019 at 01:00:55AM +0100, Paolo Bonzini wrote:
>> On 17/12/19 23:57, Felipe Franciosi wrote:
>>> Doing it in userspace was the flow we proposed back in last year's KVM
>>> Forum (Edinburgh), but it got turned down.
>> 
>> I think the time since then has shown that essentially the cat is out of
>> the bag.  I didn't really like the idea of devices outside QEMU---and I
>> still don't---but if something like "VFIO over AF_UNIX" turns out to be
>> the cleanest way to implement multi-process QEMU device models, I am not
>> going to pull an RMS and block that from happening.  Assuming I could
>> even do so!
> 
> There are a range of approaches that will influence how out-of-process
> devices can be licensed and distributed.
> 
> A VFIO-over-UNIX domain sockets approach means a stable API so that any
> license (including proprietary) is possible.
> 
> Another approach is a QEMU-centric unstable protocol.  I'll call this
> the qdev-over-UNIX domain sockets approach.  Maintaining an out-of-tree
> device is expensive and ugly since the protocol changes between QEMU
> versions in ways that are incompatible and undetectable.
> 
> On top of that, the initialization protocol message could include the
> QEMU version string that the device was compiled against.  If the
> version string doesn't match then QEMU will refuse to talk to the
> device.
> 

	This is very similar to our multi-process QEMU implementation before
we looked into using muser.  The differences are:

We use one object per emulated device type in QEMU rather than having a single
VFIO type that can masquerade as any PCI device.

We don’t pin guest memory; we pass the QEMU file descriptors used to create
guest memory to the emulation program, and it mmap()s them itself (à la
vhost-user).
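
A minimal sketch of this file-descriptor-passing pattern, not the actual multi-process QEMU code: both ends run in one process over a socketpair for brevity, the helper names are hypothetical, and the 8-byte size message is an invented framing. It relies on socket.send_fds/recv_fds (Python 3.9+), which wrap SCM_RIGHTS, and uses os.memfd_create where available.

```python
import mmap
import os
import socket
import tempfile

RAM_SIZE = 2 * mmap.PAGESIZE


def create_guest_ram(size):
    """Return an fd backing the stand-in 'guest RAM'."""
    if hasattr(os, "memfd_create"):
        fd = os.memfd_create("guest-ram")
    else:
        # Fallback for platforms without memfd: an unlinked temp file.
        tmp = tempfile.TemporaryFile()
        fd = os.dup(tmp.fileno())
        tmp.close()
    os.ftruncate(fd, size)
    return fd


def send_ram_fd(sock, fd, size):
    # The payload carries the size; the fd travels as ancillary data.
    socket.send_fds(sock, [size.to_bytes(8, "little")], [fd])


def recv_ram_fd(sock):
    data, fds, _flags, _addr = socket.recv_fds(sock, 8, 1)
    return fds[0], int.from_bytes(data, "little")


def demo():
    vmm_sock, dev_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    ram_fd = create_guest_ram(RAM_SIZE)
    vmm_view = mmap.mmap(ram_fd, RAM_SIZE)  # the "VMM" side mapping

    send_ram_fd(vmm_sock, ram_fd, RAM_SIZE)
    dev_fd, size = recv_ram_fd(dev_sock)
    dev_view = mmap.mmap(dev_fd, size)      # the "device" maps it itself

    dev_view[0:4] = b"DMA!"                 # device writes into guest RAM
    seen = vmm_view[0:4]                    # VMM sees it, with no pinning

    dev_view.close()
    vmm_view.close()
    os.close(dev_fd)
    os.close(ram_fd)
    vmm_sock.close()
    dev_sock.close()
    return seen
```

Because both mappings are shared views of the same file, the pages stay ordinary, reclaimable memory; nothing in this path requires the kernel to pin them.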

								JJ



> Distributing a single device executable that works with many QEMUs (e.g.
> CentOS, Ubuntu) and versions becomes difficult.
> 
> I want to mention that we have the option of doing this if there are
> strong concerns about out-of-tree devices.  It does have downsides:
> 1. Inability to share devices with other VMMs.
> 2. Probably won't replace vhost-user due to the out-of-tree limitations.
> 3. Can still be circumvented by a motivated device author.
> 
> Stefan




* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-20 17:15                 ` John G Johnson
@ 2020-01-02 10:00                   ` Stefan Hajnoczi
  2020-01-02 10:04                   ` Stefan Hajnoczi
  1 sibling, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2020-01-02 10:00 UTC (permalink / raw)
  To: John G Johnson
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, mst, qemu-devel, kraxel,
	jag.raman, quintela, armbru, kanth.ghatraju, Felipe Franciosi,
	thuth, ehabkost, konrad.wilk, dgilbert, liran.alon,
	Stefan Hajnoczi, Thanos Makatos, rth, kwolf, berrange, mreitz,
	ross.lagerwall, marcandre.lureau, Paolo Bonzini


On Fri, Dec 20, 2019 at 09:15:40AM -0800, John G Johnson wrote:
> > On Dec 19, 2019, at 5:36 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Wed, Dec 18, 2019 at 01:00:55AM +0100, Paolo Bonzini wrote:
> >> On 17/12/19 23:57, Felipe Franciosi wrote:
> We don’t pin guest memory; we pass the QEMU file descriptors used to create
> guest memory to the emulation program, and it mmap()s them itself. (ala
> vhost-user).

Does muser really require pinning?  If yes, then it seems like a
limitation that can be removed in the future.

Stefan


* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-20 17:15                 ` John G Johnson
  2020-01-02 10:00                   ` Stefan Hajnoczi
@ 2020-01-02 10:04                   ` Stefan Hajnoczi
  1 sibling, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2020-01-02 10:04 UTC (permalink / raw)
  To: John G Johnson
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, mst, qemu-devel, kraxel,
	jag.raman, quintela, armbru, kanth.ghatraju, Felipe Franciosi,
	thuth, ehabkost, konrad.wilk, dgilbert, liran.alon,
	Stefan Hajnoczi, Thanos Makatos, rth, kwolf, berrange, mreitz,
	ross.lagerwall, marcandre.lureau, Paolo Bonzini


On Fri, Dec 20, 2019 at 09:15:40AM -0800, John G Johnson wrote:
> > On Dec 19, 2019, at 5:36 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Wed, Dec 18, 2019 at 01:00:55AM +0100, Paolo Bonzini wrote:
> >> On 17/12/19 23:57, Felipe Franciosi wrote:
> We don’t pin guest memory; we pass the QEMU file descriptors used to create
> guest memory to the emulation program, and it mmap()s them itself. (ala
> vhost-user).

Please ignore my reply.  I just saw pinning was discussed in another
sub-thread.  Felipe posted this URL for tracking the issue:
https://github.com/nutanix/muser/issues/28

Stefan


* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-20 10:22                     ` Daniel P. Berrangé
@ 2020-01-02 10:42                       ` Stefan Hajnoczi
  2020-01-02 11:03                         ` Felipe Franciosi
  0 siblings, 1 reply; 140+ messages in thread
From: Stefan Hajnoczi @ 2020-01-02 10:42 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson, qemu-devel,
	Walker, Benjamin, kraxel, jag.raman, Harris, James R, quintela,
	mst, armbru, kanth.ghatraju, Felipe Franciosi, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, Thanos Makatos, rth, kwolf,
	mreitz, ross.lagerwall, marcandre.lureau, pbonzini


On Fri, Dec 20, 2019 at 10:22:37AM +0000, Daniel P. Berrangé wrote:
> On Fri, Dec 20, 2019 at 09:47:12AM +0000, Stefan Hajnoczi wrote:
> > On Thu, Dec 19, 2019 at 12:55:04PM +0000, Daniel P. Berrangé wrote:
> > > On Thu, Dec 19, 2019 at 12:33:15PM +0000, Felipe Franciosi wrote:
> > > > > On Dec 19, 2019, at 11:55 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > > > > On Tue, Dec 17, 2019 at 10:57:17PM +0000, Felipe Franciosi wrote:
> > > > >>> On Dec 17, 2019, at 5:33 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > > >>> On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
> > > > >>>>> On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
> > > > >>>>> On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
> > > > To be clear: I'm very happy to have a userspace-only option for this,
> > > > I just don't want to ditch the kernel module (yet, anyway). :)
> > > 
> > > If it doesn't create too large of a burden to support both, then I think
> > > it is very desirable. IIUC, this is saying a kernel based solution as the
> > > optimized/optimal solution, and userspace UNIX socket based option as the
> > > generic "works everywhere" fallback solution.
> > 
> > I'm slightly in favor of the kernel implementation because it keeps us
> > better aligned with VFIO.  That means solving problems in one place only
> > and less reinventing the wheel.
> > 
> > Knowing that a userspace implementation is possible is a plus though.
> > Maybe that option will become attractive in the future and someone will
> > develop it.  In fact, a userspace implementation may be a cool Google
> > Summer of Code project idea that I'd like to co-mentor.
> 
> If it is technically viable as an approach, then I think  we should be
> treating a fully unprivileged muser-over-UNIX socket as a higher priority
> than just "maybe a GSoC student will want to do it".
> 
> Libvirt is getting a strong message from the KubeVirt project that they want to
> be running both libvirtd and QEMU fully unprivileged. This allows their
> containers to be unprivileged. Anything that requires privileges requires
> jumping through extra hoops writing custom code in KubeVirt to do things
> outside libvirt in side-loaded privileged containers, and this limits how
> and where those features can be used.

Okay this makes sense.

There needs to be a consensus on whether to go with a qdev-over-socket
approach that is QEMU-specific and strongly discourages third-party
device distribution or a muser-over-socket approach that offers a stable
API for VMM interoperability and third-party device distribution.

Interoperability between VMMs and also DPDK/SPDK is important because
they form today's open source virtualization community.  No one project
or codebase covers all use cases or interesting developments.  If we are
short-sighted and prevent collaboration then we'll become isolated.

On the other hand, I'm personally opposed to proprietary vendors that
contribute very little to open source.  We make that easier by offering
a stable API for third-party devices.  A stable API discourages open
source contributions while allowing proprietary vendors to benefit from
the work that the open source community is doing.

One way to choose a position is to balance up the open source vs
proprietary applications of a stable API.  At this point in time I think
the DPDK/SPDK and rust-vmm communities bring enough to the table that
it's worth fostering collaboration through a stable API.  The benefit of
having the stable API is large enough that the disadvantage of making
life easier for proprietary vendors can be accepted.

This is just a more elaborate explanation for the "the cat is out of the
bag" comments that have already been made on licensing.  Does anyone
still disagree or want to discuss further?

If there is agreement that a stable API is okay then I think the
practical way to do this is to first merge a cleaned-up version of
multi-process QEMU as an unstable experimental API.  Once it's being
tested and used we can write a protocol specification and publish it as
a stable interface when the spec has addressed most use cases.

Does this sound good?

Stefan


* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2020-01-02 10:42                       ` Stefan Hajnoczi
@ 2020-01-02 11:03                         ` Felipe Franciosi
  2020-01-02 18:55                           ` Marc-André Lureau
  2020-01-03 15:59                           ` Stefan Hajnoczi
  0 siblings, 2 replies; 140+ messages in thread
From: Felipe Franciosi @ 2020-01-02 11:03 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson, qemu-devel,
	Walker, Benjamin, kraxel, jag.raman, Harris,  James R, quintela,
	mst, armbru, kanth.ghatraju, thuth, ehabkost, konrad.wilk,
	dgilbert, liran.alon, Thanos Makatos, rth, kwolf,
	Daniel P. Berrangé,
	mreitz, ross.lagerwall, marcandre.lureau, pbonzini



> On Jan 2, 2020, at 10:42 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> 
> On Fri, Dec 20, 2019 at 10:22:37AM +0000, Daniel P. Berrangé wrote:
>> On Fri, Dec 20, 2019 at 09:47:12AM +0000, Stefan Hajnoczi wrote:
>>> On Thu, Dec 19, 2019 at 12:55:04PM +0000, Daniel P. Berrangé wrote:
>>>> On Thu, Dec 19, 2019 at 12:33:15PM +0000, Felipe Franciosi wrote:
>>>>>> On Dec 19, 2019, at 11:55 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>>>>> On Tue, Dec 17, 2019 at 10:57:17PM +0000, Felipe Franciosi wrote:
>>>>>>>> On Dec 17, 2019, at 5:33 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>>>>>>> On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
>>>>>>>>>> On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
>>>>>>>>>> On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
>>>>> To be clear: I'm very happy to have a userspace-only option for this,
>>>>> I just don't want to ditch the kernel module (yet, anyway). :)
>>>> 
>>>> If it doesn't create too large of a burden to support both, then I think
>>>> it is very desirable. IIUC, this is saying a kernel based solution as the
>>>> optimized/optimal solution, and userspace UNIX socket based option as the
>>>> generic "works everywhere" fallback solution.
>>> 
>>> I'm slightly in favor of the kernel implementation because it keeps us
>>> better aligned with VFIO.  That means solving problems in one place only
>>> and less reinventing the wheel.
>>> 
>>> Knowing that a userspace implementation is possible is a plus though.
>>> Maybe that option will become attractive in the future and someone will
>>> develop it.  In fact, a userspace implementation may be a cool Google
>>> Summer of Code project idea that I'd like to co-mentor.
>> 
>> If it is technically viable as an approach, then I think  we should be
>> treating a fully unprivileged muser-over-UNIX socket as a higher priority
>> than just "maybe a GSoC student will want to do it".
>> 
>> Libvirt is getting a strong message from the KubeVirt project that they want to
>> be running both libvirtd and QEMU fully unprivileged. This allows their
>> containers to be unprivileged. Anything that requires privileges requires
>> jumping through extra hoops writing custom code in KubeVirt to do things
>> outside libvirt in side-loaded privileged containers, and this limits how
>> and where those features can be used.
> 
> Okay this makes sense.
> 
> There needs to be a consensus on whether to go with a qdev-over-socket
> approach that is QEMU-specific and strongly discourages third-party
> device distribution or a muser-over-socket approach that offers a stable
> API for VMM interoperability and third-party device distribution.

The reason I dislike yet another offloading protocol (ie. there is
vhost, there is vfio, and then there would be qdev-over-socket) is
that we keep reinventing the wheel. I very much prefer picking
something solid (e.g. VFIO) and keep investing in it.

> Interoperability between VMMs and also DPDK/SPDK is important because
> they form today's open source virtualization community.  No one project
> or codebase covers all use cases or interesting developments.  If we are
> short-sighted and prevent collaboration then we'll become isolated.
> 
> On the other hand, I'm personally opposed to proprietary vendors that
> contribute very little to open source.  We make that easier by offering
> a stable API for third-party devices.  A stable API discourages open
> source contributions while allowing proprietary vendors to benefit from
> the work that the open source community is doing.

I appreciate the concern. However, my opinion is that vendors cannot
be stopped by providing them with unstable APIs. There are plenty of
examples where projects were forked and maintained separately to keep
certain things under control and that is bad for everyone. The
community doesn't get contributions back, and vendors have extra pain
to maintain the forks. Furthermore, service vendors will always get
away with murder by copying whatever they like and using it however they
please (since they are not sharing the software).

I would rather look at examples like KVM. It's a relatively stable API
with several proprietary users. Nevertheless, we see loads of
contributions to it (perhaps less than we would want, but plenty).

> 
> One way to choose a position is to balance up the open source vs
> proprietary applications of a stable API.  At this point in time I think
> the DPDK/SPDK and rust-vmm communities bring enough to the table that
> it's worth fostering collaboration through a stable API.  The benefit of
> having the stable API is large enough that the disadvantage of making
> life easier for proprietary vendors can be accepted.

I agree with you as per reasoning above.

> 
> This is just a more elaborate explanation for the "the cat is out of the
> bag" comments that have already been made on licensing.  Does anyone
> still disagree or want to discuss further?
> 
> If there is agreement that a stable API is okay then I think the
> practical way to do this is to first merge a cleaned-up version of
> multi-process QEMU as an unstable experimental API.  Once it's being
> tested and used we can write a protocol specification and publish it as
> a stable interface when the spec has addressed most use cases.
> 
> Does this sound good?

In that case, wouldn't it be preferable to revive our proposal from
Edinburgh (KVM Forum 2018)? Our prototypes moved more of the Qemu VFIO
code to "common" and added a "user" backend underneath it, similar to
how vhost-user-scsi moved some of vhost-scsi to vhost-scsi-common and
added vhost-user-scsi. It was PCI-centric, but it doesn't have to
be. The other side can be implemented in libmuser for facilitating things.

I even recall highlighting that vhost-user could be moved underneath
that later, greatly simplifying lots of other Qemu code.

F.


> 
> Stefan



* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-17 16:33         ` Stefan Hajnoczi
  2019-12-17 22:57           ` Felipe Franciosi
@ 2020-01-02 16:01           ` Elena Ufimtseva
  2020-01-03 15:00             ` Stefan Hajnoczi
  1 sibling, 1 reply; 140+ messages in thread
From: Elena Ufimtseva @ 2020-01-02 16:01 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: fam, john.g.johnson, Swapnil Ingle, mst, qemu-devel, kraxel,
	jag.raman, quintela, armbru, kanth.ghatraju, Felipe Franciosi,
	thuth, ehabkost, konrad.wilk, dgilbert, liran.alon,
	Thanos Makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

On Tue, Dec 17, 2019 at 04:33:16PM +0000, Stefan Hajnoczi wrote:
> On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
> > > On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
> > > On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
> > >> Is there a work-in-progress muser patch series you can post to start the
> > >> discussion early?  That way we can avoid reviewers like myself asking
> > >> you to make changes after you have invested a lot of time.
> > >> 
> > > 
> > > Absolutely, that is our plan. At the moment we do not have the patches
> > > ready for review. We have set up a milestone internally and will be
> > > sending that early version as a tarball once it is complete.
> > > Would a meeting also help us stay on the same page?
> > 
> > Please loop us in if you do set up a meeting.
>

Hi Stefan

And happy New Year to everyone!

> There is a bi-weekly KVM Community Call that we can use for phone
> discussions:
> 
>   https://calendar.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
>

Our team would like to join the call on Jan 14 and maybe talk over a few things.
Felipe, will you and your team be joining as well?

> Or we can schedule a one-off call at any time :).
>

Awesome! Thank you, we will certainly take this opportunity.

Elena

> Questions I've seen when discussing muser with people have been:
> 
> 1. Can unprivileged containers create muser devices?  If not, this is a
>    blocker for use cases that want to avoid root privileges entirely.
> 
> 2. Does muser need to be in the kernel (e.g. slower to develop/ship,
>    security reasons)?  A similar library could be implemented in
>    userspace along the lines of the vhost-user protocol.  Although VMMs
>    would then need to use a new libmuser-client library instead of
>    reusing their VFIO code to access the device.
> 
> 3. Should this feature be Linux-only?  vhost-user can be implemented on
>    non-Linux OSes...
> 
> Stefan





* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2020-01-02 11:03                         ` Felipe Franciosi
@ 2020-01-02 18:55                           ` Marc-André Lureau
  2020-01-08 16:31                             ` Stefan Hajnoczi
  2020-01-03 15:59                           ` Stefan Hajnoczi
  1 sibling, 1 reply; 140+ messages in thread
From: Marc-André Lureau @ 2020-01-02 18:55 UTC (permalink / raw)
  To: Felipe Franciosi
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson,
	Stefan Hajnoczi, qemu-devel, Walker, Benjamin, kraxel, jag.raman,
	Harris, James R, quintela, mst, armbru, kanth.ghatraju, thuth,
	ehabkost, konrad.wilk, dgilbert, liran.alon, Thanos Makatos, rth,
	kwolf, Daniel P. Berrangé,
	mreitz, ross.lagerwall, pbonzini

Hi

On Thu, Jan 2, 2020 at 3:03 PM Felipe Franciosi <felipe@nutanix.com> wrote:
> The reason I dislike yet another offloading protocol (ie. there is
> vhost, there is vfio, and then there would be qdev-over-socket) is
> that we keep reinventing the wheel. I very much prefer picking
> something solid (e.g. VFIO) and keep investing in it.

I don't have a lot of experience with VFIO, so I can't tell if it's
really solid for the user-space case. Alex W could probably discuss
that.

> In that case, wouldn't it be preferable to revive our proposal from
> Edinburgh (KVM Forum 2018)? Our prototypes moved more of the Qemu VFIO
> code to "common" and added a "user" backend underneath it, similar to
> how vhost-user-scsi moved some of vhost-scsi to vhost-scsi-common and
> added vhost-user-scsi. It was PCI-centric, but it doesn't have to
> be. The other side can be implemented in libmuser for facilitating things.

Same idea back at KVM Forum 2017 (briefly mentioned at the end of our
talk with Konrad)

The PoC is still around:
https://github.com/elmarco/qemu/tree/wip/vfio-user/contrib/libvfio-user

> I even recall highlighting that vhost-user could be moved underneath
> that later, greatly simplifying lots of other Qemu code.

That would eventually be an option, but vhost-user is already quite
complicated. We could try to split it up somehow for the non-virtio
parts.

cheers

-- 
Marc-André Lureau



* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2020-01-02 16:01           ` Elena Ufimtseva
@ 2020-01-03 15:00             ` Stefan Hajnoczi
  0 siblings, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2020-01-03 15:00 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: fam, john.g.johnson, Swapnil Ingle, mst, qemu-devel, kraxel,
	jag.raman, quintela, armbru, kanth.ghatraju, Felipe Franciosi,
	thuth, ehabkost, konrad.wilk, dgilbert, liran.alon,
	Thanos Makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini


On Thu, Jan 02, 2020 at 08:01:36AM -0800, Elena Ufimtseva wrote:
> On Tue, Dec 17, 2019 at 04:33:16PM +0000, Stefan Hajnoczi wrote:
> > On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
> > > > On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
> > > > On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
> > > >> Is there a work-in-progress muser patch series you can post to start the
> > > >> discussion early?  That way we can avoid reviewers like myself asking
> > > >> you to make changes after you have invested a lot of time.
> > > >> 
> > > > 
> > > > Absolutely, that is our plan. At the moment we do not have the patches
> > > > ready for review. We have set up a milestone internally and will be
> > > > sending that early version as a tarball once it is complete.
> > > > Would a meeting also help us stay on the same page?
> > > 
> > > Please loop us in if you do set up a meeting.
> >
> 
> Hi Stefan
> 
> And happy New Year to everyone!
> 
> > There is a bi-weekly KVM Community Call that we can use for phone
> > discussions:
> > 
> >   https://calendar.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
> >
> 
> Our team would like to join the call on Jan 14 and maybe talk over a few things.
> Felipe, will you and your team be joining as well?

Great, I'll be there.

Stefan


* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2020-01-02 11:03                         ` Felipe Franciosi
  2020-01-02 18:55                           ` Marc-André Lureau
@ 2020-01-03 15:59                           ` Stefan Hajnoczi
  2020-01-14  1:56                             ` John G Johnson
  1 sibling, 1 reply; 140+ messages in thread
From: Stefan Hajnoczi @ 2020-01-03 15:59 UTC (permalink / raw)
  To: Felipe Franciosi
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson, qemu-devel,
	Walker, Benjamin, kraxel, jag.raman, Harris, James R, quintela,
	mst, armbru, kanth.ghatraju, thuth, ehabkost, konrad.wilk,
	dgilbert, liran.alon, Thanos Makatos, rth, kwolf,
	Daniel P. Berrangé,
	mreitz, ross.lagerwall, marcandre.lureau, pbonzini

On Thu, Jan 02, 2020 at 11:03:22AM +0000, Felipe Franciosi wrote:
> > On Jan 2, 2020, at 10:42 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Fri, Dec 20, 2019 at 10:22:37AM +0000, Daniel P. Berrangé wrote:
> >> On Fri, Dec 20, 2019 at 09:47:12AM +0000, Stefan Hajnoczi wrote:
> >>> On Thu, Dec 19, 2019 at 12:55:04PM +0000, Daniel P. Berrangé wrote:
> >>>> On Thu, Dec 19, 2019 at 12:33:15PM +0000, Felipe Franciosi wrote:
> >>>>>> On Dec 19, 2019, at 11:55 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> >>>>>> On Tue, Dec 17, 2019 at 10:57:17PM +0000, Felipe Franciosi wrote:
> >>>>>>>> On Dec 17, 2019, at 5:33 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >>>>>>>> On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
> >>>>>>>>>> On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
> >>>>>>>>>> On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
> >>>>> To be clear: I'm very happy to have a userspace-only option for this,
> >>>>> I just don't want to ditch the kernel module (yet, anyway). :)
> >>>> 
> >>>> If it doesn't create too large of a burden to support both, then I think
> >>>> it is very desirable. IIUC, this is saying a kernel based solution as the
> >>>> optimized/optimal solution, and userspace UNIX socket based option as the
> >>>> generic "works everywhere" fallback solution.
> >>> 
> >>> I'm slightly in favor of the kernel implementation because it keeps us
> >>> better aligned with VFIO.  That means solving problems in one place only
> >>> and less reinventing the wheel.
> >>> 
> >>> Knowing that a userspace implementation is possible is a plus though.
> >>> Maybe that option will become attractive in the future and someone will
> >>> develop it.  In fact, a userspace implementation may be a cool Google
> >>> Summer of Code project idea that I'd like to co-mentor.
> >> 
> >> If it is technically viable as an approach, then I think we should be
> >> treating a fully unprivileged muser-over-UNIX socket as a higher priority
> >> than just "maybe a GSoC student will want to do it".
> >> 
> >> Libvirt is getting a strong message from the KubeVirt project that they want
> >> to be running both libvirtd and QEMU fully unprivileged. This allows their
> >> containers to be unprivileged. Anything that requires privileges requires
> >> jumping through extra hoops writing custom code in KubeVirt to do things
> >> outside libvirt in side-loaded privileged containers, and this limits how
> >> and where those features can be used.
> > 
> > Okay this makes sense.
> > 
> > There needs to be a consensus on whether to go with a qdev-over-socket
> > approach that is QEMU-specific and strongly discourages third-party
> > device distribution or a muser-over-socket approach that offers a stable
> > API for VMM interoperability and third-party device distribution.
> 
> The reason I dislike yet another offloading protocol (ie. there is
> vhost, there is vfio, and then there would be qdev-over-socket) is
> that we keep reinventing the wheel. I very much prefer picking
> something solid (eg. VFIO) and keep investing on it.

I like the idea of sticking close to VFIO too.  The first step is
figuring out whether VFIO can be mapped to a UNIX domain socket protocol
and how many non-VFIO protocol messages are required.  Hopefully that
extra non-VFIO stuff isn't too large.

If implementations can use the kernel uapi vfio header files then we're
on track for compatibility with VFIO.

> > This is just a more elaborate explanation for the "the cat is out of the
> > bag" comments that have already been made on licensing.  Does anyone
> > still disagree or want to discuss further?
> > 
> > If there is agreement that a stable API is okay then I think the
> > practical way to do this is to first merge a cleaned-up version of
> > multi-process QEMU as an unstable experimental API.  Once it's being
> > tested and used we can write a protocol specification and publish it as
> > a stable interface when the spec has addressed most use cases.
> > 
> > Does this sound good?
> 
> In that case, wouldn't it be preferable to revive our proposal from
> Edinburgh (KVM Forum 2018)? Our prototypes moved more of the Qemu VFIO
> code to "common" and added a "user" backend underneath it, similar to
> how vhost-user-scsi moved some of vhost-scsi to vhost-scsi-common and
> added vhost-user-scsi. It was centric on PCI, but it doesn't have to
> be. The other side can be implemented in libmuser for facilitating things.

That sounds good.

Stefan


* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2020-01-02 18:55                           ` Marc-André Lureau
@ 2020-01-08 16:31                             ` Stefan Hajnoczi
  0 siblings, 0 replies; 140+ messages in thread
From: Stefan Hajnoczi @ 2020-01-08 16:31 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson, qemu-devel,
	Walker, Benjamin, kraxel, jag.raman, Harris, James R, quintela,
	mst, armbru, kanth.ghatraju, Felipe Franciosi, thuth, ehabkost,
	konrad.wilk, dgilbert, liran.alon, Thanos Makatos, rth, kwolf,
	Daniel P. Berrangé,
	mreitz, ross.lagerwall, pbonzini

On Thu, Jan 02, 2020 at 10:55:46PM +0400, Marc-André Lureau wrote:
> On Thu, Jan 2, 2020 at 3:03 PM Felipe Franciosi <felipe@nutanix.com> wrote:
> > I even recall highlighting that vhost-user could be moved underneath
> > that later, greatly simplifying lots of other Qemu code.
> 
> That would eventually be an option, but vhost-user is already quite
> complicated. We could try to split it up somehow for the non-virtio
> parts.

I hope we can deprecate vhost-user.  New out-of-process devices should
just implement VFIO-over-socket with virtio-pci.  This way
out-of-process devices are full VIRTIO devices and it's cleaner than
having the vhost-user protocol that exposes a subset of VIRTIO.

Stefan


* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2020-01-03 15:59                           ` Stefan Hajnoczi
@ 2020-01-14  1:56                             ` John G Johnson
  2020-01-17 17:25                               ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 140+ messages in thread
From: John G Johnson @ 2020-01-14  1:56 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, mst, qemu-devel, Walker,
	Benjamin, kraxel, jag.raman, Harris, James R, quintela, armbru,
	kanth.ghatraju, Felipe Franciosi, thuth, ehabkost, konrad.wilk,
	dgilbert, liran.alon, Thanos Makatos, rth, kwolf,
	"Daniel P. Berrangé",
	mreitz, ross.lagerwall, marcandre.lureau, pbonzini



> On Jan 3, 2020, at 7:59 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> 
> On Thu, Jan 02, 2020 at 11:03:22AM +0000, Felipe Franciosi wrote:
>>> On Jan 2, 2020, at 10:42 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>> On Fri, Dec 20, 2019 at 10:22:37AM +0000, Daniel P. Berrangé wrote:
>>>> On Fri, Dec 20, 2019 at 09:47:12AM +0000, Stefan Hajnoczi wrote:
>>>>> On Thu, Dec 19, 2019 at 12:55:04PM +0000, Daniel P. Berrangé wrote:
>>>>>> On Thu, Dec 19, 2019 at 12:33:15PM +0000, Felipe Franciosi wrote:
>>>>>>>> On Dec 19, 2019, at 11:55 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>>>>>>> On Tue, Dec 17, 2019 at 10:57:17PM +0000, Felipe Franciosi wrote:
>>>>>>>>>> On Dec 17, 2019, at 5:33 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>>>>>>>>> On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
>>>>>>>>>>>> On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
>>>>>>>>>>>> On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
>>>>>>> To be clear: I'm very happy to have a userspace-only option for this,
>>>>>>> I just don't want to ditch the kernel module (yet, anyway). :)
>>>>>> 
>>>>>> If it doesn't create too large of a burden to support both, then I think
>>>>>> it is very desirable. IIUC, this is saying a kernel based solution as the
>>>>>> optimized/optimal solution, and userspace UNIX socket based option as the
>>>>>> generic "works everywhere" fallback solution.
>>>>> 
>>>>> I'm slightly in favor of the kernel implementation because it keeps us
>>>>> better aligned with VFIO.  That means solving problems in one place only
>>>>> and less reinventing the wheel.
>>>>> 
>>>>> Knowing that a userspace implementation is possible is a plus though.
>>>>> Maybe that option will become attractive in the future and someone will
>>>>> develop it.  In fact, a userspace implementation may be a cool Google
>>>>> Summer of Code project idea that I'd like to co-mentor.
>>>> 
>>>> If it is technically viable as an approach, then I think we should be
>>>> treating a fully unprivileged muser-over-UNIX socket as a higher priority
>>>> than just "maybe a GSoC student will want to do it".
>>>> 
>>>> Libvirt is getting a strong message from the KubeVirt project that they want
>>>> to be running both libvirtd and QEMU fully unprivileged. This allows their
>>>> containers to be unprivileged. Anything that requires privileges requires
>>>> jumping through extra hoops writing custom code in KubeVirt to do things
>>>> outside libvirt in side-loaded privileged containers, and this limits how
>>>> and where those features can be used.
>>> 
>>> Okay this makes sense.
>>> 
>>> There needs to be a consensus on whether to go with a qdev-over-socket
>>> approach that is QEMU-specific and strongly discourages third-party
>>> device distribution or a muser-over-socket approach that offers a stable
>>> API for VMM interoperability and third-party device distribution.
>> 
>> The reason I dislike yet another offloading protocol (ie. there is
>> vhost, there is vfio, and then there would be qdev-over-socket) is
>> that we keep reinventing the wheel. I very much prefer picking
>> something solid (eg. VFIO) and keep investing on it.
> 
> I like the idea of sticking close to VFIO too.  The first step is
> figuring out whether VFIO can be mapped to a UNIX domain socket protocol
> and how many non-VFIO protocol messages are required.  Hopefully that
> extra non-VFIO stuff isn't too large.
> 


	I looked at this and think we could map VFIO commands over a
UNIX socket without a lot of difficulty.  We'd have to use SCM
messages to pass file descriptors from the QEMU process to the
emulation process for certain operations, but that shouldn't be
a big problem.  Here are the mission mode operations:

configuration

	VFIO defines a number of configuration ioctl()s that we could
turn into messages, but if we make the protocol specific to PCI, then
all of the information they transmit (e.g., device regions and
interrupts) can be discovered by parsing the device's PCI config
space.  A lot of the current VFIO code that parses config space could
be re-used to do this.
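
	As a rough illustration of the config-space idea (not the
existing QEMU VFIO code), here is a minimal Python sketch that decodes
the IDs, BAR types and interrupt pin from a type-0 PCI header.  The
function name and the example register values are made up for the
example; the field offsets and BAR bit layout follow the PCI
specification.  Region sizes are not recoverable this way without the
usual write-all-ones probe, so the sketch only reports type and base.

```python
import struct

def parse_pci_header(cfg):
    """Decode IDs, BARs and the interrupt pin from a type-0 PCI header."""
    vendor_id, device_id = struct.unpack_from("<HH", cfg, 0x00)
    bars = []
    index = 0
    while index < 6:                          # six BAR slots at 0x10..0x27
        (raw,) = struct.unpack_from("<I", cfg, 0x10 + 4 * index)
        if raw == 0:                          # unassigned or unimplemented
            index += 1
        elif raw & 0x1:                       # bit 0 set: I/O space BAR
            bars.append((index, "io", raw & 0xFFFFFFFC))
            index += 1
        elif (raw >> 1) & 0x3 == 0x2:         # type 10b: 64-bit memory BAR
            (high,) = struct.unpack_from("<I", cfg, 0x10 + 4 * (index + 1))
            bars.append((index, "mem64", (high << 32) | (raw & 0xFFFFFFF0)))
            index += 2                        # a 64-bit BAR uses two slots
        else:                                 # type 00b: 32-bit memory BAR
            bars.append((index, "mem32", raw & 0xFFFFFFF0))
            index += 1
    return {
        "vendor_id": vendor_id,
        "device_id": device_id,
        "bars": bars,
        "interrupt_pin": cfg[0x3D],           # 0 = none, 1..4 = INTA..INTD
    }

# Hypothetical config space contents, for illustration only.
example_cfg = bytearray(64)
struct.pack_into("<HH", example_cfg, 0x00, 0x1000, 0x0012)
struct.pack_into("<I", example_cfg, 0x10, 0x0000C001)               # I/O BAR
struct.pack_into("<I", example_cfg, 0x14, 0xFEBF0000)               # 32-bit
struct.pack_into("<II", example_cfg, 0x18, 0xFE000004, 0x00000001)  # 64-bit
example_cfg[0x3D] = 1                                               # INTA
info = parse_pci_header(example_cfg)
```
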

MMIO

	VFIO uses reads and writes on the VFIO file descriptor to
perform MMIOs to the device.  The read/write offset encodes the VFIO
region and offset of the MMIO (the VFIO regions correspond to PCI
BARs).  These would have to be changed to send messages that include the
VFIO region and offset (and data for writes) to the emulation process.
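
	A minimal sketch of what such a message could look like, in
Python.  The wire layout (MMIO_MSG) and the command numbers are purely
hypothetical; nothing like them has been agreed.  The 40-bit shift is
how the Linux vfio-pci driver currently packs the region index into
the file offset, which is an implementation detail rather than part of
the uapi (real clients get offsets from VFIO_DEVICE_GET_REGION_INFO).

```python
import struct

VFIO_PCI_OFFSET_SHIFT = 40                    # as used by Linux vfio-pci
VFIO_PCI_OFFSET_MASK = (1 << VFIO_PCI_OFFSET_SHIFT) - 1

# Hypothetical wire format: command, region index, access size,
# offset within the region, data (ignored for reads).
MMIO_MSG = struct.Struct("<BBHQQ")
CMD_READ, CMD_WRITE = 0, 1

def split_vfio_offset(file_offset):
    """Split a vfio-pci style file offset into (region, offset)."""
    return (file_offset >> VFIO_PCI_OFFSET_SHIFT,
            file_offset & VFIO_PCI_OFFSET_MASK)

def encode_mmio(cmd, file_offset, size, data=0):
    """Build the message QEMU would send instead of pread()/pwrite()."""
    region, offset = split_vfio_offset(file_offset)
    return MMIO_MSG.pack(cmd, region, size, offset, data)

def decode_mmio(buf):
    """What the emulation process would do on receipt."""
    cmd, region, size, offset, data = MMIO_MSG.unpack(buf)
    return {"cmd": cmd, "region": region, "size": size,
            "offset": offset, "data": data}

# A 4-byte write of 0xdeadbeef at offset 0x10 of region (BAR) 2.
msg = encode_mmio(CMD_WRITE, (2 << VFIO_PCI_OFFSET_SHIFT) | 0x10, 4, 0xDEADBEEF)
decoded = decode_mmio(msg)
```
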

interrupts

	VFIO creates eventfds that are sent to the kernel driver so it
can inject interrupts into a guest.  We would have to send these
eventfds over the socket to the emulation process using SCM messages.
The emulation process could then trigger interrupts by writing on the
eventfd.

DMA

	This is one place where I might diverge from VFIO.  It uses an
ioctl to tell the kernel driver what areas of guest memory the device
can address.  The driver then pins that memory so it can be programmed
into a HW IOMMU.  We could avoid pinning of guest memory by adopting
the vhost-user idea of sending the file descriptors used by QEMU to
create guest memory to the emulation process, and having it mmap() the
guest memory itself.  IOMMUs are handled by having the emulation process
request device DMA to guest PA translations from QEMU.



> If implementations can use the kernel uapi vfio header files then we're
> on track for compatibility with VFIO.
> 
>>> This is just a more elaborate explanation for the "the cat is out of the
>>> bag" comments that have already been made on licensing.  Does anyone
>>> still disagree or want to discuss further?
>>> 
>>> If there is agreement that a stable API is okay then I think the
>>> practical way to do this is to first merge a cleaned-up version of
>>> multi-process QEMU as an unstable experimental API.  Once it's being
>>> tested and used we can write a protocol specification and publish it as
>>> a stable interface when the spec has addressed most use cases.
>>> 
>>> Does this sound good?
>> 
>> In that case, wouldn't it be preferable to revive our proposal from
>> Edinburgh (KVM Forum 2018)? Our prototypes moved more of the Qemu VFIO
>> code to "common" and added a "user" backend underneath it, similar to
>> how vhost-user-scsi moved some of vhost-scsi to vhost-scsi-common and
>> added vhost-user-scsi. It was centric on PCI, but it doesn't have to
>> be. The other side can be implemented in libmuser for facilitating things.
> 
> That sounds good.
> 

       The emulation program API could be based on the current
libmuser API or the libvfio-user API.  The protocol itself wouldn’t
care which is chosen.  Our multi-process QEMU project would have to
change how devices are specified from the QEMU command line to the
emulation process command line.

							JJ




* Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2020-01-14  1:56                             ` John G Johnson
@ 2020-01-17 17:25                               ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 140+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-17 17:25 UTC (permalink / raw)
  To: John G Johnson
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, mst, Stefan Hajnoczi,
	qemu-devel, Walker, Benjamin, kraxel, jag.raman, Harris, James R,
	quintela, armbru, kanth.ghatraju, Felipe Franciosi, thuth,
	ehabkost, konrad.wilk, liran.alon, Thanos Makatos, rth, kwolf,
	"Daniel P. Berrangé",
	mreitz, ross.lagerwall, marcandre.lureau, pbonzini

* John G Johnson (john.g.johnson@oracle.com) wrote:

<snip>

> DMA
> 
> 	This is one place where I might diverge from VFIO.  It uses an
> ioctl to tell the kernel driver what areas of guest memory the device
> can address.  The driver then pins that memory so it can be programmed
> into a HW IOMMU.  We could avoid pinning of guest memory by adopting
> the vhost-user idea of sending the file descriptors used by QEMU to
> create guest memory to the emulation process, and having it mmap() the
> guest itself.  IOMMUs are handled by having the emulation process
> request device DMA to guest PA translations from QEMU.

The interface in vhost-user to pass these memory fd's is a bit hairy;
so it would be great if there was something better for multi-process.

Some things to think about:
  a) vhost-user filters it so that areas of memory not backed by an fd
aren't passed to the client; this filters out some of the device
specific RAM blocks that aren't really normal RAM.

  b) Hugepages are tricky; especially on a PC where the 0-1MB area is
broken up into chunks and you're trying to mmap 2MB chunks into the
client.

  c) Postcopy with vhost-user was pretty tricky as well; there needs
to be some coordination with the qemu to handle pages that are missing.

  d) Some RAM mappings can change; mostly not the ones sent to the
client; but just watch out that these can happen at unexpected times.

Dave


> 
> 
> > If implementations can use the kernel uapi vfio header files then we're
> > on track for compatibility with VFIO.
> > 
> >>> This is just a more elaborate explanation for the "the cat is out of the
> >>> bag" comments that have already been made on licensing.  Does anyone
> >>> still disagree or want to discuss further?
> >>> 
> >>> If there is agreement that a stable API is okay then I think the
> >>> practical way to do this is to first merge a cleaned-up version of
> >>> multi-process QEMU as an unstable experimental API.  Once it's being
> >>> tested and used we can write a protocol specification and publish it as
> >>> a stable interface when the spec has addressed most use cases.
> >>> 
> >>> Does this sound good?
> >> 
> >> In that case, wouldn't it be preferable to revive our proposal from
> >> Edinburgh (KVM Forum 2018)? Our prototypes moved more of the Qemu VFIO
> >> code to "common" and added a "user" backend underneath it, similar to
> >> how vhost-user-scsi moved some of vhost-scsi to vhost-scsi-common and
> >> added vhost-user-scsi. It was centric on PCI, but it doesn't have to
> >> be. The other side can be implemented in libmuser for facilitating things.
> > 
> > That sounds good.
> > 
> 
>        The emulation program API could be based on the current
> libmuser API or the libvfio-user API.  The protocol itself wouldn’t
> care which is chosen.  Our multi-process QEMU project would have to
> change how devices are specified from the QEMU command line to the
> emulation process command line.
> 
> 							JJ
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* RE: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
  2019-12-20 15:25                         ` Alex Williamson
  2019-12-20 16:00                           ` Felipe Franciosi
@ 2020-02-25  9:16                           ` Thanos Makatos
  1 sibling, 0 replies; 140+ messages in thread
From: Thanos Makatos @ 2020-02-25  9:16 UTC (permalink / raw)
  To: Alex Williamson, Felipe Franciosi
  Cc: Elena Ufimtseva, fam, Swapnil Ingle, john.g.johnson,
	Stefan Hajnoczi, qemu-devel, Walker, Benjamin, kraxel, jag.raman,
	Harris, James R, quintela, mst, armbru, kanth.ghatraju, thuth,
	ehabkost, konrad.wilk, dgilbert, liran.alon, rth, kwolf,
	Daniel P. Berrangé,
	mreitz, ross.lagerwall, marcandre.lureau, Paolo Bonzini

> > 3) Muser.ko pins the pages (in get_dma_map(), called from below)
> > (https://github.com/nutanix/muser/blob/master/kmod/muser.c#L711)
> 
> Yikes, it pins every page??  vfio_pin_pages() intends for the vendor
> driver to be much smarter than this :-\  Thanks,

We no longer have to pin pages at all. Instead we grab the fd backing the VMA
and inject it in libmuser, and then request it to mmap that file. This also
solves a few other problems and is far simpler to implement.



