* [PATCH v12 00/19] Initial support for multi-process Qemu
@ 2020-12-01 20:22 Jagannathan Raman
From: Jagannathan Raman @ 2020-12-01 20:22 UTC
  To: qemu-devel
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

Hello,

This is the v12 of the patchset. Thank you very much for the
review of the v11 of the series.

We made changes to the following patches in this version:
  - Moved patches 18 & 19 in v11 to the front of the series based
    on feedback from Phil
  - [PATCH v12 02/19] multi-process: add configure and usage information
  - [PATCH v12 04/19] multi-process: Add config option for multi-process QEMU
  - [PATCH v12 08/19] multi-process: define MPQemuMsg format and transmission
    functions.

In summary, we replaced "scripts/mpqemu-launcher.py" with
"tests/multiprocess/multiprocess-lsi53c895a.py". We ran this test on the
x86_64 and aarch64 architectures, which are the ones we have access to.

We changed the name of the config variable called CONFIG_MPQEMU
to CONFIG_MULTIPROCESS. We also moved the config variable
definition out of the "configure" script, and into the Kconfig
system. Previously, the user specified if multiprocess was enabled
using the "--enable-mpqemu" argument to the configure script.
In this version, multiprocess support is enabled automatically if
the Kconfig system detects KVM support. This is needed
to run acceptance tests in the future.

We are working on acceptance tests (tests/acceptance/) for this
project. However, we have hit a roadblock and are working with the
avocado-devel community to resolve the issue.

We noticed that the checkpatch.pl script flagged a warning on patch 4 of
this series, but we don't believe that's a valid concern. We generated
the patches using QEMU's git orderfile
(git format-patch -O scripts/git.orderfile ...).

To touch upon the history of this project, we posted the Proof Of
Concept patches before the BoF session in 2018. Subsequently, we have
posted 11 versions on the qemu-devel mailing list. You can find them
by following the links below ([1] - [11]).

The following people contributed to the design and implementation of
this project:
Jagannathan Raman <jag.raman@oracle.com>
Elena Ufimtseva <elena.ufimtseva@oracle.com>
John G Johnson <john.g.johnson@oracle.com>
Stefan Hajnoczi <stefanha@redhat.com>
Konrad Wilk <konrad.wilk@oracle.com>
Kanth Ghatraju <kanth.ghatraju@oracle.com>

We would like to thank the QEMU community for your feedback on the
design and implementation of this project.

QEMU wiki page:
https://wiki.qemu.org/Features/MultiProcessQEMU

For the full concept writeup about QEMU multi-process, please
refer to docs/devel/multi-process.rst. Also see
docs/multi-process.rst for usage information. We welcome
all your ideas, concerns, and questions for this patchset.

Thank you!

[POC]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg566538.html
[1]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg602285.html
[2]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg624877.html
[3]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg642000.html
[4]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg655118.html
[5]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg682429.html
[6]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg697484.html
[7]: https://patchew.org/QEMU/cover.1593273671.git.elena.ufimtseva@oracle.com/
[8]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg727007.html
[9]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg734275.html
[10]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg747638.html
[11]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg750972.html


Elena Ufimtseva (7):
  multi-process: add configure and usage information
  multi-process: add qio channel function to transmit data and fds
  multi-process: define MPQemuMsg format and transmission functions
  multi-process: introduce proxy object
  multi-process: add proxy communication functions
  multi-process: Forward PCI config space acceses to the remote process
  multi-process: perform device reset in the remote process

Jagannathan Raman (11):
  memory: alloc RAM from file at offset
  multi-process: Add config option for multi-process QEMU
  multi-process: setup PCI host bridge for remote device
  multi-process: setup a machine object for remote device process
  multi-process: Initialize message handler in remote device
  multi-process: Associate fd of a PCIDevice with its object
  multi-process: setup memory manager for remote device
  multi-process: PCI BAR read/write handling for proxy & remote
    endpoints
  multi-process: Synchronize remote memory
  multi-process: create IOHUB object to handle irq
  multi-process: Retrieve PCI info from remote process

John G Johnson (1):
  multi-process: add the concept description to
    docs/devel/qemu-multiprocess

 docs/devel/index.rst                          |   1 +
 docs/devel/multi-process.rst                  | 966 ++++++++++++++++++++++++++
 docs/multi-process.rst                        |  66 ++
 include/exec/memory.h                         |   2 +
 include/exec/ram_addr.h                       |   2 +-
 include/hw/pci-host/remote.h                  |  31 +
 include/hw/pci/pci_ids.h                      |   3 +
 include/hw/remote/iohub.h                     |  42 ++
 include/hw/remote/machine.h                   |  40 ++
 include/hw/remote/memory-sync.h               |  27 +
 include/hw/remote/memory.h                    |  19 +
 include/hw/remote/mpqemu-link.h               |  98 +++
 include/hw/remote/proxy.h                     |  53 ++
 include/hw/remote/remote-obj.h                |  42 ++
 include/io/channel.h                          |  24 +
 include/qemu/mmap-alloc.h                     |   3 +-
 backends/hostmem-memfd.c                      |   2 +-
 hw/misc/ivshmem.c                             |   3 +-
 hw/pci-host/remote.c                          |  75 ++
 hw/remote/iohub.c                             | 123 ++++
 hw/remote/machine.c                           |  79 +++
 hw/remote/memory-sync.c                       | 210 ++++++
 hw/remote/memory.c                            |  58 ++
 hw/remote/message.c                           | 241 +++++++
 hw/remote/mpqemu-link.c                       | 308 ++++++++
 hw/remote/proxy.c                             | 378 ++++++++++
 hw/remote/remote-obj.c                        | 154 ++++
 io/channel.c                                  |  45 ++
 softmmu/memory.c                              |   3 +-
 softmmu/physmem.c                             |  11 +-
 util/mmap-alloc.c                             |   7 +-
 util/oslib-posix.c                            |   2 +-
 MAINTAINERS                                   |  25 +
 accel/Kconfig                                 |   1 +
 hw/Kconfig                                    |   1 +
 hw/meson.build                                |   1 +
 hw/pci-host/Kconfig                           |   3 +
 hw/pci-host/meson.build                       |   1 +
 hw/remote/Kconfig                             |   4 +
 hw/remote/meson.build                         |  13 +
 tests/multiprocess/multiprocess-lsi53c895a.py |  92 +++
 41 files changed, 3246 insertions(+), 13 deletions(-)
 create mode 100644 docs/devel/multi-process.rst
 create mode 100644 docs/multi-process.rst
 create mode 100644 include/hw/pci-host/remote.h
 create mode 100644 include/hw/remote/iohub.h
 create mode 100644 include/hw/remote/machine.h
 create mode 100644 include/hw/remote/memory-sync.h
 create mode 100644 include/hw/remote/memory.h
 create mode 100644 include/hw/remote/mpqemu-link.h
 create mode 100644 include/hw/remote/proxy.h
 create mode 100644 include/hw/remote/remote-obj.h
 create mode 100644 hw/pci-host/remote.c
 create mode 100644 hw/remote/iohub.c
 create mode 100644 hw/remote/machine.c
 create mode 100644 hw/remote/memory-sync.c
 create mode 100644 hw/remote/memory.c
 create mode 100644 hw/remote/message.c
 create mode 100644 hw/remote/mpqemu-link.c
 create mode 100644 hw/remote/proxy.c
 create mode 100644 hw/remote/remote-obj.c
 create mode 100644 hw/remote/Kconfig
 create mode 100644 hw/remote/meson.build
 create mode 100755 tests/multiprocess/multiprocess-lsi53c895a.py

-- 
1.8.3.1




* [PATCH v12 01/19] multi-process: add the concept description to docs/devel/qemu-multiprocess
  2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
@ 2020-12-01 20:22 ` Jagannathan Raman
From: Jagannathan Raman @ 2020-12-01 20:22 UTC
  To: qemu-devel
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

From: John G Johnson <john.g.johnson@oracle.com>

Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 docs/devel/index.rst         |   1 +
 docs/devel/multi-process.rst | 966 +++++++++++++++++++++++++++++++++++++++++++
 MAINTAINERS                  |   7 +
 3 files changed, 974 insertions(+)
 create mode 100644 docs/devel/multi-process.rst

diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index f10ed77..5013054 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -35,3 +35,4 @@ Contents:
    clocks
    qom
    block-coroutine-wrapper
+   multi-process
diff --git a/docs/devel/multi-process.rst b/docs/devel/multi-process.rst
new file mode 100644
index 0000000..6969932
--- /dev/null
+++ b/docs/devel/multi-process.rst
@@ -0,0 +1,966 @@
+This is the design document for multi-process QEMU. It does not
+necessarily reflect the status of the current implementation, which
+may lack features or be considerably different from what is described
+in this document. This document is still useful as a description of
+the goals and general direction of this feature.
+
+Please refer to the following wiki for the latest details:
+https://wiki.qemu.org/Features/MultiProcessQEMU
+
+Multi-process QEMU
+===================
+
+QEMU is often used as the hypervisor for virtual machines running in the
+Oracle cloud. Since one of the advantages of cloud computing is the
+ability to run many VMs from different tenants in the same cloud
+infrastructure, a guest that compromised its hypervisor could
+potentially use the hypervisor's access privileges to access data it is
+not authorized for.
+
+QEMU can be susceptible to security attacks because it is a large,
+monolithic program that provides many features to the VMs it services.
+Many of these features can be configured out of QEMU, but even a reduced
+configuration QEMU has a large amount of code a guest can potentially
+attack. Separating QEMU into multiple processes reduces the attack
+surface by helping to limit each component in the system to accessing
+only the resources that it needs to perform its job.
+
+QEMU services
+-------------
+
+QEMU can be broadly described as providing three main services. One is a
+VM control point, where VMs can be created, migrated, re-configured, and
+destroyed. A second is to emulate the CPU instructions within the VM,
+often accelerated by HW virtualization features such as Intel's VT
+extensions. Finally, it provides IO services to the VM by emulating HW
+IO devices, such as disk and network devices.
+
+A multi-process QEMU
+~~~~~~~~~~~~~~~~~~~~
+
+A multi-process QEMU involves separating QEMU services into separate
+host processes. Each of these processes can be given only the privileges
+it needs to provide its service, e.g., a disk service could be given
+access only to the disk images it provides, and not be allowed to
+access other files, or any network devices. An attacker who compromised
+this service would not be able to use this exploit to access files or
+devices beyond what the disk service was given access to.
+
+A QEMU control process would remain, but in multi-process mode, it would
+have no direct interfaces to the VM. During VM execution, it would still
+provide the user interface to hot-plug devices or live migrate the VM.
+
+A first step in creating a multi-process QEMU is to separate IO services
+from the main QEMU program, which would continue to provide CPU
+emulation; i.e., the control process would also be the CPU emulation
+process. In a later phase, CPU emulation could be separated from the
+control process.
+
+Separating IO services
+----------------------
+
+Separating IO services into individual host processes is a good place to
+begin for a couple of reasons. One is that the sheer number of IO devices
+QEMU can emulate provides a large surface of interfaces that could
+potentially be exploited and, indeed, has been a source of exploits in
+the past. Another is that the modular nature of QEMU device emulation
+code provides interface points where the QEMU functions that perform
+device emulation can be separated from the QEMU functions that manage the
+emulation of guest CPU instructions. The devices emulated in the separate
+process are referred to as remote devices.
+
+QEMU device emulation
+~~~~~~~~~~~~~~~~~~~~~
+
+QEMU uses an object-oriented SW architecture for device emulation code.
+Configured objects are all compiled into the QEMU binary, then objects
+are instantiated by name when used by the guest VM. For example, the
+code to emulate a device named "foo" is always present in QEMU, but its
+instantiation code is only run when the device is included in the target
+VM (e.g., via the QEMU command line as *-device foo*).
+
+The object model is hierarchical, so device emulation code names its
+parent object (such as "pci-device" for a PCI device) and QEMU will
+instantiate a parent object before calling the device's instantiation
+code.
+
+Current separation models
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In order to separate the device emulation code from the CPU emulation
+code, the device object code must run in a different process. There are
+a couple of existing QEMU features that can run emulation code
+separately from the main QEMU process. These are examined below.
+
+vhost user model
+^^^^^^^^^^^^^^^^
+
+Virtio guest device drivers can be connected to vhost user applications
+in order to perform their IO operations. This model uses special virtio
+device drivers in the guest and vhost user device objects in QEMU, but
+once the QEMU vhost user code has configured the vhost user application,
+mission-mode IO is performed by the application. The vhost user
+application is a daemon process that can be contacted via a known UNIX
+domain socket.
+
+vhost socket
+''''''''''''
+
+As mentioned above, one of the tasks of the vhost device object within
+QEMU is to contact the vhost application and send it configuration
+information about this device instance. As part of the configuration
+process, the application can also be sent other file descriptors over
+the socket, which then can be used by the vhost user application in
+various ways, some of which are described below.
+
+vhost MMIO store acceleration
+'''''''''''''''''''''''''''''
+
+VMs are often run using HW virtualization features via the KVM kernel
+driver. This driver allows QEMU to accelerate the emulation of guest CPU
+instructions by running the guest in a virtual HW mode. When the guest
+executes instructions that cannot be executed by virtual HW mode,
+execution returns to the KVM driver so it can inform QEMU to emulate the
+instructions in SW.
+
+One of the events that can cause a return to QEMU is when a guest device
+driver accesses an IO location. QEMU then dispatches the memory
+operation to the corresponding QEMU device object. In the case of a
+vhost user device, the memory operation would need to be sent over a
+socket to the vhost application. This path is accelerated by the QEMU
+virtio code by setting up an eventfd file descriptor that the vhost
+application can directly receive MMIO store notifications from the KVM
+driver, instead of needing them to be sent to the QEMU process first.
+
+vhost interrupt acceleration
+''''''''''''''''''''''''''''
+
+Another optimization used by the vhost application is the ability to
+directly inject interrupts into the VM via the KVM driver, again,
+bypassing the need to send the interrupt back to the QEMU process first.
+The QEMU virtio setup code configures the KVM driver with an eventfd
+that triggers the device interrupt in the guest when the eventfd is
+written. This irqfd file descriptor is then passed to the vhost user
+application program.
+
+vhost access to guest memory
+''''''''''''''''''''''''''''
+
+The vhost application is also allowed to directly access guest memory,
+instead of needing to send the data as messages to QEMU. This is also
+done with file descriptors sent to the vhost user application by QEMU.
+These descriptors can be passed to ``mmap()`` by the vhost application
+to map the guest address space into the vhost application.
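Editorial sketch, not part of the patch: the descriptor hand-off described above can be illustrated in a few lines of Python, assuming Python 3.9 or later on a Unix host. The function name `share_region`, the 4 KiB size, and the temporary file standing in for guest RAM are all hypothetical; the real vhost-user protocol carries more metadata than this.

```python
import mmap
import os
import socket
import tempfile

REGION_SIZE = 4096  # hypothetical size of the shared "guest memory" region


def share_region():
    """Pass a file descriptor over a UNIX socket and map it on both sides."""
    owner, peer = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    backing = tempfile.TemporaryFile()
    backing.truncate(REGION_SIZE)
    owner_map = mmap.mmap(backing.fileno(), REGION_SIZE)

    # The descriptor travels as SCM_RIGHTS ancillary data next to a tiny payload.
    socket.send_fds(owner, [b"mem"], [backing.fileno()])

    # Receiving side: take the descriptor and map the same pages.
    _msg, fds, _flags, _addr = socket.recv_fds(peer, 16, 1)
    peer_map = mmap.mmap(fds[0], REGION_SIZE)
    os.close(fds[0])  # the mapping stays valid after the descriptor is closed

    # A store through one mapping is visible through the other.
    peer_map[0:4] = b"\xde\xad\xbe\xef"
    seen = bytes(owner_map[0:4])

    peer_map.close()
    owner_map.close()
    backing.close()
    owner.close()
    peer.close()
    return seen
```

Calling `share_region()` returns the four bytes written through the receiver's mapping, which shows that both sides address the same memory without copying data through the socket.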
+
+IOMMUs introduce another level of complexity, since the address given to
+the guest virtio device to DMA to or from is not a guest physical
+address. This case is handled by having vhost code within QEMU register
+as a listener for IOMMU mapping changes. The vhost application maintains
+a cache of IOMMU translations: sending translation requests back to
+QEMU on cache misses, and in turn receiving flush requests from QEMU
+when mappings are purged.
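Editorial sketch, not part of the patch: the miss and flush behaviour of such a translation cache can be modelled as below. The class name `TranslationCache`, the 4 KiB page size, and the `ask_qemu` callable (standing in for a translation request sent back to QEMU) are assumptions made for illustration only.

```python
class TranslationCache:
    """Cache of IO virtual page -> guest physical page translations."""

    PAGE = 4096  # assumed page size

    def __init__(self, ask_qemu):
        self._ask_qemu = ask_qemu  # callable standing in for a request to QEMU
        self._entries = {}
        self.misses = 0

    def translate(self, iova):
        page, offset = divmod(iova, self.PAGE)
        if page not in self._entries:
            # Cache miss: ask QEMU for the translation and remember it.
            self.misses += 1
            self._entries[page] = self._ask_qemu(page)
        return self._entries[page] * self.PAGE + offset

    def flush(self, page):
        """Drop one entry when QEMU reports that the mapping was purged."""
        self._entries.pop(page, None)


def demo_cache():
    mappings = {5: 42}  # hypothetical IOMMU page table held by QEMU
    cache = TranslationCache(lambda page: mappings[page])
    first = cache.translate(5 * 4096 + 8)   # miss
    cache.translate(5 * 4096 + 16)          # hit, same page
    mappings[5] = 7
    cache.flush(5)                          # QEMU purged the old mapping
    second = cache.translate(5 * 4096 + 8)  # miss again, new target page
    return first, second, cache.misses
```

In `demo_cache()` only two of the three lookups reach the `ask_qemu` callable, and the lookup after the flush observes the new mapping.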
+
+applicability to device separation
+''''''''''''''''''''''''''''''''''
+
+Much of the vhost model can be re-used by separated device emulation. In
+particular, the ideas of using a socket between QEMU and the device
+emulation application, using a file descriptor to inject interrupts into
+the VM via KVM, and allowing the application to ``mmap()`` the guest
+address space should be re-used.
+
+There are, however, some notable differences between how a vhost
+application works and the needs of separated device emulation. The most
+basic is that vhost uses custom virtio device drivers which always
+trigger IO with MMIO stores. A separated device emulation model must
+work with existing IO device models and guest device drivers. MMIO loads
+break vhost store acceleration since they are synchronous - guest
+progress cannot continue until the load has been emulated. By contrast,
+stores are asynchronous: the guest can continue after the store event
+has been sent to the vhost application.
+
+Another difference is that in the vhost user model, a single daemon can
+support multiple QEMU instances. This is contrary to the security regime
+desired, in which the emulation application should only be allowed to
+access the files or devices the VM it's running on behalf of can access.
+
+qemu-io model
+^^^^^^^^^^^^^
+
+Qemu-io is a test harness used to test changes to the QEMU block backend
+object code (e.g., the code that implements disk images for disk driver
+emulation). Qemu-io is not a device emulation application per se, but it
+does compile the QEMU block objects into a separate binary from the main
+QEMU one. This could be useful for disk device emulation, since its
+emulation applications will need to include the QEMU block objects.
+
+New separation model based on proxy objects
+-------------------------------------------
+
+A different model based on proxy objects in the QEMU program
+communicating with remote emulation programs could provide separation
+while minimizing the changes needed to the device emulation code. The
+rest of this section is a discussion of how a proxy object model would
+work.
+
+Remote emulation processes
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The remote emulation process will run the QEMU object hierarchy without
+modification. The device emulation objects will also be based on the
+QEMU code, because for anything but the simplest device, it would not be
+tractable to re-implement both the object model and the many device
+backends that QEMU has.
+
+The processes will communicate with the QEMU process over UNIX domain
+sockets. The processes can be executed either as standalone processes,
+or be executed by QEMU. In both cases, the host backends an emulation
+process will provide are specified on its command line, as they would
+be for QEMU. For example:
+
+::
+
+    disk-proc -blockdev driver=file,node-name=file0,filename=disk-file0  \
+    -blockdev driver=qcow2,node-name=drive0,file=file0
+
+would indicate process *disk-proc* uses a qcow2 emulated disk named
+*file0* as its backend.
+
+Emulation processes may emulate more than one guest controller. A common
+configuration might be to put all controllers of the same device class
+(e.g., disk, network, etc.) in a single process, so that all backends of
+the same type can be managed by a single QMP monitor.
+
+communication with QEMU
+^^^^^^^^^^^^^^^^^^^^^^^
+
+The first argument to the remote emulation process will be a Unix domain
+socket that connects with the Proxy object. This is a required argument.
+
+::
+
+    disk-proc <socket number> <backend list>
+
+remote process QMP monitor
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Remote emulation processes can be monitored via QMP, similar to QEMU
+itself. The QMP monitor socket is specified in the same way as for a QEMU
+process:
+
+::
+
+    disk-proc -qmp unix:/tmp/disk-mon,server
+
+can be monitored over the UNIX socket path */tmp/disk-mon*.
+
+QEMU command line
+~~~~~~~~~~~~~~~~~
+
+Each remote device emulated in a remote process on the host is
+represented as a *-device* of type *pci-proxy-dev*. A socket
+sub-option to this option specifies the Unix socket that connects
+to the remote process. An *id* sub-option is required, and it should
+be the same id as used in the remote process.
+
+::
+
+    qemu-system-x86_64 ... -device pci-proxy-dev,id=lsi0,socket=3
+
+can be used to add a device emulated in a remote process.
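Editorial sketch, not part of the patch: passing a socket by number, as in `socket=3` above, relies on the child process inheriting an already-connected descriptor. The Python sketch below shows that mechanism with a hypothetical child program; it does not launch QEMU or a real emulation process, and the `ack:` reply is invented for the demonstration.

```python
import socket
import subprocess
import sys

# Hypothetical stand-in for a remote emulation program: it takes the socket
# number as its first argument and answers one request on it.
CHILD = r"""
import socket, sys
sock = socket.socket(fileno=int(sys.argv[1]))
request = sock.recv(64)
sock.sendall(b"ack:" + request)
sock.close()
"""


def launch_and_ping():
    parent_end, child_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(child_end.fileno())],
        pass_fds=[child_end.fileno()],  # keep this descriptor open in the child
    )
    child_end.close()  # the parent only needs its own end
    parent_end.sendall(b"hello")
    reply = parent_end.recv(64)
    parent_end.close()
    child.wait()
    return reply
```

The descriptor keeps the same number in the child because `pass_fds` preserves it, which is why a plain integer on the command line is enough to identify the channel.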
+
+
+QEMU management of remote processes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+QEMU is not aware of the type of the remote PCI device. It is
+a pass-through device as far as QEMU is concerned.
+
+communication with emulation process
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+primary channel
+'''''''''''''''
+
+The primary channel (referred to as com in the code) is used to bootstrap
+the remote process. It is also used to pass on device-agnostic commands
+like reset.
+
+per-device channels
+'''''''''''''''''''
+
+Each remote device communicates with QEMU using a dedicated communication
+channel. The proxy object sets up this channel using the primary
+channel during its initialization.
+
+QEMU device proxy objects
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+QEMU has an object model based on sub-classes inherited from the
+"object" super-class. The sub-classes that are of interest here are the
+"device" and "bus" sub-classes whose child sub-classes make up the
+device tree of a QEMU emulated system.
+
+The proxy object model will use device proxy objects to replace the
+device emulation code within the QEMU process. These objects will live
+in the same place in the object and bus hierarchies as the objects they
+replace. i.e., the proxy object for an LSI SCSI controller will be a
+sub-class of the "pci-device" class, and will have the same PCI bus
+parent and the same SCSI bus child objects as the LSI controller object
+it replaces.
+
+It is worth noting that the same proxy object is used to mediate with
+all types of remote PCI devices.
+
+object initialization
+^^^^^^^^^^^^^^^^^^^^^
+
+The Proxy device objects are initialized in the exact same manner in
+which any other QEMU device would be initialized.
+
+In addition, the Proxy objects perform the following two tasks:
+
+- Parse the "socket" sub-option and connect to the remote process
+  using this channel
+- Use the "id" sub-option to connect to the emulated device on the
+  separate process
+
+class\_init
+'''''''''''
+
+The ``class_init()`` method of a proxy object will, in general, behave
+similarly to the object it replaces, including setting any static
+properties and methods needed by the proxy.
+
+instance\_init / realize
+''''''''''''''''''''''''
+
+The ``instance_init()`` and ``realize()`` functions would only need to
+perform tasks related to being a proxy, such as registering its own
+MMIO handlers, or creating a child bus that other proxy devices can be
+attached to later.
+
+Other tasks will be device-specific. For example, PCI device objects
+will initialize the PCI config space in order to make a valid PCI device
+tree within the QEMU process.
+
+address space registration
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Most devices are driven by guest device driver accesses to IO addresses
+or ports. The QEMU device emulation code uses QEMU's memory region
+function calls (such as ``memory_region_init_io()``) to add callback
+functions that QEMU will invoke when the guest accesses the device's
+areas of the IO address space. When a guest driver does access the
+device, the VM will exit HW virtualization mode and return to QEMU,
+which will then look up and execute the corresponding callback function.
+
+A proxy object would need to mirror the memory region calls the actual
+device emulator would perform in its initialization code, but with its
+own callbacks. When invoked by QEMU as a result of a guest IO operation,
+they will forward the operation to the device emulation process.
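Editorial sketch, not part of the patch: the forwarding callbacks described above can be modelled as a pair of read and write methods that serialize each access and wait for the emulation side to answer. The wire format (`REQUEST`, `REPLY`) and the names `ProxyRegion` and `emulation_loop` are hypothetical and are not the MPQemuMsg format defined later in this series. For simplicity, writes also wait for a reply here.

```python
import socket
import struct
import threading

# Hypothetical wire format: opcode, register offset, value, access size.
REQUEST = struct.Struct("<BQQB")
REPLY = struct.Struct("<Q")
OP_READ, OP_WRITE = 0, 1


def recv_exact(sock, count):
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            return None  # peer closed the connection
        data += chunk
    return data


def emulation_loop(sock, registers):
    """Remote side: apply each forwarded access to the device registers."""
    while True:
        raw = recv_exact(sock, REQUEST.size)
        if raw is None:
            break
        op, offset, value, size = REQUEST.unpack(raw)
        mask = (1 << (8 * size)) - 1
        if op == OP_WRITE:
            registers[offset] = value & mask
            result = 0
        else:
            result = registers.get(offset, 0) & mask
        sock.sendall(REPLY.pack(result))


class ProxyRegion:
    """QEMU side: callbacks that forward the access instead of emulating it."""

    def __init__(self, sock):
        self._sock = sock

    def _call(self, op, offset, value, size):
        self._sock.sendall(REQUEST.pack(op, offset, value, size))
        return REPLY.unpack(recv_exact(self._sock, REPLY.size))[0]

    def write(self, offset, value, size):
        self._call(OP_WRITE, offset, value, size)

    def read(self, offset, size):
        return self._call(OP_READ, offset, 0, size)


def demo_mmio():
    qemu_end, remote_end = socket.socketpair()
    worker = threading.Thread(target=emulation_loop, args=(remote_end, {}))
    worker.start()
    proxy = ProxyRegion(qemu_end)
    proxy.write(0x10, 0x1234ABCD, 4)
    value = proxy.read(0x10, 2)  # a 2-byte read sees the low half
    qemu_end.close()             # lets the emulation loop exit
    worker.join()
    remote_end.close()
    return value
```

The proxy holds no register state of its own: every access crosses the socket, so the emulation side remains the single owner of the device model.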
+
+PCI config space
+^^^^^^^^^^^^^^^^
+
+PCI devices also have a configuration space that can be accessed by the
+guest driver. Guest accesses to this space are not handled by the device
+emulation object, but by its PCI parent object. Much of this space is
+read-only, but certain registers (especially BAR and MSI-related ones)
+need to be propagated to the emulation process.
+
+PCI parent proxy
+''''''''''''''''
+
+One way to propagate guest PCI config accesses is to create a
+"pci-device-proxy" class that can serve as the parent of a PCI device
+proxy object. This class's parent would be "pci-device" and it would
+override the PCI parent's ``config_read()`` and ``config_write()``
+methods with ones that forward these operations to the emulation
+program.
+
+interrupt receipt
+^^^^^^^^^^^^^^^^^
+
+A proxy for a device that generates interrupts will need to create a
+socket to receive interrupt indications from the emulation process. An
+incoming interrupt indication would then be sent up to its bus parent to
+be injected into the guest. For example, a PCI device object may use
+``pci_set_irq()``.
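Editorial sketch, not part of the patch: one way to picture the interrupt path is a datagram socket on which the emulation process posts small indications that the proxy turns into calls on its bus parent. The message layout `IRQ_MSG` and the class `ProxyIrqReceiver` are hypothetical, and the `set_irq` callable only stands in for a function such as ``pci_set_irq()``.

```python
import socket
import struct

IRQ_MSG = struct.Struct("<IB")  # hypothetical layout: pin number, level (0 or 1)


class ProxyIrqReceiver:
    def __init__(self, set_irq):
        # A datagram socketpair keeps each indication as one message.
        self.qemu_end, self.remote_end = socket.socketpair(
            socket.AF_UNIX, socket.SOCK_DGRAM
        )
        self._set_irq = set_irq  # stand-in for the bus parent's IRQ function

    def poll_once(self):
        """Read one indication and hand it to the bus parent."""
        pin, level = IRQ_MSG.unpack(self.qemu_end.recv(IRQ_MSG.size))
        self._set_irq(pin, level)

    def close(self):
        self.qemu_end.close()
        self.remote_end.close()


def demo_irq():
    raised = []
    receiver = ProxyIrqReceiver(lambda pin, level: raised.append((pin, level)))
    receiver.remote_end.send(IRQ_MSG.pack(0, 1))  # emulation side asserts pin 0
    receiver.remote_end.send(IRQ_MSG.pack(0, 0))  # and later deasserts it
    receiver.poll_once()
    receiver.poll_once()
    receiver.close()
    return raised
```

In a real proxy the receiving end would be registered with the main loop rather than polled by hand; the sketch only shows the direction of the data flow.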
+
+live migration
+^^^^^^^^^^^^^^
+
+The proxy will register to save and restore any *vmstate* it needs over
+a live migration event. The device proxy does not need to manage the
+remote device's *vmstate*; that will be handled by the remote process
+proxy (see below).
+
+QEMU remote device operation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Generic device operations, such as DMA, will be performed by the remote
+process proxy by sending messages to the remote process.
+
+DMA operations
+^^^^^^^^^^^^^^
+
+DMA operations would be handled much like vhost applications do. One of
+the initial messages sent to the emulation process is a guest memory
+table. Each entry in this table consists of a file descriptor and size
+that the emulation process can ``mmap()`` to directly access guest
+memory, similar to ``vhost_user_set_mem_table()``. Note guest memory
+must be backed by file descriptors, such as when QEMU is given the
+*-mem-path* command line option.
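Editorial sketch, not part of the patch: once the table entries are mapped, a DMA access reduces to finding the region that covers the guest address and indexing into its mapping. The `guest_addr` field is an assumption (the text above only names a file descriptor and a size), and a `bytearray` stands in for the result of ``mmap()`` on each entry's descriptor.

```python
from dataclasses import dataclass


@dataclass
class MemoryRegionEntry:
    guest_addr: int  # assumed field: where the region starts in guest memory
    size: int
    data: bytearray  # stands in for the result of mmap() on the entry's fd


def dma_read(table, guest_addr, length):
    """Copy `length` bytes starting at `guest_addr` out of the mapped regions."""
    for entry in table:
        offset = guest_addr - entry.guest_addr
        if 0 <= offset and offset + length <= entry.size:
            return bytes(entry.data[offset:offset + length])
    raise ValueError("address range is not covered by a single region")


def demo_dma():
    low = MemoryRegionEntry(0x0, 0x1000, bytearray(0x1000))
    high = MemoryRegionEntry(0x100000, 0x1000, bytearray(0x1000))
    high.data[0x20:0x24] = b"QEMU"  # pretend the guest stored this earlier
    return dma_read([low, high], 0x100020, 4)
```

Accesses that fall outside every region raise an error in this sketch; how the emulation process reports such a fault is not specified by the text above.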
+
+IOMMU operations
+^^^^^^^^^^^^^^^^
+
+When the emulated system includes an IOMMU, the remote process proxy in
+QEMU will need to create a socket for IOMMU requests from the emulation
+process. It will handle those requests with an
+``address_space_get_iotlb_entry()`` call. In order to handle IOMMU
+unmaps, the remote process proxy will also register as a listener on the
+device's DMA address space. When an IOMMU memory region is created
+within the DMA address space, an IOMMU notifier for unmaps will be added
+to the memory region that will forward unmaps to the emulation process
+over the IOMMU socket.
+
+device hot-plug via QMP
+^^^^^^^^^^^^^^^^^^^^^^^
+
+A QMP "device\_add" command can add a device emulated by a remote
+process. The command will also take a "rid" option, just as the
+*-device* command line option does. The remote process may either be one
+started at QEMU startup, or be one added by the "add-process" QMP
+command described above. In either case, the remote process proxy will
+forward the new device's JSON description to the corresponding emulation
+process.
+
+live migration
+^^^^^^^^^^^^^^
+
+The remote process proxy will also register for live migration
+notifications with ``vmstate_register()``. When called to save state,
+the proxy will send the remote process a secondary socket file
+descriptor to save the remote process's device *vmstate* over. The
+incoming byte stream length and data will be saved as the proxy's
+*vmstate*. When the proxy is resumed on its new host, this *vmstate*
+will be extracted, and a secondary socket file descriptor will be sent
+to the new remote process through which it receives the *vmstate* in
+order to restore the devices there.
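Editorial sketch, not part of the patch: the save and restore flow above amounts to draining a byte stream into a length-prefixed blob and replaying it later. The 8-byte little-endian length prefix and the function names are hypothetical, and both "hosts" are simulated with local socket pairs.

```python
import socket
import struct

LENGTH = struct.Struct("<Q")  # hypothetical length prefix


def read_all(sock):
    """Read from the socket until the peer closes its end."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def save_remote_state(stream_sock):
    """Proxy side: drain the secondary socket and frame it as length + data."""
    blob = read_all(stream_sock)
    return LENGTH.pack(len(blob)) + blob


def restore_remote_state(saved, stream_sock):
    """Proxy side on the new host: replay the saved bytes to the remote."""
    (length,) = LENGTH.unpack_from(saved)
    payload = saved[LENGTH.size:LENGTH.size + length]
    if len(payload) != length:
        raise ValueError("saved vmstate is truncated")
    stream_sock.sendall(payload)


def demo_migration():
    # Source host: the remote process writes its device state and closes.
    proxy_end, remote_end = socket.socketpair()
    remote_end.sendall(b"device-registers")
    remote_end.close()
    saved = save_remote_state(proxy_end)
    proxy_end.close()

    # Destination host: the proxy feeds the state to the new remote process.
    proxy_end, remote_end = socket.socketpair()
    restore_remote_state(saved, proxy_end)
    proxy_end.close()
    restored = read_all(remote_end)
    remote_end.close()
    return saved, restored
```

The proxy treats the remote state as an opaque blob, which matches the description above: only the remote process needs to understand the device's *vmstate* layout.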
+
+device emulation in remote process
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The parts of QEMU that the emulation program will need include the
+object model; the memory emulation objects; the device emulation objects
+of the targeted device, and any dependent devices; and the device's
+backends. It will also need code to set up the machine environment,
+handle requests from the QEMU process, and route machine-level requests
+(such as interrupts or IOMMU mappings) back to the QEMU process.
+
+initialization
+^^^^^^^^^^^^^^
+
+The process initialization sequence will follow the same sequence
+followed by QEMU. It will first initialize the backend objects, then
+device emulation objects. The JSON descriptions sent by the QEMU process
+will drive which objects need to be created.
+
+-  address spaces
+
+Before the device objects are created, the initial address spaces and
+memory regions must be configured with ``memory_map_init()``. This
+creates a RAM memory region object (*system\_memory*) and an IO memory
+region object (*system\_io*).
+
+-  RAM
+
+RAM memory region creation will follow how ``pc_memory_init()`` creates
+them, but must use ``memory_region_init_ram_from_fd()`` instead of
+``memory_region_allocate_system_memory()``. The file descriptors needed
+will be supplied by the guest memory table from above. Those RAM regions
+would then be added to the *system\_memory* memory region with
+``memory_region_add_subregion()``.
+
+-  PCI
+
+IO initialization will be driven by the JSON descriptions sent from the
+QEMU process. For a PCI device, a PCI bus will need to be created with
+``pci_root_bus_new()``, and a PCI memory region will need to be created
+and added to the *system\_memory* memory region with
+``memory_region_add_subregion_overlap()``. The overlap version is
+required for architectures where PCI memory overlaps with RAM memory.
+
+MMIO handling
+^^^^^^^^^^^^^
+
+The device emulation objects will use ``memory_region_init_io()`` to
+install their MMIO handlers, and ``pci_register_bar()`` to associate
+those handlers with a PCI BAR, as they do within QEMU currently.
+
+In order to use ``address_space_rw()`` in the emulation process to
+handle MMIO requests from QEMU, the PCI physical addresses must be the
+same in the QEMU process and the device emulation process. In order to
+accomplish that, guest BAR programming must also be forwarded from QEMU
+to the emulation process.
+
+interrupt injection
+^^^^^^^^^^^^^^^^^^^
+
+When device emulation wants to inject an interrupt into the VM, the
+request climbs the device's bus object hierarchy until the point where a
+bus object knows how to signal the interrupt to the guest. The details
+depend on the type of interrupt being raised.
+
+-  PCI pin interrupts
+
+On x86 systems, there is an emulated IOAPIC object attached to the root
+PCI bus object, and the root PCI object forwards interrupt requests to
+it. The IOAPIC object, in turn, calls the KVM driver to inject the
+corresponding interrupt into the VM. The simplest way to handle this in
+an emulation process would be to set up the root PCI bus driver (via
+``pci_bus_irqs()``) to send an interrupt request back to the QEMU
+process, and have the device proxy object reflect it up the PCI tree
+there.
+
+-  PCI MSI/X interrupts
+
+PCI MSI/X interrupts are implemented in HW as DMA writes to a
+CPU-specific PCI address. In QEMU on x86, a KVM APIC object receives
+these DMA writes, then calls into the KVM driver to inject the interrupt
+into the VM. A simple emulation process implementation would be to send
+the MSI DMA address from QEMU as a message at initialization, then
+install an address space handler at that address which forwards the MSI
+message back to QEMU.
+
+DMA operations
+^^^^^^^^^^^^^^
+
+When an emulation object wants to DMA into or out of guest memory, it
+first must use ``dma_memory_map()`` to convert the DMA address to a local
+virtual address. The emulation process memory region objects set up above
+will be used to translate the DMA address to a local virtual address the
+device emulation code can access.
+
+IOMMU
+^^^^^
+
+When an IOMMU is in use in QEMU, DMA translation uses IOMMU memory
+regions to translate the DMA address to a guest physical address before
+that physical address can be translated to a local virtual address. The
+emulation process will need similar functionality.
+
+-  IOTLB cache
+
+The emulation process will maintain a cache of recent IOMMU translations
+(the IOTLB). When the ``translate()`` callback of an IOMMU memory region
+is invoked, the IOTLB cache will be searched for an entry that will map
+the DMA address to a guest PA. On a cache miss, a message will be sent
+back to QEMU requesting the corresponding translation entry, which will
+both be used to return a guest address and be added to the cache.
+
+-  IOTLB purge
+
+The IOMMU emulation will also need to act on unmap requests from QEMU.
+These happen when the guest IOMMU driver purges an entry from the
+guest's translation table.
+
+live migration
+^^^^^^^^^^^^^^
+
+When a remote process receives a live migration indication from QEMU, it
+will set up a channel using the received file descriptor with
+``qio_channel_socket_new_fd()``. This channel will be used to create a
+*QEMUfile* that can be passed to ``qemu_save_device_state()`` to send
+the process's device state back to QEMU. This method will be reversed on
+restore - the channel will be passed to ``qemu_loadvm_state()`` to
+restore the device state.
+
+Accelerating device emulation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The messages that are required to be sent between QEMU and the emulation
+process can add considerable latency to IO operations. The optimizations
+described below attempt to ameliorate this effect by allowing the
+emulation process to communicate directly with the kernel KVM driver.
+The KVM file descriptors created would be passed to the emulation process
+via initialization messages, much as is done for the guest memory table.
+
+MMIO acceleration
+^^^^^^^^^^^^^^^^^
+
+Vhost user applications can receive guest virtio driver stores directly
+from KVM. The issue with the eventfd mechanism used by vhost user is
+that it does not pass any data with the event indication, so it cannot
+handle guest loads or guest stores that carry store data. This concept
+could, however, be expanded to cover more cases.
+
+The expanded idea would require a new type of KVM device:
+*KVM\_DEV\_TYPE\_USER*. This device has two file descriptors: a master
+descriptor that QEMU can use for configuration, and a slave descriptor
+that the emulation process can use to receive MMIO notifications. QEMU
+would create both descriptors using the KVM driver, and pass the slave
+descriptor to the emulation process via an initialization message.
+
+data structures
+^^^^^^^^^^^^^^^
+
+-  guest physical range
+
+The guest physical range structure describes the address range that a
+device will respond to. It includes the base and length of the range, as
+well as which bus the range resides on (e.g., on an x86 machine, it can
+specify whether the range refers to memory or IO addresses).
+
+A device can have multiple physical address ranges it responds to (e.g.,
+a PCI device can have multiple BARs), so the structure will also include
+an enumerated identifier to specify which of the device's ranges is
+being referred to.
+
++--------+----------------------------+
+| Name   | Description                |
++========+============================+
+| addr   | range base address         |
++--------+----------------------------+
+| len    | range length               |
++--------+----------------------------+
+| bus    | addr type (memory or IO)   |
++--------+----------------------------+
+| id     | range ID (e.g., PCI BAR)   |
++--------+----------------------------+
+
+-  MMIO request structure
+
+This structure describes an MMIO operation. It includes which guest
+physical range the MMIO was within, the offset within that range, the
+MMIO type (e.g., load or store), and its length and data. It also
+includes a sequence number that can be used to reply to the MMIO, and
+the CPU that issued the MMIO.
+
++----------+------------------------+
+| Name     | Description            |
++==========+========================+
+| rid      | range MMIO is within   |
++----------+------------------------+
+| offset   | offset within *rid*    |
++----------+------------------------+
+| type     | e.g., load or store    |
++----------+------------------------+
+| len      | MMIO length            |
++----------+------------------------+
+| data     | store data             |
++----------+------------------------+
+| seq      | sequence ID            |
++----------+------------------------+
+
+-  MMIO request queues
+
+MMIO request queues are FIFO arrays of MMIO request structures. There
+are two queues: the pending queue is for MMIOs that haven't been read
+by the emulation program, and the sent queue is for MMIOs that haven't
+been acknowledged. The main use of the second queue is to validate MMIO
+replies from the emulation program.
+
+-  scoreboard
+
+Each CPU in the VM is emulated in QEMU by a separate thread, so multiple
+MMIOs may be waiting to be consumed by an emulation program and multiple
+threads may be waiting for MMIO replies. The scoreboard would contain a
+wait queue and sequence number for the per-CPU threads, allowing them to
+be individually woken when the MMIO reply is received from the emulation
+program. It also tracks the number of posted MMIO stores to the device
+that haven't been replied to, in order to satisfy the PCI constraint
+that a load to a device will not complete until all previous stores to
+that device have been completed.
+
+-  device shadow memory
+
+Some MMIO loads do not have device side-effects. These MMIOs can be
+completed without sending an MMIO request to the emulation program if the
+emulation program shares a shadow image of the device's memory image
+with the KVM driver.
+
+The emulation program will ask the KVM driver to allocate memory for the
+shadow image, and will then use ``mmap()`` to directly access it. The
+emulation program can control KVM access to the shadow image by sending
+KVM an access map telling it which areas of the image have no
+side-effects (and can be completed immediately), and which require an
+MMIO request to the emulation program. The access map can also inform
+the KVM driver which access sizes are allowed for the image.
+
+master descriptor
+^^^^^^^^^^^^^^^^^
+
+The master descriptor is used by QEMU to configure the new KVM device.
+The descriptor would be returned by the KVM driver when QEMU issues a
+*KVM\_CREATE\_DEVICE* ``ioctl()`` with a *KVM\_DEV\_TYPE\_USER* type.
+
+KVM\_DEV\_TYPE\_USER device ops
+
+
+The *KVM\_DEV\_TYPE\_USER* operations vector will be registered by a
+``kvm_register_device_ops()`` call when the KVM system is initialized by
+``kvm_init()``. These device ops are called by the KVM driver when QEMU
+executes certain ``ioctl()`` operations on its KVM file descriptor. They
+include:
+
+-  create
+
+This routine is called when QEMU issues a *KVM\_CREATE\_DEVICE*
+``ioctl()`` on its per-VM file descriptor. It will allocate and
+initialize a KVM user device specific data structure, and assign the
+*kvm\_device* private field to it.
+
+-  ioctl
+
+This routine is invoked when QEMU issues an ``ioctl()`` on the master
+descriptor. The ``ioctl()`` commands supported are defined by the KVM
+device type. *KVM\_DEV\_TYPE\_USER* devices will need several commands:
+
+*KVM\_DEV\_USER\_SLAVE\_FD* creates the slave file descriptor that will
+be passed to the device emulation program. Only one slave can be created
+by each master descriptor. The file operations performed by this
+descriptor are described below.
+
+The *KVM\_DEV\_USER\_PA\_RANGE* command configures a guest physical
+address range that the slave descriptor will receive MMIO notifications
+for. The range is specified by a guest physical range structure
+argument. For buses that assign addresses to devices dynamically, this
+command can be executed while the guest is running, such as the case
+when a guest changes a device's PCI BAR registers.
+
+*KVM\_DEV\_USER\_PA\_RANGE* will use ``kvm_io_bus_register_dev()`` to
+register *kvm\_io\_device\_ops* callbacks to be invoked when the guest
+performs an MMIO operation within the range. When a range is changed,
+``kvm_io_bus_unregister_dev()`` is used to remove the previous
+instantiation.
+
+*KVM\_DEV\_USER\_TIMEOUT* will configure a timeout value that specifies
+how long KVM will wait for the emulation process to respond to an MMIO
+indication.
+
+-  destroy
+
+This routine is called when the VM instance is destroyed. It will need
+to destroy the slave descriptor, and free any memory allocated by the
+driver, as well as the *kvm\_device* structure itself.
+
+slave descriptor
+^^^^^^^^^^^^^^^^
+
+The slave descriptor will have its own file operations vector, which
+responds to system calls on the descriptor performed by the device
+emulation program.
+
+-  read
+
+A read returns any pending MMIO requests from the KVM driver as MMIO
+request structures. Multiple structures can be returned if there are
+multiple MMIO operations pending. The MMIO requests are moved from the
+pending queue to the sent queue, and if there are threads waiting for
+space in the pending queue to add new MMIO operations, they will be
+woken here.
+
+-  write
+
+A write also consists of a set of MMIO requests. They are compared to
+the MMIO requests in the sent queue. Matches are removed from the sent
+queue, and any threads waiting for the reply are woken. If a store is
+removed, then the number of posted stores in the per-CPU scoreboard is
+decremented. When the number is zero, and a non side-effect load was
+waiting for posted stores to complete, the load is continued.
+
+-  ioctl
+
+There are several ``ioctl()``\ s that can be performed on the slave
+descriptor.
+
+A *KVM\_DEV\_USER\_SHADOW\_SIZE* ``ioctl()`` causes the KVM driver to
+allocate memory for the shadow image. This memory can later be
+``mmap()``\ ed by the emulation process to share the emulation's view of
+device memory with the KVM driver.
+
+A *KVM\_DEV\_USER\_SHADOW\_CTRL* ``ioctl()`` controls access to the
+shadow image. It will send the KVM driver a shadow control map, which
+specifies which areas of the image can complete guest loads without
+sending the load request to the emulation program. It will also specify
+the size of load operations that are allowed.
+
+-  poll
+
+An emulation program will use the ``poll()`` call with a *POLLIN* flag
+to determine if there are MMIO requests waiting to be read. It will
+return if the pending MMIO request queue is not empty.
+
+-  mmap
+
+This call allows the emulation program to directly access the shadow
+image allocated by the KVM driver. As device emulation updates device
+memory, changes with no side-effects will be reflected in the shadow,
+and the KVM driver can satisfy guest loads from the shadow image without
+needing to wait for the emulation program.
+
+kvm\_io\_device ops
+^^^^^^^^^^^^^^^^^^^
+
+Each KVM per-CPU thread can handle MMIO operations on behalf of the guest
+VM. KVM will use the MMIO's guest physical address to search for a
+matching *kvm\_io\_device* to see if the MMIO can be handled by the KVM
+driver instead of exiting back to QEMU. If a match is found, the
+corresponding callback will be invoked.
+
+-  read
+
+This callback is invoked when the guest performs a load to the device.
+Loads with side-effects must be handled synchronously, with the KVM
+driver putting the QEMU thread to sleep waiting for the emulation
+process reply before re-starting the guest. Loads that do not have
+side-effects may be optimized by satisfying them from the shadow image,
+if there are no outstanding stores to the device by this CPU. PCI memory
+ordering demands that a load cannot complete before all older stores to
+the same device have been completed.
+
+-  write
+
+Stores can be handled asynchronously unless the pending MMIO request
+queue is full. In this case, the QEMU thread must sleep waiting for
+space in the queue. Stores will increment the number of posted stores in
+the per-CPU scoreboard, in order to implement the PCI ordering
+constraint above.
+
+interrupt acceleration
+^^^^^^^^^^^^^^^^^^^^^^
+
+This performance optimization would work much like a vhost user
+application does, where the QEMU process sets up *eventfds* that cause
+the device's corresponding interrupt to be triggered by the KVM driver.
+These irq file descriptors are sent to the emulation process at
+initialization, and are used when the emulation code raises a device
+interrupt.
+
+intx acceleration
+'''''''''''''''''
+
+Traditional PCI pin interrupts are level based, so, in addition to an
+irq file descriptor, a re-sampling file descriptor needs to be sent to
+the emulation program. This second file descriptor allows multiple
+devices sharing an irq to be notified when the interrupt has been
+acknowledged by the guest, so they can re-trigger the interrupt if their
+device has not de-asserted its interrupt.
+
+intx irq descriptor
+
+
+The irq descriptors are created by the proxy object, using
+``event_notifier_init()`` to create the irq and re-sampling
+*eventfds*, and ``kvm_vm_ioctl(KVM_IRQFD)`` to bind them to an interrupt.
+The interrupt route can be found with
+``pci_device_route_intx_to_irq()``.
+
+intx routing changes
+
+
+Intx routing can be changed when the guest programs the APIC the device
+pin is connected to. The proxy object in QEMU will use
+``pci_device_set_intx_routing_notifier()`` to be informed of any guest
+changes to the route. This handler will broadly follow the VFIO
+interrupt logic to change the route: de-assigning the existing irq
+descriptor from its route, then assigning it the new route. (see
+``vfio_intx_update()``)
+
+MSI/X acceleration
+''''''''''''''''''
+
+MSI/X interrupts are sent as DMA transactions to the host. The interrupt
+data contains a vector that is programmed by the guest. A device may have
+multiple MSI interrupts associated with it, so multiple irq descriptors
+may need to be sent to the emulation program.
+
+MSI/X irq descriptor
+
+
+This case will also follow the VFIO example. For each MSI/X interrupt,
+an *eventfd* is created, a virtual interrupt is allocated by
+``kvm_irqchip_add_msi_route()``, and the virtual interrupt is bound to
+the eventfd with ``kvm_irqchip_add_irqfd_notifier()``.
+
+MSI/X config space changes
+
+
+The guest may dynamically update several MSI-related tables in the
+device's PCI config space. These include per-MSI interrupt enables and
+vector data. Additionally, MSIX tables exist in device memory space, not
+config space. Much like the BAR case above, the proxy object must look
+at guest config space programming to keep the MSI interrupt state
+consistent between QEMU and the emulation program.
+
+--------------
+
+Disaggregated CPU emulation
+---------------------------
+
+After IO services have been disaggregated, a second phase would be to
+separate a process to handle CPU instruction emulation from the main
+QEMU control function. There are no object separation points for this
+code, so the first task would be to create one.
+
+Host access controls
+--------------------
+
+Separating QEMU relies on the host OS's access restriction mechanisms to
+enforce that the differing processes can only access the objects they
+are entitled to. There are a couple of types of mechanisms usually provided
+by general purpose OSs.
+
+Discretionary access control
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Discretionary access control allows each user to control who can access
+their files. In Linux, this type of control is usually too coarse for
+QEMU separation, since it only provides three separate access controls:
+one for the same user ID, the second for user IDs with the same group
+ID, and the third for all other user IDs. Each device instance would
+need a separate user ID to provide access control, which is likely to be
+unwieldy for dynamically created VMs.
+
+Mandatory access control
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Mandatory access control allows the OS to add an additional set of
+controls, administered by the OS itself, on top of discretionary access
+control. It also adds other attributes to processes and files such as
+types, roles, and categories, and can establish rules for how processes
+and files can interact.
+
+Type enforcement
+^^^^^^^^^^^^^^^^
+
+Type enforcement assigns a *type* attribute to processes and files, and
+allows rules to be written on what operations a process with a given
+type can perform on a file with a given type. QEMU separation could take
+advantage of type enforcement by running the emulation processes with
+different types, both from the main QEMU process, and from the emulation
+processes of different classes of devices.
+
+For example, guest disk images and disk emulation processes could have
+types separate from the main QEMU process and non-disk emulation
+processes, and the type rules could prevent processes other than disk
+emulation ones from accessing guest disk images. Similarly, network
+emulation processes can have a type separate from the main QEMU process
+and non-network emulation processes, and only that type can access the
+host tun/tap device used to provide guest networking.
+
+Category enforcement
+^^^^^^^^^^^^^^^^^^^^
+
+Category enforcement assigns a set of numbers within a given range to
+the process or file. The process is granted access to the file if the
+process's set is a superset of the file's set. This enforcement can be
+used to separate multiple instances of devices in the same class.
+
+For example, if there are multiple disk devices provided to a guest,
+each device emulation process could be provisioned with a separate
+category. The different device emulation processes would not be able to
+access each other's backing disk images.
+
+Alternatively, categories could be used in lieu of the type enforcement
+scheme described above. In this scenario, different categories would be
+used to prevent device emulation processes in different classes from
+accessing resources assigned to other classes.
diff --git a/MAINTAINERS b/MAINTAINERS
index 68bc160..88a5a14 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3130,6 +3130,13 @@ S: Maintained
 F: hw/semihosting/
 F: include/hw/semihosting/
 
+Multi-process QEMU
+M: Elena Ufimtseva <elena.ufimtseva@oracle.com>
+M: Jagannathan Raman <jag.raman@oracle.com>
+M: John G Johnson <john.g.johnson@oracle.com>
+S: Maintained
+F: docs/devel/multi-process.rst
+
 Build and test automation
 -------------------------
 Build and test automation
-- 
1.8.3.1
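
Illustration only, not part of the patch: the IOTLB cache behaviour
described in the design document above (look up on translate, ask QEMU
on a miss, drop entries on unmap) can be sketched in a few lines of
Python. This is a toy model under stated assumptions. The class name,
the 4 KiB granule, and the query_qemu callback (a stand-in for the
message round trip to the QEMU process) are all hypothetical and do not
correspond to any QEMU API.

```python
from typing import Callable, Dict

PAGE_SIZE = 4096  # assumed translation granule for this sketch


class IOTLBCache:
    """Toy IOTLB mapping DMA page numbers to guest-physical page numbers."""

    def __init__(self, query_qemu: Callable[[int], int]) -> None:
        # query_qemu is a hypothetical stand-in for the request/reply
        # message exchange with the QEMU process on a cache miss.
        self._query_qemu = query_qemu
        self._entries: Dict[int, int] = {}
        self.misses = 0

    def translate(self, dma_addr: int) -> int:
        """Return the guest PA for dma_addr, filling the cache on a miss."""
        page, offset = divmod(dma_addr, PAGE_SIZE)
        if page not in self._entries:
            self.misses += 1
            self._entries[page] = self._query_qemu(page)
        return self._entries[page] * PAGE_SIZE + offset

    def unmap(self, dma_addr: int, length: int) -> None:
        """Drop every cached page overlapping [dma_addr, dma_addr + length)."""
        first = dma_addr // PAGE_SIZE
        last = (dma_addr + length - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            self._entries.pop(page, None)
```

A caller would construct the cache with whatever function performs the
round trip to QEMU, call translate() from the IOMMU region's translate
callback, and call unmap() when QEMU forwards a guest IOTLB purge.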




* [PATCH v12 02/19] multi-process: add configure and usage information
  2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
  2020-12-01 20:22 ` [PATCH v12 01/19] multi-process: add the concept description to docs/devel/qemu-multiprocess Jagannathan Raman
@ 2020-12-01 20:22 ` Jagannathan Raman
  2020-12-04 14:10   ` Marc-André Lureau
  2020-12-04 14:37   ` Daniel P. Berrangé
  2020-12-01 20:22 ` [PATCH v12 03/19] memory: alloc RAM from file at offset Jagannathan Raman
                   ` (17 subsequent siblings)
  19 siblings, 2 replies; 52+ messages in thread
From: Jagannathan Raman @ 2020-12-01 20:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Adds documentation explaining the command-line arguments needed
to use multi-process. Also adds a python script that illustrates the
usage.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
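
Note for readers (illustration only, not part of the patch): the
orchestrator steps described in the usage document come down to
creating a socketpair and letting each child inherit one end by fd
number. The sketch below shows just that mechanism. The child here is a
small Python snippet standing in for a QEMU process, and the names
CHILD and launch_pair are made up for this example.

```python
import socket
import subprocess
import sys

# Stand-in for a QEMU process: it is told an fd number on its command
# line, wraps the inherited descriptor in a socket, and echoes one
# message back to the orchestrator.
CHILD = (
    "import socket, sys\n"
    "s = socket.socket(fileno=int(sys.argv[1]))\n"
    "s.sendall(b'echo:' + s.recv(64))\n"
)


def launch_pair() -> bytes:
    """Create a socketpair, hand one end to a child, exchange one message."""
    parent_end, child_end = socket.socketpair(socket.AF_UNIX,
                                              socket.SOCK_STREAM)
    with parent_end, child_end:
        # pass_fds keeps the descriptor open in the child under the
        # same number, so the number can be passed as a plain argument.
        child = subprocess.Popen(
            [sys.executable, "-c", CHILD, str(child_end.fileno())],
            pass_fds=[child_end.fileno()],
        )
        parent_end.sendall(b"ping")
        reply = parent_end.recv(64)
        child.wait()
    return reply
```

The test script in this patch follows the same pattern, passing each
end of the pair to a separate QEMU invocation through an "fd=" option.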
 docs/multi-process.rst                        | 66 +++++++++++++++++++
 MAINTAINERS                                   |  1 +
 tests/multiprocess/multiprocess-lsi53c895a.py | 92 +++++++++++++++++++++++++++
 3 files changed, 159 insertions(+)
 create mode 100644 docs/multi-process.rst
 create mode 100755 tests/multiprocess/multiprocess-lsi53c895a.py

diff --git a/docs/multi-process.rst b/docs/multi-process.rst
new file mode 100644
index 0000000..9a5fe5b
--- /dev/null
+++ b/docs/multi-process.rst
@@ -0,0 +1,66 @@
+Multi-process QEMU
+==================
+
+This document describes how to configure and use multi-process QEMU.
+For the design document, refer to docs/devel/multi-process.rst.
+
+1) Configuration
+----------------
+
+Multi-process is enabled by default for targets that enable KVM.
+
+
+2) Usage
+--------
+
+Multi-process QEMU requires an orchestrator to launch. For a light-weight
+Python-based example, see tests/multiprocess/multiprocess-lsi53c895a.py,
+which launches QEMU in multi-process mode.
+
+The following describes the command lines used to launch mpqemu.
+
+* Orchestrator:
+
+  - The Orchestrator creates a unix socketpair
+
+  - It launches the remote process and passes one of the
+    sockets to it via command-line.
+
+  - It then launches QEMU and specifies the other socket as an option
+    to the Proxy device object
+
+* Remote Process:
+
+  - QEMU can enter remote process mode by using the "x-remote" machine
+    option.
+
+  - The orchestrator creates an "x-remote-object" with details about
+    the device and the file descriptor for the device
+
+  - The remaining options are no different from how one launches QEMU with
+    devices.
+
+  - Example command-line for the remote process is as follows:
+
+      /usr/bin/qemu-system-x86_64                                        \
+      -machine x-remote                                                  \
+      -device lsi53c895a,id=lsi0                                         \
+      -drive id=drive_image2,file=/build/ol7-nvme-test-1.qcow2           \
+      -device scsi-hd,id=drive2,drive=drive_image2,bus=lsi0.0,scsi-id=0  \
+      -object x-remote-object,id=robj1,devid=lsi0,fd=4
+
+* QEMU:
+
+  - Since parts of the RAM are shared between QEMU & remote process, a
+    memory-backend-memfd is required to facilitate this, as follows:
+
+    -object memory-backend-memfd,id=mem,size=2G
+
+  - An "x-pci-proxy-dev" device is created for each PCI device emulated in
+    the remote process. An "fd" sub-option specifies the other end of the
+    unix channel created by the orchestrator. The "id" sub-option is required
+    and should match the "id" specified for the remote PCI device.
+
+  - Example command line for QEMU is as follows:
+
+      -device x-pci-proxy-dev,id=lsi0,fd=3
diff --git a/MAINTAINERS b/MAINTAINERS
index 88a5a14..f615ad1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3136,6 +3136,7 @@ M: Jagannathan Raman <jag.raman@oracle.com>
 M: John G Johnson <john.g.johnson@oracle.com>
 S: Maintained
 F: docs/devel/multi-process.rst
+F: tests/multiprocess/multiprocess-lsi53c895a.py
 
 Build and test automation
 -------------------------
diff --git a/tests/multiprocess/multiprocess-lsi53c895a.py b/tests/multiprocess/multiprocess-lsi53c895a.py
new file mode 100755
index 0000000..bfe4f66
--- /dev/null
+++ b/tests/multiprocess/multiprocess-lsi53c895a.py
@@ -0,0 +1,92 @@
+#!/usr/bin/env python3
+
+import urllib.request
+import subprocess
+import argparse
+import socket
+import sys
+import os
+
+arch = os.uname()[4]
+proc_path = os.path.join(os.getcwd(), '..', '..', 'build', arch+'-softmmu',
+                         'qemu-system-'+arch)
+
+parser = argparse.ArgumentParser(description='Launcher for multi-process QEMU')
+parser.add_argument('--bin', required=False, help='location of QEMU binary',
+                    metavar='bin')
+args = parser.parse_args()
+
+if args.bin is not None:
+    proc_path = args.bin
+
+if not os.path.isfile(proc_path):
+    sys.exit('QEMU binary not found')
+
+kernel_path = os.path.join(os.getcwd(), 'vmlinuz')
+initrd_path = os.path.join(os.getcwd(), 'initrd')
+
+proxy_cmd = [ proc_path,                                                    \
+              '-name', 'Fedora', '-smp', '4', '-m', '2048', '-cpu', 'host', \
+              '-object', 'memory-backend-memfd,id=sysmem-file,size=2G',     \
+              '-numa', 'node,memdev=sysmem-file',                           \
+              '-kernel', kernel_path, '-initrd', initrd_path,               \
+              '-vnc', ':0',                                                 \
+              '-monitor', 'unix:/home/qemu-sock,server,nowait',             \
+            ]
+
+if arch == 'x86_64':
+    print('Downloading images for arch x86_64')
+    kernel_url = 'https://dl.fedoraproject.org/pub/fedora/linux/'    \
+                 'releases/33/Everything/x86_64/os/images/'          \
+                 'pxeboot/vmlinuz'
+    initrd_url = 'https://dl.fedoraproject.org/pub/fedora/linux/'    \
+                 'releases/33/Everything/x86_64/os/images/'          \
+                 'pxeboot/initrd.img'
+    proxy_cmd.append('-machine')
+    proxy_cmd.append('pc,accel=kvm')
+    proxy_cmd.append('-append')
+    proxy_cmd.append('rdinit=/bin/bash console=ttyS0 console=tty0')
+elif arch == 'aarch64':
+    print('Downloading images for arch aarch64')
+    kernel_url = 'https://dl.fedoraproject.org/pub/fedora/linux/'    \
+                 'releases/33/Everything/aarch64/os/images/'         \
+                 'pxeboot/vmlinuz'
+    initrd_url = 'https://dl.fedoraproject.org/pub/fedora/linux/'    \
+                 'releases/33/Everything/aarch64/os/images/'         \
+                 'pxeboot/initrd.img'
+    proxy_cmd.append('-machine')
+    proxy_cmd.append('virt,gic-version=3')
+    proxy_cmd.append('-accel')
+    proxy_cmd.append('kvm')
+    proxy_cmd.append('-append')
+    proxy_cmd.append('rdinit=/bin/bash')
+else:
+    sys.exit('Arch %s not tested' % arch)
+
+urllib.request.urlretrieve(kernel_url, kernel_path)
+urllib.request.urlretrieve(initrd_url, initrd_path)
+
+proxy, remote = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
+
+proxy_cmd.append('-device')
+proxy_cmd.append('x-pci-proxy-dev,id=lsi1,fd='+str(proxy.fileno()))
+
+remote_cmd = [ proc_path,                                                      \
+               '-machine', 'x-remote',                                         \
+               '-device', 'lsi53c895a,id=lsi1',                                \
+               '-object',                                                      \
+               'x-remote-object,id=robj1,devid=lsi1,fd='+str(remote.fileno()), \
+               '-display', 'none',                                             \
+               '-monitor', 'unix:/home/rem-sock,server,nowait',                \
+             ]
+
+pid = os.fork()
+
+if pid:
+    # In Proxy
+    print('Launching QEMU with Proxy object')
+    process = subprocess.Popen(proxy_cmd, pass_fds=[proxy.fileno()])
+else:
+    # In remote
+    print('Launching Remote process')
+    process = subprocess.Popen(remote_cmd, pass_fds=[remote.fileno(), 0, 1, 2])
-- 
1.8.3.1




* [PATCH v12 03/19] memory: alloc RAM from file at offset
  2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
  2020-12-01 20:22 ` [PATCH v12 01/19] multi-process: add the concept description to docs/devel/qemu-multiprocess Jagannathan Raman
  2020-12-01 20:22 ` [PATCH v12 02/19] multi-process: add configure and usage information Jagannathan Raman
@ 2020-12-01 20:22 ` Jagannathan Raman
  2020-12-04 14:13   ` Marc-André Lureau
  2020-12-01 20:22 ` [PATCH v12 04/19] multi-process: Add config option for multi-process QEMU Jagannathan Raman
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 52+ messages in thread
From: Jagannathan Raman @ 2020-12-01 20:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

Allow a RAM MemoryRegion to be created from an offset in a file, instead
of always mapping the file from offset 0. This is needed to synchronize
RAM between QEMU and the remote process.

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
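
Note for readers (illustration only, not part of the patch): the new
offset argument ultimately selects where in the backing file the
mapping starts. The Python sketch below shows the same idea using the
standard mmap module: a shared mapping created at a non-zero, aligned
offset sees the same bytes as a mapping of the whole file. The helper
names map_at_offset and demo are made up for this example, and the code
does not model QEMU's qemu_ram_mmap() itself.

```python
import mmap
import tempfile


def map_at_offset(fd: int, offset: int, size: int) -> mmap.mmap:
    """Create a shared mapping of size bytes starting at offset in fd."""
    # The file offset must be a multiple of the allocation granularity.
    if offset % mmap.ALLOCATIONGRANULARITY:
        raise ValueError("offset must be aligned to ALLOCATIONGRANULARITY")
    return mmap.mmap(fd, size, flags=mmap.MAP_SHARED, offset=offset)


def demo() -> bytes:
    """Write through an offset mapping, read back through a full mapping."""
    gran = mmap.ALLOCATIONGRANULARITY
    with tempfile.TemporaryFile() as f:
        f.truncate(2 * gran)
        whole = mmap.mmap(f.fileno(), 2 * gran, flags=mmap.MAP_SHARED)
        upper = map_at_offset(f.fileno(), gran, gran)
        upper[0:5] = b"guest"          # store via the offset mapping
        seen = whole[gran:gran + 5]    # visible via the full mapping
        upper.close()
        whole.close()
    return seen
```

In the multi-process case the two mappings would live in different
processes (QEMU and the remote process) sharing one file descriptor,
which is why both sides need to agree on the offset.
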
 include/exec/memory.h     |  2 ++
 include/exec/ram_addr.h   |  2 +-
 include/qemu/mmap-alloc.h |  3 ++-
 backends/hostmem-memfd.c  |  2 +-
 hw/misc/ivshmem.c         |  3 ++-
 softmmu/memory.c          |  3 ++-
 softmmu/physmem.c         | 11 +++++++----
 util/mmap-alloc.c         |  7 ++++---
 util/oslib-posix.c        |  2 +-
 9 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 0f3e6bc..7bcaada 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -980,6 +980,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
  * @size: size of the region.
  * @share: %true if memory must be mmaped with the MAP_SHARED flag
  * @fd: the fd to mmap.
+ * @offset: offset within the file referenced by fd
  * @errp: pointer to Error*, to store an error if it happens.
  *
  * Note that this function does not do anything to cause the data in the
@@ -991,6 +992,7 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
                                     uint64_t size,
                                     bool share,
                                     int fd,
+                                    ram_addr_t offset,
                                     Error **errp);
 #endif
 
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index c6d2ef1..d465a48 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -121,7 +121,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
                                    Error **errp);
 RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
                                  uint32_t ram_flags, int fd,
-                                 Error **errp);
+                                 off_t offset, Error **errp);
 
 RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
                                   MemoryRegion *mr, Error **errp);
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index e786266..4f57985 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -25,7 +25,8 @@ void *qemu_ram_mmap(int fd,
                     size_t size,
                     size_t align,
                     bool shared,
-                    bool is_pmem);
+                    bool is_pmem,
+                    off_t start);
 
 void qemu_ram_munmap(int fd, void *ptr, size_t size);
 
diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
index e5626d4..69b0ae3 100644
--- a/backends/hostmem-memfd.c
+++ b/backends/hostmem-memfd.c
@@ -55,7 +55,7 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     name = host_memory_backend_get_name(backend);
     memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend),
                                    name, backend->size,
-                                   backend->share, fd, errp);
+                                   backend->share, fd, 0, errp);
     g_free(name);
 }
 
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index e321e5c..8d3e1ee 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -494,7 +494,8 @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
 
     /* mmap the region and map into the BAR2 */
     memory_region_init_ram_from_fd(&s->server_bar2, OBJECT(s),
-                                   "ivshmem.bar2", size, true, fd, &local_err);
+                                   "ivshmem.bar2", size, true, fd, 0,
+                                   &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 11ca94d..e4ed0e4 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1612,6 +1612,7 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
                                     uint64_t size,
                                     bool share,
                                     int fd,
+                                    ram_addr_t offset,
                                     Error **errp)
 {
     Error *err = NULL;
@@ -1621,7 +1622,7 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
     mr->destructor = memory_region_destructor_ram;
     mr->ram_block = qemu_ram_alloc_from_fd(size, mr,
                                            share ? RAM_SHARED : 0,
-                                           fd, &err);
+                                           fd, offset, &err);
     mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
     if (err) {
         mr->size = int128_zero();
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 3027747..e0b8fc6 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1461,6 +1461,7 @@ static void *file_ram_alloc(RAMBlock *block,
                             ram_addr_t memory,
                             int fd,
                             bool truncate,
+                            off_t offset,
                             Error **errp)
 {
     void *area;
@@ -1511,7 +1512,8 @@ static void *file_ram_alloc(RAMBlock *block,
     }
 
     area = qemu_ram_mmap(fd, memory, block->mr->align,
-                         block->flags & RAM_SHARED, block->flags & RAM_PMEM);
+                         block->flags & RAM_SHARED, block->flags & RAM_PMEM,
+                         offset);
     if (area == MAP_FAILED) {
         error_setg_errno(errp, errno,
                          "unable to map backing store for guest RAM");
@@ -1943,7 +1945,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
 #ifdef CONFIG_POSIX
 RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
                                  uint32_t ram_flags, int fd,
-                                 Error **errp)
+                                 off_t offset, Error **errp)
 {
     RAMBlock *new_block;
     Error *local_err = NULL;
@@ -1996,7 +1998,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
     new_block->used_length = size;
     new_block->max_length = size;
     new_block->flags = ram_flags;
-    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
+    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, offset,
+                                     errp);
     if (!new_block->host) {
         g_free(new_block);
         return NULL;
@@ -2026,7 +2029,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
         return NULL;
     }
 
-    block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, errp);
+    block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, 0, errp);
     if (!block) {
         if (created) {
             unlink(mem_path);
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 27dcccd..a28f702 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -86,7 +86,8 @@ void *qemu_ram_mmap(int fd,
                     size_t size,
                     size_t align,
                     bool shared,
-                    bool is_pmem)
+                    bool is_pmem,
+                    off_t start)
 {
     int flags;
     int map_sync_flags = 0;
@@ -147,7 +148,7 @@ void *qemu_ram_mmap(int fd,
     offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
 
     ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
-               flags | map_sync_flags, fd, 0);
+               flags | map_sync_flags, fd, start);
 
     if (ptr == MAP_FAILED && map_sync_flags) {
         if (errno == ENOTSUP) {
@@ -172,7 +173,7 @@ void *qemu_ram_mmap(int fd,
          * we will remove these flags to handle compatibility.
          */
         ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
-                   flags, fd, 0);
+                   flags, fd, start);
     }
 
     if (ptr == MAP_FAILED) {
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index f15234b..93874df 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -227,7 +227,7 @@ void *qemu_memalign(size_t alignment, size_t size)
 void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
 {
     size_t align = QEMU_VMALLOC_ALIGN;
-    void *ptr = qemu_ram_mmap(-1, size, align, shared, false);
+    void *ptr = qemu_ram_mmap(-1, size, align, shared, false, 0);
 
     if (ptr == MAP_FAILED) {
         return NULL;
-- 
1.8.3.1




* [PATCH v12 04/19] multi-process: Add config option for multi-process QEMU
  2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
                   ` (2 preceding siblings ...)
  2020-12-01 20:22 ` [PATCH v12 03/19] memory: alloc RAM from file at offset Jagannathan Raman
@ 2020-12-01 20:22 ` Jagannathan Raman
  2020-12-01 20:22 ` [PATCH v12 05/19] multi-process: setup PCI host bridge for remote device Jagannathan Raman
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 52+ messages in thread
From: Jagannathan Raman @ 2020-12-01 20:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

Add a configuration option, CONFIG_MULTIPROCESS, to separate the
multi-process code.

Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 accel/Kconfig     | 1 +
 hw/Kconfig        | 1 +
 hw/remote/Kconfig | 3 +++
 3 files changed, 5 insertions(+)
 create mode 100644 hw/remote/Kconfig

diff --git a/accel/Kconfig b/accel/Kconfig
index 2ad94a3..3b6ce5a 100644
--- a/accel/Kconfig
+++ b/accel/Kconfig
@@ -3,6 +3,7 @@ config TCG
 
 config KVM
     bool
+    select MULTIPROCESS
 
 config XEN
     bool
diff --git a/hw/Kconfig b/hw/Kconfig
index 4de1797..e714c25 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -27,6 +27,7 @@ source pci-host/Kconfig
 source pcmcia/Kconfig
 source pci/Kconfig
 source rdma/Kconfig
+source remote/Kconfig
 source rtc/Kconfig
 source scsi/Kconfig
 source sd/Kconfig
diff --git a/hw/remote/Kconfig b/hw/remote/Kconfig
new file mode 100644
index 0000000..5484446
--- /dev/null
+++ b/hw/remote/Kconfig
@@ -0,0 +1,3 @@
+config MULTIPROCESS
+    bool
+    depends on PCI && KVM
-- 
1.8.3.1




* [PATCH v12 05/19] multi-process: setup PCI host bridge for remote device
  2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
                   ` (3 preceding siblings ...)
  2020-12-01 20:22 ` [PATCH v12 04/19] multi-process: Add config option for multi-process QEMU Jagannathan Raman
@ 2020-12-01 20:22 ` Jagannathan Raman
  2020-12-04 14:29   ` Marc-André Lureau
  2020-12-04 14:32   ` Marc-André Lureau
  2020-12-01 20:22 ` [PATCH v12 06/19] multi-process: setup a machine object for remote device process Jagannathan Raman
                   ` (14 subsequent siblings)
  19 siblings, 2 replies; 52+ messages in thread
From: Jagannathan Raman @ 2020-12-01 20:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

Set up a PCI host bridge for the remote device process. It is
implemented by the remote-pcihost object, which is an extension of the
PCI host bridge set up by QEMU.
remote-pcihost configures a PCI bus that the remote PCI device can
attach to.

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/hw/pci-host/remote.h | 30 ++++++++++++++++++
 hw/pci-host/remote.c         | 75 ++++++++++++++++++++++++++++++++++++++++++++
 MAINTAINERS                  |  2 ++
 hw/pci-host/Kconfig          |  3 ++
 hw/pci-host/meson.build      |  1 +
 hw/remote/Kconfig            |  1 +
 6 files changed, 112 insertions(+)
 create mode 100644 include/hw/pci-host/remote.h
 create mode 100644 hw/pci-host/remote.c

diff --git a/include/hw/pci-host/remote.h b/include/hw/pci-host/remote.h
new file mode 100644
index 0000000..bab6d3c
--- /dev/null
+++ b/include/hw/pci-host/remote.h
@@ -0,0 +1,30 @@
+/*
+ * PCI Host for remote device
+ *
+ * Copyright © 2018, 2020 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef REMOTE_PCIHOST_H
+#define REMOTE_PCIHOST_H
+
+#include "exec/memory.h"
+#include "hw/pci/pcie_host.h"
+
+#define TYPE_REMOTE_HOST_DEVICE "remote-pcihost"
+#define REMOTE_HOST_DEVICE(obj) \
+    OBJECT_CHECK(RemotePCIHost, (obj), TYPE_REMOTE_HOST_DEVICE)
+
+typedef struct RemotePCIHost {
+    /*< private >*/
+    PCIExpressHost parent_obj;
+    /*< public >*/
+
+    MemoryRegion *mr_pci_mem;
+    MemoryRegion *mr_sys_io;
+} RemotePCIHost;
+
+#endif
diff --git a/hw/pci-host/remote.c b/hw/pci-host/remote.c
new file mode 100644
index 0000000..11325e2
--- /dev/null
+++ b/hw/pci-host/remote.c
@@ -0,0 +1,75 @@
+/*
+ * Remote PCI host device
+ *
+ * Unlike PCI host devices that model physical hardware, the purpose
+ * of this PCI host is to host multi-process QEMU devices.
+ *
+ * Multi-process QEMU extends the PCI host of a QEMU machine into a
+ * remote process. Any PCI device attached to the remote process is
+ * visible in the QEMU guest. This allows existing QEMU device models
+ * to be reused in the remote process.
+ *
+ * This PCI host is purely a container for PCI devices. It's fake in the
+ * sense that the guest never sees this PCI host and has no way of
+ * accessing it. Its job is just to provide the environment that QEMU
+ * PCI device models need when running in a remote process.
+ *
+ * Copyright © 2018, 2020 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "hw/pci/pci.h"
+#include "hw/pci/pci_host.h"
+#include "hw/pci/pcie_host.h"
+#include "hw/qdev-properties.h"
+#include "hw/pci-host/remote.h"
+#include "exec/memory.h"
+
+static const char *remote_pcihost_root_bus_path(PCIHostState *host_bridge,
+                                                PCIBus *rootbus)
+{
+    return "0000:00";
+}
+
+static void remote_pcihost_realize(DeviceState *dev, Error **errp)
+{
+    PCIHostState *pci = PCI_HOST_BRIDGE(dev);
+    RemotePCIHost *s = REMOTE_HOST_DEVICE(dev);
+
+    pci->bus = pci_root_bus_new(DEVICE(s), "remote-pci",
+                                s->mr_pci_mem, s->mr_sys_io,
+                                0, TYPE_PCIE_BUS);
+}
+
+static void remote_pcihost_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
+
+    hc->root_bus_path = remote_pcihost_root_bus_path;
+    dc->realize = remote_pcihost_realize;
+
+    dc->user_creatable = false;
+    set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
+    dc->fw_name = "pci";
+}
+
+static const TypeInfo remote_pcihost_info = {
+    .name = TYPE_REMOTE_HOST_DEVICE,
+    .parent = TYPE_PCIE_HOST_BRIDGE,
+    .instance_size = sizeof(RemotePCIHost),
+    .class_init = remote_pcihost_class_init,
+};
+
+static void remote_pcihost_register(void)
+{
+    type_register_static(&remote_pcihost_info);
+}
+
+type_init(remote_pcihost_register)
diff --git a/MAINTAINERS b/MAINTAINERS
index f615ad1..4515476 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3137,6 +3137,8 @@ M: John G Johnson <john.g.johnson@oracle.com>
 S: Maintained
 F: docs/devel/multi-process.rst
 F: tests/multiprocess/multiprocess-lsi53c895a.py
+F: hw/pci-host/remote.c
+F: include/hw/pci-host/remote.h
 
 Build and test automation
 -------------------------
diff --git a/hw/pci-host/Kconfig b/hw/pci-host/Kconfig
index 036a618..25cdeb2 100644
--- a/hw/pci-host/Kconfig
+++ b/hw/pci-host/Kconfig
@@ -60,3 +60,6 @@ config PCI_BONITO
     select PCI
     select UNIMP
     bool
+
+config MULTIPROCESS_HOST
+    bool
diff --git a/hw/pci-host/meson.build b/hw/pci-host/meson.build
index e6d1b89..4147100 100644
--- a/hw/pci-host/meson.build
+++ b/hw/pci-host/meson.build
@@ -9,6 +9,7 @@ pci_ss.add(when: 'CONFIG_PCI_EXPRESS_XILINX', if_true: files('xilinx-pcie.c'))
 pci_ss.add(when: 'CONFIG_PCI_I440FX', if_true: files('i440fx.c'))
 pci_ss.add(when: 'CONFIG_PCI_SABRE', if_true: files('sabre.c'))
 pci_ss.add(when: 'CONFIG_XEN_IGD_PASSTHROUGH', if_true: files('xen_igd_pt.c'))
+pci_ss.add(when: 'CONFIG_MULTIPROCESS_HOST', if_true: files('remote.c'))
 
 # PPC devices
 pci_ss.add(when: 'CONFIG_PREP_PCI', if_true: files('prep.c'))
diff --git a/hw/remote/Kconfig b/hw/remote/Kconfig
index 5484446..fb6ee4a 100644
--- a/hw/remote/Kconfig
+++ b/hw/remote/Kconfig
@@ -1,3 +1,4 @@
 config MULTIPROCESS
     bool
     depends on PCI && KVM
+    select MULTIPROCESS_HOST
-- 
1.8.3.1




* [PATCH v12 06/19] multi-process: setup a machine object for remote device process
  2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
                   ` (4 preceding siblings ...)
  2020-12-01 20:22 ` [PATCH v12 05/19] multi-process: setup PCI host bridge for remote device Jagannathan Raman
@ 2020-12-01 20:22 ` Jagannathan Raman
  2020-12-04 14:35   ` Marc-André Lureau
  2020-12-01 20:22 ` [PATCH v12 07/19] multi-process: add qio channel function to transmit data and fds Jagannathan Raman
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 52+ messages in thread
From: Jagannathan Raman @ 2020-12-01 20:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

The x-remote-machine object sets up various subsystems of the remote
device process. It instantiates the PCI host bridge object and
initializes the RAM, IO & PCI memory regions.

Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/hw/pci-host/remote.h |  1 +
 include/hw/remote/machine.h  | 28 ++++++++++++++++++
 hw/remote/machine.c          | 69 ++++++++++++++++++++++++++++++++++++++++++++
 MAINTAINERS                  |  2 ++
 hw/meson.build               |  1 +
 hw/remote/meson.build        |  5 ++++
 6 files changed, 106 insertions(+)
 create mode 100644 include/hw/remote/machine.h
 create mode 100644 hw/remote/machine.c
 create mode 100644 hw/remote/meson.build

diff --git a/include/hw/pci-host/remote.h b/include/hw/pci-host/remote.h
index bab6d3c..cc0fff4 100644
--- a/include/hw/pci-host/remote.h
+++ b/include/hw/pci-host/remote.h
@@ -25,6 +25,7 @@ typedef struct RemotePCIHost {
 
     MemoryRegion *mr_pci_mem;
     MemoryRegion *mr_sys_io;
+    MemoryRegion *mr_sys_mem;
 } RemotePCIHost;
 
 #endif
diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
new file mode 100644
index 0000000..d312972
--- /dev/null
+++ b/include/hw/remote/machine.h
@@ -0,0 +1,28 @@
+/*
+ * Remote machine configuration
+ *
+ * Copyright © 2018, 2020 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef REMOTE_MACHINE_H
+#define REMOTE_MACHINE_H
+
+#include "qom/object.h"
+#include "hw/boards.h"
+#include "hw/pci-host/remote.h"
+
+typedef struct RemoteMachineState {
+    MachineState parent_obj;
+
+    RemotePCIHost *host;
+} RemoteMachineState;
+
+#define TYPE_REMOTE_MACHINE "x-remote-machine"
+#define REMOTE_MACHINE(obj) \
+    OBJECT_CHECK(RemoteMachineState, (obj), TYPE_REMOTE_MACHINE)
+
+#endif
diff --git a/hw/remote/machine.c b/hw/remote/machine.c
new file mode 100644
index 0000000..c5658bf
--- /dev/null
+++ b/hw/remote/machine.c
@@ -0,0 +1,69 @@
+/*
+ * Machine for remote device
+ *
+ *  This machine type is used by the remote device process in multi-process
+ *  QEMU. QEMU device models depend on parent busses, interrupt controllers,
+ *  memory regions, etc. The remote machine type offers this environment so
+ *  that QEMU device models can be used as remote devices.
+ *
+ * Copyright © 2018, 2020 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "hw/remote/machine.h"
+#include "exec/address-spaces.h"
+#include "exec/memory.h"
+#include "qapi/error.h"
+
+static void remote_machine_init(MachineState *machine)
+{
+    MemoryRegion *system_memory, *system_io, *pci_memory;
+    RemoteMachineState *s = REMOTE_MACHINE(machine);
+    RemotePCIHost *rem_host;
+
+    system_memory = get_system_memory();
+    system_io = get_system_io();
+
+    pci_memory = g_new(MemoryRegion, 1);
+    memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
+
+    rem_host = REMOTE_HOST_DEVICE(qdev_new(TYPE_REMOTE_HOST_DEVICE));
+
+    rem_host->mr_pci_mem = pci_memory;
+    rem_host->mr_sys_mem = system_memory;
+    rem_host->mr_sys_io = system_io;
+
+    s->host = rem_host;
+
+    object_property_add_child(OBJECT(s), "remote-device", OBJECT(rem_host));
+    memory_region_add_subregion_overlap(system_memory, 0x0, pci_memory, -1);
+
+    qdev_realize(DEVICE(rem_host), sysbus_get_default(), &error_fatal);
+}
+
+static void remote_machine_class_init(ObjectClass *oc, void *data)
+{
+    MachineClass *mc = MACHINE_CLASS(oc);
+
+    mc->init = remote_machine_init;
+}
+
+static const TypeInfo remote_machine = {
+    .name = TYPE_REMOTE_MACHINE,
+    .parent = TYPE_MACHINE,
+    .instance_size = sizeof(RemoteMachineState),
+    .class_init = remote_machine_class_init,
+};
+
+static void remote_machine_register_types(void)
+{
+    type_register_static(&remote_machine);
+}
+
+type_init(remote_machine_register_types);
diff --git a/MAINTAINERS b/MAINTAINERS
index 4515476..c45ac1d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3139,6 +3139,8 @@ F: docs/devel/multi-process.rst
 F: tests/multiprocess/multiprocess-lsi53c895a.py
 F: hw/pci-host/remote.c
 F: include/hw/pci-host/remote.h
+F: hw/remote/machine.c
+F: include/hw/remote/machine.h
 
 Build and test automation
 -------------------------
diff --git a/hw/meson.build b/hw/meson.build
index 010de72..e615d72 100644
--- a/hw/meson.build
+++ b/hw/meson.build
@@ -56,6 +56,7 @@ subdir('moxie')
 subdir('nios2')
 subdir('openrisc')
 subdir('ppc')
+subdir('remote')
 subdir('riscv')
 subdir('rx')
 subdir('s390x')
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
new file mode 100644
index 0000000..197b038
--- /dev/null
+++ b/hw/remote/meson.build
@@ -0,0 +1,5 @@
+remote_ss = ss.source_set()
+
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
+
+softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
-- 
1.8.3.1




* [PATCH v12 07/19] multi-process: add qio channel function to transmit data and fds
  2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
                   ` (5 preceding siblings ...)
  2020-12-01 20:22 ` [PATCH v12 06/19] multi-process: setup a machine object for remote device process Jagannathan Raman
@ 2020-12-01 20:22 ` Jagannathan Raman
  2020-12-04 14:40   ` Marc-André Lureau
  2020-12-01 20:22 ` [PATCH v12 08/19] multi-process: define MPQemuMsg format and transmission functions Jagannathan Raman
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 52+ messages in thread
From: Jagannathan Raman @ 2020-12-01 20:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Add a QIO channel function, qio_channel_writev_full_all(), that
transmits all of the input iovs as well as the supplied fds.
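
For illustration only (not part of this patch): a POSIX sketch of the
same "send everything, attach the fds once" loop, written against a
plain blocking UNIX socket with sendmsg() and SCM_RIGHTS instead of a
QIOChannel. The names send_all_with_fds(), demo_send_fd() and
SKETCH_MAX_FDS are hypothetical. Unlike the patch, the sketch does not
handle a non-blocking channel (QIO_CHANNEL_ERR_BLOCK), and it modifies
the caller's iov array rather than a copy.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define SKETCH_MAX_FDS 8 /* arbitrary limit chosen for this sketch */

/*
 * Hypothetical helper: send all bytes in iov[] over a blocking UNIX
 * socket. fds[] is attached as SCM_RIGHTS data until a sendmsg() call
 * transfers at least one byte; later calls carry only the rest.
 * Returns 0 on success, -1 on error.
 */
static int send_all_with_fds(int sock, struct iovec *iov, size_t niov,
                             const int *fds, size_t nfds)
{
    union {
        char buf[CMSG_SPACE(sizeof(int) * SKETCH_MAX_FDS)];
        struct cmsghdr align;
    } ctl;

    assert(iov != NULL || niov == 0);
    if (nfds > SKETCH_MAX_FDS) {
        errno = EINVAL;
        return -1;
    }
    while (niov > 0) {
        struct msghdr msg = { 0 };
        ssize_t len;

        msg.msg_iov = iov;
        msg.msg_iovlen = niov;
        if (nfds > 0) {
            struct cmsghdr *cmsg;

            memset(&ctl, 0, sizeof(ctl));
            msg.msg_control = ctl.buf;
            msg.msg_controllen = CMSG_SPACE(sizeof(int) * nfds);
            cmsg = CMSG_FIRSTHDR(&msg);
            cmsg->cmsg_level = SOL_SOCKET;
            cmsg->cmsg_type = SCM_RIGHTS;
            cmsg->cmsg_len = CMSG_LEN(sizeof(int) * nfds);
            memcpy(CMSG_DATA(cmsg), fds, sizeof(int) * nfds);
        }
        len = sendmsg(sock, &msg, 0);
        if (len < 0 && errno == EINTR) {
            continue;
        }
        if (len < 0) {
            return -1;
        }
        if (len > 0) {
            nfds = 0; /* the descriptors went out with these bytes */
        }
        /* Discard the bytes that were sent from the front of iov. */
        while (niov > 0 && (size_t)len >= iov->iov_len) {
            len -= iov->iov_len;
            iov++;
            niov--;
        }
        if (niov > 0) {
            iov->iov_base = (char *)iov->iov_base + len;
            iov->iov_len -= len;
        }
    }
    return 0;
}

/* Demo: pass one end of a socketpair across itself, then use the copy. */
static int demo_send_fd(void)
{
    char hdr[] = "hdr", body[] = "payload", in[16] = { 0 }, c = 0;
    struct iovec out[2] = { { hdr, 3 }, { body, 7 } };
    struct iovec riov = { in, sizeof(in) };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int sv[2], rfd = -1, ret = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    msg.msg_iov = &riov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);
    if (send_all_with_fds(sv[0], out, 2, &sv[0], 1) == 0 &&
        recvmsg(sv[1], &msg, 0) == 10 &&
        (cmsg = CMSG_FIRSTHDR(&msg)) != NULL &&
        cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
        memcpy(&rfd, CMSG_DATA(cmsg), sizeof(rfd));
        /* rfd duplicates sv[0], so a write to it arrives on sv[1]. */
        if (write(rfd, "x", 1) == 1 && read(sv[1], &c, 1) == 1 &&
            c == 'x' && memcmp(in, "hdrpayload", 10) == 0) {
            ret = 0;
        }
        close(rfd);
    }
    close(sv[0]);
    close(sv[1]);
    return ret;
}
```

The point of clearing nfds after the first successful write is the same
as in the patch: if a short write forces another iteration, the
descriptors must not be sent a second time.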

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/io/channel.h | 24 ++++++++++++++++++++++++
 io/channel.c         | 45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/include/io/channel.h b/include/io/channel.h
index 4d6fe45..0aa44e1 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -773,5 +773,29 @@ void qio_channel_set_aio_fd_handler(QIOChannel *ioc,
                                     IOHandler *io_read,
                                     IOHandler *io_write,
                                     void *opaque);
+/**
+ * qio_channel_writev_full_all:
+ * @ioc: the channel object
+ * @iov: the array of memory regions to write data from
+ * @niov: the length of the @iov array
+ * @fds: an array of file handles to send
+ * @nfds: number of file handles in @fds
+ * @errp: pointer to a NULL-initialized error object
+ *
+ *
+ * Behaves like qio_channel_writev_full but will attempt
+ * to send all data passed (file handles and memory regions).
+ * The function will wait for all requested data
+ * to be written, yielding from the current coroutine
+ * if required.
+ *
+ * Returns: 0 if all bytes were written, or -1 on error
+ */
+
+int qio_channel_writev_full_all(QIOChannel *ioc,
+                           const struct iovec *iov,
+                           size_t niov,
+                           int *fds, size_t nfds,
+                           Error **errp);
 
 #endif /* QIO_CHANNEL_H */
diff --git a/io/channel.c b/io/channel.c
index 93d449d..255dd46 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -190,6 +190,51 @@ int qio_channel_writev_all(QIOChannel *ioc,
     return ret;
 }
 
+int qio_channel_writev_full_all(QIOChannel *ioc,
+                                const struct iovec *iov,
+                                size_t niov,
+                                int *fds, size_t nfds,
+                                Error **errp)
+{
+    int ret = -1;
+    struct iovec *local_iov = g_new(struct iovec, niov);
+    struct iovec *local_iov_head = local_iov;
+    unsigned int nlocal_iov = niov;
+
+    nlocal_iov = iov_copy(local_iov, nlocal_iov,
+                          iov, niov,
+                          0, iov_size(iov, niov));
+
+    while (nlocal_iov > 0) {
+        ssize_t len;
+        len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds,
+                                      nfds, errp);
+        if (len == QIO_CHANNEL_ERR_BLOCK) {
+            if (qemu_in_coroutine()) {
+                qio_channel_yield(ioc, G_IO_OUT);
+            } else {
+                qio_channel_wait(ioc, G_IO_OUT);
+            }
+            continue;
+        }
+        if (len < 0) {
+            goto cleanup;
+        }
+
+        iov_discard_front(&local_iov, &nlocal_iov, len);
+
+        if (len > 0) {
+            fds = NULL;
+            nfds = 0;
+        }
+    }
+
+    ret = 0;
+ cleanup:
+    g_free(local_iov_head);
+    return ret;
+}
+
 ssize_t qio_channel_readv(QIOChannel *ioc,
                           const struct iovec *iov,
                           size_t niov,
-- 
1.8.3.1




* [PATCH v12 08/19] multi-process: define MPQemuMsg format and transmission functions
  2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
                   ` (6 preceding siblings ...)
  2020-12-01 20:22 ` [PATCH v12 07/19] multi-process: add qio channel function to transmit data and fds Jagannathan Raman
@ 2020-12-01 20:22 ` Jagannathan Raman
  2020-12-07 13:18   ` Marc-André Lureau
  2020-12-01 20:22 ` [PATCH v12 09/19] multi-process: Initialize message handler in remote device Jagannathan Raman
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 52+ messages in thread
From: Jagannathan Raman @ 2020-12-01 20:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Define MPQemuMsg, the message that is sent to the remote process. This
message is sent over a QIOChannel and is used to command the remote
process to perform various tasks.
Also define the transmission functions used by the proxy and by the
remote process.
There are certain restrictions on where it's safe to use these
functions:
  - From the main loop, in co-routine context. They will block the main
    loop if not in co-routine context;
  - From a vCPU thread with no co-routine context, provided the channel
    is not part of the main loop handling;
  - From an IOThread, within co-routine context. Outside of co-routine
    context they will block the IOThread.
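
For illustration only (not part of this patch): a small sketch of the
header-plus-payload framing, using a stand-in struct with the same
layout as MPQemuMsg and a pipe in place of a QIOChannel. DemoMsg,
DEMO_HDR_SIZE and the demo_* helpers are hypothetical names. The sketch
assumes both ends share the same ABI, since the header is written as raw
struct bytes, and it leaves out fd passing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define DEMO_MAX_FDS 8

/* Stand-in for the message layout; not the real MPQemuMsg type. */
typedef struct {
    int cmd;
    size_t size;
    union {
        uint64_t u64;
    } data;
    int fds[DEMO_MAX_FDS];
    int num_fds;
} DemoMsg;

/* The header is everything that precedes the payload union. */
#define DEMO_HDR_SIZE offsetof(DemoMsg, data.u64)

/* Write the header and then msg->size payload bytes with one writev(). */
static ssize_t demo_msg_write(int fd, DemoMsg *msg)
{
    struct iovec iov[2] = {
        { .iov_base = msg, .iov_len = DEMO_HDR_SIZE },
        { .iov_base = &msg->data, .iov_len = msg->size },
    };

    assert(msg->size <= sizeof(msg->data));
    return writev(fd, iov, 2);
}

/* Read the header, validate the advertised size, then read the payload. */
static int demo_msg_read(int fd, DemoMsg *msg)
{
    memset(msg, 0, sizeof(*msg));
    if (read(fd, msg, DEMO_HDR_SIZE) != (ssize_t)DEMO_HDR_SIZE) {
        return -1;
    }
    if (msg->size > sizeof(msg->data)) {
        return -1; /* never trust a size larger than the payload area */
    }
    if (read(fd, &msg->data, msg->size) != (ssize_t)msg->size) {
        return -1;
    }
    return 0;
}

/* Demo: send one message through a pipe and compare both ends. */
static int demo_roundtrip(void)
{
    DemoMsg out = { .cmd = 0, .size = sizeof(uint64_t), .data.u64 = 0x1234 };
    DemoMsg in;
    int p[2];
    int ok;

    if (pipe(p) < 0) {
        return -1;
    }
    ok = demo_msg_write(p[1], &out) == (ssize_t)(DEMO_HDR_SIZE + out.size) &&
         demo_msg_read(p[0], &in) == 0 &&
         in.cmd == out.cmd && in.size == out.size &&
         in.data.u64 == out.data.u64;
    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```

The receiver checks the size field before reading the payload because
the header comes from another process; a check of that kind is
presumably what mpqemu_msg_valid() is for.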

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 include/hw/remote/mpqemu-link.h |  60 ++++++++++
 hw/remote/mpqemu-link.c         | 242 ++++++++++++++++++++++++++++++++++++++++
 MAINTAINERS                     |   2 +
 hw/remote/meson.build           |   1 +
 4 files changed, 305 insertions(+)
 create mode 100644 include/hw/remote/mpqemu-link.h
 create mode 100644 hw/remote/mpqemu-link.c

diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
new file mode 100644
index 0000000..2d79ff8
--- /dev/null
+++ b/include/hw/remote/mpqemu-link.h
@@ -0,0 +1,60 @@
+/*
+ * Communication channel between QEMU and remote device process
+ *
+ * Copyright © 2018, 2020 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef MPQEMU_LINK_H
+#define MPQEMU_LINK_H
+
+#include "qom/object.h"
+#include "qemu/thread.h"
+#include "io/channel.h"
+
+#define REMOTE_MAX_FDS 8
+
+#define MPQEMU_MSG_HDR_SIZE offsetof(MPQemuMsg, data.u64)
+
+/**
+ * MPQemuCmd:
+ *
+ * MPQemuCmd enum type to specify the command to be executed on the remote
+ * device.
+ */
+typedef enum {
+    MPQEMU_CMD_INIT,
+    MPQEMU_CMD_MAX,
+} MPQemuCmd;
+
+/**
+ * MPQemuMsg:
+ * @cmd: The remote command
+ * @size: Size of the data to be shared
+ * @data: Structured data
+ * @fds: File descriptors to be shared with remote device
+ *
+ * MPQemuMsg: format of the message sent to the remote device from QEMU.
+ *
+ */
+typedef struct {
+    int cmd;
+    size_t size;
+
+    union {
+        uint64_t u64;
+    } data;
+
+    int fds[REMOTE_MAX_FDS];
+    int num_fds;
+} MPQemuMsg;
+
+void mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
+void mpqemu_msg_recv(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
+
+bool mpqemu_msg_valid(MPQemuMsg *msg);
+
+#endif
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
new file mode 100644
index 0000000..e535ed2
--- /dev/null
+++ b/hw/remote/mpqemu-link.c
@@ -0,0 +1,242 @@
+/*
+ * Communication channel between QEMU and remote device process
+ *
+ * Copyright © 2018, 2020 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "qemu/module.h"
+#include "hw/remote/mpqemu-link.h"
+#include "qapi/error.h"
+#include "qemu/iov.h"
+#include "qemu/error-report.h"
+#include "qemu/main-loop.h"
+
+/*
+ * Send message over the ioc QIOChannel.
+ * This function is safe to call from:
+ * - The main loop in co-routine context. It blocks the main loop if called
+ *   outside co-routine context;
+ * - A vCPU thread with no co-routine context, provided the channel is not
+ *   part of the main loop handling;
+ * - An IOThread within co-routine context. Outside co-routine context it
+ *   would block the IOThread.
+ */
+void mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
+{
+    bool iolock = qemu_mutex_iothread_locked();
+    bool iothread = qemu_get_current_aio_context() == qemu_get_aio_context() ?
+                    false : true;
+    Error *local_err = NULL;
+    struct iovec send[2] = {0};
+    int *fds = NULL;
+    size_t nfds = 0;
+
+    send[0].iov_base = msg;
+    send[0].iov_len = MPQEMU_MSG_HDR_SIZE;
+
+    send[1].iov_base = (void *)&msg->data;
+    send[1].iov_len = msg->size;
+
+    if (msg->num_fds) {
+        nfds = msg->num_fds;
+        fds = msg->fds;
+    }
+    /*
+     * Don't use in an IOThread outside co-routine context, as
+     * it would block the IOThread.
+     */
+    if (iothread) {
+        assert(qemu_in_coroutine());
+    }
+    /*
+     * Skip unlocking/locking iothread when in IOThread running
+     * in co-routine context. Co-routine context is asserted above
+     * for IOThread case.
+     * Also skip this while in a co-routine in the main context.
+     */
+    if (iolock && !iothread && !qemu_in_coroutine()) {
+        qemu_mutex_unlock_iothread();
+    }
+
+    (void)qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send), fds, nfds,
+                                      &local_err);
+
+    if (iolock && !iothread && !qemu_in_coroutine()) {
+        /* See above comment why skip locking here. */
+        qemu_mutex_lock_iothread();
+    }
+
+    if (errp) {
+        error_propagate(errp, local_err);
+    } else if (local_err) {
+        error_report_err(local_err);
+    }
+
+    return;
+}
+
+/*
+ * Read message from the ioc QIOChannel.
+ * This function is safe to call from:
+ * - The main loop in co-routine context. It blocks the main loop if called
+ *   outside co-routine context;
+ * - A vCPU thread with no co-routine context, provided the channel is not
+ *   part of the main loop handling;
+ * - An IOThread within co-routine context. Outside co-routine context it
+ *   would block the IOThread.
+ */
+static ssize_t mpqemu_read(QIOChannel *ioc, void *buf, size_t len, int **fds,
+                           size_t *nfds, Error **errp)
+{
+    struct iovec iov = { .iov_base = buf, .iov_len = len };
+    bool iolock = qemu_mutex_iothread_locked();
+    bool iothread = qemu_get_current_aio_context() == qemu_get_aio_context()
+                        ? false : true;
+    struct iovec *iovp = &iov;
+    Error *local_err = NULL;
+    unsigned int niov = 1;
+    size_t *l_nfds = nfds;
+    int **l_fds = fds;
+    ssize_t bytes = 0;
+    size_t size;
+
+    size = iov.iov_len;
+
+    /*
+     * Don't use in an IOThread outside co-routine context, as
+     * it would block the IOThread.
+     */
+    if (iothread) {
+        assert(qemu_in_coroutine());
+    }
+
+    while (size > 0) {
+        bytes = qio_channel_readv_full(ioc, iovp, niov, l_fds, l_nfds,
+                                       &local_err);
+        if (bytes == QIO_CHANNEL_ERR_BLOCK) {
+            /*
+             * Skip unlocking/locking iothread when in IOThread running
+             * in co-routine context. Co-routine context is asserted above
+             * for IOThread case.
+             * Also skip this while in a co-routine in the main context.
+             */
+            if (iolock && !iothread && !qemu_in_coroutine()) {
+                qemu_mutex_unlock_iothread();
+            }
+            if (qemu_in_coroutine()) {
+                qio_channel_yield(ioc, G_IO_IN);
+            } else {
+                qio_channel_wait(ioc, G_IO_IN);
+            }
+            /* See above comment why skip locking here. */
+            if (iolock && !iothread && !qemu_in_coroutine()) {
+                qemu_mutex_lock_iothread();
+            }
+            continue;
+        }
+
+        if (bytes <= 0) {
+            error_propagate(errp, local_err);
+            return -EIO;
+        }
+
+        l_fds = NULL;
+        l_nfds = NULL;
+
+        size -= bytes;
+
+        (void)iov_discard_front(&iovp, &niov, bytes);
+    }
+
+    return len - size;
+}
+
+void mpqemu_msg_recv(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
+{
+    Error *local_err = NULL;
+    int *fds = NULL;
+    size_t nfds = 0;
+    ssize_t len;
+
+    len = mpqemu_read(ioc, (void *)msg, MPQEMU_MSG_HDR_SIZE, &fds, &nfds,
+                      &local_err);
+    if (!local_err) {
+        if (len == -EIO) {
+            error_setg(&local_err, "Connection closed.");
+            goto fail;
+        }
+        if (len < 0) {
+            error_setg(&local_err, "Message length is less than 0");
+            goto fail;
+        }
+        if (len != MPQEMU_MSG_HDR_SIZE) {
+            error_setg(&local_err, "Message header corrupted");
+            goto fail;
+        }
+    } else {
+        goto fail;
+    }
+
+    if (msg->size > sizeof(msg->data)) {
+        error_setg(&local_err, "Invalid size for message");
+        goto fail;
+    }
+
+    if (mpqemu_read(ioc, (void *)&msg->data, msg->size, NULL, NULL,
+                    &local_err) < 0) {
+        goto fail;
+    }
+
+    msg->num_fds = nfds;
+    if (nfds > G_N_ELEMENTS(msg->fds)) {
+        error_setg(&local_err,
+                   "Overflow error: received %zu fds, more than max of %d fds",
+                   nfds, REMOTE_MAX_FDS);
+        goto fail;
+    } else if (nfds) {
+        memcpy(msg->fds, fds, nfds * sizeof(int));
+    }
+
+fail:
+    while (local_err && nfds) {
+        close(fds[nfds - 1]);
+        nfds--;
+    }
+
+    g_free(fds);
+
+    if (errp) {
+        error_propagate(errp, local_err);
+    } else if (local_err) {
+        error_report_err(local_err);
+    }
+}
+
+bool mpqemu_msg_valid(MPQemuMsg *msg)
+{
+    if (msg->cmd >= MPQEMU_CMD_MAX || msg->cmd < 0) {
+        return false;
+    }
+
+    /* Verify FDs. */
+    if (msg->num_fds >= REMOTE_MAX_FDS) {
+        return false;
+    }
+
+    if (msg->num_fds > 0) {
+        for (int i = 0; i < msg->num_fds; i++) {
+            if (fcntl(msg->fds[i], F_GETFL) == -1) {
+                return false;
+            }
+        }
+    }
+
+    return true;
+}
diff --git a/MAINTAINERS b/MAINTAINERS
index c45ac1d..d0c891a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3141,6 +3141,8 @@ F: hw/pci-host/remote.c
 F: include/hw/pci-host/remote.h
 F: hw/remote/machine.c
 F: include/hw/remote/machine.h
+F: hw/remote/mpqemu-link.c
+F: include/hw/remote/mpqemu-link.h
 
 Build and test automation
 -------------------------
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index 197b038..a2b2fc0 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -1,5 +1,6 @@
 remote_ss = ss.source_set()
 
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('mpqemu-link.c'))
 
 softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
-- 
1.8.3.1
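
Note for readers: the fds/num_fds fields of MPQemuMsg travel as ancillary
data alongside the message bytes, which is why mpqemu_read() only asks for
descriptors on its first read. The following is a minimal standalone sketch
of the underlying POSIX mechanism (SCM_RIGHTS over a Unix socket), assuming a
single descriptor per message. The names demo_send_fd, demo_recv_fd and
DemoCtrl are hypothetical and are not part of this patch or of the QIOChannel
API.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h> /* for callers that close() the received descriptor */

/* Control buffer sized and aligned for exactly one descriptor. */
typedef union {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
} DemoCtrl;

/* Send len bytes of buf plus one descriptor as SCM_RIGHTS ancillary data. */
static ssize_t demo_send_fd(int sock, const void *buf, size_t len, int fd)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr mh;
    struct cmsghdr *cmsg;
    DemoCtrl ctrl;

    memset(&mh, 0, sizeof(mh));
    memset(&ctrl, 0, sizeof(ctrl));
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = ctrl.buf;
    mh.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&mh);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &mh, 0);
}

/* Receive up to len bytes; *fd gets the passed descriptor, or -1 if none. */
static ssize_t demo_recv_fd(int sock, void *buf, size_t len, int *fd)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr mh;
    struct cmsghdr *cmsg;
    DemoCtrl ctrl;
    ssize_t n;

    memset(&mh, 0, sizeof(mh));
    memset(&ctrl, 0, sizeof(ctrl));
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = ctrl.buf;
    mh.msg_controllen = sizeof(ctrl.buf);

    *fd = -1;
    n = recvmsg(sock, &mh, 0);
    if (n < 0) {
        return n;
    }
    for (cmsg = CMSG_FIRSTHDR(&mh); cmsg; cmsg = CMSG_NXTHDR(&mh, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
            memcpy(fd, CMSG_DATA(cmsg), sizeof(int));
            break;
        }
    }
    return n;
}
```

The received descriptor is a new number in the receiving process that refers
to the same open file, so the receiver owns it and must close it, as the
fail: path of mpqemu_msg_recv() does on error.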




* [PATCH v12 09/19] multi-process: Initialize message handler in remote device
  2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
                   ` (7 preceding siblings ...)
  2020-12-01 20:22 ` [PATCH v12 08/19] multi-process: define MPQemuMsg format and transmission functions Jagannathan Raman
@ 2020-12-01 20:22 ` Jagannathan Raman
  2020-12-07 13:33   ` Marc-André Lureau
  2020-12-01 20:22 ` [PATCH v12 10/19] multi-process: Associate fd of a PCIDevice with its object Jagannathan Raman
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 52+ messages in thread
From: Jagannathan Raman @ 2020-12-01 20:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

Initialize the message handler function in the remote process. It is
called whenever there is an event pending on the QIOChannel that
registers this function.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
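
Note for readers: at this point in the series the switch in
mpqemu_remote_msg_loop_co() has only a default case, so every message ends
the loop. The sketch below shows the same receive, validate, dispatch shape
on an in-memory array, with one handler slotted in. It is only an
illustration: DemoMsg, DEMO_CMD_PING, demo_handle and demo_msg_loop are
hypothetical names and do not correspond to commands added later in the
series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command set; DEMO_CMD_PING exists only for this sketch. */
typedef enum {
    DEMO_CMD_INIT,
    DEMO_CMD_PING,
    DEMO_CMD_MAX,
} DemoCmd;

typedef struct {
    int cmd;
} DemoMsg;

/* Handle one message; return 0 on success, -1 if the command is unknown. */
static int demo_handle(const DemoMsg *msg, unsigned *pings)
{
    switch (msg->cmd) {
    case DEMO_CMD_PING:
        (*pings)++;
        return 0;
    default:
        /* No handler registered, as for every command in this patch. */
        return -1;
    }
}

/*
 * Process messages in order and stop at the first invalid or unhandled
 * one. Returns the number of messages handled before stopping.
 */
static size_t demo_msg_loop(const DemoMsg *msgs, size_t n, unsigned *pings)
{
    size_t i;

    assert(msgs != NULL && pings != NULL);
    for (i = 0; i < n; i++) {
        if (msgs[i].cmd < 0 || msgs[i].cmd >= DEMO_CMD_MAX) {
            break;
        }
        if (demo_handle(&msgs[i], pings) < 0) {
            break;
        }
    }
    return i;
}
```

In the patch itself, leaving the loop is followed by a shutdown request
rather than a return value.
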
 include/hw/remote/machine.h |  9 +++++++
 hw/remote/message.c         | 61 +++++++++++++++++++++++++++++++++++++++++++++
 MAINTAINERS                 |  1 +
 hw/remote/meson.build       |  1 +
 4 files changed, 72 insertions(+)
 create mode 100644 hw/remote/message.c

diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
index d312972..3073db6 100644
--- a/include/hw/remote/machine.h
+++ b/include/hw/remote/machine.h
@@ -14,6 +14,7 @@
 #include "qom/object.h"
 #include "hw/boards.h"
 #include "hw/pci-host/remote.h"
+#include "io/channel.h"
 
 typedef struct RemoteMachineState {
     MachineState parent_obj;
@@ -21,8 +22,16 @@ typedef struct RemoteMachineState {
     RemotePCIHost *host;
 } RemoteMachineState;
 
+/* Passes the device and its ioc to the co-routine. */
+typedef struct RemoteCommDev {
+    PCIDevice *dev;
+    QIOChannel *ioc;
+} RemoteCommDev;
+
 #define TYPE_REMOTE_MACHINE "x-remote-machine"
 #define REMOTE_MACHINE(obj) \
     OBJECT_CHECK(RemoteMachineState, (obj), TYPE_REMOTE_MACHINE)
 
+void coroutine_fn mpqemu_remote_msg_loop_co(void *data);
+
 #endif
diff --git a/hw/remote/message.c b/hw/remote/message.c
new file mode 100644
index 0000000..5d87bf4
--- /dev/null
+++ b/hw/remote/message.c
@@ -0,0 +1,61 @@
+/*
+ * Copyright © 2020 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
+ *
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "hw/remote/machine.h"
+#include "io/channel.h"
+#include "hw/remote/mpqemu-link.h"
+#include "qapi/error.h"
+#include "sysemu/runstate.h"
+
+void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
+{
+    RemoteCommDev *com = (RemoteCommDev *)data;
+    PCIDevice *pci_dev = NULL;
+
+    pci_dev = com->dev;
+    for (;;) {
+        MPQemuMsg msg = {0};
+        Error *local_err = NULL;
+
+        if (!com->ioc) {
+            error_report("ERROR: No channel available");
+            break;
+        }
+        mpqemu_msg_recv(&msg, com->ioc, &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            break;
+        }
+
+        if (!mpqemu_msg_valid(&msg)) {
+            error_report("Received invalid message from proxy "
+                         "in remote process pid=%d", getpid());
+            break;
+        }
+
+        switch (msg.cmd) {
+        default:
+            error_setg(&local_err,
+                       "Unknown command (%d) received for device %s (pid=%d)",
+                       msg.cmd, DEVICE(pci_dev)->id, getpid());
+        }
+
+        if (local_err) {
+            error_report_err(local_err);
+            qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+            break;
+        }
+    }
+    qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+
+    return;
+}
diff --git a/MAINTAINERS b/MAINTAINERS
index d0c891a..b64e4b8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3143,6 +3143,7 @@ F: hw/remote/machine.c
 F: include/hw/remote/machine.h
 F: hw/remote/mpqemu-link.c
 F: include/hw/remote/mpqemu-link.h
+F: hw/remote/message.c
 
 Build and test automation
 -------------------------
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index a2b2fc0..9f5c57f 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -2,5 +2,6 @@ remote_ss = ss.source_set()
 
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('mpqemu-link.c'))
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
 
 softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
-- 
1.8.3.1




* [PATCH v12 10/19] multi-process: Associate fd of a PCIDevice with its object
  2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
                   ` (8 preceding siblings ...)
  2020-12-01 20:22 ` [PATCH v12 09/19] multi-process: Initialize message handler in remote device Jagannathan Raman
@ 2020-12-01 20:22 ` Jagannathan Raman
  2020-12-07 14:03   ` Marc-André Lureau
  2020-12-01 20:22 ` [PATCH v12 11/19] multi-process: setup memory manager for remote device Jagannathan Raman
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 52+ messages in thread
From: Jagannathan Raman @ 2020-12-01 20:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

Associate the file descriptor for a PCIDevice in the remote process
with its DeviceState object.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
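
Note for readers: remote_object_set_fd() converts the "fd" property with
atoi(), which accepts trailing garbage and cannot report failure. A stricter
conversion could look like the standalone sketch below. demo_parse_fd is a
hypothetical helper written against plain libc; in-tree code would more
likely use QEMU's own string-conversion helpers and report the failure
through errp.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a non-negative decimal file descriptor number.
 * Returns the descriptor, or -1 if the string is empty, has trailing
 * characters, is negative, or does not fit in an int.
 */
static int demo_parse_fd(const char *str)
{
    char *end = NULL;
    long val;

    assert(str != NULL);

    errno = 0;
    val = strtol(str, &end, 10);
    if (errno != 0 || end == str || *end != '\0') {
        return -1;
    }
    if (val < 0 || val > INT_MAX) {
        return -1;
    }
    return (int)val;
}
```

This only checks the syntax of the value. Whether the number refers to an
open descriptor is a separate question, which qio_channel_new_fd() answers
later in remote_object_machine_done().
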
 include/hw/remote/remote-obj.h |  42 +++++++++++
 hw/remote/message.c            |   1 +
 hw/remote/remote-obj.c         | 154 +++++++++++++++++++++++++++++++++++++++++
 MAINTAINERS                    |   2 +
 hw/remote/meson.build          |   1 +
 5 files changed, 200 insertions(+)
 create mode 100644 include/hw/remote/remote-obj.h
 create mode 100644 hw/remote/remote-obj.c

diff --git a/include/hw/remote/remote-obj.h b/include/hw/remote/remote-obj.h
new file mode 100644
index 0000000..0e95813
--- /dev/null
+++ b/include/hw/remote/remote-obj.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright © 2020 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
+ *
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef REMOTE_OBJECT_H
+#define REMOTE_OBJECT_H
+
+#include "io/channel.h"
+#include "qemu/notify.h"
+
+#define TYPE_REMOTE_OBJECT "x-remote-object"
+#define REMOTE_OBJECT(obj) \
+    OBJECT_CHECK(RemoteObject, (obj), TYPE_REMOTE_OBJECT)
+#define REMOTE_OBJECT_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(RemoteObjectClass, (obj), TYPE_REMOTE_OBJECT)
+#define REMOTE_OBJECT_CLASS(klass) \
+    OBJECT_CLASS_CHECK(RemoteObjectClass, (klass), TYPE_REMOTE_OBJECT)
+
+typedef struct {
+    ObjectClass parent_class;
+
+    unsigned int nr_devs;
+    unsigned int max_devs;
+} RemoteObjectClass;
+
+typedef struct {
+    /* private */
+    Object parent;
+
+    Notifier machine_done;
+
+    int fd;
+    char *devid;
+    QIOChannel *ioc;
+} RemoteObject;
+
+#endif
diff --git a/hw/remote/message.c b/hw/remote/message.c
index 5d87bf4..1f2edc7 100644
--- a/hw/remote/message.c
+++ b/hw/remote/message.c
@@ -56,6 +56,7 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
         }
     }
     qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+    g_free(com);
 
     return;
 }
diff --git a/hw/remote/remote-obj.c b/hw/remote/remote-obj.c
new file mode 100644
index 0000000..3b4c0d4
--- /dev/null
+++ b/hw/remote/remote-obj.c
@@ -0,0 +1,154 @@
+/*
+ * Copyright © 2020 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
+ *
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "hw/remote/remote-obj.h"
+#include "qemu/error-report.h"
+#include "qom/object_interfaces.h"
+#include "hw/qdev-core.h"
+#include "io/channel.h"
+#include "hw/qdev-core.h"
+#include "hw/remote/machine.h"
+#include "io/channel-util.h"
+#include "qapi/error.h"
+#include "sysemu/sysemu.h"
+#include "hw/pci/pci.h"
+
+static void remote_object_set_fd(Object *obj, const char *str, Error **errp)
+{
+    RemoteObject *o = REMOTE_OBJECT(obj);
+
+    o->fd = atoi(str);
+}
+
+static void remote_object_set_devid(Object *obj, const char *str, Error **errp)
+{
+    RemoteObject *o = REMOTE_OBJECT(obj);
+
+    g_free(o->devid);
+
+    o->devid = g_strdup(str);
+}
+
+static void property_release_remote_object(Object *obj, const char *name,
+                                           void *opaque)
+{
+    Object *remote_object = OBJECT(opaque);
+
+    object_unref(remote_object);
+}
+
+static void remote_object_machine_done(Notifier *notifier, void *data)
+{
+    RemoteObject *o = container_of(notifier, RemoteObject, machine_done);
+    DeviceState *dev = NULL;
+    QIOChannel *ioc = NULL;
+    Coroutine *co = NULL;
+    RemoteCommDev *comdev = NULL;
+    Error *err = NULL;
+
+    dev = qdev_find_recursive(sysbus_get_default(), o->devid);
+    if (!dev || !object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+        error_report("%s is not a PCI device", o->devid);
+        return;
+    }
+
+    ioc = qio_channel_new_fd(o->fd, &err);
+    if (!ioc) {
+        error_report_err(err);
+        return;
+    }
+    qio_channel_set_blocking(ioc, false, NULL);
+
+    object_property_add(OBJECT(dev), "remote-object", "object", NULL, NULL,
+                        property_release_remote_object, (void *)OBJECT(o));
+    /* co-routine should free this. */
+    comdev = g_new0(RemoteCommDev, 1);
+    *comdev = (RemoteCommDev) {
+        .ioc = ioc,
+        .dev = PCI_DEVICE(dev),
+    };
+
+    co = qemu_coroutine_create(mpqemu_remote_msg_loop_co, comdev);
+    qemu_coroutine_enter(co);
+}
+
+static void remote_object_init(Object *obj)
+{
+    RemoteObjectClass *k = REMOTE_OBJECT_GET_CLASS(obj);
+    RemoteObject *o = REMOTE_OBJECT(obj);
+
+    if (k->nr_devs >= k->max_devs) {
+        error_report("Reached maximum number of devices: %u", k->max_devs);
+        return;
+    }
+
+    o->ioc = NULL;
+    o->fd = -1;
+    o->devid = NULL;
+
+    k->nr_devs++;
+
+    object_property_add_str(obj, "fd", NULL, remote_object_set_fd);
+    object_property_set_description(obj, "fd",
+                                    "file descriptor for the object");
+    object_property_add_str(obj, "devid", NULL, remote_object_set_devid);
+    object_property_set_description(obj, "devid",
+                                    "id of device to associate");
+
+    o->machine_done.notify = remote_object_machine_done;
+    qemu_add_machine_init_done_notifier(&o->machine_done);
+}
+
+static void remote_object_finalize(Object *obj)
+{
+    RemoteObjectClass *k = REMOTE_OBJECT_GET_CLASS(obj);
+    RemoteObject *o = REMOTE_OBJECT(obj);
+
+    if (o->ioc) {
+        qio_channel_shutdown(o->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+        qio_channel_close(o->ioc, NULL);
+    }
+
+    object_unref(OBJECT(o->ioc));
+
+    k->nr_devs--;
+    g_free(o->devid);
+}
+
+static void remote_object_class_init(ObjectClass *klass, void *data)
+{
+    RemoteObjectClass *k = REMOTE_OBJECT_CLASS(klass);
+
+    k->max_devs = 1;
+    k->nr_devs = 0;
+}
+
+static const TypeInfo remote_object_info = {
+    .name = TYPE_REMOTE_OBJECT,
+    .parent = TYPE_OBJECT,
+    .instance_size = sizeof(RemoteObject),
+    .instance_init = remote_object_init,
+    .instance_finalize = remote_object_finalize,
+    .class_size = sizeof(RemoteObjectClass),
+    .class_init = remote_object_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_USER_CREATABLE },
+        { }
+    }
+};
+
+static void register_types(void)
+{
+    type_register_static(&remote_object_info);
+}
+
+type_init(register_types);
diff --git a/MAINTAINERS b/MAINTAINERS
index b64e4b8..aedfc27 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3144,6 +3144,8 @@ F: include/hw/remote/machine.h
 F: hw/remote/mpqemu-link.c
 F: include/hw/remote/mpqemu-link.h
 F: hw/remote/message.c
+F: include/hw/remote/remote-obj.h
+F: hw/remote/remote-obj.c
 
 Build and test automation
 -------------------------
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index 9f5c57f..71d0a56 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -3,5 +3,6 @@ remote_ss = ss.source_set()
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('mpqemu-link.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
 
 softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
-- 
1.8.3.1




* [PATCH v12 11/19] multi-process: setup memory manager for remote device
  2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
                   ` (9 preceding siblings ...)
  2020-12-01 20:22 ` [PATCH v12 10/19] multi-process: Associate fd of a PCIDevice with its object Jagannathan Raman
@ 2020-12-01 20:22 ` Jagannathan Raman
  2020-12-08 11:54   ` Marc-André Lureau
  2020-12-08 11:58   ` Marc-André Lureau
  2020-12-01 20:22 ` [PATCH v12 12/19] multi-process: introduce proxy object Jagannathan Raman
                   ` (8 subsequent siblings)
  19 siblings, 2 replies; 52+ messages in thread
From: Jagannathan Raman @ 2020-12-01 20:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

Define the SyncSysmemMsg message format, which is used to send the
file descriptors of the RAM regions to the remote device. RAM on the
remote device is configured with this set of file descriptors: old RAM
regions are deleted, and new regions, each backed by an fd, are added.

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
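
Note for readers: SyncSysmemMsg uses parallel arrays, so slot i of gpas,
sizes and offsets, together with fds[i] of the enclosing message, describes
one RAM region. The sketch below shows how a sender might fill such a message
and the message-specific check this patch adds for SYNC_SYSMEM. All Demo*
names are hypothetical, trimmed stand-ins and not the structures from
mpqemu-link.h. Note also that mpqemu_msg_valid() from patch 08 rejects a
num_fds equal to REMOTE_MAX_FDS, so a real sender would presumably stop one
region short of what this builder allows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

#define DEMO_MAX_FDS 8

/* Hypothetical mirror of SyncSysmemMsg: one slot per shared RAM region. */
typedef struct {
    uint64_t gpas[DEMO_MAX_FDS];    /* guest physical address of the region */
    uint64_t sizes[DEMO_MAX_FDS];   /* region length in bytes */
    off_t offsets[DEMO_MAX_FDS];    /* offset of the region within its fd */
} DemoSyncSysmem;

/* Hypothetical, trimmed mirror of MPQemuMsg. */
typedef struct {
    size_t size;
    DemoSyncSysmem sync_sysmem;
    int fds[DEMO_MAX_FDS];
    int num_fds;
} DemoMsg;

/* Start an empty message whose payload size covers the whole struct. */
static void demo_sync_sysmem_init(DemoMsg *msg)
{
    assert(msg != NULL);
    memset(msg, 0, sizeof(*msg));
    msg->size = sizeof(DemoSyncSysmem);
}

/* Append one region; returns false once every slot is in use. */
static bool demo_sync_sysmem_add(DemoMsg *msg, int fd, uint64_t gpa,
                                 uint64_t size, off_t offset)
{
    int i = msg->num_fds;

    if (i >= DEMO_MAX_FDS) {
        return false;
    }
    msg->fds[i] = fd;
    msg->sync_sysmem.gpas[i] = gpa;
    msg->sync_sysmem.sizes[i] = size;
    msg->sync_sysmem.offsets[i] = offset;
    msg->num_fds = i + 1;
    return true;
}

/* Same condition the patch checks for SYNC_SYSMEM messages. */
static bool demo_sync_sysmem_valid(const DemoMsg *msg)
{
    return msg->num_fds != 0 && msg->size == sizeof(DemoSyncSysmem);
}
```

On the receiving side, remote_sysmem_reconfig() walks the same slots from 0
to num_fds - 1 and maps each fd at the corresponding guest physical address.
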
 include/hw/remote/memory.h      | 19 ++++++++++++++
 include/hw/remote/mpqemu-link.h | 13 +++++++++
 hw/remote/memory.c              | 58 +++++++++++++++++++++++++++++++++++++++++
 hw/remote/mpqemu-link.c         | 11 ++++++++
 MAINTAINERS                     |  2 ++
 hw/remote/meson.build           |  2 ++
 6 files changed, 105 insertions(+)
 create mode 100644 include/hw/remote/memory.h
 create mode 100644 hw/remote/memory.c

diff --git a/include/hw/remote/memory.h b/include/hw/remote/memory.h
new file mode 100644
index 0000000..4fd548e
--- /dev/null
+++ b/include/hw/remote/memory.h
@@ -0,0 +1,19 @@
+/*
+ * Memory manager for remote device
+ *
+ * Copyright © 2018, 2020 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef REMOTE_MEMORY_H
+#define REMOTE_MEMORY_H
+
+#include "exec/hwaddr.h"
+#include "hw/remote/mpqemu-link.h"
+
+void remote_sysmem_reconfig(MPQemuMsg *msg, Error **errp);
+
+#endif
diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
index 2d79ff8..070ac77 100644
--- a/include/hw/remote/mpqemu-link.h
+++ b/include/hw/remote/mpqemu-link.h
@@ -14,6 +14,7 @@
 #include "qom/object.h"
 #include "qemu/thread.h"
 #include "io/channel.h"
+#include "exec/hwaddr.h"
 
 #define REMOTE_MAX_FDS 8
 
@@ -24,12 +25,22 @@
  *
  * MPQemuCmd enum type to specify the command to be executed on the remote
  * device.
+ *
+ * SYNC_SYSMEM      Shares QEMU's RAM with remote device's RAM
  */
 typedef enum {
     MPQEMU_CMD_INIT,
+    SYNC_SYSMEM,
+    RET_MSG,
     MPQEMU_CMD_MAX,
 } MPQemuCmd;
 
+typedef struct {
+    hwaddr gpas[REMOTE_MAX_FDS];
+    uint64_t sizes[REMOTE_MAX_FDS];
+    off_t offsets[REMOTE_MAX_FDS];
+} SyncSysmemMsg;
+
 /**
  * MPQemuMsg:
  * @cmd: The remote command
@@ -40,12 +51,14 @@ typedef enum {
  * MPQemuMsg Format of the message sent to the remote device from QEMU.
  *
  */
+
 typedef struct {
     int cmd;
     size_t size;
 
     union {
         uint64_t u64;
+        SyncSysmemMsg sync_sysmem;
     } data;
 
     int fds[REMOTE_MAX_FDS];
diff --git a/hw/remote/memory.c b/hw/remote/memory.c
new file mode 100644
index 0000000..6d1e830
--- /dev/null
+++ b/hw/remote/memory.c
@@ -0,0 +1,58 @@
+/*
+ * Memory manager for remote device
+ *
+ * Copyright © 2018, 2020 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "hw/remote/memory.h"
+#include "exec/address-spaces.h"
+#include "exec/ram_addr.h"
+#include "qapi/error.h"
+
+void remote_sysmem_reconfig(MPQemuMsg *msg, Error **errp)
+{
+    SyncSysmemMsg *sysmem_info = &msg->data.sync_sysmem;
+    MemoryRegion *sysmem, *subregion, *next;
+    static unsigned int suffix;
+    Error *local_err = NULL;
+    char *name;
+    int region;
+
+    sysmem = get_system_memory();
+
+    memory_region_transaction_begin();
+
+    QTAILQ_FOREACH_SAFE(subregion, &sysmem->subregions, subregions_link, next) {
+        if (subregion->ram) {
+            memory_region_del_subregion(sysmem, subregion);
+            object_unparent(OBJECT(subregion));
+        }
+    }
+
+    for (region = 0; region < msg->num_fds; region++) {
+        subregion = g_new(MemoryRegion, 1);
+        name = g_strdup_printf("remote-mem-%u", suffix++);
+        memory_region_init_ram_from_fd(subregion, NULL,
+                                       name, sysmem_info->sizes[region],
+                                       true, msg->fds[region],
+                                       sysmem_info->offsets[region],
+                                       &local_err);
+        g_free(name);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            break;
+        }
+
+        memory_region_add_subregion(sysmem, sysmem_info->gpas[region],
+                                    subregion);
+    }
+
+    memory_region_transaction_commit();
+}
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
index e535ed2..bbd9df3 100644
--- a/hw/remote/mpqemu-link.c
+++ b/hw/remote/mpqemu-link.c
@@ -238,5 +238,16 @@ bool mpqemu_msg_valid(MPQemuMsg *msg)
         }
     }
 
+    /* Verify message specific fields. */
+    switch (msg->cmd) {
+    case SYNC_SYSMEM:
+        if (msg->num_fds == 0 || msg->size != sizeof(SyncSysmemMsg)) {
+            return false;
+        }
+        break;
+    default:
+        break;
+    }
+
     return true;
 }
diff --git a/MAINTAINERS b/MAINTAINERS
index aedfc27..24cb36e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3146,6 +3146,8 @@ F: include/hw/remote/mpqemu-link.h
 F: hw/remote/message.c
 F: include/hw/remote/remote-obj.h
 F: hw/remote/remote-obj.c
+F: include/hw/remote/memory.h
+F: hw/remote/memory.c
 
 Build and test automation
 -------------------------
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index 71d0a56..64da16c 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -5,4 +5,6 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('mpqemu-link.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
 
+specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
+
 softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
-- 
1.8.3.1




* [PATCH v12 12/19] multi-process: introduce proxy object
  2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
                   ` (10 preceding siblings ...)
  2020-12-01 20:22 ` [PATCH v12 11/19] multi-process: setup memory manager for remote device Jagannathan Raman
@ 2020-12-01 20:22 ` Jagannathan Raman
  2020-12-08 12:23   ` Marc-André Lureau
  2020-12-01 20:22 ` [PATCH v12 13/19] multi-process: add proxy communication functions Jagannathan Raman
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 52+ messages in thread
From: Jagannathan Raman @ 2020-12-01 20:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Defines a PCI Device proxy object as a child of TYPE_PCI_DEVICE.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
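
Note for readers: the io_mutex declared in PCIProxyDev exists because the
proxy sends a request and then blocks for the reply on the same channel, so
two vCPU threads must not interleave. This patch only declares the mutex; the
sketch below illustrates the intended locking pattern with pthreads and a
socketpair. DemoProxy, demo_proxy_init and demo_proxy_call are hypothetical
names, and the sketch assumes the whole request and reply each transfer in
one call, which a real implementation could not rely on.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the io_mutex/ioc pair in PCIProxyDev. */
typedef struct {
    pthread_mutex_t io_mutex;
    int fd;
} DemoProxy;

/* Create a connected pair: one end for the proxy, one for the "remote". */
static int demo_proxy_init(DemoProxy *p, int *remote_fd)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    pthread_mutex_init(&p->io_mutex, NULL);
    p->fd = sv[0];
    *remote_fd = sv[1];
    return 0;
}

/*
 * Send a request and block for the reply while holding the lock, so a
 * second caller cannot interleave its request or consume this reply.
 * Returns 0 on success, -1 on a short or failed transfer.
 */
static int demo_proxy_call(DemoProxy *p, const void *req, size_t req_len,
                           void *reply, size_t reply_len)
{
    int ret = -1;

    pthread_mutex_lock(&p->io_mutex);
    if (write(p->fd, req, req_len) == (ssize_t)req_len &&
        read(p->fd, reply, reply_len) == (ssize_t)reply_len) {
        ret = 0;
    }
    pthread_mutex_unlock(&p->io_mutex);
    return ret;
}
```

Holding the lock across both the send and the receive is the point of the
design: locking only the send would still let replies reach the wrong caller.
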
 include/hw/remote/proxy.h | 36 +++++++++++++++++
 hw/remote/proxy.c         | 98 +++++++++++++++++++++++++++++++++++++++++++++++
 MAINTAINERS               |  2 +
 hw/remote/meson.build     |  1 +
 4 files changed, 137 insertions(+)
 create mode 100644 include/hw/remote/proxy.h
 create mode 100644 hw/remote/proxy.c

diff --git a/include/hw/remote/proxy.h b/include/hw/remote/proxy.h
new file mode 100644
index 0000000..923432a
--- /dev/null
+++ b/include/hw/remote/proxy.h
@@ -0,0 +1,36 @@
+/*
+ * Copyright © 2018, 2020 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef PROXY_H
+#define PROXY_H
+
+#include "hw/pci/pci.h"
+#include "io/channel.h"
+
+#define TYPE_PCI_PROXY_DEV "x-pci-proxy-dev"
+
+#define PCI_PROXY_DEV(obj) \
+            OBJECT_CHECK(PCIProxyDev, (obj), TYPE_PCI_PROXY_DEV)
+typedef struct PCIProxyDev PCIProxyDev;
+
+struct PCIProxyDev {
+    PCIDevice parent_dev;
+    char *fd;
+
+    /*
+     * Mutex used to protect the QIOChannel fd from
+     * concurrent access by the vCPUs, since the proxy
+     * blocks while waiting for replies from the
+     * remote process.
+     */
+    QemuMutex io_mutex;
+    QIOChannel *ioc;
+    Error *migration_blocker;
+};
+
+#endif /* PROXY_H */
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
new file mode 100644
index 0000000..29100bc
--- /dev/null
+++ b/hw/remote/proxy.c
@@ -0,0 +1,98 @@
+/*
+ * Copyright © 2018, 2020 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "hw/remote/proxy.h"
+#include "hw/pci/pci.h"
+#include "qapi/error.h"
+#include "io/channel-util.h"
+#include "hw/qdev-properties.h"
+#include "monitor/monitor.h"
+#include "migration/blocker.h"
+
+static void proxy_set_socket(PCIProxyDev *pdev, int fd, Error **errp)
+{
+    pdev->ioc = qio_channel_new_fd(fd, errp);
+}
+
+static Property proxy_properties[] = {
+    DEFINE_PROP_STRING("fd", PCIProxyDev, fd),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
+{
+    PCIProxyDev *dev = PCI_PROXY_DEV(device);
+    int fd;
+
+    if (dev->fd) {
+        fd = monitor_fd_param(monitor_cur(), dev->fd, errp);
+        if (fd == -1) {
+            error_prepend(errp, "proxy: unable to parse fd: ");
+            return;
+        }
+        proxy_set_socket(dev, fd, errp);
+    } else {
+        error_setg(errp, "fd parameter not specified for %s",
+                   DEVICE(device)->id);
+        return;
+    }
+
+    error_setg(&dev->migration_blocker, "%s does not support migration",
+               TYPE_PCI_PROXY_DEV);
+    if (migrate_add_blocker(dev->migration_blocker, errp)) {
+        error_free(dev->migration_blocker);
+        error_free(*errp);
+        dev->migration_blocker = NULL;
+        error_setg(errp, "Failed to set migration blocker");
+    }
+
+    qemu_mutex_init(&dev->io_mutex);
+    qio_channel_set_blocking(dev->ioc, true, NULL);
+}
+
+static void pci_proxy_dev_exit(PCIDevice *pdev)
+{
+    PCIProxyDev *dev = PCI_PROXY_DEV(pdev);
+
+    qio_channel_close(dev->ioc, NULL);
+
+    migrate_del_blocker(dev->migration_blocker);
+
+    error_free(dev->migration_blocker);
+}
+
+static void pci_proxy_dev_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+    k->realize = pci_proxy_dev_realize;
+    k->exit = pci_proxy_dev_exit;
+    device_class_set_props(dc, proxy_properties);
+}
+
+static const TypeInfo pci_proxy_dev_type_info = {
+    .name          = TYPE_PCI_PROXY_DEV,
+    .parent        = TYPE_PCI_DEVICE,
+    .instance_size = sizeof(PCIProxyDev),
+    .class_init    = pci_proxy_dev_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
+        { },
+    },
+};
+
+static void pci_proxy_dev_register_types(void)
+{
+    type_register_static(&pci_proxy_dev_type_info);
+}
+
+type_init(pci_proxy_dev_register_types)
diff --git a/MAINTAINERS b/MAINTAINERS
index 24cb36e..ebd1d1d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3148,6 +3148,8 @@ F: include/hw/remote/remote-obj.h
 F: hw/remote/remote-obj.c
 F: include/hw/remote/memory.h
 F: hw/remote/memory.c
+F: hw/remote/proxy.c
+F: include/hw/remote/proxy.h
 
 Build and test automation
 -------------------------
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index 64da16c..569cd20 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -4,6 +4,7 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('mpqemu-link.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
 
 specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
 
-- 
1.8.3.1




* [PATCH v12 13/19] multi-process: add proxy communication functions
  2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
                   ` (11 preceding siblings ...)
  2020-12-01 20:22 ` [PATCH v12 12/19] multi-process: introduce proxy object Jagannathan Raman
@ 2020-12-01 20:22 ` Jagannathan Raman
  2020-12-08 12:39   ` Marc-André Lureau
  2020-12-01 20:22 ` [PATCH v12 14/19] multi-process: Forward PCI config space accesses to the remote process Jagannathan Raman
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 52+ messages in thread
From: Jagannathan Raman @ 2020-12-01 20:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/hw/remote/mpqemu-link.h |  4 ++++
 hw/remote/mpqemu-link.c         | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
index 070ac77..cee9468 100644
--- a/include/hw/remote/mpqemu-link.h
+++ b/include/hw/remote/mpqemu-link.h
@@ -15,6 +15,8 @@
 #include "qemu/thread.h"
 #include "io/channel.h"
 #include "exec/hwaddr.h"
+#include "io/channel-socket.h"
+#include "hw/remote/proxy.h"
 
 #define REMOTE_MAX_FDS 8
 
@@ -65,6 +67,8 @@ typedef struct {
     int num_fds;
 } MPQemuMsg;
 
+uint64_t mpqemu_msg_send_and_await_reply(MPQemuMsg *msg, PCIProxyDev *pdev,
+                                         Error **errp);
 void mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
 void mpqemu_msg_recv(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
 
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
index bbd9df3..18c8a54 100644
--- a/hw/remote/mpqemu-link.c
+++ b/hw/remote/mpqemu-link.c
@@ -17,6 +17,7 @@
 #include "qemu/iov.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
+#include "io/channel.h"
 
 /*
  * Send message over the ioc QIOChannel.
@@ -219,6 +220,43 @@ fail:
     }
 }
 
+/*
+ * Called from VCPU thread in non-coroutine context.
+ */
+uint64_t mpqemu_msg_send_and_await_reply(MPQemuMsg *msg, PCIProxyDev *pdev,
+                                         Error **errp)
+{
+    MPQemuMsg msg_reply = {0};
+    uint64_t ret = UINT64_MAX;
+    Error *local_err = NULL;
+
+    qemu_mutex_lock(&pdev->io_mutex);
+    mpqemu_msg_send(msg, pdev->ioc, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        goto exit_send;
+    }
+
+    mpqemu_msg_recv(&msg_reply, pdev->ioc, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        goto exit_send;
+    }
+
+    if (!mpqemu_msg_valid(&msg_reply) || msg_reply.cmd != RET_MSG) {
+        error_setg(errp, "ERROR: Invalid reply received for command %d",
+                         msg->cmd);
+        goto exit_send;
+    } else {
+        ret = msg_reply.data.u64;
+    }
+
+ exit_send:
+    qemu_mutex_unlock(&pdev->io_mutex);
+
+    return ret;
+}
+
 bool mpqemu_msg_valid(MPQemuMsg *msg)
 {
     if (msg->cmd >= MPQEMU_CMD_MAX && msg->cmd < 0) {
-- 
1.8.3.1




* [PATCH v12 14/19] multi-process: Forward PCI config space accesses to the remote process
  2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
                   ` (12 preceding siblings ...)
  2020-12-01 20:22 ` [PATCH v12 13/19] multi-process: add proxy communication functions Jagannathan Raman
@ 2020-12-01 20:22 ` Jagannathan Raman
  2020-12-08 12:52   ` Marc-André Lureau
  2020-12-01 20:22 ` [PATCH v12 15/19] multi-process: PCI BAR read/write handling for proxy & remote endpoints Jagannathan Raman
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 52+ messages in thread
From: Jagannathan Raman @ 2020-12-01 20:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

The Proxy Object sends the PCI config space accesses as messages
to the remote process over the communication channel.
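
The round trip described above can be illustrated with a minimal,
standalone sketch. It is not the code from this patch: the sketch_* names
are hypothetical stand-ins for PCI_CONFIG_WRITE/PCI_CONFIG_READ and
PciConfDataMsg, the config space is a plain byte buffer, and the bounds
check uses the access length, which is an assumption of the sketch rather
than what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PCI_CONFIG_WRITE / PCI_CONFIG_READ. */
enum sketch_cmd { SKETCH_CONFIG_WRITE, SKETCH_CONFIG_READ };

/* Hypothetical analogue of PciConfDataMsg plus the command. */
struct sketch_conf_msg {
    enum sketch_cmd cmd;
    uint32_t addr; /* byte offset into config space */
    uint32_t val;  /* value to write; zero for reads */
    int len;       /* access width in bytes: 1, 2 or 4 */
};

/* Proxy side: describe one config space access as a message. */
static struct sketch_conf_msg sketch_build(enum sketch_cmd cmd, uint32_t addr,
                                           uint32_t val, int len)
{
    struct sketch_conf_msg m;

    m.cmd = cmd;
    m.addr = addr;
    m.val = (cmd == SKETCH_CONFIG_WRITE) ? val : 0;
    m.len = len;
    return m;
}

/*
 * Remote side: apply the access to a little-endian config space buffer.
 * Returns the reply value, or UINT64_MAX for a rejected request.
 */
static uint64_t sketch_handle(uint8_t *cfg, size_t cfg_size,
                              const struct sketch_conf_msg *m)
{
    uint32_t v = 0;
    int i;

    assert(cfg != NULL && m != NULL);

    if (m->len != 1 && m->len != 2 && m->len != 4) {
        return UINT64_MAX;
    }
    if ((uint64_t)m->addr + (uint64_t)m->len > cfg_size) {
        return UINT64_MAX;
    }

    if (m->cmd == SKETCH_CONFIG_WRITE) {
        for (i = 0; i < m->len; i++) {
            cfg[m->addr + i] = (uint8_t)(m->val >> (8 * i));
        }
        return 0;
    }

    for (i = 0; i < m->len; i++) {
        v |= (uint32_t)cfg[m->addr + i] << (8 * i);
    }
    return v;
}
```

In the sketch, the proxy would call sketch_build() and the remote end
would pass the received message to sketch_handle(); the transport between
the two is left out on purpose.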

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/hw/remote/mpqemu-link.h |  9 ++++++
 hw/remote/message.c             | 62 +++++++++++++++++++++++++++++++++++++++++
 hw/remote/mpqemu-link.c         |  6 ++++
 hw/remote/proxy.c               | 51 +++++++++++++++++++++++++++++++++
 4 files changed, 128 insertions(+)

diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
index cee9468..057c98b 100644
--- a/include/hw/remote/mpqemu-link.h
+++ b/include/hw/remote/mpqemu-link.h
@@ -34,6 +34,8 @@ typedef enum {
     MPQEMU_CMD_INIT,
     SYNC_SYSMEM,
     RET_MSG,
+    PCI_CONFIG_WRITE,
+    PCI_CONFIG_READ,
     MPQEMU_CMD_MAX,
 } MPQemuCmd;
 
@@ -43,6 +45,12 @@ typedef struct {
     off_t offsets[REMOTE_MAX_FDS];
 } SyncSysmemMsg;
 
+typedef struct {
+    uint32_t addr;
+    uint32_t val;
+    int l;
+} PciConfDataMsg;
+
 /**
  * MPQemuMsg:
  * @cmd: The remote command
@@ -60,6 +68,7 @@ typedef struct {
 
     union {
         uint64_t u64;
+        PciConfDataMsg pci_conf_data;
         SyncSysmemMsg sync_sysmem;
     } data;
 
diff --git a/hw/remote/message.c b/hw/remote/message.c
index 1f2edc7..52a6f6f 100644
--- a/hw/remote/message.c
+++ b/hw/remote/message.c
@@ -15,6 +15,12 @@
 #include "hw/remote/mpqemu-link.h"
 #include "qapi/error.h"
 #include "sysemu/runstate.h"
+#include "hw/pci/pci.h"
+
+static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
+                                 MPQemuMsg *msg);
+static void process_config_read(QIOChannel *ioc, PCIDevice *dev,
+                                MPQemuMsg *msg);
 
 void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
 {
@@ -43,6 +49,12 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
         }
 
         switch (msg.cmd) {
+        case PCI_CONFIG_WRITE:
+            process_config_write(com->ioc, pci_dev, &msg);
+            break;
+        case PCI_CONFIG_READ:
+            process_config_read(com->ioc, pci_dev, &msg);
+            break;
         default:
             error_setg(&local_err,
                        "Unknown command (%d) received for device %s (pid=%d)",
@@ -60,3 +72,53 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
 
     return;
 }
+
+static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
+                                 MPQemuMsg *msg)
+{
+    PciConfDataMsg *conf = (PciConfDataMsg *)&msg->data.pci_conf_data;
+    MPQemuMsg ret = { 0 };
+    Error *local_err = NULL;
+
+    if ((conf->addr + sizeof(conf->val)) > pci_config_size(dev)) {
+        error_report("Bad address received when writing PCI config, pid %d",
+                     getpid());
+        ret.data.u64 = UINT64_MAX;
+    } else {
+        pci_default_write_config(dev, conf->addr, conf->val, conf->l);
+    }
+
+    ret.cmd = RET_MSG;
+    ret.size = sizeof(ret.data.u64);
+
+    mpqemu_msg_send(&ret, ioc, &local_err);
+    if (local_err) {
+        error_report("Could not send message to proxy from pid %d",
+                     getpid());
+    }
+}
+
+static void process_config_read(QIOChannel *ioc, PCIDevice *dev,
+                                MPQemuMsg *msg)
+{
+    PciConfDataMsg *conf = (PciConfDataMsg *)&msg->data.pci_conf_data;
+    MPQemuMsg ret = { 0 };
+    Error *local_err = NULL;
+
+    if ((conf->addr + sizeof(conf->val)) > pci_config_size(dev)) {
+        error_report("Bad address received when reading PCI config, pid %d",
+                     getpid());
+        ret.data.u64 = UINT64_MAX;
+    } else {
+        ret.data.u64 = pci_default_read_config(dev, conf->addr, conf->l);
+    }
+
+    ret.cmd = RET_MSG;
+    ret.size = sizeof(ret.data.u64);
+
+    mpqemu_msg_send(&ret, ioc, &local_err);
+    if (local_err) {
+        error_report("Could not send message to proxy from pid %d",
+                     getpid());
+    }
+}
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
index 18c8a54..83dbd65 100644
--- a/hw/remote/mpqemu-link.c
+++ b/hw/remote/mpqemu-link.c
@@ -283,6 +283,12 @@ bool mpqemu_msg_valid(MPQemuMsg *msg)
             return false;
         }
         break;
+    case PCI_CONFIG_WRITE:
+    case PCI_CONFIG_READ:
+        if (msg->size != sizeof(PciConfDataMsg)) {
+            return false;
+        }
+        break;
     default:
         break;
     }
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
index 29100bc..c193484 100644
--- a/hw/remote/proxy.c
+++ b/hw/remote/proxy.c
@@ -16,6 +16,8 @@
 #include "hw/qdev-properties.h"
 #include "monitor/monitor.h"
 #include "migration/blocker.h"
+#include "hw/remote/mpqemu-link.h"
+#include "qemu/error-report.h"
 
 static void proxy_set_socket(PCIProxyDev *pdev, int fd, Error **errp)
 {
@@ -69,6 +71,52 @@ static void pci_proxy_dev_exit(PCIDevice *pdev)
     error_free(dev->migration_blocker);
 }
 
+static int config_op_send(PCIProxyDev *pdev, uint32_t addr, uint32_t *val,
+                          int l, unsigned int op)
+{
+    MPQemuMsg msg = { 0 };
+    uint64_t ret = -EINVAL;
+    Error *local_err = NULL;
+
+    msg.cmd = op;
+    msg.data.pci_conf_data.addr = addr;
+    msg.data.pci_conf_data.val = (op == PCI_CONFIG_WRITE) ? *val : 0;
+    msg.data.pci_conf_data.l = l;
+    msg.size = sizeof(PciConfDataMsg);
+
+    ret = mpqemu_msg_send_and_await_reply(&msg, pdev, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+    }
+    if (op == PCI_CONFIG_READ) {
+        *val = (uint32_t)ret;
+    }
+
+    return ret;
+}
+
+static uint32_t pci_proxy_read_config(PCIDevice *d, uint32_t addr, int len)
+{
+    uint32_t val;
+
+    (void)config_op_send(PCI_PROXY_DEV(d), addr, &val, len, PCI_CONFIG_READ);
+
+    return val;
+}
+
+static void pci_proxy_write_config(PCIDevice *d, uint32_t addr, uint32_t val,
+                                   int l)
+{
+    /*
+     * Some of the functions access the copy of the remote device's PCI
+     * config space which is cached in the proxy device. Therefore, keep
+     * it up to date.
+     */
+    pci_default_write_config(d, addr, val, l);
+
+    (void)config_op_send(PCI_PROXY_DEV(d), addr, &val, l, PCI_CONFIG_WRITE);
+}
+
 static void pci_proxy_dev_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -76,6 +124,9 @@ static void pci_proxy_dev_class_init(ObjectClass *klass, void *data)
 
     k->realize = pci_proxy_dev_realize;
     k->exit = pci_proxy_dev_exit;
+    k->config_read = pci_proxy_read_config;
+    k->config_write = pci_proxy_write_config;
+
     device_class_set_props(dc, proxy_properties);
 }
 
-- 
1.8.3.1




* [PATCH v12 15/19] multi-process: PCI BAR read/write handling for proxy & remote endpoints
  2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
                   ` (13 preceding siblings ...)
  2020-12-01 20:22 ` [PATCH v12 14/19] multi-process: Forward PCI config space accesses to the remote process Jagannathan Raman
@ 2020-12-01 20:22 ` Jagannathan Raman
  2020-12-01 20:22 ` [PATCH v12 16/19] multi-process: Synchronize remote memory Jagannathan Raman
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 52+ messages in thread
From: Jagannathan Raman @ 2020-12-01 20:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

The proxy device object implements handlers for PCI BAR writes and reads.
The handlers send a BAR_WRITE/BAR_READ message to the remote process,
carrying the BAR address and the value to be written or read.
The remote process implements the handlers for the BAR_WRITE/BAR_READ
messages.
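
As a rough illustration of the remote-side handling described above, the
following standalone sketch validates the access size (a power of two, at
most 8 bytes) and then performs the read or write. The sketch_* names are
hypothetical, and a plain little-endian byte buffer stands in for the
address space that the real handler reaches through address_space_rw().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical analogue of BarAccessMsg plus the read/write direction. */
struct sketch_bar_msg {
    uint64_t addr;  /* offset into the backing buffer */
    uint64_t val;   /* value to write; ignored for reads */
    unsigned size;  /* access width in bytes */
    bool write;
};

/* Accept only power-of-two sizes up to 8 bytes. */
static bool sketch_size_ok(unsigned size)
{
    return size != 0 && (size & (size - 1)) == 0 &&
           size <= sizeof(uint64_t);
}

/*
 * Perform the access on a little-endian buffer. Returns the value read,
 * 0 for a successful write, or UINT64_MAX for a rejected request.
 */
static uint64_t sketch_bar_access(uint8_t *bar, size_t bar_len,
                                  const struct sketch_bar_msg *m)
{
    uint64_t v = 0;
    unsigned i;

    assert(bar != NULL && m != NULL);

    if (!sketch_size_ok(m->size) || m->addr > bar_len ||
        m->size > bar_len - m->addr) {
        return UINT64_MAX;
    }

    if (m->write) {
        for (i = 0; i < m->size; i++) {
            bar[m->addr + i] = (uint8_t)(m->val >> (8 * i));
        }
        return 0;
    }

    for (i = 0; i < m->size; i++) {
        v |= (uint64_t)bar[m->addr + i] << (8 * i);
    }
    return v;
}
```

One consequence of this reply convention, in the sketch at least, is that
a rejected request cannot be told apart from a successful 8-byte read of
all ones.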

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/hw/remote/mpqemu-link.h | 10 +++++
 include/hw/remote/proxy.h       | 10 +++++
 hw/remote/message.c             | 87 +++++++++++++++++++++++++++++++++++++++++
 hw/remote/mpqemu-link.c         |  6 +++
 hw/remote/proxy.c               | 60 ++++++++++++++++++++++++++++
 5 files changed, 173 insertions(+)

diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
index 057c98b..c752738 100644
--- a/include/hw/remote/mpqemu-link.h
+++ b/include/hw/remote/mpqemu-link.h
@@ -36,6 +36,8 @@ typedef enum {
     RET_MSG,
     PCI_CONFIG_WRITE,
     PCI_CONFIG_READ,
+    BAR_WRITE,
+    BAR_READ,
     MPQEMU_CMD_MAX,
 } MPQemuCmd;
 
@@ -51,6 +53,13 @@ typedef struct {
     int l;
 } PciConfDataMsg;
 
+typedef struct {
+    hwaddr addr;
+    uint64_t val;
+    unsigned size;
+    bool memory;
+} BarAccessMsg;
+
 /**
  * MPQemuMsg:
  * @cmd: The remote command
@@ -70,6 +79,7 @@ typedef struct {
         uint64_t u64;
         PciConfDataMsg pci_conf_data;
         SyncSysmemMsg sync_sysmem;
+        BarAccessMsg bar_access;
     } data;
 
     int fds[REMOTE_MAX_FDS];
diff --git a/include/hw/remote/proxy.h b/include/hw/remote/proxy.h
index 923432a..e29c61b 100644
--- a/include/hw/remote/proxy.h
+++ b/include/hw/remote/proxy.h
@@ -16,8 +16,17 @@
 
 #define PCI_PROXY_DEV(obj) \
             OBJECT_CHECK(PCIProxyDev, (obj), TYPE_PCI_PROXY_DEV)
+
 typedef struct PCIProxyDev PCIProxyDev;
 
+typedef struct ProxyMemoryRegion {
+    PCIProxyDev *dev;
+    MemoryRegion mr;
+    bool memory;
+    bool present;
+    uint8_t type;
+} ProxyMemoryRegion;
+
 struct PCIProxyDev {
     PCIDevice parent_dev;
     char *fd;
@@ -31,6 +40,7 @@ struct PCIProxyDev {
     QemuMutex io_mutex;
     QIOChannel *ioc;
     Error *migration_blocker;
+    ProxyMemoryRegion region[PCI_NUM_REGIONS];
 };
 
 #endif /* PROXY_H */
diff --git a/hw/remote/message.c b/hw/remote/message.c
index 52a6f6f..0f3e38a 100644
--- a/hw/remote/message.c
+++ b/hw/remote/message.c
@@ -16,11 +16,14 @@
 #include "qapi/error.h"
 #include "sysemu/runstate.h"
 #include "hw/pci/pci.h"
+#include "exec/memattrs.h"
 
 static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
                                  MPQemuMsg *msg);
 static void process_config_read(QIOChannel *ioc, PCIDevice *dev,
                                 MPQemuMsg *msg);
+static void process_bar_write(QIOChannel *ioc, MPQemuMsg *msg, Error **errp);
+static void process_bar_read(QIOChannel *ioc, MPQemuMsg *msg, Error **errp);
 
 void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
 {
@@ -55,6 +58,12 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
         case PCI_CONFIG_READ:
             process_config_read(com->ioc, pci_dev, &msg);
             break;
+        case BAR_WRITE:
+            process_bar_write(com->ioc, &msg, &local_err);
+            break;
+        case BAR_READ:
+            process_bar_read(com->ioc, &msg, &local_err);
+            break;
         default:
             error_setg(&local_err,
                        "Unknown command (%d) received for device %s (pid=%d)",
@@ -122,3 +131,81 @@ static void process_config_read(QIOChannel *ioc, PCIDevice *dev,
                      getpid());
     }
 }
+
+static void process_bar_write(QIOChannel *ioc, MPQemuMsg *msg, Error **errp)
+{
+    BarAccessMsg *bar_access = &msg->data.bar_access;
+    AddressSpace *as =
+        bar_access->memory ? &address_space_memory : &address_space_io;
+    MPQemuMsg ret = { 0 };
+    MemTxResult res;
+    uint64_t val;
+    Error *local_err = NULL;
+
+    if (!is_power_of_2(bar_access->size) ||
+       (bar_access->size > sizeof(uint64_t))) {
+        ret.data.u64 = UINT64_MAX;
+        goto fail;
+    }
+
+    val = cpu_to_le64(bar_access->val);
+
+    res = address_space_rw(as, bar_access->addr, MEMTXATTRS_UNSPECIFIED,
+                           (void *)&val, bar_access->size, true);
+
+    if (res != MEMTX_OK) {
+        error_setg(errp, "Could not perform address space write operation,"
+                   " inaccessible address: %"PRIx64" in pid %d.",
+                   bar_access->addr, getpid());
+        ret.data.u64 = -1;
+    }
+
+fail:
+    ret.cmd = RET_MSG;
+    ret.size = sizeof(ret.data.u64);
+
+    mpqemu_msg_send(&ret, ioc, &local_err);
+    if (local_err) {
+        error_setg(errp, "Error while sending message to proxy "
+                   "in remote process pid=%d", getpid());
+    }
+}
+
+static void process_bar_read(QIOChannel *ioc, MPQemuMsg *msg, Error **errp)
+{
+    BarAccessMsg *bar_access = &msg->data.bar_access;
+    MPQemuMsg ret = { 0 };
+    AddressSpace *as;
+    MemTxResult res;
+    uint64_t val = 0;
+    Error *local_err = NULL;
+
+    as = bar_access->memory ? &address_space_memory : &address_space_io;
+
+    if (!is_power_of_2(bar_access->size) ||
+       (bar_access->size > sizeof(uint64_t))) {
+        val = UINT64_MAX;
+        goto fail;
+    }
+
+    res = address_space_rw(as, bar_access->addr, MEMTXATTRS_UNSPECIFIED,
+                           (void *)&val, bar_access->size, false);
+
+    if (res != MEMTX_OK) {
+        error_setg(errp, "Could not perform address space read operation,"
+                   " inaccessible address: %"PRIx64" in pid %d.",
+                   bar_access->addr, getpid());
+        val = UINT64_MAX;
+    }
+
+fail:
+    ret.cmd = RET_MSG;
+    ret.data.u64 = le64_to_cpu(val);
+    ret.size = sizeof(ret.data.u64);
+
+    mpqemu_msg_send(&ret, ioc, &local_err);
+    if (local_err) {
+        error_setg(errp, "Error while sending message to proxy "
+                   "in remote process pid=%d", getpid());
+    }
+}
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
index 83dbd65..ac2cb2a 100644
--- a/hw/remote/mpqemu-link.c
+++ b/hw/remote/mpqemu-link.c
@@ -289,6 +289,12 @@ bool mpqemu_msg_valid(MPQemuMsg *msg)
             return false;
         }
         break;
+    case BAR_WRITE:
+    case BAR_READ:
+        if ((msg->size != sizeof(BarAccessMsg)) || (msg->num_fds != 0)) {
+            return false;
+        }
+        break;
     default:
         break;
     }
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
index c193484..039347d 100644
--- a/hw/remote/proxy.c
+++ b/hw/remote/proxy.c
@@ -147,3 +147,63 @@ static void pci_proxy_dev_register_types(void)
 }
 
 type_init(pci_proxy_dev_register_types)
+
+static void send_bar_access_msg(PCIProxyDev *pdev, MemoryRegion *mr,
+                                bool write, hwaddr addr, uint64_t *val,
+                                unsigned size, bool memory)
+{
+    MPQemuMsg msg = { 0 };
+    uint64_t ret = -EINVAL;
+    Error *local_err = NULL;
+
+    msg.size = sizeof(BarAccessMsg);
+    msg.data.bar_access.addr = mr->addr + addr;
+    msg.data.bar_access.size = size;
+    msg.data.bar_access.memory = memory;
+
+    if (write) {
+        msg.cmd = BAR_WRITE;
+        msg.data.bar_access.val = *val;
+    } else {
+        msg.cmd = BAR_READ;
+    }
+
+    ret = mpqemu_msg_send_and_await_reply(&msg, pdev, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+    }
+
+    if (!write) {
+        *val = ret;
+    }
+}
+
+static void proxy_bar_write(void *opaque, hwaddr addr, uint64_t val,
+                            unsigned size)
+{
+    ProxyMemoryRegion *pmr = opaque;
+
+    send_bar_access_msg(pmr->dev, &pmr->mr, true, addr, &val, size,
+                        pmr->memory);
+}
+
+static uint64_t proxy_bar_read(void *opaque, hwaddr addr, unsigned size)
+{
+    ProxyMemoryRegion *pmr = opaque;
+    uint64_t val;
+
+    send_bar_access_msg(pmr->dev, &pmr->mr, false, addr, &val, size,
+                        pmr->memory);
+
+    return val;
+}
+
+const MemoryRegionOps proxy_mr_ops = {
+    .read = proxy_bar_read,
+    .write = proxy_bar_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .impl = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+    },
+};
-- 
1.8.3.1




* [PATCH v12 16/19] multi-process: Synchronize remote memory
  2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
                   ` (14 preceding siblings ...)
  2020-12-01 20:22 ` [PATCH v12 15/19] multi-process: PCI BAR read/write handling for proxy & remote endpoints Jagannathan Raman
@ 2020-12-01 20:22 ` Jagannathan Raman
  2020-12-08 13:57   ` Marc-André Lureau
  2020-12-01 20:22 ` [PATCH v12 17/19] multi-process: create IOHUB object to handle irq Jagannathan Raman
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 52+ messages in thread
From: Jagannathan Raman @ 2020-12-01 20:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

Add a memory-listener object which is used to keep the view of the RAM
in sync between QEMU and the remote process.
A MemoryListener is registered for the system-memory AddressSpace. The
listener sends a SYNC_SYSMEM message to the remote process when it
commits changes to memory. The remote process receives the message and
processes it in the handler for the SYNC_SYSMEM message.

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/hw/remote/memory-sync.h |  27 ++++++
 include/hw/remote/proxy.h       |   2 +
 hw/remote/memory-sync.c         | 210 ++++++++++++++++++++++++++++++++++++++++
 hw/remote/message.c             |   5 +
 hw/remote/proxy.c               |   6 ++
 MAINTAINERS                     |   2 +
 hw/remote/meson.build           |   1 +
 7 files changed, 253 insertions(+)
 create mode 100644 include/hw/remote/memory-sync.h
 create mode 100644 hw/remote/memory-sync.c

diff --git a/include/hw/remote/memory-sync.h b/include/hw/remote/memory-sync.h
new file mode 100644
index 0000000..785f76a
--- /dev/null
+++ b/include/hw/remote/memory-sync.h
@@ -0,0 +1,27 @@
+/*
+ * Copyright © 2018, 2020 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef MEMORY_SYNC_H
+#define MEMORY_SYNC_H
+
+#include "exec/memory.h"
+#include "io/channel.h"
+
+typedef struct RemoteMemSync {
+    MemoryListener listener;
+
+    int n_mr_sections;
+    MemoryRegionSection *mr_sections;
+
+    QIOChannel *ioc;
+} RemoteMemSync;
+
+void configure_memory_sync(RemoteMemSync *sync, QIOChannel *ioc);
+void deconfigure_memory_sync(RemoteMemSync *sync);
+
+#endif
diff --git a/include/hw/remote/proxy.h b/include/hw/remote/proxy.h
index e29c61b..a687b7d 100644
--- a/include/hw/remote/proxy.h
+++ b/include/hw/remote/proxy.h
@@ -11,6 +11,7 @@
 
 #include "hw/pci/pci.h"
 #include "io/channel.h"
+#include "hw/remote/memory-sync.h"
 
 #define TYPE_PCI_PROXY_DEV "x-pci-proxy-dev"
 
@@ -40,6 +41,7 @@ struct PCIProxyDev {
     QemuMutex io_mutex;
     QIOChannel *ioc;
     Error *migration_blocker;
+    RemoteMemSync sync;
     ProxyMemoryRegion region[PCI_NUM_REGIONS];
 };
 
diff --git a/hw/remote/memory-sync.c b/hw/remote/memory-sync.c
new file mode 100644
index 0000000..2365e69
--- /dev/null
+++ b/hw/remote/memory-sync.c
@@ -0,0 +1,210 @@
+/*
+ * Copyright © 2018, 2020 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "qemu/compiler.h"
+#include "qemu/int128.h"
+#include "qemu/range.h"
+#include "exec/memory.h"
+#include "exec/cpu-common.h"
+#include "cpu.h"
+#include "exec/ram_addr.h"
+#include "exec/address-spaces.h"
+#include "hw/remote/mpqemu-link.h"
+#include "hw/remote/memory-sync.h"
+
+static void proxy_ml_begin(MemoryListener *listener)
+{
+    RemoteMemSync *sync = container_of(listener, RemoteMemSync, listener);
+    int mrs;
+
+    for (mrs = 0; mrs < sync->n_mr_sections; mrs++) {
+        memory_region_unref(sync->mr_sections[mrs].mr);
+    }
+
+    g_free(sync->mr_sections);
+    sync->mr_sections = NULL;
+    sync->n_mr_sections = 0;
+}
+
+static int get_fd_from_hostaddr(uint64_t host, ram_addr_t *offset)
+{
+    MemoryRegion *mr;
+    ram_addr_t off;
+
+    /**
+     * Assumes that the host address is a valid address as it's
+     * coming from the MemoryListener system. If the host
+     * address is not valid, the following call would return
+     * the default subregion of "system_memory" region, and
+     * not NULL. So it's not possible to check for NULL here.
+     */
+    mr = memory_region_from_host((void *)(uintptr_t)host, &off);
+
+    if (offset) {
+        *offset = off;
+    }
+
+    return memory_region_get_fd(mr);
+}
+
+static bool proxy_mrs_can_merge(uint64_t host, uint64_t prev_host, size_t size)
+{
+    bool merge;
+    int fd1, fd2;
+
+    fd1 = get_fd_from_hostaddr(host, NULL);
+
+    fd2 = get_fd_from_hostaddr(prev_host, NULL);
+
+    merge = (fd1 == fd2);
+
+    merge &= ((prev_host + size) == host);
+
+    return merge;
+}
+
+static bool try_merge(RemoteMemSync *sync, MemoryRegionSection *section)
+{
+    uint64_t mrs_size, mrs_gpa, mrs_page;
+    MemoryRegionSection *prev_sec;
+    bool merged = false;
+    uintptr_t mrs_host;
+    RAMBlock *mrs_rb;
+
+    if (!sync->n_mr_sections) {
+        return false;
+    }
+
+    mrs_rb = section->mr->ram_block;
+    mrs_page = (uint64_t)qemu_ram_pagesize(mrs_rb);
+    mrs_size = int128_get64(section->size);
+    mrs_gpa = section->offset_within_address_space;
+    mrs_host = (uintptr_t)memory_region_get_ram_ptr(section->mr) +
+               section->offset_within_region;
+
+    if (get_fd_from_hostaddr(mrs_host, NULL) < 0) {
+        return true;
+    }
+
+    mrs_host = mrs_host & ~(mrs_page - 1);
+    mrs_gpa = mrs_gpa & ~(mrs_page - 1);
+    mrs_size = ROUND_UP(mrs_size, mrs_page);
+
+    prev_sec = sync->mr_sections + (sync->n_mr_sections - 1);
+    uint64_t prev_gpa_start = prev_sec->offset_within_address_space;
+    uint64_t prev_size = int128_get64(prev_sec->size);
+    uint64_t prev_gpa_end   = range_get_last(prev_gpa_start, prev_size);
+    uint64_t prev_host_start =
+        (uintptr_t)memory_region_get_ram_ptr(prev_sec->mr) +
+        prev_sec->offset_within_region;
+    uint64_t prev_host_end = range_get_last(prev_host_start, prev_size);
+
+    if (mrs_gpa <= (prev_gpa_end + 1)) {
+        g_assert(mrs_gpa > prev_gpa_start);
+
+        if ((section->mr == prev_sec->mr) &&
+            proxy_mrs_can_merge(mrs_host, prev_host_start,
+                                (mrs_gpa - prev_gpa_start))) {
+            uint64_t max_end = MAX(prev_host_end, mrs_host + mrs_size);
+            merged = true;
+            prev_sec->offset_within_address_space =
+                MIN(prev_gpa_start, mrs_gpa);
+            prev_sec->offset_within_region =
+                MIN(prev_host_start, mrs_host) -
+                (uintptr_t)memory_region_get_ram_ptr(prev_sec->mr);
+            prev_sec->size = int128_make64(max_end - MIN(prev_host_start,
+                                                         mrs_host));
+        }
+    }
+
+    return merged;
+}
+
+static void proxy_ml_region_addnop(MemoryListener *listener,
+                                   MemoryRegionSection *section)
+{
+    RemoteMemSync *sync = container_of(listener, RemoteMemSync, listener);
+
+    if (!(memory_region_is_ram(section->mr) &&
+          !memory_region_is_rom(section->mr))) {
+        return;
+    }
+
+    if (try_merge(sync, section)) {
+        return;
+    }
+
+    ++sync->n_mr_sections;
+    sync->mr_sections = g_renew(MemoryRegionSection, sync->mr_sections,
+                                sync->n_mr_sections);
+    sync->mr_sections[sync->n_mr_sections - 1] = *section;
+    sync->mr_sections[sync->n_mr_sections - 1].fv = NULL;
+    memory_region_ref(section->mr);
+}
+
+static void proxy_ml_commit(MemoryListener *listener)
+{
+    RemoteMemSync *sync = container_of(listener, RemoteMemSync, listener);
+    MPQemuMsg msg;
+    MemoryRegionSection *section;
+    ram_addr_t offset;
+    uintptr_t host_addr;
+    int region;
+    Error *local_err = NULL;
+
+    memset(&msg, 0, sizeof(MPQemuMsg));
+
+    msg.cmd = SYNC_SYSMEM;
+    msg.num_fds = sync->n_mr_sections;
+    msg.size = sizeof(SyncSysmemMsg);
+    if (msg.num_fds > REMOTE_MAX_FDS) {
+        error_report("Number of fds is more than %d", REMOTE_MAX_FDS);
+        return;
+    }
+
+    for (region = 0; region < sync->n_mr_sections; region++) {
+        section = &sync->mr_sections[region];
+        msg.data.sync_sysmem.gpas[region] =
+            section->offset_within_address_space;
+        msg.data.sync_sysmem.sizes[region] = int128_get64(section->size);
+        host_addr = (uintptr_t)memory_region_get_ram_ptr(section->mr) +
+                    section->offset_within_region;
+        msg.fds[region] = get_fd_from_hostaddr(host_addr, &offset);
+        msg.data.sync_sysmem.offsets[region] = offset;
+    }
+    mpqemu_msg_send(&msg, sync->ioc, &local_err);
+    if (local_err) {
+        error_report("Error in sending command %d", msg.cmd);
+    }
+}
+
+void deconfigure_memory_sync(RemoteMemSync *sync)
+{
+    memory_listener_unregister(&sync->listener);
+
+    proxy_ml_begin(&sync->listener);
+}
+
+void configure_memory_sync(RemoteMemSync *sync, QIOChannel *ioc)
+{
+    sync->n_mr_sections = 0;
+    sync->mr_sections = NULL;
+
+    sync->ioc = ioc;
+
+    sync->listener.begin = proxy_ml_begin;
+    sync->listener.commit = proxy_ml_commit;
+    sync->listener.region_add = proxy_ml_region_addnop;
+    sync->listener.region_nop = proxy_ml_region_addnop;
+    sync->listener.priority = 10;
+
+    memory_listener_register(&sync->listener, &address_space_memory);
+}
diff --git a/hw/remote/message.c b/hw/remote/message.c
index 0f3e38a..454fd2d 100644
--- a/hw/remote/message.c
+++ b/hw/remote/message.c
@@ -17,6 +17,7 @@
 #include "sysemu/runstate.h"
 #include "hw/pci/pci.h"
 #include "exec/memattrs.h"
+#include "hw/remote/memory.h"
 
 static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
                                  MPQemuMsg *msg);
@@ -64,6 +65,10 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
         case BAR_READ:
             process_bar_read(com->ioc, &msg, &local_err);
             break;
+        case SYNC_SYSMEM:
+            remote_sysmem_reconfig(&msg, &local_err);
+            break;
+
         default:
             error_setg(&local_err,
                        "Unknown command (%d) received for device %s (pid=%d)",
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
index 039347d..0f2d1aa 100644
--- a/hw/remote/proxy.c
+++ b/hw/remote/proxy.c
@@ -18,6 +18,8 @@
 #include "migration/blocker.h"
 #include "hw/remote/mpqemu-link.h"
 #include "qemu/error-report.h"
+#include "hw/remote/memory-sync.h"
+#include "qom/object.h"
 
 static void proxy_set_socket(PCIProxyDev *pdev, int fd, Error **errp)
 {
@@ -58,6 +60,8 @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
 
     qemu_mutex_init(&dev->io_mutex);
     qio_channel_set_blocking(dev->ioc, true, NULL);
+
+    configure_memory_sync(&dev->sync, dev->ioc);
 }
 
 static void pci_proxy_dev_exit(PCIDevice *pdev)
@@ -69,6 +73,8 @@ static void pci_proxy_dev_exit(PCIDevice *pdev)
     migrate_del_blocker(dev->migration_blocker);
 
     error_free(dev->migration_blocker);
+
+    deconfigure_memory_sync(&dev->sync);
 }
 
 static int config_op_send(PCIProxyDev *pdev, uint32_t addr, uint32_t *val,
diff --git a/MAINTAINERS b/MAINTAINERS
index ebd1d1d..5d78b78 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3150,6 +3150,8 @@ F: include/hw/remote/memory.h
 F: hw/remote/memory.c
 F: hw/remote/proxy.c
 F: include/hw/remote/proxy.h
+F: hw/remote/memory-sync.c
+F: include/hw/remote/memory-sync.h
 
 Build and test automation
 -------------------------
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index 569cd20..7d434a5 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -7,5 +7,6 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
 
 specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
+specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory-sync.c'))
 
 softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
-- 
1.8.3.1




* [PATCH v12 17/19] multi-process: create IOHUB object to handle irq
  2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
                   ` (15 preceding siblings ...)
  2020-12-01 20:22 ` [PATCH v12 16/19] multi-process: Synchronize remote memory Jagannathan Raman
@ 2020-12-01 20:22 ` Jagannathan Raman
  2020-12-01 20:22 ` [PATCH v12 18/19] multi-process: Retrieve PCI info from remote process Jagannathan Raman
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 52+ messages in thread
From: Jagannathan Raman @ 2020-12-01 20:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

An IOHUB object is added to manage PCI IRQs. It uses the KVM_IRQFD
ioctl to create an irqfd for injecting PCI interrupts into the guest.
The IOHUB object forwards the irqfd to the remote process. The remote
process uses this fd to send interrupts directly to the guest,
bypassing QEMU.

Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
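Note: the level/resample handling described above can be modelled
without KVM. The following is a minimal sketch, not the code in this
patch: IrqLine and the irq_line_* helpers are hypothetical names, and a
plain counter stands in for signalling the irqfd. It shows that only
the 0 -> 1 transition injects an interrupt, and that a resample (guest
EOI) re-injects while the line is still asserted.

```c
#include <assert.h>

/* Hypothetical stand-in for one PCI interrupt line tracked by the hub. */
typedef struct IrqLine {
    unsigned int level;     /* number of outstanding assertions */
    unsigned int injected;  /* how many times the "irqfd" was signalled */
} IrqLine;

/* Stand-in for signalling the irqfd; here it only counts injections. */
static void irq_line_signal(IrqLine *line)
{
    line->injected++;
}

/* The device raises (level != 0) or lowers (level == 0) the line. */
static void irq_line_set(IrqLine *line, int level)
{
    if (level) {
        /* Only the 0 -> 1 transition injects an interrupt. */
        if (++line->level == 1) {
            irq_line_signal(line);
        }
    } else if (line->level > 0) {
        line->level--;
    }
}

/* The guest acknowledged the interrupt: re-inject if still asserted. */
static void irq_line_resample(IrqLine *line)
{
    if (line->level) {
        irq_line_signal(line);
    }
}
```

Raising the line twice signals once; a resample while the level is
still non-zero signals again, and a resample after the line has been
fully lowered does nothing.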
 include/hw/pci/pci_ids.h        |   3 +
 include/hw/remote/iohub.h       |  42 ++++++++++++++
 include/hw/remote/machine.h     |   3 +
 include/hw/remote/mpqemu-link.h |   1 +
 include/hw/remote/proxy.h       |   5 ++
 hw/remote/iohub.c               | 123 ++++++++++++++++++++++++++++++++++++++++
 hw/remote/machine.c             |  10 ++++
 hw/remote/message.c             |   4 ++
 hw/remote/mpqemu-link.c         |   5 ++
 hw/remote/proxy.c               |  58 +++++++++++++++++++
 MAINTAINERS                     |   2 +
 hw/remote/meson.build           |   1 +
 12 files changed, 257 insertions(+)
 create mode 100644 include/hw/remote/iohub.h
 create mode 100644 hw/remote/iohub.c

diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
index 11f8ab7..bd0c17d 100644
--- a/include/hw/pci/pci_ids.h
+++ b/include/hw/pci/pci_ids.h
@@ -192,6 +192,9 @@
 #define PCI_DEVICE_ID_SUN_SIMBA          0x5000
 #define PCI_DEVICE_ID_SUN_SABRE          0xa000
 
+#define PCI_VENDOR_ID_ORACLE             0x108e
+#define PCI_DEVICE_ID_REMOTE_IOHUB       0xb000
+
 #define PCI_VENDOR_ID_CMD                0x1095
 #define PCI_DEVICE_ID_CMD_646            0x0646
 
diff --git a/include/hw/remote/iohub.h b/include/hw/remote/iohub.h
new file mode 100644
index 0000000..fc9f8b5
--- /dev/null
+++ b/include/hw/remote/iohub.h
@@ -0,0 +1,42 @@
+/*
+ * IO Hub for remote device
+ *
+ * Copyright © 2018, 2020 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef REMOTE_IOHUB_H
+#define REMOTE_IOHUB_H
+
+#include "hw/pci/pci.h"
+#include "qemu/event_notifier.h"
+#include "qemu/thread-posix.h"
+#include "hw/remote/mpqemu-link.h"
+
+#define REMOTE_IOHUB_NB_PIRQS    PCI_DEVFN_MAX
+
+typedef struct ResampleToken {
+    void *iohub;
+    int pirq;
+} ResampleToken;
+
+typedef struct RemoteIOHubState {
+    PCIDevice d;
+    EventNotifier irqfds[REMOTE_IOHUB_NB_PIRQS];
+    EventNotifier resamplefds[REMOTE_IOHUB_NB_PIRQS];
+    unsigned int irq_level[REMOTE_IOHUB_NB_PIRQS];
+    ResampleToken token[REMOTE_IOHUB_NB_PIRQS];
+    QemuMutex irq_level_lock[REMOTE_IOHUB_NB_PIRQS];
+} RemoteIOHubState;
+
+int remote_iohub_map_irq(PCIDevice *pci_dev, int intx);
+void remote_iohub_set_irq(void *opaque, int pirq, int level);
+void process_set_irqfd_msg(PCIDevice *pci_dev, MPQemuMsg *msg);
+
+void remote_iohub_init(RemoteIOHubState *iohub);
+void remote_iohub_finalize(RemoteIOHubState *iohub);
+
+#endif
diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
index 3073db6..f8720d7 100644
--- a/include/hw/remote/machine.h
+++ b/include/hw/remote/machine.h
@@ -15,11 +15,14 @@
 #include "hw/boards.h"
 #include "hw/pci-host/remote.h"
 #include "io/channel.h"
+#include "hw/remote/iohub.h"
 
 typedef struct RemoteMachineState {
     MachineState parent_obj;
 
     RemotePCIHost *host;
+
+    RemoteIOHubState iohub;
 } RemoteMachineState;
 
 /* Used to pass to co-routine device and ioc. */
diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
index c752738..a0c5c5d 100644
--- a/include/hw/remote/mpqemu-link.h
+++ b/include/hw/remote/mpqemu-link.h
@@ -38,6 +38,7 @@ typedef enum {
     PCI_CONFIG_READ,
     BAR_WRITE,
     BAR_READ,
+    SET_IRQFD,
     MPQEMU_CMD_MAX,
 } MPQemuCmd;
 
diff --git a/include/hw/remote/proxy.h b/include/hw/remote/proxy.h
index a687b7d..154e4a9 100644
--- a/include/hw/remote/proxy.h
+++ b/include/hw/remote/proxy.h
@@ -12,6 +12,7 @@
 #include "hw/pci/pci.h"
 #include "io/channel.h"
 #include "hw/remote/memory-sync.h"
+#include "qemu/event_notifier.h"
 
 #define TYPE_PCI_PROXY_DEV "x-pci-proxy-dev"
 
@@ -42,6 +43,10 @@ struct PCIProxyDev {
     QIOChannel *ioc;
     Error *migration_blocker;
     RemoteMemSync sync;
+    int virq;
+    EventNotifier intr;
+    EventNotifier resample;
+
     ProxyMemoryRegion region[PCI_NUM_REGIONS];
 };
 
diff --git a/hw/remote/iohub.c b/hw/remote/iohub.c
new file mode 100644
index 0000000..7b18378
--- /dev/null
+++ b/hw/remote/iohub.c
@@ -0,0 +1,123 @@
+/*
+ * Remote IO Hub
+ *
+ * Copyright © 2018, 2020 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "hw/pci/pci.h"
+#include "hw/pci/pci_ids.h"
+#include "hw/pci/pci_bus.h"
+#include "qemu/thread.h"
+#include "hw/boards.h"
+#include "hw/remote/machine.h"
+#include "hw/remote/iohub.h"
+#include "qemu/main-loop.h"
+
+void remote_iohub_init(RemoteIOHubState *iohub)
+{
+    int pirq;
+
+    memset(&iohub->irqfds, 0, sizeof(iohub->irqfds));
+    memset(&iohub->resamplefds, 0, sizeof(iohub->resamplefds));
+
+    for (pirq = 0; pirq < REMOTE_IOHUB_NB_PIRQS; pirq++) {
+        qemu_mutex_init(&iohub->irq_level_lock[pirq]);
+        iohub->irq_level[pirq] = 0;
+        event_notifier_init_fd(&iohub->irqfds[pirq], -1);
+        event_notifier_init_fd(&iohub->resamplefds[pirq], -1);
+    }
+}
+
+void remote_iohub_finalize(RemoteIOHubState *iohub)
+{
+    int pirq;
+
+    for (pirq = 0; pirq < REMOTE_IOHUB_NB_PIRQS; pirq++) {
+        qemu_set_fd_handler(event_notifier_get_fd(&iohub->resamplefds[pirq]),
+                            NULL, NULL, NULL);
+        event_notifier_cleanup(&iohub->irqfds[pirq]);
+        event_notifier_cleanup(&iohub->resamplefds[pirq]);
+        qemu_mutex_destroy(&iohub->irq_level_lock[pirq]);
+    }
+}
+
+int remote_iohub_map_irq(PCIDevice *pci_dev, int intx)
+{
+    return pci_dev->devfn;
+}
+
+void remote_iohub_set_irq(void *opaque, int pirq, int level)
+{
+    RemoteIOHubState *iohub = opaque;
+
+    assert(pirq >= 0);
+    assert(pirq < PCI_DEVFN_MAX);
+
+    qemu_mutex_lock(&iohub->irq_level_lock[pirq]);
+
+    if (level) {
+        if (++iohub->irq_level[pirq] == 1) {
+            event_notifier_set(&iohub->irqfds[pirq]);
+        }
+    } else if (iohub->irq_level[pirq] > 0) {
+        iohub->irq_level[pirq]--;
+    }
+
+    qemu_mutex_unlock(&iohub->irq_level_lock[pirq]);
+}
+
+static void intr_resample_handler(void *opaque)
+{
+    ResampleToken *token = opaque;
+    RemoteIOHubState *iohub = token->iohub;
+    int pirq, s;
+
+    pirq = token->pirq;
+
+    s = event_notifier_test_and_clear(&iohub->resamplefds[pirq]);
+
+    assert(s >= 0);
+
+    qemu_mutex_lock(&iohub->irq_level_lock[pirq]);
+
+    if (iohub->irq_level[pirq]) {
+        event_notifier_set(&iohub->irqfds[pirq]);
+    }
+
+    qemu_mutex_unlock(&iohub->irq_level_lock[pirq]);
+}
+
+void process_set_irqfd_msg(PCIDevice *pci_dev, MPQemuMsg *msg)
+{
+    RemoteMachineState *machine = REMOTE_MACHINE(current_machine);
+    RemoteIOHubState *iohub = &machine->iohub;
+    int pirq, intx;
+
+    intx = pci_get_byte(pci_dev->config + PCI_INTERRUPT_PIN) - 1;
+
+    pirq = remote_iohub_map_irq(pci_dev, intx);
+
+    if (event_notifier_get_fd(&iohub->irqfds[pirq]) != -1) {
+        qemu_set_fd_handler(event_notifier_get_fd(&iohub->resamplefds[pirq]),
+                            NULL, NULL, NULL);
+        event_notifier_cleanup(&iohub->irqfds[pirq]);
+        event_notifier_cleanup(&iohub->resamplefds[pirq]);
+        memset(&iohub->token[pirq], 0, sizeof(ResampleToken));
+    }
+
+    event_notifier_init_fd(&iohub->irqfds[pirq], msg->fds[0]);
+    event_notifier_init_fd(&iohub->resamplefds[pirq], msg->fds[1]);
+
+    iohub->token[pirq].iohub = iohub;
+    iohub->token[pirq].pirq = pirq;
+
+    qemu_set_fd_handler(msg->fds[1], intr_resample_handler, NULL,
+                        &iohub->token[pirq]);
+}
diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index c5658bf..a801a4e 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -20,12 +20,15 @@
 #include "exec/address-spaces.h"
 #include "exec/memory.h"
 #include "qapi/error.h"
+#include "hw/pci/pci_host.h"
+#include "hw/remote/iohub.h"
 
 static void remote_machine_init(MachineState *machine)
 {
     MemoryRegion *system_memory, *system_io, *pci_memory;
     RemoteMachineState *s = REMOTE_MACHINE(machine);
     RemotePCIHost *rem_host;
+    PCIHostState *pci_host;
 
     system_memory = get_system_memory();
     system_io = get_system_io();
@@ -45,6 +48,13 @@ static void remote_machine_init(MachineState *machine)
     memory_region_add_subregion_overlap(system_memory, 0x0, pci_memory, -1);
 
     qdev_realize(DEVICE(rem_host), sysbus_get_default(), &error_fatal);
+
+    pci_host = PCI_HOST_BRIDGE(rem_host);
+
+    remote_iohub_init(&s->iohub);
+
+    pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
+                 &s->iohub, REMOTE_IOHUB_NB_PIRQS);
 }
 
 static void remote_machine_class_init(ObjectClass *oc, void *data)
diff --git a/hw/remote/message.c b/hw/remote/message.c
index 454fd2d..a1b1018 100644
--- a/hw/remote/message.c
+++ b/hw/remote/message.c
@@ -18,6 +18,7 @@
 #include "hw/pci/pci.h"
 #include "exec/memattrs.h"
 #include "hw/remote/memory.h"
+#include "hw/remote/iohub.h"
 
 static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
                                  MPQemuMsg *msg);
@@ -68,6 +69,9 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
         case SYNC_SYSMEM:
             remote_sysmem_reconfig(&msg, &local_err);
             break;
+        case SET_IRQFD:
+            process_set_irqfd_msg(pci_dev, &msg);
+            break;
 
         default:
             error_setg(&local_err,
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
index ac2cb2a..d75b478 100644
--- a/hw/remote/mpqemu-link.c
+++ b/hw/remote/mpqemu-link.c
@@ -295,6 +295,11 @@ bool mpqemu_msg_valid(MPQemuMsg *msg)
             return false;
         }
         break;
+    case SET_IRQFD:
+        if (msg->size || (msg->num_fds != 2)) {
+            return false;
+        }
+        break;
     default:
         break;
     }
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
index 0f2d1aa..835554c 100644
--- a/hw/remote/proxy.c
+++ b/hw/remote/proxy.c
@@ -20,6 +20,9 @@
 #include "qemu/error-report.h"
 #include "hw/remote/memory-sync.h"
 #include "qom/object.h"
+#include "qemu/event_notifier.h"
+#include "sysemu/kvm.h"
+#include "util/event_notifier-posix.c"
 
 static void proxy_set_socket(PCIProxyDev *pdev, int fd, Error **errp)
 {
@@ -31,6 +34,56 @@ static Property proxy_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static void proxy_intx_update(PCIDevice *pci_dev)
+{
+    PCIProxyDev *dev = PCI_PROXY_DEV(pci_dev);
+    PCIINTxRoute route;
+    int pin = pci_get_byte(pci_dev->config + PCI_INTERRUPT_PIN) - 1;
+
+    if (dev->virq != -1) {
+        kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state, &dev->intr, dev->virq);
+        dev->virq = -1;
+    }
+
+    route = pci_device_route_intx_to_irq(pci_dev, pin);
+
+    dev->virq = route.irq;
+
+    if (dev->virq != -1) {
+        kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, &dev->intr,
+                                           &dev->resample, dev->virq);
+    }
+}
+
+static void setup_irqfd(PCIProxyDev *dev)
+{
+    PCIDevice *pci_dev = PCI_DEVICE(dev);
+    MPQemuMsg msg;
+    Error *local_err = NULL;
+
+    event_notifier_init(&dev->intr, 0);
+    event_notifier_init(&dev->resample, 0);
+
+    memset(&msg, 0, sizeof(MPQemuMsg));
+    msg.cmd = SET_IRQFD;
+    msg.num_fds = 2;
+    msg.fds[0] = event_notifier_get_fd(&dev->intr);
+    msg.fds[1] = event_notifier_get_fd(&dev->resample);
+    msg.size = 0;
+
+    mpqemu_msg_send(&msg, dev->ioc, &local_err);
+    if (local_err) {
+        error_report("Error to send cmd to remote process %d",
+                     msg.cmd);
+    }
+
+    dev->virq = -1;
+
+    proxy_intx_update(pci_dev);
+
+    pci_device_set_intx_routing_notifier(pci_dev, proxy_intx_update);
+}
+
 static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
 {
     PCIProxyDev *dev = PCI_PROXY_DEV(device);
@@ -62,6 +115,8 @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
     qio_channel_set_blocking(dev->ioc, true, NULL);
 
     configure_memory_sync(&dev->sync, dev->ioc);
+
+    setup_irqfd(dev);
 }
 
 static void pci_proxy_dev_exit(PCIDevice *pdev)
@@ -75,6 +130,9 @@ static void pci_proxy_dev_exit(PCIDevice *pdev)
     error_free(dev->migration_blocker);
 
     deconfigure_memory_sync(&dev->sync);
+
+    event_notifier_cleanup(&dev->intr);
+    event_notifier_cleanup(&dev->resample);
 }
 
 static int config_op_send(PCIProxyDev *pdev, uint32_t addr, uint32_t *val,
diff --git a/MAINTAINERS b/MAINTAINERS
index 5d78b78..02837fe 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3152,6 +3152,8 @@ F: hw/remote/proxy.c
 F: include/hw/remote/proxy.h
 F: hw/remote/memory-sync.c
 F: include/hw/remote/memory-sync.h
+F: hw/remote/iohub.c
+F: include/hw/remote/iohub.h
 
 Build and test automation
 -------------------------
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index 7d434a5..80bd307 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -5,6 +5,7 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('mpqemu-link.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
 
 specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
 specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory-sync.c'))
-- 
1.8.3.1




* [PATCH v12 18/19] multi-process: Retrieve PCI info from remote process
  2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
                   ` (16 preceding siblings ...)
  2020-12-01 20:22 ` [PATCH v12 17/19] multi-process: create IOHUB object to handle irq Jagannathan Raman
@ 2020-12-01 20:22 ` Jagannathan Raman
  2020-12-01 20:22 ` [PATCH v12 19/19] multi-process: perform device reset in the " Jagannathan Raman
  2020-12-03  9:14 ` [PATCH v12 00/19] Initial support for multi-process Qemu Stefan Hajnoczi
  19 siblings, 0 replies; 52+ messages in thread
From: Jagannathan Raman @ 2020-12-01 20:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

Retrieve PCI configuration info about the remote device and
configure the Proxy PCI object based on the returned information.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
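Note: the BAR probing in this patch follows the usual PCI sizing
sequence (save the BAR, write all ones, read back, restore). The
decode step can be sketched on its own as below. This is an
illustration under stated assumptions, not the patch's code: the
helper names and mask macros are hypothetical, only 32-bit BARs are
handled, and the I/O case uses the I/O address mask where the patch
applies the memory mask to both types.

```c
#include <assert.h>
#include <stdint.h>

/* Low bits of a BAR carry type flags rather than address bits. */
#define BAR_IO_FLAG        0x1u
#define BAR_IO_ADDR_MASK   0xFFFFFFFCu
#define BAR_MEM_ADDR_MASK  0xFFFFFFF0u

/* Non-zero if the read-back value describes an I/O space BAR. */
static int bar_is_io(uint32_t readback)
{
    return (readback & BAR_IO_FLAG) != 0;
}

/*
 * Decode the value read back from a 32-bit BAR after writing all ones.
 * Returns the region size in bytes, or 0 if the BAR is unimplemented.
 */
static uint32_t bar_size_from_readback(uint32_t readback)
{
    uint32_t mask = bar_is_io(readback) ? BAR_IO_ADDR_MASK
                                        : BAR_MEM_ADDR_MASK;

    /* Clear the flag bits, invert, add one; wraps to 0 for readback 0. */
    return ~(readback & mask) + 1u;
}
```

For example, a read-back of 0xFFFFF000 decodes to a 4 KiB memory
region, and a read-back of 0 means the BAR is not implemented.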
 hw/remote/proxy.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
index 835554c..a68ee66 100644
--- a/hw/remote/proxy.c
+++ b/hw/remote/proxy.c
@@ -24,6 +24,8 @@
 #include "sysemu/kvm.h"
 #include "util/event_notifier-posix.c"
 
+static void probe_pci_info(PCIDevice *dev, Error **errp);
+
 static void proxy_set_socket(PCIProxyDev *pdev, int fd, Error **errp)
 {
     pdev->ioc = qio_channel_new_fd(fd, errp);
@@ -87,6 +89,7 @@ static void setup_irqfd(PCIProxyDev *dev)
 static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
 {
     PCIProxyDev *dev = PCI_PROXY_DEV(device);
+    uint8_t *pci_conf = device->config;
     int fd;
 
     if (dev->fd) {
@@ -114,9 +117,14 @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
     qemu_mutex_init(&dev->io_mutex);
     qio_channel_set_blocking(dev->ioc, true, NULL);
 
+    pci_conf[PCI_LATENCY_TIMER] = 0xff;
+    pci_conf[PCI_INTERRUPT_PIN] = 0x01;
+
     configure_memory_sync(&dev->sync, dev->ioc);
 
     setup_irqfd(dev);
+
+    probe_pci_info(PCI_DEVICE(dev), errp);
 }
 
 static void pci_proxy_dev_exit(PCIDevice *pdev)
@@ -271,3 +279,80 @@ const MemoryRegionOps proxy_mr_ops = {
         .max_access_size = 8,
     },
 };
+
+static void probe_pci_info(PCIDevice *dev, Error **errp)
+{
+    PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
+    uint32_t orig_val, new_val, base_class, val;
+    PCIProxyDev *pdev = PCI_PROXY_DEV(dev);
+    DeviceClass *dc = DEVICE_CLASS(pc);
+    uint8_t type;
+    int i, size;
+    char *name;
+
+    config_op_send(pdev, PCI_VENDOR_ID, &val, 2, PCI_CONFIG_READ);
+    pc->vendor_id = (uint16_t)val;
+
+    config_op_send(pdev, PCI_DEVICE_ID, &val, 2, PCI_CONFIG_READ);
+    pc->device_id = (uint16_t)val;
+
+    config_op_send(pdev, PCI_CLASS_DEVICE, &val, 2, PCI_CONFIG_READ);
+    pc->class_id = (uint16_t)val;
+
+    config_op_send(pdev, PCI_SUBSYSTEM_ID, &val, 2, PCI_CONFIG_READ);
+    pc->subsystem_id = (uint16_t)val;
+
+    base_class = pc->class_id >> 8;
+    switch (base_class) {
+    case PCI_BASE_CLASS_BRIDGE:
+        set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
+        break;
+    case PCI_BASE_CLASS_STORAGE:
+        set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
+        break;
+    case PCI_BASE_CLASS_NETWORK:
+        set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
+        break;
+    case PCI_BASE_CLASS_INPUT:
+        set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
+        break;
+    case PCI_BASE_CLASS_DISPLAY:
+        set_bit(DEVICE_CATEGORY_DISPLAY, dc->categories);
+        break;
+    case PCI_BASE_CLASS_PROCESSOR:
+        set_bit(DEVICE_CATEGORY_CPU, dc->categories);
+        break;
+    default:
+        set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+        break;
+    }
+
+    for (i = 0; i < PCI_NUM_REGIONS; i++) {
+        config_op_send(pdev, PCI_BASE_ADDRESS_0 + (4 * i), &orig_val, 4,
+                       PCI_CONFIG_READ);
+        new_val = 0xffffffff;
+        config_op_send(pdev, PCI_BASE_ADDRESS_0 + (4 * i), &new_val, 4,
+                       PCI_CONFIG_WRITE);
+        config_op_send(pdev, PCI_BASE_ADDRESS_0 + (4 * i), &new_val, 4,
+                       PCI_CONFIG_READ);
+        size = (~(new_val & 0xFFFFFFF0)) + 1;
+        config_op_send(pdev, PCI_BASE_ADDRESS_0 + (4 * i), &orig_val, 4,
+                       PCI_CONFIG_WRITE);
+        type = (new_val & 0x1) ?
+                   PCI_BASE_ADDRESS_SPACE_IO : PCI_BASE_ADDRESS_SPACE_MEMORY;
+
+        if (size) {
+            pdev->region[i].dev = pdev;
+            pdev->region[i].present = true;
+            if (type == PCI_BASE_ADDRESS_SPACE_MEMORY) {
+                pdev->region[i].memory = true;
+            }
+            name = g_strdup_printf("bar-region-%d", i);
+            memory_region_init_io(&pdev->region[i].mr, OBJECT(pdev),
+                                  &proxy_mr_ops, &pdev->region[i],
+                                  name, size);
+            pci_register_bar(dev, i, type, &pdev->region[i].mr);
+            g_free(name);
+        }
+    }
+}
-- 
1.8.3.1




* [PATCH v12 19/19] multi-process: perform device reset in the remote process
  2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
                   ` (17 preceding siblings ...)
  2020-12-01 20:22 ` [PATCH v12 18/19] multi-process: Retrieve PCI info from remote process Jagannathan Raman
@ 2020-12-01 20:22 ` Jagannathan Raman
  2020-12-03  9:14 ` [PATCH v12 00/19] Initial support for multi-process Qemu Stefan Hajnoczi
  19 siblings, 0 replies; 52+ messages in thread
From: Jagannathan Raman @ 2020-12-01 20:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, kraxel,
	jag.raman, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

From: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Perform device reset in the remote process when QEMU performs
device reset. This is required to reset the internal state
(registers, etc.) of emulated devices.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/hw/remote/mpqemu-link.h |  1 +
 hw/remote/message.c             | 23 ++++++++++++++++++++++-
 hw/remote/proxy.c               | 20 ++++++++++++++++++++
 3 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
index a0c5c5d..0b87293 100644
--- a/include/hw/remote/mpqemu-link.h
+++ b/include/hw/remote/mpqemu-link.h
@@ -39,6 +39,7 @@ typedef enum {
     BAR_WRITE,
     BAR_READ,
     SET_IRQFD,
+    DEVICE_RESET,
     MPQEMU_CMD_MAX,
 } MPQemuCmd;
 
diff --git a/hw/remote/message.c b/hw/remote/message.c
index a1b1018..f3af4c5 100644
--- a/hw/remote/message.c
+++ b/hw/remote/message.c
@@ -19,6 +19,7 @@
 #include "exec/memattrs.h"
 #include "hw/remote/memory.h"
 #include "hw/remote/iohub.h"
+#include "sysemu/reset.h"
 
 static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
                                  MPQemuMsg *msg);
@@ -26,6 +27,8 @@ static void process_config_read(QIOChannel *ioc, PCIDevice *dev,
                                 MPQemuMsg *msg);
 static void process_bar_write(QIOChannel *ioc, MPQemuMsg *msg, Error **errp);
 static void process_bar_read(QIOChannel *ioc, MPQemuMsg *msg, Error **errp);
+static void process_device_reset_msg(QIOChannel *ioc, PCIDevice *dev,
+                                     Error **errp);
 
 void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
 {
@@ -72,7 +75,9 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
         case SET_IRQFD:
             process_set_irqfd_msg(pci_dev, &msg);
             break;
-
+        case DEVICE_RESET:
+            process_device_reset_msg(com->ioc, pci_dev, &local_err);
+            break;
         default:
             error_setg(&local_err,
                        "Unknown command (%d) received for device %s (pid=%d)",
@@ -218,3 +223,19 @@ fail:
                    "in remote process pid=%d", getpid());
     }
 }
+
+static void process_device_reset_msg(QIOChannel *ioc, PCIDevice *dev,
+                                     Error **errp)
+{
+    DeviceClass *dc = DEVICE_GET_CLASS(dev);
+    DeviceState *s = DEVICE(dev);
+    MPQemuMsg ret = { 0 };
+
+    if (dc->reset) {
+        dc->reset(s);
+    }
+
+    ret.cmd = RET_MSG;
+
+    mpqemu_msg_send(&ret, ioc, errp);
+}
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
index a68ee66..5b7b14c 100644
--- a/hw/remote/proxy.c
+++ b/hw/remote/proxy.c
@@ -25,6 +25,7 @@
 #include "util/event_notifier-posix.c"
 
 static void probe_pci_info(PCIDevice *dev, Error **errp);
+static void proxy_device_reset(DeviceState *dev);
 
 static void proxy_set_socket(PCIProxyDev *pdev, int fd, Error **errp)
 {
@@ -199,6 +200,8 @@ static void pci_proxy_dev_class_init(ObjectClass *klass, void *data)
     k->config_read = pci_proxy_read_config;
     k->config_write = pci_proxy_write_config;
 
+    dc->reset = proxy_device_reset;
+
     device_class_set_props(dc, proxy_properties);
 }
 
@@ -356,3 +359,20 @@ static void probe_pci_info(PCIDevice *dev, Error **errp)
         }
     }
 }
+
+static void proxy_device_reset(DeviceState *dev)
+{
+    PCIProxyDev *pdev = PCI_PROXY_DEV(dev);
+    MPQemuMsg msg = { 0 };
+    Error *local_err = NULL;
+
+    msg.cmd = DEVICE_RESET;
+    msg.size = 0;
+
+    (void)mpqemu_msg_send_and_await_reply(&msg, pdev, &local_err);
+    if (local_err) {
+        error_report("Failed to send DEVICE_RESET to the remote process");
+        error_free(local_err);
+    }
+
+}
-- 
1.8.3.1




* Re: [PATCH v12 00/19] Initial support for multi-process Qemu
  2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
                   ` (18 preceding siblings ...)
  2020-12-01 20:22 ` [PATCH v12 19/19] multi-process: perform device reset in the " Jagannathan Raman
@ 2020-12-03  9:14 ` Stefan Hajnoczi
  2020-12-03 19:26   ` Elena Ufimtseva
  2020-12-03 20:40   ` Peter Maydell
  19 siblings, 2 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2020-12-03  9:14 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, qemu-devel,
	kraxel, quintela, peter.maydell, mst, armbru, kanth.ghatraju,
	felipe, thuth, ehabkost, konrad.wilk, dgilbert, alex.williamson,
	thanos.makatos, rth, kwolf, berrange, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

On Tue, Dec 01, 2020 at 03:22:35PM -0500, Jagannathan Raman wrote:
> This is the v12 of the patchset. Thank you very much for the
> review of the v11 of the series.

I'm in favor of merging this for QEMU 6.0. The command-line interface
has the x- prefix so QEMU is not committing to a stable interface.
Changes needed to support additional device types or to switch to the
vfio-user protocol can be made later.

Jag, Elena, JJ: I suggest getting your GPG key to Peter Maydell so you
can send multi-process QEMU pull requests.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH v12 00/19] Initial support for multi-process Qemu
  2020-12-03  9:14 ` [PATCH v12 00/19] Initial support for multi-process Qemu Stefan Hajnoczi
@ 2020-12-03 19:26   ` Elena Ufimtseva
  2020-12-03 20:40   ` Peter Maydell
  1 sibling, 0 replies; 52+ messages in thread
From: Elena Ufimtseva @ 2020-12-03 19:26 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: fam, john.g.johnson, swapnil.ingle, mst, qemu-devel, kraxel,
	Jagannathan Raman, quintela, peter.maydell, armbru,
	kanth.ghatraju, felipe, thuth, ehabkost, konrad.wilk, dgilbert,
	alex.williamson, thanos.makatos, rth, kwolf, berrange, mreitz,
	ross.lagerwall, marcandre.lureau, pbonzini

On Thu, Dec 03, 2020 at 09:14:04AM +0000, Stefan Hajnoczi wrote:
> On Tue, Dec 01, 2020 at 03:22:35PM -0500, Jagannathan Raman wrote:
> > This is the v12 of the patchset. Thank you very much for the
> > review of the v11 of the series.
> 
> I'm in favor of merging this for QEMU 6.0. The command-line interface
> has the x- prefix so QEMU is not committing to a stable interface.
> Changes needed to support additional device types or to switch to the
> vfio-user protocol can be made later.
> 

Woot! Thank you Stefan!

> Jag, Elena, JJ: I suggest getting your GPG key to Peter Maydell so you
> can send multi-process QEMU pull requests.
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

In progress.
Do we need to add some tagging for the PULL patches?
Should we include the git repo and have the proper tag as well?

Elena

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v12 00/19] Initial support for multi-process Qemu
  2020-12-03  9:14 ` [PATCH v12 00/19] Initial support for multi-process Qemu Stefan Hajnoczi
  2020-12-03 19:26   ` Elena Ufimtseva
@ 2020-12-03 20:40   ` Peter Maydell
  2020-12-10 11:13     ` Stefan Hajnoczi
  1 sibling, 1 reply; 52+ messages in thread
From: Peter Maydell @ 2020-12-03 20:40 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, john.g.johnson,
	QEMU Developers, Gerd Hoffmann, Jagannathan Raman, Juan Quintela,
	Michael S. Tsirkin, Markus Armbruster, kanth.ghatraju,
	Felipe Franciosi, Thomas Huth, Eduardo Habkost, konrad.wilk,
	Dr. David Alan Gilbert, Alex Williamson, thanos.makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Marc-André Lureau, Paolo Bonzini

On Thu, 3 Dec 2020 at 09:51, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Tue, Dec 01, 2020 at 03:22:35PM -0500, Jagannathan Raman wrote:
> > This is the v12 of the patchset. Thank you very much for the
> > review of the v11 of the series.
>
> I'm in favor of merging this for QEMU 6.0. The command-line interface
> has the x- prefix so QEMU is not committing to a stable interface.
> Changes needed to support additional device types or to switch to the
> vfio-user protocol can be made later.
>
> Jag, Elena, JJ: I suggest getting your GPG key to Peter Maydell so you
> can send multi-process QEMU pull requests.

I would prefer to see this going through the tree of an
established QEMU developer who's already sending pull requests,
at least initially.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v12 02/19] multi-process: add configure and usage information
  2020-12-01 20:22 ` [PATCH v12 02/19] multi-process: add configure and usage information Jagannathan Raman
@ 2020-12-04 14:10   ` Marc-André Lureau
  2020-12-04 14:37   ` Daniel P. Berrangé
  1 sibling, 0 replies; 52+ messages in thread
From: Marc-André Lureau @ 2020-12-04 14:10 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Thanos Makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 8464 bytes --]

Hi

On Wed, Dec 2, 2020 at 12:23 AM Jagannathan Raman <jag.raman@oracle.com>
wrote:

> From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>
> Adds documentation explaining the command-line arguments needed
> to use multi-process. Also adds a python script that illustrates the
> usage.
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  docs/multi-process.rst                        | 66 +++++++++++++++++++
>  MAINTAINERS                                   |  1 +
>  tests/multiprocess/multiprocess-lsi53c895a.py | 92 +++++++++++++++++++++++++++
>  3 files changed, 159 insertions(+)
>  create mode 100644 docs/multi-process.rst
>  create mode 100755 tests/multiprocess/multiprocess-lsi53c895a.py
>
> diff --git a/docs/multi-process.rst b/docs/multi-process.rst
> new file mode 100644
> index 0000000..9a5fe5b
> --- /dev/null
> +++ b/docs/multi-process.rst
> @@ -0,0 +1,66 @@
> +Multi-process QEMU
> +==================
> +
> +This document describes how to configure and use multi-process qemu.
> +For the design document refer to docs/devel/qemu-multiprocess.
> +
> +1) Configuration
> +----------------
> +
> +multi-process is enabled by default for targets that enable KVM
> +
> +
> +2) Usage
> +--------
> +
> +Multi-process QEMU requires an orchestrator to launch. Please refer to a
> +light-weight python based orchestrator for mpqemu in
> +scripts/mpqemu-launcher.py to lauch QEMU in multi-process mode.
> +
> +Following is a description of command-line used to launch mpqemu.
> +
> +* Orchestrator:
> +
> +  - The Orchestrator creates a unix socketpair
> +
> +  - It launches the remote process and passes one of the
> +    sockets to it via command-line.
> +
> +  - It then launches QEMU and specifies the other socket as an option
> +    to the Proxy device object
> +
> +* Remote Process:
> +
> +  - QEMU can enter remote process mode by using the "remote" machine
> +    option.
> +
> +  - The orchestrator creates a "remote-object" with details about
> +    the device and the file descriptor for the device
> +
> +  - The remaining options are no different from how one launches QEMU with
> +    devices.
> +
> +  - Example command-line for the remote process is as follows:
> +
> +      /usr/bin/qemu-system-x86_64                                        \
> +      -machine x-remote                                                  \
> +      -device lsi53c895a,id=lsi0                                         \
> +      -drive id=drive_image2,file=/build/ol7-nvme-test-1.qcow2           \
> +      -device scsi-hd,id=drive2,drive=drive_image2,bus=lsi0.0,scsi-id=0  \
> +      -object x-remote-object,id=robj1,devid=lsi1,fd=4,
> +
> +* QEMU:
> +
> +  - Since parts of the RAM are shared between QEMU & remote process, a
> +    memory-backend-memfd is required to facilitate this, as follows:
> +
> +    -object memory-backend-memfd,id=mem,size=2G
> +
> +  - A "x-pci-proxy-dev" device is created for each of the PCI devices emulated
> +    in the remote process. A "socket" sub-option specifies the other end of
> +    unix channel created by orchestrator. The "id" sub-option must be specified
> +    and should be the same as the "id" specified for the remote PCI device
> +
> +  - Example commandline for QEMU is as follows:
> +
> +      -device x-pci-proxy-dev,id=lsi0,socket=3
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 88a5a14..f615ad1 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3136,6 +3136,7 @@ M: Jagannathan Raman <jag.raman@oracle.com>
>  M: John G Johnson <john.g.johnson@oracle.com>
>  S: Maintained
>  F: docs/devel/multi-process.rst
> +F: tests/multiprocess/multiprocess-lsi53c895a.py
>
>  Build and test automation
>  -------------------------
> diff --git a/tests/multiprocess/multiprocess-lsi53c895a.py b/tests/multiprocess/multiprocess-lsi53c895a.py
> new file mode 100755
> index 0000000..bfe4f66
> --- /dev/null
> +++ b/tests/multiprocess/multiprocess-lsi53c895a.py
>

This might not be appropriate under the QEMU tree's tests/ directory, IMHO.
It's not a test; at best it complements the documentation.
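
As an illustration of the documentation angle, the orchestrator flow described in the patch (create a socketpair, hand one end to each process) can be sketched without QEMU at all. This is only a sketch under assumptions: the two children are Python one-liners standing in for the proxy and remote QEMU processes, and the names REMOTE_STUB, PROXY_STUB and spawn() are hypothetical, not part of the series.

```python
import socket
import subprocess
import sys

# Hypothetical stand-ins for the two QEMU processes. Each adopts the
# inherited descriptor whose number arrives in argv[1].
REMOTE_STUB = (
    "import socket, sys\n"
    "s = socket.socket(fileno=int(sys.argv[1]))\n"
    "s.sendall(b'ping')\n"
)
PROXY_STUB = (
    "import socket, sys\n"
    "s = socket.socket(fileno=int(sys.argv[1]))\n"
    "print(s.recv(4).decode())\n"
)


def spawn(stub, sock, **kwargs):
    """Start a child that inherits `sock` (besides stdio) via pass_fds."""
    fd = sock.fileno()
    return subprocess.Popen([sys.executable, '-c', stub, str(fd)],
                            pass_fds=[fd], **kwargs)


# The orchestrator creates the socketpair and gives one end to each child.
proxy, remote = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
remote_proc = spawn(REMOTE_STUB, remote)
proxy_proc = spawn(PROXY_STUB, proxy, stdout=subprocess.PIPE)

# The orchestrator no longer needs its own copies of the descriptors.
proxy.close()
remote.close()

received = proxy_proc.communicate()[0].decode().strip()
remote_proc.wait()
print(received)
```

With real binaries, the stub commands would be replaced by the proxy and remote command lines, with the descriptor number passed in the fd= sub-option as the script in this patch does.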

> @@ -0,0 +1,92 @@
> +#!/usr/bin/env python3
> +
> +import urllib.request
> +import subprocess
> +import argparse
> +import socket
> +import sys
> +import os
> +
> +arch = os.uname()[4]
> +proc_path = os.path.join(os.getcwd(), '..', '..', 'build', arch+'-softmmu',
> +                         'qemu-system-'+arch)
> +
> +parser = argparse.ArgumentParser(description='Launcher for multi-process QEMU')
> +parser.add_argument('--bin', required=False, help='location of QEMU binary',
> +                    metavar='bin');
> +args = parser.parse_args()
> +
> +if args.bin is not None:
> +    proc_path = args.bin
> +
> +if not os.path.isfile(proc_path):
> +    sys.exit('QEMU binary not found')
> +
> +kernel_path = os.path.join(os.getcwd(), 'vmlinuz')
> +initrd_path = os.path.join(os.getcwd(), 'initrd')
> +
> +proxy_cmd = [ proc_path,                                                    \
> +              '-name', 'Fedora', '-smp', '4', '-m', '2048', '-cpu', 'host', \
> +              '-object', 'memory-backend-memfd,id=sysmem-file,size=2G',     \
> +              '-numa', 'node,memdev=sysmem-file',                           \
> +              '-kernel', kernel_path, '-initrd', initrd_path,               \
> +              '-vnc', ':0',                                                 \
> +              '-monitor', 'unix:/home/qemu-sock,server,nowait',             \
>

That path is odd. Make it a TemporaryFile, or an argument. Even simpler,
use socketpair()
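
A minimal sketch of the temporary-path idea, assuming a temporary directory is acceptable in place of a TemporaryFile (a UNIX socket needs a path that does not exist yet). The helper name monitor_args() and the 'mpqemu-' prefix are hypothetical; the '-monitor unix:...,server,nowait' form is taken from the script above.

```python
import os
import tempfile


def monitor_args(sock_dir, name):
    """Build '-monitor' arguments for a UNIX socket under sock_dir."""
    path = os.path.join(sock_dir, name)
    return ['-monitor', 'unix:%s,server,nowait' % path]


# A private directory avoids hard-coding a location such as /home.
# It is removed when the block exits, so the processes would have to be
# launched (and waited for) inside it.
with tempfile.TemporaryDirectory(prefix='mpqemu-') as sock_dir:
    proxy_monitor = monitor_args(sock_dir, 'qemu-sock')
    remote_monitor = monitor_args(sock_dir, 'rem-sock')
    print(proxy_monitor)
    print(remote_monitor)
```

Taking the directory from a command-line argument instead would only change where sock_dir comes from.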

> +            ]
> +
> +if arch == 'x86_64':
> +    print('Downloading images for arch x86_64')
> +    kernel_url = 'https://dl.fedoraproject.org/pub/fedora/linux/'    \
> +                 'releases/33/Everything/x86_64/os/images/'          \
> +                 'pxeboot/vmlinuz'
> +    initrd_url = 'https://dl.fedoraproject.org/pub/fedora/linux/'    \
> +                 'releases/33/Everything/x86_64/os/images/'          \
> +                 'pxeboot/initrd.img'
> +    proxy_cmd.append('-machine')
> +    proxy_cmd.append('pc,accel=kvm')
> +    proxy_cmd.append('-append')
> +    proxy_cmd.append('rdinit=/bin/bash console=ttyS0 console=tty0')
> +elif arch == 'aarch64':
> +    print('Downloading images for arch aarch64')
> +    kernel_url = 'https://dl.fedoraproject.org/pub/fedora/linux/'    \
> +                 'releases/33/Everything/aarch64/os/images/'         \
> +                 'pxeboot/vmlinuz'
> +    initrd_url = 'https://dl.fedoraproject.org/pub/fedora/linux/'    \
> +                 'releases/33/Everything/aarch64/os/images/'         \
> +                 'pxeboot/initrd.img'
> +    proxy_cmd.append('-machine')
> +    proxy_cmd.append('virt,gic-version=3')
> +    proxy_cmd.append('-accel')
> +    proxy_cmd.append('kvm')
> +    proxy_cmd.append('-append')
> +    proxy_cmd.append('rdinit=/bin/bash')
>

We have VM-based testing under tests/vm; could that be extended instead?

To avoid delaying the series further, I would suggest dropping the script
for now.

> +else:
> +    sys.exit('Arch %s not tested' % arch)
> +
> +urllib.request.urlretrieve(kernel_url, kernel_path)
> +urllib.request.urlretrieve(initrd_url, initrd_path)
> +
> +proxy, remote = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
> +
> +proxy_cmd.append('-device')
> +proxy_cmd.append('x-pci-proxy-dev,id=lsi1,fd='+str(proxy.fileno()))
> +
> +remote_cmd = [ proc_path,                                                      \
> +               '-machine', 'x-remote',                                         \
> +               '-device', 'lsi53c895a,id=lsi1',                                \
> +               '-object',                                                      \
> +               'x-remote-object,id=robj1,devid=lsi1,fd='+str(remote.fileno()), \
> +               '-display', 'none',                                             \
> +               '-monitor', 'unix:/home/rem-sock,server,nowait',                \
> +             ]
> +
> +pid = os.fork();
> +
> +if pid:
> +    # In Proxy
> +    print('Launching QEMU with Proxy object');
> +    process = subprocess.Popen(proxy_cmd, pass_fds=[proxy.fileno()])
> +else:
> +    # In remote
> +    print('Launching Remote process');
> +    process = subprocess.Popen(remote_cmd, pass_fds=[remote.fileno(), 0, 1, 2])
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau

[-- Attachment #2: Type: text/html, Size: 12185 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v12 03/19] memory: alloc RAM from file at offset
  2020-12-01 20:22 ` [PATCH v12 03/19] memory: alloc RAM from file at offset Jagannathan Raman
@ 2020-12-04 14:13   ` Marc-André Lureau
  2020-12-04 14:18     ` Marc-André Lureau
  0 siblings, 1 reply; 52+ messages in thread
From: Marc-André Lureau @ 2020-12-04 14:13 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Thanos Makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 9627 bytes --]

On Wed, Dec 2, 2020 at 12:23 AM Jagannathan Raman <jag.raman@oracle.com>
wrote:

> Allow RAM MemoryRegion to be created from an offset in a file, instead
> of allocating at offset of 0 by default. This is needed to synchronize
> RAM between QEMU & remote process.
>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/exec/memory.h     |  2 ++
>  include/exec/ram_addr.h   |  2 +-
>  include/qemu/mmap-alloc.h |  3 ++-
>  backends/hostmem-memfd.c  |  2 +-
>  hw/misc/ivshmem.c         |  3 ++-
>  softmmu/memory.c          |  3 ++-
>  softmmu/physmem.c         | 11 +++++++----
>  util/mmap-alloc.c         |  7 ++++---
>  util/oslib-posix.c        |  2 +-
>  9 files changed, 22 insertions(+), 13 deletions(-)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 0f3e6bc..7bcaada 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -980,6 +980,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
>   * @size: size of the region.
>   * @share: %true if memory must be mmaped with the MAP_SHARED flag
>   * @fd: the fd to mmap.
> + * @offset: offset within the file referenced by fd
>   * @errp: pointer to Error*, to store an error if it happens.
>   *
>   * Note that this function does not do anything to cause the data in the
> @@ -991,6 +992,7 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
>                                      uint64_t size,
>                                      bool share,
>                                      int fd,
> +                                    ram_addr_t offset,
>                                      Error **errp);
>  #endif
>
> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index c6d2ef1..d465a48 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -121,7 +121,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
>                                     Error **errp);
>  RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>                                   uint32_t ram_flags, int fd,
> -                                 Error **errp);
> +                                 off_t offset, Error **errp);
>
>  RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
>                                    MemoryRegion *mr, Error **errp);
> diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> index e786266..4f57985 100644
> --- a/include/qemu/mmap-alloc.h
> +++ b/include/qemu/mmap-alloc.h
> @@ -25,7 +25,8 @@ void *qemu_ram_mmap(int fd,
>                      size_t size,
>                      size_t align,
>                      bool shared,
> -                    bool is_pmem);
> +                    bool is_pmem,
> +                    off_t start);
>

I'd suggest keeping the variable name "offset" to avoid potential
confusion (it's also the name of the mmap() argument).


>  void qemu_ram_munmap(int fd, void *ptr, size_t size);
>
> diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
> index e5626d4..69b0ae3 100644
> --- a/backends/hostmem-memfd.c
> +++ b/backends/hostmem-memfd.c
> @@ -55,7 +55,7 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>      name = host_memory_backend_get_name(backend);
>      memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend),
>                                     name, backend->size,
> -                                   backend->share, fd, errp);
> +                                   backend->share, fd, 0, errp);
>      g_free(name);
>  }
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index e321e5c..8d3e1ee 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -494,7 +494,8 @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
>
>      /* mmap the region and map into the BAR2 */
>      memory_region_init_ram_from_fd(&s->server_bar2, OBJECT(s),
> -                                   "ivshmem.bar2", size, true, fd, &local_err);
> +                                   "ivshmem.bar2", size, true, fd, 0,
> +                                   &local_err);
>      if (local_err) {
>          error_propagate(errp, local_err);
>          return;
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index 11ca94d..e4ed0e4 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -1612,6 +1612,7 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
>                                      uint64_t size,
>                                      bool share,
>                                      int fd,
> +                                    ram_addr_t offset,
>                                      Error **errp)
>  {
>      Error *err = NULL;
> @@ -1621,7 +1622,7 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
>      mr->destructor = memory_region_destructor_ram;
>      mr->ram_block = qemu_ram_alloc_from_fd(size, mr,
>                                             share ? RAM_SHARED : 0,
> -                                           fd, &err);
> +                                           fd, offset, &err);
>      mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
>      if (err) {
>          mr->size = int128_zero();
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index 3027747..e0b8fc6 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -1461,6 +1461,7 @@ static void *file_ram_alloc(RAMBlock *block,
>                              ram_addr_t memory,
>                              int fd,
>                              bool truncate,
> +                            off_t offset,
>                              Error **errp)
>  {
>      void *area;
> @@ -1511,7 +1512,8 @@ static void *file_ram_alloc(RAMBlock *block,
>      }
>
>      area = qemu_ram_mmap(fd, memory, block->mr->align,
> -                         block->flags & RAM_SHARED, block->flags & RAM_PMEM);
> +                         block->flags & RAM_SHARED, block->flags & RAM_PMEM,
> +                         offset);
>      if (area == MAP_FAILED) {
>          error_setg_errno(errp, errno,
>                           "unable to map backing store for guest RAM");
> @@ -1943,7 +1945,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
>  #ifdef CONFIG_POSIX
>  RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>                                   uint32_t ram_flags, int fd,
> -                                 Error **errp)
> +                                 off_t offset, Error **errp)
>  {
>      RAMBlock *new_block;
>      Error *local_err = NULL;
> @@ -1996,7 +1998,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>      new_block->used_length = size;
>      new_block->max_length = size;
>      new_block->flags = ram_flags;
> -    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
> +    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, offset,
> +                                     errp);
>      if (!new_block->host) {
>          g_free(new_block);
>          return NULL;
> @@ -2026,7 +2029,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
>          return NULL;
>      }
>
> -    block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, errp);
> +    block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, 0, errp);
>      if (!block) {
>          if (created) {
>              unlink(mem_path);
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index 27dcccd..a28f702 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -86,7 +86,8 @@ void *qemu_ram_mmap(int fd,
>                      size_t size,
>                      size_t align,
>                      bool shared,
> -                    bool is_pmem)
> +                    bool is_pmem,
> +                    off_t start)
>  {
>      int flags;
>      int map_sync_flags = 0;
> @@ -147,7 +148,7 @@ void *qemu_ram_mmap(int fd,
>      offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
>
>      ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
> -               flags | map_sync_flags, fd, 0);
> +               flags | map_sync_flags, fd, start);
>
>      if (ptr == MAP_FAILED && map_sync_flags) {
>          if (errno == ENOTSUP) {
> @@ -172,7 +173,7 @@ void *qemu_ram_mmap(int fd,
>           * we will remove these flags to handle compatibility.
>           */
>          ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
> -                   flags, fd, 0);
> +                   flags, fd, start);
>      }
>
>      if (ptr == MAP_FAILED) {
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index f15234b..93874df 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -227,7 +227,7 @@ void *qemu_memalign(size_t alignment, size_t size)
>  void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
>  {
>      size_t align = QEMU_VMALLOC_ALIGN;
> -    void *ptr = qemu_ram_mmap(-1, size, align, shared, false);
> +    void *ptr = qemu_ram_mmap(-1, size, align, shared, false, 0);
>
>      if (ptr == MAP_FAILED) {
>          return NULL;
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau

[-- Attachment #2: Type: text/html, Size: 12057 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v12 03/19] memory: alloc RAM from file at offset
  2020-12-04 14:13   ` Marc-André Lureau
@ 2020-12-04 14:18     ` Marc-André Lureau
  0 siblings, 0 replies; 52+ messages in thread
From: Marc-André Lureau @ 2020-12-04 14:18 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Thanos Makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 10122 bytes --]

On Fri, Dec 4, 2020 at 6:13 PM Marc-André Lureau <marcandre.lureau@gmail.com>
wrote:

>
>
> On Wed, Dec 2, 2020 at 12:23 AM Jagannathan Raman <jag.raman@oracle.com>
> wrote:
>
>> Allow RAM MemoryRegion to be created from an offset in a file, instead
>> of allocating at offset of 0 by default. This is needed to synchronize
>> RAM between QEMU & remote process.
>>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>>  include/exec/memory.h     |  2 ++
>>  include/exec/ram_addr.h   |  2 +-
>>  include/qemu/mmap-alloc.h |  3 ++-
>>  backends/hostmem-memfd.c  |  2 +-
>>  hw/misc/ivshmem.c         |  3 ++-
>>  softmmu/memory.c          |  3 ++-
>>  softmmu/physmem.c         | 11 +++++++----
>>  util/mmap-alloc.c         |  7 ++++---
>>  util/oslib-posix.c        |  2 +-
>>  9 files changed, 22 insertions(+), 13 deletions(-)
>>
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 0f3e6bc..7bcaada 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -980,6 +980,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
>>   * @size: size of the region.
>>   * @share: %true if memory must be mmaped with the MAP_SHARED flag
>>   * @fd: the fd to mmap.
>> + * @offset: offset within the file referenced by fd
>>   * @errp: pointer to Error*, to store an error if it happens.
>>   *
>>   * Note that this function does not do anything to cause the data in the
>> @@ -991,6 +992,7 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
>>                                      uint64_t size,
>>                                      bool share,
>>                                      int fd,
>> +                                    ram_addr_t offset,
>>                                      Error **errp);
>>  #endif
>>
>> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
>> index c6d2ef1..d465a48 100644
>> --- a/include/exec/ram_addr.h
>> +++ b/include/exec/ram_addr.h
>> @@ -121,7 +121,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
>>                                     Error **errp);
>>  RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>>                                   uint32_t ram_flags, int fd,
>> -                                 Error **errp);
>> +                                 off_t offset, Error **errp);
>>
>>  RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
>>                                    MemoryRegion *mr, Error **errp);
>> diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
>> index e786266..4f57985 100644
>> --- a/include/qemu/mmap-alloc.h
>> +++ b/include/qemu/mmap-alloc.h
>> @@ -25,7 +25,8 @@ void *qemu_ram_mmap(int fd,
>>                      size_t size,
>>                      size_t align,
>>                      bool shared,
>> -                    bool is_pmem);
>> +                    bool is_pmem,
>> +                    off_t start);
>>
>
> I'd suggest keeping the variable name "offset" to avoid potential
> confusion (it's also the name of the mmap() argument).
>

I realize the inner "offset" variable in qemu_ram_mmap() will then need to
be renamed; I'd suggest guard_offset, for instance.


>
>>  void qemu_ram_munmap(int fd, void *ptr, size_t size);
>>
>> diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
>> index e5626d4..69b0ae3 100644
>> --- a/backends/hostmem-memfd.c
>> +++ b/backends/hostmem-memfd.c
>> @@ -55,7 +55,7 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>>      name = host_memory_backend_get_name(backend);
>>      memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend),
>>                                     name, backend->size,
>> -                                   backend->share, fd, errp);
>> +                                   backend->share, fd, 0, errp);
>>      g_free(name);
>>  }
>>
>> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
>> index e321e5c..8d3e1ee 100644
>> --- a/hw/misc/ivshmem.c
>> +++ b/hw/misc/ivshmem.c
>> @@ -494,7 +494,8 @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
>>
>>      /* mmap the region and map into the BAR2 */
>>      memory_region_init_ram_from_fd(&s->server_bar2, OBJECT(s),
>> -                                   "ivshmem.bar2", size, true, fd, &local_err);
>> +                                   "ivshmem.bar2", size, true, fd, 0,
>> +                                   &local_err);
>>      if (local_err) {
>>          error_propagate(errp, local_err);
>>          return;
>> diff --git a/softmmu/memory.c b/softmmu/memory.c
>> index 11ca94d..e4ed0e4 100644
>> --- a/softmmu/memory.c
>> +++ b/softmmu/memory.c
>> @@ -1612,6 +1612,7 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
>>                                      uint64_t size,
>>                                      bool share,
>>                                      int fd,
>> +                                    ram_addr_t offset,
>>                                      Error **errp)
>>  {
>>      Error *err = NULL;
>> @@ -1621,7 +1622,7 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
>>      mr->destructor = memory_region_destructor_ram;
>>      mr->ram_block = qemu_ram_alloc_from_fd(size, mr,
>>                                             share ? RAM_SHARED : 0,
>> -                                           fd, &err);
>> +                                           fd, offset, &err);
>>      mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
>>      if (err) {
>>          mr->size = int128_zero();
>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>> index 3027747..e0b8fc6 100644
>> --- a/softmmu/physmem.c
>> +++ b/softmmu/physmem.c
>> @@ -1461,6 +1461,7 @@ static void *file_ram_alloc(RAMBlock *block,
>>                              ram_addr_t memory,
>>                              int fd,
>>                              bool truncate,
>> +                            off_t offset,
>>                              Error **errp)
>>  {
>>      void *area;
>> @@ -1511,7 +1512,8 @@ static void *file_ram_alloc(RAMBlock *block,
>>      }
>>
>>      area = qemu_ram_mmap(fd, memory, block->mr->align,
>> -                         block->flags & RAM_SHARED, block->flags & RAM_PMEM);
>> +                         block->flags & RAM_SHARED, block->flags & RAM_PMEM,
>> +                         offset);
>>      if (area == MAP_FAILED) {
>>          error_setg_errno(errp, errno,
>>                           "unable to map backing store for guest RAM");
>> @@ -1943,7 +1945,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
>>  #ifdef CONFIG_POSIX
>>  RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>>                                   uint32_t ram_flags, int fd,
>> -                                 Error **errp)
>> +                                 off_t offset, Error **errp)
>>  {
>>      RAMBlock *new_block;
>>      Error *local_err = NULL;
>> @@ -1996,7 +1998,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>>      new_block->used_length = size;
>>      new_block->max_length = size;
>>      new_block->flags = ram_flags;
>> -    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
>> +    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, offset,
>> +                                     errp);
>>      if (!new_block->host) {
>>          g_free(new_block);
>>          return NULL;
>> @@ -2026,7 +2029,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
>>          return NULL;
>>      }
>>
>> -    block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, errp);
>> +    block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, 0, errp);
>>      if (!block) {
>>          if (created) {
>>              unlink(mem_path);
>> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
>> index 27dcccd..a28f702 100644
>> --- a/util/mmap-alloc.c
>> +++ b/util/mmap-alloc.c
>> @@ -86,7 +86,8 @@ void *qemu_ram_mmap(int fd,
>>                      size_t size,
>>                      size_t align,
>>                      bool shared,
>> -                    bool is_pmem)
>> +                    bool is_pmem,
>> +                    off_t start)
>>  {
>>      int flags;
>>      int map_sync_flags = 0;
>> @@ -147,7 +148,7 @@ void *qemu_ram_mmap(int fd,
>>      offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
>>
>>      ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
>> -               flags | map_sync_flags, fd, 0);
>> +               flags | map_sync_flags, fd, start);
>>
>>      if (ptr == MAP_FAILED && map_sync_flags) {
>>          if (errno == ENOTSUP) {
>> @@ -172,7 +173,7 @@ void *qemu_ram_mmap(int fd,
>>           * we will remove these flags to handle compatibility.
>>           */
>>          ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
>> -                   flags, fd, 0);
>> +                   flags, fd, start);
>>      }
>>
>>      if (ptr == MAP_FAILED) {
>> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
>> index f15234b..93874df 100644
>> --- a/util/oslib-posix.c
>> +++ b/util/oslib-posix.c
>> @@ -227,7 +227,7 @@ void *qemu_memalign(size_t alignment, size_t size)
>>  void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
>>  {
>>      size_t align = QEMU_VMALLOC_ALIGN;
>> -    void *ptr = qemu_ram_mmap(-1, size, align, shared, false);
>> +    void *ptr = qemu_ram_mmap(-1, size, align, shared, false, 0);
>>
>>      if (ptr == MAP_FAILED) {
>>          return NULL;
>> --
>> 1.8.3.1
>>
>>
>
> --
> Marc-André Lureau
>


-- 
Marc-André Lureau


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v12 05/19] multi-process: setup PCI host bridge for remote device
  2020-12-01 20:22 ` [PATCH v12 05/19] multi-process: setup PCI host bridge for remote device Jagannathan Raman
@ 2020-12-04 14:29   ` Marc-André Lureau
  2020-12-04 14:32   ` Marc-André Lureau
  1 sibling, 0 replies; 52+ messages in thread
From: Marc-André Lureau @ 2020-12-04 14:29 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Thanos Makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 6564 bytes --]

On Wed, Dec 2, 2020 at 12:23 AM Jagannathan Raman <jag.raman@oracle.com>
wrote:

> PCI host bridge is set up for the remote device process. It is
> implemented using remote-pcihost object. It is an extension of the PCI
> host bridge set up by QEMU.
> Remote-pcihost configures a PCI bus which could be used by the remote
> PCI device to latch on to.
>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/hw/pci-host/remote.h | 30 ++++++++++++++++++
>  hw/pci-host/remote.c         | 75 ++++++++++++++++++++++++++++++++++++++++++++
>  MAINTAINERS                  |  2 ++
>  hw/pci-host/Kconfig          |  3 ++
>  hw/pci-host/meson.build      |  1 +
>  hw/remote/Kconfig            |  1 +
>  6 files changed, 112 insertions(+)
>  create mode 100644 include/hw/pci-host/remote.h
>  create mode 100644 hw/pci-host/remote.c
>
> diff --git a/include/hw/pci-host/remote.h b/include/hw/pci-host/remote.h
> new file mode 100644
> index 0000000..bab6d3c
> --- /dev/null
> +++ b/include/hw/pci-host/remote.h
> @@ -0,0 +1,30 @@
> +/*
> + * PCI Host for remote device
> + *
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef REMOTE_PCIHOST_H
> +#define REMOTE_PCIHOST_H
> +
> +#include "exec/memory.h"
> +#include "hw/pci/pcie_host.h"
> +
> +#define TYPE_REMOTE_HOST_DEVICE "remote-pcihost"
> +#define REMOTE_HOST_DEVICE(obj) \
> +    OBJECT_CHECK(RemotePCIHost, (obj), TYPE_REMOTE_HOST_DEVICE)
> +
> +typedef struct RemotePCIHost {
> +    /*< private >*/
> +    PCIExpressHost parent_obj;
> +    /*< public >*/
> +
> +    MemoryRegion *mr_pci_mem;
> +    MemoryRegion *mr_sys_io;
> +} RemotePCIHost;
> +
> +#endif
> diff --git a/hw/pci-host/remote.c b/hw/pci-host/remote.c
> new file mode 100644
> index 0000000..11325e2
> --- /dev/null
> +++ b/hw/pci-host/remote.c
> @@ -0,0 +1,75 @@
> +/*
> + * Remote PCI host device
> + *
> + * Unlike PCI host devices that model physical hardware, the purpose
> + * of this PCI host is to host multi-process QEMU devices.
> + *
> + * Multi-process QEMU extends the PCI host of a QEMU machine into a
> + * remote process. Any PCI device attached to the remote process is
> + * visible in the QEMU guest. This allows existing QEMU device models
> + * to be reused in the remote process.
> + *
> + * This PCI host is purely a container for PCI devices. It's fake in the
> + * sense that the guest never sees this PCI host and has no way of
> + * accessing it. Its job is just to provide the environment that QEMU
> + * PCI device models need when running in a remote process.
> + *
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include "hw/pci/pci.h"
> +#include "hw/pci/pci_host.h"
> +#include "hw/pci/pcie_host.h"
> +#include "hw/qdev-properties.h"
> +#include "hw/pci-host/remote.h"
> +#include "exec/memory.h"
> +
> +static const char *remote_pcihost_root_bus_path(PCIHostState *host_bridge,
> +                                                PCIBus *rootbus)
> +{
> +    return "0000:00";
> +}
> +
> +static void remote_pcihost_realize(DeviceState *dev, Error **errp)
> +{
> +    PCIHostState *pci = PCI_HOST_BRIDGE(dev);
> +    RemotePCIHost *s = REMOTE_HOST_DEVICE(dev);
> +
> +    pci->bus = pci_root_bus_new(DEVICE(s), "remote-pci",
> +                                s->mr_pci_mem, s->mr_sys_io,
> +                                0, TYPE_PCIE_BUS);
> +}
> +
> +static void remote_pcihost_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
> +
> +    hc->root_bus_path = remote_pcihost_root_bus_path;
> +    dc->realize = remote_pcihost_realize;
> +
> +    dc->user_creatable = false;
> +    set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
> +    dc->fw_name = "pci";
> +}
> +
> +static const TypeInfo remote_pcihost_info = {
> +    .name = TYPE_REMOTE_HOST_DEVICE,
> +    .parent = TYPE_PCIE_HOST_BRIDGE,
> +    .instance_size = sizeof(RemotePCIHost),
> +    .class_init = remote_pcihost_class_init,
> +};
> +
> +static void remote_pcihost_register(void)
> +{
> +    type_register_static(&remote_pcihost_info);
> +}
> +
> +type_init(remote_pcihost_register)
> diff --git a/MAINTAINERS b/MAINTAINERS
> index f615ad1..4515476 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3137,6 +3137,8 @@ M: John G Johnson <john.g.johnson@oracle.com>
>  S: Maintained
>  F: docs/devel/multi-process.rst
>  F: tests/multiprocess/multiprocess-lsi53c895a.py
> +F: hw/pci-host/remote.c
> +F: include/hw/pci-host/remote.h
>
>  Build and test automation
>  -------------------------
> diff --git a/hw/pci-host/Kconfig b/hw/pci-host/Kconfig
> index 036a618..25cdeb2 100644
> --- a/hw/pci-host/Kconfig
> +++ b/hw/pci-host/Kconfig
> @@ -60,3 +60,6 @@ config PCI_BONITO
>      select PCI
>      select UNIMP
>      bool
> +
> +config MULTIPROCESS_HOST
> +    bool
>

Why not REMOTE_PCIHOST ?

> diff --git a/hw/pci-host/meson.build b/hw/pci-host/meson.build
> index e6d1b89..4147100 100644
> --- a/hw/pci-host/meson.build
> +++ b/hw/pci-host/meson.build
> @@ -9,6 +9,7 @@ pci_ss.add(when: 'CONFIG_PCI_EXPRESS_XILINX', if_true: files('xilinx-pcie.c'))
>  pci_ss.add(when: 'CONFIG_PCI_I440FX', if_true: files('i440fx.c'))
>  pci_ss.add(when: 'CONFIG_PCI_SABRE', if_true: files('sabre.c'))
>  pci_ss.add(when: 'CONFIG_XEN_IGD_PASSTHROUGH', if_true: files('xen_igd_pt.c'))
> +pci_ss.add(when: 'CONFIG_MULTIPROCESS_HOST', if_true: files('remote.c'))
>
>  # PPC devices
>  pci_ss.add(when: 'CONFIG_PREP_PCI', if_true: files('prep.c'))
> diff --git a/hw/remote/Kconfig b/hw/remote/Kconfig
> index 5484446..fb6ee4a 100644
> --- a/hw/remote/Kconfig
> +++ b/hw/remote/Kconfig
> @@ -1,3 +1,4 @@
>  config MULTIPROCESS
>      bool
>      depends on PCI && KVM
> +    select MULTIPROCESS_HOST
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v12 05/19] multi-process: setup PCI host bridge for remote device
  2020-12-01 20:22 ` [PATCH v12 05/19] multi-process: setup PCI host bridge for remote device Jagannathan Raman
  2020-12-04 14:29   ` Marc-André Lureau
@ 2020-12-04 14:32   ` Marc-André Lureau
  1 sibling, 0 replies; 52+ messages in thread
From: Marc-André Lureau @ 2020-12-04 14:32 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Thanos Makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 6673 bytes --]

On Wed, Dec 2, 2020 at 12:23 AM Jagannathan Raman <jag.raman@oracle.com>
wrote:

> PCI host bridge is set up for the remote device process. It is
> implemented using remote-pcihost object. It is an extension of the PCI
> host bridge set up by QEMU.
> Remote-pcihost configures a PCI bus which could be used by the remote
> PCI device to latch on to.
>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/hw/pci-host/remote.h | 30 ++++++++++++++++++
>  hw/pci-host/remote.c         | 75 ++++++++++++++++++++++++++++++++++++++++++++
>  MAINTAINERS                  |  2 ++
>  hw/pci-host/Kconfig          |  3 ++
>  hw/pci-host/meson.build      |  1 +
>  hw/remote/Kconfig            |  1 +
>  6 files changed, 112 insertions(+)
>  create mode 100644 include/hw/pci-host/remote.h
>  create mode 100644 hw/pci-host/remote.c
>
> diff --git a/include/hw/pci-host/remote.h b/include/hw/pci-host/remote.h
> new file mode 100644
> index 0000000..bab6d3c
> --- /dev/null
> +++ b/include/hw/pci-host/remote.h
> @@ -0,0 +1,30 @@
> +/*
> + * PCI Host for remote device
> + *
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef REMOTE_PCIHOST_H
> +#define REMOTE_PCIHOST_H
> +
> +#include "exec/memory.h"
> +#include "hw/pci/pcie_host.h"
> +
> +#define TYPE_REMOTE_HOST_DEVICE "remote-pcihost"
>

I got confused by the following patch. Please try to be consistent.

TYPE_REMOTE_PCIHOST would help.

thanks

> +#define REMOTE_HOST_DEVICE(obj) \
> +    OBJECT_CHECK(RemotePCIHost, (obj), TYPE_REMOTE_HOST_DEVICE)
>

REMOTE_PCIHOST

> +
> +typedef struct RemotePCIHost {
> +    /*< private >*/
> +    PCIExpressHost parent_obj;
> +    /*< public >*/
> +
> +    MemoryRegion *mr_pci_mem;
> +    MemoryRegion *mr_sys_io;
> +} RemotePCIHost;
> +
> +#endif
> diff --git a/hw/pci-host/remote.c b/hw/pci-host/remote.c
> new file mode 100644
> index 0000000..11325e2
> --- /dev/null
> +++ b/hw/pci-host/remote.c
> @@ -0,0 +1,75 @@
> +/*
> + * Remote PCI host device
> + *
> + * Unlike PCI host devices that model physical hardware, the purpose
> + * of this PCI host is to host multi-process QEMU devices.
> + *
> + * Multi-process QEMU extends the PCI host of a QEMU machine into a
> + * remote process. Any PCI device attached to the remote process is
> + * visible in the QEMU guest. This allows existing QEMU device models
> + * to be reused in the remote process.
> + *
> + * This PCI host is purely a container for PCI devices. It's fake in the
> + * sense that the guest never sees this PCI host and has no way of
> + * accessing it. Its job is just to provide the environment that QEMU
> + * PCI device models need when running in a remote process.
> + *
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include "hw/pci/pci.h"
> +#include "hw/pci/pci_host.h"
> +#include "hw/pci/pcie_host.h"
> +#include "hw/qdev-properties.h"
> +#include "hw/pci-host/remote.h"
> +#include "exec/memory.h"
> +
> +static const char *remote_pcihost_root_bus_path(PCIHostState *host_bridge,
> +                                                PCIBus *rootbus)
> +{
> +    return "0000:00";
> +}
> +
> +static void remote_pcihost_realize(DeviceState *dev, Error **errp)
> +{
> +    PCIHostState *pci = PCI_HOST_BRIDGE(dev);
> +    RemotePCIHost *s = REMOTE_HOST_DEVICE(dev);
> +
> +    pci->bus = pci_root_bus_new(DEVICE(s), "remote-pci",
> +                                s->mr_pci_mem, s->mr_sys_io,
> +                                0, TYPE_PCIE_BUS);
> +}
> +
> +static void remote_pcihost_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
> +
> +    hc->root_bus_path = remote_pcihost_root_bus_path;
> +    dc->realize = remote_pcihost_realize;
> +
> +    dc->user_creatable = false;
> +    set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
> +    dc->fw_name = "pci";
> +}
> +
> +static const TypeInfo remote_pcihost_info = {
> +    .name = TYPE_REMOTE_HOST_DEVICE,
> +    .parent = TYPE_PCIE_HOST_BRIDGE,
> +    .instance_size = sizeof(RemotePCIHost),
> +    .class_init = remote_pcihost_class_init,
> +};
> +
> +static void remote_pcihost_register(void)
> +{
> +    type_register_static(&remote_pcihost_info);
> +}
> +
> +type_init(remote_pcihost_register)
> diff --git a/MAINTAINERS b/MAINTAINERS
> index f615ad1..4515476 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3137,6 +3137,8 @@ M: John G Johnson <john.g.johnson@oracle.com>
>  S: Maintained
>  F: docs/devel/multi-process.rst
>  F: tests/multiprocess/multiprocess-lsi53c895a.py
> +F: hw/pci-host/remote.c
> +F: include/hw/pci-host/remote.h
>
>  Build and test automation
>  -------------------------
> diff --git a/hw/pci-host/Kconfig b/hw/pci-host/Kconfig
> index 036a618..25cdeb2 100644
> --- a/hw/pci-host/Kconfig
> +++ b/hw/pci-host/Kconfig
> @@ -60,3 +60,6 @@ config PCI_BONITO
>      select PCI
>      select UNIMP
>      bool
> +
> +config MULTIPROCESS_HOST
> +    bool
> diff --git a/hw/pci-host/meson.build b/hw/pci-host/meson.build
> index e6d1b89..4147100 100644
> --- a/hw/pci-host/meson.build
> +++ b/hw/pci-host/meson.build
> @@ -9,6 +9,7 @@ pci_ss.add(when: 'CONFIG_PCI_EXPRESS_XILINX', if_true: files('xilinx-pcie.c'))
>  pci_ss.add(when: 'CONFIG_PCI_I440FX', if_true: files('i440fx.c'))
>  pci_ss.add(when: 'CONFIG_PCI_SABRE', if_true: files('sabre.c'))
>  pci_ss.add(when: 'CONFIG_XEN_IGD_PASSTHROUGH', if_true: files('xen_igd_pt.c'))
> +pci_ss.add(when: 'CONFIG_MULTIPROCESS_HOST', if_true: files('remote.c'))
>
>  # PPC devices
>  pci_ss.add(when: 'CONFIG_PREP_PCI', if_true: files('prep.c'))
> diff --git a/hw/remote/Kconfig b/hw/remote/Kconfig
> index 5484446..fb6ee4a 100644
> --- a/hw/remote/Kconfig
> +++ b/hw/remote/Kconfig
> @@ -1,3 +1,4 @@
>  config MULTIPROCESS
>      bool
>      depends on PCI && KVM
> +    select MULTIPROCESS_HOST
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v12 06/19] multi-process: setup a machine object for remote device process
  2020-12-01 20:22 ` [PATCH v12 06/19] multi-process: setup a machine object for remote device process Jagannathan Raman
@ 2020-12-04 14:35   ` Marc-André Lureau
  2020-12-09 16:56     ` Jag Raman
  0 siblings, 1 reply; 52+ messages in thread
From: Marc-André Lureau @ 2020-12-04 14:35 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Thanos Makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 6049 bytes --]

On Wed, Dec 2, 2020 at 12:23 AM Jagannathan Raman <jag.raman@oracle.com>
wrote:

> x-remote-machine object sets up various subsystems of the remote
> device process. Instantiate PCI host bridge object and initialize RAM, IO &
> PCI memory regions.
>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/hw/pci-host/remote.h |  1 +
>  include/hw/remote/machine.h  | 28 ++++++++++++++++++
>  hw/remote/machine.c          | 69 ++++++++++++++++++++++++++++++++++++++++++++
>  MAINTAINERS                  |  2 ++
>  hw/meson.build               |  1 +
>  hw/remote/meson.build        |  5 ++++
>  6 files changed, 106 insertions(+)
>  create mode 100644 include/hw/remote/machine.h
>  create mode 100644 hw/remote/machine.c
>  create mode 100644 hw/remote/meson.build
>
> diff --git a/include/hw/pci-host/remote.h b/include/hw/pci-host/remote.h
> index bab6d3c..cc0fff4 100644
> --- a/include/hw/pci-host/remote.h
> +++ b/include/hw/pci-host/remote.h
> @@ -25,6 +25,7 @@ typedef struct RemotePCIHost {
>
>      MemoryRegion *mr_pci_mem;
>      MemoryRegion *mr_sys_io;
> +    MemoryRegion *mr_sys_mem;
>

Why is this not part of the previous patch?

>  } RemotePCIHost;
>
>  #endif
> diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
> new file mode 100644
> index 0000000..d312972
> --- /dev/null
> +++ b/include/hw/remote/machine.h
> @@ -0,0 +1,28 @@
> +/*
> + * Remote machine configuration
> + *
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef REMOTE_MACHINE_H
> +#define REMOTE_MACHINE_H
> +
> +#include "qom/object.h"
> +#include "hw/boards.h"
> +#include "hw/pci-host/remote.h"
> +
> +typedef struct RemoteMachineState {
> +    MachineState parent_obj;
> +
> +    RemotePCIHost *host;
> +} RemoteMachineState;
> +
> +#define TYPE_REMOTE_MACHINE "x-remote-machine"
> +#define REMOTE_MACHINE(obj) \
> +    OBJECT_CHECK(RemoteMachineState, (obj), TYPE_REMOTE_MACHINE)
> +
> +#endif
> diff --git a/hw/remote/machine.c b/hw/remote/machine.c
> new file mode 100644
> index 0000000..c5658bf
> --- /dev/null
> +++ b/hw/remote/machine.c
> @@ -0,0 +1,69 @@
> +/*
> + * Machine for remote device
> + *
> + *  This machine type is used by the remote device process in multi-process
> + *  QEMU. QEMU device models depend on parent busses, interrupt controllers,
> + *  memory regions, etc. The remote machine type offers this environment so
> + *  that QEMU device models can be used as remote devices.
> + *
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include "hw/remote/machine.h"
> +#include "exec/address-spaces.h"
> +#include "exec/memory.h"
> +#include "qapi/error.h"
> +
> +static void remote_machine_init(MachineState *machine)
> +{
> +    MemoryRegion *system_memory, *system_io, *pci_memory;
> +    RemoteMachineState *s = REMOTE_MACHINE(machine);
> +    RemotePCIHost *rem_host;
> +
> +    system_memory = get_system_memory();
> +    system_io = get_system_io();
> +
> +    pci_memory = g_new(MemoryRegion, 1);
> +    memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
> +
> +    rem_host = REMOTE_HOST_DEVICE(qdev_new(TYPE_REMOTE_HOST_DEVICE));
> +
> +    rem_host->mr_pci_mem = pci_memory;
> +    rem_host->mr_sys_mem = system_memory;
> +    rem_host->mr_sys_io = system_io;
> +
> +    s->host = rem_host;
> +
> +    object_property_add_child(OBJECT(s), "remote-device", OBJECT(rem_host));
>

"remote-pcihost" instead ?

> +    memory_region_add_subregion_overlap(system_memory, 0x0, pci_memory, -1);
> +
> +    qdev_realize(DEVICE(rem_host), sysbus_get_default(), &error_fatal);
> +}
> +
> +static void remote_machine_class_init(ObjectClass *oc, void *data)
> +{
> +    MachineClass *mc = MACHINE_CLASS(oc);
> +
> +    mc->init = remote_machine_init;
>

Set mc->desc = "Experimental remote machine" ?

> +}
> +
> +static const TypeInfo remote_machine = {
> +    .name = TYPE_REMOTE_MACHINE,
> +    .parent = TYPE_MACHINE,
> +    .instance_size = sizeof(RemoteMachineState),
> +    .class_init = remote_machine_class_init,
> +};
> +
> +static void remote_machine_register_types(void)
> +{
> +    type_register_static(&remote_machine);
> +}
> +
> +type_init(remote_machine_register_types);
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4515476..c45ac1d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3139,6 +3139,8 @@ F: docs/devel/multi-process.rst
>  F: tests/multiprocess/multiprocess-lsi53c895a.py
>  F: hw/pci-host/remote.c
>  F: include/hw/pci-host/remote.h
> +F: hw/remote/machine.c
> +F: include/hw/remote/machine.h
>
>  Build and test automation
>  -------------------------
> diff --git a/hw/meson.build b/hw/meson.build
> index 010de72..e615d72 100644
> --- a/hw/meson.build
> +++ b/hw/meson.build
> @@ -56,6 +56,7 @@ subdir('moxie')
>  subdir('nios2')
>  subdir('openrisc')
>  subdir('ppc')
> +subdir('remote')
>  subdir('riscv')
>  subdir('rx')
>  subdir('s390x')
> diff --git a/hw/remote/meson.build b/hw/remote/meson.build
> new file mode 100644
> index 0000000..197b038
> --- /dev/null
> +++ b/hw/remote/meson.build
> @@ -0,0 +1,5 @@
> +remote_ss = ss.source_set()
> +
> +remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
> +
> +softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v12 02/19] multi-process: add configure and usage information
  2020-12-01 20:22 ` [PATCH v12 02/19] multi-process: add configure and usage information Jagannathan Raman
  2020-12-04 14:10   ` Marc-André Lureau
@ 2020-12-04 14:37   ` Daniel P. Berrangé
  2020-12-09 16:20     ` Jag Raman
  1 sibling, 1 reply; 52+ messages in thread
From: Daniel P. Berrangé @ 2020-12-04 14:37 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, fam, swapnil.ingle, john.g.johnson, qemu-devel,
	kraxel, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	thanos.makatos, rth, kwolf, mreitz, ross.lagerwall,
	marcandre.lureau, pbonzini

On Tue, Dec 01, 2020 at 03:22:37PM -0500, Jagannathan Raman wrote:
> From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> 
> Adds documentation explaining the command-line arguments needed
> to use multi-process. Also adds a python script that illustrates the
> usage.
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  docs/multi-process.rst                        | 66 +++++++++++++++++++
>  MAINTAINERS                                   |  1 +
>  tests/multiprocess/multiprocess-lsi53c895a.py | 92 +++++++++++++++++++++++++++
>  3 files changed, 159 insertions(+)
>  create mode 100644 docs/multi-process.rst
>  create mode 100755 tests/multiprocess/multiprocess-lsi53c895a.py


> diff --git a/tests/multiprocess/multiprocess-lsi53c895a.py b/tests/multiprocess/multiprocess-lsi53c895a.py
> new file mode 100755
> index 0000000..bfe4f66
> --- /dev/null
> +++ b/tests/multiprocess/multiprocess-lsi53c895a.py
> @@ -0,0 +1,92 @@
> +#!/usr/bin/env python3
> +
> +import urllib.request
> +import subprocess
> +import argparse
> +import socket
> +import sys
> +import os
> +
> +arch = os.uname()[4]
> +proc_path = os.path.join(os.getcwd(), '..', '..', 'build', arch+'-softmmu',
> +                         'qemu-system-'+arch)
> +
> +parser = argparse.ArgumentParser(description='Launcher for multi-process QEMU')
> +parser.add_argument('--bin', required=False, help='location of QEMU binary',
> +                    metavar='bin');
> +args = parser.parse_args()
> +
> +if args.bin is not None:
> +    proc_path = args.bin
> +
> +if not os.path.isfile(proc_path):
> +    sys.exit('QEMU binary not found')
> +
> +kernel_path = os.path.join(os.getcwd(), 'vmlinuz')
> +initrd_path = os.path.join(os.getcwd(), 'initrd')
> +
> +proxy_cmd = [ proc_path,                                                    \
> +              '-name', 'Fedora', '-smp', '4', '-m', '2048', '-cpu', 'host', \

I wonder if setting 2 GB of RAM is too large for something that runs by
default as a test.

> +              '-object', 'memory-backend-memfd,id=sysmem-file,size=2G',     \
> +              '-numa', 'node,memdev=sysmem-file',                           \
> +              '-kernel', kernel_path, '-initrd', initrd_path,               \
> +              '-vnc', ':0',                                                 \
> +              '-monitor', 'unix:/home/qemu-sock,server,nowait',             \
> +            ]
> +
> +if arch == 'x86_64':
> +    print('Downloading images for arch x86_64')
> +    kernel_url = 'https://dl.fedoraproject.org/pub/fedora/linux/'    \
> +                 'releases/33/Everything/x86_64/os/images/'          \
> +                 'pxeboot/vmlinuz'
> +    initrd_url = 'https://dl.fedoraproject.org/pub/fedora/linux/'    \
> +                 'releases/33/Everything/x86_64/os/images/'          \
> +                 'pxeboot/initrd.img'
> +    proxy_cmd.append('-machine')
> +    proxy_cmd.append('pc,accel=kvm')
> +    proxy_cmd.append('-append')
> +    proxy_cmd.append('rdinit=/bin/bash console=ttyS0 console=tty0')
> +elif arch == 'aarch64':
> +    print('Downloading images for arch aarch64')
> +    kernel_url = 'https://dl.fedoraproject.org/pub/fedora/linux/'    \
> +                 'releases/33/Everything/aarch64/os/images/'         \
> +                 'pxeboot/vmlinuz'
> +    initrd_url = 'https://dl.fedoraproject.org/pub/fedora/linux/'    \
> +                 'releases/33/Everything/aarch64/os/images/'         \
> +                 'pxeboot/initrd.img'
> +    proxy_cmd.append('-machine')
> +    proxy_cmd.append('virt,gic-version=3')
> +    proxy_cmd.append('-accel')
> +    proxy_cmd.append('kvm')
> +    proxy_cmd.append('-append')
> +    proxy_cmd.append('rdinit=/bin/bash')
> +else:
> +    sys.exit('Arch %s not tested' % arch)

It doesn't look like you really need a full OS here. Rather than
downloading the fairly large Fedora images, I'd suggest just using
the kernel that exists on the host OS already in /boot, and then
building a tiny initrd that contains just a statically linked busybox.

I have this helper script that could be imported into QEMU for
this purpose:

  https://gitlab.com/berrange/tiny-vm-tools/-/blob/master/make-tiny-image.py

And just skip the test if busybox doesn't exist, or if the vmlinuz
in /boot isn't accessible (Debian restricts it to root only IIRC).
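
A rough sketch of that skip check, in case it helps. The helper names
(find_host_kernel, skip_reason) are made up for illustration, and the
"vmlinuz-<release>" naming under /boot is an assumption about the host
distro's layout. It only checks that a busybox binary is in PATH, not
that it is statically linked.

```python
import os
import shutil


def find_host_kernel(boot_dir="/boot"):
    """Return the path of a readable host kernel image, or None.

    Assumes the distro installs the running kernel as
    <boot_dir>/vmlinuz-<release>; other layouts are not handled.
    """
    release = os.uname().release
    candidate = os.path.join(boot_dir, "vmlinuz-" + release)
    if os.path.isfile(candidate) and os.access(candidate, os.R_OK):
        return candidate
    return None


def skip_reason(boot_dir="/boot"):
    """Return a reason to skip the test, or None if it can run."""
    if shutil.which("busybox") is None:
        return "busybox not found in PATH"
    if find_host_kernel(boot_dir) is None:
        return "no readable kernel image in " + boot_dir
    return None


if __name__ == "__main__":
    reason = skip_reason()
    if reason is not None:
        print("SKIP: " + reason)
    else:
        print("OK: using " + find_host_kernel())
```

The test would call skip_reason() first and exit early with the reason
instead of downloading anything.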

> +
> +urllib.request.urlretrieve(kernel_url, kernel_path)
> +urllib.request.urlretrieve(initrd_url, initrd_path)
> +
> +proxy, remote = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
> +
> +proxy_cmd.append('-device')
> +proxy_cmd.append('x-pci-proxy-dev,id=lsi1,fd='+str(proxy.fileno()))
> +
> +remote_cmd = [ proc_path,                                                      \
> +               '-machine', 'x-remote',                                         \
> +               '-device', 'lsi53c895a,id=lsi1',                                \
> +               '-object',                                                      \
> +               'x-remote-object,id=robj1,devid=lsi1,fd='+str(remote.fileno()), \
> +               '-display', 'none',                                             \
> +               '-monitor', 'unix:/home/rem-sock,server,nowait',                \
> +             ]
> +
> +pid = os.fork();
> +
> +if pid:
> +    # In Proxy
> +    print('Launching QEMU with Proxy object');
> +    process = subprocess.Popen(proxy_cmd, pass_fds=[proxy.fileno()])
> +else:
> +    # In remote
> +    print('Launching Remote process');
> +    process = subprocess.Popen(remote_cmd, pass_fds=[remote.fileno(), 0, 1, 2])

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v12 07/19] multi-process: add qio channel function to transmit data and fds
  2020-12-01 20:22 ` [PATCH v12 07/19] multi-process: add qio channel function to transmit data and fds Jagannathan Raman
@ 2020-12-04 14:40   ` Marc-André Lureau
  0 siblings, 0 replies; 52+ messages in thread
From: Marc-André Lureau @ 2020-12-04 14:40 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Thanos Makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 3973 bytes --]

On Wed, Dec 2, 2020 at 12:23 AM Jagannathan Raman <jag.raman@oracle.com>
wrote:

> From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>
> Adds QIO channel functions that transmit the input iovs as well as the
> supplied fds.
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/io/channel.h | 24 ++++++++++++++++++++++++
>  io/channel.c         | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 69 insertions(+)
>
> diff --git a/include/io/channel.h b/include/io/channel.h
> index 4d6fe45..0aa44e1 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h
> @@ -773,5 +773,29 @@ void qio_channel_set_aio_fd_handler(QIOChannel *ioc,
>                                      IOHandler *io_read,
>                                      IOHandler *io_write,
>                                      void *opaque);
> +/**
> + * qio_channel_writev_full_all:
> + * @ioc: the channel object
> + * @iov: the array of memory regions to write data from
> + * @niov: the length of the @iov array
> + * @fds: an array of file handles to send
> + * @nfds: number of file handles in @fds
> + * @errp: pointer to a NULL-initialized error object
> + *
> + *
> + * Behaves like qio_channel_writev_full but will attempt
> + * to send all data passed (file handles and memory regions).
> + * The function will wait for all requested data
> + * to be written, yielding from the current coroutine
> + * if required.
> + *
> + * Returns: 0 if all bytes were written, or -1 on error
> + */
> +
> +int qio_channel_writev_full_all(QIOChannel *ioc,
> +                           const struct iovec *iov,
> +                           size_t niov,
> +                           int *fds, size_t nfds,
> +                           Error **errp);
>
>  #endif /* QIO_CHANNEL_H */
> diff --git a/io/channel.c b/io/channel.c
> index 93d449d..255dd46 100644
> --- a/io/channel.c
> +++ b/io/channel.c
> @@ -190,6 +190,51 @@ int qio_channel_writev_all(QIOChannel *ioc,
>      return ret;
>  }
>
> +int qio_channel_writev_full_all(QIOChannel *ioc,
> +                                const struct iovec *iov,
> +                                size_t niov,
> +                                int *fds, size_t nfds,
> +                                Error **errp)
> +{
>

Please make qio_channel_writev_all() call qio_channel_writev_full_all() to
avoid logic duplication.
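
A rough sketch of the delegation I have in mind, using a made-up in-memory
channel instead of QIOChannel so that it stands alone. All Sketch*/sketch_*
names are hypothetical and not QEMU APIs; the 4-byte limit per call is only
there to force short writes:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

#define SKETCH_MAX_IOV 8

/* Hypothetical in-memory sink standing in for a QIOChannel. */
typedef struct {
    char buf[64];
    size_t used;
    size_t fds_seen;    /* total number of fds handed to the sink */
} SketchChannel;

/*
 * Stand-in for qio_channel_writev_full(): takes at most 4 bytes per
 * call so that callers have to cope with short writes.
 */
static ssize_t sketch_writev_full(SketchChannel *ch, const struct iovec *iov,
                                  size_t niov, const int *fds, size_t nfds)
{
    size_t budget = 4, done = 0;

    (void)fds;
    for (size_t i = 0; i < niov && budget > 0; i++) {
        size_t n = iov[i].iov_len < budget ? iov[i].iov_len : budget;

        if (ch->used + n > sizeof(ch->buf)) {
            return -1;
        }
        memcpy(ch->buf + ch->used, iov[i].iov_base, n);
        ch->used += n;
        budget -= n;
        done += n;
    }
    ch->fds_seen += nfds;
    return (ssize_t)done;
}

/* The only place that holds the retry loop. */
static int sketch_writev_full_all(SketchChannel *ch, const struct iovec *iov,
                                  size_t niov, const int *fds, size_t nfds)
{
    struct iovec local[SKETCH_MAX_IOV];
    size_t first = 0;

    assert(niov <= SKETCH_MAX_IOV);
    memcpy(local, iov, niov * sizeof(*iov));

    while (first < niov) {
        ssize_t len = sketch_writev_full(ch, local + first, niov - first,
                                         fds, nfds);
        size_t left;

        if (len < 0) {
            return -1;
        }
        if (len > 0) {
            fds = NULL;         /* fds go out with the first chunk only */
            nfds = 0;
        }
        /* Drop the bytes that were written from the front of the vector. */
        left = (size_t)len;
        while (first < niov && left >= local[first].iov_len) {
            left -= local[first].iov_len;
            first++;
        }
        if (first < niov) {
            local[first].iov_base = (char *)local[first].iov_base + left;
            local[first].iov_len -= left;
        }
    }
    return 0;
}

/* The plain variant is the full variant with no fds. */
static int sketch_writev_all(SketchChannel *ch, const struct iovec *iov,
                             size_t niov)
{
    return sketch_writev_full_all(ch, iov, niov, NULL, 0);
}
```

With that shape, qio_channel_writev_all() would presumably shrink to a
one-line wrapper and the yield/wait logic would live in one function.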


> +    int ret = -1;
> +    struct iovec *local_iov = g_new(struct iovec, niov);
> +    struct iovec *local_iov_head = local_iov;
> +    unsigned int nlocal_iov = niov;
> +
> +    nlocal_iov = iov_copy(local_iov, nlocal_iov,
> +                          iov, niov,
> +                          0, iov_size(iov, niov));
> +
> +    while (nlocal_iov > 0) {
> +        ssize_t len;
> +        len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds,
> +                                      nfds, errp);
> +        if (len == QIO_CHANNEL_ERR_BLOCK) {
> +            if (qemu_in_coroutine()) {
> +                qio_channel_yield(ioc, G_IO_OUT);
> +            } else {
> +                qio_channel_wait(ioc, G_IO_OUT);
> +            }
> +            continue;
> +        }
> +        if (len < 0) {
> +            goto cleanup;
> +        }
> +
> +        iov_discard_front(&local_iov, &nlocal_iov, len);
> +
> +        if (len > 0) {
> +            fds = NULL;
> +            nfds = 0;
> +        }
> +    }
> +
> +    ret = 0;
> + cleanup:
> +    g_free(local_iov_head);
> +    return ret;
> +}
> +
>  ssize_t qio_channel_readv(QIOChannel *ioc,
>                            const struct iovec *iov,
>                            size_t niov,
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau


* Re: [PATCH v12 08/19] multi-process: define MPQemuMsg format and transmission functions
  2020-12-01 20:22 ` [PATCH v12 08/19] multi-process: define MPQemuMsg format and transmission functions Jagannathan Raman
@ 2020-12-07 13:18   ` Marc-André Lureau
  2020-12-10  1:40     ` Elena Ufimtseva
  0 siblings, 1 reply; 52+ messages in thread
From: Marc-André Lureau @ 2020-12-07 13:18 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Thanos Makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Paolo Bonzini


Hi

On Wed, Dec 2, 2020 at 12:25 AM Jagannathan Raman <jag.raman@oracle.com>
wrote:

> From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>
> Defines MPQemuMsg, which is the message that is sent to the remote
> process. This message is sent over QIOChannel and is used to
> command the remote process to perform various tasks.
> Define transmission functions used by proxy and by remote.
> There are certain restrictions on where its safe to use these
> functions:
>   - From main loop in co-routine context. Will block the main loop if not in
>     co-routine context;
>   - From vCPU thread with no co-routine context and if the channel is not part
>     of the main loop handling;
>   - From IOThread within co-routine context, outside of co-routine context will
>     block IOThread;
>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> ---
>  include/hw/remote/mpqemu-link.h |  60 ++++++++++
>  hw/remote/mpqemu-link.c         | 242
> ++++++++++++++++++++++++++++++++++++++++
>  MAINTAINERS                     |   2 +
>  hw/remote/meson.build           |   1 +
>  4 files changed, 305 insertions(+)
>  create mode 100644 include/hw/remote/mpqemu-link.h
>  create mode 100644 hw/remote/mpqemu-link.c
>
> diff --git a/include/hw/remote/mpqemu-link.h
> b/include/hw/remote/mpqemu-link.h
> new file mode 100644
> index 0000000..2d79ff8
> --- /dev/null
> +++ b/include/hw/remote/mpqemu-link.h
> @@ -0,0 +1,60 @@
> +/*
> + * Communication channel between QEMU and remote device process
> + *
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef MPQEMU_LINK_H
> +#define MPQEMU_LINK_H
> +
> +#include "qom/object.h"
> +#include "qemu/thread.h"
> +#include "io/channel.h"
> +
> +#define REMOTE_MAX_FDS 8
> +
> +#define MPQEMU_MSG_HDR_SIZE offsetof(MPQemuMsg, data.u64)
> +
> +/**
> + * MPQemuCmd:
> + *
> + * MPQemuCmd enum type to specify the command to be executed on the remote
> + * device.
> + */
> +typedef enum {
> +    MPQEMU_CMD_INIT,
> +    MPQEMU_CMD_MAX,
> +} MPQemuCmd;
> +
> +/**
> + * MPQemuMsg:
> + * @cmd: The remote command
> + * @size: Size of the data to be shared
> + * @data: Structured data
> + * @fds: File descriptors to be shared with remote device
> + *
> + * MPQemuMsg Format of the message sent to the remote device from QEMU.
> + *
> + */
> +typedef struct {
> +    int cmd;
> +    size_t size;
> +
> +    union {
> +        uint64_t u64;
> +    } data;
> +
> +    int fds[REMOTE_MAX_FDS];
> +    int num_fds;
> +} MPQemuMsg;
> +
> +void mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
> +void mpqemu_msg_recv(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
> +
> +bool mpqemu_msg_valid(MPQemuMsg *msg);
> +
> +#endif
> diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
> new file mode 100644
> index 0000000..e535ed2
> --- /dev/null
> +++ b/hw/remote/mpqemu-link.c
> @@ -0,0 +1,242 @@
> +/*
> + * Communication channel between QEMU and remote device process
> + *
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include "qemu/module.h"
> +#include "hw/remote/mpqemu-link.h"
> +#include "qapi/error.h"
> +#include "qemu/iov.h"
> +#include "qemu/error-report.h"
> +#include "qemu/main-loop.h"
> +
> +/*
> + * Send message over the ioc QIOChannel.
> + * This function is safe to call from:
> + * - From main loop in co-routine context. Will block the main loop if not in
> + *   co-routine context;
> + * - From vCPU thread with no co-routine context and if the channel is not part
> + *   of the main loop handling;
> + * - From IOThread within co-routine context, outside of co-routine context
> + *   will block IOThread;
>

Can drop the extra "From" on each line.

> + */
> +void mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
> +{
> +    bool iolock = qemu_mutex_iothread_locked();
> +    bool iothread = qemu_get_current_aio_context() == qemu_get_aio_context() ?
> +                    false : true;
>

I would introduce a qemu_in_iothread() helper (similar to
qemu_in_coroutine() etc.).

> +    Error *local_err = NULL;
> +    struct iovec send[2] = {0};
> +    int *fds = NULL;
> +    size_t nfds = 0;
> +
> +    send[0].iov_base = msg;
> +    send[0].iov_len = MPQEMU_MSG_HDR_SIZE;
> +
> +    send[1].iov_base = (void *)&msg->data;
> +    send[1].iov_len = msg->size;
> +
> +    if (msg->num_fds) {
> +        nfds = msg->num_fds;
> +        fds = msg->fds;
> +    }
> +    /*
> +     * Dont use in IOThread out of co-routine context as
> +     * it will block IOThread.
> +     */
> +    if (iothread) {
> +        assert(qemu_in_coroutine());
> +    }
>

or simply assert(!iothread || qemu_in_coroutine())
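
The two forms are equivalent because "if (A) assert(B)" is the implication
A => B, which is the same as !A || B. A standalone sketch of that
equivalence, with hypothetical helper names (nothing here is a QEMU API):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true when the blocking rule from the patch holds. */
static bool sketch_context_ok(bool iothread, bool in_coroutine)
{
    /* An IOThread caller must be in coroutine context; others may block. */
    return !iothread || in_coroutine;
}

/* The original two-step form, kept only for comparison. */
static bool sketch_context_ok_verbose(bool iothread, bool in_coroutine)
{
    if (iothread) {
        return in_coroutine;
    }
    return true;
}
```

In the patch this would collapse the three-line block into the single
assert() above.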

> +    /*
> +     * Skip unlocking/locking iothread when in IOThread running
> +     * in co-routine context. Co-routine context is asserted above
> +     * for IOThread case.
> +     * Also skip this while in a co-routine in the main context.
> +     */
> +    if (iolock && !iothread && !qemu_in_coroutine()) {
> +        qemu_mutex_unlock_iothread();
> +    }
> +
> +    (void)qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send), fds, nfds,
> +                                      &local_err);
>

That extra (void) is probably unnecessary.


> +
> +    if (iolock && !iothread && !qemu_in_coroutine()) {
> +        /* See above comment why skip locking here. */
> +        qemu_mutex_lock_iothread();
> +    }
> +
> +    if (errp) {
> +        error_propagate(errp, local_err);
> +    } else if (local_err) {
> +        error_report_err(local_err);
> +    }
>

Not sure this behaviour is recommended. Instead, a trace and an ERRP_GUARD
would be more idiomatic.


> +
> +    return;
>

That's an unnecessary return. Why not return true/false based on error?
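
Something along these lines, sketched with a stand-in error type so it
compiles on its own. SketchError and the sketch_* functions are hypothetical;
in QEMU the real pieces would be Error, error_setg() and friends:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct {
    char msg[64];
} SketchError;

/* Hypothetical stand-in for the channel write; fails when asked to. */
static int sketch_channel_write(bool simulate_failure, SketchError *err)
{
    assert(err != NULL);
    if (simulate_failure) {
        snprintf(err->msg, sizeof(err->msg), "write failed");
        return -1;
    }
    return 0;
}

/*
 * Returns true on success. On failure returns false and, if @errp is
 * not NULL, stores the error there for the caller to report.
 */
static bool sketch_msg_send(bool simulate_failure, SketchError **errp)
{
    static SketchError storage;     /* sketch only: not thread safe */

    if (sketch_channel_write(simulate_failure, &storage) < 0) {
        if (errp) {
            *errp = &storage;
        }
        return false;
    }
    return true;
}
```

Callers can then write "if (!mpqemu_msg_send(...)) { ... }" without having to
inspect a local Error first.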

> +}
> +
> +/*
> + * Read message from the ioc QIOChannel.
> + * This function is safe to call from:
> + * - From main loop in co-routine context. Will block the main loop if not in
> + *   co-routine context;
> + * - From vCPU thread with no co-routine context and if the channel is not part
> + *   of the main loop handling;
> + * - From IOThread within co-routine context, outside of co-routine context
> + *   will block IOThread;
> + */
> +static ssize_t mpqemu_read(QIOChannel *ioc, void *buf, size_t len, int **fds,
> +                           size_t *nfds, Error **errp)
> +{
> +    struct iovec iov = { .iov_base = buf, .iov_len = len };
> +    bool iolock = qemu_mutex_iothread_locked();
> +    bool iothread = qemu_get_current_aio_context() == qemu_get_aio_context()
> +                        ? false : true;
> +    struct iovec *iovp = &iov;
> +    Error *local_err = NULL;
> +    unsigned int niov = 1;
> +    size_t *l_nfds = nfds;
> +    int **l_fds = fds;
> +    ssize_t bytes = 0;
> +    size_t size;
> +
> +    size = iov.iov_len;
> +
> +    /*
> +     * Dont use in IOThread out of co-routine context as
> +     * it will block IOThread.
> +     */
> +    if (iothread) {
> +        assert(qemu_in_coroutine());
> +    }
>

As above: assert(!iothread || qemu_in_coroutine()).


> +
> +    while (size > 0) {
> +        bytes = qio_channel_readv_full(ioc, iovp, niov, l_fds, l_nfds,
> +                                       &local_err);
> +        if (bytes == QIO_CHANNEL_ERR_BLOCK) {
> +            /*
> +             * Skip unlocking/locking iothread when in IOThread running
> +             * in co-routine context. Co-routine context is asserted above
> +             * for IOThread case.
> +             * Also skip this while in a co-routine in the main context.
> +             */
> +            if (iolock && !iothread && !qemu_in_coroutine()) {
> +                qemu_mutex_unlock_iothread();
>

Why not lock the iothread at the beginning of the function and call a
readv_full_all like we do for writes?
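
For reference, the core of such a helper is just a "read until done" loop.
The sketch below uses plain POSIX read() on a file descriptor and leaves out
fd passing and coroutine yielding entirely; sketch_read_all is a hypothetical
name, and a real readv_full_all would wrap qio_channel_readv_full() and
yield or wait on QIO_CHANNEL_ERR_BLOCK the way the write side does:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read exactly @len bytes from @fd into @buf, retrying on short reads
 * and EINTR. Returns 0 on success, -1 on error or early end of file.
 */
static int sketch_read_all(int fd, void *buf, size_t len)
{
    char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = read(fd, p, len);

        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        if (n == 0) {
            return -1;      /* peer closed before we got everything */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

mpqemu_read() would then only deal with the iothread lock, and the partial
read bookkeeping would move next to its write counterpart in io/channel.c.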

> +            }
> +            if (qemu_in_coroutine()) {
> +                qio_channel_yield(ioc, G_IO_IN);
> +            } else {
> +                qio_channel_wait(ioc, G_IO_IN);
> +            }
> +            /* See above comment why skip locking here. */
> +            if (iolock && !iothread && !qemu_in_coroutine()) {
> +                qemu_mutex_lock_iothread();
> +            }
> +            continue;
> +        }
> +
> +        if (bytes <= 0) {
> +            error_propagate(errp, local_err);
> +            return -EIO;
> +        }
> +
> +        l_fds = NULL;
> +        l_nfds = NULL;
> +
> +        size -= bytes;
> +
> +        (void)iov_discard_front(&iovp, &niov, bytes);
>

needless cast

> +    }
> +
> +    return len - size;
> +}
> +
> +void mpqemu_msg_recv(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
> +{
> +    Error *local_err = NULL;
> +    int *fds = NULL;
> +    size_t nfds = 0;
> +    ssize_t len;
> +
> +    len = mpqemu_read(ioc, (void *)msg, MPQEMU_MSG_HDR_SIZE, &fds, &nfds,
>

This cast is not necessary

> +                      &local_err);
> +    if (!local_err) {
> +        if (len == -EIO) {
> +            error_setg(&local_err, "Connection closed.");
> +            goto fail;
> +        }
> +        if (len < 0) {
> +            error_setg(&local_err, "Message length is less than 0");
> +            goto fail;
> +        }
> +        if (len != MPQEMU_MSG_HDR_SIZE) {
> +            error_setg(&local_err, "Message header corrupted");
> +            goto fail;
> +        }
> +    } else {
> +        goto fail;
> +    }
> +
> +    if (msg->size > sizeof(msg->data)) {
> +        error_setg(&local_err, "Invalid size for message");
> +        goto fail;
> +    }
> +
> +    if (mpqemu_read(ioc, (void *)&msg->data, msg->size, NULL, NULL,
>

that one too

> +                    &local_err) < 0) {
> +        goto fail;
> +    }
> +
> +    msg->num_fds = nfds;
> +    if (nfds > G_N_ELEMENTS(msg->fds)) {
> +        error_setg(&local_err,
> +                   "Overflow error: received %zu fds, more than max of %d fds",
> +                   nfds, REMOTE_MAX_FDS);
> +        goto fail;
> +    } else if (nfds) {
> +        memcpy(msg->fds, fds, nfds * sizeof(int));
> +    }
> +
> +fail:
> +    while (local_err && nfds) {
> +        close(fds[nfds - 1]);
> +        nfds--;
> +    }
> +
> +    g_free(fds);
> +
> +    if (errp) {
> +        error_propagate(errp, local_err);
> +    } else if (local_err) {
> +        error_report_err(local_err);
> +    }
> +}
> +
> +bool mpqemu_msg_valid(MPQemuMsg *msg)
> +{
> +    if (msg->cmd >= MPQEMU_CMD_MAX && msg->cmd < 0) {
> +        return false;
> +    }
> +
> +    /* Verify FDs. */
> +    if (msg->num_fds >= REMOTE_MAX_FDS) {
> +        return false;
> +    }
> +
> +    if (msg->num_fds > 0) {
> +        for (int i = 0; i < msg->num_fds; i++) {
> +            if (fcntl(msg->fds[i], F_GETFL) == -1) {
> +                return false;
> +            }
> +        }
> +    }
> +
> +    return true;
> +}
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c45ac1d..d0c891a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3141,6 +3141,8 @@ F: hw/pci-host/remote.c
>  F: include/hw/pci-host/remote.h
>  F: hw/remote/machine.c
>  F: include/hw/remote/machine.h
> +F: hw/remote/mpqemu-link.c
> +F: include/hw/remote/mpqemu-link.h
>
>  Build and test automation
>  -------------------------
> diff --git a/hw/remote/meson.build b/hw/remote/meson.build
> index 197b038..a2b2fc0 100644
> --- a/hw/remote/meson.build
> +++ b/hw/remote/meson.build
> @@ -1,5 +1,6 @@
>  remote_ss = ss.source_set()
>
>  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
> +remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true:
> files('mpqemu-link.c'))
>
>  softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau


* Re: [PATCH v12 09/19] multi-process: Initialize message handler in remote device
  2020-12-01 20:22 ` [PATCH v12 09/19] multi-process: Initialize message handler in remote device Jagannathan Raman
@ 2020-12-07 13:33   ` Marc-André Lureau
  0 siblings, 0 replies; 52+ messages in thread
From: Marc-André Lureau @ 2020-12-07 13:33 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Thanos Makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Paolo Bonzini


Hi

On Wed, Dec 2, 2020 at 12:23 AM Jagannathan Raman <jag.raman@oracle.com>
wrote:

> Initializes the message handler function in the remote process. It is
> called whenever there's an event pending on QIOChannel that registers
> this function.
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/hw/remote/machine.h |  9 +++++++
>  hw/remote/message.c         | 61
> +++++++++++++++++++++++++++++++++++++++++++++
>  MAINTAINERS                 |  1 +
>  hw/remote/meson.build       |  1 +
>  4 files changed, 72 insertions(+)
>  create mode 100644 hw/remote/message.c
>
> diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
> index d312972..3073db6 100644
> --- a/include/hw/remote/machine.h
> +++ b/include/hw/remote/machine.h
> @@ -14,6 +14,7 @@
>  #include "qom/object.h"
>  #include "hw/boards.h"
>  #include "hw/pci-host/remote.h"
> +#include "io/channel.h"
>
>  typedef struct RemoteMachineState {
>      MachineState parent_obj;
> @@ -21,8 +22,16 @@ typedef struct RemoteMachineState {
>      RemotePCIHost *host;
>  } RemoteMachineState;
>
> +/* Used to pass to co-routine device and ioc. */
> +typedef struct RemoteCommDev {
> +    PCIDevice *dev;
> +    QIOChannel *ioc;
> +} RemoteCommDev;
> +
>  #define TYPE_REMOTE_MACHINE "x-remote-machine"
>  #define REMOTE_MACHINE(obj) \
>      OBJECT_CHECK(RemoteMachineState, (obj), TYPE_REMOTE_MACHINE)
>
> +void coroutine_fn mpqemu_remote_msg_loop_co(void *data);
> +
>  #endif
> diff --git a/hw/remote/message.c b/hw/remote/message.c
> new file mode 100644
> index 0000000..5d87bf4
> --- /dev/null
> +++ b/hw/remote/message.c
> @@ -0,0 +1,61 @@
> +/*
> + * Copyright © 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL-v2, version 2 or
> later.
> + *
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include "hw/remote/machine.h"
> +#include "io/channel.h"
> +#include "hw/remote/mpqemu-link.h"
> +#include "qapi/error.h"
> +#include "sysemu/runstate.h"
> +
> +void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
> +{
> +    RemoteCommDev *com = (RemoteCommDev *)data;
> +    PCIDevice *pci_dev = NULL;
> +
> +    pci_dev = com->dev;
> +    for (;;) {
> +        MPQemuMsg msg = {0};
> +        Error *local_err = NULL;
> +
> +        if (!com->ioc) {
> +            error_report("ERROR: No channel available");
> +            break;
> +        }
>

Shouldn't this be an assert() at the top of the function?


> +        mpqemu_msg_recv(&msg, com->ioc, &local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            break;
>

Error handling is not consistent in this function. Could you clean up the
error code paths so that error handling and reporting are done in one place?

> +        }
> +
> +        if (!mpqemu_msg_valid(&msg)) {
> +            error_report("Received invalid message from proxy"
> +                         "in remote process pid=%d", getpid());
> +            break;
> +        }
> +
> +        switch (msg.cmd) {
> +        default:
> +            error_setg(&local_err,
> +                       "Unknown command (%d) received for device %s (pid=%d)",
> +                       msg.cmd, DEVICE(pci_dev)->id, getpid());
> +        }
> +
> +        if (local_err) {
> +            error_report_err(local_err);
> +            qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
>

Presumably that error handling should be done outside of the for(;;) loop.

SHUTDOWN_CAUSE_HOST_ERROR might be more appropriate in this case, or
perhaps introduce a new ShutdownCause?
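
To illustrate the structure I mean: each failure path only records a reason
and breaks, and the report plus the shutdown request happen once, after the
loop. This is a standalone sketch over a fixed array of messages; SketchMsg,
SketchCause and sketch_msg_loop are hypothetical names, not the real
MPQemuMsg or ShutdownCause:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical shutdown causes; QEMU's real enum is ShutdownCause. */
typedef enum {
    SKETCH_CAUSE_NONE,
    SKETCH_CAUSE_HOST_ERROR,
} SketchCause;

/* Hypothetical message: cmd < 0 marks a receive failure. */
typedef struct {
    int cmd;
} SketchMsg;

#define SKETCH_CMD_MAX 1

/*
 * Process messages until the first failure. Every failure path only
 * records a reason and breaks; the caller reports it and requests the
 * shutdown once, based on the returned cause.
 */
static SketchCause sketch_msg_loop(const SketchMsg *msgs, size_t n,
                                   const char **reason)
{
    const char *why = NULL;

    assert(reason != NULL);
    for (size_t i = 0; i < n; i++) {
        if (msgs[i].cmd < 0) {
            why = "receive failed";
            break;
        }
        if (msgs[i].cmd >= SKETCH_CMD_MAX) {
            why = "unknown command";
            break;
        }
        /* A valid command would be dispatched here. */
    }

    *reason = why;
    return why ? SKETCH_CAUSE_HOST_ERROR : SKETCH_CAUSE_NONE;
}
```

The real loop never runs out of messages, so there the code after the loop
would always be the error path.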

> +            break;
> +        }
> +    }
> +    qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
> +
> +    return;
>

needless return statement

> +}
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d0c891a..b64e4b8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3143,6 +3143,7 @@ F: hw/remote/machine.c
>  F: include/hw/remote/machine.h
>  F: hw/remote/mpqemu-link.c
>  F: include/hw/remote/mpqemu-link.h
> +F: hw/remote/message.c
>
>  Build and test automation
>  -------------------------
> diff --git a/hw/remote/meson.build b/hw/remote/meson.build
> index a2b2fc0..9f5c57f 100644
> --- a/hw/remote/meson.build
> +++ b/hw/remote/meson.build
> @@ -2,5 +2,6 @@ remote_ss = ss.source_set()
>
>  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
>  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true:
> files('mpqemu-link.c'))
> +remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
>
>  softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau


* Re: [PATCH v12 10/19] multi-process: Associate fd of a PCIDevice with its object
  2020-12-01 20:22 ` [PATCH v12 10/19] multi-process: Associate fd of a PCIDevice with its object Jagannathan Raman
@ 2020-12-07 14:03   ` Marc-André Lureau
  2020-12-08 12:07     ` Marc-André Lureau
  0 siblings, 1 reply; 52+ messages in thread
From: Marc-André Lureau @ 2020-12-07 14:03 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Thanos Makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Paolo Bonzini


Hi

On Wed, Dec 2, 2020 at 12:25 AM Jagannathan Raman <jag.raman@oracle.com>
wrote:

> Associate the file descriptor for a PCIDevice in remote process with
> DeviceState object.
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/hw/remote/remote-obj.h |  42 +++++++++++
>  hw/remote/message.c            |   1 +
>  hw/remote/remote-obj.c         | 154
> +++++++++++++++++++++++++++++++++++++++++
>  MAINTAINERS                    |   2 +
>  hw/remote/meson.build          |   1 +
>  5 files changed, 200 insertions(+)
>  create mode 100644 include/hw/remote/remote-obj.h
>  create mode 100644 hw/remote/remote-obj.c
>
> diff --git a/include/hw/remote/remote-obj.h
> b/include/hw/remote/remote-obj.h
> new file mode 100644
> index 0000000..0e95813
> --- /dev/null
> +++ b/include/hw/remote/remote-obj.h
> @@ -0,0 +1,42 @@
> +/*
> + * Copyright © 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL-v2, version 2 or
> later.
> + *
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef REMOTE_OBJECT_H
> +#define REMOTE_OBJECT_H
> +
> +#include "io/channel.h"
> +#include "qemu/notify.h"
> +
> +#define TYPE_REMOTE_OBJECT "x-remote-object"
> +#define REMOTE_OBJECT(obj) \
> +    OBJECT_CHECK(RemoteObject, (obj), TYPE_REMOTE_OBJECT)
> +#define REMOTE_OBJECT_GET_CLASS(obj) \
> +    OBJECT_GET_CLASS(RemoteObjectClass, (obj), TYPE_REMOTE_OBJECT)
> +#define REMOTE_OBJECT_CLASS(klass) \
> +    OBJECT_CLASS_CHECK(RemoteObjectClass, (klass), TYPE_REMOTE_OBJECT)
> +
> +typedef struct {
> +    ObjectClass parent_class;
> +
> +    unsigned int nr_devs;
> +    unsigned int max_devs;
> +} RemoteObjectClass;
> +
> +typedef struct {
> +    /* private */
> +    Object parent;
> +
> +    Notifier machine_done;
> +
> +    int fd;
> +    char *devid;
> +    QIOChannel *ioc;
> +} RemoteObject;
> +
> +#endif
>

There is no need for a header if it isn't actually shared between several
.c units. In this series, you can just declare those structs in
remote-obj.c.

> diff --git a/hw/remote/message.c b/hw/remote/message.c
> index 5d87bf4..1f2edc7 100644
> --- a/hw/remote/message.c
> +++ b/hw/remote/message.c
> @@ -56,6 +56,7 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
>          }
>      }
>      qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
> +    g_free(com);
>


Should be squashed with the previous patch


>      return;
>  }
> diff --git a/hw/remote/remote-obj.c b/hw/remote/remote-obj.c
> new file mode 100644
> index 0000000..3b4c0d4
> --- /dev/null
> +++ b/hw/remote/remote-obj.c
> @@ -0,0 +1,154 @@
> +/*
> + * Copyright © 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL-v2, version 2 or
> later.
> + *
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include "hw/remote/remote-obj.h"
> +#include "qemu/error-report.h"
> +#include "qom/object_interfaces.h"
> +#include "hw/qdev-core.h"
> +#include "io/channel.h"
> +#include "hw/qdev-core.h"
> +#include "hw/remote/machine.h"
> +#include "io/channel-util.h"
> +#include "qapi/error.h"
> +#include "sysemu/sysemu.h"
> +#include "hw/pci/pci.h"
> +
> +static void remote_object_set_fd(Object *obj, const char *str, Error **errp)
> +{
> +    RemoteObject *o = REMOTE_OBJECT(obj);
> +
> +    o->fd = atoi(str);
>

qemu_strtoi() has better error handling semantics than atoi(). You may also
want to check that it is a valid socket fd with fd_is_socket().

Alternatively, you could use qemu_open(), which can open from a QMP
fdset ("/dev/fdset/..."). This should be more flexible.
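
To show what atoi() silently lets through, here is a standalone sketch of
strict parsing built on strtol(). sketch_parse_fd is a hypothetical helper
written for this mail, not QEMU's qemu_strtoi(), and treating negative
values as a range error is my own choice for the sketch:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse @str as a non-negative decimal file descriptor number.
 * Returns 0 and stores the value in @fd on success, -EINVAL for an
 * empty string or trailing garbage, -ERANGE for values that do not
 * fit in a non-negative int.
 */
static int sketch_parse_fd(const char *str, int *fd)
{
    char *end = NULL;
    long val;

    assert(str != NULL && fd != NULL);
    errno = 0;
    val = strtol(str, &end, 10);
    if (end == str || *end != '\0') {
        return -EINVAL;     /* atoi() would have returned 0 (stdin) here */
    }
    if (errno == ERANGE || val > INT_MAX || val < 0) {
        return -ERANGE;
    }
    *fd = (int)val;
    return 0;
}
```

The setter could then report the failure through errp instead of storing a
bogus descriptor in o->fd.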

> +}
> +
> +static void remote_object_set_devid(Object *obj, const char *str, Error **errp)
> +{
> +    RemoteObject *o = REMOTE_OBJECT(obj);
> +
> +    g_free(o->devid);
> +
> +    o->devid = g_strdup(str);
> +}
> +
> +static void property_release_remote_object(Object *obj, const char *name,
> +                                           void *opaque)
> +{
> +    Object *remote_object = OBJECT(opaque);
> +
> +    object_unref(remote_object);
> +}
>

Hmm, an object property, discussed below.

> +
> +static void remote_object_machine_done(Notifier *notifier, void *data)
> +{
> +    RemoteObject *o = container_of(notifier, RemoteObject, machine_done);
> +    DeviceState *dev = NULL;
> +    QIOChannel *ioc = NULL;
> +    Coroutine *co = NULL;
> +    RemoteCommDev *comdev = NULL;
> +    Error *err = NULL;
> +
> +    dev = qdev_find_recursive(sysbus_get_default(), o->devid);
> +    if (!dev || !object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
> +        error_report("%s is not a PCI device", o->devid);
> +        return;
> +    }
> +
> +    ioc = qio_channel_new_fd(o->fd, &err);
> +    if (!ioc) {
> +        error_report_err(err);
> +        return;
> +    }
> +    qio_channel_set_blocking(ioc, false, NULL);
> +
> +    object_property_add(OBJECT(dev), "remote-object", "object", NULL, NULL,
> +                        property_release_remote_object, (void *)OBJECT(o));
>

In general, we are trying to avoid runtime/dynamic properties and slowly
replacing them with class properties.

Furthermore, this is not the way QOM handles object properties. You should
use object_class_property_add_link().

> +    /* co-routine should free this. */
> +    comdev = g_new0(RemoteCommDev, 1);
> +    *comdev = (RemoteCommDev) {
> +        .ioc = ioc,
> +        .dev = PCI_DEVICE(dev),
> +    };
> +
> +    co = qemu_coroutine_create(mpqemu_remote_msg_loop_co, comdev);
> +    qemu_coroutine_enter(co);
> +}
> +
> +static void remote_object_init(Object *obj)
> +{
> +    RemoteObjectClass *k = REMOTE_OBJECT_GET_CLASS(obj);
> +    RemoteObject *o = REMOTE_OBJECT(obj);
> +
> +    if (k->nr_devs >= k->max_devs) {
> +        error_report("Reached maximum number of devices: %u", k->max_devs);
> +        return;
> +    }
> +
> +    o->ioc = NULL;
> +    o->fd = -1;
> +    o->devid = NULL;
> +
> +    k->nr_devs++;
> +
> +    object_property_add_str(obj, "fd", NULL, remote_object_set_fd);
> +    object_property_set_description(obj, "fd",
> +                                    "file descriptor for the object");
> +    object_property_add_str(obj, "devid", NULL, remote_object_set_devid);
> +    object_property_set_description(obj, "devid",
> +                                    "id of device to associate");
>

Please use class properties

> +
> +    o->machine_done.notify = remote_object_machine_done;
> +    qemu_add_machine_init_done_notifier(&o->machine_done);
> +}
> +
> +static void remote_object_finalize(Object *obj)
> +{
> +    RemoteObjectClass *k = REMOTE_OBJECT_GET_CLASS(obj);
> +    RemoteObject *o = REMOTE_OBJECT(obj);
> +
> +    if (o->ioc) {
> +        qio_channel_shutdown(o->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> +        qio_channel_close(o->ioc, NULL);
> +    }
> +
> +    object_unref(OBJECT(o->ioc));
> +
> +    k->nr_devs--;
> +    g_free(o->devid);
> +}
> +
> +static void remote_object_class_init(ObjectClass *klass, void *data)
> +{
> +    RemoteObjectClass *k = REMOTE_OBJECT_CLASS(klass);
> +
> +    k->max_devs = 1;
>

Could you explain that limitation, in a code comment and/or commit message?

> +    k->nr_devs = 0;
> +}
> +
> +static const TypeInfo remote_object_info = {
> +    .name = TYPE_REMOTE_OBJECT,
> +    .parent = TYPE_OBJECT,
> +    .instance_size = sizeof(RemoteObject),
> +    .instance_init = remote_object_init,
> +    .instance_finalize = remote_object_finalize,
> +    .class_size = sizeof(RemoteObjectClass),
> +    .class_init = remote_object_class_init,
> +    .interfaces = (InterfaceInfo[]) {
> +        { TYPE_USER_CREATABLE },
> +        { }
> +    }
> +};
> +
> +static void register_types(void)
> +{
> +    type_register_static(&remote_object_info);
> +}
> +
> +type_init(register_types);
> diff --git a/MAINTAINERS b/MAINTAINERS
> index b64e4b8..aedfc27 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3144,6 +3144,8 @@ F: include/hw/remote/machine.h
>  F: hw/remote/mpqemu-link.c
>  F: include/hw/remote/mpqemu-link.h
>  F: hw/remote/message.c
> +F: include/hw/remote/remote-obj.h
> +F: hw/remote/remote-obj.c
>
>  Build and test automation
>  -------------------------
> diff --git a/hw/remote/meson.build b/hw/remote/meson.build
> index 9f5c57f..71d0a56 100644
> --- a/hw/remote/meson.build
> +++ b/hw/remote/meson.build
> @@ -3,5 +3,6 @@ remote_ss = ss.source_set()
>  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
>  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true:
> files('mpqemu-link.c'))
>  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
> +remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
>
>  softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau


* Re: [PATCH v12 11/19] multi-process: setup memory manager for remote device
  2020-12-01 20:22 ` [PATCH v12 11/19] multi-process: setup memory manager for remote device Jagannathan Raman
@ 2020-12-08 11:54   ` Marc-André Lureau
  2020-12-08 11:58   ` Marc-André Lureau
  1 sibling, 0 replies; 52+ messages in thread
From: Marc-André Lureau @ 2020-12-08 11:54 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Thanos Makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Paolo Bonzini


On Wed, Dec 2, 2020 at 12:23 AM Jagannathan Raman <jag.raman@oracle.com>
wrote:

> SyncSysMemMsg message format is defined. It is used to send
> file descriptors of the RAM regions to remote device.
> RAM on the remote device is configured with a set of file descriptors.
> Old RAM regions are deleted and new regions, each with an fd, is
> added to the RAM.
>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/hw/remote/memory.h      | 19 ++++++++++++++
>  include/hw/remote/mpqemu-link.h | 13 +++++++++
>  hw/remote/memory.c              | 58
> +++++++++++++++++++++++++++++++++++++++++
>  hw/remote/mpqemu-link.c         | 11 ++++++++
>  MAINTAINERS                     |  2 ++
>  hw/remote/meson.build           |  2 ++
>  6 files changed, 105 insertions(+)
>  create mode 100644 include/hw/remote/memory.h
>  create mode 100644 hw/remote/memory.c
>
> diff --git a/include/hw/remote/memory.h b/include/hw/remote/memory.h
> new file mode 100644
> index 0000000..4fd548e
> --- /dev/null
> +++ b/include/hw/remote/memory.h
> @@ -0,0 +1,19 @@
> +/*
> + * Memory manager for remote device
> + *
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef REMOTE_MEMORY_H
> +#define REMOTE_MEMORY_H
> +
> +#include "exec/hwaddr.h"
> +#include "hw/remote/mpqemu-link.h"
> +
> +void remote_sysmem_reconfig(MPQemuMsg *msg, Error **errp);
> +
> +#endif
> diff --git a/include/hw/remote/mpqemu-link.h
> b/include/hw/remote/mpqemu-link.h
> index 2d79ff8..070ac77 100644
> --- a/include/hw/remote/mpqemu-link.h
> +++ b/include/hw/remote/mpqemu-link.h
> @@ -14,6 +14,7 @@
>  #include "qom/object.h"
>  #include "qemu/thread.h"
>  #include "io/channel.h"
> +#include "exec/hwaddr.h"
>
>  #define REMOTE_MAX_FDS 8
>
> @@ -24,12 +25,22 @@
>   *
>   * MPQemuCmd enum type to specify the command to be executed on the remote
>   * device.
> + *
> + * SYNC_SYSMEM      Shares QEMU's RAM with remote device's RAM
>

My understanding is that this is deliberately a private protocol between
QEMU and remote host processes, so it can break at any time, and that in
the future it will be vfio-user based. Correct? It would be worth leaving
a note about that somewhere.

>   */
>  typedef enum {
>      MPQEMU_CMD_INIT,
> +    SYNC_SYSMEM,
> +    RET_MSG,
>      MPQEMU_CMD_MAX,
>  } MPQemuCmd;
>
> +typedef struct {
> +    hwaddr gpas[REMOTE_MAX_FDS];
> +    uint64_t sizes[REMOTE_MAX_FDS];
> +    off_t offsets[REMOTE_MAX_FDS];
> +} SyncSysmemMsg;
> +
>  /**
>   * MPQemuMsg:
>   * @cmd: The remote command
> @@ -40,12 +51,14 @@ typedef enum {
>   * MPQemuMsg Format of the message sent to the remote device from QEMU.
>   *
>   */
> +
>  typedef struct {
>      int cmd;
>      size_t size;
>
>      union {
>          uint64_t u64;
> +        SyncSysmemMsg sync_sysmem;
>      } data;
>
>      int fds[REMOTE_MAX_FDS];
> diff --git a/hw/remote/memory.c b/hw/remote/memory.c
> new file mode 100644
> index 0000000..6d1e830
> --- /dev/null
> +++ b/hw/remote/memory.c
> @@ -0,0 +1,58 @@
> +/*
> + * Memory manager for remote device
> + *
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include "hw/remote/memory.h"
> +#include "exec/address-spaces.h"
> +#include "exec/ram_addr.h"
> +#include "qapi/error.h"
> +
> +void remote_sysmem_reconfig(MPQemuMsg *msg, Error **errp)
> +{
> +    SyncSysmemMsg *sysmem_info = &msg->data.sync_sysmem;
> +    MemoryRegion *sysmem, *subregion, *next;
> +    static unsigned int suffix;
> +    Error *local_err = NULL;
> +    char *name;
> +    int region;
> +
> +    sysmem = get_system_memory();
> +
> +    memory_region_transaction_begin();
>

It looks like this transaction business isn't really helping here. Both
del_subregion and add_subregion already handle it.

> +
> +    QTAILQ_FOREACH_SAFE(subregion, &sysmem->subregions, subregions_link,
> next) {
> +        if (subregion->ram) {
> +            memory_region_del_subregion(sysmem, subregion);
> +            object_unparent(OBJECT(subregion));
> +        }
> +    }
> +
> +    for (region = 0; region < msg->num_fds; region++) {
> +        subregion = g_new(MemoryRegion, 1);
> +        name = g_strdup_printf("remote-mem-%u", suffix++);
> +        memory_region_init_ram_from_fd(subregion, NULL,
> +                                       name, sysmem_info->sizes[region],
> +                                       true, msg->fds[region],
> +                                       sysmem_info->offsets[region],
> +                                       &local_err);
> +        g_free(name);
>

We are quite happily using g_auto these days


> +        if (local_err) {
> +            error_propagate(errp, local_err);
>

Here, g_auto would help prevent leaking subregion.

ERRP_GUARD() could also let you drop local_err and error_propagate().
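
To illustrate the automatic-cleanup idea, here is a minimal standalone
sketch, not QEMU code. It assumes GCC or Clang and uses the `cleanup`
variable attribute, which is the mechanism GLib's g_autofree builds on.
AUTOFREE, free_ptr, SketchRegion and sketch_region_new are hypothetical
names made up for the example; the hand-over at the end plays the role
of g_steal_pointer().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cleanup handler: receives a pointer to the variable. */
static void free_ptr(void *p)
{
    free(*(void **)p);
}

/* Hypothetical stand-in for g_autofree (GCC/Clang cleanup attribute). */
#define AUTOFREE __attribute__((cleanup(free_ptr)))

/* Hypothetical stand-in for a region object. */
typedef struct {
    size_t size;
} SketchRegion;

/* Returns a new region, or NULL on failure (here: size == 0). */
static SketchRegion *sketch_region_new(size_t size)
{
    AUTOFREE SketchRegion *region = calloc(1, sizeof(*region));
    SketchRegion *ret;

    if (!region || size == 0) {
        return NULL;    /* region, if allocated, is freed automatically */
    }
    region->size = size;

    /* Success: hand ownership to the caller, like g_steal_pointer(). */
    ret = region;
    region = NULL;
    return ret;
}
```

With this shape, the error path cannot forget to free the allocation,
and no explicit free is needed before each return.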

> +            break;
> +        }
> +
> +        memory_region_add_subregion(sysmem, sysmem_info->gpas[region],
> +                                    subregion);
> +    }
> +
> +    memory_region_transaction_commit();
> +}
> diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
> index e535ed2..bbd9df3 100644
> --- a/hw/remote/mpqemu-link.c
> +++ b/hw/remote/mpqemu-link.c
> @@ -238,5 +238,16 @@ bool mpqemu_msg_valid(MPQemuMsg *msg)
>          }
>      }
>
> +     /* Verify message specific fields. */
> +    switch (msg->cmd) {
> +    case SYNC_SYSMEM:
> +        if (msg->num_fds == 0 || msg->size != sizeof(SyncSysmemMsg)) {
> +            return false;
> +        }
> +        break;
> +    default:
> +        break;
> +    }
> +
>      return true;
>  }
> diff --git a/MAINTAINERS b/MAINTAINERS
> index aedfc27..24cb36e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3146,6 +3146,8 @@ F: include/hw/remote/mpqemu-link.h
>  F: hw/remote/message.c
>  F: include/hw/remote/remote-obj.h
>  F: hw/remote/remote-obj.c
> +F: include/hw/remote/memory.h
> +F: hw/remote/memory.c
>
>  Build and test automation
>  -------------------------
> diff --git a/hw/remote/meson.build b/hw/remote/meson.build
> index 71d0a56..64da16c 100644
> --- a/hw/remote/meson.build
> +++ b/hw/remote/meson.build
> @@ -5,4 +5,6 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true:
> files('mpqemu-link.c'))
>  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
>  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
>
> +specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
> +
>  softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
> --
> 1.8.3.1
>
>
I can't comment much on the overall approach. It looks very
straightforward to me; I would be surprised if there were no underlying
subtleties. Again, is this only experimental until a vfio-user
implementation emerges? If not, having unit tests would be important (if
not required).

-- 
Marc-André Lureau


* Re: [PATCH v12 11/19] multi-process: setup memory manager for remote device
  2020-12-01 20:22 ` [PATCH v12 11/19] multi-process: setup memory manager for remote device Jagannathan Raman
  2020-12-08 11:54   ` Marc-André Lureau
@ 2020-12-08 11:58   ` Marc-André Lureau
  1 sibling, 0 replies; 52+ messages in thread
From: Marc-André Lureau @ 2020-12-08 11:58 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Thanos Makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 6794 bytes --]

On Wed, Dec 2, 2020 at 12:23 AM Jagannathan Raman <jag.raman@oracle.com>
wrote:

> SyncSysMemMsg message format is defined. It is used to send
> file descriptors of the RAM regions to remote device.
> RAM on the remote device is configured with a set of file descriptors.
> Old RAM regions are deleted and new regions, each with an fd, is
> added to the RAM.
>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/hw/remote/memory.h      | 19 ++++++++++++++
>  include/hw/remote/mpqemu-link.h | 13 +++++++++
>  hw/remote/memory.c              | 58
> +++++++++++++++++++++++++++++++++++++++++
>  hw/remote/mpqemu-link.c         | 11 ++++++++
>  MAINTAINERS                     |  2 ++
>  hw/remote/meson.build           |  2 ++
>  6 files changed, 105 insertions(+)
>  create mode 100644 include/hw/remote/memory.h
>  create mode 100644 hw/remote/memory.c
>
> diff --git a/include/hw/remote/memory.h b/include/hw/remote/memory.h
> new file mode 100644
> index 0000000..4fd548e
> --- /dev/null
> +++ b/include/hw/remote/memory.h
> @@ -0,0 +1,19 @@
> +/*
> + * Memory manager for remote device
> + *
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef REMOTE_MEMORY_H
> +#define REMOTE_MEMORY_H
> +
> +#include "exec/hwaddr.h"
> +#include "hw/remote/mpqemu-link.h"
> +
> +void remote_sysmem_reconfig(MPQemuMsg *msg, Error **errp);
> +
> +#endif
> diff --git a/include/hw/remote/mpqemu-link.h
> b/include/hw/remote/mpqemu-link.h
> index 2d79ff8..070ac77 100644
> --- a/include/hw/remote/mpqemu-link.h
> +++ b/include/hw/remote/mpqemu-link.h
> @@ -14,6 +14,7 @@
>  #include "qom/object.h"
>  #include "qemu/thread.h"
>  #include "io/channel.h"
> +#include "exec/hwaddr.h"
>
>  #define REMOTE_MAX_FDS 8
>
> @@ -24,12 +25,22 @@
>   *
>   * MPQemuCmd enum type to specify the command to be executed on the remote
>   * device.
> + *
> + * SYNC_SYSMEM      Shares QEMU's RAM with remote device's RAM
>   */
>  typedef enum {
>      MPQEMU_CMD_INIT,
> +    SYNC_SYSMEM,
> +    RET_MSG,
>

RET_MSG is not used in this patch, only in a later one. OK.

>      MPQEMU_CMD_MAX,
>  } MPQemuCmd;
>
> +typedef struct {
> +    hwaddr gpas[REMOTE_MAX_FDS];
> +    uint64_t sizes[REMOTE_MAX_FDS];
> +    off_t offsets[REMOTE_MAX_FDS];
> +} SyncSysmemMsg;
> +
>  /**
>   * MPQemuMsg:
>   * @cmd: The remote command
> @@ -40,12 +51,14 @@ typedef enum {
>   * MPQemuMsg Format of the message sent to the remote device from QEMU.
>   *
>   */
> +
>  typedef struct {
>      int cmd;
>      size_t size;
>
>      union {
>          uint64_t u64;
> +        SyncSysmemMsg sync_sysmem;
>      } data;
>
>      int fds[REMOTE_MAX_FDS];
> diff --git a/hw/remote/memory.c b/hw/remote/memory.c
> new file mode 100644
> index 0000000..6d1e830
> --- /dev/null
> +++ b/hw/remote/memory.c
> @@ -0,0 +1,58 @@
> +/*
> + * Memory manager for remote device
> + *
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include "hw/remote/memory.h"
> +#include "exec/address-spaces.h"
> +#include "exec/ram_addr.h"
> +#include "qapi/error.h"
> +
> +void remote_sysmem_reconfig(MPQemuMsg *msg, Error **errp)
> +{
> +    SyncSysmemMsg *sysmem_info = &msg->data.sync_sysmem;
> +    MemoryRegion *sysmem, *subregion, *next;
> +    static unsigned int suffix;
> +    Error *local_err = NULL;
> +    char *name;
> +    int region;
> +
> +    sysmem = get_system_memory();
> +
> +    memory_region_transaction_begin();
> +
> +    QTAILQ_FOREACH_SAFE(subregion, &sysmem->subregions, subregions_link,
> next) {
> +        if (subregion->ram) {
> +            memory_region_del_subregion(sysmem, subregion);
> +            object_unparent(OBJECT(subregion));
> +        }
> +    }
> +
> +    for (region = 0; region < msg->num_fds; region++) {
> +        subregion = g_new(MemoryRegion, 1);
> +        name = g_strdup_printf("remote-mem-%u", suffix++);
> +        memory_region_init_ram_from_fd(subregion, NULL,
> +                                       name, sysmem_info->sizes[region],
> +                                       true, msg->fds[region],
> +                                       sysmem_info->offsets[region],
> +                                       &local_err);
> +        g_free(name);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            break;
> +        }
> +
> +        memory_region_add_subregion(sysmem, sysmem_info->gpas[region],
> +                                    subregion);
> +    }
> +
> +    memory_region_transaction_commit();
> +}
> diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
> index e535ed2..bbd9df3 100644
> --- a/hw/remote/mpqemu-link.c
> +++ b/hw/remote/mpqemu-link.c
> @@ -238,5 +238,16 @@ bool mpqemu_msg_valid(MPQemuMsg *msg)
>          }
>      }
>
> +     /* Verify message specific fields. */
> +    switch (msg->cmd) {
> +    case SYNC_SYSMEM:
> +        if (msg->num_fds == 0 || msg->size != sizeof(SyncSysmemMsg)) {
> +            return false;
> +        }
> +        break;
> +    default:
> +        break;
> +    }
> +
>      return true;
>  }
> diff --git a/MAINTAINERS b/MAINTAINERS
> index aedfc27..24cb36e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3146,6 +3146,8 @@ F: include/hw/remote/mpqemu-link.h
>  F: hw/remote/message.c
>  F: include/hw/remote/remote-obj.h
>  F: hw/remote/remote-obj.c
> +F: include/hw/remote/memory.h
> +F: hw/remote/memory.c
>
>  Build and test automation
>  -------------------------
> diff --git a/hw/remote/meson.build b/hw/remote/meson.build
> index 71d0a56..64da16c 100644
> --- a/hw/remote/meson.build
> +++ b/hw/remote/meson.build
> @@ -5,4 +5,6 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true:
> files('mpqemu-link.c'))
>  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
>  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
>
> +specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
> +
>  softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau


* Re: [PATCH v12 10/19] multi-process: Associate fd of a PCIDevice with its object
  2020-12-07 14:03   ` Marc-André Lureau
@ 2020-12-08 12:07     ` Marc-André Lureau
  0 siblings, 0 replies; 52+ messages in thread
From: Marc-André Lureau @ 2020-12-08 12:07 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Thanos Makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 4495 bytes --]

On Mon, Dec 7, 2020 at 6:03 PM Marc-André Lureau <marcandre.lureau@gmail.com>
wrote:

> Hi
>
> On Wed, Dec 2, 2020 at 12:25 AM Jagannathan Raman <jag.raman@oracle.com>
> wrote:
>
>> Associate the file descriptor for a PCIDevice in remote process with
>> DeviceState object.
>>
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>>  include/hw/remote/remote-obj.h |  42 +++++++++++
>>  hw/remote/message.c            |   1 +
>>  hw/remote/remote-obj.c         | 154
>> +++++++++++++++++++++++++++++++++++++++++
>>  MAINTAINERS                    |   2 +
>>  hw/remote/meson.build          |   1 +
>>  5 files changed, 200 insertions(+)
>>  create mode 100644 include/hw/remote/remote-obj.h
>>  create mode 100644 hw/remote/remote-obj.c
>>
>> diff --git a/include/hw/remote/remote-obj.h
>> b/include/hw/remote/remote-obj.h
>> new file mode 100644
>> index 0000000..0e95813
>> --- /dev/null
>> +++ b/include/hw/remote/remote-obj.h
>> @@ -0,0 +1,42 @@
>> +/*
>> + * Copyright © 2020 Oracle and/or its affiliates.
>> + *
>> + * This work is licensed under the terms of the GNU GPL-v2, version 2 or
>> later.
>> + *
>> + * See the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#ifndef REMOTE_OBJECT_H
>> +#define REMOTE_OBJECT_H
>> +
>> +#include "io/channel.h"
>> +#include "qemu/notify.h"
>> +
>> +#define TYPE_REMOTE_OBJECT "x-remote-object"
>> +#define REMOTE_OBJECT(obj) \
>> +    OBJECT_CHECK(RemoteObject, (obj), TYPE_REMOTE_OBJECT)
>> +#define REMOTE_OBJECT_GET_CLASS(obj) \
>> +    OBJECT_GET_CLASS(RemoteObjectClass, (obj), TYPE_REMOTE_OBJECT)
>> +#define REMOTE_OBJECT_CLASS(klass) \
>> +    OBJECT_CLASS_CHECK(RemoteObjectClass, (klass), TYPE_REMOTE_OBJECT)
>> +
>> +typedef struct {
>> +    ObjectClass parent_class;
>> +
>> +    unsigned int nr_devs;
>> +    unsigned int max_devs;
>> +} RemoteObjectClass;
>> +
>> +typedef struct {
>> +    /* private */
>> +    Object parent;
>> +
>> +    Notifier machine_done;
>> +
>> +    int fd;
>> +    char *devid;
>> +    QIOChannel *ioc;
>> +} RemoteObject;
>> +
>> +#endif
>>
>
> There is no need for a header if the header isn't actually shared with
> various .c units. In this series, you can just declare those structs in
> remote-obj.c
>
> diff --git a/hw/remote/message.c b/hw/remote/message.c
>> index 5d87bf4..1f2edc7 100644
>> --- a/hw/remote/message.c
>> +++ b/hw/remote/message.c
>> @@ -56,6 +56,7 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
>>          }
>>      }
>>      qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
>> +    g_free(com);
>>
>
>
> Should be squashed with the previous patch
>
>
>>      return;
>>  }
>> diff --git a/hw/remote/remote-obj.c b/hw/remote/remote-obj.c
>> new file mode 100644
>> index 0000000..3b4c0d4
>> --- /dev/null
>> +++ b/hw/remote/remote-obj.c
>> @@ -0,0 +1,154 @@
>> +/*
>> + * Copyright © 2020 Oracle and/or its affiliates.
>> + *
>> + * This work is licensed under the terms of the GNU GPL-v2, version 2 or
>> later.
>> + *
>> + * See the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu-common.h"
>> +
>> +#include "hw/remote/remote-obj.h"
>> +#include "qemu/error-report.h"
>> +#include "qom/object_interfaces.h"
>> +#include "hw/qdev-core.h"
>> +#include "io/channel.h"
>> +#include "hw/qdev-core.h"
>> +#include "hw/remote/machine.h"
>> +#include "io/channel-util.h"
>> +#include "qapi/error.h"
>> +#include "sysemu/sysemu.h"
>> +#include "hw/pci/pci.h"
>> +
>> +static void remote_object_set_fd(Object *obj, const char *str, Error
>> **errp)
>> +{
>> +    RemoteObject *o = REMOTE_OBJECT(obj);
>> +
>> +    o->fd = atoi(str);
>>
>
>  qemu_strtoi() has better error handling semantics. You may also want to
> check it's a valid socket fd with fd_is_socket().
>
> Alternatively, you could use qemu_open() which allows to open from QMP
> fdset ("/dev/fdset/..."). This should be more flexible.
>


A better alternative is qemu_parse_fd().

In a later patch, you use monitor_fd_param(monitor_cur(), ...) for the
x-pci-proxy-dev "fd" property.

That might be the right API; for consistency, use it here too?
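
To show why atoi() is a poor fit regardless of which QEMU helper ends up
being used, here is a minimal standalone sketch in plain C. It is not a
QEMU API: parse_fd_strict is a hypothetical helper that uses strtol() to
reject the inputs atoi() silently accepts.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not a QEMU API): parse a decimal fd number.
 * Returns the fd on success, or -1 on empty input, trailing junk,
 * a negative value, or overflow.
 */
static int parse_fd_strict(const char *str)
{
    char *end = NULL;
    long val;

    if (!str || *str == '\0') {
        return -1;
    }

    errno = 0;
    val = strtol(str, &end, 10);
    assert(end != NULL);            /* strtol always sets the end pointer */

    if (errno != 0 || end == str || *end != '\0') {
        return -1;                  /* overflow, no digits, or trailing junk */
    }
    if (val < 0 || val > INT_MAX) {
        return -1;                  /* not a plausible file descriptor */
    }
    return (int)val;
}
```

By contrast, atoi("abc") returns 0, which is a valid fd number, so a typo
on the command line would go unnoticed.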


-- 
Marc-André Lureau


* Re: [PATCH v12 12/19] multi-process: introduce proxy object
  2020-12-01 20:22 ` [PATCH v12 12/19] multi-process: introduce proxy object Jagannathan Raman
@ 2020-12-08 12:23   ` Marc-André Lureau
  0 siblings, 0 replies; 52+ messages in thread
From: Marc-André Lureau @ 2020-12-08 12:23 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Thanos Makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 6808 bytes --]

Hi

On Wed, Dec 2, 2020 at 12:23 AM Jagannathan Raman <jag.raman@oracle.com>
wrote:

> From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>
> Defines a PCI Device proxy object as a child of TYPE_PCI_DEVICE.
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/hw/remote/proxy.h | 36 +++++++++++++++++
>  hw/remote/proxy.c         | 98
> +++++++++++++++++++++++++++++++++++++++++++++++
>  MAINTAINERS               |  2 +
>  hw/remote/meson.build     |  1 +
>  4 files changed, 137 insertions(+)
>  create mode 100644 include/hw/remote/proxy.h
>  create mode 100644 hw/remote/proxy.c
>
> diff --git a/include/hw/remote/proxy.h b/include/hw/remote/proxy.h
> new file mode 100644
> index 0000000..923432a
> --- /dev/null
> +++ b/include/hw/remote/proxy.h
> @@ -0,0 +1,36 @@
> +/*
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef PROXY_H
> +#define PROXY_H
> +
> +#include "hw/pci/pci.h"
> +#include "io/channel.h"
> +
> +#define TYPE_PCI_PROXY_DEV "x-pci-proxy-dev"
> +
> +#define PCI_PROXY_DEV(obj) \
> +            OBJECT_CHECK(PCIProxyDev, (obj), TYPE_PCI_PROXY_DEV)
> +typedef struct PCIProxyDev PCIProxyDev;
> +
> +struct PCIProxyDev {
> +    PCIDevice parent_dev;
> +    char *fd;
> +
> +    /*
> +     * Mutex used to protect the QIOChannel fd from
> +     * the concurrent access by the VCPUs since proxy
> +     * blocks while awaiting for the replies from the
> +     * process remote.
> +     */
> +    QemuMutex io_mutex;
> +    QIOChannel *ioc;
> +    Error *migration_blocker;
> +};
> +
> +#endif /* PROXY_H */
> diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
> new file mode 100644
> index 0000000..29100bc
> --- /dev/null
> +++ b/hw/remote/proxy.c
> @@ -0,0 +1,98 @@
> +/*
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include "hw/remote/proxy.h"
> +#include "hw/pci/pci.h"
> +#include "qapi/error.h"
> +#include "io/channel-util.h"
> +#include "hw/qdev-properties.h"
> +#include "monitor/monitor.h"
> +#include "migration/blocker.h"
> +
> +static void proxy_set_socket(PCIProxyDev *pdev, int fd, Error **errp)
> +{
> +    pdev->ioc = qio_channel_new_fd(fd, errp);
> +}
>

That one-line function (called once) should be inlined.

> +
> +static Property proxy_properties[] = {
> +    DEFINE_PROP_STRING("fd", PCIProxyDev, fd),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
>

Generally we put properties just above class_init, where they are used.

> +static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
> +{
>

(errp shouldn't be NULL, but ERRP_GUARD would be safer)

> +    PCIProxyDev *dev = PCI_PROXY_DEV(device);
> +    int fd;
> +
> +    if (dev->fd) {
> +        fd = monitor_fd_param(monitor_cur(), dev->fd, errp);
> +        if (fd == -1) {
> +            error_prepend(errp, "proxy: unable to parse fd: ");
> +            return;
> +        }
> +        proxy_set_socket(dev, fd, errp);
> +    } else {
> +        error_setg(errp, "fd parameter not specified for %s",
> +                   DEVICE(device)->id);
> +        return;
>

We prefer early return, to keep the code easy to read.

if (!dev->fd) {
  return error...
}

the_normal_thing();
...
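
Spelled out a bit more, as a minimal standalone sketch rather than the
actual realize function: FakeProxyDev and fake_realize are hypothetical
stand-ins, and errors are reported through a plain string instead of
Error **errp.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the proxy device state. */
typedef struct {
    const char *fd_str;     /* the "fd" property, NULL if not set */
    int channel_fd;         /* -1 until realized */
} FakeProxyDev;

/* Returns 0 on success; on failure returns -1 and sets *err. */
static int fake_realize(FakeProxyDev *dev, const char **err)
{
    char *end;
    long fd;

    assert(dev && err);

    /* Error cases leave first ... */
    if (!dev->fd_str) {
        *err = "fd parameter not specified";
        return -1;
    }

    fd = strtol(dev->fd_str, &end, 10);
    if (end == dev->fd_str || *end != '\0' || fd < 0 || fd > INT_MAX) {
        *err = "unable to parse fd";
        return -1;
    }

    /* ... so the normal path stays at the top indentation level. */
    dev->channel_fd = (int)fd;
    return 0;
}
```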

> +    }
> +
> +    error_setg(&dev->migration_blocker, "%s does not support migration",
> +               TYPE_PCI_PROXY_DEV);
> +    if (migrate_add_blocker(dev->migration_blocker, errp)) {
> +        error_free(dev->migration_blocker);
>

leave that free to dev_exit() ?


> +        error_free(*errp);
> +        dev->migration_blocker = NULL;
> +        error_setg(errp, "Failed to set migration blocker");
>

Why not use the error from migrate_add_blocker?

> +    }
> +
> +    qemu_mutex_init(&dev->io_mutex);
> +    qio_channel_set_blocking(dev->ioc, true, NULL);
> +}
> +
> +static void pci_proxy_dev_exit(PCIDevice *pdev)
> +{
> +    PCIProxyDev *dev = PCI_PROXY_DEV(pdev);
> +
> +    qio_channel_close(dev->ioc, NULL);
>

On early return in realize, dev->ioc is NULL, so this will crash.
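
A minimal standalone sketch of a teardown that tolerates a device which
never finished realize. SketchChannel, SketchProxy and the sketch_*
functions are hypothetical stand-ins for QIOChannel, PCIProxyDev and
their helpers, not the real API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for QIOChannel and PCIProxyDev. */
typedef struct {
    int unused;
} SketchChannel;

typedef struct {
    SketchChannel *ioc;     /* NULL if realize returned early */
} SketchProxy;

static int sketch_close_calls;

/* Like the real close path, this must not be handed a NULL channel. */
static void sketch_channel_close(SketchChannel *ioc)
{
    assert(ioc != NULL);
    sketch_close_calls++;
    free(ioc);
}

static void sketch_proxy_exit(SketchProxy *dev)
{
    /* Only tear down what realize actually set up. */
    if (dev->ioc) {
        sketch_channel_close(dev->ioc);
        dev->ioc = NULL;
    }
}
```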


> +
> +    migrate_del_blocker(dev->migration_blocker);
>

So is migration_blocker, but this should be safe with NULL


> +
> +    error_free(dev->migration_blocker);
> +}
> +
> +static void pci_proxy_dev_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> +
> +    k->realize = pci_proxy_dev_realize;
> +    k->exit = pci_proxy_dev_exit;
> +    device_class_set_props(dc, proxy_properties);
> +}
> +
> +static const TypeInfo pci_proxy_dev_type_info = {
> +    .name          = TYPE_PCI_PROXY_DEV,
> +    .parent        = TYPE_PCI_DEVICE,
> +    .instance_size = sizeof(PCIProxyDev),
> +    .class_init    = pci_proxy_dev_class_init,
> +    .interfaces = (InterfaceInfo[]) {
> +        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
> +        { },
> +    },
> +};
> +
> +static void pci_proxy_dev_register_types(void)
> +{
> +    type_register_static(&pci_proxy_dev_type_info);
> +}
> +
> +type_init(pci_proxy_dev_register_types)
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 24cb36e..ebd1d1d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3148,6 +3148,8 @@ F: include/hw/remote/remote-obj.h
>  F: hw/remote/remote-obj.c
>  F: include/hw/remote/memory.h
>  F: hw/remote/memory.c
> +F: hw/remote/proxy.c
> +F: include/hw/remote/proxy.h
>
>  Build and test automation
>  -------------------------
> diff --git a/hw/remote/meson.build b/hw/remote/meson.build
> index 64da16c..569cd20 100644
> --- a/hw/remote/meson.build
> +++ b/hw/remote/meson.build
> @@ -4,6 +4,7 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true:
> files('machine.c'))
>  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true:
> files('mpqemu-link.c'))
>  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
>  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
> +remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
>
>  specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
>
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau


* Re: [PATCH v12 13/19] multi-process: add proxy communication functions
  2020-12-01 20:22 ` [PATCH v12 13/19] multi-process: add proxy communication functions Jagannathan Raman
@ 2020-12-08 12:39   ` Marc-André Lureau
  0 siblings, 0 replies; 52+ messages in thread
From: Marc-André Lureau @ 2020-12-08 12:39 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Thanos Makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 3560 bytes --]

Hi

On Wed, Dec 2, 2020 at 12:23 AM Jagannathan Raman <jag.raman@oracle.com>
wrote:

> From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/hw/remote/mpqemu-link.h |  4 ++++
>  hw/remote/mpqemu-link.c         | 38
> ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+)
>
> diff --git a/include/hw/remote/mpqemu-link.h
> b/include/hw/remote/mpqemu-link.h
> index 070ac77..cee9468 100644
> --- a/include/hw/remote/mpqemu-link.h
> +++ b/include/hw/remote/mpqemu-link.h
> @@ -15,6 +15,8 @@
>  #include "qemu/thread.h"
>  #include "io/channel.h"
>  #include "exec/hwaddr.h"
> +#include "io/channel-socket.h"
> +#include "hw/remote/proxy.h"
>
>  #define REMOTE_MAX_FDS 8
>
> @@ -65,6 +67,8 @@ typedef struct {
>      int num_fds;
>  } MPQemuMsg;
>
> +uint64_t mpqemu_msg_send_and_await_reply(MPQemuMsg *msg, PCIProxyDev
> *pdev,
> +                                         Error **errp);
>  void mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
>  void mpqemu_msg_recv(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
>
> diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
> index bbd9df3..18c8a54 100644
> --- a/hw/remote/mpqemu-link.c
> +++ b/hw/remote/mpqemu-link.c
> @@ -17,6 +17,7 @@
>  #include "qemu/iov.h"
>  #include "qemu/error-report.h"
>  #include "qemu/main-loop.h"
> +#include "io/channel.h"
>
>  /*
>   * Send message over the ioc QIOChannel.
> @@ -219,6 +220,43 @@ fail:
>      }
>  }
>
> +/*
> + * Called from VCPU thread in non-coroutine context.
>

You could check that precondition in code early on, presumably.

The comment could use some further description, like "Send @msg and wait
for a RET_MSG reply. Returns the associated u64 message code, or UINT64_MAX
on error."

> + */
> +uint64_t mpqemu_msg_send_and_await_reply(MPQemuMsg *msg, PCIProxyDev
> *pdev,
> +                                         Error **errp)
> +{
> +    MPQemuMsg msg_reply = {0};
> +    uint64_t ret = UINT64_MAX;
> +    Error *local_err = NULL;
> +
>

Should work with ERRP_GUARD instead

> +    qemu_mutex_lock(&pdev->io_mutex);
>

Should work with QEMU_LOCK_GUARD

> +    mpqemu_msg_send(msg, pdev->ioc, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        goto exit_send;
> +    }
> +
>

By making mpqemu_msg_send() return true on success, this should then become
simply

if (!mpqemu_msg_send(msg, pdev->ioc, errp))
  return ret;

> +    mpqemu_msg_recv(&msg_reply, pdev->ioc, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        goto exit_send;
> +    }
> +
> +    if (!mpqemu_msg_valid(&msg_reply) || msg_reply.cmd != RET_MSG) {
> +        error_setg(errp, "ERROR: Invalid reply received for command %d",
> +                         msg->cmd);
> +        goto exit_send;
> +    } else {
>

That else is unneeded.

With automatic cleanups, the function should be simpler.
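
Roughly the shape I have in mind, as a minimal standalone sketch rather
than QEMU code. It assumes GCC or Clang (the `cleanup` attribute, which
scope-based lock guards rely on). FakeMutex, LOCK_GUARD, fake_send and
fake_recv are hypothetical stand-ins for QemuMutex, QEMU_LOCK_GUARD and
bool-returning mpqemu_msg_send()/mpqemu_msg_recv().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for QemuMutex; tracks only whether it is held. */
typedef struct {
    int held;
} FakeMutex;

static void fake_lock(FakeMutex *m)
{
    assert(!m->held);
    m->held = 1;
}

static void fake_unlock(FakeMutex *m)
{
    assert(m->held);
    m->held = 0;
}

/* Cleanup handler: called with a pointer to the guard variable. */
static void guard_release(FakeMutex **guard)
{
    if (*guard) {
        fake_unlock(*guard);
    }
}

/* Lock now, unlock automatically when the enclosing scope is left. */
#define LOCK_GUARD(m) \
    FakeMutex *lock_guard_ \
        __attribute__((cleanup(guard_release), unused)) = (fake_lock(m), (m))

/* Hypothetical transport stubs that return true on success. */
static bool fake_send(bool ok)
{
    return ok;
}

static bool fake_recv(bool ok, uint64_t *val)
{
    if (!ok) {
        return false;
    }
    *val = 42;
    return true;
}

/* Returns the reply value, or UINT64_MAX on error. */
static uint64_t send_and_await_reply(FakeMutex *m, bool send_ok, bool recv_ok)
{
    uint64_t reply;

    LOCK_GUARD(m);

    if (!fake_send(send_ok)) {
        return UINT64_MAX;      /* mutex released by the guard */
    }
    if (!fake_recv(recv_ok, &reply)) {
        return UINT64_MAX;      /* mutex released by the guard */
    }
    return reply;
}
```

No goto label, no else branch, and every return path drops the lock.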

> +        ret = msg_reply.data.u64;
> +    }
> +
> + exit_send:
> +    qemu_mutex_unlock(&pdev->io_mutex);
> +
> +    return ret;
> +}
> +
>  bool mpqemu_msg_valid(MPQemuMsg *msg)
>  {
>      if (msg->cmd >= MPQEMU_CMD_MAX && msg->cmd < 0) {
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau


* Re: [PATCH v12 14/19] multi-process: Forward PCI config space acceses to the remote process
  2020-12-01 20:22 ` [PATCH v12 14/19] multi-process: Forward PCI config space acceses to the remote process Jagannathan Raman
@ 2020-12-08 12:52   ` Marc-André Lureau
  0 siblings, 0 replies; 52+ messages in thread
From: Marc-André Lureau @ 2020-12-08 12:52 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Thanos Makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Paolo Bonzini

Hi

On Wed, Dec 2, 2020 at 12:25 AM Jagannathan Raman <jag.raman@oracle.com>
wrote:

> From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>
> The Proxy Object sends the PCI config space accesses as messages
> to the remote process over the communication channel
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/hw/remote/mpqemu-link.h |  9 ++++++
>  hw/remote/message.c             | 62
> +++++++++++++++++++++++++++++++++++++++++
>  hw/remote/mpqemu-link.c         |  6 ++++
>  hw/remote/proxy.c               | 51 +++++++++++++++++++++++++++++++++
>  4 files changed, 128 insertions(+)
>
> diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
> index cee9468..057c98b 100644
> --- a/include/hw/remote/mpqemu-link.h
> +++ b/include/hw/remote/mpqemu-link.h
> @@ -34,6 +34,8 @@ typedef enum {
>      MPQEMU_CMD_INIT,
>      SYNC_SYSMEM,
>      RET_MSG,
> +    PCI_CONFIG_WRITE,
> +    PCI_CONFIG_READ,
>      MPQEMU_CMD_MAX,
>

It would be a good idea to prefix all enum values with MPQEMU_CMD_.

>  } MPQemuCmd;
>
> @@ -43,6 +45,12 @@ typedef struct {
>      off_t offsets[REMOTE_MAX_FDS];
>  } SyncSysmemMsg;
>
> +typedef struct {
> +    uint32_t addr;
> +    uint32_t val;
> +    int l;
>

"len" please, it's already short enough :)

> +} PciConfDataMsg;
> +
>  /**
>   * MPQemuMsg:
>   * @cmd: The remote command
> @@ -60,6 +68,7 @@ typedef struct {
>
>      union {
>          uint64_t u64;
> +        PciConfDataMsg pci_conf_data;
>          SyncSysmemMsg sync_sysmem;
>      } data;
>
> diff --git a/hw/remote/message.c b/hw/remote/message.c
> index 1f2edc7..52a6f6f 100644
> --- a/hw/remote/message.c
> +++ b/hw/remote/message.c
> @@ -15,6 +15,12 @@
>  #include "hw/remote/mpqemu-link.h"
>  #include "qapi/error.h"
>  #include "sysemu/runstate.h"
> +#include "hw/pci/pci.h"
> +
> +static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
> +                                 MPQemuMsg *msg);
> +static void process_config_read(QIOChannel *ioc, PCIDevice *dev,
> +                                MPQemuMsg *msg);
>
>  void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
>  {
> @@ -43,6 +49,12 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
>          }
>
>          switch (msg.cmd) {
> +        case PCI_CONFIG_WRITE:
> +            process_config_write(com->ioc, pci_dev, &msg);
> +            break;
>

Some other process_ functions take &local_err. I think this is a better
idea, so that error reporting is done in a single place, from
mpqemu_remote_msg_loop_co(), for now.

> +        case PCI_CONFIG_READ:
> +            process_config_read(com->ioc, pci_dev, &msg);
> +            break;
>          default:
>              error_setg(&local_err,
>                         "Unknown command (%d) received for device %s (pid=%d)",
> @@ -60,3 +72,53 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
>
>      return;
>  }
> +
> +static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
> +                                 MPQemuMsg *msg)
> +{
> +    PciConfDataMsg *conf = (PciConfDataMsg *)&msg->data.pci_conf_data;
> +    MPQemuMsg ret = { 0 };
> +    Error *local_err = NULL;
> +
> +    if ((conf->addr + sizeof(conf->val)) > pci_config_size(dev)) {
> +        error_report("Bad address received when writing PCI config, pid %d",
> +                     getpid());
>

Should use FMT_pid.

But then, I am not sure some error messages should include the PID while
most of them don't. That should be the job of either a log manager or a
custom logger function/option.

> +        ret.data.u64 = UINT64_MAX;
> +    } else {
> +        pci_default_write_config(dev, conf->addr, conf->val, conf->l);
> +    }
> +
> +    ret.cmd = RET_MSG;
> +    ret.size = sizeof(ret.data.u64);
> +
> +    mpqemu_msg_send(&ret, ioc, &local_err);
> +    if (local_err) {
> +        error_report("Could not send message to proxy from pid %d",
> +                     getpid());
>

Here it leaks local_err, and ignores it. Use error_prepend instead?

> +    }
> +}
> +
> +static void process_config_read(QIOChannel *ioc, PCIDevice *dev,
> +                                MPQemuMsg *msg)
> +{
> +    PciConfDataMsg *conf = (PciConfDataMsg *)&msg->data.pci_conf_data;
> +    MPQemuMsg ret = { 0 };
> +    Error *local_err = NULL;
> +
> +    if ((conf->addr + sizeof(conf->val)) > pci_config_size(dev)) {
> +        error_report("Bad address received when reading PCI config, pid %d",
> +                     getpid());
> +        ret.data.u64 = UINT64_MAX;
> +    } else {
> +        ret.data.u64 = pci_default_read_config(dev, conf->addr, conf->l);
> +    }
> +
> +    ret.cmd = RET_MSG;
> +    ret.size = sizeof(ret.data.u64);
> +
> +    mpqemu_msg_send(&ret, ioc, &local_err);
> +    if (local_err) {
> +        error_report("Could not send message to proxy from pid %d",
> +                     getpid());
>

Same as earlier

> +    }
> +}
> diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
> index 18c8a54..83dbd65 100644
> --- a/hw/remote/mpqemu-link.c
> +++ b/hw/remote/mpqemu-link.c
> @@ -283,6 +283,12 @@ bool mpqemu_msg_valid(MPQemuMsg *msg)
>              return false;
>          }
>          break;
> +    case PCI_CONFIG_WRITE:
> +    case PCI_CONFIG_READ:
> +        if (msg->size != sizeof(PciConfDataMsg)) {
> +            return false;
> +        }
> +        break;
>      default:
>          break;
>      }
> diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
> index 29100bc..c193484 100644
> --- a/hw/remote/proxy.c
> +++ b/hw/remote/proxy.c
> @@ -16,6 +16,8 @@
>  #include "hw/qdev-properties.h"
>  #include "monitor/monitor.h"
>  #include "migration/blocker.h"
> +#include "hw/remote/mpqemu-link.h"
> +#include "qemu/error-report.h"
>
>  static void proxy_set_socket(PCIProxyDev *pdev, int fd, Error **errp)
>  {
> @@ -69,6 +71,52 @@ static void pci_proxy_dev_exit(PCIDevice *pdev)
>      error_free(dev->migration_blocker);
>  }
>
> +static int config_op_send(PCIProxyDev *pdev, uint32_t addr, uint32_t *val,
> +                          int l, unsigned int op)
> +{
> +    MPQemuMsg msg = { 0 };
> +    uint64_t ret = -EINVAL;
> +    Error *local_err = NULL;
> +
> +    msg.cmd = op;
> +    msg.data.pci_conf_data.addr = addr;
> +    msg.data.pci_conf_data.val = (op == PCI_CONFIG_WRITE) ? *val : 0;
> +    msg.data.pci_conf_data.l = l;
> +    msg.size = sizeof(PciConfDataMsg);
> +
> +    ret = mpqemu_msg_send_and_await_reply(&msg, pdev, &local_err);
> +    if (local_err) {
> +        error_report_err(local_err);
> +    }
> +    if (op == PCI_CONFIG_READ) {
> +        *val = (uint32_t)ret;
>

That's a suspicious cast, without error checking.
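
For what it's worth, a checked narrowing could look like the standalone
sketch below. It assumes UINT64_MAX is the error sentinel, as suggested for
the mpqemu_msg_send_and_await_reply() comment; reply_to_config_val() is a
hypothetical helper name, not something from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: narrow a 64-bit reply to a 32-bit config value.
 * Assumes UINT64_MAX is the error sentinel. Returns false and leaves
 * *val untouched if the reply is the sentinel or does not fit in 32 bits.
 */
static bool reply_to_config_val(uint64_t reply, uint32_t *val)
{
    if (reply == UINT64_MAX || reply > UINT32_MAX) {
        return false;
    }
    *val = (uint32_t)reply;
    return true;
}
```

Without such a check, truncating UINT64_MAX yields 0xFFFFFFFF, which the
caller cannot tell apart from a legitimate all-ones 32-bit read.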

> +    }
> +
> +    return ret;
> +}
> +
> +static uint32_t pci_proxy_read_config(PCIDevice *d, uint32_t addr, int len)
> +{
> +    uint32_t val;
> +
> +    (void)config_op_send(PCI_PROXY_DEV(d), addr, &val, len, PCI_CONFIG_READ);
>

I don't know why there is a (void) cast here, please enlighten me.

> +
> +    return val;
> +}
> +
> +static void pci_proxy_write_config(PCIDevice *d, uint32_t addr, uint32_t val,
> +                                   int l)
> +{
> +    /*
> +     * Some of the functions access the copy of remote device's PCI config
> +     * space which is cached in the proxy device. Therefore, maintain
> +     * it updated.
> +     */
> +    pci_default_write_config(d, addr, val, l);
> +
> +    (void)config_op_send(PCI_PROXY_DEV(d), addr, &val, l, PCI_CONFIG_WRITE);
>

again

> +}
> +
>  static void pci_proxy_dev_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
> @@ -76,6 +124,9 @@ static void pci_proxy_dev_class_init(ObjectClass *klass, void *data)
>
>      k->realize = pci_proxy_dev_realize;
>      k->exit = pci_proxy_dev_exit;
> +    k->config_read = pci_proxy_read_config;
> +    k->config_write = pci_proxy_write_config;
> +
>      device_class_set_props(dc, proxy_properties);
>  }
>
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau


* Re: [PATCH v12 16/19] multi-process: Synchronize remote memory
  2020-12-01 20:22 ` [PATCH v12 16/19] multi-process: Synchronize remote memory Jagannathan Raman
@ 2020-12-08 13:57   ` Marc-André Lureau
  2020-12-09 16:18     ` Jag Raman
  0 siblings, 1 reply; 52+ messages in thread
From: Marc-André Lureau @ 2020-12-08 13:57 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Thanos Makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Paolo Bonzini

Hi

On Wed, Dec 2, 2020 at 12:23 AM Jagannathan Raman <jag.raman@oracle.com>
wrote:

> Add memory-listener object which is used to keep the view of the RAM
> in sync between QEMU and remote process.
> A MemoryListener is registered for system-memory AddressSpace. The
> listener sends SYNC_SYSMEM message to the remote process when memory
> listener commits the changes to memory, the remote process receives
> the message and processes it in the handler for SYNC_SYSMEM message.
>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/hw/remote/memory-sync.h |  27 ++++++
>  include/hw/remote/proxy.h       |   2 +
>  hw/remote/memory-sync.c         | 210
> ++++++++++++++++++++++++++++++++++++++++
>  hw/remote/message.c             |   5 +
>  hw/remote/proxy.c               |   6 ++
>  MAINTAINERS                     |   2 +
>  hw/remote/meson.build           |   1 +
>  7 files changed, 253 insertions(+)
>  create mode 100644 include/hw/remote/memory-sync.h
>  create mode 100644 hw/remote/memory-sync.c
>
> diff --git a/include/hw/remote/memory-sync.h b/include/hw/remote/memory-sync.h
> new file mode 100644
> index 0000000..785f76a
> --- /dev/null
> +++ b/include/hw/remote/memory-sync.h
> @@ -0,0 +1,27 @@
> +/*
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef MEMORY_SYNC_H
> +#define MEMORY_SYNC_H
> +
> +#include "exec/memory.h"
> +#include "io/channel.h"
> +
> +typedef struct RemoteMemSync {
> +    MemoryListener listener;
> +
> +    int n_mr_sections;
> +    MemoryRegionSection *mr_sections;
> +
> +    QIOChannel *ioc;
> +} RemoteMemSync;
> +
> +void configure_memory_sync(RemoteMemSync *sync, QIOChannel *ioc);
> +void deconfigure_memory_sync(RemoteMemSync *sync);
>

RemoteMemSync vs MemorySync, and function with _memory_sync suffixes...
Naming things is hard, but trying to be consistent generally helps.

My understanding is that this is a proxy-dev helper to handle memory
listening and sending SYNC_SYSMEM.

I would thus suggest naming it ProxyMemoryListener. It could eventually be
folded in proxy.c

Please try to be consistent with header naming, structure naming, type,
functions and enum prefixes etc.

proxy_memory_listener isn't that long imho.

> +
> +#endif
> diff --git a/include/hw/remote/proxy.h b/include/hw/remote/proxy.h
> index e29c61b..a687b7d 100644
> --- a/include/hw/remote/proxy.h
> +++ b/include/hw/remote/proxy.h
> @@ -11,6 +11,7 @@
>
>  #include "hw/pci/pci.h"
>  #include "io/channel.h"
> +#include "hw/remote/memory-sync.h"
>
>  #define TYPE_PCI_PROXY_DEV "x-pci-proxy-dev"
>
> @@ -40,6 +41,7 @@ struct PCIProxyDev {
>      QemuMutex io_mutex;
>      QIOChannel *ioc;
>      Error *migration_blocker;
> +    RemoteMemSync sync;
>      ProxyMemoryRegion region[PCI_NUM_REGIONS];
>  };
>
> diff --git a/hw/remote/memory-sync.c b/hw/remote/memory-sync.c
> new file mode 100644
> index 0000000..2365e69
> --- /dev/null
> +++ b/hw/remote/memory-sync.c
> @@ -0,0 +1,210 @@
> +/*
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include "qemu/compiler.h"
> +#include "qemu/int128.h"
> +#include "qemu/range.h"
> +#include "exec/memory.h"
> +#include "exec/cpu-common.h"
> +#include "cpu.h"
> +#include "exec/ram_addr.h"
> +#include "exec/address-spaces.h"
> +#include "hw/remote/mpqemu-link.h"
> +#include "hw/remote/memory-sync.h"
> +
> +static void proxy_ml_begin(MemoryListener *listener)
>

I suggest renaming begin -> reset.

> +{
> +    RemoteMemSync *sync = container_of(listener, RemoteMemSync, listener);
> +    int mrs;
> +
> +    for (mrs = 0; mrs < sync->n_mr_sections; mrs++) {
> +        memory_region_unref(sync->mr_sections[mrs].mr);
> +    }
> +
> +    g_free(sync->mr_sections);
> +    sync->mr_sections = NULL;
> +    sync->n_mr_sections = 0;
> +}
> +
> +static int get_fd_from_hostaddr(uint64_t host, ram_addr_t *offset)
>

This function is very similar to vhost_user_get_mr_data(). That suggests we
could factor the code.

Perhaps a new memory_region_from_host_full(), or extend
memory_region_from_host() with an extra optional "int *fd" argument.


> +{
> +    MemoryRegion *mr;
> +    ram_addr_t off;
> +
> +    /**
> +     * Assumes that the host address is a valid address as it's
> +     * coming from the MemoryListener system. In the case host
> +     * address is not valid, the following call would return
> +     * the default subregion of "system_memory" region, and
> +     * not NULL. So it's not possible to check for NULL here.
> +     */
> +    mr = memory_region_from_host((void *)(uintptr_t)host, &off);
> +
> +    if (offset) {
> +        *offset = off;
> +    }
> +
> +    return memory_region_get_fd(mr);
> +}
> +
> +static bool proxy_mrs_can_merge(uint64_t host, uint64_t prev_host, size_t size)
> +{
>

This seems similar to vhost_user_can_merge().

> +    bool merge;
> +    int fd1, fd2;
> +
> +    fd1 = get_fd_from_hostaddr(host, NULL);
> +
> +    fd2 = get_fd_from_hostaddr(prev_host, NULL);
> +
> +    merge = (fd1 == fd2);
>

This could be written in a simpler manner, ex:

if (get_fd_from_hostaddr(host, NULL) != get_fd_from_hostaddr(prev_host, NULL))
  return false;

> +
> +    merge &= ((prev_host + size) == host);
>

That check could be done early on before doing the more expensive
memory_region_from_host() calls
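
Putting both remarks together, the reordered function could look roughly
like this standalone sketch. fake_fd_from_hostaddr() is a made-up stand-in
for get_fd_from_hostaddr() (it pretends each 1 MiB block has its own fd),
and fd_lookups exists only to show how often the expensive path is taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static int fd_lookups;   /* counts calls to the (pretend) expensive lookup */

/* Hypothetical stand-in for get_fd_from_hostaddr(): one fd per 1 MiB block. */
static int fake_fd_from_hostaddr(uint64_t host)
{
    fd_lookups++;
    return (int)(host >> 20);
}

static bool can_merge(uint64_t host, uint64_t prev_host, size_t size)
{
    /* Cheap contiguity test first. */
    if (prev_host + size != host) {
        return false;
    }

    /* Only contiguous ranges pay for the two lookups. */
    return fake_fd_from_hostaddr(host) == fake_fd_from_hostaddr(prev_host);
}
```

Non-contiguous sections are rejected without touching the lookup at all, and
the function no longer needs the merge, fd1 and fd2 locals.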

> +
> +    return merge;
> +}
> +
> +static bool try_merge(RemoteMemSync *sync, MemoryRegionSection *section)
> +{
> +    uint64_t mrs_size, mrs_gpa, mrs_page;
> +    MemoryRegionSection *prev_sec;
> +    bool merged = false;
> +    uintptr_t mrs_host;
> +    RAMBlock *mrs_rb;
> +
> +    if (!sync->n_mr_sections) {
> +        return false;
> +    }
> +
> +    mrs_rb = section->mr->ram_block;
> +    mrs_page = (uint64_t)qemu_ram_pagesize(mrs_rb);
> +    mrs_size = int128_get64(section->size);
> +    mrs_gpa = section->offset_within_address_space;
> +    mrs_host = (uintptr_t)memory_region_get_ram_ptr(section->mr) +
> +               section->offset_within_region;
> +
> +    if (get_fd_from_hostaddr(mrs_host, NULL) < 0) {
> +        return true;
> +    }
> +
> +    mrs_host = mrs_host & ~(mrs_page - 1);
> +    mrs_gpa = mrs_gpa & ~(mrs_page - 1);
> +    mrs_size = ROUND_UP(mrs_size, mrs_page);
> +
> +    prev_sec = sync->mr_sections + (sync->n_mr_sections - 1);
> +    uint64_t prev_gpa_start = prev_sec->offset_within_address_space;
> +    uint64_t prev_size = int128_get64(prev_sec->size);
> +    uint64_t prev_gpa_end   = range_get_last(prev_gpa_start, prev_size);
> +    uint64_t prev_host_start =
> +        (uintptr_t)memory_region_get_ram_ptr(prev_sec->mr) +
> +        prev_sec->offset_within_region;
> +    uint64_t prev_host_end = range_get_last(prev_host_start, prev_size);
> +
> +    if (mrs_gpa <= (prev_gpa_end + 1)) {
> +        g_assert(mrs_gpa > prev_gpa_start);
> +
> +        if ((section->mr == prev_sec->mr) &&
> +            proxy_mrs_can_merge(mrs_host, prev_host_start,
> +                                (mrs_gpa - prev_gpa_start))) {
> +            uint64_t max_end = MAX(prev_host_end, mrs_host + mrs_size);
> +            merged = true;
> +            prev_sec->offset_within_address_space =
> +                MIN(prev_gpa_start, mrs_gpa);
> +            prev_sec->offset_within_region =
> +                MIN(prev_host_start, mrs_host) -
> +                (uintptr_t)memory_region_get_ram_ptr(prev_sec->mr);
> +            prev_sec->size = int128_make64(max_end - MIN(prev_host_start,
> +                                                         mrs_host));
> +        }
> +    }
> +
> +    return merged;
> +}
> +
> +static void proxy_ml_region_addnop(MemoryListener *listener,
> +                                   MemoryRegionSection *section)
> +{
> +    RemoteMemSync *sync = container_of(listener, RemoteMemSync, listener);
> +
> +    if (!(memory_region_is_ram(section->mr) &&
> +          !memory_region_is_rom(section->mr))) {
> +        return;
>

A bit clearer in vhost.c:
if (memory_region_is_ram(mr) && !memory_region_is_rom(mr)) {


> +    }
> +
> +    if (try_merge(sync, section)) {
> +        return;
> +    }
> +
> +    ++sync->n_mr_sections;
> +    sync->mr_sections = g_renew(MemoryRegionSection, sync->mr_sections,
> +                                sync->n_mr_sections);
> +    sync->mr_sections[sync->n_mr_sections - 1] = *section;
> +    sync->mr_sections[sync->n_mr_sections - 1].fv = NULL;
> +    memory_region_ref(section->mr);
> +}
> +
> +static void proxy_ml_commit(MemoryListener *listener)
> +{
> +    RemoteMemSync *sync = container_of(listener, RemoteMemSync, listener);
> +    MPQemuMsg msg;
> +    MemoryRegionSection *section;
> +    ram_addr_t offset;
> +    uintptr_t host_addr;
> +    int region;
> +    Error *local_err = NULL;
> +
> +    memset(&msg, 0, sizeof(MPQemuMsg));
> +
> +    msg.cmd = SYNC_SYSMEM;
> +    msg.num_fds = sync->n_mr_sections;
> +    msg.size = sizeof(SyncSysmemMsg);
> +    if (msg.num_fds > REMOTE_MAX_FDS) {
> +        error_report("Number of fds is more than %d", REMOTE_MAX_FDS);
> +        return;
> +    }
> +
> +    for (region = 0; region < sync->n_mr_sections; region++) {
> +        section = &sync->mr_sections[region];
> +        msg.data.sync_sysmem.gpas[region] =
> +            section->offset_within_address_space;
> +        msg.data.sync_sysmem.sizes[region] = int128_get64(section->size);
> +        host_addr = (uintptr_t)memory_region_get_ram_ptr(section->mr) +
> +                    section->offset_within_region;
> +        msg.fds[region] = get_fd_from_hostaddr(host_addr, &offset);
> +        msg.data.sync_sysmem.offsets[region] = offset;
> +    }
> +    mpqemu_msg_send(&msg, sync->ioc, &local_err);
> +    if (local_err) {
> +        error_report("Error in sending command %d", msg.cmd);
> +    }
> +}
>

That whole complex code above duplicates much of the logic in vhost.c. Can
we try to factorize it instead?

> +
> +void deconfigure_memory_sync(RemoteMemSync *sync)
> +{
> +    memory_listener_unregister(&sync->listener);
> +
> +    proxy_ml_begin(&sync->listener);
> +}
> +
> +void configure_memory_sync(RemoteMemSync *sync, QIOChannel *ioc)
> +{
> +    sync->n_mr_sections = 0;
> +    sync->mr_sections = NULL;
> +
> +    sync->ioc = ioc;
> +
> +    sync->listener.begin = proxy_ml_begin;
> +    sync->listener.commit = proxy_ml_commit;
> +    sync->listener.region_add = proxy_ml_region_addnop;
> +    sync->listener.region_nop = proxy_ml_region_addnop;
> +    sync->listener.priority = 10;
> +
> +    memory_listener_register(&sync->listener, &address_space_memory);
> +}
> diff --git a/hw/remote/message.c b/hw/remote/message.c
> index 0f3e38a..454fd2d 100644
> --- a/hw/remote/message.c
> +++ b/hw/remote/message.c
> @@ -17,6 +17,7 @@
>  #include "sysemu/runstate.h"
>  #include "hw/pci/pci.h"
>  #include "exec/memattrs.h"
> +#include "hw/remote/memory.h"
>
>  static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
>                                   MPQemuMsg *msg);
> @@ -64,6 +65,10 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
>          case BAR_READ:
>              process_bar_read(com->ioc, &msg, &local_err);
>              break;
> +        case SYNC_SYSMEM:
> +            remote_sysmem_reconfig(&msg, &local_err);
> +            break;
> +
>          default:
>              error_setg(&local_err,
>                         "Unknown command (%d) received for device %s (pid=%d)",
> diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
> index 039347d..0f2d1aa 100644
> --- a/hw/remote/proxy.c
> +++ b/hw/remote/proxy.c
> @@ -18,6 +18,8 @@
>  #include "migration/blocker.h"
>  #include "hw/remote/mpqemu-link.h"
>  #include "qemu/error-report.h"
> +#include "hw/remote/memory-sync.h"
> +#include "qom/object.h"
>
>  static void proxy_set_socket(PCIProxyDev *pdev, int fd, Error **errp)
>  {
> @@ -58,6 +60,8 @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
>
>      qemu_mutex_init(&dev->io_mutex);
>      qio_channel_set_blocking(dev->ioc, true, NULL);
> +
> +    configure_memory_sync(&dev->sync, dev->ioc);
>  }
>
>  static void pci_proxy_dev_exit(PCIDevice *pdev)
> @@ -69,6 +73,8 @@ static void pci_proxy_dev_exit(PCIDevice *pdev)
>      migrate_del_blocker(dev->migration_blocker);
>
>      error_free(dev->migration_blocker);
> +
> +    deconfigure_memory_sync(&dev->sync);
>  }
>
>  static int config_op_send(PCIProxyDev *pdev, uint32_t addr, uint32_t *val,
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ebd1d1d..5d78b78 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3150,6 +3150,8 @@ F: include/hw/remote/memory.h
>  F: hw/remote/memory.c
>  F: hw/remote/proxy.c
>  F: include/hw/remote/proxy.h
> +F: hw/remote/memory-sync.c
> +F: include/hw/remote/memory-sync.h
>
>  Build and test automation
>  -------------------------
> diff --git a/hw/remote/meson.build b/hw/remote/meson.build
> index 569cd20..7d434a5 100644
> --- a/hw/remote/meson.build
> +++ b/hw/remote/meson.build
> @@ -7,5 +7,6 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
>  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
>
>  specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
> +specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory-sync.c'))
>
>  softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau


* Re: [PATCH v12 16/19] multi-process: Synchronize remote memory
  2020-12-08 13:57   ` Marc-André Lureau
@ 2020-12-09 16:18     ` Jag Raman
  2020-12-09 21:28       ` Marc-André Lureau
  0 siblings, 1 reply; 52+ messages in thread
From: Jag Raman @ 2020-12-09 16:18 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Thanos Makatos



> On Dec 8, 2020, at 8:57 AM, Marc-André Lureau <marcandre.lureau@gmail.com> wrote:
> 
> Hi
> 
> On Wed, Dec 2, 2020 at 12:23 AM Jagannathan Raman <jag.raman@oracle.com> wrote:
> Add memory-listener object which is used to keep the view of the RAM
> in sync between QEMU and remote process.
> A MemoryListener is registered for system-memory AddressSpace. The
> listener sends SYNC_SYSMEM message to the remote process when memory
> listener commits the changes to memory, the remote process receives
> the message and processes it in the handler for SYNC_SYSMEM message.
> 
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/hw/remote/memory-sync.h |  27 ++++++
>  include/hw/remote/proxy.h       |   2 +
>  hw/remote/memory-sync.c         | 210 ++++++++++++++++++++++++++++++++++++++++
>  hw/remote/message.c             |   5 +
>  hw/remote/proxy.c               |   6 ++
>  MAINTAINERS                     |   2 +
>  hw/remote/meson.build           |   1 +
>  7 files changed, 253 insertions(+)
>  create mode 100644 include/hw/remote/memory-sync.h
>  create mode 100644 hw/remote/memory-sync.c
> 
> diff --git a/include/hw/remote/memory-sync.h b/include/hw/remote/memory-sync.h
> new file mode 100644
> index 0000000..785f76a
> --- /dev/null
> +++ b/include/hw/remote/memory-sync.h
> @@ -0,0 +1,27 @@
> +/*
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef MEMORY_SYNC_H
> +#define MEMORY_SYNC_H
> +
> +#include "exec/memory.h"
> +#include "io/channel.h"
> +
> +typedef struct RemoteMemSync {
> +    MemoryListener listener;
> +
> +    int n_mr_sections;
> +    MemoryRegionSection *mr_sections;
> +
> +    QIOChannel *ioc;
> +} RemoteMemSync;
> +
> +void configure_memory_sync(RemoteMemSync *sync, QIOChannel *ioc);
> +void deconfigure_memory_sync(RemoteMemSync *sync);
> 
> RemoteMemSync vs MemorySync, and function with _memory_sync suffixes...
> Naming things is hard, but trying to be consistent generally helps.
> 
> My understanding is that this is a proxy-dev helper to handle memory listening and sending SYNC_SYSMEM.
> 
> I would thus suggest naming it ProxyMemoryListener. It could eventually be folded in proxy.c
> 
> Please try to be consistent with header naming, structure naming, type, functions and enum prefixes etc.
> 
> proxy_memory_listener isn't that long imho.
> 
> +
> +#endif
> diff --git a/include/hw/remote/proxy.h b/include/hw/remote/proxy.h
> index e29c61b..a687b7d 100644
> --- a/include/hw/remote/proxy.h
> +++ b/include/hw/remote/proxy.h
> @@ -11,6 +11,7 @@
> 
>  #include "hw/pci/pci.h"
>  #include "io/channel.h"
> +#include "hw/remote/memory-sync.h"
> 
>  #define TYPE_PCI_PROXY_DEV "x-pci-proxy-dev"
> 
> @@ -40,6 +41,7 @@ struct PCIProxyDev {
>      QemuMutex io_mutex;
>      QIOChannel *ioc;
>      Error *migration_blocker;
> +    RemoteMemSync sync;
>      ProxyMemoryRegion region[PCI_NUM_REGIONS];
>  };
> 
> diff --git a/hw/remote/memory-sync.c b/hw/remote/memory-sync.c
> new file mode 100644
> index 0000000..2365e69
> --- /dev/null
> +++ b/hw/remote/memory-sync.c
> @@ -0,0 +1,210 @@
> +/*
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include "qemu/compiler.h"
> +#include "qemu/int128.h"
> +#include "qemu/range.h"
> +#include "exec/memory.h"
> +#include "exec/cpu-common.h"
> +#include "cpu.h"
> +#include "exec/ram_addr.h"
> +#include "exec/address-spaces.h"
> +#include "hw/remote/mpqemu-link.h"
> +#include "hw/remote/memory-sync.h"
> +
> +static void proxy_ml_begin(MemoryListener *listener)
> 
> I suggest to rename begin -> reset 
> 
> +{
> +    RemoteMemSync *sync = container_of(listener, RemoteMemSync, listener);
> +    int mrs;
> +
> +    for (mrs = 0; mrs < sync->n_mr_sections; mrs++) {
> +        memory_region_unref(sync->mr_sections[mrs].mr);
> +    }
> +
> +    g_free(sync->mr_sections);
> +    sync->mr_sections = NULL;
> +    sync->n_mr_sections = 0;
> +}
> +
> +static int get_fd_from_hostaddr(uint64_t host, ram_addr_t *offset)
> 
> This function is very similar to vhost_user_get_mr_data(). That suggests we could factor the code.
> 
> Perhaps a new memory_region_from_host_full(), or extend memory_region_from_host() with an extra optional "int *fd" argument.
>  
> +{
> +    MemoryRegion *mr;
> +    ram_addr_t off;
> +
> +    /**
> +     * Assumes that the host address is a valid address as it's
> +     * coming from the MemoryListener system. In the case host
> +     * address is not valid, the following call would return
> +     * the default subregion of "system_memory" region, and
> +     * not NULL. So it's not possible to check for NULL here.
> +     */
> +    mr = memory_region_from_host((void *)(uintptr_t)host, &off);
> +
> +    if (offset) {
> +        *offset = off;
> +    }
> +
> +    return memory_region_get_fd(mr);
> +}
> +
> +static bool proxy_mrs_can_merge(uint64_t host, uint64_t prev_host, size_t size)
> +{
> 
> This seems similar to vhost_user_can_merge(). 
> 
> +    bool merge;
> +    int fd1, fd2;
> +
> +    fd1 = get_fd_from_hostaddr(host, NULL);
> +
> +    fd2 = get_fd_from_hostaddr(prev_host, NULL);
> +
> +    merge = (fd1 == fd2);
> 
> This could be written in a simpler manner, ex:
> 
> if (get_fd_from_hostaddr(host, NULL) != get_fd_from_hostaddr(prev_host, NULL))
>   return false
> 
> +
> +    merge &= ((prev_host + size) == host);
> 
> That check could be done early on before doing the more expensive memory_region_from_host() calls
> 
> +
> +    return merge;
> +}
> +
> +static bool try_merge(RemoteMemSync *sync, MemoryRegionSection *section)
> +{
> +    uint64_t mrs_size, mrs_gpa, mrs_page;
> +    MemoryRegionSection *prev_sec;
> +    bool merged = false;
> +    uintptr_t mrs_host;
> +    RAMBlock *mrs_rb;
> +
> +    if (!sync->n_mr_sections) {
> +        return false;
> +    }
> +
> +    mrs_rb = section->mr->ram_block;
> +    mrs_page = (uint64_t)qemu_ram_pagesize(mrs_rb);
> +    mrs_size = int128_get64(section->size);
> +    mrs_gpa = section->offset_within_address_space;
> +    mrs_host = (uintptr_t)memory_region_get_ram_ptr(section->mr) +
> +               section->offset_within_region;
> +
> +    if (get_fd_from_hostaddr(mrs_host, NULL) < 0) {
> +        return true;
> +    }
> +
> +    mrs_host = mrs_host & ~(mrs_page - 1);
> +    mrs_gpa = mrs_gpa & ~(mrs_page - 1);
> +    mrs_size = ROUND_UP(mrs_size, mrs_page);
> +
> +    prev_sec = sync->mr_sections + (sync->n_mr_sections - 1);
> +    uint64_t prev_gpa_start = prev_sec->offset_within_address_space;
> +    uint64_t prev_size = int128_get64(prev_sec->size);
> +    uint64_t prev_gpa_end   = range_get_last(prev_gpa_start, prev_size);
> +    uint64_t prev_host_start =
> +        (uintptr_t)memory_region_get_ram_ptr(prev_sec->mr) +
> +        prev_sec->offset_within_region;
> +    uint64_t prev_host_end = range_get_last(prev_host_start, prev_size);
> +
> +    if (mrs_gpa <= (prev_gpa_end + 1)) {
> +        g_assert(mrs_gpa > prev_gpa_start);
> +
> +        if ((section->mr == prev_sec->mr) &&
> +            proxy_mrs_can_merge(mrs_host, prev_host_start,
> +                                (mrs_gpa - prev_gpa_start))) {
> +            uint64_t max_end = MAX(prev_host_end, mrs_host + mrs_size);
> +            merged = true;
> +            prev_sec->offset_within_address_space =
> +                MIN(prev_gpa_start, mrs_gpa);
> +            prev_sec->offset_within_region =
> +                MIN(prev_host_start, mrs_host) -
> +                (uintptr_t)memory_region_get_ram_ptr(prev_sec->mr);
> +            prev_sec->size = int128_make64(max_end - MIN(prev_host_start,
> +                                                         mrs_host));
> +        }
> +    }
> +
> +    return merged;
> +}
> +
> +static void proxy_ml_region_addnop(MemoryListener *listener,
> +                                   MemoryRegionSection *section)
> +{
> +    RemoteMemSync *sync = container_of(listener, RemoteMemSync, listener);
> +
> +    if (!(memory_region_is_ram(section->mr) &&
> +          !memory_region_is_rom(section->mr))) {
> +        return;
> 
> A bit clearer in vhost.c:
> if (memory_region_is_ram(mr) && !memory_region_is_rom(mr)) {
>  
> +    }
> +
> +    if (try_merge(sync, section)) {
> +        return;
> +    }
> +
> +    ++sync->n_mr_sections;
> +    sync->mr_sections = g_renew(MemoryRegionSection, sync->mr_sections,
> +                                sync->n_mr_sections);
> +    sync->mr_sections[sync->n_mr_sections - 1] = *section;
> +    sync->mr_sections[sync->n_mr_sections - 1].fv = NULL;
> +    memory_region_ref(section->mr);
> +}
> +
> +static void proxy_ml_commit(MemoryListener *listener)
> +{
> +    RemoteMemSync *sync = container_of(listener, RemoteMemSync, listener);
> +    MPQemuMsg msg;
> +    MemoryRegionSection *section;
> +    ram_addr_t offset;
> +    uintptr_t host_addr;
> +    int region;
> +    Error *local_err = NULL;
> +
> +    memset(&msg, 0, sizeof(MPQemuMsg));
> +
> +    msg.cmd = SYNC_SYSMEM;
> +    msg.num_fds = sync->n_mr_sections;
> +    msg.size = sizeof(SyncSysmemMsg);
> +    if (msg.num_fds > REMOTE_MAX_FDS) {
> +        error_report("Number of fds is more than %d", REMOTE_MAX_FDS);
> +        return;
> +    }
> +
> +    for (region = 0; region < sync->n_mr_sections; region++) {
> +        section = &sync->mr_sections[region];
> +        msg.data.sync_sysmem.gpas[region] =
> +            section->offset_within_address_space;
> +        msg.data.sync_sysmem.sizes[region] = int128_get64(section->size);
> +        host_addr = (uintptr_t)memory_region_get_ram_ptr(section->mr) +
> +                    section->offset_within_region;
> +        msg.fds[region] = get_fd_from_hostaddr(host_addr, &offset);
> +        msg.data.sync_sysmem.offsets[region] = offset;
> +    }
> +    mpqemu_msg_send(&msg, sync->ioc, &local_err);
> +    if (local_err) {
> +        error_report("Error in sending command %d", msg.cmd);
> +    }
> +}
> 
> That whole complex code above duplicates much of the logic in vhost.c. Can we try to factorize it instead?

Hi Marc-Andre,

Thank you for sharing your feedback!

Would it be alright if we addressed this item alone in a separate patch in the future? Since
this refactoring affects vhost code, we’re wondering if it would be better to address it
separately to help with any regression analysis.

Thank you!
—
Jag

> 
> +
> +void deconfigure_memory_sync(RemoteMemSync *sync)
> +{
> +    memory_listener_unregister(&sync->listener);
> +
> +    proxy_ml_begin(&sync->listener);
> +}
> +
> +void configure_memory_sync(RemoteMemSync *sync, QIOChannel *ioc)
> +{
> +    sync->n_mr_sections = 0;
> +    sync->mr_sections = NULL;
> +
> +    sync->ioc = ioc;
> +
> +    sync->listener.begin = proxy_ml_begin;
> +    sync->listener.commit = proxy_ml_commit;
> +    sync->listener.region_add = proxy_ml_region_addnop;
> +    sync->listener.region_nop = proxy_ml_region_addnop;
> +    sync->listener.priority = 10;
> +
> +    memory_listener_register(&sync->listener, &address_space_memory);
> +}
> diff --git a/hw/remote/message.c b/hw/remote/message.c
> index 0f3e38a..454fd2d 100644
> --- a/hw/remote/message.c
> +++ b/hw/remote/message.c
> @@ -17,6 +17,7 @@
>  #include "sysemu/runstate.h"
>  #include "hw/pci/pci.h"
>  #include "exec/memattrs.h"
> +#include "hw/remote/memory.h"
> 
>  static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
>                                   MPQemuMsg *msg);
> @@ -64,6 +65,10 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
>          case BAR_READ:
>              process_bar_read(com->ioc, &msg, &local_err);
>              break;
> +        case SYNC_SYSMEM:
> +            remote_sysmem_reconfig(&msg, &local_err);
> +            break;
> +
>          default:
>              error_setg(&local_err,
>                         "Unknown command (%d) received for device %s (pid=%d)",
> diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
> index 039347d..0f2d1aa 100644
> --- a/hw/remote/proxy.c
> +++ b/hw/remote/proxy.c
> @@ -18,6 +18,8 @@
>  #include "migration/blocker.h"
>  #include "hw/remote/mpqemu-link.h"
>  #include "qemu/error-report.h"
> +#include "hw/remote/memory-sync.h"
> +#include "qom/object.h"
> 
>  static void proxy_set_socket(PCIProxyDev *pdev, int fd, Error **errp)
>  {
> @@ -58,6 +60,8 @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
> 
>      qemu_mutex_init(&dev->io_mutex);
>      qio_channel_set_blocking(dev->ioc, true, NULL);
> +
> +    configure_memory_sync(&dev->sync, dev->ioc);
>  }
> 
>  static void pci_proxy_dev_exit(PCIDevice *pdev)
> @@ -69,6 +73,8 @@ static void pci_proxy_dev_exit(PCIDevice *pdev)
>      migrate_del_blocker(dev->migration_blocker);
> 
>      error_free(dev->migration_blocker);
> +
> +    deconfigure_memory_sync(&dev->sync);
>  }
> 
>  static int config_op_send(PCIProxyDev *pdev, uint32_t addr, uint32_t *val,
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ebd1d1d..5d78b78 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3150,6 +3150,8 @@ F: include/hw/remote/memory.h
>  F: hw/remote/memory.c
>  F: hw/remote/proxy.c
>  F: include/hw/remote/proxy.h
> +F: hw/remote/memory-sync.c
> +F: include/hw/remote/memory-sync.h
> 
>  Build and test automation
>  -------------------------
> diff --git a/hw/remote/meson.build b/hw/remote/meson.build
> index 569cd20..7d434a5 100644
> --- a/hw/remote/meson.build
> +++ b/hw/remote/meson.build
> @@ -7,5 +7,6 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
>  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
> 
>  specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
> +specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory-sync.c'))
> 
>  softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
> -- 
> 1.8.3.1
> 
> 
> 
> -- 
> Marc-André Lureau




* Re: [PATCH v12 02/19] multi-process: add configure and usage information
  2020-12-04 14:37   ` Daniel P. Berrangé
@ 2020-12-09 16:20     ` Jag Raman
  0 siblings, 0 replies; 52+ messages in thread
From: Jag Raman @ 2020-12-09 16:20 UTC (permalink / raw)
  To: "Daniel P. Berrangé"
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	kraxel, quintela, mst, armbru, kanth.ghatraju, felipe, thuth,
	ehabkost, konrad.wilk, dgilbert, alex.williamson, stefanha,
	pbonzini, rth, kwolf, mreitz, ross.lagerwall, marcandre.lureau,
	thanos.makatos



> On Dec 4, 2020, at 9:37 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> On Tue, Dec 01, 2020 at 03:22:37PM -0500, Jagannathan Raman wrote:
>> From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> 
>> Adds documentation explaining the command-line arguments needed
>> to use multi-process. Also adds a python script that illustrates the
>> usage.
>> 
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>> docs/multi-process.rst                        | 66 +++++++++++++++++++
>> MAINTAINERS                                   |  1 +
>> tests/multiprocess/multiprocess-lsi53c895a.py | 92 +++++++++++++++++++++++++++
>> 3 files changed, 159 insertions(+)
>> create mode 100644 docs/multi-process.rst
>> create mode 100755 tests/multiprocess/multiprocess-lsi53c895a.py
> 
> 
>> diff --git a/tests/multiprocess/multiprocess-lsi53c895a.py b/tests/multiprocess/multiprocess-lsi53c895a.py
>> new file mode 100755
>> index 0000000..bfe4f66
>> --- /dev/null
>> +++ b/tests/multiprocess/multiprocess-lsi53c895a.py
>> @@ -0,0 +1,92 @@
>> +#!/usr/bin/env python3
>> +
>> +import urllib.request
>> +import subprocess
>> +import argparse
>> +import socket
>> +import sys
>> +import os
>> +
>> +arch = os.uname()[4]
>> +proc_path = os.path.join(os.getcwd(), '..', '..', 'build', arch+'-softmmu',
>> +                         'qemu-system-'+arch)
>> +
>> +parser = argparse.ArgumentParser(description='Launcher for multi-process QEMU')
>> +parser.add_argument('--bin', required=False, help='location of QEMU binary',
>> +                    metavar='bin');
>> +args = parser.parse_args()
>> +
>> +if args.bin is not None:
>> +    proc_path = args.bin
>> +
>> +if not os.path.isfile(proc_path):
>> +    sys.exit('QEMU binary not found')
>> +
>> +kernel_path = os.path.join(os.getcwd(), 'vmlinuz')
>> +initrd_path = os.path.join(os.getcwd(), 'initrd')
>> +
>> +proxy_cmd = [ proc_path,                                                    \
>> +              '-name', 'Fedora', '-smp', '4', '-m', '2048', '-cpu', 'host', \
> 
> I wonder if setting 2 GB of RAM is too large for something that runs by
> default as a test.
> 
>> +              '-object', 'memory-backend-memfd,id=sysmem-file,size=2G',     \
>> +              '-numa', 'node,memdev=sysmem-file',                           \
>> +              '-kernel', kernel_path, '-initrd', initrd_path,               \
>> +              '-vnc', ':0',                                                 \
>> +              '-monitor', 'unix:/home/qemu-sock,server,nowait',             \
>> +            ]
>> +
>> +if arch == 'x86_64':
>> +    print('Downloading images for arch x86_64')
>> +    kernel_url = 'https://dl.fedoraproject.org/pub/fedora/linux/'    \
>> +                 'releases/33/Everything/x86_64/os/images/'          \
>> +                 'pxeboot/vmlinuz'
>> +    initrd_url = 'https://dl.fedoraproject.org/pub/fedora/linux/'    \
>> +                 'releases/33/Everything/x86_64/os/images/'          \
>> +                 'pxeboot/initrd.img'
>> +    proxy_cmd.append('-machine')
>> +    proxy_cmd.append('pc,accel=kvm')
>> +    proxy_cmd.append('-append')
>> +    proxy_cmd.append('rdinit=/bin/bash console=ttyS0 console=tty0')
>> +elif arch == 'aarch64':
>> +    print('Downloading images for arch aarch64')
>> +    kernel_url = 'https://dl.fedoraproject.org/pub/fedora/linux/'    \
>> +                 'releases/33/Everything/aarch64/os/images/'         \
>> +                 'pxeboot/vmlinuz'
>> +    initrd_url = 'https://dl.fedoraproject.org/pub/fedora/linux/'    \
>> +                 'releases/33/Everything/aarch64/os/images/'         \
>> +                 'pxeboot/initrd.img'
>> +    proxy_cmd.append('-machine')
>> +    proxy_cmd.append('virt,gic-version=3')
>> +    proxy_cmd.append('-accel')
>> +    proxy_cmd.append('kvm')
>> +    proxy_cmd.append('-append')
>> +    proxy_cmd.append('rdinit=/bin/bash')
>> +else:
>> +    sys.exit('Arch %s not tested' % arch)
> 
> It doesn't look like you really need a full OS here. Rather than
> downloading the fairly large Fedora images, I'd suggest just using
> the kernel that exists on the host OS already in /boot, and then
> building a tiny initrd that contains just a static linked busybox.
> 
> I have this helper script that could be imported into QEMU for
> this purpose:
> 
>  https://gitlab.com/berrange/tiny-vm-tools/-/blob/master/make-tiny-image.py

Hi Daniel,

That’s a nifty script. I was trying to do something similar. Thank you for sharing!

—
Jag

> 
> And just skip the test if busybox doesn't exist, or if the vmlinux
> in /boot isn't accessible (Debian restricts it to root only IIRC)
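
The skip conditions described above could be checked along these lines. This is only a sketch under stated assumptions, not part of the patch: the helper names `find_host_kernel` and `skip_reason` are hypothetical, the kernel is assumed to be installed as vmlinuz-<release> or vmlinux-<release> under /boot, and the check only looks for a busybox binary in PATH without verifying that it is statically linked.

```python
import os
import platform
import shutil


def find_host_kernel(boot_dir="/boot"):
    """Return the path of a readable kernel image for the running
    release, or None if there is none (or it is not readable)."""
    release = platform.release()
    for name in ("vmlinuz-" + release, "vmlinux-" + release):
        path = os.path.join(boot_dir, name)
        # os.access() catches the case where the image exists but is
        # restricted to root.
        if os.path.isfile(path) and os.access(path, os.R_OK):
            return path
    return None


def skip_reason(boot_dir="/boot"):
    """Return a string explaining why the test should be skipped,
    or None if both prerequisites are available."""
    if shutil.which("busybox") is None:
        return "busybox not found in PATH"
    if find_host_kernel(boot_dir) is None:
        return "no readable kernel image in " + boot_dir
    return None
```

A test script could call skip_reason() before launching QEMU and exit early with that message when it is not None.
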
> 
>> +
>> +urllib.request.urlretrieve(kernel_url, kernel_path)
>> +urllib.request.urlretrieve(initrd_url, initrd_path)
>> +
>> +proxy, remote = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
>> +
>> +proxy_cmd.append('-device')
>> +proxy_cmd.append('x-pci-proxy-dev,id=lsi1,fd='+str(proxy.fileno()))
>> +
>> +remote_cmd = [ proc_path,                                                      \
>> +               '-machine', 'x-remote',                                         \
>> +               '-device', 'lsi53c895a,id=lsi1',                                \
>> +               '-object',                                                      \
>> +               'x-remote-object,id=robj1,devid=lsi1,fd='+str(remote.fileno()), \
>> +               '-display', 'none',                                             \
>> +               '-monitor', 'unix:/home/rem-sock,server,nowait',                \
>> +             ]
>> +
>> +pid = os.fork();
>> +
>> +if pid:
>> +    # In Proxy
>> +    print('Launching QEMU with Proxy object');
>> +    process = subprocess.Popen(proxy_cmd, pass_fds=[proxy.fileno()])
>> +else:
>> +    # In remote
>> +    print('Launching Remote process');
>> +    process = subprocess.Popen(remote_cmd, pass_fds=[remote.fileno(), 0, 1, 2])
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 
> 




* Re: [PATCH v12 06/19] multi-process: setup a machine object for remote device process
  2020-12-04 14:35   ` Marc-André Lureau
@ 2020-12-09 16:56     ` Jag Raman
  0 siblings, 0 replies; 52+ messages in thread
From: Jag Raman @ 2020-12-09 16:56 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Thanos Makatos



> On Dec 4, 2020, at 9:35 AM, Marc-André Lureau <marcandre.lureau@gmail.com> wrote:
> 
> 
> 
> On Wed, Dec 2, 2020 at 12:23 AM Jagannathan Raman <jag.raman@oracle.com> wrote:
> x-remote-machine object sets up various subsystems of the remote
> device process. Instantiate PCI host bridge object and initialize RAM, IO &
> PCI memory regions.
> 
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/hw/pci-host/remote.h |  1 +
>  include/hw/remote/machine.h  | 28 ++++++++++++++++++
>  hw/remote/machine.c          | 69 ++++++++++++++++++++++++++++++++++++++++++++
>  MAINTAINERS                  |  2 ++
>  hw/meson.build               |  1 +
>  hw/remote/meson.build        |  5 ++++
>  6 files changed, 106 insertions(+)
>  create mode 100644 include/hw/remote/machine.h
>  create mode 100644 hw/remote/machine.c
>  create mode 100644 hw/remote/meson.build
> 
> diff --git a/include/hw/pci-host/remote.h b/include/hw/pci-host/remote.h
> index bab6d3c..cc0fff4 100644
> --- a/include/hw/pci-host/remote.h
> +++ b/include/hw/pci-host/remote.h
> @@ -25,6 +25,7 @@ typedef struct RemotePCIHost {
> 
>      MemoryRegion *mr_pci_mem;
>      MemoryRegion *mr_sys_io;
> +    MemoryRegion *mr_sys_mem;
> 
> Why is this not part of the previous patch?

Hi Marc-Andre,

We originally defined this variable in a previous patch. But we were
not using it in that patch.

Based on feedback we previously received, we moved it to this patch
as it is used for the first time here.

Thank you!
—
Jag

> 
>  } RemotePCIHost;
> 
>  #endif
> diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
> new file mode 100644
> index 0000000..d312972
> --- /dev/null
> +++ b/include/hw/remote/machine.h
> @@ -0,0 +1,28 @@
> +/*
> + * Remote machine configuration
> + *
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef REMOTE_MACHINE_H
> +#define REMOTE_MACHINE_H
> +
> +#include "qom/object.h"
> +#include "hw/boards.h"
> +#include "hw/pci-host/remote.h"
> +
> +typedef struct RemoteMachineState {
> +    MachineState parent_obj;
> +
> +    RemotePCIHost *host;
> +} RemoteMachineState;
> +
> +#define TYPE_REMOTE_MACHINE "x-remote-machine"
> +#define REMOTE_MACHINE(obj) \
> +    OBJECT_CHECK(RemoteMachineState, (obj), TYPE_REMOTE_MACHINE)
> +
> +#endif
> diff --git a/hw/remote/machine.c b/hw/remote/machine.c
> new file mode 100644
> index 0000000..c5658bf
> --- /dev/null
> +++ b/hw/remote/machine.c
> @@ -0,0 +1,69 @@
> +/*
> + * Machine for remote device
> + *
> + *  This machine type is used by the remote device process in multi-process
> + *  QEMU. QEMU device models depend on parent busses, interrupt controllers,
> + *  memory regions, etc. The remote machine type offers this environment so
> + *  that QEMU device models can be used as remote devices.
> + *
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include "hw/remote/machine.h"
> +#include "exec/address-spaces.h"
> +#include "exec/memory.h"
> +#include "qapi/error.h"
> +
> +static void remote_machine_init(MachineState *machine)
> +{
> +    MemoryRegion *system_memory, *system_io, *pci_memory;
> +    RemoteMachineState *s = REMOTE_MACHINE(machine);
> +    RemotePCIHost *rem_host;
> +
> +    system_memory = get_system_memory();
> +    system_io = get_system_io();
> +
> +    pci_memory = g_new(MemoryRegion, 1);
> +    memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
> +
> +    rem_host = REMOTE_HOST_DEVICE(qdev_new(TYPE_REMOTE_HOST_DEVICE));
> +
> +    rem_host->mr_pci_mem = pci_memory;
> +    rem_host->mr_sys_mem = system_memory;
> +    rem_host->mr_sys_io = system_io;
> +
> +    s->host = rem_host;
> +
> +    object_property_add_child(OBJECT(s), "remote-device", OBJECT(rem_host));
> 
> "remote-pcihost" instead ?
> 
> +    memory_region_add_subregion_overlap(system_memory, 0x0, pci_memory, -1);
> +
> +    qdev_realize(DEVICE(rem_host), sysbus_get_default(), &error_fatal);
> +}
> +
> +static void remote_machine_class_init(ObjectClass *oc, void *data)
> +{
> +    MachineClass *mc = MACHINE_CLASS(oc);
> +
> +    mc->init = remote_machine_init;
> 
> Set mc->desc = "Experimental remote machine" ?
> 
> +}
> +
> +static const TypeInfo remote_machine = {
> +    .name = TYPE_REMOTE_MACHINE,
> +    .parent = TYPE_MACHINE,
> +    .instance_size = sizeof(RemoteMachineState),
> +    .class_init = remote_machine_class_init,
> +};
> +
> +static void remote_machine_register_types(void)
> +{
> +    type_register_static(&remote_machine);
> +}
> +
> +type_init(remote_machine_register_types);
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4515476..c45ac1d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3139,6 +3139,8 @@ F: docs/devel/multi-process.rst
>  F: tests/multiprocess/multiprocess-lsi53c895a.py
>  F: hw/pci-host/remote.c
>  F: include/hw/pci-host/remote.h
> +F: hw/remote/machine.c
> +F: include/hw/remote/machine.h
> 
>  Build and test automation
>  -------------------------
> diff --git a/hw/meson.build b/hw/meson.build
> index 010de72..e615d72 100644
> --- a/hw/meson.build
> +++ b/hw/meson.build
> @@ -56,6 +56,7 @@ subdir('moxie')
>  subdir('nios2')
>  subdir('openrisc')
>  subdir('ppc')
> +subdir('remote')
>  subdir('riscv')
>  subdir('rx')
>  subdir('s390x')
> diff --git a/hw/remote/meson.build b/hw/remote/meson.build
> new file mode 100644
> index 0000000..197b038
> --- /dev/null
> +++ b/hw/remote/meson.build
> @@ -0,0 +1,5 @@
> +remote_ss = ss.source_set()
> +
> +remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
> +
> +softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
> -- 
> 1.8.3.1
> 
> 
> 
> -- 
> Marc-André Lureau




* Re: [PATCH v12 16/19] multi-process: Synchronize remote memory
  2020-12-09 16:18     ` Jag Raman
@ 2020-12-09 21:28       ` Marc-André Lureau
  2020-12-10 16:57         ` Jag Raman
  0 siblings, 1 reply; 52+ messages in thread
From: Marc-André Lureau @ 2020-12-09 21:28 UTC (permalink / raw)
  To: Jag Raman
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Thanos Makatos


On Wed, Dec 9, 2020 at 8:20 PM Jag Raman <jag.raman@oracle.com> wrote:

>
>
> > On Dec 8, 2020, at 8:57 AM, Marc-André Lureau <marcandre.lureau@gmail.com> wrote:
> >
> > Hi
> >
> > On Wed, Dec 2, 2020 at 12:23 AM Jagannathan Raman <jag.raman@oracle.com> wrote:
> > Add memory-listener object which is used to keep the view of the RAM
> > in sync between QEMU and remote process.
> > A MemoryListener is registered for system-memory AddressSpace. The
> > listener sends SYNC_SYSMEM message to the remote process when memory
> > listener commits the changes to memory, the remote process receives
> > the message and processes it in the handler for SYNC_SYSMEM message.
> >
> > Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> > Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  include/hw/remote/memory-sync.h |  27 ++++++
> >  include/hw/remote/proxy.h       |   2 +
> >  hw/remote/memory-sync.c         | 210 ++++++++++++++++++++++++++++++++++++++++
> >  hw/remote/message.c             |   5 +
> >  hw/remote/proxy.c               |   6 ++
> >  MAINTAINERS                     |   2 +
> >  hw/remote/meson.build           |   1 +
> >  7 files changed, 253 insertions(+)
> >  create mode 100644 include/hw/remote/memory-sync.h
> >  create mode 100644 hw/remote/memory-sync.c
> >
> > diff --git a/include/hw/remote/memory-sync.h b/include/hw/remote/memory-sync.h
> > new file mode 100644
> > index 0000000..785f76a
> > --- /dev/null
> > +++ b/include/hw/remote/memory-sync.h
> > @@ -0,0 +1,27 @@
> > +/*
> > + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#ifndef MEMORY_SYNC_H
> > +#define MEMORY_SYNC_H
> > +
> > +#include "exec/memory.h"
> > +#include "io/channel.h"
> > +
> > +typedef struct RemoteMemSync {
> > +    MemoryListener listener;
> > +
> > +    int n_mr_sections;
> > +    MemoryRegionSection *mr_sections;
> > +
> > +    QIOChannel *ioc;
> > +} RemoteMemSync;
> > +
> > +void configure_memory_sync(RemoteMemSync *sync, QIOChannel *ioc);
> > +void deconfigure_memory_sync(RemoteMemSync *sync);
> >
> > RemoteMemSync vs MemorySync, and function with _memory_sync suffixes...
> > Naming things is hard, but trying to be consistent generally helps.
> >
> > My understanding is that this is a proxy-dev helper to handle memory listening and sending SYNC_SYSMEM.
> >
> > I would thus suggest naming it ProxyMemoryListener. It could eventually be folded in proxy.c
> >
> > Please try to be consistent with header naming, structure naming, type, functions and enum prefixes etc.
> >
> > proxy_memory_listener isn't that long imho.
> >
> > +
> > +#endif
> > diff --git a/include/hw/remote/proxy.h b/include/hw/remote/proxy.h
> > index e29c61b..a687b7d 100644
> > --- a/include/hw/remote/proxy.h
> > +++ b/include/hw/remote/proxy.h
> > @@ -11,6 +11,7 @@
> >
> >  #include "hw/pci/pci.h"
> >  #include "io/channel.h"
> > +#include "hw/remote/memory-sync.h"
> >
> >  #define TYPE_PCI_PROXY_DEV "x-pci-proxy-dev"
> >
> > @@ -40,6 +41,7 @@ struct PCIProxyDev {
> >      QemuMutex io_mutex;
> >      QIOChannel *ioc;
> >      Error *migration_blocker;
> > +    RemoteMemSync sync;
> >      ProxyMemoryRegion region[PCI_NUM_REGIONS];
> >  };
> >
> > diff --git a/hw/remote/memory-sync.c b/hw/remote/memory-sync.c
> > new file mode 100644
> > index 0000000..2365e69
> > --- /dev/null
> > +++ b/hw/remote/memory-sync.c
> > @@ -0,0 +1,210 @@
> > +/*
> > + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qemu-common.h"
> > +
> > +#include "qemu/compiler.h"
> > +#include "qemu/int128.h"
> > +#include "qemu/range.h"
> > +#include "exec/memory.h"
> > +#include "exec/cpu-common.h"
> > +#include "cpu.h"
> > +#include "exec/ram_addr.h"
> > +#include "exec/address-spaces.h"
> > +#include "hw/remote/mpqemu-link.h"
> > +#include "hw/remote/memory-sync.h"
> > +
> > +static void proxy_ml_begin(MemoryListener *listener)
> >
> > I suggest to rename begin -> reset
> >
> > +{
> > +    RemoteMemSync *sync = container_of(listener, RemoteMemSync, listener);
> > +    int mrs;
> > +
> > +    for (mrs = 0; mrs < sync->n_mr_sections; mrs++) {
> > +        memory_region_unref(sync->mr_sections[mrs].mr);
> > +    }
> > +
> > +    g_free(sync->mr_sections);
> > +    sync->mr_sections = NULL;
> > +    sync->n_mr_sections = 0;
> > +}
> > +
> > +static int get_fd_from_hostaddr(uint64_t host, ram_addr_t *offset)
> >
> > This function is very similar to vhost_user_get_mr_data(). That suggests we could factor the code.
> >
> > Perhaps a new memory_region_from_host_full(), or extend memory_region_from_host() with an extra optional "int *fd" argument.
> >
> > +{
> > +    MemoryRegion *mr;
> > +    ram_addr_t off;
> > +
> > +    /**
> > +     * Assumes that the host address is a valid address as it's
> > +     * coming from the MemoryListener system. In the case host
> > +     * address is not valid, the following call would return
> > +     * the default subregion of "system_memory" region, and
> > +     * not NULL. So it's not possible to check for NULL here.
> > +     */
> > +    mr = memory_region_from_host((void *)(uintptr_t)host, &off);
> > +
> > +    if (offset) {
> > +        *offset = off;
> > +    }
> > +
> > +    return memory_region_get_fd(mr);
> > +}
> > +
> > +static bool proxy_mrs_can_merge(uint64_t host, uint64_t prev_host, size_t size)
> > +{
> >
> > This seems similar to vhost_user_can_merge().
> >
> > +    bool merge;
> > +    int fd1, fd2;
> > +
> > +    fd1 = get_fd_from_hostaddr(host, NULL);
> > +
> > +    fd2 = get_fd_from_hostaddr(prev_host, NULL);
> > +
> > +    merge = (fd1 == fd2);
> >
> > This could be written in a simpler manner, ex:
> >
> > if (get_fd_from_hostaddr(host, NULL) != get_fd_from_hostaddr(prev_host, NULL))
> >   return false
> >
> > +
> > +    merge &= ((prev_host + size) == host);
> >
> > That check could be done early on before doing the more expensive memory_region_from_host() calls
> >
> > +
> > +    return merge;
> > +}
> > +
> > +static bool try_merge(RemoteMemSync *sync, MemoryRegionSection *section)
> > +{
> > +    uint64_t mrs_size, mrs_gpa, mrs_page;
> > +    MemoryRegionSection *prev_sec;
> > +    bool merged = false;
> > +    uintptr_t mrs_host;
> > +    RAMBlock *mrs_rb;
> > +
> > +    if (!sync->n_mr_sections) {
> > +        return false;
> > +    }
> > +
> > +    mrs_rb = section->mr->ram_block;
> > +    mrs_page = (uint64_t)qemu_ram_pagesize(mrs_rb);
> > +    mrs_size = int128_get64(section->size);
> > +    mrs_gpa = section->offset_within_address_space;
> > +    mrs_host = (uintptr_t)memory_region_get_ram_ptr(section->mr) +
> > +               section->offset_within_region;
> > +
> > +    if (get_fd_from_hostaddr(mrs_host, NULL) < 0) {
> > +        return true;
> > +    }
> > +
> > +    mrs_host = mrs_host & ~(mrs_page - 1);
> > +    mrs_gpa = mrs_gpa & ~(mrs_page - 1);
> > +    mrs_size = ROUND_UP(mrs_size, mrs_page);
> > +
> > +    prev_sec = sync->mr_sections + (sync->n_mr_sections - 1);
> > +    uint64_t prev_gpa_start = prev_sec->offset_within_address_space;
> > +    uint64_t prev_size = int128_get64(prev_sec->size);
> > +    uint64_t prev_gpa_end   = range_get_last(prev_gpa_start, prev_size);
> > +    uint64_t prev_host_start =
> > +        (uintptr_t)memory_region_get_ram_ptr(prev_sec->mr) +
> > +        prev_sec->offset_within_region;
> > +    uint64_t prev_host_end = range_get_last(prev_host_start, prev_size);
> > +
> > +    if (mrs_gpa <= (prev_gpa_end + 1)) {
> > +        g_assert(mrs_gpa > prev_gpa_start);
> > +
> > +        if ((section->mr == prev_sec->mr) &&
> > +            proxy_mrs_can_merge(mrs_host, prev_host_start,
> > +                                (mrs_gpa - prev_gpa_start))) {
> > +            uint64_t max_end = MAX(prev_host_end, mrs_host + mrs_size);
> > +            merged = true;
> > +            prev_sec->offset_within_address_space =
> > +                MIN(prev_gpa_start, mrs_gpa);
> > +            prev_sec->offset_within_region =
> > +                MIN(prev_host_start, mrs_host) -
> > +                (uintptr_t)memory_region_get_ram_ptr(prev_sec->mr);
> > +            prev_sec->size = int128_make64(max_end - MIN(prev_host_start,
> > +                                                         mrs_host));
> > +        }
> > +    }
> > +
> > +    return merged;
> > +}
> > +
> > +static void proxy_ml_region_addnop(MemoryListener *listener,
> > +                                   MemoryRegionSection *section)
> > +{
> > +    RemoteMemSync *sync = container_of(listener, RemoteMemSync, listener);
> > +
> > +    if (!(memory_region_is_ram(section->mr) &&
> > +          !memory_region_is_rom(section->mr))) {
> > +        return;
> >
> > A bit clearer in vhost.c:
> > if (memory_region_is_ram(mr) && !memory_region_is_rom(mr)) {
> >
> > +    }
> > +
> > +    if (try_merge(sync, section)) {
> > +        return;
> > +    }
> > +
> > +    ++sync->n_mr_sections;
> > +    sync->mr_sections = g_renew(MemoryRegionSection, sync->mr_sections,
> > +                                sync->n_mr_sections);
> > +    sync->mr_sections[sync->n_mr_sections - 1] = *section;
> > +    sync->mr_sections[sync->n_mr_sections - 1].fv = NULL;
> > +    memory_region_ref(section->mr);
> > +}
> > +
> > +static void proxy_ml_commit(MemoryListener *listener)
> > +{
> > +    RemoteMemSync *sync = container_of(listener, RemoteMemSync, listener);
> > +    MPQemuMsg msg;
> > +    MemoryRegionSection *section;
> > +    ram_addr_t offset;
> > +    uintptr_t host_addr;
> > +    int region;
> > +    Error *local_err = NULL;
> > +
> > +    memset(&msg, 0, sizeof(MPQemuMsg));
> > +
> > +    msg.cmd = SYNC_SYSMEM;
> > +    msg.num_fds = sync->n_mr_sections;
> > +    msg.size = sizeof(SyncSysmemMsg);
> > +    if (msg.num_fds > REMOTE_MAX_FDS) {
> > +        error_report("Number of fds is more than %d", REMOTE_MAX_FDS);
> > +        return;
> > +    }
> > +
> > +    for (region = 0; region < sync->n_mr_sections; region++) {
> > +        section = &sync->mr_sections[region];
> > +        msg.data.sync_sysmem.gpas[region] =
> > +            section->offset_within_address_space;
> > +        msg.data.sync_sysmem.sizes[region] = int128_get64(section->size);
> > +        host_addr = (uintptr_t)memory_region_get_ram_ptr(section->mr) +
> > +                    section->offset_within_region;
> > +        msg.fds[region] = get_fd_from_hostaddr(host_addr, &offset);
> > +        msg.data.sync_sysmem.offsets[region] = offset;
> > +    }
> > +    mpqemu_msg_send(&msg, sync->ioc, &local_err);
> > +    if (local_err) {
> > +        error_report("Error in sending command %d", msg.cmd);
> > +    }
> > +}
> >
> > That whole complex code above duplicates much of the logic in vhost.c.
> > Can we try to factorize it instead?
>
> Hi Marc-Andre,
>
> Thank you for sharing your feedback!
>
> Would it be alright if we addressed this item alone in a separate, future
> patch? Since this refactoring affects vhost code, we’re wondering whether
> it would be better to handle it separately, to help with any later
> regression analysis.
>

That's fine with me, but please leave a TODO note in the code then.

thanks


> Thank you!
> —
> Jag
>
> >
> > +
> > +void deconfigure_memory_sync(RemoteMemSync *sync)
> > +{
> > +    memory_listener_unregister(&sync->listener);
> > +
> > +    proxy_ml_begin(&sync->listener);
> > +}
> > +
> > +void configure_memory_sync(RemoteMemSync *sync, QIOChannel *ioc)
> > +{
> > +    sync->n_mr_sections = 0;
> > +    sync->mr_sections = NULL;
> > +
> > +    sync->ioc = ioc;
> > +
> > +    sync->listener.begin = proxy_ml_begin;
> > +    sync->listener.commit = proxy_ml_commit;
> > +    sync->listener.region_add = proxy_ml_region_addnop;
> > +    sync->listener.region_nop = proxy_ml_region_addnop;
> > +    sync->listener.priority = 10;
> > +
> > +    memory_listener_register(&sync->listener, &address_space_memory);
> > +}
> > diff --git a/hw/remote/message.c b/hw/remote/message.c
> > index 0f3e38a..454fd2d 100644
> > --- a/hw/remote/message.c
> > +++ b/hw/remote/message.c
> > @@ -17,6 +17,7 @@
> >  #include "sysemu/runstate.h"
> >  #include "hw/pci/pci.h"
> >  #include "exec/memattrs.h"
> > +#include "hw/remote/memory.h"
> >
> >  static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
> >                                   MPQemuMsg *msg);
> > @@ -64,6 +65,10 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
> >          case BAR_READ:
> >              process_bar_read(com->ioc, &msg, &local_err);
> >              break;
> > +        case SYNC_SYSMEM:
> > +            remote_sysmem_reconfig(&msg, &local_err);
> > +            break;
> > +
> >          default:
> >              error_setg(&local_err,
> >                         "Unknown command (%d) received for device %s
> (pid=%d)",
> > diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
> > index 039347d..0f2d1aa 100644
> > --- a/hw/remote/proxy.c
> > +++ b/hw/remote/proxy.c
> > @@ -18,6 +18,8 @@
> >  #include "migration/blocker.h"
> >  #include "hw/remote/mpqemu-link.h"
> >  #include "qemu/error-report.h"
> > +#include "hw/remote/memory-sync.h"
> > +#include "qom/object.h"
> >
> >  static void proxy_set_socket(PCIProxyDev *pdev, int fd, Error **errp)
> >  {
> > @@ -58,6 +60,8 @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
> >
> >      qemu_mutex_init(&dev->io_mutex);
> >      qio_channel_set_blocking(dev->ioc, true, NULL);
> > +
> > +    configure_memory_sync(&dev->sync, dev->ioc);
> >  }
> >
> >  static void pci_proxy_dev_exit(PCIDevice *pdev)
> > @@ -69,6 +73,8 @@ static void pci_proxy_dev_exit(PCIDevice *pdev)
> >      migrate_del_blocker(dev->migration_blocker);
> >
> >      error_free(dev->migration_blocker);
> > +
> > +    deconfigure_memory_sync(&dev->sync);
> >  }
> >
> >  static int config_op_send(PCIProxyDev *pdev, uint32_t addr, uint32_t *val,
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index ebd1d1d..5d78b78 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -3150,6 +3150,8 @@ F: include/hw/remote/memory.h
> >  F: hw/remote/memory.c
> >  F: hw/remote/proxy.c
> >  F: include/hw/remote/proxy.h
> > +F: hw/remote/memory-sync.c
> > +F: include/hw/remote/memory-sync.h
> >
> >  Build and test automation
> >  -------------------------
> > diff --git a/hw/remote/meson.build b/hw/remote/meson.build
> > index 569cd20..7d434a5 100644
> > --- a/hw/remote/meson.build
> > +++ b/hw/remote/meson.build
> > @@ -7,5 +7,6 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
> >  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
> >
> >  specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
> > +specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory-sync.c'))
> >
> >  softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
> > --
> > 1.8.3.1
> >
> >
> >
> > --
> > Marc-André Lureau
>
>

-- 
Marc-André Lureau


* Re: [PATCH v12 08/19] multi-process: define MPQemuMsg format and transmission functions
  2020-12-07 13:18   ` Marc-André Lureau
@ 2020-12-10  1:40     ` Elena Ufimtseva
  2020-12-10  8:20       ` Marc-André Lureau
  0 siblings, 1 reply; 52+ messages in thread
From: Elena Ufimtseva @ 2020-12-10  1:40 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Fam Zheng, John G Johnson, Swapnil Ingle, Michael S. Tsirkin,
	QEMU, Gerd Hoffmann, Jagannathan Raman, Juan Quintela,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Thanos Makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Paolo Bonzini

On Mon, Dec 07, 2020 at 05:18:46PM +0400, Marc-André Lureau wrote:
> Hi
> 
> On Wed, Dec 2, 2020 at 12:25 AM Jagannathan Raman <jag.raman@oracle.com>
> wrote:
> 
> > From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> >
> > Defines MPQemuMsg, which is the message that is sent to the remote
> > process. This message is sent over QIOChannel and is used to
> > command the remote process to perform various tasks.
> > Define transmission functions used by proxy and by remote.
> > There are certain restrictions on where its safe to use these
> > functions:
> >   - From main loop in co-routine context. Will block the main loop if not
> > in
> >     co-routine context;
> >   - From vCPU thread with no co-routine context and if the channel is not
> > part
> >     of the main loop handling;
> >   - From IOThread within co-routine context, outside of co-routine context
> > will
> >     block IOThread;
> >
> > Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> > Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > ---
> >  include/hw/remote/mpqemu-link.h |  60 ++++++++++
> >  hw/remote/mpqemu-link.c         | 242
> > ++++++++++++++++++++++++++++++++++++++++
> >  MAINTAINERS                     |   2 +
> >  hw/remote/meson.build           |   1 +
> >  4 files changed, 305 insertions(+)
> >  create mode 100644 include/hw/remote/mpqemu-link.h
> >  create mode 100644 hw/remote/mpqemu-link.c
> >
> > diff --git a/include/hw/remote/mpqemu-link.h
> > b/include/hw/remote/mpqemu-link.h
> > new file mode 100644
> > index 0000000..2d79ff8
> > --- /dev/null
> > +++ b/include/hw/remote/mpqemu-link.h
> > @@ -0,0 +1,60 @@
> > +/*
> > + * Communication channel between QEMU and remote device process
> > + *
> > + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or
> > later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#ifndef MPQEMU_LINK_H
> > +#define MPQEMU_LINK_H
> > +
> > +#include "qom/object.h"
> > +#include "qemu/thread.h"
> > +#include "io/channel.h"
> > +
> > +#define REMOTE_MAX_FDS 8
> > +
> > +#define MPQEMU_MSG_HDR_SIZE offsetof(MPQemuMsg, data.u64)
> > +
> > +/**
> > + * MPQemuCmd:
> > + *
> > + * MPQemuCmd enum type to specify the command to be executed on the remote
> > + * device.
> > + */
> > +typedef enum {
> > +    MPQEMU_CMD_INIT,
> > +    MPQEMU_CMD_MAX,
> > +} MPQemuCmd;
> > +
> > +/**
> > + * MPQemuMsg:
> > + * @cmd: The remote command
> > + * @size: Size of the data to be shared
> > + * @data: Structured data
> > + * @fds: File descriptors to be shared with remote device
> > + *
> > + * MPQemuMsg Format of the message sent to the remote device from QEMU.
> > + *
> > + */
> > +typedef struct {
> > +    int cmd;
> > +    size_t size;
> > +
> > +    union {
> > +        uint64_t u64;
> > +    } data;
> > +
> > +    int fds[REMOTE_MAX_FDS];
> > +    int num_fds;
> > +} MPQemuMsg;
> > +
> > +void mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
> > +void mpqemu_msg_recv(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
> > +
> > +bool mpqemu_msg_valid(MPQemuMsg *msg);
> > +
> > +#endif
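The header/payload split defined by MPQEMU_MSG_HDR_SIZE can be illustrated
outside QEMU. The following is only a minimal sketch under stated
assumptions: DemoMsg, DEMO_MSG_HDR_SIZE, demo_msg_send() and demo_msg_recv()
are hypothetical names, the fd array is left out, and plain POSIX
writev()/read() on a file descriptor stand in for the QIOChannel calls used
by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical, trimmed-down message: no fd array, single-member union. */
typedef struct {
    int cmd;
    size_t size;            /* number of valid bytes in data */
    union {
        uint64_t u64;
    } data;
} DemoMsg;

/* Everything in front of the payload counts as the header. */
#define DEMO_MSG_HDR_SIZE offsetof(DemoMsg, data)

/* Send header and payload as two iovec segments in a single writev(). */
static ssize_t demo_msg_send(int fd, DemoMsg *msg)
{
    struct iovec iov[2];

    assert(msg->size <= sizeof(msg->data));

    iov[0].iov_base = msg;
    iov[0].iov_len = DEMO_MSG_HDR_SIZE;
    iov[1].iov_base = &msg->data;
    iov[1].iov_len = msg->size;

    return writev(fd, iov, 2);
}

/*
 * Receive the fixed-size header first, check the advertised payload size,
 * then read the payload. A real receiver would also loop over short reads.
 */
static bool demo_msg_recv(int fd, DemoMsg *msg)
{
    if (read(fd, msg, DEMO_MSG_HDR_SIZE) != (ssize_t)DEMO_MSG_HDR_SIZE) {
        return false;
    }
    if (msg->size > sizeof(msg->data)) {
        return false;       /* never trust the size field blindly */
    }
    return read(fd, &msg->data, msg->size) == (ssize_t)msg->size;
}
```

Reading the header on its own lets the receiver validate the size field
before it decides how many payload bytes to accept, which is the same order
of operations mpqemu_msg_recv() follows in the patch.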
> > diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
> > new file mode 100644
> > index 0000000..e535ed2
> > --- /dev/null
> > +++ b/hw/remote/mpqemu-link.c
> > @@ -0,0 +1,242 @@
> > +/*
> > + * Communication channel between QEMU and remote device process
> > + *
> > + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or
> > later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qemu-common.h"
> > +
> > +#include "qemu/module.h"
> > +#include "hw/remote/mpqemu-link.h"
> > +#include "qapi/error.h"
> > +#include "qemu/iov.h"
> > +#include "qemu/error-report.h"
> > +#include "qemu/main-loop.h"
> > +
> > +/*
> > + * Send message over the ioc QIOChannel.
> > + * This function is safe to call from:
> > + * - From main loop in co-routine context. Will block the main loop if not in
> > + *   co-routine context;
> > + * - From vCPU thread with no co-routine context and if the channel is not part
> > + *   of the main loop handling;
> > + * - From IOThread within co-routine context, outside of co-routine context
> > + *   will block IOThread;
> >
> 
> Can drop the extra "From" on each line.
> 
> + */
> > +void mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
> > +{
> > +    bool iolock = qemu_mutex_iothread_locked();
> > +    bool iothread = qemu_get_current_aio_context() == qemu_get_aio_context() ?
> > +                    false : true;
> >
> 
> I would introduce a qemu_in_iothread() helper (similar to
> qemu_in_coroutine() etc)
> 
> +    Error *local_err = NULL;
> > +    struct iovec send[2] = {0};
> > +    int *fds = NULL;
> > +    size_t nfds = 0;
> > +
> > +    send[0].iov_base = msg;
> > +    send[0].iov_len = MPQEMU_MSG_HDR_SIZE;
> > +
> > +    send[1].iov_base = (void *)&msg->data;
> > +    send[1].iov_len = msg->size;
> > +
> > +    if (msg->num_fds) {
> > +        nfds = msg->num_fds;
> > +        fds = msg->fds;
> > +    }
> > +    /*
> > +     * Dont use in IOThread out of co-routine context as
> > +     * it will block IOThread.
> > +     */
> > +    if (iothread) {
> > +        assert(qemu_in_coroutine());
> > +    }
> >
> 
> or simply assert(!iothread || qemu_in_coroutine())
> 
> +    /*
> > +     * Skip unlocking/locking iothread when in IOThread running
> > +     * in co-routine context. Co-routine context is asserted above
> > +     * for IOThread case.
> > +     * Also skip this while in a co-routine in the main context.
> > +     */
> > +    if (iolock && !iothread && !qemu_in_coroutine()) {
> > +        qemu_mutex_unlock_iothread();
> > +    }
> > +
> > +    (void)qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send), fds, nfds,
> > +                                      &local_err);
> >
> 
> That extra (void) is probably unnecessary.
> 
> 
> +
> > +    if (iolock && !iothread && !qemu_in_coroutine()) {
> > +        /* See above comment why skip locking here. */
> > +        qemu_mutex_lock_iothread();
> > +    }
> > +
> > +    if (errp) {
> > +        error_propagate(errp, local_err);
> > +    } else if (local_err) {
> > +        error_report_err(local_err);
> > +    }
> >
> 

Hi Marc-Andre,

Thank you for reviewing the patches.


> Not sure this behaviour is recommended. Instead, a trace and an ERRP_GUARD
> would be more idiomatic.

Did you mean to suggest using trace_ functions for general use, and not only
on the failure path? I just want to make sure I understood correctly.

Should the trace file subdirectory (in this case ./hw/remote/) be included in
trace_events_subdirs of meson.build, on the condition that CONFIG_MULTIPROCESS
is enabled?

Something like 
<snip>

config_devices_mak_file = target + '-config-devices.mak'
devconfig = keyval.load(meson.current_build_dir() / config_devices_mak_file)
have_multiprocess = 'CONFIG_MULTIPROCESS' in devconfig

if have_multiprocess
...

</snip>

Thank you!

Elena
> 
> 
> > +
> > +    return;
> >
> 
> That's an unnecessary return. Why not return true/false based on error?
> 
> +}
> > +
> > +/*
> > + * Read message from the ioc QIOChannel.
> > + * This function is safe to call from:
> > + * - From main loop in co-routine context. Will block the main loop if not in
> > + *   co-routine context;
> > + * - From vCPU thread with no co-routine context and if the channel is not part
> > + *   of the main loop handling;
> > + * - From IOThread within co-routine context, outside of co-routine context
> > + *   will block IOThread;
> > + */
> > +static ssize_t mpqemu_read(QIOChannel *ioc, void *buf, size_t len, int **fds,
> > +                           size_t *nfds, Error **errp)
> > +{
> > +    struct iovec iov = { .iov_base = buf, .iov_len = len };
> > +    bool iolock = qemu_mutex_iothread_locked();
> > +    bool iothread = qemu_get_current_aio_context() == qemu_get_aio_context()
> > +                        ? false : true;
> > +    struct iovec *iovp = &iov;
> > +    Error *local_err = NULL;
> > +    unsigned int niov = 1;
> > +    size_t *l_nfds = nfds;
> > +    int **l_fds = fds;
> > +    ssize_t bytes = 0;
> > +    size_t size;
> > +
> > +    size = iov.iov_len;
> > +
> > +    /*
> > +     * Dont use in IOThread out of co-routine context as
> > +     * it will block IOThread.
> > +     */
> > +    if (iothread) {
> > +        assert(qemu_in_coroutine());
> > +    }
> >
> 
> as above
> 
> 
> > +
> > +    while (size > 0) {
> > +        bytes = qio_channel_readv_full(ioc, iovp, niov, l_fds, l_nfds,
> > +                                       &local_err);
> > +        if (bytes == QIO_CHANNEL_ERR_BLOCK) {
> > +            /*
> > +             * Skip unlocking/locking iothread when in IOThread running
> > +             * in co-routine context. Co-routine context is asserted above
> > +             * for IOThread case.
> > +             * Also skip this while in a co-routine in the main context.
> > +             */
> > +            if (iolock && !iothread && !qemu_in_coroutine()) {
> > +                qemu_mutex_unlock_iothread();
> >
> 
> Why not lock the iothread at the beginning of the function and call a
> readv_full_all like we do for writes?
> 
> +            }
> > +            if (qemu_in_coroutine()) {
> > +                qio_channel_yield(ioc, G_IO_IN);
> > +            } else {
> > +                qio_channel_wait(ioc, G_IO_IN);
> > +            }
> > +            /* See above comment why skip locking here. */
> > +            if (iolock && !iothread && !qemu_in_coroutine()) {
> > +                qemu_mutex_lock_iothread();
> > +            }
> > +            continue;
> >
> +        }
> > +
> > +        if (bytes <= 0) {
> > +            error_propagate(errp, local_err);
> > +            return -EIO;
> > +        }
> > +
> > +        l_fds = NULL;
> > +        l_nfds = NULL;
> > +
> > +        size -= bytes;
> > +
> > +        (void)iov_discard_front(&iovp, &niov, bytes);
> >
> 
> needless cast
> 
> +    }
> > +
> > +    return len - size;
> > +}
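The loop in mpqemu_read() keeps reading until the requested length has
arrived, which is also what a readv_full_all-style helper would do. Reduced
to plain POSIX, the idea looks roughly like the sketch below. It is an
illustration only: demo_read_all() is a hypothetical name, it works on a
blocking file descriptor with read(), and it leaves out fd passing,
coroutine yielding and the iothread lock handling that the patch has to
deal with.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read exactly len bytes from fd into buf, looping over short reads.
 * Returns len on success, the number of bytes read so far if the peer
 * closed the stream early, or -1 on error (errno is set by read()).
 */
static ssize_t demo_read_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t done = 0;

    assert(buf != NULL || len == 0);

    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);

        if (n < 0) {
            if (errno == EINTR) {
                continue;   /* interrupted before any data arrived: retry */
            }
            return -1;
        }
        if (n == 0) {
            break;          /* end of stream */
        }
        done += (size_t)n;
    }

    return (ssize_t)done;
}
```

Returning the short count on early end of stream lets the caller tell a
closed connection (fewer bytes than requested) apart from an I/O error (-1).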
> > +
> > +void mpqemu_msg_recv(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
> > +{
> > +    Error *local_err = NULL;
> > +    int *fds = NULL;
> > +    size_t nfds = 0;
> > +    ssize_t len;
> > +
> > +    len = mpqemu_read(ioc, (void *)msg, MPQEMU_MSG_HDR_SIZE, &fds, &nfds,
> >
> 
> This cast is not necessary
> 
> +                      &local_err);
> > +    if (!local_err) {
> > +        if (len == -EIO) {
> > +            error_setg(&local_err, "Connection closed.");
> > +            goto fail;
> > +        }
> > +        if (len < 0) {
> > +            error_setg(&local_err, "Message length is less than 0");
> > +            goto fail;
> > +        }
> > +        if (len != MPQEMU_MSG_HDR_SIZE) {
> > +            error_setg(&local_err, "Message header corrupted");
> > +            goto fail;
> > +        }
> > +    } else {
> > +        goto fail;
> > +    }
> > +
> > +    if (msg->size > sizeof(msg->data)) {
> > +        error_setg(&local_err, "Invalid size for message");
> > +        goto fail;
> > +    }
> > +
> > +    if (mpqemu_read(ioc, (void *)&msg->data, msg->size, NULL, NULL,
> >
> 
> that one too
> 
> +                    &local_err) < 0) {
> > +        goto fail;
> > +    }
> > +
> > +    msg->num_fds = nfds;
> > +    if (nfds > G_N_ELEMENTS(msg->fds)) {
> > +        error_setg(&local_err,
> > +                   "Overflow error: received %zu fds, more than max of %d
> > fds",
> > +                   nfds, REMOTE_MAX_FDS);
> > +        goto fail;
> > +    } else if (nfds) {
> > +        memcpy(msg->fds, fds, nfds * sizeof(int));
> > +    }
> > +
> > +fail:
> > +    while (local_err && nfds) {
> > +        close(fds[nfds - 1]);
> > +        nfds--;
> > +    }
> > +
> > +    g_free(fds);
> > +
> > +    if (errp) {
> > +        error_propagate(errp, local_err);
> > +    } else if (local_err) {
> > +        error_report_err(local_err);
> > +    }
> > +}
> > +
> > +bool mpqemu_msg_valid(MPQemuMsg *msg)
> > +{
> > +    if (msg->cmd >= MPQEMU_CMD_MAX && msg->cmd < 0) {
> > +        return false;
> > +    }
> > +
> > +    /* Verify FDs. */
> > +    if (msg->num_fds >= REMOTE_MAX_FDS) {
> > +        return false;
> > +    }
> > +
> > +    if (msg->num_fds > 0) {
> > +        for (int i = 0; i < msg->num_fds; i++) {
> > +            if (fcntl(msg->fds[i], F_GETFL) == -1) {
> > +                return false;
> > +            }
> > +        }
> > +    }
> > +
> > +    return true;
> > +}
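A note on the command range check in mpqemu_msg_valid() above: the two
comparisons are joined with &&, and no value can be both >= MPQEMU_CMD_MAX
and < 0 at the same time, so the condition as posted is never true and no
command is ever rejected. The self-contained sketch below contrasts the two
forms. DEMO_CMD_INIT, DEMO_CMD_MAX and both functions are hypothetical
stand-ins, not code from the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins mirroring the enum in the patch. */
enum { DEMO_CMD_INIT, DEMO_CMD_MAX };

/* The condition as posted: "too large AND negative" can never hold. */
static bool demo_cmd_rejected_as_posted(int cmd)
{
    return cmd >= DEMO_CMD_MAX && cmd < 0;
}

/* A command is valid only if it lies in [0, DEMO_CMD_MAX). */
static bool demo_cmd_valid(int cmd)
{
    /* "||": out of range means too large OR negative. */
    if (cmd >= DEMO_CMD_MAX || cmd < 0) {
        return false;
    }
    return true;
}
```

With the || form, -1 and DEMO_CMD_MAX are both rejected, while the &&
form lets every value through.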
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index c45ac1d..d0c891a 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -3141,6 +3141,8 @@ F: hw/pci-host/remote.c
> >  F: include/hw/pci-host/remote.h
> >  F: hw/remote/machine.c
> >  F: include/hw/remote/machine.h
> > +F: hw/remote/mpqemu-link.c
> > +F: include/hw/remote/mpqemu-link.h
> >
> >  Build and test automation
> >  -------------------------
> > diff --git a/hw/remote/meson.build b/hw/remote/meson.build
> > index 197b038..a2b2fc0 100644
> > --- a/hw/remote/meson.build
> > +++ b/hw/remote/meson.build
> > @@ -1,5 +1,6 @@
> >  remote_ss = ss.source_set()
> >
> >  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
> > +remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true:
> > files('mpqemu-link.c'))
> >
> >  softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
> > --
> > 1.8.3.1
> >
> >
> 
> -- 
> Marc-André Lureau



* Re: [PATCH v12 08/19] multi-process: define MPQemuMsg format and transmission functions
  2020-12-10  1:40     ` Elena Ufimtseva
@ 2020-12-10  8:20       ` Marc-André Lureau
  2020-12-10 12:53         ` Elena Ufimtseva
  0 siblings, 1 reply; 52+ messages in thread
From: Marc-André Lureau @ 2020-12-10  8:20 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: Fam Zheng, John G Johnson, Swapnil Ingle, Michael S. Tsirkin,
	QEMU, Gerd Hoffmann, Jagannathan Raman, Juan Quintela,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Thanos Makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Paolo Bonzini

Hi

On Thu, Dec 10, 2020 at 5:42 AM Elena Ufimtseva <elena.ufimtseva@oracle.com>
wrote:

> On Mon, Dec 07, 2020 at 05:18:46PM +0400, Marc-André Lureau wrote:
> > Hi
> >
> > On Wed, Dec 2, 2020 at 12:25 AM Jagannathan Raman <jag.raman@oracle.com>
> > wrote:
> >
> > > From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > >
> > > Defines MPQemuMsg, which is the message that is sent to the remote
> > > process. This message is sent over QIOChannel and is used to
> > > command the remote process to perform various tasks.
> > > Define transmission functions used by proxy and by remote.
> > > There are certain restrictions on where its safe to use these
> > > functions:
> > >   - From main loop in co-routine context. Will block the main loop if
> not
> > > in
> > >     co-routine context;
> > >   - From vCPU thread with no co-routine context and if the channel is
> not
> > > part
> > >     of the main loop handling;
> > >   - From IOThread within co-routine context, outside of co-routine
> context
> > > will
> > >     block IOThread;
> > >
> > > Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> > > Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> > > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > > ---
> > >  include/hw/remote/mpqemu-link.h |  60 ++++++++++
> > >  hw/remote/mpqemu-link.c         | 242
> > > ++++++++++++++++++++++++++++++++++++++++
> > >  MAINTAINERS                     |   2 +
> > >  hw/remote/meson.build           |   1 +
> > >  4 files changed, 305 insertions(+)
> > >  create mode 100644 include/hw/remote/mpqemu-link.h
> > >  create mode 100644 hw/remote/mpqemu-link.c
> > >
> > > diff --git a/include/hw/remote/mpqemu-link.h
> > > b/include/hw/remote/mpqemu-link.h
> > > new file mode 100644
> > > index 0000000..2d79ff8
> > > --- /dev/null
> > > +++ b/include/hw/remote/mpqemu-link.h
> > > @@ -0,0 +1,60 @@
> > > +/*
> > > + * Communication channel between QEMU and remote device process
> > > + *
> > > + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2 or
> > > later.
> > > + * See the COPYING file in the top-level directory.
> > > + *
> > > + */
> > > +
> > > +#ifndef MPQEMU_LINK_H
> > > +#define MPQEMU_LINK_H
> > > +
> > > +#include "qom/object.h"
> > > +#include "qemu/thread.h"
> > > +#include "io/channel.h"
> > > +
> > > +#define REMOTE_MAX_FDS 8
> > > +
> > > +#define MPQEMU_MSG_HDR_SIZE offsetof(MPQemuMsg, data.u64)
> > > +
> > > +/**
> > > + * MPQemuCmd:
> > > + *
> > > + * MPQemuCmd enum type to specify the command to be executed on the
> remote
> > > + * device.
> > > + */
> > > +typedef enum {
> > > +    MPQEMU_CMD_INIT,
> > > +    MPQEMU_CMD_MAX,
> > > +} MPQemuCmd;
> > > +
> > > +/**
> > > + * MPQemuMsg:
> > > + * @cmd: The remote command
> > > + * @size: Size of the data to be shared
> > > + * @data: Structured data
> > > + * @fds: File descriptors to be shared with remote device
> > > + *
> > > + * MPQemuMsg Format of the message sent to the remote device from
> QEMU.
> > > + *
> > > + */
> > > +typedef struct {
> > > +    int cmd;
> > > +    size_t size;
> > > +
> > > +    union {
> > > +        uint64_t u64;
> > > +    } data;
> > > +
> > > +    int fds[REMOTE_MAX_FDS];
> > > +    int num_fds;
> > > +} MPQemuMsg;
> > > +
> > > +void mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
> > > +void mpqemu_msg_recv(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
> > > +
> > > +bool mpqemu_msg_valid(MPQemuMsg *msg);
> > > +
> > > +#endif
> > > diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
> > > new file mode 100644
> > > index 0000000..e535ed2
> > > --- /dev/null
> > > +++ b/hw/remote/mpqemu-link.c
> > > @@ -0,0 +1,242 @@
> > > +/*
> > > + * Communication channel between QEMU and remote device process
> > > + *
> > > + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2 or
> > > later.
> > > + * See the COPYING file in the top-level directory.
> > > + *
> > > + */
> > > +
> > > +#include "qemu/osdep.h"
> > > +#include "qemu-common.h"
> > > +
> > > +#include "qemu/module.h"
> > > +#include "hw/remote/mpqemu-link.h"
> > > +#include "qapi/error.h"
> > > +#include "qemu/iov.h"
> > > +#include "qemu/error-report.h"
> > > +#include "qemu/main-loop.h"
> > > +
> > > +/*
> > > + * Send message over the ioc QIOChannel.
> > > + * This function is safe to call from:
> > > + * - From main loop in co-routine context. Will block the main loop if
> > > not in
> > > + *   co-routine context;
> > > + * - From vCPU thread with no co-routine context and if the channel is
> > > not part
> > > + *   of the main loop handling;
> > > + * - From IOThread within co-routine context, outside of co-routine
> > > context
> > > + *   will block IOThread;
> > >
> >
> > Can drop the extra "From" on each line.
> >
> > + */
> > > +void mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
> > > +{
> > > +    bool iolock = qemu_mutex_iothread_locked();
> > > +    bool iothread = qemu_get_current_aio_context() ==
> > > qemu_get_aio_context() ?
> > > +                    false : true;
> > >
> >
> > I would introduce a qemu_in_iothread() helper (similar to
> > qemu_in_coroutine() etc)
> >
> > +    Error *local_err = NULL;
> > > +    struct iovec send[2] = {0};
> > > +    int *fds = NULL;
> > > +    size_t nfds = 0;
> > > +
> > > +    send[0].iov_base = msg;
> > > +    send[0].iov_len = MPQEMU_MSG_HDR_SIZE;
> > > +
> > > +    send[1].iov_base = (void *)&msg->data;
> > > +    send[1].iov_len = msg->size;
> > > +
> > > +    if (msg->num_fds) {
> > > +        nfds = msg->num_fds;
> > > +        fds = msg->fds;
> > > +    }
> > > +    /*
> > > +     * Dont use in IOThread out of co-routine context as
> > > +     * it will block IOThread.
> > > +     */
> > > +    if (iothread) {
> > > +        assert(qemu_in_coroutine());
> > > +    }
> > >
> >
> > or simply assert(!iothread || qemu_in_coroutine())
> >
> > +    /*
> > > +     * Skip unlocking/locking iothread when in IOThread running
> > > +     * in co-routine context. Co-routine context is asserted above
> > > +     * for IOThread case.
> > > +     * Also skip this while in a co-routine in the main context.
> > > +     */
> > > +    if (iolock && !iothread && !qemu_in_coroutine()) {
> > > +        qemu_mutex_unlock_iothread();
> > > +    }
> > > +
> > > +    (void)qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send),
> fds,
> > > nfds,
> > > +                                      &local_err);
> > >
> >
> > That extra (void) is probably unnecessary.
> >
> >
> > +
> > > +    if (iolock && !iothread && !qemu_in_coroutine()) {
> > > +        /* See above comment why skip locking here. */
> > > +        qemu_mutex_lock_iothread();
> > > +    }
> > > +
> > > +    if (errp) {
> > > +        error_propagate(errp, local_err);
> > > +    } else if (local_err) {
> > > +        error_report_err(local_err);
> > > +    }
> > >
> >
>
> Hi Marc-Andre,
>
> Thank you for reviewing the patches.
>
>
> > Not sure this behaviour is recommended. Instead, a trace and an
> ERRP_GUARD
> > would be more idiomatic.
>
> Did you mean to suggest using trace_ functions for general use, and not
> only on the failure path? I just want to make sure I understood correctly.
>

This is what I would suggest for error handling (not tested):

diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
index d75b4782ee..a7ac37627e 100644
--- a/hw/remote/mpqemu-link.c
+++ b/hw/remote/mpqemu-link.c
@@ -31,10 +31,10 @@
  */
 void mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
 {
+    ERRP_GUARD();
     bool iolock = qemu_mutex_iothread_locked();
     bool iothread = qemu_get_current_aio_context() == qemu_get_aio_context() ?
                     false : true;
-    Error *local_err = NULL;
     struct iovec send[2] = {0};
     int *fds = NULL;
     size_t nfds = 0;
@@ -66,21 +66,15 @@ void mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
         qemu_mutex_unlock_iothread();
     }

-    (void)qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send), fds, nfds,
-                                      &local_err);
+    if (qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send), fds, nfds, errp) == -1) {
+        trace_mpqemu_io_error(msg, ioc, error_get_pretty(*errp));
+    }

     if (iolock && !iothread && !qemu_in_coroutine()) {
         /* See above comment why skip locking here. */
         qemu_mutex_lock_iothread();
     }

-    if (errp) {
-        error_propagate(errp, local_err);
-    } else if (local_err) {
-        error_report_err(local_err);
-    }
-
-    return;
 }
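
As a rough illustration of the calling convention that the diff above and
the earlier question ("Why not return true/false based on error?") point
towards, here is a self-contained sketch. It does not use QEMU's Error API:
DemoError and demo_msg_send() are hypothetical stand-ins, and the "send" is
reduced to a size check so that the example can run on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an error object; not QEMU's Error type. */
typedef struct DemoError {
    const char *msg;
} DemoError;

/*
 * Pretend to send payload_len bytes into a data area of max_len bytes.
 * Returns true on success. On failure it returns false and, if the caller
 * passed a non-NULL errp, stores a description there.
 */
static bool demo_msg_send(size_t payload_len, size_t max_len, DemoError **errp)
{
    static DemoError too_big = { "payload larger than message data area" };

    /* The caller must not hand in an errp that already holds an error. */
    assert(errp == NULL || *errp == NULL);

    if (payload_len > max_len) {
        if (errp) {
            *errp = &too_big;
        }
        return false;
    }
    return true;
}
```

A caller that only cares about success can pass NULL and test the return
value, while a caller that wants the details passes the address of a NULL
error pointer and inspects it on failure. The function itself never prints
anything, so reporting stays the caller's decision.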




>
> Should the trace file subdirectory (in this case ./hw/remote/) be included
> in trace_events_subdirs of meson.build, on the condition that
> CONFIG_MULTIPROCESS is enabled?
>
> Something like
> <snip>
>
> config_devices_mak_file = target + '-config-devices.mak'
> devconfig = keyval.load(meson.current_build_dir() / config_devices_mak_file)
> have_multiprocess = 'CONFIG_MULTIPROCESS' in devconfig
>
> if have_multiprocess
> ...
>
> </snip>
>

That shouldn't be necessary. Do as the other hw/ trace directories do, which
simply add themselves to trace_events_subdirs.


-- 
Marc-André Lureau


* Re: [PATCH v12 00/19] Initial support for multi-process Qemu
  2020-12-03 20:40   ` Peter Maydell
@ 2020-12-10 11:13     ` Stefan Hajnoczi
  2020-12-10 11:24       ` Peter Maydell
  0 siblings, 1 reply; 52+ messages in thread
From: Stefan Hajnoczi @ 2020-12-10 11:13 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, john.g.johnson,
	QEMU Developers, Gerd Hoffmann, Jagannathan Raman, Juan Quintela,
	Michael S. Tsirkin, Markus Armbruster, kanth.ghatraju,
	Felipe Franciosi, Thomas Huth, Eduardo Habkost, konrad.wilk,
	Dr. David Alan Gilbert, Alex Williamson, thanos.makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Marc-André Lureau, Paolo Bonzini

On Thu, Dec 03, 2020 at 08:40:11PM +0000, Peter Maydell wrote:
> On Thu, 3 Dec 2020 at 09:51, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > On Tue, Dec 01, 2020 at 03:22:35PM -0500, Jagannathan Raman wrote:
> > > This is the v12 of the patchset. Thank you very much for the
> > > review of the v11 of the series.
> >
> > I'm in favor of merging this for QEMU 6.0. The command-line interface
> > has the x- prefix so QEMU is not committing to a stable interface.
> > Changes needed to support additional device types or to switch to the
> > vfio-user protocol can be made later.
> >
> > Jag, Elena, JJ: I suggest getting your GPG key to Peter Maydell so you
> > can send multi-process QEMU pull requests.
> 
> I would prefer to see this going through the tree of an
> established QEMU developer who's already sending pull requests,
> at least initially.

Once the discussion has completed I can send the patches in a pull
request.

I don't want to be the bottleneck for all multi-process QEMU patches in
the future though. That's why I think the authors should be able to send
pull requests on their own after the initial code is merged. Much of
this work is isolated and only affects multi-process QEMU, and the feature
is marked experimental. There is little risk of introducing instability
for non-multi-process QEMU users/developers. That is why this is a new
subsystem and has its own MAINTAINERS file entries.

Does that sound good?

Stefan


* Re: [PATCH v12 00/19] Initial support for multi-process Qemu
  2020-12-10 11:13     ` Stefan Hajnoczi
@ 2020-12-10 11:24       ` Peter Maydell
  2020-12-10 15:31         ` Stefan Hajnoczi
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Maydell @ 2020-12-10 11:24 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, john.g.johnson,
	QEMU Developers, Gerd Hoffmann, Jagannathan Raman, Juan Quintela,
	Michael S. Tsirkin, Markus Armbruster, kanth.ghatraju,
	Felipe Franciosi, Thomas Huth, Eduardo Habkost, konrad.wilk,
	Dr. David Alan Gilbert, Alex Williamson, thanos.makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Marc-André Lureau, Paolo Bonzini

On Thu, 10 Dec 2020 at 11:14, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> On Thu, Dec 03, 2020 at 08:40:11PM +0000, Peter Maydell wrote:
> > I would prefer to see this going through the tree of an
> > established QEMU developer who's already sending pullrequests,
> > at least initially.
>
> Once the discussion has completed I can send the patches in a pull
> request.
>
> I don't want to be the bottleneck for all multi-process QEMU patches in
> the future though. That's why I think the authors should be able to send
> pull requests on their own after the initial code is merged. Much of
> this work is isolated and only affects multi-process QEMU and the feature
> is marked experimental. There is little risk of introducing instability
> for non-multi-process QEMU users/developers. Hence why this is a new
> subsystem and has MAINTAINERS files entries.

My reasoning is basically that new pull-request senders are more
work for me, because I have to make sure they have a GPG key set
up, and then examine pull requests pretty carefully to check they're
well-formed, all the sign-offs are correct, the changes aren't
touching areas of the codebase that they shouldn't, and so on.
That's particularly painful if the first pull request that comes
through is a massive one rather than "here's a small number of
patches with some bug fixes".

thanks
-- PMM



* Re: [PATCH v12 08/19] multi-process: define MPQemuMsg format and transmission functions
  2020-12-10  8:20       ` Marc-André Lureau
@ 2020-12-10 12:53         ` Elena Ufimtseva
  0 siblings, 0 replies; 52+ messages in thread
From: Elena Ufimtseva @ 2020-12-10 12:53 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Fam Zheng, John G Johnson, Swapnil Ingle, Michael S. Tsirkin,
	QEMU, Gerd Hoffmann, Jagannathan Raman, Juan Quintela,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Thanos Makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Paolo Bonzini

On Thu, Dec 10, 2020 at 12:20:06PM +0400, Marc-André Lureau wrote:
> Hi
> 
> On Thu, Dec 10, 2020 at 5:42 AM Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
> 
> > On Mon, Dec 07, 2020 at 05:18:46PM +0400, Marc-André Lureau wrote:
> > > Hi
> > >
> > > On Wed, Dec 2, 2020 at 12:25 AM Jagannathan Raman <jag.raman@oracle.com> wrote:
> > >
> > > > From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > > >
> > > > Defines MPQemuMsg, which is the message that is sent to the remote
> > > > process. This message is sent over QIOChannel and is used to
> > > > command the remote process to perform various tasks.
> > > > Define transmission functions used by proxy and by remote.
> > > > There are certain restrictions on where it's safe to use these
> > > > functions:
> > > >   - From main loop in co-routine context. Will block the main loop if
> > > >     not in co-routine context;
> > > >   - From vCPU thread with no co-routine context and if the channel is
> > > >     not part of the main loop handling;
> > > >   - From IOThread within co-routine context, outside of co-routine
> > > >     context will block IOThread;
> > > >
> > > > Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> > > > Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> > > > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > > > ---
> > > >  include/hw/remote/mpqemu-link.h |  60 ++++++++++
> > > >  hw/remote/mpqemu-link.c         | 242 ++++++++++++++++++++++++++++++++++++++++
> > > >  MAINTAINERS                     |   2 +
> > > >  hw/remote/meson.build           |   1 +
> > > >  4 files changed, 305 insertions(+)
> > > >  create mode 100644 include/hw/remote/mpqemu-link.h
> > > >  create mode 100644 hw/remote/mpqemu-link.c
> > > >
> > > > diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
> > > > new file mode 100644
> > > > index 0000000..2d79ff8
> > > > --- /dev/null
> > > > +++ b/include/hw/remote/mpqemu-link.h
> > > > @@ -0,0 +1,60 @@
> > > > +/*
> > > > + * Communication channel between QEMU and remote device process
> > > > + *
> > > > + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> > > > + *
> > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > + * See the COPYING file in the top-level directory.
> > > > + *
> > > > + */
> > > > +
> > > > +#ifndef MPQEMU_LINK_H
> > > > +#define MPQEMU_LINK_H
> > > > +
> > > > +#include "qom/object.h"
> > > > +#include "qemu/thread.h"
> > > > +#include "io/channel.h"
> > > > +
> > > > +#define REMOTE_MAX_FDS 8
> > > > +
> > > > +#define MPQEMU_MSG_HDR_SIZE offsetof(MPQemuMsg, data.u64)
> > > > +
> > > > +/**
> > > > + * MPQemuCmd:
> > > > + *
> > > > + * MPQemuCmd enum type to specify the command to be executed on the remote
> > > > + * device.
> > > > + */
> > > > +typedef enum {
> > > > +    MPQEMU_CMD_INIT,
> > > > +    MPQEMU_CMD_MAX,
> > > > +} MPQemuCmd;
> > > > +
> > > > +/**
> > > > + * MPQemuMsg:
> > > > + * @cmd: The remote command
> > > > + * @size: Size of the data to be shared
> > > > + * @data: Structured data
> > > > + * @fds: File descriptors to be shared with remote device
> > > > + *
> > > > + * MPQemuMsg Format of the message sent to the remote device from QEMU.
> > > > + *
> > > > + */
> > > > +typedef struct {
> > > > +    int cmd;
> > > > +    size_t size;
> > > > +
> > > > +    union {
> > > > +        uint64_t u64;
> > > > +    } data;
> > > > +
> > > > +    int fds[REMOTE_MAX_FDS];
> > > > +    int num_fds;
> > > > +} MPQemuMsg;
> > > > +
> > > > +void mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
> > > > +void mpqemu_msg_recv(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
> > > > +
> > > > +bool mpqemu_msg_valid(MPQemuMsg *msg);
> > > > +
> > > > +#endif
> > > > diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
> > > > new file mode 100644
> > > > index 0000000..e535ed2
> > > > --- /dev/null
> > > > +++ b/hw/remote/mpqemu-link.c
> > > > @@ -0,0 +1,242 @@
> > > > +/*
> > > > + * Communication channel between QEMU and remote device process
> > > > + *
> > > > + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> > > > + *
> > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > + * See the COPYING file in the top-level directory.
> > > > + *
> > > > + */
> > > > +
> > > > +#include "qemu/osdep.h"
> > > > +#include "qemu-common.h"
> > > > +
> > > > +#include "qemu/module.h"
> > > > +#include "hw/remote/mpqemu-link.h"
> > > > +#include "qapi/error.h"
> > > > +#include "qemu/iov.h"
> > > > +#include "qemu/error-report.h"
> > > > +#include "qemu/main-loop.h"
> > > > +
> > > > +/*
> > > > + * Send message over the ioc QIOChannel.
> > > > + * This function is safe to call from:
> > > > + * - From main loop in co-routine context. Will block the main loop if
> > > > + *   not in co-routine context;
> > > > + * - From vCPU thread with no co-routine context and if the channel is
> > > > + *   not part of the main loop handling;
> > > > + * - From IOThread within co-routine context, outside of co-routine
> > > > + *   context will block IOThread;
> > > >
> > >
> > > Can drop the extra "From" on each line.
> > >
> > > + */
> > > > +void mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
> > > > +{
> > > > +    bool iolock = qemu_mutex_iothread_locked();
> > > > +    bool iothread = qemu_get_current_aio_context() == qemu_get_aio_context() ?
> > > > +                    false : true;
> > > >
> > >
> > > I would introduce a qemu_in_iothread() helper (similar to
> > > qemu_in_coroutine() etc)
> > >
> > > +    Error *local_err = NULL;
> > > > +    struct iovec send[2] = {0};
> > > > +    int *fds = NULL;
> > > > +    size_t nfds = 0;
> > > > +
> > > > +    send[0].iov_base = msg;
> > > > +    send[0].iov_len = MPQEMU_MSG_HDR_SIZE;
> > > > +
> > > > +    send[1].iov_base = (void *)&msg->data;
> > > > +    send[1].iov_len = msg->size;
> > > > +
> > > > +    if (msg->num_fds) {
> > > > +        nfds = msg->num_fds;
> > > > +        fds = msg->fds;
> > > > +    }
> > > > +    /*
> > > > +     * Dont use in IOThread out of co-routine context as
> > > > +     * it will block IOThread.
> > > > +     */
> > > > +    if (iothread) {
> > > > +        assert(qemu_in_coroutine());
> > > > +    }
> > > >
> > >
> > > or simply assert(!iothread || qemu_in_coroutine())
> > >
> > > +    /*
> > > > +     * Skip unlocking/locking iothread when in IOThread running
> > > > +     * in co-routine context. Co-routine context is asserted above
> > > > +     * for IOThread case.
> > > > +     * Also skip this while in a co-routine in the main context.
> > > > +     */
> > > > +    if (iolock && !iothread && !qemu_in_coroutine()) {
> > > > +        qemu_mutex_unlock_iothread();
> > > > +    }
> > > > +
> > > > +    (void)qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send), fds, nfds,
> > > > +                                      &local_err);
> > > >
> > >
> > > That extra (void) is probably unnecessary.
> > >
> > >
> > > +
> > > > +    if (iolock && !iothread && !qemu_in_coroutine()) {
> > > > +        /* See above comment why skip locking here. */
> > > > +        qemu_mutex_lock_iothread();
> > > > +    }
> > > > +
> > > > +    if (errp) {
> > > > +        error_propagate(errp, local_err);
> > > > +    } else if (local_err) {
> > > > +        error_report_err(local_err);
> > > > +    }
> > > >
> > >
> >
> > Hi Marc-Andre,
> >
> > Thank you for reviewing the patches.
> >
> >
> > > Not sure this behaviour is recommended. Instead, a trace and an
> > > ERRP_GUARD would be more idiomatic.
> >
> > Did you mean to suggest using trace_ functions for general use, not only
> > the failure path? Just want to make sure I understood correctly.
> >
> 
> That's what I would suggest for error handling: (not tested)
> 
> diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
> index d75b4782ee..a7ac37627e 100644
> --- a/hw/remote/mpqemu-link.c
> +++ b/hw/remote/mpqemu-link.c
> @@ -31,10 +31,10 @@
>   */
>  void mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
>  {
> +    ERRP_GUARD();
>      bool iolock = qemu_mutex_iothread_locked();
>      bool iothread = qemu_get_current_aio_context() == qemu_get_aio_context() ?
>                      false : true;
> -    Error *local_err = NULL;
>      struct iovec send[2] = {0};
>      int *fds = NULL;
>      size_t nfds = 0;
> @@ -66,21 +66,15 @@ void mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
>          qemu_mutex_unlock_iothread();
>      }
> 
> -    (void)qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send), fds, nfds,
> -                                      &local_err);
> +    if (qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send), fds, nfds, errp) == -1) {
> +        trace_mpqemu_io_error(msg, ioc, error_get_pretty(*errp));
> +    }


Thanks, that answers my question. I hadn't seen examples that
convinced me to use trace events as the means of error reporting.
Now I have :)
> 
>      if (iolock && !iothread && !qemu_in_coroutine()) {
>          /* See above comment why skip locking here. */
>          qemu_mutex_lock_iothread();
>      }
> 
> -    if (errp) {
> -        error_propagate(errp, local_err);
> -    } else if (local_err) {
> -        error_report_err(local_err);
> -    }
> -
> -    return;
>  }
> 
> 
> 
> 
> >
> > Should the trace file subdirectory (in this case ./hw/remote/) be included
> > into
> > trace_events_subdirs of meson.build with the condition that
> > CONFIG_MULTIPROCESS is enabled?
> >
> > Something like
> > <snip>
> >
> > config_devices_mak_file = target + '-config-devices.mak'
> > devconfig = keyval.load(meson.current_build_dir() / target +
> > '-config-devices.mak')
> > have_multiprocess = 'CONFIG_MULTIPROCESS' in devconfig
> >
> > if have_multiprocess
> > ...
> >
> > </snip>
> >
> 
> That shouldn't be necessary; do like the other hw/ traces, which add
> themselves to trace_events_subdirs.

Got it, thank you!
> 
> 
> -- 
> Marc-André Lureau



* Re: [PATCH v12 00/19] Initial support for multi-process Qemu
  2020-12-10 11:24       ` Peter Maydell
@ 2020-12-10 15:31         ` Stefan Hajnoczi
  0 siblings, 0 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2020-12-10 15:31 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, john.g.johnson,
	QEMU Developers, Gerd Hoffmann, Jagannathan Raman, Juan Quintela,
	Michael S. Tsirkin, Markus Armbruster, kanth.ghatraju,
	Felipe Franciosi, Thomas Huth, Eduardo Habkost, konrad.wilk,
	Dr. David Alan Gilbert, Alex Williamson, thanos.makatos,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Marc-André Lureau, Paolo Bonzini

On Thu, Dec 10, 2020 at 11:24:46AM +0000, Peter Maydell wrote:
> On Thu, 10 Dec 2020 at 11:14, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > On Thu, Dec 03, 2020 at 08:40:11PM +0000, Peter Maydell wrote:
> > > I would prefer to see this going through the tree of an
> > > established QEMU developer who's already sending pullrequests,
> > > at least initially.
> >
> > Once the discussion has completed I can send the patches in a pull
> > request.
> >
> > I don't want to be the bottleneck for all multi-process QEMU patches in
> > the future though. That's why I think the authors should be able to send
> > pull requests on their own after the initial code is merged. Much of
> > this work is isolated and only affects multi-process QEMU and the feature
> > is marked experimental. There is little risk of introducing instability
> > for non-multi-process QEMU users/developers. Hence why this is a new
> > subsystem and has MAINTAINERS files entries.
> 
> My reasoning is basically that new pull-request senders are more
> work for me, because I have to make sure they have a GPG key set
> up, and then examine pull requests pretty carefully to check they're
> well-formed, all the sign-offs are correct, the changes aren't
> touching areas of the codebase that they shouldn't, and so on.
> That's particularly painful if the first pull request that comes
> through is a massive one rather than "here's a small number of
> patches with some bug fixes".

Thanks for explaining. I will merge this series when review has
finished and send you a pull request.

Stefan


* Re: [PATCH v12 16/19] multi-process: Synchronize remote memory
  2020-12-09 21:28       ` Marc-André Lureau
@ 2020-12-10 16:57         ` Jag Raman
  0 siblings, 0 replies; 52+ messages in thread
From: Jag Raman @ 2020-12-10 16:57 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Elena Ufimtseva, Fam Zheng, Swapnil Ingle, John G Johnson, QEMU,
	Gerd Hoffmann, Juan Quintela, Michael S. Tsirkin,
	Markus Armbruster, Kanth Ghatraju, Felipe Franciosi, Thomas Huth,
	Eduardo Habkost, Konrad Rzeszutek Wilk, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Richard Henderson, Kevin Wolf, Daniel P. Berrange, Max Reitz,
	Ross Lagerwall, Thanos Makatos



> On Dec 9, 2020, at 4:28 PM, Marc-André Lureau <marcandre.lureau@gmail.com> wrote:
> 
> 
> 
> On Wed, Dec 9, 2020 at 8:20 PM Jag Raman <jag.raman@oracle.com> wrote:
> 
> 
> > On Dec 8, 2020, at 8:57 AM, Marc-André Lureau <marcandre.lureau@gmail.com> wrote:
> > 
> > Hi
> > 
> > On Wed, Dec 2, 2020 at 12:23 AM Jagannathan Raman <jag.raman@oracle.com> wrote:
> > Add memory-listener object which is used to keep the view of the RAM
> > in sync between QEMU and remote process.
> > A MemoryListener is registered for system-memory AddressSpace. The
> > listener sends SYNC_SYSMEM message to the remote process when memory
> > listener commits the changes to memory, the remote process receives
> > the message and processes it in the handler for SYNC_SYSMEM message.
> > 
> > Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> > Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  include/hw/remote/memory-sync.h |  27 ++++++
> >  include/hw/remote/proxy.h       |   2 +
> >  hw/remote/memory-sync.c         | 210 ++++++++++++++++++++++++++++++++++++++++
> >  hw/remote/message.c             |   5 +
> >  hw/remote/proxy.c               |   6 ++
> >  MAINTAINERS                     |   2 +
> >  hw/remote/meson.build           |   1 +
> >  7 files changed, 253 insertions(+)
> >  create mode 100644 include/hw/remote/memory-sync.h
> >  create mode 100644 hw/remote/memory-sync.c
> > 
> > diff --git a/include/hw/remote/memory-sync.h b/include/hw/remote/memory-sync.h
> > new file mode 100644
> > index 0000000..785f76a
> > --- /dev/null
> > +++ b/include/hw/remote/memory-sync.h
> > @@ -0,0 +1,27 @@
> > +/*
> > + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#ifndef MEMORY_SYNC_H
> > +#define MEMORY_SYNC_H
> > +
> > +#include "exec/memory.h"
> > +#include "io/channel.h"
> > +
> > +typedef struct RemoteMemSync {
> > +    MemoryListener listener;
> > +
> > +    int n_mr_sections;
> > +    MemoryRegionSection *mr_sections;
> > +
> > +    QIOChannel *ioc;
> > +} RemoteMemSync;
> > +
> > +void configure_memory_sync(RemoteMemSync *sync, QIOChannel *ioc);
> > +void deconfigure_memory_sync(RemoteMemSync *sync);
> > 
> > RemoteMemSync vs MemorySync, and function with _memory_sync suffixes...
> > Naming things is hard, but trying to be consistent generally helps.
> > 
> > My understanding is that this is a proxy-dev helper to handle memory listening and sending SYNC_SYSMEM.
> > 
> > I would thus suggest naming it ProxyMemoryListener. It could eventually be folded in proxy.c
> > 
> > Please try to be consistent with header naming, structure naming, type, functions and enum prefixes etc.
> > 
> > proxy_memory_listener isn't that long imho.
> > 
> > +
> > +#endif
> > diff --git a/include/hw/remote/proxy.h b/include/hw/remote/proxy.h
> > index e29c61b..a687b7d 100644
> > --- a/include/hw/remote/proxy.h
> > +++ b/include/hw/remote/proxy.h
> > @@ -11,6 +11,7 @@
> > 
> >  #include "hw/pci/pci.h"
> >  #include "io/channel.h"
> > +#include "hw/remote/memory-sync.h"
> > 
> >  #define TYPE_PCI_PROXY_DEV "x-pci-proxy-dev"
> > 
> > @@ -40,6 +41,7 @@ struct PCIProxyDev {
> >      QemuMutex io_mutex;
> >      QIOChannel *ioc;
> >      Error *migration_blocker;
> > +    RemoteMemSync sync;
> >      ProxyMemoryRegion region[PCI_NUM_REGIONS];
> >  };
> > 
> > diff --git a/hw/remote/memory-sync.c b/hw/remote/memory-sync.c
> > new file mode 100644
> > index 0000000..2365e69
> > --- /dev/null
> > +++ b/hw/remote/memory-sync.c
> > @@ -0,0 +1,210 @@
> > +/*
> > + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qemu-common.h"
> > +
> > +#include "qemu/compiler.h"
> > +#include "qemu/int128.h"
> > +#include "qemu/range.h"
> > +#include "exec/memory.h"
> > +#include "exec/cpu-common.h"
> > +#include "cpu.h"
> > +#include "exec/ram_addr.h"
> > +#include "exec/address-spaces.h"
> > +#include "hw/remote/mpqemu-link.h"
> > +#include "hw/remote/memory-sync.h"
> > +
> > +static void proxy_ml_begin(MemoryListener *listener)
> > 
> > I suggest to rename begin -> reset 
> > 
> > +{
> > +    RemoteMemSync *sync = container_of(listener, RemoteMemSync, listener);
> > +    int mrs;
> > +
> > +    for (mrs = 0; mrs < sync->n_mr_sections; mrs++) {
> > +        memory_region_unref(sync->mr_sections[mrs].mr);
> > +    }
> > +
> > +    g_free(sync->mr_sections);
> > +    sync->mr_sections = NULL;
> > +    sync->n_mr_sections = 0;
> > +}
> > +
> > +static int get_fd_from_hostaddr(uint64_t host, ram_addr_t *offset)
> > 
> > This function is very similar to vhost_user_get_mr_data(). That suggests we could factor the code.
> > 
> > Perhaps a new memory_region_from_host_full(), or extend memory_region_from_host() with an extra optional "int *fd" argument.
> >  
> > +{
> > +    MemoryRegion *mr;
> > +    ram_addr_t off;
> > +
> > +    /**
> > +     * Assumes that the host address is a valid address as it's
> > +     * coming from the MemoryListener system. In the case host
> > +     * address is not valid, the following call would return
> > +     * the default subregion of "system_memory" region, and
> > +     * not NULL. So it's not possible to check for NULL here.
> > +     */
> > +    mr = memory_region_from_host((void *)(uintptr_t)host, &off);
> > +
> > +    if (offset) {
> > +        *offset = off;
> > +    }
> > +
> > +    return memory_region_get_fd(mr);
> > +}
> > +
> > +static bool proxy_mrs_can_merge(uint64_t host, uint64_t prev_host, size_t size)
> > +{
> > 
> > This seems similar to vhost_user_can_merge(). 
> > 
> > +    bool merge;
> > +    int fd1, fd2;
> > +
> > +    fd1 = get_fd_from_hostaddr(host, NULL);
> > +
> > +    fd2 = get_fd_from_hostaddr(prev_host, NULL);
> > +
> > +    merge = (fd1 == fd2);
> > 
> > This could be written in a simpler manner, ex:
> > 
> > if (get_fd_from_hostaddr(host, NULL) != get_fd_from_hostaddr(prev_host, NULL))
> >   return false;
> > 
> > +
> > +    merge &= ((prev_host + size) == host);
> > 
> > That check could be done early on before doing the more expensive memory_region_from_host() calls
> > 
> > +
> > +    return merge;
> > +}
> > +
> > +static bool try_merge(RemoteMemSync *sync, MemoryRegionSection *section)
> > +{
> > +    uint64_t mrs_size, mrs_gpa, mrs_page;
> > +    MemoryRegionSection *prev_sec;
> > +    bool merged = false;
> > +    uintptr_t mrs_host;
> > +    RAMBlock *mrs_rb;
> > +
> > +    if (!sync->n_mr_sections) {
> > +        return false;
> > +    }
> > +
> > +    mrs_rb = section->mr->ram_block;
> > +    mrs_page = (uint64_t)qemu_ram_pagesize(mrs_rb);
> > +    mrs_size = int128_get64(section->size);
> > +    mrs_gpa = section->offset_within_address_space;
> > +    mrs_host = (uintptr_t)memory_region_get_ram_ptr(section->mr) +
> > +               section->offset_within_region;
> > +
> > +    if (get_fd_from_hostaddr(mrs_host, NULL) < 0) {
> > +        return true;
> > +    }
> > +
> > +    mrs_host = mrs_host & ~(mrs_page - 1);
> > +    mrs_gpa = mrs_gpa & ~(mrs_page - 1);
> > +    mrs_size = ROUND_UP(mrs_size, mrs_page);
> > +
> > +    prev_sec = sync->mr_sections + (sync->n_mr_sections - 1);
> > +    uint64_t prev_gpa_start = prev_sec->offset_within_address_space;
> > +    uint64_t prev_size = int128_get64(prev_sec->size);
> > +    uint64_t prev_gpa_end   = range_get_last(prev_gpa_start, prev_size);
> > +    uint64_t prev_host_start =
> > +        (uintptr_t)memory_region_get_ram_ptr(prev_sec->mr) +
> > +        prev_sec->offset_within_region;
> > +    uint64_t prev_host_end = range_get_last(prev_host_start, prev_size);
> > +
> > +    if (mrs_gpa <= (prev_gpa_end + 1)) {
> > +        g_assert(mrs_gpa > prev_gpa_start);
> > +
> > +        if ((section->mr == prev_sec->mr) &&
> > +            proxy_mrs_can_merge(mrs_host, prev_host_start,
> > +                                (mrs_gpa - prev_gpa_start))) {
> > +            uint64_t max_end = MAX(prev_host_end, mrs_host + mrs_size);
> > +            merged = true;
> > +            prev_sec->offset_within_address_space =
> > +                MIN(prev_gpa_start, mrs_gpa);
> > +            prev_sec->offset_within_region =
> > +                MIN(prev_host_start, mrs_host) -
> > +                (uintptr_t)memory_region_get_ram_ptr(prev_sec->mr);
> > +            prev_sec->size = int128_make64(max_end - MIN(prev_host_start,
> > +                                                         mrs_host));
> > +        }
> > +    }
> > +
> > +    return merged;
> > +}
> > +
> > +static void proxy_ml_region_addnop(MemoryListener *listener,
> > +                                   MemoryRegionSection *section)
> > +{
> > +    RemoteMemSync *sync = container_of(listener, RemoteMemSync, listener);
> > +
> > +    if (!(memory_region_is_ram(section->mr) &&
> > +          !memory_region_is_rom(section->mr))) {
> > +        return;
> > 
> > A bit clearer in vhost.c:
> > if (memory_region_is_ram(mr) && !memory_region_is_rom(mr)) {
> >  
> > +    }
> > +
> > +    if (try_merge(sync, section)) {
> > +        return;
> > +    }
> > +
> > +    ++sync->n_mr_sections;
> > +    sync->mr_sections = g_renew(MemoryRegionSection, sync->mr_sections,
> > +                                sync->n_mr_sections);
> > +    sync->mr_sections[sync->n_mr_sections - 1] = *section;
> > +    sync->mr_sections[sync->n_mr_sections - 1].fv = NULL;
> > +    memory_region_ref(section->mr);
> > +}
> > +
> > +static void proxy_ml_commit(MemoryListener *listener)
> > +{
> > +    RemoteMemSync *sync = container_of(listener, RemoteMemSync, listener);
> > +    MPQemuMsg msg;
> > +    MemoryRegionSection *section;
> > +    ram_addr_t offset;
> > +    uintptr_t host_addr;
> > +    int region;
> > +    Error *local_err = NULL;
> > +
> > +    memset(&msg, 0, sizeof(MPQemuMsg));
> > +
> > +    msg.cmd = SYNC_SYSMEM;
> > +    msg.num_fds = sync->n_mr_sections;
> > +    msg.size = sizeof(SyncSysmemMsg);
> > +    if (msg.num_fds > REMOTE_MAX_FDS) {
> > +        error_report("Number of fds is more than %d", REMOTE_MAX_FDS);
> > +        return;
> > +    }
> > +
> > +    for (region = 0; region < sync->n_mr_sections; region++) {
> > +        section = &sync->mr_sections[region];
> > +        msg.data.sync_sysmem.gpas[region] =
> > +            section->offset_within_address_space;
> > +        msg.data.sync_sysmem.sizes[region] = int128_get64(section->size);
> > +        host_addr = (uintptr_t)memory_region_get_ram_ptr(section->mr) +
> > +                    section->offset_within_region;
> > +        msg.fds[region] = get_fd_from_hostaddr(host_addr, &offset);
> > +        msg.data.sync_sysmem.offsets[region] = offset;
> > +    }
> > +    mpqemu_msg_send(&msg, sync->ioc, &local_err);
> > +    if (local_err) {
> > +        error_report("Error in sending command %d", msg.cmd);
> > +    }
> > +}
> > 
> > That whole complex code above duplicates much of the logic in vhost.c. Can we try to factorize it instead?
> 
> Hi Marc-Andre,
> 
> Thank you for sharing your feedback!
> 
> Would it be alright if we addressed this item alone in a separate patch in the future? Since
> this refactoring affects vhost code, we’re wondering if it would be better to address it in a
> future patch to help with any regression analysis in the future.
> 
> That's fine with me, but please leave a TODO note in the code then.
> 
> thanks

Thank you very much for confirming!

—
Jag

> 
> 
> Thank you!
> —
> Jag
> 
> > 
> > +
> > +void deconfigure_memory_sync(RemoteMemSync *sync)
> > +{
> > +    memory_listener_unregister(&sync->listener);
> > +
> > +    proxy_ml_begin(&sync->listener);
> > +}
> > +
> > +void configure_memory_sync(RemoteMemSync *sync, QIOChannel *ioc)
> > +{
> > +    sync->n_mr_sections = 0;
> > +    sync->mr_sections = NULL;
> > +
> > +    sync->ioc = ioc;
> > +
> > +    sync->listener.begin = proxy_ml_begin;
> > +    sync->listener.commit = proxy_ml_commit;
> > +    sync->listener.region_add = proxy_ml_region_addnop;
> > +    sync->listener.region_nop = proxy_ml_region_addnop;
> > +    sync->listener.priority = 10;
> > +
> > +    memory_listener_register(&sync->listener, &address_space_memory);
> > +}
> > diff --git a/hw/remote/message.c b/hw/remote/message.c
> > index 0f3e38a..454fd2d 100644
> > --- a/hw/remote/message.c
> > +++ b/hw/remote/message.c
> > @@ -17,6 +17,7 @@
> >  #include "sysemu/runstate.h"
> >  #include "hw/pci/pci.h"
> >  #include "exec/memattrs.h"
> > +#include "hw/remote/memory.h"
> > 
> >  static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
> >                                   MPQemuMsg *msg);
> > @@ -64,6 +65,10 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
> >          case BAR_READ:
> >              process_bar_read(com->ioc, &msg, &local_err);
> >              break;
> > +        case SYNC_SYSMEM:
> > +            remote_sysmem_reconfig(&msg, &local_err);
> > +            break;
> > +
> >          default:
> >              error_setg(&local_err,
> >                         "Unknown command (%d) received for device %s (pid=%d)",
> > diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
> > index 039347d..0f2d1aa 100644
> > --- a/hw/remote/proxy.c
> > +++ b/hw/remote/proxy.c
> > @@ -18,6 +18,8 @@
> >  #include "migration/blocker.h"
> >  #include "hw/remote/mpqemu-link.h"
> >  #include "qemu/error-report.h"
> > +#include "hw/remote/memory-sync.h"
> > +#include "qom/object.h"
> > 
> >  static void proxy_set_socket(PCIProxyDev *pdev, int fd, Error **errp)
> >  {
> > @@ -58,6 +60,8 @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
> > 
> >      qemu_mutex_init(&dev->io_mutex);
> >      qio_channel_set_blocking(dev->ioc, true, NULL);
> > +
> > +    configure_memory_sync(&dev->sync, dev->ioc);
> >  }
> > 
> >  static void pci_proxy_dev_exit(PCIDevice *pdev)
> > @@ -69,6 +73,8 @@ static void pci_proxy_dev_exit(PCIDevice *pdev)
> >      migrate_del_blocker(dev->migration_blocker);
> > 
> >      error_free(dev->migration_blocker);
> > +
> > +    deconfigure_memory_sync(&dev->sync);
> >  }
> > 
> >  static int config_op_send(PCIProxyDev *pdev, uint32_t addr, uint32_t *val,
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index ebd1d1d..5d78b78 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -3150,6 +3150,8 @@ F: include/hw/remote/memory.h
> >  F: hw/remote/memory.c
> >  F: hw/remote/proxy.c
> >  F: include/hw/remote/proxy.h
> > +F: hw/remote/memory-sync.c
> > +F: include/hw/remote/memory-sync.h
> > 
> >  Build and test automation
> >  -------------------------
> > diff --git a/hw/remote/meson.build b/hw/remote/meson.build
> > index 569cd20..7d434a5 100644
> > --- a/hw/remote/meson.build
> > +++ b/hw/remote/meson.build
> > @@ -7,5 +7,6 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
> >  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
> > 
> >  specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
> > +specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory-sync.c'))
> > 
> >  softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
> > -- 
> > 1.8.3.1
> > 
> > 
> > 
> > -- 
> > Marc-André Lureau
> 
> 
> 
> -- 
> Marc-André Lureau




end of thread, other threads:[~2020-12-10 17:13 UTC | newest]

Thread overview: 52+ messages
2020-12-01 20:22 [PATCH v12 00/19] Initial support for multi-process Qemu Jagannathan Raman
2020-12-01 20:22 ` [PATCH v12 01/19] multi-process: add the concept description to docs/devel/qemu-multiprocess Jagannathan Raman
2020-12-01 20:22 ` [PATCH v12 02/19] multi-process: add configure and usage information Jagannathan Raman
2020-12-04 14:10   ` Marc-André Lureau
2020-12-04 14:37   ` Daniel P. Berrangé
2020-12-09 16:20     ` Jag Raman
2020-12-01 20:22 ` [PATCH v12 03/19] memory: alloc RAM from file at offset Jagannathan Raman
2020-12-04 14:13   ` Marc-André Lureau
2020-12-04 14:18     ` Marc-André Lureau
2020-12-01 20:22 ` [PATCH v12 04/19] multi-process: Add config option for multi-process QEMU Jagannathan Raman
2020-12-01 20:22 ` [PATCH v12 05/19] multi-process: setup PCI host bridge for remote device Jagannathan Raman
2020-12-04 14:29   ` Marc-André Lureau
2020-12-04 14:32   ` Marc-André Lureau
2020-12-01 20:22 ` [PATCH v12 06/19] multi-process: setup a machine object for remote device process Jagannathan Raman
2020-12-04 14:35   ` Marc-André Lureau
2020-12-09 16:56     ` Jag Raman
2020-12-01 20:22 ` [PATCH v12 07/19] multi-process: add qio channel function to transmit data and fds Jagannathan Raman
2020-12-04 14:40   ` Marc-André Lureau
2020-12-01 20:22 ` [PATCH v12 08/19] multi-process: define MPQemuMsg format and transmission functions Jagannathan Raman
2020-12-07 13:18   ` Marc-André Lureau
2020-12-10  1:40     ` Elena Ufimtseva
2020-12-10  8:20       ` Marc-André Lureau
2020-12-10 12:53         ` Elena Ufimtseva
2020-12-01 20:22 ` [PATCH v12 09/19] multi-process: Initialize message handler in remote device Jagannathan Raman
2020-12-07 13:33   ` Marc-André Lureau
2020-12-01 20:22 ` [PATCH v12 10/19] multi-process: Associate fd of a PCIDevice with its object Jagannathan Raman
2020-12-07 14:03   ` Marc-André Lureau
2020-12-08 12:07     ` Marc-André Lureau
2020-12-01 20:22 ` [PATCH v12 11/19] multi-process: setup memory manager for remote device Jagannathan Raman
2020-12-08 11:54   ` Marc-André Lureau
2020-12-08 11:58   ` Marc-André Lureau
2020-12-01 20:22 ` [PATCH v12 12/19] multi-process: introduce proxy object Jagannathan Raman
2020-12-08 12:23   ` Marc-André Lureau
2020-12-01 20:22 ` [PATCH v12 13/19] multi-process: add proxy communication functions Jagannathan Raman
2020-12-08 12:39   ` Marc-André Lureau
2020-12-01 20:22 ` [PATCH v12 14/19] multi-process: Forward PCI config space acceses to the remote process Jagannathan Raman
2020-12-08 12:52   ` Marc-André Lureau
2020-12-01 20:22 ` [PATCH v12 15/19] multi-process: PCI BAR read/write handling for proxy & remote endpoints Jagannathan Raman
2020-12-01 20:22 ` [PATCH v12 16/19] multi-process: Synchronize remote memory Jagannathan Raman
2020-12-08 13:57   ` Marc-André Lureau
2020-12-09 16:18     ` Jag Raman
2020-12-09 21:28       ` Marc-André Lureau
2020-12-10 16:57         ` Jag Raman
2020-12-01 20:22 ` [PATCH v12 17/19] multi-process: create IOHUB object to handle irq Jagannathan Raman
2020-12-01 20:22 ` [PATCH v12 18/19] multi-process: Retrieve PCI info from remote process Jagannathan Raman
2020-12-01 20:22 ` [PATCH v12 19/19] multi-process: perform device reset in the " Jagannathan Raman
2020-12-03  9:14 ` [PATCH v12 00/19] Initial support for multi-process Qemu Stefan Hajnoczi
2020-12-03 19:26   ` Elena Ufimtseva
2020-12-03 20:40   ` Peter Maydell
2020-12-10 11:13     ` Stefan Hajnoczi
2020-12-10 11:24       ` Peter Maydell
2020-12-10 15:31         ` Stefan Hajnoczi
