All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] vhost-user: Back-end state migration
@ 2023-07-12 11:16 ` Hanna Czenczek
  0 siblings, 0 replies; 36+ messages in thread
From: Hanna Czenczek @ 2023-07-12 11:16 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Czenczek, Michael S . Tsirkin, Stefan Hajnoczi,
	German Maglione, Eugenio Pérez

RFC:
https://lists.nongnu.org/archive/html/qemu-devel/2023-03/msg04263.html

v1:
https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg01575.html

Based-on: <20230711155230.64277-1-hreitz@redhat.com>
          (“[PATCH 0/6] vhost-user: Add suspend/resume”)
https://lists.nongnu.org/archive/html/qemu-devel/2023-07/msg02450.html


Hi,

Most of the feedback on v1 focused on how to properly stop a device that
has internal state, both for the purpose of migration and just plainly
for VM stop/cont.  I hope this is resolved by my vhost-user
suspend/resume series, which this series here is based on (linked
above).

Therefore, these patches here for the actual state transfer have changed
only little since v1:

- I’ve removed the original patch 1 that would explicitly re-enable all
  vrings every time after using SET_FEATURES with
  VHOST_USER_F_PROTOCOL_FEATURES, replacing it with the stand-alone
  patch “[PATCH] vhost-user.rst: Clarify enabling/disabling vrings”
  (20230712091704.15589-1-hreitz@redhat.com), which does not change
  behavior but only documentation

- In its place, I’ve added a patch that adds documentation on the
  protocol additions.

- Patch 2: Drastically shortened the commit message, this documentation
  is now where it should be, namely in vhost-user.rst (added by patch
  1).  Also rebased on the “Add suspend/resume” series, which adds
  feature bits and vhost-user operations of its own.

- Patch 3: Instead of checking dev->started to see that the device is
  stopped, we might now want to check dev->suspended instead.  However,
  this is only set if the device actually supports suspending, so in
  fact we cannot use it.  Added documentation on this fact.
  (Dropped checking enable_vqs, because this variable had been added
  only by the old patch 1, which I dropped.)


Hanna Czenczek (4):
  vhost-user.rst: Migrating back-end-internal state
  vhost-user: Interface for migration state transfer
  vhost: Add high-level state save/load functions
  vhost-user-fs: Implement internal migration

 docs/interop/vhost-user.rst       |  87 +++++++++++
 include/hw/virtio/vhost-backend.h |  24 +++
 include/hw/virtio/vhost.h         | 114 ++++++++++++++
 hw/virtio/vhost-user-fs.c         | 101 ++++++++++++-
 hw/virtio/vhost-user.c            | 147 ++++++++++++++++++
 hw/virtio/vhost.c                 | 241 ++++++++++++++++++++++++++++++
 6 files changed, 713 insertions(+), 1 deletion(-)

-- 
2.41.0



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Virtio-fs] [PATCH v2 0/4] vhost-user: Back-end state migration
@ 2023-07-12 11:16 ` Hanna Czenczek
  0 siblings, 0 replies; 36+ messages in thread
From: Hanna Czenczek @ 2023-07-12 11:16 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Czenczek, Michael S . Tsirkin, Stefan Hajnoczi,
	German Maglione, Eugenio Pérez

RFC:
https://lists.nongnu.org/archive/html/qemu-devel/2023-03/msg04263.html

v1:
https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg01575.html

Based-on: <20230711155230.64277-1-hreitz@redhat.com>
          (“[PATCH 0/6] vhost-user: Add suspend/resume”)
https://lists.nongnu.org/archive/html/qemu-devel/2023-07/msg02450.html


Hi,

Most of the feedback on v1 focused on how to properly stop a device that
has internal state, both for the purpose of migration and just plainly
for VM stop/cont.  I hope this is resolved by my vhost-user
suspend/resume series, which this series here is based on (linked
above).

Therefore, these patches here for the actual state transfer have changed
only little since v1:

- I’ve removed the original patch 1 that would explicitly re-enable all
  vrings every time after using SET_FEATURES with
  VHOST_USER_F_PROTOCOL_FEATURES, replacing it with the stand-alone
  patch “[PATCH] vhost-user.rst: Clarify enabling/disabling vrings”
  (20230712091704.15589-1-hreitz@redhat.com), which does not change
  behavior but only documentation

- In its place, I’ve added a patch that adds documentation on the
  protocol additions.

- Patch 2: Drastically shortened the commit message, this documentation
  is now where it should be, namely in vhost-user.rst (added by patch
  1).  Also rebased on the “Add suspend/resume” series, which adds
  feature bits and vhost-user operations of its own.

- Patch 3: Instead of checking dev->started to see that the device is
  stopped, we might now want to check dev->suspended instead.  However,
  this is only set if the device actually supports suspending, so in
  fact we cannot use it.  Added documentation on this fact.
  (Dropped checking enable_vqs, because this variable had been added
  only by the old patch 1, which I dropped.)


Hanna Czenczek (4):
  vhost-user.rst: Migrating back-end-internal state
  vhost-user: Interface for migration state transfer
  vhost: Add high-level state save/load functions
  vhost-user-fs: Implement internal migration

 docs/interop/vhost-user.rst       |  87 +++++++++++
 include/hw/virtio/vhost-backend.h |  24 +++
 include/hw/virtio/vhost.h         | 114 ++++++++++++++
 hw/virtio/vhost-user-fs.c         | 101 ++++++++++++-
 hw/virtio/vhost-user.c            | 147 ++++++++++++++++++
 hw/virtio/vhost.c                 | 241 ++++++++++++++++++++++++++++++
 6 files changed, 713 insertions(+), 1 deletion(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 1/4] vhost-user.rst: Migrating back-end-internal state
  2023-07-12 11:16 ` [Virtio-fs] " Hanna Czenczek
@ 2023-07-12 11:16   ` Hanna Czenczek
  -1 siblings, 0 replies; 36+ messages in thread
From: Hanna Czenczek @ 2023-07-12 11:16 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Czenczek, Michael S . Tsirkin, Stefan Hajnoczi,
	German Maglione, Eugenio Pérez

For vhost-user devices, qemu can migrate the virtio state, but not the
back-end's internal state.  To do so, we need to be able to transfer
this internal state between front-end (qemu) and back-end.

At this point, this new feature is added for the purpose of virtio-fs
migration.  Because virtiofsd's internal state will not be too large, we
believe it is best to transfer it as a single binary blob after the
streaming phase.

These are the additions to the protocol:
- New vhost-user protocol feature VHOST_USER_PROTOCOL_F_MIGRATORY_STATE
- SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a pipe
  over which to transfer the state.
- CHECK_DEVICE_STATE: After the state has been transferred through the
  pipe, the front-end invokes this function to verify success.  There is
  no in-band way (through the pipe) to indicate failure, so we need to
  check explicitly.

Once the transfer pipe has been established via SET_DEVICE_STATE_FD
(which includes establishing the direction of transfer and migration
phase), the sending side writes its data into the pipe, and the reading
side reads it until it sees an EOF.  Then, the front-end will check for
success via CHECK_DEVICE_STATE, which on the destination side includes
checking for integrity (i.e. errors during deserialization).

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 docs/interop/vhost-user.rst | 87 +++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index ac6be34c4c..c98dfeca25 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -334,6 +334,7 @@ in the ancillary data:
 * ``VHOST_USER_SET_VRING_ERR``
 * ``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
 * ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
+* ``VHOST_USER_SET_DEVICE_STATE_FD``
 
 If *front-end* is unable to send the full message or receives a wrong
 reply it will close the connection. An optional reconnection mechanism
@@ -497,6 +498,44 @@ it performs WAKE ioctl's on the userfaultfd to wake the stalled
 back-end.  The front-end indicates support for this via the
 ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.
 
+.. _migrating_backend_state:
+
+Migrating back-end state
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+If the back-end has internal state that is to be sent from source to
+destination, the front-end may be able to store and transfer it via an
+internal migration stream.  Support for this is negotiated with the
+``VHOST_USER_PROTOCOL_F_MIGRATORY_STATE`` feature.
+
+First, a channel over which the state is transferred is established on
+the source side using the ``VHOST_USER_SET_DEVICE_STATE_FD`` message.
+This message has two parameters:
+
+* Direction of transfer: On the source, the data is saved, transferring
+  it from the back-end to the front-end.  On the destination, the data
+  is loaded, transferring it from the front-end to the back-end.
+
+* Migration phase: Currently, only the period after memory transfer
+  before switch-over is supported, in which the device is suspended and
+  all of its rings are stopped.
+
+Then, the writing end will write all the data it has, signalling the end
+of data by closing its end of the pipe.  The reading end must read all
+of this data and process it:
+
+* If saving, the front-end will transfer this data to the destination,
+  where it is loaded into the destination back-end.
+
+* If loading, the back-end must deserialize its internal state from the
+  transferred data and be set up to resume operation.
+
+After the front-end has seen all data transferred (saving: seen an EOF
+on the pipe; loading: closed its end of the pipe), it sends the
+``VHOST_USER_CHECK_DEVICE_STATE`` message to verify that data transfer
+was successful in the back-end, too.  The back-end responds once it
+knows whether the tranfer and processing was successful or not.
+
 Memory access
 -------------
 
@@ -891,6 +930,7 @@ Protocol features
   #define VHOST_USER_PROTOCOL_F_STATUS               16
   #define VHOST_USER_PROTOCOL_F_XEN_MMAP             17
   #define VHOST_USER_PROTOCOL_F_SUSPEND              18
+  #define VHOST_USER_PROTOCOL_F_MIGRATORY_STATE      19
 
 Front-end message types
 -----------------------
@@ -1471,6 +1511,53 @@ Front-end message types
   before.  The back-end must again begin processing rings that are not
   stopped, and it may resume background operations.
 
+``VHOST_USER_SET_DEVICE_STATE_FD``
+  :id: 43
+  :equivalent ioctl: N/A
+  :request payload: device state transfer parameters
+  :reply payload: ``u64``
+
+  The front-end negotiates a pipe over which to transfer the back-end’s
+  internal state during migration.  For this purpose, this message is
+  accompanied by a file descriptor that is to be the back-end’s end of
+  the pipe.  If the back-end can provide a more efficient pipe (i.e.
+  because it internally already has a pipe into/from which to
+  put/receive state), it can ignore this and reply with a different file
+  descriptor to serve as the front-end’s end.
+
+  The request payload contains parameters for the subsequent data
+  transfer, as described in the :ref:`Migrating back-end state
+  <migrating_backend_state>` section.  That section also explains the
+  data transfer itself.
+
+  The value returned is both an indication for success, and whether a
+  new pipe file descriptor is returned: Bits 0–7 are 0 on success, and
+  non-zero on error.  Bit 8 is the invalid FD flag; this flag is set
+  when there is no file descriptor returned.  When this flag is not set,
+  the front-end must use the returned file descriptor as its end of the
+  pipe.  The back-end must not both indicate an error and return a file
+  descriptor.
+
+  Using this function requires prior negotiation of the
+  ``VHOST_USER_PROTOCOL_F_MIGRATORY_STATE`` feature.
+
+``VHOST_USER_CHECK_DEVICE_STATE``
+  :id: 44
+  :equivalent ioctl: N/A
+  :request payload: N/A
+  :reply payload: ``u64``
+
+  After transferring the back-end’s internal state during migration (see
+  the :ref:`Migrating back-end state <migrating_backend_state>`
+  section), check whether the back-end was able to successfully fully
+  process the state.
+
+  The value returned indicates success or error; 0 is success, any
+  non-zero value is an error.
+
+  Using this function requires prior negotiation of the
+  ``VHOST_USER_PROTOCOL_F_MIGRATORY_STATE`` feature.
+
 
 Back-end message types
 ----------------------
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Virtio-fs] [PATCH v2 1/4] vhost-user.rst: Migrating back-end-internal state
@ 2023-07-12 11:16   ` Hanna Czenczek
  0 siblings, 0 replies; 36+ messages in thread
From: Hanna Czenczek @ 2023-07-12 11:16 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Czenczek, Michael S . Tsirkin, Stefan Hajnoczi,
	German Maglione, Eugenio Pérez

For vhost-user devices, qemu can migrate the virtio state, but not the
back-end's internal state.  To do so, we need to be able to transfer
this internal state between front-end (qemu) and back-end.

At this point, this new feature is added for the purpose of virtio-fs
migration.  Because virtiofsd's internal state will not be too large, we
believe it is best to transfer it as a single binary blob after the
streaming phase.

These are the additions to the protocol:
- New vhost-user protocol feature VHOST_USER_PROTOCOL_F_MIGRATORY_STATE
- SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a pipe
  over which to transfer the state.
- CHECK_DEVICE_STATE: After the state has been transferred through the
  pipe, the front-end invokes this function to verify success.  There is
  no in-band way (through the pipe) to indicate failure, so we need to
  check explicitly.

Once the transfer pipe has been established via SET_DEVICE_STATE_FD
(which includes establishing the direction of transfer and migration
phase), the sending side writes its data into the pipe, and the reading
side reads it until it sees an EOF.  Then, the front-end will check for
success via CHECK_DEVICE_STATE, which on the destination side includes
checking for integrity (i.e. errors during deserialization).

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 docs/interop/vhost-user.rst | 87 +++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index ac6be34c4c..c98dfeca25 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -334,6 +334,7 @@ in the ancillary data:
 * ``VHOST_USER_SET_VRING_ERR``
 * ``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
 * ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
+* ``VHOST_USER_SET_DEVICE_STATE_FD``
 
 If *front-end* is unable to send the full message or receives a wrong
 reply it will close the connection. An optional reconnection mechanism
@@ -497,6 +498,44 @@ it performs WAKE ioctl's on the userfaultfd to wake the stalled
 back-end.  The front-end indicates support for this via the
 ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.
 
+.. _migrating_backend_state:
+
+Migrating back-end state
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+If the back-end has internal state that is to be sent from source to
+destination, the front-end may be able to store and transfer it via an
+internal migration stream.  Support for this is negotiated with the
+``VHOST_USER_PROTOCOL_F_MIGRATORY_STATE`` feature.
+
+First, a channel over which the state is transferred is established on
+the source side using the ``VHOST_USER_SET_DEVICE_STATE_FD`` message.
+This message has two parameters:
+
+* Direction of transfer: On the source, the data is saved, transferring
+  it from the back-end to the front-end.  On the destination, the data
+  is loaded, transferring it from the front-end to the back-end.
+
+* Migration phase: Currently, only the period after memory transfer
+  before switch-over is supported, in which the device is suspended and
+  all of its rings are stopped.
+
+Then, the writing end will write all the data it has, signalling the end
+of data by closing its end of the pipe.  The reading end must read all
+of this data and process it:
+
+* If saving, the front-end will transfer this data to the destination,
+  where it is loaded into the destination back-end.
+
+* If loading, the back-end must deserialize its internal state from the
+  transferred data and be set up to resume operation.
+
+After the front-end has seen all data transferred (saving: seen an EOF
+on the pipe; loading: closed its end of the pipe), it sends the
+``VHOST_USER_CHECK_DEVICE_STATE`` message to verify that data transfer
+was successful in the back-end, too.  The back-end responds once it
+knows whether the tranfer and processing was successful or not.
+
 Memory access
 -------------
 
@@ -891,6 +930,7 @@ Protocol features
   #define VHOST_USER_PROTOCOL_F_STATUS               16
   #define VHOST_USER_PROTOCOL_F_XEN_MMAP             17
   #define VHOST_USER_PROTOCOL_F_SUSPEND              18
+  #define VHOST_USER_PROTOCOL_F_MIGRATORY_STATE      19
 
 Front-end message types
 -----------------------
@@ -1471,6 +1511,53 @@ Front-end message types
   before.  The back-end must again begin processing rings that are not
   stopped, and it may resume background operations.
 
+``VHOST_USER_SET_DEVICE_STATE_FD``
+  :id: 43
+  :equivalent ioctl: N/A
+  :request payload: device state transfer parameters
+  :reply payload: ``u64``
+
+  The front-end negotiates a pipe over which to transfer the back-end’s
+  internal state during migration.  For this purpose, this message is
+  accompanied by a file descriptor that is to be the back-end’s end of
+  the pipe.  If the back-end can provide a more efficient pipe (i.e.
+  because it internally already has a pipe into/from which to
+  put/receive state), it can ignore this and reply with a different file
+  descriptor to serve as the front-end’s end.
+
+  The request payload contains parameters for the subsequent data
+  transfer, as described in the :ref:`Migrating back-end state
+  <migrating_backend_state>` section.  That section also explains the
+  data transfer itself.
+
+  The value returned is both an indication for success, and whether a
+  new pipe file descriptor is returned: Bits 0–7 are 0 on success, and
+  non-zero on error.  Bit 8 is the invalid FD flag; this flag is set
+  when there is no file descriptor returned.  When this flag is not set,
+  the front-end must use the returned file descriptor as its end of the
+  pipe.  The back-end must not both indicate an error and return a file
+  descriptor.
+
+  Using this function requires prior negotiation of the
+  ``VHOST_USER_PROTOCOL_F_MIGRATORY_STATE`` feature.
+
+``VHOST_USER_CHECK_DEVICE_STATE``
+  :id: 44
+  :equivalent ioctl: N/A
+  :request payload: N/A
+  :reply payload: ``u64``
+
+  After transferring the back-end’s internal state during migration (see
+  the :ref:`Migrating back-end state <migrating_backend_state>`
+  section), check whether the back-end was able to successfully fully
+  process the state.
+
+  The value returned indicates success or error; 0 is success, any
+  non-zero value is an error.
+
+  Using this function requires prior negotiation of the
+  ``VHOST_USER_PROTOCOL_F_MIGRATORY_STATE`` feature.
+
 
 Back-end message types
 ----------------------
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 2/4] vhost-user: Interface for migration state transfer
  2023-07-12 11:16 ` [Virtio-fs] " Hanna Czenczek
@ 2023-07-12 11:17   ` Hanna Czenczek
  -1 siblings, 0 replies; 36+ messages in thread
From: Hanna Czenczek @ 2023-07-12 11:17 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Czenczek, Michael S . Tsirkin, Stefan Hajnoczi,
	German Maglione, Eugenio Pérez

Add the interface for transferring the back-end's state during migration
as defined previously in vhost-user.rst.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 include/hw/virtio/vhost-backend.h |  24 +++++
 include/hw/virtio/vhost.h         |  79 ++++++++++++++++
 hw/virtio/vhost-user.c            | 147 ++++++++++++++++++++++++++++++
 hw/virtio/vhost.c                 |  37 ++++++++
 4 files changed, 287 insertions(+)

diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 31a251a9f5..e59d0b53f8 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -26,6 +26,18 @@ typedef enum VhostSetConfigType {
     VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
 } VhostSetConfigType;
 
+typedef enum VhostDeviceStateDirection {
+    /* Transfer state from back-end (device) to front-end */
+    VHOST_TRANSFER_STATE_DIRECTION_SAVE = 0,
+    /* Transfer state from front-end to back-end (device) */
+    VHOST_TRANSFER_STATE_DIRECTION_LOAD = 1,
+} VhostDeviceStateDirection;
+
+typedef enum VhostDeviceStatePhase {
+    /* The device (and all its vrings) is stopped */
+    VHOST_TRANSFER_STATE_PHASE_STOPPED = 0,
+} VhostDeviceStatePhase;
+
 struct vhost_inflight;
 struct vhost_dev;
 struct vhost_log;
@@ -133,6 +145,15 @@ typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev,
 
 typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
 
+typedef bool (*vhost_supports_migratory_state_op)(struct vhost_dev *dev);
+typedef int (*vhost_set_device_state_fd_op)(struct vhost_dev *dev,
+                                            VhostDeviceStateDirection direction,
+                                            VhostDeviceStatePhase phase,
+                                            int fd,
+                                            int *reply_fd,
+                                            Error **errp);
+typedef int (*vhost_check_device_state_op)(struct vhost_dev *dev, Error **errp);
+
 typedef struct VhostOps {
     VhostBackendType backend_type;
     vhost_backend_init vhost_backend_init;
@@ -181,6 +202,9 @@ typedef struct VhostOps {
     vhost_force_iommu_op vhost_force_iommu;
     vhost_set_config_call_op vhost_set_config_call;
     vhost_reset_status_op vhost_reset_status;
+    vhost_supports_migratory_state_op vhost_supports_migratory_state;
+    vhost_set_device_state_fd_op vhost_set_device_state_fd;
+    vhost_check_device_state_op vhost_check_device_state;
 } VhostOps;
 
 int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 69bf59d630..d8877496e5 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -346,4 +346,83 @@ int vhost_dev_set_inflight(struct vhost_dev *dev,
 int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
                            struct vhost_inflight *inflight);
 bool vhost_dev_has_iommu(struct vhost_dev *dev);
+
+/**
+ * vhost_supports_migratory_state(): Checks whether the back-end
+ * supports transferring internal state for the purpose of migration.
+ * Support for this feature is required for vhost_set_device_state_fd()
+ * and vhost_check_device_state().
+ *
+ * @dev: The vhost device
+ *
+ * Returns true if the device supports these commands, and false if it
+ * does not.
+ */
+bool vhost_supports_migratory_state(struct vhost_dev *dev);
+
+/**
+ * vhost_set_device_state_fd(): Begin transfer of internal state from/to
+ * the back-end for the purpose of migration.  Data is to be transferred
+ * over a pipe according to @direction and @phase.  The sending end must
+ * only write to the pipe, and the receiving end must only read from it.
+ * Once the sending end is done, it closes its FD.  The receiving end
+ * must take this as the end-of-transfer signal and close its FD, too.
+ *
+ * @fd is the back-end's end of the pipe: The write FD for SAVE, and the
+ * read FD for LOAD.  This function transfers ownership of @fd to the
+ * back-end, i.e. closes it in the front-end.
+ *
+ * The back-end may optionally reply with an FD of its own, if this
+ * improves efficiency on its end.  In this case, the returned FD is
+ * stored in *reply_fd.  The back-end will discard the FD sent to it,
+ * and the front-end must use *reply_fd for transferring state to/from
+ * the back-end.
+ *
+ * @dev: The vhost device
+ * @direction: The direction in which the state is to be transferred.
+ *             For outgoing migrations, this is SAVE, and data is read
+ *             from the back-end and stored by the front-end in the
+ *             migration stream.
+ *             For incoming migrations, this is LOAD, and data is read
+ *             by the front-end from the migration stream and sent to
+ *             the back-end to restore the saved state.
+ * @phase: Which migration phase we are in.  Currently, there is only
+ *         STOPPED (device and all vrings are stopped), in the future,
+ *         more phases such as PRE_COPY or POST_COPY may be added.
+ * @fd: Back-end's end of the pipe through which to transfer state; note
+ *      that ownership is transferred to the back-end, so this function
+ *      closes @fd in the front-end.
+ * @reply_fd: If the back-end wishes to use a different pipe for state
+ *            transfer, this will contain an FD for the front-end to
+ *            use.  Otherwise, -1 is stored here.
+ * @errp: Potential error description
+ *
+ * Returns 0 on success, and -errno on failure.
+ */
+int vhost_set_device_state_fd(struct vhost_dev *dev,
+                              VhostDeviceStateDirection direction,
+                              VhostDeviceStatePhase phase,
+                              int fd,
+                              int *reply_fd,
+                              Error **errp);
+
+/**
+ * vhost_set_device_state_fd(): After transferring state from/to the
+ * back-end via vhost_set_device_state_fd(), i.e. once the sending end
+ * has closed the pipe, inquire the back-end to report any potential
+ * errors that have occurred on its side.  This allows to sense errors
+ * like:
+ * - During outgoing migration, when the source side had already started
+ *   to produce its state, something went wrong and it failed to finish
+ * - During incoming migration, when the received state is somehow
+ *   invalid and cannot be processed by the back-end
+ *
+ * @dev: The vhost device
+ * @errp: Potential error description
+ *
+ * Returns 0 when the back-end reports successful state transfer and
+ * processing, and -errno when an error occurred somewhere.
+ */
+int vhost_check_device_state(struct vhost_dev *dev, Error **errp);
+
 #endif
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 53a881ec2a..8e6b5485e8 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -76,6 +76,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_STATUS = 16,
     /* Feature 17 reserved for VHOST_USER_PROTOCOL_F_XEN_MMAP. */
     VHOST_USER_PROTOCOL_F_SUSPEND = 18,
+    VHOST_USER_PROTOCOL_F_MIGRATORY_STATE = 19,
     VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -125,6 +126,8 @@ typedef enum VhostUserRequest {
     VHOST_USER_GET_STATUS = 40,
     VHOST_USER_SUSPEND = 41,
     VHOST_USER_RESUME = 42,
+    VHOST_USER_SET_DEVICE_STATE_FD = 43,
+    VHOST_USER_CHECK_DEVICE_STATE = 44,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -216,6 +219,12 @@ typedef struct {
     uint32_t size; /* the following payload size */
 } QEMU_PACKED VhostUserHeader;
 
+/* Request payload of VHOST_USER_SET_DEVICE_STATE_FD */
+typedef struct VhostUserTransferDeviceState {
+    uint32_t direction;
+    uint32_t phase;
+} VhostUserTransferDeviceState;
+
 typedef union {
 #define VHOST_USER_VRING_IDX_MASK   (0xff)
 #define VHOST_USER_VRING_NOFD_MASK  (0x1 << 8)
@@ -230,6 +239,7 @@ typedef union {
         VhostUserCryptoSession session;
         VhostUserVringArea area;
         VhostUserInflight inflight;
+        VhostUserTransferDeviceState transfer_state;
 } VhostUserPayload;
 
 typedef struct VhostUserMsg {
@@ -2838,6 +2848,140 @@ static void vhost_user_reset_status(struct vhost_dev *dev)
     }
 }
 
+static bool vhost_user_supports_migratory_state(struct vhost_dev *dev)
+{
+    return virtio_has_feature(dev->protocol_features,
+                              VHOST_USER_PROTOCOL_F_MIGRATORY_STATE);
+}
+
+static int vhost_user_set_device_state_fd(struct vhost_dev *dev,
+                                          VhostDeviceStateDirection direction,
+                                          VhostDeviceStatePhase phase,
+                                          int fd,
+                                          int *reply_fd,
+                                          Error **errp)
+{
+    int ret;
+    struct vhost_user *vu = dev->opaque;
+    VhostUserMsg msg = {
+        .hdr = {
+            .request = VHOST_USER_SET_DEVICE_STATE_FD,
+            .flags = VHOST_USER_VERSION,
+            .size = sizeof(msg.payload.transfer_state),
+        },
+        .payload.transfer_state = {
+            .direction = direction,
+            .phase = phase,
+        },
+    };
+
+    *reply_fd = -1;
+
+    if (!vhost_user_supports_migratory_state(dev)) {
+        close(fd);
+        error_setg(errp, "Back-end does not support migration state transfer");
+        return -ENOTSUP;
+    }
+
+    ret = vhost_user_write(dev, &msg, &fd, 1);
+    close(fd);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "Failed to send SET_DEVICE_STATE_FD message");
+        return ret;
+    }
+
+    ret = vhost_user_read(dev, &msg);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "Failed to receive SET_DEVICE_STATE_FD reply");
+        return ret;
+    }
+
+    if (msg.hdr.request != VHOST_USER_SET_DEVICE_STATE_FD) {
+        error_setg(errp,
+                   "Received unexpected message type, expected %d, received %d",
+                   VHOST_USER_SET_DEVICE_STATE_FD, msg.hdr.request);
+        return -EPROTO;
+    }
+
+    if (msg.hdr.size != sizeof(msg.payload.u64)) {
+        error_setg(errp,
+                   "Received bad message size, expected %zu, received %" PRIu32,
+                   sizeof(msg.payload.u64), msg.hdr.size);
+        return -EPROTO;
+    }
+
+    if ((msg.payload.u64 & 0xff) != 0) {
+        error_setg(errp, "Back-end did not accept migration state transfer");
+        return -EIO;
+    }
+
+    if (!(msg.payload.u64 & VHOST_USER_VRING_NOFD_MASK)) {
+        *reply_fd = qemu_chr_fe_get_msgfd(vu->user->chr);
+        if (*reply_fd < 0) {
+            error_setg(errp,
+                       "Failed to get back-end-provided transfer pipe FD");
+            *reply_fd = -1;
+            return -EIO;
+        }
+    }
+
+    return 0;
+}
+
+static int vhost_user_check_device_state(struct vhost_dev *dev, Error **errp)
+{
+    int ret;
+    VhostUserMsg msg = {
+        .hdr = {
+            .request = VHOST_USER_CHECK_DEVICE_STATE,
+            .flags = VHOST_USER_VERSION,
+            .size = 0,
+        },
+    };
+
+    if (!vhost_user_supports_migratory_state(dev)) {
+        error_setg(errp, "Back-end does not support migration state transfer");
+        return -ENOTSUP;
+    }
+
+    ret = vhost_user_write(dev, &msg, NULL, 0);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "Failed to send CHECK_DEVICE_STATE message");
+        return ret;
+    }
+
+    ret = vhost_user_read(dev, &msg);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "Failed to receive CHECK_DEVICE_STATE reply");
+        return ret;
+    }
+
+    if (msg.hdr.request != VHOST_USER_CHECK_DEVICE_STATE) {
+        error_setg(errp,
+                   "Received unexpected message type, expected %d, received %d",
+                   VHOST_USER_CHECK_DEVICE_STATE, msg.hdr.request);
+        return -EPROTO;
+    }
+
+    if (msg.hdr.size != sizeof(msg.payload.u64)) {
+        error_setg(errp,
+                   "Received bad message size, expected %zu, received %" PRIu32,
+                   sizeof(msg.payload.u64), msg.hdr.size);
+        return -EPROTO;
+    }
+
+    if (msg.payload.u64 != 0) {
+        error_setg(errp, "Back-end failed to process its internal state");
+        return -EIO;
+    }
+
+    return 0;
+}
+
 const VhostOps user_ops = {
         .backend_type = VHOST_BACKEND_TYPE_USER,
         .vhost_backend_init = vhost_user_backend_init,
@@ -2874,4 +3018,7 @@ const VhostOps user_ops = {
         .vhost_set_inflight_fd = vhost_user_set_inflight_fd,
         .vhost_dev_start = vhost_user_dev_start,
         .vhost_reset_status = vhost_user_reset_status,
+        .vhost_supports_migratory_state = vhost_user_supports_migratory_state,
+        .vhost_set_device_state_fd = vhost_user_set_device_state_fd,
+        .vhost_check_device_state = vhost_user_check_device_state,
 };
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 2e28e58da7..756b6d55a8 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -2091,3 +2091,40 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
 
     return -ENOSYS;
 }
+
+bool vhost_supports_migratory_state(struct vhost_dev *dev)
+{
+    if (dev->vhost_ops->vhost_supports_migratory_state) {
+        return dev->vhost_ops->vhost_supports_migratory_state(dev);
+    }
+
+    return false;
+}
+
+int vhost_set_device_state_fd(struct vhost_dev *dev,
+                              VhostDeviceStateDirection direction,
+                              VhostDeviceStatePhase phase,
+                              int fd,
+                              int *reply_fd,
+                              Error **errp)
+{
+    if (dev->vhost_ops->vhost_set_device_state_fd) {
+        return dev->vhost_ops->vhost_set_device_state_fd(dev, direction, phase,
+                                                         fd, reply_fd, errp);
+    }
+
+    error_setg(errp,
+               "vhost transport does not support migration state transfer");
+    return -ENOSYS;
+}
+
+int vhost_check_device_state(struct vhost_dev *dev, Error **errp)
+{
+    if (dev->vhost_ops->vhost_check_device_state) {
+        return dev->vhost_ops->vhost_check_device_state(dev, errp);
+    }
+
+    error_setg(errp,
+               "vhost transport does not support migration state transfer");
+    return -ENOSYS;
+}
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Virtio-fs] [PATCH v2 2/4] vhost-user: Interface for migration state transfer
@ 2023-07-12 11:17   ` Hanna Czenczek
  0 siblings, 0 replies; 36+ messages in thread
From: Hanna Czenczek @ 2023-07-12 11:17 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Czenczek, Michael S . Tsirkin, Stefan Hajnoczi,
	German Maglione, Eugenio Pérez

Add the interface for transferring the back-end's state during migration
as defined previously in vhost-user.rst.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 include/hw/virtio/vhost-backend.h |  24 +++++
 include/hw/virtio/vhost.h         |  79 ++++++++++++++++
 hw/virtio/vhost-user.c            | 147 ++++++++++++++++++++++++++++++
 hw/virtio/vhost.c                 |  37 ++++++++
 4 files changed, 287 insertions(+)

diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 31a251a9f5..e59d0b53f8 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -26,6 +26,18 @@ typedef enum VhostSetConfigType {
     VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
 } VhostSetConfigType;
 
+typedef enum VhostDeviceStateDirection {
+    /* Transfer state from back-end (device) to front-end */
+    VHOST_TRANSFER_STATE_DIRECTION_SAVE = 0,
+    /* Transfer state from front-end to back-end (device) */
+    VHOST_TRANSFER_STATE_DIRECTION_LOAD = 1,
+} VhostDeviceStateDirection;
+
+typedef enum VhostDeviceStatePhase {
+    /* The device (and all its vrings) is stopped */
+    VHOST_TRANSFER_STATE_PHASE_STOPPED = 0,
+} VhostDeviceStatePhase;
+
 struct vhost_inflight;
 struct vhost_dev;
 struct vhost_log;
@@ -133,6 +145,15 @@ typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev,
 
 typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
 
+typedef bool (*vhost_supports_migratory_state_op)(struct vhost_dev *dev);
+typedef int (*vhost_set_device_state_fd_op)(struct vhost_dev *dev,
+                                            VhostDeviceStateDirection direction,
+                                            VhostDeviceStatePhase phase,
+                                            int fd,
+                                            int *reply_fd,
+                                            Error **errp);
+typedef int (*vhost_check_device_state_op)(struct vhost_dev *dev, Error **errp);
+
 typedef struct VhostOps {
     VhostBackendType backend_type;
     vhost_backend_init vhost_backend_init;
@@ -181,6 +202,9 @@ typedef struct VhostOps {
     vhost_force_iommu_op vhost_force_iommu;
     vhost_set_config_call_op vhost_set_config_call;
     vhost_reset_status_op vhost_reset_status;
+    vhost_supports_migratory_state_op vhost_supports_migratory_state;
+    vhost_set_device_state_fd_op vhost_set_device_state_fd;
+    vhost_check_device_state_op vhost_check_device_state;
 } VhostOps;
 
 int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 69bf59d630..d8877496e5 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -346,4 +346,83 @@ int vhost_dev_set_inflight(struct vhost_dev *dev,
 int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
                            struct vhost_inflight *inflight);
 bool vhost_dev_has_iommu(struct vhost_dev *dev);
+
+/**
+ * vhost_supports_migratory_state(): Checks whether the back-end
+ * supports transferring internal state for the purpose of migration.
+ * Support for this feature is required for vhost_set_device_state_fd()
+ * and vhost_check_device_state().
+ *
+ * @dev: The vhost device
+ *
+ * Returns true if the device supports these commands, and false if it
+ * does not.
+ */
+bool vhost_supports_migratory_state(struct vhost_dev *dev);
+
+/**
+ * vhost_set_device_state_fd(): Begin transfer of internal state from/to
+ * the back-end for the purpose of migration.  Data is to be transferred
+ * over a pipe according to @direction and @phase.  The sending end must
+ * only write to the pipe, and the receiving end must only read from it.
+ * Once the sending end is done, it closes its FD.  The receiving end
+ * must take this as the end-of-transfer signal and close its FD, too.
+ *
+ * @fd is the back-end's end of the pipe: The write FD for SAVE, and the
+ * read FD for LOAD.  This function transfers ownership of @fd to the
+ * back-end, i.e. closes it in the front-end.
+ *
+ * The back-end may optionally reply with an FD of its own, if this
+ * improves efficiency on its end.  In this case, the returned FD is
+ * stored in *reply_fd.  The back-end will discard the FD sent to it,
+ * and the front-end must use *reply_fd for transferring state to/from
+ * the back-end.
+ *
+ * @dev: The vhost device
+ * @direction: The direction in which the state is to be transferred.
+ *             For outgoing migrations, this is SAVE, and data is read
+ *             from the back-end and stored by the front-end in the
+ *             migration stream.
+ *             For incoming migrations, this is LOAD, and data is read
+ *             by the front-end from the migration stream and sent to
+ *             the back-end to restore the saved state.
+ * @phase: Which migration phase we are in.  Currently, there is only
+ *         STOPPED (device and all vrings are stopped), in the future,
+ *         more phases such as PRE_COPY or POST_COPY may be added.
+ * @fd: Back-end's end of the pipe through which to transfer state; note
+ *      that ownership is transferred to the back-end, so this function
+ *      closes @fd in the front-end.
+ * @reply_fd: If the back-end wishes to use a different pipe for state
+ *            transfer, this will contain an FD for the front-end to
+ *            use.  Otherwise, -1 is stored here.
+ * @errp: Potential error description
+ *
+ * Returns 0 on success, and -errno on failure.
+ */
+int vhost_set_device_state_fd(struct vhost_dev *dev,
+                              VhostDeviceStateDirection direction,
+                              VhostDeviceStatePhase phase,
+                              int fd,
+                              int *reply_fd,
+                              Error **errp);
+
+/**
+ * vhost_set_device_state_fd(): After transferring state from/to the
+ * back-end via vhost_set_device_state_fd(), i.e. once the sending end
+ * has closed the pipe, inquire the back-end to report any potential
+ * errors that have occurred on its side.  This allows to sense errors
+ * like:
+ * - During outgoing migration, when the source side had already started
+ *   to produce its state, something went wrong and it failed to finish
+ * - During incoming migration, when the received state is somehow
+ *   invalid and cannot be processed by the back-end
+ *
+ * @dev: The vhost device
+ * @errp: Potential error description
+ *
+ * Returns 0 when the back-end reports successful state transfer and
+ * processing, and -errno when an error occurred somewhere.
+ */
+int vhost_check_device_state(struct vhost_dev *dev, Error **errp);
+
 #endif
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 53a881ec2a..8e6b5485e8 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -76,6 +76,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_STATUS = 16,
     /* Feature 17 reserved for VHOST_USER_PROTOCOL_F_XEN_MMAP. */
     VHOST_USER_PROTOCOL_F_SUSPEND = 18,
+    VHOST_USER_PROTOCOL_F_MIGRATORY_STATE = 19,
     VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -125,6 +126,8 @@ typedef enum VhostUserRequest {
     VHOST_USER_GET_STATUS = 40,
     VHOST_USER_SUSPEND = 41,
     VHOST_USER_RESUME = 42,
+    VHOST_USER_SET_DEVICE_STATE_FD = 43,
+    VHOST_USER_CHECK_DEVICE_STATE = 44,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -216,6 +219,12 @@ typedef struct {
     uint32_t size; /* the following payload size */
 } QEMU_PACKED VhostUserHeader;
 
+/* Request payload of VHOST_USER_SET_DEVICE_STATE_FD */
+typedef struct VhostUserTransferDeviceState {
+    uint32_t direction;
+    uint32_t phase;
+} VhostUserTransferDeviceState;
+
 typedef union {
 #define VHOST_USER_VRING_IDX_MASK   (0xff)
 #define VHOST_USER_VRING_NOFD_MASK  (0x1 << 8)
@@ -230,6 +239,7 @@ typedef union {
         VhostUserCryptoSession session;
         VhostUserVringArea area;
         VhostUserInflight inflight;
+        VhostUserTransferDeviceState transfer_state;
 } VhostUserPayload;
 
 typedef struct VhostUserMsg {
@@ -2838,6 +2848,140 @@ static void vhost_user_reset_status(struct vhost_dev *dev)
     }
 }
 
+static bool vhost_user_supports_migratory_state(struct vhost_dev *dev)
+{
+    return virtio_has_feature(dev->protocol_features,
+                              VHOST_USER_PROTOCOL_F_MIGRATORY_STATE);
+}
+
+static int vhost_user_set_device_state_fd(struct vhost_dev *dev,
+                                          VhostDeviceStateDirection direction,
+                                          VhostDeviceStatePhase phase,
+                                          int fd,
+                                          int *reply_fd,
+                                          Error **errp)
+{
+    int ret;
+    struct vhost_user *vu = dev->opaque;
+    VhostUserMsg msg = {
+        .hdr = {
+            .request = VHOST_USER_SET_DEVICE_STATE_FD,
+            .flags = VHOST_USER_VERSION,
+            .size = sizeof(msg.payload.transfer_state),
+        },
+        .payload.transfer_state = {
+            .direction = direction,
+            .phase = phase,
+        },
+    };
+
+    *reply_fd = -1;
+
+    if (!vhost_user_supports_migratory_state(dev)) {
+        close(fd);
+        error_setg(errp, "Back-end does not support migration state transfer");
+        return -ENOTSUP;
+    }
+
+    ret = vhost_user_write(dev, &msg, &fd, 1);
+    close(fd);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "Failed to send SET_DEVICE_STATE_FD message");
+        return ret;
+    }
+
+    ret = vhost_user_read(dev, &msg);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "Failed to receive SET_DEVICE_STATE_FD reply");
+        return ret;
+    }
+
+    if (msg.hdr.request != VHOST_USER_SET_DEVICE_STATE_FD) {
+        error_setg(errp,
+                   "Received unexpected message type, expected %d, received %d",
+                   VHOST_USER_SET_DEVICE_STATE_FD, msg.hdr.request);
+        return -EPROTO;
+    }
+
+    if (msg.hdr.size != sizeof(msg.payload.u64)) {
+        error_setg(errp,
+                   "Received bad message size, expected %zu, received %" PRIu32,
+                   sizeof(msg.payload.u64), msg.hdr.size);
+        return -EPROTO;
+    }
+
+    if ((msg.payload.u64 & 0xff) != 0) {
+        error_setg(errp, "Back-end did not accept migration state transfer");
+        return -EIO;
+    }
+
+    if (!(msg.payload.u64 & VHOST_USER_VRING_NOFD_MASK)) {
+        *reply_fd = qemu_chr_fe_get_msgfd(vu->user->chr);
+        if (*reply_fd < 0) {
+            error_setg(errp,
+                       "Failed to get back-end-provided transfer pipe FD");
+            *reply_fd = -1;
+            return -EIO;
+        }
+    }
+
+    return 0;
+}
+
+static int vhost_user_check_device_state(struct vhost_dev *dev, Error **errp)
+{
+    int ret;
+    VhostUserMsg msg = {
+        .hdr = {
+            .request = VHOST_USER_CHECK_DEVICE_STATE,
+            .flags = VHOST_USER_VERSION,
+            .size = 0,
+        },
+    };
+
+    if (!vhost_user_supports_migratory_state(dev)) {
+        error_setg(errp, "Back-end does not support migration state transfer");
+        return -ENOTSUP;
+    }
+
+    ret = vhost_user_write(dev, &msg, NULL, 0);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "Failed to send CHECK_DEVICE_STATE message");
+        return ret;
+    }
+
+    ret = vhost_user_read(dev, &msg);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "Failed to receive CHECK_DEVICE_STATE reply");
+        return ret;
+    }
+
+    if (msg.hdr.request != VHOST_USER_CHECK_DEVICE_STATE) {
+        error_setg(errp,
+                   "Received unexpected message type, expected %d, received %d",
+                   VHOST_USER_CHECK_DEVICE_STATE, msg.hdr.request);
+        return -EPROTO;
+    }
+
+    if (msg.hdr.size != sizeof(msg.payload.u64)) {
+        error_setg(errp,
+                   "Received bad message size, expected %zu, received %" PRIu32,
+                   sizeof(msg.payload.u64), msg.hdr.size);
+        return -EPROTO;
+    }
+
+    if (msg.payload.u64 != 0) {
+        error_setg(errp, "Back-end failed to process its internal state");
+        return -EIO;
+    }
+
+    return 0;
+}
+
 const VhostOps user_ops = {
         .backend_type = VHOST_BACKEND_TYPE_USER,
         .vhost_backend_init = vhost_user_backend_init,
@@ -2874,4 +3018,7 @@ const VhostOps user_ops = {
         .vhost_set_inflight_fd = vhost_user_set_inflight_fd,
         .vhost_dev_start = vhost_user_dev_start,
         .vhost_reset_status = vhost_user_reset_status,
+        .vhost_supports_migratory_state = vhost_user_supports_migratory_state,
+        .vhost_set_device_state_fd = vhost_user_set_device_state_fd,
+        .vhost_check_device_state = vhost_user_check_device_state,
 };
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 2e28e58da7..756b6d55a8 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -2091,3 +2091,40 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
 
     return -ENOSYS;
 }
+
+bool vhost_supports_migratory_state(struct vhost_dev *dev)
+{
+    if (dev->vhost_ops->vhost_supports_migratory_state) {
+        return dev->vhost_ops->vhost_supports_migratory_state(dev);
+    }
+
+    return false;
+}
+
+int vhost_set_device_state_fd(struct vhost_dev *dev,
+                              VhostDeviceStateDirection direction,
+                              VhostDeviceStatePhase phase,
+                              int fd,
+                              int *reply_fd,
+                              Error **errp)
+{
+    if (dev->vhost_ops->vhost_set_device_state_fd) {
+        return dev->vhost_ops->vhost_set_device_state_fd(dev, direction, phase,
+                                                         fd, reply_fd, errp);
+    }
+
+    error_setg(errp,
+               "vhost transport does not support migration state transfer");
+    return -ENOSYS;
+}
+
+int vhost_check_device_state(struct vhost_dev *dev, Error **errp)
+{
+    if (dev->vhost_ops->vhost_check_device_state) {
+        return dev->vhost_ops->vhost_check_device_state(dev, errp);
+    }
+
+    error_setg(errp,
+               "vhost transport does not support migration state transfer");
+    return -ENOSYS;
+}
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 3/4] vhost: Add high-level state save/load functions
  2023-07-12 11:16 ` [Virtio-fs] " Hanna Czenczek
@ 2023-07-12 11:17   ` Hanna Czenczek
  -1 siblings, 0 replies; 36+ messages in thread
From: Hanna Czenczek @ 2023-07-12 11:17 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Czenczek, Michael S . Tsirkin, Stefan Hajnoczi,
	German Maglione, Eugenio Pérez

vhost_save_backend_state() and vhost_load_backend_state() can be used by
vhost front-ends to easily save and load the back-end's state to/from
the migration stream.

Because we do not know the full state size ahead of time,
vhost_save_backend_state() simply reads the data in 1 MB chunks, and
writes each chunk consecutively into the migration stream, prefixed by
its length.  EOF is indicated by a 0-length chunk.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 include/hw/virtio/vhost.h |  35 +++++++
 hw/virtio/vhost.c         | 204 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 239 insertions(+)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index d8877496e5..0c282abd4e 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -425,4 +425,39 @@ int vhost_set_device_state_fd(struct vhost_dev *dev,
  */
 int vhost_check_device_state(struct vhost_dev *dev, Error **errp);
 
+/**
+ * vhost_save_backend_state(): High-level function to receive a vhost
+ * back-end's state, and save it in `f`.  Uses
+ * `vhost_set_device_state_fd()` to get the data from the back-end, and
+ * stores it in consecutive chunks that are each prefixed by their
+ * respective length (be32).  The end is marked by a 0-length chunk.
+ *
+ * Must only be called while the device and all its vrings are stopped
+ * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
+ *
+ * @dev: The vhost device from which to save the state
+ * @f: Migration stream in which to save the state
+ * @errp: Potential error message
+ *
+ * Returns 0 on success, and -errno otherwise.
+ */
+int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp);
+
+/**
+ * vhost_load_backend_state(): High-level function to load a vhost
+ * back-end's state from `f`, and send it over to the back-end.  Reads
+ * the data from `f` in the format used by `vhost_save_state()`, and
+ * uses `vhost_set_device_state_fd()` to transfer it to the back-end.
+ *
+ * Must only be called while the device and all its vrings are stopped
+ * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
+ *
+ * @dev: The vhost device to which to send the sate
+ * @f: Migration stream from which to load the state
+ * @errp: Potential error message
+ *
+ * Returns 0 on success, and -errno otherwise.
+ */
+int vhost_load_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp);
+
 #endif
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 756b6d55a8..332d49a310 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -2128,3 +2128,207 @@ int vhost_check_device_state(struct vhost_dev *dev, Error **errp)
                "vhost transport does not support migration state transfer");
     return -ENOSYS;
 }
+
+int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp)
+{
+    /* Maximum chunk size in which to transfer the state */
+    const size_t chunk_size = 1 * 1024 * 1024;
+    void *transfer_buf = NULL;
+    g_autoptr(GError) g_err = NULL;
+    int pipe_fds[2], read_fd = -1, write_fd = -1, reply_fd = -1;
+    int ret;
+
+    /* [0] for reading (our end), [1] for writing (back-end's end) */
+    if (!g_unix_open_pipe(pipe_fds, FD_CLOEXEC, &g_err)) {
+        error_setg(errp, "Failed to set up state transfer pipe: %s",
+                   g_err->message);
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    read_fd = pipe_fds[0];
+    write_fd = pipe_fds[1];
+
+    /*
+     * VHOST_TRANSFER_STATE_PHASE_STOPPED means the device must be stopped.
+     * We cannot check dev->suspended, because the back-end may not support
+     * suspending.
+     */
+    assert(!dev->started);
+
+    /* Transfer ownership of write_fd to the back-end */
+    ret = vhost_set_device_state_fd(dev,
+                                    VHOST_TRANSFER_STATE_DIRECTION_SAVE,
+                                    VHOST_TRANSFER_STATE_PHASE_STOPPED,
+                                    write_fd,
+                                    &reply_fd,
+                                    errp);
+    if (ret < 0) {
+        error_prepend(errp, "Failed to initiate state transfer: ");
+        goto fail;
+    }
+
+    /* If the back-end wishes to use a different pipe, switch over */
+    if (reply_fd >= 0) {
+        close(read_fd);
+        read_fd = reply_fd;
+    }
+
+    transfer_buf = g_malloc(chunk_size);
+
+    while (true) {
+        ssize_t read_ret;
+
+        read_ret = read(read_fd, transfer_buf, chunk_size);
+        if (read_ret < 0) {
+            ret = -errno;
+            error_setg_errno(errp, -ret, "Failed to receive state");
+            goto fail;
+        }
+
+        assert(read_ret <= chunk_size);
+        qemu_put_be32(f, read_ret);
+
+        if (read_ret == 0) {
+            /* EOF */
+            break;
+        }
+
+        qemu_put_buffer(f, transfer_buf, read_ret);
+    }
+
+    /*
+     * Back-end will not really care, but be clean and close our end of the pipe
+     * before inquiring the back-end about whether transfer was successful
+     */
+    close(read_fd);
+    read_fd = -1;
+
+    /* Also, verify that the device is still stopped */
+    assert(!dev->started);
+
+    ret = vhost_check_device_state(dev, errp);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    ret = 0;
+fail:
+    g_free(transfer_buf);
+    if (read_fd >= 0) {
+        close(read_fd);
+    }
+
+    return ret;
+}
+
+int vhost_load_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp)
+{
+    size_t transfer_buf_size = 0;
+    void *transfer_buf = NULL;
+    g_autoptr(GError) g_err = NULL;
+    int pipe_fds[2], read_fd = -1, write_fd = -1, reply_fd = -1;
+    int ret;
+
+    /* [0] for reading (back-end's end), [1] for writing (our end) */
+    if (!g_unix_open_pipe(pipe_fds, FD_CLOEXEC, &g_err)) {
+        error_setg(errp, "Failed to set up state transfer pipe: %s",
+                   g_err->message);
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    read_fd = pipe_fds[0];
+    write_fd = pipe_fds[1];
+
+    /*
+     * VHOST_TRANSFER_STATE_PHASE_STOPPED means the device must be stopped.
+     * We cannot check dev->suspended, because the back-end may not support
+     * suspending.
+     */
+    assert(!dev->started);
+
+    /* Transfer ownership of read_fd to the back-end */
+    ret = vhost_set_device_state_fd(dev,
+                                    VHOST_TRANSFER_STATE_DIRECTION_LOAD,
+                                    VHOST_TRANSFER_STATE_PHASE_STOPPED,
+                                    read_fd,
+                                    &reply_fd,
+                                    errp);
+    if (ret < 0) {
+        error_prepend(errp, "Failed to initiate state transfer: ");
+        goto fail;
+    }
+
+    /* If the back-end wishes to use a different pipe, switch over */
+    if (reply_fd >= 0) {
+        close(write_fd);
+        write_fd = reply_fd;
+    }
+
+    while (true) {
+        size_t this_chunk_size = qemu_get_be32(f);
+        ssize_t write_ret;
+        const uint8_t *transfer_pointer;
+
+        if (this_chunk_size == 0) {
+            /* End of state */
+            break;
+        }
+
+        if (transfer_buf_size < this_chunk_size) {
+            transfer_buf = g_realloc(transfer_buf, this_chunk_size);
+            transfer_buf_size = this_chunk_size;
+        }
+
+        if (qemu_get_buffer(f, transfer_buf, this_chunk_size) <
+                this_chunk_size)
+        {
+            error_setg(errp, "Failed to read state");
+            ret = -EINVAL;
+            goto fail;
+        }
+
+        transfer_pointer = transfer_buf;
+        while (this_chunk_size > 0) {
+            write_ret = write(write_fd, transfer_pointer, this_chunk_size);
+            if (write_ret < 0) {
+                ret = -errno;
+                error_setg_errno(errp, -ret, "Failed to send state");
+                goto fail;
+            } else if (write_ret == 0) {
+                error_setg(errp, "Failed to send state: Connection is closed");
+                ret = -ECONNRESET;
+                goto fail;
+            }
+
+            assert(write_ret <= this_chunk_size);
+            this_chunk_size -= write_ret;
+            transfer_pointer += write_ret;
+        }
+    }
+
+    /*
+     * Close our end, thus ending transfer, before inquiring the back-end about
+     * whether transfer was successful
+     */
+    close(write_fd);
+    write_fd = -1;
+
+    /* Also, verify that the device is still stopped */
+    assert(!dev->started);
+
+    ret = vhost_check_device_state(dev, errp);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    ret = 0;
+fail:
+    g_free(transfer_buf);
+    if (write_fd >= 0) {
+        close(write_fd);
+    }
+
+    return ret;
+}
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Virtio-fs] [PATCH v2 3/4] vhost: Add high-level state save/load functions
@ 2023-07-12 11:17   ` Hanna Czenczek
  0 siblings, 0 replies; 36+ messages in thread
From: Hanna Czenczek @ 2023-07-12 11:17 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Czenczek, Michael S . Tsirkin, Stefan Hajnoczi,
	German Maglione, Eugenio Pérez

vhost_save_backend_state() and vhost_load_backend_state() can be used by
vhost front-ends to easily save and load the back-end's state to/from
the migration stream.

Because we do not know the full state size ahead of time,
vhost_save_backend_state() simply reads the data in 1 MB chunks, and
writes each chunk consecutively into the migration stream, prefixed by
its length.  EOF is indicated by a 0-length chunk.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 include/hw/virtio/vhost.h |  35 +++++++
 hw/virtio/vhost.c         | 204 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 239 insertions(+)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index d8877496e5..0c282abd4e 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -425,4 +425,39 @@ int vhost_set_device_state_fd(struct vhost_dev *dev,
  */
 int vhost_check_device_state(struct vhost_dev *dev, Error **errp);
 
+/**
+ * vhost_save_backend_state(): High-level function to receive a vhost
+ * back-end's state, and save it in `f`.  Uses
+ * `vhost_set_device_state_fd()` to get the data from the back-end, and
+ * stores it in consecutive chunks that are each prefixed by their
+ * respective length (be32).  The end is marked by a 0-length chunk.
+ *
+ * Must only be called while the device and all its vrings are stopped
+ * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
+ *
+ * @dev: The vhost device from which to save the state
+ * @f: Migration stream in which to save the state
+ * @errp: Potential error message
+ *
+ * Returns 0 on success, and -errno otherwise.
+ */
+int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp);
+
+/**
+ * vhost_load_backend_state(): High-level function to load a vhost
+ * back-end's state from `f`, and send it over to the back-end.  Reads
+ * the data from `f` in the format used by `vhost_save_state()`, and
+ * uses `vhost_set_device_state_fd()` to transfer it to the back-end.
+ *
+ * Must only be called while the device and all its vrings are stopped
+ * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
+ *
+ * @dev: The vhost device to which to send the sate
+ * @f: Migration stream from which to load the state
+ * @errp: Potential error message
+ *
+ * Returns 0 on success, and -errno otherwise.
+ */
+int vhost_load_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp);
+
 #endif
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 756b6d55a8..332d49a310 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -2128,3 +2128,207 @@ int vhost_check_device_state(struct vhost_dev *dev, Error **errp)
                "vhost transport does not support migration state transfer");
     return -ENOSYS;
 }
+
+int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp)
+{
+    /* Maximum chunk size in which to transfer the state */
+    const size_t chunk_size = 1 * 1024 * 1024;
+    void *transfer_buf = NULL;
+    g_autoptr(GError) g_err = NULL;
+    int pipe_fds[2], read_fd = -1, write_fd = -1, reply_fd = -1;
+    int ret;
+
+    /* [0] for reading (our end), [1] for writing (back-end's end) */
+    if (!g_unix_open_pipe(pipe_fds, FD_CLOEXEC, &g_err)) {
+        error_setg(errp, "Failed to set up state transfer pipe: %s",
+                   g_err->message);
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    read_fd = pipe_fds[0];
+    write_fd = pipe_fds[1];
+
+    /*
+     * VHOST_TRANSFER_STATE_PHASE_STOPPED means the device must be stopped.
+     * We cannot check dev->suspended, because the back-end may not support
+     * suspending.
+     */
+    assert(!dev->started);
+
+    /* Transfer ownership of write_fd to the back-end */
+    ret = vhost_set_device_state_fd(dev,
+                                    VHOST_TRANSFER_STATE_DIRECTION_SAVE,
+                                    VHOST_TRANSFER_STATE_PHASE_STOPPED,
+                                    write_fd,
+                                    &reply_fd,
+                                    errp);
+    if (ret < 0) {
+        error_prepend(errp, "Failed to initiate state transfer: ");
+        goto fail;
+    }
+
+    /* If the back-end wishes to use a different pipe, switch over */
+    if (reply_fd >= 0) {
+        close(read_fd);
+        read_fd = reply_fd;
+    }
+
+    transfer_buf = g_malloc(chunk_size);
+
+    while (true) {
+        ssize_t read_ret;
+
+        read_ret = read(read_fd, transfer_buf, chunk_size);
+        if (read_ret < 0) {
+            ret = -errno;
+            error_setg_errno(errp, -ret, "Failed to receive state");
+            goto fail;
+        }
+
+        assert(read_ret <= chunk_size);
+        qemu_put_be32(f, read_ret);
+
+        if (read_ret == 0) {
+            /* EOF */
+            break;
+        }
+
+        qemu_put_buffer(f, transfer_buf, read_ret);
+    }
+
+    /*
+     * Back-end will not really care, but be clean and close our end of the pipe
+     * before inquiring the back-end about whether transfer was successful
+     */
+    close(read_fd);
+    read_fd = -1;
+
+    /* Also, verify that the device is still stopped */
+    assert(!dev->started);
+
+    ret = vhost_check_device_state(dev, errp);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    ret = 0;
+fail:
+    g_free(transfer_buf);
+    if (read_fd >= 0) {
+        close(read_fd);
+    }
+
+    return ret;
+}
+
+int vhost_load_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp)
+{
+    size_t transfer_buf_size = 0;
+    void *transfer_buf = NULL;
+    g_autoptr(GError) g_err = NULL;
+    int pipe_fds[2], read_fd = -1, write_fd = -1, reply_fd = -1;
+    int ret;
+
+    /* [0] for reading (back-end's end), [1] for writing (our end) */
+    if (!g_unix_open_pipe(pipe_fds, FD_CLOEXEC, &g_err)) {
+        error_setg(errp, "Failed to set up state transfer pipe: %s",
+                   g_err->message);
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    read_fd = pipe_fds[0];
+    write_fd = pipe_fds[1];
+
+    /*
+     * VHOST_TRANSFER_STATE_PHASE_STOPPED means the device must be stopped.
+     * We cannot check dev->suspended, because the back-end may not support
+     * suspending.
+     */
+    assert(!dev->started);
+
+    /* Transfer ownership of read_fd to the back-end */
+    ret = vhost_set_device_state_fd(dev,
+                                    VHOST_TRANSFER_STATE_DIRECTION_LOAD,
+                                    VHOST_TRANSFER_STATE_PHASE_STOPPED,
+                                    read_fd,
+                                    &reply_fd,
+                                    errp);
+    if (ret < 0) {
+        error_prepend(errp, "Failed to initiate state transfer: ");
+        goto fail;
+    }
+
+    /* If the back-end wishes to use a different pipe, switch over */
+    if (reply_fd >= 0) {
+        close(write_fd);
+        write_fd = reply_fd;
+    }
+
+    while (true) {
+        size_t this_chunk_size = qemu_get_be32(f);
+        ssize_t write_ret;
+        const uint8_t *transfer_pointer;
+
+        if (this_chunk_size == 0) {
+            /* End of state */
+            break;
+        }
+
+        if (transfer_buf_size < this_chunk_size) {
+            transfer_buf = g_realloc(transfer_buf, this_chunk_size);
+            transfer_buf_size = this_chunk_size;
+        }
+
+        if (qemu_get_buffer(f, transfer_buf, this_chunk_size) <
+                this_chunk_size)
+        {
+            error_setg(errp, "Failed to read state");
+            ret = -EINVAL;
+            goto fail;
+        }
+
+        transfer_pointer = transfer_buf;
+        while (this_chunk_size > 0) {
+            write_ret = write(write_fd, transfer_pointer, this_chunk_size);
+            if (write_ret < 0) {
+                ret = -errno;
+                error_setg_errno(errp, -ret, "Failed to send state");
+                goto fail;
+            } else if (write_ret == 0) {
+                error_setg(errp, "Failed to send state: Connection is closed");
+                ret = -ECONNRESET;
+                goto fail;
+            }
+
+            assert(write_ret <= this_chunk_size);
+            this_chunk_size -= write_ret;
+            transfer_pointer += write_ret;
+        }
+    }
+
+    /*
+     * Close our end, thus ending transfer, before inquiring the back-end about
+     * whether transfer was successful
+     */
+    close(write_fd);
+    write_fd = -1;
+
+    /* Also, verify that the device is still stopped */
+    assert(!dev->started);
+
+    ret = vhost_check_device_state(dev, errp);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    ret = 0;
+fail:
+    g_free(transfer_buf);
+    if (write_fd >= 0) {
+        close(write_fd);
+    }
+
+    return ret;
+}
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 4/4] vhost-user-fs: Implement internal migration
  2023-07-12 11:16 ` [Virtio-fs] " Hanna Czenczek
@ 2023-07-12 11:17   ` Hanna Czenczek
  -1 siblings, 0 replies; 36+ messages in thread
From: Hanna Czenczek @ 2023-07-12 11:17 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Czenczek, Michael S . Tsirkin, Stefan Hajnoczi,
	German Maglione, Eugenio Pérez

A virtio-fs device's VM state consists of:
- the virtio device (vring) state (VMSTATE_VIRTIO_DEVICE)
- the back-end's (virtiofsd's) internal state

We get/set the latter via the new vhost operations to transfer migratory
state.  It is its own dedicated subsection, so that for external
migration, it can be disabled.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 hw/virtio/vhost-user-fs.c | 101 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 100 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 49d699ffc2..2b30f3d442 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -298,9 +298,108 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
     return &fs->vhost_dev;
 }
 
+/**
+ * Fetch the internal state from virtiofsd and save it to `f`.
+ */
+static int vuf_save_state(QEMUFile *f, void *pv, size_t size,
+                          const VMStateField *field, JSONWriter *vmdesc)
+{
+    VirtIODevice *vdev = pv;
+    VHostUserFS *fs = VHOST_USER_FS(vdev);
+    Error *local_error = NULL;
+    int ret;
+
+    ret = vhost_save_backend_state(&fs->vhost_dev, f, &local_error);
+    if (ret < 0) {
+        error_reportf_err(local_error,
+                          "Error saving back-end state of %s device %s "
+                          "(tag: \"%s\"): ",
+                          vdev->name, vdev->parent_obj.canonical_path,
+                          fs->conf.tag ?: "<none>");
+        return ret;
+    }
+
+    return 0;
+}
+
+/**
+ * Load virtiofsd's internal state from `f` and send it over to virtiofsd.
+ */
+static int vuf_load_state(QEMUFile *f, void *pv, size_t size,
+                          const VMStateField *field)
+{
+    VirtIODevice *vdev = pv;
+    VHostUserFS *fs = VHOST_USER_FS(vdev);
+    Error *local_error = NULL;
+    int ret;
+
+    ret = vhost_load_backend_state(&fs->vhost_dev, f, &local_error);
+    if (ret < 0) {
+        error_reportf_err(local_error,
+                          "Error loading back-end state of %s device %s "
+                          "(tag: \"%s\"): ",
+                          vdev->name, vdev->parent_obj.canonical_path,
+                          fs->conf.tag ?: "<none>");
+        return ret;
+    }
+
+    return 0;
+}
+
+static bool vuf_is_internal_migration(void *opaque)
+{
+    /* TODO: Return false when an external migration is requested */
+    return true;
+}
+
+static int vuf_check_migration_support(void *opaque)
+{
+    VirtIODevice *vdev = opaque;
+    VHostUserFS *fs = VHOST_USER_FS(vdev);
+
+    if (!vhost_supports_migratory_state(&fs->vhost_dev)) {
+        error_report("Back-end of %s device %s (tag: \"%s\") does not support "
+                     "migration through qemu",
+                     vdev->name, vdev->parent_obj.canonical_path,
+                     fs->conf.tag ?: "<none>");
+        return -ENOTSUP;
+    }
+
+    return 0;
+}
+
+static const VMStateDescription vuf_backend_vmstate;
+
 static const VMStateDescription vuf_vmstate = {
     .name = "vhost-user-fs",
-    .unmigratable = 1,
+    .version_id = 0,
+    .fields = (VMStateField[]) {
+        VMSTATE_VIRTIO_DEVICE,
+        VMSTATE_END_OF_LIST()
+    },
+    .subsections = (const VMStateDescription * []) {
+        &vuf_backend_vmstate,
+        NULL,
+    }
+};
+
+static const VMStateDescription vuf_backend_vmstate = {
+    .name = "vhost-user-fs-backend",
+    .version_id = 0,
+    .needed = vuf_is_internal_migration,
+    .pre_load = vuf_check_migration_support,
+    .pre_save = vuf_check_migration_support,
+    .fields = (VMStateField[]) {
+        {
+            .name = "back-end",
+            .info = &(const VMStateInfo) {
+                .name = "virtio-fs back-end state",
+                .get = vuf_load_state,
+                .put = vuf_save_state,
+            },
+        },
+        VMSTATE_END_OF_LIST()
+    },
 };
 
 static Property vuf_properties[] = {
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Virtio-fs] [PATCH v2 4/4] vhost-user-fs: Implement internal migration
@ 2023-07-12 11:17   ` Hanna Czenczek
  0 siblings, 0 replies; 36+ messages in thread
From: Hanna Czenczek @ 2023-07-12 11:17 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Czenczek, Michael S . Tsirkin, Stefan Hajnoczi,
	German Maglione, Eugenio Pérez

A virtio-fs device's VM state consists of:
- the virtio device (vring) state (VMSTATE_VIRTIO_DEVICE)
- the back-end's (virtiofsd's) internal state

We get/set the latter via the new vhost operations to transfer migratory
state.  It is its own dedicated subsection, so that for external
migration, it can be disabled.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 hw/virtio/vhost-user-fs.c | 101 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 100 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 49d699ffc2..2b30f3d442 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -298,9 +298,108 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
     return &fs->vhost_dev;
 }
 
+/**
+ * Fetch the internal state from virtiofsd and save it to `f`.
+ */
+static int vuf_save_state(QEMUFile *f, void *pv, size_t size,
+                          const VMStateField *field, JSONWriter *vmdesc)
+{
+    VirtIODevice *vdev = pv;
+    VHostUserFS *fs = VHOST_USER_FS(vdev);
+    Error *local_error = NULL;
+    int ret;
+
+    ret = vhost_save_backend_state(&fs->vhost_dev, f, &local_error);
+    if (ret < 0) {
+        error_reportf_err(local_error,
+                          "Error saving back-end state of %s device %s "
+                          "(tag: \"%s\"): ",
+                          vdev->name, vdev->parent_obj.canonical_path,
+                          fs->conf.tag ?: "<none>");
+        return ret;
+    }
+
+    return 0;
+}
+
+/**
+ * Load virtiofsd's internal state from `f` and send it over to virtiofsd.
+ */
+static int vuf_load_state(QEMUFile *f, void *pv, size_t size,
+                          const VMStateField *field)
+{
+    VirtIODevice *vdev = pv;
+    VHostUserFS *fs = VHOST_USER_FS(vdev);
+    Error *local_error = NULL;
+    int ret;
+
+    ret = vhost_load_backend_state(&fs->vhost_dev, f, &local_error);
+    if (ret < 0) {
+        error_reportf_err(local_error,
+                          "Error loading back-end state of %s device %s "
+                          "(tag: \"%s\"): ",
+                          vdev->name, vdev->parent_obj.canonical_path,
+                          fs->conf.tag ?: "<none>");
+        return ret;
+    }
+
+    return 0;
+}
+
+static bool vuf_is_internal_migration(void *opaque)
+{
+    /* TODO: Return false when an external migration is requested */
+    return true;
+}
+
+static int vuf_check_migration_support(void *opaque)
+{
+    VirtIODevice *vdev = opaque;
+    VHostUserFS *fs = VHOST_USER_FS(vdev);
+
+    if (!vhost_supports_migratory_state(&fs->vhost_dev)) {
+        error_report("Back-end of %s device %s (tag: \"%s\") does not support "
+                     "migration through qemu",
+                     vdev->name, vdev->parent_obj.canonical_path,
+                     fs->conf.tag ?: "<none>");
+        return -ENOTSUP;
+    }
+
+    return 0;
+}
+
+static const VMStateDescription vuf_backend_vmstate;
+
 static const VMStateDescription vuf_vmstate = {
     .name = "vhost-user-fs",
-    .unmigratable = 1,
+    .version_id = 0,
+    .fields = (VMStateField[]) {
+        VMSTATE_VIRTIO_DEVICE,
+        VMSTATE_END_OF_LIST()
+    },
+    .subsections = (const VMStateDescription * []) {
+        &vuf_backend_vmstate,
+        NULL,
+    }
+};
+
+static const VMStateDescription vuf_backend_vmstate = {
+    .name = "vhost-user-fs-backend",
+    .version_id = 0,
+    .needed = vuf_is_internal_migration,
+    .pre_load = vuf_check_migration_support,
+    .pre_save = vuf_check_migration_support,
+    .fields = (VMStateField[]) {
+        {
+            .name = "back-end",
+            .info = &(const VMStateInfo) {
+                .name = "virtio-fs back-end state",
+                .get = vuf_load_state,
+                .put = vuf_save_state,
+            },
+        },
+        VMSTATE_END_OF_LIST()
+    },
 };
 
 static Property vuf_properties[] = {
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/4] vhost-user.rst: Migrating back-end-internal state
  2023-07-12 11:16   ` [Virtio-fs] " Hanna Czenczek
@ 2023-07-18 15:57     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2023-07-18 15:57 UTC (permalink / raw)
  To: Hanna Czenczek
  Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, German Maglione,
	Eugenio Pérez

[-- Attachment #1: Type: text/plain, Size: 8205 bytes --]

On Wed, Jul 12, 2023 at 01:16:59PM +0200, Hanna Czenczek wrote:
> For vhost-user devices, qemu can migrate the virtio state, but not the
> back-end's internal state.  To do so, we need to be able to transfer
> this internal state between front-end (qemu) and back-end.
> 
> At this point, this new feature is added for the purpose of virtio-fs
> migration.  Because virtiofsd's internal state will not be too large, we
> believe it is best to transfer it as a single binary blob after the
> streaming phase.
> 
> These are the additions to the protocol:
> - New vhost-user protocol feature VHOST_USER_PROTOCOL_F_MIGRATORY_STATE

It's not 100% clear whether "migratory" is related to live migration or
something else. I don't like the name :P.

The name "VHOST_USER_PROTOCOL_F_DEVICE_STATE" would be more obviously
associated with SET_DEVICE_STATE_FD and CHECK_DEVICE_STATE than
"MIGRATORY_STATE".

> - SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a pipe
>   over which to transfer the state.

Does it need to be a pipe or can it be another type of file (e.g. UNIX
domain socket)?

In the future the fd may become bi-directional. Pipes are
uni-directional on Linux.

I suggest calling it a "file descriptor" and not mentioning "pipes"
explicitly.

> - CHECK_DEVICE_STATE: After the state has been transferred through the
>   pipe, the front-end invokes this function to verify success.  There is
>   no in-band way (through the pipe) to indicate failure, so we need to
>   check explicitly.
> 
> Once the transfer pipe has been established via SET_DEVICE_STATE_FD
> (which includes establishing the direction of transfer and migration
> phase), the sending side writes its data into the pipe, and the reading
> side reads it until it sees an EOF.  Then, the front-end will check for
> success via CHECK_DEVICE_STATE, which on the destination side includes
> checking for integrity (i.e. errors during deserialization).
> 
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
>  docs/interop/vhost-user.rst | 87 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 87 insertions(+)
> 
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index ac6be34c4c..c98dfeca25 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -334,6 +334,7 @@ in the ancillary data:
>  * ``VHOST_USER_SET_VRING_ERR``
>  * ``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
>  * ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
> +* ``VHOST_USER_SET_DEVICE_STATE_FD``
>  
>  If *front-end* is unable to send the full message or receives a wrong
>  reply it will close the connection. An optional reconnection mechanism
> @@ -497,6 +498,44 @@ it performs WAKE ioctl's on the userfaultfd to wake the stalled
>  back-end.  The front-end indicates support for this via the
>  ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.
>  
> +.. _migrating_backend_state:
> +
> +Migrating back-end state
> +^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +If the back-end has internal state that is to be sent from source to
> +destination,

Migration and the terms "source" and "destination" have not been
defined. Here is a suggestion for an introductory paragraph:

  Migrating device state involves transferring the state from one
  back-end, called the source, to another back-end, called the
  destination. After migration, the destination transparently resumes
  operation without requiring the driver to re-initialize the device at
  the VIRTIO level. If the migration fails, then the source can
  transparently resume operation until another migration attempt is
  made.

> the front-end may be able to store and transfer it via an
> +internal migration stream.  Support for this is negotiated with the
> +``VHOST_USER_PROTOCOL_F_MIGRATORY_STATE`` feature.
> +
> +First, a channel over which the state is transferred is established on
> +the source side using the ``VHOST_USER_SET_DEVICE_STATE_FD`` message.
> +This message has two parameters:
> +
> +* Direction of transfer: On the source, the data is saved, transferring
> +  it from the back-end to the front-end.  On the destination, the data
> +  is loaded, transferring it from the front-end to the back-end.
> +
> +* Migration phase: Currently, only the period after memory transfer

"memory transfer" is vague. This sentence is referring to VM live
migration and guest RAM but it may be better to focus on just the device
perspective and not the VM:

  Migration is currently only supported while the device is suspended
  and all of its rings are stopped. In the future, additional phases
  might be support to allow iterative migration while the device is
  running.

> +  before switch-over is supported, in which the device is suspended and
> +  all of its rings are stopped.
> +
> +Then, the writing end will write all the data it has, signalling the end
> +of data by closing its end of the pipe.  The reading end must read all
> +of this data and process it:
> +
> +* If saving, the front-end will transfer this data to the destination,

To be extra clear:

  ...transfer this data to the destination through some
  implementation-specific means.

> +  where it is loaded into the destination back-end.
> +
> +* If loading, the back-end must deserialize its internal state from the
> +  transferred data and be set up to resume operation.

"and be set up to resume operation" is a little unclear to me. I guess
it means "in preparation for VHOST_USER_RESUME".

> +
> +After the front-end has seen all data transferred (saving: seen an EOF
> +on the pipe; loading: closed its end of the pipe), it sends the
> +``VHOST_USER_CHECK_DEVICE_STATE`` message to verify that data transfer
> +was successful in the back-end, too.  The back-end responds once it
> +knows whether the tranfer and processing was successful or not.
> +
>  Memory access
>  -------------
>  
> @@ -891,6 +930,7 @@ Protocol features
>    #define VHOST_USER_PROTOCOL_F_STATUS               16
>    #define VHOST_USER_PROTOCOL_F_XEN_MMAP             17
>    #define VHOST_USER_PROTOCOL_F_SUSPEND              18
> +  #define VHOST_USER_PROTOCOL_F_MIGRATORY_STATE      19
>  
>  Front-end message types
>  -----------------------
> @@ -1471,6 +1511,53 @@ Front-end message types
>    before.  The back-end must again begin processing rings that are not
>    stopped, and it may resume background operations.
>  
> +``VHOST_USER_SET_DEVICE_STATE_FD``
> +  :id: 43
> +  :equivalent ioctl: N/A
> +  :request payload: device state transfer parameters
> +  :reply payload: ``u64``
> +
> +  The front-end negotiates a pipe over which to transfer the back-end’s
> +  internal state during migration.  For this purpose, this message is
> +  accompanied by a file descriptor that is to be the back-end’s end of
> +  the pipe.  If the back-end can provide a more efficient pipe (i.e.
> +  because it internally already has a pipe into/from which to
> +  put/receive state), it can ignore this and reply with a different file
> +  descriptor to serve as the front-end’s end.
> +
> +  The request payload contains parameters for the subsequent data
> +  transfer, as described in the :ref:`Migrating back-end state
> +  <migrating_backend_state>` section.  That section also explains the
> +  data transfer itself.
> +
> +  The value returned is both an indication for success, and whether a
> +  new pipe file descriptor is returned: Bits 0–7 are 0 on success, and
> +  non-zero on error.  Bit 8 is the invalid FD flag; this flag is set
> +  when there is no file descriptor returned.  When this flag is not set,
> +  the front-end must use the returned file descriptor as its end of the
> +  pipe.  The back-end must not both indicate an error and return a file
> +  descriptor.

Is the invalid FD flag necessary? The front-end can check whether or not
an fd was passed along with the result, so I'm not sure why the result
also needs to communicate this.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v2 1/4] vhost-user.rst: Migrating back-end-internal state
@ 2023-07-18 15:57     ` Stefan Hajnoczi
  0 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2023-07-18 15:57 UTC (permalink / raw)
  To: Hanna Czenczek
  Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, German Maglione,
	Eugenio Pérez

[-- Attachment #1: Type: text/plain, Size: 8205 bytes --]

On Wed, Jul 12, 2023 at 01:16:59PM +0200, Hanna Czenczek wrote:
> For vhost-user devices, qemu can migrate the virtio state, but not the
> back-end's internal state.  To do so, we need to be able to transfer
> this internal state between front-end (qemu) and back-end.
> 
> At this point, this new feature is added for the purpose of virtio-fs
> migration.  Because virtiofsd's internal state will not be too large, we
> believe it is best to transfer it as a single binary blob after the
> streaming phase.
> 
> These are the additions to the protocol:
> - New vhost-user protocol feature VHOST_USER_PROTOCOL_F_MIGRATORY_STATE

It's not 100% clear whether "migratory" is related to live migration or
something else. I don't like the name :P.

The name "VHOST_USER_PROTOCOL_F_DEVICE_STATE" would be more obviously
associated with SET_DEVICE_STATE_FD and CHECK_DEVICE_STATE than
"MIGRATORY_STATE".

> - SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a pipe
>   over which to transfer the state.

Does it need to be a pipe or can it be another type of file (e.g. UNIX
domain socket)?

In the future the fd may become bi-directional. Pipes are
uni-directional on Linux.

I suggest calling it a "file descriptor" and not mentioning "pipes"
explicitly.

> - CHECK_DEVICE_STATE: After the state has been transferred through the
>   pipe, the front-end invokes this function to verify success.  There is
>   no in-band way (through the pipe) to indicate failure, so we need to
>   check explicitly.
> 
> Once the transfer pipe has been established via SET_DEVICE_STATE_FD
> (which includes establishing the direction of transfer and migration
> phase), the sending side writes its data into the pipe, and the reading
> side reads it until it sees an EOF.  Then, the front-end will check for
> success via CHECK_DEVICE_STATE, which on the destination side includes
> checking for integrity (i.e. errors during deserialization).
> 
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
>  docs/interop/vhost-user.rst | 87 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 87 insertions(+)
> 
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index ac6be34c4c..c98dfeca25 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -334,6 +334,7 @@ in the ancillary data:
>  * ``VHOST_USER_SET_VRING_ERR``
>  * ``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
>  * ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
> +* ``VHOST_USER_SET_DEVICE_STATE_FD``
>  
>  If *front-end* is unable to send the full message or receives a wrong
>  reply it will close the connection. An optional reconnection mechanism
> @@ -497,6 +498,44 @@ it performs WAKE ioctl's on the userfaultfd to wake the stalled
>  back-end.  The front-end indicates support for this via the
>  ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.
>  
> +.. _migrating_backend_state:
> +
> +Migrating back-end state
> +^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +If the back-end has internal state that is to be sent from source to
> +destination,

Migration and the terms "source" and "destination" have not been
defined. Here is a suggestion for an introductory paragraph:

  Migrating device state involves transferring the state from one
  back-end, called the source, to another back-end, called the
  destination. After migration, the destination transparently resumes
  operation without requiring the driver to re-initialize the device at
  the VIRTIO level. If the migration fails, then the source can
  transparently resume operation until another migration attempt is
  made.

> the front-end may be able to store and transfer it via an
> +internal migration stream.  Support for this is negotiated with the
> +``VHOST_USER_PROTOCOL_F_MIGRATORY_STATE`` feature.
> +
> +First, a channel over which the state is transferred is established on
> +the source side using the ``VHOST_USER_SET_DEVICE_STATE_FD`` message.
> +This message has two parameters:
> +
> +* Direction of transfer: On the source, the data is saved, transferring
> +  it from the back-end to the front-end.  On the destination, the data
> +  is loaded, transferring it from the front-end to the back-end.
> +
> +* Migration phase: Currently, only the period after memory transfer

"memory transfer" is vague. This sentence is referring to VM live
migration and guest RAM but it may be better to focus on just the device
perspective and not the VM:

  Migration is currently only supported while the device is suspended
  and all of its rings are stopped. In the future, additional phases
  might be support to allow iterative migration while the device is
  running.

> +  before switch-over is supported, in which the device is suspended and
> +  all of its rings are stopped.
> +
> +Then, the writing end will write all the data it has, signalling the end
> +of data by closing its end of the pipe.  The reading end must read all
> +of this data and process it:
> +
> +* If saving, the front-end will transfer this data to the destination,

To be extra clear:

  ...transfer this data to the destination through some
  implementation-specific means.

> +  where it is loaded into the destination back-end.
> +
> +* If loading, the back-end must deserialize its internal state from the
> +  transferred data and be set up to resume operation.

"and be set up to resume operation" is a little unclear to me. I guess
it means "in preparation for VHOST_USER_RESUME".

> +
> +After the front-end has seen all data transferred (saving: seen an EOF
> +on the pipe; loading: closed its end of the pipe), it sends the
> +``VHOST_USER_CHECK_DEVICE_STATE`` message to verify that data transfer
> +was successful in the back-end, too.  The back-end responds once it
> +knows whether the tranfer and processing was successful or not.
> +
>  Memory access
>  -------------
>  
> @@ -891,6 +930,7 @@ Protocol features
>    #define VHOST_USER_PROTOCOL_F_STATUS               16
>    #define VHOST_USER_PROTOCOL_F_XEN_MMAP             17
>    #define VHOST_USER_PROTOCOL_F_SUSPEND              18
> +  #define VHOST_USER_PROTOCOL_F_MIGRATORY_STATE      19
>  
>  Front-end message types
>  -----------------------
> @@ -1471,6 +1511,53 @@ Front-end message types
>    before.  The back-end must again begin processing rings that are not
>    stopped, and it may resume background operations.
>  
> +``VHOST_USER_SET_DEVICE_STATE_FD``
> +  :id: 43
> +  :equivalent ioctl: N/A
> +  :request payload: device state transfer parameters
> +  :reply payload: ``u64``
> +
> +  The front-end negotiates a pipe over which to transfer the back-end’s
> +  internal state during migration.  For this purpose, this message is
> +  accompanied by a file descriptor that is to be the back-end’s end of
> +  the pipe.  If the back-end can provide a more efficient pipe (i.e.
> +  because it internally already has a pipe into/from which to
> +  put/receive state), it can ignore this and reply with a different file
> +  descriptor to serve as the front-end’s end.
> +
> +  The request payload contains parameters for the subsequent data
> +  transfer, as described in the :ref:`Migrating back-end state
> +  <migrating_backend_state>` section.  That section also explains the
> +  data transfer itself.
> +
> +  The value returned is both an indication for success, and whether a
> +  new pipe file descriptor is returned: Bits 0–7 are 0 on success, and
> +  non-zero on error.  Bit 8 is the invalid FD flag; this flag is set
> +  when there is no file descriptor returned.  When this flag is not set,
> +  the front-end must use the returned file descriptor as its end of the
> +  pipe.  The back-end must not both indicate an error and return a file
> +  descriptor.

Is the invalid FD flag necessary? The front-end can check whether or not
an fd was passed along with the result, so I'm not sure why the result
also needs to communicate this.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/4] vhost-user.rst: Migrating back-end-internal state
  2023-07-12 11:16   ` [Virtio-fs] " Hanna Czenczek
@ 2023-07-18 16:12     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2023-07-18 16:12 UTC (permalink / raw)
  To: Hanna Czenczek
  Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, German Maglione,
	Eugenio Pérez

[-- Attachment #1: Type: text/plain, Size: 418 bytes --]

On Wed, Jul 12, 2023 at 01:16:59PM +0200, Hanna Czenczek wrote:
> @@ -1471,6 +1511,53 @@ Front-end message types
>    before.  The back-end must again begin processing rings that are not
>    stopped, and it may resume background operations.
>  
> +``VHOST_USER_SET_DEVICE_STATE_FD``
> +  :id: 43
> +  :equivalent ioctl: N/A
> +  :request payload: device state transfer parameters

Where are these defined?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v2 1/4] vhost-user.rst: Migrating back-end-internal state
@ 2023-07-18 16:12     ` Stefan Hajnoczi
  0 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2023-07-18 16:12 UTC (permalink / raw)
  To: Hanna Czenczek
  Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, German Maglione,
	Eugenio Pérez

[-- Attachment #1: Type: text/plain, Size: 418 bytes --]

On Wed, Jul 12, 2023 at 01:16:59PM +0200, Hanna Czenczek wrote:
> @@ -1471,6 +1511,53 @@ Front-end message types
>    before.  The back-end must again begin processing rings that are not
>    stopped, and it may resume background operations.
>  
> +``VHOST_USER_SET_DEVICE_STATE_FD``
> +  :id: 43
> +  :equivalent ioctl: N/A
> +  :request payload: device state transfer parameters

Where are these defined?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 2/4] vhost-user: Interface for migration state transfer
  2023-07-12 11:17   ` [Virtio-fs] " Hanna Czenczek
@ 2023-07-18 18:32     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2023-07-18 18:32 UTC (permalink / raw)
  To: Hanna Czenczek
  Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, German Maglione,
	Eugenio Pérez

[-- Attachment #1: Type: text/plain, Size: 581 bytes --]

On Wed, Jul 12, 2023 at 01:17:00PM +0200, Hanna Czenczek wrote:
> Add the interface for transferring the back-end's state during migration
> as defined previously in vhost-user.rst.
> 
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
>  include/hw/virtio/vhost-backend.h |  24 +++++
>  include/hw/virtio/vhost.h         |  79 ++++++++++++++++
>  hw/virtio/vhost-user.c            | 147 ++++++++++++++++++++++++++++++
>  hw/virtio/vhost.c                 |  37 ++++++++
>  4 files changed, 287 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v2 2/4] vhost-user: Interface for migration state transfer
@ 2023-07-18 18:32     ` Stefan Hajnoczi
  0 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2023-07-18 18:32 UTC (permalink / raw)
  To: Hanna Czenczek
  Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, German Maglione,
	Eugenio Pérez

[-- Attachment #1: Type: text/plain, Size: 581 bytes --]

On Wed, Jul 12, 2023 at 01:17:00PM +0200, Hanna Czenczek wrote:
> Add the interface for transferring the back-end's state during migration
> as defined previously in vhost-user.rst.
> 
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
>  include/hw/virtio/vhost-backend.h |  24 +++++
>  include/hw/virtio/vhost.h         |  79 ++++++++++++++++
>  hw/virtio/vhost-user.c            | 147 ++++++++++++++++++++++++++++++
>  hw/virtio/vhost.c                 |  37 ++++++++
>  4 files changed, 287 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/4] vhost: Add high-level state save/load functions
  2023-07-12 11:17   ` [Virtio-fs] " Hanna Czenczek
@ 2023-07-18 18:42     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2023-07-18 18:42 UTC (permalink / raw)
  To: Hanna Czenczek
  Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, German Maglione,
	Eugenio Pérez

[-- Attachment #1: Type: text/plain, Size: 9734 bytes --]

On Wed, Jul 12, 2023 at 01:17:01PM +0200, Hanna Czenczek wrote:
> vhost_save_backend_state() and vhost_load_backend_state() can be used by
> vhost front-ends to easily save and load the back-end's state to/from
> the migration stream.
> 
> Because we do not know the full state size ahead of time,
> vhost_save_backend_state() simply reads the data in 1 MB chunks, and
> writes each chunk consecutively into the migration stream, prefixed by
> its length.  EOF is indicated by a 0-length chunk.
> 
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
>  include/hw/virtio/vhost.h |  35 +++++++
>  hw/virtio/vhost.c         | 204 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 239 insertions(+)
> 
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index d8877496e5..0c282abd4e 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -425,4 +425,39 @@ int vhost_set_device_state_fd(struct vhost_dev *dev,
>   */
>  int vhost_check_device_state(struct vhost_dev *dev, Error **errp);
>  
> +/**
> + * vhost_save_backend_state(): High-level function to receive a vhost
> + * back-end's state, and save it in `f`.  Uses

I think the GtkDoc syntax is @f instead of `f`.

> + * `vhost_set_device_state_fd()` to get the data from the back-end, and
> + * stores it in consecutive chunks that are each prefixed by their
> + * respective length (be32).  The end is marked by a 0-length chunk.
> + *
> + * Must only be called while the device and all its vrings are stopped
> + * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
> + *
> + * @dev: The vhost device from which to save the state
> + * @f: Migration stream in which to save the state
> + * @errp: Potential error message
> + *
> + * Returns 0 on success, and -errno otherwise.
> + */
> +int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp);
> +
> +/**
> + * vhost_load_backend_state(): High-level function to load a vhost
> + * back-end's state from `f`, and send it over to the back-end.  Reads
> + * the data from `f` in the format used by `vhost_save_state()`, and
> + * uses `vhost_set_device_state_fd()` to transfer it to the back-end.
> + *
> + * Must only be called while the device and all its vrings are stopped
> + * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
> + *
> + * @dev: The vhost device to which to send the sate
> + * @f: Migration stream from which to load the state
> + * @errp: Potential error message
> + *
> + * Returns 0 on success, and -errno otherwise.
> + */
> +int vhost_load_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp);
> +
>  #endif
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 756b6d55a8..332d49a310 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -2128,3 +2128,207 @@ int vhost_check_device_state(struct vhost_dev *dev, Error **errp)
>                 "vhost transport does not support migration state transfer");
>      return -ENOSYS;
>  }
> +
> +int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp)
> +{
> +    /* Maximum chunk size in which to transfer the state */
> +    const size_t chunk_size = 1 * 1024 * 1024;
> +    void *transfer_buf = NULL;
> +    g_autoptr(GError) g_err = NULL;
> +    int pipe_fds[2], read_fd = -1, write_fd = -1, reply_fd = -1;
> +    int ret;
> +
> +    /* [0] for reading (our end), [1] for writing (back-end's end) */
> +    if (!g_unix_open_pipe(pipe_fds, FD_CLOEXEC, &g_err)) {
> +        error_setg(errp, "Failed to set up state transfer pipe: %s",
> +                   g_err->message);
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    read_fd = pipe_fds[0];
> +    write_fd = pipe_fds[1];
> +
> +    /*
> +     * VHOST_TRANSFER_STATE_PHASE_STOPPED means the device must be stopped.
> +     * We cannot check dev->suspended, because the back-end may not support
> +     * suspending.
> +     */
> +    assert(!dev->started);
> +
> +    /* Transfer ownership of write_fd to the back-end */
> +    ret = vhost_set_device_state_fd(dev,
> +                                    VHOST_TRANSFER_STATE_DIRECTION_SAVE,
> +                                    VHOST_TRANSFER_STATE_PHASE_STOPPED,
> +                                    write_fd,
> +                                    &reply_fd,
> +                                    errp);
> +    if (ret < 0) {
> +        error_prepend(errp, "Failed to initiate state transfer: ");
> +        goto fail;
> +    }
> +
> +    /* If the back-end wishes to use a different pipe, switch over */
> +    if (reply_fd >= 0) {
> +        close(read_fd);
> +        read_fd = reply_fd;
> +    }
> +
> +    transfer_buf = g_malloc(chunk_size);
> +
> +    while (true) {
> +        ssize_t read_ret;
> +
> +        read_ret = read(read_fd, transfer_buf, chunk_size);
> +        if (read_ret < 0) {

Is it necessary to handle -EINTR?

> +            ret = -errno;
> +            error_setg_errno(errp, -ret, "Failed to receive state");
> +            goto fail;
> +        }
> +
> +        assert(read_ret <= chunk_size);
> +        qemu_put_be32(f, read_ret);
> +
> +        if (read_ret == 0) {
> +            /* EOF */
> +            break;
> +        }
> +
> +        qemu_put_buffer(f, transfer_buf, read_ret);
> +    }
> +
> +    /*
> +     * Back-end will not really care, but be clean and close our end of the pipe
> +     * before inquiring the back-end about whether transfer was successful
> +     */
> +    close(read_fd);
> +    read_fd = -1;
> +
> +    /* Also, verify that the device is still stopped */
> +    assert(!dev->started);
> +
> +    ret = vhost_check_device_state(dev, errp);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    ret = 0;
> +fail:
> +    g_free(transfer_buf);
> +    if (read_fd >= 0) {
> +        close(read_fd);
> +    }
> +
> +    return ret;
> +}
> +
> +int vhost_load_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp)
> +{
> +    size_t transfer_buf_size = 0;
> +    void *transfer_buf = NULL;
> +    g_autoptr(GError) g_err = NULL;
> +    int pipe_fds[2], read_fd = -1, write_fd = -1, reply_fd = -1;
> +    int ret;
> +
> +    /* [0] for reading (back-end's end), [1] for writing (our end) */
> +    if (!g_unix_open_pipe(pipe_fds, FD_CLOEXEC, &g_err)) {
> +        error_setg(errp, "Failed to set up state transfer pipe: %s",
> +                   g_err->message);
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    read_fd = pipe_fds[0];
> +    write_fd = pipe_fds[1];
> +
> +    /*
> +     * VHOST_TRANSFER_STATE_PHASE_STOPPED means the device must be stopped.
> +     * We cannot check dev->suspended, because the back-end may not support
> +     * suspending.
> +     */
> +    assert(!dev->started);
> +
> +    /* Transfer ownership of read_fd to the back-end */
> +    ret = vhost_set_device_state_fd(dev,
> +                                    VHOST_TRANSFER_STATE_DIRECTION_LOAD,
> +                                    VHOST_TRANSFER_STATE_PHASE_STOPPED,
> +                                    read_fd,
> +                                    &reply_fd,
> +                                    errp);
> +    if (ret < 0) {
> +        error_prepend(errp, "Failed to initiate state transfer: ");
> +        goto fail;
> +    }
> +
> +    /* If the back-end wishes to use a different pipe, switch over */
> +    if (reply_fd >= 0) {
> +        close(write_fd);
> +        write_fd = reply_fd;
> +    }
> +
> +    while (true) {
> +        size_t this_chunk_size = qemu_get_be32(f);
> +        ssize_t write_ret;
> +        const uint8_t *transfer_pointer;
> +
> +        if (this_chunk_size == 0) {
> +            /* End of state */
> +            break;
> +        }
> +
> +        if (transfer_buf_size < this_chunk_size) {
> +            transfer_buf = g_realloc(transfer_buf, this_chunk_size);
> +            transfer_buf_size = this_chunk_size;
> +        }
> +
> +        if (qemu_get_buffer(f, transfer_buf, this_chunk_size) <
> +                this_chunk_size)
> +        {
> +            error_setg(errp, "Failed to read state");
> +            ret = -EINVAL;
> +            goto fail;
> +        }
> +
> +        transfer_pointer = transfer_buf;
> +        while (this_chunk_size > 0) {
> +            write_ret = write(write_fd, transfer_pointer, this_chunk_size);
> +            if (write_ret < 0) {

Is it necessary to handle -EINTR?

> +                ret = -errno;
> +                error_setg_errno(errp, -ret, "Failed to send state");
> +                goto fail;
> +            } else if (write_ret == 0) {
> +                error_setg(errp, "Failed to send state: Connection is closed");
> +                ret = -ECONNRESET;
> +                goto fail;
> +            }
> +
> +            assert(write_ret <= this_chunk_size);
> +            this_chunk_size -= write_ret;
> +            transfer_pointer += write_ret;
> +        }
> +    }
> +
> +    /*
> +     * Close our end, thus ending transfer, before inquiring the back-end about
> +     * whether transfer was successful
> +     */
> +    close(write_fd);
> +    write_fd = -1;
> +
> +    /* Also, verify that the device is still stopped */
> +    assert(!dev->started);
> +
> +    ret = vhost_check_device_state(dev, errp);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    ret = 0;
> +fail:
> +    g_free(transfer_buf);
> +    if (write_fd >= 0) {
> +        close(write_fd);
> +    }
> +
> +    return ret;
> +}
> -- 
> 2.41.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v2 3/4] vhost: Add high-level state save/load functions
@ 2023-07-18 18:42     ` Stefan Hajnoczi
  0 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2023-07-18 18:42 UTC (permalink / raw)
  To: Hanna Czenczek
  Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, German Maglione,
	Eugenio Pérez

[-- Attachment #1: Type: text/plain, Size: 9734 bytes --]

On Wed, Jul 12, 2023 at 01:17:01PM +0200, Hanna Czenczek wrote:
> vhost_save_backend_state() and vhost_load_backend_state() can be used by
> vhost front-ends to easily save and load the back-end's state to/from
> the migration stream.
> 
> Because we do not know the full state size ahead of time,
> vhost_save_backend_state() simply reads the data in 1 MB chunks, and
> writes each chunk consecutively into the migration stream, prefixed by
> its length.  EOF is indicated by a 0-length chunk.
> 
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
>  include/hw/virtio/vhost.h |  35 +++++++
>  hw/virtio/vhost.c         | 204 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 239 insertions(+)
> 
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index d8877496e5..0c282abd4e 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -425,4 +425,39 @@ int vhost_set_device_state_fd(struct vhost_dev *dev,
>   */
>  int vhost_check_device_state(struct vhost_dev *dev, Error **errp);
>  
> +/**
> + * vhost_save_backend_state(): High-level function to receive a vhost
> + * back-end's state, and save it in `f`.  Uses

I think the GtkDoc syntax is @f instead of `f`.

> + * `vhost_set_device_state_fd()` to get the data from the back-end, and
> + * stores it in consecutive chunks that are each prefixed by their
> + * respective length (be32).  The end is marked by a 0-length chunk.
> + *
> + * Must only be called while the device and all its vrings are stopped
> + * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
> + *
> + * @dev: The vhost device from which to save the state
> + * @f: Migration stream in which to save the state
> + * @errp: Potential error message
> + *
> + * Returns 0 on success, and -errno otherwise.
> + */
> +int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp);
> +
> +/**
> + * vhost_load_backend_state(): High-level function to load a vhost
> + * back-end's state from `f`, and send it over to the back-end.  Reads
> + * the data from `f` in the format used by `vhost_save_state()`, and
> + * uses `vhost_set_device_state_fd()` to transfer it to the back-end.
> + *
> + * Must only be called while the device and all its vrings are stopped
> + * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
> + *
> + * @dev: The vhost device to which to send the sate
> + * @f: Migration stream from which to load the state
> + * @errp: Potential error message
> + *
> + * Returns 0 on success, and -errno otherwise.
> + */
> +int vhost_load_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp);
> +
>  #endif
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 756b6d55a8..332d49a310 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -2128,3 +2128,207 @@ int vhost_check_device_state(struct vhost_dev *dev, Error **errp)
>                 "vhost transport does not support migration state transfer");
>      return -ENOSYS;
>  }
> +
> +int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp)
> +{
> +    /* Maximum chunk size in which to transfer the state */
> +    const size_t chunk_size = 1 * 1024 * 1024;
> +    void *transfer_buf = NULL;
> +    g_autoptr(GError) g_err = NULL;
> +    int pipe_fds[2], read_fd = -1, write_fd = -1, reply_fd = -1;
> +    int ret;
> +
> +    /* [0] for reading (our end), [1] for writing (back-end's end) */
> +    if (!g_unix_open_pipe(pipe_fds, FD_CLOEXEC, &g_err)) {
> +        error_setg(errp, "Failed to set up state transfer pipe: %s",
> +                   g_err->message);
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    read_fd = pipe_fds[0];
> +    write_fd = pipe_fds[1];
> +
> +    /*
> +     * VHOST_TRANSFER_STATE_PHASE_STOPPED means the device must be stopped.
> +     * We cannot check dev->suspended, because the back-end may not support
> +     * suspending.
> +     */
> +    assert(!dev->started);
> +
> +    /* Transfer ownership of write_fd to the back-end */
> +    ret = vhost_set_device_state_fd(dev,
> +                                    VHOST_TRANSFER_STATE_DIRECTION_SAVE,
> +                                    VHOST_TRANSFER_STATE_PHASE_STOPPED,
> +                                    write_fd,
> +                                    &reply_fd,
> +                                    errp);
> +    if (ret < 0) {
> +        error_prepend(errp, "Failed to initiate state transfer: ");
> +        goto fail;
> +    }
> +
> +    /* If the back-end wishes to use a different pipe, switch over */
> +    if (reply_fd >= 0) {
> +        close(read_fd);
> +        read_fd = reply_fd;
> +    }
> +
> +    transfer_buf = g_malloc(chunk_size);
> +
> +    while (true) {
> +        ssize_t read_ret;
> +
> +        read_ret = read(read_fd, transfer_buf, chunk_size);
> +        if (read_ret < 0) {

Is it necessary to handle -EINTR?

> +            ret = -errno;
> +            error_setg_errno(errp, -ret, "Failed to receive state");
> +            goto fail;
> +        }
> +
> +        assert(read_ret <= chunk_size);
> +        qemu_put_be32(f, read_ret);
> +
> +        if (read_ret == 0) {
> +            /* EOF */
> +            break;
> +        }
> +
> +        qemu_put_buffer(f, transfer_buf, read_ret);
> +    }
> +
> +    /*
> +     * Back-end will not really care, but be clean and close our end of the pipe
> +     * before inquiring the back-end about whether transfer was successful
> +     */
> +    close(read_fd);
> +    read_fd = -1;
> +
> +    /* Also, verify that the device is still stopped */
> +    assert(!dev->started);
> +
> +    ret = vhost_check_device_state(dev, errp);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    ret = 0;
> +fail:
> +    g_free(transfer_buf);
> +    if (read_fd >= 0) {
> +        close(read_fd);
> +    }
> +
> +    return ret;
> +}
> +
> +int vhost_load_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp)
> +{
> +    size_t transfer_buf_size = 0;
> +    void *transfer_buf = NULL;
> +    g_autoptr(GError) g_err = NULL;
> +    int pipe_fds[2], read_fd = -1, write_fd = -1, reply_fd = -1;
> +    int ret;
> +
> +    /* [0] for reading (back-end's end), [1] for writing (our end) */
> +    if (!g_unix_open_pipe(pipe_fds, FD_CLOEXEC, &g_err)) {
> +        error_setg(errp, "Failed to set up state transfer pipe: %s",
> +                   g_err->message);
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    read_fd = pipe_fds[0];
> +    write_fd = pipe_fds[1];
> +
> +    /*
> +     * VHOST_TRANSFER_STATE_PHASE_STOPPED means the device must be stopped.
> +     * We cannot check dev->suspended, because the back-end may not support
> +     * suspending.
> +     */
> +    assert(!dev->started);
> +
> +    /* Transfer ownership of read_fd to the back-end */
> +    ret = vhost_set_device_state_fd(dev,
> +                                    VHOST_TRANSFER_STATE_DIRECTION_LOAD,
> +                                    VHOST_TRANSFER_STATE_PHASE_STOPPED,
> +                                    read_fd,
> +                                    &reply_fd,
> +                                    errp);
> +    if (ret < 0) {
> +        error_prepend(errp, "Failed to initiate state transfer: ");
> +        goto fail;
> +    }
> +
> +    /* If the back-end wishes to use a different pipe, switch over */
> +    if (reply_fd >= 0) {
> +        close(write_fd);
> +        write_fd = reply_fd;
> +    }
> +
> +    while (true) {
> +        size_t this_chunk_size = qemu_get_be32(f);
> +        ssize_t write_ret;
> +        const uint8_t *transfer_pointer;
> +
> +        if (this_chunk_size == 0) {
> +            /* End of state */
> +            break;
> +        }
> +
> +        if (transfer_buf_size < this_chunk_size) {
> +            transfer_buf = g_realloc(transfer_buf, this_chunk_size);
> +            transfer_buf_size = this_chunk_size;
> +        }
> +
> +        if (qemu_get_buffer(f, transfer_buf, this_chunk_size) <
> +                this_chunk_size)
> +        {
> +            error_setg(errp, "Failed to read state");
> +            ret = -EINVAL;
> +            goto fail;
> +        }
> +
> +        transfer_pointer = transfer_buf;
> +        while (this_chunk_size > 0) {
> +            write_ret = write(write_fd, transfer_pointer, this_chunk_size);
> +            if (write_ret < 0) {

Is it necessary to handle -EINTR?

> +                ret = -errno;
> +                error_setg_errno(errp, -ret, "Failed to send state");
> +                goto fail;
> +            } else if (write_ret == 0) {
> +                error_setg(errp, "Failed to send state: Connection is closed");
> +                ret = -ECONNRESET;
> +                goto fail;
> +            }
> +
> +            assert(write_ret <= this_chunk_size);
> +            this_chunk_size -= write_ret;
> +            transfer_pointer += write_ret;
> +        }
> +    }
> +
> +    /*
> +     * Close our end, thus ending transfer, before inquiring the back-end about
> +     * whether transfer was successful
> +     */
> +    close(write_fd);
> +    write_fd = -1;
> +
> +    /* Also, verify that the device is still stopped */
> +    assert(!dev->started);
> +
> +    ret = vhost_check_device_state(dev, errp);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    ret = 0;
> +fail:
> +    g_free(transfer_buf);
> +    if (write_fd >= 0) {
> +        close(write_fd);
> +    }
> +
> +    return ret;
> +}
> -- 
> 2.41.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 4/4] vhost-user-fs: Implement internal migration
  2023-07-12 11:17   ` [Virtio-fs] " Hanna Czenczek
@ 2023-07-18 18:44     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2023-07-18 18:44 UTC (permalink / raw)
  To: Hanna Czenczek
  Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, German Maglione,
	Eugenio Pérez

[-- Attachment #1: Type: text/plain, Size: 649 bytes --]

On Wed, Jul 12, 2023 at 01:17:02PM +0200, Hanna Czenczek wrote:
> A virtio-fs device's VM state consists of:
> - the virtio device (vring) state (VMSTATE_VIRTIO_DEVICE)
> - the back-end's (virtiofsd's) internal state
> 
> We get/set the latter via the new vhost operations to transfer migratory
> state.  It is its own dedicated subsection, so that for external
> migration, it can be disabled.
> 
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
>  hw/virtio/vhost-user-fs.c | 101 +++++++++++++++++++++++++++++++++++++-
>  1 file changed, 100 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v2 4/4] vhost-user-fs: Implement internal migration
@ 2023-07-18 18:44     ` Stefan Hajnoczi
  0 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2023-07-18 18:44 UTC (permalink / raw)
  To: Hanna Czenczek
  Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, German Maglione,
	Eugenio Pérez

[-- Attachment #1: Type: text/plain, Size: 649 bytes --]

On Wed, Jul 12, 2023 at 01:17:02PM +0200, Hanna Czenczek wrote:
> A virtio-fs device's VM state consists of:
> - the virtio device (vring) state (VMSTATE_VIRTIO_DEVICE)
> - the back-end's (virtiofsd's) internal state
> 
> We get/set the latter via the new vhost operations to transfer migratory
> state.  It is its own dedicated subsection, so that for external
> migration, it can be disabled.
> 
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
>  hw/virtio/vhost-user-fs.c | 101 +++++++++++++++++++++++++++++++++++++-
>  1 file changed, 100 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/4] vhost-user.rst: Migrating back-end-internal state
  2023-07-18 15:57     ` [Virtio-fs] " Stefan Hajnoczi
@ 2023-07-19 16:33       ` Hanna Czenczek
  -1 siblings, 0 replies; 36+ messages in thread
From: Hanna Czenczek @ 2023-07-19 16:33 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, German Maglione,
	Eugenio Pérez

On 18.07.23 17:57, Stefan Hajnoczi wrote:
> On Wed, Jul 12, 2023 at 01:16:59PM +0200, Hanna Czenczek wrote:
>> For vhost-user devices, qemu can migrate the virtio state, but not the
>> back-end's internal state.  To do so, we need to be able to transfer
>> this internal state between front-end (qemu) and back-end.
>>
>> At this point, this new feature is added for the purpose of virtio-fs
>> migration.  Because virtiofsd's internal state will not be too large, we
>> believe it is best to transfer it as a single binary blob after the
>> streaming phase.
>>
>> These are the additions to the protocol:
>> - New vhost-user protocol feature VHOST_USER_PROTOCOL_F_MIGRATORY_STATE
> It's not 100% clear whether "migratory" is related to live migration or
> something else. I don't like the name :P.
>
> The name "VHOST_USER_PROTOCOL_F_DEVICE_STATE" would be more obviously
> associated with SET_DEVICE_STATE_FD and CHECK_DEVICE_STATE than
> "MIGRATORY_STATE".

Sure, sure.  Naming things is hard. :)

>> - SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a pipe
>>    over which to transfer the state.
> Does it need to be a pipe or can it be another type of file (e.g. UNIX
> domain socket)?

It’s difficult to say, honestly.  It can be anything, but I’m not sure 
how to describe that in this specification.

It must be any FD into which the state sender can write the state and 
signal end of state by closing its FD; and from which the state receiver 
can read the state, terminated by seeing an EOF.  As you say, that 
doesn’t mean that the sender has to write the state into the FD, nor 
that the receiver has to read it (into memory), it’s just that either 
side must ensure the other can do it.

> In the future the fd may become bi-directional. Pipes are
> uni-directional on Linux.
>
> I suggest calling it a "file descriptor" and not mentioning "pipes"
> explicitly.

Works here in the commit message, but in the document, we need to be 
explicit about the requirements for this FD, i.e. the way in which 
front-end and back-end can expect the FD to be usable.  Calling it a 
“pipe” was a simple way, but you’re right, it’s more general than that.

>> - CHECK_DEVICE_STATE: After the state has been transferred through the
>>    pipe, the front-end invokes this function to verify success.  There is
>>    no in-band way (through the pipe) to indicate failure, so we need to
>>    check explicitly.
>>
>> Once the transfer pipe has been established via SET_DEVICE_STATE_FD
>> (which includes establishing the direction of transfer and migration
>> phase), the sending side writes its data into the pipe, and the reading
>> side reads it until it sees an EOF.  Then, the front-end will check for
>> success via CHECK_DEVICE_STATE, which on the destination side includes
>> checking for integrity (i.e. errors during deserialization).
>>
>> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
>> ---
>>   docs/interop/vhost-user.rst | 87 +++++++++++++++++++++++++++++++++++++
>>   1 file changed, 87 insertions(+)
>>
>> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
>> index ac6be34c4c..c98dfeca25 100644
>> --- a/docs/interop/vhost-user.rst
>> +++ b/docs/interop/vhost-user.rst
>> @@ -334,6 +334,7 @@ in the ancillary data:
>>   * ``VHOST_USER_SET_VRING_ERR``
>>   * ``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
>>   * ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
>> +* ``VHOST_USER_SET_DEVICE_STATE_FD``
>>   
>>   If *front-end* is unable to send the full message or receives a wrong
>>   reply it will close the connection. An optional reconnection mechanism
>> @@ -497,6 +498,44 @@ it performs WAKE ioctl's on the userfaultfd to wake the stalled
>>   back-end.  The front-end indicates support for this via the
>>   ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.
>>   
>> +.. _migrating_backend_state:
>> +
>> +Migrating back-end state
>> +^^^^^^^^^^^^^^^^^^^^^^^^
>> +
>> +If the back-end has internal state that is to be sent from source to
>> +destination,
> Migration and the terms "source" and "destination" have not been
> defined. Here is a suggestion for an introductory paragraph:
>
>    Migrating device state involves transferring the state from one
>    back-end, called the source, to another back-end, called the
>    destination. After migration, the destination transparently resumes
>    operation without requiring the driver to re-initialize the device at
>    the VIRTIO level. If the migration fails, then the source can
>    transparently resume operation until another migration attempt is
>    made.

You’re right, thanks!  Maybe I’ll try to be even more verbose here, and 
include what VM and guest do.

>> the front-end may be able to store and transfer it via an
>> +internal migration stream.  Support for this is negotiated with the
>> +``VHOST_USER_PROTOCOL_F_MIGRATORY_STATE`` feature.
>> +
>> +First, a channel over which the state is transferred is established on
>> +the source side using the ``VHOST_USER_SET_DEVICE_STATE_FD`` message.
>> +This message has two parameters:
>> +
>> +* Direction of transfer: On the source, the data is saved, transferring
>> +  it from the back-end to the front-end.  On the destination, the data
>> +  is loaded, transferring it from the front-end to the back-end.
>> +
>> +* Migration phase: Currently, only the period after memory transfer
> "memory transfer" is vague. This sentence is referring to VM live
> migration and guest RAM but it may be better to focus on just the device
> perspective and not the VM:

The device perspective does include guest RAM, though, because the 
back-end must log its memory modifications, so it is very much involved 
in that process.  I think it’s a good idea to note that the state 
transfer will occur afterwards.

>    Migration is currently only supported while the device is suspended
>    and all of its rings are stopped. In the future, additional phases
>    might be support to allow iterative migration while the device is
>    running.

In any case, I’ll happily add this last sentence.

>> +  before switch-over is supported, in which the device is suspended and
>> +  all of its rings are stopped.
>> +
>> +Then, the writing end will write all the data it has, signalling the end
>> +of data by closing its end of the pipe.  The reading end must read all
>> +of this data and process it:
>> +
>> +* If saving, the front-end will transfer this data to the destination,
> To be extra clear:
>
>    ...transfer this data to the destination through some
>    implementation-specific means.

Yep!

>> +  where it is loaded into the destination back-end.
>> +
>> +* If loading, the back-end must deserialize its internal state from the
>> +  transferred data and be set up to resume operation.
> "and be set up to resume operation" is a little unclear to me. I guess
> it means "in preparation for VHOST_USER_RESUME".

I don’t think the back-end on the destination will receive a RESUME.  It 
never gets a SUSPEND, after all.  So this is about resuming operation 
once the vrings are kicked, and resuming it like it was left on the 
source when the back-end was SUSPEND-ed there.

>> +
>> +After the front-end has seen all data transferred (saving: seen an EOF
>> +on the pipe; loading: closed its end of the pipe), it sends the
>> +``VHOST_USER_CHECK_DEVICE_STATE`` message to verify that data transfer
>> +was successful in the back-end, too.  The back-end responds once it
>> +knows whether the tranfer and processing was successful or not.
>> +
>>   Memory access
>>   -------------
>>   
>> @@ -891,6 +930,7 @@ Protocol features
>>     #define VHOST_USER_PROTOCOL_F_STATUS               16
>>     #define VHOST_USER_PROTOCOL_F_XEN_MMAP             17
>>     #define VHOST_USER_PROTOCOL_F_SUSPEND              18
>> +  #define VHOST_USER_PROTOCOL_F_MIGRATORY_STATE      19
>>   
>>   Front-end message types
>>   -----------------------
>> @@ -1471,6 +1511,53 @@ Front-end message types
>>     before.  The back-end must again begin processing rings that are not
>>     stopped, and it may resume background operations.
>>   
>> +``VHOST_USER_SET_DEVICE_STATE_FD``
>> +  :id: 43
>> +  :equivalent ioctl: N/A
>> +  :request payload: device state transfer parameters
>> +  :reply payload: ``u64``
>> +
>> +  The front-end negotiates a pipe over which to transfer the back-end’s
>> +  internal state during migration.  For this purpose, this message is
>> +  accompanied by a file descriptor that is to be the back-end’s end of
>> +  the pipe.  If the back-end can provide a more efficient pipe (i.e.
>> +  because it internally already has a pipe into/from which to
>> +  put/receive state), it can ignore this and reply with a different file
>> +  descriptor to serve as the front-end’s end.
>> +
>> +  The request payload contains parameters for the subsequent data
>> +  transfer, as described in the :ref:`Migrating back-end state
>> +  <migrating_backend_state>` section.  That section also explains the
>> +  data transfer itself.
>> +
>> +  The value returned is both an indication for success, and whether a
>> +  new pipe file descriptor is returned: Bits 0–7 are 0 on success, and
>> +  non-zero on error.  Bit 8 is the invalid FD flag; this flag is set
>> +  when there is no file descriptor returned.  When this flag is not set,
>> +  the front-end must use the returned file descriptor as its end of the
>> +  pipe.  The back-end must not both indicate an error and return a file
>> +  descriptor.
> Is the invalid FD flag necessary? The front-end can check whether or not
> an fd was passed along with the result, so I'm not sure why the result
> also needs to communicate this.

If the front-end can check this, shouldn’t the back-end also generally 
be able to check whether the front-end has passed an FD in the ancillary 
data?  We do have this flag in messages sent by the front-end that can 
optionally provide an FD (SET_VRING_KICK, SET_VRING_CALL), so I thought 
it would be good for symmetry to keep this convention every time an FD 
is optional in communication between front-end and back-end, in either 
direction.

Hanna



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v2 1/4] vhost-user.rst: Migrating back-end-internal state
@ 2023-07-19 16:33       ` Hanna Czenczek
  0 siblings, 0 replies; 36+ messages in thread
From: Hanna Czenczek @ 2023-07-19 16:33 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, German Maglione,
	Eugenio Pérez

On 18.07.23 17:57, Stefan Hajnoczi wrote:
> On Wed, Jul 12, 2023 at 01:16:59PM +0200, Hanna Czenczek wrote:
>> For vhost-user devices, qemu can migrate the virtio state, but not the
>> back-end's internal state.  To do so, we need to be able to transfer
>> this internal state between front-end (qemu) and back-end.
>>
>> At this point, this new feature is added for the purpose of virtio-fs
>> migration.  Because virtiofsd's internal state will not be too large, we
>> believe it is best to transfer it as a single binary blob after the
>> streaming phase.
>>
>> These are the additions to the protocol:
>> - New vhost-user protocol feature VHOST_USER_PROTOCOL_F_MIGRATORY_STATE
> It's not 100% clear whether "migratory" is related to live migration or
> something else. I don't like the name :P.
>
> The name "VHOST_USER_PROTOCOL_F_DEVICE_STATE" would be more obviously
> associated with SET_DEVICE_STATE_FD and CHECK_DEVICE_STATE than
> "MIGRATORY_STATE".

Sure, sure.  Naming things is hard. :)

>> - SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a pipe
>>    over which to transfer the state.
> Does it need to be a pipe or can it be another type of file (e.g. UNIX
> domain socket)?

It’s difficult to say, honestly.  It can be anything, but I’m not sure 
how to describe that in this specification.

It must be any FD into which the state sender can write the state and 
signal end of state by closing its FD; and from which the state receiver 
can read the state, terminated by seeing an EOF.  As you say, that 
doesn’t mean that the sender has to write the state into the FD, nor 
that the receiver has to read it (into memory), it’s just that either 
side must ensure the other can do it.

> In the future the fd may become bi-directional. Pipes are
> uni-directional on Linux.
>
> I suggest calling it a "file descriptor" and not mentioning "pipes"
> explicitly.

Works here in the commit message, but in the document, we need to be 
explicit about the requirements for this FD, i.e. the way in which 
front-end and back-end can expect the FD to be usable.  Calling it a 
“pipe” was a simple way, but you’re right, it’s more general than that.

>> - CHECK_DEVICE_STATE: After the state has been transferred through the
>>    pipe, the front-end invokes this function to verify success.  There is
>>    no in-band way (through the pipe) to indicate failure, so we need to
>>    check explicitly.
>>
>> Once the transfer pipe has been established via SET_DEVICE_STATE_FD
>> (which includes establishing the direction of transfer and migration
>> phase), the sending side writes its data into the pipe, and the reading
>> side reads it until it sees an EOF.  Then, the front-end will check for
>> success via CHECK_DEVICE_STATE, which on the destination side includes
>> checking for integrity (i.e. errors during deserialization).
>>
>> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
>> ---
>>   docs/interop/vhost-user.rst | 87 +++++++++++++++++++++++++++++++++++++
>>   1 file changed, 87 insertions(+)
>>
>> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
>> index ac6be34c4c..c98dfeca25 100644
>> --- a/docs/interop/vhost-user.rst
>> +++ b/docs/interop/vhost-user.rst
>> @@ -334,6 +334,7 @@ in the ancillary data:
>>   * ``VHOST_USER_SET_VRING_ERR``
>>   * ``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
>>   * ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
>> +* ``VHOST_USER_SET_DEVICE_STATE_FD``
>>   
>>   If *front-end* is unable to send the full message or receives a wrong
>>   reply it will close the connection. An optional reconnection mechanism
>> @@ -497,6 +498,44 @@ it performs WAKE ioctl's on the userfaultfd to wake the stalled
>>   back-end.  The front-end indicates support for this via the
>>   ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.
>>   
>> +.. _migrating_backend_state:
>> +
>> +Migrating back-end state
>> +^^^^^^^^^^^^^^^^^^^^^^^^
>> +
>> +If the back-end has internal state that is to be sent from source to
>> +destination,
> Migration and the terms "source" and "destination" have not been
> defined. Here is a suggestion for an introductory paragraph:
>
>    Migrating device state involves transferring the state from one
>    back-end, called the source, to another back-end, called the
>    destination. After migration, the destination transparently resumes
>    operation without requiring the driver to re-initialize the device at
>    the VIRTIO level. If the migration fails, then the source can
>    transparently resume operation until another migration attempt is
>    made.

You’re right, thanks!  Maybe I’ll try to be even more verbose here, and 
include what VM and guest do.

>> the front-end may be able to store and transfer it via an
>> +internal migration stream.  Support for this is negotiated with the
>> +``VHOST_USER_PROTOCOL_F_MIGRATORY_STATE`` feature.
>> +
>> +First, a channel over which the state is transferred is established on
>> +the source side using the ``VHOST_USER_SET_DEVICE_STATE_FD`` message.
>> +This message has two parameters:
>> +
>> +* Direction of transfer: On the source, the data is saved, transferring
>> +  it from the back-end to the front-end.  On the destination, the data
>> +  is loaded, transferring it from the front-end to the back-end.
>> +
>> +* Migration phase: Currently, only the period after memory transfer
> "memory transfer" is vague. This sentence is referring to VM live
> migration and guest RAM but it may be better to focus on just the device
> perspective and not the VM:

The device perspective does include guest RAM, though, because the 
back-end must log its memory modifications, so it is very much involved 
in that process.  I think it’s a good idea to note that the state 
transfer will occur afterwards.

>    Migration is currently only supported while the device is suspended
>    and all of its rings are stopped. In the future, additional phases
>    might be support to allow iterative migration while the device is
>    running.

In any case, I’ll happily add this last sentence.

>> +  before switch-over is supported, in which the device is suspended and
>> +  all of its rings are stopped.
>> +
>> +Then, the writing end will write all the data it has, signalling the end
>> +of data by closing its end of the pipe.  The reading end must read all
>> +of this data and process it:
>> +
>> +* If saving, the front-end will transfer this data to the destination,
> To be extra clear:
>
>    ...transfer this data to the destination through some
>    implementation-specific means.

Yep!

>> +  where it is loaded into the destination back-end.
>> +
>> +* If loading, the back-end must deserialize its internal state from the
>> +  transferred data and be set up to resume operation.
> "and be set up to resume operation" is a little unclear to me. I guess
> it means "in preparation for VHOST_USER_RESUME".

I don’t think the back-end on the destination will receive a RESUME.  It 
never gets a SUSPEND, after all.  So this is about resuming operation 
once the vrings are kicked, and resuming it like it was left on the 
source when the back-end was SUSPEND-ed there.

>> +
>> +After the front-end has seen all data transferred (saving: seen an EOF
>> +on the pipe; loading: closed its end of the pipe), it sends the
>> +``VHOST_USER_CHECK_DEVICE_STATE`` message to verify that data transfer
>> +was successful in the back-end, too.  The back-end responds once it
>> +knows whether the tranfer and processing was successful or not.
>> +
>>   Memory access
>>   -------------
>>   
>> @@ -891,6 +930,7 @@ Protocol features
>>     #define VHOST_USER_PROTOCOL_F_STATUS               16
>>     #define VHOST_USER_PROTOCOL_F_XEN_MMAP             17
>>     #define VHOST_USER_PROTOCOL_F_SUSPEND              18
>> +  #define VHOST_USER_PROTOCOL_F_MIGRATORY_STATE      19
>>   
>>   Front-end message types
>>   -----------------------
>> @@ -1471,6 +1511,53 @@ Front-end message types
>>     before.  The back-end must again begin processing rings that are not
>>     stopped, and it may resume background operations.
>>   
>> +``VHOST_USER_SET_DEVICE_STATE_FD``
>> +  :id: 43
>> +  :equivalent ioctl: N/A
>> +  :request payload: device state transfer parameters
>> +  :reply payload: ``u64``
>> +
>> +  The front-end negotiates a pipe over which to transfer the back-end’s
>> +  internal state during migration.  For this purpose, this message is
>> +  accompanied by a file descriptor that is to be the back-end’s end of
>> +  the pipe.  If the back-end can provide a more efficient pipe (i.e.
>> +  because it internally already has a pipe into/from which to
>> +  put/receive state), it can ignore this and reply with a different file
>> +  descriptor to serve as the front-end’s end.
>> +
>> +  The request payload contains parameters for the subsequent data
>> +  transfer, as described in the :ref:`Migrating back-end state
>> +  <migrating_backend_state>` section.  That section also explains the
>> +  data transfer itself.
>> +
>> +  The value returned is both an indication for success, and whether a
>> +  new pipe file descriptor is returned: Bits 0–7 are 0 on success, and
>> +  non-zero on error.  Bit 8 is the invalid FD flag; this flag is set
>> +  when there is no file descriptor returned.  When this flag is not set,
>> +  the front-end must use the returned file descriptor as its end of the
>> +  pipe.  The back-end must not both indicate an error and return a file
>> +  descriptor.
> Is the invalid FD flag necessary? The front-end can check whether or not
> an fd was passed along with the result, so I'm not sure why the result
> also needs to communicate this.

If the front-end can check this, shouldn’t the back-end also generally 
be able to check whether the front-end has passed an FD in the ancillary 
data?  We do have this flag in messages sent by the front-end that can 
optionally provide an FD (SET_VRING_KICK, SET_VRING_CALL), so I thought 
it would be good for symmetry to keep this convention every time an FD 
is optional in communication between front-end and back-end, in either 
direction.

Hanna


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v2 1/4] vhost-user.rst: Migrating back-end-internal state
  2023-07-18 16:12     ` [Virtio-fs] " Stefan Hajnoczi
  (?)
@ 2023-07-20 11:32     ` Hanna Czenczek
  -1 siblings, 0 replies; 36+ messages in thread
From: Hanna Czenczek @ 2023-07-20 11:32 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: virtio-fs, Eugenio Pérez, qemu-devel, Michael S . Tsirkin

On 18.07.23 18:12, Stefan Hajnoczi wrote:
> On Wed, Jul 12, 2023 at 01:16:59PM +0200, Hanna Czenczek wrote:
>> @@ -1471,6 +1511,53 @@ Front-end message types
>>     before.  The back-end must again begin processing rings that are not
>>     stopped, and it may resume background operations.
>>   
>> +``VHOST_USER_SET_DEVICE_STATE_FD``
>> +  :id: 43
>> +  :equivalent ioctl: N/A
>> +  :request payload: device state transfer parameters
> Where are these defined?

...an excellent question.  Right, I forgot to add them!


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/4] vhost: Add high-level state save/load functions
  2023-07-18 18:42     ` [Virtio-fs] " Stefan Hajnoczi
@ 2023-07-20 11:42       ` Hanna Czenczek
  -1 siblings, 0 replies; 36+ messages in thread
From: Hanna Czenczek @ 2023-07-20 11:42 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, German Maglione,
	Eugenio Pérez

On 18.07.23 20:42, Stefan Hajnoczi wrote:
> On Wed, Jul 12, 2023 at 01:17:01PM +0200, Hanna Czenczek wrote:
>> vhost_save_backend_state() and vhost_load_backend_state() can be used by
>> vhost front-ends to easily save and load the back-end's state to/from
>> the migration stream.
>>
>> Because we do not know the full state size ahead of time,
>> vhost_save_backend_state() simply reads the data in 1 MB chunks, and
>> writes each chunk consecutively into the migration stream, prefixed by
>> its length.  EOF is indicated by a 0-length chunk.
>>
>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
>> ---
>>   include/hw/virtio/vhost.h |  35 +++++++
>>   hw/virtio/vhost.c         | 204 ++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 239 insertions(+)
>>
>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
>> index d8877496e5..0c282abd4e 100644
>> --- a/include/hw/virtio/vhost.h
>> +++ b/include/hw/virtio/vhost.h
>> @@ -425,4 +425,39 @@ int vhost_set_device_state_fd(struct vhost_dev *dev,
>>    */
>>   int vhost_check_device_state(struct vhost_dev *dev, Error **errp);
>>   
>> +/**
>> + * vhost_save_backend_state(): High-level function to receive a vhost
>> + * back-end's state, and save it in `f`.  Uses
> I think the GtkDoc syntax is @f instead of `f`.

OK, I’ll fix the parameter name references!

>> + * `vhost_set_device_state_fd()` to get the data from the back-end, and
>> + * stores it in consecutive chunks that are each prefixed by their
>> + * respective length (be32).  The end is marked by a 0-length chunk.
>> + *
>> + * Must only be called while the device and all its vrings are stopped
>> + * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
>> + *
>> + * @dev: The vhost device from which to save the state
>> + * @f: Migration stream in which to save the state
>> + * @errp: Potential error message
>> + *
>> + * Returns 0 on success, and -errno otherwise.
>> + */
>> +int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp);
>> +
>> +/**
>> + * vhost_load_backend_state(): High-level function to load a vhost
>> + * back-end's state from `f`, and send it over to the back-end.  Reads
>> + * the data from `f` in the format used by `vhost_save_state()`, and
>> + * uses `vhost_set_device_state_fd()` to transfer it to the back-end.
>> + *
>> + * Must only be called while the device and all its vrings are stopped
>> + * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
>> + *
>> + * @dev: The vhost device to which to send the sate
>> + * @f: Migration stream from which to load the state
>> + * @errp: Potential error message
>> + *
>> + * Returns 0 on success, and -errno otherwise.
>> + */
>> +int vhost_load_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp);
>> +
>>   #endif
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index 756b6d55a8..332d49a310 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -2128,3 +2128,207 @@ int vhost_check_device_state(struct vhost_dev *dev, Error **errp)
>>                  "vhost transport does not support migration state transfer");
>>       return -ENOSYS;
>>   }
>> +
>> +int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp)
>> +{
>> +    /* Maximum chunk size in which to transfer the state */
>> +    const size_t chunk_size = 1 * 1024 * 1024;
>> +    void *transfer_buf = NULL;
>> +    g_autoptr(GError) g_err = NULL;
>> +    int pipe_fds[2], read_fd = -1, write_fd = -1, reply_fd = -1;
>> +    int ret;
>> +
>> +    /* [0] for reading (our end), [1] for writing (back-end's end) */
>> +    if (!g_unix_open_pipe(pipe_fds, FD_CLOEXEC, &g_err)) {
>> +        error_setg(errp, "Failed to set up state transfer pipe: %s",
>> +                   g_err->message);
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +
>> +    read_fd = pipe_fds[0];
>> +    write_fd = pipe_fds[1];
>> +
>> +    /*
>> +     * VHOST_TRANSFER_STATE_PHASE_STOPPED means the device must be stopped.
>> +     * We cannot check dev->suspended, because the back-end may not support
>> +     * suspending.
>> +     */
>> +    assert(!dev->started);
>> +
>> +    /* Transfer ownership of write_fd to the back-end */
>> +    ret = vhost_set_device_state_fd(dev,
>> +                                    VHOST_TRANSFER_STATE_DIRECTION_SAVE,
>> +                                    VHOST_TRANSFER_STATE_PHASE_STOPPED,
>> +                                    write_fd,
>> +                                    &reply_fd,
>> +                                    errp);
>> +    if (ret < 0) {
>> +        error_prepend(errp, "Failed to initiate state transfer: ");
>> +        goto fail;
>> +    }
>> +
>> +    /* If the back-end wishes to use a different pipe, switch over */
>> +    if (reply_fd >= 0) {
>> +        close(read_fd);
>> +        read_fd = reply_fd;
>> +    }
>> +
>> +    transfer_buf = g_malloc(chunk_size);
>> +
>> +    while (true) {
>> +        ssize_t read_ret;
>> +
>> +        read_ret = read(read_fd, transfer_buf, chunk_size);
>> +        if (read_ret < 0) {
> Is it necessary to handle -EINTR?

Yes, indeed, I do believe so.  I’ll wrap this and the write() in 
RETRY_ON_EINTR().

Hanna



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v2 3/4] vhost: Add high-level state save/load functions
@ 2023-07-20 11:42       ` Hanna Czenczek
  0 siblings, 0 replies; 36+ messages in thread
From: Hanna Czenczek @ 2023-07-20 11:42 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, German Maglione,
	Eugenio Pérez

On 18.07.23 20:42, Stefan Hajnoczi wrote:
> On Wed, Jul 12, 2023 at 01:17:01PM +0200, Hanna Czenczek wrote:
>> vhost_save_backend_state() and vhost_load_backend_state() can be used by
>> vhost front-ends to easily save and load the back-end's state to/from
>> the migration stream.
>>
>> Because we do not know the full state size ahead of time,
>> vhost_save_backend_state() simply reads the data in 1 MB chunks, and
>> writes each chunk consecutively into the migration stream, prefixed by
>> its length.  EOF is indicated by a 0-length chunk.
>>
>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
>> ---
>>   include/hw/virtio/vhost.h |  35 +++++++
>>   hw/virtio/vhost.c         | 204 ++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 239 insertions(+)
>>
>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
>> index d8877496e5..0c282abd4e 100644
>> --- a/include/hw/virtio/vhost.h
>> +++ b/include/hw/virtio/vhost.h
>> @@ -425,4 +425,39 @@ int vhost_set_device_state_fd(struct vhost_dev *dev,
>>    */
>>   int vhost_check_device_state(struct vhost_dev *dev, Error **errp);
>>   
>> +/**
>> + * vhost_save_backend_state(): High-level function to receive a vhost
>> + * back-end's state, and save it in `f`.  Uses
> I think the GtkDoc syntax is @f instead of `f`.

OK, I’ll fix the parameter name references!

>> + * `vhost_set_device_state_fd()` to get the data from the back-end, and
>> + * stores it in consecutive chunks that are each prefixed by their
>> + * respective length (be32).  The end is marked by a 0-length chunk.
>> + *
>> + * Must only be called while the device and all its vrings are stopped
>> + * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
>> + *
>> + * @dev: The vhost device from which to save the state
>> + * @f: Migration stream in which to save the state
>> + * @errp: Potential error message
>> + *
>> + * Returns 0 on success, and -errno otherwise.
>> + */
>> +int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp);
>> +
>> +/**
>> + * vhost_load_backend_state(): High-level function to load a vhost
>> + * back-end's state from `f`, and send it over to the back-end.  Reads
>> + * the data from `f` in the format used by `vhost_save_state()`, and
>> + * uses `vhost_set_device_state_fd()` to transfer it to the back-end.
>> + *
>> + * Must only be called while the device and all its vrings are stopped
>> + * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
>> + *
>> + * @dev: The vhost device to which to send the sate
>> + * @f: Migration stream from which to load the state
>> + * @errp: Potential error message
>> + *
>> + * Returns 0 on success, and -errno otherwise.
>> + */
>> +int vhost_load_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp);
>> +
>>   #endif
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index 756b6d55a8..332d49a310 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -2128,3 +2128,207 @@ int vhost_check_device_state(struct vhost_dev *dev, Error **errp)
>>                  "vhost transport does not support migration state transfer");
>>       return -ENOSYS;
>>   }
>> +
>> +int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp)
>> +{
>> +    /* Maximum chunk size in which to transfer the state */
>> +    const size_t chunk_size = 1 * 1024 * 1024;
>> +    void *transfer_buf = NULL;
>> +    g_autoptr(GError) g_err = NULL;
>> +    int pipe_fds[2], read_fd = -1, write_fd = -1, reply_fd = -1;
>> +    int ret;
>> +
>> +    /* [0] for reading (our end), [1] for writing (back-end's end) */
>> +    if (!g_unix_open_pipe(pipe_fds, FD_CLOEXEC, &g_err)) {
>> +        error_setg(errp, "Failed to set up state transfer pipe: %s",
>> +                   g_err->message);
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +
>> +    read_fd = pipe_fds[0];
>> +    write_fd = pipe_fds[1];
>> +
>> +    /*
>> +     * VHOST_TRANSFER_STATE_PHASE_STOPPED means the device must be stopped.
>> +     * We cannot check dev->suspended, because the back-end may not support
>> +     * suspending.
>> +     */
>> +    assert(!dev->started);
>> +
>> +    /* Transfer ownership of write_fd to the back-end */
>> +    ret = vhost_set_device_state_fd(dev,
>> +                                    VHOST_TRANSFER_STATE_DIRECTION_SAVE,
>> +                                    VHOST_TRANSFER_STATE_PHASE_STOPPED,
>> +                                    write_fd,
>> +                                    &reply_fd,
>> +                                    errp);
>> +    if (ret < 0) {
>> +        error_prepend(errp, "Failed to initiate state transfer: ");
>> +        goto fail;
>> +    }
>> +
>> +    /* If the back-end wishes to use a different pipe, switch over */
>> +    if (reply_fd >= 0) {
>> +        close(read_fd);
>> +        read_fd = reply_fd;
>> +    }
>> +
>> +    transfer_buf = g_malloc(chunk_size);
>> +
>> +    while (true) {
>> +        ssize_t read_ret;
>> +
>> +        read_ret = read(read_fd, transfer_buf, chunk_size);
>> +        if (read_ret < 0) {
> Is it necessary to handle -EINTR?

Yes, indeed, I do believe so.  I’ll wrap this and the write() in 
RETRY_ON_EINTR().

Hanna


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/4] vhost-user.rst: Migrating back-end-internal state
  2023-07-19 16:33       ` [Virtio-fs] " Hanna Czenczek
@ 2023-07-20 11:43         ` Stefan Hajnoczi
  -1 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2023-07-20 11:43 UTC (permalink / raw)
  To: Hanna Czenczek
  Cc: Stefan Hajnoczi, qemu-devel, virtio-fs, Michael S . Tsirkin,
	German Maglione, Eugenio Pérez

On Wed, 19 Jul 2023 at 12:35, Hanna Czenczek <hreitz@redhat.com> wrote:
>
> On 18.07.23 17:57, Stefan Hajnoczi wrote:
> > On Wed, Jul 12, 2023 at 01:16:59PM +0200, Hanna Czenczek wrote:
> >> For vhost-user devices, qemu can migrate the virtio state, but not the
> >> back-end's internal state.  To do so, we need to be able to transfer
> >> this internal state between front-end (qemu) and back-end.
> >>
> >> At this point, this new feature is added for the purpose of virtio-fs
> >> migration.  Because virtiofsd's internal state will not be too large, we
> >> believe it is best to transfer it as a single binary blob after the
> >> streaming phase.
> >>
> >> These are the additions to the protocol:
> >> - New vhost-user protocol feature VHOST_USER_PROTOCOL_F_MIGRATORY_STATE
> > It's not 100% clear whether "migratory" is related to live migration or
> > something else. I don't like the name :P.
> >
> > The name "VHOST_USER_PROTOCOL_F_DEVICE_STATE" would be more obviously
> > associated with SET_DEVICE_STATE_FD and CHECK_DEVICE_STATE than
> > "MIGRATORY_STATE".
>
> Sure, sure.  Naming things is hard. :)
>
> >> - SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a pipe
> >>    over which to transfer the state.
> > Does it need to be a pipe or can it be another type of file (e.g. UNIX
> > domain socket)?
>
> It’s difficult to say, honestly.  It can be anything, but I’m not sure
> how to describe that in this specification.
>
> It must be any FD into which the state sender can write the state and
> signal end of state by closing its FD; and from which the state receiver
> can read the state, terminated by seeing an EOF.  As you say, that
> doesn’t mean that the sender has to write the state into the FD, nor
> that the receiver has to read it (into memory), it’s just that either
> side must ensure the other can do it.
>
> > In the future the fd may become bi-directional. Pipes are
> > uni-directional on Linux.
> >
> > I suggest calling it a "file descriptor" and not mentioning "pipes"
> > explicitly.
>
> Works here in the commit message, but in the document, we need to be
> explicit about the requirements for this FD, i.e. the way in which
> front-end and back-end can expect the FD to be usable.  Calling it a
> “pipe” was a simple way, but you’re right, it’s more general than that.
>
> >> - CHECK_DEVICE_STATE: After the state has been transferred through the
> >>    pipe, the front-end invokes this function to verify success.  There is
> >>    no in-band way (through the pipe) to indicate failure, so we need to
> >>    check explicitly.
> >>
> >> Once the transfer pipe has been established via SET_DEVICE_STATE_FD
> >> (which includes establishing the direction of transfer and migration
> >> phase), the sending side writes its data into the pipe, and the reading
> >> side reads it until it sees an EOF.  Then, the front-end will check for
> >> success via CHECK_DEVICE_STATE, which on the destination side includes
> >> checking for integrity (i.e. errors during deserialization).
> >>
> >> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> >> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> >> ---
> >>   docs/interop/vhost-user.rst | 87 +++++++++++++++++++++++++++++++++++++
> >>   1 file changed, 87 insertions(+)
> >>
> >> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> >> index ac6be34c4c..c98dfeca25 100644
> >> --- a/docs/interop/vhost-user.rst
> >> +++ b/docs/interop/vhost-user.rst
> >> @@ -334,6 +334,7 @@ in the ancillary data:
> >>   * ``VHOST_USER_SET_VRING_ERR``
> >>   * ``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
> >>   * ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
> >> +* ``VHOST_USER_SET_DEVICE_STATE_FD``
> >>
> >>   If *front-end* is unable to send the full message or receives a wrong
> >>   reply it will close the connection. An optional reconnection mechanism
> >> @@ -497,6 +498,44 @@ it performs WAKE ioctl's on the userfaultfd to wake the stalled
> >>   back-end.  The front-end indicates support for this via the
> >>   ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.
> >>
> >> +.. _migrating_backend_state:
> >> +
> >> +Migrating back-end state
> >> +^^^^^^^^^^^^^^^^^^^^^^^^
> >> +
> >> +If the back-end has internal state that is to be sent from source to
> >> +destination,
> > Migration and the terms "source" and "destination" have not been
> > defined. Here is a suggestion for an introductory paragraph:
> >
> >    Migrating device state involves transferring the state from one
> >    back-end, called the source, to another back-end, called the
> >    destination. After migration, the destination transparently resumes
> >    operation without requiring the driver to re-initialize the device at
> >    the VIRTIO level. If the migration fails, then the source can
> >    transparently resume operation until another migration attempt is
> >    made.
>
> You’re right, thanks!  Maybe I’ll try to be even more verbose here, and
> include what VM and guest do.
>
> >> the front-end may be able to store and transfer it via an
> >> +internal migration stream.  Support for this is negotiated with the
> >> +``VHOST_USER_PROTOCOL_F_MIGRATORY_STATE`` feature.
> >> +
> >> +First, a channel over which the state is transferred is established on
> >> +the source side using the ``VHOST_USER_SET_DEVICE_STATE_FD`` message.
> >> +This message has two parameters:
> >> +
> >> +* Direction of transfer: On the source, the data is saved, transferring
> >> +  it from the back-end to the front-end.  On the destination, the data
> >> +  is loaded, transferring it from the front-end to the back-end.
> >> +
> >> +* Migration phase: Currently, only the period after memory transfer
> > "memory transfer" is vague. This sentence is referring to VM live
> > migration and guest RAM but it may be better to focus on just the device
> > perspective and not the VM:
>
> The device perspective does include guest RAM, though, because the
> back-end must log its memory modifications, so it is very much involved
> in that process.  I think it’s a good idea to note that the state
> transfer will occur afterwards.

Okay. Please use "memory mapped regions" the first time memory is
mentioned, the same term that the vhost-user specification uses at the
beginning of the Migration section. That way it's clear exactly what
"memory" is.

>
> >    Migration is currently only supported while the device is suspended
> >    and all of its rings are stopped. In the future, additional phases
> >    might be support to allow iterative migration while the device is
> >    running.
>
> In any case, I’ll happily add this last sentence.
>
> >> +  before switch-over is supported, in which the device is suspended and
> >> +  all of its rings are stopped.
> >> +
> >> +Then, the writing end will write all the data it has, signalling the end
> >> +of data by closing its end of the pipe.  The reading end must read all
> >> +of this data and process it:
> >> +
> >> +* If saving, the front-end will transfer this data to the destination,
> > To be extra clear:
> >
> >    ...transfer this data to the destination through some
> >    implementation-specific means.
>
> Yep!
>
> >> +  where it is loaded into the destination back-end.
> >> +
> >> +* If loading, the back-end must deserialize its internal state from the
> >> +  transferred data and be set up to resume operation.
> > "and be set up to resume operation" is a little unclear to me. I guess
> > it means "in preparation for VHOST_USER_RESUME".
>
> I don’t think the back-end on the destination will receive a RESUME.  It
> never gets a SUSPEND, after all.  So this is about resuming operation
> once the vrings are kicked, and resuming it like it was left on the
> source when the back-end was SUSPEND-ed there.

This shows that the spec does not spell out how operation is resumed
on the destination (or source, in case of failure). Can you extend
this part of the spec to explain it?

>
> >> +
> >> +After the front-end has seen all data transferred (saving: seen an EOF
> >> +on the pipe; loading: closed its end of the pipe), it sends the
> >> +``VHOST_USER_CHECK_DEVICE_STATE`` message to verify that data transfer
> >> +was successful in the back-end, too.  The back-end responds once it
> >> +knows whether the tranfer and processing was successful or not.
> >> +
> >>   Memory access
> >>   -------------
> >>
> >> @@ -891,6 +930,7 @@ Protocol features
> >>     #define VHOST_USER_PROTOCOL_F_STATUS               16
> >>     #define VHOST_USER_PROTOCOL_F_XEN_MMAP             17
> >>     #define VHOST_USER_PROTOCOL_F_SUSPEND              18
> >> +  #define VHOST_USER_PROTOCOL_F_MIGRATORY_STATE      19
> >>
> >>   Front-end message types
> >>   -----------------------
> >> @@ -1471,6 +1511,53 @@ Front-end message types
> >>     before.  The back-end must again begin processing rings that are not
> >>     stopped, and it may resume background operations.
> >>
> >> +``VHOST_USER_SET_DEVICE_STATE_FD``
> >> +  :id: 43
> >> +  :equivalent ioctl: N/A
> >> +  :request payload: device state transfer parameters
> >> +  :reply payload: ``u64``
> >> +
> >> +  The front-end negotiates a pipe over which to transfer the back-end’s
> >> +  internal state during migration.  For this purpose, this message is
> >> +  accompanied by a file descriptor that is to be the back-end’s end of
> >> +  the pipe.  If the back-end can provide a more efficient pipe (i.e.
> >> +  because it internally already has a pipe into/from which to
> >> +  put/receive state), it can ignore this and reply with a different file
> >> +  descriptor to serve as the front-end’s end.
> >> +
> >> +  The request payload contains parameters for the subsequent data
> >> +  transfer, as described in the :ref:`Migrating back-end state
> >> +  <migrating_backend_state>` section.  That section also explains the
> >> +  data transfer itself.
> >> +
> >> +  The value returned is both an indication for success, and whether a
> >> +  new pipe file descriptor is returned: Bits 0–7 are 0 on success, and
> >> +  non-zero on error.  Bit 8 is the invalid FD flag; this flag is set
> >> +  when there is no file descriptor returned.  When this flag is not set,
> >> +  the front-end must use the returned file descriptor as its end of the
> >> +  pipe.  The back-end must not both indicate an error and return a file
> >> +  descriptor.
> > Is the invalid FD flag necessary? The front-end can check whether or not
> > an fd was passed along with the result, so I'm not sure why the result
> > also needs to communicate this.
>
> If the front-end can check this, shouldn’t the back-end also generally
> be able to check whether the front-end has passed an FD in the ancillary
> data?  We do have this flag in messages sent by the front-end that can
> optionally provide an FD (SET_VRING_KICK, SET_VRING_CALL), so I thought
> it would be good for symmetry to keep this convention every time an FD
> is optional in communication between front-end and back-end, in either
> direction.

Consistency is good. I wasn't aware that the other messages do that.
In that case, no complaints from me.

Stefan


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v2 1/4] vhost-user.rst: Migrating back-end-internal state
@ 2023-07-20 11:43         ` Stefan Hajnoczi
  0 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2023-07-20 11:43 UTC (permalink / raw)
  To: Hanna Czenczek
  Cc: Stefan Hajnoczi, qemu-devel, virtio-fs, Michael S . Tsirkin,
	German Maglione, Eugenio Pérez

On Wed, 19 Jul 2023 at 12:35, Hanna Czenczek <hreitz@redhat.com> wrote:
>
> On 18.07.23 17:57, Stefan Hajnoczi wrote:
> > On Wed, Jul 12, 2023 at 01:16:59PM +0200, Hanna Czenczek wrote:
> >> For vhost-user devices, qemu can migrate the virtio state, but not the
> >> back-end's internal state.  To do so, we need to be able to transfer
> >> this internal state between front-end (qemu) and back-end.
> >>
> >> At this point, this new feature is added for the purpose of virtio-fs
> >> migration.  Because virtiofsd's internal state will not be too large, we
> >> believe it is best to transfer it as a single binary blob after the
> >> streaming phase.
> >>
> >> These are the additions to the protocol:
> >> - New vhost-user protocol feature VHOST_USER_PROTOCOL_F_MIGRATORY_STATE
> > It's not 100% clear whether "migratory" is related to live migration or
> > something else. I don't like the name :P.
> >
> > The name "VHOST_USER_PROTOCOL_F_DEVICE_STATE" would be more obviously
> > associated with SET_DEVICE_STATE_FD and CHECK_DEVICE_STATE than
> > "MIGRATORY_STATE".
>
> Sure, sure.  Naming things is hard. :)
>
> >> - SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a pipe
> >>    over which to transfer the state.
> > Does it need to be a pipe or can it be another type of file (e.g. UNIX
> > domain socket)?
>
> It’s difficult to say, honestly.  It can be anything, but I’m not sure
> how to describe that in this specification.
>
> It must be any FD into which the state sender can write the state and
> signal end of state by closing its FD; and from which the state receiver
> can read the state, terminated by seeing an EOF.  As you say, that
> doesn’t mean that the sender has to write the state into the FD, nor
> that the receiver has to read it (into memory), it’s just that either
> side must ensure the other can do it.
>
> > In the future the fd may become bi-directional. Pipes are
> > uni-directional on Linux.
> >
> > I suggest calling it a "file descriptor" and not mentioning "pipes"
> > explicitly.
>
> Works here in the commit message, but in the document, we need to be
> explicit about the requirements for this FD, i.e. the way in which
> front-end and back-end can expect the FD to be usable.  Calling it a
> “pipe” was a simple way, but you’re right, it’s more general than that.
>
> >> - CHECK_DEVICE_STATE: After the state has been transferred through the
> >>    pipe, the front-end invokes this function to verify success.  There is
> >>    no in-band way (through the pipe) to indicate failure, so we need to
> >>    check explicitly.
> >>
> >> Once the transfer pipe has been established via SET_DEVICE_STATE_FD
> >> (which includes establishing the direction of transfer and migration
> >> phase), the sending side writes its data into the pipe, and the reading
> >> side reads it until it sees an EOF.  Then, the front-end will check for
> >> success via CHECK_DEVICE_STATE, which on the destination side includes
> >> checking for integrity (i.e. errors during deserialization).
> >>
> >> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> >> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> >> ---
> >>   docs/interop/vhost-user.rst | 87 +++++++++++++++++++++++++++++++++++++
> >>   1 file changed, 87 insertions(+)
> >>
> >> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> >> index ac6be34c4c..c98dfeca25 100644
> >> --- a/docs/interop/vhost-user.rst
> >> +++ b/docs/interop/vhost-user.rst
> >> @@ -334,6 +334,7 @@ in the ancillary data:
> >>   * ``VHOST_USER_SET_VRING_ERR``
> >>   * ``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
> >>   * ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
> >> +* ``VHOST_USER_SET_DEVICE_STATE_FD``
> >>
> >>   If *front-end* is unable to send the full message or receives a wrong
> >>   reply it will close the connection. An optional reconnection mechanism
> >> @@ -497,6 +498,44 @@ it performs WAKE ioctl's on the userfaultfd to wake the stalled
> >>   back-end.  The front-end indicates support for this via the
> >>   ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.
> >>
> >> +.. _migrating_backend_state:
> >> +
> >> +Migrating back-end state
> >> +^^^^^^^^^^^^^^^^^^^^^^^^
> >> +
> >> +If the back-end has internal state that is to be sent from source to
> >> +destination,
> > Migration and the terms "source" and "destination" have not been
> > defined. Here is a suggestion for an introductory paragraph:
> >
> >    Migrating device state involves transferring the state from one
> >    back-end, called the source, to another back-end, called the
> >    destination. After migration, the destination transparently resumes
> >    operation without requiring the driver to re-initialize the device at
> >    the VIRTIO level. If the migration fails, then the source can
> >    transparently resume operation until another migration attempt is
> >    made.
>
> You’re right, thanks!  Maybe I’ll try to be even more verbose here, and
> include what VM and guest do.
>
> >> the front-end may be able to store and transfer it via an
> >> +internal migration stream.  Support for this is negotiated with the
> >> +``VHOST_USER_PROTOCOL_F_MIGRATORY_STATE`` feature.
> >> +
> >> +First, a channel over which the state is transferred is established on
> >> +the source side using the ``VHOST_USER_SET_DEVICE_STATE_FD`` message.
> >> +This message has two parameters:
> >> +
> >> +* Direction of transfer: On the source, the data is saved, transferring
> >> +  it from the back-end to the front-end.  On the destination, the data
> >> +  is loaded, transferring it from the front-end to the back-end.
> >> +
> >> +* Migration phase: Currently, only the period after memory transfer
> > "memory transfer" is vague. This sentence is referring to VM live
> > migration and guest RAM but it may be better to focus on just the device
> > perspective and not the VM:
>
> The device perspective does include guest RAM, though, because the
> back-end must log its memory modifications, so it is very much involved
> in that process.  I think it’s a good idea to note that the state
> transfer will occur afterwards.

Okay. Please use "memory mapped regions" the first time memory is
mentioned, the same term that the vhost-user specification uses at the
beginning of the Migration section. That way it's clear exactly what
"memory" is.

>
> >    Migration is currently only supported while the device is suspended
> >    and all of its rings are stopped. In the future, additional phases
> >    might be support to allow iterative migration while the device is
> >    running.
>
> In any case, I’ll happily add this last sentence.
>
> >> +  before switch-over is supported, in which the device is suspended and
> >> +  all of its rings are stopped.
> >> +
> >> +Then, the writing end will write all the data it has, signalling the end
> >> +of data by closing its end of the pipe.  The reading end must read all
> >> +of this data and process it:
> >> +
> >> +* If saving, the front-end will transfer this data to the destination,
> > To be extra clear:
> >
> >    ...transfer this data to the destination through some
> >    implementation-specific means.
>
> Yep!
>
> >> +  where it is loaded into the destination back-end.
> >> +
> >> +* If loading, the back-end must deserialize its internal state from the
> >> +  transferred data and be set up to resume operation.
> > "and be set up to resume operation" is a little unclear to me. I guess
> > it means "in preparation for VHOST_USER_RESUME".
>
> I don’t think the back-end on the destination will receive a RESUME.  It
> never gets a SUSPEND, after all.  So this is about resuming operation
> once the vrings are kicked, and resuming it like it was left on the
> source when the back-end was SUSPEND-ed there.

This shows that the spec does not spell out how operation is resumed
on the destination (or source, in case of failure). Can you extend
this part of the spec to explain it?

>
> >> +
> >> +After the front-end has seen all data transferred (saving: seen an EOF
> >> +on the pipe; loading: closed its end of the pipe), it sends the
> >> +``VHOST_USER_CHECK_DEVICE_STATE`` message to verify that data transfer
> >> +was successful in the back-end, too.  The back-end responds once it
> >> +knows whether the tranfer and processing was successful or not.
> >> +
> >>   Memory access
> >>   -------------
> >>
> >> @@ -891,6 +930,7 @@ Protocol features
> >>     #define VHOST_USER_PROTOCOL_F_STATUS               16
> >>     #define VHOST_USER_PROTOCOL_F_XEN_MMAP             17
> >>     #define VHOST_USER_PROTOCOL_F_SUSPEND              18
> >> +  #define VHOST_USER_PROTOCOL_F_MIGRATORY_STATE      19
> >>
> >>   Front-end message types
> >>   -----------------------
> >> @@ -1471,6 +1511,53 @@ Front-end message types
> >>     before.  The back-end must again begin processing rings that are not
> >>     stopped, and it may resume background operations.
> >>
> >> +``VHOST_USER_SET_DEVICE_STATE_FD``
> >> +  :id: 43
> >> +  :equivalent ioctl: N/A
> >> +  :request payload: device state transfer parameters
> >> +  :reply payload: ``u64``
> >> +
> >> +  The front-end negotiates a pipe over which to transfer the back-end’s
> >> +  internal state during migration.  For this purpose, this message is
> >> +  accompanied by a file descriptor that is to be the back-end’s end of
> >> +  the pipe.  If the back-end can provide a more efficient pipe (i.e.
> >> +  because it internally already has a pipe into/from which to
> >> +  put/receive state), it can ignore this and reply with a different file
> >> +  descriptor to serve as the front-end’s end.
> >> +
> >> +  The request payload contains parameters for the subsequent data
> >> +  transfer, as described in the :ref:`Migrating back-end state
> >> +  <migrating_backend_state>` section.  That section also explains the
> >> +  data transfer itself.
> >> +
> >> +  The value returned is both an indication for success, and whether a
> >> +  new pipe file descriptor is returned: Bits 0–7 are 0 on success, and
> >> +  non-zero on error.  Bit 8 is the invalid FD flag; this flag is set
> >> +  when there is no file descriptor returned.  When this flag is not set,
> >> +  the front-end must use the returned file descriptor as its end of the
> >> +  pipe.  The back-end must not both indicate an error and return a file
> >> +  descriptor.
> > Is the invalid FD flag necessary? The front-end can check whether or not
> > an fd was passed along with the result, so I'm not sure why the result
> > also needs to communicate this.
>
> If the front-end can check this, shouldn’t the back-end also generally
> be able to check whether the front-end has passed an FD in the ancillary
> data?  We do have this flag in messages sent by the front-end that can
> optionally provide an FD (SET_VRING_KICK, SET_VRING_CALL), so I thought
> it would be good for symmetry to keep this convention every time an FD
> is optional in communication between front-end and back-end, in either
> direction.

Consistency is good. I wasn't aware that the other messages do that.
In that case, no complaints from me.

Stefan


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v2 2/4] vhost-user: Interface for migration state transfer
  2023-07-12 11:17   ` [Virtio-fs] " Hanna Czenczek
  (?)
  (?)
@ 2023-07-20 12:13   ` Hao Xu
  2023-07-20 13:20     ` Hanna Czenczek
  -1 siblings, 1 reply; 36+ messages in thread
From: Hao Xu @ 2023-07-20 12:13 UTC (permalink / raw)
  To: Hanna Czenczek, qemu-devel, virtio-fs
  Cc: Eugenio Pérez, Stefan Hajnoczi, Michael S . Tsirkin


On 7/12/23 19:17, Hanna Czenczek wrote:
> Add the interface for transferring the back-end's state during migration
> as defined previously in vhost-user.rst.
>
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
>   include/hw/virtio/vhost-backend.h |  24 +++++
>   include/hw/virtio/vhost.h         |  79 ++++++++++++++++
>   hw/virtio/vhost-user.c            | 147 ++++++++++++++++++++++++++++++
>   hw/virtio/vhost.c                 |  37 ++++++++
>   4 files changed, 287 insertions(+)
>
> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> index 31a251a9f5..e59d0b53f8 100644
> --- a/include/hw/virtio/vhost-backend.h
> +++ b/include/hw/virtio/vhost-backend.h
> @@ -26,6 +26,18 @@ typedef enum VhostSetConfigType {
>       VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
>   } VhostSetConfigType;
>   
> +typedef enum VhostDeviceStateDirection {
> +    /* Transfer state from back-end (device) to front-end */
> +    VHOST_TRANSFER_STATE_DIRECTION_SAVE = 0,
> +    /* Transfer state from front-end to back-end (device) */
> +    VHOST_TRANSFER_STATE_DIRECTION_LOAD = 1,
> +} VhostDeviceStateDirection;
> +
> +typedef enum VhostDeviceStatePhase {
> +    /* The device (and all its vrings) is stopped */
> +    VHOST_TRANSFER_STATE_PHASE_STOPPED = 0,
> +} VhostDeviceStatePhase;
> +
>   struct vhost_inflight;
>   struct vhost_dev;
>   struct vhost_log;
> @@ -133,6 +145,15 @@ typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev,
>   
>   typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
>   
> +typedef bool (*vhost_supports_migratory_state_op)(struct vhost_dev *dev);
> +typedef int (*vhost_set_device_state_fd_op)(struct vhost_dev *dev,
> +                                            VhostDeviceStateDirection direction,
> +                                            VhostDeviceStatePhase phase,
> +                                            int fd,
> +                                            int *reply_fd,
> +                                            Error **errp);
> +typedef int (*vhost_check_device_state_op)(struct vhost_dev *dev, Error **errp);
> +
>   typedef struct VhostOps {
>       VhostBackendType backend_type;
>       vhost_backend_init vhost_backend_init;
> @@ -181,6 +202,9 @@ typedef struct VhostOps {
>       vhost_force_iommu_op vhost_force_iommu;
>       vhost_set_config_call_op vhost_set_config_call;
>       vhost_reset_status_op vhost_reset_status;
> +    vhost_supports_migratory_state_op vhost_supports_migratory_state;
> +    vhost_set_device_state_fd_op vhost_set_device_state_fd;
> +    vhost_check_device_state_op vhost_check_device_state;
>   } VhostOps;
>   
>   int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 69bf59d630..d8877496e5 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -346,4 +346,83 @@ int vhost_dev_set_inflight(struct vhost_dev *dev,
>   int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
>                              struct vhost_inflight *inflight);
>   bool vhost_dev_has_iommu(struct vhost_dev *dev);
> +
> +/**
> + * vhost_supports_migratory_state(): Checks whether the back-end
> + * supports transferring internal state for the purpose of migration.
> + * Support for this feature is required for vhost_set_device_state_fd()
> + * and vhost_check_device_state().
> + *
> + * @dev: The vhost device
> + *
> + * Returns true if the device supports these commands, and false if it
> + * does not.
> + */
> +bool vhost_supports_migratory_state(struct vhost_dev *dev);
> +
> +/**
> + * vhost_set_device_state_fd(): Begin transfer of internal state from/to
> + * the back-end for the purpose of migration.  Data is to be transferred
> + * over a pipe according to @direction and @phase.  The sending end must
> + * only write to the pipe, and the receiving end must only read from it.
> + * Once the sending end is done, it closes its FD.  The receiving end
> + * must take this as the end-of-transfer signal and close its FD, too.
> + *
> + * @fd is the back-end's end of the pipe: The write FD for SAVE, and the
> + * read FD for LOAD.  This function transfers ownership of @fd to the
> + * back-end, i.e. closes it in the front-end.
> + *
> + * The back-end may optionally reply with an FD of its own, if this
> + * improves efficiency on its end.  In this case, the returned FD is


Hi Hanna,

In what case/situation, the back-end will have a more efficient fd?

Here my understanding of this "FD of its own" is as same type as

the given fd(e.g. both pipe files), why the fd from back-end makes

difference? Do I miss anything here?


Regards,

Hao


> + * stored in *reply_fd.  The back-end will discard the FD sent to it,
> + * and the front-end must use *reply_fd for transferring state to/from
> + * the back-end.
> + *
> + * @dev: The vhost device
> + * @direction: The direction in which the state is to be transferred.
> + *             For outgoing migrations, this is SAVE, and data is read
> + *             from the back-end and stored by the front-end in the
> + *             migration stream.
> + *             For incoming migrations, this is LOAD, and data is read
> + *             by the front-end from the migration stream and sent to
> + *             the back-end to restore the saved state.
> + * @phase: Which migration phase we are in.  Currently, there is only
> + *         STOPPED (device and all vrings are stopped), in the future,
> + *         more phases such as PRE_COPY or POST_COPY may be added.
> + * @fd: Back-end's end of the pipe through which to transfer state; note
> + *      that ownership is transferred to the back-end, so this function
> + *      closes @fd in the front-end.
> + * @reply_fd: If the back-end wishes to use a different pipe for state
> + *            transfer, this will contain an FD for the front-end to
> + *            use.  Otherwise, -1 is stored here.
> + * @errp: Potential error description
> + *
> + * Returns 0 on success, and -errno on failure.
> + */
> +int vhost_set_device_state_fd(struct vhost_dev *dev,
> +                              VhostDeviceStateDirection direction,
> +                              VhostDeviceStatePhase phase,
> +                              int fd,
> +                              int *reply_fd,
> +                              Error **errp);
> +
> +/**
> + * vhost_set_device_state_fd(): After transferring state from/to the
> + * back-end via vhost_set_device_state_fd(), i.e. once the sending end
> + * has closed the pipe, inquire the back-end to report any potential
> + * errors that have occurred on its side.  This allows to sense errors
> + * like:
> + * - During outgoing migration, when the source side had already started
> + *   to produce its state, something went wrong and it failed to finish
> + * - During incoming migration, when the received state is somehow
> + *   invalid and cannot be processed by the back-end
> + *
> + * @dev: The vhost device
> + * @errp: Potential error description
> + *
> + * Returns 0 when the back-end reports successful state transfer and
> + * processing, and -errno when an error occurred somewhere.
> + */
> +int vhost_check_device_state(struct vhost_dev *dev, Error **errp);
> +
>   #endif
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 53a881ec2a..8e6b5485e8 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -76,6 +76,7 @@ enum VhostUserProtocolFeature {
>       VHOST_USER_PROTOCOL_F_STATUS = 16,
>       /* Feature 17 reserved for VHOST_USER_PROTOCOL_F_XEN_MMAP. */
>       VHOST_USER_PROTOCOL_F_SUSPEND = 18,
> +    VHOST_USER_PROTOCOL_F_MIGRATORY_STATE = 19,
>       VHOST_USER_PROTOCOL_F_MAX
>   };
>   
> @@ -125,6 +126,8 @@ typedef enum VhostUserRequest {
>       VHOST_USER_GET_STATUS = 40,
>       VHOST_USER_SUSPEND = 41,
>       VHOST_USER_RESUME = 42,
> +    VHOST_USER_SET_DEVICE_STATE_FD = 43,
> +    VHOST_USER_CHECK_DEVICE_STATE = 44,
>       VHOST_USER_MAX
>   } VhostUserRequest;
>   
> @@ -216,6 +219,12 @@ typedef struct {
>       uint32_t size; /* the following payload size */
>   } QEMU_PACKED VhostUserHeader;
>   
> +/* Request payload of VHOST_USER_SET_DEVICE_STATE_FD */
> +typedef struct VhostUserTransferDeviceState {
> +    uint32_t direction;
> +    uint32_t phase;
> +} VhostUserTransferDeviceState;
> +
>   typedef union {
>   #define VHOST_USER_VRING_IDX_MASK   (0xff)
>   #define VHOST_USER_VRING_NOFD_MASK  (0x1 << 8)
> @@ -230,6 +239,7 @@ typedef union {
>           VhostUserCryptoSession session;
>           VhostUserVringArea area;
>           VhostUserInflight inflight;
> +        VhostUserTransferDeviceState transfer_state;
>   } VhostUserPayload;
>   
>   typedef struct VhostUserMsg {
> @@ -2838,6 +2848,140 @@ static void vhost_user_reset_status(struct vhost_dev *dev)
>       }
>   }
>   
> +static bool vhost_user_supports_migratory_state(struct vhost_dev *dev)
> +{
> +    return virtio_has_feature(dev->protocol_features,
> +                              VHOST_USER_PROTOCOL_F_MIGRATORY_STATE);
> +}
> +
> +static int vhost_user_set_device_state_fd(struct vhost_dev *dev,
> +                                          VhostDeviceStateDirection direction,
> +                                          VhostDeviceStatePhase phase,
> +                                          int fd,
> +                                          int *reply_fd,
> +                                          Error **errp)
> +{
> +    int ret;
> +    struct vhost_user *vu = dev->opaque;
> +    VhostUserMsg msg = {
> +        .hdr = {
> +            .request = VHOST_USER_SET_DEVICE_STATE_FD,
> +            .flags = VHOST_USER_VERSION,
> +            .size = sizeof(msg.payload.transfer_state),
> +        },
> +        .payload.transfer_state = {
> +            .direction = direction,
> +            .phase = phase,
> +        },
> +    };
> +
> +    *reply_fd = -1;
> +
> +    if (!vhost_user_supports_migratory_state(dev)) {
> +        close(fd);
> +        error_setg(errp, "Back-end does not support migration state transfer");
> +        return -ENOTSUP;
> +    }
> +
> +    ret = vhost_user_write(dev, &msg, &fd, 1);
> +    close(fd);
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret,
> +                         "Failed to send SET_DEVICE_STATE_FD message");
> +        return ret;
> +    }
> +
> +    ret = vhost_user_read(dev, &msg);
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret,
> +                         "Failed to receive SET_DEVICE_STATE_FD reply");
> +        return ret;
> +    }
> +
> +    if (msg.hdr.request != VHOST_USER_SET_DEVICE_STATE_FD) {
> +        error_setg(errp,
> +                   "Received unexpected message type, expected %d, received %d",
> +                   VHOST_USER_SET_DEVICE_STATE_FD, msg.hdr.request);
> +        return -EPROTO;
> +    }
> +
> +    if (msg.hdr.size != sizeof(msg.payload.u64)) {
> +        error_setg(errp,
> +                   "Received bad message size, expected %zu, received %" PRIu32,
> +                   sizeof(msg.payload.u64), msg.hdr.size);
> +        return -EPROTO;
> +    }
> +
> +    if ((msg.payload.u64 & 0xff) != 0) {
> +        error_setg(errp, "Back-end did not accept migration state transfer");
> +        return -EIO;
> +    }
> +
> +    if (!(msg.payload.u64 & VHOST_USER_VRING_NOFD_MASK)) {
> +        *reply_fd = qemu_chr_fe_get_msgfd(vu->user->chr);
> +        if (*reply_fd < 0) {
> +            error_setg(errp,
> +                       "Failed to get back-end-provided transfer pipe FD");
> +            *reply_fd = -1;
> +            return -EIO;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +static int vhost_user_check_device_state(struct vhost_dev *dev, Error **errp)
> +{
> +    int ret;
> +    VhostUserMsg msg = {
> +        .hdr = {
> +            .request = VHOST_USER_CHECK_DEVICE_STATE,
> +            .flags = VHOST_USER_VERSION,
> +            .size = 0,
> +        },
> +    };
> +
> +    if (!vhost_user_supports_migratory_state(dev)) {
> +        error_setg(errp, "Back-end does not support migration state transfer");
> +        return -ENOTSUP;
> +    }
> +
> +    ret = vhost_user_write(dev, &msg, NULL, 0);
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret,
> +                         "Failed to send CHECK_DEVICE_STATE message");
> +        return ret;
> +    }
> +
> +    ret = vhost_user_read(dev, &msg);
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret,
> +                         "Failed to receive CHECK_DEVICE_STATE reply");
> +        return ret;
> +    }
> +
> +    if (msg.hdr.request != VHOST_USER_CHECK_DEVICE_STATE) {
> +        error_setg(errp,
> +                   "Received unexpected message type, expected %d, received %d",
> +                   VHOST_USER_CHECK_DEVICE_STATE, msg.hdr.request);
> +        return -EPROTO;
> +    }
> +
> +    if (msg.hdr.size != sizeof(msg.payload.u64)) {
> +        error_setg(errp,
> +                   "Received bad message size, expected %zu, received %" PRIu32,
> +                   sizeof(msg.payload.u64), msg.hdr.size);
> +        return -EPROTO;
> +    }
> +
> +    if (msg.payload.u64 != 0) {
> +        error_setg(errp, "Back-end failed to process its internal state");
> +        return -EIO;
> +    }
> +
> +    return 0;
> +}
> +
>   const VhostOps user_ops = {
>           .backend_type = VHOST_BACKEND_TYPE_USER,
>           .vhost_backend_init = vhost_user_backend_init,
> @@ -2874,4 +3018,7 @@ const VhostOps user_ops = {
>           .vhost_set_inflight_fd = vhost_user_set_inflight_fd,
>           .vhost_dev_start = vhost_user_dev_start,
>           .vhost_reset_status = vhost_user_reset_status,
> +        .vhost_supports_migratory_state = vhost_user_supports_migratory_state,
> +        .vhost_set_device_state_fd = vhost_user_set_device_state_fd,
> +        .vhost_check_device_state = vhost_user_check_device_state,
>   };
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 2e28e58da7..756b6d55a8 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -2091,3 +2091,40 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
>   
>       return -ENOSYS;
>   }
> +
> +bool vhost_supports_migratory_state(struct vhost_dev *dev)
> +{
> +    if (dev->vhost_ops->vhost_supports_migratory_state) {
> +        return dev->vhost_ops->vhost_supports_migratory_state(dev);
> +    }
> +
> +    return false;
> +}
> +
> +int vhost_set_device_state_fd(struct vhost_dev *dev,
> +                              VhostDeviceStateDirection direction,
> +                              VhostDeviceStatePhase phase,
> +                              int fd,
> +                              int *reply_fd,
> +                              Error **errp)
> +{
> +    if (dev->vhost_ops->vhost_set_device_state_fd) {
> +        return dev->vhost_ops->vhost_set_device_state_fd(dev, direction, phase,
> +                                                         fd, reply_fd, errp);
> +    }
> +
> +    error_setg(errp,
> +               "vhost transport does not support migration state transfer");
> +    return -ENOSYS;
> +}
> +
> +int vhost_check_device_state(struct vhost_dev *dev, Error **errp)
> +{
> +    if (dev->vhost_ops->vhost_check_device_state) {
> +        return dev->vhost_ops->vhost_check_device_state(dev, errp);
> +    }
> +
> +    error_setg(errp,
> +               "vhost transport does not support migration state transfer");
> +    return -ENOSYS;
> +}


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v2 2/4] vhost-user: Interface for migration state transfer
  2023-07-20 12:13   ` Hao Xu
@ 2023-07-20 13:20     ` Hanna Czenczek
  2023-07-20 15:05       ` Hao Xu
  0 siblings, 1 reply; 36+ messages in thread
From: Hanna Czenczek @ 2023-07-20 13:20 UTC (permalink / raw)
  To: Hao Xu, qemu-devel, virtio-fs
  Cc: Eugenio Pérez, Stefan Hajnoczi, Michael S . Tsirkin

On 20.07.23 14:13, Hao Xu wrote:
>
> On 7/12/23 19:17, Hanna Czenczek wrote:
>> Add the interface for transferring the back-end's state during migration
>> as defined previously in vhost-user.rst.
>>
>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
>> ---
>>   include/hw/virtio/vhost-backend.h |  24 +++++
>>   include/hw/virtio/vhost.h         |  79 ++++++++++++++++
>>   hw/virtio/vhost-user.c            | 147 ++++++++++++++++++++++++++++++
>>   hw/virtio/vhost.c                 |  37 ++++++++
>>   4 files changed, 287 insertions(+)
>>
>> diff --git a/include/hw/virtio/vhost-backend.h 
>> b/include/hw/virtio/vhost-backend.h
>> index 31a251a9f5..e59d0b53f8 100644
>> --- a/include/hw/virtio/vhost-backend.h
>> +++ b/include/hw/virtio/vhost-backend.h
>> @@ -26,6 +26,18 @@ typedef enum VhostSetConfigType {
>>       VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
>>   } VhostSetConfigType;
>>   +typedef enum VhostDeviceStateDirection {
>> +    /* Transfer state from back-end (device) to front-end */
>> +    VHOST_TRANSFER_STATE_DIRECTION_SAVE = 0,
>> +    /* Transfer state from front-end to back-end (device) */
>> +    VHOST_TRANSFER_STATE_DIRECTION_LOAD = 1,
>> +} VhostDeviceStateDirection;
>> +
>> +typedef enum VhostDeviceStatePhase {
>> +    /* The device (and all its vrings) is stopped */
>> +    VHOST_TRANSFER_STATE_PHASE_STOPPED = 0,
>> +} VhostDeviceStatePhase;
>> +
>>   struct vhost_inflight;
>>   struct vhost_dev;
>>   struct vhost_log;
>> @@ -133,6 +145,15 @@ typedef int (*vhost_set_config_call_op)(struct 
>> vhost_dev *dev,
>>     typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
>>   +typedef bool (*vhost_supports_migratory_state_op)(struct vhost_dev 
>> *dev);
>> +typedef int (*vhost_set_device_state_fd_op)(struct vhost_dev *dev,
>> + VhostDeviceStateDirection direction,
>> + VhostDeviceStatePhase phase,
>> +                                            int fd,
>> +                                            int *reply_fd,
>> +                                            Error **errp);
>> +typedef int (*vhost_check_device_state_op)(struct vhost_dev *dev, 
>> Error **errp);
>> +
>>   typedef struct VhostOps {
>>       VhostBackendType backend_type;
>>       vhost_backend_init vhost_backend_init;
>> @@ -181,6 +202,9 @@ typedef struct VhostOps {
>>       vhost_force_iommu_op vhost_force_iommu;
>>       vhost_set_config_call_op vhost_set_config_call;
>>       vhost_reset_status_op vhost_reset_status;
>> +    vhost_supports_migratory_state_op vhost_supports_migratory_state;
>> +    vhost_set_device_state_fd_op vhost_set_device_state_fd;
>> +    vhost_check_device_state_op vhost_check_device_state;
>>   } VhostOps;
>>     int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
>> index 69bf59d630..d8877496e5 100644
>> --- a/include/hw/virtio/vhost.h
>> +++ b/include/hw/virtio/vhost.h
>> @@ -346,4 +346,83 @@ int vhost_dev_set_inflight(struct vhost_dev *dev,
>>   int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
>>                              struct vhost_inflight *inflight);
>>   bool vhost_dev_has_iommu(struct vhost_dev *dev);
>> +
>> +/**
>> + * vhost_supports_migratory_state(): Checks whether the back-end
>> + * supports transferring internal state for the purpose of migration.
>> + * Support for this feature is required for vhost_set_device_state_fd()
>> + * and vhost_check_device_state().
>> + *
>> + * @dev: The vhost device
>> + *
>> + * Returns true if the device supports these commands, and false if it
>> + * does not.
>> + */
>> +bool vhost_supports_migratory_state(struct vhost_dev *dev);
>> +
>> +/**
>> + * vhost_set_device_state_fd(): Begin transfer of internal state 
>> from/to
>> + * the back-end for the purpose of migration.  Data is to be 
>> transferred
>> + * over a pipe according to @direction and @phase.  The sending end 
>> must
>> + * only write to the pipe, and the receiving end must only read from 
>> it.
>> + * Once the sending end is done, it closes its FD.  The receiving end
>> + * must take this as the end-of-transfer signal and close its FD, too.
>> + *
>> + * @fd is the back-end's end of the pipe: The write FD for SAVE, and 
>> the
>> + * read FD for LOAD.  This function transfers ownership of @fd to the
>> + * back-end, i.e. closes it in the front-end.
>> + *
>> + * The back-end may optionally reply with an FD of its own, if this
>> + * improves efficiency on its end.  In this case, the returned FD is
>
>
> Hi Hanna,
>
> In what case/situation, the back-end will have a more efficient fd?

Hi Hao,

There is no example yet.

> Here my understanding of this "FD of its own" is as same type as
>
> the given fd(e.g. both pipe files), why the fd from back-end makes
>
> difference? Do I miss anything here?

Maybe it makes more sense in the context of how we came up with the 
idea: Specifically, Stefan and me were asking which end should provide 
the FD.  In the context of vhost-user, it makes sense to have it be the 
front-end, because it controls vhost-user communication, but that’s just 
the natural protocol choice, not necessarily the most efficient one.

It is imaginable that the front-end (e.g. qemu) could create a file 
descriptor whose data is directly spliced (automatically) into the 
migration stream, and then hand this FD to the back-end.  In practice, 
this doesn’t work for qemu (at this point), because it doesn’t use 
simple read/write into the migration stream, but has an abstraction 
layer for that.  It might be possible to make this work in some cases, 
depending on what is used as a transport, but (1) not generally, and (2) 
not now.  But this would be efficient.

The model we’d implement in qemu with this series is comparatively not 
efficient, because it manually copies data from the FD (which by default 
is a pipe) into the migration stream.

But it is possible that the front-end can provide a zero-copy FD into 
the migration stream, and for that reason we want to allow the front-end 
to provide the transfer FD.

In contrast, the back-end might have a more efficient implementation on 
its side, too, though.  It is difficult to imagine, but it may be 
possible that it has an FD already where the data needs to written 
to/read from, e.g. because it’s connected to a physical device that 
wants to get/send its state this way.  Admittedly, we have absolutely no 
concrete example for such a back-end at this point, but it’s hard to 
rule out that it is possible that there will be back-ends that could 
make use of zero-copy if only they are allowed to dictate the transfer FD.

So because we in qemu can’t (at least not generally) provide an 
efficient (zero-copy) implementation, we don’t want to rule out that the 
back-end might be able to, so we also want to allow it to provide the 
transfer FD.

In the end, we decided that we don’t want to preclude either side of 
providing the FD.  If one side knows it can do better than a plain pipe 
with copying on both ends, it should provide the FD.  That doesn’t 
complicate the implementation much.

(So, notably, we measure “improves efficiency” based on “is it better 
than a plain pipe with copying on both ends”.  A pipe with copying is 
the default implementation, but as Stefan has pointed out in his review, 
it doesn’t need to be a pipe.  More efficient FDs, like the back-end can 
provide in its reply, would actually likely not be pipes.)

Hanna


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v2 2/4] vhost-user: Interface for migration state transfer
  2023-07-20 13:20     ` Hanna Czenczek
@ 2023-07-20 15:05       ` Hao Xu
  2023-07-21  8:07         ` Hanna Czenczek
  0 siblings, 1 reply; 36+ messages in thread
From: Hao Xu @ 2023-07-20 15:05 UTC (permalink / raw)
  To: Hanna Czenczek, qemu-devel, virtio-fs
  Cc: Eugenio Pérez, Stefan Hajnoczi, Michael S . Tsirkin


On 7/20/23 21:20, Hanna Czenczek wrote:
> On 20.07.23 14:13, Hao Xu wrote:
>>
>> On 7/12/23 19:17, Hanna Czenczek wrote:
>>> Add the interface for transferring the back-end's state during 
>>> migration
>>> as defined previously in vhost-user.rst.
>>>
>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
>>> ---
>>>   include/hw/virtio/vhost-backend.h |  24 +++++
>>>   include/hw/virtio/vhost.h         |  79 ++++++++++++++++
>>>   hw/virtio/vhost-user.c            | 147 
>>> ++++++++++++++++++++++++++++++
>>>   hw/virtio/vhost.c                 |  37 ++++++++
>>>   4 files changed, 287 insertions(+)
>>>
>>> diff --git a/include/hw/virtio/vhost-backend.h 
>>> b/include/hw/virtio/vhost-backend.h
>>> index 31a251a9f5..e59d0b53f8 100644
>>> --- a/include/hw/virtio/vhost-backend.h
>>> +++ b/include/hw/virtio/vhost-backend.h
>>> @@ -26,6 +26,18 @@ typedef enum VhostSetConfigType {
>>>       VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
>>>   } VhostSetConfigType;
>>>   +typedef enum VhostDeviceStateDirection {
>>> +    /* Transfer state from back-end (device) to front-end */
>>> +    VHOST_TRANSFER_STATE_DIRECTION_SAVE = 0,
>>> +    /* Transfer state from front-end to back-end (device) */
>>> +    VHOST_TRANSFER_STATE_DIRECTION_LOAD = 1,
>>> +} VhostDeviceStateDirection;
>>> +
>>> +typedef enum VhostDeviceStatePhase {
>>> +    /* The device (and all its vrings) is stopped */
>>> +    VHOST_TRANSFER_STATE_PHASE_STOPPED = 0,
>>> +} VhostDeviceStatePhase;
>>> +
>>>   struct vhost_inflight;
>>>   struct vhost_dev;
>>>   struct vhost_log;
>>> @@ -133,6 +145,15 @@ typedef int (*vhost_set_config_call_op)(struct 
>>> vhost_dev *dev,
>>>     typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
>>>   +typedef bool (*vhost_supports_migratory_state_op)(struct 
>>> vhost_dev *dev);
>>> +typedef int (*vhost_set_device_state_fd_op)(struct vhost_dev *dev,
>>> + VhostDeviceStateDirection direction,
>>> + VhostDeviceStatePhase phase,
>>> +                                            int fd,
>>> +                                            int *reply_fd,
>>> +                                            Error **errp);
>>> +typedef int (*vhost_check_device_state_op)(struct vhost_dev *dev, 
>>> Error **errp);
>>> +
>>>   typedef struct VhostOps {
>>>       VhostBackendType backend_type;
>>>       vhost_backend_init vhost_backend_init;
>>> @@ -181,6 +202,9 @@ typedef struct VhostOps {
>>>       vhost_force_iommu_op vhost_force_iommu;
>>>       vhost_set_config_call_op vhost_set_config_call;
>>>       vhost_reset_status_op vhost_reset_status;
>>> +    vhost_supports_migratory_state_op vhost_supports_migratory_state;
>>> +    vhost_set_device_state_fd_op vhost_set_device_state_fd;
>>> +    vhost_check_device_state_op vhost_check_device_state;
>>>   } VhostOps;
>>>     int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
>>> index 69bf59d630..d8877496e5 100644
>>> --- a/include/hw/virtio/vhost.h
>>> +++ b/include/hw/virtio/vhost.h
>>> @@ -346,4 +346,83 @@ int vhost_dev_set_inflight(struct vhost_dev *dev,
>>>   int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t 
>>> queue_size,
>>>                              struct vhost_inflight *inflight);
>>>   bool vhost_dev_has_iommu(struct vhost_dev *dev);
>>> +
>>> +/**
>>> + * vhost_supports_migratory_state(): Checks whether the back-end
>>> + * supports transferring internal state for the purpose of migration.
>>> + * Support for this feature is required for 
>>> vhost_set_device_state_fd()
>>> + * and vhost_check_device_state().
>>> + *
>>> + * @dev: The vhost device
>>> + *
>>> + * Returns true if the device supports these commands, and false if it
>>> + * does not.
>>> + */
>>> +bool vhost_supports_migratory_state(struct vhost_dev *dev);
>>> +
>>> +/**
>>> + * vhost_set_device_state_fd(): Begin transfer of internal state 
>>> from/to
>>> + * the back-end for the purpose of migration.  Data is to be 
>>> transferred
>>> + * over a pipe according to @direction and @phase.  The sending end 
>>> must
>>> + * only write to the pipe, and the receiving end must only read 
>>> from it.
>>> + * Once the sending end is done, it closes its FD.  The receiving end
>>> + * must take this as the end-of-transfer signal and close its FD, too.
>>> + *
>>> + * @fd is the back-end's end of the pipe: The write FD for SAVE, 
>>> and the
>>> + * read FD for LOAD.  This function transfers ownership of @fd to the
>>> + * back-end, i.e. closes it in the front-end.
>>> + *
>>> + * The back-end may optionally reply with an FD of its own, if this
>>> + * improves efficiency on its end.  In this case, the returned FD is
>>
>>
>> Hi Hanna,
>>
>> In what case/situation, the back-end will have a more efficient fd?
>
> Hi Hao,
>
> There is no example yet.
>
>> Here my understanding of this "FD of its own" is as same type as
>>
>> the given fd(e.g. both pipe files), why the fd from back-end makes
>>
>> difference? Do I miss anything here?
>
> Maybe it makes more sense in the context of how we came up with the 
> idea: Specifically, Stefan and me were asking which end should provide 
> the FD.  In the context of vhost-user, it makes sense to have it be 
> the front-end, because it controls vhost-user communication, but 
> that’s just the natural protocol choice, not necessarily the most 
> efficient one.
>
> It is imaginable that the front-end (e.g. qemu) could create a file 
> descriptor whose data is directly spliced (automatically) into the 
> migration stream, and then hand this FD to the back-end. In practice, 
> this doesn’t work for qemu (at this point), because it doesn’t use 
> simple read/write into the migration stream, but has an abstraction 
> layer for that.  It might be possible to make this work in some cases, 
> depending on what is used as a transport, but (1) not generally, and 
> (2) not now.  But this would be efficient.


I'm thinking one thing, we now already have a channel(a unix domain 
socket) between front-end and back-end, why not delivering the state 
file (as an fd) to/from back-end directly rathen than negotiating

a new pipe as data channel.(since we already assume they can share fd of 
pipe, why not fd of normal file?)



>
> The model we’d implement in qemu with this series is comparatively not 
> efficient, because it manually copies data from the FD (which by 
> default is a pipe) into the migration stream.
>
> But it is possible that the front-end can provide a zero-copy FD into 
> the migration stream, and for that reason we want to allow the 
> front-end to provide the transfer FD.
>
> In contrast, the back-end might have a more efficient implementation 
> on its side, too, though.  It is difficult to imagine, but it may be 
> possible that it has an FD already where the data needs to written 
> to/read from, e.g. because it’s connected to a physical device that 
> wants to get/send its state this way.  Admittedly, we have absolutely 
> no concrete example for such a back-end at this point, but it’s hard 
> to rule out that it is possible that there will be back-ends that 
> could make use of zero-copy if only they are allowed to dictate the 
> transfer FD.
>
> So because we in qemu can’t (at least not generally) provide an 
> efficient (zero-copy) implementation, we don’t want to rule out that 
> the back-end might be able to, so we also want to allow it to provide 
> the transfer FD.
>
> In the end, we decided that we don’t want to preclude either side of 
> providing the FD.  If one side knows it can do better than a plain 
> pipe with copying on both ends, it should provide the FD. That doesn’t 
> complicate the implementation much.
>
> (So, notably, we measure “improves efficiency” based on “is it better 
> than a plain pipe with copying on both ends”.  A pipe with copying is 
> the default implementation, but as Stefan has pointed out in his 
> review, it doesn’t need to be a pipe.  More efficient FDs, like the 
> back-end can provide in its reply, would actually likely not be pipes.)


Yea, but the vhost-user protocol in this patchset defines these FDs as 
data transfer channel not the data itself, how about the latter?




>
> Hanna
>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v2 2/4] vhost-user: Interface for migration state transfer
  2023-07-20 15:05       ` Hao Xu
@ 2023-07-21  8:07         ` Hanna Czenczek
  2023-07-21  8:23           ` Hao Xu
  0 siblings, 1 reply; 36+ messages in thread
From: Hanna Czenczek @ 2023-07-21  8:07 UTC (permalink / raw)
  To: Hao Xu, qemu-devel, virtio-fs
  Cc: Eugenio Pérez, Stefan Hajnoczi, Michael S . Tsirkin

On 20.07.23 17:05, Hao Xu wrote:
>
> On 7/20/23 21:20, Hanna Czenczek wrote:
>> On 20.07.23 14:13, Hao Xu wrote:
>>>
>>> On 7/12/23 19:17, Hanna Czenczek wrote:
>>>> Add the interface for transferring the back-end's state during 
>>>> migration
>>>> as defined previously in vhost-user.rst.
>>>>
>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
>>>> ---
>>>>   include/hw/virtio/vhost-backend.h |  24 +++++
>>>>   include/hw/virtio/vhost.h         |  79 ++++++++++++++++
>>>>   hw/virtio/vhost-user.c            | 147 
>>>> ++++++++++++++++++++++++++++++
>>>>   hw/virtio/vhost.c                 |  37 ++++++++
>>>>   4 files changed, 287 insertions(+)
>>>>
>>>> diff --git a/include/hw/virtio/vhost-backend.h 
>>>> b/include/hw/virtio/vhost-backend.h
>>>> index 31a251a9f5..e59d0b53f8 100644
>>>> --- a/include/hw/virtio/vhost-backend.h
>>>> +++ b/include/hw/virtio/vhost-backend.h
>>>> @@ -26,6 +26,18 @@ typedef enum VhostSetConfigType {
>>>>       VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
>>>>   } VhostSetConfigType;
>>>>   +typedef enum VhostDeviceStateDirection {
>>>> +    /* Transfer state from back-end (device) to front-end */
>>>> +    VHOST_TRANSFER_STATE_DIRECTION_SAVE = 0,
>>>> +    /* Transfer state from front-end to back-end (device) */
>>>> +    VHOST_TRANSFER_STATE_DIRECTION_LOAD = 1,
>>>> +} VhostDeviceStateDirection;
>>>> +
>>>> +typedef enum VhostDeviceStatePhase {
>>>> +    /* The device (and all its vrings) is stopped */
>>>> +    VHOST_TRANSFER_STATE_PHASE_STOPPED = 0,
>>>> +} VhostDeviceStatePhase;
>>>> +
>>>>   struct vhost_inflight;
>>>>   struct vhost_dev;
>>>>   struct vhost_log;
>>>> @@ -133,6 +145,15 @@ typedef int (*vhost_set_config_call_op)(struct 
>>>> vhost_dev *dev,
>>>>     typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
>>>>   +typedef bool (*vhost_supports_migratory_state_op)(struct 
>>>> vhost_dev *dev);
>>>> +typedef int (*vhost_set_device_state_fd_op)(struct vhost_dev *dev,
>>>> + VhostDeviceStateDirection direction,
>>>> + VhostDeviceStatePhase phase,
>>>> +                                            int fd,
>>>> +                                            int *reply_fd,
>>>> +                                            Error **errp);
>>>> +typedef int (*vhost_check_device_state_op)(struct vhost_dev *dev, 
>>>> Error **errp);
>>>> +
>>>>   typedef struct VhostOps {
>>>>       VhostBackendType backend_type;
>>>>       vhost_backend_init vhost_backend_init;
>>>> @@ -181,6 +202,9 @@ typedef struct VhostOps {
>>>>       vhost_force_iommu_op vhost_force_iommu;
>>>>       vhost_set_config_call_op vhost_set_config_call;
>>>>       vhost_reset_status_op vhost_reset_status;
>>>> +    vhost_supports_migratory_state_op vhost_supports_migratory_state;
>>>> +    vhost_set_device_state_fd_op vhost_set_device_state_fd;
>>>> +    vhost_check_device_state_op vhost_check_device_state;
>>>>   } VhostOps;
>>>>     int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
>>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
>>>> index 69bf59d630..d8877496e5 100644
>>>> --- a/include/hw/virtio/vhost.h
>>>> +++ b/include/hw/virtio/vhost.h
>>>> @@ -346,4 +346,83 @@ int vhost_dev_set_inflight(struct vhost_dev *dev,
>>>>   int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t 
>>>> queue_size,
>>>>                              struct vhost_inflight *inflight);
>>>>   bool vhost_dev_has_iommu(struct vhost_dev *dev);
>>>> +
>>>> +/**
>>>> + * vhost_supports_migratory_state(): Checks whether the back-end
>>>> + * supports transferring internal state for the purpose of migration.
>>>> + * Support for this feature is required for 
>>>> vhost_set_device_state_fd()
>>>> + * and vhost_check_device_state().
>>>> + *
>>>> + * @dev: The vhost device
>>>> + *
>>>> + * Returns true if the device supports these commands, and false 
>>>> if it
>>>> + * does not.
>>>> + */
>>>> +bool vhost_supports_migratory_state(struct vhost_dev *dev);
>>>> +
>>>> +/**
>>>> + * vhost_set_device_state_fd(): Begin transfer of internal state 
>>>> from/to
>>>> + * the back-end for the purpose of migration.  Data is to be 
>>>> transferred
>>>> + * over a pipe according to @direction and @phase.  The sending 
>>>> end must
>>>> + * only write to the pipe, and the receiving end must only read 
>>>> from it.
>>>> + * Once the sending end is done, it closes its FD.  The receiving end
>>>> + * must take this as the end-of-transfer signal and close its FD, 
>>>> too.
>>>> + *
>>>> + * @fd is the back-end's end of the pipe: The write FD for SAVE, 
>>>> and the
>>>> + * read FD for LOAD.  This function transfers ownership of @fd to the
>>>> + * back-end, i.e. closes it in the front-end.
>>>> + *
>>>> + * The back-end may optionally reply with an FD of its own, if this
>>>> + * improves efficiency on its end.  In this case, the returned FD is
>>>
>>>
>>> Hi Hanna,
>>>
>>> In what case/situation, the back-end will have a more efficient fd?
>>
>> Hi Hao,
>>
>> There is no example yet.
>>
>>> Here my understanding of this "FD of its own" is as same type as
>>>
>>> the given fd(e.g. both pipe files), why the fd from back-end makes
>>>
>>> difference? Do I miss anything here?
>>
>> Maybe it makes more sense in the context of how we came up with the 
>> idea: Specifically, Stefan and me were asking which end should 
>> provide the FD.  In the context of vhost-user, it makes sense to have 
>> it be the front-end, because it controls vhost-user communication, 
>> but that’s just the natural protocol choice, not necessarily the most 
>> efficient one.
>>
>> It is imaginable that the front-end (e.g. qemu) could create a file 
>> descriptor whose data is directly spliced (automatically) into the 
>> migration stream, and then hand this FD to the back-end. In practice, 
>> this doesn’t work for qemu (at this point), because it doesn’t use 
>> simple read/write into the migration stream, but has an abstraction 
>> layer for that.  It might be possible to make this work in some 
>> cases, depending on what is used as a transport, but (1) not 
>> generally, and (2) not now.  But this would be efficient.
>
>
> I'm thinking one thing, we now already have a channel(a unix domain 
> socket) between front-end and back-end, why not delivering the state 
> file (as an fd) to/from back-end directly rathen than negotiating
>
> a new pipe as data channel.(since we already assume they can share fd 
> of pipe, why not fd of normal file?)

I don’t quite understand.  We do deliver the FD to the back-end through 
this new command to the back-end, rather directly.  The back-end can 
disagree and send another FD back, but it doesn’t have to.  It can just 
simply always take the FD that it’s been given.

The FD also doesn’t need to be a pipe, it can be any FD that the 
back-end can read the state from (see my discussion with Stefan on patch 
1).  In the practical implementation, it’s difficult for qemu to provide 
anything that is not a pipe, because the migration state can be sent 
over many channels, and the respective FD (if any) is not exposed on the 
high level (for devices).

>
>>
>> The model we’d implement in qemu with this series is comparatively 
>> not efficient, because it manually copies data from the FD (which by 
>> default is a pipe) into the migration stream.
>>
>> But it is possible that the front-end can provide a zero-copy FD into 
>> the migration stream, and for that reason we want to allow the 
>> front-end to provide the transfer FD.
>>
>> In contrast, the back-end might have a more efficient implementation 
>> on its side, too, though.  It is difficult to imagine, but it may be 
>> possible that it has an FD already where the data needs to written 
>> to/read from, e.g. because it’s connected to a physical device that 
>> wants to get/send its state this way.  Admittedly, we have absolutely 
>> no concrete example for such a back-end at this point, but it’s hard 
>> to rule out that it is possible that there will be back-ends that 
>> could make use of zero-copy if only they are allowed to dictate the 
>> transfer FD.
>>
>> So because we in qemu can’t (at least not generally) provide an 
>> efficient (zero-copy) implementation, we don’t want to rule out that 
>> the back-end might be able to, so we also want to allow it to provide 
>> the transfer FD.
>>
>> In the end, we decided that we don’t want to preclude either side of 
>> providing the FD.  If one side knows it can do better than a plain 
>> pipe with copying on both ends, it should provide the FD. That 
>> doesn’t complicate the implementation much.
>>
>> (So, notably, we measure “improves efficiency” based on “is it better 
>> than a plain pipe with copying on both ends”.  A pipe with copying is 
>> the default implementation, but as Stefan has pointed out in his 
>> review, it doesn’t need to be a pipe.  More efficient FDs, like the 
>> back-end can provide in its reply, would actually likely not be pipes.)
>
>
> Yea, but the vhost-user protocol in this patchset defines these FDs as 
> data transfer channel not the data itself, how about the latter?

I don’t quite understand this either.  Perhaps you’re suggesting that we 
should just transfer the data over the vhost-user socket. There are two 
reasons why I don’t think we should do that: First, the vhost-user 
socket is currently only used for a small amount of data.  For example, 
the Rust vhost crate will reject any message longer than 4096 bytes.  So 
if I were to use it for more data, I feel like I’d be abusing it.  
Second, if/when we add other transfer phases (i.e. while the VM is still 
running, to transfer larger amounts of state), it will be important that 
the vhost-user socket is open for messages from the front-end at any 
time.  So then, at the latest, would we need a separate data channel.  I 
don’t feel like there is harm in making the interface have this separate 
channel now already.

Or maybe you suggest that the separate FD is OK, but it shouldn’t be 
defined to be a data transfer channel, but just data, i.e. not 
necessarily a pipe.  I think that’s basically the discussion I had with 
Stefan on patch 1: That the important thing is just that the receiving 
end can read data out of the FD until EOF, and the writing end can write 
data and close the FD to signal the end of data. Whether they actually 
do that is not important, only that anything they do will let the other 
end do that.

Hanna


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v2 2/4] vhost-user: Interface for migration state transfer
  2023-07-21  8:07         ` Hanna Czenczek
@ 2023-07-21  8:23           ` Hao Xu
  0 siblings, 0 replies; 36+ messages in thread
From: Hao Xu @ 2023-07-21  8:23 UTC (permalink / raw)
  To: Hanna Czenczek, qemu-devel, virtio-fs
  Cc: Eugenio Pérez, Stefan Hajnoczi, Michael S . Tsirkin


On 7/21/23 16:07, Hanna Czenczek wrote:
> On 20.07.23 17:05, Hao Xu wrote:
>>
>> On 7/20/23 21:20, Hanna Czenczek wrote:
>>> On 20.07.23 14:13, Hao Xu wrote:
>>>>
>>>> On 7/12/23 19:17, Hanna Czenczek wrote:
>>>>> Add the interface for transferring the back-end's state during 
>>>>> migration
>>>>> as defined previously in vhost-user.rst.
>>>>>
>>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
>>>>> ---
>>>>>   include/hw/virtio/vhost-backend.h |  24 +++++
>>>>>   include/hw/virtio/vhost.h         |  79 ++++++++++++++++
>>>>>   hw/virtio/vhost-user.c            | 147 
>>>>> ++++++++++++++++++++++++++++++
>>>>>   hw/virtio/vhost.c                 |  37 ++++++++
>>>>>   4 files changed, 287 insertions(+)
>>>>>
>>>>> diff --git a/include/hw/virtio/vhost-backend.h 
>>>>> b/include/hw/virtio/vhost-backend.h
>>>>> index 31a251a9f5..e59d0b53f8 100644
>>>>> --- a/include/hw/virtio/vhost-backend.h
>>>>> +++ b/include/hw/virtio/vhost-backend.h
>>>>> @@ -26,6 +26,18 @@ typedef enum VhostSetConfigType {
>>>>>       VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
>>>>>   } VhostSetConfigType;
>>>>>   +typedef enum VhostDeviceStateDirection {
>>>>> +    /* Transfer state from back-end (device) to front-end */
>>>>> +    VHOST_TRANSFER_STATE_DIRECTION_SAVE = 0,
>>>>> +    /* Transfer state from front-end to back-end (device) */
>>>>> +    VHOST_TRANSFER_STATE_DIRECTION_LOAD = 1,
>>>>> +} VhostDeviceStateDirection;
>>>>> +
>>>>> +typedef enum VhostDeviceStatePhase {
>>>>> +    /* The device (and all its vrings) is stopped */
>>>>> +    VHOST_TRANSFER_STATE_PHASE_STOPPED = 0,
>>>>> +} VhostDeviceStatePhase;
>>>>> +
>>>>>   struct vhost_inflight;
>>>>>   struct vhost_dev;
>>>>>   struct vhost_log;
>>>>> @@ -133,6 +145,15 @@ typedef int 
>>>>> (*vhost_set_config_call_op)(struct vhost_dev *dev,
>>>>>     typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
>>>>>   +typedef bool (*vhost_supports_migratory_state_op)(struct 
>>>>> vhost_dev *dev);
>>>>> +typedef int (*vhost_set_device_state_fd_op)(struct vhost_dev *dev,
>>>>> + VhostDeviceStateDirection direction,
>>>>> + VhostDeviceStatePhase phase,
>>>>> +                                            int fd,
>>>>> +                                            int *reply_fd,
>>>>> +                                            Error **errp);
>>>>> +typedef int (*vhost_check_device_state_op)(struct vhost_dev *dev, 
>>>>> Error **errp);
>>>>> +
>>>>>   typedef struct VhostOps {
>>>>>       VhostBackendType backend_type;
>>>>>       vhost_backend_init vhost_backend_init;
>>>>> @@ -181,6 +202,9 @@ typedef struct VhostOps {
>>>>>       vhost_force_iommu_op vhost_force_iommu;
>>>>>       vhost_set_config_call_op vhost_set_config_call;
>>>>>       vhost_reset_status_op vhost_reset_status;
>>>>> +    vhost_supports_migratory_state_op 
>>>>> vhost_supports_migratory_state;
>>>>> +    vhost_set_device_state_fd_op vhost_set_device_state_fd;
>>>>> +    vhost_check_device_state_op vhost_check_device_state;
>>>>>   } VhostOps;
>>>>>     int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
>>>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
>>>>> index 69bf59d630..d8877496e5 100644
>>>>> --- a/include/hw/virtio/vhost.h
>>>>> +++ b/include/hw/virtio/vhost.h
>>>>> @@ -346,4 +346,83 @@ int vhost_dev_set_inflight(struct vhost_dev 
>>>>> *dev,
>>>>>   int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t 
>>>>> queue_size,
>>>>>                              struct vhost_inflight *inflight);
>>>>>   bool vhost_dev_has_iommu(struct vhost_dev *dev);
>>>>> +
>>>>> +/**
>>>>> + * vhost_supports_migratory_state(): Checks whether the back-end
>>>>> + * supports transferring internal state for the purpose of 
>>>>> migration.
>>>>> + * Support for this feature is required for 
>>>>> vhost_set_device_state_fd()
>>>>> + * and vhost_check_device_state().
>>>>> + *
>>>>> + * @dev: The vhost device
>>>>> + *
>>>>> + * Returns true if the device supports these commands, and false 
>>>>> if it
>>>>> + * does not.
>>>>> + */
>>>>> +bool vhost_supports_migratory_state(struct vhost_dev *dev);
>>>>> +
>>>>> +/**
>>>>> + * vhost_set_device_state_fd(): Begin transfer of internal state 
>>>>> from/to
>>>>> + * the back-end for the purpose of migration.  Data is to be 
>>>>> transferred
>>>>> + * over a pipe according to @direction and @phase.  The sending 
>>>>> end must
>>>>> + * only write to the pipe, and the receiving end must only read 
>>>>> from it.
>>>>> + * Once the sending end is done, it closes its FD.  The receiving 
>>>>> end
>>>>> + * must take this as the end-of-transfer signal and close its FD, 
>>>>> too.
>>>>> + *
>>>>> + * @fd is the back-end's end of the pipe: The write FD for SAVE, 
>>>>> and the
>>>>> + * read FD for LOAD.  This function transfers ownership of @fd to 
>>>>> the
>>>>> + * back-end, i.e. closes it in the front-end.
>>>>> + *
>>>>> + * The back-end may optionally reply with an FD of its own, if this
>>>>> + * improves efficiency on its end.  In this case, the returned FD is
>>>>
>>>>
>>>> Hi Hanna,
>>>>
>>>> In what case/situation, the back-end will have a more efficient fd?
>>>
>>> Hi Hao,
>>>
>>> There is no example yet.
>>>
>>>> Here my understanding of this "FD of its own" is as same type as
>>>>
>>>> the given fd(e.g. both pipe files), why the fd from back-end makes
>>>>
>>>> difference? Do I miss anything here?
>>>
>>> Maybe it makes more sense in the context of how we came up with the 
>>> idea: Specifically, Stefan and me were asking which end should 
>>> provide the FD.  In the context of vhost-user, it makes sense to 
>>> have it be the front-end, because it controls vhost-user 
>>> communication, but that’s just the natural protocol choice, not 
>>> necessarily the most efficient one.
>>>
>>> It is imaginable that the front-end (e.g. qemu) could create a file 
>>> descriptor whose data is directly spliced (automatically) into the 
>>> migration stream, and then hand this FD to the back-end. In 
>>> practice, this doesn’t work for qemu (at this point), because it 
>>> doesn’t use simple read/write into the migration stream, but has an 
>>> abstraction layer for that.  It might be possible to make this work 
>>> in some cases, depending on what is used as a transport, but (1) not 
>>> generally, and (2) not now.  But this would be efficient.
>>
>>
>> I'm thinking one thing, we now already have a channel(a unix domain 
>> socket) between front-end and back-end, why not delivering the state 
>> file (as an fd) to/from back-end directly rathen than negotiating
>>
>> a new pipe as data channel.(since we already assume they can share fd 
>> of pipe, why not fd of normal file?)
>
> I don’t quite understand.  We do deliver the FD to the back-end 
> through this new command to the back-end, rather directly.  The 
> back-end can disagree and send another FD back, but it doesn’t have 
> to.  It can just simply always take the FD that it’s been given.
>
> The FD also doesn’t need to be a pipe, it can be any FD that the 
> back-end can read the state from (see my discussion with Stefan on 
> patch 1).  In the practical implementation, it’s difficult for qemu to 
> provide anything that is not a pipe, because the migration state can 
> be sent over many channels, and the respective FD (if any) is not 
> exposed on the high level (for devices).
>
>>
>>>
>>> The model we’d implement in qemu with this series is comparatively 
>>> not efficient, because it manually copies data from the FD (which by 
>>> default is a pipe) into the migration stream.
>>>
>>> But it is possible that the front-end can provide a zero-copy FD 
>>> into the migration stream, and for that reason we want to allow the 
>>> front-end to provide the transfer FD.
>>>
>>> In contrast, the back-end might have a more efficient implementation 
>>> on its side, too, though.  It is difficult to imagine, but it may be 
>>> possible that it has an FD already where the data needs to written 
>>> to/read from, e.g. because it’s connected to a physical device that 
>>> wants to get/send its state this way.  Admittedly, we have 
>>> absolutely no concrete example for such a back-end at this point, 
>>> but it’s hard to rule out that it is possible that there will be 
>>> back-ends that could make use of zero-copy if only they are allowed 
>>> to dictate the transfer FD.
>>>
>>> So because we in qemu can’t (at least not generally) provide an 
>>> efficient (zero-copy) implementation, we don’t want to rule out that 
>>> the back-end might be able to, so we also want to allow it to 
>>> provide the transfer FD.
>>>
>>> In the end, we decided that we don’t want to preclude either side of 
>>> providing the FD.  If one side knows it can do better than a plain 
>>> pipe with copying on both ends, it should provide the FD. That 
>>> doesn’t complicate the implementation much.
>>>
>>> (So, notably, we measure “improves efficiency” based on “is it 
>>> better than a plain pipe with copying on both ends”.  A pipe with 
>>> copying is the default implementation, but as Stefan has pointed out 
>>> in his review, it doesn’t need to be a pipe.  More efficient FDs, 
>>> like the back-end can provide in its reply, would actually likely 
>>> not be pipes.)
>>
>>
>> Yea, but the vhost-user protocol in this patchset defines these FDs 
>> as data transfer channel not the data itself, how about the latter?
>
> I don’t quite understand this either.  Perhaps you’re suggesting that 
> we should just transfer the data over the vhost-user socket. There are 
> two reasons why I don’t think we should do that: First, the vhost-user 
> socket is currently only used for a small amount of data.  For 
> example, the Rust vhost crate will reject any message longer than 4096 
> bytes.  So if I were to use it for more data, I feel like I’d be 
> abusing it.  Second, if/when we add other transfer phases (i.e. while 
> the VM is still running, to transfer larger amounts of state), it will 
> be important that the vhost-user socket is open for messages from the 
> front-end at any time.  So then, at the latest, would we need a 
> separate data channel.  I don’t feel like there is harm in making the 
> interface have this separate channel now already.
>
> Or maybe you suggest that the separate FD is OK, but it shouldn’t be 
> defined to be a data transfer channel, but just data, i.e. not 
> necessarily a pipe.  I think that’s basically the discussion I had 
> with Stefan on patch 1: That the important thing is just that the 
> receiving end can read data out of the FD until EOF, and the writing 
> end can write data and close the FD to signal the end of data. Whether 
> they actually do that is not important, only that anything they do 
> will let the other end do that.


I re-read the discussion of patch1, I now see what I suggested is 
already included in the protocol design.


>
> Hanna
>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/4] vhost: Add high-level state save/load functions
  2023-07-12 11:17   ` [Virtio-fs] " Hanna Czenczek
@ 2023-07-21 15:18     ` Eugenio Perez Martin
  -1 siblings, 0 replies; 36+ messages in thread
From: Eugenio Perez Martin @ 2023-07-21 15:18 UTC (permalink / raw)
  To: Hanna Czenczek
  Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, Stefan Hajnoczi,
	German Maglione

On Wed, Jul 12, 2023 at 1:17 PM Hanna Czenczek <hreitz@redhat.com> wrote:
>
> vhost_save_backend_state() and vhost_load_backend_state() can be used by
> vhost front-ends to easily save and load the back-end's state to/from
> the migration stream.
>
> Because we do not know the full state size ahead of time,
> vhost_save_backend_state() simply reads the data in 1 MB chunks, and
> writes each chunk consecutively into the migration stream, prefixed by
> its length.  EOF is indicated by a 0-length chunk.
>
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
>  include/hw/virtio/vhost.h |  35 +++++++
>  hw/virtio/vhost.c         | 204 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 239 insertions(+)
>
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index d8877496e5..0c282abd4e 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -425,4 +425,39 @@ int vhost_set_device_state_fd(struct vhost_dev *dev,
>   */
>  int vhost_check_device_state(struct vhost_dev *dev, Error **errp);
>
> +/**
> + * vhost_save_backend_state(): High-level function to receive a vhost
> + * back-end's state, and save it in `f`.  Uses
> + * `vhost_set_device_state_fd()` to get the data from the back-end, and
> + * stores it in consecutive chunks that are each prefixed by their
> + * respective length (be32).  The end is marked by a 0-length chunk.
> + *
> + * Must only be called while the device and all its vrings are stopped
> + * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
> + *
> + * @dev: The vhost device from which to save the state
> + * @f: Migration stream in which to save the state
> + * @errp: Potential error message
> + *
> + * Returns 0 on success, and -errno otherwise.
> + */
> +int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp);
> +
> +/**
> + * vhost_load_backend_state(): High-level function to load a vhost
> + * back-end's state from `f`, and send it over to the back-end.  Reads
> + * the data from `f` in the format used by `vhost_save_state()`, and
> + * uses `vhost_set_device_state_fd()` to transfer it to the back-end.
> + *
> + * Must only be called while the device and all its vrings are stopped
> + * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
> + *
> + * @dev: The vhost device to which to send the sate
> + * @f: Migration stream from which to load the state
> + * @errp: Potential error message
> + *
> + * Returns 0 on success, and -errno otherwise.
> + */
> +int vhost_load_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp);
> +
>  #endif
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 756b6d55a8..332d49a310 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -2128,3 +2128,207 @@ int vhost_check_device_state(struct vhost_dev *dev, Error **errp)
>                 "vhost transport does not support migration state transfer");
>      return -ENOSYS;
>  }
> +
> +int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp)
> +{
> +    /* Maximum chunk size in which to transfer the state */
> +    const size_t chunk_size = 1 * 1024 * 1024;
> +    void *transfer_buf = NULL;
> +    g_autoptr(GError) g_err = NULL;
> +    int pipe_fds[2], read_fd = -1, write_fd = -1, reply_fd = -1;
> +    int ret;
> +
> +    /* [0] for reading (our end), [1] for writing (back-end's end) */
> +    if (!g_unix_open_pipe(pipe_fds, FD_CLOEXEC, &g_err)) {
> +        error_setg(errp, "Failed to set up state transfer pipe: %s",
> +                   g_err->message);
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    read_fd = pipe_fds[0];
> +    write_fd = pipe_fds[1];
> +
> +    /*
> +     * VHOST_TRANSFER_STATE_PHASE_STOPPED means the device must be stopped.
> +     * We cannot check dev->suspended, because the back-end may not support
> +     * suspending.
> +     */
> +    assert(!dev->started);
> +
> +    /* Transfer ownership of write_fd to the back-end */
> +    ret = vhost_set_device_state_fd(dev,
> +                                    VHOST_TRANSFER_STATE_DIRECTION_SAVE,
> +                                    VHOST_TRANSFER_STATE_PHASE_STOPPED,
> +                                    write_fd,
> +                                    &reply_fd,
> +                                    errp);
> +    if (ret < 0) {
> +        error_prepend(errp, "Failed to initiate state transfer: ");
> +        goto fail;
> +    }
> +
> +    /* If the back-end wishes to use a different pipe, switch over */
> +    if (reply_fd >= 0) {
> +        close(read_fd);
> +        read_fd = reply_fd;
> +    }
> +
> +    transfer_buf = g_malloc(chunk_size);
> +
> +    while (true) {
> +        ssize_t read_ret;
> +
> +        read_ret = read(read_fd, transfer_buf, chunk_size);
> +        if (read_ret < 0) {
> +            ret = -errno;
> +            error_setg_errno(errp, -ret, "Failed to receive state");
> +            goto fail;
> +        }
> +
> +        assert(read_ret <= chunk_size);
> +        qemu_put_be32(f, read_ret);
> +
> +        if (read_ret == 0) {
> +            /* EOF */
> +            break;
> +        }
> +
> +        qemu_put_buffer(f, transfer_buf, read_ret);
> +    }
> +
> +    /*
> +     * Back-end will not really care, but be clean and close our end of the pipe
> +     * before inquiring the back-end about whether transfer was successful
> +     */
> +    close(read_fd);
> +    read_fd = -1;
> +
> +    /* Also, verify that the device is still stopped */
> +    assert(!dev->started);
> +
> +    ret = vhost_check_device_state(dev, errp);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    ret = 0;
> +fail:
> +    g_free(transfer_buf);
> +    if (read_fd >= 0) {
> +        close(read_fd);
> +    }
> +
> +    return ret;
> +}
> +
> +int vhost_load_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp)
> +{
> +    size_t transfer_buf_size = 0;
> +    void *transfer_buf = NULL;
> +    g_autoptr(GError) g_err = NULL;
> +    int pipe_fds[2], read_fd = -1, write_fd = -1, reply_fd = -1;
> +    int ret;
> +
> +    /* [0] for reading (back-end's end), [1] for writing (our end) */
> +    if (!g_unix_open_pipe(pipe_fds, FD_CLOEXEC, &g_err)) {
> +        error_setg(errp, "Failed to set up state transfer pipe: %s",
> +                   g_err->message);
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    read_fd = pipe_fds[0];
> +    write_fd = pipe_fds[1];
> +
> +    /*
> +     * VHOST_TRANSFER_STATE_PHASE_STOPPED means the device must be stopped.
> +     * We cannot check dev->suspended, because the back-end may not support
> +     * suspending.
> +     */
> +    assert(!dev->started);
> +
> +    /* Transfer ownership of read_fd to the back-end */
> +    ret = vhost_set_device_state_fd(dev,
> +                                    VHOST_TRANSFER_STATE_DIRECTION_LOAD,
> +                                    VHOST_TRANSFER_STATE_PHASE_STOPPED,
> +                                    read_fd,
> +                                    &reply_fd,
> +                                    errp);
> +    if (ret < 0) {
> +        error_prepend(errp, "Failed to initiate state transfer: ");
> +        goto fail;
> +    }
> +
> +    /* If the back-end wishes to use a different pipe, switch over */
> +    if (reply_fd >= 0) {
> +        close(write_fd);
> +        write_fd = reply_fd;
> +    }
> +
> +    while (true) {
> +        size_t this_chunk_size = qemu_get_be32(f);
> +        ssize_t write_ret;
> +        const uint8_t *transfer_pointer;
> +
> +        if (this_chunk_size == 0) {
> +            /* End of state */
> +            break;
> +        }
> +
> +        if (transfer_buf_size < this_chunk_size) {
> +            transfer_buf = g_realloc(transfer_buf, this_chunk_size);
> +            transfer_buf_size = this_chunk_size;
> +        }
> +
> +        if (qemu_get_buffer(f, transfer_buf, this_chunk_size) <
> +                this_chunk_size)
> +        {
> +            error_setg(errp, "Failed to read state");
> +            ret = -EINVAL;
> +            goto fail;
> +        }
> +
> +        transfer_pointer = transfer_buf;
> +        while (this_chunk_size > 0) {
> +            write_ret = write(write_fd, transfer_pointer, this_chunk_size);
> +            if (write_ret < 0) {
> +                ret = -errno;
> +                error_setg_errno(errp, -ret, "Failed to send state");
> +                goto fail;
> +            } else if (write_ret == 0) {
> +                error_setg(errp, "Failed to send state: Connection is closed");
> +                ret = -ECONNRESET;
> +                goto fail;
> +            }
> +
> +            assert(write_ret <= this_chunk_size);
> +            this_chunk_size -= write_ret;
> +            transfer_pointer += write_ret;
> +        }
> +    }
> +
> +    /*
> +     * Close our end, thus ending transfer, before inquiring the back-end about
> +     * whether transfer was successful
> +     */
> +    close(write_fd);
> +    write_fd = -1;
> +
> +    /* Also, verify that the device is still stopped */
> +    assert(!dev->started);
> +
> +    ret = vhost_check_device_state(dev, errp);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    ret = 0;
> +fail:
> +    g_free(transfer_buf);

Nitpick, but transfer_buf could have the g_autofree parameter.

Thanks!

> +    if (write_fd >= 0) {
> +        close(write_fd);
> +    }
> +
> +    return ret;
> +}
> --
> 2.41.0
>



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v2 3/4] vhost: Add high-level state save/load functions
@ 2023-07-21 15:18     ` Eugenio Perez Martin
  0 siblings, 0 replies; 36+ messages in thread
From: Eugenio Perez Martin @ 2023-07-21 15:18 UTC (permalink / raw)
  To: Hanna Czenczek
  Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, Stefan Hajnoczi,
	German Maglione

On Wed, Jul 12, 2023 at 1:17 PM Hanna Czenczek <hreitz@redhat.com> wrote:
>
> vhost_save_backend_state() and vhost_load_backend_state() can be used by
> vhost front-ends to easily save and load the back-end's state to/from
> the migration stream.
>
> Because we do not know the full state size ahead of time,
> vhost_save_backend_state() simply reads the data in 1 MB chunks, and
> writes each chunk consecutively into the migration stream, prefixed by
> its length.  EOF is indicated by a 0-length chunk.
>
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
>  include/hw/virtio/vhost.h |  35 +++++++
>  hw/virtio/vhost.c         | 204 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 239 insertions(+)
>
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index d8877496e5..0c282abd4e 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -425,4 +425,39 @@ int vhost_set_device_state_fd(struct vhost_dev *dev,
>   */
>  int vhost_check_device_state(struct vhost_dev *dev, Error **errp);
>
> +/**
> + * vhost_save_backend_state(): High-level function to receive a vhost
> + * back-end's state, and save it in `f`.  Uses
> + * `vhost_set_device_state_fd()` to get the data from the back-end, and
> + * stores it in consecutive chunks that are each prefixed by their
> + * respective length (be32).  The end is marked by a 0-length chunk.
> + *
> + * Must only be called while the device and all its vrings are stopped
> + * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
> + *
> + * @dev: The vhost device from which to save the state
> + * @f: Migration stream in which to save the state
> + * @errp: Potential error message
> + *
> + * Returns 0 on success, and -errno otherwise.
> + */
> +int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp);
> +
> +/**
> + * vhost_load_backend_state(): High-level function to load a vhost
> + * back-end's state from `f`, and send it over to the back-end.  Reads
> + * the data from `f` in the format used by `vhost_save_state()`, and
> + * uses `vhost_set_device_state_fd()` to transfer it to the back-end.
> + *
> + * Must only be called while the device and all its vrings are stopped
> + * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
> + *
> + * @dev: The vhost device to which to send the sate
> + * @f: Migration stream from which to load the state
> + * @errp: Potential error message
> + *
> + * Returns 0 on success, and -errno otherwise.
> + */
> +int vhost_load_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp);
> +
>  #endif
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 756b6d55a8..332d49a310 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -2128,3 +2128,207 @@ int vhost_check_device_state(struct vhost_dev *dev, Error **errp)
>                 "vhost transport does not support migration state transfer");
>      return -ENOSYS;
>  }
> +
> +int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp)
> +{
> +    /* Maximum chunk size in which to transfer the state */
> +    const size_t chunk_size = 1 * 1024 * 1024;
> +    void *transfer_buf = NULL;
> +    g_autoptr(GError) g_err = NULL;
> +    int pipe_fds[2], read_fd = -1, write_fd = -1, reply_fd = -1;
> +    int ret;
> +
> +    /* [0] for reading (our end), [1] for writing (back-end's end) */
> +    if (!g_unix_open_pipe(pipe_fds, FD_CLOEXEC, &g_err)) {
> +        error_setg(errp, "Failed to set up state transfer pipe: %s",
> +                   g_err->message);
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    read_fd = pipe_fds[0];
> +    write_fd = pipe_fds[1];
> +
> +    /*
> +     * VHOST_TRANSFER_STATE_PHASE_STOPPED means the device must be stopped.
> +     * We cannot check dev->suspended, because the back-end may not support
> +     * suspending.
> +     */
> +    assert(!dev->started);
> +
> +    /* Transfer ownership of write_fd to the back-end */
> +    ret = vhost_set_device_state_fd(dev,
> +                                    VHOST_TRANSFER_STATE_DIRECTION_SAVE,
> +                                    VHOST_TRANSFER_STATE_PHASE_STOPPED,
> +                                    write_fd,
> +                                    &reply_fd,
> +                                    errp);
> +    if (ret < 0) {
> +        error_prepend(errp, "Failed to initiate state transfer: ");
> +        goto fail;
> +    }
> +
> +    /* If the back-end wishes to use a different pipe, switch over */
> +    if (reply_fd >= 0) {
> +        close(read_fd);
> +        read_fd = reply_fd;
> +    }
> +
> +    transfer_buf = g_malloc(chunk_size);
> +
> +    while (true) {
> +        ssize_t read_ret;
> +
> +        read_ret = read(read_fd, transfer_buf, chunk_size);
> +        if (read_ret < 0) {
> +            ret = -errno;
> +            error_setg_errno(errp, -ret, "Failed to receive state");
> +            goto fail;
> +        }
> +
> +        assert(read_ret <= chunk_size);
> +        qemu_put_be32(f, read_ret);
> +
> +        if (read_ret == 0) {
> +            /* EOF */
> +            break;
> +        }
> +
> +        qemu_put_buffer(f, transfer_buf, read_ret);
> +    }
> +
> +    /*
> +     * Back-end will not really care, but be clean and close our end of the pipe
> +     * before inquiring the back-end about whether transfer was successful
> +     */
> +    close(read_fd);
> +    read_fd = -1;
> +
> +    /* Also, verify that the device is still stopped */
> +    assert(!dev->started);
> +
> +    ret = vhost_check_device_state(dev, errp);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    ret = 0;
> +fail:
> +    g_free(transfer_buf);
> +    if (read_fd >= 0) {
> +        close(read_fd);
> +    }
> +
> +    return ret;
> +}
> +
> +int vhost_load_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp)
> +{
> +    size_t transfer_buf_size = 0;
> +    void *transfer_buf = NULL;
> +    g_autoptr(GError) g_err = NULL;
> +    int pipe_fds[2], read_fd = -1, write_fd = -1, reply_fd = -1;
> +    int ret;
> +
> +    /* [0] for reading (back-end's end), [1] for writing (our end) */
> +    if (!g_unix_open_pipe(pipe_fds, FD_CLOEXEC, &g_err)) {
> +        error_setg(errp, "Failed to set up state transfer pipe: %s",
> +                   g_err->message);
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    read_fd = pipe_fds[0];
> +    write_fd = pipe_fds[1];
> +
> +    /*
> +     * VHOST_TRANSFER_STATE_PHASE_STOPPED means the device must be stopped.
> +     * We cannot check dev->suspended, because the back-end may not support
> +     * suspending.
> +     */
> +    assert(!dev->started);
> +
> +    /* Transfer ownership of read_fd to the back-end */
> +    ret = vhost_set_device_state_fd(dev,
> +                                    VHOST_TRANSFER_STATE_DIRECTION_LOAD,
> +                                    VHOST_TRANSFER_STATE_PHASE_STOPPED,
> +                                    read_fd,
> +                                    &reply_fd,
> +                                    errp);
> +    if (ret < 0) {
> +        error_prepend(errp, "Failed to initiate state transfer: ");
> +        goto fail;
> +    }
> +
> +    /* If the back-end wishes to use a different pipe, switch over */
> +    if (reply_fd >= 0) {
> +        close(write_fd);
> +        write_fd = reply_fd;
> +    }
> +
> +    while (true) {
> +        size_t this_chunk_size = qemu_get_be32(f);
> +        ssize_t write_ret;
> +        const uint8_t *transfer_pointer;
> +
> +        if (this_chunk_size == 0) {
> +            /* End of state */
> +            break;
> +        }
> +
> +        if (transfer_buf_size < this_chunk_size) {
> +            transfer_buf = g_realloc(transfer_buf, this_chunk_size);
> +            transfer_buf_size = this_chunk_size;
> +        }
> +
> +        if (qemu_get_buffer(f, transfer_buf, this_chunk_size) <
> +                this_chunk_size)
> +        {
> +            error_setg(errp, "Failed to read state");
> +            ret = -EINVAL;
> +            goto fail;
> +        }
> +
> +        transfer_pointer = transfer_buf;
> +        while (this_chunk_size > 0) {
> +            write_ret = write(write_fd, transfer_pointer, this_chunk_size);
> +            if (write_ret < 0) {
> +                ret = -errno;
> +                error_setg_errno(errp, -ret, "Failed to send state");
> +                goto fail;
> +            } else if (write_ret == 0) {
> +                error_setg(errp, "Failed to send state: Connection is closed");
> +                ret = -ECONNRESET;
> +                goto fail;
> +            }
> +
> +            assert(write_ret <= this_chunk_size);
> +            this_chunk_size -= write_ret;
> +            transfer_pointer += write_ret;
> +        }
> +    }
> +
> +    /*
> +     * Close our end, thus ending transfer, before inquiring the back-end about
> +     * whether transfer was successful
> +     */
> +    close(write_fd);
> +    write_fd = -1;
> +
> +    /* Also, verify that the device is still stopped */
> +    assert(!dev->started);
> +
> +    ret = vhost_check_device_state(dev, errp);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    ret = 0;
> +fail:
> +    g_free(transfer_buf);

Nitpick, but transfer_buf could have the g_autofree parameter.

Thanks!

> +    if (write_fd >= 0) {
> +        close(write_fd);
> +    }
> +
> +    return ret;
> +}
> --
> 2.41.0
>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/4] vhost: Add high-level state save/load functions
  2023-07-21 15:18     ` [Virtio-fs] " Eugenio Perez Martin
@ 2023-07-21 16:09       ` Hanna Czenczek
  -1 siblings, 0 replies; 36+ messages in thread
From: Hanna Czenczek @ 2023-07-21 16:09 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, Stefan Hajnoczi,
	German Maglione

On 21.07.23 17:18, Eugenio Perez Martin wrote:
> On Wed, Jul 12, 2023 at 1:17 PM Hanna Czenczek <hreitz@redhat.com> wrote:
>> vhost_save_backend_state() and vhost_load_backend_state() can be used by
>> vhost front-ends to easily save and load the back-end's state to/from
>> the migration stream.
>>
>> Because we do not know the full state size ahead of time,
>> vhost_save_backend_state() simply reads the data in 1 MB chunks, and
>> writes each chunk consecutively into the migration stream, prefixed by
>> its length.  EOF is indicated by a 0-length chunk.
>>
>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
>> ---
>>   include/hw/virtio/vhost.h |  35 +++++++
>>   hw/virtio/vhost.c         | 204 ++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 239 insertions(+)

[...]

>> +int vhost_load_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp)
>> +{
>> +    size_t transfer_buf_size = 0;
>> +    void *transfer_buf = NULL;
>> +    g_autoptr(GError) g_err = NULL;
>> +    int pipe_fds[2], read_fd = -1, write_fd = -1, reply_fd = -1;
>> +    int ret;

[...]

>> +    ret = 0;
>> +fail:
>> +    g_free(transfer_buf);
> Nitpick, but transfer_buf could have the g_autofree parameter.

Ah, sure, thanks!

Hanna



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v2 3/4] vhost: Add high-level state save/load functions
@ 2023-07-21 16:09       ` Hanna Czenczek
  0 siblings, 0 replies; 36+ messages in thread
From: Hanna Czenczek @ 2023-07-21 16:09 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, Stefan Hajnoczi,
	German Maglione

On 21.07.23 17:18, Eugenio Perez Martin wrote:
> On Wed, Jul 12, 2023 at 1:17 PM Hanna Czenczek <hreitz@redhat.com> wrote:
>> vhost_save_backend_state() and vhost_load_backend_state() can be used by
>> vhost front-ends to easily save and load the back-end's state to/from
>> the migration stream.
>>
>> Because we do not know the full state size ahead of time,
>> vhost_save_backend_state() simply reads the data in 1 MB chunks, and
>> writes each chunk consecutively into the migration stream, prefixed by
>> its length.  EOF is indicated by a 0-length chunk.
>>
>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
>> ---
>>   include/hw/virtio/vhost.h |  35 +++++++
>>   hw/virtio/vhost.c         | 204 ++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 239 insertions(+)

[...]

>> +int vhost_load_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp)
>> +{
>> +    size_t transfer_buf_size = 0;
>> +    void *transfer_buf = NULL;
>> +    g_autoptr(GError) g_err = NULL;
>> +    int pipe_fds[2], read_fd = -1, write_fd = -1, reply_fd = -1;
>> +    int ret;

[...]

>> +    ret = 0;
>> +fail:
>> +    g_free(transfer_buf);
> Nitpick, but transfer_buf could have the g_autofree parameter.

Ah, sure, thanks!

Hanna


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2023-07-21 16:10 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-12 11:16 [PATCH v2 0/4] vhost-user: Back-end state migration Hanna Czenczek
2023-07-12 11:16 ` [Virtio-fs] " Hanna Czenczek
2023-07-12 11:16 ` [PATCH v2 1/4] vhost-user.rst: Migrating back-end-internal state Hanna Czenczek
2023-07-12 11:16   ` [Virtio-fs] " Hanna Czenczek
2023-07-18 15:57   ` Stefan Hajnoczi
2023-07-18 15:57     ` [Virtio-fs] " Stefan Hajnoczi
2023-07-19 16:33     ` Hanna Czenczek
2023-07-19 16:33       ` [Virtio-fs] " Hanna Czenczek
2023-07-20 11:43       ` Stefan Hajnoczi
2023-07-20 11:43         ` [Virtio-fs] " Stefan Hajnoczi
2023-07-18 16:12   ` Stefan Hajnoczi
2023-07-18 16:12     ` [Virtio-fs] " Stefan Hajnoczi
2023-07-20 11:32     ` Hanna Czenczek
2023-07-12 11:17 ` [PATCH v2 2/4] vhost-user: Interface for migration state transfer Hanna Czenczek
2023-07-12 11:17   ` [Virtio-fs] " Hanna Czenczek
2023-07-18 18:32   ` Stefan Hajnoczi
2023-07-18 18:32     ` [Virtio-fs] " Stefan Hajnoczi
2023-07-20 12:13   ` Hao Xu
2023-07-20 13:20     ` Hanna Czenczek
2023-07-20 15:05       ` Hao Xu
2023-07-21  8:07         ` Hanna Czenczek
2023-07-21  8:23           ` Hao Xu
2023-07-12 11:17 ` [PATCH v2 3/4] vhost: Add high-level state save/load functions Hanna Czenczek
2023-07-12 11:17   ` [Virtio-fs] " Hanna Czenczek
2023-07-18 18:42   ` Stefan Hajnoczi
2023-07-18 18:42     ` [Virtio-fs] " Stefan Hajnoczi
2023-07-20 11:42     ` Hanna Czenczek
2023-07-20 11:42       ` [Virtio-fs] " Hanna Czenczek
2023-07-21 15:18   ` Eugenio Perez Martin
2023-07-21 15:18     ` [Virtio-fs] " Eugenio Perez Martin
2023-07-21 16:09     ` Hanna Czenczek
2023-07-21 16:09       ` [Virtio-fs] " Hanna Czenczek
2023-07-12 11:17 ` [PATCH v2 4/4] vhost-user-fs: Implement internal migration Hanna Czenczek
2023-07-12 11:17   ` [Virtio-fs] " Hanna Czenczek
2023-07-18 18:44   ` Stefan Hajnoczi
2023-07-18 18:44     ` [Virtio-fs] " Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.