* About restoring the state in vhost-vdpa device
@ 2022-05-11 19:43 Eugenio Perez Martin
  2022-05-12  4:00   ` Jason Wang
  2022-05-13 15:08   ` Parav Pandit
  0 siblings, 2 replies; 20+ messages in thread
From: Eugenio Perez Martin @ 2022-05-11 19:43 UTC (permalink / raw)
  To: virtualization, qemu-level, Jason Wang, Cindy Lu, Parav Pandit,
	Gautam Dawar, virtio-networking, Eli Cohen, Laurent Vivier,
	Stefano Garzarella

This is a proposal to restore the state of the vhost-vdpa device at
the destination after a live migration. It uses as many available
features from both the device and qemu as possible, so we keep the
communication simple and speed up the merging process.

# Initializing a vhost-vdpa device.

Without the context of live migration, the steps to initialize the
device from vhost-vdpa at qemu startup are:
1) [vhost] Open the vdpa device, using a plain open().
2) [vhost+virtio] Get the device features. These are expected not to
change in the device's lifetime, so we can save them. Qemu issues a
VHOST_GET_FEATURES ioctl and vdpa forwards it to the backend driver
using the get_device_features() callback.
3) [vhost+virtio] Get its max_queue_pairs if _F_MQ and _F_CTRL_VQ are
offered. This is obtained using VHOST_VDPA_GET_CONFIG, and the request
is forwarded to the device using get_config. QEMU expects the device
not to change it in its lifetime.
4) [vhost] Set the vdpa status (_S_ACKNOWLEDGE, _S_DRIVER). Still no
FEATURES_OK or DRIVER_OK. The ioctl is VHOST_VDPA_SET_STATUS, and the
vdpa backend driver callback is set_status. (A sketch of these calls
follows below.)

These are the steps used to initialize the device in qemu terminology,
taking away some redundancies to make it simpler.
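
For illustration only, here is a minimal C sketch of how those four
steps might look when driven directly against the vhost-vdpa char
device. The device path, the helper name and the omitted error
handling and le16 conversion are assumptions of the sketch, not
qemu's actual code:

#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>
#include <linux/virtio_config.h>
#include <linux/virtio_net.h>

static int vdpa_device_init(const char *path, uint64_t *features,
                            uint16_t *max_queue_pairs)
{
    /* 1) open the vdpa char device, e.g. /dev/vhost-vdpa-0 */
    int fd = open(path, O_RDWR);

    /* 2) device features, forwarded to get_device_features() */
    ioctl(fd, VHOST_GET_FEATURES, features);

    /* 3) read max_virtqueue_pairs from the config space (get_config) */
    struct vhost_vdpa_config *cfg =
        calloc(1, sizeof(*cfg) + sizeof(uint16_t));
    cfg->off = offsetof(struct virtio_net_config, max_virtqueue_pairs);
    cfg->len = sizeof(uint16_t);
    ioctl(fd, VHOST_VDPA_GET_CONFIG, cfg);
    memcpy(max_queue_pairs, cfg->buf, sizeof(uint16_t));
    free(cfg);

    /* 4) _S_ACKNOWLEDGE | _S_DRIVER, but no FEATURES_OK/DRIVER_OK yet */
    uint8_t status = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER;
    ioctl(fd, VHOST_VDPA_SET_STATUS, &status);

    return fd;
}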

Now the guest driver sets FEATURES_OK and DRIVER_OK, qemu detects it,
and so it *starts* the device.

# Starting a vhost-vdpa device

At virtio_net_vhost_status we have two important variables:
int cvq = _F_CTRL_VQ ? 1 : 0;
int queue_pairs = _F_CTRL_VQ && _F_MQ ? (max_queue_pairs of step 3) : 0;

Now comes the identification of the cvq index. Qemu *knows* that the
device will expose it as the last queue (index max_queue_pairs*2) if
_F_MQ has been acknowledged by the guest's driver, or at index 2 if
not. It cannot depend on any data sent to the device via cvq, because
we could not get the command status on such a change.
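
As a condensed restatement of that rule (the helper name is made up
for this sketch; it is not a qemu function):

#include <stdbool.h>

static unsigned cvq_index(bool mq_acked, unsigned max_queue_pairs)
{
    /* last queue if _F_MQ was acknowledged, otherwise fixed at 2 */
    return mq_acked ? max_queue_pairs * 2 : 2;
}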

Now we start the vhost device. The workflow is currently:

5) [virtio+vhost] The first step is to send the acknowledgement of the
Virtio features and vhost/vdpa backend features to the device, so it
knows how to configure itself. This is done using the same calls as
step 4 with these feature bits added.
6) [virtio] Set the size, base, addr, kick and call fd for each queue
(SET_VRING_ADDR, SET_VRING_NUM, ...; forwarded to the backend with
set_vq_address, set_vq_state, ...).
7) [vdpa] Set up the host notifiers and *send SET_VRING_ENABLE = 1*
for each queue. This is done using the VHOST_VDPA_SET_VRING_ENABLE
ioctl, and forwarded to the vdpa backend using the set_vq_ready
callback. (A per-queue sketch follows this list.)
8) [virtio + vdpa] Send memory translations & set DRIVER_OK.
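
A rough per-queue sketch of steps 6 and 7 with the vhost/vhost-vdpa
ioctls named above. Ring addresses are assumed to be already
translated, and the host notifier mapping plus error handling are
left out:

#include <sys/ioctl.h>
#include <linux/vhost.h>

static void vdpa_setup_vq(int fd, unsigned idx, unsigned num,
                          struct vhost_vring_addr *addr,
                          int kick_fd, int call_fd)
{
    /* step 6: size, base (last avail index) and ring addresses */
    struct vhost_vring_state state = { .index = idx, .num = num };
    ioctl(fd, VHOST_SET_VRING_NUM, &state);
    state.num = 0;                      /* base is 0 on a fresh start */
    ioctl(fd, VHOST_SET_VRING_BASE, &state);
    addr->index = idx;
    ioctl(fd, VHOST_SET_VRING_ADDR, addr);

    /* step 6 (cont.): kick and call eventfds */
    struct vhost_vring_file file = { .index = idx, .fd = kick_fd };
    ioctl(fd, VHOST_SET_VRING_KICK, &file);
    file.fd = call_fd;
    ioctl(fd, VHOST_SET_VRING_CALL, &file);

    /* step 7: mark the ring ready, forwarded to set_vq_ready() */
    struct vhost_vring_state enable = { .index = idx, .num = 1 };
    ioctl(fd, VHOST_VDPA_SET_VRING_ENABLE, &enable);
}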

If we follow the current workflow, the device is now allowed to start
receiving only on vq pair 0, since we still have not set the number of
queue pairs through the multiqueue command. This could cause the guest
to receive packets in unexpected queues, breaking RSS.

# Proposal

Our proposal diverges in step 7: instead of enabling *all* the
virtqueues, only enable the CVQ. After that, send the DRIVER_OK and
queue all the control commands needed to restore the device state (MQ,
RSS, ...). Once all of them have been acknowledged (the "device", or
the emulated cvq in the host vdpa backend driver, has used all the cvq
buffers), enable (SET_VRING_ENABLE, set_vq_ready) all the other
queues, as sketched below.
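
A condensed sketch of that ordering. send_cvq_restore_cmds() is a
hypothetical placeholder for queuing the MQ/RSS/... commands on the
control virtqueue and waiting for their buffers to be used; the memory
translations of step 8 are omitted and the final status bits are set
in one go for brevity:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>
#include <linux/virtio_config.h>

/* Hypothetical: pushes the state-restore commands through the CVQ and
 * waits until the device has used all of their buffers. */
void send_cvq_restore_cmds(int fd);

static void vdpa_restore_and_start(int fd, unsigned max_queue_pairs,
                                   unsigned cvq_idx)
{
    /* modified step 7: enable only the control virtqueue */
    struct vhost_vring_state en = { .index = cvq_idx, .num = 1 };
    ioctl(fd, VHOST_VDPA_SET_VRING_ENABLE, &en);

    /* step 8: DRIVER_OK so the device can process the CVQ */
    uint8_t status = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER |
                     VIRTIO_CONFIG_S_FEATURES_OK | VIRTIO_CONFIG_S_DRIVER_OK;
    ioctl(fd, VHOST_VDPA_SET_STATUS, &status);

    /* restore MQ, RSS, filters, ... and wait for acknowledgement */
    send_cvq_restore_cmds(fd);

    /* finally enable all the data queues */
    for (unsigned i = 0; i < max_queue_pairs * 2; i++) {
        en.index = i;
        ioctl(fd, VHOST_VDPA_SET_VRING_ENABLE, &en);
    }
}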

Everything needed for this is already implemented in the kernel as far
as I can see; only a small modification is needed in qemu. This
achieves the restoring of the device state without creating a
maintenance burden.

A lot of optimizations can be applied on top without the need to add
anything to the migration protocol or the vDPA uAPI, like pre-warming
the vdpa queues or adding more capabilities to the emulated CVQ.

Other optimizations, like applying the state out of band, can also be
added so they run in parallel with the migration, but that requires a
bigger change in the qemu migration protocol, which in my opinion
would make us lose focus on achieving at least basic device migration.

Thoughts?



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: About restoring the state in vhost-vdpa device
  2022-05-11 19:43 About restoring the state in vhost-vdpa device Eugenio Perez Martin
@ 2022-05-12  4:00   ` Jason Wang
  2022-05-13 15:08   ` Parav Pandit
  1 sibling, 0 replies; 20+ messages in thread
From: Jason Wang @ 2022-05-12  4:00 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Cindy Lu, virtio-networking, qemu-level,
	Gautam Dawar, virtualization, Eli Cohen

On Thu, May 12, 2022 at 3:44 AM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> This is a proposal to restore the state of the vhost-vdpa device at
> the destination after a live migration. It uses as many available
> features both from the device and from qemu as possible so we keep the
> communication simple and speed up the merging process.

When we finalize the design, we can formalize it in kernel Documentation/

>
> # Initializing a vhost-vdpa device.
>
> Without the context of live migration, the steps to initialize the
> device from vhost-vdpa at qemu starting are:
> 1) [vhost] Open the vdpa device, Using simply open()
> 2) [vhost+virtio] Get device features. These are expected not to
> change in the device's lifetime, so we can save them. Qemu issues a
> VHOST_GET_FEATURES ioctl and vdpa forwards to the backend driver using
> get_device_features() callback.

For "virtio" do you mean it's an action that is defined in the spec?

> 3) [vhost+virtio] Get its max_queue_pairs if _F_MQ and _F_CTRL_VQ.
> These are obtained using VHOST_VDPA_GET_CONFIG, and that request is
> forwarded to the device using get_config. QEMU expects the device to
> not change it in its lifetime.
> 4) [vhost] Vdpa set status (_S_ACKNOLEDGE, _S_DRIVER). Still no
> FEATURES_OK or DRIVER_OK. The ioctl is VHOST_VDPA_SET_STATUS, and the
> vdpa backend driver callback is set_status.
>
> These are the steps used to initialize the device in qemu terminology,
> taking away some redundancies to make it simpler.
>
> Now the driver sends the FEATURES_OK and the DRIVER_OK, and qemu
> detects it, so it *starts* the device.
>
> # Starting a vhost-vdpa device
>
> At virtio_net_vhost_status we have two important variables here:
> int cvq = _F_CTRL_VQ ? 1 : 0;
> int queue_pairs = _F_CTRL_VQ && _F_MQ ? (max_queue_pairs of step 3) : 0;
>
> Now identification of the cvq index. Qemu *know* that the device will
> expose it at the last queue (max_queue_pairs*2) if _F_MQ has been
> acknowledged by the guest's driver or 2 if not. It cannot depend on
> any data sent to the device via cvq, because we couldn't get its
> command status on a change.
>
> Now we start the vhost device. The workflow is currently:
>
> 5) [virtio+vhost] The first step is to send the acknowledgement of the
> Virtio features and vhost/vdpa backend features to the device, so it
> knows how to configure itself. This is done using the same calls as
> step 4 with these feature bits added.
> 6) [virtio] Set the size, base, addr, kick and call fd for each queue
> (SET_VRING_ADDR, SET_VRING_NUM, ...; and forwarded with
> set_vq_address, set_vq_state, ...)
> 7) [vdpa] Send host notifiers and *send SET_VRING_ENABLE = 1* for each
> queue. This is done using ioctl VHOST_VDPA_SET_VRING_ENABLE, and
> forwarded to the vdpa backend using set_vq_ready callback.
> 8) [virtio + vdpa] Send memory translations & set DRIVER_OK.
>
> If we follow the current workflow, the device is allowed now to start
> receiving only on vq pair 0, since we've still not set the multi queue
> pair. This could cause the guest to receive packets in unexpected
> queues, breaking RSS.
>
> # Proposal
>
> Our proposal diverge in step 7: Instead of enabling *all* the
> virtqueues, only enable the CVQ. After that, send the DRIVER_OK and
> queue all the control commands to restore the device status (MQ, RSS,
> ...). Once all of them have been acknowledged ("device", or emulated
> cvq in host vdpa backend driver, has used all cvq buffers, enable
> (SET_VRING_ENABLE, set_vq_ready) all other queues.
>
> Everything needed for this is already implemented in the kernel as far
> as I see, there is only a small modification in qemu needed. Thus
> achieving the restoring of the device state without creating
> maintenance burden.

Yes, one of the major motivations is to try to reuse the existing APIs
as much as possible as a start. It doesn't mean we can't invent a new
API; having a dedicated save/restore uAPI looks fine. But that looks
more like work that needs to be finalized in the virtio spec,
otherwise we may end up with code that is hard to maintain.

Thanks

>
> A lot of optimizations can be applied on top without the need to add
> stuff to the migration protocol or vDPA uAPI, like the pre-warming of
> the vdpa queues or adding more capabilities to the emulated CVQ.
>
> Other optimizations like applying the state out of band can also be
> added so they can run in parallel with the migration, but that
> requires a bigger change in qemu migration protocol making us lose
> focus on achieving at least the basic device migration in my opinion.
>
> Thoughts?
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: About restoring the state in vhost-vdpa device
  2022-05-11 19:43 About restoring the state in vhost-vdpa device Eugenio Perez Martin
@ 2022-05-13 15:08   ` Parav Pandit
  2022-05-13 15:08   ` Parav Pandit
  1 sibling, 0 replies; 20+ messages in thread
From: Parav Pandit via Virtualization @ 2022-05-13 15:08 UTC (permalink / raw)
  To: Eugenio Perez Martin, virtualization, qemu-level, Jason Wang,
	Cindy Lu, Gautam Dawar, virtio-networking, Eli Cohen,
	Laurent Vivier, Stefano Garzarella


> From: Eugenio Perez Martin <eperezma@redhat.com>
> Sent: Wednesday, May 11, 2022 3:44 PM
> 
> This is a proposal to restore the state of the vhost-vdpa device at the
> destination after a live migration. It uses as many available features both
> from the device and from qemu as possible so we keep the communication
> simple and speed up the merging process.
> 
> # Initializing a vhost-vdpa device.
> 
> Without the context of live migration, the steps to initialize the device from
> vhost-vdpa at qemu starting are:
> 1) [vhost] Open the vdpa device, Using simply open()
> 2) [vhost+virtio] Get device features. These are expected not to change in
> the device's lifetime, so we can save them. Qemu issues a
> VHOST_GET_FEATURES ioctl and vdpa forwards to the backend driver using
> get_device_features() callback.
> 3) [vhost+virtio] Get its max_queue_pairs if _F_MQ and _F_CTRL_VQ.
This should soon be replaced with a more generic num_vq interface, as
max_queue_pairs doesn't work beyond net.
There is no need to carry an ancient interface style into the newly
built vdpa stack.

> These are obtained using VHOST_VDPA_GET_CONFIG, and that request is
> forwarded to the device using get_config. QEMU expects the device to not
> change it in its lifetime.
> 4) [vhost] Vdpa set status (_S_ACKNOLEDGE, _S_DRIVER). Still no
> FEATURES_OK or DRIVER_OK. The ioctl is VHOST_VDPA_SET_STATUS, and
> the vdpa backend driver callback is set_status.
> 
> These are the steps used to initialize the device in qemu terminology, taking
> away some redundancies to make it simpler.
> 
> Now the driver sends the FEATURES_OK and the DRIVER_OK, and qemu
> detects it, so it *starts* the device.
> 
> # Starting a vhost-vdpa device
> 
> At virtio_net_vhost_status we have two important variables here:
> int cvq = _F_CTRL_VQ ? 1 : 0;
> int queue_pairs = _F_CTRL_VQ && _F_MQ ? (max_queue_pairs of step 3) :
> 0;
> 
> Now identification of the cvq index. Qemu *know* that the device will
> expose it at the last queue (max_queue_pairs*2) if _F_MQ has been
> acknowledged by the guest's driver or 2 if not. It cannot depend on any data
> sent to the device via cvq, because we couldn't get its command status on a
> change.
> 
> Now we start the vhost device. The workflow is currently:
> 
> 5) [virtio+vhost] The first step is to send the acknowledgement of the Virtio
> features and vhost/vdpa backend features to the device, so it knows how to
> configure itself. This is done using the same calls as step 4 with these feature
> bits added.
> 6) [virtio] Set the size, base, addr, kick and call fd for each queue
> (SET_VRING_ADDR, SET_VRING_NUM, ...; and forwarded with
> set_vq_address, set_vq_state, ...)
> 7) [vdpa] Send host notifiers and *send SET_VRING_ENABLE = 1* for each
> queue. This is done using ioctl VHOST_VDPA_SET_VRING_ENABLE, and
> forwarded to the vdpa backend using set_vq_ready callback.
> 8) [virtio + vdpa] Send memory translations & set DRIVER_OK.
> 
So the MQ/all-VQs setup should be done before step 8.

> If we follow the current workflow, the device is allowed now to start
> receiving only on vq pair 0, since we've still not set the multi queue pair. This
> could cause the guest to receive packets in unexpected queues, breaking
> RSS.
> 
> # Proposal
> 
> Our proposal diverge in step 7: Instead of enabling *all* the virtqueues, only
> enable the CVQ. 
Just to double check, VQ 0 and 1 of the net are also not enabled, correct?

> After that, send the DRIVER_OK and queue all the control
> commands to restore the device status (MQ, RSS, ...). Once all of them have
> been acknowledged ("device", or emulated cvq in host vdpa backend driver,
> has used all cvq buffers, enable (SET_VRING_ENABLE, set_vq_ready) all
> other queues.
> 
What is special about doing DRIVER_OK and then enqueuing the control commands?
Why can't the other configuration be applied before DRIVER_OK?

In other words,
step 7 already sets up the necessary VQ-related fields.

Before doing DRIVER_OK, what is needed is to set up any other device fields and features.
For net this includes rss, vlan and mac filters.
So, a new vdpa ioctl() should be able to set these values.
This is the ioctl() between user and kernel.
After this ioctl(), DRIVER_OK should be done, resuming the device.

The device has a full view of the config now.

This node-local device setup change should not require a migration protocol change.

This scheme will also work for non-net virtio devices.
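
(Purely as an illustration of that idea, not an existing uAPI: every
name and the ioctl number below are hypothetical.)

/* HYPOTHETICAL sketch only; no such uAPI exists today. It illustrates
 * the kind of "set device state before DRIVER_OK" ioctl described
 * above. */
#include <linux/types.h>
#include <linux/vhost.h>        /* for the VHOST_VIRTIO ioctl space */

struct vhost_vdpa_dev_state {           /* hypothetical */
	__u32 type;                     /* config space, VQ state, net filters, ... */
	__u32 len;                      /* length of the payload that follows */
	__u8  buf[];                    /* device-type-defined payload */
};

/* hypothetical ioctl, number chosen arbitrarily for this sketch */
#define VHOST_VDPA_SET_DEV_STATE _IOW(VHOST_VIRTIO, 0x7f, \
				      struct vhost_vdpa_dev_state)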

> Everything needed for this is already implemented in the kernel as far as I
> see, there is only a small modification in qemu needed. Thus achieving the
> restoring of the device state without creating maintenance burden.
> 
> A lot of optimizations can be applied on top without the need to add stuff to
> the migration protocol or vDPA uAPI, like the pre-warming of the vdpa
> queues or adding more capabilities to the emulated CVQ.
The above ioctl() will enable the vdpa subsystem to apply these settings one or more times in a pre-warming stage before DRIVER_OK.

> 
> Other optimizations like applying the state out of band can also be added so
> they can run in parallel with the migration, but that requires a bigger change
> in qemu migration protocol making us lose focus on achieving at least the
> basic device migration in my opinion.
> 
Let's strive to apply this in-band as much as possible. Applying it out of band opens up issues unrelated to migration (authentication and more).


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: About restoring the state in vhost-vdpa device
  2022-05-13 15:08   ` Parav Pandit
  (?)
@ 2022-05-13 17:48   ` Gautam Dawar
  2022-05-13 18:25       ` Parav Pandit
  -1 siblings, 1 reply; 20+ messages in thread
From: Gautam Dawar @ 2022-05-13 17:48 UTC (permalink / raw)
  To: Parav Pandit, Eugenio Perez Martin, virtualization, qemu-level,
	Jason Wang, Cindy Lu, virtio-networking, Eli Cohen,
	Laurent Vivier, Stefano Garzarella

-----Original Message-----
From: Parav Pandit <parav@nvidia.com> 
Sent: Friday, May 13, 2022 8:39 PM
To: Eugenio Perez Martin <eperezma@redhat.com>; virtualization <virtualization@lists.linux-foundation.org>; qemu-level <qemu-devel@nongnu.org>; Jason Wang <jasowang@redhat.com>; Cindy Lu <lulu@redhat.com>; Gautam Dawar <gdawar@xilinx.com>; virtio-networking@redhat.com; Eli Cohen <elic@nvidia.com>; Laurent Vivier <lvivier@redhat.com>; Stefano Garzarella <sgarzare@redhat.com>
Subject: RE: About restoring the state in vhost-vdpa device


> From: Eugenio Perez Martin <eperezma@redhat.com>
> Sent: Wednesday, May 11, 2022 3:44 PM
> 
> This is a proposal to restore the state of the vhost-vdpa device at 
> the destination after a live migration. It uses as many available 
> features both from the device and from qemu as possible so we keep the 
> communication simple and speed up the merging process.
> 
> # Initializing a vhost-vdpa device.
> 
> Without the context of live migration, the steps to initialize the 
> device from vhost-vdpa at qemu starting are:
> 1) [vhost] Open the vdpa device, Using simply open()
> 2) [vhost+virtio] Get device features. These are expected not to 
> change in the device's lifetime, so we can save them. Qemu issues a 
> VHOST_GET_FEATURES ioctl and vdpa forwards to the backend driver using
> get_device_features() callback.
> 3) [vhost+virtio] Get its max_queue_pairs if _F_MQ and _F_CTRL_VQ.
This should be soon replaced with more generic num_vq interface as max_queue_pairs don’t, work beyond net.
There is no need to continue some ancient interface way for newly built vdpa stack.

> These are obtained using VHOST_VDPA_GET_CONFIG, and that request is 
> forwarded to the device using get_config. QEMU expects the device to 
> not change it in its lifetime.
> 4) [vhost] Vdpa set status (_S_ACKNOLEDGE, _S_DRIVER). Still no 
> FEATURES_OK or DRIVER_OK. The ioctl is VHOST_VDPA_SET_STATUS, and the 
> vdpa backend driver callback is set_status.
> 
> These are the steps used to initialize the device in qemu terminology, 
> taking away some redundancies to make it simpler.
> 
> Now the driver sends the FEATURES_OK and the DRIVER_OK, and qemu 
> detects it, so it *starts* the device.
> 
> # Starting a vhost-vdpa device
> 
> At virtio_net_vhost_status we have two important variables here:
> int cvq = _F_CTRL_VQ ? 1 : 0;
> int queue_pairs = _F_CTRL_VQ && _F_MQ ? (max_queue_pairs of step 3) :
> 0;
> 
> Now identification of the cvq index. Qemu *know* that the device will 
> expose it at the last queue (max_queue_pairs*2) if _F_MQ has been 
> acknowledged by the guest's driver or 2 if not. It cannot depend on 
> any data sent to the device via cvq, because we couldn't get its 
> command status on a change.
> 
> Now we start the vhost device. The workflow is currently:
> 
> 5) [virtio+vhost] The first step is to send the acknowledgement of the 
> Virtio features and vhost/vdpa backend features to the device, so it 
> knows how to configure itself. This is done using the same calls as 
> step 4 with these feature bits added.
> 6) [virtio] Set the size, base, addr, kick and call fd for each queue 
> (SET_VRING_ADDR, SET_VRING_NUM, ...; and forwarded with 
> set_vq_address, set_vq_state, ...)
> 7) [vdpa] Send host notifiers and *send SET_VRING_ENABLE = 1* for each 
> queue. This is done using ioctl VHOST_VDPA_SET_VRING_ENABLE, and 
> forwarded to the vdpa backend using set_vq_ready callback.
> 8) [virtio + vdpa] Send memory translations & set DRIVER_OK.
> 
So MQ all VQs setup should be set before step_8.

> If we follow the current workflow, the device is allowed now to start 
> receiving only on vq pair 0, since we've still not set the multi queue 
> pair. This could cause the guest to receive packets in unexpected 
> queues, breaking RSS.
> 
> # Proposal
> 
> Our proposal diverge in step 7: Instead of enabling *all* the 
> virtqueues, only enable the CVQ.
Just to double check, VQ 0 and 1 of the net are also not enabled, correct?
[GD>>] Yes, that's my understanding as well.

> After that, send the DRIVER_OK and queue all the control commands to 
> restore the device status (MQ, RSS, ...). Once all of them have been 
> acknowledged ("device", or emulated cvq in host vdpa backend driver, 
> has used all cvq buffers, enable (SET_VRING_ENABLE, set_vq_ready) all 
> other queues.
> 
What is special about doing DRIVER_OK and enqueuing the control commands?
Why other configuration cannot be applied before DRIVER_OK?
[GD>>] For the device to be live (and any queue to be able to pass traffic), DRIVER_OK is a must. So, any configuration can be passed over the CVQ only after it is started (vring configuration + DRIVER_OK). For an emulated queue, if the order is reversed, I think the enqueued commands will remain buffered and the device should be able to service them when it goes live.

In other words,
Step_7 already setups up the necessary VQ related fields.

Before doing driver ok, what is needed is to setup any other device fields and features.
For net this includes rss, vlan, mac filters.
So, a new vdpa ioctl() should be able to set these values.
This is the ioctl() between user and kernel.
Post this ioctl(), DRIVER_OK should be done resuming the device.

Device has full view of config now.

This node local device setup change should not require migration protocol change.

This scheme will also work for non_net virtio devices too.

> Everything needed for this is already implemented in the kernel as far 
> as I see, there is only a small modification in qemu needed. Thus 
> achieving the restoring of the device state without creating maintenance burden.
> 
> A lot of optimizations can be applied on top without the need to add 
> stuff to the migration protocol or vDPA uAPI, like the pre-warming of 
> the vdpa queues or adding more capabilities to the emulated CVQ.
Above ioctl() will enable vdpa subsystem to apply this setting one mor more times in pre-warming up stage before DRIVER_OK.

> 
> Other optimizations like applying the state out of band can also be 
> added so they can run in parallel with the migration, but that 
> requires a bigger change in qemu migration protocol making us lose 
> focus on achieving at least the basic device migration in my opinion.
> 
Let's strive to apply this in-band as much as possible. Applying out of band opens issues unrelated to migration (authentication and more).


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: About restoring the state in vhost-vdpa device
  2022-05-13 17:48   ` Gautam Dawar
@ 2022-05-13 18:25       ` Parav Pandit
  0 siblings, 0 replies; 20+ messages in thread
From: Parav Pandit via Virtualization @ 2022-05-13 18:25 UTC (permalink / raw)
  To: Gautam Dawar, Eugenio Perez Martin, virtualization, qemu-level,
	Jason Wang, Cindy Lu, virtio-networking, Eli Cohen,
	Laurent Vivier, Stefano Garzarella

Hi Gautam,

Please fix your email client to use the right response format.
Otherwise it will be confusing for the rest of us to follow the conversation.

More below.

> From: Gautam Dawar <gdawar@xilinx.com>
> Sent: Friday, May 13, 2022 1:48 PM

> > Our proposal diverge in step 7: Instead of enabling *all* the
> > virtqueues, only enable the CVQ.
> Just to double check, VQ 0 and 1 of the net are also not enabled, correct?
> [GD>>] Yes, that's my understanding as well.
> 
> > After that, send the DRIVER_OK and queue all the control commands to
> > restore the device status (MQ, RSS, ...). Once all of them have been
> > acknowledged ("device", or emulated cvq in host vdpa backend driver,
> > has used all cvq buffers, enable (SET_VRING_ENABLE, set_vq_ready) all
> > other queues.
> >
> What is special about doing DRIVER_OK and enqueuing the control
> commands?
> Why other configuration cannot be applied before DRIVER_OK?
> [GD>>] For the device to be live (and any queue to be able to pass traffic),
> DRIVER_OK is a must. 
This applies only to a vdpa device implemented over a virtio device.
For such a use case/implementation a proper virtio spec extension is needed anyway for it to be efficient.
And when that happens, this scheme will still work.

Other vdpa devices don't have to live with this limitation at the moment.

> So, any configuration can be passed over the CVQ only
> after it is started (vring configuration + DRIVER_OK). For an emulated queue,
> if the order is reversed, I think the enqueued commands will remain
> buffered and device should be able to service them when it goes live.
I likely didn't understand what you describe above.

The VQ avail index etc. is programmed before doing DRIVER_OK anyway.

The sequence at the destination, from user to kernel, is very straightforward:
1. Set config space fields (such as virtio_net_config/virtio_blk_config).
2. Set other device attributes (max vqs, current num vqs)
3. Set net specific config (RSS, vlan, mac and more filters)
4. Set VQ fields (enable, msix, addr, avail idx)
5. Set DRIVER_OK, and the device resumes from where it left off

Steps #1 to #4 can be done multiple times in the pre-warm device bring-up case in the future.
For now, they can be done once to get things started.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: About restoring the state in vhost-vdpa device
  2022-05-13 18:25       ` Parav Pandit
  (?)
@ 2022-05-16  8:50       ` Eugenio Perez Martin
  2022-05-16 20:29           ` Parav Pandit
  -1 siblings, 1 reply; 20+ messages in thread
From: Eugenio Perez Martin @ 2022-05-16  8:50 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Gautam Dawar, virtualization, qemu-level, Jason Wang, Cindy Lu,
	virtio-networking, Eli Cohen, Laurent Vivier, Stefano Garzarella

On Fri, May 13, 2022 at 8:25 PM Parav Pandit <parav@nvidia.com> wrote:
>
> Hi Gautam,
>
> Please fix your email client to have right response format.
> Otherwise, it will be confusing for the rest and us to follow the conversation.
>
> More below.
>
> > From: Gautam Dawar <gdawar@xilinx.com>
> > Sent: Friday, May 13, 2022 1:48 PM
>
> > > Our proposal diverge in step 7: Instead of enabling *all* the
> > > virtqueues, only enable the CVQ.
> > Just to double check, VQ 0 and 1 of the net are also not enabled, correct?
> > [GD>>] Yes, that's my understanding as well.
> >

That's correct. We can say that for a queue to be enabled three things
must happen:
* DRIVER_OK (still not sent)
* VHOST_VDPA_SET_VRING_ENABLE sent for that queue (only sent for CVQ)
* If the queue is not in the first data queue pair and is not the cvq:
send VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET with a queue pair count that
includes it. (See the sketch below.)
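
Condensed into a hypothetical predicate, only to restate the rule
(this is not qemu code):

#include <stdbool.h>

static bool net_vq_usable(bool driver_ok, bool vring_enabled,
                          unsigned vq_idx, unsigned cvq_idx,
                          unsigned cur_queue_pairs)
{
    if (!driver_ok || !vring_enabled)
        return false;
    if (vq_idx == cvq_idx || vq_idx < 2)  /* CVQ or first data pair */
        return true;
    /* otherwise VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET must cover its pair */
    return vq_idx < cur_queue_pairs * 2;
}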

> > > After that, send the DRIVER_OK and queue all the control commands to
> > > restore the device status (MQ, RSS, ...). Once all of them have been
> > > acknowledged ("device", or emulated cvq in host vdpa backend driver,
> > > has used all cvq buffers, enable (SET_VRING_ENABLE, set_vq_ready) all
> > > other queues.
> > >
> > What is special about doing DRIVER_OK and enqueuing the control
> > commands?
> > Why other configuration cannot be applied before DRIVER_OK?

There is nothing special beyond "they have a method to be set that
way, so reusing it avoids having to maintain many ways to set the same
things, simplifying implementations".

I'm not saying "it has been like that forever so we cannot change it":
I'm very open to the change but priority-wise we should first achieve
a working LM with packed, in_order, or even indirect.

> > [GD>>] For the device to be live (and any queue to be able to pass traffic),
> > DRIVER_OK is a must.
> This applies only to the vdpa device implemented over virtio device.
> For such use case/implementation anyway a proper virtio spec extension is needed for it be efficient.
> And what that happens this scheme will still work.
>

Although it's a longer route, I'd very much prefer an in-band virtio
way to perform it rather than a linux/vdpa specific. It's one of the
reasons I prefer the CVQ behavior over a vdpa specific ioctl.

> Other vdpa devices doesn’t have to live with this limitation at this moment.
>
> > So, any configuration can be passed over the CVQ only
> > after it is started (vring configuration + DRIVER_OK). For an emulated queue,
> > if the order is reversed, I think the enqueued commands will remain
> > buffered and device should be able to service them when it goes live.
> I likely didn’t understand what you describe above.
>
> Vq avail index etc is programmed before doing DRIVER_OK anyway.
>
> Sequence is very straight forward at destination from user to kernel.
> 1. Set config space fields (such as virtio_net_config/virtio_blk_config).
> 2. Set other device attributes (max vqs, current num vqs)
> 3. Set net specific config (RSS, vlan, mac and more filters)
> 4. Set VQ fields (enable, msix, addr, avail indx)
> 5. Set DRIVER_OK, device resumes from where it left off
>
> Steps #1 to #4 can be done multiple times in pre-warm device up case in future.

That requires creating a way to set all the parameters enumerated
there on vdpa devices. Each time a new feature is added to virtio-net
that modifies the guest-visible frontend device we would need to
update it. And all the layers of the stack need to maintain more state.

From the guest's point of view, enabling all the queues with
VHOST_VDPA_SET_VRING_ENABLE and not sending DRIVER_OK is the same as
sending DRIVER_OK and not enabling any data queue with
VHOST_VDPA_SET_VRING_ENABLE. We can do all the pre-warming phase that
way too, avoiding adding more maintenance burden to vdpa.

> For now, they can be done once to get things started.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: About restoring the state in vhost-vdpa device
  2022-05-16  8:50       ` Eugenio Perez Martin
@ 2022-05-16 20:29           ` Parav Pandit
  0 siblings, 0 replies; 20+ messages in thread
From: Parav Pandit via Virtualization @ 2022-05-16 20:29 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Cindy Lu, virtio-networking, qemu-level,
	Gautam Dawar, virtualization, Eli Cohen



> From: Eugenio Perez Martin <eperezma@redhat.com>
> Sent: Monday, May 16, 2022 4:51 AM
> 
> On Fri, May 13, 2022 at 8:25 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> > Hi Gautam,
> >
> > Please fix your email client to have right response format.
> > Otherwise, it will be confusing for the rest and us to follow the
> conversation.
> >
> > More below.
> >
> > > From: Gautam Dawar <gdawar@xilinx.com>
> > > Sent: Friday, May 13, 2022 1:48 PM
> >
> > > > Our proposal diverge in step 7: Instead of enabling *all* the
> > > > virtqueues, only enable the CVQ.
> > > Just to double check, VQ 0 and 1 of the net are also not enabled, correct?
> > > [GD>>] Yes, that's my understanding as well.
> > >
> 
> That's correct. We can say that for a queue to be enabled three things must
> happen:
> * DRIVER_OK (Still not send)
> * VHOST_VDPA_SET_VRING_ENABLE sent for that queue (Only sent for
> CVQ)
> * If queue is not in first data queue pair and not cvq: send
> VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET with a queue pair that include it.
> 
These if conditions, especially the last one, are not good, as they require device-type knowledge, which in most cases is not needed.
Especially for new code.

> > > > After that, send the DRIVER_OK and queue all the control commands
> > > > to restore the device status (MQ, RSS, ...). Once all of them have
> > > > been acknowledged ("device", or emulated cvq in host vdpa backend
> > > > driver, has used all cvq buffers, enable (SET_VRING_ENABLE,
> > > > set_vq_ready) all other queues.
> > > >
> > > What is special about doing DRIVER_OK and enqueuing the control
> > > commands?
> > > Why other configuration cannot be applied before DRIVER_OK?
> 
> There is nothing special beyond "they have a method to be set that way, so
> reusing it avoids having to maintain many ways to set the same things,
> simplifying implementations".
> 
> I'm not saying "it has been like that forever so we cannot change it":
> I'm very open to the change but priority-wise we should first achieve a
> working LM with packed, in_order, or even indirect.
> 
> > > [GD>>] For the device to be live (and any queue to be able to pass
> > > traffic), DRIVER_OK is a must.
> > This applies only to the vdpa device implemented over virtio device.
> > For such use case/implementation anyway a proper virtio spec extension is
> needed for it be efficient.
> > And what that happens this scheme will still work.
> >
> 
> Although it's a longer route, I'd very much prefer an in-band virtio way to
> perform it rather than a linux/vdpa specific. It's one of the reasons I prefer
> the CVQ behavior over a vdpa specific ioctl.
> 
What is the in-band method to set last_avail_idx?
In-band virtio method doesn't exist.

> > Other vdpa devices doesn’t have to live with this limitation at this moment.
> >
> > > So, any configuration can be passed over the CVQ only after it is
> > > started (vring configuration + DRIVER_OK). For an emulated queue, if
> > > the order is reversed, I think the enqueued commands will remain
> > > buffered and device should be able to service them when it goes live.
> > I likely didn’t understand what you describe above.
> >
> > Vq avail index etc is programmed before doing DRIVER_OK anyway.
> >
> > Sequence is very straight forward at destination from user to kernel.
> > 1. Set config space fields (such as virtio_net_config/virtio_blk_config).
> > 2. Set other device attributes (max vqs, current num vqs) 3. Set net
> > specific config (RSS, vlan, mac and more filters) 4. Set VQ fields
> > (enable, msix, addr, avail indx) 5. Set DRIVER_OK, device resumes from
> > where it left off
> >
> > Steps #1 to #4 can be done multiple times in pre-warm device up case in
> future.
> 
> That requires creating a way to set all the parameters enumerated there to
> vdpa devices. Each time a new feature is added to virtio-net that modifies
> the guest-visible fronted device we would need to update it. 
For any guest-visible feature exposed by the vdpa device on the source side, a migration agent needs to verify it and make the destination device capable of supporting it anyway. Without that, a device may be migrated, but it won't perform at the same level as the source.

> And all the
> layers of the stack need to maintain more state.
Mostly not. The complete virtio device state arriving from the source vdpa device can be given to the destination vdpa device without anyone else looking at it in the middle, if this format is known/well defined.

> 
> From the guest point of view, to enable all the queues with
> VHOST_VDPA_SET_VRING_ENABLE and don't send DRIVER_OK is the same
> as send DRIVER_OK and not to enable any data queue with
> VHOST_VDPA_SET_VRING_ENABLE. 

Doing SET_VRING_ENABLE after DRIVER_OK has two basic things broken.
1. The supplied RSS config and VQ config are not honored for tens to hundreds of milliseconds.
This depends purely on how/when these ioctls are made.
Due to this behavior, a packet supposed to arrive in VQ X arrives in VQ Y instead.

2. Enabling each VQ one at a time requires a constant steering update for that VQ,
while this information is something already known. Trying to reuse the existing callback results in this inefficiency.
So it is better to start with more reusable APIs that fit the LM flow.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: About restoring the state in vhost-vdpa device
  2022-05-16 20:29           ` Parav Pandit
@ 2022-05-17  3:05             ` Jason Wang
  -1 siblings, 0 replies; 20+ messages in thread
From: Jason Wang @ 2022-05-17  3:05 UTC (permalink / raw)
  To: Parav Pandit, Eugenio Perez Martin
  Cc: Laurent Vivier, Cindy Lu, virtio-networking, qemu-level,
	Gautam Dawar, virtualization, Eli Cohen


On 2022/5/17 04:29, Parav Pandit wrote:
>
>> From: Eugenio Perez Martin <eperezma@redhat.com>
>> Sent: Monday, May 16, 2022 4:51 AM
>>
>> On Fri, May 13, 2022 at 8:25 PM Parav Pandit <parav@nvidia.com> wrote:
>>> Hi Gautam,
>>>
>>> Please fix your email client to have right response format.
>>> Otherwise, it will be confusing for the rest and us to follow the
>> conversation.
>>> More below.
>>>
>>>> From: Gautam Dawar <gdawar@xilinx.com>
>>>> Sent: Friday, May 13, 2022 1:48 PM
>>>>> Our proposal diverge in step 7: Instead of enabling *all* the
>>>>> virtqueues, only enable the CVQ.
>>>> Just to double check, VQ 0 and 1 of the net are also not enabled, correct?
>>>> [GD>>] Yes, that's my understanding as well.
>>>>
>> That's correct. We can say that for a queue to be enabled three things must
>> happen:
>> * DRIVER_OK (Still not send)
>> * VHOST_VDPA_SET_VRING_ENABLE sent for that queue (Only sent for
>> CVQ)
>> * If queue is not in first data queue pair and not cvq: send
>> VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET with a queue pair that include it.
>>
> These if conditions, specially the last one is not good as it requires device type knowledge, which in most cases not needed.
> Specially for the new code.
>
>>>>> After that, send the DRIVER_OK and queue all the control commands
>>>>> to restore the device status (MQ, RSS, ...). Once all of them have
>>>>> been acknowledged ("device", or emulated cvq in host vdpa backend
>>>>> driver, has used all cvq buffers, enable (SET_VRING_ENABLE,
>>>>> set_vq_ready) all other queues.
>>>>>
>>>> What is special about doing DRIVER_OK and enqueuing the control
>>>> commands?
>>>> Why other configuration cannot be applied before DRIVER_OK?
>> There is nothing special beyond "they have a method to be set that way, so
>> reusing it avoids having to maintain many ways to set the same things,
>> simplifying implementations".
>>
>> I'm not saying "it has been like that forever so we cannot change it":
>> I'm very open to the change but priority-wise we should first achieve a
>> working LM with packed, in_order, or even indirect.
>>
>>>> [GD>>] For the device to be live (and any queue to be able to pass
>>>> traffic), DRIVER_OK is a must.
>>> This applies only to the vdpa device implemented over virtio device.
>>> For such use case/implementation anyway a proper virtio spec extension is
>> needed for it be efficient.
>>> And what that happens this scheme will still work.
>>>
>> Although it's a longer route, I'd very much prefer an in-band virtio way to
>> perform it rather than a linux/vdpa specific. It's one of the reasons I prefer
>> the CVQ behavior over a vdpa specific ioctl.
>>
> What is the in-band method to set last_avail_idx?
> In-band virtio method doesn't exist.


Right, but it's part of the vhost API, which has been there for more than 10
years. This should be supported by all the vDPA vendors.


>
>>> Other vdpa devices doesn’t have to live with this limitation at this moment.
>>>
>>>> So, any configuration can be passed over the CVQ only after it is
>>>> started (vring configuration + DRIVER_OK). For an emulated queue, if
>>>> the order is reversed, I think the enqueued commands will remain
>>>> buffered and device should be able to service them when it goes live.
>>> I likely didn’t understand what you describe above.
>>>
>>> Vq avail index etc is programmed before doing DRIVER_OK anyway.
>>>
>>> Sequence is very straight forward at destination from user to kernel.
>>> 1. Set config space fields (such as virtio_net_config/virtio_blk_config).
> > > 2. Set other device attributes (max vqs, current num vqs)
> > > 3. Set net specific config (RSS, vlan, mac and more filters)
> > > 4. Set VQ fields (enable, msix, addr, avail indx)
> > > 5. Set DRIVER_OK, device resumes from where it left off
> > >
> > > Steps #1 to #4 can be done multiple times in pre-warm device up case in future.
>>
>> That requires creating a way to set all the parameters enumerated there to
>> vdpa devices. Each time a new feature is added to virtio-net that modifies
>> the guest-visible fronted device we would need to update it.
> Any guest visible feature exposed by the vdpa device on the source side, a migration agent needs to verify/and make destination device capable to support it anyway. Without it a device may be migrated, but it won't perform at same level as source.
>
>> And all the
>> layers of the stack need to maintain more state.
> Mostly not. A complete virtio device state arrived from source vdpa device can be given to destination vdpa device without anyone else looking in the middle. If this format is known/well defined.


That's fine, and it seems the virtio spec is a better place for this;
then we won't duplicate efforts?


>
>>  From the guest point of view, to enable all the queues with
>> VHOST_VDPA_SET_VRING_ENABLE and don't send DRIVER_OK is the same
>> as send DRIVER_OK and not to enable any data queue with
>> VHOST_VDPA_SET_VRING_ENABLE.
> Enabling SET_VRING_ENABLE after DRIVER_OK has two basic things broken.


It looks to me that the spec:

1) for PCI, doesn't forbid the driver to set queue_enable to 1 after
DRIVER_OK;
2) for MMIO, even allows the driver to disable a queue after DRIVER_OK.


> 1. supplied RSS config and VQ config is not honored for several tens of hundreds of milliseconds
> It will be purely dependent on how/when this ioctl are made.
> Due to this behavior packet supposed to arrive in X VQ, arrives in Y VQ.


I don't get why we end up with this situation.

1) enable cvq
2) set driver_ok
3) set RSS
4) enable TX/RX

vs

1) set RSS
2) enable cvq
3) enable TX/RX
4) set driver_ok

Is the latter faster?


>
> 2. Each VQ enablement one at a time, requires constant steering update for the VQ
> While this information is something already known. Trying to reuse brings a callback result in this in-efficiency.
> So better to start with more reusable APIs that fits the LM flow.


I agree, but the method proposed in the mail seems to be the only way
that can work with all the major vDPA vendors.

E.g. the new API requires that the device has the ability to receive
device state other than via the control virtqueue, which might not be
supported by the hardware. (The device might expect a trap-and-emulate
model rather than save-and-restore.)

From qemu's point of view, it might need to support both models.

If the device can't do save and restore:

1.1) enable cvq
1.2) set driver_ok
1.3) set device state (MQ, RSS) via control vq
1.4) enable TX/RX

If the device can do save and restore:

2.1) set device state (new API for setting MQ,RSS)
2.2) enable cvq
2.3) enable TX/RX
2.4) set driver_ok

We can start from 1, since it works for all devices, and then add
support for 2?
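
As a rough sketch of flow 1 from the backend side (vhost_vdpa_cvq_add()
only stands in for however the shadow control virtqueue injects a
command and waits for its ack, it is not an existing qemu function;
error handling omitted):

#include <stddef.h>
#include <stdint.h>
#include <endian.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>
#include <linux/virtio_config.h>
#include <linux/virtio_net.h>

/* Stand-in for the shadow CVQ: queue a command and wait for the ack. */
extern void vhost_vdpa_cvq_add(uint8_t ctrl_class, uint8_t cmd,
                               const void *data, size_t len);

static void restore_via_cvq(int vdpa_fd, __u8 status, unsigned int data_vqs,
                            unsigned int cvq_index, uint16_t curr_queue_pairs)
{
    struct vhost_vring_state enable_cvq = { .index = cvq_index, .num = 1 };
    struct virtio_net_ctrl_mq mq = {
        .virtqueue_pairs = htole16(curr_queue_pairs),
    };

    /* 1.1: enable only the control virtqueue */
    ioctl(vdpa_fd, VHOST_VDPA_SET_VRING_ENABLE, &enable_cvq);

    /* 1.2: let the device go live; only the CVQ can move buffers yet */
    status |= VIRTIO_CONFIG_S_DRIVER_OK;
    ioctl(vdpa_fd, VHOST_VDPA_SET_STATUS, &status);

    /* 1.3: replay device state (MQ shown; RSS, MAC, ... use the same path) */
    vhost_vdpa_cvq_add(VIRTIO_NET_CTRL_MQ, VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET,
                       &mq, sizeof(mq));

    /* 1.4: only now enable the data virtqueues */
    for (unsigned int i = 0; i < data_vqs; i++) {
        struct vhost_vring_state enable = { .index = i, .num = 1 };

        ioctl(vdpa_fd, VHOST_VDPA_SET_VRING_ENABLE, &enable);
    }
}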

Thanks



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: About restoring the state in vhost-vdpa device
  2022-05-16 20:29           ` Parav Pandit
@ 2022-05-17  8:12           ` Eugenio Perez Martin
  2022-05-18 12:51               ` Parav Pandit
  -1 siblings, 1 reply; 20+ messages in thread
From: Eugenio Perez Martin @ 2022-05-17  8:12 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Gautam Dawar, virtualization, qemu-level, Jason Wang, Cindy Lu,
	virtio-networking, Eli Cohen, Laurent Vivier, Stefano Garzarella

On Mon, May 16, 2022 at 10:29 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Eugenio Perez Martin <eperezma@redhat.com>
> > Sent: Monday, May 16, 2022 4:51 AM
> >
> > On Fri, May 13, 2022 at 8:25 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > > Hi Gautam,
> > >
> > > Please fix your email client to have right response format.
> > > Otherwise, it will be confusing for the rest and us to follow the
> > conversation.
> > >
> > > More below.
> > >
> > > > From: Gautam Dawar <gdawar@xilinx.com>
> > > > Sent: Friday, May 13, 2022 1:48 PM
> > >
> > > > > Our proposal diverge in step 7: Instead of enabling *all* the
> > > > > virtqueues, only enable the CVQ.
> > > > Just to double check, VQ 0 and 1 of the net are also not enabled, correct?
> > > > [GD>>] Yes, that's my understanding as well.
> > > >
> >
> > That's correct. We can say that for a queue to be enabled three things must
> > happen:
> > * DRIVER_OK (Still not send)
> > * VHOST_VDPA_SET_VRING_ENABLE sent for that queue (Only sent for
> > CVQ)
> > * If queue is not in first data queue pair and not cvq: send
> > VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET with a queue pair that include it.
> >
> These if conditions, specially the last one is not good as it requires device type knowledge, which in most cases not needed.
> Specially for the new code.
>

DRIVER_OK and SET_VRING_ENABLE are the current way to do so, and all
kinds of vdpa devices are running this way. For the last conditional I
meant only -net; -console has VIRTIO_CONSOLE_F_MULTIPORT, and so on.

But the point was that this is already in the standard and integrated
in the devices: this is not part of the proposal, but of the introduction
explaining how the devices work at this moment. We can try to optimize
this flow for sure, but it's a different discussion.

> > > > > After that, send the DRIVER_OK and queue all the control commands
> > > > > to restore the device status (MQ, RSS, ...). Once all of them have
> > > > > been acknowledged ("device", or emulated cvq in host vdpa backend
> > > > > driver, has used all cvq buffers, enable (SET_VRING_ENABLE,
> > > > > set_vq_ready) all other queues.
> > > > >
> > > > What is special about doing DRIVER_OK and enqueuing the control
> > > > commands?
> > > > Why other configuration cannot be applied before DRIVER_OK?
> >
> > There is nothing special beyond "they have a method to be set that way, so
> > reusing it avoids having to maintain many ways to set the same things,
> > simplifying implementations".
> >
> > I'm not saying "it has been like that forever so we cannot change it":
> > I'm very open to the change but priority-wise we should first achieve a
> > working LM with packed, in_order, or even indirect.
> >
> > > > [GD>>] For the device to be live (and any queue to be able to pass
> > > > traffic), DRIVER_OK is a must.
> > > This applies only to the vdpa device implemented over virtio device.
> > > For such use case/implementation anyway a proper virtio spec extension is
> > needed for it be efficient.
> > > And what that happens this scheme will still work.
> > >
> >
> > Although it's a longer route, I'd very much prefer an in-band virtio way to
> > perform it rather than a linux/vdpa specific. It's one of the reasons I prefer
> > the CVQ behavior over a vdpa specific ioctl.
> >
> What is the in-band method to set last_avail_idx?
> In-band virtio method doesn't exist.
>

There isn't; I was acknowledging your point about "a proper virtio
spec extension is needed for it to be efficient".

The intended method is VHOST_SET_VRING_BASE, supported by vhost device
types, as Jason pointed out in his mail. It has been supported for a
long time. In my opinion we should make it an in-band virtio operation
too, since being vhost-only is a limit for some devices.
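
For reference, restoring the avail index at the destination boils down
to this (vdpa_fd and the saved value are placeholders, error handling
omitted):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

static int restore_vq_base(int vdpa_fd, unsigned int vq_index,
                           uint16_t last_avail_idx)
{
    struct vhost_vring_state base = {
        .index = vq_index,
        /* value obtained with VHOST_GET_VRING_BASE on the source */
        .num = last_avail_idx,
    };

    return ioctl(vdpa_fd, VHOST_SET_VRING_BASE, &base);
}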

> > > Other vdpa devices doesn’t have to live with this limitation at this moment.
> > >
> > > > So, any configuration can be passed over the CVQ only after it is
> > > > started (vring configuration + DRIVER_OK). For an emulated queue, if
> > > > the order is reversed, I think the enqueued commands will remain
> > > > buffered and device should be able to service them when it goes live.
> > > I likely didn’t understand what you describe above.
> > >
> > > Vq avail index etc is programmed before doing DRIVER_OK anyway.
> > >
> > > Sequence is very straight forward at destination from user to kernel.
> > > 1. Set config space fields (such as virtio_net_config/virtio_blk_config).
> > > 2. Set other device attributes (max vqs, current num vqs)
> > > 3. Set net specific config (RSS, vlan, mac and more filters)
> > > 4. Set VQ fields (enable, msix, addr, avail indx)
> > > 5. Set DRIVER_OK, device resumes from where it left off
> > >
> > > Steps #1 to #4 can be done multiple times in pre-warm device up case in future.
> >
> > That requires creating a way to set all the parameters enumerated there to
> > vdpa devices. Each time a new feature is added to virtio-net that modifies
> > the guest-visible fronted device we would need to update it.
> Any guest visible feature exposed by the vdpa device on the source side, a migration agent needs to verify/and make destination device capable to support it anyway. Without it a device may be migrated, but it won't perform at same level as source.
>

We can discuss how to reach that point, but it's not the state at this
moment. And I doubt we can reach it before the next kernel merge
window.

> > And all the
> > layers of the stack need to maintain more state.
> Mostly not. A complete virtio device state arrived from source vdpa device can be given to destination vdpa device without anyone else looking in the middle. If this format is known/well defined.
>

I'm not sure I follow this. If you mean that qemu defines a format for
migration data, that is not 100% true: it changes between versions,
and you cannot migrate between two versions of qemu that are too far
apart without going through an intermediate version, if I'm not
wrong. Migration bugs are solved by changing that format, since qemu
does not need interoperability with other VMMs at the moment.

The boundary between qemu and devices needs more restrictions than
that for interoperability.

I agree we could define a format for the virtio device state, but I
think the right place is the virtio standard, not the vDPA layer. If
we do it at the vDPA layer, we are "repeating the mistake" of
VHOST_SET_VRING_BASE: we need to maintain two ways to perform the same
action.

> >
> > From the guest point of view, to enable all the queues with
> > VHOST_VDPA_SET_VRING_ENABLE and don't send DRIVER_OK is the same
> > as send DRIVER_OK and not to enable any data queue with
> > VHOST_VDPA_SET_VRING_ENABLE.
>
> Enabling SET_VRING_ENABLE after DRIVER_OK has two basic things broken.
> 1. supplied RSS config and VQ config is not honored for several tens of hundreds of milliseconds
> It will be purely dependent on how/when this ioctl are made.

We can optimize this without adding burden to the API.

> Due to this behavior packet supposed to arrive in X VQ, arrives in Y VQ.
>

I'm not sure why that happens.

By the standard:
After the driver transmitted a packet of a flow on transmitqX, the
device SHOULD cause incoming packets for that flow to be steered to
receiveqX. For uni-directional protocols, or where no packets have
been transmitted yet, the device MAY steer a packet to a random queue
out of the specified receiveq1…receiveqn.

The device MUST NOT queue packets on receive queues greater than
virtqueue_pairs once it has placed the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
command in a used buffer.

It doesn't say the requests must be migrated from one queue to another
beyond the interpretation of the SHOULD. Maybe we need to restrict the
standard more to reduce the differences between the devices?
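
For reference, the command we are discussing is tiny; what travels on
the CVQ is essentially the following (types from linux/virtio_net.h;
grouping it into one struct is only for illustration, the ack really is
a separate device-writable buffer):

#include <linux/virtio_net.h>

/* Conceptual layout of a VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command. */
struct mq_cmd {
    struct virtio_net_ctrl_hdr hdr; /* class = VIRTIO_NET_CTRL_MQ,
                                       cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET */
    struct virtio_net_ctrl_mq mq;   /* virtqueue_pairs, little-endian,
                                       1..max_virtqueue_pairs */
    __u8 ack;                       /* written back by the device:
                                       VIRTIO_NET_OK or VIRTIO_NET_ERR */
};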

> 2. Each VQ enablement one at a time, requires constant steering update for the VQ
> While this information is something already known. Trying to reuse brings a callback result in this in-efficiency.
> So better to start with more reusable APIs that fits the LM flow.

We can change to that model later. Since the model proposed by us does
not add any burden, we can discard it down the road if something
better arises. The proposed behavior should already work for all
devices: it comes for free regarding kernel / vdpa code.

I think that doing it at the vhost/vDPA level is going to cause the same
problem as VHOST_SET_VRING_BASE: we will need to maintain two ways of
performing the same action, and the code will need to synchronize them.
I'm not *against* adding it by itself, I'm just considering it an
optimization that needs to be balanced against what already enables
the device to perform state restoring.

Thanks!



^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: About restoring the state in vhost-vdpa device
  2022-05-17  3:05             ` Jason Wang
@ 2022-05-18 12:43               ` Parav Pandit
  -1 siblings, 0 replies; 20+ messages in thread
From: Parav Pandit via Virtualization @ 2022-05-18 12:43 UTC (permalink / raw)
  To: Jason Wang, Eugenio Perez Martin
  Cc: Laurent Vivier, Cindy Lu, virtio-networking, qemu-level,
	Gautam Dawar, virtualization, Eli Cohen


> From: Jason Wang <jasowang@redhat.com>
> Sent: Monday, May 16, 2022 11:05 PM
> >> Although it's a longer route, I'd very much prefer an in-band virtio
> >> way to perform it rather than a linux/vdpa specific. It's one of the
> >> reasons I prefer the CVQ behavior over a vdpa specific ioctl.
> >>
> > What is the in-band method to set last_avail_idx?
> > In-band virtio method doesn't exist.
> 
> 
> Right, but it's part of the vhost API which was there for more than 10 years.
> This should be supported by all the vDPA vendors.
Sure. My point to Eugenio was that vdpa doesn't have to be limited by the virtio spec.
Plumbing exists to make vdpa work without the virtio spec.
And hence, an additional ioctl can be OK.

> >> layers of the stack need to maintain more state.
> > Mostly not. A complete virtio device state arrived from source vdpa device
> can be given to destination vdpa device without anyone else looking in the
> middle. If this format is known/well defined.
> 
> 
> That's fine, and it seems the virtio spec is a better place for this,
> then we won't duplicate efforts?
> 
Yes. For the vDPA kernel, setting parameters doesn't need a virtio spec update.
It is similar to setting the avail index.

> 
> >
> >>  From the guest point of view, to enable all the queues with
> >> VHOST_VDPA_SET_VRING_ENABLE and don't send DRIVER_OK is the
> same
> >> as send DRIVER_OK and not to enable any data queue with
> >> VHOST_VDPA_SET_VRING_ENABLE.
> > Enabling SET_VRING_ENABLE after DRIVER_OK has two basic things
> broken.
> 
> 
> It looks to me the spec:
> 
> 1) For PCI it doesn't forbid the driver to set queue_enable to 1 after
> DRIVER_OK.
The device init sequence sort of hints that vq setup should be done before driver_ok, in the snippet below.

"Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup,
reading and possibly writing the device’s virtio configuration space, and population of virtqueues."

Even if we assume for a moment that a queue can be enabled after driver_ok, traffic ends up going to the incorrect queue,
because the queue it was supposed to go to is not enabled and its RSS is not set up.

So in the restore flow it is desirable to set the needed config before doing driver_ok.

> 2) For MMIO, it even allows the driver to disable a queue after DRIVER_OK
> 
> 
> > 1. supplied RSS config and VQ config is not honored for several tens of
> hundreds of milliseconds
> > It will be purely dependent on how/when this ioctl are made.
> > Due to this behavior packet supposed to arrive in X VQ, arrives in Y VQ.
> 
> 
> I don't get why we end up with this situation.
> 
> 1) enable cvq
> 2) set driver_ok
> 3) set RSS
> 4) enable TX/RX
> 
> vs
> 
> 1) set RSS
> 2) enable cvq
> 3) enable TX/RX
> 4) set driver_ok
> 
> Is the latter faster?
> 
Yes, because the latter sequence has the ability to set up the steering config once,
as opposed to the first sequence, which needs to incrementally update the RSS setting on every new queue addition in step #4.

> 
> >
> > 2. Each VQ enablement one at a time, requires constant steering update
> for the VQ
> > While this information is something already known. Trying to reuse brings a
> callback result in this in-efficiency.
> > So better to start with more reusable APIs that fits the LM flow.
> 
> 
> I agree, but the method proposed in the mail seems to be the only way
> that can work with the all the major vDPA vendors.
> 
> E.g the new API requires the device has the ability to receive device
> state other than the control virtqueue which might not be supported the
> hardware. (The device might expects a trap and emulate model rather than
> save and restore).
> 
How a given vendor returns the values is up to the vendor-specific vdpa driver, just like avail_index, which does not come through the CVQ.

>  From qemu point of view, it might need to support both models.
> 
> If the device can't do save and restore:
> 
> 1.1) enable cvq
> 1.2) set driver_ok
> 1.3) set device state (MQ, RSS) via control vq
> 1.4) enable TX/RX
> 
> If the device can do save and restore:
> 
> 2.1) set device state (new API for setting MQ,RSS)
> 2.2) enable cvq
> 2.3) enable TX?RX
> 2.4) set driver_ok
> 
> We can start from 1 since it works for all device and then adding
> support for 2?
> 

How about:
3.1) create a cvq for the supported device
The cvq is not exposed to user space; it stays in the kernel, created by the vdpa driver.

3.2) set device state (MQ, RSS) via a user->kernel ioctl(); a rough sketch of its shape follows below
The vdpa driver internally decides whether to use the cvq or something else (like the avail index).

3.3) enable tx/rx
3.4) set driver_ok
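
Only to show the shape of 3.2, something like the following hypothetical
uAPI (neither VHOST_VDPA_SET_DEV_STATE nor struct vhost_vdpa_dev_state
exists today; this is not a proposal for the exact encoding):

#include <linux/types.h>

/* HYPOTHETICAL: not in linux/vhost.h, shown only to illustrate the idea. */
struct vhost_vdpa_dev_state {
    __u32 len;   /* size of the opaque device state blob */
    __u64 addr;  /* userspace pointer to the blob saved from the source */
};

/*
 * #define VHOST_VDPA_SET_DEV_STATE \
 *         _IOW(VHOST_VIRTIO, 0x80, struct vhost_vdpa_dev_state)
 *
 * The vendor vdpa driver unpacks the blob and applies MQ/RSS/filters
 * itself (via its own CVQ, firmware commands, ...) before driver_ok.
 */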

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: About restoring the state in vhost-vdpa device
  2022-05-17  8:12           ` Eugenio Perez Martin
@ 2022-05-18 12:51               ` Parav Pandit
  0 siblings, 0 replies; 20+ messages in thread
From: Parav Pandit via Virtualization @ 2022-05-18 12:51 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Cindy Lu, virtio-networking, qemu-level,
	Gautam Dawar, virtualization, Eli Cohen


> From: Eugenio Perez Martin <eperezma@redhat.com>
> Sent: Tuesday, May 17, 2022 4:12 AM
 
> > 2. Each VQ enablement one at a time, requires constant steering update
> > for the VQ While this information is something already known. Trying to
> reuse brings a callback result in this in-efficiency.
> > So better to start with more reusable APIs that fits the LM flow.
> 
> We can change to that model later. Since the model proposed by us does not
> add any burden, we can discard it down the road if something better arises.
> The proposed behavior should already work for all
> devices: It comes for free regarding kernel / vdpa code.
It is not for free.
It comes with higher LM downtime.
And that makes it unusable as the number of queues scales.

> 
> I think that doing at vhost/vDPA level is going to cause the same problem as
> VRING_SET_BASE: We will need to maintain two ways of performing the
> same, and the code will need to synchronize them. I'm not *against* adding
> it by itself, I'm just considering it an optimization that needs to be balanced
> against what already enables the device to perform state restoring.

We only need to change the sequencing of how we restore, and abstract out how to restore in the vdpa layer.
Whether that is the CVQ or something else is a choice internal to the vdpa vendor driver.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: About restoring the state in vhost-vdpa device
  2022-05-18 12:43               ` Parav Pandit
@ 2022-05-23  3:12                 ` Jason Wang
  -1 siblings, 0 replies; 20+ messages in thread
From: Jason Wang @ 2022-05-23  3:12 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Laurent Vivier, Cindy Lu, piotr.uminski, virtio-networking,
	qemu-level, Gautam Dawar, virtualization, Eugenio Perez Martin,
	Eli Cohen, Zhu Lingshan

On Wed, May 18, 2022 at 8:44 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Monday, May 16, 2022 11:05 PM
> > >> Although it's a longer route, I'd very much prefer an in-band virtio
> > >> way to perform it rather than a linux/vdpa specific. It's one of the
> > >> reasons I prefer the CVQ behavior over a vdpa specific ioctl.
> > >>
> > > What is the in-band method to set last_avail_idx?
> > > In-band virtio method doesn't exist.
> >
> >
> > Right, but it's part of the vhost API which was there for more than 10 years.
> > This should be supported by all the vDPA vendors.
> Sure. My point to Eugenio was that vdpa doesn’t have to limited by virtio spec.

Yes, but the APIs only consist of the ones that are already
supported by either virtio or vhost.

> Plumbing exists to make vdpa work without virtio spec.
> And hence, additional ioctl can be ok.
>
> > >> layers of the stack need to maintain more state.
> > > Mostly not. A complete virtio device state arrived from source vdpa device
> > can be given to destination vdpa device without anyone else looking in the
> > middle. If this format is known/well defined.
> >
> >
> > That's fine, and it seems the virtio spec is a better place for this,
> > then we won't duplicate efforts?
> >
> Yes. for VDPA kernel, setting parameters doesn’t need virtio spec update.

We've already had a spec patch for this.

> It is similar to avail index setting.

Yes, but we don't want to diverge too much from the virtio spec
unless we have a very strong point. The reason is that it would be
challenging to offer forward compatibility with the future spec support
for device state restoring. That's why we tend to reuse the existing
APIs so far.

>
> >
> > >
> > >>  From the guest point of view, to enable all the queues with
> > >> VHOST_VDPA_SET_VRING_ENABLE and don't send DRIVER_OK is the
> > same
> > >> as send DRIVER_OK and not to enable any data queue with
> > >> VHOST_VDPA_SET_VRING_ENABLE.
> > > Enabling SET_VRING_ENABLE after DRIVER_OK has two basic things
> > broken.
> >
> >
> > It looks to me the spec:
> >
> > 1) For PCI it doesn't forbid the driver to set queue_enable to 1 after
> > DRIVER_OK.
> Device init sequence sort of hints that vq setup should be done before driver_ok in below snippet.
>
> "Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup,
> reading and possibly writing the device’s virtio configuration space, and population of virtqueues."
>
> For a moment even if we assume, that queue can be enabled after driver_ok, it ends up going to incorrect queue.

For RSS, the device can choose to drop the packet if the destination
queue is not enabled; we can clarify this in the spec. Actually,
there's a patch that has already clarified that the packet should be
dropped if the queue is reset:

https://lists.oasis-open.org/archives/virtio-dev/202204/msg00063.html

We need to do something similar for queue_enable in this case. Then we
are all fine.

And the overhead of the incremental ioctls is not something that
can't be addressed; we can introduce e.g. a new command to disable the
datapath by setting num_queue_pairs to 0.

> Because the queue where it supposed to go, it not enabled and its rss is not setup.

So queue_enable and num_queue_pairs work at different levels.
Queue_enable works at the general virtio level, but num_queue_pairs
works for networking only.

From the spec, it allows the following setup:

1) enable the 1st queue pair
2) set driver_ok
3) set 4 queue pairs

The device is expected to deal with this setup anyhow.

>
> So on restore flow it is desired to set needed config before doing driver_ok.
>
> > 2) For MMIO, it even allows the driver to disable a queue after DRIVER_OK
> >
> >
> > > 1. supplied RSS config and VQ config is not honored for several tens of
> > hundreds of milliseconds
> > > It will be purely dependent on how/when this ioctl are made.
> > > Due to this behavior packet supposed to arrive in X VQ, arrives in Y VQ.
> >
> >
> > I don't get why we end up with this situation.
> >
> > 1) enable cvq
> > 2) set driver_ok
> > 3) set RSS
> > 4) enable TX/RX
> >
> > vs
> >
> > 1) set RSS
> > 2) enable cvq
> > 3) enable TX/RX
> > 4) set driver_ok
> >
> > Is the latter faster?
> >
> Yes, because later sequence has the ability to setup steering config once.
> As opposed to that first sequence needs to incrementally update the rss setting on every new queue addition on step #4.

So, as I understand it, the device RX flow is something like:

Incoming packet -> steering[1] -> queue_enable[2] -> start DMA

The steering[1] is expected to be configured once, when setting either MQ or RSS.
The queue_enable[2] check is an independent check after steering.

We should only start the DMA after both [1] and [2]. So the only
incremental thing is queue_enable[2].
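
In pseudo-C, only to make that ordering explicit (steer(), vq[] and the
helpers are illustrative, not any vendor's implementation):

/* Minimal stubs so the sketch is self-contained. */
struct pkt;
struct vq { int enabled; };
extern struct vq vq[];
extern int steer(struct pkt *p);                  /* [1] RSS/MQ lookup */
extern void drop(struct pkt *p);
extern void dma_to_guest(struct pkt *p, struct vq *q);

static void device_rx(struct pkt *p)
{
    int q = steer(p);            /* steering table programmed once (MQ/RSS) */

    if (!vq[q].enabled)          /* independent per-queue queue_enable check */
        drop(p);
    else
        dma_to_guest(p, &vq[q]); /* DMA starts only after both checks */
}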

I would try Eugenio's proposal and see how it works; anyhow, it's the
cheapest way so far (without introducing new ioctls etc.). My
understanding is that it should work (probably with some overhead on
the queue_enable step) for all vendors so far. If it doesn't, we can add
new ioctls.

>
> >
> > >
> > > 2. Each VQ enablement one at a time, requires constant steering update
> > for the VQ
> > > While this information is something already known. Trying to reuse brings a
> > callback result in this in-efficiency.
> > > So better to start with more reusable APIs that fits the LM flow.
> >
> >
> > I agree, but the method proposed in the mail seems to be the only way
> > that can work with the all the major vDPA vendors.
> >
> > E.g the new API requires the device has the ability to receive device
> > state other than the control virtqueue which might not be supported the
> > hardware. (The device might expects a trap and emulate model rather than
> > save and restore).
> >
> How a given vendor to return the values is in the vendor specific vdpa driver, just like avail_index which is not coming through the CVQ.

The problem is:

1) at the virtqueue level, we know the index (and the inflight buffers)
are the only state
2) at the device level, we know there is a lot of other state;
inventing new ioctls might require very careful design, which may take
time

>
> >  From qemu point of view, it might need to support both models.
> >
> > If the device can't do save and restore:
> >
> > 1.1) enable cvq
> > 1.2) set driver_ok
> > 1.3) set device state (MQ, RSS) via control vq
> > 1.4) enable TX/RX
> >
> > If the device can do save and restore:
> >
> > 2.1) set device state (new API for setting MQ,RSS)
> > 2.2) enable cvq
> > 2.3) enable TX?RX
> > 2.4) set driver_ok
> >
> > We can start from 1 since it works for all device and then adding
> > support for 2?
> >
>
> How about:
> 3.1) create cvq for the supported device
> Cvq not exposed to user space, stays in the kernel. Vdpa driver created it.

If it's a hardware CVQ, we may still need to set DRIVER_OK before
setting the states.

>
> 3.2) set device state (MQ, RSS) comes via user->kernel ioctl()
> Vdpa driver internally decides whether to use cvq or something else (like avail index).
>

I think this is similar to method 2. The challenge is that the ioctl()
is not as flexible as a queue, e.g. we want a device-agnostic API (and
if we had one, it could be defined in the virtio spec). At the API
level, it doesn't differ too much whether it's an ioctl or a queue. If
we decide to go this way, we can simply insist that the CVQ can
work before DRIVER_OK (gated by a backend feature anyhow).

3.1) enable cvq
3.2) set device state (RSS,MQ) via CVQ
3.3) enable tx/rx
3.4) set driver_ok

Thanks


> 3.3) enable tx/rx
> 3.4) set driver_ok


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-05-23  3:14 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-11 19:43 About restoring the state in vhost-vdpa device Eugenio Perez Martin
2022-05-12  4:00 ` Jason Wang
2022-05-12  4:00   ` Jason Wang
2022-05-13 15:08 ` Parav Pandit via Virtualization
2022-05-13 15:08   ` Parav Pandit
2022-05-13 17:48   ` Gautam Dawar
2022-05-13 18:25     ` Parav Pandit via Virtualization
2022-05-13 18:25       ` Parav Pandit
2022-05-16  8:50       ` Eugenio Perez Martin
2022-05-16 20:29         ` Parav Pandit via Virtualization
2022-05-16 20:29           ` Parav Pandit
2022-05-17  3:05           ` Jason Wang
2022-05-17  3:05             ` Jason Wang
2022-05-18 12:43             ` Parav Pandit via Virtualization
2022-05-18 12:43               ` Parav Pandit
2022-05-23  3:12               ` Jason Wang
2022-05-23  3:12                 ` Jason Wang
2022-05-17  8:12           ` Eugenio Perez Martin
2022-05-18 12:51             ` Parav Pandit via Virtualization
2022-05-18 12:51               ` Parav Pandit
