* Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
From: Eugenio Perez Martin @ 2023-01-31 19:10 UTC (permalink / raw)
  To: Maxime Coquelin, Jason Wang, Michael Tsirkin, Cindy Lu,
	Stefano Garzarella, qemu-level, Laurent Vivier, Juan Quintela

Hi,

The current approach of offering an emulated CVQ to the guest and
mapping the commands to vhost-user does not scale well:
* Some devices already offer a CVQ, so the transformation is redundant.
* There is no support for commands with variable length (RSS?).

We can solve both problems by exposing the CVQ through vhost-user the
same way vhost-vdpa does. With this approach qemu needs to track the
commands, for the same reason as with vhost-vdpa: qemu needs to track
the device status for live migration. vhost-user should reuse the same
SVQ code for this, so we avoid duplication.

One of the challenges here is to know which virtqueue to shadow /
isolate. The vhost-user device may not have the same number of queues
as the device frontend:
* The former depends on the actual vhost-user device, and qemu fetches
it with VHOST_USER_GET_QUEUE_NUM at the moment.
* The qemu device frontend's count is set by the netdev queues=
cmdline parameter.

For the device, the CVQ is the last queue it offers, but for the guest
it is the last one advertised in config space.

Creating a new vhost-user command to decrease that maximum number of
queues may be an option. But we can also do it without adding more
commands, by remapping the CVQ index at virtqueue setup. I think it
should be doable using (struct vhost_dev).vq_index and maybe a few
adjustments here and there.
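
To illustrate the index mismatch and the remap, here is a rough
standalone sketch. It is not QEMU code: it assumes a virtio-net layout
where the guest CVQ index is 2 * max_virtqueue_pairs (taken from config
space) and the backend CVQ is simply its last queue, and the
remap_vq_index() helper and the numbers are invented for the example.

#include <stdio.h>

/* Map a frontend (guest-visible) vq index to the backend's index.
 * Only the CVQ needs translation; data queues map 1:1. */
static unsigned remap_vq_index(unsigned frontend_idx, unsigned guest_cvq,
                               unsigned backend_cvq)
{
    return frontend_idx == guest_cvq ? backend_cvq : frontend_idx;
}

int main(void)
{
    unsigned cmdline_qps = 2;      /* netdev queues=2 */
    unsigned backend_num_vqs = 17; /* e.g. 8 data queue pairs + 1 CVQ */
    unsigned guest_cvq = 2 * cmdline_qps;
    unsigned backend_cvq = backend_num_vqs - 1;

    for (unsigned i = 0; i <= guest_cvq; i++) {
        printf("frontend vq %u -> backend vq %u\n",
               i, remap_vq_index(i, guest_cvq, backend_cvq));
    }
    return 0;
}

In QEMU the translation would presumably hang off (struct
vhost_dev).vq_index at virtqueue setup time; the sketch only shows the
mapping itself.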

Thoughts?

Thanks!




* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
From: Eugenio Perez Martin @ 2023-01-31 19:11 UTC (permalink / raw)
  To: Maxime Coquelin, Jason Wang, Michael Tsirkin, Cindy Lu,
	Stefano Garzarella, qemu-level, Laurent Vivier, Juan Quintela

On Tue, Jan 31, 2023 at 8:10 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> Hi,
>
> The current approach of offering an emulated CVQ to the guest and map
> the commands to vhost-user is not scaling well:
> * Some devices already offer it, so the transformation is redundant.
> * There is no support for commands with variable length (RSS?)
>
> We can solve both of them by offering it through vhost-user the same
> way as vhost-vdpa do. With this approach qemu needs to track the
> commands, for similar reasons as vhost-vdpa: qemu needs to track the
> device status for live migration. vhost-user should use the same SVQ
> code for this, so we avoid duplications.
>
> One of the challenges here is to know what virtqueue to shadow /
> isolate. The vhost-user device may not have the same queues as the
> device frontend:
> * The first depends on the actual vhost-user device, and qemu fetches
> it with VHOST_USER_GET_QUEUE_NUM at the moment.
> * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
>
> For the device, the CVQ is the last one it offers, but for the guest
> it is the last one offered in config space.
>
> To create a new vhost-user command to decrease that maximum number of
> queues may be an option. But we can do it without adding more
> commands, remapping the CVQ index at virtqueue setup. I think it
> should be doable using (struct vhost_dev).vq_index and maybe a few
> adjustments here and there.
>
> Thoughts?
>
> Thanks!


(Starting a separate thread for the vhost-vdpa related use case)

This could also work for vhost-vdpa if we ever decide to honor the
netdev queues= argument. It is totally ignored now, as opposed to the
rest of the backends:
* vhost-kernel, whose tap device gets the requested number of queues.
* vhost-user, which errors out ("you are asking more queues than
supported") if the vhost-user parent device has fewer queues than
requested (as reported by the vhost-user msg VHOST_USER_GET_QUEUE_NUM).

One of the reasons for this is that the device configuration space is
passed through untouched, with the device's values for mtu, rss
conditions, etc. This is not ideal, as qemu cannot check that source
and destination are equivalent, and the values can change under the
feet of the guest in the event of a migration. External tools are
needed for this, duplicating part of the effort.

Starting to intercept config space accesses and offering an emulated
config space to the guest, with this kind of adjustment, is
beneficial: it makes vhost-vdpa more similar to the rest of the
backends and greatly reduces the surprises the guest can get on a
change.
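
As a rough standalone sketch of what I mean (not existing QEMU code;
the struct only mimics two virtio-net config fields and the function
name is invented), the backend values would be checked against the
cmdline and the guest would be shown the cmdline-derived config,
mirroring the vhost-user behaviour:

#include <stdint.h>
#include <stdio.h>

struct net_cfg_sketch {
    uint16_t mtu;
    uint16_t max_virtqueue_pairs;
};

/* Build the config the guest sees from the cmdline instead of passing
 * the device values through; fail if the parent device cannot satisfy
 * the request. */
static int emulate_net_config(struct net_cfg_sketch *guest,
                              const struct net_cfg_sketch *device,
                              uint16_t cmdline_qps, uint16_t cmdline_mtu)
{
    if (device->max_virtqueue_pairs < cmdline_qps) {
        fprintf(stderr, "device offers %u queue pairs, %u requested\n",
                device->max_virtqueue_pairs, cmdline_qps);
        return -1;
    }
    guest->max_virtqueue_pairs = cmdline_qps;
    guest->mtu = cmdline_mtu;
    return 0;
}

int main(void)
{
    struct net_cfg_sketch dev = { .mtu = 9000, .max_virtqueue_pairs = 8 };
    struct net_cfg_sketch guest;

    if (emulate_net_config(&guest, &dev, 4, 1500) == 0) {
        printf("guest sees %u queue pairs, mtu %u\n",
               guest.max_virtqueue_pairs, guest.mtu);
    }
    return 0;
}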

Thoughts?

Thanks!




* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
From: Michael S. Tsirkin @ 2023-01-31 21:32 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Maxime Coquelin, Jason Wang, Cindy Lu, Stefano Garzarella,
	qemu-level, Laurent Vivier, Juan Quintela

On Tue, Jan 31, 2023 at 08:11:06PM +0100, Eugenio Perez Martin wrote:
> On Tue, Jan 31, 2023 at 8:10 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > Hi,
> >
> > The current approach of offering an emulated CVQ to the guest and map
> > the commands to vhost-user is not scaling well:
> > * Some devices already offer it, so the transformation is redundant.
> > * There is no support for commands with variable length (RSS?)
> >
> > We can solve both of them by offering it through vhost-user the same
> > way as vhost-vdpa do. With this approach qemu needs to track the
> > commands, for similar reasons as vhost-vdpa: qemu needs to track the
> > device status for live migration. vhost-user should use the same SVQ
> > code for this, so we avoid duplications.
> >
> > One of the challenges here is to know what virtqueue to shadow /
> > isolate. The vhost-user device may not have the same queues as the
> > device frontend:
> > * The first depends on the actual vhost-user device, and qemu fetches
> > it with VHOST_USER_GET_QUEUE_NUM at the moment.
> > * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
> >
> > For the device, the CVQ is the last one it offers, but for the guest
> > it is the last one offered in config space.
> >
> > To create a new vhost-user command to decrease that maximum number of
> > queues may be an option. But we can do it without adding more
> > commands, remapping the CVQ index at virtqueue setup. I think it
> > should be doable using (struct vhost_dev).vq_index and maybe a few
> > adjustments here and there.
> >
> > Thoughts?
> >
> > Thanks!
> 
> 
> (Starting a separated thread to vhost-vdpa related use case)
> 
> This could also work for vhost-vdpa if we ever decide to honor netdev
> queues argument. It is totally ignored now, as opposed to the rest of
> backends:
> * vhost-kernel, whose tap device has the requested number of queues.
> * vhost-user, that errors with ("you are asking more queues than
> supported") if the vhost-user parent device has less queues than
> requested (by vhost-user msg VHOST_USER_GET_QUEUE_NUM).
> 
> One of the reasons for this is that device configuration space is
> totally passthrough, with the values for mtu, rss conditions, etc.
> This is not ideal, as qemu cannot check src and destination
> equivalence and they can change under the feets of the guest in the
> event of a migration. External tools are needed for this, duplicating
> part of the effort.
> 
> Start intercepting config space accesses and offering an emulated one
> to the guest with this kind of adjustments is beneficial, as it makes
> vhost-vdpa more similar to the rest of backends, making the surprise
> on a change way lower.
> 
> Thoughts?
> 
> Thanks!

I agree here.

-- 
MST




* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
From: Jason Wang @ 2023-02-01  3:27 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Maxime Coquelin, Michael Tsirkin, Cindy Lu, Stefano Garzarella,
	qemu-level, Laurent Vivier, Juan Quintela

On Wed, Feb 1, 2023 at 3:10 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> Hi,
>
> The current approach of offering an emulated CVQ to the guest and map
> the commands to vhost-user is not scaling well:
> * Some devices already offer it, so the transformation is redundant.
> * There is no support for commands with variable length (RSS?)
>
> We can solve both of them by offering it through vhost-user the same
> way as vhost-vdpa do. With this approach qemu needs to track the
> commands, for similar reasons as vhost-vdpa: qemu needs to track the
> device status for live migration. vhost-user should use the same SVQ
> code for this, so we avoid duplications.

Note that it really depends on the model used. SVQ works well for
trap and emulation (without having to invent a new API). But if a save
and load API is invented, SVQ is not a must.

>
> One of the challenges here is to know what virtqueue to shadow /
> isolate. The vhost-user device may not have the same queues as the
> device frontend:
> * The first depends on the actual vhost-user device, and qemu fetches
> it with VHOST_USER_GET_QUEUE_NUM at the moment.
> * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
>
> For the device, the CVQ is the last one it offers, but for the guest
> it is the last one offered in config space.
>
> To create a new vhost-user command to decrease that maximum number of
> queues may be an option. But we can do it without adding more
> commands, remapping the CVQ index at virtqueue setup. I think it
> should be doable using (struct vhost_dev).vq_index and maybe a few
> adjustments here and there.

It requires device-specific knowledge; it might work for networking
devices but not for others (or it would need new code).

Thanks

>
> Thoughts?
>
> Thanks!
>




* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
From: Jason Wang @ 2023-02-01  3:29 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Maxime Coquelin, Michael Tsirkin, Cindy Lu, Stefano Garzarella,
	qemu-level, Laurent Vivier, Juan Quintela

On Wed, Feb 1, 2023 at 3:11 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Tue, Jan 31, 2023 at 8:10 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > Hi,
> >
> > The current approach of offering an emulated CVQ to the guest and map
> > the commands to vhost-user is not scaling well:
> > * Some devices already offer it, so the transformation is redundant.
> > * There is no support for commands with variable length (RSS?)
> >
> > We can solve both of them by offering it through vhost-user the same
> > way as vhost-vdpa do. With this approach qemu needs to track the
> > commands, for similar reasons as vhost-vdpa: qemu needs to track the
> > device status for live migration. vhost-user should use the same SVQ
> > code for this, so we avoid duplications.
> >
> > One of the challenges here is to know what virtqueue to shadow /
> > isolate. The vhost-user device may not have the same queues as the
> > device frontend:
> > * The first depends on the actual vhost-user device, and qemu fetches
> > it with VHOST_USER_GET_QUEUE_NUM at the moment.
> > * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
> >
> > For the device, the CVQ is the last one it offers, but for the guest
> > it is the last one offered in config space.
> >
> > To create a new vhost-user command to decrease that maximum number of
> > queues may be an option. But we can do it without adding more
> > commands, remapping the CVQ index at virtqueue setup. I think it
> > should be doable using (struct vhost_dev).vq_index and maybe a few
> > adjustments here and there.
> >
> > Thoughts?
> >
> > Thanks!
>
>
> (Starting a separated thread to vhost-vdpa related use case)
>
> This could also work for vhost-vdpa if we ever decide to honor netdev
> queues argument. It is totally ignored now, as opposed to the rest of
> backends:
> * vhost-kernel, whose tap device has the requested number of queues.
> * vhost-user, that errors with ("you are asking more queues than
> supported") if the vhost-user parent device has less queues than
> requested (by vhost-user msg VHOST_USER_GET_QUEUE_NUM).
>
> One of the reasons for this is that device configuration space is
> totally passthrough, with the values for mtu, rss conditions, etc.
> This is not ideal, as qemu cannot check src and destination
> equivalence and they can change under the feets of the guest in the
> event of a migration.

This looks like the responsibility not of qemu but of the upper layer
(to provision the same config/features in src/dst).

> External tools are needed for this, duplicating
> part of the effort.
>
> Start intercepting config space accesses and offering an emulated one
> to the guest with this kind of adjustments is beneficial, as it makes
> vhost-vdpa more similar to the rest of backends, making the surprise
> on a change way lower.

This probably needs more thought, since vDPA already provides a kind
of emulation in the kernel. My understanding is that it would be
sufficient to add checks to make sure the config that guests see is
consistent with what the host provisioned?

Thanks

>
> Thoughts?
>
> Thanks!
>




* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
From: Eugenio Perez Martin @ 2023-02-01  6:55 UTC (permalink / raw)
  To: Jason Wang
  Cc: Maxime Coquelin, Michael Tsirkin, Cindy Lu, Stefano Garzarella,
	qemu-level, Laurent Vivier, Juan Quintela

On Wed, Feb 1, 2023 at 4:27 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Feb 1, 2023 at 3:10 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >
> > Hi,
> >
> > The current approach of offering an emulated CVQ to the guest and map
> > the commands to vhost-user is not scaling well:
> > * Some devices already offer it, so the transformation is redundant.
> > * There is no support for commands with variable length (RSS?)
> >
> > We can solve both of them by offering it through vhost-user the same
> > way as vhost-vdpa do. With this approach qemu needs to track the
> > commands, for similar reasons as vhost-vdpa: qemu needs to track the
> > device status for live migration. vhost-user should use the same SVQ
> > code for this, so we avoid duplications.
>
> Note that it really depends on the model we used. SVQ works well for
> trap and emulation (without new API to be invented). But if save and
> load API is invented, SVQ is not a must.
>

That's right, but the premise of the proposal is that control vq RSS
messages are already complex enough that we should avoid adding more
vhost-user messages. I cannot imagine expanding the vhost-user or
virtio API with save and restore functions soon :).

> >
> > One of the challenges here is to know what virtqueue to shadow /
> > isolate. The vhost-user device may not have the same queues as the
> > device frontend:
> > * The first depends on the actual vhost-user device, and qemu fetches
> > it with VHOST_USER_GET_QUEUE_NUM at the moment.
> > * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
> >
> > For the device, the CVQ is the last one it offers, but for the guest
> > it is the last one offered in config space.
> >
> > To create a new vhost-user command to decrease that maximum number of
> > queues may be an option. But we can do it without adding more
> > commands, remapping the CVQ index at virtqueue setup. I think it
> > should be doable using (struct vhost_dev).vq_index and maybe a few
> > adjustments here and there.
>
> It requires device specific knowledge, it might work for networking
> devices but not others (or need new codes).
>

Yes, I didn't review all the other kinds of devices for the proposal,
but I'm assuming:
* There is no other device that has already implemented a CVQ over
vhost-user (or these problems would have been solved already).
* The config space of all vhost-user devices is already offered by
qemu, as with vhost-user net, and the CVQ-like index is well defined
in the standard, as it is for net.

So this proposal should fit all those devices, shouldn't it?

Thanks!




* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
From: Eugenio Perez Martin @ 2023-02-01  7:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: Maxime Coquelin, Michael Tsirkin, Cindy Lu, Stefano Garzarella,
	qemu-level, Laurent Vivier, Juan Quintela

On Wed, Feb 1, 2023 at 4:29 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Feb 1, 2023 at 3:11 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >
> > On Tue, Jan 31, 2023 at 8:10 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > Hi,
> > >
> > > The current approach of offering an emulated CVQ to the guest and map
> > > the commands to vhost-user is not scaling well:
> > > * Some devices already offer it, so the transformation is redundant.
> > > * There is no support for commands with variable length (RSS?)
> > >
> > > We can solve both of them by offering it through vhost-user the same
> > > way as vhost-vdpa do. With this approach qemu needs to track the
> > > commands, for similar reasons as vhost-vdpa: qemu needs to track the
> > > device status for live migration. vhost-user should use the same SVQ
> > > code for this, so we avoid duplications.
> > >
> > > One of the challenges here is to know what virtqueue to shadow /
> > > isolate. The vhost-user device may not have the same queues as the
> > > device frontend:
> > > * The first depends on the actual vhost-user device, and qemu fetches
> > > it with VHOST_USER_GET_QUEUE_NUM at the moment.
> > > * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
> > >
> > > For the device, the CVQ is the last one it offers, but for the guest
> > > it is the last one offered in config space.
> > >
> > > To create a new vhost-user command to decrease that maximum number of
> > > queues may be an option. But we can do it without adding more
> > > commands, remapping the CVQ index at virtqueue setup. I think it
> > > should be doable using (struct vhost_dev).vq_index and maybe a few
> > > adjustments here and there.
> > >
> > > Thoughts?
> > >
> > > Thanks!
> >
> >
> > (Starting a separated thread to vhost-vdpa related use case)
> >
> > This could also work for vhost-vdpa if we ever decide to honor netdev
> > queues argument. It is totally ignored now, as opposed to the rest of
> > backends:
> > * vhost-kernel, whose tap device has the requested number of queues.
> > * vhost-user, that errors with ("you are asking more queues than
> > supported") if the vhost-user parent device has less queues than
> > requested (by vhost-user msg VHOST_USER_GET_QUEUE_NUM).
> >
> > One of the reasons for this is that device configuration space is
> > totally passthrough, with the values for mtu, rss conditions, etc.
> > This is not ideal, as qemu cannot check src and destination
> > equivalence and they can change under the feets of the guest in the
> > event of a migration.
>
> This looks not the responsibility of qemu but the upper layer (to
> provision the same config/features in src/dst).

I think both share it. Or, at least, it is inconsistent that QEMU is
in charge of checking / providing consistency for virtio features, but
not for the virtio-net config space.

If we follow that argument to the extreme, we could simply delete the
feature checks, right?

>
> > External tools are needed for this, duplicating
> > part of the effort.
> >
> > Start intercepting config space accesses and offering an emulated one
> > to the guest with this kind of adjustments is beneficial, as it makes
> > vhost-vdpa more similar to the rest of backends, making the surprise
> > on a change way lower.
>
> This probably needs more thought, since vDPA already provides a kind
> of emulation in the kernel. My understanding is that it would be
> sufficient to add checks to make sure the config that guests see is
> consistent with what host provisioned?
>

By host provisioned do you mean with the "vdpa" tool or with qemu?
Also, we need a way to communicate the guest values to it if those
checks are added in the kernel.

The reasoning here is the same as above: QEMU already filters features
with its own emulated layer, so the operator can specify a feature
that will then never be exposed to the guest. That layer has other
uses (abstracting between transports, for example), but feature
filtering is definitely one of them.

A feature set to off in a VM (or that does not exist in that
particular qemu version) will never appear as on, even in the case of
migration to newer qemu versions.

We don't have the equivalent protection for the device config space.
QEMU could assure a consistent MTU, number of queues, etc. for the
guest in virtio_net_get_config (and the equivalent for other kinds of
devices). QEMU already has some transformations there. It shouldn't
take a lot of code.
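
To make the parallel with the feature filtering explicit, a tiny
standalone sketch (illustrative only; the bit values stand in for
virtio-net feature bits, and the config-space counterpart would be a
similar comparison on virtio_net_config fields such as mtu or
max_virtqueue_pairs):

#include <stdint.h>
#include <stdio.h>

#define F_MTU  (1ULL << 3)    /* stand-ins for virtio-net feature bits */
#define F_MQ   (1ULL << 22)
#define F_RSS  (1ULL << 60)

/* The guest can only see the intersection of what the backend offers
 * and what the emulated frontend supports, so a feature this qemu does
 * not know about never shows up as on. */
static uint64_t filter_features(uint64_t backend, uint64_t frontend)
{
    return backend & frontend;
}

int main(void)
{
    uint64_t backend  = F_MQ | F_MTU | F_RSS; /* what the device offers  */
    uint64_t frontend = F_MQ | F_MTU;         /* what this qemu emulates */

    printf("guest is offered 0x%llx\n",
           (unsigned long long)filter_features(backend, frontend));
    return 0;
}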

Having said that:
* I'm ok with starting with just checks there instead of
transformations like the queue remap proposed here.
* If we choose not to implement it, I'm not proposing to actually
delete the feature checks, as I find them useful :).

Thanks!




* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
From: Michael S. Tsirkin @ 2023-02-01 10:44 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Jason Wang, Maxime Coquelin, Cindy Lu, Stefano Garzarella,
	qemu-level, Laurent Vivier, Juan Quintela

On Wed, Feb 01, 2023 at 08:49:30AM +0100, Eugenio Perez Martin wrote:
> On Wed, Feb 1, 2023 at 4:29 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Feb 1, 2023 at 3:11 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > >
> > > On Tue, Jan 31, 2023 at 8:10 PM Eugenio Perez Martin
> > > <eperezma@redhat.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > The current approach of offering an emulated CVQ to the guest and map
> > > > the commands to vhost-user is not scaling well:
> > > > * Some devices already offer it, so the transformation is redundant.
> > > > * There is no support for commands with variable length (RSS?)
> > > >
> > > > We can solve both of them by offering it through vhost-user the same
> > > > way as vhost-vdpa do. With this approach qemu needs to track the
> > > > commands, for similar reasons as vhost-vdpa: qemu needs to track the
> > > > device status for live migration. vhost-user should use the same SVQ
> > > > code for this, so we avoid duplications.
> > > >
> > > > One of the challenges here is to know what virtqueue to shadow /
> > > > isolate. The vhost-user device may not have the same queues as the
> > > > device frontend:
> > > > * The first depends on the actual vhost-user device, and qemu fetches
> > > > it with VHOST_USER_GET_QUEUE_NUM at the moment.
> > > > * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
> > > >
> > > > For the device, the CVQ is the last one it offers, but for the guest
> > > > it is the last one offered in config space.
> > > >
> > > > To create a new vhost-user command to decrease that maximum number of
> > > > queues may be an option. But we can do it without adding more
> > > > commands, remapping the CVQ index at virtqueue setup. I think it
> > > > should be doable using (struct vhost_dev).vq_index and maybe a few
> > > > adjustments here and there.
> > > >
> > > > Thoughts?
> > > >
> > > > Thanks!
> > >
> > >
> > > (Starting a separated thread to vhost-vdpa related use case)
> > >
> > > This could also work for vhost-vdpa if we ever decide to honor netdev
> > > queues argument. It is totally ignored now, as opposed to the rest of
> > > backends:
> > > * vhost-kernel, whose tap device has the requested number of queues.
> > > * vhost-user, that errors with ("you are asking more queues than
> > > supported") if the vhost-user parent device has less queues than
> > > requested (by vhost-user msg VHOST_USER_GET_QUEUE_NUM).
> > >
> > > One of the reasons for this is that device configuration space is
> > > totally passthrough, with the values for mtu, rss conditions, etc.
> > > This is not ideal, as qemu cannot check src and destination
> > > equivalence and they can change under the feets of the guest in the
> > > event of a migration.
> >
> > This looks not the responsibility of qemu but the upper layer (to
> > provision the same config/features in src/dst).
> 
> I think both share it. Or, at least, that it is inconsistent that QEMU
> is in charge of checking / providing consistency for virtio features,
> but not virtio-net config space.
> 
> If we follow that to the extreme, we could simply delete the feature
> checks, right?
> 
> >
> > > External tools are needed for this, duplicating
> > > part of the effort.
> > >
> > > Start intercepting config space accesses and offering an emulated one
> > > to the guest with this kind of adjustments is beneficial, as it makes
> > > vhost-vdpa more similar to the rest of backends, making the surprise
> > > on a change way lower.
> >
> > This probably needs more thought, since vDPA already provides a kind
> > of emulation in the kernel. My understanding is that it would be
> > sufficient to add checks to make sure the config that guests see is
> > consistent with what host provisioned?
> >
> 
> With host provisioned you mean with "vdpa" tool or with qemu? Also, we
> need a way to communicate the guest values to it If those checks are
> added in the kernel.
> 
> The reasoning here is the same as above: QEMU already filters features
> with its own emulated layer, so the operator can specify a feature
> that will never appear to the guest. It has other uses (abstract
> between transport for example), but feature filtering is definitely a
> thing there.
> 
> A feature set to off in a VM (or that does not exist in that
> particular qemu version) will never appear as on even in the case of
> migration to modern qemu versions.
> 
> We don't have the equivalent protection for device config space. QEMU
> could assure a consistent MTU, number of queues, etc for the guest in
> virtio_net_get_config (and equivalent for other kinds of devices).
> QEMU already has some transformations there. It shouldn't take a lot
> of code.

I think I agree. It's the easiest way to ensure migration
consistency without trouble.

> Having said that:
> * I'm ok with starting just with checks there instead of
> transformations like the queues remap proposed here.
> * If we choose not to implement it, I'm not proposing to actually
> delete the features checks, as I see them useful :).
> 
> Thanks!




* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
From: Michael S. Tsirkin @ 2023-02-01 10:53 UTC (permalink / raw)
  To: Jason Wang
  Cc: Eugenio Perez Martin, Maxime Coquelin, Cindy Lu,
	Stefano Garzarella, qemu-level, Laurent Vivier, Juan Quintela

On Wed, Feb 01, 2023 at 11:29:35AM +0800, Jason Wang wrote:
> On Wed, Feb 1, 2023 at 3:11 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >
> > On Tue, Jan 31, 2023 at 8:10 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > Hi,
> > >
> > > The current approach of offering an emulated CVQ to the guest and map
> > > the commands to vhost-user is not scaling well:
> > > * Some devices already offer it, so the transformation is redundant.
> > > * There is no support for commands with variable length (RSS?)
> > >
> > > We can solve both of them by offering it through vhost-user the same
> > > way as vhost-vdpa do. With this approach qemu needs to track the
> > > commands, for similar reasons as vhost-vdpa: qemu needs to track the
> > > device status for live migration. vhost-user should use the same SVQ
> > > code for this, so we avoid duplications.
> > >
> > > One of the challenges here is to know what virtqueue to shadow /
> > > isolate. The vhost-user device may not have the same queues as the
> > > device frontend:
> > > * The first depends on the actual vhost-user device, and qemu fetches
> > > it with VHOST_USER_GET_QUEUE_NUM at the moment.
> > > * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
> > >
> > > For the device, the CVQ is the last one it offers, but for the guest
> > > it is the last one offered in config space.
> > >
> > > To create a new vhost-user command to decrease that maximum number of
> > > queues may be an option. But we can do it without adding more
> > > commands, remapping the CVQ index at virtqueue setup. I think it
> > > should be doable using (struct vhost_dev).vq_index and maybe a few
> > > adjustments here and there.
> > >
> > > Thoughts?
> > >
> > > Thanks!
> >
> >
> > (Starting a separated thread to vhost-vdpa related use case)
> >
> > This could also work for vhost-vdpa if we ever decide to honor netdev
> > queues argument. It is totally ignored now, as opposed to the rest of
> > backends:
> > * vhost-kernel, whose tap device has the requested number of queues.
> > * vhost-user, that errors with ("you are asking more queues than
> > supported") if the vhost-user parent device has less queues than
> > requested (by vhost-user msg VHOST_USER_GET_QUEUE_NUM).
> >
> > One of the reasons for this is that device configuration space is
> > totally passthrough, with the values for mtu, rss conditions, etc.
> > This is not ideal, as qemu cannot check src and destination
> > equivalence and they can change under the feets of the guest in the
> > event of a migration.
> 
> This looks not the responsibility of qemu but the upper layer (to
> provision the same config/features in src/dst).
> 
> > External tools are needed for this, duplicating
> > part of the effort.
> >
> > Start intercepting config space accesses and offering an emulated one
> > to the guest with this kind of adjustments is beneficial, as it makes
> > vhost-vdpa more similar to the rest of backends, making the surprise
> > on a change way lower.
> 
> This probably needs more thought, since vDPA already provides a kind
> of emulation in the kernel. My understanding is that it would be
> sufficient to add checks to make sure the config that guests see is
> consistent with what host provisioned?
> 
> Thanks

I feel the ability to adjust and emulate one device on top of another
one is a strong suit of virtio (as opposed to e.g. vfio, which always
wants exactly the same device). Let's not throw the baby out with the
bathwater.

> >
> > Thoughts?
> >
> > Thanks!
> >




* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
From: Maxime Coquelin @ 2023-02-01 11:14 UTC (permalink / raw)
  To: Eugenio Perez Martin, Jason Wang, Michael Tsirkin, Cindy Lu,
	Stefano Garzarella, qemu-level, Laurent Vivier, Juan Quintela

Thanks Eugenio for working on this.

On 1/31/23 20:10, Eugenio Perez Martin wrote:
> Hi,
> 
> The current approach of offering an emulated CVQ to the guest and map
> the commands to vhost-user is not scaling well:
> * Some devices already offer it, so the transformation is redundant.
> * There is no support for commands with variable length (RSS?)
> 
> We can solve both of them by offering it through vhost-user the same
> way as vhost-vdpa do. With this approach qemu needs to track the
> commands, for similar reasons as vhost-vdpa: qemu needs to track the
> device status for live migration. vhost-user should use the same SVQ
> code for this, so we avoid duplications.
> 
> One of the challenges here is to know what virtqueue to shadow /
> isolate. The vhost-user device may not have the same queues as the
> device frontend:
> * The first depends on the actual vhost-user device, and qemu fetches
> it with VHOST_USER_GET_QUEUE_NUM at the moment.
> * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
> 
> For the device, the CVQ is the last one it offers, but for the guest
> it is the last one offered in config space.
> 
> To create a new vhost-user command to decrease that maximum number of
> queues may be an option. But we can do it without adding more
> commands, remapping the CVQ index at virtqueue setup. I think it
> should be doable using (struct vhost_dev).vq_index and maybe a few
> adjustments here and there.
> 
> Thoughts?

I am fine with both proposals.
I think index remapping will require a bit more rework in the DPDK
Vhost-user library, but nothing insurmountable.

I am currently working on a PoC adding support for VDUSE in the DPDK
Vhost library, and recently added control queue support. We can reuse it
if we want to prototype your proposal.

Maxime

> Thanks!
> 




* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
From: Eugenio Perez Martin @ 2023-02-01 11:19 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Jason Wang, Michael Tsirkin, Cindy Lu, Stefano Garzarella,
	qemu-level, Laurent Vivier, Juan Quintela

On Wed, Feb 1, 2023 at 12:14 PM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
> Thanks Eugenio for working on this.
>
> On 1/31/23 20:10, Eugenio Perez Martin wrote:
> > Hi,
> >
> > The current approach of offering an emulated CVQ to the guest and map
> > the commands to vhost-user is not scaling well:
> > * Some devices already offer it, so the transformation is redundant.
> > * There is no support for commands with variable length (RSS?)
> >
> > We can solve both of them by offering it through vhost-user the same
> > way as vhost-vdpa do. With this approach qemu needs to track the
> > commands, for similar reasons as vhost-vdpa: qemu needs to track the
> > device status for live migration. vhost-user should use the same SVQ
> > code for this, so we avoid duplications.
> >
> > One of the challenges here is to know what virtqueue to shadow /
> > isolate. The vhost-user device may not have the same queues as the
> > device frontend:
> > * The first depends on the actual vhost-user device, and qemu fetches
> > it with VHOST_USER_GET_QUEUE_NUM at the moment.
> > * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
> >
> > For the device, the CVQ is the last one it offers, but for the guest
> > it is the last one offered in config space.
> >
> > To create a new vhost-user command to decrease that maximum number of
> > queues may be an option. But we can do it without adding more
> > commands, remapping the CVQ index at virtqueue setup. I think it
> > should be doable using (struct vhost_dev).vq_index and maybe a few
> > adjustments here and there.
> >
> > Thoughts?
>
> I am fine with both proposals.
> I think index remapping will require a bit more rework in the DPDK
> Vhost-user library, but nothing insurmountable.
>

Can you expand on this? I think my proposal does not require modifying
anything in the device. Or do you mean hw/virtio/vhost-user.c and the
like?

Thanks!

> I am currently working on a PoC adding support for VDUSE in the DPDK
> Vhost library, and recently added control queue support. We can reuse it
> if we want to prototype your proposal.
>

Sure, that would be great.

Thanks!




* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
From: Michael S. Tsirkin @ 2023-02-01 11:20 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Eugenio Perez Martin, Jason Wang, Cindy Lu, Stefano Garzarella,
	qemu-level, Laurent Vivier, Juan Quintela

On Wed, Feb 01, 2023 at 12:14:18PM +0100, Maxime Coquelin wrote:
> Thanks Eugenio for working on this.
> 
> On 1/31/23 20:10, Eugenio Perez Martin wrote:
> > Hi,
> > 
> > The current approach of offering an emulated CVQ to the guest and map
> > the commands to vhost-user is not scaling well:
> > * Some devices already offer it, so the transformation is redundant.
> > * There is no support for commands with variable length (RSS?)
> > 
> > We can solve both of them by offering it through vhost-user the same
> > way as vhost-vdpa do. With this approach qemu needs to track the
> > commands, for similar reasons as vhost-vdpa: qemu needs to track the
> > device status for live migration. vhost-user should use the same SVQ
> > code for this, so we avoid duplications.
> > 
> > One of the challenges here is to know what virtqueue to shadow /
> > isolate. The vhost-user device may not have the same queues as the
> > device frontend:
> > * The first depends on the actual vhost-user device, and qemu fetches
> > it with VHOST_USER_GET_QUEUE_NUM at the moment.
> > * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
> > 
> > For the device, the CVQ is the last one it offers, but for the guest
> > it is the last one offered in config space.
> > 
> > To create a new vhost-user command to decrease that maximum number of
> > queues may be an option. But we can do it without adding more
> > commands, remapping the CVQ index at virtqueue setup. I think it
> > should be doable using (struct vhost_dev).vq_index and maybe a few
> > adjustments here and there.
> > 
> > Thoughts?
> 
> I am fine with both proposals.
> I think index remapping will require a bit more rework in the DPDK
> Vhost-user library, but nothing insurmountable.
> 
> I am currently working on a PoC adding support for VDUSE in the DPDK
> Vhost library, and recently added control queue support. We can reuse it
> if we want to prototype your proposal.
> 
> Maxime
> 
> > Thanks!
> > 


Technically the backend knows how many vqs there are, and the last one
is the cvq... not sure we need full-blown remapping...

-- 
MST




* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
From: Eugenio Perez Martin @ 2023-02-01 11:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Maxime Coquelin, Jason Wang, Cindy Lu, Stefano Garzarella,
	qemu-level, Laurent Vivier, Juan Quintela

On Wed, Feb 1, 2023 at 12:20 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Feb 01, 2023 at 12:14:18PM +0100, Maxime Coquelin wrote:
> > Thanks Eugenio for working on this.
> >
> > On 1/31/23 20:10, Eugenio Perez Martin wrote:
> > > Hi,
> > >
> > > The current approach of offering an emulated CVQ to the guest and map
> > > the commands to vhost-user is not scaling well:
> > > * Some devices already offer it, so the transformation is redundant.
> > > * There is no support for commands with variable length (RSS?)
> > >
> > > We can solve both of them by offering it through vhost-user the same
> > > way as vhost-vdpa do. With this approach qemu needs to track the
> > > commands, for similar reasons as vhost-vdpa: qemu needs to track the
> > > device status for live migration. vhost-user should use the same SVQ
> > > code for this, so we avoid duplications.
> > >
> > > One of the challenges here is to know what virtqueue to shadow /
> > > isolate. The vhost-user device may not have the same queues as the
> > > device frontend:
> > > * The first depends on the actual vhost-user device, and qemu fetches
> > > it with VHOST_USER_GET_QUEUE_NUM at the moment.
> > > * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
> > >
> > > For the device, the CVQ is the last one it offers, but for the guest
> > > it is the last one offered in config space.
> > >
> > > To create a new vhost-user command to decrease that maximum number of
> > > queues may be an option. But we can do it without adding more
> > > commands, remapping the CVQ index at virtqueue setup. I think it
> > > should be doable using (struct vhost_dev).vq_index and maybe a few
> > > adjustments here and there.
> > >
> > > Thoughts?
> >
> > I am fine with both proposals.
> > I think index remapping will require a bit more rework in the DPDK
> > Vhost-user library, but nothing insurmountable.
> >
> > I am currently working on a PoC adding support for VDUSE in the DPDK
> > Vhost library, and recently added control queue support. We can reuse it
> > if we want to prototype your proposal.
> >
> > Maxime
> >
> > > Thanks!
> > >
>
>
> technically backend knows how many vqs are there, last one is cvq...
> not sure we need full blown remapping ...
>

The number of queues may not be the same on the cmdline and in the
device.

If the vhost-user netdev cmdline asks for more queues than the device
offers, qemu will print an error. But the reverse (asking for the same
number of queues as the device has, or fewer) is valid at the moment.

If we add the cvq with this scheme, the cvq index will not be the same
for the guest and for the device.

vhost-vdpa totally ignores the queues= parameter, so we're losing the
opportunity to offer a consistent config space in the event of a
migration. I suggest we act the same way as I'm proposing here for
vhost-user, so that:
* QEMU can block the migration in case the destination cannot offer
the same number of queues.
* The guest will not see the config space change under its feet.

Now there are other fields in the config space for sure (mtu, rss
size, etc.), but I think the most complex case is the number of queues
because of the cvq.
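
As a standalone sketch of the queues check I have in mind (invented
names, not actual QEMU code), the same comparison would run when the
backend is set up and again on the destination before accepting an
incoming migration:

#include <stdbool.h>
#include <stdio.h>

/* Refuse to proceed (fail init, or act as a migration blocker) when
 * the local vhost-vdpa backend cannot offer the number of queue pairs
 * the guest already sees in its config space. */
static bool vdpa_backend_can_serve(unsigned guest_visible_qps,
                                   unsigned backend_qps)
{
    if (backend_qps < guest_visible_qps) {
        fprintf(stderr,
                "backend offers %u queue pairs, guest expects %u\n",
                backend_qps, guest_visible_qps);
        return false;
    }
    return true;
}

int main(void)
{
    /* e.g. the source exposed 4 queue pairs, the destination has 2 */
    return vdpa_backend_can_serve(4, 2) ? 0 : 1;
}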

Is that clearer?

Thanks!




* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
From: Jason Wang @ 2023-02-02  3:02 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Maxime Coquelin, Michael Tsirkin, Cindy Lu, Stefano Garzarella,
	qemu-level, Laurent Vivier, Juan Quintela

On Wed, Feb 1, 2023 at 2:56 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Wed, Feb 1, 2023 at 4:27 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Feb 1, 2023 at 3:10 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > >
> > > Hi,
> > >
> > > The current approach of offering an emulated CVQ to the guest and map
> > > the commands to vhost-user is not scaling well:
> > > * Some devices already offer it, so the transformation is redundant.
> > > * There is no support for commands with variable length (RSS?)
> > >
> > > We can solve both of them by offering it through vhost-user the same
> > > way as vhost-vdpa do. With this approach qemu needs to track the
> > > commands, for similar reasons as vhost-vdpa: qemu needs to track the
> > > device status for live migration. vhost-user should use the same SVQ
> > > code for this, so we avoid duplications.
> >
> > Note that it really depends on the model we used. SVQ works well for
> > trap and emulation (without new API to be invented). But if save and
> > load API is invented, SVQ is not a must.
> >
>
> That's right, but the premise in the proposal is that control vq RSS
> messages are already complex enough to avoid adding more vhost-user
> messages. I cannot imagine expanding the vhost-user or virtio API to
> add the save and restore functions soon :).

Probably, but we know we need that some day.

>
> > >
> > > One of the challenges here is to know what virtqueue to shadow /
> > > isolate. The vhost-user device may not have the same queues as the
> > > device frontend:
> > > * The first depends on the actual vhost-user device, and qemu fetches
> > > it with VHOST_USER_GET_QUEUE_NUM at the moment.
> > > * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
> > >
> > > For the device, the CVQ is the last one it offers, but for the guest
> > > it is the last one offered in config space.
> > >
> > > To create a new vhost-user command to decrease that maximum number of
> > > queues may be an option. But we can do it without adding more
> > > commands, remapping the CVQ index at virtqueue setup. I think it
> > > should be doable using (struct vhost_dev).vq_index and maybe a few
> > > adjustments here and there.
> >
> > It requires device specific knowledge, it might work for networking
> > devices but not others (or need new codes).
> >
>
> Yes, I didn't review all the other kinds of devices for the proposal,
> but I'm assuming:
> * There is no other device that has already implemented CVQ over
> vhost-user (or this problems would have been solved)

If I am not wrong, the vhost-user protocol itself allows arbitrary
types of virtqueue to be implemented on top of it. So it might just be
an implementation issue.

> * All vhost-user devices config space are already offered by qemu like
> vhost-user net, and the cvq-alike index is well defined in the
> standard like -net.
>
> So this proposal should fit all those devices, isn't it?

If it could be done without extending the vhost-user API, that would be fine.

Thanks

>
> Thanks!
>




* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
From: Jason Wang @ 2023-02-02  3:41 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Maxime Coquelin, Michael Tsirkin, Cindy Lu, Stefano Garzarella,
	qemu-level, Laurent Vivier, Juan Quintela


On 2023/2/1 15:49, Eugenio Perez Martin wrote:
> On Wed, Feb 1, 2023 at 4:29 AM Jason Wang <jasowang@redhat.com> wrote:
>> On Wed, Feb 1, 2023 at 3:11 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>>> On Tue, Jan 31, 2023 at 8:10 PM Eugenio Perez Martin
>>> <eperezma@redhat.com> wrote:
>>>> Hi,
>>>>
>>>> The current approach of offering an emulated CVQ to the guest and map
>>>> the commands to vhost-user is not scaling well:
>>>> * Some devices already offer it, so the transformation is redundant.
>>>> * There is no support for commands with variable length (RSS?)
>>>>
>>>> We can solve both of them by offering it through vhost-user the same
>>>> way as vhost-vdpa do. With this approach qemu needs to track the
>>>> commands, for similar reasons as vhost-vdpa: qemu needs to track the
>>>> device status for live migration. vhost-user should use the same SVQ
>>>> code for this, so we avoid duplications.
>>>>
>>>> One of the challenges here is to know what virtqueue to shadow /
>>>> isolate. The vhost-user device may not have the same queues as the
>>>> device frontend:
>>>> * The first depends on the actual vhost-user device, and qemu fetches
>>>> it with VHOST_USER_GET_QUEUE_NUM at the moment.
>>>> * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
>>>>
>>>> For the device, the CVQ is the last one it offers, but for the guest
>>>> it is the last one offered in config space.
>>>>
>>>> To create a new vhost-user command to decrease that maximum number of
>>>> queues may be an option. But we can do it without adding more
>>>> commands, remapping the CVQ index at virtqueue setup. I think it
>>>> should be doable using (struct vhost_dev).vq_index and maybe a few
>>>> adjustments here and there.
>>>>
>>>> Thoughts?
>>>>
>>>> Thanks!
>>>
>>> (Starting a separated thread to vhost-vdpa related use case)
>>>
>>> This could also work for vhost-vdpa if we ever decide to honor netdev
>>> queues argument. It is totally ignored now, as opposed to the rest of
>>> backends:
>>> * vhost-kernel, whose tap device has the requested number of queues.
>>> * vhost-user, that errors with ("you are asking more queues than
>>> supported") if the vhost-user parent device has less queues than
>>> requested (by vhost-user msg VHOST_USER_GET_QUEUE_NUM).
>>>
>>> One of the reasons for this is that device configuration space is
>>> totally passthrough, with the values for mtu, rss conditions, etc.
>>> This is not ideal, as qemu cannot check src and destination
>>> equivalence and they can change under the feets of the guest in the
>>> event of a migration.
>> This looks not the responsibility of qemu but the upper layer (to
>> provision the same config/features in src/dst).
> I think both share it. Or, at least, that it is inconsistent that QEMU
> is in charge of checking / providing consistency for virtio features,
> but not virtio-net config space.
>
> If we follow that to the extreme, we could simply delete the feature
> checks, right?


Just to make sure we are on the same page.

If you mean deleting the feature checks in Qemu, then I think we can't
do that.

What I meant is:

Consider a vDPA device provisioned (via netlink or another way) with
featureX and configY. It would be sufficient to validate that the
emulated device's features and config match exactly what the vDPA
device has.

Technically, it should be possible to do any mediation in the middle,
but it may cause a lot of trouble for the management layer and others.
Consider:

featureX is not provisioned but is emulated by Qemu; then it's almost
impossible for the management layer to check migration compatibility.
If featureX can be easily emulated, it should be done in the vDPA
parent layer, not in Qemu, so that it can be recognized by the
management layer.


>
>>> External tools are needed for this, duplicating
>>> part of the effort.
>>>
>>> Start intercepting config space accesses and offering an emulated one
>>> to the guest with this kind of adjustments is beneficial, as it makes
>>> vhost-vdpa more similar to the rest of backends, making the surprise
>>> on a change way lower.
>> This probably needs more thought, since vDPA already provides a kind
>> of emulation in the kernel. My understanding is that it would be
>> sufficient to add checks to make sure the config that guests see is
>> consistent with what host provisioned?
>>
> With host provisioned you mean with "vdpa" tool or with qemu?


Make sure the features and config of the emulated device provided by
Qemu match those of the vDPA device provisioned via netlink or another
mgmt API.


> Also, we
> need a way to communicate the guest values to it If those checks are
> added in the kernel.
>
> The reasoning here is the same as above: QEMU already filters features
> with its own emulated layer, so the operator can specify a feature
> that will never appear to the guest.


This needs to be done at the time of vDPA device provisioning.
Otherwise we will end up with a lot of corner cases. E.g. if 8 queue
pairs are provisioned, do we allow starting a guest with 4 queue pairs?


>   It has other uses (abstract
> between transport for example), but feature filtering is definitely a
> thing there.
>
> A feature set to off in a VM (or that does not exist in that
> particular qemu version) will never appear as on even in the case of
> migration to modern qemu versions.
>
> We don't have the equivalent protection for device config space. QEMU
> could assure a consistent MTU, number of queues, etc for the guest in
> virtio_net_get_config (and equivalent for other kinds of devices).
> QEMU already has some transformations there. It shouldn't take a lot
> of code.
>
> Having said that:
> * I'm ok with starting just with checks there instead of
> transformations like the queues remap proposed here.


I think we need to keep things simple. Technically, we could do any
kind of mediation/emulation via Qemu, but we should only implement
what is really needed.

Queue remapping might complicate a lot of stuff, like notification
area mapping, etc.

Thanks


> * If we choose not to implement it, I'm not proposing to actually
> delete the features checks, as I see them useful :).
>
> Thanks!
>




* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
From: Jason Wang @ 2023-02-02  3:44 UTC (permalink / raw)
  To: Eugenio Perez Martin, Michael S. Tsirkin
  Cc: Maxime Coquelin, Cindy Lu, Stefano Garzarella, qemu-level,
	Laurent Vivier, Juan Quintela


On 2023/2/1 19:48, Eugenio Perez Martin wrote:
> On Wed, Feb 1, 2023 at 12:20 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Wed, Feb 01, 2023 at 12:14:18PM +0100, Maxime Coquelin wrote:
>>> Thanks Eugenio for working on this.
>>>
>>> On 1/31/23 20:10, Eugenio Perez Martin wrote:
>>>> Hi,
>>>>
>>>> The current approach of offering an emulated CVQ to the guest and map
>>>> the commands to vhost-user is not scaling well:
>>>> * Some devices already offer it, so the transformation is redundant.
>>>> * There is no support for commands with variable length (RSS?)
>>>>
>>>> We can solve both of them by offering it through vhost-user the same
>>>> way as vhost-vdpa do. With this approach qemu needs to track the
>>>> commands, for similar reasons as vhost-vdpa: qemu needs to track the
>>>> device status for live migration. vhost-user should use the same SVQ
>>>> code for this, so we avoid duplications.
>>>>
>>>> One of the challenges here is to know what virtqueue to shadow /
>>>> isolate. The vhost-user device may not have the same queues as the
>>>> device frontend:
>>>> * The first depends on the actual vhost-user device, and qemu fetches
>>>> it with VHOST_USER_GET_QUEUE_NUM at the moment.
>>>> * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
>>>>
>>>> For the device, the CVQ is the last one it offers, but for the guest
>>>> it is the last one offered in config space.
>>>>
>>>> To create a new vhost-user command to decrease that maximum number of
>>>> queues may be an option. But we can do it without adding more
>>>> commands, remapping the CVQ index at virtqueue setup. I think it
>>>> should be doable using (struct vhost_dev).vq_index and maybe a few
>>>> adjustments here and there.
>>>>
>>>> Thoughts?
>>> I am fine with both proposals.
>>> I think index remapping will require a bit more rework in the DPDK
>>> Vhost-user library, but nothing insurmountable.
>>>
>>> I am currently working on a PoC adding support for VDUSE in the DPDK
>>> Vhost library, and recently added control queue support. We can reuse it
>>> if we want to prototype your proposal.
>>>
>>> Maxime
>>>
>>>> Thanks!
>>>>
>>
>> technically backend knows how many vqs are there, last one is cvq...
>> not sure we need full blown remapping ...
>>
> The number of queues may not be the same between cmdline and the device.
>
> If vhost-user device cmdline wants more queues than offered by the
> device qemu will print an error. But the reverse (to offer the same
> number of queues or less than the device have) is valid at this
> moment.
>
> If we add cvq with this scheme, cvq index will not be the same between
> the guest and the device.
>
> vhost-vdpa totally ignores the queues parameter, so we're losing the
> opportunity to offer a consistent config space in the event of a
> migration. I suggest we should act the same way as I'm proposing here
> with vhost-user, so:
> * QEMU can block the migration in the case the destination cannot
> offer the same number of queues.
> * The guest will not see a change of the config space under its feets.


As we discussed in the past, it would be easier to fail the device 
initialization in this case.

Thanks


>
> Now there are other fields in the config space for sure (mtu, rss
> size, etc), but I think the most complex case is the number of queues
> because cvq.
>
> Is that clearer?
>
> Thanks!
>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
  2023-02-02  3:41       ` Jason Wang
@ 2023-02-02 18:32         ` Eugenio Perez Martin
  0 siblings, 0 replies; 23+ messages in thread
From: Eugenio Perez Martin @ 2023-02-02 18:32 UTC (permalink / raw)
  To: Jason Wang
  Cc: Maxime Coquelin, Michael Tsirkin, Cindy Lu, Stefano Garzarella,
	qemu-level, Laurent Vivier, Juan Quintela

On Thu, Feb 2, 2023 at 4:41 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2023/2/1 15:49, Eugenio Perez Martin wrote:
> > On Wed, Feb 1, 2023 at 4:29 AM Jason Wang <jasowang@redhat.com> wrote:
> >> On Wed, Feb 1, 2023 at 3:11 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >>> On Tue, Jan 31, 2023 at 8:10 PM Eugenio Perez Martin
> >>> <eperezma@redhat.com> wrote:
> >>>> Hi,
> >>>>
> >>>> The current approach of offering an emulated CVQ to the guest and map
> >>>> the commands to vhost-user is not scaling well:
> >>>> * Some devices already offer it, so the transformation is redundant.
> >>>> * There is no support for commands with variable length (RSS?)
> >>>>
> >>>> We can solve both of them by offering it through vhost-user the same
> >>>> way as vhost-vdpa do. With this approach qemu needs to track the
> >>>> commands, for similar reasons as vhost-vdpa: qemu needs to track the
> >>>> device status for live migration. vhost-user should use the same SVQ
> >>>> code for this, so we avoid duplications.
> >>>>
> >>>> One of the challenges here is to know what virtqueue to shadow /
> >>>> isolate. The vhost-user device may not have the same queues as the
> >>>> device frontend:
> >>>> * The first depends on the actual vhost-user device, and qemu fetches
> >>>> it with VHOST_USER_GET_QUEUE_NUM at the moment.
> >>>> * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
> >>>>
> >>>> For the device, the CVQ is the last one it offers, but for the guest
> >>>> it is the last one offered in config space.
> >>>>
> >>>> To create a new vhost-user command to decrease that maximum number of
> >>>> queues may be an option. But we can do it without adding more
> >>>> commands, remapping the CVQ index at virtqueue setup. I think it
> >>>> should be doable using (struct vhost_dev).vq_index and maybe a few
> >>>> adjustments here and there.
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> Thanks!
> >>>
> >>> (Starting a separated thread to vhost-vdpa related use case)
> >>>
> >>> This could also work for vhost-vdpa if we ever decide to honor netdev
> >>> queues argument. It is totally ignored now, as opposed to the rest of
> >>> backends:
> >>> * vhost-kernel, whose tap device has the requested number of queues.
> >>> * vhost-user, that errors with ("you are asking more queues than
> >>> supported") if the vhost-user parent device has less queues than
> >>> requested (by vhost-user msg VHOST_USER_GET_QUEUE_NUM).
> >>>
> >>> One of the reasons for this is that device configuration space is
> >>> totally passthrough, with the values for mtu, rss conditions, etc.
> >>> This is not ideal, as qemu cannot check src and destination
> >>> equivalence and they can change under the feets of the guest in the
> >>> event of a migration.
> >> This looks not the responsibility of qemu but the upper layer (to
> >> provision the same config/features in src/dst).
> > I think both share it. Or, at least, that it is inconsistent that QEMU
> > is in charge of checking / providing consistency for virtio features,
> > but not virtio-net config space.
> >
> > If we follow that to the extreme, we could simply delete the feature
> > checks, right?
>
>
> Just to make sure we are at the same page.
>
> If you mean deleting the feature checks in Qemu, then I think we can't
> do that.
>

So my point is: is it expected that the user can trust qemu to migrate
features like packed=off/on or tx/rx_queue_size=N, but not mtu=N or
queues=N?

I know the difference is whether the field lives in the virtio common
config or in the virtio-net config space; at a slightly higher level,
whether the feature is common to all virtio devices or specific to
virtio-net. But from the user's POV the number of queues is a gray area
here: would it migrate, or wouldn't it? :)

> What I meant is.
>
> Consider vDPA is provisioned (either netlink or other way) with featureX
> and configY. It would be sufficient to validate if the emulated device
> features and configs matches exactly what vDPA device had.
>
> Technically, it should be possible to doing any mediation in the middle
> but it may cause a lot of troubles in the management and others, consider:
>
> featureX is not provisioned but emulated by Qemu, then it's almost
> impossible for the management to check the migration compatibility. If
> feature X can be easily emulated, it should be done in the layer of vDPA
> parent not Qemu, then it could be recognized by the management.
>

I kind of buy this, although I think it would be solvable by asking
qemu what features it emulates and then adding those to the feature
bits mix.

But I'm not proposing to emulate features (here :) ). I'm proposing to
treat the device config space the same way we treat
virtio_pci_common_cfg and emulate it all the time, effectively
homogenizing it the same way vhost-user etc. is homogenized.

I understand the provision tool is a way to do it, maybe even a more
convenient one. But do all devices support it? Is it reasonable to
expect that all devices that will be migrated (into) will support it?

>
> >
> >>> External tools are needed for this, duplicating
> >>> part of the effort.
> >>>
> >>> Start intercepting config space accesses and offering an emulated one
> >>> to the guest with this kind of adjustments is beneficial, as it makes
> >>> vhost-vdpa more similar to the rest of backends, making the surprise
> >>> on a change way lower.
> >> This probably needs more thought, since vDPA already provides a kind
> >> of emulation in the kernel. My understanding is that it would be
> >> sufficient to add checks to make sure the config that guests see is
> >> consistent with what host provisioned?
> >>
> > With host provisioned you mean with "vdpa" tool or with qemu?
>
>
> Make sure the features and config of emulated device provided by Qemu
> matches the vDPA device provisioned via netlink or other mgmt API.
>

Yes, that is doable for sure. It should be enough to fetch the config
with the VHOST_VDPA_GET_CONFIG ioctl, shouldn't it?
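Something like the sketch below is what I have in mind. The ioctl and
struct vhost_vdpa_config come from linux/vhost.h; the helper itself and
where qemu would call it are made up for illustration.

#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Hypothetical helper: compare the vDPA device config against the one
 * the emulated frontend would expose to the guest. */
static int vdpa_config_matches(int vdpa_fd, const void *emulated_cfg,
                               uint32_t len)
{
    struct vhost_vdpa_config *cfg;
    int ret;

    cfg = calloc(1, sizeof(*cfg) + len);
    if (!cfg) {
        return -ENOMEM;
    }
    cfg->off = 0;
    cfg->len = len;

    ret = ioctl(vdpa_fd, VHOST_VDPA_GET_CONFIG, cfg);
    if (ret == 0 && memcmp(cfg->buf, emulated_cfg, len) != 0) {
        ret = -EINVAL;  /* provisioned and emulated configs disagree */
    }

    free(cfg);
    return ret;
}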

>
> > Also, we
> > need a way to communicate the guest values to it If those checks are
> > added in the kernel.
> >
> > The reasoning here is the same as above: QEMU already filters features
> > with its own emulated layer, so the operator can specify a feature
> > that will never appear to the guest.
>
>
> This needs to be done at the time of vDPA device provisioning. Otherwise
> we will end up with a lot of corner cases. E.g if 8 queue pairs is
> provisioned, do we allow starting a guest with 4 queue pairs?
>

In my proposal, qemu would adjust the number of queue pairs the guest
sees to 4 qps.
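For illustration, a minimal sketch of that clamp; the field layout
follows the virtio-net config space, but the helper and its call site
(somewhere in the config read path) are hypothetical:

#include <endian.h>
#include <stdint.h>

/* Prefix of struct virtio_net_config as the guest reads it. */
struct virtio_net_config_hdr {
    uint8_t  mac[6];
    uint16_t status;
    uint16_t max_virtqueue_pairs;  /* little-endian on the wire */
    uint16_t mtu;
    /* ... remaining fields omitted ... */
} __attribute__((packed));

/* Hypothetical helper: never expose more queue pairs than the operator
 * asked for with the netdev queues= parameter, even if the backend
 * reports more. */
static void clamp_guest_queue_pairs(struct virtio_net_config_hdr *cfg,
                                    uint16_t cmdline_queue_pairs)
{
    uint16_t backend_pairs = le16toh(cfg->max_virtqueue_pairs);

    if (backend_pairs > cmdline_queue_pairs) {
        cfg->max_virtqueue_pairs = htole16(cmdline_queue_pairs);
    }
}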

>
> >   It has other uses (abstract
> > between transport for example), but feature filtering is definitely a
> > thing there.
> >
> > A feature set to off in a VM (or that does not exist in that
> > particular qemu version) will never appear as on even in the case of
> > migration to modern qemu versions.
> >
> > We don't have the equivalent protection for device config space. QEMU
> > could assure a consistent MTU, number of queues, etc for the guest in
> > virtio_net_get_config (and equivalent for other kinds of devices).
> > QEMU already has some transformations there. It shouldn't take a lot
> > of code.
> >
> > Having said that:
> > * I'm ok with starting just with checks there instead of
> > transformations like the queues remap proposed here.
>
>
> I think we need to keep thing easier. Technically, we could do any kind
> of the mediation/emulation via Qemu, but we need only implement the one
> that is really needed.
>

I agree with this point, but I think we are just moving complexity.

Let's go back to the original vdpa question / problems:

1) There are parameters totally ignored by qemu's vhost-vdpa (mtu,
queues, etc.), and a naive operator is surprised by this behavior. And
the guest will see the config space change suddenly across a migration
if the device provisioning does not pin down every single device config
space value.

Should qemu forbid these cmdline parameters when the backend is
vhost-vdpa? For example:
* The netdev queues= parameter (unused since its introduction)
* mtu, speed, duplex, etc. in the qemu cmdline when the backend is vdpa.

2) For the second problem, maybe a spurious config interrupt is
missing? Is the device allowed to change all of these fields, like
reducing the rss max table length? Or should the provision tool be able
to fetch all of these values from the original device and send them to
the provision tool in the destination?

Thanks!

> Queue remapping might complicate a lot stuffs like notification area
> mapping etc.
>
> Thanks
>
>
> > * If we choose not to implement it, I'm not proposing to actually
> > delete the features checks, as I see them useful :).
> >
> > Thanks!
> >
>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
  2023-02-02  3:44       ` Jason Wang
@ 2023-02-02 18:37         ` Eugenio Perez Martin
  0 siblings, 0 replies; 23+ messages in thread
From: Eugenio Perez Martin @ 2023-02-02 18:37 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Maxime Coquelin, Cindy Lu,
	Stefano Garzarella, qemu-level, Laurent Vivier, Juan Quintela

On Thu, Feb 2, 2023 at 4:45 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2023/2/1 19:48, Eugenio Perez Martin wrote:
> > On Wed, Feb 1, 2023 at 12:20 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >> On Wed, Feb 01, 2023 at 12:14:18PM +0100, Maxime Coquelin wrote:
> >>> Thanks Eugenio for working on this.
> >>>
> >>> On 1/31/23 20:10, Eugenio Perez Martin wrote:
> >>>> Hi,
> >>>>
> >>>> The current approach of offering an emulated CVQ to the guest and map
> >>>> the commands to vhost-user is not scaling well:
> >>>> * Some devices already offer it, so the transformation is redundant.
> >>>> * There is no support for commands with variable length (RSS?)
> >>>>
> >>>> We can solve both of them by offering it through vhost-user the same
> >>>> way as vhost-vdpa do. With this approach qemu needs to track the
> >>>> commands, for similar reasons as vhost-vdpa: qemu needs to track the
> >>>> device status for live migration. vhost-user should use the same SVQ
> >>>> code for this, so we avoid duplications.
> >>>>
> >>>> One of the challenges here is to know what virtqueue to shadow /
> >>>> isolate. The vhost-user device may not have the same queues as the
> >>>> device frontend:
> >>>> * The first depends on the actual vhost-user device, and qemu fetches
> >>>> it with VHOST_USER_GET_QUEUE_NUM at the moment.
> >>>> * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
> >>>>
> >>>> For the device, the CVQ is the last one it offers, but for the guest
> >>>> it is the last one offered in config space.
> >>>>
> >>>> To create a new vhost-user command to decrease that maximum number of
> >>>> queues may be an option. But we can do it without adding more
> >>>> commands, remapping the CVQ index at virtqueue setup. I think it
> >>>> should be doable using (struct vhost_dev).vq_index and maybe a few
> >>>> adjustments here and there.
> >>>>
> >>>> Thoughts?
> >>> I am fine with both proposals.
> >>> I think index remapping will require a bit more rework in the DPDK
> >>> Vhost-user library, but nothing insurmountable.
> >>>
> >>> I am currently working on a PoC adding support for VDUSE in the DPDK
> >>> Vhost library, and recently added control queue support. We can reuse it
> >>> if we want to prototype your proposal.
> >>>
> >>> Maxime
> >>>
> >>>> Thanks!
> >>>>
> >>
> >> technically backend knows how many vqs are there, last one is cvq...
> >> not sure we need full blown remapping ...
> >>
> > The number of queues may not be the same between cmdline and the device.
> >
> > If vhost-user device cmdline wants more queues than offered by the
> > device qemu will print an error. But the reverse (to offer the same
> > number of queues or less than the device have) is valid at this
> > moment.
> >
> > If we add cvq with this scheme, cvq index will not be the same between
> > the guest and the device.
> >
> > vhost-vdpa totally ignores the queues parameter, so we're losing the
> > opportunity to offer a consistent config space in the event of a
> > migration. I suggest we should act the same way as I'm proposing here
> > with vhost-user, so:
> > * QEMU can block the migration in the case the destination cannot
> > offer the same number of queues.
> > * The guest will not see a change of the config space under its feets.
>
>
> As we discussed in the past, it would be easier to fail the device
> initialization in this case.
>

But qemu does not know the source config space, so it cannot check that
they are equal. I think the qemu cmdline combined with the migration
protocol is the safety measure here. It already checks the features; the
plan is to extend that check to the config space.
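To make it concrete, the shape of the check I mean (a hypothetical
hook, not existing QEMU code): the source sends its view of the
relevant config fields in the migration stream, and the destination
refuses to load if its own view differs, the same way feature bits are
validated today.

#include <stdint.h>
#include <string.h>

/* Fields the guest may have observed; sent by the source. */
struct migrated_net_cfg {
    uint16_t max_virtqueue_pairs;
    uint16_t mtu;
    /* ... */
};

/* Hypothetical post-load check on the destination. */
static int net_cfg_post_load(const struct migrated_net_cfg *from_src,
                             const struct migrated_net_cfg *at_dst)
{
    if (memcmp(from_src, at_dst, sizeof(*from_src)) != 0) {
        return -1;  /* destination cannot offer the same config */
    }
    return 0;
}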

Thanks!

> Thanks
>
>
> >
> > Now there are other fields in the config space for sure (mtu, rss
> > size, etc), but I think the most complex case is the number of queues
> > because cvq.
> >
> > Is that clearer?
> >
> > Thanks!
> >
>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
  2023-02-01 11:19   ` Eugenio Perez Martin
@ 2023-03-02  8:48     ` Maxime Coquelin
  0 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2023-03-02  8:48 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Jason Wang, Michael Tsirkin, Cindy Lu, Stefano Garzarella,
	qemu-level, Laurent Vivier, Juan Quintela



On 2/1/23 12:19, Eugenio Perez Martin wrote:
> On Wed, Feb 1, 2023 at 12:14 PM Maxime Coquelin
> <maxime.coquelin@redhat.com> wrote:
>>
>> Thanks Eugenio for working on this.
>>
>> On 1/31/23 20:10, Eugenio Perez Martin wrote:
>>> Hi,
>>>
>>> The current approach of offering an emulated CVQ to the guest and map
>>> the commands to vhost-user is not scaling well:
>>> * Some devices already offer it, so the transformation is redundant.
>>> * There is no support for commands with variable length (RSS?)
>>>
>>> We can solve both of them by offering it through vhost-user the same
>>> way as vhost-vdpa do. With this approach qemu needs to track the
>>> commands, for similar reasons as vhost-vdpa: qemu needs to track the
>>> device status for live migration. vhost-user should use the same SVQ
>>> code for this, so we avoid duplications.
>>>
>>> One of the challenges here is to know what virtqueue to shadow /
>>> isolate. The vhost-user device may not have the same queues as the
>>> device frontend:
>>> * The first depends on the actual vhost-user device, and qemu fetches
>>> it with VHOST_USER_GET_QUEUE_NUM at the moment.
>>> * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
>>>
>>> For the device, the CVQ is the last one it offers, but for the guest
>>> it is the last one offered in config space.
>>>
>>> To create a new vhost-user command to decrease that maximum number of
>>> queues may be an option. But we can do it without adding more
>>> commands, remapping the CVQ index at virtqueue setup. I think it
>>> should be doable using (struct vhost_dev).vq_index and maybe a few
>>> adjustments here and there.
>>>
>>> Thoughts?
>>
>> I am fine with both proposals.
>> I think index remapping will require a bit more rework in the DPDK
>> Vhost-user library, but nothing insurmountable.
>>
> 
> Can you expand on this? I think my proposal does not require modifying
> anything in the device. Or you mean hw/virtio/vhost-user.c and
> similar?

I had a deeper look, and both proposals should not be very different in
terms of rework on the DPDK side.

Maxime

> Thanks!
> 
>> I am currently working on a PoC adding support for VDUSE in the DPDK
>> Vhost library, and recently added control queue support. We can reuse it
>> if we want to prototype your proposal.
>>
> 
> Sure, that would be great.
> 
> Thanks!
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
  2023-02-01 11:20   ` Michael S. Tsirkin
  2023-02-01 11:48     ` Eugenio Perez Martin
@ 2023-03-08 10:33     ` Maxime Coquelin
  2023-03-08 12:15       ` Michael S. Tsirkin
  1 sibling, 1 reply; 23+ messages in thread
From: Maxime Coquelin @ 2023-03-08 10:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eugenio Perez Martin, Jason Wang, Cindy Lu, Stefano Garzarella,
	qemu-level, Laurent Vivier, Juan Quintela

Hello Michael,

On 2/1/23 12:20, Michael S. Tsirkin wrote:
> On Wed, Feb 01, 2023 at 12:14:18PM +0100, Maxime Coquelin wrote:
>> Thanks Eugenio for working on this.
>>
>> On 1/31/23 20:10, Eugenio Perez Martin wrote:
>>> Hi,
>>>
>>> The current approach of offering an emulated CVQ to the guest and map
>>> the commands to vhost-user is not scaling well:
>>> * Some devices already offer it, so the transformation is redundant.
>>> * There is no support for commands with variable length (RSS?)
>>>
>>> We can solve both of them by offering it through vhost-user the same
>>> way as vhost-vdpa do. With this approach qemu needs to track the
>>> commands, for similar reasons as vhost-vdpa: qemu needs to track the
>>> device status for live migration. vhost-user should use the same SVQ
>>> code for this, so we avoid duplications.
>>>
>>> One of the challenges here is to know what virtqueue to shadow /
>>> isolate. The vhost-user device may not have the same queues as the
>>> device frontend:
>>> * The first depends on the actual vhost-user device, and qemu fetches
>>> it with VHOST_USER_GET_QUEUE_NUM at the moment.
>>> * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
>>>
>>> For the device, the CVQ is the last one it offers, but for the guest
>>> it is the last one offered in config space.
>>>
>>> To create a new vhost-user command to decrease that maximum number of
>>> queues may be an option. But we can do it without adding more
>>> commands, remapping the CVQ index at virtqueue setup. I think it
>>> should be doable using (struct vhost_dev).vq_index and maybe a few
>>> adjustments here and there.
>>>
>>> Thoughts?
>>
>> I am fine with both proposals.
>> I think index remapping will require a bit more rework in the DPDK
>> Vhost-user library, but nothing insurmountable.
>>
>> I am currently working on a PoC adding support for VDUSE in the DPDK
>> Vhost library, and recently added control queue support. We can reuse it
>> if we want to prototype your proposal.
>>
>> Maxime
>>
>>> Thanks!
>>>
> 
> 
> technically backend knows how many vqs are there, last one is cvq...
> not sure we need full blown remapping ...
> 

Before VHOST_USER_PROTOCOL_F_STATUS was supported by qemu (very
recently, in v7.2.0), the backend had no way to be sure the frontend
would not configure new queue pairs; this is not defined in the
spec AFAICT [0]. In the DPDK Vhost library, we notify the application it can
start to use the device once the first queue pair is set up and enabled,
then we notify the application when new queues are ready to be
processed. In this case, I think we cannot deduce whether a queue is a
data or a control queue when it is set up.

When VHOST_USER_PROTOCOL_F_STATUS is supported, we know no more queues
will be configured once the DRIVER_OK status is set. In this case, we
can deduce at DRIVER_OK time that the last queue set up is the control
queue, if the number of queues is odd.

Using index remapping, we would know directly at queue setup time
whether a queue is a data or a control queue based on its index value,
i.e. whether the index equals the max queue index supported by the
backend. But thinking about it again, we should probably at least back
this with a protocol feature to avoid issues with legacy backends.
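To make the two options concrete, the backend-side checks could look
roughly like this (pseudologic only, not the actual DPDK Vhost code):

#include <stdbool.h>
#include <stdint.h>

/* Case 1: F_STATUS negotiated, called when DRIVER_OK is seen.  Data
 * queues come in rx/tx pairs, so an odd total means a control queue. */
static bool vq_is_cvq_at_driver_ok(uint32_t nr_configured_vqs,
                                   uint32_t vq_index)
{
    if (nr_configured_vqs % 2 == 0) {
        return false;  /* even count: no control queue was set up */
    }
    return vq_index == nr_configured_vqs - 1;
}

/* Case 2: index remapping, decided at queue setup time without waiting
 * for DRIVER_OK. */
static bool vq_is_cvq_remapped(uint32_t backend_max_vq_index,
                               uint32_t vq_index)
{
    return vq_index == backend_max_vq_index;
}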

I hope it clarifies, let me know if anything unclear.

Thanks,
Maxime

[0]: 
https://elixir.bootlin.com/qemu/latest/source/docs/interop/vhost-user.rst



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
  2023-03-08 10:33     ` Maxime Coquelin
@ 2023-03-08 12:15       ` Michael S. Tsirkin
  2023-03-10 10:33         ` Maxime Coquelin
  0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2023-03-08 12:15 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Eugenio Perez Martin, Jason Wang, Cindy Lu, Stefano Garzarella,
	qemu-level, Laurent Vivier, Juan Quintela

On Wed, Mar 08, 2023 at 11:33:45AM +0100, Maxime Coquelin wrote:
> Hello Michael,
> 
> On 2/1/23 12:20, Michael S. Tsirkin wrote:
> > On Wed, Feb 01, 2023 at 12:14:18PM +0100, Maxime Coquelin wrote:
> > > Thanks Eugenio for working on this.
> > > 
> > > On 1/31/23 20:10, Eugenio Perez Martin wrote:
> > > > Hi,
> > > > 
> > > > The current approach of offering an emulated CVQ to the guest and map
> > > > the commands to vhost-user is not scaling well:
> > > > * Some devices already offer it, so the transformation is redundant.
> > > > * There is no support for commands with variable length (RSS?)
> > > > 
> > > > We can solve both of them by offering it through vhost-user the same
> > > > way as vhost-vdpa do. With this approach qemu needs to track the
> > > > commands, for similar reasons as vhost-vdpa: qemu needs to track the
> > > > device status for live migration. vhost-user should use the same SVQ
> > > > code for this, so we avoid duplications.
> > > > 
> > > > One of the challenges here is to know what virtqueue to shadow /
> > > > isolate. The vhost-user device may not have the same queues as the
> > > > device frontend:
> > > > * The first depends on the actual vhost-user device, and qemu fetches
> > > > it with VHOST_USER_GET_QUEUE_NUM at the moment.
> > > > * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
> > > > 
> > > > For the device, the CVQ is the last one it offers, but for the guest
> > > > it is the last one offered in config space.
> > > > 
> > > > To create a new vhost-user command to decrease that maximum number of
> > > > queues may be an option. But we can do it without adding more
> > > > commands, remapping the CVQ index at virtqueue setup. I think it
> > > > should be doable using (struct vhost_dev).vq_index and maybe a few
> > > > adjustments here and there.
> > > > 
> > > > Thoughts?
> > > 
> > > I am fine with both proposals.
> > > I think index remapping will require a bit more rework in the DPDK
> > > Vhost-user library, but nothing insurmountable.
> > > 
> > > I am currently working on a PoC adding support for VDUSE in the DPDK
> > > Vhost library, and recently added control queue support. We can reuse it
> > > if we want to prototype your proposal.
> > > 
> > > Maxime
> > > 
> > > > Thanks!
> > > > 
> > 
> > 
> > technically backend knows how many vqs are there, last one is cvq...
> > not sure we need full blown remapping ...
> > 
> 
> Before VHOST_USER_PROTOCOL_F_STATUS was supported by qemu (very
> recently, v7.2.0), we had no way for the backend to be sure the
> frontend won't configure new queue pairs, this not not defined in the
> spec AFAICT [0]. In DPDK Vhost library, we notify the application it can
> start to use the device once the first queue pair is setup and enabled,
> then we notify the application when new queues are ready to be
> processed. In this case, I think we cannot deduce whether the queue is a
> data or a control queue when it is setup.
> 
> When VHOST_USER_PROTOCOL_F_STATUS is supported, we know no more queues
> will be configured once the DRIVER_OK status is set. In this case, we
> can deduce the last queue setup will be the control queue at DRIVER_OK
> time if the number of queues is odd.
> 
> Using index remapping, we would know directly at queue setup time
> whether this is a data or control queue based on its index value,
> i.e. if the index equals to max queue index supported by the backend.
> But thinking at it again, we may at least back this with a protocol
> feature to avoid issues with legacy backends.
> 
> I hope it clarifies, let me know if anything unclear.
> 
> Thanks,
> Maxime
> 
> [0]:
> https://elixir.bootlin.com/qemu/latest/source/docs/interop/vhost-user.rst


OK maybe document this.

-- 
MST



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
  2023-03-08 12:15       ` Michael S. Tsirkin
@ 2023-03-10 10:33         ` Maxime Coquelin
  2023-03-10 10:49           ` Michael S. Tsirkin
  0 siblings, 1 reply; 23+ messages in thread
From: Maxime Coquelin @ 2023-03-10 10:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eugenio Perez Martin, Jason Wang, Cindy Lu, Stefano Garzarella,
	qemu-level, Laurent Vivier, Juan Quintela, Eelco Chaudron



On 3/8/23 13:15, Michael S. Tsirkin wrote:
> On Wed, Mar 08, 2023 at 11:33:45AM +0100, Maxime Coquelin wrote:
>> Hello Michael,
>>
>> On 2/1/23 12:20, Michael S. Tsirkin wrote:
>>> On Wed, Feb 01, 2023 at 12:14:18PM +0100, Maxime Coquelin wrote:
>>>> Thanks Eugenio for working on this.
>>>>
>>>> On 1/31/23 20:10, Eugenio Perez Martin wrote:
>>>>> Hi,
>>>>>
>>>>> The current approach of offering an emulated CVQ to the guest and map
>>>>> the commands to vhost-user is not scaling well:
>>>>> * Some devices already offer it, so the transformation is redundant.
>>>>> * There is no support for commands with variable length (RSS?)
>>>>>
>>>>> We can solve both of them by offering it through vhost-user the same
>>>>> way as vhost-vdpa do. With this approach qemu needs to track the
>>>>> commands, for similar reasons as vhost-vdpa: qemu needs to track the
>>>>> device status for live migration. vhost-user should use the same SVQ
>>>>> code for this, so we avoid duplications.
>>>>>
>>>>> One of the challenges here is to know what virtqueue to shadow /
>>>>> isolate. The vhost-user device may not have the same queues as the
>>>>> device frontend:
>>>>> * The first depends on the actual vhost-user device, and qemu fetches
>>>>> it with VHOST_USER_GET_QUEUE_NUM at the moment.
>>>>> * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
>>>>>
>>>>> For the device, the CVQ is the last one it offers, but for the guest
>>>>> it is the last one offered in config space.
>>>>>
>>>>> To create a new vhost-user command to decrease that maximum number of
>>>>> queues may be an option. But we can do it without adding more
>>>>> commands, remapping the CVQ index at virtqueue setup. I think it
>>>>> should be doable using (struct vhost_dev).vq_index and maybe a few
>>>>> adjustments here and there.
>>>>>
>>>>> Thoughts?
>>>>
>>>> I am fine with both proposals.
>>>> I think index remapping will require a bit more rework in the DPDK
>>>> Vhost-user library, but nothing insurmountable.
>>>>
>>>> I am currently working on a PoC adding support for VDUSE in the DPDK
>>>> Vhost library, and recently added control queue support. We can reuse it
>>>> if we want to prototype your proposal.
>>>>
>>>> Maxime
>>>>
>>>>> Thanks!
>>>>>
>>>
>>>
>>> technically backend knows how many vqs are there, last one is cvq...
>>> not sure we need full blown remapping ...
>>>
>>
>> Before VHOST_USER_PROTOCOL_F_STATUS was supported by qemu (very
>> recently, v7.2.0), we had no way for the backend to be sure the
>> frontend won't configure new queue pairs, this not not defined in the
>> spec AFAICT [0]. In DPDK Vhost library, we notify the application it can
>> start to use the device once the first queue pair is setup and enabled,
>> then we notify the application when new queues are ready to be
>> processed. In this case, I think we cannot deduce whether the queue is a
>> data or a control queue when it is setup.
>>
>> When VHOST_USER_PROTOCOL_F_STATUS is supported, we know no more queues
>> will be configured once the DRIVER_OK status is set. In this case, we
>> can deduce the last queue setup will be the control queue at DRIVER_OK
>> time if the number of queues is odd.
>>
>> Using index remapping, we would know directly at queue setup time
>> whether this is a data or control queue based on its index value,
>> i.e. if the index equals to max queue index supported by the backend.
>> But thinking at it again, we may at least back this with a protocol
>> feature to avoid issues with legacy backends.
>>
>> I hope it clarifies, let me know if anything unclear.
>>
>> Thanks,
>> Maxime
>>
>> [0]:
>> https://elixir.bootlin.com/qemu/latest/source/docs/interop/vhost-user.rst
> 
> 
> OK maybe document this.

Sure, working on it... But I just found a discrepancy related to
VHOST_USER_GET_QUEUE_NUM between the spec and the frontend/backend
implementations.

In the spec [0], the VHOST_USER_GET_QUEUE_NUM reply is the number of queues.
In the Qemu Vhost-user Net frontend [1], VHOST_USER_GET_QUEUE_NUM is handled
as the number of queue *pairs*, as does the DPDK Vhost library. The
Vhost-user-bridge Qemu test application handles it as the number of queues.

Other device types seem to handle it as the number of queues, which
makes sense since they don't have the notion of queue pair.
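Just to illustrate the practical consequence of the two readings
(hypothetical helpers, not the actual QEMU or DPDK code):

#include <stdbool.h>
#include <stdint.h>

/* Spec wording: the reply is a number of queues. */
static uint32_t data_queues_if_reply_is_queues(uint64_t reply, bool has_cvq)
{
    return (uint32_t)reply - (has_cvq ? 1 : 0);
}

/* QEMU net frontend / DPDK Vhost: the reply is a number of queue pairs,
 * each pair being one rx plus one tx queue. */
static uint32_t data_queues_if_reply_is_pairs(uint64_t reply)
{
    return (uint32_t)(2 * reply);
}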

Fixing the QEMU and DPDK implementations would require a new protocol 
feature bit so as not to break compatibility with older versions.

So maybe we should state in the spec that for network devices the
VHOST_USER_GET_QUEUE_NUM reply represents the number of queue pairs, and
also fix vhost-user-bridge to reply with the number of queue pairs?

Maxime

[0]: 
https://elixir.bootlin.com/qemu/latest/source/docs/interop/vhost-user.rst#L1091
[1]: https://elixir.bootlin.com/qemu/latest/source/net/vhost-user.c#L69



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
  2023-03-10 10:33         ` Maxime Coquelin
@ 2023-03-10 10:49           ` Michael S. Tsirkin
  0 siblings, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2023-03-10 10:49 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Eugenio Perez Martin, Jason Wang, Cindy Lu, Stefano Garzarella,
	qemu-level, Laurent Vivier, Juan Quintela, Eelco Chaudron

On Fri, Mar 10, 2023 at 11:33:42AM +0100, Maxime Coquelin wrote:
> 
> 
> On 3/8/23 13:15, Michael S. Tsirkin wrote:
> > On Wed, Mar 08, 2023 at 11:33:45AM +0100, Maxime Coquelin wrote:
> > > Hello Michael,
> > > 
> > > On 2/1/23 12:20, Michael S. Tsirkin wrote:
> > > > On Wed, Feb 01, 2023 at 12:14:18PM +0100, Maxime Coquelin wrote:
> > > > > Thanks Eugenio for working on this.
> > > > > 
> > > > > On 1/31/23 20:10, Eugenio Perez Martin wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > The current approach of offering an emulated CVQ to the guest and map
> > > > > > the commands to vhost-user is not scaling well:
> > > > > > * Some devices already offer it, so the transformation is redundant.
> > > > > > * There is no support for commands with variable length (RSS?)
> > > > > > 
> > > > > > We can solve both of them by offering it through vhost-user the same
> > > > > > way as vhost-vdpa do. With this approach qemu needs to track the
> > > > > > commands, for similar reasons as vhost-vdpa: qemu needs to track the
> > > > > > device status for live migration. vhost-user should use the same SVQ
> > > > > > code for this, so we avoid duplications.
> > > > > > 
> > > > > > One of the challenges here is to know what virtqueue to shadow /
> > > > > > isolate. The vhost-user device may not have the same queues as the
> > > > > > device frontend:
> > > > > > * The first depends on the actual vhost-user device, and qemu fetches
> > > > > > it with VHOST_USER_GET_QUEUE_NUM at the moment.
> > > > > > * The qemu device frontend's is set by netdev queues= cmdline parameter in qemu
> > > > > > 
> > > > > > For the device, the CVQ is the last one it offers, but for the guest
> > > > > > it is the last one offered in config space.
> > > > > > 
> > > > > > To create a new vhost-user command to decrease that maximum number of
> > > > > > queues may be an option. But we can do it without adding more
> > > > > > commands, remapping the CVQ index at virtqueue setup. I think it
> > > > > > should be doable using (struct vhost_dev).vq_index and maybe a few
> > > > > > adjustments here and there.
> > > > > > 
> > > > > > Thoughts?
> > > > > 
> > > > > I am fine with both proposals.
> > > > > I think index remapping will require a bit more rework in the DPDK
> > > > > Vhost-user library, but nothing insurmountable.
> > > > > 
> > > > > I am currently working on a PoC adding support for VDUSE in the DPDK
> > > > > Vhost library, and recently added control queue support. We can reuse it
> > > > > if we want to prototype your proposal.
> > > > > 
> > > > > Maxime
> > > > > 
> > > > > > Thanks!
> > > > > > 
> > > > 
> > > > 
> > > > technically backend knows how many vqs are there, last one is cvq...
> > > > not sure we need full blown remapping ...
> > > > 
> > > 
> > > Before VHOST_USER_PROTOCOL_F_STATUS was supported by qemu (very
> > > recently, v7.2.0), we had no way for the backend to be sure the
> > > frontend won't configure new queue pairs, this not not defined in the
> > > spec AFAICT [0]. In DPDK Vhost library, we notify the application it can
> > > start to use the device once the first queue pair is setup and enabled,
> > > then we notify the application when new queues are ready to be
> > > processed. In this case, I think we cannot deduce whether the queue is a
> > > data or a control queue when it is setup.
> > > 
> > > When VHOST_USER_PROTOCOL_F_STATUS is supported, we know no more queues
> > > will be configured once the DRIVER_OK status is set. In this case, we
> > > can deduce the last queue setup will be the control queue at DRIVER_OK
> > > time if the number of queues is odd.
> > > 
> > > Using index remapping, we would know directly at queue setup time
> > > whether this is a data or control queue based on its index value,
> > > i.e. if the index equals to max queue index supported by the backend.
> > > But thinking at it again, we may at least back this with a protocol
> > > feature to avoid issues with legacy backends.
> > > 
> > > I hope it clarifies, let me know if anything unclear.
> > > 
> > > Thanks,
> > > Maxime
> > > 
> > > [0]:
> > > https://elixir.bootlin.com/qemu/latest/source/docs/interop/vhost-user.rst
> > 
> > 
> > OK maybe document this.
> 
> Sure, working on it... But I just found a discrepancy related to
> VHOST_USER_GET_QUEUE_NUM between the spec and the frontend/backend
> implementations.
> 
> In the spec [0], VHOST_USER_GET_QUEUE_NUM reply is the number of queues.
> In Qemu Vhost-user Net frontend [1], VHOST_USER_GET_QUEUE_NUM is handled
> as the number of queue *pairs*, and so does the DPDK Vhost library.
> Vhost-user-bridge Qemu test application handles it as the number of queues.

Weird, how does Vhost-user-bridge work then? I guess it just
ignores the extra queues that were not initialized?

> Other device types seem to handle it as the number of queues, which
> makes sense since they don't have the notion of queue pair.
> 
> Fixing the QEMU and DPDK implementations would require a new protocol
> feature bit not to break compatibility with older versions.
> 
> So maybe we should add in the spec that for network devices,
> VHOST_USER_GET_QUEUE_NUM reply represents the number of queue pairs, and
> also fix vhost-user-bridge to reply with the number of queue pairs?
> 
> Maxime

Not sure we need to fix vhost-user-bridge - it seems to work?
In any case let's add a protocol feature to fix it for net maybe?


> [0]: https://elixir.bootlin.com/qemu/latest/source/docs/interop/vhost-user.rst#L1091
> [1]: https://elixir.bootlin.com/qemu/latest/source/net/vhost-user.c#L69



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2023-03-10 10:50 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-31 19:10 Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user Eugenio Perez Martin
2023-01-31 19:11 ` Eugenio Perez Martin
2023-01-31 21:32   ` Michael S. Tsirkin
2023-02-01  3:29   ` Jason Wang
2023-02-01  7:49     ` Eugenio Perez Martin
2023-02-01 10:44       ` Michael S. Tsirkin
2023-02-02  3:41       ` Jason Wang
2023-02-02 18:32         ` Eugenio Perez Martin
2023-02-01 10:53     ` Michael S. Tsirkin
2023-02-01  3:27 ` Jason Wang
2023-02-01  6:55   ` Eugenio Perez Martin
2023-02-02  3:02     ` Jason Wang
2023-02-01 11:14 ` Maxime Coquelin
2023-02-01 11:19   ` Eugenio Perez Martin
2023-03-02  8:48     ` Maxime Coquelin
2023-02-01 11:20   ` Michael S. Tsirkin
2023-02-01 11:48     ` Eugenio Perez Martin
2023-02-02  3:44       ` Jason Wang
2023-02-02 18:37         ` Eugenio Perez Martin
2023-03-08 10:33     ` Maxime Coquelin
2023-03-08 12:15       ` Michael S. Tsirkin
2023-03-10 10:33         ` Maxime Coquelin
2023-03-10 10:49           ` Michael S. Tsirkin
