* Re: RFC: VDPA Interrupt vector distribution
       [not found] <bc4136ed-abe0-dcc2-4dd9-31dcf3d8c179@nvidia.com>
@ 2023-01-30  8:19 ` Jason Wang
       [not found]   ` <23806cd9-ffde-778c-5fa5-b95bd1ff0b44@nvidia.com>
  0 siblings, 1 reply; 4+ messages in thread
From: Jason Wang @ 2023-01-30  8:19 UTC (permalink / raw)
  To: Eli Cohen; +Cc: mst, virtualization

Hi Eli:

On Mon, Jan 23, 2023 at 1:59 PM Eli Cohen <elic@nvidia.com> wrote:
>
> VDPA allows hardware drivers to propagate interrupts from the hardware
> directly to the vCPU used by the guest. In a typical implementation, the
> hardware driver will assign the interrupt vectors to the virtqueues and report
> this information back through the get_vq_irq() callback defined in
> struct vdpa_config_ops.
>
> Interrupt vectors can be a scarce and limited resource. For such
> cases, we can let the administrator, through the vdpa tool, set the policy
> defining how to distribute the available vectors amongst the data virtqueues.
>
> The following policies are proposed:
>
> 1. First come, first served. Assign a vector to each data virtqueue by the
>     virtqueue index. Virtqueues which could not be assigned a dedicated vector
>     would use the hardware driver to propagate interrupts using the available
>     callback mechanism.
>
>     vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=all
>
>     This is the default mode and works even if "int=all" was not specified.
>
> 2. Use round robin distribution so virtqueues could share vectors.
>     vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=all intmode=share
>
> 3. Assign vectors to RX virtqueues only.
> 3.1 Do not share vectors
>      vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=rx
> 3.2 Share vectors
>      vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=rx intmode=share
>
> 4. Assign vectors to TX virtqueues only. Can share or not, like rx.
> 5. Fail device creation if number of vectors cannot be fulfilled.
>     vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 max_vq_pairs 8 int=rx intnum=8
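
For illustration only (not part of the proposal): a minimal sketch of a parent driver handing out
its available vectors by virtqueue index, per policy 1, and reporting them through the
get_vq_irq() callback of struct vdpa_config_ops. The my_vdpa structure and its irqs[]/num_irqs
fields are hypothetical; only the callback signature comes from include/linux/vdpa.h.

    #include <linux/vdpa.h>

    /* Hypothetical parent-driver state: irqs[] holds the Linux irq numbers
     * backing the hardware vectors, num_irqs is how many the device got.
     * The array size 64 is arbitrary. */
    struct my_vdpa {
            struct vdpa_device vdev;
            int irqs[64];
            int num_irqs;
    };

    /* First come, first served: VQ i gets vector i while vectors last.
     * VQs without a dedicated vector return a negative value and keep
     * using the driver's callback mechanism. */
    static int my_vdpa_get_vq_irq(struct vdpa_device *vdev, u16 idx)
    {
            struct my_vdpa *mvdev = container_of(vdev, struct my_vdpa, vdev);

            if (idx < mvdev->num_irqs)
                    return mvdev->irqs[idx];

            return -EINVAL; /* no dedicated vector: interrupt via VQ callback */
    }

    static const struct vdpa_config_ops my_vdpa_ops = {
            /* ...other mandatory ops elided... */
            .get_vq_irq = my_vdpa_get_vq_irq,
    };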

I wonder:

1) how the administrator can know if there are sufficient resources for
one of the above policies.
2) how does the administrator know which policy is the best assuming
the resources are sufficient? (e.g. vectors to RX only or vectors to TX
only)

If it requires a vendor specific way or knowledge, I believe it's
better to code them in:

1) the vDPA parent or
2) the underlying management tool or drivers

Thanks

>
>
>
>


* Re: RFC: VDPA Interrupt vector distribution
       [not found]   ` <23806cd9-ffde-778c-5fa5-b95bd1ff0b44@nvidia.com>
@ 2023-01-30 11:34     ` Michael S. Tsirkin
       [not found]       ` <734e2553-199f-94eb-88d1-a642ec1c7490@nvidia.com>
  0 siblings, 1 reply; 4+ messages in thread
From: Michael S. Tsirkin @ 2023-01-30 11:34 UTC (permalink / raw)
  To: Eli Cohen; +Cc: virtualization

On Mon, Jan 30, 2023 at 12:01:23PM +0200, Eli Cohen wrote:
> On 30/01/2023 10:19, Jason Wang wrote:
> > Hi Eli:
> > 
> > On Mon, Jan 23, 2023 at 1:59 PM Eli Cohen <elic@nvidia.com> wrote:
> > > VDPA allows hardware drivers to propagate interrupts from the hardware
> > > directly to the vCPU used by the guest. In a typical implementation, the
> > > hardware driver will assign the interrupt vectors to the virtqueues and report
> > > this information back through the get_vq_irq() callback defined in
> > > struct vdpa_config_ops.
> > > 
> > > Interrupt vectors can be a scarce and limited resource. For such
> > > cases, we can let the administrator, through the vdpa tool, set the policy
> > > defining how to distribute the available vectors amongst the data virtqueues.
> > > 
> > > The following policies are proposed:
> > > 
> > > 1. First come, first served. Assign a vector to each data virtqueue by the
> > >      virtqueue index. Virtqueues which could not be assigned a dedicated vector
> > >      would use the hardware driver to propagate interrupts using the available
> > >      callback mechanism.
> > > 
> > >      vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=all
> > > 
> > >      This is the default mode and works even if "int=all" was not specified.
> > > 
> > > 2. Use round robin distribution so virtqueues could share vectors.
> > >      vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=all intmode=share
> > > 
> > > 3. Assign vectors to RX virtqueues only.
> > > 3.1 Do not share vectors
> > >       vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=rx
> > > 3.2 Share vectors
> > >       vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=rx intmode=share
> > > 
> > > 4. Assign vectors to TX virtqueues only. Can share or not, like rx.
> > > 5. Fail device creation if number of vectors cannot be fulfilled.
> > >      vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 max_vq_pairs 8 int=rx intnum=8
> > I wonder:
> > 
> > 1) how the administrator can know if there are sufficient resources for
> > one of the above policies.
> There's no established way to know. The idea is to use whatever there is
> assuming interrupt bypassing is always better than the callback mechanism.
> > 2) how does the administrator know which policy is the best assuming
> > the resources are sufficient? (e.g. vectors to RX only or vectors to TX
> > only)
> I don't think there's a rule of thumb here but he needs to experiment to find what
> works best for him.
> > 
> > If it requires a vendor specific way or knowledge, I believe it's
> > better to code them in:
> > 
> > 1) the vDPA parent or
> > 2) the underlying management tool or drivers
> > 
> > Thanks
> 
> I was also wondering about the current mechanism we have. The hardware
> driver reports an irq number for each VQ.
>
> The guest driver sees a virtio PCI device with as many MSI-X vectors as there
> are virtqueues.
>
> Suppose the hardware driver provided only 5 interrupt vectors while there
> are 16 VQs.
>
> Which MSI-X vector at the guest really gets a posted interrupt and which one
> uses a callback handled by the hardware driver?

Not sure I understand.
If you get a single interrupt from hardware, callback or posted,
you can only drive one interrupt to the guest, no?


> > > 
> > > 
> > > 


* Re: RFC: VDPA Interrupt vector distribution
       [not found]       ` <734e2553-199f-94eb-88d1-a642ec1c7490@nvidia.com>
@ 2023-01-31  6:02         ` Jason Wang
  2023-01-31  7:26         ` Michael S. Tsirkin
  1 sibling, 0 replies; 4+ messages in thread
From: Jason Wang @ 2023-01-31  6:02 UTC (permalink / raw)
  To: Eli Cohen; +Cc: virtualization, Michael S. Tsirkin

On Mon, Jan 30, 2023 at 7:54 PM Eli Cohen <elic@nvidia.com> wrote:
>
>
> On 30/01/2023 13:34, Michael S. Tsirkin wrote:
> > On Mon, Jan 30, 2023 at 12:01:23PM +0200, Eli Cohen wrote:
> >> On 30/01/2023 10:19, Jason Wang wrote:
> >>> Hi Eli:
> >>>
> >>> On Mon, Jan 23, 2023 at 1:59 PM Eli Cohen <elic@nvidia.com> wrote:
> >>>> VDPA allows hardware drivers to propagate interrupts from the hardware
> >>>> directly to the vCPU used by the guest. In a typical implementation, the
> >>>> hardware driver will assign the interrupt vectors to the virtqueues and report
> >>>> this information back through the get_vq_irq() callback defined in
> >>>> struct vdpa_config_ops.
> >>>>
> >>>> Interrupt vectors can be a scarce and limited resource. For such
> >>>> cases, we can let the administrator, through the vdpa tool, set the policy
> >>>> defining how to distribute the available vectors amongst the data virtqueues.
> >>>>
> >>>> The following policies are proposed:
> >>>>
> >>>> 1. First come, first served. Assign a vector to each data virtqueue by the
> >>>>       virtqueue index. Virtqueues which could not be assigned a dedicated vector
> >>>>       would use the hardware driver to propagate interrupts using the available
> >>>>       callback mechanism.
> >>>>
> >>>>       vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=all
> >>>>
> >>>>       This is the default mode and works even if "int=all" was not specified.
> >>>>
> >>>> 2. Use round robin distribution so virtqueues could share vectors.
> >>>>       vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=all intmode=share
> >>>>
> >>>> 3. Assign vectors to RX virtqueues only.
> >>>> 3.1 Do not share vectors
> >>>>        vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=rx
> >>>> 3.2 Share vectors
> >>>>        vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=rx intmode=share
> >>>>
> >>>> 4. Assign vectors to TX virtqueues only. Can share or not, like rx.
> >>>> 5. Fail device creation if number of vectors cannot be fulfilled.
> >>>>       vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 max_vq_pairs 8 int=rx intnum=8
> >>> I wonder:
> >>>
> >>> 1) how the administrator can know if there are sufficient resources for
> >>> one of the above policies.
> >> There's no established way to know. The idea is to use whatever there is
> >> assuming interrupt bypassing is always better than the callback mechanism.
> >>> 2) how does the administrator know which policy is the best assuming
> >>> the resources are sufficient? (e.g. vectors to RX only or vectors to TX
> >>> only)
> >> I don't think there's a rule of thumb here but he needs to experiment to find what
> >> works best for him.
> >>> If it requires a vendor specific way or knowledge, I believe it's
> >>> better to code them in:
> >>>
> >>> 1) the vDPA parent or
> >>> 2) the underlying management tool or drivers
> >>>
> >>> Thanks
> >> I was also wondering about the current mechanism we have. The hardware
> >> driver reports an irq number for each VQ.
> >>
> >> The guest driver sees a virtio PCI device with as many MSI-X vectors as there
> >> are virtqueues.
> >>
> >> Suppose the hardware driver provided only 5 interrupt vectors while there
> >> are 16 VQs.
> >>
> >> Which MSI-X vector at the guest really gets a posted interrupt and which one
> >> uses a callback handled by the hardware driver?
> > Not sure I understand.
> > If you get a single interrupt from hardware, callback or posted,
> > you can only drive one interrupt to the guest, no?
> >
> For every VQ I have a chance to assign an interrupt vector.
>
> Consider this scenario:
>
> mlx5_vdpa is created with 16 data virtqueues.
>
> mlx5_vdpa associates VQ0 with an interrupt vector. The rest of the VQs
> don't get assigned vectors and use the old callback mechanism.
>
> When you go to the VM and run lspci, you will see the device has 16 MSI-X
> vectors.

Note that the guest MSI-X vectors are emulated by software; you can
change their number by specifying the "vectors=X" parameter of virtio-pci.
Those MSI-X vectors are backed by eventfds which QEMU will create and pass
to both KVM and vhost-vDPA.
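
For reference, a rough userspace sketch of that wiring, assuming a QEMU-like
VMM where guest_gsi is already routed to the guest MSI-X vector (error
handling omitted, and the helper name is made up): one eventfd backs both the
KVM irqfd and the vhost-vDPA call fd of a virtqueue.

    #include <sys/eventfd.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>
    #include <linux/vhost.h>

    /* Back one guest MSI-X vector with an eventfd shared by KVM and vhost-vDPA. */
    static int wire_vq_call(int kvm_vm_fd, int vhost_vdpa_fd,
                            unsigned int vq_index, unsigned int guest_gsi)
    {
            int call_fd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);

            /* KVM injects (or posts) the guest MSI-X interrupt when call_fd fires. */
            struct kvm_irqfd irqfd = { .fd = call_fd, .gsi = guest_gsi };
            ioctl(kvm_vm_fd, KVM_IRQFD, &irqfd);

            /* vhost-vDPA signals the same eventfd for this VQ; the kernel can
             * also match it against the parent's get_vq_irq() for irq bypass. */
            struct vhost_vring_file file = { .index = vq_index, .fd = call_fd };
            ioctl(vhost_vdpa_fd, VHOST_SET_VRING_CALL, &file);

            return call_fd;
    }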

>
> Do you know which of the MSI-X vectors on the guest is the vector I
> assigned for VQ0?

The mapping from the guest MSI-X vector to VQ0 is done via
queue_msix_vector in the virtio PCI common configuration capability, and it
is under the control of the guest virtio-pci driver.
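
(queue_msix_vector is part of the standard virtio PCI common configuration
layout; an excerpt of the relevant fields from include/uapi/linux/virtio_pci.h,
with unrelated fields elided:)

    struct virtio_pci_common_cfg {
            /* ...device-wide fields... */
            __le16 msix_config;       /* vector for config-change interrupts */
            /* ... */
            __le16 queue_select;      /* the driver selects a VQ here... */
            __le16 queue_size;
            __le16 queue_msix_vector; /* ...and writes its MSI-X vector here */
            /* ... */
    };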

The mapping from host MSI-X to guest MSI-X (required for the posted
interrupt) is done via matching the eventfd between KVM and vhost-vDPA
when assigning eventfds. So assuming:

1) the guest driver uses guest-seen MSI-X vector X for vq0
2) the host driver reports irqX via get_vq_irq(0)

Then the host MSI-X vector corresponding to irqX is mapped to vq0 (via the
guest-seen MSI-X vector X) through a posted interrupt when possible. If the
posted interrupt can't work for some reason, the code falls back
to vq_callback, which is a simple eventfd_signal().
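
A simplified sketch of that decision and fallback, loosely modeled on
drivers/vhost/vdpa.c (abbreviated, not a verbatim copy): when the parent
reports no irq for a VQ, the guest still gets interrupts, but only through
the callback path, which just signals the call eventfd.

    /* Choose between irq bypass (posted interrupt) and the plain callback. */
    static void setup_vq_irq(struct vhost_vdpa *v, u16 qid)
    {
            struct vhost_virtqueue *vq = &v->vqs[qid];
            const struct vdpa_config_ops *ops = v->vdpa->config;
            int irq;

            if (!ops->get_vq_irq)
                    return;                 /* parent never exposes vectors */

            irq = ops->get_vq_irq(v->vdpa, qid);
            if (!vq->call_ctx.ctx || irq < 0)
                    return;                 /* no vector: callback only */

            /* Tie the host irq to the call eventfd backing the guest MSI-X
             * vector, enabling posted interrupts when the platform allows. */
            vq->call_ctx.producer.token = vq->call_ctx.ctx;
            vq->call_ctx.producer.irq = irq;
            irq_bypass_register_producer(&vq->call_ctx.producer);
    }

    /* Fallback path: the parent driver's VQ callback just signals the call
     * eventfd and KVM injects the interrupt the ordinary (non-posted) way. */
    static irqreturn_t vq_callback(void *private)
    {
            struct vhost_virtqueue *vq = private;

            if (vq->call_ctx.ctx)
                    eventfd_signal(vq->call_ctx.ctx, 1);
            return IRQ_HANDLED;
    }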

Thanks

>
> >>>>
> >>>>
>


* Re: RFC: VDPA Interrupt vector distribution
       [not found]       ` <734e2553-199f-94eb-88d1-a642ec1c7490@nvidia.com>
  2023-01-31  6:02         ` Jason Wang
@ 2023-01-31  7:26         ` Michael S. Tsirkin
  1 sibling, 0 replies; 4+ messages in thread
From: Michael S. Tsirkin @ 2023-01-31  7:26 UTC (permalink / raw)
  To: Eli Cohen; +Cc: virtualization

On Mon, Jan 30, 2023 at 01:54:14PM +0200, Eli Cohen wrote:
> 
> On 30/01/2023 13:34, Michael S. Tsirkin wrote:
> > On Mon, Jan 30, 2023 at 12:01:23PM +0200, Eli Cohen wrote:
> > > On 30/01/2023 10:19, Jason Wang wrote:
> > > > Hi Eli:
> > > > 
> > > > On Mon, Jan 23, 2023 at 1:59 PM Eli Cohen <elic@nvidia.com> wrote:
> > > > > VDPA allows hardware drivers to propagate interrupts from the hardware
> > > > > directly to the vCPU used by the guest. In a typical implementation, the
> > > > > hardware driver will assign the interrupt vectors to the virtqueues and report
> > > > > this information back through the get_vq_irq() callback defined in
> > > > > struct vdpa_config_ops.
> > > > > 
> > > > > Interrupt vectors can be a scarce and limited resource. For such
> > > > > cases, we can let the administrator, through the vdpa tool, set the policy
> > > > > defining how to distribute the available vectors amongst the data virtqueues.
> > > > > 
> > > > > The following policies are proposed:
> > > > > 
> > > > > 1. First come, first served. Assign a vector to each data virtqueue by the
> > > > >       virtqueue index. Virtqueues which could not be assigned a dedicated vector
> > > > >       would use the hardware driver to propagate interrupts using the available
> > > > >       callback mechanism.
> > > > > 
> > > > >       vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=all
> > > > > 
> > > > >       This is the default mode and works even if "int=all" was not specified.
> > > > > 
> > > > > 2. Use round robin distribution so virtqueues could share vectors.
> > > > >       vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=all intmode=share
> > > > > 
> > > > > 3. Assign vectors to RX virtqueues only.
> > > > > 3.1 Do not share vectors
> > > > >        vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=rx
> > > > > 3.2 Share vectors
> > > > >        vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=rx intmode=share
> > > > > 
> > > > > 4. Assign vectors to TX virtqueues only. Can share or not, like rx.
> > > > > 5. Fail device creation if number of vectors cannot be fulfilled.
> > > > >       vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 max_vq_pairs 8 int=rx intnum=8
> > > > I wonder:
> > > > 
> > > > 1) how the administrator can know if there are sufficient resources for
> > > > one of the above policies.
> > > There's no established way to know. The idea is to use whatever there is
> > > assuming interrupt bypassing is always better than the callback mechanism.
> > > > 2) how does the administrator know which policy is the best assuming
> > > > the resources are sufficient? (e.g. vectors to RX only or vectors to TX
> > > > only)
> > > I don't think there's a rule of thumb here but he needs to experiment to find what
> > > works best for him.
> > > > If it requires a vendor specific way or knowledge, I believe it's
> > > > better to code them in:
> > > > 
> > > > 1) the vDPA parent or
> > > > 2) the underlying management tool or drivers
> > > > 
> > > > Thanks
> > > I was also wondering about the current mechanism we have. The hardware
> > > driver reports an irq number for each VQ.
> > >
> > > The guest driver sees a virtio PCI device with as many MSI-X vectors as there
> > > are virtqueues.
> > >
> > > Suppose the hardware driver provided only 5 interrupt vectors while there
> > > are 16 VQs.
> > >
> > > Which MSI-X vector at the guest really gets a posted interrupt and which one
> > > uses a callback handled by the hardware driver?
> > Not sure I understand.
> > If you get a single interrupt from hardware, callback or posted,
> > you can only drive one interrupt to the guest, no?
> > 
> For every VQ I have a chance to assign an interrupt vector.
>
> Consider this scenario:
>
> mlx5_vdpa is created with 16 data virtqueues.
>
> mlx5_vdpa associates VQ0 with an interrupt vector. The rest of the VQs
> don't get assigned vectors and use the old callback mechanism.
>
> When you go to the VM and run lspci, you will see the device has 16 MSI-X
> vectors.
> 
> Do you know which of the MSI-X vectors on the guest is the vector I assigned
> for VQ0?

Me as in which component?
And I don't really understand how this answers the question.
If the hardware only supports 5 vectors, how can we expose 16
vectors to the guest? The host can send the guest as many as it wants,
sure (this is the callback you are referring to, right?),
but the host will not know which interrupt to send.
I conclude that exposing to the guest more vectors than
the hardware supports is simply not something we should do.

-- 
MST

