virtualization.lists.linux-foundation.org archive mirror
* Re: [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism
       [not found] ` <20230323053043.35-4-xieyongji@bytedance.com>
@ 2023-03-24  6:27   ` Jason Wang
  2023-03-24  9:12     ` Michael S. Tsirkin
       [not found]     ` <CACycT3sm1P2qDQTNKp+RLmyd84+v8xwErf_g1SXqiaJDQO8LNg@mail.gmail.com>
  0 siblings, 2 replies; 11+ messages in thread
From: Jason Wang @ 2023-03-24  6:27 UTC (permalink / raw)
  To: Xie Yongji; +Cc: linux-kernel, tglx, virtualization, hch, mst

On Thu, Mar 23, 2023 at 1:31 PM Xie Yongji <xieyongji@bytedance.com> wrote:
>
> To support interrupt affinity spreading mechanism,
> this makes use of group_cpus_evenly() to create
> an irq callback affinity mask for each virtqueue
> of vdpa device. Then we will unify set_vq_affinity
> callback to pass the affinity to the vdpa device driver.
>
> Signed-off-by: Xie Yongji <xieyongji@bytedance.com>

After thinking through the whole logic, I think I've found something
interesting.

Commit ad71473d9c437 ("virtio_blk: use virtio IRQ affinity") tries to
pass irq_affinity to the transport-specific find_vqs(). This seems to be
a layering violation, since the driver has no knowledge of

1) whether or not the callback is based on an IRQ
2) whether or not the device is a PCI device (the details are hidden by
the transport driver)
3) how many vectors could be used by a device

This means the driver can't actually pass real affinity masks, so the
commit in fact passes a zeroed irq_affinity structure as a hint, and the
PCI layer builds a default affinity that groups CPUs evenly based on
the number of MSI-X vectors (the core logic is group_cpus_evenly()).
I think we should fix this by replacing the irq_affinity structure with

1) a boolean like auto_cb_spreading

or

2) a queue-to-CPU mapping

So each transport can do its own logic based on that. Then virtio-vDPA
can pass that policy to VDUSE, where we only need group_cpus_evenly()
and can avoid duplicating irq_create_affinity_masks()?
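
To make option 2 a bit more concrete, here is a minimal userspace sketch of
what a queue-to-CPU mapping could look like. It is an illustration only:
map_cpus_to_queues() is a hypothetical name, not a kernel or virtio API, and
the plain contiguous split ignores the NUMA awareness of group_cpus_evenly().

```c
#include <assert.h>

/*
 * Hypothetical helper: assign every CPU 0..ncpus-1 to one of nqueues
 * queues so that the number of CPUs per queue differs by at most one.
 * cpu_to_queue must have room for ncpus entries.
 */
static void map_cpus_to_queues(unsigned int ncpus, unsigned int nqueues,
                               unsigned int *cpu_to_queue)
{
    unsigned int cpu = 0;

    assert(nqueues >= 1);

    for (unsigned int q = 0; q < nqueues; q++) {
        /* The first (ncpus % nqueues) queues take one extra CPU. */
        unsigned int size = ncpus / nqueues + (q < ncpus % nqueues);

        for (unsigned int k = 0; k < size; k++)
            cpu_to_queue[cpu++] = q;
    }
}
```

A transport that has no IRQs of its own could consume such a table directly,
which is the point of the proposal: the mapping carries the policy without
referring to interrupt vectors.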

Thanks

> ---
>  drivers/virtio/virtio_vdpa.c | 68 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 68 insertions(+)
>
> diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
> index f72696b4c1c2..f3826f42b704 100644
> --- a/drivers/virtio/virtio_vdpa.c
> +++ b/drivers/virtio/virtio_vdpa.c
> @@ -13,6 +13,7 @@
>  #include <linux/kernel.h>
>  #include <linux/slab.h>
>  #include <linux/uuid.h>
> +#include <linux/group_cpus.h>
>  #include <linux/virtio.h>
>  #include <linux/vdpa.h>
>  #include <linux/virtio_config.h>
> @@ -272,6 +273,66 @@ static void virtio_vdpa_del_vqs(struct virtio_device *vdev)
>                 virtio_vdpa_del_vq(vq);
>  }
>
> +static void default_calc_sets(struct irq_affinity *affd, unsigned int affvecs)
> +{
> +       affd->nr_sets = 1;
> +       affd->set_size[0] = affvecs;
> +}
> +
> +static struct cpumask *
> +create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
> +{
> +       unsigned int affvecs = 0, curvec, usedvecs, i;
> +       struct cpumask *masks = NULL;
> +
> +       if (nvecs > affd->pre_vectors + affd->post_vectors)
> +               affvecs = nvecs - affd->pre_vectors - affd->post_vectors;
> +
> +       if (!affd->calc_sets)
> +               affd->calc_sets = default_calc_sets;
> +
> +       affd->calc_sets(affd, affvecs);
> +
> +       if (!affvecs)
> +               return NULL;
> +
> +       masks = kcalloc(nvecs, sizeof(*masks), GFP_KERNEL);
> +       if (!masks)
> +               return NULL;
> +
> +       /* Fill out vectors at the beginning that don't need affinity */
> +       for (curvec = 0; curvec < affd->pre_vectors; curvec++)
> +               cpumask_setall(&masks[curvec]);
> +
> +       for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
> +               unsigned int this_vecs = affd->set_size[i];
> +               int j;
> +               struct cpumask *result = group_cpus_evenly(this_vecs);
> +
> +               if (!result) {
> +                       kfree(masks);
> +                       return NULL;
> +               }
> +
> +               for (j = 0; j < this_vecs; j++)
> +                       cpumask_copy(&masks[curvec + j], &result[j]);
> +               kfree(result);
> +
> +               curvec += this_vecs;
> +               usedvecs += this_vecs;
> +       }
> +
> +       /* Fill out vectors at the end that don't need affinity */
> +       if (usedvecs >= affvecs)
> +               curvec = affd->pre_vectors + affvecs;
> +       else
> +               curvec = affd->pre_vectors + usedvecs;
> +       for (; curvec < nvecs; curvec++)
> +               cpumask_setall(&masks[curvec]);
> +
> +       return masks;
> +}
> +
>  static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
>                                 struct virtqueue *vqs[],
>                                 vq_callback_t *callbacks[],
> @@ -282,9 +343,15 @@ static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
>         struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev);
>         struct vdpa_device *vdpa = vd_get_vdpa(vdev);
>         const struct vdpa_config_ops *ops = vdpa->config;
> +       struct irq_affinity default_affd = { 0 };
> +       struct cpumask *masks;
>         struct vdpa_callback cb;
>         int i, err, queue_idx = 0;
>
> +       masks = create_affinity_masks(nvqs, desc ? desc : &default_affd);
> +       if (!masks)
> +               return -ENOMEM;
> +
>         for (i = 0; i < nvqs; ++i) {
>                 if (!names[i]) {
>                         vqs[i] = NULL;
> @@ -298,6 +365,7 @@ static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
>                         err = PTR_ERR(vqs[i]);
>                         goto err_setup_vq;
>                 }
> +               ops->set_vq_affinity(vdpa, i, &masks[i]);
>         }
>
>         cb.callback = virtio_vdpa_config_cb;
> --
> 2.20.1
>
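
The mask layout that create_affinity_masks() in the quoted patch produces can
be modelled in userspace C. This is a simplified sketch under stated
assumptions, not the kernel code: cpu_mask_t, all_cpus(), group_evenly() and
build_masks() are made-up names, only a single set is handled, at most 64
CPUs are supported, and group_evenly() is a non-NUMA-aware stand-in for
group_cpus_evenly().

```c
#include <assert.h>
#include <stdint.h>

/* One bit per CPU; this toy model supports at most 64 CPUs. */
typedef uint64_t cpu_mask_t;

/* Mask with the lowest ncpus bits set (stand-in for cpumask_setall()). */
static cpu_mask_t all_cpus(unsigned int ncpus)
{
    assert(ncpus >= 1 && ncpus <= 64);
    return ncpus == 64 ? UINT64_MAX : ((UINT64_C(1) << ncpus) - 1);
}

/* Split CPUs 0..ncpus-1 into ngroups contiguous, near-equal groups. */
static void group_evenly(unsigned int ncpus, unsigned int ngroups,
                         cpu_mask_t *out)
{
    unsigned int cpu = 0;

    assert(ncpus <= 64 && ngroups >= 1);

    for (unsigned int g = 0; g < ngroups; g++) {
        /* The first (ncpus % ngroups) groups get one extra CPU. */
        unsigned int size = ncpus / ngroups + (g < ncpus % ngroups);

        out[g] = 0;
        for (unsigned int k = 0; k < size; k++)
            out[g] |= UINT64_C(1) << cpu++;
    }
}

/*
 * The first `pre` and the last `post` entries may run on any CPU; the
 * entries in between are spread. Returns 0 on success, or -1 when no
 * entry is left to spread (the quoted patch returns NULL in that case).
 */
static int build_masks(unsigned int nvecs, unsigned int pre,
                       unsigned int post, unsigned int ncpus,
                       cpu_mask_t *masks)
{
    unsigned int affvecs, i;

    if (nvecs <= pre + post)
        return -1;
    affvecs = nvecs - pre - post;

    for (i = 0; i < pre; i++)
        masks[i] = all_cpus(ncpus);
    group_evenly(ncpus, affvecs, &masks[pre]);
    for (i = pre + affvecs; i < nvecs; i++)
        masks[i] = all_cpus(ncpus);
    return 0;
}
```

With six entries, two of them "pre" entries, and eight CPUs, the first two
masks cover all CPUs and the remaining four each cover two adjacent CPUs.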

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


* Re: [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism
  2023-03-24  6:27   ` [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism Jason Wang
@ 2023-03-24  9:12     ` Michael S. Tsirkin
  2023-03-28  6:12       ` Jason Wang
       [not found]     ` <CACycT3sm1P2qDQTNKp+RLmyd84+v8xwErf_g1SXqiaJDQO8LNg@mail.gmail.com>
  1 sibling, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2023-03-24  9:12 UTC (permalink / raw)
  To: Jason Wang; +Cc: Xie Yongji, tglx, hch, linux-kernel, virtualization

On Fri, Mar 24, 2023 at 02:27:52PM +0800, Jason Wang wrote:
> On Thu, Mar 23, 2023 at 1:31 PM Xie Yongji <xieyongji@bytedance.com> wrote:
> >
> > To support interrupt affinity spreading mechanism,
> > this makes use of group_cpus_evenly() to create
> > an irq callback affinity mask for each virtqueue
> > of vdpa device. Then we will unify set_vq_affinity
> > callback to pass the affinity to the vdpa device driver.
> >
> > Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
> 
> Thinking hard of all the logics, I think I've found something interesting.
> 
> Commit ad71473d9c437 ("virtio_blk: use virtio IRQ affinity") tries to
> pass irq_affinity to transport specific find_vqs().  This seems a
> layer violation since driver has no knowledge of
> 
> 1) whether or not the callback is based on an IRQ
> 2) whether or not the device is a PCI or not (the details are hided by
> the transport driver)
> 3) how many vectors could be used by a device
> 
> This means the driver can't actually pass a real affinity masks so the
> commit passes a zero irq affinity structure as a hint in fact, so the
> PCI layer can build a default affinity based that groups cpus evenly
> based on the number of MSI-X vectors (the core logic is the
> group_cpus_evenly). I think we should fix this by replacing the
> irq_affinity structure with
> 
> 1) a boolean like auto_cb_spreading
> 
> or
> 
> 2) queue to cpu mapping
> 
> So each transport can do its own logic based on that. Then virtio-vDPA
> can pass that policy to VDUSE where we only need a group_cpus_evenly()
> and avoid duplicating irq_create_affinity_masks()?
> 
> Thanks

I don't really understand what you propose. Care to post a patch?
Also does it have to block this patchset or can it be done on top?



* Re: [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism
       [not found]     ` <CACycT3sm1P2qDQTNKp+RLmyd84+v8xwErf_g1SXqiaJDQO8LNg@mail.gmail.com>
@ 2023-03-28  3:14       ` Jason Wang
       [not found]         ` <CACycT3uYbnrQDDbFmwdww8ukMU1t9RsAuutHsFT-UzK9_Mc=Kg@mail.gmail.com>
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Wang @ 2023-03-28  3:14 UTC (permalink / raw)
  To: Yongji Xie
  Cc: linux-kernel, Thomas Gleixner, virtualization, Christoph Hellwig,
	Michael S. Tsirkin

On Tue, Mar 28, 2023 at 11:03 AM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Fri, Mar 24, 2023 at 2:28 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Thu, Mar 23, 2023 at 1:31 PM Xie Yongji <xieyongji@bytedance.com> wrote:
> > >
> > > To support interrupt affinity spreading mechanism,
> > > this makes use of group_cpus_evenly() to create
> > > an irq callback affinity mask for each virtqueue
> > > of vdpa device. Then we will unify set_vq_affinity
> > > callback to pass the affinity to the vdpa device driver.
> > >
> > > Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
> >
> > Thinking hard of all the logics, I think I've found something interesting.
> >
> > Commit ad71473d9c437 ("virtio_blk: use virtio IRQ affinity") tries to
> > pass irq_affinity to transport specific find_vqs().  This seems a
> > layer violation since driver has no knowledge of
> >
> > 1) whether or not the callback is based on an IRQ
> > 2) whether or not the device is a PCI or not (the details are hided by
> > the transport driver)
> > 3) how many vectors could be used by a device
> >
> > This means the driver can't actually pass a real affinity masks so the
> > commit passes a zero irq affinity structure as a hint in fact, so the
> > PCI layer can build a default affinity based that groups cpus evenly
> > based on the number of MSI-X vectors (the core logic is the
> > group_cpus_evenly). I think we should fix this by replacing the
> > irq_affinity structure with
> >
> > 1) a boolean like auto_cb_spreading
> >
> > or
> >
> > 2) queue to cpu mapping
> >
>
> But only the driver knows which queues are used in the control path
> which don't need the automatic irq affinity assignment.

Is the transport driver aware of this knowledge now?

E.g. virtio-blk uses:

        struct irq_affinity desc = { 0, };

At least we can tell the transport driver which vq requires automatic
irq affinity.

> So I think the
> irq_affinity structure can only be created by device drivers and
> passed to the virtio-pci/virtio-vdpa driver.

This might not be easy, since the driver doesn't even know how many
interrupts will be used by the transport driver, so it can't build the
actual affinity structure.

>
> > So each transport can do its own logic based on that. Then virtio-vDPA
> > can pass that policy to VDUSE where we only need a group_cpus_evenly()
> > and avoid duplicating irq_create_affinity_masks()?
> >
>
> I don't get why we would have duplicated irq_create_affinity_masks().

I meant that create_affinity_masks() in patch 3 seems to duplicate
irq_create_affinity_masks().

Thanks

>
> Thanks,
> Yongji
>


* Re: [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism
       [not found]         ` <CACycT3uYbnrQDDbFmwdww8ukMU1t9RsAuutHsFT-UzK9_Mc=Kg@mail.gmail.com>
@ 2023-03-28  3:44           ` Jason Wang
       [not found]             ` <CACycT3vCqisBS0OyMsnyrw0i6kWTDqSZ4GQbdoycHz-L3=1Q7Q@mail.gmail.com>
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Wang @ 2023-03-28  3:44 UTC (permalink / raw)
  To: Yongji Xie
  Cc: linux-kernel, Thomas Gleixner, virtualization, Christoph Hellwig,
	Michael S. Tsirkin

On Tue, Mar 28, 2023 at 11:33 AM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Tue, Mar 28, 2023 at 11:14 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Mar 28, 2023 at 11:03 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> > >
> > > On Fri, Mar 24, 2023 at 2:28 PM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Thu, Mar 23, 2023 at 1:31 PM Xie Yongji <xieyongji@bytedance.com> wrote:
> > > > >
> > > > > To support interrupt affinity spreading mechanism,
> > > > > this makes use of group_cpus_evenly() to create
> > > > > an irq callback affinity mask for each virtqueue
> > > > > of vdpa device. Then we will unify set_vq_affinity
> > > > > callback to pass the affinity to the vdpa device driver.
> > > > >
> > > > > Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
> > > >
> > > > Thinking hard of all the logics, I think I've found something interesting.
> > > >
> > > > Commit ad71473d9c437 ("virtio_blk: use virtio IRQ affinity") tries to
> > > > pass irq_affinity to transport specific find_vqs().  This seems a
> > > > layer violation since driver has no knowledge of
> > > >
> > > > 1) whether or not the callback is based on an IRQ
> > > > 2) whether or not the device is a PCI or not (the details are hided by
> > > > the transport driver)
> > > > 3) how many vectors could be used by a device
> > > >
> > > > This means the driver can't actually pass a real affinity masks so the
> > > > commit passes a zero irq affinity structure as a hint in fact, so the
> > > > PCI layer can build a default affinity based that groups cpus evenly
> > > > based on the number of MSI-X vectors (the core logic is the
> > > > group_cpus_evenly). I think we should fix this by replacing the
> > > > irq_affinity structure with
> > > >
> > > > 1) a boolean like auto_cb_spreading
> > > >
> > > > or
> > > >
> > > > 2) queue to cpu mapping
> > > >
> > >
> > > But only the driver knows which queues are used in the control path
> > > which don't need the automatic irq affinity assignment.
> >
> > Is this knowledge awarded by the transport driver now?
> >
>
> This knowledge is awarded by the device driver rather than the transport driver.
>
> E.g. virtio-scsi uses:
>
>     struct irq_affinity desc = { .pre_vectors = 2 }; // vq0 is control
> queue, vq1 is event queue

OK, but it only works as a hint; it's not a real affinity. As I replied,
we can pass an array of booleans in this case, so the transport driver
knows it doesn't need to use automatic affinity for the first two
queues.
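
A minimal sketch of that boolean-array idea, in userspace C. The helper
name fill_auto_spread() and the array auto_spread[] are hypothetical, not
existing virtio interfaces, and pre_vectors/post_vectors are treated as
counting queues, as the discussion above does.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: turn the pre_vectors/post_vectors hint into a
 * per-virtqueue flag saying whether the transport should spread that
 * queue's callback automatically. auto_spread must hold nvqs entries.
 */
static void fill_auto_spread(unsigned int nvqs, unsigned int pre_vectors,
                             unsigned int post_vectors, bool *auto_spread)
{
    assert(pre_vectors + post_vectors <= nvqs);

    for (unsigned int i = 0; i < nvqs; i++)
        auto_spread[i] = i >= pre_vectors && i < nvqs - post_vectors;
}
```

For the virtio-scsi example with .pre_vectors = 2, the control and event
queues end up flagged false and every request queue true; for the
virtio-blk example with a zeroed structure, every queue is flagged true.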

>
> > E.g virtio-blk uses:
> >
> >         struct irq_affinity desc = { 0, };
> >
> > Atleast we can tell the transport driver which vq requires automatic
> > irq affinity.
> >
>
> I think that is what the current implementation does.
>
> > > So I think the
> > > irq_affinity structure can only be created by device drivers and
> > > passed to the virtio-pci/virtio-vdpa driver.
> >
> > This could be not easy since the driver doesn't even know how many
> > interrupts will be used by the transport driver, so it can't built the
> > actual affinity structure.
> >
>
> The actual affinity mask is built by the transport driver,

For PCI, yes: it talks directly to the IRQ subsystem.

> device
> driver only passes a hint on which queues don't need the automatic irq
> affinity assignment.

But not for virtio-vDPA, since the IRQ needs to be dealt with by the
parent driver. In our case that is VDUSE, which doesn't need an IRQ
at all; a queue-to-CPU mapping is sufficient.

Thanks

>
> Thanks,
> Yongji
>


* Re: [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism
       [not found]             ` <CACycT3vCqisBS0OyMsnyrw0i6kWTDqSZ4GQbdoycHz-L3=1Q7Q@mail.gmail.com>
@ 2023-03-28  6:07               ` Jason Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Wang @ 2023-03-28  6:07 UTC (permalink / raw)
  To: Yongji Xie
  Cc: linux-kernel, Thomas Gleixner, virtualization, Christoph Hellwig,
	Michael S. Tsirkin

On Tue, Mar 28, 2023 at 12:05 PM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Tue, Mar 28, 2023 at 11:44 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Mar 28, 2023 at 11:33 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> > >
> > > On Tue, Mar 28, 2023 at 11:14 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Tue, Mar 28, 2023 at 11:03 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> > > > >
> > > > > On Fri, Mar 24, 2023 at 2:28 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > >
> > > > > > On Thu, Mar 23, 2023 at 1:31 PM Xie Yongji <xieyongji@bytedance.com> wrote:
> > > > > > >
> > > > > > > To support interrupt affinity spreading mechanism,
> > > > > > > this makes use of group_cpus_evenly() to create
> > > > > > > an irq callback affinity mask for each virtqueue
> > > > > > > of vdpa device. Then we will unify set_vq_affinity
> > > > > > > callback to pass the affinity to the vdpa device driver.
> > > > > > >
> > > > > > > Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
> > > > > >
> > > > > > Thinking hard of all the logics, I think I've found something interesting.
> > > > > >
> > > > > > Commit ad71473d9c437 ("virtio_blk: use virtio IRQ affinity") tries to
> > > > > > pass irq_affinity to transport specific find_vqs().  This seems a
> > > > > > layer violation since driver has no knowledge of
> > > > > >
> > > > > > 1) whether or not the callback is based on an IRQ
> > > > > > 2) whether or not the device is a PCI or not (the details are hided by
> > > > > > the transport driver)
> > > > > > 3) how many vectors could be used by a device
> > > > > >
> > > > > > This means the driver can't actually pass a real affinity masks so the
> > > > > > commit passes a zero irq affinity structure as a hint in fact, so the
> > > > > > PCI layer can build a default affinity based that groups cpus evenly
> > > > > > based on the number of MSI-X vectors (the core logic is the
> > > > > > group_cpus_evenly). I think we should fix this by replacing the
> > > > > > irq_affinity structure with
> > > > > >
> > > > > > 1) a boolean like auto_cb_spreading
> > > > > >
> > > > > > or
> > > > > >
> > > > > > 2) queue to cpu mapping
> > > > > >
> > > > >
> > > > > But only the driver knows which queues are used in the control path
> > > > > which don't need the automatic irq affinity assignment.
> > > >
> > > > Is this knowledge awarded by the transport driver now?
> > > >
> > >
> > > This knowledge is awarded by the device driver rather than the transport driver.
> > >
> > > E.g. virtio-scsi uses:
> > >
> > >     struct irq_affinity desc = { .pre_vectors = 2 }; // vq0 is control
> > > queue, vq1 is event queue
> >
> > Ok, but it only works as a hint, it's not a real affinity. As replied,
> > we can pass an array of boolean in this case then transport driver
> > knows it doesn't need to use automatic affinity for the first two
> > queues.
> >
>
> But we don't know whether we would use other fields in structure
> irq_affinity in the future. So a full set should be better?

Good point. So the issue is calc_sets(), and we would probably need it
if there's a virtio driver that needs more than one set of vectors
to be spread. Technically, we could have a virtio-level abstraction
for this, but I agree it's probably not worth bothering with now.

>
> > >
> > > > E.g virtio-blk uses:
> > > >
> > > >         struct irq_affinity desc = { 0, };
> > > >
> > > > Atleast we can tell the transport driver which vq requires automatic
> > > > irq affinity.
> > > >
> > >
> > > I think that is what the current implementation does.
> > >
> > > > > So I think the
> > > > > irq_affinity structure can only be created by device drivers and
> > > > > passed to the virtio-pci/virtio-vdpa driver.
> > > >
> > > > This could be not easy since the driver doesn't even know how many
> > > > interrupts will be used by the transport driver, so it can't built the
> > > > actual affinity structure.
> > > >
> > >
> > > The actual affinity mask is built by the transport driver,
> >
> > For PCI yes, it talks directly to the IRQ subsystems.
> >
> > > device
> > > driver only passes a hint on which queues don't need the automatic irq
> > > affinity assignment.
> >
> > But not for virtio-vDPA since the IRQ needs to be dealt with by the
> > parent driver. For our case, it's the VDUSE where it doesn't need IRQ
> > at all, a queue to cpu mapping is sufficient.
> >
>
> The device driver doesn't know whether it is binded to virtio-pci or
> virtio-vdpa. So it should pass a full set needed by the automatic irq
> affinity assignment instead of a subset. Then virtio-vdpa can choose
> to pass a queue to cpu mapping to VDUSE, which is what we do now (use
> set_vq_affinity()).

Yes, so basically there are two ways:

1) automatic IRQ management: the driver passes affd to find_vqs() and
the affinity is determined by the transport (e.g. vDPA).
2) affinity under the control of the driver: it needs to use
set_vq_affinity() and has to deal with CPU hotplug itself.
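
As an illustration of the hotplug burden in way 2, a driver-managed scheme
would have to fix up its masks whenever the set of online CPUs changes. The
sketch below is a userspace model under assumptions: effective_mask() is a
hypothetical name, masks are plain 64-bit bitmaps, and the fallback policy
(use any online CPU) is just one possible choice.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Keep only the online CPUs of the requested mask; if none of the
 * requested CPUs is online any more, fall back to every online CPU.
 */
static uint64_t effective_mask(uint64_t requested, uint64_t online)
{
    uint64_t usable = requested & online;

    assert(online != 0); /* the model assumes one CPU stays online */
    return usable ? usable : online;
}
```

With automatic management (way 1) this kind of fix-up is the transport's
job, and the driver never sees it.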

Thanks

>
> Thanks,
> Yongji
>


* Re: [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism
  2023-03-24  9:12     ` Michael S. Tsirkin
@ 2023-03-28  6:12       ` Jason Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Wang @ 2023-03-28  6:12 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Xie Yongji, tglx, hch, linux-kernel, virtualization


On 2023/3/24 17:12, Michael S. Tsirkin wrote:
> On Fri, Mar 24, 2023 at 02:27:52PM +0800, Jason Wang wrote:
>> On Thu, Mar 23, 2023 at 1:31 PM Xie Yongji <xieyongji@bytedance.com> wrote:
>>> To support interrupt affinity spreading mechanism,
>>> this makes use of group_cpus_evenly() to create
>>> an irq callback affinity mask for each virtqueue
>>> of vdpa device. Then we will unify set_vq_affinity
>>> callback to pass the affinity to the vdpa device driver.
>>>
>>> Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
>> Thinking hard of all the logics, I think I've found something interesting.
>>
>> Commit ad71473d9c437 ("virtio_blk: use virtio IRQ affinity") tries to
>> pass irq_affinity to transport specific find_vqs().  This seems a
>> layer violation since driver has no knowledge of
>>
>> 1) whether or not the callback is based on an IRQ
>> 2) whether or not the device is a PCI or not (the details are hided by
>> the transport driver)
>> 3) how many vectors could be used by a device
>>
>> This means the driver can't actually pass a real affinity masks so the
>> commit passes a zero irq affinity structure as a hint in fact, so the
>> PCI layer can build a default affinity based that groups cpus evenly
>> based on the number of MSI-X vectors (the core logic is the
>> group_cpus_evenly). I think we should fix this by replacing the
>> irq_affinity structure with
>>
>> 1) a boolean like auto_cb_spreading
>>
>> or
>>
>> 2) queue to cpu mapping
>>
>> So each transport can do its own logic based on that. Then virtio-vDPA
>> can pass that policy to VDUSE where we only need a group_cpus_evenly()
>> and avoid duplicating irq_create_affinity_masks()?
>>
>> Thanks
> I don't really understand what you propose. Care to post a patch?


I meant to avoid passing the irq_affinity structure to find_vqs() and
instead pass an array of booleans telling us whether or not each vq
requires automatic spreading of callbacks. But that seems less flexible.


> Also does it have to block this patchset or can it be done on top?


We can leave it for the future.

So

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


>
>>> ---
>>>   drivers/virtio/virtio_vdpa.c | 68 ++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 68 insertions(+)
>>>
>>> diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
>>> index f72696b4c1c2..f3826f42b704 100644
>>> --- a/drivers/virtio/virtio_vdpa.c
>>> +++ b/drivers/virtio/virtio_vdpa.c
>>> @@ -13,6 +13,7 @@
>>>   #include <linux/kernel.h>
>>>   #include <linux/slab.h>
>>>   #include <linux/uuid.h>
>>> +#include <linux/group_cpus.h>
>>>   #include <linux/virtio.h>
>>>   #include <linux/vdpa.h>
>>>   #include <linux/virtio_config.h>
>>> @@ -272,6 +273,66 @@ static void virtio_vdpa_del_vqs(struct virtio_device *vdev)
>>>                  virtio_vdpa_del_vq(vq);
>>>   }
>>>
>>> +static void default_calc_sets(struct irq_affinity *affd, unsigned int affvecs)
>>> +{
>>> +       affd->nr_sets = 1;
>>> +       affd->set_size[0] = affvecs;
>>> +}
>>> +
>>> +static struct cpumask *
>>> +create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
>>> +{
>>> +       unsigned int affvecs = 0, curvec, usedvecs, i;
>>> +       struct cpumask *masks = NULL;
>>> +
>>> +       if (nvecs > affd->pre_vectors + affd->post_vectors)
>>> +               affvecs = nvecs - affd->pre_vectors - affd->post_vectors;
>>> +
>>> +       if (!affd->calc_sets)
>>> +               affd->calc_sets = default_calc_sets;
>>> +
>>> +       affd->calc_sets(affd, affvecs);
>>> +
>>> +       if (!affvecs)
>>> +               return NULL;
>>> +
>>> +       masks = kcalloc(nvecs, sizeof(*masks), GFP_KERNEL);
>>> +       if (!masks)
>>> +               return NULL;
>>> +
>>> +       /* Fill out vectors at the beginning that don't need affinity */
>>> +       for (curvec = 0; curvec < affd->pre_vectors; curvec++)
>>> +               cpumask_setall(&masks[curvec]);
>>> +
>>> +       for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
>>> +               unsigned int this_vecs = affd->set_size[i];
>>> +               int j;
>>> +               struct cpumask *result = group_cpus_evenly(this_vecs);
>>> +
>>> +               if (!result) {
>>> +                       kfree(masks);
>>> +                       return NULL;
>>> +               }
>>> +
>>> +               for (j = 0; j < this_vecs; j++)
>>> +                       cpumask_copy(&masks[curvec + j], &result[j]);
>>> +               kfree(result);
>>> +
>>> +               curvec += this_vecs;
>>> +               usedvecs += this_vecs;
>>> +       }
>>> +
>>> +       /* Fill out vectors at the end that don't need affinity */
>>> +       if (usedvecs >= affvecs)
>>> +               curvec = affd->pre_vectors + affvecs;
>>> +       else
>>> +               curvec = affd->pre_vectors + usedvecs;
>>> +       for (; curvec < nvecs; curvec++)
>>> +               cpumask_setall(&masks[curvec]);
>>> +
>>> +       return masks;
>>> +}
>>> +
>>>   static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
>>>                                  struct virtqueue *vqs[],
>>>                                  vq_callback_t *callbacks[],
>>> @@ -282,9 +343,15 @@ static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
>>>          struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev);
>>>          struct vdpa_device *vdpa = vd_get_vdpa(vdev);
>>>          const struct vdpa_config_ops *ops = vdpa->config;
>>> +       struct irq_affinity default_affd = { 0 };
>>> +       struct cpumask *masks;
>>>          struct vdpa_callback cb;
>>>          int i, err, queue_idx = 0;
>>>
>>> +       masks = create_affinity_masks(nvqs, desc ? desc : &default_affd);
>>> +       if (!masks)
>>> +               return -ENOMEM;
>>> +
>>>          for (i = 0; i < nvqs; ++i) {
>>>                  if (!names[i]) {
>>>                          vqs[i] = NULL;
>>> @@ -298,6 +365,7 @@ static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
>>>                          err = PTR_ERR(vqs[i]);
>>>                          goto err_setup_vq;
>>>                  }
>>> +               ops->set_vq_affinity(vdpa, i, &masks[i]);
>>>          }
>>>
>>>          cb.callback = virtio_vdpa_config_cb;
>>> --
>>> 2.20.1
>>>
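
For readers following the quoted create_affinity_masks(), the resulting mask array has three regions: pre_vectors entries set to all CPUs, then the spread entries, then post_vectors entries set to all CPUs. A minimal user-space sketch of that layout follows. The names sketch_affvecs() and sketch_classify() are made up for illustration and are not part of the patch; the sketch also assumes the spread sets cover every remaining vector.

```c
#include <assert.h>

enum sketch_vec_kind { SKETCH_VEC_PRE, SKETCH_VEC_SPREAD, SKETCH_VEC_POST };

/* Vectors left for spreading once the reserved ones are excluded. */
static unsigned int sketch_affvecs(unsigned int nvecs, unsigned int pre,
				   unsigned int post)
{
	return nvecs > pre + post ? nvecs - pre - post : 0;
}

/* Tell which region of the mask array a given vector index falls into. */
static enum sketch_vec_kind sketch_classify(unsigned int vec,
					    unsigned int nvecs,
					    unsigned int pre,
					    unsigned int post)
{
	unsigned int affvecs = sketch_affvecs(nvecs, pre, post);

	assert(vec < nvecs);
	if (vec < pre)
		return SKETCH_VEC_PRE;		/* mask is "all CPUs" */
	if (vec < pre + affvecs)
		return SKETCH_VEC_SPREAD;	/* mask comes from the spreading step */
	return SKETCH_VEC_POST;			/* mask is "all CPUs" */
}
```

With nvecs = 6, pre = 1 and post = 1, four vectors are spread and indexes 0 and 5 keep the all-CPUs mask.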

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 05/11] vduse: Support set_vq_affinity callback
       [not found] ` <20230323053043.35-6-xieyongji@bytedance.com>
@ 2023-03-28  6:14   ` Jason Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Wang @ 2023-03-28  6:14 UTC (permalink / raw)
  To: Xie Yongji, mst, tglx, hch; +Cc: linux-kernel, virtualization


On 2023/3/23 13:30, Xie Yongji wrote:
> Since the virtio-vdpa bus driver already supports the
> interrupt affinity spreading mechanism, let's implement the
> set_vq_affinity callback to bring it to the vduse device.
> After we get the virtqueue's affinity, we can spread
> IRQs between CPUs in the affinity mask, in a round-robin
> manner, to run the irq callback.
>
> Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
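
The round-robin selection described above can be sketched in user space as follows. This is only an illustration under stated assumptions: CPU sets are plain 64-bit masks instead of struct cpumask, the sketch_* names are hypothetical, and unlike the loop in the patch the sketch falls back to "unbound" when no CPU in the mask is online.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_IRQ_UNBOUND (-1)
#define SKETCH_NR_CPUS 64

/* Lowest set bit strictly above prev, or SKETCH_NR_CPUS if there is none. */
static int sketch_mask_next(int prev, uint64_t mask)
{
	int cpu;

	for (cpu = prev + 1; cpu < SKETCH_NR_CPUS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return SKETCH_NR_CPUS;
}

/*
 * Advance to the next online CPU in the affinity mask, wrapping to the
 * start of the mask once the end is reached.
 */
static int sketch_next_effective_cpu(int curr, uint64_t affinity,
				     uint64_t online)
{
	uint64_t usable = affinity & online;

	assert(curr >= SKETCH_IRQ_UNBOUND && curr < SKETCH_NR_CPUS);
	if (!usable)
		return SKETCH_IRQ_UNBOUND;	/* sketch-only fallback */

	curr = sketch_mask_next(curr, usable);
	if (curr >= SKETCH_NR_CPUS)
		curr = sketch_mask_next(SKETCH_IRQ_UNBOUND, usable);
	return curr;
}
```

With an affinity mask of CPUs 1 and 3, successive calls starting from the unbound state yield 1, 3, 1, and so on.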


Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


> ---
>   drivers/vdpa/vdpa_user/vduse_dev.c | 61 ++++++++++++++++++++++++++----
>   1 file changed, 54 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index 98359d87a06f..45aa8703c4b5 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -41,6 +41,8 @@
>   #define VDUSE_IOVA_SIZE (128 * 1024 * 1024)
>   #define VDUSE_MSG_DEFAULT_TIMEOUT 30
>   
> +#define IRQ_UNBOUND -1
> +
>   struct vduse_virtqueue {
>   	u16 index;
>   	u16 num_max;
> @@ -57,6 +59,8 @@ struct vduse_virtqueue {
>   	struct vdpa_callback cb;
>   	struct work_struct inject;
>   	struct work_struct kick;
> +	int irq_effective_cpu;
> +	struct cpumask irq_affinity;
>   };
>   
>   struct vduse_dev;
> @@ -128,6 +132,7 @@ static struct class *vduse_class;
>   static struct cdev vduse_ctrl_cdev;
>   static struct cdev vduse_cdev;
>   static struct workqueue_struct *vduse_irq_wq;
> +static struct workqueue_struct *vduse_irq_bound_wq;
>   
>   static u32 allowed_device_id[] = {
>   	VIRTIO_ID_BLOCK,
> @@ -708,6 +713,15 @@ static u32 vduse_vdpa_get_generation(struct vdpa_device *vdpa)
>   	return dev->generation;
>   }
>   
> +static int vduse_vdpa_set_vq_affinity(struct vdpa_device *vdpa, u16 idx,
> +				      const struct cpumask *cpu_mask)
> +{
> +	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> +
> +	cpumask_copy(&dev->vqs[idx]->irq_affinity, cpu_mask);
> +	return 0;
> +}
> +
>   static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
>   				unsigned int asid,
>   				struct vhost_iotlb *iotlb)
> @@ -758,6 +772,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
>   	.get_config		= vduse_vdpa_get_config,
>   	.set_config		= vduse_vdpa_set_config,
>   	.get_generation		= vduse_vdpa_get_generation,
> +	.set_vq_affinity	= vduse_vdpa_set_vq_affinity,
>   	.reset			= vduse_vdpa_reset,
>   	.set_map		= vduse_vdpa_set_map,
>   	.free			= vduse_vdpa_free,
> @@ -917,7 +932,8 @@ static void vduse_vq_irq_inject(struct work_struct *work)
>   }
>   
>   static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
> -				    struct work_struct *irq_work)
> +				    struct work_struct *irq_work,
> +				    int irq_effective_cpu)
>   {
>   	int ret = -EINVAL;
>   
> @@ -926,7 +942,11 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
>   		goto unlock;
>   
>   	ret = 0;
> -	queue_work(vduse_irq_wq, irq_work);
> +	if (irq_effective_cpu == IRQ_UNBOUND)
> +		queue_work(vduse_irq_wq, irq_work);
> +	else
> +		queue_work_on(irq_effective_cpu,
> +			      vduse_irq_bound_wq, irq_work);
>   unlock:
>   	up_read(&dev->rwsem);
>   
> @@ -1029,6 +1049,22 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
>   	return ret;
>   }
>   
> +static void vduse_vq_update_effective_cpu(struct vduse_virtqueue *vq)
> +{
> +	int curr_cpu = vq->irq_effective_cpu;
> +
> +	while (true) {
> +		curr_cpu = cpumask_next(curr_cpu, &vq->irq_affinity);
> +		if (cpu_online(curr_cpu))
> +			break;
> +
> +		if (curr_cpu >= nr_cpu_ids)
> +			curr_cpu = IRQ_UNBOUND;
> +	}
> +
> +	vq->irq_effective_cpu = curr_cpu;
> +}
> +
>   static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
>   			    unsigned long arg)
>   {
> @@ -1111,7 +1147,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
>   		break;
>   	}
>   	case VDUSE_DEV_INJECT_CONFIG_IRQ:
> -		ret = vduse_dev_queue_irq_work(dev, &dev->inject);
> +		ret = vduse_dev_queue_irq_work(dev, &dev->inject, IRQ_UNBOUND);
>   		break;
>   	case VDUSE_VQ_SETUP: {
>   		struct vduse_vq_config config;
> @@ -1198,7 +1234,10 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
>   			break;
>   
>   		index = array_index_nospec(index, dev->vq_num);
> -		ret = vduse_dev_queue_irq_work(dev, &dev->vqs[index]->inject);
> +
> +		vduse_vq_update_effective_cpu(dev->vqs[index]);
> +		ret = vduse_dev_queue_irq_work(dev, &dev->vqs[index]->inject,
> +					dev->vqs[index]->irq_effective_cpu);
>   		break;
>   	}
>   	case VDUSE_IOTLB_REG_UMEM: {
> @@ -1367,10 +1406,12 @@ static int vduse_dev_init_vqs(struct vduse_dev *dev, u32 vq_align, u32 vq_num)
>   			goto err;
>   
>   		dev->vqs[i]->index = i;
> +		dev->vqs[i]->irq_effective_cpu = IRQ_UNBOUND;
>   		INIT_WORK(&dev->vqs[i]->inject, vduse_vq_irq_inject);
>   		INIT_WORK(&dev->vqs[i]->kick, vduse_vq_kick_work);
>   		spin_lock_init(&dev->vqs[i]->kick_lock);
>   		spin_lock_init(&dev->vqs[i]->irq_lock);
> +		cpumask_setall(&dev->vqs[i]->irq_affinity);
>   	}
>   
>   	return 0;
> @@ -1858,12 +1899,15 @@ static int vduse_init(void)
>   	if (ret)
>   		goto err_cdev;
>   
> +	ret = -ENOMEM;
>   	vduse_irq_wq = alloc_workqueue("vduse-irq",
>   				WQ_HIGHPRI | WQ_SYSFS | WQ_UNBOUND, 0);
> -	if (!vduse_irq_wq) {
> -		ret = -ENOMEM;
> +	if (!vduse_irq_wq)
>   		goto err_wq;
> -	}
> +
> +	vduse_irq_bound_wq = alloc_workqueue("vduse-irq-bound", WQ_HIGHPRI, 0);
> +	if (!vduse_irq_bound_wq)
> +		goto err_bound_wq;
>   
>   	ret = vduse_domain_init();
>   	if (ret)
> @@ -1877,6 +1921,8 @@ static int vduse_init(void)
>   err_mgmtdev:
>   	vduse_domain_exit();
>   err_domain:
> +	destroy_workqueue(vduse_irq_bound_wq);
> +err_bound_wq:
>   	destroy_workqueue(vduse_irq_wq);
>   err_wq:
>   	cdev_del(&vduse_cdev);
> @@ -1896,6 +1942,7 @@ static void vduse_exit(void)
>   {
>   	vduse_mgmtdev_exit();
>   	vduse_domain_exit();
> +	destroy_workqueue(vduse_irq_bound_wq);
>   	destroy_workqueue(vduse_irq_wq);
>   	cdev_del(&vduse_cdev);
>   	device_destroy(vduse_class, vduse_major);

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 06/11] vduse: Support get_vq_affinity callback
       [not found] ` <20230323053043.35-7-xieyongji@bytedance.com>
@ 2023-03-28  6:15   ` Jason Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Wang @ 2023-03-28  6:15 UTC (permalink / raw)
  To: Xie Yongji, mst, tglx, hch; +Cc: linux-kernel, virtualization


On 2023/3/23 13:30, Xie Yongji wrote:
> This implements the get_vq_affinity callback so that
> the virtio-blk driver can build the blk-mq queues
> based on the irq callback affinity.
>
> Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
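
To illustrate how a block driver could consume the per-virtqueue masks returned by get_vq_affinity, here is a simplified user-space sketch that builds a CPU-to-queue table. It is not the blk-mq code: the function name, the 64-bit masks and the "queue 0 for uncovered CPUs" default are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_CPUS 64

/*
 * Build a cpu -> queue table from one affinity mask per queue.
 * CPUs that appear in no mask stay on queue 0 (sketch-only default).
 */
static void sketch_map_queues(const uint64_t *vq_masks,
			      unsigned int nr_queues,
			      unsigned int nr_cpus,
			      unsigned int *cpu_to_queue)
{
	unsigned int cpu, q;

	assert(nr_cpus <= SKETCH_MAX_CPUS);
	for (cpu = 0; cpu < nr_cpus; cpu++)
		cpu_to_queue[cpu] = 0;

	for (q = 0; q < nr_queues; q++)
		for (cpu = 0; cpu < nr_cpus; cpu++)
			if (vq_masks[q] & (UINT64_C(1) << cpu))
				cpu_to_queue[cpu] = q;
}
```

With two queues whose masks are 0x3 and 0xC on a four-CPU system, CPUs 0-1 map to queue 0 and CPUs 2-3 map to queue 1.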


Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


> ---
>   drivers/vdpa/vdpa_user/vduse_dev.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index 45aa8703c4b5..cefabd0dab9c 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -722,6 +722,14 @@ static int vduse_vdpa_set_vq_affinity(struct vdpa_device *vdpa, u16 idx,
>   	return 0;
>   }
>   
> +static const struct cpumask *
> +vduse_vdpa_get_vq_affinity(struct vdpa_device *vdpa, u16 idx)
> +{
> +	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> +
> +	return &dev->vqs[idx]->irq_affinity;
> +}
> +
>   static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
>   				unsigned int asid,
>   				struct vhost_iotlb *iotlb)
> @@ -773,6 +781,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
>   	.set_config		= vduse_vdpa_set_config,
>   	.get_generation		= vduse_vdpa_get_generation,
>   	.set_vq_affinity	= vduse_vdpa_set_vq_affinity,
> +	.get_vq_affinity	= vduse_vdpa_get_vq_affinity,
>   	.reset			= vduse_vdpa_reset,
>   	.set_map		= vduse_vdpa_set_map,
>   	.free			= vduse_vdpa_free,

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 07/11] vduse: Add sysfs interface for irq callback affinity
       [not found] ` <20230323053043.35-8-xieyongji@bytedance.com>
@ 2023-03-28  6:16   ` Jason Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Wang @ 2023-03-28  6:16 UTC (permalink / raw)
  To: Xie Yongji, mst, tglx, hch; +Cc: linux-kernel, virtualization


On 2023/3/23 13:30, Xie Yongji wrote:
> Add a sysfs interface for each vduse virtqueue to
> get/set the affinity for the irq callback. This might
> be useful for performance tuning when the irq callback
> affinity mask contains more than one CPU.
>
> Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
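
The store path quoted below parses the written mask with cpumask_parse() and rejects it unless it intersects cpu_online_mask. A rough user-space sketch of that validation is shown here. It is an approximation only: it accepts a single hex chunk of at most 64 bits (no comma-separated groups), and sketch_affinity_store() is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a hex mask from buf and install it only if it contains at least
 * one online CPU. Returns the number of bytes consumed or -EINVAL.
 */
static long sketch_affinity_store(uint64_t *affinity, uint64_t online,
				  const char *buf)
{
	unsigned long long value;
	char *end;

	assert(affinity != NULL && buf != NULL);

	if (!isxdigit((unsigned char)buf[0]))
		return -EINVAL;

	errno = 0;
	value = strtoull(buf, &end, 16);
	if (errno != 0 || (*end != '\0' && *end != '\n'))
		return -EINVAL;

	/* Reject masks that contain no online CPU; keep the old mask. */
	if ((value & online) == 0)
		return -EINVAL;

	*affinity = (uint64_t)value;
	return (long)strlen(buf);
}
```

As in the patch, a rejected write leaves the previous affinity untouched.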


Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


> ---
>   drivers/vdpa/vdpa_user/vduse_dev.c | 124 ++++++++++++++++++++++++++---
>   1 file changed, 113 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index cefabd0dab9c..77da3685568a 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -61,6 +61,7 @@ struct vduse_virtqueue {
>   	struct work_struct kick;
>   	int irq_effective_cpu;
>   	struct cpumask irq_affinity;
> +	struct kobject kobj;
>   };
>   
>   struct vduse_dev;
> @@ -1387,6 +1388,96 @@ static const struct file_operations vduse_dev_fops = {
>   	.llseek		= noop_llseek,
>   };
>   
> +static ssize_t irq_cb_affinity_show(struct vduse_virtqueue *vq, char *buf)
> +{
> +	return sprintf(buf, "%*pb\n", cpumask_pr_args(&vq->irq_affinity));
> +}
> +
> +static ssize_t irq_cb_affinity_store(struct vduse_virtqueue *vq,
> +				     const char *buf, size_t count)
> +{
> +	cpumask_var_t new_value;
> +	int ret;
> +
> +	if (!zalloc_cpumask_var(&new_value, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	ret = cpumask_parse(buf, new_value);
> +	if (ret)
> +		goto free_mask;
> +
> +	ret = -EINVAL;
> +	if (!cpumask_intersects(new_value, cpu_online_mask))
> +		goto free_mask;
> +
> +	cpumask_copy(&vq->irq_affinity, new_value);
> +	ret = count;
> +free_mask:
> +	free_cpumask_var(new_value);
> +	return ret;
> +}
> +
> +struct vq_sysfs_entry {
> +	struct attribute attr;
> +	ssize_t (*show)(struct vduse_virtqueue *vq, char *buf);
> +	ssize_t (*store)(struct vduse_virtqueue *vq, const char *buf,
> +			 size_t count);
> +};
> +
> +static struct vq_sysfs_entry irq_cb_affinity_attr = __ATTR_RW(irq_cb_affinity);
> +
> +static struct attribute *vq_attrs[] = {
> +	&irq_cb_affinity_attr.attr,
> +	NULL,
> +};
> +ATTRIBUTE_GROUPS(vq);
> +
> +static ssize_t vq_attr_show(struct kobject *kobj, struct attribute *attr,
> +			    char *buf)
> +{
> +	struct vduse_virtqueue *vq = container_of(kobj,
> +					struct vduse_virtqueue, kobj);
> +	struct vq_sysfs_entry *entry = container_of(attr,
> +					struct vq_sysfs_entry, attr);
> +
> +	if (!entry->show)
> +		return -EIO;
> +
> +	return entry->show(vq, buf);
> +}
> +
> +static ssize_t vq_attr_store(struct kobject *kobj, struct attribute *attr,
> +			     const char *buf, size_t count)
> +{
> +	struct vduse_virtqueue *vq = container_of(kobj,
> +					struct vduse_virtqueue, kobj);
> +	struct vq_sysfs_entry *entry = container_of(attr,
> +					struct vq_sysfs_entry, attr);
> +
> +	if (!entry->store)
> +		return -EIO;
> +
> +	return entry->store(vq, buf, count);
> +}
> +
> +static const struct sysfs_ops vq_sysfs_ops = {
> +	.show = vq_attr_show,
> +	.store = vq_attr_store,
> +};
> +
> +static void vq_release(struct kobject *kobj)
> +{
> +	struct vduse_virtqueue *vq = container_of(kobj,
> +					struct vduse_virtqueue, kobj);
> +	kfree(vq);
> +}
> +
> +static const struct kobj_type vq_type = {
> +	.release	= vq_release,
> +	.sysfs_ops	= &vq_sysfs_ops,
> +	.default_groups	= vq_groups,
> +};
> +
>   static void vduse_dev_deinit_vqs(struct vduse_dev *dev)
>   {
>   	int i;
> @@ -1395,13 +1486,13 @@ static void vduse_dev_deinit_vqs(struct vduse_dev *dev)
>   		return;
>   
>   	for (i = 0; i < dev->vq_num; i++)
> -		kfree(dev->vqs[i]);
> +		kobject_put(&dev->vqs[i]->kobj);
>   	kfree(dev->vqs);
>   }
>   
>   static int vduse_dev_init_vqs(struct vduse_dev *dev, u32 vq_align, u32 vq_num)
>   {
> -	int i;
> +	int ret, i;
>   
>   	dev->vq_align = vq_align;
>   	dev->vq_num = vq_num;
> @@ -1411,8 +1502,10 @@ static int vduse_dev_init_vqs(struct vduse_dev *dev, u32 vq_align, u32 vq_num)
>   
>   	for (i = 0; i < vq_num; i++) {
>   		dev->vqs[i] = kzalloc(sizeof(*dev->vqs[i]), GFP_KERNEL);
> -		if (!dev->vqs[i])
> +		if (!dev->vqs[i]) {
> +			ret = -ENOMEM;
>   			goto err;
> +		}
>   
>   		dev->vqs[i]->index = i;
>   		dev->vqs[i]->irq_effective_cpu = IRQ_UNBOUND;
> @@ -1421,15 +1514,23 @@ static int vduse_dev_init_vqs(struct vduse_dev *dev, u32 vq_align, u32 vq_num)
>   		spin_lock_init(&dev->vqs[i]->kick_lock);
>   		spin_lock_init(&dev->vqs[i]->irq_lock);
>   		cpumask_setall(&dev->vqs[i]->irq_affinity);
> +
> +		kobject_init(&dev->vqs[i]->kobj, &vq_type);
> +		ret = kobject_add(&dev->vqs[i]->kobj,
> +				  &dev->dev->kobj, "vq%d", i);
> +		if (ret) {
> +			kfree(dev->vqs[i]);
> +			goto err;
> +		}
>   	}
>   
>   	return 0;
>   err:
>   	while (i--)
> -		kfree(dev->vqs[i]);
> +		kobject_put(&dev->vqs[i]->kobj);
>   	kfree(dev->vqs);
>   	dev->vqs = NULL;
> -	return -ENOMEM;
> +	return ret;
>   }
>   
>   static struct vduse_dev *vduse_dev_create(void)
> @@ -1607,10 +1708,6 @@ static int vduse_create_dev(struct vduse_dev_config *config,
>   	dev->config = config_buf;
>   	dev->config_size = config->config_size;
>   
> -	ret = vduse_dev_init_vqs(dev, config->vq_align, config->vq_num);
> -	if (ret)
> -		goto err_vqs;
> -
>   	ret = idr_alloc(&vduse_idr, dev, 1, VDUSE_DEV_MAX, GFP_KERNEL);
>   	if (ret < 0)
>   		goto err_idr;
> @@ -1624,14 +1721,19 @@ static int vduse_create_dev(struct vduse_dev_config *config,
>   		ret = PTR_ERR(dev->dev);
>   		goto err_dev;
>   	}
> +
> +	ret = vduse_dev_init_vqs(dev, config->vq_align, config->vq_num);
> +	if (ret)
> +		goto err_vqs;
> +
>   	__module_get(THIS_MODULE);
>   
>   	return 0;
> +err_vqs:
> +	device_destroy(vduse_class, MKDEV(MAJOR(vduse_major), dev->minor));
>   err_dev:
>   	idr_remove(&vduse_idr, dev->minor);
>   err_idr:
> -	vduse_dev_deinit_vqs(dev);
> -err_vqs:
>   	vduse_domain_destroy(dev->domain);
>   err_domain:
>   	kfree(dev->name);

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 08/11] vdpa: Add eventfd for the vdpa callback
       [not found] ` <20230323053043.35-9-xieyongji@bytedance.com>
@ 2023-03-28  6:17   ` Jason Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Wang @ 2023-03-28  6:17 UTC (permalink / raw)
  To: Xie Yongji, mst, tglx, hch; +Cc: linux-kernel, virtualization


On 2023/3/23 13:30, Xie Yongji wrote:
> Add eventfd for the vdpa callback so that the user
> can signal it directly instead of triggering the
> callback. It will be used for the vhost-vdpa case.
>
> Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
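
The idea of an optional trigger can be sketched in user space with eventfd(2) on Linux: when a trigger is attached, the notifier signals it and skips the callback; otherwise it invokes the callback. The struct and function names below are hypothetical and only mirror the shape of struct vdpa_callback, with a file descriptor standing in for the kernel's struct eventfd_ctx pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

struct sketch_callback {
	int (*callback)(void *data);
	void *private_data;
	int trigger_fd;		/* -1 when no eventfd is attached */
};

/* Example callback that counts how many times it ran. */
static int sketch_count_cb(void *data)
{
	++*(int *)data;
	return 0;
}

/* Signal the eventfd when present, otherwise run the callback. */
static int sketch_notify(const struct sketch_callback *cb)
{
	uint64_t one = 1;

	assert(cb != NULL);
	if (cb->trigger_fd >= 0)
		return write(cb->trigger_fd, &one, sizeof(one)) ==
		       (ssize_t)sizeof(one) ? 0 : -1;
	if (cb->callback)
		return cb->callback(cb->private_data);
	return -1;
}
```

Signaling the eventfd twice leaves a counter value of 2 for the reader, and the callback is not invoked on that path.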


Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


> ---
>   drivers/vhost/vdpa.c         | 2 ++
>   drivers/virtio/virtio_vdpa.c | 1 +
>   include/linux/vdpa.h         | 6 ++++++
>   3 files changed, 9 insertions(+)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 7be9d9d8f01c..9cd878e25cff 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -599,9 +599,11 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
>   		if (vq->call_ctx.ctx) {
>   			cb.callback = vhost_vdpa_virtqueue_cb;
>   			cb.private = vq;
> +			cb.trigger = vq->call_ctx.ctx;
>   		} else {
>   			cb.callback = NULL;
>   			cb.private = NULL;
> +			cb.trigger = NULL;
>   		}
>   		ops->set_vq_cb(vdpa, idx, &cb);
>   		vhost_vdpa_setup_vq_irq(v, idx);
> diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
> index f3826f42b704..2a095f37f26b 100644
> --- a/drivers/virtio/virtio_vdpa.c
> +++ b/drivers/virtio/virtio_vdpa.c
> @@ -196,6 +196,7 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned int index,
>   	/* Setup virtqueue callback */
>   	cb.callback = callback ? virtio_vdpa_virtqueue_cb : NULL;
>   	cb.private = info;
> +	cb.trigger = NULL;
>   	ops->set_vq_cb(vdpa, index, &cb);
>   	ops->set_vq_num(vdpa, index, virtqueue_get_vring_size(vq));
>   
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index e52c9a37999c..0372b2e3d38a 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -13,10 +13,16 @@
>    * struct vdpa_calllback - vDPA callback definition.
>    * @callback: interrupt callback function
>    * @private: the data passed to the callback function
> + * @trigger: the eventfd for the callback (Optional).
> + *           When it is set, the vDPA driver must guarantee that
> + *           signaling it is functionally equivalent to triggering
> + *           the callback. Then the vDPA parent can signal it directly
> + *           instead of triggering the callback.
>    */
>   struct vdpa_callback {
>   	irqreturn_t (*callback)(void *data);
>   	void *private;
> +	struct eventfd_ctx *trigger;
>   };
>   
>   /**

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 01/11] lib/group_cpus: Export group_cpus_evenly()
       [not found] ` <20230323053043.35-2-xieyongji@bytedance.com>
@ 2023-04-04 18:20   ` Michael S. Tsirkin
  0 siblings, 0 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2023-04-04 18:20 UTC (permalink / raw)
  To: Xie Yongji; +Cc: linux-kernel, virtualization, tglx, hch

On Thu, Mar 23, 2023 at 01:30:33PM +0800, Xie Yongji wrote:
> Export group_cpus_evenly() so that some modules
> can make use of it to group CPUs evenly according
> to NUMA and CPU locality.
> 
> Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
> Acked-by: Jason Wang <jasowang@redhat.com>
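
As a rough picture of what "grouping CPUs evenly" means, the sketch below splits CPUs 0..nr_cpus-1 into numgrps contiguous groups whose sizes differ by at most one. It deliberately ignores NUMA and CPU locality, which the real group_cpus_evenly() takes into account, and sketch_group_evenly() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Fill masks[0..numgrps-1] so that every CPU below nr_cpus lands in
 * exactly one group and group sizes differ by at most one.
 */
static void sketch_group_evenly(unsigned int nr_cpus, unsigned int numgrps,
				uint64_t *masks)
{
	unsigned int base, extra, g, cpu = 0;

	assert(numgrps > 0 && nr_cpus <= 64);
	base = nr_cpus / numgrps;
	extra = nr_cpus % numgrps;

	for (g = 0; g < numgrps; g++) {
		/* The first "extra" groups take one additional CPU. */
		unsigned int n = base + (g < extra ? 1 : 0);

		masks[g] = 0;
		while (n--)
			masks[g] |= UINT64_C(1) << cpu++;
	}
}
```

For 8 CPUs and 3 groups this gives group sizes 3, 3 and 2.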


Hi Thomas, do you ack merging this through my tree?
Thanks!

> ---
>  lib/group_cpus.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> index 9c837a35fef7..aa3f6815bb12 100644
> --- a/lib/group_cpus.c
> +++ b/lib/group_cpus.c
> @@ -426,3 +426,4 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
>  	return masks;
>  }
>  #endif /* CONFIG_SMP */
> +EXPORT_SYMBOL_GPL(group_cpus_evenly);
> -- 
> 2.20.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-04-04 18:20 UTC | newest]

Thread overview: 11+ messages
     [not found] <20230323053043.35-1-xieyongji@bytedance.com>
     [not found] ` <20230323053043.35-4-xieyongji@bytedance.com>
2023-03-24  6:27   ` [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism Jason Wang
2023-03-24  9:12     ` Michael S. Tsirkin
2023-03-28  6:12       ` Jason Wang
     [not found]     ` <CACycT3sm1P2qDQTNKp+RLmyd84+v8xwErf_g1SXqiaJDQO8LNg@mail.gmail.com>
2023-03-28  3:14       ` Jason Wang
     [not found]         ` <CACycT3uYbnrQDDbFmwdww8ukMU1t9RsAuutHsFT-UzK9_Mc=Kg@mail.gmail.com>
2023-03-28  3:44           ` Jason Wang
     [not found]             ` <CACycT3vCqisBS0OyMsnyrw0i6kWTDqSZ4GQbdoycHz-L3=1Q7Q@mail.gmail.com>
2023-03-28  6:07               ` Jason Wang
     [not found] ` <20230323053043.35-6-xieyongji@bytedance.com>
2023-03-28  6:14   ` [PATCH v4 05/11] vduse: Support set_vq_affinity callback Jason Wang
     [not found] ` <20230323053043.35-7-xieyongji@bytedance.com>
2023-03-28  6:15   ` [PATCH v4 06/11] vduse: Support get_vq_affinity callback Jason Wang
     [not found] ` <20230323053043.35-8-xieyongji@bytedance.com>
2023-03-28  6:16   ` [PATCH v4 07/11] vduse: Add sysfs interface for irq callback affinity Jason Wang
     [not found] ` <20230323053043.35-9-xieyongji@bytedance.com>
2023-03-28  6:17   ` [PATCH v4 08/11] vdpa: Add eventfd for the vdpa callback Jason Wang
     [not found] ` <20230323053043.35-2-xieyongji@bytedance.com>
2023-04-04 18:20   ` [PATCH v4 01/11] lib/group_cpus: Export group_cpus_evenly() Michael S. Tsirkin
