This series implements IRQ offloading for vhost_vdpa.

With the help of irq forwarding facilities such as posted interrupts on x86, irq bypass can deliver interrupts to a vCPU directly. vDPA devices have dedicated hardware backends, like VFIO passthrough devices, so it is possible to set up irq offloading (irq bypass) for vDPA devices and gain a performance improvement. In my testing, this feature saves 0.1 ms on average in a ping between two VFs.

changes from V3:
(1) removed the vDPA irq allocate/free helpers in the vDPA core.
(2) added a new function get_vq_irq() to struct vdpa_config_ops; an upper-layer driver can use this function to:
    A. query the irq number of a vq.
    B. detect whether a vq is enabled.
(3) implemented get_vq_irq() in the ifcvf driver.
(4) in vhost_vdpa, set_status() now sets up irq offloading when DRIVER_OK is set, and tears it down when it receives !DRIVER_OK.
(5) minor improvements.

changes from V2:
(1) renamed struct vhost_call_ctx to vhost_vring_call.
(2) added kvm_arch_end_assignment() to the del_producer() code path.
(3) renamed the vDPA helpers to vdpa_devm_request_irq() and vdpa_devm_free_irq(), and added better comments for them.
(4) better comments for setup_vq_irq() and unsetup_vq_irq().
(5) the vDPA VHOST_SET_VRING_CALL handler calls vhost_vdpa_update_vq_irq() without checking producer.irq; moved this check into vhost_vdpa_update_vq_irq() to take advantage of the spinlock.
(6) added a function vhost_vdpa_clean_irq(), which unregisters the producers of the vqs in vhost_vdpa_release(). This is safe for the control vq.
(7) minor improvements.

changes from V1:
(1) dropped the vfio changes.
(3) removed the KVM_HAVE_IRQ_BYPASS checks.
(4) locking fixes.
(5) simplified vhost_vdpa_updat

Zhu Lingshan (6):
  vhost: introduce vhost_vring_call
  kvm: detect assigned device via irqbypass manager
  vDPA: add get_vq_irq() in vdpa_config_ops
  vhost_vdpa: implement IRQ offloading in vhost_vdpa
  ifcvf: implement vdpa_config_ops.get_vq_irq()
  irqbypass: do not start cons/prod when failed connect

 arch/x86/kvm/x86.c              | 12 ++++-
 drivers/vdpa/ifcvf/ifcvf_main.c | 18 ++++++-
 drivers/vhost/Kconfig           |  1 +
 drivers/vhost/vdpa.c            | 83 +++++++++++++++++++++++++++++++--
 drivers/vhost/vhost.c           | 22 ++++++---
 drivers/vhost/vhost.h           |  9 +++-
 include/linux/vdpa.h            |  6 +++
 virt/lib/irqbypass.c            | 16 ++++---
 8 files changed, 147 insertions(+), 20 deletions(-)

-- 
2.18.4
This commit introduces struct vhost_vring_call, which replaces the raw struct eventfd_ctx *call_ctx in struct vhost_virtqueue. Besides the eventfd_ctx, it contains a spinlock and an irq_bypass_producer.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vdpa.c  |  4 ++--
 drivers/vhost/vhost.c | 22 ++++++++++++++++------
 drivers/vhost/vhost.h |  9 ++++++++-
 3 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index a54b60d6623f..df3cf386b0cd 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -96,7 +96,7 @@ static void handle_vq_kick(struct vhost_work *work)
 static irqreturn_t vhost_vdpa_virtqueue_cb(void *private)
 {
 	struct vhost_virtqueue *vq = private;
-	struct eventfd_ctx *call_ctx = vq->call_ctx;
+	struct eventfd_ctx *call_ctx = vq->call_ctx.ctx;
 
 	if (call_ctx)
 		eventfd_signal(call_ctx, 1);
@@ -382,7 +382,7 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
 		break;
 
 	case VHOST_SET_VRING_CALL:
-		if (vq->call_ctx) {
+		if (vq->call_ctx.ctx) {
 			cb.callback = vhost_vdpa_virtqueue_cb;
 			cb.private = vq;
 		} else {
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index d7b8df3edffc..9f1a845a9302 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -298,6 +298,13 @@ static void vhost_vq_meta_reset(struct vhost_dev *d)
 		__vhost_vq_meta_reset(d->vqs[i]);
 }
 
+static void vhost_vring_call_reset(struct vhost_vring_call *call_ctx)
+{
+	call_ctx->ctx = NULL;
+	memset(&call_ctx->producer, 0x0, sizeof(struct irq_bypass_producer));
+	spin_lock_init(&call_ctx->ctx_lock);
+}
+
 static void vhost_vq_reset(struct vhost_dev *dev,
 			   struct vhost_virtqueue *vq)
 {
@@ -319,13 +326,13 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vq->log_base = NULL;
 	vq->error_ctx = NULL;
 	vq->kick = NULL;
-	vq->call_ctx = NULL;
 	vq->log_ctx = NULL;
 	vhost_reset_is_le(vq);
 	vhost_disable_cross_endian(vq);
 	vq->busyloop_timeout = 0;
 	vq->umem = NULL;
 	vq->iotlb = NULL;
+	vhost_vring_call_reset(&vq->call_ctx);
 	__vhost_vq_meta_reset(vq);
 }
 
@@ -685,8 +692,8 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 			eventfd_ctx_put(dev->vqs[i]->error_ctx);
 		if (dev->vqs[i]->kick)
 			fput(dev->vqs[i]->kick);
-		if (dev->vqs[i]->call_ctx)
-			eventfd_ctx_put(dev->vqs[i]->call_ctx);
+		if (dev->vqs[i]->call_ctx.ctx)
+			eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
 		vhost_vq_reset(dev, dev->vqs[i]);
 	}
 	vhost_dev_free_iovecs(dev);
@@ -1629,7 +1636,10 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
 			r = PTR_ERR(ctx);
 			break;
 		}
-		swap(ctx, vq->call_ctx);
+
+		spin_lock(&vq->call_ctx.ctx_lock);
+		swap(ctx, vq->call_ctx.ctx);
+		spin_unlock(&vq->call_ctx.ctx_lock);
 		break;
 	case VHOST_SET_VRING_ERR:
 		if (copy_from_user(&f, argp, sizeof f)) {
@@ -2440,8 +2450,8 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
 	/* Signal the Guest tell them we used something up. */
-	if (vq->call_ctx && vhost_notify(dev, vq))
-		eventfd_signal(vq->call_ctx, 1);
+	if (vq->call_ctx.ctx && vhost_notify(dev, vq))
+		eventfd_signal(vq->call_ctx.ctx, 1);
 }
 EXPORT_SYMBOL_GPL(vhost_signal);
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index c8e96a095d3b..38eb1aa3b68d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -13,6 +13,7 @@
 #include <linux/virtio_ring.h>
 #include <linux/atomic.h>
 #include <linux/vhost_iotlb.h>
+#include <linux/irqbypass.h>
 
 struct vhost_work;
 typedef void (*vhost_work_fn_t)(struct vhost_work *work);
@@ -60,6 +61,12 @@ enum vhost_uaddr_type {
 	VHOST_NUM_ADDRS = 3,
 };
 
+struct vhost_vring_call {
+	struct eventfd_ctx *ctx;
+	struct irq_bypass_producer producer;
+	spinlock_t ctx_lock;
+};
+
 /* The virtqueue structure describes a queue attached to a device. */
 struct vhost_virtqueue {
 	struct vhost_dev *dev;
@@ -72,7 +79,7 @@ struct vhost_virtqueue {
 	vring_used_t __user *used;
 	const struct vhost_iotlb_map *meta_iotlb[VHOST_NUM_ADDRS];
 	struct file *kick;
-	struct eventfd_ctx *call_ctx;
+	struct vhost_vring_call call_ctx;
 	struct eventfd_ctx *error_ctx;
 	struct eventfd_ctx *log_ctx;
-- 
2.18.4
vDPA devices have dedicated backend hardware, like passthrough devices, so it is possible to set up irq offloading to a vCPU for vDPA devices. This patch manipulates the assigned-device counters via kvm_arch_start/end_assignment() in the irqbypass manager, so that assigned devices can be detected in update_pi_irte().

We increase/decrease the assigned-device counter in kvm/x86. Both vDPA and VFIO go through this code path. Only x86 uses these counters and kvm_arch_start/end_assignment(), so this code path only affects x86 for now.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
---
 arch/x86/kvm/x86.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 88c593f83b28..76a2e7fd18c7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10630,11 +10630,17 @@ int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
 {
 	struct kvm_kernel_irqfd *irqfd =
 		container_of(cons, struct kvm_kernel_irqfd, consumer);
+	int ret;
 
 	irqfd->producer = prod;
+	kvm_arch_start_assignment(irqfd->kvm);
+	ret = kvm_x86_ops.update_pi_irte(irqfd->kvm,
+					 prod->irq, irqfd->gsi, 1);
+
+	if (ret)
+		kvm_arch_end_assignment(irqfd->kvm);
 
-	return kvm_x86_ops.update_pi_irte(irqfd->kvm,
-					  prod->irq, irqfd->gsi, 1);
+	return ret;
 }
 
 void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
@@ -10657,6 +10663,8 @@ void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
 	if (ret)
 		printk(KERN_INFO "irq bypass consumer (token %p) unregistration"
 		       " fails: %d\n", irqfd->consumer.token, ret);
+
+	kvm_arch_end_assignment(irqfd->kvm);
 }
 
 int kvm_arch_update_irqfd_routing(struct kvm *kvm, unsigned int host_irq,
-- 
2.18.4
This commit adds a new function get_vq_irq() to struct vdpa_config_ops, which returns the irq number of a virtqueue.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/vdpa.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 239db794357c..cebc79173aaa 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -87,6 +87,11 @@ struct vdpa_device {
  *				@vdev: vdpa device
  *				@idx: virtqueue index
  *				Returns the notifcation area
+ * @get_vq_irq:			Get the irq number of a virtqueue
+ *				@vdev: vdpa device
+ *				@idx: virtqueue index
+ *				Returns u32: irq number of a virtqueue,
+ *				-EINVAL if no irq assigned.
  * @get_vq_align:		Get the virtqueue align requirement
  *				for the device
  *				@vdev: vdpa device
@@ -178,6 +183,7 @@ struct vdpa_config_ops {
 	u64 (*get_vq_state)(struct vdpa_device *vdev, u16 idx);
 	struct vdpa_notification_area
 	(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
+	u32 (*get_vq_irq)(struct vdpa_device *vdev, u16 idx);
 
 	/* Device ops */
 	u32 (*get_vq_align)(struct vdpa_device *vdev);
-- 
2.18.4
This patch introduces a set of functions to set up, tear down, and update irq offloading, by registering, unregistering, and re-registering the irq_bypass_producer respectively.

With these functions, this commit can set up/tear down irq offloading through setting DRIVER_OK/!DRIVER_OK, and update irq offloading through SET_VRING_CALL.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/Kconfig |  1 +
 drivers/vhost/vdpa.c  | 79 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index d3688c6afb87..587fbae06182 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -65,6 +65,7 @@ config VHOST_VDPA
 	tristate "Vhost driver for vDPA-based backend"
 	depends on EVENTFD
 	select VHOST
+	select IRQ_BYPASS_MANAGER
 	depends on VDPA
 	help
 	  This kernel module can be loaded in host kernel to accelerate
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index df3cf386b0cd..1dccced321f8 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -115,6 +115,55 @@ static irqreturn_t vhost_vdpa_config_cb(void *private)
 	return IRQ_HANDLED;
 }
 
+static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)
+{
+	struct vhost_virtqueue *vq = &v->vqs[qid];
+	const struct vdpa_config_ops *ops = v->vdpa->config;
+	struct vdpa_device *vdpa = v->vdpa;
+	int ret, irq;
+
+	spin_lock(&vq->call_ctx.ctx_lock);
+	irq = ops->get_vq_irq(vdpa, qid);
+	if (!vq->call_ctx.ctx || irq == -EINVAL) {
+		spin_unlock(&vq->call_ctx.ctx_lock);
+		return;
+	}
+
+	vq->call_ctx.producer.token = vq->call_ctx.ctx;
+	vq->call_ctx.producer.irq = irq;
+	ret = irq_bypass_register_producer(&vq->call_ctx.producer);
+	spin_unlock(&vq->call_ctx.ctx_lock);
+}
+
+static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa *v, int qid)
+{
+	struct vhost_virtqueue *vq = &v->vqs[qid];
+
+	spin_lock(&vq->call_ctx.ctx_lock);
+	irq_bypass_unregister_producer(&vq->call_ctx.producer);
+	spin_unlock(&vq->call_ctx.ctx_lock);
+}
+
+static void vhost_vdpa_update_vq_irq(struct vhost_virtqueue *vq)
+{
+	spin_lock(&vq->call_ctx.ctx_lock);
+	/*
+	 * A non-zero irq means there is a previously registered
+	 * irq_bypass_producer; we should update it when ctx
+	 * (its token) changes.
+	 */
+	if (!vq->call_ctx.producer.irq) {
+		spin_unlock(&vq->call_ctx.ctx_lock);
+		return;
+	}
+
+	irq_bypass_unregister_producer(&vq->call_ctx.producer);
+	vq->call_ctx.producer.token = vq->call_ctx.ctx;
+	irq_bypass_register_producer(&vq->call_ctx.producer);
+	spin_unlock(&vq->call_ctx.ctx_lock);
+}
+
 static void vhost_vdpa_reset(struct vhost_vdpa *v)
 {
 	struct vdpa_device *vdpa = v->vdpa;
@@ -155,11 +204,15 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
 {
 	struct vdpa_device *vdpa = v->vdpa;
 	const struct vdpa_config_ops *ops = vdpa->config;
-	u8 status;
+	u8 status, status_old;
+	int i, nvqs;
 
 	if (copy_from_user(&status, statusp, sizeof(status)))
 		return -EFAULT;
 
+	status_old = ops->get_status(vdpa);
+	nvqs = v->nvqs;
+
 	/*
 	 * Userspace shouldn't remove status bits unless reset the
 	 * status to 0.
@@ -167,6 +220,15 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
 	if (status != 0 && (ops->get_status(vdpa) & ~status) != 0)
 		return -EINVAL;
 
+	/* vq irq is not expected to be changed once DRIVER_OK is set */
+	if ((status & VIRTIO_CONFIG_S_DRIVER_OK) && !(status_old & VIRTIO_CONFIG_S_DRIVER_OK))
+		for (i = 0; i < nvqs; i++)
+			vhost_vdpa_setup_vq_irq(v, i);
+
+	if ((status_old & VIRTIO_CONFIG_S_DRIVER_OK) && !(status & VIRTIO_CONFIG_S_DRIVER_OK))
+		for (i = 0; i < nvqs; i++)
+			vhost_vdpa_unsetup_vq_irq(v, i);
+
 	ops->set_status(vdpa, status);
 
 	return 0;
@@ -332,6 +394,7 @@ static long vhost_vdpa_set_config_call(struct vhost_vdpa *v, u32 __user *argp)
 
 	return 0;
 }
+
 static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
 				   void __user *argp)
 {
@@ -390,6 +453,7 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
 			cb.private = NULL;
 		}
 		ops->set_vq_cb(vdpa, idx, &cb);
+		vhost_vdpa_update_vq_irq(vq);
 		break;
 
 	case VHOST_SET_VRING_NUM:
@@ -765,6 +829,18 @@ static int vhost_vdpa_open(struct inode *inode, struct file *filep)
 	return r;
 }
 
+static void vhost_vdpa_clean_irq(struct vhost_vdpa *v)
+{
+	struct vhost_virtqueue *vq;
+	int i;
+
+	for (i = 0; i < v->nvqs; i++) {
+		vq = &v->vqs[i];
+		if (vq->call_ctx.producer.irq)
+			irq_bypass_unregister_producer(&vq->call_ctx.producer);
+	}
+}
+
 static int vhost_vdpa_release(struct inode *inode, struct file *filep)
 {
 	struct vhost_vdpa *v = filep->private_data;
@@ -777,6 +853,7 @@ static int vhost_vdpa_release(struct inode *inode, struct file *filep)
 	vhost_vdpa_iotlb_free(v);
 	vhost_vdpa_free_domain(v);
 	vhost_vdpa_config_put(v);
+	vhost_vdpa_clean_irq(v);
 	vhost_dev_cleanup(&v->vdev);
 	kfree(v->vdev.vqs);
 	mutex_unlock(&d->mutex);
-- 
2.18.4
This commit implements vdpa_config_ops.get_vq_irq() in ifcvf, and initializes the vq irqs to -EINVAL, so that ifcvf can report the irq number of a vq, or -EINVAL if the vq has no irq assigned.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vdpa/ifcvf/ifcvf_main.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index f5a60c14b979..8db530280abe 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -50,8 +50,10 @@ static void ifcvf_free_irq(struct ifcvf_adapter *adapter, int queues)
 	int i;
 
-	for (i = 0; i < queues; i++)
+	for (i = 0; i < queues; i++) {
 		devm_free_irq(&pdev->dev, vf->vring[i].irq, &vf->vring[i]);
+		vf->vring[i].irq = -EINVAL;
+	}
 
 	ifcvf_free_irq_vectors(pdev);
 }
@@ -352,6 +354,14 @@ static void ifcvf_vdpa_set_config_cb(struct vdpa_device *vdpa_dev,
 	vf->config_cb.private = cb->private;
 }
 
+static u32 ifcvf_vdpa_get_vq_irq(struct vdpa_device *vdpa_dev,
+				 u16 qid)
+{
+	struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
+
+	return vf->vring[qid].irq;
+}
+
 /*
  * IFCVF currently does't have on-chip IOMMU, so not
  * implemented set_map()/dma_map()/dma_unmap()
  */
@@ -369,6 +379,7 @@ static const struct vdpa_config_ops ifc_vdpa_ops = {
 	.get_vq_ready	= ifcvf_vdpa_get_vq_ready,
 	.set_vq_num	= ifcvf_vdpa_set_vq_num,
 	.set_vq_address	= ifcvf_vdpa_set_vq_address,
+	.get_vq_irq	= ifcvf_vdpa_get_vq_irq,
 	.kick_vq	= ifcvf_vdpa_kick_vq,
 	.get_generation	= ifcvf_vdpa_get_generation,
 	.get_device_id	= ifcvf_vdpa_get_device_id,
@@ -384,7 +395,7 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	struct device *dev = &pdev->dev;
 	struct ifcvf_adapter *adapter;
 	struct ifcvf_hw *vf;
-	int ret;
+	int ret, i;
 
 	ret = pcim_enable_device(pdev);
 	if (ret) {
@@ -441,6 +452,9 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		goto err;
 	}
 
+	for (i = 0; i < IFCVF_MAX_QUEUE_PAIRS * 2; i++)
+		vf->vring[i].irq = -EINVAL;
+
 	ret = vdpa_register_device(&adapter->vdpa);
 	if (ret) {
 		IFCVF_ERR(pdev, "Failed to register ifcvf to vdpa bus");
-- 
2.18.4
If the connect fails, there is no need to start the consumer or the producer. On success, return before the error labels so that del_consumer() does not undo a successful connection.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
---
 virt/lib/irqbypass.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/virt/lib/irqbypass.c b/virt/lib/irqbypass.c
index 28fda42e471b..c9bb3957f58a 100644
--- a/virt/lib/irqbypass.c
+++ b/virt/lib/irqbypass.c
@@ -40,17 +40,23 @@ static int __connect(struct irq_bypass_producer *prod,
 	if (prod->add_consumer)
 		ret = prod->add_consumer(prod, cons);
 
-	if (!ret) {
-		ret = cons->add_producer(cons, prod);
-		if (ret && prod->del_consumer)
-			prod->del_consumer(prod, cons);
-	}
+	if (ret)
+		goto err_add_consumer;
+
+	ret = cons->add_producer(cons, prod);
+	if (ret)
+		goto err_add_producer;
 
 	if (cons->start)
 		cons->start(cons);
 	if (prod->start)
 		prod->start(prod);
 
+	return 0;
+
+err_add_producer:
+	if (prod->del_consumer)
+		prod->del_consumer(prod, cons);
+err_add_consumer:
 	return ret;
 }
-- 
2.18.4
On 2020/7/28 下午12:24, Zhu Lingshan wrote:
> This commit adds a new function get_vq_irq() in struct
> vdpa_config_ops, which will return the irq number of a
> virtqueue.
>
> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> Suggested-by: Jason Wang <jasowang@redhat.com>
> ---
>  include/linux/vdpa.h | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index 239db794357c..cebc79173aaa 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -87,6 +87,11 @@ struct vdpa_device {
>   *				@vdev: vdpa device
>   *				@idx: virtqueue index
>   *				Returns the notifcation area
> + * @get_vq_irq:			Get the irq number of a virtqueue
> + *				@vdev: vdpa device
> + *				@idx: virtqueue index
> + *				Returns u32: irq number of a virtqueue,
> + *				-EINVAL if no irq assigned.

I think we can not get -EINVAL since the function will return a u32.

Thanks

>   * @get_vq_align:		Get the virtqueue align requirement
>   *				for the device
>   *				@vdev: vdpa device
> @@ -178,6 +183,7 @@ struct vdpa_config_ops {
> 	u64 (*get_vq_state)(struct vdpa_device *vdev, u16 idx);
> 	struct vdpa_notification_area
> 	(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
> +	u32 (*get_vq_irq)(struct vdpa_device *vdev, u16 idx);
>
> 	/* Device ops */
> 	u32 (*get_vq_align)(struct vdpa_device *vdev);
On 2020/7/28 下午12:24, Zhu Lingshan wrote:
> This patch introduce a set of functions for setup/unsetup
> and update irq offloading respectively by register/unregister
> and re-register the irq_bypass_producer.
>
> With these functions, this commit can setup/unsetup
> irq offloading through setting DRIVER_OK/!DRIVER_OK, and
> update irq offloading through SET_VRING_CALL.
>
> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> Suggested-by: Jason Wang <jasowang@redhat.com>
> ---
> [...]
> +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)
> +{
> +	struct vhost_virtqueue *vq = &v->vqs[qid];
> +	const struct vdpa_config_ops *ops = v->vdpa->config;
> +	struct vdpa_device *vdpa = v->vdpa;
> +	int ret, irq;
> +
> +	spin_lock(&vq->call_ctx.ctx_lock);
> +	irq = ops->get_vq_irq(vdpa, qid);
> +	if (!vq->call_ctx.ctx || irq == -EINVAL) {

It's better to check the returned irq as "irq < 0" to be more robust. Forcing a specific errno value is not good.

> [...]
> +	/* vq irq is not expected to be changed once DRIVER_OK is set */

So this basically limits the usage of get_vq_irq() to the context of vhost_vdpa_set_status() and other vDPA bus drivers' set_status(). If this is true, there's no need to introduce any new config ops; just let set_status() return the irqs used by the device.

Or, if we want this to be more generic, we need vDPA's own irq manager (which should be similar to the irq bypass manager). That is:

- the bus driver can register itself as a consumer
- the vDPA device driver can register itself as a producer
- matching via queue index
- deal with registering/unregistering of consumers/producers

So there's no need to care when or where the vDPA device driver allocates the irq, and we don't need to care in which context the vDPA bus driver can use the irq. Either side may get notified when the other side is gone (though we probably only care about the producer going away).

Any thought on this?

Thanks
On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote:
>
> +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)
> +{
> + struct vhost_virtqueue *vq = &v->vqs[qid];
> + const struct vdpa_config_ops *ops = v->vdpa->config;
> + struct vdpa_device *vdpa = v->vdpa;
> + int ret, irq;
> +
> + spin_lock(&vq->call_ctx.ctx_lock);
> + irq = ops->get_vq_irq(vdpa, qid);
> + if (!vq->call_ctx.ctx || irq == -EINVAL) {
> + spin_unlock(&vq->call_ctx.ctx_lock);
> + return;
> + }
> +
If I understand correctly, this will cause these IRQs to be forwarded
directly to the VCPU, e.g. will be handled by the guest/qemu.
Does this mean that the host will not handle this interrupt? How does it
work in the case of level-triggered interrupts?
In the case of ConnectX, I need to execute some code to acknowledge the
interrupt.
Can you explain how this should be done?
On 2020/7/28 下午5:18, Zhu, Lingshan wrote:
>>> @@ -167,6 +220,15 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
>>> 	if (status != 0 && (ops->get_status(vdpa) & ~status) != 0)
>>> 		return -EINVAL;
>>> +	/* vq irq is not expected to be changed once DRIVER_OK is set */
>>
>> So this basically limits the usage of get_vq_irq() to the context of
>> vhost_vdpa_set_status() and other vDPA bus drivers' set_status(). If
>> this is true, there's even no need to introduce any new config ops;
>> just let set_status() return the irqs used by the device. Or, if we
>> want this to be more generic, we need vDPA's own irq manager (which
>> should be similar to the irq bypass manager). That is:
>
> I think there is no need for a driver to free / re-request its irqs after DRIVER_OK, though it can do so. If a driver changes the irq of a vq after DRIVER_OK, the vq is still operational but will lose irq offloading, which is reasonable.
>
> If we want set_status() to return irqs, we need to record the irqs somewhere in vdpa_device,

Why? We can simply pass an array to the driver, I think:

void (*set_status)(struct vdpa_device *vdev, u8 status, int *irqs);

> as we discussed in a previous thread, this may need initialization and cleanup work, so a new op with proper comments (don't say the irq could not change, but highlight that if the irq changes, irq offloading will not work until the next DRIVER_OK) could be more elegant.
>
> However, if we really need to change the irq after DRIVER_OK, I think maybe we still need the vDPA vq irq allocate / free helpers; then the helpers can not be used in probe() as we discussed before, which is a step back to the V3 series.

Still, it's not about whether the driver may change the irq after DRIVER_OK, but about the implication of the assumption. If one bus ops must be called in another ops, it's better to just implement them in one ops.

>> - the bus driver can register itself as a consumer
>> - the vDPA device driver can register itself as a producer
>> - matching via queue index
>
> IMHO, is it too heavy for this feature?

Do you mean LOCs? We can:

1) refactor the irq bypass manager
2) invent our own (a much simplified version compared to the bypass manager)
3) enforce this via the vDPA bus

Each of the above should not be a lot of coding. I think method 3 is partially done in your previous series, but in an implicit manner:

- a bus driver that has alloc_irq/free_irq implemented could be implicitly treated as registering a consumer
- every vDPA device driver could be treated as a producer
- vdpa_devm_alloc_irq() could be treated as registering a producer
- alloc_irq/free_irq is the add_producer/del_producer

We probably just lack some synchronization with driver probe/remove.

> and how can they match if two individual adapters both have vq idx = 1?

The matching is per vDPA device.

Thanks

> Thanks!
>> - deal with registering/unregistering of consumers/producers
>>
>> So there's no need to care when or where the vDPA device driver
>> allocates the irq, and we don't need to care in which context the
>> vDPA bus driver can use the irq. Either side may get notified when
>> the other side is gone (though we probably only care about the
>> producer going away).
>>
>> Any thought on this?
>>
>> Thanks
On 2020/7/28 下午5:04, Eli Cohen wrote:
> On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote:
>> +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)
>> +{
>> +	struct vhost_virtqueue *vq = &v->vqs[qid];
>> +	const struct vdpa_config_ops *ops = v->vdpa->config;
>> +	struct vdpa_device *vdpa = v->vdpa;
>> +	int ret, irq;
>> +
>> +	spin_lock(&vq->call_ctx.ctx_lock);
>> +	irq = ops->get_vq_irq(vdpa, qid);
>> +	if (!vq->call_ctx.ctx || irq == -EINVAL) {
>> +		spin_unlock(&vq->call_ctx.ctx_lock);
>> +		return;
>> +	}
>> +
> If I understand correctly, this will cause these IRQs to be forwarded
> directly to the VCPU, e.g. will be handled by the guest/qemu.

Yes, if it can be bypassed, the interrupt will be delivered to the vCPU directly.

> Does this mean that the host will not handle this interrupt? How does it
> work in the case of level-triggered interrupts?

There's no guarantee that the KVM arch code can make irq bypass work for any type of irq. If it cannot, the irq will still need to be handled by the host first. This means we should keep the host interrupt handler as a slowpath (fallback).

> In the case of ConnectX, I need to execute some code to acknowledge the
> interrupt.

This turns out to be hard for irq bypassing to work. Is it because the irq is shared, or what kind of ack do you need to do?

Thanks

> Can you explain how this should be done?
On Wed, Jul 29, 2020 at 05:21:53PM +0800, Jason Wang wrote:
> On 2020/7/28 下午5:04, Eli Cohen wrote:
> > On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote:
> > > +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)
> > > [...]
> > If I understand correctly, this will cause these IRQs to be forwarded
> > directly to the VCPU, e.g. will be handled by the guest/qemu.
>
> Yes, if it can be bypassed, the interrupt will be delivered to the vCPU
> directly.

So, usually the network driver knows how to handle interrupts for its devices. I assume the virtio_net driver at the guest has some default processing, but what if the underlying hardware device (such as the case of vdpa) needs to take some actions? Is there an option to bounce the interrupt back to the vendor-specific driver in the host so it can take these actions?

> > Does this mean that the host will not handle this interrupt? How does it
> > work in the case of level-triggered interrupts?
>
> There's no guarantee that the KVM arch code can make irq bypass work
> for any type of irq. If it cannot, the irq will still need to be handled
> by the host first. This means we should keep the host interrupt handler
> as a slowpath (fallback).
>
> > In the case of ConnectX, I need to execute some code to acknowledge the
> > interrupt.
>
> This turns out to be hard for irq bypassing to work. Is it because the
> irq is shared, or what kind of ack do you need to do?

I have an EQ, which is a queue for events coming from the hardware. This EQ can be created so it reports only completion events, but I still need to execute code that roughly tells the device that I saw these event records, and then arm it again so it can report more interrupts (e.g. if more packets are received or sent). This is device-specific code.

>
> Thanks
>
> > Can you explain how this should be done?
On 2020/7/29 5:55 PM, Eli Cohen wrote:
> On Wed, Jul 29, 2020 at 05:21:53PM +0800, Jason Wang wrote:
>> On 2020/7/28 5:04 PM, Eli Cohen wrote:
>>> [... vhost_vdpa_setup_vq_irq() snippet trimmed ...]
>>> If I understand correctly, this will cause these IRQs to be forwarded
>>> directly to the VCPU, e.g. will be handled by the guest/qemu.
>>
>> Yes, if it can be bypassed, the interrupt will be delivered to the vCPU
>> directly.
>>
> So, usually the network driver knows how to handle interrupts for its
> devices. I assume the virtio_net driver in the guest has some default
> processing, but what if the underlying hardware device (such as the case
> of vdpa) needs to take some actions?

Virtio splits the bus operations out of the device operations, and so
does the driver.

The virtio-net driver depends on a transport driver to talk to the real
device. Usually PCI is used as the transport, and in that case the
virtio-pci driver is in charge of irq allocation/free/configuration; it
needs to co-operate with the platform specific irqchip (virtualized by
KVM) to finish work like irq acknowledgement. E.g. on x86, irq
offloading can only work when there is hardware support for a virtual
irqchip (APICv); then everything can be done without vmexits.

So there is no vendor specific part, since the device and transport are
all standard.

> Is there an option to bounce the interrupt back to the vendor specific
> driver in the host so it can take these actions?

Currently not, but even if we could do this, I'm afraid we would lose
the performance advantage of irq bypassing.

>>> Does this mean that the host will not handle this interrupt? How does it
>>> work in case of level triggered interrupts?
>>
>> There's no guarantee that the KVM arch code can make the irq bypass
>> work for every type of irq, so in those cases the irq will still need
>> to be handled by the host first. This means we should keep the host
>> interrupt handler as a slowpath (fallback).
>>
>>> In the case of ConnectX, I need to execute some code to acknowledge the
>>> interrupt.
>>
>> That makes it hard for irq bypassing to work. Is it because the irq is
>> shared, or what kind of ack do you need to do?
> I have an EQ, which is a queue for events coming from the hardware. The
> EQ can be created so that it reports only completion events, but I still
> need to execute code that roughly tells the device that I saw these
> event records and then arm it again so it can report more interrupts
> (e.g. if more packets are received or sent). This is device specific
> code.

Any chance that the hardware can use MSI (which is not the case here)?

Thanks

>>> Can you explain how this should be done?
On Wed, Jul 29, 2020 at 06:19:52PM +0800, Jason Wang wrote:
I am checking internally if we can work in a mode not requiring to
acknowledge the interrupt. I will update.
Thanks for the explanations.
> [... full quote of the previous message trimmed ...]
OK, we have a mode of operation that does not require driver
intervention to manipulate the event queues so I think we're ok with
this design.
On Wed, Jul 29, 2020 at 06:19:52PM +0800, Jason Wang wrote:
> [... full quote of the previous message trimmed ...]
On 2020/7/29 10:15 PM, Eli Cohen wrote:
> OK, we have a mode of operation that does not require driver
> intervention to manipulate the event queues so I think we're ok with
> this design.
Good to know this.
Thanks
On 2020/7/28 12:23 PM, Zhu Lingshan wrote:
> This series intends to implement IRQ offloading for
> vhost_vdpa.
>
> With the aid of irq forwarding facilities such as posted
> interrupts on x86, irq bypass can deliver interrupts
> to the vCPU directly.
>
> vDPA devices have dedicated hardware backends, like VFIO
> passed-through devices, so it is possible to set up
> irq offloading (irq bypass) for vDPA devices and gain
> performance improvements.
>
> In my testing, with this feature, we save 0.1ms on
> average in a ping between two VFs.
>
> Changes from V3:
> (1) removed the vDPA irq allocate/free helpers in the vDPA core.
> (2) added a new function get_vq_irq() in struct vdpa_config_ops;
> an upper layer driver can use this function to: A. query the
> irq number of a vq. B. detect whether a vq is enabled.
> (3) implemented get_vq_irq() in the ifcvf driver.
> (4) in vhost_vdpa, set_status() will set up irq offloading when
> setting DRIVER_OK, and tear it down when receiving !DRIVER_OK.
> (5) minor improvements.
OK, I think you can go ahead and post a V5. It's not bad to start with
get_vq_irq(), and we can make any changes afterwards if it turns out not
to work for some cases.
Thanks