* [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
@ 2015-10-27  1:20 Yunhong Jiang
  2015-10-27  3:37 ` Alex Williamson
  0 siblings, 1 reply; 17+ messages in thread
From: Yunhong Jiang @ 2015-10-27  1:20 UTC (permalink / raw)
  To: alex.williamson; +Cc: kvm, linux-kernel, Yunhong Jiang

Add an option to force the VFIO PCI MSI/MSI-X handler to run as a
non-threaded IRQ, even when CONFIG_IRQ_FORCED_THREADING=y. This is useful
when assigning a device to a guest with low-latency requirements, since
it avoids the context switches to and from the IRQ thread.

An experiment was conducted on a HSW platform for 1 minute, with the
guest vCPU bound to an isolated pCPU. The assigned device triggered an
interrupt every 1ms. The average EXTERNAL_INTERRUPT exit handling time
dropped from 5.3us to 2.2us.

Another choice is to change the VFIO_DEVICE_SET_IRQS ioctl to apply this
option only to specific devices when the in-kernel irq_chip is enabled.
That provides more flexibility but is more complex; I am not sure whether
we need to go that way.

Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
---
 drivers/vfio/pci/vfio_pci_intrs.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 1f577b4..ca1f95a 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -22,9 +22,13 @@
 #include <linux/vfio.h>
 #include <linux/wait.h>
 #include <linux/slab.h>
+#include <linux/module.h>
 
 #include "vfio_pci_private.h"
 
+static bool nonthread_msi = 1;
+module_param(nonthread_msi, bool, 0444);
+
 /*
  * INTx
  */
@@ -313,6 +317,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 	char *name = msix ? "vfio-msix" : "vfio-msi";
 	struct eventfd_ctx *trigger;
 	int ret;
+	unsigned long irqflags = 0;
 
 	if (vector >= vdev->num_ctx)
 		return -EINVAL;
@@ -352,7 +357,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 		pci_write_msi_msg(irq, &msg);
 	}
 
-	ret = request_irq(irq, vfio_msihandler, 0,
+	if (nonthread_msi)
+		irqflags = IRQF_NO_THREAD;
+
+	ret = request_irq(irq, vfio_msihandler, irqflags,
 			  vdev->ctx[vector].name, trigger);
 	if (ret) {
 		kfree(vdev->ctx[vector].name);
-- 
1.8.3.1



* Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
  2015-10-27  1:20 [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ Yunhong Jiang
@ 2015-10-27  3:37 ` Alex Williamson
  2015-10-27  6:35   ` Yunhong Jiang
  0 siblings, 1 reply; 17+ messages in thread
From: Alex Williamson @ 2015-10-27  3:37 UTC (permalink / raw)
  To: Yunhong Jiang; +Cc: kvm, linux-kernel

On Mon, 2015-10-26 at 18:20 -0700, Yunhong Jiang wrote:
> An option to force VFIO PCI MSI/MSI-X handler as non-threaded IRQ,
> even when CONFIG_IRQ_FORCED_THREADING=y. This is uselful when
> assigning a device to a guest with low latency requirement since it
> reduce the context switch to/from the IRQ thread.

Is there any way we can do this automatically?  Perhaps detecting that
we're on a RT kernel or maybe that the user is running with RT priority?
I find that module options are mostly misunderstood and misused.

> An experiment was conducted on a HSW platform for 1 minutes, with the
> guest vCPU bound to isolated pCPU. The assigned device triggered the
> interrupt every 1ms. The average EXTERNAL_INTERRUPT exit handling time
> is dropped from 5.3us to 2.2us.
> 
> Another choice is to change VFIO_DEVICE_SET_IRQS ioctl, to apply this
> option only to specific devices when in kernel irq_chip is enabled. It
> provides more flexibility but is more complex, not sure if we need go
> through that way.

Allowing the user to decide whether or not to use a threaded IRQ seems
like a privilege violation; a chance for the user to game the system and
give themselves better latency, maybe at the cost of others.  I think
we're better off trying to infer the privilege from the task priority or
kernel config or, if we run out of options, make a module option as you
have here requiring the system admin to provide the privilege.  Thanks,

Alex


> Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_intrs.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 1f577b4..ca1f95a 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -22,9 +22,13 @@
>  #include <linux/vfio.h>
>  #include <linux/wait.h>
>  #include <linux/slab.h>
> +#include <linux/module.h>
>  
>  #include "vfio_pci_private.h"
>  
> +static bool nonthread_msi = 1;
> +module_param(nonthread_msi, bool, 0444);
> +
>  /*
>   * INTx
>   */
> @@ -313,6 +317,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  	char *name = msix ? "vfio-msix" : "vfio-msi";
>  	struct eventfd_ctx *trigger;
>  	int ret;
> +	unsigned long irqflags = 0;
>  
>  	if (vector >= vdev->num_ctx)
>  		return -EINVAL;
> @@ -352,7 +357,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  		pci_write_msi_msg(irq, &msg);
>  	}
>  
> -	ret = request_irq(irq, vfio_msihandler, 0,
> +	if (nonthread_msi)
> +		irqflags = IRQF_NO_THREAD;
> +
> +	ret = request_irq(irq, vfio_msihandler, irqflags,
>  			  vdev->ctx[vector].name, trigger);
>  	if (ret) {
>  		kfree(vdev->ctx[vector].name);





* Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
  2015-10-27  3:37 ` Alex Williamson
@ 2015-10-27  6:35   ` Yunhong Jiang
  2015-10-27  9:29     ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Yunhong Jiang @ 2015-10-27  6:35 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, linux-kernel

On Mon, Oct 26, 2015 at 09:37:14PM -0600, Alex Williamson wrote:
> On Mon, 2015-10-26 at 18:20 -0700, Yunhong Jiang wrote:
> > An option to force VFIO PCI MSI/MSI-X handler as non-threaded IRQ,
> > even when CONFIG_IRQ_FORCED_THREADING=y. This is uselful when
> > assigning a device to a guest with low latency requirement since it
> > reduce the context switch to/from the IRQ thread.
> 
> Is there any way we can do this automatically?  Perhaps detecting that
> we're on a RT kernel or maybe that the user is running with RT priority?
> I find that module options are mostly misunderstood and misused.

Alex, thanks for the review.

It's not easy to detect whether the user is running with RT priority,
since sometimes the user starts the thread and sets the scheduler
priority only later.

Also, should we do this only for the in-kernel irqchip scenario and not
for the user-space handler, since the in-kernel irqchip has lower overhead?

> 
> > An experiment was conducted on a HSW platform for 1 minutes, with the
> > guest vCPU bound to isolated pCPU. The assigned device triggered the
> > interrupt every 1ms. The average EXTERNAL_INTERRUPT exit handling time
> > is dropped from 5.3us to 2.2us.
> > 
> > Another choice is to change VFIO_DEVICE_SET_IRQS ioctl, to apply this
> > option only to specific devices when in kernel irq_chip is enabled. It
> > provides more flexibility but is more complex, not sure if we need go
> > through that way.
> 
> Allowing the user to decide whether or not to use a threaded IRQ seems
> like a privilege violation; a chance for the user to game the system and
> give themselves better latency, maybe at the cost of others.  I think

Yes, you are right. One benefit of the ioctl change is a per-device
option, which is more flexible.

> we're better off trying to infer the privilege from the task priority or

I'd think the system admin may make the decision after some tuning. As
you said, it is "maybe at the cost of others", and I am not sure we should
make the decision based on task priority or kernel config.

Thanks
--jyh 

> kernel config or, if we run out of options, make a module option as you
> have here requiring the system admin to provide the privilege.  Thanks,
> 
> Alex
> 
> 
> > Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_intrs.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> > index 1f577b4..ca1f95a 100644
> > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > @@ -22,9 +22,13 @@
> >  #include <linux/vfio.h>
> >  #include <linux/wait.h>
> >  #include <linux/slab.h>
> > +#include <linux/module.h>
> >  
> >  #include "vfio_pci_private.h"
> >  
> > +static bool nonthread_msi = 1;
> > +module_param(nonthread_msi, bool, 0444);
> > +
> >  /*
> >   * INTx
> >   */
> > @@ -313,6 +317,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> >  	char *name = msix ? "vfio-msix" : "vfio-msi";
> >  	struct eventfd_ctx *trigger;
> >  	int ret;
> > +	unsigned long irqflags = 0;
> >  
> >  	if (vector >= vdev->num_ctx)
> >  		return -EINVAL;
> > @@ -352,7 +357,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> >  		pci_write_msi_msg(irq, &msg);
> >  	}
> >  
> > -	ret = request_irq(irq, vfio_msihandler, 0,
> > +	if (nonthread_msi)
> > +		irqflags = IRQF_NO_THREAD;
> > +
> > +	ret = request_irq(irq, vfio_msihandler, irqflags,
> >  			  vdev->ctx[vector].name, trigger);
> >  	if (ret) {
> >  		kfree(vdev->ctx[vector].name);
> 
> 
> 


* Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
  2015-10-27  6:35   ` Yunhong Jiang
@ 2015-10-27  9:29     ` Paolo Bonzini
  2015-10-27 21:26       ` Yunhong Jiang
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2015-10-27  9:29 UTC (permalink / raw)
  To: Yunhong Jiang, Alex Williamson; +Cc: kvm, linux-kernel



On 27/10/2015 07:35, Yunhong Jiang wrote:
> On Mon, Oct 26, 2015 at 09:37:14PM -0600, Alex Williamson wrote:
>> On Mon, 2015-10-26 at 18:20 -0700, Yunhong Jiang wrote:
>>> An option to force VFIO PCI MSI/MSI-X handler as non-threaded IRQ,
>>> even when CONFIG_IRQ_FORCED_THREADING=y. This is uselful when
>>> assigning a device to a guest with low latency requirement since it
>>> reduce the context switch to/from the IRQ thread.
>>
>> Is there any way we can do this automatically?  Perhaps detecting that
>> we're on a RT kernel or maybe that the user is running with RT priority?
>> I find that module options are mostly misunderstood and misused.
> 
> Alex, thanks for review.
> 
> It's not easy to detect if the user is running with RT priority, since 
> sometimes the user start the thread and then set the scheduler priority 
> late.
> 
> Also should we do this only for in kernel irqchip scenario and not for user 
> space handler, since in kernel irqchip has lower overhead?

The overhead of the non-threaded IRQ handler is the same for kernel or
userspace irqchip, since the handler just writes 1 to the eventfd.

On RT kernels however can you call eventfd_signal from interrupt
context?  You cannot call spin_lock_irqsave (which can sleep) from a
non-threaded interrupt handler, can you?  You would need a raw spin lock.

Paolo


* Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
  2015-10-27  9:29     ` Paolo Bonzini
@ 2015-10-27 21:26       ` Yunhong Jiang
  2015-10-28  0:44         ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Yunhong Jiang @ 2015-10-27 21:26 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Alex Williamson, kvm, linux-kernel

On Tue, Oct 27, 2015 at 10:29:28AM +0100, Paolo Bonzini wrote:
> 
> 
> On 27/10/2015 07:35, Yunhong Jiang wrote:
> > On Mon, Oct 26, 2015 at 09:37:14PM -0600, Alex Williamson wrote:
> >> On Mon, 2015-10-26 at 18:20 -0700, Yunhong Jiang wrote:
> >>> An option to force VFIO PCI MSI/MSI-X handler as non-threaded IRQ,
> >>> even when CONFIG_IRQ_FORCED_THREADING=y. This is uselful when
> >>> assigning a device to a guest with low latency requirement since it
> >>> reduce the context switch to/from the IRQ thread.
> >>
> >> Is there any way we can do this automatically?  Perhaps detecting that
> >> we're on a RT kernel or maybe that the user is running with RT priority?
> >> I find that module options are mostly misunderstood and misused.
> > 
> > Alex, thanks for review.
> > 
> > It's not easy to detect if the user is running with RT priority, since 
> > sometimes the user start the thread and then set the scheduler priority 
> > late.
> > 
> > Also should we do this only for in kernel irqchip scenario and not for user 
> > space handler, since in kernel irqchip has lower overhead?
> 
> The overhead of the non-threaded IRQ handler is the same for kernel or
> userspace irqchip, since the handler just writes 1 to the eventfd.

IIUC, the handler not only writes 1 to the eventfd, it also invokes the
wait queue function, and the in-kernel irqchip has a different callback
from the user-space irqchip, am I right? But I should not have stated
that the in-kernel irqchip has lower overhead, since I have no data for it.

> 
> On RT kernels however can you call eventfd_signal from interrupt
> context?  You cannot call spin_lock_irqsave (which can sleep) from a
> non-threaded interrupt handler, can you?  You would need a raw spin lock.

Thanks for pointing this out. Yes, we can't call spin_lock_irqsave on an
RT kernel. I will address this in the next patch, but I am not sure
whether it's overkill to use a raw_spinlock there, since eventfd_signal
is used by other callers as well.

Thanks
--jyh


> 
> Paolo


* Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
  2015-10-27 21:26       ` Yunhong Jiang
@ 2015-10-28  0:44         ` Paolo Bonzini
  2015-10-28 16:00           ` Alex Williamson
  2015-10-28 17:50           ` Yunhong Jiang
  0 siblings, 2 replies; 17+ messages in thread
From: Paolo Bonzini @ 2015-10-28  0:44 UTC (permalink / raw)
  To: Yunhong Jiang; +Cc: Alex Williamson, kvm, linux-kernel



On 27/10/2015 22:26, Yunhong Jiang wrote:
>> > On RT kernels however can you call eventfd_signal from interrupt
>> > context?  You cannot call spin_lock_irqsave (which can sleep) from a
>> > non-threaded interrupt handler, can you?  You would need a raw spin lock.
> Thanks for pointing this out. Yes, we can't call spin_lock_irqsave on RT 
> kernel. Will do this way on next patch. But not sure if it's overkill to use 
> raw_spinlock there since the eventfd_signal is used by other caller also.

No, I don't think you can use raw_spinlock there.  The problem is not
just eventfd_signal, it is especially wake_up_locked_poll.  You cannot
convert the whole workqueue infrastructure to use raw_spinlock.

Alex, would it make sense to use the IRQ bypass infrastructure always,
not just for VT-d, to do the MSI injection directly from the VFIO
interrupt handler and bypass the eventfd?  Basically this would add an
RCU-protected list of consumers matching the token to struct
irq_bypass_producer, and a

	int (*inject)(struct irq_bypass_consumer *);

callback to struct irq_bypass_consumer.  If any callback returns true,
the eventfd is not signaled.  The KVM implementation would be like this
(compare with virt/kvm/eventfd.c):

	/* Extracted out of irqfd_wakeup */
	static int
	irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd)
	{
		...
	}

	/* Extracted out of irqfd_wakeup */
	static int
	irqfd_wakeup_pollhup(struct kvm_kernel_irqfd *irqfd)
	{
		...
	}

	static int
	irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync,
		     void *key)
	{
		struct kvm_kernel_irqfd *irqfd = container_of(wait,
			struct kvm_kernel_irqfd, wait);
		unsigned long flags = (unsigned long)key;

		if (flags & POLLIN)
			irqfd_wakeup_pollin(irqfd);
		if (flags & POLLHUP)
			irqfd_wakeup_pollhup(irqfd);

		return 0;
	}

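	static int kvm_arch_irq_bypass_inject(
		struct irq_bypass_consumer *cons)
	{
		struct kvm_kernel_irqfd *irqfd =
			container_of(cons, struct kvm_kernel_irqfd,
				     consumer);

		/*
		 * Assumes irqfd_wakeup_pollin() returns nonzero when the
		 * interrupt was injected, so the eventfd can be skipped.
		 */
		return irqfd_wakeup_pollin(irqfd);
	}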

Or do you think it would be a hack?  The latency improvement might
actually be even better than what Yunhong is already reporting.

Paolo
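
[Illustration, not part of the original message.] The dispatch proposed
above can be modelled in plain user-space C. This is only a sketch under
stated assumptions: struct consumer, struct producer, inject_ok(),
inject_fail() and producer_handle_irq() are hypothetical names, a simple
linked list stands in for the RCU-protected consumer list, and a counter
stands in for signalling the eventfd. It shows only the rule "if any
inject callback returns true, the eventfd is not signaled".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct consumer {
	/* Returns nonzero if the interrupt was injected directly. */
	int (*inject)(struct consumer *cons);
	int injected;			/* number of direct injections */
	struct consumer *next;
};

struct producer {
	struct consumer *consumers;	/* stands in for the RCU list */
	int eventfd_signals;		/* number of fallback signals */
};

/* A consumer that can always inject directly. */
static int inject_ok(struct consumer *cons)
{
	cons->injected++;
	return 1;
}

/* A consumer that cannot inject, e.g. no atomic injection path. */
static int inject_fail(struct consumer *cons)
{
	(void)cons;
	return 0;
}

/*
 * Model of the interrupt handler: offer the interrupt to every
 * consumer; fall back to the eventfd only if none of them took it.
 */
static bool producer_handle_irq(struct producer *prod)
{
	bool handled = false;
	struct consumer *c;

	assert(prod != NULL);
	for (c = prod->consumers; c; c = c->next)
		if (c->inject && c->inject(c))
			handled = true;

	if (!handled)
		prod->eventfd_signals++;	/* fallback path */
	return handled;
}
```

In this model a producer whose only consumer uses inject_fail(), or a
producer with no consumers at all, always ends up on the fallback path,
which corresponds to today's eventfd behaviour.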


* Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
  2015-10-28  0:44         ` Paolo Bonzini
@ 2015-10-28 16:00           ` Alex Williamson
  2015-10-28 17:05             ` Paolo Bonzini
  2015-10-28 17:50           ` Yunhong Jiang
  1 sibling, 1 reply; 17+ messages in thread
From: Alex Williamson @ 2015-10-28 16:00 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Yunhong Jiang, kvm, linux-kernel

On Wed, 2015-10-28 at 01:44 +0100, Paolo Bonzini wrote:
> 
> On 27/10/2015 22:26, Yunhong Jiang wrote:
> >> > On RT kernels however can you call eventfd_signal from interrupt
> >> > context?  You cannot call spin_lock_irqsave (which can sleep) from a
> >> > non-threaded interrupt handler, can you?  You would need a raw spin lock.
> > Thanks for pointing this out. Yes, we can't call spin_lock_irqsave on RT 
> > kernel. Will do this way on next patch. But not sure if it's overkill to use 
> > raw_spinlock there since the eventfd_signal is used by other caller also.
> 
> No, I don't think you can use raw_spinlock there.  The problem is not
> just eventfd_signal, it is especially wake_up_locked_poll.  You cannot
> convert the whole workqueue infrastructure to use raw_spinlock.
> 
> Alex, would it make sense to use the IRQ bypass infrastructure always,
> not just for VT-d, to do the MSI injection directly from the VFIO
> interrupt handler and bypass the eventfd?  Basically this would add an
> RCU-protected list of consumers matching the token to struct
> irq_bypass_producer, and a
> 
> 	int (*inject)(struct irq_bypass_consumer *);
> 
> callback to struct irq_bypass_consumer.  If any callback returns true,
> the eventfd is not signaled.  The KVM implementation would be like this
> (compare with virt/kvm/eventfd.c):
> 
> 	/* Extracted out of irqfd_wakeup */
> 	static int
> 	irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd)
> 	{
> 		...
> 	}
> 
> 	/* Extracted out of irqfd_wakeup */
> 	static int
> 	irqfd_wakeup_pollhup(struct kvm_kernel_irqfd *irqfd)
> 	{
> 		...
> 	}
> 
> 	static int
> 	irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync,
> 		     void *key)
> 	{
> 	        struct _irqfd *irqfd = container_of(wait,
> 			struct _irqfd, wait);
> 	        unsigned long flags = (unsigned long)key;
> 
> 		if (flags & POLLIN)
> 			irqfd_wakeup_pollin(irqfd);
> 		if (flags & POLLHUP)
> 			irqfd_wakeup_pollhup(irqfd);
> 
> 		return 0;
> 	}
> 
> 	static int kvm_arch_irq_bypass_inject(
> 		struct irq_bypass_consumer *cons)
> 	{
> 		struct kvm_kernel_irqfd *irqfd =
> 			container_of(cons, struct kvm_kernel_irqfd,
> 				     consumer);	
> 
> 		irqfd_wakeup_pollin(irqfd);
> 	}
> 
> Or do you think it would be a hack?  The latency improvement might
> actually be even better than what Yunhong is already reporting.

Yeah, that might be a good idea, it's probably more plausible than
making the eventfd_signal() code friendly to call from hard interrupt
context.  On the vfio side can we use request_threaded_irq() directly
for this?  Making the hard irq handler return IRQ_HANDLED if we can use
the irq bypass manager or IRQ_WAKE_THREAD if we need to use the eventfd.
I think we need some way to get back to irq thread context to use
eventfd_signal().  Would we ever not want to use the direct bypass
manager path if available?  Thanks,

Alex
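
[Illustration, not part of the original message.] The split suggested
above, where the hard handler finishes the job when the bypass path is
usable and otherwise defers to the IRQ thread, can be modelled in
user-space C. This is a sketch only: the MODEL_IRQ_* values merely echo
the kernel's IRQ_HANDLED and IRQ_WAKE_THREAD by name, and struct
model_ctx, model_hardirq(), model_thread_fn() and model_deliver() are
hypothetical. Counters stand in for direct injection and for
eventfd_signal().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Echoes the kernel's irqreturn_t values by name only; a model. */
enum model_irqreturn {
	MODEL_IRQ_NONE,
	MODEL_IRQ_HANDLED,
	MODEL_IRQ_WAKE_THREAD,
};

struct model_ctx {
	bool bypass_available;	/* is a direct injection path usable? */
	int direct_injections;	/* interrupts injected from hard IRQ */
	int eventfd_signals;	/* interrupts signalled from the thread */
};

/* Hard IRQ part: inject directly if possible, else wake the thread. */
static enum model_irqreturn model_hardirq(struct model_ctx *ctx)
{
	assert(ctx != NULL);
	if (ctx->bypass_available) {
		ctx->direct_injections++;
		return MODEL_IRQ_HANDLED;
	}
	return MODEL_IRQ_WAKE_THREAD;
}

/* Threaded part: the only place the eventfd stand-in is signalled. */
static enum model_irqreturn model_thread_fn(struct model_ctx *ctx)
{
	ctx->eventfd_signals++;
	return MODEL_IRQ_HANDLED;
}

/* Stand-in for the IRQ core: run the thread only when asked to. */
static void model_deliver(struct model_ctx *ctx)
{
	if (model_hardirq(ctx) == MODEL_IRQ_WAKE_THREAD)
		model_thread_fn(ctx);
}
```

The point of the split is that the eventfd path is only ever reached
from thread context, so the hard handler itself never has to take a
sleeping lock.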



* Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
  2015-10-28 16:00           ` Alex Williamson
@ 2015-10-28 17:05             ` Paolo Bonzini
  2015-10-28 23:54               ` Marcelo Tosatti
  2015-10-29  3:11               ` Alex Williamson
  0 siblings, 2 replies; 17+ messages in thread
From: Paolo Bonzini @ 2015-10-28 17:05 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Yunhong Jiang, kvm, linux-kernel, Marcelo Tosatti



On 28/10/2015 17:00, Alex Williamson wrote:
> > Alex, would it make sense to use the IRQ bypass infrastructure always,
> > not just for VT-d, to do the MSI injection directly from the VFIO
> > interrupt handler and bypass the eventfd?  Basically this would add an
> > RCU-protected list of consumers matching the token to struct
> > irq_bypass_producer, and a
> > 
> > 	int (*inject)(struct irq_bypass_consumer *);
> > 
> > callback to struct irq_bypass_consumer.  If any callback returns true,
> > the eventfd is not signaled.
>
> Yeah, that might be a good idea, it's probably more plausible than
> making the eventfd_signal() code friendly to call from hard interrupt
> context.  On the vfio side can we use request_threaded_irq() directly
> for this?

I don't know if that gives you a non-threaded IRQ with the real-time
kernel...  CCing Marcelo to get some insight.

> Making the hard irq handler return IRQ_HANDLED if we can use
> the irq bypass manager or IRQ_WAKE_THREAD if we need to use the eventfd.
> I think we need some way to get back to irq thread context to use
> eventfd_signal().

The irqfd is already able to schedule a work item, because it runs with
interrupts disabled, so I think we can always return IRQ_HANDLED.

There's another little complication.  Right now, only x86 has
kvm_set_msi_inatomic.  We should merge kvm_set_msi_inatomic,
kvm_set_irq_inatomic and kvm_arch_set_irq.

Some cleanups are needed there; the flow between the functions is really
badly structured because the API grew somewhat by accretion.  I'll get
to it next week or on the way back to Italy.

> Would we ever not want to use the direct bypass
> manager path if available?  Thanks,

I don't think so.  KVM always registers itself as a consumer, even if
there are no VT-d posted interrupts.  add_producer simply returns -EINVAL
then.

Paolo


* Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
  2015-10-28  0:44         ` Paolo Bonzini
  2015-10-28 16:00           ` Alex Williamson
@ 2015-10-28 17:50           ` Yunhong Jiang
  2015-10-28 18:18             ` Alex Williamson
  2015-10-28 18:28             ` Paolo Bonzini
  1 sibling, 2 replies; 17+ messages in thread
From: Yunhong Jiang @ 2015-10-28 17:50 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Alex Williamson, kvm, linux-kernel, Steven Rostedt

On Wed, Oct 28, 2015 at 01:44:55AM +0100, Paolo Bonzini wrote:
> 
> 
> On 27/10/2015 22:26, Yunhong Jiang wrote:
> >> > On RT kernels however can you call eventfd_signal from interrupt
> >> > context?  You cannot call spin_lock_irqsave (which can sleep) from a
> >> > non-threaded interrupt handler, can you?  You would need a raw spin lock.
> > Thanks for pointing this out. Yes, we can't call spin_lock_irqsave on RT 
> > kernel. Will do this way on next patch. But not sure if it's overkill to use 
> > raw_spinlock there since the eventfd_signal is used by other caller also.
> 
> No, I don't think you can use raw_spinlock there.  The problem is not
> just eventfd_signal, it is especially wake_up_locked_poll.  You cannot
> convert the whole workqueue infrastructure to use raw_spinlock.

You mean the waitqueue, instead of workqueue, right? One choice is to
change the eventfd to use a simple wait queue, which uses a raw_spinlock.
But using a simple waitqueue for eventfd may in fact hurt real-time
latency outside this scenario.

> 
> Alex, would it make sense to use the IRQ bypass infrastructure always,
> not just for VT-d, to do the MSI injection directly from the VFIO
> interrupt handler and bypass the eventfd?  Basically this would add an
> RCU-protected list of consumers matching the token to struct
> irq_bypass_producer, and a
> 
> 	int (*inject)(struct irq_bypass_consumer *);
> 
> callback to struct irq_bypass_consumer.  If any callback returns true,
> the eventfd is not signaled.  The KVM implementation would be like this
> (compare with virt/kvm/eventfd.c):
> 
> 	/* Extracted out of irqfd_wakeup */
> 	static int
> 	irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd)
> 	{
> 		...
> 	}
> 
> 	/* Extracted out of irqfd_wakeup */
> 	static int
> 	irqfd_wakeup_pollhup(struct kvm_kernel_irqfd *irqfd)
> 	{
> 		...
> 	}
> 
> 	static int
> 	irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync,
> 		     void *key)
> 	{
> 	        struct _irqfd *irqfd = container_of(wait,
> 			struct _irqfd, wait);
> 	        unsigned long flags = (unsigned long)key;
> 
> 		if (flags & POLLIN)
> 			irqfd_wakeup_pollin(irqfd);
> 		if (flags & POLLHUP)
> 			irqfd_wakeup_pollhup(irqfd);
> 
> 		return 0;
> 	}
> 
> 	static int kvm_arch_irq_bypass_inject(
> 		struct irq_bypass_consumer *cons)
> 	{
> 		struct kvm_kernel_irqfd *irqfd =
> 			container_of(cons, struct kvm_kernel_irqfd,
> 				     consumer);	
> 
> 		irqfd_wakeup_pollin(irqfd);
> 	}
> 
This is a good idea IMHO. So for MSI interrupts,
kvm_arch_irq_bypass_inject will be used and irqfd_wakeup will not be
invoked anymore, am I right?

I noticed the irq bypass manager is not merged yet; is there a git branch
for it?

> Or do you think it would be a hack?  The latency improvement might
> actually be even better than what Yunhong is already reporting.

I will be glad to try it.

Thanks
--jyh

> 
> Paolo


* Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
  2015-10-28 17:50           ` Yunhong Jiang
@ 2015-10-28 18:18             ` Alex Williamson
  2015-10-28 21:46               ` Yunhong Jiang
  2015-10-28 18:28             ` Paolo Bonzini
  1 sibling, 1 reply; 17+ messages in thread
From: Alex Williamson @ 2015-10-28 18:18 UTC (permalink / raw)
  To: Yunhong Jiang; +Cc: Paolo Bonzini, kvm, linux-kernel, Steven Rostedt

On Wed, 2015-10-28 at 10:50 -0700, Yunhong Jiang wrote:
> On Wed, Oct 28, 2015 at 01:44:55AM +0100, Paolo Bonzini wrote:
> > 
> > 
> > On 27/10/2015 22:26, Yunhong Jiang wrote:
> > >> > On RT kernels however can you call eventfd_signal from interrupt
> > >> > context?  You cannot call spin_lock_irqsave (which can sleep) from a
> > >> > non-threaded interrupt handler, can you?  You would need a raw spin lock.
> > > Thanks for pointing this out. Yes, we can't call spin_lock_irqsave on RT 
> > > kernel. Will do this way on next patch. But not sure if it's overkill to use 
> > > raw_spinlock there since the eventfd_signal is used by other caller also.
> > 
> > No, I don't think you can use raw_spinlock there.  The problem is not
> > just eventfd_signal, it is especially wake_up_locked_poll.  You cannot
> > convert the whole workqueue infrastructure to use raw_spinlock.
> 
> You mean the waitqueue, instead of workqueue, right? One choice is to change 
> the eventfd to use simple wait queue, which is raw_spinlock. But use simple 
> waitqueue on eventfd may in fact impact real time latency if not in this 
> scenario.
> 
> > 
> > Alex, would it make sense to use the IRQ bypass infrastructure always,
> > not just for VT-d, to do the MSI injection directly from the VFIO
> > interrupt handler and bypass the eventfd?  Basically this would add an
> > RCU-protected list of consumers matching the token to struct
> > irq_bypass_producer, and a
> > 
> > 	int (*inject)(struct irq_bypass_consumer *);
> > 
> > callback to struct irq_bypass_consumer.  If any callback returns true,
> > the eventfd is not signaled.  The KVM implementation would be like this
> > (compare with virt/kvm/eventfd.c):
> > 
> > 	/* Extracted out of irqfd_wakeup */
> > 	static int
> > 	irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd)
> > 	{
> > 		...
> > 	}
> > 
> > 	/* Extracted out of irqfd_wakeup */
> > 	static int
> > 	irqfd_wakeup_pollhup(struct kvm_kernel_irqfd *irqfd)
> > 	{
> > 		...
> > 	}
> > 
> > 	static int
> > 	irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync,
> > 		     void *key)
> > 	{
> > 	        struct _irqfd *irqfd = container_of(wait,
> > 			struct _irqfd, wait);
> > 	        unsigned long flags = (unsigned long)key;
> > 
> > 		if (flags & POLLIN)
> > 			irqfd_wakeup_pollin(irqfd);
> > 		if (flags & POLLHUP)
> > 			irqfd_wakeup_pollhup(irqfd);
> > 
> > 		return 0;
> > 	}
> > 
> > 	static int kvm_arch_irq_bypass_inject(
> > 		struct irq_bypass_consumer *cons)
> > 	{
> > 		struct kvm_kernel_irqfd *irqfd =
> > 			container_of(cons, struct kvm_kernel_irqfd,
> > 				     consumer);	
> > 
> > 		irqfd_wakeup_pollin(irqfd);
> > 	}
> > 
> This is a good idea IMHO. So for MSI interrupt, the 
> kvm_arch_irq_bypass_inject will be used, and the irqfd_wakeup will not be 
> invoked anymore, am I right?
> 
> I noticed the irq bypass manager is not merged yet, are there any git branch 
> for it?

It's in linux-next via the kvm.git next branch:

git://git.kernel.org/pub/scm/virt/kvm/kvm.git

Thanks,
Alex



* Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
  2015-10-28 17:50           ` Yunhong Jiang
  2015-10-28 18:18             ` Alex Williamson
@ 2015-10-28 18:28             ` Paolo Bonzini
  1 sibling, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2015-10-28 18:28 UTC (permalink / raw)
  To: Yunhong Jiang; +Cc: Alex Williamson, kvm, linux-kernel, Steven Rostedt



On 28/10/2015 18:50, Yunhong Jiang wrote:
> > No, I don't think you can use raw_spinlock there.  The problem is not
> > just eventfd_signal, it is especially wake_up_locked_poll.  You cannot
> > convert the whole workqueue infrastructure to use raw_spinlock.
> 
> You mean the waitqueue, instead of workqueue, right?

Yes.

> One choice is to change 
> the eventfd to use simple wait queue, which is raw_spinlock. But use simple 
> waitqueue on eventfd may in fact impact real time latency if not in this 
> scenario.

Userspace can put an arbitrary number of tasks on the wait queue, so
it's not possible to use a simple wait queue.  It would also touch
multiple subsystems, so it's much better to bypass the eventfd completely.

Paolo


* Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
  2015-10-28 18:18             ` Alex Williamson
@ 2015-10-28 21:46               ` Yunhong Jiang
  0 siblings, 0 replies; 17+ messages in thread
From: Yunhong Jiang @ 2015-10-28 21:46 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Paolo Bonzini, kvm, linux-kernel, Steven Rostedt

On Wed, Oct 28, 2015 at 12:18:48PM -0600, Alex Williamson wrote:
> On Wed, 2015-10-28 at 10:50 -0700, Yunhong Jiang wrote:
> > On Wed, Oct 28, 2015 at 01:44:55AM +0100, Paolo Bonzini wrote:
> 
> It's in linux-next via the kvm.git next branch:
> 
> git://git.kernel.org/pub/scm/virt/kvm/kvm.git
> 
> Thanks,
> Alex

Thanks

--jyh

> 


* Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
  2015-10-28 17:05             ` Paolo Bonzini
@ 2015-10-28 23:54               ` Marcelo Tosatti
  2015-10-29  3:11               ` Alex Williamson
  1 sibling, 0 replies; 17+ messages in thread
From: Marcelo Tosatti @ 2015-10-28 23:54 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Alex Williamson, Yunhong Jiang, kvm, linux-kernel

On Wed, Oct 28, 2015 at 06:05:00PM +0100, Paolo Bonzini wrote:
> 
> 
> On 28/10/2015 17:00, Alex Williamson wrote:
> > > Alex, would it make sense to use the IRQ bypass infrastructure always,
> > > not just for VT-d, to do the MSI injection directly from the VFIO
> > > interrupt handler and bypass the eventfd?  Basically this would add an
> > > RCU-protected list of consumers matching the token to struct
> > > irq_bypass_producer, and a
> > > 
> > > 	int (*inject)(struct irq_bypass_consumer *);
> > > 
> > > callback to struct irq_bypass_consumer.  If any callback returns true,
> > > the eventfd is not signaled.
> >
> > Yeah, that might be a good idea, it's probably more plausible than
> > making the eventfd_signal() code friendly to call from hard interrupt
> > context.  On the vfio side can we use request_threaded_irq() directly
> > for this?
> 
> I don't know if that gives you a non-threaded IRQ with the real-time
> kernel...  CCing Marcelo to get some insight.

The vfio interrupt handler (threaded or not) runs at a higher priority
than the vcpu thread. So don't worry about -RT.

About bypass: the fewer instructions between the device ISR and the
injection of the interrupt into the guest, the better, as that translates
directly into lower interrupt latency, which is important, as
it determines:

1. how often you can switch from poll mode to ACPI C-states.
2. whether the realtime workload is virtualizable.

As for the properties of request_threaded_irq(): I don't know.

> > Making the hard irq handler return IRQ_HANDLED if we can use
> > the irq bypass manager or IRQ_WAKE_THREAD if we need to use the eventfd.
> > I think we need some way to get back to irq thread context to use
> > eventfd_signal().
> 
> The irqfd is already able to schedule a work item, because it runs with
> interrupts disabled, so I think we can always return IRQ_HANDLED.
> 
> There's another little complication.  Right now, only x86 has
> kvm_set_msi_inatomic.  We should merge kvm_set_msi_inatomic,
> kvm_set_irq_inatomic and kvm_arch_set_irq.
> 
> Some cleanups are needed there; the flow between the functions is really
> badly structured because the API grew somewhat by accretion.  I'll get
> to it next week or on the way back to Italy.
> 
> > Would we ever not want to use the direct bypass
> > manager path if available?  Thanks,
> 
> I don't think so.  KVM always registers itself as a consumer, even if
> there is no VT-d posted interrupts.  add_producer simply returns -EINVAL
> then.
> 
> Paolo
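
The inject-callback idea quoted above can be modelled in plain userspace C.
This is only a sketch under stated assumptions: `struct consumer`,
`struct producer`, `producer_fire()` and the `eventfd_count` counter are
hypothetical stand-ins, not the kernel's irq_bypass structures, and the list
is a plain linked list rather than an RCU-protected one. The sketch also
assumes delivery stops at the first consumer whose callback succeeds, which
the thread does not specify.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct irq_bypass_consumer. */
struct consumer {
	const void *token;                 /* matched against the producer's token */
	int (*inject)(struct consumer *);  /* proposed callback: nonzero = delivered */
	struct consumer *next;
};

/* Hypothetical stand-in for struct irq_bypass_producer. */
struct producer {
	const void *token;
	struct consumer *consumers;   /* RCU-protected in the proposal; plain list here */
	unsigned long eventfd_count;  /* stands in for calling eventfd_signal() */
};

/* Two toy callbacks: one that always delivers, one that never does. */
static int inject_ok(struct consumer *c)   { (void)c; return 1; }
static int inject_fail(struct consumer *c) { (void)c; return 0; }

/*
 * Offer the interrupt to each consumer whose token matches. If one of them
 * accepts it, the eventfd is not signaled; otherwise fall back to the eventfd.
 * Returns true when the bypass path delivered the interrupt.
 */
static bool producer_fire(struct producer *p)
{
	for (struct consumer *c = p->consumers; c; c = c->next) {
		if (c->token == p->token && c->inject && c->inject(c))
			return true;
	}
	p->eventfd_count++;
	return false;
}
```

With one consumer using `inject_ok`, `producer_fire()` returns true and the
counter stays untouched; with only `inject_fail` consumers, the counter is
incremented once per interrupt, which corresponds to today's eventfd path.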


* Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
  2015-10-28 17:05             ` Paolo Bonzini
  2015-10-28 23:54               ` Marcelo Tosatti
@ 2015-10-29  3:11               ` Alex Williamson
  2015-10-29  9:45                 ` Paolo Bonzini
  1 sibling, 1 reply; 17+ messages in thread
From: Alex Williamson @ 2015-10-29  3:11 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Yunhong Jiang, kvm, linux-kernel, Marcelo Tosatti

On Wed, 2015-10-28 at 18:05 +0100, Paolo Bonzini wrote:
> 
> On 28/10/2015 17:00, Alex Williamson wrote:
> > > Alex, would it make sense to use the IRQ bypass infrastructure always,
> > > not just for VT-d, to do the MSI injection directly from the VFIO
> > > interrupt handler and bypass the eventfd?  Basically this would add an
> > > RCU-protected list of consumers matching the token to struct
> > > irq_bypass_producer, and a
> > > 
> > > 	int (*inject)(struct irq_bypass_consumer *);
> > > 
> > > callback to struct irq_bypass_consumer.  If any callback returns true,
> > > the eventfd is not signaled.
> >
> > Yeah, that might be a good idea, it's probably more plausible than
> > making the eventfd_signal() code friendly to call from hard interrupt
> > context.  On the vfio side can we use request_threaded_irq() directly
> > for this?
> 
> I don't know if that gives you a non-threaded IRQ with the real-time
> kernel...  CCing Marcelo to get some insight.
> 
> > Making the hard irq handler return IRQ_HANDLED if we can use
> > the irq bypass manager or IRQ_WAKE_THREAD if we need to use the eventfd.
> > I think we need some way to get back to irq thread context to use
> > eventfd_signal().
> 
> The irqfd is already able to schedule a work item, because it runs with
> interrupts disabled, so I think we can always return IRQ_HANDLED.

I'm confused by this.  The problem with adding IRQF_NO_THREAD to our
current handler is that it hits the spinlock that can sleep in
eventfd_signal() and the waitqueue further down the stack before we get
to the irqfd.  So if we split to a non-threaded handler vs a threaded
handler, where the non-threaded handler either returns IRQ_HANDLED or
IRQ_WAKE_THREAD to queue the threaded handler, there's only so much that
the non-threaded handler can do before we start running into the same
problem.  I think that means that the non-threaded handler needs to
return IRQ_WAKE_THREAD if we need to use the current eventfd_signal()
path, such as if the bypass path is not available.  If we can get
through the bypass path and the KVM irqfd side is safe for the
non-threaded handler, inject succeeds and we return IRQ_HANDLED, right?
Thanks,

Alex
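
The split described above (a primary handler that either finishes the job or
defers to a threaded handler) can be sketched as a small userspace model. All
names here (`struct vector_ctx`, `msi_hardirq()`, `msi_thread()`, `deliver()`)
are hypothetical; the enum only mirrors the values of the kernel's
irqreturn_t. In a real driver the two functions would presumably be passed as
the `handler` and `thread_fn` arguments of request_threaded_irq(), and the
`deliver()` helper below merely imitates what the genirq core does with the
primary handler's return value.

```c
#include <stdbool.h>

/* Mirrors the kernel's irqreturn_t values; redefined so this builds in userspace. */
enum irqreturn { IRQ_NONE = 0, IRQ_HANDLED = 1, IRQ_WAKE_THREAD = 2 };

/* Hypothetical per-vector state. */
struct vector_ctx {
	bool bypass_available;     /* a bypass consumer is registered */
	bool inject_would_block;   /* atomic injection is not possible right now */
	unsigned int injected;         /* interrupts delivered via the bypass path */
	unsigned int eventfd_signals;  /* interrupts delivered via the eventfd path */
};

/* Stand-in for an atomic-safe injection attempt in hard irq context. */
static bool try_inject_atomic(struct vector_ctx *ctx)
{
	if (!ctx->bypass_available || ctx->inject_would_block)
		return false;
	ctx->injected++;
	return true;
}

/* Primary (non-threaded) handler: must not take sleeping locks. */
static enum irqreturn msi_hardirq(struct vector_ctx *ctx)
{
	if (try_inject_atomic(ctx))
		return IRQ_HANDLED;
	return IRQ_WAKE_THREAD;   /* defer to thread context */
}

/* Threaded handler: may sleep, so the eventfd path is safe here. */
static enum irqreturn msi_thread(struct vector_ctx *ctx)
{
	ctx->eventfd_signals++;   /* stands in for eventfd_signal() */
	return IRQ_HANDLED;
}

/* Imitates the irq core: run the thread only if the primary handler asked. */
static enum irqreturn deliver(struct vector_ctx *ctx)
{
	enum irqreturn ret = msi_hardirq(ctx);

	if (ret == IRQ_WAKE_THREAD)
		ret = msi_thread(ctx);
	return ret;
}
```

In this model the eventfd counter only moves when the bypass path is missing
or cannot complete atomically, which is the fallback behaviour asked about in
the message above.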



* Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
  2015-10-29  3:11               ` Alex Williamson
@ 2015-10-29  9:45                 ` Paolo Bonzini
  2015-10-30  6:16                   ` Yunhong Jiang
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2015-10-29  9:45 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Yunhong Jiang, kvm, linux-kernel, Marcelo Tosatti



On 29/10/2015 04:11, Alex Williamson wrote:
> > The irqfd is already able to schedule a work item, because it runs with
> > interrupts disabled, so I think we can always return IRQ_HANDLED.
>
> I'm confused by this.  The problem with adding IRQF_NO_THREAD to our
> current handler is that it hits the spinlock that can sleep in
> eventfd_signal() and the waitqueue further down the stack before we get
> to the irqfd.  So if we split to a non-threaded handler vs a threaded
> handler, where the non-threaded handler either returns IRQ_HANDLED or
> IRQ_WAKE_THREAD to queue the threaded handler, there's only so much that
> the non-threaded handler can do before we start running into the same
> problem.

You're right.  I thought schedule_work used raw spinlocks (and then
everything would be done in the inject callback), but I was wrong.

Basically where irqfd_wakeup now does schedule_work, it would need to
return IRQ_WAKE_THREAD.  The threaded handler then can just do the
eventfd_signal.

Paolo

> I think that means that the non-threaded handler needs to
> return IRQ_WAKE_THREAD if we need to use the current eventfd_signal()
> path, such as if the bypass path is not available.  If we can get
> through the bypass path and the KVM irqfd side is safe for the
> non-threaded handler, inject succeeds and we return IRQ_HANDLED, right?
> Thanks,


* Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
  2015-10-29  9:45                 ` Paolo Bonzini
@ 2015-10-30  6:16                   ` Yunhong Jiang
  2015-11-02  9:17                     ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Yunhong Jiang @ 2015-10-30  6:16 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Alex Williamson, kvm, linux-kernel, Marcelo Tosatti

On Thu, Oct 29, 2015 at 10:45:44AM +0100, Paolo Bonzini wrote:
> 
> 
> On 29/10/2015 04:11, Alex Williamson wrote:
> > > The irqfd is already able to schedule a work item, because it runs with
> > > interrupts disabled, so I think we can always return IRQ_HANDLED.
> >
> > I'm confused by this.  The problem with adding IRQF_NO_THREAD to our
> > current handler is that it hits the spinlock that can sleep in
> > eventfd_signal() and the waitqueue further down the stack before we get
> > to the irqfd.  So if we split to a non-threaded handler vs a threaded
> > handler, where the non-threaded handler either returns IRQ_HANDLED or
> > IRQ_WAKE_THREAD to queue the threaded handler, there's only so much that
> > the non-threaded handler can do before we start running into the same
> > problem.
> 
> You're right.  I thought schedule_work used raw spinlocks (and then
> everything would be done in the inject callback), but I was wrong.
> 
> Basically where irqfd_wakeup now does schedule_work, it would need to
> return IRQ_WAKE_THREAD.  The threaded handler then can just do the
> eventfd_signal.
> 

And with this change, we don't even need the module option anymore: we first
try the primary handler, which runs in hard irq context, and if that fails,
we fall back to the threaded irq handler. Am I right?

Paolo/Alex, do you want to work on the patch yourself? If not, I will be 
happy to try this method.

Thanks
--jyh

> Paolo
> 
> > I think that means that the non-threaded handler needs to
> > return IRQ_WAKE_THREAD if we need to use the current eventfd_signal()
> > path, such as if the bypass path is not available.  If we can get
> > through the bypass path and the KVM irqfd side is safe for the
> > non-threaded handler, inject succeeds and we return IRQ_HANDLED, right?
> > Thanks,


* Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
  2015-10-30  6:16                   ` Yunhong Jiang
@ 2015-11-02  9:17                     ` Paolo Bonzini
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2015-11-02  9:17 UTC (permalink / raw)
  To: Yunhong Jiang; +Cc: Alex Williamson, kvm, linux-kernel, Marcelo Tosatti



On 30/10/2015 07:16, Yunhong Jiang wrote:
> And with this change, we don't even need the module option anymore: we first
> try the primary handler, which runs in hard irq context, and if that fails,
> we fall back to the threaded irq handler. Am I right?

Yes.

> Paolo/Alex, do you want to work on the patch yourself? If not, I will be 
> happy to try this method.

Of course you can do it yourself.

Thanks!

Paolo


end of thread, other threads:[~2015-11-02  9:17 UTC | newest]

Thread overview: 17+ messages
2015-10-27  1:20 [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ Yunhong Jiang
2015-10-27  3:37 ` Alex Williamson
2015-10-27  6:35   ` Yunhong Jiang
2015-10-27  9:29     ` Paolo Bonzini
2015-10-27 21:26       ` Yunhong Jiang
2015-10-28  0:44         ` Paolo Bonzini
2015-10-28 16:00           ` Alex Williamson
2015-10-28 17:05             ` Paolo Bonzini
2015-10-28 23:54               ` Marcelo Tosatti
2015-10-29  3:11               ` Alex Williamson
2015-10-29  9:45                 ` Paolo Bonzini
2015-10-30  6:16                   ` Yunhong Jiang
2015-11-02  9:17                     ` Paolo Bonzini
2015-10-28 17:50           ` Yunhong Jiang
2015-10-28 18:18             ` Alex Williamson
2015-10-28 21:46               ` Yunhong Jiang
2015-10-28 18:28             ` Paolo Bonzini
