linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] KVM: Fix assigned device MSI-X entry setting leak
@ 2012-01-30 21:05 Alex Williamson
  2012-01-31 19:11 ` Michael S. Tsirkin
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Williamson @ 2012-01-30 21:05 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, alex.williamson, jan.kiszka

We need to prioritize our matching when setting MSI-X vector
entries.  Unused entries should only be used if we don't find
an exact match or else we risk duplicating entries.  This was
causing an ENOSPC return when trying to mask and unmask MSI-X
vectors based on guest MSI-X table updates.

The new KVM_CAP_DEVICE_MSIX_MASK extension indicates the
presence of this fix.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

v2: Add capability indicating MSIX_MASKing now works.

The faulting sequence went something like:

Start:
[0] entry 0, vector A
[1] entry 1, vector B
[2] entry 2, vector C

Set entry 1 to 0:
[0] entry 0, vector A
[1] entry 1->1, vector B->0
[2] entry 2, vector C

Set entry 2 to 0:
[0] entry 0, vector A
[1] entry 1->2, vector 0->0 <- incorrectly matched
[2] entry 2, vector C

Set entry 2 to C:
[0] entry 0, vector A
[1] entry 2->2, vector 0->C <- incorrectly matched again
[2] entry 2, vector C

Set entry 1 to B:
[0] entry 0, vector A
[1] entry 2, vector C
[2] entry 2, vector C
-ENOSPC

 Documentation/virtual/kvm/api.txt |    5 +++--
 arch/x86/kvm/x86.c                |    1 +
 include/linux/kvm.h               |    1 +
 virt/kvm/assigned-dev.c           |   21 ++++++++++++++-------
 4 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index e1d94bf..dd0b554 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1304,8 +1304,9 @@ Type: vm ioctl
 Parameters: struct kvm_assigned_msix_entry (in)
 Returns: 0 on success, -1 on error
 
-Specifies the routing of an MSI-X assigned device interrupt to a GSI. Setting
-the GSI vector to zero means disabling the interrupt.
+Specifies the routing of an MSI-X assigned device interrupt to a GSI. If
+KVM_CAP_DEVICE_MSIX_MASK is available, setting the GSI vector to zero means
+disabling the interrupt.
 
 struct kvm_assigned_msix_entry {
 	__u32 assigned_dev_id;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 14d6cad..a35255e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2057,6 +2057,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_XSAVE:
 	case KVM_CAP_ASYNC_PF:
 	case KVM_CAP_GET_TSC_KHZ:
+	case KVM_CAP_DEVICE_MSIX_MASK:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 68e67e5..64438e6 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_PAPR 68
 #define KVM_CAP_S390_GMAP 71
 #define KVM_CAP_TSC_DEADLINE_TIMER 72
+#define KVM_CAP_DEVICE_MSIX_MASK 73
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 758e3b3..58f3f6b 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -728,7 +728,7 @@ msix_nr_out:
 static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
 				       struct kvm_assigned_msix_entry *entry)
 {
-	int r = 0, i;
+	int r = 0, i, unused = -1;
 	struct kvm_assigned_dev_kernel *adev;
 
 	mutex_lock(&kvm->lock);
@@ -741,17 +741,24 @@ static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
 		goto msix_entry_out;
 	}
 
-	for (i = 0; i < adev->entries_nr; i++)
-		if (adev->guest_msix_entries[i].vector == 0 ||
-		    adev->guest_msix_entries[i].entry == entry->entry) {
+	for (i = 0; i < adev->entries_nr; i++) {
+		if (adev->guest_msix_entries[i].entry == entry->entry) {
 			adev->guest_msix_entries[i].entry = entry->entry;
 			adev->guest_msix_entries[i].vector = entry->gsi;
 			adev->host_msix_entries[i].entry = entry->entry;
 			break;
-		}
+		} else if (unused < 0 && !adev->guest_msix_entries[i].vector)
+			unused = i;
+	}
+
 	if (i == adev->entries_nr) {
-		r = -ENOSPC;
-		goto msix_entry_out;
+		if (unused < 0) {
+			r = -ENOSPC;
+			goto msix_entry_out;
+		}
+		adev->guest_msix_entries[unused].entry = entry->entry;
+		adev->guest_msix_entries[unused].vector = entry->gsi;
+		adev->host_msix_entries[unused].entry = entry->entry;
 	}
 
 msix_entry_out:


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] KVM: Fix assigned device MSI-X entry setting leak
  2012-01-30 21:05 [PATCH v2] KVM: Fix assigned device MSI-X entry setting leak Alex Williamson
@ 2012-01-31 19:11 ` Michael S. Tsirkin
  2012-02-06 21:46   ` Alex Williamson
  0 siblings, 1 reply; 6+ messages in thread
From: Michael S. Tsirkin @ 2012-01-31 19:11 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, linux-kernel, jan.kiszka

On Mon, Jan 30, 2012 at 02:05:54PM -0700, Alex Williamson wrote:
> We need to prioritize our matching when setting MSI-X vector
> entries.  Unused entries should only be used if we don't find
> an exact match or else we risk duplicating entries.  This was
> causing an ENOSPC return when trying to mask and unmask MSI-X
> vectors based on guest MSI-X table updates.
> 
> The new KVM_CAP_DEVICE_MSIX_MASK extension indicates the
> presence of this fix.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
> v2: Add capability indicating MSIX_MASKing now works.
> 
> The faulting sequence went something like:
> 
> Start:
> [0] entry 0, vector A
> [1] entry 1, vector B
> [2] entry 2, vector C
> 
> Set entry 1 to 0:
> [0] entry 0, vector A
> [1] entry 1->1, vector B->0
> [2] entry 2, vector C
> 
> Set entry 2 to 0:
> [0] entry 0, vector A
> [1] entry 1->2, vector 0->0 <- incorrectly matched
> [2] entry 2, vector C
> 
> Set entry 2 to C:
> [0] entry 0, vector A
> [1] entry 2->2, vector 0->C <- incorrectly matched again
> [2] entry 2, vector C
> 
> Set entry 1 to B:
> [0] entry 0, vector A
> [1] entry 2, vector C
> [2] entry 2, vector C
> -ENOSPC
> 
>  Documentation/virtual/kvm/api.txt |    5 +++--
>  arch/x86/kvm/x86.c                |    1 +
>  include/linux/kvm.h               |    1 +
>  virt/kvm/assigned-dev.c           |   21 ++++++++++++++-------
>  4 files changed, 19 insertions(+), 9 deletions(-)

I don't object to fixing this bug. But I don't think
we need a capability for this. Here's why:

setting an entry to zero is not really matching
what devices need, since that would lose interrupts.
What devices need is mask entries to disable them,
then unmask and get an interrupt if it was delayed.

So, if we are going to add a new capability, how about
sticking a mask bit in the padding instead?

As was shown in the past this can even improve performance since we can
propagate the mask bit back to the host device.


> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index e1d94bf..dd0b554 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1304,8 +1304,9 @@ Type: vm ioctl
>  Parameters: struct kvm_assigned_msix_entry (in)
>  Returns: 0 on success, -1 on error
>  
> -Specifies the routing of an MSI-X assigned device interrupt to a GSI. Setting
> -the GSI vector to zero means disabling the interrupt.
> +Specifies the routing of an MSI-X assigned device interrupt to a GSI. If
> +KVM_CAP_DEVICE_MSIX_MASK is available, setting the GSI vector to zero means
> +disabling the interrupt.
>  
>  struct kvm_assigned_msix_entry {
>  	__u32 assigned_dev_id;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 14d6cad..a35255e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2057,6 +2057,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>  	case KVM_CAP_XSAVE:
>  	case KVM_CAP_ASYNC_PF:
>  	case KVM_CAP_GET_TSC_KHZ:
> +	case KVM_CAP_DEVICE_MSIX_MASK:
>  		r = 1;
>  		break;
>  	case KVM_CAP_COALESCED_MMIO:
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 68e67e5..64438e6 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo {
>  #define KVM_CAP_PPC_PAPR 68
>  #define KVM_CAP_S390_GMAP 71
>  #define KVM_CAP_TSC_DEADLINE_TIMER 72
> +#define KVM_CAP_DEVICE_MSIX_MASK 73
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> index 758e3b3..58f3f6b 100644
> --- a/virt/kvm/assigned-dev.c
> +++ b/virt/kvm/assigned-dev.c
> @@ -728,7 +728,7 @@ msix_nr_out:
>  static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
>  				       struct kvm_assigned_msix_entry *entry)
>  {
> -	int r = 0, i;
> +	int r = 0, i, unused = -1;
>  	struct kvm_assigned_dev_kernel *adev;
>  
>  	mutex_lock(&kvm->lock);
> @@ -741,17 +741,24 @@ static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
>  		goto msix_entry_out;
>  	}
>  
> -	for (i = 0; i < adev->entries_nr; i++)
> -		if (adev->guest_msix_entries[i].vector == 0 ||
> -		    adev->guest_msix_entries[i].entry == entry->entry) {
> +	for (i = 0; i < adev->entries_nr; i++) {
> +		if (adev->guest_msix_entries[i].entry == entry->entry) {
>  			adev->guest_msix_entries[i].entry = entry->entry;
>  			adev->guest_msix_entries[i].vector = entry->gsi;
>  			adev->host_msix_entries[i].entry = entry->entry;
>  			break;
> -		}
> +		} else if (unused < 0 && !adev->guest_msix_entries[i].vector)
> +			unused = i;
> +	}
> +
>  	if (i == adev->entries_nr) {
> -		r = -ENOSPC;
> -		goto msix_entry_out;
> +		if (unused < 0) {
> +			r = -ENOSPC;
> +			goto msix_entry_out;
> +		}
> +		adev->guest_msix_entries[unused].entry = entry->entry;
> +		adev->guest_msix_entries[unused].vector = entry->gsi;
> +		adev->host_msix_entries[unused].entry = entry->entry;
>  	}
>  
>  msix_entry_out:
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] KVM: Fix assigned device MSI-X entry setting leak
  2012-01-31 19:11 ` Michael S. Tsirkin
@ 2012-02-06 21:46   ` Alex Williamson
  2012-02-07  6:31     ` Michael S. Tsirkin
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Williamson @ 2012-02-06 21:46 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, linux-kernel, jan.kiszka

On Tue, 2012-01-31 at 21:11 +0200, Michael S. Tsirkin wrote:
> On Mon, Jan 30, 2012 at 02:05:54PM -0700, Alex Williamson wrote:
> > We need to prioritize our matching when setting MSI-X vector
> > entries.  Unused entries should only be used if we don't find
> > an exact match or else we risk duplicating entries.  This was
> > causing an ENOSPC return when trying to mask and unmask MSI-X
> > vectors based on guest MSI-X table updates.
> > 
> > The new KVM_CAP_DEVICE_MSIX_MASK extension indicates the
> > presence of this fix.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> > 
> > v2: Add capability indicating MSIX_MASKing now works.
> > 
> > The faulting sequence went something like:
> > 
> > Start:
> > [0] entry 0, vector A
> > [1] entry 1, vector B
> > [2] entry 2, vector C
> > 
> > Set entry 1 to 0:
> > [0] entry 0, vector A
> > [1] entry 1->1, vector B->0
> > [2] entry 2, vector C
> > 
> > Set entry 2 to 0:
> > [0] entry 0, vector A
> > [1] entry 1->2, vector 0->0 <- incorrectly matched
> > [2] entry 2, vector C
> > 
> > Set entry 2 to C:
> > [0] entry 0, vector A
> > [1] entry 2->2, vector 0->C <- incorrectly matched again
> > [2] entry 2, vector C
> > 
> > Set entry 1 to B:
> > [0] entry 0, vector A
> > [1] entry 2, vector C
> > [2] entry 2, vector C
> > -ENOSPC
> > 
> >  Documentation/virtual/kvm/api.txt |    5 +++--
> >  arch/x86/kvm/x86.c                |    1 +
> >  include/linux/kvm.h               |    1 +
> >  virt/kvm/assigned-dev.c           |   21 ++++++++++++++-------
> >  4 files changed, 19 insertions(+), 9 deletions(-)
> 
> I don't object to fixing this bug. But I don't think
> we need a capability for this. Here's why:
> 
> setting an entry to zero is not really matching
> what devices need, since that would lose interrupts.
> What devices need is mask entries to disable them,
> then unmask and get an interrupt if it was delayed.
> 
> So, if we are going to add a new capability, how about
> sticking a mask bit in the padding instead?

I'll take a look.  Are you suggesting then that we'd note the masked
interrupt was deferred and inject when unmasked?

> As was shown in the past this can even improve performance since we can
> propagate the mask bit back to the host device.

I'm dubious whether there's actually a use case that benefits from
pushing the mask down to hardware.  The typical mask case is a temporary
mask to update the vector data/address then unmask.  We're actually
accelerating that by not pushing it down to hardware.  Are there real
world drivers that enable a vector and then mask it for an extended
period of time?  Thanks,

Alex

> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index e1d94bf..dd0b554 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -1304,8 +1304,9 @@ Type: vm ioctl
> >  Parameters: struct kvm_assigned_msix_entry (in)
> >  Returns: 0 on success, -1 on error
> >  
> > -Specifies the routing of an MSI-X assigned device interrupt to a GSI. Setting
> > -the GSI vector to zero means disabling the interrupt.
> > +Specifies the routing of an MSI-X assigned device interrupt to a GSI. If
> > +KVM_CAP_DEVICE_MSIX_MASK is available, setting the GSI vector to zero means
> > +disabling the interrupt.
> >  
> >  struct kvm_assigned_msix_entry {
> >  	__u32 assigned_dev_id;
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 14d6cad..a35255e 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2057,6 +2057,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> >  	case KVM_CAP_XSAVE:
> >  	case KVM_CAP_ASYNC_PF:
> >  	case KVM_CAP_GET_TSC_KHZ:
> > +	case KVM_CAP_DEVICE_MSIX_MASK:
> >  		r = 1;
> >  		break;
> >  	case KVM_CAP_COALESCED_MMIO:
> > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > index 68e67e5..64438e6 100644
> > --- a/include/linux/kvm.h
> > +++ b/include/linux/kvm.h
> > @@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo {
> >  #define KVM_CAP_PPC_PAPR 68
> >  #define KVM_CAP_S390_GMAP 71
> >  #define KVM_CAP_TSC_DEADLINE_TIMER 72
> > +#define KVM_CAP_DEVICE_MSIX_MASK 73
> >  
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >  
> > diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> > index 758e3b3..58f3f6b 100644
> > --- a/virt/kvm/assigned-dev.c
> > +++ b/virt/kvm/assigned-dev.c
> > @@ -728,7 +728,7 @@ msix_nr_out:
> >  static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> >  				       struct kvm_assigned_msix_entry *entry)
> >  {
> > -	int r = 0, i;
> > +	int r = 0, i, unused = -1;
> >  	struct kvm_assigned_dev_kernel *adev;
> >  
> >  	mutex_lock(&kvm->lock);
> > @@ -741,17 +741,24 @@ static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> >  		goto msix_entry_out;
> >  	}
> >  
> > -	for (i = 0; i < adev->entries_nr; i++)
> > -		if (adev->guest_msix_entries[i].vector == 0 ||
> > -		    adev->guest_msix_entries[i].entry == entry->entry) {
> > +	for (i = 0; i < adev->entries_nr; i++) {
> > +		if (adev->guest_msix_entries[i].entry == entry->entry) {
> >  			adev->guest_msix_entries[i].entry = entry->entry;
> >  			adev->guest_msix_entries[i].vector = entry->gsi;
> >  			adev->host_msix_entries[i].entry = entry->entry;
> >  			break;
> > -		}
> > +		} else if (unused < 0 && !adev->guest_msix_entries[i].vector)
> > +			unused = i;
> > +	}
> > +
> >  	if (i == adev->entries_nr) {
> > -		r = -ENOSPC;
> > -		goto msix_entry_out;
> > +		if (unused < 0) {
> > +			r = -ENOSPC;
> > +			goto msix_entry_out;
> > +		}
> > +		adev->guest_msix_entries[unused].entry = entry->entry;
> > +		adev->guest_msix_entries[unused].vector = entry->gsi;
> > +		adev->host_msix_entries[unused].entry = entry->entry;
> >  	}
> >  
> >  msix_entry_out:
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] KVM: Fix assigned device MSI-X entry setting leak
  2012-02-06 21:46   ` Alex Williamson
@ 2012-02-07  6:31     ` Michael S. Tsirkin
  2012-02-07 15:11       ` Alex Williamson
  0 siblings, 1 reply; 6+ messages in thread
From: Michael S. Tsirkin @ 2012-02-07  6:31 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, linux-kernel, jan.kiszka

On Mon, Feb 06, 2012 at 02:46:29PM -0700, Alex Williamson wrote:
> On Tue, 2012-01-31 at 21:11 +0200, Michael S. Tsirkin wrote:
> > On Mon, Jan 30, 2012 at 02:05:54PM -0700, Alex Williamson wrote:
> > > We need to prioritize our matching when setting MSI-X vector
> > > entries.  Unused entries should only be used if we don't find
> > > an exact match or else we risk duplicating entries.  This was
> > > causing an ENOSPC return when trying to mask and unmask MSI-X
> > > vectors based on guest MSI-X table updates.
> > > 
> > > The new KVM_CAP_DEVICE_MSIX_MASK extension indicates the
> > > presence of this fix.
> > > 
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > ---
> > > 
> > > v2: Add capability indicating MSIX_MASKing now works.
> > > 
> > > The faulting sequence went something like:
> > > 
> > > Start:
> > > [0] entry 0, vector A
> > > [1] entry 1, vector B
> > > [2] entry 2, vector C
> > > 
> > > Set entry 1 to 0:
> > > [0] entry 0, vector A
> > > [1] entry 1->1, vector B->0
> > > [2] entry 2, vector C
> > > 
> > > Set entry 2 to 0:
> > > [0] entry 0, vector A
> > > [1] entry 1->2, vector 0->0 <- incorrectly matched
> > > [2] entry 2, vector C
> > > 
> > > Set entry 2 to C:
> > > [0] entry 0, vector A
> > > [1] entry 2->2, vector 0->C <- incorrectly matched again
> > > [2] entry 2, vector C
> > > 
> > > Set entry 1 to B:
> > > [0] entry 0, vector A
> > > [1] entry 2, vector C
> > > [2] entry 2, vector C
> > > -ENOSPC
> > > 
> > >  Documentation/virtual/kvm/api.txt |    5 +++--
> > >  arch/x86/kvm/x86.c                |    1 +
> > >  include/linux/kvm.h               |    1 +
> > >  virt/kvm/assigned-dev.c           |   21 ++++++++++++++-------
> > >  4 files changed, 19 insertions(+), 9 deletions(-)
> > 
> > I don't object to fixing this bug. But I don't think
> > we need a capability for this. Here's why:
> > 
> > setting an entry to zero is not really matching
> > what devices need, since that would lose interrupts.
> > What devices need is mask entries to disable them,
> > then unmask and get an interrupt if it was delayed.
> > 
> > So, if we are going to add a new capability, how about
> > sticking a mask bit in the padding instead?
> 
> I'll take a look.  Are you suggesting then that we'd note the masked
> interrupt was deferred and inject when unmasked?
> 
> > As was shown in the past this can even improve performance since we can
> > propagate the mask bit back to the host device.
> 
> I'm dubious whether there's actually a use case that benefits from
> pushing the mask down to hardware.  The typical mask case is a temporary
> mask to update the vector data/address then unmask.  We're actually
> accelerating that by not pushing it down to hardware.  Are there real
> world drivers that enable a vector and then mask it for an extended
> period of time?

I don't know. Shen Yang at some point said:
"We have also reproduced the issue on some large scale benchmark on the guest with
"newer kernel like 2.6.30 on Xen, under very high interrupt rate, due to some
"interrupt rate limitation mechanism in kernel.

And these patches did push the interrupts down - but not to hardware,
masking interrupt as pending in kernel only sets a bit, it is only
pushed down when we next get an interrupt while vector is masked.

> Thanks,
> 
> Alex

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] KVM: Fix assigned device MSI-X entry setting leak
  2012-02-07  6:31     ` Michael S. Tsirkin
@ 2012-02-07 15:11       ` Alex Williamson
  2012-02-07 15:56         ` Michael S. Tsirkin
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Williamson @ 2012-02-07 15:11 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, linux-kernel, jan.kiszka

On Tue, 2012-02-07 at 08:31 +0200, Michael S. Tsirkin wrote:
> On Mon, Feb 06, 2012 at 02:46:29PM -0700, Alex Williamson wrote:
> > On Tue, 2012-01-31 at 21:11 +0200, Michael S. Tsirkin wrote:
> > > On Mon, Jan 30, 2012 at 02:05:54PM -0700, Alex Williamson wrote:
> > > > We need to prioritize our matching when setting MSI-X vector
> > > > entries.  Unused entries should only be used if we don't find
> > > > an exact match or else we risk duplicating entries.  This was
> > > > causing an ENOSPC return when trying to mask and unmask MSI-X
> > > > vectors based on guest MSI-X table updates.
> > > > 
> > > > The new KVM_CAP_DEVICE_MSIX_MASK extension indicates the
> > > > presence of this fix.
> > > > 
> > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > ---
> > > > 
> > > > v2: Add capability indicating MSIX_MASKing now works.
> > > > 
> > > > The faulting sequence went something like:
> > > > 
> > > > Start:
> > > > [0] entry 0, vector A
> > > > [1] entry 1, vector B
> > > > [2] entry 2, vector C
> > > > 
> > > > Set entry 1 to 0:
> > > > [0] entry 0, vector A
> > > > [1] entry 1->1, vector B->0
> > > > [2] entry 2, vector C
> > > > 
> > > > Set entry 2 to 0:
> > > > [0] entry 0, vector A
> > > > [1] entry 1->2, vector 0->0 <- incorrectly matched
> > > > [2] entry 2, vector C
> > > > 
> > > > Set entry 2 to C:
> > > > [0] entry 0, vector A
> > > > [1] entry 2->2, vector 0->C <- incorrectly matched again
> > > > [2] entry 2, vector C
> > > > 
> > > > Set entry 1 to B:
> > > > [0] entry 0, vector A
> > > > [1] entry 2, vector C
> > > > [2] entry 2, vector C
> > > > -ENOSPC
> > > > 
> > > >  Documentation/virtual/kvm/api.txt |    5 +++--
> > > >  arch/x86/kvm/x86.c                |    1 +
> > > >  include/linux/kvm.h               |    1 +
> > > >  virt/kvm/assigned-dev.c           |   21 ++++++++++++++-------
> > > >  4 files changed, 19 insertions(+), 9 deletions(-)
> > > 
> > > I don't object to fixing this bug. But I don't think
> > > we need a capability for this. Here's why:
> > > 
> > > setting an entry to zero is not really matching
> > > what devices need, since that would lose interrupts.
> > > What devices need is mask entries to disable them,
> > > then unmask and get an interrupt if it was delayed.
> > > 
> > > So, if we are going to add a new capability, how about
> > > sticking a mask bit in the padding instead?
> > 
> > I'll take a look.  Are you suggesting then that we'd note the masked
> > interrupt was deferred and inject when unmasked?
> > 
> > > As was shown in the past this can even improve performance since we can
> > > propagate the mask bit back to the host device.
> > 
> > I'm dubious whether there's actually a use case that benefits from
> > pushing the mask down to hardware.  The typical mask case is a temporary
> > mask to update the vector data/address then unmask.  We're actually
> > accelerating that by not pushing it down to hardware.  Are there real
> > world drivers that enable a vector and then mask it for an extended
> > period of time?
> 
> I don't know. Shen Yang at some point said:
> "We have also reproduced the issue on some large scale benchmark on the guest with
> "newer kernel like 2.6.30 on Xen, under very high interrupt rate, due to some
> "interrupt rate limitation mechanism in kernel.
> 
> And these patches did push the interrupts down - but not to hardware,
> masking interrupt as pending in kernel only sets a bit, it is only
> pushed down when we next get an interrupt while vector is masked.

I thought the high rate masking was on older guests, but in any case a
reasonable first step is emulated masking in kvm so that we can
re-inject on unmask and not lose interrupts.  That can either lead to
lazy hardware masking with no additional change to the API or if someone
takes the reigns on MSI-X table acceleration, we can head in that
direction.  The MSI-X table interaction I see on an average sized guest
with an assigned 82576 PF certainly isn't calling out for more
acceleration, but 10G devices and beyond could behave differently.
Thanks,

Alex


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] KVM: Fix assigned device MSI-X entry setting leak
  2012-02-07 15:11       ` Alex Williamson
@ 2012-02-07 15:56         ` Michael S. Tsirkin
  0 siblings, 0 replies; 6+ messages in thread
From: Michael S. Tsirkin @ 2012-02-07 15:56 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, linux-kernel, jan.kiszka

On Tue, Feb 07, 2012 at 08:11:30AM -0700, Alex Williamson wrote:
> On Tue, 2012-02-07 at 08:31 +0200, Michael S. Tsirkin wrote:
> > On Mon, Feb 06, 2012 at 02:46:29PM -0700, Alex Williamson wrote:
> > > On Tue, 2012-01-31 at 21:11 +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Jan 30, 2012 at 02:05:54PM -0700, Alex Williamson wrote:
> > > > > We need to prioritize our matching when setting MSI-X vector
> > > > > entries.  Unused entries should only be used if we don't find
> > > > > an exact match or else we risk duplicating entries.  This was
> > > > > causing an ENOSPC return when trying to mask and unmask MSI-X
> > > > > vectors based on guest MSI-X table updates.
> > > > > 
> > > > > The new KVM_CAP_DEVICE_MSIX_MASK extension indicates the
> > > > > presence of this fix.
> > > > > 
> > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > > ---
> > > > > 
> > > > > v2: Add capability indicating MSIX_MASKing now works.
> > > > > 
> > > > > The faulting sequence went something like:
> > > > > 
> > > > > Start:
> > > > > [0] entry 0, vector A
> > > > > [1] entry 1, vector B
> > > > > [2] entry 2, vector C
> > > > > 
> > > > > Set entry 1 to 0:
> > > > > [0] entry 0, vector A
> > > > > [1] entry 1->1, vector B->0
> > > > > [2] entry 2, vector C
> > > > > 
> > > > > Set entry 2 to 0:
> > > > > [0] entry 0, vector A
> > > > > [1] entry 1->2, vector 0->0 <- incorrectly matched
> > > > > [2] entry 2, vector C
> > > > > 
> > > > > Set entry 2 to C:
> > > > > [0] entry 0, vector A
> > > > > [1] entry 2->2, vector 0->C <- incorrectly matched again
> > > > > [2] entry 2, vector C
> > > > > 
> > > > > Set entry 1 to B:
> > > > > [0] entry 0, vector A
> > > > > [1] entry 2, vector C
> > > > > [2] entry 2, vector C
> > > > > -ENOSPC
> > > > > 
> > > > >  Documentation/virtual/kvm/api.txt |    5 +++--
> > > > >  arch/x86/kvm/x86.c                |    1 +
> > > > >  include/linux/kvm.h               |    1 +
> > > > >  virt/kvm/assigned-dev.c           |   21 ++++++++++++++-------
> > > > >  4 files changed, 19 insertions(+), 9 deletions(-)
> > > > 
> > > > I don't object to fixing this bug. But I don't think
> > > > we need a capability for this. Here's why:
> > > > 
> > > > setting an entry to zero is not really matching
> > > > what devices need, since that would lose interrupts.
> > > > What devices need is mask entries to disable them,
> > > > then unmask and get an interrupt if it was delayed.
> > > > 
> > > > So, if we are going to add a new capability, how about
> > > > sticking a mask bit in the padding instead?
> > > 
> > > I'll take a look.  Are you suggesting then that we'd note the masked
> > > interrupt was deferred and inject when unmasked?
> > > 
> > > > As was shown in the past this can even improve performance since we can
> > > > propagate the mask bit back to the host device.
> > > 
> > > I'm dubious whether there's actually a use case that benefits from
> > > pushing the mask down to hardware.  The typical mask case is a temporary
> > > mask to update the vector data/address then unmask.  We're actually
> > > accelerating that by not pushing it down to hardware.  Are there real
> > > world drivers that enable a vector and then mask it for an extended
> > > period of time?
> > 
> > I don't know. Shen Yang at some point said:
> > "We have also reproduced the issue on some large scale benchmark on the guest with
> > "newer kernel like 2.6.30 on Xen, under very high interrupt rate, due to some
> > "interrupt rate limitation mechanism in kernel.
> > 
> > And these patches did push the interrupts down - but not to hardware,
> > masking interrupt as pending in kernel only sets a bit, it is only
> > pushed down when we next get an interrupt while vector is masked.
> 
> I thought the high rate masking was on older guests, but in any case a
> reasonable first step is emulated masking in kvm so that we can
> re-inject on unmask and not lose interrupts.

Yes, I agree to that. That would also be good for emulated devices.

>  That can either lead to
> lazy hardware masking with no additional change to the API or if someone
> takes the reigns on MSI-X table acceleration, we can head in that
> direction.  The MSI-X table interaction I see on an average sized guest
> with an assigned 82576 PF certainly isn't calling out for more
> acceleration, but 10G devices and beyond could behave differently.
> Thanks,
> 
> Alex

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-02-07 15:56 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-30 21:05 [PATCH v2] KVM: Fix assigned device MSI-X entry setting leak Alex Williamson
2012-01-31 19:11 ` Michael S. Tsirkin
2012-02-06 21:46   ` Alex Williamson
2012-02-07  6:31     ` Michael S. Tsirkin
2012-02-07 15:11       ` Alex Williamson
2012-02-07 15:56         ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).