* [PATCH]vtd: Fix for irq bind failure after PCI attaching 32 times
@ 2011-01-26  8:02 Zhang, Fengzhe
  2011-01-26  8:41 ` Keir Fraser
  2011-01-26 11:10 ` Stefano Stabellini
  0 siblings, 2 replies; 10+ messages in thread
From: Zhang, Fengzhe @ 2011-01-26  8:02 UTC (permalink / raw)
  To: xen-devel; +Cc: Zhang, Fengzhe


vtd: Fix for irq bind failure after PCI attaching 32 times

Originally, when detaching a PCI device, pirq_to_emuirq and pirq_to_irq were freed via the hypercall do_physdev_op. Later, duplicated logic was added to pt_irq_destroy_bind_vtd to free pirq_to_emuirq, but not pirq_to_irq. As a result, do_physdev_op fails to free both the emuirq and the irq. After attaching a PCI device 32 times, irq resources run out. This patch removes the redundant logic.

Signed-off-by: Fengzhe Zhang <fengzhe.zhang@intel.com>

diff -r 003acf02d416 xen/drivers/passthrough/io.c
--- a/xen/drivers/passthrough/io.c	Thu Jan 20 17:04:06 2011 +0000
+++ b/xen/drivers/passthrough/io.c	Wed Jan 26 23:05:33 2011 +0800
@@ -375,7 +375,6 @@
             hvm_irq_dpci->mirq[machine_gsi].dom   = NULL;
             hvm_irq_dpci->mirq[machine_gsi].flags = 0;
             clear_bit(machine_gsi, hvm_irq_dpci->mapping);
-            unmap_domain_pirq_emuirq(d, machine_gsi);
         }
     }
     spin_unlock(&d->event_lock);



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel


* Re: [PATCH]vtd: Fix for irq bind failure after PCI attaching 32 times
  2011-01-26  8:02 [PATCH]vtd: Fix for irq bind failure after PCI attaching 32 times Zhang, Fengzhe
@ 2011-01-26  8:41 ` Keir Fraser
  2011-01-26 11:10 ` Stefano Stabellini
  1 sibling, 0 replies; 10+ messages in thread
From: Keir Fraser @ 2011-01-26  8:41 UTC (permalink / raw)
  To: Zhang, Fengzhe, xen-devel; +Cc: Stefano Stabellini

On 26/01/2011 08:02, "Zhang, Fengzhe" <fengzhe.zhang@intel.com> wrote:

> vtd: Fix for irq bind failure after PCI attaching 32 times
> 
> Originally, when detaching a PCI device, pirq_to_emuirq and pirq_to_irq
> were freed via the hypercall do_physdev_op. Later, duplicated logic was
> added to pt_irq_destroy_bind_vtd to free pirq_to_emuirq, but not
> pirq_to_irq. As a result, do_physdev_op fails to free both the emuirq
> and the irq. After attaching a PCI device 32 times, irq resources run
> out. This patch removes the redundant logic.

This needs an Ack, or alternative fix, from Stefano (cc'ed).

 -- Keir

> Signed-off-by: Fengzhe Zhang <fengzhe.zhang@intel.com>
> 
> diff -r 003acf02d416 xen/drivers/passthrough/io.c
> --- a/xen/drivers/passthrough/io.c Thu Jan 20 17:04:06 2011 +0000
> +++ b/xen/drivers/passthrough/io.c Wed Jan 26 23:05:33 2011 +0800
> @@ -375,7 +375,6 @@
>              hvm_irq_dpci->mirq[machine_gsi].dom   = NULL;
>              hvm_irq_dpci->mirq[machine_gsi].flags = 0;
>              clear_bit(machine_gsi, hvm_irq_dpci->mapping);
> -            unmap_domain_pirq_emuirq(d, machine_gsi);
>          }
>      }
>      spin_unlock(&d->event_lock);


* Re: [PATCH]vtd: Fix for irq bind failure after PCI attaching 32 times
  2011-01-26  8:02 [PATCH]vtd: Fix for irq bind failure after PCI attaching 32 times Zhang, Fengzhe
  2011-01-26  8:41 ` Keir Fraser
@ 2011-01-26 11:10 ` Stefano Stabellini
  2011-01-27  7:39   ` Zhang, Fengzhe
  1 sibling, 1 reply; 10+ messages in thread
From: Stefano Stabellini @ 2011-01-26 11:10 UTC (permalink / raw)
  To: Zhang, Fengzhe; +Cc: xen-devel

On Wed, 26 Jan 2011, Zhang, Fengzhe wrote:
> vtd: Fix for irq bind failure after PCI attaching 32 times
> 
> Originally, when detaching a PCI device, pirq_to_emuirq and pirq_to_irq were freed via the hypercall do_physdev_op. Later, duplicated logic was added to pt_irq_destroy_bind_vtd to free pirq_to_emuirq, but not pirq_to_irq. As a result, do_physdev_op fails to free both the emuirq and the irq. After attaching a PCI device 32 times, irq resources run out. This patch removes the redundant logic.
> 
> Signed-off-by: Fengzhe Zhang <fengzhe.zhang@intel.com>
> 

It looks OK in principle, but if the theory is that we should always
call xc_physdev_unmap_pirq after xc_domain_unbind_pt_irq, I can find an
instance of xc_domain_unbind_pt_irq without any corresponding
xc_physdev_unmap_pirq.

Take a look at hw/pass-through.c:pt_reset_interrupt_and_io_mapping in
qemu:

if (ptdev->msi_trans_en == 0 && ptdev->machine_irq)
{
    if (xc_domain_unbind_pt_irq(xc_handle, domid, ptdev->machine_irq,
                    PT_IRQ_TYPE_PCI, 0, e_device, e_intx, 0))
        PT_LOG("Error: Unbinding of interrupt failed!\n");
}

but there is no following xc_physdev_unmap_pirq if MSI and MSIX are
disabled.


* RE: [PATCH]vtd: Fix for irq bind failure after PCI attaching 32 times
  2011-01-26 11:10 ` Stefano Stabellini
@ 2011-01-27  7:39   ` Zhang, Fengzhe
  2011-01-27 11:16     ` Stefano Stabellini
  2011-02-02 17:35     ` Stefano Stabellini
  0 siblings, 2 replies; 10+ messages in thread
From: Zhang, Fengzhe @ 2011-01-27  7:39 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Zhang, Xiantao

Hi, Stefano,

Here is the call graph that causes the bug:

unregister_real_device (ioemu)
    |
    +----> pt_msix_disable (ioemu)
            |
            +----> xc_domain_unbind_msi_irq (ioemu)
            |       |
            |       +----> do_domctl (xen) ----> arch_do_domctl (xen) ----> pt_irq_destroy_bind_vtd (xen)
            |              |
            |              +----> unmap_domain_pirq_emuirq (xen)  //freed pirq_to_emuirq
            |
            +----> xc_physdev_unmap_pirq (ioemu)
                   |
                   +----> do_physdev_op (xen) 
                           |
                           +----> physdev_unmap_pirq (xen)
                                   |
                                   +----> unmap_domain_pirq_emuirq (xen)  //found pirq_to_emuirq already freed, abort
                                   |
                                   +----> unmap_domain_pirq (xen)    //not called

The code path you mentioned is not taken for a VF device, as its ptdev->machine_irq is 0.


-----Original Message-----
From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com] 
Sent: Wednesday, January 26, 2011 7:11 PM
To: Zhang, Fengzhe
Cc: xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] [PATCH]vtd: Fix for irq bind failure after PCI attaching 32 times

On Wed, 26 Jan 2011, Zhang, Fengzhe wrote:
> vtd: Fix for irq bind failure after PCI attaching 32 times
> 
> Originally, when detaching a PCI device, pirq_to_emuirq and pirq_to_irq were freed via the hypercall do_physdev_op. Later, duplicated logic was added to pt_irq_destroy_bind_vtd to free pirq_to_emuirq, but not pirq_to_irq. As a result, do_physdev_op fails to free both the emuirq and the irq. After attaching a PCI device 32 times, irq resources run out. This patch removes the redundant logic.
> 
> Signed-off-by: Fengzhe Zhang <fengzhe.zhang@intel.com>
> 

It looks OK in principle, but if the theory is that we should always
call xc_physdev_unmap_pirq after xc_domain_unbind_pt_irq, I can find an
instance of xc_domain_unbind_pt_irq without any corresponding
xc_physdev_unmap_pirq.

Take a look at hw/pass-through.c:pt_reset_interrupt_and_io_mapping in
qemu:

if (ptdev->msi_trans_en == 0 && ptdev->machine_irq)
{
    if (xc_domain_unbind_pt_irq(xc_handle, domid, ptdev->machine_irq,
                    PT_IRQ_TYPE_PCI, 0, e_device, e_intx, 0))
        PT_LOG("Error: Unbinding of interrupt failed!\n");
}

but there is no following xc_physdev_unmap_pirq if MSI and MSIX are
disabled.


* RE: [PATCH]vtd: Fix for irq bind failure after PCI attaching 32 times
  2011-01-27  7:39   ` Zhang, Fengzhe
@ 2011-01-27 11:16     ` Stefano Stabellini
  2011-02-02 17:35     ` Stefano Stabellini
  1 sibling, 0 replies; 10+ messages in thread
From: Stefano Stabellini @ 2011-01-27 11:16 UTC (permalink / raw)
  To: Zhang, Fengzhe; +Cc: xen-devel, Zhang, Xiantao, Stefano Stabellini

On Thu, 27 Jan 2011, Zhang, Fengzhe wrote:
> Hi, Stefano,
> 
> Here is the call graph that causes the bug:
> 
> unregister_real_device (ioemu)
>     |
>     +----> pt_msix_disable (ioemu)
>             |
>             +----> xc_domain_unbind_msi_irq (ioemu)
>             |       |
>             |       +----> do_domctl (xen) ----> arch_do_domctl (xen) ----> pt_irq_destroy_bind_vtd (xen)
>             |              |
>             |              +----> unmap_domain_pirq_emuirq (xen)  //freed pirq_to_emuirq
>             |
>             +----> xc_physdev_unmap_pirq (ioemu)
>                    |
>                    +----> do_physdev_op (xen) 
>                            |
>                            +----> physdev_unmap_pirq (xen)
>                                    |
>                                    +----> unmap_domain_pirq_emuirq (xen)  //found pirq_to_emuirq already freed, abort
>                                    |
>                                    +----> unmap_domain_pirq (xen)    //not called
> 
> The code path you mentioned is not taken for a VF device, as its ptdev->machine_irq is 0.
> 

Thank you for the clarification. I think your patch is correct and
should be applied.


I was just wondering whether it is possible to think of another scenario
that would trigger the same kind of bug by calling
pt_reset_interrupt_and_io_mapping several times in ioemu, because it
seems to me that there is no xc_physdev_unmap_pirq call there if
ptdev->machine_irq != 0.


* RE: [PATCH]vtd: Fix for irq bind failure after PCI attaching 32 times
  2011-01-27  7:39   ` Zhang, Fengzhe
  2011-01-27 11:16     ` Stefano Stabellini
@ 2011-02-02 17:35     ` Stefano Stabellini
  2011-02-03 15:22       ` Stefano Stabellini
  1 sibling, 1 reply; 10+ messages in thread
From: Stefano Stabellini @ 2011-02-02 17:35 UTC (permalink / raw)
  To: Zhang, Fengzhe; +Cc: xen-devel, Zhang, Xiantao, Stefano Stabellini

On Thu, 27 Jan 2011, Zhang, Fengzhe wrote:
> Hi, Stefano,
> 
> Here is the call graph that causes the bug:
> 
> unregister_real_device (ioemu)
>     |
>     +----> pt_msix_disable (ioemu)
>             |
>             +----> xc_domain_unbind_msi_irq (ioemu)
>             |       |
>             |       +----> do_domctl (xen) ----> arch_do_domctl (xen) ----> pt_irq_destroy_bind_vtd (xen)
>             |              |
>             |              +----> unmap_domain_pirq_emuirq (xen)  //freed pirq_to_emuirq
>             |
>             +----> xc_physdev_unmap_pirq (ioemu)
>                    |
>                    +----> do_physdev_op (xen) 
>                            |
>                            +----> physdev_unmap_pirq (xen)
>                                    |
>                                    +----> unmap_domain_pirq_emuirq (xen)  //found pirq_to_emuirq already freed, abort
>                                    |
>                                    +----> unmap_domain_pirq (xen)    //not called
> 
> The code path you mentioned is not taken for a VF device, as its ptdev->machine_irq is 0.

It has just occurred to me that all this only happens with guest
cooperation: unregister_real_device is called on pci hotplug in response
to the guest's action.
That means that a guest that doesn't support pci hot-unplug (or a
malicious guest) won't do anything in response to the acpi SCI interrupt
we send, so unregister_real_device will never be called and we will be
leaking MSIs in the host!

Of course we could solve this by adding a new xenstore command to qemu
that calls unregister_real_device directly, but it seems to me that
relying on qemu to free hypervisor/dom0 resources is not a good idea.

Xen knows all the pirqs remapped to this domain, so wouldn't it be
possible for Xen to call pt_irq_destroy_bind_vtd and physdev_unmap_pirq
on domain_kill?
I think that Xen shouldn't leak pirqs no matter what the toolstack or
qemu do.


* RE: [PATCH]vtd: Fix for irq bind failure after PCI attaching 32 times
  2011-02-02 17:35     ` Stefano Stabellini
@ 2011-02-03 15:22       ` Stefano Stabellini
  2011-02-03 16:54         ` Ian Jackson
                           ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Stefano Stabellini @ 2011-02-03 15:22 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Zhang, Xiantao, Zhang, Fengzhe

On Wed, 2 Feb 2011, Stefano Stabellini wrote:
> On Thu, 27 Jan 2011, Zhang, Fengzhe wrote:
> > Hi, Stefano,
> > 
> > Here is the call graph that causes the bug:
> > 
> > unregister_real_device (ioemu)
> >     |
> >     +----> pt_msix_disable (ioemu)
> >             |
> >             +----> xc_domain_unbind_msi_irq (ioemu)
> >             |       |
> >             |       +----> do_domctl (xen) ----> arch_do_domctl (xen) ----> pt_irq_destroy_bind_vtd (xen)
> >             |              |
> >             |              +----> unmap_domain_pirq_emuirq (xen)  //freed pirq_to_emuirq
> >             |
> >             +----> xc_physdev_unmap_pirq (ioemu)
> >                    |
> >                    +----> do_physdev_op (xen) 
> >                            |
> >                            +----> physdev_unmap_pirq (xen)
> >                                    |
> >                                    +----> unmap_domain_pirq_emuirq (xen)  //found pirq_to_emuirq already freed, abort
> >                                    |
> >                                    +----> unmap_domain_pirq (xen)    //not called
> > 
> > The code path you mentioned is not taken for a VF device, as its ptdev->machine_irq is 0.
> 
> It has just occurred to me that all this only happens with guest
> cooperation: unregister_real_device is called on pci hotplug in response
> to the guest's action.
> That means that a guest that doesn't support pci hot-unplug (or a
> malicious guest) won't do anything in response to the acpi SCI interrupt
> we send, so unregister_real_device will never be called and we will be
> leaking MSIs in the host!
> 
> Of course we could solve this by adding a new xenstore command to qemu
> that calls unregister_real_device directly, but it seems to me that
> relying on qemu to free hypervisor/dom0 resources is not a good idea.
> 
> Xen knows all the pirqs remapped to this domain, so wouldn't it be
> possible for Xen to call pt_irq_destroy_bind_vtd and physdev_unmap_pirq
> on domain_kill?
> I think that Xen shouldn't leak pirqs no matter what the toolstack or
> qemu do.
> 

Actually, it looks like Xen is cleaning up after itself:

arch_domain_destroy
        |
        +--> pci_release_devices
        |            |
        |            +--> pci_cleanup_msi
        |                   |
        |                   +--> msi_free_irq
        |                           |
        |                           +--> iommu_update_ire_from_msi
        |                               (should clean the vtd binding, like pt_irq_destroy_bind_vtd)
        |            
        |            
        +--> free_domain_pirqs
                     |
                     +--> unmap_domain_pirq


So it doesn't actually matter whether the guest supports pci hotplug,
because even if it doesn't, Xen won't leak any resources.
Am I right? Could you please confirm this?


* RE: [PATCH]vtd: Fix for irq bind failure after PCI attaching 32 times
  2011-02-03 15:22       ` Stefano Stabellini
@ 2011-02-03 16:54         ` Ian Jackson
  2011-02-14 14:17         ` Stefano Stabellini
  2011-02-25  1:41         ` Kay, Allen M
  2 siblings, 0 replies; 10+ messages in thread
From: Ian Jackson @ 2011-02-03 16:54 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Zhang, Xiantao, Zhang, Fengzhe

Stefano Stabellini writes ("RE: [Xen-devel] [PATCH]vtd: Fix for irq bind failure after PCI attaching 32 times"):
> so it doesn't actually matter if the guest supports pci hotplug or not,
> because if it doesn't, xen won't leak any resources anyway.
> Am I right? Could you please confirm this?

Is there some way to get a list of the MSIs etc. which have been
reserved/allocated?  If so, my automatic test system could check that
they aren't leaked.

(Although this will be of more interest when I make it do some testing
of pci passthrough...)

Ian.


* RE: [PATCH]vtd: Fix for irq bind failure after PCI attaching 32 times
  2011-02-03 15:22       ` Stefano Stabellini
  2011-02-03 16:54         ` Ian Jackson
@ 2011-02-14 14:17         ` Stefano Stabellini
  2011-02-25  1:41         ` Kay, Allen M
  2 siblings, 0 replies; 10+ messages in thread
From: Stefano Stabellini @ 2011-02-14 14:17 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Zhang, Xiantao, Zhang, Fengzhe

ping?

On Thu, 3 Feb 2011, Stefano Stabellini wrote:
> On Wed, 2 Feb 2011, Stefano Stabellini wrote:
> > On Thu, 27 Jan 2011, Zhang, Fengzhe wrote:
> > > Hi, Stefano,
> > > 
> > > Here is the call graph that causes the bug:
> > > 
> > > unregister_real_device (ioemu)
> > >     |
> > >     +----> pt_msix_disable (ioemu)
> > >             |
> > >             +----> xc_domain_unbind_msi_irq (ioemu)
> > >             |       |
> > >             |       +----> do_domctl (xen) ----> arch_do_domctl (xen) ----> pt_irq_destroy_bind_vtd (xen)
> > >             |              |
> > >             |              +----> unmap_domain_pirq_emuirq (xen)  //freed pirq_to_emuirq
> > >             |
> > >             +----> xc_physdev_unmap_pirq (ioemu)
> > >                    |
> > >                    +----> do_physdev_op (xen) 
> > >                            |
> > >                            +----> physdev_unmap_pirq (xen)
> > >                                    |
> > >                                    +----> unmap_domain_pirq_emuirq (xen)  //found pirq_to_emuirq already freed, abort
> > >                                    |
> > >                                    +----> unmap_domain_pirq (xen)    //not called
> > > 
> > > The code path you mentioned is not taken for a VF device, as its ptdev->machine_irq is 0.
> > 
> > It has just occurred to me that all this only happens with guest
> > cooperation: unregister_real_device is called on pci hotplug in response
> > to the guest's action.
> > That means that a guest that doesn't support pci hot-unplug (or a
> > malicious guest) won't do anything in response to the acpi SCI interrupt
> > we send, so unregister_real_device will never be called and we will be
> > leaking MSIs in the host!
> > 
> > Of course we could solve this by adding a new xenstore command to qemu
> > that calls unregister_real_device directly, but it seems to me that
> > relying on qemu to free hypervisor/dom0 resources is not a good idea.
> > 
> > Xen knows all the pirqs remapped to this domain, so wouldn't it be
> > possible for Xen to call pt_irq_destroy_bind_vtd and physdev_unmap_pirq
> > on domain_kill?
> > I think that Xen shouldn't leak pirqs no matter what the toolstack or
> > qemu do.
> > 
> 
> actually it looks like xen is cleaning up after itself:
> 
> arch_domain_destroy
>         |
>         +--> pci_release_devices
>         |            |
>         |            +--> pci_cleanup_msi
>         |                   |
>         |                   +--> msi_free_irq
>         |                           |
>         |                           +--> iommu_update_ire_from_msi
>         |                               (should clean the vtd binding, like pt_irq_destroy_bind_vtd)
>         |            
>         |            
>         +--> free_domain_pirqs
>                      |
>                      +--> unmap_domain_pirq
> 
> 
> so it doesn't actually matter if the guest supports pci hotplug or not,
> because if it doesn't, xen won't leak any resources anyway.
> Am I right? Could you please confirm this?


* RE: [PATCH]vtd: Fix for irq bind failure after PCI attaching 32 times
  2011-02-03 15:22       ` Stefano Stabellini
  2011-02-03 16:54         ` Ian Jackson
  2011-02-14 14:17         ` Stefano Stabellini
@ 2011-02-25  1:41         ` Kay, Allen M
  2 siblings, 0 replies; 10+ messages in thread
From: Kay, Allen M @ 2011-02-25  1:41 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Zhang, Xiantao, Zhang, Fengzhe

>        |                           +--> iommu_update_ire_from_msi
>        |                               (should clean the vtd binding, like pt_irq_destroy_bind_vtd)

Stefano, with VT-d hardware, iommu_update_ire_from_msi() maps to intremap.c/msi_msg_write_remap_rte().  This function deals strictly with VT-d interrupt-remapping hardware tasks; it does not do any irq cleanup for Xen.

Allen

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Stefano Stabellini
Sent: Thursday, February 03, 2011 7:23 AM
To: Stefano Stabellini
Cc: xen-devel@lists.xensource.com; Zhang, Xiantao; Zhang, Fengzhe
Subject: RE: [Xen-devel] [PATCH]vtd: Fix for irq bind failure after PCI attaching 32 times

On Wed, 2 Feb 2011, Stefano Stabellini wrote:
> On Thu, 27 Jan 2011, Zhang, Fengzhe wrote:
> > Hi, Stefano,
> > 
> > Here is the call graph that causes the bug:
> > 
> > unregister_real_device (ioemu)
> >     |
> >     +----> pt_msix_disable (ioemu)
> >             |
> >             +----> xc_domain_unbind_msi_irq (ioemu)
> >             |       |
> >             |       +----> do_domctl (xen) ----> arch_do_domctl (xen) ----> pt_irq_destroy_bind_vtd (xen)
> >             |              |
> >             |              +----> unmap_domain_pirq_emuirq (xen)  //freed pirq_to_emuirq
> >             |
> >             +----> xc_physdev_unmap_pirq (ioemu)
> >                    |
> >                    +----> do_physdev_op (xen) 
> >                            |
> >                            +----> physdev_unmap_pirq (xen)
> >                                    |
> >                                    +----> unmap_domain_pirq_emuirq (xen)  //found pirq_to_emuirq already freed, abort
> >                                    |
> >                                    +----> unmap_domain_pirq (xen)    //not called
> > 
> > The code path you mentioned is not taken for a VF device, as its ptdev->machine_irq is 0.
> 
> It has just occurred to me that all this only happens with guest
> cooperation: unregister_real_device is called on pci hotplug in response
> to the guest's action.
> That means that a guest that doesn't support pci hot-unplug (or a
> malicious guest) won't do anything in response to the acpi SCI interrupt
> we send, so unregister_real_device will never be called and we will be
> leaking MSIs in the host!
> 
> Of course we could solve this by adding a new xenstore command to qemu
> that calls unregister_real_device directly, but it seems to me that
> relying on qemu to free hypervisor/dom0 resources is not a good idea.
> 
> Xen knows all the pirqs remapped to this domain, so wouldn't it be
> possible for Xen to call pt_irq_destroy_bind_vtd and physdev_unmap_pirq
> on domain_kill?
> I think that Xen shouldn't leak pirqs no matter what the toolstack or
> qemu do.
> 

actually it looks like xen is cleaning up after itself:

arch_domain_destroy
        |
        +--> pci_release_devices
        |            |
        |            +--> pci_cleanup_msi
        |                   |
        |                   +--> msi_free_irq
        |                           |
        |                           +--> iommu_update_ire_from_msi
        |                               (should clean the vtd binding, like pt_irq_destroy_bind_vtd)
        |            
        |            
        +--> free_domain_pirqs
                     |
                     +--> unmap_domain_pirq


so it doesn't actually matter if the guest supports pci hotplug or not,
because if it doesn't, xen won't leak any resources anyway.
Am I right? Could you please confirm this?



end of thread, other threads:[~2011-02-25  1:41 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
2011-01-26  8:02 [PATCH]vtd: Fix for irq bind failure after PCI attaching 32 times Zhang, Fengzhe
2011-01-26  8:41 ` Keir Fraser
2011-01-26 11:10 ` Stefano Stabellini
2011-01-27  7:39   ` Zhang, Fengzhe
2011-01-27 11:16     ` Stefano Stabellini
2011-02-02 17:35     ` Stefano Stabellini
2011-02-03 15:22       ` Stefano Stabellini
2011-02-03 16:54         ` Ian Jackson
2011-02-14 14:17         ` Stefano Stabellini
2011-02-25  1:41         ` Kay, Allen M
