All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
@ 2016-01-07  1:39 Xu, Quan
  2016-01-07 13:28 ` Jan Beulich
  0 siblings, 1 reply; 35+ messages in thread
From: Xu, Quan @ 2016-01-07  1:39 UTC (permalink / raw)
  To: jbeulich
  Cc: Tian, Kevin, Wu, Feng, Xu, Quan, george.dunlap, tim, Dong, Eddie,
	xen-devel, Nakajima, Jun, andrew.cooper3, keir

On January 06, 2016 7:26 PM, <quan.xu@intel.com> wrote: 
> > > diff --git a/xen/drivers/passthrough/vtd/qinval.c
> > > b/xen/drivers/passthrough/vtd/qinval.c
> > > index b227e4e..7330c5d 100644
> > > --- a/xen/drivers/passthrough/vtd/qinval.c
> > > +++ b/xen/drivers/passthrough/vtd/qinval.c
> > > @@ -190,9 +190,19 @@ static int queue_invalidate_wait(struct iommu
> > > *iommu,  static int invalidate_sync(struct iommu *iommu)  {
> > >      struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu);
> > > +    int rc;
> > >
> > >      if ( qi_ctrl->qinval_maddr )
> > > -        return queue_invalidate_wait(iommu, 0, 1, 1);
> > > +    {
> > > +        rc = queue_invalidate_wait(iommu, 0, 1, 1);
> > > +        if ( rc )
> > > +        {
> > > +            if ( rc == -ETIMEDOUT )
> > > +                panic("Queue invalidate wait descriptor was
> > timeout.\n");
> > > +            return rc;
> > > +        }
> > > +    }
> > > +
> >
> > why do you still do panic here? w/ PATCH 1/3 you should return rc to
> > caller directly, right?
> >
> 
> This panic is original in queue_invalidate_wait(). Patch 1/3 is just for Device-TLB
> flush error ( Actually it covers iotlb flush error).
> If I fix VT-d iotlb/context/iec flush error to propagated callers, then I can
> remove this panic and return rc to propagated caller directly.
> 
> 
> 
> > > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
> > > +                                         u16 seg, u8 bus, u8
> devfn)
> > {
> > > +    struct domain *d;
> > > +    struct pci_dev *pdev;
> > > +
> > > +    d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> > > +    ASSERT(d);
> > > +    for_each_pdev(d, pdev)
> > > +    {
> > > +        if ( (pdev->seg == seg) &&
> > > +             (pdev->bus == bus) &&
> > > +             (pdev->devfn == devfn) )
> > > +        {
> > > +            if ( pdev->domain )
> > > +            {
> > > +                list_del(&pdev->domain_list);
> > > +                pdev->domain = NULL;
> > > +            }
> > > +
> > > +            if ( pci_hide_device(bus, devfn) )
> > > +            {
> > > +                printk(XENLOG_ERR
> > > +                       "IOMMU hide device %04x:%02x:%02x error.",
> > > +                       seg, bus, devfn);
> > > +                break;
> > > +            }
> >
> > I'm not sure whether using pci_hide_device is enough or not break
> > other code path here? For example, now you have errors propagated to
> callers.
> 
> More information, after call pci_hide_device(), ..
>          pdev->domain = dom_xen;
>          list_add(&pdev->domain_list, &dom_xen->arch.pdev_list); ..
> 
> Hidden PCI devices are associated with 'dom_xen'.
> Now I found it may not clear rmrr mapping / context mapping. .i.e
> iommu_remove_device() -- [hd->platform_ops is NULL for dom_xen].
> But I think it is acceptable, IMO dom_xen is a part hypervisor, and there
> mappings would not a security issue.
> Or I can remove these mappings before to hide this device.
> 
> So I think it will not break other code break other code.
> 
> > What's the
> > final policy against flush error?
> >
> 
> If Device-TLB flush is timeout, we'll hide the target ATS device and crash the
> domain owning this ATS device.
> If impacted domain is hardware domain, just throw out a warning, instead of
> crash the hardware domain.
> The hided Device will be disallowed to be further assigned to any domain.
> 
> 
> 
> >If they will finally go to cleanup some more state  relying on
> >pdev->domain, then that code path might be broken. If it's fine to do
> >cleanup at this point, then just hidden is not enough. If you look at
> >pci_remove_device, there's quite some state to be further cleaned up...
> >
> 
> 
> As the pdev->domain is 'dom_xen';
> So for this case, it is still working. Correct me if it is not correct.
> BTW, a quick question, which case to call pci_remove_device()?
> 
> 
> > I didn't think about the full logic thoroughly now. But it would
> > always be good to not hide device now until a point where all related
> > states have been cleaned up in error handling path chained up.
> >

Jan, could you help me to double check it? thanks.

Quan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-07  1:39 [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue Xu, Quan
@ 2016-01-07 13:28 ` Jan Beulich
  2016-01-07 13:46   ` Xu, Quan
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2016-01-07 13:28 UTC (permalink / raw)
  To: Quan Xu
  Cc: tim, Kevin Tian, Feng Wu, george.dunlap, andrew.cooper3,
	Eddie Dong, xen-devel, Jun Nakajima, keir

>>> On 07.01.16 at 02:39, <quan.xu@intel.com> wrote:
> On January 06, 2016 7:26 PM, <quan.xu@intel.com> wrote: 
>> > I didn't think about the full logic thoroughly now. But it would
>> > always be good to not hide device now until a point where all related
>> > states have been cleaned up in error handling path chained up.
>> >
> 
> Jan, could you help me to double check it? thanks.

I'm not sure I understand what you want or need, the more that
I didn't even get around to look at the patches yet.

Jan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-07 13:28 ` Jan Beulich
@ 2016-01-07 13:46   ` Xu, Quan
  2016-01-08  1:06     ` Xu, Quan
  2016-01-13  6:12     ` Tian, Kevin
  0 siblings, 2 replies; 35+ messages in thread
From: Xu, Quan @ 2016-01-07 13:46 UTC (permalink / raw)
  To: Jan Beulich, Tian, Kevin
  Cc: keir, george.dunlap, andrew.cooper3, tim, xen-devel, Nakajima,
	Jun, Wu, Feng

> On January 07, 2016 9:28 PM, <JBeulich@suse.com> wrote:
> >>> On 07.01.16 at 02:39, <quan.xu@intel.com> wrote:
> > On January 06, 2016 7:26 PM, <quan.xu@intel.com> wrote:
> >> > I didn't think about the full logic thoroughly now. But it would
> >> > always be good to not hide device now until a point where all
> >> > related states have been cleaned up in error handling path chained up.
> >> >
> >
> > Jan, could you help me to double check it? thanks.
> 
> I'm not sure I understand what you want or need, the more that I didn't even
> get around to look at the patches yet.
> 

Jan, 
Patch 2/3 and Patch 3/3 are based on v3 (Actually they are v3's patch 1/2 and patch 2/2).
We have discussed how to hide a device with pci_hide_device() when Device-TLB flush is error. 

Now there are 2 concerns:
      1. Hide the PCI device may break the code path. (then the pdev->domain is dom_xen)
      2. Is the blew logic right?
           --If Device-TLB flush is timeout, we'll hide the target ATS device and crash the domain owning this ATS device.
             If impacted domain is hardware domain, just throw out a warning, instead of crash the hardware domain.
            The hided Device will be disallowed to be further assigned to any domain.

Kevin, correct me if I am wrong.


Quan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-07 13:46   ` Xu, Quan
@ 2016-01-08  1:06     ` Xu, Quan
  2016-01-13  6:12     ` Tian, Kevin
  1 sibling, 0 replies; 35+ messages in thread
From: Xu, Quan @ 2016-01-08  1:06 UTC (permalink / raw)
  To: Xu, Quan, Jan Beulich, Tian, Kevin
  Cc: keir, george.dunlap, andrew.cooper3, tim, xen-devel, Nakajima,
	Jun, Wu, Feng

On January 07, 2016 9:47 PM, <quan.xu@intel.com> wrote:
> Jan,
> Patch 2/3 and Patch 3/3 are based on v3 (Actually they are v3's patch 1/2 and
> patch 2/2).
> We have discussed how to hide a device with pci_hide_device() when
> Device-TLB flush is error.
> 
> Now there are 2 concerns:
>       1. Hide the PCI device may break the code path. (then the pdev->domain
> is dom_xen)
>       2. Is the blew logic right?
>            --If Device-TLB flush is timeout, we'll hide the target ATS device
> and crash the domain owning this ATS device.
>              If impacted domain is hardware domain, just throw out a
> warning, instead of crash the hardware domain.
>             The hided Device will be disallowed to be further assigned to any

Sorry, just correct it. 'hided Device' -> 'hidden device'. 



> domain.
> 
> Kevin, correct me if I am wrong.
> 
> 
> Quan
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-07 13:46   ` Xu, Quan
  2016-01-08  1:06     ` Xu, Quan
@ 2016-01-13  6:12     ` Tian, Kevin
  2016-01-13 11:02       ` Jan Beulich
  1 sibling, 1 reply; 35+ messages in thread
From: Tian, Kevin @ 2016-01-13  6:12 UTC (permalink / raw)
  To: Xu, Quan, Jan Beulich
  Cc: keir, george.dunlap, andrew.cooper3, tim, xen-devel, Nakajima,
	Jun, Wu, Feng

> From: Xu, Quan
> Sent: Thursday, January 07, 2016 9:47 PM
> 
> > On January 07, 2016 9:28 PM, <JBeulich@suse.com> wrote:
> > >>> On 07.01.16 at 02:39, <quan.xu@intel.com> wrote:
> > > On January 06, 2016 7:26 PM, <quan.xu@intel.com> wrote:
> > >> > I didn't think about the full logic thoroughly now. But it would
> > >> > always be good to not hide device now until a point where all
> > >> > related states have been cleaned up in error handling path chained up.
> > >> >
> > >
> > > Jan, could you help me to double check it? thanks.
> >
> > I'm not sure I understand what you want or need, the more that I didn't even
> > get around to look at the patches yet.
> >
> 
> Jan,
> Patch 2/3 and Patch 3/3 are based on v3 (Actually they are v3's patch 1/2 and patch 2/2).
> We have discussed how to hide a device with pci_hide_device() when Device-TLB flush is
> error.
> 
> Now there are 2 concerns:
>       1. Hide the PCI device may break the code path. (then the pdev->domain is
> dom_xen)
>       2. Is the blew logic right?
>            --If Device-TLB flush is timeout, we'll hide the target ATS device and crash the
> domain owning this ATS device.
>              If impacted domain is hardware domain, just throw out a warning, instead of
> crash the hardware domain.
>             The hided Device will be disallowed to be further assigned to any domain.
> 
> Kevin, correct me if I am wrong.
> 
> 

for 2, yes it's the policy we agreed in previous discussion.

for 1, after more thinking I think the concern is real. pci_hide_device
is used once in early boot-up phase. At that time, it's simple to just
have right owner configured. However in the path of normal device
assign/deassign, there are tons of more state associated which may
be related to the owner. Though we may do some special handling
in related code paths to have dom_xen specially handled, it's way
tricky and not easy to maintain.

I think the cleaner solution, similar to your earlier version, is to
set a flag and then continue existing calling chains with all required
error handling completed. Only at that place we can safely invoke
pci_hide_device. If outmost callers are scattered, we may do a 
lazy hide until next time when it's assigned to another guest while
the new flag is recognized.

Jan, your comments?

Thanks
Kevin

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-13  6:12     ` Tian, Kevin
@ 2016-01-13 11:02       ` Jan Beulich
  2016-01-14  1:22         ` Tian, Kevin
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2016-01-13 11:02 UTC (permalink / raw)
  To: Kevin Tian, Quan Xu
  Cc: keir, george.dunlap, andrew.cooper3, tim, xen-devel,
	Jun Nakajima, Feng Wu

>>> On 13.01.16 at 07:12, <kevin.tian@intel.com> wrote:
>>  From: Xu, Quan
>> Sent: Thursday, January 07, 2016 9:47 PM
>> 
>> > On January 07, 2016 9:28 PM, <JBeulich@suse.com> wrote:
>> > >>> On 07.01.16 at 02:39, <quan.xu@intel.com> wrote:
>> > > On January 06, 2016 7:26 PM, <quan.xu@intel.com> wrote:
>> > >> > I didn't think about the full logic thoroughly now. But it would
>> > >> > always be good to not hide device now until a point where all
>> > >> > related states have been cleaned up in error handling path chained up.
>> > >> >
>> > >
>> > > Jan, could you help me to double check it? thanks.
>> >
>> > I'm not sure I understand what you want or need, the more that I didn't 
> even
>> > get around to look at the patches yet.
>> >
>> 
>> Jan,
>> Patch 2/3 and Patch 3/3 are based on v3 (Actually they are v3's patch 1/2 
> and patch 2/2).
>> We have discussed how to hide a device with pci_hide_device() when Device-TLB 
> flush is
>> error.
>> 
>> Now there are 2 concerns:
>>       1. Hide the PCI device may break the code path. (then the pdev->domain 
> is
>> dom_xen)
>>       2. Is the blew logic right?
>>            --If Device-TLB flush is timeout, we'll hide the target ATS device 
> and crash the
>> domain owning this ATS device.
>>              If impacted domain is hardware domain, just throw out a 
> warning, instead of
>> crash the hardware domain.
>>             The hided Device will be disallowed to be further assigned to 
> any domain.
>> 
>> Kevin, correct me if I am wrong.
>> 
>> 
> 
> for 2, yes it's the policy we agreed in previous discussion.
> 
> for 1, after more thinking I think the concern is real. pci_hide_device
> is used once in early boot-up phase. At that time, it's simple to just
> have right owner configured. However in the path of normal device
> assign/deassign, there are tons of more state associated which may
> be related to the owner. Though we may do some special handling
> in related code paths to have dom_xen specially handled, it's way
> tricky and not easy to maintain.

I don't buy this without one of you pointing out the actual
difficulties: The domain is being crashed anyway, so there's no
true device de-assignment needed (IOMMU tables will get torn
down no matter what).

Jan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-13 11:02       ` Jan Beulich
@ 2016-01-14  1:22         ` Tian, Kevin
  2016-01-14  9:00           ` Jan Beulich
  0 siblings, 1 reply; 35+ messages in thread
From: Tian, Kevin @ 2016-01-14  1:22 UTC (permalink / raw)
  To: Jan Beulich, Xu, Quan
  Cc: keir, george.dunlap, andrew.cooper3, tim, xen-devel, Nakajima,
	Jun, Wu, Feng

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, January 13, 2016 7:03 PM
> 
> >> Jan,
> >> Patch 2/3 and Patch 3/3 are based on v3 (Actually they are v3's patch 1/2
> > and patch 2/2).
> >> We have discussed how to hide a device with pci_hide_device() when Device-TLB
> > flush is
> >> error.
> >>
> >> Now there are 2 concerns:
> >>       1. Hide the PCI device may break the code path. (then the pdev->domain
> > is
> >> dom_xen)
> >>       2. Is the blew logic right?
> >>            --If Device-TLB flush is timeout, we'll hide the target ATS device
> > and crash the
> >> domain owning this ATS device.
> >>              If impacted domain is hardware domain, just throw out a
> > warning, instead of
> >> crash the hardware domain.
> >>             The hided Device will be disallowed to be further assigned to
> > any domain.
> >>
> >> Kevin, correct me if I am wrong.
> >>
> >>
> >
> > for 2, yes it's the policy we agreed in previous discussion.
> >
> > for 1, after more thinking I think the concern is real. pci_hide_device
> > is used once in early boot-up phase. At that time, it's simple to just
> > have right owner configured. However in the path of normal device
> > assign/deassign, there are tons of more state associated which may
> > be related to the owner. Though we may do some special handling
> > in related code paths to have dom_xen specially handled, it's way
> > tricky and not easy to maintain.
> 
> I don't buy this without one of you pointing out the actual
> difficulties: The domain is being crashed anyway, so there's no
> true device de-assignment needed (IOMMU tables will get torn
> down no matter what).
> 

Here is one example of a quick glimpse:

static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
{
	[...]

    ret = domain_context_mapping(pdev->domain, devfn, pdev);
    if ( ret )
    {
        dprintk(XENLOG_ERR VTDPREFIX, "d%d: context mapping failed\n",
                pdev->domain->domain_id);
        return ret;
    }

If pdev->domain is changed inside due to pci_hide_device, the error
message will be inaccurate.

That is why I think we'd better hide device until current call chain
completes.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-14  1:22         ` Tian, Kevin
@ 2016-01-14  9:00           ` Jan Beulich
  2016-01-14  9:12             ` Tian, Kevin
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2016-01-14  9:00 UTC (permalink / raw)
  To: Kevin Tian, Quan Xu
  Cc: keir, george.dunlap, andrew.cooper3, tim, xen-devel,
	Jun Nakajima, Feng Wu

>>> On 14.01.16 at 02:22, <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Wednesday, January 13, 2016 7:03 PM
>> 
>> >> Jan,
>> >> Patch 2/3 and Patch 3/3 are based on v3 (Actually they are v3's patch 1/2
>> > and patch 2/2).
>> >> We have discussed how to hide a device with pci_hide_device() when 
> Device-TLB
>> > flush is
>> >> error.
>> >>
>> >> Now there are 2 concerns:
>> >>       1. Hide the PCI device may break the code path. (then the pdev->domain
>> > is
>> >> dom_xen)
>> >>       2. Is the blew logic right?
>> >>            --If Device-TLB flush is timeout, we'll hide the target ATS device
>> > and crash the
>> >> domain owning this ATS device.
>> >>              If impacted domain is hardware domain, just throw out a
>> > warning, instead of
>> >> crash the hardware domain.
>> >>             The hided Device will be disallowed to be further assigned to
>> > any domain.
>> >>
>> >> Kevin, correct me if I am wrong.
>> >>
>> >>
>> >
>> > for 2, yes it's the policy we agreed in previous discussion.
>> >
>> > for 1, after more thinking I think the concern is real. pci_hide_device
>> > is used once in early boot-up phase. At that time, it's simple to just
>> > have right owner configured. However in the path of normal device
>> > assign/deassign, there are tons of more state associated which may
>> > be related to the owner. Though we may do some special handling
>> > in related code paths to have dom_xen specially handled, it's way
>> > tricky and not easy to maintain.
>> 
>> I don't buy this without one of you pointing out the actual
>> difficulties: The domain is being crashed anyway, so there's no
>> true device de-assignment needed (IOMMU tables will get torn
>> down no matter what).
>> 
> 
> Here is one example of a quick glimpse:
> 
> static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
> {
> 	[...]
> 
>     ret = domain_context_mapping(pdev->domain, devfn, pdev);
>     if ( ret )
>     {
>         dprintk(XENLOG_ERR VTDPREFIX, "d%d: context mapping failed\n",
>                 pdev->domain->domain_id);
>         return ret;
>     }
> 
> If pdev->domain is changed inside due to pci_hide_device, the error
> message will be inaccurate.

But that's a thing trivial to fix up. You had talked about tricky things...

Jan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-14  9:00           ` Jan Beulich
@ 2016-01-14  9:12             ` Tian, Kevin
  2016-01-14 10:46               ` Xu, Quan
  0 siblings, 1 reply; 35+ messages in thread
From: Tian, Kevin @ 2016-01-14  9:12 UTC (permalink / raw)
  To: Jan Beulich, Xu, Quan
  Cc: keir, george.dunlap, andrew.cooper3, tim, xen-devel, Nakajima,
	Jun, Wu, Feng

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, January 14, 2016 5:00 PM
> 
> >>> On 14.01.16 at 02:22, <kevin.tian@intel.com> wrote:
> >>  From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Wednesday, January 13, 2016 7:03 PM
> >>
> >> >> Jan,
> >> >> Patch 2/3 and Patch 3/3 are based on v3 (Actually they are v3's patch 1/2
> >> > and patch 2/2).
> >> >> We have discussed how to hide a device with pci_hide_device() when
> > Device-TLB
> >> > flush is
> >> >> error.
> >> >>
> >> >> Now there are 2 concerns:
> >> >>       1. Hide the PCI device may break the code path. (then the pdev->domain
> >> > is
> >> >> dom_xen)
> >> >>       2. Is the blew logic right?
> >> >>            --If Device-TLB flush is timeout, we'll hide the target ATS device
> >> > and crash the
> >> >> domain owning this ATS device.
> >> >>              If impacted domain is hardware domain, just throw out a
> >> > warning, instead of
> >> >> crash the hardware domain.
> >> >>             The hided Device will be disallowed to be further assigned to
> >> > any domain.
> >> >>
> >> >> Kevin, correct me if I am wrong.
> >> >>
> >> >>
> >> >
> >> > for 2, yes it's the policy we agreed in previous discussion.
> >> >
> >> > for 1, after more thinking I think the concern is real. pci_hide_device
> >> > is used once in early boot-up phase. At that time, it's simple to just
> >> > have right owner configured. However in the path of normal device
> >> > assign/deassign, there are tons of more state associated which may
> >> > be related to the owner. Though we may do some special handling
> >> > in related code paths to have dom_xen specially handled, it's way
> >> > tricky and not easy to maintain.
> >>
> >> I don't buy this without one of you pointing out the actual
> >> difficulties: The domain is being crashed anyway, so there's no
> >> true device de-assignment needed (IOMMU tables will get torn
> >> down no matter what).
> >>
> >
> > Here is one example of a quick glimpse:
> >
> > static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
> > {
> > 	[...]
> >
> >     ret = domain_context_mapping(pdev->domain, devfn, pdev);
> >     if ( ret )
> >     {
> >         dprintk(XENLOG_ERR VTDPREFIX, "d%d: context mapping failed\n",
> >                 pdev->domain->domain_id);
> >         return ret;
> >     }
> >
> > If pdev->domain is changed inside due to pci_hide_device, the error
> > message will be inaccurate.
> 
> But that's a thing trivial to fix up. You had talked about tricky things...
> 

It's not about how this specific thing can be fixed. I didn't check all the code. 
There could be more examples, and in the future all new code need to be
aware that the majority of IOMMU functions may have pdev->domain changed
due to error. My concern is more that it's not a good design. I think it's natural
to have a state changed only after all existing references in the call 
chain have been completed. Adding a new flag to delay hiding device looks much 
clearer to me.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-14  9:12             ` Tian, Kevin
@ 2016-01-14 10:46               ` Xu, Quan
  2016-01-18 15:32                 ` Tim Deegan
  0 siblings, 1 reply; 35+ messages in thread
From: Xu, Quan @ 2016-01-14 10:46 UTC (permalink / raw)
  To: tim, andrew.cooper3
  Cc: Tian, Kevin, keir, Nakajima, Jun, george.dunlap, xen-devel,
	Jan Beulich, Wu, Feng



> -----Original Message-----
> From: Tian, Kevin
> Sent: Thursday, January 14, 2016 5:13 PM
> To: Jan Beulich; Xu, Quan
> Cc: andrew.cooper3@citrix.com; george.dunlap@eu.citrix.com; Wu, Feng;
> Nakajima, Jun; xen-devel@lists.xen.org; keir@xen.org; tim@xen.org
> Subject: RE: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout
> issue.
> 

> > From: Jan Beulich [mailto:JBeulich@suse.com]
> > Sent: Thursday, January 14, 2016 5:00 PM
> >
> > >>> On 14.01.16 at 02:22, <kevin.tian@intel.com> wrote:
> > >>  From: Jan Beulich [mailto:JBeulich@suse.com]
> > >> Sent: Wednesday, January 13, 2016 7:03 PM
> > >>
> > >> >> Jan,
> > >> >> Patch 2/3 and Patch 3/3 are based on v3 (Actually they are v3's
> > >> >> patch 1/2
> > >> > and patch 2/2).
> > >> >> We have discussed how to hide a device with pci_hide_device()
> > >> >> when
> > > Device-TLB
> > >> > flush is
> > >> >> error.
> > >> >>
> > >> >> Now there are 2 concerns:
> > >> >>       1. Hide the PCI device may break the code path. (then the
> > >> >> pdev->domain
> > >> > is
> > >> >> dom_xen)
> > >> >>       2. Is the blew logic right?
> > >> >>            --If Device-TLB flush is timeout, we'll hide the
> > >> >> target ATS device
> > >> > and crash the
> > >> >> domain owning this ATS device.
> > >> >>              If impacted domain is hardware domain, just throw
> > >> >> out a
> > >> > warning, instead of
> > >> >> crash the hardware domain.
> > >> >>             The hided Device will be disallowed to be further
> > >> >> assigned to
> > >> > any domain.
> > >> >>
> > >> >> Kevin, correct me if I am wrong.
> > >> >>
> > >> >>
> > >> >
> > >> > for 2, yes it's the policy we agreed in previous discussion.
> > >> >
> > >> > for 1, after more thinking I think the concern is real.
> > >> > pci_hide_device is used once in early boot-up phase. At that
> > >> > time, it's simple to just have right owner configured. However in
> > >> > the path of normal device assign/deassign, there are tons of more
> > >> > state associated which may be related to the owner. Though we may
> > >> > do some special handling in related code paths to have dom_xen
> > >> > specially handled, it's way tricky and not easy to maintain.
> > >>
> > >> I don't buy this without one of you pointing out the actual
> > >> difficulties: The domain is being crashed anyway, so there's no
> > >> true device de-assignment needed (IOMMU tables will get torn down
> > >> no matter what).
> > >>
> > >
> > > Here is one example of a quick glimpse:
> > >
> > > static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev) {
> > > 	[...]
> > >
> > >     ret = domain_context_mapping(pdev->domain, devfn, pdev);
> > >     if ( ret )
> > >     {
> > >         dprintk(XENLOG_ERR VTDPREFIX, "d%d: context mapping
> failed\n",
> > >                 pdev->domain->domain_id);
> > >         return ret;
> > >     }
> > >
> > > If pdev->domain is changed inside due to pci_hide_device, the error
> > > message will be inaccurate.
> >
> > But that's a thing trivial to fix up. You had talked about tricky things...
> >
> 
> It's not about how this specific thing can be fixed. I didn't check all the code.
> There could be more examples, and in the future all new code need to be aware
> that the majority of IOMMU functions may have pdev->domain changed due to
> error. My concern is more that it's not a good design. I think it's natural to have
> a state changed only after all existing references in the call chain have been
> completed. Adding a new flag to delay hiding device looks much clearer to me.
> 

It looks we are not on the same page for how to hide a device.
Actually I have implemented these two ideas with pci_hide_device() or flag in previous versions.

Andrew and Tim, what's your opinion?


Quan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-14 10:46               ` Xu, Quan
@ 2016-01-18 15:32                 ` Tim Deegan
  2016-01-19  0:46                   ` Tian, Kevin
  2016-01-19  6:54                   ` Xu, Quan
  0 siblings, 2 replies; 35+ messages in thread
From: Tim Deegan @ 2016-01-18 15:32 UTC (permalink / raw)
  To: Xu, Quan
  Cc: Tian, Kevin, keir, Jan Beulich, george.dunlap, andrew.cooper3,
	xen-devel, Nakajima, Jun, Wu, Feng

At 10:46 +0000 on 14 Jan (1452768377), Xu, Quan wrote:
> > It's not about how this specific thing can be fixed. I didn't check all the code.
> > There could be more examples, and in the future all new code need to be aware
> > that the majority of IOMMU functions may have pdev->domain changed due to
> > error. My concern is more that it's not a good design. I think it's natural to have
> > a state changed only after all existing references in the call chain have been
> > completed. Adding a new flag to delay hiding device looks much clearer to me.
> 
> It looks we are not on the same page for how to hide a device.
> Actually I have implemented these two ideas with pci_hide_device()
> or flag in previous versions.
> 
> Andrew and Tim, what's your opinion?

I have no strong opinion, since this isn't my area (and really never
was - I worked on address translation more than device allocation).
Since you ask, I don't see anything wrong with hiding the device if
you already own all the locks, and I'd be inclined to make whatever
changes are necessary ASAP after the error.  Deferring the change (and
having all callers need to look for the deferred error) sounds
fragile.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-18 15:32                 ` Tim Deegan
@ 2016-01-19  0:46                   ` Tian, Kevin
  2016-01-19  6:54                   ` Xu, Quan
  1 sibling, 0 replies; 35+ messages in thread
From: Tian, Kevin @ 2016-01-19  0:46 UTC (permalink / raw)
  To: Tim Deegan, Xu, Quan
  Cc: keir, Nakajima, Jun, george.dunlap, andrew.cooper3, xen-devel,
	Jan Beulich, Wu, Feng

> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Monday, January 18, 2016 11:32 PM
> 
> At 10:46 +0000 on 14 Jan (1452768377), Xu, Quan wrote:
> > > It's not about how this specific thing can be fixed. I didn't check all the code.
> > > There could be more examples, and in the future all new code need to be aware
> > > that the majority of IOMMU functions may have pdev->domain changed due to
> > > error. My concern is more that it's not a good design. I think it's natural to have
> > > a state changed only after all existing references in the call chain have been
> > > completed. Adding a new flag to delay hiding device looks much clearer to me.
> >
> > It looks we are not on the same page for how to hide a device.
> > Actually I have implemented these two ideas with pci_hide_device()
> > or flag in previous versions.
> >
> > Andrew and Tim, what's your opinion?
> 
> I have no strong opinion, since this isn't my area (and really never
> was - I worked on address translation more than device allocation).
> Since you ask, I don't see anything wrong with hiding the device if
> you already own all the locks, and I'd be inclined to make whatever
> changes are necessary ASAP after the error.  Deferring the change (and
> having all callers need to look for the deferred error) sounds
> fragile.
> 

It doesn't need to have all callers do same deferred error check. It
can be done in next device assignment point. But I'm OK with going
Jan's proposal, i.e. hide device right when thing goes wrong, as long
as it is the common practice in Xen.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-18 15:32                 ` Tim Deegan
  2016-01-19  0:46                   ` Tian, Kevin
@ 2016-01-19  6:54                   ` Xu, Quan
  1 sibling, 0 replies; 35+ messages in thread
From: Xu, Quan @ 2016-01-19  6:54 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Tian, Kevin, keir, Jan Beulich, george.dunlap, andrew.cooper3,
	xen-devel, Nakajima, Jun, Wu, Feng

> On January 18, 2016 at 11:32pm, <tim@xen.org> wrote:
> At 10:46 +0000 on 14 Jan (1452768377), Xu, Quan wrote:
> > > It's not about how this specific thing can be fixed. I didn't check all the code.
> > > There could be more examples, and in the future all new code need to
> > > be aware that the majority of IOMMU functions may have pdev->domain
> > > changed due to error. My concern is more that it's not a good
> > > design. I think it's natural to have a state changed only after all
> > > existing references in the call chain have been completed. Adding a new flag
> to delay hiding device looks much clearer to me.
> >
> > It looks we are not on the same page for how to hide a device.
> > Actually I have implemented these two ideas with pci_hide_device() or
> > flag in previous versions.
> >
> > Andrew and Tim, what's your opinion?
> 
> I have no strong opinion, since this isn't my area (and really never was - I
> worked on address translation more than device allocation).
> Since you ask, I don't see anything wrong with hiding the device if you already
> own all the locks, and I'd be inclined to make whatever changes are necessary
> ASAP after the error.  Deferring the change (and having all callers need to look
> for the deferred error) sounds fragile.

Tim, thanks a lots. I always assumed that you know ALL of Xen..
I will follow current v4 implement and send out v5 ASAP.

- Quan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-27 14:29                                     ` Jan Beulich
@ 2016-01-27 14:35                                       ` Xu, Quan
  0 siblings, 0 replies; 35+ messages in thread
From: Xu, Quan @ 2016-01-27 14:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, george.dunlap, andrew.cooper3, tim, xen-devel

> On January 27, 2016 10:29pm, <JBeulich@suse.com> wrote:
> >>> On 27.01.16 at 15:13, <quan.xu@intel.com> wrote:
> >>  On January 27, 2016 at 9:15pm, <JBeulich@suse.com> wrote:
> >> >>> On 27.01.16 at 13:38, <quan.xu@intel.com> wrote:
> >> >>  On January 27, 2016 at 7:24pm, <JBeulich@suse.com> wrote:
> >> >> >>> On 27.01.16 at 12:09, <quan.xu@intel.com> wrote:
> >> >> >>  On January 27, 2016 at 6:48am, <Tian, Kevin> wrote:
> >> >> >> > From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >> >> > Sent: Tuesday, January 26, 2016 11:53 PM
> >> >> >
> >> >> >
> >> >> >> > Once again: Before getting started, please assess which route
> >> >> >> > is going to be the better one. Remember that we had already
> >> >> >> > discussed and put aside some form of deferring the hiding of
> >> >> >> > devices, so if you come back with a patch doing that again,
> >> >> >> > you'll have to be able to explain why the alternative(s) are worse.
> >> >> >> >
> >> >> >>
> >> >> >> Quan, could you list pros/cons of those alternatives based on
> >> >> >> discussion so
> >> >> far?
> >> >> >> Then we can decide which way should be done before you go to
> >> >> >> actual
> >> >> coding.
> >> >> >> Earlier suggestion on hiding device immediately is under the
> >> >> >> assumption that all locks have been held. If this part becomes
> >> >> >> too complex, and you can explain clearly that deferring the
> >> >> >> hiding action doesn't lead to any race condition, then people
> >> >> >> can see why you are
> >> >> proposing defer again.
> >> >> >
> >> >> >
> >> >> > The following are pros/cons of those alternatives. It is also
> >> >> > why I propose defer again.
> >> >> >
> >> >> > -- --
> >> >> > 1. Hiding the devices immediately
> >> >> > Pros:
> >> >> >      * it makes whatever changes are ASAP after the Device-TLB
> >> >> > flush
> >> error.
> >> >> >
> >> >> > Cons:
> >> >> >      * It may break the code path.
> >> >> >      * It may lead to any race condition.
> >> >> >      * Hiding the devices immediately is under the assumption
> >> >> > that all
> >> > locks
> >> >> have been held.
> >> >> >       Different locking state is possible for different call trees.
> >> >> > This
> >> > part
> >> >> becomes too complex.
> >> >>
> >> >> So you just repeat what you've already said before. "This part
> >> >> becomes too complex" you say without any kind of proof, yet that's
> >> >> what we need to understand whether the alternative of doing the
> >> >> locking correctly really is
> >> > this
> >> >> bad (and I continue to not see why it would).
> >> >
> >> >
> >> > Such as pcidevs_lock:
> >> >
> >> > 1. as I mentioned, it is indeed different locking state is possible
> >> > for different call trees of flush Device-TLB. When Device-TLB flush
> >> > is error, It is really challenge to judge whether to acquire the
> >> > pcidevs_lock or
> >> not.
> >> >
> >> >    For example,
> >> >    *It is _not_under_ lock for the following call tree:
> >> > $ flush_iotlb_qi()--- iommu_flush_iotlb_psi() --
> >> > __intel_iommu_iotlb_flush()
> >> > --intel_iommu_iotlb_flush() --iommu_iotlb_flush()
> >> > --xenmem_add_to_physmap()--do_memory_op()
> >> >
> >> >    *It is _under_ lock for the following call tree:
> >> > $flush_iotlb_qi()--iommu_flush_iotlb_dsi()--domain_context_unmap_on
> >> > e()
> >> > --domain_con
> >> > text_unmap()--reassign_device_ownership()--deassign_device()-iommu_
> >> > do_
> >> > pci_domctl()
> >> >
> >> > 2. if I try to acquire the pcidevs_lock for some _not_under_ lock
> >> > call tree, it makes things worse. As the pcidevs_lock is a big lock, then
> >> >   Frequent memory modification may block the pci-device assign due
> >> > to the pcidevs_lock. if I try to split the pcidevs_lock into small locks.
> >> >   It may takes a great deal of time to make it stable.
> >>
> >> I don't understand this, namely in the context of my suggestion to
> >> simply pass down a flag indicating whether the lock is being held
> >> (and hence acquiring it only in the most narrow scope if not already owning
> it).
> >>
> >
> > This is also an idea.
> > BTW, Does the lock refer to pcidevs_lock?
> 
> Yes, for now I assume that only that lock actually needs special attention.
> 

Thanks. I think so too.
-Quan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-27 14:13                                   ` Xu, Quan
@ 2016-01-27 14:29                                     ` Jan Beulich
  2016-01-27 14:35                                       ` Xu, Quan
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2016-01-27 14:29 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, george.dunlap, andrew.cooper3, tim, xen-devel

>>> On 27.01.16 at 15:13, <quan.xu@intel.com> wrote:
>>  On January 27, 2016 at 9:15pm, <JBeulich@suse.com> wrote:
>> >>> On 27.01.16 at 13:38, <quan.xu@intel.com> wrote:
>> >>  On January 27, 2016 at 7:24pm, <JBeulich@suse.com> wrote:
>> >> >>> On 27.01.16 at 12:09, <quan.xu@intel.com> wrote:
>> >> >>  On January 27, 2016 at 6:48am, <Tian, Kevin> wrote:
>> >> >> > From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> >> > Sent: Tuesday, January 26, 2016 11:53 PM
>> >> >
>> >> >
>> >> >> > Once again: Before getting started, please assess which route is
>> >> >> > going to be the better one. Remember that we had already
>> >> >> > discussed and put aside some form of deferring the hiding of
>> >> >> > devices, so if you come back with a patch doing that again,
>> >> >> > you'll have to be able to explain why the alternative(s) are worse.
>> >> >> >
>> >> >>
>> >> >> Quan, could you list pros/cons of those alternatives based on
>> >> >> discussion so
>> >> far?
>> >> >> Then we can decide which way should be done before you go to
>> >> >> actual
>> >> coding.
>> >> >> Earlier suggestion on hiding device immediately is under the
>> >> >> assumption that all locks have been held. If this part becomes too
>> >> >> complex, and you can explain clearly that deferring the hiding
>> >> >> action doesn't lead to any race condition, then people can see why
>> >> >> you are
>> >> proposing defer again.
>> >> >
>> >> >
>> >> > The following are pros/cons of those alternatives. It is also why I
>> >> > propose defer again.
>> >> >
>> >> > -- --
>> >> > 1. Hiding the devices immediately
>> >> > Pros:
>> >> >      * it makes whatever changes are ASAP after the Device-TLB flush
>> error.
>> >> >
>> >> > Cons:
>> >> >      * It may break the code path.
>> >> >      * It may lead to any race condition.
>> >> >      * Hiding the devices immediately is under the assumption that
>> >> > all
>> > locks
>> >> have been held.
>> >> >       Different locking state is possible for different call trees.
>> >> > This
>> > part
>> >> becomes too complex.
>> >>
>> >> So you just repeat what you've already said before. "This part
>> >> becomes too complex" you say without any kind of proof, yet that's
>> >> what we need to understand whether the alternative of doing the
>> >> locking correctly really is
>> > this
>> >> bad (and I continue to not see why it would).
>> >
>> >
>> > Such as pcidevs_lock:
>> >
>> > 1. as I mentioned, it is indeed different locking state is possible
>> > for different call trees of flush Device-TLB. When Device-TLB flush is
>> > error, It is really challenge to judge whether to acquire the pcidevs_lock or
>> not.
>> >
>> >    For example,
>> >    *It is _not_under_ lock for the following call tree:
>> > $ flush_iotlb_qi()--- iommu_flush_iotlb_psi() --
>> > __intel_iommu_iotlb_flush()
>> > --intel_iommu_iotlb_flush() --iommu_iotlb_flush()
>> > --xenmem_add_to_physmap()--do_memory_op()
>> >
>> >    *It is _under_ lock for the following call tree:
>> > $flush_iotlb_qi()--iommu_flush_iotlb_dsi()--domain_context_unmap_one()
>> > --domain_con
>> > text_unmap()--reassign_device_ownership()--deassign_device()-iommu_do_
>> > pci_domctl()
>> >
>> > 2. if I try to acquire the pcidevs_lock for some _not_under_ lock call
>> > tree, it makes things worse. As the pcidevs_lock is a big lock, then
>> >   Frequent memory modification may block the pci-device assign due to
>> > the pcidevs_lock. if I try to split the pcidevs_lock into small locks.
>> >   It may takes a great deal of time to make it stable.
>> 
>> I don't understand this, namely in the context of my suggestion to simply pass
>> down a flag indicating whether the lock is being held (and hence acquiring it
>> only in the most narrow scope if not already owning it).
>> 
> 
> This is also an idea.
> BTW, Does the lock refer to pcidevs_lock?

Yes, for now I assume that only that lock actually needs special
attention.

Jan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-27 13:15                                 ` Jan Beulich
@ 2016-01-27 14:13                                   ` Xu, Quan
  2016-01-27 14:29                                     ` Jan Beulich
  0 siblings, 1 reply; 35+ messages in thread
From: Xu, Quan @ 2016-01-27 14:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, george.dunlap, andrew.cooper3, tim, xen-devel

> On January 27, 2016 at 9:15pm, <JBeulich@suse.com> wrote:
> >>> On 27.01.16 at 13:38, <quan.xu@intel.com> wrote:
> >>  On January 27, 2016 at 7:24pm, <JBeulich@suse.com> wrote:
> >> >>> On 27.01.16 at 12:09, <quan.xu@intel.com> wrote:
> >> >>  On January 27, 2016 at 6:48am, <Tian, Kevin> wrote:
> >> >> > From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >> > Sent: Tuesday, January 26, 2016 11:53 PM
> >> >
> >> >
> >> >> > Once again: Before getting started, please assess which route is
> >> >> > going to be the better one. Remember that we had already
> >> >> > discussed and put aside some form of deferring the hiding of
> >> >> > devices, so if you come back with a patch doing that again,
> >> >> > you'll have to be able to explain why the alternative(s) are worse.
> >> >> >
> >> >>
> >> >> Quan, could you list pros/cons of those alternatives based on
> >> >> discussion so
> >> far?
> >> >> Then we can decide which way should be done before you go to
> >> >> actual
> >> coding.
> >> >> Earlier suggestion on hiding device immediately is under the
> >> >> assumption that all locks have been held. If this part becomes too
> >> >> complex, and you can explain clearly that deferring the hiding
> >> >> action doesn't lead to any race condition, then people can see why
> >> >> you are
> >> proposing defer again.
> >> >
> >> >
> >> > The following are pros/cons of those alternatives. It is also why I
> >> > propose defer again.
> >> >
> >> > -- --
> >> > 1. Hiding the devices immediately
> >> > Pros:
> >> >      * it makes whatever changes are ASAP after the Device-TLB flush
> error.
> >> >
> >> > Cons:
> >> >      * It may break the code path.
> >> >      * It may lead to any race condition.
> >> >      * Hiding the devices immediately is under the assumption that
> >> > all
> > locks
> >> have been held.
> >> >       Different locking state is possible for different call trees.
> >> > This
> > part
> >> becomes too complex.
> >>
> >> So you just repeat what you've already said before. "This part
> >> becomes too complex" you say without any kind of proof, yet that's
> >> what we need to understand whether the alternative of doing the
> >> locking correctly really is
> > this
> >> bad (and I continue to not see why it would).
> >
> >
> > Such as pcidevs_lock:
> >
> > 1. as I mentioned, it is indeed different locking state is possible
> > for different call trees of flush Device-TLB. When Device-TLB flush is
> > error, It is really challenge to judge whether to acquire the pcidevs_lock or
> not.
> >
> >    For example,
> >    *It is _not_under_ lock for the following call tree:
> > $ flush_iotlb_qi()--- iommu_flush_iotlb_psi() --
> > __intel_iommu_iotlb_flush()
> > --intel_iommu_iotlb_flush() --iommu_iotlb_flush()
> > --xenmem_add_to_physmap()--do_memory_op()
> >
> >    *It is _under_ lock for the following call tree:
> > $flush_iotlb_qi()--iommu_flush_iotlb_dsi()--domain_context_unmap_one()
> > --domain_con
> > text_unmap()--reassign_device_ownership()--deassign_device()-iommu_do_
> > pci_domctl()
> >
> > 2. if I try to acquire the pcidevs_lock for some _not_under_ lock call
> > tree, it makes things worse. As the pcidevs_lock is a big lock, then
> >   Frequent memory modification may block the pci-device assign due to
> > the pcidevs_lock. if I try to split the pcidevs_lock into small locks.
> >   It may takes a great deal of time to make it stable.
> 
> I don't understand this, namely in the context of my suggestion to simply pass
> down a flag indicating whether the lock is being held (and hence acquiring it
> only in the most narrow scope if not already owning it).
> 

This is also an idea.
BTW, Does the lock refer to pcidevs_lock?

-Quan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-27 12:38                               ` Xu, Quan
@ 2016-01-27 13:15                                 ` Jan Beulich
  2016-01-27 14:13                                   ` Xu, Quan
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2016-01-27 13:15 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, george.dunlap, andrew.cooper3, tim, xen-devel

>>> On 27.01.16 at 13:38, <quan.xu@intel.com> wrote:
>>  On January 27, 2016 at 7:24pm, <JBeulich@suse.com> wrote:
>> >>> On 27.01.16 at 12:09, <quan.xu@intel.com> wrote:
>> >>  On January 27, 2016 at 6:48am, <Tian, Kevin> wrote:
>> >> > From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> > Sent: Tuesday, January 26, 2016 11:53 PM
>> >
>> >
>> >> > Once again: Before getting started, please assess which route is
>> >> > going to be the better one. Remember that we had already discussed
>> >> > and put aside some form of deferring the hiding of devices, so if
>> >> > you come back with a patch doing that again, you'll have to be able
>> >> > to explain why the alternative(s) are worse.
>> >> >
>> >>
>> >> Quan, could you list pros/cons of those alternatives based on discussion so
>> far?
>> >> Then we can decide which way should be done before you go to actual
>> coding.
>> >> Earlier suggestion on hiding device immediately is under the
>> >> assumption that all locks have been held. If this part becomes too
>> >> complex, and you can explain clearly that deferring the hiding action
>> >> doesn't lead to any race condition, then people can see why you are
>> proposing defer again.
>> >
>> >
>> > The following are pros/cons of those alternatives. It is also why I
>> > propose defer again.
>> >
>> > -- --
>> > 1. Hiding the devices immediately
>> > Pros:
>> >      * it makes whatever changes are ASAP after the Device-TLB flush error.
>> >
>> > Cons:
>> >      * It may break the code path.
>> >      * It may lead to any race condition.
>> >      * Hiding the devices immediately is under the assumption that all 
> locks
>> have been held.
>> >       Different locking state is possible for different call trees. This 
> part
>> becomes too complex.
>> 
>> So you just repeat what you've already said before. "This part becomes too
>> complex" you say without any kind of proof, yet that's what we need to
>> understand whether the alternative of doing the locking correctly really is 
> this
>> bad (and I continue to not see why it would).
> 
> 
> Such as pcidevs_lock:
> 
> 1. as I mentioned, it is indeed different locking state is possible for 
> different call trees of flush Device-TLB. When Device-TLB flush is error, It is 
> really challenge to judge whether to acquire the pcidevs_lock or not.
> 
>    For example,
>    *It is _not_under_ lock for the following call tree:
> $ flush_iotlb_qi()--- iommu_flush_iotlb_psi() -- __intel_iommu_iotlb_flush() 
> --intel_iommu_iotlb_flush() --iommu_iotlb_flush() 
> --xenmem_add_to_physmap()--do_memory_op() 
> 
>    *It is _under_ lock for the following call tree:
> $flush_iotlb_qi()--iommu_flush_iotlb_dsi()--domain_context_unmap_one()--domain_con
> text_unmap()--reassign_device_ownership()--deassign_device()-iommu_do_pci_domctl()
> 
> 2. if I try to acquire the pcidevs_lock for some _not_under_ lock call tree, 
> it makes things worse. As the pcidevs_lock is a big lock, then
>   Frequent memory modification may block the pci-device assign due to the 
> pcidevs_lock. if I try to split the pcidevs_lock into small locks.
>   It may takes a great deal of time to make it stable.

I don't understand this, namely in the context of my suggestion to
simply pass down a flag indicating whether the lock is being held
(and hence acquiring it only in the most narrow scope if not already
owning it).

Jan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-27 11:24                             ` Jan Beulich
@ 2016-01-27 12:38                               ` Xu, Quan
  2016-01-27 13:15                                 ` Jan Beulich
  0 siblings, 1 reply; 35+ messages in thread
From: Xu, Quan @ 2016-01-27 12:38 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, george.dunlap, andrew.cooper3, tim, xen-devel

> On January 27, 2016 at 7:24pm, <JBeulich@suse.com> wrote:
> >>> On 27.01.16 at 12:09, <quan.xu@intel.com> wrote:
> >>  On January 27, 2016 at 6:48am, <Tian, Kevin> wrote:
> >> > From: Jan Beulich [mailto:JBeulich@suse.com]
> >> > Sent: Tuesday, January 26, 2016 11:53 PM
> >
> >
> >> > Once again: Before getting started, please assess which route is
> >> > going to be the better one. Remember that we had already discussed
> >> > and put aside some form of deferring the hiding of devices, so if
> >> > you come back with a patch doing that again, you'll have to be able
> >> > to explain why the alternative(s) are worse.
> >> >
> >>
> >> Quan, could you list pros/cons of those alternatives based on discussion so
> far?
> >> Then we can decide which way should be done before you go to actual
> coding.
> >> Earlier suggestion on hiding device immediately is under the
> >> assumption that all locks have been held. If this part becomes too
> >> complex, and you can explain clearly that deferring the hiding action
> >> doesn't lead to any race condition, then people can see why you are
> proposing defer again.
> >
> >
> > The following are pros/cons of those alternatives. It is also why I
> > propose defer again.
> >
> > -- --
> > 1. Hiding the devices immediately
> > Pros:
> >      * it makes whatever changes are ASAP after the Device-TLB flush error.
> >
> > Cons:
> >      * It may break the code path.
> >      * It may lead to any race condition.
> >      * Hiding the devices immediately is under the assumption that all locks
> have been held.
> >       Different locking state is possible for different call trees. This part
> becomes too complex.
> 
> So you just repeat what you've already said before. "This part becomes too
> complex" you say without any kind of proof, yet that's what we need to
> understand whether the alternative of doing the locking correctly really is this
> bad (and I continue to not see why it would).


Such as pcidevs_lock:

1. as I mentioned, it is indeed different locking state is possible for different call trees of flush Device-TLB. When Device-TLB flush is error, It is really challenge to judge whether to acquire the pcidevs_lock or not.

   For example,
   *It is _not_under_ lock for the following call tree:
$ flush_iotlb_qi()--- iommu_flush_iotlb_psi() -- __intel_iommu_iotlb_flush() --intel_iommu_iotlb_flush() --iommu_iotlb_flush() --xenmem_add_to_physmap()--do_memory_op() 

   *It is _under_ lock for the following call tree:
$flush_iotlb_qi()--iommu_flush_iotlb_dsi()--domain_context_unmap_one()--domain_context_unmap()--reassign_device_ownership()--deassign_device()-iommu_do_pci_domctl()

2. if I try to acquire the pcidevs_lock for some _not_under_ lock call tree, it makes things worse. As the pcidevs_lock is a big lock, then
  Frequent memory modification may block the pci-device assign due to the pcidevs_lock. if I try to split the pcidevs_lock into small locks.
  It may takes a great deal of time to make it stable.

Also I should consider the other locks..


-Quan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-27 11:09                           ` Xu, Quan
@ 2016-01-27 11:24                             ` Jan Beulich
  2016-01-27 12:38                               ` Xu, Quan
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2016-01-27 11:24 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, george.dunlap, andrew.cooper3, tim, xen-devel

>>> On 27.01.16 at 12:09, <quan.xu@intel.com> wrote:
>>  On January 27, 2016 at 6:48am, <Tian, Kevin> wrote:
>> > From: Jan Beulich [mailto:JBeulich@suse.com]
>> > Sent: Tuesday, January 26, 2016 11:53 PM
> 
> 
>> > Once again: Before getting started, please assess which route is going
>> > to be the better one. Remember that we had already discussed and put
>> > aside some form of deferring the hiding of devices, so if you come
>> > back with a patch doing that again, you'll have to be able to explain
>> > why the alternative(s) are worse.
>> >
>> 
>> Quan, could you list pros/cons of those alternatives based on discussion so far?
>> Then we can decide which way should be done before you go to actual coding.
>> Earlier suggestion on hiding device immediately is under the assumption that all
>> locks have been held. If this part becomes too complex, and you can explain
>> clearly that deferring the hiding action doesn't lead to any race condition, then
>> people can see why you are proposing defer again.
> 
> 
> The following are pros/cons of those alternatives. It is also why I propose 
> defer again.
> 
> -- --
> 1. Hiding the devices immediately
> Pros:
>      * it makes whatever changes are ASAP after the Device-TLB flush error.
>  
> Cons:
>      * It may break the code path.
>      * It may lead to any race condition.
>      * Hiding the devices immediately is under the assumption that all locks have been held.
>       Different locking state is possible for different call trees. This part becomes too complex.

So you just repeat what you've already said before. "This part
becomes too complex" you say without any kind of proof, yet
that's what we need to understand whether the alternative of
doing the locking correctly really is this bad (and I continue to
not see why it would).

Jan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-26 22:47                         ` Tian, Kevin
@ 2016-01-27 11:09                           ` Xu, Quan
  2016-01-27 11:24                             ` Jan Beulich
  0 siblings, 1 reply; 35+ messages in thread
From: Xu, Quan @ 2016-01-27 11:09 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich
  Cc: george.dunlap, andrew.cooper3, tim, Wu, Feng, xen-devel

> On January 27, 2016 at 6:48am, <Tian, Kevin> wrote:
> > From: Jan Beulich [mailto:JBeulich@suse.com]
> > Sent: Tuesday, January 26, 2016 11:53 PM


> > Once again: Before getting started, please assess which route is going
> > to be the better one. Remember that we had already discussed and put
> > aside some form of deferring the hiding of devices, so if you come
> > back with a patch doing that again, you'll have to be able to explain
> > why the alternative(s) are worse.
> >
> 
> Quan, could you list pros/cons of those alternatives based on discussion so far?
> Then we can decide which way should be done before you go to actual coding.
> Earlier suggestion on hiding device immediately is under the assumption that all
> locks have been held. If this part becomes too complex, and you can explain
> clearly that deferring the hiding action doesn't lead to any race condition, then
> people can see why you are proposing defer again.


The following are pros/cons of those alternatives. It is also why I propose defer again.

-- --
1. Hiding the devices immediately
Pros:
     * it makes whatever changes are ASAP after the Device-TLB flush error.
 
Cons:
     * It may break the code path.
     * It may lead to any race condition.
     * Hiding the devices immediately is under the assumption that all locks have been held.
      Different locking state is possible for different call trees. This part becomes too complex.
 
2. deferring the hiding of devices
Pros:
    *It doesn't lead to any race condition.
    *It makes existing calling chains with all required error handling completed.
 
Cons:
    * It needs to maintain a flag.
-- --

-----
How to defer the hiding of devices:
    When Device-TLB flush is error, it is to set a new flag and then continue existing calling chains with all required error handling completed. Only at safe place we can safely invoke pci_hide_device() variant to hide devices. We do a lazy hide until next time when it's assigned to another guest while the new flag is recognized.

Furthermore explanation:
A new flag:
         In my previous email, I introduced a "pci_dev->info.is_unassignable" flag. When Device-TLB flush is error, I think it is not a good idea to modify the pci_dev,
         As it is uncertain whether pcidevs_lock is acquired.
         I'd better introduce a new iommu bitmap array to register the device with BDF. It may be quite similar to 'iommu->domid_bitmap'

Safe place: 
         I think the beginning of reassign_device_ownership() is a safe place to hide the device which is registered with the new flag.
         reassign_device_ownership() is under lock.

pci_hide_device() variant:
         Remove the pcidevs_lock, as the pcidevs_lock has been acquired at call tree of reassign_device_ownership().


Thanks!!!
-Quan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-26 15:52                       ` Jan Beulich
@ 2016-01-26 22:47                         ` Tian, Kevin
  2016-01-27 11:09                           ` Xu, Quan
  0 siblings, 1 reply; 35+ messages in thread
From: Tian, Kevin @ 2016-01-26 22:47 UTC (permalink / raw)
  To: Jan Beulich, Xu, Quan
  Cc: george.dunlap, andrew.cooper3, tim, Wu, Feng, xen-devel

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, January 26, 2016 11:53 PM
> 
> >> > At first, I am open for any solution.
> >> > pcidevs_lock is quite a big lock. For this point, it looks much better
> >> > to add a new flag to delay hiding device.
> >> > I am also afraid that it may raise further security issues.
> >>
> >> Well, I'd say just go and see which one turns out to be less cumbersome
> > and/or
> >> less intrusive.
> >>
> > For this lock, any good idea?
> > IMO, I can get started to add a new flag to delay hiding device.
> 
> Once again: Before getting started, please assess which route is
> going to be the better one. Remember that we had already
> discussed and put aside some form of deferring the hiding of
> devices, so if you come back with a patch doing that again, you'll
> have to be able to explain why the alternative(s) are worse.
> 

Quan, could you list pros/cons of those alternatives based on
discussion so far? Then we can decide which way should be
done before you go to actual coding. Earlier suggestion on
hiding device immediately is under the assumption that all
locks have been held. If this part becomes too complex, and
you can explain clearly that deferring the hiding action doesn't
lead to any race condition, then people can see why you are
proposing defer again.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-26 15:27                     ` Xu, Quan
@ 2016-01-26 15:52                       ` Jan Beulich
  2016-01-26 22:47                         ` Tian, Kevin
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2016-01-26 15:52 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, george.dunlap, andrew.cooper3, tim, xen-devel

>>> On 26.01.16 at 16:27, <quan.xu@intel.com> wrote:
>>  On January 26, 2016 at 10:00pm, <JBeulich@suse.com> wrote:
>> >>> On 26.01.16 at 14:47, <quan.xu@intel.com> wrote:
>> > As you mentioned , I simply need to consult the bitmap along with the
>> > domain ID array.
>> >
>> > +If ( test_bit(did, iommu->domid_bitmap) && iommu->domid_map[did] >= 0 )
>> > +   d = rcu_lock_domain_by_id(iommu->domid_map[did]);
>> >
>> > Is it right now?
>> 
>> Mostly, except that I don't understand the >= 0 part.
>> 
> Domain ID should be >= 0..
> If it is redundant, I can remove it.

Quan, please: You have the code, so you can check whether
negative values ever get stored there. And had you checked,
you'd have found that domid_map is declared as u16 *. The
"u" here, as I hope you know, stands for "unsigned". (Of
course this really should be domid_t, but I'm sure the VT-d
maintainers won't care at all about such inconsistencies.)

>> > At first, I am open for any solution.
>> > pcidevs_lock is quite a big lock. For this point, it looks much better
>> > to add a new flag to delay hiding device.
>> > I am also afraid that it may raise further security issues.
>> 
>> Well, I'd say just go and see which one turns out to be less cumbersome 
> and/or
>> less intrusive.
>> 
> For this lock, any good idea?
> IMO, I can get started to add a new flag to delay hiding device.

Once again: Before getting started, please assess which route is
going to be the better one. Remember that we had already
discussed and put aside some form of deferring the hiding of
devices, so if you come back with a patch doing that again, you'll
have to be able to explain why the alternative(s) are worse.

Jan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-26 13:59                   ` Jan Beulich
@ 2016-01-26 15:27                     ` Xu, Quan
  2016-01-26 15:52                       ` Jan Beulich
  0 siblings, 1 reply; 35+ messages in thread
From: Xu, Quan @ 2016-01-26 15:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, george.dunlap, andrew.cooper3, tim, xen-devel

> On January 26, 2016 at 10:00pm, <JBeulich@suse.com> wrote:
> >>> On 26.01.16 at 14:47, <quan.xu@intel.com> wrote:
> > As you mentioned , I simply need to consult the bitmap along with the
> > domain ID array.
> >
> > +If ( test_bit(did, iommu->domid_bitmap) && iommu->domid_map[did] >= 0 )
> > +   d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> >
> > Is it right now?
> 
> Mostly, except that I don't understand the >= 0 part.
> 
Domain ID should be >= 0..
If it is redundant, I can remove it.

> > At first, I am open for any solution.
> > pcidevs_lock is quite a big lock. For this point, it looks much better
> > to add a new flag to delay hiding device.
> > I am also afraid that it may raise further security issues.
> 
> Well, I'd say just go and see which one turns out to be less cumbersome and/or
> less intrusive.
> 
For this lock, any good idea?
IMO, I can get started to add a new flag to delay hiding device.

-Quan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-26 13:47                 ` Xu, Quan
@ 2016-01-26 13:59                   ` Jan Beulich
  2016-01-26 15:27                     ` Xu, Quan
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2016-01-26 13:59 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, george.dunlap, andrew.cooper3, tim, xen-devel

>>> On 26.01.16 at 14:47, <quan.xu@intel.com> wrote:
> As you mentioned , I simply need to consult the bitmap along with the domain 
> ID array.
> 
> +If ( test_bit(did, iommu->domid_bitmap) && iommu->domid_map[did] >= 0 )
> +   d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> 
> Is it right now?

Mostly, except that I don't understand the >= 0 part.

> At first, I am open for any solution.
> pcidevs_lock is quite a big lock. For this point, it looks much better to 
> add a new flag to delay hiding device.
> I am also afraid that it may raise further security issues.

Well, I'd say just go and see which one turns out to be less
cumbersome and/or less intrusive.

Jan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-25 14:09               ` Jan Beulich
@ 2016-01-26 13:47                 ` Xu, Quan
  2016-01-26 13:59                   ` Jan Beulich
  0 siblings, 1 reply; 35+ messages in thread
From: Xu, Quan @ 2016-01-26 13:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, george.dunlap, andrew.cooper3, tim, xen-devel

> On January 25, 2016 at 10:09pm, <JBeulich@suse.com> wrote:
> >>> On 22.01.16 at 16:57, <quan.xu@intel.com> wrote:
> >>  On January 22, 2016 at 12:31am, <JBeulich@suse.com> wrote:
> >> >>> On 21.01.16 at 17:16, <quan.xu@intel.com> wrote:
> >> >>  On January 20, 2016 at 7:29 pm,  <JBeulich@suse.com> wrote:
> >> >> >>> On 20.01.16 at 11:26, <quan.xu@intel.com> wrote:
> >> >> >> On January 15, 2016 at 9:10, <JBeulich@suse.com> wrote:
> >> >> >> >>> On 23.12.15 at 09:25, <quan.xu@intel.com> wrote:
> >> >> >> Also note that unused slots hold zero, i.e. there's at least a
> >> >> >> theoretical
> >> >> > risk of
> >> >> >> problems here when you don't also look at
> >> >> >> iommu->domid_bitmap.
> >> >> >>
> >> >> > I am not clear how to fix this point. Do you have good idea?
> >> >> > Add a lock to 'iommu->domid_bitmap'?
> >> >>
> >> >> How would a lock help avoiding mistaking an unused slot to mean Dom0?
> >> >> As already suggested - I think you simply need to consult the
> >> >> bitmap along with
> >> > the
> >> >> domain ID array.
> >> >>
> >> >
> >> > Try to get domain id with iommu->domid_map[did] ?
> >>
> >> ???
> >>
> >         +if ( iommu->domid_map )
> >         +   d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> >
> > Is it right?
> 
> I don't see what this changes. Again - what your code has been lacking so far is
> some mechanism to guarantee that what you read from domid_map[] is
> actually a valid domain ID. I can only once again point your attention to
> domid_bitmap, which afaics gives you exactly that valid/invalid indication.
> 
Jan,
Sorry, I was confused about this point as I didn't understand the domid_bitmap. I printed out the iommu->domid_bitmap[did], which was tricky to me.
Now I get it. 
As you mentioned , I simply need to consult the bitmap along with the domain ID array.

+If ( test_bit(did, iommu->domid_bitmap) && iommu->domid_map[did] >= 0 )
+   d = rcu_lock_domain_by_id(iommu->domid_map[did]);

Is it right now?

> >> >> >> > +            {
> >> >> >> > +                list_del(&pdev->domain_list);
> >> >> >>
> >> >> >> This should happen under pcidevs_lock - you need to either
> >> >> >> acquire it or
> >> >> >> ASSERT() it being held.
> >> >> >>
> >> >> >
> >> >> > I should check whether it is under pcidevs_lock -- with
> >> >> > spin_is_locked(&pcidevs_lock)
> >> >> > If it is not under pcidevs_lock, I should acquire it.
> >> >> > I think ASSERT() is not a good idea. Hypervisor acquires this
> >> >> > lock and then remove the resource.
> >> >>
> >> >> I don't understand this last sentence.
> >> >>
> >> > For example: in
> >> > pci_remove_device()
> >> > {
> >> > ...
> >> >     spin_lock(&pcidevs_lock);
> >> >     ..
> >> >     iommu_remove_device()..
> >> >     ..
> >> >     spin_unlock(&pcidevs_lock);
> >> > }
> >> >
> >> > Device-TLB is maybe flush error in iommu_remove_device()..
> >> > Then it is under pcidevs_lock..
> >> > In this case, I need to check whether it is under pcidevs_lock.
> >> > If not, I need to acquire the pcidevs_lock.
> >>
> >> Ah, okay. But you can't use spin_is_locked() for that purpose.
> >>
> > If I introduce a new parameter 'lock'.
> > + int lock = spin_is_locked(&pcidevs_lock);
> >
> >
> > + if ( !lock )
> > +    spin_lock(&pcidevs_lock);
> > ...
> > + if ( !lock )
> > +    spin_unlock(&pcidevs_lock);
> >
> > Is it right?
> > Jan, do you have some better idea?
> 
> If indeed different locking state is possible for different call trees, 

Indeed different. For example,
It is _not_under_ lock for the following call tree:
$ flush_iotlb_qi()--- iommu_flush_iotlb_psi() -- __intel_iommu_iotlb_flush() --intel_iommu_iotlb_flush() --iommu_iotlb_flush() --xenmem_add_to_physmap()--do_memory_op() 

It is _under_ lock for the following call tree:
$flush_iotlb_qi()--iommu_flush_iotlb_dsi()--domain_context_unmap_one()--domain_context_unmap()--reassign_device_ownership()--deassign_device()-iommu_do_pci_domctl()



> then I'm
> afraid you won't get around passing down flags to indicate whether the needed
> lock is already being held.
> 
Agreed.

At first, I am open for any solution.
pcidevs_lock is quite a big lock. For this point, it looks much better to add a new flag to delay hiding device.
I am also afraid that it may raise further security issues.


Jan, thanks!!
-Quan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-22 15:57             ` Xu, Quan
@ 2016-01-25 14:09               ` Jan Beulich
  2016-01-26 13:47                 ` Xu, Quan
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2016-01-25 14:09 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, george.dunlap, andrew.cooper3, tim,
	xen-devel, Jun Nakajima, keir

>>> On 22.01.16 at 16:57, <quan.xu@intel.com> wrote:
>>  On January 22, 2016 at 12:31am, <JBeulich@suse.com> wrote:
>> >>> On 21.01.16 at 17:16, <quan.xu@intel.com> wrote:
>> >>  On January 20, 2016 at 7:29 pm,  <JBeulich@suse.com> wrote:
>> >> >>> On 20.01.16 at 11:26, <quan.xu@intel.com> wrote:
>> >> >> On January 15, 2016 at 9:10, <JBeulich@suse.com> wrote:
>> >> >> >>> On 23.12.15 at 09:25, <quan.xu@intel.com> wrote:
>> >> >> Also note that unused slots hold zero, i.e. there's at least a
>> >> >> theoretical
>> >> > risk of
>> >> >> problems here when you don't also look at
>> >> >> iommu->domid_bitmap.
>> >> >>
>> >> > I am not clear how to fix this point. Do you have good idea?
>> >> > Add a lock to 'iommu->domid_bitmap'?
>> >>
>> >> How would a lock help avoiding mistaking an unused slot to mean Dom0?
>> >> As already suggested - I think you simply need to consult the bitmap
>> >> along with
>> > the
>> >> domain ID array.
>> >>
>> >
>> > Try to get domain id with iommu->domid_map[did] ?
>> 
>> ???
>> 
>         +if ( iommu->domid_map )
>         +   d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> 
> Is it right?

I don't see what this changes. Again - what your code has been
lacking so far is some mechanism to guarantee that what you
read from domid_map[] is actually a valid domain ID. I can only
once again point your attention to domid_bitmap, which afaics
gives you exactly that valid/invalid indication.

>> >> >> > +            {
>> >> >> > +                list_del(&pdev->domain_list);
>> >> >>
>> >> >> This should happen under pcidevs_lock - you need to either acquire
>> >> >> it or
>> >> >> ASSERT() it being held.
>> >> >>
>> >> >
>> >> > I should check whether it is under pcidevs_lock -- with
>> >> > spin_is_locked(&pcidevs_lock)
>> >> > If it is not under pcidevs_lock, I should acquire it.
>> >> > I think ASSERT() is not a good idea. Hypervisor acquires this lock
>> >> > and then remove the resource.
>> >>
>> >> I don't understand this last sentence.
>> >>
>> > For example: in
>> > pci_remove_device()
>> > {
>> > ...
>> >     spin_lock(&pcidevs_lock);
>> >     ..
>> >     iommu_remove_device()..
>> >     ..
>> >     spin_unlock(&pcidevs_lock);
>> > }
>> >
>> > Device-TLB is maybe flush error in iommu_remove_device()..
>> > Then it is under pcidevs_lock..
>> > In this case, I need to check whether it is under pcidevs_lock.
>> > If not, I need to acquire the pcidevs_lock.
>> 
>> Ah, okay. But you can't use spin_is_locked() for that purpose.
>> 
> Why?

Because spin_is_locked() returns true if _any CPU_ holds the lock,
not just when the CPU you're running on does.

> If I introduce a new parameter 'lock'.
> + int lock = spin_is_locked(&pcidevs_lock);
> 
> 
> + if ( !lock )
> +    spin_lock(&pcidevs_lock);
> ...
> + if ( !lock )
> +    spin_unlock(&pcidevs_lock);
> 
> Is it right? 
> Jan, do you have some better idea?

If indeed different locking state is possible for different call trees,
then I'm afraid you won't get around passing down flags to indicate
whether the needed lock is already being held.

> I think ASSERT is not right.

Indeed.

>> >> >> > @@ -349,9 +416,18 @@ static int flush_iotlb_qi(
>> >> >> >          queue_invalidate_iotlb(iommu,
>> >> >> >                                 type >>
>> >> >> DMA_TLB_FLUSH_GRANU_OFFSET, dr,
>> >> >> >                                 dw, did, size_order, 0, addr);
>> >> >> > +
>> >> >> > +        /*
>> >> >> > +         * Synchronize with hardware for invalidation request
>> >> descriptors
>> >> >> > +         * submitted before Device-TLB invalidate descriptor.
>> >> >> > +         */
>> >> >> > +        rc = invalidate_sync(iommu);
>> >> >> > +        if ( rc )
>> >> >> > +             return rc;
>> >> >> > +
>> >> >> >          if ( flush_dev_iotlb )
>> >> >> >              ret = dev_invalidate_iotlb(iommu, did, addr,
>> >> >> > size_order,
>> >> type);
>> >> >> > -        rc = invalidate_sync(iommu);
>> >> >> > +
>> >> >> >          if ( !ret )
>> >> >> >              ret = rc;
>> >> >> >      }
>> >> >>
>> >> >> This change is because of ...?
>> >> >>
>> >> > As Kevin's comments,
>> >> > I add 'dev_invalidate_iotlb_sync()' to dev_invalidate_iotlb().
>> >> > Then i can know which device is flush timeout.
>> >>
>> >> I don't understand: How is your reply related to you moving up the
>> > invocation of
>> >> invalidate_sync()?
>> >
>> > This invalidate_sync() is to synchronize with hardware for
>> > invalidation request descriptors submitted before Device-TLB
>> > invalidate descriptor.
>> >
>> > If I don't add this invalidate_sync()..
>> > dev_invalidate_iotlb_timeout() is not clear whether the flush time out
>> > is about Device-TLB flush error or the other previous
>> > iotlb/iec/context flush error.
>> 
>> This being anything but obvious, maybe a brief comment would help?
>> 
> 
> Change 
> "
> Synchronize with hardware for invalidation request descriptors 
> submitted before Device-TLB invalidate descriptor.
> "
> TO
> 
> "Synchronize invalidation completions before Device-TLB invalidation
> "
> ?

I'd even further emphasize the ordering, by saying "Before
Device-TLB invalidation we need to ..." (or perhaps it's rather
"... we'd like to ...").

>> >> >> Plus logging the error code and device coordinates might turn out
>> >> >> helpful in such cases. But first of all - if there was a timeout,
>> >> >> we'd make it here only for Dom0. Perhaps the printing should move
>> >> >> into an
>> >> else to the domain_crash()?
>> >> >> And if there was another error, the message would be outright wrong.
>> >> >>
>> >> > IMO, the print for Dom0 is enough.
>> >> > I think it is not a good idea to move into an else to domain_crash().
>> >> > Domain_crash is quite a common part.
>> >> > Anyway I can improve it in low priority.
>> >>
>> >> At the very least the message should not end up being actively misleading.
>> >>
>> > What about the below?
>> > +                printk(XENLOG_ERR
>> > +                       "Device %04x:%02x:%02x is flush error.",
>> > +                       seg, bus, devfn);
>> 
>> Well, for one you once again omit the error code. And then (I think others 
> had
>> pointed that out already) a device cannot be flush error.
>> Perhaps you mean "Flush error %d on device ...\n"? Plus you absolutely 
> should
>> print PCI device coordinates in the canonical way, 
> 
> What does 'print PCI device coordinates in the canonical way' mean ?
> Such as  "0000:00:02.1"?

Yes. There are numerous examples throughout the tree.

Jan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-21 16:31           ` Jan Beulich
@ 2016-01-22 15:57             ` Xu, Quan
  2016-01-25 14:09               ` Jan Beulich
  0 siblings, 1 reply; 35+ messages in thread
From: Xu, Quan @ 2016-01-22 15:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, george.dunlap, andrew.cooper3, tim,
	xen-devel, Nakajima, Jun, keir

> On January 22, 2016 at 12:31am, <JBeulich@suse.com> wrote:
> >>> On 21.01.16 at 17:16, <quan.xu@intel.com> wrote:
> >>  On January 20, 2016 at 7:29 pm,  <JBeulich@suse.com> wrote:
> >> >>> On 20.01.16 at 11:26, <quan.xu@intel.com> wrote:
> >> >> On January 15, 2016 at 9:10, <JBeulich@suse.com> wrote:
> >> >> >>> On 23.12.15 at 09:25, <quan.xu@intel.com> wrote:
> >> >> > @@ -229,6 +239,63 @@ int qinval_device_iotlb(struct iommu *iommu,
> >> >> >      return 0;
> >> >> >  }
> >> >> >
> >> >> > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16
> did,
> >> >> > +                                         u16 seg, u8 bus, u8
> >> >> > +devfn)
> >> >> {
> >> >> > +    struct domain *d;
> >> >> > +    struct pci_dev *pdev;
> >> >> > +
> >> >> > +    d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> >> >> > +    ASSERT(d);
> >> >>
> >> >> Don't you need to acquire some lock in order to safely assert this?
> >> >
> >> > Referred to the other use case of 'rcu_lock_domain_by_id()', Xen
> >> > checks whether The domain is 'NULL'.
> >> > Could I also replace the 'ASSERT(d)' with
> >> >           + If ( d == NULL )
> >> >           +   return;
> >> > ?
> >>
> >> At a first glance this doesn't look right, but in the end that's
> >> something
> > you need
> >> to answer.
> >>
> >
> > Is it quite similar to whether the domain has been destroyed when
> > Device-TLB is flushing?
> 
> Going back to the original question I raised, you need to ask
> yourself: Can d validly be NULL when you get here (e.g. due to some other
> processor changing something in parallel)?

I think that is YES.
Not all of the callers acquire RCU lock. Such as:
   $flush_iotlb_qi()--- iommu_flush_iotlb_psi() -- __intel_iommu_iotlb_flush() --intel_iommu_iotlb_flush() --iommu_iotlb_flush() --xenmem_add_to_physmap()--do_memory_op()
When Device-TLB is flushing for the above call path, the domain is destroyed in parallel. Then the d is NULL.


> If yes, you can't ASSERT(), and the
> next question then is what exactly it means that it got ripped off the table. The
> answer to that determines whether simply returning would be okay.
> 

IMO, Simply returning would be okay, but the device would not be hidden.



> >> >> Also note that unused slots hold zero, i.e. there's at least a
> >> >> theoretical
> >> > risk of
> >> >> problems here when you don't also look at
> >> >> iommu->domid_bitmap.
> >> >>
> >> > I am not clear how to fix this point. Do you have good idea?
> >> > Add a lock to 'iommu->domid_bitmap'?
> >>
> >> How would a lock help avoiding mistaking an unused slot to mean Dom0?
> >> As already suggested - I think you simply need to consult the bitmap
> >> along with
> > the
> >> domain ID array.
> >>
> >
> > Try to get domain id with iommu->domid_map[did] ?
> 
> ???
> 
        +if ( iommu->domid_map )
        +   d = rcu_lock_domain_by_id(iommu->domid_map[did]);

Is it right?


> >> >> > +            {
> >> >> > +                list_del(&pdev->domain_list);
> >> >>
> >> >> This should happen under pcidevs_lock - you need to either acquire
> >> >> it or
> >> >> ASSERT() it being held.
> >> >>
> >> >
> >> > I should check whether it is under pcidevs_lock -- with
> >> > spin_is_locked(&pcidevs_lock)
> >> > If it is not under pcidevs_lock, I should acquire it.
> >> > I think ASSERT() is not a good idea. Hypervisor acquires this lock
> >> > and then remove the resource.
> >>
> >> I don't understand this last sentence.
> >>
> > For example: in
> > pci_remove_device()
> > {
> > ...
> >     spin_lock(&pcidevs_lock);
> >     ..
> >     iommu_remove_device()..
> >     ..
> >     spin_unlock(&pcidevs_lock);
> > }
> >
> > Device-TLB is maybe flush error in iommu_remove_device()..
> > Then it is under pcidevs_lock..
> > In this case, I need to check whether it is under pcidevs_lock.
> > If not, I need to acquire the pcidevs_lock.
> 
> Ah, okay. But you can't use spin_is_locked() for that purpose.
> 
Why?

If I introduce a new parameter 'lock'.
+ int lock = spin_is_locked(&pcidevs_lock);


+ if ( !lock )
+    spin_lock(&pcidevs_lock);
...
+ if ( !lock )
+    spin_unlock(&pcidevs_lock);

Is it right? 
Jan, do you have some better idea? I think ASSERT is not right.




> >> >> > @@ -349,9 +416,18 @@ static int flush_iotlb_qi(
> >> >> >          queue_invalidate_iotlb(iommu,
> >> >> >                                 type >>
> >> >> DMA_TLB_FLUSH_GRANU_OFFSET, dr,
> >> >> >                                 dw, did, size_order, 0, addr);
> >> >> > +
> >> >> > +        /*
> >> >> > +         * Synchronize with hardware for invalidation request
> >> descriptors
> >> >> > +         * submitted before Device-TLB invalidate descriptor.
> >> >> > +         */
> >> >> > +        rc = invalidate_sync(iommu);
> >> >> > +        if ( rc )
> >> >> > +             return rc;
> >> >> > +
> >> >> >          if ( flush_dev_iotlb )
> >> >> >              ret = dev_invalidate_iotlb(iommu, did, addr,
> >> >> > size_order,
> >> type);
> >> >> > -        rc = invalidate_sync(iommu);
> >> >> > +
> >> >> >          if ( !ret )
> >> >> >              ret = rc;
> >> >> >      }
> >> >>
> >> >> This change is because of ...?
> >> >>
> >> > As Kevin's comments,
> >> > I add 'dev_invalidate_iotlb_sync()' to dev_invalidate_iotlb().
> >> > Then i can know which device is flush timeout.
> >>
> >> I don't understand: How is your reply related to you moving up the
> > invocation of
> >> invalidate_sync()?
> >
> > This invalidate_sync() is to synchronize with hardware for
> > invalidation request descriptors submitted before Device-TLB
> > invalidate descriptor.
> >
> > If I don't add this invalidate_sync()..
> > dev_invalidate_iotlb_timeout() is not clear whether the flush time out
> > is about Device-TLB flush error or the other previous
> > iotlb/iec/context flush error.
> 
> This being anything but obvious, maybe a brief comment would help?
> 

Change 
"
Synchronize with hardware for invalidation request descriptors 
submitted before Device-TLB invalidate descriptor.
"
TO

"Synchronize invalidation completions before Device-TLB invalidation
"
?

> >> >> Plus logging the error code and device coordinates might turn out
> >> >> helpful in such cases. But first of all - if there was a timeout,
> >> >> we'd make it here only for Dom0. Perhaps the printing should move
> >> >> into an
> >> else to the domain_crash()?
> >> >> And if there was another error, the message would be outright wrong.
> >> >>
> >> > IMO, the print for Dom0 is enough.
> >> > I think it is not a good idea to move into an else to domain_crash().
> >> > Domain_crash is quite a common part.
> >> > Anyway I can improve it in low priority.
> >>
> >> At the very least the message should not end up being actively misleading.
> >>
> > What about the below?
> > +                printk(XENLOG_ERR
> > +                       "Device %04x:%02x:%02x is flush error.",
> > +                       seg, bus, devfn);
> 
> Well, for one you once again omit the error code. And then (I think others had
> pointed that out already) a device cannot be flush error.
> Perhaps you mean "Flush error %d on device ...\n"? Plus you absolutely should
> print PCI device coordinates in the canonical way, 

What does 'print PCI device coordinates in the canonical way' mean ?
Such as  "0000:00:02.1"?

> so that grep-ing a log file for
> issues with a specific device will find everything related to that device.


I am appreciate your kindness.
Quan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-21 16:16         ` Xu, Quan
@ 2016-01-21 16:31           ` Jan Beulich
  2016-01-22 15:57             ` Xu, Quan
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2016-01-21 16:31 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, george.dunlap, andrew.cooper3, tim,
	xen-devel, Jun Nakajima, keir

>>> On 21.01.16 at 17:16, <quan.xu@intel.com> wrote:
>>  On January 20, 2016 at 7:29 pm,  <JBeulich@suse.com> wrote:
>> >>> On 20.01.16 at 11:26, <quan.xu@intel.com> wrote:
>> >> On January 15, 2016 at 9:10, <JBeulich@suse.com> wrote:
>> >> >>> On 23.12.15 at 09:25, <quan.xu@intel.com> wrote:
>> >> > @@ -229,6 +239,63 @@ int qinval_device_iotlb(struct iommu *iommu,
>> >> >      return 0;
>> >> >  }
>> >> >
>> >> > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
>> >> > +                                         u16 seg, u8 bus, u8
>> >> > +devfn)
>> >> {
>> >> > +    struct domain *d;
>> >> > +    struct pci_dev *pdev;
>> >> > +
>> >> > +    d = rcu_lock_domain_by_id(iommu->domid_map[did]);
>> >> > +    ASSERT(d);
>> >>
>> >> Don't you need to acquire some lock in order to safely assert this?
>> >
>> > Referred to the other use case of 'rcu_lock_domain_by_id()', Xen
>> > checks whether The domain is 'NULL'.
>> > Could I also replace the 'ASSERT(d)' with
>> >           + If ( d == NULL )
>> >           +   return;
>> > ?
>> 
>> At a first glance this doesn't look right, but in the end that's something 
> you need
>> to answer.
>> 
> 
> Is it quite similar to whether the domain has been destroyed when Device-TLB 
> is flushing?

Going back to the original question I raised, you need to ask
yourself: Can d validly be NULL when you get here (e.g. due to
some other processor changing something in parallel)? If yes,
you can't ASSERT(), and the next question then is what exactly
it means that it got ripped off the table. The answer to that
determines whether simply returning would be okay.

>> >> Also note that unused slots hold zero, i.e. there's at least a
>> >> theoretical
>> > risk of
>> >> problems here when you don't also look at
>> >> iommu->domid_bitmap.
>> >>
>> > I am not clear how to fix this point. Do you have good idea?
>> > Add a lock to 'iommu->domid_bitmap'?
>> 
>> How would a lock help avoiding mistaking an unused slot to mean Dom0? As
>> already suggested - I think you simply need to consult the bitmap along with 
> the
>> domain ID array.
>> 
> 
> Try to get domain id with iommu->domid_map[did] ?

???

>> >> > +            {
>> >> > +                list_del(&pdev->domain_list);
>> >>
>> >> This should happen under pcidevs_lock - you need to either acquire it
>> >> or
>> >> ASSERT() it being held.
>> >>
>> >
>> > I should check whether it is under pcidevs_lock -- with
>> > spin_is_locked(&pcidevs_lock)
>> > If it is not under pcidevs_lock, I should acquire it.
>> > I think ASSERT() is not a good idea. Hypervisor acquires this lock and
>> > then remove the resource.
>> 
>> I don't understand this last sentence.
>> 
> For example: in 
> pci_remove_device()
> {
> ...
>     spin_lock(&pcidevs_lock);
>     ..
>     iommu_remove_device()..
>     ..
>     spin_unlock(&pcidevs_lock);
> }
> 
> Device-TLB is maybe flush error in iommu_remove_device()..
> Then it is under pcidevs_lock..
> In this case, I need to check whether it is under pcidevs_lock.
> If not, I need to acquire the pcidevs_lock.

Ah, okay. But you can't use spin_is_locked() for that purpose.

>> >> > @@ -349,9 +416,18 @@ static int flush_iotlb_qi(
>> >> >          queue_invalidate_iotlb(iommu,
>> >> >                                 type >>
>> >> DMA_TLB_FLUSH_GRANU_OFFSET, dr,
>> >> >                                 dw, did, size_order, 0, addr);
>> >> > +
>> >> > +        /*
>> >> > +         * Synchronize with hardware for invalidation request
>> descriptors
>> >> > +         * submitted before Device-TLB invalidate descriptor.
>> >> > +         */
>> >> > +        rc = invalidate_sync(iommu);
>> >> > +        if ( rc )
>> >> > +             return rc;
>> >> > +
>> >> >          if ( flush_dev_iotlb )
>> >> >              ret = dev_invalidate_iotlb(iommu, did, addr, size_order,
>> type);
>> >> > -        rc = invalidate_sync(iommu);
>> >> > +
>> >> >          if ( !ret )
>> >> >              ret = rc;
>> >> >      }
>> >>
>> >> This change is because of ...?
>> >>
>> > As Kevin's comments,
>> > I add 'dev_invalidate_iotlb_sync()' to dev_invalidate_iotlb().
>> > Then i can know which device is flush timeout.
>> 
>> I don't understand: How is your reply related to you moving up the 
> invocation of
>> invalidate_sync()?
> 
> This invalidate_sync() is to synchronize with hardware for invalidation 
> request descriptors
> submitted before Device-TLB invalidate descriptor.
> 
> If I don't add this invalidate_sync().. 
> dev_invalidate_iotlb_timeout() is not clear whether the flush time out is 
> about Device-TLB flush error or the other previous iotlb/iec/context flush 
> error. 

This being anything but obvious, maybe a brief comment would
help?

>> >> Plus logging the error code and device coordinates might turn out
>> >> helpful in such cases. But first of all - if there was a timeout,
>> >> we'd make it here only for Dom0. Perhaps the printing should move into an
>> else to the domain_crash()?
>> >> And if there was another error, the message would be outright wrong.
>> >>
>> > IMO, the print for Dom0 is enough.
>> > I think it is not a good idea to move into an else to domain_crash().
>> > Domain_crash is quite a common part.
>> > Anyway I can improve it in low priority.
>> 
>> At the very least the message should not end up being actively misleading.
>> 
> What about the below?
> +                printk(XENLOG_ERR
> +                       "Device %04x:%02x:%02x is flush error.",
> +                       seg, bus, devfn);

Well, for one you once again omit the error code. And then (I think
others had pointed that out already) a device cannot be flush error.
Perhaps you mean "Flush error %d on device ...\n"? Plus you
absolutely should print PCI device coordinates in the canonical way,
so that grep-ing a log file for issues with a specific device will find
everything related to that device.

Jan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-20 11:29       ` Jan Beulich
@ 2016-01-21 16:16         ` Xu, Quan
  2016-01-21 16:31           ` Jan Beulich
  0 siblings, 1 reply; 35+ messages in thread
From: Xu, Quan @ 2016-01-21 16:16 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, george.dunlap, andrew.cooper3, tim,
	xen-devel, Nakajima, Jun, keir

> On January 20, 2016 at 7:29 pm,  <JBeulich@suse.com> wrote:
> >>> On 20.01.16 at 11:26, <quan.xu@intel.com> wrote:
> >> On January 15, 2016 at 9:10, <JBeulich@suse.com> wrote:
> >> >>> On 23.12.15 at 09:25, <quan.xu@intel.com> wrote:
> >> > @@ -229,6 +239,63 @@ int qinval_device_iotlb(struct iommu *iommu,
> >> >      return 0;
> >> >  }
> >> >
> >> > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
> >> > +                                         u16 seg, u8 bus, u8
> >> > +devfn)
> >> {
> >> > +    struct domain *d;
> >> > +    struct pci_dev *pdev;
> >> > +
> >> > +    d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> >> > +    ASSERT(d);
> >>
> >> Don't you need to acquire some lock in order to safely assert this?
> >
> > Referred to the other use case of 'rcu_lock_domain_by_id()', Xen
> > checks whether The domain is 'NULL'.
> > Could I also replace the 'ASSERT(d)' with
> >           + If ( d == NULL )
> >           +   return;
> > ?
> 
> At a first glance this doesn't look right, but in the end that's something you need
> to answer.
> 

Is it quite similar to whether the domain has been destroyed when Device-TLB is flushing?
Correct me if I still doesn't get you point..


> >> Also note that unused slots hold zero, i.e. there's at least a
> >> theoretical
> > risk of
> >> problems here when you don't also look at
> >> iommu->domid_bitmap.
> >>
> > I am not clear how to fix this point. Do you have good idea?
> > Add a lock to 'iommu->domid_bitmap'?
> 
> How would a lock help avoiding mistaking an unused slot to mean Dom0? As
> already suggested - I think you simply need to consult the bitmap along with the
> domain ID array.
> 

Try to get domain id with iommu->domid_map[did] ?


> >> > +            {
> >> > +                list_del(&pdev->domain_list);
> >>
> >> This should happen under pcidevs_lock - you need to either acquire it
> >> or
> >> ASSERT() it being held.
> >>
> >
> > I should check whether it is under pcidevs_lock -- with
> > spin_is_locked(&pcidevs_lock)
> > If it is not under pcidevs_lock, I should acquire it.
> > I think ASSERT() is not a good idea. Hypervisor acquires this lock and
> > then remove the resource.
> 
> I don't understand this last sentence.
> 
For example: in 
pci_remove_device()
{
...
    spin_lock(&pcidevs_lock);
    ..
    iommu_remove_device()..
    ..
    spin_unlock(&pcidevs_lock);
}

Device-TLB is maybe flush error in iommu_remove_device()..
Then it is under pcidevs_lock..
In this case, I need to check whether it is under pcidevs_lock.
If not, I need to acquire the pcidevs_lock.

> >> > @@ -349,9 +416,18 @@ static int flush_iotlb_qi(
> >> >          queue_invalidate_iotlb(iommu,
> >> >                                 type >>
> >> DMA_TLB_FLUSH_GRANU_OFFSET, dr,
> >> >                                 dw, did, size_order, 0, addr);
> >> > +
> >> > +        /*
> >> > +         * Synchronize with hardware for invalidation request
> descriptors
> >> > +         * submitted before Device-TLB invalidate descriptor.
> >> > +         */
> >> > +        rc = invalidate_sync(iommu);
> >> > +        if ( rc )
> >> > +             return rc;
> >> > +
> >> >          if ( flush_dev_iotlb )
> >> >              ret = dev_invalidate_iotlb(iommu, did, addr, size_order,
> type);
> >> > -        rc = invalidate_sync(iommu);
> >> > +
> >> >          if ( !ret )
> >> >              ret = rc;
> >> >      }
> >>
> >> This change is because of ...?
> >>
> > As Kevin's comments,
> > I add 'dev_invalidate_iotlb_sync()' to dev_invalidate_iotlb().
> > Then i can know which device is flush timeout.
> 
> I don't understand: How is your reply related to you moving up the invocation of
> invalidate_sync()?

This invalidate_sync() is to synchronize with hardware for invalidation request descriptors
submitted before Device-TLB invalidate descriptor.

If I don't add this invalidate_sync().. 
dev_invalidate_iotlb_timeout() is not clear whether the flush time out is about Device-TLB flush error or the other previous iotlb/iec/context flush error. 


> 
> >> Plus logging the error code and device coordinates might turn out
> >> helpful in such cases. But first of all - if there was a timeout,
> >> we'd make it here only for Dom0. Perhaps the printing should move into an
> else to the domain_crash()?
> >> And if there was another error, the message would be outright wrong.
> >>
> > IMO, the print for Dom0 is enough.
> > I think it is not a good idea to move into an else to domain_crash().
> > Domain_crash is quite a common part.
> > Anyway I can improve it in low priority.
> 
> At the very least the message should not end up being actively misleading.
> 
What about the below?
+                printk(XENLOG_ERR
+                       "Device %04x:%02x:%02x is flush error.",
+                       seg, bus, devfn);



Quan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-20 10:26     ` Xu, Quan
@ 2016-01-20 11:29       ` Jan Beulich
  2016-01-21 16:16         ` Xu, Quan
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2016-01-20 11:29 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, george.dunlap, andrew.cooper3, tim,
	xen-devel, Jun Nakajima, keir

>>> On 20.01.16 at 11:26, <quan.xu@intel.com> wrote:
>> On January 15, 2016 at 9:10, <JBeulich@suse.com> wrote:
>> >>> On 23.12.15 at 09:25, <quan.xu@intel.com> wrote:
>> > @@ -229,6 +239,63 @@ int qinval_device_iotlb(struct iommu *iommu,
>> >      return 0;
>> >  }
>> >
>> > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
>> > +                                         u16 seg, u8 bus, u8 devfn)
>> {
>> > +    struct domain *d;
>> > +    struct pci_dev *pdev;
>> > +
>> > +    d = rcu_lock_domain_by_id(iommu->domid_map[did]);
>> > +    ASSERT(d);
>> 
>> Don't you need to acquire some lock in order to safely assert this?
> 
> Referred to the other use case of 'rcu_lock_domain_by_id()', Xen checks 
> whether
> The domain is 'NULL'. 
> Could I also replace the 'ASSERT(d)' with
>           + If ( d == NULL )
>           +   return;
> ?

At a first glance this doesn't look right, but in the end that's
something you need to answer.

>> Also note that unused slots hold zero, i.e. there's at least a theoretical 
> risk of
>> problems here when you don't also look at
>> iommu->domid_bitmap.
>> 
> I am not clear how to fix this point. Do you have good idea?
> Add a lock to 'iommu->domid_bitmap'?

How would a lock help avoiding mistaking an unused slot to mean
Dom0? As already suggested - I think you simply need to consult
the bitmap along with the domain ID array.

>> > +            {
>> > +                list_del(&pdev->domain_list);
>> 
>> This should happen under pcidevs_lock - you need to either acquire it or
>> ASSERT() it being held.
>> 
> 
> I should check whether it is under pcidevs_lock -- with 
> spin_is_locked(&pcidevs_lock)
> If it is not under pcidevs_lock, I should acquire it.
> I think ASSERT() is not a good idea. Hypervisor acquires this lock and then 
> remove the resource.

I don't understand this last sentence.

>> > @@ -349,9 +416,18 @@ static int flush_iotlb_qi(
>> >          queue_invalidate_iotlb(iommu,
>> >                                 type >>
>> DMA_TLB_FLUSH_GRANU_OFFSET, dr,
>> >                                 dw, did, size_order, 0, addr);
>> > +
>> > +        /*
>> > +         * Synchronize with hardware for invalidation request descriptors
>> > +         * submitted before Device-TLB invalidate descriptor.
>> > +         */
>> > +        rc = invalidate_sync(iommu);
>> > +        if ( rc )
>> > +             return rc;
>> > +
>> >          if ( flush_dev_iotlb )
>> >              ret = dev_invalidate_iotlb(iommu, did, addr, size_order, type);
>> > -        rc = invalidate_sync(iommu);
>> > +
>> >          if ( !ret )
>> >              ret = rc;
>> >      }
>> 
>> This change is because of ...?
>> 
> As Kevin's comments,
> I add 'dev_invalidate_iotlb_sync()' to dev_invalidate_iotlb(). 
> Then i can know which device is flush timeout.

I don't understand: How is your reply related to you moving up the
invocation of invalidate_sync()?

>> Plus logging the error code and device coordinates might turn out helpful in
>> such cases. But first of all - if there was a timeout, we'd make it here only for
>> Dom0. Perhaps the printing should move into an else to the domain_crash()?
>> And if there was another error, the message would be outright wrong.
>> 
> IMO, the print for Dom0 is enough.
> I think it is not a good idea to move into an else to domain_crash(). 
> Domain_crash is quite a common part.
> Anyway I can improve it in low priority.

At the very least the message should not end up being actively
misleading.

Jan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2016-01-15 13:09   ` Jan Beulich
@ 2016-01-20 10:26     ` Xu, Quan
  2016-01-20 11:29       ` Jan Beulich
  0 siblings, 1 reply; 35+ messages in thread
From: Xu, Quan @ 2016-01-20 10:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, george.dunlap, andrew.cooper3, tim,
	xen-devel, Nakajima, Jun, keir

Jan, thanks for your comments.

> On January 15, 2016 at 9:10, <JBeulich@suse.com> wrote:
> >>> On 23.12.15 at 09:25, <quan.xu@intel.com> wrote:
> > --- a/xen/drivers/passthrough/vtd/qinval.c
> > +++ b/xen/drivers/passthrough/vtd/qinval.c
> > @@ -190,9 +190,19 @@ static int queue_invalidate_wait(struct iommu
> > *iommu,  static int invalidate_sync(struct iommu *iommu)  {
> >      struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu);
> > +    int rc;
> >
> >      if ( qi_ctrl->qinval_maddr )
> > -        return queue_invalidate_wait(iommu, 0, 1, 1);
> > +    {
> > +        rc = queue_invalidate_wait(iommu, 0, 1, 1);
> > +        if ( rc )
> > +        {
> > +            if ( rc == -ETIMEDOUT )
> > +                panic("Queue invalidate wait descriptor was
> > + timeout.\n");
> 


> Unless I'm overlooking something this doesn't replace and existing panic(), and I
> think I had indicated quite clearly that this series shouldn't add any new ones,
> unless the alternative of crashing the owning domain is completely unfeasible.
> 


I will remove it in next v5.

[...]

> > @@ -229,6 +239,63 @@ int qinval_device_iotlb(struct iommu *iommu,
> >      return 0;
> >  }
> >
> > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
> > +                                         u16 seg, u8 bus, u8 devfn)
> {
> > +    struct domain *d;
> > +    struct pci_dev *pdev;
> > +
> > +    d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> > +    ASSERT(d);
> 
> Don't you need to acquire some lock in order to safely assert this?

Referred to the other use case of 'rcu_lock_domain_by_id()', Xen checks whether
The domain is 'NULL'. 
Could I also replace the 'ASSERT(d)' with
          + If ( d == NULL )
          +   return;
?

> Also note that unused slots hold zero, i.e. there's at least a theoretical risk of
> problems here when you don't also look at
> iommu->domid_bitmap.
> 
I am not clear how to fix this point. Do you have good idea?
Add a lock to 'iommu->domid_bitmap'?


> > +    for_each_pdev(d, pdev)
> > +    {
> > +        if ( (pdev->seg == seg) &&
> > +             (pdev->bus == bus) &&
> > +             (pdev->devfn == devfn) )
> > +        {
> > +            if ( pdev->domain )
> 
> If the device is on the domain's list, can this reasonably be false?

I can't find the false case.

> I.e. don't you rather mean ASSERT() here?

I'd better replace 'if' with ASSERT() for this case.

> 
> > +            {
> > +                list_del(&pdev->domain_list);
> 
> This should happen under pcidevs_lock - you need to either acquire it or
> ASSERT() it being held.
> 

I should check whether it is under pcidevs_lock -- with spin_is_locked(&pcidevs_lock)
If it is not under pcidevs_lock, I should acquire it.
I think ASSERT() is not a good idea. Hypervisor acquires this lock and then remove the resource.



> > +    }
> > +
> > +    if ( !is_hardware_domain(d) )
> > +        domain_crash(d);
> 
> I wonder whether the device hiding shouldn't also be avoided if the device is
> owned by hwdom.
> 

I think it should be avoided if the device is owned by hwdom.
Otherwise, it is quite similar to panic..


> > @@ -349,9 +416,18 @@ static int flush_iotlb_qi(
> >          queue_invalidate_iotlb(iommu,
> >                                 type >>
> DMA_TLB_FLUSH_GRANU_OFFSET, dr,
> >                                 dw, did, size_order, 0, addr);
> > +
> > +        /*
> > +         * Synchronize with hardware for invalidation request descriptors
> > +         * submitted before Device-TLB invalidate descriptor.
> > +         */
> > +        rc = invalidate_sync(iommu);
> > +        if ( rc )
> > +             return rc;
> > +
> >          if ( flush_dev_iotlb )
> >              ret = dev_invalidate_iotlb(iommu, did, addr, size_order, type);
> > -        rc = invalidate_sync(iommu);
> > +
> >          if ( !ret )
> >              ret = rc;
> >      }
> 
> This change is because of ...?
> 
As Kevin's comments,
I add 'dev_invalidate_iotlb_sync()' to dev_invalidate_iotlb(). 
Then i can know which device is flush timeout.



> > --- a/xen/drivers/passthrough/vtd/x86/ats.c
> > +++ b/xen/drivers/passthrough/vtd/x86/ats.c
> > @@ -162,6 +162,19 @@ int dev_invalidate_iotlb(struct iommu *iommu, u16
> did,
> >              return -EOPNOTSUPP;
> >          }
> >
> > +        /*
> > +         * Synchronize with hardware for Device-TLB invalidate
> > +         * descriptor.
> > +         */
> > +        ret = dev_invalidate_iotlb_sync(iommu, did, pdev->seg,
> > +                                        pdev->bus, pdev->devfn);
> > +        if ( ret )
> > +        {
> > +            dprintk(XENLOG_WARNING VTDPREFIX,
> > +                    "Device-TLB flush timeout.\n");
> 
> Are you aware that dprintk() compiles to nothing in non-debug builds?


To be honest, I was not aware.

> Plus logging the error code and device coordinates might turn out helpful in
> such cases. But first of all - if there was a timeout, we'd make it here only for
> Dom0. Perhaps the printing should move into an else to the domain_crash()?
> And if there was another error, the message would be outright wrong.
> 
IMO, the print for Dom0 is enough.
I think it is not a good idea to move into an else to domain_crash(). Domain_crash is quite a common part.
Anyway I can improve it in low priority.

Jan, thanks..
-Quan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2015-12-23  8:25 ` [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue Quan Xu
  2015-12-25  3:06   ` Tian, Kevin
@ 2016-01-15 13:09   ` Jan Beulich
  2016-01-20 10:26     ` Xu, Quan
  1 sibling, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2016-01-15 13:09 UTC (permalink / raw)
  To: Quan Xu
  Cc: kevin.tian, feng.wu, george.dunlap, andrew.cooper3, tim,
	xen-devel, jun.nakajima, keir

>>> On 23.12.15 at 09:25, <quan.xu@intel.com> wrote:
> --- a/xen/drivers/passthrough/vtd/qinval.c
> +++ b/xen/drivers/passthrough/vtd/qinval.c
> @@ -190,9 +190,19 @@ static int queue_invalidate_wait(struct iommu *iommu,
>  static int invalidate_sync(struct iommu *iommu)
>  {
>      struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu);
> +    int rc;
>  
>      if ( qi_ctrl->qinval_maddr )
> -        return queue_invalidate_wait(iommu, 0, 1, 1);
> +    {
> +        rc = queue_invalidate_wait(iommu, 0, 1, 1);
> +        if ( rc )
> +        {
> +            if ( rc == -ETIMEDOUT )
> +                panic("Queue invalidate wait descriptor was timeout.\n");

Unless I'm overlooking something this doesn't replace and existing
panic(), and I think I had indicated quite clearly that this series
shouldn't add any new ones, unless the alternative of crashing
the owning domain is completely unfeasible.

> +                panic("Queue invalidate wait descriptor was timeout.\n");
> +            return rc;
> +        }
> +    }
> +
>      return 0;
>  }

Please avoid introducing multiple return points when this can be
trivially avoided.

> @@ -229,6 +239,63 @@ int qinval_device_iotlb(struct iommu *iommu,
>      return 0;
>  }
>  
> +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
> +                                         u16 seg, u8 bus, u8 devfn)
> +{
> +    struct domain *d;
> +    struct pci_dev *pdev;
> +
> +    d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> +    ASSERT(d);

Don't you need to acquire some lock in order to safely assert this?
Also note that unused slots hold zero, i.e. there's at least a
theoretical risk of problems here when you don't also look at
iommu->domid_bitmap.

> +    for_each_pdev(d, pdev)
> +    {
> +        if ( (pdev->seg == seg) &&
> +             (pdev->bus == bus) &&
> +             (pdev->devfn == devfn) )
> +        {
> +            if ( pdev->domain )

If the device is on the domain's list, can this reasonably be false?
I.e. don't you rather mean ASSERT() here?

> +            {
> +                list_del(&pdev->domain_list);

This should happen under pcidevs_lock - you need to either acquire
it or ASSERT() it being held.

> +                pdev->domain = NULL;
> +            }
> +
> +            if ( pci_hide_device(bus, devfn) )
> +            {
> +                printk(XENLOG_ERR
> +                       "IOMMU hide device %04x:%02x:%02x error.",
> +                       seg, bus, devfn);
> +                break;
> +            }
> +
> +            break;
> +        }

Redundant breaks.

> +    }
> +
> +    if ( !is_hardware_domain(d) )
> +        domain_crash(d);

I wonder whether the device hiding shouldn't also be avoided if
the device is owned by hwdom.

> @@ -349,9 +416,18 @@ static int flush_iotlb_qi(
>          queue_invalidate_iotlb(iommu,
>                                 type >> DMA_TLB_FLUSH_GRANU_OFFSET, dr,
>                                 dw, did, size_order, 0, addr);
> +
> +        /*
> +         * Synchronize with hardware for invalidation request descriptors
> +         * submitted before Device-TLB invalidate descriptor.
> +         */
> +        rc = invalidate_sync(iommu);
> +        if ( rc )
> +             return rc;
> +
>          if ( flush_dev_iotlb )
>              ret = dev_invalidate_iotlb(iommu, did, addr, size_order, type);
> -        rc = invalidate_sync(iommu);
> +
>          if ( !ret )
>              ret = rc;
>      }

This change is because of ...?

> --- a/xen/drivers/passthrough/vtd/x86/ats.c
> +++ b/xen/drivers/passthrough/vtd/x86/ats.c
> @@ -162,6 +162,19 @@ int dev_invalidate_iotlb(struct iommu *iommu, u16 did,
>              return -EOPNOTSUPP;
>          }
>  
> +        /*
> +         * Synchronize with hardware for Device-TLB invalidate
> +         * descriptor.
> +         */
> +        ret = dev_invalidate_iotlb_sync(iommu, did, pdev->seg,
> +                                        pdev->bus, pdev->devfn);
> +        if ( ret )
> +        {
> +            dprintk(XENLOG_WARNING VTDPREFIX,
> +                    "Device-TLB flush timeout.\n");

Are you aware that dprintk() compiles to nothing in non-debug builds?
Plus logging the error code and device coordinates might turn out
helpful in such cases. But first of all - if there was a timeout, we'd
make it here only for Dom0. Perhaps the printing should move into
an else to the domain_crash()? And if there was another error, the
message would be outright wrong.

Jan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2015-12-25  3:06   ` Tian, Kevin
@ 2016-01-06 11:25     ` Xu, Quan
  0 siblings, 0 replies; 35+ messages in thread
From: Xu, Quan @ 2016-01-06 11:25 UTC (permalink / raw)
  To: Tian, Kevin, jbeulich
  Cc: Wu, Feng, Dong, Eddie, george.dunlap, andrew.cooper3, tim,
	xen-devel, Nakajima, Jun, keir

> On December 25 2015, 11:06 AM, <Tian, Kevin> wrote: 
> > From: Xu, Quan
> > Sent: Wednesday, December 23, 2015 4:26 PM
> >
> > Now if IOTLB/Context/IETC flush is timeout, panic hypervisor.
> > The coming patch set will fix it.
> 
> again, please adjust comment to reflect what this patch is doing, i.e. only about
> improvement on Device-TLB flush.
> 

Agreed.

> >
> > If Device-TLB flush is timeout, we'll hide the target ATS device and
> > crash the domain owning this ATS device.
> >
> > If impacted domain is hardware domain, just throw out a warning.
> >
> > The hided Device will be disallowed to be further assigned to any
> > domain.
> 
> hided Device -> hidden device
> 

Agreed.

[...]

> > diff --git a/xen/drivers/passthrough/vtd/qinval.c
> > b/xen/drivers/passthrough/vtd/qinval.c
> > index b227e4e..7330c5d 100644
> > --- a/xen/drivers/passthrough/vtd/qinval.c
> > +++ b/xen/drivers/passthrough/vtd/qinval.c
> > @@ -190,9 +190,19 @@ static int queue_invalidate_wait(struct iommu
> > *iommu,  static int invalidate_sync(struct iommu *iommu)  {
> >      struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu);
> > +    int rc;
> >
> >      if ( qi_ctrl->qinval_maddr )
> > -        return queue_invalidate_wait(iommu, 0, 1, 1);
> > +    {
> > +        rc = queue_invalidate_wait(iommu, 0, 1, 1);
> > +        if ( rc )
> > +        {
> > +            if ( rc == -ETIMEDOUT )
> > +                panic("Queue invalidate wait descriptor was
> timeout.\n");
> > +            return rc;
> > +        }
> > +    }
> > +
> 
> why do you still do panic here? w/ PATCH 1/3 you should return rc to caller
> directly, right?
> 

This panic is original in queue_invalidate_wait(). Patch 1/3 is just for Device-TLB flush error ( Actually it covers iotlb flush error).
If I fix VT-d iotlb/context/iec flush error to propagated callers, then I can remove this panic and return rc to propagated caller directly.



> > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
> > +                                         u16 seg, u8 bus, u8 devfn)
> {
> > +    struct domain *d;
> > +    struct pci_dev *pdev;
> > +
> > +    d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> > +    ASSERT(d);
> > +    for_each_pdev(d, pdev)
> > +    {
> > +        if ( (pdev->seg == seg) &&
> > +             (pdev->bus == bus) &&
> > +             (pdev->devfn == devfn) )
> > +        {
> > +            if ( pdev->domain )
> > +            {
> > +                list_del(&pdev->domain_list);
> > +                pdev->domain = NULL;
> > +            }
> > +
> > +            if ( pci_hide_device(bus, devfn) )
> > +            {
> > +                printk(XENLOG_ERR
> > +                       "IOMMU hide device %04x:%02x:%02x error.",
> > +                       seg, bus, devfn);
> > +                break;
> > +            }
> 
> I'm not sure whether using pci_hide_device is enough or not break other code
> path here? For example, now you have errors propagated to callers.

More information, after call pci_hide_device(),
..
         pdev->domain = dom_xen;
         list_add(&pdev->domain_list, &dom_xen->arch.pdev_list);
..

Hidden PCI devices are associated with 'dom_xen'.
Now I found it may not clear rmrr mapping / context mapping. .i.e iommu_remove_device() -- [hd->platform_ops is NULL for dom_xen].
But I think it is acceptable, IMO dom_xen is a part hypervisor, and there mappings would not a security issue.
Or I can remove these mappings before to hide this device.

So I think it will not break other code break other code.

> What's the
> final policy against flush error? 
>

If Device-TLB flush is timeout, we'll hide the target ATS device and crash the domain owning this ATS device.
If impacted domain is hardware domain, just throw out a warning, instead of crash the hardware domain.
The hided Device will be disallowed to be further assigned to any domain.



>If they will finally go to cleanup some more state
> relying on pdev->domain, then that code path might be broken. If it's fine to do
> cleanup at this point, then just hidden is not enough. If you look at
> pci_remove_device, there's quite some state to be further cleaned up...
> 


As the pdev->domain is 'dom_xen';
So for this case, it is still working. Correct me if it is not correct.
BTW, a quick question, which case to call pci_remove_device()?


> I didn't think about the full logic thoroughly now. But it would always be good
> to not hide device now until a point where all related states have been cleaned
> up in error handling path chained up.
> 
 
Quan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2015-12-23  8:25 ` [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue Quan Xu
@ 2015-12-25  3:06   ` Tian, Kevin
  2016-01-06 11:25     ` Xu, Quan
  2016-01-15 13:09   ` Jan Beulich
  1 sibling, 1 reply; 35+ messages in thread
From: Tian, Kevin @ 2015-12-25  3:06 UTC (permalink / raw)
  To: Xu, Quan, jbeulich
  Cc: Wu, Feng, Dong, Eddie, george.dunlap, andrew.cooper3, tim,
	xen-devel, Nakajima, Jun, keir

> From: Xu, Quan
> Sent: Wednesday, December 23, 2015 4:26 PM
> 
> Now if IOTLB/Context/IETC flush is timeout, panic hypervisor.
> The coming patch set will fix it.

again, please adjust comment to reflect what this patch is
doing, i.e. only about improvement on Device-TLB flush.

> 
> If Device-TLB flush is timeout, we'll hide the target ATS
> device and crash the domain owning this ATS device.
> 
> If impacted domain is hardware domain, just throw out a warning.
> 
> The hided Device will be disallowed to be further assigned to
> any domain.

hided Device -> hidden device

> 
> Signed-off-by: Quan Xu <quan.xu@intel.com>
> ---
>  xen/drivers/passthrough/pci.c         |  2 +-
>  xen/drivers/passthrough/vtd/extern.h  |  2 +
>  xen/drivers/passthrough/vtd/qinval.c  | 80
> ++++++++++++++++++++++++++++++++++-
>  xen/drivers/passthrough/vtd/x86/ats.c | 13 ++++++
>  4 files changed, 94 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 27b3ca7..2d7dc59 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -407,7 +407,7 @@ static void _pci_hide_device(struct pci_dev *pdev)
>      list_add(&pdev->domain_list, &dom_xen->arch.pdev_list);
>  }
> 
> -int __init pci_hide_device(int bus, int devfn)
> +int pci_hide_device(int bus, int devfn)
>  {
>      struct pci_dev *pdev;
>      int rc = -ENOMEM;
> diff --git a/xen/drivers/passthrough/vtd/extern.h
> b/xen/drivers/passthrough/vtd/extern.h
> index ec9c513..a2123ce 100644
> --- a/xen/drivers/passthrough/vtd/extern.h
> +++ b/xen/drivers/passthrough/vtd/extern.h
> @@ -58,6 +58,8 @@ int ats_device(const struct pci_dev *, const struct acpi_drhd_unit *);
> 
>  int dev_invalidate_iotlb(struct iommu *iommu, u16 did,
>                           u64 addr, unsigned int size_order, u64 type);
> +int dev_invalidate_iotlb_sync(struct iommu *iommu, u16 did,
> +                              u16 seg, u8 bus, u8 devfn);
> 
>  int qinval_device_iotlb(struct iommu *iommu,
>                          u32 max_invs_pend, u16 sid, u16 size, u64 addr);
> diff --git a/xen/drivers/passthrough/vtd/qinval.c b/xen/drivers/passthrough/vtd/qinval.c
> index b227e4e..7330c5d 100644
> --- a/xen/drivers/passthrough/vtd/qinval.c
> +++ b/xen/drivers/passthrough/vtd/qinval.c
> @@ -190,9 +190,19 @@ static int queue_invalidate_wait(struct iommu *iommu,
>  static int invalidate_sync(struct iommu *iommu)
>  {
>      struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu);
> +    int rc;
> 
>      if ( qi_ctrl->qinval_maddr )
> -        return queue_invalidate_wait(iommu, 0, 1, 1);
> +    {
> +        rc = queue_invalidate_wait(iommu, 0, 1, 1);
> +        if ( rc )
> +        {
> +            if ( rc == -ETIMEDOUT )
> +                panic("Queue invalidate wait descriptor was timeout.\n");
> +            return rc;
> +        }
> +    }
> +

why do you still do panic here? w/ PATCH 1/3 you should return rc
to caller directly, right?

>      return 0;
>  }
> 
> @@ -229,6 +239,63 @@ int qinval_device_iotlb(struct iommu *iommu,
>      return 0;
>  }
> 
> +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
> +                                         u16 seg, u8 bus, u8 devfn)
> +{
> +    struct domain *d;
> +    struct pci_dev *pdev;
> +
> +    d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> +    ASSERT(d);
> +    for_each_pdev(d, pdev)
> +    {
> +        if ( (pdev->seg == seg) &&
> +             (pdev->bus == bus) &&
> +             (pdev->devfn == devfn) )
> +        {
> +            if ( pdev->domain )
> +            {
> +                list_del(&pdev->domain_list);
> +                pdev->domain = NULL;
> +            }
> +
> +            if ( pci_hide_device(bus, devfn) )
> +            {
> +                printk(XENLOG_ERR
> +                       "IOMMU hide device %04x:%02x:%02x error.",
> +                       seg, bus, devfn);
> +                break;
> +            }

I'm not sure whether using pci_hide_device is enough or not break
other code path here? For example, now you have errors propagated
to callers. What's the final policy against flush error? If they will
finally go to cleanup some more state relying on pdev->domain, then
that code path might be broken. If it's fine to do cleanup at this point,
then just hidden is not enough. If you look at pci_remove_device, there's
quite some state to be further cleaned up...

I didn't think about the full logic thoroughly now. But it would always be
good to not hide device now until a point where all related states have
been cleaned up in error handling path chained up.

Thanks
Kevin 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
  2015-12-23  8:25 [PATCH v4 0/3] VT-d Device-TLB flush issue Quan Xu
@ 2015-12-23  8:25 ` Quan Xu
  2015-12-25  3:06   ` Tian, Kevin
  2016-01-15 13:09   ` Jan Beulich
  0 siblings, 2 replies; 35+ messages in thread
From: Quan Xu @ 2015-12-23  8:25 UTC (permalink / raw)
  To: jbeulich, kevin.tian
  Cc: feng.wu, eddie.dong, george.dunlap, andrew.cooper3, tim,
	xen-devel, jun.nakajima, Quan Xu, keir

Now if IOTLB/Context/IETC flush is timeout, panic hypervisor.
The coming patch set will fix it.

If Device-TLB flush is timeout, we'll hide the target ATS
device and crash the domain owning this ATS device.

If impacted domain is hardware domain, just throw out a warning.

The hided Device will be disallowed to be further assigned to
any domain.

Signed-off-by: Quan Xu <quan.xu@intel.com>
---
 xen/drivers/passthrough/pci.c         |  2 +-
 xen/drivers/passthrough/vtd/extern.h  |  2 +
 xen/drivers/passthrough/vtd/qinval.c  | 80 ++++++++++++++++++++++++++++++++++-
 xen/drivers/passthrough/vtd/x86/ats.c | 13 ++++++
 4 files changed, 94 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 27b3ca7..2d7dc59 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -407,7 +407,7 @@ static void _pci_hide_device(struct pci_dev *pdev)
     list_add(&pdev->domain_list, &dom_xen->arch.pdev_list);
 }
 
-int __init pci_hide_device(int bus, int devfn)
+int pci_hide_device(int bus, int devfn)
 {
     struct pci_dev *pdev;
     int rc = -ENOMEM;
diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
index ec9c513..a2123ce 100644
--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -58,6 +58,8 @@ int ats_device(const struct pci_dev *, const struct acpi_drhd_unit *);
 
 int dev_invalidate_iotlb(struct iommu *iommu, u16 did,
                          u64 addr, unsigned int size_order, u64 type);
+int dev_invalidate_iotlb_sync(struct iommu *iommu, u16 did,
+                              u16 seg, u8 bus, u8 devfn);
 
 int qinval_device_iotlb(struct iommu *iommu,
                         u32 max_invs_pend, u16 sid, u16 size, u64 addr);
diff --git a/xen/drivers/passthrough/vtd/qinval.c b/xen/drivers/passthrough/vtd/qinval.c
index b227e4e..7330c5d 100644
--- a/xen/drivers/passthrough/vtd/qinval.c
+++ b/xen/drivers/passthrough/vtd/qinval.c
@@ -190,9 +190,19 @@ static int queue_invalidate_wait(struct iommu *iommu,
 static int invalidate_sync(struct iommu *iommu)
 {
     struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu);
+    int rc;
 
     if ( qi_ctrl->qinval_maddr )
-        return queue_invalidate_wait(iommu, 0, 1, 1);
+    {
+        rc = queue_invalidate_wait(iommu, 0, 1, 1);
+        if ( rc )
+        {
+            if ( rc == -ETIMEDOUT )
+                panic("Queue invalidate wait descriptor was timeout.\n");
+            return rc;
+        }
+    }
+
     return 0;
 }
 
@@ -229,6 +239,63 @@ int qinval_device_iotlb(struct iommu *iommu,
     return 0;
 }
 
+static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
+                                         u16 seg, u8 bus, u8 devfn)
+{
+    struct domain *d;
+    struct pci_dev *pdev;
+
+    d = rcu_lock_domain_by_id(iommu->domid_map[did]);
+    ASSERT(d);
+    for_each_pdev(d, pdev)
+    {
+        if ( (pdev->seg == seg) &&
+             (pdev->bus == bus) &&
+             (pdev->devfn == devfn) )
+        {
+            if ( pdev->domain )
+            {
+                list_del(&pdev->domain_list);
+                pdev->domain = NULL;
+            }
+
+            if ( pci_hide_device(bus, devfn) )
+            {
+                printk(XENLOG_ERR
+                       "IOMMU hide device %04x:%02x:%02x error.",
+                       seg, bus, devfn);
+                break;
+            }
+
+            break;
+        }
+    }
+
+    if ( !is_hardware_domain(d) )
+        domain_crash(d);
+    rcu_unlock_domain(d);
+}
+
+int dev_invalidate_iotlb_sync(struct iommu *iommu, u16 did,
+                              u16 seg, u8 bus, u8 devfn)
+{
+    struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu);
+    int rc;
+
+    if ( qi_ctrl->qinval_maddr )
+    {
+        rc = queue_invalidate_wait(iommu, 0, 1, 1);
+        if ( rc )
+        {
+            if ( rc == -ETIMEDOUT )
+                dev_invalidate_iotlb_timeout(iommu, did, seg, bus, devfn);
+            return rc;
+        }
+    }
+
+    return 0;
+}
+
 static void queue_invalidate_iec(struct iommu *iommu, u8 granu, u8 im, u16 iidx)
 {
     unsigned long flags;
@@ -349,9 +416,18 @@ static int flush_iotlb_qi(
         queue_invalidate_iotlb(iommu,
                                type >> DMA_TLB_FLUSH_GRANU_OFFSET, dr,
                                dw, did, size_order, 0, addr);
+
+        /*
+         * Synchronize with hardware for invalidation request descriptors
+         * submitted before Device-TLB invalidate descriptor.
+         */
+        rc = invalidate_sync(iommu);
+        if ( rc )
+             return rc;
+
         if ( flush_dev_iotlb )
             ret = dev_invalidate_iotlb(iommu, did, addr, size_order, type);
-        rc = invalidate_sync(iommu);
+
         if ( !ret )
             ret = rc;
     }
diff --git a/xen/drivers/passthrough/vtd/x86/ats.c b/xen/drivers/passthrough/vtd/x86/ats.c
index 7c797f6..6299f52 100644
--- a/xen/drivers/passthrough/vtd/x86/ats.c
+++ b/xen/drivers/passthrough/vtd/x86/ats.c
@@ -162,6 +162,19 @@ int dev_invalidate_iotlb(struct iommu *iommu, u16 did,
             return -EOPNOTSUPP;
         }
 
+        /*
+         * Synchronize with hardware for Device-TLB invalidate
+         * descriptor.
+         */
+        ret = dev_invalidate_iotlb_sync(iommu, did, pdev->seg,
+                                        pdev->bus, pdev->devfn);
+        if ( ret )
+        {
+            dprintk(XENLOG_WARNING VTDPREFIX,
+                    "Device-TLB flush timeout.\n");
+            return ret;
+        }
+
         if ( !ret )
             ret = rc;
     }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2016-01-27 14:35 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-07  1:39 [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue Xu, Quan
2016-01-07 13:28 ` Jan Beulich
2016-01-07 13:46   ` Xu, Quan
2016-01-08  1:06     ` Xu, Quan
2016-01-13  6:12     ` Tian, Kevin
2016-01-13 11:02       ` Jan Beulich
2016-01-14  1:22         ` Tian, Kevin
2016-01-14  9:00           ` Jan Beulich
2016-01-14  9:12             ` Tian, Kevin
2016-01-14 10:46               ` Xu, Quan
2016-01-18 15:32                 ` Tim Deegan
2016-01-19  0:46                   ` Tian, Kevin
2016-01-19  6:54                   ` Xu, Quan
  -- strict thread matches above, loose matches on Subject: below --
2015-12-23  8:25 [PATCH v4 0/3] VT-d Device-TLB flush issue Quan Xu
2015-12-23  8:25 ` [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue Quan Xu
2015-12-25  3:06   ` Tian, Kevin
2016-01-06 11:25     ` Xu, Quan
2016-01-15 13:09   ` Jan Beulich
2016-01-20 10:26     ` Xu, Quan
2016-01-20 11:29       ` Jan Beulich
2016-01-21 16:16         ` Xu, Quan
2016-01-21 16:31           ` Jan Beulich
2016-01-22 15:57             ` Xu, Quan
2016-01-25 14:09               ` Jan Beulich
2016-01-26 13:47                 ` Xu, Quan
2016-01-26 13:59                   ` Jan Beulich
2016-01-26 15:27                     ` Xu, Quan
2016-01-26 15:52                       ` Jan Beulich
2016-01-26 22:47                         ` Tian, Kevin
2016-01-27 11:09                           ` Xu, Quan
2016-01-27 11:24                             ` Jan Beulich
2016-01-27 12:38                               ` Xu, Quan
2016-01-27 13:15                                 ` Jan Beulich
2016-01-27 14:13                                   ` Xu, Quan
2016-01-27 14:29                                     ` Jan Beulich
2016-01-27 14:35                                       ` Xu, Quan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.