* VT-d flush timeout
@ 2014-08-18  2:01 Zhang, Yang Z
  2014-08-18  9:47 ` Andrew Cooper
  2014-08-18 12:48 ` Jan Beulich
  0 siblings, 2 replies; 21+ messages in thread
From: Zhang, Yang Z @ 2014-08-18  2:01 UTC (permalink / raw)
  To: xen-devel
  Cc: Li, LiangX Z, Andrew Cooper, Tian, Kevin,
	'Jan Beulich (JBeulich@suse.com)'

Hi all

This continues the previous discussion about the VT-d spin loop. As agreed there, we will deal with the current 1-second flush timeout first.

After reviewing the Linux IOMMU code, I see that it uses the timeout mechanism widely, e.g., flushing the IOTLB and the context cache via the register-based mechanism in
__iommu_flush_context():
    /* Make sure hardware complete it */
    IOMMU_WAIT_OP(iommu, DMAR_CCMD_REG,
       dmar_readq, (!(val & DMA_CCMD_ICC)), val);
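
(For readers less familiar with that macro: it is essentially a bounded poll of a status bit, roughly like the sketch below. This is only an illustration -- read_reg(), now_ns() and FLUSH_TIMEOUT_NS are placeholder names, not real kernel symbols.)

    /* Illustrative sketch of a bounded poll; helper names are placeholders. */
    u64 val;
    u64 deadline = now_ns() + FLUSH_TIMEOUT_NS;

    for (;;) {
        val = read_reg(iommu, DMAR_CCMD_REG);  /* re-read the command register */
        if (!(val & DMA_CCMD_ICC))             /* hardware cleared the bit: done */
            break;
        if (now_ns() > deadline)               /* bounded: give up after the timeout */
            panic("IOMMU context-cache flush timed out");
        cpu_relax();
    }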

The only place it doesn't use this timeout mechanism is queued invalidation. I think the reason is that the maximum number of queue entries is 2^15 and we don't really know how much time is needed to flush 2^15 entries, so it is better not to use a timeout there. Likewise, on the Xen side, we would only remove the timeout in the QI flush function and spin instead.
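
(Expressed as code, the proposal amounts to polling the completion indicator without any deadline, roughly as sketched below; qinval_completed() is a placeholder for however completion of the wait descriptor is detected, not an existing Xen function.)

    /* Illustrative sketch of the proposed unbounded QI wait; names are placeholders. */
    static void qi_wait_for_completion(struct vtd_iommu *iommu)
    {
        while ( !qinval_completed(iommu) )  /* no timeout: trust hardware to finish */
            cpu_relax();
    }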

Any comments?

best regards
yang


* Re: VT-d flush timeout
  2014-08-18  2:01 VT-d flush timeout Zhang, Yang Z
@ 2014-08-18  9:47 ` Andrew Cooper
  2014-08-18 12:48 ` Jan Beulich
  1 sibling, 0 replies; 21+ messages in thread
From: Andrew Cooper @ 2014-08-18  9:47 UTC (permalink / raw)
  To: Zhang, Yang Z, xen-devel
  Cc: Li, LiangX Z, Tian, Kevin, 'Jan Beulich (JBeulich@suse.com)'

On 18/08/14 03:01, Zhang, Yang Z wrote:
> Hi all
>
> This is continuing with previous discussion about VT-d spin loop. According previous discussion, we will deal with current 1 second flush timeout firstly.
>
> After reviewing Linux IOMMU code, it uses the timeout mechanism widely, e.g., flush iotlb and context via register based mechanism,
> __iommu_flush_context():
>     /* Make sure hardware complete it */
>     IOMMU_WAIT_OP(iommu, DMAR_CCMD_REG,
>        dmar_readq, (!(val & DMA_CCMD_ICC)), val);
>
> The only place it doesn't use this timeout mechanism is queue based invalidation. I think the reason is that the max number of queue entry is 2^15 and we don't know how much time is needed really to flush 2^15 entries. So it is better to not use timeout here. Likewise, for Xen side, we will only remove the timeout in qi flush function and use spin for instead. 
>
> Any comments?

Waiting 1 second for a timeout is quite antisocial, although since a
panic() is the result, the system wasn't going to stay alive anyway.

However, synchronously waiting for a flush is not acceptable.  As
identified in the Citrix/Intel monthly meetings, there is hardware which
takes milliseconds to reply to a flush.  This is a meaningful fraction
of the default scheduling timeslice.

It is my strong opinion that all spin loops like this need to be made
asynchronous, unless we know for certain that there is an upper bound
measured in a very small quantity of microseconds, where rescheduling
another vcpu might be a poor decision.
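
(As a rough illustration of the pattern meant here -- spin only for a small,
known bound and otherwise defer -- where flush_done() and
defer_flush_completion() are invented names, not existing Xen functions:)

    /* Illustrative fragment only; the flush_* names are invented. */
    static void wait_or_defer(struct vtd_iommu *iommu)
    {
        s_time_t deadline = NOW() + MICROSECS(10);  /* tiny, known upper bound */

        while ( !flush_done(iommu) && NOW() < deadline )
            cpu_relax();                            /* burning a few microseconds is fine */

        if ( !flush_done(iommu) )
            defer_flush_completion(iommu);          /* otherwise complete asynchronously */
    }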

~Andrew


* Re: VT-d flush timeout
  2014-08-18  2:01 VT-d flush timeout Zhang, Yang Z
  2014-08-18  9:47 ` Andrew Cooper
@ 2014-08-18 12:48 ` Jan Beulich
  2014-08-19  1:34   ` Zhang, Yang Z
  1 sibling, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2014-08-18 12:48 UTC (permalink / raw)
  To: yang.z.zhang; +Cc: liangx.z.li, andrew.cooper3, kevin.tian, xen-devel

>>> "Zhang, Yang Z" <yang.z.zhang@intel.com> 08/18/14 4:01 AM >>>
>After reviewing Linux IOMMU code, it uses the timeout mechanism widely,
>e.g., flush iotlb and context via register based mechanism,
>__iommu_flush_context():
>    /* Make sure hardware complete it */
>    IOMMU_WAIT_OP(iommu, DMAR_CCMD_REG,
>       dmar_readq, (!(val & DMA_CCMD_ICC)), val);

I don't think using Linux as a reference is valid here, especially when Linux
behaves as wrongly as Xen currently does (spinning).

>The only place it doesn't use this timeout mechanism is queue based
>invalidation. I think the reason is that the max number of queue entry is
>2^15 and we don't know how much time is needed really to flush 2^15
>entries. So it is better to not use timeout here. Likewise, for Xen side, we
>will only remove the timeout in qi flush function and use spin for instead. 

What would that buy us? You spin either way. According to what was
discussed on our weekly calls so far, the intention for a first step was to
reduce the current 1s timeout to a value large enough to cover what the
IOMMU really requires (in the non-ATS cases only of course). The second
more involved step would then be to get rid of all the spinning where, as
Andrew already said, the maximum time period to wait is longer than a
few microseconds.

Jan


* Re: VT-d flush timeout
  2014-08-18 12:48 ` Jan Beulich
@ 2014-08-19  1:34   ` Zhang, Yang Z
  2014-08-19 13:02     ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Zhang, Yang Z @ 2014-08-19  1:34 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Li, LiangX Z, andrew.cooper3, Tian, Kevin, xen-devel

Jan Beulich wrote on 2014-08-18:
>>>> "Zhang, Yang Z" <yang.z.zhang@intel.com> 08/18/14 4:01 AM >>>
>> After reviewing Linux IOMMU code, it uses the timeout mechanism
>> widely, e.g., flush iotlb and context via register based mechanism,
>> __iommu_flush_context():
>> /* Make sure hardware complete it */
>> IOMMU_WAIT_OP(iommu, DMAR_CCMD_REG,
>> dmar_readq, (!(val & DMA_CCMD_ICC)), val);
> 
> I don't think using Linux as a reference is valid here, especially
> when Linux behaves as wrongly as Xen currently does (spinning).
> 
>> The only place it doesn't use this timeout mechanism is queue based
>> invalidation. I think the reason is that the max number of queue
>> entry is
>> 2^15 and we don't know how much time is needed really to flush 2^15
>> entries. So it is better to not use timeout here. Likewise, for Xen
>> side, we will only remove the timeout in qi flush function and use
>> spin for
> instead.
> 
> What would that buy us? You spin either way. According what was
> discussed on our weekly calls so far, the intention for a first step
> was to reduce the current 1s timeout to a value large enough to cover
> what the IOMMU really requires (in the non-ATS cases only of course).

I don't get your point. What's the difference between 1s and the large enough value? Since hardware completes the flush quickly, it will never spin longer than the real flush time in the normal case. The 1s spin only happens in the abnormal case.
My only concern is that, for QI flush, the spin time depends on the length of the queue. I am not sure whether 1s is enough for the worst case, so I think we should remove the 1s timeout in the QI flush. And I think this is also the reason why Linux doesn't use a timeout mechanism in its QI flush.

> The second more involved step would then be to get rid of all the
> spinning where, as Andrew already said, the maximum time period to wait is longer than a few microseconds.

Agreed. In the future, we should adopt a more reasonable approach, such as an asynchronous flush.

> 
> Jan


Best regards,
Yang


* Re: VT-d flush timeout
  2014-08-19  1:34   ` Zhang, Yang Z
@ 2014-08-19 13:02     ` Jan Beulich
  2014-08-21  3:16       ` Zhang, Yang Z
  2014-09-25  1:02       ` Dong, Eddie
  0 siblings, 2 replies; 21+ messages in thread
From: Jan Beulich @ 2014-08-19 13:02 UTC (permalink / raw)
  To: yang.z.zhang; +Cc: liangx.z.li, andrew.cooper3, kevin.tian, xen-devel

>>> "Zhang, Yang Z" <yang.z.zhang@intel.com> 08/19/14 3:34 AM >>>
>Jan Beulich wrote on 2014-08-18:
>>>>> "Zhang, Yang Z" <yang.z.zhang@intel.com> 08/18/14 4:01 AM >>>
>>> The only place it doesn't use this timeout mechanism is queue based
>>> invalidation. I think the reason is that the max number of queue
>>> entry is
>>> 2^15 and we don't know how much time is needed really to flush 2^15
>>> entries. So it is better to not use timeout here. Likewise, for Xen
>>> side, we will only remove the timeout in qi flush function and use
>>> spin for
>>> instead.
>> 
>> What would that buy us? You spin either way. According what was
>> discussed on our weekly calls so far, the intention for a first step
>> was to reduce the current 1s timeout to a value large enough to cover
>> what the IOMMU really requires (in the non-ATS cases only of course).
>
>I don't get your point. What's the different between the 1s and the large
>enough value? Since hardware completed the flush quickly, it will never
>spin more than real flush time in normal case. 1s spin only happens in
>the abnormal case.

Right, but we can't ignore this abnormal case. In particular we can't exclude
that a DomU with a device assigned may have ways to (perhaps indirectly)
affect the completion time of the flushes.

>My only concern is that, for QI flush, the spin time relies on the length of
>the queue. I am not sure whether 1s is enough for worst case and I think
>we should remove the 1s in QI flush. And I think this also the same reason
>for Linux don't use timeout mechanism in QI flush.

First of all I think both Linux and Xen in the majority of cases wait for
completion of just individual queue entries. I.e. I'm not sure if the practical
worst case really is equal to the theoretical one. And then removing a
timeout just to allow _longer_ spinning isn't really a step forward. If the
timeout isn't big enough, the only solution is to immediately replace it with
asynchronous handling.

Jan


* Re: VT-d flush timeout
  2014-08-19 13:02     ` Jan Beulich
@ 2014-08-21  3:16       ` Zhang, Yang Z
  2014-08-22  7:33         ` Jan Beulich
  2014-09-25  1:02       ` Dong, Eddie
  1 sibling, 1 reply; 21+ messages in thread
From: Zhang, Yang Z @ 2014-08-21  3:16 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Li, LiangX Z, andrew.cooper3, Tian, Kevin, xen-devel

Jan Beulich wrote on 2014-08-19:
>>>> "Zhang, Yang Z" <yang.z.zhang@intel.com> 08/19/14 3:34 AM >>>
>> Jan Beulich wrote on 2014-08-18:
>>>>>> "Zhang, Yang Z" <yang.z.zhang@intel.com> 08/18/14 4:01 AM >>>
>>>> The only place it doesn't use this timeout mechanism is queue
>>>> based invalidation. I think the reason is that the max number of
>>>> queue entry is
>>>> 2^15 and we don't know how much time is needed really to flush
>>>> 2^15 entries. So it is better to not use timeout here. Likewise,
>>>> for Xen side, we will only remove the timeout in qi flush function
>>>> and use spin for instead.
>>> 
>>> What would that buy us? You spin either way. According what was
>>> discussed on our weekly calls so far, the intention for a first
>>> step was to reduce the current 1s timeout to a value large enough
>>> to cover what the IOMMU really requires (in the non-ATS cases only of course).
>> 
>> I don't get your point. What's the different between the 1s and the
>> large enough value? Since hardware completed the flush quickly, it
>> will never spin more than real flush time in normal case. 1s spin
>> only happens in the abnormal case.
> 
> Right, but we can't ignore this abnormal case. In particular we can't
> exclude that a DomU with a device assigned may have ways to (perhaps
> indirectly) affect the completion time of the flushes.

I don't think the timeout value helps in this case.

> 
>> My only concern is that, for QI flush, the spin time relies on the
>> length of the queue. I am not sure whether 1s is enough for worst
>> case and I think we should remove the 1s in QI flush. And I think
>> this also the same reason for Linux don't use timeout mechanism in QI flush.
> 
> First of all I think both Linux and Xen in the majority of cases waits
> for completion of just individual queue entries. I.e. I'm not sure if
> the practical worst case really is equal to the theoretical one. And

This is my guess from Linux's implementation, but it may be wrong.

> then removing a timeout just to allow _longer_ spinning isn't really a
> step forward. If the timeout isn't big enough, the only solution is to
> immediately replace it with asynchronous handling.

Agreed. I think it is better to leave it as it is until we have the asynchronous handling.

> 
> Jan


Best regards,
Yang


* Re: VT-d flush timeout
  2014-08-21  3:16       ` Zhang, Yang Z
@ 2014-08-22  7:33         ` Jan Beulich
  2014-08-22  7:49           ` Zhang, Yang Z
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2014-08-22  7:33 UTC (permalink / raw)
  To: Yang Z Zhang; +Cc: LiangX Z Li, andrew.cooper3, Kevin Tian, xen-devel

>>> On 21.08.14 at 05:16, <yang.z.zhang@intel.com> wrote:
> Jan Beulich wrote on 2014-08-19:
>>>>> "Zhang, Yang Z" <yang.z.zhang@intel.com> 08/19/14 3:34 AM >>>
>>> My only concern is that, for QI flush, the spin time relies on the
>>> length of the queue. I am not sure whether 1s is enough for worst
>>> case and I think we should remove the 1s in QI flush. And I think
>>> this also the same reason for Linux don't use timeout mechanism in QI flush.
>> 
>> First of all I think both Linux and Xen in the majority of cases waits
>> for completion of just individual queue entries. I.e. I'm not sure if
>> the practical worst case really is equal to the theoretical one. And
> 
> This is my guessing from Linux's implementation but may wrong.

Which is why we ask you (the VT-d maintainer) to, as a first
step, supply a patch limiting the spinning time to a value smaller
than the current one, just enough to cover real requirements. The
second step then ought to be to rework the code to use
asynchronous operation (presumably including making use of the
respective IOMMU interrupt).
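
(For that second step, the VT-d queued-invalidation interface lets an
invalidation wait descriptor raise an interrupt on completion, so the
rework could look roughly like the sketch below -- every identifier
here is an invented placeholder, not existing Xen code:)

    /* Illustrative async flow; all names below are invented placeholders. */
    static int qi_flush_async(struct vtd_iommu *iommu, struct flush_req *req)
    {
        qi_submit_iotlb_inv(iommu, req);       /* queue the flush descriptor */
        qi_submit_wait_irq(iommu, req);        /* wait descriptor with IF=1 */
        list_add_tail(&req->list, &iommu->pending_flushes);
        return -EINPROGRESS;                   /* caller defers, e.g. reschedules */
    }

    /* Invalidation-completion interrupt handler (sketch). */
    static void qi_completion_irq(struct vtd_iommu *iommu)
    {
        struct flush_req *req;

        while ( (req = next_completed_flush(iommu)) != NULL )
            complete_flush(req);               /* resume whatever was waiting */
    }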

Jan


* Re: VT-d flush timeout
  2014-08-22  7:33         ` Jan Beulich
@ 2014-08-22  7:49           ` Zhang, Yang Z
  2014-08-22  7:58             ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Zhang, Yang Z @ 2014-08-22  7:49 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Li, LiangX Z, andrew.cooper3, Tian, Kevin, xen-devel

Jan Beulich wrote on 2014-08-22:
>>>> On 21.08.14 at 05:16, <yang.z.zhang@intel.com> wrote:
>> Jan Beulich wrote on 2014-08-19:
>>>>>> "Zhang, Yang Z" <yang.z.zhang@intel.com> 08/19/14 3:34 AM >>>
>>>> My only concern is that, for QI flush, the spin time relies on the
>>>> length of the queue. I am not sure whether 1s is enough for worst
>>>> case and I think we should remove the 1s in QI flush. And I think
>>>> this also the same reason for Linux don't use timeout mechanism in
>>>> QI
> flush.
>>> 
>>> First of all I think both Linux and Xen in the majority of cases
>>> waits for completion of just individual queue entries. I.e. I'm not
>>> sure if the practical worst case really is equal to the theoretical
>>> one. And
>> 
>> This is my guessing from Linux's implementation but may wrong.
> 
> Which is why we ask for you (the VT-d maintainer) to, as a first step,
> supply a patch limiting the spinning time to a value smaller than the
> current on, just enough to cover real requirements. The second step

This doesn't answer my question. I still don't see why a smaller value helps.

> then ought to be to rework the code to use asynchronous operation
> (presumably including making use of the respective IOMMU interrupt).
> 
> Jan


Best regards,
Yang


* Re: VT-d flush timeout
  2014-08-22  7:49           ` Zhang, Yang Z
@ 2014-08-22  7:58             ` Jan Beulich
  2014-08-22  8:05               ` Zhang, Yang Z
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2014-08-22  7:58 UTC (permalink / raw)
  To: Yang Z Zhang
  Cc: LiangX Z Li, andrew.cooper3, Kevin Tian, Donald D Dugger, xen-devel

>>> On 22.08.14 at 09:49, <yang.z.zhang@intel.com> wrote:
> Jan Beulich wrote on 2014-08-22:
>>>>> On 21.08.14 at 05:16, <yang.z.zhang@intel.com> wrote:
>>> Jan Beulich wrote on 2014-08-19:
>>>>>>> "Zhang, Yang Z" <yang.z.zhang@intel.com> 08/19/14 3:34 AM >>>
>>>>> My only concern is that, for QI flush, the spin time relies on the
>>>>> length of the queue. I am not sure whether 1s is enough for worst
>>>>> case and I think we should remove the 1s in QI flush. And I think
>>>>> this also the same reason for Linux don't use timeout mechanism in
>>>>> QI
>> flush.
>>>> 
>>>> First of all I think both Linux and Xen in the majority of cases
>>>> waits for completion of just individual queue entries. I.e. I'm not
>>>> sure if the practical worst case really is equal to the theoretical
>>>> one. And
>>> 
>>> This is my guessing from Linux's implementation but may wrong.
>> 
>> Which is why we ask for you (the VT-d maintainer) to, as a first step,
>> supply a patch limiting the spinning time to a value smaller than the
>> current on, just enough to cover real requirements. The second step
> 
> This doesn't answer my question. I still don't see why a smaller value 
> helps.

Because it reduces the impact the current, large value would have
in misbehaving cases? Don clearly indicated to me that it shouldn't
be a big deal to reduce the current timeout, so I really don't see what
all this argument is about.

Jan


* Re: VT-d flush timeout
  2014-08-22  7:58             ` Jan Beulich
@ 2014-08-22  8:05               ` Zhang, Yang Z
  2014-08-25 17:21                 ` Dugger, Donald D
  0 siblings, 1 reply; 21+ messages in thread
From: Zhang, Yang Z @ 2014-08-22  8:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Li, LiangX Z, andrew.cooper3, Tian, Kevin, Dugger, Donald D, xen-devel

Jan Beulich wrote on 2014-08-22:
>>>> On 22.08.14 at 09:49, <yang.z.zhang@intel.com> wrote:
>> Jan Beulich wrote on 2014-08-22:
>>>>>> On 21.08.14 at 05:16, <yang.z.zhang@intel.com> wrote:
>>>> Jan Beulich wrote on 2014-08-19:
>>>>>>>> "Zhang, Yang Z" <yang.z.zhang@intel.com> 08/19/14 3:34 AM >>>
>>>>>> My only concern is that, for QI flush, the spin time relies on
>>>>>> the length of the queue. I am not sure whether 1s is enough for
>>>>>> worst case and I think we should remove the 1s in QI flush. And
>>>>>> I think this also the same reason for Linux don't use timeout
>>>>>> mechanism in QI
>>> flush.
>>>>> 
>>>>> First of all I think both Linux and Xen in the majority of cases
>>>>> waits for completion of just individual queue entries. I.e. I'm
>>>>> not sure if the practical worst case really is equal to the
>>>>> theoretical one. And
>>>> 
>>>> This is my guessing from Linux's implementation but may wrong.
>>> 
>>> Which is why we ask for you (the VT-d maintainer) to, as a first
>>> step, supply a patch limiting the spinning time to a value smaller
>>> than the current on, just enough to cover real requirements. The
>>> second step
>> 
>> This doesn't answer my question. I still don't see why a smaller
>> value helps.
> 
> Because it reduces the impact the currently large value would have in
> misbehaving cases? Don clearly indicated to me that it shouldn't be a
> big deal to reduce the current timeout, so I really don't see what all
> this argument is about.

I am not arguing. Since I was not in your meeting, I don't know the background of the decision, and I hope someone can answer my question.

> 
> Jan


Best regards,
Yang


* Re: VT-d flush timeout
  2014-08-22  8:05               ` Zhang, Yang Z
@ 2014-08-25 17:21                 ` Dugger, Donald D
  0 siblings, 0 replies; 21+ messages in thread
From: Dugger, Donald D @ 2014-08-25 17:21 UTC (permalink / raw)
  To: Zhang, Yang Z, Jan Beulich
  Cc: Li, LiangX Z, andrew.cooper3, Tian, Kevin, xen-devel

I think the point is that the current code is incorrect in all cases.  Currently we have a 1 second timeout which is too long (by a factor of 100) for the non-ATS case and too short (by a factor of 60, according to the PCI SIG spec) for the ATS case.

Given that a 10 msec timeout is sufficient for the non-ATS case, we should make the simple change of reducing the current timeout to 10 msec while we work on a more involved solution for the 60 second wait required in the ATS case.
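
(Expressed in code, the interim change could be as small as something like the following; the macro names are purely illustrative, not the identifiers actually used in Xen:)

    /* Illustrative names only; values per the discussion above. */
    #define IOMMU_FLUSH_TIMEOUT_NONATS  MILLISECS(10)  /* covers observed non-ATS hardware */
    #define IOMMU_FLUSH_TIMEOUT_ATS     SECONDS(60)    /* PCI SIG worst case; really wants async handling */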

--
Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
Ph: 303/443-3786

-----Original Message-----
From: Zhang, Yang Z 
Sent: Friday, August 22, 2014 2:06 AM
To: Jan Beulich
Cc: andrew.cooper3@citrix.com; Dugger, Donald D; Tian, Kevin; Li, LiangX Z; xen-devel@lists.xen.org
Subject: RE: RE: VT-d flush timeout

Jan Beulich wrote on 2014-08-22:
>>>> On 22.08.14 at 09:49, <yang.z.zhang@intel.com> wrote:
>> Jan Beulich wrote on 2014-08-22:
>>>>>> On 21.08.14 at 05:16, <yang.z.zhang@intel.com> wrote:
>>>> Jan Beulich wrote on 2014-08-19:
>>>>>>>> "Zhang, Yang Z" <yang.z.zhang@intel.com> 08/19/14 3:34 AM >>>
>>>>>> My only concern is that, for QI flush, the spin time relies on 
>>>>>> the length of the queue. I am not sure whether 1s is enough for 
>>>>>> worst case and I think we should remove the 1s in QI flush. And I 
>>>>>> think this also the same reason for Linux don't use timeout 
>>>>>> mechanism in QI
>>> flush.
>>>>> 
>>>>> First of all I think both Linux and Xen in the majority of cases 
>>>>> waits for completion of just individual queue entries. I.e. I'm 
>>>>> not sure if the practical worst case really is equal to the 
>>>>> theoretical one. And
>>>> 
>>>> This is my guessing from Linux's implementation but may wrong.
>>> 
>>> Which is why we ask for you (the VT-d maintainer) to, as a first 
>>> step, supply a patch limiting the spinning time to a value smaller 
>>> than the current on, just enough to cover real requirements. The 
>>> second step
>> 
>> This doesn't answer my question. I still don't see why a smaller 
>> value helps.
> 
> Because it reduces the impact the currently large value would have in 
> misbehaving cases? Don clearly indicated to me that it shouldn't be a 
> big deal to reduce the current timeout, so I really don't see what all 
> this argument is about.

I am not arguing. Since I was not in your meeting, I don't know the background of the decision, and I hope someone can answer my question.

> 
> Jan


Best regards,
Yang


* Re: VT-d flush timeout
  2014-08-19 13:02     ` Jan Beulich
  2014-08-21  3:16       ` Zhang, Yang Z
@ 2014-09-25  1:02       ` Dong, Eddie
  2014-09-25  8:55         ` Jan Beulich
  1 sibling, 1 reply; 21+ messages in thread
From: Dong, Eddie @ 2014-09-25  1:02 UTC (permalink / raw)
  To: Jan Beulich, Zhang, Yang Z
  Cc: Li, LiangX Z, andrew.cooper3, Tian, Kevin, Dong, Eddie, xen-devel

> >I don't get your point. What's the different between the 1s and the
> >large enough value? Since hardware completed the flush quickly, it will
> >never spin more than real flush time in normal case. 1s spin only
> >happens in the abnormal case.
> 
> Right, but we can't ignore this abnormal case. In particular we can't exclude
> that a DomU with a device assigned may have ways to (perhaps indirectly)
> affect the completion time of the flushes.
> 
> >My only concern is that, for QI flush, the spin time relies on the
> >length of the queue. I am not sure whether 1s is enough for worst case
> >and I think we should remove the 1s in QI flush. And I think this also
> >the same reason for Linux don't use timeout mechanism in QI flush.
> 
> First of all I think both Linux and Xen in the majority of cases waits for
> completion of just individual queue entries. I.e. I'm not sure if the practical
> worst case really is equal to the theoretical one. And then removing a
> timeout just to allow _longer_ spinning isn't really a step forward. If the
> timeout isn't big enough, the only solution is to immediately replace it with
> asynchronous handling.
> 

Given that this long wait-loop path only happens when the hardware fails, I don't care whether it panics after 1 second or after 10 ms. But software compatibility (for all existing platforms and potential future platforms) is much more important.

Replacing the wait-loop with an asynchronous handling mechanism is definitely good -- maybe put it on a TODO list for the IOMMU work, which I believe requires no small effort. But the main goal should be the normal code path, i.e. the IOTLB operation succeeding, no matter whether it takes 1ms or 10ms.

However, as for the timeout code path, given that the specification doesn't say what the hardware WORST case is, using a smaller number derived from practice is not a good choice to me. Nobody is able to test on all platforms, and it may fail on a future platform. Furthermore, the consequence of mis-predicting the hardware behavior is serious -- it leads to a hypervisor panic.  Of course 1 second is not a good value either, but at least it has been verified over the past years.

Thx Eddie


* Re: VT-d flush timeout
  2014-09-25  1:02       ` Dong, Eddie
@ 2014-09-25  8:55         ` Jan Beulich
  2014-09-25 10:44           ` Andrew Cooper
  2014-09-25 23:23           ` Dong, Eddie
  0 siblings, 2 replies; 21+ messages in thread
From: Jan Beulich @ 2014-09-25  8:55 UTC (permalink / raw)
  To: andrew.cooper3, Eddie Dong, Yang Z Zhang
  Cc: LiangX Z Li, Kevin Tian, xen-devel

>>> On 25.09.14 at 03:02, <eddie.dong@intel.com> wrote:
>> >I don't get your point. What's the different between the 1s and the
>> >large enough value? Since hardware completed the flush quickly, it will
>> >never spin more than real flush time in normal case. 1s spin only
>> >happens in the abnormal case.
>> 
>> Right, but we can't ignore this abnormal case. In particular we can't 
> exclude
>> that a DomU with a device assigned may have ways to (perhaps indirectly)
>> affect the completion time of the flushes.
>> 
>> >My only concern is that, for QI flush, the spin time relies on the
>> >length of the queue. I am not sure whether 1s is enough for worst case
>> >and I think we should remove the 1s in QI flush. And I think this also
>> >the same reason for Linux don't use timeout mechanism in QI flush.
>> 
>> First of all I think both Linux and Xen in the majority of cases waits for
>> completion of just individual queue entries. I.e. I'm not sure if the 
> practical
>> worst case really is equal to the theoretical one. And then removing a
>> timeout just to allow _longer_ spinning isn't really a step forward. If the
>> timeout isn't big enough, the only solution is to immediately replace it 
> with
>> asynchronous handling.
>> 
> 
> Giving this path of long time wait-loop only happens at the case when the 
> hardware fails, I don't care if it enters panic in 1 seconds or in 10ms 
> seconds. But the software compatibility (for all existing platforms and 
> potential future platforms) is much more important.

I agree for the paths leading to a panic(). But there's one such case
where it doesn't: snb_vtd_ops_preamble().

> Replacing wait-loop with an asynchronous handling mechanism is definitely 
> good -- maybe put an TODO list for the IOMMU stuff which I believe requiring 
> not small effort. But the main goal should be targeting the normal code path, 
> i.e. success of the IOTLB operation no matter it is 1ms or 10ms. 
> 
> However, as for the timeout code path, given that the specification doesn't 
> say what the hardware WORST case is, using practical smaller number is not a 
> good choice to me. Nobody is able to test on all platforms, and it may fail 
> in future platform. Further more, the result of the mis-prediction to the 
> hardware behavior is so serious-->leading to hypervisor panic.  Of course 
> 1second is not a good value too, but at least it is verified in the past 
> years.

Once again - the ATS spec talks of 60 seconds (with possibly another
30 seconds added on top depending on how you read it). So while for
the non-ATS case, provided the one case pointed out above gets dealt
with, I agree we don't need to bother changing the code, the ATS case
clearly makes asynchronous handling necessary. It is for that reason
that we decided to disable ATS support by default. Albeit now that I
think about it again, I'm not sure that was an appropriate action for
the AMD side of things - Andrew, what do you think?

Jan


* Re: VT-d flush timeout
  2014-09-25  8:55         ` Jan Beulich
@ 2014-09-25 10:44           ` Andrew Cooper
  2014-09-25 10:55             ` Jan Beulich
  2014-09-25 23:23           ` Dong, Eddie
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2014-09-25 10:44 UTC (permalink / raw)
  To: Jan Beulich, Eddie Dong, Yang Z Zhang; +Cc: LiangX Z Li, Kevin Tian, xen-devel

On 25/09/14 09:55, Jan Beulich wrote:
>>>> On 25.09.14 at 03:02, <eddie.dong@intel.com> wrote:
>>>> I don't get your point. What's the different between the 1s and the
>>>> large enough value? Since hardware completed the flush quickly, it will
>>>> never spin more than real flush time in normal case. 1s spin only
>>>> happens in the abnormal case.
>>> Right, but we can't ignore this abnormal case. In particular we can't 
>> exclude
>>> that a DomU with a device assigned may have ways to (perhaps indirectly)
>>> affect the completion time of the flushes.
>>>
>>>> My only concern is that, for QI flush, the spin time relies on the
>>>> length of the queue. I am not sure whether 1s is enough for worst case
>>>> and I think we should remove the 1s in QI flush. And I think this also
>>>> the same reason for Linux don't use timeout mechanism in QI flush.
>>> First of all I think both Linux and Xen in the majority of cases waits for
>>> completion of just individual queue entries. I.e. I'm not sure if the 
>> practical
>>> worst case really is equal to the theoretical one. And then removing a
>>> timeout just to allow _longer_ spinning isn't really a step forward. If the
>>> timeout isn't big enough, the only solution is to immediately replace it 
>> with
>>> asynchronous handling.
>>>
>> Giving this path of long time wait-loop only happens at the case when the 
>> hardware fails, I don't care if it enters panic in 1 seconds or in 10ms 
>> seconds. But the software compatibility (for all existing platforms and 
>> potential future platforms) is much more important.
> I agree for the paths leading to a panic(). But there's one such case
> where it doesn't: snb_vtd_ops_preamble().
>
>> Replacing wait-loop with an asynchronous handling mechanism is definitely 
>> good -- maybe put an TODO list for the IOMMU stuff which I believe requiring 
>> not small effort. But the main goal should be targeting the normal code path, 
>> i.e. success of the IOTLB operation no matter it is 1ms or 10ms. 
>>
>> However, as for the timeout code path, given that the specification doesn't 
>> say what the hardware WORST case is, using practical smaller number is not a 
>> good choice to me. Nobody is able to test on all platforms, and it may fail 
>> in future platform. Further more, the result of the mis-prediction to the 
>> hardware behavior is so serious-->leading to hypervisor panic.  Of course 
>> 1second is not a good value too, but at least it is verified in the past 
>> years.
> Once again - the ATS spec talks of 60 seconds (with possibly another
> 30 seconds added on top depending on how you read it). So while for
> the non-ATS case, provided the one case pointed out above gets dealt
> with, I agree we don't need to bother changing the code, the ATS case
> clearly makes asynchronous handling necessary. It is for that reason
> that we decided to disable ATS support by default. Albeit now that I
> think about it again, I'm not sure that was an appropriate action for
> the AMD side of things - Andrew, what do you think?
>
> Jan
>

The AMD code will wait for a period of time for the COMMAND_WAIT to
start, but not panic() on a timeout.

By my reading of the code, amd_iommu_flush_iotlb() can complete without
confirming that the iotlbs have actually been flushed, and all that is
left behind is an AMD_IOMMU_DEBUG() message.

~Andrew


* Re: VT-d flush timeout
  2014-09-25 10:44           ` Andrew Cooper
@ 2014-09-25 10:55             ` Jan Beulich
  2014-09-25 10:56               ` Andrew Cooper
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2014-09-25 10:55 UTC (permalink / raw)
  To: Aravind Gopalakrishnan, suravee.suthikulpanit, Andrew Cooper
  Cc: LiangX Z Li, Yang ZZhang, Kevin Tian, Eddie Dong, xen-devel

>>> On 25.09.14 at 12:44, <andrew.cooper3@citrix.com> wrote:
> The AMD code will wait for a period of time for the COMMAND_WAIT to
> start, but not panic() on a timeout.
> 
> By my reading of the code, amd_iommu_flush_iotlb() can complete without
> confirming that the iotlbs have actually been flushed, and all that is
> left behind is an AMD_IOMMU_DEBUG() message.

Now that you say this I recall having stumbled across this questionable
behavior in the past (and perhaps even more than once). Certainly
something for our AMD IOMMU maintainers to look into...

Jan


* Re: VT-d flush timeout
  2014-09-25 10:55             ` Jan Beulich
@ 2014-09-25 10:56               ` Andrew Cooper
  2014-09-25 11:48                 ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2014-09-25 10:56 UTC (permalink / raw)
  To: Jan Beulich, Aravind Gopalakrishnan, suravee.suthikulpanit
  Cc: LiangX Z Li, Yang ZZhang, Kevin Tian, Eddie Dong, xen-devel

On 25/09/14 11:55, Jan Beulich wrote:
>>>> On 25.09.14 at 12:44, <andrew.cooper3@citrix.com> wrote:
>> The AMD code will wait for a period of time for the COMMAND_WAIT to
>> start, but not panic() on a timeout.
>>
>> By my reading of the code, amd_iommu_flush_iotlb() can complete without
>> confirming that the iotlbs have actually been flushed, and all that is
>> left behind is an AMD_IOMMU_DEBUG() message.
> Now that you say this I recall having stumbled across this questionable
> behavior in the past (and perhaps even more than once). Certainly
> something for our AMD IOMMU maintainers to look into...
>
> Jan
>

Agreed, although for now I think it is safer to leave ATS disabled.

~Andrew


* Re: VT-d flush timeout
  2014-09-25 10:56               ` Andrew Cooper
@ 2014-09-25 11:48                 ` Jan Beulich
  2014-09-25 21:21                   ` Tian, Kevin
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2014-09-25 11:48 UTC (permalink / raw)
  To: Aravind Gopalakrishnan, suravee.suthikulpanit, Andrew Cooper
  Cc: LiangX Z Li, Yang ZZhang, Kevin Tian, Eddie Dong, xen-devel

>>> On 25.09.14 at 12:56, <andrew.cooper3@citrix.com> wrote:
> On 25/09/14 11:55, Jan Beulich wrote:
>>>>> On 25.09.14 at 12:44, <andrew.cooper3@citrix.com> wrote:
>>> The AMD code will wait for a period of time for the COMMAND_WAIT to
>>> start, but not panic() on a timeout.
>>>
>>> By my reading of the code, amd_iommu_flush_iotlb() can complete without
>>> confirming that the iotlbs have actually been flushed, and all that is
>>> left behind is an AMD_IOMMU_DEBUG() message.
>> Now that you say this I recall having stumbled across this questionable
>> behavior in the past (and perhaps even more than once). Certainly
>> something for our AMD IOMMU maintainers to look into...
> 
> Agreed, although for now I think it is safer to leave ATS disabled.

Fully agree.

Jan


* Re: VT-d flush timeout
  2014-09-25 11:48                 ` Jan Beulich
@ 2014-09-25 21:21                   ` Tian, Kevin
  0 siblings, 0 replies; 21+ messages in thread
From: Tian, Kevin @ 2014-09-25 21:21 UTC (permalink / raw)
  To: Jan Beulich, Aravind Gopalakrishnan, suravee.suthikulpanit,
	Andrew Cooper
  Cc: Li, LiangX Z, Zhang, Yang Z, Dong, Eddie, xen-devel

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, September 25, 2014 4:48 AM
> 
> >>> On 25.09.14 at 12:56, <andrew.cooper3@citrix.com> wrote:
> > On 25/09/14 11:55, Jan Beulich wrote:
> >>>>> On 25.09.14 at 12:44, <andrew.cooper3@citrix.com> wrote:
> >>> The AMD code will wait for a period of time for the COMMAND_WAIT to
> >>> start, but not panic() on a timeout.
> >>>
> >>> By my reading of the code, amd_iommu_flush_iotlb() can complete
> without
> >>> confirming that the iotlbs have actually been flushed, and all that is
> >>> left behind is an AMD_IOMMU_DEBUG() message.
> >> Now that you say this I recall having stumbled across this questionable
> >> behavior in the past (and perhaps even more than once). Certainly
> >> something for our AMD IOMMU maintainers to look into...
> >
> > Agreed, although for now I think it is safer to leave ATS disabled.
> 
> Fully agree.
> 

I agree with this, until we have the asynchronous wait scheme implemented.
And we don't need to change the timeout value for non-ATS cases, as discussed
earlier.

Thanks
Kevin


* Re: VT-d flush timeout
  2014-09-25  8:55         ` Jan Beulich
  2014-09-25 10:44           ` Andrew Cooper
@ 2014-09-25 23:23           ` Dong, Eddie
  2014-09-26  1:07             ` Zhang, Yang Z
  2014-09-26  6:30             ` Jan Beulich
  1 sibling, 2 replies; 21+ messages in thread
From: Dong, Eddie @ 2014-09-25 23:23 UTC (permalink / raw)
  To: Jan Beulich, andrew.cooper3, Zhang, Yang Z
  Cc: Li, LiangX Z, Tian, Kevin, Dong, Eddie, xen-devel

> >>
> >
> > Giving this path of long time wait-loop only happens at the case when
> > the hardware fails, I don't care if it enters panic in 1 seconds or in
> > 10ms seconds. But the software compatibility (for all existing
> > platforms and potential future platforms) is much more important.
> 
> I agree for the paths leading to a panic(). But there's one such case where it
> doesn't: snb_vtd_ops_preamble().
> 

For some reason, it is using the same MAGIC TIMEOUT value as the VT-d flush timeout.
They are actually different, and this one is used to prevent the IGD from entering the RC6 state :)

Whether the IGD enters the RC6 state within 1 second is another story; it doesn't indicate that the
hardware is bad, the way a VT-d flush timeout does. So it is not necessary to panic.

Maybe we should cook a patch to use a different TIMEOUT macro, even though the value is 1 second in both cases.
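
(Something along these lines, with purely illustrative names -- the values stay identical for now, but the two unrelated waits stop sharing one identifier:)

    /* Illustrative names only. */
    #define VTD_QI_FLUSH_TIMEOUT  MILLISECS(1000)  /* invalidation completion */
    #define IGD_WAKEUP_TIMEOUT    MILLISECS(1000)  /* IGD leaving RC6 in snb_vtd_ops_preamble() */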

Thx Eddie


* Re: VT-d flush timeout
  2014-09-25 23:23           ` Dong, Eddie
@ 2014-09-26  1:07             ` Zhang, Yang Z
  2014-09-26  6:30             ` Jan Beulich
  1 sibling, 0 replies; 21+ messages in thread
From: Zhang, Yang Z @ 2014-09-26  1:07 UTC (permalink / raw)
  To: Dong, Eddie, Jan Beulich, andrew.cooper3
  Cc: Li, LiangX Z, Tian, Kevin, xen-devel

Dong, Eddie wrote on 2014-09-26:
>>>> 
>>> 
>>> Giving this path of long time wait-loop only happens at the case
>>> when the hardware fails, I don't care if it enters panic in 1
>>> seconds or in 10ms seconds. But the software compatibility (for
>>> all existing platforms and potential future platforms) is much more important.
>> 
>> I agree for the paths leading to a panic(). But there's one such
>> case where it
>> doesn't: snb_vtd_ops_preamble().
>> 
> 
> For some reason, it is using a same MAGIC TIMEOUT value with VTd flush
> timeout. They are actually different and this one is used to prevent the
> IGD to enter RC6 state :)
> 
> Whether the IGD enters RC6 state in 1second is another story, it
> doesn't indicate the hardware is bad, like the VTD flush timeout
> indicates. So it is not necessary to enter panic.
> 
> Maybe we should cook a patch to use different TIMEOUT MACRO, though
> they are 1second in both cases.

Agreed. It's better to use another macro since it has nothing at all to do with VT-d.

> 
> Thx Eddie


Best regards,
Yang


* Re: VT-d flush timeout
  2014-09-25 23:23           ` Dong, Eddie
  2014-09-26  1:07             ` Zhang, Yang Z
@ 2014-09-26  6:30             ` Jan Beulich
  1 sibling, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2014-09-26  6:30 UTC (permalink / raw)
  To: andrew.cooper3, Eddie Dong, Yang Z Zhang
  Cc: LiangX Z Li, Kevin Tian, xen-devel

>>> On 26.09.14 at 01:23, <eddie.dong@intel.com> wrote:
>> >>
>> >
>> > Giving this path of long time wait-loop only happens at the case when
>> > the hardware fails, I don't care if it enters panic in 1 seconds or in
>> > 10ms seconds. But the software compatibility (for all existing
>> > platforms and potential future platforms) is much more important.
>> 
>> I agree for the paths leading to a panic(). But there's one such case where 
> it
>> doesn't: snb_vtd_ops_preamble().
>> 
> 
> For some reason, it is using a same MAGIC TIMEOUT value with VTd flush 
> timeout.
> They are actually different and this one is used to prevent the IGD to enter 
> RC6 state :)
> 
> Whether the IGD enters RC6 state in 1second is another story, it doesn't 
> indicate the 
> hardware is bad, like the VTD flush timeout indicates. So it is not 
> necessary to enter panic.
> 
> Maybe we should cook a patch to use different TIMEOUT MACRO, though they are 
> 1second in both cases.

If it needs to stay 1 second, more than just introducing a separate
macro should be done. If the value can be reduced to a couple of
milliseconds, keeping the code as is with just the new macro put in
place would be fine.

Jan


end of thread

Thread overview: 21+ messages
2014-08-18  2:01 VT-d flush timeout Zhang, Yang Z
2014-08-18  9:47 ` Andrew Cooper
2014-08-18 12:48 ` Jan Beulich
2014-08-19  1:34   ` Zhang, Yang Z
2014-08-19 13:02     ` Jan Beulich
2014-08-21  3:16       ` Zhang, Yang Z
2014-08-22  7:33         ` Jan Beulich
2014-08-22  7:49           ` Zhang, Yang Z
2014-08-22  7:58             ` Jan Beulich
2014-08-22  8:05               ` Zhang, Yang Z
2014-08-25 17:21                 ` Dugger, Donald D
2014-09-25  1:02       ` Dong, Eddie
2014-09-25  8:55         ` Jan Beulich
2014-09-25 10:44           ` Andrew Cooper
2014-09-25 10:55             ` Jan Beulich
2014-09-25 10:56               ` Andrew Cooper
2014-09-25 11:48                 ` Jan Beulich
2014-09-25 21:21                   ` Tian, Kevin
2014-09-25 23:23           ` Dong, Eddie
2014-09-26  1:07             ` Zhang, Yang Z
2014-09-26  6:30             ` Jan Beulich
