All of lore.kernel.org
 help / color / mirror / Atom feed
* VT-d spin loops
       [not found] <53A96B80020000780001CB8F@mail.emea.novell.com>
@ 2014-06-26 10:30 ` Jan Beulich
  2014-06-26 10:38   ` Andrew Cooper
  2014-07-09 23:22   ` Tian, Kevin
  0 siblings, 2 replies; 10+ messages in thread
From: Jan Beulich @ 2014-06-26 10:30 UTC (permalink / raw)
  To: Kevin Tian, Yang Z Zhang, xen-devel
  Cc: Andrew Cooper, Malcolm Crossley, Donald D Dugger

All,

VT-d code currently has a number of cases where completion of certain
operations is being waited for by way of spinning. The various instances
can be identified relatively easily by grep-ing for all uses of
DMAR_OPERATION_TIMEOUT (the majority of instances use that
variable indirectly through IOMMU_WAIT_OP()), allowing for loops of
up to 1 second. While in many of the cases this _may_ be acceptable
(which would need to be proven for each individual case, also taking into
consideration how many of these spinning loops may be executed in a
row with no preemption/scheduling in between), the invalidation case
seems particularly problematic: Using DMAR_OPERATION_TIMEOUT is
a mistake here in the first place, as the timeout here isn't related to
response times by the IOMMU engine. Instead - with ATS in use - the
specification mandates a timeout of 1 _minute_ (with a 50% slack, the
meaning of which none of us [Andrew and Malcolm brought this issue
to my attention in private talks on the hackathon] was able to really
interpret in a sensible way).

So there are two things that need doing rather sooner than later:

First and foremost the ATS case needs to be taken into consideration
when doing invalidations. Obviously we can't spin for a minute, so
invalidation absolutely needs to be converted to a non-spinning model.
We realize this isn't going to be trivial, which is why we bring this up
here rather than coming forward with a patch right away.

Second, looking at Linux (which interestingly enough also spins, and
that even without any timeout) there are flags in the fault status
register that can be used to detect at least some loop abort conditions.
We should definitely make use of anything that can shorten these
spinning loops (as was already done in commit dd6d87a4 ["VT-d: drop
redundant calls to invalidate_sync()"] as a very tiny first step). The
main problem with trying to clone at least some of the functionality
from Linux is that I'm not convinced the replaying they do can
actually do anything good. Plus it's clear that - spinning or not - the
consequences of an invalidation request not completing successfully
need to be taken care of (and it's of no help that in all cases I looked
at so far errors passed up from the leaf functions sooner or later
get dropped on the floor or mis-interpreted).

And finally, all other spinning instances need to be audited to make
sure they can't add up to multiple-second spins (iirc we can't
tolerate more than about 4s without running into time problems on
certain hardware).

Jan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: VT-d spin loops
  2014-06-26 10:30 ` VT-d spin loops Jan Beulich
@ 2014-06-26 10:38   ` Andrew Cooper
  2014-06-26 10:43     ` Andrew Cooper
  2014-07-09 23:22   ` Tian, Kevin
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Cooper @ 2014-06-26 10:38 UTC (permalink / raw)
  To: Jan Beulich, Kevin Tian, Yang Z Zhang, xen-devel
  Cc: Malcolm Crossley, Donald D Dugger

On 26/06/14 11:30, Jan Beulich wrote:
> All,
>
> VT-d code currently has a number of cases where completion of certain
> operations is being waited for by way of spinning. The various instances
> can be identified relatively easily by grep-ing for all uses of
> DMAR_OPERATION_TIMEOUT (the majority of instances use that
> variable indirectly through IOMMU_WAIT_OP()), allowing for loops of
> up to 1 second. While in many of the cases this _may_ be acceptable
> (which would need to be proven for each individual case, also taking into
> consideration how many of these spinning loops may be executed in a
> row with no preemption/scheduling in between), the invalidation case
> seems particularly problematic: Using DMAR_OPERATION_TIMEOUT is
> a mistake here in the first place, as the timeout here isn't related to
> response times by the IOMMU engine. Instead - with ATS in use - the
> specification mandates a timeout of 1 _minute_ (with a 50% slack, the
> meaning of which none of us [Andrew and Malcolm brought this issue
> to my attention in private talks on the hackathon] was able to really
> interpret in a sensible way).

>From a worst case point of view, this 50% slack means 90 seconds total.

>
> So there are two things that need doing rather sooner than later:
>
> First and foremost the ATS case needs to be taken into consideration
> when doing invalidations. Obviously we can't spin for a minute, so
> invalidation absolutely needs to be converted to a non-spinning model.
> We realize this isn't going to be trivial, which is why we bring this up
> here rather than coming forward with a patch right away.
>
> Second, looking at Linux (which interestingly enough also spins, and
> that even without any timeout) there are flags in the fault status
> register that can be used to detect at least some loop abort conditions.
> We should definitely make use of anything that can shorten these
> spinning loops (as was already done in commit dd6d87a4 ["VT-d: drop
> redundant calls to invalidate_sync()"] as a very tiny first step). The
> main problem with trying to clone at least some of the functionality
> from Linux is that I'm not convinced the replaying they do can
> actually do anything good. Plus it's clear that - spinning or not - the
> consequences of an invalidation request not completing successfully
> need to be taken care of (and it's of no help that in all cases I looked
> at so far errors passed up from the leaf functions sooner or later
> get dropped on the floor or mis-interpreted).
>
> And finally, all other spinning instances need to be audited to make
> sure they can't add up to multiple-second spins (iirc we can't
> tolerate more than about 4s without running into time problems on
> certain hardware).
>
> Jan
>

With a default watchdog, the maximum possible spin time is 5 seconds.

However, given the 1 second time calibration rendezvous, any spin longer
than 1 second will hold all other Xen cpus up waiting for the spinning
cpu, which is a stall of the entire results in a stall of the entire
system until the spinner has exited.

~Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: VT-d spin loops
  2014-06-26 10:38   ` Andrew Cooper
@ 2014-06-26 10:43     ` Andrew Cooper
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Cooper @ 2014-06-26 10:43 UTC (permalink / raw)
  To: Jan Beulich, Kevin Tian, Yang Z Zhang, xen-devel
  Cc: Malcolm Crossley, Donald D Dugger

On 26/06/14 11:38, Andrew Cooper wrote:
> On 26/06/14 11:30, Jan Beulich wrote:
>> All,
>>
>> VT-d code currently has a number of cases where completion of certain
>> operations is being waited for by way of spinning. The various instances
>> can be identified relatively easily by grep-ing for all uses of
>> DMAR_OPERATION_TIMEOUT (the majority of instances use that
>> variable indirectly through IOMMU_WAIT_OP()), allowing for loops of
>> up to 1 second. While in many of the cases this _may_ be acceptable
>> (which would need to be proven for each individual case, also taking into
>> consideration how many of these spinning loops may be executed in a
>> row with no preemption/scheduling in between), the invalidation case
>> seems particularly problematic: Using DMAR_OPERATION_TIMEOUT is
>> a mistake here in the first place, as the timeout here isn't related to
>> response times by the IOMMU engine. Instead - with ATS in use - the
>> specification mandates a timeout of 1 _minute_ (with a 50% slack, the
>> meaning of which none of us [Andrew and Malcolm brought this issue
>> to my attention in private talks on the hackathon] was able to really
>> interpret in a sensible way).
> From a worst case point of view, this 50% slack means 90 seconds total.
>
>> So there are two things that need doing rather sooner than later:
>>
>> First and foremost the ATS case needs to be taken into consideration
>> when doing invalidations. Obviously we can't spin for a minute, so
>> invalidation absolutely needs to be converted to a non-spinning model.
>> We realize this isn't going to be trivial, which is why we bring this up
>> here rather than coming forward with a patch right away.
>>
>> Second, looking at Linux (which interestingly enough also spins, and
>> that even without any timeout) there are flags in the fault status
>> register that can be used to detect at least some loop abort conditions.
>> We should definitely make use of anything that can shorten these
>> spinning loops (as was already done in commit dd6d87a4 ["VT-d: drop
>> redundant calls to invalidate_sync()"] as a very tiny first step). The
>> main problem with trying to clone at least some of the functionality
>> from Linux is that I'm not convinced the replaying they do can
>> actually do anything good. Plus it's clear that - spinning or not - the
>> consequences of an invalidation request not completing successfully
>> need to be taken care of (and it's of no help that in all cases I looked
>> at so far errors passed up from the leaf functions sooner or later
>> get dropped on the floor or mis-interpreted).
>>
>> And finally, all other spinning instances need to be audited to make
>> sure they can't add up to multiple-second spins (iirc we can't
>> tolerate more than about 4s without running into time problems on
>> certain hardware).
>>
>> Jan
>>
> With a default watchdog, the maximum possible spin time is 5 seconds.
>
> However, given the 1 second time calibration rendezvous, any spin longer
> than 1 second will hold all other Xen cpus up waiting for the spinning
> cpu, which is a stall of the entire results in a stall of the entire
> system until the spinner has exited.
>
> ~Andrew

Hmm sorry - that sentence didn't come out quite correctly.

Any spin for the IOMMU longer than 1 second will stall the entire rest
of the system until it is complete, due to the 1 second time calibration
rendezvous, as some of the IOMMU spins are with interrupts disabled.

~Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: VT-d spin loops
  2014-06-26 10:30 ` VT-d spin loops Jan Beulich
  2014-06-26 10:38   ` Andrew Cooper
@ 2014-07-09 23:22   ` Tian, Kevin
  2014-07-10  9:55     ` Andrew Cooper
  1 sibling, 1 reply; 10+ messages in thread
From: Tian, Kevin @ 2014-07-09 23:22 UTC (permalink / raw)
  To: Jan Beulich, Zhang, Yang Z, xen-devel
  Cc: Andrew Cooper, Malcolm Crossley, Dugger, Donald D

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, June 26, 2014 3:30 AM
> 
> All,
> 
> VT-d code currently has a number of cases where completion of certain
> operations is being waited for by way of spinning. The various instances
> can be identified relatively easily by grep-ing for all uses of
> DMAR_OPERATION_TIMEOUT (the majority of instances use that
> variable indirectly through IOMMU_WAIT_OP()), allowing for loops of
> up to 1 second. While in many of the cases this _may_ be acceptable
> (which would need to be proven for each individual case, also taking into
> consideration how many of these spinning loops may be executed in a
> row with no preemption/scheduling in between), the invalidation case
> seems particularly problematic: Using DMAR_OPERATION_TIMEOUT is
> a mistake here in the first place, as the timeout here isn't related to
> response times by the IOMMU engine. Instead - with ATS in use - the
> specification mandates a timeout of 1 _minute_ (with a 50% slack, the
> meaning of which none of us [Andrew and Malcolm brought this issue
> to my attention in private talks on the hackathon] was able to really
> interpret in a sensible way).

yes, that's not a good design. Most waits happen in IOMMU initialization,
where 1s timeout is less a big issue. At runtime cache/tlb flush and root
entry manipulation are definitely not good with long spinning .

> 
> So there are two things that need doing rather sooner than later:
> 
> First and foremost the ATS case needs to be taken into consideration
> when doing invalidations. Obviously we can't spin for a minute, so
> invalidation absolutely needs to be converted to a non-spinning model.
> We realize this isn't going to be trivial, which is why we bring this up
> here rather than coming forward with a patch right away.

ATS should be fine. Device TLB can ONLY be validated through qinval
interface, which is asynchronous so no need to consider 1 minute timeout
even in current spinning model.

> 
> Second, looking at Linux (which interestingly enough also spins, and
> that even without any timeout) there are flags in the fault status
> register that can be used to detect at least some loop abort conditions.
> We should definitely make use of anything that can shorten these
> spinning loops (as was already done in commit dd6d87a4 ["VT-d: drop
> redundant calls to invalidate_sync()"] as a very tiny first step). The
> main problem with trying to clone at least some of the functionality
> from Linux is that I'm not convinced the replaying they do can
> actually do anything good. Plus it's clear that - spinning or not - the
> consequences of an invalidation request not completing successfully
> need to be taken care of (and it's of no help that in all cases I looked
> at so far errors passed up from the leaf functions sooner or later
> get dropped on the floor or mis-interpreted).
> 
> And finally, all other spinning instances need to be audited to make
> sure they can't add up to multiple-second spins (iirc we can't
> tolerate more than about 4s without running into time problems on
> certain hardware).
> 

In general yes a non-spinning model is better, but it requires non-trivial
change to make all IOMMU operations asynchronous. If ATS is not a
concern, is it still worthy of change besides auditing existing usages? :-)

Thanks
Kevin
Thanks
Kevin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: VT-d spin loops
  2014-07-09 23:22   ` Tian, Kevin
@ 2014-07-10  9:55     ` Andrew Cooper
  2014-07-10 13:19       ` Dugger, Donald D
                         ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Andrew Cooper @ 2014-07-10  9:55 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich, Zhang, Yang Z, xen-devel
  Cc: Malcolm Crossley, Dugger, Donald D

On 10/07/14 00:22, Tian, Kevin wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Thursday, June 26, 2014 3:30 AM
>>
>> All,
>>
>> VT-d code currently has a number of cases where completion of certain
>> operations is being waited for by way of spinning. The various instances
>> can be identified relatively easily by grep-ing for all uses of
>> DMAR_OPERATION_TIMEOUT (the majority of instances use that
>> variable indirectly through IOMMU_WAIT_OP()), allowing for loops of
>> up to 1 second. While in many of the cases this _may_ be acceptable
>> (which would need to be proven for each individual case, also taking into
>> consideration how many of these spinning loops may be executed in a
>> row with no preemption/scheduling in between), the invalidation case
>> seems particularly problematic: Using DMAR_OPERATION_TIMEOUT is
>> a mistake here in the first place, as the timeout here isn't related to
>> response times by the IOMMU engine. Instead - with ATS in use - the
>> specification mandates a timeout of 1 _minute_ (with a 50% slack, the
>> meaning of which none of us [Andrew and Malcolm brought this issue
>> to my attention in private talks on the hackathon] was able to really
>> interpret in a sensible way).
> yes, that's not a good design. Most waits happen in IOMMU initialization,
> where 1s timeout is less a big issue. At runtime cache/tlb flush and root
> entry manipulation are definitely not good with long spinning .
>
>> So there are two things that need doing rather sooner than later:
>>
>> First and foremost the ATS case needs to be taken into consideration
>> when doing invalidations. Obviously we can't spin for a minute, so
>> invalidation absolutely needs to be converted to a non-spinning model.
>> We realize this isn't going to be trivial, which is why we bring this up
>> here rather than coming forward with a patch right away.
> ATS should be fine. Device TLB can ONLY be validated through qinval
> interface, which is asynchronous so no need to consider 1 minute timeout
> even in current spinning model.

There are currently no asynchronous invalidations in Xen. ATS certainly
is a problem.

>
>> Second, looking at Linux (which interestingly enough also spins, and
>> that even without any timeout) there are flags in the fault status
>> register that can be used to detect at least some loop abort conditions.
>> We should definitely make use of anything that can shorten these
>> spinning loops (as was already done in commit dd6d87a4 ["VT-d: drop
>> redundant calls to invalidate_sync()"] as a very tiny first step). The
>> main problem with trying to clone at least some of the functionality
>> from Linux is that I'm not convinced the replaying they do can
>> actually do anything good. Plus it's clear that - spinning or not - the
>> consequences of an invalidation request not completing successfully
>> need to be taken care of (and it's of no help that in all cases I looked
>> at so far errors passed up from the leaf functions sooner or later
>> get dropped on the floor or mis-interpreted).
>>
>> And finally, all other spinning instances need to be audited to make
>> sure they can't add up to multiple-second spins (iirc we can't
>> tolerate more than about 4s without running into time problems on
>> certain hardware).
>>
> In general yes a non-spinning model is better, but it requires non-trivial
> change to make all IOMMU operations asynchronous. If ATS is not a
> concern, is it still worthy of change besides auditing existing usages?

Even if the invalidation is only at the IOMMU, waiting milliseconds for
the completion is still time better spent elsewhere, such as running VMs.

Do you have any numbers for typical completion times for invalidate
requests?

~Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: VT-d spin loops
  2014-07-10  9:55     ` Andrew Cooper
@ 2014-07-10 13:19       ` Dugger, Donald D
  2014-07-10 15:53       ` Tian, Kevin
  2014-07-15  8:00       ` Zhang, Yang Z
  2 siblings, 0 replies; 10+ messages in thread
From: Dugger, Donald D @ 2014-07-10 13:19 UTC (permalink / raw)
  To: Andrew Cooper, Tian, Kevin, Jan Beulich, Zhang, Yang Z, xen-devel
  Cc: Malcolm Crossley

Andrew-

I working on getting specifics for completion times for invalidation requests, I'll post the info as soon as I get it.

--
Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
Ph: 303/443-3786

-----Original Message-----
From: Andrew Cooper [mailto:andrew.cooper3@citrix.com] 
Sent: Thursday, July 10, 2014 3:55 AM
To: Tian, Kevin; Jan Beulich; Zhang, Yang Z; xen-devel
Cc: Malcolm Crossley; Dugger, Donald D
Subject: Re: VT-d spin loops

On 10/07/14 00:22, Tian, Kevin wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Thursday, June 26, 2014 3:30 AM
>>
>> All,
>>
>> VT-d code currently has a number of cases where completion of certain 
>> operations is being waited for by way of spinning. The various 
>> instances can be identified relatively easily by grep-ing for all 
>> uses of DMAR_OPERATION_TIMEOUT (the majority of instances use that 
>> variable indirectly through IOMMU_WAIT_OP()), allowing for loops of 
>> up to 1 second. While in many of the cases this _may_ be acceptable 
>> (which would need to be proven for each individual case, also taking 
>> into consideration how many of these spinning loops may be executed 
>> in a row with no preemption/scheduling in between), the invalidation 
>> case seems particularly problematic: Using DMAR_OPERATION_TIMEOUT is 
>> a mistake here in the first place, as the timeout here isn't related 
>> to response times by the IOMMU engine. Instead - with ATS in use - 
>> the specification mandates a timeout of 1 _minute_ (with a 50% slack, 
>> the meaning of which none of us [Andrew and Malcolm brought this 
>> issue to my attention in private talks on the hackathon] was able to 
>> really interpret in a sensible way).
> yes, that's not a good design. Most waits happen in IOMMU 
> initialization, where 1s timeout is less a big issue. At runtime 
> cache/tlb flush and root entry manipulation are definitely not good with long spinning .
>
>> So there are two things that need doing rather sooner than later:
>>
>> First and foremost the ATS case needs to be taken into consideration 
>> when doing invalidations. Obviously we can't spin for a minute, so 
>> invalidation absolutely needs to be converted to a non-spinning model.
>> We realize this isn't going to be trivial, which is why we bring this 
>> up here rather than coming forward with a patch right away.
> ATS should be fine. Device TLB can ONLY be validated through qinval 
> interface, which is asynchronous so no need to consider 1 minute 
> timeout even in current spinning model.

There are currently no asynchronous invalidations in Xen. ATS certainly is a problem.

>
>> Second, looking at Linux (which interestingly enough also spins, and 
>> that even without any timeout) there are flags in the fault status 
>> register that can be used to detect at least some loop abort conditions.
>> We should definitely make use of anything that can shorten these 
>> spinning loops (as was already done in commit dd6d87a4 ["VT-d: drop 
>> redundant calls to invalidate_sync()"] as a very tiny first step). 
>> The main problem with trying to clone at least some of the 
>> functionality from Linux is that I'm not convinced the replaying they 
>> do can actually do anything good. Plus it's clear that - spinning or 
>> not - the consequences of an invalidation request not completing 
>> successfully need to be taken care of (and it's of no help that in 
>> all cases I looked at so far errors passed up from the leaf functions 
>> sooner or later get dropped on the floor or mis-interpreted).
>>
>> And finally, all other spinning instances need to be audited to make 
>> sure they can't add up to multiple-second spins (iirc we can't 
>> tolerate more than about 4s without running into time problems on 
>> certain hardware).
>>
> In general yes a non-spinning model is better, but it requires 
> non-trivial change to make all IOMMU operations asynchronous. If ATS 
> is not a concern, is it still worthy of change besides auditing existing usages?

Even if the invalidation is only at the IOMMU, waiting milliseconds for the completion is still time better spent elsewhere, such as running VMs.

Do you have any numbers for typical completion times for invalidate requests?

~Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: VT-d spin loops
  2014-07-10  9:55     ` Andrew Cooper
  2014-07-10 13:19       ` Dugger, Donald D
@ 2014-07-10 15:53       ` Tian, Kevin
  2014-07-15  8:00       ` Zhang, Yang Z
  2 siblings, 0 replies; 10+ messages in thread
From: Tian, Kevin @ 2014-07-10 15:53 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich, Zhang, Yang Z, xen-devel
  Cc: Malcolm Crossley, Dugger, Donald D

> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Thursday, July 10, 2014 2:55 AM
> 
> On 10/07/14 00:22, Tian, Kevin wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Thursday, June 26, 2014 3:30 AM
> >>
> >> All,
> >>
> >> VT-d code currently has a number of cases where completion of certain
> >> operations is being waited for by way of spinning. The various instances
> >> can be identified relatively easily by grep-ing for all uses of
> >> DMAR_OPERATION_TIMEOUT (the majority of instances use that
> >> variable indirectly through IOMMU_WAIT_OP()), allowing for loops of
> >> up to 1 second. While in many of the cases this _may_ be acceptable
> >> (which would need to be proven for each individual case, also taking into
> >> consideration how many of these spinning loops may be executed in a
> >> row with no preemption/scheduling in between), the invalidation case
> >> seems particularly problematic: Using DMAR_OPERATION_TIMEOUT is
> >> a mistake here in the first place, as the timeout here isn't related to
> >> response times by the IOMMU engine. Instead - with ATS in use - the
> >> specification mandates a timeout of 1 _minute_ (with a 50% slack, the
> >> meaning of which none of us [Andrew and Malcolm brought this issue
> >> to my attention in private talks on the hackathon] was able to really
> >> interpret in a sensible way).
> > yes, that's not a good design. Most waits happen in IOMMU initialization,
> > where 1s timeout is less a big issue. At runtime cache/tlb flush and root
> > entry manipulation are definitely not good with long spinning .
> >
> >> So there are two things that need doing rather sooner than later:
> >>
> >> First and foremost the ATS case needs to be taken into consideration
> >> when doing invalidations. Obviously we can't spin for a minute, so
> >> invalidation absolutely needs to be converted to a non-spinning model.
> >> We realize this isn't going to be trivial, which is why we bring this up
> >> here rather than coming forward with a patch right away.
> > ATS should be fine. Device TLB can ONLY be validated through qinval
> > interface, which is asynchronous so no need to consider 1 minute timeout
> > even in current spinning model.
> 
> There are currently no asynchronous invalidations in Xen. ATS certainly
> is a problem.

but that problem can be separately solved, not bound to the spin concern 
in this thread...

> 
> >
> >> Second, looking at Linux (which interestingly enough also spins, and
> >> that even without any timeout) there are flags in the fault status
> >> register that can be used to detect at least some loop abort conditions.
> >> We should definitely make use of anything that can shorten these
> >> spinning loops (as was already done in commit dd6d87a4 ["VT-d: drop
> >> redundant calls to invalidate_sync()"] as a very tiny first step). The
> >> main problem with trying to clone at least some of the functionality
> >> from Linux is that I'm not convinced the replaying they do can
> >> actually do anything good. Plus it's clear that - spinning or not - the
> >> consequences of an invalidation request not completing successfully
> >> need to be taken care of (and it's of no help that in all cases I looked
> >> at so far errors passed up from the leaf functions sooner or later
> >> get dropped on the floor or mis-interpreted).
> >>
> >> And finally, all other spinning instances need to be audited to make
> >> sure they can't add up to multiple-second spins (iirc we can't
> >> tolerate more than about 4s without running into time problems on
> >> certain hardware).
> >>
> > In general yes a non-spinning model is better, but it requires non-trivial
> > change to make all IOMMU operations asynchronous. If ATS is not a
> > concern, is it still worthy of change besides auditing existing usages?
> 
> Even if the invalidation is only at the IOMMU, waiting milliseconds for
> the completion is still time better spent elsewhere, such as running VMs.
> 
> Do you have any numbers for typical completion times for invalidate
> requests?
> 
> ~Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: VT-d spin loops
  2014-07-10  9:55     ` Andrew Cooper
  2014-07-10 13:19       ` Dugger, Donald D
  2014-07-10 15:53       ` Tian, Kevin
@ 2014-07-15  8:00       ` Zhang, Yang Z
  2014-07-23  8:04         ` Jan Beulich
  2 siblings, 1 reply; 10+ messages in thread
From: Zhang, Yang Z @ 2014-07-15  8:00 UTC (permalink / raw)
  To: Andrew Cooper, Tian, Kevin, Jan Beulich, xen-devel
  Cc: Malcolm Crossley, Dugger, Donald D

Andrew Cooper wrote on 2014-07-10:
> On 10/07/14 00:22, Tian, Kevin wrote:
>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>> Sent: Thursday, June 26, 2014 3:30 AM
>>> 
>>> All,
>>> 
>>> VT-d code currently has a number of cases where completion of
>>> certain operations is being waited for by way of spinning. The
>>> various instances can be identified relatively easily by grep-ing
>>> for all uses of DMAR_OPERATION_TIMEOUT (the majority of instances
>>> use that variable indirectly through IOMMU_WAIT_OP()), allowing for
>>> loops of up to 1 second. While in many of the cases this _may_ be
>>> acceptable (which would need to be proven for each individual case,
>>> also taking into consideration how many of these spinning loops may
>>> be executed in a row with no preemption/scheduling in between), the
>>> invalidation case seems particularly problematic: Using
>>> DMAR_OPERATION_TIMEOUT is a mistake here in the first place, as the
>>> timeout here isn't related to response times by the IOMMU engine.
>>> Instead - with ATS in use - the specification mandates a timeout of
>>> 1 _minute_ (with a 50% slack, the meaning of which none of us
>>> [Andrew and Malcolm brought this issue to my attention in private
>>> talks on the hackathon] was able to really interpret in a sensible way).
>> yes, that's not a good design. Most waits happen in IOMMU
>> initialization, where 1s timeout is less a big issue. At runtime
>> cache/tlb flush and root entry manipulation are definitely not good
>> with long spinning .
>> 
>>> So there are two things that need doing rather sooner than later:
>>> 
>>> First and foremost the ATS case needs to be taken into
>>> consideration when doing invalidations. Obviously we can't spin for
>>> a minute, so invalidation absolutely needs to be converted to a non-spinning model.
>>> We realize this isn't going to be trivial, which is why we bring
>>> this up here rather than coming forward with a patch right away.
>> ATS should be fine. Device TLB can ONLY be validated through qinval
>> interface, which is asynchronous so no need to consider 1 minute
>> timeout even in current spinning model.
> 
> There are currently no asynchronous invalidations in Xen. ATS
> certainly is a problem.

How Linux upstream handle ATS? Does it have any asynchronous invalidations mechanism?

> 
>> 
>>> Second, looking at Linux (which interestingly enough also spins,
>>> and that even without any timeout) there are flags in the fault
>>> status register that can be used to detect at least some loop abort conditions.
>>> We should definitely make use of anything that can shorten these
>>> spinning loops (as was already done in commit dd6d87a4 ["VT-d: drop
>>> redundant calls to invalidate_sync()"] as a very tiny first step).
>>> The main problem with trying to clone at least some of the
>>> functionality from Linux is that I'm not convinced the replaying
>>> they do can actually do anything good. Plus it's clear that -
>>> spinning or not - the consequences of an invalidation request not
>>> completing successfully need to be taken care of (and it's of no
>>> help that in all cases I looked at so far errors passed up from the
>>> leaf functions sooner or later get dropped on the floor or mis-interpreted).
>>> 
>>> And finally, all other spinning instances need to be audited to
>>> make sure they can't add up to multiple-second spins (iirc we can't
>>> tolerate more than about 4s without running into time problems on
>>> certain hardware).
>>> 
>> In general yes a non-spinning model is better, but it requires
>> non-trivial change to make all IOMMU operations asynchronous. If ATS
>> is not a concern, is it still worthy of change besides auditing existing usages?
> 
> Even if the invalidation is only at the IOMMU, waiting milliseconds
> for the completion is still time better spent elsewhere, such as running VMs.
> 
> Do you have any numbers for typical completion times for invalidate requests?
> 

The invalidations are completed fairly quickly by hardware. So the cost for spin can be ignored?

> ~Andrew


Best regards,
Yang

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: VT-d spin loops
  2014-07-15  8:00       ` Zhang, Yang Z
@ 2014-07-23  8:04         ` Jan Beulich
  2014-07-23 16:17           ` Tian, Kevin
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2014-07-23  8:04 UTC (permalink / raw)
  To: Yang Z Zhang
  Cc: Andrew Cooper, Malcolm Crossley, Kevin Tian, Donald D Dugger, xen-devel

>>> On 15.07.14 at 10:00, <yang.z.zhang@intel.com> wrote:
> Andrew Cooper wrote on 2014-07-10:
>> On 10/07/14 00:22, Tian, Kevin wrote:
>>> ATS should be fine. Device TLB can ONLY be validated through qinval
>>> interface, which is asynchronous so no need to consider 1 minute
>>> timeout even in current spinning model.
>> 
>> There are currently no asynchronous invalidations in Xen. ATS
>> certainly is a problem.
> 
> How Linux upstream handle ATS? Does it have any asynchronous invalidations 
> mechanism?

Not according to my inspection of the code.

>>> In general yes a non-spinning model is better, but it requires
>>> non-trivial change to make all IOMMU operations asynchronous. If ATS
>>> is not a concern, is it still worthy of change besides auditing existing 
> usages?
>> 
>> Even if the invalidation is only at the IOMMU, waiting milliseconds
>> for the completion is still time better spent elsewhere, such as running 
> VMs.
>> 
>> Do you have any numbers for typical completion times for invalidate 
> requests?
>> 
> 
> The invalidations are completed fairly quickly by hardware. So the cost for 
> spin can be ignored?

No, we have to be prepared for a timeout to occur, without killing
the entire host (killing the guest owning affected device would be
acceptable as a consequence), even more so with the longer
timeouts mandated by ATS.

Jan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: VT-d spin loops
  2014-07-23  8:04         ` Jan Beulich
@ 2014-07-23 16:17           ` Tian, Kevin
  0 siblings, 0 replies; 10+ messages in thread
From: Tian, Kevin @ 2014-07-23 16:17 UTC (permalink / raw)
  To: Jan Beulich, Zhang, Yang Z
  Cc: Andrew Cooper, Malcolm Crossley, Dugger, Donald D, xen-devel

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, July 23, 2014 1:04 AM
> 
> >>> On 15.07.14 at 10:00, <yang.z.zhang@intel.com> wrote:
> > Andrew Cooper wrote on 2014-07-10:
> >> On 10/07/14 00:22, Tian, Kevin wrote:
> >>> ATS should be fine. Device TLB can ONLY be validated through qinval
> >>> interface, which is asynchronous so no need to consider 1 minute
> >>> timeout even in current spinning model.
> >>
> >> There are currently no asynchronous invalidations in Xen. ATS
> >> certainly is a problem.
> >
> > How Linux upstream handle ATS? Does it have any asynchronous
> invalidations
> > mechanism?
> 
> Not according to my inspection of the code.
> 
> >>> In general yes a non-spinning model is better, but it requires
> >>> non-trivial change to make all IOMMU operations asynchronous. If ATS
> >>> is not a concern, is it still worthy of change besides auditing existing
> > usages?
> >>
> >> Even if the invalidation is only at the IOMMU, waiting milliseconds
> >> for the completion is still time better spent elsewhere, such as running
> > VMs.
> >>
> >> Do you have any numbers for typical completion times for invalidate
> > requests?
> >>
> >
> > The invalidations are completed fairly quickly by hardware. So the cost for
> > spin can be ignored?
> 
> No, we have to be prepared for a timeout to occur, without killing
> the entire host (killing the guest owning affected device would be
> acceptable as a consequence), even more so with the longer
> timeouts mandated by ATS.
> 

right. for ATS we really need an asynchronous implementation, while 
doing that also means whole spin loop concept is changed.


Thanks
Kevin

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-07-23 16:18 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <53A96B80020000780001CB8F@mail.emea.novell.com>
2014-06-26 10:30 ` VT-d spin loops Jan Beulich
2014-06-26 10:38   ` Andrew Cooper
2014-06-26 10:43     ` Andrew Cooper
2014-07-09 23:22   ` Tian, Kevin
2014-07-10  9:55     ` Andrew Cooper
2014-07-10 13:19       ` Dugger, Donald D
2014-07-10 15:53       ` Tian, Kevin
2014-07-15  8:00       ` Zhang, Yang Z
2014-07-23  8:04         ` Jan Beulich
2014-07-23 16:17           ` Tian, Kevin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.