From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Jan Beulich" <JBeulich@suse.com>
Subject: VT-d spin loops
Date: Thu, 26 Jun 2014 11:30:06 +0100
Message-ID: <53AC124E020000780001D8FF@mail.emea.novell.com>
References: <53A96B80020000780001CB8F@mail.emea.novell.com>
	<53AC124E020000780001D8FF@mail.emea.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta5.messagelabs.com ([195.245.231.135])
	by lists.xen.org with esmtp (Exim 4.72)
	(envelope-from <JBeulich@suse.com>) id 1X06wT-0008RI-6Q
	for xen-devel@lists.xenproject.org; Thu, 26 Jun 2014 10:30:09 +0000
Content-Disposition: inline
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Kevin Tian <kevin.tian@intel.com>, Yang Z Zhang <yang.z.zhang@intel.com>, xen-devel <xen-devel@lists.xenproject.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>, Malcolm Crossley <malcolm.crossley@citrix.com>, Donald D Dugger <donald.d.dugger@intel.com>
List-Id: xen-devel@lists.xenproject.org

All,

VT-d code currently has a number of cases where completion of certain
operations is being waited for by way of spinning. The various instances
can be identified relatively easily by grep-ing for all uses of
DMAR_OPERATION_TIMEOUT (the majority of instances use that
variable indirectly through IOMMU_WAIT_OP()), allowing for loops of
up to 1 second. While in many of the cases this _may_ be acceptable
(which would need to be proven for each individual case, also taking into
consideration how many of these spinning loops may be executed in a
row with no preemption/scheduling in between), the invalidation case
seems particularly problematic: Using DMAR_OPERATION_TIMEOUT is
a mistake here in the first place, as the timeout here isn't related to
response times by the IOMMU engine. Instead - with ATS in use - the
specification mandates a timeout of 1 _minute_ (with a 50% slack, the
meaning of which none of us [Andrew and Malcolm brought this issue
to my attention in private talks on the hackathon] was able to really
interpret in a sensible way).

So there are two things that need doing rather sooner than later:

First and foremost the ATS case needs to be taken into consideration
when doing invalidations. Obviously we can't spin for a minute, so
invalidation absolutely needs to be converted to a non-spinning model.
We realize this isn't going to be trivial, which is why we bring this up
here rather than coming forward with a patch right away.

Second, looking at Linux (which interestingly enough also spins, and
that even without any timeout) there are flags in the fault status
register that can be used to detect at least some loop abort conditions.
We should definitely make use of anything that can shorten these
spinning loops (as was already done in commit dd6d87a4 ["VT-d: drop
redundant calls to invalidate_sync()"] as a very tiny first step). The
main problem with trying to clone at least some of the functionality
from Linux is that I'm not convinced the replaying they do can
actually do anything good. Plus it's clear that - spinning or not - the
consequences of an invalidation request not completing successfully
need to be taken care of (and it's of no help that in all cases I looked
at so far errors passed up from the leaf functions sooner or later
get dropped on the floor or mis-interpreted).

And finally, all other spinning instances need to be audited to make
sure they can't add up to multiple-second spins (iirc we can't
tolerate more than about 4s without running into time problems on
certain hardware).

Jan