From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Jan Beulich" Subject: VT-d spin loops Date: Thu, 26 Jun 2014 11:30:06 +0100 Message-ID: <53AC124E020000780001D8FF@mail.emea.novell.com> References: <53A96B80020000780001CB8F@mail.emea.novell.com> <53AC124E020000780001D8FF@mail.emea.novell.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mail6.bemta5.messagelabs.com ([195.245.231.135]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1X06wT-0008RI-6Q for xen-devel@lists.xenproject.org; Thu, 26 Jun 2014 10:30:09 +0000 Content-Disposition: inline List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Kevin Tian , Yang Z Zhang , xen-devel Cc: Andrew Cooper , Malcolm Crossley , Donald D Dugger List-Id: xen-devel@lists.xenproject.org All, VT-d code currently has a number of cases where completion of certain operations is being waited for by way of spinning. The various instances can be identified relatively easily by grep-ing for all uses of DMAR_OPERATION_TIMEOUT (the majority of instances use that variable indirectly through IOMMU_WAIT_OP()), allowing for loops of up to 1 second. While in many of the cases this _may_ be acceptable (which would need to be proven for each individual case, also taking into consideration how many of these spinning loops may be executed in a row with no preemption/scheduling in between), the invalidation case seems particularly problematic: Using DMAR_OPERATION_TIMEOUT is a mistake here in the first place, as the timeout here isn't related to response times by the IOMMU engine. Instead - with ATS in use - the specification mandates a timeout of 1 _minute_ (with a 50% slack, the meaning of which none of us [Andrew and Malcolm brought this issue to my attention in private talks on the hackathon] was able to really interpret in a sensible way). So there are two things that need doing rather sooner than later: First and foremost the ATS case needs to be taken into consideration when doing invalidations. Obviously we can't spin for a minute, so invalidation absolutely needs to be converted to a non-spinning model. We realize this isn't going to be trivial, which is why we bring this up here rather than coming forward with a patch right away. Second, looking at Linux (which interestingly enough also spins, and that even without any timeout) there are flags in the fault status register that can be used to detect at least some loop abort conditions. We should definitely make use of anything that can shorten these spinning loops (as was already done in commit dd6d87a4 ["VT-d: drop redundant calls to invalidate_sync()"] as a very tiny first step). The main problem with trying to clone at least some of the functionality from Linux is that I'm not convinced the replaying they do can actually do anything good. Plus it's clear that - spinning or not - the consequences of an invalidation request not completing successfully need to be taken care of (and it's of no help that in all cases I looked at so far errors passed up from the leaf functions sooner or later get dropped on the floor or mis-interpreted). And finally, all other spinning instances need to be audited to make sure they can't add up to multiple-second spins (iirc we can't tolerate more than about 4s without running into time problems on certain hardware). Jan