* VT-d async invalidation for Device-TLB.
From: Xu, Quan @ 2015-06-03  7:49 UTC
  To: xen-devel
  Cc: Tian, Kevin, Xu, Quan, andrew.cooper3, Dugger, Donald D,
	Jan Beulich, Zhang, Yang Z

Hi all,
     This email is about VT-d async invalidation for Device-TLB.

Background
=========

As Jan Beulich mentioned (http://lists.xenproject.org/archives/html/xen-devel/2014-06/msg03351.html), the VT-d code currently has a number of cases where completion of certain operations is being waited for by way of spinning. The majority of instances go through the IOMMU_WAIT_OP() macro, allowing for loops of up to 1 second (DMAR_OPERATION_TIMEOUT). While in many of the cases this may be acceptable, the invalidation case seems particularly problematic: the hypervisor currently polls the status address of the wait descriptor for up to 1 second to get the invalidation flush result. When the invalidation queue includes a Device-TLB invalidation, using 1 second in the invalidation sync is a mistake, as that timeout is sized for response times of the IOMMU engine, not for Device-TLB invalidation with PCIe Address Translation Services (ATS) in use: the ATS specification mandates a timeout of 1 _minute_ for cache flush. The ATS case needs to be taken into consideration when doing invalidations.
Obviously we can't spin for a minute, so invalidation absolutely needs to be converted to a non-spinning model.

Design Overview
=============
This design implements a non-spinning model for Device-TLB invalidation, using an interrupt based mechanism. Each domain maintains an invalidation table, and the hypervisor has an entry of invalidation tables. The invalidation table keeps the count of in-flight Device-TLB invalidation queues, and also provides the same polling parameter for multiple in-flight Device-TLB invalidation queues of each domain.
When a domain issues a request to the Device-TLB invalidation queue, we update the invalidation table's count of in-flight Device-TLB invalidation queues and assign the Status Data of the wait descriptor of the invalidation queue. An interrupt is sent to the hypervisor once a Device-TLB invalidation request is done. In the interrupt handler, we will schedule a soft-irq to do the following check:
    if invalidation table's count of in-flight Device-TLB invalidation queues == polling parameter:
        This domain has no in-flight invalidation requests.
    else:
        This domain has in-flight invalidation requests.
The domain is put into the "blocked" state if it has in-flight Device-TLB invalidation requests, and is awoken when all the requests are done. A fault event will be generated if an invalidation fails; we can then either crash the domain or crash Xen.
    For Context Invalidation and IOTLB invalidation without Device-TLB invalidation, the Invalidation Queue keeps flushing synchronously as before (this is a tradeoff: for these the interrupt cost would be pure overhead).

More details:

1. invalidation table. We define an iommu_invl structure per domain.

struct iommu_invl {
    volatile u64 iommu_invl_poll_slot:62;
    domid_t dom_id;
    u64 iommu_invl_status_data:32;
} __attribute__ ((aligned (64)));

   iommu_invl_poll_slot: Set it equal to the status address of the wait descriptor when the invalidation queue contains a Device-TLB invalidation.
   dom_id: Keep the id of the domain.
   iommu_invl_status_data: Keep the count of in-flight queues with Device-TLB invalidation.

2. Modification to Device IOTLB invalidation:
    - Enable interrupt notification when hardware completes the invalidations:
        Set the FN, IF and SW bits in the Invalidation Wait Descriptor. The reason for also setting the SW bit is that the notification interrupt is global, not per domain, so in the interrupt handler we still need to poll the status address to know which domain's flush request has completed.
    - A new per-domain flag (iommu_pending_flush) is used to track the flush status of IOTLB invalidation with Device-TLB invalidation:
        iommu_pending_flush will be set before flushing the Device-TLB invalidation.
    - New synchronization logic (see the C sketch below):
        if no Device-TLB invalidation:
            Back to the current invalidation logic.
        else:
            Set the IF, SW, FN bits in the wait descriptor and prepare the Status Data.
            Set iommu_pending_flush.
            Put the domain on the pending flush list.
            Return.
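
Roughly, in C (a sketch only - has_ats_device(), sync_invalidate(), queue_wait_descriptor(), the QINV_* flags and the pending flush list are illustrative names, not existing Xen interfaces):

    /* Submit path for the new logic in 2. above (illustrative sketch). */
    static void dev_tlb_invalidate_async(struct domain *d)
    {
        struct iommu_invl *tbl = &d->iommu_qinval_invl;

        if ( !has_ats_device(d) )
        {
            sync_invalidate(d);     /* back to the current invalidation logic */
            return;
        }

        /* One more in-flight Device-TLB invalidation queue for this domain. */
        tbl->iommu_invl_status_data++;

        /* IF: raise an interrupt on completion, SW: write Status Data to
         * Status Address, FN: fence the descriptors behind this one. */
        queue_wait_descriptor(d, QINV_IF | QINV_SW | QINV_FN,
                              tbl->iommu_invl_status_data,                /* Status Data */
                              virt_to_maddr(&tbl->iommu_invl_poll_slot)); /* Status Address */

        d->iommu_pending_flush = 1;
        list_add_tail(&d->pending_flush_link, &pending_flush_list);
    }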

3. Modification to the domain running lifecycle:
    - When iommu_pending_flush is set, the domain is not allowed to enter non-root mode: pause the domain before VM entry.

4. New interrupt handler for invalidation completion:
    - When hardware completes the invalidations with Device IOTLB, it generates an interrupt to notify the hypervisor.
    - In the interrupt handler, we will schedule a soft-irq to handle the finished invalidations.
    - Soft-irq to handle finished invalidations (see the sketch below):
        Scan the pending flush list;
        for each entry in the list:
            check the values of iommu_invl_poll_slot and iommu_invl_status_data in the domain's invalidation table;
            if they match, clear iommu_pending_flush and the invalidation table, then wake up the domain.
    (We can leverage the IM bit of the Invalidation Event Control Register to optimize the interrupt rate.)
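
In C, the soft-irq could look roughly like this (again only a sketch; the list and field names are illustrative):

    /* Soft-irq: walk the pending flush list and wake up finished domains. */
    static void dev_tlb_invl_softirq(void)
    {
        struct domain *d, *tmp;

        list_for_each_entry_safe ( d, tmp, &pending_flush_list, pending_flush_link )
        {
            struct iommu_invl *tbl = &d->iommu_qinval_invl;

            /* Hardware wrote the Status Data of the last completed wait
             * descriptor into the poll slot. */
            if ( tbl->iommu_invl_poll_slot != tbl->iommu_invl_status_data )
                continue;           /* still has in-flight requests */

            d->iommu_pending_flush = 0;
            tbl->iommu_invl_poll_slot = tbl->iommu_invl_status_data = 0;
            list_del(&d->pending_flush_link);
            domain_unpause(d);      /* wake up the domain */
        }
    }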

5. Invalidation failure:
    - A fault event will be generated if an invalidation fails. We can either crash the domain or crash Xen upon receiving an invalidation fault event.



Intel OTC
Quan Xu


* Re: VT-d async invalidation for Device-TLB.
From: Jan Beulich @ 2015-06-10  8:04 UTC
  To: Quan Xu
  Cc: Yang Z Zhang, andrew.cooper3, Kevin Tian, Donald D Dugger, xen-devel

>>> On 03.06.15 at 09:49, <quan.xu@intel.com> wrote:
> Design Overview
> =============
> This design implements a non-spinning model for Device-TLB invalidation - using 
> an interrupt based mechanism. Each domain maintains an invalidation table, and 
> the hypervisor has an entry of invalidation tables. The invalidation table 

entry? Do you mean array or table?

> keeps the count of in-flight Device-TLB invalidation queues, and also provides 
> the same polling parameter for multiple in-flight Device-TLB invalidation queues 
> of each domain.

Which "same polling parameter"? I.e. I'm not sure what this is about
in the first place.

> When a domain issues a request to Device-TLB invalidation queue, update 
> invalidation table's count of in-flight Device-TLB invalidation queue and 
> assign the Status Data of wait descriptor of the invalidation queue. An 
> interrupt is sent out to the hypervisor once a Device-TLB invalidation request 
> is done. In interrupt handler, we will schedule a soft-irq to do the following 
> check: 
>     if invalidation table's count of in-flight Device-TLB invalidation queues 
> == polling parameter:
> 	   This domain has no in-flight invalidation requests.
>     else
> 	   This domain has in-flight invalidation requests.
> The domain is put into the "blocked" status if it has in-flight Device-TLB 
> invalidation requests, and awoken when all the requests are done. A fault 
> event will be generated if an invalidation failed. We can either crash the 
> domain or crash Xen.

Crashing Xen can't really be considered an option except when you
can't contain the failed invalidation to a particular VM (which, from
what was written above, should never happen).

>     For Context Invalidation and IOTLB invalidation without Device-TLB 
> invalidation, Invalidation Queue flushes synchronous invalidation as 
> before(This is a tradeoff and the cost of interrupt is overhead).

DMAR_OPERATION_TIMEOUT being 1s, are you saying that you're
not intending to replace the current spinning for the non-ATS case?
Considering that expiring these loops results in panic()s, I would
expect these to become asynchronous _and_ contained to the
affected VM alongside the ATS-induced changed behavior. You're
talking of overhead - can you quantify that?

> More details:
> 
> 1. invalidation table. We define an iommu_invl structure per domain.
> struct iommu_invl {
>     volatile u64 iommu_invl_poll_slot:62;
>     domid_t dom_id;
>     u64 iommu_invl_status_data:32;
> } __attribute__ ((aligned (64)));
> 
>    iommu_invl_poll_slot: Set it equal to the status address of the wait 
> descriptor when the invalidation queue contains a Device-TLB invalidation.
>    dom_id: Keep the id of the domain.
>    iommu_invl_status_data: Keep the count of in-flight queues with Device-TLB 
> invalidation.

Without further explanation above/below I don't think I really
understand the purpose of this structure, nor its organization: Is
this something imposed by the VT-d specification? If so, a reference
to the respective section in the spec would be useful. If not, I can't
see why the structure is laid out the (odd) way it is.

> 2. Modification to Device IOTLB invalidation:
>     - Enabled interrupt notification when hardware completes the 
> invalidations: 
>         Set FN, IF and SW bits in Invalidation Wait Descriptor. The reason 

A good design document would either give a (short) explanation of
these bits, or at the very least a precise reference to where in the
spec they're being defined. The way the VT-d spec is organized I
generally find it quite hard to locate the definition of specific fields
when I have only a vague reference in hand. Yet reading the doc
here shouldn't require the reader to spend meaningful extra amounts
of time hunting down the corresponding pieces of the spec.

> why also set SW bit is that the interrupt for notification is global not per 
> domain. So we still need to poll the status address to know which domain's 
> flush request is
>         completed in interrupt handler.

With the above taken care of, I would then hope to also be able to
understand this (kind of an) explanation.

>     - A new per-domain flag (iommu_pending_flush) is used to track the flush 
> status of IOTLB invalidation with Device-TLB invalidation:
>         iommu_pending_flush will be set before flushing the Device-TLB 
> invalidation.

What is "flushing an invalidation" supposed to mean? I think there's
some problem with the wording here...

> 4. New interrupt handler for invalidation completion:
>     - when hardware completes the invalidations with Device IOTLB, it 
> generates an interrupt to notify hypervisor.
>     - In interrupt handler, we will schedule a soft-irq to handle the finished 
> invalidations.
>     - soft-irq to handle finished invalidation:
>         Scan the pending flush list
> 	    for each entry in list
>             check the values of iommu_invl_poll_slot and 
> iommu_invl_status_data in each domain's invalidation table.
>             if yes, clear iommu_pending_flush and invalidation table, then 
> wakeup the domain.

Did you put some consideration into how long this list may get, and
hence how long it may take you to iterate through the entire list?

Jan


* Re: VT-d async invalidation for Device-TLB.
From: Xu, Quan @ 2015-06-12  2:40 UTC
  To: Jan Beulich
  Cc: Zhang, Yang Z, andrew.cooper3, Tian, Kevin, Dugger, Donald D, xen-devel

>> >>> On 10.06.15 at 16:05, <JBeulich@suse.com> wrote:
> >>> On 03.06.15 at 09:49, <quan.xu@intel.com> wrote:

Jan, thanks for your review!!

> > Design Overview
> > =============
> > This design implements a non-spinning model for Device-TLB
> > invalidation - using an interrupt based mechanism. Each domain
> > maintains an invalidation table, and the hypervisor has an entry of
> > invalidation tables. The invalidation table
> 
> entry? Do you mean array or table?
> 
It is a table or list to track the domains that have pending invalidation requests. In the invalidation completion event handler, the hypervisor will walk this table/list to find out which domain's invalidation has completed.
Now I have a new way to scan: we can scan iommu->domid_bitmap[] to get the domains which have assigned devices, and then check each domain's invalidation status.


> > keeps the count of in-flight Device-TLB invalidation queues, and also
> > provides the same polling parameter for multiple in-flight Device-TLB
> > invalidation queues of each domain.
> 
> Which "same polling parameter"? I.e. I'm not sure what this is about in the first
> place.
> 
It is similar to the poll_slot in current Xen; in the VT-d spec it is also called Status Data.
For details, we should look at the Invalidation Wait Descriptor.
For more information about the VT-d Invalidation Wait Descriptor, please refer to http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html, section 6.5.2.8, Invalidation Wait Descriptor.

When we set the SW bit of the Invalidation Wait Descriptor, hardware indicates invalidation wait descriptor completion by performing a coherent DWORD write of the value in the Status Data field to the address specified in the Status Address field.

For the sync way, Xen provides a local polling parameter, assigns its address to the Status Address of the Invalidation Wait Descriptor, and polls it for the invalidation result for up to 1 second.
For the async way, we provide a global polling parameter per domain - this is the "same polling parameter" - and assign the address of this per-domain parameter to the Status Address of each Invalidation Wait Descriptor the domain submits.

When a domain issues a request to the Device-TLB invalidation queue, we update the invalidation table's count of in-flight Device-TLB invalidation queues and assign the Status Data of the wait descriptor of the invalidation queue.

For example:

  ------
 | invl |  Status Data = 1 (the count of in-flight Device-TLB invalidation queues)
 | wait |  Status Address = virt_to_maddr(&_a_global_polling_parameter_per_domain_)
 | desc |
  ------
    .
    .
  ------
 | invl |  Status Data = 2 (the count of in-flight Device-TLB invalidation queues)
 | wait |  Status Address = virt_to_maddr(&_a_global_polling_parameter_per_domain_)
 | desc |
  ------
    .
    .
  ------
 | invl |  Status Data = 3 (the count of in-flight Device-TLB invalidation queues)
 | wait |  Status Address = virt_to_maddr(&_a_global_polling_parameter_per_domain_)
 | desc |
  ------
    .
    .

In the interrupt handler:
    if the count of in-flight Device-TLB invalidation queues == the global polling parameter per domain:
        This domain has no in-flight invalidation requests.
    else:
        This domain has in-flight invalidation requests.

BTW, we set the FN bit of the Invalidation Wait Descriptor. The FN bit indicates that the descriptors following the invalidation wait descriptor must be processed by hardware only after the invalidation wait descriptor completes.
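
Expressed as a minimal C sketch (invl_count, invl_poll and submit_wait_descriptor() are illustrative names, not existing Xen interfaces):

    /* Submission: bump the per-domain count; every wait descriptor of this
     * domain reuses the same per-domain status address. */
    d->invl_count++;                                      /* 1, 2, 3, ... */
    submit_wait_descriptor(d,
                           d->invl_count,                 /* Status Data */
                           virt_to_maddr(&d->invl_poll)); /* Status Address */

    /* Completion check in the interrupt handler / soft-irq: hardware wrote
     * the Status Data of the last completed wait descriptor into invl_poll. */
    if ( d->invl_poll == d->invl_count )
        ; /* no in-flight invalidation requests for this domain */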


> > When a domain issues a request to Device-TLB invalidation queue,
> > update invalidation table's count of in-flight Device-TLB invalidation
> > queue and assign the Status Data of wait descriptor of the
> > invalidation queue. An interrupt is sent out to the hypervisor once a
> > Device-TLB invalidation request is done. In interrupt handler, we will
> > schedule a soft-irq to do the following
> > check:
> >     if invalidation table's count of in-flight Device-TLB invalidation
> > queues == polling parameter:
> > 	   This domain has no in-flight invalidation requests.
> >     else
> > 	   This domain has in-flight invalidation requests.
> > The domain is put into the "blocked" status if it has in-flight
> > Device-TLB invalidation requests, and awoken when all the requests are
> > done. A fault event will be generated if an invalidation failed. We
> > can either crash the domain or crash Xen.
> 
> Crashing Xen can't really be considered an option except when you can't contain
> the failed invalidation to a particular VM (which, from what was written above,
> should never happen).
> 

Makes sense.

> >     For Context Invalidation and IOTLB invalidation without Device-TLB
> > invalidation, Invalidation Queue flushes synchronous invalidation as
> > before(This is a tradeoff and the cost of interrupt is overhead).
> 
> DMAR_OPERATION_TIMEOUT being 1s, are you saying that you're not intending
> to replace the current spinning for the non-ATS case?

Yes, we are not intending to replace the current spinning for the non-ATS case.


> Considering that expiring these loops results in panic()s, I would expect these to
> become asynchronous _and_ contained to the affected VM alongside the ATS
> induced changed behavior. You talking of overhead - can you quantify that?
>

I tested it with a Myri-10G Dual-Protocol NIC, which is an ATS device.
For an invalidation:
 By sync way, it takes about 1.4 ms.
 By async way, it takes about 4.3 ms.


> > More details:
> >
> > 1. invalidation table. We define iommu _invl structure in domain.
> > Struct iommu _invl {
> >     volatile u64 iommu _invl _poll_slot :62;
> >     domid_t dom_id;
> >     u64 iommu _invl _status_data :32;
> > }__attribute__ ((aligned (64)));
> >
> >    iommu _invl _poll_slot: Set it equal to the status address of wait
> > descriptor when the invalidation queue is with Device-TLB.
> >    dom_id: Keep the id of the domain.
> >    iommu _invl _status_data: Keep the count of in-flight queue with
> > Device-TLB invalidation.
> 
> Without further explanation above/below I don't think I really understand the
> purpose of this structure, nor its organization: Is this something imposed by the
> VT-d specification? If so, a reference to the respective section in the spec would
> be useful. If not, I can't see why the structure is laid out the (odd) way it is.
> 

Refer to the explanation above. If it is still not clear, I will continue to explain in the next email.

> > 2. Modification to Device IOTLB invalidation:
> >     - Enabled interrupt notification when hardware completes the
> > invalidations:
> >         Set FN, IF and SW bits in Invalidation Wait Descriptor. The
> > reason
> 
> A good design document would either give a (short) explanation of these bits, or
> at the very least a precise reference to where in the spec they're being defined.
> The way the VT-d spec is organized I generally find it quite hard to locate the
> definition of specific fields when I have only a vague reference in hand. Yet
> reading the doc here shouldn't require the reader to spend meaningful extra
> amounts of time hunting down the corresponding pieces of the spec.
> 

Agreed. I will enhance it when I send out the code.
For more information about the VT-d Invalidation Wait Descriptor, please refer to http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html, section 6.5.2.8, Invalidation Wait Descriptor.

SW: indicates invalidation wait descriptor completion by performing a coherent DWORD write of the value in the Status Data field to the address specified in the Status Address field.
FN: indicates that the descriptors following the invalidation wait descriptor must be processed by hardware only after the invalidation wait descriptor completes.
IF: indicates invalidation wait descriptor completion by generating an invalidation completion event per the programming of the Invalidation Completion Event Registers.
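
For reference, a sketch of how these bits sit in the 128-bit Invalidation Wait Descriptor per section 6.5.2.8 (the QINV_WAIT_* names are made up here; please double-check the bit positions against the spec revision in use):

    #define QINV_WAIT_TYPE  0x5ULL       /* bits 3:0: descriptor type */
    #define QINV_WAIT_IF    (1ULL << 4)  /* Interrupt Flag */
    #define QINV_WAIT_SW    (1ULL << 5)  /* Status Write */
    #define QINV_WAIT_FN    (1ULL << 6)  /* Fence */

    /* Low qword: type, flags and Status Data (bits 63:32);
     * high qword: Status Address (bits 63:2). */
    uint64_t lo = QINV_WAIT_TYPE | QINV_WAIT_IF | QINV_WAIT_SW | QINV_WAIT_FN |
                  ((uint64_t)status_data << 32);
    uint64_t hi = status_maddr & ~0x3ULL;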


> > why also set SW bit is that the interrupt for notification is global
> > not per domain. So we still need to poll the status address to know
> > which domain's flush request is
> >         completed in interrupt handler.
> 
> With the above taken care of, I would then hope to also be able to understand
> this (kind of an) explanation.
> 
> >     - A new per-domain flag (iommu_pending_flush) is used to track the
> > flush status of IOTLB invalidation with Device-TLB invalidation:
> >         iommu_pending_flush will be set before flushing the Device-TLB
> > invalidation.
> 
> What is "flushing an invalidation" supposed to mean? I think there's some
> problem with the wording here...
> 

Yes, it should be 'submit invalidation requests'.


> > 4. New interrupt handler for invalidation completion:
> >     - when hardware completes the invalidations with Device IOTLB, it
> > generates an interrupt to notify hypervisor.
> >     - In interrupt handler, we will schedule a soft-irq to handle the
> > finished invalidations.
> >     - soft-irq to handle finished invalidation:
> >         Scan the pending flush list
> > 	    for each entry in list
> >             check the values of iommu_invl_poll_slot and
> > iommu_invl_status_data in each domain's invalidation table.
> >             if yes, clear iommu_pending_flush and invalidation table,
> > then wakeup the domain.
> 
> Did you put some consideration into how long this list may get, and hence how
> long it may take you to iterate through the entire list?
> 

Only domains which have an ATS device assigned will be tracked in this list, so the list length shouldn't be very long. Besides, Device-IOTLB invalidation doesn't happen frequently, so the cost should be acceptable.

Thanks.

> Jan

Quan


* Re: VT-d async invalidation for Device-TLB.
From: Jan Beulich @ 2015-06-12  6:46 UTC
  To: Quan Xu
  Cc: Yang Z Zhang, andrew.cooper3, Kevin Tian, DonaldD Dugger, xen-devel

>>> On 12.06.15 at 04:40, <quan.xu@intel.com> wrote:
>> > >>> On 10.06.15 at 16:05, <JBeulich@suse.com> wrote:
>> >>> On 03.06.15 at 09:49, <quan.xu@intel.com> wrote:
>> >     For Context Invalidation and IOTLB invalidation without Device-TLB
>> > invalidation, Invalidation Queue flushes synchronous invalidation as
>> > before(This is a tradeoff and the cost of interrupt is overhead).
>> 
>> DMAR_OPERATION_TIMEOUT being 1s, are you saying that you're not intending
>> to replace the current spinning for the non-ATS case?
> 
> Yes, we are not intending to replace the current spinning for the non-ATS 
> case.

I'm not really happy about that.

>> Considering that expiring these loops results in panic()s, I would expect 
> these to
>> become asynchronous _and_ contained to the affected VM alongside the ATS
>> induced changed behavior. You talking of overhead - can you quantify that?
>>
> 
> I tested it by a Myri-10G Dual-Protocol NIC, which is an ATS device. 
> for an invalidation:
>  By sync way, it takes about 1.4 ms.
>  By async way, it takes about 4.3 ms.

What's the theory on why this is? After all, it shouldn't matter how
the completion of the invalidation gets signaled.

Apart from that measuring the ATS case (in which case we're set to
use async mode anyway) is kind of pointless here - we'd need to
know the overhead of non-ATS async compared to non-ATS sync.

>> > More details:
>> >
>> > 1. invalidation table. We define an iommu_invl structure per domain.
>> > struct iommu_invl {
>> >     volatile u64 iommu_invl_poll_slot:62;
>> >     domid_t dom_id;
>> >     u64 iommu_invl_status_data:32;
>> > } __attribute__ ((aligned (64)));
>> >
>> >    iommu_invl_poll_slot: Set it equal to the status address of the wait
>> > descriptor when the invalidation queue contains a Device-TLB invalidation.
>> >    dom_id: Keep the id of the domain.
>> >    iommu_invl_status_data: Keep the count of in-flight queues with
>> > Device-TLB invalidation.
>> 
>> Without further explanation above/below I don't think I really understand 
> the
>> purpose of this structure, nor its organization: Is this something imposed 
> by the
>> VT-d specification? If so, a reference to the respective section in the spec 
> would
>> be useful. If not, I can't see why the structure is laid out the (odd) way 
> it is.
>> 
> 
> Refer to the explanation above. If it is still not clear, I will continue to 
> explain in next email.

The explanation above helped for what I asked above, but didn't
make clear to me what the structure here is, how it relates to hw
defined structures, and hence (as said) why it is laid out the way it
is.

>> > 4. New interrupt handler for invalidation completion:
>> >     - when hardware completes the invalidations with Device IOTLB, it
>> > generates an interrupt to notify hypervisor.
>> >     - In interrupt handler, we will schedule a soft-irq to handle the
>> > finished invalidations.
>> >     - soft-irq to handle finished invalidation:
>> >         Scan the pending flush list
>> > 	    for each entry in list
>> >             check the values of iommu_invl_poll_slot and
>> > iommu_invl_status_data in each domain's invalidation table.
>> >             if yes, clear iommu_pending_flush and invalidation table,
>> > then wakeup the domain.
>> 
>> Did you put some consideration into how long this list may get, and hence 
> how
>> long it may take you to iterate through the entire list?
>> 
> 
> Only the domain which has the ATS device assigned will be tracked in this 
> list. So the list length shouldn't be very long.

Okay, if this is a list of domains (or of devices), that would hopefully
be acceptable (albeit on a huge system this could still be dozens). If
this was a list of pending flush requests, it might be worse.

> Besides, the DEVICE-IOTLB 
> invalidation doesn't happened frequently so the cost should be acceptable.

That's not a valid consideration: At no time must any processing
inside the hypervisor take arbitrarily long. This requirement is
entirely independent of how frequently such cases may occur.

Jan


* FW: VT-d async invalidation for Device-TLB.
From: Xu, Quan @ 2015-06-13 14:44 UTC
  To: Jan Beulich
  Cc: Zhang, Yang Z, andrew.cooper3, Tian, Kevin, Dugger, Donald D, xen-devel

>On 12.06.15 at 14:47, <JBeulich@suse.com> wrote:
> >>> On 12.06.15 at 04:40, <quan.xu@intel.com> wrote:
> >> > >>> On 10.06.15 at 16:05, <JBeulich@suse.com> wrote:
> >> >>> On 03.06.15 at 09:49, <quan.xu@intel.com> wrote:
> >> >     For Context Invalidation and IOTLB invalidation without
> >> > Device-TLB invalidation, Invalidation Queue flushes synchronous
> >> > invalidation as before(This is a tradeoff and the cost of interrupt is
> overhead).
> >>
> >> DMAR_OPERATION_TIMEOUT being 1s, are you saying that you're not
> >> intending to replace the current spinning for the non-ATS case?
> >
> > Yes, we are not intending to replace the current spinning for the
> > non-ATS case.
> 
> I'm not really happy about that.

Jan, could you share more about what you expect? We can enable it step by step.
Do you want to replace the current spinning for all queued invalidations?

> 
> >> Considering that expiring these loops results in panic()s, I would
> >> expect
> > these to
> >> become asynchronous _and_ contained to the affected VM alongside the
> >> ATS induced changed behavior. You talking of overhead - can you quantify
> that?
> >>
> >
> > I tested it by a Myri-10G Dual-Protocol NIC, which is an ATS device.
> > for an invalidation:
> >  By sync way, it takes about 1.4 ms.
> >  By async way, it takes about 4.3 ms.
> 
> What's the theory on why this is? After all, it shouldn't matter how the
> completion of the invalidation gets signaled.
>

Let me explain more about how I got these test data.

For the sync way, I take the time with the NOW() macro when I update the Queue Tail Register (qinval_update_qtail()).
Then I take the time with the NOW() macro in the current spinning loop once hardware has written the value in the Status Data field to the address specified in the Status Address.
The difference of these 2 values is the time of a sync invalidation.


For the async way, as in the previous email, I have introduced the IF bit of the Invalidation Wait Descriptor
(IF: indicates invalidation wait descriptor completion by generating an invalidation completion event per the programming of the Invalidation Completion Event Registers).
I have implemented an interrupt for the invalidation completion event.
I also take the time with the NOW() macro when I update the Queue Tail Register (by qinval_update_qtail()).
Then I take the time with the NOW() macro in the invalidation completion event handler.
The difference of these 2 values is the time of an async invalidation.
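
In code form, the sampling is essentially the following (invl_start and invl_cost are variables added only for the experiment):

    s_time_t invl_start, invl_cost;

    invl_start = NOW();
    qinval_update_qtail(iommu, index);  /* submit the request */

    /* sync: taken right after the spin loop observes the status write;
     * async: taken at the entry of the completion event handler. */
    invl_cost = NOW() - invl_start;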



> Apart from that measuring the ATS case (in which case we're set to use async
> mode anyway) is kind of pointless here - we'd need to know the overhead of
> non-ATS async compared to non-ATS sync.
> 

I will send out these data next Monday; right now I am out of the office and can't reach the test machine to modify the Xen source code and get the non-ATS async data.


> >> > More details:
> >> >
> >> > 1. invalidation table. We define an iommu_invl structure per domain.
> >> > struct iommu_invl {
> >> >     volatile u64 iommu_invl_poll_slot:62;
> >> >     domid_t dom_id;
> >> >     u64 iommu_invl_status_data:32;
> >> > } __attribute__ ((aligned (64)));
> >> >
> >> >    iommu_invl_poll_slot: Set it equal to the status address of the
> >> > wait descriptor when the invalidation queue contains a Device-TLB invalidation.
> >> >    dom_id: Keep the id of the domain.
> >> >    iommu_invl_status_data: Keep the count of in-flight queues with
> >> > Device-TLB invalidation.
> >>
> >> Without further explanation above/below I don't think I really
> >> understand
> > the
> >> purpose of this structure, nor its organization: Is this something
> >> imposed
> > by the
> >> VT-d specification? If so, a reference to the respective section in
> >> the spec
> > would
> >> be useful. If not, I can't see why the structure is laid out the
> >> (odd) way
> > it is.
> >>
> >
> > Refer to the explanation above. If it is still not clear, I will
> > continue to explain in next email.
> 
> The explanation above helped for what I asked above, but didn't make clear to
> me what the structure here is, how it relates to hw defined structures, and
> hence (as said) why it is laid out the way it is.
> 

Below are the structures.

--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h

+struct iommu_invl {
+    volatile u64 iommu_invl_poll_slot;
+    u32 iommu_invl_status_data;
+};
+
 
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
+#include <xen/iommu.h>
@@ -364,6 +365,10 @@ struct domain
+
+    struct iommu_invl iommu_qinval_invl;
+    /* The domain is under Device-TLB invalidation flush */
+    s8               iommu_invl_pending_flush;


> >> > 4. New interrupt handler for invalidation completion:
> >> >     - when hardware completes the invalidations with Device IOTLB,
> >> > it generates an interrupt to notify hypervisor.
> >> >     - In interrupt handler, we will schedule a soft-irq to handle
> >> > the finished invalidations.
> >> >     - soft-irq to handle finished invalidation:
> >> >         Scan the pending flush list
> >> > 	    for each entry in list
> >> >             check the values of iommu_invl_poll_slot and
> >> > iommu_invl_status_data in each domain's invalidation table.
> >> >             if yes, clear iommu_pending_flush and invalidation
> >> > table, then wakeup the domain.
> >>
> >> Did you put some consideration into how long this list may get, and
> >> hence
> > how
> >> long it may take you to iterate through the entire list?
> >>
> >
> > Only the domain which has the ATS device assigned will be tracked in
> > this list. So the list length shouldn't be very long.
> 
> Okay, if this is a list of domains (or of devices), that would hopefully be
> acceptable (albeit on a huge system this could still be dozens). If this was a list of
> pending flush requests, it might be worse.


Yes, this is a list of domains.

> 
> > Besides, the DEVICE-IOTLB
> > invalidation doesn't happened frequently so the cost should be acceptable.
> 
> That's not a valid consideration: At no time must any processing inside the
> hypervisor take arbitrarily long. This requirement is entirely independent of how
> frequently such cases may occur.
>

Agreed.


 
> Jan


Quan


* Re: FW: VT-d async invalidation for Device-TLB.
From: Jan Beulich @ 2015-06-15  7:03 UTC
  To: Quan Xu
  Cc: Yang Z Zhang, andrew.cooper3, Kevin Tian, DonaldD Dugger, xen-devel

>>> On 13.06.15 at 16:44, <quan.xu@intel.com> wrote:
>> On 12.06.15 at 14:47, <JBeulich@suse.com> wrote:
>> >>> On 12.06.15 at 04:40, <quan.xu@intel.com> wrote:
>> >> > >>> On 10.06.15 at 16:05, <JBeulich@suse.com> wrote:
>> >> >>> On 03.06.15 at 09:49, <quan.xu@intel.com> wrote:
>> >> >     For Context Invalidation and IOTLB invalidation without
>> >> > Device-TLB invalidation, Invalidation Queue flushes synchronous
>> >> > invalidation as before(This is a tradeoff and the cost of interrupt is
>> overhead).
>> >>
>> >> DMAR_OPERATION_TIMEOUT being 1s, are you saying that you're not
>> >> intending to replace the current spinning for the non-ATS case?
>> >
>> > Yes, we are not intending to replace the current spinning for the
>> > non-ATS case.
>> 
>> I'm not really happy about that.
> 
> Jan, could you share more about what you expect? We can enable it step by 
> step.
> Do you want to replace the current spinning for all of queued invalidation? 

Yes.

>> >> Considering that expiring these loops results in panic()s, I would
>> >> expect
>> > these to
>> >> become asynchronous _and_ contained to the affected VM alongside the
>> >> ATS induced changed behavior. You talking of overhead - can you quantify
>> that?
>> >>
>> >
>> > I tested it by a Myri-10G Dual-Protocol NIC, which is an ATS device.
>> > for an invalidation:
>> >  By sync way, it takes about 1.4 ms.
>> >  By async way, it takes about 4.3 ms.
>> 
>> What's the theory on why this is? After all, it shouldn't matter how the
>> completion of the invalidation gets signaled.
>>
> 
> Let me introduce more about how I get these test data.
> 
> For sync way, I get the time by NOW() macro, when I update Queue Tail 
> Register  (qinval_update_qtail()).
> Then get time by NOW() macro in current spinning when it has wrote the value 
> in the Status Data field to the address specified in the Status Address.
> The difference of these 2 value is the time of an sync invalidation.
> 
> 
> For async way, as the previous email, I have introduced the IF bit of 
> Invalidation Wait Descriptor. 
> (IF: Indicate the invalidation wait descriptor completion by generating an 
> invalidation completion event per the programing of the Invalidation 
> Completion Event Registers.)
> I have implemented an interrupt for invalidation completion event. 
> Also I get the time by NOW() macro, when I update Queue Tail Register (by 
> qinval_update_qtail()).
> Then get time by NOW() macro in invalidation completion event handler.
> The difference of these 2 value is the time of an async invalidation.

Okay, thanks for the explanation. As this includes the time it takes to
deliver and (partly) handle the interrupt, the difference is of course
within what one would expect (and also of what would seem acceptable).

Jan


* Re: FW: VT-d async invalidation for Device-TLB.
From: Zhang, Yang Z @ 2015-06-16  3:07 UTC
  To: Jan Beulich, Xu, Quan
  Cc: andrew.cooper3, Tian, Kevin, Dugger, Donald D, xen-devel

Jan Beulich wrote on 2015-06-15:
>>>> On 13.06.15 at 16:44, <quan.xu@intel.com> wrote:
>>> On 12.06.15 at 14:47, <JBeulich@suse.com> wrote:
>>>>>> On 12.06.15 at 04:40, <quan.xu@intel.com> wrote:
>>>>>>>>> On 10.06.15 at 16:05, <JBeulich@suse.com> wrote:
>>>>>>>> On 03.06.15 at 09:49, <quan.xu@intel.com> wrote:
>>>>>>     For Context Invalidation and IOTLB invalidation without
>>>>>> Device-TLB invalidation, Invalidation Queue flushes
>>>>>> synchronous invalidation as before(This is a tradeoff and the
>>>>>> cost of interrupt is
>>> overhead).
>>>>> 
>>>>> DMAR_OPERATION_TIMEOUT being 1s, are you saying that you're not
>>>>> intending to replace the current spinning for the non-ATS case?
>>>> 
>>>> Yes, we are not intending to replace the current spinning for the
>>>> non-ATS case.
>>> 
>>> I'm not really happy about that.
>> 
>> Jan, could you share more about what you expect? We can enable it
>> step by step.
>> Do you want to replace the current spinning for all of queued invalidation?
> 
> Yes.
> 
>>>>> Considering that expiring these loops results in panic()s, I would
>>>>> expect these to become asynchronous _and_ contained to the affected
>>>>> VM alongside the ATS induced changed behavior. You talking of
>>>>> overhead - can you quantify that?
>>>>> 
>>>> 
>>>> I tested it by a Myri-10G Dual-Protocol NIC, which is an ATS device.
>>>> for an invalidation:
>>>>  By sync way, it takes about 1.4 ms.
>>>>  By async way, it takes about 4.3 ms.

It is a typo for the time: should be 1.4 us and 4.3 us.

>>> 
>>> What's the theory on why this is? After all, it shouldn't matter
>>> how the completion of the invalidation gets signaled.
>>> 
>> 
>> Let me introduce more about how I get these test data.
>> 
>> For sync way, I get the time by NOW() macro, when I update Queue
>> Tail Register  (qinval_update_qtail()).
>> Then get time by NOW() macro in current spinning when it has wrote
>> the value in the Status Data field to the address specified in the Status Address.
>> The difference of these 2 value is the time of an sync invalidation.
>> 
>> 
>> For async way, as the previous email, I have introduced the IF bit
>> of Invalidation Wait Descriptor.
>> (IF: Indicate the invalidation wait descriptor completion by
>> generating an invalidation completion event per the programing of
>> the Invalidation Completion Event Registers.) I have implemented an
>> interrupt for invalidation completion event.
>> Also I get the time by NOW() macro, when I update Queue Tail
>> Register (by qinval_update_qtail()).
>> Then get time by NOW() macro in invalidation completion event handler.
>> The difference of these 2 value is the time of an async invalidation.
> 
> Okay, thanks for the explanation. As this includes the time it takes
> to deliver and (partly) handle the interrupt, the difference is of
> course within what one would expect (and also of what would seem acceptable).

The time doesn't include the cost of handling the interrupt; we just record it at the entry of the interrupt handler, so the cost should be bigger than 4.3 us if the handling cost is taken into consideration. And the costs will be much bigger if more pass-through VMs run. We can start from the ATS case first, and apply it to the non-ATS case later if the async approach doesn't hurt performance.

> 
> Jan


Best regards,
Yang


* Re: FW: VT-d async invalidation for Device-TLB.
From: Jan Beulich @ 2015-06-16  7:55 UTC
  To: Quan Xu, Yang Z Zhang
  Cc: andrew.cooper3, Kevin Tian, DonaldD Dugger, xen-devel

>>> On 16.06.15 at 05:07, <yang.z.zhang@intel.com> wrote:
> Jan Beulich wrote on 2015-06-15:
>>>>> On 13.06.15 at 16:44, <quan.xu@intel.com> wrote:
>>>> On 12.06.15 at 14:47, <JBeulich@suse.com> wrote:
>>>>>>> On 12.06.15 at 04:40, <quan.xu@intel.com> wrote:
>>>>> I tested it by a Myri-10G Dual-Protocol NIC, which is an ATS device.
>>>>> for an invalidation:
>>>>>  By sync way, it takes about 1.4 ms.
>>>>>  By async way, it takes about 4.3 ms.
> 
> It is a typo for the time: should be 1.4 us and 4.3 us.
> 
>>>> 
>>>> What's the theory on why this is? After all, it shouldn't matter
>>>> how the completion of the invalidation gets signaled.
>>>> 
>>> 
>>> Let me introduce more about how I get these test data.
>>> 
>>> For sync way, I get the time by NOW() macro, when I update Queue
>>> Tail Register  (qinval_update_qtail()).
>>> Then get time by NOW() macro in current spinning when it has wrote
>>> the value in the Status Data field to the address specified in the Status Address.
>>> The difference of these 2 value is the time of an sync invalidation.
>>> 
>>> 
>>> For async way, as the previous email, I have introduced the IF bit
>>> of Invalidation Wait Descriptor.
>>> (IF: Indicate the invalidation wait descriptor completion by
>>> generating an invalidation completion event per the programing of
>>> the Invalidation Completion Event Registers.) I have implemented an
>>> interrupt for invalidation completion event.
>>> Also I get the time by NOW() macro, when I update Queue Tail
>>> Register (by qinval_update_qtail()).
>>> Then get time by NOW() macro in invalidation completion event handler.
>>> The difference of these 2 value is the time of an async invalidation.
>> 
>> Okay, thanks for the explanation. As this includes the time it takes
>> to deliver and (partly) handle the interrupt, the difference is of
>> course within what one would expect (and also of what would seem 
> acceptable).
> 
> The time doesn't include the cost of handling of interrupt. We just record 
> it at the entry of interrupt handler. So the cost should bigger than 4.3 us 
> if taking the handing cost into consideration. And the costs will much bigger 
> if there are more pass-through VMs runs. We can start from ATS case firstly. 
> And apply it to non-ATS case later if the async approach doesn't hurt the 
> performance.

In which case we're back to the question I raised originally: How do
you explain the time difference if the interrupt delivery overhead
isn't included?

Jan


* Re: FW: VT-d async invalidation for Device-TLB.
From: Zhang, Yang Z @ 2015-06-16  7:59 UTC
  To: Jan Beulich, Xu, Quan
  Cc: andrew.cooper3, Tian, Kevin, Dugger, Donald D, xen-devel

Jan Beulich wrote on 2015-06-16:
>>>> On 16.06.15 at 05:07, <yang.z.zhang@intel.com> wrote:
>> Jan Beulich wrote on 2015-06-15:
>>>>>> On 13.06.15 at 16:44, <quan.xu@intel.com> wrote:
>>>>> On 12.06.15 at 14:47, <JBeulich@suse.com> wrote:
>>>>>>>> On 12.06.15 at 04:40, <quan.xu@intel.com> wrote:
>>>>>> I tested it by a Myri-10G Dual-Protocol NIC, which is an ATS device.
>>>>>> for an invalidation:
>>>>>>  By sync way, it takes about 1.4 ms.
>>>>>>  By async way, it takes about 4.3 ms.
>> 
>> It is a typo for the time: should be 1.4 us and 4.3 us.
>> 
>>>>> 
>>>>> What's the theory on why this is? After all, it shouldn't matter
>>>>> how the completion of the invalidation gets signaled.
>>>>> 
>>>> 
>>>> Let me introduce more about how I get these test data.
>>>> 
>>>> For sync way, I get the time by NOW() macro, when I update Queue Tail
>>>> Register  (qinval_update_qtail()). Then get time by NOW() macro in
>>>> current spinning when it has wrote the value in the Status Data field
>>>> to the address specified in the Status Address. The difference of
>>>> these 2 value is the time of an sync invalidation.
>>>> 
>>>> 
>>>> For async way, as the previous email, I have introduced the IF bit
>>>> of Invalidation Wait Descriptor.
>>>> (IF: Indicate the invalidation wait descriptor completion by
>>>> generating an invalidation completion event per the programing of
>>>> the Invalidation Completion Event Registers.) I have implemented
>>>> an interrupt for invalidation completion event.
>>>> Also I get the time by NOW() macro, when I update Queue Tail
>>>> Register (by qinval_update_qtail()).
>>>> Then get time by NOW() macro in invalidation completion event handler.
>>>> The difference of these 2 value is the time of an async invalidation.
>>> 
>>> Okay, thanks for the explanation. As this includes the time it
>>> takes to deliver and (partly) handle the interrupt, the difference
>>> is of course within what one would expect (and also of what would
>>> seem
>> acceptable).
>> 
>> The time doesn't include the cost of handling of interrupt. We just
>> record it at the entry of interrupt handler. So the cost should bigger
>> than 4.3 us if taking the handing cost into consideration. And the
>> costs will much bigger if there are more pass-through VMs runs. We can
>> start from ATS case firstly. And apply it to non-ATS case later if the
>> async approach doesn't hurt the performance.
> 
> In which case we're back to the question I raised originally: How do
> you explain the time difference if the interrupt delivery overhead isn't included?

Which one? The 1.4 us for the sync case or the 4.3 us for the async case?
> 
> Jan


Best regards,
Yang


* Re: FW: VT-d async invalidation for Device-TLB.
From: Jan Beulich @ 2015-06-16  8:11 UTC
  To: Quan Xu, Yang Z Zhang
  Cc: andrew.cooper3, Kevin Tian, DonaldD Dugger, xen-devel

>>> On 16.06.15 at 09:59, <yang.z.zhang@intel.com> wrote:
> Jan Beulich wrote on 2015-06-16:
>>>>> On 16.06.15 at 05:07, <yang.z.zhang@intel.com> wrote:
>>> Jan Beulich wrote on 2015-06-15:
>>>>>>> On 13.06.15 at 16:44, <quan.xu@intel.com> wrote:
>>>>>> On 12.06.15 at 14:47, <JBeulich@suse.com> wrote:
>>>>>>>>> On 12.06.15 at 04:40, <quan.xu@intel.com> wrote:
>>>>>>> I tested it by a Myri-10G Dual-Protocol NIC, which is an ATS device.
>>>>>>> for an invalidation:
>>>>>>>  By sync way, it takes about 1.4 ms.
>>>>>>>  By async way, it takes about 4.3 ms.
>>> 
>>> It is a typo for the time: should be 1.4 us and 4.3 us.
>>> 
>>>>>> 
>>>>>> What's the theory on why this is? After all, it shouldn't matter
>>>>>> how the completion of the invalidation gets signaled.
>>>>>> 
>>>>> 
>>>>> Let me introduce more about how I get these test data.
>>>>> 
>>>>> For sync way, I get the time by NOW() macro, when I update Queue Tail
>>>>> Register  (qinval_update_qtail()). Then get time by NOW() macro in
>>>>> current spinning when it has wrote the value in the Status Data field
>>>>> to the address specified in the Status Address. The difference of
>>>>> these 2 value is the time of an sync invalidation.
>>>>> 
>>>>> 
>>>>> For async way, as the previous email, I have introduced the IF bit
>>>>> of Invalidation Wait Descriptor.
>>>>> (IF: Indicate the invalidation wait descriptor completion by
>>>>> generating an invalidation completion event per the programing of
>>>>> the Invalidation Completion Event Registers.) I have implemented
>>>>> an interrupt for invalidation completion event.
>>>>> Also I get the time by NOW() macro, when I update Queue Tail
>>>>> Register (by qinval_update_qtail()).
>>>>> Then get time by NOW() macro in invalidation completion event handler.
>>>>> The difference of these 2 value is the time of an async invalidation.
>>>> 
>>>> Okay, thanks for the explanation. As this includes the time it
>>>> takes to deliver and (partly) handle the interrupt, the difference
>>>> is of course within what one would expect (and also of what would
>>>> seem
>>> acceptable).
>>> 
>>> The time doesn't include the cost of handling of interrupt. We just
>>> record it at the entry of interrupt handler. So the cost should bigger
>>> than 4.3 us if taking the handing cost into consideration. And the
>>> costs will much bigger if there are more pass-through VMs runs. We can
>>> start from ATS case firstly. And apply it to non-ATS case later if the
>>> async approach doesn't hurt the performance.
>> 
>> In which case we're back to the question I raised originally: How do
>> you explain the time difference if the interrupt delivery overhead isn't 
> included?
> 
> which one? 1.4us for sync case and 4.3us for async case?

The difference between the two (i.e. why is the async variant
three times as long).

Jan


* Re: FW: VT-d async invalidation for Device-TLB.
From: Xu, Quan @ 2015-06-18  8:09 UTC
  To: Jan Beulich
  Cc: Zhang, Yang Z, andrew.cooper3, Tian, Kevin, Dugger, Donald D, xen-devel


> On 16.06.15 at 09:59, <JBeulich@suse.com> wrote:
> >>> On 16.06.15 at 09:59, <yang.z.zhang@intel.com> wrote:
> > Jan Beulich wrote on 2015-06-16:
> >>>>> On 16.06.15 at 05:07, <yang.z.zhang@intel.com> wrote:
> >>> Jan Beulich wrote on 2015-06-15:
> >>>>>>> On 13.06.15 at 16:44, <quan.xu@intel.com> wrote:
> >>>>>> On 12.06.15 at 14:47, <JBeulich@suse.com> wrote:
> >>>>>>>>> On 12.06.15 at 04:40, <quan.xu@intel.com> wrote:

> >
> > which one? 1.4us for sync case and 4.3us for async case?
> 
> The difference between the two (i.e. why is the async variant three times as
> long).
> 

I have tested IOTLB async/sync invalidation to get more data, this time with 'P State' / 'C State' disabled in the BIOS.
For an async invalidation: about 2.67 us.
For a sync invalidation: about 1.28 us.

I also tested the VCPU_KICK_SOFTIRQ irq cost.
The hypervisor calls cpu_raise_softirq(.., VCPU_KICK_SOFTIRQ) to raise a VCPU_KICK_SOFTIRQ irq, and vcpu_kick_softirq() is the VCPU_KICK_SOFTIRQ handler.
I measured the cost between cpu_raise_softirq(.., VCPU_KICK_SOFTIRQ) and vcpu_kick_softirq(): it is about 1.21 us.
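
The sampling looks roughly like this (kick_stamp is a per-CPU variable added only for the experiment):

    static DEFINE_PER_CPU(s_time_t, kick_stamp);

    /* sender side: */
    per_cpu(kick_stamp, cpu) = NOW();
    cpu_raise_softirq(cpu, VCPU_KICK_SOFTIRQ);

    /* ... and at the entry of vcpu_kick_softirq(): */
    s_time_t softirq_cost = NOW() - this_cpu(kick_stamp);  /* ~1.21 us */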

I think the difference between async and sync invalidation is the interrupt cost.

2.67 us is almost the ideal async invalidation cost. There are 4 reasons it can take much more time:
   1.  'P State' / 'C State' enabled in the BIOS.
   2.  The hypervisor running in non-root mode.
   3.  The time not including the cost of handling the interrupt; I just record it at the entry of the interrupt handler.
   4.  More pass-through VMs running.

So there may be some performance issues when we replace the current spinning for the non-ATS case.
We can start from the ATS case first, and apply it to the non-ATS case later if the async approach's performance is acceptable.
Jan, do you agree with this?


Quan


> Jan


* Re: FW: VT-d async invalidation for Device-TLB.
From: Jan Beulich @ 2015-06-18  9:18 UTC
  To: Quan Xu
  Cc: Yang Z Zhang, andrew.cooper3, Kevin Tian, DonaldD Dugger, xen-devel

>>> On 18.06.15 at 10:09, <quan.xu@intel.com> wrote:

>> On 16.06.15 at 09:59, <JBeulich@suse.com> wrote:
>> >>> On 16.06.15 at 09:59, <yang.z.zhang@intel.com> wrote:
>> > Jan Beulich wrote on 2015-06-16:
>> >>>>> On 16.06.15 at 05:07, <yang.z.zhang@intel.com> wrote:
>> >>> Jan Beulich wrote on 2015-06-15:
>> >>>>>>> On 13.06.15 at 16:44, <quan.xu@intel.com> wrote:
>> >>>>>> On 12.06.15 at 14:47, <JBeulich@suse.com> wrote:
>> >>>>>>>>> On 12.06.15 at 04:40, <quan.xu@intel.com> wrote:
> 
>> >
>> > which one? 1.4us for sync case and 4.3us for async case?
>> 
>> The difference between the two (i.e. why is the async variant three times as
>> long).
>> 
> 
> I have tested iotlb async/sync invalidation to get another data. Also I 
> disabled 'P State' / 'C State' in bios.
> For async invalidation: about 2.67 us.
> For sync invalidation: about 1.28 us.
> 
> I also tested VCPU_KICK_SOFTIRQ irq cost.
>  When hypervisor calls cpu_raise_softirq(..VCPU_KICK_SOFTIRQ) to raise an 
> VCPU_KICK_SOFTIRQ irq, and vcpu_kick_softirq() is the VCPU_KICK_SOFTIRQ 
> interrupt handler.
> I got the cost between cpu_raise_softirq(..VCPU_KICK_SOFTIRQ) and 
> vcpu_kick_softirq(). It is about 1.21 us.
> 
> I think the difference is interrupt cost between async invalidation and sync 
> invalidation.

Which contradicts what I think Yang said in an earlier reply.

> 2.67us is almost ideal for async invalidation cost. There are 4 reasons to 
> cost much more time:
>    1.  If enable 'P State' / 'C State' in bios.
>    2.  Hypervisor is running in No-root mode.
>    3.  The time doesn't include the cost of handling of interrupt. I just 
> record it at the entry of interrupt handler.
>    4.  More pass-through VMs runs.
> 
> So there are maybe some performance issues when we replace the current 
> spinning for the non-ATS case.
> We can start from ATS case firstly, And apply it to non-ATS case later if the 
> async approach performance is acceptable.
> Jan, Do you agree with this?

No, I'm still not convinced that leaving the non-ATS case alone
initially is the right approach. But maybe I'm the only one?

Jan


* Re: FW: VT-d async invalidation for Device-TLB.
From: Xu, Quan @ 2015-06-18 11:31 UTC
  To: Jan Beulich
  Cc: Zhang, Yang Z, andrew.cooper3, Tian, Kevin, Dugger, Donald D, xen-devel

>On June 18, 2015 5:19 PM, <JBeulich@suse.com> wrote:
> >>> On 18.06.15 at 10:09, <quan.xu@intel.com> wrote:
> 
> >> >> On 16.06.15 at 09:59, <JBeulich@suse.com> wrote:
> >> >>> On 16.06.15 at 09:59, <yang.z.zhang@intel.com> wrote:
> >> > Jan Beulich wrote on 2015-06-16:
> >> >>>>> On 16.06.15 at 05:07, <yang.z.zhang@intel.com> wrote:
> >> >>> Jan Beulich wrote on 2015-06-15:
> >> >>>>>>> On 13.06.15 at 16:44, <quan.xu@intel.com> wrote:
> >> >>>>>> On 12.06.15 at 14:47, <JBeulich@suse.com> wrote:
> >> >>>>>>>>> On 12.06.15 at 04:40, <quan.xu@intel.com> wrote:
> >
> >> >
> >> > which one? 1.4us for sync case and 4.3us for async case?
> >>
> >> The difference between the two (i.e. why is the async variant three
> >> times as long).
> >>
> >
> > I have tested iotlb async/sync invalidation to get another data. Also
> > I disabled 'P State' / 'C State' in bios.
> > For async invalidation: about 2.67 us.
> > For sync invalidation: about 1.28 us.
> >
> > I also tested VCPU_KICK_SOFTIRQ irq cost.
> >  When hypervisor calls cpu_raise_softirq(..VCPU_KICK_SOFTIRQ) to raise
> > an VCPU_KICK_SOFTIRQ irq, and vcpu_kick_softirq() is the
> > VCPU_KICK_SOFTIRQ interrupt handler.
> > I got the cost between cpu_raise_softirq(..VCPU_KICK_SOFTIRQ) and
> > vcpu_kick_softirq(). It is about 1.21 us.
> >
> > I think the difference is interrupt cost between async invalidation
> > and sync invalidation.
> 
> Which contradicts what I think Yang said in an earlier reply.
> 
I talked with Yang, and he is confused about what he said. :(
Could you share more?


> > 2.67us is almost ideal for async invalidation cost. There are 4
> > reasons to cost much more time:
> >    1.  If enable 'P State' / 'C State' in bios.
> >    2.  Hypervisor is running in No-root mode.
> >    3.  The time doesn't include the cost of handling of interrupt. I
> > just record it at the entry of interrupt handler.
> >    4.  More pass-through VMs runs.
> >
> > So there are maybe some performance issues when we replace the current
> > spinning for the non-ATS case.
> > We can start from ATS case firstly, And apply it to non-ATS case later
> > if the async approach performance is acceptable.
> > Jan, Do you agree with this?
> 
> No, I'm still not convinced that leaving the non-ATS case alone initially is the right
> approach. But maybe I'm the only one?
> 
I hope someone else will give some comments.

I tried to replace the current spinning for the non-ATS case, but Xen crashed.
Based on dmesg, it seems that VT-d is enabled before enabling IO-APIC IRQs. 

I can send out two series of patches:
1st: VT-d async invalidation for the ATS case.
2nd: VT-d async invalidation for the non-ATS case.


I think the 1st series is high priority, as it is not correct to spin for 1 second in the ATS case. I can implement the code and send it out ASAP.
The 2nd series is lower priority, as it is an optimization; I can provide it later.
Agree?

Quan


> Jan
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


* Re: FW: VT-d async invalidation for Device-TLB.
From: Jan Beulich @ 2015-06-18 11:39 UTC
  To: Quan Xu
  Cc: Yang Z Zhang, andrew.cooper3, Kevin Tian, Donald D Dugger, xen-devel

>>> On 18.06.15 at 13:31, <quan.xu@intel.com> wrote:
>> On June 18, 2015 5:19 PM, <JBeulich@suse.com> wrote:
>> >>> On 18.06.15 at 10:09, <quan.xu@intel.com> wrote:
>> 
>> >> On 16.06.15 at 09:59, <JBeulich@suse.com> wrote:
>> >> >>> On 16.06.15 at 09:59, <yang.z.zhang@intel.com> wrote:
>> >> > Jan Beulich wrote on 2015-06-16:
>> >> >>>>> On 16.06.15 at 05:07, <yang.z.zhang@intel.com> wrote:
>> >> >>> Jan Beulich wrote on 2015-06-15:
>> >> >>>>>>> On 13.06.15 at 16:44, <quan.xu@intel.com> wrote:
>> >> >>>>>> On 12.06.15 at 14:47, <JBeulich@suse.com> wrote:
>> >> >>>>>>>>> On 12.06.15 at 04:40, <quan.xu@intel.com> wrote:
>> >
>> >> >
>> >> > which one? 1.4us for sync case and 4.3us for async case?
>> >>
>> >> The difference between the two (i.e. why is the async variant three
>> >> times as long).
>> >>
>> >
>> > I have tested iotlb async/sync invalidation to get another data. Also
>> > I disabled 'P State' / 'C State' in bios.
>> > For async invalidation: about 2.67 us.
>> > For sync invalidation: about 1.28 us.
>> >
>> > I also tested VCPU_KICK_SOFTIRQ irq cost.
>> >  When hypervisor calls cpu_raise_softirq(..VCPU_KICK_SOFTIRQ) to raise
>> > an VCPU_KICK_SOFTIRQ irq, and vcpu_kick_softirq() is the
>> > VCPU_KICK_SOFTIRQ interrupt handler.
>> > I got the cost between cpu_raise_softirq(..VCPU_KICK_SOFTIRQ) and
>> > vcpu_kick_softirq(). It is about 1.21 us.
>> >
>> > I think the difference is interrupt cost between async invalidation
>> > and sync invalidation.
>> 
>> Which contradicts what I think Yang said in an earlier reply.
>> 
> Talked with Yang who is confused at what he said. :(
> Could you share more?

Upon me asking, he specifically indicated the numbers do _not_
include interrupt handling overhead.

>> > 2.67us is almost ideal for async invalidation cost. There are 4
>> > reasons to cost much more time:
>> >    1.  If enable 'P State' / 'C State' in bios.
>> >    2.  Hypervisor is running in No-root mode.
>> >    3.  The time doesn't include the cost of handling of interrupt. I
>> > just record it at the entry of interrupt handler.
>> >    4.  More pass-through VMs runs.
>> >
>> > So there are maybe some performance issues when we replace the current
>> > spinning for the non-ATS case.
>> > We can start from ATS case firstly, And apply it to non-ATS case later
>> > if the async approach performance is acceptable.
>> > Jan, Do you agree with this?
>> 
>> No, I'm still not convinced that leaving the non-ATS case alone initially is 
> the right
>> approach. But maybe I'm the only one?
>> 
> I hope for someone else to give some comments.
> 
> I tried to replace the current spinning for the non-ATS case, but Xen 
> crashed.
> Based on dmesg, it seems that VT-d is enabled before enabling IO-APIC IRQs. 
> 
> I can send out two series of patches:
> 1st: VT-d async invalidation for the ATS case.
> 2nd: VT-d async invalidation for the non-ATS case.
> 
> 
> I think the 1st series is high priority, as it is not correct to 
> spin for 1 second in the ATS case. I can implement the code and send it out 
> ASAP.
> The 2nd series is lower priority, as it is an optimization; I can 
> provide it later.

That's fine as long as the second series won't arrive only months
later.

Jan


* Re: FW: VT-d async invalidation for Device-TLB.
From: Xu, Quan @ 2015-06-18 12:02 UTC
  To: Jan Beulich
  Cc: Zhang, Yang Z, andrew.cooper3, Tian, Kevin, Dugger, Donald D, xen-devel

>June 18, 2015 7:39 PM, <JBeulich@suse.com> wrote:
> >>> On 18.06.15 at 13:31, <quan.xu@intel.com> wrote:
> >> On June 18, 2015 5:19 PM, <JBeulich@suse.com> wrote:
> >> >>> On 18.06.15 at 10:09, <quan.xu@intel.com> wrote:
> >>
> >> >> On 16.06.15 at 09:59, <JBeulich@suse.com> wrote:
> >> >> >>> On 16.06.15 at 09:59, <yang.z.zhang@intel.com> wrote:
> >> >> > Jan Beulich wrote on 2015-06-16:
> >> >> >>>>> On 16.06.15 at 05:07, <yang.z.zhang@intel.com> wrote:
> >> >> >>> Jan Beulich wrote on 2015-06-15:
> >> >> >>>>>>> On 13.06.15 at 16:44, <quan.xu@intel.com> wrote:
> >> >> >>>>>> On 12.06.15 at 14:47, <JBeulich@suse.com> wrote:
> >> >> >>>>>>>>> On 12.06.15 at 04:40, <quan.xu@intel.com> wrote:
> >> >
> >> >> >



> > I can send out two series of patches:
> > 1st: VT-d async invalidation for the ATS case.
> > 2nd: VT-d async invalidation for the non-ATS case.
> >
> >
> > I think the 1st series is high priority, as it is not
> > correct to spin for 1 second in the ATS case. I can implement the code
> > and send it out ASAP.
> > The 2nd series is lower priority, as it is an optimization; I can
> > provide it later.
> 
> That's fine as long as the second series won't arrive only months later.
> 

I am starting to implement the 1st series -- VT-d async invalidation for the ATS case.

Quan

