Hi, On 2018/5/25 17:47, Jean-Philippe Brucker wrote: > On 25/05/18 03:39, Xu Zaibo wrote: >> Hi, >> >> On 2018/5/24 23:04, Jean-Philippe Brucker wrote: >>> On 24/05/18 13:35, Xu Zaibo wrote: >>>>> Right, sva_init() must be called once for any device that intends to use >>>>> bind(). For the second process though, group->sva_enabled will be true >>>>> so we won't call sva_init() again, only bind(). >>>> Well, while I create mediated devices based on one parent device to support multiple >>>> processes(a new process will create a new 'vfio_group' for the corresponding mediated device, >>>> and 'sva_enabled' cannot work any more), in fact, *sva_init and *sva_shutdown are basically >>>> working on parent device, so, as a result, I just only need sva initiation and shutdown on the >>>> parent device only once. So I change the two as following: >>>> >>>> @@ -551,8 +565,18 @@ int iommu_sva_device_init(struct device *dev, unsigned long features, >>>> if (features & ~IOMMU_SVA_FEAT_IOPF) >>>> return -EINVAL; >>>> >>>> + /* If already exists, do nothing */ >>>> + mutex_lock(&dev->iommu_param->lock); >>>> + if (dev->iommu_param->sva_param) { >>>> + mutex_unlock(&dev->iommu_param->lock); >>>> + return 0; >>>> + } >>>> + mutex_unlock(&dev->iommu_param->lock); >>>> >>>> if (features & IOMMU_SVA_FEAT_IOPF) { >>>> ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf, >>>> >>>> >>>> @@ -621,6 +646,14 @@ int iommu_sva_device_shutdown(struct device *dev) >>>> if (!domain) >>>> return -ENODEV; >>>> >>>> + /* If any other process is working on the device, shut down does nothing. */ >>>> + mutex_lock(&dev->iommu_param->lock); >>>> + if (!list_empty(&dev->iommu_param->sva_param->mm_list)) { >>>> + mutex_unlock(&dev->iommu_param->lock); >>>> + return 0; >>>> + } >>>> + mutex_unlock(&dev->iommu_param->lock); >>> I don't think iommu-sva.c is the best place for this, it's probably >>> better to implement an intermediate layer (the mediating driver), that >>> calls iommu_sva_device_init() and iommu_sva_device_shutdown() once. Then >>> vfio-pci would still call these functions itself, but for mdev the >>> mediating driver keeps a refcount of groups, and calls device_shutdown() >>> only when freeing the last mdev. >>> >>> A device driver (non mdev in this example) expects to be able to free >>> all its resources after sva_device_shutdown() returns. Imagine the >>> mm_list isn't empty (mm_exit() is running late), and instead of waiting >>> in unbind_dev_all() below, we return 0 immediately. Then the calling >>> driver frees its resources, and the mm_exit callback along with private >>> data passed to bind() disappear. If a mm_exit() is still running in >>> parallel, then it will try to access freed data and corrupt memory. So >>> in this function if mm_list isn't empty, the only thing we can do is wait. >>> >> I still don't understand why we should 'unbind_dev_all', is it possible >> to do a 'unbind_dev_pasid'? > Not in sva_device_shutdown(), it needs to clean up everything. For > example you want to physically unplug the device, or assign it to a VM. > To prevent any leak sva_device_shutdown() needs to remove all bonds. In > theory there shouldn't be any, since either the driver did unbind_dev(), > or all process exited. This is a safety net. > >> Then we can do other things instead of waiting that user may not like. :) > They may not like it, but it's for their own good :) At the moment we're > waiting that: > > * All exit_mm() callback for this device have finished. If we don't wait > then the caller will free the private data passed to bind and the > mm_exit() callback while they are still being used. > > * All page requests targeting this device are dealt with. If we don't > wait then some requests, that are lingering in the IOMMU PRI queue, > may hit the next contexts bound to this device, possibly in a > different VM. It may not be too risky (though probably exploitable in > some way), but is incredibly messy. > > All of this is bounded in time, and normally should be over pretty fast > unless the device driver's exit_mm() does something strange. If the > driver did the right thing, there shouldn't be any wait here (although > there may be one in unbind_dev() for the same reasons - prevent use > after free). > I guess there may be some misunderstandings :). From the current patches, '/iommu_sva_device_shutdown/' is called by '/vfio_iommu_sva_shutdown/', which is mainly used by '/vfio_iommu_type1_detach_group/' that is finally called by processes' release of vfio facilitiy automatically or called by 'VFIO_GROUP_UNSET_CONTAINER' manually in the processes. So, just image that 2 processes is working on the device with IOPF feature, and the 2 do the following to enable SVA: / 1.open("/dev/vfio/vfio") ;// // // 2.open the group of the devcie by calling open("/dev/vfio/x"), but now, // // I think the second processes will fail to open the group because current VFIO thinks that one group only can be used by one process/vm, at present, I use mediated device to create more groups on the parent device to support multiple processes;// / // / 3.VFIO_GROUP_SET_CONTAINER;/ / 4.VFIO_SET_IOMMU; / /// 5.VFIO_IOMMU_BIND; 6.Do some works with the hardware working unit filled by PASID on the device; 7.//VFIO_IOMMU_UNBIND; //***8.VFIO_GROUP_UNSET_CONTAINER;---here, have to sleep to wait another process to finish works of the step 6; * 9. close(group); close(containner); /So, my idea is: If it is possible to just release the params or facilities that only belong to the current process while the process shutdown the device, and while the last process releases them all. Then, as in the above step 8, we don't need to wait, or maybe wait for a very long time if the other process is doing lots of work on the device. Thanks Zaibo >