From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 8 Nov 2016 10:46:19 -0700
From: Alex Williamson
To: Kirti Wankhede
Subject: Re: [PATCH v11 11/22] vfio iommu: Add blocking notifier to notify DMA_UNMAP
Message-ID: <20161108104619.6f76b918@t450s.home>
References: <1478293856-8191-1-git-send-email-kwankhede@nvidia.com>
	<1478293856-8191-12-git-send-email-kwankhede@nvidia.com>
	<20161107164544.57ab1a92@t450s.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 8 Nov 2016 21:56:29 +0530
Kirti Wankhede wrote:

> On 11/8/2016 5:15 AM, Alex Williamson wrote:
> > On Sat, 5 Nov 2016 02:40:45 +0530
> > Kirti Wankhede wrote:
> >
> ...
> >>
> >> +int vfio_register_notifier(struct device *dev, struct notifier_block *nb)
> >
> > Is the expectation here that this is a generic notifier for all
> > vfio->mdev signaling?  That should probably be made clear in the mdev
> > API to avoid vendor drivers assuming their notifier callback only
> > occurs for unmaps, even if that's currently the case.
> >
>
> Ok. Adding a comment about the notifier callback in mdev_device, which
> is part of the next patch.
>
> ...
> >>  	mutex_lock(&iommu->lock);
> >>
> >> -	if (!iommu->external_domain) {
> >> +	/* Fail if notifier list is empty */
> >> +	if ((!iommu->external_domain) || (!iommu->notifier.head)) {
> >>  		ret = -EINVAL;
> >>  		goto pin_done;
> >>  	}
> >> @@ -867,6 +870,11 @@ unlock:
> >>  	/* Report how much was unmapped */
> >>  	unmap->size = unmapped;
> >>
> >> +	if (unmapped && iommu->external_domain)
> >> +		blocking_notifier_call_chain(&iommu->notifier,
> >> +					     VFIO_IOMMU_NOTIFY_DMA_UNMAP,
> >> +					     unmap);
> >
> > This is after the fact, there's already a gap here where pages are
> > unpinned and the mdev device is still running.
>
> Oh, there is a bug here: unpin_pages() now takes user_pfn as an argument
> and looks up the vfio_dma. If it's not found, it doesn't unpin the
> pages, so we have to call this notifier before vfio_remove_dma(). But if
> we call it before vfio_remove_dma() there will be a deadlock, since
> iommu->lock is already held here and vfio_iommu_type1_unpin_pages() will
> also try to take iommu->lock.
> If we want to call blocking_notifier_call_chain() before
> vfio_remove_dma(), the sequence should be:
>
> 	unmapped += dma->size;
> 	mutex_unlock(&iommu->lock);
> 	if (iommu->external_domain) {
> 		struct vfio_iommu_type1_dma_unmap nb_unmap;
>
> 		nb_unmap.iova = dma->iova;
> 		nb_unmap.size = dma->size;
> 		blocking_notifier_call_chain(&iommu->notifier,
> 					     VFIO_IOMMU_NOTIFY_DMA_UNMAP,
> 					     &nb_unmap);
> 	}
> 	mutex_lock(&iommu->lock);
> 	vfio_remove_dma(iommu, dma);

It seems like it would be worthwhile to have the rb-tree rooted in the
vfio_dma, then we only need to call the notifier if there are pages
pinned within that vfio_dma (ie. the rb-tree is not empty).  We can then
release the lock, call the notifier, re-acquire the lock, and BUG_ON if
the rb-tree is still not empty.  We might get duplicate pfns between
separate vfio_dma structs, but as I mentioned in other replies, that
seems like an exception that we don't need to optimize for.
> > The notifier needs to
> > happen prior to that and I suspect that we need to validate that we
> > have no remaining external pfn references within this vfio_dma block.
> > It seems like we need to root our pfn tracking in the vfio_dma so that
> > we can see that it's empty after the notifier chain and BUG_ON if not.
>
> There is no way to find pfns from that iova range with the current
> implementation. We can have this validation if we go with a linear
> array of iova to track pfns.

Right, I was still hoping to avoid storing the pfn even with the
array/page-table approach though; ask the mm layer for the mapping
again.  Is that too much overhead?  Maybe the page table could store the
phys addr and we could use the bits below PAGE_MASK to store the
reference count, so that each entry is still only 8 bytes(?)

> > I would also add some enforcement that external pinning is only
> > enabled when vfio_iommu_type1 is configured for v2 semantics (ie. we
> > only support unmaps exactly matching previous maps).
>
> Ok, I'll add that check.
>
> Thanks,
> Kirti