Subject: Re: [RFC PATCH v3 3/8] vfio/type1: Add an MMU notifier to avoid pinning
From: Shenming Lu <lushenming@huawei.com>
To: Alex Williamson <alex.williamson@redhat.com>
CC: Cornelia Huck, Will Deacon, Robin Murphy, Joerg Roedel,
 Jean-Philippe Brucker, Eric Auger, Kevin Tian, Lu Baolu,
 Christoph Hellwig, Jonathan Cameron, Barry Song,
 linux-arm-kernel@lists.infradead.org
Date: Fri, 21 May 2021 14:37:47 +0800
Message-ID: <6a914c05-fbd6-8555-1a1d-a21e5fa40a37@huawei.com>
In-Reply-To: <20210518125831.153e039c.alex.williamson@redhat.com>
References: <20210409034420.1799-1-lushenming@huawei.com>
 <20210409034420.1799-4-lushenming@huawei.com>
 <20210518125831.153e039c.alex.williamson@redhat.com>

On 2021/5/19 2:58, Alex Williamson wrote:
> On Fri, 9 Apr 2021 11:44:15 +0800
> Shenming Lu <lushenming@huawei.com> wrote:
>
>> To avoid pinning pages when they are mapped in IOMMU page tables, we
>> add an MMU notifier to tell the addresses which are no longer valid
>> and try to unmap them.
>>
>> Signed-off-by: Shenming Lu <lushenming@huawei.com>
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 112 +++++++++++++++++++++++++++++++-
>>  1 file changed, 109 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index ab0ff60ee207..1cb9d1f2717b 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -40,6 +40,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>
>>  #define DRIVER_VERSION  "0.2"
>>  #define DRIVER_AUTHOR   "Alex Williamson "
>> @@ -69,6 +70,7 @@ struct vfio_iommu {
>>  	struct mutex		lock;
>>  	struct rb_root		dma_list;
>>  	struct blocking_notifier_head	notifier;
>> +	struct mmu_notifier	mn;
>>  	unsigned int		dma_avail;
>>  	unsigned int		vaddr_invalid_count;
>>  	uint64_t		pgsize_bitmap;
>> @@ -1204,6 +1206,72 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
>>  	return unlocked;
>>  }
>>
>> +/* Unmap the IOPF mapped pages in the specified range. */
>> +static void vfio_unmap_partial_iopf(struct vfio_iommu *iommu,
>> +				    struct vfio_dma *dma,
>> +				    dma_addr_t start, dma_addr_t end)
>> +{
>> +	struct iommu_iotlb_gather *gathers;
>> +	struct vfio_domain *d;
>> +	int i, num_domains = 0;
>> +
>> +	list_for_each_entry(d, &iommu->domain_list, next)
>> +		num_domains++;
>> +
>> +	gathers = kzalloc(sizeof(*gathers) * num_domains, GFP_KERNEL);
>> +	if (gathers) {
>> +		for (i = 0; i < num_domains; i++)
>> +			iommu_iotlb_gather_init(&gathers[i]);
>> +	}
>
> If we're always serialized on iommu->lock, would it make sense to have
> gathers pre-allocated on the vfio_iommu object?

Yeah, we can do it.

>
>> +
>> +	while (start < end) {
>> +		unsigned long bit_offset;
>> +		size_t len;
>> +
>> +		bit_offset = (start - dma->iova) >> PAGE_SHIFT;
>> +
>> +		for (len = 0; start + len < end; len += PAGE_SIZE) {
>> +			if (!IOPF_MAPPED_BITMAP_GET(dma,
>> +					bit_offset + (len >> PAGE_SHIFT)))
>> +				break;
>
> There are bitmap helpers for this, find_first_bit(),
> find_next_zero_bit().
Thanks for the hint. :-)

>
>> +		}
>> +
>> +		if (len) {
>> +			i = 0;
>> +			list_for_each_entry(d, &iommu->domain_list, next) {
>> +				size_t unmapped;
>> +
>> +				if (gathers)
>> +					unmapped = iommu_unmap_fast(d->domain,
>> +								    start, len,
>> +								    &gathers[i++]);
>> +				else
>> +					unmapped = iommu_unmap(d->domain,
>> +							       start, len);
>> +
>> +				if (WARN_ON(unmapped != len))
>
> The IOMMU API does not guarantee arbitrary unmap sizes, this will
> trigger and this exit path is wrong.  If we've already unmapped the
> IOMMU, shouldn't we proceed with @unmapped rather than @len so the
> device can re-fault the extra mappings?  Otherwise the IOMMU state
> doesn't match the iopf bitmap state.

OK, I will correct it. Can we assume that the @unmapped values returned
from iommu_unmap() are the same for all domains (since all domains share
the same iopf_mapped_bitmap)?

>
>> +					goto out;
>> +			}
>> +
>> +			bitmap_clear(dma->iopf_mapped_bitmap,
>> +				     bit_offset, len >> PAGE_SHIFT);
>> +
>> +			cond_resched();
>> +		}
>> +
>> +		start += (len + PAGE_SIZE);
>> +	}
>> +
>> +out:
>> +	if (gathers) {
>> +		i = 0;
>> +		list_for_each_entry(d, &iommu->domain_list, next)
>> +			iommu_iotlb_sync(d->domain, &gathers[i++]);
>> +
>> +		kfree(gathers);
>> +	}
>> +}
>> +
>>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  {
>>  	WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list));
>> @@ -3197,17 +3265,18 @@ static int vfio_iommu_type1_dma_map_iopf(struct iommu_fault *fault, void *data)
>>
>>  	vaddr = iova - dma->iova + dma->vaddr;
>>
>> -	if (vfio_pin_page_external(dma, vaddr, &pfn, true))
>> +	if (vfio_pin_page_external(dma, vaddr, &pfn, false))
>>  		goto out_invalid;
>>
>>  	if (vfio_iommu_map(iommu, iova, pfn, 1, dma->prot)) {
>> -		if (put_pfn(pfn, dma->prot))
>> -			vfio_lock_acct(dma, -1, true);
>> +		put_pfn(pfn, dma->prot);
>>  		goto out_invalid;
>>  	}
>>
>>  	bitmap_set(dma->iopf_mapped_bitmap, bit_offset, 1);
>>
>> +	put_pfn(pfn, dma->prot);
>> +
>>  out_success:
>>  	status = IOMMU_PAGE_RESP_SUCCESS;
>>
>> @@ -3220,6 +3289,43 @@ static int vfio_iommu_type1_dma_map_iopf(struct iommu_fault *fault, void *data)
>>  	return 0;
>>  }
>> +
>> +static void mn_invalidate_range(struct mmu_notifier *mn, struct mm_struct *mm,
>> +				unsigned long start, unsigned long end)
>> +{
>> +	struct vfio_iommu *iommu = container_of(mn, struct vfio_iommu, mn);
>> +	struct rb_node *n;
>> +	int ret;
>> +
>> +	mutex_lock(&iommu->lock);
>> +
>> +	ret = vfio_wait_all_valid(iommu);
>> +	if (WARN_ON(ret < 0))
>> +		return;
>
> Is WARN_ON sufficient for this error condition?  We've been told to
> evacuate a range of mm, the device still has DMA access, we've removed
> page pinning.  This seems like a BUG_ON condition to me, we can't allow
> the system to continue in any way with pages getting unmapped from
> under the device.

Makes sense.

>
>> +
>> +	for (n = rb_first(&iommu->dma_list); n; n = rb_next(n)) {
>> +		struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
>> +		unsigned long start_n, end_n;
>> +
>> +		if (end <= dma->vaddr || start >= dma->vaddr + dma->size)
>> +			continue;
>> +
>> +		start_n = ALIGN_DOWN(max_t(unsigned long, start, dma->vaddr),
>> +				     PAGE_SIZE);
>> +		end_n = ALIGN(min_t(unsigned long, end, dma->vaddr + dma->size),
>> +			      PAGE_SIZE);
>> +
>> +		vfio_unmap_partial_iopf(iommu, dma,
>> +					start_n - dma->vaddr + dma->iova,
>> +					end_n - dma->vaddr + dma->iova);
>> +	}
>> +
>> +	mutex_unlock(&iommu->lock);
>> +}
>> +
>> +static const struct mmu_notifier_ops vfio_iommu_type1_mn_ops = {
>> +	.invalidate_range = mn_invalidate_range,
>> +};
>> +
>>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  				   unsigned int cmd, unsigned long arg)
>>  {
>
> Again, this patch series is difficult to follow because we're
> introducing dead code until the mmu notifier actually has a path to be
> registered.

Sorry again for this, I will be more careful about the ordering of the
series.
> We shouldn't be taking any faults until iopf is enabled,
> so it seems like we can add more of the core support alongside this
> code.

Could you be more specific about this? :-)

We can add a requirement that the VFIO_IOMMU_ENABLE_IOPF ioctl (in
Patch 5) must be called right after VFIO_SET_IOMMU, so that IOPF is
enabled before setting up any DMA mapping. Is this sufficient?

Thanks,
Shenming

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel