Date: Tue, 18 May 2021 12:58:31 -0600
From: Alex Williamson
To: Shenming Lu
Cc: Cornelia Huck, Will Deacon, Robin Murphy, Joerg Roedel,
 Jean-Philippe Brucker, Eric Auger, Kevin Tian, Lu Baolu,
 Christoph Hellwig, Jonathan Cameron, Barry Song
Subject: Re: [RFC PATCH v3 3/8] vfio/type1: Add an MMU notifier to avoid pinning
Message-ID: <20210518125831.153e039c.alex.williamson@redhat.com>
In-Reply-To: <20210409034420.1799-4-lushenming@huawei.com>
References: <20210409034420.1799-1-lushenming@huawei.com>
 <20210409034420.1799-4-lushenming@huawei.com>
List-Id: linux-arm-kernel@lists.infradead.org

On Fri, 9 Apr 2021 11:44:15 +0800
Shenming Lu wrote:

> To avoid pinning pages when they are mapped in IOMMU page tables, we
> add an MMU notifier to tell the
> addresses which are no longer valid
> and try to unmap them.
>
> Signed-off-by: Shenming Lu
> ---
>  drivers/vfio/vfio_iommu_type1.c | 112 +++++++++++++++++++++++++++++++-
>  1 file changed, 109 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index ab0ff60ee207..1cb9d1f2717b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -40,6 +40,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #define DRIVER_VERSION "0.2"
>  #define DRIVER_AUTHOR "Alex Williamson "
> @@ -69,6 +70,7 @@ struct vfio_iommu {
>  	struct mutex lock;
>  	struct rb_root dma_list;
>  	struct blocking_notifier_head notifier;
> +	struct mmu_notifier mn;
>  	unsigned int dma_avail;
>  	unsigned int vaddr_invalid_count;
>  	uint64_t pgsize_bitmap;
> @@ -1204,6 +1206,72 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
>  	return unlocked;
>  }
>
> +/* Unmap the IOPF mapped pages in the specified range. */
> +static void vfio_unmap_partial_iopf(struct vfio_iommu *iommu,
> +				    struct vfio_dma *dma,
> +				    dma_addr_t start, dma_addr_t end)
> +{
> +	struct iommu_iotlb_gather *gathers;
> +	struct vfio_domain *d;
> +	int i, num_domains = 0;
> +
> +	list_for_each_entry(d, &iommu->domain_list, next)
> +		num_domains++;
> +
> +	gathers = kzalloc(sizeof(*gathers) * num_domains, GFP_KERNEL);
> +	if (gathers) {
> +		for (i = 0; i < num_domains; i++)
> +			iommu_iotlb_gather_init(&gathers[i]);
> +	}

If we're always serialized on iommu->lock, would it make sense to have
gathers pre-allocated on the vfio_iommu object?

> +
> +	while (start < end) {
> +		unsigned long bit_offset;
> +		size_t len;
> +
> +		bit_offset = (start - dma->iova) >> PAGE_SHIFT;
> +
> +		for (len = 0; start + len < end; len += PAGE_SIZE) {
> +			if (!IOPF_MAPPED_BITMAP_GET(dma,
> +					bit_offset + (len >> PAGE_SHIFT)))
> +				break;

There are bitmap helpers for this, find_first_bit(), find_next_zero_bit().
> +		}
> +
> +		if (len) {
> +			i = 0;
> +			list_for_each_entry(d, &iommu->domain_list, next) {
> +				size_t unmapped;
> +
> +				if (gathers)
> +					unmapped = iommu_unmap_fast(d->domain,
> +								    start, len,
> +								    &gathers[i++]);
> +				else
> +					unmapped = iommu_unmap(d->domain,
> +							       start, len);
> +
> +				if (WARN_ON(unmapped != len))

The IOMMU API does not guarantee arbitrary unmap sizes, this will
trigger and this exit path is wrong.  If we've already unmapped the
IOMMU, shouldn't we proceed with @unmapped rather than @len so the
device can re-fault the extra mappings?  Otherwise the IOMMU state
doesn't match the iopf bitmap state.

> +					goto out;
> +			}
> +
> +			bitmap_clear(dma->iopf_mapped_bitmap,
> +				     bit_offset, len >> PAGE_SHIFT);
> +
> +			cond_resched();
> +		}
> +
> +		start += (len + PAGE_SIZE);
> +	}
> +
> +out:
> +	if (gathers) {
> +		i = 0;
> +		list_for_each_entry(d, &iommu->domain_list, next)
> +			iommu_iotlb_sync(d->domain, &gathers[i++]);
> +
> +		kfree(gathers);
> +	}
> +}
> +
>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>  {
>  	WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list));
> @@ -3197,17 +3265,18 @@ static int vfio_iommu_type1_dma_map_iopf(struct iommu_fault *fault, void *data)
>
>  	vaddr = iova - dma->iova + dma->vaddr;
>
> -	if (vfio_pin_page_external(dma, vaddr, &pfn, true))
> +	if (vfio_pin_page_external(dma, vaddr, &pfn, false))
>  		goto out_invalid;
>
>  	if (vfio_iommu_map(iommu, iova, pfn, 1, dma->prot)) {
> -		if (put_pfn(pfn, dma->prot))
> -			vfio_lock_acct(dma, -1, true);
> +		put_pfn(pfn, dma->prot);
>  		goto out_invalid;
>  	}
>
>  	bitmap_set(dma->iopf_mapped_bitmap, bit_offset, 1);
>
> +	put_pfn(pfn, dma->prot);
> +
>  out_success:
>  	status = IOMMU_PAGE_RESP_SUCCESS;
>
> @@ -3220,6 +3289,43 @@ static int vfio_iommu_type1_dma_map_iopf(struct iommu_fault *fault, void *data)
>  	return 0;
>  }
>
> +static void mn_invalidate_range(struct mmu_notifier *mn, struct mm_struct *mm,
> +				unsigned long start, unsigned long end)
> +{
> +	struct vfio_iommu *iommu =
> +		container_of(mn, struct vfio_iommu, mn);
> +	struct rb_node *n;
> +	int ret;
> +
> +	mutex_lock(&iommu->lock);
> +
> +	ret = vfio_wait_all_valid(iommu);
> +	if (WARN_ON(ret < 0))
> +		return;

Is WARN_ON sufficient for this error condition?  We've been told to
evacuate a range of mm, the device still has DMA access, we've removed
page pinning.  This seems like a BUG_ON condition to me, we can't allow
the system to continue in any way with pages getting unmapped from
under the device.

> +
> +	for (n = rb_first(&iommu->dma_list); n; n = rb_next(n)) {
> +		struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
> +		unsigned long start_n, end_n;
> +
> +		if (end <= dma->vaddr || start >= dma->vaddr + dma->size)
> +			continue;
> +
> +		start_n = ALIGN_DOWN(max_t(unsigned long, start, dma->vaddr),
> +				     PAGE_SIZE);
> +		end_n = ALIGN(min_t(unsigned long, end, dma->vaddr + dma->size),
> +			      PAGE_SIZE);
> +
> +		vfio_unmap_partial_iopf(iommu, dma,
> +					start_n - dma->vaddr + dma->iova,
> +					end_n - dma->vaddr + dma->iova);
> +	}
> +
> +	mutex_unlock(&iommu->lock);
> +}
> +
> +static const struct mmu_notifier_ops vfio_iommu_type1_mn_ops = {
> +	.invalidate_range = mn_invalidate_range,
> +};
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  				   unsigned int cmd, unsigned long arg)
>  {

Again, this patch series is difficult to follow because we're
introducing dead code until the mmu notifier actually has a path to be
registered.  We shouldn't be taking any faults until iopf is enabled,
so it seems like we can add more of the core support alongside this
code.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel