From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH v16 Kernel 4/7] vfio iommu: Implementation of ioctl for
 dirty pages tracking.
From: Kirti Wankhede
To: Yan Zhao
Date: Fri, 27 Mar 2020 10:58:13 +0530
In-Reply-To: <20200327003055.GB26419@joy-OptiPlex-7040>
References: <1585084732-18473-1-git-send-email-kwankhede@nvidia.com>
 <20200325021135.GB20109@joy-OptiPlex-7040>
 <33d38629-aeaf-1c30-26d4-958b998620b0@nvidia.com>
 <20200327003055.GB26419@joy-OptiPlex-7040>
Cc: "Zhengxiao.zx@Alibaba-inc.com", "Tian, Kevin", "Liu, Yi L",
 "cjia@nvidia.com", "kvm@vger.kernel.org", "eskultet@redhat.com",
 "Yang, Ziye", "qemu-devel@nongnu.org", "cohuck@redhat.com",
 "shuangtai.tst@alibaba-inc.com", "dgilbert@redhat.com", "Wang, Zhi A",
 "mlevitsk@redhat.com", "pasic@linux.ibm.com", "aik@ozlabs.ru",
 "alex.williamson@redhat.com", "eauger@redhat.com", "felipe@nutanix.com",
 "jonathan.davies@nutanix.com", "Liu, Changpeng", "Ken.Xue@amd.com"

Hit the send button a little early.

>
> I checked v12, it's not like what I said.
> In v12, bitmaps are generated per vfio_dma, and a combination of the
> bitmaps is required in order to generate a big bitmap suiting the dirty
> query. It can cause problems when the offset is not aligned.
> But what I propose here is to generate an rb tree orthogonal to the tree
> of vfio_dma.
>
> As to CPU cycles saving, I don't think iterating/translating page by page
> would achieve that purpose.
>

Instead of creating one extra rb tree for dirty pages tracking, v10 tried
to use dma->pfn_list itself. We tried different changes in v10, v11 and
v12; the latest version evolved into the best possible approach after that
discussion. Please go through v11 as well:
https://patchwork.kernel.org/patch/11298335/

Thanks,
Kirti

On 3/27/2020 6:00 AM, Yan Zhao wrote:
> On Fri, Mar 27, 2020 at 05:39:01AM +0800, Kirti Wankhede wrote:
>>
>>
>> On 3/25/2020 7:41 AM, Yan Zhao wrote:
>>> On Wed, Mar 25, 2020 at 05:18:52AM +0800, Kirti Wankhede wrote:
>>>> VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
>>>> - Start dirty pages tracking while migration is active
>>>> - Stop dirty pages tracking.
>>>> - Get dirty pages bitmap.
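
(As an illustration, not part of the quoted patch: a minimal userspace
sequence for the three operations above could look like the sketch below.
It assumes the VFIO_IOMMU_DIRTY_PAGES uapi added by this series - the
ioctl number, the START/STOP/GET_BITMAP flags and the
vfio_iommu_type1_dirty_bitmap structures - is available from
<linux/vfio.h>. container_fd, iova, size and page_size are hypothetical
values supplied by the caller; iova/size must exactly match an existing
VFIO_IOMMU_MAP_DMA mapping and page_size must be the smallest supported
IOMMU page size, per vfio_iova_dirty_bitmap() below.)

/*
 * Minimal sketch of driving VFIO_IOMMU_DIRTY_PAGES from userspace,
 * assuming the uapi definitions from this series are in <linux/vfio.h>.
 */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int track_and_query_dirty(int container_fd, uint64_t iova,
                                 uint64_t size, uint64_t page_size)
{
        struct vfio_iommu_type1_dirty_bitmap start = {
                .argsz = sizeof(start),
                .flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START,
        };
        struct {
                struct vfio_iommu_type1_dirty_bitmap header;
                struct vfio_iommu_type1_dirty_bitmap_get range;
        } get;
        uint64_t npages = size / page_size;
        /* round up to a multiple of 64 bits, as DIRTY_BITMAP_BYTES() does */
        uint64_t bitmap_bytes = ((npages + 63) / 64) * 8;
        void *bitmap = calloc(1, bitmap_bytes);
        int ret;

        if (!bitmap)
                return -1;

        /* 1. Start dirty pages tracking for the whole container. */
        ret = ioctl(container_fd, VFIO_IOMMU_DIRTY_PAGES, &start);
        if (ret)
                goto out;

        /* 2. Get the dirty bitmap for one mapped range. */
        memset(&get, 0, sizeof(get));
        get.header.argsz = sizeof(get);
        get.header.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
        get.range.iova = iova;
        get.range.size = size;
        get.range.bitmap.pgsize = page_size;
        get.range.bitmap.size = bitmap_bytes;
        get.range.bitmap.data = bitmap;
        ret = ioctl(container_fd, VFIO_IOMMU_DIRTY_PAGES, &get);
        if (ret)
                goto out;

        /* bit i set in bitmap => page at iova + i * page_size is dirty */

        /* 3. Stop tracking once migration completes. */
        start.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP;
        ret = ioctl(container_fd, VFIO_IOMMU_DIRTY_PAGES, &start);
out:
        free(bitmap);
        return ret;
}
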
>>>> It is the user space application's responsibility to
>>>> copy content of dirty pages from source to destination during migration.
>>>>
>>>> To prevent a DoS attack, memory for bitmap is allocated per vfio_dma
>>>> structure. Bitmap size is calculated considering smallest supported page
>>>> size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled.
>>>>
>>>> Bitmap is populated for already pinned pages when bitmap is allocated for
>>>> a vfio_dma with the smallest supported page size. Update bitmap from
>>>> pinning functions when tracking is enabled. When user application queries
>>>> bitmap, check if requested page size is same as page size used to
>>>> populate bitmap. If it is equal, copy bitmap, but if not equal, return
>>>> error.
>>>>
>>>> Signed-off-by: Kirti Wankhede
>>>> Reviewed-by: Neo Jia
>>>> ---
>>>>  drivers/vfio/vfio_iommu_type1.c | 266 +++++++++++++++++++++++++++++++++++++++-
>>>>  1 file changed, 260 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>>> index 70aeab921d0f..874a1a7ae925 100644
>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>> @@ -71,6 +71,7 @@ struct vfio_iommu {
>>>>  	unsigned int		dma_avail;
>>>>  	bool			v2;
>>>>  	bool			nesting;
>>>> +	bool			dirty_page_tracking;
>>>>  };
>>>>
>>>>  struct vfio_domain {
>>>> @@ -91,6 +92,7 @@ struct vfio_dma {
>>>>  	bool			lock_cap;	/* capable(CAP_IPC_LOCK) */
>>>>  	struct task_struct	*task;
>>>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>>>> +	unsigned long		*bitmap;
>>>>  };
>>>>
>>>>  struct vfio_group {
>>>> @@ -125,7 +127,21 @@ struct vfio_regions {
>>>>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)	\
>>>>  					(!list_empty(&iommu->domain_list))
>>>>
>>>> +#define DIRTY_BITMAP_BYTES(n)	(ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)
>>>> +
>>>> +/*
>>>> + * Input argument of number of bits to bitmap_set() is unsigned integer, which
>>>> + * further casts to signed integer for unaligned multi-bit operation,
>>>> + * __bitmap_set().
>>>> + * Then maximum bitmap size supported is 2^31 bits divided by 2^3 bits/byte,
>>>> + * that is 2^28 (256 MB) which maps to 2^31 * 2^12 = 2^43 (8TB) on 4K page
>>>> + * system.
>>>> + */
>>>> +#define DIRTY_BITMAP_PAGES_MAX	(uint64_t)(INT_MAX - 1)
>>>> +#define DIRTY_BITMAP_SIZE_MAX	DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
>>>> +
>>>>  static int put_pfn(unsigned long pfn, int prot);
>>>> +static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
>>>>
>>>>  /*
>>>>   * This code handles mapping and unmapping of user data buffers
>>>> @@ -175,6 +191,77 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, struct vfio_dma *old)
>>>>  	rb_erase(&old->node, &iommu->dma_list);
>>>>  }
>>>>
>>>> +
>>>> +static int vfio_dma_bitmap_alloc(struct vfio_dma *dma, uint64_t pgsize)
>>>> +{
>>>> +	uint64_t npages = dma->size / pgsize;
>>>> +
>>>> +	if (npages > DIRTY_BITMAP_PAGES_MAX)
>>>> +		return -EINVAL;
>>>> +
>>>> +	dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
>>>> +	if (!dma->bitmap)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static void vfio_dma_bitmap_free(struct vfio_dma *dma)
>>>> +{
>>>> +	kfree(dma->bitmap);
>>>> +	dma->bitmap = NULL;
>>>> +}
>>>> +
>>>> +static void vfio_dma_populate_bitmap(struct vfio_dma *dma, uint64_t pgsize)
>>>> +{
>>>> +	struct rb_node *p;
>>>> +
>>>> +	if (RB_EMPTY_ROOT(&dma->pfn_list))
>>>> +		return;
>>>> +
>>>> +	for (p = rb_first(&dma->pfn_list); p; p = rb_next(p)) {
>>>> +		struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn, node);
>>>> +
>>>> +		bitmap_set(dma->bitmap, (vpfn->iova - dma->iova) / pgsize, 1);
>>>> +	}
>>>> +}
>>>> +
>>>> +static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu, uint64_t pgsize)
>>>> +{
>>>> +	struct rb_node *n = rb_first(&iommu->dma_list);
>>>> +
>>>> +	for (; n; n = rb_next(n)) {
>>>> +		struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
>>>> +		int ret;
>>>> +
>>>> +		ret = vfio_dma_bitmap_alloc(dma, pgsize);
>>>> +		if (ret) {
>>>> +			struct rb_node *p = rb_prev(n);
>>>> +
>>>> +			for (; p; p = rb_prev(p)) {
>>>> +				struct vfio_dma *dma = rb_entry(n,
>>>> +							struct vfio_dma, node);
>>>> +
>>>> +				vfio_dma_bitmap_free(dma);
>>>> +			}
>>>> +			return ret;
>>>> +		}
>>>> +		vfio_dma_populate_bitmap(dma, pgsize);
>>>> +	}
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static void vfio_dma_bitmap_free_all(struct vfio_iommu *iommu)
>>>> +{
>>>> +	struct rb_node *n = rb_first(&iommu->dma_list);
>>>> +
>>>> +	for (; n; n = rb_next(n)) {
>>>> +		struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
>>>> +
>>>> +		vfio_dma_bitmap_free(dma);
>>>> +	}
>>>> +}
>>>> +
>>>>  /*
>>>>   * Helper Functions for host iova-pfn list
>>>>   */
>>>> @@ -567,6 +654,18 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
>>>>  			vfio_unpin_page_external(dma, iova, do_accounting);
>>>>  			goto pin_unwind;
>>>>  		}
>>>> +
>>>> +		if (iommu->dirty_page_tracking) {
>>>> +			unsigned long pgshift =
>>>> +					__ffs(vfio_pgsize_bitmap(iommu));
>>>> +
>>>> +			/*
>>>> +			 * Bitmap populated with the smallest supported page
>>>> +			 * size
>>>> +			 */
>>>> +			bitmap_set(dma->bitmap,
>>>> +				   (vpfn->iova - dma->iova) >> pgshift, 1);
>>>> +		}
>>>>  	}
>>>>
>>>>  	ret = i;
>>>> @@ -801,6 +900,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>>>  	vfio_unmap_unpin(iommu, dma, true);
>>>>  	vfio_unlink_dma(iommu, dma);
>>>>  	put_task_struct(dma->task);
>>>> +	vfio_dma_bitmap_free(dma);
>>>>  	kfree(dma);
>>>>  	iommu->dma_avail++;
>>>>  }
>>>> @@ -831,6 +931,57 @@ static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
>>>>  	return bitmap;
>>>>  }
>>>>
>>>> +static int vfio_iova_dirty_bitmap(struct vfio_iommu *iommu, dma_addr_t iova,
>>>> +				  size_t size, uint64_t pgsize,
>>>> +				  u64 __user *bitmap)
>>>> +{
>>>> +	struct vfio_dma *dma;
>>>> +	unsigned long pgshift = __ffs(pgsize);
>>>> +	unsigned int npages, bitmap_size;
>>>> +
>>>> +	dma = vfio_find_dma(iommu, iova, 1);
>>>> +
>>>> +	if (!dma)
>>>> +		return -EINVAL;
>>>> +
>>>> +	if (dma->iova != iova || dma->size != size)
>>>> +		return -EINVAL;
>>>> +
>>> Still not sure if it's a good practice.
>>> I saw the qemu implementation.
>>> Qemu just iterates the whole IOVA address space,
>>> It needs to find IOTLB entry for an IOVA
>>> (1) if it can find an IOTLB for an IOVA, do the DIRTY_PAGES IOCTL and
>>> increment IOVA by (iotlb.addr_mask + 1)
>>>
>>> (2) if no existing IOTLB found, the imrc->translate needs to go searching shadow
>>> page table to try to generate one.
>>> if it still fails (most probable case, as IOMMU only maps a small part in its address
>>> space), increment IOVA by 1 page.
>>>
>>> So, if the address space width is 39bit, and if there's only one page
>>> mapped, you still have to translate IOVA for around 2^27 times in each
>>> query. Isn't it too inefficient?
>>>
>>
>> This is the QEMU-side implementation; let's discuss it on the QEMU patches.
>>
> But kernel has to support it first, right?
>
>>> So, IMHO, why we could not just save an rb tree specific for dirty pages, then generate
>>> a bitmap for each query?
>>
>> This is looping back to the implementation in the v10 - v12 versions. There are
>> problems with this approach that were discussed during the v10 to v12 versions
>> of the patches:
>> - Populating the dirty bitmap at the time of query adds more CPU cycles.
>> - Saving those CPU cycles means dirty pages need to be tracked when
>> they are pinned or dirtied by the CPU; that is why the per-vfio_dma
>> bitmap was introduced. If ranges are not vfio_dma aligned, then copying the
>> bitmap to user space becomes complicated and inefficient.
>>
>> So we decided to go with the approach implemented here.
>
> I checked v12, it's not like what I said.
> In v12, bitmaps are generated per vfio_dma, and a combination of the
> bitmaps is required in order to generate a big bitmap suiting the dirty
> query. It can cause problems when the offset is not aligned.
> But what I propose here is to generate an rb tree orthogonal to the tree
> of vfio_dma.
>
> As to CPU cycles saving, I don't think iterating/translating page by page
> would achieve that purpose.
>
>
>
>>>
>>>> +	npages = dma->size >> pgshift;
>>>> +	bitmap_size = DIRTY_BITMAP_BYTES(npages);
>>>> +
>>>> +	/* mark all pages dirty if all pages are pinned and mapped. */
>>>> +	if (dma->iommu_mapped)
>>>> +		bitmap_set(dma->bitmap, 0, npages);
>>>> +
>>>> +	if (copy_to_user((void __user *)bitmap, dma->bitmap, bitmap_size))
>>>> +		return -EFAULT;
>>>> +
>>>> +	/*
>>>> +	 * Re-populate bitmap to include all pinned pages which are considered
>>>> +	 * as dirty but exclude pages which are unpinned and pages which are
>>>> +	 * marked dirty by vfio_dma_rw()
>>>> +	 */
>>>> +	bitmap_clear(dma->bitmap, 0, npages);
>>>> +	vfio_dma_populate_bitmap(dma, pgsize);
>>> will this also repopulate the bitmap for pinned pages set by pass-through devices in
>>> patch 07 ?
>>>
>>
>> If the pass-through device's driver pins pages using vfio_pin_pages and all
>> devices in the group pin pages through vfio_pin_pages, then
>> iommu->pinned_page_dirty_scope is set to true and the bitmap is repopulated.
>>
>>
> pass-through devices already have all guest memory pinned, it would have
> no reason to call vfio_pin_pages if not attempting to mark pages dirty.
> Then if it calls vfio_pin_pages, it means "the pages are accessed, please
> mark them dirty, feel free to clean it when you get it",
> not "the pages will be accessed, please mark them dirty continuously"
>
> Thanks
> Yan
>
>>
>>>
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int verify_bitmap_size(uint64_t npages, uint64_t bitmap_size)
>>>> +{
>>>> +	uint64_t bsize;
>>>> +
>>>> +	if (!npages || !bitmap_size || (bitmap_size > DIRTY_BITMAP_SIZE_MAX))
>>>> +		return -EINVAL;
>>>> +
>>>> +	bsize = DIRTY_BITMAP_BYTES(npages);
>>>> +
>>>> +	if (bitmap_size < bsize)
>>>> +		return -EINVAL;
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>>  static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>>>>  			     struct vfio_iommu_type1_dma_unmap *unmap)
>>>>  {
>>>> @@ -1038,16 +1189,16 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>>  	unsigned long vaddr = map->vaddr;
>>>>  	size_t size = map->size;
>>>>  	int ret = 0, prot = 0;
>>>> -	uint64_t mask;
>>>> +	uint64_t pgsize;
>>>>  	struct vfio_dma *dma;
>>>>
>>>>  	/* Verify that none of our __u64 fields overflow */
>>>>  	if (map->size != size || map->vaddr != vaddr || map->iova != iova)
>>>>  		return -EINVAL;
>>>>
>>>> -	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>>>> +	pgsize = (uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu));
>>>>
>>>> -	WARN_ON(mask & PAGE_MASK);
>>>> +	WARN_ON((pgsize - 1) & PAGE_MASK);
>>>>
>>>>  	/* READ/WRITE from device perspective */
>>>>  	if (map->flags & VFIO_DMA_MAP_FLAG_WRITE)
>>>> @@ -1055,7 +1206,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>>  	if (map->flags & VFIO_DMA_MAP_FLAG_READ)
>>>>  		prot |= IOMMU_READ;
>>>>
>>>> -	if (!prot || !size || (size | iova | vaddr) & mask)
>>>> +	if (!prot || !size || (size | iova | vaddr) & (pgsize - 1))
>>>>  		return -EINVAL;
>>>>
>>>>  	/* Don't allow IOVA or virtual address wrap */
>>>> @@ -1130,6 +1281,12 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>>  	else
>>>>  		ret = vfio_pin_map_dma(iommu, dma, size);
>>>>
>>>> +	if (!ret && iommu->dirty_page_tracking) {
>>>> +		ret = vfio_dma_bitmap_alloc(dma, pgsize);
>>>> +		if (ret)
>>>> +			vfio_remove_dma(iommu, dma);
>>>> +	}
>>>> +
>>>>  out_unlock:
>>>>  	mutex_unlock(&iommu->lock);
>>>>  	return ret;
>>>> @@ -2278,6 +2435,93 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>>>
>>>>  		return copy_to_user((void __user *)arg, &unmap, minsz) ?
>>>>  				-EFAULT : 0;
>>>> +	} else if (cmd == VFIO_IOMMU_DIRTY_PAGES) {
>>>> +		struct vfio_iommu_type1_dirty_bitmap dirty;
>>>> +		uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START |
>>>> +				VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP |
>>>> +				VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
>>>> +		int ret = 0;
>>>> +
>>>> +		if (!iommu->v2)
>>>> +			return -EACCES;
>>>> +
>>>> +		minsz = offsetofend(struct vfio_iommu_type1_dirty_bitmap,
>>>> +				    flags);
>>>> +
>>>> +		if (copy_from_user(&dirty, (void __user *)arg, minsz))
>>>> +			return -EFAULT;
>>>> +
>>>> +		if (dirty.argsz < minsz || dirty.flags & ~mask)
>>>> +			return -EINVAL;
>>>> +
>>>> +		/* only one flag should be set at a time */
>>>> +		if (__ffs(dirty.flags) != __fls(dirty.flags))
>>>> +			return -EINVAL;
>>>> +
>>>> +		if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_START) {
>>>> +			uint64_t pgsize = 1 << __ffs(vfio_pgsize_bitmap(iommu));
>>>> +
>>>> +			mutex_lock(&iommu->lock);
>>>> +			if (!iommu->dirty_page_tracking) {
>>>> +				ret = vfio_dma_bitmap_alloc_all(iommu, pgsize);
>>>> +				if (!ret)
>>>> +					iommu->dirty_page_tracking = true;
>>>> +			}
>>>> +			mutex_unlock(&iommu->lock);
>>>> +			return ret;
>>>> +		} else if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP) {
>>>> +			mutex_lock(&iommu->lock);
>>>> +			if (iommu->dirty_page_tracking) {
>>>> +				iommu->dirty_page_tracking = false;
>>>> +				vfio_dma_bitmap_free_all(iommu);
>>>> +			}
>>>> +			mutex_unlock(&iommu->lock);
>>>> +			return 0;
>>>> +		} else if (dirty.flags &
>>>> +				VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP) {
>>>> +			struct vfio_iommu_type1_dirty_bitmap_get range;
>>>> +			unsigned long pgshift;
>>>> +			size_t data_size = dirty.argsz - minsz;
>>>> +			uint64_t iommu_pgsize =
>>>> +					1 << __ffs(vfio_pgsize_bitmap(iommu));
>>>> +
>>>> +			if (!data_size || data_size < sizeof(range))
>>>> +				return -EINVAL;
>>>> +
>>>> +			if (copy_from_user(&range, (void __user *)(arg + minsz),
>>>> +					   sizeof(range)))
>>>> +				return -EFAULT;
>>>> +
>>>> +			/* allow only smallest supported pgsize */
>>>> +			if (range.bitmap.pgsize != iommu_pgsize)
>>>> +				return -EINVAL;
>>>> +			if (range.iova & (iommu_pgsize - 1))
>>>> +				return -EINVAL;
>>>> +			if (!range.size || range.size & (iommu_pgsize - 1))
>>>> +				return -EINVAL;
>>>> +			if (range.iova + range.size < range.iova)
>>>> +				return -EINVAL;
>>>> +			if (!access_ok((void __user *)range.bitmap.data,
>>>> +				       range.bitmap.size))
>>>> +				return -EINVAL;
>>>> +
>>>> +			pgshift = __ffs(range.bitmap.pgsize);
>>>> +			ret = verify_bitmap_size(range.size >> pgshift,
>>>> +						 range.bitmap.size);
>>>> +			if (ret)
>>>> +				return ret;
>>>> +
>>>> +			mutex_lock(&iommu->lock);
>>>> +			if (iommu->dirty_page_tracking)
>>>> +				ret = vfio_iova_dirty_bitmap(iommu, range.iova,
>>>> +					   range.size, range.bitmap.pgsize,
>>>> +					   range.bitmap.data);
>>>> +			else
>>>> +				ret = -EINVAL;
>>>> +			mutex_unlock(&iommu->lock);
>>>> +
>>>> +			return ret;
>>>> +		}
>>>>  	}
>>>>
>>>>  	return -ENOTTY;
>>>> @@ -2345,10 +2589,20 @@ static int vfio_iommu_type1_dma_rw_chunk(struct vfio_iommu *iommu,
>>>>
>>>>  	vaddr = dma->vaddr + offset;
>>>>
>>>> -	if (write)
>>>> +	if (write) {
>>>>  		*copied = __copy_to_user((void __user *)vaddr, data,
>>>>  					 count) ? 0 : count;
>>>> -	else
>>>> +		if (*copied && iommu->dirty_page_tracking) {
>>>> +			unsigned long pgshift =
>>>> +					__ffs(vfio_pgsize_bitmap(iommu));
>>>> +			/*
>>>> +			 * Bitmap populated with the smallest supported page
>>>> +			 * size
>>>> +			 */
>>>> +			bitmap_set(dma->bitmap, offset >> pgshift,
>>>> +				   *copied >> pgshift);
>>>> +		}
>>>> +	} else
>>>>  		*copied = __copy_from_user(data, (void __user *)vaddr,
>>>>  					   count) ? 0 : count;
>>>>  	if (kthread)
>>>> --
>>>> 2.7.0
>>>>
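
(Also as an illustration, not part of the quoted patch: the standalone
sketch below reproduces the arithmetic in the DIRTY_BITMAP_PAGES_MAX
comment above. The ALIGN/BITS_PER_TYPE/BITS_PER_BYTE macros are simplified
local stand-ins for the kernel helpers; with at most INT_MAX - 1 tracked
pages, the per-vfio_dma bitmap stays at 2^28 bytes (256 MB), which covers
about 2^43 bytes (8 TB) at a 4 KB page size.)

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

#define BITS_PER_BYTE		8
#define BITS_PER_TYPE(type)	(sizeof(type) * BITS_PER_BYTE)
#define ALIGN(x, a)		(((x) + (a) - 1) / (a) * (a))
#define DIRTY_BITMAP_BYTES(n)	(ALIGN(n, BITS_PER_TYPE(uint64_t)) / BITS_PER_BYTE)

int main(void)
{
	uint64_t max_pages = (uint64_t)(INT_MAX - 1);
	uint64_t max_bitmap = DIRTY_BITMAP_BYTES(max_pages);
	uint64_t pgsize = 4096;

	/* 268435456 bytes = 2^28 = 256 MB */
	printf("max bitmap size: %llu bytes (~%llu MB)\n",
	       (unsigned long long)max_bitmap,
	       (unsigned long long)(max_bitmap >> 20));
	/* ~8191 GB, i.e. roughly 8 TB of guest memory at 4K pages */
	printf("covers         : ~%llu GB at 4K pages\n",
	       (unsigned long long)((max_pages * pgsize) >> 30));
	return 0;
}
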