From: Kirti Wankhede <kwankhede@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Zhengxiao.zx@Alibaba-inc.com, kevin.tian@intel.com,
yi.l.liu@intel.com, cjia@nvidia.com, kvm@vger.kernel.org,
eskultet@redhat.com, ziye.yang@intel.com, qemu-devel@nongnu.org,
cohuck@redhat.com, shuangtai.tst@alibaba-inc.com,
dgilbert@redhat.com, zhi.a.wang@intel.com, mlevitsk@redhat.com,
pasic@linux.ibm.com, aik@ozlabs.ru, eauger@redhat.com,
felipe@nutanix.com, jonathan.davies@nutanix.com,
yan.y.zhao@intel.com, changpeng.liu@intel.com, Ken.Xue@amd.com
Subject: Re: [PATCH Kernel v18 5/7] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap
Date: Wed, 13 May 2020 02:00:54 +0530
Message-ID: <e4a5cb87-3bc3-ff1b-9ffb-479a4d418922@nvidia.com>
In-Reply-To: <20200506162511.032bb1e6@w520.home>
On 5/7/2020 3:55 AM, Alex Williamson wrote:
> On Mon, 4 May 2020 21:28:57 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
>
>> DMA mapped pages, including those pinned by mdev vendor drivers, might
>> get unpinned and unmapped while migration is active and device is still
>> running. For example, in pre-copy phase while guest driver could access
>> those pages, host device or vendor driver can dirty these mapped pages.
>> Such pages should be marked dirty so as to maintain memory consistency
>> for a user making use of dirty page tracking.
>>
>> To get bitmap during unmap, user should allocate memory for bitmap, set
>> size of allocated memory, set page size to be considered for bitmap and
>> set flag VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP.
>>
>> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
>> Reviewed-by: Neo Jia <cjia@nvidia.com>
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 84 +++++++++++++++++++++++++++++++++++++++--
>> include/uapi/linux/vfio.h | 10 +++++
>> 2 files changed, 90 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 01dcb417836f..8b27faf1ec38 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -983,12 +983,14 @@ static int verify_bitmap_size(uint64_t npages, uint64_t bitmap_size)
>> }
>>
>> static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>> - struct vfio_iommu_type1_dma_unmap *unmap)
>> + struct vfio_iommu_type1_dma_unmap *unmap,
>> + struct vfio_bitmap *bitmap)
>> {
>> uint64_t mask;
>> struct vfio_dma *dma, *dma_last = NULL;
>> size_t unmapped = 0;
>> int ret = 0, retries = 0;
>> + unsigned long *final_bitmap = NULL, *temp_bitmap = NULL;
>>
>> mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>>
>> @@ -1041,6 +1043,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>> ret = -EINVAL;
>> goto unlock;
>> }
>> +
>> dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
>> if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
>> ret = -EINVAL;
>> @@ -1048,6 +1051,22 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>> }
>> }
>>
>> + if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
>> + iommu->dirty_page_tracking) {
>
> Why do we even accept VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP when not
> dirty page tracking rather than returning -EINVAL? It would simplify
> things here to reject it at the ioctl and silently ignoring a flag is
> rarely if ever the right approach.
>
>> + final_bitmap = kvzalloc(bitmap->size, GFP_KERNEL);
>> + if (!final_bitmap) {
>> + ret = -ENOMEM;
>> + goto unlock;
>> + }
>> +
>> + temp_bitmap = kvzalloc(bitmap->size, GFP_KERNEL);
>> + if (!temp_bitmap) {
>> + ret = -ENOMEM;
>> + kfree(final_bitmap);
>> + goto unlock;
>> + }
>
> YIKES! So the user can instantly trigger the kernel to internally
> allocate 2 x 256MB, regardless of how much they can actually map.
>
That is the worst-case scenario; I don't think it will ever be hit in practice.
More comments on this below.
>> + }
>> +
>> while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
>> if (!iommu->v2 && unmap->iova > dma->iova)
>> break;
>> @@ -1058,6 +1077,24 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>> if (dma->task->mm != current->mm)
>> break;
>>
>> + if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
>> + iommu->dirty_page_tracking) {
>> + unsigned long pgshift = __ffs(bitmap->pgsize);
>> + unsigned int npages = dma->size >> pgshift;
>> + unsigned int shift;
>> +
>> + vfio_iova_dirty_bitmap(iommu, dma->iova, dma->size,
>> + bitmap->pgsize, (u64 *)temp_bitmap);
>
> vfio_iova_dirty_bitmap() takes a __user bitmap, we're doing
> copy_to_user() on a kernel allocated buffer???
>
Actually, there is no need to call vfio_iova_dirty_bitmap() here: the dma
pointer is already known, and since the range is being unmapped there is
no need to repopulate the bitmap. Removing the vfio_iova_dirty_bitmap()
call and changing the code as below:
	if (unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) {
		unsigned long pgshift = __ffs(bitmap->pgsize);
		unsigned int npages = dma->size >> pgshift;
		unsigned int bitmap_size = DIRTY_BITMAP_BYTES(npages);
		unsigned int shift = (dma->iova - unmap->iova) >> pgshift;

		/*
		 * Mark all pages dirty if all pages are pinned and
		 * mapped.
		 */
		if (dma->iommu_mapped)
			bitmap_set(temp_bitmap, 0, npages);
		else
			memcpy(temp_bitmap, dma->bitmap, bitmap_size);

		if (shift)
			bitmap_shift_left(temp_bitmap, temp_bitmap,
					  shift, npages);

		bitmap_or(final_bitmap, final_bitmap, temp_bitmap,
			  shift + npages);
		memset(temp_bitmap, 0, bitmap->size);
	}
>> +
>> + shift = (dma->iova - unmap->iova) >> pgshift;
>> + if (shift)
>> + bitmap_shift_left(temp_bitmap, temp_bitmap,
>> + shift, npages);
>> + bitmap_or(final_bitmap, final_bitmap, temp_bitmap,
>> + shift + npages);
>> + memset(temp_bitmap, 0, bitmap->size);
>> + }
>
> It seems like if the per vfio_dma dirty bitmap was oversized by a long
> that we could shift it in place, then we'd only need one working bitmap
> buffer and we could size that to fit the vfio_dma (or the largest
> vfio_dma if we don't want to free and re-alloc for each vfio_dma).
> We'd need to do more copy_to/from_user()s, but we'd also avoid copying
> between sparse mappings (user zero'd bitmap required) and we'd have a
> far more reasonable memory usage. Thanks,
>
I thought about it, but couldn't find a way to optimize this down to one
bitmap buffer. This case is only hit during migration with vIOMMU
enabled. Can we keep these two bitmap buffers for now and optimize later?
Thanks,
Kirti