linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] VDUSE: Support registering userspace memory as bounce buffer
@ 2022-07-06  5:04 Xie Yongji
  2022-07-06  5:04 ` [PATCH v2 1/5] vduse: Remove unnecessary spin lock protection Xie Yongji
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Xie Yongji @ 2022-07-06  5:04 UTC (permalink / raw)
  To: mst, jasowang, xiaodong.liu, maxime.coquelin, stefanha
  Cc: songmuchun, virtualization, linux-kernel

Hi all,

This series introduces some new ioctls: VDUSE_IOTLB_GET_INFO,
VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM to support
registering and de-registering userspace memory for IOTLB
as bounce buffer in virtio-vdpa case.

The VDUSE_IOTLB_GET_INFO ioctl can help user to query IOLTB
information such as bounce buffer size. Then user can use
those information on VDUSE_IOTLB_REG_UMEM and
VDUSE_IOTLB_DEREG_UMEM ioctls to register and de-register
userspace memory for IOTLB.

During registering and de-registering, the DMA data in use
would be copied from kernel bounce pages to userspace bounce
pages and back.

With this feature, some existing application such as SPDK
and DPDK can leverage the datapath of VDUSE directly and
efficiently as discussed before [1][2]. They can register
some preallocated hugepages to VDUSE to avoid an extra
memcpy from bounce-buffer to hugepages.

The kernel and userspace codes could be found in github:

https://github.com/bytedance/linux/tree/vduse-umem
https://github.com/bytedance/qemu/tree/vduse-umem

To test it with qemu-storage-daemon:

$ qemu-storage-daemon \
    --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server=on,wait=off \
    --monitor chardev=charmonitor \
    --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
    --export type=vduse-blk,id=vduse-test,name=vduse-test,node-name=disk0,writable=on

[1] https://lkml.org/lkml/2021/6/27/318
[2] https://lkml.org/lkml/2022/7/4/246

Please review, thanks!

V1 to V2:
- Drop the patch that updating API version [MST]
- Replace unpin_user_pages() with unpin_user_pages_dirty_lock() [MST]
- Use __vmalloc(__GFP_ACCOUNT) for memory accounting [MST]

Xie Yongji (5):
  vduse: Remove unnecessary spin lock protection
  vduse: Use memcpy_{to,from}_page() in do_bounce()
  vduse: Support using userspace pages as bounce buffer
  vduse: Support querying IOLTB information
  vduse: Support registering userspace memory for IOTLB

 drivers/vdpa/vdpa_user/iova_domain.c | 134 ++++++++++++++++++++---
 drivers/vdpa/vdpa_user/iova_domain.h |   9 ++
 drivers/vdpa/vdpa_user/vduse_dev.c   | 152 +++++++++++++++++++++++++++
 include/uapi/linux/vduse.h           |  45 ++++++++
 4 files changed, 327 insertions(+), 13 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 1/5] vduse: Remove unnecessary spin lock protection
  2022-07-06  5:04 [PATCH v2 0/5] VDUSE: Support registering userspace memory as bounce buffer Xie Yongji
@ 2022-07-06  5:04 ` Xie Yongji
  2022-07-13  5:43   ` Jason Wang
  2022-07-13  5:57   ` Michael S. Tsirkin
  2022-07-06  5:05 ` [PATCH v2 2/5] vduse: Use memcpy_{to,from}_page() in do_bounce() Xie Yongji
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: Xie Yongji @ 2022-07-06  5:04 UTC (permalink / raw)
  To: mst, jasowang, xiaodong.liu, maxime.coquelin, stefanha
  Cc: songmuchun, virtualization, linux-kernel

Taking iotlb lock to access bounce page in page fault
handler is meaningless since vduse_domain_free_bounce_pages()
would only be called during file release.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
---
 drivers/vdpa/vdpa_user/iova_domain.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
index 6daa3978d290..bca1f0b8850c 100644
--- a/drivers/vdpa/vdpa_user/iova_domain.c
+++ b/drivers/vdpa/vdpa_user/iova_domain.c
@@ -211,17 +211,14 @@ static struct page *
 vduse_domain_get_bounce_page(struct vduse_iova_domain *domain, u64 iova)
 {
 	struct vduse_bounce_map *map;
-	struct page *page = NULL;
+	struct page *page;
 
-	spin_lock(&domain->iotlb_lock);
 	map = &domain->bounce_maps[iova >> PAGE_SHIFT];
 	if (!map->bounce_page)
-		goto out;
+		return NULL;
 
 	page = map->bounce_page;
 	get_page(page);
-out:
-	spin_unlock(&domain->iotlb_lock);
 
 	return page;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 2/5] vduse: Use memcpy_{to,from}_page() in do_bounce()
  2022-07-06  5:04 [PATCH v2 0/5] VDUSE: Support registering userspace memory as bounce buffer Xie Yongji
  2022-07-06  5:04 ` [PATCH v2 1/5] vduse: Remove unnecessary spin lock protection Xie Yongji
@ 2022-07-06  5:05 ` Xie Yongji
  2022-07-13  5:59   ` Jason Wang
  2022-07-06  5:05 ` [PATCH v2 3/5] vduse: Support using userspace pages as bounce buffer Xie Yongji
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Xie Yongji @ 2022-07-06  5:05 UTC (permalink / raw)
  To: mst, jasowang, xiaodong.liu, maxime.coquelin, stefanha
  Cc: songmuchun, virtualization, linux-kernel

kmap_atomic() is being deprecated in favor of kmap_local_page().

The use of kmap_atomic() in do_bounce() is all thread local therefore
kmap_local_page() is a sufficient replacement.

Convert to kmap_local_page() but, instead of open coding it,
use the helpers memcpy_to_page() and memcpy_from_page().

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
---
 drivers/vdpa/vdpa_user/iova_domain.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
index bca1f0b8850c..50d7c08d5450 100644
--- a/drivers/vdpa/vdpa_user/iova_domain.c
+++ b/drivers/vdpa/vdpa_user/iova_domain.c
@@ -138,18 +138,17 @@ static void do_bounce(phys_addr_t orig, void *addr, size_t size,
 {
 	unsigned long pfn = PFN_DOWN(orig);
 	unsigned int offset = offset_in_page(orig);
-	char *buffer;
+	struct page *page;
 	unsigned int sz = 0;
 
 	while (size) {
 		sz = min_t(size_t, PAGE_SIZE - offset, size);
 
-		buffer = kmap_atomic(pfn_to_page(pfn));
+		page = pfn_to_page(pfn);
 		if (dir == DMA_TO_DEVICE)
-			memcpy(addr, buffer + offset, sz);
+			memcpy_from_page(addr, page, offset, sz);
 		else
-			memcpy(buffer + offset, addr, sz);
-		kunmap_atomic(buffer);
+			memcpy_to_page(page, offset, addr, sz);
 
 		size -= sz;
 		pfn++;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 3/5] vduse: Support using userspace pages as bounce buffer
  2022-07-06  5:04 [PATCH v2 0/5] VDUSE: Support registering userspace memory as bounce buffer Xie Yongji
  2022-07-06  5:04 ` [PATCH v2 1/5] vduse: Remove unnecessary spin lock protection Xie Yongji
  2022-07-06  5:05 ` [PATCH v2 2/5] vduse: Use memcpy_{to,from}_page() in do_bounce() Xie Yongji
@ 2022-07-06  5:05 ` Xie Yongji
  2022-07-06  5:05 ` [PATCH v2 4/5] vduse: Support querying IOLTB information Xie Yongji
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Xie Yongji @ 2022-07-06  5:05 UTC (permalink / raw)
  To: mst, jasowang, xiaodong.liu, maxime.coquelin, stefanha
  Cc: songmuchun, virtualization, linux-kernel

Introduce two APIs: vduse_domain_add_user_bounce_pages()
and vduse_domain_remove_user_bounce_pages() to support
adding and removing userspace pages for bounce buffers.
During adding and removing, the DMA data would be copied
from the kernel bounce pages to the userspace bounce pages
and back.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
---
 drivers/vdpa/vdpa_user/iova_domain.c | 128 +++++++++++++++++++++++++--
 drivers/vdpa/vdpa_user/iova_domain.h |   9 ++
 2 files changed, 129 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
index 50d7c08d5450..2ae29341228e 100644
--- a/drivers/vdpa/vdpa_user/iova_domain.c
+++ b/drivers/vdpa/vdpa_user/iova_domain.c
@@ -178,8 +178,9 @@ static void vduse_domain_bounce(struct vduse_iova_domain *domain,
 			    map->orig_phys == INVALID_PHYS_ADDR))
 			return;
 
-		addr = page_address(map->bounce_page) + offset;
-		do_bounce(map->orig_phys + offset, addr, sz, dir);
+		addr = kmap_local_page(map->bounce_page);
+		do_bounce(map->orig_phys + offset, addr + offset, sz, dir);
+		kunmap_local(addr);
 		size -= sz;
 		iova += sz;
 	}
@@ -210,20 +211,23 @@ static struct page *
 vduse_domain_get_bounce_page(struct vduse_iova_domain *domain, u64 iova)
 {
 	struct vduse_bounce_map *map;
-	struct page *page;
+	struct page *page = NULL;
 
+	read_lock(&domain->bounce_lock);
 	map = &domain->bounce_maps[iova >> PAGE_SHIFT];
-	if (!map->bounce_page)
-		return NULL;
+	if (domain->user_bounce_pages || !map->bounce_page)
+		goto out;
 
 	page = map->bounce_page;
 	get_page(page);
+out:
+	read_unlock(&domain->bounce_lock);
 
 	return page;
 }
 
 static void
-vduse_domain_free_bounce_pages(struct vduse_iova_domain *domain)
+vduse_domain_free_kernel_bounce_pages(struct vduse_iova_domain *domain)
 {
 	struct vduse_bounce_map *map;
 	unsigned long pfn, bounce_pfns;
@@ -243,6 +247,81 @@ vduse_domain_free_bounce_pages(struct vduse_iova_domain *domain)
 	}
 }
 
+int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain,
+				       struct page **pages, int count)
+{
+	struct vduse_bounce_map *map;
+	int i, ret;
+
+	/* Now we don't support partial mapping */
+	if (count != (domain->bounce_size >> PAGE_SHIFT))
+		return -EINVAL;
+
+	write_lock(&domain->bounce_lock);
+	ret = -EEXIST;
+	if (domain->user_bounce_pages)
+		goto out;
+
+	ret = -EBUSY;
+	/*
+	 * Make sure nobody maps the kernel bounce pages,
+	 * then we can free them.
+	 */
+	if (domain->mapped)
+		goto out;
+
+	for (i = 0; i < count; i++) {
+		map = &domain->bounce_maps[i];
+		if (map->bounce_page) {
+			/* Copy kernel page to user page if it's in use */
+			if (map->orig_phys != INVALID_PHYS_ADDR)
+				memcpy_to_page(pages[i], 0,
+					       page_address(map->bounce_page),
+					       PAGE_SIZE);
+			__free_page(map->bounce_page);
+		}
+		map->bounce_page = pages[i];
+		get_page(pages[i]);
+	}
+	domain->user_bounce_pages = true;
+	ret = 0;
+out:
+	write_unlock(&domain->bounce_lock);
+
+	return ret;
+}
+
+void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain)
+{
+	struct vduse_bounce_map *map;
+	unsigned long i, count;
+
+	write_lock(&domain->bounce_lock);
+	if (!domain->user_bounce_pages)
+		goto out;
+
+	count = domain->bounce_size >> PAGE_SHIFT;
+	for (i = 0; i < count; i++) {
+		struct page *page = NULL;
+
+		map = &domain->bounce_maps[i];
+		if (WARN_ON(!map->bounce_page))
+			continue;
+
+		/* Copy user page to kernel page if it's in use */
+		if (map->orig_phys != INVALID_PHYS_ADDR) {
+			page = alloc_page(GFP_ATOMIC | __GFP_NOFAIL);
+			memcpy_from_page(page_address(page),
+					 map->bounce_page, 0, PAGE_SIZE);
+		}
+		put_page(map->bounce_page);
+		map->bounce_page = page;
+	}
+	domain->user_bounce_pages = false;
+out:
+	write_unlock(&domain->bounce_lock);
+}
+
 void vduse_domain_reset_bounce_map(struct vduse_iova_domain *domain)
 {
 	if (!domain->bounce_map)
@@ -318,13 +397,18 @@ dma_addr_t vduse_domain_map_page(struct vduse_iova_domain *domain,
 	if (vduse_domain_init_bounce_map(domain))
 		goto err;
 
+	read_lock(&domain->bounce_lock);
 	if (vduse_domain_map_bounce_page(domain, (u64)iova, (u64)size, pa))
-		goto err;
+		goto err_unlock;
 
 	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
 		vduse_domain_bounce(domain, iova, size, DMA_TO_DEVICE);
 
+	read_unlock(&domain->bounce_lock);
+
 	return iova;
+err_unlock:
+	read_unlock(&domain->bounce_lock);
 err:
 	vduse_domain_free_iova(iovad, iova, size);
 	return DMA_MAPPING_ERROR;
@@ -336,10 +420,12 @@ void vduse_domain_unmap_page(struct vduse_iova_domain *domain,
 {
 	struct iova_domain *iovad = &domain->stream_iovad;
 
+	read_lock(&domain->bounce_lock);
 	if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
 		vduse_domain_bounce(domain, dma_addr, size, DMA_FROM_DEVICE);
 
 	vduse_domain_unmap_bounce_page(domain, (u64)dma_addr, (u64)size);
+	read_unlock(&domain->bounce_lock);
 	vduse_domain_free_iova(iovad, dma_addr, size);
 }
 
@@ -404,6 +490,24 @@ void vduse_domain_free_coherent(struct vduse_iova_domain *domain, size_t size,
 	free_pages_exact(phys_to_virt(pa), size);
 }
 
+static void vduse_domain_mmap_open(struct vm_area_struct *vma)
+{
+	struct vduse_iova_domain *domain = vma->vm_private_data;
+
+	write_lock(&domain->bounce_lock);
+	domain->mapped++;
+	write_unlock(&domain->bounce_lock);
+}
+
+static void vduse_domain_mmap_close(struct vm_area_struct *vma)
+{
+	struct vduse_iova_domain *domain = vma->vm_private_data;
+
+	write_lock(&domain->bounce_lock);
+	domain->mapped--;
+	write_unlock(&domain->bounce_lock);
+}
+
 static vm_fault_t vduse_domain_mmap_fault(struct vm_fault *vmf)
 {
 	struct vduse_iova_domain *domain = vmf->vma->vm_private_data;
@@ -427,6 +531,8 @@ static vm_fault_t vduse_domain_mmap_fault(struct vm_fault *vmf)
 }
 
 static const struct vm_operations_struct vduse_domain_mmap_ops = {
+	.open =	vduse_domain_mmap_open,
+	.close = vduse_domain_mmap_close,
 	.fault = vduse_domain_mmap_fault,
 };
 
@@ -438,6 +544,10 @@ static int vduse_domain_mmap(struct file *file, struct vm_area_struct *vma)
 	vma->vm_private_data = domain;
 	vma->vm_ops = &vduse_domain_mmap_ops;
 
+	write_lock(&domain->bounce_lock);
+	domain->mapped++;
+	write_unlock(&domain->bounce_lock);
+
 	return 0;
 }
 
@@ -447,7 +557,8 @@ static int vduse_domain_release(struct inode *inode, struct file *file)
 
 	spin_lock(&domain->iotlb_lock);
 	vduse_iotlb_del_range(domain, 0, ULLONG_MAX);
-	vduse_domain_free_bounce_pages(domain);
+	vduse_domain_remove_user_bounce_pages(domain);
+	vduse_domain_free_kernel_bounce_pages(domain);
 	spin_unlock(&domain->iotlb_lock);
 	put_iova_domain(&domain->stream_iovad);
 	put_iova_domain(&domain->consistent_iovad);
@@ -507,6 +618,7 @@ vduse_domain_create(unsigned long iova_limit, size_t bounce_size)
 		goto err_file;
 
 	domain->file = file;
+	rwlock_init(&domain->bounce_lock);
 	spin_lock_init(&domain->iotlb_lock);
 	init_iova_domain(&domain->stream_iovad,
 			PAGE_SIZE, IOVA_START_PFN);
diff --git a/drivers/vdpa/vdpa_user/iova_domain.h b/drivers/vdpa/vdpa_user/iova_domain.h
index 2722d9b8e21a..4a47615346ac 100644
--- a/drivers/vdpa/vdpa_user/iova_domain.h
+++ b/drivers/vdpa/vdpa_user/iova_domain.h
@@ -14,6 +14,7 @@
 #include <linux/iova.h>
 #include <linux/dma-mapping.h>
 #include <linux/vhost_iotlb.h>
+#include <linux/rwlock.h>
 
 #define IOVA_START_PFN 1
 
@@ -34,6 +35,9 @@ struct vduse_iova_domain {
 	struct vhost_iotlb *iotlb;
 	spinlock_t iotlb_lock;
 	struct file *file;
+	int mapped;
+	bool user_bounce_pages;
+	rwlock_t bounce_lock;
 };
 
 int vduse_domain_set_map(struct vduse_iova_domain *domain,
@@ -61,6 +65,11 @@ void vduse_domain_free_coherent(struct vduse_iova_domain *domain, size_t size,
 
 void vduse_domain_reset_bounce_map(struct vduse_iova_domain *domain);
 
+int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain,
+				       struct page **pages, int count);
+
+void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain);
+
 void vduse_domain_destroy(struct vduse_iova_domain *domain);
 
 struct vduse_iova_domain *vduse_domain_create(unsigned long iova_limit,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 4/5] vduse: Support querying IOLTB information
  2022-07-06  5:04 [PATCH v2 0/5] VDUSE: Support registering userspace memory as bounce buffer Xie Yongji
                   ` (2 preceding siblings ...)
  2022-07-06  5:05 ` [PATCH v2 3/5] vduse: Support using userspace pages as bounce buffer Xie Yongji
@ 2022-07-06  5:05 ` Xie Yongji
  2022-07-14  2:51   ` Jason Wang
  2022-07-06  5:05 ` [PATCH v2 5/5] vduse: Support registering userspace memory for IOTLB Xie Yongji
  2022-07-06  9:30 ` [PATCH v2 0/5] VDUSE: Support registering userspace memory as bounce buffer Jason Wang
  5 siblings, 1 reply; 25+ messages in thread
From: Xie Yongji @ 2022-07-06  5:05 UTC (permalink / raw)
  To: mst, jasowang, xiaodong.liu, maxime.coquelin, stefanha
  Cc: songmuchun, virtualization, linux-kernel

This introduces a new ioctl: VDUSE_IOTLB_GET_INFO to
support querying IOLTB information such as bounce
buffer size.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 13 +++++++++++++
 include/uapi/linux/vduse.h         | 17 +++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 3bc27de58f46..c47a5d9765cf 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -1089,6 +1089,19 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
 		ret = vduse_dev_queue_irq_work(dev, &dev->vqs[index].inject);
 		break;
 	}
+	case VDUSE_IOTLB_GET_INFO: {
+		struct vduse_iotlb_info iotlb;
+
+		iotlb.bounce_iova = 0;
+		iotlb.bounce_size = dev->domain->bounce_size;
+
+		ret = -EFAULT;
+		if (copy_to_user(argp, &iotlb, sizeof(iotlb)))
+			break;
+
+		ret = 0;
+		break;
+	}
 	default:
 		ret = -ENOIOCTLCMD;
 		break;
diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
index 7cfe1c1280c0..c201b7a77c2c 100644
--- a/include/uapi/linux/vduse.h
+++ b/include/uapi/linux/vduse.h
@@ -210,6 +210,23 @@ struct vduse_vq_eventfd {
  */
 #define VDUSE_VQ_INJECT_IRQ	_IOW(VDUSE_BASE, 0x17, __u32)
 
+/**
+ * struct vduse_iotlb_info - IOTLB information
+ * @bounce_iova: start IOVA of bounce buffer
+ * @bounce_size: bounce buffer size
+ * @reserved: for future use, needs to be initialized to zero
+ *
+ * Structure used by VDUSE_IOTLB_GET_INFO ioctl to get IOTLB information.
+ */
+struct vduse_iotlb_info {
+	__u64 bounce_iova;
+	__u64 bounce_size;
+	__u64 reserved[2];
+};
+
+/* Get IOTLB information, e.g. bounce buffer size */
+#define VDUSE_IOTLB_GET_INFO    _IOR(VDUSE_BASE, 0x18, struct vduse_iotlb_info)
+
 /* The control messages definition for read(2)/write(2) on /dev/vduse/$NAME */
 
 /**
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 5/5] vduse: Support registering userspace memory for IOTLB
  2022-07-06  5:04 [PATCH v2 0/5] VDUSE: Support registering userspace memory as bounce buffer Xie Yongji
                   ` (3 preceding siblings ...)
  2022-07-06  5:05 ` [PATCH v2 4/5] vduse: Support querying IOLTB information Xie Yongji
@ 2022-07-06  5:05 ` Xie Yongji
  2022-07-06  6:08   ` kernel test robot
                     ` (2 more replies)
  2022-07-06  9:30 ` [PATCH v2 0/5] VDUSE: Support registering userspace memory as bounce buffer Jason Wang
  5 siblings, 3 replies; 25+ messages in thread
From: Xie Yongji @ 2022-07-06  5:05 UTC (permalink / raw)
  To: mst, jasowang, xiaodong.liu, maxime.coquelin, stefanha
  Cc: songmuchun, virtualization, linux-kernel

Introduce two ioctls: VDUSE_IOTLB_REG_UMEM and
VDUSE_IOTLB_DEREG_UMEM to support registering
and de-registering userspace memory for IOTLB
in virtio-vdpa case.

Now it only supports registering userspace memory
for IOTLB as bounce buffer.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 139 +++++++++++++++++++++++++++++
 include/uapi/linux/vduse.h         |  28 ++++++
 2 files changed, 167 insertions(+)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index c47a5d9765cf..4467ae2381bb 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -21,6 +21,7 @@
 #include <linux/uio.h>
 #include <linux/vdpa.h>
 #include <linux/nospec.h>
+#include <linux/sched/mm.h>
 #include <uapi/linux/vduse.h>
 #include <uapi/linux/vdpa.h>
 #include <uapi/linux/virtio_config.h>
@@ -64,6 +65,13 @@ struct vduse_vdpa {
 	struct vduse_dev *dev;
 };
 
+struct vduse_iotlb_mem {
+	unsigned long iova;
+	unsigned long npages;
+	struct page **pages;
+	struct mm_struct *mm;
+};
+
 struct vduse_dev {
 	struct vduse_vdpa *vdev;
 	struct device *dev;
@@ -95,6 +103,8 @@ struct vduse_dev {
 	u8 status;
 	u32 vq_num;
 	u32 vq_align;
+	struct vduse_iotlb_mem *iotlb_mem;
+	struct mutex mem_lock;
 };
 
 struct vduse_dev_msg {
@@ -917,6 +927,101 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
 	return ret;
 }
 
+static int vduse_dev_dereg_iotlb_mem(struct vduse_dev *dev,
+				     u64 iova, u64 size)
+{
+	int ret;
+
+	mutex_lock(&dev->mem_lock);
+	ret = -ENOENT;
+	if (!dev->iotlb_mem)
+		goto unlock;
+
+	ret = -EINVAL;
+	if (dev->iotlb_mem->iova != iova || size != dev->domain->bounce_size)
+		goto unlock;
+
+	vduse_domain_remove_user_bounce_pages(dev->domain);
+	unpin_user_pages_dirty_lock(dev->iotlb_mem->pages,
+				    dev->iotlb_mem->npages, true);
+	atomic64_sub(dev->iotlb_mem->npages, &dev->iotlb_mem->mm->pinned_vm);
+	mmdrop(dev->iotlb_mem->mm);
+	vfree(dev->iotlb_mem->pages);
+	kfree(dev->iotlb_mem);
+	dev->iotlb_mem = NULL;
+	ret = 0;
+unlock:
+	mutex_unlock(&dev->mem_lock);
+	return ret;
+}
+
+static int vduse_dev_reg_iotlb_mem(struct vduse_dev *dev,
+				   u64 iova, u64 uaddr, u64 size)
+{
+	struct page **page_list = NULL;
+	struct vduse_iotlb_mem *mem = NULL;
+	long pinned = 0;
+	unsigned long npages, lock_limit;
+	int ret;
+
+	if (size != dev->domain->bounce_size ||
+	    iova != 0 || uaddr & ~PAGE_MASK)
+		return -EINVAL;
+
+	mutex_lock(&dev->mem_lock);
+	ret = -EEXIST;
+	if (dev->iotlb_mem)
+		goto unlock;
+
+	ret = -ENOMEM;
+	npages = size >> PAGE_SHIFT;
+	page_list = __vmalloc(array_size(npages, sizeof(struct page *)),
+			      GFP_KERNEL_ACCOUNT);
+	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
+	if (!page_list || !mem)
+		goto unlock;
+
+	mmap_read_lock(current->mm);
+
+	lock_limit = PFN_DOWN(rlimit(RLIMIT_MEMLOCK));
+	if (npages + atomic64_read(&current->mm->pinned_vm) > lock_limit)
+		goto out;
+
+	pinned = pin_user_pages(uaddr, npages, FOLL_LONGTERM | FOLL_WRITE,
+				page_list, NULL);
+	if (pinned != npages) {
+		ret = pinned < 0 ? pinned : -ENOMEM;
+		goto out;
+	}
+
+	ret = vduse_domain_add_user_bounce_pages(dev->domain,
+						 page_list, pinned);
+	if (ret)
+		goto out;
+
+	atomic64_add(npages, &current->mm->pinned_vm);
+
+	mem->pages = page_list;
+	mem->npages = pinned;
+	mem->iova = iova;
+	mem->mm = current->mm;
+	mmgrab(current->mm);
+
+	dev->iotlb_mem = mem;
+out:
+	if (ret && pinned > 0)
+		unpin_user_pages(page_list, pinned);
+
+	mmap_read_unlock(current->mm);
+unlock:
+	if (ret) {
+		vfree(page_list);
+		kfree(mem);
+	}
+	mutex_unlock(&dev->mem_lock);
+	return ret;
+}
+
 static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
 			    unsigned long arg)
 {
@@ -943,6 +1048,16 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
 		if (entry.start > entry.last)
 			break;
 
+		if (domain->bounce_map && dev->iotlb_mem) {
+			ret = -EEXIST;
+			if (entry.start >= 0 &&
+			    entry.last < domain->bounce_size)
+				break;
+
+			if (entry.start < domain->bounce_size)
+				entry.start = domain->bounce_size;
+		}
+
 		spin_lock(&domain->iotlb_lock);
 		map = vhost_iotlb_itree_first(domain->iotlb,
 					      entry.start, entry.last);
@@ -1102,6 +1217,28 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
 		ret = 0;
 		break;
 	}
+	case VDUSE_IOTLB_REG_UMEM: {
+		struct vduse_iotlb_umem umem;
+
+		ret = -EFAULT;
+		if (copy_from_user(&umem, argp, sizeof(umem)))
+			break;
+
+		ret = vduse_dev_reg_iotlb_mem(dev, umem.iova,
+					      umem.uaddr, umem.size);
+		break;
+	}
+	case VDUSE_IOTLB_DEREG_UMEM: {
+		struct vduse_iotlb_umem umem;
+
+		ret = -EFAULT;
+		if (copy_from_user(&umem, argp, sizeof(umem)))
+			break;
+
+		ret = vduse_dev_dereg_iotlb_mem(dev, umem.iova,
+						umem.size);
+		break;
+	}
 	default:
 		ret = -ENOIOCTLCMD;
 		break;
@@ -1114,6 +1251,7 @@ static int vduse_dev_release(struct inode *inode, struct file *file)
 {
 	struct vduse_dev *dev = file->private_data;
 
+	vduse_dev_dereg_iotlb_mem(dev, 0, dev->domain->bounce_size);
 	spin_lock(&dev->msg_lock);
 	/* Make sure the inflight messages can processed after reconncection */
 	list_splice_init(&dev->recv_list, &dev->send_list);
@@ -1176,6 +1314,7 @@ static struct vduse_dev *vduse_dev_create(void)
 		return NULL;
 
 	mutex_init(&dev->lock);
+	mutex_init(&dev->mem_lock);
 	spin_lock_init(&dev->msg_lock);
 	INIT_LIST_HEAD(&dev->send_list);
 	INIT_LIST_HEAD(&dev->recv_list);
diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
index c201b7a77c2c..1b17391e228f 100644
--- a/include/uapi/linux/vduse.h
+++ b/include/uapi/linux/vduse.h
@@ -227,6 +227,34 @@ struct vduse_iotlb_info {
 /* Get IOTLB information, e.g. bounce buffer size */
 #define VDUSE_IOTLB_GET_INFO    _IOR(VDUSE_BASE, 0x18, struct vduse_iotlb_info)
 
+/**
+ * struct vduse_iotlb_umem - userspace memory configuration
+ * @uaddr: start address of userspace memory, it must be aligned to page size
+ * @iova: IOVA of userspace memory, it must be equal to bounce iova returned
+ *        by VDUSE_IOTLB_GET_INFO now
+ * @size: size of userspace memory, it must be equal to bounce size returned
+ *        by VDUSE_IOTLB_GET_INFO now
+ * @reserved: for future use, needs to be initialized to zero
+ *
+ * Structure used by VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM
+ * ioctls to register/de-register userspace memory for IOTLB.
+ */
+struct vduse_iotlb_umem {
+	__u64 uaddr;
+	__u64 iova;
+	__u64 size;
+	__u64 reserved[3];
+};
+
+/*
+ * Register userspace memory for IOTLB. Now we only support registering
+ * userspace memory as bounce buffer.
+ */
+#define VDUSE_IOTLB_REG_UMEM	_IOW(VDUSE_BASE, 0x19, struct vduse_iotlb_umem)
+
+/* De-register the userspace memory. Caller should set iova and size field. */
+#define VDUSE_IOTLB_DEREG_UMEM	_IOW(VDUSE_BASE, 0x1a, struct vduse_iotlb_umem)
+
 /* The control messages definition for read(2)/write(2) on /dev/vduse/$NAME */
 
 /**
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/5] vduse: Support registering userspace memory for IOTLB
  2022-07-06  5:05 ` [PATCH v2 5/5] vduse: Support registering userspace memory for IOTLB Xie Yongji
@ 2022-07-06  6:08   ` kernel test robot
  2022-07-07 23:01   ` kernel test robot
  2022-07-08  1:14   ` kernel test robot
  2 siblings, 0 replies; 25+ messages in thread
From: kernel test robot @ 2022-07-06  6:08 UTC (permalink / raw)
  To: Xie Yongji, mst, jasowang, xiaodong.liu, maxime.coquelin, stefanha
  Cc: kbuild-all, songmuchun, virtualization, linux-kernel

Hi Xie,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.19-rc5 next-20220705]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Xie-Yongji/VDUSE-Support-registering-userspace-memory-as-bounce-buffer/20220706-130802
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git e35e5b6f695d241ffb1d223207da58a1fbcdff4b
config: alpha-allyesconfig (https://download.01.org/0day-ci/archive/20220706/202207061418.mTlO0yqK-lkp@intel.com/config)
compiler: alpha-linux-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/9be699264e4fede9c3be913b2d1003c260d9fa05
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Xie-Yongji/VDUSE-Support-registering-userspace-memory-as-bounce-buffer/20220706-130802
        git checkout 9be699264e4fede9c3be913b2d1003c260d9fa05
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=alpha SHELL=/bin/bash drivers/vdpa/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/vdpa/vdpa_user/vduse_dev.c: In function 'vduse_dev_dereg_iotlb_mem':
   drivers/vdpa/vdpa_user/vduse_dev.c:949:9: error: implicit declaration of function 'vfree'; did you mean 'kvfree'? [-Werror=implicit-function-declaration]
     949 |         vfree(dev->iotlb_mem->pages);
         |         ^~~~~
         |         kvfree
   drivers/vdpa/vdpa_user/vduse_dev.c: In function 'vduse_dev_reg_iotlb_mem':
   drivers/vdpa/vdpa_user/vduse_dev.c:978:21: error: implicit declaration of function '__vmalloc'; did you mean '__kmalloc'? [-Werror=implicit-function-declaration]
     978 |         page_list = __vmalloc(array_size(npages, sizeof(struct page *)),
         |                     ^~~~~~~~~
         |                     __kmalloc
>> drivers/vdpa/vdpa_user/vduse_dev.c:978:19: warning: assignment to 'struct page **' from 'int' makes pointer from integer without a cast [-Wint-conversion]
     978 |         page_list = __vmalloc(array_size(npages, sizeof(struct page *)),
         |                   ^
   cc1: some warnings being treated as errors


vim +978 drivers/vdpa/vdpa_user/vduse_dev.c

   957	
   958	static int vduse_dev_reg_iotlb_mem(struct vduse_dev *dev,
   959					   u64 iova, u64 uaddr, u64 size)
   960	{
   961		struct page **page_list = NULL;
   962		struct vduse_iotlb_mem *mem = NULL;
   963		long pinned = 0;
   964		unsigned long npages, lock_limit;
   965		int ret;
   966	
   967		if (size != dev->domain->bounce_size ||
   968		    iova != 0 || uaddr & ~PAGE_MASK)
   969			return -EINVAL;
   970	
   971		mutex_lock(&dev->mem_lock);
   972		ret = -EEXIST;
   973		if (dev->iotlb_mem)
   974			goto unlock;
   975	
   976		ret = -ENOMEM;
   977		npages = size >> PAGE_SHIFT;
 > 978		page_list = __vmalloc(array_size(npages, sizeof(struct page *)),
   979				      GFP_KERNEL_ACCOUNT);
   980		mem = kzalloc(sizeof(*mem), GFP_KERNEL);
   981		if (!page_list || !mem)
   982			goto unlock;
   983	
   984		mmap_read_lock(current->mm);
   985	
   986		lock_limit = PFN_DOWN(rlimit(RLIMIT_MEMLOCK));
   987		if (npages + atomic64_read(&current->mm->pinned_vm) > lock_limit)
   988			goto out;
   989	
   990		pinned = pin_user_pages(uaddr, npages, FOLL_LONGTERM | FOLL_WRITE,
   991					page_list, NULL);
   992		if (pinned != npages) {
   993			ret = pinned < 0 ? pinned : -ENOMEM;
   994			goto out;
   995		}
   996	
   997		ret = vduse_domain_add_user_bounce_pages(dev->domain,
   998							 page_list, pinned);
   999		if (ret)
  1000			goto out;
  1001	
  1002		atomic64_add(npages, &current->mm->pinned_vm);
  1003	
  1004		mem->pages = page_list;
  1005		mem->npages = pinned;
  1006		mem->iova = iova;
  1007		mem->mm = current->mm;
  1008		mmgrab(current->mm);
  1009	
  1010		dev->iotlb_mem = mem;
  1011	out:
  1012		if (ret && pinned > 0)
  1013			unpin_user_pages(page_list, pinned);
  1014	
  1015		mmap_read_unlock(current->mm);
  1016	unlock:
  1017		if (ret) {
  1018			vfree(page_list);
  1019			kfree(mem);
  1020		}
  1021		mutex_unlock(&dev->mem_lock);
  1022		return ret;
  1023	}
  1024	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 0/5] VDUSE: Support registering userspace memory as bounce buffer
  2022-07-06  5:04 [PATCH v2 0/5] VDUSE: Support registering userspace memory as bounce buffer Xie Yongji
                   ` (4 preceding siblings ...)
  2022-07-06  5:05 ` [PATCH v2 5/5] vduse: Support registering userspace memory for IOTLB Xie Yongji
@ 2022-07-06  9:30 ` Jason Wang
  2022-07-06 10:15   ` Yongji Xie
  5 siblings, 1 reply; 25+ messages in thread
From: Jason Wang @ 2022-07-06  9:30 UTC (permalink / raw)
  To: Xie Yongji
  Cc: mst, Liu Xiaodong, Maxime Coquelin, Stefan Hajnoczi, songmuchun,
	virtualization, linux-kernel

On Wed, Jul 6, 2022 at 1:05 PM Xie Yongji <xieyongji@bytedance.com> wrote:
>
> Hi all,
>
> This series introduces some new ioctls: VDUSE_IOTLB_GET_INFO,
> VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM to support
> registering and de-registering userspace memory for IOTLB
> as bounce buffer in virtio-vdpa case.
>
> The VDUSE_IOTLB_GET_INFO ioctl can help user to query IOLTB
> information such as bounce buffer size. Then user can use
> those information on VDUSE_IOTLB_REG_UMEM and
> VDUSE_IOTLB_DEREG_UMEM ioctls to register and de-register
> userspace memory for IOTLB.
>
> During registering and de-registering, the DMA data in use
> would be copied from kernel bounce pages to userspace bounce
> pages and back.
>
> With this feature, some existing application such as SPDK
> and DPDK can leverage the datapath of VDUSE directly and
> efficiently as discussed before [1][2]. They can register
> some preallocated hugepages to VDUSE to avoid an extra
> memcpy from bounce-buffer to hugepages.

This is really interesting.

But a small concern on uAPI is that this seems to expose the VDUSE
internal implementation (bounce buffer) to userspace. We tried hard to
hide it via the GET_FD before. Anyway can we keep it?

Thanks

>
> The kernel and userspace codes could be found in github:
>
> https://github.com/bytedance/linux/tree/vduse-umem
> https://github.com/bytedance/qemu/tree/vduse-umem
>
> To test it with qemu-storage-daemon:
>
> $ qemu-storage-daemon \
>     --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server=on,wait=off \
>     --monitor chardev=charmonitor \
>     --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
>     --export type=vduse-blk,id=vduse-test,name=vduse-test,node-name=disk0,writable=on
>
> [1] https://lkml.org/lkml/2021/6/27/318
> [2] https://lkml.org/lkml/2022/7/4/246
>
> Please review, thanks!
>
> V1 to V2:
> - Drop the patch that updating API version [MST]
> - Replace unpin_user_pages() with unpin_user_pages_dirty_lock() [MST]
> - Use __vmalloc(__GFP_ACCOUNT) for memory accounting [MST]
>
> Xie Yongji (5):
>   vduse: Remove unnecessary spin lock protection
>   vduse: Use memcpy_{to,from}_page() in do_bounce()
>   vduse: Support using userspace pages as bounce buffer
>   vduse: Support querying IOLTB information
>   vduse: Support registering userspace memory for IOTLB
>
>  drivers/vdpa/vdpa_user/iova_domain.c | 134 ++++++++++++++++++++---
>  drivers/vdpa/vdpa_user/iova_domain.h |   9 ++
>  drivers/vdpa/vdpa_user/vduse_dev.c   | 152 +++++++++++++++++++++++++++
>  include/uapi/linux/vduse.h           |  45 ++++++++
>  4 files changed, 327 insertions(+), 13 deletions(-)
>
> --
> 2.20.1
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 0/5] VDUSE: Support registering userspace memory as bounce buffer
  2022-07-06  9:30 ` [PATCH v2 0/5] VDUSE: Support registering userspace memory as bounce buffer Jason Wang
@ 2022-07-06 10:15   ` Yongji Xie
  2022-07-08  8:38     ` Jason Wang
  0 siblings, 1 reply; 25+ messages in thread
From: Yongji Xie @ 2022-07-06 10:15 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, Liu Xiaodong, Maxime Coquelin, Stefan Hajnoczi, songmuchun,
	virtualization, linux-kernel

On Wed, Jul 6, 2022 at 5:30 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Jul 6, 2022 at 1:05 PM Xie Yongji <xieyongji@bytedance.com> wrote:
> >
> > Hi all,
> >
> > This series introduces some new ioctls: VDUSE_IOTLB_GET_INFO,
> > VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM to support
> > registering and de-registering userspace memory for IOTLB
> > as bounce buffer in virtio-vdpa case.
> >
> > The VDUSE_IOTLB_GET_INFO ioctl can help user to query IOLTB
> > information such as bounce buffer size. Then user can use
> > those information on VDUSE_IOTLB_REG_UMEM and
> > VDUSE_IOTLB_DEREG_UMEM ioctls to register and de-register
> > userspace memory for IOTLB.
> >
> > During registering and de-registering, the DMA data in use
> > would be copied from kernel bounce pages to userspace bounce
> > pages and back.
> >
> > With this feature, some existing application such as SPDK
> > and DPDK can leverage the datapath of VDUSE directly and
> > efficiently as discussed before [1][2]. They can register
> > some preallocated hugepages to VDUSE to avoid an extra
> > memcpy from bounce-buffer to hugepages.
>
> This is really interesting.
>
> But a small concern on uAPI is that this seems to expose the VDUSE
> internal implementation (bounce buffer) to userspace. We tried hard to
> hide it via the GET_FD before. Anyway can we keep it?
>

Another way is changing GET_FD ioctl to add a flag or reuse 'perm'
field to indicate whether a IOVA region supports userspace memory
registration. Then userspace can use
VDUSE_IOTLB_REG_UMEM/VDUSE_IOTLB_DEREG_UMEM to register/deregister
userspace memory for this IOVA region. Any suggestions?

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/5] vduse: Support registering userspace memory for IOTLB
  2022-07-06  5:05 ` [PATCH v2 5/5] vduse: Support registering userspace memory for IOTLB Xie Yongji
  2022-07-06  6:08   ` kernel test robot
@ 2022-07-07 23:01   ` kernel test robot
  2022-07-08  1:14   ` kernel test robot
  2 siblings, 0 replies; 25+ messages in thread
From: kernel test robot @ 2022-07-07 23:01 UTC (permalink / raw)
  To: Xie Yongji, mst, jasowang, xiaodong.liu, maxime.coquelin, stefanha
  Cc: llvm, kbuild-all, songmuchun, virtualization, linux-kernel

Hi Xie,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.19-rc5 next-20220707]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Xie-Yongji/VDUSE-Support-registering-userspace-memory-as-bounce-buffer/20220706-130802
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git e35e5b6f695d241ffb1d223207da58a1fbcdff4b
config: hexagon-randconfig-r005-20220707 (https://download.01.org/0day-ci/archive/20220708/202207080640.p8q3NqHB-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 66ae1d60bb278793fd651cece264699d522bab84)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/9be699264e4fede9c3be913b2d1003c260d9fa05
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Xie-Yongji/VDUSE-Support-registering-userspace-memory-as-bounce-buffer/20220706-130802
        git checkout 9be699264e4fede9c3be913b2d1003c260d9fa05
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash drivers/vdpa/vdpa_user/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

>> drivers/vdpa/vdpa_user/vduse_dev.c:949:2: error: call to undeclared function 'vfree'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           vfree(dev->iotlb_mem->pages);
           ^
>> drivers/vdpa/vdpa_user/vduse_dev.c:978:14: error: call to undeclared function '__vmalloc'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           page_list = __vmalloc(array_size(npages, sizeof(struct page *)),
                       ^
   drivers/vdpa/vdpa_user/vduse_dev.c:978:14: note: did you mean '__kmalloc'?
   include/linux/slab.h:434:7: note: '__kmalloc' declared here
   void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment __alloc_size(1);
         ^
>> drivers/vdpa/vdpa_user/vduse_dev.c:978:12: warning: incompatible integer to pointer conversion assigning to 'struct page **' from 'int' [-Wint-conversion]
           page_list = __vmalloc(array_size(npages, sizeof(struct page *)),
                     ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/vdpa/vdpa_user/vduse_dev.c:1018:3: error: call to undeclared function 'vfree'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
                   vfree(page_list);
                   ^
   drivers/vdpa/vdpa_user/vduse_dev.c:1654:51: warning: shift count >= width of type [-Wshift-count-overflow]
           ret = dma_set_mask_and_coherent(&vdev->vdpa.dev, DMA_BIT_MASK(64));
                                                            ^~~~~~~~~~~~~~~~
   include/linux/dma-mapping.h:76:54: note: expanded from macro 'DMA_BIT_MASK'
   #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
                                                        ^ ~~~
   2 warnings and 3 errors generated.


vim +/vfree +949 drivers/vdpa/vdpa_user/vduse_dev.c

   929	
   930	static int vduse_dev_dereg_iotlb_mem(struct vduse_dev *dev,
   931					     u64 iova, u64 size)
   932	{
   933		int ret;
   934	
   935		mutex_lock(&dev->mem_lock);
   936		ret = -ENOENT;
   937		if (!dev->iotlb_mem)
   938			goto unlock;
   939	
   940		ret = -EINVAL;
   941		if (dev->iotlb_mem->iova != iova || size != dev->domain->bounce_size)
   942			goto unlock;
   943	
   944		vduse_domain_remove_user_bounce_pages(dev->domain);
   945		unpin_user_pages_dirty_lock(dev->iotlb_mem->pages,
   946					    dev->iotlb_mem->npages, true);
   947		atomic64_sub(dev->iotlb_mem->npages, &dev->iotlb_mem->mm->pinned_vm);
   948		mmdrop(dev->iotlb_mem->mm);
 > 949		vfree(dev->iotlb_mem->pages);
   950		kfree(dev->iotlb_mem);
   951		dev->iotlb_mem = NULL;
   952		ret = 0;
   953	unlock:
   954		mutex_unlock(&dev->mem_lock);
   955		return ret;
   956	}
   957	
   958	static int vduse_dev_reg_iotlb_mem(struct vduse_dev *dev,
   959					   u64 iova, u64 uaddr, u64 size)
   960	{
   961		struct page **page_list = NULL;
   962		struct vduse_iotlb_mem *mem = NULL;
   963		long pinned = 0;
   964		unsigned long npages, lock_limit;
   965		int ret;
   966	
   967		if (size != dev->domain->bounce_size ||
   968		    iova != 0 || uaddr & ~PAGE_MASK)
   969			return -EINVAL;
   970	
   971		mutex_lock(&dev->mem_lock);
   972		ret = -EEXIST;
   973		if (dev->iotlb_mem)
   974			goto unlock;
   975	
   976		ret = -ENOMEM;
   977		npages = size >> PAGE_SHIFT;
 > 978		page_list = __vmalloc(array_size(npages, sizeof(struct page *)),
   979				      GFP_KERNEL_ACCOUNT);
   980		mem = kzalloc(sizeof(*mem), GFP_KERNEL);
   981		if (!page_list || !mem)
   982			goto unlock;
   983	
   984		mmap_read_lock(current->mm);
   985	
   986		lock_limit = PFN_DOWN(rlimit(RLIMIT_MEMLOCK));
   987		if (npages + atomic64_read(&current->mm->pinned_vm) > lock_limit)
   988			goto out;
   989	
   990		pinned = pin_user_pages(uaddr, npages, FOLL_LONGTERM | FOLL_WRITE,
   991					page_list, NULL);
   992		if (pinned != npages) {
   993			ret = pinned < 0 ? pinned : -ENOMEM;
   994			goto out;
   995		}
   996	
   997		ret = vduse_domain_add_user_bounce_pages(dev->domain,
   998							 page_list, pinned);
   999		if (ret)
  1000			goto out;
  1001	
  1002		atomic64_add(npages, &current->mm->pinned_vm);
  1003	
  1004		mem->pages = page_list;
  1005		mem->npages = pinned;
  1006		mem->iova = iova;
  1007		mem->mm = current->mm;
  1008		mmgrab(current->mm);
  1009	
  1010		dev->iotlb_mem = mem;
  1011	out:
  1012		if (ret && pinned > 0)
  1013			unpin_user_pages(page_list, pinned);
  1014	
  1015		mmap_read_unlock(current->mm);
  1016	unlock:
  1017		if (ret) {
  1018			vfree(page_list);
  1019			kfree(mem);
  1020		}
  1021		mutex_unlock(&dev->mem_lock);
  1022		return ret;
  1023	}
  1024	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/5] vduse: Support registering userspace memory for IOTLB
  2022-07-06  5:05 ` [PATCH v2 5/5] vduse: Support registering userspace memory for IOTLB Xie Yongji
  2022-07-06  6:08   ` kernel test robot
  2022-07-07 23:01   ` kernel test robot
@ 2022-07-08  1:14   ` kernel test robot
  2 siblings, 0 replies; 25+ messages in thread
From: kernel test robot @ 2022-07-08  1:14 UTC (permalink / raw)
  To: Xie Yongji, mst, jasowang, xiaodong.liu, maxime.coquelin, stefanha
  Cc: kbuild-all, songmuchun, virtualization, linux-kernel

Hi Xie,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.19-rc5 next-20220707]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Xie-Yongji/VDUSE-Support-registering-userspace-memory-as-bounce-buffer/20220706-130802
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git e35e5b6f695d241ffb1d223207da58a1fbcdff4b
config: parisc-randconfig-r003-20220707 (https://download.01.org/0day-ci/archive/20220708/202207080910.VfMFrTtN-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/9be699264e4fede9c3be913b2d1003c260d9fa05
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Xie-Yongji/VDUSE-Support-registering-userspace-memory-as-bounce-buffer/20220706-130802
        git checkout 9be699264e4fede9c3be913b2d1003c260d9fa05
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=parisc SHELL=/bin/bash drivers/vdpa/vdpa_user/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   drivers/vdpa/vdpa_user/vduse_dev.c: In function 'vduse_dev_dereg_iotlb_mem':
>> drivers/vdpa/vdpa_user/vduse_dev.c:949:9: error: implicit declaration of function 'vfree'; did you mean 'kvfree'? [-Werror=implicit-function-declaration]
     949 |         vfree(dev->iotlb_mem->pages);
         |         ^~~~~
         |         kvfree
   drivers/vdpa/vdpa_user/vduse_dev.c: In function 'vduse_dev_reg_iotlb_mem':
>> drivers/vdpa/vdpa_user/vduse_dev.c:978:21: error: implicit declaration of function '__vmalloc'; did you mean '__kmalloc'? [-Werror=implicit-function-declaration]
     978 |         page_list = __vmalloc(array_size(npages, sizeof(struct page *)),
         |                     ^~~~~~~~~
         |                     __kmalloc
>> drivers/vdpa/vdpa_user/vduse_dev.c:978:19: warning: assignment to 'struct page **' from 'int' makes pointer from integer without a cast [-Wint-conversion]
     978 |         page_list = __vmalloc(array_size(npages, sizeof(struct page *)),
         |                   ^
   cc1: some warnings being treated as errors

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for DRM_DP_AUX_BUS
   Depends on HAS_IOMEM && DRM && OF
   Selected by
   - DRM_MSM && HAS_IOMEM && DRM && (ARCH_QCOM || SOC_IMX5 || COMPILE_TEST && COMMON_CLK && IOMMU_SUPPORT && (QCOM_OCMEM || QCOM_OCMEM && (QCOM_LLCC || QCOM_LLCC && (QCOM_COMMAND_DB || QCOM_COMMAND_DB


vim +949 drivers/vdpa/vdpa_user/vduse_dev.c

   929	
   930	static int vduse_dev_dereg_iotlb_mem(struct vduse_dev *dev,
   931					     u64 iova, u64 size)
   932	{
   933		int ret;
   934	
   935		mutex_lock(&dev->mem_lock);
   936		ret = -ENOENT;
   937		if (!dev->iotlb_mem)
   938			goto unlock;
   939	
   940		ret = -EINVAL;
   941		if (dev->iotlb_mem->iova != iova || size != dev->domain->bounce_size)
   942			goto unlock;
   943	
   944		vduse_domain_remove_user_bounce_pages(dev->domain);
   945		unpin_user_pages_dirty_lock(dev->iotlb_mem->pages,
   946					    dev->iotlb_mem->npages, true);
   947		atomic64_sub(dev->iotlb_mem->npages, &dev->iotlb_mem->mm->pinned_vm);
   948		mmdrop(dev->iotlb_mem->mm);
 > 949		vfree(dev->iotlb_mem->pages);
   950		kfree(dev->iotlb_mem);
   951		dev->iotlb_mem = NULL;
   952		ret = 0;
   953	unlock:
   954		mutex_unlock(&dev->mem_lock);
   955		return ret;
   956	}
   957	
   958	static int vduse_dev_reg_iotlb_mem(struct vduse_dev *dev,
   959					   u64 iova, u64 uaddr, u64 size)
   960	{
   961		struct page **page_list = NULL;
   962		struct vduse_iotlb_mem *mem = NULL;
   963		long pinned = 0;
   964		unsigned long npages, lock_limit;
   965		int ret;
   966	
   967		if (size != dev->domain->bounce_size ||
   968		    iova != 0 || uaddr & ~PAGE_MASK)
   969			return -EINVAL;
   970	
   971		mutex_lock(&dev->mem_lock);
   972		ret = -EEXIST;
   973		if (dev->iotlb_mem)
   974			goto unlock;
   975	
   976		ret = -ENOMEM;
   977		npages = size >> PAGE_SHIFT;
 > 978		page_list = __vmalloc(array_size(npages, sizeof(struct page *)),
   979				      GFP_KERNEL_ACCOUNT);
   980		mem = kzalloc(sizeof(*mem), GFP_KERNEL);
   981		if (!page_list || !mem)
   982			goto unlock;
   983	
   984		mmap_read_lock(current->mm);
   985	
   986		lock_limit = PFN_DOWN(rlimit(RLIMIT_MEMLOCK));
   987		if (npages + atomic64_read(&current->mm->pinned_vm) > lock_limit)
   988			goto out;
   989	
   990		pinned = pin_user_pages(uaddr, npages, FOLL_LONGTERM | FOLL_WRITE,
   991					page_list, NULL);
   992		if (pinned != npages) {
   993			ret = pinned < 0 ? pinned : -ENOMEM;
   994			goto out;
   995		}
   996	
   997		ret = vduse_domain_add_user_bounce_pages(dev->domain,
   998							 page_list, pinned);
   999		if (ret)
  1000			goto out;
  1001	
  1002		atomic64_add(npages, &current->mm->pinned_vm);
  1003	
  1004		mem->pages = page_list;
  1005		mem->npages = pinned;
  1006		mem->iova = iova;
  1007		mem->mm = current->mm;
  1008		mmgrab(current->mm);
  1009	
  1010		dev->iotlb_mem = mem;
  1011	out:
  1012		if (ret && pinned > 0)
  1013			unpin_user_pages(page_list, pinned);
  1014	
  1015		mmap_read_unlock(current->mm);
  1016	unlock:
  1017		if (ret) {
  1018			vfree(page_list);
  1019			kfree(mem);
  1020		}
  1021		mutex_unlock(&dev->mem_lock);
  1022		return ret;
  1023	}
  1024	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 0/5] VDUSE: Support registering userspace memory as bounce buffer
  2022-07-06 10:15   ` Yongji Xie
@ 2022-07-08  8:38     ` Jason Wang
  2022-07-08  9:53       ` Yongji Xie
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Wang @ 2022-07-08  8:38 UTC (permalink / raw)
  To: Yongji Xie
  Cc: mst, Liu Xiaodong, Maxime Coquelin, Stefan Hajnoczi, songmuchun,
	virtualization, linux-kernel

On Wed, Jul 6, 2022 at 6:16 PM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Wed, Jul 6, 2022 at 5:30 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Jul 6, 2022 at 1:05 PM Xie Yongji <xieyongji@bytedance.com> wrote:
> > >
> > > Hi all,
> > >
> > > This series introduces some new ioctls: VDUSE_IOTLB_GET_INFO,
> > > VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM to support
> > > registering and de-registering userspace memory for IOTLB
> > > as bounce buffer in virtio-vdpa case.
> > >
> > > The VDUSE_IOTLB_GET_INFO ioctl can help user to query IOLTB
> > > information such as bounce buffer size. Then user can use
> > > those information on VDUSE_IOTLB_REG_UMEM and
> > > VDUSE_IOTLB_DEREG_UMEM ioctls to register and de-register
> > > userspace memory for IOTLB.
> > >
> > > During registering and de-registering, the DMA data in use
> > > would be copied from kernel bounce pages to userspace bounce
> > > pages and back.
> > >
> > > With this feature, some existing application such as SPDK
> > > and DPDK can leverage the datapath of VDUSE directly and
> > > efficiently as discussed before [1][2]. They can register
> > > some preallocated hugepages to VDUSE to avoid an extra
> > > memcpy from bounce-buffer to hugepages.
> >
> > This is really interesting.
> >
> > But a small concern on uAPI is that this seems to expose the VDUSE
> > internal implementation (bounce buffer) to userspace. We tried hard to
> > hide it via the GET_FD before. Anyway can we keep it?
> >
>
> Another way is changing GET_FD ioctl to add a flag or reuse 'perm'
> field to indicate whether a IOVA region supports userspace memory
> registration. Then userspace can use
> VDUSE_IOTLB_REG_UMEM/VDUSE_IOTLB_DEREG_UMEM to register/deregister
> userspace memory for this IOVA region.

Looks better.

> Any suggestions?

I wonder what's the value of keeping the compatibility with the kernel
mmaped bounce buffer. It means we need to take extra care on e.g data
copying when reg/reg user space memory.

Can we simply allow the third kind of fd that only works for umem registration?

Thanks

>
> Thanks,
> Yongji
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 0/5] VDUSE: Support registering userspace memory as bounce buffer
  2022-07-08  8:38     ` Jason Wang
@ 2022-07-08  9:53       ` Yongji Xie
  2022-07-11  6:02         ` Jason Wang
  0 siblings, 1 reply; 25+ messages in thread
From: Yongji Xie @ 2022-07-08  9:53 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, Liu Xiaodong, Maxime Coquelin, Stefan Hajnoczi, songmuchun,
	virtualization, linux-kernel

On Fri, Jul 8, 2022 at 4:38 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Jul 6, 2022 at 6:16 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> >
> > On Wed, Jul 6, 2022 at 5:30 PM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Wed, Jul 6, 2022 at 1:05 PM Xie Yongji <xieyongji@bytedance.com> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > This series introduces some new ioctls: VDUSE_IOTLB_GET_INFO,
> > > > VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM to support
> > > > registering and de-registering userspace memory for IOTLB
> > > > as bounce buffer in virtio-vdpa case.
> > > >
> > > > The VDUSE_IOTLB_GET_INFO ioctl can help user to query IOLTB
> > > > information such as bounce buffer size. Then user can use
> > > > those information on VDUSE_IOTLB_REG_UMEM and
> > > > VDUSE_IOTLB_DEREG_UMEM ioctls to register and de-register
> > > > userspace memory for IOTLB.
> > > >
> > > > During registering and de-registering, the DMA data in use
> > > > would be copied from kernel bounce pages to userspace bounce
> > > > pages and back.
> > > >
> > > > With this feature, some existing application such as SPDK
> > > > and DPDK can leverage the datapath of VDUSE directly and
> > > > efficiently as discussed before [1][2]. They can register
> > > > some preallocated hugepages to VDUSE to avoid an extra
> > > > memcpy from bounce-buffer to hugepages.
> > >
> > > This is really interesting.
> > >
> > > But a small concern on uAPI is that this seems to expose the VDUSE
> > > internal implementation (bounce buffer) to userspace. We tried hard to
> > > hide it via the GET_FD before. Anyway can we keep it?
> > >
> >
> > Another way is changing GET_FD ioctl to add a flag or reuse 'perm'
> > field to indicate whether a IOVA region supports userspace memory
> > registration. Then userspace can use
> > VDUSE_IOTLB_REG_UMEM/VDUSE_IOTLB_DEREG_UMEM to register/deregister
> > userspace memory for this IOVA region.
>
> Looks better.
>

OK.

> > Any suggestions?
>
> I wonder what's the value of keeping the compatibility with the kernel
> mmaped bounce buffer. It means we need to take extra care on e.g data
> copying when reg/reg user space memory.
>

I'm not sure I get your point on the compatibility with the kernel
bounce buffer. Do you mean they use the same iova region?

The userspace daemon might crash or reboot. In those cases, we still
need a kernel buffer to store/recover the data.

> Can we simply allow the third kind of fd that only works for umem registration?
>

Do you mean using another iova region for umem? I think we don't need
a fd in umem case since the userspace daemon can access the memory
directly without using mmap() to map it into the address space in
advance.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 0/5] VDUSE: Support registering userspace memory as bounce buffer
  2022-07-08  9:53       ` Yongji Xie
@ 2022-07-11  6:02         ` Jason Wang
  2022-07-11  7:24           ` Yongji Xie
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Wang @ 2022-07-11  6:02 UTC (permalink / raw)
  To: Yongji Xie
  Cc: mst, Liu Xiaodong, Maxime Coquelin, Stefan Hajnoczi, songmuchun,
	virtualization, linux-kernel

On Fri, Jul 8, 2022 at 5:53 PM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Fri, Jul 8, 2022 at 4:38 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Jul 6, 2022 at 6:16 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> > >
> > > On Wed, Jul 6, 2022 at 5:30 PM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Wed, Jul 6, 2022 at 1:05 PM Xie Yongji <xieyongji@bytedance.com> wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > This series introduces some new ioctls: VDUSE_IOTLB_GET_INFO,
> > > > > VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM to support
> > > > > registering and de-registering userspace memory for IOTLB
> > > > > as bounce buffer in virtio-vdpa case.
> > > > >
> > > > > The VDUSE_IOTLB_GET_INFO ioctl can help user to query IOLTB
> > > > > information such as bounce buffer size. Then user can use
> > > > > those information on VDUSE_IOTLB_REG_UMEM and
> > > > > VDUSE_IOTLB_DEREG_UMEM ioctls to register and de-register
> > > > > userspace memory for IOTLB.
> > > > >
> > > > > During registering and de-registering, the DMA data in use
> > > > > would be copied from kernel bounce pages to userspace bounce
> > > > > pages and back.
> > > > >
> > > > > With this feature, some existing application such as SPDK
> > > > > and DPDK can leverage the datapath of VDUSE directly and
> > > > > efficiently as discussed before [1][2]. They can register
> > > > > some preallocated hugepages to VDUSE to avoid an extra
> > > > > memcpy from bounce-buffer to hugepages.
> > > >
> > > > This is really interesting.
> > > >
> > > > But a small concern on uAPI is that this seems to expose the VDUSE
> > > > internal implementation (bounce buffer) to userspace. We tried hard to
> > > > hide it via the GET_FD before. Anyway can we keep it?
> > > >
> > >
> > > Another way is changing GET_FD ioctl to add a flag or reuse 'perm'
> > > field to indicate whether a IOVA region supports userspace memory
> > > registration. Then userspace can use
> > > VDUSE_IOTLB_REG_UMEM/VDUSE_IOTLB_DEREG_UMEM to register/deregister
> > > userspace memory for this IOVA region.
> >
> > Looks better.
> >
>
> OK.
>
> > > Any suggestions?
> >
> > I wonder what's the value of keeping the compatibility with the kernel
> > mmaped bounce buffer. It means we need to take extra care on e.g data
> > copying when reg/reg user space memory.
> >
>
> I'm not sure I get your point on the compatibility with the kernel
> bounce buffer. Do you mean they use the same iova region?

Yes.

>
> The userspace daemon might crash or reboot. In those cases, we still
> need a kernel buffer to store/recover the data.

Yes, this should be a good point.

>
> > Can we simply allow the third kind of fd that only works for umem registration?
> >
>
> Do you mean using another iova region for umem?

I meant having a new kind of fd that only allows umem registration.

>I think we don't need
> a fd in umem case since the userspace daemon can access the memory
> directly without using mmap() to map it into the address space in
> advance.

Ok, I will have a look at the code and get back.

Thanks

>
> Thanks,
> Yongji
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 0/5] VDUSE: Support registering userspace memory as bounce buffer
  2022-07-11  6:02         ` Jason Wang
@ 2022-07-11  7:24           ` Yongji Xie
  2022-07-14  2:59             ` Jason Wang
  0 siblings, 1 reply; 25+ messages in thread
From: Yongji Xie @ 2022-07-11  7:24 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, Liu Xiaodong, Maxime Coquelin, Stefan Hajnoczi, songmuchun,
	virtualization, linux-kernel

On Mon, Jul 11, 2022 at 2:02 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Fri, Jul 8, 2022 at 5:53 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> >
> > On Fri, Jul 8, 2022 at 4:38 PM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Wed, Jul 6, 2022 at 6:16 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> > > >
> > > > On Wed, Jul 6, 2022 at 5:30 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Wed, Jul 6, 2022 at 1:05 PM Xie Yongji <xieyongji@bytedance.com> wrote:
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > This series introduces some new ioctls: VDUSE_IOTLB_GET_INFO,
> > > > > > VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM to support
> > > > > > registering and de-registering userspace memory for IOTLB
> > > > > > as bounce buffer in virtio-vdpa case.
> > > > > >
> > > > > > The VDUSE_IOTLB_GET_INFO ioctl can help user to query IOLTB
> > > > > > information such as bounce buffer size. Then user can use
> > > > > > those information on VDUSE_IOTLB_REG_UMEM and
> > > > > > VDUSE_IOTLB_DEREG_UMEM ioctls to register and de-register
> > > > > > userspace memory for IOTLB.
> > > > > >
> > > > > > During registering and de-registering, the DMA data in use
> > > > > > would be copied from kernel bounce pages to userspace bounce
> > > > > > pages and back.
> > > > > >
> > > > > > With this feature, some existing application such as SPDK
> > > > > > and DPDK can leverage the datapath of VDUSE directly and
> > > > > > efficiently as discussed before [1][2]. They can register
> > > > > > some preallocated hugepages to VDUSE to avoid an extra
> > > > > > memcpy from bounce-buffer to hugepages.
> > > > >
> > > > > This is really interesting.
> > > > >
> > > > > But a small concern on uAPI is that this seems to expose the VDUSE
> > > > > internal implementation (bounce buffer) to userspace. We tried hard to
> > > > > hide it via the GET_FD before. Anyway can we keep it?
> > > > >
> > > >
> > > > Another way is changing GET_FD ioctl to add a flag or reuse 'perm'
> > > > field to indicate whether a IOVA region supports userspace memory
> > > > registration. Then userspace can use
> > > > VDUSE_IOTLB_REG_UMEM/VDUSE_IOTLB_DEREG_UMEM to register/deregister
> > > > userspace memory for this IOVA region.
> > >
> > > Looks better.
> > >
> >
> > OK.
> >
> > > > Any suggestions?
> > >
> > > I wonder what's the value of keeping the compatibility with the kernel
> > > mmaped bounce buffer. It means we need to take extra care on e.g data
> > > copying when reg/reg user space memory.
> > >
> >
> > I'm not sure I get your point on the compatibility with the kernel
> > bounce buffer. Do you mean they use the same iova region?
>
> Yes.
>
> >
> > The userspace daemon might crash or reboot. In those cases, we still
> > need a kernel buffer to store/recover the data.
>
> Yes, this should be a good point.
>
> >
> > > Can we simply allow the third kind of fd that only works for umem registration?
> > >
> >
> > Do you mean using another iova region for umem?
>
> I meant having a new kind of fd that only allows umem registration.
>

OK. It seems to be a little complicated to allow mapping a registered
user memory via a new fd, e.g. how to handle the mapping if the
userspace daemon exits but the fd is already passed to another
process.

> >I think we don't need
> > a fd in umem case since the userspace daemon can access the memory
> > directly without using mmap() to map it into the address space in
> > advance.
>
> Ok, I will have a look at the code and get back.
>

OK. Looking forward to your reply.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/5] vduse: Remove unnecessary spin lock protection
  2022-07-06  5:04 ` [PATCH v2 1/5] vduse: Remove unnecessary spin lock protection Xie Yongji
@ 2022-07-13  5:43   ` Jason Wang
  2022-07-13 11:08     ` Yongji Xie
  2022-07-13  5:57   ` Michael S. Tsirkin
  1 sibling, 1 reply; 25+ messages in thread
From: Jason Wang @ 2022-07-13  5:43 UTC (permalink / raw)
  To: Xie Yongji, mst, xiaodong.liu, maxime.coquelin, stefanha
  Cc: songmuchun, virtualization, linux-kernel


在 2022/7/6 13:04, Xie Yongji 写道:
> Taking iotlb lock to access bounce page in page fault
> handler is meaningless since vduse_domain_free_bounce_pages()
> would only be called during file release.
>
> Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
> ---
>   drivers/vdpa/vdpa_user/iova_domain.c | 7 ++-----
>   1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
> index 6daa3978d290..bca1f0b8850c 100644
> --- a/drivers/vdpa/vdpa_user/iova_domain.c
> +++ b/drivers/vdpa/vdpa_user/iova_domain.c
> @@ -211,17 +211,14 @@ static struct page *
>   vduse_domain_get_bounce_page(struct vduse_iova_domain *domain, u64 iova)
>   {
>   	struct vduse_bounce_map *map;
> -	struct page *page = NULL;
> +	struct page *page;
>   
> -	spin_lock(&domain->iotlb_lock);
>   	map = &domain->bounce_maps[iova >> PAGE_SHIFT];
>   	if (!map->bounce_page)
> -		goto out;
> +		return NULL;


Interesting, I wonder why we don't serialize with 
vduse_domain_map_bounce_page() with iotlb_lock?

Thanks


>   
>   	page = map->bounce_page;
>   	get_page(page);
> -out:
> -	spin_unlock(&domain->iotlb_lock);
>   
>   	return page;
>   }


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/5] vduse: Remove unnecessary spin lock protection
  2022-07-06  5:04 ` [PATCH v2 1/5] vduse: Remove unnecessary spin lock protection Xie Yongji
  2022-07-13  5:43   ` Jason Wang
@ 2022-07-13  5:57   ` Michael S. Tsirkin
  2022-07-13 10:39     ` Yongji Xie
  1 sibling, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2022-07-13  5:57 UTC (permalink / raw)
  To: Xie Yongji
  Cc: jasowang, xiaodong.liu, maxime.coquelin, stefanha, songmuchun,
	virtualization, linux-kernel

On Wed, Jul 06, 2022 at 01:04:59PM +0800, Xie Yongji wrote:
> Taking iotlb lock to access bounce page in page fault
> handler is meaningless since vduse_domain_free_bounce_pages()
> would only be called during file release.
> 
> Signed-off-by: Xie Yongji <xieyongji@bytedance.com>

vduse_domain_free_bounce_pages is not the
only one taking this lock. This commit log needs more
analysis documenting all points of access to bounce_maps
and why vduse_domain_get_bounce_page and file
release are the only two.

> ---
>  drivers/vdpa/vdpa_user/iova_domain.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
> index 6daa3978d290..bca1f0b8850c 100644
> --- a/drivers/vdpa/vdpa_user/iova_domain.c
> +++ b/drivers/vdpa/vdpa_user/iova_domain.c
> @@ -211,17 +211,14 @@ static struct page *
>  vduse_domain_get_bounce_page(struct vduse_iova_domain *domain, u64 iova)
>  {
>  	struct vduse_bounce_map *map;
> -	struct page *page = NULL;
> +	struct page *page;
>  
> -	spin_lock(&domain->iotlb_lock);
>  	map = &domain->bounce_maps[iova >> PAGE_SHIFT];
>  	if (!map->bounce_page)
> -		goto out;
> +		return NULL;
>  
>  	page = map->bounce_page;
>  	get_page(page);
> -out:
> -	spin_unlock(&domain->iotlb_lock);
>  
>  	return page;
>  }
> -- 
> 2.20.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/5] vduse: Use memcpy_{to,from}_page() in do_bounce()
  2022-07-06  5:05 ` [PATCH v2 2/5] vduse: Use memcpy_{to,from}_page() in do_bounce() Xie Yongji
@ 2022-07-13  5:59   ` Jason Wang
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Wang @ 2022-07-13  5:59 UTC (permalink / raw)
  To: Xie Yongji, mst, xiaodong.liu, maxime.coquelin, stefanha
  Cc: songmuchun, virtualization, linux-kernel


在 2022/7/6 13:05, Xie Yongji 写道:
> kmap_atomic() is being deprecated in favor of kmap_local_page().
>
> The use of kmap_atomic() in do_bounce() is all thread local therefore
> kmap_local_page() is a sufficient replacement.
>
> Convert to kmap_local_page() but, instead of open coding it,
> use the helpers memcpy_to_page() and memcpy_from_page().
>
> Signed-off-by: Xie Yongji <xieyongji@bytedance.com>


Acked-by: Jason Wang <jasowang@redhat.com>


> ---
>   drivers/vdpa/vdpa_user/iova_domain.c | 9 ++++-----
>   1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
> index bca1f0b8850c..50d7c08d5450 100644
> --- a/drivers/vdpa/vdpa_user/iova_domain.c
> +++ b/drivers/vdpa/vdpa_user/iova_domain.c
> @@ -138,18 +138,17 @@ static void do_bounce(phys_addr_t orig, void *addr, size_t size,
>   {
>   	unsigned long pfn = PFN_DOWN(orig);
>   	unsigned int offset = offset_in_page(orig);
> -	char *buffer;
> +	struct page *page;
>   	unsigned int sz = 0;
>   
>   	while (size) {
>   		sz = min_t(size_t, PAGE_SIZE - offset, size);
>   
> -		buffer = kmap_atomic(pfn_to_page(pfn));
> +		page = pfn_to_page(pfn);
>   		if (dir == DMA_TO_DEVICE)
> -			memcpy(addr, buffer + offset, sz);
> +			memcpy_from_page(addr, page, offset, sz);
>   		else
> -			memcpy(buffer + offset, addr, sz);
> -		kunmap_atomic(buffer);
> +			memcpy_to_page(page, offset, addr, sz);
>   
>   		size -= sz;
>   		pfn++;


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/5] vduse: Remove unnecessary spin lock protection
  2022-07-13  5:57   ` Michael S. Tsirkin
@ 2022-07-13 10:39     ` Yongji Xie
  0 siblings, 0 replies; 25+ messages in thread
From: Yongji Xie @ 2022-07-13 10:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Liu Xiaodong, Maxime Coquelin, Stefan Hajnoczi,
	songmuchun, virtualization, linux-kernel

On Wed, Jul 13, 2022 at 1:57 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jul 06, 2022 at 01:04:59PM +0800, Xie Yongji wrote:
> > Taking iotlb lock to access bounce page in page fault
> > handler is meaningless since vduse_domain_free_bounce_pages()
> > would only be called during file release.
> >
> > Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
>
> vduse_domain_free_bounce_pages is not the
> only one taking this lock.

Yes, but only vduse_domain_free_bounce_pages() is to protect
domain->bounce_maps with this lock. In other places, the lock is used
to protect the domain->iotlb.

> This commit log needs more
> analysis documenting all points of access to bounce_maps
> and why vduse_domain_get_bounce_page and file
> release are the only two.
>

OK, I will explain that we actually protect two different variables
(domain->bounce_maps and domain->iotlb) with one lock.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/5] vduse: Remove unnecessary spin lock protection
  2022-07-13  5:43   ` Jason Wang
@ 2022-07-13 11:08     ` Yongji Xie
  2022-07-14  2:27       ` Jason Wang
  0 siblings, 1 reply; 25+ messages in thread
From: Yongji Xie @ 2022-07-13 11:08 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Liu Xiaodong, Maxime Coquelin,
	Stefan Hajnoczi, songmuchun, virtualization, linux-kernel

On Wed, Jul 13, 2022 at 1:44 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2022/7/6 13:04, Xie Yongji 写道:
> > Taking iotlb lock to access bounce page in page fault
> > handler is meaningless since vduse_domain_free_bounce_pages()
> > would only be called during file release.
> >
> > Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
> > ---
> >   drivers/vdpa/vdpa_user/iova_domain.c | 7 ++-----
> >   1 file changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
> > index 6daa3978d290..bca1f0b8850c 100644
> > --- a/drivers/vdpa/vdpa_user/iova_domain.c
> > +++ b/drivers/vdpa/vdpa_user/iova_domain.c
> > @@ -211,17 +211,14 @@ static struct page *
> >   vduse_domain_get_bounce_page(struct vduse_iova_domain *domain, u64 iova)
> >   {
> >       struct vduse_bounce_map *map;
> > -     struct page *page = NULL;
> > +     struct page *page;
> >
> > -     spin_lock(&domain->iotlb_lock);
> >       map = &domain->bounce_maps[iova >> PAGE_SHIFT];
> >       if (!map->bounce_page)
> > -             goto out;
> > +             return NULL;
>
>
> Interesting, I wonder why we don't serialize with
> vduse_domain_map_bounce_page() with iotlb_lock?
>

Userspace should only access the bounce page after we set up the dma
mapping, so we don't need serialization from the iotlb_lock in this
case. And vduse_domain_map_bounce_page() only sets the
map->bounce_page rather than clears the map->bounce_page, we would not
have any problem without the lock protection.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/5] vduse: Remove unnecessary spin lock protection
  2022-07-13 11:08     ` Yongji Xie
@ 2022-07-14  2:27       ` Jason Wang
  2022-07-14  6:06         ` Yongji Xie
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Wang @ 2022-07-14  2:27 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Michael S. Tsirkin, Liu Xiaodong, Maxime Coquelin,
	Stefan Hajnoczi, songmuchun, virtualization, linux-kernel

On Wed, Jul 13, 2022 at 7:09 PM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Wed, Jul 13, 2022 at 1:44 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2022/7/6 13:04, Xie Yongji 写道:
> > > Taking iotlb lock to access bounce page in page fault
> > > handler is meaningless since vduse_domain_free_bounce_pages()
> > > would only be called during file release.
> > >
> > > Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
> > > ---
> > >   drivers/vdpa/vdpa_user/iova_domain.c | 7 ++-----
> > >   1 file changed, 2 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
> > > index 6daa3978d290..bca1f0b8850c 100644
> > > --- a/drivers/vdpa/vdpa_user/iova_domain.c
> > > +++ b/drivers/vdpa/vdpa_user/iova_domain.c
> > > @@ -211,17 +211,14 @@ static struct page *
> > >   vduse_domain_get_bounce_page(struct vduse_iova_domain *domain, u64 iova)
> > >   {
> > >       struct vduse_bounce_map *map;
> > > -     struct page *page = NULL;
> > > +     struct page *page;
> > >
> > > -     spin_lock(&domain->iotlb_lock);
> > >       map = &domain->bounce_maps[iova >> PAGE_SHIFT];
> > >       if (!map->bounce_page)
> > > -             goto out;
> > > +             return NULL;
> >
> >
> > Interesting, I wonder why we don't serialize with
> > vduse_domain_map_bounce_page() with iotlb_lock?
> >
>
> Userspace should only access the bounce page after we set up the dma
> mapping, so we don't need serialization from the iotlb_lock in this
> case.

What about the buggy/malicious user space that tries to access those
pages before or just in the middle of it has been mapped?

> And vduse_domain_map_bounce_page() only sets the
> map->bounce_page rather than clears the map->bounce_page, we would not
> have any problem without the lock protection.

Probably, I see an assignment of orig_phys after the alloc_page() but
it seems only used in bouncing which will only be called by dma ops.
At least we'd better have a comment to explain the synchronization
here.

Thanks

>
> Thanks,
> Yongji
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 4/5] vduse: Support querying IOLTB information
  2022-07-06  5:05 ` [PATCH v2 4/5] vduse: Support querying IOLTB information Xie Yongji
@ 2022-07-14  2:51   ` Jason Wang
  2022-07-14  5:58     ` Yongji Xie
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Wang @ 2022-07-14  2:51 UTC (permalink / raw)
  To: Xie Yongji
  Cc: mst, Liu Xiaodong, Maxime Coquelin, Stefan Hajnoczi, songmuchun,
	virtualization, linux-kernel

On Wed, Jul 6, 2022 at 1:06 PM Xie Yongji <xieyongji@bytedance.com> wrote:
>
> This introduces a new ioctl: VDUSE_IOTLB_GET_INFO to
> support querying IOLTB information such as bounce
> buffer size.
>
> Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
> ---
>  drivers/vdpa/vdpa_user/vduse_dev.c | 13 +++++++++++++
>  include/uapi/linux/vduse.h         | 17 +++++++++++++++++
>  2 files changed, 30 insertions(+)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index 3bc27de58f46..c47a5d9765cf 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -1089,6 +1089,19 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
>                 ret = vduse_dev_queue_irq_work(dev, &dev->vqs[index].inject);
>                 break;
>         }
> +       case VDUSE_IOTLB_GET_INFO: {

As discussed, it's better not to expose the VDUSE internal structure
like "IOTLB" in the name.

We probably need to extend GET_FD or have a new ioctl like GET_FD_INFO.

Thanks

> +               struct vduse_iotlb_info iotlb;
> +
> +               iotlb.bounce_iova = 0;
> +               iotlb.bounce_size = dev->domain->bounce_size;
> +
> +               ret = -EFAULT;
> +               if (copy_to_user(argp, &iotlb, sizeof(iotlb)))
> +                       break;
> +
> +               ret = 0;
> +               break;
> +       }
>         default:
>                 ret = -ENOIOCTLCMD;
>                 break;
> diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
> index 7cfe1c1280c0..c201b7a77c2c 100644
> --- a/include/uapi/linux/vduse.h
> +++ b/include/uapi/linux/vduse.h
> @@ -210,6 +210,23 @@ struct vduse_vq_eventfd {
>   */
>  #define VDUSE_VQ_INJECT_IRQ    _IOW(VDUSE_BASE, 0x17, __u32)
>
> +/**
> + * struct vduse_iotlb_info - IOTLB information
> + * @bounce_iova: start IOVA of bounce buffer
> + * @bounce_size: bounce buffer size
> + * @reserved: for future use, needs to be initialized to zero
> + *
> + * Structure used by VDUSE_IOTLB_GET_INFO ioctl to get IOTLB information.
> + */
> +struct vduse_iotlb_info {
> +       __u64 bounce_iova;
> +       __u64 bounce_size;
> +       __u64 reserved[2];
> +};
> +
> +/* Get IOTLB information, e.g. bounce buffer size */
> +#define VDUSE_IOTLB_GET_INFO    _IOR(VDUSE_BASE, 0x18, struct vduse_iotlb_info)
> +
>  /* The control messages definition for read(2)/write(2) on /dev/vduse/$NAME */
>
>  /**
> --
> 2.20.1
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 0/5] VDUSE: Support registering userspace memory as bounce buffer
  2022-07-11  7:24           ` Yongji Xie
@ 2022-07-14  2:59             ` Jason Wang
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Wang @ 2022-07-14  2:59 UTC (permalink / raw)
  To: Yongji Xie
  Cc: mst, Liu Xiaodong, Maxime Coquelin, Stefan Hajnoczi, songmuchun,
	virtualization, linux-kernel


在 2022/7/11 15:24, Yongji Xie 写道:
> On Mon, Jul 11, 2022 at 2:02 PM Jason Wang <jasowang@redhat.com> wrote:
>> On Fri, Jul 8, 2022 at 5:53 PM Yongji Xie <xieyongji@bytedance.com> wrote:
>>> On Fri, Jul 8, 2022 at 4:38 PM Jason Wang <jasowang@redhat.com> wrote:
>>>> On Wed, Jul 6, 2022 at 6:16 PM Yongji Xie <xieyongji@bytedance.com> wrote:
>>>>> On Wed, Jul 6, 2022 at 5:30 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>> On Wed, Jul 6, 2022 at 1:05 PM Xie Yongji <xieyongji@bytedance.com> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> This series introduces some new ioctls: VDUSE_IOTLB_GET_INFO,
>>>>>>> VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM to support
>>>>>>> registering and de-registering userspace memory for IOTLB
>>>>>>> as bounce buffer in virtio-vdpa case.
>>>>>>>
>>>>>>> The VDUSE_IOTLB_GET_INFO ioctl can help user to query IOLTB
>>>>>>> information such as bounce buffer size. Then user can use
>>>>>>> those information on VDUSE_IOTLB_REG_UMEM and
>>>>>>> VDUSE_IOTLB_DEREG_UMEM ioctls to register and de-register
>>>>>>> userspace memory for IOTLB.
>>>>>>>
>>>>>>> During registering and de-registering, the DMA data in use
>>>>>>> would be copied from kernel bounce pages to userspace bounce
>>>>>>> pages and back.
>>>>>>>
>>>>>>> With this feature, some existing application such as SPDK
>>>>>>> and DPDK can leverage the datapath of VDUSE directly and
>>>>>>> efficiently as discussed before [1][2]. They can register
>>>>>>> some preallocated hugepages to VDUSE to avoid an extra
>>>>>>> memcpy from bounce-buffer to hugepages.
>>>>>> This is really interesting.
>>>>>>
>>>>>> But a small concern on uAPI is that this seems to expose the VDUSE
>>>>>> internal implementation (bounce buffer) to userspace. We tried hard to
>>>>>> hide it via the GET_FD before. Anyway can we keep it?
>>>>>>
>>>>> Another way is changing GET_FD ioctl to add a flag or reuse 'perm'
>>>>> field to indicate whether a IOVA region supports userspace memory
>>>>> registration. Then userspace can use
>>>>> VDUSE_IOTLB_REG_UMEM/VDUSE_IOTLB_DEREG_UMEM to register/deregister
>>>>> userspace memory for this IOVA region.
>>>> Looks better.
>>>>
>>> OK.
>>>
>>>>> Any suggestions?
>>>> I wonder what's the value of keeping the compatibility with the kernel
>>>> mmaped bounce buffer. It means we need to take extra care on e.g data
>>>> copying when reg/reg user space memory.
>>>>
>>> I'm not sure I get your point on the compatibility with the kernel
>>> bounce buffer. Do you mean they use the same iova region?
>> Yes.
>>
>>> The userspace daemon might crash or reboot. In those cases, we still
>>> need a kernel buffer to store/recover the data.
>> Yes, this should be a good point.
>>
>>>> Can we simply allow the third kind of fd that only works for umem registration?
>>>>
>>> Do you mean using another iova region for umem?
>> I meant having a new kind of fd that only allows umem registration.
>>
> OK. It seems to be a little complicated to allow mapping a registered
> user memory via a new fd, e.g. how to handle the mapping if the
> userspace daemon exits but the fd is already passed to another
> process.
>
>>> I think we don't need
>>> a fd in umem case since the userspace daemon can access the memory
>>> directly without using mmap() to map it into the address space in
>>> advance.
>> Ok, I will have a look at the code and get back.
>>
> OK. Looking forward to your reply.


Looks good overall.

Just few comments.

Thanks


>
> Thanks,
> Yongji
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 4/5] vduse: Support querying IOLTB information
  2022-07-14  2:51   ` Jason Wang
@ 2022-07-14  5:58     ` Yongji Xie
  0 siblings, 0 replies; 25+ messages in thread
From: Yongji Xie @ 2022-07-14  5:58 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, Liu Xiaodong, Maxime Coquelin, Stefan Hajnoczi, songmuchun,
	virtualization, linux-kernel

On Thu, Jul 14, 2022 at 10:51 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Jul 6, 2022 at 1:06 PM Xie Yongji <xieyongji@bytedance.com> wrote:
> >
> > This introduces a new ioctl: VDUSE_IOTLB_GET_INFO to
> > support querying IOLTB information such as bounce
> > buffer size.
> >
> > Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
> > ---
> >  drivers/vdpa/vdpa_user/vduse_dev.c | 13 +++++++++++++
> >  include/uapi/linux/vduse.h         | 17 +++++++++++++++++
> >  2 files changed, 30 insertions(+)
> >
> > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > index 3bc27de58f46..c47a5d9765cf 100644
> > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > @@ -1089,6 +1089,19 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> >                 ret = vduse_dev_queue_irq_work(dev, &dev->vqs[index].inject);
> >                 break;
> >         }
> > +       case VDUSE_IOTLB_GET_INFO: {
>
> As discussed, it's better not to expose the VDUSE internal structure
> like "IOTLB" in the name.
>
> We probably need to extend GET_FD or have a new ioctl like GET_FD_INFO.
>

OK, will do it in v3.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/5] vduse: Remove unnecessary spin lock protection
  2022-07-14  2:27       ` Jason Wang
@ 2022-07-14  6:06         ` Yongji Xie
  0 siblings, 0 replies; 25+ messages in thread
From: Yongji Xie @ 2022-07-14  6:06 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Liu Xiaodong, Maxime Coquelin,
	Stefan Hajnoczi, songmuchun, virtualization, linux-kernel

On Thu, Jul 14, 2022 at 10:27 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Jul 13, 2022 at 7:09 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> >
> > On Wed, Jul 13, 2022 at 1:44 PM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > >
> > > 在 2022/7/6 13:04, Xie Yongji 写道:
> > > > Taking iotlb lock to access bounce page in page fault
> > > > handler is meaningless since vduse_domain_free_bounce_pages()
> > > > would only be called during file release.
> > > >
> > > > Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
> > > > ---
> > > >   drivers/vdpa/vdpa_user/iova_domain.c | 7 ++-----
> > > >   1 file changed, 2 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
> > > > index 6daa3978d290..bca1f0b8850c 100644
> > > > --- a/drivers/vdpa/vdpa_user/iova_domain.c
> > > > +++ b/drivers/vdpa/vdpa_user/iova_domain.c
> > > > @@ -211,17 +211,14 @@ static struct page *
> > > >   vduse_domain_get_bounce_page(struct vduse_iova_domain *domain, u64 iova)
> > > >   {
> > > >       struct vduse_bounce_map *map;
> > > > -     struct page *page = NULL;
> > > > +     struct page *page;
> > > >
> > > > -     spin_lock(&domain->iotlb_lock);
> > > >       map = &domain->bounce_maps[iova >> PAGE_SHIFT];
> > > >       if (!map->bounce_page)
> > > > -             goto out;
> > > > +             return NULL;
> > >
> > >
> > > Interesting, I wonder why we don't serialize with
> > > vduse_domain_map_bounce_page() with iotlb_lock?
> > >
> >
> > Userspace should only access the bounce page after we set up the dma
> > mapping, so we don't need serialization from the iotlb_lock in this
> > case.
>
> What about the buggy/malicious user space that tries to access those
> pages before or just in the middle of it has been mapped?
>

Yes, it might happen. But it would not have any problem. The userspace
might get a page or a SIGBUS error.

> > And vduse_domain_map_bounce_page() only sets the
> > map->bounce_page rather than clears the map->bounce_page, we would not
> > have any problem without the lock protection.
>
> Probably, I see an assignment of orig_phys after the alloc_page() but
> it seems only used in bouncing which will only be called by dma ops.
> At least we'd better have a comment to explain the synchronization
> here.
>

Yes, only map->bounce_page might be accessed concurrently in those two
places. I will add some comments in them.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-07-14  6:06 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-06  5:04 [PATCH v2 0/5] VDUSE: Support registering userspace memory as bounce buffer Xie Yongji
2022-07-06  5:04 ` [PATCH v2 1/5] vduse: Remove unnecessary spin lock protection Xie Yongji
2022-07-13  5:43   ` Jason Wang
2022-07-13 11:08     ` Yongji Xie
2022-07-14  2:27       ` Jason Wang
2022-07-14  6:06         ` Yongji Xie
2022-07-13  5:57   ` Michael S. Tsirkin
2022-07-13 10:39     ` Yongji Xie
2022-07-06  5:05 ` [PATCH v2 2/5] vduse: Use memcpy_{to,from}_page() in do_bounce() Xie Yongji
2022-07-13  5:59   ` Jason Wang
2022-07-06  5:05 ` [PATCH v2 3/5] vduse: Support using userspace pages as bounce buffer Xie Yongji
2022-07-06  5:05 ` [PATCH v2 4/5] vduse: Support querying IOLTB information Xie Yongji
2022-07-14  2:51   ` Jason Wang
2022-07-14  5:58     ` Yongji Xie
2022-07-06  5:05 ` [PATCH v2 5/5] vduse: Support registering userspace memory for IOTLB Xie Yongji
2022-07-06  6:08   ` kernel test robot
2022-07-07 23:01   ` kernel test robot
2022-07-08  1:14   ` kernel test robot
2022-07-06  9:30 ` [PATCH v2 0/5] VDUSE: Support registering userspace memory as bounce buffer Jason Wang
2022-07-06 10:15   ` Yongji Xie
2022-07-08  8:38     ` Jason Wang
2022-07-08  9:53       ` Yongji Xie
2022-07-11  6:02         ` Jason Wang
2022-07-11  7:24           ` Yongji Xie
2022-07-14  2:59             ` Jason Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).