* [PATCH v2 0/8] vdpa_sim: add support for user VA
@ 2023-03-02 11:34 Stefano Garzarella
  2023-03-02 11:34 ` [PATCH v2 1/8] vdpa: add bind_mm/unbind_mm callbacks Stefano Garzarella
                   ` (7 more replies)
  0 siblings, 8 replies; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-02 11:34 UTC (permalink / raw)
  To: virtualization
  Cc: Andrey Zhadchenko, eperezma, netdev, stefanha, linux-kernel,
	Jason Wang, Michael S. Tsirkin, kvm, Stefano Garzarella

v2:
- rebased on Linus' tree, commit ae3419fbac84 ("vc_screen: don't clobber
  return value in vcs_read")
- removed `struct task_struct *owner` param (unused for now, maybe
  useful to support cgroups) [Jason]
- added the unbind_mm callback [Jason]
- called the new unbind_mm callback during the release [Jason]
- avoided calling the bind_mm callback after a reset, since the device
  does not detach the mm during reset
- added a new patch to replace kmap_atomic() with kmap_local_page(),
  since checkpatch.pl complained about the deprecated kmap_atomic()
  touched by a patch in this series
- fixed cast warnings when building with W=1 C=1
- added new patch to replace the spinlock with a mutex [Jason]
- `use_va` set to true by default [Eugenio]
- supported the new unbind_mm callback [Jason]
- removed the unbind_mm call in vdpasim_do_reset() [Jason]
- avoided releasing the lock while calling kthread_flush_work(), since
  we are now using a mutex to protect the device state

RFC v1: https://lore.kernel.org/lkml/20221214163025.103075-1-sgarzare@redhat.com/

This series adds support for the use of user virtual addresses in the
vDPA simulator devices.

The main reason for this change is to avoid having to pin all guest
memory, especially for virtio devices implemented in software.

The next step would be to generalize the code in vdpa-sim to allow the
implementation of in-kernel software devices, similar to vhost, but using
vDPA so we can reuse the same software stack (e.g. in QEMU) for both HW
and SW devices.

For example, vhost-blk has never been merged, and lately there has been
renewed interest in it. So it would be nice to implement it directly with
vDPA, to reuse the same code in the VMM for both HW and SW vDPA block
devices.

The main problem (addressed by this series) was the pinning of all guest
memory, which prevented guest memory overcommit.

Thanks,
Stefano

Stefano Garzarella (8):
  vdpa: add bind_mm/unbind_mm callbacks
  vhost-vdpa: use bind_mm/unbind_mm device callbacks
  vringh: replace kmap_atomic() with kmap_local_page()
  vringh: support VA with iotlb
  vdpa_sim: make devices agnostic for work management
  vdpa_sim: use kthread worker
  vdpa_sim: replace the spinlock with a mutex to protect the state
  vdpa_sim: add support for user VA

 drivers/vdpa/vdpa_sim/vdpa_sim.h     |  11 +-
 include/linux/vdpa.h                 |  10 ++
 include/linux/vringh.h               |   5 +-
 drivers/vdpa/mlx5/net/mlx5_vnet.c    |   2 +-
 drivers/vdpa/vdpa_sim/vdpa_sim.c     | 160 ++++++++++++++---
 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  10 +-
 drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  10 +-
 drivers/vhost/vdpa.c                 |  30 ++++
 drivers/vhost/vringh.c               | 247 +++++++++++++++++++++------
 9 files changed, 395 insertions(+), 90 deletions(-)

-- 
2.39.2


* [PATCH v2 1/8] vdpa: add bind_mm/unbind_mm callbacks
  2023-03-02 11:34 [PATCH v2 0/8] vdpa_sim: add support for user VA Stefano Garzarella
@ 2023-03-02 11:34 ` Stefano Garzarella
  2023-03-14  3:39   ` Jason Wang
  2023-03-02 11:34 ` [PATCH v2 2/8] vhost-vdpa: use bind_mm/unbind_mm device callbacks Stefano Garzarella
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-02 11:34 UTC (permalink / raw)
  To: virtualization
  Cc: Andrey Zhadchenko, eperezma, netdev, stefanha, linux-kernel,
	Jason Wang, Michael S. Tsirkin, kvm, Stefano Garzarella

These new optional callbacks are used to bind/unbind the device to/from
a specific address space, so that the vDPA framework can use VA when
they are implemented.
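
For illustration, a parent driver could implement them roughly as below
(a minimal sketch with made-up `foo_*` names; the real vdpa_sim
implementation comes later in this series):

/* Hypothetical parent driver: remember the mm so that a kernel thread
 * can later attach to it with kthread_use_mm() before dereferencing
 * user VAs.
 */
static int foo_bind_mm(struct vdpa_device *vdev, struct mm_struct *mm)
{
	struct foo_dev *foo = vdpa_to_foo(vdev);

	foo->bound_mm = mm;
	return 0;
}

static void foo_unbind_mm(struct vdpa_device *vdev)
{
	struct foo_dev *foo = vdpa_to_foo(vdev);

	foo->bound_mm = NULL;
}

static const struct vdpa_config_ops foo_config_ops = {
	/* ... */
	.bind_mm	= foo_bind_mm,
	.unbind_mm	= foo_unbind_mm,
};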

Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---

Notes:
    v2:
    - removed `struct task_struct *owner` param (unused for now, maybe
      useful to support cgroups) [Jason]
    - added the unbind_mm callback [Jason]

 include/linux/vdpa.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 43f59ef10cc9..369c21394284 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -290,6 +290,14 @@ struct vdpa_map_file {
  *				@vdev: vdpa device
  *				@idx: virtqueue index
  *				Returns pointer to structure device or error (NULL)
+ * @bind_mm:			Bind the device to a specific address space
+ *				so the vDPA framework can use VA when this
+ *				callback is implemented. (optional)
+ *				@vdev: vdpa device
+ *				@mm: address space to bind
+ * @unbind_mm:			Unbind the device from the address space
+ *				bound using the bind_mm callback. (optional)
+ *				@vdev: vdpa device
  * @free:			Free resources that belongs to vDPA (optional)
  *				@vdev: vdpa device
  */
@@ -351,6 +359,8 @@ struct vdpa_config_ops {
 	int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
 			      unsigned int asid);
 	struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
+	int (*bind_mm)(struct vdpa_device *vdev, struct mm_struct *mm);
+	void (*unbind_mm)(struct vdpa_device *vdev);
 
 	/* Free device resources */
 	void (*free)(struct vdpa_device *vdev);
-- 
2.39.2


* [PATCH v2 2/8] vhost-vdpa: use bind_mm/unbind_mm device callbacks
  2023-03-02 11:34 [PATCH v2 0/8] vdpa_sim: add support for user VA Stefano Garzarella
  2023-03-02 11:34 ` [PATCH v2 1/8] vdpa: add bind_mm/unbind_mm callbacks Stefano Garzarella
@ 2023-03-02 11:34 ` Stefano Garzarella
  2023-03-14  3:48   ` Jason Wang
  2023-03-02 11:34 ` [PATCH v2 3/8] vringh: replace kmap_atomic() with kmap_local_page() Stefano Garzarella
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-02 11:34 UTC (permalink / raw)
  To: virtualization
  Cc: Andrey Zhadchenko, eperezma, netdev, stefanha, linux-kernel,
	Jason Wang, Michael S. Tsirkin, kvm, Stefano Garzarella

When the user calls the VHOST_SET_OWNER ioctl and the vDPA device
has `use_va` set to true, let's call the bind_mm callback.
In this way we can bind the device to the user address space
and directly use the user VA.

The unbind_mm callback is called during the release after
stopping the device.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---

Notes:
    v2:
    - called the new unbind_mm callback during the release [Jason]
    - avoided calling the bind_mm callback after a reset, since the device
      does not detach the mm during reset

 drivers/vhost/vdpa.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index dc12dbd5b43b..1ab89fccd825 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -219,6 +219,28 @@ static int vhost_vdpa_reset(struct vhost_vdpa *v)
 	return vdpa_reset(vdpa);
 }
 
+static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
+{
+	struct vdpa_device *vdpa = v->vdpa;
+	const struct vdpa_config_ops *ops = vdpa->config;
+
+	if (!vdpa->use_va || !ops->bind_mm)
+		return 0;
+
+	return ops->bind_mm(vdpa, v->vdev.mm);
+}
+
+static void vhost_vdpa_unbind_mm(struct vhost_vdpa *v)
+{
+	struct vdpa_device *vdpa = v->vdpa;
+	const struct vdpa_config_ops *ops = vdpa->config;
+
+	if (!vdpa->use_va || !ops->unbind_mm)
+		return;
+
+	ops->unbind_mm(vdpa);
+}
+
 static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp)
 {
 	struct vdpa_device *vdpa = v->vdpa;
@@ -711,6 +733,13 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
 		break;
 	default:
 		r = vhost_dev_ioctl(&v->vdev, cmd, argp);
+		if (!r && cmd == VHOST_SET_OWNER) {
+			r = vhost_vdpa_bind_mm(v);
+			if (r) {
+				vhost_dev_reset_owner(&v->vdev, NULL);
+				break;
+			}
+		}
 		if (r == -ENOIOCTLCMD)
 			r = vhost_vdpa_vring_ioctl(v, cmd, argp);
 		break;
@@ -1285,6 +1314,7 @@ static int vhost_vdpa_release(struct inode *inode, struct file *filep)
 	vhost_vdpa_clean_irq(v);
 	vhost_vdpa_reset(v);
 	vhost_dev_stop(&v->vdev);
+	vhost_vdpa_unbind_mm(v);
 	vhost_vdpa_free_domain(v);
 	vhost_vdpa_config_put(v);
 	vhost_vdpa_cleanup(v);
-- 
2.39.2


* [PATCH v2 3/8] vringh: replace kmap_atomic() with kmap_local_page()
  2023-03-02 11:34 [PATCH v2 0/8] vdpa_sim: add support for user VA Stefano Garzarella
  2023-03-02 11:34 ` [PATCH v2 1/8] vdpa: add bind_mm/unbind_mm callbacks Stefano Garzarella
  2023-03-02 11:34 ` [PATCH v2 2/8] vhost-vdpa: use bind_mm/unbind_mm device callbacks Stefano Garzarella
@ 2023-03-02 11:34 ` Stefano Garzarella
  2023-03-14  3:56   ` Jason Wang
  2023-03-16  9:13   ` Fabio M. De Francesco
  2023-03-02 11:34 ` [PATCH v2 4/8] vringh: support VA with iotlb Stefano Garzarella
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-02 11:34 UTC (permalink / raw)
  To: virtualization
  Cc: Andrey Zhadchenko, eperezma, netdev, stefanha, linux-kernel,
	Jason Wang, Michael S. Tsirkin, kvm, Stefano Garzarella

kmap_atomic() is deprecated in favor of kmap_local_page().

With kmap_local_page() the mappings are per thread, CPU local, can take
page-faults, and can be called from any context (including interrupts).
Furthermore, the tasks can be preempted and, when they are scheduled to
run again, the kernel virtual addresses are restored and still valid.

kmap_atomic() is implemented like a kmap_local_page() which also disables
page-faults and preemption (the latter only for !PREEMPT_RT kernels,
otherwise it only disables migration).

The code within the mappings/un-mappings in getu16_iotlb() and
putu16_iotlb() doesn't depend on the above-mentioned side effects of
kmap_atomic(), so a mere replacement of the old API with the new one
is all that is required (i.e., there is no need to explicitly add calls
to pagefault_disable() and/or preempt_disable()).
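
For contrast, in a hypothetical case (not present in this series) where
the code between the mappings did rely on those side effects, the
conversion would have to keep them explicitly:

/* Hypothetical: if the mapped section had to run with page-faults
 * (and possibly preemption) disabled, those calls would need to be
 * added by hand around kmap_local_page().
 */
kaddr = kmap_local_page(page);
pagefault_disable();
/* ... code that must not fault ... */
pagefault_enable();
kunmap_local(kaddr);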

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---

Notes:
    v2:
    - added this patch since checkpatch.pl complained about the
      deprecated kmap_atomic() touched by the next patch

 drivers/vhost/vringh.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index a1e27da54481..0ba3ef809e48 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -1220,10 +1220,10 @@ static inline int getu16_iotlb(const struct vringh *vrh,
 	if (ret < 0)
 		return ret;
 
-	kaddr = kmap_atomic(iov.bv_page);
+	kaddr = kmap_local_page(iov.bv_page);
 	from = kaddr + iov.bv_offset;
 	*val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
-	kunmap_atomic(kaddr);
+	kunmap_local(kaddr);
 
 	return 0;
 }
@@ -1241,10 +1241,10 @@ static inline int putu16_iotlb(const struct vringh *vrh,
 	if (ret < 0)
 		return ret;
 
-	kaddr = kmap_atomic(iov.bv_page);
+	kaddr = kmap_local_page(iov.bv_page);
 	to = kaddr + iov.bv_offset;
 	WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
-	kunmap_atomic(kaddr);
+	kunmap_local(kaddr);
 
 	return 0;
 }
-- 
2.39.2


* [PATCH v2 4/8] vringh: support VA with iotlb
  2023-03-02 11:34 [PATCH v2 0/8] vdpa_sim: add support for user VA Stefano Garzarella
                   ` (2 preceding siblings ...)
  2023-03-02 11:34 ` [PATCH v2 3/8] vringh: replace kmap_atomic() with kmap_local_page() Stefano Garzarella
@ 2023-03-02 11:34 ` Stefano Garzarella
  2023-03-03 14:38   ` Eugenio Perez Martin
  2023-03-14  4:53   ` Jason Wang
  2023-03-02 11:34 ` [PATCH v2 5/8] vdpa_sim: make devices agnostic for work management Stefano Garzarella
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-02 11:34 UTC (permalink / raw)
  To: virtualization
  Cc: Andrey Zhadchenko, eperezma, netdev, stefanha, linux-kernel,
	Jason Wang, Michael S. Tsirkin, kvm, Stefano Garzarella

vDPA supports the possibility of using user VA in the iotlb messages.
So let's add support for user VA in vringh, to be used by the vDPA
simulators.
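
With the new parameter, a caller that wants user VA just passes true
(illustrative sketch with made-up variable names; the actual vdpa_sim
wiring lands in the last patch of this series):

/* Sketch: set up a vringh whose iotlb entries contain user VAs */
err = vringh_init_iotlb(&vq->vring, features, num,
			true /* weak_barriers */, true /* use_va */,
			(struct vring_desc *)(uintptr_t)desc_addr,
			(struct vring_avail *)(uintptr_t)driver_addr,
			(struct vring_used *)(uintptr_t)device_addr);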

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---

Notes:
    v2:
    - replace kmap_atomic() with kmap_local_page() [see previous patch]
    - fix cast warnings when build with W=1 C=1

 include/linux/vringh.h            |   5 +-
 drivers/vdpa/mlx5/net/mlx5_vnet.c |   2 +-
 drivers/vdpa/vdpa_sim/vdpa_sim.c  |   4 +-
 drivers/vhost/vringh.c            | 247 ++++++++++++++++++++++++------
 4 files changed, 205 insertions(+), 53 deletions(-)

diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index 1991a02c6431..d39b9f2dcba0 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -32,6 +32,9 @@ struct vringh {
 	/* Can we get away with weak barriers? */
 	bool weak_barriers;
 
+	/* Use user's VA */
+	bool use_va;
+
 	/* Last available index we saw (ie. where we're up to). */
 	u16 last_avail_idx;
 
@@ -279,7 +282,7 @@ void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
 		      spinlock_t *iotlb_lock);
 
 int vringh_init_iotlb(struct vringh *vrh, u64 features,
-		      unsigned int num, bool weak_barriers,
+		      unsigned int num, bool weak_barriers, bool use_va,
 		      struct vring_desc *desc,
 		      struct vring_avail *avail,
 		      struct vring_used *used);
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 3a0e721aef05..babc8dd171a6 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2537,7 +2537,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
 
 	if (mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
 		err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
-					MLX5_CVQ_MAX_ENT, false,
+					MLX5_CVQ_MAX_ENT, false, false,
 					(struct vring_desc *)(uintptr_t)cvq->desc_addr,
 					(struct vring_avail *)(uintptr_t)cvq->driver_addr,
 					(struct vring_used *)(uintptr_t)cvq->device_addr);
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 6a0a65814626..481eb156658b 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -60,7 +60,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
 	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
 	uint16_t last_avail_idx = vq->vring.last_avail_idx;
 
-	vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
+	vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
 			  (struct vring_desc *)(uintptr_t)vq->desc_addr,
 			  (struct vring_avail *)
 			  (uintptr_t)vq->driver_addr,
@@ -81,7 +81,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
 	vq->cb = NULL;
 	vq->private = NULL;
 	vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features,
-			  VDPASIM_QUEUE_MAX, false, NULL, NULL, NULL);
+			  VDPASIM_QUEUE_MAX, false, false, NULL, NULL, NULL);
 
 	vq->vring.notify = NULL;
 }
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 0ba3ef809e48..61c79cea44ca 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -1094,15 +1094,99 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
 
 #if IS_REACHABLE(CONFIG_VHOST_IOTLB)
 
-static int iotlb_translate(const struct vringh *vrh,
-			   u64 addr, u64 len, u64 *translated,
-			   struct bio_vec iov[],
-			   int iov_size, u32 perm)
+static int iotlb_translate_va(const struct vringh *vrh,
+			      u64 addr, u64 len, u64 *translated,
+			      struct iovec iov[],
+			      int iov_size, u32 perm)
 {
 	struct vhost_iotlb_map *map;
 	struct vhost_iotlb *iotlb = vrh->iotlb;
+	u64 s = 0, last = addr + len - 1;
 	int ret = 0;
+
+	spin_lock(vrh->iotlb_lock);
+
+	while (len > s) {
+		u64 size;
+
+		if (unlikely(ret >= iov_size)) {
+			ret = -ENOBUFS;
+			break;
+		}
+
+		map = vhost_iotlb_itree_first(iotlb, addr, last);
+		if (!map || map->start > addr) {
+			ret = -EINVAL;
+			break;
+		} else if (!(map->perm & perm)) {
+			ret = -EPERM;
+			break;
+		}
+
+		size = map->size - addr + map->start;
+		iov[ret].iov_len = min(len - s, size);
+		iov[ret].iov_base = (void __user *)(unsigned long)
+				    (map->addr + addr - map->start);
+		s += size;
+		addr += size;
+		++ret;
+	}
+
+	spin_unlock(vrh->iotlb_lock);
+
+	if (translated)
+		*translated = min(len, s);
+
+	return ret;
+}
+
+static inline int copy_from_va(const struct vringh *vrh, void *dst, void *src,
+			       u64 len, u64 *translated)
+{
+	struct iovec iov[16];
+	struct iov_iter iter;
+	int ret;
+
+	ret = iotlb_translate_va(vrh, (u64)(uintptr_t)src, len, translated, iov,
+				 ARRAY_SIZE(iov), VHOST_MAP_RO);
+	if (ret == -ENOBUFS)
+		ret = ARRAY_SIZE(iov);
+	else if (ret < 0)
+		return ret;
+
+	iov_iter_init(&iter, ITER_SOURCE, iov, ret, *translated);
+
+	return copy_from_iter(dst, *translated, &iter);
+}
+
+static inline int copy_to_va(const struct vringh *vrh, void *dst, void *src,
+			     u64 len, u64 *translated)
+{
+	struct iovec iov[16];
+	struct iov_iter iter;
+	int ret;
+
+	ret = iotlb_translate_va(vrh, (u64)(uintptr_t)dst, len, translated, iov,
+				 ARRAY_SIZE(iov), VHOST_MAP_WO);
+	if (ret == -ENOBUFS)
+		ret = ARRAY_SIZE(iov);
+	else if (ret < 0)
+		return ret;
+
+	iov_iter_init(&iter, ITER_DEST, iov, ret, *translated);
+
+	return copy_to_iter(src, *translated, &iter);
+}
+
+static int iotlb_translate_pa(const struct vringh *vrh,
+			      u64 addr, u64 len, u64 *translated,
+			      struct bio_vec iov[],
+			      int iov_size, u32 perm)
+{
+	struct vhost_iotlb_map *map;
+	struct vhost_iotlb *iotlb = vrh->iotlb;
 	u64 s = 0, last = addr + len - 1;
+	int ret = 0;
 
 	spin_lock(vrh->iotlb_lock);
 
@@ -1141,28 +1225,61 @@ static int iotlb_translate(const struct vringh *vrh,
 	return ret;
 }
 
+static inline int copy_from_pa(const struct vringh *vrh, void *dst, void *src,
+			       u64 len, u64 *translated)
+{
+	struct bio_vec iov[16];
+	struct iov_iter iter;
+	int ret;
+
+	ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)src, len, translated, iov,
+				 ARRAY_SIZE(iov), VHOST_MAP_RO);
+	if (ret == -ENOBUFS)
+		ret = ARRAY_SIZE(iov);
+	else if (ret < 0)
+		return ret;
+
+	iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, *translated);
+
+	return copy_from_iter(dst, *translated, &iter);
+}
+
+static inline int copy_to_pa(const struct vringh *vrh, void *dst, void *src,
+			     u64 len, u64 *translated)
+{
+	struct bio_vec iov[16];
+	struct iov_iter iter;
+	int ret;
+
+	ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)dst, len, translated, iov,
+				 ARRAY_SIZE(iov), VHOST_MAP_WO);
+	if (ret == -ENOBUFS)
+		ret = ARRAY_SIZE(iov);
+	else if (ret < 0)
+		return ret;
+
+	iov_iter_bvec(&iter, ITER_DEST, iov, ret, *translated);
+
+	return copy_to_iter(src, *translated, &iter);
+}
+
 static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
 				  void *src, size_t len)
 {
 	u64 total_translated = 0;
 
 	while (total_translated < len) {
-		struct bio_vec iov[16];
-		struct iov_iter iter;
 		u64 translated;
 		int ret;
 
-		ret = iotlb_translate(vrh, (u64)(uintptr_t)src,
-				      len - total_translated, &translated,
-				      iov, ARRAY_SIZE(iov), VHOST_MAP_RO);
-		if (ret == -ENOBUFS)
-			ret = ARRAY_SIZE(iov);
-		else if (ret < 0)
-			return ret;
-
-		iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, translated);
+		if (vrh->use_va) {
+			ret = copy_from_va(vrh, dst, src,
+					   len - total_translated, &translated);
+		} else {
+			ret = copy_from_pa(vrh, dst, src,
+					   len - total_translated, &translated);
+		}
 
-		ret = copy_from_iter(dst, translated, &iter);
 		if (ret < 0)
 			return ret;
 
@@ -1180,22 +1297,17 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
 	u64 total_translated = 0;
 
 	while (total_translated < len) {
-		struct bio_vec iov[16];
-		struct iov_iter iter;
 		u64 translated;
 		int ret;
 
-		ret = iotlb_translate(vrh, (u64)(uintptr_t)dst,
-				      len - total_translated, &translated,
-				      iov, ARRAY_SIZE(iov), VHOST_MAP_WO);
-		if (ret == -ENOBUFS)
-			ret = ARRAY_SIZE(iov);
-		else if (ret < 0)
-			return ret;
-
-		iov_iter_bvec(&iter, ITER_DEST, iov, ret, translated);
+		if (vrh->use_va) {
+			ret = copy_to_va(vrh, dst, src,
+					 len - total_translated, &translated);
+		} else {
+			ret = copy_to_pa(vrh, dst, src,
+					 len - total_translated, &translated);
+		}
 
-		ret = copy_to_iter(src, translated, &iter);
 		if (ret < 0)
 			return ret;
 
@@ -1210,20 +1322,37 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
 static inline int getu16_iotlb(const struct vringh *vrh,
 			       u16 *val, const __virtio16 *p)
 {
-	struct bio_vec iov;
-	void *kaddr, *from;
 	int ret;
 
 	/* Atomic read is needed for getu16 */
-	ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
-			      &iov, 1, VHOST_MAP_RO);
-	if (ret < 0)
-		return ret;
+	if (vrh->use_va) {
+		struct iovec iov;
+		__virtio16 tmp;
+
+		ret = iotlb_translate_va(vrh, (u64)(uintptr_t)p, sizeof(*p),
+					 NULL, &iov, 1, VHOST_MAP_RO);
+		if (ret < 0)
+			return ret;
 
-	kaddr = kmap_local_page(iov.bv_page);
-	from = kaddr + iov.bv_offset;
-	*val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
-	kunmap_local(kaddr);
+		ret = __get_user(tmp, (__virtio16 __user *)iov.iov_base);
+		if (ret)
+			return ret;
+
+		*val = vringh16_to_cpu(vrh, tmp);
+	} else {
+		struct bio_vec iov;
+		void *kaddr, *from;
+
+		ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)p, sizeof(*p),
+					 NULL, &iov, 1, VHOST_MAP_RO);
+		if (ret < 0)
+			return ret;
+
+		kaddr = kmap_local_page(iov.bv_page);
+		from = kaddr + iov.bv_offset;
+		*val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
+		kunmap_local(kaddr);
+	}
 
 	return 0;
 }
@@ -1231,20 +1360,37 @@ static inline int getu16_iotlb(const struct vringh *vrh,
 static inline int putu16_iotlb(const struct vringh *vrh,
 			       __virtio16 *p, u16 val)
 {
-	struct bio_vec iov;
-	void *kaddr, *to;
 	int ret;
 
 	/* Atomic write is needed for putu16 */
-	ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
-			      &iov, 1, VHOST_MAP_WO);
-	if (ret < 0)
-		return ret;
+	if (vrh->use_va) {
+		struct iovec iov;
+		__virtio16 tmp;
 
-	kaddr = kmap_local_page(iov.bv_page);
-	to = kaddr + iov.bv_offset;
-	WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
-	kunmap_local(kaddr);
+		ret = iotlb_translate_va(vrh, (u64)(uintptr_t)p, sizeof(*p),
+					 NULL, &iov, 1, VHOST_MAP_WO);
+		if (ret < 0)
+			return ret;
+
+		tmp = cpu_to_vringh16(vrh, val);
+
+		ret = __put_user(tmp, (__virtio16 __user *)iov.iov_base);
+		if (ret)
+			return ret;
+	} else {
+		struct bio_vec iov;
+		void *kaddr, *to;
+
+		ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
+					 &iov, 1, VHOST_MAP_WO);
+		if (ret < 0)
+			return ret;
+
+		kaddr = kmap_local_page(iov.bv_page);
+		to = kaddr + iov.bv_offset;
+		WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
+		kunmap_local(kaddr);
+	}
 
 	return 0;
 }
@@ -1306,6 +1452,7 @@ static inline int putused_iotlb(const struct vringh *vrh,
  * @features: the feature bits for this ring.
  * @num: the number of elements.
  * @weak_barriers: true if we only need memory barriers, not I/O.
+ * @use_va: true if IOTLB contains user VA
  * @desc: the userpace descriptor pointer.
  * @avail: the userpace avail pointer.
  * @used: the userpace used pointer.
@@ -1313,11 +1460,13 @@ static inline int putused_iotlb(const struct vringh *vrh,
  * Returns an error if num is invalid.
  */
 int vringh_init_iotlb(struct vringh *vrh, u64 features,
-		      unsigned int num, bool weak_barriers,
+		      unsigned int num, bool weak_barriers, bool use_va,
 		      struct vring_desc *desc,
 		      struct vring_avail *avail,
 		      struct vring_used *used)
 {
+	vrh->use_va = use_va;
+
 	return vringh_init_kern(vrh, features, num, weak_barriers,
 				desc, avail, used);
 }
-- 
2.39.2


* [PATCH v2 5/8] vdpa_sim: make devices agnostic for work management
  2023-03-02 11:34 [PATCH v2 0/8] vdpa_sim: add support for user VA Stefano Garzarella
                   ` (3 preceding siblings ...)
  2023-03-02 11:34 ` [PATCH v2 4/8] vringh: support VA with iotlb Stefano Garzarella
@ 2023-03-02 11:34 ` Stefano Garzarella
  2023-03-03 14:40   ` Eugenio Perez Martin
  2023-03-14  5:27   ` Jason Wang
  2023-03-02 11:34 ` [PATCH v2 6/8] vdpa_sim: use kthread worker Stefano Garzarella
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-02 11:34 UTC (permalink / raw)
  To: virtualization
  Cc: Andrey Zhadchenko, eperezma, netdev, stefanha, linux-kernel,
	Jason Wang, Michael S. Tsirkin, kvm, Stefano Garzarella

Let's move work management inside the vdpa_sim core.
This way we can easily change how the work is managed, without
having to change the devices each time.
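
With this change a device only supplies a plain callback that takes the
vdpasim pointer; a minimal sketch for a hypothetical device (the blk/net
conversions below follow the same pattern):

static void vdpasim_foo_work(struct vdpasim *vdpasim)
{
	/* process the virtqueues; no work_struct plumbing needed */
}

/* at device creation: */
dev_attr.work_fn = vdpasim_foo_work;

/* and to kick processing: */
vdpasim_schedule_work(vdpasim);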

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 drivers/vdpa/vdpa_sim/vdpa_sim.h     |  3 ++-
 drivers/vdpa/vdpa_sim/vdpa_sim.c     | 17 +++++++++++++++--
 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  6 ++----
 drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  6 ++----
 4 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index 144858636c10..acee20faaf6a 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -45,7 +45,7 @@ struct vdpasim_dev_attr {
 	u32 ngroups;
 	u32 nas;
 
-	work_func_t work_fn;
+	void (*work_fn)(struct vdpasim *vdpasim);
 	void (*get_config)(struct vdpasim *vdpasim, void *config);
 	void (*set_config)(struct vdpasim *vdpasim, const void *config);
 	int (*get_stats)(struct vdpasim *vdpasim, u16 idx,
@@ -78,6 +78,7 @@ struct vdpasim {
 
 struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *attr,
 			       const struct vdpa_dev_set_config *config);
+void vdpasim_schedule_work(struct vdpasim *vdpasim);
 
 /* TODO: cross-endian support */
 static inline bool vdpasim_is_little_endian(struct vdpasim *vdpasim)
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 481eb156658b..a6ee830efc38 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -116,6 +116,13 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
 static const struct vdpa_config_ops vdpasim_config_ops;
 static const struct vdpa_config_ops vdpasim_batch_config_ops;
 
+static void vdpasim_work_fn(struct work_struct *work)
+{
+	struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
+
+	vdpasim->dev_attr.work_fn(vdpasim);
+}
+
 struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
 			       const struct vdpa_dev_set_config *config)
 {
@@ -152,7 +159,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
 
 	vdpasim = vdpa_to_sim(vdpa);
 	vdpasim->dev_attr = *dev_attr;
-	INIT_WORK(&vdpasim->work, dev_attr->work_fn);
+	INIT_WORK(&vdpasim->work, vdpasim_work_fn);
 	spin_lock_init(&vdpasim->lock);
 	spin_lock_init(&vdpasim->iommu_lock);
 
@@ -203,6 +210,12 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
 }
 EXPORT_SYMBOL_GPL(vdpasim_create);
 
+void vdpasim_schedule_work(struct vdpasim *vdpasim)
+{
+	schedule_work(&vdpasim->work);
+}
+EXPORT_SYMBOL_GPL(vdpasim_schedule_work);
+
 static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
 				  u64 desc_area, u64 driver_area,
 				  u64 device_area)
@@ -237,7 +250,7 @@ static void vdpasim_kick_vq(struct vdpa_device *vdpa, u16 idx)
 	}
 
 	if (vq->ready)
-		schedule_work(&vdpasim->work);
+		vdpasim_schedule_work(vdpasim);
 }
 
 static void vdpasim_set_vq_cb(struct vdpa_device *vdpa, u16 idx,
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
index 5117959bed8a..eb4897c8541e 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
@@ -11,7 +11,6 @@
 #include <linux/module.h>
 #include <linux/device.h>
 #include <linux/kernel.h>
-#include <linux/sched.h>
 #include <linux/blkdev.h>
 #include <linux/vringh.h>
 #include <linux/vdpa.h>
@@ -286,9 +285,8 @@ static bool vdpasim_blk_handle_req(struct vdpasim *vdpasim,
 	return handled;
 }
 
-static void vdpasim_blk_work(struct work_struct *work)
+static void vdpasim_blk_work(struct vdpasim *vdpasim)
 {
-	struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
 	bool reschedule = false;
 	int i;
 
@@ -326,7 +324,7 @@ static void vdpasim_blk_work(struct work_struct *work)
 	spin_unlock(&vdpasim->lock);
 
 	if (reschedule)
-		schedule_work(&vdpasim->work);
+		vdpasim_schedule_work(vdpasim);
 }
 
 static void vdpasim_blk_get_config(struct vdpasim *vdpasim, void *config)
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
index 862f405362de..e61a9ecbfafe 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
@@ -11,7 +11,6 @@
 #include <linux/module.h>
 #include <linux/device.h>
 #include <linux/kernel.h>
-#include <linux/sched.h>
 #include <linux/etherdevice.h>
 #include <linux/vringh.h>
 #include <linux/vdpa.h>
@@ -192,9 +191,8 @@ static void vdpasim_handle_cvq(struct vdpasim *vdpasim)
 	u64_stats_update_end(&net->cq_stats.syncp);
 }
 
-static void vdpasim_net_work(struct work_struct *work)
+static void vdpasim_net_work(struct vdpasim *vdpasim)
 {
-	struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
 	struct vdpasim_virtqueue *txq = &vdpasim->vqs[1];
 	struct vdpasim_virtqueue *rxq = &vdpasim->vqs[0];
 	struct vdpasim_net *net = sim_to_net(vdpasim);
@@ -260,7 +258,7 @@ static void vdpasim_net_work(struct work_struct *work)
 		vdpasim_net_complete(rxq, write);
 
 		if (tx_pkts > 4) {
-			schedule_work(&vdpasim->work);
+			vdpasim_schedule_work(vdpasim);
 			goto out;
 		}
 	}
-- 
2.39.2


* [PATCH v2 6/8] vdpa_sim: use kthread worker
  2023-03-02 11:34 [PATCH v2 0/8] vdpa_sim: add support for user VA Stefano Garzarella
                   ` (4 preceding siblings ...)
  2023-03-02 11:34 ` [PATCH v2 5/8] vdpa_sim: make devices agnostic for work management Stefano Garzarella
@ 2023-03-02 11:34 ` Stefano Garzarella
  2023-03-02 15:30   ` Simon Horman
                     ` (2 more replies)
  2023-03-02 11:34 ` [PATCH v2 7/8] vdpa_sim: replace the spinlock with a mutex to protect the state Stefano Garzarella
  2023-03-02 11:34 ` [PATCH v2 8/8] vdpa_sim: add support for user VA Stefano Garzarella
  7 siblings, 3 replies; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-02 11:34 UTC (permalink / raw)
  To: virtualization
  Cc: Andrey Zhadchenko, eperezma, netdev, stefanha, linux-kernel,
	Jason Wang, Michael S. Tsirkin, kvm, Stefano Garzarella

Let's use our own kthread to run device jobs.
This gives us more flexibility; in particular, we can attach the kthread
to the user address space when vDPA uses the user's VA.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 drivers/vdpa/vdpa_sim/vdpa_sim.h |  3 ++-
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 17 ++++++++++++-----
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index acee20faaf6a..ce83f9130a5d 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -57,7 +57,8 @@ struct vdpasim_dev_attr {
 struct vdpasim {
 	struct vdpa_device vdpa;
 	struct vdpasim_virtqueue *vqs;
-	struct work_struct work;
+	struct kthread_worker *worker;
+	struct kthread_work work;
 	struct vdpasim_dev_attr dev_attr;
 	/* spinlock to synchronize virtqueue state */
 	spinlock_t lock;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index a6ee830efc38..6feb29726c2a 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -11,8 +11,8 @@
 #include <linux/module.h>
 #include <linux/device.h>
 #include <linux/kernel.h>
+#include <linux/kthread.h>
 #include <linux/slab.h>
-#include <linux/sched.h>
 #include <linux/dma-map-ops.h>
 #include <linux/vringh.h>
 #include <linux/vdpa.h>
@@ -116,7 +116,7 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
 static const struct vdpa_config_ops vdpasim_config_ops;
 static const struct vdpa_config_ops vdpasim_batch_config_ops;
 
-static void vdpasim_work_fn(struct work_struct *work)
+static void vdpasim_work_fn(struct kthread_work *work)
 {
 	struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
 
@@ -159,7 +159,13 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
 
 	vdpasim = vdpa_to_sim(vdpa);
 	vdpasim->dev_attr = *dev_attr;
-	INIT_WORK(&vdpasim->work, vdpasim_work_fn);
+
+	kthread_init_work(&vdpasim->work, vdpasim_work_fn);
+	vdpasim->worker = kthread_create_worker(0, "vDPA sim worker: %s",
+						dev_attr->name);
+	if (IS_ERR(vdpasim->worker))
+		goto err_iommu;
+
 	spin_lock_init(&vdpasim->lock);
 	spin_lock_init(&vdpasim->iommu_lock);
 
@@ -212,7 +218,7 @@ EXPORT_SYMBOL_GPL(vdpasim_create);
 
 void vdpasim_schedule_work(struct vdpasim *vdpasim)
 {
-	schedule_work(&vdpasim->work);
+	kthread_queue_work(vdpasim->worker, &vdpasim->work);
 }
 EXPORT_SYMBOL_GPL(vdpasim_schedule_work);
 
@@ -612,7 +618,8 @@ static void vdpasim_free(struct vdpa_device *vdpa)
 	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
 	int i;
 
-	cancel_work_sync(&vdpasim->work);
+	kthread_cancel_work_sync(&vdpasim->work);
+	kthread_destroy_worker(vdpasim->worker);
 
 	for (i = 0; i < vdpasim->dev_attr.nvqs; i++) {
 		vringh_kiov_cleanup(&vdpasim->vqs[i].out_iov);
-- 
2.39.2


* [PATCH v2 7/8] vdpa_sim: replace the spinlock with a mutex to protect the state
  2023-03-02 11:34 [PATCH v2 0/8] vdpa_sim: add support for user VA Stefano Garzarella
                   ` (5 preceding siblings ...)
  2023-03-02 11:34 ` [PATCH v2 6/8] vdpa_sim: use kthread worker Stefano Garzarella
@ 2023-03-02 11:34 ` Stefano Garzarella
  2023-03-14  5:29   ` Jason Wang
  2023-03-02 11:34 ` [PATCH v2 8/8] vdpa_sim: add support for user VA Stefano Garzarella
  7 siblings, 1 reply; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-02 11:34 UTC (permalink / raw)
  To: virtualization
  Cc: Andrey Zhadchenko, eperezma, netdev, stefanha, linux-kernel,
	Jason Wang, Michael S. Tsirkin, kvm, Stefano Garzarella

The spinlock we use to protect the state of the simulator is sometimes
held for a long time (for example, when devices handle requests).

This also prevents us from calling functions that might sleep (such as
kthread_flush_work() in the next patch), forcing us to release and
retake the lock around them.

For these reasons, let's replace the spinlock with a mutex that gives
us more flexibility.
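
For example, the next patch needs to flush a work item while holding the
state lock, which is only legal with a sleepable lock; a sketch of that
pattern:

mutex_lock(&vdpasim->mutex);
kthread_queue_work(vdpasim->worker, &work);
kthread_flush_work(&work);	/* may sleep: illegal under a spinlock */
mutex_unlock(&vdpasim->mutex);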

Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 drivers/vdpa/vdpa_sim/vdpa_sim.h     |  4 ++--
 drivers/vdpa/vdpa_sim/vdpa_sim.c     | 34 ++++++++++++++--------------
 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  4 ++--
 drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  4 ++--
 4 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index ce83f9130a5d..4774292fba8c 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -60,8 +60,8 @@ struct vdpasim {
 	struct kthread_worker *worker;
 	struct kthread_work work;
 	struct vdpasim_dev_attr dev_attr;
-	/* spinlock to synchronize virtqueue state */
-	spinlock_t lock;
+	/* mutex to synchronize virtqueue state */
+	struct mutex mutex;
 	/* virtio config according to device type */
 	void *config;
 	struct vhost_iotlb *iommu;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 6feb29726c2a..a28103a67ae7 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -166,7 +166,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
 	if (IS_ERR(vdpasim->worker))
 		goto err_iommu;
 
-	spin_lock_init(&vdpasim->lock);
+	mutex_init(&vdpasim->mutex);
 	spin_lock_init(&vdpasim->iommu_lock);
 
 	dev = &vdpasim->vdpa.dev;
@@ -275,13 +275,13 @@ static void vdpasim_set_vq_ready(struct vdpa_device *vdpa, u16 idx, bool ready)
 	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
 	bool old_ready;
 
-	spin_lock(&vdpasim->lock);
+	mutex_lock(&vdpasim->mutex);
 	old_ready = vq->ready;
 	vq->ready = ready;
 	if (vq->ready && !old_ready) {
 		vdpasim_queue_ready(vdpasim, idx);
 	}
-	spin_unlock(&vdpasim->lock);
+	mutex_unlock(&vdpasim->mutex);
 }
 
 static bool vdpasim_get_vq_ready(struct vdpa_device *vdpa, u16 idx)
@@ -299,9 +299,9 @@ static int vdpasim_set_vq_state(struct vdpa_device *vdpa, u16 idx,
 	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
 	struct vringh *vrh = &vq->vring;
 
-	spin_lock(&vdpasim->lock);
+	mutex_lock(&vdpasim->mutex);
 	vrh->last_avail_idx = state->split.avail_index;
-	spin_unlock(&vdpasim->lock);
+	mutex_unlock(&vdpasim->mutex);
 
 	return 0;
 }
@@ -398,9 +398,9 @@ static u8 vdpasim_get_status(struct vdpa_device *vdpa)
 	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
 	u8 status;
 
-	spin_lock(&vdpasim->lock);
+	mutex_lock(&vdpasim->mutex);
 	status = vdpasim->status;
-	spin_unlock(&vdpasim->lock);
+	mutex_unlock(&vdpasim->mutex);
 
 	return status;
 }
@@ -409,19 +409,19 @@ static void vdpasim_set_status(struct vdpa_device *vdpa, u8 status)
 {
 	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
 
-	spin_lock(&vdpasim->lock);
+	mutex_lock(&vdpasim->mutex);
 	vdpasim->status = status;
-	spin_unlock(&vdpasim->lock);
+	mutex_unlock(&vdpasim->mutex);
 }
 
 static int vdpasim_reset(struct vdpa_device *vdpa)
 {
 	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
 
-	spin_lock(&vdpasim->lock);
+	mutex_lock(&vdpasim->mutex);
 	vdpasim->status = 0;
 	vdpasim_do_reset(vdpasim);
-	spin_unlock(&vdpasim->lock);
+	mutex_unlock(&vdpasim->mutex);
 
 	return 0;
 }
@@ -430,9 +430,9 @@ static int vdpasim_suspend(struct vdpa_device *vdpa)
 {
 	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
 
-	spin_lock(&vdpasim->lock);
+	mutex_lock(&vdpasim->mutex);
 	vdpasim->running = false;
-	spin_unlock(&vdpasim->lock);
+	mutex_unlock(&vdpasim->mutex);
 
 	return 0;
 }
@@ -442,7 +442,7 @@ static int vdpasim_resume(struct vdpa_device *vdpa)
 	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
 	int i;
 
-	spin_lock(&vdpasim->lock);
+	mutex_lock(&vdpasim->mutex);
 	vdpasim->running = true;
 
 	if (vdpasim->pending_kick) {
@@ -453,7 +453,7 @@ static int vdpasim_resume(struct vdpa_device *vdpa)
 		vdpasim->pending_kick = false;
 	}
 
-	spin_unlock(&vdpasim->lock);
+	mutex_unlock(&vdpasim->mutex);
 
 	return 0;
 }
@@ -525,14 +525,14 @@ static int vdpasim_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
 
 	iommu = &vdpasim->iommu[asid];
 
-	spin_lock(&vdpasim->lock);
+	mutex_lock(&vdpasim->mutex);
 
 	for (i = 0; i < vdpasim->dev_attr.nvqs; i++)
 		if (vdpasim_get_vq_group(vdpa, i) == group)
 			vringh_set_iotlb(&vdpasim->vqs[i].vring, iommu,
 					 &vdpasim->iommu_lock);
 
-	spin_unlock(&vdpasim->lock);
+	mutex_unlock(&vdpasim->mutex);
 
 	return 0;
 }
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
index eb4897c8541e..568119e1553f 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
@@ -290,7 +290,7 @@ static void vdpasim_blk_work(struct vdpasim *vdpasim)
 	bool reschedule = false;
 	int i;
 
-	spin_lock(&vdpasim->lock);
+	mutex_lock(&vdpasim->mutex);
 
 	if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
 		goto out;
@@ -321,7 +321,7 @@ static void vdpasim_blk_work(struct vdpasim *vdpasim)
 		}
 	}
 out:
-	spin_unlock(&vdpasim->lock);
+	mutex_unlock(&vdpasim->mutex);
 
 	if (reschedule)
 		vdpasim_schedule_work(vdpasim);
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
index e61a9ecbfafe..7ab434592bfe 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
@@ -201,7 +201,7 @@ static void vdpasim_net_work(struct vdpasim *vdpasim)
 	u64 rx_drops = 0, rx_overruns = 0, rx_errors = 0, tx_errors = 0;
 	int err;
 
-	spin_lock(&vdpasim->lock);
+	mutex_lock(&vdpasim->mutex);
 
 	if (!vdpasim->running)
 		goto out;
@@ -264,7 +264,7 @@ static void vdpasim_net_work(struct vdpasim *vdpasim)
 	}
 
 out:
-	spin_unlock(&vdpasim->lock);
+	mutex_unlock(&vdpasim->mutex);
 
 	u64_stats_update_begin(&net->tx_stats.syncp);
 	net->tx_stats.pkts += tx_pkts;
-- 
2.39.2


* [PATCH v2 8/8] vdpa_sim: add support for user VA
  2023-03-02 11:34 [PATCH v2 0/8] vdpa_sim: add support for user VA Stefano Garzarella
                   ` (6 preceding siblings ...)
  2023-03-02 11:34 ` [PATCH v2 7/8] vdpa_sim: replace the spinlock with a mutex to protect the state Stefano Garzarella
@ 2023-03-02 11:34 ` Stefano Garzarella
  2023-03-14  5:36   ` Jason Wang
  7 siblings, 1 reply; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-02 11:34 UTC (permalink / raw)
  To: virtualization
  Cc: Andrey Zhadchenko, eperezma, netdev, stefanha, linux-kernel,
	Jason Wang, Michael S. Tsirkin, kvm, Stefano Garzarella

The new "use_va" module parameter (default: false) is used in
vdpa_alloc_device() to inform the vDPA framework that the device
supports VA.

vringh is initialized to use VA only when "use_va" is true and the
user's mm has been bound, i.e. only when the bus supports user VA
(e.g. vhost-vdpa).

The vdpasim_mm_work_fn work is used to attach the kthread to the user
address space when the .bind_mm callback is invoked, and to detach
it when the .unbind_mm callback is invoked.
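
Since the module parameter is read-only (0444), it can only be set at
module load time, e.g. `modprobe vdpa_sim use_va=0` to restore the
previous PA/pinning behavior.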

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---

Notes:
    v2:
    - `use_va` set to true by default [Eugenio]
    - supported the new unbind_mm callback [Jason]
    - removed the unbind_mm call in vdpasim_do_reset() [Jason]
    - avoided releasing the lock while calling kthread_flush_work(), since
      we are now using a mutex to protect the device state

 drivers/vdpa/vdpa_sim/vdpa_sim.h |  1 +
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 98 +++++++++++++++++++++++++++++++-
 2 files changed, 97 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index 4774292fba8c..3a42887d05d9 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -59,6 +59,7 @@ struct vdpasim {
 	struct vdpasim_virtqueue *vqs;
 	struct kthread_worker *worker;
 	struct kthread_work work;
+	struct mm_struct *mm_bound;
 	struct vdpasim_dev_attr dev_attr;
 	/* mutex to synchronize virtqueue state */
 	struct mutex mutex;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index a28103a67ae7..eda26bc33df5 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -35,10 +35,77 @@ module_param(max_iotlb_entries, int, 0444);
 MODULE_PARM_DESC(max_iotlb_entries,
 		 "Maximum number of iotlb entries for each address space. 0 means unlimited. (default: 2048)");
 
+static bool use_va = true;
+module_param(use_va, bool, 0444);
+MODULE_PARM_DESC(use_va, "Enable/disable the device's ability to use VA");
+
 #define VDPASIM_QUEUE_ALIGN PAGE_SIZE
 #define VDPASIM_QUEUE_MAX 256
 #define VDPASIM_VENDOR_ID 0
 
+struct vdpasim_mm_work {
+	struct kthread_work work;
+	struct mm_struct *mm;
+	bool bind;
+	int ret;
+};
+
+static void vdpasim_mm_work_fn(struct kthread_work *work)
+{
+	struct vdpasim_mm_work *mm_work =
+		container_of(work, struct vdpasim_mm_work, work);
+
+	mm_work->ret = 0;
+
+	if (mm_work->bind) {
+		kthread_use_mm(mm_work->mm);
+		//TODO: should we attach the cgroup of the mm owner?
+	} else {
+		kthread_unuse_mm(mm_work->mm);
+	}
+}
+
+static void vdpasim_worker_queue_mm(struct vdpasim *vdpasim,
+				    struct vdpasim_mm_work *mm_work)
+{
+	struct kthread_work *work = &mm_work->work;
+
+	kthread_init_work(work, vdpasim_mm_work_fn);
+	kthread_queue_work(vdpasim->worker, work);
+
+	kthread_flush_work(work);
+}
+
+static int vdpasim_worker_bind_mm(struct vdpasim *vdpasim,
+				  struct mm_struct *new_mm)
+{
+	struct vdpasim_mm_work mm_work;
+
+	mm_work.mm = new_mm;
+	mm_work.bind = true;
+
+	vdpasim_worker_queue_mm(vdpasim, &mm_work);
+
+	if (!mm_work.ret)
+		vdpasim->mm_bound = new_mm;
+
+	return mm_work.ret;
+}
+
+static void vdpasim_worker_unbind_mm(struct vdpasim *vdpasim)
+{
+	struct vdpasim_mm_work mm_work;
+
+	if (!vdpasim->mm_bound)
+		return;
+
+	mm_work.mm = vdpasim->mm_bound;
+	mm_work.bind = false;
+
+	vdpasim_worker_queue_mm(vdpasim, &mm_work);
+
+	vdpasim->mm_bound = NULL;
+}
 static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
 {
 	return container_of(vdpa, struct vdpasim, vdpa);
@@ -59,8 +126,10 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
 {
 	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
 	uint16_t last_avail_idx = vq->vring.last_avail_idx;
+	bool va_enabled = use_va && vdpasim->mm_bound;
 
-	vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
+	vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
+			  va_enabled,
 			  (struct vring_desc *)(uintptr_t)vq->desc_addr,
 			  (struct vring_avail *)
 			  (uintptr_t)vq->driver_addr,
@@ -151,7 +220,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
 	vdpa = __vdpa_alloc_device(NULL, ops,
 				   dev_attr->ngroups, dev_attr->nas,
 				   dev_attr->alloc_size,
-				   dev_attr->name, false);
+				   dev_attr->name, use_va);
 	if (IS_ERR(vdpa)) {
 		ret = PTR_ERR(vdpa);
 		goto err_alloc;
@@ -571,6 +640,27 @@ static int vdpasim_set_map(struct vdpa_device *vdpa, unsigned int asid,
 	return ret;
 }
 
+static int vdpasim_bind_mm(struct vdpa_device *vdpa, struct mm_struct *mm)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	int ret;
+
+	mutex_lock(&vdpasim->mutex);
+	ret = vdpasim_worker_bind_mm(vdpasim, mm);
+	mutex_unlock(&vdpasim->mutex);
+
+	return ret;
+}
+
+static void vdpasim_unbind_mm(struct vdpa_device *vdpa)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+
+	mutex_lock(&vdpasim->mutex);
+	vdpasim_worker_unbind_mm(vdpasim);
+	mutex_unlock(&vdpasim->mutex);
+}
+
 static int vdpasim_dma_map(struct vdpa_device *vdpa, unsigned int asid,
 			   u64 iova, u64 size,
 			   u64 pa, u32 perm, void *opaque)
@@ -667,6 +757,8 @@ static const struct vdpa_config_ops vdpasim_config_ops = {
 	.set_group_asid         = vdpasim_set_group_asid,
 	.dma_map                = vdpasim_dma_map,
 	.dma_unmap              = vdpasim_dma_unmap,
+	.bind_mm		= vdpasim_bind_mm,
+	.unbind_mm		= vdpasim_unbind_mm,
 	.free                   = vdpasim_free,
 };
 
@@ -701,6 +793,8 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops = {
 	.get_iova_range         = vdpasim_get_iova_range,
 	.set_group_asid         = vdpasim_set_group_asid,
 	.set_map                = vdpasim_set_map,
+	.bind_mm		= vdpasim_bind_mm,
+	.unbind_mm		= vdpasim_unbind_mm,
 	.free                   = vdpasim_free,
 };
 
-- 
2.39.2


* Re: [PATCH v2 6/8] vdpa_sim: use kthread worker
  2023-03-02 11:34 ` [PATCH v2 6/8] vdpa_sim: use kthread worker Stefano Garzarella
@ 2023-03-02 15:30   ` Simon Horman
  2023-03-02 15:48     ` Stefano Garzarella
  2023-03-05 11:21   ` kernel test robot
  2023-03-14  5:31   ` Jason Wang
  2 siblings, 1 reply; 40+ messages in thread
From: Simon Horman @ 2023-03-02 15:30 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: virtualization, Andrey Zhadchenko, eperezma, netdev, stefanha,
	linux-kernel, Jason Wang, Michael S. Tsirkin, kvm

On Thu, Mar 02, 2023 at 12:34:19PM +0100, Stefano Garzarella wrote:
> Let's use our own kthread to run device jobs.
> This allows us more flexibility, especially we can attach the kthread
> to the user address space when vDPA uses user's VA.
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  drivers/vdpa/vdpa_sim/vdpa_sim.h |  3 ++-
>  drivers/vdpa/vdpa_sim/vdpa_sim.c | 17 ++++++++++++-----
>  2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> index acee20faaf6a..ce83f9130a5d 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> @@ -57,7 +57,8 @@ struct vdpasim_dev_attr {
>  struct vdpasim {
>  	struct vdpa_device vdpa;
>  	struct vdpasim_virtqueue *vqs;
> -	struct work_struct work;
> +	struct kthread_worker *worker;
> +	struct kthread_work work;
>  	struct vdpasim_dev_attr dev_attr;
>  	/* spinlock to synchronize virtqueue state */
>  	spinlock_t lock;
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index a6ee830efc38..6feb29726c2a 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -11,8 +11,8 @@
>  #include <linux/module.h>
>  #include <linux/device.h>
>  #include <linux/kernel.h>
> +#include <linux/kthread.h>
>  #include <linux/slab.h>
> -#include <linux/sched.h>
>  #include <linux/dma-map-ops.h>
>  #include <linux/vringh.h>
>  #include <linux/vdpa.h>
> @@ -116,7 +116,7 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
>  static const struct vdpa_config_ops vdpasim_config_ops;
>  static const struct vdpa_config_ops vdpasim_batch_config_ops;
>  
> -static void vdpasim_work_fn(struct work_struct *work)
> +static void vdpasim_work_fn(struct kthread_work *work)
>  {
>  	struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
>  
> @@ -159,7 +159,13 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
>  
>  	vdpasim = vdpa_to_sim(vdpa);
>  	vdpasim->dev_attr = *dev_attr;
> -	INIT_WORK(&vdpasim->work, vdpasim_work_fn);
> +
> +	kthread_init_work(&vdpasim->work, vdpasim_work_fn);
> +	vdpasim->worker = kthread_create_worker(0, "vDPA sim worker: %s",
> +						dev_attr->name);
> +	if (IS_ERR(vdpasim->worker))
> +		goto err_iommu;

Branching to err_iommu will result in a call to put_device(dev)...

> +
>  	spin_lock_init(&vdpasim->lock);
>  	spin_lock_init(&vdpasim->iommu_lock);

... but dev is not initialised until the line following this hunk,
which is:

	dev = &vdpasim->vdpa.dev;

In order to avoid leaking dev I _think_ the correct approach
is to move the initialisation of dev above the branch to
err_iommu; perhaps just above the call to kthread_init_work()
is a good place.

This does move the assignment outside the locks above.
But I _think_ that is ok.
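
i.e. something like this (untested sketch of the reordering):

	vdpasim = vdpa_to_sim(vdpa);
	vdpasim->dev_attr = *dev_attr;
	dev = &vdpasim->vdpa.dev;

	kthread_init_work(&vdpasim->work, vdpasim_work_fn);
	vdpasim->worker = kthread_create_worker(0, "vDPA sim worker: %s",
						dev_attr->name);
	if (IS_ERR(vdpasim->worker))
		goto err_iommu;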

> @@ -212,7 +218,7 @@ EXPORT_SYMBOL_GPL(vdpasim_create);
>  
>  void vdpasim_schedule_work(struct vdpasim *vdpasim)
>  {
> -	schedule_work(&vdpasim->work);
> +	kthread_queue_work(vdpasim->worker, &vdpasim->work);
>  }
>  EXPORT_SYMBOL_GPL(vdpasim_schedule_work);
>  
> @@ -612,7 +618,8 @@ static void vdpasim_free(struct vdpa_device *vdpa)
>  	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
>  	int i;
>  
> -	cancel_work_sync(&vdpasim->work);
> +	kthread_cancel_work_sync(&vdpasim->work);
> +	kthread_destroy_worker(vdpasim->worker);
>  
>  	for (i = 0; i < vdpasim->dev_attr.nvqs; i++) {
>  		vringh_kiov_cleanup(&vdpasim->vqs[i].out_iov);
> -- 
> 2.39.2
> 

* Re: [PATCH v2 6/8] vdpa_sim: use kthread worker
  2023-03-02 15:30   ` Simon Horman
@ 2023-03-02 15:48     ` Stefano Garzarella
  0 siblings, 0 replies; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-02 15:48 UTC (permalink / raw)
  To: Simon Horman
  Cc: virtualization, Andrey Zhadchenko, eperezma, netdev, stefanha,
	linux-kernel, Jason Wang, Michael S. Tsirkin, kvm

On Thu, Mar 02, 2023 at 04:30:04PM +0100, Simon Horman wrote:
>On Thu, Mar 02, 2023 at 12:34:19PM +0100, Stefano Garzarella wrote:
>> Let's use our own kthread to run device jobs.
>> This allows us more flexibility, especially we can attach the kthread
>> to the user address space when vDPA uses user's VA.
>>
>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> ---
>>  drivers/vdpa/vdpa_sim/vdpa_sim.h |  3 ++-
>>  drivers/vdpa/vdpa_sim/vdpa_sim.c | 17 ++++++++++++-----
>>  2 files changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
>> index acee20faaf6a..ce83f9130a5d 100644
>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
>> @@ -57,7 +57,8 @@ struct vdpasim_dev_attr {
>>  struct vdpasim {
>>  	struct vdpa_device vdpa;
>>  	struct vdpasim_virtqueue *vqs;
>> -	struct work_struct work;
>> +	struct kthread_worker *worker;
>> +	struct kthread_work work;
>>  	struct vdpasim_dev_attr dev_attr;
>>  	/* spinlock to synchronize virtqueue state */
>>  	spinlock_t lock;
>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> index a6ee830efc38..6feb29726c2a 100644
>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> @@ -11,8 +11,8 @@
>>  #include <linux/module.h>
>>  #include <linux/device.h>
>>  #include <linux/kernel.h>
>> +#include <linux/kthread.h>
>>  #include <linux/slab.h>
>> -#include <linux/sched.h>
>>  #include <linux/dma-map-ops.h>
>>  #include <linux/vringh.h>
>>  #include <linux/vdpa.h>
>> @@ -116,7 +116,7 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
>>  static const struct vdpa_config_ops vdpasim_config_ops;
>>  static const struct vdpa_config_ops vdpasim_batch_config_ops;
>>
>> -static void vdpasim_work_fn(struct work_struct *work)
>> +static void vdpasim_work_fn(struct kthread_work *work)
>>  {
>>  	struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
>>
>> @@ -159,7 +159,13 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
>>
>>  	vdpasim = vdpa_to_sim(vdpa);
>>  	vdpasim->dev_attr = *dev_attr;
>> -	INIT_WORK(&vdpasim->work, vdpasim_work_fn);
>> +
>> +	kthread_init_work(&vdpasim->work, vdpasim_work_fn);
>> +	vdpasim->worker = kthread_create_worker(0, "vDPA sim worker: %s",
>> +						dev_attr->name);
>> +	if (IS_ERR(vdpasim->worker))
>> +		goto err_iommu;
>
>Branching to err_iommu will result in a call to put_device(dev)...

Good catch!

>
>> +
>>  	spin_lock_init(&vdpasim->lock);
>>  	spin_lock_init(&vdpasim->iommu_lock);
>
>... but dev is not initialised until the line following this hunk,
>which is:
>
>	dev = &vdpasim->vdpa.dev;
>
>In order to avoid leaking dev I _think_ the correct approach
>is to move the initialisation of dev above the branch to
>err_iommu, perhaps above the call to kthread_init_work()
>is a good place.

Yep, I agree. I'll fix in v3.

Thanks,
Stefano

>
>This does move the assignment outside the locks above.
>But I _think_ that is ok.
>
>> @@ -212,7 +218,7 @@ EXPORT_SYMBOL_GPL(vdpasim_create);
>>
>>  void vdpasim_schedule_work(struct vdpasim *vdpasim)
>>  {
>> -	schedule_work(&vdpasim->work);
>> +	kthread_queue_work(vdpasim->worker, &vdpasim->work);
>>  }
>>  EXPORT_SYMBOL_GPL(vdpasim_schedule_work);
>>
>> @@ -612,7 +618,8 @@ static void vdpasim_free(struct vdpa_device *vdpa)
>>  	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
>>  	int i;
>>
>> -	cancel_work_sync(&vdpasim->work);
>> +	kthread_cancel_work_sync(&vdpasim->work);
>> +	kthread_destroy_worker(vdpasim->worker);
>>
>>  	for (i = 0; i < vdpasim->dev_attr.nvqs; i++) {
>>  		vringh_kiov_cleanup(&vdpasim->vqs[i].out_iov);
>> --
>> 2.39.2
>>
>


* Re: [PATCH v2 4/8] vringh: support VA with iotlb
  2023-03-02 11:34 ` [PATCH v2 4/8] vringh: support VA with iotlb Stefano Garzarella
@ 2023-03-03 14:38   ` Eugenio Perez Martin
  2023-03-07  9:31     ` Stefano Garzarella
  2023-03-16 16:07     ` Stefano Garzarella
  2023-03-14  4:53   ` Jason Wang
  1 sibling, 2 replies; 40+ messages in thread
From: Eugenio Perez Martin @ 2023-03-03 14:38 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: virtualization, Andrey Zhadchenko, netdev, stefanha,
	linux-kernel, Jason Wang, Michael S. Tsirkin, kvm

On Thu, Mar 2, 2023 at 12:35 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> vDPA supports the possibility to use user VA in the iotlb messages.
> So, let's add support for user VA in vringh to use it in the vDPA
> simulators.
>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>
> Notes:
>     v2:
>     - replace kmap_atomic() with kmap_local_page() [see previous patch]
>     - fix cast warnings when build with W=1 C=1
>
>  include/linux/vringh.h            |   5 +-
>  drivers/vdpa/mlx5/net/mlx5_vnet.c |   2 +-
>  drivers/vdpa/vdpa_sim/vdpa_sim.c  |   4 +-
>  drivers/vhost/vringh.c            | 247 ++++++++++++++++++++++++------
>  4 files changed, 205 insertions(+), 53 deletions(-)
>
> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
> index 1991a02c6431..d39b9f2dcba0 100644
> --- a/include/linux/vringh.h
> +++ b/include/linux/vringh.h
> @@ -32,6 +32,9 @@ struct vringh {
>         /* Can we get away with weak barriers? */
>         bool weak_barriers;
>
> +       /* Use user's VA */
> +       bool use_va;
> +
>         /* Last available index we saw (ie. where we're up to). */
>         u16 last_avail_idx;
>
> @@ -279,7 +282,7 @@ void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
>                       spinlock_t *iotlb_lock);
>
>  int vringh_init_iotlb(struct vringh *vrh, u64 features,
> -                     unsigned int num, bool weak_barriers,
> +                     unsigned int num, bool weak_barriers, bool use_va,
>                       struct vring_desc *desc,
>                       struct vring_avail *avail,
>                       struct vring_used *used);
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 3a0e721aef05..babc8dd171a6 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2537,7 +2537,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
>
>         if (mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
>                 err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
> -                                       MLX5_CVQ_MAX_ENT, false,
> +                                       MLX5_CVQ_MAX_ENT, false, false,
>                                         (struct vring_desc *)(uintptr_t)cvq->desc_addr,
>                                         (struct vring_avail *)(uintptr_t)cvq->driver_addr,
>                                         (struct vring_used *)(uintptr_t)cvq->device_addr);
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 6a0a65814626..481eb156658b 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -60,7 +60,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
>         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
>         uint16_t last_avail_idx = vq->vring.last_avail_idx;
>
> -       vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
> +       vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
>                           (struct vring_desc *)(uintptr_t)vq->desc_addr,
>                           (struct vring_avail *)
>                           (uintptr_t)vq->driver_addr,
> @@ -81,7 +81,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
>         vq->cb = NULL;
>         vq->private = NULL;
>         vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features,
> -                         VDPASIM_QUEUE_MAX, false, NULL, NULL, NULL);
> +                         VDPASIM_QUEUE_MAX, false, false, NULL, NULL, NULL);
>
>         vq->vring.notify = NULL;
>  }
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index 0ba3ef809e48..61c79cea44ca 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -1094,15 +1094,99 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
>
>  #if IS_REACHABLE(CONFIG_VHOST_IOTLB)
>
> -static int iotlb_translate(const struct vringh *vrh,
> -                          u64 addr, u64 len, u64 *translated,
> -                          struct bio_vec iov[],
> -                          int iov_size, u32 perm)
> +static int iotlb_translate_va(const struct vringh *vrh,
> +                             u64 addr, u64 len, u64 *translated,
> +                             struct iovec iov[],
> +                             int iov_size, u32 perm)
>  {
>         struct vhost_iotlb_map *map;
>         struct vhost_iotlb *iotlb = vrh->iotlb;
> +       u64 s = 0, last = addr + len - 1;
>         int ret = 0;
> +
> +       spin_lock(vrh->iotlb_lock);
> +
> +       while (len > s) {
> +               u64 size;
> +
> +               if (unlikely(ret >= iov_size)) {
> +                       ret = -ENOBUFS;
> +                       break;
> +               }
> +
> +               map = vhost_iotlb_itree_first(iotlb, addr, last);
> +               if (!map || map->start > addr) {
> +                       ret = -EINVAL;
> +                       break;
> +               } else if (!(map->perm & perm)) {
> +                       ret = -EPERM;
> +                       break;
> +               }
> +
> +               size = map->size - addr + map->start;
> +               iov[ret].iov_len = min(len - s, size);
> +               iov[ret].iov_base = (void __user *)(unsigned long)
> +                                   (map->addr + addr - map->start);
> +               s += size;
> +               addr += size;
> +               ++ret;
> +       }
> +
> +       spin_unlock(vrh->iotlb_lock);
> +
> +       if (translated)
> +               *translated = min(len, s);
> +
> +       return ret;
> +}

It seems to me iotlb_translate_va and iotlb_translate_pa are very
similar; their only difference is that the iov argument is an iovec
instead of a bio_vec. And how to fill it, obviously.

It would be great to merge both functions, only differing with a
conditional on vrh->use_va, or generics, or similar. Or, if following
the style of the rest of vringh code, to provide a callback to fill
iovec (although I like conditional more).

However I cannot think of an easy way to perform that without long
macros or type erasure.
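
Just to sketch the type-erasure variant (untested; the IOTLB walk
itself would stay exactly as today, only the fill changes):

	static int iotlb_translate(const struct vringh *vrh,
				   u64 addr, u64 len, u64 *translated,
				   void *iov, int iov_size, u32 perm)
	{
		/* ... same IOTLB walk as today; for each map entry: */
		u64 io_addr = map->addr + addr - map->start;
		u64 io_len = min(len - s, size);

		if (vrh->use_va) {
			struct iovec *iovec = iov;

			iovec[ret].iov_len = io_len;
			iovec[ret].iov_base = (void __user *)(uintptr_t)io_addr;
		} else {
			struct bio_vec *bvec = iov;

			bvec_set_page(&bvec[ret],
				      pfn_to_page(io_addr >> PAGE_SHIFT),
				      io_len, io_addr & (PAGE_SIZE - 1));
		}
		/* ... */
	}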

> +
> +static inline int copy_from_va(const struct vringh *vrh, void *dst, void *src,
> +                              u64 len, u64 *translated)
> +{
> +       struct iovec iov[16];
> +       struct iov_iter iter;
> +       int ret;
> +
> +       ret = iotlb_translate_va(vrh, (u64)(uintptr_t)src, len, translated, iov,
> +                                ARRAY_SIZE(iov), VHOST_MAP_RO);
> +       if (ret == -ENOBUFS)
> +               ret = ARRAY_SIZE(iov);
> +       else if (ret < 0)
> +               return ret;
> +
> +       iov_iter_init(&iter, ITER_SOURCE, iov, ret, *translated);
> +
> +       return copy_from_iter(dst, *translated, &iter);

Maybe a good baby step for DRY is to return the iov_iter in
copy_from/to_va/pa here?
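
Something like this, maybe (a rough sketch, name invented; the caller
would need to own the iov array so the returned iter stays valid):

	static int iotlb_iter_va(const struct vringh *vrh, u64 addr, u64 len,
				 u64 *translated, struct iovec iov[],
				 int iov_size, struct iov_iter *iter,
				 int direction, u32 perm)
	{
		int ret = iotlb_translate_va(vrh, addr, len, translated,
					     iov, iov_size, perm);

		if (ret == -ENOBUFS)
			ret = iov_size;
		else if (ret < 0)
			return ret;

		iov_iter_init(iter, direction, iov, ret, *translated);
		return 0;
	}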

But I'm ok with this version too.

Acked-by: Eugenio Pérez Martin <eperezma@redhat.com>

> +}
> +
> +static inline int copy_to_va(const struct vringh *vrh, void *dst, void *src,
> +                            u64 len, u64 *translated)
> +{
> +       struct iovec iov[16];
> +       struct iov_iter iter;
> +       int ret;
> +
> +       ret = iotlb_translate_va(vrh, (u64)(uintptr_t)dst, len, translated, iov,
> +                                ARRAY_SIZE(iov), VHOST_MAP_WO);
> +       if (ret == -ENOBUFS)
> +               ret = ARRAY_SIZE(iov);
> +       else if (ret < 0)
> +               return ret;
> +
> +       iov_iter_init(&iter, ITER_DEST, iov, ret, *translated);
> +
> +       return copy_to_iter(src, *translated, &iter);
> +}
> +
> +static int iotlb_translate_pa(const struct vringh *vrh,
> +                             u64 addr, u64 len, u64 *translated,
> +                             struct bio_vec iov[],
> +                             int iov_size, u32 perm)
> +{
> +       struct vhost_iotlb_map *map;
> +       struct vhost_iotlb *iotlb = vrh->iotlb;
>         u64 s = 0, last = addr + len - 1;
> +       int ret = 0;
>
>         spin_lock(vrh->iotlb_lock);
>
> @@ -1141,28 +1225,61 @@ static int iotlb_translate(const struct vringh *vrh,
>         return ret;
>  }
>
> +static inline int copy_from_pa(const struct vringh *vrh, void *dst, void *src,
> +                              u64 len, u64 *translated)
> +{
> +       struct bio_vec iov[16];
> +       struct iov_iter iter;
> +       int ret;
> +
> +       ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)src, len, translated, iov,
> +                                ARRAY_SIZE(iov), VHOST_MAP_RO);
> +       if (ret == -ENOBUFS)
> +               ret = ARRAY_SIZE(iov);
> +       else if (ret < 0)
> +               return ret;
> +
> +       iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, *translated);
> +
> +       return copy_from_iter(dst, *translated, &iter);
> +}
> +
> +static inline int copy_to_pa(const struct vringh *vrh, void *dst, void *src,
> +                            u64 len, u64 *translated)
> +{
> +       struct bio_vec iov[16];
> +       struct iov_iter iter;
> +       int ret;
> +
> +       ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)dst, len, translated, iov,
> +                                ARRAY_SIZE(iov), VHOST_MAP_WO);
> +       if (ret == -ENOBUFS)
> +               ret = ARRAY_SIZE(iov);
> +       else if (ret < 0)
> +               return ret;
> +
> +       iov_iter_bvec(&iter, ITER_DEST, iov, ret, *translated);
> +
> +       return copy_to_iter(src, *translated, &iter);
> +}
> +
>  static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
>                                   void *src, size_t len)
>  {
>         u64 total_translated = 0;
>
>         while (total_translated < len) {
> -               struct bio_vec iov[16];
> -               struct iov_iter iter;
>                 u64 translated;
>                 int ret;
>
> -               ret = iotlb_translate(vrh, (u64)(uintptr_t)src,
> -                                     len - total_translated, &translated,
> -                                     iov, ARRAY_SIZE(iov), VHOST_MAP_RO);
> -               if (ret == -ENOBUFS)
> -                       ret = ARRAY_SIZE(iov);
> -               else if (ret < 0)
> -                       return ret;
> -
> -               iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, translated);
> +               if (vrh->use_va) {
> +                       ret = copy_from_va(vrh, dst, src,
> +                                          len - total_translated, &translated);
> +               } else {
> +                       ret = copy_from_pa(vrh, dst, src,
> +                                          len - total_translated, &translated);
> +               }
>
> -               ret = copy_from_iter(dst, translated, &iter);
>                 if (ret < 0)
>                         return ret;
>
> @@ -1180,22 +1297,17 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
>         u64 total_translated = 0;
>
>         while (total_translated < len) {
> -               struct bio_vec iov[16];
> -               struct iov_iter iter;
>                 u64 translated;
>                 int ret;
>
> -               ret = iotlb_translate(vrh, (u64)(uintptr_t)dst,
> -                                     len - total_translated, &translated,
> -                                     iov, ARRAY_SIZE(iov), VHOST_MAP_WO);
> -               if (ret == -ENOBUFS)
> -                       ret = ARRAY_SIZE(iov);
> -               else if (ret < 0)
> -                       return ret;
> -
> -               iov_iter_bvec(&iter, ITER_DEST, iov, ret, translated);
> +               if (vrh->use_va) {
> +                       ret = copy_to_va(vrh, dst, src,
> +                                        len - total_translated, &translated);
> +               } else {
> +                       ret = copy_to_pa(vrh, dst, src,
> +                                        len - total_translated, &translated);
> +               }
>
> -               ret = copy_to_iter(src, translated, &iter);
>                 if (ret < 0)
>                         return ret;
>
> @@ -1210,20 +1322,37 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
>  static inline int getu16_iotlb(const struct vringh *vrh,
>                                u16 *val, const __virtio16 *p)
>  {
> -       struct bio_vec iov;
> -       void *kaddr, *from;
>         int ret;
>
>         /* Atomic read is needed for getu16 */
> -       ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> -                             &iov, 1, VHOST_MAP_RO);
> -       if (ret < 0)
> -               return ret;
> +       if (vrh->use_va) {
> +               struct iovec iov;
> +               __virtio16 tmp;
> +
> +               ret = iotlb_translate_va(vrh, (u64)(uintptr_t)p, sizeof(*p),
> +                                        NULL, &iov, 1, VHOST_MAP_RO);
> +               if (ret < 0)
> +                       return ret;
>
> -       kaddr = kmap_local_page(iov.bv_page);
> -       from = kaddr + iov.bv_offset;
> -       *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> -       kunmap_local(kaddr);
> +               ret = __get_user(tmp, (__virtio16 __user *)iov.iov_base);
> +               if (ret)
> +                       return ret;
> +
> +               *val = vringh16_to_cpu(vrh, tmp);
> +       } else {
> +               struct bio_vec iov;
> +               void *kaddr, *from;
> +
> +               ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)p, sizeof(*p),
> +                                        NULL, &iov, 1, VHOST_MAP_RO);
> +               if (ret < 0)
> +                       return ret;
> +
> +               kaddr = kmap_local_page(iov.bv_page);
> +               from = kaddr + iov.bv_offset;
> +               *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> +               kunmap_local(kaddr);
> +       }
>
>         return 0;
>  }
> @@ -1231,20 +1360,37 @@ static inline int getu16_iotlb(const struct vringh *vrh,
>  static inline int putu16_iotlb(const struct vringh *vrh,
>                                __virtio16 *p, u16 val)
>  {
> -       struct bio_vec iov;
> -       void *kaddr, *to;
>         int ret;
>
>         /* Atomic write is needed for putu16 */
> -       ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> -                             &iov, 1, VHOST_MAP_WO);
> -       if (ret < 0)
> -               return ret;
> +       if (vrh->use_va) {
> +               struct iovec iov;
> +               __virtio16 tmp;
>
> -       kaddr = kmap_local_page(iov.bv_page);
> -       to = kaddr + iov.bv_offset;
> -       WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> -       kunmap_local(kaddr);
> +               ret = iotlb_translate_va(vrh, (u64)(uintptr_t)p, sizeof(*p),
> +                                        NULL, &iov, 1, VHOST_MAP_WO);
> +               if (ret < 0)
> +                       return ret;
> +
> +               tmp = cpu_to_vringh16(vrh, val);
> +
> +               ret = __put_user(tmp, (__virtio16 __user *)iov.iov_base);
> +               if (ret)
> +                       return ret;
> +       } else {
> +               struct bio_vec iov;
> +               void *kaddr, *to;
> +
> +               ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> +                                        &iov, 1, VHOST_MAP_WO);
> +               if (ret < 0)
> +                       return ret;
> +
> +               kaddr = kmap_local_page(iov.bv_page);
> +               to = kaddr + iov.bv_offset;
> +               WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> +               kunmap_local(kaddr);
> +       }
>
>         return 0;
>  }
> @@ -1306,6 +1452,7 @@ static inline int putused_iotlb(const struct vringh *vrh,
>   * @features: the feature bits for this ring.
>   * @num: the number of elements.
>   * @weak_barriers: true if we only need memory barriers, not I/O.
> + * @use_va: true if IOTLB contains user VA
>   * @desc: the userpace descriptor pointer.
>   * @avail: the userpace avail pointer.
>   * @used: the userpace used pointer.
> @@ -1313,11 +1460,13 @@ static inline int putused_iotlb(const struct vringh *vrh,
>   * Returns an error if num is invalid.
>   */
>  int vringh_init_iotlb(struct vringh *vrh, u64 features,
> -                     unsigned int num, bool weak_barriers,
> +                     unsigned int num, bool weak_barriers, bool use_va,
>                       struct vring_desc *desc,
>                       struct vring_avail *avail,
>                       struct vring_used *used)
>  {
> +       vrh->use_va = use_va;
> +
>         return vringh_init_kern(vrh, features, num, weak_barriers,
>                                 desc, avail, used);
>  }
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 5/8] vdpa_sim: make devices agnostic for work management
  2023-03-02 11:34 ` [PATCH v2 5/8] vdpa_sim: make devices agnostic for work management Stefano Garzarella
@ 2023-03-03 14:40   ` Eugenio Perez Martin
  2023-03-14  5:27   ` Jason Wang
  1 sibling, 0 replies; 40+ messages in thread
From: Eugenio Perez Martin @ 2023-03-03 14:40 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: virtualization, Andrey Zhadchenko, netdev, stefanha,
	linux-kernel, Jason Wang, Michael S. Tsirkin, kvm

On Thu, Mar 2, 2023 at 12:35 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> Let's move work management inside the vdpa_sim core.
> This way we can easily change how we manage the works, without
> having to change the devices each time.
>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>

Acked-by: Eugenio Pérez Martin <eperezma@redhat.com>

> ---
>  drivers/vdpa/vdpa_sim/vdpa_sim.h     |  3 ++-
>  drivers/vdpa/vdpa_sim/vdpa_sim.c     | 17 +++++++++++++++--
>  drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  6 ++----
>  drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  6 ++----
>  4 files changed, 21 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> index 144858636c10..acee20faaf6a 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> @@ -45,7 +45,7 @@ struct vdpasim_dev_attr {
>         u32 ngroups;
>         u32 nas;
>
> -       work_func_t work_fn;
> +       void (*work_fn)(struct vdpasim *vdpasim);
>         void (*get_config)(struct vdpasim *vdpasim, void *config);
>         void (*set_config)(struct vdpasim *vdpasim, const void *config);
>         int (*get_stats)(struct vdpasim *vdpasim, u16 idx,
> @@ -78,6 +78,7 @@ struct vdpasim {
>
>  struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *attr,
>                                const struct vdpa_dev_set_config *config);
> +void vdpasim_schedule_work(struct vdpasim *vdpasim);
>
>  /* TODO: cross-endian support */
>  static inline bool vdpasim_is_little_endian(struct vdpasim *vdpasim)
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 481eb156658b..a6ee830efc38 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -116,6 +116,13 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
>  static const struct vdpa_config_ops vdpasim_config_ops;
>  static const struct vdpa_config_ops vdpasim_batch_config_ops;
>
> +static void vdpasim_work_fn(struct work_struct *work)
> +{
> +       struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
> +
> +       vdpasim->dev_attr.work_fn(vdpasim);
> +}
> +
>  struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
>                                const struct vdpa_dev_set_config *config)
>  {
> @@ -152,7 +159,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
>
>         vdpasim = vdpa_to_sim(vdpa);
>         vdpasim->dev_attr = *dev_attr;
> -       INIT_WORK(&vdpasim->work, dev_attr->work_fn);
> +       INIT_WORK(&vdpasim->work, vdpasim_work_fn);
>         spin_lock_init(&vdpasim->lock);
>         spin_lock_init(&vdpasim->iommu_lock);
>
> @@ -203,6 +210,12 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
>  }
>  EXPORT_SYMBOL_GPL(vdpasim_create);
>
> +void vdpasim_schedule_work(struct vdpasim *vdpasim)
> +{
> +       schedule_work(&vdpasim->work);
> +}
> +EXPORT_SYMBOL_GPL(vdpasim_schedule_work);
> +
>  static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
>                                   u64 desc_area, u64 driver_area,
>                                   u64 device_area)
> @@ -237,7 +250,7 @@ static void vdpasim_kick_vq(struct vdpa_device *vdpa, u16 idx)
>         }
>
>         if (vq->ready)
> -               schedule_work(&vdpasim->work);
> +               vdpasim_schedule_work(vdpasim);
>  }
>
>  static void vdpasim_set_vq_cb(struct vdpa_device *vdpa, u16 idx,
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> index 5117959bed8a..eb4897c8541e 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> @@ -11,7 +11,6 @@
>  #include <linux/module.h>
>  #include <linux/device.h>
>  #include <linux/kernel.h>
> -#include <linux/sched.h>
>  #include <linux/blkdev.h>
>  #include <linux/vringh.h>
>  #include <linux/vdpa.h>
> @@ -286,9 +285,8 @@ static bool vdpasim_blk_handle_req(struct vdpasim *vdpasim,
>         return handled;
>  }
>
> -static void vdpasim_blk_work(struct work_struct *work)
> +static void vdpasim_blk_work(struct vdpasim *vdpasim)
>  {
> -       struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
>         bool reschedule = false;
>         int i;
>
> @@ -326,7 +324,7 @@ static void vdpasim_blk_work(struct work_struct *work)
>         spin_unlock(&vdpasim->lock);
>
>         if (reschedule)
> -               schedule_work(&vdpasim->work);
> +               vdpasim_schedule_work(vdpasim);
>  }
>
>  static void vdpasim_blk_get_config(struct vdpasim *vdpasim, void *config)
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> index 862f405362de..e61a9ecbfafe 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> @@ -11,7 +11,6 @@
>  #include <linux/module.h>
>  #include <linux/device.h>
>  #include <linux/kernel.h>
> -#include <linux/sched.h>
>  #include <linux/etherdevice.h>
>  #include <linux/vringh.h>
>  #include <linux/vdpa.h>
> @@ -192,9 +191,8 @@ static void vdpasim_handle_cvq(struct vdpasim *vdpasim)
>         u64_stats_update_end(&net->cq_stats.syncp);
>  }
>
> -static void vdpasim_net_work(struct work_struct *work)
> +static void vdpasim_net_work(struct vdpasim *vdpasim)
>  {
> -       struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
>         struct vdpasim_virtqueue *txq = &vdpasim->vqs[1];
>         struct vdpasim_virtqueue *rxq = &vdpasim->vqs[0];
>         struct vdpasim_net *net = sim_to_net(vdpasim);
> @@ -260,7 +258,7 @@ static void vdpasim_net_work(struct work_struct *work)
>                 vdpasim_net_complete(rxq, write);
>
>                 if (tx_pkts > 4) {
> -                       schedule_work(&vdpasim->work);
> +                       vdpasim_schedule_work(vdpasim);
>                         goto out;
>                 }
>         }
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 6/8] vdpa_sim: use kthread worker
  2023-03-02 11:34 ` [PATCH v2 6/8] vdpa_sim: use kthread worker Stefano Garzarella
  2023-03-02 15:30   ` Simon Horman
@ 2023-03-05 11:21   ` kernel test robot
  2023-03-14  5:31   ` Jason Wang
  2 siblings, 0 replies; 40+ messages in thread
From: kernel test robot @ 2023-03-05 11:21 UTC (permalink / raw)
  To: Stefano Garzarella, virtualization
  Cc: llvm, oe-kbuild-all, Andrey Zhadchenko, eperezma, netdev,
	stefanha, linux-kernel, Jason Wang, Michael S. Tsirkin, kvm,
	Stefano Garzarella

Hi Stefano,

I love your patch! Perhaps something to improve:

[auto build test WARNING on mst-vhost/linux-next]
[also build test WARNING on linus/master next-20230303]
[cannot apply to v6.2]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Stefano-Garzarella/vdpa-add-bind_mm-unbind_mm-callbacks/20230302-193850
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
patch link:    https://lore.kernel.org/r/20230302113421.174582-7-sgarzare%40redhat.com
patch subject: [PATCH v2 6/8] vdpa_sim: use kthread worker
config: i386-allyesconfig (https://download.01.org/0day-ci/archive/20230305/202303051841.bPAIzJRy-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/5b2107457ac0e7b1bb0aa3635ebf13b02e82bb78
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Stefano-Garzarella/vdpa-add-bind_mm-unbind_mm-callbacks/20230302-193850
        git checkout 5b2107457ac0e7b1bb0aa3635ebf13b02e82bb78
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash drivers/net/wireless/ath/ath10k/ drivers/vdpa/vdpa_sim/ fs/erofs/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303051841.bPAIzJRy-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/vdpa/vdpa_sim/vdpa_sim.c:166:6: warning: variable 'dev' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
           if (IS_ERR(vdpasim->worker))
               ^~~~~~~~~~~~~~~~~~~~~~~
   drivers/vdpa/vdpa_sim/vdpa_sim.c:213:13: note: uninitialized use occurs here
           put_device(dev);
                      ^~~
   drivers/vdpa/vdpa_sim/vdpa_sim.c:166:2: note: remove the 'if' if its condition is always false
           if (IS_ERR(vdpasim->worker))
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/vdpa/vdpa_sim/vdpa_sim.c:132:20: note: initialize the variable 'dev' to silence this warning
           struct device *dev;
                             ^
                              = NULL
   1 warning generated.


vim +166 drivers/vdpa/vdpa_sim/vdpa_sim.c

   125	
   126	struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
   127				       const struct vdpa_dev_set_config *config)
   128	{
   129		const struct vdpa_config_ops *ops;
   130		struct vdpa_device *vdpa;
   131		struct vdpasim *vdpasim;
   132		struct device *dev;
   133		int i, ret = -ENOMEM;
   134	
   135		if (!dev_attr->alloc_size)
   136			return ERR_PTR(-EINVAL);
   137	
   138		if (config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) {
   139			if (config->device_features &
   140			    ~dev_attr->supported_features)
   141				return ERR_PTR(-EINVAL);
   142			dev_attr->supported_features =
   143				config->device_features;
   144		}
   145	
   146		if (batch_mapping)
   147			ops = &vdpasim_batch_config_ops;
   148		else
   149			ops = &vdpasim_config_ops;
   150	
   151		vdpa = __vdpa_alloc_device(NULL, ops,
   152					   dev_attr->ngroups, dev_attr->nas,
   153					   dev_attr->alloc_size,
   154					   dev_attr->name, false);
   155		if (IS_ERR(vdpa)) {
   156			ret = PTR_ERR(vdpa);
   157			goto err_alloc;
   158		}
   159	
   160		vdpasim = vdpa_to_sim(vdpa);
   161		vdpasim->dev_attr = *dev_attr;
   162	
   163		kthread_init_work(&vdpasim->work, vdpasim_work_fn);
   164		vdpasim->worker = kthread_create_worker(0, "vDPA sim worker: %s",
   165							dev_attr->name);
 > 166		if (IS_ERR(vdpasim->worker))
   167			goto err_iommu;
   168	
   169		spin_lock_init(&vdpasim->lock);
   170		spin_lock_init(&vdpasim->iommu_lock);
   171	
   172		dev = &vdpasim->vdpa.dev;
   173		dev->dma_mask = &dev->coherent_dma_mask;
   174		if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)))
   175			goto err_iommu;
   176		vdpasim->vdpa.mdev = dev_attr->mgmt_dev;
   177	
   178		vdpasim->config = kzalloc(dev_attr->config_size, GFP_KERNEL);
   179		if (!vdpasim->config)
   180			goto err_iommu;
   181	
   182		vdpasim->vqs = kcalloc(dev_attr->nvqs, sizeof(struct vdpasim_virtqueue),
   183				       GFP_KERNEL);
   184		if (!vdpasim->vqs)
   185			goto err_iommu;
   186	
   187		vdpasim->iommu = kmalloc_array(vdpasim->dev_attr.nas,
   188					       sizeof(*vdpasim->iommu), GFP_KERNEL);
   189		if (!vdpasim->iommu)
   190			goto err_iommu;
   191	
   192		vdpasim->iommu_pt = kmalloc_array(vdpasim->dev_attr.nas,
   193						  sizeof(*vdpasim->iommu_pt), GFP_KERNEL);
   194		if (!vdpasim->iommu_pt)
   195			goto err_iommu;
   196	
   197		for (i = 0; i < vdpasim->dev_attr.nas; i++)
   198			vhost_iotlb_init(&vdpasim->iommu[i], max_iotlb_entries, 0);
   199	
   200		vdpasim->buffer = kvmalloc(dev_attr->buffer_size, GFP_KERNEL);
   201		if (!vdpasim->buffer)
   202			goto err_iommu;
   203	
   204		for (i = 0; i < dev_attr->nvqs; i++)
   205			vringh_set_iotlb(&vdpasim->vqs[i].vring, &vdpasim->iommu[0],
   206					 &vdpasim->iommu_lock);
   207	
   208		vdpasim->vdpa.dma_dev = dev;
   209	
   210		return vdpasim;
   211	
   212	err_iommu:
   213		put_device(dev);
   214	err_alloc:
   215		return ERR_PTR(ret);
   216	}
   217	EXPORT_SYMBOL_GPL(vdpasim_create);
   218	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 4/8] vringh: support VA with iotlb
  2023-03-03 14:38   ` Eugenio Perez Martin
@ 2023-03-07  9:31     ` Stefano Garzarella
  2023-03-16 16:07     ` Stefano Garzarella
  1 sibling, 0 replies; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-07  9:31 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: virtualization, Andrey Zhadchenko, netdev, stefanha,
	linux-kernel, Jason Wang, Michael S. Tsirkin, kvm

On Fri, Mar 03, 2023 at 03:38:57PM +0100, Eugenio Perez Martin wrote:
>On Thu, Mar 2, 2023 at 12:35 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> vDPA supports the possibility to use user VA in the iotlb messages.
>> So, let's add support for user VA in vringh to use it in the vDPA
>> simulators.
>>
>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> ---
>>
>> Notes:
>>     v2:
>>     - replace kmap_atomic() with kmap_local_page() [see previous patch]
>>     - fix cast warnings when build with W=1 C=1
>>
>>  include/linux/vringh.h            |   5 +-
>>  drivers/vdpa/mlx5/net/mlx5_vnet.c |   2 +-
>>  drivers/vdpa/vdpa_sim/vdpa_sim.c  |   4 +-
>>  drivers/vhost/vringh.c            | 247 ++++++++++++++++++++++++------
>>  4 files changed, 205 insertions(+), 53 deletions(-)
>>
>> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
>> index 1991a02c6431..d39b9f2dcba0 100644
>> --- a/include/linux/vringh.h
>> +++ b/include/linux/vringh.h
>> @@ -32,6 +32,9 @@ struct vringh {
>>         /* Can we get away with weak barriers? */
>>         bool weak_barriers;
>>
>> +       /* Use user's VA */
>> +       bool use_va;
>> +
>>         /* Last available index we saw (ie. where we're up to). */
>>         u16 last_avail_idx;
>>
>> @@ -279,7 +282,7 @@ void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
>>                       spinlock_t *iotlb_lock);
>>
>>  int vringh_init_iotlb(struct vringh *vrh, u64 features,
>> -                     unsigned int num, bool weak_barriers,
>> +                     unsigned int num, bool weak_barriers, bool use_va,
>>                       struct vring_desc *desc,
>>                       struct vring_avail *avail,
>>                       struct vring_used *used);
>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> index 3a0e721aef05..babc8dd171a6 100644
>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> @@ -2537,7 +2537,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
>>
>>         if (mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
>>                 err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
>> -                                       MLX5_CVQ_MAX_ENT, false,
>> +                                       MLX5_CVQ_MAX_ENT, false, false,
>>                                         (struct vring_desc *)(uintptr_t)cvq->desc_addr,
>>                                         (struct vring_avail *)(uintptr_t)cvq->driver_addr,
>>                                         (struct vring_used *)(uintptr_t)cvq->device_addr);
>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> index 6a0a65814626..481eb156658b 100644
>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> @@ -60,7 +60,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
>>         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
>>         uint16_t last_avail_idx = vq->vring.last_avail_idx;
>>
>> -       vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
>> +       vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
>>                           (struct vring_desc *)(uintptr_t)vq->desc_addr,
>>                           (struct vring_avail *)
>>                           (uintptr_t)vq->driver_addr,
>> @@ -81,7 +81,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
>>         vq->cb = NULL;
>>         vq->private = NULL;
>>         vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features,
>> -                         VDPASIM_QUEUE_MAX, false, NULL, NULL, NULL);
>> +                         VDPASIM_QUEUE_MAX, false, false, NULL, NULL, NULL);
>>
>>         vq->vring.notify = NULL;
>>  }
>> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
>> index 0ba3ef809e48..61c79cea44ca 100644
>> --- a/drivers/vhost/vringh.c
>> +++ b/drivers/vhost/vringh.c
>> @@ -1094,15 +1094,99 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
>>
>>  #if IS_REACHABLE(CONFIG_VHOST_IOTLB)
>>
>> -static int iotlb_translate(const struct vringh *vrh,
>> -                          u64 addr, u64 len, u64 *translated,
>> -                          struct bio_vec iov[],
>> -                          int iov_size, u32 perm)
>> +static int iotlb_translate_va(const struct vringh *vrh,
>> +                             u64 addr, u64 len, u64 *translated,
>> +                             struct iovec iov[],
>> +                             int iov_size, u32 perm)
>>  {
>>         struct vhost_iotlb_map *map;
>>         struct vhost_iotlb *iotlb = vrh->iotlb;
>> +       u64 s = 0, last = addr + len - 1;
>>         int ret = 0;
>> +
>> +       spin_lock(vrh->iotlb_lock);
>> +
>> +       while (len > s) {
>> +               u64 size;
>> +
>> +               if (unlikely(ret >= iov_size)) {
>> +                       ret = -ENOBUFS;
>> +                       break;
>> +               }
>> +
>> +               map = vhost_iotlb_itree_first(iotlb, addr, last);
>> +               if (!map || map->start > addr) {
>> +                       ret = -EINVAL;
>> +                       break;
>> +               } else if (!(map->perm & perm)) {
>> +                       ret = -EPERM;
>> +                       break;
>> +               }
>> +
>> +               size = map->size - addr + map->start;
>> +               iov[ret].iov_len = min(len - s, size);
>> +               iov[ret].iov_base = (void __user *)(unsigned long)
>> +                                   (map->addr + addr - map->start);
>> +               s += size;
>> +               addr += size;
>> +               ++ret;
>> +       }
>> +
>> +       spin_unlock(vrh->iotlb_lock);
>> +
>> +       if (translated)
>> +               *translated = min(len, s);
>> +
>> +       return ret;
>> +}
>
>It seems to me iotlb_translate_va and iotlb_translate_pa are very
>similar; their only difference is that the iov argument is an iovec
>instead of a bio_vec. And how to fill it, obviously.
>
>It would be great to merge both functions, only differing with a
>conditional on vrh->use_va, or generics, or similar. Or, if following
>the style of the rest of vringh code, to provide a callback to fill
>iovec (although I like conditional more).
>
>However I cannot think of an easy way to perform that without long
>macros or type erasure.

I agree and I tried, but it got messy and I let it go.

But maybe with the callback it shouldn't be too messy, I can try it and
see what comes out :-)
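
Roughly like this, maybe (names invented, untested):

	typedef void (*iotlb_fill_t)(void *iov, int i, u64 io_addr,
				     size_t io_len);

	static void iov_fill_va(void *iov, int i, u64 io_addr, size_t io_len)
	{
		struct iovec *iovec = iov;

		iovec[i].iov_len = io_len;
		iovec[i].iov_base = (void __user *)(uintptr_t)io_addr;
	}

	static void iov_fill_pa(void *iov, int i, u64 io_addr, size_t io_len)
	{
		struct bio_vec *bvec = iov;

		bvec_set_page(&bvec[i], pfn_to_page(io_addr >> PAGE_SHIFT),
			      io_len, io_addr & (PAGE_SIZE - 1));
	}

with a single iotlb_translate() keeping the IOTLB walk and calling
fill(iov, ret, map->addr + addr - map->start, min(len - s, size))
for each entry, picking the filler on vrh->use_va.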

>
>> +
>> +static inline int copy_from_va(const struct vringh *vrh, void *dst, void *src,
>> +                              u64 len, u64 *translated)
>> +{
>> +       struct iovec iov[16];
>> +       struct iov_iter iter;
>> +       int ret;
>> +
>> +       ret = iotlb_translate_va(vrh, (u64)(uintptr_t)src, len, translated, iov,
>> +                                ARRAY_SIZE(iov), VHOST_MAP_RO);
>> +       if (ret == -ENOBUFS)
>> +               ret = ARRAY_SIZE(iov);
>> +       else if (ret < 0)
>> +               return ret;
>> +
>> +       iov_iter_init(&iter, ITER_SOURCE, iov, ret, *translated);
>> +
>> +       return copy_from_iter(dst, *translated, &iter);
>
>Maybe a good baby step for DRY is to return the iov_iter in
>copy_from/to_va/pa here?

Good point! I'll try it.

>
>But I'm ok with this version too.
>
>Acked-by: Eugenio Pérez Martin <eperezma@redhat.com>

Thanks for the review!
Stefano


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 1/8] vdpa: add bind_mm/unbind_mm callbacks
  2023-03-02 11:34 ` [PATCH v2 1/8] vdpa: add bind_mm/unbind_mm callbacks Stefano Garzarella
@ 2023-03-14  3:39   ` Jason Wang
  2023-03-16  8:17     ` Stefano Garzarella
  0 siblings, 1 reply; 40+ messages in thread
From: Jason Wang @ 2023-03-14  3:39 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: virtualization, Andrey Zhadchenko, eperezma, netdev, stefanha,
	linux-kernel, Michael S. Tsirkin, kvm

On Thu, Mar 2, 2023 at 7:34 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> These new optional callbacks are used to bind/unbind the device to
> a specific address space so the vDPA framework can use VA when
> these callbacks are implemented.
>
> Suggested-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---

One thing that came into my mind is that after this commit:

commit 5ce995f313ce56c0c62425c3ddc37c5c50fc33db
Author: Jason Wang <jasowang@redhat.com>
Date:   Fri May 29 16:02:59 2020 +0800

    vhost: use mmgrab() instead of mmget() for non worker device

    For the device that doesn't use vhost worker and use_mm(), mmget() is
    too heavy weight and it may brings troubles for implementing mmap()
    support for vDPA device.

We don't hold the address space after this commit, so the userspace
mapping could be invalid if the owner exits?
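
To illustrate the difference (rough sketch, not from the series):

	/*
	 * mmgrab() only pins the mm_struct itself (mm_count); when the
	 * owner exits, exit_mmap() still tears the mappings down. To
	 * dereference user VAs from a kernel thread, the address space
	 * itself must be pinned as well:
	 */
	if (mmget_not_zero(mm)) {	/* mm_users: keeps mappings alive */
		kthread_use_mm(mm);
		/* ... access user VA ... */
		kthread_unuse_mm(mm);
		mmput(mm);
	}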

Thanks

>
> Notes:
>     v2:
>     - removed `struct task_struct *owner` param (unused for now, maybe
>       useful to support cgroups) [Jason]
>     - add unbind_mm callback [Jason]
>
>  include/linux/vdpa.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index 43f59ef10cc9..369c21394284 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -290,6 +290,14 @@ struct vdpa_map_file {
>   *                             @vdev: vdpa device
>   *                             @idx: virtqueue index
>   *                             Returns pointer to structure device or error (NULL)
> + * @bind_mm:                   Bind the device to a specific address space
> + *                             so the vDPA framework can use VA when this
> + *                             callback is implemented. (optional)
> + *                             @vdev: vdpa device
> + *                             @mm: address space to bind
> + * @unbind_mm:                 Unbind the device from the address space
> + *                             bound using the bind_mm callback. (optional)
> + *                             @vdev: vdpa device
>   * @free:                      Free resources that belongs to vDPA (optional)
>   *                             @vdev: vdpa device
>   */
> @@ -351,6 +359,8 @@ struct vdpa_config_ops {
>         int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
>                               unsigned int asid);
>         struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
> +       int (*bind_mm)(struct vdpa_device *vdev, struct mm_struct *mm);
> +       void (*unbind_mm)(struct vdpa_device *vdev);
>
>         /* Free device resources */
>         void (*free)(struct vdpa_device *vdev);
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/8] vhost-vdpa: use bind_mm/unbind_mm device callbacks
  2023-03-02 11:34 ` [PATCH v2 2/8] vhost-vdpa: use bind_mm/unbind_mm device callbacks Stefano Garzarella
@ 2023-03-14  3:48   ` Jason Wang
  2023-03-16  8:31     ` Stefano Garzarella
  0 siblings, 1 reply; 40+ messages in thread
From: Jason Wang @ 2023-03-14  3:48 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: virtualization, Andrey Zhadchenko, eperezma, netdev, stefanha,
	linux-kernel, Michael S. Tsirkin, kvm

On Thu, Mar 2, 2023 at 7:34 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> When the user calls the VHOST_SET_OWNER ioctl and the vDPA device
> has `use_va` set to true, let's call the bind_mm callback.
> In this way we can bind the device to the user address space
> and directly use the user VA.
>
> The unbind_mm callback is called during the release after
> stopping the device.
>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>
> Notes:
>     v2:
>     - call the new unbind_mm callback during the release [Jason]
>     - avoid to call bind_mm callback after the reset, since the device
>       is not detaching it now during the reset
>
>  drivers/vhost/vdpa.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index dc12dbd5b43b..1ab89fccd825 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -219,6 +219,28 @@ static int vhost_vdpa_reset(struct vhost_vdpa *v)
>         return vdpa_reset(vdpa);
>  }
>
> +static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
> +{
> +       struct vdpa_device *vdpa = v->vdpa;
> +       const struct vdpa_config_ops *ops = vdpa->config;
> +
> +       if (!vdpa->use_va || !ops->bind_mm)
> +               return 0;
> +
> +       return ops->bind_mm(vdpa, v->vdev.mm);
> +}
> +
> +static void vhost_vdpa_unbind_mm(struct vhost_vdpa *v)
> +{
> +       struct vdpa_device *vdpa = v->vdpa;
> +       const struct vdpa_config_ops *ops = vdpa->config;
> +
> +       if (!vdpa->use_va || !ops->unbind_mm)
> +               return;
> +
> +       ops->unbind_mm(vdpa);
> +}
> +
>  static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp)
>  {
>         struct vdpa_device *vdpa = v->vdpa;
> @@ -711,6 +733,13 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
>                 break;
>         default:
>                 r = vhost_dev_ioctl(&v->vdev, cmd, argp);
> +               if (!r && cmd == VHOST_SET_OWNER) {
> +                       r = vhost_vdpa_bind_mm(v);
> +                       if (r) {
> +                               vhost_dev_reset_owner(&v->vdev, NULL);
> +                               break;
> +                       }
> +               }

Nit: would it be better to have a separate switch branch instead of
putting this under default? (as vring_ioctl() does)
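
E.g. something like (just a sketch):

	case VHOST_SET_OWNER:
		r = vhost_dev_ioctl(&v->vdev, cmd, argp);
		if (!r) {
			r = vhost_vdpa_bind_mm(v);
			if (r)
				vhost_dev_reset_owner(&v->vdev, NULL);
		}
		break;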

Thanks

>                 if (r == -ENOIOCTLCMD)
>                         r = vhost_vdpa_vring_ioctl(v, cmd, argp);
>                 break;
> @@ -1285,6 +1314,7 @@ static int vhost_vdpa_release(struct inode *inode, struct file *filep)
>         vhost_vdpa_clean_irq(v);
>         vhost_vdpa_reset(v);
>         vhost_dev_stop(&v->vdev);
> +       vhost_vdpa_unbind_mm(v);
>         vhost_vdpa_free_domain(v);
>         vhost_vdpa_config_put(v);
>         vhost_vdpa_cleanup(v);
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 3/8] vringh: replace kmap_atomic() with kmap_local_page()
  2023-03-02 11:34 ` [PATCH v2 3/8] vringh: replace kmap_atomic() with kmap_local_page() Stefano Garzarella
@ 2023-03-14  3:56   ` Jason Wang
  2023-03-15 21:12     ` Fabio M. De Francesco
  2023-03-16  9:13   ` Fabio M. De Francesco
  1 sibling, 1 reply; 40+ messages in thread
From: Jason Wang @ 2023-03-14  3:56 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: virtualization, Andrey Zhadchenko, eperezma, netdev, stefanha,
	linux-kernel, Michael S. Tsirkin, kvm

On Thu, Mar 2, 2023 at 7:34 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> kmap_atomic() is deprecated in favor of kmap_local_page().

It's better to mention the commit or the code that introduced this.

>
> With kmap_local_page() the mappings are per thread, CPU local, can take
> page-faults, and can be called from any context (including interrupts).
> Furthermore, the tasks can be preempted and, when they are scheduled to
> run again, the kernel virtual addresses are restored and still valid.
>
> kmap_atomic() is implemented like a kmap_local_page() which also disables
> page-faults and preemption (the latter only for !PREEMPT_RT kernels,
> otherwise it only disables migration).
>
> The code within the mappings/un-mappings in getu16_iotlb() and
> putu16_iotlb() don't depend on the above-mentioned side effects of
> kmap_atomic(),

Note that we used to use a spinlock to protect the simulators (at
least until patch 7, so we probably need to re-order the patches), so
I think this is only valid when:

The vringh IOTLB helpers are not used in atomic context (e.g. under a
spinlock or from interrupt context).

If yes, should we document this? (Or should we introduce a boolean to
say whether an IOTLB variant can be used in an atomic context)?
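
E.g. a note in the vringh_init_iotlb() kernel-doc (wording is just a
suggestion):

	 * @use_va: true if IOTLB contains user VA. In this case the vringh
	 *          helpers may fault and sleep, so they must not be called
	 *          from atomic context (e.g. under a spinlock or from
	 *          interrupt context).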

Thanks

> so that mere replacements of the old API with the new one
> is all that is required (i.e., there is no need to explicitly add calls
> to pagefault_disable() and/or preempt_disable()).
>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>
> Notes:
>     v2:
>     - added this patch since checkpatch.pl complained about deprecation
>       of kmap_atomic() touched by next patch
>
>  drivers/vhost/vringh.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index a1e27da54481..0ba3ef809e48 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -1220,10 +1220,10 @@ static inline int getu16_iotlb(const struct vringh *vrh,
>         if (ret < 0)
>                 return ret;
>
> -       kaddr = kmap_atomic(iov.bv_page);
> +       kaddr = kmap_local_page(iov.bv_page);
>         from = kaddr + iov.bv_offset;
>         *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> -       kunmap_atomic(kaddr);
> +       kunmap_local(kaddr);
>
>         return 0;
>  }
> @@ -1241,10 +1241,10 @@ static inline int putu16_iotlb(const struct vringh *vrh,
>         if (ret < 0)
>                 return ret;
>
> -       kaddr = kmap_atomic(iov.bv_page);
> +       kaddr = kmap_local_page(iov.bv_page);
>         to = kaddr + iov.bv_offset;
>         WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> -       kunmap_atomic(kaddr);
> +       kunmap_local(kaddr);
>
>         return 0;
>  }
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 4/8] vringh: support VA with iotlb
  2023-03-02 11:34 ` [PATCH v2 4/8] vringh: support VA with iotlb Stefano Garzarella
  2023-03-03 14:38   ` Eugenio Perez Martin
@ 2023-03-14  4:53   ` Jason Wang
  2023-03-16  8:38     ` Stefano Garzarella
  1 sibling, 1 reply; 40+ messages in thread
From: Jason Wang @ 2023-03-14  4:53 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: virtualization, Andrey Zhadchenko, eperezma, netdev, stefanha,
	linux-kernel, Michael S. Tsirkin, kvm

On Thu, Mar 2, 2023 at 7:35 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> vDPA supports the possibility to use user VA in the iotlb messages.
> So, let's add support for user VA in vringh to use it in the vDPA
> simulators.
>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>
> Notes:
>     v2:
>     - replace kmap_atomic() with kmap_local_page() [see previous patch]
>     - fix cast warnings when build with W=1 C=1
>
>  include/linux/vringh.h            |   5 +-
>  drivers/vdpa/mlx5/net/mlx5_vnet.c |   2 +-
>  drivers/vdpa/vdpa_sim/vdpa_sim.c  |   4 +-
>  drivers/vhost/vringh.c            | 247 ++++++++++++++++++++++++------
>  4 files changed, 205 insertions(+), 53 deletions(-)
>
> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
> index 1991a02c6431..d39b9f2dcba0 100644
> --- a/include/linux/vringh.h
> +++ b/include/linux/vringh.h
> @@ -32,6 +32,9 @@ struct vringh {
>         /* Can we get away with weak barriers? */
>         bool weak_barriers;
>
> +       /* Use user's VA */
> +       bool use_va;
> +
>         /* Last available index we saw (ie. where we're up to). */
>         u16 last_avail_idx;
>
> @@ -279,7 +282,7 @@ void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
>                       spinlock_t *iotlb_lock);
>
>  int vringh_init_iotlb(struct vringh *vrh, u64 features,
> -                     unsigned int num, bool weak_barriers,
> +                     unsigned int num, bool weak_barriers, bool use_va,
>                       struct vring_desc *desc,
>                       struct vring_avail *avail,
>                       struct vring_used *used);
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 3a0e721aef05..babc8dd171a6 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2537,7 +2537,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
>
>         if (mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
>                 err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
> -                                       MLX5_CVQ_MAX_ENT, false,
> +                                       MLX5_CVQ_MAX_ENT, false, false,
>                                         (struct vring_desc *)(uintptr_t)cvq->desc_addr,
>                                         (struct vring_avail *)(uintptr_t)cvq->driver_addr,
>                                         (struct vring_used *)(uintptr_t)cvq->device_addr);
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 6a0a65814626..481eb156658b 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -60,7 +60,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
>         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
>         uint16_t last_avail_idx = vq->vring.last_avail_idx;
>
> -       vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
> +       vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
>                           (struct vring_desc *)(uintptr_t)vq->desc_addr,
>                           (struct vring_avail *)
>                           (uintptr_t)vq->driver_addr,
> @@ -81,7 +81,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
>         vq->cb = NULL;
>         vq->private = NULL;
>         vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features,
> -                         VDPASIM_QUEUE_MAX, false, NULL, NULL, NULL);
> +                         VDPASIM_QUEUE_MAX, false, false, NULL, NULL, NULL);
>
>         vq->vring.notify = NULL;
>  }
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index 0ba3ef809e48..61c79cea44ca 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -1094,15 +1094,99 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
>
>  #if IS_REACHABLE(CONFIG_VHOST_IOTLB)
>
> -static int iotlb_translate(const struct vringh *vrh,
> -                          u64 addr, u64 len, u64 *translated,
> -                          struct bio_vec iov[],
> -                          int iov_size, u32 perm)
> +static int iotlb_translate_va(const struct vringh *vrh,
> +                             u64 addr, u64 len, u64 *translated,
> +                             struct iovec iov[],
> +                             int iov_size, u32 perm)
>  {
>         struct vhost_iotlb_map *map;
>         struct vhost_iotlb *iotlb = vrh->iotlb;
> +       u64 s = 0, last = addr + len - 1;
>         int ret = 0;
> +
> +       spin_lock(vrh->iotlb_lock);
> +
> +       while (len > s) {
> +               u64 size;
> +
> +               if (unlikely(ret >= iov_size)) {
> +                       ret = -ENOBUFS;
> +                       break;
> +               }
> +
> +               map = vhost_iotlb_itree_first(iotlb, addr, last);
> +               if (!map || map->start > addr) {
> +                       ret = -EINVAL;
> +                       break;
> +               } else if (!(map->perm & perm)) {
> +                       ret = -EPERM;
> +                       break;
> +               }
> +
> +               size = map->size - addr + map->start;
> +               iov[ret].iov_len = min(len - s, size);
> +               iov[ret].iov_base = (void __user *)(unsigned long)
> +                                   (map->addr + addr - map->start);
> +               s += size;
> +               addr += size;
> +               ++ret;
> +       }
> +
> +       spin_unlock(vrh->iotlb_lock);
> +
> +       if (translated)
> +               *translated = min(len, s);
> +
> +       return ret;
> +}
> +
> +static inline int copy_from_va(const struct vringh *vrh, void *dst, void *src,
> +                              u64 len, u64 *translated)
> +{
> +       struct iovec iov[16];
> +       struct iov_iter iter;
> +       int ret;
> +
> +       ret = iotlb_translate_va(vrh, (u64)(uintptr_t)src, len, translated, iov,
> +                                ARRAY_SIZE(iov), VHOST_MAP_RO);
> +       if (ret == -ENOBUFS)
> +               ret = ARRAY_SIZE(iov);
> +       else if (ret < 0)
> +               return ret;
> +
> +       iov_iter_init(&iter, ITER_SOURCE, iov, ret, *translated);
> +
> +       return copy_from_iter(dst, *translated, &iter);
> +}
> +
> +static inline int copy_to_va(const struct vringh *vrh, void *dst, void *src,
> +                            u64 len, u64 *translated)
> +{
> +       struct iovec iov[16];
> +       struct iov_iter iter;
> +       int ret;
> +
> +       ret = iotlb_translate_va(vrh, (u64)(uintptr_t)dst, len, translated, iov,
> +                                ARRAY_SIZE(iov), VHOST_MAP_WO);
> +       if (ret == -ENOBUFS)
> +               ret = ARRAY_SIZE(iov);
> +       else if (ret < 0)
> +               return ret;
> +
> +       iov_iter_init(&iter, ITER_DEST, iov, ret, *translated);
> +
> +       return copy_to_iter(src, *translated, &iter);
> +}
> +
> +static int iotlb_translate_pa(const struct vringh *vrh,
> +                             u64 addr, u64 len, u64 *translated,
> +                             struct bio_vec iov[],
> +                             int iov_size, u32 perm)
> +{
> +       struct vhost_iotlb_map *map;
> +       struct vhost_iotlb *iotlb = vrh->iotlb;
>         u64 s = 0, last = addr + len - 1;
> +       int ret = 0;
>
>         spin_lock(vrh->iotlb_lock);
>
> @@ -1141,28 +1225,61 @@ static int iotlb_translate(const struct vringh *vrh,
>         return ret;
>  }
>
> +static inline int copy_from_pa(const struct vringh *vrh, void *dst, void *src,
> +                              u64 len, u64 *translated)
> +{
> +       struct bio_vec iov[16];
> +       struct iov_iter iter;
> +       int ret;
> +
> +       ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)src, len, translated, iov,
> +                                ARRAY_SIZE(iov), VHOST_MAP_RO);
> +       if (ret == -ENOBUFS)
> +               ret = ARRAY_SIZE(iov);
> +       else if (ret < 0)
> +               return ret;
> +
> +       iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, *translated);
> +
> +       return copy_from_iter(dst, *translated, &iter);
> +}
> +
> +static inline int copy_to_pa(const struct vringh *vrh, void *dst, void *src,
> +                            u64 len, u64 *translated)
> +{
> +       struct bio_vec iov[16];
> +       struct iov_iter iter;
> +       int ret;
> +
> +       ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)dst, len, translated, iov,
> +                                ARRAY_SIZE(iov), VHOST_MAP_WO);
> +       if (ret == -ENOBUFS)
> +               ret = ARRAY_SIZE(iov);
> +       else if (ret < 0)
> +               return ret;
> +
> +       iov_iter_bvec(&iter, ITER_DEST, iov, ret, *translated);
> +
> +       return copy_to_iter(src, *translated, &iter);
> +}
> +
>  static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
>                                   void *src, size_t len)
>  {
>         u64 total_translated = 0;
>
>         while (total_translated < len) {
> -               struct bio_vec iov[16];
> -               struct iov_iter iter;
>                 u64 translated;
>                 int ret;
>
> -               ret = iotlb_translate(vrh, (u64)(uintptr_t)src,
> -                                     len - total_translated, &translated,
> -                                     iov, ARRAY_SIZE(iov), VHOST_MAP_RO);
> -               if (ret == -ENOBUFS)
> -                       ret = ARRAY_SIZE(iov);
> -               else if (ret < 0)
> -                       return ret;
> -
> -               iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, translated);
> +               if (vrh->use_va) {
> +                       ret = copy_from_va(vrh, dst, src,
> +                                          len - total_translated, &translated);
> +               } else {
> +                       ret = copy_from_pa(vrh, dst, src,
> +                                          len - total_translated, &translated);
> +               }
>
> -               ret = copy_from_iter(dst, translated, &iter);
>                 if (ret < 0)
>                         return ret;
>
> @@ -1180,22 +1297,17 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
>         u64 total_translated = 0;
>
>         while (total_translated < len) {
> -               struct bio_vec iov[16];
> -               struct iov_iter iter;
>                 u64 translated;
>                 int ret;
>
> -               ret = iotlb_translate(vrh, (u64)(uintptr_t)dst,
> -                                     len - total_translated, &translated,
> -                                     iov, ARRAY_SIZE(iov), VHOST_MAP_WO);
> -               if (ret == -ENOBUFS)
> -                       ret = ARRAY_SIZE(iov);
> -               else if (ret < 0)
> -                       return ret;
> -
> -               iov_iter_bvec(&iter, ITER_DEST, iov, ret, translated);
> +               if (vrh->use_va) {
> +                       ret = copy_to_va(vrh, dst, src,
> +                                        len - total_translated, &translated);
> +               } else {
> +                       ret = copy_to_pa(vrh, dst, src,
> +                                        len - total_translated, &translated);
> +               }
>
> -               ret = copy_to_iter(src, translated, &iter);
>                 if (ret < 0)
>                         return ret;
>
> @@ -1210,20 +1322,37 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
>  static inline int getu16_iotlb(const struct vringh *vrh,
>                                u16 *val, const __virtio16 *p)
>  {
> -       struct bio_vec iov;
> -       void *kaddr, *from;
>         int ret;
>
>         /* Atomic read is needed for getu16 */
> -       ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> -                             &iov, 1, VHOST_MAP_RO);
> -       if (ret < 0)
> -               return ret;
> +       if (vrh->use_va) {
> +               struct iovec iov;
> +               __virtio16 tmp;
> +
> +               ret = iotlb_translate_va(vrh, (u64)(uintptr_t)p, sizeof(*p),
> +                                        NULL, &iov, 1, VHOST_MAP_RO);
> +               if (ret < 0)
> +                       return ret;

Nit: since we have copy_to_va/copy_to_pa variants, let's introduce
getu16_iotlb_va/pa variants?
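
E.g. a rough sketch (untested, names are just suggestions):

static inline int getu16_iotlb(const struct vringh *vrh,
                               u16 *val, const __virtio16 *p)
{
        /* Atomic read is needed for getu16 */
        if (vrh->use_va)
                return getu16_iotlb_va(vrh, val, p);

        return getu16_iotlb_pa(vrh, val, p);
}

where getu16_iotlb_va()/getu16_iotlb_pa() simply contain the two
branches quoted below.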

>
> -       kaddr = kmap_local_page(iov.bv_page);
> -       from = kaddr + iov.bv_offset;
> -       *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> -       kunmap_local(kaddr);
> +               ret = __get_user(tmp, (__virtio16 __user *)iov.iov_base);
> +               if (ret)
> +                       return ret;
> +
> +               *val = vringh16_to_cpu(vrh, tmp);
> +       } else {
> +               struct bio_vec iov;
> +               void *kaddr, *from;
> +
> +               ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)p, sizeof(*p),
> +                                        NULL, &iov, 1, VHOST_MAP_RO);
> +               if (ret < 0)
> +                       return ret;
> +
> +               kaddr = kmap_local_page(iov.bv_page);

If we decide to have a use_va switch, is kmap_local_page() still required here?

The rest looks good.

Thanks

> +               from = kaddr + iov.bv_offset;
> +               *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> +               kunmap_local(kaddr);
> +       }
>
>         return 0;
>  }
> @@ -1231,20 +1360,37 @@ static inline int getu16_iotlb(const struct vringh *vrh,
>  static inline int putu16_iotlb(const struct vringh *vrh,
>                                __virtio16 *p, u16 val)
>  {
> -       struct bio_vec iov;
> -       void *kaddr, *to;
>         int ret;
>
>         /* Atomic write is needed for putu16 */
> -       ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> -                             &iov, 1, VHOST_MAP_WO);
> -       if (ret < 0)
> -               return ret;
> +       if (vrh->use_va) {
> +               struct iovec iov;
> +               __virtio16 tmp;
>
> -       kaddr = kmap_local_page(iov.bv_page);
> -       to = kaddr + iov.bv_offset;
> -       WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> -       kunmap_local(kaddr);
> +               ret = iotlb_translate_va(vrh, (u64)(uintptr_t)p, sizeof(*p),
> +                                        NULL, &iov, 1, VHOST_MAP_WO);
> +               if (ret < 0)
> +                       return ret;
> +
> +               tmp = cpu_to_vringh16(vrh, val);
> +
> +               ret = __put_user(tmp, (__virtio16 __user *)iov.iov_base);
> +               if (ret)
> +                       return ret;
> +       } else {
> +               struct bio_vec iov;
> +               void *kaddr, *to;
> +
> +               ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> +                                        &iov, 1, VHOST_MAP_WO);
> +               if (ret < 0)
> +                       return ret;
> +
> +               kaddr = kmap_local_page(iov.bv_page);
> +               to = kaddr + iov.bv_offset;
> +               WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> +               kunmap_local(kaddr);
> +       }
>
>         return 0;
>  }
> @@ -1306,6 +1452,7 @@ static inline int putused_iotlb(const struct vringh *vrh,
>   * @features: the feature bits for this ring.
>   * @num: the number of elements.
>   * @weak_barriers: true if we only need memory barriers, not I/O.
> + * @use_va: true if IOTLB contains user VA
>   * @desc: the userspace descriptor pointer.
>   * @avail: the userspace avail pointer.
>   * @used: the userspace used pointer.
> @@ -1313,11 +1460,13 @@ static inline int putused_iotlb(const struct vringh *vrh,
>   * Returns an error if num is invalid.
>   */
>  int vringh_init_iotlb(struct vringh *vrh, u64 features,
> -                     unsigned int num, bool weak_barriers,
> +                     unsigned int num, bool weak_barriers, bool use_va,
>                       struct vring_desc *desc,
>                       struct vring_avail *avail,
>                       struct vring_used *used)
>  {
> +       vrh->use_va = use_va;
> +
>         return vringh_init_kern(vrh, features, num, weak_barriers,
>                                 desc, avail, used);
>  }
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 5/8] vdpa_sim: make devices agnostic for work management
  2023-03-02 11:34 ` [PATCH v2 5/8] vdpa_sim: make devices agnostic for work management Stefano Garzarella
  2023-03-03 14:40   ` Eugenio Perez Martin
@ 2023-03-14  5:27   ` Jason Wang
  1 sibling, 0 replies; 40+ messages in thread
From: Jason Wang @ 2023-03-14  5:27 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: virtualization, Andrey Zhadchenko, eperezma, netdev, stefanha,
	linux-kernel, Michael S. Tsirkin, kvm

On Thu, Mar 2, 2023 at 7:35 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> Let's move work management inside the vdpa_sim core.
> This way we can easily change how we manage the work, without
> having to change the devices each time.
>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

> ---
>  drivers/vdpa/vdpa_sim/vdpa_sim.h     |  3 ++-
>  drivers/vdpa/vdpa_sim/vdpa_sim.c     | 17 +++++++++++++++--
>  drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  6 ++----
>  drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  6 ++----
>  4 files changed, 21 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> index 144858636c10..acee20faaf6a 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> @@ -45,7 +45,7 @@ struct vdpasim_dev_attr {
>         u32 ngroups;
>         u32 nas;
>
> -       work_func_t work_fn;
> +       void (*work_fn)(struct vdpasim *vdpasim);
>         void (*get_config)(struct vdpasim *vdpasim, void *config);
>         void (*set_config)(struct vdpasim *vdpasim, const void *config);
>         int (*get_stats)(struct vdpasim *vdpasim, u16 idx,
> @@ -78,6 +78,7 @@ struct vdpasim {
>
>  struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *attr,
>                                const struct vdpa_dev_set_config *config);
> +void vdpasim_schedule_work(struct vdpasim *vdpasim);
>
>  /* TODO: cross-endian support */
>  static inline bool vdpasim_is_little_endian(struct vdpasim *vdpasim)
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 481eb156658b..a6ee830efc38 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -116,6 +116,13 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
>  static const struct vdpa_config_ops vdpasim_config_ops;
>  static const struct vdpa_config_ops vdpasim_batch_config_ops;
>
> +static void vdpasim_work_fn(struct work_struct *work)
> +{
> +       struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
> +
> +       vdpasim->dev_attr.work_fn(vdpasim);
> +}
> +
>  struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
>                                const struct vdpa_dev_set_config *config)
>  {
> @@ -152,7 +159,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
>
>         vdpasim = vdpa_to_sim(vdpa);
>         vdpasim->dev_attr = *dev_attr;
> -       INIT_WORK(&vdpasim->work, dev_attr->work_fn);
> +       INIT_WORK(&vdpasim->work, vdpasim_work_fn);
>         spin_lock_init(&vdpasim->lock);
>         spin_lock_init(&vdpasim->iommu_lock);
>
> @@ -203,6 +210,12 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
>  }
>  EXPORT_SYMBOL_GPL(vdpasim_create);
>
> +void vdpasim_schedule_work(struct vdpasim *vdpasim)
> +{
> +       schedule_work(&vdpasim->work);
> +}
> +EXPORT_SYMBOL_GPL(vdpasim_schedule_work);
> +
>  static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
>                                   u64 desc_area, u64 driver_area,
>                                   u64 device_area)
> @@ -237,7 +250,7 @@ static void vdpasim_kick_vq(struct vdpa_device *vdpa, u16 idx)
>         }
>
>         if (vq->ready)
> -               schedule_work(&vdpasim->work);
> +               vdpasim_schedule_work(vdpasim);
>  }
>
>  static void vdpasim_set_vq_cb(struct vdpa_device *vdpa, u16 idx,
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> index 5117959bed8a..eb4897c8541e 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> @@ -11,7 +11,6 @@
>  #include <linux/module.h>
>  #include <linux/device.h>
>  #include <linux/kernel.h>
> -#include <linux/sched.h>
>  #include <linux/blkdev.h>
>  #include <linux/vringh.h>
>  #include <linux/vdpa.h>
> @@ -286,9 +285,8 @@ static bool vdpasim_blk_handle_req(struct vdpasim *vdpasim,
>         return handled;
>  }
>
> -static void vdpasim_blk_work(struct work_struct *work)
> +static void vdpasim_blk_work(struct vdpasim *vdpasim)
>  {
> -       struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
>         bool reschedule = false;
>         int i;
>
> @@ -326,7 +324,7 @@ static void vdpasim_blk_work(struct work_struct *work)
>         spin_unlock(&vdpasim->lock);
>
>         if (reschedule)
> -               schedule_work(&vdpasim->work);
> +               vdpasim_schedule_work(vdpasim);
>  }
>
>  static void vdpasim_blk_get_config(struct vdpasim *vdpasim, void *config)
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> index 862f405362de..e61a9ecbfafe 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> @@ -11,7 +11,6 @@
>  #include <linux/module.h>
>  #include <linux/device.h>
>  #include <linux/kernel.h>
> -#include <linux/sched.h>
>  #include <linux/etherdevice.h>
>  #include <linux/vringh.h>
>  #include <linux/vdpa.h>
> @@ -192,9 +191,8 @@ static void vdpasim_handle_cvq(struct vdpasim *vdpasim)
>         u64_stats_update_end(&net->cq_stats.syncp);
>  }
>
> -static void vdpasim_net_work(struct work_struct *work)
> +static void vdpasim_net_work(struct vdpasim *vdpasim)
>  {
> -       struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
>         struct vdpasim_virtqueue *txq = &vdpasim->vqs[1];
>         struct vdpasim_virtqueue *rxq = &vdpasim->vqs[0];
>         struct vdpasim_net *net = sim_to_net(vdpasim);
> @@ -260,7 +258,7 @@ static void vdpasim_net_work(struct work_struct *work)
>                 vdpasim_net_complete(rxq, write);
>
>                 if (tx_pkts > 4) {
> -                       schedule_work(&vdpasim->work);
> +                       vdpasim_schedule_work(vdpasim);
>                         goto out;
>                 }
>         }
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 7/8] vdpa_sim: replace the spinlock with a mutex to protect the state
  2023-03-02 11:34 ` [PATCH v2 7/8] vdpa_sim: replace the spinlock with a mutex to protect the state Stefano Garzarella
@ 2023-03-14  5:29   ` Jason Wang
  2023-03-14  5:31     ` Jason Wang
  0 siblings, 1 reply; 40+ messages in thread
From: Jason Wang @ 2023-03-14  5:29 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: virtualization, Andrey Zhadchenko, eperezma, netdev, stefanha,
	linux-kernel, Michael S. Tsirkin, kvm

On Thu, Mar 2, 2023 at 7:35 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> The spinlock we use to protect the state of the simulator is sometimes
> held for a long time (for example, when devices handle requests).
>
> This also prevents us from calling functions that might sleep (such as
> kthread_flush_work() in the next patch), forcing us to release
> and retake the lock.
>
> For these reasons, let's replace the spinlock with a mutex that gives
> us more flexibility.
>
> Suggested-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

> ---
>  drivers/vdpa/vdpa_sim/vdpa_sim.h     |  4 ++--
>  drivers/vdpa/vdpa_sim/vdpa_sim.c     | 34 ++++++++++++++--------------
>  drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  4 ++--
>  drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  4 ++--
>  4 files changed, 23 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> index ce83f9130a5d..4774292fba8c 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> @@ -60,8 +60,8 @@ struct vdpasim {
>         struct kthread_worker *worker;
>         struct kthread_work work;
>         struct vdpasim_dev_attr dev_attr;
> -       /* spinlock to synchronize virtqueue state */
> -       spinlock_t lock;
> +       /* mutex to synchronize virtqueue state */
> +       struct mutex mutex;
>         /* virtio config according to device type */
>         void *config;
>         struct vhost_iotlb *iommu;
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 6feb29726c2a..a28103a67ae7 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -166,7 +166,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
>         if (IS_ERR(vdpasim->worker))
>                 goto err_iommu;
>
> -       spin_lock_init(&vdpasim->lock);
> +       mutex_init(&vdpasim->mutex);
>         spin_lock_init(&vdpasim->iommu_lock);
>
>         dev = &vdpasim->vdpa.dev;
> @@ -275,13 +275,13 @@ static void vdpasim_set_vq_ready(struct vdpa_device *vdpa, u16 idx, bool ready)
>         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
>         bool old_ready;
>
> -       spin_lock(&vdpasim->lock);
> +       mutex_lock(&vdpasim->mutex);
>         old_ready = vq->ready;
>         vq->ready = ready;
>         if (vq->ready && !old_ready) {
>                 vdpasim_queue_ready(vdpasim, idx);
>         }
> -       spin_unlock(&vdpasim->lock);
> +       mutex_unlock(&vdpasim->mutex);
>  }
>
>  static bool vdpasim_get_vq_ready(struct vdpa_device *vdpa, u16 idx)
> @@ -299,9 +299,9 @@ static int vdpasim_set_vq_state(struct vdpa_device *vdpa, u16 idx,
>         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
>         struct vringh *vrh = &vq->vring;
>
> -       spin_lock(&vdpasim->lock);
> +       mutex_lock(&vdpasim->mutex);
>         vrh->last_avail_idx = state->split.avail_index;
> -       spin_unlock(&vdpasim->lock);
> +       mutex_unlock(&vdpasim->mutex);
>
>         return 0;
>  }
> @@ -398,9 +398,9 @@ static u8 vdpasim_get_status(struct vdpa_device *vdpa)
>         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
>         u8 status;
>
> -       spin_lock(&vdpasim->lock);
> +       mutex_lock(&vdpasim->mutex);
>         status = vdpasim->status;
> -       spin_unlock(&vdpasim->lock);
> +       mutex_unlock(&vdpasim->mutex);
>
>         return status;
>  }
> @@ -409,19 +409,19 @@ static void vdpasim_set_status(struct vdpa_device *vdpa, u8 status)
>  {
>         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
>
> -       spin_lock(&vdpasim->lock);
> +       mutex_lock(&vdpasim->mutex);
>         vdpasim->status = status;
> -       spin_unlock(&vdpasim->lock);
> +       mutex_unlock(&vdpasim->mutex);
>  }
>
>  static int vdpasim_reset(struct vdpa_device *vdpa)
>  {
>         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
>
> -       spin_lock(&vdpasim->lock);
> +       mutex_lock(&vdpasim->mutex);
>         vdpasim->status = 0;
>         vdpasim_do_reset(vdpasim);
> -       spin_unlock(&vdpasim->lock);
> +       mutex_unlock(&vdpasim->mutex);
>
>         return 0;
>  }
> @@ -430,9 +430,9 @@ static int vdpasim_suspend(struct vdpa_device *vdpa)
>  {
>         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
>
> -       spin_lock(&vdpasim->lock);
> +       mutex_lock(&vdpasim->mutex);
>         vdpasim->running = false;
> -       spin_unlock(&vdpasim->lock);
> +       mutex_unlock(&vdpasim->mutex);
>
>         return 0;
>  }
> @@ -442,7 +442,7 @@ static int vdpasim_resume(struct vdpa_device *vdpa)
>         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
>         int i;
>
> -       spin_lock(&vdpasim->lock);
> +       mutex_lock(&vdpasim->mutex);
>         vdpasim->running = true;
>
>         if (vdpasim->pending_kick) {
> @@ -453,7 +453,7 @@ static int vdpasim_resume(struct vdpa_device *vdpa)
>                 vdpasim->pending_kick = false;
>         }
>
> -       spin_unlock(&vdpasim->lock);
> +       mutex_unlock(&vdpasim->mutex);
>
>         return 0;
>  }
> @@ -525,14 +525,14 @@ static int vdpasim_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
>
>         iommu = &vdpasim->iommu[asid];
>
> -       spin_lock(&vdpasim->lock);
> +       mutex_lock(&vdpasim->mutex);
>
>         for (i = 0; i < vdpasim->dev_attr.nvqs; i++)
>                 if (vdpasim_get_vq_group(vdpa, i) == group)
>                         vringh_set_iotlb(&vdpasim->vqs[i].vring, iommu,
>                                          &vdpasim->iommu_lock);
>
> -       spin_unlock(&vdpasim->lock);
> +       mutex_unlock(&vdpasim->mutex);
>
>         return 0;
>  }
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> index eb4897c8541e..568119e1553f 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> @@ -290,7 +290,7 @@ static void vdpasim_blk_work(struct vdpasim *vdpasim)
>         bool reschedule = false;
>         int i;
>
> -       spin_lock(&vdpasim->lock);
> +       mutex_lock(&vdpasim->mutex);
>
>         if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
>                 goto out;
> @@ -321,7 +321,7 @@ static void vdpasim_blk_work(struct vdpasim *vdpasim)
>                 }
>         }
>  out:
> -       spin_unlock(&vdpasim->lock);
> +       mutex_unlock(&vdpasim->mutex);
>
>         if (reschedule)
>                 vdpasim_schedule_work(vdpasim);
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> index e61a9ecbfafe..7ab434592bfe 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> @@ -201,7 +201,7 @@ static void vdpasim_net_work(struct vdpasim *vdpasim)
>         u64 rx_drops = 0, rx_overruns = 0, rx_errors = 0, tx_errors = 0;
>         int err;
>
> -       spin_lock(&vdpasim->lock);
> +       mutex_lock(&vdpasim->mutex);
>
>         if (!vdpasim->running)
>                 goto out;
> @@ -264,7 +264,7 @@ static void vdpasim_net_work(struct vdpasim *vdpasim)
>         }
>
>  out:
> -       spin_unlock(&vdpasim->lock);
> +       mutex_unlock(&vdpasim->mutex);
>
>         u64_stats_update_begin(&net->tx_stats.syncp);
>         net->tx_stats.pkts += tx_pkts;
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 7/8] vdpa_sim: replace the spinlock with a mutex to protect the state
  2023-03-14  5:29   ` Jason Wang
@ 2023-03-14  5:31     ` Jason Wang
  2023-03-16  8:42       ` Stefano Garzarella
  0 siblings, 1 reply; 40+ messages in thread
From: Jason Wang @ 2023-03-14  5:31 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: virtualization, Andrey Zhadchenko, eperezma, netdev, stefanha,
	linux-kernel, Michael S. Tsirkin, kvm

On Tue, Mar 14, 2023 at 1:29 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Mar 2, 2023 at 7:35 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > The spinlock we use to protect the state of the simulator is sometimes
> > held for a long time (for example, when devices handle requests).
> >
> > This also prevents us from calling functions that might sleep (such as
> > kthread_flush_work() in the next patch), forcing us to release
> > and retake the lock.
> >
> > For these reasons, let's replace the spinlock with a mutex that gives
> > us more flexibility.
> >
> > Suggested-by: Jason Wang <jasowang@redhat.com>
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>
> Acked-by: Jason Wang <jasowang@redhat.com>
>
> Thanks

Btw, though it looks fine, we'd better double-check that virtio_vdpa works well.

(I think it does, since there are transports that might sleep.)

Thanks

>
> > ---
> >  drivers/vdpa/vdpa_sim/vdpa_sim.h     |  4 ++--
> >  drivers/vdpa/vdpa_sim/vdpa_sim.c     | 34 ++++++++++++++--------------
> >  drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  4 ++--
> >  drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  4 ++--
> >  4 files changed, 23 insertions(+), 23 deletions(-)
> >
> > diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> > index ce83f9130a5d..4774292fba8c 100644
> > --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
> > +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> > @@ -60,8 +60,8 @@ struct vdpasim {
> >         struct kthread_worker *worker;
> >         struct kthread_work work;
> >         struct vdpasim_dev_attr dev_attr;
> > -       /* spinlock to synchronize virtqueue state */
> > -       spinlock_t lock;
> > +       /* mutex to synchronize virtqueue state */
> > +       struct mutex mutex;
> >         /* virtio config according to device type */
> >         void *config;
> >         struct vhost_iotlb *iommu;
> > diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> > index 6feb29726c2a..a28103a67ae7 100644
> > --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> > +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> > @@ -166,7 +166,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
> >         if (IS_ERR(vdpasim->worker))
> >                 goto err_iommu;
> >
> > -       spin_lock_init(&vdpasim->lock);
> > +       mutex_init(&vdpasim->mutex);
> >         spin_lock_init(&vdpasim->iommu_lock);
> >
> >         dev = &vdpasim->vdpa.dev;
> > @@ -275,13 +275,13 @@ static void vdpasim_set_vq_ready(struct vdpa_device *vdpa, u16 idx, bool ready)
> >         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> >         bool old_ready;
> >
> > -       spin_lock(&vdpasim->lock);
> > +       mutex_lock(&vdpasim->mutex);
> >         old_ready = vq->ready;
> >         vq->ready = ready;
> >         if (vq->ready && !old_ready) {
> >                 vdpasim_queue_ready(vdpasim, idx);
> >         }
> > -       spin_unlock(&vdpasim->lock);
> > +       mutex_unlock(&vdpasim->mutex);
> >  }
> >
> >  static bool vdpasim_get_vq_ready(struct vdpa_device *vdpa, u16 idx)
> > @@ -299,9 +299,9 @@ static int vdpasim_set_vq_state(struct vdpa_device *vdpa, u16 idx,
> >         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> >         struct vringh *vrh = &vq->vring;
> >
> > -       spin_lock(&vdpasim->lock);
> > +       mutex_lock(&vdpasim->mutex);
> >         vrh->last_avail_idx = state->split.avail_index;
> > -       spin_unlock(&vdpasim->lock);
> > +       mutex_unlock(&vdpasim->mutex);
> >
> >         return 0;
> >  }
> > @@ -398,9 +398,9 @@ static u8 vdpasim_get_status(struct vdpa_device *vdpa)
> >         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> >         u8 status;
> >
> > -       spin_lock(&vdpasim->lock);
> > +       mutex_lock(&vdpasim->mutex);
> >         status = vdpasim->status;
> > -       spin_unlock(&vdpasim->lock);
> > +       mutex_unlock(&vdpasim->mutex);
> >
> >         return status;
> >  }
> > @@ -409,19 +409,19 @@ static void vdpasim_set_status(struct vdpa_device *vdpa, u8 status)
> >  {
> >         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> >
> > -       spin_lock(&vdpasim->lock);
> > +       mutex_lock(&vdpasim->mutex);
> >         vdpasim->status = status;
> > -       spin_unlock(&vdpasim->lock);
> > +       mutex_unlock(&vdpasim->mutex);
> >  }
> >
> >  static int vdpasim_reset(struct vdpa_device *vdpa)
> >  {
> >         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> >
> > -       spin_lock(&vdpasim->lock);
> > +       mutex_lock(&vdpasim->mutex);
> >         vdpasim->status = 0;
> >         vdpasim_do_reset(vdpasim);
> > -       spin_unlock(&vdpasim->lock);
> > +       mutex_unlock(&vdpasim->mutex);
> >
> >         return 0;
> >  }
> > @@ -430,9 +430,9 @@ static int vdpasim_suspend(struct vdpa_device *vdpa)
> >  {
> >         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> >
> > -       spin_lock(&vdpasim->lock);
> > +       mutex_lock(&vdpasim->mutex);
> >         vdpasim->running = false;
> > -       spin_unlock(&vdpasim->lock);
> > +       mutex_unlock(&vdpasim->mutex);
> >
> >         return 0;
> >  }
> > @@ -442,7 +442,7 @@ static int vdpasim_resume(struct vdpa_device *vdpa)
> >         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> >         int i;
> >
> > -       spin_lock(&vdpasim->lock);
> > +       mutex_lock(&vdpasim->mutex);
> >         vdpasim->running = true;
> >
> >         if (vdpasim->pending_kick) {
> > @@ -453,7 +453,7 @@ static int vdpasim_resume(struct vdpa_device *vdpa)
> >                 vdpasim->pending_kick = false;
> >         }
> >
> > -       spin_unlock(&vdpasim->lock);
> > +       mutex_unlock(&vdpasim->mutex);
> >
> >         return 0;
> >  }
> > @@ -525,14 +525,14 @@ static int vdpasim_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> >
> >         iommu = &vdpasim->iommu[asid];
> >
> > -       spin_lock(&vdpasim->lock);
> > +       mutex_lock(&vdpasim->mutex);
> >
> >         for (i = 0; i < vdpasim->dev_attr.nvqs; i++)
> >                 if (vdpasim_get_vq_group(vdpa, i) == group)
> >                         vringh_set_iotlb(&vdpasim->vqs[i].vring, iommu,
> >                                          &vdpasim->iommu_lock);
> >
> > -       spin_unlock(&vdpasim->lock);
> > +       mutex_unlock(&vdpasim->mutex);
> >
> >         return 0;
> >  }
> > diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> > index eb4897c8541e..568119e1553f 100644
> > --- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> > +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> > @@ -290,7 +290,7 @@ static void vdpasim_blk_work(struct vdpasim *vdpasim)
> >         bool reschedule = false;
> >         int i;
> >
> > -       spin_lock(&vdpasim->lock);
> > +       mutex_lock(&vdpasim->mutex);
> >
> >         if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
> >                 goto out;
> > @@ -321,7 +321,7 @@ static void vdpasim_blk_work(struct vdpasim *vdpasim)
> >                 }
> >         }
> >  out:
> > -       spin_unlock(&vdpasim->lock);
> > +       mutex_unlock(&vdpasim->mutex);
> >
> >         if (reschedule)
> >                 vdpasim_schedule_work(vdpasim);
> > diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> > index e61a9ecbfafe..7ab434592bfe 100644
> > --- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> > +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> > @@ -201,7 +201,7 @@ static void vdpasim_net_work(struct vdpasim *vdpasim)
> >         u64 rx_drops = 0, rx_overruns = 0, rx_errors = 0, tx_errors = 0;
> >         int err;
> >
> > -       spin_lock(&vdpasim->lock);
> > +       mutex_lock(&vdpasim->mutex);
> >
> >         if (!vdpasim->running)
> >                 goto out;
> > @@ -264,7 +264,7 @@ static void vdpasim_net_work(struct vdpasim *vdpasim)
> >         }
> >
> >  out:
> > -       spin_unlock(&vdpasim->lock);
> > +       mutex_unlock(&vdpasim->mutex);
> >
> >         u64_stats_update_begin(&net->tx_stats.syncp);
> >         net->tx_stats.pkts += tx_pkts;
> > --
> > 2.39.2
> >


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 6/8] vdpa_sim: use kthread worker
  2023-03-02 11:34 ` [PATCH v2 6/8] vdpa_sim: use kthread worker Stefano Garzarella
  2023-03-02 15:30   ` Simon Horman
  2023-03-05 11:21   ` kernel test robot
@ 2023-03-14  5:31   ` Jason Wang
  2 siblings, 0 replies; 40+ messages in thread
From: Jason Wang @ 2023-03-14  5:31 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: virtualization, Andrey Zhadchenko, eperezma, netdev, stefanha,
	linux-kernel, Michael S. Tsirkin, kvm

On Thu, Mar 2, 2023 at 7:35 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> Let's use our own kthread to run device jobs.
> This gives us more flexibility; in particular, we can attach the kthread
> to the user address space when vDPA uses the user's VA.
>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks



> ---
>  drivers/vdpa/vdpa_sim/vdpa_sim.h |  3 ++-
>  drivers/vdpa/vdpa_sim/vdpa_sim.c | 17 ++++++++++++-----
>  2 files changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> index acee20faaf6a..ce83f9130a5d 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> @@ -57,7 +57,8 @@ struct vdpasim_dev_attr {
>  struct vdpasim {
>         struct vdpa_device vdpa;
>         struct vdpasim_virtqueue *vqs;
> -       struct work_struct work;
> +       struct kthread_worker *worker;
> +       struct kthread_work work;
>         struct vdpasim_dev_attr dev_attr;
>         /* spinlock to synchronize virtqueue state */
>         spinlock_t lock;
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index a6ee830efc38..6feb29726c2a 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -11,8 +11,8 @@
>  #include <linux/module.h>
>  #include <linux/device.h>
>  #include <linux/kernel.h>
> +#include <linux/kthread.h>
>  #include <linux/slab.h>
> -#include <linux/sched.h>
>  #include <linux/dma-map-ops.h>
>  #include <linux/vringh.h>
>  #include <linux/vdpa.h>
> @@ -116,7 +116,7 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
>  static const struct vdpa_config_ops vdpasim_config_ops;
>  static const struct vdpa_config_ops vdpasim_batch_config_ops;
>
> -static void vdpasim_work_fn(struct work_struct *work)
> +static void vdpasim_work_fn(struct kthread_work *work)
>  {
>         struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
>
> @@ -159,7 +159,13 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
>
>         vdpasim = vdpa_to_sim(vdpa);
>         vdpasim->dev_attr = *dev_attr;
> -       INIT_WORK(&vdpasim->work, vdpasim_work_fn);
> +
> +       kthread_init_work(&vdpasim->work, vdpasim_work_fn);
> +       vdpasim->worker = kthread_create_worker(0, "vDPA sim worker: %s",
> +                                               dev_attr->name);
> +       if (IS_ERR(vdpasim->worker))
> +               goto err_iommu;
> +
>         spin_lock_init(&vdpasim->lock);
>         spin_lock_init(&vdpasim->iommu_lock);
>
> @@ -212,7 +218,7 @@ EXPORT_SYMBOL_GPL(vdpasim_create);
>
>  void vdpasim_schedule_work(struct vdpasim *vdpasim)
>  {
> -       schedule_work(&vdpasim->work);
> +       kthread_queue_work(vdpasim->worker, &vdpasim->work);
>  }
>  EXPORT_SYMBOL_GPL(vdpasim_schedule_work);
>
> @@ -612,7 +618,8 @@ static void vdpasim_free(struct vdpa_device *vdpa)
>         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
>         int i;
>
> -       cancel_work_sync(&vdpasim->work);
> +       kthread_cancel_work_sync(&vdpasim->work);
> +       kthread_destroy_worker(vdpasim->worker);
>
>         for (i = 0; i < vdpasim->dev_attr.nvqs; i++) {
>                 vringh_kiov_cleanup(&vdpasim->vqs[i].out_iov);
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 8/8] vdpa_sim: add support for user VA
  2023-03-02 11:34 ` [PATCH v2 8/8] vdpa_sim: add support for user VA Stefano Garzarella
@ 2023-03-14  5:36   ` Jason Wang
  2023-03-16  9:11     ` Stefano Garzarella
  0 siblings, 1 reply; 40+ messages in thread
From: Jason Wang @ 2023-03-14  5:36 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: virtualization, Andrey Zhadchenko, eperezma, netdev, stefanha,
	linux-kernel, Michael S. Tsirkin, kvm

On Thu, Mar 2, 2023 at 7:35 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> The new "use_va" module parameter (default: true) is used in
> vdpa_alloc_device() to inform the vDPA framework that the device
> supports VA.
>
> vringh is initialized to use VA only when "use_va" is true and the
> user's mm has been bound, i.e. only when the bus supports user VA
> (e.g. vhost-vdpa).
>
> vdpasim_mm_work_fn work is used to attach the kthread to the user
> address space when the .bind_mm callback is invoked, and to detach
> it when the .unbind_mm callback is invoked.
>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>
> Notes:
>     v2:
>     - `use_va` set to true by default [Eugenio]
>     - supported the new unbind_mm callback [Jason]
>     - removed the unbind_mm call in vdpasim_do_reset() [Jason]
>     - avoided to release the lock while call kthread_flush_work() since we
>       are now using a mutex to protect the device state
>
>  drivers/vdpa/vdpa_sim/vdpa_sim.h |  1 +
>  drivers/vdpa/vdpa_sim/vdpa_sim.c | 98 +++++++++++++++++++++++++++++++-
>  2 files changed, 97 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> index 4774292fba8c..3a42887d05d9 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> @@ -59,6 +59,7 @@ struct vdpasim {
>         struct vdpasim_virtqueue *vqs;
>         struct kthread_worker *worker;
>         struct kthread_work work;
> +       struct mm_struct *mm_bound;
>         struct vdpasim_dev_attr dev_attr;
>         /* mutex to synchronize virtqueue state */
>         struct mutex mutex;
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index a28103a67ae7..eda26bc33df5 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -35,10 +35,77 @@ module_param(max_iotlb_entries, int, 0444);
>  MODULE_PARM_DESC(max_iotlb_entries,
>                  "Maximum number of iotlb entries for each address space. 0 means unlimited. (default: 2048)");
>
> +static bool use_va = true;
> +module_param(use_va, bool, 0444);
> +MODULE_PARM_DESC(use_va, "Enable/disable the device's ability to use VA");
> +
>  #define VDPASIM_QUEUE_ALIGN PAGE_SIZE
>  #define VDPASIM_QUEUE_MAX 256
>  #define VDPASIM_VENDOR_ID 0
>
> +struct vdpasim_mm_work {
> +       struct kthread_work work;
> +       struct mm_struct *mm;
> +       bool bind;
> +       int ret;
> +};
> +
> +static void vdpasim_mm_work_fn(struct kthread_work *work)
> +{
> +       struct vdpasim_mm_work *mm_work =
> +               container_of(work, struct vdpasim_mm_work, work);
> +
> +       mm_work->ret = 0;
> +
> +       if (mm_work->bind) {
> +               kthread_use_mm(mm_work->mm);
> +               /* TODO: should we attach the cgroup of the mm owner? */
> +       } else {
> +               kthread_unuse_mm(mm_work->mm);
> +       }
> +}
> +
> +static void vdpasim_worker_queue_mm(struct vdpasim *vdpasim,
> +                                   struct vdpasim_mm_work *mm_work)
> +{

Nit: we need to tweak the name, as it also flushes the work besides
queuing it (see the sketch below).

> +       struct kthread_work *work = &mm_work->work;
> +
> +       kthread_init_work(work, vdpasim_mm_work_fn);
> +       kthread_queue_work(vdpasim->worker, work);
> +
> +       kthread_flush_work(work);
> +}
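
I mean, since kthread_flush_work() is called, the helper actually runs
the work synchronously, e.g. (just a naming suggestion, same body as
above):

static void vdpasim_worker_run_mm_work(struct vdpasim *vdpasim,
                                       struct vdpasim_mm_work *mm_work)
{
        struct kthread_work *work = &mm_work->work;

        kthread_init_work(work, vdpasim_mm_work_fn);
        kthread_queue_work(vdpasim->worker, work);

        /* Wait for the work to complete before returning. */
        kthread_flush_work(work);
}
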
> +
> +static int vdpasim_worker_bind_mm(struct vdpasim *vdpasim,
> +                                 struct mm_struct *new_mm)
> +{
> +       struct vdpasim_mm_work mm_work;
> +
> +       mm_work.mm = new_mm;
> +       mm_work.bind = true;
> +
> +       vdpasim_worker_queue_mm(vdpasim, &mm_work);
> +
> +       if (!mm_work.ret)
> +               vdpasim->mm_bound = new_mm;
> +
> +       return mm_work.ret;
> +}
> +
> +static void vdpasim_worker_unbind_mm(struct vdpasim *vdpasim)
> +{
> +       struct vdpasim_mm_work mm_work;
> +
> +       if (!vdpasim->mm_bound)
> +               return;
> +
> +       mm_work.mm = vdpasim->mm_bound;
> +       mm_work.bind = false;

Can we simply use mm_work.mm = NULL for unbinding?

> +
> +       vdpasim_worker_queue_mm(vdpasim, &mm_work);
> +
> +       vdpasim->mm_bound = NULL;

And change the mm_bound in the worker?
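
I mean something like this (a rough sketch; it assumes a vdpasim
back-pointer is added to struct vdpasim_mm_work and the bind flag is
dropped):

static void vdpasim_mm_work_fn(struct kthread_work *work)
{
        struct vdpasim_mm_work *mm_work =
                container_of(work, struct vdpasim_mm_work, work);
        struct vdpasim *vdpasim = mm_work->vdpasim;

        mm_work->ret = 0;

        /* A NULL mm means unbind from the currently bound mm */
        if (!mm_work->mm) {
                if (vdpasim->mm_bound)
                        kthread_unuse_mm(vdpasim->mm_bound);
        } else {
                kthread_use_mm(mm_work->mm);
        }

        vdpasim->mm_bound = mm_work->mm;
}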

Thanks

> +}
>  static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
>  {
>         return container_of(vdpa, struct vdpasim, vdpa);
> @@ -59,8 +126,10 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
>  {
>         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
>         uint16_t last_avail_idx = vq->vring.last_avail_idx;
> +       bool va_enabled = use_va && vdpasim->mm_bound;
>
> -       vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
> +       vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
> +                         va_enabled,
>                           (struct vring_desc *)(uintptr_t)vq->desc_addr,
>                           (struct vring_avail *)
>                           (uintptr_t)vq->driver_addr,
> @@ -151,7 +220,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
>         vdpa = __vdpa_alloc_device(NULL, ops,
>                                    dev_attr->ngroups, dev_attr->nas,
>                                    dev_attr->alloc_size,
> -                                  dev_attr->name, false);
> +                                  dev_attr->name, use_va);
>         if (IS_ERR(vdpa)) {
>                 ret = PTR_ERR(vdpa);
>                 goto err_alloc;
> @@ -571,6 +640,27 @@ static int vdpasim_set_map(struct vdpa_device *vdpa, unsigned int asid,
>         return ret;
>  }
>
> +static int vdpasim_bind_mm(struct vdpa_device *vdpa, struct mm_struct *mm)
> +{
> +       struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +       int ret;
> +
> +       mutex_lock(&vdpasim->mutex);
> +       ret = vdpasim_worker_bind_mm(vdpasim, mm);
> +       mutex_unlock(&vdpasim->mutex);
> +
> +       return ret;
> +}
> +
> +static void vdpasim_unbind_mm(struct vdpa_device *vdpa)
> +{
> +       struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +
> +       mutex_lock(&vdpasim->mutex);
> +       vdpasim_worker_unbind_mm(vdpasim);
> +       mutex_unlock(&vdpasim->mutex);
> +}
> +
>  static int vdpasim_dma_map(struct vdpa_device *vdpa, unsigned int asid,
>                            u64 iova, u64 size,
>                            u64 pa, u32 perm, void *opaque)
> @@ -667,6 +757,8 @@ static const struct vdpa_config_ops vdpasim_config_ops = {
>         .set_group_asid         = vdpasim_set_group_asid,
>         .dma_map                = vdpasim_dma_map,
>         .dma_unmap              = vdpasim_dma_unmap,
> +       .bind_mm                = vdpasim_bind_mm,
> +       .unbind_mm              = vdpasim_unbind_mm,
>         .free                   = vdpasim_free,
>  };
>
> @@ -701,6 +793,8 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops = {
>         .get_iova_range         = vdpasim_get_iova_range,
>         .set_group_asid         = vdpasim_set_group_asid,
>         .set_map                = vdpasim_set_map,
> +       .bind_mm                = vdpasim_bind_mm,
> +       .unbind_mm              = vdpasim_unbind_mm,
>         .free                   = vdpasim_free,
>  };
>
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 3/8] vringh: replace kmap_atomic() with kmap_local_page()
  2023-03-14  3:56   ` Jason Wang
@ 2023-03-15 21:12     ` Fabio M. De Francesco
  2023-03-16  2:53       ` Jason Wang
  2023-03-16  8:09       ` Stefano Garzarella
  0 siblings, 2 replies; 40+ messages in thread
From: Fabio M. De Francesco @ 2023-03-15 21:12 UTC (permalink / raw)
  To: Stefano Garzarella, Jason Wang
  Cc: virtualization, Andrey Zhadchenko, eperezma, netdev, stefanha,
	linux-kernel, Michael S. Tsirkin, kvm

On Tuesday 14 March 2023 04:56:08 CET Jason Wang wrote:
> On Thu, Mar 2, 2023 at 7:34 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > kmap_atomic() is deprecated in favor of kmap_local_page().
> 
> It's better to mention the commit or code that introduces this.
> 
> > With kmap_local_page() the mappings are per thread, CPU local, can take
> > page-faults, and can be called from any context (including interrupts).
> > Furthermore, the tasks can be preempted and, when they are scheduled to
> > run again, the kernel virtual addresses are restored and still valid.
> > 
> > kmap_atomic() is implemented like a kmap_local_page() which also disables
> > page-faults and preemption (the latter only for !PREEMPT_RT kernels,
> > otherwise it only disables migration).
> > 
> > The code within the mappings/un-mappings in getu16_iotlb() and
> > putu16_iotlb() doesn't depend on the above-mentioned side effects of
> > kmap_atomic(),
> 
> > Note that we used to use a spinlock to protect the simulators (at least
> > until patch 7, so we probably need to reorder the patches), so I
> > think this is only valid when:
> 
> > The vringh IOTLB helpers are not used in atomic context (e.g. spinlock,
> interrupts).

I'm probably missing some context, but it looks like you are saying that
kmap_local_page() is not suited for any use in atomic context (you are
mentioning spinlocks).

The commit message (that I know pretty well since it's the exact copy, word by 
word, of my boilerplate commits) explains that kmap_local_page() is perfectly
usable in atomic context (including interrupts).
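
For instance, a pattern like the following is perfectly valid (a minimal,
illustrative sketch):

        spin_lock(&lock);
        kaddr = kmap_local_page(page);
        /* kmap_local_page() never sleeps, so mapping and accessing
         * the page under a spinlock is fine.
         */
        memcpy(buf, kaddr + offset, len);
        kunmap_local(kaddr);
        spin_unlock(&lock);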

I don't know this code; however, I can't see why these vringh IOTLB
helpers cannot work when used under spinlocks. Can you please elaborate a
little more?

> If yes, should we document this? (Or should we introduce a boolean to
> say whether an IOTLB variant can be used in an atomic context)?

Again, you'll have no problems using kmap_local_page(), so you don't need
any boolean to tell whether or not the code is running in atomic context.

Please take a look at the Highmem documentation which has been recently 
reworked and extended by me: https://docs.kernel.org/mm/highmem.html

Anyway, I have been at the keyboard for 12 or 13 hours in a row, so I'm
probably missing the whole picture.

Thanks, 

Fabio

> Thanks
> 
> > so that a mere replacement of the old API with the new one
> > is all that is required (i.e., there is no need to explicitly add calls
> > to pagefault_disable() and/or preempt_disable()).
> > 
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > ---
> > 
> > Notes:
> >     v2:
> >     - added this patch since checkpatch.pl complained about deprecation
> >     
> >       of kmap_atomic() touched by next patch
> >  
> >  drivers/vhost/vringh.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> > index a1e27da54481..0ba3ef809e48 100644
> > --- a/drivers/vhost/vringh.c
> > +++ b/drivers/vhost/vringh.c
> > @@ -1220,10 +1220,10 @@ static inline int getu16_iotlb(const struct vringh
> > *vrh,> 
> >         if (ret < 0)
> >         
> >                 return ret;
> > 
> > -       kaddr = kmap_atomic(iov.bv_page);
> > +       kaddr = kmap_local_page(iov.bv_page);
> > 
> >         from = kaddr + iov.bv_offset;
> >         *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> > 
> > -       kunmap_atomic(kaddr);
> > +       kunmap_local(kaddr);
> > 
> >         return 0;
> >  
> >  }
> > 
> > @@ -1241,10 +1241,10 @@ static inline int putu16_iotlb(const struct vringh
> > *vrh,> 
> >         if (ret < 0)
> >         
> >                 return ret;
> > 
> > -       kaddr = kmap_atomic(iov.bv_page);
> > +       kaddr = kmap_local_page(iov.bv_page);
> > 
> >         to = kaddr + iov.bv_offset;
> >         WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> > 
> > -       kunmap_atomic(kaddr);
> > +       kunmap_local(kaddr);
> > 
> >         return 0;
> >  
> >  }
> > 
> > --
> > 2.39.2





^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 3/8] vringh: replace kmap_atomic() with kmap_local_page()
  2023-03-15 21:12     ` Fabio M. De Francesco
@ 2023-03-16  2:53       ` Jason Wang
  2023-03-16  8:09       ` Stefano Garzarella
  1 sibling, 0 replies; 40+ messages in thread
From: Jason Wang @ 2023-03-16  2:53 UTC (permalink / raw)
  To: Fabio M. De Francesco
  Cc: Stefano Garzarella, virtualization, Andrey Zhadchenko, eperezma,
	netdev, stefanha, linux-kernel, Michael S. Tsirkin, kvm

On Thu, Mar 16, 2023 at 5:12 AM Fabio M. De Francesco
<fmdefrancesco@gmail.com> wrote:
>
> On Tuesday 14 March 2023 04:56:08 CET Jason Wang wrote:
> > On Thu, Mar 2, 2023 at 7:34 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > kmap_atomic() is deprecated in favor of kmap_local_page().
> >
> > It's better to mention the commit or code that introduces this.
> >
> > > With kmap_local_page() the mappings are per thread, CPU local, can take
> > > page-faults, and can be called from any context (including interrupts).
> > > Furthermore, the tasks can be preempted and, when they are scheduled to
> > > run again, the kernel virtual addresses are restored and still valid.
> > >
> > > kmap_atomic() is implemented like a kmap_local_page() which also disables
> > > page-faults and preemption (the latter only for !PREEMPT_RT kernels,
> > > otherwise it only disables migration).
> > >
> > > The code within the mappings/un-mappings in getu16_iotlb() and
> > > putu16_iotlb() doesn't depend on the above-mentioned side effects of
> > > kmap_atomic(),
> >
> > Note that we used to use a spinlock to protect the simulators (at least
> > until patch 7, so we probably need to reorder the patches), so I
> > think this is only valid when:
> >
> > The vringh IOTLB helpers are not used in atomic context (e.g. spinlock,
> > interrupts).
>
> I'm probably missing some context, but it looks like you are saying that
> kmap_local_page() is not suited for any use in atomic context (you are
> mentioning spinlocks).
>
> The commit message (that I know pretty well since it's the exact copy, word by
> word, of my boilerplate commits) explains that kmap_local_page() is perfectly
> usable in atomic context (including interrupts).

Thanks for the confirmation; I misread the changelog and thought it
said it can't be used in interrupts.

>
> I don't know this code; however, I can't see why these vringh IOTLB
> helpers cannot work when used under spinlocks. Can you please elaborate a
> little more?

My fault, see above.

>
> > If yes, should we document this? (Or should we introduce a boolean to
> > say whether an IOTLB variant can be used in an atomic context)?
>
> Again, you'll have no problems using kmap_local_page(), so you don't need
> any boolean to tell whether or not the code is running in atomic context.
>
> Please take a look at the Highmem documentation which has been recently
> reworked and extended by me: https://docs.kernel.org/mm/highmem.html

This is really helpful.

>
> Anyway, I have been at the keyboard for 12 or 13 hours in a row, so I'm
> probably missing the whole picture.

It's me who missed the motivation for kmap_local().

Thanks

>
> Thanks,
>
> Fabio
>
> > Thanks
> >
> > > so that a mere replacement of the old API with the new one
> > > is all that is required (i.e., there is no need to explicitly add calls
> > > to pagefault_disable() and/or preempt_disable()).
> > >
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > ---
> > >
> > > Notes:
> > >     v2:
> > >     - added this patch since checkpatch.pl complained about deprecation
> > >
> > >       of kmap_atomic() touched by next patch
> > >
> > >  drivers/vhost/vringh.c | 8 ++++----
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> > > index a1e27da54481..0ba3ef809e48 100644
> > > --- a/drivers/vhost/vringh.c
> > > +++ b/drivers/vhost/vringh.c
> > > @@ -1220,10 +1220,10 @@ static inline int getu16_iotlb(const struct vringh
> > > *vrh,>
> > >         if (ret < 0)
> > >
> > >                 return ret;
> > >
> > > -       kaddr = kmap_atomic(iov.bv_page);
> > > +       kaddr = kmap_local_page(iov.bv_page);
> > >
> > >         from = kaddr + iov.bv_offset;
> > >         *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> > >
> > > -       kunmap_atomic(kaddr);
> > > +       kunmap_local(kaddr);
> > >
> > >         return 0;
> > >
> > >  }
> > >
> > > @@ -1241,10 +1241,10 @@ static inline int putu16_iotlb(const struct vringh
> > > *vrh,>
> > >         if (ret < 0)
> > >
> > >                 return ret;
> > >
> > > -       kaddr = kmap_atomic(iov.bv_page);
> > > +       kaddr = kmap_local_page(iov.bv_page);
> > >
> > >         to = kaddr + iov.bv_offset;
> > >         WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> > >
> > > -       kunmap_atomic(kaddr);
> > > +       kunmap_local(kaddr);
> > >
> > >         return 0;
> > >
> > >  }
> > >
> > > --
> > > 2.39.2
>
>
>
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 3/8] vringh: replace kmap_atomic() with kmap_local_page()
  2023-03-15 21:12     ` Fabio M. De Francesco
  2023-03-16  2:53       ` Jason Wang
@ 2023-03-16  8:09       ` Stefano Garzarella
  2023-03-16  9:25         ` Fabio M. De Francesco
  1 sibling, 1 reply; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-16  8:09 UTC (permalink / raw)
  To: Fabio M. De Francesco
  Cc: Jason Wang, virtualization, Andrey Zhadchenko, eperezma, netdev,
	stefanha, linux-kernel, Michael S. Tsirkin, kvm

On Wed, Mar 15, 2023 at 10:12 PM Fabio M. De Francesco
<fmdefrancesco@gmail.com> wrote:
>
> > On Tuesday 14 March 2023 04:56:08 CET Jason Wang wrote:
> > > On Thu, Mar 2, 2023 at 7:34 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > kmap_atomic() is deprecated in favor of kmap_local_page().
> >
> > It's better to mention the commit or code that introduces this.
> >
> > > With kmap_local_page() the mappings are per thread, CPU local, can take
> > > page-faults, and can be called from any context (including interrupts).
> > > Furthermore, the tasks can be preempted and, when they are scheduled to
> > > run again, the kernel virtual addresses are restored and still valid.
> > >
> > > kmap_atomic() is implemented like a kmap_local_page() which also disables
> > > page-faults and preemption (the latter only for !PREEMPT_RT kernels,
> > > otherwise it only disables migration).
> > >
> > > The code within the mappings/un-mappings in getu16_iotlb() and
> > > putu16_iotlb() doesn't depend on the above-mentioned side effects of
> > > kmap_atomic(),
> >
> > Note that we used to use a spinlock to protect the simulators (at least
> > until patch 7, so we probably need to reorder the patches), so I
> > think this is only valid when:
> >
> > The vringh IOTLB helpers are not used in atomic context (e.g. spinlock,
> > interrupts).
>
> I'm probably missing some context, but it looks like you are saying that
> kmap_local_page() is not suited for any use in atomic context (you are
> mentioning spinlocks).
>
> The commit message (that I know pretty well since it's the exact copy, word by
> word, of my boilerplate commits)

I hope that's not a problem for you; should I mention it somehow?

I searched for the last commits that made a similar change and found
yours, which explained it perfectly ;-)

Do I need to rephrase?

> explains that kmap_local_page() is perfectly
> usable in atomic context (including interrupts).
>
> I don't know this code; however, I can't see why these vringh IOTLB
> helpers cannot work when used under spinlocks. Can you please elaborate a
> little more?
>
> > If yes, should we document this? (Or should we introduce a boolean to
> > say whether an IOTLB variant can be used in an atomic context)?
>
> Again, you'll have no problems using kmap_local_page(), so you don't need
> any boolean to tell whether or not the code is running in atomic context.
>
> Please take a look at the Highmem documentation which has been recently
> reworked and extended by me: https://docs.kernel.org/mm/highmem.html
>
> Anyway, I have been at the keyboard for 12 or 13 hours in a row, so I'm
> probably missing the whole picture.

Thanks for your useful info!
Stefano


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 1/8] vdpa: add bind_mm/unbind_mm callbacks
  2023-03-14  3:39   ` Jason Wang
@ 2023-03-16  8:17     ` Stefano Garzarella
  0 siblings, 0 replies; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-16  8:17 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Andrey Zhadchenko, eperezma, netdev, stefanha,
	linux-kernel, Michael S. Tsirkin, kvm

On Tue, Mar 14, 2023 at 11:39:42AM +0800, Jason Wang wrote:
>On Thu, Mar 2, 2023 at 7:34 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> These new optional callbacks are used to bind/unbind the device to/from
>> a specific address space so the vDPA framework can use VA when
>> these callbacks are implemented.
>>
>> Suggested-by: Jason Wang <jasowang@redhat.com>
>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> ---
>
>One thing that came into my mind is that after this commit:
>
>commit 5ce995f313ce56c0c62425c3ddc37c5c50fc33db
>Author: Jason Wang <jasowang@redhat.com>
>Date:   Fri May 29 16:02:59 2020 +0800
>
>    vhost: use mmgrab() instead of mmget() for non worker device
>
>    For the device that doesn't use vhost worker and use_mm(), mmget() is
>    too heavy weight and it may brings troubles for implementing mmap()
>    support for vDPA device.
>
>We don't hold the address space after this commit, so the userspace
>mapping could be invalid if the owner exits?

Thanks for mentioning it, I'll take a look at it!

In that case, maybe I should do an mmget() (or get_task_mm()) in vhost-vdpa
before calling the callback, or in the parent driver inside the callback, but
that seems to duplicate code.
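
Something like this in vhost_vdpa_bind_mm(), I mean (just a rough sketch
to show the idea; error paths and the matching mmput() in the unbind
path are omitted):

static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
{
        struct vdpa_device *vdpa = v->vdpa;
        const struct vdpa_config_ops *ops = vdpa->config;

        if (!vdpa->use_va || !ops->bind_mm)
                return 0;

        /* Hold a reference on the address space so the user
         * mappings stay valid even if the owner exits.
         */
        if (!mmget_not_zero(v->vdev.mm))
                return -ESRCH;

        return ops->bind_mm(vdpa, v->vdev.mm);
}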

Thanks,
Stefano

>
>Thanks
>
>>
>> Notes:
>>     v2:
>>     - removed `struct task_struct *owner` param (unused for now, maybe
>>       useful to support cgroups) [Jason]
>>     - add unbind_mm callback [Jason]
>>
>>  include/linux/vdpa.h | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
>> index 43f59ef10cc9..369c21394284 100644
>> --- a/include/linux/vdpa.h
>> +++ b/include/linux/vdpa.h
>> @@ -290,6 +290,14 @@ struct vdpa_map_file {
>>   *                             @vdev: vdpa device
>>   *                             @idx: virtqueue index
>>   *                             Returns pointer to structure device or error (NULL)
>> + * @bind_mm:                   Bind the device to a specific address space
>> + *                             so the vDPA framework can use VA when this
>> + *                             callback is implemented. (optional)
>> + *                             @vdev: vdpa device
>> + *                             @mm: address space to bind
>> + * @unbind_mm:                 Unbind the device from the address space
>> + *                             bound using the bind_mm callback. (optional)
>> + *                             @vdev: vdpa device
>>   * @free:                      Free resources that belongs to vDPA (optional)
>>   *                             @vdev: vdpa device
>>   */
>> @@ -351,6 +359,8 @@ struct vdpa_config_ops {
>>         int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
>>                               unsigned int asid);
>>         struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
>> +       int (*bind_mm)(struct vdpa_device *vdev, struct mm_struct *mm);
>> +       void (*unbind_mm)(struct vdpa_device *vdev);
>>
>>         /* Free device resources */
>>         void (*free)(struct vdpa_device *vdev);
>> --
>> 2.39.2
>>
>



* Re: [PATCH v2 2/8] vhost-vdpa: use bind_mm/unbind_mm device callbacks
  2023-03-14  3:48   ` Jason Wang
@ 2023-03-16  8:31     ` Stefano Garzarella
  2023-03-16 10:11       ` Stefano Garzarella
  0 siblings, 1 reply; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-16  8:31 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Andrey Zhadchenko, eperezma, netdev, stefanha,
	linux-kernel, Michael S. Tsirkin, kvm

On Tue, Mar 14, 2023 at 11:48:33AM +0800, Jason Wang wrote:
>On Thu, Mar 2, 2023 at 7:34 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> When the user calls the VHOST_SET_OWNER ioctl and the vDPA device
>> has `use_va` set to true, let's call the bind_mm callback.
>> In this way we can bind the device to the user address space
>> and directly use the user VA.
>>
>> The unbind_mm callback is called during the release after
>> stopping the device.
>>
>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> ---
>>
>> Notes:
>>     v2:
>>     - call the new unbind_mm callback during the release [Jason]
>>     - avoid to call bind_mm callback after the reset, since the device
>>       is not detaching it now during the reset
>>
>>  drivers/vhost/vdpa.c | 30 ++++++++++++++++++++++++++++++
>>  1 file changed, 30 insertions(+)
>>
>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>> index dc12dbd5b43b..1ab89fccd825 100644
>> --- a/drivers/vhost/vdpa.c
>> +++ b/drivers/vhost/vdpa.c
>> @@ -219,6 +219,28 @@ static int vhost_vdpa_reset(struct vhost_vdpa *v)
>>         return vdpa_reset(vdpa);
>>  }
>>
>> +static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
>> +{
>> +       struct vdpa_device *vdpa = v->vdpa;
>> +       const struct vdpa_config_ops *ops = vdpa->config;
>> +
>> +       if (!vdpa->use_va || !ops->bind_mm)
>> +               return 0;
>> +
>> +       return ops->bind_mm(vdpa, v->vdev.mm);
>> +}
>> +
>> +static void vhost_vdpa_unbind_mm(struct vhost_vdpa *v)
>> +{
>> +       struct vdpa_device *vdpa = v->vdpa;
>> +       const struct vdpa_config_ops *ops = vdpa->config;
>> +
>> +       if (!vdpa->use_va || !ops->unbind_mm)
>> +               return;
>> +
>> +       ops->unbind_mm(vdpa);
>> +}
>> +
>>  static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp)
>>  {
>>         struct vdpa_device *vdpa = v->vdpa;
>> @@ -711,6 +733,13 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
>>                 break;
>>         default:
>>                 r = vhost_dev_ioctl(&v->vdev, cmd, argp);
>> +               if (!r && cmd == VHOST_SET_OWNER) {
>> +                       r = vhost_vdpa_bind_mm(v);
>> +                       if (r) {
>> +                               vhost_dev_reset_owner(&v->vdev, NULL);
>> +                               break;
>> +                       }
>> +               }
>
>Nit: is it better to have a new condition/switch branch instead of
>putting them under default? (as vring_ioctl does).

Yep, I agree!

I'll change it.

Thanks,
Stefano



* Re: [PATCH v2 4/8] vringh: support VA with iotlb
  2023-03-14  4:53   ` Jason Wang
@ 2023-03-16  8:38     ` Stefano Garzarella
  0 siblings, 0 replies; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-16  8:38 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Andrey Zhadchenko, eperezma, netdev, stefanha,
	linux-kernel, Michael S. Tsirkin, kvm

On Tue, Mar 14, 2023 at 12:53:57PM +0800, Jason Wang wrote:
>On Thu, Mar 2, 2023 at 7:35 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> vDPA supports the possibility to use user VA in the iotlb messages.
>> So, let's add support for user VA in vringh to use it in the vDPA
>> simulators.
>>
>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> ---
>>
>> Notes:
>>     v2:
>>     - replace kmap_atomic() with kmap_local_page() [see previous patch]
>>     - fix cast warnings when build with W=1 C=1
>>
>>  include/linux/vringh.h            |   5 +-
>>  drivers/vdpa/mlx5/net/mlx5_vnet.c |   2 +-
>>  drivers/vdpa/vdpa_sim/vdpa_sim.c  |   4 +-
>>  drivers/vhost/vringh.c            | 247 ++++++++++++++++++++++++------
>>  4 files changed, 205 insertions(+), 53 deletions(-)
>>
>> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
>> index 1991a02c6431..d39b9f2dcba0 100644
>> --- a/include/linux/vringh.h
>> +++ b/include/linux/vringh.h
>> @@ -32,6 +32,9 @@ struct vringh {
>>         /* Can we get away with weak barriers? */
>>         bool weak_barriers;
>>
>> +       /* Use user's VA */
>> +       bool use_va;
>> +
>>         /* Last available index we saw (ie. where we're up to). */
>>         u16 last_avail_idx;
>>
>> @@ -279,7 +282,7 @@ void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
>>                       spinlock_t *iotlb_lock);
>>
>>  int vringh_init_iotlb(struct vringh *vrh, u64 features,
>> -                     unsigned int num, bool weak_barriers,
>> +                     unsigned int num, bool weak_barriers, bool use_va,
>>                       struct vring_desc *desc,
>>                       struct vring_avail *avail,
>>                       struct vring_used *used);
>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> index 3a0e721aef05..babc8dd171a6 100644
>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> @@ -2537,7 +2537,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
>>
>>         if (mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
>>                 err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
>> -                                       MLX5_CVQ_MAX_ENT, false,
>> +                                       MLX5_CVQ_MAX_ENT, false, false,
>>                                         (struct vring_desc *)(uintptr_t)cvq->desc_addr,
>>                                         (struct vring_avail *)(uintptr_t)cvq->driver_addr,
>>                                         (struct vring_used *)(uintptr_t)cvq->device_addr);
>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> index 6a0a65814626..481eb156658b 100644
>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> @@ -60,7 +60,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
>>         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
>>         uint16_t last_avail_idx = vq->vring.last_avail_idx;
>>
>> -       vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
>> +       vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
>>                           (struct vring_desc *)(uintptr_t)vq->desc_addr,
>>                           (struct vring_avail *)
>>                           (uintptr_t)vq->driver_addr,
>> @@ -81,7 +81,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
>>         vq->cb = NULL;
>>         vq->private = NULL;
>>         vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features,
>> -                         VDPASIM_QUEUE_MAX, false, NULL, NULL, NULL);
>> +                         VDPASIM_QUEUE_MAX, false, false, NULL, NULL, NULL);
>>
>>         vq->vring.notify = NULL;
>>  }
>> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
>> index 0ba3ef809e48..61c79cea44ca 100644
>> --- a/drivers/vhost/vringh.c
>> +++ b/drivers/vhost/vringh.c
>> @@ -1094,15 +1094,99 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
>>
>>  #if IS_REACHABLE(CONFIG_VHOST_IOTLB)
>>
>> -static int iotlb_translate(const struct vringh *vrh,
>> -                          u64 addr, u64 len, u64 *translated,
>> -                          struct bio_vec iov[],
>> -                          int iov_size, u32 perm)
>> +static int iotlb_translate_va(const struct vringh *vrh,
>> +                             u64 addr, u64 len, u64 *translated,
>> +                             struct iovec iov[],
>> +                             int iov_size, u32 perm)
>>  {
>>         struct vhost_iotlb_map *map;
>>         struct vhost_iotlb *iotlb = vrh->iotlb;
>> +       u64 s = 0, last = addr + len - 1;
>>         int ret = 0;
>> +
>> +       spin_lock(vrh->iotlb_lock);
>> +
>> +       while (len > s) {
>> +               u64 size;
>> +
>> +               if (unlikely(ret >= iov_size)) {
>> +                       ret = -ENOBUFS;
>> +                       break;
>> +               }
>> +
>> +               map = vhost_iotlb_itree_first(iotlb, addr, last);
>> +               if (!map || map->start > addr) {
>> +                       ret = -EINVAL;
>> +                       break;
>> +               } else if (!(map->perm & perm)) {
>> +                       ret = -EPERM;
>> +                       break;
>> +               }
>> +
>> +               size = map->size - addr + map->start;
>> +               iov[ret].iov_len = min(len - s, size);
>> +               iov[ret].iov_base = (void __user *)(unsigned long)
>> +                                   (map->addr + addr - map->start);
>> +               s += size;
>> +               addr += size;
>> +               ++ret;
>> +       }
>> +
>> +       spin_unlock(vrh->iotlb_lock);
>> +
>> +       if (translated)
>> +               *translated = min(len, s);
>> +
>> +       return ret;
>> +}
>> +
>> +static inline int copy_from_va(const struct vringh *vrh, void *dst, void *src,
>> +                              u64 len, u64 *translated)
>> +{
>> +       struct iovec iov[16];
>> +       struct iov_iter iter;
>> +       int ret;
>> +
>> +       ret = iotlb_translate_va(vrh, (u64)(uintptr_t)src, len, translated, iov,
>> +                                ARRAY_SIZE(iov), VHOST_MAP_RO);
>> +       if (ret == -ENOBUFS)
>> +               ret = ARRAY_SIZE(iov);
>> +       else if (ret < 0)
>> +               return ret;
>> +
>> +       iov_iter_init(&iter, ITER_SOURCE, iov, ret, *translated);
>> +
>> +       return copy_from_iter(dst, *translated, &iter);
>> +}
>> +
>> +static inline int copy_to_va(const struct vringh *vrh, void *dst, void *src,
>> +                            u64 len, u64 *translated)
>> +{
>> +       struct iovec iov[16];
>> +       struct iov_iter iter;
>> +       int ret;
>> +
>> +       ret = iotlb_translate_va(vrh, (u64)(uintptr_t)dst, len, translated, iov,
>> +                                ARRAY_SIZE(iov), VHOST_MAP_WO);
>> +       if (ret == -ENOBUFS)
>> +               ret = ARRAY_SIZE(iov);
>> +       else if (ret < 0)
>> +               return ret;
>> +
>> +       iov_iter_init(&iter, ITER_DEST, iov, ret, *translated);
>> +
>> +       return copy_to_iter(src, *translated, &iter);
>> +}
>> +
>> +static int iotlb_translate_pa(const struct vringh *vrh,
>> +                             u64 addr, u64 len, u64 *translated,
>> +                             struct bio_vec iov[],
>> +                             int iov_size, u32 perm)
>> +{
>> +       struct vhost_iotlb_map *map;
>> +       struct vhost_iotlb *iotlb = vrh->iotlb;
>>         u64 s = 0, last = addr + len - 1;
>> +       int ret = 0;
>>
>>         spin_lock(vrh->iotlb_lock);
>>
>> @@ -1141,28 +1225,61 @@ static int iotlb_translate(const struct vringh *vrh,
>>         return ret;
>>  }
>>
>> +static inline int copy_from_pa(const struct vringh *vrh, void *dst, void *src,
>> +                              u64 len, u64 *translated)
>> +{
>> +       struct bio_vec iov[16];
>> +       struct iov_iter iter;
>> +       int ret;
>> +
>> +       ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)src, len, translated, iov,
>> +                                ARRAY_SIZE(iov), VHOST_MAP_RO);
>> +       if (ret == -ENOBUFS)
>> +               ret = ARRAY_SIZE(iov);
>> +       else if (ret < 0)
>> +               return ret;
>> +
>> +       iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, *translated);
>> +
>> +       return copy_from_iter(dst, *translated, &iter);
>> +}
>> +
>> +static inline int copy_to_pa(const struct vringh *vrh, void *dst, void *src,
>> +                            u64 len, u64 *translated)
>> +{
>> +       struct bio_vec iov[16];
>> +       struct iov_iter iter;
>> +       int ret;
>> +
>> +       ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)dst, len, translated, iov,
>> +                                ARRAY_SIZE(iov), VHOST_MAP_WO);
>> +       if (ret == -ENOBUFS)
>> +               ret = ARRAY_SIZE(iov);
>> +       else if (ret < 0)
>> +               return ret;
>> +
>> +       iov_iter_bvec(&iter, ITER_DEST, iov, ret, *translated);
>> +
>> +       return copy_to_iter(src, *translated, &iter);
>> +}
>> +
>>  static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
>>                                   void *src, size_t len)
>>  {
>>         u64 total_translated = 0;
>>
>>         while (total_translated < len) {
>> -               struct bio_vec iov[16];
>> -               struct iov_iter iter;
>>                 u64 translated;
>>                 int ret;
>>
>> -               ret = iotlb_translate(vrh, (u64)(uintptr_t)src,
>> -                                     len - total_translated, &translated,
>> -                                     iov, ARRAY_SIZE(iov), VHOST_MAP_RO);
>> -               if (ret == -ENOBUFS)
>> -                       ret = ARRAY_SIZE(iov);
>> -               else if (ret < 0)
>> -                       return ret;
>> -
>> -               iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, translated);
>> +               if (vrh->use_va) {
>> +                       ret = copy_from_va(vrh, dst, src,
>> +                                          len - total_translated, &translated);
>> +               } else {
>> +                       ret = copy_from_pa(vrh, dst, src,
>> +                                          len - total_translated, &translated);
>> +               }
>>
>> -               ret = copy_from_iter(dst, translated, &iter);
>>                 if (ret < 0)
>>                         return ret;
>>
>> @@ -1180,22 +1297,17 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
>>         u64 total_translated = 0;
>>
>>         while (total_translated < len) {
>> -               struct bio_vec iov[16];
>> -               struct iov_iter iter;
>>                 u64 translated;
>>                 int ret;
>>
>> -               ret = iotlb_translate(vrh, (u64)(uintptr_t)dst,
>> -                                     len - total_translated, &translated,
>> -                                     iov, ARRAY_SIZE(iov), VHOST_MAP_WO);
>> -               if (ret == -ENOBUFS)
>> -                       ret = ARRAY_SIZE(iov);
>> -               else if (ret < 0)
>> -                       return ret;
>> -
>> -               iov_iter_bvec(&iter, ITER_DEST, iov, ret, translated);
>> +               if (vrh->use_va) {
>> +                       ret = copy_to_va(vrh, dst, src,
>> +                                        len - total_translated, &translated);
>> +               } else {
>> +                       ret = copy_to_pa(vrh, dst, src,
>> +                                        len - total_translated, &translated);
>> +               }
>>
>> -               ret = copy_to_iter(src, translated, &iter);
>>                 if (ret < 0)
>>                         return ret;
>>
>> @@ -1210,20 +1322,37 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
>>  static inline int getu16_iotlb(const struct vringh *vrh,
>>                                u16 *val, const __virtio16 *p)
>>  {
>> -       struct bio_vec iov;
>> -       void *kaddr, *from;
>>         int ret;
>>
>>         /* Atomic read is needed for getu16 */
>> -       ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
>> -                             &iov, 1, VHOST_MAP_RO);
>> -       if (ret < 0)
>> -               return ret;
>> +       if (vrh->use_va) {
>> +               struct iovec iov;
>> +               __virtio16 tmp;
>> +
>> +               ret = iotlb_translate_va(vrh, (u64)(uintptr_t)p, sizeof(*p),
>> +                                        NULL, &iov, 1, VHOST_MAP_RO);
>> +               if (ret < 0)
>> +                       return ret;
>
>Nit: since we have copy_to_va/copy_to_pa variants, let's introduce
>getu16_iotlb_va/pa variants?

Yep!
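
I.e., something like this, just moving the VA path above into its own
helper (a sketch based on the v2 hunk; the _pa variant would keep the
kmap_local_page() path):

	static inline int getu16_iotlb_va(const struct vringh *vrh,
					  u16 *val, const __virtio16 *p)
	{
		struct iovec iov;
		__virtio16 tmp;
		int ret;

		/* Atomic read is needed for getu16 */
		ret = iotlb_translate_va(vrh, (u64)(uintptr_t)p, sizeof(*p),
					 NULL, &iov, 1, VHOST_MAP_RO);
		if (ret < 0)
			return ret;

		ret = __get_user(tmp, (__virtio16 __user *)iov.iov_base);
		if (ret)
			return ret;

		*val = vringh16_to_cpu(vrh, tmp);
		return 0;
	}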

>
>>
>> -       kaddr = kmap_local_page(iov.bv_page);
>> -       from = kaddr + iov.bv_offset;
>> -       *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
>> -       kunmap_local(kaddr);
>> +               ret = __get_user(tmp, (__virtio16 __user *)iov.iov_base);
>> +               if (ret)
>> +                       return ret;
>> +
>> +               *val = vringh16_to_cpu(vrh, tmp);
>> +       } else {
>> +               struct bio_vec iov;
>> +               void *kaddr, *from;
>> +
>> +               ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)p, sizeof(*p),
>> +                                        NULL, &iov, 1, VHOST_MAP_RO);
>> +               if (ret < 0)
>> +                       return ret;
>> +
>> +               kaddr = kmap_local_page(iov.bv_page);
>
>If we decide to have a use_va switch, is kmap_local_page() still required here?
>

I think so. This is related to the email where Fabio clarified this for
us, right?

>Other looks good.

Thanks,
Stefano



* Re: [PATCH v2 7/8] vdpa_sim: replace the spinlock with a mutex to protect the state
  2023-03-14  5:31     ` Jason Wang
@ 2023-03-16  8:42       ` Stefano Garzarella
  0 siblings, 0 replies; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-16  8:42 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Andrey Zhadchenko, eperezma, netdev, stefanha,
	linux-kernel, Michael S. Tsirkin, kvm

On Tue, Mar 14, 2023 at 01:31:25PM +0800, Jason Wang wrote:
>On Tue, Mar 14, 2023 at 1:29 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On Thu, Mar 2, 2023 at 7:35 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> >
>> > The spinlock we use to protect the state of the simulator is sometimes
>> > held for a long time (for example, when devices handle requests).
>> >
>> > This also prevents us from calling functions that might sleep (such as
>> > kthread_flush_work() in the next patch), and thus having to release
>> > and retake the lock.
>> >
>> > For these reasons, let's replace the spinlock with a mutex that gives
>> > us more flexibility.
>> >
>> > Suggested-by: Jason Wang <jasowang@redhat.com>
>> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>
>> Acked-by: Jason Wang <jasowang@redhat.com>
>>
>> Thanks
>
>Btw, though it looks fine, we'd better double-check that virtio_vdpa works well.

I tested it, but I will test it again more carefully to make sure
everything is okay.

>
>(I think so since there's a transport that might sleep).

I see.

Thanks,
Stefano



* Re: [PATCH v2 8/8] vdpa_sim: add support for user VA
  2023-03-14  5:36   ` Jason Wang
@ 2023-03-16  9:11     ` Stefano Garzarella
  0 siblings, 0 replies; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-16  9:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Andrey Zhadchenko, eperezma, netdev, stefanha,
	linux-kernel, Michael S. Tsirkin, kvm

On Tue, Mar 14, 2023 at 01:36:13PM +0800, Jason Wang wrote:
>On Thu, Mar 2, 2023 at 7:35 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> The new "use_va" module parameter (default: false) is used in
>> vdpa_alloc_device() to inform the vDPA framework that the device
>> supports VA.
>>
>> vringh is initialized to use VA only when "use_va" is true and the
>> user's mm has been bound. So, only when the bus supports user VA
>> (e.g. vhost-vdpa).
>>
>> The vdpasim_mm_work_fn work is used to attach the kthread to the user
>> address space when the .bind_mm callback is invoked, and to detach
>> it when the .unbind_mm callback is invoked.
>>
>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> ---
>>
>> Notes:
>>     v2:
>>     - `use_va` set to true by default [Eugenio]
>>     - supported the new unbind_mm callback [Jason]
>>     - removed the unbind_mm call in vdpasim_do_reset() [Jason]
>>     - avoided to release the lock while call kthread_flush_work() since we
>>       are now using a mutex to protect the device state
>>
>>  drivers/vdpa/vdpa_sim/vdpa_sim.h |  1 +
>>  drivers/vdpa/vdpa_sim/vdpa_sim.c | 98 +++++++++++++++++++++++++++++++-
>>  2 files changed, 97 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
>> index 4774292fba8c..3a42887d05d9 100644
>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
>> @@ -59,6 +59,7 @@ struct vdpasim {
>>         struct vdpasim_virtqueue *vqs;
>>         struct kthread_worker *worker;
>>         struct kthread_work work;
>> +       struct mm_struct *mm_bound;
>>         struct vdpasim_dev_attr dev_attr;
>>         /* mutex to synchronize virtqueue state */
>>         struct mutex mutex;
>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> index a28103a67ae7..eda26bc33df5 100644
>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> @@ -35,10 +35,77 @@ module_param(max_iotlb_entries, int, 0444);
>>  MODULE_PARM_DESC(max_iotlb_entries,
>>                  "Maximum number of iotlb entries for each address space. 0 means unlimited. (default: 2048)");
>>
>> +static bool use_va = true;
>> +module_param(use_va, bool, 0444);
>> +MODULE_PARM_DESC(use_va, "Enable/disable the device's ability to use VA");
>> +
>>  #define VDPASIM_QUEUE_ALIGN PAGE_SIZE
>>  #define VDPASIM_QUEUE_MAX 256
>>  #define VDPASIM_VENDOR_ID 0
>>
>> +struct vdpasim_mm_work {
>> +       struct kthread_work work;
>> +       struct mm_struct *mm;
>> +       bool bind;
>> +       int ret;
>> +};
>> +
>> +static void vdpasim_mm_work_fn(struct kthread_work *work)
>> +{
>> +       struct vdpasim_mm_work *mm_work =
>> +               container_of(work, struct vdpasim_mm_work, work);
>> +
>> +       mm_work->ret = 0;
>> +
>> +       if (mm_work->bind) {
>> +               kthread_use_mm(mm_work->mm);
>> +               //TODO: should we attach the cgroup of the mm owner?
>> +       } else {
>> +               kthread_unuse_mm(mm_work->mm);
>> +       }
>> +}
>> +
>> +static void vdpasim_worker_queue_mm(struct vdpasim *vdpasim,
>> +                                   struct vdpasim_mm_work *mm_work)
>> +{
>
>Nit: we need to tweak the name, as it also flushes the work besides queuing it.

Yep, or split it into 2 functions.
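
For the rename option, a quick sketch (the name is just a placeholder):

	/* Queue the mm work *and* wait for the worker to complete it. */
	static void vdpasim_worker_queue_mm_and_wait(struct vdpasim *vdpasim,
						     struct vdpasim_mm_work *mm_work)
	{
		struct kthread_work *work = &mm_work->work;

		kthread_init_work(work, vdpasim_mm_work_fn);
		kthread_queue_work(vdpasim->worker, work);

		/* make sure the bind/unbind took effect before returning */
		kthread_flush_work(work);
	}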

>
>> +       struct kthread_work *work = &mm_work->work;
>> +
>> +       kthread_init_work(work, vdpasim_mm_work_fn);
>> +       kthread_queue_work(vdpasim->worker, work);
>> +
>> +       kthread_flush_work(work);
>> +}
>> +
>> +static int vdpasim_worker_bind_mm(struct vdpasim *vdpasim,
>> +                                 struct mm_struct *new_mm)
>> +{
>> +       struct vdpasim_mm_work mm_work;
>> +
>> +       mm_work.mm = new_mm;
>> +       mm_work.bind = true;
>> +
>> +       vdpasim_worker_queue_mm(vdpasim, &mm_work);
>> +
>> +       if (!mm_work.ret)
>> +               vdpasim->mm_bound = new_mm;
>> +
>> +       return mm_work.ret;
>> +}
>> +
>> +static void vdpasim_worker_unbind_mm(struct vdpasim *vdpasim)
>> +{
>> +       struct vdpasim_mm_work mm_work;
>> +
>> +       if (!vdpasim->mm_bound)
>> +               return;
>> +
>> +       mm_work.mm = vdpasim->mm_bound;
>> +       mm_work.bind = false;
>
>Can we simply use mm_work.mm = NULL for unbinding?
>
>> +
>> +       vdpasim_worker_queue_mm(vdpasim, &mm_work);
>> +
>> +       vdpasim->mm_bound = NULL;
>
>And change the mm_bound in the worker?

Yep, I need to put `vdpasim` in struct vdpasim_mm_work.

I'll do it in the next version.
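
Roughly like this (untested sketch; `mm_to_bind == NULL` means unbind and
the worker itself updates `mm_bound`):

	struct vdpasim_mm_work {
		struct kthread_work work;
		struct vdpasim *vdpasim;
		struct mm_struct *mm_to_bind;	/* NULL means unbind */
		int ret;
	};

	static void vdpasim_mm_work_fn(struct kthread_work *work)
	{
		struct vdpasim_mm_work *mm_work =
			container_of(work, struct vdpasim_mm_work, work);
		struct vdpasim *vdpasim = mm_work->vdpasim;

		mm_work->ret = 0;

		if (mm_work->mm_to_bind)
			kthread_use_mm(mm_work->mm_to_bind);
		else if (vdpasim->mm_bound)
			kthread_unuse_mm(vdpasim->mm_bound);

		vdpasim->mm_bound = mm_work->mm_to_bind;
	}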

Thanks,
Stefano



* Re: [PATCH v2 3/8] vringh: replace kmap_atomic() with kmap_local_page()
  2023-03-02 11:34 ` [PATCH v2 3/8] vringh: replace kmap_atomic() with kmap_local_page() Stefano Garzarella
  2023-03-14  3:56   ` Jason Wang
@ 2023-03-16  9:13   ` Fabio M. De Francesco
  2023-03-16  9:17     ` Stefano Garzarella
  1 sibling, 1 reply; 40+ messages in thread
From: Fabio M. De Francesco @ 2023-03-16  9:13 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: virtualization, Andrey Zhadchenko, eperezma, netdev, stefanha,
	linux-kernel, Jason Wang, Michael S. Tsirkin, kvm,
	Stefano Garzarella

On Thursday 2 March 2023 12:34:16 CET Stefano Garzarella wrote:
> kmap_atomic() is deprecated in favor of kmap_local_page().
> 
> With kmap_local_page() the mappings are per thread, CPU local, can take
> page-faults, and can be called from any context (including interrupts).
> Furthermore, the tasks can be preempted and, when they are scheduled to
> run again, the kernel virtual addresses are restored and still valid.
> 
> kmap_atomic() is implemented like a kmap_local_page() which also disables
> page-faults and preemption (the latter only for !PREEMPT_RT kernels,
> otherwise it only disables migration).
> 
> The code within the mappings/un-mappings in getu16_iotlb() and
> putu16_iotlb() doesn't depend on the above-mentioned side effects of
> kmap_atomic(), so that mere replacements of the old API with the new one
> is all that is required (i.e., there is no need to explicitly add calls
> to pagefault_disable() and/or preempt_disable()).

It seems that my commit message is quite clear and complete and therefore has 
already been reused by others who have somehow given me credit. 

I would really appreciate it being mentioned here that you are reusing a
"boiler plate" commit message of my own making, and that you Cc me :-)

Thanks,

Fabio

> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
> 
> Notes:
>     v2:
>     - added this patch since checkpatch.pl complained about deprecation
>       of kmap_atomic() touched by next patch
> 
>  drivers/vhost/vringh.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index a1e27da54481..0ba3ef809e48 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -1220,10 +1220,10 @@ static inline int getu16_iotlb(const struct vringh
> *vrh, if (ret < 0)
>  		return ret;
> 
> -	kaddr = kmap_atomic(iov.bv_page);
> +	kaddr = kmap_local_page(iov.bv_page);
>  	from = kaddr + iov.bv_offset;
>  	*val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> -	kunmap_atomic(kaddr);
> +	kunmap_local(kaddr);
> 
>  	return 0;
>  }
> @@ -1241,10 +1241,10 @@ static inline int putu16_iotlb(const struct vringh
> *vrh, if (ret < 0)
>  		return ret;
> 
> -	kaddr = kmap_atomic(iov.bv_page);
> +	kaddr = kmap_local_page(iov.bv_page);
>  	to = kaddr + iov.bv_offset;
>  	WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> -	kunmap_atomic(kaddr);
> +	kunmap_local(kaddr);
> 
>  	return 0;
>  }
> --
> 2.39.2






* Re: [PATCH v2 3/8] vringh: replace kmap_atomic() with kmap_local_page()
  2023-03-16  9:13   ` Fabio M. De Francesco
@ 2023-03-16  9:17     ` Stefano Garzarella
  0 siblings, 0 replies; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-16  9:17 UTC (permalink / raw)
  To: Fabio M. De Francesco
  Cc: virtualization, Andrey Zhadchenko, eperezma, netdev, stefanha,
	linux-kernel, Jason Wang, Michael S. Tsirkin, kvm

On Thu, Mar 16, 2023 at 10:13:39AM +0100, Fabio M. De Francesco wrote:
>On Thursday 2 March 2023 12:34:16 CET Stefano Garzarella wrote:
>> kmap_atomic() is deprecated in favor of kmap_local_page().
>>
>> With kmap_local_page() the mappings are per thread, CPU local, can take
>> page-faults, and can be called from any context (including interrupts).
>> Furthermore, the tasks can be preempted and, when they are scheduled to
>> run again, the kernel virtual addresses are restored and still valid.
>>
>> kmap_atomic() is implemented like a kmap_local_page() which also disables
>> page-faults and preemption (the latter only for !PREEMPT_RT kernels,
>> otherwise it only disables migration).
>>
>> The code within the mappings/un-mappings in getu16_iotlb() and
>> putu16_iotlb() doesn't depend on the above-mentioned side effects of
>> kmap_atomic(), so that mere replacements of the old API with the new one
>> is all that is required (i.e., there is no need to explicitly add calls
>> to pagefault_disable() and/or preempt_disable()).
>
>It seems that my commit message is quite clear and complete and therefore has
>already been reused by others who have somehow given me credit.
>
>I would really appreciate it being mentioned here that you are reusing a
>"boiler plate" commit message of my own making and Cc me :-)

Yes of course, sorry for not doing this previously!

Thanks,
Stefano



* Re: [PATCH v2 3/8] vringh: replace kmap_atomic() with kmap_local_page()
  2023-03-16  8:09       ` Stefano Garzarella
@ 2023-03-16  9:25         ` Fabio M. De Francesco
  0 siblings, 0 replies; 40+ messages in thread
From: Fabio M. De Francesco @ 2023-03-16  9:25 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Jason Wang, virtualization, Andrey Zhadchenko, eperezma, netdev,
	stefanha, linux-kernel, Michael S. Tsirkin, kvm

On Thursday 16 March 2023 09:09:29 CET Stefano Garzarella wrote:
> On Wed, Mar 15, 2023 at 10:12 PM Fabio M. De Francesco
> <fmdefrancesco@gmail.com> wrote:
> > On Tuesday 14 March 2023 04:56:08 CET Jason Wang wrote:
> > > On Thu, Mar 2, 2023 at 7:34 PM Stefano Garzarella <sgarzare@redhat.com>
> > > wrote:
> > > > kmap_atomic() is deprecated in favor of kmap_local_page().
> > > 
> > > It's better to mention the commit or code that introduces this.
> > > 
> > > > With kmap_local_page() the mappings are per thread, CPU local, can take
> > > > page-faults, and can be called from any context (including interrupts).
> > > > Furthermore, the tasks can be preempted and, when they are scheduled to
> > > > run again, the kernel virtual addresses are restored and still valid.
> > > > 
> > > > kmap_atomic() is implemented like a kmap_local_page() which also
> > > > disables
> > > > page-faults and preemption (the latter only for !PREEMPT_RT kernels,
> > > > otherwise it only disables migration).
> > > > 
> > > > The code within the mappings/un-mappings in getu16_iotlb() and
> > > > putu16_iotlb() doesn't depend on the above-mentioned side effects of
> > > > kmap_atomic(),
> > > 
> > > Note we used to use a spinlock to protect the simulators (at least until
> > > patch 7, so we probably need to re-order the patches at least), so I
> > > think this is only valid when:
> > > 
> > > The vringh IOTLB helpers are not used in atomic context (e.g. spinlock,
> > > interrupts).
> > 
> > I'm probably missing some context, but it looks like you are saying that
> > kmap_local_page() is not suited for any use in atomic context (you are
> > mentioning spinlocks).
> > 
> > The commit message (that I know pretty well since it's the exact copy, word
> > by word, of my boiler plate commits)
> 
> I hope it's not a problem for you; should I mention it somehow?

Sorry, I had missed your last message when I wrote another message a few
minutes ago in this thread.

Obviously, I'm happy that my commit message is being reused. As I said in
the other message, I would appreciate some kind of credit as the author.

I proposed a means you can use, but feel free to ignore my suggestion and
do it differently if you prefer.

Again thanks,

Fabio

> I searched for the last commits that made a similar change and found
> yours that explained it perfectly ;-)
> 
> Do I need to rephrase?
> 
> > explains that kmap_local_page() is perfectly
> > usable in atomic context (including interrupts).
> > 
> > I don't know this code; however, I am not able to see why these vringh IOTLB
> > helpers cannot work if used under spinlocks. Can you please elaborate a
> > little more?
> > 
> > > If yes, should we document this? (Or should we introduce a boolean to
> > > say whether an IOTLB variant can be used in an atomic context)?
> > 
> > Again, you'll have no problems from the use of kmap_local_page() and so you
> > don't need any boolean to tell whether or not the code is running in atomic
> > context.
> > 
> > Please take a look at the Highmem documentation, which has been recently
> > reworked and extended by me: https://docs.kernel.org/mm/highmem.html
> > 
> > Anyway, I have been ATK 12 or 13 hours in a row. So I'm probably missing the
> > whole picture.
> 
> Thanks for your useful info!
> Stefano






* Re: [PATCH v2 2/8] vhost-vdpa: use bind_mm/unbind_mm device callbacks
  2023-03-16  8:31     ` Stefano Garzarella
@ 2023-03-16 10:11       ` Stefano Garzarella
  0 siblings, 0 replies; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-16 10:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Andrey Zhadchenko, eperezma, netdev, stefanha,
	linux-kernel, Michael S. Tsirkin, kvm

On Thu, Mar 16, 2023 at 9:31 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Tue, Mar 14, 2023 at 11:48:33AM +0800, Jason Wang wrote:
> >On Thu, Mar 2, 2023 at 7:34 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >>
> >> When the user call VHOST_SET_OWNER ioctl and the vDPA device
> >> has `use_va` set to true, let's call the bind_mm callback.
> >> In this way we can bind the device to the user address space
> >> and directly use the user VA.
> >>
> >> The unbind_mm callback is called during the release after
> >> stopping the device.
> >>
> >> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> >> ---
> >>
> >> Notes:
> >>     v2:
> >>     - call the new unbind_mm callback during the release [Jason]
> >>     - avoid to call bind_mm callback after the reset, since the device
> >>       is not detaching it now during the reset
> >>
> >>  drivers/vhost/vdpa.c | 30 ++++++++++++++++++++++++++++++
> >>  1 file changed, 30 insertions(+)
> >>
> >> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> >> index dc12dbd5b43b..1ab89fccd825 100644
> >> --- a/drivers/vhost/vdpa.c
> >> +++ b/drivers/vhost/vdpa.c
> >> @@ -219,6 +219,28 @@ static int vhost_vdpa_reset(struct vhost_vdpa *v)
> >>         return vdpa_reset(vdpa);
> >>  }
> >>
> >> +static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
> >> +{
> >> +       struct vdpa_device *vdpa = v->vdpa;
> >> +       const struct vdpa_config_ops *ops = vdpa->config;
> >> +
> >> +       if (!vdpa->use_va || !ops->bind_mm)
> >> +               return 0;
> >> +
> >> +       return ops->bind_mm(vdpa, v->vdev.mm);
> >> +}
> >> +
> >> +static void vhost_vdpa_unbind_mm(struct vhost_vdpa *v)
> >> +{
> >> +       struct vdpa_device *vdpa = v->vdpa;
> >> +       const struct vdpa_config_ops *ops = vdpa->config;
> >> +
> >> +       if (!vdpa->use_va || !ops->unbind_mm)
> >> +               return;
> >> +
> >> +       ops->unbind_mm(vdpa);
> >> +}
> >> +
> >>  static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp)
> >>  {
> >>         struct vdpa_device *vdpa = v->vdpa;
> >> @@ -711,6 +733,13 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
> >>                 break;
> >>         default:
> >>                 r = vhost_dev_ioctl(&v->vdev, cmd, argp);
> >> +               if (!r && cmd == VHOST_SET_OWNER) {
> >> +                       r = vhost_vdpa_bind_mm(v);
> >> +                       if (r) {
> >> +                               vhost_dev_reset_owner(&v->vdev, NULL);
> >> +                               break;
> >> +                       }
> >> +               }
> >
> >Nit: is it better to have a new condition/switch branch instead of
> >putting them under default? (as what vring_ioctl did).
>
> Yep, I agree!
>
> I'll change it.

Or maybe I can simply add a `case VHOST_SET_OWNER` to this switch and call
vhost_dev_set_owner() and vhost_vdpa_bind_mm() there; I mean something like
this:

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 331d4a718bf6..20250c3418b2 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -731,15 +731,16 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
        case VHOST_VDPA_RESUME:
                r = vhost_vdpa_resume(v);
                break;
+       case VHOST_SET_OWNER:
+               r = vhost_dev_set_owner(d);
+               if (r)
+                       break;
+               r = vhost_vdpa_bind_mm(v);
+               if (r)
+                       vhost_dev_reset_owner(d, NULL);
+               break;
        default:
                r = vhost_dev_ioctl(&v->vdev, cmd, argp);
-               if (!r && cmd == VHOST_SET_OWNER) {
-                       r = vhost_vdpa_bind_mm(v);
-                       if (r) {
-                               vhost_dev_reset_owner(&v->vdev, NULL);
-                               break;
-                       }
-               }
                if (r == -ENOIOCTLCMD)
                        r = vhost_vdpa_vring_ioctl(v, cmd, argp);
                break;

WDYT?

Thanks,
Stefano



* Re: [PATCH v2 4/8] vringh: support VA with iotlb
  2023-03-03 14:38   ` Eugenio Perez Martin
  2023-03-07  9:31     ` Stefano Garzarella
@ 2023-03-16 16:07     ` Stefano Garzarella
  2023-03-17  2:53       ` Jason Wang
  2023-03-17  9:49       ` Eugenio Perez Martin
  1 sibling, 2 replies; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-16 16:07 UTC (permalink / raw)
  To: Eugenio Perez Martin, Jason Wang
  Cc: virtualization, Andrey Zhadchenko, netdev, stefanha,
	linux-kernel, Michael S. Tsirkin, kvm

On Fri, Mar 3, 2023 at 3:39 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Thu, Mar 2, 2023 at 12:35 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > vDPA supports the possibility to use user VA in the iotlb messages.
> > So, let's add support for user VA in vringh to use it in the vDPA
> > simulators.
> >
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > ---
> >
> > Notes:
> >     v2:
> >     - replace kmap_atomic() with kmap_local_page() [see previous patch]
> >     - fix cast warnings when build with W=1 C=1
> >
> >  include/linux/vringh.h            |   5 +-
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c |   2 +-
> >  drivers/vdpa/vdpa_sim/vdpa_sim.c  |   4 +-
> >  drivers/vhost/vringh.c            | 247 ++++++++++++++++++++++++------
> >  4 files changed, 205 insertions(+), 53 deletions(-)
> >

[...]

>
> It seems to me iotlb_translate_va and iotlb_translate_pa are very
> similar; their only difference is that the iov argument is an iovec
> instead of a bio_vec. And how to fill it, obviously.
>
> It would be great to merge both functions, only differing with a
> conditional on vrh->use_va, or generics, or similar. Or, if following
> the style of the rest of vringh code, to provide a callback to fill
> iovec (although I like conditional more).
>
> However I cannot think of an easy way to perform that without long
> macros or type erasure.

Thank you for pushing me :-)
I finally managed to avoid code duplication (partial patch attached,
but not yet fully tested).

@Jason: with this refactoring I removed copy_to_va/copy_to_pa, so I
also avoided getu16_iotlb_va/pa.

I will send the full patch in v3, but I would like to get your opinion
first ;-)



diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 0ba3ef809e48..71dd67700e36 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -1096,8 +1096,7 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
 
 static int iotlb_translate(const struct vringh *vrh,
 			   u64 addr, u64 len, u64 *translated,
-			   struct bio_vec iov[],
-			   int iov_size, u32 perm)
+			   void *iov, int iov_size, bool iovec, u32 perm)
 {
 	struct vhost_iotlb_map *map;
 	struct vhost_iotlb *iotlb = vrh->iotlb;
@@ -1107,7 +1106,7 @@ static int iotlb_translate(const struct vringh *vrh,
 	spin_lock(vrh->iotlb_lock);
 
 	while (len > s) {
-		u64 size, pa, pfn;
+		u64 size;
 
 		if (unlikely(ret >= iov_size)) {
 			ret = -ENOBUFS;
@@ -1124,10 +1123,22 @@ static int iotlb_translate(const struct vringh *vrh,
 		}
 
 		size = map->size - addr + map->start;
-		pa = map->addr + addr - map->start;
-		pfn = pa >> PAGE_SHIFT;
-		bvec_set_page(&iov[ret], pfn_to_page(pfn), min(len - s, size),
-			      pa & (PAGE_SIZE - 1));
+		if (iovec) {
+			struct iovec *iovec = iov;
+
+			iovec[ret].iov_len = min(len - s, size);
+			iovec[ret].iov_base = (void __user *)(unsigned long)
+					      (map->addr + addr - map->start);
+		} else {
+			u64 pa = map->addr + addr - map->start;
+			u64 pfn = pa >> PAGE_SHIFT;
+			struct bio_vec *bvec = iov;
+
+			bvec_set_page(&bvec[ret], pfn_to_page(pfn),
+				      min(len - s, size),
+				      pa & (PAGE_SIZE - 1));
+		}
+
 		s += size;
 		addr += size;
 		++ret;
@@ -1141,26 +1152,38 @@ static int iotlb_translate(const struct vringh *vrh,
 	return ret;
 }
 
+#define IOTLB_IOV_SIZE 16
+
 static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
 				  void *src, size_t len)
 {
 	u64 total_translated = 0;
 
 	while (total_translated < len) {
-		struct bio_vec iov[16];
+		union {
+			struct iovec iovec[IOTLB_IOV_SIZE];
+			struct bio_vec bvec[IOTLB_IOV_SIZE];
+		} iov;
 		struct iov_iter iter;
 		u64 translated;
 		int ret;
 
 		ret = iotlb_translate(vrh, (u64)(uintptr_t)src,
 				      len - total_translated, &translated,
-				      iov, ARRAY_SIZE(iov), VHOST_MAP_RO);
+				      &iov, IOTLB_IOV_SIZE, vrh->use_va,
+				      VHOST_MAP_RO);
 		if (ret == -ENOBUFS)
-			ret = ARRAY_SIZE(iov);
+			ret = IOTLB_IOV_SIZE;
 		else if (ret < 0)
 			return ret;
 
-		iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, translated);
+		if (vrh->use_va) {
+			iov_iter_init(&iter, ITER_SOURCE, iov.iovec, ret,
+				      translated);
+		} else {
+			iov_iter_bvec(&iter, ITER_SOURCE, iov.bvec, ret,
+				      translated);
+		}
 
 		ret = copy_from_iter(dst, translated, &iter);
 		if (ret < 0)
@@ -1180,20 +1203,30 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
 	u64 total_translated = 0;
 
 	while (total_translated < len) {
-		struct bio_vec iov[16];
+		union {
+			struct iovec iovec[IOTLB_IOV_SIZE];
+			struct bio_vec bvec[IOTLB_IOV_SIZE];
+		} iov;
 		struct iov_iter iter;
 		u64 translated;
 		int ret;
 
 		ret = iotlb_translate(vrh, (u64)(uintptr_t)dst,
 				      len - total_translated, &translated,
-				      iov, ARRAY_SIZE(iov), VHOST_MAP_WO);
+				      &iov, IOTLB_IOV_SIZE, vrh->use_va,
+				      VHOST_MAP_WO);
 		if (ret == -ENOBUFS)
-			ret = ARRAY_SIZE(iov);
+			ret = IOTLB_IOV_SIZE;
 		else if (ret < 0)
 			return ret;
 
-		iov_iter_bvec(&iter, ITER_DEST, iov, ret, translated);
+		if (vrh->use_va) {
+			iov_iter_init(&iter, ITER_DEST, iov.iovec, ret,
+				      translated);
+		} else {
+			iov_iter_bvec(&iter, ITER_DEST, iov.bvec, ret,
+				      translated);
+		}
 
 		ret = copy_to_iter(src, translated, &iter);
 		if (ret < 0)
@@ -1210,20 +1243,32 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
 static inline int getu16_iotlb(const struct vringh *vrh,
 			       u16 *val, const __virtio16 *p)
 {
-	struct bio_vec iov;
-	void *kaddr, *from;
+	union {
+		struct iovec iovec;
+		struct bio_vec bvec;
+	} iov;
+	__virtio16 tmp;
 	int ret;
 
 	/* Atomic read is needed for getu16 */
-	ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
-			      &iov, 1, VHOST_MAP_RO);
+	ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p),
+			      NULL, &iov, 1, vrh->use_va, VHOST_MAP_RO);
 	if (ret < 0)
 		return ret;
 
-	kaddr = kmap_local_page(iov.bv_page);
-	from = kaddr + iov.bv_offset;
-	*val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
-	kunmap_local(kaddr);
+	if (vrh->use_va) {
+		ret = __get_user(tmp, (__virtio16 __user *)iov.iovec.iov_base);
+		if (ret)
+			return ret;
+	} else {
+		void *kaddr = kmap_local_page(iov.bvec.bv_page);
+		void *from = kaddr + iov.bvec.bv_offset;
+
+		tmp = READ_ONCE(*(__virtio16 *)from);
+		kunmap_local(kaddr);
+	}
+
+	*val = vringh16_to_cpu(vrh, tmp);
 
 	return 0;
 }
@@ -1231,20 +1276,32 @@ static inline int getu16_iotlb(const struct vringh *vrh,
 static inline int putu16_iotlb(const struct vringh *vrh,
 			       __virtio16 *p, u16 val)
 {
-	struct bio_vec iov;
-	void *kaddr, *to;
+	union {
+		struct iovec iovec;
+		struct bio_vec bvec;
+	} iov;
+	__virtio16 tmp;
 	int ret;
 
 	/* Atomic write is needed for putu16 */
-	ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
-			      &iov, 1, VHOST_MAP_WO);
+	ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p),
+			      NULL, &iov, 1, vrh->use_va, VHOST_MAP_WO);
 	if (ret < 0)
 		return ret;
 
-	kaddr = kmap_local_page(iov.bv_page);
-	to = kaddr + iov.bv_offset;
-	WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
-	kunmap_local(kaddr);
+	tmp = cpu_to_vringh16(vrh, val);
+
+	if (vrh->use_va) {
+		ret = __put_user(tmp, (__virtio16 __user *)iov.iovec.iov_base);
+		if (ret)
+			return ret;
+	} else {
+		void *kaddr = kmap_local_page(iov.bvec.bv_page);
+		void *to = kaddr + iov.bvec.bv_offset;
+
+		WRITE_ONCE(*(__virtio16 *)to, tmp);
+		kunmap_local(kaddr);
+	}
 
 	return 0;
 }



* Re: [PATCH v2 4/8] vringh: support VA with iotlb
  2023-03-16 16:07     ` Stefano Garzarella
@ 2023-03-17  2:53       ` Jason Wang
  2023-03-17  9:49       ` Eugenio Perez Martin
  1 sibling, 0 replies; 40+ messages in thread
From: Jason Wang @ 2023-03-17  2:53 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Eugenio Perez Martin, virtualization, Andrey Zhadchenko, netdev,
	stefanha, linux-kernel, Michael S. Tsirkin, kvm

On Fri, Mar 17, 2023 at 12:07 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Fri, Mar 3, 2023 at 3:39 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >
> > On Thu, Mar 2, 2023 at 12:35 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > >
> > > vDPA supports the possibility to use user VA in the iotlb messages.
> > > So, let's add support for user VA in vringh to use it in the vDPA
> > > simulators.
> > >
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > ---
> > >
> > > Notes:
> > >     v2:
> > >     - replace kmap_atomic() with kmap_local_page() [see previous patch]
> > >     - fix cast warnings when build with W=1 C=1
> > >
> > >  include/linux/vringh.h            |   5 +-
> > >  drivers/vdpa/mlx5/net/mlx5_vnet.c |   2 +-
> > >  drivers/vdpa/vdpa_sim/vdpa_sim.c  |   4 +-
> > >  drivers/vhost/vringh.c            | 247 ++++++++++++++++++++++++------
> > >  4 files changed, 205 insertions(+), 53 deletions(-)
> > >
>
> [...]
>
> >
> > It seems to me iotlb_translate_va and iotlb_translate_pa are very
> > similar; their only difference is that the iov argument is an iovec
> > instead of a bio_vec. And how to fill it, obviously.
> >
> > It would be great to merge both functions, only differing with a
> > conditional on vrh->use_va, or generics, or similar. Or, if following
> > the style of the rest of vringh code, to provide a callback to fill
> > iovec (although I like conditional more).
> >
> > However I cannot think of an easy way to perform that without long
> > macros or type erasure.
>
> Thank you for pushing me :-)
> I finally managed to avoid code duplication (partial patch attached,
> but not yet fully tested).
>
> @Jason: with this refactoring I removed copy_to_va/copy_to_pa, so I
> also avoided getu16_iotlb_va/pa.
>
> I will send the full patch in v3, but I would like to get your opinion
> first ;-)

Fine with me.

Thanks

>
>
>
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index 0ba3ef809e48..71dd67700e36 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -1096,8 +1096,7 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
>
>  static int iotlb_translate(const struct vringh *vrh,
>                            u64 addr, u64 len, u64 *translated,
> -                          struct bio_vec iov[],
> -                          int iov_size, u32 perm)
> +                          void *iov, int iov_size, bool iovec, u32 perm)
>  {
>         struct vhost_iotlb_map *map;
>         struct vhost_iotlb *iotlb = vrh->iotlb;
> @@ -1107,7 +1106,7 @@ static int iotlb_translate(const struct vringh *vrh,
>         spin_lock(vrh->iotlb_lock);
>
>         while (len > s) {
> -               u64 size, pa, pfn;
> +               u64 size;
>
>                 if (unlikely(ret >= iov_size)) {
>                         ret = -ENOBUFS;
> @@ -1124,10 +1123,22 @@ static int iotlb_translate(const struct vringh *vrh,
>                 }
>
>                 size = map->size - addr + map->start;
> -               pa = map->addr + addr - map->start;
> -               pfn = pa >> PAGE_SHIFT;
> -               bvec_set_page(&iov[ret], pfn_to_page(pfn), min(len - s, size),
> -                             pa & (PAGE_SIZE - 1));
> +               if (iovec) {
> +                       struct iovec *iovec = iov;
> +
> +                       iovec[ret].iov_len = min(len - s, size);
> +                       iovec[ret].iov_base = (void __user *)(unsigned long)
> +                                             (map->addr + addr - map->start);
> +               } else {
> +                       u64 pa = map->addr + addr - map->start;
> +                       u64 pfn = pa >> PAGE_SHIFT;
> +                       struct bio_vec *bvec = iov;
> +
> +                       bvec_set_page(&bvec[ret], pfn_to_page(pfn),
> +                                     min(len - s, size),
> +                                     pa & (PAGE_SIZE - 1));
> +               }
> +
>                 s += size;
>                 addr += size;
>                 ++ret;
> @@ -1141,26 +1152,38 @@ static int iotlb_translate(const struct vringh *vrh,
>         return ret;
>  }
>
> +#define IOTLB_IOV_SIZE 16
> +
>  static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
>                                   void *src, size_t len)
>  {
>         u64 total_translated = 0;
>
>         while (total_translated < len) {
> -               struct bio_vec iov[16];
> +               union {
> +                       struct iovec iovec[IOTLB_IOV_SIZE];
> +                       struct bio_vec bvec[IOTLB_IOV_SIZE];
> +               } iov;
>                 struct iov_iter iter;
>                 u64 translated;
>                 int ret;
>
>                 ret = iotlb_translate(vrh, (u64)(uintptr_t)src,
>                                       len - total_translated, &translated,
> -                                     iov, ARRAY_SIZE(iov), VHOST_MAP_RO);
> +                                     &iov, IOTLB_IOV_SIZE, vrh->use_va,
> +                                     VHOST_MAP_RO);
>                 if (ret == -ENOBUFS)
> -                       ret = ARRAY_SIZE(iov);
> +                       ret = IOTLB_IOV_SIZE;
>                 else if (ret < 0)
>                         return ret;
>
> -               iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, translated);
> +               if (vrh->use_va) {
> +                       iov_iter_init(&iter, ITER_SOURCE, iov.iovec, ret,
> +                                     translated);
> +               } else {
> +                       iov_iter_bvec(&iter, ITER_SOURCE, iov.bvec, ret,
> +                                     translated);
> +               }
>
>                 ret = copy_from_iter(dst, translated, &iter);
>                 if (ret < 0)
> @@ -1180,20 +1203,30 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
>         u64 total_translated = 0;
>
>         while (total_translated < len) {
> -               struct bio_vec iov[16];
> +               union {
> +                       struct iovec iovec[IOTLB_IOV_SIZE];
> +                       struct bio_vec bvec[IOTLB_IOV_SIZE];
> +               } iov;
>                 struct iov_iter iter;
>                 u64 translated;
>                 int ret;
>
>                 ret = iotlb_translate(vrh, (u64)(uintptr_t)dst,
>                                       len - total_translated, &translated,
> -                                     iov, ARRAY_SIZE(iov), VHOST_MAP_WO);
> +                                     &iov, IOTLB_IOV_SIZE, vrh->use_va,
> +                                     VHOST_MAP_WO);
>                 if (ret == -ENOBUFS)
> -                       ret = ARRAY_SIZE(iov);
> +                       ret = IOTLB_IOV_SIZE;
>                 else if (ret < 0)
>                         return ret;
>
> -               iov_iter_bvec(&iter, ITER_DEST, iov, ret, translated);
> +               if (vrh->use_va) {
> +                       iov_iter_init(&iter, ITER_DEST, iov.iovec, ret,
> +                                     translated);
> +               } else {
> +                       iov_iter_bvec(&iter, ITER_DEST, iov.bvec, ret,
> +                                     translated);
> +               }
>
>                 ret = copy_to_iter(src, translated, &iter);
>                 if (ret < 0)
> @@ -1210,20 +1243,32 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
>  static inline int getu16_iotlb(const struct vringh *vrh,
>                                u16 *val, const __virtio16 *p)
>  {
> -       struct bio_vec iov;
> -       void *kaddr, *from;
> +       union {
> +               struct iovec iovec;
> +               struct bio_vec bvec;
> +       } iov;
> +       __virtio16 tmp;
>         int ret;
>
>         /* Atomic read is needed for getu16 */
> -       ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> -                             &iov, 1, VHOST_MAP_RO);
> +       ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p),
> +                             NULL, &iov, 1, vrh->use_va, VHOST_MAP_RO);
>         if (ret < 0)
>                 return ret;
>
> -       kaddr = kmap_local_page(iov.bv_page);
> -       from = kaddr + iov.bv_offset;
> -       *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> -       kunmap_local(kaddr);
> +       if (vrh->use_va) {
> +               ret = __get_user(tmp, (__virtio16 __user *)iov.iovec.iov_base);
> +               if (ret)
> +                       return ret;
> +       } else {
> +               void *kaddr = kmap_local_page(iov.bvec.bv_page);
> +               void *from = kaddr + iov.bvec.bv_offset;
> +
> +               tmp = READ_ONCE(*(__virtio16 *)from);
> +               kunmap_local(kaddr);
> +       }
> +
> +       *val = vringh16_to_cpu(vrh, tmp);
>
>         return 0;
>  }
> @@ -1231,20 +1276,32 @@ static inline int getu16_iotlb(const struct vringh *vrh,
>  static inline int putu16_iotlb(const struct vringh *vrh,
>                                __virtio16 *p, u16 val)
>  {
> -       struct bio_vec iov;
> -       void *kaddr, *to;
> +       union {
> +               struct iovec iovec;
> +               struct bio_vec bvec;
> +       } iov;
> +       __virtio16 tmp;
>         int ret;
>
>         /* Atomic write is needed for putu16 */
> -       ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> -                             &iov, 1, VHOST_MAP_WO);
> +       ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p),
> +                             NULL, &iov, 1, vrh->use_va, VHOST_MAP_WO);
>         if (ret < 0)
>                 return ret;
>
> -       kaddr = kmap_local_page(iov.bv_page);
> -       to = kaddr + iov.bv_offset;
> -       WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> -       kunmap_local(kaddr);
> +       tmp = cpu_to_vringh16(vrh, val);
> +
> +       if (vrh->use_va) {
> +               ret = __put_user(tmp, (__virtio16 __user *)iov.iovec.iov_base);
> +               if (ret)
> +                       return ret;
> +       } else {
> +               void *kaddr = kmap_local_page(iov.bvec.bv_page);
> +               void *to = kaddr + iov.bvec.bv_offset;
> +
> +               WRITE_ONCE(*(__virtio16 *)to, tmp);
> +               kunmap_local(kaddr);
> +       }
>
>         return 0;
>  }
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 4/8] vringh: support VA with iotlb
  2023-03-16 16:07     ` Stefano Garzarella
  2023-03-17  2:53       ` Jason Wang
@ 2023-03-17  9:49       ` Eugenio Perez Martin
  2023-03-17 11:25         ` Stefano Garzarella
  1 sibling, 1 reply; 40+ messages in thread
From: Eugenio Perez Martin @ 2023-03-17  9:49 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Jason Wang, virtualization, Andrey Zhadchenko, netdev, stefanha,
	linux-kernel, Michael S. Tsirkin, kvm

On Thu, Mar 16, 2023 at 5:07 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Fri, Mar 3, 2023 at 3:39 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >
> > On Thu, Mar 2, 2023 at 12:35 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > >
> > > vDPA supports the use of user VA in the iotlb messages. So, let's
> > > add support for user VA in vringh, so that it can be used by the
> > > vDPA simulators.
> > >
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > ---
> > >
> > > Notes:
> > >     v2:
> > >     - replace kmap_atomic() with kmap_local_page() [see previous patch]
> > >     - fix cast warnings when build with W=1 C=1
> > >
> > >  include/linux/vringh.h            |   5 +-
> > >  drivers/vdpa/mlx5/net/mlx5_vnet.c |   2 +-
> > >  drivers/vdpa/vdpa_sim/vdpa_sim.c  |   4 +-
> > >  drivers/vhost/vringh.c            | 247 ++++++++++++++++++++++++------
> > >  4 files changed, 205 insertions(+), 53 deletions(-)
> > >
>
> [...]
>
> >
> > It seems to me that iotlb_translate_va and iotlb_translate_pa are
> > very similar; their only difference is that the iov argument is an
> > iovec instead of a bio_vec. And how to fill it, obviously.
> >
> > It would be great to merge both functions, differing only in a
> > conditional on vrh->use_va, or via generics, or similar. Or,
> > following the style of the rest of the vringh code, to provide a
> > callback that fills the iovec (although I like the conditional
> > more).
> >
> > However, I cannot think of an easy way to do that without long
> > macros or type erasure.
>
> Thank you for pushing me :-)
> I finally managed to avoid code duplication (partial patch attached,
> but not yet fully tested).
>
> @Jason: with this refactoring I removed copy_to_va/copy_to_pa, so I
> also avoided getu16_iotlb_va/pa.
>
> I will send the full patch in v3, but I would like to get your opinion
> first ;-)
>
>
>
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index 0ba3ef809e48..71dd67700e36 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -1096,8 +1096,7 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
>
>  static int iotlb_translate(const struct vringh *vrh,
>                            u64 addr, u64 len, u64 *translated,
> -                          struct bio_vec iov[],
> -                          int iov_size, u32 perm)
> +                          void *iov, int iov_size, bool iovec, u32 perm)

I think this is an improvement, but we're doing type erasure here. I
don't think it is a big deal, since the function is not exported and
is well contained in this file, so I'd ack this version too. I'm just
throwing out ideas here:

a) typedef the union {iovec, bio_vec} and use that type in the
parameter (rough sketch below).

As a drawback, that union feels out of place in this file. Is this the
only place where it is needed? I don't see other similar uses in the
kernel.

b) Convert from iovec to bio_vec at return.
The drawback is the extra processing if the compiler is not smart
enough to inline it. I prefer the previous option, but I didn't want
to omit this one, just in case.
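
To make (a) more concrete, this is roughly the shape I have in mind
(hypothetical names, completely untested, just a sketch):

/* Sketch of option (a): wrap the two iov flavors in a small typed
 * struct, so that iotlb_translate() keeps a real type instead of a
 * void * parameter. Names are invented and nothing here is tested.
 */
struct iotlb_vec {
	union {
		struct iovec *iovec;	/* filled when vrh->use_va */
		struct bio_vec *bvec;	/* filled otherwise */
	} iov;
	int count;			/* capacity of the array above */
};

static int iotlb_translate(const struct vringh *vrh,
			   u64 addr, u64 len, u64 *translated,
			   struct iotlb_vec *ivec, u32 perm)
{
	struct vhost_iotlb_map *map;
	struct vhost_iotlb *iotlb = vrh->iotlb;
	int ret = 0;
	u64 s = 0, last = addr + len - 1;

	spin_lock(vrh->iotlb_lock);

	while (len > s) {
		u64 size;

		if (unlikely(ret >= ivec->count)) {
			ret = -ENOBUFS;
			break;
		}

		map = vhost_iotlb_itree_first(iotlb, addr, last);
		if (!map || map->start > addr) {
			ret = -EINVAL;
			break;
		} else if (!(map->perm & perm)) {
			ret = -EPERM;
			break;
		}

		size = map->size - addr + map->start;
		if (vrh->use_va) {
			struct iovec *iovec = ivec->iov.iovec;

			iovec[ret].iov_len = min(len - s, size);
			iovec[ret].iov_base = (void __user *)(unsigned long)
					      (map->addr + addr - map->start);
		} else {
			u64 pa = map->addr + addr - map->start;
			struct bio_vec *bvec = ivec->iov.bvec;

			bvec_set_page(&bvec[ret],
				      pfn_to_page(pa >> PAGE_SHIFT),
				      min(len - s, size),
				      pa & (PAGE_SIZE - 1));
		}

		s += size;
		addr += size;
		++ret;
	}

	spin_unlock(vrh->iotlb_lock);

	if (translated)
		*translated = min(len, s);

	return ret;
}

The callers would keep their on-stack union of arrays and just point
the struct at it before the call, so there should be no extra runtime
cost compared to the void * version.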

Thanks!

>  {
>         struct vhost_iotlb_map *map;
>         struct vhost_iotlb *iotlb = vrh->iotlb;
> @@ -1107,7 +1106,7 @@ static int iotlb_translate(const struct vringh *vrh,
>         spin_lock(vrh->iotlb_lock);
>
>         while (len > s) {
> -               u64 size, pa, pfn;
> +               u64 size;
>
>                 if (unlikely(ret >= iov_size)) {
>                         ret = -ENOBUFS;
> @@ -1124,10 +1123,22 @@ static int iotlb_translate(const struct vringh *vrh,
>                 }
>
>                 size = map->size - addr + map->start;
> -               pa = map->addr + addr - map->start;
> -               pfn = pa >> PAGE_SHIFT;
> -               bvec_set_page(&iov[ret], pfn_to_page(pfn), min(len - s, size),
> -                             pa & (PAGE_SIZE - 1));
> +               if (iovec) {
> +                       struct iovec *iovec = iov;
> +
> +                       iovec[ret].iov_len = min(len - s, size);
> +                       iovec[ret].iov_base = (void __user *)(unsigned long)
> +                                             (map->addr + addr - map->start);
> +               } else {
> +                       u64 pa = map->addr + addr - map->start;
> +                       u64 pfn = pa >> PAGE_SHIFT;
> +                       struct bio_vec *bvec = iov;
> +
> +                       bvec_set_page(&bvec[ret], pfn_to_page(pfn),
> +                                     min(len - s, size),
> +                                     pa & (PAGE_SIZE - 1));
> +               }
> +
>                 s += size;
>                 addr += size;
>                 ++ret;
> @@ -1141,26 +1152,38 @@ static int iotlb_translate(const struct vringh *vrh,
>         return ret;
>  }
>
> +#define IOTLB_IOV_SIZE 16
> +
>  static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
>                                   void *src, size_t len)
>  {
>         u64 total_translated = 0;
>
>         while (total_translated < len) {
> -               struct bio_vec iov[16];
> +               union {
> +                       struct iovec iovec[IOTLB_IOV_SIZE];
> +                       struct bio_vec bvec[IOTLB_IOV_SIZE];
> +               } iov;
>                 struct iov_iter iter;
>                 u64 translated;
>                 int ret;
>
>                 ret = iotlb_translate(vrh, (u64)(uintptr_t)src,
>                                       len - total_translated, &translated,
> -                                     iov, ARRAY_SIZE(iov), VHOST_MAP_RO);
> +                                     &iov, IOTLB_IOV_SIZE, vrh->use_va,
> +                                     VHOST_MAP_RO);
>                 if (ret == -ENOBUFS)
> -                       ret = ARRAY_SIZE(iov);
> +                       ret = IOTLB_IOV_SIZE;
>                 else if (ret < 0)
>                         return ret;
>
> -               iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, translated);
> +               if (vrh->use_va) {
> +                       iov_iter_init(&iter, ITER_SOURCE, iov.iovec, ret,
> +                                     translated);
> +               } else {
> +                       iov_iter_bvec(&iter, ITER_SOURCE, iov.bvec, ret,
> +                                     translated);
> +               }
>
>                 ret = copy_from_iter(dst, translated, &iter);
>                 if (ret < 0)
> @@ -1180,20 +1203,30 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
>         u64 total_translated = 0;
>
>         while (total_translated < len) {
> -               struct bio_vec iov[16];
> +               union {
> +                       struct iovec iovec[IOTLB_IOV_SIZE];
> +                       struct bio_vec bvec[IOTLB_IOV_SIZE];
> +               } iov;
>                 struct iov_iter iter;
>                 u64 translated;
>                 int ret;
>
>                 ret = iotlb_translate(vrh, (u64)(uintptr_t)dst,
>                                       len - total_translated, &translated,
> -                                     iov, ARRAY_SIZE(iov), VHOST_MAP_WO);
> +                                     &iov, IOTLB_IOV_SIZE, vrh->use_va,
> +                                     VHOST_MAP_WO);
>                 if (ret == -ENOBUFS)
> -                       ret = ARRAY_SIZE(iov);
> +                       ret = IOTLB_IOV_SIZE;
>                 else if (ret < 0)
>                         return ret;
>
> -               iov_iter_bvec(&iter, ITER_DEST, iov, ret, translated);
> +               if (vrh->use_va) {
> +                       iov_iter_init(&iter, ITER_DEST, iov.iovec, ret,
> +                                     translated);
> +               } else {
> +                       iov_iter_bvec(&iter, ITER_DEST, iov.bvec, ret,
> +                                     translated);
> +               }
>
>                 ret = copy_to_iter(src, translated, &iter);
>                 if (ret < 0)
> @@ -1210,20 +1243,32 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
>  static inline int getu16_iotlb(const struct vringh *vrh,
>                                u16 *val, const __virtio16 *p)
>  {
> -       struct bio_vec iov;
> -       void *kaddr, *from;
> +       union {
> +               struct iovec iovec;
> +               struct bio_vec bvec;
> +       } iov;
> +       __virtio16 tmp;
>         int ret;
>
>         /* Atomic read is needed for getu16 */
> -       ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> -                             &iov, 1, VHOST_MAP_RO);
> +       ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p),
> +                             NULL, &iov, 1, vrh->use_va, VHOST_MAP_RO);
>         if (ret < 0)
>                 return ret;
>
> -       kaddr = kmap_local_page(iov.bv_page);
> -       from = kaddr + iov.bv_offset;
> -       *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> -       kunmap_local(kaddr);
> +       if (vrh->use_va) {
> +               ret = __get_user(tmp, (__virtio16 __user *)iov.iovec.iov_base);
> +               if (ret)
> +                       return ret;
> +       } else {
> +               void *kaddr = kmap_local_page(iov.bvec.bv_page);
> +               void *from = kaddr + iov.bvec.bv_offset;
> +
> +               tmp = READ_ONCE(*(__virtio16 *)from);
> +               kunmap_local(kaddr);
> +       }
> +
> +       *val = vringh16_to_cpu(vrh, tmp);
>
>         return 0;
>  }
> @@ -1231,20 +1276,32 @@ static inline int getu16_iotlb(const struct vringh *vrh,
>  static inline int putu16_iotlb(const struct vringh *vrh,
>                                __virtio16 *p, u16 val)
>  {
> -       struct bio_vec iov;
> -       void *kaddr, *to;
> +       union {
> +               struct iovec iovec;
> +               struct bio_vec bvec;
> +       } iov;
> +       __virtio16 tmp;
>         int ret;
>
>         /* Atomic write is needed for putu16 */
> -       ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> -                             &iov, 1, VHOST_MAP_WO);
> +       ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p),
> +                             NULL, &iov, 1, vrh->use_va, VHOST_MAP_RO);
>         if (ret < 0)
>                 return ret;
>
> -       kaddr = kmap_local_page(iov.bv_page);
> -       to = kaddr + iov.bv_offset;
> -       WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> -       kunmap_local(kaddr);
> +       tmp = cpu_to_vringh16(vrh, val);
> +
> +       if (vrh->use_va) {
> +               ret = __put_user(tmp, (__virtio16 __user *)iov.iovec.iov_base);
> +               if (ret)
> +                       return ret;
> +       } else {
> +               void *kaddr = kmap_local_page(iov.bvec.bv_page);
> +               void *to = kaddr + iov.bvec.bv_offset;
> +
> +               WRITE_ONCE(*(__virtio16 *)to, tmp);
> +               kunmap_local(kaddr);
> +       }
>
>         return 0;
>  }
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 4/8] vringh: support VA with iotlb
  2023-03-17  9:49       ` Eugenio Perez Martin
@ 2023-03-17 11:25         ` Stefano Garzarella
  0 siblings, 0 replies; 40+ messages in thread
From: Stefano Garzarella @ 2023-03-17 11:25 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Jason Wang, virtualization, Andrey Zhadchenko, netdev, stefanha,
	linux-kernel, Michael S. Tsirkin, kvm

On Fri, Mar 17, 2023 at 10:49:27AM +0100, Eugenio Perez Martin wrote:
>On Thu, Mar 16, 2023 at 5:07 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> On Fri, Mar 3, 2023 at 3:39 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>> >
>> > On Thu, Mar 2, 2023 at 12:35 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> > >
>> > > vDPA supports the use of user VA in the iotlb messages. So, let's
>> > > add support for user VA in vringh, so that it can be used by the
>> > > vDPA simulators.
>> > >
>> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> > > ---
>> > >
>> > > Notes:
>> > >     v2:
>> > >     - replace kmap_atomic() with kmap_local_page() [see previous patch]
>> > >     - fix cast warnings when build with W=1 C=1
>> > >
>> > >  include/linux/vringh.h            |   5 +-
>> > >  drivers/vdpa/mlx5/net/mlx5_vnet.c |   2 +-
>> > >  drivers/vdpa/vdpa_sim/vdpa_sim.c  |   4 +-
>> > >  drivers/vhost/vringh.c            | 247 ++++++++++++++++++++++++------
>> > >  4 files changed, 205 insertions(+), 53 deletions(-)
>> > >
>>
>> [...]
>>
>> >
>> > It seems to me that iotlb_translate_va and iotlb_translate_pa are
>> > very similar; their only difference is that the iov argument is an
>> > iovec instead of a bio_vec. And how to fill it, obviously.
>> >
>> > It would be great to merge both functions, differing only in a
>> > conditional on vrh->use_va, or via generics, or similar. Or,
>> > following the style of the rest of the vringh code, to provide a
>> > callback that fills the iovec (although I like the conditional
>> > more).
>> >
>> > However, I cannot think of an easy way to do that without long
>> > macros or type erasure.
>>
>> Thank you for pushing me :-)
>> I finally managed to avoid code duplication (partial patch attached,
>> but not yet fully tested).
>>
>> @Jason: with this refactoring I removed copy_to_va/copy_to_pa, so I
>> also avoided getu16_iotlb_va/pa.
>>
>> I will send the full patch in v3, but I would like to get your opinion
>> first ;-)
>>
>>
>>
>> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
>> index 0ba3ef809e48..71dd67700e36 100644
>> --- a/drivers/vhost/vringh.c
>> +++ b/drivers/vhost/vringh.c
>> @@ -1096,8 +1096,7 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
>>
>>  static int iotlb_translate(const struct vringh *vrh,
>>                            u64 addr, u64 len, u64 *translated,
>> -                          struct bio_vec iov[],
>> -                          int iov_size, u32 perm)
>> +                          void *iov, int iov_size, bool iovec, u32 perm)
>
>I think this is an improvement, but we're doing type erasure here. I
>don't think it is a big deal, since the function is not exported and
>is well contained in this file, so I'd ack this version too. I'm just
>throwing out ideas here:
>
>a) typedef the union {iovec, bio_vec} and use that type in the
>parameter (rough sketch below).
>
>As a drawback, that union feels out of place in this file. Is this the
>only place where it is needed? I don't see other similar uses in the
>kernel.

iov_iter has something similar, but those are const pointers, so IIUC
it is not supposed to be used to set the bvec contents, just to
iterate over them.

Anyway, I had thought of something similar and it should be doable,
but since it is an internal API I went with type erasure.

>
>b) Convert from iovec to bio_vec at return.
>The drawback is the extra processing if the compiler is not smart
>enough to inline it. I prefer the previous option, but I didn't want
>to omit this one, just in case.

Yep, I prefer the previous one too, so let's go in that direction for
v3 ;-)
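
Just to double-check that we are on the same page, with that approach
the callers would end up looking something like this (untested sketch,
reusing the hypothetical names from your sketch):

/* Caller-side shape with the typed wrapper: the on-stack union of
 * arrays provides the storage, and the wrapper just points at it.
 * Untested sketch, not the actual v3 code.
 */
static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
				  void *src, size_t len)
{
	struct iotlb_vec ivec;
	union {
		struct iovec iovec[IOTLB_IOV_SIZE];
		struct bio_vec bvec[IOTLB_IOV_SIZE];
	} iov;
	u64 total_translated = 0;

	ivec.iov.iovec = iov.iovec;	/* same storage either way */
	ivec.count = IOTLB_IOV_SIZE;

	while (total_translated < len) {
		struct iov_iter iter;
		u64 translated;
		int ret;

		ret = iotlb_translate(vrh, (u64)(uintptr_t)src,
				      len - total_translated, &translated,
				      &ivec, VHOST_MAP_RO);
		if (ret == -ENOBUFS)
			ret = IOTLB_IOV_SIZE;
		else if (ret < 0)
			return ret;

		if (vrh->use_va)
			iov_iter_init(&iter, ITER_SOURCE, iov.iovec, ret,
				      translated);
		else
			iov_iter_bvec(&iter, ITER_SOURCE, iov.bvec, ret,
				      translated);

		ret = copy_from_iter(dst, translated, &iter);
		if (ret < 0)
			return ret;

		src += translated;
		dst += translated;
		total_translated += translated;
	}

	return total_translated;
}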

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2023-03-17 11:26 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-02 11:34 [PATCH v2 0/8] vdpa_sim: add support for user VA Stefano Garzarella
2023-03-02 11:34 ` [PATCH v2 1/8] vdpa: add bind_mm/unbind_mm callbacks Stefano Garzarella
2023-03-14  3:39   ` Jason Wang
2023-03-16  8:17     ` Stefano Garzarella
2023-03-02 11:34 ` [PATCH v2 2/8] vhost-vdpa: use bind_mm/unbind_mm device callbacks Stefano Garzarella
2023-03-14  3:48   ` Jason Wang
2023-03-16  8:31     ` Stefano Garzarella
2023-03-16 10:11       ` Stefano Garzarella
2023-03-02 11:34 ` [PATCH v2 3/8] vringh: replace kmap_atomic() with kmap_local_page() Stefano Garzarella
2023-03-14  3:56   ` Jason Wang
2023-03-15 21:12     ` Fabio M. De Francesco
2023-03-16  2:53       ` Jason Wang
2023-03-16  8:09       ` Stefano Garzarella
2023-03-16  9:25         ` Fabio M. De Francesco
2023-03-16  9:13   ` Fabio M. De Francesco
2023-03-16  9:17     ` Stefano Garzarella
2023-03-02 11:34 ` [PATCH v2 4/8] vringh: support VA with iotlb Stefano Garzarella
2023-03-03 14:38   ` Eugenio Perez Martin
2023-03-07  9:31     ` Stefano Garzarella
2023-03-16 16:07     ` Stefano Garzarella
2023-03-17  2:53       ` Jason Wang
2023-03-17  9:49       ` Eugenio Perez Martin
2023-03-17 11:25         ` Stefano Garzarella
2023-03-14  4:53   ` Jason Wang
2023-03-16  8:38     ` Stefano Garzarella
2023-03-02 11:34 ` [PATCH v2 5/8] vdpa_sim: make devices agnostic for work management Stefano Garzarella
2023-03-03 14:40   ` Eugenio Perez Martin
2023-03-14  5:27   ` Jason Wang
2023-03-02 11:34 ` [PATCH v2 6/8] vdpa_sim: use kthread worker Stefano Garzarella
2023-03-02 15:30   ` Simon Horman
2023-03-02 15:48     ` Stefano Garzarella
2023-03-05 11:21   ` kernel test robot
2023-03-14  5:31   ` Jason Wang
2023-03-02 11:34 ` [PATCH v2 7/8] vdpa_sim: replace the spinlock with a mutex to protect the state Stefano Garzarella
2023-03-14  5:29   ` Jason Wang
2023-03-14  5:31     ` Jason Wang
2023-03-16  8:42       ` Stefano Garzarella
2023-03-02 11:34 ` [PATCH v2 8/8] vdpa_sim: add support for user VA Stefano Garzarella
2023-03-14  5:36   ` Jason Wang
2023-03-16  9:11     ` Stefano Garzarella
