All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs
@ 2023-12-19 18:08 Dragos Tatulea
  2023-12-19 18:08 ` [PATCH mlx5-vhost v4 01/15] vdpa/mlx5: Expose resumable vq capability Dragos Tatulea
                   ` (15 more replies)
  0 siblings, 16 replies; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-19 18:08 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin,
	Si-Wei Liu, Saeed Mahameed, Leon Romanovsky, virtualization,
	Gal Pressman
  Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo

Add support for resumable vqs in the mlx5_vdpa driver. This is a
firmware feature that can be used for the following benefits:
- Full device .suspend/.resume.
- .set_map doesn't need to destroy and create new vqs anymore just to
  update the map. When resumable vqs are supported it is enough to
  suspend the vqs, set the new maps, and then resume the vqs.

The first patch exposes relevant bits for the feature in mlx5_ifc.h.
That means it needs to be applied to the mlx5-vhost tree [0] first. Once
applied there, the change has to be pulled from mlx5-vhost into the
vhost tree and only then the remaining patches can be applied. Same flow
as the vq descriptor mappings patchset [1].

The second part implements the vdpa backend feature support to allow
vq state and address changes when the device is in DRIVER_OK state and
suspended.

The third part adds support for seletively modifying vq parameters. This
is needed to be able to use resumable vqs.

Then the actual support for resumable vqs is added.

The last part of the series introduces reference counting for mrs which
is necessary to avoid freeing mkeys too early or leaking them.

* Changes in v4:
- Added vdpa backend feature support for changing vq properties in
  DRIVER_OK when device is suspended. Added support in the driver as
  well.
- Dropped Acked-by for the patches that had the tag mistakenly
  added.

* Changes in v3:
- Faulty version. Please ignore.

* Changes in v2:
- Added mr refcounting patches.
- Deleted unnecessary patch: "vdpa/mlx5: Split function into locked and
  unlocked variants"
- Small print improvement in "Introduce per vq and device resume"
  patch.
- Patch 1/7 has been applied to mlx5-vhost branch.


Dragos Tatulea (15):
  vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
  vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND flag
  vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND backend
    feature
  vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND backend
    feature
  vdpa: Track device suspended state
  vdpa: Block vq address change in DRIVER_OK unless device supports it
  vdpa: Block vq state change in DRIVER_OK unless device supports it
  vdpa/mlx5: Expose resumable vq capability
  vdpa/mlx5: Allow modifying multiple vq fields in one modify command
  vdpa/mlx5: Introduce per vq and device resume
  vdpa/mlx5: Mark vq addrs for modification in hw vq
  vdpa/mlx5: Mark vq state for modification in hw vq
  vdpa/mlx5: Use vq suspend/resume during .set_map
  vdpa/mlx5: Introduce reference counting to mrs
  vdpa/mlx5: Add mkey leak detection

 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  10 +-
 drivers/vdpa/mlx5/core/mr.c        |  69 +++++++--
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 218 ++++++++++++++++++++++++++---
 drivers/vhost/vdpa.c               |  51 ++++++-
 include/linux/mlx5/mlx5_ifc.h      |   3 +-
 include/linux/mlx5/mlx5_ifc_vdpa.h |   4 +
 include/uapi/linux/vhost_types.h   |   8 ++
 7 files changed, 322 insertions(+), 41 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH mlx5-vhost v4 01/15] vdpa/mlx5: Expose resumable vq capability
  2023-12-19 18:08 [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Dragos Tatulea
@ 2023-12-19 18:08 ` Dragos Tatulea
  2023-12-20  3:46   ` Jason Wang
  2023-12-19 18:08 ` [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag Dragos Tatulea
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-19 18:08 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin,
	Si-Wei Liu, Saeed Mahameed, Leon Romanovsky, virtualization,
	Gal Pressman
  Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo

Necessary for checking if resumable vqs are supported by the hardware.
Actual support will be added in a downstream patch.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
---
 include/linux/mlx5/mlx5_ifc.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 6f3631425f38..9eaceaf6bcb0 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1236,7 +1236,8 @@ struct mlx5_ifc_virtio_emulation_cap_bits {
 
 	u8	   reserved_at_c0[0x13];
 	u8         desc_group_mkey_supported[0x1];
-	u8         reserved_at_d4[0xc];
+	u8         freeze_to_rdy_supported[0x1];
+	u8         reserved_at_d5[0xb];
 
 	u8         reserved_at_e0[0x20];
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
  2023-12-19 18:08 [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Dragos Tatulea
  2023-12-19 18:08 ` [PATCH mlx5-vhost v4 01/15] vdpa/mlx5: Expose resumable vq capability Dragos Tatulea
@ 2023-12-19 18:08 ` Dragos Tatulea
  2023-12-20  3:46   ` Jason Wang
  2023-12-20 16:09   ` Eugenio Perez Martin
  2023-12-19 18:08 ` [PATCH vhost v4 03/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND flag Dragos Tatulea
                   ` (13 subsequent siblings)
  15 siblings, 2 replies; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-19 18:08 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin,
	Si-Wei Liu, Saeed Mahameed, Leon Romanovsky, virtualization,
	Gal Pressman
  Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo

The virtio spec doesn't allow changing virtqueue addresses after
DRIVER_OK. Some devices do support this operation when the device is
suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
advertises this support as a backend features.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Suggested-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/uapi/linux/vhost_types.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index d7656908f730..aacd067afc89 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -192,5 +192,9 @@ struct vhost_vdpa_iova_range {
 #define VHOST_BACKEND_F_DESC_ASID    0x7
 /* IOTLB don't flush memory mapping across device reset */
 #define VHOST_BACKEND_F_IOTLB_PERSIST  0x8
+/* Device supports changing virtqueue addresses when device is suspended
+ * and is in state DRIVER_OK.
+ */
+#define VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND  0x9
 
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH vhost v4 03/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND flag
  2023-12-19 18:08 [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Dragos Tatulea
  2023-12-19 18:08 ` [PATCH mlx5-vhost v4 01/15] vdpa/mlx5: Expose resumable vq capability Dragos Tatulea
  2023-12-19 18:08 ` [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag Dragos Tatulea
@ 2023-12-19 18:08 ` Dragos Tatulea
  2023-12-20 16:10   ` Eugenio Perez Martin
  2023-12-19 18:08 ` [PATCH vhost v4 04/15] vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND backend feature Dragos Tatulea
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-19 18:08 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin,
	Si-Wei Liu, Saeed Mahameed, Leon Romanovsky, virtualization,
	Gal Pressman
  Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo

The virtio spec doesn't allow changing virtqueue state after
DRIVER_OK. Some devices do support this operation when the device is
suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND flag
advertises this support as a backend features.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Suggested-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/uapi/linux/vhost_types.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index aacd067afc89..848dadc95ca1 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -196,5 +196,9 @@ struct vhost_vdpa_iova_range {
  * and is in state DRIVER_OK.
  */
 #define VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND  0x9
+/* Device supports changing virtqueue state when device is suspended
+ * and is in state DRIVER_OK.
+ */
+#define VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND  0x10
 
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH vhost v4 04/15] vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND backend feature
  2023-12-19 18:08 [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Dragos Tatulea
                   ` (2 preceding siblings ...)
  2023-12-19 18:08 ` [PATCH vhost v4 03/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND flag Dragos Tatulea
@ 2023-12-19 18:08 ` Dragos Tatulea
  2023-12-20 16:11   ` Eugenio Perez Martin
  2023-12-19 18:08 ` [PATCH vhost v4 05/15] vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND " Dragos Tatulea
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-19 18:08 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin,
	Si-Wei Liu, Saeed Mahameed, Leon Romanovsky, virtualization,
	Gal Pressman
  Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo

If userland sets this feature, allow it.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Suggested-by: Eugenio Pérez <eperezma@redhat.com>
---
 drivers/vhost/vdpa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index da7ec77cdaff..2250fcd90e5b 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -749,6 +749,7 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
 				 BIT_ULL(VHOST_BACKEND_F_IOTLB_PERSIST) |
 				 BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
 				 BIT_ULL(VHOST_BACKEND_F_RESUME) |
+				 BIT_ULL(VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND) |
 				 BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK)))
 			return -EOPNOTSUPP;
 		if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) &&
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH vhost v4 05/15] vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND backend feature
  2023-12-19 18:08 [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Dragos Tatulea
                   ` (3 preceding siblings ...)
  2023-12-19 18:08 ` [PATCH vhost v4 04/15] vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND backend feature Dragos Tatulea
@ 2023-12-19 18:08 ` Dragos Tatulea
  2023-12-20 16:12   ` Eugenio Perez Martin
  2023-12-19 18:08 ` [PATCH vhost v4 06/15] vdpa: Track device suspended state Dragos Tatulea
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-19 18:08 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin,
	Si-Wei Liu, Saeed Mahameed, Leon Romanovsky, virtualization,
	Gal Pressman
  Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo

If userland sets this feature, allow it.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Suggested-by: Eugenio Pérez <eperezma@redhat.com>
---
 drivers/vhost/vdpa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 2250fcd90e5b..b4e8ddf86485 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -750,6 +750,7 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
 				 BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
 				 BIT_ULL(VHOST_BACKEND_F_RESUME) |
 				 BIT_ULL(VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND) |
+				 BIT_ULL(VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND) |
 				 BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK)))
 			return -EOPNOTSUPP;
 		if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) &&
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH vhost v4 06/15] vdpa: Track device suspended state
  2023-12-19 18:08 [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Dragos Tatulea
                   ` (4 preceding siblings ...)
  2023-12-19 18:08 ` [PATCH vhost v4 05/15] vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND " Dragos Tatulea
@ 2023-12-19 18:08 ` Dragos Tatulea
  2023-12-20  3:46   ` Jason Wang
  2023-12-19 18:08 ` [PATCH vhost v4 07/15] vdpa: Block vq address change in DRIVER_OK unless device supports it Dragos Tatulea
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-19 18:08 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin,
	Si-Wei Liu, Saeed Mahameed, Leon Romanovsky, virtualization,
	Gal Pressman
  Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo

Set vdpa device suspended state on successful suspend. Clear it on
successful resume and reset.

The state will be locked by the vhost_vdpa mutex. The mutex is taken
during suspend, resume and reset in vhost_vdpa_unlocked_ioctl. The
exception is vhost_vdpa_open which does a device reset but that should
be safe because it can only happen before the other ops.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Suggested-by: Eugenio Pérez <eperezma@redhat.com>
---
 drivers/vhost/vdpa.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index b4e8ddf86485..00b4fa8e89f2 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -59,6 +59,7 @@ struct vhost_vdpa {
 	int in_batch;
 	struct vdpa_iova_range range;
 	u32 batch_asid;
+	bool suspended;
 };
 
 static DEFINE_IDA(vhost_vdpa_ida);
@@ -232,6 +233,8 @@ static int _compat_vdpa_reset(struct vhost_vdpa *v)
 	struct vdpa_device *vdpa = v->vdpa;
 	u32 flags = 0;
 
+	v->suspended = false;
+
 	if (v->vdev.vqs) {
 		flags |= !vhost_backend_has_feature(v->vdev.vqs[0],
 						    VHOST_BACKEND_F_IOTLB_PERSIST) ?
@@ -590,11 +593,16 @@ static long vhost_vdpa_suspend(struct vhost_vdpa *v)
 {
 	struct vdpa_device *vdpa = v->vdpa;
 	const struct vdpa_config_ops *ops = vdpa->config;
+	int ret;
 
 	if (!ops->suspend)
 		return -EOPNOTSUPP;
 
-	return ops->suspend(vdpa);
+	ret = ops->suspend(vdpa);
+	if (!ret)
+		v->suspended = true;
+
+	return ret;
 }
 
 /* After a successful return of this ioctl the device resumes processing
@@ -605,11 +613,16 @@ static long vhost_vdpa_resume(struct vhost_vdpa *v)
 {
 	struct vdpa_device *vdpa = v->vdpa;
 	const struct vdpa_config_ops *ops = vdpa->config;
+	int ret;
 
 	if (!ops->resume)
 		return -EOPNOTSUPP;
 
-	return ops->resume(vdpa);
+	ret = ops->resume(vdpa);
+	if (!ret)
+		v->suspended = false;
+
+	return ret;
 }
 
 static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH vhost v4 07/15] vdpa: Block vq address change in DRIVER_OK unless device supports it
  2023-12-19 18:08 [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Dragos Tatulea
                   ` (5 preceding siblings ...)
  2023-12-19 18:08 ` [PATCH vhost v4 06/15] vdpa: Track device suspended state Dragos Tatulea
@ 2023-12-19 18:08 ` Dragos Tatulea
  2023-12-20 16:31   ` Eugenio Perez Martin
  2023-12-19 18:08 ` [PATCH vhost v4 08/15] vdpa: Block vq state " Dragos Tatulea
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-19 18:08 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin,
	Si-Wei Liu, Saeed Mahameed, Leon Romanovsky, virtualization,
	Gal Pressman
  Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo

Virtqueue address change during DRIVE_OK is not supported by the virtio
standard. Allow this op in DRIVER_OK only for devices that support
changing the address during DRIVER_OK if the device is suspended.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Suggested-by: Eugenio Pérez <eperezma@redhat.com>
---
 drivers/vhost/vdpa.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 00b4fa8e89f2..6bfa3391935a 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -625,6 +625,29 @@ static long vhost_vdpa_resume(struct vhost_vdpa *v)
 	return ret;
 }
 
+static bool vhost_vdpa_vq_config_allowed(struct vhost_vdpa *v, unsigned int cmd)
+{
+	struct vdpa_device *vdpa = v->vdpa;
+	const struct vdpa_config_ops *ops = vdpa->config;
+	int feature;
+
+	if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK))
+		return true;
+
+	if (!v->suspended)
+		return false;
+
+	switch (cmd) {
+	case VHOST_SET_VRING_ADDR:
+		feature = VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND;
+		break;
+	default:
+		return false;
+	}
+
+	return v->vdev.vqs && vhost_backend_has_feature(v->vdev.vqs[0], feature);
+}
+
 static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
 				   void __user *argp)
 {
@@ -703,6 +726,9 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
 
 	switch (cmd) {
 	case VHOST_SET_VRING_ADDR:
+		if (!vhost_vdpa_vq_config_allowed(v, cmd))
+			return -EOPNOTSUPP;
+
 		if (ops->set_vq_address(vdpa, idx,
 					(u64)(uintptr_t)vq->desc,
 					(u64)(uintptr_t)vq->avail,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH vhost v4 08/15] vdpa: Block vq state change in DRIVER_OK unless device supports it
  2023-12-19 18:08 [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Dragos Tatulea
                   ` (6 preceding siblings ...)
  2023-12-19 18:08 ` [PATCH vhost v4 07/15] vdpa: Block vq address change in DRIVER_OK unless device supports it Dragos Tatulea
@ 2023-12-19 18:08 ` Dragos Tatulea
  2023-12-20 16:32   ` Eugenio Perez Martin
  2023-12-19 18:08 ` [PATCH vhost v4 09/15] vdpa/mlx5: Allow modifying multiple vq fields in one modify command Dragos Tatulea
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-19 18:08 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin,
	Si-Wei Liu, Saeed Mahameed, Leon Romanovsky, virtualization,
	Gal Pressman
  Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo

Virtqueue state change during DRIVE_OK is not supported by the virtio
standard. Allow this op in DRIVER_OK only for devices that support
changing the state during DRIVER_OK if the device is suspended.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Suggested-by: Eugenio Pérez <eperezma@redhat.com>
---
 drivers/vhost/vdpa.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 6bfa3391935a..77509440c723 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -641,6 +641,9 @@ static bool vhost_vdpa_vq_config_allowed(struct vhost_vdpa *v, unsigned int cmd)
 	case VHOST_SET_VRING_ADDR:
 		feature = VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND;
 		break;
+	case VHOST_SET_VRING_BASE:
+		feature = VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND;
+		break;
 	default:
 		return false;
 	}
@@ -737,6 +740,9 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
 		break;
 
 	case VHOST_SET_VRING_BASE:
+		if (!vhost_vdpa_vq_config_allowed(v, cmd))
+			return -EOPNOTSUPP;
+
 		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
 			vq_state.packed.last_avail_idx = vq->last_avail_idx & 0x7fff;
 			vq_state.packed.last_avail_counter = !!(vq->last_avail_idx & 0x8000);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH vhost v4 09/15] vdpa/mlx5: Allow modifying multiple vq fields in one modify command
  2023-12-19 18:08 [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Dragos Tatulea
                   ` (7 preceding siblings ...)
  2023-12-19 18:08 ` [PATCH vhost v4 08/15] vdpa: Block vq state " Dragos Tatulea
@ 2023-12-19 18:08 ` Dragos Tatulea
  2023-12-20  3:46   ` Jason Wang
  2023-12-19 18:08 ` [PATCH vhost v4 10/15] vdpa/mlx5: Introduce per vq and device resume Dragos Tatulea
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-19 18:08 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin,
	Si-Wei Liu, Saeed Mahameed, Leon Romanovsky, virtualization,
	Gal Pressman
  Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo

Add a bitmask variable that tracks hw vq field changes that
are supposed to be modified on next hw vq change command.

This will be useful to set multiple vq fields when resuming the vq.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 48 +++++++++++++++++++++++++------
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 26ba7da6b410..1e08a8805640 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -120,6 +120,9 @@ struct mlx5_vdpa_virtqueue {
 	u16 avail_idx;
 	u16 used_idx;
 	int fw_state;
+
+	u64 modified_fields;
+
 	struct msi_map map;
 
 	/* keep last in the struct */
@@ -1181,7 +1184,19 @@ static bool is_valid_state_change(int oldstate, int newstate)
 	}
 }
 
-static int modify_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, int state)
+static bool modifiable_virtqueue_fields(struct mlx5_vdpa_virtqueue *mvq)
+{
+	/* Only state is always modifiable */
+	if (mvq->modified_fields & ~MLX5_VIRTQ_MODIFY_MASK_STATE)
+		return mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT ||
+		       mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND;
+
+	return true;
+}
+
+static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
+			    struct mlx5_vdpa_virtqueue *mvq,
+			    int state)
 {
 	int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
 	u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
@@ -1193,6 +1208,9 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
 	if (mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_NONE)
 		return 0;
 
+	if (!modifiable_virtqueue_fields(mvq))
+		return -EINVAL;
+
 	if (!is_valid_state_change(mvq->fw_state, state))
 		return -EINVAL;
 
@@ -1208,17 +1226,28 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
 	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
 
 	obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
-	MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select,
-		   MLX5_VIRTQ_MODIFY_MASK_STATE);
-	MLX5_SET(virtio_net_q_object, obj_context, state, state);
+	if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE)
+		MLX5_SET(virtio_net_q_object, obj_context, state, state);
+
+	MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, mvq->modified_fields);
 	err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
 	kfree(in);
 	if (!err)
 		mvq->fw_state = state;
 
+	mvq->modified_fields = 0;
+
 	return err;
 }
 
+static int modify_virtqueue_state(struct mlx5_vdpa_net *ndev,
+				  struct mlx5_vdpa_virtqueue *mvq,
+				  unsigned int state)
+{
+	mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_STATE;
+	return modify_virtqueue(ndev, mvq, state);
+}
+
 static int counter_set_alloc(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
 {
 	u32 in[MLX5_ST_SZ_DW(create_virtio_q_counters_in)] = {};
@@ -1347,7 +1376,7 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
 		goto err_vq;
 
 	if (mvq->ready) {
-		err = modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
+		err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
 		if (err) {
 			mlx5_vdpa_warn(&ndev->mvdev, "failed to modify to ready vq idx %d(%d)\n",
 				       idx, err);
@@ -1382,7 +1411,7 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
 	if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY)
 		return;
 
-	if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
+	if (modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
 		mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
 
 	if (query_virtqueue(ndev, mvq, &attr)) {
@@ -1407,6 +1436,7 @@ static void teardown_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *
 		return;
 
 	suspend_vq(ndev, mvq);
+	mvq->modified_fields = 0;
 	destroy_virtqueue(ndev, mvq);
 	dealloc_vector(ndev, mvq);
 	counter_set_dealloc(ndev, mvq);
@@ -2207,7 +2237,7 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready
 	if (!ready) {
 		suspend_vq(ndev, mvq);
 	} else {
-		err = modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
+		err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
 		if (err) {
 			mlx5_vdpa_warn(mvdev, "modify VQ %d to ready failed (%d)\n", idx, err);
 			ready = false;
@@ -2804,8 +2834,10 @@ static void clear_vqs_ready(struct mlx5_vdpa_net *ndev)
 {
 	int i;
 
-	for (i = 0; i < ndev->mvdev.max_vqs; i++)
+	for (i = 0; i < ndev->mvdev.max_vqs; i++) {
 		ndev->vqs[i].ready = false;
+		ndev->vqs[i].modified_fields = 0;
+	}
 
 	ndev->mvdev.cvq.ready = false;
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH vhost v4 10/15] vdpa/mlx5: Introduce per vq and device resume
  2023-12-19 18:08 [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Dragos Tatulea
                   ` (8 preceding siblings ...)
  2023-12-19 18:08 ` [PATCH vhost v4 09/15] vdpa/mlx5: Allow modifying multiple vq fields in one modify command Dragos Tatulea
@ 2023-12-19 18:08 ` Dragos Tatulea
  2023-12-20  3:47   ` Jason Wang
  2023-12-19 18:08 ` [PATCH vhost v4 11/15] vdpa/mlx5: Mark vq addrs for modification in hw vq Dragos Tatulea
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-19 18:08 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin,
	Si-Wei Liu, Saeed Mahameed, Leon Romanovsky, virtualization,
	Gal Pressman
  Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo

Implement vdpa vq and device resume if capability detected. Add support
for suspend -> ready state change.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 69 +++++++++++++++++++++++++++----
 1 file changed, 62 insertions(+), 7 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 1e08a8805640..f8f088cced50 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1170,7 +1170,12 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
 	return err;
 }
 
-static bool is_valid_state_change(int oldstate, int newstate)
+static bool is_resumable(struct mlx5_vdpa_net *ndev)
+{
+	return ndev->mvdev.vdev.config->resume;
+}
+
+static bool is_valid_state_change(int oldstate, int newstate, bool resumable)
 {
 	switch (oldstate) {
 	case MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT:
@@ -1178,6 +1183,7 @@ static bool is_valid_state_change(int oldstate, int newstate)
 	case MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY:
 		return newstate == MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND;
 	case MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND:
+		return resumable ? newstate == MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY : false;
 	case MLX5_VIRTIO_NET_Q_OBJECT_STATE_ERR:
 	default:
 		return false;
@@ -1200,6 +1206,7 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
 {
 	int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
 	u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
+	bool state_change = false;
 	void *obj_context;
 	void *cmd_hdr;
 	void *in;
@@ -1211,9 +1218,6 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
 	if (!modifiable_virtqueue_fields(mvq))
 		return -EINVAL;
 
-	if (!is_valid_state_change(mvq->fw_state, state))
-		return -EINVAL;
-
 	in = kzalloc(inlen, GFP_KERNEL);
 	if (!in)
 		return -ENOMEM;
@@ -1226,17 +1230,29 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
 	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
 
 	obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
-	if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE)
+
+	if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE) {
+		if (!is_valid_state_change(mvq->fw_state, state, is_resumable(ndev))) {
+			err = -EINVAL;
+			goto done;
+		}
+
 		MLX5_SET(virtio_net_q_object, obj_context, state, state);
+		state_change = true;
+	}
 
 	MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, mvq->modified_fields);
 	err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
-	kfree(in);
-	if (!err)
+	if (err)
+		goto done;
+
+	if (state_change)
 		mvq->fw_state = state;
 
 	mvq->modified_fields = 0;
 
+done:
+	kfree(in);
 	return err;
 }
 
@@ -1430,6 +1446,24 @@ static void suspend_vqs(struct mlx5_vdpa_net *ndev)
 		suspend_vq(ndev, &ndev->vqs[i]);
 }
 
+static void resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+	if (!mvq->initialized || !is_resumable(ndev))
+		return;
+
+	if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND)
+		return;
+
+	if (modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY))
+		mlx5_vdpa_warn(&ndev->mvdev, "modify to resume failed for vq %u\n", mvq->index);
+}
+
+static void resume_vqs(struct mlx5_vdpa_net *ndev)
+{
+	for (int i = 0; i < ndev->mvdev.max_vqs; i++)
+		resume_vq(ndev, &ndev->vqs[i]);
+}
+
 static void teardown_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
 {
 	if (!mvq->initialized)
@@ -3261,6 +3295,23 @@ static int mlx5_vdpa_suspend(struct vdpa_device *vdev)
 	return 0;
 }
 
+static int mlx5_vdpa_resume(struct vdpa_device *vdev)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev;
+
+	ndev = to_mlx5_vdpa_ndev(mvdev);
+
+	mlx5_vdpa_info(mvdev, "resuming device\n");
+
+	down_write(&ndev->reslock);
+	mvdev->suspended = false;
+	resume_vqs(ndev);
+	register_link_notifier(ndev);
+	up_write(&ndev->reslock);
+	return 0;
+}
+
 static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,
 			       unsigned int asid)
 {
@@ -3317,6 +3368,7 @@ static const struct vdpa_config_ops mlx5_vdpa_ops = {
 	.get_vq_dma_dev = mlx5_get_vq_dma_dev,
 	.free = mlx5_vdpa_free,
 	.suspend = mlx5_vdpa_suspend,
+	.resume = mlx5_vdpa_resume, /* Op disabled if not supported. */
 };
 
 static int query_mtu(struct mlx5_core_dev *mdev, u16 *mtu)
@@ -3688,6 +3740,9 @@ static int mlx5v_probe(struct auxiliary_device *adev,
 	if (!MLX5_CAP_DEV_VDPA_EMULATION(mdev, desc_group_mkey_supported))
 		mgtdev->vdpa_ops.get_vq_desc_group = NULL;
 
+	if (!MLX5_CAP_DEV_VDPA_EMULATION(mdev, freeze_to_rdy_supported))
+		mgtdev->vdpa_ops.resume = NULL;
+
 	err = vdpa_mgmtdev_register(&mgtdev->mgtdev);
 	if (err)
 		goto reg_err;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH vhost v4 11/15] vdpa/mlx5: Mark vq addrs for modification in hw vq
  2023-12-19 18:08 [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Dragos Tatulea
                   ` (9 preceding siblings ...)
  2023-12-19 18:08 ` [PATCH vhost v4 10/15] vdpa/mlx5: Introduce per vq and device resume Dragos Tatulea
@ 2023-12-19 18:08 ` Dragos Tatulea
  2023-12-19 18:08 ` [PATCH vhost v4 12/15] vdpa/mlx5: Mark vq state " Dragos Tatulea
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-19 18:08 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin,
	Si-Wei Liu, Saeed Mahameed, Leon Romanovsky, virtualization,
	Gal Pressman
  Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo

Addresses get set by .set_vq_address. hw vq addresses will be updated on
next modify_virtqueue.

Advertise that the device supports changing the vq addresses when
device is in DRIVER_OK state and suspended.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 17 ++++++++++++++++-
 include/linux/mlx5/mlx5_ifc_vdpa.h |  1 +
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index f8f088cced50..93812683c88c 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1209,6 +1209,7 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
 	bool state_change = false;
 	void *obj_context;
 	void *cmd_hdr;
+	void *vq_ctx;
 	void *in;
 	int err;
 
@@ -1230,6 +1231,7 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
 	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
 
 	obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
+	vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context);
 
 	if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE) {
 		if (!is_valid_state_change(mvq->fw_state, state, is_resumable(ndev))) {
@@ -1241,6 +1243,12 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
 		state_change = true;
 	}
 
+	if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS) {
+		MLX5_SET64(virtio_q, vq_ctx, desc_addr, mvq->desc_addr);
+		MLX5_SET64(virtio_q, vq_ctx, used_addr, mvq->device_addr);
+		MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr);
+	}
+
 	MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, mvq->modified_fields);
 	err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
 	if (err)
@@ -2202,6 +2210,7 @@ static int mlx5_vdpa_set_vq_address(struct vdpa_device *vdev, u16 idx, u64 desc_
 	mvq->desc_addr = desc_area;
 	mvq->device_addr = device_area;
 	mvq->driver_addr = driver_area;
+	mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS;
 	return 0;
 }
 
@@ -2626,7 +2635,13 @@ static void unregister_link_notifier(struct mlx5_vdpa_net *ndev)
 
 static u64 mlx5_vdpa_get_backend_features(const struct vdpa_device *vdpa)
 {
-	return BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK);
+	u64 features = BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK);
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdpa);
+
+	if (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, freeze_to_rdy_supported))
+		features |= BIT_ULL(VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND);
+
+	return features;
 }
 
 static int mlx5_vdpa_set_driver_features(struct vdpa_device *vdev, u64 features)
diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h b/include/linux/mlx5/mlx5_ifc_vdpa.h
index b86d51a855f6..9594ac405740 100644
--- a/include/linux/mlx5/mlx5_ifc_vdpa.h
+++ b/include/linux/mlx5/mlx5_ifc_vdpa.h
@@ -145,6 +145,7 @@ enum {
 	MLX5_VIRTQ_MODIFY_MASK_STATE                    = (u64)1 << 0,
 	MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_PARAMS      = (u64)1 << 3,
 	MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_DUMP_ENABLE = (u64)1 << 4,
+	MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS           = (u64)1 << 6,
 	MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY          = (u64)1 << 14,
 };
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH vhost v4 12/15] vdpa/mlx5: Mark vq state for modification in hw vq
  2023-12-19 18:08 [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Dragos Tatulea
                   ` (10 preceding siblings ...)
  2023-12-19 18:08 ` [PATCH vhost v4 11/15] vdpa/mlx5: Mark vq addrs for modification in hw vq Dragos Tatulea
@ 2023-12-19 18:08 ` Dragos Tatulea
  2023-12-20  3:47   ` Jason Wang
  2023-12-19 18:08 ` [PATCH vhost v4 13/15] vdpa/mlx5: Use vq suspend/resume during .set_map Dragos Tatulea
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-19 18:08 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin,
	Si-Wei Liu, Saeed Mahameed, Leon Romanovsky, virtualization,
	Gal Pressman
  Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo

.set_vq_state will set the indices and mark the fields to be modified in
the hw vq.

Advertise that the device supports changing the vq state when the device
is in DRIVER_OK state and suspended.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 11 ++++++++++-
 include/linux/mlx5/mlx5_ifc_vdpa.h |  2 ++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 93812683c88c..b760005e2920 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1249,6 +1249,12 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
 		MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr);
 	}
 
+	if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX)
+		MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx);
+
+	if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX)
+		MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx);
+
 	MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, mvq->modified_fields);
 	err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
 	if (err)
@@ -2328,6 +2334,8 @@ static int mlx5_vdpa_set_vq_state(struct vdpa_device *vdev, u16 idx,
 
 	mvq->used_idx = state->split.avail_index;
 	mvq->avail_idx = state->split.avail_index;
+	mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX |
+				MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX;
 	return 0;
 }
 
@@ -2639,7 +2647,8 @@ static u64 mlx5_vdpa_get_backend_features(const struct vdpa_device *vdpa)
 	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdpa);
 
 	if (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, freeze_to_rdy_supported))
-		features |= BIT_ULL(VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND);
+		features |= BIT_ULL(VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND) |
+			    BIT_ULL(VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND);
 
 	return features;
 }
diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h b/include/linux/mlx5/mlx5_ifc_vdpa.h
index 9594ac405740..32e712106e68 100644
--- a/include/linux/mlx5/mlx5_ifc_vdpa.h
+++ b/include/linux/mlx5/mlx5_ifc_vdpa.h
@@ -146,6 +146,8 @@ enum {
 	MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_PARAMS      = (u64)1 << 3,
 	MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_DUMP_ENABLE = (u64)1 << 4,
 	MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS           = (u64)1 << 6,
+	MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX       = (u64)1 << 7,
+	MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX        = (u64)1 << 8,
 	MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY          = (u64)1 << 14,
 };
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH vhost v4 13/15] vdpa/mlx5: Use vq suspend/resume during .set_map
  2023-12-19 18:08 [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Dragos Tatulea
                   ` (11 preceding siblings ...)
  2023-12-19 18:08 ` [PATCH vhost v4 12/15] vdpa/mlx5: Mark vq state " Dragos Tatulea
@ 2023-12-19 18:08 ` Dragos Tatulea
  2023-12-20  3:47   ` Jason Wang
  2023-12-19 18:08 ` [PATCH vhost v4 14/15] vdpa/mlx5: Introduce reference counting to mrs Dragos Tatulea
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-19 18:08 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin,
	Si-Wei Liu, Saeed Mahameed, Leon Romanovsky, virtualization,
	Gal Pressman
  Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo

Instead of tearing down and setting up vq resources, use vq
suspend/resume during .set_map to speed things up a bit.

The vq mr is updated with the new mapping while the vqs are suspended.

If the device doesn't support resumable vqs, do the old teardown and
setup dance.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 46 ++++++++++++++++++++++++------
 include/linux/mlx5/mlx5_ifc_vdpa.h |  1 +
 2 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index b760005e2920..fcadbeede3e1 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1206,6 +1206,7 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
 {
 	int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
 	u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
+	struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
 	bool state_change = false;
 	void *obj_context;
 	void *cmd_hdr;
@@ -1255,6 +1256,24 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
 	if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX)
 		MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx);
 
+	if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
+		struct mlx5_vdpa_mr *mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
+
+		if (mr)
+			MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, mr->mkey);
+		else
+			mvq->modified_fields &= ~MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY;
+	}
+
+	if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY) {
+		struct mlx5_vdpa_mr *mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
+
+		if (mr && MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, desc_group_mkey_supported))
+			MLX5_SET(virtio_q, vq_ctx, desc_group_mkey, mr->mkey);
+		else
+			mvq->modified_fields &= ~MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY;
+	}
+
 	MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, mvq->modified_fields);
 	err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
 	if (err)
@@ -2791,24 +2810,35 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev,
 				unsigned int asid)
 {
 	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	bool teardown = !is_resumable(ndev);
 	int err;
 
 	suspend_vqs(ndev);
-	err = save_channels_info(ndev);
-	if (err)
-		return err;
+	if (teardown) {
+		err = save_channels_info(ndev);
+		if (err)
+			return err;
 
-	teardown_driver(ndev);
+		teardown_driver(ndev);
+	}
 
 	mlx5_vdpa_update_mr(mvdev, new_mr, asid);
 
+	for (int i = 0; i < ndev->cur_num_vqs; i++)
+		ndev->vqs[i].modified_fields |= MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY |
+						MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY;
+
 	if (!(mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK) || mvdev->suspended)
 		return 0;
 
-	restore_channels_info(ndev);
-	err = setup_driver(mvdev);
-	if (err)
-		return err;
+	if (teardown) {
+		restore_channels_info(ndev);
+		err = setup_driver(mvdev);
+		if (err)
+			return err;
+	}
+
+	resume_vqs(ndev);
 
 	return 0;
 }
diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h b/include/linux/mlx5/mlx5_ifc_vdpa.h
index 32e712106e68..40371c916cf9 100644
--- a/include/linux/mlx5/mlx5_ifc_vdpa.h
+++ b/include/linux/mlx5/mlx5_ifc_vdpa.h
@@ -148,6 +148,7 @@ enum {
 	MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS           = (u64)1 << 6,
 	MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX       = (u64)1 << 7,
 	MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX        = (u64)1 << 8,
+	MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY            = (u64)1 << 11,
 	MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY          = (u64)1 << 14,
 };
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH vhost v4 14/15] vdpa/mlx5: Introduce reference counting to mrs
  2023-12-19 18:08 [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Dragos Tatulea
                   ` (12 preceding siblings ...)
  2023-12-19 18:08 ` [PATCH vhost v4 13/15] vdpa/mlx5: Use vq suspend/resume during .set_map Dragos Tatulea
@ 2023-12-19 18:08 ` Dragos Tatulea
  2023-12-20  3:47   ` Jason Wang
  2023-12-19 18:08 ` [PATCH vhost v4 15/15] vdpa/mlx5: Add mkey leak detection Dragos Tatulea
  2023-12-25 14:41 ` [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Michael S. Tsirkin
  15 siblings, 1 reply; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-19 18:08 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin,
	Si-Wei Liu, Saeed Mahameed, Leon Romanovsky, virtualization,
	Gal Pressman
  Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo

Deleting the old mr during mr update (.set_map) and then modifying the
vqs with the new mr is not a good flow for firmware. The firmware
expects that mkeys are deleted after there are no more vqs referencing
them.

Introduce reference counting for mrs to fix this. It is the only way to
make sure that mkeys are not in use by vqs.

An mr reference is taken when the mr is associated to the mr asid table
and when the mr is linked to the vq on create/modify. The reference is
released when the mkey is unlinked from the vq (trough modify/destroy)
and from the mr asid table.

To make things consistent, get rid of mlx5_vdpa_destroy_mr and use
get/put semantics everywhere.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
---
 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  8 +++--
 drivers/vdpa/mlx5/core/mr.c        | 50 ++++++++++++++++++++----------
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 45 ++++++++++++++++++++++-----
 3 files changed, 78 insertions(+), 25 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 84547d998bcf..1a0d27b6e09a 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -35,6 +35,8 @@ struct mlx5_vdpa_mr {
 	struct vhost_iotlb *iotlb;
 
 	bool user_mr;
+
+	refcount_t refcount;
 };
 
 struct mlx5_vdpa_resources {
@@ -118,8 +120,10 @@ int mlx5_vdpa_destroy_mkey(struct mlx5_vdpa_dev *mvdev, u32 mkey);
 struct mlx5_vdpa_mr *mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
 					 struct vhost_iotlb *iotlb);
 void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev);
-void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev,
-			  struct mlx5_vdpa_mr *mr);
+void mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
+		      struct mlx5_vdpa_mr *mr);
+void mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
+		      struct mlx5_vdpa_mr *mr);
 void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
 			 struct mlx5_vdpa_mr *mr,
 			 unsigned int asid);
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 2197c46e563a..c7dc8914354a 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -498,32 +498,52 @@ static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr
 
 static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr)
 {
+	if (WARN_ON(!mr))
+		return;
+
 	if (mr->user_mr)
 		destroy_user_mr(mvdev, mr);
 	else
 		destroy_dma_mr(mvdev, mr);
 
 	vhost_iotlb_free(mr->iotlb);
+
+	kfree(mr);
 }
 
-void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev,
-			  struct mlx5_vdpa_mr *mr)
+static void _mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
+			      struct mlx5_vdpa_mr *mr)
 {
 	if (!mr)
 		return;
 
+	if (refcount_dec_and_test(&mr->refcount))
+		_mlx5_vdpa_destroy_mr(mvdev, mr);
+}
+
+void mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
+		      struct mlx5_vdpa_mr *mr)
+{
 	mutex_lock(&mvdev->mr_mtx);
+	_mlx5_vdpa_put_mr(mvdev, mr);
+	mutex_unlock(&mvdev->mr_mtx);
+}
 
-	_mlx5_vdpa_destroy_mr(mvdev, mr);
+static void _mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
+			      struct mlx5_vdpa_mr *mr)
+{
+	if (!mr)
+		return;
 
-	for (int i = 0; i < MLX5_VDPA_NUM_AS; i++) {
-		if (mvdev->mr[i] == mr)
-			mvdev->mr[i] = NULL;
-	}
+	refcount_inc(&mr->refcount);
+}
 
+void mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
+		      struct mlx5_vdpa_mr *mr)
+{
+	mutex_lock(&mvdev->mr_mtx);
+	_mlx5_vdpa_get_mr(mvdev, mr);
 	mutex_unlock(&mvdev->mr_mtx);
-
-	kfree(mr);
 }
 
 void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
@@ -534,20 +554,16 @@ void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
 
 	mutex_lock(&mvdev->mr_mtx);
 
+	_mlx5_vdpa_put_mr(mvdev, old_mr);
 	mvdev->mr[asid] = new_mr;
-	if (old_mr) {
-		_mlx5_vdpa_destroy_mr(mvdev, old_mr);
-		kfree(old_mr);
-	}
 
 	mutex_unlock(&mvdev->mr_mtx);
-
 }
 
 void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev)
 {
 	for (int i = 0; i < MLX5_VDPA_NUM_AS; i++)
-		mlx5_vdpa_destroy_mr(mvdev, mvdev->mr[i]);
+		mlx5_vdpa_update_mr(mvdev, NULL, i);
 
 	prune_iotlb(mvdev->cvq.iotlb);
 }
@@ -607,6 +623,8 @@ struct mlx5_vdpa_mr *mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
 	if (err)
 		goto out_err;
 
+	refcount_set(&mr->refcount, 1);
+
 	return mr;
 
 out_err:
@@ -651,7 +669,7 @@ int mlx5_vdpa_reset_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid)
 	if (asid >= MLX5_VDPA_NUM_AS)
 		return -EINVAL;
 
-	mlx5_vdpa_destroy_mr(mvdev, mvdev->mr[asid]);
+	mlx5_vdpa_update_mr(mvdev, NULL, asid);
 
 	if (asid == 0 && MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
 		if (mlx5_vdpa_create_dma_mr(mvdev))
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index fcadbeede3e1..f81968b3f9cf 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -123,6 +123,9 @@ struct mlx5_vdpa_virtqueue {
 
 	u64 modified_fields;
 
+	struct mlx5_vdpa_mr *vq_mr;
+	struct mlx5_vdpa_mr *desc_mr;
+
 	struct msi_map map;
 
 	/* keep last in the struct */
@@ -946,6 +949,14 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
 	kfree(in);
 	mvq->virtq_id = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id);
 
+	mlx5_vdpa_get_mr(mvdev, vq_mr);
+	mvq->vq_mr = vq_mr;
+
+	if (vq_desc_mr && MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, desc_group_mkey_supported)) {
+		mlx5_vdpa_get_mr(mvdev, vq_desc_mr);
+		mvq->desc_mr = vq_desc_mr;
+	}
+
 	return 0;
 
 err_cmd:
@@ -972,6 +983,12 @@ static void destroy_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtq
 	}
 	mvq->fw_state = MLX5_VIRTIO_NET_Q_OBJECT_NONE;
 	umems_destroy(ndev, mvq);
+
+	mlx5_vdpa_put_mr(&ndev->mvdev, mvq->vq_mr);
+	mvq->vq_mr = NULL;
+
+	mlx5_vdpa_put_mr(&ndev->mvdev, mvq->desc_mr);
+	mvq->desc_mr = NULL;
 }
 
 static u32 get_rqpn(struct mlx5_vdpa_virtqueue *mvq, bool fw)
@@ -1207,6 +1224,8 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
 	int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
 	u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
 	struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
+	struct mlx5_vdpa_mr *desc_mr = NULL;
+	struct mlx5_vdpa_mr *vq_mr = NULL;
 	bool state_change = false;
 	void *obj_context;
 	void *cmd_hdr;
@@ -1257,19 +1276,19 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
 		MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx);
 
 	if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
-		struct mlx5_vdpa_mr *mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
+		vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
 
-		if (mr)
-			MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, mr->mkey);
+		if (vq_mr)
+			MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, vq_mr->mkey);
 		else
 			mvq->modified_fields &= ~MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY;
 	}
 
 	if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY) {
-		struct mlx5_vdpa_mr *mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
+		desc_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
 
-		if (mr && MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, desc_group_mkey_supported))
-			MLX5_SET(virtio_q, vq_ctx, desc_group_mkey, mr->mkey);
+		if (desc_mr && MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, desc_group_mkey_supported))
+			MLX5_SET(virtio_q, vq_ctx, desc_group_mkey, desc_mr->mkey);
 		else
 			mvq->modified_fields &= ~MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY;
 	}
@@ -1282,6 +1301,18 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
 	if (state_change)
 		mvq->fw_state = state;
 
+	if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
+		mlx5_vdpa_put_mr(mvdev, mvq->vq_mr);
+		mlx5_vdpa_get_mr(mvdev, vq_mr);
+		mvq->vq_mr = vq_mr;
+	}
+
+	if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY) {
+		mlx5_vdpa_put_mr(mvdev, mvq->desc_mr);
+		mlx5_vdpa_get_mr(mvdev, desc_mr);
+		mvq->desc_mr = desc_mr;
+	}
+
 	mvq->modified_fields = 0;
 
 done:
@@ -3102,7 +3133,7 @@ static int set_map_data(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
 	return mlx5_vdpa_update_cvq_iotlb(mvdev, iotlb, asid);
 
 out_err:
-	mlx5_vdpa_destroy_mr(mvdev, new_mr);
+	mlx5_vdpa_put_mr(mvdev, new_mr);
 	return err;
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH vhost v4 15/15] vdpa/mlx5: Add mkey leak detection
  2023-12-19 18:08 [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Dragos Tatulea
                   ` (13 preceding siblings ...)
  2023-12-19 18:08 ` [PATCH vhost v4 14/15] vdpa/mlx5: Introduce reference counting to mrs Dragos Tatulea
@ 2023-12-19 18:08 ` Dragos Tatulea
  2023-12-25 14:41 ` [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Michael S. Tsirkin
  15 siblings, 0 replies; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-19 18:08 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Eugenio Perez Martin,
	Si-Wei Liu, Saeed Mahameed, Leon Romanovsky, virtualization,
	Gal Pressman
  Cc: Dragos Tatulea, kvm, linux-kernel, Parav Pandit, Xuan Zhuo

Track allocated mrs in a list and show warning when leaks are detected
on device free or reset.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
---
 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  2 ++
 drivers/vdpa/mlx5/core/mr.c        | 23 +++++++++++++++++++++++
 drivers/vdpa/mlx5/net/mlx5_vnet.c  |  2 ++
 3 files changed, 27 insertions(+)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 1a0d27b6e09a..50aac8fe57ef 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -37,6 +37,7 @@ struct mlx5_vdpa_mr {
 	bool user_mr;
 
 	refcount_t refcount;
+	struct list_head mr_list;
 };
 
 struct mlx5_vdpa_resources {
@@ -95,6 +96,7 @@ struct mlx5_vdpa_dev {
 	u32 generation;
 
 	struct mlx5_vdpa_mr *mr[MLX5_VDPA_NUM_AS];
+	struct list_head mr_list_head;
 	/* serialize mr access */
 	struct mutex mr_mtx;
 	struct mlx5_control_vq cvq;
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index c7dc8914354a..4758914ccf86 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -508,6 +508,8 @@ static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_
 
 	vhost_iotlb_free(mr->iotlb);
 
+	list_del(&mr->mr_list);
+
 	kfree(mr);
 }
 
@@ -560,12 +562,31 @@ void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
 	mutex_unlock(&mvdev->mr_mtx);
 }
 
+static void mlx5_vdpa_show_mr_leaks(struct mlx5_vdpa_dev *mvdev)
+{
+	struct mlx5_vdpa_mr *mr;
+
+	mutex_lock(&mvdev->mr_mtx);
+
+	list_for_each_entry(mr, &mvdev->mr_list_head, mr_list) {
+
+		mlx5_vdpa_warn(mvdev, "mkey still alive after resource delete: "
+				      "mr: %p, mkey: 0x%x, refcount: %u\n",
+				       mr, mr->mkey, refcount_read(&mr->refcount));
+	}
+
+	mutex_unlock(&mvdev->mr_mtx);
+
+}
+
 void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev)
 {
 	for (int i = 0; i < MLX5_VDPA_NUM_AS; i++)
 		mlx5_vdpa_update_mr(mvdev, NULL, i);
 
 	prune_iotlb(mvdev->cvq.iotlb);
+
+	mlx5_vdpa_show_mr_leaks(mvdev);
 }
 
 static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
@@ -592,6 +613,8 @@ static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
 	if (err)
 		goto err_iotlb;
 
+	list_add_tail(&mr->mr_list, &mvdev->mr_list_head);
+
 	return 0;
 
 err_iotlb:
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index f81968b3f9cf..a783e8bd784d 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3729,6 +3729,8 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
 	if (err)
 		goto err_mpfs;
 
+	INIT_LIST_HEAD(&mvdev->mr_list_head);
+
 	if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
 		err = mlx5_vdpa_create_dma_mr(mvdev);
 		if (err)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH mlx5-vhost v4 01/15] vdpa/mlx5: Expose resumable vq capability
  2023-12-19 18:08 ` [PATCH mlx5-vhost v4 01/15] vdpa/mlx5: Expose resumable vq capability Dragos Tatulea
@ 2023-12-20  3:46   ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2023-12-20  3:46 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Michael S . Tsirkin, Eugenio Perez Martin, Si-Wei Liu,
	Saeed Mahameed, Leon Romanovsky, virtualization, Gal Pressman,
	kvm, linux-kernel, Parav Pandit, Xuan Zhuo

On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Necessary for checking if resumable vqs are supported by the hardware.
> Actual support will be added in a downstream patch.
>
> Reviewed-by: Gal Pressman <gal@nvidia.com>
> Acked-by: Eugenio Pérez <eperezma@redhat.com>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


> ---
>  include/linux/mlx5/mlx5_ifc.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
> index 6f3631425f38..9eaceaf6bcb0 100644
> --- a/include/linux/mlx5/mlx5_ifc.h
> +++ b/include/linux/mlx5/mlx5_ifc.h
> @@ -1236,7 +1236,8 @@ struct mlx5_ifc_virtio_emulation_cap_bits {
>
>         u8         reserved_at_c0[0x13];
>         u8         desc_group_mkey_supported[0x1];
> -       u8         reserved_at_d4[0xc];
> +       u8         freeze_to_rdy_supported[0x1];
> +       u8         reserved_at_d5[0xb];
>
>         u8         reserved_at_e0[0x20];
>
> --
> 2.43.0
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
  2023-12-19 18:08 ` [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag Dragos Tatulea
@ 2023-12-20  3:46   ` Jason Wang
  2023-12-20  4:05     ` Jason Wang
  2023-12-20 16:09   ` Eugenio Perez Martin
  1 sibling, 1 reply; 50+ messages in thread
From: Jason Wang @ 2023-12-20  3:46 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Michael S . Tsirkin, Eugenio Perez Martin, Si-Wei Liu,
	Saeed Mahameed, Leon Romanovsky, virtualization, Gal Pressman,
	kvm, linux-kernel, Parav Pandit, Xuan Zhuo

On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> The virtio spec doesn't allow changing virtqueue addresses after
> DRIVER_OK. Some devices do support this operation when the device is
> suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> advertises this support as a backend features.

There's an ongoing effort in virtio spec to introduce the suspend state.

So I wonder if it's better to just allow such behaviour?

Thanks


>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Suggested-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  include/uapi/linux/vhost_types.h | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
> index d7656908f730..aacd067afc89 100644
> --- a/include/uapi/linux/vhost_types.h
> +++ b/include/uapi/linux/vhost_types.h
> @@ -192,5 +192,9 @@ struct vhost_vdpa_iova_range {
>  #define VHOST_BACKEND_F_DESC_ASID    0x7
>  /* IOTLB don't flush memory mapping across device reset */
>  #define VHOST_BACKEND_F_IOTLB_PERSIST  0x8
> +/* Device supports changing virtqueue addresses when device is suspended
> + * and is in state DRIVER_OK.
> + */
> +#define VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND  0x9
>
>  #endif
> --
> 2.43.0
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 06/15] vdpa: Track device suspended state
  2023-12-19 18:08 ` [PATCH vhost v4 06/15] vdpa: Track device suspended state Dragos Tatulea
@ 2023-12-20  3:46   ` Jason Wang
  2023-12-20 12:55     ` Dragos Tatulea
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2023-12-20  3:46 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Michael S . Tsirkin, Eugenio Perez Martin, Si-Wei Liu,
	Saeed Mahameed, Leon Romanovsky, virtualization, Gal Pressman,
	kvm, linux-kernel, Parav Pandit, Xuan Zhuo

On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Set vdpa device suspended state on successful suspend. Clear it on
> successful resume and reset.
>
> The state will be locked by the vhost_vdpa mutex. The mutex is taken
> during suspend, resume and reset in vhost_vdpa_unlocked_ioctl. The
> exception is vhost_vdpa_open which does a device reset but that should
> be safe because it can only happen before the other ops.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Suggested-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  drivers/vhost/vdpa.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index b4e8ddf86485..00b4fa8e89f2 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -59,6 +59,7 @@ struct vhost_vdpa {
>         int in_batch;
>         struct vdpa_iova_range range;
>         u32 batch_asid;
> +       bool suspended;

Any reason why we don't do it in the core vDPA device but here?

Thanks


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 09/15] vdpa/mlx5: Allow modifying multiple vq fields in one modify command
  2023-12-19 18:08 ` [PATCH vhost v4 09/15] vdpa/mlx5: Allow modifying multiple vq fields in one modify command Dragos Tatulea
@ 2023-12-20  3:46   ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2023-12-20  3:46 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Michael S . Tsirkin, Eugenio Perez Martin, Si-Wei Liu,
	Saeed Mahameed, Leon Romanovsky, virtualization, Gal Pressman,
	kvm, linux-kernel, Parav Pandit, Xuan Zhuo

On Wed, Dec 20, 2023 at 2:10 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Add a bitmask variable that tracks hw vq field changes that
> are supposed to be modified on next hw vq change command.
>
> This will be useful to set multiple vq fields when resuming the vq.
>
> Reviewed-by: Gal Pressman <gal@nvidia.com>
> Acked-by: Eugenio Pérez <eperezma@redhat.com>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 10/15] vdpa/mlx5: Introduce per vq and device resume
  2023-12-19 18:08 ` [PATCH vhost v4 10/15] vdpa/mlx5: Introduce per vq and device resume Dragos Tatulea
@ 2023-12-20  3:47   ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2023-12-20  3:47 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Michael S . Tsirkin, Eugenio Perez Martin, Si-Wei Liu,
	Saeed Mahameed, Leon Romanovsky, virtualization, Gal Pressman,
	kvm, linux-kernel, Parav Pandit, Xuan Zhuo

On Wed, Dec 20, 2023 at 2:10 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Implement vdpa vq and device resume if capability detected. Add support
> for suspend -> ready state change.
>
> Reviewed-by: Gal Pressman <gal@nvidia.com>
> Acked-by: Eugenio Pérez <eperezma@redhat.com>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 12/15] vdpa/mlx5: Mark vq state for modification in hw vq
  2023-12-19 18:08 ` [PATCH vhost v4 12/15] vdpa/mlx5: Mark vq state " Dragos Tatulea
@ 2023-12-20  3:47   ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2023-12-20  3:47 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Michael S . Tsirkin, Eugenio Perez Martin, Si-Wei Liu,
	Saeed Mahameed, Leon Romanovsky, virtualization, Gal Pressman,
	kvm, linux-kernel, Parav Pandit, Xuan Zhuo

On Wed, Dec 20, 2023 at 2:10 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> .set_vq_state will set the indices and mark the fields to be modified in
> the hw vq.
>
> Advertise that the device supports changing the vq state when the device
> is in DRIVER_OK state and suspended.
>
> Reviewed-by: Gal Pressman <gal@nvidia.com>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> ---

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 13/15] vdpa/mlx5: Use vq suspend/resume during .set_map
  2023-12-19 18:08 ` [PATCH vhost v4 13/15] vdpa/mlx5: Use vq suspend/resume during .set_map Dragos Tatulea
@ 2023-12-20  3:47   ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2023-12-20  3:47 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Michael S . Tsirkin, Eugenio Perez Martin, Si-Wei Liu,
	Saeed Mahameed, Leon Romanovsky, virtualization, Gal Pressman,
	kvm, linux-kernel, Parav Pandit, Xuan Zhuo

On Wed, Dec 20, 2023 at 2:10 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Instead of tearing down and setting up vq resources, use vq
> suspend/resume during .set_map to speed things up a bit.
>
> The vq mr is updated with the new mapping while the vqs are suspended.
>
> If the device doesn't support resumable vqs, do the old teardown and
> setup dance.
>
> Reviewed-by: Gal Pressman <gal@nvidia.com>
> Acked-by: Eugenio Pérez <eperezma@redhat.com>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> ---

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 14/15] vdpa/mlx5: Introduce reference counting to mrs
  2023-12-19 18:08 ` [PATCH vhost v4 14/15] vdpa/mlx5: Introduce reference counting to mrs Dragos Tatulea
@ 2023-12-20  3:47   ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2023-12-20  3:47 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Michael S . Tsirkin, Eugenio Perez Martin, Si-Wei Liu,
	Saeed Mahameed, Leon Romanovsky, virtualization, Gal Pressman,
	kvm, linux-kernel, Parav Pandit, Xuan Zhuo

On Wed, Dec 20, 2023 at 2:10 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Deleting the old mr during mr update (.set_map) and then modifying the
> vqs with the new mr is not a good flow for firmware. The firmware
> expects that mkeys are deleted after there are no more vqs referencing
> them.
>
> Introduce reference counting for mrs to fix this. It is the only way to
> make sure that mkeys are not in use by vqs.
>
> An mr reference is taken when the mr is associated to the mr asid table
> and when the mr is linked to the vq on create/modify. The reference is
> released when the mkey is unlinked from the vq (trough modify/destroy)
> and from the mr asid table.
>
> To make things consistent, get rid of mlx5_vdpa_destroy_mr and use
> get/put semantics everywhere.
>
> Reviewed-by: Gal Pressman <gal@nvidia.com>
> Acked-by: Eugenio Pérez <eperezma@redhat.com>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> ---

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
  2023-12-20  3:46   ` Jason Wang
@ 2023-12-20  4:05     ` Jason Wang
  2023-12-20 12:57       ` Dragos Tatulea
  2023-12-20 13:32       ` Eugenio Perez Martin
  0 siblings, 2 replies; 50+ messages in thread
From: Jason Wang @ 2023-12-20  4:05 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Michael S . Tsirkin, Eugenio Perez Martin, Si-Wei Liu,
	Saeed Mahameed, Leon Romanovsky, virtualization, Gal Pressman,
	kvm, linux-kernel, Parav Pandit, Xuan Zhuo

On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> >
> > The virtio spec doesn't allow changing virtqueue addresses after
> > DRIVER_OK. Some devices do support this operation when the device is
> > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > advertises this support as a backend features.
>
> There's an ongoing effort in virtio spec to introduce the suspend state.
>
> So I wonder if it's better to just allow such behaviour?

Actually I mean, allow drivers to modify the parameters during suspend
without a new feature.

Thanks

>
> Thanks
>
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 06/15] vdpa: Track device suspended state
  2023-12-20  3:46   ` Jason Wang
@ 2023-12-20 12:55     ` Dragos Tatulea
  2023-12-22 11:22       ` Dragos Tatulea
  0 siblings, 1 reply; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-20 12:55 UTC (permalink / raw)
  To: jasowang
  Cc: xuanzhuo, Parav Pandit, Gal Pressman, eperezma, virtualization,
	linux-kernel, si-wei.liu, kvm, mst, Saeed Mahameed, leon

On Wed, 2023-12-20 at 11:46 +0800, Jason Wang wrote:
> On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > 
> > Set vdpa device suspended state on successful suspend. Clear it on
> > successful resume and reset.
> > 
> > The state will be locked by the vhost_vdpa mutex. The mutex is taken
> > during suspend, resume and reset in vhost_vdpa_unlocked_ioctl. The
> > exception is vhost_vdpa_open which does a device reset but that should
> > be safe because it can only happen before the other ops.
> > 
> > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > Suggested-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >  drivers/vhost/vdpa.c | 17 +++++++++++++++--
> >  1 file changed, 15 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > index b4e8ddf86485..00b4fa8e89f2 100644
> > --- a/drivers/vhost/vdpa.c
> > +++ b/drivers/vhost/vdpa.c
> > @@ -59,6 +59,7 @@ struct vhost_vdpa {
> >         int in_batch;
> >         struct vdpa_iova_range range;
> >         u32 batch_asid;
> > +       bool suspended;
> 
> Any reason why we don't do it in the core vDPA device but here?
> 
Not really. I wanted to be safe and not expose it in a header due to locking.

Thanks,
Dragos

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
  2023-12-20  4:05     ` Jason Wang
@ 2023-12-20 12:57       ` Dragos Tatulea
  2023-12-20 13:32       ` Eugenio Perez Martin
  1 sibling, 0 replies; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-20 12:57 UTC (permalink / raw)
  To: jasowang
  Cc: xuanzhuo, Parav Pandit, Gal Pressman, eperezma, virtualization,
	linux-kernel, si-wei.liu, kvm, mst, Saeed Mahameed, leon

On Wed, 2023-12-20 at 12:05 +0800, Jason Wang wrote:
> On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > 
> > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > 
> > > The virtio spec doesn't allow changing virtqueue addresses after
> > > DRIVER_OK. Some devices do support this operation when the device is
> > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > advertises this support as a backend features.
> > 
> > There's an ongoing effort in virtio spec to introduce the suspend state.
> > 
> > So I wonder if it's better to just allow such behaviour?
> 
> Actually I mean, allow drivers to modify the parameters during suspend
> without a new feature.
> 
Fine by me. Less code is better than more code. The v2 of this series would be
sufficient then.

Thanks

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
  2023-12-20  4:05     ` Jason Wang
  2023-12-20 12:57       ` Dragos Tatulea
@ 2023-12-20 13:32       ` Eugenio Perez Martin
  2023-12-21  2:03         ` Jason Wang
  1 sibling, 1 reply; 50+ messages in thread
From: Eugenio Perez Martin @ 2023-12-20 13:32 UTC (permalink / raw)
  To: Jason Wang
  Cc: Dragos Tatulea, Michael S . Tsirkin, Si-Wei Liu, Saeed Mahameed,
	Leon Romanovsky, virtualization, Gal Pressman, kvm, linux-kernel,
	Parav Pandit, Xuan Zhuo

On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > >
> > > The virtio spec doesn't allow changing virtqueue addresses after
> > > DRIVER_OK. Some devices do support this operation when the device is
> > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > advertises this support as a backend features.
> >
> > There's an ongoing effort in virtio spec to introduce the suspend state.
> >
> > So I wonder if it's better to just allow such behaviour?
>
> Actually I mean, allow drivers to modify the parameters during suspend
> without a new feature.
>

That would be ideal, but how do userland checks if it can suspend +
change properties + resume?

The only way that comes to my mind is to make sure all parents return
error if userland tries to do it, and then fallback in userland. I'm
ok with that, but I'm not sure if the current master & previous kernel
has a coherent behavior. Do they return error? Or return success
without changing address / vq state?


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
  2023-12-19 18:08 ` [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag Dragos Tatulea
  2023-12-20  3:46   ` Jason Wang
@ 2023-12-20 16:09   ` Eugenio Perez Martin
  1 sibling, 0 replies; 50+ messages in thread
From: Eugenio Perez Martin @ 2023-12-20 16:09 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Michael S . Tsirkin, Jason Wang, Si-Wei Liu, Saeed Mahameed,
	Leon Romanovsky, virtualization, Gal Pressman, kvm, linux-kernel,
	Parav Pandit, Xuan Zhuo

On Tue, Dec 19, 2023 at 7:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> The virtio spec doesn't allow changing virtqueue addresses after
> DRIVER_OK. Some devices do support this operation when the device is
> suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> advertises this support as a backend features.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Suggested-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  include/uapi/linux/vhost_types.h | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
> index d7656908f730..aacd067afc89 100644
> --- a/include/uapi/linux/vhost_types.h
> +++ b/include/uapi/linux/vhost_types.h
> @@ -192,5 +192,9 @@ struct vhost_vdpa_iova_range {
>  #define VHOST_BACKEND_F_DESC_ASID    0x7
>  /* IOTLB don't flush memory mapping across device reset */
>  #define VHOST_BACKEND_F_IOTLB_PERSIST  0x8
> +/* Device supports changing virtqueue addresses when device is suspended
> + * and is in state DRIVER_OK.
> + */
> +#define VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND  0x9
>

If we go by feature flag,

Acked-by: Eugenio Pérez <eperezma@redhat.com>

>  #endif
> --
> 2.43.0
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 03/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND flag
  2023-12-19 18:08 ` [PATCH vhost v4 03/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND flag Dragos Tatulea
@ 2023-12-20 16:10   ` Eugenio Perez Martin
  0 siblings, 0 replies; 50+ messages in thread
From: Eugenio Perez Martin @ 2023-12-20 16:10 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Michael S . Tsirkin, Jason Wang, Si-Wei Liu, Saeed Mahameed,
	Leon Romanovsky, virtualization, Gal Pressman, kvm, linux-kernel,
	Parav Pandit, Xuan Zhuo

On Tue, Dec 19, 2023 at 7:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> The virtio spec doesn't allow changing virtqueue state after
> DRIVER_OK. Some devices do support this operation when the device is
> suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND flag
> advertises this support as a backend features.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Suggested-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  include/uapi/linux/vhost_types.h | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
> index aacd067afc89..848dadc95ca1 100644
> --- a/include/uapi/linux/vhost_types.h
> +++ b/include/uapi/linux/vhost_types.h
> @@ -196,5 +196,9 @@ struct vhost_vdpa_iova_range {
>   * and is in state DRIVER_OK.
>   */
>  #define VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND  0x9
> +/* Device supports changing virtqueue state when device is suspended
> + * and is in state DRIVER_OK.
> + */
> +#define VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND  0x10

Acked-by: Eugenio Pérez <eperezma@redhat.com>

It would be great to find a shorter ID but I'm very bad at naming :(.
Maybe it is ok to drop the _BACKEND_ word? I'm ok with carrying the
acked-by if so.

Thanks!

>
>  #endif
> --
> 2.43.0
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 04/15] vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND backend feature
  2023-12-19 18:08 ` [PATCH vhost v4 04/15] vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND backend feature Dragos Tatulea
@ 2023-12-20 16:11   ` Eugenio Perez Martin
  0 siblings, 0 replies; 50+ messages in thread
From: Eugenio Perez Martin @ 2023-12-20 16:11 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Michael S . Tsirkin, Jason Wang, Si-Wei Liu, Saeed Mahameed,
	Leon Romanovsky, virtualization, Gal Pressman, kvm, linux-kernel,
	Parav Pandit, Xuan Zhuo

On Tue, Dec 19, 2023 at 7:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> If userland sets this feature, allow it.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Suggested-by: Eugenio Pérez <eperezma@redhat.com>

Acked-by: Eugenio Pérez <eperezma@redhat.com>

> ---
>  drivers/vhost/vdpa.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index da7ec77cdaff..2250fcd90e5b 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -749,6 +749,7 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
>                                  BIT_ULL(VHOST_BACKEND_F_IOTLB_PERSIST) |
>                                  BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
>                                  BIT_ULL(VHOST_BACKEND_F_RESUME) |
> +                                BIT_ULL(VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND) |
>                                  BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK)))
>                         return -EOPNOTSUPP;
>                 if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) &&
> --
> 2.43.0
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 05/15] vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND backend feature
  2023-12-19 18:08 ` [PATCH vhost v4 05/15] vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND " Dragos Tatulea
@ 2023-12-20 16:12   ` Eugenio Perez Martin
  0 siblings, 0 replies; 50+ messages in thread
From: Eugenio Perez Martin @ 2023-12-20 16:12 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Michael S . Tsirkin, Jason Wang, Si-Wei Liu, Saeed Mahameed,
	Leon Romanovsky, virtualization, Gal Pressman, kvm, linux-kernel,
	Parav Pandit, Xuan Zhuo

On Tue, Dec 19, 2023 at 7:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> If userland sets this feature, allow it.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Suggested-by: Eugenio Pérez <eperezma@redhat.com>

Acked-by: Eugenio Pérez <eperezma@redhat.com>

> ---
>  drivers/vhost/vdpa.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 2250fcd90e5b..b4e8ddf86485 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -750,6 +750,7 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
>                                  BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
>                                  BIT_ULL(VHOST_BACKEND_F_RESUME) |
>                                  BIT_ULL(VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND) |
> +                                BIT_ULL(VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND) |
>                                  BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK)))
>                         return -EOPNOTSUPP;
>                 if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) &&
> --
> 2.43.0
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 07/15] vdpa: Block vq address change in DRIVER_OK unless device supports it
  2023-12-19 18:08 ` [PATCH vhost v4 07/15] vdpa: Block vq address change in DRIVER_OK unless device supports it Dragos Tatulea
@ 2023-12-20 16:31   ` Eugenio Perez Martin
  0 siblings, 0 replies; 50+ messages in thread
From: Eugenio Perez Martin @ 2023-12-20 16:31 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Michael S . Tsirkin, Jason Wang, Si-Wei Liu, Saeed Mahameed,
	Leon Romanovsky, virtualization, Gal Pressman, kvm, linux-kernel,
	Parav Pandit, Xuan Zhuo

On Tue, Dec 19, 2023 at 7:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Virtqueue address change during DRIVE_OK is not supported by the virtio
> standard. Allow this op in DRIVER_OK only for devices that support
> changing the address during DRIVER_OK if the device is suspended.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Suggested-by: Eugenio Pérez <eperezma@redhat.com>

Acked-by: Eugenio Pérez <eperezma@redhat.com>

> ---
>  drivers/vhost/vdpa.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 00b4fa8e89f2..6bfa3391935a 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -625,6 +625,29 @@ static long vhost_vdpa_resume(struct vhost_vdpa *v)
>         return ret;
>  }
>
> +static bool vhost_vdpa_vq_config_allowed(struct vhost_vdpa *v, unsigned int cmd)
> +{
> +       struct vdpa_device *vdpa = v->vdpa;
> +       const struct vdpa_config_ops *ops = vdpa->config;
> +       int feature;
> +
> +       if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK))
> +               return true;
> +
> +       if (!v->suspended)
> +               return false;
> +
> +       switch (cmd) {
> +       case VHOST_SET_VRING_ADDR:
> +               feature = VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND;
> +               break;
> +       default:
> +               return false;
> +       }
> +
> +       return v->vdev.vqs && vhost_backend_has_feature(v->vdev.vqs[0], feature);
> +}
> +
>  static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
>                                    void __user *argp)
>  {
> @@ -703,6 +726,9 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
>
>         switch (cmd) {
>         case VHOST_SET_VRING_ADDR:
> +               if (!vhost_vdpa_vq_config_allowed(v, cmd))
> +                       return -EOPNOTSUPP;
> +
>                 if (ops->set_vq_address(vdpa, idx,
>                                         (u64)(uintptr_t)vq->desc,
>                                         (u64)(uintptr_t)vq->avail,
> --
> 2.43.0
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 08/15] vdpa: Block vq state change in DRIVER_OK unless device supports it
  2023-12-19 18:08 ` [PATCH vhost v4 08/15] vdpa: Block vq state " Dragos Tatulea
@ 2023-12-20 16:32   ` Eugenio Perez Martin
  0 siblings, 0 replies; 50+ messages in thread
From: Eugenio Perez Martin @ 2023-12-20 16:32 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Michael S . Tsirkin, Jason Wang, Si-Wei Liu, Saeed Mahameed,
	Leon Romanovsky, virtualization, Gal Pressman, kvm, linux-kernel,
	Parav Pandit, Xuan Zhuo

On Tue, Dec 19, 2023 at 7:10 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Virtqueue state change during DRIVE_OK is not supported by the virtio
> standard. Allow this op in DRIVER_OK only for devices that support
> changing the state during DRIVER_OK if the device is suspended.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Suggested-by: Eugenio Pérez <eperezma@redhat.com>

Acked-by: Eugenio Pérez <eperezma@redhat.com>

> ---
>  drivers/vhost/vdpa.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 6bfa3391935a..77509440c723 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -641,6 +641,9 @@ static bool vhost_vdpa_vq_config_allowed(struct vhost_vdpa *v, unsigned int cmd)
>         case VHOST_SET_VRING_ADDR:
>                 feature = VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND;
>                 break;
> +       case VHOST_SET_VRING_BASE:
> +               feature = VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND;
> +               break;
>         default:
>                 return false;
>         }
> @@ -737,6 +740,9 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
>                 break;
>
>         case VHOST_SET_VRING_BASE:
> +               if (!vhost_vdpa_vq_config_allowed(v, cmd))
> +                       return -EOPNOTSUPP;
> +
>                 if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
>                         vq_state.packed.last_avail_idx = vq->last_avail_idx & 0x7fff;
>                         vq_state.packed.last_avail_counter = !!(vq->last_avail_idx & 0x8000);
> --
> 2.43.0
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
  2023-12-20 13:32       ` Eugenio Perez Martin
@ 2023-12-21  2:03         ` Jason Wang
  2023-12-21  7:46           ` Eugenio Perez Martin
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2023-12-21  2:03 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Dragos Tatulea, Michael S . Tsirkin, Si-Wei Liu, Saeed Mahameed,
	Leon Romanovsky, virtualization, Gal Pressman, kvm, linux-kernel,
	Parav Pandit, Xuan Zhuo

On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > >
> > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > advertises this support as a backend features.
> > >
> > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > >
> > > So I wonder if it's better to just allow such behaviour?
> >
> > Actually I mean, allow drivers to modify the parameters during suspend
> > without a new feature.
> >
>
> That would be ideal, but how do userland checks if it can suspend +
> change properties + resume?

As discussed, it looks to me the only device that supports suspend is
simulator and it supports change properties.

E.g:

static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
                                  u64 desc_area, u64 driver_area,
                                  u64 device_area)
{
        struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
        struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];

        vq->desc_addr = desc_area;
        vq->driver_addr = driver_area;
        vq->device_addr = device_area;

        return 0;
}

>
> The only way that comes to my mind is to make sure all parents return
> error if userland tries to do it, and then fallback in userland.

Yes.

> I'm
> ok with that, but I'm not sure if the current master & previous kernel
> has a coherent behavior. Do they return error? Or return success
> without changing address / vq state?

We probably don't need to worry too much here, as e.g set_vq_address
could fail even without suspend (just at uAPI level).

Thanks

>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
  2023-12-21  2:03         ` Jason Wang
@ 2023-12-21  7:46           ` Eugenio Perez Martin
  2023-12-21 11:52             ` Dragos Tatulea
  2023-12-22  2:50             ` Jason Wang
  0 siblings, 2 replies; 50+ messages in thread
From: Eugenio Perez Martin @ 2023-12-21  7:46 UTC (permalink / raw)
  To: Jason Wang
  Cc: Dragos Tatulea, Michael S . Tsirkin, Si-Wei Liu, Saeed Mahameed,
	Leon Romanovsky, virtualization, Gal Pressman, kvm, linux-kernel,
	Parav Pandit, Xuan Zhuo

On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > >
> > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > advertises this support as a backend features.
> > > >
> > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > >
> > > > So I wonder if it's better to just allow such behaviour?
> > >
> > > Actually I mean, allow drivers to modify the parameters during suspend
> > > without a new feature.
> > >
> >
> > That would be ideal, but how do userland checks if it can suspend +
> > change properties + resume?
>
> As discussed, it looks to me the only device that supports suspend is
> simulator and it supports change properties.
>
> E.g:
>
> static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
>                                   u64 desc_area, u64 driver_area,
>                                   u64 device_area)
> {
>         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
>         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
>
>         vq->desc_addr = desc_area;
>         vq->driver_addr = driver_area;
>         vq->device_addr = device_area;
>
>         return 0;
> }
>

So in the current kernel master it is valid to set a different vq
address while the device is suspended in vdpa_sim. But it is not valid
in mlx5, as the FW will not be updated in resume (Dragos, please
correct me if I'm wrong). Both of them return success.

How can we know in the destination QEMU if it is valid to suspend &
set address? Should we handle this as a bugfix and backport the
change?

> >
> > The only way that comes to my mind is to make sure all parents return
> > error if userland tries to do it, and then fallback in userland.
>
> Yes.
>
> > I'm
> > ok with that, but I'm not sure if the current master & previous kernel
> > has a coherent behavior. Do they return error? Or return success
> > without changing address / vq state?
>
> We probably don't need to worry too much here, as e.g set_vq_address
> could fail even without suspend (just at uAPI level).
>

I don't get this, sorry. I rephrased my point with an example earlier
in the mail.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
  2023-12-21  7:46           ` Eugenio Perez Martin
@ 2023-12-21 11:52             ` Dragos Tatulea
  2023-12-21 12:08               ` Eugenio Perez Martin
  2023-12-22  2:50             ` Jason Wang
  1 sibling, 1 reply; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-21 11:52 UTC (permalink / raw)
  To: jasowang, eperezma
  Cc: xuanzhuo, Parav Pandit, virtualization, Gal Pressman,
	linux-kernel, si-wei.liu, kvm, mst, Saeed Mahameed, leon

On Thu, 2023-12-21 at 08:46 +0100, Eugenio Perez Martin wrote:
> On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <jasowang@redhat.com> wrote:
> > 
> > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > > 
> > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > 
> > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > 
> > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > > > 
> > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > advertises this support as a backend features.
> > > > > 
> > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > > 
> > > > > So I wonder if it's better to just allow such behaviour?
> > > > 
> > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > without a new feature.
> > > > 
> > > 
> > > That would be ideal, but how do userland checks if it can suspend +
> > > change properties + resume?
> > 
> > As discussed, it looks to me the only device that supports suspend is
> > simulator and it supports change properties.
> > 
> > E.g:
> > 
> > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> >                                   u64 desc_area, u64 driver_area,
> >                                   u64 device_area)
> > {
> >         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> >         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> > 
> >         vq->desc_addr = desc_area;
> >         vq->driver_addr = driver_area;
> >         vq->device_addr = device_area;
> > 
> >         return 0;
> > }
> > 
> 
> So in the current kernel master it is valid to set a different vq
> address while the device is suspended in vdpa_sim. But it is not valid
> in mlx5, as the FW will not be updated in resume (Dragos, please
> correct me if I'm wrong). Both of them return success.
> 
In the current state, there is no resume. HW Virtqueues will just get re-created
with the new address. 

> How can we know in the destination QEMU if it is valid to suspend &
> set address? Should we handle this as a bugfix and backport the
> change?
> 
> > > 
> > > The only way that comes to my mind is to make sure all parents return
> > > error if userland tries to do it, and then fallback in userland.
> > 
> > Yes.
> > 
> > > I'm
> > > ok with that, but I'm not sure if the current master & previous kernel
> > > has a coherent behavior. Do they return error? Or return success
> > > without changing address / vq state?
> > 
> > We probably don't need to worry too much here, as e.g set_vq_address
> > could fail even without suspend (just at uAPI level).
> > 
> 
> I don't get this, sorry. I rephrased my point with an example earlier
> in the mail.
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
  2023-12-21 11:52             ` Dragos Tatulea
@ 2023-12-21 12:08               ` Eugenio Perez Martin
  2023-12-21 14:38                 ` Dragos Tatulea
  0 siblings, 1 reply; 50+ messages in thread
From: Eugenio Perez Martin @ 2023-12-21 12:08 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: jasowang, xuanzhuo, Parav Pandit, virtualization, Gal Pressman,
	linux-kernel, si-wei.liu, kvm, mst, Saeed Mahameed, leon

On Thu, Dec 21, 2023 at 12:52 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> On Thu, 2023-12-21 at 08:46 +0100, Eugenio Perez Martin wrote:
> > On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > > <eperezma@redhat.com> wrote:
> > > >
> > > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > > > >
> > > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > > advertises this support as a backend features.
> > > > > >
> > > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > > >
> > > > > > So I wonder if it's better to just allow such behaviour?
> > > > >
> > > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > > without a new feature.
> > > > >
> > > >
> > > > That would be ideal, but how do userland checks if it can suspend +
> > > > change properties + resume?
> > >
> > > As discussed, it looks to me the only device that supports suspend is
> > > simulator and it supports change properties.
> > >
> > > E.g:
> > >
> > > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> > >                                   u64 desc_area, u64 driver_area,
> > >                                   u64 device_area)
> > > {
> > >         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> > >         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> > >
> > >         vq->desc_addr = desc_area;
> > >         vq->driver_addr = driver_area;
> > >         vq->device_addr = device_area;
> > >
> > >         return 0;
> > > }
> > >
> >
> > So in the current kernel master it is valid to set a different vq
> > address while the device is suspended in vdpa_sim. But it is not valid
> > in mlx5, as the FW will not be updated in resume (Dragos, please
> > correct me if I'm wrong). Both of them return success.
> >
> In the current state, there is no resume. HW Virtqueues will just get re-created
> with the new address.
>

Oh, then all of this is effectively transparent to the userspace
except for the time it takes?

In that case you're right, we don't need feature flags. But I think it
would be great to also move the error return in case userspace tries
to modify vq parameters out of suspend state.

Thanks!


> > How can we know in the destination QEMU if it is valid to suspend &
> > set address? Should we handle this as a bugfix and backport the
> > change?
> >
> > > >
> > > > The only way that comes to my mind is to make sure all parents return
> > > > error if userland tries to do it, and then fallback in userland.
> > >
> > > Yes.
> > >
> > > > I'm
> > > > ok with that, but I'm not sure if the current master & previous kernel
> > > > has a coherent behavior. Do they return error? Or return success
> > > > without changing address / vq state?
> > >
> > > We probably don't need to worry too much here, as e.g set_vq_address
> > > could fail even without suspend (just at uAPI level).
> > >
> >
> > I don't get this, sorry. I rephrased my point with an example earlier
> > in the mail.
> >
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
  2023-12-21 12:08               ` Eugenio Perez Martin
@ 2023-12-21 14:38                 ` Dragos Tatulea
  2023-12-21 14:55                   ` Eugenio Perez Martin
  0 siblings, 1 reply; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-21 14:38 UTC (permalink / raw)
  To: eperezma
  Cc: xuanzhuo, Parav Pandit, Gal Pressman, virtualization,
	linux-kernel, si-wei.liu, kvm, jasowang, Saeed Mahameed, mst,
	leon

On Thu, 2023-12-21 at 13:08 +0100, Eugenio Perez Martin wrote:
> On Thu, Dec 21, 2023 at 12:52 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > 
> > On Thu, 2023-12-21 at 08:46 +0100, Eugenio Perez Martin wrote:
> > > On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > 
> > > > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > > > <eperezma@redhat.com> wrote:
> > > > > 
> > > > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > 
> > > > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > 
> > > > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > > > > > 
> > > > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > > > advertises this support as a backend features.
> > > > > > > 
> > > > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > > > > 
> > > > > > > So I wonder if it's better to just allow such behaviour?
> > > > > > 
> > > > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > > > without a new feature.
> > > > > > 
> > > > > 
> > > > > That would be ideal, but how do userland checks if it can suspend +
> > > > > change properties + resume?
> > > > 
> > > > As discussed, it looks to me the only device that supports suspend is
> > > > simulator and it supports change properties.
> > > > 
> > > > E.g:
> > > > 
> > > > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> > > >                                   u64 desc_area, u64 driver_area,
> > > >                                   u64 device_area)
> > > > {
> > > >         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> > > >         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> > > > 
> > > >         vq->desc_addr = desc_area;
> > > >         vq->driver_addr = driver_area;
> > > >         vq->device_addr = device_area;
> > > > 
> > > >         return 0;
> > > > }
> > > > 
> > > 
> > > So in the current kernel master it is valid to set a different vq
> > > address while the device is suspended in vdpa_sim. But it is not valid
> > > in mlx5, as the FW will not be updated in resume (Dragos, please
> > > correct me if I'm wrong). Both of them return success.
> > > 
> > In the current state, there is no resume. HW Virtqueues will just get re-created
> > with the new address.
> > 
> 
> Oh, then all of this is effectively transparent to the userspace
> except for the time it takes?
> 
Not quite: mlx5_vdpa_set_vq_address will save the vq address only on the SW vq
representation. Only later will it will call into the FW to update the FW. Later
means:
- On DRIVER_OK state, when the VQs get created.
- On .set_map when the VQs get re-created (before this series) / updated (after
this series)
- On .resume (after this series).

So if the .set_vq_address is called when the VQ is in DRIVER_OK but not
suspended those addresses will be set later for later.

> In that case you're right, we don't need feature flags. But I think it
> would be great to also move the error return in case userspace tries
> to modify vq parameters out of suspend state.
> 
On the driver side or on the core side?

Thanks
> Thanks!
> 
> 
> > > How can we know in the destination QEMU if it is valid to suspend &
> > > set address? Should we handle this as a bugfix and backport the
> > > change?
> > > 
> > > > > 
> > > > > The only way that comes to my mind is to make sure all parents return
> > > > > error if userland tries to do it, and then fallback in userland.
> > > > 
> > > > Yes.
> > > > 
> > > > > I'm
> > > > > ok with that, but I'm not sure if the current master & previous kernel
> > > > > has a coherent behavior. Do they return error? Or return success
> > > > > without changing address / vq state?
> > > > 
> > > > We probably don't need to worry too much here, as e.g set_vq_address
> > > > could fail even without suspend (just at uAPI level).
> > > > 
> > > 
> > > I don't get this, sorry. I rephrased my point with an example earlier
> > > in the mail.
> > > 
> > 
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
  2023-12-21 14:38                 ` Dragos Tatulea
@ 2023-12-21 14:55                   ` Eugenio Perez Martin
  2023-12-21 15:07                     ` Dragos Tatulea
  0 siblings, 1 reply; 50+ messages in thread
From: Eugenio Perez Martin @ 2023-12-21 14:55 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: xuanzhuo, Parav Pandit, Gal Pressman, virtualization,
	linux-kernel, si-wei.liu, kvm, jasowang, Saeed Mahameed, mst,
	leon

On Thu, Dec 21, 2023 at 3:38 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> On Thu, 2023-12-21 at 13:08 +0100, Eugenio Perez Martin wrote:
> > On Thu, Dec 21, 2023 at 12:52 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > >
> > > On Thu, 2023-12-21 at 08:46 +0100, Eugenio Perez Martin wrote:
> > > > On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > > > > <eperezma@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > >
> > > > > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > > > > > >
> > > > > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > > > > advertises this support as a backend features.
> > > > > > > >
> > > > > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > > > > >
> > > > > > > > So I wonder if it's better to just allow such behaviour?
> > > > > > >
> > > > > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > > > > without a new feature.
> > > > > > >
> > > > > >
> > > > > > That would be ideal, but how do userland checks if it can suspend +
> > > > > > change properties + resume?
> > > > >
> > > > > As discussed, it looks to me the only device that supports suspend is
> > > > > simulator and it supports change properties.
> > > > >
> > > > > E.g:
> > > > >
> > > > > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> > > > >                                   u64 desc_area, u64 driver_area,
> > > > >                                   u64 device_area)
> > > > > {
> > > > >         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> > > > >         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> > > > >
> > > > >         vq->desc_addr = desc_area;
> > > > >         vq->driver_addr = driver_area;
> > > > >         vq->device_addr = device_area;
> > > > >
> > > > >         return 0;
> > > > > }
> > > > >
> > > >
> > > > So in the current kernel master it is valid to set a different vq
> > > > address while the device is suspended in vdpa_sim. But it is not valid
> > > > in mlx5, as the FW will not be updated in resume (Dragos, please
> > > > correct me if I'm wrong). Both of them return success.
> > > >
> > > In the current state, there is no resume. HW Virtqueues will just get re-created
> > > with the new address.
> > >
> >
> > Oh, then all of this is effectively transparent to the userspace
> > except for the time it takes?
> >
> Not quite: mlx5_vdpa_set_vq_address will save the vq address only on the SW vq
> representation. Only later will it will call into the FW to update the FW. Later
> means:
> - On DRIVER_OK state, when the VQs get created.
> - On .set_map when the VQs get re-created (before this series) / updated (after
> this series)
> - On .resume (after this series).
>
> So if the .set_vq_address is called when the VQ is in DRIVER_OK but not
> suspended those addresses will be set later for later.
>

Ouch, that is more in the line of my thoughts :(.

> > In that case you're right, we don't need feature flags. But I think it
> > would be great to also move the error return in case userspace tries
> > to modify vq parameters out of suspend state.
> >
> On the driver side or on the core side?
>

Core side.

It does not have to be part of this series, I meant it can be proposed
in a separate series and applied before the parent driver one.

> Thanks
> > Thanks!
> >
> >
> > > > How can we know in the destination QEMU if it is valid to suspend &
> > > > set address? Should we handle this as a bugfix and backport the
> > > > change?
> > > >
> > > > > >
> > > > > > The only way that comes to my mind is to make sure all parents return
> > > > > > error if userland tries to do it, and then fallback in userland.
> > > > >
> > > > > Yes.
> > > > >
> > > > > > I'm
> > > > > > ok with that, but I'm not sure if the current master & previous kernel
> > > > > > has a coherent behavior. Do they return error? Or return success
> > > > > > without changing address / vq state?
> > > > >
> > > > > We probably don't need to worry too much here, as e.g set_vq_address
> > > > > could fail even without suspend (just at uAPI level).
> > > > >
> > > >
> > > > I don't get this, sorry. I rephrased my point with an example earlier
> > > > in the mail.
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
  2023-12-21 14:55                   ` Eugenio Perez Martin
@ 2023-12-21 15:07                     ` Dragos Tatulea
  2023-12-22  7:30                       ` Eugenio Perez Martin
  2023-12-22  8:29                       ` Michael S. Tsirkin
  0 siblings, 2 replies; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-21 15:07 UTC (permalink / raw)
  To: eperezma
  Cc: xuanzhuo, Parav Pandit, Gal Pressman, virtualization,
	linux-kernel, si-wei.liu, jasowang, kvm, Saeed Mahameed, mst,
	leon

On Thu, 2023-12-21 at 15:55 +0100, Eugenio Perez Martin wrote:
> On Thu, Dec 21, 2023 at 3:38 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > 
> > On Thu, 2023-12-21 at 13:08 +0100, Eugenio Perez Martin wrote:
> > > On Thu, Dec 21, 2023 at 12:52 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > 
> > > > On Thu, 2023-12-21 at 08:46 +0100, Eugenio Perez Martin wrote:
> > > > > On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > 
> > > > > > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > > > > > <eperezma@redhat.com> wrote:
> > > > > > > 
> > > > > > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > 
> > > > > > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > 
> > > > > > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > > > > > > > 
> > > > > > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > > > > > advertises this support as a backend features.
> > > > > > > > > 
> > > > > > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > > > > > > 
> > > > > > > > > So I wonder if it's better to just allow such behaviour?
> > > > > > > > 
> > > > > > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > > > > > without a new feature.
> > > > > > > > 
> > > > > > > 
> > > > > > > That would be ideal, but how do userland checks if it can suspend +
> > > > > > > change properties + resume?
> > > > > > 
> > > > > > As discussed, it looks to me the only device that supports suspend is
> > > > > > simulator and it supports change properties.
> > > > > > 
> > > > > > E.g:
> > > > > > 
> > > > > > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> > > > > >                                   u64 desc_area, u64 driver_area,
> > > > > >                                   u64 device_area)
> > > > > > {
> > > > > >         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> > > > > >         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> > > > > > 
> > > > > >         vq->desc_addr = desc_area;
> > > > > >         vq->driver_addr = driver_area;
> > > > > >         vq->device_addr = device_area;
> > > > > > 
> > > > > >         return 0;
> > > > > > }
> > > > > > 
> > > > > 
> > > > > So in the current kernel master it is valid to set a different vq
> > > > > address while the device is suspended in vdpa_sim. But it is not valid
> > > > > in mlx5, as the FW will not be updated in resume (Dragos, please
> > > > > correct me if I'm wrong). Both of them return success.
> > > > > 
> > > > In the current state, there is no resume. HW Virtqueues will just get re-created
> > > > with the new address.
> > > > 
> > > 
> > > Oh, then all of this is effectively transparent to the userspace
> > > except for the time it takes?
> > > 
> > Not quite: mlx5_vdpa_set_vq_address will save the vq address only on the SW vq
> > representation. Only later will it will call into the FW to update the FW. Later
> > means:
> > - On DRIVER_OK state, when the VQs get created.
> > - On .set_map when the VQs get re-created (before this series) / updated (after
> > this series)
> > - On .resume (after this series).
> > 
> > So if the .set_vq_address is called when the VQ is in DRIVER_OK but not
> > suspended those addresses will be set later for later.
> > 
> 
> Ouch, that is more in the line of my thoughts :(.
> 
> > > In that case you're right, we don't need feature flags. But I think it
> > > would be great to also move the error return in case userspace tries
> > > to modify vq parameters out of suspend state.
> > > 
> > On the driver side or on the core side?
> > 
> 
> Core side.
> 
Checking my understanding: instead of the feature flags there would be a check
(for .set_vq_addr and .set_vq_state) to return an error if they are called under
DRIVER_OK and not suspended state?

> It does not have to be part of this series, I meant it can be proposed
> in a separate series and applied before the parent driver one.
> 
> > Thanks
> > > Thanks!
> > > 
> > > 
> > > > > How can we know in the destination QEMU if it is valid to suspend &
> > > > > set address? Should we handle this as a bugfix and backport the
> > > > > change?
> > > > > 
> > > > > > > 
> > > > > > > The only way that comes to my mind is to make sure all parents return
> > > > > > > error if userland tries to do it, and then fallback in userland.
> > > > > > 
> > > > > > Yes.
> > > > > > 
> > > > > > > I'm
> > > > > > > ok with that, but I'm not sure if the current master & previous kernel
> > > > > > > has a coherent behavior. Do they return error? Or return success
> > > > > > > without changing address / vq state?
> > > > > > 
> > > > > > We probably don't need to worry too much here, as e.g set_vq_address
> > > > > > could fail even without suspend (just at uAPI level).
> > > > > > 
> > > > > 
> > > > > I don't get this, sorry. I rephrased my point with an example earlier
> > > > > in the mail.
> > > > > 
> > > > 
> > > 
> > 
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
  2023-12-21  7:46           ` Eugenio Perez Martin
  2023-12-21 11:52             ` Dragos Tatulea
@ 2023-12-22  2:50             ` Jason Wang
  1 sibling, 0 replies; 50+ messages in thread
From: Jason Wang @ 2023-12-22  2:50 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Dragos Tatulea, Michael S . Tsirkin, Si-Wei Liu, Saeed Mahameed,
	Leon Romanovsky, virtualization, Gal Pressman, kvm, linux-kernel,
	Parav Pandit, Xuan Zhuo

On Thu, Dec 21, 2023 at 3:47 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > > >
> > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > advertises this support as a backend features.
> > > > >
> > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > >
> > > > > So I wonder if it's better to just allow such behaviour?
> > > >
> > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > without a new feature.
> > > >
> > >
> > > That would be ideal, but how do userland checks if it can suspend +
> > > change properties + resume?
> >
> > As discussed, it looks to me the only device that supports suspend is
> > simulator and it supports change properties.
> >
> > E.g:
> >
> > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> >                                   u64 desc_area, u64 driver_area,
> >                                   u64 device_area)
> > {
> >         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> >         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> >
> >         vq->desc_addr = desc_area;
> >         vq->driver_addr = driver_area;
> >         vq->device_addr = device_area;
> >
> >         return 0;
> > }
> >
>
> So in the current kernel master it is valid to set a different vq
> address while the device is suspended in vdpa_sim. But it is not valid
> in mlx5, as the FW will not be updated in resume (Dragos, please
> correct me if I'm wrong). Both of them return success.
>
> How can we know in the destination QEMU if it is valid to suspend &
> set address? Should we handle this as a bugfix and backport the
> change?

Good point.

We probably need to do backport, this seems to be the easiest way.
Theoretically, userspace may assume this behavior (though I don't
think there would be a user that depends on the simulator).

>
> > >
> > > The only way that comes to my mind is to make sure all parents return
> > > error if userland tries to do it, and then fallback in userland.
> >
> > Yes.
> >
> > > I'm
> > > ok with that, but I'm not sure if the current master & previous kernel
> > > has a coherent behavior. Do they return error? Or return success
> > > without changing address / vq state?
> >
> > We probably don't need to worry too much here, as e.g set_vq_address
> > could fail even without suspend (just at uAPI level).
> >
>
> I don't get this, sorry. I rephrased my point with an example earlier
> in the mail.

I mean currently, VHOST_SET_VRING_ADDR can fail. So userspace should
not assume it will always succeed.

Thanks

>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
  2023-12-21 15:07                     ` Dragos Tatulea
@ 2023-12-22  7:30                       ` Eugenio Perez Martin
  2023-12-22  8:29                       ` Michael S. Tsirkin
  1 sibling, 0 replies; 50+ messages in thread
From: Eugenio Perez Martin @ 2023-12-22  7:30 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: xuanzhuo, Parav Pandit, Gal Pressman, virtualization,
	linux-kernel, si-wei.liu, jasowang, kvm, Saeed Mahameed, mst,
	leon

On Thu, Dec 21, 2023 at 4:07 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> On Thu, 2023-12-21 at 15:55 +0100, Eugenio Perez Martin wrote:
> > On Thu, Dec 21, 2023 at 3:38 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > >
> > > On Thu, 2023-12-21 at 13:08 +0100, Eugenio Perez Martin wrote:
> > > > On Thu, Dec 21, 2023 at 12:52 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > >
> > > > > On Thu, 2023-12-21 at 08:46 +0100, Eugenio Perez Martin wrote:
> > > > > > On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > >
> > > > > > > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > > > > > > <eperezma@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > > > > > > advertises this support as a backend features.
> > > > > > > > > >
> > > > > > > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > > > > > > >
> > > > > > > > > > So I wonder if it's better to just allow such behaviour?
> > > > > > > > >
> > > > > > > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > > > > > > without a new feature.
> > > > > > > > >
> > > > > > > >
> > > > > > > > That would be ideal, but how do userland checks if it can suspend +
> > > > > > > > change properties + resume?
> > > > > > >
> > > > > > > As discussed, it looks to me the only device that supports suspend is
> > > > > > > simulator and it supports change properties.
> > > > > > >
> > > > > > > E.g:
> > > > > > >
> > > > > > > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> > > > > > >                                   u64 desc_area, u64 driver_area,
> > > > > > >                                   u64 device_area)
> > > > > > > {
> > > > > > >         struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> > > > > > >         struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> > > > > > >
> > > > > > >         vq->desc_addr = desc_area;
> > > > > > >         vq->driver_addr = driver_area;
> > > > > > >         vq->device_addr = device_area;
> > > > > > >
> > > > > > >         return 0;
> > > > > > > }
> > > > > > >
> > > > > >
> > > > > > So in the current kernel master it is valid to set a different vq
> > > > > > address while the device is suspended in vdpa_sim. But it is not valid
> > > > > > in mlx5, as the FW will not be updated in resume (Dragos, please
> > > > > > correct me if I'm wrong). Both of them return success.
> > > > > >
> > > > > In the current state, there is no resume. HW Virtqueues will just get re-created
> > > > > with the new address.
> > > > >
> > > >
> > > > Oh, then all of this is effectively transparent to the userspace
> > > > except for the time it takes?
> > > >
> > > Not quite: mlx5_vdpa_set_vq_address will save the vq address only on the SW vq
> > > representation. Only later will it will call into the FW to update the FW. Later
> > > means:
> > > - On DRIVER_OK state, when the VQs get created.
> > > - On .set_map when the VQs get re-created (before this series) / updated (after
> > > this series)
> > > - On .resume (after this series).
> > >
> > > So if the .set_vq_address is called when the VQ is in DRIVER_OK but not
> > > suspended those addresses will be set later for later.
> > >
> >
> > Ouch, that is more in the line of my thoughts :(.
> >
> > > > In that case you're right, we don't need feature flags. But I think it
> > > > would be great to also move the error return in case userspace tries
> > > > to modify vq parameters out of suspend state.
> > > >
> > > On the driver side or on the core side?
> > >
> >
> > Core side.
> >
> Checking my understanding: instead of the feature flags there would be a check
> (for .set_vq_addr and .set_vq_state) to return an error if they are called under
> DRIVER_OK and not suspended state?
>

Yes, correct. Per Jason's message, it should be enough with two
independent series:
* Patches 6, 7 and 8 of this series, just checking for suspend state
and not feature flags.
* Your v2.

Thanks!

> > It does not have to be part of this series, I meant it can be proposed
> > in a separate series and applied before the parent driver one.
> >
> > > Thanks
> > > > Thanks!
> > > >
> > > >
> > > > > > How can we know in the destination QEMU if it is valid to suspend &
> > > > > > set address? Should we handle this as a bugfix and backport the
> > > > > > change?
> > > > > >
> > > > > > > >
> > > > > > > > The only way that comes to my mind is to make sure all parents return
> > > > > > > > error if userland tries to do it, and then fallback in userland.
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > > I'm
> > > > > > > > ok with that, but I'm not sure if the current master & previous kernel
> > > > > > > > has a coherent behavior. Do they return error? Or return success
> > > > > > > > without changing address / vq state?
> > > > > > >
> > > > > > > We probably don't need to worry too much here, as e.g set_vq_address
> > > > > > > could fail even without suspend (just at uAPI level).
> > > > > > >
> > > > > >
> > > > > > I don't get this, sorry. I rephrased my point with an example earlier
> > > > > > in the mail.
> > > > > >
> > > > >
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
  2023-12-21 15:07                     ` Dragos Tatulea
  2023-12-22  7:30                       ` Eugenio Perez Martin
@ 2023-12-22  8:29                       ` Michael S. Tsirkin
  2023-12-22 10:51                         ` Dragos Tatulea
  1 sibling, 1 reply; 50+ messages in thread
From: Michael S. Tsirkin @ 2023-12-22  8:29 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: eperezma, xuanzhuo, Parav Pandit, Gal Pressman, virtualization,
	linux-kernel, si-wei.liu, jasowang, kvm, Saeed Mahameed, leon

On Thu, Dec 21, 2023 at 03:07:22PM +0000, Dragos Tatulea wrote:
> > > > In that case you're right, we don't need feature flags. But I think it
> > > > would be great to also move the error return in case userspace tries
> > > > to modify vq parameters out of suspend state.
> > > > 
> > > On the driver side or on the core side?
> > > 
> > 
> > Core side.
> > 
> Checking my understanding: instead of the feature flags there would be a check
> (for .set_vq_addr and .set_vq_state) to return an error if they are called under
> DRIVER_OK and not suspended state?

Yea this looks much saner, if we start adding feature flags for
each OPERATION_X_LEGAL_IN_STATE_Y then we will end up with N^2
feature bits which is not reasonable.

-- 
MST


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
  2023-12-22  8:29                       ` Michael S. Tsirkin
@ 2023-12-22 10:51                         ` Dragos Tatulea
  2023-12-25 13:45                           ` Dragos Tatulea
  0 siblings, 1 reply; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-22 10:51 UTC (permalink / raw)
  To: mst
  Cc: xuanzhuo, Parav Pandit, Gal Pressman, virtualization, eperezma,
	linux-kernel, si-wei.liu, jasowang, kvm, Saeed Mahameed, leon

On Fri, 2023-12-22 at 03:29 -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 21, 2023 at 03:07:22PM +0000, Dragos Tatulea wrote:
> > > > > In that case you're right, we don't need feature flags. But I think it
> > > > > would be great to also move the error return in case userspace tries
> > > > > to modify vq parameters out of suspend state.
> > > > > 
> > > > On the driver side or on the core side?
> > > > 
> > > 
> > > Core side.
> > > 
> > Checking my understanding: instead of the feature flags there would be a check
> > (for .set_vq_addr and .set_vq_state) to return an error if they are called under
> > DRIVER_OK and not suspended state?
> 
> Yea this looks much saner, if we start adding feature flags for
> each OPERATION_X_LEGAL_IN_STATE_Y then we will end up with N^2
> feature bits which is not reasonable.
> 
Ack. Is the v2 enough or should I respin a v5 with the updated Acked-by tags?

I will prepare the core part as a different series without the flags.

Thanks,
Dragos

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 06/15] vdpa: Track device suspended state
  2023-12-20 12:55     ` Dragos Tatulea
@ 2023-12-22 11:22       ` Dragos Tatulea
  2023-12-25  4:11         ` Jason Wang
  0 siblings, 1 reply; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-22 11:22 UTC (permalink / raw)
  To: jasowang
  Cc: xuanzhuo, Parav Pandit, Gal Pressman, eperezma, virtualization,
	linux-kernel, si-wei.liu, kvm, mst, Saeed Mahameed, leon

On Wed, 2023-12-20 at 13:55 +0100, Dragos Tatulea wrote:
> On Wed, 2023-12-20 at 11:46 +0800, Jason Wang wrote:
> > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > 
> > > Set vdpa device suspended state on successful suspend. Clear it on
> > > successful resume and reset.
> > > 
> > > The state will be locked by the vhost_vdpa mutex. The mutex is taken
> > > during suspend, resume and reset in vhost_vdpa_unlocked_ioctl. The
> > > exception is vhost_vdpa_open which does a device reset but that should
> > > be safe because it can only happen before the other ops.
> > > 
> > > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > > Suggested-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > >  drivers/vhost/vdpa.c | 17 +++++++++++++++--
> > >  1 file changed, 15 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > index b4e8ddf86485..00b4fa8e89f2 100644
> > > --- a/drivers/vhost/vdpa.c
> > > +++ b/drivers/vhost/vdpa.c
> > > @@ -59,6 +59,7 @@ struct vhost_vdpa {
> > >         int in_batch;
> > >         struct vdpa_iova_range range;
> > >         u32 batch_asid;
> > > +       bool suspended;
> > 
> > Any reason why we don't do it in the core vDPA device but here?
> > 
> Not really. I wanted to be safe and not expose it in a header due to locking.
> 
A few clearer answers for why the state is not added in struct vdpa_device:
- All the suspend infrastructure is currently only for vhost.
- If the state would be moved to struct vdpa_device then the cf_lock would have
to be used. This adds more complexity to the code.

Thanks,
Dragos

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 06/15] vdpa: Track device suspended state
  2023-12-22 11:22       ` Dragos Tatulea
@ 2023-12-25  4:11         ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2023-12-25  4:11 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: xuanzhuo, Parav Pandit, Gal Pressman, eperezma, virtualization,
	linux-kernel, si-wei.liu, kvm, mst, Saeed Mahameed, leon

On Fri, Dec 22, 2023 at 7:22 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> On Wed, 2023-12-20 at 13:55 +0100, Dragos Tatulea wrote:
> > On Wed, 2023-12-20 at 11:46 +0800, Jason Wang wrote:
> > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > >
> > > > Set vdpa device suspended state on successful suspend. Clear it on
> > > > successful resume and reset.
> > > >
> > > > The state will be locked by the vhost_vdpa mutex. The mutex is taken
> > > > during suspend, resume and reset in vhost_vdpa_unlocked_ioctl. The
> > > > exception is vhost_vdpa_open which does a device reset but that should
> > > > be safe because it can only happen before the other ops.
> > > >
> > > > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > > > Suggested-by: Eugenio Pérez <eperezma@redhat.com>
> > > > ---
> > > >  drivers/vhost/vdpa.c | 17 +++++++++++++++--
> > > >  1 file changed, 15 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > > index b4e8ddf86485..00b4fa8e89f2 100644
> > > > --- a/drivers/vhost/vdpa.c
> > > > +++ b/drivers/vhost/vdpa.c
> > > > @@ -59,6 +59,7 @@ struct vhost_vdpa {
> > > >         int in_batch;
> > > >         struct vdpa_iova_range range;
> > > >         u32 batch_asid;
> > > > +       bool suspended;
> > >
> > > Any reason why we don't do it in the core vDPA device but here?
> > >
> > Not really. I wanted to be safe and not expose it in a header due to locking.
> >
> A few clearer answers for why the state is not added in struct vdpa_device:
> - All the suspend infrastructure is currently only for vhost.
> - If the state would be moved to struct vdpa_device then the cf_lock would have
> to be used. This adds more complexity to the code.
>
> Thanks,
> Dragos

Ok, I'm fine with that.

Thanks


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
  2023-12-22 10:51                         ` Dragos Tatulea
@ 2023-12-25 13:45                           ` Dragos Tatulea
  0 siblings, 0 replies; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-25 13:45 UTC (permalink / raw)
  To: mst
  Cc: xuanzhuo, Parav Pandit, Gal Pressman, virtualization, eperezma,
	linux-kernel, si-wei.liu, jasowang, kvm, Saeed Mahameed, leon

On Fri, 2023-12-22 at 11:51 +0100, Dragos Tatulea wrote:
> On Fri, 2023-12-22 at 03:29 -0500, Michael S. Tsirkin wrote:
> > On Thu, Dec 21, 2023 at 03:07:22PM +0000, Dragos Tatulea wrote:
> > > > > > In that case you're right, we don't need feature flags. But I think it
> > > > > > would be great to also move the error return in case userspace tries
> > > > > > to modify vq parameters out of suspend state.
> > > > > > 
> > > > > On the driver side or on the core side?
> > > > > 
> > > > 
> > > > Core side.
> > > > 
> > > Checking my understanding: instead of the feature flags there would be a check
> > > (for .set_vq_addr and .set_vq_state) to return an error if they are called under
> > > DRIVER_OK and not suspended state?
> > 
> > Yea this looks much saner, if we start adding feature flags for
> > each OPERATION_X_LEGAL_IN_STATE_Y then we will end up with N^2
> > feature bits which is not reasonable.
> > 
> Ack. Is the v2 enough or should I respin a v5 with the updated Acked-by tags?
> 
> I will prepare the core part as a different series without the flags.
> 
Core part sent:
https://lore.kernel.org/virtualization/20231225134210.151540-1-dtatulea@nvidia.com/T/#t

I also have a v2 respin with extra Acked-by tags if necessary as a v5. Just let
me know if it is needed.

Thanks,
Dragos

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs
  2023-12-19 18:08 [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Dragos Tatulea
                   ` (14 preceding siblings ...)
  2023-12-19 18:08 ` [PATCH vhost v4 15/15] vdpa/mlx5: Add mkey leak detection Dragos Tatulea
@ 2023-12-25 14:41 ` Michael S. Tsirkin
  2023-12-25 15:05   ` Dragos Tatulea
  15 siblings, 1 reply; 50+ messages in thread
From: Michael S. Tsirkin @ 2023-12-25 14:41 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Jason Wang, Eugenio Perez Martin, Si-Wei Liu, Saeed Mahameed,
	Leon Romanovsky, virtualization, Gal Pressman, kvm, linux-kernel,
	Parav Pandit, Xuan Zhuo

On Tue, Dec 19, 2023 at 08:08:43PM +0200, Dragos Tatulea wrote:
> Add support for resumable vqs in the mlx5_vdpa driver. This is a
> firmware feature that can be used for the following benefits:
> - Full device .suspend/.resume.
> - .set_map doesn't need to destroy and create new vqs anymore just to
>   update the map. When resumable vqs are supported it is enough to
>   suspend the vqs, set the new maps, and then resume the vqs.
> 
> The first patch exposes relevant bits for the feature in mlx5_ifc.h.
> That means it needs to be applied to the mlx5-vhost tree [0] first. Once
> applied there, the change has to be pulled from mlx5-vhost into the
> vhost tree and only then the remaining patches can be applied. Same flow
> as the vq descriptor mappings patchset [1].
> 
> The second part implements the vdpa backend feature support to allow
> vq state and address changes when the device is in DRIVER_OK state and
> suspended.
> 
> The third part adds support for seletively modifying vq parameters. This
> is needed to be able to use resumable vqs.
> 
> Then the actual support for resumable vqs is added.
> 
> The last part of the series introduces reference counting for mrs which
> is necessary to avoid freeing mkeys too early or leaking them.


I lost track. Are you going to send v5 or not?

> * Changes in v4:
> - Added vdpa backend feature support for changing vq properties in
>   DRIVER_OK when device is suspended. Added support in the driver as
>   well.
> - Dropped Acked-by for the patches that had the tag mistakenly
>   added.
> 
> * Changes in v3:
> - Faulty version. Please ignore.
> 
> * Changes in v2:
> - Added mr refcounting patches.
> - Deleted unnecessary patch: "vdpa/mlx5: Split function into locked and
>   unlocked variants"
> - Small print improvement in "Introduce per vq and device resume"
>   patch.
> - Patch 1/7 has been applied to mlx5-vhost branch.
> 
> 
> Dragos Tatulea (15):
>   vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
>   vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND flag
>   vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND backend
>     feature
>   vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND backend
>     feature
>   vdpa: Track device suspended state
>   vdpa: Block vq address change in DRIVER_OK unless device supports it
>   vdpa: Block vq state change in DRIVER_OK unless device supports it
>   vdpa/mlx5: Expose resumable vq capability
>   vdpa/mlx5: Allow modifying multiple vq fields in one modify command
>   vdpa/mlx5: Introduce per vq and device resume
>   vdpa/mlx5: Mark vq addrs for modification in hw vq
>   vdpa/mlx5: Mark vq state for modification in hw vq
>   vdpa/mlx5: Use vq suspend/resume during .set_map
>   vdpa/mlx5: Introduce reference counting to mrs
>   vdpa/mlx5: Add mkey leak detection
> 
>  drivers/vdpa/mlx5/core/mlx5_vdpa.h |  10 +-
>  drivers/vdpa/mlx5/core/mr.c        |  69 +++++++--
>  drivers/vdpa/mlx5/net/mlx5_vnet.c  | 218 ++++++++++++++++++++++++++---
>  drivers/vhost/vdpa.c               |  51 ++++++-
>  include/linux/mlx5/mlx5_ifc.h      |   3 +-
>  include/linux/mlx5/mlx5_ifc_vdpa.h |   4 +
>  include/uapi/linux/vhost_types.h   |   8 ++
>  7 files changed, 322 insertions(+), 41 deletions(-)
> 
> -- 
> 2.43.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs
  2023-12-25 14:41 ` [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Michael S. Tsirkin
@ 2023-12-25 15:05   ` Dragos Tatulea
  0 siblings, 0 replies; 50+ messages in thread
From: Dragos Tatulea @ 2023-12-25 15:05 UTC (permalink / raw)
  To: mst
  Cc: xuanzhuo, Parav Pandit, Gal Pressman, eperezma, virtualization,
	linux-kernel, si-wei.liu, kvm, jasowang, Saeed Mahameed, leon

On Mon, 2023-12-25 at 09:41 -0500, Michael S. Tsirkin wrote:
> On Tue, Dec 19, 2023 at 08:08:43PM +0200, Dragos Tatulea wrote:
> > Add support for resumable vqs in the mlx5_vdpa driver. This is a
> > firmware feature that can be used for the following benefits:
> > - Full device .suspend/.resume.
> > - .set_map doesn't need to destroy and create new vqs anymore just to
> >   update the map. When resumable vqs are supported it is enough to
> >   suspend the vqs, set the new maps, and then resume the vqs.
> > 
> > The first patch exposes relevant bits for the feature in mlx5_ifc.h.
> > That means it needs to be applied to the mlx5-vhost tree [0] first. Once
> > applied there, the change has to be pulled from mlx5-vhost into the
> > vhost tree and only then the remaining patches can be applied. Same flow
> > as the vq descriptor mappings patchset [1].
> > 
> > The second part implements the vdpa backend feature support to allow
> > vq state and address changes when the device is in DRIVER_OK state and
> > suspended.
> > 
> > The third part adds support for seletively modifying vq parameters. This
> > is needed to be able to use resumable vqs.
> > 
> > Then the actual support for resumable vqs is added.
> > 
> > The last part of the series introduces reference counting for mrs which
> > is necessary to avoid freeing mkeys too early or leaking them.
> 
> 
> I lost track. Are you going to send v5 or not?
> 
I was waiting for your answer if I should send it or not. I suppose this means
yes. I will send it in a few minutes.

> > * Changes in v4:
> > - Added vdpa backend feature support for changing vq properties in
> >   DRIVER_OK when device is suspended. Added support in the driver as
> >   well.
> > - Dropped Acked-by for the patches that had the tag mistakenly
> >   added.
> > 
> > * Changes in v3:
> > - Faulty version. Please ignore.
> > 
> > * Changes in v2:
> > - Added mr refcounting patches.
> > - Deleted unnecessary patch: "vdpa/mlx5: Split function into locked and
> >   unlocked variants"
> > - Small print improvement in "Introduce per vq and device resume"
> >   patch.
> > - Patch 1/7 has been applied to mlx5-vhost branch.
> > 
> > 
> > Dragos Tatulea (15):
> >   vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> >   vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND flag
> >   vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND backend
> >     feature
> >   vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND backend
> >     feature
> >   vdpa: Track device suspended state
> >   vdpa: Block vq address change in DRIVER_OK unless device supports it
> >   vdpa: Block vq state change in DRIVER_OK unless device supports it
> >   vdpa/mlx5: Expose resumable vq capability
> >   vdpa/mlx5: Allow modifying multiple vq fields in one modify command
> >   vdpa/mlx5: Introduce per vq and device resume
> >   vdpa/mlx5: Mark vq addrs for modification in hw vq
> >   vdpa/mlx5: Mark vq state for modification in hw vq
> >   vdpa/mlx5: Use vq suspend/resume during .set_map
> >   vdpa/mlx5: Introduce reference counting to mrs
> >   vdpa/mlx5: Add mkey leak detection
> > 
> >  drivers/vdpa/mlx5/core/mlx5_vdpa.h |  10 +-
> >  drivers/vdpa/mlx5/core/mr.c        |  69 +++++++--
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c  | 218 ++++++++++++++++++++++++++---
> >  drivers/vhost/vdpa.c               |  51 ++++++-
> >  include/linux/mlx5/mlx5_ifc.h      |   3 +-
> >  include/linux/mlx5/mlx5_ifc_vdpa.h |   4 +
> >  include/uapi/linux/vhost_types.h   |   8 ++
> >  7 files changed, 322 insertions(+), 41 deletions(-)
> > 
> > -- 
> > 2.43.0
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2023-12-25 15:05 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-19 18:08 [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Dragos Tatulea
2023-12-19 18:08 ` [PATCH mlx5-vhost v4 01/15] vdpa/mlx5: Expose resumable vq capability Dragos Tatulea
2023-12-20  3:46   ` Jason Wang
2023-12-19 18:08 ` [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag Dragos Tatulea
2023-12-20  3:46   ` Jason Wang
2023-12-20  4:05     ` Jason Wang
2023-12-20 12:57       ` Dragos Tatulea
2023-12-20 13:32       ` Eugenio Perez Martin
2023-12-21  2:03         ` Jason Wang
2023-12-21  7:46           ` Eugenio Perez Martin
2023-12-21 11:52             ` Dragos Tatulea
2023-12-21 12:08               ` Eugenio Perez Martin
2023-12-21 14:38                 ` Dragos Tatulea
2023-12-21 14:55                   ` Eugenio Perez Martin
2023-12-21 15:07                     ` Dragos Tatulea
2023-12-22  7:30                       ` Eugenio Perez Martin
2023-12-22  8:29                       ` Michael S. Tsirkin
2023-12-22 10:51                         ` Dragos Tatulea
2023-12-25 13:45                           ` Dragos Tatulea
2023-12-22  2:50             ` Jason Wang
2023-12-20 16:09   ` Eugenio Perez Martin
2023-12-19 18:08 ` [PATCH vhost v4 03/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND flag Dragos Tatulea
2023-12-20 16:10   ` Eugenio Perez Martin
2023-12-19 18:08 ` [PATCH vhost v4 04/15] vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND backend feature Dragos Tatulea
2023-12-20 16:11   ` Eugenio Perez Martin
2023-12-19 18:08 ` [PATCH vhost v4 05/15] vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND " Dragos Tatulea
2023-12-20 16:12   ` Eugenio Perez Martin
2023-12-19 18:08 ` [PATCH vhost v4 06/15] vdpa: Track device suspended state Dragos Tatulea
2023-12-20  3:46   ` Jason Wang
2023-12-20 12:55     ` Dragos Tatulea
2023-12-22 11:22       ` Dragos Tatulea
2023-12-25  4:11         ` Jason Wang
2023-12-19 18:08 ` [PATCH vhost v4 07/15] vdpa: Block vq address change in DRIVER_OK unless device supports it Dragos Tatulea
2023-12-20 16:31   ` Eugenio Perez Martin
2023-12-19 18:08 ` [PATCH vhost v4 08/15] vdpa: Block vq state " Dragos Tatulea
2023-12-20 16:32   ` Eugenio Perez Martin
2023-12-19 18:08 ` [PATCH vhost v4 09/15] vdpa/mlx5: Allow modifying multiple vq fields in one modify command Dragos Tatulea
2023-12-20  3:46   ` Jason Wang
2023-12-19 18:08 ` [PATCH vhost v4 10/15] vdpa/mlx5: Introduce per vq and device resume Dragos Tatulea
2023-12-20  3:47   ` Jason Wang
2023-12-19 18:08 ` [PATCH vhost v4 11/15] vdpa/mlx5: Mark vq addrs for modification in hw vq Dragos Tatulea
2023-12-19 18:08 ` [PATCH vhost v4 12/15] vdpa/mlx5: Mark vq state " Dragos Tatulea
2023-12-20  3:47   ` Jason Wang
2023-12-19 18:08 ` [PATCH vhost v4 13/15] vdpa/mlx5: Use vq suspend/resume during .set_map Dragos Tatulea
2023-12-20  3:47   ` Jason Wang
2023-12-19 18:08 ` [PATCH vhost v4 14/15] vdpa/mlx5: Introduce reference counting to mrs Dragos Tatulea
2023-12-20  3:47   ` Jason Wang
2023-12-19 18:08 ` [PATCH vhost v4 15/15] vdpa/mlx5: Add mkey leak detection Dragos Tatulea
2023-12-25 14:41 ` [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs Michael S. Tsirkin
2023-12-25 15:05   ` Dragos Tatulea

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.