* [PATCH v3 0/4] vDPA: initial config export via "vdpa dev show"
@ 2022-10-21 22:43 ` Si-Wei Liu
  0 siblings, 0 replies; 42+ messages in thread
From: Si-Wei Liu @ 2022-10-21 22:43 UTC (permalink / raw)
  To: mst, jasowang, parav; +Cc: linux-kernel, virtualization

Live migration of vdpa would typically require re-instating the vdpa
device with an identical set of configs on the destination node,
the same way the source node created the device in the first place.

In order to allow live migration orchestration software to export the
initial set of vdpa attributes with which the device was created, it
will be useful if the vdpa tool can report the config on demand with
a simple query. This eases the orchestration software implementation:
it doesn't have to keep track of vdpa config changes, or persist vdpa
attributes across failure and recovery in case it is killed by an
accidental software error.

In this series, the initial device config used at vdpa creation is
exported via the "vdpa dev show" command. This is unlike the "vdpa
dev config show" command, whose output reflects the live values in
the device config space and is therefore not reliable, being subject
to the dynamics of feature negotiation and possible changes made by
the driver to the config space.

Examples:

1) Create vDPA by default without any config attribute

$ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0
$ vdpa dev show vdpa0
vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs 9 max_vq_size 256
$ vdpa dev -jp show vdpa0
{
    "dev": {
        "vdpa0": {
            "type": "network",
            "mgmtdev": "pci/0000:41:04.2",
            "vendor_id": 5555,
            "max_vqs": 9,
            "max_vq_size": 256
        }
    }
}

2) Create vDPA with config attribute(s) specified

$ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 \
    mac e4:11:c6:d3:45:f0 max_vq_pairs 4
$ vdpa dev show
vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs 9 max_vq_size 256
  initial_config: mac e4:11:c6:d3:45:f0 max_vq_pairs 4
$ vdpa dev -jp show
{
    "dev": {
        "vdpa0": {
            "type": "network",
            "mgmtdev": "pci/0000:41:04.2",
            "vendor_id": 5555,
            "max_vqs": 9,
            "max_vq_size": 256,
            "initial_config": {
                "mac": "e4:11:c6:d3:45:f0",
                "max_vq_pairs": 4
            }
        }
    }
}
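For illustration, here is a sketch (not part of this series) of how orchestration software might consume the exported initial_config on the destination node; the JSON shape follows the example output above, while the command-line reconstruction itself is an assumption of this sketch:

```python
import json

def rebuild_create_cmd(show_json: str) -> str:
    """Reconstruct a 'vdpa dev add' command line from 'vdpa dev -jp show'
    output, using only the exported initial_config attributes."""
    devs = json.loads(show_json)["dev"]
    name, attrs = next(iter(devs.items()))
    cmd = ["vdpa", "dev", "add", "mgmtdev", attrs["mgmtdev"], "name", name]
    # Only attributes explicitly given at creation appear under initial_config;
    # read-only properties (vendor_id, max_vqs, ...) are deliberately skipped.
    for key, val in attrs.get("initial_config", {}).items():
        cmd += [key, str(val)]
    return " ".join(cmd)

sample = """
{
    "dev": {
        "vdpa0": {
            "type": "network",
            "mgmtdev": "pci/0000:41:04.2",
            "vendor_id": 5555,
            "max_vqs": 9,
            "max_vq_size": 256,
            "initial_config": {
                "mac": "e4:11:c6:d3:45:f0",
                "max_vq_pairs": 4
            }
        }
    }
}
"""
print(rebuild_create_cmd(sample))
# -> vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 mac e4:11:c6:d3:45:f0 max_vq_pairs 4
```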

---
v2 -> v3:
  - Rename vdev_cfg to init_cfg, and rename the related function
    accordingly (Jason)
  - Change "virtio_config" to "initial_config" in command example
    output (Parav)

v1 -> v2:
  - Revised example output to export all config attributes under a
    json object (Jason)

---

Si-Wei Liu (4):
  vdpa: save vdpa_dev_set_config in struct vdpa_device
  vdpa: pass initial config to _vdpa_register_device()
  vdpa: show dev config as-is in "vdpa dev show" output
  vdpa: fix improper error message when adding vdpa dev

 drivers/vdpa/ifcvf/ifcvf_main.c      |  2 +-
 drivers/vdpa/mlx5/net/mlx5_vnet.c    |  2 +-
 drivers/vdpa/vdpa.c                  | 63 +++++++++++++++++++++++++++++++++---
 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  2 +-
 drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  2 +-
 drivers/vdpa/vdpa_user/vduse_dev.c   |  2 +-
 drivers/vdpa/virtio_pci/vp_vdpa.c    |  3 +-
 include/linux/vdpa.h                 | 26 ++++++++-------
 8 files changed, 80 insertions(+), 22 deletions(-)

-- 
1.8.3.1

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

* [PATCH v3 1/4] vdpa: save vdpa_dev_set_config in struct vdpa_device
  2022-10-21 22:43 ` Si-Wei Liu
@ 2022-10-21 22:43   ` Si-Wei Liu
  -1 siblings, 0 replies; 42+ messages in thread
From: Si-Wei Liu @ 2022-10-21 22:43 UTC (permalink / raw)
  To: mst, jasowang, parav; +Cc: linux-kernel, virtualization

In order to allow live migration orchestration software to export the
initial set of vdpa attributes with which the device was created, it
will be useful if the vdpa tool can report the config on demand with
a simple query. This eases the orchestration software implementation:
it doesn't have to keep track of vdpa config changes, or persist vdpa
attributes across failure and recovery in case it is killed by an
accidental software error.

This commit makes struct vdpa_device contain a struct
vdpa_dev_set_config, into which all config attributes given at vdpa
creation are carried over. This will be used in subsequent commits.
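As an illustrative model only (Python, not kernel code), the mask semantics of struct vdpa_dev_set_config can be sketched as follows; the attribute bit positions are hypothetical stand-ins for the real VDPA_ATTR_* netlink attribute values:

```python
# Model of the vdpa_dev_set_config mask semantics: each bit in 'mask'
# records that the corresponding attribute was explicitly supplied at
# creation time, so the tool can later report exactly those attributes.
from dataclasses import dataclass

# Stand-ins for VDPA_ATTR_DEV_NET_CFG_* ids; numeric values are arbitrary.
ATTR_MAC, ATTR_MTU, ATTR_MAX_VQP = 1, 2, 3

@dataclass
class DevSetConfig:
    mac: str = ""
    mtu: int = 0
    max_vq_pairs: int = 0
    mask: int = 0

    def set_mac(self, mac: str):
        self.mac, self.mask = mac, self.mask | (1 << ATTR_MAC)

    def set_max_vq_pairs(self, n: int):
        self.max_vq_pairs, self.mask = n, self.mask | (1 << ATTR_MAX_VQP)

    def initial_config(self) -> dict:
        """Export only the attributes whose mask bit is set."""
        out = {}
        if self.mask & (1 << ATTR_MAC):
            out["mac"] = self.mac
        if self.mask & (1 << ATTR_MTU):
            out["mtu"] = self.mtu
        if self.mask & (1 << ATTR_MAX_VQP):
            out["max_vq_pairs"] = self.max_vq_pairs
        return out

cfg = DevSetConfig()
cfg.set_mac("e4:11:c6:d3:45:f0")
cfg.set_max_vq_pairs(4)
print(cfg.initial_config())
# -> {'mac': 'e4:11:c6:d3:45:f0', 'max_vq_pairs': 4}
```

A zero mask (as when no attribute was given at creation) exports nothing, matching the `if (!cfg->mask) return 0;` early exit in the later patch.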

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 include/linux/vdpa.h | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 6d0f5e4..9f519a3 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -58,6 +58,16 @@ struct vdpa_vq_state {
 	};
 };
 
+struct vdpa_dev_set_config {
+	u64 device_features;
+	struct {
+		u8 mac[ETH_ALEN];
+		u16 mtu;
+		u16 max_vq_pairs;
+	} net;
+	u64 mask;
+};
+
 struct vdpa_mgmt_dev;
 
 /**
@@ -77,6 +87,8 @@ struct vdpa_vq_state {
  * @nvqs: maximum number of supported virtqueues
  * @mdev: management device pointer; caller must setup when registering device as part
  *	  of dev_add() mgmtdev ops callback before invoking _vdpa_register_device().
+ * @init_cfg: initial device config on vdpa creation; useful when the device
+ *            needs to be instantiated with an identical config, e.g. migration.
  */
 struct vdpa_device {
 	struct device dev;
@@ -91,6 +103,7 @@ struct vdpa_device {
 	struct vdpa_mgmt_dev *mdev;
 	unsigned int ngroups;
 	unsigned int nas;
+	struct vdpa_dev_set_config init_cfg;
 };
 
 /**
@@ -103,16 +116,6 @@ struct vdpa_iova_range {
 	u64 last;
 };
 
-struct vdpa_dev_set_config {
-	u64 device_features;
-	struct {
-		u8 mac[ETH_ALEN];
-		u16 mtu;
-		u16 max_vq_pairs;
-	} net;
-	u64 mask;
-};
-
 /**
  * Corresponding file area for device memory mapping
  * @file: vma->vm_file for the mapping
-- 
1.8.3.1


* [PATCH v3 2/4] vdpa: pass initial config to _vdpa_register_device()
  2022-10-21 22:43 ` Si-Wei Liu
@ 2022-10-21 22:43   ` Si-Wei Liu
  -1 siblings, 0 replies; 42+ messages in thread
From: Si-Wei Liu @ 2022-10-21 22:43 UTC (permalink / raw)
  To: mst, jasowang, parav; +Cc: linux-kernel, virtualization

Just as _vdpa_register_device() takes @nvqs as the number of queues
to feed userspace inquiry via vdpa_dev_fill(), we can follow the same
approach to stash config attributes in struct vdpa_device at the time
of vdpa registration.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 drivers/vdpa/ifcvf/ifcvf_main.c      |  2 +-
 drivers/vdpa/mlx5/net/mlx5_vnet.c    |  2 +-
 drivers/vdpa/vdpa.c                  | 15 +++++++++++----
 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  2 +-
 drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  2 +-
 drivers/vdpa/vdpa_user/vduse_dev.c   |  2 +-
 drivers/vdpa/virtio_pci/vp_vdpa.c    |  3 ++-
 include/linux/vdpa.h                 |  3 ++-
 8 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index f9c0044..c54ab2c 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -771,7 +771,7 @@ static int ifcvf_vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
 	else
 		ret = dev_set_name(&vdpa_dev->dev, "vdpa%u", vdpa_dev->index);
 
-	ret = _vdpa_register_device(&adapter->vdpa, vf->nr_vring);
+	ret = _vdpa_register_device(&adapter->vdpa, vf->nr_vring, config);
 	if (ret) {
 		put_device(&adapter->vdpa.dev);
 		IFCVF_ERR(pdev, "Failed to register to vDPA bus");
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 9091336..376082e 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3206,7 +3206,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
 	mlx5_notifier_register(mdev, &ndev->nb);
 	ndev->nb_registered = true;
 	mvdev->vdev.mdev = &mgtdev->mgtdev;
-	err = _vdpa_register_device(&mvdev->vdev, max_vqs + 1);
+	err = _vdpa_register_device(&mvdev->vdev, max_vqs + 1, add_config);
 	if (err)
 		goto err_reg;
 
diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index febdc99..bebded6 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -215,11 +215,16 @@ static int vdpa_name_match(struct device *dev, const void *data)
 	return (strcmp(dev_name(&vdev->dev), data) == 0);
 }
 
-static int __vdpa_register_device(struct vdpa_device *vdev, u32 nvqs)
+static int __vdpa_register_device(struct vdpa_device *vdev, u32 nvqs,
+				  const struct vdpa_dev_set_config *cfg)
 {
 	struct device *dev;
 
 	vdev->nvqs = nvqs;
+	if (cfg)
+		vdev->init_cfg = *cfg;
+	else
+		vdev->init_cfg.mask = 0ULL;
 
 	lockdep_assert_held(&vdpa_dev_lock);
 	dev = bus_find_device(&vdpa_bus, NULL, dev_name(&vdev->dev), vdpa_name_match);
@@ -237,15 +242,17 @@ static int __vdpa_register_device(struct vdpa_device *vdev, u32 nvqs)
  * callback after setting up valid mgmtdev for this vdpa device.
  * @vdev: the vdpa device to be registered to vDPA bus
  * @nvqs: number of virtqueues supported by this device
+ * @cfg: initial config on vdpa device creation
  *
  * Return: Returns an error when fail to add device to vDPA bus
  */
-int _vdpa_register_device(struct vdpa_device *vdev, u32 nvqs)
+int _vdpa_register_device(struct vdpa_device *vdev, u32 nvqs,
+			  const struct vdpa_dev_set_config *cfg)
 {
 	if (!vdev->mdev)
 		return -EINVAL;
 
-	return __vdpa_register_device(vdev, nvqs);
+	return __vdpa_register_device(vdev, nvqs, cfg);
 }
 EXPORT_SYMBOL_GPL(_vdpa_register_device);
 
@@ -262,7 +269,7 @@ int vdpa_register_device(struct vdpa_device *vdev, u32 nvqs)
 	int err;
 
 	down_write(&vdpa_dev_lock);
-	err = __vdpa_register_device(vdev, nvqs);
+	err = __vdpa_register_device(vdev, nvqs, NULL);
 	up_write(&vdpa_dev_lock);
 	return err;
 }
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
index c6db1a1..5e1cebc 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
@@ -387,7 +387,7 @@ static int vdpasim_blk_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
 	if (IS_ERR(simdev))
 		return PTR_ERR(simdev);
 
-	ret = _vdpa_register_device(&simdev->vdpa, VDPASIM_BLK_VQ_NUM);
+	ret = _vdpa_register_device(&simdev->vdpa, VDPASIM_BLK_VQ_NUM, config);
 	if (ret)
 		goto put_dev;
 
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
index c3cb225..06ef5a0 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
@@ -260,7 +260,7 @@ static int vdpasim_net_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
 
 	vdpasim_net_setup_config(simdev, config);
 
-	ret = _vdpa_register_device(&simdev->vdpa, VDPASIM_NET_VQ_NUM);
+	ret = _vdpa_register_device(&simdev->vdpa, VDPASIM_NET_VQ_NUM, config);
 	if (ret)
 		goto reg_err;
 
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 35dceee..6530fd2 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -1713,7 +1713,7 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
 	if (ret)
 		return ret;
 
-	ret = _vdpa_register_device(&dev->vdev->vdpa, dev->vq_num);
+	ret = _vdpa_register_device(&dev->vdev->vdpa, dev->vq_num, config);
 	if (ret) {
 		put_device(&dev->vdev->vdpa.dev);
 		return ret;
diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c b/drivers/vdpa/virtio_pci/vp_vdpa.c
index d448db0..ffdc90e 100644
--- a/drivers/vdpa/virtio_pci/vp_vdpa.c
+++ b/drivers/vdpa/virtio_pci/vp_vdpa.c
@@ -538,7 +538,8 @@ static int vp_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
 	vp_vdpa->config_irq = VIRTIO_MSI_NO_VECTOR;
 
 	vp_vdpa->vdpa.mdev = &vp_vdpa_mgtdev->mgtdev;
-	ret = _vdpa_register_device(&vp_vdpa->vdpa, vp_vdpa->queues);
+	ret = _vdpa_register_device(&vp_vdpa->vdpa, vp_vdpa->queues,
+				    add_config);
 	if (ret) {
 		dev_err(&pdev->dev, "Failed to register to vdpa bus\n");
 		goto err;
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 9f519a3..e68ab65 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -381,7 +381,8 @@ struct vdpa_device *__vdpa_alloc_device(struct device *parent,
 int vdpa_register_device(struct vdpa_device *vdev, u32 nvqs);
 void vdpa_unregister_device(struct vdpa_device *vdev);
 
-int _vdpa_register_device(struct vdpa_device *vdev, u32 nvqs);
+int _vdpa_register_device(struct vdpa_device *vdev, u32 nvqs,
+			  const struct vdpa_dev_set_config *cfg);
 void _vdpa_unregister_device(struct vdpa_device *vdev);
 
 /**
-- 
1.8.3.1


* [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show" output
  2022-10-21 22:43 ` Si-Wei Liu
@ 2022-10-21 22:43   ` Si-Wei Liu
  -1 siblings, 0 replies; 42+ messages in thread
From: Si-Wei Liu @ 2022-10-21 22:43 UTC (permalink / raw)
  To: mst, jasowang, parav; +Cc: linux-kernel, virtualization

Live migration of vdpa would typically require re-instating the vdpa
device with an identical set of configs on the destination node,
the same way the source node created the device in the first
place. To save orchestration software from memorizing and keeping
track of the vdpa config, it is helpful if the vdpa tool provides a
way to export the initial configs as-is, i.e. the way the vdpa
device was created. The "vdpa dev show" command seems to be the
right vehicle for that. It is unlike the "vdpa dev config show"
command, whose output reflects the live values in the device config
space and is not quite reliable, being subject to the dynamics of
feature negotiation and possible changes made by the driver to the
config space.

Examples:

1) Create vDPA by default without any config attribute

$ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0
$ vdpa dev show vdpa0
vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs 9 max_vq_size 256
$ vdpa dev -jp show vdpa0
{
    "dev": {
        "vdpa0": {
            "type": "network",
            "mgmtdev": "pci/0000:41:04.2",
            "vendor_id": 5555,
            "max_vqs": 9,
            "max_vq_size": 256
        }
    }
}

2) Create vDPA with config attribute(s) specified

$ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 \
    mac e4:11:c6:d3:45:f0 max_vq_pairs 4
$ vdpa dev show
vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs 9 max_vq_size 256
  initial_config: mac e4:11:c6:d3:45:f0 max_vq_pairs 4
$ vdpa dev -jp show
{
    "dev": {
        "vdpa0": {
            "type": "network",
            "mgmtdev": "pci/0000:41:04.2",
            "vendor_id": 5555,
            "max_vqs": 9,
            "max_vq_size": 256,
            "initial_config": {
                "mac": "e4:11:c6:d3:45:f0",
                "max_vq_pairs": 4
            }
        }
    }
}

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 drivers/vdpa/vdpa.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index bebded6..bfb8f54 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -677,6 +677,41 @@ static int vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct genl_info *i
 }
 
 static int
+vdpa_dev_initcfg_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 device_id)
+{
+	struct vdpa_dev_set_config *cfg = &vdev->init_cfg;
+	int err = -EMSGSIZE;
+
+	if (!cfg->mask)
+		return 0;
+
+	switch (device_id) {
+	case VIRTIO_ID_NET:
+		if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR)) != 0 &&
+		    nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
+			    sizeof(cfg->net.mac), cfg->net.mac))
+			return err;
+		if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) != 0 &&
+		    nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, cfg->net.mtu))
+			return err;
+		if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) != 0 &&
+		    nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
+				cfg->net.max_vq_pairs))
+			return err;
+		break;
+	default:
+		break;
+	}
+
+	if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) != 0 &&
+	    nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES,
+			      cfg->device_features, VDPA_ATTR_PAD))
+		return err;
+
+	return 0;
+}
+
+static int
 vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 portid, u32 seq,
 	      int flags, struct netlink_ext_ack *extack)
 {
@@ -715,6 +750,10 @@ static int vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct genl_info *i
 	if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE, min_vq_size))
 		goto msg_err;
 
+	err = vdpa_dev_initcfg_fill(vdev, msg, device_id);
+	if (err)
+		goto msg_err;
+
 	genlmsg_end(msg, hdr);
 	return 0;
 
-- 
1.8.3.1

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show" output
@ 2022-10-21 22:43   ` Si-Wei Liu
  0 siblings, 0 replies; 42+ messages in thread
From: Si-Wei Liu @ 2022-10-21 22:43 UTC (permalink / raw)
  To: mst, jasowang, parav; +Cc: virtualization, linux-kernel

Live migration of vdpa would typically require re-instate vdpa
device with an idential set of configs on the destination node,
same way as how source node created the device in the first
place. In order to save orchestration software from memorizing
and keeping track of vdpa config, it will be helpful if the vdpa
tool provides the aids for exporting the initial configs as-is,
the way how vdpa device was created. The "vdpa dev show" command
seems to be the right vehicle for that. It is unlike the "vdpa dev
config show" command output which usually goes with the live value
in the device config space, and is not quite reliable subject to
the dynamics of feature negotiation or possible change by the
driver to the config space.

Examples:

1) Create vDPA by default without any config attribute

$ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0
$ vdpa dev show vdpa0
vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs 9 max_vq_size 256
$ vdpa dev -jp show vdpa0
{
    "dev": {
        "vdpa0": {
            "type": "network",
            "mgmtdev": "pci/0000:41:04.2",
            "vendor_id": 5555,
            "max_vqs": 9,
            "max_vq_size": 256,
        }
    }
}

2) Create vDPA with config attribute(s) specified

$ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 \
    mac e4:11:c6:d3:45:f0 max_vq_pairs 4
$ vdpa dev show
vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs 9 max_vq_size 256
  initial_config: mac e4:11:c6:d3:45:f0 max_vq_pairs 4
$ vdpa dev -jp show
{
    "dev": {
        "vdpa0": {
            "type": "network",
            "mgmtdev": "pci/0000:41:04.2",
            "vendor_id": 5555,
            "max_vqs": 9,
            "max_vq_size": 256,
            "initial_config": {
                "mac": "e4:11:c6:d3:45:f0",
                "max_vq_pairs": 4
            }
        }
    }
}
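For illustration only (not part of this series): orchestration software
on the destination node could turn the "vdpa dev -jp show" JSON above
back into the original creation command. The sketch below is a
hypothetical consumer of that output; the key names match the example
above, while the function name and reconstruction logic are assumptions.

```python
import json

def rebuild_add_cmd(show_json: str) -> str:
    """Turn 'vdpa dev -jp show' output back into a 'vdpa dev add' command."""
    devs = json.loads(show_json)["dev"]
    name, attrs = next(iter(devs.items()))
    cmd = ["vdpa", "dev", "add", "mgmtdev", attrs["mgmtdev"], "name", name]
    # Only creation-time attributes under "initial_config" are replayed;
    # live/derived values (vendor_id, max_vqs, ...) are not.
    for key, val in attrs.get("initial_config", {}).items():
        cmd += [key, str(val)]
    return " ".join(cmd)

example = """
{
    "dev": {
        "vdpa0": {
            "type": "network",
            "mgmtdev": "pci/0000:41:04.2",
            "vendor_id": 5555,
            "max_vqs": 9,
            "max_vq_size": 256,
            "initial_config": {
                "mac": "e4:11:c6:d3:45:f0",
                "max_vq_pairs": 4
            }
        }
    }
}
"""
print(rebuild_add_cmd(example))
# -> vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 mac e4:11:c6:d3:45:f0 max_vq_pairs 4
```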

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 drivers/vdpa/vdpa.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index bebded6..bfb8f54 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -677,6 +677,41 @@ static int vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct genl_info *i
 }
 
 static int
+vdpa_dev_initcfg_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 device_id)
+{
+	struct vdpa_dev_set_config *cfg = &vdev->init_cfg;
+	int err = -EMSGSIZE;
+
+	if (!cfg->mask)
+		return 0;
+
+	switch (device_id) {
+	case VIRTIO_ID_NET:
+		if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR)) != 0 &&
+		    nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
+			    sizeof(cfg->net.mac), cfg->net.mac))
+			return err;
+		if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) != 0 &&
+		    nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, cfg->net.mtu))
+			return err;
+		if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) != 0 &&
+		    nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
+				cfg->net.max_vq_pairs))
+			return err;
+		break;
+	default:
+		break;
+	}
+
+	if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) != 0 &&
+	    nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES,
+			      cfg->device_features, VDPA_ATTR_PAD))
+		return err;
+
+	return 0;
+}
+
+static int
 vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 portid, u32 seq,
 	      int flags, struct netlink_ext_ack *extack)
 {
@@ -715,6 +750,10 @@ static int vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct genl_info *i
 	if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE, min_vq_size))
 		goto msg_err;
 
+	err = vdpa_dev_initcfg_fill(vdev, msg, device_id);
+	if (err)
+		goto msg_err;
+
 	genlmsg_end(msg, hdr);
 	return 0;
 
-- 
1.8.3.1



* [PATCH v3 4/4] vdpa: fix improper error message when adding vdpa dev
  2022-10-21 22:43 ` Si-Wei Liu
@ 2022-10-21 22:43   ` Si-Wei Liu
  -1 siblings, 0 replies; 42+ messages in thread
From: Si-Wei Liu @ 2022-10-21 22:43 UTC (permalink / raw)
  To: mst, jasowang, parav; +Cc: linux-kernel, virtualization

In the example below, before the fix, the mtu attribute is supported
by the parent mgmtdev, yet the error message claiming "All provided
attributes are not supported" is misleading.

$ vdpa mgmtdev show
vdpasim_net:
  supported_classes net
  max_supported_vqs 3
  dev_features MTU MAC CTRL_VQ CTRL_MAC_ADDR ANY_LAYOUT VERSION_1 ACCESS_PLATFORM

$ vdpa dev add mgmtdev vdpasim_net name vdpasim0 mtu 5000 max_vqp 2
Error: vdpa: All provided attributes are not supported.
kernel answers: Operation not supported

After the fix, the relevant error messages are:

$ vdpa dev add mgmtdev vdpasim_net name vdpasim0 mtu 5000 max_vqp 2
Error: vdpa: Some provided attributes are not supported.
kernel answers: Operation not supported

$ vdpa dev add mgmtdev vdpasim_net name vdpasim0 max_vqp 2
Error: vdpa: All provided attributes are not supported.
kernel answers: Operation not supported
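The distinction the fix draws can be modeled with plain bit arithmetic.
The sketch below is a hypothetical stand-alone model of the two checks,
not kernel code; the bit positions are illustrative, whereas the kernel
uses the VDPA_ATTR_* attribute IDs in config.mask and
mdev->config_attr_mask.

```python
# Illustrative attribute bits (the kernel derives these from VDPA_ATTR_*).
MTU, MAC, MAX_VQP = 1 << 0, 1 << 1, 1 << 2

def check_attrs(provided: int, supported: int) -> str:
    """Mirror the fixed logic: 'none supported' vs 'some unsupported'."""
    # First check: attributes were provided, but none of them intersect
    # the parent's supported mask -> all are unsupported.
    if provided and (provided & supported) == 0:
        return "All provided attributes are not supported"
    # Second check: the intersection is a strict subset of what was
    # provided -> at least one attribute is unsupported.
    if (provided & supported) != provided:
        return "Some provided attributes are not supported"
    return "ok"

supported = MTU | MAC  # parent mgmtdev supports mtu and mac only

print(check_attrs(MTU | MAX_VQP, supported))  # mtu ok, max_vqp not -> "Some ..."
print(check_attrs(MAX_VQP, supported))        # only max_vqp, unsupported -> "All ..."
```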

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 drivers/vdpa/vdpa.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index bfb8f54..2638565 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -629,13 +629,20 @@ static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff *skb, struct genl_info *i
 		err = PTR_ERR(mdev);
 		goto err;
 	}
-	if ((config.mask & mdev->config_attr_mask) != config.mask) {
+	if (config.mask && (config.mask & mdev->config_attr_mask) == 0) {
 		NL_SET_ERR_MSG_MOD(info->extack,
 				   "All provided attributes are not supported");
 		err = -EOPNOTSUPP;
 		goto err;
 	}
 
+	if ((config.mask & mdev->config_attr_mask) != config.mask) {
+		NL_SET_ERR_MSG_MOD(info->extack,
+				   "Some provided attributes are not supported");
+		err = -EOPNOTSUPP;
+		goto err;
+	}
+
 	err = mdev->ops->dev_add(mdev, name, &config);
 err:
 	up_write(&vdpa_dev_lock);
-- 
1.8.3.1


* Re: [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show" output
  2022-10-21 22:43   ` Si-Wei Liu
@ 2022-10-24  8:40     ` Jason Wang
  -1 siblings, 0 replies; 42+ messages in thread
From: Jason Wang @ 2022-10-24  8:40 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: mst, parav, virtualization, linux-kernel

On Sat, Oct 22, 2022 at 7:49 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Live migration of vdpa would typically require re-instate vdpa
> device with an idential set of configs on the destination node,
> same way as how source node created the device in the first
> place. In order to save orchestration software from memorizing
> and keeping track of vdpa config, it will be helpful if the vdpa
> tool provides the aids for exporting the initial configs as-is,
> the way how vdpa device was created. The "vdpa dev show" command
> seems to be the right vehicle for that. It is unlike the "vdpa dev
> config show" command output which usually goes with the live value
> in the device config space, and is not quite reliable subject to
> the dynamics of feature negotiation or possible change by the
> driver to the config space.
>
> Examples:
>
> 1) Create vDPA by default without any config attribute
>
> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0
> $ vdpa dev show vdpa0
> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs 9 max_vq_size 256
> $ vdpa dev -jp show vdpa0
> {
>     "dev": {
>         "vdpa0": {
>             "type": "network",
>             "mgmtdev": "pci/0000:41:04.2",
>             "vendor_id": 5555,
>             "max_vqs": 9,
>             "max_vq_size": 256,
>         }
>     }
> }
>
> 2) Create vDPA with config attribute(s) specified
>
> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 \
>     mac e4:11:c6:d3:45:f0 max_vq_pairs 4
> $ vdpa dev show
> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs 9 max_vq_size 256
>   initial_config: mac e4:11:c6:d3:45:f0 max_vq_pairs 4
> $ vdpa dev -jp show
> {
>     "dev": {
>         "vdpa0": {
>             "type": "network",
>             "mgmtdev": "pci/0000:41:04.2",
>             "vendor_id": 5555,
>             "max_vqs": 9,
>             "max_vq_size": 256,
>             "initial_config": {
>                 "mac": "e4:11:c6:d3:45:f0",
>                 "max_vq_pairs": 4
>             }
>         }
>     }
> }
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  drivers/vdpa/vdpa.c | 39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
>
> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> index bebded6..bfb8f54 100644
> --- a/drivers/vdpa/vdpa.c
> +++ b/drivers/vdpa/vdpa.c
> @@ -677,6 +677,41 @@ static int vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct genl_info *i
>  }
>
>  static int
> +vdpa_dev_initcfg_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 device_id)
> +{
> +       struct vdpa_dev_set_config *cfg = &vdev->init_cfg;
> +       int err = -EMSGSIZE;
> +
> +       if (!cfg->mask)
> +               return 0;
> +
> +       switch (device_id) {
> +       case VIRTIO_ID_NET:
> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR)) != 0 &&
> +                   nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
> +                           sizeof(cfg->net.mac), cfg->net.mac))
> +                       return err;
> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) != 0 &&
> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, cfg->net.mtu))
> +                       return err;
> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) != 0 &&
> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
> +                               cfg->net.max_vq_pairs))
> +                       return err;
> +               break;
> +       default:
> +               break;
> +       }
> +
> +       if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) != 0 &&
> +           nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES,
> +                             cfg->device_features, VDPA_ATTR_PAD))
> +               return err;

A question: If any of those above attributes were not provisioned,
should we show the ones that are inherited from the parent?

Thanks

> +
> +       return 0;
> +}
> +
> +static int
>  vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 portid, u32 seq,
>               int flags, struct netlink_ext_ack *extack)
>  {
> @@ -715,6 +750,10 @@ static int vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct genl_info *i
>         if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE, min_vq_size))
>                 goto msg_err;
>
> +       err = vdpa_dev_initcfg_fill(vdev, msg, device_id);
> +       if (err)
> +               goto msg_err;
> +
>         genlmsg_end(msg, hdr);
>         return 0;
>
> --
> 1.8.3.1
>



* Re: [PATCH v3 4/4] vdpa: fix improper error message when adding vdpa dev
  2022-10-21 22:43   ` Si-Wei Liu
@ 2022-10-24  8:43     ` Jason Wang
  -1 siblings, 0 replies; 42+ messages in thread
From: Jason Wang @ 2022-10-24  8:43 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: mst, parav, virtualization, linux-kernel

On Sat, Oct 22, 2022 at 7:49 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> In below example, before the fix, mtu attribute is supported
> by the parent mgmtdev, but the error message showing "All
> provided are not supported" is just misleading.
>
> $ vdpa mgmtdev show
> vdpasim_net:
>   supported_classes net
>   max_supported_vqs 3
>   dev_features MTU MAC CTRL_VQ CTRL_MAC_ADDR ANY_LAYOUT VERSION_1 ACCESS_PLATFORM
>
> $ vdpa dev add mgmtdev vdpasim_net name vdpasim0 mtu 5000 max_vqp 2
> Error: vdpa: All provided attributes are not supported.
> kernel answers: Operation not supported
>
> After fix, the relevant error message will be like:
>
> $ vdpa dev add mgmtdev vdpasim_net name vdpasim0 mtu 5000 max_vqp 2
> Error: vdpa: Some provided attributes are not supported.
> kernel answers: Operation not supported
>
> $ vdpa dev add mgmtdev vdpasim_net name vdpasim0 max_vqp 2
> Error: vdpa: All provided attributes are not supported.
> kernel answers: Operation not supported
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Acked-by: Jason Wang <jasowang@redhat.com>

> ---
>  drivers/vdpa/vdpa.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> index bfb8f54..2638565 100644
> --- a/drivers/vdpa/vdpa.c
> +++ b/drivers/vdpa/vdpa.c
> @@ -629,13 +629,20 @@ static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff *skb, struct genl_info *i
>                 err = PTR_ERR(mdev);
>                 goto err;
>         }
> -       if ((config.mask & mdev->config_attr_mask) != config.mask) {
> +       if (config.mask && (config.mask & mdev->config_attr_mask) == 0) {
>                 NL_SET_ERR_MSG_MOD(info->extack,
>                                    "All provided attributes are not supported");
>                 err = -EOPNOTSUPP;
>                 goto err;
>         }
>
> +       if ((config.mask & mdev->config_attr_mask) != config.mask) {
> +               NL_SET_ERR_MSG_MOD(info->extack,
> +                                  "Some provided attributes are not supported");
> +               err = -EOPNOTSUPP;
> +               goto err;
> +       }
> +
>         err = mdev->ops->dev_add(mdev, name, &config);
>  err:
>         up_write(&vdpa_dev_lock);
> --
> 1.8.3.1
>



* Re: [PATCH v3 1/4] vdpa: save vdpa_dev_set_config in struct vdpa_device
  2022-10-21 22:43   ` Si-Wei Liu
@ 2022-10-24  8:43     ` Jason Wang
  -1 siblings, 0 replies; 42+ messages in thread
From: Jason Wang @ 2022-10-24  8:43 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: mst, parav, virtualization, linux-kernel

On Sat, Oct 22, 2022 at 7:49 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> In order to allow live migration orchestration software to export the
> initial set of vdpa attributes with which the device was created, it
> will be useful if the vdpa tool can report the config on demand with
> simple query. This will ease the orchestration software implementation
> so that it doesn't have to keep track of vdpa config change, or have
> to persist vdpa attributes across failure and recovery, in fear of
> being killed due to accidental software error.
>
> This commit attempts to make struct vdpa_device contain the struct
> vdpa_dev_set_config, where all config attributes upon vdpa creation
> are carried over. Which will be used in subsequent commits.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Acked-by: Jason Wang <jasowang@redhat.com>


> ---
>  include/linux/vdpa.h | 23 +++++++++++++----------
>  1 file changed, 13 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index 6d0f5e4..9f519a3 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -58,6 +58,16 @@ struct vdpa_vq_state {
>         };
>  };
>
> +struct vdpa_dev_set_config {
> +       u64 device_features;
> +       struct {
> +               u8 mac[ETH_ALEN];
> +               u16 mtu;
> +               u16 max_vq_pairs;
> +       } net;
> +       u64 mask;
> +};
> +
>  struct vdpa_mgmt_dev;
>
>  /**
> @@ -77,6 +87,8 @@ struct vdpa_vq_state {
>   * @nvqs: maximum number of supported virtqueues
>   * @mdev: management device pointer; caller must setup when registering device as part
>   *       of dev_add() mgmtdev ops callback before invoking _vdpa_register_device().
> + * @init_cfg: initial device config on vdpa creation; useful when instantiating
> + *            device with identical config is needed, e.g. migration.
>   */
>  struct vdpa_device {
>         struct device dev;
> @@ -91,6 +103,7 @@ struct vdpa_device {
>         struct vdpa_mgmt_dev *mdev;
>         unsigned int ngroups;
>         unsigned int nas;
> +       struct vdpa_dev_set_config init_cfg;
>  };
>
>  /**
> @@ -103,16 +116,6 @@ struct vdpa_iova_range {
>         u64 last;
>  };
>
> -struct vdpa_dev_set_config {
> -       u64 device_features;
> -       struct {
> -               u8 mac[ETH_ALEN];
> -               u16 mtu;
> -               u16 max_vq_pairs;
> -       } net;
> -       u64 mask;
> -};
> -
>  /**
>   * Corresponding file area for device memory mapping
>   * @file: vma->vm_file for the mapping
> --
> 1.8.3.1
>



* Re: [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show" output
  2022-10-24  8:40     ` Jason Wang
@ 2022-10-24 19:14       ` Si-Wei Liu
  -1 siblings, 0 replies; 42+ messages in thread
From: Si-Wei Liu @ 2022-10-24 19:14 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtualization, linux-kernel, mst



On 10/24/2022 1:40 AM, Jason Wang wrote:
> On Sat, Oct 22, 2022 at 7:49 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>> Live migration of vdpa would typically require re-instate vdpa
>> device with an idential set of configs on the destination node,
>> same way as how source node created the device in the first
>> place. In order to save orchestration software from memorizing
>> and keeping track of vdpa config, it will be helpful if the vdpa
>> tool provides the aids for exporting the initial configs as-is,
>> the way how vdpa device was created. The "vdpa dev show" command
>> seems to be the right vehicle for that. It is unlike the "vdpa dev
>> config show" command output which usually goes with the live value
>> in the device config space, and is not quite reliable subject to
>> the dynamics of feature negotiation or possible change by the
>> driver to the config space.
>>
>> Examples:
>>
>> 1) Create vDPA by default without any config attribute
>>
>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0
>> $ vdpa dev show vdpa0
>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs 9 max_vq_size 256
>> $ vdpa dev -jp show vdpa0
>> {
>>      "dev": {
>>          "vdpa0": {
>>              "type": "network",
>>              "mgmtdev": "pci/0000:41:04.2",
>>              "vendor_id": 5555,
>>              "max_vqs": 9,
>>              "max_vq_size": 256,
>>          }
>>      }
>> }
>>
>> 2) Create vDPA with config attribute(s) specified
>>
>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 \
>>      mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>> $ vdpa dev show
>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs 9 max_vq_size 256
>>    initial_config: mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>> $ vdpa dev -jp show
>> {
>>      "dev": {
>>          "vdpa0": {
>>              "type": "network",
>>              "mgmtdev": "pci/0000:41:04.2",
>>              "vendor_id": 5555,
>>              "max_vqs": 9,
>>              "max_vq_size": 256,
>>              "initial_config": {
>>                  "mac": "e4:11:c6:d3:45:f0",
>>                  "max_vq_pairs": 4
>>              }
>>          }
>>      }
>> }
>>
>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>> ---
>>   drivers/vdpa/vdpa.c | 39 +++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 39 insertions(+)
>>
>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
>> index bebded6..bfb8f54 100644
>> --- a/drivers/vdpa/vdpa.c
>> +++ b/drivers/vdpa/vdpa.c
>> @@ -677,6 +677,41 @@ static int vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct genl_info *i
>>   }
>>
>>   static int
>> +vdpa_dev_initcfg_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 device_id)
>> +{
>> +       struct vdpa_dev_set_config *cfg = &vdev->init_cfg;
>> +       int err = -EMSGSIZE;
>> +
>> +       if (!cfg->mask)
>> +               return 0;
>> +
>> +       switch (device_id) {
>> +       case VIRTIO_ID_NET:
>> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR)) != 0 &&
>> +                   nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
>> +                           sizeof(cfg->net.mac), cfg->net.mac))
>> +                       return err;
>> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) != 0 &&
>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, cfg->net.mtu))
>> +                       return err;
>> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) != 0 &&
>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
>> +                               cfg->net.max_vq_pairs))
>> +                       return err;
>> +               break;
>> +       default:
>> +               break;
>> +       }
>> +
>> +       if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) != 0 &&
>> +           nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES,
>> +                             cfg->device_features, VDPA_ATTR_PAD))
>> +               return err;
> A question: If any of those above attributes were not provisioned,
> should we show the ones that are inherited from the parent?
A simple answer would be yes, but the long answer is that I am not
sure there is any such value for the moment - there is no default
value for mtu, mac, or max_vqp that can be inherited from the parent
(max_vqp defaulting to 1 is spec-defined, not something inherited
from the parent). And device_features, if inherited, is displayed in
the 'vdpa dev config show' output. Can you remind me of a good
example of an inherited value that we may want to show here?


Thanks,
-Siwei


>
> Thanks
>
>> +
>> +       return 0;
>> +}
>> +
>> +static int
>>   vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 portid, u32 seq,
>>                int flags, struct netlink_ext_ack *extack)
>>   {
>> @@ -715,6 +750,10 @@ static int vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct genl_info *i
>>          if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE, min_vq_size))
>>                  goto msg_err;
>>
>> +       err = vdpa_dev_initcfg_fill(vdev, msg, device_id);
>> +       if (err)
>> +               goto msg_err;
>> +
>>          genlmsg_end(msg, hdr);
>>          return 0;
>>
>> --
>> 1.8.3.1
>>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show" output
@ 2022-10-24 19:14       ` Si-Wei Liu
  0 siblings, 0 replies; 42+ messages in thread
From: Si-Wei Liu @ 2022-10-24 19:14 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, parav, virtualization, linux-kernel



On 10/24/2022 1:40 AM, Jason Wang wrote:
> On Sat, Oct 22, 2022 at 7:49 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>> Live migration of vdpa would typically require re-instate vdpa
>> device with an idential set of configs on the destination node,
>> same way as how source node created the device in the first
>> place. In order to save orchestration software from memorizing
>> and keeping track of vdpa config, it will be helpful if the vdpa
>> tool provides the aids for exporting the initial configs as-is,
>> the way how vdpa device was created. The "vdpa dev show" command
>> seems to be the right vehicle for that. It is unlike the "vdpa dev
>> config show" command output which usually goes with the live value
>> in the device config space, and is not quite reliable subject to
>> the dynamics of feature negotiation or possible change by the
>> driver to the config space.
>>
>> Examples:
>>
>> 1) Create vDPA by default without any config attribute
>>
>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0
>> $ vdpa dev show vdpa0
>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs 9 max_vq_size 256
>> $ vdpa dev -jp show vdpa0
>> {
>>      "dev": {
>>          "vdpa0": {
>>              "type": "network",
>>              "mgmtdev": "pci/0000:41:04.2",
>>              "vendor_id": 5555,
>>              "max_vqs": 9,
>>              "max_vq_size": 256,
>>          }
>>      }
>> }
>>
>> 2) Create vDPA with config attribute(s) specified
>>
>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 \
>>      mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>> $ vdpa dev show
>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs 9 max_vq_size 256
>>    initial_config: mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>> $ vdpa dev -jp show
>> {
>>      "dev": {
>>          "vdpa0": {
>>              "type": "network",
>>              "mgmtdev": "pci/0000:41:04.2",
>>              "vendor_id": 5555,
>>              "max_vqs": 9,
>>              "max_vq_size": 256,
>>              "initial_config": {
>>                  "mac": "e4:11:c6:d3:45:f0",
>>                  "max_vq_pairs": 4
>>              }
>>          }
>>      }
>> }
>>
>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>> ---
>>   drivers/vdpa/vdpa.c | 39 +++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 39 insertions(+)
>>
>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
>> index bebded6..bfb8f54 100644
>> --- a/drivers/vdpa/vdpa.c
>> +++ b/drivers/vdpa/vdpa.c
>> @@ -677,6 +677,41 @@ static int vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct genl_info *i
>>   }
>>
>>   static int
>> +vdpa_dev_initcfg_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 device_id)
>> +{
>> +       struct vdpa_dev_set_config *cfg = &vdev->init_cfg;
>> +       int err = -EMSGSIZE;
>> +
>> +       if (!cfg->mask)
>> +               return 0;
>> +
>> +       switch (device_id) {
>> +       case VIRTIO_ID_NET:
>> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR)) != 0 &&
>> +                   nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
>> +                           sizeof(cfg->net.mac), cfg->net.mac))
>> +                       return err;
>> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) != 0 &&
>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, cfg->net.mtu))
>> +                       return err;
>> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) != 0 &&
>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
>> +                               cfg->net.max_vq_pairs))
>> +                       return err;
>> +               break;
>> +       default:
>> +               break;
>> +       }
>> +
>> +       if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) != 0 &&
>> +           nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES,
>> +                             cfg->device_features, VDPA_ATTR_PAD))
>> +               return err;
> A question: If any of those above attributes were not provisioned,
> should we show the ones that are inherited from the parent?
A simple answer would be yes, but the long answer is that I am not
sure there is any such value for the moment - there is no default
value for mtu, mac, or max_vqp that can be inherited from the parent
(max_vqp defaulting to 1 is spec-defined, not something inherited
from the parent). And device_features, if inherited, is displayed in
the 'vdpa dev config show' output. Can you remind me of a good
example of an inherited value that we may want to show here?


Thanks,
-Siwei


>
> Thanks
>
>> +
>> +       return 0;
>> +}
>> +
>> +static int
>>   vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 portid, u32 seq,
>>                int flags, struct netlink_ext_ack *extack)
>>   {
>> @@ -715,6 +750,10 @@ static int vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct genl_info *i
>>          if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE, min_vq_size))
>>                  goto msg_err;
>>
>> +       err = vdpa_dev_initcfg_fill(vdev, msg, device_id);
>> +       if (err)
>> +               goto msg_err;
>> +
>>          genlmsg_end(msg, hdr);
>>          return 0;
>>
>> --
>> 1.8.3.1
>>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show" output
  2022-10-24 19:14       ` Si-Wei Liu
@ 2022-10-25  2:24         ` Jason Wang
  -1 siblings, 0 replies; 42+ messages in thread
From: Jason Wang @ 2022-10-25  2:24 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: mst, parav, virtualization, linux-kernel

On Tue, Oct 25, 2022 at 3:14 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 10/24/2022 1:40 AM, Jason Wang wrote:
> > On Sat, Oct 22, 2022 at 7:49 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >> Live migration of vdpa would typically require re-instate vdpa
> >> device with an idential set of configs on the destination node,
> >> same way as how source node created the device in the first
> >> place. In order to save orchestration software from memorizing
> >> and keeping track of vdpa config, it will be helpful if the vdpa
> >> tool provides the aids for exporting the initial configs as-is,
> >> the way how vdpa device was created. The "vdpa dev show" command
> >> seems to be the right vehicle for that. It is unlike the "vdpa dev
> >> config show" command output which usually goes with the live value
> >> in the device config space, and is not quite reliable subject to
> >> the dynamics of feature negotiation or possible change by the
> >> driver to the config space.
> >>
> >> Examples:
> >>
> >> 1) Create vDPA by default without any config attribute
> >>
> >> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0
> >> $ vdpa dev show vdpa0
> >> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs 9 max_vq_size 256
> >> $ vdpa dev -jp show vdpa0
> >> {
> >>      "dev": {
> >>          "vdpa0": {
> >>              "type": "network",
> >>              "mgmtdev": "pci/0000:41:04.2",
> >>              "vendor_id": 5555,
> >>              "max_vqs": 9,
> >>              "max_vq_size": 256,
> >>          }
> >>      }
> >> }
> >>
> >> 2) Create vDPA with config attribute(s) specified
> >>
> >> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 \
> >>      mac e4:11:c6:d3:45:f0 max_vq_pairs 4
> >> $ vdpa dev show
> >> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs 9 max_vq_size 256
> >>    initial_config: mac e4:11:c6:d3:45:f0 max_vq_pairs 4
> >> $ vdpa dev -jp show
> >> {
> >>      "dev": {
> >>          "vdpa0": {
> >>              "type": "network",
> >>              "mgmtdev": "pci/0000:41:04.2",
> >>              "vendor_id": 5555,
> >>              "max_vqs": 9,
> >>              "max_vq_size": 256,
> >>              "initial_config": {
> >>                  "mac": "e4:11:c6:d3:45:f0",
> >>                  "max_vq_pairs": 4
> >>              }
> >>          }
> >>      }
> >> }
> >>
> >> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> >> ---
> >>   drivers/vdpa/vdpa.c | 39 +++++++++++++++++++++++++++++++++++++++
> >>   1 file changed, 39 insertions(+)
> >>
> >> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> >> index bebded6..bfb8f54 100644
> >> --- a/drivers/vdpa/vdpa.c
> >> +++ b/drivers/vdpa/vdpa.c
> >> @@ -677,6 +677,41 @@ static int vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct genl_info *i
> >>   }
> >>
> >>   static int
> >> +vdpa_dev_initcfg_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 device_id)
> >> +{
> >> +       struct vdpa_dev_set_config *cfg = &vdev->init_cfg;
> >> +       int err = -EMSGSIZE;
> >> +
> >> +       if (!cfg->mask)
> >> +               return 0;
> >> +
> >> +       switch (device_id) {
> >> +       case VIRTIO_ID_NET:
> >> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR)) != 0 &&
> >> +                   nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
> >> +                           sizeof(cfg->net.mac), cfg->net.mac))
> >> +                       return err;
> >> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) != 0 &&
> >> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, cfg->net.mtu))
> >> +                       return err;
> >> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) != 0 &&
> >> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
> >> +                               cfg->net.max_vq_pairs))
> >> +                       return err;
> >> +               break;
> >> +       default:
> >> +               break;
> >> +       }
> >> +
> >> +       if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) != 0 &&
> >> +           nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES,
> >> +                             cfg->device_features, VDPA_ATTR_PAD))
> >> +               return err;
> > A question: If any of those above attributes were not provisioned,
> > should we show the ones that are inherited from the parent?
> A simple answer would be yes, but the long answer is that I am not sure
> if there's any for the moment - there's no  default value for mtu, mac,
> and max_vqp that can be inherited from the parent (max_vqp by default
> being 1 is spec defined, not something inherited from the parent).

Note that the single queue pair default comes from the driver level
when _F_MQ is not negotiated. But I think we are talking about
something different that is outside the spec here, what if:

vDPA inherits _F_MQ but max_vqp is not provisioned via netlink.

Or is it not allowed? At least at some point in the past, mlx5 was
enabled with MQ and 8 queue pairs by default.

> And
> the device_features if inherited is displayed at 'vdpa dev config show'
> output. Can you remind me of a good example for inherited value that we
> may want to show here?

Some other cases:

1) MTU: there should be something that the device needs to report if
_F_MTU is negotiated, even if it is not provisioned from netlink.
2) device_features: if device_features is not provisioned, we should
still report it via netlink here - or do you mean the mgmt software
can assume it is the same as the mgmtdev's? Anyhow, if we don't show
device_features when it is not provisioned, it will complicate the
mgmt software.

Thanks

>
>
> Thanks,
> -Siwei
>
>
> >
> > Thanks
> >
> >> +
> >> +       return 0;
> >> +}
> >> +
> >> +static int
> >>   vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 portid, u32 seq,
> >>                int flags, struct netlink_ext_ack *extack)
> >>   {
> >> @@ -715,6 +750,10 @@ static int vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct genl_info *i
> >>          if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE, min_vq_size))
> >>                  goto msg_err;
> >>
> >> +       err = vdpa_dev_initcfg_fill(vdev, msg, device_id);
> >> +       if (err)
> >> +               goto msg_err;
> >> +
> >>          genlmsg_end(msg, hdr);
> >>          return 0;
> >>
> >> --
> >> 1.8.3.1
> >>
>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show" output
@ 2022-10-25  2:24         ` Jason Wang
  0 siblings, 0 replies; 42+ messages in thread
From: Jason Wang @ 2022-10-25  2:24 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: virtualization, linux-kernel, mst

On Tue, Oct 25, 2022 at 3:14 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 10/24/2022 1:40 AM, Jason Wang wrote:
> > On Sat, Oct 22, 2022 at 7:49 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >> Live migration of vdpa would typically require re-instate vdpa
> >> device with an idential set of configs on the destination node,
> >> same way as how source node created the device in the first
> >> place. In order to save orchestration software from memorizing
> >> and keeping track of vdpa config, it will be helpful if the vdpa
> >> tool provides the aids for exporting the initial configs as-is,
> >> the way how vdpa device was created. The "vdpa dev show" command
> >> seems to be the right vehicle for that. It is unlike the "vdpa dev
> >> config show" command output which usually goes with the live value
> >> in the device config space, and is not quite reliable subject to
> >> the dynamics of feature negotiation or possible change by the
> >> driver to the config space.
> >>
> >> Examples:
> >>
> >> 1) Create vDPA by default without any config attribute
> >>
> >> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0
> >> $ vdpa dev show vdpa0
> >> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs 9 max_vq_size 256
> >> $ vdpa dev -jp show vdpa0
> >> {
> >>      "dev": {
> >>          "vdpa0": {
> >>              "type": "network",
> >>              "mgmtdev": "pci/0000:41:04.2",
> >>              "vendor_id": 5555,
> >>              "max_vqs": 9,
> >>              "max_vq_size": 256,
> >>          }
> >>      }
> >> }
> >>
> >> 2) Create vDPA with config attribute(s) specified
> >>
> >> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 \
> >>      mac e4:11:c6:d3:45:f0 max_vq_pairs 4
> >> $ vdpa dev show
> >> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs 9 max_vq_size 256
> >>    initial_config: mac e4:11:c6:d3:45:f0 max_vq_pairs 4
> >> $ vdpa dev -jp show
> >> {
> >>      "dev": {
> >>          "vdpa0": {
> >>              "type": "network",
> >>              "mgmtdev": "pci/0000:41:04.2",
> >>              "vendor_id": 5555,
> >>              "max_vqs": 9,
> >>              "max_vq_size": 256,
> >>              "initial_config": {
> >>                  "mac": "e4:11:c6:d3:45:f0",
> >>                  "max_vq_pairs": 4
> >>              }
> >>          }
> >>      }
> >> }
> >>
> >> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> >> ---
> >>   drivers/vdpa/vdpa.c | 39 +++++++++++++++++++++++++++++++++++++++
> >>   1 file changed, 39 insertions(+)
> >>
> >> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> >> index bebded6..bfb8f54 100644
> >> --- a/drivers/vdpa/vdpa.c
> >> +++ b/drivers/vdpa/vdpa.c
> >> @@ -677,6 +677,41 @@ static int vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct genl_info *i
> >>   }
> >>
> >>   static int
> >> +vdpa_dev_initcfg_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 device_id)
> >> +{
> >> +       struct vdpa_dev_set_config *cfg = &vdev->init_cfg;
> >> +       int err = -EMSGSIZE;
> >> +
> >> +       if (!cfg->mask)
> >> +               return 0;
> >> +
> >> +       switch (device_id) {
> >> +       case VIRTIO_ID_NET:
> >> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR)) != 0 &&
> >> +                   nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
> >> +                           sizeof(cfg->net.mac), cfg->net.mac))
> >> +                       return err;
> >> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) != 0 &&
> >> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, cfg->net.mtu))
> >> +                       return err;
> >> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) != 0 &&
> >> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
> >> +                               cfg->net.max_vq_pairs))
> >> +                       return err;
> >> +               break;
> >> +       default:
> >> +               break;
> >> +       }
> >> +
> >> +       if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) != 0 &&
> >> +           nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES,
> >> +                             cfg->device_features, VDPA_ATTR_PAD))
> >> +               return err;
> > A question: If any of those above attributes were not provisioned,
> > should we show the ones that are inherited from the parent?
> A simple answer would be yes, but the long answer is that I am not sure
> if there's any for the moment - there's no  default value for mtu, mac,
> and max_vqp that can be inherited from the parent (max_vqp by default
> being 1 is spec defined, not something inherited from the parent).

Note that the single queue pair default comes from the driver level
when _F_MQ is not negotiated. But I think we are talking about
something different that is outside the spec here, what if:

vDPA inherits _F_MQ but max_vqp is not provisioned via netlink.

Or is it not allowed? At least at some point in the past, mlx5 was
enabled with MQ and 8 queue pairs by default.

> And
> the device_features if inherited is displayed at 'vdpa dev config show'
> output. Can you remind me of a good example for inherited value that we
> may want to show here?

Some other cases:

1) MTU: there should be something that the device needs to report if
_F_MTU is negotiated, even if it is not provisioned from netlink.
2) device_features: if device_features is not provisioned, we should
still report it via netlink here - or do you mean the mgmt software
can assume it is the same as the mgmtdev's? Anyhow, if we don't show
device_features when it is not provisioned, it will complicate the
mgmt software.

Thanks

>
>
> Thanks,
> -Siwei
>
>
> >
> > Thanks
> >
> >> +
> >> +       return 0;
> >> +}
> >> +
> >> +static int
> >>   vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 portid, u32 seq,
> >>                int flags, struct netlink_ext_ack *extack)
> >>   {
> >> @@ -715,6 +750,10 @@ static int vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct genl_info *i
> >>          if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE, min_vq_size))
> >>                  goto msg_err;
> >>
> >> +       err = vdpa_dev_initcfg_fill(vdev, msg, device_id);
> >> +       if (err)
> >> +               goto msg_err;
> >> +
> >>          genlmsg_end(msg, hdr);
> >>          return 0;
> >>
> >> --
> >> 1.8.3.1
> >>
>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show" output
  2022-10-25  2:24         ` Jason Wang
  (?)
@ 2022-10-26  1:10         ` Si-Wei Liu
  2022-10-26  4:44             ` Jason Wang
  -1 siblings, 1 reply; 42+ messages in thread
From: Si-Wei Liu @ 2022-10-26  1:10 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtualization, linux-kernel, mst





On 10/24/2022 7:24 PM, Jason Wang wrote:
> On Tue, Oct 25, 2022 at 3:14 AM Si-Wei Liu<si-wei.liu@oracle.com>  wrote:
>>
>>
>> On 10/24/2022 1:40 AM, Jason Wang wrote:
>>> On Sat, Oct 22, 2022 at 7:49 AM Si-Wei Liu<si-wei.liu@oracle.com>  wrote:
>>>> Live migration of vdpa would typically require re-instate vdpa
>>>> device with an idential set of configs on the destination node,
>>>> same way as how source node created the device in the first
>>>> place. In order to save orchestration software from memorizing
>>>> and keeping track of vdpa config, it will be helpful if the vdpa
>>>> tool provides the aids for exporting the initial configs as-is,
>>>> the way how vdpa device was created. The "vdpa dev show" command
>>>> seems to be the right vehicle for that. It is unlike the "vdpa dev
>>>> config show" command output which usually goes with the live value
>>>> in the device config space, and is not quite reliable subject to
>>>> the dynamics of feature negotiation or possible change by the
>>>> driver to the config space.
>>>>
>>>> Examples:
>>>>
>>>> 1) Create vDPA by default without any config attribute
>>>>
>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0
>>>> $ vdpa dev show vdpa0
>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs 9 max_vq_size 256
>>>> $ vdpa dev -jp show vdpa0
>>>> {
>>>>       "dev": {
>>>>           "vdpa0": {
>>>>               "type": "network",
>>>>               "mgmtdev": "pci/0000:41:04.2",
>>>>               "vendor_id": 5555,
>>>>               "max_vqs": 9,
>>>>               "max_vq_size": 256,
>>>>           }
>>>>       }
>>>> }
>>>>
>>>> 2) Create vDPA with config attribute(s) specified
>>>>
>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 \
>>>>       mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>>>> $ vdpa dev show
>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs 9 max_vq_size 256
>>>>     initial_config: mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>>>> $ vdpa dev -jp show
>>>> {
>>>>       "dev": {
>>>>           "vdpa0": {
>>>>               "type": "network",
>>>>               "mgmtdev": "pci/0000:41:04.2",
>>>>               "vendor_id": 5555,
>>>>               "max_vqs": 9,
>>>>               "max_vq_size": 256,
>>>>               "initial_config": {
>>>>                   "mac": "e4:11:c6:d3:45:f0",
>>>>                   "max_vq_pairs": 4
>>>>               }
>>>>           }
>>>>       }
>>>> }
>>>>
>>>> Signed-off-by: Si-Wei Liu<si-wei.liu@oracle.com>
>>>> ---
>>>>    drivers/vdpa/vdpa.c | 39 +++++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 39 insertions(+)
>>>>
>>>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
>>>> index bebded6..bfb8f54 100644
>>>> --- a/drivers/vdpa/vdpa.c
>>>> +++ b/drivers/vdpa/vdpa.c
>>>> @@ -677,6 +677,41 @@ static int vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct genl_info *i
>>>>    }
>>>>
>>>>    static int
>>>> +vdpa_dev_initcfg_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 device_id)
>>>> +{
>>>> +       struct vdpa_dev_set_config *cfg = &vdev->init_cfg;
>>>> +       int err = -EMSGSIZE;
>>>> +
>>>> +       if (!cfg->mask)
>>>> +               return 0;
>>>> +
>>>> +       switch (device_id) {
>>>> +       case VIRTIO_ID_NET:
>>>> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR)) != 0 &&
>>>> +                   nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
>>>> +                           sizeof(cfg->net.mac), cfg->net.mac))
>>>> +                       return err;
>>>> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) != 0 &&
>>>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, cfg->net.mtu))
>>>> +                       return err;
>>>> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) != 0 &&
>>>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
>>>> +                               cfg->net.max_vq_pairs))
>>>> +                       return err;
>>>> +               break;
>>>> +       default:
>>>> +               break;
>>>> +       }
>>>> +
>>>> +       if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) != 0 &&
>>>> +           nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES,
>>>> +                             cfg->device_features, VDPA_ATTR_PAD))
>>>> +               return err;
>>> A question: If any of those above attributes were not provisioned,
>>> should we show the ones that are inherited from the parent?
>> A simple answer would be yes, but the long answer is that I am not sure
>> if there's any for the moment - there's no  default value for mtu, mac,
>> and max_vqp that can be inherited from the parent (max_vqp by default
>> being 1 is spec defined, not something inherited from the parent).
> Note that it is by default from driver level that if _F_MQ is not
> negotiated. But I think we are talking about something different that
> is out of the spec here, what if:
>
> vDPA inherit _F_MQ but mqx_vqp is not provisioned via netlink.
>
> Or is it not allowed?
My understanding is that this is not allowed any more since the
introduction of the max_vqp attribute. Note that currently we don't
have a way for the vendor driver to report the default value for
max_vqp when it is not otherwise specified in the CLI. Without the
default value being reported at the 'vdpa mgmtdev show' level, it
would just confuse mgmt software even more.

>    At least some time in the past, mlx5 were
> enabled with MQ with 8 queue pairs by default.
That was the situation before the max_vqp attribute was supported at
the vdpa netlink API level. I think now every driver honors the vdpa
core disposition of a single queue pair when the max_vqp config is
missing. And the mlx5_vdpa driver with 8 queue pairs from the wild
days is simply not manageable by mgmt software, regardless of live
migration.
>
>> And
>> the device_features if inherited is displayed at 'vdpa dev config show'
>> output. Can you remind me of a good example for inherited value that we
>> may want to show here?
> Some other cases:
>
> 1) MTU: there should be something that the device needs to report if
> _F_MTU is negotiated even if it is not provisioned from netlink.
I am not sure I understand the ask here. Note that the QEMU argument
has to offer host_mtu=X with the maximum MTU value for the guest to
use (applied as the initial MTU config during virtio-net probing by
the Linux driver), and the way to get the parent device's MTU, and
whether that is relevant to the vdpa device's MTU, is very vendor
specific. I think we would need new attribute(s) at the mgmtdev level
to support what you want here?

> 2) device_features: if device_features is not provisioned, we should
> still report it via netlink here
Not the way I expected it, but with Lingshan's series to expose fields
outside of FEATURES_OK, device_features is now reported through 'vdpa
dev config show' regardless of whether it was specified, if I am not
mistaken?

Currently we export the config attributes specified at vdpa creation
under the "initial_config" key. If we want to expose more default
values inherited from the mgmtdev, I think we can wrap those defaults
under another key, "inherited_config", displayed in the 'vdpa dev
show' output. Does that fit what you have in mind?
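For illustration, a hypothetical 'vdpa dev -jp show' output with such
a key might look like the sketch below. The "inherited_config" key
name and its members are not final; the device_features value is a
made-up placeholder.

```json
{
    "dev": {
        "vdpa0": {
            "type": "network",
            "mgmtdev": "pci/0000:41:04.2",
            "vendor_id": 5555,
            "max_vqs": 9,
            "max_vq_size": 256,
            "initial_config": {
                "mac": "e4:11:c6:d3:45:f0"
            },
            "inherited_config": {
                "device_features": "0x300020000"
            }
        }
    }
}
```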

> or do you mean the mgmt can assume it
> should be the same as mgmtdev. Anyhow if we don't show device_features
> if it is not provisioned, it will complicate the mgmt software.
Yes, as I said earlier, since the device_features attr was added to
the 'vdpa dev config show' command, this divergence has already
started to complicate mgmt software.

Thanks,
-Siwei
>
> Thanks
>
>>
>> Thanks,
>> -Siwei
>>
>>
>>> Thanks
>>>
>>>> +
>>>> +       return 0;
>>>> +}
>>>> +
>>>> +static int
>>>>    vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 portid, u32 seq,
>>>>                 int flags, struct netlink_ext_ack *extack)
>>>>    {
>>>> @@ -715,6 +750,10 @@ static int vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct genl_info *i
>>>>           if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE, min_vq_size))
>>>>                   goto msg_err;
>>>>
>>>> +       err = vdpa_dev_initcfg_fill(vdev, msg, device_id);
>>>> +       if (err)
>>>> +               goto msg_err;
>>>> +
>>>>           genlmsg_end(msg, hdr);
>>>>           return 0;
>>>>
>>>> --
>>>> 1.8.3.1
>>>>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show" output
  2022-10-26  1:10         ` Si-Wei Liu
@ 2022-10-26  4:44             ` Jason Wang
  0 siblings, 0 replies; 42+ messages in thread
From: Jason Wang @ 2022-10-26  4:44 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: mst, parav, virtualization, linux-kernel, Cindy Lu


On 2022/10/26 09:10, Si-Wei Liu wrote:
>
>
> On 10/24/2022 7:24 PM, Jason Wang wrote:
>> On Tue, Oct 25, 2022 at 3:14 AM Si-Wei Liu<si-wei.liu@oracle.com>  wrote:
>>> On 10/24/2022 1:40 AM, Jason Wang wrote:
>>>> On Sat, Oct 22, 2022 at 7:49 AM Si-Wei Liu<si-wei.liu@oracle.com>  wrote:
>>>>> Live migration of vdpa would typically require re-instate vdpa
>>>>> device with an idential set of configs on the destination node,
>>>>> same way as how source node created the device in the first
>>>>> place. In order to save orchestration software from memorizing
>>>>> and keeping track of vdpa config, it will be helpful if the vdpa
>>>>> tool provides the aids for exporting the initial configs as-is,
>>>>> the way how vdpa device was created. The "vdpa dev show" command
>>>>> seems to be the right vehicle for that. It is unlike the "vdpa dev
>>>>> config show" command output which usually goes with the live value
>>>>> in the device config space, and is not quite reliable subject to
>>>>> the dynamics of feature negotiation or possible change by the
>>>>> driver to the config space.
>>>>>
>>>>> Examples:
>>>>>
>>>>> 1) Create vDPA by default without any config attribute
>>>>>
>>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0
>>>>> $ vdpa dev show vdpa0
>>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs 9 max_vq_size 256
>>>>> $ vdpa dev -jp show vdpa0
>>>>> {
>>>>>       "dev": {
>>>>>           "vdpa0": {
>>>>>               "type": "network",
>>>>>               "mgmtdev": "pci/0000:41:04.2",
>>>>>               "vendor_id": 5555,
>>>>>               "max_vqs": 9,
>>>>>               "max_vq_size": 256,
>>>>>           }
>>>>>       }
>>>>> }
>>>>>
>>>>> 2) Create vDPA with config attribute(s) specified
>>>>>
>>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 \
>>>>>       mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>>>>> $ vdpa dev show
>>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs 9 max_vq_size 256
>>>>>     initial_config: mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>>>>> $ vdpa dev -jp show
>>>>> {
>>>>>       "dev": {
>>>>>           "vdpa0": {
>>>>>               "type": "network",
>>>>>               "mgmtdev": "pci/0000:41:04.2",
>>>>>               "vendor_id": 5555,
>>>>>               "max_vqs": 9,
>>>>>               "max_vq_size": 256,
>>>>>               "initial_config": {
>>>>>                   "mac": "e4:11:c6:d3:45:f0",
>>>>>                   "max_vq_pairs": 4
>>>>>               }
>>>>>           }
>>>>>       }
>>>>> }
>>>>>
>>>>> Signed-off-by: Si-Wei Liu<si-wei.liu@oracle.com>
>>>>> ---
>>>>>    drivers/vdpa/vdpa.c | 39 +++++++++++++++++++++++++++++++++++++++
>>>>>    1 file changed, 39 insertions(+)
>>>>>
>>>>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
>>>>> index bebded6..bfb8f54 100644
>>>>> --- a/drivers/vdpa/vdpa.c
>>>>> +++ b/drivers/vdpa/vdpa.c
>>>>> @@ -677,6 +677,41 @@ static int vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct genl_info *i
>>>>>    }
>>>>>
>>>>>    static int
>>>>> +vdpa_dev_initcfg_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 device_id)
>>>>> +{
>>>>> +       struct vdpa_dev_set_config *cfg = &vdev->init_cfg;
>>>>> +       int err = -EMSGSIZE;
>>>>> +
>>>>> +       if (!cfg->mask)
>>>>> +               return 0;
>>>>> +
>>>>> +       switch (device_id) {
>>>>> +       case VIRTIO_ID_NET:
>>>>> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR)) != 0 &&
>>>>> +                   nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
>>>>> +                           sizeof(cfg->net.mac), cfg->net.mac))
>>>>> +                       return err;
>>>>> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) != 0 &&
>>>>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, cfg->net.mtu))
>>>>> +                       return err;
>>>>> +               if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) != 0 &&
>>>>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
>>>>> +                               cfg->net.max_vq_pairs))
>>>>> +                       return err;
>>>>> +               break;
>>>>> +       default:
>>>>> +               break;
>>>>> +       }
>>>>> +
>>>>> +       if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) != 0 &&
>>>>> +           nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES,
>>>>> +                             cfg->device_features, VDPA_ATTR_PAD))
>>>>> +               return err;
>>>> A question: If any of those above attributes were not provisioned,
>>>> should we show the ones that are inherited from the parent?
>>> A simple answer would be yes, but the long answer is that I am not sure
>>> if there's any for the moment - there's no  default value for mtu, mac,
>>> and max_vqp that can be inherited from the parent (max_vqp by default
>>> being 1 is spec defined, not something inherited from the parent).
>> Note that it is by default from driver level that if _F_MQ is not
>> negotiated. But I think we are talking about something different that
>> is out of the spec here, what if:
>>
>> vDPA inherit _F_MQ but mqx_vqp is not provisioned via netlink.
>>
>> Or is it not allowed?
> My understanding is that this is not allowed any more since the 
> introduction of max_vqp attribute. Noted, currently we don't have a 
> way for vendor driver to report the default value for mqx_vqp, 


I think it can be reported in this patch?


> if not otherwise specified in the CLI. Without getting the default 
> value reported in 'vdpa mgmtdev show' level, it'd just confuse mgmt 
> software even more.


Yes, this is something that we need to fix. What's more, in order to 
support dynamic provisioning, we need a way to report the number of 
available instances that can be used for vDPA device provisioning.


>
>>    At least some time in the past, mlx5 were
>> enabled with MQ with 8 queue pairs by default.
> That was the situation when there's no max_vqp attribute support from 
> vdpa netlink API level. I think now every driver honors the vdpa core 
> disposition to get a single queue pair if max_vqp config is missing.

So we have:

int vdpa_register_device(struct vdpa_device *vdev, int nvqs)

This means that, technically, the parent can allocate a multiqueue 
device with the _F_MQ feature if max_vqp and device_features are not 
provisioned. What's more, what happens if _F_MQ is provisioned but 
max_vqp is not specified?

The question is:

When an attribute is not specified/provisioned via netlink, what is the 
default value? The answer should be consistent: if device_features is 
determined by the parent, we should do the same for max_vqp. And it 
looks to me that all of those belong to the initial config (self-contained)


> And the mlx5_vdpa driver with 8 queue pairs in the wild days is just 
> irrelevant to be manageable by mgmt software, regardless of live 
> migration.
>>> And
>>> the device_features if inherited is displayed at 'vdpa dev config show'
>>> output. Can you remind me of a good example for inherited value that we
>>> may want to show here?
>> Some other cases:
>>
>> 1) MTU: there should be something that the device needs to report if
>> _F_MTU is negotiated even if it is not provisioned from netlink.
> I am not sure I understand the ask here. Noted the QEMU argument has 
> to offer host_mtu=X with the maximum MTU value for guest to use (and 
> applied as the initial MTU config during virtio-net probing for Linux 
> driver), 


Adding Cindy.

I think it's a known issue that we need to do a sanity check to make sure 
the CLI parameters match what is provisioned via netlink.


> and the way to get the parent device MTU and whether that's relevant 
> to vdpa device's MTU is very vendor specific.


So I think the max MTU of the parent should be equal to the max MTU of the vDPA device.


> I think we would need new attribute(s) in the mgmtdev level to support 
> what you want here?


Not sure, but what I want to ask is: if we provision the MTU feature 
but without a max MTU value, do we need to report the initial max MTU here?


>
>> 2) device_features: if device_features is not provisioned, we should
>> still report it via netlink here
> Not the way I expected it, but with Lingshan's series to expose fields 
> out of FEATURES_OK, the device_features is now reported through 'vdpa 
> dev config show' regardless being specified or not, if I am not mistaken?


Yes.


>
> Currently we export the config attributes upon vdpa creation under the 
> "initial_config" key. If we want to expose more default values 
> inherited from mgmtdev, I think we can wrap up these default values 
> under another key "inherited_config" to display in 'vdpa dev show' 
> output. Does it fit what you have in mind?


I wonder if it's better to merge those two, or is there any advantage 
to splitting them?


>
>> or do you mean the mgmt can assume it
>> should be the same as mgmtdev. Anyhow if we don't show device_features
>> if it is not provisioned, it will complicate the mgmt software.
> Yes, as I said earlier, since the device_features attr getting added 
> to the 'vdpa dev config show' command, this divergence started to 
> complicate mgmt software already.
>
> Thanks,


Thanks


> -Siwei
>> Thanks
>>
>>> Thanks,
>>> -Siwei
>>>
>>>
>>>> Thanks
>>>>
>>>>> +
>>>>> +       return 0;
>>>>> +}
>>>>> +
>>>>> +static int
>>>>>    vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 portid, u32 seq,
>>>>>                 int flags, struct netlink_ext_ack *extack)
>>>>>    {
>>>>> @@ -715,6 +750,10 @@ static int vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct genl_info *i
>>>>>           if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE, min_vq_size))
>>>>>                   goto msg_err;
>>>>>
>>>>> +       err = vdpa_dev_initcfg_fill(vdev, msg, device_id);
>>>>> +       if (err)
>>>>> +               goto msg_err;
>>>>> +
>>>>>           genlmsg_end(msg, hdr);
>>>>>           return 0;
>>>>>
>>>>> --
>>>>> 1.8.3.1
>>>>>
>


^ permalink raw reply	[flat|nested] 42+ messages in thread


* Re: [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show" output
  2022-10-26  4:44             ` Jason Wang
@ 2022-10-27  6:31               ` Si-Wei Liu
  -1 siblings, 0 replies; 42+ messages in thread
From: Si-Wei Liu @ 2022-10-27  6:31 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtualization, linux-kernel, Cindy Lu, mst



On 10/25/2022 9:44 PM, Jason Wang wrote:
>
> 在 2022/10/26 09:10, Si-Wei Liu 写道:
>>
>>
>> On 10/24/2022 7:24 PM, Jason Wang wrote:
>>> On Tue, Oct 25, 2022 at 3:14 AM Si-Wei Liu<si-wei.liu@oracle.com>  
>>> wrote:
>>>> On 10/24/2022 1:40 AM, Jason Wang wrote:
>>>>> On Sat, Oct 22, 2022 at 7:49 AM Si-Wei Liu<si-wei.liu@oracle.com>  
>>>>> wrote:
>>>>>> Live migration of vdpa would typically require re-instate vdpa
>>>>>> device with an idential set of configs on the destination node,
>>>>>> same way as how source node created the device in the first
>>>>>> place. In order to save orchestration software from memorizing
>>>>>> and keeping track of vdpa config, it will be helpful if the vdpa
>>>>>> tool provides the aids for exporting the initial configs as-is,
>>>>>> the way how vdpa device was created. The "vdpa dev show" command
>>>>>> seems to be the right vehicle for that. It is unlike the "vdpa dev
>>>>>> config show" command output which usually goes with the live value
>>>>>> in the device config space, and is not quite reliable subject to
>>>>>> the dynamics of feature negotiation or possible change by the
>>>>>> driver to the config space.
>>>>>>
>>>>>> Examples:
>>>>>>
>>>>>> 1) Create vDPA by default without any config attribute
>>>>>>
>>>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0
>>>>>> $ vdpa dev show vdpa0
>>>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 
>>>>>> max_vqs 9 max_vq_size 256
>>>>>> $ vdpa dev -jp show vdpa0
>>>>>> {
>>>>>>       "dev": {
>>>>>>           "vdpa0": {
>>>>>>               "type": "network",
>>>>>>               "mgmtdev": "pci/0000:41:04.2",
>>>>>>               "vendor_id": 5555,
>>>>>>               "max_vqs": 9,
>>>>>>               "max_vq_size": 256,
>>>>>>           }
>>>>>>       }
>>>>>> }
>>>>>>
>>>>>> 2) Create vDPA with config attribute(s) specified
>>>>>>
>>>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 \
>>>>>>       mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>>>>>> $ vdpa dev show
>>>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 
>>>>>> max_vqs 9 max_vq_size 256
>>>>>>     initial_config: mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>>>>>> $ vdpa dev -jp show
>>>>>> {
>>>>>>       "dev": {
>>>>>>           "vdpa0": {
>>>>>>               "type": "network",
>>>>>>               "mgmtdev": "pci/0000:41:04.2",
>>>>>>               "vendor_id": 5555,
>>>>>>               "max_vqs": 9,
>>>>>>               "max_vq_size": 256,
>>>>>>               "initial_config": {
>>>>>>                   "mac": "e4:11:c6:d3:45:f0",
>>>>>>                   "max_vq_pairs": 4
>>>>>>               }
>>>>>>           }
>>>>>>       }
>>>>>> }
>>>>>>
>>>>>> Signed-off-by: Si-Wei Liu<si-wei.liu@oracle.com>
>>>>>> ---
>>>>>>    drivers/vdpa/vdpa.c | 39 +++++++++++++++++++++++++++++++++++++++
>>>>>>    1 file changed, 39 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
>>>>>> index bebded6..bfb8f54 100644
>>>>>> --- a/drivers/vdpa/vdpa.c
>>>>>> +++ b/drivers/vdpa/vdpa.c
>>>>>> @@ -677,6 +677,41 @@ static int 
>>>>>> vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct 
>>>>>> genl_info *i
>>>>>>    }
>>>>>>
>>>>>>    static int
>>>>>> +vdpa_dev_initcfg_fill(struct vdpa_device *vdev, struct sk_buff 
>>>>>> *msg, u32 device_id)
>>>>>> +{
>>>>>> +       struct vdpa_dev_set_config *cfg = &vdev->init_cfg;
>>>>>> +       int err = -EMSGSIZE;
>>>>>> +
>>>>>> +       if (!cfg->mask)
>>>>>> +               return 0;
>>>>>> +
>>>>>> +       switch (device_id) {
>>>>>> +       case VIRTIO_ID_NET:
>>>>>> +               if ((cfg->mask & 
>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR)) != 0 &&
>>>>>> +                   nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
>>>>>> +                           sizeof(cfg->net.mac), cfg->net.mac))
>>>>>> +                       return err;
>>>>>> +               if ((cfg->mask & 
>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) != 0 &&
>>>>>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, 
>>>>>> cfg->net.mtu))
>>>>>> +                       return err;
>>>>>> +               if ((cfg->mask & 
>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) != 0 &&
>>>>>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
>>>>>> + cfg->net.max_vq_pairs))
>>>>>> +                       return err;
>>>>>> +               break;
>>>>>> +       default:
>>>>>> +               break;
>>>>>> +       }
>>>>>> +
>>>>>> +       if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) != 0 &&
>>>>>> +           nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES,
>>>>>> +                             cfg->device_features, VDPA_ATTR_PAD))
>>>>>> +               return err;
>>>>> A question: If any of those above attributes were not provisioned,
>>>>> should we show the ones that are inherited from the parent?
>>>> A simple answer would be yes, but the long answer is that I am not 
>>>> sure
>>>> if there's any for the moment - there's no  default value for mtu, 
>>>> mac,
>>>> and max_vqp that can be inherited from the parent (max_vqp by default
>>>> being 1 is spec defined, not something inherited from the parent).
>>> Note that it is by default from driver level that if _F_MQ is not
>>> negotiated. But I think we are talking about something different that
>>> is out of the spec here, what if:
>>>
>>> vDPA inherit _F_MQ but mqx_vqp is not provisioned via netlink.
>>>
>>> Or is it not allowed?
>> My understanding is that this is not allowed any more since the 
>> introduction of max_vqp attribute. Noted, currently we don't have a 
>> way for vendor driver to report the default value for mqx_vqp, 
>
>
> I think it can be reported in this patch?
Yes, we can add it, but I am not sure whether this will be practically 
useful: e.g. the same command without max_vqp specified may render a 
different number of queues across different devices, or across different 
revisions of the same vendor's devices. Whether that complicates the mgmt 
software even more, I'm not sure. Could we instead mandate max_vqp to be 
1 at the vdpa core level if the user doesn't explicitly specify the 
value? That way the resulting number of queue pairs (=1) is consistent 
with the case where the parent device does not 
offer the _F_MQ feature.

>
>
>> if not otherwise specified in the CLI. Without getting the default 
>> value reported in 'vdpa mgmtdev show' level, it'd just confuse mgmt 
>> software even more.
>
>
> Yes, this is something that we need to fix. And what's more in order 
> to support dynamic provisioning, we need a way to report the number of 
> available instances that could be used for vDPA device provisioning.
Wouldn't it be possible to achieve that by simply checking how many 
parent mgmtdev instances don't have a vdpa device provisioned yet 
(assuming one vdpa device per mgmtdev)? e.g.

inuse=$(vdpa dev show | grep mgmtdev | wc -l)                  # vdpa devices already created
total=$(vdpa mgmtdev show | grep "supported_classes" | wc -l)  # one line per mgmtdev
echo $((total - inuse))                                        # instances still available

>
>
>>
>>>    At least some time in the past, mlx5 were
>>> enabled with MQ with 8 queue pairs by default.
>> That was the situation when there's no max_vqp attribute support from 
>> vdpa netlink API level. I think now every driver honors the vdpa core 
>> disposition to get a single queue pair if max_vqp config is missing.
>
> So we have:
>
> int vdpa_register_device(struct vdpa_device *vdev, int nvqs)
>
> This means technically, parent can allocate a multiqueue devices with 
> _F_MQ features if max_vqp and device_features is not provisioned. And 
> what's more, what happens if _F_MQ is provisioned by max_vqp is not 
> specified?
>
> The question is:
>
> When a attribute is not specificed/provisioned via net link, what's 
> the default value? The answer should be consistent: if device_features 
> is determined by the parent, we should do the same for mqx_vqp. 
OK I got your point.

> And it looks to me all of those belongs to the initial config 
> (self-contained)
Right. I wonder if we can have the vdpa core define the default value 
(e.g. max_vqp=1) for those unspecified attributes (esp. when the 
corresponding device feature is offered and provisioned) whenever 
possible. I think that would be more consistent, with the same command 
getting the same result across different vendor drivers, while we still 
keep the possibility for a future extension to allow a driver to 
override the vdpa core disposition if a real use case emerges. What do 
you think?

>
>
>> And the mlx5_vdpa driver with 8 queue pairs in the wild days is just 
>> irrelevant to be manageable by mgmt software, regardless of live 
>> migration.
>>>> And
>>>> the device_features if inherited is displayed at 'vdpa dev config 
>>>> show'
>>>> output. Can you remind me of a good example for inherited value 
>>>> that we
>>>> may want to show here?
>>> Some other cases:
>>>
>>> 1) MTU: there should be something that the device needs to report if
>>> _F_MTU is negotiated even if it is not provisioned from netlink.
>> I am not sure I understand the ask here. Noted the QEMU argument has 
>> to offer host_mtu=X with the maximum MTU value for guest to use (and 
>> applied as the initial MTU config during virtio-net probing for Linux 
>> driver), 
>
>
> Adding Cindy.
>
> I think it's a known issue that we need to do sanity check to make 
> sure cli parameters matches what is provisioned from netlink.
Right. What's the plan for QEMU to get to the MTU provisioned via 
netlink, via a new vhost-vdpa ioctl call? If so, will QEMU be able to 
read it directly from the kernel when it comes to the vhost-vdpa 
backend, without having the user specify host_mtu on the CLI?
>
>
>> and the way to get the parent device MTU and whether that's relevant 
>> to vdpa device's MTU is very vendor specific.
>
>
> So I think the max MTU of parent should be equal to the max MTU of the 
> vDPA.
Noted, here the parent might not necessarily be the mgmtdev that the 
vdpa device gets created over. It may well end up being the MTU on the 
PF (uplink port) that the mgmt software has to concern itself with. My 
point is that the utility and tool chain able to derive the maximal MTU 
effectively allowed for the vDPA device may live outside of vDPA's 
realm. It's a rare or even invalid configuration to have vDPA configured 
with a bigger value than the MTU of the uplink port or parent device. 
More commonly, when MTU config is involved, it has to be configured 
consistently across all the network links along the path: from the 
parent device (uplink port) down to the switchdev representor port, the 
vdpa device, and the QEMU virtio-net object.

>
>
>> I think we would need new attribute(s) in the mgmtdev level to 
>> support what you want here?
>
>
> Not sure, but what I want to ask is consider we provision MTU feature 
> but without max MTU value, do we need to report the initial max MTU here?
Yep, maybe. I'm not very sure this will be very useful, to be honest; 
it seems a rare case to me to provision the MTU feature without a 
specific MTU value. If one cares about MTU, the mgmt software should 
configure some MTU through "vdpa dev add ... mtu ...", no?

On the other hand, no MTU value specified may mean "go with what the 
uplink port or parent device has". I think this is a pretty useful case 
if the vendor's NIC supports updating the MTU on the fly without having 
to tear down QEMU and reconfigure vdpa. I'm not sure if we'd end up 
killing this use case by limiting the initial max MTU to a fixed value.

>
>
>>
>>> 2) device_features: if device_features is not provisioned, we should
>>> still report it via netlink here
>> Not the way I expected it, but with Lingshan's series to expose 
>> fields out of FEATURES_OK, the device_features is now reported 
>> through 'vdpa dev config show' regardless being specified or not, if 
>> I am not mistaken?
>
>
> Yes.
Do you want me to relocate it to 'vdpa dev show', or is it okay to 
leave it there?

>
>
>>
>> Currently we export the config attributes upon vdpa creation under 
>> the "initial_config" key. If we want to expose more default values 
>> inherited from mgmtdev, I think we can wrap up these default values 
>> under another key "inherited_config" to display in 'vdpa dev show' 
>> output. Does it fit what you have in mind?
>
>
> I wonder if it's better to merge those two, or is there any advantages 
> of splitting them?
I think for the most part "initial_config" will be sufficient for the 
config attributes with "vdpa dev add" equivalents, be it a user-specified 
value, a vdpa-enforced default in the absence of user input, or a 
default overridden by the parent device. "inherited_config" will be 
useful for configs with no "vdpa dev add" equivalent, or that live 
outside of the vdpa tool, but are still important for mgmt software to 
replicate an identical vdpa setup: things like max-supported-mtu (of the 
uplink port or parent device), effective-link-speed, 
effective-link-status et al. Let's see if there's 
more when we get there.

Thanks,
-Siwei

>
>
>>
>>> or do you mean the mgmt can assume it
>>> should be the same as mgmtdev. Anyhow if we don't show device_features
>>> if it is not provisioned, it will complicate the mgmt software.
>> Yes, as I said earlier, since the device_features attr getting added 
>> to the 'vdpa dev config show' command, this divergence started to 
>> complicate mgmt software already.
>>
>> Thanks,
>
>
> Thanks
>
>
>> -Siwei
>>> Thanks
>>>
>>>> Thanks,
>>>> -Siwei
>>>>
>>>>
>>>>> Thanks
>>>>>
>>>>>> +
>>>>>> +       return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int
>>>>>>    vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg, 
>>>>>> u32 portid, u32 seq,
>>>>>>                 int flags, struct netlink_ext_ack *extack)
>>>>>>    {
>>>>>> @@ -715,6 +750,10 @@ static int 
>>>>>> vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct 
>>>>>> genl_info *i
>>>>>>           if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE, 
>>>>>> min_vq_size))
>>>>>>                   goto msg_err;
>>>>>>
>>>>>> +       err = vdpa_dev_initcfg_fill(vdev, msg, device_id);
>>>>>> +       if (err)
>>>>>> +               goto msg_err;
>>>>>> +
>>>>>>           genlmsg_end(msg, hdr);
>>>>>>           return 0;
>>>>>>
>>>>>> -- 
>>>>>> 1.8.3.1
>>>>>>
>>
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show" output
@ 2022-10-27  6:31               ` Si-Wei Liu
  0 siblings, 0 replies; 42+ messages in thread
From: Si-Wei Liu @ 2022-10-27  6:31 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, parav, virtualization, linux-kernel, Cindy Lu



On 10/25/2022 9:44 PM, Jason Wang wrote:
>
> 在 2022/10/26 09:10, Si-Wei Liu 写道:
>>
>>
>> On 10/24/2022 7:24 PM, Jason Wang wrote:
>>> On Tue, Oct 25, 2022 at 3:14 AM Si-Wei Liu<si-wei.liu@oracle.com>  
>>> wrote:
>>>> On 10/24/2022 1:40 AM, Jason Wang wrote:
>>>>> On Sat, Oct 22, 2022 at 7:49 AM Si-Wei Liu<si-wei.liu@oracle.com>  
>>>>> wrote:
>>>>>> Live migration of vdpa would typically require re-instate vdpa
>>>>>> device with an idential set of configs on the destination node,
>>>>>> same way as how source node created the device in the first
>>>>>> place. In order to save orchestration software from memorizing
>>>>>> and keeping track of vdpa config, it will be helpful if the vdpa
>>>>>> tool provides the aids for exporting the initial configs as-is,
>>>>>> the way how vdpa device was created. The "vdpa dev show" command
>>>>>> seems to be the right vehicle for that. It is unlike the "vdpa dev
>>>>>> config show" command output which usually goes with the live value
>>>>>> in the device config space, and is not quite reliable subject to
>>>>>> the dynamics of feature negotiation or possible change by the
>>>>>> driver to the config space.
>>>>>>
>>>>>> Examples:
>>>>>>
>>>>>> 1) Create vDPA by default without any config attribute
>>>>>>
>>>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0
>>>>>> $ vdpa dev show vdpa0
>>>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 
>>>>>> max_vqs 9 max_vq_size 256
>>>>>> $ vdpa dev -jp show vdpa0
>>>>>> {
>>>>>>       "dev": {
>>>>>>           "vdpa0": {
>>>>>>               "type": "network",
>>>>>>               "mgmtdev": "pci/0000:41:04.2",
>>>>>>               "vendor_id": 5555,
>>>>>>               "max_vqs": 9,
>>>>>>               "max_vq_size": 256,
>>>>>>           }
>>>>>>       }
>>>>>> }
>>>>>>
>>>>>> 2) Create vDPA with config attribute(s) specified
>>>>>>
>>>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 \
>>>>>>       mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>>>>>> $ vdpa dev show
>>>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 
>>>>>> max_vqs 9 max_vq_size 256
>>>>>>     initial_config: mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>>>>>> $ vdpa dev -jp show
>>>>>> {
>>>>>>       "dev": {
>>>>>>           "vdpa0": {
>>>>>>               "type": "network",
>>>>>>               "mgmtdev": "pci/0000:41:04.2",
>>>>>>               "vendor_id": 5555,
>>>>>>               "max_vqs": 9,
>>>>>>               "max_vq_size": 256,
>>>>>>               "initial_config": {
>>>>>>                   "mac": "e4:11:c6:d3:45:f0",
>>>>>>                   "max_vq_pairs": 4
>>>>>>               }
>>>>>>           }
>>>>>>       }
>>>>>> }
>>>>>>
>>>>>> Signed-off-by: Si-Wei Liu<si-wei.liu@oracle.com>
>>>>>> ---
>>>>>>    drivers/vdpa/vdpa.c | 39 +++++++++++++++++++++++++++++++++++++++
>>>>>>    1 file changed, 39 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
>>>>>> index bebded6..bfb8f54 100644
>>>>>> --- a/drivers/vdpa/vdpa.c
>>>>>> +++ b/drivers/vdpa/vdpa.c
>>>>>> @@ -677,6 +677,41 @@ static int 
>>>>>> vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct 
>>>>>> genl_info *i
>>>>>>    }
>>>>>>
>>>>>>    static int
>>>>>> +vdpa_dev_initcfg_fill(struct vdpa_device *vdev, struct sk_buff 
>>>>>> *msg, u32 device_id)
>>>>>> +{
>>>>>> +       struct vdpa_dev_set_config *cfg = &vdev->init_cfg;
>>>>>> +       int err = -EMSGSIZE;
>>>>>> +
>>>>>> +       if (!cfg->mask)
>>>>>> +               return 0;
>>>>>> +
>>>>>> +       switch (device_id) {
>>>>>> +       case VIRTIO_ID_NET:
>>>>>> +               if ((cfg->mask & 
>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR)) != 0 &&
>>>>>> +                   nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
>>>>>> +                           sizeof(cfg->net.mac), cfg->net.mac))
>>>>>> +                       return err;
>>>>>> +               if ((cfg->mask & 
>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) != 0 &&
>>>>>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, 
>>>>>> cfg->net.mtu))
>>>>>> +                       return err;
>>>>>> +               if ((cfg->mask & 
>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) != 0 &&
>>>>>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
>>>>>> + cfg->net.max_vq_pairs))
>>>>>> +                       return err;
>>>>>> +               break;
>>>>>> +       default:
>>>>>> +               break;
>>>>>> +       }
>>>>>> +
>>>>>> +       if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) != 0 &&
>>>>>> +           nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES,
>>>>>> +                             cfg->device_features, VDPA_ATTR_PAD))
>>>>>> +               return err;
>>>>> A question: If any of those above attributes were not provisioned,
>>>>> should we show the ones that are inherited from the parent?
>>>> A simple answer would be yes, but the long answer is that I am not 
>>>> sure
>>>> if there's any for the moment - there's no  default value for mtu, 
>>>> mac,
>>>> and max_vqp that can be inherited from the parent (max_vqp by default
>>>> being 1 is spec defined, not something inherited from the parent).
>>> Note that it is by default from driver level that if _F_MQ is not
>>> negotiated. But I think we are talking about something different that
>>> is out of the spec here, what if:
>>>
>>> vDPA inherit _F_MQ but mqx_vqp is not provisioned via netlink.
>>>
>>> Or is it not allowed?
>> My understanding is that this is not allowed any more since the 
>> introduction of max_vqp attribute. Noted, currently we don't have a 
>> way for vendor driver to report the default value for mqx_vqp, 
>
>
> I think it can be reported in this patch?
Yes, we can add it, but I am not sure whether or not this will be
practically useful; e.g. the same command without max_vqp specified may
render a different number of queues across different devices, or across
different revisions of the same vendor's devices. Whether that
complicates the mgmt software even more, I'm not sure. Could we instead
mandate max_vqp to be 1 at the vdpa core level if the user doesn't
explicitly specify the value? That way it is more consistent in terms of
the resulting number of queue pairs (=1) with the case where the parent
device does not offer the _F_MQ feature.

>
>
>> if not otherwise specified in the CLI. Without getting the default 
>> value reported in 'vdpa mgmtdev show' level, it'd just confuse mgmt 
>> software even more.
>
>
> Yes, this is something that we need to fix. And what's more in order 
> to support dynamic provisioning, we need a way to report the number of 
> available instances that could be used for vDPA device provisioning.
Wouldn't it be possible to achieve that by simply checking how many
parent mgmtdev instances don't have a vdpa device provisioned yet? e.g.

inuse=$(vdpa dev show | grep mgmtdev | wc -l)
total=$(vdpa mgmtdev show  | grep "supported_classes" | wc -l )
echo $((total - inuse))

>
>
>>
>>>    At least some time in the past, mlx5 were
>>> enabled with MQ with 8 queue pairs by default.
>> That was the situation when there's no max_vqp attribute support from 
>> vdpa netlink API level. I think now every driver honors the vdpa core 
>> disposition to get a single queue pair if max_vqp config is missing.
>
> So we have:
>
> int vdpa_register_device(struct vdpa_device *vdev, int nvqs)
>
> This means technically, the parent can allocate a multiqueue device
> with the _F_MQ feature if max_vqp and device_features are not
> provisioned. And what's more, what happens if _F_MQ is provisioned
> but max_vqp is not specified?
>
> The question is:
>
> When an attribute is not specified/provisioned via netlink, what's
> the default value? The answer should be consistent: if device_features
> is determined by the parent, we should do the same for max_vqp.
OK I got your point.

> And it looks to me all of those belongs to the initial config 
> (self-contained)
Right. I wonder if we can have the vdpa core define the default value
(e.g. max_vqp=1) for those unspecified attributes (esp. when the
corresponding device feature is offered and provisioned) whenever
possible. I think that would make the same command get to the same
result more consistently across different vendor drivers, while we
still keep the possibility of a future extension to allow drivers to
override the vdpa core disposition if a real use case emerges. What do
you think?

>
>
>> And the mlx5_vdpa driver with 8 queue pairs in the wild days is just 
>> irrelevant to be manageable by mgmt software, regardless of live 
>> migration.
>>>> And
>>>> the device_features if inherited is displayed at 'vdpa dev config 
>>>> show'
>>>> output. Can you remind me of a good example for inherited value 
>>>> that we
>>>> may want to show here?
>>> Some other cases:
>>>
>>> 1) MTU: there should be something that the device needs to report if
>>> _F_MTU is negotiated even if it is not provisioned from netlink.
>> I am not sure I understand the ask here. Noted the QEMU argument has 
>> to offer host_mtu=X with the maximum MTU value for guest to use (and 
>> applied as the initial MTU config during virtio-net probing for Linux 
>> driver), 
>
>
> Adding Cindy.
>
> I think it's a known issue that we need to do sanity check to make 
> sure cli parameters matches what is provisioned from netlink.
Right. What's the plan for QEMU to get the mtu provisioned via netlink,
a new vhost-vdpa ioctl call? If so, will QEMU be able to read it
directly from the kernel when it comes to the vhost-vdpa backend,
without having the user specify host_mtu on the CLI?
>
>
>> and the way to get the parent device MTU and whether that's relevant 
>> to vdpa device's MTU is very vendor specific.
>
>
> So I think the max MTU of parent should be equal to the max MTU of the 
> vDPA.
Note that the parent here might not necessarily be the mgmtdev over
which the vdpa device gets created. It may well end up being the MTU on
the PF (uplink port), which the mgmt software has to be concerned with.
My point is that the utility and tool chain able to derive the maximal
MTU effectively allowed for a vDPA device may live outside of vDPA's
realm. It's a rare or even invalid configuration to have vDPA configured
with a bigger value than the MTU on the uplink port or parent device.
More commonly, when MTU config is involved, it has to be configured
consistently across all the network links along the path, from the
parent device (uplink port) down to the switchdev representor port, the
vdpa device, and the QEMU virtio-net object.

>
>
>> I think we would need new attribute(s) in the mgmtdev level to 
>> support what you want here?
>
>
> Not sure, but what I want to ask is consider we provision MTU feature 
> but without max MTU value, do we need to report the initial max MTU here?
Yep, maybe. To be honest I'm not sure how useful this would be, since
it seems a rare case to me to provision the MTU feature without a
specific MTU value. If one cares about MTU, mgmt software should
configure some mtu through "vdpa dev add ... mtu ...", no?

On the other hand, no mtu value specified may mean "go with what the
uplink port or parent device has". I think this is a pretty useful case
if the vendor's NIC supports updating MTU on the fly without having to
tear down QEMU and reconfigure vdpa. I'm not sure if we would end up
killing this use case by limiting the initial max MTU to a fixed value.

>
>
>>
>>> 2) device_features: if device_features is not provisioned, we should
>>> still report it via netlink here
>> Not the way I expected it, but with Lingshan's series to expose 
>> fields out of FEATURES_OK, the device_features is now reported 
>> through 'vdpa dev config show' regardless being specified or not, if 
>> I am not mistaken?
>
>
> Yes.
Do you want me to relocate it to 'vdpa dev show', or is it okay to
leave it there?

>
>
>>
>> Currently we export the config attributes upon vdpa creation under 
>> the "initial_config" key. If we want to expose more default values 
>> inherited from mgmtdev, I think we can wrap up these default values 
>> under another key "inherited_config" to display in 'vdpa dev show' 
>> output. Does it fit what you have in mind?
>
>
> I wonder if it's better to merge those two, or is there any advantages 
> of splitting them?
I think for the most part "initial_config" will be sufficient for those
config attributes with "vdpa dev add" equivalents, be it user specified,
a vdpa enforced default in the absence of user input, or a default
overridden by the parent device. "inherited_config" will be useful for
the configs with no "vdpa dev add" equivalent, or that live outside of
the vdpa tool but are still important for mgmt software to replicate an
identical vdpa setup. Like max-supported-mtu (for the uplink port or
parent device), effective-link-speed, effective-link-status et al. Let's
see if there's more when we get there.

Thanks,
-Siwei

>
>
>>
>>> or do you mean the mgmt can assume it
>>> should be the same as mgmtdev. Anyhow if we don't show device_features
>>> if it is not provisioned, it will complicate the mgmt software.
>> Yes, as I said earlier, since the device_features attr getting added 
>> to the 'vdpa dev config show' command, this divergence started to 
>> complicate mgmt software already.
>>
>> Thanks,
>
>
> Thanks
>
>
>> -Siwei
>>> Thanks
>>>
>>>> Thanks,
>>>> -Siwei
>>>>
>>>>
>>>>> Thanks
>>>>>
>>>>>> +
>>>>>> +       return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int
>>>>>>    vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg, 
>>>>>> u32 portid, u32 seq,
>>>>>>                 int flags, struct netlink_ext_ack *extack)
>>>>>>    {
>>>>>> @@ -715,6 +750,10 @@ static int 
>>>>>> vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct 
>>>>>> genl_info *i
>>>>>>           if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE, 
>>>>>> min_vq_size))
>>>>>>                   goto msg_err;
>>>>>>
>>>>>> +       err = vdpa_dev_initcfg_fill(vdev, msg, device_id);
>>>>>> +       if (err)
>>>>>> +               goto msg_err;
>>>>>> +
>>>>>>           genlmsg_end(msg, hdr);
>>>>>>           return 0;
>>>>>>
>>>>>> -- 
>>>>>> 1.8.3.1
>>>>>>
>>
>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show" output
  2022-10-27  6:31               ` Si-Wei Liu
@ 2022-10-27  8:47                 ` Jason Wang
  -1 siblings, 0 replies; 42+ messages in thread
From: Jason Wang @ 2022-10-27  8:47 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: mst, parav, virtualization, linux-kernel, Cindy Lu

On Thu, Oct 27, 2022 at 2:31 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 10/25/2022 9:44 PM, Jason Wang wrote:
> >
> > 在 2022/10/26 09:10, Si-Wei Liu 写道:
> >>
> >>
> >> On 10/24/2022 7:24 PM, Jason Wang wrote:
> >>> On Tue, Oct 25, 2022 at 3:14 AM Si-Wei Liu<si-wei.liu@oracle.com>
> >>> wrote:
> >>>> On 10/24/2022 1:40 AM, Jason Wang wrote:
> >>>>> On Sat, Oct 22, 2022 at 7:49 AM Si-Wei Liu<si-wei.liu@oracle.com>
> >>>>> wrote:
> >>>>>> Live migration of vdpa would typically require re-instate vdpa
> >>>>>> device with an idential set of configs on the destination node,
> >>>>>> same way as how source node created the device in the first
> >>>>>> place. In order to save orchestration software from memorizing
> >>>>>> and keeping track of vdpa config, it will be helpful if the vdpa
> >>>>>> tool provides the aids for exporting the initial configs as-is,
> >>>>>> the way how vdpa device was created. The "vdpa dev show" command
> >>>>>> seems to be the right vehicle for that. It is unlike the "vdpa dev
> >>>>>> config show" command output which usually goes with the live value
> >>>>>> in the device config space, and is not quite reliable subject to
> >>>>>> the dynamics of feature negotiation or possible change by the
> >>>>>> driver to the config space.
> >>>>>>
> >>>>>> Examples:
> >>>>>>
> >>>>>> 1) Create vDPA by default without any config attribute
> >>>>>>
> >>>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0
> >>>>>> $ vdpa dev show vdpa0
> >>>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555
> >>>>>> max_vqs 9 max_vq_size 256
> >>>>>> $ vdpa dev -jp show vdpa0
> >>>>>> {
> >>>>>>       "dev": {
> >>>>>>           "vdpa0": {
> >>>>>>               "type": "network",
> >>>>>>               "mgmtdev": "pci/0000:41:04.2",
> >>>>>>               "vendor_id": 5555,
> >>>>>>               "max_vqs": 9,
> >>>>>>               "max_vq_size": 256,
> >>>>>>           }
> >>>>>>       }
> >>>>>> }
> >>>>>>
> >>>>>> 2) Create vDPA with config attribute(s) specified
> >>>>>>
> >>>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 \
> >>>>>>       mac e4:11:c6:d3:45:f0 max_vq_pairs 4
> >>>>>> $ vdpa dev show
> >>>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555
> >>>>>> max_vqs 9 max_vq_size 256
> >>>>>>     initial_config: mac e4:11:c6:d3:45:f0 max_vq_pairs 4
> >>>>>> $ vdpa dev -jp show
> >>>>>> {
> >>>>>>       "dev": {
> >>>>>>           "vdpa0": {
> >>>>>>               "type": "network",
> >>>>>>               "mgmtdev": "pci/0000:41:04.2",
> >>>>>>               "vendor_id": 5555,
> >>>>>>               "max_vqs": 9,
> >>>>>>               "max_vq_size": 256,
> >>>>>>               "initial_config": {
> >>>>>>                   "mac": "e4:11:c6:d3:45:f0",
> >>>>>>                   "max_vq_pairs": 4
> >>>>>>               }
> >>>>>>           }
> >>>>>>       }
> >>>>>> }
> >>>>>>
> >>>>>> Signed-off-by: Si-Wei Liu<si-wei.liu@oracle.com>
> >>>>>> ---
> >>>>>>    drivers/vdpa/vdpa.c | 39 +++++++++++++++++++++++++++++++++++++++
> >>>>>>    1 file changed, 39 insertions(+)
> >>>>>>
> >>>>>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> >>>>>> index bebded6..bfb8f54 100644
> >>>>>> --- a/drivers/vdpa/vdpa.c
> >>>>>> +++ b/drivers/vdpa/vdpa.c
> >>>>>> @@ -677,6 +677,41 @@ static int
> >>>>>> vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct
> >>>>>> genl_info *i
> >>>>>>    }
> >>>>>>
> >>>>>>    static int
> >>>>>> +vdpa_dev_initcfg_fill(struct vdpa_device *vdev, struct sk_buff
> >>>>>> *msg, u32 device_id)
> >>>>>> +{
> >>>>>> +       struct vdpa_dev_set_config *cfg = &vdev->init_cfg;
> >>>>>> +       int err = -EMSGSIZE;
> >>>>>> +
> >>>>>> +       if (!cfg->mask)
> >>>>>> +               return 0;
> >>>>>> +
> >>>>>> +       switch (device_id) {
> >>>>>> +       case VIRTIO_ID_NET:
> >>>>>> +               if ((cfg->mask &
> >>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR)) != 0 &&
> >>>>>> +                   nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
> >>>>>> +                           sizeof(cfg->net.mac), cfg->net.mac))
> >>>>>> +                       return err;
> >>>>>> +               if ((cfg->mask &
> >>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) != 0 &&
> >>>>>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU,
> >>>>>> cfg->net.mtu))
> >>>>>> +                       return err;
> >>>>>> +               if ((cfg->mask &
> >>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) != 0 &&
> >>>>>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
> >>>>>> + cfg->net.max_vq_pairs))
> >>>>>> +                       return err;
> >>>>>> +               break;
> >>>>>> +       default:
> >>>>>> +               break;
> >>>>>> +       }
> >>>>>> +
> >>>>>> +       if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) != 0 &&
> >>>>>> +           nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES,
> >>>>>> +                             cfg->device_features, VDPA_ATTR_PAD))
> >>>>>> +               return err;
> >>>>> A question: If any of those above attributes were not provisioned,
> >>>>> should we show the ones that are inherited from the parent?
> >>>> A simple answer would be yes, but the long answer is that I am not
> >>>> sure
> >>>> if there's any for the moment - there's no  default value for mtu,
> >>>> mac,
> >>>> and max_vqp that can be inherited from the parent (max_vqp by default
> >>>> being 1 is spec defined, not something inherited from the parent).
> >>> Note that it is by default from driver level that if _F_MQ is not
> >>> negotiated. But I think we are talking about something different that
> >>> is out of the spec here, what if:
> >>>
> >>> vDPA inherit _F_MQ but mqx_vqp is not provisioned via netlink.
> >>>
> >>> Or is it not allowed?
> >> My understanding is that this is not allowed any more since the
> >> introduction of max_vqp attribute. Noted, currently we don't have a
> >> way for vendor driver to report the default value for mqx_vqp,
> >
> >
> > I think it can be reported in this patch?
> Yes, we can add, but I am not sure whether or not this will be
> practically useful, for e.g. the same command without max_vqp specified
> may render different number of queues across different devices, or
> different revisions of the same vendor's devices. Does it complicate the
> mgmt software even more, I'm not sure....

It depends on the use case, e.g. if we want to compare migration
compatibility, having a single vdpa command query is much easier than
having two or more.

> Could we instead mandate
> max_vqp to be 1 from vdpa core level if user doesn't explicitly specify
> the value?

This seems not easy, at least not in the vDPA core. We can probably
document this somewhere, but max_vqp is only one example; we have other
mq devices like block/SCSI/console.

> That way it is more consistent in terms of the resulting
> number of queue pairs (=1) with the case where parent device does not
> offer the _F_MQ feature.

Right, but a corner case is to provision _F_MQ but without max_vqp.

>
> >
> >
> >> if not otherwise specified in the CLI. Without getting the default
> >> value reported in 'vdpa mgmtdev show' level, it'd just confuse mgmt
> >> software even more.
> >
> >
> > Yes, this is something that we need to fix. And what's more in order
> > to support dynamic provisioning, we need a way to report the number of
> > available instances that could be used for vDPA device provisioning.
> Wouldn't it be possible to achieve that by simply checking how many
> parent mgmtdev instances don't have vdpa device provisioned yet? e.g.
>
> inuse=$(vdpa dev show | grep mgmtdev | wc -l)
> total=$(vdpa mgmtdev show  | grep "supported_classes" | wc -l )
> echo $((total - inuse))

I meant: how many vDPA devices are available for the mgmt to create?

E.g. in the case of a subfunction or the simulator, a mgmtdev can
create more than one vdpa device.

>
> >
> >
> >>
> >>>    At least some time in the past, mlx5 were
> >>> enabled with MQ with 8 queue pairs by default.
> >> That was the situation when there's no max_vqp attribute support from
> >> vdpa netlink API level. I think now every driver honors the vdpa core
> >> disposition to get a single queue pair if max_vqp config is missing.
> >
> > So we have:
> >
> > int vdpa_register_device(struct vdpa_device *vdev, int nvqs)
> >
> > This means technically, the parent can allocate a multiqueue device
> > with the _F_MQ feature if max_vqp and device_features are not
> > provisioned. And what's more, what happens if _F_MQ is provisioned
> > but max_vqp is not specified?
> >
> > The question is:
> >
> > When an attribute is not specified/provisioned via netlink, what's
> > the default value? The answer should be consistent: if device_features
> > is determined by the parent, we should do the same for max_vqp.
> OK I got your point.
>
> > And it looks to me all of those belongs to the initial config
> > (self-contained)
> Right. I wonder if we can have vdpa core define the default value (for
> e.g. max_vqp=1) for those unspecified attribute (esp. when the
> corresponding device feature is offered and provisioned) whenever
> possible. Which I think it'll be more consistent for the same command to
> get to the same result between different vendor drivers. While we still
> keep the possibility for future extension to allow driver override the
> vdpa core disposition if the real use case emerges. What do you think?

That's possible, but we may end up with device-specific code in the
vDPA core, which is not elegant, and the code will grow as the number
of supported types grows.

Note that max_vqp is not the only attribute that may suffer from
this; basically any config field that depends on a specific feature
bit may have the same issue.

>
> >
> >
> >> And the mlx5_vdpa driver with 8 queue pairs in the wild days is just
> >> irrelevant to be manageable by mgmt software, regardless of live
> >> migration.
> >>>> And
> >>>> the device_features if inherited is displayed at 'vdpa dev config
> >>>> show'
> >>>> output. Can you remind me of a good example for inherited value
> >>>> that we
> >>>> may want to show here?
> >>> Some other cases:
> >>>
> >>> 1) MTU: there should be something that the device needs to report if
> >>> _F_MTU is negotiated even if it is not provisioned from netlink.
> >> I am not sure I understand the ask here. Noted the QEMU argument has
> >> to offer host_mtu=X with the maximum MTU value for guest to use (and
> >> applied as the initial MTU config during virtio-net probing for Linux
> >> driver),
> >
> >
> > Adding Cindy.
> >
> > I think it's a known issue that we need to do sanity check to make
> > sure cli parameters matches what is provisioned from netlink.
> Right. How's the plan for QEMU to get to the mtu provisioned by netlink,
> via a new vhost-vdpa ioctl call?

I think netlink is not designed for QEMU to use; the design is to
expose a vhost device to QEMU.

> If so, will  QEMU be able to read it
> directly from kernel when it comes to the vhost-vdpa backend, without
> having user to specify host_mtu from CLI?

I'm not sure I get the question, but QEMU should get this via the
config space (otherwise it would be a bug). And QEMU needs to verify
the mtu it got from the cli against the mtu it got from vhost, and
fail the device initialization if they don't match.

> >
> >
> >> and the way to get the parent device MTU and whether that's relevant
> >> to vdpa device's MTU is very vendor specific.
> >
> >
> > So I think the max MTU of parent should be equal to the max MTU of the
> > vDPA.
> Noted here the parent might not be necessarily the mgmtdev where vdpa
> gets created over. It may well end up with the MTU on the PF (uplink
> port) which the mgmt software has to concern with. My point is the
> utility and tool chain able to derive the maximal MTU effectively
> allowed for vDPA device may live out of vDPA's realm. It's a rare or
> even invalid configuration to have vDPA configured with a bigger value
> than the MTU on the uplink port or parent device. It's more common when
> MTU config is involved, it has to be consistently configured across all
> the network links along, from parent device (uplink port) down to the
> switchdev representor port, vdpa device, and QEMU virtio-net object.

Ok, right.

>
> >
> >
> >> I think we would need new attribute(s) in the mgmtdev level to
> >> support what you want here?
> >
> >
> > Not sure, but what I want to ask is consider we provision MTU feature
> > but without max MTU value, do we need to report the initial max MTU here?
> Yep, maybe. I'm not very sure if this will be very useful to be honest,
> consider it's kinda a rare case to me were to provision MTU feature
> without a specific MTU value. If one cares about MTU, mgmt software
> should configure some mtu through "vdpa dev add ... mtu ...", no?

Yes, but this only works if all config fields can be provisioned,
which seems not to be the case now; vdpa_dev_set_config is currently a
subset of virtio_net_config. So this goes back to the question I
raised earlier: is it time to switch to using virtio_net_config and
allow all fields to be provisioned?

And even for mtu we're lacking a way to report the maximum MTU allowed
by the mgmtdev (e.g. the uplink MTU) via netlink:

1) report the maximum host mtu supported by the mgmtdev via netlink
(not done, so management needs to guess the maximum value now)
2) allow mtu to be provisioned (done)
3) show initial mtu (done by this patch)

We probably need to do the above for all fields to be self-contained.

>
> On the other hand, no mtu value specified may mean "go with what the
> uplink port or parent device has". I think this is a pretty useful case
> if the vendor's NIC supports updating MTU on the fly without having to
> tear down QEMU and reconfigure vdpa. I'm not sure if we end up with
> killing this use case by limiting initial max MTU to a fixed value.
>
> >
> >
> >>
> >>> 2) device_features: if device_features is not provisioned, we should
> >>> still report it via netlink here
> >> Not the way I expected it, but with Lingshan's series to expose
> >> fields out of FEATURES_OK, the device_features is now reported
> >> through 'vdpa dev config show' regardless being specified or not, if
> >> I am not mistaken?
> >
> >
> > Yes.
> Do you want me to relocate to 'vdpa dev show', or it's okay to leave it
> behind there?

It's probably too late for the relocation, but I feel it's better to
place all the initial/inherited attributes into a single command even
if some of them already appear somewhere in another command; we can
hear from others, though.

>
> >
> >
> >>
> >> Currently we export the config attributes upon vdpa creation under
> >> the "initial_config" key. If we want to expose more default values
> >> inherited from mgmtdev, I think we can wrap up these default values
> >> under another key "inherited_config" to display in 'vdpa dev show'
> >> output. Does it fit what you have in mind?
> >
> >
> > I wonder if it's better to merge those two, or is there any advantages
> > of splitting them?
> I think for the most part "initial_config" will be sufficient for those
> config attributes with "vdpa dev add" equivalents, be it user specified,
> vdpa enforced default if missing user input, or default overridden by
> the parent device. "inherited_config" will be useful for the configs
> with no "vdpa dev add" equivalent or live out side of vdpa tool, but
> still important for mgmt software to replicate identical vdpa setup.
> Like max-supported-mtu (for the uplink port or parent device),
> effective-link-speed, effective-link-status et al. Let's see if there's
> more when we get there.

So one point I can see is that, if there's no difference from the
userspace perspective, we'd better merge them. And I don't see any
difference between the initial versus inherited from the view of user
space. Do you?

Thanks

>
> Thanks,
> -Siwei
>
> >
> >
> >>
> >>> or do you mean the mgmt can assume it
> >>> should be the same as mgmtdev. Anyhow if we don't show device_features
> >>> if it is not provisioned, it will complicate the mgmt software.
> >> Yes, as I said earlier, since the device_features attr got added
> >> to the 'vdpa dev config show' command, this divergence started to
> >> complicate mgmt software already.
> >>
> >> Thanks,
> >
> >
> > Thanks
> >
> >
> >> -Siwei
> >>> Thanks
> >>>
> >>>> Thanks,
> >>>> -Siwei
> >>>>
> >>>>
> >>>>> Thanks
> >>>>>
> >>>>>> +
> >>>>>> +       return 0;
> >>>>>> +}
> >>>>>> +
> >>>>>> +static int
> >>>>>>    vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg,
> >>>>>> u32 portid, u32 seq,
> >>>>>>                 int flags, struct netlink_ext_ack *extack)
> >>>>>>    {
> >>>>>> @@ -715,6 +750,10 @@ static int
> >>>>>> vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct
> >>>>>> genl_info *i
> >>>>>>           if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE,
> >>>>>> min_vq_size))
> >>>>>>                   goto msg_err;
> >>>>>>
> >>>>>> +       err = vdpa_dev_initcfg_fill(vdev, msg, device_id);
> >>>>>> +       if (err)
> >>>>>> +               goto msg_err;
> >>>>>> +
> >>>>>>           genlmsg_end(msg, hdr);
> >>>>>>           return 0;
> >>>>>>
> >>>>>> --
> >>>>>> 1.8.3.1
> >>>>>>
> >>
> >
>


^ permalink raw reply	[flat|nested] 42+ messages in thread


* Re: [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show" output
  2022-10-27  8:47                 ` Jason Wang
@ 2022-10-28 23:23                   ` Si-Wei Liu
  -1 siblings, 0 replies; 42+ messages in thread
From: Si-Wei Liu @ 2022-10-28 23:23 UTC (permalink / raw)
  To: Jason Wang, Eli Cohen; +Cc: mst, parav, virtualization, linux-kernel, Cindy Lu



On 10/27/2022 1:47 AM, Jason Wang wrote:
> On Thu, Oct 27, 2022 at 2:31 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 10/25/2022 9:44 PM, Jason Wang wrote:
>>> 在 2022/10/26 09:10, Si-Wei Liu 写道:
>>>>
>>>> On 10/24/2022 7:24 PM, Jason Wang wrote:
>>>>> On Tue, Oct 25, 2022 at 3:14 AM Si-Wei Liu<si-wei.liu@oracle.com>
>>>>> wrote:
>>>>>> On 10/24/2022 1:40 AM, Jason Wang wrote:
>>>>>>> On Sat, Oct 22, 2022 at 7:49 AM Si-Wei Liu<si-wei.liu@oracle.com>
>>>>>>> wrote:
>>>>>>>> Live migration of vdpa would typically require re-instating the
>>>>>>>> vdpa device with an identical set of configs on the destination node,
>>>>>>>> same way as how source node created the device in the first
>>>>>>>> place. In order to save orchestration software from memorizing
>>>>>>>> and keeping track of vdpa config, it will be helpful if the vdpa
>>>>>>>> tool provides the aids for exporting the initial configs as-is,
>>>>>>>> the way how vdpa device was created. The "vdpa dev show" command
>>>>>>>> seems to be the right vehicle for that. It is unlike the "vdpa dev
>>>>>>>> config show" command output which usually goes with the live value
>>>>>>>> in the device config space, and is not quite reliable subject to
>>>>>>>> the dynamics of feature negotiation or possible change by the
>>>>>>>> driver to the config space.
>>>>>>>>
>>>>>>>> Examples:
>>>>>>>>
>>>>>>>> 1) Create vDPA by default without any config attribute
>>>>>>>>
>>>>>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0
>>>>>>>> $ vdpa dev show vdpa0
>>>>>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555
>>>>>>>> max_vqs 9 max_vq_size 256
>>>>>>>> $ vdpa dev -jp show vdpa0
>>>>>>>> {
>>>>>>>>        "dev": {
>>>>>>>>            "vdpa0": {
>>>>>>>>                "type": "network",
>>>>>>>>                "mgmtdev": "pci/0000:41:04.2",
>>>>>>>>                "vendor_id": 5555,
>>>>>>>>                "max_vqs": 9,
>>>>>>>>                "max_vq_size": 256,
>>>>>>>>            }
>>>>>>>>        }
>>>>>>>> }
>>>>>>>>
>>>>>>>> 2) Create vDPA with config attribute(s) specified
>>>>>>>>
>>>>>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 \
>>>>>>>>        mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>>>>>>>> $ vdpa dev show
>>>>>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555
>>>>>>>> max_vqs 9 max_vq_size 256
>>>>>>>>      initial_config: mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>>>>>>>> $ vdpa dev -jp show
>>>>>>>> {
>>>>>>>>        "dev": {
>>>>>>>>            "vdpa0": {
>>>>>>>>                "type": "network",
>>>>>>>>                "mgmtdev": "pci/0000:41:04.2",
>>>>>>>>                "vendor_id": 5555,
>>>>>>>>                "max_vqs": 9,
>>>>>>>>                "max_vq_size": 256,
>>>>>>>>                "initial_config": {
>>>>>>>>                    "mac": "e4:11:c6:d3:45:f0",
>>>>>>>>                    "max_vq_pairs": 4
>>>>>>>>                }
>>>>>>>>            }
>>>>>>>>        }
>>>>>>>> }
>>>>>>>>
>>>>>>>> Signed-off-by: Si-Wei Liu<si-wei.liu@oracle.com>
>>>>>>>> ---
>>>>>>>>     drivers/vdpa/vdpa.c | 39 +++++++++++++++++++++++++++++++++++++++
>>>>>>>>     1 file changed, 39 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
>>>>>>>> index bebded6..bfb8f54 100644
>>>>>>>> --- a/drivers/vdpa/vdpa.c
>>>>>>>> +++ b/drivers/vdpa/vdpa.c
>>>>>>>> @@ -677,6 +677,41 @@ static int
>>>>>>>> vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct
>>>>>>>> genl_info *i
>>>>>>>>     }
>>>>>>>>
>>>>>>>>     static int
>>>>>>>> +vdpa_dev_initcfg_fill(struct vdpa_device *vdev, struct sk_buff
>>>>>>>> *msg, u32 device_id)
>>>>>>>> +{
>>>>>>>> +       struct vdpa_dev_set_config *cfg = &vdev->init_cfg;
>>>>>>>> +       int err = -EMSGSIZE;
>>>>>>>> +
>>>>>>>> +       if (!cfg->mask)
>>>>>>>> +               return 0;
>>>>>>>> +
>>>>>>>> +       switch (device_id) {
>>>>>>>> +       case VIRTIO_ID_NET:
>>>>>>>> +               if ((cfg->mask &
>>>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR)) != 0 &&
>>>>>>>> +                   nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
>>>>>>>> +                           sizeof(cfg->net.mac), cfg->net.mac))
>>>>>>>> +                       return err;
>>>>>>>> +               if ((cfg->mask &
>>>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) != 0 &&
>>>>>>>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU,
>>>>>>>> cfg->net.mtu))
>>>>>>>> +                       return err;
>>>>>>>> +               if ((cfg->mask &
>>>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) != 0 &&
>>>>>>>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
>>>>>>>> + cfg->net.max_vq_pairs))
>>>>>>>> +                       return err;
>>>>>>>> +               break;
>>>>>>>> +       default:
>>>>>>>> +               break;
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) != 0 &&
>>>>>>>> +           nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES,
>>>>>>>> +                             cfg->device_features, VDPA_ATTR_PAD))
>>>>>>>> +               return err;
>>>>>>> A question: If any of those above attributes were not provisioned,
>>>>>>> should we show the ones that are inherited from the parent?
>>>>>> A simple answer would be yes, but the long answer is that I am not
>>>>>> sure
>>>>>> if there's any for the moment - there's no default value for mtu,
>>>>>> mac,
>>>>>> and max_vqp that can be inherited from the parent (max_vqp by default
>>>>>> being 1 is spec defined, not something inherited from the parent).
>>>>> Note that this is the default at the driver level if _F_MQ is not
>>>>> negotiated. But I think we are talking about something different that
>>>>> is out of the spec here, what if:
>>>>>
>>>>> vDPA inherit _F_MQ but mqx_vqp is not provisioned via netlink.
>>>>>
>>>>> Or is it not allowed?
>>>> My understanding is that this is not allowed any more since the
>>>> introduction of max_vqp attribute. Noted, currently we don't have a
>>>> way for vendor driver to report the default value for mqx_vqp,
>>>
>>> I think it can be reported in this patch?
>> Yes, we can add, but I am not sure whether or not this will be
>> practically useful, for e.g. the same command without max_vqp specified
>> may render different number of queues across different devices, or
>> different revisions of the same vendor's devices. Does it complicate the
>> mgmt software even more, I'm not sure....
> It depends on the use case, e.g if we want to compare the migration
> compatibility, having a single vdpa command query is much easier than
> having two or more.
Yep I agree. I was saying not every attribute would need to be inherited
from the parent device. Actually, attributes like max_vqp could take the
default from some common place, e.g. some default value can be applied
by the vdpa core. And we can document these core-ruled attributes in the
vdpa-dev(8) man page. That reduces the extra call where mgmt software
has to issue another query command it doesn't actually need.

>
>> Could we instead mandate
>> max_vqp to be 1 from vdpa core level if user doesn't explicitly specify
>> the value?
> This seems to be not easy, at least not easy in the vDPA core.
We can load these default values from vdpa_nl_cmd_dev_add_set_doit() 
before ops->dev_add is called. I can post a v3 that shows the code, it 
shouldn't be too hard.

>   We can
> probably document this somewhere but max_vqp is only one example, we
> have other mq devices like block/SCSI/console.
Actually max_vqp is a network-device-specific config to provision mq
devices. If the parent mgmtdev supports net vdpa device creation and the
user requests to provision _F_MQ with no supplied max_vqp value, we
should load some global default value there.

>
>> That way it is more consistent in terms of the resulting
>> number of queue pairs (=1) with the case where parent device does not
>> offer the _F_MQ feature.
> Right, but a corner case is to provision _F_MQ but without max_vqp.
Yes, I will post the patch that supports this.
>
>>>
>>>> if not otherwise specified in the CLI. Without getting the default
>>>> value reported in 'vdpa mgmtdev show' level, it'd just confuse mgmt
>>>> software even more.
>>>
>>> Yes, this is something that we need to fix. And what's more in order
>>> to support dynamic provisioning, we need a way to report the number of
>>> available instances that could be used for vDPA device provisioning.
>> Wouldn't it be possible to achieve that by simply checking how many
>> parent mgmtdev instances don't have vdpa device provisioned yet? e.g.
>>
>> inuse=$(vdpa dev show | grep mgmtdev | wc -l)
>> total=$(vdpa mgmtdev show  | grep "supported_classes" | wc -l )
>> echo $((total - inuse))
> I meant how many available vDPA devices that are available for the
> mgmt to create?
Oh I see.

>
> E.g in the case of sub function or simulator a mgmtdev can create more
> than 1 vdpa devices.
Does the sub function today support creation of multiple vDPA instances
per mgmtdev? That's something I wasn't aware of before. Is it with a
different device class?

>
>>>
>>>>>     At least some time in the past, mlx5 were
>>>>> enabled with MQ with 8 queue pairs by default.
>>>> That was the situation when there's no max_vqp attribute support from
>>>> vdpa netlink API level. I think now every driver honors the vdpa core
>>>> disposition to get a single queue pair if max_vqp config is missing.
>>> So we have:
>>>
>>> int vdpa_register_device(struct vdpa_device *vdev, int nvqs)
>>>
>>> This means technically, parent can allocate a multiqueue devices with
>>> _F_MQ features if max_vqp and device_features is not provisioned. And
>>> what's more, what happens if _F_MQ is provisioned by max_vqp is not
>>> specified?
>>>
>>> The question is:
>>>
>>> When an attribute is not specified/provisioned via netlink, what's
>>> the default value? The answer should be consistent: if device_features
>>> is determined by the parent, we should do the same for mqx_vqp.
>> OK I got your point.
>>
>>> And it looks to me all of those belongs to the initial config
>>> (self-contained)
>> Right. I wonder if we can have vdpa core define the default value (for
>> e.g. max_vqp=1) for those unspecified attribute (esp. when the
>> corresponding device feature is offered and provisioned) whenever
>> possible. Which I think it'll be more consistent for the same command to
>> get to the same result between different vendor drivers. While we still
>> keep the possibility for future extension to allow driver override the
>> vdpa core disposition if the real use case emerges. What do you think?
> That's possible but we may end up with device specific code in the
> vDPA core which is not elegant, and the code will grow as the number
> of supported types grows.
I guess that's unavoidable, as this is already the case today. See the
various VIRTIO_ID_NET switch cases in the vdpa.c code. As long as
type-specific code limits itself to the netlink API interfacing layer
rather than reaching down to the driver API, it might be just okay (as
that's already the case).

>
> Note that, max_vqp is not the only attribute that may suffer from
> this, basically any config field that depends on a specific feature
> bit may have the same issue.
>
>>>
>>>> And the mlx5_vdpa driver with 8 queue pairs in the wild days is just
>>>> irrelevant to be manageable by mgmt software, regardless of live
>>>> migration.
>>>>>> And
>>>>>> the device_features if inherited is displayed at 'vdpa dev config
>>>>>> show'
>>>>>> output. Can you remind me of a good example for inherited value
>>>>>> that we
>>>>>> may want to show here?
>>>>> Some other cases:
>>>>>
>>>>> 1) MTU: there should be something that the device needs to report if
>>>>> _F_MTU is negotiated even if it is not provisioned from netlink.
>>>> I am not sure I understand the ask here. Noted the QEMU argument has
>>>> to offer host_mtu=X with the maximum MTU value for guest to use (and
>>>> applied as the initial MTU config during virtio-net probing for Linux
>>>> driver),
>>>
>>> Adding Cindy.
>>>
>>> I think it's a known issue that we need to do sanity check to make
>>> sure cli parameters matches what is provisioned from netlink.
>> Right. How's the plan for QEMU to get to the mtu provisioned by netlink,
>> via a new vhost-vdpa ioctl call?
> I think netlink is not designed for qemu to use, the design is to
> expose a vhost device to Qemu.
>
>> If so, will  QEMU be able to read it
>> directly from kernel when it comes to the vhost-vdpa backend, without
>> having user to specify host_mtu from CLI?
> I'm not sure I get the question, but Qemu should get this via config
> space (otherwise it should be a bug).
It's hard for QEMU to work this way with the existing get_config() ops, 
I think, as it has assumptions around endianness and feature 
negotiation; until the latter is done you can't get any reliable value 
for a provisioned property. I think QEMU needs to validate the 
provisioned value much earlier (when QEMU is launched), before 
negotiation kicks in. It would be cleaner to use another vhost ioctl and 
a new vdpa driver op to retrieve the provisioned feature config values 
from vendor drivers.

>   And Qemu need to verify the mtu
> got from cli vs the mtu got from vhost and fail the device
> initialization if they don't match.
I mean today there's a problem of double provisioning: e.g. the mtu has 
to first be provided in the 'vdpa dev add' command to provision _F_MTU, 
then the same value has to be supplied to host_mtu in the QEMU CLI. The 
same applies to the mac address. It would be best if we could allow QEMU 
to load the provisioned value from the vdpa device directly, without 
having to provide extra duplicated configs at the QEMU CLI level.

>
>>>
>>>> and the way to get the parent device MTU and whether that's relevant
>>>> to vdpa device's MTU is very vendor specific.
>>>
>>> So I think the max MTU of parent should be equal to the max MTU of the
>>> vDPA.
>> Noted here the parent might not be necessarily the mgmtdev where vdpa
>> gets created over. It may well end up with the MTU on the PF (uplink
>> port) which the mgmt software has to concern with. My point is the
>> utility and tool chain able to derive the maximal MTU effectively
>> allowed for vDPA device may live out of vDPA's realm. It's a rare or
>> even invalid configuration to have vDPA configured with a bigger value
>> than the MTU on the uplink port or parent device. It's more common when
>> MTU config is involved, it has to be consistently configured across all
>> the network links along, from parent device (uplink port) down to the
>> switchdev representor port, vdpa device, and QEMU virtio-net object.
> Ok, right.
>
>>>
>>>> I think we would need new attribute(s) in the mgmtdev level to
>>>> support what you want here?
>>>
>>> Not sure, but what I want to ask is consider we provision MTU feature
>>> but without max MTU value, do we need to report the initial max MTU here?
>> Yep, maybe. I'm not very sure if this will be very useful to be honest,
>> consider it's kinda a rare case to me were to provision MTU feature
>> without a specific MTU value. If one cares about MTU, mgmt software
>> should configure some mtu through "vdpa dev add ... mtu ...", no?
> Yes, but this only works if all config fields could be provisioned,
> which seems not the case now, vdpa_dev_set_config is currently a
> subset of virtio_net_config. So this goes back to the question I
> raised earlier. Is the time to switch to use virtio_net_config and
> allow all fields to be provisioned?
I don't quite get how switching to virtio_net_config would be useful. 
I thought we could add the missing fields to vdpa_dev_set_config even 
now to make it match virtio_net_config. Though the reality is there are 
few vdpa devices that support those features now. If any real device 
supports a feature field that is in virtio_net_config but not in 
vdpa_dev_set_config, it can be gradually added as needed.

>
> And even for mtu we're lacking a way to report the maximum MTU allowed
> by mgmt dev (e.g the uplink MTU via netlink):
Since MTU is only implemented in mlx5_vdpa for now, except for the 
simulators, copying Eli to see if this is feasible to implement in real 
devices. I think we also need to validate that the mtu configured on a 
vDPA device instance doesn't exceed the uplink MTU (the maximum MTU 
allowed).

> 1) report the maximum host mtu supported by the mgmtdev via netlink
> (not done, so management needs to guess the maximum value now)
> 2) allow mtu to be provisioned (done)
> 3) show initial mtu (done by this patch)
So I wonder, is it fine for the vdpa core to come up with a default 
value for MTU when the _F_MTU feature is to be provisioned or inherited? 
If we mandate that each vDPA vendor support at least the standard 1500 
MTU for the _F_MTU feature, we can make it default to 1500.

Otherwise the MTU has to be taken (inherited) from the parent device. 
Unfortunately, right now for mlx5_vdpa the parent mgmtdev device has a 
1500 MTU by default regardless of the MTU on the uplink port, and I'm 
not sure it's the right model to enforce that the mgmtdev goes with the 
uplink port's MTU. I would need to hear what vendors say about this 
requirement.

>
> We probably need to do the above for all fields to be self-contained.
Agreed on the part of being self-contained.

>
>> On the other hand, no mtu value specified may mean "go with what the
>> uplink port or parent device has". I think this is a pretty useful case
>> if the vendor's NIC supports updating MTU on the fly without having to
>> tear down QEMU and reconfigure vdpa. I'm not sure if we end up with
>> killing this use case by limiting initial max MTU to a fixed value.
>>
>>>
>>>>> 2) device_features: if device_features is not provisioned, we should
>>>>> still report it via netlink here
>>>> Not the way I expected it, but with Lingshan's series to expose
>>>> fields out of FEATURES_OK, the device_features is now reported
>>>> through 'vdpa dev config show' regardless being specified or not, if
>>>> I am not mistaken?
>>>
>>> Yes.
>> Do you want me to relocate to 'vdpa dev show', or it's okay to leave it
>> behind there?
> It's probably too late for the relocation but I feel it's better to
> place all the initial/inherited attributes into a single command even
> if some of them are already somewhere in another command, but we can
> hear from others.
Ok, that'll be fine. I suppose mgmt software should only query through 
"mgmtdev show" or "dev show", avoiding any query via "dev config show". 
It'd be best to get all of the compatibility related info consolidated 
in one single place. Let me try to include it in "dev show".

>
>>>
>>>> Currently we export the config attributes upon vdpa creation under
>>>> the "initial_config" key. If we want to expose more default values
>>>> inherited from mgmtdev, I think we can wrap up these default values
>>>> under another key "inherited_config" to display in 'vdpa dev show'
>>>> output. Does it fit what you have in mind?
>>>
>>> I wonder if it's better to merge those two, or is there any advantages
>>> of splitting them?
>> I think for the most part "initial_config" will be sufficient for those
>> config attributes with "vdpa dev add" equivalents, be it user specified,
>> vdpa enforced default if missing user input, or default overridden by
>> the parent device. "inherited_config" will be useful for the configs
>> with no "vdpa dev add" equivalent or live out side of vdpa tool, but
>> still important for mgmt software to replicate identical vdpa setup.
>> Like max-supported-mtu (for the uplink port or parent device),
>> effective-link-speed, effective-link-status et al. Let's see if there's
>> more when we get there.
> So one point I can see is that, if there's no difference from the
> userspace perspective, we'd better merge them. And I don't see any
> difference between the initial versus inherited from the view of user
> space. Do you?
So the major difference is that "initial_config" is settable and 
equivalent to the config attributes in the "vdpa dev add" command, while 
"inherited_config" holds the read-only fields from "mgmtdev show" that 
do not correspond to any "vdpa dev add" attribute. That way the mgmt 
software can use "initial_config" directly to recreate a vdpa device 
with an identical config, while using "inherited_config" to replicate 
the other configs outside of vdpa, e.g. setting the uplink port's MTU to 
9000. Maybe there's no need to fold such info into an "inherited_config" 
key, though; I just want to make it relevant to migration compatibility. 
Any suggestion for the name or layout?


Thanks,
-Siwei

>
> Thanks
>
>> Thanks,
>> -Siwei
>>
>>>
>>>>> or do you mean the mgmt can assume it
>>>>> should be the same as mgmtdev. Anyhow if we don't show device_features
>>>>> if it is not provisioned, it will complicate the mgmt software.
>>>> Yes, as I said earlier, since the device_features attr getting added
>>>> to the 'vdpa dev config show' command, this divergence started to
>>>> complicate mgmt software already.
>>>>
>>>> Thanks,
>>>
>>> Thanks
>>>
>>>
>>>> -Siwei
>>>>> Thanks
>>>>>
>>>>>> Thanks,
>>>>>> -Siwei
>>>>>>
>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>> +
>>>>>>>> +       return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static int
>>>>>>>>     vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg,
>>>>>>>> u32 portid, u32 seq,
>>>>>>>>                  int flags, struct netlink_ext_ack *extack)
>>>>>>>>     {
>>>>>>>> @@ -715,6 +750,10 @@ static int
>>>>>>>> vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct
>>>>>>>> genl_info *i
>>>>>>>>            if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE,
>>>>>>>> min_vq_size))
>>>>>>>>                    goto msg_err;
>>>>>>>>
>>>>>>>> +       err = vdpa_dev_initcfg_fill(vdev, msg, device_id);
>>>>>>>> +       if (err)
>>>>>>>> +               goto msg_err;
>>>>>>>> +
>>>>>>>>            genlmsg_end(msg, hdr);
>>>>>>>>            return 0;
>>>>>>>>
>>>>>>>> --
>>>>>>>> 1.8.3.1
>>>>>>>>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show" output
@ 2022-10-28 23:23                   ` Si-Wei Liu
  0 siblings, 0 replies; 42+ messages in thread
From: Si-Wei Liu @ 2022-10-28 23:23 UTC (permalink / raw)
  To: Jason Wang, Eli Cohen; +Cc: virtualization, linux-kernel, Cindy Lu, mst



On 10/27/2022 1:47 AM, Jason Wang wrote:
> On Thu, Oct 27, 2022 at 2:31 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 10/25/2022 9:44 PM, Jason Wang wrote:
>>> 在 2022/10/26 09:10, Si-Wei Liu 写道:
>>>>
>>>> On 10/24/2022 7:24 PM, Jason Wang wrote:
>>>>> On Tue, Oct 25, 2022 at 3:14 AM Si-Wei Liu<si-wei.liu@oracle.com>
>>>>> wrote:
>>>>>> On 10/24/2022 1:40 AM, Jason Wang wrote:
>>>>>>> On Sat, Oct 22, 2022 at 7:49 AM Si-Wei Liu<si-wei.liu@oracle.com>
>>>>>>> wrote:
>>>>>>>> Live migration of vdpa would typically require re-instate vdpa
>>>>>>>> device with an idential set of configs on the destination node,
>>>>>>>> same way as how source node created the device in the first
>>>>>>>> place. In order to save orchestration software from memorizing
>>>>>>>> and keeping track of vdpa config, it will be helpful if the vdpa
>>>>>>>> tool provides the aids for exporting the initial configs as-is,
>>>>>>>> the way how vdpa device was created. The "vdpa dev show" command
>>>>>>>> seems to be the right vehicle for that. It is unlike the "vdpa dev
>>>>>>>> config show" command output which usually goes with the live value
>>>>>>>> in the device config space, and is not quite reliable subject to
>>>>>>>> the dynamics of feature negotiation or possible change by the
>>>>>>>> driver to the config space.
>>>>>>>>
>>>>>>>> Examples:
>>>>>>>>
>>>>>>>> 1) Create vDPA by default without any config attribute
>>>>>>>>
>>>>>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0
>>>>>>>> $ vdpa dev show vdpa0
>>>>>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555
>>>>>>>> max_vqs 9 max_vq_size 256
>>>>>>>> $ vdpa dev -jp show vdpa0
>>>>>>>> {
>>>>>>>>        "dev": {
>>>>>>>>            "vdpa0": {
>>>>>>>>                "type": "network",
>>>>>>>>                "mgmtdev": "pci/0000:41:04.2",
>>>>>>>>                "vendor_id": 5555,
>>>>>>>>                "max_vqs": 9,
>>>>>>>>                "max_vq_size": 256,
>>>>>>>>            }
>>>>>>>>        }
>>>>>>>> }
>>>>>>>>
>>>>>>>> 2) Create vDPA with config attribute(s) specified
>>>>>>>>
>>>>>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 \
>>>>>>>>        mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>>>>>>>> $ vdpa dev show
>>>>>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555
>>>>>>>> max_vqs 9 max_vq_size 256
>>>>>>>>      initial_config: mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>>>>>>>> $ vdpa dev -jp show
>>>>>>>> {
>>>>>>>>        "dev": {
>>>>>>>>            "vdpa0": {
>>>>>>>>                "type": "network",
>>>>>>>>                "mgmtdev": "pci/0000:41:04.2",
>>>>>>>>                "vendor_id": 5555,
>>>>>>>>                "max_vqs": 9,
>>>>>>>>                "max_vq_size": 256,
>>>>>>>>                "initial_config": {
>>>>>>>>                    "mac": "e4:11:c6:d3:45:f0",
>>>>>>>>                    "max_vq_pairs": 4
>>>>>>>>                }
>>>>>>>>            }
>>>>>>>>        }
>>>>>>>> }
>>>>>>>>
>>>>>>>> Signed-off-by: Si-Wei Liu<si-wei.liu@oracle.com>
>>>>>>>> ---
>>>>>>>>     drivers/vdpa/vdpa.c | 39 +++++++++++++++++++++++++++++++++++++++
>>>>>>>>     1 file changed, 39 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
>>>>>>>> index bebded6..bfb8f54 100644
>>>>>>>> --- a/drivers/vdpa/vdpa.c
>>>>>>>> +++ b/drivers/vdpa/vdpa.c
>>>>>>>> @@ -677,6 +677,41 @@ static int
>>>>>>>> vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct
>>>>>>>> genl_info *i
>>>>>>>>     }
>>>>>>>>
>>>>>>>>     static int
>>>>>>>> +vdpa_dev_initcfg_fill(struct vdpa_device *vdev, struct sk_buff
>>>>>>>> *msg, u32 device_id)
>>>>>>>> +{
>>>>>>>> +       struct vdpa_dev_set_config *cfg = &vdev->init_cfg;
>>>>>>>> +       int err = -EMSGSIZE;
>>>>>>>> +
>>>>>>>> +       if (!cfg->mask)
>>>>>>>> +               return 0;
>>>>>>>> +
>>>>>>>> +       switch (device_id) {
>>>>>>>> +       case VIRTIO_ID_NET:
>>>>>>>> +               if ((cfg->mask &
>>>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR)) != 0 &&
>>>>>>>> +                   nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
>>>>>>>> +                           sizeof(cfg->net.mac), cfg->net.mac))
>>>>>>>> +                       return err;
>>>>>>>> +               if ((cfg->mask &
>>>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) != 0 &&
>>>>>>>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU,
>>>>>>>> cfg->net.mtu))
>>>>>>>> +                       return err;
>>>>>>>> +               if ((cfg->mask &
>>>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) != 0 &&
>>>>>>>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
>>>>>>>> + cfg->net.max_vq_pairs))
>>>>>>>> +                       return err;
>>>>>>>> +               break;
>>>>>>>> +       default:
>>>>>>>> +               break;
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) != 0 &&
>>>>>>>> +           nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES,
>>>>>>>> +                             cfg->device_features, VDPA_ATTR_PAD))
>>>>>>>> +               return err;
>>>>>>> A question: If any of those above attributes were not provisioned,
>>>>>>> should we show the ones that are inherited from the parent?
>>>>>> A simple answer would be yes, but the long answer is that I am not
>>>>>> sure
>>>>>> if there's any for the moment - there's no  default value for mtu,
>>>>>> mac,
>>>>>> and max_vqp that can be inherited from the parent (max_vqp by default
>>>>>> being 1 is spec defined, not something inherited from the parent).
>>>>> Note that it is by default from driver level that if _F_MQ is not
>>>>> negotiated. But I think we are talking about something different that
>>>>> is out of the spec here, what if:
>>>>>
>>>>> vDPA inherits _F_MQ but max_vqp is not provisioned via netlink.
>>>>>
>>>>> Or is it not allowed?
>>>> My understanding is that this is not allowed any more since the
>>>> introduction of max_vqp attribute. Noted, currently we don't have a
>>>> way for the vendor driver to report the default value for max_vqp,
>>>
>>> I think it can be reported in this patch?
>> Yes, we can add, but I am not sure whether or not this will be
>> practically useful, for e.g. the same command without max_vqp specified
>> may render different number of queues across different devices, or
>> different revisions of the same vendor's devices. Does it complicate the
>> mgmt software even more, I'm not sure....
> It depends on the use case, e.g if we want to compare the migration
> compatibility, having a single vdpa command query is much easier than
> having two or more.
Yep, I agree. I was saying that not every attribute would need to be 
inherited from the parent device. Attributes like max_vqp could take 
their default from some common place, e.g. a default value applied by 
the vdpa core. We can then document the attributes ruled by the vdpa 
core in the vdpa-dev(8) man page, saving mgmt software the extra query 
command it doesn't actually need to issue.

>
>> Could we instead mandate
>> max_vqp to be 1 from vdpa core level if user doesn't explicitly specify
>> the value?
> This seems to be not easy, at least not easy in the vDPA core.
We can load these default values from vdpa_nl_cmd_dev_add_set_doit() 
before ops->dev_add is called. I can post a v3 that shows the code; it 
shouldn't be too hard.

>   We can
> probably document this somewhere but max_vqp is only one example, we
> have other mq devices like block/SCSI/console.
Actually max_vqp is a network device specific config for provisioning 
mq devices. If the parent mgmtdev supports net vdpa device creation and 
the user requests to provision _F_MQ with no max_vqp value supplied, we 
should load some global default value there.

>
>> That way it is more consistent in terms of the resulting
>> number of queue pairs (=1) with the case where parent device does not
>> offer the _F_MQ feature.
> Right, but a corner case is to provision _F_MQ but without max_vqp.
Yes, I will post the patch that supports this.
>
>>>
>>>> if not otherwise specified in the CLI. Without getting the default
>>>> value reported in 'vdpa mgmtdev show' level, it'd just confuse mgmt
>>>> software even more.
>>>
>>> Yes, this is something that we need to fix. And what's more in order
>>> to support dynamic provisioning, we need a way to report the number of
>>> available instances that could be used for vDPA device provisioning.
>> Wouldn't it be possible to achieve that by simply checking how many
>> parent mgmtdev instances don't have vdpa device provisioned yet? e.g.
>>
>> inuse=$(vdpa dev show | grep mgmtdev | wc -l)
>> total=$(vdpa mgmtdev show  | grep "supported_classes" | wc -l )
>> echo $((total - inuse))
> I meant how many available vDPA devices that are available for the
> mgmt to create?
Oh I see.

>
> E.g in the case of sub function or simulator a mgmtdev can create more
> than 1 vdpa devices.
Does the sub function today support creation of multiple vDPA 
instances per mgmtdev? That's something I wasn't aware of before. Is it 
with a different device class?

>
>>>
>>>>>     At least some time in the past, mlx5 were
>>>>> enabled with MQ with 8 queue pairs by default.
>>>> That was the situation when there's no max_vqp attribute support from
>>>> vdpa netlink API level. I think now every driver honors the vdpa core
>>>> disposition to get a single queue pair if max_vqp config is missing.
>>> So we have:
>>>
>>> int vdpa_register_device(struct vdpa_device *vdev, int nvqs)
>>>
>>> This means technically, parent can allocate a multiqueue devices with
>>> _F_MQ features if max_vqp and device_features is not provisioned. And
>>> what's more, what happens if _F_MQ is provisioned by max_vqp is not
>>> specified?
>>>
>>> The question is:
>>>
>>> When a attribute is not specificed/provisioned via net link, what's
>>> the default value? The answer should be consistent: if device_features
>>> is determined by the parent, we should do the same for max_vqp.
>> OK I got your point.
>>
>>> And it looks to me all of those belongs to the initial config
>>> (self-contained)
>> Right. I wonder if we can have vdpa core define the default value (for
>> e.g. max_vqp=1) for those unspecified attribute (esp. when the
>> corresponding device feature is offered and provisioned) whenever
>> possible. Which I think it'll be more consistent for the same command to
>> get to the same result between different vendor drivers. While we still
>> keep the possibility for future extension to allow driver override the
>> vdpa core disposition if the real use case emerges. What do you think?
> That's possible but we may end up with device specific code in the
> vDPA core which is not elegant, and the code will grow as the number
> of supported types grows.
I guess that's unavoidable, as this is already the case today. See the 
various VIRTIO_ID_NET case switches in the vdpa.c code. As long as the 
type-specific code limits itself to the netlink API interfacing layer 
rather than reaching down to the driver API, it might be just okay (as 
that's already the case).

>
> Note that, max_vqp is not the only attribute that may suffer from
> this, basically any config field that depends on a specific feature
> bit may have the same issue.
>
>>>
>>>> And the mlx5_vdpa driver with 8 queue pairs in the wild days is just
>>>> irrelevant to be manageable by mgmt software, regardless of live
>>>> migration.
>>>>>> And
>>>>>> the device_features if inherited is displayed at 'vdpa dev config
>>>>>> show'
>>>>>> output. Can you remind me of a good example for inherited value
>>>>>> that we
>>>>>> may want to show here?
>>>>> Some other cases:
>>>>>
>>>>> 1) MTU: there should be something that the device needs to report if
>>>>> _F_MTU is negotiated even if it is not provisioned from netlink.
>>>> I am not sure I understand the ask here. Noted the QEMU argument has
>>>> to offer host_mtu=X with the maximum MTU value for guest to use (and
>>>> applied as the initial MTU config during virtio-net probing for Linux
>>>> driver),
>>>
>>> Adding Cindy.
>>>
>>> I think it's a known issue that we need to do sanity check to make
>>> sure cli parameters matches what is provisioned from netlink.
>> Right. How's the plan for QEMU to get to the mtu provisioned by netlink,
>> via a new vhost-vdpa ioctl call?
> I think netlink is not designed for qemu to use, the design is to
> expose a vhost device to Qemu.
>
>> If so, will  QEMU be able to read it
>> directly from kernel when it comes to the vhost-vdpa backend, without
>> having user to specify host_mtu from CLI?
> I'm not sure I get the question, but Qemu should get this via config
> space (otherwise it should be a bug).
It's hard for QEMU to work this way with the existing get_config() ops, 
I think, as it has assumptions around endianness and feature 
negotiation; until the latter is done you can't get any reliable value 
for a provisioned property. I think QEMU needs to validate the 
provisioned value much earlier (when QEMU is launched), before 
negotiation kicks in. It would be cleaner to use another vhost ioctl and 
a new vdpa driver op to retrieve the provisioned feature config values 
from vendor drivers.

>   And Qemu need to verify the mtu
> got from cli vs the mtu got from vhost and fail the device
> initialization if they don't match.
I mean today there's a problem of double provisioning: e.g. the mtu has 
to first be provided in the 'vdpa dev add' command to provision _F_MTU, 
then the same value has to be supplied to host_mtu in the QEMU CLI. The 
same applies to the mac address. It would be best if we could allow QEMU 
to load the provisioned value from the vdpa device directly, without 
having to provide extra duplicated configs at the QEMU CLI level.

>
>>>
>>>> and the way to get the parent device MTU and whether that's relevant
>>>> to vdpa device's MTU is very vendor specific.
>>>
>>> So I think the max MTU of parent should be equal to the max MTU of the
>>> vDPA.
>> Noted here the parent might not be necessarily the mgmtdev where vdpa
>> gets created over. It may well end up with the MTU on the PF (uplink
>> port) which the mgmt software has to concern with. My point is the
>> utility and tool chain able to derive the maximal MTU effectively
>> allowed for vDPA device may live out of vDPA's realm. It's a rare or
>> even invalid configuration to have vDPA configured with a bigger value
>> than the MTU on the uplink port or parent device. It's more common when
>> MTU config is involved, it has to be consistently configured across all
>> the network links along, from parent device (uplink port) down to the
>> switchdev representor port, vdpa device, and QEMU virtio-net object.
> Ok, right.
>
>>>
>>>> I think we would need new attribute(s) in the mgmtdev level to
>>>> support what you want here?
>>>
>>> Not sure, but what I want to ask is consider we provision MTU feature
>>> but without max MTU value, do we need to report the initial max MTU here?
>> Yep, maybe. I'm not very sure if this will be very useful to be honest,
>> consider it's kinda a rare case to me were to provision MTU feature
>> without a specific MTU value. If one cares about MTU, mgmt software
>> should configure some mtu through "vdpa dev add ... mtu ...", no?
> Yes, but this only works if all config fields could be provisioned,
> which seems not the case now, vdpa_dev_set_config is currently a
> subset of virtio_net_config. So this goes back to the question I
> raised earlier. Is the time to switch to use virtio_net_config and
> allow all fields to be provisioned?
I don't quite get how switching to virtio_net_config would be useful. 
I thought we could add the missing fields to vdpa_dev_set_config even 
now to make it match virtio_net_config. Though the reality is there are 
few vdpa devices that support those features now. If any real device 
supports a feature field that is in virtio_net_config but not in 
vdpa_dev_set_config, it can be gradually added as needed.

>
> And even for mtu we're lacking a way to report the maximum MTU allowed
> by mgmt dev (e.g the uplink MTU via netlink):
Since MTU is only implemented in mlx5_vdpa for now, except for the 
simulators, copying Eli to see if this is feasible to implement in real 
devices. I think we also need to validate that the mtu configured on a 
vDPA device instance doesn't exceed the uplink MTU (the maximum MTU 
allowed).

> 1) report the maximum host mtu supported by the mgmtdev via netlink
> (not done, so management needs to guess the maximum value now)
> 2) allow mtu to be provisioned (done)
> 3) show initial mtu (done by this patch)
So I wonder, is it fine for the vdpa core to come up with a default 
value for MTU when the _F_MTU feature is to be provisioned or inherited? 
If we mandate that each vDPA vendor support at least the standard 1500 
MTU for the _F_MTU feature, we can make it default to 1500.

Otherwise the MTU has to be taken (inherited) from the parent device. 
Unfortunately, right now for mlx5_vdpa the parent mgmtdev device has a 
1500 MTU by default regardless of the MTU on the uplink port, and I'm 
not sure it's the right model to enforce that the mgmtdev goes with the 
uplink port's MTU. I would need to hear what vendors say about this 
requirement.

>
> We probably need to do the above for all fields to be self-contained.
Agreed on the part of being self-contained.

>
>> On the other hand, no mtu value specified may mean "go with what the
>> uplink port or parent device has". I think this is a pretty useful case
>> if the vendor's NIC supports updating MTU on the fly without having to
>> tear down QEMU and reconfigure vdpa. I'm not sure if we end up with
>> killing this use case by limiting initial max MTU to a fixed value.
>>
>>>
>>>>> 2) device_features: if device_features is not provisioned, we should
>>>>> still report it via netlink here
>>>> Not the way I expected it, but with Lingshan's series to expose
>>>> fields out of FEATURES_OK, the device_features is now reported
>>>> through 'vdpa dev config show' regardless being specified or not, if
>>>> I am not mistaken?
>>>
>>> Yes.
>> Do you want me to relocate to 'vdpa dev show', or it's okay to leave it
>> behind there?
> It's probably too late for the relocation but I feel it's better to
> place all the initial/inherited attributes into a single command even
> if some of them are already somewhere in another command, but we can
> hear from others.
Ok, that'll be fine. I suppose mgmt software should only query through 
"mgmtdev show" or "dev show", avoiding any query via "dev config show". 
It'd be best to get all of the compatibility related info consolidated 
in one single place. Let me try to include it in "dev show".

>
>>>
>>>> Currently we export the config attributes upon vdpa creation under
>>>> the "initial_config" key. If we want to expose more default values
>>>> inherited from mgmtdev, I think we can wrap up these default values
>>>> under another key "inherited_config" to display in 'vdpa dev show'
>>>> output. Does it fit what you have in mind?
>>>
>>> I wonder if it's better to merge those two, or is there any advantages
>>> of splitting them?
>> I think for the most part "initial_config" will be sufficient for those
>> config attributes with "vdpa dev add" equivalents, be it user specified,
>> vdpa enforced default if missing user input, or default overridden by
>> the parent device. "inherited_config" will be useful for the configs
>> with no "vdpa dev add" equivalent or live out side of vdpa tool, but
>> still important for mgmt software to replicate identical vdpa setup.
>> Like max-supported-mtu (for the uplink port or parent device),
>> effective-link-speed, effective-link-status et al. Let's see if there's
>> more when we get there.
> So one point I can see is that, if there's no difference from the
> userspace perspective, we'd better merge them. And I don't see any
> difference between the initial versus inherited from the view of user
> space. Do you?
So the major difference is that "initial_config" is settable and 
equivalent to the config attributes in the "vdpa dev add" command, while 
"inherited_config" holds the read-only fields from "mgmtdev show" that 
do not correspond to any "vdpa dev add" attribute. That way the mgmt 
software can use "initial_config" directly to recreate a vdpa device 
with an identical config, while using "inherited_config" to replicate 
the other configs outside of vdpa, e.g. setting the uplink port's MTU to 
9000. Maybe there's no need to fold such info into an "inherited_config" 
key, though; I just want to make it relevant to migration compatibility. 
Any suggestion for the name or layout?
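For the layout, one possibility would keep the two keys side by side in 
the "vdpa dev -jp show" output. This is purely hypothetical output; the 
field names under "inherited_config" are illustrations taken from the 
examples mentioned above, not existing attributes:

```json
{
    "dev": {
        "vdpa0": {
            "type": "network",
            "mgmtdev": "pci/0000:41:04.2",
            "initial_config": {
                "mac": "e4:11:c6:d3:45:f0",
                "mtu": 9000
            },
            "inherited_config": {
                "max_supported_mtu": 9000,
                "effective_link_speed": "100G",
                "effective_link_status": "up"
            }
        }
    }
}
```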


Thanks,
-Siwei

>
> Thanks
>
>> Thanks,
>> -Siwei
>>
>>>
>>>>> or do you mean the mgmt can assume it
>>>>> should be the same as mgmtdev. Anyhow if we don't show device_features
>>>>> if it is not provisioned, it will complicate the mgmt software.
>>>> Yes, as I said earlier, since the device_features attr getting added
>>>> to the 'vdpa dev config show' command, this divergence started to
>>>> complicate mgmt software already.
>>>>
>>>> Thanks,
>>>
>>> Thanks
>>>
>>>
>>>> -Siwei
>>>>> Thanks
>>>>>
>>>>>> Thanks,
>>>>>> -Siwei
>>>>>>
>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>> +
>>>>>>>> +       return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static int
>>>>>>>>     vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg,
>>>>>>>> u32 portid, u32 seq,
>>>>>>>>                  int flags, struct netlink_ext_ack *extack)
>>>>>>>>     {
>>>>>>>> @@ -715,6 +750,10 @@ static int
>>>>>>>> vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct
>>>>>>>> genl_info *i
>>>>>>>>            if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE,
>>>>>>>> min_vq_size))
>>>>>>>>                    goto msg_err;
>>>>>>>>
>>>>>>>> +       err = vdpa_dev_initcfg_fill(vdev, msg, device_id);
>>>>>>>> +       if (err)
>>>>>>>> +               goto msg_err;
>>>>>>>> +
>>>>>>>>            genlmsg_end(msg, hdr);
>>>>>>>>            return 0;
>>>>>>>>
>>>>>>>> --
>>>>>>>> 1.8.3.1
>>>>>>>>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show" output
  2022-10-28 23:23                   ` Si-Wei Liu
  (?)
@ 2022-10-30 13:36                   ` Eli Cohen
  -1 siblings, 0 replies; 42+ messages in thread
From: Eli Cohen @ 2022-10-30 13:36 UTC (permalink / raw)
  To: Si-Wei Liu, Jason Wang
  Cc: mst, Parav Pandit, virtualization, linux-kernel, Cindy Lu

> From: Si-Wei Liu <si-wei.liu@oracle.com>
> Sent: Saturday, 29 October 2022 2:24
> To: Jason Wang <jasowang@redhat.com>; Eli Cohen <elic@nvidia.com>
> Cc: mst@redhat.com; Parav Pandit <parav@nvidia.com>;
> virtualization@lists.linux-foundation.org; linux-kernel@vger.kernel.org; Cindy
> Lu <lulu@redhat.com>
> Subject: Re: [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show"
> output
> 
> 
> 
> On 10/27/2022 1:47 AM, Jason Wang wrote:
> > On Thu, Oct 27, 2022 at 2:31 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>
> >>
> >> On 10/25/2022 9:44 PM, Jason Wang wrote:
> >>> 在 2022/10/26 09:10, Si-Wei Liu 写道:
> >>>>
> >>>> On 10/24/2022 7:24 PM, Jason Wang wrote:
> >>>>> On Tue, Oct 25, 2022 at 3:14 AM Si-Wei Liu<si-wei.liu@oracle.com>
> >>>>> wrote:
> >>>>>> On 10/24/2022 1:40 AM, Jason Wang wrote:
> >>>>>>> On Sat, Oct 22, 2022 at 7:49 AM Si-Wei Liu<si-wei.liu@oracle.com>
> >>>>>>> wrote:
> >>>>>>>> Live migration of vdpa would typically require re-instate vdpa
> >>>>>>>> device with an identical set of configs on the destination node,
> >>>>>>>> same way as how source node created the device in the first
> >>>>>>>> place. In order to save orchestration software from memorizing
> >>>>>>>> and keeping track of vdpa config, it will be helpful if the vdpa
> >>>>>>>> tool provides the aids for exporting the initial configs as-is,
> >>>>>>>> the way how vdpa device was created. The "vdpa dev show"
> command
> >>>>>>>> seems to be the right vehicle for that. It is unlike the "vdpa dev
> >>>>>>>> config show" command output which usually goes with the live value
> >>>>>>>> in the device config space, and is not quite reliable subject to
> >>>>>>>> the dynamics of feature negotiation or possible change by the
> >>>>>>>> driver to the config space.
> >>>>>>>>
> >>>>>>>> Examples:
> >>>>>>>>
> >>>>>>>> 1) Create vDPA by default without any config attribute
> >>>>>>>>
> >>>>>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0
> >>>>>>>> $ vdpa dev show vdpa0
> >>>>>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555
> >>>>>>>> max_vqs 9 max_vq_size 256
> >>>>>>>> $ vdpa dev -jp show vdpa0
> >>>>>>>> {
> >>>>>>>>        "dev": {
> >>>>>>>>            "vdpa0": {
> >>>>>>>>                "type": "network",
> >>>>>>>>                "mgmtdev": "pci/0000:41:04.2",
> >>>>>>>>                "vendor_id": 5555,
> >>>>>>>>                "max_vqs": 9,
> >>>>>>>>                "max_vq_size": 256,
> >>>>>>>>            }
> >>>>>>>>        }
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> 2) Create vDPA with config attribute(s) specified
> >>>>>>>>
> >>>>>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 \
> >>>>>>>>        mac e4:11:c6:d3:45:f0 max_vq_pairs 4
> >>>>>>>> $ vdpa dev show
> >>>>>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555
> >>>>>>>> max_vqs 9 max_vq_size 256
> >>>>>>>>      initial_config: mac e4:11:c6:d3:45:f0 max_vq_pairs 4
> >>>>>>>> $ vdpa dev -jp show
> >>>>>>>> {
> >>>>>>>>        "dev": {
> >>>>>>>>            "vdpa0": {
> >>>>>>>>                "type": "network",
> >>>>>>>>                "mgmtdev": "pci/0000:41:04.2",
> >>>>>>>>                "vendor_id": 5555,
> >>>>>>>>                "max_vqs": 9,
> >>>>>>>>                "max_vq_size": 256,
> >>>>>>>>                "initial_config": {
> >>>>>>>>                    "mac": "e4:11:c6:d3:45:f0",
> >>>>>>>>                    "max_vq_pairs": 4
> >>>>>>>>                }
> >>>>>>>>            }
> >>>>>>>>        }
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> Signed-off-by: Si-Wei Liu<si-wei.liu@oracle.com>
> >>>>>>>> ---
> >>>>>>>>     drivers/vdpa/vdpa.c | 39
> +++++++++++++++++++++++++++++++++++++++
> >>>>>>>>     1 file changed, 39 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> >>>>>>>> index bebded6..bfb8f54 100644
> >>>>>>>> --- a/drivers/vdpa/vdpa.c
> >>>>>>>> +++ b/drivers/vdpa/vdpa.c
> >>>>>>>> @@ -677,6 +677,41 @@ static int
> >>>>>>>> vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct
> >>>>>>>> genl_info *i
> >>>>>>>>     }
> >>>>>>>>
> >>>>>>>>     static int
> >>>>>>>> +vdpa_dev_initcfg_fill(struct vdpa_device *vdev, struct sk_buff
> >>>>>>>> *msg, u32 device_id)
> >>>>>>>> +{
> >>>>>>>> +       struct vdpa_dev_set_config *cfg = &vdev->init_cfg;
> >>>>>>>> +       int err = -EMSGSIZE;
> >>>>>>>> +
> >>>>>>>> +       if (!cfg->mask)
> >>>>>>>> +               return 0;
> >>>>>>>> +
> >>>>>>>> +       switch (device_id) {
> >>>>>>>> +       case VIRTIO_ID_NET:
> >>>>>>>> +               if ((cfg->mask &
> >>>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR)) != 0 &&
> >>>>>>>> +                   nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
> >>>>>>>> +                           sizeof(cfg->net.mac), cfg->net.mac))
> >>>>>>>> +                       return err;
> >>>>>>>> +               if ((cfg->mask &
> >>>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) != 0 &&
> >>>>>>>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU,
> >>>>>>>> cfg->net.mtu))
> >>>>>>>> +                       return err;
> >>>>>>>> +               if ((cfg->mask &
> >>>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) != 0 &&
> >>>>>>>> +                   nla_put_u16(msg,
> VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
> >>>>>>>> + cfg->net.max_vq_pairs))
> >>>>>>>> +                       return err;
> >>>>>>>> +               break;
> >>>>>>>> +       default:
> >>>>>>>> +               break;
> >>>>>>>> +       }
> >>>>>>>> +
> >>>>>>>> +       if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) != 0
> &&
> >>>>>>>> +           nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES,
> >>>>>>>> +                             cfg->device_features, VDPA_ATTR_PAD))
> >>>>>>>> +               return err;
> >>>>>>> A question: If any of those above attributes were not provisioned,
> >>>>>>> should we show the ones that are inherited from the parent?
> >>>>>> A simple answer would be yes, but the long answer is that I am not
> >>>>>> sure
> >>>>>> if there's any for the moment - there's no  default value for mtu,
> >>>>>> mac,
> >>>>>> and max_vqp that can be inherited from the parent (max_vqp by
> default
> >>>>>> being 1 is spec defined, not something inherited from the parent).
> >>>>> Note that it is by default from driver level that if _F_MQ is not
> >>>>> negotiated. But I think we are talking about something different that
> >>>>> is out of the spec here, what if:
> >>>>>
> >>>>> vDPA inherits _F_MQ but max_vqp is not provisioned via netlink.
> >>>>>
> >>>>> Or is it not allowed?
> >>>> My understanding is that this is not allowed any more since the
> >>>> introduction of the max_vqp attribute. Noted, currently we don't have a
> >>>> way for the vendor driver to report the default value for max_vqp,
> >>>
> >>> I think it can be reported in this patch?
> >> Yes, we can add, but I am not sure whether or not this will be
> >> practically useful, e.g. the same command without max_vqp specified
> >> may render a different number of queues across different devices, or
> >> different revisions of the same vendor's devices. Does it complicate the
> >> mgmt software even more? I'm not sure....
> > It depends on the use case, e.g if we want to compare the migration
> > compatibility, having a single vdpa command query is much easier than
> > having two or more.
> Yep I agree. I was saying not every attribute would need to be inherited
> from the parent device. Actually attributes like max_vqp could take the
> default from some common place, e.g. some default value can be
> applied by the vdpa core. And we can document these attributes ruled by the
> vdpa core in the vdpa-dev(8) man page. This reduces the extra call of having
> mgmt software issue another query command which it actually doesn't need to.
> 
> >
> >> Could we instead mandate
> >> max_vqp to be 1 from vdpa core level if user doesn't explicitly specify
> >> the value?
> > This seems to be not easy, at least not easy in the vDPA core.
> We can load these default values from vdpa_nl_cmd_dev_add_set_doit()
> before ops->dev_add is called. I can post a v3 that shows the code, it
> shouldn't be too hard.
> 
> >   We can
> > probably document this somewhere but max_vqp is only one example, we
> > have other mq devices like block/SCSI/console.
> Actually max_vqp is a network device-specific config to provision mq
> devices. If the parent mgmtdev supports net vdpa device creation and the
> user requests to provision _F_MQ with no supplied max_vqp value, we
> should load some global default value there.
> 
> >
> >> That way it is more consistent in terms of the resulting
> >> number of queue pairs (=1) with the case where parent device does not
> >> offer the _F_MQ feature.
> > Right, but a corner case is to provision _F_MQ but without max_vqp.
> Yes, I will post the patch that supports this.
> >
> >>>
> >>>> if not otherwise specified in the CLI. Without getting the default
> >>>> value reported at the 'vdpa mgmtdev show' level, it'd just confuse mgmt
> >>>> software even more.
> >>>
> >>> Yes, this is something that we need to fix. And what's more in order
> >>> to support dynamic provisioning, we need a way to report the number of
> >>> available instances that could be used for vDPA device provisioning.
> >> Wouldn't it be possible to achieve that by simply checking how many
> >> parent mgmtdev instances don't have vdpa device provisioned yet? e.g.
> >>
> >> inuse=$(vdpa dev show | grep mgmtdev | wc -l)
> >> total=$(vdpa mgmtdev show  | grep "supported_classes" | wc -l )
> >> echo $((total - inuse))
> > I meant how many available vDPA devices that are available for the
> > mgmt to create?
> Oh I see.
> 
> >
> > E.g in the case of sub function or simulator a mgmtdev can create more
> > than 1 vdpa devices.
> Does the sub function today support creation of multiple vDPA instances
> per mgmtdev? Something I wasn't aware of before. Is it with a different
> device class?
> 
> >
> >>>
> >>>>>     At least some time in the past, mlx5 were
> >>>>> enabled with MQ with 8 queue pairs by default.
> >>>> That was the situation when there's no max_vqp attribute support from
> >>>> vdpa netlink API level. I think now every driver honors the vdpa core
> >>>> disposition to get a single queue pair if max_vqp config is missing.
> >>> So we have:
> >>>
> >>> int vdpa_register_device(struct vdpa_device *vdev, int nvqs)
> >>>
> >>> This means technically, parent can allocate a multiqueue devices with
> >>> _F_MQ features if max_vqp and device_features are not provisioned. And
> >>> what's more, what happens if _F_MQ is provisioned but max_vqp is not
> >>> specified?
> >>>
> >>> The question is:
> >>>
> >>> When an attribute is not specified/provisioned via netlink, what's
> >>> the default value? The answer should be consistent: if device_features
> >>> is determined by the parent, we should do the same for max_vqp.
> >> OK I got your point.
> >>
> >>> And it looks to me all of those belongs to the initial config
> >>> (self-contained)
> >> Right. I wonder if we can have the vdpa core define the default value
> >> (e.g. max_vqp=1) for those unspecified attributes (esp. when the
> >> corresponding device feature is offered and provisioned) whenever
> >> possible. I think it'd be more consistent for the same command to
> >> get to the same result between different vendor drivers. While we still
> >> keep the possibility for future extension to allow drivers to override the
> >> vdpa core disposition if a real use case emerges. What do you think?
> > That's possible but we may end up with device specific code in the
> > vDPA core which is not elegant, and the code will grow as the number
> > of supported types grows.
> I guess that's unavoidable as this is already the case today. See the
> various VIRTIO_ID_NET case switches in the vdpa.c code. As long as the
> type-specific code limits itself to the netlink API interfacing layer
> rather than going down to the driver API, it might be just okay (as
> that's already the case).
> 
> >
> > Note that, max_vqp is not the only attribute that may suffer from
> > this, basically any config field that depends on a specific feature
> > bit may have the same issue.
> >
> >>>
> >>>> And the mlx5_vdpa driver with 8 queue pairs in the wild days is just
> >>>> irrelevant to be manageable by mgmt software, regardless of live
> >>>> migration.
> >>>>>> And
> >>>>>> the device_features if inherited is displayed at 'vdpa dev config
> >>>>>> show'
> >>>>>> output. Can you remind me of a good example for inherited value
> >>>>>> that we
> >>>>>> may want to show here?
> >>>>> Some other cases:
> >>>>>
> >>>>> 1) MTU: there should be something that the device needs to report if
> >>>>> _F_MTU is negotiated even if it is not provisioned from netlink.
> >>>> I am not sure I understand the ask here. Noted the QEMU argument has
> >>>> to offer host_mtu=X with the maximum MTU value for guest to use (and
> >>>> applied as the initial MTU config during virtio-net probing for Linux
> >>>> driver),
> >>>
> >>> Adding Cindy.
> >>>
> >>> I think it's a known issue that we need to do a sanity check to make
> >>> sure cli parameters match what is provisioned from netlink.
> >> Right. How's the plan for QEMU to get to the mtu provisioned by netlink,
> >> via a new vhost-vdpa ioctl call?
> > I think netlink is not designed for qemu to use, the design is to
> > expose a vhost device to Qemu.
> >
> >> If so, will  QEMU be able to read it
> >> directly from kernel when it comes to the vhost-vdpa backend, without
> >> having user to specify host_mtu from CLI?
> > I'm not sure I get the question, but Qemu should get this via config
> > space (otherwise it should be a bug).
> It's hard for QEMU to work this way with the existing get_config() ops,
> I think, as it has assumptions around endianness and feature
> negotiation; until the latter is done you can't get any reliable value
> for a provisioned property. I think QEMU needs to validate the
> provisioned value much earlier (when QEMU is launched), before
> negotiation kicks in. It would be cleaner to use another vhost ioctl
> and a new vdpa driver ops to retrieve the provisioned feature config
> values from vendor drivers.
> 
> >   And Qemu needs to verify the mtu
> > got from the cli vs the mtu got from vhost, and fail the device
> > initialization if they don't match.
> I mean today there's a problem of double provisioning: e.g. the mtu has
> to first be provided in the 'vdpa dev add' command to provision
> _F_MTU, and then the same value has to be supplied to host_mtu in the
> QEMU CLI. The same applies to the mac address. It would be best if we
> could allow QEMU to load the provisioned value from the vdpa device
> directly, without having to provide extra duplicated configs at the
> QEMU CLI level.
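The duplication being described looks roughly like this (device paths
and values are illustrative only):

```
# Provision mac and mtu at vdpa creation time ...
$ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 \
      mac e4:11:c6:d3:45:f0 mtu 9000

# ... then repeat the very same values on the QEMU command line, since
# QEMU cannot read them back from the vdpa device today:
$ qemu-system-x86_64 ... \
      -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vdpa0 \
      -device virtio-net-pci,netdev=vdpa0,mac=e4:11:c6:d3:45:f0,host_mtu=9000
```

If the two sets of values drift apart, the guest may see a config the
device was never provisioned with.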
> 
> >
> >>>
> >>>> and the way to get the parent device MTU and whether that's relevant
> >>>> to vdpa device's MTU is very vendor specific.
> >>>
> >>> So I think the max MTU of parent should be equal to the max MTU of the
> >>> vDPA.
> >> Noted here the parent might not necessarily be the mgmtdev where the vdpa
> >> gets created over. It may well end up being the MTU on the PF (uplink
> >> port) which the mgmt software has to concern itself with. My point is the
> >> utility and tool chain able to derive the maximal MTU effectively
> >> allowed for the vDPA device may live outside of vDPA's realm. It's a rare
> >> or even invalid configuration to have vDPA configured with a bigger value
> >> than the MTU on the uplink port or parent device. More commonly, when
> >> MTU config is involved, it has to be consistently configured across all
> >> the network links along the way, from the parent device (uplink port) down
> >> to the switchdev representor port, vdpa device, and QEMU virtio-net object.
> > Ok, right.
> >
> >>>
> >>>> I think we would need new attribute(s) in the mgmtdev level to
> >>>> support what you want here?
> >>>
> >>> Not sure, but what I want to ask is consider we provision MTU feature
> >>> but without max MTU value, do we need to report the initial max MTU
> here?
> >> Yep, maybe. I'm not very sure if this will be very useful to be honest,
> >> considering it's kind of a rare case to provision the MTU feature
> >> without a specific MTU value. If one cares about MTU, mgmt software
> >> should configure some mtu through "vdpa dev add ... mtu ...", no?
> > Yes, but this only works if all config fields could be provisioned,
> > which seems not the case now, vdpa_dev_set_config is currently a
> > subset of virtio_net_config. So this goes back to the question I
> > raised earlier. Is the time to switch to use virtio_net_config and
> > allow all fields to be provisioned?
> I don't quite get how it would be useful to switch to virtio_net_config.
> I thought we could add the missing fields to vdpa_dev_set_config even now
> to make it match virtio_net_config. Though the reality is there are few
> vdpa devices that support those features now. If any real device
> supports a feature field in virtio_net_config but not in
> vdpa_dev_set_config, it can be gradually added as needed.
> 
> >
> > And even for mtu we're lacking a way to report the maximum MTU allowed
> > by mgmt dev (e.g the uplink MTU via netlink):
> Since MTU is only implemented in mlx5_vdpa for now, except for the
> simulators, I'm copying Eli to see if this is feasible to implement in a
> real device. I think we also need to validate that the mtu configured on
> a vDPA device instance shouldn't exceed the uplink MTU (maximum MTU allowed).
> 
The interface for querying the physical port operational MTU is in place but does not seem to work properly.
I am trying to get answers on this, but it may take a while as people are on vacation and this coming Tuesday is not a
working day in Israel (elections).

> > 1) report the maximum host mtu supported by the mgmtdev via netlink
> > (not done, so management needs to guess the maximum value now)
> > 2) allow mtu to be provisioned (done)
> > 3) show initial mtu (done by this patch)
> So I wonder, is it fine for the vdpa core to come up with a default value
> for MTU when the _F_MTU feature is to be provisioned or inherited? If we
> mandate that each vDPA vendor support at least the standard 1500 MTU for
> the _F_MTU feature, we can make it default to 1500.
> 
> Otherwise the vDPA MTU has to be taken (inherited) from the parent device.
> Unfortunately, right now for mlx5_vdpa, the parent mgmtdev device has
> 1500 MTU by default regardless of the MTU on the uplink port, and I'm
> not sure if it's the right model to enforce that mgmtdev go with the
> uplink port's MTU. I would need to hear what vendors say about this requirement.
> 
> >
> > We probably need to do the above for all fields to be self-contained.
> Agreed on the part of being self-contained.
> 
> >
> >> On the other hand, no mtu value specified may mean "go with what the
> >> uplink port or parent device has". I think this is a pretty useful case
> >> if the vendor's NIC supports updating MTU on the fly without having to
> >> tear down QEMU and reconfigure vdpa. I'm not sure if we end up with
> >> killing this use case by limiting initial max MTU to a fixed value.
> >>
> >>>
> >>>>> 2) device_features: if device_features is not provisioned, we should
> >>>>> still report it via netlink here
> >>>> Not the way I expected it, but with Lingshan's series to expose
> >>>> fields out of FEATURES_OK, the device_features is now reported
> >>>> through 'vdpa dev config show' regardless being specified or not, if
> >>>> I am not mistaken?
> >>>
> >>> Yes.
> >> Do you want me to relocate to 'vdpa dev show', or it's okay to leave it
> >> behind there?
> > It's probably too late for the relocation but I feel it's better to
> > place all the initial/inherited attributes into a single command even
> > if some of them are already somewhere in another command, but we can
> > hear from others.
> Ok, that'll be fine. I suppose mgmt software should only query through
> "mgmtdev show" or "dev show", avoiding any query via "dev config show".
> It'd be best to get all of the compatibility related info
> consolidated in one single place. Let me try to include it in "dev show".
> 
> >
> >>>
> >>>> Currently we export the config attributes upon vdpa creation under
> >>>> the "initial_config" key. If we want to expose more default values
> >>>> inherited from mgmtdev, I think we can wrap up these default values
> >>>> under another key "inherited_config" to display in 'vdpa dev show'
> >>>> output. Does it fit what you have in mind?
> >>>
> >>> I wonder if it's better to merge those two, or is there any advantages
> >>> of splitting them?
> >> I think for the most part "initial_config" will be sufficient for those
> >> config attributes with "vdpa dev add" equivalents, be it user specified,
> >> vdpa enforced default if missing user input, or default overridden by
> >> the parent device. "inherited_config" will be useful for the configs
> >> with no "vdpa dev add" equivalent or live out side of vdpa tool, but
> >> still important for mgmt software to replicate identical vdpa setup.
> >> Like max-supported-mtu (for the uplink port or parent device),
> >> effective-link-speed, effective-link-status et al. Let's see if there's
> >> more when we get there.
> > So one point I can see is that, if there's no difference from the
> > userspace perspective, we'd better merge them. And I don't see any
> > difference between the initial versus inherited from the view of user
> > space. Do you?
> So the major difference is "initial_config" is settable and equivalent
> to the config attribute in "vdpa dev add" command, while
> "inherited_config" is the read-only fields from "mgmtdev show" that does
> not correspond to any "vdpa dev add" vdpa attribute. That way the mgmt
> software can use the "initial_config" directly to recreate vdpa with
> identical device config, while using the "inherited_config" to replicate
> the other configs out of vdpa, for e.g. set uplink port's MTU to 9000.
> Maybe there's no need to fold such info into an "inherited_config" key?
> though I just want to make it relevant to migration compatibility. Any
> suggestion for the name or layout?
> 
> 
> Thanks,
> -Siwei
> 
> >
> > Thanks
> >
> >> Thanks,
> >> -Siwei
> >>
> >>>
> >>>>> or do you mean the mgmt can assume it
> >>>>> should be the same as mgmtdev. Anyhow if we don't show
> device_features
> >>>>> if it is not provisioned, it will complicate the mgmt software.
> >>>> Yes, as I said earlier, since the device_features attr getting added
> >>>> to the 'vdpa dev config show' command, this divergence started to
> >>>> complicate mgmt software already.
> >>>>
> >>>> Thanks,
> >>>
> >>> Thanks
> >>>
> >>>
> >>>> -Siwei
> >>>>> Thanks
> >>>>>
> >>>>>> Thanks,
> >>>>>> -Siwei
> >>>>>>
> >>>>>>
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>>> +
> >>>>>>>> +       return 0;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static int
> >>>>>>>>     vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg,
> >>>>>>>> u32 portid, u32 seq,
> >>>>>>>>                  int flags, struct netlink_ext_ack *extack)
> >>>>>>>>     {
> >>>>>>>> @@ -715,6 +750,10 @@ static int
> >>>>>>>> vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct
> >>>>>>>> genl_info *i
> >>>>>>>>            if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE,
> >>>>>>>> min_vq_size))
> >>>>>>>>                    goto msg_err;
> >>>>>>>>
> >>>>>>>> +       err = vdpa_dev_initcfg_fill(vdev, msg, device_id);
> >>>>>>>> +       if (err)
> >>>>>>>> +               goto msg_err;
> >>>>>>>> +
> >>>>>>>>            genlmsg_end(msg, hdr);
> >>>>>>>>            return 0;
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> 1.8.3.1
> >>>>>>>>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show" output
  2022-10-28 23:23                   ` Si-Wei Liu
@ 2022-12-19  6:31                     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2022-12-19  6:31 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: Cindy Lu, linux-kernel, virtualization, Eli Cohen

On Fri, Oct 28, 2022 at 04:23:49PM -0700, Si-Wei Liu wrote:
> I can post a v3 that shows the code, it shouldn't be
> too hard.

I take it another version of this patchset is planned?

-- 
MST


^ permalink raw reply	[flat|nested] 42+ messages in thread


* Re: [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show" output
  2022-10-28 23:23                   ` Si-Wei Liu
@ 2022-12-20  7:58                     ` Jason Wang
  -1 siblings, 0 replies; 42+ messages in thread
From: Jason Wang @ 2022-12-20  7:58 UTC (permalink / raw)
  To: Si-Wei Liu, Eli Cohen; +Cc: virtualization, linux-kernel, Cindy Lu, mst


在 2022/10/29 07:23, Si-Wei Liu 写道:
>
>
> On 10/27/2022 1:47 AM, Jason Wang wrote:
>> On Thu, Oct 27, 2022 at 2:31 PM Si-Wei Liu <si-wei.liu@oracle.com> 
>> wrote:
>>>
>>>
>>> On 10/25/2022 9:44 PM, Jason Wang wrote:
>>>> 在 2022/10/26 09:10, Si-Wei Liu 写道:
>>>>>
>>>>> On 10/24/2022 7:24 PM, Jason Wang wrote:
>>>>>> On Tue, Oct 25, 2022 at 3:14 AM Si-Wei Liu<si-wei.liu@oracle.com>
>>>>>> wrote:
>>>>>>> On 10/24/2022 1:40 AM, Jason Wang wrote:
>>>>>>>> On Sat, Oct 22, 2022 at 7:49 AM Si-Wei Liu<si-wei.liu@oracle.com>
>>>>>>>> wrote:
>>>>>>>>> Live migration of vdpa would typically require re-instate vdpa
> >>>>>>>>> device with an identical set of configs on the destination node,
>>>>>>>>> same way as how source node created the device in the first
>>>>>>>>> place. In order to save orchestration software from memorizing
>>>>>>>>> and keeping track of vdpa config, it will be helpful if the vdpa
>>>>>>>>> tool provides the aids for exporting the initial configs as-is,
>>>>>>>>> the way how vdpa device was created. The "vdpa dev show" command
>>>>>>>>> seems to be the right vehicle for that. It is unlike the "vdpa 
>>>>>>>>> dev
>>>>>>>>> config show" command output which usually goes with the live 
>>>>>>>>> value
>>>>>>>>> in the device config space, and is not quite reliable subject to
>>>>>>>>> the dynamics of feature negotiation or possible change by the
>>>>>>>>> driver to the config space.
>>>>>>>>>
>>>>>>>>> Examples:
>>>>>>>>>
>>>>>>>>> 1) Create vDPA by default without any config attribute
>>>>>>>>>
>>>>>>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0
>>>>>>>>> $ vdpa dev show vdpa0
>>>>>>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555
>>>>>>>>> max_vqs 9 max_vq_size 256
>>>>>>>>> $ vdpa dev -jp show vdpa0
>>>>>>>>> {
>>>>>>>>>        "dev": {
>>>>>>>>>            "vdpa0": {
>>>>>>>>>                "type": "network",
>>>>>>>>>                "mgmtdev": "pci/0000:41:04.2",
>>>>>>>>>                "vendor_id": 5555,
>>>>>>>>>                "max_vqs": 9,
>>>>>>>>>                "max_vq_size": 256,
>>>>>>>>>            }
>>>>>>>>>        }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> 2) Create vDPA with config attribute(s) specified
>>>>>>>>>
>>>>>>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 \
>>>>>>>>>        mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>>>>>>>>> $ vdpa dev show
>>>>>>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555
>>>>>>>>> max_vqs 9 max_vq_size 256
>>>>>>>>>      initial_config: mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>>>>>>>>> $ vdpa dev -jp show
>>>>>>>>> {
>>>>>>>>>        "dev": {
>>>>>>>>>            "vdpa0": {
>>>>>>>>>                "type": "network",
>>>>>>>>>                "mgmtdev": "pci/0000:41:04.2",
>>>>>>>>>                "vendor_id": 5555,
>>>>>>>>>                "max_vqs": 9,
>>>>>>>>>                "max_vq_size": 256,
>>>>>>>>>                "initial_config": {
>>>>>>>>>                    "mac": "e4:11:c6:d3:45:f0",
>>>>>>>>>                    "max_vq_pairs": 4
>>>>>>>>>                }
>>>>>>>>>            }
>>>>>>>>>        }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Signed-off-by: Si-Wei Liu<si-wei.liu@oracle.com>
>>>>>>>>> ---
>>>>>>>>>     drivers/vdpa/vdpa.c | 39 
>>>>>>>>> +++++++++++++++++++++++++++++++++++++++
>>>>>>>>>     1 file changed, 39 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
>>>>>>>>> index bebded6..bfb8f54 100644
>>>>>>>>> --- a/drivers/vdpa/vdpa.c
>>>>>>>>> +++ b/drivers/vdpa/vdpa.c
>>>>>>>>> @@ -677,6 +677,41 @@ static int
>>>>>>>>> vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct
>>>>>>>>> genl_info *i
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     static int
>>>>>>>>> +vdpa_dev_initcfg_fill(struct vdpa_device *vdev, struct sk_buff
>>>>>>>>> *msg, u32 device_id)
>>>>>>>>> +{
>>>>>>>>> +       struct vdpa_dev_set_config *cfg = &vdev->init_cfg;
>>>>>>>>> +       int err = -EMSGSIZE;
>>>>>>>>> +
>>>>>>>>> +       if (!cfg->mask)
>>>>>>>>> +               return 0;
>>>>>>>>> +
>>>>>>>>> +       switch (device_id) {
>>>>>>>>> +       case VIRTIO_ID_NET:
>>>>>>>>> +               if ((cfg->mask &
>>>>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR)) != 0 &&
>>>>>>>>> +                   nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
>>>>>>>>> + sizeof(cfg->net.mac), cfg->net.mac))
>>>>>>>>> +                       return err;
>>>>>>>>> +               if ((cfg->mask &
>>>>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) != 0 &&
>>>>>>>>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU,
>>>>>>>>> cfg->net.mtu))
>>>>>>>>> +                       return err;
>>>>>>>>> +               if ((cfg->mask &
>>>>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) != 0 &&
>>>>>>>>> +                   nla_put_u16(msg, 
>>>>>>>>> VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
>>>>>>>>> + cfg->net.max_vq_pairs))
>>>>>>>>> +                       return err;
>>>>>>>>> +               break;
>>>>>>>>> +       default:
>>>>>>>>> +               break;
>>>>>>>>> +       }
>>>>>>>>> +
>>>>>>>>> +       if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) != 0 &&
>>>>>>>>> +           nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES,
>>>>>>>>> + cfg->device_features, VDPA_ATTR_PAD))
>>>>>>>>> +               return err;
>>>>>>>> A question: If any of those above attributes were not provisioned,
>>>>>>>> should we show the ones that are inherited from the parent?
>>>>>>> A simple answer would be yes, but the long answer is that I am not
>>>>>>> sure
>>>>>>> if there's any for the moment - there's no  default value for mtu,
>>>>>>> mac,
>>>>>>> and max_vqp that can be inherited from the parent (max_vqp by 
>>>>>>> default
>>>>>>> being 1 is spec defined, not something inherited from the parent).
>>>>>> Note that it defaults to 1 at the driver level if _F_MQ is not
>>>>>> negotiated. But I think we are talking about something different
>>>>>> that is out of the spec here, what if:
>>>>>>
>>>>>> vDPA inherits _F_MQ but max_vqp is not provisioned via netlink.
>>>>>>
>>>>>> Or is it not allowed?
>>>>> My understanding is that this is not allowed any more since the
>>>>> introduction of the max_vqp attribute. Note, currently we don't have
>>>>> a way for the vendor driver to report the default value for max_vqp,
>>>>
>>>> I think it can be reported in this patch?
>>> Yes, we can add, but I am not sure whether or not this will be
>>> practically useful, for e.g. the same command without max_vqp specified
>>> may render different number of queues across different devices, or
>>> different revisions of the same vendor's devices. Does it complicate 
>>> the
>>> mgmt software even more, I'm not sure....
>> It depends on the use case, e.g if we want to compare the migration
>> compatibility, having a single vdpa command query is much easier than
>> having two or more.
> Yep I agree. I was saying not every attribute would need to be 
> inherited from the parent device. Actually attributes like max_vqp 
> could take the default from some common place, e.g. some default 
> value can be applied by the vdpa core. And we can document these 
> attributes ruled by the vdpa core in the vdpa-dev(8) man page. That 
> saves mgmt software from issuing an extra query command that it 
> actually doesn't need.
>
>>
>>> Could we instead mandate
>>> max_vqp to be 1 from vdpa core level if user doesn't explicitly specify
>>> the value?
>> This seems to be not easy, at least not easy in the vDPA core.
> We can load these default values from vdpa_nl_cmd_dev_add_set_doit() 
> before ops->dev_add is called. I can post a v3 that shows the code, it 
> shouldn't be too hard.
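[Editorial illustration of the proposal above: a minimal, self-contained userspace sketch of loading a core-level default for max_vqp before ops->dev_add is called. The attribute constants, struct layout, and function name here are hypothetical stand-ins for the kernel's netlink attributes and struct vdpa_dev_set_config, not the actual kernel code.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/* Hypothetical stand-ins for the kernel's netlink attribute IDs. */
enum {
	ATTR_DEV_NET_CFG_MAX_VQP = 1,
	ATTR_DEV_FEATURES        = 2,
};
#define VIRTIO_NET_F_MQ 22

/* Simplified mirror of struct vdpa_dev_set_config. */
struct dev_set_config {
	uint64_t mask;            /* attrs explicitly supplied by the user */
	uint64_t device_features;
	uint16_t max_vq_pairs;
};

/* Core-level default fill, run before handing the config to
 * ops->dev_add(): when _F_MQ is provisioned but max_vqp is not,
 * default to a single queue pair so every vendor driver lands on
 * the same result. */
static void vdpa_fill_net_defaults(struct dev_set_config *cfg)
{
	bool mq = (cfg->mask & BIT_ULL(ATTR_DEV_FEATURES)) &&
		  (cfg->device_features & BIT_ULL(VIRTIO_NET_F_MQ));

	if (mq && !(cfg->mask & BIT_ULL(ATTR_DEV_NET_CFG_MAX_VQP))) {
		cfg->max_vq_pairs = 1;
		cfg->mask |= BIT_ULL(ATTR_DEV_NET_CFG_MAX_VQP);
	}
}
```

With this shape, a user-supplied max_vqp is left untouched, and the filled-in default would also show up in the "initial_config" output discussed in this thread.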


Ok, and I wonder if it's time to move netlink specific code into a 
dedicated file.


>
>>   We can
>> probably document this somewhere but max_vqp is only one example, we
>> have other mq devices like block/SCSI/console.
> Actually max_vqp is a network device specific config to provision mq 
> devices. If the parent mgmtdev supports net vdpa device creation and 
> user requests to provision _F_MQ with no supplied max_vqp value, we 
> should load some global default value there.
>
>>
>>> That way it is more consistent in terms of the resulting
>>> number of queue pairs (=1) with the case where parent device does not
>>> offer the _F_MQ feature.
>> Right, but a corner case is to provision _F_MQ but without max_vqp.
> Yes, I will post the patch that supports this.
>>
>>>>
>>>>> if not otherwise specified in the CLI. Without getting the default
>>>>> value reported in 'vdpa mgmtdev show' level, it'd just confuse mgmt
>>>>> software even more.
>>>>
>>>> Yes, this is something that we need to fix. And what's more in order
>>>> to support dynamic provisioning, we need a way to report the number of
>>>> available instances that could be used for vDPA device provisioning.
>>> Wouldn't it be possible to achieve that by simply checking how many
>>> parent mgmtdev instances don't have vdpa device provisioned yet? e.g.
>>>
>>> inuse=$(vdpa dev show | grep mgmtdev | wc -l)
>>> total=$(vdpa mgmtdev show  | grep "supported_classes" | wc -l )
>>> echo $((total - inuse))
>> I meant how many available vDPA devices that are available for the
>> mgmt to create?
> Oh I see.
>
>>
>> E.g in the case of sub function or simulator a mgmtdev can create more
>> than 1 vdpa devices.
> Does the sub function today support creation of multiple vDPA 
> instances per mgmtdev?


I think so, otherwise SF doesn't make too much sense.


> Something I wasn't aware of before. Is it with different device class?


It should be possible (no limitation in the vdpa core at least). Each 
device class should register its own mgmtdev.


>
>>
>>>>
>>>>>>     At least some time in the past, mlx5 were
>>>>>> enabled with MQ with 8 queue pairs by default.
>>>>> That was the situation when there's no max_vqp attribute support from
>>>>> vdpa netlink API level. I think now every driver honors the vdpa core
>>>>> disposition to get a single queue pair if max_vqp config is missing.
>>>> So we have:
>>>>
>>>> int vdpa_register_device(struct vdpa_device *vdev, int nvqs)
>>>>
>>>> This means technically, parent can allocate a multiqueue devices with
>>>> _F_MQ features if max_vqp and device_features is not provisioned. And
>>>> what's more, what happens if _F_MQ is provisioned by max_vqp is not
>>>> specified?
>>>>
>>>> The question is:
>>>>
>>>> When an attribute is not specified/provisioned via netlink, what's
>>>> the default value? The answer should be consistent: if device_features
>>>> is determined by the parent, we should do the same for max_vqp.
>>> OK I got your point.
>>>
>>>> And it looks to me all of those belongs to the initial config
>>>> (self-contained)
>>> Right. I wonder if we can have vdpa core define the default value (for
>>> e.g. max_vqp=1) for those unspecified attribute (esp. when the
>>> corresponding device feature is offered and provisioned) whenever
>>> possible. That way I think it'll be more consistent for the same
>>> command to get to the same result across different vendor drivers,
>>> while we still keep the possibility for future extension to allow the
>>> driver to override the vdpa core disposition if a real use case
>>> emerges. What do you think?
>> That's possible but we may end up with device specific code in the
>> vDPA core which is not elegant, and the code will grow as the number
>> of supported types grows.
> I guess that's unavoidable as this is already the case today. See the 
> various VIRTIO_ID_NET case switches in the vdpa.c code. I think as long 
> as type-specific code limits itself to the netlink API interfacing 
> layer rather than going down to the driver API, it might be just okay 
> (as that's already the case).
>
>>
>> Note that, max_vqp is not the only attribute that may suffer from
>> this, basically any config field that depends on a specific feature
>> bit may have the same issue.
>>
>>>>
>>>>> And the mlx5_vdpa driver with 8 queue pairs in the wild days is just
>>>>> irrelevant to be manageable by mgmt software, regardless of live
>>>>> migration.
>>>>>>> And
>>>>>>> the device_features if inherited is displayed at 'vdpa dev config
>>>>>>> show'
>>>>>>> output. Can you remind me of a good example for inherited value
>>>>>>> that we
>>>>>>> may want to show here?
>>>>>> Some other cases:
>>>>>>
>>>>>> 1) MTU: there should be something that the device needs to report if
>>>>>> _F_MTU is negotiated even if it is not provisioned from netlink.
>>>>> I am not sure I understand the ask here. Noted the QEMU argument has
>>>>> to offer host_mtu=X with the maximum MTU value for guest to use (and
>>>>> applied as the initial MTU config during virtio-net probing for Linux
>>>>> driver),
>>>>
>>>> Adding Cindy.
>>>>
>>>> I think it's a known issue that we need to do a sanity check to make
>>>> sure cli parameters match what is provisioned from netlink.
>>> Right. How's the plan for QEMU to get to the mtu provisioned by 
>>> netlink,
>>> via a new vhost-vdpa ioctl call?
>> I think netlink is not designed for qemu to use, the design is to
>> expose a vhost device to Qemu.
>>
>>> If so, will  QEMU be able to read it
>>> directly from kernel when it comes to the vhost-vdpa backend, without
>>> having user to specify host_mtu from CLI?
>> I'm not sure I get the question, but Qemu should get this via config
>> space (otherwise it should be a bug).
> It's hard for QEMU to work this way with the existing get_config() ops, 
> I think, as it has assumptions around endianness and feature 
> negotiation; until the latter is done you can't get any reliable value 
> for a provisioned property. I think QEMU needs to validate the 
> provisioned value way earlier (when QEMU is launched), before 
> negotiation kicks in. It would be cleaner to use another vhost ioctl 
> and a new vdpa driver op to retrieve the provisioned feature config 
> values from vendor drivers.
>
>>   And Qemu needs to verify the mtu
>> got from the cli vs the mtu got from vhost and fail the device
>> initialization if they don't match.
> I mean today there's a problem of double provisioning: e.g. the mtu 
> has to be provided first in the 'vdpa dev add' command when 
> provisioning _F_MTU, then in the QEMU CLI the same value has to be 
> supplied to host_mtu. The same applies to the mac address. It would be 
> best if we could allow QEMU to load the provisioned value from the 
> vdpa device directly, without having to provide extra duplicated 
> configs at the QEMU CLI level.


That's the plan. What I want to say is that Qemu should do a sanity 
test to make sure what is provided from the CLI matches what is 
provisioned from the device.
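[Editorial illustration of the sanity test described above: a minimal userspace sketch. The struct and function names are hypothetical, not real QEMU APIs; it only shows the shape of the check, i.e. fail device initialization when CLI-supplied properties disagree with what the vhost-vdpa device was provisioned with.]

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* CLI-supplied vs device-provisioned network properties to compare. */
struct net_cfg {
	uint8_t  mac[6];
	uint16_t mtu;
};

/* Return 0 when the CLI values match the provisioned values,
 * -EINVAL otherwise (caller fails device initialization). */
static int vdpa_sanity_check(const struct net_cfg *cli,
			     const struct net_cfg *dev)
{
	if (cli->mtu != dev->mtu)
		return -EINVAL;
	if (memcmp(cli->mac, dev->mac, sizeof(cli->mac)) != 0)
		return -EINVAL;
	return 0;
}
```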


>
>>
>>>>
>>>>> and the way to get the parent device MTU and whether that's relevant
>>>>> to vdpa device's MTU is very vendor specific.
>>>>
>>>> So I think the max MTU of parent should be equal to the max MTU of the
>>>> vDPA.
>>> Noted here the parent might not be necessarily the mgmtdev where vdpa
>>> gets created over. It may well end up with the MTU on the PF (uplink
>>> port) which the mgmt software has to be concerned with. My point is
>>> that the utility and tool chain able to derive the maximal MTU
>>> effectively allowed for the vDPA device may live outside of vDPA's
>>> realm. It's a rare or even invalid configuration to have vDPA
>>> configured with a bigger value than the MTU on the uplink port or
>>> parent device. More commonly, when MTU config is involved, it has to
>>> be consistently configured across all the network links along the
>>> path, from the parent device (uplink port) down to the switchdev
>>> representor port, the vdpa device, and the QEMU virtio-net object.
>> Ok, right.
>>
>>>>
>>>>> I think we would need new attribute(s) in the mgmtdev level to
>>>>> support what you want here?
>>>>
>>>> Not sure, but what I want to ask is consider we provision MTU feature
>>>> but without max MTU value, do we need to report the initial max MTU 
>>>> here?
>>> Yep, maybe. I'm not very sure if this will be very useful to be honest,
>>> considering it's kind of a rare case to me where one provisions the MTU
>>> feature without a specific MTU value. If one cares about MTU, mgmt
>>> software should configure some mtu through "vdpa dev add ... mtu ...", no?
>> Yes, but this only works if all config fields could be provisioned,
>> which seems not the case now, vdpa_dev_set_config is currently a
>> subset of virtio_net_config. So this goes back to the question I
>> raised earlier. Is the time to switch to use virtio_net_config and
>> allow all fields to be provisioned?
> I don't quite get how it will be useful to switch to 
> virtio_net_config. I thought we could add the missing fields to 
> vdpa_dev_set_config even now to make it match virtio_net_config. 
> Though the reality is there are few vdpa devices that support those 
> features now. If any real device supports a feature field in 
> virtio_net_config but not in vdpa_dev_set_config, it can be gradually 
> added as needed.
>
>>
>> And even for mtu we're lacking a way to report the maximum MTU allowed
>> by mgmt dev (e.g the uplink MTU via netlink):
> Since MTU is only implemented in mlx5_vdpa by now except for 
> simulators, copy Eli to see if this is feasible to implement in real 
> device. I think we also need to validate that the mtu configured on 
> vDPA device instance shouldn't exceed the uplink MTU (maximum MTU 
> allowed).
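[Editorial illustration of the validation suggested above: a minimal userspace sketch of rejecting a vDPA device MTU that exceeds the uplink's maximum. The function name is hypothetical, not actual driver code.]

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Return 0 when the requested vdpa device MTU fits within the maximum
 * MTU allowed by the uplink port, -EINVAL otherwise. */
static int vdpa_validate_mtu(uint16_t dev_mtu, uint16_t uplink_max_mtu)
{
	return dev_mtu <= uplink_max_mtu ? 0 : -EINVAL;
}
```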
>
>> 1) report the maximum host mtu supported by the mgmtdev via netlink
>> (not done, so management needs to guess the maximum value now)
>> 2) allow mtu to be provisioned (done)
>> 3) show initial mtu (done by this patch)
> So I wonder if it is fine for the vdpa core to come up with a default 
> value for MTU when the _F_MTU feature is to be provisioned or inherited?


It should not be easy since it depends on the parent.


> If we mandate each vDPA vendor to support at least the standard 1500 
> MTU for _F_MTU feature, we can make it default to 1500.
>
> Otherwise the vDPA MTU has to be taken (inherited) from the parent device. 
> Unfortunately, right now for mlx5_vdpa, the parent mgmtdev device has 
> 1500 MTU by default regardless of the MTU on the uplink port, and I'm 
> not sure if it's a right model to enforce mgmtdev go with uplink 
> port's MTU. I would need to hear what vendors say about this requirement.
>
>>
>> We probably need to do the above for all fields to be self-contained.
> Agreed on the part of being self-contained.
>
>>
>>> On the other hand, no mtu value specified may mean "go with what the
>>> uplink port or parent device has". I think this is a pretty useful case
>>> if the vendor's NIC supports updating MTU on the fly without having to
>>> tear down QEMU and reconfigure vdpa. I'm not sure if we end up with
>>> killing this use case by limiting initial max MTU to a fixed value.
>>>
>>>>
>>>>>> 2) device_features: if device_features is not provisioned, we should
>>>>>> still report it via netlink here
>>>>> Not the way I expected it, but with Lingshan's series to expose
>>>>> fields out of FEATURES_OK, the device_features is now reported
>>>>> through 'vdpa dev config show' regardless being specified or not, if
>>>>> I am not mistaken?
>>>>
>>>> Yes.
>>> Do you want me to relocate to 'vdpa dev show', or it's okay to leave it
>>> behind there?
>> It's probably too late for the relocation but I feel it's better to
>> place all the initial/inherited attributes into a single command even
>> if some of them are already somewhere in another command, but we can
>> hear from others.
> Ok, that'll be fine. I suppose mgmt software should only query 
> through "mgmtdev show" or "dev show", avoiding any query via "dev 
> config show". It'd be best to get all of the compatibility related 
> info consolidated in one single place. Let me try to include it in 
> "dev show".
>
>>
>>>>
>>>>> Currently we export the config attributes upon vdpa creation under
>>>>> the "initial_config" key. If we want to expose more default values
>>>>> inherited from mgmtdev, I think we can wrap up these default values
>>>>> under another key "inherited_config" to display in 'vdpa dev show'
>>>>> output. Does it fit what you have in mind?
>>>>
>>>> I wonder if it's better to merge those two, or is there any advantages
>>>> of splitting them?
>>> I think for the most part "initial_config" will be sufficient for those
>>> config attributes with "vdpa dev add" equivalents, be it user 
>>> specified,
>>> vdpa enforced default if missing user input, or default overridden by
>>> the parent device. "inherited_config" will be useful for the configs
>>> with no "vdpa dev add" equivalent or live out side of vdpa tool, but
>>> still important for mgmt software to replicate identical vdpa setup.
>>> Like max-supported-mtu (for the uplink port or parent device),
>>> effective-link-speed, effective-link-status et al. Let's see if there's
>>> more when we get there.
>> So one point I can see is that, if there's no difference from the
>> userspace perspective, we'd better merge them. And I don't see any
>> difference between the initial versus inherited from the view of user
>> space. Do you?
> So the major difference is "initial_config" is settable and equivalent 
> to the config attribute in "vdpa dev add" command, while 
> "inherited_config" is the read-only fields from "mgmtdev show" that 
> does not correspond to any "vdpa dev add" vdpa attribute. That way the 
> mgmt software can use the "initial_config" directly to recreate vdpa 
> with identical device config, while using the "inherited_config" to 
> replicate the other configs out of vdpa, for e.g. set uplink port's 
> MTU to 9000. Maybe there's no need to fold such info into an 
> "inherited_config" key? though I just want to make it relevant to 
> migration compatibility. Any suggestion for the name or layout?


As stated above I think a single key would be better since I don't see 
a reason users would need to differentiate them.

Thanks


>
>
> Thanks,
> -Siwei
>
>>
>> Thanks
>>
>>> Thanks,
>>> -Siwei
>>>
>>>>
>>>>>> or do you mean the mgmt can assume it
>>>>>> should be the same as mgmtdev. Anyhow if we don't show 
>>>>>> device_features
>>>>>> if it is not provisioned, it will complicate the mgmt software.
>>>>> Yes, as I said earlier, since the device_features attr getting added
>>>>> to the 'vdpa dev config show' command, this divergence started to
>>>>> complicate mgmt software already.
>>>>>
>>>>> Thanks,
>>>>
>>>> Thanks
>>>>
>>>>
>>>>> -Siwei
>>>>>> Thanks
>>>>>>
>>>>>>> Thanks,
>>>>>>> -Siwei
>>>>>>>
>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>> +
>>>>>>>>> +       return 0;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static int
>>>>>>>>>     vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg,
>>>>>>>>> u32 portid, u32 seq,
>>>>>>>>>                  int flags, struct netlink_ext_ack *extack)
>>>>>>>>>     {
>>>>>>>>> @@ -715,6 +750,10 @@ static int
>>>>>>>>> vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct
>>>>>>>>> genl_info *i
>>>>>>>>>            if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE,
>>>>>>>>> min_vq_size))
>>>>>>>>>                    goto msg_err;
>>>>>>>>>
>>>>>>>>> +       err = vdpa_dev_initcfg_fill(vdev, msg, device_id);
>>>>>>>>> +       if (err)
>>>>>>>>> +               goto msg_err;
>>>>>>>>> +
>>>>>>>>>            genlmsg_end(msg, hdr);
>>>>>>>>>            return 0;
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> 1.8.3.1
>>>>>>>>>
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


* Re: [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show" output
@ 2022-12-20  7:58                     ` Jason Wang
  0 siblings, 0 replies; 42+ messages in thread
From: Jason Wang @ 2022-12-20  7:58 UTC (permalink / raw)
  To: Si-Wei Liu, Eli Cohen; +Cc: mst, parav, virtualization, linux-kernel, Cindy Lu


On 2022/10/29 07:23, Si-Wei Liu wrote:
>
>
> On 10/27/2022 1:47 AM, Jason Wang wrote:
>> On Thu, Oct 27, 2022 at 2:31 PM Si-Wei Liu <si-wei.liu@oracle.com> 
>> wrote:
>>>
>>>
>>> On 10/25/2022 9:44 PM, Jason Wang wrote:
>>>> 在 2022/10/26 09:10, Si-Wei Liu 写道:
>>>>>
>>>>> On 10/24/2022 7:24 PM, Jason Wang wrote:
>>>>>> On Tue, Oct 25, 2022 at 3:14 AM Si-Wei Liu<si-wei.liu@oracle.com>
>>>>>> wrote:
>>>>>>> On 10/24/2022 1:40 AM, Jason Wang wrote:
>>>>>>>> On Sat, Oct 22, 2022 at 7:49 AM Si-Wei Liu<si-wei.liu@oracle.com>
>>>>>>>> wrote:
>>>>>>>>> Live migration of vdpa would typically require re-instating the
>>>>>>>>> vdpa device with an identical set of configs on the destination node,
>>>>>>>>> same way as how source node created the device in the first
>>>>>>>>> place. In order to save orchestration software from memorizing
>>>>>>>>> and keeping track of vdpa config, it will be helpful if the vdpa
>>>>>>>>> tool provides the aids for exporting the initial configs as-is,
>>>>>>>>> the way how vdpa device was created. The "vdpa dev show" command
>>>>>>>>> seems to be the right vehicle for that. It is unlike the "vdpa 
>>>>>>>>> dev
>>>>>>>>> config show" command output which usually goes with the live 
>>>>>>>>> value
>>>>>>>>> in the device config space, and is not quite reliable subject to
>>>>>>>>> the dynamics of feature negotiation or possible change by the
>>>>>>>>> driver to the config space.
>>>>>>>>>
>>>>>>>>> Examples:
>>>>>>>>>
>>>>>>>>> 1) Create vDPA by default without any config attribute
>>>>>>>>>
>>>>>>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0
>>>>>>>>> $ vdpa dev show vdpa0
>>>>>>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555
>>>>>>>>> max_vqs 9 max_vq_size 256
>>>>>>>>> $ vdpa dev -jp show vdpa0
>>>>>>>>> {
>>>>>>>>>        "dev": {
>>>>>>>>>            "vdpa0": {
>>>>>>>>>                "type": "network",
>>>>>>>>>                "mgmtdev": "pci/0000:41:04.2",
>>>>>>>>>                "vendor_id": 5555,
>>>>>>>>>                "max_vqs": 9,
>>>>>>>>>                "max_vq_size": 256,
>>>>>>>>>            }
>>>>>>>>>        }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> 2) Create vDPA with config attribute(s) specified
>>>>>>>>>
>>>>>>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 \
>>>>>>>>>        mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>>>>>>>>> $ vdpa dev show
>>>>>>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555
>>>>>>>>> max_vqs 9 max_vq_size 256
>>>>>>>>>      initial_config: mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>>>>>>>>> $ vdpa dev -jp show
>>>>>>>>> {
>>>>>>>>>        "dev": {
>>>>>>>>>            "vdpa0": {
>>>>>>>>>                "type": "network",
>>>>>>>>>                "mgmtdev": "pci/0000:41:04.2",
>>>>>>>>>                "vendor_id": 5555,
>>>>>>>>>                "max_vqs": 9,
>>>>>>>>>                "max_vq_size": 256,
>>>>>>>>>                "initial_config": {
>>>>>>>>>                    "mac": "e4:11:c6:d3:45:f0",
>>>>>>>>>                    "max_vq_pairs": 4
>>>>>>>>>                }
>>>>>>>>>            }
>>>>>>>>>        }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Signed-off-by: Si-Wei Liu<si-wei.liu@oracle.com>
>>>>>>>>> ---
>>>>>>>>>     drivers/vdpa/vdpa.c | 39 
>>>>>>>>> +++++++++++++++++++++++++++++++++++++++
>>>>>>>>>     1 file changed, 39 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
>>>>>>>>> index bebded6..bfb8f54 100644
>>>>>>>>> --- a/drivers/vdpa/vdpa.c
>>>>>>>>> +++ b/drivers/vdpa/vdpa.c
>>>>>>>>> @@ -677,6 +677,41 @@ static int
>>>>>>>>> vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct
>>>>>>>>> genl_info *i
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     static int
>>>>>>>>> +vdpa_dev_initcfg_fill(struct vdpa_device *vdev, struct sk_buff
>>>>>>>>> *msg, u32 device_id)
>>>>>>>>> +{
>>>>>>>>> +       struct vdpa_dev_set_config *cfg = &vdev->init_cfg;
>>>>>>>>> +       int err = -EMSGSIZE;
>>>>>>>>> +
>>>>>>>>> +       if (!cfg->mask)
>>>>>>>>> +               return 0;
>>>>>>>>> +
>>>>>>>>> +       switch (device_id) {
>>>>>>>>> +       case VIRTIO_ID_NET:
>>>>>>>>> +               if ((cfg->mask &
>>>>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR)) != 0 &&
>>>>>>>>> +                   nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
>>>>>>>>> + sizeof(cfg->net.mac), cfg->net.mac))
>>>>>>>>> +                       return err;
>>>>>>>>> +               if ((cfg->mask &
>>>>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) != 0 &&
>>>>>>>>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU,
>>>>>>>>> cfg->net.mtu))
>>>>>>>>> +                       return err;
>>>>>>>>> +               if ((cfg->mask &
>>>>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) != 0 &&
>>>>>>>>> +                   nla_put_u16(msg, 
>>>>>>>>> VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
>>>>>>>>> + cfg->net.max_vq_pairs))
>>>>>>>>> +                       return err;
>>>>>>>>> +               break;
>>>>>>>>> +       default:
>>>>>>>>> +               break;
>>>>>>>>> +       }
>>>>>>>>> +
>>>>>>>>> +       if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) != 0 &&
>>>>>>>>> +           nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES,
>>>>>>>>> + cfg->device_features, VDPA_ATTR_PAD))
>>>>>>>>> +               return err;
>>>>>>>> A question: If any of those above attributes were not provisioned,
>>>>>>>> should we show the ones that are inherited from the parent?
>>>>>>> A simple answer would be yes, but the long answer is that I am not
>>>>>>> sure
>>>>>>> if there's any for the moment - there's no  default value for mtu,
>>>>>>> mac,
>>>>>>> and max_vqp that can be inherited from the parent (max_vqp by 
>>>>>>> default
>>>>>>> being 1 is spec defined, not something inherited from the parent).
>>>>>> Note that it defaults to 1 at the driver level if _F_MQ is not
>>>>>> negotiated. But I think we are talking about something different
>>>>>> that is out of the spec here, what if:
>>>>>>
>>>>>> vDPA inherits _F_MQ but max_vqp is not provisioned via netlink.
>>>>>>
>>>>>> Or is it not allowed?
>>>>> My understanding is that this is not allowed any more since the
>>>>> introduction of the max_vqp attribute. Note, currently we don't have
>>>>> a way for the vendor driver to report the default value for max_vqp,
>>>>
>>>> I think it can be reported in this patch?
>>> Yes, we can add, but I am not sure whether or not this will be
>>> practically useful, for e.g. the same command without max_vqp specified
>>> may render different number of queues across different devices, or
>>> different revisions of the same vendor's devices. Does it complicate 
>>> the
>>> mgmt software even more, I'm not sure....
>> It depends on the use case, e.g if we want to compare the migration
>> compatibility, having a single vdpa command query is much easier than
>> having two or more.
> Yep I agree. I was saying not every attribute would need to be 
> inherited from the parent device. Actually attributes like max_vqp 
> could take the default from some common place, e.g. some default 
> value can be applied by the vdpa core. And we can document these 
> attributes ruled by the vdpa core in the vdpa-dev(8) man page. That 
> saves mgmt software from issuing an extra query command that it 
> actually doesn't need.
>
>>
>>> Could we instead mandate
>>> max_vqp to be 1 from vdpa core level if user doesn't explicitly specify
>>> the value?
>> This seems to be not easy, at least not easy in the vDPA core.
> We can load these default values from vdpa_nl_cmd_dev_add_set_doit() 
> before ops->dev_add is called. I can post a v3 that shows the code, it 
> shouldn't be too hard.


Ok, and I wonder if it's time to move netlink specific code into a 
dedicated file.


>
>>   We can
>> probably document this somewhere but max_vqp is only one example, we
>> have other mq devices like block/SCSI/console.
> Actually max_vqp is a network device specific config to provision mq 
> devices. If the parent mgmtdev supports net vdpa device creation and 
> user requests to provision _F_MQ with no supplied max_vqp value, we 
> should load some global default value there.
>
>>
>>> That way it is more consistent in terms of the resulting
>>> number of queue pairs (=1) with the case where parent device does not
>>> offer the _F_MQ feature.
>> Right, but a corner case is to provision _F_MQ but without max_vqp.
> Yes, I will post the patch that supports this.
>>
>>>>
>>>>> if not otherwise specified in the CLI. Without getting the default
>>>>> value reported in 'vdpa mgmtdev show' level, it'd just confuse mgmt
>>>>> software even more.
>>>>
>>>> Yes, this is something that we need to fix. And what's more in order
>>>> to support dynamic provisioning, we need a way to report the number of
>>>> available instances that could be used for vDPA device provisioning.
>>> Wouldn't it be possible to achieve that by simply checking how many
>>> parent mgmtdev instances don't have vdpa device provisioned yet? e.g.
>>>
>>> inuse=$(vdpa dev show | grep mgmtdev | wc -l)
>>> total=$(vdpa mgmtdev show  | grep "supported_classes" | wc -l )
>>> echo $((total - inuse))
>> I meant how many available vDPA devices that are available for the
>> mgmt to create?
> Oh I see.
>
>>
>> E.g in the case of sub function or simulator a mgmtdev can create more
>> than 1 vdpa devices.
> Does the sub function today support creation of multiple vDPA 
> instances per mgmtdev?


I think so, otherwise SF doesn't make too much sense.


> Something I wasn't aware of before. Is it with different device class?


It should be possible (no limitation in the vdpa core at least). Each 
device class should register its own mgmtdev.


>
>>
>>>>
>>>>>>     At least some time in the past, mlx5 were
>>>>>> enabled with MQ with 8 queue pairs by default.
>>>>> That was the situation when there's no max_vqp attribute support from
>>>>> vdpa netlink API level. I think now every driver honors the vdpa core
>>>>> disposition to get a single queue pair if max_vqp config is missing.
>>>> So we have:
>>>>
>>>> int vdpa_register_device(struct vdpa_device *vdev, int nvqs)
>>>>
>>>> This means technically, parent can allocate a multiqueue devices with
>>>> _F_MQ features if max_vqp and device_features is not provisioned. And
>>>> what's more, what happens if _F_MQ is provisioned by max_vqp is not
>>>> specified?
>>>>
>>>> The question is:
>>>>
>>>> When an attribute is not specified/provisioned via netlink, what's
>>>> the default value? The answer should be consistent: if device_features
>>>> is determined by the parent, we should do the same for max_vqp.
>>> OK I got your point.
>>>
>>>> And it looks to me all of those belongs to the initial config
>>>> (self-contained)
>>> Right. I wonder if we can have vdpa core define the default value (for
>>> e.g. max_vqp=1) for those unspecified attribute (esp. when the
>>> corresponding device feature is offered and provisioned) whenever
>>> possible. That way I think it'll be more consistent for the same
>>> command to get to the same result across different vendor drivers,
>>> while we still keep the possibility for future extension to allow the
>>> driver to override the vdpa core disposition if a real use case
>>> emerges. What do you think?
>> That's possible but we may end up with device specific code in the
>> vDPA core which is not elegant, and the code will grow as the number
>> of supported types grows.
> I guess that's unavoidable as this is already the case today. See the 
> various VIRTIO_ID_NET case switches in the vdpa.c code. I think as long 
> as type-specific code limits itself to the netlink API interfacing 
> layer rather than going down to the driver API, it might be just okay 
> (as that's already the case).
>
>>
>> Note that, max_vqp is not the only attribute that may suffer from
>> this, basically any config field that depends on a specific feature
>> bit may have the same issue.
>>
>>>>
>>>>> And the mlx5_vdpa driver with 8 queue pairs in the wild days is just
>>>>> irrelevant to be manageable by mgmt software, regardless of live
>>>>> migration.
>>>>>>> And
>>>>>>> the device_features if inherited is displayed at 'vdpa dev config
>>>>>>> show'
>>>>>>> output. Can you remind me of a good example for inherited value
>>>>>>> that we
>>>>>>> may want to show here?
>>>>>> Some other cases:
>>>>>>
>>>>>> 1) MTU: there should be something that the device needs to report if
>>>>>> _F_MTU is negotiated even if it is not provisioned from netlink.
>>>>> I am not sure I understand the ask here. Noted the QEMU argument has
>>>>> to offer host_mtu=X with the maximum MTU value for guest to use (and
>>>>> applied as the initial MTU config during virtio-net probing for Linux
>>>>> driver),
>>>>
>>>> Adding Cindy.
>>>>
>>>> I think it's a known issue that we need to do a sanity check to make
>>>> sure cli parameters match what is provisioned from netlink.
>>> Right. How's the plan for QEMU to get to the mtu provisioned by 
>>> netlink,
>>> via a new vhost-vdpa ioctl call?
>> I think netlink is not designed for qemu to use, the design is to
>> expose a vhost device to Qemu.
>>
>>> If so, will  QEMU be able to read it
>>> directly from kernel when it comes to the vhost-vdpa backend, without
>>> having user to specify host_mtu from CLI?
>> I'm not sure I get the question, but Qemu should get this via config
>> space (otherwise it should be a bug).
> It's hard for QEMU to work this way with the existing get_config() ops, 
> I think, as it has assumptions around endianness and feature 
> negotiation; until the latter is done you can't get any reliable value 
> for a provisioned property. I think QEMU needs to validate the 
> provisioned value way earlier (when QEMU is launched), before 
> negotiation kicks in. It would be cleaner to use another vhost ioctl 
> and a new vdpa driver op to retrieve the provisioned feature config 
> values from vendor drivers.
>
>>   And Qemu needs to verify the mtu
>> got from the cli vs the mtu got from vhost and fail the device
>> initialization if they don't match.
> I mean today there's a problem of double provisioning: e.g. the mtu 
> has to be provided first in the 'vdpa dev add' command when 
> provisioning _F_MTU, then in the QEMU CLI the same value has to be 
> supplied to host_mtu. The same applies to the mac address. It would be 
> best if we could allow QEMU to load the provisioned value from the 
> vdpa device directly, without having to provide extra duplicated 
> configs at the QEMU CLI level.


That's the plan. What I want to say is that QEMU should do a sanity 
test to make sure what is provided from the CLI matches what is 
provisioned from the device.


>
>>
>>>>
>>>>> and the way to get the parent device MTU and whether that's relevant
>>>>> to vdpa device's MTU is very vendor specific.
>>>>
>>>> So I think the max MTU of the parent should be equal to the max MTU
>>>> of the vDPA.
>>> Note that here the parent might not necessarily be the mgmtdev that
>>> the vdpa gets created over. It may well end up being the MTU on the PF
>>> (uplink port) that the mgmt software has to concern itself with. My
>>> point is that the utility and tool chain able to derive the maximal
>>> MTU effectively allowed for the vDPA device may live outside of vDPA's
>>> realm. It's a rare or even invalid configuration to have vDPA
>>> configured with a bigger value than the MTU on the uplink port or
>>> parent device. More commonly, when MTU config is involved, it has to
>>> be consistently configured across all the network links along the way,
>>> from the parent device (uplink port) down to the switchdev representor
>>> port, the vdpa device, and the QEMU virtio-net object.
>> Ok, right.
>>
>>>>
>>>>> I think we would need new attribute(s) at the mgmtdev level to
>>>>> support what you want here?
>>>>
>>>> Not sure, but what I want to ask is: if we provision the MTU feature
>>>> but without a max MTU value, do we need to report the initial max MTU
>>>> here?
>>> Yep, maybe. I'm not very sure how useful this will be, to be honest;
>>> it seems a rare case to me to provision the MTU feature without a
>>> specific MTU value. If one cares about MTU, the mgmt software
>>> should configure some mtu through "vdpa dev add ... mtu ...", no?
>> Yes, but this only works if all config fields can be provisioned,
>> which seems not to be the case now; vdpa_dev_set_config is currently a
>> subset of virtio_net_config. So this goes back to the question I
>> raised earlier: is it time to switch to virtio_net_config and
>> allow all fields to be provisioned?
> I don't quite get how it would be useful to switch to 
> virtio_net_config. I thought we could add the missing fields to 
> vdpa_dev_set_config even now to make it match virtio_net_config. 
> Though the reality is there are few vdpa devices that support those 
> features now. If any real device supports a field in 
> virtio_net_config but not in vdpa_dev_set_config, it can be gradually 
> added as needed.
>
>>
>> And even for mtu we're lacking a way to report the maximum MTU allowed
>> by the mgmtdev (e.g. the uplink MTU) via netlink:
> Since MTU is only implemented in mlx5_vdpa by now, except for the 
> simulators, copying Eli to see if this is feasible to implement in a 
> real device. I think we also need to validate that the mtu configured 
> on a vDPA device instance shouldn't exceed the uplink MTU (the maximum 
> MTU allowed).
>
>> 1) report the maximum host mtu supported by the mgmtdev via netlink
>> (not done, so management needs to guess the maximum value now)
>> 2) allow mtu to be provisioned (done)
>> 3) show initial mtu (done by this patch)
> So I wonder, is it fine for the vdpa core to come up with a default 
> value for MTU when the _F_MTU feature is to be provisioned or inherited?


It should not be easy since it depends on the parent.


> If we mandate that each vDPA vendor support at least the standard 1500 
> MTU for the _F_MTU feature, we can make it default to 1500.
>
> Otherwise the MTU has to be taken (inherited) from the parent device. 
> Unfortunately, right now for mlx5_vdpa, the parent mgmtdev device has 
> a 1500 MTU by default regardless of the MTU on the uplink port, and I'm 
> not sure it's the right model to enforce that the mgmtdev go with the 
> uplink port's MTU. I would need to hear what vendors say about this 
> requirement.
>
>>
>> We probably need to do the above for all fields to be self-contained.
> Agreed on the part of being self-contained.
>
>>
>>> On the other hand, no mtu value specified may mean "go with what the
>>> uplink port or parent device has". I think this is a pretty useful case
>>> if the vendor's NIC supports updating the MTU on the fly without having
>>> to tear down QEMU and reconfigure vdpa. I'm not sure if we'd end up
>>> killing this use case by limiting the initial max MTU to a fixed value.
>>>
>>>>
>>>>>> 2) device_features: if device_features is not provisioned, we should
>>>>>> still report it via netlink here
>>>>> Not the way I expected it, but with Lingshan's series to expose
>>>>> fields out of FEATURES_OK, the device_features is now reported
>>>>> through 'vdpa dev config show' regardless of whether it was
>>>>> specified or not, if I am not mistaken?
>>>>
>>>> Yes.
>>> Do you want me to relocate it to 'vdpa dev show', or is it okay to
>>> leave it there?
>> It's probably too late for the relocation, but I feel it's better to
>> place all the initial/inherited attributes in a single command, even
>> if some of them are already somewhere in another command; we can
>> hear from others, though.
> Ok, that'll be fine. I suppose mgmt software should only query 
> through "mgmtdev show" or "dev show", avoiding any query via "dev 
> config show". It'd be best to get all of the compatibility-related 
> info consolidated in one single place. Let me try to include it in 
> "dev show".
>
>>
>>>>
>>>>> Currently we export the config attributes upon vdpa creation under
>>>>> the "initial_config" key. If we want to expose more default values
>>>>> inherited from mgmtdev, I think we can wrap up these default values
>>>>> under another key "inherited_config" to display in 'vdpa dev show'
>>>>> output. Does it fit what you have in mind?
>>>>
>>>> I wonder if it's better to merge those two, or is there any advantages
>>>> of splitting them?
>>> I think for the most part "initial_config" will be sufficient for
>>> those config attributes with "vdpa dev add" equivalents, be it user
>>> specified, a vdpa-enforced default in the absence of user input, or a
>>> default overridden by the parent device. "inherited_config" will be
>>> useful for the configs with no "vdpa dev add" equivalent, or that live
>>> outside of the vdpa tool, but are still important for mgmt software to
>>> replicate an identical vdpa setup. Like max-supported-mtu (for the
>>> uplink port or parent device), effective-link-speed,
>>> effective-link-status et al. Let's see if there's more when we get
>>> there.
>> So one point I can see is that, if there's no difference from the
>> userspace perspective, we'd better merge them. And I don't see any
>> difference between the initial versus the inherited from the view of
>> user space. Do you?
> So the major difference is that "initial_config" is settable and 
> equivalent to the config attributes in the "vdpa dev add" command, 
> while "inherited_config" holds the read-only fields from "mgmtdev 
> show" that do not correspond to any "vdpa dev add" attribute. That way 
> the mgmt software can use "initial_config" directly to recreate a vdpa 
> device with an identical device config, while using "inherited_config" 
> to replicate the other configs outside of vdpa, e.g. setting the 
> uplink port's MTU to 9000. Maybe there's no need to fold such info 
> into an "inherited_config" key, though? I just want to make it 
> relevant to migration compatibility. Any suggestion for the name or 
> layout?


As stated above, I think a single key would be better since I don't see 
a reason users would need to differentiate them.
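
For illustration, a merged single-key layout could look like the
following, extending the JSON examples from the cover letter. The
output is hypothetical: the "initial_config" key is the one proposed in
this series, while the "max_mtu" entry and the exact attribute names
are assumptions about what an inherited value might look like once
folded in.

```
$ vdpa dev -jp show vdpa0
{
    "dev": {
        "vdpa0": {
            "type": "network",
            "mgmtdev": "pci/0000:41:04.2",
            "vendor_id": 5555,
            "max_vqs": 9,
            "max_vq_size": 256,
            "initial_config": {
                "mac": "e4:11:c6:d3:45:f0",
                "mtu": 9000,
                "max_mtu": 9216
            }
        }
    }
}
```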

Thanks


>
>
> Thanks,
> -Siwei
>
>>
>> Thanks
>>
>>> Thanks,
>>> -Siwei
>>>
>>>>
>>>>>> or do you mean the mgmt software can assume it should be the same
>>>>>> as the mgmtdev's? Anyhow, if we don't show device_features when it
>>>>>> is not provisioned, it will complicate the mgmt software.
>>>>> Yes, as I said earlier: since the device_features attr got added
>>>>> to the 'vdpa dev config show' command, this divergence has started
>>>>> to complicate mgmt software already.
>>>>>
>>>>> Thanks,
>>>>
>>>> Thanks
>>>>
>>>>
>>>>> -Siwei
>>>>>> Thanks
>>>>>>
>>>>>>> Thanks,
>>>>>>> -Siwei
>>>>>>>
>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>> +
>>>>>>>>> +       return 0;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static int
>>>>>>>>>     vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg,
>>>>>>>>> u32 portid, u32 seq,
>>>>>>>>>                  int flags, struct netlink_ext_ack *extack)
>>>>>>>>>     {
>>>>>>>>> @@ -715,6 +750,10 @@ static int
>>>>>>>>> vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct
>>>>>>>>> genl_info *i
>>>>>>>>>            if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE,
>>>>>>>>> min_vq_size))
>>>>>>>>>                    goto msg_err;
>>>>>>>>>
>>>>>>>>> +       err = vdpa_dev_initcfg_fill(vdev, msg, device_id);
>>>>>>>>> +       if (err)
>>>>>>>>> +               goto msg_err;
>>>>>>>>> +
>>>>>>>>>            genlmsg_end(msg, hdr);
>>>>>>>>>            return 0;
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> 1.8.3.1
>>>>>>>>>
>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show" output
  2022-12-19  6:31                     ` Michael S. Tsirkin
@ 2022-12-21  0:14                       ` Si-Wei Liu
  -1 siblings, 0 replies; 42+ messages in thread
From: Si-Wei Liu @ 2022-12-21  0:14 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Cindy Lu, linux-kernel, virtualization, Eli Cohen



On 12/18/2022 10:31 PM, Michael S. Tsirkin wrote:
> On Fri, Oct 28, 2022 at 04:23:49PM -0700, Si-Wei Liu wrote:
>> I can post a v3 that shows the code; it shouldn't be
>> too hard.
> I take it another version of this patchset is planned?
>
Yes, I saw you just merged "vdpa: merge functionally duplicated 
dev_features attributes". A v4 will be posted on top of that patch, stay 
tuned.

Thanks!
-Siwei
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 0/4] vDPA: initial config export via "vdpa dev show"
  2022-10-21 22:43 ` Si-Wei Liu
@ 2023-01-27  8:16   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2023-01-27  8:16 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: jasowang, parav, virtualization, linux-kernel

Did you say you are going to post v4 of this?
I'm dropping this from review for now.

-- 
MST


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 0/4] vDPA: initial config export via "vdpa dev show"
  2023-01-27  8:16   ` Michael S. Tsirkin
@ 2023-01-30 21:05     ` Si-Wei Liu
  -1 siblings, 0 replies; 42+ messages in thread
From: Si-Wei Liu @ 2023-01-30 21:05 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: jasowang, parav, virtualization, linux-kernel

Apologies, I was overbooked with multiple things in parallel, and 
urgent internal priorities popped up at times over the past few weeks. 
On the other hand, there was brokenness or incompleteness identified 
around features provisioning while this series was being developed, 
which made it grow a bit larger than v3. If you are eager to see the 
patches posted I can split off the series. I will send out a 
prerequisite patchset for this series shortly.

Thanks for your patience,
-Siwei

On 1/27/2023 12:16 AM, Michael S. Tsirkin wrote:
> Did you say you are going to post v4 of this?
> I'm dropping this from review for now.
>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 0/4] vDPA: initial config export via "vdpa dev show"
  2023-01-30 21:05     ` Si-Wei Liu
@ 2023-01-30 21:59       ` Si-Wei Liu
  -1 siblings, 0 replies; 42+ messages in thread
From: Si-Wei Liu @ 2023-01-30 21:59 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: jasowang, parav, virtualization, linux-kernel



On 1/30/2023 1:05 PM, Si-Wei Liu wrote:
> Apologies, I was over booked for multiple things in parallel, and 
> there had been urgent internal priorities popped up at times for the 
> past few weeks. On the other hand, there were brokenness or 
> incompleteness identified around features provisioning while this 
> series was being developed, which makes it grow a bit larger than v3. 
> If you are eager to see patches posted I can split off the series. I 
> will send out a prerequisite patchset for this series shortly.
Patches sent. Please review:

https://lore.kernel.org/virtualization/1675110643-28143-1-git-send-email-si-wei.liu@oracle.com/T/#t


>
> Thanks for your patience,
> -Siwei
>
> On 1/27/2023 12:16 AM, Michael S. Tsirkin wrote:
>> Did you say you are going to post v4 of this?
>> I'm dropping this from review for now.
>>
>
-Siwei

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2023-01-30 21:59 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-21 22:43 [PATCH v3 0/4] vDPA: initial config export via "vdpa dev show" Si-Wei Liu
2022-10-21 22:43 ` Si-Wei Liu
2022-10-21 22:43 ` [PATCH v3 1/4] vdpa: save vdpa_dev_set_config in struct vdpa_device Si-Wei Liu
2022-10-21 22:43   ` Si-Wei Liu
2022-10-24  8:43   ` Jason Wang
2022-10-24  8:43     ` Jason Wang
2022-10-21 22:43 ` [PATCH v3 2/4] vdpa: pass initial config to _vdpa_register_device() Si-Wei Liu
2022-10-21 22:43   ` Si-Wei Liu
2022-10-21 22:43 ` [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show" output Si-Wei Liu
2022-10-21 22:43   ` Si-Wei Liu
2022-10-24  8:40   ` Jason Wang
2022-10-24  8:40     ` Jason Wang
2022-10-24 19:14     ` Si-Wei Liu
2022-10-24 19:14       ` Si-Wei Liu
2022-10-25  2:24       ` Jason Wang
2022-10-25  2:24         ` Jason Wang
2022-10-26  1:10         ` Si-Wei Liu
2022-10-26  4:44           ` Jason Wang
2022-10-26  4:44             ` Jason Wang
2022-10-27  6:31             ` Si-Wei Liu
2022-10-27  6:31               ` Si-Wei Liu
2022-10-27  8:47               ` Jason Wang
2022-10-27  8:47                 ` Jason Wang
2022-10-28 23:23                 ` Si-Wei Liu
2022-10-28 23:23                   ` Si-Wei Liu
2022-10-30 13:36                   ` Eli Cohen
2022-12-19  6:31                   ` Michael S. Tsirkin
2022-12-19  6:31                     ` Michael S. Tsirkin
2022-12-21  0:14                     ` Si-Wei Liu
2022-12-21  0:14                       ` Si-Wei Liu
2022-12-20  7:58                   ` Jason Wang
2022-12-20  7:58                     ` Jason Wang
2022-10-21 22:43 ` [PATCH v3 4/4] vdpa: fix improper error message when adding vdpa dev Si-Wei Liu
2022-10-21 22:43   ` Si-Wei Liu
2022-10-24  8:43   ` Jason Wang
2022-10-24  8:43     ` Jason Wang
2023-01-27  8:16 ` [PATCH v3 0/4] vDPA: initial config export via "vdpa dev show" Michael S. Tsirkin
2023-01-27  8:16   ` Michael S. Tsirkin
2023-01-30 21:05   ` Si-Wei Liu
2023-01-30 21:05     ` Si-Wei Liu
2023-01-30 21:59     ` Si-Wei Liu
2023-01-30 21:59       ` Si-Wei Liu
