* [PATCH 0/6] VFIO mdev aggregated resources handling
@ 2019-10-24  5:08 Zhenyu Wang
  2019-10-24  5:08 ` [PATCH 1/6] vfio/mdev: Add new "aggregate" parameter for mdev create Zhenyu Wang
                   ` (7 more replies)
  0 siblings, 8 replies; 32+ messages in thread
From: Zhenyu Wang @ 2019-10-24  5:08 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, kwankhede, kevin.tian, cohuck

Hi,

This is a refresh of a previous send of this series. I had the impression that
some SIOV drivers would still deploy their own create and config methods, so I
stopped the effort on this. But it seems this would still be useful for other
SIOV drivers which may simply want the capability to aggregate resources, so
here's the refreshed series.

The current mdev device create interface depends on fixed mdev types, which
take a uuid from the user to create an mdev device instance. If the user wants
a customized number of resources for an mdev device, the only way is to create
a new mdev type for it, which is not flexible. This requirement comes not only
from being able to allocate flexible resources for KVMGT, but also from Intel
Scalable I/O Virtualization, which would use vfio/mdev to allocate arbitrary
resources for an mdev instance. More info in [1] [2] [3].

To allow creating user-defined resources for mdev, this series extends the
mdev create interface with a new "aggregate=xxx" parameter following the UUID.
For a target mdev type that supports aggregation, this creates a new mdev
device whose resources are combined from the given number of instances, e.g.

    echo "<uuid>,aggregate=10" > create

A VM manager, e.g. libvirt, can check the mdev type for an "aggregation"
attribute, which indicates support for this setting. If no "aggregation"
attribute is found for the mdev type, the previous one-instance allocation
behavior is kept. A new sysfs attribute "aggregated_instances" is created for
each mdev device to show the allocated number.
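For reference, a management-tool flow over these attributes might look like the
sketch below. It runs against a throwaway fake directory tree, since the real
sysfs path (something like /sys/devices/.../mdev_supported_types/<type-id>) is
only an assumption here and not part of this series:

```python
# Sketch of the management flow described above, exercised against a
# fake "type" directory standing in for a real mdev type in sysfs.
import os
import tempfile

type_dir = tempfile.mkdtemp()
with open(os.path.join(type_dir, 'aggregation'), 'w') as f:
    f.write('4\n')  # pretend the type reports support for aggregating up to 4

uuid = '83b8f4f2-509f-382f-3c1e-e6bfe0fa1001'
create = os.path.join(type_dir, 'create')
if os.path.exists(os.path.join(type_dir, 'aggregation')):
    # Type advertises aggregation: request 4 combined instances.
    with open(create, 'w') as f:
        f.write(uuid + ',aggregate=4\n')
else:
    # No "aggregation" attribute: fall back to the classic one-instance create.
    with open(create, 'w') as f:
        f.write(uuid + '\n')
```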

References:
[1] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
[2] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
[3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf

Zhenyu Wang (6):
  vfio/mdev: Add new "aggregate" parameter for mdev create
  vfio/mdev: Add "aggregation" attribute for supported mdev type
  vfio/mdev: Add "aggregated_instances" attribute for supported mdev
    device
  Documentation/driver-api/vfio-mediated-device.rst: Update for
    vfio/mdev aggregation support
  Documentation/ABI/testing/sysfs-bus-vfio-mdev: Update for vfio/mdev
    aggregation support
  drm/i915/gvt: Add new type with aggregation support

 Documentation/ABI/testing/sysfs-bus-vfio-mdev | 24 ++++++
 .../driver-api/vfio-mediated-device.rst       | 23 ++++++
 drivers/gpu/drm/i915/gvt/gvt.c                |  4 +-
 drivers/gpu/drm/i915/gvt/gvt.h                | 11 ++-
 drivers/gpu/drm/i915/gvt/kvmgt.c              | 53 ++++++++++++-
 drivers/gpu/drm/i915/gvt/vgpu.c               | 56 ++++++++++++-
 drivers/vfio/mdev/mdev_core.c                 | 36 ++++++++-
 drivers/vfio/mdev/mdev_private.h              |  6 +-
 drivers/vfio/mdev/mdev_sysfs.c                | 79 ++++++++++++++++++-
 include/linux/mdev.h                          | 19 +++++
 10 files changed, 294 insertions(+), 17 deletions(-)

-- 
2.24.0.rc0



* [PATCH 1/6] vfio/mdev: Add new "aggregate" parameter for mdev create
  2019-10-24  5:08 [PATCH 0/6] VFIO mdev aggregated resources handling Zhenyu Wang
@ 2019-10-24  5:08 ` Zhenyu Wang
  2019-10-24  5:08 ` [PATCH 2/6] vfio/mdev: Add "aggregation" attribute for supported mdev type Zhenyu Wang
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Zhenyu Wang @ 2019-10-24  5:08 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, kwankhede, kevin.tian, cohuck

For special mdev types which can aggregate instances for an mdev device,
this extends the mdev create interface by allowing an extra "aggregate=xxx"
parameter. The value is passed to the mdev device model so it can create
the bundled number of instances for the target mdev device.
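As a userspace-side illustration (not kernel code), the buffer format the
extended create attribute accepts can be modeled as below; the helper and
regex are ours, only the "<uuid>[,aggregate=N]" syntax comes from the patch:

```python
# Model of the create-buffer syntax this patch accepts:
#   "<36-char uuid>"  or  "<36-char uuid>,aggregate=<N>"
import re

UUID_RE = re.compile(r'^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$')

def parse_create_buf(buf):
    uuid, sep, param = buf.strip().partition(',')
    if not UUID_RE.match(uuid):
        raise ValueError('invalid uuid')
    instances = 1  # default: one instance, as before
    if sep:
        if not param.startswith('aggregate='):
            raise ValueError('unknown parameter')
        # base 0, mirroring kstrtouint(..., 0, ...) in the kernel code
        instances = int(param[len('aggregate='):], 0)
    return uuid, instances
```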

Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
---
v2: create new create_with_instances operator for vendor driver
v3:
- Change parameter name as "aggregate="
- Fix new interface comments.
- Parameter checking for new option, pass UUID string only to
  parse and properly end parameter for kstrtouint() conversion.
v4:
- rebase
- just call create_with_instances if exists, otherwise call create

 drivers/vfio/mdev/mdev_core.c    | 17 +++++++++++++++--
 drivers/vfio/mdev/mdev_private.h |  4 +++-
 drivers/vfio/mdev/mdev_sysfs.c   | 27 +++++++++++++++++++++++----
 include/linux/mdev.h             | 11 +++++++++++
 4 files changed, 52 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index b558d4cfd082..4926a99f664d 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -270,7 +270,8 @@ static void mdev_device_release(struct device *dev)
 }
 
 int mdev_device_create(struct kobject *kobj,
-		       struct device *dev, const guid_t *uuid)
+		       struct device *dev, const guid_t *uuid,
+		       unsigned int instances)
 {
 	int ret;
 	struct mdev_device *mdev, *tmp;
@@ -281,6 +282,13 @@ int mdev_device_create(struct kobject *kobj,
 	if (!parent)
 		return -EINVAL;
 
+	if (instances > 1 &&
+	    !parent->ops->create_with_instances) {
+		dev_warn(dev, "Non-supported aggregate instances create\n");
+		ret = -EINVAL;
+		goto mdev_fail;
+	}
+
 	mutex_lock(&mdev_list_lock);
 
 	/* Check for duplicate */
@@ -319,8 +327,13 @@ int mdev_device_create(struct kobject *kobj,
 	dev_set_name(&mdev->dev, "%pUl", uuid);
 	mdev->dev.groups = parent->ops->mdev_attr_groups;
 	mdev->type_kobj = kobj;
+	mdev->type_instances = instances;
 
-	ret = parent->ops->create(kobj, mdev);
+	if (parent->ops->create_with_instances)
+		ret = parent->ops->create_with_instances(kobj, mdev,
+							 instances);
+	else
+		ret = parent->ops->create(kobj, mdev);
 	if (ret)
 		goto ops_create_fail;
 
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index 7d922950caaf..56cbe9ea8817 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -33,6 +33,7 @@ struct mdev_device {
 	struct kobject *type_kobj;
 	struct device *iommu_device;
 	bool active;
+	unsigned int type_instances;
 };
 
 #define to_mdev_device(dev)	container_of(dev, struct mdev_device, dev)
@@ -58,7 +59,8 @@ int  mdev_create_sysfs_files(struct device *dev, struct mdev_type *type);
 void mdev_remove_sysfs_files(struct device *dev, struct mdev_type *type);
 
 int  mdev_device_create(struct kobject *kobj,
-			struct device *dev, const guid_t *uuid);
+			struct device *dev, const guid_t *uuid,
+			unsigned int instances);
 int  mdev_device_remove(struct device *dev);
 
 #endif /* MDEV_PRIVATE_H */
diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index 7570c7602ab4..6c2693dd4022 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -48,17 +48,27 @@ static const struct sysfs_ops mdev_type_sysfs_ops = {
 	.store = mdev_type_attr_store,
 };
 
+#define MDEV_CREATE_BUF 4096
 static ssize_t create_store(struct kobject *kobj, struct device *dev,
 			    const char *buf, size_t count)
 {
-	char *str;
+	char *str, *param;
 	guid_t uuid;
 	int ret;
+	unsigned int instances = 1;
 
-	if ((count < UUID_STRING_LEN) || (count > UUID_STRING_LEN + 1))
+	if (count < UUID_STRING_LEN)
 		return -EINVAL;
+	if (count > MDEV_CREATE_BUF - 1)
+		return -E2BIG;
 
-	str = kstrndup(buf, count, GFP_KERNEL);
+	if ((param = strnchr(buf, count, ',')) == NULL) {
+		if (count > UUID_STRING_LEN + 1)
+			return -EINVAL;
+	} else if (param - buf != UUID_STRING_LEN)
+		return -EINVAL;
+
+	str = kstrndup(buf, UUID_STRING_LEN, GFP_KERNEL);
 	if (!str)
 		return -ENOMEM;
 
@@ -67,7 +77,16 @@ static ssize_t create_store(struct kobject *kobj, struct device *dev,
 	if (ret)
 		return ret;
 
-	ret = mdev_device_create(kobj, dev, &uuid);
+	if (param) {
+		param++;
+		if (strncmp(param, "aggregate=", 10))
+			return -EINVAL;
+		param += 10;
+		if (kstrtouint(param, 0, &instances))
+			return -EINVAL;
+	}
+
+	ret = mdev_device_create(kobj, dev, &uuid, instances);
 	if (ret)
 		return ret;
 
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index 0ce30ca78db0..0dbb7ec27009 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -42,6 +42,14 @@ struct device *mdev_get_iommu_device(struct device *dev);
  *			@mdev: mdev_device structure on of mediated device
  *			      that is being created
  *			Returns integer: success (0) or error (< 0)
+ * @create_with_instances: Allocate aggregated instances' resources in parent device's
+ *			driver for a particular mediated device. Optional if aggregated
+ *			resources are not supported.
+ *			@kobj: kobject of type for which 'create' is called.
+ *			@mdev: mdev_device structure of mediated device
+ *			      that is being created
+ *			@instances: number of instances to aggregate
+ *			Returns integer: success (0) or error (< 0)
  * @remove:		Called to free resources in parent device's driver for a
  *			a mediated device. It is mandatory to provide 'remove'
  *			ops.
@@ -82,6 +90,9 @@ struct mdev_parent_ops {
 	struct attribute_group **supported_type_groups;
 
 	int     (*create)(struct kobject *kobj, struct mdev_device *mdev);
+	int     (*create_with_instances)(struct kobject *kobj,
+					 struct mdev_device *mdev,
+					 unsigned int instances);
 	int     (*remove)(struct mdev_device *mdev);
 	int     (*open)(struct mdev_device *mdev);
 	void    (*release)(struct mdev_device *mdev);
-- 
2.24.0.rc0



* [PATCH 2/6] vfio/mdev: Add "aggregation" attribute for supported mdev type
  2019-10-24  5:08 [PATCH 0/6] VFIO mdev aggregated resources handling Zhenyu Wang
  2019-10-24  5:08 ` [PATCH 1/6] vfio/mdev: Add new "aggregate" parameter for mdev create Zhenyu Wang
@ 2019-10-24  5:08 ` Zhenyu Wang
  2019-10-27  6:24     ` kbuild test robot
  2019-10-27  6:24     ` kbuild test robot
  2019-10-24  5:08 ` [PATCH 3/6] vfio/mdev: Add "aggregated_instances" attribute for supported mdev device Zhenyu Wang
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 32+ messages in thread
From: Zhenyu Wang @ 2019-10-24  5:08 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, kwankhede, kevin.tian, cohuck

For mdev drivers that support creating aggregated devices, this adds a new
"aggregation" attribute for the target type, which shows the maximum
number of instance resources that can be aggregated.
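From userspace, a missing "aggregation" attribute can simply be treated as a
maximum of 1. A hedged sketch against a fake type directory (a real path would
be /sys/.../mdev_supported_types/<type-id>):

```python
# Userspace-side sketch: read the per-type "aggregation" attribute if it
# exists, otherwise assume single-instance allocation only.
import os
import tempfile

type_dir = tempfile.mkdtemp()  # stand-in for an mdev type directory
with open(os.path.join(type_dir, 'aggregation'), 'w') as f:
    f.write('12\n')  # pretend the driver reports a maximum of 12

attr = os.path.join(type_dir, 'aggregation')
if os.path.exists(attr):
    with open(attr) as f:
        max_instances = int(f.read())
else:
    max_instances = 1  # no attribute: driver doesn't support aggregation

print('max aggregatable instances:', max_instances)
```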

Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
---
 drivers/vfio/mdev/mdev_core.c    | 19 +++++++++++++++++++
 drivers/vfio/mdev/mdev_private.h |  2 ++
 drivers/vfio/mdev/mdev_sysfs.c   | 27 +++++++++++++++++++++++++++
 include/linux/mdev.h             |  8 ++++++++
 4 files changed, 56 insertions(+)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 4926a99f664d..f8687893bff8 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -131,6 +131,25 @@ static int mdev_device_remove_cb(struct device *dev, void *data)
 	return 0;
 }
 
+int mdev_max_aggregated_instances(struct kobject *kobj, struct device *dev,
+				  unsigned int *m)
+{
+	struct mdev_parent *parent;
+	struct mdev_type *type = to_mdev_type(kobj);
+	int ret;
+
+	parent = mdev_get_parent(type->parent);
+	if (!parent)
+		return -EINVAL;
+
+	if (parent->ops->max_aggregated_instances)
+		ret = parent->ops->max_aggregated_instances(kobj, dev, m);
+	else
+		ret = -EINVAL;
+	mdev_put_parent(parent);
+	return ret;
+}
+
 /*
  * mdev_register_device : Register a device
  * @dev: device structure representing parent device.
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index 56cbe9ea8817..5dcbd00f3a46 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -63,4 +63,6 @@ int  mdev_device_create(struct kobject *kobj,
 			unsigned int instances);
 int  mdev_device_remove(struct device *dev);
 
+int  mdev_max_aggregated_instances(struct kobject *kobj, struct device *dev,
+				   unsigned int *m);
 #endif /* MDEV_PRIVATE_H */
diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index 6c2693dd4022..acd3ec2900b5 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -95,6 +95,18 @@ static ssize_t create_store(struct kobject *kobj, struct device *dev,
 
 MDEV_TYPE_ATTR_WO(create);
 
+static ssize_t
+aggregation_show(struct kobject *kobj, struct device *dev, char *buf)
+{
+	unsigned int m;
+
+	if (mdev_max_aggregated_instances(kobj, dev, &m))
+		return sprintf(buf, "1\n");
+	else
+		return sprintf(buf, "%u\n", m);
+}
+MDEV_TYPE_ATTR_RO(aggregation);
+
 static void mdev_type_release(struct kobject *kobj)
 {
 	struct mdev_type *type = to_mdev_type(kobj);
@@ -137,6 +149,14 @@ static struct mdev_type *add_mdev_supported_type(struct mdev_parent *parent,
 	if (ret)
 		goto attr_create_failed;
 
+	if (parent->ops->create_with_instances &&
+	    parent->ops->max_aggregated_instances) {
+		ret = sysfs_create_file(&type->kobj,
+					&mdev_type_attr_aggregation.attr);
+		if (ret)
+			goto attr_aggregate_failed;
+	}
+
 	type->devices_kobj = kobject_create_and_add("devices", &type->kobj);
 	if (!type->devices_kobj) {
 		ret = -ENOMEM;
@@ -157,6 +177,8 @@ static struct mdev_type *add_mdev_supported_type(struct mdev_parent *parent,
 attrs_failed:
 	kobject_put(type->devices_kobj);
 attr_devices_failed:
+	sysfs_remove_file(&type->kobj, &mdev_type_attr_aggregation.attr);
+attr_aggregate_failed:
 	sysfs_remove_file(&type->kobj, &mdev_type_attr_create.attr);
 attr_create_failed:
 	kobject_del(&type->kobj);
@@ -166,9 +188,14 @@ static struct mdev_type *add_mdev_supported_type(struct mdev_parent *parent,
 
 static void remove_mdev_supported_type(struct mdev_type *type)
 {
+	struct mdev_parent *parent = type->parent;
+
 	sysfs_remove_files(&type->kobj,
 			   (const struct attribute **)type->group->attrs);
 	kobject_put(type->devices_kobj);
+	if (parent->ops->create_with_instances &&
+	    parent->ops->max_aggregated_instances)
+		sysfs_remove_file(&type->kobj, &mdev_type_attr_aggregation.attr);
 	sysfs_remove_file(&type->kobj, &mdev_type_attr_create.attr);
 	kobject_del(&type->kobj);
 	kobject_put(&type->kobj);
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index 0dbb7ec27009..6808f24286dc 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -50,6 +50,11 @@ struct device *mdev_get_iommu_device(struct device *dev);
  *			      that is being created
  *                      @instances: number of instances to aggregate
  *			Returns integer: success (0) or error (< 0)
+ * @max_aggregated_instances: Return max number for aggregated resources
+ *			@kobj: kobject of type
+ *			@dev: mdev parent device for target type
+ *			@max: return max number of instances which can be aggregated
+ *			Returns integer: success (0) or error (< 0)
  * @remove:		Called to free resources in parent device's driver for a
  *			a mediated device. It is mandatory to provide 'remove'
  *			ops.
@@ -93,6 +98,9 @@ struct mdev_parent_ops {
 	int     (*create_with_instances)(struct kobject *kobj,
 					 struct mdev_device *mdev,
 					 unsigned int instances);
+	int     (*max_aggregated_instances)(struct kobject *kobj,
+					    struct device *dev,
+					    unsigned int *max);
 	int     (*remove)(struct mdev_device *mdev);
 	int     (*open)(struct mdev_device *mdev);
 	void    (*release)(struct mdev_device *mdev);
-- 
2.24.0.rc0



* [PATCH 3/6] vfio/mdev: Add "aggregated_instances" attribute for supported mdev device
  2019-10-24  5:08 [PATCH 0/6] VFIO mdev aggregated resources handling Zhenyu Wang
  2019-10-24  5:08 ` [PATCH 1/6] vfio/mdev: Add new "aggregate" parameter for mdev create Zhenyu Wang
  2019-10-24  5:08 ` [PATCH 2/6] vfio/mdev: Add "aggregation" attribute for supported mdev type Zhenyu Wang
@ 2019-10-24  5:08 ` Zhenyu Wang
  2019-10-24  5:08 ` [PATCH 4/6] Documentation/driver-api/vfio-mediated-device.rst: Update for vfio/mdev aggregation support Zhenyu Wang
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Zhenyu Wang @ 2019-10-24  5:08 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, kwankhede, kevin.tian, cohuck

For mdev devices created with the "aggregate" parameter, this adds a new mdev
device attribute "aggregated_instances" to show the number of aggregated
instances allocated.

Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
---
v2:
- change attribute name as "aggregated_instances"
v3:
- create only for aggregated allocation
v4:
- fix remove

 drivers/vfio/mdev/mdev_sysfs.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index acd3ec2900b5..f131480a767a 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -289,6 +289,17 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
 
 static DEVICE_ATTR_WO(remove);
 
+static ssize_t
+aggregated_instances_show(struct device *dev,
+			  struct device_attribute *attr,
+			  char *buf)
+{
+	struct mdev_device *mdev = to_mdev_device(dev);
+	return sprintf(buf, "%u\n", mdev->type_instances);
+}
+
+static DEVICE_ATTR_RO(aggregated_instances);
+
 static const struct attribute *mdev_device_attrs[] = {
 	&dev_attr_remove.attr,
 	NULL,
@@ -296,6 +307,7 @@ static const struct attribute *mdev_device_attrs[] = {
 
 int  mdev_create_sysfs_files(struct device *dev, struct mdev_type *type)
 {
+	struct mdev_device *mdev = to_mdev_device(dev);
 	int ret;
 
 	ret = sysfs_create_link(type->devices_kobj, &dev->kobj, dev_name(dev));
@@ -310,8 +322,17 @@ int  mdev_create_sysfs_files(struct device *dev, struct mdev_type *type)
 	if (ret)
 		goto create_files_failed;
 
+	if (mdev->type_instances > 1) {
+		ret = sysfs_create_file(&dev->kobj,
+					&dev_attr_aggregated_instances.attr);
+		if (ret)
+			goto create_aggregated_failed;
+	}
+
 	return ret;
 
+create_aggregated_failed:
+	sysfs_remove_files(&dev->kobj, mdev_device_attrs);
 create_files_failed:
 	sysfs_remove_link(&dev->kobj, "mdev_type");
 type_link_failed:
@@ -321,6 +342,10 @@ int  mdev_create_sysfs_files(struct device *dev, struct mdev_type *type)
 
 void mdev_remove_sysfs_files(struct device *dev, struct mdev_type *type)
 {
+	struct mdev_device *mdev = to_mdev_device(dev);
+	if (mdev->type_instances > 1)
+		sysfs_remove_file(&dev->kobj,
+				  &dev_attr_aggregated_instances.attr);
 	sysfs_remove_files(&dev->kobj, mdev_device_attrs);
 	sysfs_remove_link(&dev->kobj, "mdev_type");
 	sysfs_remove_link(type->devices_kobj, dev_name(dev));
-- 
2.24.0.rc0



* [PATCH 4/6] Documentation/driver-api/vfio-mediated-device.rst: Update for vfio/mdev aggregation support
  2019-10-24  5:08 [PATCH 0/6] VFIO mdev aggregated resources handling Zhenyu Wang
                   ` (2 preceding siblings ...)
  2019-10-24  5:08 ` [PATCH 3/6] vfio/mdev: Add "aggregated_instances" attribute for supported mdev device Zhenyu Wang
@ 2019-10-24  5:08 ` Zhenyu Wang
  2019-10-24  5:08 ` [PATCH 5/6] Documentation/ABI/testing/sysfs-bus-vfio-mdev: " Zhenyu Wang
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Zhenyu Wang @ 2019-10-24  5:08 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, kwankhede, kevin.tian, cohuck

Update the vfio/mdev doc for the new "aggregate" create parameter, the new
"aggregation" type attribute and the "aggregated_instances" attribute for mdev devices.

Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
---
 .../driver-api/vfio-mediated-device.rst       | 23 +++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
index 25eb7d5b834b..2881be654516 100644
--- a/Documentation/driver-api/vfio-mediated-device.rst
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -216,6 +216,7 @@ Directories and files under the sysfs for Each Physical Device
   |          |--- available_instances
   |          |--- device_api
   |          |--- description
+  |          |--- <aggregation>
   |          |--- [devices]
 
 * [mdev_supported_types]
@@ -260,6 +261,21 @@ Directories and files under the sysfs for Each Physical Device
   This attribute should show brief features/description of the type. This is
   optional attribute.
 
+* <aggregation>
+
+  <aggregation> is an optional attribute that shows the maximum number
+  of [<type-id>] instance resources that can be aggregated and assigned
+  to one mdev device. A missing <aggregation> attribute means the driver
+  doesn't support aggregating instance resources for one mdev device.
+
+  Set the number of instances by appending the "aggregate=N" parameter
+  to the create attribute. Without the "aggregate=N" parameter, one
+  instance is created as before.
+
+Example::
+
+	# echo "<uuid>,aggregate=N" > create
+
 Directories and Files Under the sysfs for Each mdev Device
 ----------------------------------------------------------
 
@@ -268,6 +284,7 @@ Directories and Files Under the sysfs for Each mdev Device
   |- [parent phy device]
   |--- [$MDEV_UUID]
          |--- remove
+         |--- <aggregated_instances>
          |--- mdev_type {link to its type}
          |--- vendor-specific-attributes [optional]
 
@@ -281,6 +298,12 @@ Example::
 
 	# echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove
 
+* <aggregated_instances> (read only)
+
+For an mdev device created with the aggregate parameter, this shows the
+number of instances assigned to this mdev device. For normal types this
+attribute does not exist.
+
 Mediated device Hot plug
 ------------------------
 
-- 
2.24.0.rc0



* [PATCH 5/6] Documentation/ABI/testing/sysfs-bus-vfio-mdev: Update for vfio/mdev aggregation support
  2019-10-24  5:08 [PATCH 0/6] VFIO mdev aggregated resources handling Zhenyu Wang
                   ` (3 preceding siblings ...)
  2019-10-24  5:08 ` [PATCH 4/6] Documentation/driver-api/vfio-mediated-device.rst: Update for vfio/mdev aggregation support Zhenyu Wang
@ 2019-10-24  5:08 ` Zhenyu Wang
  2019-10-24  5:08 ` [PATCH 6/6] drm/i915/gvt: Add new type with " Zhenyu Wang
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Zhenyu Wang @ 2019-10-24  5:08 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, kwankhede, kevin.tian, cohuck

Update vfio/mdev ABI description for new aggregation attributes.

Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
---
 Documentation/ABI/testing/sysfs-bus-vfio-mdev | 24 +++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-vfio-mdev b/Documentation/ABI/testing/sysfs-bus-vfio-mdev
index 452dbe39270e..6645ff9a6a6a 100644
--- a/Documentation/ABI/testing/sysfs-bus-vfio-mdev
+++ b/Documentation/ABI/testing/sysfs-bus-vfio-mdev
@@ -85,6 +85,23 @@ Users:
 		a particular <type-id> that can help in understanding the
 		features provided by that type of mediated device.
 
+What:           /sys/.../mdev_supported_types/<type-id>/aggregation
+Date:           October 2019
+Contact:        Zhenyu Wang <zhenyuw@linux.intel.com>
+Description:
+		Reading this attribute shows the number of mdev instances
+		that can be aggregated and assigned to one mdev device.
+		This is an optional attribute. If this attribute exists,
+		the driver supports aggregating the target mdev type's
+		resources for one mdev device.
+Users:
+		Userspace applications should check the number of
+		instances that can be aggregated before creating the
+		mediated device. Create an aggregated mdev instance
+		like this, e.g.
+		# echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001,aggregate=XX" > \
+		       /sys/devices/foo/mdev_supported_types/foo-1/create
+
 What:           /sys/.../<device>/<UUID>/
 Date:           October 2016
 Contact:        Kirti Wankhede <kwankhede@nvidia.com>
@@ -109,3 +126,10 @@ Description:
 		is active and the vendor driver doesn't support hot unplug.
 		Example:
 		# echo 1 > /sys/bus/mdev/devices/<UUID>/remove
+
+What:           /sys/.../<device>/<UUID>/aggregated_instances
+Date:           October 2019
+Contact:        Zhenyu Wang <zhenyuw@linux.intel.com>
+Description:
+		This attribute shows the number of aggregated instances if
+		this mediated device was created with the "aggregate" parameter.
\ No newline at end of file
-- 
2.24.0.rc0



* [PATCH 6/6] drm/i915/gvt: Add new type with aggregation support
  2019-10-24  5:08 [PATCH 0/6] VFIO mdev aggregated resources handling Zhenyu Wang
                   ` (4 preceding siblings ...)
  2019-10-24  5:08 ` [PATCH 5/6] Documentation/ABI/testing/sysfs-bus-vfio-mdev: " Zhenyu Wang
@ 2019-10-24  5:08 ` Zhenyu Wang
  2019-11-05 21:10 ` [PATCH 0/6] VFIO mdev aggregated resources handling Alex Williamson
  2019-11-07 20:37 ` Parav Pandit
  7 siblings, 0 replies; 32+ messages in thread
From: Zhenyu Wang @ 2019-10-24  5:08 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, kwankhede, kevin.tian, cohuck

A new aggregation type is created for KVMGT, which can be used to combine
the minimal resource number of target instances to create a user-defined
number of resources. For KVMGT, the aggregated resource is determined by
the memory and fence resource allocation for the target number of
instances.
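The resource math described above can be sketched as follows; the numbers and
the helper are illustrative assumptions, only the idea that an aggregate=N
vGPU consumes N instances' worth of graphics memory and fence resources comes
from this patch:

```python
# Hedged model: an aggregated vGPU of 'aggregate' instances needs
# 'aggregate' times the per-instance low/high GM and fence resources,
# so the number of creatable devices shrinks accordingly.
def avail_instances(low_avail_mb, high_avail_mb, fence_avail, vgpu_type,
                    aggregate=1):
    need_low = vgpu_type['low_mm'] * aggregate
    need_high = vgpu_type['high_mm'] * aggregate
    need_fence = vgpu_type['fence'] * aggregate
    return min(low_avail_mb // need_low,
               high_avail_mb // need_high,
               fence_avail // need_fence)

# Made-up capacities and type requirements, for illustration only.
t = {'low_mm': 64, 'high_mm': 384, 'fence': 4}
single = avail_instances(512, 2048, 28, t)               # one-instance devices
doubled = avail_instances(512, 2048, 28, t, aggregate=2)  # aggregate=2 devices
```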

Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
---
v2:
- apply for new hooks
v3:
- show aggregation info in description

 drivers/gpu/drm/i915/gvt/gvt.c   |  4 +--
 drivers/gpu/drm/i915/gvt/gvt.h   | 11 +++++--
 drivers/gpu/drm/i915/gvt/kvmgt.c | 53 ++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/gvt/vgpu.c  | 56 ++++++++++++++++++++++++++++++--
 4 files changed, 114 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gvt.c b/drivers/gpu/drm/i915/gvt/gvt.c
index 8f37eefa0a02..e8b82c925bba 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.c
+++ b/drivers/gpu/drm/i915/gvt/gvt.c
@@ -98,11 +98,11 @@ static ssize_t description_show(struct kobject *kobj, struct device *dev,
 
 	return sprintf(buf, "low_gm_size: %dMB\nhigh_gm_size: %dMB\n"
 		       "fence: %d\nresolution: %s\n"
-		       "weight: %d\n",
+		       "weight: %d\naggregation: %s\n",
 		       BYTES_TO_MB(type->low_gm_size),
 		       BYTES_TO_MB(type->high_gm_size),
 		       type->fence, vgpu_edid_str(type->resolution),
-		       type->weight);
+		       type->weight, type->aggregation ? "gm" : "none");
 }
 
 static MDEV_TYPE_ATTR_RO(available_instances);
diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index b47c6acaf9c0..9eb410257216 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -238,6 +238,9 @@ struct intel_vgpu {
 struct intel_gvt_gm {
 	unsigned long vgpu_allocated_low_gm_size;
 	unsigned long vgpu_allocated_high_gm_size;
+	unsigned long low_avail;
+	unsigned long high_avail;
+	unsigned long fence_avail;
 };
 
 struct intel_gvt_fence {
@@ -289,13 +292,15 @@ struct intel_gvt_firmware {
 
 #define NR_MAX_INTEL_VGPU_TYPES 20
 struct intel_vgpu_type {
-	char name[16];
+	const char *drv_name;
+	char name[32];
 	unsigned int avail_instance;
 	unsigned int low_gm_size;
 	unsigned int high_gm_size;
 	unsigned int fence;
 	unsigned int weight;
 	enum intel_vgpu_edid resolution;
+	unsigned int aggregation;
 };
 
 struct intel_gvt {
@@ -481,7 +486,7 @@ void intel_gvt_clean_vgpu_types(struct intel_gvt *gvt);
 struct intel_vgpu *intel_gvt_create_idle_vgpu(struct intel_gvt *gvt);
 void intel_gvt_destroy_idle_vgpu(struct intel_vgpu *vgpu);
 struct intel_vgpu *intel_gvt_create_vgpu(struct intel_gvt *gvt,
-					 struct intel_vgpu_type *type);
+					 struct intel_vgpu_type *type, unsigned int);
 void intel_gvt_destroy_vgpu(struct intel_vgpu *vgpu);
 void intel_gvt_release_vgpu(struct intel_vgpu *vgpu);
 void intel_gvt_reset_vgpu_locked(struct intel_vgpu *vgpu, bool dmlr,
@@ -562,7 +567,7 @@ struct intel_gvt_ops {
 	int (*emulate_mmio_write)(struct intel_vgpu *, u64, void *,
 				unsigned int);
 	struct intel_vgpu *(*vgpu_create)(struct intel_gvt *,
-				struct intel_vgpu_type *);
+					  struct intel_vgpu_type *, unsigned int);
 	void (*vgpu_destroy)(struct intel_vgpu *vgpu);
 	void (*vgpu_release)(struct intel_vgpu *vgpu);
 	void (*vgpu_reset)(struct intel_vgpu *);
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 04a5a0d90823..99b77f4d775c 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -643,7 +643,9 @@ static void kvmgt_put_vfio_device(void *vgpu)
 	vfio_device_put(((struct intel_vgpu *)vgpu)->vdev.vfio_device);
 }
 
-static int intel_vgpu_create(struct kobject *kobj, struct mdev_device *mdev)
+static int intel_vgpu_create_internal(struct kobject *kobj,
+				      struct mdev_device *mdev,
+				      unsigned int instances)
 {
 	struct intel_vgpu *vgpu = NULL;
 	struct intel_vgpu_type *type;
@@ -662,7 +664,14 @@ static int intel_vgpu_create(struct kobject *kobj, struct mdev_device *mdev)
 		goto out;
 	}
 
-	vgpu = intel_gvt_ops->vgpu_create(gvt, type);
+	if (instances > 1 && instances > type->aggregation) {
+		gvt_vgpu_err("wrong aggregation specified for type %s\n",
+			     kobject_name(kobj));
+		ret = -EINVAL;
+		goto out;
+	}
+
+	vgpu = intel_gvt_ops->vgpu_create(gvt, type, instances);
 	if (IS_ERR_OR_NULL(vgpu)) {
 		ret = vgpu == NULL ? -EFAULT : PTR_ERR(vgpu);
 		gvt_err("failed to create intel vgpu: %d\n", ret);
@@ -682,6 +691,44 @@ static int intel_vgpu_create(struct kobject *kobj, struct mdev_device *mdev)
 	return ret;
 }
 
+static int intel_vgpu_create(struct kobject *kobj, struct mdev_device *mdev)
+{
+	return intel_vgpu_create_internal(kobj, mdev, 1);
+}
+
+static int intel_vgpu_create_with_instances(struct kobject *kobj,
+					    struct mdev_device *mdev,
+					    unsigned int instances)
+{
+	return intel_vgpu_create_internal(kobj, mdev, instances);
+}
+
+static int intel_vgpu_max_aggregated_instances(struct kobject *kobj,
+					       struct device *dev,
+					       unsigned int *max)
+{
+	struct intel_vgpu_type *type;
+	struct intel_gvt *gvt;
+	int ret = 0;
+
+	gvt = kdev_to_i915(dev)->gvt;
+
+	type = intel_gvt_ops->gvt_find_vgpu_type(gvt, kobject_name(kobj));
+	if (!type) {
+		gvt_err("failed to find type %s to create\n",
+						kobject_name(kobj));
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (type->aggregation == 0)
+		*max = 1;
+	else
+		*max = type->aggregation;
+out:
+	return ret;
+}
+
 static int intel_vgpu_remove(struct mdev_device *mdev)
 {
 	struct intel_vgpu *vgpu = mdev_get_drvdata(mdev);
@@ -1584,6 +1631,8 @@ static const struct attribute_group *intel_vgpu_groups[] = {
 static struct mdev_parent_ops intel_vgpu_ops = {
 	.mdev_attr_groups       = intel_vgpu_groups,
 	.create			= intel_vgpu_create,
+	.create_with_instances  = intel_vgpu_create_with_instances,
+	.max_aggregated_instances = intel_vgpu_max_aggregated_instances,
 	.remove			= intel_vgpu_remove,
 
 	.open			= intel_vgpu_open,
diff --git a/drivers/gpu/drm/i915/gvt/vgpu.c b/drivers/gpu/drm/i915/gvt/vgpu.c
index d5a6e4e3d0fd..d0780a2f7e69 100644
--- a/drivers/gpu/drm/i915/gvt/vgpu.c
+++ b/drivers/gpu/drm/i915/gvt/vgpu.c
@@ -108,6 +108,7 @@ int intel_gvt_init_vgpu_types(struct intel_gvt *gvt)
 	unsigned int num_types;
 	unsigned int i, low_avail, high_avail;
 	unsigned int min_low;
+	const char *driver_name = dev_driver_string(&gvt->dev_priv->drm.pdev->dev);
 
 	/* vGPU type name is defined as GVTg_Vx_y which contains
 	 * physical GPU generation type (e.g V4 as BDW server, V5 as
@@ -125,11 +126,15 @@ int intel_gvt_init_vgpu_types(struct intel_gvt *gvt)
 	high_avail = gvt_hidden_sz(gvt) - HOST_HIGH_GM_SIZE;
 	num_types = sizeof(vgpu_types) / sizeof(vgpu_types[0]);
 
-	gvt->types = kcalloc(num_types, sizeof(struct intel_vgpu_type),
+	gvt->types = kcalloc(num_types + 1, sizeof(struct intel_vgpu_type),
 			     GFP_KERNEL);
 	if (!gvt->types)
 		return -ENOMEM;
 
+	gvt->gm.low_avail = low_avail;
+	gvt->gm.high_avail = high_avail;
+	gvt->gm.fence_avail = 32 - HOST_FENCE;
+
 	min_low = MB_TO_BYTES(32);
 	for (i = 0; i < num_types; ++i) {
 		if (low_avail / vgpu_types[i].low_mm == 0)
@@ -147,6 +152,7 @@ int intel_gvt_init_vgpu_types(struct intel_gvt *gvt)
 		gvt->types[i].resolution = vgpu_types[i].edid;
 		gvt->types[i].avail_instance = min(low_avail / vgpu_types[i].low_mm,
 						   high_avail / vgpu_types[i].high_mm);
+		gvt->types[i].aggregation = 0;
 
 		if (IS_GEN(gvt->dev_priv, 8))
 			sprintf(gvt->types[i].name, "GVTg_V4_%s",
@@ -154,6 +160,7 @@ int intel_gvt_init_vgpu_types(struct intel_gvt *gvt)
 		else if (IS_GEN(gvt->dev_priv, 9))
 			sprintf(gvt->types[i].name, "GVTg_V5_%s",
 						vgpu_types[i].name);
+		gvt->types[i].drv_name = driver_name;
 
 		gvt_dbg_core("type[%d]: %s avail %u low %u high %u fence %u weight %u res %s\n",
 			     i, gvt->types[i].name,
@@ -164,7 +171,32 @@ int intel_gvt_init_vgpu_types(struct intel_gvt *gvt)
 			     vgpu_edid_str(gvt->types[i].resolution));
 	}
 
-	gvt->num_types = i;
+	/* add aggregation type */
+	gvt->types[i].low_gm_size = MB_TO_BYTES(32);
+	gvt->types[i].high_gm_size = MB_TO_BYTES(192);
+	gvt->types[i].fence = 2;
+	gvt->types[i].weight = 16;
+	gvt->types[i].resolution = GVT_EDID_1024_768;
+	gvt->types[i].avail_instance = min(low_avail / gvt->types[i].low_gm_size,
+					   high_avail / gvt->types[i].high_gm_size);
+	gvt->types[i].avail_instance = min(gvt->types[i].avail_instance,
+					   (32 - HOST_FENCE) / gvt->types[i].fence);
+	if (IS_GEN(gvt->dev_priv, 8))
+		strcat(gvt->types[i].name, "GVTg_V4_aggregate");
+	else if (IS_GEN(gvt->dev_priv, 9))
+		strcat(gvt->types[i].name, "GVTg_V5_aggregate");
+	gvt->types[i].drv_name = driver_name;
+
+	gvt_dbg_core("type[%d]: %s avail %u low %u high %u fence %u weight %u res %s\n",
+		     i, gvt->types[i].name,
+		     gvt->types[i].avail_instance,
+		     gvt->types[i].low_gm_size,
+		     gvt->types[i].high_gm_size, gvt->types[i].fence,
+		     gvt->types[i].weight,
+		     vgpu_edid_str(gvt->types[i].resolution));
+
+	gvt->types[i].aggregation = gvt->types[i].avail_instance;
+	gvt->num_types = ++i;
 	return 0;
 }
 
@@ -195,6 +227,8 @@ static void intel_gvt_update_vgpu_types(struct intel_gvt *gvt)
 		fence_min = fence_avail / gvt->types[i].fence;
 		gvt->types[i].avail_instance = min(min(low_gm_min, high_gm_min),
 						   fence_min);
+		if (gvt->types[i].aggregation > 1)
+			gvt->types[i].aggregation = gvt->types[i].avail_instance;
 
 		gvt_dbg_core("update type[%d]: %s avail %u low %u high %u fence %u\n",
 		       i, gvt->types[i].name,
@@ -468,7 +502,8 @@ static struct intel_vgpu *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
  * pointer to intel_vgpu, error pointer if failed.
  */
 struct intel_vgpu *intel_gvt_create_vgpu(struct intel_gvt *gvt,
-				struct intel_vgpu_type *type)
+					 struct intel_vgpu_type *type,
+					 unsigned int instances)
 {
 	struct intel_vgpu_creation_params param;
 	struct intel_vgpu *vgpu;
@@ -485,6 +520,21 @@ struct intel_vgpu *intel_gvt_create_vgpu(struct intel_gvt *gvt,
 	param.low_gm_sz = BYTES_TO_MB(param.low_gm_sz);
 	param.high_gm_sz = BYTES_TO_MB(param.high_gm_sz);
 
+	if (instances > 1 && instances <= type->aggregation) {
+		if (instances > type->avail_instance)
+			return ERR_PTR(-EINVAL);
+		param.low_gm_sz = min(param.low_gm_sz * instances,
+				      (u64)BYTES_TO_MB(gvt->gm.low_avail));
+		param.high_gm_sz = min(param.high_gm_sz * instances,
+				       (u64)BYTES_TO_MB(gvt->gm.high_avail));
+		param.fence_sz = min(param.fence_sz * instances,
+				     (u64)gvt->gm.fence_avail);
+		type->aggregation -= instances;
+		gvt_dbg_core("instances %d, low %lluMB, high %lluMB, fence %llu, left %u\n",
+			     instances, param.low_gm_sz, param.high_gm_sz, param.fence_sz,
+			     type->aggregation);
+	}
+
 	mutex_lock(&gvt->lock);
 	vgpu = __intel_gvt_create_vgpu(gvt, &param);
 	if (!IS_ERR(vgpu))
-- 
2.24.0.rc0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/6] vfio/mdev: Add "aggregation" attribute for supported mdev type
  2019-10-24  5:08 ` [PATCH 2/6] vfio/mdev: Add "aggregation" attribute for supported mdev type Zhenyu Wang
@ 2019-10-27  6:24     ` kbuild test robot
  2019-10-27  6:24     ` kbuild test robot
  1 sibling, 0 replies; 32+ messages in thread
From: kbuild test robot @ 2019-10-27  6:24 UTC (permalink / raw)
  To: Zhenyu Wang
  Cc: kbuild-all, kvm, alex.williamson, kwankhede, kevin.tian, cohuck

Hi Zhenyu,

I love your patch! Perhaps something to improve:

[auto build test WARNING on vfio/next]
[also build test WARNING on v5.4-rc4 next-20191025]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Zhenyu-Wang/VFIO-mdev-aggregated-resources-handling/20191027-111736
base:   https://github.com/awilliam/linux-vfio.git next
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.1-dirty
        make ARCH=x86_64 allmodconfig
        make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

   drivers/vfio/mdev/mdev_sysfs.c:96:1: sparse: sparse: symbol 'mdev_type_attr_create' was not declared. Should it be static?
>> drivers/vfio/mdev/mdev_sysfs.c:108:1: sparse: sparse: symbol 'mdev_type_attr_aggregation' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



* [RFC PATCH] vfio/mdev: mdev_type_attr_aggregation can be static
  2019-10-24  5:08 ` [PATCH 2/6] vfio/mdev: Add "aggregation" attribute for supported mdev type Zhenyu Wang
@ 2019-10-27  6:24     ` kbuild test robot
  2019-10-27  6:24     ` kbuild test robot
  1 sibling, 0 replies; 32+ messages in thread
From: kbuild test robot @ 2019-10-27  6:24 UTC (permalink / raw)
  To: Zhenyu Wang
  Cc: kbuild-all, kvm, alex.williamson, kwankhede, kevin.tian, cohuck


Fixes: b335e826b3eb ("vfio/mdev: Add "aggregation" attribute for supported mdev type")
Signed-off-by: kbuild test robot <lkp@intel.com>
---
 mdev_sysfs.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index acd3ec2900b5c..2f4faef85858d 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -105,7 +105,7 @@ aggregation_show(struct kobject *kobj, struct device *dev, char *buf)
 	else
 		return sprintf(buf, "%u\n", m);
 }
-MDEV_TYPE_ATTR_RO(aggregation);
+static MDEV_TYPE_ATTR_RO(aggregation);
 
 static void mdev_type_release(struct kobject *kobj)
 {



* Re: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-10-24  5:08 [PATCH 0/6] VFIO mdev aggregated resources handling Zhenyu Wang
                   ` (5 preceding siblings ...)
  2019-10-24  5:08 ` [PATCH 6/6] drm/i915/gvt: Add new type with " Zhenyu Wang
@ 2019-11-05 21:10 ` Alex Williamson
  2019-11-06  4:20   ` Zhenyu Wang
  2019-11-07 20:37 ` Parav Pandit
  7 siblings, 1 reply; 32+ messages in thread
From: Alex Williamson @ 2019-11-05 21:10 UTC (permalink / raw)
  To: Zhenyu Wang
  Cc: kvm, kwankhede, kevin.tian, cohuck, Libvirt Devel, Pavel Hrdina,
	Jonathon Jongsma

On Thu, 24 Oct 2019 13:08:23 +0800
Zhenyu Wang <zhenyuw@linux.intel.com> wrote:

> Hi,
> 
> This is a refresh for previous send of this series. I got impression that
> some SIOV drivers would still deploy their own create and config method so
> stopped effort on this. But seems this would still be useful for some other
> SIOV driver which may simply want capability to aggregate resources. So here's
> refreshed series.
> 
> Current mdev device create interface depends on fixed mdev type, which get uuid
> from user to create instance of mdev device. If user wants to use customized
> number of resource for mdev device, then only can create new mdev type for that
> which may not be flexible. This requirement comes not only from to be able to
> allocate flexible resources for KVMGT, but also from Intel scalable IO
> virtualization which would use vfio/mdev to be able to allocate arbitrary
> resources on mdev instance. More info on [1] [2] [3].
> 
> To allow to create user defined resources for mdev, it trys to extend mdev
> create interface by adding new "aggregate=xxx" parameter following UUID, for
> target mdev type if aggregation is supported, it can create new mdev device
> which contains resources combined by number of instances, e.g
> 
>     echo "<uuid>,aggregate=10" > create
> 
> VM manager e.g libvirt can check mdev type with "aggregation" attribute which
> can support this setting. If no "aggregation" attribute found for mdev type,
> previous behavior is still kept for one instance allocation. And new sysfs
> attribute "aggregated_instances" is created for each mdev device to show allocated number.

Given discussions we've had recently around libvirt interacting with
mdev, I think that libvirt would rather have an abstract interface via
mdevctl[1].  Therefore can you evaluate how mdevctl would support this
creation extension?  It seems like it would fit within the existing
mdev and mdevctl framework if aggregation were simply a sysfs attribute
for the device.  For example, the mdevctl steps might look like this:

mdevctl define -u UUID -p PARENT -t TYPE
mdevctl modify -u UUID --addattr=mdev/aggregation --value=2
mdevctl start -u UUID

When mdevctl starts the mdev, it will first create it using the
existing mechanism, then apply aggregation attribute, which can consume
the necessary additional instances from the parent device, or return an
error, which would unwind and return a failure code to the caller
(libvirt).  I think the vendor driver would then have freedom to decide
when the attribute could be modified, for instance it would be entirely
reasonable to return -EBUSY if the user attempts to modify the
attribute while the mdev device is in-use.  Effectively aggregation
simply becomes a standardized attribute with common meaning.  Thoughts?
[cc libvirt folks for their impression] Thanks,

Alex

[1] https://github.com/mdevctl/mdevctl

> References:
> [1] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
> [2] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
> [3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
> 
> Zhenyu Wang (6):
>   vfio/mdev: Add new "aggregate" parameter for mdev create
>   vfio/mdev: Add "aggregation" attribute for supported mdev type
>   vfio/mdev: Add "aggregated_instances" attribute for supported mdev
>     device
>   Documentation/driver-api/vfio-mediated-device.rst: Update for
>     vfio/mdev aggregation support
>   Documentation/ABI/testing/sysfs-bus-vfio-mdev: Update for vfio/mdev
>     aggregation support
>   drm/i915/gvt: Add new type with aggregation support
> 
>  Documentation/ABI/testing/sysfs-bus-vfio-mdev | 24 ++++++
>  .../driver-api/vfio-mediated-device.rst       | 23 ++++++
>  drivers/gpu/drm/i915/gvt/gvt.c                |  4 +-
>  drivers/gpu/drm/i915/gvt/gvt.h                | 11 ++-
>  drivers/gpu/drm/i915/gvt/kvmgt.c              | 53 ++++++++++++-
>  drivers/gpu/drm/i915/gvt/vgpu.c               | 56 ++++++++++++-
>  drivers/vfio/mdev/mdev_core.c                 | 36 ++++++++-
>  drivers/vfio/mdev/mdev_private.h              |  6 +-
>  drivers/vfio/mdev/mdev_sysfs.c                | 79 ++++++++++++++++++-
>  include/linux/mdev.h                          | 19 +++++
>  10 files changed, 294 insertions(+), 17 deletions(-)
> 



* Re: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-11-05 21:10 ` [PATCH 0/6] VFIO mdev aggregated resources handling Alex Williamson
@ 2019-11-06  4:20   ` Zhenyu Wang
  2019-11-06 18:44     ` Alex Williamson
  0 siblings, 1 reply; 32+ messages in thread
From: Zhenyu Wang @ 2019-11-06  4:20 UTC (permalink / raw)
  To: Alex Williamson, Tian, Kevin
  Cc: Zhenyu Wang, kvm, kwankhede, kevin.tian, cohuck, Libvirt Devel,
	Pavel Hrdina, Jonathon Jongsma


On 2019.11.05 14:10:42 -0700, Alex Williamson wrote:
> On Thu, 24 Oct 2019 13:08:23 +0800
> Zhenyu Wang <zhenyuw@linux.intel.com> wrote:
> 
> > Hi,
> > 
> > This is a refresh for previous send of this series. I got impression that
> > some SIOV drivers would still deploy their own create and config method so
> > stopped effort on this. But seems this would still be useful for some other
> > SIOV driver which may simply want capability to aggregate resources. So here's
> > refreshed series.
> > 
> > Current mdev device create interface depends on fixed mdev type, which get uuid
> > from user to create instance of mdev device. If user wants to use customized
> > number of resource for mdev device, then only can create new mdev type for that
> > which may not be flexible. This requirement comes not only from to be able to
> > allocate flexible resources for KVMGT, but also from Intel scalable IO
> > virtualization which would use vfio/mdev to be able to allocate arbitrary
> > resources on mdev instance. More info on [1] [2] [3].
> > 
> > To allow to create user defined resources for mdev, it trys to extend mdev
> > create interface by adding new "aggregate=xxx" parameter following UUID, for
> > target mdev type if aggregation is supported, it can create new mdev device
> > which contains resources combined by number of instances, e.g
> > 
> >     echo "<uuid>,aggregate=10" > create
> > 
> > VM manager e.g libvirt can check mdev type with "aggregation" attribute which
> > can support this setting. If no "aggregation" attribute found for mdev type,
> > previous behavior is still kept for one instance allocation. And new sysfs
> > attribute "aggregated_instances" is created for each mdev device to show allocated number.
> 
> Given discussions we've had recently around libvirt interacting with
> mdev, I think that libvirt would rather have an abstract interface via
> mdevctl[1].  Therefore can you evaluate how mdevctl would support this
> creation extension?  It seems like it would fit within the existing
> mdev and mdevctl framework if aggregation were simply a sysfs attribute
> for the device.  For example, the mdevctl steps might look like this:
> 
> mdevctl define -u UUID -p PARENT -t TYPE
> mdevctl modify -u UUID --addattr=mdev/aggregation --value=2
> mdevctl start -u UUID
> 
> When mdevctl starts the mdev, it will first create it using the
> existing mechanism, then apply aggregation attribute, which can consume
> the necessary additional instances from the parent device, or return an
> error, which would unwind and return a failure code to the caller
> (libvirt).  I think the vendor driver would then have freedom to decide
> when the attribute could be modified, for instance it would be entirely
> reasonable to return -EBUSY if the user attempts to modify the
> attribute while the mdev device is in-use.  Effectively aggregation
> simply becomes a standardized attribute with common meaning.  Thoughts?
> [cc libvirt folks for their impression] Thanks,

I think one problem is that before mdevctl starts to create the mdev
you don't know what the vendor attributes are, since mdev attributes
are applied after create. You may need some lookup depending on the
parent. I think making aggregation look like any other vendor
attribute for mdev might be the simplest way, but do we want to
define its behavior formally? E.g., as previously discussed, it
should show the maximum instances available for aggregation, etc.

The behavior change for the driver is that aggregation was previously
handled at create time, but with a sysfs attribute it must handle any
resource allocation before the device is actually in use. I think
SIOV drivers that already require some specific config should be ok,
but I'm not sure about other drivers where this hasn't been explored
before. Would that be a problem? Kevin?

Thanks

> 
> Alex
> 
> [1] https://github.com/mdevctl/mdevctl
> 
> > References:
> > [1] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
> > [2] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
> > [3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
> > 
> > Zhenyu Wang (6):
> >   vfio/mdev: Add new "aggregate" parameter for mdev create
> >   vfio/mdev: Add "aggregation" attribute for supported mdev type
> >   vfio/mdev: Add "aggregated_instances" attribute for supported mdev
> >     device
> >   Documentation/driver-api/vfio-mediated-device.rst: Update for
> >     vfio/mdev aggregation support
> >   Documentation/ABI/testing/sysfs-bus-vfio-mdev: Update for vfio/mdev
> >     aggregation support
> >   drm/i915/gvt: Add new type with aggregation support
> > 
> >  Documentation/ABI/testing/sysfs-bus-vfio-mdev | 24 ++++++
> >  .../driver-api/vfio-mediated-device.rst       | 23 ++++++
> >  drivers/gpu/drm/i915/gvt/gvt.c                |  4 +-
> >  drivers/gpu/drm/i915/gvt/gvt.h                | 11 ++-
> >  drivers/gpu/drm/i915/gvt/kvmgt.c              | 53 ++++++++++++-
> >  drivers/gpu/drm/i915/gvt/vgpu.c               | 56 ++++++++++++-
> >  drivers/vfio/mdev/mdev_core.c                 | 36 ++++++++-
> >  drivers/vfio/mdev/mdev_private.h              |  6 +-
> >  drivers/vfio/mdev/mdev_sysfs.c                | 79 ++++++++++++++++++-
> >  include/linux/mdev.h                          | 19 +++++
> >  10 files changed, 294 insertions(+), 17 deletions(-)
> > 
> 

-- 
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827



* Re: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-11-06  4:20   ` Zhenyu Wang
@ 2019-11-06 18:44     ` Alex Williamson
  2019-11-07 13:02       ` Cornelia Huck
  2019-11-15  4:24       ` Tian, Kevin
  0 siblings, 2 replies; 32+ messages in thread
From: Alex Williamson @ 2019-11-06 18:44 UTC (permalink / raw)
  To: Zhenyu Wang
  Cc: Tian, Kevin, kvm, kwankhede, cohuck, Libvirt Devel, Pavel Hrdina,
	Jonathon Jongsma

On Wed, 6 Nov 2019 12:20:31 +0800
Zhenyu Wang <zhenyuw@linux.intel.com> wrote:

> On 2019.11.05 14:10:42 -0700, Alex Williamson wrote:
> > On Thu, 24 Oct 2019 13:08:23 +0800
> > Zhenyu Wang <zhenyuw@linux.intel.com> wrote:
> >   
> > > Hi,
> > > 
> > > This is a refresh for previous send of this series. I got impression that
> > > some SIOV drivers would still deploy their own create and config method so
> > > stopped effort on this. But seems this would still be useful for some other
> > > SIOV driver which may simply want capability to aggregate resources. So here's
> > > refreshed series.
> > > 
> > > Current mdev device create interface depends on fixed mdev type, which get uuid
> > > from user to create instance of mdev device. If user wants to use customized
> > > number of resource for mdev device, then only can create new mdev type for that
> > > which may not be flexible. This requirement comes not only from to be able to
> > > allocate flexible resources for KVMGT, but also from Intel scalable IO
> > > virtualization which would use vfio/mdev to be able to allocate arbitrary
> > > resources on mdev instance. More info on [1] [2] [3].
> > > 
> > > To allow to create user defined resources for mdev, it trys to extend mdev
> > > create interface by adding new "aggregate=xxx" parameter following UUID, for
> > > target mdev type if aggregation is supported, it can create new mdev device
> > > which contains resources combined by number of instances, e.g
> > > 
> > >     echo "<uuid>,aggregate=10" > create
> > > 
> > > VM manager e.g libvirt can check mdev type with "aggregation" attribute which
> > > can support this setting. If no "aggregation" attribute found for mdev type,
> > > previous behavior is still kept for one instance allocation. And new sysfs
> > > attribute "aggregated_instances" is created for each mdev device to show allocated number.  
> > 
> > Given discussions we've had recently around libvirt interacting with
> > mdev, I think that libvirt would rather have an abstract interface via
> > mdevctl[1].  Therefore can you evaluate how mdevctl would support this
> > creation extension?  It seems like it would fit within the existing
> > mdev and mdevctl framework if aggregation were simply a sysfs attribute
> > for the device.  For example, the mdevctl steps might look like this:
> > 
> > mdevctl define -u UUID -p PARENT -t TYPE
> > mdevctl modify -u UUID --addattr=mdev/aggregation --value=2
> > mdevctl start -u UUID
> > 
> > When mdevctl starts the mdev, it will first create it using the
> > existing mechanism, then apply aggregation attribute, which can consume
> > the necessary additional instances from the parent device, or return an
> > error, which would unwind and return a failure code to the caller
> > (libvirt).  I think the vendor driver would then have freedom to decide
> > when the attribute could be modified, for instance it would be entirely
> > reasonable to return -EBUSY if the user attempts to modify the
> > attribute while the mdev device is in-use.  Effectively aggregation
> > simply becomes a standardized attribute with common meaning.  Thoughts?
> > [cc libvirt folks for their impression] Thanks,  
> 
> I think one problem is that before mdevctl starts to create the mdev
> you don't know what the vendor attributes are, since mdev attributes
> are applied after create. You may need some lookup depending on the
> parent. I think making aggregation look like any other vendor
> attribute for mdev might be the simplest way, but do we want to
> define its behavior formally? E.g., as previously discussed, it
> should show the maximum instances available for aggregation, etc.

Yes, we'd still want to standardize how we enable and discover
aggregation since we expect multiple users.  Even if libvirt were to
use mdevctl as its mdev interface, higher-level tools should have an
introspection mechanism available.  Possibly the sysfs interfaces
proposed in this series remain largely the same, but I think perhaps
the implementation of them moves out to the vendor driver.  In fact,
perhaps the only change to mdev core is to define the standard.  For
example, the "aggregation" attribute on the type is potentially simply
a defined, optional, per type attribute, similar to "name" and
"description".  For "aggregated_instances" we already have the
mdev_attr_groups of the mdev_parent_ops, we could define an
attribute_group with .name = "mdev" as a set of standardized
attributes, such that vendors could provide both their own vendor
specific attributes and per device attributes with a common meaning and
semantic defined in the mdev ABI.
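To make that suggestion concrete, here is a sketch of what a standardized attribute group with .name = "mdev" could look like inside a vendor driver, plugged into the existing mdev_attr_groups of mdev_parent_ops. All identifiers below (struct my_vdev and its fields) are illustrative assumptions, not code from this series.

```c
#include <linux/device.h>
#include <linux/sysfs.h>

/* Hypothetical per-device driver state; not from the posted series. */
struct my_vdev {
	unsigned int instances;
};

static ssize_t aggregated_instances_show(struct device *dev,
					 struct device_attribute *attr,
					 char *buf)
{
	struct my_vdev *vdev = dev_get_drvdata(dev);

	return sprintf(buf, "%u\n", vdev->instances);
}
static DEVICE_ATTR_RO(aggregated_instances);

static struct attribute *mdev_std_attrs[] = {
	&dev_attr_aggregated_instances.attr,
	NULL,
};

/* The .name makes these attributes appear under a "mdev/" subdirectory
 * of the device, giving tools one common place to look regardless of
 * which vendor driver backs the mdev. */
static const struct attribute_group mdev_std_attr_group = {
	.name	= "mdev",
	.attrs	= mdev_std_attrs,
};
```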

> The behavior change for the driver is that aggregation was previously
> handled at create time, but with a sysfs attribute it must handle any
> resource allocation before the device is actually in use. I think
> SIOV drivers that already require some specific config should be ok,
> but I'm not sure about other drivers where this hasn't been explored
> before. Would that be a problem? Kevin?

Right, I'm assuming the aggregation could be modified until the device
is actually opened, the driver can nak the aggregation request by
returning an errno to the attribute write.  I'm trying to anticipate
whether this introduces new complications, for instance races with
contiguous allocations.  I think these seem solvable within the vendor
drivers, but please note it if I'm wrong.  Thanks,

Alex
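A minimal sketch of the nak-on-write behavior discussed here: a vendor driver rejecting aggregation changes while the device is open. struct my_vdev, my_vdev_set_instances(), and the field names are hypothetical, chosen only to illustrate the error paths.

```c
#include <linux/device.h>
#include <linux/kernel.h>

/* Hypothetical driver state; none of these names come from the series. */
struct my_vdev {
	unsigned int instances;
	unsigned int max_instances;
	bool in_use;
};

/* hypothetical helper: reserve/release instances on the parent device */
int my_vdev_set_instances(struct my_vdev *vdev, unsigned int val);

static ssize_t aggregation_store(struct device *dev,
				 struct device_attribute *attr,
				 const char *buf, size_t count)
{
	struct my_vdev *vdev = dev_get_drvdata(dev);
	unsigned int val;
	int ret;

	ret = kstrtouint(buf, 10, &val);
	if (ret)
		return ret;

	if (vdev->in_use)
		return -EBUSY;	/* no resizing while the mdev is open */

	if (val == 0 || val > vdev->max_instances)
		return -EINVAL;

	ret = my_vdev_set_instances(vdev, val);
	if (ret)
		return ret;

	return count;
}
```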



* Re: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-11-06 18:44     ` Alex Williamson
@ 2019-11-07 13:02       ` Cornelia Huck
  2019-11-15  4:24       ` Tian, Kevin
  1 sibling, 0 replies; 32+ messages in thread
From: Cornelia Huck @ 2019-11-07 13:02 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Zhenyu Wang, Tian, Kevin, kvm, kwankhede, Libvirt Devel,
	Pavel Hrdina, Jonathon Jongsma

On Wed, 6 Nov 2019 11:44:40 -0700
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Wed, 6 Nov 2019 12:20:31 +0800
> Zhenyu Wang <zhenyuw@linux.intel.com> wrote:
> 
> > On 2019.11.05 14:10:42 -0700, Alex Williamson wrote:  
> > > On Thu, 24 Oct 2019 13:08:23 +0800
> > > Zhenyu Wang <zhenyuw@linux.intel.com> wrote:
> > >     
> > > > Hi,
> > > > 
> > > > This is a refresh for previous send of this series. I got impression that
> > > > some SIOV drivers would still deploy their own create and config method so
> > > > stopped effort on this. But seems this would still be useful for some other
> > > > SIOV driver which may simply want capability to aggregate resources. So here's
> > > > refreshed series.
> > > > 
> > > > Current mdev device create interface depends on fixed mdev type, which get uuid
> > > > from user to create instance of mdev device. If user wants to use customized
> > > > number of resource for mdev device, then only can create new mdev type for that
> > > > which may not be flexible. This requirement comes not only from to be able to
> > > > allocate flexible resources for KVMGT, but also from Intel scalable IO
> > > > virtualization which would use vfio/mdev to be able to allocate arbitrary
> > > > resources on mdev instance. More info on [1] [2] [3].
> > > > 
> > > > To allow to create user defined resources for mdev, it trys to extend mdev
> > > > create interface by adding new "aggregate=xxx" parameter following UUID, for
> > > > target mdev type if aggregation is supported, it can create new mdev device
> > > > which contains resources combined by number of instances, e.g
> > > > 
> > > >     echo "<uuid>,aggregate=10" > create
> > > > 
> > > > VM manager e.g libvirt can check mdev type with "aggregation" attribute which
> > > > can support this setting. If no "aggregation" attribute found for mdev type,
> > > > previous behavior is still kept for one instance allocation. And new sysfs
> > > > attribute "aggregated_instances" is created for each mdev device to show allocated number.    
> > > 
> > > Given discussions we've had recently around libvirt interacting with
> > > mdev, I think that libvirt would rather have an abstract interface via
> > > mdevctl[1].  Therefore can you evaluate how mdevctl would support this
> > > creation extension?  It seems like it would fit within the existing
> > > mdev and mdevctl framework if aggregation were simply a sysfs attribute
> > > for the device.  For example, the mdevctl steps might look like this:
> > > 
> > > mdevctl define -u UUID -p PARENT -t TYPE
> > > mdevctl modify -u UUID --addattr=mdev/aggregation --value=2
> > > mdevctl start -u UUID
> > > 
> > > When mdevctl starts the mdev, it will first create it using the
> > > existing mechanism, then apply aggregation attribute, which can consume
> > > the necessary additional instances from the parent device, or return an
> > > error, which would unwind and return a failure code to the caller
> > > (libvirt).  I think the vendor driver would then have freedom to decide
> > > when the attribute could be modified, for instance it would be entirely
> > > reasonable to return -EBUSY if the user attempts to modify the
> > > attribute while the mdev device is in-use.  Effectively aggregation
> > > simply becomes a standardized attribute with common meaning.  Thoughts?
> > > [cc libvirt folks for their impression] Thanks,    
> > 
> > I think one problem is that before mdevctl starts to create the mdev
> > you don't know what the vendor attributes are, as we apply mdev
> > attributes after create. You may need some lookup depending on the
> > parent. I think making aggregation like other vendor attributes for
> > mdev might be the simplest way, but do we want to define its behavior
> > formally? E.g., as previously discussed, it should show the maximum
> > instances available for aggregation, etc.  
> 
> Yes, we'd still want to standardize how we enable and discover
> aggregation since we expect multiple users.  Even if libvirt were to
> use mdevctl as its mdev interface, higher-level tools should have an
> introspection mechanism available.  Possibly the sysfs interfaces
> proposed in this series remains largely the same, but I think perhaps
> the implementation of them moves out to the vendor driver.  In fact,
> perhaps the only change to mdev core is to define the standard.  For
> example, the "aggregation" attribute on the type is potentially simply
> a defined, optional, per type attribute, similar to "name" and
> "description".  For "aggregated_instances" we already have the
> mdev_attr_groups of the mdev_parent_ops, we could define an
> attribute_group with .name = "mdev" as a set of standardized
> attributes, such that vendors could provide both their own vendor
> specific attributes and per device attributes with a common meaning and
> semantic defined in the mdev ABI.

+1 to standardizing this. While not every vendor driver will support
aggregation, providing a common infrastructure to ensure those that do
use the same approach is a good idea.

> 
> > The behavior change for the driver is that aggregation was previously
> > handled at create time, but with a sysfs attribute it must handle any
> > resource allocation before the device is actually in use. I think SIOV
> > drivers which already require some specific config should be OK, but
> > I'm not sure about other drivers which might not have explored this
> > before. Would that be a problem? Kevin?  
> 
> Right, I'm assuming the aggregation could be modified until the device
> is actually opened, the driver can nak the aggregation request by
> returning an errno to the attribute write.  I'm trying to anticipate
> whether this introduces new complications, for instance races with
> contiguous allocations.  I think these seem solvable within the vendor
> drivers, but please note it if I'm wrong.  Thanks,
> 
> Alex

FWIW, the ap driver does this post-creation configuration stuff
already. The intended workflow is create->add adapters/domains->start
vm with assigned device. Do we want to do some standardization as to
how post-creation configuration is supposed to work (like, at which
point in time is it fine to manipulate the attribute)? I'm not sure how
much of this is vendor-driver specific.
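As a concrete sketch of that create-then-configure workflow, the steps below exercise the proposed per-type "aggregation" attribute and per-device "mdev/aggregation" attribute against a mock sysfs tree (the type name, UUID, and paths are examples only, not the real i915 or mdev core behavior):

```shell
#!/bin/sh
# Mock sysfs tree standing in for a parent device that supports
# aggregation; real paths would live under /sys/class/mdev_bus/<parent>/.
SYSFS=$(mktemp -d)
TYPE="$SYSFS/mdev_supported_types/i915-GVTg_V5_4"
mkdir -p "$TYPE"
echo 8 > "$TYPE/aggregation"   # per-type attribute: max aggregatable instances
: > "$TYPE/create"             # create endpoint (write-only in real sysfs)

UUID=83b8f4f2-509f-382f-3c1e-e6bfe0fa1001

if [ -r "$TYPE/aggregation" ]; then
    echo "type supports aggregation, up to $(cat "$TYPE/aggregation") instances"
    echo "$UUID" > "$TYPE/create"       # 1. create the device first
    DEV="$SYSFS/devices/$UUID/mdev"
    mkdir -p "$DEV"                     # in real sysfs the kernel creates this dir
    echo 2 > "$DEV/aggregation"         # 2. post-creation: request 2 instances
    echo "aggregation now $(cat "$DEV/aggregation")"
else
    echo "$UUID" > "$TYPE/create"       # no aggregation: single-instance behavior
fi
```

A failed write in step 2 would be the vendor driver rejecting the request, leaving the caller to unwind (destroy the device) and report the error upward.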


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-10-24  5:08 [PATCH 0/6] VFIO mdev aggregated resources handling Zhenyu Wang
                   ` (6 preceding siblings ...)
  2019-11-05 21:10 ` [PATCH 0/6] VFIO mdev aggregated resources handling Alex Williamson
@ 2019-11-07 20:37 ` Parav Pandit
  2019-11-08  8:19   ` Zhenyu Wang
  7 siblings, 1 reply; 32+ messages in thread
From: Parav Pandit @ 2019-11-07 20:37 UTC (permalink / raw)
  To: Zhenyu Wang, kvm; +Cc: alex.williamson, kwankhede, kevin.tian, cohuck

Hi,

> -----Original Message-----
> From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org> On Behalf
> Of Zhenyu Wang
> Sent: Thursday, October 24, 2019 12:08 AM
> To: kvm@vger.kernel.org
> Cc: alex.williamson@redhat.com; kwankhede@nvidia.com;
> kevin.tian@intel.com; cohuck@redhat.com
> Subject: [PATCH 0/6] VFIO mdev aggregated resources handling
> 
> Hi,
> 
> This is a refresh for previous send of this series. I got impression that some
> SIOV drivers would still deploy their own create and config method so stopped
> effort on this. But seems this would still be useful for some other SIOV driver
> which may simply want capability to aggregate resources. So here's refreshed
> series.
> 
> Current mdev device create interface depends on fixed mdev type, which get
> uuid from user to create instance of mdev device. If user wants to use
> customized number of resource for mdev device, then only can create new
Can you please give an example of 'resource'?
When I grep [1], [2], and [3], I couldn't find anything related to 'aggregate'.

> mdev type for that which may not be flexible. This requirement comes not only
> from to be able to allocate flexible resources for KVMGT, but also from Intel
> scalable IO virtualization which would use vfio/mdev to be able to allocate
> arbitrary resources on mdev instance. More info on [1] [2] [3].
> 
> To allow to create user defined resources for mdev, it trys to extend mdev
> create interface by adding new "aggregate=xxx" parameter following UUID, for
> target mdev type if aggregation is supported, it can create new mdev device
> which contains resources combined by number of instances, e.g
> 
>     echo "<uuid>,aggregate=10" > create
> 
> VM manager e.g libvirt can check mdev type with "aggregation" attribute
> which can support this setting. If no "aggregation" attribute found for mdev
> type, previous behavior is still kept for one instance allocation. And new sysfs
> attribute "aggregated_instances" is created for each mdev device to show
> allocated number.
> 
> References:
> [1] https://software.intel.com/en-us/download/intel-virtualization-technology-
> for-directed-io-architecture-specification
> [2] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-
> technical-specification
> [3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
> 
> Zhenyu Wang (6):
>   vfio/mdev: Add new "aggregate" parameter for mdev create
>   vfio/mdev: Add "aggregation" attribute for supported mdev type
>   vfio/mdev: Add "aggregated_instances" attribute for supported mdev
>     device
>   Documentation/driver-api/vfio-mediated-device.rst: Update for
>     vfio/mdev aggregation support
>   Documentation/ABI/testing/sysfs-bus-vfio-mdev: Update for vfio/mdev
>     aggregation support
>   drm/i915/gvt: Add new type with aggregation support
> 
>  Documentation/ABI/testing/sysfs-bus-vfio-mdev | 24 ++++++
>  .../driver-api/vfio-mediated-device.rst       | 23 ++++++
>  drivers/gpu/drm/i915/gvt/gvt.c                |  4 +-
>  drivers/gpu/drm/i915/gvt/gvt.h                | 11 ++-
>  drivers/gpu/drm/i915/gvt/kvmgt.c              | 53 ++++++++++++-
>  drivers/gpu/drm/i915/gvt/vgpu.c               | 56 ++++++++++++-
>  drivers/vfio/mdev/mdev_core.c                 | 36 ++++++++-
>  drivers/vfio/mdev/mdev_private.h              |  6 +-
>  drivers/vfio/mdev/mdev_sysfs.c                | 79 ++++++++++++++++++-
>  include/linux/mdev.h                          | 19 +++++
>  10 files changed, 294 insertions(+), 17 deletions(-)
> 
> --
> 2.24.0.rc0



* Re: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-11-07 20:37 ` Parav Pandit
@ 2019-11-08  8:19   ` Zhenyu Wang
  2019-12-04 17:36     ` Parav Pandit
  0 siblings, 1 reply; 32+ messages in thread
From: Zhenyu Wang @ 2019-11-08  8:19 UTC (permalink / raw)
  To: Parav Pandit; +Cc: kvm, alex.williamson, kwankhede, kevin.tian, cohuck

[-- Attachment #1: Type: text/plain, Size: 4219 bytes --]

On 2019.11.07 20:37:49 +0000, Parav Pandit wrote:
> Hi,
> 
> > -----Original Message-----
> > From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org> On Behalf
> > Of Zhenyu Wang
> > Sent: Thursday, October 24, 2019 12:08 AM
> > To: kvm@vger.kernel.org
> > Cc: alex.williamson@redhat.com; kwankhede@nvidia.com;
> > kevin.tian@intel.com; cohuck@redhat.com
> > Subject: [PATCH 0/6] VFIO mdev aggregated resources handling
> > 
> > Hi,
> > 
> > This is a refresh for previous send of this series. I got impression that some
> > SIOV drivers would still deploy their own create and config method so stopped
> > effort on this. But seems this would still be useful for some other SIOV driver
> > which may simply want capability to aggregate resources. So here's refreshed
> > series.
> > 
> > Current mdev device create interface depends on fixed mdev type, which get
> > uuid from user to create instance of mdev device. If user wants to use
> > customized number of resource for mdev device, then only can create new
> Can you please give an example of 'resource'?
> When I grep [1], [2] and [3], I couldn't find anything related to ' aggregate'.

The resource is vendor-device specific. The SIOV spec defines an ADI
(Assignable Device Interface), which could be, e.g., a queue for a net
device, a context for a GPU, etc. I just named this interface 'aggregate'
for the aggregation purpose; the term is not used in the spec doc.

Thanks

> 
> > mdev type for that which may not be flexible. This requirement comes not only
> > from to be able to allocate flexible resources for KVMGT, but also from Intel
> > scalable IO virtualization which would use vfio/mdev to be able to allocate
> > arbitrary resources on mdev instance. More info on [1] [2] [3].
> > 
> > To allow to create user defined resources for mdev, it trys to extend mdev
> > create interface by adding new "aggregate=xxx" parameter following UUID, for
> > target mdev type if aggregation is supported, it can create new mdev device
> > which contains resources combined by number of instances, e.g
> > 
> >     echo "<uuid>,aggregate=10" > create
> > 
> > VM manager e.g libvirt can check mdev type with "aggregation" attribute
> > which can support this setting. If no "aggregation" attribute found for mdev
> > type, previous behavior is still kept for one instance allocation. And new sysfs
> > attribute "aggregated_instances" is created for each mdev device to show
> > allocated number.
> > 
> > References:
> > [1] https://software.intel.com/en-us/download/intel-virtualization-technology-
> > for-directed-io-architecture-specification
> > [2] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-
> > technical-specification
> > [3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
> > 
> > Zhenyu Wang (6):
> >   vfio/mdev: Add new "aggregate" parameter for mdev create
> >   vfio/mdev: Add "aggregation" attribute for supported mdev type
> >   vfio/mdev: Add "aggregated_instances" attribute for supported mdev
> >     device
> >   Documentation/driver-api/vfio-mediated-device.rst: Update for
> >     vfio/mdev aggregation support
> >   Documentation/ABI/testing/sysfs-bus-vfio-mdev: Update for vfio/mdev
> >     aggregation support
> >   drm/i915/gvt: Add new type with aggregation support
> > 
> >  Documentation/ABI/testing/sysfs-bus-vfio-mdev | 24 ++++++
> >  .../driver-api/vfio-mediated-device.rst       | 23 ++++++
> >  drivers/gpu/drm/i915/gvt/gvt.c                |  4 +-
> >  drivers/gpu/drm/i915/gvt/gvt.h                | 11 ++-
> >  drivers/gpu/drm/i915/gvt/kvmgt.c              | 53 ++++++++++++-
> >  drivers/gpu/drm/i915/gvt/vgpu.c               | 56 ++++++++++++-
> >  drivers/vfio/mdev/mdev_core.c                 | 36 ++++++++-
> >  drivers/vfio/mdev/mdev_private.h              |  6 +-
> >  drivers/vfio/mdev/mdev_sysfs.c                | 79 ++++++++++++++++++-
> >  include/linux/mdev.h                          | 19 +++++
> >  10 files changed, 294 insertions(+), 17 deletions(-)
> > 
> > --
> > 2.24.0.rc0
> 

-- 
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]


* RE: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-11-06 18:44     ` Alex Williamson
  2019-11-07 13:02       ` Cornelia Huck
@ 2019-11-15  4:24       ` Tian, Kevin
  2019-11-19 22:58         ` Alex Williamson
  1 sibling, 1 reply; 32+ messages in thread
From: Tian, Kevin @ 2019-11-15  4:24 UTC (permalink / raw)
  To: Alex Williamson, Zhenyu Wang
  Cc: kvm, kwankhede, cohuck, Libvirt Devel, Pavel Hrdina, Jonathon Jongsma

> From: Alex Williamson
> Sent: Thursday, November 7, 2019 2:45 AM
> 
> On Wed, 6 Nov 2019 12:20:31 +0800
> Zhenyu Wang <zhenyuw@linux.intel.com> wrote:
> 
> > On 2019.11.05 14:10:42 -0700, Alex Williamson wrote:
> > > On Thu, 24 Oct 2019 13:08:23 +0800
> > > Zhenyu Wang <zhenyuw@linux.intel.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > This is a refresh for previous send of this series. I got impression that
> > > > some SIOV drivers would still deploy their own create and config
> method so
> > > > stopped effort on this. But seems this would still be useful for some
> other
> > > > SIOV driver which may simply want capability to aggregate resources.
> So here's
> > > > refreshed series.
> > > >
> > > > Current mdev device create interface depends on fixed mdev type,
> which get uuid
> > > > from user to create instance of mdev device. If user wants to use
> customized
> > > > number of resource for mdev device, then only can create new mdev
> type for that
> > > > which may not be flexible. This requirement comes not only from to
> be able to
> > > > allocate flexible resources for KVMGT, but also from Intel scalable IO
> > > > virtualization which would use vfio/mdev to be able to allocate
> arbitrary
> > > > resources on mdev instance. More info on [1] [2] [3].
> > > >
> > > > To allow to create user defined resources for mdev, it trys to extend
> mdev
> > > > create interface by adding new "aggregate=xxx" parameter following
> UUID, for
> > > > target mdev type if aggregation is supported, it can create new mdev
> device
> > > > which contains resources combined by number of instances, e.g
> > > >
> > > >     echo "<uuid>,aggregate=10" > create
> > > >
> > > > VM manager e.g libvirt can check mdev type with "aggregation"
> attribute which
> > > > can support this setting. If no "aggregation" attribute found for mdev
> type,
> > > > previous behavior is still kept for one instance allocation. And new
> sysfs
> > > > attribute "aggregated_instances" is created for each mdev device to
> show allocated number.
> > >
> > > Given discussions we've had recently around libvirt interacting with
> > > mdev, I think that libvirt would rather have an abstract interface via
> > > mdevctl[1].  Therefore can you evaluate how mdevctl would support
> this
> > > creation extension?  It seems like it would fit within the existing
> > > mdev and mdevctl framework if aggregation were simply a sysfs
> attribute
> > > for the device.  For example, the mdevctl steps might look like this:
> > >
> > > mdevctl define -u UUID -p PARENT -t TYPE
> > > mdevctl modify -u UUID --addattr=mdev/aggregation --value=2
> > > mdevctl start -u UUID

Hi, Alex, can you elaborate on why a sysfs attribute is more friendly
to mdevctl? What is the complexity of having mdevctl pass an
additional parameter at creation time, as this series originally
proposed? I just want to clearly understand the limitation of the
parameter approach. :-)

> > >
> > > When mdevctl starts the mdev, it will first create it using the
> > > existing mechanism, then apply aggregation attribute, which can
> consume
> > > the necessary additional instances from the parent device, or return an
> > > error, which would unwind and return a failure code to the caller
> > > (libvirt).  I think the vendor driver would then have freedom to decide
> > > when the attribute could be modified, for instance it would be entirely
> > > reasonable to return -EBUSY if the user attempts to modify the
> > > attribute while the mdev device is in-use.  Effectively aggregation
> > > simply becomes a standardized attribute with common meaning.
> Thoughts?
> > > [cc libvirt folks for their impression] Thanks,
> >
> > I think one problem is that before mdevctl start to create mdev you
> > don't know what vendor attributes are, as we apply mdev attributes
> > after create. You may need some lookup depending on parent.. I think
> > making aggregation like other vendor attribute for mdev might be the
> > simplest way, but do we want to define its behavior in formal? e.g
> > like previous discussed it should show maxium instances for aggregation,
> etc.
> 
> Yes, we'd still want to standardize how we enable and discover
> aggregation since we expect multiple users.  Even if libvirt were to
> use mdevctl as it's mdev interface, higher level tools should have an
> introspection mechanism available.  Possibly the sysfs interfaces
> proposed in this series remains largely the same, but I think perhaps
> the implementation of them moves out to the vendor driver.  In fact,
> perhaps the only change to mdev core is to define the standard.  For
> example, the "aggregation" attribute on the type is potentially simply
> a defined, optional, per type attribute, similar to "name" and
> "description".  For "aggregated_instances" we already have the
> mdev_attr_groups of the mdev_parent_ops, we could define an
> attribute_group with .name = "mdev" as a set of standardized
> attributes, such that vendors could provide both their own vendor
> specific attributes and per device attributes with a common meaning and
> semantic defined in the mdev ABI.

Such standardization sounds good.

> 
> > The behavior change for driver is that previously aggregation is
> > handled at create time, but for sysfs attr it should handle any
> > resource allocation before it's really in-use. I think some SIOV
> > driver which already requires some specific config should be ok,
> > but not sure for other driver which might not be explored in this before.
> > Would that be a problem? Kevin?
> 
> Right, I'm assuming the aggregation could be modified until the device
> is actually opened, the driver can nak the aggregation request by
> returning an errno to the attribute write.  I'm trying to anticipate
> whether this introduces new complications, for instances races with
> contiguous allocations.  I think these seem solvable within the vendor
> drivers, but please note it if I'm wrong.  Thanks,
> 

So far I haven't seen a problem with this approach. Regarding contiguous
allocations, it should ideally be fine as long as aggregation paths are
properly locked, similar to creation paths, when allocating resources.
It will introduce some additional work in the vendor driver, but such
overhead is worthwhile if it leads to a cleaner uapi.

There is one open question, though. In concept, when exposed as a sysfs
attribute, the aggregation feature can be used both to increase and to
decrease the resource any time the device is not in use. Increasing the
resource is probably fine, but I'm not sure about decreasing it. Is
there any vendor driver which cannot afford a resource decrease once
the device has ever been used (after deassignment), or which requires
at least an explicit reset before the decrease? If so, how do we report
such a special requirement (only-once, multiple-times, multiple-times-
before-first-use) to user space?

This is similar to what Cornelia commented regarding standardization
of post-creation resource configuration. If it ends up being a complex
story (or at least takes time to understand/standardize all kinds of
requirements), does it still make sense to support a creation-time
parameter as a quick path for this aggregation feature? :-)

Thanks
Kevin


* Re: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-11-15  4:24       ` Tian, Kevin
@ 2019-11-19 22:58         ` Alex Williamson
  2019-11-20  0:46           ` Tian, Kevin
  0 siblings, 1 reply; 32+ messages in thread
From: Alex Williamson @ 2019-11-19 22:58 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Zhenyu Wang, kvm, kwankhede, cohuck, Libvirt Devel, Pavel Hrdina,
	Jonathon Jongsma

On Fri, 15 Nov 2019 04:24:35 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Alex Williamson
> > Sent: Thursday, November 7, 2019 2:45 AM
> > 
> > On Wed, 6 Nov 2019 12:20:31 +0800
> > Zhenyu Wang <zhenyuw@linux.intel.com> wrote:
> >   
> > > On 2019.11.05 14:10:42 -0700, Alex Williamson wrote:  
> > > > On Thu, 24 Oct 2019 13:08:23 +0800
> > > > Zhenyu Wang <zhenyuw@linux.intel.com> wrote:
> > > >  
> > > > > Hi,
> > > > >
> > > > > This is a refresh for previous send of this series. I got impression that
> > > > > some SIOV drivers would still deploy their own create and config  
> > method so  
> > > > > stopped effort on this. But seems this would still be useful for some  
> > other  
> > > > > SIOV driver which may simply want capability to aggregate resources.  
> > So here's  
> > > > > refreshed series.
> > > > >
> > > > > Current mdev device create interface depends on fixed mdev type,  
> > which get uuid  
> > > > > from user to create instance of mdev device. If user wants to use  
> > customized  
> > > > > number of resource for mdev device, then only can create new mdev  
> > type for that  
> > > > > which may not be flexible. This requirement comes not only from to  
> > be able to  
> > > > > allocate flexible resources for KVMGT, but also from Intel scalable IO
> > > > > virtualization which would use vfio/mdev to be able to allocate  
> > arbitrary  
> > > > > resources on mdev instance. More info on [1] [2] [3].
> > > > >
> > > > > To allow to create user defined resources for mdev, it trys to extend  
> > mdev  
> > > > > create interface by adding new "aggregate=xxx" parameter following  
> > UUID, for  
> > > > > target mdev type if aggregation is supported, it can create new mdev  
> > device  
> > > > > which contains resources combined by number of instances, e.g
> > > > >
> > > > >     echo "<uuid>,aggregate=10" > create
> > > > >
> > > > > VM manager e.g libvirt can check mdev type with "aggregation"  
> > attribute which  
> > > > > can support this setting. If no "aggregation" attribute found for mdev  
> > type,  
> > > > > previous behavior is still kept for one instance allocation. And new  
> > sysfs  
> > > > > attribute "aggregated_instances" is created for each mdev device to  
> > show allocated number.  
> > > >
> > > > Given discussions we've had recently around libvirt interacting with
> > > > mdev, I think that libvirt would rather have an abstract interface via
> > > > mdevctl[1].  Therefore can you evaluate how mdevctl would support  
> > this  
> > > > creation extension?  It seems like it would fit within the existing
> > > > mdev and mdevctl framework if aggregation were simply a sysfs  
> > attribute  
> > > > for the device.  For example, the mdevctl steps might look like this:
> > > >
> > > > mdevctl define -u UUID -p PARENT -t TYPE
> > > > mdevctl modify -u UUID --addattr=mdev/aggregation --value=2
> > > > mdevctl start -u UUID  
> 
> Hi, Alex, can you elaborate why a sysfs attribute is more friendly
> to mdevctl? what is the complexity if having mdevctl to pass
> additional parameter at creation time as this series originally 
> proposed? Just want to clearly understand the limitation of the
> parameter way. :-)

We could also flip this question around: vfio-ap already uses sysfs to
finish composing a device after it's created, so why shouldn't
aggregation use this existing mechanism?  Extending the creation
interface is a more fundamental change than simply standardizing an
optional sysfs namespace entry.

> > > >
> > > > When mdevctl starts the mdev, it will first create it using the
> > > > existing mechanism, then apply aggregation attribute, which can  
> > consume  
> > > > the necessary additional instances from the parent device, or return an
> > > > error, which would unwind and return a failure code to the caller
> > > > (libvirt).  I think the vendor driver would then have freedom to decide
> > > > when the attribute could be modified, for instance it would be entirely
> > > > reasonable to return -EBUSY if the user attempts to modify the
> > > > attribute while the mdev device is in-use.  Effectively aggregation
> > > > simply becomes a standardized attribute with common meaning.  
> > Thoughts?  
> > > > [cc libvirt folks for their impression] Thanks,  
> > >
> > > I think one problem is that before mdevctl start to create mdev you
> > > don't know what vendor attributes are, as we apply mdev attributes
> > > after create. You may need some lookup depending on parent.. I think
> > > making aggregation like other vendor attribute for mdev might be the
> > > simplest way, but do we want to define its behavior in formal? e.g
> > > like previous discussed it should show maxium instances for aggregation,  
> > etc.
> > 
> > Yes, we'd still want to standardize how we enable and discover
> > aggregation since we expect multiple users.  Even if libvirt were to
> > use mdevctl as it's mdev interface, higher level tools should have an
> > introspection mechanism available.  Possibly the sysfs interfaces
> > proposed in this series remains largely the same, but I think perhaps
> > the implementation of them moves out to the vendor driver.  In fact,
> > perhaps the only change to mdev core is to define the standard.  For
> > example, the "aggregation" attribute on the type is potentially simply
> > a defined, optional, per type attribute, similar to "name" and
> > "description".  For "aggregated_instances" we already have the
> > mdev_attr_groups of the mdev_parent_ops, we could define an
> > attribute_group with .name = "mdev" as a set of standardized
> > attributes, such that vendors could provide both their own vendor
> > specific attributes and per device attributes with a common meaning and
> > semantic defined in the mdev ABI.  
> 
> such standardization sounds good.
> 
> >   
> > > The behavior change for driver is that previously aggregation is
> > > handled at create time, but for sysfs attr it should handle any
> > > resource allocation before it's really in-use. I think some SIOV
> > > driver which already requires some specific config should be ok,
> > > but not sure for other driver which might not be explored in this before.
> > > Would that be a problem? Kevin?  
> > 
> > Right, I'm assuming the aggregation could be modified until the device
> > is actually opened, the driver can nak the aggregation request by
> > returning an errno to the attribute write.  I'm trying to anticipate
> > whether this introduces new complications, for instances races with
> > contiguous allocations.  I think these seem solvable within the vendor
> > drivers, but please note it if I'm wrong.  Thanks,
> >   
> 
> So far I didn't see a problem with this way. Regarding to contiguous
> allocations, ideally it should be fine as long as aggregation paths are
> properly locked similar  as creation paths when allocating resources.
> It will introduce some additional work in vendor driver but such
> overhead is worthy if it leads to cleaner uapi.
> 
> There is one open though. In concept the aggregation feature can
> be used for both increasing and decreasing the resource when 
> exposing as a sysfs attribute, any time when the device is not in-use. 
> Increasing resource is possibly fine, but I'm not sure about decreasing
> resource. Is there any vendor driver which cannot afford resource
> decrease once it has ever been used (after deassignment), or require
> at least an explicit reset before decrease? If yes, how do we report
> such special requirement (only-once, multiple-times, multiple-times-
> before-1st-usage) to user space?

It seems like a sloppy vendor driver that couldn't return a device to a
post-creation state, ie. drop and re-initialize the aggregation state.
Userspace would always need to handle an aggregation failure, there
might be multiple processes attempting to allocate resources
simultaneously or the user might simply be requesting more resources
than available.  The vendor driver should make a reasonable attempt to
satisfy the user request or else an insufficient resource error may
appear at the application.  vfio-mdev devices should always be reset
before and after usage.
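To illustrate the userspace side of that contract — attempt the request, and unwind if the driver naks it — here is a sketch against a mock sysfs tree; "max_aggregation" is a hypothetical attribute used only so the mock can refuse a request the way a real vendor driver would by failing the write (e.g. with -EBUSY or -ENOSPC):

```shell
#!/bin/sh
# Mock sysfs tree for a single mdev device.
SYSFS=$(mktemp -d)
DEV="$SYSFS/devices/UUID/mdev"
mkdir -p "$DEV"
echo 1 > "$DEV/aggregation"
echo 2 > "$DEV/max_aggregation"   # hypothetical introspection attribute

request_aggregation() {
    # Returns non-zero when the driver rejects the request, so the
    # caller can unwind instead of proceeding with too few resources.
    [ "$2" -le "$(cat "$1/max_aggregation")" ] || return 1
    echo "$2" > "$1/aggregation"
}

for n in 4 2; do
    if request_aggregation "$DEV" "$n"; then
        echo "aggregation=$n accepted"
    else
        echo "aggregation=$n rejected, unwinding"
    fi
done
```

The same pattern covers both over-sized requests and concurrent callers racing for the parent's remaining instances: whoever loses the race simply sees the write fail and backs off.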
 
> It's sort of like what Cornelia commented about standardization
> of post-creation resource configuration. If it may end up to be
> a complex story (or at least take time to understand/standardize
> all kinds of requirements), does it still make sense to support
> creation-time parameter as a quick-path for this aggregation feature? :-)

We're not going to do both, right?  We likely lock ourselves into one
schema when we do it.  Not only is the sysfs approach already in use in
vfio-ap, but it seems more flexible.  Above you raise the issue of
dynamically resizing the aggregation between uses.  We can't do that
with only a creation-time parameter.  With a sysfs parameter the vendor
driver can nak changes, allow changes when idle, potentially even allow
changes while in use.  Connie essentially brings up the question of how
we can introspect sysfs attribute, which is a big question.  Perhaps we
can nibble off a piece of that question by starting with a namespace
per attribute.  For instance, rather than doing:

echo 2 > /sys/bus/mdev/devices/UUID/mdev/aggregation

We could do:

echo 2 > /sys/bus/mdev/devices/UUID/mdev/aggregation/value

This allows us the whole mdev/aggregation/* namespace to describe other
attributes to expose aspects of the aggregation support.  Thanks,

Alex
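A mock of that per-attribute namespace might look as follows; everything beside "value" ("max", "modifiable") is a hypothetical sibling attribute, shown only to illustrate how the directory leaves room for introspection:

```shell
#!/bin/sh
# Mock of the proposed mdev/aggregation/* namespace.
SYSFS=$(mktemp -d)
AGG="$SYSFS/devices/UUID/mdev/aggregation"
mkdir -p "$AGG"
echo 2 > "$AGG/value"        # the writable knob itself
echo 8 > "$AGG/max"          # hypothetical: largest value the driver accepts
echo 1 > "$AGG/modifiable"   # hypothetical: whether value may change right now

# Userspace introspection: enumerate the namespace to discover what the
# vendor driver exposes about aggregation support.
for attr in "$AGG"/*; do
    printf '%s = %s\n' "$(basename "$attr")" "$(cat "$attr")"
done
```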



* RE: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-11-19 22:58         ` Alex Williamson
@ 2019-11-20  0:46           ` Tian, Kevin
  0 siblings, 0 replies; 32+ messages in thread
From: Tian, Kevin @ 2019-11-20  0:46 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Zhenyu Wang, kvm, kwankhede, cohuck, Libvirt Devel, Pavel Hrdina,
	Jonathon Jongsma

> From: Alex Williamson
> Sent: Wednesday, November 20, 2019 6:58 AM
> 
> On Fri, 15 Nov 2019 04:24:35 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Alex Williamson
> > > Sent: Thursday, November 7, 2019 2:45 AM
> > >
> > > On Wed, 6 Nov 2019 12:20:31 +0800
> > > Zhenyu Wang <zhenyuw@linux.intel.com> wrote:
> > >
> > > > On 2019.11.05 14:10:42 -0700, Alex Williamson wrote:
> > > > > On Thu, 24 Oct 2019 13:08:23 +0800
> > > > > Zhenyu Wang <zhenyuw@linux.intel.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > This is a refresh for previous send of this series. I got impression
> that
> > > > > > some SIOV drivers would still deploy their own create and config
> > > method so
> > > > > > stopped effort on this. But seems this would still be useful for
> some
> > > other
> > > > > > SIOV driver which may simply want capability to aggregate
> resources.
> > > So here's
> > > > > > refreshed series.
> > > > > >
> > > > > > Current mdev device create interface depends on fixed mdev type,
> > > which get uuid
> > > > > > from user to create instance of mdev device. If user wants to use
> > > customized
> > > > > > number of resource for mdev device, then only can create new
> mdev
> > > type for that
> > > > > > which may not be flexible. This requirement comes not only from
> to
> > > be able to
> > > > > > allocate flexible resources for KVMGT, but also from Intel scalable
> IO
> > > > > > virtualization which would use vfio/mdev to be able to allocate
> > > arbitrary
> > > > > > resources on mdev instance. More info on [1] [2] [3].
> > > > > >
> > > > > > To allow to create user defined resources for mdev, it trys to
> extend
> > > mdev
> > > > > > create interface by adding new "aggregate=xxx" parameter
> following
> > > UUID, for
> > > > > > target mdev type if aggregation is supported, it can create new
> mdev
> > > device
> > > > > > which contains resources combined by number of instances, e.g
> > > > > >
> > > > > >     echo "<uuid>,aggregate=10" > create
> > > > > >
> > > > > > VM manager e.g libvirt can check mdev type with "aggregation"
> > > attribute which
> > > > > > can support this setting. If no "aggregation" attribute found for
> mdev
> > > type,
> > > > > > previous behavior is still kept for one instance allocation. And new
> > > sysfs
> > > > > > attribute "aggregated_instances" is created for each mdev device
> to
> > > show allocated number.
> > > > >
> > > > > Given discussions we've had recently around libvirt interacting with
> > > > > mdev, I think that libvirt would rather have an abstract interface via
> > > > > mdevctl[1].  Therefore can you evaluate how mdevctl would support
> > > this
> > > > > creation extension?  It seems like it would fit within the existing
> > > > > mdev and mdevctl framework if aggregation were simply a sysfs
> > > attribute
> > > > > for the device.  For example, the mdevctl steps might look like this:
> > > > >
> > > > > mdevctl define -u UUID -p PARENT -t TYPE
> > > > > mdevctl modify -u UUID --addattr=mdev/aggregation --value=2
> > > > > mdevctl start -u UUID
> >
> > Hi, Alex, can you elaborate why a sysfs attribute is more friendly
> > to mdevctl? what is the complexity if having mdevctl to pass
> > additional parameter at creation time as this series originally
> > proposed? Just want to clearly understand the limitation of the
> > parameter way. :-)
> 
> We could also flip this question around, vfio-ap already uses sysfs to
> finish composing a device after it's created, therefore why shouldn't
> aggregation use this existing mechanism.  Extending the creation
> interface is a more fundamental change than simply standardizing an
> optional sysfs namespace entry.
> 
> > > > >
> > > > > When mdevctl starts the mdev, it will first create it using the
> > > > > existing mechanism, then apply aggregation attribute, which can
> > > consume
> > > > > the necessary additional instances from the parent device, or return
> an
> > > > > error, which would unwind and return a failure code to the caller
> > > > > (libvirt).  I think the vendor driver would then have freedom to
> decide
> > > > > when the attribute could be modified, for instance it would be
> entirely
> > > > > reasonable to return -EBUSY if the user attempts to modify the
> > > > > attribute while the mdev device is in-use.  Effectively aggregation
> > > > > simply becomes a standardized attribute with common meaning.
> > > Thoughts?
> > > > > [cc libvirt folks for their impression] Thanks,
> > > >
> > > > I think one problem is that before mdevctl start to create mdev you
> > > > don't know what vendor attributes are, as we apply mdev attributes
> > > > after create. You may need some lookup depending on parent.. I think
> > > > making aggregation like other vendor attribute for mdev might be the
> > > > simplest way, but do we want to define its behavior in formal? e.g
> > > > like previous discussed it should show maxium instances for
> aggregation,
> > > etc.
> > >
> > > Yes, we'd still want to standardize how we enable and discover
> > > aggregation since we expect multiple users.  Even if libvirt were to
> > > use mdevctl as it's mdev interface, higher level tools should have an
> > > introspection mechanism available.  Possibly the sysfs interfaces
> > > proposed in this series remains largely the same, but I think perhaps
> > > the implementation of them moves out to the vendor driver.  In fact,
> > > perhaps the only change to mdev core is to define the standard.  For
> > > example, the "aggregation" attribute on the type is potentially simply
> > > a defined, optional, per type attribute, similar to "name" and
> > > "description".  For "aggregated_instances" we already have the
> > > mdev_attr_groups of the mdev_parent_ops, we could define an
> > > attribute_group with .name = "mdev" as a set of standardized
> > > attributes, such that vendors could provide both their own vendor
> > > specific attributes and per device attributes with a common meaning
> and
> > > semantic defined in the mdev ABI.
> >
> > such standardization sounds good.
> >
> > >
> > > > The behavior change for driver is that previously aggregation is
> > > > handled at create time, but for sysfs attr it should handle any
> > > > resource allocation before it's really in-use. I think some SIOV
> > > > driver which already requires some specific config should be ok,
> > > > but not sure for other driver which might not be explored in this
> before.
> > > > Would that be a problem? Kevin?
> > >
> > > Right, I'm assuming the aggregation could be modified until the device
> > > is actually opened, the driver can nak the aggregation request by
> > > returning an errno to the attribute write.  I'm trying to anticipate
> > > whether this introduces new complications, for instances races with
> > > contiguous allocations.  I think these seem solvable within the vendor
> > > drivers, but please note it if I'm wrong.  Thanks,
> > >
> >
> > So far I didn't see a problem with this way. Regarding to contiguous
> > allocations, ideally it should be fine as long as aggregation paths are
> > properly locked similar  as creation paths when allocating resources.
> > It will introduce some additional work in vendor driver but such
> > overhead is worthy if it leads to cleaner uapi.
> >
> > There is one open though. In concept the aggregation feature can
> > be used for both increasing and decreasing the resource when
> > exposing as a sysfs attribute, any time when the device is not in-use.
> > Increasing resource is possibly fine, but I'm not sure about decreasing
> > resource. Is there any vendor driver which cannot afford resource
> > decrease once it has ever been used (after deassignment), or require
> > at least an explicit reset before decrease? If yes, how do we report
> > such special requirement (only-once, multiple-times, multiple-times-
> > before-1st-usage) to user space?
> 
> It seems like a sloppy vendor driver that couldn't return a device to a
> post-creation state, ie. drop and re-initialize the aggregation state.

it might be a hardware limitation too...

> Userspace would always need to handle an aggregation failure, there
> might be multiple processes attempting to allocate resources
> simultaneously or the user might simply be requesting more resources
> than available.  The vendor driver should make a reasonable attempt to
> satisfy the user request or else an insufficient resource error may
> appear at the application.  vfio-mdev devices should always be reset
> before and after usage.

the two scenarios are different. One is letting userspace know whether
aggregation is supported, and any limitations on it. The other is using
the feature under the claimed limitations, which then includes
error-handling logic in case of resource contention.

> 
> > It's sort of like what Cornelia commented about standardization
> > of post-creation resource configuration. If it may end up to be
> > a complex story (or at least take time to understand/standardize
> > all kinds of requirements), does it still make sense to support
> > creation-time parameter as a quick-path for this aggregation feature? :-)
> 
> We're not going to do both, right?  We likely lock ourselves into one
> schema when we do it.  Not only is the sysfs approach already in use in
> vfio-ap, but it seems more flexible.  Above you raise the issue of
> dynamically resizing the aggregation between uses.  We can't do that
> with only a creation-time parameter.  With a sysfs parameter the vendor

yes, because a creation-time parameter is one-off.

> driver can nak changes, allow changes when idle, potentially even allow
> changes while in use.  Connie essentially brings up the question of how
> we can introspect sysfs attribute, which is a big question.  Perhaps we
> can nibble off a piece of that question by starting with a namespace
> per attribute.  For instance, rather than doing:
> 
> echo 2 > /sys/bus/mdev/devices/UUID/mdev/aggregation
> 
> We could do:
> 
> echo 2 > /sys/bus/mdev/devices/UUID/mdev/aggregation/value
> 
> This allows us the whole mdev/aggregation/* namespace to describe other
> attributes to expose aspects of the aggregation support.  Thanks,
> 

Yes, this sounds like a better option. We can start with one attribute (value)
and extend it to cover any possible restrictions in the future. One note for
Zhenyu - with this approach you should at least prepare for both
increasing and decreasing resources through 'value' in the GVT-g driver.
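
As a rough userspace sketch of that per-attribute namespace idea (every attribute name below, including 'max', is hypothetical until the ABI is actually standardized):

```shell
# Hypothetical sketch of the proposed mdev/aggregation/* namespace;
# nothing here is upstream ABI yet, and the 'max' attribute is
# invented for illustration.
UUID=83b8f4f2-509f-382f-3c1e-e6bfe0fa1001
MDEV=/sys/bus/mdev/devices/$UUID

# Presence of the directory tells userspace aggregation is supported.
if [ -d "$MDEV/mdev/aggregation" ]; then
    # A future attribute could advertise the upper bound.
    cat "$MDEV/mdev/aggregation/max" 2>/dev/null

    # Request 2 instances; the vendor driver may nak the write
    # (e.g. -EBUSY while in use, or an error if resources are short).
    if ! echo 2 > "$MDEV/mdev/aggregation/value"; then
        echo "aggregation request rejected, keeping 1 instance" >&2
    fi
fi
```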

Thanks
Kevin

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-11-08  8:19   ` Zhenyu Wang
@ 2019-12-04 17:36     ` Parav Pandit
  2019-12-05  6:06       ` Zhenyu Wang
  0 siblings, 1 reply; 32+ messages in thread
From: Parav Pandit @ 2019-12-04 17:36 UTC (permalink / raw)
  To: Zhenyu Wang
  Cc: kvm, alex.williamson, kwankhede, kevin.tian, cohuck, Jiri Pirko,
	netdev, Jason Wang, Michael S. Tsirkin

+ Jiri + netdev, since you mentioned netdev queues.

+ Jason Wang and Michael, as we had a similar discussion in the vdpa discussion thread.

> From: Zhenyu Wang <zhenyuw@linux.intel.com>
> Sent: Friday, November 8, 2019 2:19 AM
> To: Parav Pandit <parav@mellanox.com>
> 

My apologies for replying late.
Something went wrong with my email client, and I only found this patch in my spam folder today.
More comments below.

> On 2019.11.07 20:37:49 +0000, Parav Pandit wrote:
> > Hi,
> >
> > > -----Original Message-----
> > > From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org> On
> > > Behalf Of Zhenyu Wang
> > > Sent: Thursday, October 24, 2019 12:08 AM
> > > To: kvm@vger.kernel.org
> > > Cc: alex.williamson@redhat.com; kwankhede@nvidia.com;
> > > kevin.tian@intel.com; cohuck@redhat.com
> > > Subject: [PATCH 0/6] VFIO mdev aggregated resources handling
> > >
> > > Hi,
> > >
> > > This is a refresh for previous send of this series. I got impression
> > > that some SIOV drivers would still deploy their own create and
> > > config method so stopped effort on this. But seems this would still
> > > be useful for some other SIOV driver which may simply want
> > > capability to aggregate resources. So here's refreshed series.
> > >
> > > Current mdev device create interface depends on fixed mdev type,
> > > which get uuid from user to create instance of mdev device. If user
> > > wants to use customized number of resource for mdev device, then
> > > only can create new
> > Can you please give an example of 'resource'?
> > When I grep [1], [2] and [3], I couldn't find anything related to ' aggregate'.
> 
> The resource is vendor device specific, in SIOV spec there's ADI (Assignable
> Device Interface) definition which could be e.g queue for net device, context
> for gpu, etc. I just named this interface as 'aggregate'
> for aggregation purpose, it's not used in spec doc.
> 

Some 'unknown/undefined' vendor-specific resource just doesn't work.
An orchestration tool doesn't know which resource to configure, or what/how to configure it, for which vendor.
It has to be well defined.

You can also find such discussion in the recent lgpu DRM cgroup patch series (v4).

Exposing networking resource configuration at the PCI device level, through mdev sysfs that is not net-namespace aware, is a no-go.
Adding per-file NET_ADMIN or other checks is not the approach we follow in the kernel.

devlink is a subsystem that, although it lives under net, has a very rich interface for its callers: device health, resource management, and much more.
Even though it is used by net drivers today, it is written for generic device management at the bus/device level.

Yuval has posted patches to manage PCI sub-devices [1], and an updated version which addresses the comments will be posted soon.

For any device-slice resource management (mdev, sub-function, etc.), we should be using a single kernel interface: devlink [2], [3].

[1] https://lore.kernel.org/netdev/1573229926-30040-1-git-send-email-yuvalav@mellanox.com/
[2] http://man7.org/linux/man-pages/man8/devlink-dev.8.html
[3] http://man7.org/linux/man-pages/man8/devlink-resource.8.html

Most modern device configuration that I am aware of is done via well-defined ioctl()s of the subsystem (vhost, virtio, vfio, rdma, nvme and more) or via netlink commands (net, devlink, rdma and more), not via sysfs.
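
For reference, today's devlink resource interface (devlink-resource(8)) already works roughly like this; the device name and resource path below are examples from existing net drivers (the path exists in mlxsw), not an mdev proposal:

```shell
# Inspect the resource tree a driver registered for this device.
devlink resource show pci/0000:03:00.0

# Request a new size for one resource; the path is driver-defined,
# and the change typically takes effect on the next reload.
devlink resource set pci/0000:03:00.0 path /kvd/linear size 98304
devlink dev reload pci/0000:03:00.0
```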

> Thanks
> 
> >
> > > mdev type for that which may not be flexible. This requirement comes
> > > not only from to be able to allocate flexible resources for KVMGT,
> > > but also from Intel scalable IO virtualization which would use
> > > vfio/mdev to be able to allocate arbitrary resources on mdev instance.
> More info on [1] [2] [3].
> > >
> > > To allow to create user defined resources for mdev, it trys to
> > > extend mdev create interface by adding new "aggregate=xxx" parameter
> > > following UUID, for target mdev type if aggregation is supported, it
> > > can create new mdev device which contains resources combined by
> > > number of instances, e.g
> > >
> > >     echo "<uuid>,aggregate=10" > create
> > >
> > > VM manager e.g libvirt can check mdev type with "aggregation"
> > > attribute which can support this setting. If no "aggregation"
> > > attribute found for mdev type, previous behavior is still kept for
> > > one instance allocation. And new sysfs attribute
> > > "aggregated_instances" is created for each mdev device to show allocated
> number.
> > >
> > > References:
> > > [1]
> > > https://software.intel.com/en-us/download/intel-virtualization-techn
> > > ology- for-directed-io-architecture-specification
> > > [2]
> > > https://software.intel.com/en-us/download/intel-scalable-io-virtuali
> > > zation-
> > > technical-specification
> > > [3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
> > >
> > > Zhenyu Wang (6):
> > >   vfio/mdev: Add new "aggregate" parameter for mdev create
> > >   vfio/mdev: Add "aggregation" attribute for supported mdev type
> > >   vfio/mdev: Add "aggregated_instances" attribute for supported mdev
> > >     device
> > >   Documentation/driver-api/vfio-mediated-device.rst: Update for
> > >     vfio/mdev aggregation support
> > >   Documentation/ABI/testing/sysfs-bus-vfio-mdev: Update for vfio/mdev
> > >     aggregation support
> > >   drm/i915/gvt: Add new type with aggregation support
> > >
> > >  Documentation/ABI/testing/sysfs-bus-vfio-mdev | 24 ++++++
> > >  .../driver-api/vfio-mediated-device.rst       | 23 ++++++
> > >  drivers/gpu/drm/i915/gvt/gvt.c                |  4 +-
> > >  drivers/gpu/drm/i915/gvt/gvt.h                | 11 ++-
> > >  drivers/gpu/drm/i915/gvt/kvmgt.c              | 53 ++++++++++++-
> > >  drivers/gpu/drm/i915/gvt/vgpu.c               | 56 ++++++++++++-
> > >  drivers/vfio/mdev/mdev_core.c                 | 36 ++++++++-
> > >  drivers/vfio/mdev/mdev_private.h              |  6 +-
> > >  drivers/vfio/mdev/mdev_sysfs.c                | 79 ++++++++++++++++++-
> > >  include/linux/mdev.h                          | 19 +++++
> > >  10 files changed, 294 insertions(+), 17 deletions(-)
> > >
> > > --
> > > 2.24.0.rc0
> >
> 
> --
> Open Source Technology Center, Intel ltd.
> 
> $gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827


* Re: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-12-04 17:36     ` Parav Pandit
@ 2019-12-05  6:06       ` Zhenyu Wang
  2019-12-05  6:40         ` Jason Wang
  2019-12-05 18:59         ` Parav Pandit
  0 siblings, 2 replies; 32+ messages in thread
From: Zhenyu Wang @ 2019-12-05  6:06 UTC (permalink / raw)
  To: Parav Pandit
  Cc: kvm, alex.williamson, kwankhede, kevin.tian, cohuck, Jiri Pirko,
	netdev, Jason Wang, Michael S. Tsirkin

On 2019.12.04 17:36:12 +0000, Parav Pandit wrote:
> + Jiri + Netdev since you mentioned netdev queue.
> 
> + Jason Wang and Michael as we had similar discussion in vdpa discussion thread.
> 
> > From: Zhenyu Wang <zhenyuw@linux.intel.com>
> > Sent: Friday, November 8, 2019 2:19 AM
> > To: Parav Pandit <parav@mellanox.com>
> > 
> 
> My apologies to reply late.
> Something bad with my email client, due to which I found this patch under spam folder today.
> More comments below.
> 
> > On 2019.11.07 20:37:49 +0000, Parav Pandit wrote:
> > > Hi,
> > >
> > > > -----Original Message-----
> > > > From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org> On
> > > > Behalf Of Zhenyu Wang
> > > > Sent: Thursday, October 24, 2019 12:08 AM
> > > > To: kvm@vger.kernel.org
> > > > Cc: alex.williamson@redhat.com; kwankhede@nvidia.com;
> > > > kevin.tian@intel.com; cohuck@redhat.com
> > > > Subject: [PATCH 0/6] VFIO mdev aggregated resources handling
> > > >
> > > > Hi,
> > > >
> > > > This is a refresh for previous send of this series. I got impression
> > > > that some SIOV drivers would still deploy their own create and
> > > > config method so stopped effort on this. But seems this would still
> > > > be useful for some other SIOV driver which may simply want
> > > > capability to aggregate resources. So here's refreshed series.
> > > >
> > > > Current mdev device create interface depends on fixed mdev type,
> > > > which get uuid from user to create instance of mdev device. If user
> > > > wants to use customized number of resource for mdev device, then
> > > > only can create new
> > > Can you please give an example of 'resource'?
> > > When I grep [1], [2] and [3], I couldn't find anything related to ' aggregate'.
> > 
> > The resource is vendor device specific, in SIOV spec there's ADI (Assignable
> > Device Interface) definition which could be e.g queue for net device, context
> > for gpu, etc. I just named this interface as 'aggregate'
> > for aggregation purpose, it's not used in spec doc.
> > 
> 
> Some 'unknown/undefined' vendor specific resource just doesn't work.
> Orchestration tool doesn't know which resource and what/how to configure for which vendor.
> It has to be well defined.
> 
> You can also find such discussion in recent lgpu DRM cgroup patches series v4.
> 
> Exposing networking resource configuration in non-net namespace aware mdev sysfs at PCI device level is no-go.
> Adding per file NET_ADMIN or other checks is not the approach we follow in kernel.
> 
> devlink has been a subsystem though under net, that has very rich interface for syscaller, device health, resource management and many more.
> Even though it is used by net driver today, its written for generic device management at bus/device level.
> 
> Yuval has posted patches to manage PCI sub-devices [1] and updated version will be posted soon which addresses comments.
> 
> For any device slice resource management of mdev, sub-function etc, we should be using single kernel interface as devlink [2], [3].
> 
> [1] https://lore.kernel.org/netdev/1573229926-30040-1-git-send-email-yuvalav@mellanox.com/
> [2] http://man7.org/linux/man-pages/man8/devlink-dev.8.html
> [3] http://man7.org/linux/man-pages/man8/devlink-resource.8.html
> 
> Most modern device configuration that I am aware of is usually done via well defined ioctl() of the subsystem (vhost, virtio, vfio, rdma, nvme and more) or via netlink commands (net, devlink, rdma and more) not via sysfs.
> 

The current vfio/mdev configuration is done via a documented sysfs ABI
rather than other mechanisms. So this series adheres to that approach,
introducing a more configurable, standardized method on the mdev device;
it is optional and not really vendor specific, e.g. vfio-ap does
something similar.

I'm not sure how many devices support devlink now, whether it really
makes sense to utilize devlink for devices other than net devices, or
whether it makes sense to drive mdev resource configuration from there...
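
As a sketch, the flow built on that documented sysfs ABI plus this series' extension would look like this (the "aggregation" type attribute, the ",aggregate=" create parameter and "aggregated_instances" are the pieces proposed by this series, and the parent/type names are examples):

```shell
# Sketch of mdev creation with the proposed aggregation extension;
# the aggregation attributes are not yet upstream ABI.
PARENT=/sys/bus/pci/devices/0000:00:02.0   # example parent device
TYPE=i915-GVTg_V5_8                        # example mdev type
UUID=$(uuidgen)

# Fall back to the existing single-instance behavior if the type
# does not advertise aggregation support.
if [ -f "$PARENT/mdev_supported_types/$TYPE/aggregation" ]; then
    echo "$UUID,aggregate=10" > "$PARENT/mdev_supported_types/$TYPE/create"
else
    echo "$UUID" > "$PARENT/mdev_supported_types/$TYPE/create"
fi

# The proposed per-device attribute reports how many instances
# were actually combined.
cat "/sys/bus/mdev/devices/$UUID/aggregated_instances"
```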

> > 
> > >
> > > > mdev type for that which may not be flexible. This requirement comes
> > > > not only from to be able to allocate flexible resources for KVMGT,
> > > > but also from Intel scalable IO virtualization which would use
> > > > vfio/mdev to be able to allocate arbitrary resources on mdev instance.
> > More info on [1] [2] [3].
> > > >
> > > > To allow to create user defined resources for mdev, it trys to
> > > > extend mdev create interface by adding new "aggregate=xxx" parameter
> > > > following UUID, for target mdev type if aggregation is supported, it
> > > > can create new mdev device which contains resources combined by
> > > > number of instances, e.g
> > > >
> > > >     echo "<uuid>,aggregate=10" > create
> > > >
> > > > VM manager e.g libvirt can check mdev type with "aggregation"
> > > > attribute which can support this setting. If no "aggregation"
> > > > attribute found for mdev type, previous behavior is still kept for
> > > > one instance allocation. And new sysfs attribute
> > > > "aggregated_instances" is created for each mdev device to show allocated
> > number.
> > > >
> > > > References:
> > > > [1]
> > > > https://software.intel.com/en-us/download/intel-virtualization-techn
> > > > ology- for-directed-io-architecture-specification
> > > > [2]
> > > > https://software.intel.com/en-us/download/intel-scalable-io-virtuali
> > > > zation-
> > > > technical-specification
> > > > [3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
> > > >
> > > > Zhenyu Wang (6):
> > > >   vfio/mdev: Add new "aggregate" parameter for mdev create
> > > >   vfio/mdev: Add "aggregation" attribute for supported mdev type
> > > >   vfio/mdev: Add "aggregated_instances" attribute for supported mdev
> > > >     device
> > > >   Documentation/driver-api/vfio-mediated-device.rst: Update for
> > > >     vfio/mdev aggregation support
> > > >   Documentation/ABI/testing/sysfs-bus-vfio-mdev: Update for vfio/mdev
> > > >     aggregation support
> > > >   drm/i915/gvt: Add new type with aggregation support
> > > >
> > > >  Documentation/ABI/testing/sysfs-bus-vfio-mdev | 24 ++++++
> > > >  .../driver-api/vfio-mediated-device.rst       | 23 ++++++
> > > >  drivers/gpu/drm/i915/gvt/gvt.c                |  4 +-
> > > >  drivers/gpu/drm/i915/gvt/gvt.h                | 11 ++-
> > > >  drivers/gpu/drm/i915/gvt/kvmgt.c              | 53 ++++++++++++-
> > > >  drivers/gpu/drm/i915/gvt/vgpu.c               | 56 ++++++++++++-
> > > >  drivers/vfio/mdev/mdev_core.c                 | 36 ++++++++-
> > > >  drivers/vfio/mdev/mdev_private.h              |  6 +-
> > > >  drivers/vfio/mdev/mdev_sysfs.c                | 79 ++++++++++++++++++-
> > > >  include/linux/mdev.h                          | 19 +++++
> > > >  10 files changed, 294 insertions(+), 17 deletions(-)
> > > >
> > > > --
> > > > 2.24.0.rc0
> > >
> > 
> > --
> > Open Source Technology Center, Intel ltd.
> > 
> > $gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827

-- 
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827


* Re: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-12-05  6:06       ` Zhenyu Wang
@ 2019-12-05  6:40         ` Jason Wang
  2019-12-05 19:02           ` Parav Pandit
  2019-12-05 18:59         ` Parav Pandit
  1 sibling, 1 reply; 32+ messages in thread
From: Jason Wang @ 2019-12-05  6:40 UTC (permalink / raw)
  To: Zhenyu Wang, Parav Pandit
  Cc: kvm, alex.williamson, kwankhede, kevin.tian, cohuck, Jiri Pirko,
	netdev, Michael S. Tsirkin


On 2019/12/5 2:06 PM, Zhenyu Wang wrote:
> On 2019.12.04 17:36:12 +0000, Parav Pandit wrote:
>> + Jiri + Netdev since you mentioned netdev queue.
>>
>> + Jason Wang and Michael as we had similar discussion in vdpa discussion thread.
>>
>>> From: Zhenyu Wang <zhenyuw@linux.intel.com>
>>> Sent: Friday, November 8, 2019 2:19 AM
>>> To: Parav Pandit <parav@mellanox.com>
>>>
>> My apologies to reply late.
>> Something bad with my email client, due to which I found this patch under spam folder today.
>> More comments below.
>>
>>> On 2019.11.07 20:37:49 +0000, Parav Pandit wrote:
>>>> Hi,
>>>>
>>>>> -----Original Message-----
>>>>> From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org> On
>>>>> Behalf Of Zhenyu Wang
>>>>> Sent: Thursday, October 24, 2019 12:08 AM
>>>>> To: kvm@vger.kernel.org
>>>>> Cc: alex.williamson@redhat.com; kwankhede@nvidia.com;
>>>>> kevin.tian@intel.com; cohuck@redhat.com
>>>>> Subject: [PATCH 0/6] VFIO mdev aggregated resources handling
>>>>>
>>>>> Hi,
>>>>>
>>>>> This is a refresh for previous send of this series. I got impression
>>>>> that some SIOV drivers would still deploy their own create and
>>>>> config method so stopped effort on this. But seems this would still
>>>>> be useful for some other SIOV driver which may simply want
>>>>> capability to aggregate resources. So here's refreshed series.
>>>>>
>>>>> Current mdev device create interface depends on fixed mdev type,
>>>>> which get uuid from user to create instance of mdev device. If user
>>>>> wants to use customized number of resource for mdev device, then
>>>>> only can create new
>>>> Can you please give an example of 'resource'?
>>>> When I grep [1], [2] and [3], I couldn't find anything related to ' aggregate'.
>>> The resource is vendor device specific, in SIOV spec there's ADI (Assignable
>>> Device Interface) definition which could be e.g queue for net device, context
>>> for gpu, etc. I just named this interface as 'aggregate'
>>> for aggregation purpose, it's not used in spec doc.
>>>
>> Some 'unknown/undefined' vendor specific resource just doesn't work.
>> Orchestration tool doesn't know which resource and what/how to configure for which vendor.
>> It has to be well defined.
>>
>> You can also find such discussion in recent lgpu DRM cgroup patches series v4.
>>
>> Exposing networking resource configuration in non-net namespace aware mdev sysfs at PCI device level is no-go.
>> Adding per file NET_ADMIN or other checks is not the approach we follow in kernel.
>>
>> devlink has been a subsystem though under net, that has very rich interface for syscaller, device health, resource management and many more.
>> Even though it is used by net driver today, its written for generic device management at bus/device level.
>>
>> Yuval has posted patches to manage PCI sub-devices [1] and updated version will be posted soon which addresses comments.
>>
>> For any device slice resource management of mdev, sub-function etc, we should be using single kernel interface as devlink [2], [3].
>>
>> [1] https://lore.kernel.org/netdev/1573229926-30040-1-git-send-email-yuvalav@mellanox.com/
>> [2] http://man7.org/linux/man-pages/man8/devlink-dev.8.html
>> [3] http://man7.org/linux/man-pages/man8/devlink-resource.8.html
>>
>> Most modern device configuration that I am aware of is usually done via well defined ioctl() of the subsystem (vhost, virtio, vfio, rdma, nvme and more) or via netlink commands (net, devlink, rdma and more) not via sysfs.
>>
> Current vfio/mdev configuration is via documented sysfs ABI instead of
> other ways. So this adhere to that way to introduce more configurable
> method on mdev device for standard, it's optional and not actually
> vendor specific e.g vfio-ap.
>
> I'm not sure how many devices support devlink now, or if really make
> sense to utilize devlink for other devices except net, or if really make
> sense to take mdev resource configuration from there...


It may make sense to allow other types of APIs to manage mdev besides
sysfs. But I'm not sure whether or not that would be a challenge for
orchestration tools.

Thanks


>>>>> mdev type for that which may not be flexible. This requirement comes
>>>>> not only from to be able to allocate flexible resources for KVMGT,
>>>>> but also from Intel scalable IO virtualization which would use
>>>>> vfio/mdev to be able to allocate arbitrary resources on mdev instance.
>>> More info on [1] [2] [3].
>>>>> To allow to create user defined resources for mdev, it trys to
>>>>> extend mdev create interface by adding new "aggregate=xxx" parameter
>>>>> following UUID, for target mdev type if aggregation is supported, it
>>>>> can create new mdev device which contains resources combined by
>>>>> number of instances, e.g
>>>>>
>>>>>      echo "<uuid>,aggregate=10" > create
>>>>>
>>>>> VM manager e.g libvirt can check mdev type with "aggregation"
>>>>> attribute which can support this setting. If no "aggregation"
>>>>> attribute found for mdev type, previous behavior is still kept for
>>>>> one instance allocation. And new sysfs attribute
>>>>> "aggregated_instances" is created for each mdev device to show allocated
>>> number.
>>>>> References:
>>>>> [1]
>>>>> https://software.intel.com/en-us/download/intel-virtualization-techn
>>>>> ology- for-directed-io-architecture-specification
>>>>> [2]
>>>>> https://software.intel.com/en-us/download/intel-scalable-io-virtuali
>>>>> zation-
>>>>> technical-specification
>>>>> [3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
>>>>>
>>>>> Zhenyu Wang (6):
>>>>>    vfio/mdev: Add new "aggregate" parameter for mdev create
>>>>>    vfio/mdev: Add "aggregation" attribute for supported mdev type
>>>>>    vfio/mdev: Add "aggregated_instances" attribute for supported mdev
>>>>>      device
>>>>>    Documentation/driver-api/vfio-mediated-device.rst: Update for
>>>>>      vfio/mdev aggregation support
>>>>>    Documentation/ABI/testing/sysfs-bus-vfio-mdev: Update for vfio/mdev
>>>>>      aggregation support
>>>>>    drm/i915/gvt: Add new type with aggregation support
>>>>>
>>>>>   Documentation/ABI/testing/sysfs-bus-vfio-mdev | 24 ++++++
>>>>>   .../driver-api/vfio-mediated-device.rst       | 23 ++++++
>>>>>   drivers/gpu/drm/i915/gvt/gvt.c                |  4 +-
>>>>>   drivers/gpu/drm/i915/gvt/gvt.h                | 11 ++-
>>>>>   drivers/gpu/drm/i915/gvt/kvmgt.c              | 53 ++++++++++++-
>>>>>   drivers/gpu/drm/i915/gvt/vgpu.c               | 56 ++++++++++++-
>>>>>   drivers/vfio/mdev/mdev_core.c                 | 36 ++++++++-
>>>>>   drivers/vfio/mdev/mdev_private.h              |  6 +-
>>>>>   drivers/vfio/mdev/mdev_sysfs.c                | 79 ++++++++++++++++++-
>>>>>   include/linux/mdev.h                          | 19 +++++
>>>>>   10 files changed, 294 insertions(+), 17 deletions(-)
>>>>>
>>>>> --
>>>>> 2.24.0.rc0
>>> --
>>> Open Source Technology Center, Intel ltd.
>>>
>>> $gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827



* RE: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-12-05  6:06       ` Zhenyu Wang
  2019-12-05  6:40         ` Jason Wang
@ 2019-12-05 18:59         ` Parav Pandit
  2019-12-06  8:03           ` Zhenyu Wang
  1 sibling, 1 reply; 32+ messages in thread
From: Parav Pandit @ 2019-12-05 18:59 UTC (permalink / raw)
  To: Zhenyu Wang
  Cc: kvm, alex.williamson, kwankhede, kevin.tian, cohuck, Jiri Pirko,
	netdev, Jason Wang, Michael S. Tsirkin



> From: Zhenyu Wang <zhenyuw@linux.intel.com>
> Sent: Thursday, December 5, 2019 12:06 AM
> To: Parav Pandit <parav@mellanox.com>
> 
> On 2019.12.04 17:36:12 +0000, Parav Pandit wrote:
> > + Jiri + Netdev since you mentioned netdev queue.
> >
> > + Jason Wang and Michael as we had similar discussion in vdpa discussion
> thread.
> >
> > > From: Zhenyu Wang <zhenyuw@linux.intel.com>
> > > Sent: Friday, November 8, 2019 2:19 AM
> > > To: Parav Pandit <parav@mellanox.com>
> > >
> >
> > My apologies to reply late.
> > Something bad with my email client, due to which I found this patch under
> spam folder today.
> > More comments below.
> >
> > > On 2019.11.07 20:37:49 +0000, Parav Pandit wrote:
> > > > Hi,
> > > >
> > > > > -----Original Message-----
> > > > > From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org> On
> > > > > Behalf Of Zhenyu Wang
> > > > > Sent: Thursday, October 24, 2019 12:08 AM
> > > > > To: kvm@vger.kernel.org
> > > > > Cc: alex.williamson@redhat.com; kwankhede@nvidia.com;
> > > > > kevin.tian@intel.com; cohuck@redhat.com
> > > > > Subject: [PATCH 0/6] VFIO mdev aggregated resources handling
> > > > >
> > > > > Hi,
> > > > >
> > > > > This is a refresh for previous send of this series. I got
> > > > > impression that some SIOV drivers would still deploy their own
> > > > > create and config method so stopped effort on this. But seems
> > > > > this would still be useful for some other SIOV driver which may
> > > > > simply want capability to aggregate resources. So here's refreshed
> series.
> > > > >
> > > > > Current mdev device create interface depends on fixed mdev type,
> > > > > which get uuid from user to create instance of mdev device. If
> > > > > user wants to use customized number of resource for mdev device,
> > > > > then only can create new
> > > > Can you please give an example of 'resource'?
> > > > When I grep [1], [2] and [3], I couldn't find anything related to '
> aggregate'.
> > >
> > > The resource is vendor device specific, in SIOV spec there's ADI
> > > (Assignable Device Interface) definition which could be e.g queue
> > > for net device, context for gpu, etc. I just named this interface as
> 'aggregate'
> > > for aggregation purpose, it's not used in spec doc.
> > >
> >
> > Some 'unknown/undefined' vendor specific resource just doesn't work.
> > Orchestration tool doesn't know which resource and what/how to configure
> for which vendor.
> > It has to be well defined.
> >
> > You can also find such discussion in recent lgpu DRM cgroup patches series
> v4.
> >
> > Exposing networking resource configuration in non-net namespace aware
> mdev sysfs at PCI device level is no-go.
> > Adding per file NET_ADMIN or other checks is not the approach we follow in
> kernel.
> >
> > devlink has been a subsystem though under net, that has very rich interface
> for syscaller, device health, resource management and many more.
> > Even though it is used by net driver today, its written for generic device
> management at bus/device level.
> >
> > Yuval has posted patches to manage PCI sub-devices [1] and updated version
> will be posted soon which addresses comments.
> >
> > For any device slice resource management of mdev, sub-function etc, we
> should be using single kernel interface as devlink [2], [3].
> >
> > [1]
> > https://lore.kernel.org/netdev/1573229926-30040-1-git-send-email-yuval
> > av@mellanox.com/ [2]
> > http://man7.org/linux/man-pages/man8/devlink-dev.8.html
> > [3] http://man7.org/linux/man-pages/man8/devlink-resource.8.html
> >
> > Most modern device configuration that I am aware of is usually done via well
> defined ioctl() of the subsystem (vhost, virtio, vfio, rdma, nvme and more) or
> via netlink commands (net, devlink, rdma and more) not via sysfs.
> >
> 
> Current vfio/mdev configuration is via documented sysfs ABI instead of other
> ways. So this adhere to that way to introduce more configurable method on
> mdev device for standard, it's optional and not actually vendor specific e.g vfio-
> ap.
> 
Some unknown/undefined resource called 'aggregate' is just not an ABI.
It has to be well defined, e.g. 'hardware_address', 'num_netdev_sqs', or something similarly appropriate to that mdev device class.
If a user wants to set a parameter for an mdev regardless of vendor, there must be a single way to do so.

> I'm not sure how many devices support devlink now, or if really make sense to
> utilize devlink for other devices except net, or if really make sense to take
> mdev resource configuration from there...
> 
This is about adding new knobs, not the existing ones.
They have to be well defined, and 'aggregate' is not a word that describes one.
If this is something very device specific, it should be prefixed with 'misc_', or it should be a misc_X ioctl().
Not-so-well-defined miscellaneous classes of devices are usually registered using misc_register().
Similarly, attributes have to be well defined; otherwise they should fall under the misc category, especially when you are pointing to three well-defined specifications.

> > >
> > > >
> > > > > mdev type for that which may not be flexible. This requirement
> > > > > comes not only from to be able to allocate flexible resources
> > > > > for KVMGT, but also from Intel scalable IO virtualization which
> > > > > would use vfio/mdev to be able to allocate arbitrary resources on mdev
> instance.
> > > More info on [1] [2] [3].
> > > > >
> > > > > To allow to create user defined resources for mdev, it trys to
> > > > > extend mdev create interface by adding new "aggregate=xxx"
> > > > > parameter following UUID, for target mdev type if aggregation is
> > > > > supported, it can create new mdev device which contains
> > > > > resources combined by number of instances, e.g
> > > > >
> > > > >     echo "<uuid>,aggregate=10" > create
> > > > >
> > > > > VM manager e.g libvirt can check mdev type with "aggregation"
> > > > > attribute which can support this setting. If no "aggregation"
> > > > > attribute found for mdev type, previous behavior is still kept
> > > > > for one instance allocation. And new sysfs attribute
> > > > > "aggregated_instances" is created for each mdev device to show
> > > > > allocated
> > > number.
> > > > >
> > > > > References:
> > > > > [1]
> > > > > https://software.intel.com/en-us/download/intel-virtualization-t
> > > > > echn
> > > > > ology- for-directed-io-architecture-specification
> > > > > [2]
> > > > > https://software.intel.com/en-us/download/intel-scalable-io-virt
> > > > > uali
> > > > > zation-
> > > > > technical-specification
> > > > > [3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
> > > > >
> > > > > Zhenyu Wang (6):
> > > > >   vfio/mdev: Add new "aggregate" parameter for mdev create
> > > > >   vfio/mdev: Add "aggregation" attribute for supported mdev type
> > > > >   vfio/mdev: Add "aggregated_instances" attribute for supported mdev
> > > > >     device
> > > > >   Documentation/driver-api/vfio-mediated-device.rst: Update for
> > > > >     vfio/mdev aggregation support
> > > > >   Documentation/ABI/testing/sysfs-bus-vfio-mdev: Update for vfio/mdev
> > > > >     aggregation support
> > > > >   drm/i915/gvt: Add new type with aggregation support
> > > > >
> > > > >  Documentation/ABI/testing/sysfs-bus-vfio-mdev | 24 ++++++
> > > > >  .../driver-api/vfio-mediated-device.rst       | 23 ++++++
> > > > >  drivers/gpu/drm/i915/gvt/gvt.c                |  4 +-
> > > > >  drivers/gpu/drm/i915/gvt/gvt.h                | 11 ++-
> > > > >  drivers/gpu/drm/i915/gvt/kvmgt.c              | 53 ++++++++++++-
> > > > >  drivers/gpu/drm/i915/gvt/vgpu.c               | 56 ++++++++++++-
> > > > >  drivers/vfio/mdev/mdev_core.c                 | 36 ++++++++-
> > > > >  drivers/vfio/mdev/mdev_private.h              |  6 +-
> > > > >  drivers/vfio/mdev/mdev_sysfs.c                | 79 ++++++++++++++++++-
> > > > >  include/linux/mdev.h                          | 19 +++++
> > > > >  10 files changed, 294 insertions(+), 17 deletions(-)
> > > > >
> > > > > --
> > > > > 2.24.0.rc0
> > > >
> > >
> > > --
> > > Open Source Technology Center, Intel ltd.
> > >
> > > $gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827
> 
> --
> Open Source Technology Center, Intel ltd.
> 
> $gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827


* RE: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-12-05  6:40         ` Jason Wang
@ 2019-12-05 19:02           ` Parav Pandit
  0 siblings, 0 replies; 32+ messages in thread
From: Parav Pandit @ 2019-12-05 19:02 UTC (permalink / raw)
  To: Jason Wang, Zhenyu Wang
  Cc: kvm, alex.williamson, kwankhede, kevin.tian, cohuck, Jiri Pirko,
	netdev, Michael S. Tsirkin

Hi Jason,

> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, December 5, 2019 12:41 AM
> 
> 
> On 2019/12/5 2:06 PM, Zhenyu Wang wrote:
> > On 2019.12.04 17:36:12 +0000, Parav Pandit wrote:
> >> + Jiri + Netdev since you mentioned netdev queue.
> >>
> >> + Jason Wang and Michael as we had similar discussion in vdpa discussion
> thread.
> >>
> >>> From: Zhenyu Wang <zhenyuw@linux.intel.com>
> >>> Sent: Friday, November 8, 2019 2:19 AM
> >>> To: Parav Pandit <parav@mellanox.com>
> >>>
> >> My apologies to reply late.
> >> Something bad with my email client, due to which I found this patch under
> spam folder today.
> >> More comments below.
> >>
> >>> On 2019.11.07 20:37:49 +0000, Parav Pandit wrote:
> >>>> Hi,
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org> On
> >>>>> Behalf Of Zhenyu Wang
> >>>>> Sent: Thursday, October 24, 2019 12:08 AM
> >>>>> To: kvm@vger.kernel.org
> >>>>> Cc: alex.williamson@redhat.com; kwankhede@nvidia.com;
> >>>>> kevin.tian@intel.com; cohuck@redhat.com
> >>>>> Subject: [PATCH 0/6] VFIO mdev aggregated resources handling
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> This is a refresh for previous send of this series. I got
> >>>>> impression that some SIOV drivers would still deploy their own
> >>>>> create and config method so stopped effort on this. But seems this
> >>>>> would still be useful for some other SIOV driver which may simply
> >>>>> want capability to aggregate resources. So here's refreshed series.
> >>>>>
> >>>>> Current mdev device create interface depends on fixed mdev type,
> >>>>> which get uuid from user to create instance of mdev device. If
> >>>>> user wants to use customized number of resource for mdev device,
> >>>>> then only can create new
> >>>> Can you please give an example of 'resource'?
> >>>> When I grep [1], [2] and [3], I couldn't find anything related to '
> aggregate'.
> >>> The resource is vendor device specific, in SIOV spec there's ADI
> >>> (Assignable Device Interface) definition which could be e.g queue
> >>> for net device, context for gpu, etc. I just named this interface as
> 'aggregate'
> >>> for aggregation purpose, it's not used in spec doc.
> >>>
> >> Some 'unknown/undefined' vendor specific resource just doesn't work.
> >> Orchestration tool doesn't know which resource and what/how to configure
> for which vendor.
> >> It has to be well defined.
> >>
> >> You can also find such discussion in recent lgpu DRM cgroup patches series
> v4.
> >>
> >> Exposing networking resource configuration in non-net namespace aware
> mdev sysfs at PCI device level is no-go.
> >> Adding per file NET_ADMIN or other checks is not the approach we follow in
> kernel.
> >>
> >> devlink has been a subsystem though under net, that has very rich interface
> for syscaller, device health, resource management and many more.
> >> Even though it is used by net driver today, its written for generic device
> management at bus/device level.
> >>
> >> Yuval has posted patches to manage PCI sub-devices [1] and updated version
> will be posted soon which addresses comments.
> >>
> >> For any device slice resource management of mdev, sub-function etc, we
> should be using single kernel interface as devlink [2], [3].
> >>
> >> [1]
> >> https://lore.kernel.org/netdev/1573229926-30040-1-git-send-email-yuva
> >> lav@mellanox.com/ [2]
> >> http://man7.org/linux/man-pages/man8/devlink-dev.8.html
> >> [3] http://man7.org/linux/man-pages/man8/devlink-resource.8.html
> >>
> >> Most modern device configuration that I am aware of is usually done via
> well defined ioctl() of the subsystem (vhost, virtio, vfio, rdma, nvme and more)
> or via netlink commands (net, devlink, rdma and more) not via sysfs.
> >>
> > Current vfio/mdev configuration is via documented sysfs ABI instead of
> > other ways. So this adhere to that way to introduce more configurable
> > method on mdev device for standard, it's optional and not actually
> > vendor specific e.g vfio-ap.
> >
> > I'm not sure how many devices support devlink now, or if really make
> > sense to utilize devlink for other devices except net, or if really
> > make sense to take mdev resource configuration from there...
> 
> 
> It may make sense to allow other types of API to manage mdev other than
> sysfs. But I'm not sure whether or not it will be a challenge for orchestration.
> 

There are two parts.
1. How you specify the resource config (sysfs/netlink/devlink/ioctl, etc.)
2. The definition of the resource itself. It has to be well defined, or it should be categorized as miscellaneous.
It cannot be some undefined/vague name such as 'aggregate'.
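For illustration, the devlink resource interface referenced earlier in this thread manages well-defined, named device resources roughly as follows. This is a sketch only: the device name and the resource path are illustrative examples (the `/kvd/linear` path comes from the devlink-resource(8) man page), not something exposed by any driver discussed here.

```shell
# Inspect the resource tree the device exposes (see devlink-resource(8)):
devlink resource show pci/0000:03:00.0

# Resize one well-defined resource by its path in that tree:
devlink resource set pci/0000:03:00.0 path /kvd/linear size 98304

# Reload the driver instance so the new size takes effect:
devlink dev reload pci/0000:03:00.0
```

Each knob here is a named, documented resource rather than a generic multiplier, which is the distinction being argued for.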
 
> Thanks
> 
> 
> >>>>> mdev type for that which may not be flexible. This requirement
> >>>>> comes not only from to be able to allocate flexible resources for
> >>>>> KVMGT, but also from Intel scalable IO virtualization which would
> >>>>> use vfio/mdev to be able to allocate arbitrary resources on mdev
> instance.
> >>> More info on [1] [2] [3].
> >>>>> To allow to create user defined resources for mdev, it trys to
> >>>>> extend mdev create interface by adding new "aggregate=xxx"
> >>>>> parameter following UUID, for target mdev type if aggregation is
> >>>>> supported, it can create new mdev device which contains resources
> >>>>> combined by number of instances, e.g
> >>>>>
> >>>>>      echo "<uuid>,aggregate=10" > create
> >>>>>
> >>>>> VM manager e.g libvirt can check mdev type with "aggregation"
> >>>>> attribute which can support this setting. If no "aggregation"
> >>>>> attribute found for mdev type, previous behavior is still kept for
> >>>>> one instance allocation. And new sysfs attribute
> >>>>> "aggregated_instances" is created for each mdev device to show
> >>>>> allocated
> >>> number.
> >>>>> References:
> >>>>> [1]
> >>>>> https://software.intel.com/en-us/download/intel-virtualization-tec
> >>>>> hn
> >>>>> ology- for-directed-io-architecture-specification
> >>>>> [2]
> >>>>> https://software.intel.com/en-us/download/intel-scalable-io-virtua
> >>>>> li
> >>>>> zation-
> >>>>> technical-specification
> >>>>> [3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
> >>>>>
> >>>>> Zhenyu Wang (6):
> >>>>>    vfio/mdev: Add new "aggregate" parameter for mdev create
> >>>>>    vfio/mdev: Add "aggregation" attribute for supported mdev type
> >>>>>    vfio/mdev: Add "aggregated_instances" attribute for supported mdev
> >>>>>      device
> >>>>>    Documentation/driver-api/vfio-mediated-device.rst: Update for
> >>>>>      vfio/mdev aggregation support
> >>>>>    Documentation/ABI/testing/sysfs-bus-vfio-mdev: Update for vfio/mdev
> >>>>>      aggregation support
> >>>>>    drm/i915/gvt: Add new type with aggregation support
> >>>>>
> >>>>>   Documentation/ABI/testing/sysfs-bus-vfio-mdev | 24 ++++++
> >>>>>   .../driver-api/vfio-mediated-device.rst       | 23 ++++++
> >>>>>   drivers/gpu/drm/i915/gvt/gvt.c                |  4 +-
> >>>>>   drivers/gpu/drm/i915/gvt/gvt.h                | 11 ++-
> >>>>>   drivers/gpu/drm/i915/gvt/kvmgt.c              | 53 ++++++++++++-
> >>>>>   drivers/gpu/drm/i915/gvt/vgpu.c               | 56 ++++++++++++-
> >>>>>   drivers/vfio/mdev/mdev_core.c                 | 36 ++++++++-
> >>>>>   drivers/vfio/mdev/mdev_private.h              |  6 +-
> >>>>>   drivers/vfio/mdev/mdev_sysfs.c                | 79 ++++++++++++++++++-
> >>>>>   include/linux/mdev.h                          | 19 +++++
> >>>>>   10 files changed, 294 insertions(+), 17 deletions(-)
> >>>>>
> >>>>> --
> >>>>> 2.24.0.rc0
> >>> --
> >>> Open Source Technology Center, Intel ltd.
> >>>
> >>> $gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827



* Re: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-12-05 18:59         ` Parav Pandit
@ 2019-12-06  8:03           ` Zhenyu Wang
  2019-12-06 17:33             ` Parav Pandit
  0 siblings, 1 reply; 32+ messages in thread
From: Zhenyu Wang @ 2019-12-06  8:03 UTC (permalink / raw)
  To: Parav Pandit
  Cc: kvm, alex.williamson, kwankhede, kevin.tian, cohuck, Jiri Pirko,
	netdev, Jason Wang, Michael S. Tsirkin


On 2019.12.05 18:59:36 +0000, Parav Pandit wrote:
> > >
> > > > On 2019.11.07 20:37:49 +0000, Parav Pandit wrote:
> > > > > Hi,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org> On
> > > > > > Behalf Of Zhenyu Wang
> > > > > > Sent: Thursday, October 24, 2019 12:08 AM
> > > > > > To: kvm@vger.kernel.org
> > > > > > Cc: alex.williamson@redhat.com; kwankhede@nvidia.com;
> > > > > > kevin.tian@intel.com; cohuck@redhat.com
> > > > > > Subject: [PATCH 0/6] VFIO mdev aggregated resources handling
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > This is a refresh for previous send of this series. I got
> > > > > > impression that some SIOV drivers would still deploy their own
> > > > > > create and config method so stopped effort on this. But seems
> > > > > > this would still be useful for some other SIOV driver which may
> > > > > > simply want capability to aggregate resources. So here's refreshed
> > series.
> > > > > >
> > > > > > Current mdev device create interface depends on fixed mdev type,
> > > > > > which get uuid from user to create instance of mdev device. If
> > > > > > user wants to use customized number of resource for mdev device,
> > > > > > then only can create new
> > > > > Can you please give an example of 'resource'?
> > > > > When I grep [1], [2] and [3], I couldn't find anything related to '
> > aggregate'.
> > > >
> > > > The resource is vendor device specific, in SIOV spec there's ADI
> > > > (Assignable Device Interface) definition which could be e.g queue
> > > > for net device, context for gpu, etc. I just named this interface as
> > 'aggregate'
> > > > for aggregation purpose, it's not used in spec doc.
> > > >
> > >
> > > Some 'unknown/undefined' vendor specific resource just doesn't work.
> > > Orchestration tool doesn't know which resource and what/how to configure
> > for which vendor.
> > > It has to be well defined.
> > >
> > > You can also find such discussion in recent lgpu DRM cgroup patches series
> > v4.
> > >
> > > Exposing networking resource configuration in non-net namespace aware
> > mdev sysfs at PCI device level is no-go.
> > > Adding per file NET_ADMIN or other checks is not the approach we follow in
> > kernel.
> > >
> > > devlink has been a subsystem though under net, that has very rich interface
> > for syscaller, device health, resource management and many more.
> > > Even though it is used by net driver today, its written for generic device
> > management at bus/device level.
> > >
> > > Yuval has posted patches to manage PCI sub-devices [1] and updated version
> > will be posted soon which addresses comments.
> > >
> > > For any device slice resource management of mdev, sub-function etc, we
> > should be using single kernel interface as devlink [2], [3].
> > >
> > > [1]
> > > https://lore.kernel.org/netdev/1573229926-30040-1-git-send-email-yuval
> > > av@mellanox.com/ [2]
> > > http://man7.org/linux/man-pages/man8/devlink-dev.8.html
> > > [3] http://man7.org/linux/man-pages/man8/devlink-resource.8.html
> > >
> > > Most modern device configuration that I am aware of is usually done via well
> > defined ioctl() of the subsystem (vhost, virtio, vfio, rdma, nvme and more) or
> > via netlink commands (net, devlink, rdma and more) not via sysfs.
> > >
> > 
> > Current vfio/mdev configuration is via documented sysfs ABI instead of other
> > ways. So this adhere to that way to introduce more configurable method on
> > mdev device for standard, it's optional and not actually vendor specific e.g vfio-
> > ap.
> > 
> Some unknown/undefined resource as 'aggregate' is just not an ABI.
> It has to be well defined, as 'hardware_address', 'num_netdev_sqs' or something similar appropriate to that mdev device class.
> If user wants to set a parameter for a mdev regardless of vendor, they must have single way to do so.

The idea is not specific to some device class, but applies to each mdev
type's resource, and is optional for each vendor. If a more device-class
specific way is preferred, then we might end up with very different ways
for different vendors. Better to avoid that, so the point here is to
aggregate a number of one mdev type's resources for the target instance,
instead of defining new mdev types for each possible number of resources.
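The flow proposed in the cover letter can be sketched as below. The parent device path and type name are illustrative (a KVMGT-style example); only the `create` syntax, the `aggregation` type attribute, and the per-device `aggregated_instances` attribute come from the series itself.

```shell
# Pick a supported type under the parent device (example path/type):
cd /sys/class/mdev_bus/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4

# A type that supports aggregation exposes an "aggregation" attribute;
# its absence means only single-instance creation is supported:
cat aggregation

# Create one mdev combining 10 instances' worth of this type's resources:
UUID=$(uuidgen)
echo "${UUID},aggregate=10" > create

# Each mdev device then reports its allocated instance count:
cat /sys/bus/mdev/devices/${UUID}/aggregated_instances
```

A manager such as libvirt would probe for the `aggregation` attribute first and fall back to the plain `echo "<uuid>" > create` form when it is absent.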

> 
> > I'm not sure how many devices support devlink now, or if really make sense to
> > utilize devlink for other devices except net, or if really make sense to take
> > mdev resource configuration from there...
> > 
> This is about adding new knobs not the existing one.
> It has to be well defined. 'aggregate' is not the word that describes it.
> If this is something very device specific, it should be prefixed with 'misc_' something.. or it should be misc_X ioctl().
> Miscellaneous not so well defined class of devices are usually registered using misc_register().
> Similarly attributes has to be well defined, otherwise, it should fall under misc category specially when you are pointing to 3 well defined specifications.
>

Any suggestions for naming it?

> > > >
> > > > >
> > > > > > mdev type for that which may not be flexible. This requirement
> > > > > > comes not only from to be able to allocate flexible resources
> > > > > > for KVMGT, but also from Intel scalable IO virtualization which
> > > > > > would use vfio/mdev to be able to allocate arbitrary resources on mdev
> > instance.
> > > > More info on [1] [2] [3].
> > > > > >
> > > > > > To allow to create user defined resources for mdev, it trys to
> > > > > > extend mdev create interface by adding new "aggregate=xxx"
> > > > > > parameter following UUID, for target mdev type if aggregation is
> > > > > > supported, it can create new mdev device which contains
> > > > > > resources combined by number of instances, e.g
> > > > > >
> > > > > >     echo "<uuid>,aggregate=10" > create
> > > > > >
> > > > > > VM manager e.g libvirt can check mdev type with "aggregation"
> > > > > > attribute which can support this setting. If no "aggregation"
> > > > > > attribute found for mdev type, previous behavior is still kept
> > > > > > for one instance allocation. And new sysfs attribute
> > > > > > "aggregated_instances" is created for each mdev device to show
> > > > > > allocated
> > > > number.
> > > > > >
> > > > > > References:
> > > > > > [1]
> > > > > > https://software.intel.com/en-us/download/intel-virtualization-t
> > > > > > echn
> > > > > > ology- for-directed-io-architecture-specification
> > > > > > [2]
> > > > > > https://software.intel.com/en-us/download/intel-scalable-io-virt
> > > > > > uali
> > > > > > zation-
> > > > > > technical-specification
> > > > > > [3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
> > > > > >
> > > > > > Zhenyu Wang (6):
> > > > > >   vfio/mdev: Add new "aggregate" parameter for mdev create
> > > > > >   vfio/mdev: Add "aggregation" attribute for supported mdev type
> > > > > >   vfio/mdev: Add "aggregated_instances" attribute for supported mdev
> > > > > >     device
> > > > > >   Documentation/driver-api/vfio-mediated-device.rst: Update for
> > > > > >     vfio/mdev aggregation support
> > > > > >   Documentation/ABI/testing/sysfs-bus-vfio-mdev: Update for vfio/mdev
> > > > > >     aggregation support
> > > > > >   drm/i915/gvt: Add new type with aggregation support
> > > > > >
> > > > > >  Documentation/ABI/testing/sysfs-bus-vfio-mdev | 24 ++++++
> > > > > >  .../driver-api/vfio-mediated-device.rst       | 23 ++++++
> > > > > >  drivers/gpu/drm/i915/gvt/gvt.c                |  4 +-
> > > > > >  drivers/gpu/drm/i915/gvt/gvt.h                | 11 ++-
> > > > > >  drivers/gpu/drm/i915/gvt/kvmgt.c              | 53 ++++++++++++-
> > > > > >  drivers/gpu/drm/i915/gvt/vgpu.c               | 56 ++++++++++++-
> > > > > >  drivers/vfio/mdev/mdev_core.c                 | 36 ++++++++-
> > > > > >  drivers/vfio/mdev/mdev_private.h              |  6 +-
> > > > > >  drivers/vfio/mdev/mdev_sysfs.c                | 79 ++++++++++++++++++-
> > > > > >  include/linux/mdev.h                          | 19 +++++
> > > > > >  10 files changed, 294 insertions(+), 17 deletions(-)
> > > > > >
> > > > > > --
> > > > > > 2.24.0.rc0
> > > > >
> > > >
> > > > --
> > > > Open Source Technology Center, Intel ltd.
> > > >
> > > > $gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827
> > 
> > --
> > Open Source Technology Center, Intel ltd.
> > 
> > $gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827

-- 
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827



* Re: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-12-06  8:03           ` Zhenyu Wang
@ 2019-12-06 17:33             ` Parav Pandit
  2019-12-10  3:33               ` Tian, Kevin
  0 siblings, 1 reply; 32+ messages in thread
From: Parav Pandit @ 2019-12-06 17:33 UTC (permalink / raw)
  To: Zhenyu Wang
  Cc: kvm, alex.williamson, kwankhede, kevin.tian, cohuck, Jiri Pirko,
	netdev, Jason Wang, Michael S. Tsirkin

On 12/6/2019 2:03 AM, Zhenyu Wang wrote:
> On 2019.12.05 18:59:36 +0000, Parav Pandit wrote:
>>>>
>>>>> On 2019.11.07 20:37:49 +0000, Parav Pandit wrote:
>>>>>> Hi,
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org> On
>>>>>>> Behalf Of Zhenyu Wang
>>>>>>> Sent: Thursday, October 24, 2019 12:08 AM
>>>>>>> To: kvm@vger.kernel.org
>>>>>>> Cc: alex.williamson@redhat.com; kwankhede@nvidia.com;
>>>>>>> kevin.tian@intel.com; cohuck@redhat.com
>>>>>>> Subject: [PATCH 0/6] VFIO mdev aggregated resources handling
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> This is a refresh for previous send of this series. I got
>>>>>>> impression that some SIOV drivers would still deploy their own
>>>>>>> create and config method so stopped effort on this. But seems
>>>>>>> this would still be useful for some other SIOV driver which may
>>>>>>> simply want capability to aggregate resources. So here's refreshed
>>> series.
>>>>>>>
>>>>>>> Current mdev device create interface depends on fixed mdev type,
>>>>>>> which get uuid from user to create instance of mdev device. If
>>>>>>> user wants to use customized number of resource for mdev device,
>>>>>>> then only can create new
>>>>>> Can you please give an example of 'resource'?
>>>>>> When I grep [1], [2] and [3], I couldn't find anything related to '
>>> aggregate'.
>>>>>
>>>>> The resource is vendor device specific, in SIOV spec there's ADI
>>>>> (Assignable Device Interface) definition which could be e.g queue
>>>>> for net device, context for gpu, etc. I just named this interface as
>>> 'aggregate'
>>>>> for aggregation purpose, it's not used in spec doc.
>>>>>
>>>>
>>>> Some 'unknown/undefined' vendor specific resource just doesn't work.
>>>> Orchestration tool doesn't know which resource and what/how to configure
>>> for which vendor.
>>>> It has to be well defined.
>>>>
>>>> You can also find such discussion in recent lgpu DRM cgroup patches series
>>> v4.
>>>>
>>>> Exposing networking resource configuration in non-net namespace aware
>>> mdev sysfs at PCI device level is no-go.
>>>> Adding per file NET_ADMIN or other checks is not the approach we follow in
>>> kernel.
>>>>
>>>> devlink has been a subsystem though under net, that has very rich interface
>>> for syscaller, device health, resource management and many more.
>>>> Even though it is used by net driver today, its written for generic device
>>> management at bus/device level.
>>>>
>>>> Yuval has posted patches to manage PCI sub-devices [1] and updated version
>>> will be posted soon which addresses comments.
>>>>
>>>> For any device slice resource management of mdev, sub-function etc, we
>>> should be using single kernel interface as devlink [2], [3].
>>>>
>>>> [1]
>>>> https://lore.kernel.org/netdev/1573229926-30040-1-git-send-email-yuval
>>>> av@mellanox.com/ [2]
>>>> http://man7.org/linux/man-pages/man8/devlink-dev.8.html
>>>> [3] http://man7.org/linux/man-pages/man8/devlink-resource.8.html
>>>>
>>>> Most modern device configuration that I am aware of is usually done via well
>>> defined ioctl() of the subsystem (vhost, virtio, vfio, rdma, nvme and more) or
>>> via netlink commands (net, devlink, rdma and more) not via sysfs.
>>>>
>>>
>>> Current vfio/mdev configuration is via documented sysfs ABI instead of other
>>> ways. So this adhere to that way to introduce more configurable method on
>>> mdev device for standard, it's optional and not actually vendor specific e.g vfio-
>>> ap.
>>>
>> Some unknown/undefined resource as 'aggregate' is just not an ABI.
>> It has to be well defined, as 'hardware_address', 'num_netdev_sqs' or something similar appropriate to that mdev device class.
>> If user wants to set a parameter for a mdev regardless of vendor, they must have single way to do so.
> 
> The idea is not specific for some device class, but for each mdev
> type's resource, and be optional for each vendor. If more device class
> specific way is preferred, then we might have very different ways for
> different vendors. Better to avoid that, so here means to aggregate
> number of mdev type's resources for target instance, instead of defining
> kinds of mdev types for those number of resources.
> 
A parameter or attribute can certainly be optional.
But the way to aggregate them should not be vendor specific.
Look at some excellent existing examples across subsystems: for example,
how you create an aggregated netdev or block device does not depend on
the vendor or the underlying device type.

>>
>>> I'm not sure how many devices support devlink now, or if really make sense to
>>> utilize devlink for other devices except net, or if really make sense to take
>>> mdev resource configuration from there...
>>>
>> This is about adding new knobs not the existing one.
>> It has to be well defined. 'aggregate' is not the word that describes it.
>> If this is something very device specific, it should be prefixed with 'misc_' something.. or it should be misc_X ioctl().
>> Miscellaneous not so well defined class of devices are usually registered using misc_register().
>> Similarly attributes has to be well defined, otherwise, it should fall under misc category specially when you are pointing to 3 well defined specifications.
>>
> 
> Any suggestion for naming it?

If a parameter is miscellaneous, please prefix it with misc_ in the mdev
ioctl() or in sysfs.
If the parameter/attribute is max_netdev_txqs for a netdev, name it that;
if it's max_dedicated_wqs of some dsa device, name it that way.
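Concretely, the class-specific naming suggested above would look something like the following in the existing create interface. Both attribute names are hypothetical examples from this thread; neither exists in the posted series.

```shell
UUID=$(uuidgen)

# Hypothetical netdev-class mdev: name the well-defined resource directly
echo "${UUID},max_netdev_txqs=8" > create

# Hypothetical dsa-class mdev: likewise a class-specific attribute name
echo "${UUID},max_dedicated_wqs=4" > create
```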


* RE: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-12-06 17:33             ` Parav Pandit
@ 2019-12-10  3:33               ` Tian, Kevin
  2019-12-10 19:07                 ` Alex Williamson
  0 siblings, 1 reply; 32+ messages in thread
From: Tian, Kevin @ 2019-12-10  3:33 UTC (permalink / raw)
  To: Parav Pandit, Zhenyu Wang
  Cc: kvm, alex.williamson, kwankhede, cohuck, Jiri Pirko, netdev,
	Jason Wang, Michael S. Tsirkin

> From: Parav Pandit <parav@mellanox.com>
> Sent: Saturday, December 7, 2019 1:34 AM
> 
> On 12/6/2019 2:03 AM, Zhenyu Wang wrote:
> > On 2019.12.05 18:59:36 +0000, Parav Pandit wrote:
> >>>>
> >>>>> On 2019.11.07 20:37:49 +0000, Parav Pandit wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org>
> On
> >>>>>>> Behalf Of Zhenyu Wang
> >>>>>>> Sent: Thursday, October 24, 2019 12:08 AM
> >>>>>>> To: kvm@vger.kernel.org
> >>>>>>> Cc: alex.williamson@redhat.com; kwankhede@nvidia.com;
> >>>>>>> kevin.tian@intel.com; cohuck@redhat.com
> >>>>>>> Subject: [PATCH 0/6] VFIO mdev aggregated resources handling
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> This is a refresh for previous send of this series. I got
> >>>>>>> impression that some SIOV drivers would still deploy their own
> >>>>>>> create and config method so stopped effort on this. But seems
> >>>>>>> this would still be useful for some other SIOV driver which may
> >>>>>>> simply want capability to aggregate resources. So here's refreshed
> >>> series.
> >>>>>>>
> >>>>>>> Current mdev device create interface depends on fixed mdev type,
> >>>>>>> which get uuid from user to create instance of mdev device. If
> >>>>>>> user wants to use customized number of resource for mdev device,
> >>>>>>> then only can create new
> >>>>>> Can you please give an example of 'resource'?
> >>>>>> When I grep [1], [2] and [3], I couldn't find anything related to '
> >>> aggregate'.
> >>>>>
> >>>>> The resource is vendor device specific, in SIOV spec there's ADI
> >>>>> (Assignable Device Interface) definition which could be e.g queue
> >>>>> for net device, context for gpu, etc. I just named this interface as
> >>> 'aggregate'
> >>>>> for aggregation purpose, it's not used in spec doc.
> >>>>>
> >>>>
> >>>> Some 'unknown/undefined' vendor specific resource just doesn't work.
> >>>> Orchestration tool doesn't know which resource and what/how to
> configure
> >>> for which vendor.
> >>>> It has to be well defined.
> >>>>
> >>>> You can also find such discussion in recent lgpu DRM cgroup patches
> series
> >>> v4.
> >>>>
> >>>> Exposing networking resource configuration in non-net namespace
> aware
> >>> mdev sysfs at PCI device level is no-go.
> >>>> Adding per file NET_ADMIN or other checks is not the approach we
> follow in
> >>> kernel.
> >>>>
> >>>> devlink has been a subsystem though under net, that has very rich
> interface
> >>> for syscaller, device health, resource management and many more.
> >>>> Even though it is used by net driver today, its written for generic device
> >>> management at bus/device level.
> >>>>
> >>>> Yuval has posted patches to manage PCI sub-devices [1] and updated
> version
> >>> will be posted soon which addresses comments.
> >>>>
> >>>> For any device slice resource management of mdev, sub-function etc,
> we
> >>> should be using single kernel interface as devlink [2], [3].
> >>>>
> >>>> [1]
> >>>> https://lore.kernel.org/netdev/1573229926-30040-1-git-send-email-
> yuval
> >>>> av@mellanox.com/ [2]
> >>>> http://man7.org/linux/man-pages/man8/devlink-dev.8.html
> >>>> [3] http://man7.org/linux/man-pages/man8/devlink-resource.8.html
> >>>>
> >>>> Most modern device configuration that I am aware of is usually done
> via well
> >>> defined ioctl() of the subsystem (vhost, virtio, vfio, rdma, nvme and
> more) or
> >>> via netlink commands (net, devlink, rdma and more) not via sysfs.
> >>>>
> >>>
> >>> Current vfio/mdev configuration is via documented sysfs ABI instead of
> other
> >>> ways. So this adhere to that way to introduce more configurable method
> on
> >>> mdev device for standard, it's optional and not actually vendor specific
> e.g vfio-
> >>> ap.
> >>>
> >> Some unknown/undefined resource as 'aggregate' is just not an ABI.
> >> It has to be well defined, as 'hardware_address', 'num_netdev_sqs' or
> something similar appropriate to that mdev device class.
> >> If user wants to set a parameter for a mdev regardless of vendor, they
> must have single way to do so.
> >
> > The idea is not specific for some device class, but for each mdev
> > type's resource, and be optional for each vendor. If more device class
> > specific way is preferred, then we might have very different ways for
> > different vendors. Better to avoid that, so here means to aggregate
> > number of mdev type's resources for target instance, instead of defining
> > kinds of mdev types for those number of resources.
> >
> Parameter or attribute certainly can be optional.
> But the way to aggregate them should not be vendor specific.
> Look for some excellent existing examples across subsystems, for example
> how you create aggregated netdev or block device is not depend on vendor
> or underlying device type.

I'd like to hear Alex's opinion on this. Today VFIO mdev supports two styles
of "types" imo: fixed resource definition (most cases) and dynamic resource
definition (vfio-ap). In the fixed style, a type has a fixed association to a
set of vendor-specific resources (resourceX=M, resourceY=N, ...). In the
dynamic case, the user is allowed to specify the actual resources X/Y/...
backing the mdev instance after its creation. In either case, the way to
identify such an association or the configurable knobs is vendor specific,
maybe contained in optional attributes (name and description) plus additional
info in vendor documents.

Then the user is assumed to clearly understand the implication of the resource
allocation under a given type, when creating a new mdev under this type.

If this assumption holds true, the aggregated attribute simply provides an
extension in the same direction as fixed-style types, but allows for more
flexible, linearly increasing resource allocation. E.g. when using aggregate=2,
it means creating an instance with resourceX=2M, resourceY=2N, ... under
the specified type. Along this direction I don't see the need for well-defined
vendor-specific attributes here. When those are actually required, I suppose
the dynamic style would be a better fit. Or, if the vendor driver thinks
implementing such an aggregate feature would confuse its type definition, it
is free not to do so anyway.
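As a concrete illustration of the proposed `echo "<uuid>,aggregate=N" > create`
interface from the cover letter, the option parsing could look roughly like the
following userspace C sketch (the function name, signature, and error handling
are invented here, not taken from the series):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of parsing the "<uuid>,aggregate=N" string written to the
 * "create" sysfs attribute.  A plain UUID keeps the old one-instance
 * behavior; the name parse_mdev_create is illustrative only. */
static int parse_mdev_create(const char *buf, char *uuid, size_t len,
			     unsigned long *aggregate)
{
	const char *comma = strchr(buf, ',');

	*aggregate = 1;		/* default: previous one-instance behavior */
	if (!comma) {
		snprintf(uuid, len, "%s", buf);
		return 0;
	}
	snprintf(uuid, len, "%.*s", (int)(comma - buf), buf);
	if (strncmp(comma + 1, "aggregate=", strlen("aggregate=")))
		return -1;	/* unknown option */
	*aggregate = strtoul(comma + 1 + strlen("aggregate="), NULL, 10);
	return *aggregate >= 1 ? 0 : -1;
}
```

The point is only that the optional suffix scales the allocation while the
plain-UUID form remains backward compatible.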

Thanks
Kevin


* Re: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-12-10  3:33               ` Tian, Kevin
@ 2019-12-10 19:07                 ` Alex Williamson
  2019-12-10 21:08                   ` Parav Pandit
  0 siblings, 1 reply; 32+ messages in thread
From: Alex Williamson @ 2019-12-10 19:07 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Parav Pandit, Zhenyu Wang, kvm, kwankhede, cohuck, Jiri Pirko,
	netdev, Jason Wang, Michael S. Tsirkin

On Tue, 10 Dec 2019 03:33:23 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Parav Pandit <parav@mellanox.com>
> > Sent: Saturday, December 7, 2019 1:34 AM
> > 
> > On 12/6/2019 2:03 AM, Zhenyu Wang wrote:  
> > > On 2019.12.05 18:59:36 +0000, Parav Pandit wrote:  
> > >>>>  
> > >>>>> On 2019.11.07 20:37:49 +0000, Parav Pandit wrote:  
> > >>>>>> Hi,
> > >>>>>>  
> > >>>>>>> -----Original Message-----
> > >>>>>>> From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org>  
> > On  
> > >>>>>>> Behalf Of Zhenyu Wang
> > >>>>>>> Sent: Thursday, October 24, 2019 12:08 AM
> > >>>>>>> To: kvm@vger.kernel.org
> > >>>>>>> Cc: alex.williamson@redhat.com; kwankhede@nvidia.com;
> > >>>>>>> kevin.tian@intel.com; cohuck@redhat.com
> > >>>>>>> Subject: [PATCH 0/6] VFIO mdev aggregated resources handling
> > >>>>>>>
> > >>>>>>> Hi,
> > >>>>>>>
> > >>>>>>> This is a refresh for previous send of this series. I got
> > >>>>>>> impression that some SIOV drivers would still deploy their own
> > >>>>>>> create and config method so stopped effort on this. But seems
> > >>>>>>> this would still be useful for some other SIOV driver which may
> > >>>>>>> simply want capability to aggregate resources. So here's refreshed  
> > >>> series.  
> > >>>>>>>
> > >>>>>>> Current mdev device create interface depends on fixed mdev type,
> > >>>>>>> which get uuid from user to create instance of mdev device. If
> > >>>>>>> user wants to use customized number of resource for mdev device,
> > >>>>>>> then only can create new  
> > >>>>>> Can you please give an example of 'resource'?
> > >>>>>> When I grep [1], [2] and [3], I couldn't find anything related to '  
> > >>> aggregate'.  
> > >>>>>
> > >>>>> The resource is vendor device specific, in SIOV spec there's ADI
> > >>>>> (Assignable Device Interface) definition which could be e.g queue
> > >>>>> for net device, context for gpu, etc. I just named this interface as  
> > >>> 'aggregate'  
> > >>>>> for aggregation purpose, it's not used in spec doc.
> > >>>>>  
> > >>>>
> > >>>> Some 'unknown/undefined' vendor specific resource just doesn't work.
> > >>>> Orchestration tool doesn't know which resource and what/how to  
> > configure  
> > >>> for which vendor.  
> > >>>> It has to be well defined.
> > >>>>
> > >>>> You can also find such discussion in recent lgpu DRM cgroup patches  
> > series  
> > >>> v4.  
> > >>>>
> > >>>> Exposing networking resource configuration in non-net namespace  
> > aware  
> > >>> mdev sysfs at PCI device level is no-go.  
> > >>>> Adding per file NET_ADMIN or other checks is not the approach we  
> > follow in  
> > >>> kernel.  
> > >>>>
> > >>>> devlink has been a subsystem though under net, that has very rich  
> > interface  
> > >>> for syscaller, device health, resource management and many more.  
> > >>>> Even though it is used by net driver today, its written for generic device  
> > >>> management at bus/device level.  
> > >>>>
> > >>>> Yuval has posted patches to manage PCI sub-devices [1] and updated  
> > version  
> > >>> will be posted soon which addresses comments.  

Always good to see tools that intend to manage arbitrary devices posted
only to the netdev list :-\

> > >>>>
> > >>>> For any device slice resource management of mdev, sub-function etc,  
> > we  
> > >>> should be using single kernel interface as devlink [2], [3].  

This seems impractical; mdevs and SR-IOV are both enumerated,
inspected, created, and removed in sysfs, so where do we define which
features are manipulated via sysfs versus devlink?  mdevs, by
definition, are vendor defined "chunks" of a thing.  We allow vendor
drivers to define different types, representing different
configurations of these chunks.  Often these different types are
incrementally bigger or smaller chunks of these things, but defining
what bigger and smaller means generically across vendors is an
impossible task.  Orchestration tools already need to know vendor
specific information in terms of what type of mdev device they want to
create and make use of.  The aggregation seems to simply augment that
vendor information, ie. 'type' and 'scale' are separate rather than
combined only behind just 'type'.

> > >>>>
> > >>>> [1]
> > >>>> https://lore.kernel.org/netdev/1573229926-30040-1-git-send-email-  
> > yuval  
> > >>>> av@mellanox.com/ [2]
> > >>>> http://man7.org/linux/man-pages/man8/devlink-dev.8.html
> > >>>> [3] http://man7.org/linux/man-pages/man8/devlink-resource.8.html
> > >>>>
> > >>>> Most modern device configuration that I am aware of is usually done  
> > via well  
> > >>> defined ioctl() of the subsystem (vhost, virtio, vfio, rdma, nvme and  
> > more) or  
> > >>> via netlink commands (net, devlink, rdma and more) not via sysfs.  
> > >>>>  
> > >>>
> > >>> Current vfio/mdev configuration is via documented sysfs ABI instead of  
> > other  
> > >>> ways. So this adhere to that way to introduce more configurable method  
> > on  
> > >>> mdev device for standard, it's optional and not actually vendor specific  
> > e.g vfio-  
> > >>> ap.
> > >>>  
> > >> Some unknown/undefined resource as 'aggregate' is just not an ABI.
> > >> It has to be well defined, as 'hardware_address', 'num_netdev_sqs' or  
> > something similar appropriate to that mdev device class.  
> > >> If user wants to set a parameter for a mdev regardless of vendor, they  
> > must have single way to do so.

Aggregation augments type, which is by definition vendor specific.
  
> > >
> > > The idea is not specific for some device class, but for each mdev
> > > type's resource, and be optional for each vendor. If more device class
> > > specific way is preferred, then we might have very different ways for
> > > different vendors. Better to avoid that, so here means to aggregate
> > > number of mdev type's resources for target instance, instead of defining
> > > kinds of mdev types for those number of resources.
> > >  
> > Parameter or attribute certainly can be optional.
> > But the way to aggregate them should not be vendor specific.
> > Look for some excellent existing examples across subsystems, for example
> > how you create aggregated netdev or block device is not depend on vendor
> > or underlying device type.  
> 
> I'd like to hear Alex's opinion on this. Today VFIO mdev supports two styles
> of "types" imo: fixed resource definition (most cases) and dynamic resource 
> definition (vfio-ap). In fixed style, a type has fixed association to a set of 
> vendor specific resources (resourceX=M, resourceY=N, ...). In dynamic case, 
> the user is allowed to specify actual resource X/Y/... backing the mdev 
> instance post its creation. In either case, the way to identify such association 
> or configurable knobs is vendor specific, maybe contained in optional 
> attributes (name and description) plus additional info in vendor documents.
> 
> Then the user is assumed to clearly understand the implication of the resource
> allocation under a given type, when creating a new mdev under this type.
> 
> If this assumption holds true, the aggregated attribute simply provides an
> extension in the same direction of fixed-style types but allowing for more 
> flexible linearly-increasing resource allocation. e.g. when using aggregate=2, 
> it means creating a instance with resourceX=2M, resourceY=2N, ... under 
> the specified type. Along this direction I didn't see the need of well-defined 
> vendor specific attributes here. When those are actually required, I suppose 
> the dynamic style would better fit. Or if the vendor driver thinks implementing 
> such aggregate feature will confuse its type definition, it's optional to not 
> doing so anyway.

Yep, though I don't think we can even define that aggregate=2 indicates
that every resource is doubled; it's going to have vendor specific
meaning.  Maybe this is what Parav is rejecting, but I don't see an
alternative.  For example, an mdev vGPU might have high level resources
like the number of execution units, graphics memory, display heads,
maximum resolution, etc.  Aggregation could affect one or all of these.
Orchestration tools already need to know the vendor specific type of
device they want to create, so it doesn't seem unreasonable that if
they use aggregation that they choose a type that aggregates the
resource(s) they need, but that aggregation is going to be specific to
the type.  Potentially as we think about adding "defined" sysfs
attributes for devices we could start with
$SYSFS_DEV_PATH/mdev/aggregation/type, where the value written to type is a
vendor specific aggregation of that mdev type.  This allows us the
option that we might someday agree on specific resources that might be
aggregated in a common way (ex. ./aggregation/graphics_memory), but I'm
somewhat doubtful those would ever be pursued.  Thanks,

Alex
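
To make the vendor-specific meaning concrete, here is a hedged C sketch of one
possible vendor choice, where aggregation scales some resources of a
hypothetical vGPU type but leaves others fixed (all names and fields below are
made up for illustration, not from any driver):

```c
/* Hypothetical vGPU type: which resources aggregation scales is a
 * vendor decision, so aggregate=2 need not double everything. */
struct vgpu_type {
	const char *name;
	unsigned int eu_count;	/* execution units */
	unsigned int gmem_mb;	/* graphics memory, MB */
	unsigned int heads;	/* display heads */
};

static struct vgpu_type aggregate_vgpu(const struct vgpu_type *base,
				       unsigned int n)
{
	struct vgpu_type t = *base;

	t.eu_count *= n;	/* scales linearly */
	t.gmem_mb *= n;		/* scales linearly */
	/* t.heads deliberately unchanged: not every resource aggregates */
	return t;
}
```

Orchestration tools would still need the vendor documentation to know which
fields a given type actually scales.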



* Re: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-12-10 19:07                 ` Alex Williamson
@ 2019-12-10 21:08                   ` Parav Pandit
  2019-12-10 22:08                     ` Alex Williamson
  0 siblings, 1 reply; 32+ messages in thread
From: Parav Pandit @ 2019-12-10 21:08 UTC (permalink / raw)
  To: Alex Williamson, Tian, Kevin
  Cc: Zhenyu Wang, kvm, kwankhede, cohuck, Jiri Pirko, netdev,
	Jason Wang, Michael S. Tsirkin

On 12/10/2019 1:07 PM, Alex Williamson wrote:
> On Tue, 10 Dec 2019 03:33:23 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
>>> From: Parav Pandit <parav@mellanox.com>
>>> Sent: Saturday, December 7, 2019 1:34 AM
>>>
>>> On 12/6/2019 2:03 AM, Zhenyu Wang wrote:  
>>>> On 2019.12.05 18:59:36 +0000, Parav Pandit wrote:  
>>>>>>>  
>>>>>>>> On 2019.11.07 20:37:49 +0000, Parav Pandit wrote:  
>>>>>>>>> Hi,
>>>>>>>>>  
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org>  
>>> On  
>>>>>>>>>> Behalf Of Zhenyu Wang
>>>>>>>>>> Sent: Thursday, October 24, 2019 12:08 AM
>>>>>>>>>> To: kvm@vger.kernel.org
>>>>>>>>>> Cc: alex.williamson@redhat.com; kwankhede@nvidia.com;
>>>>>>>>>> kevin.tian@intel.com; cohuck@redhat.com
>>>>>>>>>> Subject: [PATCH 0/6] VFIO mdev aggregated resources handling
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> This is a refresh for previous send of this series. I got
>>>>>>>>>> impression that some SIOV drivers would still deploy their own
>>>>>>>>>> create and config method so stopped effort on this. But seems
>>>>>>>>>> this would still be useful for some other SIOV driver which may
>>>>>>>>>> simply want capability to aggregate resources. So here's refreshed  
>>>>>> series.  
>>>>>>>>>>
>>>>>>>>>> Current mdev device create interface depends on fixed mdev type,
>>>>>>>>>> which get uuid from user to create instance of mdev device. If
>>>>>>>>>> user wants to use customized number of resource for mdev device,
>>>>>>>>>> then only can create new  
>>>>>>>>> Can you please give an example of 'resource'?
>>>>>>>>> When I grep [1], [2] and [3], I couldn't find anything related to '  
>>>>>> aggregate'.  
>>>>>>>>
>>>>>>>> The resource is vendor device specific, in SIOV spec there's ADI
>>>>>>>> (Assignable Device Interface) definition which could be e.g queue
>>>>>>>> for net device, context for gpu, etc. I just named this interface as  
>>>>>> 'aggregate'  
>>>>>>>> for aggregation purpose, it's not used in spec doc.
>>>>>>>>  
>>>>>>>
>>>>>>> Some 'unknown/undefined' vendor specific resource just doesn't work.
>>>>>>> Orchestration tool doesn't know which resource and what/how to  
>>> configure  
>>>>>> for which vendor.  
>>>>>>> It has to be well defined.
>>>>>>>
>>>>>>> You can also find such discussion in recent lgpu DRM cgroup patches  
>>> series  
>>>>>> v4.  
>>>>>>>
>>>>>>> Exposing networking resource configuration in non-net namespace  
>>> aware  
>>>>>> mdev sysfs at PCI device level is no-go.  
>>>>>>> Adding per file NET_ADMIN or other checks is not the approach we  
>>> follow in  
>>>>>> kernel.  
>>>>>>>
>>>>>>> devlink has been a subsystem though under net, that has very rich  
>>> interface  
>>>>>> for syscaller, device health, resource management and many more.  
>>>>>>> Even though it is used by net driver today, its written for generic device  
>>>>>> management at bus/device level.  
>>>>>>>
>>>>>>> Yuval has posted patches to manage PCI sub-devices [1] and updated  
>>> version  
>>>>>> will be posted soon which addresses comments.  
> 
> Always good to see tools that intend to manage arbitrary devices posted
> only to the netdev list :-\
> 
>>>>>>>
>>>>>>> For any device slice resource management of mdev, sub-function etc,  
>>> we  
>>>>>> should be using single kernel interface as devlink [2], [3].  
> 
> This seems impractical, mdevs and SR-IOV are both enumerated,
> inspected, created, and removed in sysfs, 
Both are enumerated via sysfs, but VFs are not configured via sysfs.

> where do we define what
> features are manipulated vis sysfs versus devlink?

VFs are configured via a well-defined, vendor-neutral tool:
iproute2/ip link set <pf_netdev> vf <vf_index> <attribute> <value>

This has fallen short lately for a few cases, and non-networking or
generic VF property configuration is proposed to be handled by a similar
'VF' object using devlink, because these are either pure 'pci vf'
properties or more device-class-type VF properties such as MAC address,
number_of_queues, etc.

More advanced modes of networking VFs have been controlled using netdev
representors, again in a vendor-neutral way, for the last few years.

It may be fair to say that the mdev subsystem wants to invent new sysfs
files for configuration.

> mdevs, by
> definition, are vendor defined "chunks" of a thing.  We allow vendor
> drivers to define different types, representing different
> configurations of these chunks.  Often these different types are
> incrementally bigger or smaller chunks of these things, but defining
> what bigger and smaller means generically across vendors is an
> impossible task.  Orchestration tools already need to know vendor
> specific information in terms of what type of mdev device they want to
> create and make use of.  The aggregation seems to simply augment that
> vendor information, ie. 'type' and 'scale' are separate rather than
> combined only behind just 'type'.
> 
>>>>>>>
>>>>>>> [1]
>>>>>>> https://lore.kernel.org/netdev/1573229926-30040-1-git-send-email-  
>>> yuval  
>>>>>>> av@mellanox.com/ [2]
>>>>>>> http://man7.org/linux/man-pages/man8/devlink-dev.8.html
>>>>>>> [3] http://man7.org/linux/man-pages/man8/devlink-resource.8.html
>>>>>>>
>>>>>>> Most modern device configuration that I am aware of is usually done  
>>> via well  
>>>>>> defined ioctl() of the subsystem (vhost, virtio, vfio, rdma, nvme and  
>>> more) or  
>>>>>> via netlink commands (net, devlink, rdma and more) not via sysfs.  
>>>>>>>  
>>>>>>
>>>>>> Current vfio/mdev configuration is via documented sysfs ABI instead of  
>>> other  
>>>>>> ways. So this adhere to that way to introduce more configurable method  
>>> on  
>>>>>> mdev device for standard, it's optional and not actually vendor specific  
>>> e.g vfio-  
>>>>>> ap.
>>>>>>  
>>>>> Some unknown/undefined resource as 'aggregate' is just not an ABI.
>>>>> It has to be well defined, as 'hardware_address', 'num_netdev_sqs' or  
>>> something similar appropriate to that mdev device class.  
>>>>> If user wants to set a parameter for a mdev regardless of vendor, they  
>>> must have single way to do so.
> 
> Aggregation augments type, which is by definition vendor specific.
>   
>>>>
>>>> The idea is not specific for some device class, but for each mdev
>>>> type's resource, and be optional for each vendor. If more device class
>>>> specific way is preferred, then we might have very different ways for
>>>> different vendors. Better to avoid that, so here means to aggregate
>>>> number of mdev type's resources for target instance, instead of defining
>>>> kinds of mdev types for those number of resources.
>>>>  
>>> Parameter or attribute certainly can be optional.
>>> But the way to aggregate them should not be vendor specific.
>>> Look for some excellent existing examples across subsystems, for example
>>> how you create aggregated netdev or block device is not depend on vendor
>>> or underlying device type.  
>>
>> I'd like to hear Alex's opinion on this. Today VFIO mdev supports two styles
>> of "types" imo: fixed resource definition (most cases) and dynamic resource 
>> definition (vfio-ap). In fixed style, a type has fixed association to a set of 
>> vendor specific resources (resourceX=M, resourceY=N, ...). In dynamic case, 
>> the user is allowed to specify actual resource X/Y/... backing the mdev 
>> instance post its creation. In either case, the way to identify such association 
>> or configurable knobs is vendor specific, maybe contained in optional 
>> attributes (name and description) plus additional info in vendor documents.
>>
>> Then the user is assumed to clearly understand the implication of the resource
>> allocation under a given type, when creating a new mdev under this type.
>>
>> If this assumption holds true, the aggregated attribute simply provides an
>> extension in the same direction of fixed-style types but allowing for more 
>> flexible linearly-increasing resource allocation. e.g. when using aggregate=2, 
>> it means creating a instance with resourceX=2M, resourceY=2N, ... under 
>> the specified type. Along this direction I didn't see the need of well-defined 
>> vendor specific attributes here. When those are actually required, I suppose 
>> the dynamic style would better fit. Or if the vendor driver thinks implementing 
>> such aggregate feature will confuse its type definition, it's optional to not 
>> doing so anyway.
> 
> Yep, though I don't think we can even define that aggregate=2 indicates
> that every resources is doubled, it's going to have vendor specific
> meaning.  Maybe this is what Parav is rejecting, but I don't see an
> alternative.  For example, an mdev vGPU might have high level resources
> like the number of execution units, graphics memory, display heads,
> maximum resolution, etc.  Aggregation could affect one or all of these.
> Orchestration tools already need to know the vendor specific type of
> device they want to create, so it doesn't seem unreasonable that if
> they use aggregation that they choose a type that aggregates the
> resource(s) they need, but that aggregation is going to be specific to
> the type.  Potentially as we think about adding "defined" sysfs
> attributes for devices we could start with
> $SYSFS_DEV_PATH/mdev/aggregation/type, where value written to type is a
> vendor specific aggregation of that mdev type.  This allows us the
> option that we might someday agree on specific resources that might be
> aggregated in a common way (ex. ./aggregation/graphics_memory), but I'm
> somewhat doubtful those would ever be pursued.  Thanks,
> 

My point is, from Zhenyu Wang's example, it is certainly incorrect to
define mdev sysfs files such as:

vendor_foo_mdev.netdev_mac_addr=X
vendor_bar_mdev.resource_addr=Y

vendor_foo_mdev.netdev_queues=4
vendor_bar_mdev.aggregate=8

Unless these are miscellaneous (not well-defined) parameters of a vendor
device.

I am 100% sure that consumers of network devices, where a PCI PF is
sliced into multiple smaller devices, want to configure these devices
in a unified way regardless of vendor type.
That may not be the case with vGPU mdevs.

If Zhenyu Wang proposed to use a networking class of mdev device, the
attributes should have well-defined meanings, as it is a well-known
class in the Linux kernel.
mdev should provide an API to define such an mdev config object, and
all sysfs files for such an mdev should be created by the mdev core,
not by the vendor driver.


* Re: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-12-10 21:08                   ` Parav Pandit
@ 2019-12-10 22:08                     ` Alex Williamson
  2019-12-10 22:40                       ` Parav Pandit
  0 siblings, 1 reply; 32+ messages in thread
From: Alex Williamson @ 2019-12-10 22:08 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Tian, Kevin, Zhenyu Wang, kvm, kwankhede, cohuck, Jiri Pirko,
	netdev, Jason Wang, Michael S. Tsirkin

On Tue, 10 Dec 2019 21:08:29 +0000
Parav Pandit <parav@mellanox.com> wrote:

> On 12/10/2019 1:07 PM, Alex Williamson wrote:
> > On Tue, 10 Dec 2019 03:33:23 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >   
> >>> From: Parav Pandit <parav@mellanox.com>
> >>> Sent: Saturday, December 7, 2019 1:34 AM
> >>>
> >>> On 12/6/2019 2:03 AM, Zhenyu Wang wrote:    
> >>>> On 2019.12.05 18:59:36 +0000, Parav Pandit wrote:    
> >>>>>>>    
> >>>>>>>> On 2019.11.07 20:37:49 +0000, Parav Pandit wrote:    
> >>>>>>>>> Hi,
> >>>>>>>>>    
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org>    
> >>> On    
> >>>>>>>>>> Behalf Of Zhenyu Wang
> >>>>>>>>>> Sent: Thursday, October 24, 2019 12:08 AM
> >>>>>>>>>> To: kvm@vger.kernel.org
> >>>>>>>>>> Cc: alex.williamson@redhat.com; kwankhede@nvidia.com;
> >>>>>>>>>> kevin.tian@intel.com; cohuck@redhat.com
> >>>>>>>>>> Subject: [PATCH 0/6] VFIO mdev aggregated resources handling
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> This is a refresh for previous send of this series. I got
> >>>>>>>>>> impression that some SIOV drivers would still deploy their own
> >>>>>>>>>> create and config method so stopped effort on this. But seems
> >>>>>>>>>> this would still be useful for some other SIOV driver which may
> >>>>>>>>>> simply want capability to aggregate resources. So here's refreshed    
> >>>>>> series.    
> >>>>>>>>>>
> >>>>>>>>>> Current mdev device create interface depends on fixed mdev type,
> >>>>>>>>>> which get uuid from user to create instance of mdev device. If
> >>>>>>>>>> user wants to use customized number of resource for mdev device,
> >>>>>>>>>> then only can create new    
> >>>>>>>>> Can you please give an example of 'resource'?
> >>>>>>>>> When I grep [1], [2] and [3], I couldn't find anything related to '    
> >>>>>> aggregate'.    
> >>>>>>>>
> >>>>>>>> The resource is vendor device specific, in SIOV spec there's ADI
> >>>>>>>> (Assignable Device Interface) definition which could be e.g queue
> >>>>>>>> for net device, context for gpu, etc. I just named this interface as    
> >>>>>> 'aggregate'    
> >>>>>>>> for aggregation purpose, it's not used in spec doc.
> >>>>>>>>    
> >>>>>>>
> >>>>>>> Some 'unknown/undefined' vendor specific resource just doesn't work.
> >>>>>>> Orchestration tool doesn't know which resource and what/how to    
> >>> configure    
> >>>>>> for which vendor.    
> >>>>>>> It has to be well defined.
> >>>>>>>
> >>>>>>> You can also find such discussion in recent lgpu DRM cgroup patches    
> >>> series    
> >>>>>> v4.    
> >>>>>>>
> >>>>>>> Exposing networking resource configuration in non-net namespace    
> >>> aware    
> >>>>>> mdev sysfs at PCI device level is no-go.    
> >>>>>>> Adding per file NET_ADMIN or other checks is not the approach we    
> >>> follow in    
> >>>>>> kernel.    
> >>>>>>>
> >>>>>>> devlink has been a subsystem though under net, that has very rich    
> >>> interface    
> >>>>>> for syscaller, device health, resource management and many more.    
> >>>>>>> Even though it is used by net driver today, its written for generic device    
> >>>>>> management at bus/device level.    
> >>>>>>>
> >>>>>>> Yuval has posted patches to manage PCI sub-devices [1] and updated    
> >>> version    
> >>>>>> will be posted soon which addresses comments.    
> > 
> > Always good to see tools that intend to manage arbitrary devices posted
> > only to the netdev list :-\
> >   
> >>>>>>>
> >>>>>>> For any device slice resource management of mdev, sub-function etc,    
> >>> we    
> >>>>>> should be using single kernel interface as devlink [2], [3].    
> > 
> > This seems impractical, mdevs and SR-IOV are both enumerated,
> > inspected, created, and removed in sysfs,   
> Both enumerated via sysfs, but VFs are not configured via sysfs.
> 
> > where do we define what
> > features are manipulated vis sysfs versus devlink?  
> 
> VFs are configured via well defined, vendor neutral tool
> iproute2/ip link set <pf_netdev> vf <vf_index> <attribute> <value>
> 
> This falls short lately for few cases and non-networking or generic VF
> property configuration, are proposed to be handled by similar 'VF'
> object using devlink, because they are either pure 'pci vf' property or
> more device class type VF property such as MAC address or
> number_of_queues etc.
> 
> More advance mode of networking VFs, are controlled using netdev
> representors again in vendor neutral way for last few years.
> 
> It may be fair to say that mdev subsystem wants to invent new sysfs
> files for configuration.

It seems you're trying to apply rules for classes of devices where
configuration features are well defined to an environment where we
don't even have classes of devices, let alone agreed features.
 
>  mdevs, by
> > definition, are vendor defined "chunks" of a thing.  We allow vendor
> > drivers to define different types, representing different
> > configurations of these chunks.  Often these different types are
> > incrementally bigger or smaller chunks of these things, but defining
> > what bigger and smaller means generically across vendors is an
> > impossible task.  Orchestration tools already need to know vendor
> > specific information in terms of what type of mdev device they want to
> > create and make use of.  The aggregation seems to simply augment that
> > vendor information, ie. 'type' and 'scale' are separate rather than
> > combined only behind just 'type'.
> >   
> >>>>>>>
> >>>>>>> [1]
> >>>>>>> https://lore.kernel.org/netdev/1573229926-30040-1-git-send-email-    
> >>> yuval    
> >>>>>>> av@mellanox.com/ [2]
> >>>>>>> http://man7.org/linux/man-pages/man8/devlink-dev.8.html
> >>>>>>> [3] http://man7.org/linux/man-pages/man8/devlink-resource.8.html
> >>>>>>>
> >>>>>>> Most modern device configuration that I am aware of is usually done
> >>>>>>> via well defined ioctl() of the subsystem (vhost, virtio, vfio,
> >>>>>>> rdma, nvme and more) or via netlink commands (net, devlink, rdma
> >>>>>>> and more), not via sysfs.
> >>>>>>>    
> >>>>>>
> >>>>>> Current vfio/mdev configuration is via a documented sysfs ABI
> >>>>>> instead of other ways. So this adheres to that way to introduce a
> >>>>>> more configurable method on the mdev device as standard; it's
> >>>>>> optional and not actually vendor specific, e.g. vfio-ap.
> >>>>>>    
> >>>>> Some unknown/undefined resource such as 'aggregate' is just not an ABI.
> >>>>> It has to be well defined, as 'hardware_address', 'num_netdev_sqs' or
> >>>>> something similar appropriate to that mdev device class.
> >>>>> If the user wants to set a parameter for an mdev regardless of vendor,
> >>>>> they must have a single way to do so.
> > 
> > Aggregation augments type, which is by definition vendor specific.
> >     
> >>>>
> >>>> The idea is not specific to some device class, but applies to each mdev
> >>>> type's resource, and is optional for each vendor. If a more device-class
> >>>> specific way is preferred, then we might end up with very different ways
> >>>> for different vendors. Better to avoid that, so the point here is to
> >>>> aggregate a number of the mdev type's resources for the target instance,
> >>>> instead of defining many kinds of mdev types for those numbers of
> >>>> resources.
> >>>>    
> >>> Parameter or attribute certainly can be optional.
> >>> But the way to aggregate them should not be vendor specific.
> >>> Look for some excellent existing examples across subsystems; for example,
> >>> how you create an aggregated netdev or block device does not depend on
> >>> the vendor or underlying device type.
> >>
> >> I'd like to hear Alex's opinion on this. Today VFIO mdev supports two styles
> >> of "types" imo: fixed resource definition (most cases) and dynamic resource 
> >> definition (vfio-ap). In fixed style, a type has fixed association to a set of 
> >> vendor specific resources (resourceX=M, resourceY=N, ...). In dynamic case, 
> >> the user is allowed to specify actual resource X/Y/... backing the mdev 
> >> instance post its creation. In either case, the way to identify such association 
> >> or configurable knobs is vendor specific, maybe contained in optional 
> >> attributes (name and description) plus additional info in vendor documents.
> >>
> >> Then the user is assumed to clearly understand the implication of the resource
> >> allocation under a given type, when creating a new mdev under this type.
> >>
> >> If this assumption holds true, the aggregated attribute simply provides an
> >> extension in the same direction of fixed-style types but allowing for more 
> >> flexible linearly-increasing resource allocation, e.g. when using aggregate=2,
> >> it means creating an instance with resourceX=2M, resourceY=2N, ... under
> >> the specified type. Along this direction I don't see the need for well-defined
> >> vendor specific attributes here. When those are actually required, I suppose 
> >> the dynamic style would fit better. Or if the vendor driver thinks implementing
> >> such an aggregate feature will confuse its type definition, it's free not to
> >> do so anyway.
> > 
> > Yep, though I don't think we can even define that aggregate=2 indicates
> > that every resource is doubled; it's going to have vendor specific
> > meaning.  Maybe this is what Parav is rejecting, but I don't see an
> > alternative.  For example, an mdev vGPU might have high level resources
> > like the number of execution units, graphics memory, display heads,
> > maximum resolution, etc.  Aggregation could affect one or all of these.
> > Orchestration tools already need to know the vendor specific type of
> > device they want to create, so it doesn't seem unreasonable that if
> > they use aggregation that they choose a type that aggregates the
> > resource(s) they need, but that aggregation is going to be specific to
> > the type.  Potentially as we think about adding "defined" sysfs
> > attributes for devices we could start with
> > $SYSFS_DEV_PATH/mdev/aggregation/type, where value written to type is a
> > vendor specific aggregation of that mdev type.  This allows us the
> > option that we might someday agree on specific resources that might be
> > aggregated in a common way (ex. ./aggregation/graphics_memory), but I'm
> > somewhat doubtful those would ever be pursued.  Thanks,
> >   
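The hypothetical defined-attribute layout floated above might look like this from userspace. This is a sketch only: none of these paths exist today, the type name is illustrative, and the written value's meaning would remain vendor specific.

```shell
# Create an mdev the existing way (parent device and type name illustrative).
UUID=$(uuidgen)
echo "${UUID}" > \
  /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create

# Hypothetical per-device attribute: a vendor-interpreted scale for the type.
echo 4 > "/sys/bus/mdev/devices/${UUID}/mdev/aggregation/type"

# If commonly agreed resources were ever defined, they could sit alongside,
# e.g. ./mdev/aggregation/graphics_memory (purely speculative).
```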
> 
> My point is, following Zhenyu Wang's example, it is certainly incorrect
> to define mdev sysfs files such as,
> 
> vendor_foo_mdev.netdev_mac_addr=X
> vendor_bar_mdev.resource_addr=Y
> 
> vendor_foo_mdev.netdev_queues=4
> vendor_bar_mdev.aggregate=8
> 
> Unless this is a miscellaneous (not well defined) parameter of a vendor
> device.

I certainly think it's wrong to associate a "netdev" property with
something that the kernel only knows as an opaque device.  But that's
really the issue, mdevs are opaque devices as far as the host kernel is
concerned.  Since we seem to have landed on mdev being used exclusively
for vfio, the only thing we really know about an mdev generically is
which vfio bus driver API the device uses.  Any association of an mdev
to a GPU, NIC, HBA, or other accelerator or I/O interface is strictly
known by the user/admin's interpretation of the vendor specific type.
 
> I am 100% sure that consumers of network devices, where a PCI PF is
> sliced into multiple smaller devices, want to configure these devices
> in a unified way regardless of vendor type.
> That may not be the case with vGPU mdevs.

I don't know about devlink, but iirc the ip command operates on a
netdev PF in order to, for example, assign MAC addresses to the VFs.
We have no guarantee with mdevs that there's a parent netdev device for
such an interface.  The parent device might be an FPGA where one type
it's able to expose looks like a NIC.  How do you envision devlink/ip
interacting with something like that?  Using common tools to set
networking properties on a device that the host kernel fundamentally
does not know is a networking device is... difficult.

> If Zhenyu Wang proposed to use a networking class of mdev device, the
> attributes should have well defined meaning, as it is a well known class
> in the linux kernel.
> mdev should be providing an API to define such an mdev config object,
> and all sysfs for such an mdev should be created by the mdev core, not
> by the vendor driver.

But of course there is no "networking class of mdev device".  Instead
there are mdev devices that might be NICs, but that's for the admin and
user to care about.  If you have an interface in mind for how devlink
is going to learn about mdev device and set properties, please share.
It's not clear to me if we need to design something to be compatible
with devlink or devlink needs to learn how to do certain things on mdev
devices (does devlink want to become a vfio userspace device driver in
order to probe the type of an mdev device?  That'll be hard given some
of the backdoor userspace dependencies of existing vGPU mdevs).  Thanks,

Alex


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-12-10 22:08                     ` Alex Williamson
@ 2019-12-10 22:40                       ` Parav Pandit
  0 siblings, 0 replies; 32+ messages in thread
From: Parav Pandit @ 2019-12-10 22:40 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, Zhenyu Wang, kvm, kwankhede, cohuck, Jiri Pirko,
	netdev, Jason Wang, Michael S. Tsirkin

On 12/10/2019 4:08 PM, Alex Williamson wrote:
> On Tue, 10 Dec 2019 21:08:29 +0000
> Parav Pandit <parav@mellanox.com> wrote:
> 
>> On 12/10/2019 1:07 PM, Alex Williamson wrote:
>>> On Tue, 10 Dec 2019 03:33:23 +0000
>>> "Tian, Kevin" <kevin.tian@intel.com> wrote:
>>>   
>>>>> From: Parav Pandit <parav@mellanox.com>
>>>>> Sent: Saturday, December 7, 2019 1:34 AM
>>>>>
>>>>> On 12/6/2019 2:03 AM, Zhenyu Wang wrote:    
>>>>>> On 2019.12.05 18:59:36 +0000, Parav Pandit wrote:    
>>>>>>>>>    
>>>>>>>>>> On 2019.11.07 20:37:49 +0000, Parav Pandit wrote:    
>>>>>>>>>>> Hi,
>>>>>>>>>>>    
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org>    
>>>>> On    
>>>>>>>>>>>> Behalf Of Zhenyu Wang
>>>>>>>>>>>> Sent: Thursday, October 24, 2019 12:08 AM
>>>>>>>>>>>> To: kvm@vger.kernel.org
>>>>>>>>>>>> Cc: alex.williamson@redhat.com; kwankhede@nvidia.com;
>>>>>>>>>>>> kevin.tian@intel.com; cohuck@redhat.com
>>>>>>>>>>>> Subject: [PATCH 0/6] VFIO mdev aggregated resources handling
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> This is a refresh of a previous send of this series. I got the
>>>>>>>>>>>> impression that some SIOV drivers would still deploy their own
>>>>>>>>>>>> create and config methods, so I stopped effort on this. But it
>>>>>>>>>>>> seems this would still be useful for some other SIOV driver which
>>>>>>>>>>>> may simply want the capability to aggregate resources. So here's
>>>>>>>>>>>> the refreshed series.
>>>>>>>>>>>>
>>>>>>>>>>>> The current mdev device create interface depends on a fixed mdev
>>>>>>>>>>>> type, which gets a uuid from the user to create an instance of the
>>>>>>>>>>>> mdev device. If the user wants a customized number of resources
>>>>>>>>>>>> for the mdev device, then one can only create a new
>>>>>>>>>>> Can you please give an example of 'resource'?
>>>>>>>>>>> When I grep [1], [2] and [3], I couldn't find anything related to '    
>>>>>>>> aggregate'.    
>>>>>>>>>>
>>>>>>>>>> The resource is vendor device specific; in the SIOV spec there's an
>>>>>>>>>> ADI (Assignable Device Interface) definition, which could be e.g. a
>>>>>>>>>> queue for a net device, a context for a gpu, etc. I just named this
>>>>>>>>>> interface 'aggregate' for aggregation purposes; it's not used in the
>>>>>>>>>> spec doc.
>>>>>>>>>>    
>>>>>>>>>
>>>>>>>>> Some 'unknown/undefined' vendor specific resource just doesn't work.
>>>>>>>>> The orchestration tool doesn't know which resource, and what/how to
>>>>>>>>> configure, for which vendor.
>>>>>>>>> It has to be well defined.
>>>>>>>>>
>>>>>>>>> You can also find such discussion in the recent lgpu DRM cgroup
>>>>>>>>> patch series v4.
>>>>>>>>>
>>>>>>>>> Exposing networking resource configuration in non-net-namespace-aware
>>>>>>>>> mdev sysfs at the PCI device level is a no-go.
>>>>>>>>> Adding per-file NET_ADMIN or other checks is not the approach we
>>>>>>>>> follow in the kernel.
>>>>>>>>>
>>>>>>>>> devlink has been a subsystem, though under net, that has a very rich
>>>>>>>>> interface for the caller, device health, resource management and much
>>>>>>>>> more.
>>>>>>>>> Even though it is used by net drivers today, it's written for generic
>>>>>>>>> device management at the bus/device level.
>>>>>>>>>
>>>>>>>>> Yuval has posted patches to manage PCI sub-devices [1], and an
>>>>>>>>> updated version which addresses the comments will be posted soon.
>>>
>>> Always good to see tools that intend to manage arbitrary devices posted
>>> only to the netdev list :-\
>>>   
>>>>>>>>>
>>>>>>>>> For any device-slice resource management of mdev, sub-function etc.,
>>>>>>>>> we should be using a single kernel interface such as devlink [2], [3].
>>>
>>> This seems impractical, mdevs and SR-IOV are both enumerated,
>>> inspected, created, and removed in sysfs,   
>> Both enumerated via sysfs, but VFs are not configured via sysfs.
>>
>>> where do we define what
>>> features are manipulated vis sysfs versus devlink?  
>>
>> VFs are configured via well defined, vendor neutral tool
>> iproute2/ip link set <pf_netdev> vf <vf_index> <attribute> <value>
>>
>> This falls short lately for a few cases, and non-networking or generic
>> VF property configuration is proposed to be handled by a similar 'VF'
>> object using devlink, because these are either pure 'pci vf' properties
>> or more device-class-type VF properties such as MAC address or
>> number_of_queues etc.
>>
>> More advanced modes of networking VFs have been controlled using netdev
>> representors, again in a vendor-neutral way, for the last few years.
>>
>> It may be fair to say that the mdev subsystem wants to invent new sysfs
>> files for configuration.
> 
> It seems you're trying to apply rules for classes of devices where
> configuration features are well defined to an environment where we
> don't even have classes of devices, let alone agreed features.
>  
>>  mdevs, by
>>> definition, are vendor defined "chunks" of a thing.  We allow vendor
>>> drivers to define different types, representing different
>>> configurations of these chunks.  Often these different types are
>>> incrementally bigger or smaller chunks of these things, but defining
>>> what bigger and smaller means generically across vendors is an
>>> impossible task.  Orchestration tools already need to know vendor
>>> specific information in terms of what type of mdev device they want to
>>> create and make use of.  The aggregation seems to simply augment that
>>> vendor information, ie. 'type' and 'scale' are separate rather than
>>> combined only behind just 'type'.
>>>   
>>>>>>>>>
>>>>>>>>> [1] https://lore.kernel.org/netdev/1573229926-30040-1-git-send-email-yuvalav@mellanox.com/
>>>>>>>>> [2] http://man7.org/linux/man-pages/man8/devlink-dev.8.html
>>>>>>>>> [3] http://man7.org/linux/man-pages/man8/devlink-resource.8.html
>>>>>>>>>
>>>>>>>>> Most modern device configuration that I am aware of is usually done
>>>>>>>>> via well defined ioctl() of the subsystem (vhost, virtio, vfio,
>>>>>>>>> rdma, nvme and more) or via netlink commands (net, devlink, rdma
>>>>>>>>> and more), not via sysfs.
>>>>>>>>>    
>>>>>>>>
>>>>>>>> Current vfio/mdev configuration is via a documented sysfs ABI
>>>>>>>> instead of other ways. So this adheres to that way to introduce a
>>>>>>>> more configurable method on the mdev device as standard; it's
>>>>>>>> optional and not actually vendor specific, e.g. vfio-ap.
>>>>>>>>    
>>>>>>> Some unknown/undefined resource such as 'aggregate' is just not an ABI.
>>>>>>> It has to be well defined, as 'hardware_address', 'num_netdev_sqs' or
>>>>>>> something similar appropriate to that mdev device class.
>>>>>>> If the user wants to set a parameter for an mdev regardless of vendor,
>>>>>>> they must have a single way to do so.
>>>
>>> Aggregation augments type, which is by definition vendor specific.
>>>     
>>>>>>
>>>>>> The idea is not specific to some device class, but applies to each mdev
>>>>>> type's resource, and is optional for each vendor. If a more device-class
>>>>>> specific way is preferred, then we might end up with very different ways
>>>>>> for different vendors. Better to avoid that, so the point here is to
>>>>>> aggregate a number of the mdev type's resources for the target instance,
>>>>>> instead of defining many kinds of mdev types for those numbers of
>>>>>> resources.
>>>>>>    
>>>>> Parameter or attribute certainly can be optional.
>>>>> But the way to aggregate them should not be vendor specific.
>>>>> Look for some excellent existing examples across subsystems; for example,
>>>>> how you create an aggregated netdev or block device does not depend on
>>>>> the vendor or underlying device type.
>>>>
>>>> I'd like to hear Alex's opinion on this. Today VFIO mdev supports two styles
>>>> of "types" imo: fixed resource definition (most cases) and dynamic resource 
>>>> definition (vfio-ap). In fixed style, a type has fixed association to a set of 
>>>> vendor specific resources (resourceX=M, resourceY=N, ...). In dynamic case, 
>>>> the user is allowed to specify actual resource X/Y/... backing the mdev 
>>>> instance post its creation. In either case, the way to identify such association 
>>>> or configurable knobs is vendor specific, maybe contained in optional 
>>>> attributes (name and description) plus additional info in vendor documents.
>>>>
>>>> Then the user is assumed to clearly understand the implication of the resource
>>>> allocation under a given type, when creating a new mdev under this type.
>>>>
>>>> If this assumption holds true, the aggregated attribute simply provides an
>>>> extension in the same direction of fixed-style types but allowing for more 
>>>> flexible linearly-increasing resource allocation, e.g. when using aggregate=2,
>>>> it means creating an instance with resourceX=2M, resourceY=2N, ... under
>>>> the specified type. Along this direction I don't see the need for well-defined
>>>> vendor specific attributes here. When those are actually required, I suppose 
>>>> the dynamic style would fit better. Or if the vendor driver thinks implementing
>>>> such an aggregate feature will confuse its type definition, it's free not to
>>>> do so anyway.
>>>
>>> Yep, though I don't think we can even define that aggregate=2 indicates
>>> that every resource is doubled; it's going to have vendor specific
>>> meaning.  Maybe this is what Parav is rejecting, but I don't see an
>>> alternative.  For example, an mdev vGPU might have high level resources
>>> like the number of execution units, graphics memory, display heads,
>>> maximum resolution, etc.  Aggregation could affect one or all of these.
>>> Orchestration tools already need to know the vendor specific type of
>>> device they want to create, so it doesn't seem unreasonable that if
>>> they use aggregation that they choose a type that aggregates the
>>> resource(s) they need, but that aggregation is going to be specific to
>>> the type.  Potentially as we think about adding "defined" sysfs
>>> attributes for devices we could start with
>>> $SYSFS_DEV_PATH/mdev/aggregation/type, where value written to type is a
>>> vendor specific aggregation of that mdev type.  This allows us the
>>> option that we might someday agree on specific resources that might be
>>> aggregated in a common way (ex. ./aggregation/graphics_memory), but I'm
>>> somewhat doubtful those would ever be pursued.  Thanks,
>>>   
>>
>> My point is, following Zhenyu Wang's example, it is certainly incorrect
>> to define mdev sysfs files such as,
>>
>> vendor_foo_mdev.netdev_mac_addr=X
>> vendor_bar_mdev.resource_addr=Y
>>
>> vendor_foo_mdev.netdev_queues=4
>> vendor_bar_mdev.aggregate=8
>>
>> Unless this is a miscellaneous (not well defined) parameter of a vendor
>> device.
> 
> I certainly think it's wrong to associate a "netdev" property with
> something that the kernel only knows as an opaque device.  But that's
> really the issue, mdevs are opaque devices as far as the host kernel is
> concerned.  Since we seem to have landed on mdev being used exclusively
> for vfio, the only thing we really know about an mdev generically is
> which vfio bus driver API the device uses.  Any association of an mdev
> to a GPU, NIC, HBA, or other accelerator or I/O interface is strictly
> known by the user/admin's interpretation of the vendor specific type.
>  
>> I am 100% sure that consumers of network devices, where a PCI PF is
>> sliced into multiple smaller devices, want to configure these devices
>> in a unified way regardless of vendor type.
>> That may not be the case with vGPU mdevs.
> 
> I don't know about devlink, but iirc the ip command operates on a
> netdev PF in order to, for example, assign MAC addresses to the VFs.
> We have no guarantee with mdevs that there's a parent netdev device for
> such an interface.  

Right.  ip link works on a netdev, but devlink works on a devlink
instance addressed by bus/device.
Here is an example from one system
$ devlink dev show

pci/0000:06:00.0
pci/0000:06:00.1

Here two devlink instances for one PCI device are registered, and each
devlink device has params, ports, health monitoring, register dumps and
a lot more.
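For comparison, the vendor-neutral configuration paths referred to in this thread look like the following (the devlink and ip subcommands exist today; the device and interface names are illustrative):

```shell
# devlink addresses a device by bus/device, independent of any netdev:
devlink dev show                          # e.g. pci/0000:06:00.0
devlink port show pci/0000:06:00.0        # ports of that devlink instance
devlink dev param show pci/0000:06:00.0   # generic and driver-specific params

# The older, netdev-anchored VF configuration path mentioned earlier:
ip link set enp6s0f0 vf 0 mac 00:11:22:33:44:55
```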

> The parent device might be an FPGA where one type
> it's able to expose looks like a NIC.  How do you envision devlink/ip
> interacting with something like that?  Using common tools to set
> networking properties on a device that the host kernel fundamentally
> does not know is a networking device is... difficult.
> 
If it is exposing an FPGA NIC that needs to be configured as individual
mdev devices, my sub-function series is a perfect example of it, where
the lifecycle of the mdev is done through the mdev subsystem and all
params are configured using devlink.
That series was NACKed for a different reason, which anyway still holds
true regardless of this discussion.

>> If Zhenyu Wang proposed to use a networking class of mdev device, the
>> attributes should have well defined meaning, as it is a well known class
>> in the linux kernel.
>> mdev should be providing an API to define such an mdev config object,
>> and all sysfs for such an mdev should be created by the mdev core, not
>> by the vendor driver.
> 
> But of course there is no "networking class of mdev device".  Instead
> there are mdev devices that might be NICs,

Such an object should be created as a pre-patch if this is a
networking-class mdev, and it should be configured using such a method.
Things will be a lot clearer.

> but that's for the admin and
> user to care about. 
The admin and user care about it, but the kernel is the one to provide
the config interface, so a 'net class mdev device' can be programmed
using one command.

> If you have an interface in mind for how devlink
> is going to learn about mdev device and set properties, please share.

I shared the working patches for mdev NICs as Mellanox sub-functions. It
is a good starting point.

Since the input was to use the devlink interface for
sub-function/mediated/slice NICs, I have revised the RFC to do so
instead of the mdev way.
I am happy to share it once we finish internal review. I hope I can
share it before the Christmas holidays.

> It's not clear to me if we need to design something to be compatible
> with devlink or devlink needs to learn how to do certain things on mdev
> devices (does devlink want to become a vfio userspace device driver in
> order to probe the type of an mdev device?  That'll be hard given some
> of the backdoor userspace dependencies of existing vGPU mdevs).  Thanks,

Let's assume for a moment that devlink may not be the tool for mdev
device configuration.
Even in such a case, my ask is to clearly define config params via
ioctl() or sysfs, stating exactly what each param is.

Exposing an FPGA NIC net device's MAC address and other things in sysfs,
which is not protected by a net namespace, is a security bug.


end of thread, other threads:[~2019-12-10 22:40 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-24  5:08 [PATCH 0/6] VFIO mdev aggregated resources handling Zhenyu Wang
2019-10-24  5:08 ` [PATCH 1/6] vfio/mdev: Add new "aggregate" parameter for mdev create Zhenyu Wang
2019-10-24  5:08 ` [PATCH 2/6] vfio/mdev: Add "aggregation" attribute for supported mdev type Zhenyu Wang
2019-10-27  6:24   ` kbuild test robot
2019-10-27  6:24     ` kbuild test robot
2019-10-27  6:24   ` [RFC PATCH] vfio/mdev: mdev_type_attr_aggregation can be static kbuild test robot
2019-10-27  6:24     ` kbuild test robot
2019-10-24  5:08 ` [PATCH 3/6] vfio/mdev: Add "aggregated_instances" attribute for supported mdev device Zhenyu Wang
2019-10-24  5:08 ` [PATCH 4/6] Documentation/driver-api/vfio-mediated-device.rst: Update for vfio/mdev aggregation support Zhenyu Wang
2019-10-24  5:08 ` [PATCH 5/6] Documentation/ABI/testing/sysfs-bus-vfio-mdev: " Zhenyu Wang
2019-10-24  5:08 ` [PATCH 6/6] drm/i915/gvt: Add new type with " Zhenyu Wang
2019-11-05 21:10 ` [PATCH 0/6] VFIO mdev aggregated resources handling Alex Williamson
2019-11-06  4:20   ` Zhenyu Wang
2019-11-06 18:44     ` Alex Williamson
2019-11-07 13:02       ` Cornelia Huck
2019-11-15  4:24       ` Tian, Kevin
2019-11-19 22:58         ` Alex Williamson
2019-11-20  0:46           ` Tian, Kevin
2019-11-07 20:37 ` Parav Pandit
2019-11-08  8:19   ` Zhenyu Wang
2019-12-04 17:36     ` Parav Pandit
2019-12-05  6:06       ` Zhenyu Wang
2019-12-05  6:40         ` Jason Wang
2019-12-05 19:02           ` Parav Pandit
2019-12-05 18:59         ` Parav Pandit
2019-12-06  8:03           ` Zhenyu Wang
2019-12-06 17:33             ` Parav Pandit
2019-12-10  3:33               ` Tian, Kevin
2019-12-10 19:07                 ` Alex Williamson
2019-12-10 21:08                   ` Parav Pandit
2019-12-10 22:08                     ` Alex Williamson
2019-12-10 22:40                       ` Parav Pandit
