* [PATCH v2 0/2] VFIO mdev aggregated resources handling
@ 2020-03-26  5:41 Zhenyu Wang
  2020-03-26  5:41 ` [PATCH v2 1/2] Documentation/driver-api/vfio-mediated-device.rst: update for aggregation support Zhenyu Wang
                   ` (4 more replies)
  0 siblings, 5 replies; 34+ messages in thread
From: Zhenyu Wang @ 2020-03-26  5:41 UTC (permalink / raw)
  To: alex.williamson; +Cc: kvm, kevin.tian, intel-gvt-dev

Hi,

This is a refresh of the previous series: https://patchwork.kernel.org/cover/11208279/
and https://patchwork.freedesktop.org/series/70425/

The current mdev device create interface depends on fixed mdev types:
it takes a uuid from the user and creates an mdev instance of that
type. If the user wants a customized amount of resources for an mdev
device, the only option is to define a new mdev type for it, which is
not flexible. This requirement comes not only from KVMGT, which wants
to allocate flexible resources, but also from Intel Scalable I/O
Virtualization, which would use vfio/mdev to allocate arbitrary
resources for an mdev instance. More info in [1] [2] [3].
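
For reference, creating an mdev today always goes through one of the
fixed types exposed by the parent device, e.g. (UUID and type name are
placeholders):

   # echo "<UUID>" > /sys/devices/pci0000:00/0000:00:02.0/mdev_supported_types/<vendor-type>/create

Each type carries a fixed resource size, so a different allocation
currently requires defining a different type.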

As agreed, for the current opaque mdev device types we still explore
a management interface based on the mdev sysfs definition. This series
follows Alex's previous suggestion to create generic parameters under
an 'mdev' directory for each device, so the vendor driver can provide
support for them just like the other defined mdev sysfs entries.

For an mdev type with aggregation support, the files "aggregated_instances"
and "max_aggregation" should be created under the 'mdev' directory. E.g

/sys/devices/pci0000:00/0000:00:02.0/<UUID>/mdev/
   |-- aggregated_instances
   |-- max_aggregation

"aggregated_instances" is used to set or return current number of
instances for aggregation, which can not be larger than "max_aggregation".
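
A minimal usage sketch of the proposed interface (UUID and numbers are
illustrative; per patch 2 the value has to be set before the mdev
device is opened):

   # cat /sys/devices/pci0000:00/0000:00:02.0/<UUID>/mdev/max_aggregation
   8
   # echo 4 > /sys/devices/pci0000:00/0000:00:02.0/<UUID>/mdev/aggregated_instances
   # cat /sys/devices/pci0000:00/0000:00:02.0/<UUID>/mdev/aggregated_instances
   4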

The first patch updates the documentation for the new mdev parameter
directory. The second one adds aggregation support to the GVT driver.

References:
[1] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
[2] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
[3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf

Zhenyu Wang (2):
  Documentation/driver-api/vfio-mediated-device.rst: update for
    aggregation support
  drm/i915/gvt: mdev aggregation type

 .../driver-api/vfio-mediated-device.rst       |  19 ++
 drivers/gpu/drm/i915/gvt/aperture_gm.c        |  44 +++--
 drivers/gpu/drm/i915/gvt/gtt.c                |   9 +-
 drivers/gpu/drm/i915/gvt/gvt.c                |   7 +-
 drivers/gpu/drm/i915/gvt/gvt.h                |  42 +++--
 drivers/gpu/drm/i915/gvt/kvmgt.c              | 115 +++++++++++-
 drivers/gpu/drm/i915/gvt/vgpu.c               | 172 ++++++++++++------
 7 files changed, 314 insertions(+), 94 deletions(-)

-- 
2.25.1



* [PATCH v2 1/2] Documentation/driver-api/vfio-mediated-device.rst: update for aggregation support
  2020-03-26  5:41 [PATCH v2 0/2] VFIO mdev aggregated resources handling Zhenyu Wang
@ 2020-03-26  5:41 ` Zhenyu Wang
  2020-03-26  8:17   ` Tian, Kevin
  2020-03-26  5:41 ` [PATCH v2 2/2] drm/i915/gvt: mdev aggregation type Zhenyu Wang
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 34+ messages in thread
From: Zhenyu Wang @ 2020-03-26  5:41 UTC (permalink / raw)
  To: alex.williamson; +Cc: kvm, kevin.tian, intel-gvt-dev, Jiang, Dave

Update doc for mdev aggregation support. Describe mdev generic
parameter directory under mdev device directory.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: "Jiang, Dave" <dave.jiang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
---
 .../driver-api/vfio-mediated-device.rst       | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
index 25eb7d5b834b..29c29432a847 100644
--- a/Documentation/driver-api/vfio-mediated-device.rst
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -269,6 +269,9 @@ Directories and Files Under the sysfs for Each mdev Device
   |--- [$MDEV_UUID]
          |--- remove
          |--- mdev_type {link to its type}
+         |--- mdev [optional]
+	     |--- aggregated_instances [optional]
+	     |--- max_aggregation [optional]
          |--- vendor-specific-attributes [optional]
 
 * remove (write only)
@@ -281,6 +284,22 @@ Example::
 
 	# echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove
 
+* mdev directory (optional)
+
+Vendor driver could create mdev directory to specify extra generic parameters
+on mdev device by its type. Currently aggregation parameters are defined.
+Vendor driver should provide both items to support.
+
+1) aggregated_instances (read/write)
+
+Set target aggregated instances for device. Reading will show current
+count of aggregated instances. Writing value larger than max_aggregation
+would fail and return error.
+
+2) max_aggregation (read only)
+
+Show maxium instances for aggregation.
+
 Mediated device Hot plug
 ------------------------
 
-- 
2.25.1



* [PATCH v2 2/2] drm/i915/gvt: mdev aggregation type
  2020-03-26  5:41 [PATCH v2 0/2] VFIO mdev aggregated resources handling Zhenyu Wang
  2020-03-26  5:41 ` [PATCH v2 1/2] Documentation/driver-api/vfio-mediated-device.rst: update for aggregation support Zhenyu Wang
@ 2020-03-26  5:41 ` Zhenyu Wang
  2020-03-27  7:48   ` Tian, Kevin
  2020-04-08  5:58 ` [PATCH v3 0/2] VFIO mdev aggregated resources handling Zhenyu Wang
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 34+ messages in thread
From: Zhenyu Wang @ 2020-03-26  5:41 UTC (permalink / raw)
  To: alex.williamson; +Cc: kvm, kevin.tian, intel-gvt-dev, Jiang, Dave, Yuan, Hang

To increase the flexibility for resource allocation on mdev types,f
this trys to add aggregation attribute for mdev type that can
aggregate gfx memory resources in GVT case.

Use sysfs method for this attribute, the most difference for GVT is
that the gfx resource allocation will be delayed until mdev open
instead of mdev create to allow aggregation setting before VM
start. But gfx resource accouting would still handle left resource for
target vGPU or other types.

"max_aggregation" would show maxium instances for aggregation on
target vGPU type. "aggregated_instances" shows current count of aggregated
instances. "max_aggregation" is read-only and user sets needed account
by write to "aggregated_instances".

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: "Jiang, Dave" <dave.jiang@intel.com>
Cc: "Yuan, Hang" <hang.yuan@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
---
 drivers/gpu/drm/i915/gvt/aperture_gm.c |  44 +++++--
 drivers/gpu/drm/i915/gvt/gtt.c         |   9 +-
 drivers/gpu/drm/i915/gvt/gvt.c         |   7 +-
 drivers/gpu/drm/i915/gvt/gvt.h         |  42 ++++--
 drivers/gpu/drm/i915/gvt/kvmgt.c       | 115 ++++++++++++++++-
 drivers/gpu/drm/i915/gvt/vgpu.c        | 172 +++++++++++++++++--------
 6 files changed, 295 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/aperture_gm.c b/drivers/gpu/drm/i915/gvt/aperture_gm.c
index 0d6d59871308..9ee3083b37ae 100644
--- a/drivers/gpu/drm/i915/gvt/aperture_gm.c
+++ b/drivers/gpu/drm/i915/gvt/aperture_gm.c
@@ -238,10 +238,10 @@ static void free_resource(struct intel_vgpu *vgpu)
 	gvt->fence.vgpu_allocated_fence_num -= vgpu_fence_sz(vgpu);
 }
 
-static int alloc_resource(struct intel_vgpu *vgpu,
-		struct intel_vgpu_creation_params *param)
+static int alloc_resource(struct intel_vgpu *vgpu)
 {
 	struct intel_gvt *gvt = vgpu->gvt;
+	struct intel_vgpu_creation_params *param = &vgpu->param;
 	unsigned long request, avail, max, taken;
 	const char *item;
 
@@ -254,7 +254,7 @@ static int alloc_resource(struct intel_vgpu *vgpu,
 	max = gvt_aperture_sz(gvt) - HOST_LOW_GM_SIZE;
 	taken = gvt->gm.vgpu_allocated_low_gm_size;
 	avail = max - taken;
-	request = MB_TO_BYTES(param->low_gm_sz);
+	request = param->low_gm_sz * param->aggregation;
 
 	if (request > avail)
 		goto no_enough_resource;
@@ -265,7 +265,7 @@ static int alloc_resource(struct intel_vgpu *vgpu,
 	max = gvt_hidden_sz(gvt) - HOST_HIGH_GM_SIZE;
 	taken = gvt->gm.vgpu_allocated_high_gm_size;
 	avail = max - taken;
-	request = MB_TO_BYTES(param->high_gm_sz);
+	request = param->high_gm_sz * param->aggregation;
 
 	if (request > avail)
 		goto no_enough_resource;
@@ -283,8 +283,8 @@ static int alloc_resource(struct intel_vgpu *vgpu,
 
 	vgpu_fence_sz(vgpu) = request;
 
-	gvt->gm.vgpu_allocated_low_gm_size += MB_TO_BYTES(param->low_gm_sz);
-	gvt->gm.vgpu_allocated_high_gm_size += MB_TO_BYTES(param->high_gm_sz);
+	gvt->gm.vgpu_allocated_low_gm_size += param->low_gm_sz * param->aggregation;
+	gvt->gm.vgpu_allocated_high_gm_size += param->high_gm_sz * param->aggregation;
 	gvt->fence.vgpu_allocated_fence_num += param->fence_sz;
 	return 0;
 
@@ -307,9 +307,34 @@ void intel_vgpu_free_resource(struct intel_vgpu *vgpu)
 {
 	free_vgpu_gm(vgpu);
 	free_vgpu_fence(vgpu);
+}
+
+void intel_vgpu_free_resource_count(struct intel_vgpu *vgpu)
+{
 	free_resource(vgpu);
 }
 
+int intel_vgpu_alloc_resource_count(struct intel_vgpu *vgpu)
+{
+	return alloc_resource(vgpu);
+}
+
+int intel_vgpu_adjust_resource_count(struct intel_vgpu *vgpu)
+{
+	if ((vgpu_aperture_sz(vgpu) != vgpu->param.low_gm_sz *
+	     vgpu->param.aggregation) ||
+	    (vgpu_hidden_sz(vgpu) != vgpu->param.high_gm_sz *
+	     vgpu->param.aggregation)) {
+		/* handle aggregation change */
+		intel_vgpu_free_resource_count(vgpu);
+		intel_vgpu_alloc_resource_count(vgpu);
+		mutex_lock(&vgpu->gvt->lock);
+		intel_gvt_update_vgpu_types(vgpu->gvt);
+		mutex_unlock(&vgpu->gvt->lock);
+	}
+	return 0;
+}
+
 /**
  * intel_vgpu_reset_resource - reset resource state owned by a vGPU
  * @vgpu: a vGPU
@@ -338,15 +363,10 @@ void intel_vgpu_reset_resource(struct intel_vgpu *vgpu)
  * zero on success, negative error code if failed.
  *
  */
-int intel_vgpu_alloc_resource(struct intel_vgpu *vgpu,
-		struct intel_vgpu_creation_params *param)
+int intel_vgpu_alloc_resource(struct intel_vgpu *vgpu)
 {
 	int ret;
 
-	ret = alloc_resource(vgpu, param);
-	if (ret)
-		return ret;
-
 	ret = alloc_vgpu_gm(vgpu);
 	if (ret)
 		goto out_free_resource;
diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 2a4b23f8aa74..60f455134ed0 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -2466,12 +2466,6 @@ int intel_vgpu_init_gtt(struct intel_vgpu *vgpu)
 {
 	struct intel_vgpu_gtt *gtt = &vgpu->gtt;
 
-	INIT_RADIX_TREE(&gtt->spt_tree, GFP_KERNEL);
-
-	INIT_LIST_HEAD(&gtt->ppgtt_mm_list_head);
-	INIT_LIST_HEAD(&gtt->oos_page_list_head);
-	INIT_LIST_HEAD(&gtt->post_shadow_list_head);
-
 	gtt->ggtt_mm = intel_vgpu_create_ggtt_mm(vgpu);
 	if (IS_ERR(gtt->ggtt_mm)) {
 		gvt_vgpu_err("fail to create mm for ggtt.\n");
@@ -2508,6 +2502,9 @@ static void intel_vgpu_destroy_ggtt_mm(struct intel_vgpu *vgpu)
 {
 	struct intel_gvt_partial_pte *pos, *next;
 
+	if (!vgpu->gtt.ggtt_mm)
+		return;
+
 	list_for_each_entry_safe(pos, next,
 				 &vgpu->gtt.ggtt_mm->ggtt_mm.partial_pte_list,
 				 list) {
diff --git a/drivers/gpu/drm/i915/gvt/gvt.c b/drivers/gpu/drm/i915/gvt/gvt.c
index 9e1787867894..bbd77cba239e 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.c
+++ b/drivers/gpu/drm/i915/gvt/gvt.c
@@ -99,11 +99,11 @@ static ssize_t description_show(struct kobject *kobj, struct device *dev,
 
 	return sprintf(buf, "low_gm_size: %dMB\nhigh_gm_size: %dMB\n"
 		       "fence: %d\nresolution: %s\n"
-		       "weight: %d\n",
+		       "weight: %d\naggregation: %s\n",
 		       BYTES_TO_MB(type->low_gm_size),
 		       BYTES_TO_MB(type->high_gm_size),
 		       type->fence, vgpu_edid_str(type->resolution),
-		       type->weight);
+		       type->weight, type->aggregation ? "yes" : "no");
 }
 
 static MDEV_TYPE_ATTR_RO(available_instances);
@@ -185,6 +185,9 @@ static const struct intel_gvt_ops intel_gvt_ops = {
 	.vgpu_get_dmabuf = intel_vgpu_get_dmabuf,
 	.write_protect_handler = intel_vgpu_page_track_handler,
 	.emulate_hotplug = intel_vgpu_emulate_hotplug,
+	.vgpu_delayed_alloc = intel_vgpu_delayed_alloc,
+	.vgpu_res_free = intel_vgpu_res_free,
+	.vgpu_res_change = intel_vgpu_adjust_resource_count,
 };
 
 static void init_device_info(struct intel_gvt *gvt)
diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index 58c2c7932e3f..bb8f16d468f4 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -165,15 +165,30 @@ struct intel_vgpu_submission {
 	bool active;
 };
 
+struct intel_vgpu_creation_params {
+	__u64 handle;
+	__u64 low_gm_sz;
+	__u64 high_gm_sz;
+	__u64 fence_sz;
+	__u64 resolution;
+	__s32 primary;
+	__u64 vgpu_id;
+	__u32 weight;
+	__u32 aggregation; /* requested aggregation number if type supports */
+};
+
 struct intel_vgpu {
 	struct intel_gvt *gvt;
 	struct mutex vgpu_lock;
 	int id;
 	unsigned long handle; /* vGPU handle used by hypervisor MPT modules */
 	bool active;
+	bool res_initialized;
 	bool pv_notified;
 	bool failsafe;
 	unsigned int resetting_eng;
+	struct intel_vgpu_creation_params param;
+	struct intel_vgpu_type *type;
 
 	/* Both sched_data and sched_ctl can be seen a part of the global gvt
 	 * scheduler structure. So below 2 vgpu data are protected
@@ -276,6 +291,7 @@ struct intel_vgpu_type {
 	unsigned int fence;
 	unsigned int weight;
 	enum intel_vgpu_edid resolution;
+	bool aggregation;
 };
 
 struct intel_gvt {
@@ -402,22 +418,12 @@ int intel_gvt_load_firmware(struct intel_gvt *gvt);
 #define vgpu_fence_base(vgpu) (vgpu->fence.base)
 #define vgpu_fence_sz(vgpu) (vgpu->fence.size)
 
-struct intel_vgpu_creation_params {
-	__u64 handle;
-	__u64 low_gm_sz;  /* in MB */
-	__u64 high_gm_sz; /* in MB */
-	__u64 fence_sz;
-	__u64 resolution;
-	__s32 primary;
-	__u64 vgpu_id;
-
-	__u32 weight;
-};
-
-int intel_vgpu_alloc_resource(struct intel_vgpu *vgpu,
-			      struct intel_vgpu_creation_params *param);
+int intel_vgpu_alloc_resource(struct intel_vgpu *vgpu);
 void intel_vgpu_reset_resource(struct intel_vgpu *vgpu);
 void intel_vgpu_free_resource(struct intel_vgpu *vgpu);
+int intel_vgpu_alloc_resource_count(struct intel_vgpu *vgpu);
+void intel_vgpu_free_resource_count(struct intel_vgpu *vgpu);
+int intel_vgpu_adjust_resource_count(struct intel_vgpu *vgpu);
 void intel_vgpu_write_fence(struct intel_vgpu *vgpu,
 	u32 fence, u64 value);
 
@@ -458,11 +464,13 @@ static inline void intel_vgpu_write_pci_bar(struct intel_vgpu *vgpu,
 
 int intel_gvt_init_vgpu_types(struct intel_gvt *gvt);
 void intel_gvt_clean_vgpu_types(struct intel_gvt *gvt);
+void intel_gvt_update_vgpu_types(struct intel_gvt *gvt);
 
 struct intel_vgpu *intel_gvt_create_idle_vgpu(struct intel_gvt *gvt);
 void intel_gvt_destroy_idle_vgpu(struct intel_vgpu *vgpu);
 struct intel_vgpu *intel_gvt_create_vgpu(struct intel_gvt *gvt,
 					 struct intel_vgpu_type *type);
+int intel_vgpu_delayed_alloc(struct intel_vgpu *vgpu);
 void intel_gvt_destroy_vgpu(struct intel_vgpu *vgpu);
 void intel_gvt_release_vgpu(struct intel_vgpu *vgpu);
 void intel_gvt_reset_vgpu_locked(struct intel_vgpu *vgpu, bool dmlr,
@@ -523,6 +531,9 @@ static inline u64 intel_vgpu_get_bar_gpa(struct intel_vgpu *vgpu, int bar)
 			PCI_BASE_ADDRESS_MEM_MASK;
 }
 
+int intel_vgpu_res_alloc(struct intel_vgpu *vgpu);
+void intel_vgpu_res_free(struct intel_vgpu *vgpu);
+
 void intel_vgpu_clean_opregion(struct intel_vgpu *vgpu);
 int intel_vgpu_init_opregion(struct intel_vgpu *vgpu);
 int intel_vgpu_opregion_base_write_handler(struct intel_vgpu *vgpu, u32 gpa);
@@ -557,6 +568,9 @@ struct intel_gvt_ops {
 	int (*write_protect_handler)(struct intel_vgpu *, u64, void *,
 				     unsigned int);
 	void (*emulate_hotplug)(struct intel_vgpu *vgpu, bool connected);
+	int (*vgpu_delayed_alloc)(struct intel_vgpu *vgpu);
+	void (*vgpu_res_free)(struct intel_vgpu *vgpu);
+	int (*vgpu_res_change)(struct intel_vgpu *vgpu);
 };
 
 
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 074c4efb58eb..b1a4096c6c50 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -711,6 +711,7 @@ static int intel_vgpu_create(struct kobject *kobj, struct mdev_device *mdev)
 		gvt_err("failed to create intel vgpu: %d\n", ret);
 		goto out;
 	}
+	vgpu->type = type;
 
 	INIT_WORK(&kvmgt_vdev(vgpu)->release_work, intel_vgpu_release_work);
 
@@ -793,6 +794,8 @@ static int intel_vgpu_open(struct mdev_device *mdev)
 	unsigned long events;
 	int ret;
 
+	mutex_lock(&vgpu->vgpu_lock);
+
 	vdev->iommu_notifier.notifier_call = intel_vgpu_iommu_notifier;
 	vdev->group_notifier.notifier_call = intel_vgpu_group_notifier;
 
@@ -814,21 +817,32 @@ static int intel_vgpu_open(struct mdev_device *mdev)
 		goto undo_iommu;
 	}
 
+	if (vgpu->type->aggregation && vgpu->param.aggregation) {
+		ret = intel_gvt_ops->vgpu_delayed_alloc(vgpu);
+		if (ret)
+			goto undo_group;
+	}
+
 	/* Take a module reference as mdev core doesn't take
 	 * a reference for vendor driver.
 	 */
 	if (!try_module_get(THIS_MODULE))
-		goto undo_group;
+		goto free_res;
 
 	ret = kvmgt_guest_init(mdev);
-	if (ret)
-		goto undo_group;
+	if (ret) {
+		module_put(THIS_MODULE);
+		goto free_res;
+	}
 
 	intel_gvt_ops->vgpu_activate(vgpu);
 
 	atomic_set(&vdev->released, 0);
-	return ret;
+	goto out;
 
+free_res:
+	if (vgpu->type->aggregation && vgpu->param.aggregation)
+		intel_gvt_ops->vgpu_res_free(vgpu);
 undo_group:
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
 					&vdev->group_notifier);
@@ -837,6 +851,7 @@ static int intel_vgpu_open(struct mdev_device *mdev)
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
 					&vdev->iommu_notifier);
 out:
+	mutex_unlock(&vgpu->vgpu_lock);
 	return ret;
 }
 
@@ -1628,8 +1643,100 @@ static const struct attribute_group intel_vgpu_group = {
 	.attrs = intel_vgpu_attrs,
 };
 
+/*
+ * "max_aggregation" display maxium instances for aggregation,
+ * if type doesn't support aggregation, it displays 0
+ */
+static ssize_t
+max_aggregation_show(struct device *dev, struct device_attribute *attr,
+		     char *buf)
+{
+	struct mdev_device *mdev = mdev_from_dev(dev);
+
+	if (mdev) {
+		struct intel_vgpu *vgpu = (struct intel_vgpu *)
+			mdev_get_drvdata(mdev);
+		struct intel_vgpu_type *type = vgpu->type;
+		/* return left avail + current allocated as maxium */
+		unsigned int m = type->avail_instance +
+			vgpu->param.aggregation;
+		if (type->aggregation)
+			return sprintf(buf, "%u\n", m);
+	}
+	return sprintf(buf, "0\n");
+}
+static DEVICE_ATTR_RO(max_aggregation);
+
+static ssize_t
+aggregated_instances_store(struct device *dev, struct device_attribute *attr,
+			   const char *buf, size_t count)
+{
+	struct mdev_device *mdev = mdev_from_dev(dev);
+	unsigned long val;
+	ssize_t ret = -EINVAL;
+
+	if (kstrtoul(buf, 0, &val) < 0)
+		return ret;
+
+	if (val > 0 && mdev) {
+		struct intel_vgpu *vgpu = (struct intel_vgpu *)
+			mdev_get_drvdata(mdev);
+		struct intel_vgpu_type *type = vgpu->type;
+
+		if (val == vgpu->param.aggregation)
+			return count;
+
+		mutex_lock(&vgpu->vgpu_lock);
+		if (vgpu->active) {
+			mutex_unlock(&vgpu->vgpu_lock);
+			return ret;
+		}
+		/*
+		 * value should be less than maxium aggregation
+		 * instance number.
+		 */
+		if (type->aggregation &&
+		    val <= (vgpu->param.aggregation + type->avail_instance)) {
+			vgpu->param.aggregation = val;
+			intel_gvt_ops->vgpu_res_change(vgpu);
+			ret = count;
+		}
+		mutex_unlock(&vgpu->vgpu_lock);
+	}
+	return ret;
+}
+
+static ssize_t
+aggregated_instances_show(struct device *dev, struct device_attribute *attr,
+		 char *buf)
+{
+	struct mdev_device *mdev = mdev_from_dev(dev);
+
+	if (mdev) {
+		struct intel_vgpu *vgpu = (struct intel_vgpu *)
+			mdev_get_drvdata(mdev);
+		struct intel_vgpu_type *type = vgpu->type;
+		if (type->aggregation)
+			return sprintf(buf, "%u\n", vgpu->param.aggregation);
+	}
+	return sprintf(buf, "0\n");
+}
+static DEVICE_ATTR_RW(aggregated_instances);
+
+static struct attribute *mdev_attrs[] = {
+	&dev_attr_max_aggregation.attr,
+	&dev_attr_aggregated_instances.attr,
+	NULL
+};
+
+static const struct attribute_group mdev_group = {
+	.name = "mdev",
+	.attrs = mdev_attrs,
+};
+
 static const struct attribute_group *intel_vgpu_groups[] = {
 	&intel_vgpu_group,
+	&mdev_group,
 	NULL,
 };
 
diff --git a/drivers/gpu/drm/i915/gvt/vgpu.c b/drivers/gpu/drm/i915/gvt/vgpu.c
index 1d5ff88078bd..08a1b164e9a8 100644
--- a/drivers/gpu/drm/i915/gvt/vgpu.c
+++ b/drivers/gpu/drm/i915/gvt/vgpu.c
@@ -89,12 +89,13 @@ static struct {
 	unsigned int weight;
 	enum intel_vgpu_edid edid;
 	char *name;
+	bool aggregation;
 } vgpu_types[] = {
 /* Fixed vGPU type table */
-	{ MB_TO_BYTES(64), MB_TO_BYTES(384), 4, VGPU_WEIGHT(8), GVT_EDID_1024_768, "8" },
-	{ MB_TO_BYTES(128), MB_TO_BYTES(512), 4, VGPU_WEIGHT(4), GVT_EDID_1920_1200, "4" },
-	{ MB_TO_BYTES(256), MB_TO_BYTES(1024), 4, VGPU_WEIGHT(2), GVT_EDID_1920_1200, "2" },
-	{ MB_TO_BYTES(512), MB_TO_BYTES(2048), 4, VGPU_WEIGHT(1), GVT_EDID_1920_1200, "1" },
+	{ MB_TO_BYTES(64), MB_TO_BYTES(384), 4, VGPU_WEIGHT(8), GVT_EDID_1024_768, "8", true },
+	{ MB_TO_BYTES(128), MB_TO_BYTES(512), 4, VGPU_WEIGHT(4), GVT_EDID_1920_1200, "4", false },
+	{ MB_TO_BYTES(256), MB_TO_BYTES(1024), 4, VGPU_WEIGHT(2), GVT_EDID_1920_1200, "2", false },
+	{ MB_TO_BYTES(512), MB_TO_BYTES(2048), 4, VGPU_WEIGHT(1), GVT_EDID_1920_1200, "1", false },
 };
 
 /**
@@ -148,6 +149,7 @@ int intel_gvt_init_vgpu_types(struct intel_gvt *gvt)
 		gvt->types[i].resolution = vgpu_types[i].edid;
 		gvt->types[i].avail_instance = min(low_avail / vgpu_types[i].low_mm,
 						   high_avail / vgpu_types[i].high_mm);
+		gvt->types[i].aggregation = vgpu_types[i].aggregation;
 
 		if (IS_GEN(gvt->gt->i915, 8))
 			sprintf(gvt->types[i].name, "GVTg_V4_%s",
@@ -174,7 +176,7 @@ void intel_gvt_clean_vgpu_types(struct intel_gvt *gvt)
 	kfree(gvt->types);
 }
 
-static void intel_gvt_update_vgpu_types(struct intel_gvt *gvt)
+void intel_gvt_update_vgpu_types(struct intel_gvt *gvt)
 {
 	int i;
 	unsigned int low_gm_avail, high_gm_avail, fence_avail;
@@ -213,9 +215,7 @@ static void intel_gvt_update_vgpu_types(struct intel_gvt *gvt)
  */
 void intel_gvt_activate_vgpu(struct intel_vgpu *vgpu)
 {
-	mutex_lock(&vgpu->vgpu_lock);
 	vgpu->active = true;
-	mutex_unlock(&vgpu->vgpu_lock);
 }
 
 /**
@@ -259,6 +259,8 @@ void intel_gvt_release_vgpu(struct intel_vgpu *vgpu)
 	mutex_lock(&vgpu->vgpu_lock);
 	intel_vgpu_clean_workloads(vgpu, ALL_ENGINES);
 	intel_vgpu_dmabuf_cleanup(vgpu);
+	if (vgpu->type->aggregation)
+		intel_vgpu_res_free(vgpu);
 	mutex_unlock(&vgpu->vgpu_lock);
 }
 
@@ -290,10 +292,13 @@ void intel_gvt_destroy_vgpu(struct intel_vgpu *vgpu)
 	intel_vgpu_clean_submission(vgpu);
 	intel_vgpu_clean_display(vgpu);
 	intel_vgpu_clean_opregion(vgpu);
-	intel_vgpu_reset_ggtt(vgpu, true);
-	intel_vgpu_clean_gtt(vgpu);
+	if (vgpu->res_initialized) {
+		intel_vgpu_reset_ggtt(vgpu, true);
+		intel_vgpu_clean_gtt(vgpu);
+		intel_vgpu_free_resource(vgpu);
+	}
+	intel_vgpu_free_resource_count(vgpu);
 	intel_gvt_hypervisor_detach_vgpu(vgpu);
-	intel_vgpu_free_resource(vgpu);
 	intel_vgpu_clean_mmio(vgpu);
 	intel_vgpu_dmabuf_cleanup(vgpu);
 	mutex_unlock(&vgpu->vgpu_lock);
@@ -364,59 +369,85 @@ void intel_gvt_destroy_idle_vgpu(struct intel_vgpu *vgpu)
 	vfree(vgpu);
 }
 
-static struct intel_vgpu *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
-		struct intel_vgpu_creation_params *param)
+int intel_vgpu_res_alloc(struct intel_vgpu *vgpu)
 {
-	struct intel_vgpu *vgpu;
 	int ret;
 
-	gvt_dbg_core("handle %llu low %llu MB high %llu MB fence %llu\n",
-			param->handle, param->low_gm_sz, param->high_gm_sz,
-			param->fence_sz);
+	ret = intel_vgpu_alloc_resource(vgpu);
+	if (ret)
+		return ret;
 
-	vgpu = vzalloc(sizeof(*vgpu));
-	if (!vgpu)
-		return ERR_PTR(-ENOMEM);
+	populate_pvinfo_page(vgpu);
+
+	ret = intel_vgpu_init_gtt(vgpu);
+	if (ret) {
+		intel_vgpu_free_resource(vgpu);
+		return ret;
+	}
+	vgpu->res_initialized = true;
+	return 0;
+}
+
+void intel_vgpu_res_free(struct intel_vgpu *vgpu)
+{
+	intel_vgpu_reset_ggtt(vgpu, true);
+	intel_vgpu_clean_gtt(vgpu);
+	intel_vgpu_free_resource(vgpu);
+	vgpu->res_initialized = false;
+	mutex_lock(&vgpu->gvt->lock);
+	intel_gvt_update_vgpu_types(vgpu->gvt);
+	mutex_unlock(&vgpu->gvt->lock);
+}
+
+static int
+__intel_gvt_create_vgpu(struct intel_vgpu *vgpu,
+			bool delay_res_alloc)
+{
+	struct intel_gvt *gvt = vgpu->gvt;
+	struct intel_vgpu_gtt *gtt = &vgpu->gtt;
+	int ret;
 
 	ret = idr_alloc(&gvt->vgpu_idr, vgpu, IDLE_VGPU_IDR + 1, GVT_MAX_VGPU,
-		GFP_KERNEL);
+			GFP_KERNEL);
 	if (ret < 0)
-		goto out_free_vgpu;
-
+		return ret;
 	vgpu->id = ret;
-	vgpu->handle = param->handle;
-	vgpu->gvt = gvt;
-	vgpu->sched_ctl.weight = param->weight;
+
 	mutex_init(&vgpu->vgpu_lock);
 	mutex_init(&vgpu->dmabuf_lock);
 	INIT_LIST_HEAD(&vgpu->dmabuf_obj_list_head);
 	INIT_RADIX_TREE(&vgpu->page_track_tree, GFP_KERNEL);
 	idr_init(&vgpu->object_idr);
-	intel_vgpu_init_cfg_space(vgpu, param->primary);
+	INIT_RADIX_TREE(&gtt->spt_tree, GFP_KERNEL);
+	INIT_LIST_HEAD(&gtt->ppgtt_mm_list_head);
+	INIT_LIST_HEAD(&gtt->oos_page_list_head);
+	INIT_LIST_HEAD(&gtt->post_shadow_list_head);
+
+	intel_vgpu_init_cfg_space(vgpu, vgpu->param.primary);
 
 	ret = intel_vgpu_init_mmio(vgpu);
 	if (ret)
 		goto out_clean_idr;
 
-	ret = intel_vgpu_alloc_resource(vgpu, param);
+	ret = intel_vgpu_alloc_resource_count(vgpu);
 	if (ret)
 		goto out_clean_vgpu_mmio;
 
-	populate_pvinfo_page(vgpu);
+	if (!delay_res_alloc) {
+		ret = intel_vgpu_res_alloc(vgpu);
+		if (ret)
+			goto out_clean_vgpu_mmio;
+	}
 
 	ret = intel_gvt_hypervisor_attach_vgpu(vgpu);
 	if (ret)
 		goto out_clean_vgpu_resource;
 
-	ret = intel_vgpu_init_gtt(vgpu);
-	if (ret)
-		goto out_detach_hypervisor_vgpu;
-
 	ret = intel_vgpu_init_opregion(vgpu);
 	if (ret)
-		goto out_clean_gtt;
+		goto out_detach_hypervisor_vgpu;
 
-	ret = intel_vgpu_init_display(vgpu, param->resolution);
+	ret = intel_vgpu_init_display(vgpu, vgpu->param.resolution);
 	if (ret)
 		goto out_clean_opregion;
 
@@ -438,7 +469,7 @@ static struct intel_vgpu *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
 	if (ret)
 		goto out_clean_sched_policy;
 
-	return vgpu;
+	return 0;
 
 out_clean_sched_policy:
 	intel_vgpu_clean_sched_policy(vgpu);
@@ -448,19 +479,17 @@ static struct intel_vgpu *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
 	intel_vgpu_clean_display(vgpu);
 out_clean_opregion:
 	intel_vgpu_clean_opregion(vgpu);
-out_clean_gtt:
-	intel_vgpu_clean_gtt(vgpu);
 out_detach_hypervisor_vgpu:
 	intel_gvt_hypervisor_detach_vgpu(vgpu);
 out_clean_vgpu_resource:
+	intel_vgpu_clean_gtt(vgpu);
 	intel_vgpu_free_resource(vgpu);
+	intel_gvt_update_vgpu_types(vgpu->gvt);
 out_clean_vgpu_mmio:
 	intel_vgpu_clean_mmio(vgpu);
 out_clean_idr:
 	idr_remove(&gvt->vgpu_idr, vgpu->id);
-out_free_vgpu:
-	vfree(vgpu);
-	return ERR_PTR(ret);
+	return ret;
 }
 
 /**
@@ -474,33 +503,64 @@ static struct intel_vgpu *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
  * pointer to intel_vgpu, error pointer if failed.
  */
 struct intel_vgpu *intel_gvt_create_vgpu(struct intel_gvt *gvt,
-				struct intel_vgpu_type *type)
+					 struct intel_vgpu_type *type)
 {
-	struct intel_vgpu_creation_params param;
 	struct intel_vgpu *vgpu;
+	struct intel_vgpu_creation_params *param;
+	int ret;
+
+	vgpu = vzalloc(sizeof(*vgpu));
+	if (!vgpu)
+		return ERR_PTR(-ENOMEM);
 
-	param.handle = 0;
-	param.primary = 1;
-	param.low_gm_sz = type->low_gm_size;
-	param.high_gm_sz = type->high_gm_size;
-	param.fence_sz = type->fence;
-	param.weight = type->weight;
-	param.resolution = type->resolution;
+	param = &vgpu->param;
+	param->handle = 0;
+	param->primary = 1;
+	param->low_gm_sz = type->low_gm_size;
+	param->high_gm_sz = type->high_gm_size;
+	param->fence_sz = type->fence;
+	param->weight = type->weight;
+	param->resolution = type->resolution;
 
-	/* XXX current param based on MB */
-	param.low_gm_sz = BYTES_TO_MB(param.low_gm_sz);
-	param.high_gm_sz = BYTES_TO_MB(param.high_gm_sz);
+	gvt_dbg_core("handle %llu low %llu MB high %llu MB fence %llu\n",
+		     param->handle, BYTES_TO_MB(param->low_gm_sz),
+		     BYTES_TO_MB(param->high_gm_sz),
+		     param->fence_sz);
+
+	vgpu->handle = param->handle;
+	vgpu->gvt = gvt;
+	vgpu->sched_ctl.weight = param->weight;
+	param->aggregation = 1;
 
 	mutex_lock(&gvt->lock);
-	vgpu = __intel_gvt_create_vgpu(gvt, &param);
-	if (!IS_ERR(vgpu))
-		/* calculate left instance change for types */
-		intel_gvt_update_vgpu_types(gvt);
+	ret = __intel_gvt_create_vgpu(vgpu, type->aggregation ? true : false);
+	if (ret) {
+		mutex_unlock(&gvt->lock);
+		vfree(vgpu);
+		return ERR_PTR(ret);
+	}
+	/* calculate left instance change for types */
+	intel_gvt_update_vgpu_types(vgpu->gvt);
 	mutex_unlock(&gvt->lock);
 
 	return vgpu;
 }
 
+int intel_vgpu_delayed_alloc(struct intel_vgpu *vgpu)
+{
+	int ret;
+
+	intel_vgpu_adjust_resource_count(vgpu);
+	ret = intel_vgpu_res_alloc(vgpu);
+	if (ret)
+		return -EINVAL;
+
+	mutex_lock(&vgpu->gvt->lock);
+	intel_gvt_update_vgpu_types(vgpu->gvt);
+	mutex_unlock(&vgpu->gvt->lock);
+	return 0;
+}
+
 /**
  * intel_gvt_reset_vgpu_locked - reset a virtual GPU by DMLR or GT reset
  * @vgpu: virtual GPU
-- 
2.25.1



* RE: [PATCH v2 1/2] Documentation/driver-api/vfio-mediated-device.rst: update for aggregation support
  2020-03-26  5:41 ` [PATCH v2 1/2] Documentation/driver-api/vfio-mediated-device.rst: update for aggregation support Zhenyu Wang
@ 2020-03-26  8:17   ` Tian, Kevin
  2020-03-26  8:21     ` Zhenyu Wang
  0 siblings, 1 reply; 34+ messages in thread
From: Tian, Kevin @ 2020-03-26  8:17 UTC (permalink / raw)
  To: Zhenyu Wang, alex.williamson; +Cc: kvm, intel-gvt-dev, Jiang, Dave

> From: Zhenyu Wang <zhenyuw@linux.intel.com>
> Sent: Thursday, March 26, 2020 1:42 PM
> 
> Update doc for mdev aggregation support. Describe mdev generic
> parameter directory under mdev device directory.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: "Jiang, Dave" <dave.jiang@intel.com>
> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
> ---
>  .../driver-api/vfio-mediated-device.rst       | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/Documentation/driver-api/vfio-mediated-device.rst
> b/Documentation/driver-api/vfio-mediated-device.rst
> index 25eb7d5b834b..29c29432a847 100644
> --- a/Documentation/driver-api/vfio-mediated-device.rst
> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> @@ -269,6 +269,9 @@ Directories and Files Under the sysfs for Each mdev
> Device
>    |--- [$MDEV_UUID]
>           |--- remove
>           |--- mdev_type {link to its type}
> +         |--- mdev [optional]
> +	     |--- aggregated_instances [optional]
> +	     |--- max_aggregation [optional]
>           |--- vendor-specific-attributes [optional]
> 
>  * remove (write only)
> @@ -281,6 +284,22 @@ Example::
> 
>  	# echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove
> 
> +* mdev directory (optional)

It sounds confusing to me when seeing a 'mdev' directory under a
mdev instance. How could one tell which attribute should put inside
or outside of 'mdev'?

> +
> +Vendor driver could create mdev directory to specify extra generic
> parameters
> +on mdev device by its type. Currently aggregation parameters are defined.
> +Vendor driver should provide both items to support.
> +
> +1) aggregated_instances (read/write)
> +
> +Set target aggregated instances for device. Reading will show current
> +count of aggregated instances. Writing value larger than max_aggregation
> +would fail and return error.

Can one write a value multiple-times and at any time? 

> +
> +2) max_aggregation (read only)
> +
> +Show maxium instances for aggregation.
> +

"show maximum-allowed instances which can be aggregated for this device". is
this value static or dynamic? if dynamic then the user is expected to read this
field before every write. worthy of some clarification here.

>  Mediated device Hot plug
>  ------------------------
> 
> --
> 2.25.1



* Re: [PATCH v2 1/2] Documentation/driver-api/vfio-mediated-device.rst: update for aggregation support
  2020-03-26  8:17   ` Tian, Kevin
@ 2020-03-26  8:21     ` Zhenyu Wang
  2020-03-27  6:16       ` Tian, Kevin
  0 siblings, 1 reply; 34+ messages in thread
From: Zhenyu Wang @ 2020-03-26  8:21 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: alex.williamson, kvm, intel-gvt-dev, Jiang, Dave


On 2020.03.26 08:17:20 +0000, Tian, Kevin wrote:
> > From: Zhenyu Wang <zhenyuw@linux.intel.com>
> > Sent: Thursday, March 26, 2020 1:42 PM
> > 
> > Update doc for mdev aggregation support. Describe mdev generic
> > parameter directory under mdev device directory.
> > 
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: "Jiang, Dave" <dave.jiang@intel.com>
> > Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
> > ---
> >  .../driver-api/vfio-mediated-device.rst       | 19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> > 
> > diff --git a/Documentation/driver-api/vfio-mediated-device.rst
> > b/Documentation/driver-api/vfio-mediated-device.rst
> > index 25eb7d5b834b..29c29432a847 100644
> > --- a/Documentation/driver-api/vfio-mediated-device.rst
> > +++ b/Documentation/driver-api/vfio-mediated-device.rst
> > @@ -269,6 +269,9 @@ Directories and Files Under the sysfs for Each mdev
> > Device
> >    |--- [$MDEV_UUID]
> >           |--- remove
> >           |--- mdev_type {link to its type}
> > +         |--- mdev [optional]
> > +	     |--- aggregated_instances [optional]
> > +	     |--- max_aggregation [optional]
> >           |--- vendor-specific-attributes [optional]
> > 
> >  * remove (write only)
> > @@ -281,6 +284,22 @@ Example::
> > 
> >  	# echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove
> > 
> > +* mdev directory (optional)
> 
> It sounds confusing to me when seeing a 'mdev' directory under a
> mdev instance. How could one tell which attribute should put inside
> or outside of 'mdev'?
>

After mdev create you get uuid directory under normal device path, so
from that point a 'mdev' directory can just tell this is a mdev
device. And it's proposed by Alex before.

Currently only mdev core could create attribute e.g 'remove' under
device dir, vendor specific attrs need to be in attrs group. So 'mdev'
directory here tries to be optional generic interface.

> > +
> > +Vendor driver could create mdev directory to specify extra generic
> > parameters
> > +on mdev device by its type. Currently aggregation parameters are defined.
> > +Vendor driver should provide both items to support.
> > +
> > +1) aggregated_instances (read/write)
> > +
> > +Set target aggregated instances for device. Reading will show current
> > +count of aggregated instances. Writing value larger than max_aggregation
> > +would fail and return error.
> 
> Can one write a value multiple-times and at any time? 
>

yeah, of coz multiple times, but normally won't succeed after open.

> > +
> > +2) max_aggregation (read only)
> > +
> > +Show maxium instances for aggregation.
> > +
> 
> "show maximum-allowed instances which can be aggregated for this device". is
> this value static or dynamic? if dynamic then the user is expected to read this
> field before every write. worthy of some clarification here.

yeah, user needs to read this before setting actual number, either static or dynamic
depends on vendor resource type.

Thanks

> 
> >  Mediated device Hot plug
> >  ------------------------
> > 
> > --
> > 2.25.1
> 

-- 
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827



* RE: [PATCH v2 1/2] Documentation/driver-api/vfio-mediated-device.rst: update for aggregation support
  2020-03-26  8:21     ` Zhenyu Wang
@ 2020-03-27  6:16       ` Tian, Kevin
  2020-03-27  6:21         ` Zhenyu Wang
  0 siblings, 1 reply; 34+ messages in thread
From: Tian, Kevin @ 2020-03-27  6:16 UTC (permalink / raw)
  To: Zhenyu Wang; +Cc: Jiang, Dave, alex.williamson, intel-gvt-dev, kvm

> From: Zhenyu Wang
> Sent: Thursday, March 26, 2020 4:22 PM
> 
> On 2020.03.26 08:17:20 +0000, Tian, Kevin wrote:
> > > From: Zhenyu Wang <zhenyuw@linux.intel.com>
> > > Sent: Thursday, March 26, 2020 1:42 PM
> > >
> > > Update doc for mdev aggregation support. Describe mdev generic
> > > parameter directory under mdev device directory.
> > >
> > > Cc: Kevin Tian <kevin.tian@intel.com>
> > > Cc: "Jiang, Dave" <dave.jiang@intel.com>
> > > Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
> > > ---
> > >  .../driver-api/vfio-mediated-device.rst       | 19 +++++++++++++++++++
> > >  1 file changed, 19 insertions(+)
> > >
> > > diff --git a/Documentation/driver-api/vfio-mediated-device.rst
> > > b/Documentation/driver-api/vfio-mediated-device.rst
> > > index 25eb7d5b834b..29c29432a847 100644
> > > --- a/Documentation/driver-api/vfio-mediated-device.rst
> > > +++ b/Documentation/driver-api/vfio-mediated-device.rst
> > > @@ -269,6 +269,9 @@ Directories and Files Under the sysfs for Each
> mdev
> > > Device
> > >    |--- [$MDEV_UUID]
> > >           |--- remove
> > >           |--- mdev_type {link to its type}
> > > +         |--- mdev [optional]
> > > +	     |--- aggregated_instances [optional]
> > > +	     |--- max_aggregation [optional]
> > >           |--- vendor-specific-attributes [optional]
> > >
> > >  * remove (write only)
> > > @@ -281,6 +284,22 @@ Example::
> > >
> > >  	# echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove
> > >
> > > +* mdev directory (optional)
> >
> > It sounds confusing to me when seeing a 'mdev' directory under a
> > mdev instance. How could one tell which attribute should put inside
> > or outside of 'mdev'?
> >
> 
> After mdev create you get uuid directory under normal device path, so
> from that point a 'mdev' directory can just tell this is a mdev
> device. And it's proposed by Alex before.

I didn't quite get. Isn't $MDEV_UUID plus mdev_type already tell this is
a mdev device? If it is insufficient, then we're broken already since there
is no such 'mdev' sub-directory before.

Alex?

> 
> Currently only mdev core could create attribute e.g 'remove' under
> device dir, vendor specific attrs need to be in attrs group. So 'mdev'
> directory here tries to be optional generic interface.

I'm a bit confused. Then why cannot the new nodes exposed through
vendor specific attributes? I may overlook previous discussion why using
attrs group doesn't work here. 😊

> 
> > > +
> > > +Vendor driver could create mdev directory to specify extra generic
> > > parameters
> > > +on mdev device by its type. Currently aggregation parameters are
> defined.
> > > +Vendor driver should provide both items to support.
> > > +
> > > +1) aggregated_instances (read/write)
> > > +
> > > +Set target aggregated instances for device. Reading will show current
> > > +count of aggregated instances. Writing value larger than
> max_aggregation
> > > +would fail and return error.
> >
> > Can one write a value multiple-times and at any time?
> >
> 
> yeah, of coz multiple times, but normally won't succeed after open.
> 
> > > +
> > > +2) max_aggregation (read only)
> > > +
> > > +Show maxium instances for aggregation.
> > > +
> >
> > "show maximum-allowed instances which can be aggregated for this
> device". is
> > this value static or dynamic? if dynamic then the user is expected to read
> this
> > field before every write. worthy of some clarification here.
> 
> yeah, user needs to read this before setting actual number, either static or
> dynamic
> depends on vendor resource type.

Then adding above information might make the description clearer.

Thanks
Kevin

> 
> Thanks
> 
> >
> > >  Mediated device Hot plug
> > >  ------------------------
> > >
> > > --
> > > 2.25.1
> >
> 
> --
> Open Source Technology Center, Intel ltd.
> 
> $gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827


* Re: [PATCH v2 1/2] Documentation/driver-api/vfio-mediated-device.rst: update for aggregation support
  2020-03-27  6:16       ` Tian, Kevin
@ 2020-03-27  6:21         ` Zhenyu Wang
  0 siblings, 0 replies; 34+ messages in thread
From: Zhenyu Wang @ 2020-03-27  6:21 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: intel-gvt-dev, alex.williamson, Jiang, Dave, kvm


On 2020.03.27 06:16:11 +0000, Tian, Kevin wrote:
> > From: Zhenyu Wang
> > Sent: Thursday, March 26, 2020 4:22 PM
> > 
> > On 2020.03.26 08:17:20 +0000, Tian, Kevin wrote:
> > > > From: Zhenyu Wang <zhenyuw@linux.intel.com>
> > > > Sent: Thursday, March 26, 2020 1:42 PM
> > > >
> > > > Update doc for mdev aggregation support. Describe mdev generic
> > > > parameter directory under mdev device directory.
> > > >
> > > > Cc: Kevin Tian <kevin.tian@intel.com>
> > > > Cc: "Jiang, Dave" <dave.jiang@intel.com>
> > > > Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
> > > > ---
> > > >  .../driver-api/vfio-mediated-device.rst       | 19 +++++++++++++++++++
> > > >  1 file changed, 19 insertions(+)
> > > >
> > > > diff --git a/Documentation/driver-api/vfio-mediated-device.rst
> > > > b/Documentation/driver-api/vfio-mediated-device.rst
> > > > index 25eb7d5b834b..29c29432a847 100644
> > > > --- a/Documentation/driver-api/vfio-mediated-device.rst
> > > > +++ b/Documentation/driver-api/vfio-mediated-device.rst
> > > > @@ -269,6 +269,9 @@ Directories and Files Under the sysfs for Each
> > mdev
> > > > Device
> > > >    |--- [$MDEV_UUID]
> > > >           |--- remove
> > > >           |--- mdev_type {link to its type}
> > > > +         |--- mdev [optional]
> > > > +	     |--- aggregated_instances [optional]
> > > > +	     |--- max_aggregation [optional]
> > > >           |--- vendor-specific-attributes [optional]
> > > >
> > > >  * remove (write only)
> > > > @@ -281,6 +284,22 @@ Example::
> > > >
> > > >  	# echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove
> > > >
> > > > +* mdev directory (optional)
> > >
> > > It sounds confusing to me when seeing a 'mdev' directory under a
> > > mdev instance. How could one tell which attribute should put inside
> > > or outside of 'mdev'?
> > >
> > 
> > After mdev create you get uuid directory under normal device path, so
> > from that point a 'mdev' directory can just tell this is a mdev
> > device. And it's proposed by Alex before.
> 
> I didn't quite get. Isn't $MDEV_UUID plus mdev_type already tell this is
> a mdev device? If it is insufficient, then we're broken already since there
> is no such 'mdev' sub-directory before.
>

yep, I ignored the mdev_type link there. The original purpose is to have
a generally agreed directory in which to put generic parameters for mdev.

> Alex?
> 
> > 
> > Currently only mdev core could create attribute e.g 'remove' under
> > device dir, vendor specific attrs need to be in attrs group. So 'mdev'
> > directory here tries to be optional generic interface.
> 
> I'm a bit confused. Then why cannot the new nodes exposed through
> vendor specific attributes? I may overlook previous discussion why using
> attrs group doesn't work here. 😊

A vendor driver, e.g. vfio-ccw or a future SIOV driver, is free to have
custom attributes for its mdev resource definition. If you choose that
way, fine, it's just defined by the vendor. But if it utilizes the common
mdev parameter attributes, you get a commonly defined resource
configuration method instead.
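
E.g. a generic management tool could just probe for the common directory
and only fall back to vendor-specific attributes when it is absent (a
rough sketch, paths and values illustrative):

    dev=/sys/bus/mdev/devices/<UUID>
    if [ -f "$dev/mdev/max_aggregation" ]; then
        # common interface, same handling for any vendor driver
        cat "$dev/mdev/max_aggregation"
        echo 2 > "$dev/mdev/aggregated_instances"
    else
        # otherwise fall back to whatever vendor-specific attributes exist
        ls "$dev"
    fi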

> 
> > 
> > > > +
> > > > +Vendor driver could create mdev directory to specify extra generic
> > > > parameters
> > > > +on mdev device by its type. Currently aggregation parameters are
> > defined.
> > > > +Vendor driver should provide both items to support.
> > > > +
> > > > +1) aggregated_instances (read/write)
> > > > +
> > > > +Set target aggregated instances for device. Reading will show current
> > > > +count of aggregated instances. Writing value larger than
> > max_aggregation
> > > > +would fail and return error.
> > >
> > > Can one write a value multiple-times and at any time?
> > >
> > 
> > yeah, of coz multiple times, but normally won't succeed after open.
> > 
> > > > +
> > > > +2) max_aggregation (read only)
> > > > +
> > > > +Show maxium instances for aggregation.
> > > > +
> > >
> > > "show maximum-allowed instances which can be aggregated for this
> > device". is
> > > this value static or dynamic? if dynamic then the user is expected to read
> > this
> > > field before every write. worthy of some clarification here.
> > 
> > yeah, user needs to read this before setting actual number, either static or
> > dynamic
> > depends on vendor resource type.
> 
> Then adding above information might make the description clearer.
>

Sure.

Thanks

> > > >  Mediated device Hot plug
> > > >  ------------------------
> > > >
> > > > --
> > > > 2.25.1
> > >
> > 
> > --
> > Open Source Technology Center, Intel ltd.
> > 
> > $gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827
> _______________________________________________
> intel-gvt-dev mailing list
> intel-gvt-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev

-- 
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827



* RE: [PATCH v2 2/2] drm/i915/gvt: mdev aggregation type
  2020-03-26  5:41 ` [PATCH v2 2/2] drm/i915/gvt: mdev aggregation type Zhenyu Wang
@ 2020-03-27  7:48   ` Tian, Kevin
  2020-03-27  8:12     ` Zhenyu Wang
  0 siblings, 1 reply; 34+ messages in thread
From: Tian, Kevin @ 2020-03-27  7:48 UTC (permalink / raw)
  To: Zhenyu Wang, alex.williamson; +Cc: kvm, intel-gvt-dev, Jiang, Dave, Yuan, Hang

> From: Zhenyu Wang <zhenyuw@linux.intel.com>
> Sent: Thursday, March 26, 2020 1:42 PM
> 
> To increase the flexibility for resource allocation on mdev types,f

remove trailing ',f'

> this trys to add aggregation attribute for mdev type that can
> aggregate gfx memory resources in GVT case.
> 
> Use sysfs method for this attribute, the most difference for GVT is
> that the gfx resource allocation will be delayed until mdev open
> instead of mdev create to allow aggregation setting before VM
> start. But gfx resource accouting would still handle left resource for
> target vGPU or other types.

the last sentence is not very clear. I suppose that although the resource
(#aggregated_instances) is not allocated yet, it is already reserved
and accounted before mdev open?

> 
> "max_aggregation" would show maxium instances for aggregation on
> target vGPU type. "aggregated_instances" shows current count of aggregated

"target vGPU type" or "target vGPU instance"?

> instances. "max_aggregation" is read-only and user sets needed account

account->count

> by write to "aggregated_instances".
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: "Jiang, Dave" <dave.jiang@intel.com>
> Cc: "Yuan, Hang" <hang.yuan@intel.com>
> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gvt/aperture_gm.c |  44 +++++--
>  drivers/gpu/drm/i915/gvt/gtt.c         |   9 +-
>  drivers/gpu/drm/i915/gvt/gvt.c         |   7 +-
>  drivers/gpu/drm/i915/gvt/gvt.h         |  42 ++++--
>  drivers/gpu/drm/i915/gvt/kvmgt.c       | 115 ++++++++++++++++-
>  drivers/gpu/drm/i915/gvt/vgpu.c        | 172 +++++++++++++++++--------
>  6 files changed, 295 insertions(+), 94 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/aperture_gm.c
> b/drivers/gpu/drm/i915/gvt/aperture_gm.c
> index 0d6d59871308..9ee3083b37ae 100644
> --- a/drivers/gpu/drm/i915/gvt/aperture_gm.c
> +++ b/drivers/gpu/drm/i915/gvt/aperture_gm.c
> @@ -238,10 +238,10 @@ static void free_resource(struct intel_vgpu *vgpu)
>  	gvt->fence.vgpu_allocated_fence_num -= vgpu_fence_sz(vgpu);
>  }
> 
> -static int alloc_resource(struct intel_vgpu *vgpu,
> -		struct intel_vgpu_creation_params *param)
> +static int alloc_resource(struct intel_vgpu *vgpu)
>  {
>  	struct intel_gvt *gvt = vgpu->gvt;
> +	struct intel_vgpu_creation_params *param = &vgpu->param;
>  	unsigned long request, avail, max, taken;
>  	const char *item;
> 
> @@ -254,7 +254,7 @@ static int alloc_resource(struct intel_vgpu *vgpu,
>  	max = gvt_aperture_sz(gvt) - HOST_LOW_GM_SIZE;
>  	taken = gvt->gm.vgpu_allocated_low_gm_size;
>  	avail = max - taken;
> -	request = MB_TO_BYTES(param->low_gm_sz);
> +	request = param->low_gm_sz * param->aggregation;
> 
>  	if (request > avail)
>  		goto no_enough_resource;
> @@ -265,7 +265,7 @@ static int alloc_resource(struct intel_vgpu *vgpu,
>  	max = gvt_hidden_sz(gvt) - HOST_HIGH_GM_SIZE;
>  	taken = gvt->gm.vgpu_allocated_high_gm_size;
>  	avail = max - taken;
> -	request = MB_TO_BYTES(param->high_gm_sz);
> +	request = param->high_gm_sz * param->aggregation;
> 
>  	if (request > avail)
>  		goto no_enough_resource;
> @@ -283,8 +283,8 @@ static int alloc_resource(struct intel_vgpu *vgpu,
> 
>  	vgpu_fence_sz(vgpu) = request;
> 
> -	gvt->gm.vgpu_allocated_low_gm_size += MB_TO_BYTES(param-
> >low_gm_sz);
> -	gvt->gm.vgpu_allocated_high_gm_size += MB_TO_BYTES(param-
> >high_gm_sz);
> +	gvt->gm.vgpu_allocated_low_gm_size += param->low_gm_sz *
> param->aggregation;
> +	gvt->gm.vgpu_allocated_high_gm_size += param->high_gm_sz *
> param->aggregation;
>  	gvt->fence.vgpu_allocated_fence_num += param->fence_sz;
>  	return 0;
> 
> @@ -307,9 +307,34 @@ void intel_vgpu_free_resource(struct intel_vgpu
> *vgpu)
>  {
>  	free_vgpu_gm(vgpu);
>  	free_vgpu_fence(vgpu);
> +}
> +
> +void intel_vgpu_free_resource_count(struct intel_vgpu *vgpu)
> +{
>  	free_resource(vgpu);
>  }
> 
> +int intel_vgpu_alloc_resource_count(struct intel_vgpu *vgpu)
> +{
> +	return alloc_resource(vgpu);
> +}
> +
> +int intel_vgpu_adjust_resource_count(struct intel_vgpu *vgpu)
> +{
> +	if ((vgpu_aperture_sz(vgpu) != vgpu->param.low_gm_sz *
> +	     vgpu->param.aggregation) ||
> +	    (vgpu_hidden_sz(vgpu) != vgpu->param.high_gm_sz *
> +	     vgpu->param.aggregation)) {
> +		/* handle aggregation change */
> +		intel_vgpu_free_resource_count(vgpu);
> +		intel_vgpu_alloc_resource_count(vgpu);

this logic sounds different from the commit msg. Earlier you
said the resource is not allocated until mdev open, while
aggregated_instances is only allowed to be written before
mdev open. In that case, why is it required to handle the
case where a vgpu already has resources allocated and then a
new request comes in to adjust the number of instances?

> +		mutex_lock(&vgpu->gvt->lock);
> +		intel_gvt_update_vgpu_types(vgpu->gvt);
> +		mutex_unlock(&vgpu->gvt->lock);
> +	}
> +	return 0;
> +}
> +
>  /**
>   * intel_vgpu_reset_resource - reset resource state owned by a vGPU
>   * @vgpu: a vGPU
> @@ -338,15 +363,10 @@ void intel_vgpu_reset_resource(struct intel_vgpu
> *vgpu)
>   * zero on success, negative error code if failed.
>   *
>   */
> -int intel_vgpu_alloc_resource(struct intel_vgpu *vgpu,
> -		struct intel_vgpu_creation_params *param)
> +int intel_vgpu_alloc_resource(struct intel_vgpu *vgpu)
>  {
>  	int ret;
> 
> -	ret = alloc_resource(vgpu, param);
> -	if (ret)
> -		return ret;
> -
>  	ret = alloc_vgpu_gm(vgpu);
>  	if (ret)
>  		goto out_free_resource;
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> index 2a4b23f8aa74..60f455134ed0 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.c
> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> @@ -2466,12 +2466,6 @@ int intel_vgpu_init_gtt(struct intel_vgpu *vgpu)
>  {
>  	struct intel_vgpu_gtt *gtt = &vgpu->gtt;
> 
> -	INIT_RADIX_TREE(&gtt->spt_tree, GFP_KERNEL);
> -
> -	INIT_LIST_HEAD(&gtt->ppgtt_mm_list_head);
> -	INIT_LIST_HEAD(&gtt->oos_page_list_head);
> -	INIT_LIST_HEAD(&gtt->post_shadow_list_head);
> -
>  	gtt->ggtt_mm = intel_vgpu_create_ggtt_mm(vgpu);
>  	if (IS_ERR(gtt->ggtt_mm)) {
>  		gvt_vgpu_err("fail to create mm for ggtt.\n");
> @@ -2508,6 +2502,9 @@ static void intel_vgpu_destroy_ggtt_mm(struct
> intel_vgpu *vgpu)
>  {
>  	struct intel_gvt_partial_pte *pos, *next;
> 
> +	if (!vgpu->gtt.ggtt_mm)
> +		return;
> +
>  	list_for_each_entry_safe(pos, next,
>  				 &vgpu->gtt.ggtt_mm-
> >ggtt_mm.partial_pte_list,
>  				 list) {
> diff --git a/drivers/gpu/drm/i915/gvt/gvt.c b/drivers/gpu/drm/i915/gvt/gvt.c
> index 9e1787867894..bbd77cba239e 100644
> --- a/drivers/gpu/drm/i915/gvt/gvt.c
> +++ b/drivers/gpu/drm/i915/gvt/gvt.c
> @@ -99,11 +99,11 @@ static ssize_t description_show(struct kobject *kobj,
> struct device *dev,
> 
>  	return sprintf(buf, "low_gm_size: %dMB\nhigh_gm_size: %dMB\n"
>  		       "fence: %d\nresolution: %s\n"
> -		       "weight: %d\n",
> +		       "weight: %d\naggregation: %s\n",
>  		       BYTES_TO_MB(type->low_gm_size),
>  		       BYTES_TO_MB(type->high_gm_size),
>  		       type->fence, vgpu_edid_str(type->resolution),
> -		       type->weight);
> +		       type->weight, type->aggregation ? "yes" : "no");
>  }
> 
>  static MDEV_TYPE_ATTR_RO(available_instances);
> @@ -185,6 +185,9 @@ static const struct intel_gvt_ops intel_gvt_ops = {
>  	.vgpu_get_dmabuf = intel_vgpu_get_dmabuf,
>  	.write_protect_handler = intel_vgpu_page_track_handler,
>  	.emulate_hotplug = intel_vgpu_emulate_hotplug,
> +	.vgpu_delayed_alloc = intel_vgpu_delayed_alloc,
> +	.vgpu_res_free = intel_vgpu_res_free,
> +	.vgpu_res_change = intel_vgpu_adjust_resource_count,
>  };
> 
>  static void init_device_info(struct intel_gvt *gvt)
> diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
> index 58c2c7932e3f..bb8f16d468f4 100644
> --- a/drivers/gpu/drm/i915/gvt/gvt.h
> +++ b/drivers/gpu/drm/i915/gvt/gvt.h
> @@ -165,15 +165,30 @@ struct intel_vgpu_submission {
>  	bool active;
>  };
> 
> +struct intel_vgpu_creation_params {
> +	__u64 handle;
> +	__u64 low_gm_sz;
> +	__u64 high_gm_sz;
> +	__u64 fence_sz;
> +	__u64 resolution;
> +	__s32 primary;
> +	__u64 vgpu_id;
> +	__u32 weight;
> +	__u32 aggregation; /* requested aggregation number if type
> supports */
> +};
> +
>  struct intel_vgpu {
>  	struct intel_gvt *gvt;
>  	struct mutex vgpu_lock;
>  	int id;
>  	unsigned long handle; /* vGPU handle used by hypervisor MPT
> modules */
>  	bool active;
> +	bool res_initialized;
>  	bool pv_notified;
>  	bool failsafe;
>  	unsigned int resetting_eng;
> +	struct intel_vgpu_creation_params param;
> +	struct intel_vgpu_type *type;
> 
>  	/* Both sched_data and sched_ctl can be seen a part of the global gvt
>  	 * scheduler structure. So below 2 vgpu data are protected
> @@ -276,6 +291,7 @@ struct intel_vgpu_type {
>  	unsigned int fence;
>  	unsigned int weight;
>  	enum intel_vgpu_edid resolution;
> +	bool aggregation;
>  };
> 
>  struct intel_gvt {
> @@ -402,22 +418,12 @@ int intel_gvt_load_firmware(struct intel_gvt *gvt);
>  #define vgpu_fence_base(vgpu) (vgpu->fence.base)
>  #define vgpu_fence_sz(vgpu) (vgpu->fence.size)
> 
> -struct intel_vgpu_creation_params {
> -	__u64 handle;
> -	__u64 low_gm_sz;  /* in MB */
> -	__u64 high_gm_sz; /* in MB */
> -	__u64 fence_sz;
> -	__u64 resolution;
> -	__s32 primary;
> -	__u64 vgpu_id;
> -
> -	__u32 weight;
> -};
> -
> -int intel_vgpu_alloc_resource(struct intel_vgpu *vgpu,
> -			      struct intel_vgpu_creation_params *param);
> +int intel_vgpu_alloc_resource(struct intel_vgpu *vgpu);
>  void intel_vgpu_reset_resource(struct intel_vgpu *vgpu);
>  void intel_vgpu_free_resource(struct intel_vgpu *vgpu);
> +int intel_vgpu_alloc_resource_count(struct intel_vgpu *vgpu);
> +void intel_vgpu_free_resource_count(struct intel_vgpu *vgpu);
> +int intel_vgpu_adjust_resource_count(struct intel_vgpu *vgpu);
>  void intel_vgpu_write_fence(struct intel_vgpu *vgpu,
>  	u32 fence, u64 value);
> 
> @@ -458,11 +464,13 @@ static inline void intel_vgpu_write_pci_bar(struct
> intel_vgpu *vgpu,
> 
>  int intel_gvt_init_vgpu_types(struct intel_gvt *gvt);
>  void intel_gvt_clean_vgpu_types(struct intel_gvt *gvt);
> +void intel_gvt_update_vgpu_types(struct intel_gvt *gvt);
> 
>  struct intel_vgpu *intel_gvt_create_idle_vgpu(struct intel_gvt *gvt);
>  void intel_gvt_destroy_idle_vgpu(struct intel_vgpu *vgpu);
>  struct intel_vgpu *intel_gvt_create_vgpu(struct intel_gvt *gvt,
>  					 struct intel_vgpu_type *type);
> +int intel_vgpu_delayed_alloc(struct intel_vgpu *vgpu);
>  void intel_gvt_destroy_vgpu(struct intel_vgpu *vgpu);
>  void intel_gvt_release_vgpu(struct intel_vgpu *vgpu);
>  void intel_gvt_reset_vgpu_locked(struct intel_vgpu *vgpu, bool dmlr,
> @@ -523,6 +531,9 @@ static inline u64 intel_vgpu_get_bar_gpa(struct
> intel_vgpu *vgpu, int bar)
>  			PCI_BASE_ADDRESS_MEM_MASK;
>  }
> 
> +int intel_vgpu_res_alloc(struct intel_vgpu *vgpu);
> +void intel_vgpu_res_free(struct intel_vgpu *vgpu);
> +
>  void intel_vgpu_clean_opregion(struct intel_vgpu *vgpu);
>  int intel_vgpu_init_opregion(struct intel_vgpu *vgpu);
>  int intel_vgpu_opregion_base_write_handler(struct intel_vgpu *vgpu, u32
> gpa);
> @@ -557,6 +568,9 @@ struct intel_gvt_ops {
>  	int (*write_protect_handler)(struct intel_vgpu *, u64, void *,
>  				     unsigned int);
>  	void (*emulate_hotplug)(struct intel_vgpu *vgpu, bool connected);
> +	int (*vgpu_delayed_alloc)(struct intel_vgpu *vgpu);
> +	void (*vgpu_res_free)(struct intel_vgpu *vgpu);
> +	int (*vgpu_res_change)(struct intel_vgpu *vgpu);
>  };
> 
> 
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c
> b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 074c4efb58eb..b1a4096c6c50 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -711,6 +711,7 @@ static int intel_vgpu_create(struct kobject *kobj,
> struct mdev_device *mdev)
>  		gvt_err("failed to create intel vgpu: %d\n", ret);
>  		goto out;
>  	}
> +	vgpu->type = type;
> 
>  	INIT_WORK(&kvmgt_vdev(vgpu)->release_work,
> intel_vgpu_release_work);
> 
> @@ -793,6 +794,8 @@ static int intel_vgpu_open(struct mdev_device *mdev)
>  	unsigned long events;
>  	int ret;
> 
> +	mutex_lock(&vgpu->vgpu_lock);
> +
>  	vdev->iommu_notifier.notifier_call = intel_vgpu_iommu_notifier;
>  	vdev->group_notifier.notifier_call = intel_vgpu_group_notifier;
> 
> @@ -814,21 +817,32 @@ static int intel_vgpu_open(struct mdev_device
> *mdev)
>  		goto undo_iommu;
>  	}
> 
> +	if (vgpu->type->aggregation && vgpu->param.aggregation) {
> +		ret = intel_gvt_ops->vgpu_delayed_alloc(vgpu);
> +		if (ret)
> +			goto undo_group;
> +	}
> +
>  	/* Take a module reference as mdev core doesn't take
>  	 * a reference for vendor driver.
>  	 */
>  	if (!try_module_get(THIS_MODULE))
> -		goto undo_group;
> +		goto free_res;
> 
>  	ret = kvmgt_guest_init(mdev);
> -	if (ret)
> -		goto undo_group;
> +	if (ret) {
> +		module_put(THIS_MODULE);
> +		goto free_res;
> +	}
> 
>  	intel_gvt_ops->vgpu_activate(vgpu);
> 
>  	atomic_set(&vdev->released, 0);
> -	return ret;
> +	goto out;
> 
> +free_res:
> +	if (vgpu->type->aggregation && vgpu->param.aggregation)
> +		intel_gvt_ops->vgpu_res_free(vgpu);
>  undo_group:
>  	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>  					&vdev->group_notifier);
> @@ -837,6 +851,7 @@ static int intel_vgpu_open(struct mdev_device *mdev)
>  	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
>  					&vdev->iommu_notifier);
>  out:
> +	mutex_unlock(&vgpu->vgpu_lock);
>  	return ret;
>  }
> 
> @@ -1628,8 +1643,100 @@ static const struct attribute_group
> intel_vgpu_group = {
>  	.attrs = intel_vgpu_attrs,
>  };
> 
> +/*
> + * "max_aggregation" display maxium instances for aggregation,
> + * if type doesn't support aggregation, it displays 0
> + */
> +static ssize_t
> +max_aggregation_show(struct device *dev, struct device_attribute *attr,
> +		     char *buf)
> +{
> +	struct mdev_device *mdev = mdev_from_dev(dev);
> +
> +	if (mdev) {
> +		struct intel_vgpu *vgpu = (struct intel_vgpu *)
> +			mdev_get_drvdata(mdev);
> +		struct intel_vgpu_type *type = vgpu->type;
> +		/* return left avail + current allocated as maxium */
> +		unsigned int m = type->avail_instance +
> +			vgpu->param.aggregation;
> +		if (type->aggregation)
> +			return sprintf(buf, "%u\n", m);
> +	}
> +	return sprintf(buf, "0\n");
> +}
> +static DEVICE_ATTR_RO(max_aggregation);
> +
> +static ssize_t
> +aggregated_instances_store(struct device *dev, struct device_attribute *attr,
> +			   const char *buf, size_t count)
> +{
> +	struct mdev_device *mdev = mdev_from_dev(dev);
> +	unsigned long val;
> +	ssize_t ret = -EINVAL;
> +
> +	if (kstrtoul(buf, 0, &val) < 0)
> +		return ret;
> +
> +	if (val > 0 && mdev) {
> +		struct intel_vgpu *vgpu = (struct intel_vgpu *)
> +			mdev_get_drvdata(mdev);
> +		struct intel_vgpu_type *type = vgpu->type;
> +
> +		if (val == vgpu->param.aggregation)
> +			return count;
> +
> +		mutex_lock(&vgpu->vgpu_lock);
> +		if (vgpu->active) {
> +			mutex_unlock(&vgpu->vgpu_lock);
> +			return ret;
> +		}
> +		/*
> +		 * value should be less than maxium aggregation
> +		 * instance number.
> +		 */
> +		if (type->aggregation &&
> +		    val <= (vgpu->param.aggregation + type->avail_instance)) {
> +			vgpu->param.aggregation = val;
> +			intel_gvt_ops->vgpu_res_change(vgpu);
> +			ret = count;
> +		}
> +		mutex_unlock(&vgpu->vgpu_lock);
> +	}
> +	return ret;
> +}
> +
> +static ssize_t
> +aggregated_instances_show(struct device *dev, struct device_attribute *attr,
> +		 char *buf)
> +{
> +	struct mdev_device *mdev = mdev_from_dev(dev);
> +
> +	if (mdev) {
> +		struct intel_vgpu *vgpu = (struct intel_vgpu *)
> +			mdev_get_drvdata(mdev);
> +		struct intel_vgpu_type *type = vgpu->type;
> +		if (type->aggregation)
> +			return sprintf(buf, "%u\n", vgpu-
> >param.aggregation);
> +	}
> +	return sprintf(buf, "0\n");
> +}
> +static DEVICE_ATTR_RW(aggregated_instances);
> +
> +static struct attribute *mdev_attrs[] = {
> +	&dev_attr_max_aggregation.attr,
> +	&dev_attr_aggregated_instances.attr,
> +	NULL
> +};
> +
> +static const struct attribute_group mdev_group = {
> +	.name = "mdev",
> +	.attrs = mdev_attrs,
> +};
> +
>  static const struct attribute_group *intel_vgpu_groups[] = {
>  	&intel_vgpu_group,
> +	&mdev_group,
>  	NULL,
>  };
> 
> diff --git a/drivers/gpu/drm/i915/gvt/vgpu.c
> b/drivers/gpu/drm/i915/gvt/vgpu.c
> index 1d5ff88078bd..08a1b164e9a8 100644
> --- a/drivers/gpu/drm/i915/gvt/vgpu.c
> +++ b/drivers/gpu/drm/i915/gvt/vgpu.c
> @@ -89,12 +89,13 @@ static struct {
>  	unsigned int weight;
>  	enum intel_vgpu_edid edid;
>  	char *name;
> +	bool aggregation;
>  } vgpu_types[] = {
>  /* Fixed vGPU type table */
> -	{ MB_TO_BYTES(64), MB_TO_BYTES(384), 4, VGPU_WEIGHT(8),
> GVT_EDID_1024_768, "8" },
> -	{ MB_TO_BYTES(128), MB_TO_BYTES(512), 4, VGPU_WEIGHT(4),
> GVT_EDID_1920_1200, "4" },
> -	{ MB_TO_BYTES(256), MB_TO_BYTES(1024), 4, VGPU_WEIGHT(2),
> GVT_EDID_1920_1200, "2" },
> -	{ MB_TO_BYTES(512), MB_TO_BYTES(2048), 4, VGPU_WEIGHT(1),
> GVT_EDID_1920_1200, "1" },
> +	{ MB_TO_BYTES(64), MB_TO_BYTES(384), 4, VGPU_WEIGHT(8),
> GVT_EDID_1024_768, "8", true },
> +	{ MB_TO_BYTES(128), MB_TO_BYTES(512), 4, VGPU_WEIGHT(4),
> GVT_EDID_1920_1200, "4", false },
> +	{ MB_TO_BYTES(256), MB_TO_BYTES(1024), 4, VGPU_WEIGHT(2),
> GVT_EDID_1920_1200, "2", false },
> +	{ MB_TO_BYTES(512), MB_TO_BYTES(2048), 4, VGPU_WEIGHT(1),
> GVT_EDID_1920_1200, "1", false },
>  };
> 
>  /**
> @@ -148,6 +149,7 @@ int intel_gvt_init_vgpu_types(struct intel_gvt *gvt)
>  		gvt->types[i].resolution = vgpu_types[i].edid;
>  		gvt->types[i].avail_instance = min(low_avail /
> vgpu_types[i].low_mm,
>  						   high_avail /
> vgpu_types[i].high_mm);
> +		gvt->types[i].aggregation = vgpu_types[i].aggregation;
> 
>  		if (IS_GEN(gvt->gt->i915, 8))
>  			sprintf(gvt->types[i].name, "GVTg_V4_%s",
> @@ -174,7 +176,7 @@ void intel_gvt_clean_vgpu_types(struct intel_gvt *gvt)
>  	kfree(gvt->types);
>  }
> 
> -static void intel_gvt_update_vgpu_types(struct intel_gvt *gvt)
> +void intel_gvt_update_vgpu_types(struct intel_gvt *gvt)
>  {
>  	int i;
>  	unsigned int low_gm_avail, high_gm_avail, fence_avail;
> @@ -213,9 +215,7 @@ static void intel_gvt_update_vgpu_types(struct
> intel_gvt *gvt)
>   */
>  void intel_gvt_activate_vgpu(struct intel_vgpu *vgpu)
>  {
> -	mutex_lock(&vgpu->vgpu_lock);
>  	vgpu->active = true;
> -	mutex_unlock(&vgpu->vgpu_lock);
>  }
> 
>  /**
> @@ -259,6 +259,8 @@ void intel_gvt_release_vgpu(struct intel_vgpu *vgpu)
>  	mutex_lock(&vgpu->vgpu_lock);
>  	intel_vgpu_clean_workloads(vgpu, ALL_ENGINES);
>  	intel_vgpu_dmabuf_cleanup(vgpu);
> +	if (vgpu->type->aggregation)
> +		intel_vgpu_res_free(vgpu);
>  	mutex_unlock(&vgpu->vgpu_lock);
>  }
> 
> @@ -290,10 +292,13 @@ void intel_gvt_destroy_vgpu(struct intel_vgpu
> *vgpu)
>  	intel_vgpu_clean_submission(vgpu);
>  	intel_vgpu_clean_display(vgpu);
>  	intel_vgpu_clean_opregion(vgpu);
> -	intel_vgpu_reset_ggtt(vgpu, true);
> -	intel_vgpu_clean_gtt(vgpu);
> +	if (vgpu->res_initialized) {
> +		intel_vgpu_reset_ggtt(vgpu, true);
> +		intel_vgpu_clean_gtt(vgpu);
> +		intel_vgpu_free_resource(vgpu);
> +	}
> +	intel_vgpu_free_resource_count(vgpu);
>  	intel_gvt_hypervisor_detach_vgpu(vgpu);
> -	intel_vgpu_free_resource(vgpu);
>  	intel_vgpu_clean_mmio(vgpu);
>  	intel_vgpu_dmabuf_cleanup(vgpu);
>  	mutex_unlock(&vgpu->vgpu_lock);
> @@ -364,59 +369,85 @@ void intel_gvt_destroy_idle_vgpu(struct intel_vgpu
> *vgpu)
>  	vfree(vgpu);
>  }
> 
> -static struct intel_vgpu *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
> -		struct intel_vgpu_creation_params *param)
> +int intel_vgpu_res_alloc(struct intel_vgpu *vgpu)
>  {
> -	struct intel_vgpu *vgpu;
>  	int ret;
> 
> -	gvt_dbg_core("handle %llu low %llu MB high %llu MB fence %llu\n",
> -			param->handle, param->low_gm_sz, param-
> >high_gm_sz,
> -			param->fence_sz);
> +	ret = intel_vgpu_alloc_resource(vgpu);
> +	if (ret)
> +		return ret;
> 
> -	vgpu = vzalloc(sizeof(*vgpu));
> -	if (!vgpu)
> -		return ERR_PTR(-ENOMEM);
> +	populate_pvinfo_page(vgpu);
> +
> +	ret = intel_vgpu_init_gtt(vgpu);
> +	if (ret) {
> +		intel_vgpu_free_resource(vgpu);
> +		return ret;
> +	}
> +	vgpu->res_initialized = true;
> +	return 0;
> +}
> +
> +void intel_vgpu_res_free(struct intel_vgpu *vgpu)
> +{
> +	intel_vgpu_reset_ggtt(vgpu, true);
> +	intel_vgpu_clean_gtt(vgpu);
> +	intel_vgpu_free_resource(vgpu);
> +	vgpu->res_initialized = false;
> +	mutex_lock(&vgpu->gvt->lock);
> +	intel_gvt_update_vgpu_types(vgpu->gvt);
> +	mutex_unlock(&vgpu->gvt->lock);
> +}
> +
> +static int
> +__intel_gvt_create_vgpu(struct intel_vgpu *vgpu,
> +			bool delay_res_alloc)
> +{
> +	struct intel_gvt *gvt = vgpu->gvt;
> +	struct intel_vgpu_gtt *gtt = &vgpu->gtt;
> +	int ret;
> 
>  	ret = idr_alloc(&gvt->vgpu_idr, vgpu, IDLE_VGPU_IDR + 1,
> GVT_MAX_VGPU,
> -		GFP_KERNEL);
> +			GFP_KERNEL);
>  	if (ret < 0)
> -		goto out_free_vgpu;
> -
> +		return ret;
>  	vgpu->id = ret;
> -	vgpu->handle = param->handle;
> -	vgpu->gvt = gvt;
> -	vgpu->sched_ctl.weight = param->weight;
> +
>  	mutex_init(&vgpu->vgpu_lock);
>  	mutex_init(&vgpu->dmabuf_lock);
>  	INIT_LIST_HEAD(&vgpu->dmabuf_obj_list_head);
>  	INIT_RADIX_TREE(&vgpu->page_track_tree, GFP_KERNEL);
>  	idr_init(&vgpu->object_idr);
> -	intel_vgpu_init_cfg_space(vgpu, param->primary);
> +	INIT_RADIX_TREE(&gtt->spt_tree, GFP_KERNEL);
> +	INIT_LIST_HEAD(&gtt->ppgtt_mm_list_head);
> +	INIT_LIST_HEAD(&gtt->oos_page_list_head);
> +	INIT_LIST_HEAD(&gtt->post_shadow_list_head);
> +
> +	intel_vgpu_init_cfg_space(vgpu, vgpu->param.primary);
> 
>  	ret = intel_vgpu_init_mmio(vgpu);
>  	if (ret)
>  		goto out_clean_idr;
> 
> -	ret = intel_vgpu_alloc_resource(vgpu, param);
> +	ret = intel_vgpu_alloc_resource_count(vgpu);
>  	if (ret)
>  		goto out_clean_vgpu_mmio;
> 
> -	populate_pvinfo_page(vgpu);
> +	if (!delay_res_alloc) {
> +		ret = intel_vgpu_res_alloc(vgpu);
> +		if (ret)
> +			goto out_clean_vgpu_mmio;
> +	}

If delayed resource allocation works correctly, why do we still
need to support the non-delayed flavor? Even if a type doesn't support
aggregation, it doesn't hurt to defer the allocation until mdev open...

> 
>  	ret = intel_gvt_hypervisor_attach_vgpu(vgpu);
>  	if (ret)
>  		goto out_clean_vgpu_resource;
> 
> -	ret = intel_vgpu_init_gtt(vgpu);
> -	if (ret)
> -		goto out_detach_hypervisor_vgpu;
> -
>  	ret = intel_vgpu_init_opregion(vgpu);
>  	if (ret)
> -		goto out_clean_gtt;
> +		goto out_detach_hypervisor_vgpu;
> 
> -	ret = intel_vgpu_init_display(vgpu, param->resolution);
> +	ret = intel_vgpu_init_display(vgpu, vgpu->param.resolution);
>  	if (ret)
>  		goto out_clean_opregion;
> 
> @@ -438,7 +469,7 @@ static struct intel_vgpu
> *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
>  	if (ret)
>  		goto out_clean_sched_policy;
> 
> -	return vgpu;
> +	return 0;
> 
>  out_clean_sched_policy:
>  	intel_vgpu_clean_sched_policy(vgpu);
> @@ -448,19 +479,17 @@ static struct intel_vgpu
> *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
>  	intel_vgpu_clean_display(vgpu);
>  out_clean_opregion:
>  	intel_vgpu_clean_opregion(vgpu);
> -out_clean_gtt:
> -	intel_vgpu_clean_gtt(vgpu);
>  out_detach_hypervisor_vgpu:
>  	intel_gvt_hypervisor_detach_vgpu(vgpu);
>  out_clean_vgpu_resource:
> +	intel_vgpu_clean_gtt(vgpu);
>  	intel_vgpu_free_resource(vgpu);
> +	intel_gvt_update_vgpu_types(vgpu->gvt);
>  out_clean_vgpu_mmio:
>  	intel_vgpu_clean_mmio(vgpu);
>  out_clean_idr:
>  	idr_remove(&gvt->vgpu_idr, vgpu->id);
> -out_free_vgpu:
> -	vfree(vgpu);
> -	return ERR_PTR(ret);
> +	return ret;
>  }
> 
>  /**
> @@ -474,33 +503,64 @@ static struct intel_vgpu
> *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
>   * pointer to intel_vgpu, error pointer if failed.
>   */
>  struct intel_vgpu *intel_gvt_create_vgpu(struct intel_gvt *gvt,
> -				struct intel_vgpu_type *type)
> +					 struct intel_vgpu_type *type)
>  {
> -	struct intel_vgpu_creation_params param;
>  	struct intel_vgpu *vgpu;
> +	struct intel_vgpu_creation_params *param;
> +	int ret;
> +
> +	vgpu = vzalloc(sizeof(*vgpu));
> +	if (!vgpu)
> +		return ERR_PTR(-ENOMEM);
> 
> -	param.handle = 0;
> -	param.primary = 1;
> -	param.low_gm_sz = type->low_gm_size;
> -	param.high_gm_sz = type->high_gm_size;
> -	param.fence_sz = type->fence;
> -	param.weight = type->weight;
> -	param.resolution = type->resolution;
> +	param = &vgpu->param;
> +	param->handle = 0;
> +	param->primary = 1;
> +	param->low_gm_sz = type->low_gm_size;
> +	param->high_gm_sz = type->high_gm_size;
> +	param->fence_sz = type->fence;
> +	param->weight = type->weight;
> +	param->resolution = type->resolution;
> 
> -	/* XXX current param based on MB */
> -	param.low_gm_sz = BYTES_TO_MB(param.low_gm_sz);
> -	param.high_gm_sz = BYTES_TO_MB(param.high_gm_sz);
> +	gvt_dbg_core("handle %llu low %llu MB high %llu MB fence %llu\n",
> +		     param->handle, BYTES_TO_MB(param->low_gm_sz),
> +		     BYTES_TO_MB(param->high_gm_sz),
> +		     param->fence_sz);
> +
> +	vgpu->handle = param->handle;
> +	vgpu->gvt = gvt;
> +	vgpu->sched_ctl.weight = param->weight;
> +	param->aggregation = 1;
> 
>  	mutex_lock(&gvt->lock);
> -	vgpu = __intel_gvt_create_vgpu(gvt, &param);
> -	if (!IS_ERR(vgpu))
> -		/* calculate left instance change for types */
> -		intel_gvt_update_vgpu_types(gvt);
> +	ret = __intel_gvt_create_vgpu(vgpu, type->aggregation ? true : false);
> +	if (ret) {
> +		mutex_unlock(&gvt->lock);
> +		vfree(vgpu);
> +		return ERR_PTR(ret);
> +	}
> +	/* calculate left instance change for types */
> +	intel_gvt_update_vgpu_types(vgpu->gvt);
>  	mutex_unlock(&gvt->lock);
> 
>  	return vgpu;
>  }
> 
> +int intel_vgpu_delayed_alloc(struct intel_vgpu *vgpu)
> +{
> +	int ret;
> +
> +	intel_vgpu_adjust_resource_count(vgpu);
> +	ret = intel_vgpu_res_alloc(vgpu);
> +	if (ret)
> +		return -EINVAL;
> +
> +	mutex_lock(&vgpu->gvt->lock);
> +	intel_gvt_update_vgpu_types(vgpu->gvt);
> +	mutex_unlock(&vgpu->gvt->lock);
> +	return 0;
> +}
> +
>  /**
>   * intel_gvt_reset_vgpu_locked - reset a virtual GPU by DMLR or GT reset
>   * @vgpu: virtual GPU
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/gvt: mdev aggregation type
  2020-03-27  7:48   ` Tian, Kevin
@ 2020-03-27  8:12     ` Zhenyu Wang
  2020-03-27  8:44       ` Tian, Kevin
  0 siblings, 1 reply; 34+ messages in thread
From: Zhenyu Wang @ 2020-03-27  8:12 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: alex.williamson, kvm, intel-gvt-dev, Jiang, Dave, Yuan, Hang

[-- Attachment #1: Type: text/plain, Size: 29524 bytes --]

On 2020.03.27 07:48:14 +0000, Tian, Kevin wrote:
> > From: Zhenyu Wang <zhenyuw@linux.intel.com>
> > Sent: Thursday, March 26, 2020 1:42 PM
> > 
> > To increase the flexibility for resource allocation on mdev types,f
> 
> remove trailing ',f'

oops, typo

> 
> > this trys to add aggregation attribute for mdev type that can
> > aggregate gfx memory resources in GVT case.
> > 
> > Use sysfs method for this attribute, the most difference for GVT is
> > that the gfx resource allocation will be delayed until mdev open
> > instead of mdev create to allow aggregation setting before VM
> > start. But gfx resource accouting would still handle left resource for
> > target vGPU or other types.
> 
> the last sentence is not very clear. I suppose although the resource
> (#aggregated_instances) is not allocated but it is already reserved
> and accounted before mdev open?

yeah, it means the resource accounting is handled before mdev open.

> 
> > 
> > "max_aggregation" would show maxium instances for aggregation on
> > target vGPU type. "aggregated_instances" shows current count of aggregated
> 
> "target vGPU type" or "target vGPU instance"?

yeah, for target instance.

> 
> > instances. "max_aggregation" is read-only and user sets needed account
> 
> account->count

ok.

> 
> > by write to "aggregated_instances".
> > 
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: "Jiang, Dave" <dave.jiang@intel.com>
> > Cc: "Yuan, Hang" <hang.yuan@intel.com>
> > Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/gvt/aperture_gm.c |  44 +++++--
> >  drivers/gpu/drm/i915/gvt/gtt.c         |   9 +-
> >  drivers/gpu/drm/i915/gvt/gvt.c         |   7 +-
> >  drivers/gpu/drm/i915/gvt/gvt.h         |  42 ++++--
> >  drivers/gpu/drm/i915/gvt/kvmgt.c       | 115 ++++++++++++++++-
> >  drivers/gpu/drm/i915/gvt/vgpu.c        | 172 +++++++++++++++++--------
> >  6 files changed, 295 insertions(+), 94 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gvt/aperture_gm.c
> > b/drivers/gpu/drm/i915/gvt/aperture_gm.c
> > index 0d6d59871308..9ee3083b37ae 100644
> > --- a/drivers/gpu/drm/i915/gvt/aperture_gm.c
> > +++ b/drivers/gpu/drm/i915/gvt/aperture_gm.c
> > @@ -238,10 +238,10 @@ static void free_resource(struct intel_vgpu *vgpu)
> >  	gvt->fence.vgpu_allocated_fence_num -= vgpu_fence_sz(vgpu);
> >  }
> > 
> > -static int alloc_resource(struct intel_vgpu *vgpu,
> > -		struct intel_vgpu_creation_params *param)
> > +static int alloc_resource(struct intel_vgpu *vgpu)
> >  {
> >  	struct intel_gvt *gvt = vgpu->gvt;
> > +	struct intel_vgpu_creation_params *param = &vgpu->param;
> >  	unsigned long request, avail, max, taken;
> >  	const char *item;
> > 
> > @@ -254,7 +254,7 @@ static int alloc_resource(struct intel_vgpu *vgpu,
> >  	max = gvt_aperture_sz(gvt) - HOST_LOW_GM_SIZE;
> >  	taken = gvt->gm.vgpu_allocated_low_gm_size;
> >  	avail = max - taken;
> > -	request = MB_TO_BYTES(param->low_gm_sz);
> > +	request = param->low_gm_sz * param->aggregation;
> > 
> >  	if (request > avail)
> >  		goto no_enough_resource;
> > @@ -265,7 +265,7 @@ static int alloc_resource(struct intel_vgpu *vgpu,
> >  	max = gvt_hidden_sz(gvt) - HOST_HIGH_GM_SIZE;
> >  	taken = gvt->gm.vgpu_allocated_high_gm_size;
> >  	avail = max - taken;
> > -	request = MB_TO_BYTES(param->high_gm_sz);
> > +	request = param->high_gm_sz * param->aggregation;
> > 
> >  	if (request > avail)
> >  		goto no_enough_resource;
> > @@ -283,8 +283,8 @@ static int alloc_resource(struct intel_vgpu *vgpu,
> > 
> >  	vgpu_fence_sz(vgpu) = request;
> > 
> > -	gvt->gm.vgpu_allocated_low_gm_size += MB_TO_BYTES(param-
> > >low_gm_sz);
> > -	gvt->gm.vgpu_allocated_high_gm_size += MB_TO_BYTES(param-
> > >high_gm_sz);
> > +	gvt->gm.vgpu_allocated_low_gm_size += param->low_gm_sz *
> > param->aggregation;
> > +	gvt->gm.vgpu_allocated_high_gm_size += param->high_gm_sz *
> > param->aggregation;
> >  	gvt->fence.vgpu_allocated_fence_num += param->fence_sz;
> >  	return 0;
> > 
> > @@ -307,9 +307,34 @@ void intel_vgpu_free_resource(struct intel_vgpu
> > *vgpu)
> >  {
> >  	free_vgpu_gm(vgpu);
> >  	free_vgpu_fence(vgpu);
> > +}
> > +
> > +void intel_vgpu_free_resource_count(struct intel_vgpu *vgpu)
> > +{
> >  	free_resource(vgpu);
> >  }
> > 
> > +int intel_vgpu_alloc_resource_count(struct intel_vgpu *vgpu)
> > +{
> > +	return alloc_resource(vgpu);
> > +}
> > +
> > +int intel_vgpu_adjust_resource_count(struct intel_vgpu *vgpu)
> > +{
> > +	if ((vgpu_aperture_sz(vgpu) != vgpu->param.low_gm_sz *
> > +	     vgpu->param.aggregation) ||
> > +	    (vgpu_hidden_sz(vgpu) != vgpu->param.high_gm_sz *
> > +	     vgpu->param.aggregation)) {
> > +		/* handle aggregation change */
> > +		intel_vgpu_free_resource_count(vgpu);
> > +		intel_vgpu_alloc_resource_count(vgpu);
> 
> this logic sounds like different from the commit msg. Earlier you
> said the resource is not allocated until mdev open, while the
> aggregated_interfaces is only allowed to be written before
> mdev open. In such case, why would it required to handle the
> case where a vgpu already has resource allocated then coming
> a new request to adjust the number of instances?

This is to handle the resource accounting change before mdev open when the
aggregation setting changes. When the vgpu is created from an mdev type, the
default resource count for that type is set for the vgpu, so this function
just changes the accounted resource count according to the aggregation value.
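
To make the accounting concrete (illustrative numbers only, assuming the
aggregation-capable "_8" type with 64MB low / 384MB high GM per instance):

  # echo 4 > /sys/devices/pci0000:00/0000:00:02.0/<UUID>/mdev/aggregated_instances

after which 4 x 64MB = 256MB low GM and 4 x 384MB = 1536MB high GM are
accounted for this vgpu (and available_instances of the types shrink
accordingly), while the actual GM/fence allocation still only happens at
mdev open.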

> 
> > +		mutex_lock(&vgpu->gvt->lock);
> > +		intel_gvt_update_vgpu_types(vgpu->gvt);
> > +		mutex_unlock(&vgpu->gvt->lock);
> > +	}
> > +	return 0;
> > +}
> > +
> >  /**
> >   * intel_vgpu_reset_resource - reset resource state owned by a vGPU
> >   * @vgpu: a vGPU
> > @@ -338,15 +363,10 @@ void intel_vgpu_reset_resource(struct intel_vgpu
> > *vgpu)
> >   * zero on success, negative error code if failed.
> >   *
> >   */
> > -int intel_vgpu_alloc_resource(struct intel_vgpu *vgpu,
> > -		struct intel_vgpu_creation_params *param)
> > +int intel_vgpu_alloc_resource(struct intel_vgpu *vgpu)
> >  {
> >  	int ret;
> > 
> > -	ret = alloc_resource(vgpu, param);
> > -	if (ret)
> > -		return ret;
> > -
> >  	ret = alloc_vgpu_gm(vgpu);
> >  	if (ret)
> >  		goto out_free_resource;
> > diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> > index 2a4b23f8aa74..60f455134ed0 100644
> > --- a/drivers/gpu/drm/i915/gvt/gtt.c
> > +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> > @@ -2466,12 +2466,6 @@ int intel_vgpu_init_gtt(struct intel_vgpu *vgpu)
> >  {
> >  	struct intel_vgpu_gtt *gtt = &vgpu->gtt;
> > 
> > -	INIT_RADIX_TREE(&gtt->spt_tree, GFP_KERNEL);
> > -
> > -	INIT_LIST_HEAD(&gtt->ppgtt_mm_list_head);
> > -	INIT_LIST_HEAD(&gtt->oos_page_list_head);
> > -	INIT_LIST_HEAD(&gtt->post_shadow_list_head);
> > -
> >  	gtt->ggtt_mm = intel_vgpu_create_ggtt_mm(vgpu);
> >  	if (IS_ERR(gtt->ggtt_mm)) {
> >  		gvt_vgpu_err("fail to create mm for ggtt.\n");
> > @@ -2508,6 +2502,9 @@ static void intel_vgpu_destroy_ggtt_mm(struct
> > intel_vgpu *vgpu)
> >  {
> >  	struct intel_gvt_partial_pte *pos, *next;
> > 
> > +	if (!vgpu->gtt.ggtt_mm)
> > +		return;
> > +
> >  	list_for_each_entry_safe(pos, next,
> >  				 &vgpu->gtt.ggtt_mm-
> > >ggtt_mm.partial_pte_list,
> >  				 list) {
> > diff --git a/drivers/gpu/drm/i915/gvt/gvt.c b/drivers/gpu/drm/i915/gvt/gvt.c
> > index 9e1787867894..bbd77cba239e 100644
> > --- a/drivers/gpu/drm/i915/gvt/gvt.c
> > +++ b/drivers/gpu/drm/i915/gvt/gvt.c
> > @@ -99,11 +99,11 @@ static ssize_t description_show(struct kobject *kobj,
> > struct device *dev,
> > 
> >  	return sprintf(buf, "low_gm_size: %dMB\nhigh_gm_size: %dMB\n"
> >  		       "fence: %d\nresolution: %s\n"
> > -		       "weight: %d\n",
> > +		       "weight: %d\naggregation: %s\n",
> >  		       BYTES_TO_MB(type->low_gm_size),
> >  		       BYTES_TO_MB(type->high_gm_size),
> >  		       type->fence, vgpu_edid_str(type->resolution),
> > -		       type->weight);
> > +		       type->weight, type->aggregation ? "yes" : "no");
> >  }
> > 
> >  static MDEV_TYPE_ATTR_RO(available_instances);
> > @@ -185,6 +185,9 @@ static const struct intel_gvt_ops intel_gvt_ops = {
> >  	.vgpu_get_dmabuf = intel_vgpu_get_dmabuf,
> >  	.write_protect_handler = intel_vgpu_page_track_handler,
> >  	.emulate_hotplug = intel_vgpu_emulate_hotplug,
> > +	.vgpu_delayed_alloc = intel_vgpu_delayed_alloc,
> > +	.vgpu_res_free = intel_vgpu_res_free,
> > +	.vgpu_res_change = intel_vgpu_adjust_resource_count,
> >  };
> > 
> >  static void init_device_info(struct intel_gvt *gvt)
> > diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
> > index 58c2c7932e3f..bb8f16d468f4 100644
> > --- a/drivers/gpu/drm/i915/gvt/gvt.h
> > +++ b/drivers/gpu/drm/i915/gvt/gvt.h
> > @@ -165,15 +165,30 @@ struct intel_vgpu_submission {
> >  	bool active;
> >  };
> > 
> > +struct intel_vgpu_creation_params {
> > +	__u64 handle;
> > +	__u64 low_gm_sz;
> > +	__u64 high_gm_sz;
> > +	__u64 fence_sz;
> > +	__u64 resolution;
> > +	__s32 primary;
> > +	__u64 vgpu_id;
> > +	__u32 weight;
> > +	__u32 aggregation; /* requested aggregation number if type
> > supports */
> > +};
> > +
> >  struct intel_vgpu {
> >  	struct intel_gvt *gvt;
> >  	struct mutex vgpu_lock;
> >  	int id;
> >  	unsigned long handle; /* vGPU handle used by hypervisor MPT
> > modules */
> >  	bool active;
> > +	bool res_initialized;
> >  	bool pv_notified;
> >  	bool failsafe;
> >  	unsigned int resetting_eng;
> > +	struct intel_vgpu_creation_params param;
> > +	struct intel_vgpu_type *type;
> > 
> >  	/* Both sched_data and sched_ctl can be seen a part of the global gvt
> >  	 * scheduler structure. So below 2 vgpu data are protected
> > @@ -276,6 +291,7 @@ struct intel_vgpu_type {
> >  	unsigned int fence;
> >  	unsigned int weight;
> >  	enum intel_vgpu_edid resolution;
> > +	bool aggregation;
> >  };
> > 
> >  struct intel_gvt {
> > @@ -402,22 +418,12 @@ int intel_gvt_load_firmware(struct intel_gvt *gvt);
> >  #define vgpu_fence_base(vgpu) (vgpu->fence.base)
> >  #define vgpu_fence_sz(vgpu) (vgpu->fence.size)
> > 
> > -struct intel_vgpu_creation_params {
> > -	__u64 handle;
> > -	__u64 low_gm_sz;  /* in MB */
> > -	__u64 high_gm_sz; /* in MB */
> > -	__u64 fence_sz;
> > -	__u64 resolution;
> > -	__s32 primary;
> > -	__u64 vgpu_id;
> > -
> > -	__u32 weight;
> > -};
> > -
> > -int intel_vgpu_alloc_resource(struct intel_vgpu *vgpu,
> > -			      struct intel_vgpu_creation_params *param);
> > +int intel_vgpu_alloc_resource(struct intel_vgpu *vgpu);
> >  void intel_vgpu_reset_resource(struct intel_vgpu *vgpu);
> >  void intel_vgpu_free_resource(struct intel_vgpu *vgpu);
> > +int intel_vgpu_alloc_resource_count(struct intel_vgpu *vgpu);
> > +void intel_vgpu_free_resource_count(struct intel_vgpu *vgpu);
> > +int intel_vgpu_adjust_resource_count(struct intel_vgpu *vgpu);
> >  void intel_vgpu_write_fence(struct intel_vgpu *vgpu,
> >  	u32 fence, u64 value);
> > 
> > @@ -458,11 +464,13 @@ static inline void intel_vgpu_write_pci_bar(struct
> > intel_vgpu *vgpu,
> > 
> >  int intel_gvt_init_vgpu_types(struct intel_gvt *gvt);
> >  void intel_gvt_clean_vgpu_types(struct intel_gvt *gvt);
> > +void intel_gvt_update_vgpu_types(struct intel_gvt *gvt);
> > 
> >  struct intel_vgpu *intel_gvt_create_idle_vgpu(struct intel_gvt *gvt);
> >  void intel_gvt_destroy_idle_vgpu(struct intel_vgpu *vgpu);
> >  struct intel_vgpu *intel_gvt_create_vgpu(struct intel_gvt *gvt,
> >  					 struct intel_vgpu_type *type);
> > +int intel_vgpu_delayed_alloc(struct intel_vgpu *vgpu);
> >  void intel_gvt_destroy_vgpu(struct intel_vgpu *vgpu);
> >  void intel_gvt_release_vgpu(struct intel_vgpu *vgpu);
> >  void intel_gvt_reset_vgpu_locked(struct intel_vgpu *vgpu, bool dmlr,
> > @@ -523,6 +531,9 @@ static inline u64 intel_vgpu_get_bar_gpa(struct
> > intel_vgpu *vgpu, int bar)
> >  			PCI_BASE_ADDRESS_MEM_MASK;
> >  }
> > 
> > +int intel_vgpu_res_alloc(struct intel_vgpu *vgpu);
> > +void intel_vgpu_res_free(struct intel_vgpu *vgpu);
> > +
> >  void intel_vgpu_clean_opregion(struct intel_vgpu *vgpu);
> >  int intel_vgpu_init_opregion(struct intel_vgpu *vgpu);
> >  int intel_vgpu_opregion_base_write_handler(struct intel_vgpu *vgpu, u32
> > gpa);
> > @@ -557,6 +568,9 @@ struct intel_gvt_ops {
> >  	int (*write_protect_handler)(struct intel_vgpu *, u64, void *,
> >  				     unsigned int);
> >  	void (*emulate_hotplug)(struct intel_vgpu *vgpu, bool connected);
> > +	int (*vgpu_delayed_alloc)(struct intel_vgpu *vgpu);
> > +	void (*vgpu_res_free)(struct intel_vgpu *vgpu);
> > +	int (*vgpu_res_change)(struct intel_vgpu *vgpu);
> >  };
> > 
> > 
> > diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c
> > b/drivers/gpu/drm/i915/gvt/kvmgt.c
> > index 074c4efb58eb..b1a4096c6c50 100644
> > --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> > +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> > @@ -711,6 +711,7 @@ static int intel_vgpu_create(struct kobject *kobj,
> > struct mdev_device *mdev)
> >  		gvt_err("failed to create intel vgpu: %d\n", ret);
> >  		goto out;
> >  	}
> > +	vgpu->type = type;
> > 
> >  	INIT_WORK(&kvmgt_vdev(vgpu)->release_work,
> > intel_vgpu_release_work);
> > 
> > @@ -793,6 +794,8 @@ static int intel_vgpu_open(struct mdev_device *mdev)
> >  	unsigned long events;
> >  	int ret;
> > 
> > +	mutex_lock(&vgpu->vgpu_lock);
> > +
> >  	vdev->iommu_notifier.notifier_call = intel_vgpu_iommu_notifier;
> >  	vdev->group_notifier.notifier_call = intel_vgpu_group_notifier;
> > 
> > @@ -814,21 +817,32 @@ static int intel_vgpu_open(struct mdev_device
> > *mdev)
> >  		goto undo_iommu;
> >  	}
> > 
> > +	if (vgpu->type->aggregation && vgpu->param.aggregation) {
> > +		ret = intel_gvt_ops->vgpu_delayed_alloc(vgpu);
> > +		if (ret)
> > +			goto undo_group;
> > +	}
> > +
> >  	/* Take a module reference as mdev core doesn't take
> >  	 * a reference for vendor driver.
> >  	 */
> >  	if (!try_module_get(THIS_MODULE))
> > -		goto undo_group;
> > +		goto free_res;
> > 
> >  	ret = kvmgt_guest_init(mdev);
> > -	if (ret)
> > -		goto undo_group;
> > +	if (ret) {
> > +		module_put(THIS_MODULE);
> > +		goto free_res;
> > +	}
> > 
> >  	intel_gvt_ops->vgpu_activate(vgpu);
> > 
> >  	atomic_set(&vdev->released, 0);
> > -	return ret;
> > +	goto out;
> > 
> > +free_res:
> > +	if (vgpu->type->aggregation && vgpu->param.aggregation)
> > +		intel_gvt_ops->vgpu_res_free(vgpu);
> >  undo_group:
> >  	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
> >  					&vdev->group_notifier);
> > @@ -837,6 +851,7 @@ static int intel_vgpu_open(struct mdev_device *mdev)
> >  	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> >  					&vdev->iommu_notifier);
> >  out:
> > +	mutex_unlock(&vgpu->vgpu_lock);
> >  	return ret;
> >  }
> > 
> > @@ -1628,8 +1643,100 @@ static const struct attribute_group
> > intel_vgpu_group = {
> >  	.attrs = intel_vgpu_attrs,
> >  };
> > 
> > +/*
> > + * "max_aggregation" display maxium instances for aggregation,
> > + * if type doesn't support aggregation, it displays 0
> > + */
> > +static ssize_t
> > +max_aggregation_show(struct device *dev, struct device_attribute *attr,
> > +		     char *buf)
> > +{
> > +	struct mdev_device *mdev = mdev_from_dev(dev);
> > +
> > +	if (mdev) {
> > +		struct intel_vgpu *vgpu = (struct intel_vgpu *)
> > +			mdev_get_drvdata(mdev);
> > +		struct intel_vgpu_type *type = vgpu->type;
> > +		/* return left avail + current allocated as maxium */
> > +		unsigned int m = type->avail_instance +
> > +			vgpu->param.aggregation;
> > +		if (type->aggregation)
> > +			return sprintf(buf, "%u\n", m);
> > +	}
> > +	return sprintf(buf, "0\n");
> > +}
> > +static DEVICE_ATTR_RO(max_aggregation);
> > +
> > +static ssize_t
> > +aggregated_instances_store(struct device *dev, struct device_attribute *attr,
> > +			   const char *buf, size_t count)
> > +{
> > +	struct mdev_device *mdev = mdev_from_dev(dev);
> > +	unsigned long val;
> > +	ssize_t ret = -EINVAL;
> > +
> > +	if (kstrtoul(buf, 0, &val) < 0)
> > +		return ret;
> > +
> > +	if (val > 0 && mdev) {
> > +		struct intel_vgpu *vgpu = (struct intel_vgpu *)
> > +			mdev_get_drvdata(mdev);
> > +		struct intel_vgpu_type *type = vgpu->type;
> > +
> > +		if (val == vgpu->param.aggregation)
> > +			return count;
> > +
> > +		mutex_lock(&vgpu->vgpu_lock);
> > +		if (vgpu->active) {
> > +			mutex_unlock(&vgpu->vgpu_lock);
> > +			return ret;
> > +		}
> > +		/*
> > +		 * value should be less than maxium aggregation
> > +		 * instance number.
> > +		 */
> > +		if (type->aggregation &&
> > +		    val <= (vgpu->param.aggregation + type->avail_instance)) {
> > +			vgpu->param.aggregation = val;
> > +			intel_gvt_ops->vgpu_res_change(vgpu);
> > +			ret = count;
> > +		}
> > +		mutex_unlock(&vgpu->vgpu_lock);
> > +	}
> > +	return ret;
> > +}
> > +
> > +static ssize_t
> > +aggregated_instances_show(struct device *dev, struct device_attribute *attr,
> > +		 char *buf)
> > +{
> > +	struct mdev_device *mdev = mdev_from_dev(dev);
> > +
> > +	if (mdev) {
> > +		struct intel_vgpu *vgpu = (struct intel_vgpu *)
> > +			mdev_get_drvdata(mdev);
> > +		struct intel_vgpu_type *type = vgpu->type;
> > +		if (type->aggregation)
> > +			return sprintf(buf, "%u\n", vgpu-
> > >param.aggregation);
> > +	}
> > +	return sprintf(buf, "0\n");
> > +}
> > +static DEVICE_ATTR_RW(aggregated_instances);
> > +
> > +static struct attribute *mdev_attrs[] = {
> > +	&dev_attr_max_aggregation.attr,
> > +	&dev_attr_aggregated_instances.attr,
> > +	NULL
> > +};
> > +
> > +static const struct attribute_group mdev_group = {
> > +	.name = "mdev",
> > +	.attrs = mdev_attrs,
> > +};
> > +
> >  static const struct attribute_group *intel_vgpu_groups[] = {
> >  	&intel_vgpu_group,
> > +	&mdev_group,
> >  	NULL,
> >  };
> > 
> > diff --git a/drivers/gpu/drm/i915/gvt/vgpu.c
> > b/drivers/gpu/drm/i915/gvt/vgpu.c
> > index 1d5ff88078bd..08a1b164e9a8 100644
> > --- a/drivers/gpu/drm/i915/gvt/vgpu.c
> > +++ b/drivers/gpu/drm/i915/gvt/vgpu.c
> > @@ -89,12 +89,13 @@ static struct {
> >  	unsigned int weight;
> >  	enum intel_vgpu_edid edid;
> >  	char *name;
> > +	bool aggregation;
> >  } vgpu_types[] = {
> >  /* Fixed vGPU type table */
> > -	{ MB_TO_BYTES(64), MB_TO_BYTES(384), 4, VGPU_WEIGHT(8),
> > GVT_EDID_1024_768, "8" },
> > -	{ MB_TO_BYTES(128), MB_TO_BYTES(512), 4, VGPU_WEIGHT(4),
> > GVT_EDID_1920_1200, "4" },
> > -	{ MB_TO_BYTES(256), MB_TO_BYTES(1024), 4, VGPU_WEIGHT(2),
> > GVT_EDID_1920_1200, "2" },
> > -	{ MB_TO_BYTES(512), MB_TO_BYTES(2048), 4, VGPU_WEIGHT(1),
> > GVT_EDID_1920_1200, "1" },
> > +	{ MB_TO_BYTES(64), MB_TO_BYTES(384), 4, VGPU_WEIGHT(8),
> > GVT_EDID_1024_768, "8", true },
> > +	{ MB_TO_BYTES(128), MB_TO_BYTES(512), 4, VGPU_WEIGHT(4),
> > GVT_EDID_1920_1200, "4", false },
> > +	{ MB_TO_BYTES(256), MB_TO_BYTES(1024), 4, VGPU_WEIGHT(2),
> > GVT_EDID_1920_1200, "2", false },
> > +	{ MB_TO_BYTES(512), MB_TO_BYTES(2048), 4, VGPU_WEIGHT(1),
> > GVT_EDID_1920_1200, "1", false },
> >  };
> > 
> >  /**
> > @@ -148,6 +149,7 @@ int intel_gvt_init_vgpu_types(struct intel_gvt *gvt)
> >  		gvt->types[i].resolution = vgpu_types[i].edid;
> >  		gvt->types[i].avail_instance = min(low_avail /
> > vgpu_types[i].low_mm,
> >  						   high_avail /
> > vgpu_types[i].high_mm);
> > +		gvt->types[i].aggregation = vgpu_types[i].aggregation;
> > 
> >  		if (IS_GEN(gvt->gt->i915, 8))
> >  			sprintf(gvt->types[i].name, "GVTg_V4_%s",
> > @@ -174,7 +176,7 @@ void intel_gvt_clean_vgpu_types(struct intel_gvt *gvt)
> >  	kfree(gvt->types);
> >  }
> > 
> > -static void intel_gvt_update_vgpu_types(struct intel_gvt *gvt)
> > +void intel_gvt_update_vgpu_types(struct intel_gvt *gvt)
> >  {
> >  	int i;
> >  	unsigned int low_gm_avail, high_gm_avail, fence_avail;
> > @@ -213,9 +215,7 @@ static void intel_gvt_update_vgpu_types(struct
> > intel_gvt *gvt)
> >   */
> >  void intel_gvt_activate_vgpu(struct intel_vgpu *vgpu)
> >  {
> > -	mutex_lock(&vgpu->vgpu_lock);
> >  	vgpu->active = true;
> > -	mutex_unlock(&vgpu->vgpu_lock);
> >  }
> > 
> >  /**
> > @@ -259,6 +259,8 @@ void intel_gvt_release_vgpu(struct intel_vgpu *vgpu)
> >  	mutex_lock(&vgpu->vgpu_lock);
> >  	intel_vgpu_clean_workloads(vgpu, ALL_ENGINES);
> >  	intel_vgpu_dmabuf_cleanup(vgpu);
> > +	if (vgpu->type->aggregation)
> > +		intel_vgpu_res_free(vgpu);
> >  	mutex_unlock(&vgpu->vgpu_lock);
> >  }
> > 
> > @@ -290,10 +292,13 @@ void intel_gvt_destroy_vgpu(struct intel_vgpu
> > *vgpu)
> >  	intel_vgpu_clean_submission(vgpu);
> >  	intel_vgpu_clean_display(vgpu);
> >  	intel_vgpu_clean_opregion(vgpu);
> > -	intel_vgpu_reset_ggtt(vgpu, true);
> > -	intel_vgpu_clean_gtt(vgpu);
> > +	if (vgpu->res_initialized) {
> > +		intel_vgpu_reset_ggtt(vgpu, true);
> > +		intel_vgpu_clean_gtt(vgpu);
> > +		intel_vgpu_free_resource(vgpu);
> > +	}
> > +	intel_vgpu_free_resource_count(vgpu);
> >  	intel_gvt_hypervisor_detach_vgpu(vgpu);
> > -	intel_vgpu_free_resource(vgpu);
> >  	intel_vgpu_clean_mmio(vgpu);
> >  	intel_vgpu_dmabuf_cleanup(vgpu);
> >  	mutex_unlock(&vgpu->vgpu_lock);
> > @@ -364,59 +369,85 @@ void intel_gvt_destroy_idle_vgpu(struct intel_vgpu
> > *vgpu)
> >  	vfree(vgpu);
> >  }
> > 
> > -static struct intel_vgpu *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
> > -		struct intel_vgpu_creation_params *param)
> > +int intel_vgpu_res_alloc(struct intel_vgpu *vgpu)
> >  {
> > -	struct intel_vgpu *vgpu;
> >  	int ret;
> > 
> > -	gvt_dbg_core("handle %llu low %llu MB high %llu MB fence %llu\n",
> > -			param->handle, param->low_gm_sz, param-
> > >high_gm_sz,
> > -			param->fence_sz);
> > +	ret = intel_vgpu_alloc_resource(vgpu);
> > +	if (ret)
> > +		return ret;
> > 
> > -	vgpu = vzalloc(sizeof(*vgpu));
> > -	if (!vgpu)
> > -		return ERR_PTR(-ENOMEM);
> > +	populate_pvinfo_page(vgpu);
> > +
> > +	ret = intel_vgpu_init_gtt(vgpu);
> > +	if (ret) {
> > +		intel_vgpu_free_resource(vgpu);
> > +		return ret;
> > +	}
> > +	vgpu->res_initialized = true;
> > +	return 0;
> > +}
> > +
> > +void intel_vgpu_res_free(struct intel_vgpu *vgpu)
> > +{
> > +	intel_vgpu_reset_ggtt(vgpu, true);
> > +	intel_vgpu_clean_gtt(vgpu);
> > +	intel_vgpu_free_resource(vgpu);
> > +	vgpu->res_initialized = false;
> > +	mutex_lock(&vgpu->gvt->lock);
> > +	intel_gvt_update_vgpu_types(vgpu->gvt);
> > +	mutex_unlock(&vgpu->gvt->lock);
> > +}
> > +
> > +static int
> > +__intel_gvt_create_vgpu(struct intel_vgpu *vgpu,
> > +			bool delay_res_alloc)
> > +{
> > +	struct intel_gvt *gvt = vgpu->gvt;
> > +	struct intel_vgpu_gtt *gtt = &vgpu->gtt;
> > +	int ret;
> > 
> >  	ret = idr_alloc(&gvt->vgpu_idr, vgpu, IDLE_VGPU_IDR + 1,
> > GVT_MAX_VGPU,
> > -		GFP_KERNEL);
> > +			GFP_KERNEL);
> >  	if (ret < 0)
> > -		goto out_free_vgpu;
> > -
> > +		return ret;
> >  	vgpu->id = ret;
> > -	vgpu->handle = param->handle;
> > -	vgpu->gvt = gvt;
> > -	vgpu->sched_ctl.weight = param->weight;
> > +
> >  	mutex_init(&vgpu->vgpu_lock);
> >  	mutex_init(&vgpu->dmabuf_lock);
> >  	INIT_LIST_HEAD(&vgpu->dmabuf_obj_list_head);
> >  	INIT_RADIX_TREE(&vgpu->page_track_tree, GFP_KERNEL);
> >  	idr_init(&vgpu->object_idr);
> > -	intel_vgpu_init_cfg_space(vgpu, param->primary);
> > +	INIT_RADIX_TREE(&gtt->spt_tree, GFP_KERNEL);
> > +	INIT_LIST_HEAD(&gtt->ppgtt_mm_list_head);
> > +	INIT_LIST_HEAD(&gtt->oos_page_list_head);
> > +	INIT_LIST_HEAD(&gtt->post_shadow_list_head);
> > +
> > +	intel_vgpu_init_cfg_space(vgpu, vgpu->param.primary);
> > 
> >  	ret = intel_vgpu_init_mmio(vgpu);
> >  	if (ret)
> >  		goto out_clean_idr;
> > 
> > -	ret = intel_vgpu_alloc_resource(vgpu, param);
> > +	ret = intel_vgpu_alloc_resource_count(vgpu);
> >  	if (ret)
> >  		goto out_clean_vgpu_mmio;
> > 
> > -	populate_pvinfo_page(vgpu);
> > +	if (!delay_res_alloc) {
> > +		ret = intel_vgpu_res_alloc(vgpu);
> > +		if (ret)
> > +			goto out_clean_vgpu_mmio;
> > +	}
> 
> If delayed resource allocation works correctly, why do we still
> need support non-delayed flavor? Even a type doesn't support
> aggregation, it doesn't hurt to do allocation until mdev open...
>

The difference is user expectation, I think, if there's really any
awareness of this. In the original way the allocation is done at create time,
so once create succeeds the resource is guaranteed. But for an aggregation
type, which can be changed before open, the allocation happens at open time,
which is a different scenario, e.g. it might fail because of other instances
or the host. So the original idea was to keep the old behavior and only
change it for aggregation types.

If we think this user expectation is not important, delayed alloc could
help to create the vgpu quickly, but it may have more chance to fail
later...

> > 
> >  	ret = intel_gvt_hypervisor_attach_vgpu(vgpu);
> >  	if (ret)
> >  		goto out_clean_vgpu_resource;
> > 
> > -	ret = intel_vgpu_init_gtt(vgpu);
> > -	if (ret)
> > -		goto out_detach_hypervisor_vgpu;
> > -
> >  	ret = intel_vgpu_init_opregion(vgpu);
> >  	if (ret)
> > -		goto out_clean_gtt;
> > +		goto out_detach_hypervisor_vgpu;
> > 
> > -	ret = intel_vgpu_init_display(vgpu, param->resolution);
> > +	ret = intel_vgpu_init_display(vgpu, vgpu->param.resolution);
> >  	if (ret)
> >  		goto out_clean_opregion;
> > 
> > @@ -438,7 +469,7 @@ static struct intel_vgpu
> > *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
> >  	if (ret)
> >  		goto out_clean_sched_policy;
> > 
> > -	return vgpu;
> > +	return 0;
> > 
> >  out_clean_sched_policy:
> >  	intel_vgpu_clean_sched_policy(vgpu);
> > @@ -448,19 +479,17 @@ static struct intel_vgpu
> > *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
> >  	intel_vgpu_clean_display(vgpu);
> >  out_clean_opregion:
> >  	intel_vgpu_clean_opregion(vgpu);
> > -out_clean_gtt:
> > -	intel_vgpu_clean_gtt(vgpu);
> >  out_detach_hypervisor_vgpu:
> >  	intel_gvt_hypervisor_detach_vgpu(vgpu);
> >  out_clean_vgpu_resource:
> > +	intel_vgpu_clean_gtt(vgpu);
> >  	intel_vgpu_free_resource(vgpu);
> > +	intel_gvt_update_vgpu_types(vgpu->gvt);
> >  out_clean_vgpu_mmio:
> >  	intel_vgpu_clean_mmio(vgpu);
> >  out_clean_idr:
> >  	idr_remove(&gvt->vgpu_idr, vgpu->id);
> > -out_free_vgpu:
> > -	vfree(vgpu);
> > -	return ERR_PTR(ret);
> > +	return ret;
> >  }
> > 
> >  /**
> > @@ -474,33 +503,64 @@ static struct intel_vgpu
> > *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
> >   * pointer to intel_vgpu, error pointer if failed.
> >   */
> >  struct intel_vgpu *intel_gvt_create_vgpu(struct intel_gvt *gvt,
> > -				struct intel_vgpu_type *type)
> > +					 struct intel_vgpu_type *type)
> >  {
> > -	struct intel_vgpu_creation_params param;
> >  	struct intel_vgpu *vgpu;
> > +	struct intel_vgpu_creation_params *param;
> > +	int ret;
> > +
> > +	vgpu = vzalloc(sizeof(*vgpu));
> > +	if (!vgpu)
> > +		return ERR_PTR(-ENOMEM);
> > 
> > -	param.handle = 0;
> > -	param.primary = 1;
> > -	param.low_gm_sz = type->low_gm_size;
> > -	param.high_gm_sz = type->high_gm_size;
> > -	param.fence_sz = type->fence;
> > -	param.weight = type->weight;
> > -	param.resolution = type->resolution;
> > +	param = &vgpu->param;
> > +	param->handle = 0;
> > +	param->primary = 1;
> > +	param->low_gm_sz = type->low_gm_size;
> > +	param->high_gm_sz = type->high_gm_size;
> > +	param->fence_sz = type->fence;
> > +	param->weight = type->weight;
> > +	param->resolution = type->resolution;
> > 
> > -	/* XXX current param based on MB */
> > -	param.low_gm_sz = BYTES_TO_MB(param.low_gm_sz);
> > -	param.high_gm_sz = BYTES_TO_MB(param.high_gm_sz);
> > +	gvt_dbg_core("handle %llu low %llu MB high %llu MB fence %llu\n",
> > +		     param->handle, BYTES_TO_MB(param->low_gm_sz),
> > +		     BYTES_TO_MB(param->high_gm_sz),
> > +		     param->fence_sz);
> > +
> > +	vgpu->handle = param->handle;
> > +	vgpu->gvt = gvt;
> > +	vgpu->sched_ctl.weight = param->weight;
> > +	param->aggregation = 1;
> > 
> >  	mutex_lock(&gvt->lock);
> > -	vgpu = __intel_gvt_create_vgpu(gvt, &param);
> > -	if (!IS_ERR(vgpu))
> > -		/* calculate left instance change for types */
> > -		intel_gvt_update_vgpu_types(gvt);
> > +	ret = __intel_gvt_create_vgpu(vgpu, type->aggregation ? true : false);
> > +	if (ret) {
> > +		mutex_unlock(&gvt->lock);
> > +		vfree(vgpu);
> > +		return ERR_PTR(ret);
> > +	}
> > +	/* calculate left instance change for types */
> > +	intel_gvt_update_vgpu_types(vgpu->gvt);
> >  	mutex_unlock(&gvt->lock);
> > 
> >  	return vgpu;
> >  }
> > 
> > +int intel_vgpu_delayed_alloc(struct intel_vgpu *vgpu)
> > +{
> > +	int ret;
> > +
> > +	intel_vgpu_adjust_resource_count(vgpu);
> > +	ret = intel_vgpu_res_alloc(vgpu);
> > +	if (ret)
> > +		return -EINVAL;
> > +
> > +	mutex_lock(&vgpu->gvt->lock);
> > +	intel_gvt_update_vgpu_types(vgpu->gvt);
> > +	mutex_unlock(&vgpu->gvt->lock);
> > +	return 0;
> > +}
> > +
> >  /**
> >   * intel_gvt_reset_vgpu_locked - reset a virtual GPU by DMLR or GT reset
> >   * @vgpu: virtual GPU
> > --
> > 2.25.1
> 

-- 
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v2 2/2] drm/i915/gvt: mdev aggregation type
  2020-03-27  8:12     ` Zhenyu Wang
@ 2020-03-27  8:44       ` Tian, Kevin
  2020-03-27  8:58         ` Zhenyu Wang
  0 siblings, 1 reply; 34+ messages in thread
From: Tian, Kevin @ 2020-03-27  8:44 UTC (permalink / raw)
  To: Zhenyu Wang; +Cc: alex.williamson, kvm, intel-gvt-dev, Jiang, Dave, Yuan, Hang

> From: Zhenyu Wang <zhenyuw@linux.intel.com>
> Sent: Friday, March 27, 2020 4:12 PM
> 
[...]
> > > +int intel_vgpu_adjust_resource_count(struct intel_vgpu *vgpu)
> > > +{
> > > +	if ((vgpu_aperture_sz(vgpu) != vgpu->param.low_gm_sz *
> > > +	     vgpu->param.aggregation) ||
> > > +	    (vgpu_hidden_sz(vgpu) != vgpu->param.high_gm_sz *
> > > +	     vgpu->param.aggregation)) {
> > > +		/* handle aggregation change */
> > > +		intel_vgpu_free_resource_count(vgpu);
> > > +		intel_vgpu_alloc_resource_count(vgpu);
> >
> > this logic sounds like different from the commit msg. Earlier you
> > said the resource is not allocated until mdev open, while the
> > aggregated_interfaces is only allowed to be written before
> > mdev open. In such case, why would it required to handle the
> > case where a vgpu already has resource allocated then coming
> > a new request to adjust the number of instances?
> 
> This is to handle resource accounting before mdev open by aggregation
> setting change. When vgpu create from mdev type, default resource
> count according to type is set for vgpu. So this function is just to
> change resource count by aggregation.

then better to change the name, e.g. .xxx_adjust_resource_accounting;
otherwise it's easy to get confused.

[...]
> > >  	if (ret)
> > >  		goto out_clean_vgpu_mmio;
> > >
> > > -	populate_pvinfo_page(vgpu);
> > > +	if (!delay_res_alloc) {
> > > +		ret = intel_vgpu_res_alloc(vgpu);
> > > +		if (ret)
> > > +			goto out_clean_vgpu_mmio;
> > > +	}
> >
> > If delayed resource allocation works correctly, why do we still
> > need support non-delayed flavor? Even a type doesn't support
> > aggregation, it doesn't hurt to do allocation until mdev open...
> >
> 
> The difference is user expectation I think, if there's really
> awareness of this. As original way is to allocate at creat time, so
> once created success, resource is guaranteed. But for aggregation type
> which could be changed before open, alloc happens at that time which
> may have different scenario, e.g might fail caused by other instance
> or host. So original idea is to keep old behavior but only change for
> aggregation type.

but how could one expect any difference between instant allocation
and delayed allocation? You already update the resource accounting, so
the remaining instances are accurate anyway. Then the user only knows
what the vgpu looks like when it is opened...

> 
> If we think this user expectation is not important, delayed alloc
> could help to create vgpu quickly but may have more chance to fail
> later..
> 

why? If delayed allocation has more chance to fail, it means our
resource accounting has a problem. Even for a type w/o aggregation
capability, we should reserve one instance's resources by default anyway

Thanks
Kevin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/gvt: mdev aggregation type
  2020-03-27  8:44       ` Tian, Kevin
@ 2020-03-27  8:58         ` Zhenyu Wang
  2020-03-27  9:31           ` Tian, Kevin
  0 siblings, 1 reply; 34+ messages in thread
From: Zhenyu Wang @ 2020-03-27  8:58 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: alex.williamson, kvm, intel-gvt-dev, Jiang, Dave, Yuan, Hang

[-- Attachment #1: Type: text/plain, Size: 3416 bytes --]

On 2020.03.27 08:44:59 +0000, Tian, Kevin wrote:
> > From: Zhenyu Wang <zhenyuw@linux.intel.com>
> > Sent: Friday, March 27, 2020 4:12 PM
> > 
> [...]
> > > > +int intel_vgpu_adjust_resource_count(struct intel_vgpu *vgpu)
> > > > +{
> > > > +	if ((vgpu_aperture_sz(vgpu) != vgpu->param.low_gm_sz *
> > > > +	     vgpu->param.aggregation) ||
> > > > +	    (vgpu_hidden_sz(vgpu) != vgpu->param.high_gm_sz *
> > > > +	     vgpu->param.aggregation)) {
> > > > +		/* handle aggregation change */
> > > > +		intel_vgpu_free_resource_count(vgpu);
> > > > +		intel_vgpu_alloc_resource_count(vgpu);
> > >
> > > this logic sounds like different from the commit msg. Earlier you
> > > said the resource is not allocated until mdev open, while the
> > > aggregated_interfaces is only allowed to be written before
> > > mdev open. In such case, why would it required to handle the
> > > case where a vgpu already has resource allocated then coming
> > > a new request to adjust the number of instances?
> > 
> > This is to handle resource accounting before mdev open by aggregation
> > setting change. When vgpu create from mdev type, default resource
> > count according to type is set for vgpu. So this function is just to
> > change resource count by aggregation.
> 
> then better change the name, e.g. .xxx_adjust_resource_accounting,
> otherwise it's easy to be confused.
>

ok

> [...]
> > > >  	if (ret)
> > > >  		goto out_clean_vgpu_mmio;
> > > >
> > > > -	populate_pvinfo_page(vgpu);
> > > > +	if (!delay_res_alloc) {
> > > > +		ret = intel_vgpu_res_alloc(vgpu);
> > > > +		if (ret)
> > > > +			goto out_clean_vgpu_mmio;
> > > > +	}
> > >
> > > If delayed resource allocation works correctly, why do we still
> > > need support non-delayed flavor? Even a type doesn't support
> > > aggregation, it doesn't hurt to do allocation until mdev open...
> > >
> > 
> > The difference is user expectation I think, if there's really
> > awareness of this. As original way is to allocate at creat time, so
> > once created success, resource is guaranteed. But for aggregation type
> > which could be changed before open, alloc happens at that time which
> > may have different scenario, e.g might fail caused by other instance
> > or host. So original idea is to keep old behavior but only change for
> > aggregation type.
> 
> but how could one expect any difference between instant allocation
> and delayed allocation? You already update resource accounting so
> the remaining instances are accurate anyway. Then the user only knows
> how the vgpu looks like when it is opened...
> 
> > 
> > If we think this user expectation is not important, delayed alloc
> > could help to create vgpu quickly but may have more chance to fail
> > later..
> > 
> 
> why? If delayed allocation has more chance to fail, it means our
> resource accounting has problem. Even for type w/o aggregation
> capability, we should reserve one instance resource by default anyway
> 

Under a really heavy host load, with many other vgpus running, we
might not have contiguous gfx mem space left... This is not a new problem;
it's just that today we handle it at vgpu create time by reserving the
resource. Once the host side can promise some limit, then our usage
will be guaranteed.

-- 
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v2 2/2] drm/i915/gvt: mdev aggregation type
  2020-03-27  8:58         ` Zhenyu Wang
@ 2020-03-27  9:31           ` Tian, Kevin
  0 siblings, 0 replies; 34+ messages in thread
From: Tian, Kevin @ 2020-03-27  9:31 UTC (permalink / raw)
  To: Zhenyu Wang; +Cc: alex.williamson, kvm, intel-gvt-dev, Jiang, Dave, Yuan, Hang

> From: Zhenyu Wang <zhenyuw@linux.intel.com>
> Sent: Friday, March 27, 2020 4:58 PM
> 
> On 2020.03.27 08:44:59 +0000, Tian, Kevin wrote:
> > > From: Zhenyu Wang <zhenyuw@linux.intel.com>
> > > Sent: Friday, March 27, 2020 4:12 PM
> > >
> > [...]
> > > > > +int intel_vgpu_adjust_resource_count(struct intel_vgpu *vgpu)
> > > > > +{
> > > > > +	if ((vgpu_aperture_sz(vgpu) != vgpu->param.low_gm_sz *
> > > > > +	     vgpu->param.aggregation) ||
> > > > > +	    (vgpu_hidden_sz(vgpu) != vgpu->param.high_gm_sz *
> > > > > +	     vgpu->param.aggregation)) {
> > > > > +		/* handle aggregation change */
> > > > > +		intel_vgpu_free_resource_count(vgpu);
> > > > > +		intel_vgpu_alloc_resource_count(vgpu);
> > > >
> > > > this logic sounds like different from the commit msg. Earlier you
> > > > said the resource is not allocated until mdev open, while the
> > > > aggregated_interfaces is only allowed to be written before
> > > > mdev open. In such case, why would it required to handle the
> > > > case where a vgpu already has resource allocated then coming
> > > > a new request to adjust the number of instances?
> > >
> > > This is to handle resource accounting before mdev open by aggregation
> > > setting change. When vgpu create from mdev type, default resource
> > > count according to type is set for vgpu. So this function is just to
> > > change resource count by aggregation.
> >
> > then better change the name, e.g. .xxx_adjust_resource_accounting,
> > otherwise it's easy to be confused.
> >
> 
> ok
> 
> > [...]
> > > > >  	if (ret)
> > > > >  		goto out_clean_vgpu_mmio;
> > > > >
> > > > > -	populate_pvinfo_page(vgpu);
> > > > > +	if (!delay_res_alloc) {
> > > > > +		ret = intel_vgpu_res_alloc(vgpu);
> > > > > +		if (ret)
> > > > > +			goto out_clean_vgpu_mmio;
> > > > > +	}
> > > >
> > > > If delayed resource allocation works correctly, why do we still
> > > > need support non-delayed flavor? Even a type doesn't support
> > > > aggregation, it doesn't hurt to do allocation until mdev open...
> > > >
> > >
> > > The difference is user expectation I think, if there's really
> > > awareness of this. As original way is to allocate at creat time, so
> > > once created success, resource is guaranteed. But for aggregation type
> > > which could be changed before open, alloc happens at that time which
> > > may have different scenario, e.g might fail caused by other instance
> > > or host. So original idea is to keep old behavior but only change for
> > > aggregation type.
> >
> > but how could one expect any difference between instant allocation
> > and delayed allocation? You already update resource accounting so
> > the remaining instances are accurate anyway. Then the user only knows
> > how the vgpu looks like when it is opened...
> >
> > >
> > > If we think this user expectation is not important, delayed alloc
> > > could help to create vgpu quickly but may have more chance to fail
> > > later..
> > >
> >
> > why? If delayed allocation has more chance to fail, it means our
> > resource accounting has problem. Even for type w/o aggregation
> > capability, we should reserve one instance resource by default anyway
> >
> 
> If under really heavy load of host and many other vgpu running, we
> might not have left continual gfx mem space..This is not new problem,
> just that now we handle it at vgpu create time to reserve the
> resource. Once host side could promise some limit, then our usage
> will be guaranteed.
> 

heavy load doesn't mean that you have a higher chance of securing the
resource at an earlier point. It's completely nondeterministic
whether the situation gets worse or better. In such a case I don't think
there is any real difference between instant and delayed allocation,
while unifying on delayed allocation could simplify our code. 😊

Thanks
Kevin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 0/2] VFIO mdev aggregated resources handling
  2020-03-26  5:41 [PATCH v2 0/2] VFIO mdev aggregated resources handling Zhenyu Wang
  2020-03-26  5:41 ` [PATCH v2 1/2] Documentation/driver-api/vfio-mediated-device.rst: update for aggregation support Zhenyu Wang
  2020-03-26  5:41 ` [PATCH v2 2/2] drm/i915/gvt: mdev aggregation type Zhenyu Wang
@ 2020-04-08  5:58 ` Zhenyu Wang
  2020-07-07 23:28   ` Tian, Kevin
  2020-04-08  5:58 ` [PATCH v3 1/2] Documentation/driver-api/vfio-mediated-device.rst: update for aggregation support Zhenyu Wang
  2020-04-08  5:58 ` [PATCH v3 2/2] drm/i915/gvt: mdev aggregation type Zhenyu Wang
  4 siblings, 1 reply; 34+ messages in thread
From: Zhenyu Wang @ 2020-04-08  5:58 UTC (permalink / raw)
  To: alex.williamson; +Cc: kevin.tian, intel-gvt-dev, kvm

Hi,

This is a refresh on previous series: https://patchwork.kernel.org/cover/11208279/
and https://patchwork.freedesktop.org/series/70425/

Current mdev device create interface depends on fixed mdev type, which
get uuid from user to create instance of mdev device. If user wants to
use customized number of resource for mdev device, then only can
create new mdev type for that which may not be flexible. This
requirement comes not only from to be able to allocate flexible
resources for KVMGT, but also from Intel scalable IO virtualization
which would use vfio/mdev to be able to allocate arbitrary resources
on mdev instance. More info on [1] [2] [3].

As we agreed that for current opaque mdev device type, we'd still
explore management interface based on mdev sysfs definition. And this
one tries to follow Alex's previous suggestion to create generic
parameters under 'mdev' directory for each device, so vendor driver
could provide support like as other defined mdev sysfs entries.

For mdev type with aggregation support, files as "aggregated_instances"
and "max_aggregation" should be created under 'mdev' directory. E.g

/sys/devices/pci0000:00/0000:00:02.0/<UUID>/mdev/
   |-- aggregated_instances
   |-- max_aggregation

"aggregated_instances" is used to set or return current number of
instances for aggregation, which can not be larger than "max_aggregation".
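
For illustration, a possible flow looks like below (the parent device
path, type name and values here are examples only, not from a real
setup):

    # create an mdev instance from an aggregation-capable type
    UUID=$(uuidgen)
    echo $UUID > /sys/devices/pci0000:00/0000:00:02.0/mdev_supported_types/i915-GVTg_V4_8/create

    # check how many instances this device may aggregate at most
    cat /sys/devices/pci0000:00/0000:00:02.0/$UUID/mdev/max_aggregation

    # aggregate 4 instances worth of resources, before the device is opened
    echo 4 > /sys/devices/pci0000:00/0000:00:02.0/$UUID/mdev/aggregated_instances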

The first patch is to update the document for new mdev parameter directory.
The second one is to add aggregation support in GVT driver.

References:
[1] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
[2] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
[3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf

Changelog:
v3:
- add more description for sysfs entries
- rebase GVT support
- rename accounting function

Zhenyu Wang (2):
  Documentation/driver-api/vfio-mediated-device.rst: update for
    aggregation support
  drm/i915/gvt: mdev aggregation type

 .../driver-api/vfio-mediated-device.rst       |  22 +++
 drivers/gpu/drm/i915/gvt/aperture_gm.c        |  44 +++--
 drivers/gpu/drm/i915/gvt/gtt.c                |   9 +-
 drivers/gpu/drm/i915/gvt/gvt.c                |   7 +-
 drivers/gpu/drm/i915/gvt/gvt.h                |  42 +++--
 drivers/gpu/drm/i915/gvt/kvmgt.c              | 115 +++++++++++-
 drivers/gpu/drm/i915/gvt/vgpu.c               | 172 ++++++++++++------
 7 files changed, 317 insertions(+), 94 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 1/2] Documentation/driver-api/vfio-mediated-device.rst: update for aggregation support
  2020-03-26  5:41 [PATCH v2 0/2] VFIO mdev aggregated resources handling Zhenyu Wang
                   ` (2 preceding siblings ...)
  2020-04-08  5:58 ` [PATCH v3 0/2] VFIO mdev aggregated resources handling Zhenyu Wang
@ 2020-04-08  5:58 ` Zhenyu Wang
  2020-04-08  5:58 ` [PATCH v3 2/2] drm/i915/gvt: mdev aggregation type Zhenyu Wang
  4 siblings, 0 replies; 34+ messages in thread
From: Zhenyu Wang @ 2020-04-08  5:58 UTC (permalink / raw)
  To: alex.williamson; +Cc: kevin.tian, intel-gvt-dev, kvm, Jiang, Dave

Update the doc for mdev aggregation support. Describe the generic 'mdev'
parameter directory under each mdev device directory.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: "Jiang, Dave" <dave.jiang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
---
 .../driver-api/vfio-mediated-device.rst       | 22 +++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
index 25eb7d5b834b..fcc031adcf63 100644
--- a/Documentation/driver-api/vfio-mediated-device.rst
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -269,6 +269,9 @@ Directories and Files Under the sysfs for Each mdev Device
   |--- [$MDEV_UUID]
          |--- remove
          |--- mdev_type {link to its type}
+         |--- mdev [optional]
+	     |--- aggregated_instances [optional]
+	     |--- max_aggregation [optional]
          |--- vendor-specific-attributes [optional]
 
 * remove (write only)
@@ -281,6 +284,25 @@ Example::
 
 	# echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove
 
+* mdev directory (optional)
+
+The vendor driver can create the "mdev" directory to expose extra generic
+parameters for an mdev device, depending on its type. Currently the
+aggregation parameters are defined; a supporting driver should provide both.
+
+1) aggregated_instances (read/write)
+
+Set the target number of aggregated instances for the device. Reading
+returns the current count of aggregated instances. Writing a value larger
+than max_aggregation fails with an error. Multiple writes may be used to
+adjust the setting, as long as max_aggregation is not exceeded. Writes
+normally fail after the device has been opened.
+
+2) max_aggregation (read only)
+
+Show the maximum number of instances that can be aggregated for this device.
+The maximum aggregation may be changed dynamically by the vendor driver.
+
 Mediated device Hot plug
 ------------------------
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v3 2/2] drm/i915/gvt: mdev aggregation type
  2020-03-26  5:41 [PATCH v2 0/2] VFIO mdev aggregated resources handling Zhenyu Wang
                   ` (3 preceding siblings ...)
  2020-04-08  5:58 ` [PATCH v3 1/2] Documentation/driver-api/vfio-mediated-device.rst: update for aggregation support Zhenyu Wang
@ 2020-04-08  5:58 ` Zhenyu Wang
  4 siblings, 0 replies; 34+ messages in thread
From: Zhenyu Wang @ 2020-04-08  5:58 UTC (permalink / raw)
  To: alex.williamson; +Cc: kevin.tian, intel-gvt-dev, kvm, Jiang, Dave, Yuan, Hang

To increase the flexibility of resource allocation on mdev types, this
tries to add an aggregation attribute for mdev types that can aggregate
gfx memory resources in the GVT case.

The attribute is exposed through sysfs. The main difference for GVT is
that the gfx resource allocation is delayed until mdev open instead of
mdev create, to allow the aggregation setting to be changed before VM
start. Resource accounting is still handled before mdev open.

"max_aggregation" shows the maximum number of instances that can be
aggregated on the target vGPU instance. "aggregated_instances" shows the
current count of aggregated instances. "max_aggregation" is read-only;
the user sets the needed count by writing to "aggregated_instances".
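
A possible usage flow with delayed allocation would be (paths and
values are illustrative only):

    # create from an aggregation-capable type; gfx memory is only
    # accounted at this point, not yet allocated
    echo $UUID > .../mdev_supported_types/i915-GVTg_V4_8/create

    # adjust to 2 instances worth of resources while the device is
    # still closed
    echo 2 > /sys/bus/mdev/devices/$UUID/mdev/aggregated_instances

    # the actual gfx memory allocation happens at mdev open (e.g. VM
    # start) and may fail there if the host cannot satisfy the request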

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: "Jiang, Dave" <dave.jiang@intel.com>
Cc: "Yuan, Hang" <hang.yuan@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
---
 drivers/gpu/drm/i915/gvt/aperture_gm.c |  44 +++++--
 drivers/gpu/drm/i915/gvt/gtt.c         |   9 +-
 drivers/gpu/drm/i915/gvt/gvt.c         |   7 +-
 drivers/gpu/drm/i915/gvt/gvt.h         |  42 ++++--
 drivers/gpu/drm/i915/gvt/kvmgt.c       | 115 ++++++++++++++++-
 drivers/gpu/drm/i915/gvt/vgpu.c        | 172 +++++++++++++++++--------
 6 files changed, 295 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/aperture_gm.c b/drivers/gpu/drm/i915/gvt/aperture_gm.c
index 0d6d59871308..773a699fd9ed 100644
--- a/drivers/gpu/drm/i915/gvt/aperture_gm.c
+++ b/drivers/gpu/drm/i915/gvt/aperture_gm.c
@@ -238,10 +238,10 @@ static void free_resource(struct intel_vgpu *vgpu)
 	gvt->fence.vgpu_allocated_fence_num -= vgpu_fence_sz(vgpu);
 }
 
-static int alloc_resource(struct intel_vgpu *vgpu,
-		struct intel_vgpu_creation_params *param)
+static int alloc_resource(struct intel_vgpu *vgpu)
 {
 	struct intel_gvt *gvt = vgpu->gvt;
+	struct intel_vgpu_creation_params *param = &vgpu->param;
 	unsigned long request, avail, max, taken;
 	const char *item;
 
@@ -254,7 +254,7 @@ static int alloc_resource(struct intel_vgpu *vgpu,
 	max = gvt_aperture_sz(gvt) - HOST_LOW_GM_SIZE;
 	taken = gvt->gm.vgpu_allocated_low_gm_size;
 	avail = max - taken;
-	request = MB_TO_BYTES(param->low_gm_sz);
+	request = param->low_gm_sz * param->aggregation;
 
 	if (request > avail)
 		goto no_enough_resource;
@@ -265,7 +265,7 @@ static int alloc_resource(struct intel_vgpu *vgpu,
 	max = gvt_hidden_sz(gvt) - HOST_HIGH_GM_SIZE;
 	taken = gvt->gm.vgpu_allocated_high_gm_size;
 	avail = max - taken;
-	request = MB_TO_BYTES(param->high_gm_sz);
+	request = param->high_gm_sz * param->aggregation;
 
 	if (request > avail)
 		goto no_enough_resource;
@@ -283,8 +283,8 @@ static int alloc_resource(struct intel_vgpu *vgpu,
 
 	vgpu_fence_sz(vgpu) = request;
 
-	gvt->gm.vgpu_allocated_low_gm_size += MB_TO_BYTES(param->low_gm_sz);
-	gvt->gm.vgpu_allocated_high_gm_size += MB_TO_BYTES(param->high_gm_sz);
+	gvt->gm.vgpu_allocated_low_gm_size += param->low_gm_sz * param->aggregation;
+	gvt->gm.vgpu_allocated_high_gm_size += param->high_gm_sz * param->aggregation;
 	gvt->fence.vgpu_allocated_fence_num += param->fence_sz;
 	return 0;
 
@@ -307,9 +307,34 @@ void intel_vgpu_free_resource(struct intel_vgpu *vgpu)
 {
 	free_vgpu_gm(vgpu);
 	free_vgpu_fence(vgpu);
+}
+
+void intel_vgpu_free_resource_count(struct intel_vgpu *vgpu)
+{
 	free_resource(vgpu);
 }
 
+int intel_vgpu_alloc_resource_count(struct intel_vgpu *vgpu)
+{
+	return alloc_resource(vgpu);
+}
+
+int intel_vgpu_adjust_resource_acounting(struct intel_vgpu *vgpu)
+{
+	if ((vgpu_aperture_sz(vgpu) != vgpu->param.low_gm_sz *
+	     vgpu->param.aggregation) ||
+	    (vgpu_hidden_sz(vgpu) != vgpu->param.high_gm_sz *
+	     vgpu->param.aggregation)) {
+		/* handle aggregation change */
+		intel_vgpu_free_resource_count(vgpu);
+		intel_vgpu_alloc_resource_count(vgpu);
+		mutex_lock(&vgpu->gvt->lock);
+		intel_gvt_update_vgpu_types(vgpu->gvt);
+		mutex_unlock(&vgpu->gvt->lock);
+	}
+	return 0;
+}
+
 /**
  * intel_vgpu_reset_resource - reset resource state owned by a vGPU
  * @vgpu: a vGPU
@@ -338,15 +363,10 @@ void intel_vgpu_reset_resource(struct intel_vgpu *vgpu)
  * zero on success, negative error code if failed.
  *
  */
-int intel_vgpu_alloc_resource(struct intel_vgpu *vgpu,
-		struct intel_vgpu_creation_params *param)
+int intel_vgpu_alloc_resource(struct intel_vgpu *vgpu)
 {
 	int ret;
 
-	ret = alloc_resource(vgpu, param);
-	if (ret)
-		return ret;
-
 	ret = alloc_vgpu_gm(vgpu);
 	if (ret)
 		goto out_free_resource;
diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 2a4b23f8aa74..60f455134ed0 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -2466,12 +2466,6 @@ int intel_vgpu_init_gtt(struct intel_vgpu *vgpu)
 {
 	struct intel_vgpu_gtt *gtt = &vgpu->gtt;
 
-	INIT_RADIX_TREE(&gtt->spt_tree, GFP_KERNEL);
-
-	INIT_LIST_HEAD(&gtt->ppgtt_mm_list_head);
-	INIT_LIST_HEAD(&gtt->oos_page_list_head);
-	INIT_LIST_HEAD(&gtt->post_shadow_list_head);
-
 	gtt->ggtt_mm = intel_vgpu_create_ggtt_mm(vgpu);
 	if (IS_ERR(gtt->ggtt_mm)) {
 		gvt_vgpu_err("fail to create mm for ggtt.\n");
@@ -2508,6 +2502,9 @@ static void intel_vgpu_destroy_ggtt_mm(struct intel_vgpu *vgpu)
 {
 	struct intel_gvt_partial_pte *pos, *next;
 
+	if (!vgpu->gtt.ggtt_mm)
+		return;
+
 	list_for_each_entry_safe(pos, next,
 				 &vgpu->gtt.ggtt_mm->ggtt_mm.partial_pte_list,
 				 list) {
diff --git a/drivers/gpu/drm/i915/gvt/gvt.c b/drivers/gpu/drm/i915/gvt/gvt.c
index 9e1787867894..074139e165d3 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.c
+++ b/drivers/gpu/drm/i915/gvt/gvt.c
@@ -99,11 +99,11 @@ static ssize_t description_show(struct kobject *kobj, struct device *dev,
 
 	return sprintf(buf, "low_gm_size: %dMB\nhigh_gm_size: %dMB\n"
 		       "fence: %d\nresolution: %s\n"
-		       "weight: %d\n",
+		       "weight: %d\naggregation: %s\n",
 		       BYTES_TO_MB(type->low_gm_size),
 		       BYTES_TO_MB(type->high_gm_size),
 		       type->fence, vgpu_edid_str(type->resolution),
-		       type->weight);
+		       type->weight, type->aggregation ? "yes" : "no");
 }
 
 static MDEV_TYPE_ATTR_RO(available_instances);
@@ -185,6 +185,9 @@ static const struct intel_gvt_ops intel_gvt_ops = {
 	.vgpu_get_dmabuf = intel_vgpu_get_dmabuf,
 	.write_protect_handler = intel_vgpu_page_track_handler,
 	.emulate_hotplug = intel_vgpu_emulate_hotplug,
+	.vgpu_delayed_alloc = intel_vgpu_delayed_alloc,
+	.vgpu_res_free = intel_vgpu_res_free,
+	.vgpu_res_change = intel_vgpu_adjust_resource_acounting,
 };
 
 static void init_device_info(struct intel_gvt *gvt)
diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index 58c2c7932e3f..e0f2a81b4495 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -165,15 +165,30 @@ struct intel_vgpu_submission {
 	bool active;
 };
 
+struct intel_vgpu_creation_params {
+	__u64 handle;
+	__u64 low_gm_sz;
+	__u64 high_gm_sz;
+	__u64 fence_sz;
+	__u64 resolution;
+	__s32 primary;
+	__u64 vgpu_id;
+	__u32 weight;
+	__u32 aggregation; /* requested aggregation number if type supports */
+};
+
 struct intel_vgpu {
 	struct intel_gvt *gvt;
 	struct mutex vgpu_lock;
 	int id;
 	unsigned long handle; /* vGPU handle used by hypervisor MPT modules */
 	bool active;
+	bool res_initialized;
 	bool pv_notified;
 	bool failsafe;
 	unsigned int resetting_eng;
+	struct intel_vgpu_creation_params param;
+	struct intel_vgpu_type *type;
 
 	/* Both sched_data and sched_ctl can be seen a part of the global gvt
 	 * scheduler structure. So below 2 vgpu data are protected
@@ -276,6 +291,7 @@ struct intel_vgpu_type {
 	unsigned int fence;
 	unsigned int weight;
 	enum intel_vgpu_edid resolution;
+	bool aggregation;
 };
 
 struct intel_gvt {
@@ -402,22 +418,12 @@ int intel_gvt_load_firmware(struct intel_gvt *gvt);
 #define vgpu_fence_base(vgpu) (vgpu->fence.base)
 #define vgpu_fence_sz(vgpu) (vgpu->fence.size)
 
-struct intel_vgpu_creation_params {
-	__u64 handle;
-	__u64 low_gm_sz;  /* in MB */
-	__u64 high_gm_sz; /* in MB */
-	__u64 fence_sz;
-	__u64 resolution;
-	__s32 primary;
-	__u64 vgpu_id;
-
-	__u32 weight;
-};
-
-int intel_vgpu_alloc_resource(struct intel_vgpu *vgpu,
-			      struct intel_vgpu_creation_params *param);
+int intel_vgpu_alloc_resource(struct intel_vgpu *vgpu);
 void intel_vgpu_reset_resource(struct intel_vgpu *vgpu);
 void intel_vgpu_free_resource(struct intel_vgpu *vgpu);
+int intel_vgpu_alloc_resource_count(struct intel_vgpu *vgpu);
+void intel_vgpu_free_resource_count(struct intel_vgpu *vgpu);
+int intel_vgpu_adjust_resource_acounting(struct intel_vgpu *vgpu);
 void intel_vgpu_write_fence(struct intel_vgpu *vgpu,
 	u32 fence, u64 value);
 
@@ -458,11 +464,13 @@ static inline void intel_vgpu_write_pci_bar(struct intel_vgpu *vgpu,
 
 int intel_gvt_init_vgpu_types(struct intel_gvt *gvt);
 void intel_gvt_clean_vgpu_types(struct intel_gvt *gvt);
+void intel_gvt_update_vgpu_types(struct intel_gvt *gvt);
 
 struct intel_vgpu *intel_gvt_create_idle_vgpu(struct intel_gvt *gvt);
 void intel_gvt_destroy_idle_vgpu(struct intel_vgpu *vgpu);
 struct intel_vgpu *intel_gvt_create_vgpu(struct intel_gvt *gvt,
 					 struct intel_vgpu_type *type);
+int intel_vgpu_delayed_alloc(struct intel_vgpu *vgpu);
 void intel_gvt_destroy_vgpu(struct intel_vgpu *vgpu);
 void intel_gvt_release_vgpu(struct intel_vgpu *vgpu);
 void intel_gvt_reset_vgpu_locked(struct intel_vgpu *vgpu, bool dmlr,
@@ -523,6 +531,9 @@ static inline u64 intel_vgpu_get_bar_gpa(struct intel_vgpu *vgpu, int bar)
 			PCI_BASE_ADDRESS_MEM_MASK;
 }
 
+int intel_vgpu_res_alloc(struct intel_vgpu *vgpu);
+void intel_vgpu_res_free(struct intel_vgpu *vgpu);
+
 void intel_vgpu_clean_opregion(struct intel_vgpu *vgpu);
 int intel_vgpu_init_opregion(struct intel_vgpu *vgpu);
 int intel_vgpu_opregion_base_write_handler(struct intel_vgpu *vgpu, u32 gpa);
@@ -557,6 +568,9 @@ struct intel_gvt_ops {
 	int (*write_protect_handler)(struct intel_vgpu *, u64, void *,
 				     unsigned int);
 	void (*emulate_hotplug)(struct intel_vgpu *vgpu, bool connected);
+	int (*vgpu_delayed_alloc)(struct intel_vgpu *vgpu);
+	void (*vgpu_res_free)(struct intel_vgpu *vgpu);
+	int (*vgpu_res_change)(struct intel_vgpu *vgpu);
 };
 
 
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index eee530453aa6..88e5ef7c698c 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -714,6 +714,7 @@ static int intel_vgpu_create(struct kobject *kobj, struct mdev_device *mdev)
 		gvt_err("failed to create intel vgpu: %d\n", ret);
 		goto out;
 	}
+	vgpu->type = type;
 
 	INIT_WORK(&kvmgt_vdev(vgpu)->release_work, intel_vgpu_release_work);
 
@@ -797,6 +798,8 @@ static int intel_vgpu_open(struct mdev_device *mdev)
 	int ret;
 	struct vfio_group *vfio_group;
 
+	mutex_lock(&vgpu->vgpu_lock);
+
 	vdev->iommu_notifier.notifier_call = intel_vgpu_iommu_notifier;
 	vdev->group_notifier.notifier_call = intel_vgpu_group_notifier;
 
@@ -826,21 +829,32 @@ static int intel_vgpu_open(struct mdev_device *mdev)
 	}
 	vdev->vfio_group = vfio_group;
 
+	if (vgpu->type->aggregation && vgpu->param.aggregation) {
+		ret = intel_gvt_ops->vgpu_delayed_alloc(vgpu);
+		if (ret)
+			goto undo_group;
+	}
+
 	/* Take a module reference as mdev core doesn't take
 	 * a reference for vendor driver.
 	 */
 	if (!try_module_get(THIS_MODULE))
-		goto undo_group;
+		goto free_res;
 
 	ret = kvmgt_guest_init(mdev);
-	if (ret)
-		goto undo_group;
+	if (ret) {
+		module_put(THIS_MODULE);
+		goto free_res;
+	}
 
 	intel_gvt_ops->vgpu_activate(vgpu);
 
 	atomic_set(&vdev->released, 0);
-	return ret;
+	goto out;
 
+free_res:
+	if (vgpu->type->aggregation && vgpu->param.aggregation)
+		intel_gvt_ops->vgpu_res_free(vgpu);
 undo_group:
 	vfio_group_put_external_user(vdev->vfio_group);
 	vdev->vfio_group = NULL;
@@ -853,6 +867,7 @@ static int intel_vgpu_open(struct mdev_device *mdev)
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
 					&vdev->iommu_notifier);
 out:
+	mutex_unlock(&vgpu->vgpu_lock);
 	return ret;
 }
 
@@ -1645,8 +1660,100 @@ static const struct attribute_group intel_vgpu_group = {
 	.attrs = intel_vgpu_attrs,
 };
 
+/*
+ * "max_aggregation" display maxium instances for aggregation,
+ * if type doesn't support aggregation, it displays 0
+ */
+static ssize_t
+max_aggregation_show(struct device *dev, struct device_attribute *attr,
+		     char *buf)
+{
+	struct mdev_device *mdev = mdev_from_dev(dev);
+
+	if (mdev) {
+		struct intel_vgpu *vgpu = (struct intel_vgpu *)
+			mdev_get_drvdata(mdev);
+		struct intel_vgpu_type *type = vgpu->type;
+		/* return remaining avail + currently allocated as the maximum */
+		unsigned int m = type->avail_instance +
+			vgpu->param.aggregation;
+		if (type->aggregation)
+			return sprintf(buf, "%u\n", m);
+	}
+	return sprintf(buf, "0\n");
+}
+static DEVICE_ATTR_RO(max_aggregation);
+
+static ssize_t
+aggregated_instances_store(struct device *dev, struct device_attribute *attr,
+			   const char *buf, size_t count)
+{
+	struct mdev_device *mdev = mdev_from_dev(dev);
+	unsigned long val;
+	ssize_t ret = -EINVAL;
+
+	if (kstrtoul(buf, 0, &val) < 0)
+		return ret;
+
+	if (val > 0 && mdev) {
+		struct intel_vgpu *vgpu = (struct intel_vgpu *)
+			mdev_get_drvdata(mdev);
+		struct intel_vgpu_type *type = vgpu->type;
+
+		if (val == vgpu->param.aggregation)
+			return count;
+
+		mutex_lock(&vgpu->vgpu_lock);
+		if (vgpu->active) {
+			mutex_unlock(&vgpu->vgpu_lock);
+			return ret;
+		}
+		/*
+		 * The value must not exceed the maximum aggregation count,
+		 * i.e. current aggregated count plus available instances.
+		 */
+		if (type->aggregation &&
+		    val <= (vgpu->param.aggregation + type->avail_instance)) {
+			vgpu->param.aggregation = val;
+			intel_gvt_ops->vgpu_res_change(vgpu);
+			ret = count;
+		}
+		mutex_unlock(&vgpu->vgpu_lock);
+	}
+	return ret;
+}
+
+static ssize_t
+aggregated_instances_show(struct device *dev, struct device_attribute *attr,
+		 char *buf)
+{
+	struct mdev_device *mdev = mdev_from_dev(dev);
+
+	if (mdev) {
+		struct intel_vgpu *vgpu = (struct intel_vgpu *)
+			mdev_get_drvdata(mdev);
+		struct intel_vgpu_type *type = vgpu->type;
+		if (type->aggregation)
+			return sprintf(buf, "%u\n", vgpu->param.aggregation);
+	}
+	return sprintf(buf, "0\n");
+}
+static DEVICE_ATTR_RW(aggregated_instances);
+
+static struct attribute *mdev_attrs[] = {
+	&dev_attr_max_aggregation.attr,
+	&dev_attr_aggregated_instances.attr,
+	NULL
+};
+
+static const struct attribute_group mdev_group = {
+	.name = "mdev",
+	.attrs = mdev_attrs,
+};
+
 static const struct attribute_group *intel_vgpu_groups[] = {
 	&intel_vgpu_group,
+	&mdev_group,
 	NULL,
 };
 
diff --git a/drivers/gpu/drm/i915/gvt/vgpu.c b/drivers/gpu/drm/i915/gvt/vgpu.c
index 1d5ff88078bd..23c0bab11129 100644
--- a/drivers/gpu/drm/i915/gvt/vgpu.c
+++ b/drivers/gpu/drm/i915/gvt/vgpu.c
@@ -89,12 +89,13 @@ static struct {
 	unsigned int weight;
 	enum intel_vgpu_edid edid;
 	char *name;
+	bool aggregation;
 } vgpu_types[] = {
 /* Fixed vGPU type table */
-	{ MB_TO_BYTES(64), MB_TO_BYTES(384), 4, VGPU_WEIGHT(8), GVT_EDID_1024_768, "8" },
-	{ MB_TO_BYTES(128), MB_TO_BYTES(512), 4, VGPU_WEIGHT(4), GVT_EDID_1920_1200, "4" },
-	{ MB_TO_BYTES(256), MB_TO_BYTES(1024), 4, VGPU_WEIGHT(2), GVT_EDID_1920_1200, "2" },
-	{ MB_TO_BYTES(512), MB_TO_BYTES(2048), 4, VGPU_WEIGHT(1), GVT_EDID_1920_1200, "1" },
+	{ MB_TO_BYTES(64), MB_TO_BYTES(384), 4, VGPU_WEIGHT(8), GVT_EDID_1024_768, "8", true },
+	{ MB_TO_BYTES(128), MB_TO_BYTES(512), 4, VGPU_WEIGHT(4), GVT_EDID_1920_1200, "4", false },
+	{ MB_TO_BYTES(256), MB_TO_BYTES(1024), 4, VGPU_WEIGHT(2), GVT_EDID_1920_1200, "2", false },
+	{ MB_TO_BYTES(512), MB_TO_BYTES(2048), 4, VGPU_WEIGHT(1), GVT_EDID_1920_1200, "1", false },
 };
 
 /**
@@ -148,6 +149,7 @@ int intel_gvt_init_vgpu_types(struct intel_gvt *gvt)
 		gvt->types[i].resolution = vgpu_types[i].edid;
 		gvt->types[i].avail_instance = min(low_avail / vgpu_types[i].low_mm,
 						   high_avail / vgpu_types[i].high_mm);
+		gvt->types[i].aggregation = vgpu_types[i].aggregation;
 
 		if (IS_GEN(gvt->gt->i915, 8))
 			sprintf(gvt->types[i].name, "GVTg_V4_%s",
@@ -174,7 +176,7 @@ void intel_gvt_clean_vgpu_types(struct intel_gvt *gvt)
 	kfree(gvt->types);
 }
 
-static void intel_gvt_update_vgpu_types(struct intel_gvt *gvt)
+void intel_gvt_update_vgpu_types(struct intel_gvt *gvt)
 {
 	int i;
 	unsigned int low_gm_avail, high_gm_avail, fence_avail;
@@ -213,9 +215,7 @@ static void intel_gvt_update_vgpu_types(struct intel_gvt *gvt)
  */
 void intel_gvt_activate_vgpu(struct intel_vgpu *vgpu)
 {
-	mutex_lock(&vgpu->vgpu_lock);
 	vgpu->active = true;
-	mutex_unlock(&vgpu->vgpu_lock);
 }
 
 /**
@@ -259,6 +259,8 @@ void intel_gvt_release_vgpu(struct intel_vgpu *vgpu)
 	mutex_lock(&vgpu->vgpu_lock);
 	intel_vgpu_clean_workloads(vgpu, ALL_ENGINES);
 	intel_vgpu_dmabuf_cleanup(vgpu);
+	if (vgpu->type->aggregation)
+		intel_vgpu_res_free(vgpu);
 	mutex_unlock(&vgpu->vgpu_lock);
 }
 
@@ -290,10 +292,13 @@ void intel_gvt_destroy_vgpu(struct intel_vgpu *vgpu)
 	intel_vgpu_clean_submission(vgpu);
 	intel_vgpu_clean_display(vgpu);
 	intel_vgpu_clean_opregion(vgpu);
-	intel_vgpu_reset_ggtt(vgpu, true);
-	intel_vgpu_clean_gtt(vgpu);
+	if (vgpu->res_initialized) {
+		intel_vgpu_reset_ggtt(vgpu, true);
+		intel_vgpu_clean_gtt(vgpu);
+		intel_vgpu_free_resource(vgpu);
+	}
+	intel_vgpu_free_resource_count(vgpu);
 	intel_gvt_hypervisor_detach_vgpu(vgpu);
-	intel_vgpu_free_resource(vgpu);
 	intel_vgpu_clean_mmio(vgpu);
 	intel_vgpu_dmabuf_cleanup(vgpu);
 	mutex_unlock(&vgpu->vgpu_lock);
@@ -364,59 +369,85 @@ void intel_gvt_destroy_idle_vgpu(struct intel_vgpu *vgpu)
 	vfree(vgpu);
 }
 
-static struct intel_vgpu *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
-		struct intel_vgpu_creation_params *param)
+int intel_vgpu_res_alloc(struct intel_vgpu *vgpu)
 {
-	struct intel_vgpu *vgpu;
 	int ret;
 
-	gvt_dbg_core("handle %llu low %llu MB high %llu MB fence %llu\n",
-			param->handle, param->low_gm_sz, param->high_gm_sz,
-			param->fence_sz);
+	ret = intel_vgpu_alloc_resource(vgpu);
+	if (ret)
+		return ret;
 
-	vgpu = vzalloc(sizeof(*vgpu));
-	if (!vgpu)
-		return ERR_PTR(-ENOMEM);
+	populate_pvinfo_page(vgpu);
+
+	ret = intel_vgpu_init_gtt(vgpu);
+	if (ret) {
+		intel_vgpu_free_resource(vgpu);
+		return ret;
+	}
+	vgpu->res_initialized = true;
+	return 0;
+}
+
+void intel_vgpu_res_free(struct intel_vgpu *vgpu)
+{
+	intel_vgpu_reset_ggtt(vgpu, true);
+	intel_vgpu_clean_gtt(vgpu);
+	intel_vgpu_free_resource(vgpu);
+	vgpu->res_initialized = false;
+	mutex_lock(&vgpu->gvt->lock);
+	intel_gvt_update_vgpu_types(vgpu->gvt);
+	mutex_unlock(&vgpu->gvt->lock);
+}
+
+static int
+__intel_gvt_create_vgpu(struct intel_vgpu *vgpu,
+			bool delay_res_alloc)
+{
+	struct intel_gvt *gvt = vgpu->gvt;
+	struct intel_vgpu_gtt *gtt = &vgpu->gtt;
+	int ret;
 
 	ret = idr_alloc(&gvt->vgpu_idr, vgpu, IDLE_VGPU_IDR + 1, GVT_MAX_VGPU,
-		GFP_KERNEL);
+			GFP_KERNEL);
 	if (ret < 0)
-		goto out_free_vgpu;
-
+		return ret;
 	vgpu->id = ret;
-	vgpu->handle = param->handle;
-	vgpu->gvt = gvt;
-	vgpu->sched_ctl.weight = param->weight;
+
 	mutex_init(&vgpu->vgpu_lock);
 	mutex_init(&vgpu->dmabuf_lock);
 	INIT_LIST_HEAD(&vgpu->dmabuf_obj_list_head);
 	INIT_RADIX_TREE(&vgpu->page_track_tree, GFP_KERNEL);
 	idr_init(&vgpu->object_idr);
-	intel_vgpu_init_cfg_space(vgpu, param->primary);
+	INIT_RADIX_TREE(&gtt->spt_tree, GFP_KERNEL);
+	INIT_LIST_HEAD(&gtt->ppgtt_mm_list_head);
+	INIT_LIST_HEAD(&gtt->oos_page_list_head);
+	INIT_LIST_HEAD(&gtt->post_shadow_list_head);
+
+	intel_vgpu_init_cfg_space(vgpu, vgpu->param.primary);
 
 	ret = intel_vgpu_init_mmio(vgpu);
 	if (ret)
 		goto out_clean_idr;
 
-	ret = intel_vgpu_alloc_resource(vgpu, param);
+	ret = intel_vgpu_alloc_resource_count(vgpu);
 	if (ret)
 		goto out_clean_vgpu_mmio;
 
-	populate_pvinfo_page(vgpu);
+	if (!delay_res_alloc) {
+		ret = intel_vgpu_res_alloc(vgpu);
+		if (ret)
+			goto out_clean_vgpu_mmio;
+	}
 
 	ret = intel_gvt_hypervisor_attach_vgpu(vgpu);
 	if (ret)
 		goto out_clean_vgpu_resource;
 
-	ret = intel_vgpu_init_gtt(vgpu);
-	if (ret)
-		goto out_detach_hypervisor_vgpu;
-
 	ret = intel_vgpu_init_opregion(vgpu);
 	if (ret)
-		goto out_clean_gtt;
+		goto out_detach_hypervisor_vgpu;
 
-	ret = intel_vgpu_init_display(vgpu, param->resolution);
+	ret = intel_vgpu_init_display(vgpu, vgpu->param.resolution);
 	if (ret)
 		goto out_clean_opregion;
 
@@ -438,7 +469,7 @@ static struct intel_vgpu *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
 	if (ret)
 		goto out_clean_sched_policy;
 
-	return vgpu;
+	return 0;
 
 out_clean_sched_policy:
 	intel_vgpu_clean_sched_policy(vgpu);
@@ -448,19 +479,17 @@ static struct intel_vgpu *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
 	intel_vgpu_clean_display(vgpu);
 out_clean_opregion:
 	intel_vgpu_clean_opregion(vgpu);
-out_clean_gtt:
-	intel_vgpu_clean_gtt(vgpu);
 out_detach_hypervisor_vgpu:
 	intel_gvt_hypervisor_detach_vgpu(vgpu);
 out_clean_vgpu_resource:
+	intel_vgpu_clean_gtt(vgpu);
 	intel_vgpu_free_resource(vgpu);
+	intel_gvt_update_vgpu_types(vgpu->gvt);
 out_clean_vgpu_mmio:
 	intel_vgpu_clean_mmio(vgpu);
 out_clean_idr:
 	idr_remove(&gvt->vgpu_idr, vgpu->id);
-out_free_vgpu:
-	vfree(vgpu);
-	return ERR_PTR(ret);
+	return ret;
 }
 
 /**
@@ -474,33 +503,64 @@ static struct intel_vgpu *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
  * pointer to intel_vgpu, error pointer if failed.
  */
 struct intel_vgpu *intel_gvt_create_vgpu(struct intel_gvt *gvt,
-				struct intel_vgpu_type *type)
+					 struct intel_vgpu_type *type)
 {
-	struct intel_vgpu_creation_params param;
 	struct intel_vgpu *vgpu;
+	struct intel_vgpu_creation_params *param;
+	int ret;
+
+	vgpu = vzalloc(sizeof(*vgpu));
+	if (!vgpu)
+		return ERR_PTR(-ENOMEM);
 
-	param.handle = 0;
-	param.primary = 1;
-	param.low_gm_sz = type->low_gm_size;
-	param.high_gm_sz = type->high_gm_size;
-	param.fence_sz = type->fence;
-	param.weight = type->weight;
-	param.resolution = type->resolution;
+	param = &vgpu->param;
+	param->handle = 0;
+	param->primary = 1;
+	param->low_gm_sz = type->low_gm_size;
+	param->high_gm_sz = type->high_gm_size;
+	param->fence_sz = type->fence;
+	param->weight = type->weight;
+	param->resolution = type->resolution;
 
-	/* XXX current param based on MB */
-	param.low_gm_sz = BYTES_TO_MB(param.low_gm_sz);
-	param.high_gm_sz = BYTES_TO_MB(param.high_gm_sz);
+	gvt_dbg_core("handle %llu low %llu MB high %llu MB fence %llu\n",
+		     param->handle, BYTES_TO_MB(param->low_gm_sz),
+		     BYTES_TO_MB(param->high_gm_sz),
+		     param->fence_sz);
+
+	vgpu->handle = param->handle;
+	vgpu->gvt = gvt;
+	vgpu->sched_ctl.weight = param->weight;
+	param->aggregation = 1;
 
 	mutex_lock(&gvt->lock);
-	vgpu = __intel_gvt_create_vgpu(gvt, &param);
-	if (!IS_ERR(vgpu))
-		/* calculate left instance change for types */
-		intel_gvt_update_vgpu_types(gvt);
+	ret = __intel_gvt_create_vgpu(vgpu, type->aggregation ? true : false);
+	if (ret) {
+		mutex_unlock(&gvt->lock);
+		vfree(vgpu);
+		return ERR_PTR(ret);
+	}
+	/* calculate left instance change for types */
+	intel_gvt_update_vgpu_types(vgpu->gvt);
 	mutex_unlock(&gvt->lock);
 
 	return vgpu;
 }
 
+int intel_vgpu_delayed_alloc(struct intel_vgpu *vgpu)
+{
+	int ret;
+
+	intel_vgpu_adjust_resource_acounting(vgpu);
+	ret = intel_vgpu_res_alloc(vgpu);
+	if (ret)
+		return -EINVAL;
+
+	mutex_lock(&vgpu->gvt->lock);
+	intel_gvt_update_vgpu_types(vgpu->gvt);
+	mutex_unlock(&vgpu->gvt->lock);
+	return 0;
+}
+
 /**
  * intel_gvt_reset_vgpu_locked - reset a virtual GPU by DMLR or GT reset
  * @vgpu: virtual GPU
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* RE: [PATCH v3 0/2] VFIO mdev aggregated resources handling
  2020-04-08  5:58 ` [PATCH v3 0/2] VFIO mdev aggregated resources handling Zhenyu Wang
@ 2020-07-07 23:28   ` Tian, Kevin
  2020-07-08  1:06     ` Alex Williamson
  0 siblings, 1 reply; 34+ messages in thread
From: Tian, Kevin @ 2020-07-07 23:28 UTC (permalink / raw)
  To: Zhenyu Wang, alex.williamson; +Cc: intel-gvt-dev, kvm

Hi, Alex, 

Gentle ping... Please let us know whether this version looks good.

Thanks
Kevin

> From: Zhenyu Wang <zhenyuw@linux.intel.com>
> Sent: Wednesday, April 8, 2020 1:58 PM
> 
> Hi,
> 
> This is a refresh on previous series:
> https://patchwork.kernel.org/cover/11208279/
> and https://patchwork.freedesktop.org/series/70425/
> 
> Current mdev device create interface depends on fixed mdev type, which
> get uuid from user to create instance of mdev device. If user wants to
> use customized number of resource for mdev device, then only can
> create new mdev type for that which may not be flexible. This
> requirement comes not only from to be able to allocate flexible
> resources for KVMGT, but also from Intel scalable IO virtualization
> which would use vfio/mdev to be able to allocate arbitrary resources
> on mdev instance. More info on [1] [2] [3].
> 
> As we agreed that for current opaque mdev device type, we'd still
> explore management interface based on mdev sysfs definition. And this
> one tries to follow Alex's previous suggestion to create generic
> parameters under 'mdev' directory for each device, so vendor driver
> could provide support like as other defined mdev sysfs entries.
> 
> For mdev type with aggregation support, files as "aggregated_instances"
> and "max_aggregation" should be created under 'mdev' directory. E.g
> 
> /sys/devices/pci0000:00/0000:00:02.0/<UUID>/mdev/
>    |-- aggregated_instances
>    |-- max_aggregation
> 
> "aggregated_instances" is used to set or return current number of
> instances for aggregation, which can not be larger than "max_aggregation".
> 
> The first patch is to update the document for new mdev parameter directory.
> The second one is to add aggregation support in GVT driver.
> 
> References:
> [1] https://software.intel.com/en-us/download/intel-virtualization-
> technology-for-directed-io-architecture-specification
> [2] https://software.intel.com/en-us/download/intel-scalable-io-
> virtualization-technical-specification
> [3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
> 
> Changelog:
> v3:
> - add more description for sysfs entries
> - rebase GVT support
> - rename accounting function
> 
> Zhenyu Wang (2):
>   Documentation/driver-api/vfio-mediated-device.rst: update for
>     aggregation support
>   drm/i915/gvt: mdev aggregation type
> 
>  .../driver-api/vfio-mediated-device.rst       |  22 +++
>  drivers/gpu/drm/i915/gvt/aperture_gm.c        |  44 +++--
>  drivers/gpu/drm/i915/gvt/gtt.c                |   9 +-
>  drivers/gpu/drm/i915/gvt/gvt.c                |   7 +-
>  drivers/gpu/drm/i915/gvt/gvt.h                |  42 +++--
>  drivers/gpu/drm/i915/gvt/kvmgt.c              | 115 +++++++++++-
>  drivers/gpu/drm/i915/gvt/vgpu.c               | 172 ++++++++++++------
>  7 files changed, 317 insertions(+), 94 deletions(-)
> 
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/2] VFIO mdev aggregated resources handling
  2020-07-07 23:28   ` Tian, Kevin
@ 2020-07-08  1:06     ` Alex Williamson
  2020-07-08  1:54       ` Zhenyu Wang
  2020-07-08  6:31       ` Tian, Kevin
  0 siblings, 2 replies; 34+ messages in thread
From: Alex Williamson @ 2020-07-08  1:06 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: Zhenyu Wang, intel-gvt-dev, kvm

On Tue, 7 Jul 2020 23:28:39 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> Hi, Alex, 
> 
> Gentle ping... Please let us know whether this version looks good.

I figured this is entangled with the versioning scheme.  There are
unanswered questions about how something that assumes a device of a
given type is software compatible to another device of the same type
handles aggregation and how the type class would indicate compatibility
with an aggregated instance.  Thanks,

Alex


> > From: Zhenyu Wang <zhenyuw@linux.intel.com>
> > Sent: Wednesday, April 8, 2020 1:58 PM
> > 
> > Hi,
> > 
> > This is a refresh on previous series:
> > https://patchwork.kernel.org/cover/11208279/
> > and https://patchwork.freedesktop.org/series/70425/
> > 
> > Current mdev device create interface depends on fixed mdev type, which
> > get uuid from user to create instance of mdev device. If user wants to
> > use customized number of resource for mdev device, then only can
> > create new mdev type for that which may not be flexible. This
> > requirement comes not only from to be able to allocate flexible
> > resources for KVMGT, but also from Intel scalable IO virtualization
> > which would use vfio/mdev to be able to allocate arbitrary resources
> > on mdev instance. More info on [1] [2] [3].
> > 
> > As we agreed that for current opaque mdev device type, we'd still
> > explore management interface based on mdev sysfs definition. And this
> > one tries to follow Alex's previous suggestion to create generic
> > parameters under 'mdev' directory for each device, so vendor driver
> > could provide support like as other defined mdev sysfs entries.
> > 
> > For mdev type with aggregation support, files as "aggregated_instances"
> > and "max_aggregation" should be created under 'mdev' directory. E.g
> > 
> > /sys/devices/pci0000:00/0000:00:02.0/<UUID>/mdev/
> >    |-- aggregated_instances
> >    |-- max_aggregation
> > 
> > "aggregated_instances" is used to set or return current number of
> > instances for aggregation, which can not be larger than "max_aggregation".
> > 
> > The first patch is to update the document for new mdev parameter directory.
> > The second one is to add aggregation support in GVT driver.
> > 
> > References:
> > [1] https://software.intel.com/en-us/download/intel-virtualization-
> > technology-for-directed-io-architecture-specification
> > [2] https://software.intel.com/en-us/download/intel-scalable-io-
> > virtualization-technical-specification
> > [3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
> > 
> > Changelog:
> > v3:
> > - add more description for sysfs entries
> > - rebase GVT support
> > - rename accounting function
> > 
> > Zhenyu Wang (2):
> >   Documentation/driver-api/vfio-mediated-device.rst: update for
> >     aggregation support
> >   drm/i915/gvt: mdev aggregation type
> > 
> >  .../driver-api/vfio-mediated-device.rst       |  22 +++
> >  drivers/gpu/drm/i915/gvt/aperture_gm.c        |  44 +++--
> >  drivers/gpu/drm/i915/gvt/gtt.c                |   9 +-
> >  drivers/gpu/drm/i915/gvt/gvt.c                |   7 +-
> >  drivers/gpu/drm/i915/gvt/gvt.h                |  42 +++--
> >  drivers/gpu/drm/i915/gvt/kvmgt.c              | 115 +++++++++++-
> >  drivers/gpu/drm/i915/gvt/vgpu.c               | 172 ++++++++++++------
> >  7 files changed, 317 insertions(+), 94 deletions(-)
> > 
> > --
> > 2.25.1  
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/2] VFIO mdev aggregated resources handling
  2020-07-08  1:06     ` Alex Williamson
@ 2020-07-08  1:54       ` Zhenyu Wang
  2020-07-08  3:38         ` Yan Zhao
  2020-07-08  6:31       ` Tian, Kevin
  1 sibling, 1 reply; 34+ messages in thread
From: Zhenyu Wang @ 2020-07-08  1:54 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Tian, Kevin, intel-gvt-dev, kvm, Zhao, Yan Y

[-- Attachment #1: Type: text/plain, Size: 4046 bytes --]

On 2020.07.07 19:06:34 -0600, Alex Williamson wrote:
> On Tue, 7 Jul 2020 23:28:39 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > Hi, Alex, 
> > 
> > Gentle ping... Please let us know whether this version looks good.
> 
> I figured this is entangled with the versioning scheme.  There are
> unanswered questions about how something that assumes a device of a
> given type is software compatible to another device of the same type
> handles aggregation and how the type class would indicate compatibility
> with an aggregated instance.  Thanks,
> 

+Yan

Alex, if there is no concern about the aggregated resources info for an
instance (it would be the vendor's behavior to determine what type of
resources is aggregated), then I'll check with Yan to see how to fulfill
this during migration.

Thanks

> 
> 
> > > From: Zhenyu Wang <zhenyuw@linux.intel.com>
> > > Sent: Wednesday, April 8, 2020 1:58 PM
> > > 
> > > Hi,
> > > 
> > > This is a refresh on previous series:
> > > https://patchwork.kernel.org/cover/11208279/
> > > and https://patchwork.freedesktop.org/series/70425/
> > > 
> > > Current mdev device create interface depends on fixed mdev type, which
> > > get uuid from user to create instance of mdev device. If user wants to
> > > use customized number of resource for mdev device, then only can
> > > create new mdev type for that which may not be flexible. This
> > > requirement comes not only from to be able to allocate flexible
> > > resources for KVMGT, but also from Intel scalable IO virtualization
> > > which would use vfio/mdev to be able to allocate arbitrary resources
> > > on mdev instance. More info on [1] [2] [3].
> > > 
> > > As we agreed that for current opaque mdev device type, we'd still
> > > explore management interface based on mdev sysfs definition. And this
> > > one tries to follow Alex's previous suggestion to create generic
> > > parameters under 'mdev' directory for each device, so vendor driver
> > > could provide support like as other defined mdev sysfs entries.
> > > 
> > > For mdev type with aggregation support, files as "aggregated_instances"
> > > and "max_aggregation" should be created under 'mdev' directory. E.g
> > > 
> > > /sys/devices/pci0000:00/0000:00:02.0/<UUID>/mdev/
> > >    |-- aggregated_instances
> > >    |-- max_aggregation
> > > 
> > > "aggregated_instances" is used to set or return current number of
> > > instances for aggregation, which can not be larger than "max_aggregation".
> > > 
> > > The first patch is to update the document for new mdev parameter directory.
> > > The second one is to add aggregation support in GVT driver.
> > > 
> > > References:
> > > [1] https://software.intel.com/en-us/download/intel-virtualization-
> > > technology-for-directed-io-architecture-specification
> > > [2] https://software.intel.com/en-us/download/intel-scalable-io-
> > > virtualization-technical-specification
> > > [3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
> > > 
> > > Changelog:
> > > v3:
> > > - add more description for sysfs entries
> > > - rebase GVT support
> > > - rename accounting function
> > > 
> > > Zhenyu Wang (2):
> > >   Documentation/driver-api/vfio-mediated-device.rst: update for
> > >     aggregation support
> > >   drm/i915/gvt: mdev aggregation type
> > > 
> > >  .../driver-api/vfio-mediated-device.rst       |  22 +++
> > >  drivers/gpu/drm/i915/gvt/aperture_gm.c        |  44 +++--
> > >  drivers/gpu/drm/i915/gvt/gtt.c                |   9 +-
> > >  drivers/gpu/drm/i915/gvt/gvt.c                |   7 +-
> > >  drivers/gpu/drm/i915/gvt/gvt.h                |  42 +++--
> > >  drivers/gpu/drm/i915/gvt/kvmgt.c              | 115 +++++++++++-
> > >  drivers/gpu/drm/i915/gvt/vgpu.c               | 172 ++++++++++++------
> > >  7 files changed, 317 insertions(+), 94 deletions(-)
> > > 
> > > --
> > > 2.25.1  
> > 
> 

-- 
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/2] VFIO mdev aggregated resources handling
  2020-07-08  1:54       ` Zhenyu Wang
@ 2020-07-08  3:38         ` Yan Zhao
  2020-07-08  3:40           ` Yan Zhao
  0 siblings, 1 reply; 34+ messages in thread
From: Yan Zhao @ 2020-07-08  3:38 UTC (permalink / raw)
  To: Zhenyu Wang; +Cc: Alex Williamson, Tian, Kevin, intel-gvt-dev, kvm

On Wed, Jul 08, 2020 at 09:54:19AM +0800, Zhenyu Wang wrote:
> On 2020.07.07 19:06:34 -0600, Alex Williamson wrote:
> > On Tue, 7 Jul 2020 23:28:39 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > 
> > > Hi, Alex, 
> > > 
> > > Gentle ping... Please let us know whether this version looks good.
> > 
> > I figured this is entangled with the versioning scheme.  There are
> > unanswered questions about how something that assumes a device of a
> > given type is software compatible to another device of the same type
> > handles aggregation and how the type class would indicate compatibility
> > with an aggregated instance.  Thanks,
> > 
> 
> +Yan
> 
> Alex, If no concern on aggregated resources info for instance that would
> be vendor's behavior to determine what type of resources would be aggregated,
> then I'll check with Yan to see how to fulfill this during migration.
> 
> Thanks
>

hi zhenyu and Alex
currently in this series, it looks like aggregated instances are created
in this way:
    echo "<uuid>,aggregate=10" > create

Would it be possible to change it as follows:
1. provide a separate attribute named "aggregator" under mdev type.
  |- [parent physical device]
  |--- Vendor-specific-attributes [optional]
  |--- [mdev_supported_types]
  |     |--- [<type-id>]
  |     |   |--- create
+ |     |   |--- aggregator
  |     |   |--- name
  |     |   |--- available_instances
  |     |   |--- device_api
  |     |   |--- description
  |     |   |--- [devices]

normally, the aggregator is read as 0.


2. when we want to create an aggregated instance, we first echo the count
into the aggregator attribute. e.g.
   echo 10 > aggregator
It will switch the mdev type to 10 x original_type. And then,
available_instances and description would be updated accordingly.

3. do the real mdev creation.
   echo <uuid> > create


In this way, before any instance is created, we can use the
migration_version attribute to test if two aggregation mdevs are
migration compatible.

Thanks
Yan



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/2] VFIO mdev aggregated resources handling
  2020-07-08  3:38         ` Yan Zhao
@ 2020-07-08  3:40           ` Yan Zhao
  2020-07-08  4:17             ` Alex Williamson
  0 siblings, 1 reply; 34+ messages in thread
From: Yan Zhao @ 2020-07-08  3:40 UTC (permalink / raw)
  To: Zhenyu Wang; +Cc: Alex Williamson, intel-gvt-dev, Tian, Kevin, kvm

On Wed, Jul 08, 2020 at 11:38:46AM +0800, Yan Zhao wrote:
> On Wed, Jul 08, 2020 at 09:54:19AM +0800, Zhenyu Wang wrote:
> > On 2020.07.07 19:06:34 -0600, Alex Williamson wrote:
> > > On Tue, 7 Jul 2020 23:28:39 +0000
> > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > > 
> > > > Hi, Alex, 
> > > > 
> > > > Gentle ping... Please let us know whether this version looks good.
> > > 
> > > I figured this is entangled with the versioning scheme.  There are
> > > unanswered questions about how something that assumes a device of a
> > > given type is software compatible to another device of the same type
> > > handles aggregation and how the type class would indicate compatibility
> > > with an aggregated instance.  Thanks,
> > > 
> > 
> > +Yan
> > 
> > Alex, If no concern on aggregated resources info for instance that would
> > be vendor's behavior to determine what type of resources would be aggregated,
> > then I'll check with Yan to see how to fulfill this during migration.
> > 
> > Thanks
> >
> 
> hi zhenyu and Alex
> currently in this series, it looks that aggregated instances are created
> in this way:
>     echo "<uuid>,aggregate=10" > create
> 
> Is that possible that we change it like that:
> 1. provide a separate attribute named "aggregator" under mdev type.
>   |- [parent physical device]
>   |--- Vendor-specific-attributes [optional]
>   |--- [mdev_supported_types]
>   |     |--- [<type-id>]
>   |     |   |--- create
> + |     |   |--- aggregator
>   |     |   |--- name
>   |     |   |--- available_instances
>   |     |   |--- device_api
>   |     |   |--- description
>   |     |   |--- [devices]
> 
> normally, the aggregator is read as 0.
>
correction: normally, the aggregator is read as "1"

> 2. when we want to create an aggregated instance, we first echo the count
> into the aggregator attribute. e.g.
>    echo 10 > aggregator
> It will switch the mdev type to 10 x original_type. And then,
> available_instances and description would be updated accordingly.
> 
> 3. do the real mdev creation.
>    echo <uuid> > create
> 
> 
> In this way, before any instance is created, we can use the
> migration_version attribute to test if two aggregation mdevs are
> migration compatible.
> 
> Thanks
> Yan
> 
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/2] VFIO mdev aggregated resources handling
  2020-07-08  3:40           ` Yan Zhao
@ 2020-07-08  4:17             ` Alex Williamson
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Williamson @ 2020-07-08  4:17 UTC (permalink / raw)
  To: Yan Zhao; +Cc: Zhenyu Wang, intel-gvt-dev, Tian, Kevin, kvm

On Wed, 8 Jul 2020 11:40:55 +0800
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Wed, Jul 08, 2020 at 11:38:46AM +0800, Yan Zhao wrote:
> > On Wed, Jul 08, 2020 at 09:54:19AM +0800, Zhenyu Wang wrote:  
> > > On 2020.07.07 19:06:34 -0600, Alex Williamson wrote:  
> > > > On Tue, 7 Jul 2020 23:28:39 +0000
> > > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > > >   
> > > > > Hi, Alex, 
> > > > > 
> > > > > Gentle ping... Please let us know whether this version looks good.  
> > > > 
> > > > I figured this is entangled with the versioning scheme.  There are
> > > > unanswered questions about how something that assumes a device of a
> > > > given type is software compatible to another device of the same type
> > > > handles aggregation and how the type class would indicate compatibility
> > > > with an aggregated instance.  Thanks,
> > > >   
> > > 
> > > +Yan
> > > 
> > > Alex, If no concern on aggregated resources info for instance that would
> > > be vendor's behavior to determine what type of resources would be aggregated,
> > > then I'll check with Yan to see how to fulfill this during migration.
> > > 
> > > Thanks
> > >  
> > 
> > hi zhenyu and Alex
> > currently in this series, it looks that aggregated instances are created
> > in this way:
> >     echo "<uuid>,aggregate=10" > create
> > 
> > Is that possible that we change it like that:
> > 1. provide a separate attribute named "aggregator" under mdev type.
> >   |- [parent physical device]
> >   |--- Vendor-specific-attributes [optional]
> >   |--- [mdev_supported_types]
> >   |     |--- [<type-id>]
> >   |     |   |--- create
> > + |     |   |--- aggregator
> >   |     |   |--- name
> >   |     |   |--- available_instances
> >   |     |   |--- device_api
> >   |     |   |--- description
> >   |     |   |--- [devices]
> > 
> > normally, the aggregator is read as 0.
> >  
> correction: normally, the aggregator is read as "1"
> 
> > 2. when we want to create an aggregated instance, we first echo the count
> > into the aggregator attribute. e.g.
> >    echo 10 > aggregator
> > It will switch the mdev type to 10 x original_type. And then,
> > available_instances and description would be updated accordingly.
> > 
> > 3. do the real mdev creation.
> >    echo <uuid> > create

Nak, this is inherently racy.  Just imagine two processes trying to
simultaneously create devices.
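
A hypothetical interleaving of the proposed flow shows the race
(attribute names as in the proposal above, paths illustrative):

    # process A                              # process B
    echo 10 > .../<type-id>/aggregator
                                             echo 2 > .../<type-id>/aggregator
    echo $UUID_A > .../<type-id>/create
    # A intended a 10x instance but gets a 2x one, because B overwrote
    # the per-type aggregator value before A's create completed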


^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v3 0/2] VFIO mdev aggregated resources handling
  2020-07-08  1:06     ` Alex Williamson
  2020-07-08  1:54       ` Zhenyu Wang
@ 2020-07-08  6:31       ` Tian, Kevin
  2020-07-08  9:54         ` Zhenyu Wang
  2020-07-08 18:48         ` Alex Williamson
  1 sibling, 2 replies; 34+ messages in thread
From: Tian, Kevin @ 2020-07-08  6:31 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Zhenyu Wang, intel-gvt-dev, kvm

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, July 8, 2020 9:07 AM
> 
> On Tue, 7 Jul 2020 23:28:39 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > Hi, Alex,
> >
> > Gentle ping... Please let us know whether this version looks good.
> 
> I figured this is entangled with the versioning scheme.  There are
> unanswered questions about how something that assumes a device of a
> given type is software compatible to another device of the same type
> handles aggregation and how the type class would indicate compatibility
> with an aggregated instance.  Thanks,
> 

Yes, this open is an interesting topic. I didn't closely follow the versioning
scheme discussion. Below are some preliminary thoughts:

--
First, let's consider migrating an aggregated instance:

A conservative policy is to check whether the compatible type is supported
on the target device and whether the available instances under that type can
satisfy the ask of the aggregated instance. The compatibility check in this
scheme is separated from the aggregation check, so no change is required to
the current versioning interface.
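
A rough sketch of that conservative check from the management stack's point
of view (this assumes the migration_version style interface under discussion,
where writing the source's version string into the target attribute fails if
the two are incompatible; paths and names are illustrative only):

    SRC=.../source-parent/mdev_supported_types/$TYPE
    DST=.../target-parent/mdev_supported_types/$TYPE

    # 1) type compatibility via the existing versioning interface
    cat $SRC/migration_version > $DST/migration_version || exit 1

    # 2) aggregation check: enough free instances on the target to
    #    cover the source instance's aggregated_instances count
    AGG=$(cat .../source-parent/$UUID/mdev/aggregated_instances)
    test $(cat $DST/available_instances) -ge $AGG || exit 1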

Then there comes a case where the target device doesn't handle aggregation
but supports a different type which provides compatible capabilities and
the same resource size as the aggregated instance expects. I guess this is
one puzzle: how to check compatibility between such types. One possible
extension is to introduce a non_aggregated_list to indicate compatible
non-aggregated types for each aggregated instance. Then the mgmt stack just
loops over the compatible list if the conservative policy fails. I didn't
think carefully about what format is reasonable here. But if we agree that a
separate interface is required to support such usage, then this may come
later, after the basic migration_version interface is completed.
--

Another scenario is about migrating a non-aggregated instance to a device
handling aggregation. Then there is an open question whether an aggregated
type can be used to back the non-aggregated instance in case there is no
available instance under the original type claimed by the non-aggregated
instance. This won't happen in KVMGT, because all vGPU types share the same
resource pool. Allocating an instance under one type also decrements the
available instances under other types. So if we fail to find an available
instance under type-A (with 4x the resource of type-B), then we will also
fail to create an aggregated instance (aggregate=4) under type-B. Therefore,
we just need to stick to the basic type compatibility check for
non-aggregated instances. And I feel this assumption can be applied to other
devices handling aggregation. It doesn't make sense for two types to claim
compatibility (with only a resource size difference) when their resources
are allocated from different pools (which usually implies a different
capability or QoS/SLA difference). With this assumption, we don't need to
provide another interface to indicate compatible aggregated types for
non-aggregated instances.
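
For illustration of the shared-pool behavior (type names and numbers below
are hypothetical):

    cat .../mdev_supported_types/type-B/available_instances   # e.g. 8
    cat .../mdev_supported_types/type-A/available_instances   # e.g. 2

    # create one instance of type-A (4x the resources of type-B)
    echo $UUID > .../mdev_supported_types/type-A/create

    # both counts drop, since all types draw from the same gfx memory pool
    cat .../mdev_supported_types/type-B/available_instances   # e.g. 4
    cat .../mdev_supported_types/type-A/available_instances   # e.g. 1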
--

I may well have overlooked something here, but if the above analysis sounds
reasonable, then this series could be decoupled from the versioning
scheme discussion, based on the conservative policy, for now. :)

Thanks
Kevin

> 
> 
> > > From: Zhenyu Wang <zhenyuw@linux.intel.com>
> > > Sent: Wednesday, April 8, 2020 1:58 PM
> > >
> > > Hi,
> > >
> > > This is a refresh on previous series:
> > > https://patchwork.kernel.org/cover/11208279/
> > > and https://patchwork.freedesktop.org/series/70425/
> > >
> > > Current mdev device create interface depends on fixed mdev type, which
> > > get uuid from user to create instance of mdev device. If user wants to
> > > use customized number of resource for mdev device, then only can
> > > create new mdev type for that which may not be flexible. This
> > > requirement comes not only from to be able to allocate flexible
> > > resources for KVMGT, but also from Intel scalable IO virtualization
> > > which would use vfio/mdev to be able to allocate arbitrary resources
> > > on mdev instance. More info on [1] [2] [3].
> > >
> > > As we agreed that for current opaque mdev device type, we'd still
> > > explore management interface based on mdev sysfs definition. And this
> > > one tries to follow Alex's previous suggestion to create generic
> > > parameters under 'mdev' directory for each device, so vendor driver
> > > could provide support like as other defined mdev sysfs entries.
> > >
> > > For mdev type with aggregation support, files as "aggregated_instances"
> > > and "max_aggregation" should be created under 'mdev' directory. E.g
> > >
> > > /sys/devices/pci0000:00/0000:00:02.0/<UUID>/mdev/
> > >    |-- aggregated_instances
> > >    |-- max_aggregation
> > >
> > > "aggregated_instances" is used to set or return current number of
> > > instances for aggregation, which can not be larger than
> "max_aggregation".
> > >
> > > The first patch is to update the document for new mdev parameter
> directory.
> > > The second one is to add aggregation support in GVT driver.
> > >
> > > References:
> > > [1] https://software.intel.com/en-us/download/intel-virtualization-
> > > technology-for-directed-io-architecture-specification
> > > [2] https://software.intel.com/en-us/download/intel-scalable-io-
> > > virtualization-technical-specification
> > > [3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
> > >
> > > Changelog:
> > > v3:
> > > - add more description for sysfs entries
> > > - rebase GVT support
> > > - rename accounting function
> > >
> > > Zhenyu Wang (2):
> > >   Documentation/driver-api/vfio-mediated-device.rst: update for
> > >     aggregation support
> > >   drm/i915/gvt: mdev aggregation type
> > >
> > >  .../driver-api/vfio-mediated-device.rst       |  22 +++
> > >  drivers/gpu/drm/i915/gvt/aperture_gm.c        |  44 +++--
> > >  drivers/gpu/drm/i915/gvt/gtt.c                |   9 +-
> > >  drivers/gpu/drm/i915/gvt/gvt.c                |   7 +-
> > >  drivers/gpu/drm/i915/gvt/gvt.h                |  42 +++--
> > >  drivers/gpu/drm/i915/gvt/kvmgt.c              | 115 +++++++++++-
> > >  drivers/gpu/drm/i915/gvt/vgpu.c               | 172 ++++++++++++------
> > >  7 files changed, 317 insertions(+), 94 deletions(-)
> > >
> > > --
> > > 2.25.1
> >


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/2] VFIO mdev aggregated resources handling
  2020-07-08  6:31       ` Tian, Kevin
@ 2020-07-08  9:54         ` Zhenyu Wang
  2020-07-08 18:48         ` Alex Williamson
  1 sibling, 0 replies; 34+ messages in thread
From: Zhenyu Wang @ 2020-07-08  9:54 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: Alex Williamson, Zhenyu Wang, intel-gvt-dev, kvm, Zhao, Yan Y

[-- Attachment #1: Type: text/plain, Size: 7265 bytes --]

On 2020.07.08 06:31:00 +0000, Tian, Kevin wrote:
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Wednesday, July 8, 2020 9:07 AM
> > 
> > On Tue, 7 Jul 2020 23:28:39 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > 
> > > Hi, Alex,
> > >
> > > Gentle ping... Please let us know whether this version looks good.
> > 
> > I figured this is entangled with the versioning scheme.  There are
> > unanswered questions about how something that assumes a device of a
> > given type is software compatible to another device of the same type
> > handles aggregation and how the type class would indicate compatibility
> > with an aggregated instance.  Thanks,
> > 
> 
> Yes, this open is an interesting topic. I didn't closely follow the versioning
> scheme discussion. Below is some preliminary thought in my mind:
> 
> --
> First, let's consider migrating an aggregated instance:
> 
> A conservative policy is to check whether the compatible type is supported 
> on target device and whether available instances under that type can afford 
> the ask of the aggregated instance. Compatibility check in this scheme is 
> separated from aggregation check, then no change is required to the current 
> versioning interface.

In the last mdev aggregation series, no aggregation info is exposed in the mdev
type until an instance is created, so the missing info could cause conflicts, e.g.
a type might show available instances but not actually provide aggregation. From
that point of view, we would either need to add a new flag (the current
'description' attribute is not usable for this), change the versioning interface,
or require a separate type..
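
As a rough illustration of that gap (a sketch only -- the per-type
"max_aggregation" attribute used here is hypothetical and does not exist in
this series; today aggregation info only appears under <UUID>/mdev/ after the
instance is created), in Python:

import os

def type_supports_aggregation(parent_syspath, type_name):
    # Hypothetical per-type attribute; with the current proposal there is
    # nothing to read here before an instance exists, so the management
    # stack cannot tell whether this type really aggregates.
    attr = os.path.join(parent_syspath, "mdev_supported_types",
                        type_name, "max_aggregation")
    if not os.path.exists(attr):
        return False
    with open(attr) as f:
        return int(f.read().strip()) > 1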

> 
> Then there comes a case where the target device doesn't handle aggregation
> but support a different type which however provides compatible capabilities 
> and same resource size as the aggregated instance expects. I guess this is
> one puzzle how to check compatibility between such types. One possible
> extension is to introduce a non_aggregated_list  to indicate compatible 
> non-aggregated types for each aggregated instance. Then mgmt.. stack 
> just loop the compatible list if the conservative policy fails.  I didn't think 
> carefully about what format is reasonable here. But if we agree that an
> separate interface is required to support such usage, then this may come
> later after the basic migration_version interface is completed.
> --
> 
> Another scenario is about migrating a non-aggregated instance to a device
> handling aggregation. Then there is an open whether an aggregated type 
> can be used to back the non-aggregated instance in case of no available 
> instance under the original type claimed by non-aggregated instance. 
> This won't happen in KVMGT, because all vGPU types share the same 
> resource pool. Allocating instance under one type also decrement available 
> instances under other types. So if we fail to find available instance under 
> type-A (with 4x resource of type-B), then we will also fail to create an
>  aggregated instance (aggregate=4) under type-B. therefore, we just 
> need stick to basic type compatibility check for non-aggregated instance. 
> And I feel this assumption can be applied to other devices handling 
> aggregation. It doesn't make sense for two types to claim compatibility 
> (only with resource size difference) when their resources are allocated
> from different pools (which usually implies different capability or QOS/
> SLA difference). With this assumption, we don't need provide another
> interface to indicate compatible aggregated types for non-aggregated
> interface.
> --
> 
> I may definitely overlook something here, but if above analysis sounds
> reasonable, then this series could be decoupled from the versioning 
> scheme discussion based on conservative policy for now. :)
> 
> Thanks
> Kevin
> 
> > 
> > 
> > > > From: Zhenyu Wang <zhenyuw@linux.intel.com>
> > > > Sent: Wednesday, April 8, 2020 1:58 PM
> > > >
> > > > Hi,
> > > >
> > > > This is a refresh on previous series:
> > > > https://patchwork.kernel.org/cover/11208279/
> > > > and https://patchwork.freedesktop.org/series/70425/
> > > >
> > > > Current mdev device create interface depends on fixed mdev type, which
> > > > get uuid from user to create instance of mdev device. If user wants to
> > > > use customized number of resource for mdev device, then only can
> > > > create new mdev type for that which may not be flexible. This
> > > > requirement comes not only from to be able to allocate flexible
> > > > resources for KVMGT, but also from Intel scalable IO virtualization
> > > > which would use vfio/mdev to be able to allocate arbitrary resources
> > > > on mdev instance. More info on [1] [2] [3].
> > > >
> > > > As we agreed that for current opaque mdev device type, we'd still
> > > > explore management interface based on mdev sysfs definition. And this
> > > > one tries to follow Alex's previous suggestion to create generic
> > > > parameters under 'mdev' directory for each device, so vendor driver
> > > > could provide support like as other defined mdev sysfs entries.
> > > >
> > > > For mdev type with aggregation support, files as "aggregated_instances"
> > > > and "max_aggregation" should be created under 'mdev' directory. E.g
> > > >
> > > > /sys/devices/pci0000:00/0000:00:02.0/<UUID>/mdev/
> > > >    |-- aggregated_instances
> > > >    |-- max_aggregation
> > > >
> > > > "aggregated_instances" is used to set or return current number of
> > > > instances for aggregation, which can not be larger than
> > "max_aggregation".
> > > >
> > > > The first patch is to update the document for new mdev parameter
> > directory.
> > > > The second one is to add aggregation support in GVT driver.
> > > >
> > > > References:
> > > > [1] https://software.intel.com/en-us/download/intel-virtualization-
> > > > technology-for-directed-io-architecture-specification
> > > > [2] https://software.intel.com/en-us/download/intel-scalable-io-
> > > > virtualization-technical-specification
> > > > [3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
> > > >
> > > > Changelog:
> > > > v3:
> > > > - add more description for sysfs entries
> > > > - rebase GVT support
> > > > - rename accounting function
> > > >
> > > > Zhenyu Wang (2):
> > > >   Documentation/driver-api/vfio-mediated-device.rst: update for
> > > >     aggregation support
> > > >   drm/i915/gvt: mdev aggregation type
> > > >
> > > >  .../driver-api/vfio-mediated-device.rst       |  22 +++
> > > >  drivers/gpu/drm/i915/gvt/aperture_gm.c        |  44 +++--
> > > >  drivers/gpu/drm/i915/gvt/gtt.c                |   9 +-
> > > >  drivers/gpu/drm/i915/gvt/gvt.c                |   7 +-
> > > >  drivers/gpu/drm/i915/gvt/gvt.h                |  42 +++--
> > > >  drivers/gpu/drm/i915/gvt/kvmgt.c              | 115 +++++++++++-
> > > >  drivers/gpu/drm/i915/gvt/vgpu.c               | 172 ++++++++++++------
> > > >  7 files changed, 317 insertions(+), 94 deletions(-)
> > > >
> > > > --
> > > > 2.25.1
> > >
> 

-- 
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/2] VFIO mdev aggregated resources handling
  2020-07-08  6:31       ` Tian, Kevin
  2020-07-08  9:54         ` Zhenyu Wang
@ 2020-07-08 18:48         ` Alex Williamson
  2020-07-09  2:53           ` Tian, Kevin
  1 sibling, 1 reply; 34+ messages in thread
From: Alex Williamson @ 2020-07-08 18:48 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: Zhenyu Wang, intel-gvt-dev, kvm

On Wed, 8 Jul 2020 06:31:00 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Wednesday, July 8, 2020 9:07 AM
> > 
> > On Tue, 7 Jul 2020 23:28:39 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >   
> > > Hi, Alex,
> > >
> > > Gentle ping... Please let us know whether this version looks good.  
> > 
> > I figured this is entangled with the versioning scheme.  There are
> > unanswered questions about how something that assumes a device of a
> > given type is software compatible to another device of the same type
> > handles aggregation and how the type class would indicate compatibility
> > with an aggregated instance.  Thanks,
> >   
> 
> Yes, this open is an interesting topic. I didn't closely follow the versioning
> scheme discussion. Below is some preliminary thought in my mind:
> 
> --
> First, let's consider migrating an aggregated instance:
> 
> A conservative policy is to check whether the compatible type is supported 
> on target device and whether available instances under that type can afford 
> the ask of the aggregated instance. Compatibility check in this scheme is 
> separated from aggregation check, then no change is required to the current 
> versioning interface.

How many features, across how many attributes is an administrative tool
supposed to check for compatibility?  ie. if we add an 'aggregation'
feature now and 'translucency' feature next year, with new sysfs
attributes and creation options, won't that break this scheme?  I'm not
willing to assume aggregation is the sole new feature we will ever add,
therefore we don't get to make it a special case without a plan for how
the next special case will be integrated.

We also can't even seem to agree that type is a necessary requirement
for compatibility.  Your discussion below of a type-A, which is
equivalent to a type-B w/ aggregation set to some value is an example
of this.  We might also have physical devices with extensions to
support migration.  These could possibly be compatible with full mdev
devices.  We have no idea how an administrative tool would discover
this other than an exhaustive search across every possible target.
That's ugly but feasible when considering a single target host, but
completely untenable when considering a datacenter.

 
> Then there comes a case where the target device doesn't handle aggregation
> but support a different type which however provides compatible capabilities 
> and same resource size as the aggregated instance expects. I guess this is
> one puzzle how to check compatibility between such types. One possible
> extension is to introduce a non_aggregated_list  to indicate compatible 
> non-aggregated types for each aggregated instance. Then mgmt.. stack 
> just loop the compatible list if the conservative policy fails.  I didn't think 
> carefully about what format is reasonable here. But if we agree that an
> separate interface is required to support such usage, then this may come
> later after the basic migration_version interface is completed.

...and then a non_translucency_list and then a non_brilliance_list and
then a non_whatever_list... no.  Additionally it's been shown difficult
to predict the future, if a new device is developed to be compatible
with an existing device it would require updates to the existing device
to learn about that compatibility.

> --
> 
> Another scenario is about migrating a non-aggregated instance to a device
> handling aggregation. Then there is an open whether an aggregated type 
> can be used to back the non-aggregated instance in case of no available 
> instance under the original type claimed by non-aggregated instance. 
> This won't happen in KVMGT, because all vGPU types share the same 
> resource pool. Allocating instance under one type also decrement available 
> instances under other types. So if we fail to find available instance under 
> type-A (with 4x resource of type-B), then we will also fail to create an
>  aggregated instance (aggregate=4) under type-B. therefore, we just 
> need stick to basic type compatibility check for non-aggregated instance. 
> And I feel this assumption can be applied to other devices handling 
> aggregation. It doesn't make sense for two types to claim compatibility 
> (only with resource size difference) when their resources are allocated
> from different pools (which usually implies different capability or QOS/
> SLA difference). With this assumption, we don't need provide another
> interface to indicate compatible aggregated types for non-aggregated
> interface.
> --
> 
> I may definitely overlook something here, but if above analysis sounds
> reasonable, then this series could be decoupled from the versioning 
> scheme discussion based on conservative policy for now. :)

The only potential I see for decoupling the discussions would be to do
aggregation via a vendor attribute.  Those already provide a mechanism
to manipulate a device after creation and something that we'll already
need to solve in determining migration compatibility.  So in that
sense, it seems like it at least doesn't make the problem worse.
Thanks,

Alex


^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v3 0/2] VFIO mdev aggregated resources handling
  2020-07-08 18:48         ` Alex Williamson
@ 2020-07-09  2:53           ` Tian, Kevin
  2020-07-09  7:23             ` Yan Zhao
  2020-07-09 17:28             ` Alex Williamson
  0 siblings, 2 replies; 34+ messages in thread
From: Tian, Kevin @ 2020-07-09  2:53 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Zhenyu Wang, intel-gvt-dev, kvm

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Thursday, July 9, 2020 2:48 AM
> 
> On Wed, 8 Jul 2020 06:31:00 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Wednesday, July 8, 2020 9:07 AM
> > >
> > > On Tue, 7 Jul 2020 23:28:39 +0000
> > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > >
> > > > Hi, Alex,
> > > >
> > > > Gentle ping... Please let us know whether this version looks good.
> > >
> > > I figured this is entangled with the versioning scheme.  There are
> > > unanswered questions about how something that assumes a device of a
> > > given type is software compatible to another device of the same type
> > > handles aggregation and how the type class would indicate compatibility
> > > with an aggregated instance.  Thanks,
> > >
> >
> > Yes, this open is an interesting topic. I didn't closely follow the versioning
> > scheme discussion. Below is some preliminary thought in my mind:
> >
> > --
> > First, let's consider migrating an aggregated instance:
> >
> > A conservative policy is to check whether the compatible type is supported
> > on target device and whether available instances under that type can
> afford
> > the ask of the aggregated instance. Compatibility check in this scheme is
> > separated from aggregation check, then no change is required to the
> current
> > versioning interface.
> 
> How many features, across how many attributes is an administrative tool
> supposed to check for compatibility?  ie. if we add an 'aggregation'
> feature now and 'translucency' feature next year, with new sysfs
> attributes and creation options, won't that break this scheme?  I'm not
> willing to assume aggregation is the sole new feature we will ever add,
> therefore we don't get to make it a special case without a plan for how
> the next special case will be integrated.

Got you. I thought aggregation is special since it is purely about linear
resource adjustment w/o changing the feature set of the instance, thus
reasonable to get special handling in management stack which needs
to understand this attribute anyway. But I agree that it's difficult to 
predict the future and other special cases...

> 
> We also can't even seem to agree that type is a necessary requirement
> for compatibility.  Your discussion below of a type-A, which is
> equivalent to a type-B w/ aggregation set to some value is an example
> of this.  We might also have physical devices with extensions to
> support migration.  These could possibly be compatible with full mdev
> devices.  We have no idea how an administrative tool would discover
> this other than an exhaustive search across every possible target.
> That's ugly but feasible when considering a single target host, but
> completely untenable when considering a datacenter.

If the exhaustive search can be done just one-off to build the compatibility
database for all assignable devices on each node, then it might still be
tenable in a datacenter?

> 
> 
> > Then there comes a case where the target device doesn't handle
> aggregation
> > but support a different type which however provides compatible
> capabilities
> > and same resource size as the aggregated instance expects. I guess this is
> > one puzzle how to check compatibility between such types. One possible
> > extension is to introduce a non_aggregated_list  to indicate compatible
> > non-aggregated types for each aggregated instance. Then mgmt.. stack
> > just loop the compatible list if the conservative policy fails.  I didn't think
> > carefully about what format is reasonable here. But if we agree that an
> > separate interface is required to support such usage, then this may come
> > later after the basic migration_version interface is completed.
> 
> ...and then a non_translucency_list and then a non_brilliance_list and
> then a non_whatever_list... no.  Additionally it's been shown difficult
> to predict the future, if a new device is developed to be compatible
> with an existing device it would require updates to the existing device
> to learn about that compatibility.

I suppose a compatibility list like this doesn't require the existing device
to be updated. It should be the new device's responsibility to claim
compatibility with the types carried in the existing list. 

> 
> > --
> >
> > Another scenario is about migrating a non-aggregated instance to a device
> > handling aggregation. Then there is an open whether an aggregated type
> > can be used to back the non-aggregated instance in case of no available
> > instance under the original type claimed by non-aggregated instance.
> > This won't happen in KVMGT, because all vGPU types share the same
> > resource pool. Allocating instance under one type also decrement available
> > instances under other types. So if we fail to find available instance under
> > type-A (with 4x resource of type-B), then we will also fail to create an
> >  aggregated instance (aggregate=4) under type-B. therefore, we just
> > need stick to basic type compatibility check for non-aggregated instance.
> > And I feel this assumption can be applied to other devices handling
> > aggregation. It doesn't make sense for two types to claim compatibility
> > (only with resource size difference) when their resources are allocated
> > from different pools (which usually implies different capability or QOS/
> > SLA difference). With this assumption, we don't need provide another
> > interface to indicate compatible aggregated types for non-aggregated
> > interface.
> > --
> >
> > I may definitely overlook something here, but if above analysis sounds
> > reasonable, then this series could be decoupled from the versioning
> > scheme discussion based on conservative policy for now. :)
> 
> The only potential I see for decoupling the discussions would be to do
> aggregation via a vendor attribute.  Those already provide a mechanism
> to manipulate a device after creation and something that we'll already
> need to solve in determining migration compatibility.  So in that
> sense, it seems like it at least doesn't make the problem worse.
> Thanks,
> 

This makes some sense, since 'aggregation' still changes how the
instance looks anyway. But let me understand clearly: are you proposing 
actually moving 'aggregation' to be a vendor attribute (i.e. removing
the 'mdev' sub-directory in this patch), or more a policy of treating
it as a vendor attribute? If the former, is there any problem with having
Libvirt manage this attribute given that it becomes vendor specific now?

Thanks
Kevin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/2] VFIO mdev aggregated resources handling
  2020-07-09  2:53           ` Tian, Kevin
@ 2020-07-09  7:23             ` Yan Zhao
  2020-07-09 20:22               ` Alex Williamson
  2020-07-09 17:28             ` Alex Williamson
  1 sibling, 1 reply; 34+ messages in thread
From: Yan Zhao @ 2020-07-09  7:23 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: Alex Williamson, intel-gvt-dev, kvm, Zhenyu Wang

On Thu, Jul 09, 2020 at 02:53:05AM +0000, Tian, Kevin wrote:

<...>
> > We also can't even seem to agree that type is a necessary requirement
> > for compatibility.  Your discussion below of a type-A, which is
> > equivalent to a type-B w/ aggregation set to some value is an example
> > of this.  We might also have physical devices with extensions to
> > support migration.  These could possibly be compatible with full mdev
> > devices.  We have no idea how an administrative tool would discover
> > this other than an exhaustive search across every possible target.
> > That's ugly but feasible when considering a single target host, but
> > completely untenable when considering a datacenter.
> 
> If exhaustive search can be done just one-off to build the compatibility
> database for all assignable devices on each node, then it might be
> still tenable in datacenter?
Yes, Alex, do you think the behavior below for building the compatibility
database is acceptable?

The management stack could do the exhaustive search in one shot to build the
compatibility database for all devices in every node. Meanwhile, it caches the
migration version strings of all tested devices.
When a device is newly created/attached, the management stack could write
every cached string to the migration version attribute of the newly
created/attached device in order to update the migration compatibility
database. Then it caches the migration version string of the newly
created/attached device as well.
Once a device attribute is modified, e.g. after changing its aggregation
count or updating its parent driver, it updates that device's cached migration
version string and updates the compatibility database by testing against the
migration version attribute of this device.
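
A minimal sketch of the pairwise test this database build relies on, assuming
the per-device migration_version attribute from the companion versioning
series (read the opaque string from the source, write it into the target's
attribute; a write error means incompatible). The attribute location and the
paths are assumptions for illustration only:

import os

def read_version(dev_syspath):
    with open(os.path.join(dev_syspath, "migration_version")) as f:
        return f.read().strip()

def is_compatible(src_version, dst_syspath):
    # The vendor driver, not userspace, interprets the string: a failed
    # write is the only signal the management stack gets.
    try:
        with open(os.path.join(dst_syspath, "migration_version"), "w") as f:
            f.write(src_version)
        return True
    except OSError:
        return False

def build_compat_db(devices):
    # devices: {name: sysfs path}; cache every string, then test all pairs.
    cache = {name: read_version(path) for name, path in devices.items()}
    return {src: [dst for dst, path in devices.items()
                  if dst != src and is_compatible(cache[src], path)]
            for src in cache}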


Thanks
Yan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/2] VFIO mdev aggregated resources handling
  2020-07-09  2:53           ` Tian, Kevin
  2020-07-09  7:23             ` Yan Zhao
@ 2020-07-09 17:28             ` Alex Williamson
  2020-07-10  2:09               ` Tian, Kevin
  1 sibling, 1 reply; 34+ messages in thread
From: Alex Williamson @ 2020-07-09 17:28 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: Zhenyu Wang, intel-gvt-dev, kvm

On Thu, 9 Jul 2020 02:53:05 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Thursday, July 9, 2020 2:48 AM
> > 
> > On Wed, 8 Jul 2020 06:31:00 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >   
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Wednesday, July 8, 2020 9:07 AM
> > > >
> > > > On Tue, 7 Jul 2020 23:28:39 +0000
> > > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > > >  
> > > > > Hi, Alex,
> > > > >
> > > > > Gentle ping... Please let us know whether this version looks good.  
> > > >
> > > > I figured this is entangled with the versioning scheme.  There are
> > > > unanswered questions about how something that assumes a device of a
> > > > given type is software compatible to another device of the same type
> > > > handles aggregation and how the type class would indicate compatibility
> > > > with an aggregated instance.  Thanks,
> > > >  
> > >
> > > Yes, this open is an interesting topic. I didn't closely follow the versioning
> > > scheme discussion. Below is some preliminary thought in my mind:
> > >
> > > --
> > > First, let's consider migrating an aggregated instance:
> > >
> > > A conservative policy is to check whether the compatible type is supported
> > > on target device and whether available instances under that type can  
> > afford  
> > > the ask of the aggregated instance. Compatibility check in this scheme is
> > > separated from aggregation check, then no change is required to the  
> > current  
> > > versioning interface.  
> > 
> > How many features, across how many attributes is an administrative tool
> > supposed to check for compatibility?  ie. if we add an 'aggregation'
> > feature now and 'translucency' feature next year, with new sysfs
> > attributes and creation options, won't that break this scheme?  I'm not
> > willing to assume aggregation is the sole new feature we will ever add,
> > therefore we don't get to make it a special case without a plan for how
> > the next special case will be integrated.  
> 
> Got you. I thought aggregation is special since it is purely about linear
> resource adjustment w/o changing the feature set of the instance, thus
> reasonable to get special handling in management stack which needs
> to understand this attribute anyway. But I agree that it's difficult to 
> predict the future and other special cases...
> 
> > 
> > We also can't even seem to agree that type is a necessary requirement
> > for compatibility.  Your discussion below of a type-A, which is
> > equivalent to a type-B w/ aggregation set to some value is an example
> > of this.  We might also have physical devices with extensions to
> > support migration.  These could possibly be compatible with full mdev
> > devices.  We have no idea how an administrative tool would discover
> > this other than an exhaustive search across every possible target.
> > That's ugly but feasible when considering a single target host, but
> > completely untenable when considering a datacenter.  
> 
> If exhaustive search can be done just one-off to build the compatibility
> database for all assignable devices on each node, then it might be
> still tenable in datacenter?


I'm not sure what "one-off" means relative to this discussion.  Is this
trying to argue that if it's a disturbingly heavyweight operation, but
a management tool only needs to do it once, it's ok?  We should really
be including openstack and ovirt folks in any discussion about what
might be acceptable across a datacenter.  I can sometimes get away with
representing what might be feasible for libvirt, but this is the sort
of knowledge and policy decision that would occur above libvirt.


> > > Then there comes a case where the target device doesn't handle  
> > aggregation  
> > > but support a different type which however provides compatible  
> > capabilities  
> > > and same resource size as the aggregated instance expects. I guess this is
> > > one puzzle how to check compatibility between such types. One possible
> > > extension is to introduce a non_aggregated_list  to indicate compatible
> > > non-aggregated types for each aggregated instance. Then mgmt.. stack
> > > just loop the compatible list if the conservative policy fails.  I didn't think
> > > carefully about what format is reasonable here. But if we agree that an
> > > separate interface is required to support such usage, then this may come
> > > later after the basic migration_version interface is completed.  
> > 
> > ...and then a non_translucency_list and then a non_brilliance_list and
> > then a non_whatever_list... no.  Additionally it's been shown difficult
> > to predict the future, if a new device is developed to be compatible
> > with an existing device it would require updates to the existing device
> > to learn about that compatibility.  
> 
> I suppose a compatibility list like this doesn't require the existing device
> to update. It should be new device's compatibility to claim compatibility
> to the types carried in existing list. 


Doesn't the problem go both ways?  If we have an existing
non-aggregated device and a new aggregated device that claims
compatibility, then maybe we can figure out that [existing -> new] might
be a possibility, but we can't figure out [new -> existing].

I'm also a bit concerned about the idea that we can take an opaque
string from one {vendor,device} and try to use it on another.  Ideally
this wouldn't cause problems, but we're essentially making the usage
policy and exercise in fuzzing the interface from other vendors.  Also,
we've defined that the version string is opaque, userspace is not
allowed to interpret it, so why would we then allow another vendor
driver to interpret it?  Does that mean we should consider the vendor
driver as the top-level match rather than the mdev type (where the mdev
type already includes the vendor driver)?  That immediately excludes
[phys <-> mdev] migration though unless the same vendor driver is
wrapping the phys device.  I'm not sure we're making any progress,
perhaps the premise of an opaque version string needs to be revisited.


> > > --
> > >
> > > Another scenario is about migrating a non-aggregated instance to a device
> > > handling aggregation. Then there is an open whether an aggregated type
> > > can be used to back the non-aggregated instance in case of no available
> > > instance under the original type claimed by non-aggregated instance.
> > > This won't happen in KVMGT, because all vGPU types share the same
> > > resource pool. Allocating instance under one type also decrement available
> > > instances under other types. So if we fail to find available instance under
> > > type-A (with 4x resource of type-B), then we will also fail to create an
> > >  aggregated instance (aggregate=4) under type-B. therefore, we just
> > > need stick to basic type compatibility check for non-aggregated instance.
> > > And I feel this assumption can be applied to other devices handling
> > > aggregation. It doesn't make sense for two types to claim compatibility
> > > (only with resource size difference) when their resources are allocated
> > > from different pools (which usually implies different capability or QOS/
> > > SLA difference). With this assumption, we don't need provide another
> > > interface to indicate compatible aggregated types for non-aggregated
> > > interface.
> > > --
> > >
> > > I may definitely overlook something here, but if above analysis sounds
> > > reasonable, then this series could be decoupled from the versioning
> > > scheme discussion based on conservative policy for now. :)  
> > 
> > The only potential I see for decoupling the discussions would be to do
> > aggregation via a vendor attribute.  Those already provide a mechanism
> > to manipulate a device after creation and something that we'll already
> > need to solve in determining migration compatibility.  So in that
> > sense, it seems like it at least doesn't make the problem worse.
> > Thanks,
> >   
> 
> This makes some sense, since anyway 'aggregation' still changes how the
> instance looks like. But let me understand clearly. Are you proposing 
> actually moving 'aggregation' to be a vendor attribute (i.e. removing
> the 'mdev' sub-directy in this patch), or more about a policy of treating
> it as a vendor attribute? If the former, is there any problem of having
> Libvirt manage this attribute given that it becomes vendor specific now?

I expect that libvirt would prefer not to deal with vendor attributes,
but I don't know that they would be opposed to it.  But shouldn't
vendor attributes largely be hidden from libvirt if mdevctl is used to
create the target instance?  For example at the source we'd use 'mdevctl
list' with the --dumpjson option to get the definition of the device.
At the target, something would modify the parent information in that
json definition and use 'mdevctl <start|define>' to create the
instance.  The definition would include any vendor attributes that had
previously been set for the source device using 'mdevctl modify'.
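
A sketch of that flow, using only the mdevctl commands named above; the exact
option spellings ("--dumpjson", "--jsonfile", "-u", "-p") and the JSON layout
are assumptions to be verified against mdevctl itself:

import json
import subprocess
import tempfile

def dump_source(uuid):
    # On the source host: dump the defined device, vendor attributes included.
    out = subprocess.check_output(
        ["mdevctl", "list", "-d", "-u", uuid, "--dumpjson"])
    return json.loads(out)

def define_on_target(uuid, definition, target_parent):
    # On the target host: point the definition at the target's parent device
    # and define it; attributes set earlier via 'mdevctl modify' ride along.
    with tempfile.NamedTemporaryFile("w", suffix=".json") as f:
        json.dump(definition, f)
        f.flush()
        subprocess.run(["mdevctl", "define", "-u", uuid, "-p", target_parent,
                        "--jsonfile", f.name], check=True)
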
Thanks,

Alex


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/2] VFIO mdev aggregated resources handling
  2020-07-09  7:23             ` Yan Zhao
@ 2020-07-09 20:22               ` Alex Williamson
  2020-07-10  1:58                 ` Yan Zhao
  0 siblings, 1 reply; 34+ messages in thread
From: Alex Williamson @ 2020-07-09 20:22 UTC (permalink / raw)
  To: Yan Zhao; +Cc: Tian, Kevin, intel-gvt-dev, kvm, Zhenyu Wang

On Thu, 9 Jul 2020 15:23:34 +0800
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Thu, Jul 09, 2020 at 02:53:05AM +0000, Tian, Kevin wrote:
> 
> <...>
> > > We also can't even seem to agree that type is a necessary requirement
> > > for compatibility.  Your discussion below of a type-A, which is
> > > equivalent to a type-B w/ aggregation set to some value is an example
> > > of this.  We might also have physical devices with extensions to
> > > support migration.  These could possibly be compatible with full mdev
> > > devices.  We have no idea how an administrative tool would discover
> > > this other than an exhaustive search across every possible target.
> > > That's ugly but feasible when considering a single target host, but
> > > completely untenable when considering a datacenter.  
> > 
> > If exhaustive search can be done just one-off to build the compatibility
> > database for all assignable devices on each node, then it might be
> > still tenable in datacenter?  
> yes, Alex, do you think below behavior to build compatibility database is
> acceptable?
> 
> management stack could do the exhaustive search in one shot to build the
> compatibility database for all devices in every node. Meanwhile, it caches
> migration version strings for all tested devices.
> when there's a newly created/attached device, management stack could write
> every cached strings to migration version attribute of the newly
> created/attached device in order to update the migration compatibility
> database. Then it caches the migration version string of the newly
> created/attached device as well.
> once a device attribute is modified, e.g. after changing its aggregation
> count or updating its parent driver, update its cached migration version
> string and update the compatibility database by testing against migration
> version attribute of this device.

This is exactly the scenario that I think is untenable.  You're asking
a management application to keep a live database of the opaque version
string of every device type and likely every instance of a device,
which it's not allowed to compare on its own, it can only pipe them
through to every other device across the datacenter to determine which
are comparable.  It would need to respond to not only devices being
added and removed, but bound and unbound from drivers, and entire nodes
being added and removed.  That seems like a lot of overhead, in
addition to the effect of essentially fuzzing the version interface
across all vendors and types.  We need a better solution or we need
someone from openstack and ovirt to agree that this is more viable than
I suspect. Thanks,

Alex


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/2] VFIO mdev aggregated resources handling
  2020-07-09 20:22               ` Alex Williamson
@ 2020-07-10  1:58                 ` Yan Zhao
  2020-07-10 15:00                   ` Alex Williamson
  0 siblings, 1 reply; 34+ messages in thread
From: Yan Zhao @ 2020-07-10  1:58 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Tian, Kevin, intel-gvt-dev, kvm, Zhenyu Wang

On Thu, Jul 09, 2020 at 02:22:26PM -0600, Alex Williamson wrote:
> On Thu, 9 Jul 2020 15:23:34 +0800
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > On Thu, Jul 09, 2020 at 02:53:05AM +0000, Tian, Kevin wrote:
> > 
> > <...>
> > > > We also can't even seem to agree that type is a necessary requirement
> > > > for compatibility.  Your discussion below of a type-A, which is
> > > > equivalent to a type-B w/ aggregation set to some value is an example
> > > > of this.  We might also have physical devices with extensions to
> > > > support migration.  These could possibly be compatible with full mdev
> > > > devices.  We have no idea how an administrative tool would discover
> > > > this other than an exhaustive search across every possible target.
> > > > That's ugly but feasible when considering a single target host, but
> > > > completely untenable when considering a datacenter.  
> > > 
> > > If exhaustive search can be done just one-off to build the compatibility
> > > database for all assignable devices on each node, then it might be
> > > still tenable in datacenter?  
> > yes, Alex, do you think below behavior to build compatibility database is
> > acceptable?
> > 
> > management stack could do the exhaustive search in one shot to build the
> > compatibility database for all devices in every node. Meanwhile, it caches
> > migration version strings for all tested devices.
> > when there's a newly created/attached device, management stack could write
> > every cached strings to migration version attribute of the newly
> > created/attached device in order to update the migration compatibility
> > database. Then it caches the migration version string of the newly
> > created/attached device as well.
> > once a device attribute is modified, e.g. after changing its aggregation
> > count or updating its parent driver, update its cached migration version
> > string and update the compatibility database by testing against migration
> > version attribute of this device.
> 
> This is exactly the scenario that I think is untenable.  You're asking
> a management application to keep a live database of the opaque version
> string of every device type and likely every instance of a device,
> which it's not allowed to compare on its own, it can only pipe them
If the management stack is allowed to compare on its own, then the migration
version strings have to be standardized.

But that is hard to do. e.g.
for GVT, the string format could be
"parent device PCI ID" + "version of gvt driver" + "mdev type" +
"aggregator count".

for an NVMe VF connecting to remote storage, it is
"PCI ID" + "driver version" + "configured remote storage URL".

for a QAT VF, it's
"PCI ID" + "driver version" + "supported encryption set".

The management stack also needs to understand how to compare those
strings, which is also hard. e.g.
two devices with different PCI IDs are incompatible initially, but later,
because of a software upgrade, they become compatible again.
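
Purely as an illustration of what a management-stack-side comparison of such
standardized strings would boil down to (the field layout and the values below
are made up, not a defined interface), and of the upgrade problem just
mentioned:

def naive_compare(src, dst):
    # Field-by-field match, e.g. "PCI ID/driver version/mdev type/aggregator"
    return src.split("/") == dst.split("/")

src = "8086:1234/gvt-1.0/i915-GVTg_V5_4/aggregator=2"   # illustrative only
dst = "8086:1234/gvt-1.1/i915-GVTg_V5_4/aggregator=2"
# Rejected because the driver versions differ, even if the upgrade actually
# preserved compatibility -- which is why interpreting the string in
# userspace is hard.
assert not naive_compare(src, dst)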


> through to every other device across the datacenter to determine which
> are comparable.  It would need to respond to not only devices being
> added and removed, but bound and unbound from drivers, and entire nodes
> being added and removed.  That seems like a lot of overhead, in
those responses are also required if the management stack wants to
compare on its own, right?

> addition to the effect of essentially fuzzing the version interface
> across all vendors and types.  We need a better solution or we need
> someone from openstack and ovirt to agree that this is more viable than
> I suspect. Thanks
Before we have a better solution, do you think it's good to ask the people
from openstack and ovirt first? What if the problem is not as complicated
as we thought?

Thanks
Yan

 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v3 0/2] VFIO mdev aggregated resources handling
  2020-07-09 17:28             ` Alex Williamson
@ 2020-07-10  2:09               ` Tian, Kevin
  2020-07-10  6:29                 ` Yan Zhao
  0 siblings, 1 reply; 34+ messages in thread
From: Tian, Kevin @ 2020-07-10  2:09 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Zhenyu Wang, intel-gvt-dev, kvm

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, July 10, 2020 1:28 AM
> 
> On Thu, 9 Jul 2020 02:53:05 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Thursday, July 9, 2020 2:48 AM
> > >
> > > On Wed, 8 Jul 2020 06:31:00 +0000
> > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > >
> > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > Sent: Wednesday, July 8, 2020 9:07 AM
> > > > >
> > > > > On Tue, 7 Jul 2020 23:28:39 +0000
> > > > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > > > >
> > > > > > Hi, Alex,
> > > > > >
> > > > > > Gentle ping... Please let us know whether this version looks good.
> > > > >
> > > > > I figured this is entangled with the versioning scheme.  There are
> > > > > unanswered questions about how something that assumes a device
> of a
> > > > > given type is software compatible to another device of the same type
> > > > > handles aggregation and how the type class would indicate
> compatibility
> > > > > with an aggregated instance.  Thanks,
> > > > >
> > > >
> > > > Yes, this open is an interesting topic. I didn't closely follow the
> versioning
> > > > scheme discussion. Below is some preliminary thought in my mind:
> > > >
> > > > --
> > > > First, let's consider migrating an aggregated instance:
> > > >
> > > > A conservative policy is to check whether the compatible type is
> supported
> > > > on target device and whether available instances under that type can
> > > afford
> > > > the ask of the aggregated instance. Compatibility check in this scheme is
> > > > separated from aggregation check, then no change is required to the
> > > current
> > > > versioning interface.
> > >
> > > How many features, across how many attributes is an administrative tool
> > > supposed to check for compatibility?  ie. if we add an 'aggregation'
> > > feature now and 'translucency' feature next year, with new sysfs
> > > attributes and creation options, won't that break this scheme?  I'm not
> > > willing to assume aggregation is the sole new feature we will ever add,
> > > therefore we don't get to make it a special case without a plan for how
> > > the next special case will be integrated.
> >
> > Got you. I thought aggregation is special since it is purely about linear
> > resource adjustment w/o changing the feature set of the instance, thus
> > reasonable to get special handling in management stack which needs
> > to understand this attribute anyway. But I agree that it's difficult to
> > predict the future and other special cases...
> >
> > >
> > > We also can't even seem to agree that type is a necessary requirement
> > > for compatibility.  Your discussion below of a type-A, which is
> > > equivalent to a type-B w/ aggregation set to some value is an example
> > > of this.  We might also have physical devices with extensions to
> > > support migration.  These could possibly be compatible with full mdev
> > > devices.  We have no idea how an administrative tool would discover
> > > this other than an exhaustive search across every possible target.
> > > That's ugly but feasible when considering a single target host, but
> > > completely untenable when considering a datacenter.
> >
> > If exhaustive search can be done just one-off to build the compatibility
> > database for all assignable devices on each node, then it might be
> > still tenable in datacenter?
> 
> 
> I'm not sure what "one-off" means relative to this discussion.  Is this
> trying to argue that if it's a disturbingly heavyweight operation, but
> a management tool only needs to do it once, it's ok?  We should really

yes

> be including openstack and ovirt folks in any discussion about what
> might be acceptable across a datacenter.  I can sometimes get away with
> representing what might be feasible for libvirt, but this is the sort
> of knowledge and policy decision that would occur above libvirt.

Agreed. And since this is more about general migration compatibility,
let's start a new thread and involve the openstack/ovirt folks. Yan, can you
initiate this?

> 
> 
> > > > Then there comes a case where the target device doesn't handle
> > > aggregation
> > > > but support a different type which however provides compatible
> > > capabilities
> > > > and same resource size as the aggregated instance expects. I guess this
> is
> > > > one puzzle how to check compatibility between such types. One
> possible
> > > > extension is to introduce a non_aggregated_list  to indicate compatible
> > > > non-aggregated types for each aggregated instance. Then mgmt.. stack
> > > > just loop the compatible list if the conservative policy fails.  I didn't think
> > > > carefully about what format is reasonable here. But if we agree that an
> > > > separate interface is required to support such usage, then this may
> come
> > > > later after the basic migration_version interface is completed.
> > >
> > > ...and then a non_translucency_list and then a non_brilliance_list and
> > > then a non_whatever_list... no.  Additionally it's been shown difficult
> > > to predict the future, if a new device is developed to be compatible
> > > with an existing device it would require updates to the existing device
> > > to learn about that compatibility.
> >
> > I suppose a compatibility list like this doesn't require the existing device
> > to update. It should be new device's compatibility to claim compatibility
> > to the types carried in existing list.
> 
> 
> Doesn't the problem go both ways?  If we have an existing
> non-aggregated device and a new aggregated device that claims
> compatibility, then maybe we can figure out that [existing -> new] might
> be a possibility, but we can't figure out [new -> existing].

I thought some restrictions on [new->existing] are usually acceptable. 

> 
> I'm also a bit concerned about the idea that we can take an opaque
> string from one {vendor,device} and try to use it on another.  Ideally
> this wouldn't cause problems, but we're essentially making the usage
> policy and exercise in fuzzing the interface from other vendors.  Also,
> we've defined that the version string is opaque, userspace is not
> allowed to interpret it, so why would we then allow another vendor
> driver to interpret it?  Does that mean we should consider the vendor
> driver as the top-level match rather than the mdev type (where the mdev
> type already includes the vendor driver)?  That immediately excludes
> [phys <-> mdev] migration though unless the same vendor driver is
> wrapping the phys device.  I'm not sure we're making any progress,
> perhaps the premise of an opaque version string needs to be revisited.

I see your concerns. My original point was for each type in the
non_aggregated_list to follow the same compatibility check as required for
the basic case (w/o aggregation). Obviously there are lots of open questions
even in the basic part. So I'd suggest focusing in this thread on how we may
move this series forward, without being entangled in progress on the
compatibility question. 😊 

> 
> 
> > > > --
> > > >
> > > > Another scenario is about migrating a non-aggregated instance to a
> device
> > > > handling aggregation. Then there is an open whether an aggregated
> type
> > > > can be used to back the non-aggregated instance in case of no available
> > > > instance under the original type claimed by non-aggregated instance.
> > > > This won't happen in KVMGT, because all vGPU types share the same
> > > > resource pool. Allocating instance under one type also decrement
> available
> > > > instances under other types. So if we fail to find available instance
> under
> > > > type-A (with 4x resource of type-B), then we will also fail to create an
> > > >  aggregated instance (aggregate=4) under type-B. therefore, we just
> > > > need stick to basic type compatibility check for non-aggregated instance.
> > > > And I feel this assumption can be applied to other devices handling
> > > > aggregation. It doesn't make sense for two types to claim compatibility
> > > > (only with resource size difference) when their resources are allocated
> > > > from different pools (which usually implies different capability or QOS/
> > > > SLA difference). With this assumption, we don't need provide another
> > > > interface to indicate compatible aggregated types for non-aggregated
> > > > interface.
> > > > --
> > > >
> > > > I may definitely overlook something here, but if above analysis sounds
> > > > reasonable, then this series could be decoupled from the versioning
> > > > scheme discussion based on conservative policy for now. :)
> > >
> > > The only potential I see for decoupling the discussions would be to do
> > > aggregation via a vendor attribute.  Those already provide a mechanism
> > > to manipulate a device after creation and something that we'll already
> > > need to solve in determining migration compatibility.  So in that
> > > sense, it seems like it at least doesn't make the problem worse.
> > > Thanks,
> > >
> >
> > This makes some sense, since anyway 'aggregation' still changes how the
> > instance looks like. But let me understand clearly. Are you proposing
> > actually moving 'aggregation' to be a vendor attribute (i.e. removing
> > the 'mdev' sub-directy in this patch), or more about a policy of treating
> > it as a vendor attribute? If the former, is there any problem of having
> > Libvirt manage this attribute given that it becomes vendor specific now?
> 
> I expect that libvirt would prefer not to deal with vendor attributes,
> but I don't know that they would be opposed to it.  But shouldn't
> vendor attributes largely be hidden from libvirt if mdevctl is used to
> create the target instance?  For example at the source we'd use 'mdevctl
> list' with the --dumpjson option to get the definition of the device.
> At the target, something would modify the parent information in that
> json definition and use 'mdevctl <start|define>' to create the
> instance.  The definition would include any vendor attributes that had
> previously been set for the source device using 'mdevctl modify'.
> Thanks,
> 

But Libvirt still needs to know this attribute and expose it to openstack
or ovirt for control, right? If we leave it as a vendor-defined attribute,
one vendor may choose 'aggregate' and another vendor may use
'merge', etc. Can't we still pursue the generic attribute path as
this patch does, but just treat it as a vendor attribute when doing the
compatibility check? In concept 'aggregate' is really vendor-agnostic
and just about linear resource management...

Thanks
Kevin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/2] VFIO mdev aggregated resources handling
  2020-07-10  2:09               ` Tian, Kevin
@ 2020-07-10  6:29                 ` Yan Zhao
  2020-07-10 15:12                   ` Alex Williamson
  0 siblings, 1 reply; 34+ messages in thread
From: Yan Zhao @ 2020-07-10  6:29 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: Alex Williamson, intel-gvt-dev, kvm, Zhenyu Wang

On Fri, Jul 10, 2020 at 02:09:06AM +0000, Tian, Kevin wrote:
<...>
> > > > We also can't even seem to agree that type is a necessary requirement
> > > > for compatibility.  Your discussion below of a type-A, which is
> > > > equivalent to a type-B w/ aggregation set to some value is an example
> > > > of this.  We might also have physical devices with extensions to
> > > > support migration.  These could possibly be compatible with full mdev
> > > > devices.  We have no idea how an administrative tool would discover
> > > > this other than an exhaustive search across every possible target.
> > > > That's ugly but feasible when considering a single target host, but
> > > > completely untenable when considering a datacenter.
> > >
> > > If exhaustive search can be done just one-off to build the compatibility
> > > database for all assignable devices on each node, then it might be
> > > still tenable in datacenter?
> > 
> > 
> > I'm not sure what "one-off" means relative to this discussion.  Is this
> > trying to argue that if it's a disturbingly heavyweight operation, but
> > a management tool only needs to do it once, it's ok?  We should really
> 
> yes
> 
> > be including openstack and ovirt folks in any discussion about what
> > might be acceptable across a datacenter.  I can sometimes get away with
> > representing what might be feasible for libvirt, but this is the sort
> > of knowledge and policy decision that would occur above libvirt.
> 
> Agree. and since this is more about general migration compatibility,
> let's start new thread and involve openstack/ovirt guys. Yan, can you
> initiate this?
>
Sure.
Hi Alex,
I'm not sure if the mailing lists below are sufficient and accurate;
do you know what extra people and lists I need to involve?

devel@ovirt.org, openstack-discuss@lists.openstack.org,
libvir-list@redhat.com


BTW, I found a page about live migration of SRIOV devices in openstack.
https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/libvirt-neutron-sriov-livemigration.html

Thanks
Yan


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/2] VFIO mdev aggregated resources handling
  2020-07-10  1:58                 ` Yan Zhao
@ 2020-07-10 15:00                   ` Alex Williamson
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Williamson @ 2020-07-10 15:00 UTC (permalink / raw)
  To: Yan Zhao; +Cc: Tian, Kevin, intel-gvt-dev, kvm, Zhenyu Wang

On Fri, 10 Jul 2020 09:58:49 +0800
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Thu, Jul 09, 2020 at 02:22:26PM -0600, Alex Williamson wrote:
> > On Thu, 9 Jul 2020 15:23:34 +0800
> > Yan Zhao <yan.y.zhao@intel.com> wrote:
> >   
> > > On Thu, Jul 09, 2020 at 02:53:05AM +0000, Tian, Kevin wrote:
> > > 
> > > <...>  
> > > > > We also can't even seem to agree that type is a necessary requirement
> > > > > for compatibility.  Your discussion below of a type-A, which is
> > > > > equivalent to a type-B w/ aggregation set to some value is an example
> > > > > of this.  We might also have physical devices with extensions to
> > > > > support migration.  These could possibly be compatible with full mdev
> > > > > devices.  We have no idea how an administrative tool would discover
> > > > > this other than an exhaustive search across every possible target.
> > > > > That's ugly but feasible when considering a single target host, but
> > > > > completely untenable when considering a datacenter.    
> > > > 
> > > > If exhaustive search can be done just one-off to build the compatibility
> > > > database for all assignable devices on each node, then it might be
> > > > still tenable in datacenter?    
> > > yes, Alex, do you think below behavior to build compatibility database is
> > > acceptable?
> > > 
> > > management stack could do the exhaustive search in one shot to build the
> > > compatibility database for all devices in every node. Meanwhile, it caches
> > > migration version strings for all tested devices.
> > > when there's a newly created/attached device, management stack could write
> > > every cached strings to migration version attribute of the newly
> > > created/attached device in order to update the migration compatibility
> > > database. Then it caches the migration version string of the newly
> > > created/attached device as well.
> > > once a device attribute is modified, e.g. after changing its aggregation
> > > count or updating its parent driver, update its cached migration version
> > > string and update the compatibility database by testing against migration
> > > version attribute of this device.  
> > 
> > This is exactly the scenario that I think is untenable.  You're asking
> > a management application to keep a live database of the opaque version
> > string of every device type and likely every instance of a device,
> > which it's not allowed to compare on its own, it can only pipe them  
> if management stack is allowed to compare on its own, then the migration
> version strings have to be standardized.
> 
> But it's a little hard to do it.
> e.g. 
> for GVT, string format can be
> "parent device PCI ID" + "version of gvt driver" + "mdev type" +
> "aggregator count".
> 
> for an NVMe VF connecting to a remote storage. it is
> "PCI ID" + "driver version" + "configured remote storage URL"
> 
> for a QAT VF, it's
> "PCI ID" + "driver version" + "supported encryption set".
> 
> The management stack also needs to understand how to compare those
> strings, which is also hard. e.g.
> two device with different PCI IDs are incompatible initially, but later
> because of software upgrade, they are compatible again.

You're rehashing all the reasons we decided to make the string opaque.
Your examples include driver version information, but different driver
versions don't necessarily imply incompatibility.  Therefore the only
means that a consumer of the version string really has for determining
compatibility is to push that version string back into the driver and
ask whether it's compatible.  The only test that a
management tool could do on its own would be a simple strcmp(), but
clearly that can't support different driver versions or parent device
IDs, or any other field of the proposed version string.  These examples
are also extremely PCI biased, we need a solution that supports any
type of device.

> > through to every other device across the datacenter to determine which
> > are comparable.  It would need to respond to not only devices being
> > added and removed, but bound and unbound from drivers, and entire nodes
> > being added and removed.  That seems like a lot of overhead, in  
> those responses are also required if the management stack wants to
> compare on its own, right?

I think there needs to be an efficient mechanism to ask, given this
source device, does a target system have a compatible device, or
resources and capacity to create one.  The proposals we're discussing
fail that efficiency qualification in my mind.
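
(Not something any proposal here defines -- just a sketch of the shape such a
single "can this target host the source device?" query could take: one pass
over the target's parents, checking available instances per type and writing
the source's opaque version string once per type rather than per instance.
The migration_version location and the path layout are assumptions.)

import glob
import os

def target_can_host(src_version, instances_needed=1):
    for type_dir in glob.glob(
            "/sys/class/mdev_bus/*/mdev_supported_types/*"):
        try:
            with open(os.path.join(type_dir, "available_instances")) as f:
                if int(f.read()) < instances_needed:
                    continue
            # A write failure would mean "incompatible with this type".
            with open(os.path.join(type_dir, "migration_version"), "w") as f:
                f.write(src_version)
            return type_dir
        except OSError:
            continue
    return None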
 
> > addition to the effect of essentially fuzzing the version interface
> > across all vendors and types.  We need a better solution or we need
> > someone from openstack and ovirt to agree that this is more viable than
> > I suspect. Thanks  
> Before we have a better solution, do you think it's good to ask people
> from openstack and ovirt first? What if the problem is not as complicated
> as we thought?

That's exactly what I just suggested here and in the other fork of this
thread.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/2] VFIO mdev aggregated resources handling
  2020-07-10  6:29                 ` Yan Zhao
@ 2020-07-10 15:12                   ` Alex Williamson
  2020-07-13  0:59                     ` Yan Zhao
  0 siblings, 1 reply; 34+ messages in thread
From: Alex Williamson @ 2020-07-10 15:12 UTC (permalink / raw)
  To: Yan Zhao; +Cc: Tian, Kevin, intel-gvt-dev, kvm, Zhenyu Wang

On Fri, 10 Jul 2020 14:29:59 +0800
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Fri, Jul 10, 2020 at 02:09:06AM +0000, Tian, Kevin wrote:
> <...>
> > > > > We also can't even seem to agree that type is a necessary requirement
> > > > > for compatibility.  Your discussion below of a type-A, which is
> > > > > equivalent to a type-B w/ aggregation set to some value is an example
> > > > > of this.  We might also have physical devices with extensions to
> > > > > support migration.  These could possibly be compatible with full mdev
> > > > > devices.  We have no idea how an administrative tool would discover
> > > > > this other than an exhaustive search across every possible target.
> > > > > That's ugly but feasible when considering a single target host, but
> > > > > completely untenable when considering a datacenter.  
> > > >
> > > > If exhaustive search can be done just one-off to build the compatibility
> > > > database for all assignable devices on each node, then it might
> > > > still be tenable in a datacenter?  
> > > 
> > > 
> > > I'm not sure what "one-off" means relative to this discussion.  Is this
> > > trying to argue that if it's a disturbingly heavyweight operation, but
> > > a management tool only needs to do it once, it's ok?  We should really  
> > 
> > yes
> >   
> > > be including openstack and ovirt folks in any discussion about what
> > > might be acceptable across a datacenter.  I can sometimes get away with
> > > representing what might be feasible for libvirt, but this is the sort
> > > of knowledge and policy decision that would occur above libvirt.  
> > 
> > Agree. And since this is more about general migration compatibility,
> > let's start a new thread and involve the openstack/ovirt guys. Yan, can you
> > initiate this?
> >  
> Sure.
> Hi Alex,
> I'm not sure if the mailing lists below are enough and accurate;
> do you know what extra people and lists I need to involve?
> 
> devel@ovirt.org, openstack-discuss@lists.openstack.org,
> libvir-list@redhat.com

You could also include

Daniel P. Berrangé <berrange@redhat.com>
Sean Mooney <smooney@redhat.com>

 
> BTW, I found a page about live migration of SR-IOV devices in openstack.
> https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/libvirt-neutron-sriov-livemigration.html

Sean, above, is involved with that specification.  AFAIK the only
current live migration of SR-IOV devices involves failover and hotplug
trickery.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/2] VFIO mdev aggregated resources handling
  2020-07-10 15:12                   ` Alex Williamson
@ 2020-07-13  0:59                     ` Yan Zhao
  0 siblings, 0 replies; 34+ messages in thread
From: Yan Zhao @ 2020-07-13  0:59 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Tian, Kevin, intel-gvt-dev, kvm, Zhenyu Wang

On Fri, Jul 10, 2020 at 09:12:17AM -0600, Alex Williamson wrote:
> On Fri, 10 Jul 2020 14:29:59 +0800
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > On Fri, Jul 10, 2020 at 02:09:06AM +0000, Tian, Kevin wrote:
> > <...>
> > > > > > We also can't even seem to agree that type is a necessary requirement
> > > > > > for compatibility.  Your discussion below of a type-A, which is
> > > > > > equivalent to a type-B w/ aggregation set to some value is an example
> > > > > > of this.  We might also have physical devices with extensions to
> > > > > > support migration.  These could possibly be compatible with full mdev
> > > > > > devices.  We have no idea how an administrative tool would discover
> > > > > > this other than an exhaustive search across every possible target.
> > > > > > That's ugly but feasible when considering a single target host, but
> > > > > > completely untenable when considering a datacenter.  
> > > > >
> > > > > If exhaustive search can be done just one-off to build the compatibility
> > > > > database for all assignable devices on each node, then it might
> > > > > still be tenable in a datacenter?  
> > > > 
> > > > 
> > > > I'm not sure what "one-off" means relative to this discussion.  Is this
> > > > trying to argue that if it's a disturbingly heavyweight operation, but
> > > > a management tool only needs to do it once, it's ok?  We should really  
> > > 
> > > yes
> > >   
> > > > be including openstack and ovirt folks in any discussion about what
> > > > might be acceptable across a datacenter.  I can sometimes get away with
> > > > representing what might be feasible for libvirt, but this is the sort
> > > > of knowledge and policy decision that would occur above libvirt.  
> > > 
> > > Agree. And since this is more about general migration compatibility,
> > > let's start a new thread and involve the openstack/ovirt guys. Yan, can you
> > > initiate this?
> > >  
> > Sure.
> > Hi Alex,
> > I'm not sure if the mailing lists below are enough and accurate;
> > do you know what extra people and lists I need to involve?
> > 
> > devel@ovirt.org, openstack-discuss@lists.openstack.org,
> > libvir-list@redhat.com
> 
> You could also include
> 
> Daniel P. Berrangé <berrange@redhat.com>
> Sean Mooney <smooney@redhat.com>
> 
>  
> > BTW, I found a page about live migration of SR-IOV devices in openstack.
> > https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/libvirt-neutron-sriov-livemigration.html
> 
> Sean, above, is involved with that specification.  AFAIK the only
> current live migration of SR-IOV devices involves failover and hotplug
> trickery.  Thanks,
> 
got it!

Thanks
Yan

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2020-07-13  1:10 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-26  5:41 [PATCH v2 0/2] VFIO mdev aggregated resources handling Zhenyu Wang
2020-03-26  5:41 ` [PATCH v2 1/2] Documentation/driver-api/vfio-mediated-device.rst: update for aggregation support Zhenyu Wang
2020-03-26  8:17   ` Tian, Kevin
2020-03-26  8:21     ` Zhenyu Wang
2020-03-27  6:16       ` Tian, Kevin
2020-03-27  6:21         ` Zhenyu Wang
2020-03-26  5:41 ` [PATCH v2 2/2] drm/i915/gvt: mdev aggregation type Zhenyu Wang
2020-03-27  7:48   ` Tian, Kevin
2020-03-27  8:12     ` Zhenyu Wang
2020-03-27  8:44       ` Tian, Kevin
2020-03-27  8:58         ` Zhenyu Wang
2020-03-27  9:31           ` Tian, Kevin
2020-04-08  5:58 ` [PATCH v3 0/2] VFIO mdev aggregated resources handling Zhenyu Wang
2020-07-07 23:28   ` Tian, Kevin
2020-07-08  1:06     ` Alex Williamson
2020-07-08  1:54       ` Zhenyu Wang
2020-07-08  3:38         ` Yan Zhao
2020-07-08  3:40           ` Yan Zhao
2020-07-08  4:17             ` Alex Williamson
2020-07-08  6:31       ` Tian, Kevin
2020-07-08  9:54         ` Zhenyu Wang
2020-07-08 18:48         ` Alex Williamson
2020-07-09  2:53           ` Tian, Kevin
2020-07-09  7:23             ` Yan Zhao
2020-07-09 20:22               ` Alex Williamson
2020-07-10  1:58                 ` Yan Zhao
2020-07-10 15:00                   ` Alex Williamson
2020-07-09 17:28             ` Alex Williamson
2020-07-10  2:09               ` Tian, Kevin
2020-07-10  6:29                 ` Yan Zhao
2020-07-10 15:12                   ` Alex Williamson
2020-07-13  0:59                     ` Yan Zhao
2020-04-08  5:58 ` [PATCH v3 1/2] Documentation/driver-api/vfio-mediated-device.rst: update for aggregation support Zhenyu Wang
2020-04-08  5:58 ` [PATCH v3 2/2] drm/i915/gvt: mdev aggregation type Zhenyu Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).