linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module
@ 2019-04-30 22:49 Parav Pandit
  2019-04-30 22:49 ` [PATCHv2 01/10] vfio/mdev: Avoid release parent reference during error path Parav Pandit
                   ` (10 more replies)
  0 siblings, 11 replies; 31+ messages in thread
From: Parav Pandit @ 2019-04-30 22:49 UTC (permalink / raw)
  To: kvm, linux-kernel, kwankhede, alex.williamson; +Cc: cjia, parav

As we would like to use mdev subsystem for wider use case as
discussed in [1], [2] apart from an offline discussion.
This use case is also discussed with wider forum in [4] in track
'Lightweight NIC HW functions for container offload use cases'.

This series is prep-work and improves vfio/mdev module in following ways.

Patch-1 Fixes releasing parent dev reference during error unwinding
        mdev parent registration.
Patch-2 Simplifies mdev device for unused kref.
Patch-3 Drops redundant extern prefix of exported symbols.
Patch-4 Returns right error code from vendor driver.
Patch-5 Fixes to use right sysfs remove sequence.
Patch-6 Fixes removing all child devices if one of them fails.
Patch-7 Remove unnecessary inline
Patch-8 Improve the mdev create/remove sequence to match Linux
        bus, device model
Patch-9 Avoid recreating remove file on stale device to
        eliminate call trace
Patch-10 Fix race conditions of create/remove with parent removal
This is improved version than using srcu as srcu can take
seconds to minutes.

This series is tested using
(a) mtty with VM using vfio_mdev driver for positive tests and
device removal while device in use by VM using vfio_mdev driver

(b) mlx5 core driver using RFC patches [3] and internal patches.
Internal patches are large and cannot be combined with this
prep-work patches. It will posted once prep-work completes.

[1] https://www.spinics.net/lists/netdev/msg556978.html
[2] https://lkml.org/lkml/2019/3/7/696
[3] https://lkml.org/lkml/2019/3/8/819
[4] https://netdevconf.org/0x13/session.html?workshop-hardware-offload

---
Changelog:
---
v1->v2:
 - Addressed comments from Alex
 - Rebased
 - Inserted the device checking loop in Patch-6 as original code
 - Added patch 7 to 10
 - Added fixes for race condition in create/remove with parent removal
   Patch-10 uses simplified refcount and completion, instead of srcu
   which might take seconds to minutes on busy system.
 - Added fix for device create/remove sequence to match
   Linux device, bus model
v0->v1:
 - Dropped device placement on bus sequence patch for this series
 - Addressed below comments from Alex, Kirti, Maxim.
 - Added Review-by tag for already reviewed patches.
 - Dropped incorrect patch of put_device().
 - Corrected Fixes commit tag for sysfs remove sequence fix
 - Split last 8th patch to smaller refactor and fixes patch
 - Following coding style commenting format
 - Fixed accidental delete of mutex_lock in mdev_unregister_device
 - Renamed remove helped to mdev_device_remove_common().
 - Rebased for uuid/guid change

Parav Pandit (10):
  vfio/mdev: Avoid release parent reference during error path
  vfio/mdev: Removed unused kref
  vfio/mdev: Drop redundant extern for exported symbols
  vfio/mdev: Avoid masking error code to EBUSY
  vfio/mdev: Follow correct remove sequence
  vfio/mdev: Fix aborting mdev child device removal if one fails
  vfio/mdev: Avoid inline get and put parent helpers
  vfio/mdev: Improve the create/remove sequence
  vfio/mdev: Avoid creating sysfs remove file on stale device removal
  vfio/mdev: Synchronize device create/remove with parent removal

 drivers/vfio/mdev/mdev_core.c    | 162 +++++++++++++------------------
 drivers/vfio/mdev/mdev_private.h |   9 +-
 drivers/vfio/mdev/mdev_sysfs.c   |   8 +-
 include/linux/mdev.h             |  21 ++--
 4 files changed, 89 insertions(+), 111 deletions(-)

-- 
2.19.2


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCHv2 01/10] vfio/mdev: Avoid release parent reference during error path
  2019-04-30 22:49 [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module Parav Pandit
@ 2019-04-30 22:49 ` Parav Pandit
  2019-04-30 22:49 ` [PATCHv2 02/10] vfio/mdev: Removed unused kref Parav Pandit
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Parav Pandit @ 2019-04-30 22:49 UTC (permalink / raw)
  To: kvm, linux-kernel, kwankhede, alex.williamson; +Cc: cjia, parav

During mdev parent registration in mdev_register_device(),
if parent device is duplicate, it releases the reference of existing
parent device.
This is incorrect. Existing parent device should not be touched.

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed By: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 drivers/vfio/mdev/mdev_core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index b96fedc77ee5..1299d2e72ce2 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -181,6 +181,7 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
 	/* Check for duplicate */
 	parent = __find_parent_device(dev);
 	if (parent) {
+		parent = NULL;
 		ret = -EEXIST;
 		goto add_dev_err;
 	}
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCHv2 02/10] vfio/mdev: Removed unused kref
  2019-04-30 22:49 [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module Parav Pandit
  2019-04-30 22:49 ` [PATCHv2 01/10] vfio/mdev: Avoid release parent reference during error path Parav Pandit
@ 2019-04-30 22:49 ` Parav Pandit
  2019-04-30 22:49 ` [PATCHv2 03/10] vfio/mdev: Drop redundant extern for exported symbols Parav Pandit
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Parav Pandit @ 2019-04-30 22:49 UTC (permalink / raw)
  To: kvm, linux-kernel, kwankhede, alex.williamson; +Cc: cjia, parav

Remove unused kref from the mdev_device structure.

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed By: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 drivers/vfio/mdev/mdev_core.c    | 1 -
 drivers/vfio/mdev/mdev_private.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 1299d2e72ce2..00ca61392de9 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -311,7 +311,6 @@ int mdev_device_create(struct kobject *kobj,
 	mutex_unlock(&mdev_list_lock);
 
 	mdev->parent = parent;
-	kref_init(&mdev->ref);
 
 	mdev->dev.parent  = dev;
 	mdev->dev.bus     = &mdev_bus_type;
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index 379758c52b1b..ddcf9c72bd8a 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -30,7 +30,6 @@ struct mdev_device {
 	struct mdev_parent *parent;
 	guid_t uuid;
 	void *driver_data;
-	struct kref ref;
 	struct list_head next;
 	struct kobject *type_kobj;
 	bool active;
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCHv2 03/10] vfio/mdev: Drop redundant extern for exported symbols
  2019-04-30 22:49 [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module Parav Pandit
  2019-04-30 22:49 ` [PATCHv2 01/10] vfio/mdev: Avoid release parent reference during error path Parav Pandit
  2019-04-30 22:49 ` [PATCHv2 02/10] vfio/mdev: Removed unused kref Parav Pandit
@ 2019-04-30 22:49 ` Parav Pandit
  2019-04-30 22:49 ` [PATCHv2 04/10] vfio/mdev: Avoid masking error code to EBUSY Parav Pandit
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Parav Pandit @ 2019-04-30 22:49 UTC (permalink / raw)
  To: kvm, linux-kernel, kwankhede, alex.williamson; +Cc: cjia, parav

There is no need use 'extern' for exported functions.

Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 include/linux/mdev.h | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index d7aee90e5da5..4924d8038814 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -118,21 +118,20 @@ struct mdev_driver {
 
 #define to_mdev_driver(drv)	container_of(drv, struct mdev_driver, driver)
 
-extern void *mdev_get_drvdata(struct mdev_device *mdev);
-extern void mdev_set_drvdata(struct mdev_device *mdev, void *data);
-extern const guid_t *mdev_uuid(struct mdev_device *mdev);
+void *mdev_get_drvdata(struct mdev_device *mdev);
+void mdev_set_drvdata(struct mdev_device *mdev, void *data);
+const guid_t *mdev_uuid(struct mdev_device *mdev);
 
 extern struct bus_type mdev_bus_type;
 
-extern int  mdev_register_device(struct device *dev,
-				 const struct mdev_parent_ops *ops);
-extern void mdev_unregister_device(struct device *dev);
+int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops);
+void mdev_unregister_device(struct device *dev);
 
-extern int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
-extern void mdev_unregister_driver(struct mdev_driver *drv);
+int mdev_register_driver(struct mdev_driver *drv, struct module *owner);
+void mdev_unregister_driver(struct mdev_driver *drv);
 
-extern struct device *mdev_parent_dev(struct mdev_device *mdev);
-extern struct device *mdev_dev(struct mdev_device *mdev);
-extern struct mdev_device *mdev_from_dev(struct device *dev);
+struct device *mdev_parent_dev(struct mdev_device *mdev);
+struct device *mdev_dev(struct mdev_device *mdev);
+struct mdev_device *mdev_from_dev(struct device *dev);
 
 #endif /* MDEV_H */
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCHv2 04/10] vfio/mdev: Avoid masking error code to EBUSY
  2019-04-30 22:49 [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module Parav Pandit
                   ` (2 preceding siblings ...)
  2019-04-30 22:49 ` [PATCHv2 03/10] vfio/mdev: Drop redundant extern for exported symbols Parav Pandit
@ 2019-04-30 22:49 ` Parav Pandit
  2019-04-30 22:49 ` [PATCHv2 05/10] vfio/mdev: Follow correct remove sequence Parav Pandit
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Parav Pandit @ 2019-04-30 22:49 UTC (permalink / raw)
  To: kvm, linux-kernel, kwankhede, alex.williamson; +Cc: cjia, parav

Instead of masking return error to -EBUSY, return actual error
returned by the driver.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 drivers/vfio/mdev/mdev_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 00ca61392de9..836d31985f14 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -141,7 +141,7 @@ static int mdev_device_remove_ops(struct mdev_device *mdev, bool force_remove)
 	 */
 	ret = parent->ops->remove(mdev);
 	if (ret && !force_remove)
-		return -EBUSY;
+		return ret;
 
 	sysfs_remove_groups(&mdev->dev.kobj, parent->ops->mdev_attr_groups);
 	return 0;
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCHv2 05/10] vfio/mdev: Follow correct remove sequence
  2019-04-30 22:49 [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module Parav Pandit
                   ` (3 preceding siblings ...)
  2019-04-30 22:49 ` [PATCHv2 04/10] vfio/mdev: Avoid masking error code to EBUSY Parav Pandit
@ 2019-04-30 22:49 ` Parav Pandit
  2019-04-30 22:49 ` [PATCHv2 06/10] vfio/mdev: Fix aborting mdev child device removal if one fails Parav Pandit
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Parav Pandit @ 2019-04-30 22:49 UTC (permalink / raw)
  To: kvm, linux-kernel, kwankhede, alex.williamson; +Cc: cjia, parav

mdev_remove_sysfs_files() should follow exact mirror sequence of a
create, similar to what is followed in error unwinding path of
mdev_create_sysfs_files().

Fixes: 6a62c1dfb5c7 ("vfio/mdev: Re-order sysfs attribute creation")
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 drivers/vfio/mdev/mdev_sysfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index 5193a0e0ce5a..cbf94b8165ea 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -280,7 +280,7 @@ int  mdev_create_sysfs_files(struct device *dev, struct mdev_type *type)
 
 void mdev_remove_sysfs_files(struct device *dev, struct mdev_type *type)
 {
+	sysfs_remove_files(&dev->kobj, mdev_device_attrs);
 	sysfs_remove_link(&dev->kobj, "mdev_type");
 	sysfs_remove_link(type->devices_kobj, dev_name(dev));
-	sysfs_remove_files(&dev->kobj, mdev_device_attrs);
 }
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCHv2 06/10] vfio/mdev: Fix aborting mdev child device removal if one fails
  2019-04-30 22:49 [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module Parav Pandit
                   ` (4 preceding siblings ...)
  2019-04-30 22:49 ` [PATCHv2 05/10] vfio/mdev: Follow correct remove sequence Parav Pandit
@ 2019-04-30 22:49 ` Parav Pandit
  2019-04-30 22:49 ` [PATCHv2 07/10] vfio/mdev: Avoid inline get and put parent helpers Parav Pandit
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Parav Pandit @ 2019-04-30 22:49 UTC (permalink / raw)
  To: kvm, linux-kernel, kwankhede, alex.williamson; +Cc: cjia, parav

device_for_each_child() stops executing callback function for remaining
child devices, if callback hits an error.
Each child mdev device is independent of each other.
While unregistering parent device, mdev core must remove all child mdev
devices.
Therefore, mdev_device_remove_cb() always returns success so that
device_for_each_child doesn't abort if one child removal hits error.

While at it, improve remove and unregister functions for below simplicity.

There isn't need to pass forced flag pointer during mdev parent
removal which invokes mdev_device_remove(). So simplify the flow.

mdev_device_remove() is called from two paths.
1. mdev_unregister_driver()
     mdev_device_remove_cb()
       mdev_device_remove()
2. remove_store()
     mdev_device_remove()

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 drivers/vfio/mdev/mdev_core.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 836d31985f14..1a317e409355 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -149,10 +149,10 @@ static int mdev_device_remove_ops(struct mdev_device *mdev, bool force_remove)
 
 static int mdev_device_remove_cb(struct device *dev, void *data)
 {
-	if (!dev_is_mdev(dev))
-		return 0;
+	if (dev_is_mdev(dev))
+		mdev_device_remove(dev, true);
 
-	return mdev_device_remove(dev, data ? *(bool *)data : true);
+	return 0;
 }
 
 /*
@@ -240,7 +240,6 @@ EXPORT_SYMBOL(mdev_register_device);
 void mdev_unregister_device(struct device *dev)
 {
 	struct mdev_parent *parent;
-	bool force_remove = true;
 
 	mutex_lock(&parent_list_lock);
 	parent = __find_parent_device(dev);
@@ -254,8 +253,7 @@ void mdev_unregister_device(struct device *dev)
 	list_del(&parent->next);
 	class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
 
-	device_for_each_child(dev, (void *)&force_remove,
-			      mdev_device_remove_cb);
+	device_for_each_child(dev, NULL, mdev_device_remove_cb);
 
 	parent_remove_sysfs_files(parent);
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCHv2 07/10] vfio/mdev: Avoid inline get and put parent helpers
  2019-04-30 22:49 [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module Parav Pandit
                   ` (5 preceding siblings ...)
  2019-04-30 22:49 ` [PATCHv2 06/10] vfio/mdev: Fix aborting mdev child device removal if one fails Parav Pandit
@ 2019-04-30 22:49 ` Parav Pandit
  2019-04-30 22:49 ` [PATCHv2 08/10] vfio/mdev: Improve the create/remove sequence Parav Pandit
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Parav Pandit @ 2019-04-30 22:49 UTC (permalink / raw)
  To: kvm, linux-kernel, kwankhede, alex.williamson; +Cc: cjia, parav

As section 15 of Documentation/process/coding-style.rst clearly
describes that compiler will be able to optimize code.

Hence drop inline for get and put helpers for parent.

Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 drivers/vfio/mdev/mdev_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 1a317e409355..1040a4a2dcbc 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -88,7 +88,7 @@ static void mdev_release_parent(struct kref *kref)
 	put_device(dev);
 }
 
-static inline struct mdev_parent *mdev_get_parent(struct mdev_parent *parent)
+static struct mdev_parent *mdev_get_parent(struct mdev_parent *parent)
 {
 	if (parent)
 		kref_get(&parent->ref);
@@ -96,7 +96,7 @@ static inline struct mdev_parent *mdev_get_parent(struct mdev_parent *parent)
 	return parent;
 }
 
-static inline void mdev_put_parent(struct mdev_parent *parent)
+static void mdev_put_parent(struct mdev_parent *parent)
 {
 	if (parent)
 		kref_put(&parent->ref, mdev_release_parent);
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCHv2 08/10] vfio/mdev: Improve the create/remove sequence
  2019-04-30 22:49 [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module Parav Pandit
                   ` (6 preceding siblings ...)
  2019-04-30 22:49 ` [PATCHv2 07/10] vfio/mdev: Avoid inline get and put parent helpers Parav Pandit
@ 2019-04-30 22:49 ` Parav Pandit
  2019-05-08 17:09   ` Cornelia Huck
  2019-04-30 22:49 ` [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove file on stale device removal Parav Pandit
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Parav Pandit @ 2019-04-30 22:49 UTC (permalink / raw)
  To: kvm, linux-kernel, kwankhede, alex.williamson; +Cc: cjia, parav

This patch addresses below two issues and prepares the code to address
3rd issue listed below.

1. mdev device is placed on the mdev bus before it is created in the
vendor driver. Once a device is placed on the mdev bus without creating
its supporting underlying vendor device, mdev driver's probe() gets triggered.
However there isn't a stable mdev available to work on.

   create_store()
     mdev_create_device()
       device_register()
          ...
         vfio_mdev_probe()
        [...]
        parent->ops->create()
          vfio_ap_mdev_create()
            mdev_set_drvdata(mdev, matrix_mdev);
            /* Valid pointer set above */

Due to this way of initialization, mdev driver who want to use the mdev,
doesn't have a valid mdev to work on.

2. Current creation sequence is,
   parent->ops_create()
   groups_register()

Remove sequence is,
   parent->ops->remove()
   groups_unregister()

However, remove sequence should be exact mirror of creation sequence.
Once this is achieved, all users of the mdev will be terminated first
before removing underlying vendor device.
(Follow standard linux driver model).
At that point vendor's remove() ops shouldn't failed because device is
taken off the bus that should terminate the users.

3. When remove operation fails, mdev sysfs removal attempts to add the
file back on already removed device. Following call trace [1] is observed.

[1] call trace:
kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327 sysfs_create_file_ns+0x7f/0x90
kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
kernel: Call Trace:
kernel: remove_store+0xdc/0x100 [mdev]
kernel: kernfs_fop_write+0x113/0x1a0
kernel: vfs_write+0xad/0x1b0
kernel: ksys_write+0x5a/0xe0
kernel: do_syscall_64+0x5a/0x210
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

Therefore, mdev core is improved in following ways.

1. Before placing mdev devices on the bus, perform vendor drivers
creation which supports the mdev creation.
This ensures that mdev specific all necessary fields are initialized
before a given mdev can be accessed by bus driver.
This follows standard Linux kernel bus and device model similar to other
widely used PCI bus.

2. During remove flow, first remove the device from the bus. This
ensures that any bus specific devices and data is cleared.
Once device is taken of the mdev bus, perform remove() of mdev from the
vendor driver.

3. Linux core device model provides way to register and auto unregister
the device sysfs attribute groups at dev->groups.
Make use of this groups to let core create the groups and simplify code
to avoid explicit groups creation and removal.

A below stack dump of a mdev device remove process also ensures that
vfio driver guards against device removal already in use.

 cat /proc/21962/stack
[<0>] vfio_del_group_dev+0x216/0x3c0 [vfio]
[<0>] mdev_remove+0x21/0x40 [mdev]
[<0>] device_release_driver_internal+0xe8/0x1b0
[<0>] bus_remove_device+0xf9/0x170
[<0>] device_del+0x168/0x350
[<0>] mdev_device_remove_common+0x1d/0x50 [mdev]
[<0>] mdev_device_remove+0x8c/0xd0 [mdev]
[<0>] remove_store+0x71/0x90 [mdev]
[<0>] kernfs_fop_write+0x113/0x1a0
[<0>] vfs_write+0xad/0x1b0
[<0>] ksys_write+0x5a/0xe0
[<0>] do_syscall_64+0x5a/0x210
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0xffffffffffffffff

This prepares the code to eliminate calling device_create_file() in
subsquent patch.

Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 drivers/vfio/mdev/mdev_core.c    | 94 +++++++++-----------------------
 drivers/vfio/mdev/mdev_private.h |  2 +-
 drivers/vfio/mdev/mdev_sysfs.c   |  2 +-
 3 files changed, 27 insertions(+), 71 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 1040a4a2dcbc..2b98da2ee361 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -102,55 +102,10 @@ static void mdev_put_parent(struct mdev_parent *parent)
 		kref_put(&parent->ref, mdev_release_parent);
 }
 
-static int mdev_device_create_ops(struct kobject *kobj,
-				  struct mdev_device *mdev)
-{
-	struct mdev_parent *parent = mdev->parent;
-	int ret;
-
-	ret = parent->ops->create(kobj, mdev);
-	if (ret)
-		return ret;
-
-	ret = sysfs_create_groups(&mdev->dev.kobj,
-				  parent->ops->mdev_attr_groups);
-	if (ret)
-		parent->ops->remove(mdev);
-
-	return ret;
-}
-
-/*
- * mdev_device_remove_ops gets called from sysfs's 'remove' and when parent
- * device is being unregistered from mdev device framework.
- * - 'force_remove' is set to 'false' when called from sysfs's 'remove' which
- *   indicates that if the mdev device is active, used by VMM or userspace
- *   application, vendor driver could return error then don't remove the device.
- * - 'force_remove' is set to 'true' when called from mdev_unregister_device()
- *   which indicate that parent device is being removed from mdev device
- *   framework so remove mdev device forcefully.
- */
-static int mdev_device_remove_ops(struct mdev_device *mdev, bool force_remove)
-{
-	struct mdev_parent *parent = mdev->parent;
-	int ret;
-
-	/*
-	 * Vendor driver can return error if VMM or userspace application is
-	 * using this mdev device.
-	 */
-	ret = parent->ops->remove(mdev);
-	if (ret && !force_remove)
-		return ret;
-
-	sysfs_remove_groups(&mdev->dev.kobj, parent->ops->mdev_attr_groups);
-	return 0;
-}
-
 static int mdev_device_remove_cb(struct device *dev, void *data)
 {
 	if (dev_is_mdev(dev))
-		mdev_device_remove(dev, true);
+		mdev_device_remove(dev);
 
 	return 0;
 }
@@ -310,41 +265,43 @@ int mdev_device_create(struct kobject *kobj,
 
 	mdev->parent = parent;
 
+	device_initialize(&mdev->dev);
 	mdev->dev.parent  = dev;
 	mdev->dev.bus     = &mdev_bus_type;
 	mdev->dev.release = mdev_device_release;
 	dev_set_name(&mdev->dev, "%pUl", uuid);
+	mdev->dev.groups = parent->ops->mdev_attr_groups;
+	mdev->type_kobj = kobj;
 
-	ret = device_register(&mdev->dev);
-	if (ret) {
-		put_device(&mdev->dev);
-		goto mdev_fail;
-	}
+	ret = parent->ops->create(kobj, mdev);
+	if (ret)
+		goto ops_create_fail;
 
-	ret = mdev_device_create_ops(kobj, mdev);
+	ret = device_add(&mdev->dev);
 	if (ret)
-		goto create_fail;
+		goto add_fail;
 
 	ret = mdev_create_sysfs_files(&mdev->dev, type);
-	if (ret) {
-		mdev_device_remove_ops(mdev, true);
-		goto create_fail;
-	}
+	if (ret)
+		goto sysfs_fail;
 
-	mdev->type_kobj = kobj;
 	mdev->active = true;
 	dev_dbg(&mdev->dev, "MDEV: created\n");
 
 	return 0;
 
-create_fail:
-	device_unregister(&mdev->dev);
+sysfs_fail:
+	device_del(&mdev->dev);
+add_fail:
+	parent->ops->remove(mdev);
+ops_create_fail:
+	put_device(&mdev->dev);
 mdev_fail:
 	mdev_put_parent(parent);
 	return ret;
 }
 
-int mdev_device_remove(struct device *dev, bool force_remove)
+int mdev_device_remove(struct device *dev)
 {
 	struct mdev_device *mdev, *tmp;
 	struct mdev_parent *parent;
@@ -373,16 +330,15 @@ int mdev_device_remove(struct device *dev, bool force_remove)
 	mutex_unlock(&mdev_list_lock);
 
 	type = to_mdev_type(mdev->type_kobj);
+	mdev_remove_sysfs_files(dev, type);
+	device_del(&mdev->dev);
 	parent = mdev->parent;
+	ret = parent->ops->remove(mdev);
+	if (ret)
+		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
 
-	ret = mdev_device_remove_ops(mdev, force_remove);
-	if (ret) {
-		mdev->active = true;
-		return ret;
-	}
-
-	mdev_remove_sysfs_files(dev, type);
-	device_unregister(dev);
+	/* Balances with device_initialize() */
+	put_device(&mdev->dev);
 	mdev_put_parent(parent);
 
 	return 0;
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index ddcf9c72bd8a..067dc5d8c5de 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -59,6 +59,6 @@ void mdev_remove_sysfs_files(struct device *dev, struct mdev_type *type);
 
 int  mdev_device_create(struct kobject *kobj,
 			struct device *dev, const guid_t *uuid);
-int  mdev_device_remove(struct device *dev, bool force_remove);
+int  mdev_device_remove(struct device *dev);
 
 #endif /* MDEV_PRIVATE_H */
diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index cbf94b8165ea..9f774b91d275 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -236,7 +236,7 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
 	if (val && device_remove_file_self(dev, attr)) {
 		int ret;
 
-		ret = mdev_device_remove(dev, false);
+		ret = mdev_device_remove(dev);
 		if (ret) {
 			device_create_file(dev, attr);
 			return ret;
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove file on stale device removal
  2019-04-30 22:49 [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module Parav Pandit
                   ` (7 preceding siblings ...)
  2019-04-30 22:49 ` [PATCHv2 08/10] vfio/mdev: Improve the create/remove sequence Parav Pandit
@ 2019-04-30 22:49 ` Parav Pandit
  2019-05-08 17:16   ` Cornelia Huck
  2019-04-30 22:49 ` [PATCHv2 10/10] vfio/mdev: Synchronize device create/remove with parent removal Parav Pandit
  2019-05-06 22:03 ` [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module Alex Williamson
  10 siblings, 1 reply; 31+ messages in thread
From: Parav Pandit @ 2019-04-30 22:49 UTC (permalink / raw)
  To: kvm, linux-kernel, kwankhede, alex.williamson; +Cc: cjia, parav

If device is removal is initiated by two threads as below, mdev core
attempts to create a syfs remove file on stale device.
During this flow, below [1] call trace is observed.

     cpu-0                                    cpu-1
     -----                                    -----
  mdev_unregister_device()
    device_for_each_child
       mdev_device_remove_cb
          mdev_device_remove
                                       user_syscall
                                         remove_store()
                                           mdev_device_remove()
                                        [..]
   unregister device();
                                       /* not found in list or
                                        * active=false.
                                        */
                                          sysfs_create_file()
                                          ..Call trace

Now that mdev core follows correct device removal system of the linux
bus model, remove shouldn't fail in normal cases. If it fails, there is
no point of creating a stale file or checking for specific error status.

kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
sysfs_create_file_ns+0x7f/0x90
kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
5.1.0-rc6-vdevbus+ #6
kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b
08/09/2016
kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
kernel: Call Trace:
kernel: remove_store+0xdc/0x100 [mdev]
kernel: kernfs_fop_write+0x113/0x1a0
kernel: vfs_write+0xad/0x1b0
kernel: ksys_write+0x5a/0xe0
kernel: do_syscall_64+0x5a/0x210
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 drivers/vfio/mdev/mdev_sysfs.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index 9f774b91d275..ffa3dcebf201 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -237,10 +237,8 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
 		int ret;
 
 		ret = mdev_device_remove(dev);
-		if (ret) {
-			device_create_file(dev, attr);
+		if (ret)
 			return ret;
-		}
 	}
 
 	return count;
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCHv2 10/10] vfio/mdev: Synchronize device create/remove with parent removal
  2019-04-30 22:49 [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module Parav Pandit
                   ` (8 preceding siblings ...)
  2019-04-30 22:49 ` [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove file on stale device removal Parav Pandit
@ 2019-04-30 22:49 ` Parav Pandit
  2019-05-09  2:46   ` Alex Williamson
  2019-05-06 22:03 ` [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module Alex Williamson
  10 siblings, 1 reply; 31+ messages in thread
From: Parav Pandit @ 2019-04-30 22:49 UTC (permalink / raw)
  To: kvm, linux-kernel, kwankhede, alex.williamson; +Cc: cjia, parav

In following sequences, child devices created while removing mdev parent
device can be left out, or it may lead to race of removing half
initialized child mdev devices.

issue-1:
--------
       cpu-0                         cpu-1
       -----                         -----
                                  mdev_unregister_device()
                                    device_for_each_child()
                                      mdev_device_remove_cb()
                                        mdev_device_remove()
create_store()
  mdev_device_create()                   [...]
    device_add()
                                  parent_remove_sysfs_files()

/* BUG: device added by cpu-0
 * whose parent is getting removed
 * and it won't process this mdev.
 */

issue-2:
--------
Below crash is observed when user initiated remove is in progress
and mdev_unregister_driver() completes parent unregistration.

       cpu-0                         cpu-1
       -----                         -----
remove_store()
   mdev_device_remove()
   active = false;
                                  mdev_unregister_device()
                                  parent device removed.
   [...]
   parents->ops->remove()
 /*
  * BUG: Accessing invalid parent.
  */

This is similar race like create() racing with mdev_unregister_device().

BUG: unable to handle kernel paging request at ffffffffc0585668
PGD e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
Oops: 0000 [#1] SMP PTI
CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev]
Call Trace:
 remove_store+0x71/0x90 [mdev]
 kernfs_fop_write+0x113/0x1a0
 vfs_write+0xad/0x1b0
 ksys_write+0x5a/0xe0
 do_syscall_64+0x5a/0x210
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Therefore, mdev core is improved as below to overcome above issues.

Wait for any ongoing mdev create() and remove() to finish before
unregistering parent device using refcount and completion.
This continues to allow multiple create and remove to progress in
parallel for different mdev devices as most common case.
At the same time guard parent removal while parent is being access by
create() and remove callbacks.

Code is simplified from kref to use refcount as unregister_device() has
to wait anyway for all create/remove to finish.

While removing mdev devices during parent unregistration, there isn't
need to acquire refcount of parent device, hence code is restructured
using mdev_device_remove_common() to avoid it.

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 drivers/vfio/mdev/mdev_core.c    | 86 ++++++++++++++++++++------------
 drivers/vfio/mdev/mdev_private.h |  6 ++-
 2 files changed, 60 insertions(+), 32 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 2b98da2ee361..a5da24d662f4 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -78,34 +78,41 @@ static struct mdev_parent *__find_parent_device(struct device *dev)
 	return NULL;
 }
 
-static void mdev_release_parent(struct kref *kref)
+static bool mdev_try_get_parent(struct mdev_parent *parent)
 {
-	struct mdev_parent *parent = container_of(kref, struct mdev_parent,
-						  ref);
-	struct device *dev = parent->dev;
-
-	kfree(parent);
-	put_device(dev);
+	if (parent)
+		return refcount_inc_not_zero(&parent->refcount);
+	return false;
 }
 
-static struct mdev_parent *mdev_get_parent(struct mdev_parent *parent)
+static void mdev_put_parent(struct mdev_parent *parent)
 {
-	if (parent)
-		kref_get(&parent->ref);
-
-	return parent;
+	if (parent && refcount_dec_and_test(&parent->refcount))
+		complete(&parent->unreg_completion);
 }
 
-static void mdev_put_parent(struct mdev_parent *parent)
+static void mdev_device_remove_common(struct mdev_device *mdev)
 {
-	if (parent)
-		kref_put(&parent->ref, mdev_release_parent);
+	struct mdev_parent *parent;
+	struct mdev_type *type;
+	int ret;
+
+	type = to_mdev_type(mdev->type_kobj);
+	mdev_remove_sysfs_files(&mdev->dev, type);
+	device_del(&mdev->dev);
+	parent = mdev->parent;
+	ret = parent->ops->remove(mdev);
+	if (ret)
+		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
+
+	/* Balances with device_initialize() */
+	put_device(&mdev->dev);
 }
 
 static int mdev_device_remove_cb(struct device *dev, void *data)
 {
 	if (dev_is_mdev(dev))
-		mdev_device_remove(dev);
+		mdev_device_remove_common(to_mdev_device(dev));
 
 	return 0;
 }
@@ -147,7 +154,8 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
 		goto add_dev_err;
 	}
 
-	kref_init(&parent->ref);
+	refcount_set(&parent->refcount, 1);
+	init_completion(&parent->unreg_completion);
 
 	parent->dev = dev;
 	parent->ops = ops;
@@ -206,14 +214,27 @@ void mdev_unregister_device(struct device *dev)
 	dev_info(dev, "MDEV: Unregistering\n");
 
 	list_del(&parent->next);
+	mutex_unlock(&parent_list_lock);
+
+	/* Release the initial reference so that new create cannot start */
+	mdev_put_parent(parent);
+
+	/*
+	 * Wait for all the create and remove references to drop.
+	 */
+	wait_for_completion(&parent->unreg_completion);
+
+	/*
+	 * New references cannot be taken and all users are done
+	 * using the parent. So it is safe to unregister parent.
+	 */
 	class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
 
 	device_for_each_child(dev, NULL, mdev_device_remove_cb);
 
 	parent_remove_sysfs_files(parent);
-
-	mutex_unlock(&parent_list_lock);
-	mdev_put_parent(parent);
+	kfree(parent);
+	put_device(dev);
 }
 EXPORT_SYMBOL(mdev_unregister_device);
 
@@ -237,10 +258,11 @@ int mdev_device_create(struct kobject *kobj,
 	struct mdev_parent *parent;
 	struct mdev_type *type = to_mdev_type(kobj);
 
-	parent = mdev_get_parent(type->parent);
-	if (!parent)
+	if (!mdev_try_get_parent(type->parent))
 		return -EINVAL;
 
+	parent = type->parent;
+
 	mutex_lock(&mdev_list_lock);
 
 	/* Check for duplicate */
@@ -287,6 +309,7 @@ int mdev_device_create(struct kobject *kobj,
 
 	mdev->active = true;
 	dev_dbg(&mdev->dev, "MDEV: created\n");
+	mdev_put_parent(parent);
 
 	return 0;
 
@@ -306,7 +329,6 @@ int mdev_device_remove(struct device *dev)
 	struct mdev_device *mdev, *tmp;
 	struct mdev_parent *parent;
 	struct mdev_type *type;
-	int ret;
 
 	mdev = to_mdev_device(dev);
 
@@ -330,15 +352,17 @@ int mdev_device_remove(struct device *dev)
 	mutex_unlock(&mdev_list_lock);
 
 	type = to_mdev_type(mdev->type_kobj);
-	mdev_remove_sysfs_files(dev, type);
-	device_del(&mdev->dev);
-	parent = mdev->parent;
-	ret = parent->ops->remove(mdev);
-	if (ret)
-		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
+	if (!mdev_try_get_parent(type->parent)) {
+		/*
+		 * Parent unregistration have started.
+		 * No need to remove here.
+		 */
+		mutex_unlock(&mdev_list_lock);
+		return -ENODEV;
+	}
 
-	/* Balances with device_initialize() */
-	put_device(&mdev->dev);
+	parent = mdev->parent;
+	mdev_device_remove_common(mdev);
 	mdev_put_parent(parent);
 
 	return 0;
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index 067dc5d8c5de..781f111d66d2 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -19,7 +19,11 @@ void mdev_bus_unregister(void);
 struct mdev_parent {
 	struct device *dev;
 	const struct mdev_parent_ops *ops;
-	struct kref ref;
+	/* Protects unregistration to wait until create/remove
+	 * are completed.
+	 */
+	refcount_t refcount;
+	struct completion unreg_completion;
 	struct list_head next;
 	struct kset *mdev_types_kset;
 	struct list_head type_list;
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module
  2019-04-30 22:49 [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module Parav Pandit
                   ` (9 preceding siblings ...)
  2019-04-30 22:49 ` [PATCHv2 10/10] vfio/mdev: Synchronize device create/remove with parent removal Parav Pandit
@ 2019-05-06 22:03 ` Alex Williamson
  2019-05-06 23:22   ` Parav Pandit
  10 siblings, 1 reply; 31+ messages in thread
From: Alex Williamson @ 2019-05-06 22:03 UTC (permalink / raw)
  To: Parav Pandit; +Cc: kvm, linux-kernel, kwankhede, cjia

On Tue, 30 Apr 2019 17:49:27 -0500
Parav Pandit <parav@mellanox.com> wrote:

> As we would like to use mdev subsystem for wider use case as
> discussed in [1], [2] apart from an offline discussion.
> This use case is also discussed with wider forum in [4] in track
> 'Lightweight NIC HW functions for container offload use cases'.
> 
> This series is prep-work and improves vfio/mdev module in following ways.
> 
> Patch-1 Fixes releasing parent dev reference during error unwinding
>         mdev parent registration.
> Patch-2 Simplifies mdev device for unused kref.
> Patch-3 Drops redundant extern prefix of exported symbols.
> Patch-4 Returns right error code from vendor driver.
> Patch-5 Fixes to use right sysfs remove sequence.
> Patch-6 Fixes removing all child devices if one of them fails.
> Patch-7 Remove unnecessary inline
> Patch-8 Improve the mdev create/remove sequence to match Linux
>         bus, device model
> Patch-9 Avoid recreating remove file on stale device to
>         eliminate call trace
> Patch-10 Fix race conditions of create/remove with parent removal
> This is improved version than using srcu as srcu can take
> seconds to minutes.
> 
> This series is tested using
> (a) mtty with VM using vfio_mdev driver for positive tests and
> device removal while device in use by VM using vfio_mdev driver
> 
> (b) mlx5 core driver using RFC patches [3] and internal patches.
> Internal patches are large and cannot be combined with this
> prep-work patches. It will posted once prep-work completes.
> 
> [1] https://www.spinics.net/lists/netdev/msg556978.html
> [2] https://lkml.org/lkml/2019/3/7/696
> [3] https://lkml.org/lkml/2019/3/8/819
> [4] https://netdevconf.org/0x13/session.html?workshop-hardware-offload
> 
> ---
> Changelog:
> ---
> v1->v2:
>  - Addressed comments from Alex
>  - Rebased
>  - Inserted the device checking loop in Patch-6 as original code
>  - Added patch 7 to 10
>  - Added fixes for race condition in create/remove with parent removal
>    Patch-10 uses simplified refcount and completion, instead of srcu
>    which might take seconds to minutes on busy system.
>  - Added fix for device create/remove sequence to match
>    Linux device, bus model
> v0->v1:
>  - Dropped device placement on bus sequence patch for this series
>  - Addressed below comments from Alex, Kirti, Maxim.
>  - Added Review-by tag for already reviewed patches.
>  - Dropped incorrect patch of put_device().
>  - Corrected Fixes commit tag for sysfs remove sequence fix
>  - Split last 8th patch to smaller refactor and fixes patch
>  - Following coding style commenting format
>  - Fixed accidental delete of mutex_lock in mdev_unregister_device
>  - Renamed remove helped to mdev_device_remove_common().
>  - Rebased for uuid/guid change
> 
> Parav Pandit (10):
>   vfio/mdev: Avoid release parent reference during error path
>   vfio/mdev: Removed unused kref
>   vfio/mdev: Drop redundant extern for exported symbols
>   vfio/mdev: Avoid masking error code to EBUSY
>   vfio/mdev: Follow correct remove sequence
>   vfio/mdev: Fix aborting mdev child device removal if one fails
>   vfio/mdev: Avoid inline get and put parent helpers
>   vfio/mdev: Improve the create/remove sequence
>   vfio/mdev: Avoid creating sysfs remove file on stale device removal
>   vfio/mdev: Synchronize device create/remove with parent removal
> 
>  drivers/vfio/mdev/mdev_core.c    | 162 +++++++++++++------------------
>  drivers/vfio/mdev/mdev_private.h |   9 +-
>  drivers/vfio/mdev/mdev_sysfs.c   |   8 +-
>  include/linux/mdev.h             |  21 ++--
>  4 files changed, 89 insertions(+), 111 deletions(-)
> 

Hi Parav,

I applied 1-7 to the vfio next branch for v5.2 since these are mostly
previously reviewed or trivial.  I'm not ruling out the rest for v5.2
as bug fixes yet, but they require a bit more to digest and hopefully
we'll get some feedback from others as well.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module
  2019-05-06 22:03 ` [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module Alex Williamson
@ 2019-05-06 23:22   ` Parav Pandit
  0 siblings, 0 replies; 31+ messages in thread
From: Parav Pandit @ 2019-05-06 23:22 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, linux-kernel, kwankhede, cjia



> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Monday, May 6, 2019 5:03 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankhede@nvidia.com; cjia@nvidia.com
> Subject: Re: [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module
> 
> On Tue, 30 Apr 2019 17:49:27 -0500
> Parav Pandit <parav@mellanox.com> wrote:
> 
> > As we would like to use mdev subsystem for wider use case as discussed
> > in [1], [2] apart from an offline discussion.
> > This use case is also discussed with wider forum in [4] in track
> > 'Lightweight NIC HW functions for container offload use cases'.
> >
> > This series is prep-work and improves vfio/mdev module in following ways.
> >
> > Patch-1 Fixes releasing parent dev reference during error unwinding
> >         mdev parent registration.
> > Patch-2 Simplifies mdev device for unused kref.
> > Patch-3 Drops redundant extern prefix of exported symbols.
> > Patch-4 Returns right error code from vendor driver.
> > Patch-5 Fixes to use right sysfs remove sequence.
> > Patch-6 Fixes removing all child devices if one of them fails.
> > Patch-7 Remove unnecessary inline
> > Patch-8 Improve the mdev create/remove sequence to match Linux
> >         bus, device model
> > Patch-9 Avoid recreating remove file on stale device to
> >         eliminate call trace
> > Patch-10 Fix race conditions of create/remove with parent removal This
> > is improved version than using srcu as srcu can take seconds to
> > minutes.
> >
> > This series is tested using
> > (a) mtty with VM using vfio_mdev driver for positive tests and device
> > removal while device in use by VM using vfio_mdev driver
> >
> > (b) mlx5 core driver using RFC patches [3] and internal patches.
> > Internal patches are large and cannot be combined with this prep-work
> > patches. It will posted once prep-work completes.
> >
> > [1] https://www.spinics.net/lists/netdev/msg556978.html
> > [2] https://lkml.org/lkml/2019/3/7/696
> > [3] https://lkml.org/lkml/2019/3/8/819
> > [4] https://netdevconf.org/0x13/session.html?workshop-hardware-offload
> >
> > ---
> > Changelog:
> > ---
> > v1->v2:
> >  - Addressed comments from Alex
> >  - Rebased
> >  - Inserted the device checking loop in Patch-6 as original code
> >  - Added patch 7 to 10
> >  - Added fixes for race condition in create/remove with parent removal
> >    Patch-10 uses simplified refcount and completion, instead of srcu
> >    which might take seconds to minutes on busy system.
> >  - Added fix for device create/remove sequence to match
> >    Linux device, bus model
> > v0->v1:
> >  - Dropped device placement on bus sequence patch for this series
> >  - Addressed below comments from Alex, Kirti, Maxim.
> >  - Added Review-by tag for already reviewed patches.
> >  - Dropped incorrect patch of put_device().
> >  - Corrected Fixes commit tag for sysfs remove sequence fix
> >  - Split last 8th patch to smaller refactor and fixes patch
> >  - Following coding style commenting format
> >  - Fixed accidental delete of mutex_lock in mdev_unregister_device
> >  - Renamed remove helped to mdev_device_remove_common().
> >  - Rebased for uuid/guid change
> >
> > Parav Pandit (10):
> >   vfio/mdev: Avoid release parent reference during error path
> >   vfio/mdev: Removed unused kref
> >   vfio/mdev: Drop redundant extern for exported symbols
> >   vfio/mdev: Avoid masking error code to EBUSY
> >   vfio/mdev: Follow correct remove sequence
> >   vfio/mdev: Fix aborting mdev child device removal if one fails
> >   vfio/mdev: Avoid inline get and put parent helpers
> >   vfio/mdev: Improve the create/remove sequence
> >   vfio/mdev: Avoid creating sysfs remove file on stale device removal
> >   vfio/mdev: Synchronize device create/remove with parent removal
> >
> >  drivers/vfio/mdev/mdev_core.c    | 162 +++++++++++++------------------
> >  drivers/vfio/mdev/mdev_private.h |   9 +-
> >  drivers/vfio/mdev/mdev_sysfs.c   |   8 +-
> >  include/linux/mdev.h             |  21 ++--
> >  4 files changed, 89 insertions(+), 111 deletions(-)
> >
> 
> Hi Parav,
> 
> I applied 1-7 to the vfio next branch for v5.2 since these are mostly
> previously reviewed or trivial.  I'm not ruling out the rest for v5.2 as bug fixes
> yet, but they require a bit more to digest and hopefully we'll get some
> feedback from others as well.  Thanks,
> 
Ok. Great.
Yes, these are important for us to make use of mdev.
We should address them in 5.2 window.
I will look for any comments this week and address them as required.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove sequence
  2019-04-30 22:49 ` [PATCHv2 08/10] vfio/mdev: Improve the create/remove sequence Parav Pandit
@ 2019-05-08 17:09   ` Cornelia Huck
  2019-05-08 22:06     ` Parav Pandit
  0 siblings, 1 reply; 31+ messages in thread
From: Cornelia Huck @ 2019-05-08 17:09 UTC (permalink / raw)
  To: Parav Pandit; +Cc: kvm, linux-kernel, kwankhede, alex.williamson, cjia

On Tue, 30 Apr 2019 17:49:35 -0500
Parav Pandit <parav@mellanox.com> wrote:

> This patch addresses below two issues and prepares the code to address
> 3rd issue listed below.
> 
> 1. mdev device is placed on the mdev bus before it is created in the
> vendor driver. Once a device is placed on the mdev bus without creating
> its supporting underlying vendor device, mdev driver's probe() gets triggered.
> However there isn't a stable mdev available to work on.
> 
>    create_store()
>      mdev_create_device()
>        device_register()
>           ...
>          vfio_mdev_probe()
>         [...]
>         parent->ops->create()
>           vfio_ap_mdev_create()
>             mdev_set_drvdata(mdev, matrix_mdev);
>             /* Valid pointer set above */
> 
> Due to this way of initialization, mdev driver who want to use the mdev,
> doesn't have a valid mdev to work on.
> 
> 2. Current creation sequence is,
>    parent->ops_create()
>    groups_register()
> 
> Remove sequence is,
>    parent->ops->remove()
>    groups_unregister()
> 
> However, remove sequence should be exact mirror of creation sequence.
> Once this is achieved, all users of the mdev will be terminated first
> before removing underlying vendor device.
> (Follow standard linux driver model).
> At that point vendor's remove() ops shouldn't failed because device is
> taken off the bus that should terminate the users.
> 
> 3. When remove operation fails, mdev sysfs removal attempts to add the
> file back on already removed device. Following call trace [1] is observed.
> 
> [1] call trace:
> kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327 sysfs_create_file_ns+0x7f/0x90
> kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
> kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
> kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
> kernel: Call Trace:
> kernel: remove_store+0xdc/0x100 [mdev]
> kernel: kernfs_fop_write+0x113/0x1a0
> kernel: vfs_write+0xad/0x1b0
> kernel: ksys_write+0x5a/0xe0
> kernel: do_syscall_64+0x5a/0x210
> kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Therefore, mdev core is improved in following ways.
> 
> 1. Before placing mdev devices on the bus, perform vendor drivers
> creation which supports the mdev creation.
> This ensures that mdev specific all necessary fields are initialized
> before a given mdev can be accessed by bus driver.
> This follows standard Linux kernel bus and device model similar to other
> widely used PCI bus.
> 
> 2. During remove flow, first remove the device from the bus. This
> ensures that any bus specific devices and data is cleared.
> Once device is taken of the mdev bus, perform remove() of mdev from the
> vendor driver.
> 
> 3. Linux core device model provides way to register and auto unregister
> the device sysfs attribute groups at dev->groups.
> Make use of this groups to let core create the groups and simplify code
> to avoid explicit groups creation and removal.
> 
> A below stack dump of a mdev device remove process also ensures that
> vfio driver guards against device removal already in use.
> 
>  cat /proc/21962/stack
> [<0>] vfio_del_group_dev+0x216/0x3c0 [vfio]
> [<0>] mdev_remove+0x21/0x40 [mdev]
> [<0>] device_release_driver_internal+0xe8/0x1b0
> [<0>] bus_remove_device+0xf9/0x170
> [<0>] device_del+0x168/0x350
> [<0>] mdev_device_remove_common+0x1d/0x50 [mdev]
> [<0>] mdev_device_remove+0x8c/0xd0 [mdev]
> [<0>] remove_store+0x71/0x90 [mdev]
> [<0>] kernfs_fop_write+0x113/0x1a0
> [<0>] vfs_write+0xad/0x1b0
> [<0>] ksys_write+0x5a/0xe0
> [<0>] do_syscall_64+0x5a/0x210
> [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [<0>] 0xffffffffffffffff
> 
> This prepares the code to eliminate calling device_create_file() in
> subsquent patch.

I'm afraid I have a bit of a problem following this explanation, so let
me try to summarize what the patch does to make sure that I understand
it correctly:

- Add the sysfs groups to device->groups so that the driver core deals
  with proper registration/deregistration.
- Split the device registration/deregistration sequence so that some
  things can be done between initialization of the device and hooking
  it up to the infrastructure respectively after deregistering it from
  the infrastructure but before giving up our final reference. In
  particular, this means invoking the ->create and ->remove callback in
  those new windows. This gives the vendor driver an initialized mdev
  device to work with during creation.
- Don't allow ->remove to fail, as the device is already removed from
  the infrastructure at that point in time.

> 
> Signed-off-by: Parav Pandit <parav@mellanox.com>
> ---
>  drivers/vfio/mdev/mdev_core.c    | 94 +++++++++-----------------------
>  drivers/vfio/mdev/mdev_private.h |  2 +-
>  drivers/vfio/mdev/mdev_sysfs.c   |  2 +-
>  3 files changed, 27 insertions(+), 71 deletions(-)

(...)

> @@ -310,41 +265,43 @@ int mdev_device_create(struct kobject *kobj,
>  
>  	mdev->parent = parent;
>  
> +	device_initialize(&mdev->dev);
>  	mdev->dev.parent  = dev;
>  	mdev->dev.bus     = &mdev_bus_type;
>  	mdev->dev.release = mdev_device_release;
>  	dev_set_name(&mdev->dev, "%pUl", uuid);
> +	mdev->dev.groups = parent->ops->mdev_attr_groups;

I like that, that makes things much easier.

> +	mdev->type_kobj = kobj;
>  
> -	ret = device_register(&mdev->dev);
> -	if (ret) {
> -		put_device(&mdev->dev);
> -		goto mdev_fail;
> -	}
> +	ret = parent->ops->create(kobj, mdev);
> +	if (ret)
> +		goto ops_create_fail;
>  
> -	ret = mdev_device_create_ops(kobj, mdev);
> +	ret = device_add(&mdev->dev);
>  	if (ret)
> -		goto create_fail;
> +		goto add_fail;
>  
>  	ret = mdev_create_sysfs_files(&mdev->dev, type);
> -	if (ret) {
> -		mdev_device_remove_ops(mdev, true);
> -		goto create_fail;
> -	}
> +	if (ret)
> +		goto sysfs_fail;
>  
> -	mdev->type_kobj = kobj;
>  	mdev->active = true;
>  	dev_dbg(&mdev->dev, "MDEV: created\n");
>  
>  	return 0;
>  
> -create_fail:
> -	device_unregister(&mdev->dev);
> +sysfs_fail:
> +	device_del(&mdev->dev);
> +add_fail:
> +	parent->ops->remove(mdev);
> +ops_create_fail:
> +	put_device(&mdev->dev);
>  mdev_fail:
>  	mdev_put_parent(parent);
>  	return ret;
>  }
>  
> -int mdev_device_remove(struct device *dev, bool force_remove)
> +int mdev_device_remove(struct device *dev)
>  {
>  	struct mdev_device *mdev, *tmp;
>  	struct mdev_parent *parent;
> @@ -373,16 +330,15 @@ int mdev_device_remove(struct device *dev, bool force_remove)
>  	mutex_unlock(&mdev_list_lock);
>  
>  	type = to_mdev_type(mdev->type_kobj);
> +	mdev_remove_sysfs_files(dev, type);
> +	device_del(&mdev->dev);
>  	parent = mdev->parent;
> +	ret = parent->ops->remove(mdev);
> +	if (ret)
> +		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);

I think carrying on with removal regardless of the return code of the
->remove callback makes sense, as it simply matches usual practice.
However, are we sure that every vendor driver works well with that? I
think it should, as removal from bus unregistration (vs. from the sysfs
file) was always something it could not veto, but have you looked at
the individual drivers?

>  
> -	ret = mdev_device_remove_ops(mdev, force_remove);
> -	if (ret) {
> -		mdev->active = true;
> -		return ret;
> -	}
> -
> -	mdev_remove_sysfs_files(dev, type);
> -	device_unregister(dev);
> +	/* Balances with device_initialize() */
> +	put_device(&mdev->dev);
>  	mdev_put_parent(parent);
>  
>  	return 0;

I think that looks sane in general, but the commit message might
benefit from tweaking.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove file on stale device removal
  2019-04-30 22:49 ` [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove file on stale device removal Parav Pandit
@ 2019-05-08 17:16   ` Cornelia Huck
  2019-05-08 22:13     ` Parav Pandit
  0 siblings, 1 reply; 31+ messages in thread
From: Cornelia Huck @ 2019-05-08 17:16 UTC (permalink / raw)
  To: Parav Pandit; +Cc: kvm, linux-kernel, kwankhede, alex.williamson, cjia

On Tue, 30 Apr 2019 17:49:36 -0500
Parav Pandit <parav@mellanox.com> wrote:

> If device is removal is initiated by two threads as below, mdev core
> attempts to create a syfs remove file on stale device.
> During this flow, below [1] call trace is observed.
> 
>      cpu-0                                    cpu-1
>      -----                                    -----
>   mdev_unregister_device()
>     device_for_each_child
>        mdev_device_remove_cb
>           mdev_device_remove
>                                        user_syscall
>                                          remove_store()
>                                            mdev_device_remove()
>                                         [..]
>    unregister device();
>                                        /* not found in list or
>                                         * active=false.
>                                         */
>                                           sysfs_create_file()
>                                           ..Call trace
> 
> Now that mdev core follows correct device removal system of the linux
> bus model, remove shouldn't fail in normal cases. If it fails, there is
> no point of creating a stale file or checking for specific error status.

Which error cases are left? Is there anything that does not indicate
that something got terribly messed up internally?

> 
> kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
> sysfs_create_file_ns+0x7f/0x90
> kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
> 5.1.0-rc6-vdevbus+ #6
> kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b
> 08/09/2016
> kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
> kernel: Call Trace:
> kernel: remove_store+0xdc/0x100 [mdev]
> kernel: kernfs_fop_write+0x113/0x1a0
> kernel: vfs_write+0xad/0x1b0
> kernel: ksys_write+0x5a/0xe0
> kernel: do_syscall_64+0x5a/0x210
> kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Signed-off-by: Parav Pandit <parav@mellanox.com>
> ---
>  drivers/vfio/mdev/mdev_sysfs.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
> index 9f774b91d275..ffa3dcebf201 100644
> --- a/drivers/vfio/mdev/mdev_sysfs.c
> +++ b/drivers/vfio/mdev/mdev_sysfs.c
> @@ -237,10 +237,8 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
>  		int ret;
>  
>  		ret = mdev_device_remove(dev);
> -		if (ret) {
> -			device_create_file(dev, attr);
> +		if (ret)

Should you merge this into the previous patch?

>  			return ret;
> -		}
>  	}
>  
>  	return count;


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCHv2 08/10] vfio/mdev: Improve the create/remove sequence
  2019-05-08 17:09   ` Cornelia Huck
@ 2019-05-08 22:06     ` Parav Pandit
  2019-05-09  9:06       ` Cornelia Huck
  0 siblings, 1 reply; 31+ messages in thread
From: Parav Pandit @ 2019-05-08 22:06 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: kvm, linux-kernel, kwankhede, alex.williamson, cjia



> -----Original Message-----
> From: Cornelia Huck <cohuck@redhat.com>
> Sent: Wednesday, May 8, 2019 12:10 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankhede@nvidia.com; alex.williamson@redhat.com; cjia@nvidia.com
> Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> sequence
> 
> On Tue, 30 Apr 2019 17:49:35 -0500
> Parav Pandit <parav@mellanox.com> wrote:
> 
> > This patch addresses below two issues and prepares the code to address
> > 3rd issue listed below.
> >
> > 1. mdev device is placed on the mdev bus before it is created in the
> > vendor driver. Once a device is placed on the mdev bus without
> > creating its supporting underlying vendor device, mdev driver's probe()
> gets triggered.
> > However there isn't a stable mdev available to work on.
> >
> >    create_store()
> >      mdev_create_device()
> >        device_register()
> >           ...
> >          vfio_mdev_probe()
> >         [...]
> >         parent->ops->create()
> >           vfio_ap_mdev_create()
> >             mdev_set_drvdata(mdev, matrix_mdev);
> >             /* Valid pointer set above */
> >
> > Due to this way of initialization, mdev driver who want to use the
> > mdev, doesn't have a valid mdev to work on.
> >
> > 2. Current creation sequence is,
> >    parent->ops_create()
> >    groups_register()
> >
> > Remove sequence is,
> >    parent->ops->remove()
> >    groups_unregister()
> >
> > However, remove sequence should be exact mirror of creation sequence.
> > Once this is achieved, all users of the mdev will be terminated first
> > before removing underlying vendor device.
> > (Follow standard linux driver model).
> > At that point vendor's remove() ops shouldn't failed because device is
> > taken off the bus that should terminate the users.
> >
> > 3. When remove operation fails, mdev sysfs removal attempts to add the
> > file back on already removed device. Following call trace [1] is observed.
> >
> > [1] call trace:
> > kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
> > sysfs_create_file_ns+0x7f/0x90
> > kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
> > 5.1.0-rc6-vdevbus+ #6
> > kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b
> > 08/09/2016
> > kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
> > kernel: Call Trace:
> > kernel: remove_store+0xdc/0x100 [mdev]
> > kernel: kernfs_fop_write+0x113/0x1a0
> > kernel: vfs_write+0xad/0x1b0
> > kernel: ksys_write+0x5a/0xe0
> > kernel: do_syscall_64+0x5a/0x210
> > kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > Therefore, mdev core is improved in following ways.
> >
> > 1. Before placing mdev devices on the bus, perform vendor drivers
> > creation which supports the mdev creation.
> > This ensures that mdev specific all necessary fields are initialized
> > before a given mdev can be accessed by bus driver.
> > This follows standard Linux kernel bus and device model similar to
> > other widely used PCI bus.
> >
> > 2. During remove flow, first remove the device from the bus. This
> > ensures that any bus specific devices and data is cleared.
> > Once device is taken of the mdev bus, perform remove() of mdev from
> > the vendor driver.
> >
> > 3. Linux core device model provides way to register and auto
> > unregister the device sysfs attribute groups at dev->groups.
> > Make use of this groups to let core create the groups and simplify
> > code to avoid explicit groups creation and removal.
> >
> > A below stack dump of a mdev device remove process also ensures that
> > vfio driver guards against device removal already in use.
> >
> >  cat /proc/21962/stack
> > [<0>] vfio_del_group_dev+0x216/0x3c0 [vfio] [<0>]
> > mdev_remove+0x21/0x40 [mdev] [<0>]
> > device_release_driver_internal+0xe8/0x1b0
> > [<0>] bus_remove_device+0xf9/0x170
> > [<0>] device_del+0x168/0x350
> > [<0>] mdev_device_remove_common+0x1d/0x50 [mdev] [<0>]
> > mdev_device_remove+0x8c/0xd0 [mdev] [<0>] remove_store+0x71/0x90
> > [mdev] [<0>] kernfs_fop_write+0x113/0x1a0 [<0>] vfs_write+0xad/0x1b0
> > [<0>] ksys_write+0x5a/0xe0 [<0>] do_syscall_64+0x5a/0x210 [<0>]
> > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > [<0>] 0xffffffffffffffff
> >
> > This prepares the code to eliminate calling device_create_file() in
> > subsquent patch.
> 
> I'm afraid I have a bit of a problem following this explanation, so let me try
> to summarize what the patch does to make sure that I understand it
> correctly:
> 
> - Add the sysfs groups to device->groups so that the driver core deals
>   with proper registration/deregistration.
> - Split the device registration/deregistration sequence so that some
>   things can be done between initialization of the device and hooking
>   it up to the infrastructure respectively after deregistering it from
>   the infrastructure but before giving up our final reference. In
>   particular, this means invoking the ->create and ->remove callback in
>   those new windows. This gives the vendor driver an initialized mdev
>   device to work with during creation.
> - Don't allow ->remove to fail, as the device is already removed from
>   the infrastructure at that point in time.
> 
You got all the points pretty accurate.

> >
> > Signed-off-by: Parav Pandit <parav@mellanox.com>
> > ---
> >  drivers/vfio/mdev/mdev_core.c    | 94 +++++++++-----------------------
> >  drivers/vfio/mdev/mdev_private.h |  2 +-
> >  drivers/vfio/mdev/mdev_sysfs.c   |  2 +-
> >  3 files changed, 27 insertions(+), 71 deletions(-)
> 
> (...)
> 
> > @@ -310,41 +265,43 @@ int mdev_device_create(struct kobject *kobj,
> >
> >  	mdev->parent = parent;
> >
> > +	device_initialize(&mdev->dev);
> >  	mdev->dev.parent  = dev;
> >  	mdev->dev.bus     = &mdev_bus_type;
> >  	mdev->dev.release = mdev_device_release;
> >  	dev_set_name(&mdev->dev, "%pUl", uuid);
> > +	mdev->dev.groups = parent->ops->mdev_attr_groups;
> 
> I like that, that makes things much easier.
> 
True.

> > +	mdev->type_kobj = kobj;
> >
> > -	ret = device_register(&mdev->dev);
> > -	if (ret) {
> > -		put_device(&mdev->dev);
> > -		goto mdev_fail;
> > -	}
> > +	ret = parent->ops->create(kobj, mdev);
> > +	if (ret)
> > +		goto ops_create_fail;
> >
> > -	ret = mdev_device_create_ops(kobj, mdev);
> > +	ret = device_add(&mdev->dev);
> >  	if (ret)
> > -		goto create_fail;
> > +		goto add_fail;
> >
> >  	ret = mdev_create_sysfs_files(&mdev->dev, type);
> > -	if (ret) {
> > -		mdev_device_remove_ops(mdev, true);
> > -		goto create_fail;
> > -	}
> > +	if (ret)
> > +		goto sysfs_fail;
> >
> > -	mdev->type_kobj = kobj;
> >  	mdev->active = true;
> >  	dev_dbg(&mdev->dev, "MDEV: created\n");
> >
> >  	return 0;
> >
> > -create_fail:
> > -	device_unregister(&mdev->dev);
> > +sysfs_fail:
> > +	device_del(&mdev->dev);
> > +add_fail:
> > +	parent->ops->remove(mdev);
> > +ops_create_fail:
> > +	put_device(&mdev->dev);
> >  mdev_fail:
> >  	mdev_put_parent(parent);
> >  	return ret;
> >  }
> >
> > -int mdev_device_remove(struct device *dev, bool force_remove)
> > +int mdev_device_remove(struct device *dev)
> >  {
> >  	struct mdev_device *mdev, *tmp;
> >  	struct mdev_parent *parent;
> > @@ -373,16 +330,15 @@ int mdev_device_remove(struct device *dev,
> bool force_remove)
> >  	mutex_unlock(&mdev_list_lock);
> >
> >  	type = to_mdev_type(mdev->type_kobj);
> > +	mdev_remove_sysfs_files(dev, type);
> > +	device_del(&mdev->dev);
> >  	parent = mdev->parent;
> > +	ret = parent->ops->remove(mdev);
> > +	if (ret)
> > +		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
> 
> I think carrying on with removal regardless of the return code of the
> ->remove callback makes sense, as it simply matches usual practice.
> However, are we sure that every vendor driver works well with that? I think
> it should, as removal from bus unregistration (vs. from the sysfs
> file) was always something it could not veto, but have you looked at the
> individual drivers?
> 
I looked at following drivers a little while back.
Looked again now.

drivers/gpu/drm/i915/gvt/kvmgt.c which clears the handle valid in intel_vgpu_release(), which should finish first before remove() is invoked.

s390 vfio_ccw_mdev_remove() driver drivers/s390/cio/vfio_ccw_ops.c remove() always returns 0.
s39 crypo fails the remove() once vfio_ap_mdev_release marks kvm null, which should finish before remove() is invoked.
samples/vfio-mdev/mbochs.c mbochs_remove() always returns 0.

> >
> > -	ret = mdev_device_remove_ops(mdev, force_remove);
> > -	if (ret) {
> > -		mdev->active = true;
> > -		return ret;
> > -	}
> > -
> > -	mdev_remove_sysfs_files(dev, type);
> > -	device_unregister(dev);
> > +	/* Balances with device_initialize() */
> > +	put_device(&mdev->dev);
> >  	mdev_put_parent(parent);
> >
> >  	return 0;
> 
> I think that looks sane in general, but the commit message might benefit
> from tweaking.
Part of your description is more crisp than my commit message, I can probably take snippet from it to improve?
Or any specific entries in commit message that I should address?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove file on stale device removal
  2019-05-08 17:16   ` Cornelia Huck
@ 2019-05-08 22:13     ` Parav Pandit
  2019-05-09  9:18       ` Cornelia Huck
  0 siblings, 1 reply; 31+ messages in thread
From: Parav Pandit @ 2019-05-08 22:13 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: kvm, linux-kernel, kwankhede, alex.williamson, cjia



> -----Original Message-----
> From: Cornelia Huck <cohuck@redhat.com>
> Sent: Wednesday, May 8, 2019 12:17 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankhede@nvidia.com; alex.williamson@redhat.com; cjia@nvidia.com
> Subject: Re: [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove file on
> stale device removal
> 
> On Tue, 30 Apr 2019 17:49:36 -0500
> Parav Pandit <parav@mellanox.com> wrote:
> 
> > If device is removal is initiated by two threads as below, mdev core
> > attempts to create a syfs remove file on stale device.
> > During this flow, below [1] call trace is observed.
> >
> >      cpu-0                                    cpu-1
> >      -----                                    -----
> >   mdev_unregister_device()
> >     device_for_each_child
> >        mdev_device_remove_cb
> >           mdev_device_remove
> >                                        user_syscall
> >                                          remove_store()
> >                                            mdev_device_remove()
> >                                         [..]
> >    unregister device();
> >                                        /* not found in list or
> >                                         * active=false.
> >                                         */
> >                                           sysfs_create_file()
> >                                           ..Call trace
> >
> > Now that mdev core follows correct device removal system of the linux
> > bus model, remove shouldn't fail in normal cases. If it fails, there
> > is no point of creating a stale file or checking for specific error status.
> 
> Which error cases are left? Is there anything that does not indicate that
> something got terribly messed up internally?
> 
Few reasons I can think of that can fail remove are:

1. Some device removal requires allocating memory too as it needs to issue commands to device.
If on the path, such allocation fails, remove can fail. However such fail to allocate memory will probably result into more serious warnings before this.
2. if the device firmware has crashed, device removal commands will likely timeout and return such error upto user.
3. If user tries to remove a device, while parent is already in removal path, this call will eventually fail as it won't find the device in the internal list.

> >
> > kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
> > sysfs_create_file_ns+0x7f/0x90
> > kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
> > 5.1.0-rc6-vdevbus+ #6
> > kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b
> > 08/09/2016
> > kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
> > kernel: Call Trace:
> > kernel: remove_store+0xdc/0x100 [mdev]
> > kernel: kernfs_fop_write+0x113/0x1a0
> > kernel: vfs_write+0xad/0x1b0
> > kernel: ksys_write+0x5a/0xe0
> > kernel: do_syscall_64+0x5a/0x210
> > kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > Signed-off-by: Parav Pandit <parav@mellanox.com>
> > ---
> >  drivers/vfio/mdev/mdev_sysfs.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/drivers/vfio/mdev/mdev_sysfs.c
> > b/drivers/vfio/mdev/mdev_sysfs.c index 9f774b91d275..ffa3dcebf201
> > 100644
> > --- a/drivers/vfio/mdev/mdev_sysfs.c
> > +++ b/drivers/vfio/mdev/mdev_sysfs.c
> > @@ -237,10 +237,8 @@ static ssize_t remove_store(struct device *dev,
> struct device_attribute *attr,
> >  		int ret;
> >
> >  		ret = mdev_device_remove(dev);
> > -		if (ret) {
> > -			device_create_file(dev, attr);
> > +		if (ret)
> 
> Should you merge this into the previous patch?
> 
I am not sure. Previous patch changes the sequence. I think that deserved an own patch by itself.
This change is making use of that sequence.
So its easier to review? Alex had comment in v0 to split into more logical patches, so...
Specially to capture a different call trace, I cut into different patch.
Otherwise previous patch's commit message is too long.

> >  			return ret;
> > -		}
> >  	}
> >
> >  	return count;


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHv2 10/10] vfio/mdev: Synchronize device create/remove with parent removal
  2019-04-30 22:49 ` [PATCHv2 10/10] vfio/mdev: Synchronize device create/remove with parent removal Parav Pandit
@ 2019-05-09  2:46   ` Alex Williamson
  2019-05-09  9:49     ` Cornelia Huck
  2019-05-09 16:03     ` Parav Pandit
  0 siblings, 2 replies; 31+ messages in thread
From: Alex Williamson @ 2019-05-09  2:46 UTC (permalink / raw)
  To: Parav Pandit; +Cc: kvm, linux-kernel, kwankhede, cjia

On Tue, 30 Apr 2019 17:49:37 -0500
Parav Pandit <parav@mellanox.com> wrote:

> In following sequences, child devices created while removing mdev parent
> device can be left out, or it may lead to race of removing half
> initialized child mdev devices.
> 
> issue-1:
> --------
>        cpu-0                         cpu-1
>        -----                         -----
>                                   mdev_unregister_device()
>                                     device_for_each_child()
>                                       mdev_device_remove_cb()
>                                         mdev_device_remove()
> create_store()
>   mdev_device_create()                   [...]
>     device_add()
>                                   parent_remove_sysfs_files()
> 
> /* BUG: device added by cpu-0
>  * whose parent is getting removed
>  * and it won't process this mdev.
>  */
> 
> issue-2:
> --------
> Below crash is observed when user initiated remove is in progress
> and mdev_unregister_driver() completes parent unregistration.
> 
>        cpu-0                         cpu-1
>        -----                         -----
> remove_store()
>    mdev_device_remove()
>    active = false;
>                                   mdev_unregister_device()
>                                   parent device removed.
>    [...]
>    parents->ops->remove()
>  /*
>   * BUG: Accessing invalid parent.
>   */
> 
> This is similar race like create() racing with mdev_unregister_device().
> 
> BUG: unable to handle kernel paging request at ffffffffc0585668
> PGD e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
> Oops: 0000 [#1] SMP PTI
> CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
> Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
> RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev]
> Call Trace:
>  remove_store+0x71/0x90 [mdev]
>  kernfs_fop_write+0x113/0x1a0
>  vfs_write+0xad/0x1b0
>  ksys_write+0x5a/0xe0
>  do_syscall_64+0x5a/0x210
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Therefore, mdev core is improved as below to overcome above issues.
> 
> Wait for any ongoing mdev create() and remove() to finish before
> unregistering parent device using refcount and completion.
> This continues to allow multiple create and remove to progress in
> parallel for different mdev devices as most common case.
> At the same time guard parent removal while parent is being access by
> create() and remove callbacks.
> 
> Code is simplified from kref to use refcount as unregister_device() has
> to wait anyway for all create/remove to finish.
> 
> While removing mdev devices during parent unregistration, there isn't
> need to acquire refcount of parent device, hence code is restructured
> using mdev_device_remove_common() to avoid it.

Did you consider calling parent_remove_sysfs_files() earlier in
mdev_unregister_device() and adding srcu support to know there are no
in-flight callers of the create path?  I think that would address
issue-1.

Issue-2 suggests a bug in our handling of the parent device krefs, the
parent object should exist until all child devices which have a kref
reference to the parent are removed, but clearly
mdev_unregister_device() is not blocking for that to occur allowing the
parent driver .remove callback to finish.  This seems similar to
vfio_del_group_dev() where we need to block a vfio bus driver from
removing a device until it becomes unused, could a similar solution
with a wait_queue and wait_woken be used here?

I'm not immediately sold on the idea that removing a kref to solve this
problem is a good thing, it seems odd to me that mdevs don't hold a
reference to the parent throughout their life with this change, and the
remove_store path branch to exit if we find we're racing the parent
remove path is rather ugly.  BTW, why is the sanitization loop in
mdev_device_remove() still here, wasn't that fixed by the previous two
patches?  Thanks,

Alex

> Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
> Signed-off-by: Parav Pandit <parav@mellanox.com>
> ---
>  drivers/vfio/mdev/mdev_core.c    | 86 ++++++++++++++++++++------------
>  drivers/vfio/mdev/mdev_private.h |  6 ++-
>  2 files changed, 60 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index 2b98da2ee361..a5da24d662f4 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -78,34 +78,41 @@ static struct mdev_parent *__find_parent_device(struct device *dev)
>  	return NULL;
>  }
>  
> -static void mdev_release_parent(struct kref *kref)
> +static bool mdev_try_get_parent(struct mdev_parent *parent)
>  {
> -	struct mdev_parent *parent = container_of(kref, struct mdev_parent,
> -						  ref);
> -	struct device *dev = parent->dev;
> -
> -	kfree(parent);
> -	put_device(dev);
> +	if (parent)
> +		return refcount_inc_not_zero(&parent->refcount);
> +	return false;
>  }
>  
> -static struct mdev_parent *mdev_get_parent(struct mdev_parent *parent)
> +static void mdev_put_parent(struct mdev_parent *parent)
>  {
> -	if (parent)
> -		kref_get(&parent->ref);
> -
> -	return parent;
> +	if (parent && refcount_dec_and_test(&parent->refcount))
> +		complete(&parent->unreg_completion);
>  }
>  
> -static void mdev_put_parent(struct mdev_parent *parent)
> +static void mdev_device_remove_common(struct mdev_device *mdev)
>  {
> -	if (parent)
> -		kref_put(&parent->ref, mdev_release_parent);
> +	struct mdev_parent *parent;
> +	struct mdev_type *type;
> +	int ret;
> +
> +	type = to_mdev_type(mdev->type_kobj);
> +	mdev_remove_sysfs_files(&mdev->dev, type);
> +	device_del(&mdev->dev);
> +	parent = mdev->parent;
> +	ret = parent->ops->remove(mdev);
> +	if (ret)
> +		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
> +
> +	/* Balances with device_initialize() */
> +	put_device(&mdev->dev);
>  }
>  
>  static int mdev_device_remove_cb(struct device *dev, void *data)
>  {
>  	if (dev_is_mdev(dev))
> -		mdev_device_remove(dev);
> +		mdev_device_remove_common(to_mdev_device(dev));
>  
>  	return 0;
>  }
> @@ -147,7 +154,8 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
>  		goto add_dev_err;
>  	}
>  
> -	kref_init(&parent->ref);
> +	refcount_set(&parent->refcount, 1);
> +	init_completion(&parent->unreg_completion);
>  
>  	parent->dev = dev;
>  	parent->ops = ops;
> @@ -206,14 +214,27 @@ void mdev_unregister_device(struct device *dev)
>  	dev_info(dev, "MDEV: Unregistering\n");
>  
>  	list_del(&parent->next);
> +	mutex_unlock(&parent_list_lock);
> +
> +	/* Release the initial reference so that new create cannot start */
> +	mdev_put_parent(parent);
> +
> +	/*
> +	 * Wait for all the create and remove references to drop.
> +	 */
> +	wait_for_completion(&parent->unreg_completion);
> +
> +	/*
> +	 * New references cannot be taken and all users are done
> +	 * using the parent. So it is safe to unregister parent.
> +	 */
>  	class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
>  
>  	device_for_each_child(dev, NULL, mdev_device_remove_cb);
>  
>  	parent_remove_sysfs_files(parent);
> -
> -	mutex_unlock(&parent_list_lock);
> -	mdev_put_parent(parent);
> +	kfree(parent);
> +	put_device(dev);
>  }
>  EXPORT_SYMBOL(mdev_unregister_device);
>  
> @@ -237,10 +258,11 @@ int mdev_device_create(struct kobject *kobj,
>  	struct mdev_parent *parent;
>  	struct mdev_type *type = to_mdev_type(kobj);
>  
> -	parent = mdev_get_parent(type->parent);
> -	if (!parent)
> +	if (!mdev_try_get_parent(type->parent))
>  		return -EINVAL;
>  
> +	parent = type->parent;
> +
>  	mutex_lock(&mdev_list_lock);
>  
>  	/* Check for duplicate */
> @@ -287,6 +309,7 @@ int mdev_device_create(struct kobject *kobj,
>  
>  	mdev->active = true;
>  	dev_dbg(&mdev->dev, "MDEV: created\n");
> +	mdev_put_parent(parent);
>  
>  	return 0;
>  
> @@ -306,7 +329,6 @@ int mdev_device_remove(struct device *dev)
>  	struct mdev_device *mdev, *tmp;
>  	struct mdev_parent *parent;
>  	struct mdev_type *type;
> -	int ret;
>  
>  	mdev = to_mdev_device(dev);
>  
> @@ -330,15 +352,17 @@ int mdev_device_remove(struct device *dev)
>  	mutex_unlock(&mdev_list_lock);
>  
>  	type = to_mdev_type(mdev->type_kobj);
> -	mdev_remove_sysfs_files(dev, type);
> -	device_del(&mdev->dev);
> -	parent = mdev->parent;
> -	ret = parent->ops->remove(mdev);
> -	if (ret)
> -		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
> +	if (!mdev_try_get_parent(type->parent)) {
> +		/*
> +		 * Parent unregistration have started.
> +		 * No need to remove here.
> +		 */
> +		mutex_unlock(&mdev_list_lock);
> +		return -ENODEV;
> +	}
>  
> -	/* Balances with device_initialize() */
> -	put_device(&mdev->dev);
> +	parent = mdev->parent;
> +	mdev_device_remove_common(mdev);
>  	mdev_put_parent(parent);
>  
>  	return 0;
> diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
> index 067dc5d8c5de..781f111d66d2 100644
> --- a/drivers/vfio/mdev/mdev_private.h
> +++ b/drivers/vfio/mdev/mdev_private.h
> @@ -19,7 +19,11 @@ void mdev_bus_unregister(void);
>  struct mdev_parent {
>  	struct device *dev;
>  	const struct mdev_parent_ops *ops;
> -	struct kref ref;
> +	/* Protects unregistration to wait until create/remove
> +	 * are completed.
> +	 */
> +	refcount_t refcount;
> +	struct completion unreg_completion;
>  	struct list_head next;
>  	struct kset *mdev_types_kset;
>  	struct list_head type_list;


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove sequence
  2019-05-08 22:06     ` Parav Pandit
@ 2019-05-09  9:06       ` Cornelia Huck
  2019-05-09 16:26         ` Pierre Morel
  2019-05-09 19:19         ` Parav Pandit
  0 siblings, 2 replies; 31+ messages in thread
From: Cornelia Huck @ 2019-05-09  9:06 UTC (permalink / raw)
  To: Parav Pandit
  Cc: kvm, linux-kernel, kwankhede, alex.williamson, cjia,
	Tony Krowiak, Pierre Morel, Halil Pasic

[vfio-ap folks: find a question regarding removal further down]

On Wed, 8 May 2019 22:06:48 +0000
Parav Pandit <parav@mellanox.com> wrote:

> > -----Original Message-----
> > From: Cornelia Huck <cohuck@redhat.com>
> > Sent: Wednesday, May 8, 2019 12:10 PM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > kwankhede@nvidia.com; alex.williamson@redhat.com; cjia@nvidia.com
> > Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> > sequence
> > 
> > On Tue, 30 Apr 2019 17:49:35 -0500
> > Parav Pandit <parav@mellanox.com> wrote:
> >   
> > > This patch addresses below two issues and prepares the code to address
> > > 3rd issue listed below.
> > >
> > > 1. mdev device is placed on the mdev bus before it is created in the
> > > vendor driver. Once a device is placed on the mdev bus without
> > > creating its supporting underlying vendor device, mdev driver's probe()  
> > gets triggered.  
> > > However there isn't a stable mdev available to work on.
> > >
> > >    create_store()
> > >      mdev_create_device()
> > >        device_register()
> > >           ...
> > >          vfio_mdev_probe()
> > >         [...]
> > >         parent->ops->create()
> > >           vfio_ap_mdev_create()
> > >             mdev_set_drvdata(mdev, matrix_mdev);
> > >             /* Valid pointer set above */
> > >
> > > Due to this way of initialization, mdev driver who want to use the

s/want/wants/

> > > mdev, doesn't have a valid mdev to work on.
> > >
> > > 2. Current creation sequence is,
> > >    parent->ops_create()
> > >    groups_register()
> > >
> > > Remove sequence is,
> > >    parent->ops->remove()
> > >    groups_unregister()
> > >
> > > However, remove sequence should be exact mirror of creation sequence.
> > > Once this is achieved, all users of the mdev will be terminated first
> > > before removing underlying vendor device.
> > > (Follow standard linux driver model).
> > > At that point vendor's remove() ops shouldn't failed because device is

s/failed/fail/

> > > taken off the bus that should terminate the users.

"because taking the device off the bus should terminate any usage" ?

> > >
> > > 3. When remove operation fails, mdev sysfs removal attempts to add the
> > > file back on already removed device. Following call trace [1] is observed.
> > >
> > > [1] call trace:
> > > kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
> > > sysfs_create_file_ns+0x7f/0x90
> > > kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
> > > 5.1.0-rc6-vdevbus+ #6
> > > kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b
> > > 08/09/2016
> > > kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
> > > kernel: Call Trace:
> > > kernel: remove_store+0xdc/0x100 [mdev]
> > > kernel: kernfs_fop_write+0x113/0x1a0
> > > kernel: vfs_write+0xad/0x1b0
> > > kernel: ksys_write+0x5a/0xe0
> > > kernel: do_syscall_64+0x5a/0x210
> > > kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > >
> > > Therefore, mdev core is improved in following ways.
> > >
> > > 1. Before placing mdev devices on the bus, perform vendor drivers
> > > creation which supports the mdev creation.

"invoke the vendor driver ->create callback" ?

> > > This ensures that mdev specific all necessary fields are initialized

"that all necessary mdev specific fields are initialized" ?

> > > before a given mdev can be accessed by bus driver.
> > > This follows standard Linux kernel bus and device model similar to
> > > other widely used PCI bus.

"This follows standard practice on other Linux device model buses." ?

> > >
> > > 2. During remove flow, first remove the device from the bus. This
> > > ensures that any bus specific devices and data is cleared.
> > > Once device is taken of the mdev bus, perform remove() of mdev from

s/of/off/

> > > the vendor driver.
> > >
> > > 3. Linux core device model provides way to register and auto
> > > unregister the device sysfs attribute groups at dev->groups.

"The driver core provides a way to automatically register and
unregister sysfs attributes via dev->groups." ?

> > > Make use of this groups to let core create the groups and simplify
> > > code to avoid explicit groups creation and removal.
> > >
> > > A below stack dump of a mdev device remove process also ensures that
> > > vfio driver guards against device removal already in use.
> > >
> > >  cat /proc/21962/stack
> > > [<0>] vfio_del_group_dev+0x216/0x3c0 [vfio] [<0>]
> > > mdev_remove+0x21/0x40 [mdev] [<0>]
> > > device_release_driver_internal+0xe8/0x1b0
> > > [<0>] bus_remove_device+0xf9/0x170
> > > [<0>] device_del+0x168/0x350
> > > [<0>] mdev_device_remove_common+0x1d/0x50 [mdev] [<0>]
> > > mdev_device_remove+0x8c/0xd0 [mdev] [<0>] remove_store+0x71/0x90
> > > [mdev] [<0>] kernfs_fop_write+0x113/0x1a0 [<0>] vfs_write+0xad/0x1b0
> > > [<0>] ksys_write+0x5a/0xe0 [<0>] do_syscall_64+0x5a/0x210 [<0>]
> > > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > [<0>] 0xffffffffffffffff
> > >
> > > This prepares the code to eliminate calling device_create_file() in
> > > subsquent patch.  

I find this stack dump and explanation more confusing than
enlightening. Maybe just drop it?

> > 
> > I'm afraid I have a bit of a problem following this explanation, so let me try
> > to summarize what the patch does to make sure that I understand it
> > correctly:
> > 
> > - Add the sysfs groups to device->groups so that the driver core deals
> >   with proper registration/deregistration.
> > - Split the device registration/deregistration sequence so that some
> >   things can be done between initialization of the device and hooking
> >   it up to the infrastructure respectively after deregistering it from
> >   the infrastructure but before giving up our final reference. In
> >   particular, this means invoking the ->create and ->remove callback in
> >   those new windows. This gives the vendor driver an initialized mdev
> >   device to work with during creation.
> > - Don't allow ->remove to fail, as the device is already removed from
> >   the infrastructure at that point in time.
> >   
> You got all the points pretty accurate.

Ok, good.

> 
> > >
> > > Signed-off-by: Parav Pandit <parav@mellanox.com>
> > > ---
> > >  drivers/vfio/mdev/mdev_core.c    | 94 +++++++++-----------------------
> > >  drivers/vfio/mdev/mdev_private.h |  2 +-
> > >  drivers/vfio/mdev/mdev_sysfs.c   |  2 +-
> > >  3 files changed, 27 insertions(+), 71 deletions(-)  
> > 
> > (...)

> > > @@ -373,16 +330,15 @@ int mdev_device_remove(struct device *dev,  
> > bool force_remove)  
> > >  	mutex_unlock(&mdev_list_lock);
> > >
> > >  	type = to_mdev_type(mdev->type_kobj);
> > > +	mdev_remove_sysfs_files(dev, type);
> > > +	device_del(&mdev->dev);
> > >  	parent = mdev->parent;
> > > +	ret = parent->ops->remove(mdev);
> > > +	if (ret)
> > > +		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);  
> > 
> > I think carrying on with removal regardless of the return code of the  
> > ->remove callback makes sense, as it simply matches usual practice.  
> > However, are we sure that every vendor driver works well with that? I think
> > it should, as removal from bus unregistration (vs. from the sysfs
> > file) was always something it could not veto, but have you looked at the
> > individual drivers?
> >   
> I looked at following drivers a little while back.
> Looked again now.
> 
> drivers/gpu/drm/i915/gvt/kvmgt.c which clears the handle valid in intel_vgpu_release(), which should finish first before remove() is invoked.
> 
> s390 vfio_ccw_mdev_remove() driver drivers/s390/cio/vfio_ccw_ops.c remove() always returns 0.
> s39 crypo fails the remove() once vfio_ap_mdev_release marks kvm null, which should finish before remove() is invoked.

That one is giving me a bit of a headache (the ->kvm reference is
supposed to keep us from detaching while a vm is running), so let's cc:
the vfio-ap maintainers to see whether they have any concerns.

> samples/vfio-mdev/mbochs.c mbochs_remove() always returns 0.
> 
> > >
> > > -	ret = mdev_device_remove_ops(mdev, force_remove);
> > > -	if (ret) {
> > > -		mdev->active = true;
> > > -		return ret;
> > > -	}
> > > -
> > > -	mdev_remove_sysfs_files(dev, type);
> > > -	device_unregister(dev);
> > > +	/* Balances with device_initialize() */
> > > +	put_device(&mdev->dev);
> > >  	mdev_put_parent(parent);
> > >
> > >  	return 0;  
> > 
> > I think that looks sane in general, but the commit message might benefit
> > from tweaking.  
> Part of your description is more crisp than my commit message, I can probably take snippet from it to improve?
> Or any specific entries in commit message that I should address?

I have added some comments inline (mostly some wording tweaks).

Feel free to take anything from my summary as well.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove file on stale device removal
  2019-05-08 22:13     ` Parav Pandit
@ 2019-05-09  9:18       ` Cornelia Huck
  2019-05-09 16:16         ` Parav Pandit
  0 siblings, 1 reply; 31+ messages in thread
From: Cornelia Huck @ 2019-05-09  9:18 UTC (permalink / raw)
  To: Parav Pandit; +Cc: kvm, linux-kernel, kwankhede, alex.williamson, cjia

On Wed, 8 May 2019 22:13:28 +0000
Parav Pandit <parav@mellanox.com> wrote:

> > -----Original Message-----
> > From: Cornelia Huck <cohuck@redhat.com>
> > Sent: Wednesday, May 8, 2019 12:17 PM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > kwankhede@nvidia.com; alex.williamson@redhat.com; cjia@nvidia.com
> > Subject: Re: [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove file on
> > stale device removal
> > 
> > On Tue, 30 Apr 2019 17:49:36 -0500
> > Parav Pandit <parav@mellanox.com> wrote:
> >   
> > > If device is removal is initiated by two threads as below, mdev core
> > > attempts to create a syfs remove file on stale device.
> > > During this flow, below [1] call trace is observed.
> > >
> > >      cpu-0                                    cpu-1
> > >      -----                                    -----
> > >   mdev_unregister_device()
> > >     device_for_each_child
> > >        mdev_device_remove_cb
> > >           mdev_device_remove
> > >                                        user_syscall
> > >                                          remove_store()
> > >                                            mdev_device_remove()
> > >                                         [..]
> > >    unregister device();
> > >                                        /* not found in list or
> > >                                         * active=false.
> > >                                         */
> > >                                           sysfs_create_file()
> > >                                           ..Call trace
> > >
> > > Now that mdev core follows correct device removal system of the linux
> > > bus model, remove shouldn't fail in normal cases. If it fails, there
> > > is no point of creating a stale file or checking for specific error status.  
> > 
> > Which error cases are left? Is there anything that does not indicate that
> > something got terribly messed up internally?
> >   
> Few reasons I can think of that can fail remove are:
> 
> 1. Some device removal requires allocating memory too as it needs to issue commands to device.
> If on the path, such allocation fails, remove can fail. However such fail to allocate memory will probably result into more serious warnings before this.

Nod. If we're OOM, we probably have some bigger problems anyway.

> 2. if the device firmware has crashed, device removal commands will likely timeout and return such error upto user.

In that case, I'd consider the device pretty much unusable in any case.

> 3. If user tries to remove a device, while parent is already in removal path, this call will eventually fail as it won't find the device in the internal list.

This should be benign, I think.

> 
> > >
> > > kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
> > > sysfs_create_file_ns+0x7f/0x90
> > > kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
> > > 5.1.0-rc6-vdevbus+ #6
> > > kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b
> > > 08/09/2016
> > > kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
> > > kernel: Call Trace:
> > > kernel: remove_store+0xdc/0x100 [mdev]
> > > kernel: kernfs_fop_write+0x113/0x1a0
> > > kernel: vfs_write+0xad/0x1b0
> > > kernel: ksys_write+0x5a/0xe0
> > > kernel: do_syscall_64+0x5a/0x210
> > > kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > >
> > > Signed-off-by: Parav Pandit <parav@mellanox.com>
> > > ---
> > >  drivers/vfio/mdev/mdev_sysfs.c | 4 +---
> > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/vfio/mdev/mdev_sysfs.c
> > > b/drivers/vfio/mdev/mdev_sysfs.c index 9f774b91d275..ffa3dcebf201
> > > 100644
> > > --- a/drivers/vfio/mdev/mdev_sysfs.c
> > > +++ b/drivers/vfio/mdev/mdev_sysfs.c
> > > @@ -237,10 +237,8 @@ static ssize_t remove_store(struct device *dev,  
> > struct device_attribute *attr,  
> > >  		int ret;
> > >
> > >  		ret = mdev_device_remove(dev);
> > > -		if (ret) {
> > > -			device_create_file(dev, attr);
> > > +		if (ret)  
> > 
> > Should you merge this into the previous patch?
> >   
> I am not sure. Previous patch changes the sequence. I think that deserved an own patch by itself.
> This change is making use of that sequence.
> So its easier to review? Alex had comment in v0 to split into more logical patches, so...
> Specially to capture a different call trace, I cut into different patch.
> Otherwise previous patch's commit message is too long.

I'm not sure if splitting out this one is worth it... your call.

> 
> > >  			return ret;
> > > -		}
> > >  	}
> > >
> > >  	return count;  
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHv2 10/10] vfio/mdev: Synchronize device create/remove with parent removal
  2019-05-09  2:46   ` Alex Williamson
@ 2019-05-09  9:49     ` Cornelia Huck
  2019-05-09 16:14       ` Parav Pandit
  2019-05-09 16:03     ` Parav Pandit
  1 sibling, 1 reply; 31+ messages in thread
From: Cornelia Huck @ 2019-05-09  9:49 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Parav Pandit, kvm, linux-kernel, kwankhede, cjia

On Wed, 8 May 2019 20:46:05 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Tue, 30 Apr 2019 17:49:37 -0500
> Parav Pandit <parav@mellanox.com> wrote:
> 
> > In following sequences, child devices created while removing mdev parent
> > device can be left out, or it may lead to race of removing half
> > initialized child mdev devices.
> > 
> > issue-1:
> > --------
> >        cpu-0                         cpu-1
> >        -----                         -----
> >                                   mdev_unregister_device()
> >                                     device_for_each_child()
> >                                       mdev_device_remove_cb()
> >                                         mdev_device_remove()
> > create_store()
> >   mdev_device_create()                   [...]
> >     device_add()
> >                                   parent_remove_sysfs_files()
> > 
> > /* BUG: device added by cpu-0
> >  * whose parent is getting removed
> >  * and it won't process this mdev.
> >  */
> > 
> > issue-2:
> > --------
> > Below crash is observed when user initiated remove is in progress
> > and mdev_unregister_driver() completes parent unregistration.
> > 
> >        cpu-0                         cpu-1
> >        -----                         -----
> > remove_store()
> >    mdev_device_remove()
> >    active = false;
> >                                   mdev_unregister_device()
> >                                   parent device removed.
> >    [...]
> >    parents->ops->remove()
> >  /*
> >   * BUG: Accessing invalid parent.
> >   */
> > 
> > This is similar race like create() racing with mdev_unregister_device().
> > 
> > BUG: unable to handle kernel paging request at ffffffffc0585668
> > PGD e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
> > Oops: 0000 [#1] SMP PTI
> > CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
> > Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
> > RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev]
> > Call Trace:
> >  remove_store+0x71/0x90 [mdev]
> >  kernfs_fop_write+0x113/0x1a0
> >  vfs_write+0xad/0x1b0
> >  ksys_write+0x5a/0xe0
> >  do_syscall_64+0x5a/0x210
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> > Therefore, mdev core is improved as below to overcome above issues.
> > 
> > Wait for any ongoing mdev create() and remove() to finish before
> > unregistering parent device using refcount and completion.
> > This continues to allow multiple create and remove to progress in
> > parallel for different mdev devices as most common case.
> > At the same time guard parent removal while parent is being access by
> > create() and remove callbacks.
> > 
> > Code is simplified from kref to use refcount as unregister_device() has
> > to wait anyway for all create/remove to finish.
> > 
> > While removing mdev devices during parent unregistration, there isn't
> > need to acquire refcount of parent device, hence code is restructured
> > using mdev_device_remove_common() to avoid it.  
> 
> Did you consider calling parent_remove_sysfs_files() earlier in
> mdev_unregister_device() and adding srcu support to know there are no
> in-flight callers of the create path?  I think that would address
> issue-1.
> 
> Issue-2 suggests a bug in our handling of the parent device krefs, the
> parent object should exist until all child devices which have a kref
> reference to the parent are removed, but clearly
> mdev_unregister_device() is not blocking for that to occur allowing the
> parent driver .remove callback to finish.  This seems similar to
> vfio_del_group_dev() where we need to block a vfio bus driver from
> removing a device until it becomes unused, could a similar solution
> with a wait_queue and wait_woken be used here?
> 
> I'm not immediately sold on the idea that removing a kref to solve this
> problem is a good thing, it seems odd to me that mdevs don't hold a
> reference to the parent throughout their life with this change, and the
> remove_store path branch to exit if we find we're racing the parent
> remove path is rather ugly.  BTW, why is the sanitization loop in
> mdev_device_remove() still here, wasn't that fixed by the previous two
> patches?  Thanks,

Agreed, I think not holding a reference to the parent is rather odd.

> 
> Alex
> 
> > Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
> > Signed-off-by: Parav Pandit <parav@mellanox.com>
> > ---
> >  drivers/vfio/mdev/mdev_core.c    | 86 ++++++++++++++++++++------------
> >  drivers/vfio/mdev/mdev_private.h |  6 ++-
> >  2 files changed, 60 insertions(+), 32 deletions(-)

(...)

> > @@ -206,14 +214,27 @@ void mdev_unregister_device(struct device *dev)
> >  	dev_info(dev, "MDEV: Unregistering\n");
> >  
> >  	list_del(&parent->next);
> > +	mutex_unlock(&parent_list_lock);
> > +
> > +	/* Release the initial reference so that new create cannot start */
> > +	mdev_put_parent(parent);
> > +
> > +	/*
> > +	 * Wait for all the create and remove references to drop.
> > +	 */
> > +	wait_for_completion(&parent->unreg_completion);
> > +
> > +	/*
> > +	 * New references cannot be taken and all users are done
> > +	 * using the parent. So it is safe to unregister parent.
> > +	 */
> >  	class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
> >  
> >  	device_for_each_child(dev, NULL, mdev_device_remove_cb);
> >  
> >  	parent_remove_sysfs_files(parent);
> > -
> > -	mutex_unlock(&parent_list_lock);
> > -	mdev_put_parent(parent);
> > +	kfree(parent);

Such a kfree() is usually a big, flashing warning sign to me, even
though it probably isn't strictly broken in this case.

> > +	put_device(dev);
> >  }
> >  EXPORT_SYMBOL(mdev_unregister_device);
> >  

I think one problem I'm having here is that two things are conflated
with that approach:

- Structures holding a reference to another structure, where they need
  to be sure that it isn't pulled out from under them.
- Structures being hooked up and discoverable from somewhere else.

I think what we actually need is that the code possibly creating a new
mdev device is not able to look up the parent device if removal has
been already triggered for it. Same for triggering mdev device removal.

Do we need to somehow tie getting an extra reference to looking up the
device? Any extra reference does not hurt, as long as we remember to
drop it again :)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCHv2 10/10] vfio/mdev: Synchronize device create/remove with parent removal
  2019-05-09  2:46   ` Alex Williamson
  2019-05-09  9:49     ` Cornelia Huck
@ 2019-05-09 16:03     ` Parav Pandit
  1 sibling, 0 replies; 31+ messages in thread
From: Parav Pandit @ 2019-05-09 16:03 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, linux-kernel, kwankhede, cjia



> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, May 8, 2019 9:46 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankhede@nvidia.com; cjia@nvidia.com
> Subject: Re: [PATCHv2 10/10] vfio/mdev: Synchronize device create/remove
> with parent removal
> 
> On Tue, 30 Apr 2019 17:49:37 -0500
> Parav Pandit <parav@mellanox.com> wrote:
> 
> > In following sequences, child devices created while removing mdev
> > parent device can be left out, or it may lead to race of removing half
> > initialized child mdev devices.
> >
> > issue-1:
> > --------
> >        cpu-0                         cpu-1
> >        -----                         -----
> >                                   mdev_unregister_device()
> >                                     device_for_each_child()
> >                                       mdev_device_remove_cb()
> >                                         mdev_device_remove()
> > create_store()
> >   mdev_device_create()                   [...]
> >     device_add()
> >                                   parent_remove_sysfs_files()
> >
> > /* BUG: device added by cpu-0
> >  * whose parent is getting removed
> >  * and it won't process this mdev.
> >  */
> >
> > issue-2:
> > --------
> > Below crash is observed when user initiated remove is in progress and
> > mdev_unregister_driver() completes parent unregistration.
> >
> >        cpu-0                         cpu-1
> >        -----                         -----
> > remove_store()
> >    mdev_device_remove()
> >    active = false;
> >                                   mdev_unregister_device()
> >                                   parent device removed.
> >    [...]
> >    parents->ops->remove()
> >  /*
> >   * BUG: Accessing invalid parent.
> >   */
> >
> > This is similar race like create() racing with mdev_unregister_device().
> >
> > BUG: unable to handle kernel paging request at ffffffffc0585668 PGD
> > e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
> > Oops: 0000 [#1] SMP PTI
> > CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted
> > 5.1.0-rc6-vdevbus+ #6 Hardware name: Supermicro
> > SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
> > RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev] Call Trace:
> >  remove_store+0x71/0x90 [mdev]
> >  kernfs_fop_write+0x113/0x1a0
> >  vfs_write+0xad/0x1b0
> >  ksys_write+0x5a/0xe0
> >  do_syscall_64+0x5a/0x210
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > Therefore, mdev core is improved as below to overcome above issues.
> >
> > Wait for any ongoing mdev create() and remove() to finish before
> > unregistering parent device using refcount and completion.
> > This continues to allow multiple create and remove to progress in
> > parallel for different mdev devices as most common case.
> > At the same time guard parent removal while parent is being access by
> > create() and remove callbacks.
> >
> > Code is simplified from kref to use refcount as unregister_device()
> > has to wait anyway for all create/remove to finish.
> >
> > While removing mdev devices during parent unregistration, there isn't
> > need to acquire refcount of parent device, hence code is restructured
> > using mdev_device_remove_common() to avoid it.
> 
> Did you consider calling parent_remove_sysfs_files() earlier in
> mdev_unregister_device() and adding srcu support to know there are no in-
> flight callers of the create path?  I think that would address issue-1.
> 
parent_remove_sysfs_files() cannot be done until create is completed because child mdev are under the parent's supported_types/../devices.
So once parent directory is removed, it removes the child link too.
And doing mdev_device_remove_common() on such child (for whom create was ongoing), results in warning.

Secondly, I dropped the srcu approach because srcu are slow, and call_rcu() is not helpful,
because once mdev_unregister_device() is completed, we want to be sure that there are no references to this parent device.

> Issue-2 suggests a bug in our handling of the parent device krefs, the parent
> object should exist until all child devices which have a kref reference to the
> parent are removed, but clearly
> mdev_unregister_device() is not blocking for that to occur allowing the
> parent driver .remove callback to finish.  This seems similar to
> vfio_del_group_dev() where we need to block a vfio bus driver from
> removing a device until it becomes unused, could a similar solution with a
> wait_queue and wait_woken be used here?
> 
mdev_unregister_device(), removes all the child mdevs by first by detaching from the drivers, followed by calling remove() of the vendor driver.
once those remove() call completes, there shouldn't be any active reference to parent.

Once mdevs are created or removed, they drop the reference to the parent.
If we keep around the reference to parent, than parent cannot publish that it is going away.
An additional refcount or another sync mechanism is needed to achieve that.
I don't' see what would that buy us.

> I'm not immediately sold on the idea that removing a kref to solve this
> problem is a good thing, it seems odd to me that mdevs don't hold a
> reference to the parent throughout their life with this change, and the
> remove_store path branch to exit if we find we're racing the parent remove
> path is rather ugly.  
Mdev's are anchored to its parent. When parent is going away, its removing the child mdevs after ensuring that no removes() are progressing.
so remove_store and create are not very different in what they do.

> BTW, why is the sanitization loop in
> mdev_device_remove() still here, wasn't that fixed by the previous two
> patches?  Thanks,
> 
The loop still exist because remove_store() does't hold any reference to the struct device getting removed.
So it is very much possible that remove_store() didn't yet invoke the mdev_device_remove(), but such device is already removed.
This sanitization loop for now helps there.
I want to do more sane things for remove by uuid (similar to create file) and not by self-file, but at later point...

> Alex
> 
> > Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
> > Signed-off-by: Parav Pandit <parav@mellanox.com>
> > ---
> >  drivers/vfio/mdev/mdev_core.c    | 86 ++++++++++++++++++++------------
> >  drivers/vfio/mdev/mdev_private.h |  6 ++-
> >  2 files changed, 60 insertions(+), 32 deletions(-)
> >
> > diff --git a/drivers/vfio/mdev/mdev_core.c
> > b/drivers/vfio/mdev/mdev_core.c index 2b98da2ee361..a5da24d662f4
> > 100644
> > --- a/drivers/vfio/mdev/mdev_core.c
> > +++ b/drivers/vfio/mdev/mdev_core.c
> > @@ -78,34 +78,41 @@ static struct mdev_parent
> *__find_parent_device(struct device *dev)
> >  	return NULL;
> >  }
> >
> > -static void mdev_release_parent(struct kref *kref)
> > +static bool mdev_try_get_parent(struct mdev_parent *parent)
> >  {
> > -	struct mdev_parent *parent = container_of(kref, struct
> mdev_parent,
> > -						  ref);
> > -	struct device *dev = parent->dev;
> > -
> > -	kfree(parent);
> > -	put_device(dev);
> > +	if (parent)
> > +		return refcount_inc_not_zero(&parent->refcount);
> > +	return false;
> >  }
> >
> > -static struct mdev_parent *mdev_get_parent(struct mdev_parent
> > *parent)
> > +static void mdev_put_parent(struct mdev_parent *parent)
> >  {
> > -	if (parent)
> > -		kref_get(&parent->ref);
> > -
> > -	return parent;
> > +	if (parent && refcount_dec_and_test(&parent->refcount))
> > +		complete(&parent->unreg_completion);
> >  }
> >
> > -static void mdev_put_parent(struct mdev_parent *parent)
> > +static void mdev_device_remove_common(struct mdev_device *mdev)
> >  {
> > -	if (parent)
> > -		kref_put(&parent->ref, mdev_release_parent);
> > +	struct mdev_parent *parent;
> > +	struct mdev_type *type;
> > +	int ret;
> > +
> > +	type = to_mdev_type(mdev->type_kobj);
> > +	mdev_remove_sysfs_files(&mdev->dev, type);
> > +	device_del(&mdev->dev);
> > +	parent = mdev->parent;
> > +	ret = parent->ops->remove(mdev);
> > +	if (ret)
> > +		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
> > +
> > +	/* Balances with device_initialize() */
> > +	put_device(&mdev->dev);
> >  }
> >
> >  static int mdev_device_remove_cb(struct device *dev, void *data)  {
> >  	if (dev_is_mdev(dev))
> > -		mdev_device_remove(dev);
> > +		mdev_device_remove_common(to_mdev_device(dev));
> >
> >  	return 0;
> >  }
> > @@ -147,7 +154,8 @@ int mdev_register_device(struct device *dev, const
> struct mdev_parent_ops *ops)
> >  		goto add_dev_err;
> >  	}
> >
> > -	kref_init(&parent->ref);
> > +	refcount_set(&parent->refcount, 1);
> > +	init_completion(&parent->unreg_completion);
> >
> >  	parent->dev = dev;
> >  	parent->ops = ops;
> > @@ -206,14 +214,27 @@ void mdev_unregister_device(struct device *dev)
> >  	dev_info(dev, "MDEV: Unregistering\n");
> >
> >  	list_del(&parent->next);
> > +	mutex_unlock(&parent_list_lock);
> > +
> > +	/* Release the initial reference so that new create cannot start */
> > +	mdev_put_parent(parent);
> > +
> > +	/*
> > +	 * Wait for all the create and remove references to drop.
> > +	 */
> > +	wait_for_completion(&parent->unreg_completion);
> > +
> > +	/*
> > +	 * New references cannot be taken and all users are done
> > +	 * using the parent. So it is safe to unregister parent.
> > +	 */
> >  	class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
> >
> >  	device_for_each_child(dev, NULL, mdev_device_remove_cb);
> >
> >  	parent_remove_sysfs_files(parent);
> > -
> > -	mutex_unlock(&parent_list_lock);
> > -	mdev_put_parent(parent);
> > +	kfree(parent);
> > +	put_device(dev);
> >  }
> >  EXPORT_SYMBOL(mdev_unregister_device);
> >
> > @@ -237,10 +258,11 @@ int mdev_device_create(struct kobject *kobj,
> >  	struct mdev_parent *parent;
> >  	struct mdev_type *type = to_mdev_type(kobj);
> >
> > -	parent = mdev_get_parent(type->parent);
> > -	if (!parent)
> > +	if (!mdev_try_get_parent(type->parent))
> >  		return -EINVAL;
> >
> > +	parent = type->parent;
> > +
> >  	mutex_lock(&mdev_list_lock);
> >
> >  	/* Check for duplicate */
> > @@ -287,6 +309,7 @@ int mdev_device_create(struct kobject *kobj,
> >
> >  	mdev->active = true;
> >  	dev_dbg(&mdev->dev, "MDEV: created\n");
> > +	mdev_put_parent(parent);
> >
> >  	return 0;
> >
> > @@ -306,7 +329,6 @@ int mdev_device_remove(struct device *dev)
> >  	struct mdev_device *mdev, *tmp;
> >  	struct mdev_parent *parent;
> >  	struct mdev_type *type;
> > -	int ret;
> >
> >  	mdev = to_mdev_device(dev);
> >
> > @@ -330,15 +352,17 @@ int mdev_device_remove(struct device *dev)
> >  	mutex_unlock(&mdev_list_lock);
> >
> >  	type = to_mdev_type(mdev->type_kobj);
> > -	mdev_remove_sysfs_files(dev, type);
> > -	device_del(&mdev->dev);
> > -	parent = mdev->parent;
> > -	ret = parent->ops->remove(mdev);
> > -	if (ret)
> > -		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
> > +	if (!mdev_try_get_parent(type->parent)) {
> > +		/*
> > +		 * Parent unregistration have started.
> > +		 * No need to remove here.
> > +		 */
> > +		mutex_unlock(&mdev_list_lock);
> > +		return -ENODEV;
> > +	}
> >
> > -	/* Balances with device_initialize() */
> > -	put_device(&mdev->dev);
> > +	parent = mdev->parent;
> > +	mdev_device_remove_common(mdev);
> >  	mdev_put_parent(parent);
> >
> >  	return 0;
> > diff --git a/drivers/vfio/mdev/mdev_private.h
> > b/drivers/vfio/mdev/mdev_private.h
> > index 067dc5d8c5de..781f111d66d2 100644
> > --- a/drivers/vfio/mdev/mdev_private.h
> > +++ b/drivers/vfio/mdev/mdev_private.h
> > @@ -19,7 +19,11 @@ void mdev_bus_unregister(void);  struct
> mdev_parent
> > {
> >  	struct device *dev;
> >  	const struct mdev_parent_ops *ops;
> > -	struct kref ref;
> > +	/* Protects unregistration to wait until create/remove
> > +	 * are completed.
> > +	 */
> > +	refcount_t refcount;
> > +	struct completion unreg_completion;
> >  	struct list_head next;
> >  	struct kset *mdev_types_kset;
> >  	struct list_head type_list;


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCHv2 10/10] vfio/mdev: Synchronize device create/remove with parent removal
  2019-05-09  9:49     ` Cornelia Huck
@ 2019-05-09 16:14       ` Parav Pandit
  0 siblings, 0 replies; 31+ messages in thread
From: Parav Pandit @ 2019-05-09 16:14 UTC (permalink / raw)
  To: Cornelia Huck, Alex Williamson; +Cc: kvm, linux-kernel, kwankhede, cjia



> -----Original Message-----
> From: Cornelia Huck <cohuck@redhat.com>
> Sent: Thursday, May 9, 2019 4:49 AM
> To: Alex Williamson <alex.williamson@redhat.com>
> Cc: Parav Pandit <parav@mellanox.com>; kvm@vger.kernel.org; linux-
> kernel@vger.kernel.org; kwankhede@nvidia.com; cjia@nvidia.com
> Subject: Re: [PATCHv2 10/10] vfio/mdev: Synchronize device create/remove
> with parent removal
> 
> On Wed, 8 May 2019 20:46:05 -0600
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> > On Tue, 30 Apr 2019 17:49:37 -0500
> > Parav Pandit <parav@mellanox.com> wrote:
> >
> > > In following sequences, child devices created while removing mdev
> > > parent device can be left out, or it may lead to race of removing
> > > half initialized child mdev devices.
> > >
> > > issue-1:
> > > --------
> > >        cpu-0                         cpu-1
> > >        -----                         -----
> > >                                   mdev_unregister_device()
> > >                                     device_for_each_child()
> > >                                       mdev_device_remove_cb()
> > >                                         mdev_device_remove()
> > > create_store()
> > >   mdev_device_create()                   [...]
> > >     device_add()
> > >                                   parent_remove_sysfs_files()
> > >
> > > /* BUG: device added by cpu-0
> > >  * whose parent is getting removed
> > >  * and it won't process this mdev.
> > >  */
> > >
> > > issue-2:
> > > --------
> > > Below crash is observed when user initiated remove is in progress
> > > and mdev_unregister_driver() completes parent unregistration.
> > >
> > >        cpu-0                         cpu-1
> > >        -----                         -----
> > > remove_store()
> > >    mdev_device_remove()
> > >    active = false;
> > >                                   mdev_unregister_device()
> > >                                   parent device removed.
> > >    [...]
> > >    parents->ops->remove()
> > >  /*
> > >   * BUG: Accessing invalid parent.
> > >   */
> > >
> > > This is similar race like create() racing with mdev_unregister_device().
> > >
> > > BUG: unable to handle kernel paging request at ffffffffc0585668 PGD
> > > e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
> > > Oops: 0000 [#1] SMP PTI
> > > CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted
> > > 5.1.0-rc6-vdevbus+ #6 Hardware name: Supermicro
> > > SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
> > > RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev] Call Trace:
> > >  remove_store+0x71/0x90 [mdev]
> > >  kernfs_fop_write+0x113/0x1a0
> > >  vfs_write+0xad/0x1b0
> > >  ksys_write+0x5a/0xe0
> > >  do_syscall_64+0x5a/0x210
> > >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > >
> > > Therefore, mdev core is improved as below to overcome above issues.
> > >
> > > Wait for any ongoing mdev create() and remove() to finish before
> > > unregistering parent device using refcount and completion.
> > > This continues to allow multiple create and remove to progress in
> > > parallel for different mdev devices as most common case.
> > > At the same time guard parent removal while parent is being access
> > > by
> > > create() and remove callbacks.
> > >
> > > Code is simplified from kref to use refcount as unregister_device()
> > > has to wait anyway for all create/remove to finish.
> > >
> > > While removing mdev devices during parent unregistration, there
> > > isn't need to acquire refcount of parent device, hence code is
> > > restructured using mdev_device_remove_common() to avoid it.
> >
> > Did you consider calling parent_remove_sysfs_files() earlier in
> > mdev_unregister_device() and adding srcu support to know there are no
> > in-flight callers of the create path?  I think that would address
> > issue-1.
> >
> > Issue-2 suggests a bug in our handling of the parent device krefs, the
> > parent object should exist until all child devices which have a kref
> > reference to the parent are removed, but clearly
> > mdev_unregister_device() is not blocking for that to occur allowing
> > the parent driver .remove callback to finish.  This seems similar to
> > vfio_del_group_dev() where we need to block a vfio bus driver from
> > removing a device until it becomes unused, could a similar solution
> > with a wait_queue and wait_woken be used here?
> >
> > I'm not immediately sold on the idea that removing a kref to solve
> > this problem is a good thing, it seems odd to me that mdevs don't hold
> > a reference to the parent throughout their life with this change, and
> > the remove_store path branch to exit if we find we're racing the
> > parent remove path is rather ugly.  BTW, why is the sanitization loop
> > in
> > mdev_device_remove() still here, wasn't that fixed by the previous two
> > patches?  Thanks,
> 
> Agreed, I think not holding a reference to the parent is rather odd.
> 
> >
> > Alex
> >
> > > Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
> > > Signed-off-by: Parav Pandit <parav@mellanox.com>
> > > ---
> > >  drivers/vfio/mdev/mdev_core.c    | 86 ++++++++++++++++++++------------
> > >  drivers/vfio/mdev/mdev_private.h |  6 ++-
> > >  2 files changed, 60 insertions(+), 32 deletions(-)
> 
> (...)
> 
> > > @@ -206,14 +214,27 @@ void mdev_unregister_device(struct device
> *dev)
> > >  	dev_info(dev, "MDEV: Unregistering\n");
> > >
> > >  	list_del(&parent->next);
> > > +	mutex_unlock(&parent_list_lock);
> > > +
> > > +	/* Release the initial reference so that new create cannot start */
> > > +	mdev_put_parent(parent);
> > > +
> > > +	/*
> > > +	 * Wait for all the create and remove references to drop.
> > > +	 */
> > > +	wait_for_completion(&parent->unreg_completion);
> > > +
> > > +	/*
> > > +	 * New references cannot be taken and all users are done
> > > +	 * using the parent. So it is safe to unregister parent.
> > > +	 */
> > >  	class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
> > >
> > >  	device_for_each_child(dev, NULL, mdev_device_remove_cb);
> > >
> > >  	parent_remove_sysfs_files(parent);
> > > -
> > > -	mutex_unlock(&parent_list_lock);
> > > -	mdev_put_parent(parent);
> > > +	kfree(parent);
> 
> Such a kfree() is usually a big, flashing warning sign to me, even though it
> probably isn't strictly broken in this case.
> 
Devices are removed, create files are removed. Parent is taken off the list.
So who else can access this parent?

> > > +	put_device(dev);
> > >  }
> > >  EXPORT_SYMBOL(mdev_unregister_device);
> > >
> 
> I think one problem I'm having here is that two things are conflated with
> that approach:
> 
> - Structures holding a reference to another structure, where they need
>   to be sure that it isn't pulled out from under them.
> - Structures being hooked up and discoverable from somewhere else.
> 
> I think what we actually need is that the code possibly creating a new mdev
> device is not able to look up the parent device if removal has been already
> triggered for it. Same for triggering mdev device removal.
> 
This is already present in the patch.
mdev_try_get_parent() API looks up the parent during creation and removal path.

> Do we need to somehow tie getting an extra reference to looking up the
> device? Any extra reference does not hurt, as long as we remember to drop
> it again :)

A single refcount publishes its parent's device existence.
If we want to hold reference to parent until the life of child device, than we need additional plumbing.
Additional plumbing advertises when parent is reachable/getting removed and also waits until create/remove is completed.
I think if we can justify the need for this additional plumbing, it makes it easy to think what to be done.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove file on stale device removal
  2019-05-09  9:18       ` Cornelia Huck
@ 2019-05-09 16:16         ` Parav Pandit
  0 siblings, 0 replies; 31+ messages in thread
From: Parav Pandit @ 2019-05-09 16:16 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: kvm, linux-kernel, kwankhede, alex.williamson, cjia



> -----Original Message-----
> From: Cornelia Huck <cohuck@redhat.com>
> Sent: Thursday, May 9, 2019 4:18 AM
> To: Parav Pandit <parav@mellanox.com>
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankhede@nvidia.com; alex.williamson@redhat.com; cjia@nvidia.com
> Subject: Re: [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove file on
> stale device removal
> 
> On Wed, 8 May 2019 22:13:28 +0000
> Parav Pandit <parav@mellanox.com> wrote:
> 
> > > -----Original Message-----
> > > From: Cornelia Huck <cohuck@redhat.com>
> > > Sent: Wednesday, May 8, 2019 12:17 PM
> > > To: Parav Pandit <parav@mellanox.com>
> > > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > kwankhede@nvidia.com; alex.williamson@redhat.com; cjia@nvidia.com
> > > Subject: Re: [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove
> > > file on stale device removal
> > >
> > > On Tue, 30 Apr 2019 17:49:36 -0500
> > > Parav Pandit <parav@mellanox.com> wrote:
> > >
> > > > If device is removal is initiated by two threads as below, mdev
> > > > core attempts to create a syfs remove file on stale device.
> > > > During this flow, below [1] call trace is observed.
> > > >
> > > >      cpu-0                                    cpu-1
> > > >      -----                                    -----
> > > >   mdev_unregister_device()
> > > >     device_for_each_child
> > > >        mdev_device_remove_cb
> > > >           mdev_device_remove
> > > >                                        user_syscall
> > > >                                          remove_store()
> > > >                                            mdev_device_remove()
> > > >                                         [..]
> > > >    unregister device();
> > > >                                        /* not found in list or
> > > >                                         * active=false.
> > > >                                         */
> > > >                                           sysfs_create_file()
> > > >                                           ..Call trace
> > > >
> > > > Now that mdev core follows correct device removal system of the
> > > > linux bus model, remove shouldn't fail in normal cases. If it
> > > > fails, there is no point of creating a stale file or checking for specific
> error status.
> > >
> > > Which error cases are left? Is there anything that does not indicate
> > > that something got terribly messed up internally?
> > >
> > Few reasons I can think of that can fail remove are:
> >
> > 1. Some device removal requires allocating memory too as it needs to issue
> commands to device.
> > If on the path, such allocation fails, remove can fail. However such fail to
> allocate memory will probably result into more serious warnings before this.
> 
> Nod. If we're OOM, we probably have some bigger problems anyway.
> 
> > 2. if the device firmware has crashed, device removal commands will likely
> timeout and return such error upto user.
> 
> In that case, I'd consider the device pretty much unusable in any case.
> 
Right.

> > 3. If user tries to remove a device, while parent is already in removal path,
> this call will eventually fail as it won't find the device in the internal list.
> 
> This should be benign, I think.
> 
Right.

> >
> > > >
> > > > kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
> > > > sysfs_create_file_ns+0x7f/0x90
> > > > kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
> > > > 5.1.0-rc6-vdevbus+ #6
> > > > kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS
> > > > 2.0b
> > > > 08/09/2016
> > > > kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
> > > > kernel: Call Trace:
> > > > kernel: remove_store+0xdc/0x100 [mdev]
> > > > kernel: kernfs_fop_write+0x113/0x1a0
> > > > kernel: vfs_write+0xad/0x1b0
> > > > kernel: ksys_write+0x5a/0xe0
> > > > kernel: do_syscall_64+0x5a/0x210
> > > > kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > >
> > > > Signed-off-by: Parav Pandit <parav@mellanox.com>
> > > > ---
> > > >  drivers/vfio/mdev/mdev_sysfs.c | 4 +---
> > > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/vfio/mdev/mdev_sysfs.c
> > > > b/drivers/vfio/mdev/mdev_sysfs.c index 9f774b91d275..ffa3dcebf201
> > > > 100644
> > > > --- a/drivers/vfio/mdev/mdev_sysfs.c
> > > > +++ b/drivers/vfio/mdev/mdev_sysfs.c
> > > > @@ -237,10 +237,8 @@ static ssize_t remove_store(struct device
> > > > *dev,
> > > struct device_attribute *attr,
> > > >  		int ret;
> > > >
> > > >  		ret = mdev_device_remove(dev);
> > > > -		if (ret) {
> > > > -			device_create_file(dev, attr);
> > > > +		if (ret)
> > >
> > > Should you merge this into the previous patch?
> > >
> > I am not sure. Previous patch changes the sequence. I think that deserved
> an own patch by itself.
> > This change is making use of that sequence.
> > So its easier to review? Alex had comment in v0 to split into more logical
> patches, so...
> > Specially to capture a different call trace, I cut into different patch.
> > Otherwise previous patch's commit message is too long.
> 
> I'm not sure if splitting out this one is worth it... your call.
> 
Ok. either ways...
> >
> > > >  			return ret;
> > > > -		}
> > > >  	}
> > > >
> > > >  	return count;
> >


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove sequence
  2019-05-09  9:06       ` Cornelia Huck
@ 2019-05-09 16:26         ` Pierre Morel
  2019-05-09 22:12           ` Halil Pasic
  2019-05-09 19:19         ` Parav Pandit
  1 sibling, 1 reply; 31+ messages in thread
From: Pierre Morel @ 2019-05-09 16:26 UTC (permalink / raw)
  To: Cornelia Huck, Parav Pandit
  Cc: kvm, linux-kernel, kwankhede, alex.williamson, cjia,
	Tony Krowiak, Halil Pasic

On 09/05/2019 11:06, Cornelia Huck wrote:
> [vfio-ap folks: find a question regarding removal further down]
> 
> On Wed, 8 May 2019 22:06:48 +0000
> Parav Pandit <parav@mellanox.com> wrote:
> 
>>> -----Original Message-----
>>> From: Cornelia Huck <cohuck@redhat.com>
>>> Sent: Wednesday, May 8, 2019 12:10 PM
>>> To: Parav Pandit <parav@mellanox.com>
>>> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
>>> kwankhede@nvidia.com; alex.williamson@redhat.com; cjia@nvidia.com
>>> Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
>>> sequence
>>>
>>> On Tue, 30 Apr 2019 17:49:35 -0500
>>> Parav Pandit <parav@mellanox.com> wrote:
>>>    

...snip...

>>>> @@ -373,16 +330,15 @@ int mdev_device_remove(struct device *dev,
>>> bool force_remove)
>>>>   	mutex_unlock(&mdev_list_lock);
>>>>
>>>>   	type = to_mdev_type(mdev->type_kobj);
>>>> +	mdev_remove_sysfs_files(dev, type);
>>>> +	device_del(&mdev->dev);
>>>>   	parent = mdev->parent;
>>>> +	ret = parent->ops->remove(mdev);
>>>> +	if (ret)
>>>> +		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
>>>
>>> I think carrying on with removal regardless of the return code of the
>>> ->remove callback makes sense, as it simply matches usual practice.
>>> However, are we sure that every vendor driver works well with that? I think
>>> it should, as removal from bus unregistration (vs. from the sysfs
>>> file) was always something it could not veto, but have you looked at the
>>> individual drivers?
>>>    
>> I looked at following drivers a little while back.
>> Looked again now.
>>
>> drivers/gpu/drm/i915/gvt/kvmgt.c which clears the handle valid in intel_vgpu_release(), which should finish first before remove() is invoked.
>>
>> s390 vfio_ccw_mdev_remove() driver drivers/s390/cio/vfio_ccw_ops.c remove() always returns 0.
>> s39 crypo fails the remove() once vfio_ap_mdev_release marks kvm null, which should finish before remove() is invoked.
> 
> That one is giving me a bit of a headache (the ->kvm reference is
> supposed to keep us from detaching while a vm is running), so let's cc:
> the vfio-ap maintainers to see whether they have any concerns.
> 

We are aware of this race and we did correct this in the IRQ patches for 
which it would have become a real issue.
We now increment/decrement the KVM reference counter inside open and 
release.
Should be right after this.

Thanks for the cc,
Pierre


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCHv2 08/10] vfio/mdev: Improve the create/remove sequence
  2019-05-09  9:06       ` Cornelia Huck
  2019-05-09 16:26         ` Pierre Morel
@ 2019-05-09 19:19         ` Parav Pandit
  2019-05-10  7:08           ` Pierre Morel
  2019-05-14 20:34           ` Parav Pandit
  1 sibling, 2 replies; 31+ messages in thread
From: Parav Pandit @ 2019-05-09 19:19 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: kvm, linux-kernel, kwankhede, alex.williamson, cjia,
	Tony Krowiak, Pierre Morel, Halil Pasic



> -----Original Message-----
> From: Cornelia Huck <cohuck@redhat.com>
> Sent: Thursday, May 9, 2019 4:06 AM
> To: Parav Pandit <parav@mellanox.com>
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankhede@nvidia.com; alex.williamson@redhat.com; cjia@nvidia.com;
> Tony Krowiak <akrowiak@linux.ibm.com>; Pierre Morel
> <pmorel@linux.ibm.com>; Halil Pasic <pasic@linux.ibm.com>
> Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> sequence
> 
> [vfio-ap folks: find a question regarding removal further down]
> 
> On Wed, 8 May 2019 22:06:48 +0000
> Parav Pandit <parav@mellanox.com> wrote:
> 
> > > -----Original Message-----
> > > From: Cornelia Huck <cohuck@redhat.com>
> > > Sent: Wednesday, May 8, 2019 12:10 PM
> > > To: Parav Pandit <parav@mellanox.com>
> > > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > kwankhede@nvidia.com; alex.williamson@redhat.com; cjia@nvidia.com
> > > Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> > > sequence
> > >
> > > On Tue, 30 Apr 2019 17:49:35 -0500
> > > Parav Pandit <parav@mellanox.com> wrote:
> > >
> > > > This patch addresses below two issues and prepares the code to
> > > > address 3rd issue listed below.
> > > >
> > > > 1. mdev device is placed on the mdev bus before it is created in
> > > > the vendor driver. Once a device is placed on the mdev bus without
> > > > creating its supporting underlying vendor device, mdev driver's
> > > > probe()
> > > gets triggered.
> > > > However there isn't a stable mdev available to work on.
> > > >
> > > >    create_store()
> > > >      mdev_create_device()
> > > >        device_register()
> > > >           ...
> > > >          vfio_mdev_probe()
> > > >         [...]
> > > >         parent->ops->create()
> > > >           vfio_ap_mdev_create()
> > > >             mdev_set_drvdata(mdev, matrix_mdev);
> > > >             /* Valid pointer set above */
> > > >
> > > > Due to this way of initialization, mdev driver who want to use the
> 
> s/want/wants/
> 
> > > > mdev, doesn't have a valid mdev to work on.
> > > >
> > > > 2. Current creation sequence is,
> > > >    parent->ops_create()
> > > >    groups_register()
> > > >
> > > > Remove sequence is,
> > > >    parent->ops->remove()
> > > >    groups_unregister()
> > > >
> > > > However, remove sequence should be exact mirror of creation
> sequence.
> > > > Once this is achieved, all users of the mdev will be terminated
> > > > first before removing underlying vendor device.
> > > > (Follow standard linux driver model).
> > > > At that point vendor's remove() ops shouldn't failed because
> > > > device is
> 
> s/failed/fail/
> 
> > > > taken off the bus that should terminate the users.
> 
> "because taking the device off the bus should terminate any usage" ?
> 
> > > >
> > > > 3. When remove operation fails, mdev sysfs removal attempts to add
> > > > the file back on already removed device. Following call trace [1] is
> observed.
> > > >
> > > > [1] call trace:
> > > > kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
> > > > sysfs_create_file_ns+0x7f/0x90
> > > > kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
> > > > 5.1.0-rc6-vdevbus+ #6
> > > > kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS
> > > > 2.0b
> > > > 08/09/2016
> > > > kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
> > > > kernel: Call Trace:
> > > > kernel: remove_store+0xdc/0x100 [mdev]
> > > > kernel: kernfs_fop_write+0x113/0x1a0
> > > > kernel: vfs_write+0xad/0x1b0
> > > > kernel: ksys_write+0x5a/0xe0
> > > > kernel: do_syscall_64+0x5a/0x210
> > > > kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > >
> > > > Therefore, mdev core is improved in following ways.
> > > >
> > > > 1. Before placing mdev devices on the bus, perform vendor drivers
> > > > creation which supports the mdev creation.
> 
> "invoke the vendor driver ->create callback" ?
> 
> > > > This ensures that mdev specific all necessary fields are
> > > > initialized
> 
> "that all necessary mdev specific fields are initialized" ?
> 
> > > > before a given mdev can be accessed by bus driver.
> > > > This follows standard Linux kernel bus and device model similar to
> > > > other widely used PCI bus.
> 
> "This follows standard practice on other Linux device model buses." ?
> 
> > > >
> > > > 2. During remove flow, first remove the device from the bus. This
> > > > ensures that any bus specific devices and data is cleared.
> > > > Once device is taken of the mdev bus, perform remove() of mdev
> > > > from
> 
> s/of/off/
> 
> > > > the vendor driver.
> > > >
> > > > 3. Linux core device model provides way to register and auto
> > > > unregister the device sysfs attribute groups at dev->groups.
> 
> "The driver core provides a way to automatically register and unregister sysfs
> attributes via dev->groups." ?
> 
> > > > Make use of this groups to let core create the groups and simplify
> > > > code to avoid explicit groups creation and removal.
> > > >
> > > > A below stack dump of a mdev device remove process also ensures
> > > > that vfio driver guards against device removal already in use.
> > > >
> > > >  cat /proc/21962/stack
> > > > [<0>] vfio_del_group_dev+0x216/0x3c0 [vfio] [<0>]
> > > > mdev_remove+0x21/0x40 [mdev] [<0>]
> > > > device_release_driver_internal+0xe8/0x1b0
> > > > [<0>] bus_remove_device+0xf9/0x170 [<0>] device_del+0x168/0x350
> > > > [<0>] mdev_device_remove_common+0x1d/0x50 [mdev] [<0>]
> > > > mdev_device_remove+0x8c/0xd0 [mdev] [<0>]
> remove_store+0x71/0x90
> > > > [mdev] [<0>] kernfs_fop_write+0x113/0x1a0 [<0>]
> > > > vfs_write+0xad/0x1b0 [<0>] ksys_write+0x5a/0xe0 [<0>]
> > > > do_syscall_64+0x5a/0x210 [<0>]
> > > > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > > [<0>] 0xffffffffffffffff
> > > >
> > > > This prepares the code to eliminate calling device_create_file()
> > > > in subsquent patch.
> 
> I find this stack dump and explanation more confusing than enlightening.
> Maybe just drop it?
> 
> > >
> > > I'm afraid I have a bit of a problem following this explanation, so
> > > let me try to summarize what the patch does to make sure that I
> > > understand it
> > > correctly:
> > >
> > > - Add the sysfs groups to device->groups so that the driver core deals
> > >   with proper registration/deregistration.
> > > - Split the device registration/deregistration sequence so that some
> > >   things can be done between initialization of the device and hooking
> > >   it up to the infrastructure respectively after deregistering it from
> > >   the infrastructure but before giving up our final reference. In
> > >   particular, this means invoking the ->create and ->remove callback in
> > >   those new windows. This gives the vendor driver an initialized mdev
> > >   device to work with during creation.
> > > - Don't allow ->remove to fail, as the device is already removed from
> > >   the infrastructure at that point in time.
> > >
> > You got all the points pretty accurate.
> 
> Ok, good.
> 
> >
> > > >
> > > > Signed-off-by: Parav Pandit <parav@mellanox.com>
> > > > ---
> > > >  drivers/vfio/mdev/mdev_core.c    | 94 +++++++++-----------------------
> > > >  drivers/vfio/mdev/mdev_private.h |  2 +-
> > > >  drivers/vfio/mdev/mdev_sysfs.c   |  2 +-
> > > >  3 files changed, 27 insertions(+), 71 deletions(-)
> > >
> > > (...)
> 
> > > > @@ -373,16 +330,15 @@ int mdev_device_remove(struct device *dev,
> > > bool force_remove)
> > > >  	mutex_unlock(&mdev_list_lock);
> > > >
> > > >  	type = to_mdev_type(mdev->type_kobj);
> > > > +	mdev_remove_sysfs_files(dev, type);
> > > > +	device_del(&mdev->dev);
> > > >  	parent = mdev->parent;
> > > > +	ret = parent->ops->remove(mdev);
> > > > +	if (ret)
> > > > +		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
> > >
> > > I think carrying on with removal regardless of the return code of
> > > the
> > > ->remove callback makes sense, as it simply matches usual practice.
> > > However, are we sure that every vendor driver works well with that?
> > > I think it should, as removal from bus unregistration (vs. from the
> > > sysfs
> > > file) was always something it could not veto, but have you looked at
> > > the individual drivers?
> > >
> > I looked at following drivers a little while back.
> > Looked again now.
> >
> > drivers/gpu/drm/i915/gvt/kvmgt.c which clears the handle valid in
> intel_vgpu_release(), which should finish first before remove() is invoked.
> >
> > s390 vfio_ccw_mdev_remove() driver drivers/s390/cio/vfio_ccw_ops.c
> remove() always returns 0.
> > s39 crypo fails the remove() once vfio_ap_mdev_release marks kvm null,
> which should finish before remove() is invoked.
> 
> That one is giving me a bit of a headache (the ->kvm reference is supposed
> to keep us from detaching while a vm is running), so let's cc:
> the vfio-ap maintainers to see whether they have any concerns.
> 
I probably wrote wrongly.
vfio_ap_mdev_remove() fails if the VM is already running (i.e. vfio_ap_mdev_release() is not yet called).

And if VM is running it guarded by the vfio_mdev driver which is the one who binds to the device mdev device.
That is why I shown the above stack trace in the commit log, indicating that vfio driver is guarding it.

> > samples/vfio-mdev/mbochs.c mbochs_remove() always returns 0.
> >
> > > >
> > > > -	ret = mdev_device_remove_ops(mdev, force_remove);
> > > > -	if (ret) {
> > > > -		mdev->active = true;
> > > > -		return ret;
> > > > -	}
> > > > -
> > > > -	mdev_remove_sysfs_files(dev, type);
> > > > -	device_unregister(dev);
> > > > +	/* Balances with device_initialize() */
> > > > +	put_device(&mdev->dev);
> > > >  	mdev_put_parent(parent);
> > > >
> > > >  	return 0;
> > >
> > > I think that looks sane in general, but the commit message might
> > > benefit from tweaking.
> > Part of your description is more crisp than my commit message, I can
> probably take snippet from it to improve?
> > Or any specific entries in commit message that I should address?
> 
> I have added some comments inline (mostly some wording tweaks).
> 
> Feel free to take anything from my summary as well.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove sequence
  2019-05-09 16:26         ` Pierre Morel
@ 2019-05-09 22:12           ` Halil Pasic
  0 siblings, 0 replies; 31+ messages in thread
From: Halil Pasic @ 2019-05-09 22:12 UTC (permalink / raw)
  To: Pierre Morel
  Cc: Cornelia Huck, Parav Pandit, kvm, linux-kernel, kwankhede,
	alex.williamson, cjia, Tony Krowiak

On Thu, 9 May 2019 18:26:59 +0200
Pierre Morel <pmorel@linux.ibm.com> wrote:

> On 09/05/2019 11:06, Cornelia Huck wrote:
> > [vfio-ap folks: find a question regarding removal further down]
> > 
> > On Wed, 8 May 2019 22:06:48 +0000
> > Parav Pandit <parav@mellanox.com> wrote:
> > 
> >>> -----Original Message-----
> >>> From: Cornelia Huck <cohuck@redhat.com>
> >>> Sent: Wednesday, May 8, 2019 12:10 PM
> >>> To: Parav Pandit <parav@mellanox.com>
> >>> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> >>> kwankhede@nvidia.com; alex.williamson@redhat.com; cjia@nvidia.com
> >>> Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> >>> sequence
> >>>
> >>> On Tue, 30 Apr 2019 17:49:35 -0500
> >>> Parav Pandit <parav@mellanox.com> wrote:
> >>>    
> 
> ...snip...
> 
> >>>> @@ -373,16 +330,15 @@ int mdev_device_remove(struct device *dev,
> >>> bool force_remove)
> >>>>   	mutex_unlock(&mdev_list_lock);
> >>>>
> >>>>   	type = to_mdev_type(mdev->type_kobj);
> >>>> +	mdev_remove_sysfs_files(dev, type);
> >>>> +	device_del(&mdev->dev);
> >>>>   	parent = mdev->parent;
> >>>> +	ret = parent->ops->remove(mdev);
> >>>> +	if (ret)
> >>>> +		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
> >>>
> >>> I think carrying on with removal regardless of the return code of the
> >>> ->remove callback makes sense, as it simply matches usual practice.
> >>> However, are we sure that every vendor driver works well with that? I think
> >>> it should, as removal from bus unregistration (vs. from the sysfs
> >>> file) was always something it could not veto, but have you looked at the
> >>> individual drivers?
> >>>    
> >> I looked at following drivers a little while back.
> >> Looked again now.
> >>
> >> drivers/gpu/drm/i915/gvt/kvmgt.c which clears the handle valid in intel_vgpu_release(), which should finish first before remove() is invoked.
> >>
> >> s390 vfio_ccw_mdev_remove() driver drivers/s390/cio/vfio_ccw_ops.c remove() always returns 0.
> >> s39 crypo fails the remove() once vfio_ap_mdev_release marks kvm null, which should finish before remove() is invoked.
> > 
> > That one is giving me a bit of a headache (the ->kvm reference is
> > supposed to keep us from detaching while a vm is running), so let's cc:
> > the vfio-ap maintainers to see whether they have any concerns.
> > 
> 
> We are aware of this race and we did correct this in the IRQ patches for 
> which it would have become a real issue.
> We now increment/decrement the KVM reference counter inside open and 
> release.
> Should be right after this.
> 

Tony, what is your take on this? I don't have the bandwidth to think
this through properly, but my intuition tells me: this might be more
complicated than what Pierre's response suggests.

Regards,
Halil 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove sequence
  2019-05-09 19:19         ` Parav Pandit
@ 2019-05-10  7:08           ` Pierre Morel
  2019-05-14 20:34           ` Parav Pandit
  1 sibling, 0 replies; 31+ messages in thread
From: Pierre Morel @ 2019-05-10  7:08 UTC (permalink / raw)
  To: Parav Pandit, Cornelia Huck
  Cc: kvm, linux-kernel, kwankhede, alex.williamson, cjia,
	Tony Krowiak, Halil Pasic

On 09/05/2019 21:19, Parav Pandit wrote:
> 
> 
>> -----Original Message-----
>> From: Cornelia Huck <cohuck@redhat.com>
>> Sent: Thursday, May 9, 2019 4:06 AM
>> To: Parav Pandit <parav@mellanox.com>
>> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
>> kwankhede@nvidia.com; alex.williamson@redhat.com; cjia@nvidia.com;
>> Tony Krowiak <akrowiak@linux.ibm.com>; Pierre Morel
>> <pmorel@linux.ibm.com>; Halil Pasic <pasic@linux.ibm.com>
>> Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
>> sequence
>>
>> [vfio-ap folks: find a question regarding removal further down]
>>
>> On Wed, 8 May 2019 22:06:48 +0000
>> Parav Pandit <parav@mellanox.com> wrote:
>>
>>>> -----Original Message-----
>>>> From: Cornelia Huck <cohuck@redhat.com>
>>>> Sent: Wednesday, May 8, 2019 12:10 PM
>>>> To: Parav Pandit <parav@mellanox.com>
>>>> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
>>>> kwankhede@nvidia.com; alex.williamson@redhat.com; cjia@nvidia.com
>>>> Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
>>>> sequence
>>>>
>>>> On Tue, 30 Apr 2019 17:49:35 -0500
>>>> Parav Pandit <parav@mellanox.com> wrote:
>>>>
>>>>> This patch addresses below two issues and prepares the code to
>>>>> address 3rd issue listed below.
>>>>>
>>>>> 1. mdev device is placed on the mdev bus before it is created in
>>>>> the vendor driver. Once a device is placed on the mdev bus without
>>>>> creating its supporting underlying vendor device, mdev driver's
>>>>> probe()
>>>> gets triggered.
>>>>> However there isn't a stable mdev available to work on.
>>>>>
>>>>>     create_store()
>>>>>       mdev_create_device()
>>>>>         device_register()
>>>>>            ...
>>>>>           vfio_mdev_probe()
>>>>>          [...]
>>>>>          parent->ops->create()
>>>>>            vfio_ap_mdev_create()
>>>>>              mdev_set_drvdata(mdev, matrix_mdev);
>>>>>              /* Valid pointer set above */
>>>>>
>>>>> Due to this way of initialization, mdev driver who want to use the
>>
>> s/want/wants/
>>
>>>>> mdev, doesn't have a valid mdev to work on.
>>>>>
>>>>> 2. Current creation sequence is,
>>>>>     parent->ops_create()
>>>>>     groups_register()
>>>>>
>>>>> Remove sequence is,
>>>>>     parent->ops->remove()
>>>>>     groups_unregister()
>>>>>
>>>>> However, remove sequence should be exact mirror of creation
>> sequence.
>>>>> Once this is achieved, all users of the mdev will be terminated
>>>>> first before removing underlying vendor device.
>>>>> (Follow standard linux driver model).
>>>>> At that point vendor's remove() ops shouldn't failed because
>>>>> device is
>>
>> s/failed/fail/
>>
>>>>> taken off the bus that should terminate the users.
>>
>> "because taking the device off the bus should terminate any usage" ?
>>
>>>>>
>>>>> 3. When remove operation fails, mdev sysfs removal attempts to add
>>>>> the file back on already removed device. Following call trace [1] is
>> observed.
>>>>>
>>>>> [1] call trace:
>>>>> kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
>>>>> sysfs_create_file_ns+0x7f/0x90
>>>>> kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
>>>>> 5.1.0-rc6-vdevbus+ #6
>>>>> kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS
>>>>> 2.0b
>>>>> 08/09/2016
>>>>> kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
>>>>> kernel: Call Trace:
>>>>> kernel: remove_store+0xdc/0x100 [mdev]
>>>>> kernel: kernfs_fop_write+0x113/0x1a0
>>>>> kernel: vfs_write+0xad/0x1b0
>>>>> kernel: ksys_write+0x5a/0xe0
>>>>> kernel: do_syscall_64+0x5a/0x210
>>>>> kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>>
>>>>> Therefore, mdev core is improved in following ways.
>>>>>
>>>>> 1. Before placing mdev devices on the bus, perform vendor drivers
>>>>> creation which supports the mdev creation.
>>
>> "invoke the vendor driver ->create callback" ?
>>
>>>>> This ensures that mdev specific all necessary fields are
>>>>> initialized
>>
>> "that all necessary mdev specific fields are initialized" ?
>>
>>>>> before a given mdev can be accessed by bus driver.
>>>>> This follows standard Linux kernel bus and device model similar to
>>>>> other widely used PCI bus.
>>
>> "This follows standard practice on other Linux device model buses." ?
>>
>>>>>
>>>>> 2. During remove flow, first remove the device from the bus. This
>>>>> ensures that any bus specific devices and data is cleared.
>>>>> Once device is taken of the mdev bus, perform remove() of mdev
>>>>> from
>>
>> s/of/off/
>>
>>>>> the vendor driver.
>>>>>
>>>>> 3. Linux core device model provides way to register and auto
>>>>> unregister the device sysfs attribute groups at dev->groups.
>>
>> "The driver core provides a way to automatically register and unregister sysfs
>> attributes via dev->groups." ?
>>
>>>>> Make use of this groups to let core create the groups and simplify
>>>>> code to avoid explicit groups creation and removal.
>>>>>
>>>>> A below stack dump of a mdev device remove process also ensures
>>>>> that vfio driver guards against device removal already in use.
>>>>>
>>>>>   cat /proc/21962/stack
>>>>> [<0>] vfio_del_group_dev+0x216/0x3c0 [vfio] [<0>]
>>>>> mdev_remove+0x21/0x40 [mdev] [<0>]
>>>>> device_release_driver_internal+0xe8/0x1b0
>>>>> [<0>] bus_remove_device+0xf9/0x170 [<0>] device_del+0x168/0x350
>>>>> [<0>] mdev_device_remove_common+0x1d/0x50 [mdev] [<0>]
>>>>> mdev_device_remove+0x8c/0xd0 [mdev] [<0>]
>> remove_store+0x71/0x90
>>>>> [mdev] [<0>] kernfs_fop_write+0x113/0x1a0 [<0>]
>>>>> vfs_write+0xad/0x1b0 [<0>] ksys_write+0x5a/0xe0 [<0>]
>>>>> do_syscall_64+0x5a/0x210 [<0>]
>>>>> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>> [<0>] 0xffffffffffffffff
>>>>>
>>>>> This prepares the code to eliminate calling device_create_file()
>>>>> in subsquent patch.
>>
>> I find this stack dump and explanation more confusing than enlightening.
>> Maybe just drop it?
>>
>>>>
>>>> I'm afraid I have a bit of a problem following this explanation, so
>>>> let me try to summarize what the patch does to make sure that I
>>>> understand it
>>>> correctly:
>>>>
>>>> - Add the sysfs groups to device->groups so that the driver core deals
>>>>    with proper registration/deregistration.
>>>> - Split the device registration/deregistration sequence so that some
>>>>    things can be done between initialization of the device and hooking
>>>>    it up to the infrastructure respectively after deregistering it from
>>>>    the infrastructure but before giving up our final reference. In
>>>>    particular, this means invoking the ->create and ->remove callback in
>>>>    those new windows. This gives the vendor driver an initialized mdev
>>>>    device to work with during creation.
>>>> - Don't allow ->remove to fail, as the device is already removed from
>>>>    the infrastructure at that point in time.
>>>>
>>> You got all the points pretty accurate.
>>
>> Ok, good.
>>
>>>
>>>>>
>>>>> Signed-off-by: Parav Pandit <parav@mellanox.com>
>>>>> ---
>>>>>   drivers/vfio/mdev/mdev_core.c    | 94 +++++++++-----------------------
>>>>>   drivers/vfio/mdev/mdev_private.h |  2 +-
>>>>>   drivers/vfio/mdev/mdev_sysfs.c   |  2 +-
>>>>>   3 files changed, 27 insertions(+), 71 deletions(-)
>>>>
>>>> (...)
>>
>>>>> @@ -373,16 +330,15 @@ int mdev_device_remove(struct device *dev,
>>>> bool force_remove)
>>>>>   	mutex_unlock(&mdev_list_lock);
>>>>>
>>>>>   	type = to_mdev_type(mdev->type_kobj);
>>>>> +	mdev_remove_sysfs_files(dev, type);
>>>>> +	device_del(&mdev->dev);
>>>>>   	parent = mdev->parent;
>>>>> +	ret = parent->ops->remove(mdev);
>>>>> +	if (ret)
>>>>> +		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
>>>>
>>>> I think carrying on with removal regardless of the return code of
>>>> the
>>>> ->remove callback makes sense, as it simply matches usual practice.
>>>> However, are we sure that every vendor driver works well with that?
>>>> I think it should, as removal from bus unregistration (vs. from the
>>>> sysfs
>>>> file) was always something it could not veto, but have you looked at
>>>> the individual drivers?
>>>>
>>> I looked at following drivers a little while back.
>>> Looked again now.
>>>
>>> drivers/gpu/drm/i915/gvt/kvmgt.c which clears the handle valid in
>> intel_vgpu_release(), which should finish first before remove() is invoked.
>>>
>>> s390 vfio_ccw_mdev_remove() driver drivers/s390/cio/vfio_ccw_ops.c
>> remove() always returns 0.
>>> s39 crypo fails the remove() once vfio_ap_mdev_release marks kvm null,
>> which should finish before remove() is invoked.
>>
>> That one is giving me a bit of a headache (the ->kvm reference is supposed
>> to keep us from detaching while a vm is running), so let's cc:
>> the vfio-ap maintainers to see whether they have any concerns.
>>
> I probably wrote wrongly.
> vfio_ap_mdev_remove() fails if the VM is already running (i.e. vfio_ap_mdev_release() is not yet called).
> 
> And if VM is running it guarded by the vfio_mdev driver which is the one who binds to the device mdev device.
> That is why I shown the above stack trace in the commit log, indicating that vfio driver is guarding it.
> 

I looked again, and see the race you are speaking of, it is not the one 
I thought first, which does not really mater at this stage of the driver 
but indeed between release and remove we do not take the lock in the 
right way.

We will correct this.
Thanks,

Pierre


>>> samples/vfio-mdev/mbochs.c mbochs_remove() always returns 0.
>>>
>>>>>
>>>>> -	ret = mdev_device_remove_ops(mdev, force_remove);
>>>>> -	if (ret) {
>>>>> -		mdev->active = true;
>>>>> -		return ret;
>>>>> -	}
>>>>> -
>>>>> -	mdev_remove_sysfs_files(dev, type);
>>>>> -	device_unregister(dev);
>>>>> +	/* Balances with device_initialize() */
>>>>> +	put_device(&mdev->dev);
>>>>>   	mdev_put_parent(parent);
>>>>>
>>>>>   	return 0;
>>>>
>>>> I think that looks sane in general, but the commit message might
>>>> benefit from tweaking.
>>> Part of your description is more crisp than my commit message, I can
>> probably take snippet from it to improve?
>>> Or any specific entries in commit message that I should address?
>>
>> I have added some comments inline (mostly some wording tweaks).
>>
>> Feel free to take anything from my summary as well.
> 


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCHv2 08/10] vfio/mdev: Improve the create/remove sequence
  2019-05-09 19:19         ` Parav Pandit
  2019-05-10  7:08           ` Pierre Morel
@ 2019-05-14 20:34           ` Parav Pandit
  2019-05-14 22:20             ` Alex Williamson
  1 sibling, 1 reply; 31+ messages in thread
From: Parav Pandit @ 2019-05-14 20:34 UTC (permalink / raw)
  To: Parav Pandit, Cornelia Huck
  Cc: kvm, linux-kernel, kwankhede, alex.williamson, cjia,
	Tony Krowiak, Pierre Morel, Halil Pasic

Hi Alex, Cornelia,


> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org <linux-kernel-
> owner@vger.kernel.org> On Behalf Of Parav Pandit
> Sent: Thursday, May 9, 2019 2:20 PM
> To: Cornelia Huck <cohuck@redhat.com>
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankhede@nvidia.com; alex.williamson@redhat.com; cjia@nvidia.com;
> Tony Krowiak <akrowiak@linux.ibm.com>; Pierre Morel
> <pmorel@linux.ibm.com>; Halil Pasic <pasic@linux.ibm.com>
> Subject: RE: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> sequence
> 
> 
> 
> > -----Original Message-----
> > From: Cornelia Huck <cohuck@redhat.com>
> > Sent: Thursday, May 9, 2019 4:06 AM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > kwankhede@nvidia.com; alex.williamson@redhat.com; cjia@nvidia.com;
> > Tony Krowiak <akrowiak@linux.ibm.com>; Pierre Morel
> > <pmorel@linux.ibm.com>; Halil Pasic <pasic@linux.ibm.com>
> > Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> > sequence
> >
> > [vfio-ap folks: find a question regarding removal further down]
> >
> > On Wed, 8 May 2019 22:06:48 +0000
> > Parav Pandit <parav@mellanox.com> wrote:
> >
> > > > -----Original Message-----
> > > > From: Cornelia Huck <cohuck@redhat.com>
> > > > Sent: Wednesday, May 8, 2019 12:10 PM
> > > > To: Parav Pandit <parav@mellanox.com>
> > > > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > kwankhede@nvidia.com; alex.williamson@redhat.com;
> cjia@nvidia.com
> > > > Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> > > > sequence
> > > >
> > > > On Tue, 30 Apr 2019 17:49:35 -0500 Parav Pandit
> > > > <parav@mellanox.com> wrote:
> > > >
> > > > > This patch addresses below two issues and prepares the code to
> > > > > address 3rd issue listed below.
> > > > >
> > > > > 1. mdev device is placed on the mdev bus before it is created in
> > > > > the vendor driver. Once a device is placed on the mdev bus
> > > > > without creating its supporting underlying vendor device, mdev
> > > > > driver's
> > > > > probe()
> > > > gets triggered.
> > > > > However there isn't a stable mdev available to work on.
> > > > >
> > > > >    create_store()
> > > > >      mdev_create_device()
> > > > >        device_register()
> > > > >           ...
> > > > >          vfio_mdev_probe()
> > > > >         [...]
> > > > >         parent->ops->create()
> > > > >           vfio_ap_mdev_create()
> > > > >             mdev_set_drvdata(mdev, matrix_mdev);
> > > > >             /* Valid pointer set above */
> > > > >
> > > > > Due to this way of initialization, mdev driver who want to use
> > > > > the
> >
> > s/want/wants/
> >
> > > > > mdev, doesn't have a valid mdev to work on.
> > > > >
> > > > > 2. Current creation sequence is,
> > > > >    parent->ops_create()
> > > > >    groups_register()
> > > > >
> > > > > Remove sequence is,
> > > > >    parent->ops->remove()
> > > > >    groups_unregister()
> > > > >
> > > > > However, remove sequence should be exact mirror of creation
> > sequence.
> > > > > Once this is achieved, all users of the mdev will be terminated
> > > > > first before removing underlying vendor device.
> > > > > (Follow standard linux driver model).
> > > > > At that point vendor's remove() ops shouldn't failed because
> > > > > device is
> >
> > s/failed/fail/
> >
> > > > > taken off the bus that should terminate the users.
> >
> > "because taking the device off the bus should terminate any usage" ?
> >
> > > > >
> > > > > 3. When remove operation fails, mdev sysfs removal attempts to
> > > > > add the file back on already removed device. Following call
> > > > > trace [1] is
> > observed.
> > > > >
> > > > > [1] call trace:
> > > > > kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
> > > > > sysfs_create_file_ns+0x7f/0x90
> > > > > kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
> > > > > 5.1.0-rc6-vdevbus+ #6
> > > > > kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS
> > > > > 2.0b
> > > > > 08/09/2016
> > > > > kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
> > > > > kernel: Call Trace:
> > > > > kernel: remove_store+0xdc/0x100 [mdev]
> > > > > kernel: kernfs_fop_write+0x113/0x1a0
> > > > > kernel: vfs_write+0xad/0x1b0
> > > > > kernel: ksys_write+0x5a/0xe0
> > > > > kernel: do_syscall_64+0x5a/0x210
> > > > > kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > > >
> > > > > Therefore, mdev core is improved in following ways.
> > > > >
> > > > > 1. Before placing mdev devices on the bus, perform vendor
> > > > > drivers creation which supports the mdev creation.
> >
> > "invoke the vendor driver ->create callback" ?
> >
> > > > > This ensures that mdev specific all necessary fields are
> > > > > initialized
> >
> > "that all necessary mdev specific fields are initialized" ?
> >
> > > > > before a given mdev can be accessed by bus driver.
> > > > > This follows standard Linux kernel bus and device model similar
> > > > > to other widely used PCI bus.
> >
> > "This follows standard practice on other Linux device model buses." ?
> >
> > > > >
> > > > > 2. During remove flow, first remove the device from the bus.
> > > > > This ensures that any bus specific devices and data is cleared.
> > > > > Once device is taken of the mdev bus, perform remove() of mdev
> > > > > from
> >
> > s/of/off/
> >
> > > > > the vendor driver.
> > > > >
> > > > > 3. Linux core device model provides way to register and auto
> > > > > unregister the device sysfs attribute groups at dev->groups.
> >
> > "The driver core provides a way to automatically register and
> > unregister sysfs attributes via dev->groups." ?
> >
> > > > > Make use of this groups to let core create the groups and
> > > > > simplify code to avoid explicit groups creation and removal.
> > > > >
> > > > > A below stack dump of a mdev device remove process also ensures
> > > > > that vfio driver guards against device removal already in use.
> > > > >
> > > > >  cat /proc/21962/stack
> > > > > [<0>] vfio_del_group_dev+0x216/0x3c0 [vfio] [<0>]
> > > > > mdev_remove+0x21/0x40 [mdev] [<0>]
> > > > > device_release_driver_internal+0xe8/0x1b0
> > > > > [<0>] bus_remove_device+0xf9/0x170 [<0>] device_del+0x168/0x350
> > > > > [<0>] mdev_device_remove_common+0x1d/0x50 [mdev] [<0>]
> > > > > mdev_device_remove+0x8c/0xd0 [mdev] [<0>]
> > remove_store+0x71/0x90
> > > > > [mdev] [<0>] kernfs_fop_write+0x113/0x1a0 [<0>]
> > > > > vfs_write+0xad/0x1b0 [<0>] ksys_write+0x5a/0xe0 [<0>]
> > > > > do_syscall_64+0x5a/0x210 [<0>]
> > > > > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > > > [<0>] 0xffffffffffffffff
> > > > >
> > > > > This prepares the code to eliminate calling device_create_file()
> > > > > in subsquent patch.
> >
> > I find this stack dump and explanation more confusing than enlightening.
> > Maybe just drop it?
> >
> > > >
> > > > I'm afraid I have a bit of a problem following this explanation,
> > > > so let me try to summarize what the patch does to make sure that I
> > > > understand it
> > > > correctly:
> > > >
> > > > - Add the sysfs groups to device->groups so that the driver core deals
> > > >   with proper registration/deregistration.
> > > > - Split the device registration/deregistration sequence so that some
> > > >   things can be done between initialization of the device and hooking
> > > >   it up to the infrastructure respectively after deregistering it from
> > > >   the infrastructure but before giving up our final reference. In
> > > >   particular, this means invoking the ->create and ->remove callback in
> > > >   those new windows. This gives the vendor driver an initialized mdev
> > > >   device to work with during creation.
> > > > - Don't allow ->remove to fail, as the device is already removed from
> > > >   the infrastructure at that point in time.
> > > >
> > > You got all the points pretty accurate.
> >
> > Ok, good.
> >
> > >
> > > > >
> > > > > Signed-off-by: Parav Pandit <parav@mellanox.com>
> > > > > ---
> > > > >  drivers/vfio/mdev/mdev_core.c    | 94 +++++++++-----------------------
> > > > >  drivers/vfio/mdev/mdev_private.h |  2 +-
> > > > >  drivers/vfio/mdev/mdev_sysfs.c   |  2 +-
> > > > >  3 files changed, 27 insertions(+), 71 deletions(-)
> > > >
> > > > (...)
> >
> > > > > @@ -373,16 +330,15 @@ int mdev_device_remove(struct device
> *dev,
> > > > bool force_remove)
> > > > >  	mutex_unlock(&mdev_list_lock);
> > > > >
> > > > >  	type = to_mdev_type(mdev->type_kobj);
> > > > > +	mdev_remove_sysfs_files(dev, type);
> > > > > +	device_del(&mdev->dev);
> > > > >  	parent = mdev->parent;
> > > > > +	ret = parent->ops->remove(mdev);
> > > > > +	if (ret)
> > > > > +		dev_err(&mdev->dev, "Remove failed: err=%d\n",
> ret);
> > > >
> > > > I think carrying on with removal regardless of the return code of
> > > > the
> > > > ->remove callback makes sense, as it simply matches usual practice.
> > > > However, are we sure that every vendor driver works well with that?
> > > > I think it should, as removal from bus unregistration (vs. from
> > > > the sysfs
> > > > file) was always something it could not veto, but have you looked
> > > > at the individual drivers?
> > > >
> > > I looked at following drivers a little while back.
> > > Looked again now.
> > >
> > > drivers/gpu/drm/i915/gvt/kvmgt.c which clears the handle valid in
> > intel_vgpu_release(), which should finish first before remove() is invoked.
> > >
> > > s390 vfio_ccw_mdev_remove() driver drivers/s390/cio/vfio_ccw_ops.c
> > remove() always returns 0.
> > > s39 crypo fails the remove() once vfio_ap_mdev_release marks kvm
> > > null,
> > which should finish before remove() is invoked.
> >
> > That one is giving me a bit of a headache (the ->kvm reference is
> > supposed to keep us from detaching while a vm is running), so let's cc:
> > the vfio-ap maintainers to see whether they have any concerns.
> >
> I probably wrote wrongly.
> vfio_ap_mdev_remove() fails if the VM is already running (i.e.
> vfio_ap_mdev_release() is not yet called).
> 
> And if VM is running it guarded by the vfio_mdev driver which is the one
> who binds to the device mdev device.
> That is why I shown the above stack trace in the commit log, indicating that
> vfio driver is guarding it.
> 
> > > samples/vfio-mdev/mbochs.c mbochs_remove() always returns 0.
> > >
> > > > >
> > > > > -	ret = mdev_device_remove_ops(mdev, force_remove);
> > > > > -	if (ret) {
> > > > > -		mdev->active = true;
> > > > > -		return ret;
> > > > > -	}
> > > > > -
> > > > > -	mdev_remove_sysfs_files(dev, type);
> > > > > -	device_unregister(dev);
> > > > > +	/* Balances with device_initialize() */
> > > > > +	put_device(&mdev->dev);
> > > > >  	mdev_put_parent(parent);
> > > > >
> > > > >  	return 0;
> > > >
> > > > I think that looks sane in general, but the commit message might
> > > > benefit from tweaking.
> > > Part of your description is more crisp than my commit message, I can
> > probably take snippet from it to improve?
> > > Or any specific entries in commit message that I should address?
> >
> > I have added some comments inline (mostly some wording tweaks).
> >
> > Feel free to take anything from my summary as well.

I want to send v3 addressing commit log comment and take updated description from Cornelia, if this 3 patches looks reasonable enough.
What do you think?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove sequence
  2019-05-14 20:34           ` Parav Pandit
@ 2019-05-14 22:20             ` Alex Williamson
  2019-05-15 20:42               ` Parav Pandit
  0 siblings, 1 reply; 31+ messages in thread
From: Alex Williamson @ 2019-05-14 22:20 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Cornelia Huck, kvm, linux-kernel, kwankhede, cjia, Tony Krowiak,
	Pierre Morel, Halil Pasic

On Tue, 14 May 2019 20:34:12 +0000
Parav Pandit <parav@mellanox.com> wrote:

> Hi Alex, Cornelia,
> 
> 
> > -----Original Message-----
> > From: linux-kernel-owner@vger.kernel.org <linux-kernel-  
> > owner@vger.kernel.org> On Behalf Of Parav Pandit  
> > Sent: Thursday, May 9, 2019 2:20 PM
> > To: Cornelia Huck <cohuck@redhat.com>
> > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > kwankhede@nvidia.com; alex.williamson@redhat.com; cjia@nvidia.com;
> > Tony Krowiak <akrowiak@linux.ibm.com>; Pierre Morel
> > <pmorel@linux.ibm.com>; Halil Pasic <pasic@linux.ibm.com>
> > Subject: RE: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> > sequence
> > 
> > 
> >   
> > > -----Original Message-----
> > > From: Cornelia Huck <cohuck@redhat.com>
> > > Sent: Thursday, May 9, 2019 4:06 AM
> > > To: Parav Pandit <parav@mellanox.com>
> > > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > kwankhede@nvidia.com; alex.williamson@redhat.com; cjia@nvidia.com;
> > > Tony Krowiak <akrowiak@linux.ibm.com>; Pierre Morel
> > > <pmorel@linux.ibm.com>; Halil Pasic <pasic@linux.ibm.com>
> > > Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> > > sequence
> > >
> > > [vfio-ap folks: find a question regarding removal further down]
> > >
> > > On Wed, 8 May 2019 22:06:48 +0000
> > > Parav Pandit <parav@mellanox.com> wrote:
> > >  
> > > > > -----Original Message-----
> > > > > From: Cornelia Huck <cohuck@redhat.com>
> > > > > Sent: Wednesday, May 8, 2019 12:10 PM
> > > > > To: Parav Pandit <parav@mellanox.com>
> > > > > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > > kwankhede@nvidia.com; alex.williamson@redhat.com;  
> > cjia@nvidia.com  
> > > > > Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> > > > > sequence
> > > > >
> > > > > On Tue, 30 Apr 2019 17:49:35 -0500 Parav Pandit
> > > > > <parav@mellanox.com> wrote:
> > > > >  
> > > > > > This patch addresses below two issues and prepares the code to
> > > > > > address 3rd issue listed below.
> > > > > >
> > > > > > 1. mdev device is placed on the mdev bus before it is created in
> > > > > > the vendor driver. Once a device is placed on the mdev bus
> > > > > > without creating its supporting underlying vendor device, mdev
> > > > > > driver's
> > > > > > probe()  
> > > > > gets triggered.  
> > > > > > However there isn't a stable mdev available to work on.
> > > > > >
> > > > > >    create_store()
> > > > > >      mdev_create_device()
> > > > > >        device_register()
> > > > > >           ...
> > > > > >          vfio_mdev_probe()
> > > > > >         [...]
> > > > > >         parent->ops->create()
> > > > > >           vfio_ap_mdev_create()
> > > > > >             mdev_set_drvdata(mdev, matrix_mdev);
> > > > > >             /* Valid pointer set above */
> > > > > >
> > > > > > Due to this way of initialization, mdev driver who want to use
> > > > > > the  
> > >
> > > s/want/wants/
> > >  
> > > > > > mdev, doesn't have a valid mdev to work on.
> > > > > >
> > > > > > 2. Current creation sequence is,
> > > > > >    parent->ops_create()
> > > > > >    groups_register()
> > > > > >
> > > > > > Remove sequence is,
> > > > > >    parent->ops->remove()
> > > > > >    groups_unregister()
> > > > > >
> > > > > > However, remove sequence should be exact mirror of creation  
> > > sequence.  
> > > > > > Once this is achieved, all users of the mdev will be terminated
> > > > > > first before removing underlying vendor device.
> > > > > > (Follow standard linux driver model).
> > > > > > At that point vendor's remove() ops shouldn't failed because
> > > > > > device is  
> > >
> > > s/failed/fail/
> > >  
> > > > > > taken off the bus that should terminate the users.  
> > >
> > > "because taking the device off the bus should terminate any usage" ?
> > >  
> > > > > >
> > > > > > 3. When remove operation fails, mdev sysfs removal attempts to
> > > > > > add the file back on already removed device. Following call
> > > > > > trace [1] is  
> > > observed.  
> > > > > >
> > > > > > [1] call trace:
> > > > > > kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
> > > > > > sysfs_create_file_ns+0x7f/0x90
> > > > > > kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
> > > > > > 5.1.0-rc6-vdevbus+ #6
> > > > > > kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS
> > > > > > 2.0b
> > > > > > 08/09/2016
> > > > > > kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
> > > > > > kernel: Call Trace:
> > > > > > kernel: remove_store+0xdc/0x100 [mdev]
> > > > > > kernel: kernfs_fop_write+0x113/0x1a0
> > > > > > kernel: vfs_write+0xad/0x1b0
> > > > > > kernel: ksys_write+0x5a/0xe0
> > > > > > kernel: do_syscall_64+0x5a/0x210
> > > > > > kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > > > >
> > > > > > Therefore, mdev core is improved in following ways.
> > > > > >
> > > > > > 1. Before placing mdev devices on the bus, perform vendor
> > > > > > drivers creation which supports the mdev creation.  
> > >
> > > "invoke the vendor driver ->create callback" ?
> > >  
> > > > > > This ensures that mdev specific all necessary fields are
> > > > > > initialized  
> > >
> > > "that all necessary mdev specific fields are initialized" ?
> > >  
> > > > > > before a given mdev can be accessed by bus driver.
> > > > > > This follows standard Linux kernel bus and device model similar
> > > > > > to other widely used PCI bus.  
> > >
> > > "This follows standard practice on other Linux device model buses." ?
> > >  
> > > > > >
> > > > > > 2. During remove flow, first remove the device from the bus.
> > > > > > This ensures that any bus specific devices and data is cleared.
> > > > > > Once device is taken of the mdev bus, perform remove() of mdev
> > > > > > from  
> > >
> > > s/of/off/
> > >  
> > > > > > the vendor driver.
> > > > > >
> > > > > > 3. Linux core device model provides way to register and auto
> > > > > > unregister the device sysfs attribute groups at dev->groups.  
> > >
> > > "The driver core provides a way to automatically register and
> > > unregister sysfs attributes via dev->groups." ?
> > >  
> > > > > > Make use of this groups to let core create the groups and
> > > > > > simplify code to avoid explicit groups creation and removal.
> > > > > >
> > > > > > A below stack dump of a mdev device remove process also ensures
> > > > > > that vfio driver guards against device removal already in use.
> > > > > >
> > > > > >  cat /proc/21962/stack
> > > > > > [<0>] vfio_del_group_dev+0x216/0x3c0 [vfio] [<0>]
> > > > > > mdev_remove+0x21/0x40 [mdev] [<0>]
> > > > > > device_release_driver_internal+0xe8/0x1b0
> > > > > > [<0>] bus_remove_device+0xf9/0x170 [<0>] device_del+0x168/0x350
> > > > > > [<0>] mdev_device_remove_common+0x1d/0x50 [mdev] [<0>]
> > > > > > mdev_device_remove+0x8c/0xd0 [mdev] [<0>]  
> > > remove_store+0x71/0x90  
> > > > > > [mdev] [<0>] kernfs_fop_write+0x113/0x1a0 [<0>]
> > > > > > vfs_write+0xad/0x1b0 [<0>] ksys_write+0x5a/0xe0 [<0>]
> > > > > > do_syscall_64+0x5a/0x210 [<0>]
> > > > > > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > > > > [<0>] 0xffffffffffffffff
> > > > > >
> > > > > > This prepares the code to eliminate calling device_create_file()
> > > > > > in subsquent patch.  
> > >
> > > I find this stack dump and explanation more confusing than enlightening.
> > > Maybe just drop it?
> > >  
> > > > >
> > > > > I'm afraid I have a bit of a problem following this explanation,
> > > > > so let me try to summarize what the patch does to make sure that I
> > > > > understand it
> > > > > correctly:
> > > > >
> > > > > - Add the sysfs groups to device->groups so that the driver core deals
> > > > >   with proper registration/deregistration.
> > > > > - Split the device registration/deregistration sequence so that some
> > > > >   things can be done between initialization of the device and hooking
> > > > >   it up to the infrastructure respectively after deregistering it from
> > > > >   the infrastructure but before giving up our final reference. In
> > > > >   particular, this means invoking the ->create and ->remove callback in
> > > > >   those new windows. This gives the vendor driver an initialized mdev
> > > > >   device to work with during creation.
> > > > > - Don't allow ->remove to fail, as the device is already removed from
> > > > >   the infrastructure at that point in time.
> > > > >  
> > > > You got all the points pretty accurate.  
> > >
> > > Ok, good.
> > >  
> > > >  
> > > > > >
> > > > > > Signed-off-by: Parav Pandit <parav@mellanox.com>
> > > > > > ---
> > > > > >  drivers/vfio/mdev/mdev_core.c    | 94 +++++++++-----------------------
> > > > > >  drivers/vfio/mdev/mdev_private.h |  2 +-
> > > > > >  drivers/vfio/mdev/mdev_sysfs.c   |  2 +-
> > > > > >  3 files changed, 27 insertions(+), 71 deletions(-)  
> > > > >
> > > > > (...)  
> > >  
> > > > > > @@ -373,16 +330,15 @@ int mdev_device_remove(struct device  
> > *dev,  
> > > > > bool force_remove)  
> > > > > >  	mutex_unlock(&mdev_list_lock);
> > > > > >
> > > > > >  	type = to_mdev_type(mdev->type_kobj);
> > > > > > +	mdev_remove_sysfs_files(dev, type);
> > > > > > +	device_del(&mdev->dev);
> > > > > >  	parent = mdev->parent;
> > > > > > +	ret = parent->ops->remove(mdev);
> > > > > > +	if (ret)
> > > > > > +		dev_err(&mdev->dev, "Remove failed: err=%d\n",  
> > ret);  
> > > > >
> > > > > I think carrying on with removal regardless of the return code of
> > > > > the  
> > > > > ->remove callback makes sense, as it simply matches usual practice.  
> > > > > However, are we sure that every vendor driver works well with that?
> > > > > I think it should, as removal from bus unregistration (vs. from
> > > > > the sysfs
> > > > > file) was always something it could not veto, but have you looked
> > > > > at the individual drivers?
> > > > >  
> > > > I looked at following drivers a little while back.
> > > > Looked again now.
> > > >
> > > > drivers/gpu/drm/i915/gvt/kvmgt.c which clears the handle valid in  
> > > intel_vgpu_release(), which should finish first before remove() is invoked.  
> > > >
> > > > s390 vfio_ccw_mdev_remove() driver drivers/s390/cio/vfio_ccw_ops.c  
> > > remove() always returns 0.  
> > > > s39 crypo fails the remove() once vfio_ap_mdev_release marks kvm
> > > > null,  
> > > which should finish before remove() is invoked.
> > >
> > > That one is giving me a bit of a headache (the ->kvm reference is
> > > supposed to keep us from detaching while a vm is running), so let's cc:
> > > the vfio-ap maintainers to see whether they have any concerns.
> > >  
> > I probably wrote wrongly.
> > vfio_ap_mdev_remove() fails if the VM is already running (i.e.
> > vfio_ap_mdev_release() is not yet called).
> > 
> > And if VM is running it guarded by the vfio_mdev driver which is the one
> > who binds to the device mdev device.
> > That is why I shown the above stack trace in the commit log, indicating that
> > vfio driver is guarding it.
> >   
> > > > samples/vfio-mdev/mbochs.c mbochs_remove() always returns 0.
> > > >  
> > > > > >
> > > > > > -	ret = mdev_device_remove_ops(mdev, force_remove);
> > > > > > -	if (ret) {
> > > > > > -		mdev->active = true;
> > > > > > -		return ret;
> > > > > > -	}
> > > > > > -
> > > > > > -	mdev_remove_sysfs_files(dev, type);
> > > > > > -	device_unregister(dev);
> > > > > > +	/* Balances with device_initialize() */
> > > > > > +	put_device(&mdev->dev);
> > > > > >  	mdev_put_parent(parent);
> > > > > >
> > > > > >  	return 0;  
> > > > >
> > > > > I think that looks sane in general, but the commit message might
> > > > > benefit from tweaking.  
> > > > Part of your description is more crisp than my commit message, I can  
> > > probably take snippet from it to improve?  
> > > > Or any specific entries in commit message that I should address?  
> > >
> > > I have added some comments inline (mostly some wording tweaks).
> > >
> > > Feel free to take anything from my summary as well.  
> 
> I want to send v3 addressing commit log comment and take updated description from Cornelia, if this 3 patches looks reasonable enough.
> What do you think?

The kref removal in the last patch still makes me uncomfortable, but I
can't really find a reason to keep it or see any problems with the way
you're using refcount either.  It's probably good to see a new version
of the series regardless.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCHv2 08/10] vfio/mdev: Improve the create/remove sequence
  2019-05-14 22:20             ` Alex Williamson
@ 2019-05-15 20:42               ` Parav Pandit
  0 siblings, 0 replies; 31+ messages in thread
From: Parav Pandit @ 2019-05-15 20:42 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Cornelia Huck, kvm, linux-kernel, kwankhede, cjia, Tony Krowiak,
	Pierre Morel, Halil Pasic



> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, May 14, 2019 5:20 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Cornelia Huck <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
> kernel@vger.kernel.org; kwankhede@nvidia.com; cjia@nvidia.com; Tony
> Krowiak <akrowiak@linux.ibm.com>; Pierre Morel <pmorel@linux.ibm.com>;
> Halil Pasic <pasic@linux.ibm.com>
> Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> sequence
> 
> On Tue, 14 May 2019 20:34:12 +0000
> Parav Pandit <parav@mellanox.com> wrote:
> 
> > Hi Alex, Cornelia,
> >
> >
> > > -----Original Message-----
> > > From: linux-kernel-owner@vger.kernel.org <linux-kernel-
> > > owner@vger.kernel.org> On Behalf Of Parav Pandit
> > > Sent: Thursday, May 9, 2019 2:20 PM
> > > To: Cornelia Huck <cohuck@redhat.com>
> > > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > kwankhede@nvidia.com; alex.williamson@redhat.com; cjia@nvidia.com;
> > > Tony Krowiak <akrowiak@linux.ibm.com>; Pierre Morel
> > > <pmorel@linux.ibm.com>; Halil Pasic <pasic@linux.ibm.com>
> > > Subject: RE: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> > > sequence
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Cornelia Huck <cohuck@redhat.com>
> > > > Sent: Thursday, May 9, 2019 4:06 AM
> > > > To: Parav Pandit <parav@mellanox.com>
> > > > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > kwankhede@nvidia.com; alex.williamson@redhat.com;
> cjia@nvidia.com;
> > > > Tony Krowiak <akrowiak@linux.ibm.com>; Pierre Morel
> > > > <pmorel@linux.ibm.com>; Halil Pasic <pasic@linux.ibm.com>
> > > > Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> > > > sequence
> > > >
> > > > [vfio-ap folks: find a question regarding removal further down]
> > > >
> > > > On Wed, 8 May 2019 22:06:48 +0000
> > > > Parav Pandit <parav@mellanox.com> wrote:
> > > >
> > > > > > -----Original Message-----
> > > > > > From: Cornelia Huck <cohuck@redhat.com>
> > > > > > Sent: Wednesday, May 8, 2019 12:10 PM
> > > > > > To: Parav Pandit <parav@mellanox.com>
> > > > > > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > > > kwankhede@nvidia.com; alex.williamson@redhat.com;
> > > cjia@nvidia.com
> > > > > > Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the
> > > > > > create/remove sequence
> > > > > >
> > > > > > On Tue, 30 Apr 2019 17:49:35 -0500 Parav Pandit
> > > > > > <parav@mellanox.com> wrote:
> > > > > >
> > > > > > > This patch addresses below two issues and prepares the code
> > > > > > > to address 3rd issue listed below.
> > > > > > >
> > > > > > > 1. mdev device is placed on the mdev bus before it is
> > > > > > > created in the vendor driver. Once a device is placed on the
> > > > > > > mdev bus without creating its supporting underlying vendor
> > > > > > > device, mdev driver's
> > > > > > > probe()
> > > > > > gets triggered.
> > > > > > > However there isn't a stable mdev available to work on.
> > > > > > >
> > > > > > >    create_store()
> > > > > > >      mdev_create_device()
> > > > > > >        device_register()
> > > > > > >           ...
> > > > > > >          vfio_mdev_probe()
> > > > > > >         [...]
> > > > > > >         parent->ops->create()
> > > > > > >           vfio_ap_mdev_create()
> > > > > > >             mdev_set_drvdata(mdev, matrix_mdev);
> > > > > > >             /* Valid pointer set above */
> > > > > > >
> > > > > > > Due to this way of initialization, mdev driver who want to
> > > > > > > use the
> > > >
> > > > s/want/wants/
> > > >
> > > > > > > mdev, doesn't have a valid mdev to work on.
> > > > > > >
> > > > > > > 2. Current creation sequence is,
> > > > > > >    parent->ops_create()
> > > > > > >    groups_register()
> > > > > > >
> > > > > > > Remove sequence is,
> > > > > > >    parent->ops->remove()
> > > > > > >    groups_unregister()
> > > > > > >
> > > > > > > However, remove sequence should be exact mirror of creation
> > > > sequence.
> > > > > > > Once this is achieved, all users of the mdev will be
> > > > > > > terminated first before removing underlying vendor device.
> > > > > > > (Follow standard linux driver model).
> > > > > > > At that point vendor's remove() ops shouldn't failed because
> > > > > > > device is
> > > >
> > > > s/failed/fail/
> > > >
> > > > > > > taken off the bus that should terminate the users.
> > > >
> > > > "because taking the device off the bus should terminate any usage" ?
> > > >
> > > > > > >
> > > > > > > 3. When remove operation fails, mdev sysfs removal attempts
> > > > > > > to add the file back on already removed device. Following
> > > > > > > call trace [1] is
> > > > observed.
> > > > > > >
> > > > > > > [1] call trace:
> > > > > > > kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
> > > > > > > sysfs_create_file_ns+0x7f/0x90
> > > > > > > kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not
> > > > > > > tainted 5.1.0-rc6-vdevbus+ #6
> > > > > > > kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+,
> > > > > > > BIOS 2.0b
> > > > > > > 08/09/2016
> > > > > > > kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
> > > > > > > kernel: Call Trace:
> > > > > > > kernel: remove_store+0xdc/0x100 [mdev]
> > > > > > > kernel: kernfs_fop_write+0x113/0x1a0
> > > > > > > kernel: vfs_write+0xad/0x1b0
> > > > > > > kernel: ksys_write+0x5a/0xe0
> > > > > > > kernel: do_syscall_64+0x5a/0x210
> > > > > > > kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > > > > >
> > > > > > > Therefore, mdev core is improved in following ways.
> > > > > > >
> > > > > > > 1. Before placing mdev devices on the bus, perform vendor
> > > > > > > drivers creation which supports the mdev creation.
> > > >
> > > > "invoke the vendor driver ->create callback" ?
> > > >
> > > > > > > This ensures that mdev specific all necessary fields are
> > > > > > > initialized
> > > >
> > > > "that all necessary mdev specific fields are initialized" ?
> > > >
> > > > > > > before a given mdev can be accessed by bus driver.
> > > > > > > This follows standard Linux kernel bus and device model
> > > > > > > similar to other widely used PCI bus.
> > > >
> > > > "This follows standard practice on other Linux device model buses." ?
> > > >
> > > > > > >
> > > > > > > 2. During remove flow, first remove the device from the bus.
> > > > > > > This ensures that any bus specific devices and data is cleared.
> > > > > > > Once device is taken of the mdev bus, perform remove() of
> > > > > > > mdev from
> > > >
> > > > s/of/off/
> > > >
> > > > > > > the vendor driver.
> > > > > > >
> > > > > > > 3. Linux core device model provides way to register and auto
> > > > > > > unregister the device sysfs attribute groups at dev->groups.
> > > >
> > > > "The driver core provides a way to automatically register and
> > > > unregister sysfs attributes via dev->groups." ?
> > > >
> > > > > > > Make use of this groups to let core create the groups and
> > > > > > > simplify code to avoid explicit groups creation and removal.
> > > > > > >
> > > > > > > A below stack dump of a mdev device remove process also
> > > > > > > ensures that vfio driver guards against device removal already in
> use.
> > > > > > >
> > > > > > >  cat /proc/21962/stack
> > > > > > > [<0>] vfio_del_group_dev+0x216/0x3c0 [vfio] [<0>]
> > > > > > > mdev_remove+0x21/0x40 [mdev] [<0>]
> > > > > > > device_release_driver_internal+0xe8/0x1b0
> > > > > > > [<0>] bus_remove_device+0xf9/0x170 [<0>]
> > > > > > > device_del+0x168/0x350 [<0>]
> > > > > > > mdev_device_remove_common+0x1d/0x50 [mdev] [<0>]
> > > > > > > mdev_device_remove+0x8c/0xd0 [mdev] [<0>]
> > > > remove_store+0x71/0x90
> > > > > > > [mdev] [<0>] kernfs_fop_write+0x113/0x1a0 [<0>]
> > > > > > > vfs_write+0xad/0x1b0 [<0>] ksys_write+0x5a/0xe0 [<0>]
> > > > > > > do_syscall_64+0x5a/0x210 [<0>]
> > > > > > > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > > > > > [<0>] 0xffffffffffffffff
> > > > > > >
> > > > > > > This prepares the code to eliminate calling
> > > > > > > device_create_file() in subsquent patch.
> > > >
> > > > I find this stack dump and explanation more confusing than
> enlightening.
> > > > Maybe just drop it?
> > > >
> > > > > >
> > > > > > I'm afraid I have a bit of a problem following this
> > > > > > explanation, so let me try to summarize what the patch does to
> > > > > > make sure that I understand it
> > > > > > correctly:
> > > > > >
> > > > > > - Add the sysfs groups to device->groups so that the driver core
> deals
> > > > > >   with proper registration/deregistration.
> > > > > > - Split the device registration/deregistration sequence so that some
> > > > > >   things can be done between initialization of the device and
> hooking
> > > > > >   it up to the infrastructure respectively after deregistering it from
> > > > > >   the infrastructure but before giving up our final reference. In
> > > > > >   particular, this means invoking the ->create and ->remove callback
> in
> > > > > >   those new windows. This gives the vendor driver an initialized
> mdev
> > > > > >   device to work with during creation.
> > > > > > - Don't allow ->remove to fail, as the device is already removed
> from
> > > > > >   the infrastructure at that point in time.
> > > > > >
> > > > > You got all the points pretty accurate.
> > > >
> > > > Ok, good.
> > > >
> > > > >
> > > > > > >
> > > > > > > Signed-off-by: Parav Pandit <parav@mellanox.com>
> > > > > > > ---
> > > > > > >  drivers/vfio/mdev/mdev_core.c    | 94 +++++++++-----------------------
> > > > > > >  drivers/vfio/mdev/mdev_private.h |  2 +-
> > > > > > >  drivers/vfio/mdev/mdev_sysfs.c   |  2 +-
> > > > > > >  3 files changed, 27 insertions(+), 71 deletions(-)
> > > > > >
> > > > > > (...)
> > > >
> > > > > > > @@ -373,16 +330,15 @@ int mdev_device_remove(struct device
> > > *dev,
> > > > > > bool force_remove)
> > > > > > >  	mutex_unlock(&mdev_list_lock);
> > > > > > >
> > > > > > >  	type = to_mdev_type(mdev->type_kobj);
> > > > > > > +	mdev_remove_sysfs_files(dev, type);
> > > > > > > +	device_del(&mdev->dev);
> > > > > > >  	parent = mdev->parent;
> > > > > > > +	ret = parent->ops->remove(mdev);
> > > > > > > +	if (ret)
> > > > > > > +		dev_err(&mdev->dev, "Remove failed: err=%d\n",
> > > ret);
> > > > > >
> > > > > > I think carrying on with removal regardless of the return code
> > > > > > of the
> > > > > > ->remove callback makes sense, as it simply matches usual practice.
> > > > > > However, are we sure that every vendor driver works well with
> that?
> > > > > > I think it should, as removal from bus unregistration (vs.
> > > > > > from the sysfs
> > > > > > file) was always something it could not veto, but have you
> > > > > > looked at the individual drivers?
> > > > > >
> > > > > I looked at following drivers a little while back.
> > > > > Looked again now.
> > > > >
> > > > > drivers/gpu/drm/i915/gvt/kvmgt.c which clears the handle valid
> > > > > in
> > > > intel_vgpu_release(), which should finish first before remove() is
> invoked.
> > > > >
> > > > > s390 vfio_ccw_mdev_remove() driver
> > > > > drivers/s390/cio/vfio_ccw_ops.c
> > > > remove() always returns 0.
> > > > > s39 crypo fails the remove() once vfio_ap_mdev_release marks kvm
> > > > > null,
> > > > which should finish before remove() is invoked.
> > > >
> > > > That one is giving me a bit of a headache (the ->kvm reference is
> > > > supposed to keep us from detaching while a vm is running), so let's cc:
> > > > the vfio-ap maintainers to see whether they have any concerns.
> > > >
> > > I probably wrote wrongly.
> > > vfio_ap_mdev_remove() fails if the VM is already running (i.e.
> > > vfio_ap_mdev_release() is not yet called).
> > >
> > > And if VM is running it guarded by the vfio_mdev driver which is the
> > > one who binds to the device mdev device.
> > > That is why I shown the above stack trace in the commit log,
> > > indicating that vfio driver is guarding it.
> > >
> > > > > samples/vfio-mdev/mbochs.c mbochs_remove() always returns 0.
> > > > >
> > > > > > >
> > > > > > > -	ret = mdev_device_remove_ops(mdev, force_remove);
> > > > > > > -	if (ret) {
> > > > > > > -		mdev->active = true;
> > > > > > > -		return ret;
> > > > > > > -	}
> > > > > > > -
> > > > > > > -	mdev_remove_sysfs_files(dev, type);
> > > > > > > -	device_unregister(dev);
> > > > > > > +	/* Balances with device_initialize() */
> > > > > > > +	put_device(&mdev->dev);
> > > > > > >  	mdev_put_parent(parent);
> > > > > > >
> > > > > > >  	return 0;
> > > > > >
> > > > > > I think that looks sane in general, but the commit message
> > > > > > might benefit from tweaking.
> > > > > Part of your description is more crisp than my commit message, I
> > > > > can
> > > > probably take snippet from it to improve?
> > > > > Or any specific entries in commit message that I should address?
> > > >
> > > > I have added some comments inline (mostly some wording tweaks).
> > > >
> > > > Feel free to take anything from my summary as well.
> >
> > I want to send v3 addressing commit log comment and take updated
> description from Cornelia, if this 3 patches looks reasonable enough.
> > What do you think?
> 
> The kref removal in the last patch still makes me uncomfortable, but I can't
> really find a reason to keep it or see any problems with the way you're using
> refcount either.  It's probably good to see a new version of the series
> regardless.  Thanks,
> 
Ok. thanks. Right. I will send these 3 patches with updated cover letter and commit log enhancements from Cornelia.

> Alex

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2019-05-15 20:42 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-30 22:49 [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module Parav Pandit
2019-04-30 22:49 ` [PATCHv2 01/10] vfio/mdev: Avoid release parent reference during error path Parav Pandit
2019-04-30 22:49 ` [PATCHv2 02/10] vfio/mdev: Removed unused kref Parav Pandit
2019-04-30 22:49 ` [PATCHv2 03/10] vfio/mdev: Drop redundant extern for exported symbols Parav Pandit
2019-04-30 22:49 ` [PATCHv2 04/10] vfio/mdev: Avoid masking error code to EBUSY Parav Pandit
2019-04-30 22:49 ` [PATCHv2 05/10] vfio/mdev: Follow correct remove sequence Parav Pandit
2019-04-30 22:49 ` [PATCHv2 06/10] vfio/mdev: Fix aborting mdev child device removal if one fails Parav Pandit
2019-04-30 22:49 ` [PATCHv2 07/10] vfio/mdev: Avoid inline get and put parent helpers Parav Pandit
2019-04-30 22:49 ` [PATCHv2 08/10] vfio/mdev: Improve the create/remove sequence Parav Pandit
2019-05-08 17:09   ` Cornelia Huck
2019-05-08 22:06     ` Parav Pandit
2019-05-09  9:06       ` Cornelia Huck
2019-05-09 16:26         ` Pierre Morel
2019-05-09 22:12           ` Halil Pasic
2019-05-09 19:19         ` Parav Pandit
2019-05-10  7:08           ` Pierre Morel
2019-05-14 20:34           ` Parav Pandit
2019-05-14 22:20             ` Alex Williamson
2019-05-15 20:42               ` Parav Pandit
2019-04-30 22:49 ` [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove file on stale device removal Parav Pandit
2019-05-08 17:16   ` Cornelia Huck
2019-05-08 22:13     ` Parav Pandit
2019-05-09  9:18       ` Cornelia Huck
2019-05-09 16:16         ` Parav Pandit
2019-04-30 22:49 ` [PATCHv2 10/10] vfio/mdev: Synchronize device create/remove with parent removal Parav Pandit
2019-05-09  2:46   ` Alex Williamson
2019-05-09  9:49     ` Cornelia Huck
2019-05-09 16:14       ` Parav Pandit
2019-05-09 16:03     ` Parav Pandit
2019-05-06 22:03 ` [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module Alex Williamson
2019-05-06 23:22   ` Parav Pandit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).