linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv4 0/3] vfio/mdev: Improve vfio/mdev core module
@ 2019-05-24 13:57 Parav Pandit
  2019-05-24 13:57 ` [PATCHv4 1/3] vfio/mdev: Improve the create/remove sequence Parav Pandit
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Parav Pandit @ 2019-05-24 13:57 UTC (permalink / raw)
  To: kvm, linux-kernel, cohuck, kwankhede, alex.williamson; +Cc: cjia, parav

As we would like to use mdev subsystem for wider use case as
discussed in [1], [2] apart from an offline discussion.
This use case is also discussed with wider forum in [4] in track
'Lightweight NIC HW functions for container offload use cases'.

This series is prep-work and improves vfio/mdev module in following ways.

Patch-1 Improves the mdev create/remove sequence to match Linux
bus, device model
Patch-2 Avoid recreating remove file on stale device to eliminate
call trace
Patch-3 Fix race conditions of create/remove with parent removal.
This is improved version than using srcu as srcu can take seconds
to minutes.

This series is tested using
(a) mtty with VM using vfio_mdev driver for positive tests and device
removal while device in use by VM using vfio_mdev driver.

(b) mlx5 core driver using RFC patches [3] and internal patches.
Internal patches are large and cannot be combined with this prep-work
patches. It will posted once prep-work completes.

[1] https://www.spinics.net/lists/netdev/msg556978.html
[2] https://lkml.org/lkml/2019/3/7/696
[3] https://lkml.org/lkml/2019/3/8/819
[4] https://netdevconf.org/0x13/session.html?workshop-hardware-offload

---
Changelog:
---
v3->v4:
 - Addressed comments from Cornelia for unbalanced mutex_unlock
 - Correct typo of subsquent to subsequent in patch-1 commit log
 - Instead of using refcount and completion, using rwsem to synchronize
   between mdev creation/deletion and parent unregistration
v2->v3:
 - Addressed comment from Cornelia
 - Corrected several errors in commit log, updated commit log
 - Dropped already merged 7 patches
v1->v2:
 - Addressed comments from Alex
 - Rebased
 - Inserted the device checking loop in Patch-6 as original code
 - Added patch 7 to 10
 - Added fixes for race condition in create/remove with parent removal
   Patch-10 uses simplified refcount and completion, instead of srcu
   which might take seconds to minutes on busy system.
 - Added fix for device create/remove sequence to match
   Linux device, bus model
v0->v1:
 - Dropped device placement on bus sequence patch for this series
 - Addressed below comments from Alex, Kirti, Maxim.
 - Added Review-by tag for already reviewed patches.
 - Dropped incorrect patch of put_device().
 - Corrected Fixes commit tag for sysfs remove sequence fix
 - Split last 8th patch to smaller refactor and fixes patch
 - Following coding style commenting format
 - Fixed accidental delete of mutex_lock in mdev_unregister_device
 - Renamed remove helped to mdev_device_remove_common().
 - Rebased for uuid/guid change

Parav Pandit (3):
  vfio/mdev: Improve the create/remove sequence
  vfio/mdev: Avoid creating sysfs remove file on stale device removal
  vfio/mdev: Synchronize device create/remove with parent removal

 drivers/vfio/mdev/mdev_core.c    | 125 ++++++++++++++-----------------
 drivers/vfio/mdev/mdev_private.h |   4 +-
 drivers/vfio/mdev/mdev_sysfs.c   |   6 +-
 3 files changed, 62 insertions(+), 73 deletions(-)

-- 
2.19.2


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCHv4 1/3] vfio/mdev: Improve the create/remove sequence
  2019-05-24 13:57 [PATCHv4 0/3] vfio/mdev: Improve vfio/mdev core module Parav Pandit
@ 2019-05-24 13:57 ` Parav Pandit
  2019-05-24 13:57 ` [PATCHv4 2/3] vfio/mdev: Avoid creating sysfs remove file on stale device removal Parav Pandit
  2019-05-24 13:57 ` [PATCHv4 3/3] vfio/mdev: Synchronize device create/remove with parent removal Parav Pandit
  2 siblings, 0 replies; 6+ messages in thread
From: Parav Pandit @ 2019-05-24 13:57 UTC (permalink / raw)
  To: kvm, linux-kernel, cohuck, kwankhede, alex.williamson; +Cc: cjia, parav

This patch addresses below two issues and prepares the code to address
3rd issue listed below.

1. mdev device is placed on the mdev bus before it is created in the
vendor driver. Once a device is placed on the mdev bus without creating
its supporting underlying vendor device, mdev driver's probe() gets triggered.
However there isn't a stable mdev available to work on.

   create_store()
     mdev_create_device()
       device_register()
          ...
         vfio_mdev_probe()
        [...]
        parent->ops->create()
          vfio_ap_mdev_create()
            mdev_set_drvdata(mdev, matrix_mdev);
            /* Valid pointer set above */

Due to this way of initialization, mdev driver who wants to use the mdev,
doesn't have a valid mdev to work on.

2. Current creation sequence is,
   parent->ops_create()
   groups_register()

Remove sequence is,
   parent->ops->remove()
   groups_unregister()

However, remove sequence should be exact mirror of creation sequence.
Once this is achieved, all users of the mdev will be terminated first
before removing underlying vendor device.
(Follow standard linux driver model).
At that point vendor's remove() ops shouldn't fail because taking the
device off the bus should terminate any usage.

3. When remove operation fails, mdev sysfs removal attempts to add the
file back on already removed device. Following call trace [1] is observed.

[1] call trace:
kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327 sysfs_create_file_ns+0x7f/0x90
kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
kernel: Call Trace:
kernel: remove_store+0xdc/0x100 [mdev]
kernel: kernfs_fop_write+0x113/0x1a0
kernel: vfs_write+0xad/0x1b0
kernel: ksys_write+0x5a/0xe0
kernel: do_syscall_64+0x5a/0x210
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

Therefore, mdev core is improved in following ways.

1. Split the device registration/deregistration sequence so that some
things can be done between initialization of the device and
hooking it up to the bus respectively after deregistering it
from the bus but before giving up our final reference.
In particular, this means invoking the ->create and ->remove
callbacks in those new windows. This gives the vendor driver an
initialized mdev device to work with during creation.
At the same time, a bus driver who wish to bind to mdev driver also
gets initialized mdev device.

This follows standard Linux kernel bus and device model.

2. During remove flow, first remove the device from the bus. This
ensures that any bus specific devices are removed.
Once device is taken off the mdev bus, invoke remove() of mdev
from the vendor driver.

3. The driver core device model provides way to register and auto
unregister the device sysfs attribute groups at dev->groups.
Make use of dev->groups to let core create the groups and eliminate
code to avoid explicit groups creation and removal.

To ensure, that new sequence is solid, a below stack dump of a
process is taken who attempts to remove the device while device is in
use by vfio driver and user application.
This stack dump validates that vfio driver guards against such device
removal when device is in use.

 cat /proc/21962/stack
[<0>] vfio_del_group_dev+0x216/0x3c0 [vfio]
[<0>] mdev_remove+0x21/0x40 [mdev]
[<0>] device_release_driver_internal+0xe8/0x1b0
[<0>] bus_remove_device+0xf9/0x170
[<0>] device_del+0x168/0x350
[<0>] mdev_device_remove_common+0x1d/0x50 [mdev]
[<0>] mdev_device_remove+0x8c/0xd0 [mdev]
[<0>] remove_store+0x71/0x90 [mdev]
[<0>] kernfs_fop_write+0x113/0x1a0
[<0>] vfs_write+0xad/0x1b0
[<0>] ksys_write+0x5a/0xe0
[<0>] do_syscall_64+0x5a/0x210
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0xffffffffffffffff

This prepares the code to eliminate calling device_create_file() in
subsequent patch.

Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 drivers/vfio/mdev/mdev_core.c    | 94 +++++++++-----------------------
 drivers/vfio/mdev/mdev_private.h |  2 +-
 drivers/vfio/mdev/mdev_sysfs.c   |  2 +-
 3 files changed, 27 insertions(+), 71 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 3cc1a05fde1c..0bef0cae1d4b 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -102,55 +102,10 @@ static void mdev_put_parent(struct mdev_parent *parent)
 		kref_put(&parent->ref, mdev_release_parent);
 }
 
-static int mdev_device_create_ops(struct kobject *kobj,
-				  struct mdev_device *mdev)
-{
-	struct mdev_parent *parent = mdev->parent;
-	int ret;
-
-	ret = parent->ops->create(kobj, mdev);
-	if (ret)
-		return ret;
-
-	ret = sysfs_create_groups(&mdev->dev.kobj,
-				  parent->ops->mdev_attr_groups);
-	if (ret)
-		parent->ops->remove(mdev);
-
-	return ret;
-}
-
-/*
- * mdev_device_remove_ops gets called from sysfs's 'remove' and when parent
- * device is being unregistered from mdev device framework.
- * - 'force_remove' is set to 'false' when called from sysfs's 'remove' which
- *   indicates that if the mdev device is active, used by VMM or userspace
- *   application, vendor driver could return error then don't remove the device.
- * - 'force_remove' is set to 'true' when called from mdev_unregister_device()
- *   which indicate that parent device is being removed from mdev device
- *   framework so remove mdev device forcefully.
- */
-static int mdev_device_remove_ops(struct mdev_device *mdev, bool force_remove)
-{
-	struct mdev_parent *parent = mdev->parent;
-	int ret;
-
-	/*
-	 * Vendor driver can return error if VMM or userspace application is
-	 * using this mdev device.
-	 */
-	ret = parent->ops->remove(mdev);
-	if (ret && !force_remove)
-		return ret;
-
-	sysfs_remove_groups(&mdev->dev.kobj, parent->ops->mdev_attr_groups);
-	return 0;
-}
-
 static int mdev_device_remove_cb(struct device *dev, void *data)
 {
 	if (dev_is_mdev(dev))
-		mdev_device_remove(dev, true);
+		mdev_device_remove(dev);
 
 	return 0;
 }
@@ -310,41 +265,43 @@ int mdev_device_create(struct kobject *kobj,
 
 	mdev->parent = parent;
 
+	device_initialize(&mdev->dev);
 	mdev->dev.parent  = dev;
 	mdev->dev.bus     = &mdev_bus_type;
 	mdev->dev.release = mdev_device_release;
 	dev_set_name(&mdev->dev, "%pUl", uuid);
+	mdev->dev.groups = parent->ops->mdev_attr_groups;
+	mdev->type_kobj = kobj;
 
-	ret = device_register(&mdev->dev);
-	if (ret) {
-		put_device(&mdev->dev);
-		goto mdev_fail;
-	}
+	ret = parent->ops->create(kobj, mdev);
+	if (ret)
+		goto ops_create_fail;
 
-	ret = mdev_device_create_ops(kobj, mdev);
+	ret = device_add(&mdev->dev);
 	if (ret)
-		goto create_fail;
+		goto add_fail;
 
 	ret = mdev_create_sysfs_files(&mdev->dev, type);
-	if (ret) {
-		mdev_device_remove_ops(mdev, true);
-		goto create_fail;
-	}
+	if (ret)
+		goto sysfs_fail;
 
-	mdev->type_kobj = kobj;
 	mdev->active = true;
 	dev_dbg(&mdev->dev, "MDEV: created\n");
 
 	return 0;
 
-create_fail:
-	device_unregister(&mdev->dev);
+sysfs_fail:
+	device_del(&mdev->dev);
+add_fail:
+	parent->ops->remove(mdev);
+ops_create_fail:
+	put_device(&mdev->dev);
 mdev_fail:
 	mdev_put_parent(parent);
 	return ret;
 }
 
-int mdev_device_remove(struct device *dev, bool force_remove)
+int mdev_device_remove(struct device *dev)
 {
 	struct mdev_device *mdev, *tmp;
 	struct mdev_parent *parent;
@@ -373,16 +330,15 @@ int mdev_device_remove(struct device *dev, bool force_remove)
 	mutex_unlock(&mdev_list_lock);
 
 	type = to_mdev_type(mdev->type_kobj);
+	mdev_remove_sysfs_files(dev, type);
+	device_del(&mdev->dev);
 	parent = mdev->parent;
+	ret = parent->ops->remove(mdev);
+	if (ret)
+		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
 
-	ret = mdev_device_remove_ops(mdev, force_remove);
-	if (ret) {
-		mdev->active = true;
-		return ret;
-	}
-
-	mdev_remove_sysfs_files(dev, type);
-	device_unregister(dev);
+	/* Balances with device_initialize() */
+	put_device(&mdev->dev);
 	mdev_put_parent(parent);
 
 	return 0;
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index 36cbbdb754de..924ed2274941 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -60,6 +60,6 @@ void mdev_remove_sysfs_files(struct device *dev, struct mdev_type *type);
 
 int  mdev_device_create(struct kobject *kobj,
 			struct device *dev, const guid_t *uuid);
-int  mdev_device_remove(struct device *dev, bool force_remove);
+int  mdev_device_remove(struct device *dev);
 
 #endif /* MDEV_PRIVATE_H */
diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index cbf94b8165ea..9f774b91d275 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -236,7 +236,7 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
 	if (val && device_remove_file_self(dev, attr)) {
 		int ret;
 
-		ret = mdev_device_remove(dev, false);
+		ret = mdev_device_remove(dev);
 		if (ret) {
 			device_create_file(dev, attr);
 			return ret;
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCHv4 2/3] vfio/mdev: Avoid creating sysfs remove file on stale device removal
  2019-05-24 13:57 [PATCHv4 0/3] vfio/mdev: Improve vfio/mdev core module Parav Pandit
  2019-05-24 13:57 ` [PATCHv4 1/3] vfio/mdev: Improve the create/remove sequence Parav Pandit
@ 2019-05-24 13:57 ` Parav Pandit
  2019-05-24 13:57 ` [PATCHv4 3/3] vfio/mdev: Synchronize device create/remove with parent removal Parav Pandit
  2 siblings, 0 replies; 6+ messages in thread
From: Parav Pandit @ 2019-05-24 13:57 UTC (permalink / raw)
  To: kvm, linux-kernel, cohuck, kwankhede, alex.williamson; +Cc: cjia, parav

If device is removal is initiated by two threads as below, mdev core
attempts to create a syfs remove file on stale device.
During this flow, below [1] call trace is observed.

     cpu-0                                    cpu-1
     -----                                    -----
  mdev_unregister_device()
    device_for_each_child
       mdev_device_remove_cb
          mdev_device_remove
                                       user_syscall
                                         remove_store()
                                           mdev_device_remove()
                                        [..]
   unregister device();
                                       /* not found in list or
                                        * active=false.
                                        */
                                          sysfs_create_file()
                                          ..Call trace

Now that mdev core follows correct device removal system of the linux
bus model, remove shouldn't fail in normal cases. If it fails, there is
no point of creating a stale file or checking for specific error status.

kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
sysfs_create_file_ns+0x7f/0x90
kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
5.1.0-rc6-vdevbus+ #6
kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b
08/09/2016
kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
kernel: Call Trace:
kernel: remove_store+0xdc/0x100 [mdev]
kernel: kernfs_fop_write+0x113/0x1a0
kernel: vfs_write+0xad/0x1b0
kernel: ksys_write+0x5a/0xe0
kernel: do_syscall_64+0x5a/0x210
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 drivers/vfio/mdev/mdev_sysfs.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index 9f774b91d275..ffa3dcebf201 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -237,10 +237,8 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
 		int ret;
 
 		ret = mdev_device_remove(dev);
-		if (ret) {
-			device_create_file(dev, attr);
+		if (ret)
 			return ret;
-		}
 	}
 
 	return count;
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCHv4 3/3] vfio/mdev: Synchronize device create/remove with parent removal
  2019-05-24 13:57 [PATCHv4 0/3] vfio/mdev: Improve vfio/mdev core module Parav Pandit
  2019-05-24 13:57 ` [PATCHv4 1/3] vfio/mdev: Improve the create/remove sequence Parav Pandit
  2019-05-24 13:57 ` [PATCHv4 2/3] vfio/mdev: Avoid creating sysfs remove file on stale device removal Parav Pandit
@ 2019-05-24 13:57 ` Parav Pandit
  2019-05-29 14:56   ` Alex Williamson
  2 siblings, 1 reply; 6+ messages in thread
From: Parav Pandit @ 2019-05-24 13:57 UTC (permalink / raw)
  To: kvm, linux-kernel, cohuck, kwankhede, alex.williamson; +Cc: cjia, parav

In following sequences, child devices created while removing mdev parent
device can be left out, or it may lead to race of removing half
initialized child mdev devices.

issue-1:
--------
       cpu-0                         cpu-1
       -----                         -----
                                  mdev_unregister_device()
                                    device_for_each_child()
                                      mdev_device_remove_cb()
                                        mdev_device_remove()
create_store()
  mdev_device_create()                   [...]
    device_add()
                                  parent_remove_sysfs_files()

/* BUG: device added by cpu-0
 * whose parent is getting removed
 * and it won't process this mdev.
 */

issue-2:
--------
Below crash is observed when user initiated remove is in progress
and mdev_unregister_driver() completes parent unregistration.

       cpu-0                         cpu-1
       -----                         -----
remove_store()
   mdev_device_remove()
   active = false;
                                  mdev_unregister_device()
                                  parent device removed.
   [...]
   parents->ops->remove()
 /*
  * BUG: Accessing invalid parent.
  */

This is similar race like create() racing with mdev_unregister_device().

BUG: unable to handle kernel paging request at ffffffffc0585668
PGD e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
Oops: 0000 [#1] SMP PTI
CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev]
Call Trace:
 remove_store+0x71/0x90 [mdev]
 kernfs_fop_write+0x113/0x1a0
 vfs_write+0xad/0x1b0
 ksys_write+0x5a/0xe0
 do_syscall_64+0x5a/0x210
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Therefore, mdev core is improved as below to overcome above issues.

Wait for any ongoing mdev create() and remove() to finish before
unregistering parent device.
This continues to allow multiple create and remove to progress in
parallel for different mdev devices as most common case.
At the same time guard parent removal while parent is being access by
create() and remove callbacks.
create()/remove() and unregister_device() are synchronized by the rwsem.

Refactor device removal code to mdev_device_remove_common() to avoid
acquiring unreg_sem of the parent.

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 drivers/vfio/mdev/mdev_core.c    | 61 ++++++++++++++++++++++++--------
 drivers/vfio/mdev/mdev_private.h |  2 ++
 2 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 0bef0cae1d4b..c5401a8c6843 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -102,11 +102,36 @@ static void mdev_put_parent(struct mdev_parent *parent)
 		kref_put(&parent->ref, mdev_release_parent);
 }
 
+static void mdev_device_remove_common(struct mdev_device *mdev)
+{
+	struct mdev_parent *parent;
+	struct mdev_type *type;
+	int ret;
+
+	type = to_mdev_type(mdev->type_kobj);
+	mdev_remove_sysfs_files(&mdev->dev, type);
+	device_del(&mdev->dev);
+	parent = mdev->parent;
+	ret = parent->ops->remove(mdev);
+	if (ret)
+		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
+
+	/* Balances with device_initialize() */
+	put_device(&mdev->dev);
+	mdev_put_parent(parent);
+}
+
 static int mdev_device_remove_cb(struct device *dev, void *data)
 {
-	if (dev_is_mdev(dev))
-		mdev_device_remove(dev);
+	struct mdev_parent *parent;
+	struct mdev_device *mdev;
 
+	if (!dev_is_mdev(dev))
+		return 0;
+
+	mdev = to_mdev_device(dev);
+	parent = mdev->parent;
+	mdev_device_remove_common(mdev);
 	return 0;
 }
 
@@ -148,6 +173,7 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
 	}
 
 	kref_init(&parent->ref);
+	init_rwsem(&parent->unreg_sem);
 
 	parent->dev = dev;
 	parent->ops = ops;
@@ -206,13 +232,17 @@ void mdev_unregister_device(struct device *dev)
 	dev_info(dev, "MDEV: Unregistering\n");
 
 	list_del(&parent->next);
+	mutex_unlock(&parent_list_lock);
+
+	down_write(&parent->unreg_sem);
+
 	class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
 
 	device_for_each_child(dev, NULL, mdev_device_remove_cb);
 
 	parent_remove_sysfs_files(parent);
+	up_write(&parent->unreg_sem);
 
-	mutex_unlock(&parent_list_lock);
 	mdev_put_parent(parent);
 }
 EXPORT_SYMBOL(mdev_unregister_device);
@@ -265,6 +295,12 @@ int mdev_device_create(struct kobject *kobj,
 
 	mdev->parent = parent;
 
+	ret = down_read_trylock(&parent->unreg_sem);
+	if (!ret) {
+		ret = -ENODEV;
+		goto mdev_fail;
+	}
+
 	device_initialize(&mdev->dev);
 	mdev->dev.parent  = dev;
 	mdev->dev.bus     = &mdev_bus_type;
@@ -287,6 +323,7 @@ int mdev_device_create(struct kobject *kobj,
 
 	mdev->active = true;
 	dev_dbg(&mdev->dev, "MDEV: created\n");
+	up_read(&parent->unreg_sem);
 
 	return 0;
 
@@ -295,6 +332,7 @@ int mdev_device_create(struct kobject *kobj,
 add_fail:
 	parent->ops->remove(mdev);
 ops_create_fail:
+	up_read(&parent->unreg_sem);
 	put_device(&mdev->dev);
 mdev_fail:
 	mdev_put_parent(parent);
@@ -305,7 +343,6 @@ int mdev_device_remove(struct device *dev)
 {
 	struct mdev_device *mdev, *tmp;
 	struct mdev_parent *parent;
-	struct mdev_type *type;
 	int ret;
 
 	mdev = to_mdev_device(dev);
@@ -329,18 +366,14 @@ int mdev_device_remove(struct device *dev)
 	mdev->active = false;
 	mutex_unlock(&mdev_list_lock);
 
-	type = to_mdev_type(mdev->type_kobj);
-	mdev_remove_sysfs_files(dev, type);
-	device_del(&mdev->dev);
 	parent = mdev->parent;
-	ret = parent->ops->remove(mdev);
-	if (ret)
-		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
-
-	/* Balances with device_initialize() */
-	put_device(&mdev->dev);
-	mdev_put_parent(parent);
+	/* Check if parent unregistration has started */
+	ret = down_read_trylock(&parent->unreg_sem);
+	if (!ret)
+		return -ENODEV;
 
+	mdev_device_remove_common(mdev);
+	up_read(&parent->unreg_sem);
 	return 0;
 }
 
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index 924ed2274941..398767526276 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -23,6 +23,8 @@ struct mdev_parent {
 	struct list_head next;
 	struct kset *mdev_types_kset;
 	struct list_head type_list;
+	/* Synchronize device creation/removal with parent unregistration */
+	struct rw_semaphore unreg_sem;
 };
 
 struct mdev_device {
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCHv4 3/3] vfio/mdev: Synchronize device create/remove with parent removal
  2019-05-24 13:57 ` [PATCHv4 3/3] vfio/mdev: Synchronize device create/remove with parent removal Parav Pandit
@ 2019-05-29 14:56   ` Alex Williamson
  2019-05-30  8:58     ` Parav Pandit
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Williamson @ 2019-05-29 14:56 UTC (permalink / raw)
  To: Parav Pandit; +Cc: kvm, linux-kernel, cohuck, kwankhede, cjia

On Fri, 24 May 2019 08:57:38 -0500
Parav Pandit <parav@mellanox.com> wrote:

> In following sequences, child devices created while removing mdev parent
> device can be left out, or it may lead to race of removing half
> initialized child mdev devices.
> 
> issue-1:
> --------
>        cpu-0                         cpu-1
>        -----                         -----
>                                   mdev_unregister_device()
>                                     device_for_each_child()
>                                       mdev_device_remove_cb()
>                                         mdev_device_remove()
> create_store()
>   mdev_device_create()                   [...]
>     device_add()
>                                   parent_remove_sysfs_files()
> 
> /* BUG: device added by cpu-0
>  * whose parent is getting removed
>  * and it won't process this mdev.
>  */
> 
> issue-2:
> --------
> Below crash is observed when user initiated remove is in progress
> and mdev_unregister_driver() completes parent unregistration.
> 
>        cpu-0                         cpu-1
>        -----                         -----
> remove_store()
>    mdev_device_remove()
>    active = false;
>                                   mdev_unregister_device()
>                                   parent device removed.
>    [...]
>    parents->ops->remove()
>  /*
>   * BUG: Accessing invalid parent.
>   */
> 
> This is similar race like create() racing with mdev_unregister_device().
> 
> BUG: unable to handle kernel paging request at ffffffffc0585668
> PGD e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
> Oops: 0000 [#1] SMP PTI
> CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
> Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
> RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev]
> Call Trace:
>  remove_store+0x71/0x90 [mdev]
>  kernfs_fop_write+0x113/0x1a0
>  vfs_write+0xad/0x1b0
>  ksys_write+0x5a/0xe0
>  do_syscall_64+0x5a/0x210
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Therefore, mdev core is improved as below to overcome above issues.
> 
> Wait for any ongoing mdev create() and remove() to finish before
> unregistering parent device.
> This continues to allow multiple create and remove to progress in
> parallel for different mdev devices as most common case.
> At the same time guard parent removal while parent is being access by
> create() and remove callbacks.
> create()/remove() and unregister_device() are synchronized by the rwsem.
> 
> Refactor device removal code to mdev_device_remove_common() to avoid
> acquiring unreg_sem of the parent.
> 
> Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
> Signed-off-by: Parav Pandit <parav@mellanox.com>
> ---
>  drivers/vfio/mdev/mdev_core.c    | 61 ++++++++++++++++++++++++--------
>  drivers/vfio/mdev/mdev_private.h |  2 ++
>  2 files changed, 49 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index 0bef0cae1d4b..c5401a8c6843 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -102,11 +102,36 @@ static void mdev_put_parent(struct mdev_parent *parent)
>  		kref_put(&parent->ref, mdev_release_parent);
>  }
>  

Some sort of locking semantics comment would be useful here, ex:

/* Caller holds parent unreg_sem read or write lock */

> +static void mdev_device_remove_common(struct mdev_device *mdev)
> +{
> +	struct mdev_parent *parent;
> +	struct mdev_type *type;
> +	int ret;
> +
> +	type = to_mdev_type(mdev->type_kobj);
> +	mdev_remove_sysfs_files(&mdev->dev, type);
> +	device_del(&mdev->dev);
> +	parent = mdev->parent;
> +	ret = parent->ops->remove(mdev);
> +	if (ret)
> +		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
> +
> +	/* Balances with device_initialize() */
> +	put_device(&mdev->dev);
> +	mdev_put_parent(parent);
> +}
> +
>  static int mdev_device_remove_cb(struct device *dev, void *data)
>  {
> -	if (dev_is_mdev(dev))
> -		mdev_device_remove(dev);
> +	struct mdev_parent *parent;
> +	struct mdev_device *mdev;
>  
> +	if (!dev_is_mdev(dev))
> +		return 0;
> +
> +	mdev = to_mdev_device(dev);
> +	parent = mdev->parent;
> +	mdev_device_remove_common(mdev);

'parent' is unused here and we only use mdev once, so we probably don't
need to put it in a local variable.

>  	return 0;
>  }
>  
> @@ -148,6 +173,7 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
>  	}
>  
>  	kref_init(&parent->ref);
> +	init_rwsem(&parent->unreg_sem);
>  
>  	parent->dev = dev;
>  	parent->ops = ops;
> @@ -206,13 +232,17 @@ void mdev_unregister_device(struct device *dev)
>  	dev_info(dev, "MDEV: Unregistering\n");
>  
>  	list_del(&parent->next);
> +	mutex_unlock(&parent_list_lock);
> +
> +	down_write(&parent->unreg_sem);
> +
>  	class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
>  
>  	device_for_each_child(dev, NULL, mdev_device_remove_cb);
>  
>  	parent_remove_sysfs_files(parent);
> +	up_write(&parent->unreg_sem);
>  
> -	mutex_unlock(&parent_list_lock);
>  	mdev_put_parent(parent);
>  }
>  EXPORT_SYMBOL(mdev_unregister_device);
> @@ -265,6 +295,12 @@ int mdev_device_create(struct kobject *kobj,
>  
>  	mdev->parent = parent;
>  
> +	ret = down_read_trylock(&parent->unreg_sem);
> +	if (!ret) {
> +		ret = -ENODEV;

I would have expected -EAGAIN or -EBUSY here, but I guess that since we
consider the lock-out to deterministically be the parent going away
that -ENODEV makes sense.  Ok.

> +		goto mdev_fail;
> +	}
> +
>  	device_initialize(&mdev->dev);
>  	mdev->dev.parent  = dev;
>  	mdev->dev.bus     = &mdev_bus_type;
> @@ -287,6 +323,7 @@ int mdev_device_create(struct kobject *kobj,
>  
>  	mdev->active = true;
>  	dev_dbg(&mdev->dev, "MDEV: created\n");
> +	up_read(&parent->unreg_sem);
>  
>  	return 0;
>  
> @@ -295,6 +332,7 @@ int mdev_device_create(struct kobject *kobj,
>  add_fail:
>  	parent->ops->remove(mdev);
>  ops_create_fail:
> +	up_read(&parent->unreg_sem);
>  	put_device(&mdev->dev);
>  mdev_fail:
>  	mdev_put_parent(parent);
> @@ -305,7 +343,6 @@ int mdev_device_remove(struct device *dev)
>  {
>  	struct mdev_device *mdev, *tmp;
>  	struct mdev_parent *parent;
> -	struct mdev_type *type;
>  	int ret;
>  
>  	mdev = to_mdev_device(dev);
> @@ -329,18 +366,14 @@ int mdev_device_remove(struct device *dev)
>  	mdev->active = false;
>  	mutex_unlock(&mdev_list_lock);
>  
> -	type = to_mdev_type(mdev->type_kobj);
> -	mdev_remove_sysfs_files(dev, type);
> -	device_del(&mdev->dev);
>  	parent = mdev->parent;
> -	ret = parent->ops->remove(mdev);
> -	if (ret)
> -		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
> -
> -	/* Balances with device_initialize() */
> -	put_device(&mdev->dev);
> -	mdev_put_parent(parent);
> +	/* Check if parent unregistration has started */
> +	ret = down_read_trylock(&parent->unreg_sem);
> +	if (!ret)
> +		return -ENODEV;
>  
> +	mdev_device_remove_common(mdev);
> +	up_read(&parent->unreg_sem);
>  	return 0;
>  }
>  
> diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
> index 924ed2274941..398767526276 100644
> --- a/drivers/vfio/mdev/mdev_private.h
> +++ b/drivers/vfio/mdev/mdev_private.h
> @@ -23,6 +23,8 @@ struct mdev_parent {
>  	struct list_head next;
>  	struct kset *mdev_types_kset;
>  	struct list_head type_list;
> +	/* Synchronize device creation/removal with parent unregistration */
> +	struct rw_semaphore unreg_sem;
>  };
>  
>  struct mdev_device {


^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCHv4 3/3] vfio/mdev: Synchronize device create/remove with parent removal
  2019-05-29 14:56   ` Alex Williamson
@ 2019-05-30  8:58     ` Parav Pandit
  0 siblings, 0 replies; 6+ messages in thread
From: Parav Pandit @ 2019-05-30  8:58 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, linux-kernel, cohuck, kwankhede, cjia

Hi Alex,

> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, May 29, 2019 8:27 PM

[..]
> >
> > diff --git a/drivers/vfio/mdev/mdev_core.c
> > b/drivers/vfio/mdev/mdev_core.c index 0bef0cae1d4b..c5401a8c6843
> > 100644
> > --- a/drivers/vfio/mdev/mdev_core.c
> > +++ b/drivers/vfio/mdev/mdev_core.c
> > @@ -102,11 +102,36 @@ static void mdev_put_parent(struct mdev_parent
> *parent)
> >  		kref_put(&parent->ref, mdev_release_parent);  }
> >
> 
> Some sort of locking semantics comment would be useful here, ex:
> 
> /* Caller holds parent unreg_sem read or write lock */
> 
Added.

> > +
> >  static int mdev_device_remove_cb(struct device *dev, void *data)  {
> > -	if (dev_is_mdev(dev))
> > -		mdev_device_remove(dev);
> > +	struct mdev_parent *parent;
> > +	struct mdev_device *mdev;
> >
> > +	if (!dev_is_mdev(dev))
> > +		return 0;
> > +
> > +	mdev = to_mdev_device(dev);
> > +	parent = mdev->parent;
> > +	mdev_device_remove_common(mdev);
> 
> 'parent' is unused here and we only use mdev once, so we probably don't need
> to put it in a local variable.
> 
Right left out from previous code.
Removed and refactored the code now.

> >  	return 0;
> >  }
> >
> > @@ -148,6 +173,7 @@ int mdev_register_device(struct device *dev, const
> struct mdev_parent_ops *ops)
> >  	}
> >
> >  	kref_init(&parent->ref);
> > +	init_rwsem(&parent->unreg_sem);
> >
> >  	parent->dev = dev;
> >  	parent->ops = ops;
> > @@ -206,13 +232,17 @@ void mdev_unregister_device(struct device *dev)
> >  	dev_info(dev, "MDEV: Unregistering\n");
> >
> >  	list_del(&parent->next);
> > +	mutex_unlock(&parent_list_lock);
> > +
> > +	down_write(&parent->unreg_sem);
> > +
> >  	class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
> >
> >  	device_for_each_child(dev, NULL, mdev_device_remove_cb);
> >
> >  	parent_remove_sysfs_files(parent);
> > +	up_write(&parent->unreg_sem);
> >
> > -	mutex_unlock(&parent_list_lock);
> >  	mdev_put_parent(parent);
> >  }
> >  EXPORT_SYMBOL(mdev_unregister_device);
> > @@ -265,6 +295,12 @@ int mdev_device_create(struct kobject *kobj,
> >
> >  	mdev->parent = parent;
> >
> > +	ret = down_read_trylock(&parent->unreg_sem);
> > +	if (!ret) {
> > +		ret = -ENODEV;
> 
> I would have expected -EAGAIN or -EBUSY here, but I guess that since we
> consider the lock-out to deterministically be the parent going away that -
> ENODEV makes sense.  Ok.
> 
Yeah, I agree that ENODEV is more accurate error code as we don't want to tell user to retry so EAGAIN is less appropriate.
Sending v5.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-05-30  8:58 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-24 13:57 [PATCHv4 0/3] vfio/mdev: Improve vfio/mdev core module Parav Pandit
2019-05-24 13:57 ` [PATCHv4 1/3] vfio/mdev: Improve the create/remove sequence Parav Pandit
2019-05-24 13:57 ` [PATCHv4 2/3] vfio/mdev: Avoid creating sysfs remove file on stale device removal Parav Pandit
2019-05-24 13:57 ` [PATCHv4 3/3] vfio/mdev: Synchronize device create/remove with parent removal Parav Pandit
2019-05-29 14:56   ` Alex Williamson
2019-05-30  8:58     ` Parav Pandit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).