[PATCH 0/6] block: reread partitions changes and fix for loop

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/6] block: reread partitions changes and fix for loop
@ 2015-04-05  7:24 Ming Lei
  2015-04-05  7:24 ` [PATCH 1/6] block: export blkdev_reread_part() Ming Lei
                   ` (6 more replies)
  0 siblings, 7 replies; 37+ messages in thread
From: Ming Lei @ 2015-04-05  7:24 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel, Christoph Hellwig, Tejun Heo
  Cc: Andrew Morton, Alexander Viro, Jarod Wilson, David Herrmann,
	Markus Pargmann, nbd-general, Stefan Haberland, Sebastian Ott,
	Fabian Frederick, linux-s390

Recently there are several reports about loop partition scanning
failure[1][2].

For loop, the root cause is one ABBA and one AA lock dependency
issue, and the two are fixed by patch 2 and patch 3 each.

Another reason is from the trylock in blkdev_reread_part(), which
may cause partition scanning failure too sometimes when another task
is holding the bd_mutex. In the discussion[1], both Tejun and Christoph
suggests to replace the trylock with mutex_lock in blkdev_reread_part(),
also Christoph suggests to export blkdev_reread_part.

Following the discussion, this patchset exports blkdev_reread_part(), and
introduces blkdev_reread_part_nolock() for fixing loop's AA lock issue.
Then ioctl_by_bdev(BLKRRPART) in loop, nbd and dasd is replaced with
blkdev_reread_part(). In the last patch, trylock in __blkdev_reread_part()
is replaced with mutex_lock, and some analysis is provided about the conversion.

[1], https://lkml.org/lkml/2015/1/26/137
[2], https://lkml.org/lkml/2015/3/31/888

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 1/6] block: export blkdev_reread_part()
  2015-04-05  7:24 [PATCH 0/6] block: reread partitions changes and fix for loop Ming Lei
@ 2015-04-05  7:24 ` Ming Lei
  2015-04-05 16:12   ` Christoph Hellwig
  2015-04-05  7:24 ` [PATCH 2/6] block: loop: don't hold lo_ctl_mutex in lo_open Ming Lei
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 37+ messages in thread
From: Ming Lei @ 2015-04-05  7:24 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel, Christoph Hellwig, Tejun Heo
  Cc: Andrew Morton, Alexander Viro, Jarod Wilson, David Herrmann,
	Markus Pargmann, nbd-general, Stefan Haberland, Sebastian Ott,
	Fabian Frederick, linux-s390, Ming Lei

This patch exports blkdev_reread_part() for block drivers, also
introduce blkdev_reread_part_no_lock().

For some drivers, such as loop, reread partition can be run
from release path, and the bd_mutex has been held already before
calling ioctl_by_bdev(bdev, BLKRRPART, 0), so this patch introduces
blkdev_reread_part_no_lock() too for this requirment.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/ioctl.c      |   14 ++++++++++----
 include/linux/fs.h |    9 +++++++++
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index 7d8befd..7dce19a 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -150,7 +150,11 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
 	}
 }
 
-static int blkdev_reread_part(struct block_device *bdev)
+/*
+ * This is exported as API for block driver, can be called
+ * with requiring bd_mutex or not.
+ */
+int __blkdev_reread_part(struct block_device *bdev, bool lock)
 {
 	struct gendisk *disk = bdev->bd_disk;
 	int res;
@@ -159,12 +163,14 @@ static int blkdev_reread_part(struct block_device *bdev)
 		return -EINVAL;
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
-	if (!mutex_trylock(&bdev->bd_mutex))
+	if (lock && !mutex_trylock(&bdev->bd_mutex))
 		return -EBUSY;
 	res = rescan_partitions(disk, bdev);
-	mutex_unlock(&bdev->bd_mutex);
+	if (lock)
+		mutex_unlock(&bdev->bd_mutex);
 	return res;
 }
+EXPORT_SYMBOL(__blkdev_reread_part);
 
 static int blk_ioctl_discard(struct block_device *bdev, uint64_t start,
 			     uint64_t len, int secure)
@@ -407,7 +413,7 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 		ret = blkpg_ioctl(bdev, (struct blkpg_ioctl_arg __user *) arg);
 		break;
 	case BLKRRPART:
-		ret = blkdev_reread_part(bdev);
+		ret = __blkdev_reread_part(bdev, true);
 		break;
 	case BLKGETSIZE:
 		size = i_size_read(bdev->bd_inode);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 368e349..948bf75 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2275,6 +2275,15 @@ extern struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
 extern struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode,
 					      void *holder);
 extern void blkdev_put(struct block_device *bdev, fmode_t mode);
+extern int __blkdev_reread_part(struct block_device *bdev, bool lock);
+static inline int blkdev_reread_part(struct block_device *bdev)
+{
+	return __blkdev_reread_part(bdev, true);
+}
+static inline int blkdev_reread_part_nolock(struct block_device *bdev)
+{
+	return __blkdev_reread_part(bdev, false);
+}
 #ifdef CONFIG_SYSFS
 extern int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk);
 extern void bd_unlink_disk_holder(struct block_device *bdev,
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 2/6] block: loop: don't hold lo_ctl_mutex in lo_open
  2015-04-05  7:24 [PATCH 0/6] block: reread partitions changes and fix for loop Ming Lei
  2015-04-05  7:24 ` [PATCH 1/6] block: export blkdev_reread_part() Ming Lei
@ 2015-04-05  7:24 ` Ming Lei
  2015-04-05 16:28   ` Ming Lei
  2015-04-05  7:24 ` [PATCH 3/6] block: loop: fix another reread part failure Ming Lei
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 37+ messages in thread
From: Ming Lei @ 2015-04-05  7:24 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel, Christoph Hellwig, Tejun Heo
  Cc: Andrew Morton, Alexander Viro, Jarod Wilson, David Herrmann,
	Markus Pargmann, nbd-general, Stefan Haberland, Sebastian Ott,
	Fabian Frederick, linux-s390, Ming Lei

The lo_ctl_mutex is held for running all ioctl handlers, and
in some ioctl handlers, ioctl_by_bdev(BLKRRPART) is called for
rereading partitions, which requires bd_mutex.

So it is easy to cause failure because trylock(bd_mutex) may
fail inside blkdev_reread_part(), and follows the lock context:

blkid or other application:
	->open()
		->mutex_lock(bd_mutex)
		->lo_open()
			->mutex_lock(lo_ctl_mutex)

losetup(set fd ioctl):
	->mutex_lock(lo_ctl_mutex)
	->ioctl_by_bdev(BLKRRPART)
		->trylock(bd_mutex)

This patch trys to eliminate the ABBA lock dependency by removing
lo_ctl_mutext in lo_open() with the following approach:

1) introduce lo_open_mutex to protect lo_refcnt and avoid acquiring
lo_ctl_mutex in lo_open():
	- for open vs. add/del loop, no any problem because of loop_index_mutex
	- lo_open_mutex is used for syncing open() and loop_clr_fd()
	- both open() and release() have been serialized by bd_mutex already

2) don't hold lo_ctl_mutex for decreasing/checking lo_refcnt in
lo_release(), then lo_ctl_mutex is only required for the last release.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 drivers/block/loop.c |   32 ++++++++++++++++++++++++++------
 drivers/block/loop.h |    1 +
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index d1f168b..81a6bc1 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -879,14 +879,18 @@ static int loop_clr_fd(struct loop_device *lo)
 	 * <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d
 	 * command to fail with EBUSY.
 	 */
+	mutex_lock(&lo->lo_open_mutex);
 	if (lo->lo_refcnt > 1) {
+		mutex_unlock(&lo->lo_open_mutex);
 		lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
 		mutex_unlock(&lo->lo_ctl_mutex);
 		return 0;
 	}
 
-	if (filp == NULL)
+	if (filp == NULL) {
+		mutex_unlock(&lo->lo_open_mutex);
 		return -EINVAL;
+	}
 
 	spin_lock_irq(&lo->lo_lock);
 	lo->lo_state = Lo_rundown;
@@ -919,6 +923,15 @@ static int loop_clr_fd(struct loop_device *lo)
 	lo->lo_state = Lo_unbound;
 	/* This is safe: open() is still holding a reference. */
 	module_put(THIS_MODULE);
+
+	/*
+	 * Unlock open_mutex for avoiding -EBUSY of rereading part:
+	 * - try to acquire bd_mutex from reread part
+	 * - another task is opening the loop with holding bd_mutex
+	 *   and trys to acquire open_mutex
+	 */
+	mutex_unlock(&lo->lo_open_mutex);
+
 	if (lo->lo_flags & LO_FLAGS_PARTSCAN && bdev)
 		ioctl_by_bdev(bdev, BLKRRPART, 0);
 	lo->lo_flags = 0;
@@ -1376,9 +1389,9 @@ static int lo_open(struct block_device *bdev, fmode_t mode)
 		goto out;
 	}
 
-	mutex_lock(&lo->lo_ctl_mutex);
+	mutex_lock(&lo->lo_open_mutex);
 	lo->lo_refcnt++;
-	mutex_unlock(&lo->lo_ctl_mutex);
+	mutex_unlock(&lo->lo_open_mutex);
 out:
 	mutex_unlock(&loop_index_mutex);
 	return err;
@@ -1387,13 +1400,16 @@ out:
 static void lo_release(struct gendisk *disk, fmode_t mode)
 {
 	struct loop_device *lo = disk->private_data;
-	int err;
+	int err, ref;
 
-	mutex_lock(&lo->lo_ctl_mutex);
+	mutex_lock(&lo->lo_open_mutex);
+	ref = --lo->lo_refcnt;
+	mutex_unlock(&lo->lo_open_mutex);
 
-	if (--lo->lo_refcnt)
+	if (ref)
 		goto out;
 
+	mutex_lock(&lo->lo_ctl_mutex);
 	if (lo->lo_flags & LO_FLAGS_AUTOCLEAR) {
 		/*
 		 * In autoclear mode, stop the loop thread
@@ -1646,6 +1662,7 @@ static int loop_add(struct loop_device **l, int i)
 		disk->flags |= GENHD_FL_NO_PART_SCAN;
 	disk->flags |= GENHD_FL_EXT_DEVT;
 	mutex_init(&lo->lo_ctl_mutex);
+	mutex_init(&lo->lo_open_mutex);
 	lo->lo_number		= i;
 	spin_lock_init(&lo->lo_lock);
 	disk->major		= LOOP_MAJOR;
@@ -1763,11 +1780,14 @@ static long loop_control_ioctl(struct file *file, unsigned int cmd,
 			mutex_unlock(&lo->lo_ctl_mutex);
 			break;
 		}
+		mutex_lock(&lo->lo_open_mutex);
 		if (lo->lo_refcnt > 0) {
 			ret = -EBUSY;
+			mutex_unlock(&lo->lo_open_mutex);
 			mutex_unlock(&lo->lo_ctl_mutex);
 			break;
 		}
+		mutex_unlock(&lo->lo_open_mutex);
 		lo->lo_disk->private_data = NULL;
 		mutex_unlock(&lo->lo_ctl_mutex);
 		idr_remove(&loop_index_idr, lo->lo_number);
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 301c27f..1b4acf2 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -59,6 +59,7 @@ struct loop_device {
 	bool			write_started;
 	int			lo_state;
 	struct mutex		lo_ctl_mutex;
+	struct mutex		lo_open_mutex;
 
 	struct request_queue	*lo_queue;
 	struct blk_mq_tag_set	tag_set;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 3/6] block: loop: fix another reread part failure
  2015-04-05  7:24 [PATCH 0/6] block: reread partitions changes and fix for loop Ming Lei
  2015-04-05  7:24 ` [PATCH 1/6] block: export blkdev_reread_part() Ming Lei
  2015-04-05  7:24 ` [PATCH 2/6] block: loop: don't hold lo_ctl_mutex in lo_open Ming Lei
@ 2015-04-05  7:24 ` Ming Lei
  2015-04-05  7:24 ` [PATCH 4/6] block: nbd: convert to blkdev_reread_part() Ming Lei
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 37+ messages in thread
From: Ming Lei @ 2015-04-05  7:24 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel, Christoph Hellwig, Tejun Heo
  Cc: Andrew Morton, Alexander Viro, Jarod Wilson, David Herrmann,
	Markus Pargmann, nbd-general, Stefan Haberland, Sebastian Ott,
	Fabian Frederick, linux-s390, Ming Lei

loop_clr_fd() can be run piggyback with lo_release(), and
under this situation, reread partition may always fail because
of bd_mutex which has been held in release path.

This patch detects the situation by the reference count, and
call blkdev_reread_part_nolock() to avoid acquiring the lock again.

In the meantime, this patch switches to new kernel APIs
of blkdev_reread_part() and blkdev_reread_part_nolock().

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 drivers/block/loop.c |   28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 81a6bc1..80c9d87 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -528,6 +528,26 @@ static int loop_flush(struct loop_device *lo)
 	return loop_switch(lo, NULL);
 }
 
+static void loop_reread_partitions(struct loop_device *lo,
+				   struct block_device *bdev)
+{
+	int rc;
+	bool in_release;
+
+	mutex_lock(&lo->lo_open_mutex);
+	in_release = lo->lo_refcnt == 0;
+	mutex_unlock(&lo->lo_open_mutex);
+
+	/* bd_mutex has been held already in release path */
+	if (in_release)
+		rc = blkdev_reread_part_nolock(bdev);
+	else
+		rc = blkdev_reread_part(bdev);
+	if (rc)
+		pr_warn("%s: partition scan of loop%d (%s) failed (rc=%d)\n",
+			__func__, lo->lo_number, lo->lo_file_name, rc);
+}
+
 /*
  * loop_change_fd switched the backing store of a loopback device to
  * a new file. This is useful for operating system installers to free up
@@ -576,7 +596,7 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
 
 	fput(old_file);
 	if (lo->lo_flags & LO_FLAGS_PARTSCAN)
-		ioctl_by_bdev(bdev, BLKRRPART, 0);
+		loop_reread_partitions(lo, bdev);
 	return 0;
 
  out_putf:
@@ -807,7 +827,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	if (part_shift)
 		lo->lo_flags |= LO_FLAGS_PARTSCAN;
 	if (lo->lo_flags & LO_FLAGS_PARTSCAN)
-		ioctl_by_bdev(bdev, BLKRRPART, 0);
+		loop_reread_partitions(lo, bdev);
 
 	/* Grab the block_device to prevent its destruction after we
 	 * put /dev/loopXX inode. Later in loop_clr_fd() we bdput(bdev).
@@ -933,7 +953,7 @@ static int loop_clr_fd(struct loop_device *lo)
 	mutex_unlock(&lo->lo_open_mutex);
 
 	if (lo->lo_flags & LO_FLAGS_PARTSCAN && bdev)
-		ioctl_by_bdev(bdev, BLKRRPART, 0);
+		loop_reread_partitions(lo, bdev);
 	lo->lo_flags = 0;
 	if (!part_shift)
 		lo->lo_disk->flags |= GENHD_FL_NO_PART_SCAN;
@@ -1008,7 +1028,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
 	     !(lo->lo_flags & LO_FLAGS_PARTSCAN)) {
 		lo->lo_flags |= LO_FLAGS_PARTSCAN;
 		lo->lo_disk->flags &= ~GENHD_FL_NO_PART_SCAN;
-		ioctl_by_bdev(lo->lo_device, BLKRRPART, 0);
+		loop_reread_partitions(lo, lo->lo_device);
 	}
 
 	lo->lo_encrypt_key_size = info->lo_encrypt_key_size;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 4/6] block: nbd: convert to blkdev_reread_part()
  2015-04-05  7:24 [PATCH 0/6] block: reread partitions changes and fix for loop Ming Lei
                   ` (2 preceding siblings ...)
  2015-04-05  7:24 ` [PATCH 3/6] block: loop: fix another reread part failure Ming Lei
@ 2015-04-05  7:24 ` Ming Lei
  2015-04-05  7:24 ` [PATCH 5/6] block: dasd_genhd: convert to blkdev_reread_part Ming Lei
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 37+ messages in thread
From: Ming Lei @ 2015-04-05  7:24 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel, Christoph Hellwig, Tejun Heo
  Cc: Andrew Morton, Alexander Viro, Jarod Wilson, David Herrmann,
	Markus Pargmann, nbd-general, Stefan Haberland, Sebastian Ott,
	Fabian Frederick, linux-s390, Ming Lei

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 drivers/block/nbd.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index a98c41f..7e9d26e 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -734,7 +734,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		bdev->bd_inode->i_size = 0;
 		set_capacity(nbd->disk, 0);
 		if (max_part > 0)
-			ioctl_by_bdev(bdev, BLKRRPART, 0);
+			blkdev_reread_part(bdev);
 		if (nbd->disconnect) /* user requested, ignore socket errors */
 			return 0;
 		return nbd->harderror;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 5/6] block: dasd_genhd: convert to blkdev_reread_part
  2015-04-05  7:24 [PATCH 0/6] block: reread partitions changes and fix for loop Ming Lei
                   ` (3 preceding siblings ...)
  2015-04-05  7:24 ` [PATCH 4/6] block: nbd: convert to blkdev_reread_part() Ming Lei
@ 2015-04-05  7:24 ` Ming Lei
  2015-04-06 13:46   ` Jarod Wilson
  2015-04-05  7:24 ` [RFC PATCH 6/6] block: replace trylock with mutex_lock in __blkdev_reread_part() Ming Lei
  2015-04-08  6:23 ` [PATCH 0/7] block: reread partitions improvements Jarod Wilson
  6 siblings, 1 reply; 37+ messages in thread
From: Ming Lei @ 2015-04-05  7:24 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel, Christoph Hellwig, Tejun Heo
  Cc: Andrew Morton, Alexander Viro, Jarod Wilson, David Herrmann,
	Markus Pargmann, nbd-general, Stefan Haberland, Sebastian Ott,
	Fabian Frederick, linux-s390, Ming Lei

Also remove the obsolete comment.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 drivers/s390/block/dasd_genhd.c |    9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/s390/block/dasd_genhd.c b/drivers/s390/block/dasd_genhd.c
index 90f39f7..2af4619 100644
--- a/drivers/s390/block/dasd_genhd.c
+++ b/drivers/s390/block/dasd_genhd.c
@@ -116,14 +116,11 @@ int dasd_scan_partitions(struct dasd_block *block)
 			      rc);
 		return -ENODEV;
 	}
-	/*
-	 * See fs/partition/check.c:register_disk,rescan_partitions
-	 * Can't call rescan_partitions directly. Use ioctl.
-	 */
-	rc = ioctl_by_bdev(bdev, BLKRRPART, 0);
+
+	rc = blkdev_reread_part(bdev);
 	while (rc == -EBUSY && retry > 0) {
 		schedule();
-		rc = ioctl_by_bdev(bdev, BLKRRPART, 0);
+		rc = blkdev_reread_part(bdev);
 		retry--;
 		DBF_DEV_EVENT(DBF_ERR, block->base,
 			      "scan partitions error, retry %d rc %d",
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [RFC PATCH 6/6] block: replace trylock with mutex_lock in __blkdev_reread_part()
  2015-04-05  7:24 [PATCH 0/6] block: reread partitions changes and fix for loop Ming Lei
                   ` (4 preceding siblings ...)
  2015-04-05  7:24 ` [PATCH 5/6] block: dasd_genhd: convert to blkdev_reread_part Ming Lei
@ 2015-04-05  7:24 ` Ming Lei
  2015-04-08  6:23 ` [PATCH 0/7] block: reread partitions improvements Jarod Wilson
  6 siblings, 0 replies; 37+ messages in thread
From: Ming Lei @ 2015-04-05  7:24 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel, Christoph Hellwig, Tejun Heo
  Cc: Andrew Morton, Alexander Viro, Jarod Wilson, David Herrmann,
	Markus Pargmann, nbd-general, Stefan Haberland, Sebastian Ott,
	Fabian Frederick, linux-s390, Ming Lei

The only possible problem of using mutex_lock() instead of trylock
is about deadlock.

If there aren't any locks held before calling __blkdev_reread_part(lock),
deadlock can't be caused by this conversion.

If there are locks held before calling __blkdev_reread_part(lock),
and if these locks arn't required in open, close handler and I/O
path, deadlock shouldn't be caused too.

Both user space's ioctl(BLKRRPART) and md_setup_drive() from
init/do_mounts_md.c belongs to the 1st case, so the conversion is safe
for the two cases.

For loop, the previous patches in this pathset has fixed the ABBA lock
dependency, so the conversion is OK.

For nbd, tx_lock is held when calling the function:

	- both open and release won't hold the lock
	- when blkdev_reread_part() is run, I/O thread has been stopped
	already, so tx_lock won't be acquired in I/O path at that time.
	- so the conversion won't cause deadlock for nbd

For dasd, both dasd_open(), dasd_release() and request function don't
acquire any mutex/semphone, so the conversion should be safe.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/ioctl.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index 7dce19a..a7aaf13 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -153,6 +153,12 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
 /*
  * This is exported as API for block driver, can be called
  * with requiring bd_mutex or not.
+ *
+ * When lock is true, make sure the held locks aren't required in
+ * open()/close() handler and I/O path for avoiding ABBA deadlock:
+ * - bd_mutex is held before calling block driver's open/close
+ *   handler
+ * - reading partition table may submit I/O to the block device
  */
 int __blkdev_reread_part(struct block_device *bdev, bool lock)
 {
@@ -163,8 +169,8 @@ int __blkdev_reread_part(struct block_device *bdev, bool lock)
 		return -EINVAL;
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
-	if (lock && !mutex_trylock(&bdev->bd_mutex))
-		return -EBUSY;
+	if (lock)
+		mutex_lock(&bdev->bd_mutex);
 	res = rescan_partitions(disk, bdev);
 	if (lock)
 		mutex_unlock(&bdev->bd_mutex);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/6] block: export blkdev_reread_part()
  2015-04-05  7:24 ` [PATCH 1/6] block: export blkdev_reread_part() Ming Lei
@ 2015-04-05 16:12   ` Christoph Hellwig
  2015-04-05 16:40     ` Ming Lei
  0 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2015-04-05 16:12 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, Christoph Hellwig, Tejun Heo,
	Andrew Morton, Alexander Viro, Jarod Wilson, David Herrmann,
	Markus Pargmann, nbd-general, Stefan Haberland, Sebastian Ott,
	Fabian Frederick, linux-s390

> +/*
> + * This is exported as API for block driver, can be called
> + * with requiring bd_mutex or not.
> + */
> +int __blkdev_reread_part(struct block_device *bdev, bool lock)
>  {
>  	struct gendisk *disk = bdev->bd_disk;
>  	int res;
> @@ -159,12 +163,14 @@ static int blkdev_reread_part(struct block_device *bdev)
>  		return -EINVAL;
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EACCES;
> -	if (!mutex_trylock(&bdev->bd_mutex))
> +	if (lock && !mutex_trylock(&bdev->bd_mutex))
>  		return -EBUSY;

Please don't add funtions that do conditional locking, instead move
all the code into blkdev_reread_part_nolock, and then wrap it:

int blkdev_reread_part(struct block_device *bdev)
{
	if (!mutex_trylock(&bdev->bd_mutex))
		return -EBUSY;
	blkdev_reread_part_nolock(bdev);
	mutex_unlock(&bdev->bd_mutex);
}

Please also add a lockdep_assert_held to blkdev_reread_part_nolock to
ensure callers actually do hold the lock.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/6] block: loop: don't hold lo_ctl_mutex in lo_open
  2015-04-05  7:24 ` [PATCH 2/6] block: loop: don't hold lo_ctl_mutex in lo_open Ming Lei
@ 2015-04-05 16:28   ` Ming Lei
  2015-04-06 14:49     ` Christoph Hellwig
  0 siblings, 1 reply; 37+ messages in thread
From: Ming Lei @ 2015-04-05 16:28 UTC (permalink / raw)
  To: Jens Axboe, Linux Kernel Mailing List, Christoph Hellwig, Tejun Heo
  Cc: Andrew Morton, Alexander Viro, Jarod Wilson, David Herrmann,
	Markus Pargmann, nbd-general, Stefan Haberland, Sebastian Ott,
	Fabian Frederick, linux-s390, Ming Lei

On Sun, Apr 5, 2015 at 3:24 PM, Ming Lei <ming.lei@canonical.com> wrote:
> The lo_ctl_mutex is held for running all ioctl handlers, and
> in some ioctl handlers, ioctl_by_bdev(BLKRRPART) is called for
> rereading partitions, which requires bd_mutex.
>
> So it is easy to cause failure because trylock(bd_mutex) may
> fail inside blkdev_reread_part(), and follows the lock context:
>
> blkid or other application:
>         ->open()
>                 ->mutex_lock(bd_mutex)
>                 ->lo_open()
>                         ->mutex_lock(lo_ctl_mutex)
>
> losetup(set fd ioctl):
>         ->mutex_lock(lo_ctl_mutex)
>         ->ioctl_by_bdev(BLKRRPART)
>                 ->trylock(bd_mutex)
>
> This patch trys to eliminate the ABBA lock dependency by removing
> lo_ctl_mutext in lo_open() with the following approach:
>
> 1) introduce lo_open_mutex to protect lo_refcnt and avoid acquiring
> lo_ctl_mutex in lo_open():
>         - for open vs. add/del loop, no any problem because of loop_index_mutex
>         - lo_open_mutex is used for syncing open() and loop_clr_fd()
>         - both open() and release() have been serialized by bd_mutex already
>
> 2) don't hold lo_ctl_mutex for decreasing/checking lo_refcnt in
> lo_release(), then lo_ctl_mutex is only required for the last release.

Another simpler way is to make lo_refcnt as atomic_t and remove
lo_ctrl_mutext in lo_open(), and freeze request queue during clearing
fd, and better to freeze queue during setting fd too, so will update in
v1 with this way.

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/6] block: export blkdev_reread_part()
  2015-04-05 16:12   ` Christoph Hellwig
@ 2015-04-05 16:40     ` Ming Lei
  2015-04-06 13:42       ` Jarod Wilson
  2015-04-06 14:50       ` Christoph Hellwig
  0 siblings, 2 replies; 37+ messages in thread
From: Ming Lei @ 2015-04-05 16:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Linux Kernel Mailing List, Tejun Heo, Andrew Morton,
	Alexander Viro, Jarod Wilson, David Herrmann, Markus Pargmann,
	nbd-general, Stefan Haberland, Sebastian Ott, Fabian Frederick,
	linux-s390

On Mon, Apr 6, 2015 at 12:12 AM, Christoph Hellwig <hch@infradead.org> wrote:
>> +/*
>> + * This is exported as API for block driver, can be called
>> + * with requiring bd_mutex or not.
>> + */
>> +int __blkdev_reread_part(struct block_device *bdev, bool lock)
>>  {
>>       struct gendisk *disk = bdev->bd_disk;
>>       int res;
>> @@ -159,12 +163,14 @@ static int blkdev_reread_part(struct block_device *bdev)
>>               return -EINVAL;
>>       if (!capable(CAP_SYS_ADMIN))
>>               return -EACCES;
>> -     if (!mutex_trylock(&bdev->bd_mutex))
>> +     if (lock && !mutex_trylock(&bdev->bd_mutex))
>>               return -EBUSY;
>
> Please don't add funtions that do conditional locking, instead move
> all the code into blkdev_reread_part_nolock, and then wrap it:
>
> int blkdev_reread_part(struct block_device *bdev)
> {
>         if (!mutex_trylock(&bdev->bd_mutex))
>                 return -EBUSY;
>         blkdev_reread_part_nolock(bdev);
>         mutex_unlock(&bdev->bd_mutex);
> }

Yes, it is more clean, but with extra acquiring lock cost for the
failure cases, especially when we replace trylock with mutex_lock().

>
> Please also add a lockdep_assert_held to blkdev_reread_part_nolock to
> ensure callers actually do hold the lock.

Good point!

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/6] block: export blkdev_reread_part()
  2015-04-05 16:40     ` Ming Lei
@ 2015-04-06 13:42       ` Jarod Wilson
  2015-04-06 13:48         ` Jarod Wilson
  2015-04-07  2:43         ` Ming Lei
  2015-04-06 14:50       ` Christoph Hellwig
  1 sibling, 2 replies; 37+ messages in thread
From: Jarod Wilson @ 2015-04-06 13:42 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, Linux Kernel Mailing List,
	Tejun Heo, Andrew Morton, Alexander Viro, David Herrmann,
	Markus Pargmann, nbd-general, Stefan Haberland, Sebastian Ott,
	Fabian Frederick, linux-s390

On Mon, Apr 06, 2015 at 12:40:12AM +0800, Ming Lei wrote:
> On Mon, Apr 6, 2015 at 12:12 AM, Christoph Hellwig <hch@infradead.org> wrote:
> >> +/*
> >> + * This is exported as API for block driver, can be called
> >> + * with requiring bd_mutex or not.
> >> + */
> >> +int __blkdev_reread_part(struct block_device *bdev, bool lock)
> >>  {
> >>       struct gendisk *disk = bdev->bd_disk;
> >>       int res;
> >> @@ -159,12 +163,14 @@ static int blkdev_reread_part(struct block_device *bdev)
> >>               return -EINVAL;
> >>       if (!capable(CAP_SYS_ADMIN))
> >>               return -EACCES;
> >> -     if (!mutex_trylock(&bdev->bd_mutex))
> >> +     if (lock && !mutex_trylock(&bdev->bd_mutex))
> >>               return -EBUSY;
> >
> > Please don't add funtions that do conditional locking, instead move
> > all the code into blkdev_reread_part_nolock, and then wrap it:
> >
> > int blkdev_reread_part(struct block_device *bdev)
> > {
> >         if (!mutex_trylock(&bdev->bd_mutex))
> >                 return -EBUSY;
> >         blkdev_reread_part_nolock(bdev);
> >         mutex_unlock(&bdev->bd_mutex);
> > }
> 
> Yes, it is more clean, but with extra acquiring lock cost for the
> failure cases, especially when we replace trylock with mutex_lock().

I was working on a version of this myself over the past few days, I
actually removed blkdev_reread_part() entirely, renamed
fs/partition-generic.c::reread_partitions() to __reread_partitions(), then
moved the locking from blkdev_reread_part() into a new reread_partitions()
that wrapped around __reread_partitions(). Same difference, I guess.

> > Please also add a lockdep_assert_held to blkdev_reread_part_nolock to
> > ensure callers actually do hold the lock.
> 
> Good point!

Looks like fs/block_dev.c::__blkdev_get() is the only thing that would be
calling the _nolock variant of whichever route, as it handles bd_mutex
acquisition within __blkdev_get().

As an aside, there's a piece of that function that could be worth
duplicating over into loop.c as well:

        if (bdev->bd_invalidated) {
                if (!ret)
                        rescan_partitions(bdev);
                else if (ret == -ENOMEDIUM)
                        invalidate_partitions(disk, bdev);

Might this possibly be put to use to help with the problem commit
8761a3dc1f07b163414e2215a2cadbb4cfe2a107 was trying to solve?

-- 
Jarod Wilson
jarod@redhat.com


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 5/6] block: dasd_genhd: convert to blkdev_reread_part
  2015-04-05  7:24 ` [PATCH 5/6] block: dasd_genhd: convert to blkdev_reread_part Ming Lei
@ 2015-04-06 13:46   ` Jarod Wilson
  2015-04-06 13:51     ` Jarod Wilson
  2015-04-07  0:47     ` Ming Lei
  0 siblings, 2 replies; 37+ messages in thread
From: Jarod Wilson @ 2015-04-06 13:46 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, Christoph Hellwig, Tejun Heo,
	Andrew Morton, Alexander Viro, David Herrmann, Markus Pargmann,
	nbd-general, Stefan Haberland, Sebastian Ott, Fabian Frederick,
	linux-s390

On Sun, Apr 05, 2015 at 03:24:47PM +0800, Ming Lei wrote:
> Also remove the obsolete comment.
> 
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>  drivers/s390/block/dasd_genhd.c |    9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/s390/block/dasd_genhd.c b/drivers/s390/block/dasd_genhd.c
> index 90f39f7..2af4619 100644
> --- a/drivers/s390/block/dasd_genhd.c
> +++ b/drivers/s390/block/dasd_genhd.c
> @@ -116,14 +116,11 @@ int dasd_scan_partitions(struct dasd_block *block)
>  			      rc);
>  		return -ENODEV;
>  	}
> -	/*
> -	 * See fs/partition/check.c:register_disk,rescan_partitions
> -	 * Can't call rescan_partitions directly. Use ioctl.
> -	 */
> -	rc = ioctl_by_bdev(bdev, BLKRRPART, 0);
> +
> +	rc = blkdev_reread_part(bdev);
>  	while (rc == -EBUSY && retry > 0) {
>  		schedule();
> -		rc = ioctl_by_bdev(bdev, BLKRRPART, 0);
> +		rc = blkdev_reread_part(bdev);
>  		retry--;
>  		DBF_DEV_EVENT(DBF_ERR, block->base,
>  			      "scan partitions error, retry %d rc %d",

Note: patch 6/6 in the series makes this whole while() loops pointless,
since the possibility of the -EBUSY return goes away.

-- 
Jarod Wilson
jarod@redhat.com


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/6] block: export blkdev_reread_part()
  2015-04-06 13:42       ` Jarod Wilson
@ 2015-04-06 13:48         ` Jarod Wilson
  2015-04-07  2:43         ` Ming Lei
  1 sibling, 0 replies; 37+ messages in thread
From: Jarod Wilson @ 2015-04-06 13:48 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, Linux Kernel Mailing List,
	Tejun Heo, Andrew Morton, Alexander Viro, David Herrmann,
	Markus Pargmann, nbd-general, Stefan Haberland, Sebastian Ott,
	Fabian Frederick, linux-s390

On Mon, Apr 06, 2015 at 09:42:27AM -0400, Jarod Wilson wrote:
> On Mon, Apr 06, 2015 at 12:40:12AM +0800, Ming Lei wrote:
> > On Mon, Apr 6, 2015 at 12:12 AM, Christoph Hellwig <hch@infradead.org> wrote:
> > >> +/*
> > >> + * This is exported as API for block driver, can be called
> > >> + * with requiring bd_mutex or not.
> > >> + */
> > >> +int __blkdev_reread_part(struct block_device *bdev, bool lock)
> > >>  {
> > >>       struct gendisk *disk = bdev->bd_disk;
> > >>       int res;
> > >> @@ -159,12 +163,14 @@ static int blkdev_reread_part(struct block_device *bdev)
> > >>               return -EINVAL;
> > >>       if (!capable(CAP_SYS_ADMIN))
> > >>               return -EACCES;
> > >> -     if (!mutex_trylock(&bdev->bd_mutex))
> > >> +     if (lock && !mutex_trylock(&bdev->bd_mutex))
> > >>               return -EBUSY;
> > >
> > > Please don't add funtions that do conditional locking, instead move
> > > all the code into blkdev_reread_part_nolock, and then wrap it:
> > >
> > > int blkdev_reread_part(struct block_device *bdev)
> > > {
> > >         if (!mutex_trylock(&bdev->bd_mutex))
> > >                 return -EBUSY;
> > >         blkdev_reread_part_nolock(bdev);
> > >         mutex_unlock(&bdev->bd_mutex);
> > > }
> > 
> > Yes, it is more clean, but with extra acquiring lock cost for the
> > failure cases, especially when we replace trylock with mutex_lock().
> 
> I was working on a version of this myself over the past few days, I
> actually removed blkdev_reread_part() entirely, renamed
> fs/partition-generic.c::reread_partitions() to __reread_partitions(), then

Sorry, that was block/partition-generic.c, not fs/.

-- 
Jarod Wilson
jarod@redhat.com


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 5/6] block: dasd_genhd: convert to blkdev_reread_part
  2015-04-06 13:46   ` Jarod Wilson
@ 2015-04-06 13:51     ` Jarod Wilson
  2015-04-07  1:59       ` Ming Lei
  2015-04-07  0:47     ` Ming Lei
  1 sibling, 1 reply; 37+ messages in thread
From: Jarod Wilson @ 2015-04-06 13:51 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, Christoph Hellwig, Tejun Heo,
	Andrew Morton, Alexander Viro, David Herrmann, Markus Pargmann,
	nbd-general, Stefan Haberland, Sebastian Ott, Fabian Frederick,
	linux-s390

On Mon, Apr 06, 2015 at 09:46:55AM -0400, Jarod Wilson wrote:
> On Sun, Apr 05, 2015 at 03:24:47PM +0800, Ming Lei wrote:
> > Also remove the obsolete comment.
> > 
> > Signed-off-by: Ming Lei <ming.lei@canonical.com>
> > ---
> >  drivers/s390/block/dasd_genhd.c |    9 +++------
> >  1 file changed, 3 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/s390/block/dasd_genhd.c b/drivers/s390/block/dasd_genhd.c
> > index 90f39f7..2af4619 100644
> > --- a/drivers/s390/block/dasd_genhd.c
> > +++ b/drivers/s390/block/dasd_genhd.c
> > @@ -116,14 +116,11 @@ int dasd_scan_partitions(struct dasd_block *block)
> >  			      rc);
> >  		return -ENODEV;
> >  	}
> > -	/*
> > -	 * See fs/partition/check.c:register_disk,rescan_partitions
> > -	 * Can't call rescan_partitions directly. Use ioctl.
> > -	 */
> > -	rc = ioctl_by_bdev(bdev, BLKRRPART, 0);
> > +
> > +	rc = blkdev_reread_part(bdev);
> >  	while (rc == -EBUSY && retry > 0) {
> >  		schedule();
> > -		rc = ioctl_by_bdev(bdev, BLKRRPART, 0);
> > +		rc = blkdev_reread_part(bdev);
> >  		retry--;
> >  		DBF_DEV_EVENT(DBF_ERR, block->base,
> >  			      "scan partitions error, retry %d rc %d",
> 
> Note: patch 6/6 in the series makes this whole while() loops pointless,
> since the possibility of the -EBUSY return goes away.

Minor clarification: the -EBUSY due to the trylock, which is why that
retry loop exists, goes away. You *could* still get an -EBUSY through
blkdev_reread_part()->rescan_partitions()->drop_partitions() if
bdev->bd_part_count is non-zero.

-- 
Jarod Wilson
jarod@redhat.com


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/6] block: loop: don't hold lo_ctl_mutex in lo_open
  2015-04-05 16:28   ` Ming Lei
@ 2015-04-06 14:49     ` Christoph Hellwig
  0 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2015-04-06 14:49 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Linux Kernel Mailing List, Christoph Hellwig,
	Tejun Heo, Andrew Morton, Alexander Viro, Jarod Wilson,
	David Herrmann, Markus Pargmann, nbd-general, Stefan Haberland,
	Sebastian Ott, Fabian Frederick, linux-s390

On Mon, Apr 06, 2015 at 12:28:22AM +0800, Ming Lei wrote:
> Another simpler way is to make lo_refcnt as atomic_t and remove
> lo_ctrl_mutext in lo_open(), and freeze request queue during clearing
> fd, and better to freeze queue during setting fd too, so will update in
> v1 with this way.

Using an atomic_t sounds good to me.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/6] block: export blkdev_reread_part()
  2015-04-05 16:40     ` Ming Lei
  2015-04-06 13:42       ` Jarod Wilson
@ 2015-04-06 14:50       ` Christoph Hellwig
  2015-04-07  2:11         ` Ming Lei
  1 sibling, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2015-04-06 14:50 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, Linux Kernel Mailing List,
	Tejun Heo, Andrew Morton, Alexander Viro, Jarod Wilson,
	David Herrmann, Markus Pargmann, nbd-general, Stefan Haberland,
	Sebastian Ott, Fabian Frederick, linux-s390

On Mon, Apr 06, 2015 at 12:40:12AM +0800, Ming Lei wrote:
> > int blkdev_reread_part(struct block_device *bdev)
> > {
> >         if (!mutex_trylock(&bdev->bd_mutex))
> >                 return -EBUSY;
> >         blkdev_reread_part_nolock(bdev);
> >         mutex_unlock(&bdev->bd_mutex);
> > }
> 
> Yes, it is more clean, but with extra acquiring lock cost for the
> failure cases, especially when we replace trylock with mutex_lock().

It's just a few fairly trivial checks, so 'm not really worried about
it, especially given that blkdev_reread_part isn't called from a fast
path.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 5/6] block: dasd_genhd: convert to blkdev_reread_part
  2015-04-06 13:46   ` Jarod Wilson
  2015-04-06 13:51     ` Jarod Wilson
@ 2015-04-07  0:47     ` Ming Lei
  1 sibling, 0 replies; 37+ messages in thread
From: Ming Lei @ 2015-04-07  0:47 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: Jens Axboe, Linux Kernel Mailing List, Christoph Hellwig,
	Tejun Heo, Andrew Morton, Alexander Viro, David Herrmann,
	Markus Pargmann, nbd-general, Stefan Haberland, Sebastian Ott,
	Fabian Frederick, linux-s390

On Mon, Apr 6, 2015 at 9:46 PM, Jarod Wilson <jarod@redhat.com> wrote:
> On Sun, Apr 05, 2015 at 03:24:47PM +0800, Ming Lei wrote:
>> Also remove the obsolete comment.
>>
>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>> ---
>>  drivers/s390/block/dasd_genhd.c |    9 +++------
>>  1 file changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/s390/block/dasd_genhd.c b/drivers/s390/block/dasd_genhd.c
>> index 90f39f7..2af4619 100644
>> --- a/drivers/s390/block/dasd_genhd.c
>> +++ b/drivers/s390/block/dasd_genhd.c
>> @@ -116,14 +116,11 @@ int dasd_scan_partitions(struct dasd_block *block)
>>                             rc);
>>               return -ENODEV;
>>       }
>> -     /*
>> -      * See fs/partition/check.c:register_disk,rescan_partitions
>> -      * Can't call rescan_partitions directly. Use ioctl.
>> -      */
>> -     rc = ioctl_by_bdev(bdev, BLKRRPART, 0);
>> +
>> +     rc = blkdev_reread_part(bdev);
>>       while (rc == -EBUSY && retry > 0) {
>>               schedule();
>> -             rc = ioctl_by_bdev(bdev, BLKRRPART, 0);
>> +             rc = blkdev_reread_part(bdev);
>>               retry--;
>>               DBF_DEV_EVENT(DBF_ERR, block->base,
>>                             "scan partitions error, retry %d rc %d",
>
> Note: patch 6/6 in the series makes this whole while() loops pointless,
> since the possibility of the -EBUSY return goes away.

Yes, I do see that, and the while() can be removed after this patchset
is merged.

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 5/6] block: dasd_genhd: convert to blkdev_reread_part
  2015-04-06 13:51     ` Jarod Wilson
@ 2015-04-07  1:59       ` Ming Lei
  0 siblings, 0 replies; 37+ messages in thread
From: Ming Lei @ 2015-04-07  1:59 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: Jens Axboe, Linux Kernel Mailing List, Christoph Hellwig,
	Tejun Heo, Andrew Morton, Alexander Viro, David Herrmann,
	Markus Pargmann, nbd-general, Stefan Haberland, Sebastian Ott,
	Fabian Frederick, linux-s390

On Mon, Apr 6, 2015 at 9:51 PM, Jarod Wilson <jarod@redhat.com> wrote:
>>
>> Note: patch 6/6 in the series makes this whole while() loops pointless,
>> since the possibility of the -EBUSY return goes away.
>
> Minor clarification: the -EBUSY due to the trylock, which is why that
> retry loop exists, goes away. You *could* still get an -EBUSY through
> blkdev_reread_part()->rescan_partitions()->drop_partitions() if
> bdev->bd_part_count is non-zero.

Not like trylock(&bd_mutex), Inside kernel, it should be driver's
responsibility to avoid that before rescanning partitions, that said
retry may not be needed for this case.

Thanks,

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/6] block: export blkdev_reread_part()
  2015-04-06 14:50       ` Christoph Hellwig
@ 2015-04-07  2:11         ` Ming Lei
  0 siblings, 0 replies; 37+ messages in thread
From: Ming Lei @ 2015-04-07  2:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Linux Kernel Mailing List, Tejun Heo, Andrew Morton,
	Alexander Viro, Jarod Wilson, David Herrmann, Markus Pargmann,
	nbd-general, Stefan Haberland, Sebastian Ott, Fabian Frederick,
	linux-s390

On Mon, Apr 6, 2015 at 10:50 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Mon, Apr 06, 2015 at 12:40:12AM +0800, Ming Lei wrote:
>> > int blkdev_reread_part(struct block_device *bdev)
>> > {
>> >         if (!mutex_trylock(&bdev->bd_mutex))
>> >                 return -EBUSY;
>> >         blkdev_reread_part_nolock(bdev);
>> >         mutex_unlock(&bdev->bd_mutex);
>> > }
>>
>> Yes, it is more clean, but with extra acquiring lock cost for the
>> failure cases, especially when we replace trylock with mutex_lock().
>
> It's just a few fairly trivial checks, so 'm not really worried about
> it, especially given that blkdev_reread_part isn't called from a fast
> path.

OK, considered that common users don't have any privilege
on block devices at default in most distributions, so they can't
do DoS by running ioctl(RRPART) with this change.

I will change to this style in v1.

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/6] block: export blkdev_reread_part()
  2015-04-06 13:42       ` Jarod Wilson
  2015-04-06 13:48         ` Jarod Wilson
@ 2015-04-07  2:43         ` Ming Lei
  1 sibling, 0 replies; 37+ messages in thread
From: Ming Lei @ 2015-04-07  2:43 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: Christoph Hellwig, Jens Axboe, Linux Kernel Mailing List,
	Tejun Heo, Andrew Morton, Alexander Viro, David Herrmann,
	Markus Pargmann, nbd-general, Stefan Haberland, Sebastian Ott,
	Fabian Frederick, linux-s390

On Mon, Apr 6, 2015 at 9:42 PM, Jarod Wilson <jarod@redhat.com> wrote:
> On Mon, Apr 06, 2015 at 12:40:12AM +0800, Ming Lei wrote:
>> On Mon, Apr 6, 2015 at 12:12 AM, Christoph Hellwig <hch@infradead.org> wrote:
>> >> +/*
>> >> + * This is exported as API for block driver, can be called
>> >> + * with requiring bd_mutex or not.
>> >> + */
>> >> +int __blkdev_reread_part(struct block_device *bdev, bool lock)
>> >>  {
>> >>       struct gendisk *disk = bdev->bd_disk;
>> >>       int res;
>> >> @@ -159,12 +163,14 @@ static int blkdev_reread_part(struct block_device *bdev)
>> >>               return -EINVAL;
>> >>       if (!capable(CAP_SYS_ADMIN))
>> >>               return -EACCES;
>> >> -     if (!mutex_trylock(&bdev->bd_mutex))
>> >> +     if (lock && !mutex_trylock(&bdev->bd_mutex))
>> >>               return -EBUSY;
>> >
>> > Please don't add funtions that do conditional locking, instead move
>> > all the code into blkdev_reread_part_nolock, and then wrap it:
>> >
>> > int blkdev_reread_part(struct block_device *bdev)
>> > {
>> >         if (!mutex_trylock(&bdev->bd_mutex))
>> >                 return -EBUSY;
>> >         blkdev_reread_part_nolock(bdev);
>> >         mutex_unlock(&bdev->bd_mutex);
>> > }
>>
>> Yes, it is more clean, but with extra acquiring lock cost for the
>> failure cases, especially when we replace trylock with mutex_lock().
>
> I was working on a version of this myself over the past few days, I
> actually removed blkdev_reread_part() entirely, renamed
> fs/partition-generic.c::reread_partitions() to __reread_partitions(), then
> moved the locking from blkdev_reread_part() into a new reread_partitions()
> that wrapped around __reread_partitions(). Same difference, I guess.
>
>> > Please also add a lockdep_assert_held to blkdev_reread_part_nolock to
>> > ensure callers actually do hold the lock.
>>
>> Good point!
>
> Looks like fs/block_dev.c::__blkdev_get() is the only thing that would be
> calling the _nolock variant of whichever route, as it handles bd_mutex
> acquisition within __blkdev_get().

I guess you forget __blkdev_put(), :-)

>
> As an aside, there's a piece of that function that could be worth
> duplicating over into loop.c as well:
>
>         if (bdev->bd_invalidated) {
>                 if (!ret)
>                         rescan_partitions(bdev);
>                 else if (ret == -ENOMEDIUM)
>                         invalidate_partitions(disk, bdev);
>
> Might this possibly be put to use to help with the problem commit
> 8761a3dc1f07b163414e2215a2cadbb4cfe2a107 was trying to solve?

I am wondering if the problem claimed in this commit exists in reality,
at least fdisk need to run reread partition first before adding partition.

- LO_FLAGS_PARTSCAN is set for both 'losetup -P' and max_parts
- if max_parts isn't set, GENHD_FL_NO_PART_SCAN is set, so user
can't reread partition successfully because of disk_part_scan_enabled().

If there is really the problem, it can be fixed by exporting
rescan_partitions or the approach in commit 8761a3dc
with not acquiring bd_mutex in release().

Thanks,
Ming Lei

>
> --
> Jarod Wilson
> jarod@redhat.com
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 0/7] block: reread partitions improvements
  2015-04-05  7:24 [PATCH 0/6] block: reread partitions changes and fix for loop Ming Lei
                   ` (5 preceding siblings ...)
  2015-04-05  7:24 ` [RFC PATCH 6/6] block: replace trylock with mutex_lock in __blkdev_reread_part() Ming Lei
@ 2015-04-08  6:23 ` Jarod Wilson
  2015-04-08  6:23   ` [PATCH 1/7] block: export blkdev_reread_part() and __blkdev_reread_part() Jarod Wilson
                     ` (6 more replies)
  6 siblings, 7 replies; 37+ messages in thread
From: Jarod Wilson @ 2015-04-08  6:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, Christoph Hellwig, Jens Axboe, Tejun Heo,
	Alexander Viro, Markus Pargmann, Stefan Weinhuber,
	Stefan Haberland, Sebastian Ott, Fabian Frederick, Ming Lei,
	David Herrmann, Mike Galbraith, Andrew Morton, Peter Zijlstra,
	nbd-general, linux-s390

For loop, the root cause is one ABBA and one AA lock dependency
issue, and the two are fixed by patch 2 and patch 3 each.

Another reason is from the trylock in blkdev_reread_part(), which
may cause partition scanning failure too sometimes when another task
is holding the bd_mutex. In the discussion[1], both Tejun and Christoph
suggests to replace the trylock with mutex_lock in blkdev_reread_part(),
also Christoph suggests to export blkdev_reread_part.

Following the discussion, this patchset exports blkdev_reread_part(), and
introduces lockless __blkdev_reread_part() for fixing loop's AA lock issue.
Then ioctl_by_bdev(BLKRRPART) in loop, nbd and dasd is replaced with
blkdev_reread_part(). In the last patch, trylock in blkdev_reread_part()
is replaced with mutex_lock, and some analysis is provided about the
conversion.

This is a slightly reworked set from what Ming sent a few days ago,
based on Christoph's feedback. I've tested this out quite succesfully
in the scenario that prompted my first patch as well.


[1], https://lkml.org/lkml/2015/1/26/137
[2], https://lkml.org/lkml/2015/3/31/888

Jarod Wilson (3):
  block: export blkdev_reread_part() and __blkdev_reread_part()
  block: loop: fix another reread part failure
  s390/block/dasd: remove obsolete while -EBUSY loop

Ming Lei (4):
  block: loop: don't hold lo_ctl_mutex in lo_open
  block: nbd: convert to blkdev_reread_part()
  block: dasd_genhd: convert to blkdev_reread_part
  block: replace trylock with mutex_lock in blkdev_reread_part()

 block/ioctl.c                   | 34 +++++++++++++++++++----
 drivers/block/loop.c            | 60 ++++++++++++++++++++++++++++++++++-------
 drivers/block/loop.h            |  1 +
 drivers/block/nbd.c             |  2 +-
 drivers/s390/block/dasd_genhd.c | 22 +++++----------
 include/linux/fs.h              |  3 +++
 6 files changed, 90 insertions(+), 32 deletions(-)

CC: Christoph Hellwig <hch@infradead.org>
CC: Jens Axboe <axboe@kernel.dk>
CC: Tejun Heo <tj@kernel.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Markus Pargmann <mpa@pengutronix.de>
CC: Stefan Weinhuber <wein@de.ibm.com>
CC: Stefan Haberland <stefan.haberland@de.ibm.com>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
CC: Fabian Frederick <fabf@skynet.be>
CC: Ming Lei <ming.lei@canonical.com>
CC: David Herrmann <dh.herrmann@gmail.com>
CC: Mike Galbraith <bitbucket@online.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: nbd-general@lists.sourceforge.net
CC: linux-s390@vger.kernel.org

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 1/7] block: export blkdev_reread_part() and __blkdev_reread_part()
  2015-04-08  6:23 ` [PATCH 0/7] block: reread partitions improvements Jarod Wilson
@ 2015-04-08  6:23   ` Jarod Wilson
  2015-04-08 14:50     ` Ming Lei
  2015-04-08  6:23   ` [PATCH 2/7] block: loop: don't hold lo_ctl_mutex in lo_open Jarod Wilson
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 37+ messages in thread
From: Jarod Wilson @ 2015-04-08  6:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, Christoph Hellwig, Jens Axboe, Tejun Heo,
	Alexander Viro, Markus Pargmann, Stefan Weinhuber,
	Stefan Haberland, Sebastian Ott, Fabian Frederick, Ming Lei,
	David Herrmann, Mike Galbraith, Andrew Morton, Peter Zijlstra,
	nbd-general, linux-s390

This patch exports blkdev_reread_part() for block drivers, also
introduce __blkdev_reread_part(), a lockless version.

For some drivers, such as loop, a reread of partitions can be run
from the release path, and bd_mutex may already be held prior to
calling ioctl_by_bdev(bdev, BLKRRPART, 0), so introduce a lockless
path for use in such cases.

CC: Christoph Hellwig <hch@infradead.org>
CC: Jens Axboe <axboe@kernel.dk>
CC: Tejun Heo <tj@kernel.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Markus Pargmann <mpa@pengutronix.de>
CC: Stefan Weinhuber <wein@de.ibm.com>
CC: Stefan Haberland <stefan.haberland@de.ibm.com>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
CC: Fabian Frederick <fabf@skynet.be>
CC: Ming Lei <ming.lei@canonical.com>
CC: David Herrmann <dh.herrmann@gmail.com>
CC: Mike Galbraith <bitbucket@online.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: nbd-general@lists.sourceforge.net
CC: linux-s390@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 block/ioctl.c      | 26 +++++++++++++++++++++++---
 include/linux/fs.h |  3 +++
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index 7d8befd..64a4fcb 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -150,21 +150,41 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
 	}
 }
 
-static int blkdev_reread_part(struct block_device *bdev)
+/*
+ * This is an exported API for the block driver, and will not
+ * acquire bd_mutex, leaving it up to the caller to handle
+ * any necessary locking.
+ */
+int __blkdev_reread_part(struct block_device *bdev)
 {
 	struct gendisk *disk = bdev->bd_disk;
-	int res;
 
 	if (!disk_part_scan_enabled(disk) || bdev != bdev->bd_contains)
 		return -EINVAL;
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
+
+	return rescan_partitions(disk, bdev);
+}
+EXPORT_SYMBOL(__blkdev_reread_part);
+
+/*
+ * This is an exported API for the block driver, and will
+ * acquire bd_mutex. Make sure you aren't calling it with
+ * bd_mutex already held, or we'll return -EBUSY.
+ */
+int blkdev_reread_part(struct block_device *bdev)
+{
+	int res;
+
 	if (!mutex_trylock(&bdev->bd_mutex))
 		return -EBUSY;
-	res = rescan_partitions(disk, bdev);
+	res = __blkdev_reread_part(bdev);
 	mutex_unlock(&bdev->bd_mutex);
+
 	return res;
 }
+EXPORT_SYMBOL(blkdev_reread_part);
 
 static int blk_ioctl_discard(struct block_device *bdev, uint64_t start,
 			     uint64_t len, int secure)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f4131e8..11398e3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2245,6 +2245,9 @@ extern struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
 extern struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode,
 					      void *holder);
 extern void blkdev_put(struct block_device *bdev, fmode_t mode);
+extern int __blkdev_reread_part(struct block_device *bdev);
+extern int blkdev_reread_part(struct block_device *bdev);
+
 #ifdef CONFIG_SYSFS
 extern int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk);
 extern void bd_unlink_disk_holder(struct block_device *bdev,
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 2/7] block: loop: don't hold lo_ctl_mutex in lo_open
  2015-04-08  6:23 ` [PATCH 0/7] block: reread partitions improvements Jarod Wilson
  2015-04-08  6:23   ` [PATCH 1/7] block: export blkdev_reread_part() and __blkdev_reread_part() Jarod Wilson
@ 2015-04-08  6:23   ` Jarod Wilson
  2015-04-08  6:50     ` Ming Lei
  2015-04-08  6:23   ` [PATCH 3/7] block: loop: fix another reread part failure Jarod Wilson
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 37+ messages in thread
From: Jarod Wilson @ 2015-04-08  6:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ming Lei, Christoph Hellwig, Jens Axboe, Tejun Heo,
	Alexander Viro, Markus Pargmann, Stefan Weinhuber,
	Stefan Haberland, Sebastian Ott, Fabian Frederick,
	David Herrmann, Mike Galbraith, Andrew Morton, Peter Zijlstra,
	nbd-general, linux-s390, Jarod Wilson

From: Ming Lei <ming.lei@canonical.com>

The lo_ctl_mutex is held for running all ioctl handlers, and
in some ioctl handlers, ioctl_by_bdev(BLKRRPART) is called for
rereading partitions, which requires bd_mutex.

So it is easy to cause failure because trylock(bd_mutex) may
fail inside blkdev_reread_part(), and follows the lock context:

blkid or other application:
	->open()
		->mutex_lock(bd_mutex)
		->lo_open()
			->mutex_lock(lo_ctl_mutex)

losetup(set fd ioctl):
	->mutex_lock(lo_ctl_mutex)
	->ioctl_by_bdev(BLKRRPART)
		->trylock(bd_mutex)

This patch trys to eliminate the ABBA lock dependency by removing
lo_ctl_mutext in lo_open() with the following approach:

1) introduce lo_open_mutex to protect lo_refcnt and avoid acquiring
lo_ctl_mutex in lo_open():
	- for open vs. add/del loop, no any problem because of loop_index_mutex
	- lo_open_mutex is used for syncing open() and loop_clr_fd()
	- both open() and release() have been serialized by bd_mutex already

2) don't hold lo_ctl_mutex for decreasing/checking lo_refcnt in
lo_release(), then lo_ctl_mutex is only required for the last release.

CC: Christoph Hellwig <hch@infradead.org>
CC: Jens Axboe <axboe@kernel.dk>
CC: Tejun Heo <tj@kernel.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Markus Pargmann <mpa@pengutronix.de>
CC: Stefan Weinhuber <wein@de.ibm.com>
CC: Stefan Haberland <stefan.haberland@de.ibm.com>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
CC: Fabian Frederick <fabf@skynet.be>
CC: Ming Lei <ming.lei@canonical.com>
CC: David Herrmann <dh.herrmann@gmail.com>
CC: Mike Galbraith <bitbucket@online.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: nbd-general@lists.sourceforge.net
CC: linux-s390@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/block/loop.c | 32 ++++++++++++++++++++++++++------
 drivers/block/loop.h |  1 +
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index d1f168b..81a6bc1 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -879,14 +879,18 @@ static int loop_clr_fd(struct loop_device *lo)
 	 * <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d
 	 * command to fail with EBUSY.
 	 */
+	mutex_lock(&lo->lo_open_mutex);
 	if (lo->lo_refcnt > 1) {
+		mutex_unlock(&lo->lo_open_mutex);
 		lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
 		mutex_unlock(&lo->lo_ctl_mutex);
 		return 0;
 	}
 
-	if (filp == NULL)
+	if (filp == NULL) {
+		mutex_unlock(&lo->lo_open_mutex);
 		return -EINVAL;
+	}
 
 	spin_lock_irq(&lo->lo_lock);
 	lo->lo_state = Lo_rundown;
@@ -919,6 +923,15 @@ static int loop_clr_fd(struct loop_device *lo)
 	lo->lo_state = Lo_unbound;
 	/* This is safe: open() is still holding a reference. */
 	module_put(THIS_MODULE);
+
+	/*
+	 * Unlock open_mutex for avoiding -EBUSY of rereading part:
+	 * - try to acquire bd_mutex from reread part
+	 * - another task is opening the loop with holding bd_mutex
+	 *   and trys to acquire open_mutex
+	 */
+	mutex_unlock(&lo->lo_open_mutex);
+
 	if (lo->lo_flags & LO_FLAGS_PARTSCAN && bdev)
 		ioctl_by_bdev(bdev, BLKRRPART, 0);
 	lo->lo_flags = 0;
@@ -1376,9 +1389,9 @@ static int lo_open(struct block_device *bdev, fmode_t mode)
 		goto out;
 	}
 
-	mutex_lock(&lo->lo_ctl_mutex);
+	mutex_lock(&lo->lo_open_mutex);
 	lo->lo_refcnt++;
-	mutex_unlock(&lo->lo_ctl_mutex);
+	mutex_unlock(&lo->lo_open_mutex);
 out:
 	mutex_unlock(&loop_index_mutex);
 	return err;
@@ -1387,13 +1400,16 @@ out:
 static void lo_release(struct gendisk *disk, fmode_t mode)
 {
 	struct loop_device *lo = disk->private_data;
-	int err;
+	int err, ref;
 
-	mutex_lock(&lo->lo_ctl_mutex);
+	mutex_lock(&lo->lo_open_mutex);
+	ref = --lo->lo_refcnt;
+	mutex_unlock(&lo->lo_open_mutex);
 
-	if (--lo->lo_refcnt)
+	if (ref)
 		goto out;
 
+	mutex_lock(&lo->lo_ctl_mutex);
 	if (lo->lo_flags & LO_FLAGS_AUTOCLEAR) {
 		/*
 		 * In autoclear mode, stop the loop thread
@@ -1646,6 +1662,7 @@ static int loop_add(struct loop_device **l, int i)
 		disk->flags |= GENHD_FL_NO_PART_SCAN;
 	disk->flags |= GENHD_FL_EXT_DEVT;
 	mutex_init(&lo->lo_ctl_mutex);
+	mutex_init(&lo->lo_open_mutex);
 	lo->lo_number		= i;
 	spin_lock_init(&lo->lo_lock);
 	disk->major		= LOOP_MAJOR;
@@ -1763,11 +1780,14 @@ static long loop_control_ioctl(struct file *file, unsigned int cmd,
 			mutex_unlock(&lo->lo_ctl_mutex);
 			break;
 		}
+		mutex_lock(&lo->lo_open_mutex);
 		if (lo->lo_refcnt > 0) {
 			ret = -EBUSY;
+			mutex_unlock(&lo->lo_open_mutex);
 			mutex_unlock(&lo->lo_ctl_mutex);
 			break;
 		}
+		mutex_unlock(&lo->lo_open_mutex);
 		lo->lo_disk->private_data = NULL;
 		mutex_unlock(&lo->lo_ctl_mutex);
 		idr_remove(&loop_index_idr, lo->lo_number);
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 301c27f..1b4acf2 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -59,6 +59,7 @@ struct loop_device {
 	bool			write_started;
 	int			lo_state;
 	struct mutex		lo_ctl_mutex;
+	struct mutex		lo_open_mutex;
 
 	struct request_queue	*lo_queue;
 	struct blk_mq_tag_set	tag_set;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 3/7] block: loop: fix another reread part failure
  2015-04-08  6:23 ` [PATCH 0/7] block: reread partitions improvements Jarod Wilson
  2015-04-08  6:23   ` [PATCH 1/7] block: export blkdev_reread_part() and __blkdev_reread_part() Jarod Wilson
  2015-04-08  6:23   ` [PATCH 2/7] block: loop: don't hold lo_ctl_mutex in lo_open Jarod Wilson
@ 2015-04-08  6:23   ` Jarod Wilson
  2015-04-08  6:23   ` [PATCH 4/7] block: nbd: convert to blkdev_reread_part() Jarod Wilson
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 37+ messages in thread
From: Jarod Wilson @ 2015-04-08  6:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, Christoph Hellwig, Jens Axboe, Tejun Heo,
	Alexander Viro, Markus Pargmann, Stefan Weinhuber,
	Stefan Haberland, Sebastian Ott, Fabian Frederick, Ming Lei,
	David Herrmann, Mike Galbraith, Andrew Morton, Peter Zijlstra,
	nbd-general, linux-s390

loop_clr_fd() can be run piggyback with lo_release(), and
under this situation, reread partition may always fail because
of bd_mutex which has been held in release path.

This patch detects the situation by the reference count, and
call blkdev_reread_part_nolock() to avoid acquiring the lock again.

In the meantime, this patch switches to new kernel APIs
of blkdev_reread_part() and blkdev_reread_part_nolock().

[jarod: this is a merger of my original patch, Ming's patch, and some
reworking for my reworked first patch in the series.]

CC: Christoph Hellwig <hch@infradead.org>
CC: Jens Axboe <axboe@kernel.dk>
CC: Tejun Heo <tj@kernel.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Markus Pargmann <mpa@pengutronix.de>
CC: Stefan Weinhuber <wein@de.ibm.com>
CC: Stefan Haberland <stefan.haberland@de.ibm.com>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
CC: Fabian Frederick <fabf@skynet.be>
CC: Ming Lei <ming.lei@canonical.com>
CC: David Herrmann <dh.herrmann@gmail.com>
CC: Mike Galbraith <bitbucket@online.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: nbd-general@lists.sourceforge.net
CC: linux-s390@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/block/loop.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 81a6bc1..ab5c678 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -528,6 +528,26 @@ static int loop_flush(struct loop_device *lo)
 	return loop_switch(lo, NULL);
 }
 
+static void loop_reread_partitions(struct loop_device *lo,
+				   struct block_device *bdev)
+{
+	int rc;
+	bool in_release;
+
+	mutex_lock(&lo->lo_open_mutex);
+	in_release = lo->lo_refcnt == 0;
+	mutex_unlock(&lo->lo_open_mutex);
+
+	/* bd_mutex has been held already in release path */
+	if (in_release)
+		rc = __blkdev_reread_part(bdev);
+	else
+		rc = blkdev_reread_part(bdev);
+	if (rc)
+		pr_warn("%s: partition scan of loop%d (%s) failed (rc=%d)\n",
+			__func__, lo->lo_number, lo->lo_file_name, rc);
+}
+
 /*
  * loop_change_fd switched the backing store of a loopback device to
  * a new file. This is useful for operating system installers to free up
@@ -576,7 +596,7 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
 
 	fput(old_file);
 	if (lo->lo_flags & LO_FLAGS_PARTSCAN)
-		ioctl_by_bdev(bdev, BLKRRPART, 0);
+		loop_reread_partitions(lo, bdev);
 	return 0;
 
  out_putf:
@@ -807,7 +827,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	if (part_shift)
 		lo->lo_flags |= LO_FLAGS_PARTSCAN;
 	if (lo->lo_flags & LO_FLAGS_PARTSCAN)
-		ioctl_by_bdev(bdev, BLKRRPART, 0);
+		loop_reread_partitions(lo, bdev);
 
 	/* Grab the block_device to prevent its destruction after we
 	 * put /dev/loopXX inode. Later in loop_clr_fd() we bdput(bdev).
@@ -933,7 +953,7 @@ static int loop_clr_fd(struct loop_device *lo)
 	mutex_unlock(&lo->lo_open_mutex);
 
 	if (lo->lo_flags & LO_FLAGS_PARTSCAN && bdev)
-		ioctl_by_bdev(bdev, BLKRRPART, 0);
+		loop_reread_partitions(lo, bdev);
 	lo->lo_flags = 0;
 	if (!part_shift)
 		lo->lo_disk->flags |= GENHD_FL_NO_PART_SCAN;
@@ -1008,7 +1028,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
 	     !(lo->lo_flags & LO_FLAGS_PARTSCAN)) {
 		lo->lo_flags |= LO_FLAGS_PARTSCAN;
 		lo->lo_disk->flags &= ~GENHD_FL_NO_PART_SCAN;
-		ioctl_by_bdev(lo->lo_device, BLKRRPART, 0);
+		loop_reread_partitions(lo, lo->lo_device);
 	}
 
 	lo->lo_encrypt_key_size = info->lo_encrypt_key_size;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 4/7] block: nbd: convert to blkdev_reread_part()
  2015-04-08  6:23 ` [PATCH 0/7] block: reread partitions improvements Jarod Wilson
                     ` (2 preceding siblings ...)
  2015-04-08  6:23   ` [PATCH 3/7] block: loop: fix another reread part failure Jarod Wilson
@ 2015-04-08  6:23   ` Jarod Wilson
  2015-04-08  6:23   ` [PATCH 5/7] block: dasd_genhd: convert to blkdev_reread_part Jarod Wilson
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 37+ messages in thread
From: Jarod Wilson @ 2015-04-08  6:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ming Lei, Christoph Hellwig, Jens Axboe, Tejun Heo,
	Alexander Viro, Markus Pargmann, Stefan Weinhuber,
	Stefan Haberland, Sebastian Ott, Fabian Frederick,
	David Herrmann, Mike Galbraith, Andrew Morton, Peter Zijlstra,
	nbd-general, linux-s390, Jarod Wilson

From: Ming Lei <ming.lei@canonical.com>

CC: Christoph Hellwig <hch@infradead.org>
CC: Jens Axboe <axboe@kernel.dk>
CC: Tejun Heo <tj@kernel.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Markus Pargmann <mpa@pengutronix.de>
CC: Stefan Weinhuber <wein@de.ibm.com>
CC: Stefan Haberland <stefan.haberland@de.ibm.com>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
CC: Fabian Frederick <fabf@skynet.be>
CC: Ming Lei <ming.lei@canonical.com>
CC: David Herrmann <dh.herrmann@gmail.com>
CC: Mike Galbraith <bitbucket@online.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: nbd-general@lists.sourceforge.net
CC: linux-s390@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/block/nbd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index a98c41f..7e9d26e 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -734,7 +734,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		bdev->bd_inode->i_size = 0;
 		set_capacity(nbd->disk, 0);
 		if (max_part > 0)
-			ioctl_by_bdev(bdev, BLKRRPART, 0);
+			blkdev_reread_part(bdev);
 		if (nbd->disconnect) /* user requested, ignore socket errors */
 			return 0;
 		return nbd->harderror;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 5/7] block: dasd_genhd: convert to blkdev_reread_part
  2015-04-08  6:23 ` [PATCH 0/7] block: reread partitions improvements Jarod Wilson
                     ` (3 preceding siblings ...)
  2015-04-08  6:23   ` [PATCH 4/7] block: nbd: convert to blkdev_reread_part() Jarod Wilson
@ 2015-04-08  6:23   ` Jarod Wilson
  2015-04-08  6:23   ` [PATCH 6/7] block: replace trylock with mutex_lock in blkdev_reread_part() Jarod Wilson
  2015-04-08  6:23   ` [PATCH 7/7] s390/block/dasd: remove obsolete while -EBUSY loop Jarod Wilson
  6 siblings, 0 replies; 37+ messages in thread
From: Jarod Wilson @ 2015-04-08  6:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ming Lei, Christoph Hellwig, Jens Axboe, Tejun Heo,
	Alexander Viro, Markus Pargmann, Stefan Weinhuber,
	Stefan Haberland, Sebastian Ott, Fabian Frederick,
	David Herrmann, Mike Galbraith, Andrew Morton, Peter Zijlstra,
	nbd-general, linux-s390, Jarod Wilson

From: Ming Lei <ming.lei@canonical.com>

Also remove the obsolete comment.

CC: Christoph Hellwig <hch@infradead.org>
CC: Jens Axboe <axboe@kernel.dk>
CC: Tejun Heo <tj@kernel.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Markus Pargmann <mpa@pengutronix.de>
CC: Stefan Weinhuber <wein@de.ibm.com>
CC: Stefan Haberland <stefan.haberland@de.ibm.com>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
CC: Fabian Frederick <fabf@skynet.be>
CC: Ming Lei <ming.lei@canonical.com>
CC: David Herrmann <dh.herrmann@gmail.com>
CC: Mike Galbraith <bitbucket@online.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: nbd-general@lists.sourceforge.net
CC: linux-s390@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/s390/block/dasd_genhd.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/s390/block/dasd_genhd.c b/drivers/s390/block/dasd_genhd.c
index 90f39f7..8026585 100644
--- a/drivers/s390/block/dasd_genhd.c
+++ b/drivers/s390/block/dasd_genhd.c
@@ -116,14 +116,11 @@ int dasd_scan_partitions(struct dasd_block *block)
 			      rc);
 		return -ENODEV;
 	}
-	/*
-	 * See fs/partition/check.c:register_disk,rescan_partitions
-	 * Can't call rescan_partitions directly. Use ioctl.
-	 */
-	rc = ioctl_by_bdev(bdev, BLKRRPART, 0);
+
+	rc = blkdev_reread_part(bdev);
 	while (rc == -EBUSY && retry > 0) {
 		schedule();
-		rc = ioctl_by_bdev(bdev, BLKRRPART, 0);
+		rc = blkdev_reread_part(bdev);
 		retry--;
 		DBF_DEV_EVENT(DBF_ERR, block->base,
 			      "scan partitions error, retry %d rc %d",
@@ -138,7 +135,7 @@ int dasd_scan_partitions(struct dasd_block *block)
 	 * dasd_generic_set_offline). As long as the partition
 	 * detection is running no offline should be allowed. That
 	 * is why the assignment to device->bdev is done AFTER
-	 * the BLKRRPART ioctl.
+	 * the blkdev_reread_part() call.
 	 */
 	block->bdev = bdev;
 	return 0;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 6/7] block: replace trylock with mutex_lock in blkdev_reread_part()
  2015-04-08  6:23 ` [PATCH 0/7] block: reread partitions improvements Jarod Wilson
                     ` (4 preceding siblings ...)
  2015-04-08  6:23   ` [PATCH 5/7] block: dasd_genhd: convert to blkdev_reread_part Jarod Wilson
@ 2015-04-08  6:23   ` Jarod Wilson
  2015-04-08  6:23   ` [PATCH 7/7] s390/block/dasd: remove obsolete while -EBUSY loop Jarod Wilson
  6 siblings, 0 replies; 37+ messages in thread
From: Jarod Wilson @ 2015-04-08  6:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ming Lei, Christoph Hellwig, Jens Axboe, Tejun Heo,
	Alexander Viro, Markus Pargmann, Stefan Weinhuber,
	Stefan Haberland, Sebastian Ott, Fabian Frederick,
	David Herrmann, Mike Galbraith, Andrew Morton, Peter Zijlstra,
	nbd-general, linux-s390, Jarod Wilson

From: Ming Lei <ming.lei@canonical.com>

The only possible problem of using mutex_lock() instead of trylock
is about deadlock.

If there aren't any locks held before calling blkdev_reread_part(lock),
deadlock can't be caused by this conversion.

If there are locks held before calling blkdev_reread_part(lock),
and if these locks arn't required in open, close handler and I/O
path, deadlock shouldn't be caused too.

Both user space's ioctl(BLKRRPART) and md_setup_drive() from
init/do_mounts_md.c belongs to the 1st case, so the conversion is safe
for the two cases.

For loop, the previous patches in this pathset has fixed the ABBA lock
dependency, so the conversion is OK.

For nbd, tx_lock is held when calling the function:

        - both open and release won't hold the lock
        - when blkdev_reread_part() is run, I/O thread has been stopped
        already, so tx_lock won't be acquired in I/O path at that time.
        - so the conversion won't cause deadlock for nbd

For dasd, both dasd_open(), dasd_release() and request function don't
acquire any mutex/semphone, so the conversion should be safe.

[jarod: update for modifications earlier in series.]

CC: Christoph Hellwig <hch@infradead.org>
CC: Jens Axboe <axboe@kernel.dk>
CC: Tejun Heo <tj@kernel.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Markus Pargmann <mpa@pengutronix.de>
CC: Stefan Weinhuber <wein@de.ibm.com>
CC: Stefan Haberland <stefan.haberland@de.ibm.com>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
CC: Fabian Frederick <fabf@skynet.be>
CC: Ming Lei <ming.lei@canonical.com>
CC: David Herrmann <dh.herrmann@gmail.com>
CC: Mike Galbraith <bitbucket@online.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: nbd-general@lists.sourceforge.net
CC: linux-s390@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 block/ioctl.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index 64a4fcb..47c8e6d 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -170,15 +170,19 @@ EXPORT_SYMBOL(__blkdev_reread_part);
 
 /*
  * This is an exported API for the block driver, and will
- * acquire bd_mutex. Make sure you aren't calling it with
- * bd_mutex already held, or we'll return -EBUSY.
+ * acquire bd_mutex.
+ *
+ * Make sure held locks aren't required in open()/close()
+ * handlers and I/O paths to avoid an ABBA deadlock:
+ * - bd_mutex is held before calling a block driver's open/close
+ *   handler
+ * - reading a partition table may submit I/O to the block device
  */
 int blkdev_reread_part(struct block_device *bdev)
 {
 	int res;
 
-	if (!mutex_trylock(&bdev->bd_mutex))
-		return -EBUSY;
+	mutex_lock(&bdev->bd_mutex);
 	res = __blkdev_reread_part(bdev);
 	mutex_unlock(&bdev->bd_mutex);
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 7/7] s390/block/dasd: remove obsolete while -EBUSY loop
  2015-04-08  6:23 ` [PATCH 0/7] block: reread partitions improvements Jarod Wilson
                     ` (5 preceding siblings ...)
  2015-04-08  6:23   ` [PATCH 6/7] block: replace trylock with mutex_lock in blkdev_reread_part() Jarod Wilson
@ 2015-04-08  6:23   ` Jarod Wilson
  6 siblings, 0 replies; 37+ messages in thread
From: Jarod Wilson @ 2015-04-08  6:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, Christoph Hellwig, Jens Axboe, Tejun Heo,
	Alexander Viro, Markus Pargmann, Stefan Weinhuber,
	Stefan Haberland, Sebastian Ott, Fabian Frederick, Ming Lei,
	David Herrmann, Mike Galbraith, Andrew Morton, Peter Zijlstra,
	nbd-general, linux-s390

With the mutex_trylock bit gone from blkdev_reread_part(), the retry logic
in dasd_scan_partitions() shouldn't be necessary.

CC: Christoph Hellwig <hch@infradead.org>
CC: Jens Axboe <axboe@kernel.dk>
CC: Tejun Heo <tj@kernel.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Markus Pargmann <mpa@pengutronix.de>
CC: Stefan Weinhuber <wein@de.ibm.com>
CC: Stefan Haberland <stefan.haberland@de.ibm.com>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
CC: Fabian Frederick <fabf@skynet.be>
CC: Ming Lei <ming.lei@canonical.com>
CC: David Herrmann <dh.herrmann@gmail.com>
CC: Mike Galbraith <bitbucket@online.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: nbd-general@lists.sourceforge.net
CC: linux-s390@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/s390/block/dasd_genhd.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/s390/block/dasd_genhd.c b/drivers/s390/block/dasd_genhd.c
index 8026585..783d826 100644
--- a/drivers/s390/block/dasd_genhd.c
+++ b/drivers/s390/block/dasd_genhd.c
@@ -99,9 +99,8 @@ void dasd_gendisk_free(struct dasd_block *block)
 int dasd_scan_partitions(struct dasd_block *block)
 {
 	struct block_device *bdev;
-	int retry, rc;
+	int rc;
 
-	retry = 5;
 	bdev = bdget_disk(block->gdp, 0);
 	if (!bdev) {
 		DBF_DEV_EVENT(DBF_ERR, block->base, "%s",
@@ -118,14 +117,8 @@ int dasd_scan_partitions(struct dasd_block *block)
 	}
 
 	rc = blkdev_reread_part(bdev);
-	while (rc == -EBUSY && retry > 0) {
-		schedule();
-		rc = blkdev_reread_part(bdev);
-		retry--;
-		DBF_DEV_EVENT(DBF_ERR, block->base,
-			      "scan partitions error, retry %d rc %d",
-			      retry, rc);
-	}
+	DBF_DEV_EVENT(DBF_ERR, block->base,
+		      "scan partitions error, rc %d", rc);
 
 	/*
 	 * Since the matching blkdev_put call to the blkdev_get in
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/7] block: loop: don't hold lo_ctl_mutex in lo_open
  2015-04-08  6:23   ` [PATCH 2/7] block: loop: don't hold lo_ctl_mutex in lo_open Jarod Wilson
@ 2015-04-08  6:50     ` Ming Lei
  2015-04-08 13:40       ` Jarod Wilson
  0 siblings, 1 reply; 37+ messages in thread
From: Ming Lei @ 2015-04-08  6:50 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: Linux Kernel Mailing List, Christoph Hellwig, Jens Axboe,
	Tejun Heo, Alexander Viro, Markus Pargmann, Stefan Weinhuber,
	Stefan Haberland, Sebastian Ott, Fabian Frederick,
	David Herrmann, Mike Galbraith, Andrew Morton, Peter Zijlstra,
	nbd-general, linux-s390

Hi Jarod,

On Wed, Apr 8, 2015 at 2:23 PM, Jarod Wilson <jarod@redhat.com> wrote:
> From: Ming Lei <ming.lei@canonical.com>
>
> The lo_ctl_mutex is held for running all ioctl handlers, and
> in some ioctl handlers, ioctl_by_bdev(BLKRRPART) is called for
> rereading partitions, which requires bd_mutex.
>
> So it is easy to cause failure because trylock(bd_mutex) may
> fail inside blkdev_reread_part(), and follows the lock context:
>
> blkid or other application:
>         ->open()
>                 ->mutex_lock(bd_mutex)
>                 ->lo_open()
>                         ->mutex_lock(lo_ctl_mutex)
>
> losetup(set fd ioctl):
>         ->mutex_lock(lo_ctl_mutex)
>         ->ioctl_by_bdev(BLKRRPART)
>                 ->trylock(bd_mutex)
>
> This patch trys to eliminate the ABBA lock dependency by removing
> lo_ctl_mutext in lo_open() with the following approach:
>
> 1) introduce lo_open_mutex to protect lo_refcnt and avoid acquiring
> lo_ctl_mutex in lo_open():

It is a bit quick since I said the lo_open_mutex can be removed,
and Christoph agreed that too.

So looks we still need to post another version, :-)

>         - for open vs. add/del loop, no any problem because of loop_index_mutex
>         - lo_open_mutex is used for syncing open() and loop_clr_fd()
>         - both open() and release() have been serialized by bd_mutex already
>
> 2) don't hold lo_ctl_mutex for decreasing/checking lo_refcnt in
> lo_release(), then lo_ctl_mutex is only required for the last release.
>
> CC: Christoph Hellwig <hch@infradead.org>
> CC: Jens Axboe <axboe@kernel.dk>
> CC: Tejun Heo <tj@kernel.org>
> CC: Alexander Viro <viro@zeniv.linux.org.uk>
> CC: Markus Pargmann <mpa@pengutronix.de>
> CC: Stefan Weinhuber <wein@de.ibm.com>
> CC: Stefan Haberland <stefan.haberland@de.ibm.com>
> CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
> CC: Fabian Frederick <fabf@skynet.be>
> CC: Ming Lei <ming.lei@canonical.com>
> CC: David Herrmann <dh.herrmann@gmail.com>
> CC: Mike Galbraith <bitbucket@online.de>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: nbd-general@lists.sourceforge.net
> CC: linux-s390@vger.kernel.org
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> Signed-off-by: Jarod Wilson <jarod@redhat.com>
> ---
>  drivers/block/loop.c | 32 ++++++++++++++++++++++++++------
>  drivers/block/loop.h |  1 +
>  2 files changed, 27 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index d1f168b..81a6bc1 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -879,14 +879,18 @@ static int loop_clr_fd(struct loop_device *lo)
>          * <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d
>          * command to fail with EBUSY.
>          */
> +       mutex_lock(&lo->lo_open_mutex);
>         if (lo->lo_refcnt > 1) {
> +               mutex_unlock(&lo->lo_open_mutex);
>                 lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
>                 mutex_unlock(&lo->lo_ctl_mutex);
>                 return 0;
>         }
>
> -       if (filp == NULL)
> +       if (filp == NULL) {
> +               mutex_unlock(&lo->lo_open_mutex);
>                 return -EINVAL;
> +       }
>
>         spin_lock_irq(&lo->lo_lock);
>         lo->lo_state = Lo_rundown;
> @@ -919,6 +923,15 @@ static int loop_clr_fd(struct loop_device *lo)
>         lo->lo_state = Lo_unbound;
>         /* This is safe: open() is still holding a reference. */
>         module_put(THIS_MODULE);
> +
> +       /*
> +        * Unlock open_mutex for avoiding -EBUSY of rereading part:
> +        * - try to acquire bd_mutex from reread part
> +        * - another task is opening the loop with holding bd_mutex
> +        *   and trys to acquire open_mutex
> +        */
> +       mutex_unlock(&lo->lo_open_mutex);
> +
>         if (lo->lo_flags & LO_FLAGS_PARTSCAN && bdev)
>                 ioctl_by_bdev(bdev, BLKRRPART, 0);
>         lo->lo_flags = 0;
> @@ -1376,9 +1389,9 @@ static int lo_open(struct block_device *bdev, fmode_t mode)
>                 goto out;
>         }
>
> -       mutex_lock(&lo->lo_ctl_mutex);
> +       mutex_lock(&lo->lo_open_mutex);
>         lo->lo_refcnt++;
> -       mutex_unlock(&lo->lo_ctl_mutex);
> +       mutex_unlock(&lo->lo_open_mutex);
>  out:
>         mutex_unlock(&loop_index_mutex);
>         return err;
> @@ -1387,13 +1400,16 @@ out:
>  static void lo_release(struct gendisk *disk, fmode_t mode)
>  {
>         struct loop_device *lo = disk->private_data;
> -       int err;
> +       int err, ref;
>
> -       mutex_lock(&lo->lo_ctl_mutex);
> +       mutex_lock(&lo->lo_open_mutex);
> +       ref = --lo->lo_refcnt;
> +       mutex_unlock(&lo->lo_open_mutex);
>
> -       if (--lo->lo_refcnt)
> +       if (ref)
>                 goto out;
>
> +       mutex_lock(&lo->lo_ctl_mutex);
>         if (lo->lo_flags & LO_FLAGS_AUTOCLEAR) {
>                 /*
>                  * In autoclear mode, stop the loop thread
> @@ -1646,6 +1662,7 @@ static int loop_add(struct loop_device **l, int i)
>                 disk->flags |= GENHD_FL_NO_PART_SCAN;
>         disk->flags |= GENHD_FL_EXT_DEVT;
>         mutex_init(&lo->lo_ctl_mutex);
> +       mutex_init(&lo->lo_open_mutex);
>         lo->lo_number           = i;
>         spin_lock_init(&lo->lo_lock);
>         disk->major             = LOOP_MAJOR;
> @@ -1763,11 +1780,14 @@ static long loop_control_ioctl(struct file *file, unsigned int cmd,
>                         mutex_unlock(&lo->lo_ctl_mutex);
>                         break;
>                 }
> +               mutex_lock(&lo->lo_open_mutex);
>                 if (lo->lo_refcnt > 0) {
>                         ret = -EBUSY;
> +                       mutex_unlock(&lo->lo_open_mutex);
>                         mutex_unlock(&lo->lo_ctl_mutex);
>                         break;
>                 }
> +               mutex_unlock(&lo->lo_open_mutex);
>                 lo->lo_disk->private_data = NULL;
>                 mutex_unlock(&lo->lo_ctl_mutex);
>                 idr_remove(&loop_index_idr, lo->lo_number);
> diff --git a/drivers/block/loop.h b/drivers/block/loop.h
> index 301c27f..1b4acf2 100644
> --- a/drivers/block/loop.h
> +++ b/drivers/block/loop.h
> @@ -59,6 +59,7 @@ struct loop_device {
>         bool                    write_started;
>         int                     lo_state;
>         struct mutex            lo_ctl_mutex;
> +       struct mutex            lo_open_mutex;
>
>         struct request_queue    *lo_queue;
>         struct blk_mq_tag_set   tag_set;
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/7] block: loop: don't hold lo_ctl_mutex in lo_open
  2015-04-08  6:50     ` Ming Lei
@ 2015-04-08 13:40       ` Jarod Wilson
  2015-04-08 14:00         ` Jarod Wilson
  0 siblings, 1 reply; 37+ messages in thread
From: Jarod Wilson @ 2015-04-08 13:40 UTC (permalink / raw)
  To: Ming Lei
  Cc: Linux Kernel Mailing List, Christoph Hellwig, Jens Axboe,
	Tejun Heo, Alexander Viro, Markus Pargmann, Stefan Weinhuber,
	Stefan Haberland, Sebastian Ott, Fabian Frederick,
	David Herrmann, Mike Galbraith, Andrew Morton, Peter Zijlstra,
	nbd-general, linux-s390

On Wed, Apr 08, 2015 at 02:50:59PM +0800, Ming Lei wrote:
> Hi Jarod,
> 
> On Wed, Apr 8, 2015 at 2:23 PM, Jarod Wilson <jarod@redhat.com> wrote:
> > From: Ming Lei <ming.lei@canonical.com>
> >
> > The lo_ctl_mutex is held for running all ioctl handlers, and
> > in some ioctl handlers, ioctl_by_bdev(BLKRRPART) is called for
> > rereading partitions, which requires bd_mutex.
> >
> > So it is easy to cause failure because trylock(bd_mutex) may
> > fail inside blkdev_reread_part(), and follows the lock context:
> >
> > blkid or other application:
> >         ->open()
> >                 ->mutex_lock(bd_mutex)
> >                 ->lo_open()
> >                         ->mutex_lock(lo_ctl_mutex)
> >
> > losetup(set fd ioctl):
> >         ->mutex_lock(lo_ctl_mutex)
> >         ->ioctl_by_bdev(BLKRRPART)
> >                 ->trylock(bd_mutex)
> >
> > This patch trys to eliminate the ABBA lock dependency by removing
> > lo_ctl_mutext in lo_open() with the following approach:
> >
> > 1) introduce lo_open_mutex to protect lo_refcnt and avoid acquiring
> > lo_ctl_mutex in lo_open():
> 
> It is a bit quick since I said the lo_open_mutex can be removed,
> and Christoph agreed that too.
> 
> So looks we still need to post another version, :-)

Ah. I missed that bit. Just trying to keep up momentum.

-- 
Jarod Wilson
jarod@redhat.com


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/7] block: loop: don't hold lo_ctl_mutex in lo_open
  2015-04-08 13:40       ` Jarod Wilson
@ 2015-04-08 14:00         ` Jarod Wilson
  2015-04-08 14:20           ` Ming Lei
  0 siblings, 1 reply; 37+ messages in thread
From: Jarod Wilson @ 2015-04-08 14:00 UTC (permalink / raw)
  To: Ming Lei
  Cc: Linux Kernel Mailing List, Christoph Hellwig, Jens Axboe,
	Tejun Heo, Alexander Viro, Markus Pargmann, Stefan Weinhuber,
	Stefan Haberland, Sebastian Ott, Fabian Frederick,
	David Herrmann, Mike Galbraith, Andrew Morton, Peter Zijlstra,
	nbd-general, linux-s390

On Wed, Apr 08, 2015 at 09:40:34AM -0400, Jarod Wilson wrote:
> On Wed, Apr 08, 2015 at 02:50:59PM +0800, Ming Lei wrote:
> > Hi Jarod,
> > 
> > On Wed, Apr 8, 2015 at 2:23 PM, Jarod Wilson <jarod@redhat.com> wrote:
> > > From: Ming Lei <ming.lei@canonical.com>
> > >
> > > The lo_ctl_mutex is held for running all ioctl handlers, and
> > > in some ioctl handlers, ioctl_by_bdev(BLKRRPART) is called for
> > > rereading partitions, which requires bd_mutex.
> > >
> > > So it is easy to cause failure because trylock(bd_mutex) may
> > > fail inside blkdev_reread_part(), and follows the lock context:
> > >
> > > blkid or other application:
> > >         ->open()
> > >                 ->mutex_lock(bd_mutex)
> > >                 ->lo_open()
> > >                         ->mutex_lock(lo_ctl_mutex)
> > >
> > > losetup(set fd ioctl):
> > >         ->mutex_lock(lo_ctl_mutex)
> > >         ->ioctl_by_bdev(BLKRRPART)
> > >                 ->trylock(bd_mutex)
> > >
> > > This patch trys to eliminate the ABBA lock dependency by removing
> > > lo_ctl_mutext in lo_open() with the following approach:
> > >
> > > 1) introduce lo_open_mutex to protect lo_refcnt and avoid acquiring
> > > lo_ctl_mutex in lo_open():
> > 
> > It is a bit quick since I said the lo_open_mutex can be removed,
> > and Christoph agreed that too.
> > 
> > So looks we still need to post another version, :-)
> 
> Ah. I missed that bit. Just trying to keep up momentum.

Are you working on that atomic_t version then, or should I dive into it?

-- 
Jarod Wilson
jarod@redhat.com


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/7] block: loop: don't hold lo_ctl_mutex in lo_open
  2015-04-08 14:00         ` Jarod Wilson
@ 2015-04-08 14:20           ` Ming Lei
  2015-04-08 15:28             ` Jarod Wilson
  0 siblings, 1 reply; 37+ messages in thread
From: Ming Lei @ 2015-04-08 14:20 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: Linux Kernel Mailing List, Christoph Hellwig, Jens Axboe,
	Tejun Heo, Alexander Viro, Markus Pargmann, Stefan Weinhuber,
	Stefan Haberland, Sebastian Ott, Fabian Frederick,
	David Herrmann, Mike Galbraith, Andrew Morton, Peter Zijlstra,
	nbd-general, linux-s390

On Wed, Apr 8, 2015 at 10:00 PM, Jarod Wilson <jarod@redhat.com> wrote:
> On Wed, Apr 08, 2015 at 09:40:34AM -0400, Jarod Wilson wrote:
>> On Wed, Apr 08, 2015 at 02:50:59PM +0800, Ming Lei wrote:
>> > Hi Jarod,
>> >
>> > On Wed, Apr 8, 2015 at 2:23 PM, Jarod Wilson <jarod@redhat.com> wrote:
>> > > From: Ming Lei <ming.lei@canonical.com>
>> > >
>> > > The lo_ctl_mutex is held for running all ioctl handlers, and
>> > > in some ioctl handlers, ioctl_by_bdev(BLKRRPART) is called for
>> > > rereading partitions, which requires bd_mutex.
>> > >
>> > > So it is easy to cause failure because trylock(bd_mutex) may
>> > > fail inside blkdev_reread_part(), and follows the lock context:
>> > >
>> > > blkid or other application:
>> > >         ->open()
>> > >                 ->mutex_lock(bd_mutex)
>> > >                 ->lo_open()
>> > >                         ->mutex_lock(lo_ctl_mutex)
>> > >
>> > > losetup(set fd ioctl):
>> > >         ->mutex_lock(lo_ctl_mutex)
>> > >         ->ioctl_by_bdev(BLKRRPART)
>> > >                 ->trylock(bd_mutex)
>> > >
>> > > This patch trys to eliminate the ABBA lock dependency by removing
>> > > lo_ctl_mutext in lo_open() with the following approach:
>> > >
>> > > 1) introduce lo_open_mutex to protect lo_refcnt and avoid acquiring
>> > > lo_ctl_mutex in lo_open():
>> >
>> > It is a bit quick since I said the lo_open_mutex can be removed,
>> > and Christoph agreed that too.
>> >
>> > So looks we still need to post another version, :-)
>>
>> Ah. I missed that bit. Just trying to keep up momentum.
>
> Are you working on that atomic_t version then, or should I dive into it?

It has been ready, and I will post them out once it passes my tests.

Thanks,

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/7] block: export blkdev_reread_part() and __blkdev_reread_part()
  2015-04-08  6:23   ` [PATCH 1/7] block: export blkdev_reread_part() and __blkdev_reread_part() Jarod Wilson
@ 2015-04-08 14:50     ` Ming Lei
  2015-04-08 15:03       ` Peter Zijlstra
  0 siblings, 1 reply; 37+ messages in thread
From: Ming Lei @ 2015-04-08 14:50 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: Linux Kernel Mailing List, Christoph Hellwig, Jens Axboe,
	Tejun Heo, Alexander Viro, Markus Pargmann, Stefan Weinhuber,
	Stefan Haberland, Sebastian Ott, Fabian Frederick,
	David Herrmann, Mike Galbraith, Andrew Morton, Peter Zijlstra,
	nbd-general, linux-s390

On Wed, Apr 8, 2015 at 2:23 PM, Jarod Wilson <jarod@redhat.com> wrote:
> This patch exports blkdev_reread_part() for block drivers, also
> introduce __blkdev_reread_part(), a lockless version.

Generally the term of lockless is used when lock is avoided,
and that isn't the case of __blkdev_reread_part(), because
bd_mutex is definitely required here.

>
> For some drivers, such as loop, a reread of partitions can be run
> from the release path, and bd_mutex may already be held prior to
> calling ioctl_by_bdev(bdev, BLKRRPART, 0), so introduce a lockless
> path for use in such cases.
>
> CC: Christoph Hellwig <hch@infradead.org>
> CC: Jens Axboe <axboe@kernel.dk>
> CC: Tejun Heo <tj@kernel.org>
> CC: Alexander Viro <viro@zeniv.linux.org.uk>
> CC: Markus Pargmann <mpa@pengutronix.de>
> CC: Stefan Weinhuber <wein@de.ibm.com>
> CC: Stefan Haberland <stefan.haberland@de.ibm.com>
> CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
> CC: Fabian Frederick <fabf@skynet.be>
> CC: Ming Lei <ming.lei@canonical.com>
> CC: David Herrmann <dh.herrmann@gmail.com>
> CC: Mike Galbraith <bitbucket@online.de>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: nbd-general@lists.sourceforge.net
> CC: linux-s390@vger.kernel.org
> Signed-off-by: Jarod Wilson <jarod@redhat.com>
> ---
>  block/ioctl.c      | 26 +++++++++++++++++++++++---
>  include/linux/fs.h |  3 +++
>  2 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/block/ioctl.c b/block/ioctl.c
> index 7d8befd..64a4fcb 100644
> --- a/block/ioctl.c
> +++ b/block/ioctl.c
> @@ -150,21 +150,41 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
>         }
>  }
>
> -static int blkdev_reread_part(struct block_device *bdev)
> +/*
> + * This is an exported API for the block driver, and will not
> + * acquire bd_mutex, leaving it up to the caller to handle
> + * any necessary locking.

Actually, the function is introduced and should be used in case
that bd_mutex has been held already, such as clearing fd in
loop release().

> + */
> +int __blkdev_reread_part(struct block_device *bdev)
>  {
>         struct gendisk *disk = bdev->bd_disk;
> -       int res;
>
>         if (!disk_part_scan_enabled(disk) || bdev != bdev->bd_contains)
>                 return -EINVAL;
>         if (!capable(CAP_SYS_ADMIN))
>                 return -EACCES;
> +
> +       return rescan_partitions(disk, bdev);
> +}
> +EXPORT_SYMBOL(__blkdev_reread_part);
> +
> +/*
> + * This is an exported API for the block driver, and will
> + * acquire bd_mutex. Make sure you aren't calling it with
> + * bd_mutex already held, or we'll return -EBUSY.

Strictly speaking, it should be "Make sure you aren't calling it
with bd_mutex already held in current context".

> + */
> +int blkdev_reread_part(struct block_device *bdev)
> +{
> +       int res;
> +
>         if (!mutex_trylock(&bdev->bd_mutex))
>                 return -EBUSY;
> -       res = rescan_partitions(disk, bdev);
> +       res = __blkdev_reread_part(bdev);
>         mutex_unlock(&bdev->bd_mutex);
> +
>         return res;
>  }
> +EXPORT_SYMBOL(blkdev_reread_part);
>
>  static int blk_ioctl_discard(struct block_device *bdev, uint64_t start,
>                              uint64_t len, int secure)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index f4131e8..11398e3 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2245,6 +2245,9 @@ extern struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
>  extern struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode,
>                                               void *holder);
>  extern void blkdev_put(struct block_device *bdev, fmode_t mode);
> +extern int __blkdev_reread_part(struct block_device *bdev);
> +extern int blkdev_reread_part(struct block_device *bdev);
> +
>  #ifdef CONFIG_SYSFS
>  extern int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk);
>  extern void bd_unlink_disk_holder(struct block_device *bdev,
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/7] block: export blkdev_reread_part() and __blkdev_reread_part()
  2015-04-08 14:50     ` Ming Lei
@ 2015-04-08 15:03       ` Peter Zijlstra
  2015-04-08 15:27         ` Jarod Wilson
  2015-04-08 15:28         ` Ming Lei
  0 siblings, 2 replies; 37+ messages in thread
From: Peter Zijlstra @ 2015-04-08 15:03 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jarod Wilson, Linux Kernel Mailing List, Christoph Hellwig,
	Jens Axboe, Tejun Heo, Alexander Viro, Markus Pargmann,
	Stefan Weinhuber, Stefan Haberland, Sebastian Ott,
	Fabian Frederick, David Herrmann, Mike Galbraith, Andrew Morton,
	nbd-general, linux-s390

On Wed, Apr 08, 2015 at 10:50:56PM +0800, Ming Lei wrote:
> > +/*
> > + * This is an exported API for the block driver, and will not
> > + * acquire bd_mutex, leaving it up to the caller to handle
> > + * any necessary locking.
> 
> Actually, the function is introduced and should be used in case
> that bd_mutex has been held already, such as clearing fd in
> loop release().
> 
> > + */
> > +int __blkdev_reread_part(struct block_device *bdev)
> >  {
> >         struct gendisk *disk = bdev->bd_disk;
> >

	lockdep_assert_held(&bdev->bd_mutex);

is an excellent means of avoiding that comment and verifying its
actually true :-)

> >         if (!disk_part_scan_enabled(disk) || bdev != bdev->bd_contains)
> >                 return -EINVAL;
> >         if (!capable(CAP_SYS_ADMIN))
> >                 return -EACCES;
> > +
> > +       return rescan_partitions(disk, bdev);
> > +}
> > +EXPORT_SYMBOL(__blkdev_reread_part);
> > +
> > +/*
> > + * This is an exported API for the block driver, and will
> > + * acquire bd_mutex. Make sure you aren't calling it with
> > + * bd_mutex already held, or we'll return -EBUSY.
> 
> Strictly speaking, it should be "Make sure you aren't calling it
> with bd_mutex already held in current context".
> 
> > + */
> > +int blkdev_reread_part(struct block_device *bdev)
> > +{
> > +       int res;
> > +
> >         if (!mutex_trylock(&bdev->bd_mutex))
> >                 return -EBUSY;

Is that really needed? It seems rather poor form.

> > -       res = rescan_partitions(disk, bdev);
> > +       res = __blkdev_reread_part(bdev);
> >         mutex_unlock(&bdev->bd_mutex);
> > +
> >         return res;
> >  }
> > +EXPORT_SYMBOL(blkdev_reread_part);

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/7] block: export blkdev_reread_part() and __blkdev_reread_part()
  2015-04-08 15:03       ` Peter Zijlstra
@ 2015-04-08 15:27         ` Jarod Wilson
  2015-04-08 15:28         ` Ming Lei
  1 sibling, 0 replies; 37+ messages in thread
From: Jarod Wilson @ 2015-04-08 15:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ming Lei, Linux Kernel Mailing List, Christoph Hellwig,
	Jens Axboe, Tejun Heo, Alexander Viro, Markus Pargmann,
	Stefan Weinhuber, Stefan Haberland, Sebastian Ott,
	Fabian Frederick, David Herrmann, Mike Galbraith, Andrew Morton,
	nbd-general, linux-s390

On Wed, Apr 08, 2015 at 05:03:25PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 08, 2015 at 10:50:56PM +0800, Ming Lei wrote:
> > > +/*
> > > + * This is an exported API for the block driver, and will not
> > > + * acquire bd_mutex, leaving it up to the caller to handle
> > > + * any necessary locking.
> > 
> > Actually, the function is introduced and should be used in case
> > that bd_mutex has been held already, such as clearing fd in
> > loop release().
> > 
> > > + */
> > > +int __blkdev_reread_part(struct block_device *bdev)
> > >  {
> > >         struct gendisk *disk = bdev->bd_disk;
> > >
> 
> 	lockdep_assert_held(&bdev->bd_mutex);
> 
> is an excellent means of avoiding that comment and verifying its
> actually true :-)

Ah, yes, that was actually suggested by Christoph as well, I was too hasty
shoving something back out the door on multiple counts.

> > > + */
> > > +int blkdev_reread_part(struct block_device *bdev)
> > > +{
> > > +       int res;
> > > +
> > >         if (!mutex_trylock(&bdev->bd_mutex))
> > >                 return -EBUSY;
> 
> Is that really needed? It seems rather poor form.

It goes away later in the series and gets converted to a straight
mutex_lock().

-- 
Jarod Wilson
jarod@redhat.com


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/7] block: export blkdev_reread_part() and __blkdev_reread_part()
  2015-04-08 15:03       ` Peter Zijlstra
  2015-04-08 15:27         ` Jarod Wilson
@ 2015-04-08 15:28         ` Ming Lei
  1 sibling, 0 replies; 37+ messages in thread
From: Ming Lei @ 2015-04-08 15:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jarod Wilson, Linux Kernel Mailing List, Christoph Hellwig,
	Jens Axboe, Tejun Heo, Alexander Viro, Markus Pargmann,
	Stefan Weinhuber, Stefan Haberland, Sebastian Ott,
	Fabian Frederick, David Herrmann, Mike Galbraith, Andrew Morton,
	nbd-general, linux-s390

On Wed, Apr 8, 2015 at 11:03 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Apr 08, 2015 at 10:50:56PM +0800, Ming Lei wrote:
>> > +/*
>> > + * This is an exported API for the block driver, and will not
>> > + * acquire bd_mutex, leaving it up to the caller to handle
>> > + * any necessary locking.
>>
>> Actually, the function is introduced and should be used in case
>> that bd_mutex has been held already, such as clearing fd in
>> loop release().
>>
>> > + */
>> > +int __blkdev_reread_part(struct block_device *bdev)
>> >  {
>> >         struct gendisk *disk = bdev->bd_disk;
>> >
>
>         lockdep_assert_held(&bdev->bd_mutex);
>
> is an excellent means of avoiding that comment and verifying its
> actually true :-)

Exactly, and it has been added in my local tree, :-)

>
>> >         if (!disk_part_scan_enabled(disk) || bdev != bdev->bd_contains)
>> >                 return -EINVAL;
>> >         if (!capable(CAP_SYS_ADMIN))
>> >                 return -EACCES;
>> > +
>> > +       return rescan_partitions(disk, bdev);
>> > +}
>> > +EXPORT_SYMBOL(__blkdev_reread_part);
>> > +
>> > +/*
>> > + * This is an exported API for the block driver, and will
>> > + * acquire bd_mutex. Make sure you aren't calling it with
>> > + * bd_mutex already held, or we'll return -EBUSY.
>>
>> Strictly speaking, it should be "Make sure you aren't calling it
>> with bd_mutex already held in current context".
>>
>> > + */
>> > +int blkdev_reread_part(struct block_device *bdev)
>> > +{
>> > +       int res;
>> > +
>> >         if (!mutex_trylock(&bdev->bd_mutex))
>> >                 return -EBUSY;
>
> Is that really needed? It seems rather poor form.

The trylock will be replaced with mutex_lock in patch 6.

>
>> > -       res = rescan_partitions(disk, bdev);
>> > +       res = __blkdev_reread_part(bdev);
>> >         mutex_unlock(&bdev->bd_mutex);
>> > +
>> >         return res;
>> >  }
>> > +EXPORT_SYMBOL(blkdev_reread_part);

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/7] block: loop: don't hold lo_ctl_mutex in lo_open
  2015-04-08 14:20           ` Ming Lei
@ 2015-04-08 15:28             ` Jarod Wilson
  0 siblings, 0 replies; 37+ messages in thread
From: Jarod Wilson @ 2015-04-08 15:28 UTC (permalink / raw)
  To: Ming Lei
  Cc: Linux Kernel Mailing List, Christoph Hellwig, Jens Axboe,
	Tejun Heo, Alexander Viro, Markus Pargmann, Stefan Weinhuber,
	Stefan Haberland, Sebastian Ott, Fabian Frederick,
	David Herrmann, Mike Galbraith, Andrew Morton, Peter Zijlstra,
	nbd-general, linux-s390

On Wed, Apr 08, 2015 at 10:20:22PM +0800, Ming Lei wrote:
> On Wed, Apr 8, 2015 at 10:00 PM, Jarod Wilson <jarod@redhat.com> wrote:
> > On Wed, Apr 08, 2015 at 09:40:34AM -0400, Jarod Wilson wrote:
> >> On Wed, Apr 08, 2015 at 02:50:59PM +0800, Ming Lei wrote:
> >> > Hi Jarod,
> >> >
> >> > On Wed, Apr 8, 2015 at 2:23 PM, Jarod Wilson <jarod@redhat.com> wrote:
> >> > > From: Ming Lei <ming.lei@canonical.com>
> >> > >
> >> > > The lo_ctl_mutex is held for running all ioctl handlers, and
> >> > > in some ioctl handlers, ioctl_by_bdev(BLKRRPART) is called for
> >> > > rereading partitions, which requires bd_mutex.
> >> > >
> >> > > So it is easy to cause failure because trylock(bd_mutex) may
> >> > > fail inside blkdev_reread_part(), and follows the lock context:
> >> > >
> >> > > blkid or other application:
> >> > >         ->open()
> >> > >                 ->mutex_lock(bd_mutex)
> >> > >                 ->lo_open()
> >> > >                         ->mutex_lock(lo_ctl_mutex)
> >> > >
> >> > > losetup(set fd ioctl):
> >> > >         ->mutex_lock(lo_ctl_mutex)
> >> > >         ->ioctl_by_bdev(BLKRRPART)
> >> > >                 ->trylock(bd_mutex)
> >> > >
> >> > > This patch trys to eliminate the ABBA lock dependency by removing
> >> > > lo_ctl_mutext in lo_open() with the following approach:
> >> > >
> >> > > 1) introduce lo_open_mutex to protect lo_refcnt and avoid acquiring
> >> > > lo_ctl_mutex in lo_open():
> >> >
> >> > It is a bit quick since I said the lo_open_mutex can be removed,
> >> > and Christoph agreed that too.
> >> >
> >> > So looks we still need to post another version, :-)
> >>
> >> Ah. I missed that bit. Just trying to keep up momentum.
> >
> > Are you working on that atomic_t version then, or should I dive into it?
> 
> It has been ready, and I will post them out once it passes my tests.

Okay, I'll hold tight then. Sorry for the noise. :p

-- 
Jarod Wilson
jarod@redhat.com


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2015-04-08 15:29 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-05  7:24 [PATCH 0/6] block: reread partitions changes and fix for loop Ming Lei
2015-04-05  7:24 ` [PATCH 1/6] block: export blkdev_reread_part() Ming Lei
2015-04-05 16:12   ` Christoph Hellwig
2015-04-05 16:40     ` Ming Lei
2015-04-06 13:42       ` Jarod Wilson
2015-04-06 13:48         ` Jarod Wilson
2015-04-07  2:43         ` Ming Lei
2015-04-06 14:50       ` Christoph Hellwig
2015-04-07  2:11         ` Ming Lei
2015-04-05  7:24 ` [PATCH 2/6] block: loop: don't hold lo_ctl_mutex in lo_open Ming Lei
2015-04-05 16:28   ` Ming Lei
2015-04-06 14:49     ` Christoph Hellwig
2015-04-05  7:24 ` [PATCH 3/6] block: loop: fix another reread part failure Ming Lei
2015-04-05  7:24 ` [PATCH 4/6] block: nbd: convert to blkdev_reread_part() Ming Lei
2015-04-05  7:24 ` [PATCH 5/6] block: dasd_genhd: convert to blkdev_reread_part Ming Lei
2015-04-06 13:46   ` Jarod Wilson
2015-04-06 13:51     ` Jarod Wilson
2015-04-07  1:59       ` Ming Lei
2015-04-07  0:47     ` Ming Lei
2015-04-05  7:24 ` [RFC PATCH 6/6] block: replace trylock with mutex_lock in __blkdev_reread_part() Ming Lei
2015-04-08  6:23 ` [PATCH 0/7] block: reread partitions improvements Jarod Wilson
2015-04-08  6:23   ` [PATCH 1/7] block: export blkdev_reread_part() and __blkdev_reread_part() Jarod Wilson
2015-04-08 14:50     ` Ming Lei
2015-04-08 15:03       ` Peter Zijlstra
2015-04-08 15:27         ` Jarod Wilson
2015-04-08 15:28         ` Ming Lei
2015-04-08  6:23   ` [PATCH 2/7] block: loop: don't hold lo_ctl_mutex in lo_open Jarod Wilson
2015-04-08  6:50     ` Ming Lei
2015-04-08 13:40       ` Jarod Wilson
2015-04-08 14:00         ` Jarod Wilson
2015-04-08 14:20           ` Ming Lei
2015-04-08 15:28             ` Jarod Wilson
2015-04-08  6:23   ` [PATCH 3/7] block: loop: fix another reread part failure Jarod Wilson
2015-04-08  6:23   ` [PATCH 4/7] block: nbd: convert to blkdev_reread_part() Jarod Wilson
2015-04-08  6:23   ` [PATCH 5/7] block: dasd_genhd: convert to blkdev_reread_part Jarod Wilson
2015-04-08  6:23   ` [PATCH 6/7] block: replace trylock with mutex_lock in blkdev_reread_part() Jarod Wilson
2015-04-08  6:23   ` [PATCH 7/7] s390/block/dasd: remove obsolete while -EBUSY loop Jarod Wilson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).