* [PATCH 0/13 v2] block: Fix block device shutdown related races
@ 2017-02-21 17:09 Jan Kara
  2017-02-21 17:09 ` [PATCH 01/13] block: Move bdev_unhash_inode() after invalidate_partition() Jan Kara
                   ` (15 more replies)
  0 siblings, 16 replies; 40+ messages in thread
From: Jan Kara @ 2017-02-21 17:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval, Jan Kara

Hello,

This is the second revision of the patch set fixing several different races and
issues I've found when testing device shutdown and reuse. The first three
patches fix problems in my previous series addressing BDI lifetime issues.
Patch 4 fixes issues with reuse of a BDI name with SCSI devices. With it I can
no longer reproduce the BDI name reuse issues with Omar's scsi_debug stress
test, so it can be used as a replacement for Dan's patches. Patches 5-11 fix an
oops triggered by __blkdev_put() calling inode_detach_wb() too early (the
problem reported by Thiago). Patches 12 and 13 fix an oops due to a bug in the
gendisk code where get_gendisk() can return an already freed gendisk structure
(again triggered by Omar's stress test).

Please have a look at the patches. They are mostly simple; however, the
interactions are rather complex, so I may have missed something. I'm also
happy for any additional testing these patches can get - I've stressed them
with Omar's script, tested memcg writeback, and tested static (not udev
managed) device inodes.

Jens, I think at least patches 1-3 should go in together with my fixes that
you already have in your tree (or shortly after them). It is up to you
whether you decide to delay my first fixes or pick these up quickly. Patch 4
is (IMHO) a cleaner replacement for Dan's patches, so consider whether you
want to use it instead of those patches.

Changes since v1:
* Added Acks and Tested-by tags for patches in areas that did not change
* Reworked inode_detach_wb() related fixes based on Tejun's feedback

								Honza

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 01/13] block: Move bdev_unhash_inode() after invalidate_partition()
  2017-02-21 17:09 [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
@ 2017-02-21 17:09 ` Jan Kara
  2017-02-21 17:09 ` [PATCH 02/13] block: Unhash also block device inode for the whole device Jan Kara
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Jan Kara @ 2017-02-21 17:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval, Jan Kara

Move bdev_unhash_inode() after invalidate_partition(): invalidate_partition()
looks up the bdev and cannot find the right bdev inode once
bdev_unhash_inode() has been called, so it would not invalidate the page
cache of the previously used bdev. Also use part_devt() when calling
bdev_unhash_inode() instead of manually creating the device number.

Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/genhd.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index d9ccd42f3675..6cb9f3a34a92 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -648,9 +648,8 @@ void del_gendisk(struct gendisk *disk)
 	disk_part_iter_init(&piter, disk,
 			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
 	while ((part = disk_part_iter_next(&piter))) {
-		bdev_unhash_inode(MKDEV(disk->major,
-					disk->first_minor + part->partno));
 		invalidate_partition(disk, part->partno);
+		bdev_unhash_inode(part_devt(part));
 		delete_partition(disk, part->partno);
 	}
 	disk_part_iter_exit(&piter);
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 02/13] block: Unhash also block device inode for the whole device
  2017-02-21 17:09 [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
  2017-02-21 17:09 ` [PATCH 01/13] block: Move bdev_unhash_inode() after invalidate_partition() Jan Kara
@ 2017-02-21 17:09 ` Jan Kara
  2017-02-21 17:09 ` [PATCH 03/13] block: Revalidate i_bdev reference in bd_acquire() Jan Kara
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Jan Kara @ 2017-02-21 17:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval, Jan Kara

The iteration over partitions in del_gendisk() omits part0. Add a
bdev_unhash_inode() call for the whole device. Otherwise, if the device
number gets reused, the bdev inode will still be associated with the old
(stale) bdi.

Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/genhd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/genhd.c b/block/genhd.c
index 6cb9f3a34a92..f6c4d4400759 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -655,6 +655,7 @@ void del_gendisk(struct gendisk *disk)
 	disk_part_iter_exit(&piter);
 
 	invalidate_partition(disk, 0);
+	bdev_unhash_inode(disk_devt(disk));
 	set_capacity(disk, 0);
 	disk->flags &= ~GENHD_FL_UP;
 
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 03/13] block: Revalidate i_bdev reference in bd_acquire()
  2017-02-21 17:09 [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
  2017-02-21 17:09 ` [PATCH 01/13] block: Move bdev_unhash_inode() after invalidate_partition() Jan Kara
  2017-02-21 17:09 ` [PATCH 02/13] block: Unhash also block device inode for the whole device Jan Kara
@ 2017-02-21 17:09 ` Jan Kara
  2017-02-21 17:09 ` [PATCH 04/13] block: Move bdi_unregister() to del_gendisk() Jan Kara
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Jan Kara @ 2017-02-21 17:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval, Jan Kara

When a device gets removed, its block device inode is unhashed so that it
is not used anymore (bdget() will not find it anymore). Later, when a new
device gets created with the same device number, we create a new block
device inode. However, there may be filesystem device inodes whose i_bdev
still points to the original block device inode, and thus we end up with
two active block device inodes for the same device. They share the same
gendisk, so the only visible differences are that the page caches are not
coherent and the BDIs are different (the old block device inode still
points to an unregistered BDI).

Fix the problem by checking in bd_acquire() whether i_bdev still points to
an active block device inode and re-looking up the block device if not.
That way any open of a block device happening after the old device has
been removed will get the correct block device inode.

Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/block_dev.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 601b71b76d7f..68e855fdce58 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1043,13 +1043,22 @@ static struct block_device *bd_acquire(struct inode *inode)
 
 	spin_lock(&bdev_lock);
 	bdev = inode->i_bdev;
-	if (bdev) {
+	if (bdev && !inode_unhashed(bdev->bd_inode)) {
 		bdgrab(bdev);
 		spin_unlock(&bdev_lock);
 		return bdev;
 	}
 	spin_unlock(&bdev_lock);
 
+	/*
+	 * i_bdev references block device inode that was already shut down
+	 * (corresponding device got removed).  Remove the reference and look
+	 * up block device inode again just in case new device got
+	 * reestablished under the same device number.
+	 */
+	if (bdev)
+		bd_forget(inode);
+
 	bdev = bdget(inode->i_rdev);
 	if (bdev) {
 		spin_lock(&bdev_lock);
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 04/13] block: Move bdi_unregister() to del_gendisk()
  2017-02-21 17:09 [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
                   ` (2 preceding siblings ...)
  2017-02-21 17:09 ` [PATCH 03/13] block: Revalidate i_bdev reference in bd_acquire() Jan Kara
@ 2017-02-21 17:09 ` Jan Kara
  2017-02-21 19:53   ` Bart Van Assche
  2017-02-21 17:09 ` [PATCH 05/13] bdi: Mark congested->bdi as internal Jan Kara
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Jan Kara @ 2017-02-21 17:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval, Jan Kara

Commit 6cd18e711dd8 "block: destroy bdi before blockdev is
unregistered." moved bdi unregistration (at that time through
bdi_destroy()) from blk_release_queue() to blk_cleanup_queue() because
it needs to happen before the blk_unregister_region() call in
del_gendisk() for MD. While that is fine for device registration /
unregistration purposes, it does not fit the needs of the writeback
code: there we need bdi_unregister() to happen after bdev_unhash_inode(),
so that we are sure the bdev inode is destroyed or soon to be destroyed
(it goes away as soon as the last inode reference is dropped, and nobody
should be holding a bdev inode reference for long at this point). The
reason is that bdi_unregister() may block waiting for the bdev inode's
i_wb reference to be dropped, and that happens only once the bdev inode
gets destroyed.

Also, SCSI frees up the device number from sd_remove(), called through
a maze of callbacks from device_del() in __scsi_remove_device(), before
blk_cleanup_queue(), and thus races similar to those described in
6cd18e711dd8 can happen for SCSI as well, as reported by Omar [1]. Moving
bdi_unregister() to del_gendisk() fixes these problems too, since
del_gendisk() gets called from sd_remove() before the device number is
freed.

This also makes device_add_disk() (calling bdi_register_owner()) more
symmetric with del_gendisk().

[1] http://marc.info/?l=linux-block&m=148554717109098&w=2
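
Roughly, the removal ordering relied upon here looks like this (a sketch
based on the description above, not a verbatim call chain):

	device_del()                    (in __scsi_remove_device())
	  -> ... maze of callbacks ...
	    -> sd_remove()
	         del_gendisk()
	           bdev inodes unhashed
	           bdi_unregister()      <- moved here by this patch
	         ... device number freed ...
	...
	blk_cleanup_queue()              <- runs only later; unregistering the
	                                    bdi only here is too late, as the
	                                    device number may already have
	                                    been reused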

Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/blk-core.c | 2 --
 block/genhd.c    | 7 +++++++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 47104f6a398b..9a901dcfdd5c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -580,8 +580,6 @@ void blk_cleanup_queue(struct request_queue *q)
 		q->queue_lock = &q->__queue_lock;
 	spin_unlock_irq(lock);
 
-	bdi_unregister(q->backing_dev_info);
-
 	/* @q is and will stay empty, shutdown and put */
 	blk_put_queue(q);
 }
diff --git a/block/genhd.c b/block/genhd.c
index f6c4d4400759..68c613edb93a 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -660,6 +660,13 @@ void del_gendisk(struct gendisk *disk)
 	disk->flags &= ~GENHD_FL_UP;
 
 	sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
+	/*
+	 * Unregister bdi before releasing device numbers (as they can get
+	 * reused and we'd get clashes in sysfs) but after bdev inodes are
+	 * unhashed and thus will be soon destroyed as bdev inode's reference
+	 * to wb_writeback can block bdi_unregister().
+	 */
+	bdi_unregister(disk->queue->backing_dev_info);
 	blk_unregister_queue(disk);
 	blk_unregister_region(disk_devt(disk), disk->minors);
 
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 05/13] bdi: Mark congested->bdi as internal
  2017-02-21 17:09 [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
                   ` (3 preceding siblings ...)
  2017-02-21 17:09 ` [PATCH 04/13] block: Move bdi_unregister() to del_gendisk() Jan Kara
@ 2017-02-21 17:09 ` Jan Kara
  2017-02-28 16:06   ` Tejun Heo
  2017-02-21 17:09 ` [PATCH 06/13] bdi: Make wb->bdi a proper reference Jan Kara
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Jan Kara @ 2017-02-21 17:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval, Jan Kara

The congested->bdi pointer is used only to remove the congested
structure from bdi->cgwb_congested_tree on structure release. Moreover,
the pointer can become NULL when we unregister the bdi. Rename the field
to __bdi and add a comment to make it more explicit that this is internal
to the memcg writeback code and that people should not use the field, as
such use will likely be race prone.

We do not bother with converting congested->bdi to a proper refcounted
reference: it would be slightly ugly to special-case bdi->wb.congested to
avoid what is effectively a cyclic reference of the bdi to itself, and the
reference gets cleared in bdi_unregister(), making it impossible to
reference a freed bdi.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/backing-dev-defs.h |  4 +++-
 mm/backing-dev.c                 | 10 +++++-----
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index ad955817916d..8fb3dcdebc80 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -54,7 +54,9 @@ struct bdi_writeback_congested {
 	atomic_t refcnt;		/* nr of attached wb's and blkg */
 
 #ifdef CONFIG_CGROUP_WRITEBACK
-	struct backing_dev_info *bdi;	/* the associated bdi */
+	struct backing_dev_info *__bdi;	/* the associated bdi, set to NULL
+					 * on bdi unregistration. For memcg-wb
+					 * internal use only! */
 	int blkcg_id;			/* ID of the associated blkcg */
 	struct rb_node rb_node;		/* on bdi->cgwb_congestion_tree */
 #endif
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 28ce6cf7b2ff..c324eae17f0d 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -438,7 +438,7 @@ wb_congested_get_create(struct backing_dev_info *bdi, int blkcg_id, gfp_t gfp)
 		return NULL;
 
 	atomic_set(&new_congested->refcnt, 0);
-	new_congested->bdi = bdi;
+	new_congested->__bdi = bdi;
 	new_congested->blkcg_id = blkcg_id;
 	goto retry;
 
@@ -466,10 +466,10 @@ void wb_congested_put(struct bdi_writeback_congested *congested)
 	}
 
 	/* bdi might already have been destroyed leaving @congested unlinked */
-	if (congested->bdi) {
+	if (congested->__bdi) {
 		rb_erase(&congested->rb_node,
-			 &congested->bdi->cgwb_congested_tree);
-		congested->bdi = NULL;
+			 &congested->__bdi->cgwb_congested_tree);
+		congested->__bdi = NULL;
 	}
 
 	spin_unlock_irqrestore(&cgwb_lock, flags);
@@ -698,7 +698,7 @@ static void cgwb_bdi_destroy(struct backing_dev_info *bdi)
 			rb_entry(rbn, struct bdi_writeback_congested, rb_node);
 
 		rb_erase(rbn, &bdi->cgwb_congested_tree);
-		congested->bdi = NULL;	/* mark @congested unlinked */
+		congested->__bdi = NULL;	/* mark @congested unlinked */
 	}
 
 	spin_unlock_irq(&cgwb_lock);
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 06/13] bdi: Make wb->bdi a proper reference
  2017-02-21 17:09 [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
                   ` (4 preceding siblings ...)
  2017-02-21 17:09 ` [PATCH 05/13] bdi: Mark congested->bdi as internal Jan Kara
@ 2017-02-21 17:09 ` Jan Kara
  2017-02-28 16:07   ` Tejun Heo
  2017-02-21 17:09 ` [PATCH 07/13] bdi: Move removal from bdi->wb_list into wb_shutdown() Jan Kara
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Jan Kara @ 2017-02-21 17:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval, Jan Kara

Make wb->bdi a proper refcounted reference to bdi for all bdi_writeback
structures except for the one embedded inside struct backing_dev_info.
That will allow us to simplify bdi unregistration.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/backing-dev.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index c324eae17f0d..d7aaf2517c30 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -294,6 +294,8 @@ static int wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi,
 
 	memset(wb, 0, sizeof(*wb));
 
+	if (wb != &bdi->wb)
+		bdi_get(bdi);
 	wb->bdi = bdi;
 	wb->last_old_flush = jiffies;
 	INIT_LIST_HEAD(&wb->b_dirty);
@@ -314,8 +316,10 @@ static int wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi,
 	wb->dirty_sleep = jiffies;
 
 	wb->congested = wb_congested_get_create(bdi, blkcg_id, gfp);
-	if (!wb->congested)
-		return -ENOMEM;
+	if (!wb->congested) {
+		err = -ENOMEM;
+		goto out_put_bdi;
+	}
 
 	err = fprop_local_init_percpu(&wb->completions, gfp);
 	if (err)
@@ -335,6 +339,9 @@ static int wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi,
 	fprop_local_destroy_percpu(&wb->completions);
 out_put_cong:
 	wb_congested_put(wb->congested);
+out_put_bdi:
+	if (wb != &bdi->wb)
+		bdi_put(bdi);
 	return err;
 }
 
@@ -372,6 +379,8 @@ static void wb_exit(struct bdi_writeback *wb)
 
 	fprop_local_destroy_percpu(&wb->completions);
 	wb_congested_put(wb->congested);
+	if (wb != &wb->bdi->wb)
+		bdi_put(wb->bdi);
 }
 
 #ifdef CONFIG_CGROUP_WRITEBACK
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 07/13] bdi: Move removal from bdi->wb_list into wb_shutdown()
  2017-02-21 17:09 [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
                   ` (5 preceding siblings ...)
  2017-02-21 17:09 ` [PATCH 06/13] bdi: Make wb->bdi a proper reference Jan Kara
@ 2017-02-21 17:09 ` Jan Kara
  2017-02-28 16:14   ` Tejun Heo
  2017-02-21 17:09 ` [PATCH 08/13] bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy() Jan Kara
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Jan Kara @ 2017-02-21 17:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval, Jan Kara

Currently the removal from bdi->wb_list happens directly in
cgwb_release_workfn(). Move it to wb_shutdown(), which is functionally
equivalent and more logical (the list is only used for distributing
writeback work among bdi_writeback structures). It will also allow us
to simplify writeback shutdown in cgwb_bdi_destroy().

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/backing-dev.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index d7aaf2517c30..54b9e934eef4 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -345,6 +345,8 @@ static int wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi,
 	return err;
 }
 
+static void cgwb_remove_from_bdi_list(struct bdi_writeback *wb);
+
 /*
  * Remove bdi from the global list and shutdown any threads we have running
  */
@@ -358,6 +360,7 @@ static void wb_shutdown(struct bdi_writeback *wb)
 	}
 	spin_unlock_bh(&wb->work_lock);
 
+	cgwb_remove_from_bdi_list(wb);
 	/*
 	 * Drain work list and shutdown the delayed_work.  !WB_registered
 	 * tells wb_workfn() that @wb is dying and its work_list needs to
@@ -491,10 +494,6 @@ static void cgwb_release_workfn(struct work_struct *work)
 						release_work);
 	struct backing_dev_info *bdi = wb->bdi;
 
-	spin_lock_irq(&cgwb_lock);
-	list_del_rcu(&wb->bdi_node);
-	spin_unlock_irq(&cgwb_lock);
-
 	wb_shutdown(wb);
 
 	css_put(wb->memcg_css);
@@ -526,6 +525,13 @@ static void cgwb_kill(struct bdi_writeback *wb)
 	percpu_ref_kill(&wb->refcnt);
 }
 
+static void cgwb_remove_from_bdi_list(struct bdi_writeback *wb)
+{
+	spin_lock_irq(&cgwb_lock);
+	list_del_rcu(&wb->bdi_node);
+	spin_unlock_irq(&cgwb_lock);
+}
+
 static int cgwb_create(struct backing_dev_info *bdi,
 		       struct cgroup_subsys_state *memcg_css, gfp_t gfp)
 {
@@ -776,6 +782,8 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
 	return 0;
 }
 
+static void cgwb_remove_from_bdi_list(struct bdi_writeback *wb) { }
+
 static void cgwb_bdi_destroy(struct backing_dev_info *bdi) { }
 
 #endif	/* CONFIG_CGROUP_WRITEBACK */
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 08/13] bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()
  2017-02-21 17:09 [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
                   ` (6 preceding siblings ...)
  2017-02-21 17:09 ` [PATCH 07/13] bdi: Move removal from bdi->wb_list into wb_shutdown() Jan Kara
@ 2017-02-21 17:09 ` Jan Kara
  2017-02-28 16:44   ` Tejun Heo
  2017-02-21 17:09 ` [PATCH 09/13] bdi: Do not wait for cgwbs release in bdi_unregister() Jan Kara
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Jan Kara @ 2017-02-21 17:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval, Jan Kara

Currently we wait for all cgwbs to get freed in cgwb_bdi_destroy(),
which also means that writeback has been shut down on them. Since this
wait is going away, directly shut down writeback on cgwbs from
cgwb_bdi_destroy() to avoid live writeback structures after
bdi_unregister() has finished.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/backing-dev.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 54b9e934eef4..c9623b410170 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -700,6 +700,7 @@ static void cgwb_bdi_destroy(struct backing_dev_info *bdi)
 	struct radix_tree_iter iter;
 	struct rb_node *rbn;
 	void **slot;
+	struct bdi_writeback *wb;
 
 	WARN_ON(test_bit(WB_registered, &bdi->wb.state));
 
@@ -716,6 +717,14 @@ static void cgwb_bdi_destroy(struct backing_dev_info *bdi)
 		congested->__bdi = NULL;	/* mark @congested unlinked */
 	}
 
+	while (!list_empty(&bdi->wb_list)) {
+		wb = list_first_entry(&bdi->wb_list, struct bdi_writeback,
+				      bdi_node);
+		spin_unlock_irq(&cgwb_lock);
+		wb_shutdown(wb);
+		spin_lock_irq(&cgwb_lock);
+	}
+
 	spin_unlock_irq(&cgwb_lock);
 
 	/*
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 09/13] bdi: Do not wait for cgwbs release in bdi_unregister()
  2017-02-21 17:09 [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
                   ` (7 preceding siblings ...)
  2017-02-21 17:09 ` [PATCH 08/13] bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy() Jan Kara
@ 2017-02-21 17:09 ` Jan Kara
  2017-02-28 16:51   ` Tejun Heo
  2017-02-21 17:09 ` [PATCH 10/13] bdi: Rename cgwb_bdi_destroy() to cgwb_bdi_unregister() Jan Kara
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Jan Kara @ 2017-02-21 17:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval, Jan Kara

Currently we wait for all cgwbs to get released in cgwb_bdi_destroy()
(called from bdi_unregister()). That is, however, unnecessary now that
cgwb->bdi is a proper refcounted reference (so the bdi cannot get
released before all cgwbs are released) and cgwb_bdi_destroy() shuts
down writeback directly.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/backing-dev-defs.h |  1 -
 mm/backing-dev.c                 | 18 +-----------------
 2 files changed, 1 insertion(+), 18 deletions(-)

diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 8fb3dcdebc80..7bd5ba9890b0 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -163,7 +163,6 @@ struct backing_dev_info {
 #ifdef CONFIG_CGROUP_WRITEBACK
 	struct radix_tree_root cgwb_tree; /* radix tree of active cgroup wbs */
 	struct rb_root cgwb_congested_tree; /* their congested states */
-	atomic_t usage_cnt; /* counts both cgwbs and cgwb_contested's */
 #else
 	struct bdi_writeback_congested *wb_congested;
 #endif
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index c9623b410170..31cdee91e826 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -393,11 +393,9 @@ static void wb_exit(struct bdi_writeback *wb)
 /*
  * cgwb_lock protects bdi->cgwb_tree, bdi->cgwb_congested_tree,
  * blkcg->cgwb_list, and memcg->cgwb_list.  bdi->cgwb_tree is also RCU
- * protected.  cgwb_release_wait is used to wait for the completion of cgwb
- * releases from bdi destruction path.
+ * protected.
  */
 static DEFINE_SPINLOCK(cgwb_lock);
-static DECLARE_WAIT_QUEUE_HEAD(cgwb_release_wait);
 
 /**
  * wb_congested_get_create - get or create a wb_congested
@@ -492,7 +490,6 @@ static void cgwb_release_workfn(struct work_struct *work)
 {
 	struct bdi_writeback *wb = container_of(work, struct bdi_writeback,
 						release_work);
-	struct backing_dev_info *bdi = wb->bdi;
 
 	wb_shutdown(wb);
 
@@ -503,9 +500,6 @@ static void cgwb_release_workfn(struct work_struct *work)
 	percpu_ref_exit(&wb->refcnt);
 	wb_exit(wb);
 	kfree_rcu(wb, rcu);
-
-	if (atomic_dec_and_test(&bdi->usage_cnt))
-		wake_up_all(&cgwb_release_wait);
 }
 
 static void cgwb_release(struct percpu_ref *refcnt)
@@ -595,7 +589,6 @@ static int cgwb_create(struct backing_dev_info *bdi,
 		/* we might have raced another instance of this function */
 		ret = radix_tree_insert(&bdi->cgwb_tree, memcg_css->id, wb);
 		if (!ret) {
-			atomic_inc(&bdi->usage_cnt);
 			list_add_tail_rcu(&wb->bdi_node, &bdi->wb_list);
 			list_add(&wb->memcg_node, memcg_cgwb_list);
 			list_add(&wb->blkcg_node, blkcg_cgwb_list);
@@ -685,7 +678,6 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
 
 	INIT_RADIX_TREE(&bdi->cgwb_tree, GFP_ATOMIC);
 	bdi->cgwb_congested_tree = RB_ROOT;
-	atomic_set(&bdi->usage_cnt, 1);
 
 	ret = wb_init(&bdi->wb, bdi, 1, GFP_KERNEL);
 	if (!ret) {
@@ -726,14 +718,6 @@ static void cgwb_bdi_destroy(struct backing_dev_info *bdi)
 	}
 
 	spin_unlock_irq(&cgwb_lock);
-
-	/*
-	 * All cgwb's and their congested states must be shutdown and
-	 * released before returning.  Drain the usage counter to wait for
-	 * all cgwb's and cgwb_congested's ever created on @bdi.
-	 */
-	atomic_dec(&bdi->usage_cnt);
-	wait_event(cgwb_release_wait, !atomic_read(&bdi->usage_cnt));
 }
 
 /**
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 10/13] bdi: Rename cgwb_bdi_destroy() to cgwb_bdi_unregister()
  2017-02-21 17:09 [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
                   ` (8 preceding siblings ...)
  2017-02-21 17:09 ` [PATCH 09/13] bdi: Do not wait for cgwbs release in bdi_unregister() Jan Kara
@ 2017-02-21 17:09 ` Jan Kara
  2017-02-28 16:51   ` Tejun Heo
  2017-02-21 17:09 ` [PATCH 11/13] block: Fix oops in locked_inode_to_wb_and_lock_list() Jan Kara
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Jan Kara @ 2017-02-21 17:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval, Jan Kara

Rename cgwb_bdi_destroy() to cgwb_bdi_unregister() as it gets called
from bdi_unregister() which is not necessarily called from bdi_destroy()
and thus the name is somewhat misleading.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/backing-dev.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 31cdee91e826..b1b50cce3f36 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -687,7 +687,7 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
 	return ret;
 }
 
-static void cgwb_bdi_destroy(struct backing_dev_info *bdi)
+static void cgwb_bdi_unregister(struct backing_dev_info *bdi)
 {
 	struct radix_tree_iter iter;
 	struct rb_node *rbn;
@@ -777,7 +777,7 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
 
 static void cgwb_remove_from_bdi_list(struct bdi_writeback *wb) { }
 
-static void cgwb_bdi_destroy(struct backing_dev_info *bdi) { }
+static void cgwb_bdi_unregister(struct backing_dev_info *bdi) { }
 
 #endif	/* CONFIG_CGROUP_WRITEBACK */
 
@@ -885,7 +885,7 @@ void bdi_unregister(struct backing_dev_info *bdi)
 	/* make sure nobody finds us on the bdi_list anymore */
 	bdi_remove_from_list(bdi);
 	wb_shutdown(&bdi->wb);
-	cgwb_bdi_destroy(bdi);
+	cgwb_bdi_unregister(bdi);
 
 	if (bdi->dev) {
 		bdi_debug_unregister(bdi);
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 11/13] block: Fix oops in locked_inode_to_wb_and_lock_list()
  2017-02-21 17:09 [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
                   ` (9 preceding siblings ...)
  2017-02-21 17:09 ` [PATCH 10/13] bdi: Rename cgwb_bdi_destroy() to cgwb_bdi_unregister() Jan Kara
@ 2017-02-21 17:09 ` Jan Kara
  2017-02-28 16:52   ` Tejun Heo
  2017-02-21 17:09 ` [PATCH 12/13] kobject: Export kobject_get_unless_zero() Jan Kara
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Jan Kara @ 2017-02-21 17:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval, Jan Kara

When a block device is closed, we call inode_detach_wb() in __blkdev_put(),
which sets inode->i_wb to NULL. That is contrary to the expectation that
inode->i_wb stays valid once set for the whole lifetime of the inode, and
it leads to an oops in wb_get() in locked_inode_to_wb_and_lock_list()
because inode_to_wb() returned NULL.

However, the reason why we called inode_detach_wb() is not valid anymore.
The BDI is guaranteed to stay around until we call bdi_put() from
bdev_evict_inode(), so we can postpone calling inode_detach_wb() to that
moment.

Also add a warning to catch if someone uses inode_detach_wb() in a
dangerous way.

Reported-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/block_dev.c            | 8 ++------
 include/linux/writeback.h | 1 +
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 68e855fdce58..a0a8a000bdde 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -884,6 +884,8 @@ static void bdev_evict_inode(struct inode *inode)
 	spin_lock(&bdev_lock);
 	list_del_init(&bdev->bd_list);
 	spin_unlock(&bdev_lock);
+	/* Detach inode from wb early as bdi_put() may free bdi->wb */
+	inode_detach_wb(inode);
 	if (bdev->bd_bdi != &noop_backing_dev_info)
 		bdi_put(bdev->bd_bdi);
 }
@@ -1874,12 +1876,6 @@ static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
 		kill_bdev(bdev);
 
 		bdev_write_inode(bdev);
-		/*
-		 * Detaching bdev inode from its wb in __destroy_inode()
-		 * is too late: the queue which embeds its bdi (along with
-		 * root wb) can be gone as soon as we put_disk() below.
-		 */
-		inode_detach_wb(bdev->bd_inode);
 	}
 	if (bdev->bd_contains == bdev) {
 		if (disk->fops->release)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 5527d910ba3d..f1af3f67ce5a 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -237,6 +237,7 @@ static inline void inode_attach_wb(struct inode *inode, struct page *page)
 static inline void inode_detach_wb(struct inode *inode)
 {
 	if (inode->i_wb) {
+		WARN_ON_ONCE(!(inode->i_state & I_CLEAR));
 		wb_put(inode->i_wb);
 		inode->i_wb = NULL;
 	}
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 12/13] kobject: Export kobject_get_unless_zero()
  2017-02-21 17:09 [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
                   ` (10 preceding siblings ...)
  2017-02-21 17:09 ` [PATCH 11/13] block: Fix oops in locked_inode_to_wb_and_lock_list() Jan Kara
@ 2017-02-21 17:09 ` Jan Kara
  2017-02-21 19:39   ` Bart Van Assche
  2017-02-21 17:09 ` [PATCH 13/13] block: Fix oops in scsi_disk_get() Jan Kara
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Jan Kara @ 2017-02-21 17:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval, Jan Kara, Greg Kroah-Hartman

Make the function available for outside use and fortify it against a NULL
kobject.

CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/kobject.h | 2 ++
 lib/kobject.c           | 5 ++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index e6284591599e..ca85cb80e99a 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -108,6 +108,8 @@ extern int __must_check kobject_rename(struct kobject *, const char *new_name);
 extern int __must_check kobject_move(struct kobject *, struct kobject *);
 
 extern struct kobject *kobject_get(struct kobject *kobj);
+extern struct kobject * __must_check kobject_get_unless_zero(
+						struct kobject *kobj);
 extern void kobject_put(struct kobject *kobj);
 
 extern const void *kobject_namespace(struct kobject *kobj);
diff --git a/lib/kobject.c b/lib/kobject.c
index 445dcaeb0f56..763d70a18941 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -601,12 +601,15 @@ struct kobject *kobject_get(struct kobject *kobj)
 }
 EXPORT_SYMBOL(kobject_get);
 
-static struct kobject * __must_check kobject_get_unless_zero(struct kobject *kobj)
+struct kobject * __must_check kobject_get_unless_zero(struct kobject *kobj)
 {
+	if (!kobj)
+		return NULL;
 	if (!kref_get_unless_zero(&kobj->kref))
 		kobj = NULL;
 	return kobj;
 }
+EXPORT_SYMBOL(kobject_get_unless_zero);
 
 /*
  * kobject_cleanup - free kobject resources.
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 13/13] block: Fix oops in scsi_disk_get()
  2017-02-21 17:09 [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
                   ` (11 preceding siblings ...)
  2017-02-21 17:09 ` [PATCH 12/13] kobject: Export kobject_get_unless_zero() Jan Kara
@ 2017-02-21 17:09 ` Jan Kara
  2017-02-21 19:33   ` Bart Van Assche
  2017-02-21 17:19 ` [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Jan Kara @ 2017-02-21 17:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval, Jan Kara

When device open races with device shutdown, we can get the following
oops in scsi_disk_get():

[11863.044351] general protection fault: 0000 [#1] SMP
[11863.045561] Modules linked in: scsi_debug xfs libcrc32c netconsole btrfs raid6_pq zlib_deflate lzo_compress xor [last unloaded: loop]
[11863.047853] CPU: 3 PID: 13042 Comm: hald-probe-stor Tainted: G W      4.10.0-rc2-xen+ #35
[11863.048030] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[11863.048030] task: ffff88007f438200 task.stack: ffffc90000fd0000
[11863.048030] RIP: 0010:scsi_disk_get+0x43/0x70
[11863.048030] RSP: 0018:ffffc90000fd3a08 EFLAGS: 00010202
[11863.048030] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007f56d000 RCX: 0000000000000000
[11863.048030] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffffff81a8d880
[11863.048030] RBP: ffffc90000fd3a18 R08: 0000000000000000 R09: 0000000000000001
[11863.059217] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffffa
[11863.059217] R13: ffff880078872800 R14: ffff880070915540 R15: 000000000000001d
[11863.059217] FS:  00007f2611f71800(0000) GS:ffff88007f0c0000(0000) knlGS:0000000000000000
[11863.059217] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11863.059217] CR2: 000000000060e048 CR3: 00000000778d4000 CR4: 00000000000006e0
[11863.059217] Call Trace:
[11863.059217]  ? disk_get_part+0x22/0x1f0
[11863.059217]  sd_open+0x39/0x130
[11863.059217]  __blkdev_get+0x69/0x430
[11863.059217]  ? bd_acquire+0x7f/0xc0
[11863.059217]  ? bd_acquire+0x96/0xc0
[11863.059217]  ? blkdev_get+0x350/0x350
[11863.059217]  blkdev_get+0x126/0x350
[11863.059217]  ? _raw_spin_unlock+0x2b/0x40
[11863.059217]  ? bd_acquire+0x7f/0xc0
[11863.059217]  ? blkdev_get+0x350/0x350
[11863.059217]  blkdev_open+0x65/0x80
...

As you can see, the RAX value is already poisoned, showing that the
gendisk we got has already been freed. The problem is that get_gendisk()
looks up the device number in ext_devt_idr and then calls get_disk(),
which does kobject_get() on the disk's kobject. However, the disk gets
removed from ext_devt_idr only in disk_release() (through
blk_free_devt()), at which point it already has a 0 refcount and is
already on its way to being freed. Indeed, we got a warning from
kobject_get() about the 0 refcount shortly before the oops.

Fix the problem by using kobject_get_unless_zero() in get_disk() so that
get_disk() cannot obtain a reference to a disk that is already being
freed.
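
For illustration, the "get unless zero" pattern that
kobject_get_unless_zero() implements can be sketched in plain userspace C
(the names below are made up for the sketch; this is not kernel code):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct object {
	atomic_int refcnt;
};

/* Unconditional get: unsafe once the count may already have reached zero. */
static void obj_get(struct object *obj)
{
	atomic_fetch_add(&obj->refcnt, 1);
}

/* Take a reference only if the object is not already being torn down. */
static bool obj_get_unless_zero(struct object *obj)
{
	int old = atomic_load(&obj->refcnt);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&obj->refcnt, &old, old + 1))
			return true;	/* got a reference */
	}
	return false;			/* already zero: the lookup must fail */
}

int main(void)
{
	struct object live  = { .refcnt = 1 };	/* still referenced elsewhere */
	struct object dying = { .refcnt = 0 };	/* release already in progress */

	obj_get(&live);		/* fine: the count was known to be positive */
	printf("live:  %d\n", obj_get_unless_zero(&live));	/* 1: ref taken */
	printf("dying: %d\n", obj_get_unless_zero(&dying));	/* 0: refused */
	return 0;
}

This is the behavior get_disk() needs: the lookup in ext_devt_idr may hand
back a disk whose refcount has already dropped to zero, and such a lookup
has to fail instead of resurrecting the object.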

Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/genhd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/genhd.c b/block/genhd.c
index 68c613edb93a..2baacfea7b5e 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1350,7 +1350,7 @@ struct kobject *get_disk(struct gendisk *disk)
 	owner = disk->fops->owner;
 	if (owner && !try_module_get(owner))
 		return NULL;
-	kobj = kobject_get(&disk_to_dev(disk)->kobj);
+	kobj = kobject_get_unless_zero(&disk_to_dev(disk)->kobj);
 	if (kobj == NULL) {
 		module_put(owner);
 		return NULL;
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/13 v2] block: Fix block device shutdown related races
  2017-02-21 17:09 [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
                   ` (12 preceding siblings ...)
  2017-02-21 17:09 ` [PATCH 13/13] block: Fix oops in scsi_disk_get() Jan Kara
@ 2017-02-21 17:19 ` Jan Kara
  2017-02-21 18:55   ` Dan Williams
  2017-02-21 17:19 ` Jens Axboe
  2017-02-28 16:54 ` Tejun Heo
  15 siblings, 1 reply; 40+ messages in thread
From: Jan Kara @ 2017-02-21 17:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval, Jan Kara

On Tue 21-02-17 18:09:45, Jan Kara wrote:
> Hello,
> 
> This is the second revision of the patch set fixing several different races and
> issues I've found when testing device shutdown and reuse. The first three
> patches fix problems in my previous series addressing BDI lifetime issues.
> Patch 4 fixes issues with reuse of a BDI name with SCSI devices. With it I can
> no longer reproduce the BDI name reuse issues with Omar's scsi_debug stress
> test, so it can be used as a replacement for Dan's patches. Patches 5-11 fix an
> oops triggered by __blkdev_put() calling inode_detach_wb() too early (the
> problem reported by Thiago). Patches 12 and 13 fix an oops due to a bug in the
> gendisk code where get_gendisk() can return an already freed gendisk structure
> (again triggered by Omar's stress test).
> 
> Please have a look at the patches. They are mostly simple; however, the
> interactions are rather complex, so I may have missed something. I'm also
> happy for any additional testing these patches can get - I've stressed them
> with Omar's script, tested memcg writeback, and tested static (not udev
> managed) device inodes.
> 
> Jens, I think at least patches 1-3 should go in together with my fixes that
> you already have in your tree (or shortly after them). It is up to you
> whether you decide to delay my first fixes or pick these up quickly. Patch 4
> is (IMHO) a cleaner replacement for Dan's patches, so consider whether you
> want to use it instead of those patches.

I forgot to add that the patches are also available from my git tree:

git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git bdi

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/13 v2] block: Fix block device shutdown related races
  2017-02-21 17:09 [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
                   ` (13 preceding siblings ...)
  2017-02-21 17:19 ` [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
@ 2017-02-21 17:19 ` Jens Axboe
  2017-02-22 10:24   ` Jan Kara
  2017-02-28 16:54 ` Tejun Heo
  15 siblings, 1 reply; 40+ messages in thread
From: Jens Axboe @ 2017-02-21 17:19 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval

On 02/21/2017 10:09 AM, Jan Kara wrote:
> Hello,
> 
> This is the second revision of the patch set fixing several different races and
> issues I've found when testing device shutdown and reuse. The first three
> patches fix problems in my previous series addressing BDI lifetime issues.
> Patch 4 fixes issues with reuse of a BDI name with SCSI devices. With it I can
> no longer reproduce the BDI name reuse issues with Omar's scsi_debug stress
> test, so it can be used as a replacement for Dan's patches. Patches 5-11 fix an
> oops triggered by __blkdev_put() calling inode_detach_wb() too early (the
> problem reported by Thiago). Patches 12 and 13 fix an oops due to a bug in the
> gendisk code where get_gendisk() can return an already freed gendisk structure
> (again triggered by Omar's stress test).
> 
> Please have a look at the patches. They are mostly simple; however, the
> interactions are rather complex, so I may have missed something. I'm also
> happy for any additional testing these patches can get - I've stressed them
> with Omar's script, tested memcg writeback, and tested static (not udev
> managed) device inodes.
> 
> Jens, I think at least patches 1-3 should go in together with my fixes that
> you already have in your tree (or shortly after them). It is up to you
> whether you decide to delay my first fixes or pick these up quickly. Patch 4
> is (IMHO) a cleaner replacement for Dan's patches, so consider whether you
> want to use it instead of those patches.

I have applied 1-3 to my for-linus branch, which will go in after
the initial pull request has been pulled by Linus. Consider fixing up
#4 so it applies, I like it.

I'll review the rest later this week.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/13 v2] block: Fix block device shutdown related races
  2017-02-21 17:19 ` [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
@ 2017-02-21 18:55   ` Dan Williams
  0 siblings, 0 replies; 40+ messages in thread
From: Dan Williams @ 2017-02-21 18:55 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval

On Tue, Feb 21, 2017 at 9:19 AM, Jan Kara <jack@suse.cz> wrote:
> On Tue 21-02-17 18:09:45, Jan Kara wrote:
>> Hello,
>>
>> This is the second revision of the patch set fixing several different races and
>> issues I've found when testing device shutdown and reuse. The first three
>> patches fix problems in my previous series addressing BDI lifetime issues.
>> Patch 4 fixes issues with reuse of a BDI name with SCSI devices. With it I can
>> no longer reproduce the BDI name reuse issues with Omar's scsi_debug stress
>> test, so it can be used as a replacement for Dan's patches. Patches 5-11 fix an
>> oops triggered by __blkdev_put() calling inode_detach_wb() too early (the
>> problem reported by Thiago). Patches 12 and 13 fix an oops due to a bug in the
>> gendisk code where get_gendisk() can return an already freed gendisk structure
>> (again triggered by Omar's stress test).
>>
>> Please have a look at the patches. They are mostly simple; however, the
>> interactions are rather complex, so I may have missed something. I'm also
>> happy for any additional testing these patches can get - I've stressed them
>> with Omar's script, tested memcg writeback, and tested static (not udev
>> managed) device inodes.
>>
>> Jens, I think at least patches 1-3 should go in together with my fixes that
>> you already have in your tree (or shortly after them). It is up to you
>> whether you decide to delay my first fixes or pick these up quickly. Patch 4
>> is (IMHO) a cleaner replacement for Dan's patches, so consider whether you
>> want to use it instead of those patches.

FWIW, I wholeheartedly agree with replacing my band-aid with this deeper fix.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/13] block: Fix oops in scsi_disk_get()
  2017-02-21 17:09 ` [PATCH 13/13] block: Fix oops in scsi_disk_get() Jan Kara
@ 2017-02-21 19:33   ` Bart Van Assche
  0 siblings, 0 replies; 40+ messages in thread
From: Bart Van Assche @ 2017-02-21 19:33 UTC (permalink / raw)
  To: Jan Kara, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval

On 02/21/2017 09:55 AM, Jan Kara wrote:
> When device open races with device shutdown, we can get the following
> oops in scsi_disk_get(): [ ... ]

Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 12/13] kobject: Export kobject_get_unless_zero()
  2017-02-21 17:09 ` [PATCH 12/13] kobject: Export kobject_get_unless_zero() Jan Kara
@ 2017-02-21 19:39   ` Bart Van Assche
  0 siblings, 0 replies; 40+ messages in thread
From: Bart Van Assche @ 2017-02-21 19:39 UTC (permalink / raw)
  To: Jan Kara, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval, Greg Kroah-Hartman

On 02/21/2017 09:55 AM, Jan Kara wrote:
> Make the function available for outside use and fortify it against a NULL
> kobject.

Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 04/13] block: Move bdi_unregister() to del_gendisk()
  2017-02-21 17:09 ` [PATCH 04/13] block: Move bdi_unregister() to del_gendisk() Jan Kara
@ 2017-02-21 19:53   ` Bart Van Assche
  2017-02-22  8:51     ` Jan Kara
  0 siblings, 1 reply; 40+ messages in thread
From: Bart Van Assche @ 2017-02-21 19:53 UTC (permalink / raw)
  To: Jan Kara, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval

On 02/21/2017 09:55 AM, Jan Kara wrote:
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 47104f6a398b..9a901dcfdd5c 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -580,8 +580,6 @@ void blk_cleanup_queue(struct request_queue *q)
>  		q->queue_lock = &q->__queue_lock;
>  	spin_unlock_irq(lock);
>  
> -	bdi_unregister(q->backing_dev_info);
> -
>  	/* @q is and will stay empty, shutdown and put */
>  	blk_put_queue(q);
>  }
> diff --git a/block/genhd.c b/block/genhd.c
> index f6c4d4400759..68c613edb93a 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -660,6 +660,13 @@ void del_gendisk(struct gendisk *disk)
>  	disk->flags &= ~GENHD_FL_UP;
>  
>  	sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
> +	/*
> +	 * Unregister bdi before releasing device numbers (as they can get
> +	 * reused and we'd get clashes in sysfs) but after bdev inodes are
> +	 * unhashed and thus will be soon destroyed as bdev inode's reference
> +	 * to wb_writeback can block bdi_unregister().
> +	 */
> +	bdi_unregister(disk->queue->backing_dev_info);
>  	blk_unregister_queue(disk);
>  	blk_unregister_region(disk_devt(disk), disk->minors);

This change looks suspicious to me. There are drivers that create a
block layer queue but neither call device_add_disk() nor del_gendisk(),
e.g. drivers/scsi/st.c. Although bdi_init() will be called for the
queues created by these drivers, this patch will cause the
bdi_unregister() call to be skipped for these drivers.

Bart.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 04/13] block: Move bdi_unregister() to del_gendisk()
  2017-02-21 19:53   ` Bart Van Assche
@ 2017-02-22  8:51     ` Jan Kara
  2017-02-22 17:42       ` Bart Van Assche
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Kara @ 2017-02-22  8:51 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jan Kara, Jens Axboe, linux-block, Christoph Hellwig,
	Dan Williams, Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo,
	NeilBrown, Omar Sandoval

On Tue 21-02-17 19:53:29, Bart Van Assche wrote:
> On 02/21/2017 09:55 AM, Jan Kara wrote:
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 47104f6a398b..9a901dcfdd5c 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -580,8 +580,6 @@ void blk_cleanup_queue(struct request_queue *q)
> >  		q->queue_lock = &q->__queue_lock;
> >  	spin_unlock_irq(lock);
> >  
> > -	bdi_unregister(q->backing_dev_info);
> > -
> >  	/* @q is and will stay empty, shutdown and put */
> >  	blk_put_queue(q);
> >  }
> > diff --git a/block/genhd.c b/block/genhd.c
> > index f6c4d4400759..68c613edb93a 100644
> > --- a/block/genhd.c
> > +++ b/block/genhd.c
> > @@ -660,6 +660,13 @@ void del_gendisk(struct gendisk *disk)
> >  	disk->flags &= ~GENHD_FL_UP;
> >  
> >  	sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
> > +	/*
> > +	 * Unregister bdi before releasing device numbers (as they can get
> > +	 * reused and we'd get clashes in sysfs) but after bdev inodes are
> > +	 * unhashed and thus will be soon destroyed as bdev inode's reference
> > +	 * to wb_writeback can block bdi_unregister().
> > +	 */
> > +	bdi_unregister(disk->queue->backing_dev_info);
> >  	blk_unregister_queue(disk);
> >  	blk_unregister_region(disk_devt(disk), disk->minors);
> 
> This change looks suspicious to me. There are drivers that create a
> block layer queue but neither call device_add_disk() nor del_gendisk(),
> e.g. drivers/scsi/st.c. Although bdi_init() will be called for the
> queues created by these drivers, this patch will cause the
> bdi_unregister() call to be skipped for these drivers.

Well, the thing is that bdi_unregister() is the counterpart to
bdi_register(). Unless you call bdi_register(), which happens only in
device_add_disk() (and some filesystems which create their private bdis),
there's no point in calling bdi_unregister(). The counterpart to bdi_init() is
bdi_exit(), and that always gets called once the bdi reference count drops to 0.
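
Roughly, the pairing is (sketch only):

	bdi_init(bdi)       <->  bdi_exit(bdi)
		always paired; bdi_exit() runs once the bdi refcount drops to 0

	bdi_register(bdi)   <->  bdi_unregister(bdi)
		only for bdis that actually got registered, i.e. via
		device_add_disk() or a filesystem registering its private bdi

so a queue that never goes through device_add_disk() is only ever
bdi_init()ed, and skipping bdi_unregister() for it is fine.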

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/13 v2] block: Fix block device shutdown related races
  2017-02-21 17:19 ` Jens Axboe
@ 2017-02-22 10:24   ` Jan Kara
  2017-03-01  7:25     ` Omar Sandoval
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Kara @ 2017-02-22 10:24 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Jan Kara, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval

[-- Attachment #1: Type: text/plain, Size: 2548 bytes --]

On Tue 21-02-17 10:19:28, Jens Axboe wrote:
> On 02/21/2017 10:09 AM, Jan Kara wrote:
> > Hello,
> > 
> > This is the second revision of the patch set fixing several different races and
> > issues I've found when testing device shutdown and reuse. The first three
> > patches fix problems in my previous series addressing BDI lifetime issues.
> > Patch 4 fixes issues with reuse of a BDI name with SCSI devices. With it I can
> > no longer reproduce the BDI name reuse issues with Omar's scsi_debug stress
> > test, so it can be used as a replacement for Dan's patches. Patches 5-11 fix an
> > oops triggered by __blkdev_put() calling inode_detach_wb() too early (the
> > problem reported by Thiago). Patches 12 and 13 fix an oops due to a bug in the
> > gendisk code where get_gendisk() can return an already freed gendisk structure
> > (again triggered by Omar's stress test).
> > 
> > Please have a look at the patches. They are mostly simple; however, the
> > interactions are rather complex, so I may have missed something. I'm also
> > happy for any additional testing these patches can get - I've stressed them
> > with Omar's script, tested memcg writeback, and tested static (not udev
> > managed) device inodes.
> > 
> > Jens, I think at least patches 1-3 should go in together with my fixes that
> > you already have in your tree (or shortly after them). It is up to you
> > whether you decide to delay my first fixes or pick these up quickly. Patch 4
> > is (IMHO) a cleaner replacement for Dan's patches, so consider whether you
> > want to use it instead of those patches.
> 
> I have applied 1-3 to my for-linus branch, which will go in after
> the initial pull request has been pulled by Linus. Consider fixing up
> #4 so it applies, I like it.

OK, attached is patch 4 rebased on top of Linus' tree from today, which
already has the linux-block changes pulled in. I've left put_disk_devt() in
blk_cleanup_queue() to maintain the logic in the original patch (now commit
0dba1314d4f8) that request_queue and gendisk each hold one devt reference.
The bdi_unregister() call that is moved to del_gendisk() by this patch is
now protected by the gendisk reference instead of the request_queue one,
so it still maintains the property that the devt reference protects the bdi
registration-unregistration lifetime (even though that is not needed anymore
after this patch).

I have also updated the comment in the code and the changelog - they were
somewhat stale after the changes to the whole series that Tejun suggested.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-block-Move-bdi_unregister-to-del_gendisk.patch --]
[-- Type: text/x-patch, Size: 2245 bytes --]

>From 9abe9565c83af6b653b159a7bf5b895aff638c65 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 8 Feb 2017 08:05:56 +0100
Subject: [PATCH] block: Move bdi_unregister() to del_gendisk()

Commit 6cd18e711dd8 "block: destroy bdi before blockdev is
unregistered." moved bdi unregistration (at that time through
bdi_destroy()) from blk_release_queue() to blk_cleanup_queue() because
it needs to happen before the blk_unregister_region() call in
del_gendisk() for MD. SCSI though frees up the device number from
sd_remove(), called through a maze of callbacks from device_del() in
__scsi_remove_device(), before blk_cleanup_queue(), and thus races
similar to those described in 6cd18e711dd8 can happen for SCSI as well,
as reported by Omar [1].

Moving bdi_unregister() to del_gendisk() works for MD and fixes the
problem for SCSI since del_gendisk() gets called from sd_remove() before
freeing the device number.

This also makes device_add_disk() (calling bdi_register_owner()) more
symmetric with del_gendisk().

[1] http://marc.info/?l=linux-block&m=148554717109098&w=2

Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/blk-core.c | 1 -
 block/genhd.c    | 5 +++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index b9e857f4afe8..1086dac8724c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -578,7 +578,6 @@ void blk_cleanup_queue(struct request_queue *q)
 		q->queue_lock = &q->__queue_lock;
 	spin_unlock_irq(lock);
 
-	bdi_unregister(q->backing_dev_info);
 	put_disk_devt(q->disk_devt);
 
 	/* @q is and will stay empty, shutdown and put */
diff --git a/block/genhd.c b/block/genhd.c
index 2f444b87a5f2..b26a5ea115d0 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -681,6 +681,11 @@ void del_gendisk(struct gendisk *disk)
 	disk->flags &= ~GENHD_FL_UP;
 
 	sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
+	/*
+	 * Unregister bdi before releasing device numbers (as they can get
+	 * reused and we'd get clashes in sysfs).
+	 */
+	bdi_unregister(disk->queue->backing_dev_info);
 	blk_unregister_queue(disk);
 	blk_unregister_region(disk_devt(disk), disk->minors);
 
-- 
2.10.2


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 04/13] block: Move bdi_unregister() to del_gendisk()
  2017-02-22  8:51     ` Jan Kara
@ 2017-02-22 17:42       ` Bart Van Assche
  0 siblings, 0 replies; 40+ messages in thread
From: Bart Van Assche @ 2017-02-22 17:42 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown,
	Omar Sandoval

On 02/22/2017 01:20 AM, Jan Kara wrote:
> On Tue 21-02-17 19:53:29, Bart Van Assche wrote:
>> This change looks suspicious to me. There are drivers that create a
>> block layer queue but neither call device_add_disk() nor del_gendisk(),
>> e.g. drivers/scsi/st.c. Although bdi_init() will be called for the
>> queues created by these drivers, this patch will cause the
>> bdi_unregister() call to be skipped for these drivers.
> 
> Well, the thing is that bdi_unregister() is the counterpart to
> bdi_register(). Unless you call bdi_register(), which happens only in
> device_add_disk() (and some filesystems which create their private bdis),
> there's no point in calling bdi_unregister(). Counterpart to bdi_init() is
> bdi_exit() and that gets called always once bdi reference count drops to 0.

That makes sense to me. Hence:

Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 05/13] bdi: Mark congested->bdi as internal
  2017-02-21 17:09 ` [PATCH 05/13] bdi: Mark congested->bdi as internal Jan Kara
@ 2017-02-28 16:06   ` Tejun Heo
  0 siblings, 0 replies; 40+ messages in thread
From: Tejun Heo @ 2017-02-28 16:06 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, NeilBrown, Omar Sandoval

On Tue, Feb 21, 2017 at 06:09:50PM +0100, Jan Kara wrote:
> congested->bdi pointer is used only to be able to remove congested
> structure from bdi->cgwb_congested_tree on structure release. Moreover
> the pointer can become NULL when we unregister the bdi. Rename the field
> to __bdi and add a comment to make it more explicit that this is internal
> to the memcg writeback code and that people should not use the field, as
> such use is likely to be race prone.
> 
> We do not bother with converting congested->bdi to a proper refcounted
> reference. It would be slightly ugly to special-case bdi->wb.congested to
> avoid what is effectively a cyclic reference of the bdi to itself, and the
> reference gets cleared from bdi_unregister(), making it impossible to
> reference a freed bdi.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 06/13] bdi: Make wb->bdi a proper reference
  2017-02-21 17:09 ` [PATCH 06/13] bdi: Make wb->bdi a proper reference Jan Kara
@ 2017-02-28 16:07   ` Tejun Heo
  0 siblings, 0 replies; 40+ messages in thread
From: Tejun Heo @ 2017-02-28 16:07 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, NeilBrown, Omar Sandoval

On Tue, Feb 21, 2017 at 06:09:51PM +0100, Jan Kara wrote:
> Make wb->bdi a proper refcounted reference to bdi for all bdi_writeback
> structures except for the one embedded inside struct backing_dev_info.
> That will allow us to simplify bdi unregistration.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 07/13] bdi: Move removal from bdi->wb_list into wb_shutdown()
  2017-02-21 17:09 ` [PATCH 07/13] bdi: Move removal from bdi->wb_list into wb_shutdown() Jan Kara
@ 2017-02-28 16:14   ` Tejun Heo
  0 siblings, 0 replies; 40+ messages in thread
From: Tejun Heo @ 2017-02-28 16:14 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, NeilBrown, Omar Sandoval

On Tue, Feb 21, 2017 at 06:09:52PM +0100, Jan Kara wrote:
> Currently the removal from bdi->wb_list happens directly in
> cgwb_release_workfn(). Move it to wb_shutdown() which is functionally
> equivalent and more logical (the list is only used for distributing
> writeback work items among bdi_writeback structures). It will also allow us
> to simplify writeback shutdown in cgwb_bdi_destroy().
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 08/13] bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()
  2017-02-21 17:09 ` [PATCH 08/13] bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy() Jan Kara
@ 2017-02-28 16:44   ` Tejun Heo
  0 siblings, 0 replies; 40+ messages in thread
From: Tejun Heo @ 2017-02-28 16:44 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, NeilBrown, Omar Sandoval

Hello,

On Tue, Feb 21, 2017 at 06:09:53PM +0100, Jan Kara wrote:
> Currently we wait for all cgwbs to get freed in cgwb_bdi_destroy(),
> which also means that writeback has been shut down on them. Since this
> wait is going away, directly shut down writeback on cgwbs from
> cgwb_bdi_destroy() to avoid live writeback structures after
> bdi_unregister() has finished.

Hmmm... the only thing which bothers me a bit is that after removing
the shutdown from release_workfn this would make the final flushing
sequential, one cgwb at a time, which in certain cases can take an
unnecessarily long time.  It's not a correctness issue tho and if it
becomes a problem we can deal with it by splitting wb_shutdown() into
two pieces - the unregistration / issuing of the work items, and the
flushing of those.  Other than that, looks good to me.

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 09/13] bdi: Do not wait for cgwbs release in bdi_unregister()
  2017-02-21 17:09 ` [PATCH 09/13] bdi: Do not wait for cgwbs release in bdi_unregister() Jan Kara
@ 2017-02-28 16:51   ` Tejun Heo
  0 siblings, 0 replies; 40+ messages in thread
From: Tejun Heo @ 2017-02-28 16:51 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, NeilBrown, Omar Sandoval

Hello,

On Tue, Feb 21, 2017 at 06:09:54PM +0100, Jan Kara wrote:
> @@ -726,14 +718,6 @@ static void cgwb_bdi_destroy(struct backing_dev_info *bdi)
>  	}
>  
>  	spin_unlock_irq(&cgwb_lock);
> -
> -	/*
> -	 * All cgwb's and their congested states must be shutdown and
> -	 * released before returning.  Drain the usage counter to wait for
> -	 * all cgwb's and cgwb_congested's ever created on @bdi.
> -	 */
> -	atomic_dec(&bdi->usage_cnt);
> -	wait_event(cgwb_release_wait, !atomic_read(&bdi->usage_cnt));
>  }

Hmm... I'm not sure about wb_shutdown() synchronization.  If you look
at the function, it's allowed to be called multiple times but doesn't
synchronize the end of the operation.  With usage_cnt, it was okay
because cgwb_bdi_destroy() would have waited until everything is
finished via usage_cnt, but with that gone, we can have a race like
the following.

	A					B
 a cgroup gets removed
 a cgwb starts to get destroyed
 it starts wb_shutdown()
					bdi starts getting destroyed
					calls cgwb_bdi_destroy()
					calls wb_shutdown() on the same cgwb
					  but it returns because it lost to
 wb_shutdown() is still in progress	  A's wb_shutdown()
					bdi destruction proceeds
 Oops.

So, I think we need to make sure that wb_shutdown()'s are properly
synchronized from start to end to get rid of the usage_cnt waiting.
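
To illustrate what I mean, here is a minimal sketch of one way to make
wb_shutdown() synchronize from start to end. WB_shutting_down is a purely
illustrative state bit (it does not exist in the tree at this point), so
treat this as a sketch of the idea rather than a concrete patch:

----
static void wb_shutdown(struct bdi_writeback *wb)
{
	/* Make sure nobody queues further work */
	spin_lock_bh(&wb->work_lock);
	if (!test_and_clear_bit(WB_registered, &wb->state)) {
		spin_unlock_bh(&wb->work_lock);
		/*
		 * Lost the race against another shutdown: wait until the
		 * winner is done instead of returning early.
		 */
		wait_on_bit(&wb->state, WB_shutting_down,
			    TASK_UNINTERRUPTIBLE);
		return;
	}
	/* Set while still holding work_lock so losers always see it */
	set_bit(WB_shutting_down, &wb->state);
	spin_unlock_bh(&wb->work_lock);

	mod_delayed_work(bdi_wq, &wb->dwork, 0);
	flush_delayed_work(&wb->dwork);
	WARN_ON(!list_empty(&wb->work_list));

	/* Shutdown is complete, wake up anybody waiting in the loser path */
	clear_bit(WB_shutting_down, &wb->state);
	smp_mb__after_atomic();
	wake_up_bit(&wb->state, WB_shutting_down);
}
----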

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 10/13] bdi: Rename cgwb_bdi_destroy() to cgwb_bdi_unregister()
  2017-02-21 17:09 ` [PATCH 10/13] bdi: Rename cgwb_bdi_destroy() to cgwb_bdi_unregister() Jan Kara
@ 2017-02-28 16:51   ` Tejun Heo
  0 siblings, 0 replies; 40+ messages in thread
From: Tejun Heo @ 2017-02-28 16:51 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, NeilBrown, Omar Sandoval

On Tue, Feb 21, 2017 at 06:09:55PM +0100, Jan Kara wrote:
> Rename cgwb_bdi_destroy() to cgwb_bdi_unregister() as it gets called
> from bdi_unregister() which is not necessarily called from bdi_destroy()
> and thus the name is somewhat misleading.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 11/13] block: Fix oops in locked_inode_to_wb_and_lock_list()
  2017-02-21 17:09 ` [PATCH 11/13] block: Fix oops in locked_inode_to_wb_and_lock_list() Jan Kara
@ 2017-02-28 16:52   ` Tejun Heo
  0 siblings, 0 replies; 40+ messages in thread
From: Tejun Heo @ 2017-02-28 16:52 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, NeilBrown, Omar Sandoval

On Tue, Feb 21, 2017 at 06:09:56PM +0100, Jan Kara wrote:
> When a block device is closed, we call inode_detach_wb() in __blkdev_put(),
> which sets inode->i_wb to NULL. That is contrary to the expectation that
> inode->i_wb stays valid once set for the whole inode's lifetime, and it
> leads to an oops in wb_get() in locked_inode_to_wb_and_lock_list() because
> inode_to_wb() returned NULL.
> 
> The reason why we called inode_detach_wb() is not valid anymore though.
> The BDI is guaranteed to stay around until we call bdi_put() from
> bdev_evict_inode(), so we can postpone calling inode_detach_wb() to that
> moment.
> 
> Also add a warning to catch if someone uses inode_detach_wb() in a
> dangerous way.
> 
> Reported-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
> Signed-off-by: Jan Kara <jack@suse.cz>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/13 v2] block: Fix block device shutdown related races
  2017-02-21 17:09 [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
                   ` (14 preceding siblings ...)
  2017-02-21 17:19 ` Jens Axboe
@ 2017-02-28 16:54 ` Tejun Heo
  2017-03-01 15:37   ` Jan Kara
  15 siblings, 1 reply; 40+ messages in thread
From: Tejun Heo @ 2017-02-28 16:54 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, NeilBrown, Omar Sandoval

Hello,

It generally looks good to me.  The only worry I have is around
wb_shutdown() synchronization and if that is actually an issue it
shouldn't be too difficult to fix.  The other thing which came to mind
is the congested->__bdi sever semantics.  IIRC, that one was also
to support the "bdi must go away now" behavior.  As bdi is refcnted
now, I think we can probably just let cong hold onto the bdi rather
than try to sever the ref there.

Thanks a lot for working on this!

-- 
tejun

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/13 v2] block: Fix block device shutdown related races
  2017-02-22 10:24   ` Jan Kara
@ 2017-03-01  7:25     ` Omar Sandoval
  2017-03-01  7:26       ` Omar Sandoval
  0 siblings, 1 reply; 40+ messages in thread
From: Omar Sandoval @ 2017-03-01  7:25 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown

On Wed, Feb 22, 2017 at 11:24:25AM +0100, Jan Kara wrote:
> On Tue 21-02-17 10:19:28, Jens Axboe wrote:
> > On 02/21/2017 10:09 AM, Jan Kara wrote:
> > > Hello,
> > > 
> > > this is a second revision of the patch set to fix several different races and
> > > issues I've found when testing device shutdown and reuse. The first three
> > > patches are fixes to problems in my previous series fixing BDI lifetime issues.
> > > Patch 4 fixes issues with reuse of BDI name with scsi devices. With it I cannot
> > > reproduce the BDI name reuse issues using Omar's stress test using scsi_debug
> > > so it can be used as a replacement of Dan's patches. Patches 5-11 fix oops that
> > > is triggered by __blkdev_put() calling inode_detach_wb() too early (the problem
> > > reported by Thiago). Patches 12 and 13 fix oops due to a bug in gendisk code
> > > where get_gendisk() can return already freed gendisk structure (again triggered
> > > by Omar's stress test).
> > > 
> > > People, please have a look at patches. They are mostly simple however the
> > > interactions are rather complex so I may have missed something. Also I'm
> > > happy for any additional testing these patches can get - I've stressed them
> > > with Omar's script, tested memcg writeback, tested static (not udev managed)
> > > device inodes.
> > > 
> > > Jens, I think at least patches 1-3 should go in together with my fixes you
> > > already have in your tree (or shortly after them). It is up to you whether
> > > you decide to delay my first fixes or pick these up quickly. Patch 4 is
> > > (IMHO a cleaner) replacement of Dan's patches so consider whether you want
> > > to use it instead of those patches.
> > 
> > I have applied 1-3 to my for-linus branch, which will go in after
> > the initial pull request has been pulled by Linus. Consider fixing up
> > #4 so it applies, I like it.
> 
> OK, attached is patch 4 rebased on top of Linus' tree from today which
> already has linux-block changes pulled in. I've left put_disk_devt() in
> blk_cleanup_queue() to maintain the logic in the original patch (now commit
> 0dba1314d4f8) that request_queue and gendisk each hold one devt reference.
> The bdi_unregister() call that is moved to del_gendisk() by this patch is
> now protected by the gendisk reference instead of the request_queue one
> so it still maintains the property that devt reference protects bdi
> registration-unregistration lifetime (as much as that is not needed anymore
> after this patch).
> 
> I have also updated the comment in the code and the changelog - they were
> somewhat stale after changes to the whole series Tejun suggested.
> 
> 								Honza

Hey, Jan, I just tested this out when I was seeing similar crashes with
sr instead of sd, and this fixed it.

Tested-by: Omar Sandoval <osandov@fb.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/13 v2] block: Fix block device shutdown related races
  2017-03-01  7:25     ` Omar Sandoval
@ 2017-03-01  7:26       ` Omar Sandoval
  2017-03-01 15:11         ` Jan Kara
  0 siblings, 1 reply; 40+ messages in thread
From: Omar Sandoval @ 2017-03-01  7:26 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown

On Tue, Feb 28, 2017 at 11:25:28PM -0800, Omar Sandoval wrote:
> On Wed, Feb 22, 2017 at 11:24:25AM +0100, Jan Kara wrote:
> > On Tue 21-02-17 10:19:28, Jens Axboe wrote:
> > > On 02/21/2017 10:09 AM, Jan Kara wrote:
> > > > Hello,
> > > > 
> > > > this is a second revision of the patch set to fix several different races and
> > > > issues I've found when testing device shutdown and reuse. The first three
> > > > patches are fixes to problems in my previous series fixing BDI lifetime issues.
> > > > Patch 4 fixes issues with reuse of BDI name with scsi devices. With it I cannot
> > > > reproduce the BDI name reuse issues using Omar's stress test using scsi_debug
> > > > so it can be used as a replacement of Dan's patches. Patches 5-11 fix oops that
> > > > is triggered by __blkdev_put() calling inode_detach_wb() too early (the problem
> > > > reported by Thiago). Patches 12 and 13 fix oops due to a bug in gendisk code
> > > > where get_gendisk() can return already freed gendisk structure (again triggered
> > > > by Omar's stress test).
> > > > 
> > > > People, please have a look at patches. They are mostly simple however the
> > > > interactions are rather complex so I may have missed something. Also I'm
> > > > happy for any additional testing these patches can get - I've stressed them
> > > > with Omar's script, tested memcg writeback, tested static (not udev managed)
> > > > device inodes.
> > > > 
> > > > Jens, I think at least patches 1-3 should go in together with my fixes you
> > > > already have in your tree (or shortly after them). It is up to you whether
> > > > you decide to delay my first fixes or pick these up quickly. Patch 4 is
> > > > (IMHO a cleaner) replacement of Dan's patches so consider whether you want
> > > > to use it instead of those patches.
> > > 
> > > I have applied 1-3 to my for-linus branch, which will go in after
> > > the initial pull request has been pulled by Linus. Consider fixing up
> > > #4 so it applies, I like it.
> > 
> > OK, attached is patch 4 rebased on top of Linus' tree from today which
> > already has linux-block changes pulled in. I've left put_disk_devt() in
> > blk_cleanup_queue() to maintain the logic in the original patch (now commit
> > 0dba1314d4f8) that request_queue and gendisk each hold one devt reference.
> > The bdi_unregister() call that is moved to del_gendisk() by this patch is
> > now protected by the gendisk reference instead of the request_queue one
> > so it still maintains the property that devt reference protects bdi
> > registration-unregistration lifetime (as much as that is not needed anymore
> > after this patch).
> > 
> > I have also updated the comment in the code and the changelog - they were
> > somewhat stale after changes to the whole series Tejun suggested.
> > 
> > 								Honza
> 
> Hey, Jan, I just tested this out when I was seeing similar crashes with
> sr instead of sd, and this fixed it.
> 
> Tested-by: Omar Sandoval <osandov@fb.com>

Just realized it wasn't clear, I'm talking about patch 4 specifically.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/13 v2] block: Fix block device shutdown related races
  2017-03-01  7:26       ` Omar Sandoval
@ 2017-03-01 15:11         ` Jan Kara
  2017-03-06 20:38           ` Omar Sandoval
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Kara @ 2017-03-01 15:11 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: Jan Kara, Jens Axboe, linux-block, Christoph Hellwig,
	Dan Williams, Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo,
	NeilBrown

On Tue 28-02-17 23:26:53, Omar Sandoval wrote:
> On Tue, Feb 28, 2017 at 11:25:28PM -0800, Omar Sandoval wrote:
> > On Wed, Feb 22, 2017 at 11:24:25AM +0100, Jan Kara wrote:
> > > On Tue 21-02-17 10:19:28, Jens Axboe wrote:
> > > > On 02/21/2017 10:09 AM, Jan Kara wrote:
> > > > > Hello,
> > > > > 
> > > > > this is a second revision of the patch set to fix several different races and
> > > > > issues I've found when testing device shutdown and reuse. The first three
> > > > > patches are fixes to problems in my previous series fixing BDI lifetime issues.
> > > > > Patch 4 fixes issues with reuse of BDI name with scsi devices. With it I cannot
> > > > > reproduce the BDI name reuse issues using Omar's stress test using scsi_debug
> > > > > so it can be used as a replacement of Dan's patches. Patches 5-11 fix oops that
> > > > > is triggered by __blkdev_put() calling inode_detach_wb() too early (the problem
> > > > > reported by Thiago). Patches 12 and 13 fix oops due to a bug in gendisk code
> > > > > where get_gendisk() can return already freed gendisk structure (again triggered
> > > > > by Omar's stress test).
> > > > > 
> > > > > People, please have a look at patches. They are mostly simple however the
> > > > > interactions are rather complex so I may have missed something. Also I'm
> > > > > happy for any additional testing these patches can get - I've stressed them
> > > > > with Omar's script, tested memcg writeback, tested static (not udev managed)
> > > > > device inodes.
> > > > > 
> > > > > Jens, I think at least patches 1-3 should go in together with my fixes you
> > > > > already have in your tree (or shortly after them). It is up to you whether
> > > > > you decide to delay my first fixes or pick these up quickly. Patch 4 is
> > > > > (IMHO a cleaner) replacement of Dan's patches so consider whether you want
> > > > > to use it instead of those patches.
> > > > 
> > > > I have applied 1-3 to my for-linus branch, which will go in after
> > > > the initial pull request has been pulled by Linus. Consider fixing up
> > > > #4 so it applies, I like it.
> > > 
> > > OK, attached is patch 4 rebased on top of Linus' tree from today which
> > > already has linux-block changes pulled in. I've left put_disk_devt() in
> > > blk_cleanup_queue() to maintain the logic in the original patch (now commit
> > > 0dba1314d4f8) that request_queue and gendisk each hold one devt reference.
> > > The bdi_unregister() call that is moved to del_gendisk() by this patch is
> > > now protected by the gendisk reference instead of the request_queue one
> > > so it still maintains the property that devt reference protects bdi
> > > registration-unregistration lifetime (as much as that is not needed anymore
> > > after this patch).
> > > 
> > > I have also updated the comment in the code and the changelog - they were
> > > somewhat stale after changes to the whole series Tejun suggested.
> > > 
> > > 								Honza
> > 
> > Hey, Jan, I just tested this out when I was seeing similar crashes with
> > sr instead of sd, and this fixed it.
> > 
> > Tested-by: Omar Sandoval <osandov@fb.com>
> 
> Just realized it wasn't clear, I'm talking about patch 4 specifically.

Thanks for confirmation!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/13 v2] block: Fix block device shutdown related races
  2017-02-28 16:54 ` Tejun Heo
@ 2017-03-01 15:37   ` Jan Kara
  2017-03-01 16:26     ` Tejun Heo
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Kara @ 2017-03-01 15:37 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jan Kara, Jens Axboe, linux-block, Christoph Hellwig,
	Dan Williams, Thiago Jung Bauermann, Lekshmi Pillai, NeilBrown,
	Omar Sandoval

Hello,

On Tue 28-02-17 11:54:41, Tejun Heo wrote:
> It generally looks good to me.

Thanks for review!

> The only worry I have is around wb_shutdown() synchronization and if that
> is actually an issue it shouldn't be too difficult to fix.

Yeah, I'll have a look at that.

> The other thing which came to mind is that the congested->__bdi sever
> semantics.  IIRC, that one was also to support the "bdi must go away now"
> behavior.  As bdi is refcnted now, I think we can probably just let cong
> hold onto the bdi rather than try to sever the ref there.

So currently I get away with __bdi not being a proper refcounted reference.
If we were to remove the clearing of __bdi, we'd have to make it into a
refcounted reference, which is slightly ugly as we need to special-case
embedded bdi_writeback_congested structures. Maybe it will be a worthwhile
cleanup but for now I left it alone...
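
Just for illustration, a rough sketch of what the refcounted variant would
look like with the embedded structure special-cased (the helper names here
are made up and this is not part of the series):

----
/*
 * Hypothetical sketch: take a real bdi reference from every congested
 * structure except the one embedded in the bdi itself (reachable as
 * bdi->wb.congested), which would otherwise pin the bdi forever.
 */
static void wb_congested_attach_bdi(struct bdi_writeback_congested *congested,
				    struct backing_dev_info *bdi)
{
	congested->__bdi = bdi;
	if (congested != bdi->wb.congested)
		bdi_get(bdi);
}

static void wb_congested_detach_bdi(struct bdi_writeback_congested *congested)
{
	struct backing_dev_info *bdi = congested->__bdi;

	if (bdi && congested != bdi->wb.congested)
		bdi_put(bdi);
	congested->__bdi = NULL;
}
----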

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/13 v2] block: Fix block device shutdown related races
  2017-03-01 15:37   ` Jan Kara
@ 2017-03-01 16:26     ` Tejun Heo
  0 siblings, 0 replies; 40+ messages in thread
From: Tejun Heo @ 2017-03-01 16:26 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, NeilBrown, Omar Sandoval

Hello, Jan.

On Wed, Mar 01, 2017 at 04:37:00PM +0100, Jan Kara wrote:
> > The other thing which came to mind is the congested->__bdi sever
> > semantics.  IIRC, that one was also to support the "bdi must go away now"
> > behavior.  As bdi is refcnted now, I think we can probably just let cong
> > hold onto the bdi rather than try to sever the ref there.
> 
> So currently I get away with __bdi not being a proper refcounted reference.
> If we were to remove the clearing of __bdi, we'd have to make it into a
> refcounted reference, which is slightly ugly as we need to special-case
> embedded bdi_writeback_congested structures. Maybe it will be a worthwhile
> cleanup but for now I left it alone...

Yeah, absolutely, it's an additional step that we can take later.
Nothing urgent.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/13 v2] block: Fix block device shutdown related races
  2017-03-01 15:11         ` Jan Kara
@ 2017-03-06 20:38           ` Omar Sandoval
  2017-03-07 13:57             ` Jan Kara
  0 siblings, 1 reply; 40+ messages in thread
From: Omar Sandoval @ 2017-03-06 20:38 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown

On Wed, Mar 01, 2017 at 04:11:09PM +0100, Jan Kara wrote:
> On Tue 28-02-17 23:26:53, Omar Sandoval wrote:
> > On Tue, Feb 28, 2017 at 11:25:28PM -0800, Omar Sandoval wrote:
> > > On Wed, Feb 22, 2017 at 11:24:25AM +0100, Jan Kara wrote:
> > > > On Tue 21-02-17 10:19:28, Jens Axboe wrote:
> > > > > On 02/21/2017 10:09 AM, Jan Kara wrote:
> > > > > > Hello,
> > > > > > 
> > > > > > this is a second revision of the patch set to fix several different races and
> > > > > > issues I've found when testing device shutdown and reuse. The first three
> > > > > > patches are fixes to problems in my previous series fixing BDI lifetime issues.
> > > > > > Patch 4 fixes issues with reuse of BDI name with scsi devices. With it I cannot
> > > > > > reproduce the BDI name reuse issues using Omar's stress test using scsi_debug
> > > > > > so it can be used as a replacement of Dan's patches. Patches 5-11 fix oops that
> > > > > > is triggered by __blkdev_put() calling inode_detach_wb() too early (the problem
> > > > > > reported by Thiago). Patches 12 and 13 fix oops due to a bug in gendisk code
> > > > > > where get_gendisk() can return already freed gendisk structure (again triggered
> > > > > > by Omar's stress test).
> > > > > > 
> > > > > > People, please have a look at patches. They are mostly simple however the
> > > > > > interactions are rather complex so I may have missed something. Also I'm
> > > > > > happy for any additional testing these patches can get - I've stressed them
> > > > > > with Omar's script, tested memcg writeback, tested static (not udev managed)
> > > > > > device inodes.
> > > > > > 
> > > > > > Jens, I think at least patches 1-3 should go in together with my fixes you
> > > > > > already have in your tree (or shortly after them). It is up to you whether
> > > > > > you decide to delay my first fixes or pick these up quickly. Patch 4 is
> > > > > > (IMHO a cleaner) replacement of Dan's patches so consider whether you want
> > > > > > to use it instead of those patches.
> > > > > 
> > > > > I have applied 1-3 to my for-linus branch, which will go in after
> > > > > the initial pull request has been pulled by Linus. Consider fixing up
> > > > > #4 so it applies, I like it.
> > > > 
> > > > OK, attached is patch 4 rebased on top of Linus' tree from today which
> > > > already has linux-block changes pulled in. I've left put_disk_devt() in
> > > > blk_cleanup_queue() to maintain the logic in the original patch (now commit
> > > > 0dba1314d4f8) that request_queue and gendisk each hold one devt reference.
> > > > The bdi_unregister() call that is moved to del_gendisk() by this patch is
> > > > now protected by the gendisk reference instead of the request_queue one
> > > > so it still maintains the property that devt reference protects bdi
> > > > registration-unregistration lifetime (as much as that is not needed anymore
> > > > after this patch).
> > > > 
> > > > I have also updated the comment in the code and the changelog - they were
> > > > somewhat stale after changes to the whole series Tejun suggested.
> > > > 
> > > > 								Honza
> > > 
> > > Hey, Jan, I just tested this out when I was seeing similar crashes with
> > > sr instead of sd, and this fixed it.
> > > 
> > > Tested-by: Omar Sandoval <osandov@fb.com>
> > 
> > Just realized it wasn't clear, I'm talking about patch 4 specifically.
> 
> Thanks for confirmation!
> 
> 								Honza

Unfortunately, this patch actually seems to have introduced a
regression. Reverting the patch fixes it.

I'm running a QEMU kvm vm with a virtio-scsi device and I get oopses
like this:

[    6.741773] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[    6.744401] IP: virtscsi_queuecommand_single+0x21/0x40 [virtio_scsi]
[    6.744985] PGD 0
[    6.744985]
[    6.744985] Oops: 0002 [#1] PREEMPT SMP
[    6.744985] Modules linked in: virtio_scsi scsi_mod virtio_net
[    6.744985] CPU: 3 PID: 5 Comm: kworker/u8:0 Not tainted 4.11.0-rc1 #36
[    6.744985] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-20161122_114906-anatol 04/01/2014
[    6.744985] Workqueue: events_unbound async_run_entry_fn
[    6.744985] task: ffff88007c8672c0 task.stack: ffffc9000033c000
[    6.750038] RIP: 0010:virtscsi_queuecommand_single+0x21/0x40 [virtio_scsi]
[    6.750038] RSP: 0018:ffffc9000033f8b0 EFLAGS: 00010246
[    6.750038] RAX: ffff88007c98d000 RBX: ffff880078c814c8 RCX: 0000000000000000
[    6.750038] RDX: ffff880078c814c8 RSI: ffff880078c814c8 RDI: ffff88007c98d000
[    6.750038] RBP: ffffc9000033f8b0 R08: 0000000000000000 R09: 0000000000000400
[    6.750038] R10: ffff88007b1fe748 R11: 0000000000000024 R12: ffff880078c814c8
[    6.750038] R13: ffff88007c98d000 R14: ffff880078c81380 R15: ffff88007b532000
[    6.750038] FS:  0000000000000000(0000) GS:ffff88007fd80000(0000) knlGS:0000000000000000
[    6.750038] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.750038] CR2: 0000000000000004 CR3: 0000000001c09000 CR4: 00000000000006e0
[    6.750038] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    6.750038] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    6.750038] Call Trace:
[    6.750038]  scsi_dispatch_cmd+0xa3/0x1d0 [scsi_mod]
[    6.750038]  scsi_queue_rq+0x586/0x740 [scsi_mod]
[    6.750038]  blk_mq_dispatch_rq_list+0x22a/0x420
[    6.750038]  blk_mq_sched_dispatch_requests+0x142/0x1b0
[    6.750038]  __blk_mq_run_hw_queue+0x94/0xb0
[    6.750038]  blk_mq_run_hw_queue+0x82/0xa0
[    6.750038]  blk_mq_sched_insert_request+0x89/0x1a0
[    6.750038]  ? blk_execute_rq+0xc0/0xc0
[    6.750038]  blk_execute_rq_nowait+0x9a/0x140
[    6.750038]  ? bio_phys_segments+0x19/0x20
[    6.750038]  blk_execute_rq+0x56/0xc0
[    6.750038]  scsi_execute+0x128/0x270 [scsi_mod]
[    6.750038]  scsi_probe_and_add_lun+0x247/0xc60 [scsi_mod]
[    6.750038]  ? __pm_runtime_resume+0x4c/0x60
[    6.750038]  __scsi_scan_target+0x102/0x520 [scsi_mod]
[    6.750038]  ? account_entity_dequeue+0xab/0xe0
[    6.750038]  scsi_scan_channel+0x81/0xa0 [scsi_mod]
[    6.750038]  scsi_scan_host_selected+0xcc/0x100 [scsi_mod]
[    6.750038]  do_scsi_scan_host+0x8d/0x90 [scsi_mod]
[    6.750038]  do_scan_async+0x1c/0x1a0 [scsi_mod]
[    6.750038]  async_run_entry_fn+0x4a/0x170
[    6.750038]  process_one_work+0x165/0x430
[    6.750038]  worker_thread+0x4e/0x490
[    6.750038]  kthread+0x101/0x140
[    6.750038]  ? process_one_work+0x430/0x430
[    6.750038]  ? kthread_create_on_node+0x60/0x60
[    6.750038]  ret_from_fork+0x2c/0x40
[    6.750038] Code: ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 4e 30 48 89 f8 48 89 f2 48 89 e5 48 8b 89 88 01 00 00 48 8b 89 08 03 00 00 <f0> ff 41 04 48 8d bf c0 07 00 00 48 8d b0 c8 09 00 00 e8 08 fd
[    6.750038] RIP: virtscsi_queuecommand_single+0x21/0x40 [virtio_scsi] RSP: ffffc9000033f8b0
[    6.750038] CR2: 0000000000000004
[    6.750038] ---[ end trace cee5df55872abfa0 ]---
[    6.750038] note: kworker/u8:0[5] exited with preempt_count 1

Arch Linux 4.11.0-rc1 (ttyS0)

silver login: [   27.370267] scsi 0:0:53:0: tag#766 abort
[   27.374501] scsi 0:0:53:0: device reset
[   27.378539] scsi 0:0:53:0: Device offlined - not ready after error recovery

That crash is because tgt is NULL here:

----
static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
					struct scsi_cmnd *sc)
{
	struct virtio_scsi *vscsi = shost_priv(sh);
	struct virtio_scsi_target_state *tgt =
				scsi_target(sc->device)->hostdata;

======> atomic_inc(&tgt->reqs);
	return virtscsi_queuecommand(vscsi, &vscsi->req_vqs[0], sc);
}
----

Applying the rest of this patch set didn't fix it. Additionally,
enabling KASAN gives this use-after-free spew:
----
[    7.789280] SCSI subsystem initialized
[    7.809991] virtio_net virtio0 ens2: renamed from eth0
[    7.828301] scsi host0: Virtio SCSI HBA
[    7.916214] scsi 0:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
[    7.946713] ==================================================================
[    7.947609] BUG: KASAN: use-after-free in rb_erase+0x13a2/0x1a20 at addr ffff88006a915e30
[    7.948583] Write of size 8 by task dhcpcd-run-hook/146
[    7.949202] CPU: 0 PID: 146 Comm: dhcpcd-run-hook Not tainted 4.11.0-rc1-00011-gc35ccd2d43e9 #42
[    7.950021] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-20161122_114906-anatol 04/01/2014
[    7.950021] Call Trace:
[    7.950021]  <IRQ>
[    7.950021]  dump_stack+0x63/0x86
[    7.950021]  kasan_object_err+0x21/0x70
[    7.950021]  kasan_report.part.1+0x23a/0x520
[    7.950021]  ? rb_erase+0x13a2/0x1a20
[    7.950021]  ? swake_up_locked+0x94/0x1c0
[    7.950021]  __asan_report_store8_noabort+0x31/0x40
[    7.950021]  rb_erase+0x13a2/0x1a20
[    7.950021]  wb_congested_put+0x6a/0xd0
[    7.950021]  __blkg_release_rcu+0x16b/0x230
[    7.950021]  rcu_process_callbacks+0x602/0xfc0
[    7.950021]  __do_softirq+0x14e/0x5b3
[    7.950021]  irq_exit+0x14e/0x180
[    7.950021]  smp_apic_timer_interrupt+0x7b/0xa0
[    7.950021]  apic_timer_interrupt+0x89/0x90
[    7.950021] RIP: 0010:copy_strings.isra.7+0x312/0x6b0
[    7.950021] RSP: 0018:ffff880063bffcd0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
[    7.950021] RAX: 0000000000002000 RBX: 0000000000000fe3 RCX: 0000000000000002
[    7.950021] RDX: ffff88006a734880 RSI: 0000000000000002 RDI: ffff880069750808
[    7.950021] RBP: ffff880063bffdd0 R08: 8000000063ec3067 R09: 0000000000000040
[    7.950021] R10: ffffed000c7d8600 R11: ffffffff82afa6a0 R12: 00007fffffffefe3
[    7.950021] R13: 0000000000000015 R14: ffff880069750780 R15: dffffc0000000000
[    7.950021]  </IRQ>
[    7.950021]  ? set_binfmt+0x120/0x120
[    7.950021]  ? insert_vm_struct+0x148/0x2e0
[    7.950021]  ? kasan_slab_alloc+0x12/0x20
[    7.950021]  ? count.isra.6.constprop.16+0x52/0x100
[    7.950021]  do_execveat_common.isra.14+0xef1/0x1b80
[    7.950021]  ? prepare_bprm_creds+0x100/0x100
[    7.950021]  ? getname_flags+0x90/0x3f0
[    7.950021]  ? __do_page_fault+0x5cc/0xbc0
[    7.950021]  SyS_execve+0x3a/0x50
[    7.950021]  ? ptregs_sys_vfork+0x10/0x10
[    7.950021]  do_syscall_64+0x180/0x380
[    7.950021]  entry_SYSCALL64_slow_path+0x25/0x25
[    7.950021] RIP: 0033:0x7f85de145477
[    7.950021] RSP: 002b:00007ffdf2da3568 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
[    7.950021] RAX: ffffffffffffffda RBX: 0000000000d771c0 RCX: 00007f85de145477
[    7.950021] RDX: 0000000000d35fb0 RSI: 0000000000d697e0 RDI: 0000000000d771c0
[    7.950021] RBP: 0000000000000000 R08: 00007ffdf2da3560 R09: 0000000000000030
[    7.950021] R10: 000000000000059a R11: 0000000000000202 R12: 0000000000d771c0
[    7.950021] R13: 0000000000d697e0 R14: 0000000000d35fb0 R15: 0000000000000000
[    7.950021] Object at ffff88006a915b00, in cache kmalloc-1024 size: 1024
[    7.950021] Allocated:
[    7.950021] PID = 72
[    7.950021]  save_stack_trace+0x1b/0x20
[    7.950021]  kasan_kmalloc+0xee/0x190
[    7.950021]  kmem_cache_alloc_node_trace+0x138/0x200
[    7.950021]  bdi_alloc_node+0x4c/0xa0
[    7.950021]  blk_alloc_queue_node+0xdd/0x870
[    7.950021]  blk_mq_init_queue+0x41/0x90
[    7.950021]  scsi_mq_alloc_queue+0x41/0x130 [scsi_mod]
[    7.950021]  scsi_alloc_sdev+0x90e/0xd00 [scsi_mod]
[    7.950021]  scsi_probe_and_add_lun+0x14b3/0x2250 [scsi_mod]
[    7.950021]  __scsi_scan_target+0x23d/0xb60 [scsi_mod]
[    7.950021]  scsi_scan_channel+0x107/0x160 [scsi_mod]
[    7.950021]  scsi_scan_host_selected+0x1bf/0x250 [scsi_mod]
[    7.950021]  do_scsi_scan_host+0x1bc/0x250 [scsi_mod]
[    7.950021]  do_scan_async+0x41/0x4b0 [scsi_mod]
[    7.950021]  async_run_entry_fn+0xc3/0x630
[    7.950021]  process_one_work+0x531/0xf40
[    7.950021]  worker_thread+0xe4/0x10d0
[    7.950021]  kthread+0x298/0x390
[    7.950021]  ret_from_fork+0x2c/0x40
[    7.950021] Freed:
[    7.950021] PID = 72
[    7.950021]  save_stack_trace+0x1b/0x20
[    7.950021]  kasan_slab_free+0xb0/0x180
[    7.950021]  kfree+0x9f/0x1d0
[    7.950021]  bdi_put+0x2a/0x30
[    7.950021]  blk_release_queue+0x51/0x320
[    7.950021]  kobject_put+0x154/0x430
[    7.950021]  blk_put_queue+0x15/0x20
[    7.950021]  scsi_device_dev_release_usercontext+0x59b/0x880 [scsi_mod]
[    7.950021]  execute_in_process_context+0xd9/0x130
[    7.950021]  scsi_device_dev_release+0x1c/0x20 [scsi_mod]
[    7.950021]  device_release+0x76/0x1e0
[    7.950021]  kobject_put+0x154/0x430
[    7.950021]  put_device+0x17/0x20
[    7.950021]  __scsi_remove_device+0x18d/0x250 [scsi_mod]
[    7.950021]  scsi_probe_and_add_lun+0x14d6/0x2250 [scsi_mod]
[    7.950021]  __scsi_scan_target+0x23d/0xb60 [scsi_mod]
[    7.950021]  scsi_scan_channel+0x107/0x160 [scsi_mod]
[    7.950021]  scsi_scan_host_selected+0x1bf/0x250 [scsi_mod]
[    7.950021]  do_scsi_scan_host+0x1bc/0x250 [scsi_mod]
[    7.950021]  do_scan_async+0x41/0x4b0 [scsi_mod]
[    7.950021]  async_run_entry_fn+0xc3/0x630
[    7.950021]  process_one_work+0x531/0xf40
[    7.950021]  worker_thread+0xe4/0x10d0
[    7.950021]  kthread+0x298/0x390
[    7.950021]  ret_from_fork+0x2c/0x40
[    7.950021] Memory state around the buggy address:
[    7.950021]  ffff88006a915d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[    7.950021]  ffff88006a915d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[    7.950021] >ffff88006a915e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[    7.950021]                                      ^
[    7.950021]  ffff88006a915e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[    7.950021]  ffff88006a915f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
----

Any ideas Jan?

Thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/13 v2] block: Fix block device shutdown related races
  2017-03-06 20:38           ` Omar Sandoval
@ 2017-03-07 13:57             ` Jan Kara
  2017-03-07 16:28               ` Omar Sandoval
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Kara @ 2017-03-07 13:57 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: Jan Kara, Jens Axboe, linux-block, Christoph Hellwig,
	Dan Williams, Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo,
	NeilBrown

On Mon 06-03-17 12:38:18, Omar Sandoval wrote:
> On Wed, Mar 01, 2017 at 04:11:09PM +0100, Jan Kara wrote:
> > On Tue 28-02-17 23:26:53, Omar Sandoval wrote:
> > > On Tue, Feb 28, 2017 at 11:25:28PM -0800, Omar Sandoval wrote:
> > > > On Wed, Feb 22, 2017 at 11:24:25AM +0100, Jan Kara wrote:
> > > > > On Tue 21-02-17 10:19:28, Jens Axboe wrote:
> > > > > > On 02/21/2017 10:09 AM, Jan Kara wrote:
> > > > > > > Hello,
> > > > > > > 
> > > > > > > this is a second revision of the patch set to fix several different races and
> > > > > > > issues I've found when testing device shutdown and reuse. The first three
> > > > > > > patches are fixes to problems in my previous series fixing BDI lifetime issues.
> > > > > > > Patch 4 fixes issues with reuse of BDI name with scsi devices. With it I cannot
> > > > > > > reproduce the BDI name reuse issues using Omar's stress test using scsi_debug
> > > > > > > so it can be used as a replacement of Dan's patches. Patches 5-11 fix oops that
> > > > > > > is triggered by __blkdev_put() calling inode_detach_wb() too early (the problem
> > > > > > > reported by Thiago). Patches 12 and 13 fix oops due to a bug in gendisk code
> > > > > > > where get_gendisk() can return already freed gendisk structure (again triggered
> > > > > > > by Omar's stress test).
> > > > > > > 
> > > > > > > People, please have a look at patches. They are mostly simple however the
> > > > > > > interactions are rather complex so I may have missed something. Also I'm
> > > > > > > happy for any additional testing these patches can get - I've stressed them
> > > > > > > with Omar's script, tested memcg writeback, tested static (not udev managed)
> > > > > > > device inodes.
> > > > > > > 
> > > > > > > Jens, I think at least patches 1-3 should go in together with my fixes you
> > > > > > > already have in your tree (or shortly after them). It is up to you whether
> > > > > > > you decide to delay my first fixes or pick these up quickly. Patch 4 is
> > > > > > > (IMHO a cleaner) replacement of Dan's patches so consider whether you want
> > > > > > > to use it instead of those patches.
> > > > > > 
> > > > > > I have applied 1-3 to my for-linus branch, which will go in after
> > > > > > the initial pull request has been pulled by Linus. Consider fixing up
> > > > > > #4 so it applies, I like it.
> > > > > 
> > > > > OK, attached is patch 4 rebased on top of Linus' tree from today which
> > > > > already has linux-block changes pulled in. I've left put_disk_devt() in
> > > > > blk_cleanup_queue() to maintain the logic in the original patch (now commit
> > > > > 0dba1314d4f8) that request_queue and gendisk each hold one devt reference.
> > > > > The bdi_unregister() call that is moved to del_gendisk() by this patch is
> > > > > now protected by the gendisk reference instead of the request_queue one
> > > > > so it still maintains the property that devt reference protects bdi
> > > > > registration-unregistration lifetime (as much as that is not needed anymore
> > > > > after this patch).
> > > > > 
> > > > > I have also updated the comment in the code and the changelog - they were
> > > > > somewhat stale after changes to the whole series Tejun suggested.
> > > > > 
> > > > > 								Honza
> > > > 
> > > > Hey, Jan, I just tested this out when I was seeing similar crashes with
> > > > sr instead of sd, and this fixed it.
> > > > 
> > > > Tested-by: Omar Sandoval <osandov@fb.com>
> > > 
> > > Just realized it wasn't clear, I'm talking about patch 4 specifically.
> > 
> > Thanks for confirmation!
> > 
> > 								Honza
> 
> Unfortunately, this patch actually seems to have introduced a
> regression. Reverting the patch fixes it.
> 
> I'm running a QEMU kvm vm with a virtio-scsi device and I get oopses
> like this:

What workload do you run? Or is it enough to just boot the kvm with
virtio-scsi enabled? Can you send me the kvm setup so that I can reproduce
this?

At least the KASAN error could be a result of the situation where someone
called bdi_register() but didn't call bdi_unregister() before dropping the
last bdi reference (which would make sense given I've moved
bdi_unregister() call). However I'd expect a warning from bdi_exit() in that
case and I don't see that in your kernel log. So I'll try to reproduce the
issue and debug more.
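
For reference, the pairing I have in mind is roughly the following
(simplified sketch of the expected lifecycle, not the exact code):

----
/*
 * bdi_register() must be balanced by bdi_unregister() before the last
 * bdi reference is dropped.  bdi_exit() runs once the refcount hits
 * zero and should complain if the bdi is still registered, i.e. if
 * bdi->dev was never cleared by bdi_unregister().
 */
static void bdi_exit(struct backing_dev_info *bdi)
{
	WARN_ON_ONCE(bdi->dev);		/* registered but never unregistered */
	wb_exit(&bdi->wb);
}
----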

								Honza

> 
> [    6.741773] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
> [    6.744401] IP: virtscsi_queuecommand_single+0x21/0x40 [virtio_scsi]
> [    6.744985] PGD 0
> [    6.744985]
> [    6.744985] Oops: 0002 [#1] PREEMPT SMP
> [    6.744985] Modules linked in: virtio_scsi scsi_mod virtio_net
> [    6.744985] CPU: 3 PID: 5 Comm: kworker/u8:0 Not tainted 4.11.0-rc1 #36
> [    6.744985] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-20161122_114906-anatol 04/01/2014
> [    6.744985] Workqueue: events_unbound async_run_entry_fn
> [    6.744985] task: ffff88007c8672c0 task.stack: ffffc9000033c000
> [    6.750038] RIP: 0010:virtscsi_queuecommand_single+0x21/0x40 [virtio_scsi]
> [    6.750038] RSP: 0018:ffffc9000033f8b0 EFLAGS: 00010246
> [    6.750038] RAX: ffff88007c98d000 RBX: ffff880078c814c8 RCX: 0000000000000000
> [    6.750038] RDX: ffff880078c814c8 RSI: ffff880078c814c8 RDI: ffff88007c98d000
> [    6.750038] RBP: ffffc9000033f8b0 R08: 0000000000000000 R09: 0000000000000400
> [    6.750038] R10: ffff88007b1fe748 R11: 0000000000000024 R12: ffff880078c814c8
> [    6.750038] R13: ffff88007c98d000 R14: ffff880078c81380 R15: ffff88007b532000
> [    6.750038] FS:  0000000000000000(0000) GS:ffff88007fd80000(0000) knlGS:0000000000000000
> [    6.750038] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    6.750038] CR2: 0000000000000004 CR3: 0000000001c09000 CR4: 00000000000006e0
> [    6.750038] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    6.750038] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    6.750038] Call Trace:
> [    6.750038]  scsi_dispatch_cmd+0xa3/0x1d0 [scsi_mod]
> [    6.750038]  scsi_queue_rq+0x586/0x740 [scsi_mod]
> [    6.750038]  blk_mq_dispatch_rq_list+0x22a/0x420
> [    6.750038]  blk_mq_sched_dispatch_requests+0x142/0x1b0
> [    6.750038]  __blk_mq_run_hw_queue+0x94/0xb0
> [    6.750038]  blk_mq_run_hw_queue+0x82/0xa0
> [    6.750038]  blk_mq_sched_insert_request+0x89/0x1a0
> [    6.750038]  ? blk_execute_rq+0xc0/0xc0
> [    6.750038]  blk_execute_rq_nowait+0x9a/0x140
> [    6.750038]  ? bio_phys_segments+0x19/0x20
> [    6.750038]  blk_execute_rq+0x56/0xc0
> [    6.750038]  scsi_execute+0x128/0x270 [scsi_mod]
> [    6.750038]  scsi_probe_and_add_lun+0x247/0xc60 [scsi_mod]
> [    6.750038]  ? __pm_runtime_resume+0x4c/0x60
> [    6.750038]  __scsi_scan_target+0x102/0x520 [scsi_mod]
> [    6.750038]  ? account_entity_dequeue+0xab/0xe0
> [    6.750038]  scsi_scan_channel+0x81/0xa0 [scsi_mod]
> [    6.750038]  scsi_scan_host_selected+0xcc/0x100 [scsi_mod]
> [    6.750038]  do_scsi_scan_host+0x8d/0x90 [scsi_mod]
> [    6.750038]  do_scan_async+0x1c/0x1a0 [scsi_mod]
> [    6.750038]  async_run_entry_fn+0x4a/0x170
> [    6.750038]  process_one_work+0x165/0x430
> [    6.750038]  worker_thread+0x4e/0x490
> [    6.750038]  kthread+0x101/0x140
> [    6.750038]  ? process_one_work+0x430/0x430
> [    6.750038]  ? kthread_create_on_node+0x60/0x60
> [    6.750038]  ret_from_fork+0x2c/0x40
> [    6.750038] Code: ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 4e 30 48 89 f8 48 89 f2 48 89 e5 48 8b 89 88 01 00 00 48 8b 89 08 03 00 00 <f0> ff 41 04 48 8d bf c0 07 00 00 48 8d b0 c8 09 00 00 e8 08 fd
> [    6.750038] RIP: virtscsi_queuecommand_single+0x21/0x40 [virtio_scsi] RSP: ffffc9000033f8b0
> [    6.750038] CR2: 0000000000000004
> [    6.750038] ---[ end trace cee5df55872abfa0 ]---
> [    6.750038] note: kworker/u8:0[5] exited with preempt_count 1
> 
> Arch Linux 4.11.0-rc1 (ttyS0)
> 
> silver login: [   27.370267] scsi 0:0:53:0: tag#766 abort
> [   27.374501] scsi 0:0:53:0: device reset
> [   27.378539] scsi 0:0:53:0: Device offlined - not ready after error recovery
> 
> That crash is because tgt is NULL here:
> 
> ----
> static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
> 					struct scsi_cmnd *sc)
> {
> 	struct virtio_scsi *vscsi = shost_priv(sh);
> 	struct virtio_scsi_target_state *tgt =
> 				scsi_target(sc->device)->hostdata;
> 
> ======> atomic_inc(&tgt->reqs);
> 	return virtscsi_queuecommand(vscsi, &vscsi->req_vqs[0], sc);
> }
> ----
> 
> Applying the rest of this patch set didn't fix it. Additionally,
> enabling KASAN gives this use-after-free spew:
> ----
> [    7.789280] SCSI subsystem initialized
> [    7.809991] virtio_net virtio0 ens2: renamed from eth0
> [    7.828301] scsi host0: Virtio SCSI HBA
> [    7.916214] scsi 0:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
> [    7.946713] ==================================================================
> [    7.947609] BUG: KASAN: use-after-free in rb_erase+0x13a2/0x1a20 at addr ffff88006a915e30
> [    7.948583] Write of size 8 by task dhcpcd-run-hook/146
> [    7.949202] CPU: 0 PID: 146 Comm: dhcpcd-run-hook Not tainted 4.11.0-rc1-00011-gc35ccd2d43e9 #42
> [    7.950021] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-20161122_114906-anatol 04/01/2014
> [    7.950021] Call Trace:
> [    7.950021]  <IRQ>
> [    7.950021]  dump_stack+0x63/0x86
> [    7.950021]  kasan_object_err+0x21/0x70
> [    7.950021]  kasan_report.part.1+0x23a/0x520
> [    7.950021]  ? rb_erase+0x13a2/0x1a20
> [    7.950021]  ? swake_up_locked+0x94/0x1c0
> [    7.950021]  __asan_report_store8_noabort+0x31/0x40
> [    7.950021]  rb_erase+0x13a2/0x1a20
> [    7.950021]  wb_congested_put+0x6a/0xd0
> [    7.950021]  __blkg_release_rcu+0x16b/0x230
> [    7.950021]  rcu_process_callbacks+0x602/0xfc0
> [    7.950021]  __do_softirq+0x14e/0x5b3
> [    7.950021]  irq_exit+0x14e/0x180
> [    7.950021]  smp_apic_timer_interrupt+0x7b/0xa0
> [    7.950021]  apic_timer_interrupt+0x89/0x90
> [    7.950021] RIP: 0010:copy_strings.isra.7+0x312/0x6b0
> [    7.950021] RSP: 0018:ffff880063bffcd0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
> [    7.950021] RAX: 0000000000002000 RBX: 0000000000000fe3 RCX: 0000000000000002
> [    7.950021] RDX: ffff88006a734880 RSI: 0000000000000002 RDI: ffff880069750808
> [    7.950021] RBP: ffff880063bffdd0 R08: 8000000063ec3067 R09: 0000000000000040
> [    7.950021] R10: ffffed000c7d8600 R11: ffffffff82afa6a0 R12: 00007fffffffefe3
> [    7.950021] R13: 0000000000000015 R14: ffff880069750780 R15: dffffc0000000000
> [    7.950021]  </IRQ>
> [    7.950021]  ? set_binfmt+0x120/0x120
> [    7.950021]  ? insert_vm_struct+0x148/0x2e0
> [    7.950021]  ? kasan_slab_alloc+0x12/0x20
> [    7.950021]  ? count.isra.6.constprop.16+0x52/0x100
> [    7.950021]  do_execveat_common.isra.14+0xef1/0x1b80
> [    7.950021]  ? prepare_bprm_creds+0x100/0x100
> [    7.950021]  ? getname_flags+0x90/0x3f0
> [    7.950021]  ? __do_page_fault+0x5cc/0xbc0
> [    7.950021]  SyS_execve+0x3a/0x50
> [    7.950021]  ? ptregs_sys_vfork+0x10/0x10
> [    7.950021]  do_syscall_64+0x180/0x380
> [    7.950021]  entry_SYSCALL64_slow_path+0x25/0x25
> [    7.950021] RIP: 0033:0x7f85de145477
> [    7.950021] RSP: 002b:00007ffdf2da3568 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
> [    7.950021] RAX: ffffffffffffffda RBX: 0000000000d771c0 RCX: 00007f85de145477
> [    7.950021] RDX: 0000000000d35fb0 RSI: 0000000000d697e0 RDI: 0000000000d771c0
> [    7.950021] RBP: 0000000000000000 R08: 00007ffdf2da3560 R09: 0000000000000030
> [    7.950021] R10: 000000000000059a R11: 0000000000000202 R12: 0000000000d771c0
> [    7.950021] R13: 0000000000d697e0 R14: 0000000000d35fb0 R15: 0000000000000000
> [    7.950021] Object at ffff88006a915b00, in cache kmalloc-1024 size: 1024
> [    7.950021] Allocated:
> [    7.950021] PID = 72
> [    7.950021]  save_stack_trace+0x1b/0x20
> [    7.950021]  kasan_kmalloc+0xee/0x190
> [    7.950021]  kmem_cache_alloc_node_trace+0x138/0x200
> [    7.950021]  bdi_alloc_node+0x4c/0xa0
> [    7.950021]  blk_alloc_queue_node+0xdd/0x870
> [    7.950021]  blk_mq_init_queue+0x41/0x90
> [    7.950021]  scsi_mq_alloc_queue+0x41/0x130 [scsi_mod]
> [    7.950021]  scsi_alloc_sdev+0x90e/0xd00 [scsi_mod]
> [    7.950021]  scsi_probe_and_add_lun+0x14b3/0x2250 [scsi_mod]
> [    7.950021]  __scsi_scan_target+0x23d/0xb60 [scsi_mod]
> [    7.950021]  scsi_scan_channel+0x107/0x160 [scsi_mod]
> [    7.950021]  scsi_scan_host_selected+0x1bf/0x250 [scsi_mod]
> [    7.950021]  do_scsi_scan_host+0x1bc/0x250 [scsi_mod]
> [    7.950021]  do_scan_async+0x41/0x4b0 [scsi_mod]
> [    7.950021]  async_run_entry_fn+0xc3/0x630
> [    7.950021]  process_one_work+0x531/0xf40
> [    7.950021]  worker_thread+0xe4/0x10d0
> [    7.950021]  kthread+0x298/0x390
> [    7.950021]  ret_from_fork+0x2c/0x40
> [    7.950021] Freed:
> [    7.950021] PID = 72
> [    7.950021]  save_stack_trace+0x1b/0x20
> [    7.950021]  kasan_slab_free+0xb0/0x180
> [    7.950021]  kfree+0x9f/0x1d0
> [    7.950021]  bdi_put+0x2a/0x30
> [    7.950021]  blk_release_queue+0x51/0x320
> [    7.950021]  kobject_put+0x154/0x430
> [    7.950021]  blk_put_queue+0x15/0x20
> [    7.950021]  scsi_device_dev_release_usercontext+0x59b/0x880 [scsi_mod]
> [    7.950021]  execute_in_process_context+0xd9/0x130
> [    7.950021]  scsi_device_dev_release+0x1c/0x20 [scsi_mod]
> [    7.950021]  device_release+0x76/0x1e0
> [    7.950021]  kobject_put+0x154/0x430
> [    7.950021]  put_device+0x17/0x20
> [    7.950021]  __scsi_remove_device+0x18d/0x250 [scsi_mod]
> [    7.950021]  scsi_probe_and_add_lun+0x14d6/0x2250 [scsi_mod]
> [    7.950021]  __scsi_scan_target+0x23d/0xb60 [scsi_mod]
> [    7.950021]  scsi_scan_channel+0x107/0x160 [scsi_mod]
> [    7.950021]  scsi_scan_host_selected+0x1bf/0x250 [scsi_mod]
> [    7.950021]  do_scsi_scan_host+0x1bc/0x250 [scsi_mod]
> [    7.950021]  do_scan_async+0x41/0x4b0 [scsi_mod]
> [    7.950021]  async_run_entry_fn+0xc3/0x630
> [    7.950021]  process_one_work+0x531/0xf40
> [    7.950021]  worker_thread+0xe4/0x10d0
> [    7.950021]  kthread+0x298/0x390
> [    7.950021]  ret_from_fork+0x2c/0x40
> [    7.950021] Memory state around the buggy address:
> [    7.950021]  ffff88006a915d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [    7.950021]  ffff88006a915d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [    7.950021] >ffff88006a915e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [    7.950021]                                      ^
> [    7.950021]  ffff88006a915e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [    7.950021]  ffff88006a915f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ----
> 
> Any ideas Jan?
> 
> Thanks.
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/13 v2] block: Fix block device shutdown related races
  2017-03-07 13:57             ` Jan Kara
@ 2017-03-07 16:28               ` Omar Sandoval
  2017-03-08 13:21                 ` Jan Kara
  0 siblings, 1 reply; 40+ messages in thread
From: Omar Sandoval @ 2017-03-07 16:28 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo, NeilBrown

On Tue, Mar 07, 2017 at 02:57:30PM +0100, Jan Kara wrote:
> On Mon 06-03-17 12:38:18, Omar Sandoval wrote:
> > Unfortunately, this patch actually seems to have introduced a
> > regression. Reverting the patch fixes it.
> > 
> > I'm running a QEMU kvm vm with a virtio-scsi device and I get oopses
> > like this:
> 
> What workload do you run? Or is it enough to just boot the kvm with
> virtio-scsi enabled? Can you send me the kvm setup so that I can reproduce
> this?

This is just while booting. In fact, you don't even need a virtio-scsi
disk attached, it's enough to have the virtio-scsi-pci controller. This
is my exact command line:

qemu-system-x86_64 -nodefaults -nographic -serial mon:stdio -cpu kvm64 \
	-enable-kvm -smp 1 -m 2G -watchdog i6300esb \
	-netdev user,id=vlan0,hostfwd=tcp:127.0.0.1:2222-:22 \
	-device virtio-net,netdev=vlan0 \
	-drive file=Silver/Silver.qcow2,index=0,media=disk,if=virtio,cache=none \
	-device virtio-scsi-pci \
	-kernel /home/osandov/linux/builds/virtio-scsi-crash/arch/x86/boot/bzImage \
	-virtfs local,path=/home/osandov/linux/builds/virtio-scsi-crash,security_model=none,readonly,mount_tag=modules \
	-append 'root=/dev/vda1 console=ttyS0,115200 scsi_mod.use_blk_mq=y'

My kernel config is here:
https://github.com/osandov/osandov-linux/blob/master/configs/qemu.config

> At least the KASAN error could be a result of the situation where someone
> called bdi_register() but didn't call bdi_unregister() before dropping the
> last bdi reference (which would make sense given I've moved
> bdi_unregister() call). However I'd expect a warning from bdi_exit() in that
> case and I don't see that in your kernel log. So I'll try to reproduce the
> issue and debug more.
> 
> 								Honza

Thanks for taking a look!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/13 v2] block: Fix block device shutdown related races
  2017-03-07 16:28               ` Omar Sandoval
@ 2017-03-08 13:21                 ` Jan Kara
  0 siblings, 0 replies; 40+ messages in thread
From: Jan Kara @ 2017-03-08 13:21 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: Jan Kara, Jens Axboe, linux-block, Christoph Hellwig,
	Dan Williams, Thiago Jung Bauermann, Lekshmi Pillai, Tejun Heo,
	NeilBrown

[-- Attachment #1: Type: text/plain, Size: 1540 bytes --]

On Tue 07-03-17 08:28:23, Omar Sandoval wrote:
> On Tue, Mar 07, 2017 at 02:57:30PM +0100, Jan Kara wrote:
> > On Mon 06-03-17 12:38:18, Omar Sandoval wrote:
> > > Unfortunately, this patch actually seems to have introduced a
> > > regression. Reverting the patch fixes it.
> > > 
> > > I'm running a QEMU kvm vm with a virtio-scsi device and I get oopses
> > > like this:
> > 
> > What workload do you run? Or is it enough to just boot the kvm with
> > virtio-scsi enabled? Can you send me the kvm setup so that I can reproduce
> > this?
> 
> This is just while booting. In fact, you don't even need a virtio-scsi
> disk attached; it's enough to have the virtio-scsi-pci controller. This
> is my exact command line:
> 
> qemu-system-x86_64 -nodefaults -nographic -serial mon:stdio -cpu kvm64 \
> 	-enable-kvm -smp 1 -m 2G -watchdog i6300esb \
> 	-netdev user,id=vlan0,hostfwd=tcp:127.0.0.1:2222-:22 \
> 	-device virtio-net,netdev=vlan0 \
> 	-drive file=Silver/Silver.qcow2,index=0,media=disk,if=virtio,cache=none \
> 	-device virtio-scsi-pci \
> 	-kernel /home/osandov/linux/builds/virtio-scsi-crash/arch/x86/boot/bzImage \
> 	-virtfs local,path=/home/osandov/linux/builds/virtio-scsi-crash,security_model=none,readonly,mount_tag=modules \
> 	-append 'root=/dev/vda1 console=ttyS0,115200 scsi_mod.use_blk_mq=y'
> 
> My kernel config is here:
> https://github.com/osandov/osandov-linux/blob/master/configs/qemu.config

Thanks. I was able to reproduce it. Do the two attached patches help?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-block-Allow-bdi-re-registration.patch --]
[-- Type: text/x-patch, Size: 2047 bytes --]

>From 60623ea9de6309f7717fc3536e3594b079735af0 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 7 Mar 2017 18:01:51 +0100
Subject: [PATCH 1/2] block: Allow bdi re-registration

SCSI can call device_add_disk() several times for one request queue when
a device is unbound and bound again, creating a new gendisk each time. This
leads to the bdi being repeatedly registered and unregistered. This was not a
big problem until commit 165a5e22fafb "block: Move bdi_unregister() to
del_gendisk()", since the bdi was only registered repeatedly (bdi_register()
handles repeated calls fine, although we ended up leaking a reference to the
gendisk due to overwriting bdi->owner) but was unregistered only in
blk_cleanup_queue(), which didn't get called repeatedly. After
165a5e22fafb we were doing proper bdi_register() - bdi_unregister()
cycles; however, bdi_unregister() is not prepared for that. So make sure
bdi_unregister() cleans up the bdi in such a way that it is prepared for
a possible subsequent bdi_register() call.

An easy way to provoke this behavior is to enable
CONFIG_DEBUG_TEST_DRIVER_REMOVE and use the scsi_debug driver to create a
SCSI disk, which immediately hangs without this fix (a minimal reproducer is
sketched after this patch).

Fixes: 165a5e22fafb127ecb5914e12e8c32a1f0d3f820
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/backing-dev.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 6d861d090e9f..6ac932210f56 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -710,6 +710,11 @@ static void cgwb_bdi_destroy(struct backing_dev_info *bdi)
 	 */
 	atomic_dec(&bdi->usage_cnt);
 	wait_event(cgwb_release_wait, !atomic_read(&bdi->usage_cnt));
+	/*
+	 * Grab back our reference so that we hold it when @bdi gets
+	 * re-registered.
+	 */
+	atomic_inc(&bdi->usage_cnt);
 }
 
 /**
@@ -857,6 +862,8 @@ int bdi_register_owner(struct backing_dev_info *bdi, struct device *owner)
 			MINOR(owner->devt));
 	if (rc)
 		return rc;
+	/* Leaking owner reference... */
+	WARN_ON(bdi->owner);
 	bdi->owner = owner;
 	get_device(owner);
 	return 0;
-- 
2.10.2
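
A minimal sketch of the scsi_debug reproducer mentioned in the changelog
above, assuming a kernel built with CONFIG_DEBUG_TEST_DRIVER_REMOVE=y (so
the driver core unbinds and rebinds the device once during probe) and
scsi_debug available as a module; the module parameters shown are
illustrative, not required:

# Create one emulated SCSI host with a single small disk. The
# probe/remove/re-probe cycle done by CONFIG_DEBUG_TEST_DRIVER_REMOVE
# exercises the repeated bdi register/unregister path; without the fix
# above, the newly created disk hangs as soon as it appears.
modprobe scsi_debug add_host=1 num_tgts=1 dev_size_mb=16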


[-- Attachment #3: 0002-bdi-Fix-use-after-free-in-wb_congested_put.patch --]
[-- Type: text/x-patch, Size: 3877 bytes --]

>From b9a6424cabd0eb4a663a1faeb11afef268faebc6 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 8 Mar 2017 13:24:47 +0100
Subject: [PATCH 2/2] bdi: Fix use-after-free in wb_congested_put()

bdi_writeback_congested structures get created for each blkcg and bdi
regardless of whether the bdi is registered or not. When they are created
for an unregistered bdi and the request queue (and thus the bdi) is then
destroyed while a blkg still holds a reference to the bdi_writeback_congested
structure, this structure ends up referencing a freed bdi and the last
wb_congested_put() will try to remove the structure from the already freed
bdi.

With commit 165a5e22fafb "block: Move bdi_unregister() to
del_gendisk()", SCSI started to destroy bdis without calling
bdi_unregister() first (previously it was calling bdi_unregister() even
for unregistered bdis) and thus the code detaching
bdi_writeback_congested structures in cgwb_bdi_destroy() was not triggered
and we started hitting this use-after-free bug. It is enough to boot a KVM
instance with a virtio-scsi device to trigger this behavior (a minimal
command line is sketched after this patch).

Fix the problem by detaching bdi_writeback_congested structures in
bdi_exit() instead of bdi_unregister(). This is also more logical as
they can get attached to a bdi regardless of whether it ever got registered
or not.

Fixes: 165a5e22fafb127ecb5914e12e8c32a1f0d3f820
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/backing-dev.c | 36 +++++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 6ac932210f56..c6f2a37028c2 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -683,30 +683,18 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
 static void cgwb_bdi_destroy(struct backing_dev_info *bdi)
 {
 	struct radix_tree_iter iter;
-	struct rb_node *rbn;
 	void **slot;
 
 	WARN_ON(test_bit(WB_registered, &bdi->wb.state));
 
 	spin_lock_irq(&cgwb_lock);
-
 	radix_tree_for_each_slot(slot, &bdi->cgwb_tree, &iter, 0)
 		cgwb_kill(*slot);
-
-	while ((rbn = rb_first(&bdi->cgwb_congested_tree))) {
-		struct bdi_writeback_congested *congested =
-			rb_entry(rbn, struct bdi_writeback_congested, rb_node);
-
-		rb_erase(rbn, &bdi->cgwb_congested_tree);
-		congested->bdi = NULL;	/* mark @congested unlinked */
-	}
-
 	spin_unlock_irq(&cgwb_lock);
 
 	/*
-	 * All cgwb's and their congested states must be shutdown and
-	 * released before returning.  Drain the usage counter to wait for
-	 * all cgwb's and cgwb_congested's ever created on @bdi.
+	 * All cgwb's must be shutdown and released before returning.  Drain
+	 * the usage counter to wait for all cgwb's ever created on @bdi.
 	 */
 	atomic_dec(&bdi->usage_cnt);
 	wait_event(cgwb_release_wait, !atomic_read(&bdi->usage_cnt));
@@ -754,6 +742,21 @@ void wb_blkcg_offline(struct blkcg *blkcg)
 	spin_unlock_irq(&cgwb_lock);
 }
 
+static void cgwb_bdi_exit(struct backing_dev_info *bdi)
+{
+	struct rb_node *rbn;
+
+	spin_lock_irq(&cgwb_lock);
+	while ((rbn = rb_first(&bdi->cgwb_congested_tree))) {
+		struct bdi_writeback_congested *congested =
+			rb_entry(rbn, struct bdi_writeback_congested, rb_node);
+
+		rb_erase(rbn, &bdi->cgwb_congested_tree);
+		congested->bdi = NULL;	/* mark @congested unlinked */
+	}
+	spin_unlock_irq(&cgwb_lock);
+}
+
 #else	/* CONFIG_CGROUP_WRITEBACK */
 
 static int cgwb_bdi_init(struct backing_dev_info *bdi)
@@ -774,7 +777,9 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
 	return 0;
 }
 
-static void cgwb_bdi_destroy(struct backing_dev_info *bdi)
+static void cgwb_bdi_destroy(struct backing_dev_info *bdi) { }
+
+static void cgwb_bdi_exit(struct backing_dev_info *bdi)
 {
 	wb_congested_put(bdi->wb_congested);
 }
@@ -905,6 +910,7 @@ static void bdi_exit(struct backing_dev_info *bdi)
 {
 	WARN_ON_ONCE(bdi->dev);
 	wb_exit(&bdi->wb);
+	cgwb_bdi_exit(bdi);
 }
 
 static void release_bdi(struct kref *ref)
-- 
2.10.2
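
For reference, a sketch of a minimal KVM invocation distilled from Omar's
full command line earlier in the thread; the kernel image path is a
placeholder, and per Omar's report no disk needs to be attached, the
virtio-scsi-pci controller alone is enough:

# Disk-less guest with only a virtio-scsi controller. Booting a kernel
# without this fix (ideally with KASAN enabled) reproduces the
# use-after-free reported earlier in the thread.
qemu-system-x86_64 -enable-kvm -nographic -m 1G \
	-device virtio-scsi-pci \
	-kernel path/to/bzImage \
	-append 'console=ttyS0 scsi_mod.use_blk_mq=y'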


^ permalink raw reply related	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2017-03-08 13:21 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-21 17:09 [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
2017-02-21 17:09 ` [PATCH 01/13] block: Move bdev_unhash_inode() after invalidate_partition() Jan Kara
2017-02-21 17:09 ` [PATCH 02/13] block: Unhash also block device inode for the whole device Jan Kara
2017-02-21 17:09 ` [PATCH 03/13] block: Revalidate i_bdev reference in bd_aquire() Jan Kara
2017-02-21 17:09 ` [PATCH 04/13] block: Move bdi_unregister() to del_gendisk() Jan Kara
2017-02-21 19:53   ` Bart Van Assche
2017-02-22  8:51     ` Jan Kara
2017-02-22 17:42       ` Bart Van Assche
2017-02-21 17:09 ` [PATCH 05/13] bdi: Mark congested->bdi as internal Jan Kara
2017-02-28 16:06   ` Tejun Heo
2017-02-21 17:09 ` [PATCH 06/13] bdi: Make wb->bdi a proper reference Jan Kara
2017-02-28 16:07   ` Tejun Heo
2017-02-21 17:09 ` [PATCH 07/13] bdi: Move removal from bdi->wb_list into wb_shutdown() Jan Kara
2017-02-28 16:14   ` Tejun Heo
2017-02-21 17:09 ` [PATCH 08/13] bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy() Jan Kara
2017-02-28 16:44   ` Tejun Heo
2017-02-21 17:09 ` [PATCH 09/13] bdi: Do not wait for cgwbs release in bdi_unregister() Jan Kara
2017-02-28 16:51   ` Tejun Heo
2017-02-21 17:09 ` [PATCH 10/13] bdi: Rename cgwb_bdi_destroy() to cgwb_bdi_unregister() Jan Kara
2017-02-28 16:51   ` Tejun Heo
2017-02-21 17:09 ` [PATCH 11/13] block: Fix oops in locked_inode_to_wb_and_lock_list() Jan Kara
2017-02-28 16:52   ` Tejun Heo
2017-02-21 17:09 ` [PATCH 12/13] kobject: Export kobject_get_unless_zero() Jan Kara
2017-02-21 19:39   ` Bart Van Assche
2017-02-21 17:09 ` [PATCH 13/13] block: Fix oops scsi_disk_get() Jan Kara
2017-02-21 19:33   ` Bart Van Assche
2017-02-21 17:19 ` [PATCH 0/13 v2] block: Fix block device shutdown related races Jan Kara
2017-02-21 18:55   ` Dan Williams
2017-02-21 17:19 ` Jens Axboe
2017-02-22 10:24   ` Jan Kara
2017-03-01  7:25     ` Omar Sandoval
2017-03-01  7:26       ` Omar Sandoval
2017-03-01 15:11         ` Jan Kara
2017-03-06 20:38           ` Omar Sandoval
2017-03-07 13:57             ` Jan Kara
2017-03-07 16:28               ` Omar Sandoval
2017-03-08 13:21                 ` Jan Kara
2017-02-28 16:54 ` Tejun Heo
2017-03-01 15:37   ` Jan Kara
2017-03-01 16:26     ` Tejun Heo
